本期的 15 篇论文如下:
[00:23] 🌏 Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia(众包、爬取还是生成?创建东南亚多元文化视觉语言数据集SEA-VL)
[01:04] 🧠 LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL(LMM-R1:通过两阶段基于规则的强化学习赋予3B多模态大模型强大的推理能力)
[01:43] 🎵 YuE: Scaling Open Foundation Models for Long-Form Music Generation(YuE:扩展开放基础模型用于长篇音乐生成)
[02:17] 👤 UniF²ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models(UniF²ace:基于统一多模态模型的细粒度人脸理解和生成)
[02:59] 🎥 MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice(MagicInfinite:用你的文字和声音生成无限时长的说话人视频)
[03:42] 🧠 SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories(SegAgent:通过模仿人类标注者轨迹探索多模态大模型的像素理解能力)
[04:19] 🌐 Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model(Seedream 2.0:一种原生中英双语图像生成基础模型)
[05:03] 🌐 Gemini Embedding: Generalizable Embeddings from Gemini(Gemini Embedding:来自Gemini模型的可泛化嵌入)
[05:45] 🧠 Implicit Reasoning in Transformers is Reasoning through Shortcuts(Transformer中的隐式推理是通过捷径实现的)
[06:21] 🌟 LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization(LightGen:通过知识蒸馏和直接偏好优化实现高效图像生成)
[07:06] 🎥 Tuning-Free Multi-Event Long Video Generation via Synchronized Coupled Sampling(通过同步耦合采样实现无需调参的多事件长视频生成)
[07:44] 🧠 Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning(通过元强化微调优化测试时计算)
[08:30] 🌐 OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models(OmniMamba:基于状态空间模型的高效统一多模态理解和生成)
[09:14] 🧠 CineBrain: A Large-Scale Multi-Modal Brain Dataset During Naturalistic Audiovisual Narrative Processing(CineBrain:自然视听叙事处理中的大规模多模态脑数据集)
[09:52] 🎥 Video Action Differencing(视频动作差异分析)
【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递
本期的 11 篇论文如下:
[00:25] 🤖 Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders(基于稀疏自编码器的人工文本检测特征级分析)
[01:00] 🧠 SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models(SEAP:免训练的稀疏专家激活剪枝解锁大语言模型的脑力)
[01:43] 🧠 MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning(MM-Eureka:通过基于规则的大规模强化学习探索视觉顿悟时刻)
[02:27] 📝 Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning(记笔记能带来专注吗?面向多轮多模态对话学习)
[03:09] 🎬 Automated Movie Generation via Multi-Agent CoT Planning(基于多智能体链式思维规划的自动化电影生成)
[03:44] 🔒 FedRand: Enhancing Privacy in Federated Learning with Randomized LoRA Subparameter Updates(FedRand:通过随机化LoRA子参数更新增强联邦学习中的隐私保护)
[04:18] 🔥 DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs(DistiLLM-2:一种对比方法提升大语言模型蒸馏效果)
[04:53] 🚀 EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer(EasyControl:为扩散Transformer添加高效灵活的控制)
[05:38] 🛠 FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation(FEA-Bench:评估面向功能实现的仓库级代码生成的基准)
[06:15] 🚗 AlphaDrive: Unleashing the Power of VLMs in Autonomous Driving via Reinforcement Learning and Reasoning(AlphaDrive:通过强化学习和推理释放VLMs在自动驾驶中的潜力)
[07:01] 📚 SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing(SurveyForge:自动化综述写作中的大纲启发式、记忆驱动生成与多维度评估)
本期的 20 篇论文如下:
[00:19] 🌐 Unified Reward Model for Multimodal Understanding and Generation(多模态理解和生成的统一奖励模型)
[01:04] 🇷🇺 RuCCoD: Towards Automated ICD Coding in Russian(RuCCoD:面向俄语的自动化ICD编码)
[01:41] 🌍 EuroBERT: Scaling Multilingual Encoders for European Languages(EuroBERT:扩展欧洲语言的多语言编码器)
[02:28] 🗣 S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information(S2S-Arena:结合副语言信息评估语音到语音协议的指令跟随能力)
[03:08] 🧠 Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching(思维草图:结合自适应认知启发草图的高效LLM推理)
[03:47] 🧠 Forgetting Transformer: Softmax Attention with a Forget Gate(遗忘Transformer:带遗忘门的Softmax注意力机制)
[04:28] 🧠 R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning(R1-Searcher:通过强化学习激励LLMs的搜索能力)
[05:19] 🎥 VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control(VideoPainter:基于即插即用上下文控制的任意长度视频修复与编辑)
[06:04] 🎭 R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning(R1-Omni:基于强化学习的可解释全模态情感识别)
[06:50] 🎥 TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models(TrajectoryCrafter:通过扩散模型重定向单目视频的相机轨迹)
[07:26] 🌊 ProReflow: Progressive Reflow with Decomposed Velocity(ProReflow:基于分解速度的渐进式重流)
[08:11] 🤖 BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities(BEHAVIOR机器人套件:简化日常家庭活动中的真实世界全身操作)
[08:50] 🧠 An Empirical Study on Eliciting and Improving R1-like Reasoning Models(关于启发和提升类R1推理模型的实证研究)
[09:27] 🧠 Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts(Linear-MoE:线性序列建模与专家混合的结合)
[10:13] 🧠 TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation(TinyR1-32B-Preview:通过分支-合并蒸馏提升准确性)
[10:56] 🧑 LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding(LONGCODEU:评估长上下文语言模型的长代码理解能力)
[11:41] 🔄 Learning from Failures in Multi-Attempt Reinforcement Learning(从失败中学习:多尝试强化学习)
[12:20] 🔍 SAGE: A Framework of Precise Retrieval for RAG(SAGE:面向RAG的精准检索框架)
[13:01] 🧠 R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model(R1-Zero在2B非SFT模型上视觉推理中的“顿悟时刻”)
[13:39] 🤖 Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles(初识你并更好地成为你:通过隐式画像建模类人用户模拟器)
本期的 5 篇论文如下:
[00:35] TOP1(🔥64) | 🧠 Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs(Phi-4-Mini技术报告:通过混合LoRA实现紧凑而强大的多模态语言模型)
[02:30] TOP2(🔥58) | 🛠 START: Self-taught Reasoner with Tools(START:使用工具的自学推理器)
[04:36] TOP3(🔥57) | 🧠 Visual-RFT: Visual Reinforcement Fine-Tuning(Visual-RFT:视觉强化微调)
[06:40] TOP4(🔥52) | 🌍 Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers(巴别塔:服务于全球90%以上人口的开源多语言大型语言模型)
[09:03] TOP5(🔥51) | 📊 Predictive Data Selection: The Data That Predicts Is the Data That Teaches(预测性数据选择:能预测的数据即能教学的数据)
本期的 18 篇论文如下:
[00:21] 🛠 START: Self-taught Reasoner with Tools(START:使用工具的自学推理器)
[01:03] 👓 EgoLife: Towards Egocentric Life Assistant(EgoLife:面向自我中心的生活助手)
[01:39] 📞 LLM as a Broken Telephone: Iterative Generation Distorts Information(大型语言模型如同失真传话游戏:迭代生成扭曲信息)
[02:14] 🧠 LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation(LINGOLY-TOO:通过语言模板化和正字法混淆分离记忆与推理)
[02:51] 🔄 HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization(HybridNorm:通过混合归一化实现稳定高效的Transformer训练)
[03:34] 🎥 Token-Efficient Long Video Understanding for Multimodal LLMs(面向多模态大语言模型的令牌高效长视频理解)
[04:14] 🧠 FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion(FuseChat-3.0:偏好优化与异构模型融合)
[04:58] 🎮 PokéChamp: an Expert-level Minimax Language Agent(宝可冠军:一个专家级的Minimax语言代理)
[05:42] 🎧 Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities(音频火烈鸟2:具有长音频理解和专家推理能力的音频语言模型)
[06:21] 📊 IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval(IFIR:评估专家领域信息检索中指令遵循的综合基准)
[07:02] 📊 Identifying Sensitive Weights via Post-quantization Integral(通过后量化积分识别敏感权重)
[07:46] 📏 L²M: Mutual Information Scaling Law for Long-Context Language Modeling(L²M:长上下文语言建模的互信息缩放定律)
[08:22] 🎥 The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation(双剑合璧:结合语言模型与扩散模型进行视频生成)
[09:05] 🤖 Lost in Literalism: How Supervised Training Shapes Translationese in LLMs(迷失于字面主义:监督训练如何塑造LLMs中的翻译体)
[09:48] 🚀 Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks(专用反馈和编辑模型增强开放式通用领域任务的推理时扩展)
[10:33] 🧠 Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer(专家联盟:将分层路由适配于等价分解的Transformer)
[11:13] 🤖 Combining Flow Matching and Transformers for Efficient Solution of Bayesian Inverse Problems(结合流匹配与Transformer实现高效的贝叶斯反问题求解)
[11:54] 🚫 Understanding and Predicting Derailment in Toxic Conversations on GitHub(理解与预测GitHub上毒性对话中的脱轨现象)
本期的 17 篇论文如下:
[00:24] 🌍 Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers(巴别塔:服务于全球90%以上人口的开源多语言大型语言模型)
[01:11] 🧠 ABC: Achieving Better Control of Multimodal Embeddings using VLMs(ABC:使用视觉语言模型实现多模态嵌入的更好控制)
[01:47] 🩺 Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions(利用知识描述增强视觉语言模型的异常定位能力)
[02:24] 🎥 GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control(GEN3C:具备精确相机控制的3D感知世界一致视频生成)
[03:02] 🧠 KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding(KodCode:一个多样、具有挑战性且可验证的合成代码数据集)
[03:43] 🧠 CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom(CrowdSelect:基于多LLM智慧的合成指令数据选择)
[04:26] 📄 QE4PE: Word-level Quality Estimation for Human Post-Editing(QE4PE:面向人工译后编辑的词语级质量评估)
[05:08] 🗣 Exploring Rewriting Approaches for Different Conversational Tasks(探索不同对话任务的重写方法)
[05:43] 🧠 Process-based Self-Rewarding Language Models(基于过程的自奖励语言模型)
[06:23] 🤖 Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective(面向特定领域AI微调小型语言模型:边缘AI视角)
[07:00] 🌐 Mixture of Structural-and-Textual Retrieval over Text-rich Graph Knowledge Bases(基于文本丰富图知识库的结构与文本混合检索)
[07:40] 🛠 Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models(检索模型不擅长工具使用:大型语言模型工具检索基准测试)
[08:22] 🤖 FLAME: A Federated Learning Benchmark for Robotic Manipulation(FLAME:机器人操作的联邦学习基准)
[09:01] 🛡 Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection(多语言软件漏洞检测的大语言模型基准测试)
[09:53] 🤖 CognitiveDrone: A VLA Model and Evaluation Benchmark for Real-Time Cognitive Task Solving and Reasoning in UAVs(认知无人机:一种用于无人机实时认知任务解决和推理的VLA模型及评估基准)
[10:36] 🚗 Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions(交互、指导以提升:一种用于增强自动驾驶车辆交互的LLM驱动并行行动者-推理者框架)
[11:14] 🇨🇭 SwiLTra-Bench: The Swiss Legal Translation Benchmark(SwiLTra-Bench:瑞士法律翻译基准)
本期的 18 篇论文如下:
[00:21] 🚀 MPO: Boosting LLM Agents with Meta Plan Optimization(MPO:通过元计划优化提升LLM代理)
[00:59] 🤖 Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs(Mask-DPO:大语言模型的可泛化细粒度事实性对齐)
[01:43] 🧩 LADDER: Self-Improving LLMs Through Recursive Problem Decomposition(LADDER:通过递归问题分解实现自我改进的LLMs)
[02:26] 📚 Wikipedia in the Era of LLMs: Evolution and Risks(大语言模型时代的维基百科:演变与风险)
[03:06] 🚀 PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization(PipeOffload:通过内存优化提升流水线并行的可扩展性)
[03:50] 🔄 Iterative Value Function Optimization for Guided Decoding(用于引导解码的迭代价值函数优化)
[04:33] 🤖 MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents(多智能体基准:评估LLM智能体的协作与竞争)
[05:19] ⚡ FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling(FR-Spec:通过频率排序的推测采样加速大词汇量语言模型)
[05:58] 🧐 SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking(SemViQA:面向越南语信息事实核查的语义问答系统)
[06:45] 🖼 RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification(RectifiedHR:通过能量校正实现高效的高分辨率图像生成)
[07:18] 🌐 UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface(UFO:通过开放式语言接口实现细粒度视觉感知的统一方法)
[07:56] 🧠 ATLaS: Agent Tuning via Learning Critical Steps(ATLaS:通过学习关键步骤进行代理调优)
[08:41] 🤖 Language Models can Self-Improve at State-Value Estimation for Better Search(语言模型能够在状态值估计中自我改进以提升搜索效果)
[09:24] 🔧 IterPref: Focal Preference Learning for Code Generation via Iterative Debugging(IterPref:通过迭代调试进行代码生成的焦点偏好学习)
[10:15] 🔬 SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models(SPIDER:综合多器官监督病理数据集与基线模型)
[10:56] 🌐 Improve Representation for Imbalanced Regression through Geometric Constraints(通过几何约束改进不平衡回归的表示)
[11:35] 🎯 Q-Eval-100K: Evaluating Visual Quality and Alignment Level for Text-to-Vision Content(Q-Eval-100K:评估文本到视觉内容的质量与对齐水平)
[12:16] 🤖 AppAgentX: Evolving GUI Agents as Proficient Smartphone Users(AppAgentX:演进出熟练使用智能手机的图形用户界面代理)
本期的 20 篇论文如下:
[00:21] 🧠 Visual-RFT: Visual Reinforcement Fine-Tuning(Visual-RFT:视觉强化微调)
[01:05] 🌐 Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models(Difix3D+:通过单步扩散模型改进三维重建)
[01:43] 🧠 Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs(Phi-4-Mini技术报告:通过混合LoRA实现紧凑而强大的多模态语言模型)
[02:25] 🎥 OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment(OneRec:通过生成式推荐器与迭代偏好对齐统一检索与排序)
[03:04] 🤔 When an LLM is apprehensive about its answers -- and when its uncertainty is justified(当LLM对其答案感到不安时——以及何时其不确定性是有道理的)
[03:46] 🎵 DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion(DiffRhythm:基于潜在扩散的超快速且极度简单的端到端全长歌曲生成)
[04:28] 🐯 Liger: Linearizing Large Language Models to Gated Recurrent Structures(Liger:将大型语言模型线性化为门控循环结构)
[05:05] 📊 Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions(麒麟:一个包含应用级用户会话的多模态信息检索数据集)
[05:50] 🧠 Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs(实现自我改进推理者的认知行为,或,高效STaRs的四个习惯)
[06:28] ⚡ Speculative Ad-hoc Querying(推测式即席查询)
[07:15] ⚡ DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting(DuoDecoding:硬件感知的异构推测解码与动态多序列草稿)
[07:52] 🎨 Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation(Kiss3DGen:将图像扩散模型重新用于3D资产生成)
[08:31] 🧠 Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia(词形重要:LLM在字母乱序现象下的语义重构)
[09:10] ⚡ From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens(从小时到分钟:高达100K tokens的超长序列生成无损加速)
[09:47] 🔍 Large-Scale Data Selection for Instruction Tuning(面向指令微调的大规模数据选择)
[10:26] 🌐 SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity(SampleMix:一种协调数据质量和多样性的样本级预训练数据混合策略)
[11:01] 🤖 CodeArena: A Collective Evaluation Platform for LLM Code Generation(CodeArena:面向LLM代码生成的集体评估平台)
[11:47] 🎥 VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation(VideoUFO:用于文本到视频生成的百万规模用户聚焦数据集)
[12:42] 🎙 PodAgent: A Comprehensive Framework for Podcast Generation(PodAgent:播客生成的综合框架)
[13:18] 🏠 Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model(预训练模型时代的无姿态稀疏视角房间布局重建)
本期的 10 篇论文如下:
[00:20] 🌲 DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking(深度解决方案:通过基于树的探索与双点思维提升复杂工程解决方案设计)
[00:55] ✍ Chain of Draft: Thinking Faster by Writing Less(草稿链:写得更少,想得更快)
[01:39] 🧠 ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents(ViDoRAG:基于动态迭代推理代理的视觉文档检索增强生成)
[02:20] 🧠 SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers(SoS1:O1和R1类推理LLM是平方和求解器)
[03:09] 🧠 Optimal Brain Apoptosis(最优脑凋亡)
[03:51] 🧠 Tell me why: Visual foundation models as self-explainable classifiers(告诉我为什么:视觉基础模型作为自解释分类器)
[04:31] 🤖 Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids(基于视觉的人形机器人灵巧操作的仿真到现实强化学习)
[05:09] ⚡ LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation(LiteASR:基于低秩近似的高效自动语音识别)
[05:47] 🎥 HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models(HAIC:利用更好的字幕提升多模态大语言模型的人类行为理解和生成)
[06:26] 🥬 LettuceDetect: A Hallucination Detection Framework for RAG Applications(LettuceDetect:用于RAG应用的幻觉检测框架)
本期的 10 篇论文如下:
[00:39] TOP1(🔥196) | 🤖 SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model(SmolLM2:当小模型变大——以数据为中心的小型语言模型训练)
[02:32] TOP2(🔥183) | 🎥 OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models(OmniHuman-1:重新思考单阶段条件人体动画模型的扩展)
[05:02] TOP3(🔥182) | 🦜 The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding(大语言模型肩上的随机鹦鹉:物理概念理解的总结性评估)
[06:41] TOP4(🔥167) | 🧠 MLGym: A New Framework and Benchmark for Advancing AI Research Agents(MLGym:推进AI研究代理的新框架与基准)
[09:03] TOP5(🔥152) | 🌐 Qwen2.5-VL Technical Report(Qwen2.5-VL技术报告)
[11:48] TOP6(🔥152) | 🔍 LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers(LLM显微镜:揭示标点符号在Transformer上下文记忆中的隐藏作用)
[13:41] TOP7(🔥142) | 🚀 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU(InfiniteHiP:在单个GPU上将语言模型上下文扩展至300万tokens)
[16:06] TOP8(🔥140) | 🤔 Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling(10亿参数LLM能否超越4050亿参数LLM?重新思考计算最优的测试时缩放)
[18:40] TOP9(🔥137) | ⚡ Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention(原生稀疏注意力:硬件对齐且原生可训练的稀疏注意力)
[20:46] TOP10(🔥125) | 💼 Expect the Unexpected: FailSafe Long Context QA for Finance(预料之外:面向金融的FailSafe长上下文问答)
本期的 5 篇论文如下:
[00:50] TOP1(🔥152) | 🔍 LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers(LLM显微镜:揭示标点符号在Transformer上下文记忆中的隐藏作用)
[03:08] TOP2(🔥89) | 📚 SurveyX: Academic Survey Automation via Large Language Models(SurveyX:基于大型语言模型的学术综述自动化)
[05:42] TOP3(🔥65) | 🎥 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing(VideoGrain:调节时空注意力实现多粒度视频编辑)
[07:24] TOP4(🔥64) | 📖 Thus Spake Long-Context Large Language Model(长上下文大语言模型如是说)
[09:35] TOP5(🔥61) | 🤖 OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference(OmniAlign-V:迈向多模态大语言模型与人类偏好的增强对齐)
本期的 19 篇论文如下:
[00:23] 🧠 Self-rewarding correction for mathematical reasoning(自我奖励的数学推理校正)
[01:03] 🧠 MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning(MedVLM-R1:通过强化学习激励视觉语言模型的医疗推理能力)
[01:53] 🧠 R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts(R2-T2:多模态专家混合模型的测试时重路由)
[02:34] 🧬 LongRoPE2: Near-Lossless LLM Context Window Scaling(LongRoPE2:近乎无损的LLM上下文窗口扩展)
[03:11] 🧠 FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving(FINEREASON:通过反思性谜题解决评估和改进大语言模型的深思熟虑推理)
[04:02] 🤖 CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale(CODESYNC:让大型语言模型与大规模动态代码演化保持同步)
[04:48] 🚀 Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance(精简与高效:基于全局价值引导的解耦价值策略优化)
[05:33] 🧩 UniTok: A Unified Tokenizer for Visual Generation and Understanding(UniTok:面向视觉生成与理解的统一分词器)
[06:12] 🚀 NeoBERT: A Next-Generation BERT(NeoBERT:下一代BERT)
[06:47] 🌀 FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute(FlexiDiT:让你的扩散Transformer轻松生成高质量样本,计算量更少)
[07:30] 🛠 SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning(SoRFT:通过面向子任务的强化微调解决问题)
[08:07] 🤖 Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting(基于高斯泼溅构建复杂铰接物体的可交互副本)
[08:45] 🎨 Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think(多模态表示对齐用于图像生成:文本-图像交错控制比你想象的更简单)
[09:30] 🎥 Mobius: Text to Seamless Looping Video Generation via Latent Shift(Mobius:通过潜在位移从文本生成无缝循环视频)
[10:08] 🛡 Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System(代理系统守护者:利用代理系统防止多样本越狱)
[10:49] 🤖 R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning(R1-T1:通过推理学习全面激励大语言模型的翻译能力)
[11:29] 🧠 On Relation-Specific Neurons in Large Language Models(关于大型语言模型中的关系特定神经元)
[12:05] 🔄 Training Consistency Models with Variational Noise Coupling(通过变分噪声耦合训练一致性模型)
[12:46] ⚡ Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling(通过稀疏时变属性建模实现单目动态场景渲染的高效高斯泼溅)