Episode List: HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递) - EarsOnMe

2026.02.17 | Query-Anchored User Profiles; Quantum-Native Databases

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: every week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:29] 🧠 Query as Anchor: Scenario-Adaptive User Representation via Large Language Model
[01:14] ⚛ Qute: Towards Quantum-Native Database
[01:59] 🧠 InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem
[03:05] 🔍 REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
[03:56] 🚀 BitDance: Scaling Autoregressive Generative Models with Binary Tokens
[04:38] 🧠 Experiential Reinforcement Learning
[05:24] 🧠 Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings
[06:21] 🧩 UniWeTok: A Unified Binary Tokenizer with Codebook Size 2^128 for Unified Multimodal Large Language Model
[07:13] 🔍 BrowseComp-V³: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents
[08:18] 🧠 LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models
[09:02] 🗣 Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision
[10:00] 🧠 Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
[10:49] 🎨 FireRed-Image-Edit-1.0 Technical Report
[11:26] 🧬 Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training
[12:04] 🌐 WebWorld: A Large-Scale World Model for Web Agent Training
[Follow Us] You can also find us on the following platform for more than what the podcast covers. Xiaohongshu: AI速递

13 min

79

1 month ago

2026.02.16 | Filling Data Gaps via Feature Activations; Region Distillation Builds In the Zoom

[Contents] The 15 papers in this episode:
[00:30] 🧠 Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
[01:19] 🔍 Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
[02:03] 🏥 MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
[02:43] 🎯 OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
[03:29] 🔬 What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
[04:18] 🤖 RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models
[05:05] 🤖 ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
[05:53] 🎬 Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
[06:55] 🤝 Intelligent AI Delegation
[07:49] 📍 GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
[08:39] ⚙ BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models
[09:37] ⚡ FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
[10:14] 🔍 On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs
[11:03] ⚡ DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
[11:48] ⚡ CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

13 min

53

1 month ago

[Weekend Special] Hottest AI Papers of Week 3 of February | OPUS Selects Data Precisely; Weak Models Help Strong Models Get Stronger

[Contents] The 5 papers in this episode:
[00:52] TOP1(🔥305) | 🚀 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
[02:42] TOP2(🔥250) | 📈 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
[04:59] TOP3(🔥186) | 💻 Code2World: A GUI World Model via Renderable Code Generation
[07:19] TOP4(🔥179) | 📈 QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
[10:02] TOP5(🔥172) | ⚡ Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters

12 min

99+

1 month ago

2026.02.13 | Self-Evolving AI Struggles to Stay Safe; Unified Tokens for Large Audio Models

[Contents] The 15 papers in this episode:
[00:31] ⚠ The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
[01:24] 🎵 MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
[02:28] 🧠 Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
[03:05] 🤖 GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
[03:56] ⚖ LawThinker: A Deep Research Legal Agent in Dynamic Environments
[04:33] 🔍 Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
[05:16] 🎨 Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching
[06:01] 🚀 DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
[06:55] 🧩 Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
[07:38] 🧠 Thinking with Drafting: Optical Decompression via Logical Reconstruction
[08:17] 🗳 dVoting: Fast Voting for dLLMs
[09:09] 🤖 RISE: Self-Improving Robot Policy with Compositional World Model
[09:54] 🤖 χ₀: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies
[10:48] 🤖 EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration
[11:45] 🔍 Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation

13 min

72

1 month ago

2026.02.12 | Sparse MoE Rivals GPT-5; GENIUS Measures Fluid Intelligence

[Contents] The 15 papers in this episode:
[00:28] ⚡ Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
[01:06] 🧠 GENIUS: Generative Fluid Intelligence Evaluation Suite
[01:46] 🤖 PhyCritic: Multimodal Critic Models for Physical AI
[02:18] ⚙ ASA: Training-Free Representation Engineering for Tool-Calling Agents
[02:59] 🧠 When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
[03:38] 🧮 Towards Autonomous Mathematics Research
[04:15] 🎬 TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
[05:12] 🧠 G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design
[06:02] ⚙ FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
[06:44] 🧑 DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning
[07:28] 🚀 ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression
[08:27] 📈 Online Causal Kalman Filtering for Stable and Effective Policy Optimization
[09:24] 🧠 Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models
[10:06] 🗣 Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models
[10:47] 🔄 Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning

11 min

99+

1 month ago

2026.02.11 | OPUS Selects Data Aligned with Model Updates; Code2World Previews GUIs Through Code

[Contents] The 15 papers in this episode:
[00:33] 🚀 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
[01:17] 💻 Code2World: A GUI World Model via Renderable Code Generation
[02:05] 🤖 UI-Venus-1.5 Technical Report
[02:58] 🧠 Chain of Mindset: Reasoning with Adaptive Cognitive Modes
[03:52] 🧠 SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
[04:29] 🔬 P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
[05:24] 🤖 Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
[05:58] 🔍 Prism: Spectral-Aware Block-Sparse Attention
[06:41] ⚡ DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents
[07:23] 🎬 Olaf-World: Orienting Latent Actions for Video World Modeling
[08:18] 🎨 Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
[09:09] 🍌 Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
[09:50] 🎯 SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models
[10:37] 🤖 BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation
[11:31] 🎬 TokenTrim: Inference-Time Token Pruning for Autoregressive Long Video Generation

12 min

99+

1 month ago

2026.02.10 | ReAlign Closes the Image-Text Gap Without Training; MOVA Generates Synchronized Video and Audio

[Contents] The 15 papers in this episode:
[00:34] 🔀 Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
[01:23] 🎬 MOVA: Towards Scalable and Synchronized Video-Audio Generation
[02:03] 📈 QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
[02:51] 🤖 Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
[03:24] 🎯 Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO
[04:22] ⚡ LLaDA2.1: Speeding Up Text Diffusion via Token Editing
[05:02] 📱 GEBench: Benchmarking Image Generation Models as GUI Environments
[05:52] 🎬 Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
[06:42] 🧠 Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
[07:20] 📈 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
[08:12] 📊 LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth
[08:59] 🔍 GISA: A Benchmark for General Information-Seeking Assistant
[09:56] 🧭 WorldCompass: Reinforcement Learning for Long-Horizon World Models
[10:35] 🧪 LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
[11:20] 🧭 Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?

12 min

78

1 month ago

2026.02.09 | AI Takes Patient Histories Like a Resident Physician; True Intelligence Means Learning Rules Through Interaction

[Contents] The 15 papers in this episode:
[00:32] 🩺 Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
[01:17] 🧭 OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions
[02:03] 📈 On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
[02:47] 🎯 F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
[03:48] ⚖ MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration
[04:33] 🤖 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
[05:14] 🧠 Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training
[06:07] 🧮 Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
[06:46] 🎯 POINTS-GUI-G: GUI-Grounding Journey
[07:45] 🧠 MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
[08:29] 🧠 Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
[09:18] 🎵 AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
[09:59] ⚡ Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers
[11:02] 🧠 InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
[11:49] 🧠 PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

13 min

99+

1 month ago

[Weekend Special] Hottest AI Papers of Week 2 of February | A Staged, Unified Action Space; ERNIE 5.0 Unifies All Modalities

[Contents] The 5 papers in this episode:
[00:48] TOP1(🔥235) | 🤖 Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
[02:54] TOP2(🔥235) | 🧠 ERNIE 5.0 Technical Report
[05:14] TOP3(🔥206) | 🤖 Kimi K2.5: Visual Agentic Intelligence
[07:49] TOP4(🔥147) | 🔍 Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
[10:28] TOP5(🔥137) | 🍌 PaperBanana: Automating Academic Illustration for AI Scientists

13 min

99+

1 month ago

2026.02.06 | Removing Length Bias from RLVR; Long Takes That Keep Their Memory

[Contents] The 15 papers in this episode:
[00:29] 📊 Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR
[01:20] 🎬 Context Forcing: Consistent Autoregressive Video Generation with Long Context
[02:11] 🧠 RISE-Video: Can Video Generators Decode Implicit World Rules?
[02:57] 🔮 ProAct: Agentic Lookahead in Interactive Environments
[03:47] ⚡ Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
[04:39] 🧭 Steering LLMs via Scalable Interactive Oversight
[05:27] 🧠 Grounding and Enhancing Informativeness and Utility in Dataset Distillation
[06:13] 🧪 Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
[07:07] 🔍 Semantic Search over 9 Million Mathematical Theorems
[07:57] 🕷 Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening
[08:39] 🧪 CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
[09:30] 🤖 InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
[10:22] 🎬 Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
[11:14] 🔄 SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs
[12:20] 🔍 SAGE: Benchmarking and Improving Retrieval for Deep Research Agents

13 min

99+

1 month ago

2026.02.05 | ERNIE 5.0 Unifies Modalities; FASA Sparse Attention Saves Memory

[Contents] The 15 papers in this episode:
[00:29] 🧠 ERNIE 5.0 Technical Report
[01:11] ⚡ FASA: Frequency-aware Sparse Attention
[02:01] 📊 Training Data Efficiency in Multimodal Process Reward Models
[02:44] 🤖 WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
[03:28] ⚡ OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
[04:21] ⚡ HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
[05:02] 🤖 EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
[06:05] 🎬 Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
[06:59] 🤖 SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation
[07:44] 🔍 TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents
[08:21] 🧠 Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers
[09:12] 🤖 Rethinking the Trust Region in LLM Reinforcement Learning
[09:54] ♻ Residual Context Diffusion Language Models
[10:40] 🧱 HY3D-Bench: Generation of 3D Assets
[11:34] 🎨 AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations

12 min

99+

1 month ago

2026.02.04 | Reading Code as Images Saves Tokens; Ad-Hoc Agent Teams Cut Costs

[Contents] The 15 papers in this episode:
[00:32] 👁 CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
[01:18] 🤖 AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
[02:01] 🔍 No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs
[02:43] 🔗 daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
[03:23] 🧠 Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks
[04:06] 🎬 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
[04:56] 🤖 MARS: Modular Agent with Reflective Search for Automated AI Research
[05:41] 📊 CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
[06:25] ⚡ Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
[07:19] 🤖 SWE-World: Building Software Engineering Agents in Docker-Free Environments
[08:09] 🤖 SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
[09:14] 📊 Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation
[10:08] ⚡ Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing
[10:59] 🎯 Unified Personalized Reward Model for Vision Generation
[11:47] 🔍 WideSeek: Advancing Wide Research via Multi-Agent Scaling

12 min

99+

1 month ago