HuggingFace Daily AI Paper Express - Episode List

2025.11.18 | RL Wins Olympiad Gold; Uni-MoE 2.0 Leaps Ahead Across All Modalities

The 14 papers in this episode:
[00:17] 🏅 P1: Mastering Physics Olympiads with Reinforcement Learning
[00:56] 🌐 Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
[01:42] 🧩 Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
[02:22] 🧠 TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
[03:08] 🚀 GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning
[03:49] 🧩 PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
[04:28] 🌌 UFO³: Weaving the Digital Agent Galaxy
[04:59] 🍲 Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
[05:38] 🌍 OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation
[06:19] 🔄 Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?
[06:51] 🚀 MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
[07:36] 🎯 Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models
[08:19] 🧠 WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance
[09:10] 🧬 Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

10 minutes

2025.11.17 | Denoising RoPE Rescues Long Contexts; AI Rapidly Screens Ionic Liquids

The 13 papers in this episode:
[00:24] 🧹 DoPE: Denoising Rotary Position Embedding
[00:58] 🧪 AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery
[01:44] 🖼 UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation
[02:20] 🚀 Virtual Width Networks
[02:56] ⚡ LiteAttention: A Temporal Sparse Attention for Diffusion Transformers
[03:32] 🌐 Simulating the Visual World with Artificial Intelligence: A Roadmap
[04:12] 📐 GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
[05:00] 🧏 HI-TransPA: Hearing Impairments Translation Personal Assistant
[05:35] 🚀 MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
[06:38] 🎭 EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation
[07:18] 🧭 SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
[07:55] 📊 Workload Schedulers -- Genesis, Algorithms and Differences
[08:51] 🚗 CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios

10 minutes

[Weekend Special] Hottest AI Papers, Week 3 of November | An Open Recipe for 3D Game Agents; Few-Shot Precise Control for Desktop AI

The 5 papers in this episode:
[00:38] TOP1(🔥135) | 🌍 Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
[02:47] TOP2(🔥97) | 🖥 Grounding Computer Use Agents on Human Demonstrations
[04:44] TOP3(🔥89) | 🧠 Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
[06:33] TOP4(🔥84) | 🧠 HaluMem: Evaluating Hallucinations in Memory Systems of Agents
[08:56] TOP5(🔥67) | 🧩 IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction

11 minutes

2025.11.14 | UniVA, a Four-in-One Open-Source Video Generalist; Depth Anything 3 Handles 3D with a Single ViT

The 4 papers in this episode:
[00:24] 🎬 UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist
[00:59] 🌐 Depth Anything 3: Recovering the Visual Space from Any Views
[01:50] 🔍 AlphaResearch: Accelerating New Algorithm Discovery with Language Models
[02:21] 🔍 MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples

3 minutes

2025.11.13 | Genshin Impact Data Forges a 7B Generalist AI; Training-Free Trajectories Become a Video Remote Control

The 9 papers in this episode:
[00:19] 🌍 Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
[00:54] 🎬 Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising
[01:31] ⚡ TiDAR: Think in Diffusion, Talk in Autoregression
[02:15] 🔄 LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
[02:51] 🤖 WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
[03:33] 🖥 WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
[04:19] 🎯 Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance
[04:55] 🤖 Agentic Refactoring: An Empirical Study of AI Coding Agents
[05:31] 🛡 Stemming Hallucination in Language Models Using a Licensing Oracle

6 minutes

2025.11.12 | A 1.5B Model Overtakes a 671B Model; Multi-Agent Quality Checks for Chatbots

The 9 papers in this episode:
[00:24] 🧠 Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
[00:59] 🤝 Adaptive Multi-Agent Response Refinement in Conversational Systems
[01:30] 🧩 Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora
[02:17] ⚡ KLASS: KL-Guided Fast Inference in Masked Diffusion Models
[02:53] 🖥 Grounding Computer Use Agents on Human Demonstrations
[03:37] 🎥 VideoSSR: Video Self-Supervised Reinforcement Learning
[04:19] 🚪 The Path Not Taken: RLVR Provably Learns Off the Principals
[05:14] 🔗 BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives
[05:56] 🤹 Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

6 minutes

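2025.11.11 | Small Windows with Frequent Summaries Advance Deep Research; Cast a Wide Net First, Then Tackle Hard Problems for Competitive Coding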

The 13 papers in this episode:
[00:25] 🧩 IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
[01:16] 🏆 DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
[02:03] 🔬 The Station: An Open-World Environment for AI-Driven Discovery
[02:43] 🚀 RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
[03:15] 🧠 SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
[03:53] 🧭 Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
[04:30] 🔍 Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
[05:10] 🎬 MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
[05:50] 🎨 MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
[06:57] 🔄 RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
[07:36] 🤖 Robot Learning from a Physical World Model
[08:21] 🛠 NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling
[08:52] 🚀 SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

9 minutes

2025.11.10 | DeepEyesV2, a Small Model That Writes Code While Viewing Images; Data Alone Gives AI Spatial Vision

The 7 papers in this episode:
[00:21] 🧠 DeepEyesV2: Toward Agentic Multimodal Model
[01:13] 🧭 Visual Spatial Tuning
[01:54] 🦹 Too Good to be Bad: On the Failure of LLMs to Role-Play Villains
[02:27] 🧠 Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
[03:13] 🪡 Jailbreaking in the Haystack
[03:48] 🎯 CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?
[04:23] 🏃 Dense Motion Captioning

5 minutes

[Weekend Special] Hottest AI Papers, Week 4 of October | Internal Probability Plus Vote Pruning: RPC Saves Samples and Improves Accuracy

The 5 papers in this episode:
[00:29] TOP1(🔥135) | 🧠 A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning
[03:02] TOP2(🔥104) | 🚀 Efficient Long-context Language Model Training by Core Attention Disaggregation
[05:29] TOP3(🔥100) | 🧠 LightMem: Lightweight and Efficient Memory-Augmented Generation
[07:33] TOP4(🔥90) | 🧠 Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning
[10:18] TOP5(🔥79) | 🤖 DeepAnalyze: Agentic Large Language Models for Autonomous Data Science

13 minutes

2025.10.27 | DeepAgent: Single-Pass Reasoning Plus ToolPO; Video-as-Prompt DiT Controls a Hundred Semantics Instantly

The 15 papers in this episode:
[00:27] 🧠 DeepAgent: A General Reasoning Agent with Scalable Toolsets
[01:01] 🎬 Video-As-Prompt: Unified Semantic Control for Video Generation
[01:35] 🔧 From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model
[02:14] 🧩 Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
[02:51] 🧠 A Definition of AGI
[03:23] 🧩 Sparser Block-Sparse Attention via Token Permutation
[04:14] 🧭 UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
[04:57] 🧠 Reasoning with Sampling: Your Base Model is Smarter Than You Think
[05:30] 🧠 RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
[06:08] 📐 Visual Diffusion Models are Geometric Solvers
[06:56] 🌍 WorldGrow: Generating Infinite 3D World
[07:35] 🎬 RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
[08:14] 🔗 Model Merging with Functional Dual Anchors
[08:49] 🧭 Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
[09:34] 📊 Document Understanding, Measurement, and Manipulation Using Category Theory

10 minutes

[Weekend Special] Hottest AI Papers, Week 2 of November | Video Generation as Reasoning; SVG Sketches Become Code

The 5 papers in this episode:
[00:31] TOP1(🔥137) | 🎬 Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
[02:43] TOP2(🔥95) | 🖼 VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
[05:12] TOP3(🔥90) | 🚀 Diffusion Language Models are Super Data Learners
[07:18] TOP4(🔥88) | 👁 Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization
[09:24] TOP5(🔥79) | 🧠 Every Activation Boosted: Scaling General Reasoner to 1 Trillion Open Language Foundation

12 minutes