HuggingFace Daily AI Paper Digest - Episode List

[Month-End Special] January's Hottest AI Papers | DeepSeek-R1 boosts LLM reasoning with reinforcement learning; a breakthrough in long-text processing

The 10 papers in this episode:
[00:40] TOP1 (🔥281) | 🧠 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
[03:13] TOP2 (🔥271) | ⚡ MiniMax-01: Scaling Foundation Models with Lightning Attention
[05:36] TOP3 (🔥249) | 🧠 rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
[08:13] TOP4 (🔥103) | 🧠 Evolving Deeper LLM Thinking
[10:28] TOP5 (🔥99) | 📚 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
[12:51] TOP6 (🔥90) | 🚀 REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
[15:15] TOP7 (🔥90) | 🧠 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
[17:14] TOP8 (🔥89) | 📊 The Lessons of Developing Process Reward Models in Mathematical Reasoning
[19:33] TOP9 (🔥88) | 🤔 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
[21:35] TOP10 (🔥87) | 🧠 The GAN is dead; long live the GAN! A Modern GAN Baseline

[Follow us] You can also find us on the following platforms for more beyond the podcast:
Xiaohongshu: AI速递
View this episode's transcript on Xiaoyuzhou

24 minutes

[Weekend Special] Hottest AI Papers of January, Week 4 | Reinforcement learning outperforms supervised fine-tuning; HLE challenges LLM capabilities

The 5 papers in this episode:
[00:35] TOP1 (🔥53) | 🧠 SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
[03:02] TOP2 (🔥48) | 🧠 Humanity's Last Exam
[05:21] TOP3 (🔥47) | 🛡 GuardReasoner: Towards Reasoning-based LLM Safeguards
[07:44] TOP4 (🔥45) | 🎙 Baichuan-Omni-1.5 Technical Report
[10:07] TOP5 (🔥42) | 📚 Qwen2.5-1M Technical Report

12 minutes

2025.01.31 | GuardReasoner improves LLM safety; MedXpertQA challenges medical AI reasoning

The 8 papers in this episode:
[00:25] 🛡 GuardReasoner: Towards Reasoning-based LLM Safeguards
[01:04] 🩺 MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
[01:58] 🧠 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
[02:40] 🌐 Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch
[03:20] 🌍 PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
[04:09] 🤖 WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training
[05:04] 🛡 o3-mini vs DeepSeek-R1: Which One is Safer?
[05:41] 🤔 Large Language Models Think Too Fast To Explore Effectively

6 minutes

2025.01.30 | Critique improves reasoning; AI energy consumption draws attention

The 5 papers in this episode:
[00:25] 🧠 Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
[01:10] 🌍 Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts
[01:50] 🌟 Atla Selene Mini: A General Purpose Evaluation Model
[02:27] ⚠ Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation
[03:06] 🦠 Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation

4 minutes

2025.01.29 | RL generalizes better while SFT stabilizes output; FP4 quantization cuts cost while preserving accuracy

The 8 papers in this episode:
[00:26] 🧠 SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
[01:07] ⚡ Optimizing Large Language Model Training Using FP4 Quantization
[01:47] 📚 Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
[02:30] 🧠 Open Problems in Mechanistic Interpretability
[03:14] 🌐 DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
[03:58] 🔍 Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
[04:41] 🌐 IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
[05:27] 📚 Histoires Morales: A French Dataset for Assessing Moral Alignment

6 minutes

2025.01.28 | Baichuan's multimodal model performs strongly; long-context processing gets cheaper

The 9 papers in this episode:
[00:26] 🎙 Baichuan-Omni-1.5 Technical Report
[01:03] 📚 Qwen2.5-1M Technical Report
[01:47] 🤖 Towards General-Purpose Model-Free Reinforcement Learning
[02:25] 🗣 Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
[03:07] 🧠 ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
[03:52] 🧠 iFormer: Integrating ConvNet and Transformer for Mobile Application
[04:38] 🧠 Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
[05:19] 🧠 Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity
[06:09] 📊 Feasible Learning

7 minutes

2025.01.27 | Benchmarks grow more complex; redundancy remains to be solved

The 9 papers in this episode:
[00:25] 🧠 Humanity's Last Exam
[01:06] 📊 Redundancy Principles for MLLMs Benchmarks
[01:45] 🔗 Chain-of-Retrieval Augmented Generation
[02:24] 📊 RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques
[03:12] 👤 Relightable Full-Body Gaussian Codec Avatars
[03:57] 📷 AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
[04:40] 🌀 Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
[05:20] 🌐 Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
[06:01] 🌍 GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing

7 minutes

[Weekend Special] Hottest AI Papers of January, Week 3 | DeepSeek-R1 boosts LLM reasoning with reinforcement learning; evolutionary search improves complex task solving

The 5 papers in this episode:
[00:37] TOP1 (🔥167) | 🧠 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
[02:59] TOP2 (🔥95) | 🧠 Evolving Deeper LLM Thinking
[05:07] TOP3 (🔥73) | 🤔 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
[07:15] TOP4 (🔥73) | 🎥 MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
[09:29] TOP5 (🔥64) | 👁 VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

12 minutes

2025.01.24 | SRMT improves multi-agent cooperation; VideoReward improves video generation quality

The 15 papers in this episode:
[00:26] 🧠 SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
[01:05] 🎥 Improving Video Generation with Human Feedback
[01:40] ⚡ Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models
[02:20] 🖼 Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
[02:55] 🖼 IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
[03:32] 📚 Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
[04:14] 🎥 DiffuEraser: A Diffusion Model for Video Inpainting
[04:50] 🎥 Temporal Preference Optimization for Long-Form Video Understanding
[05:29] 🎨 One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
[06:07] 🎥 EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion
[06:42] 🧠 Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
[07:17] 🧠 Debate Helps Weak-to-Strong Generalization
[07:53] 🤔 Evolution and The Knightian Blindspot of Machine Learning
[08:30] 🧪 Hallucinations Can Improve Large Language Models in Drug Discovery
[09:10] 🌀 GSTAR: Gaussian Surface Tracking and Reconstruction

10 minutes

2025.01.23 | DeepSeek-R1 boosts reasoning with reinforcement learning; a multi-agent framework automates virtual filmmaking

The 9 papers in this episode:
[00:24] 🧠 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
[01:07] 🎬 FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces
[01:48] 🔄 Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback
[02:25] 👁 VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
[03:03] 🚀 Kimi k1.5: Scaling Reinforcement Learning with LLMs
[03:40] 🧠 Autonomy-of-Experts Models
[04:18] 🏆 Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament
[05:01] ✂ O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
[05:34] 🤖 IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems

6 minutes

2025.01.22 | Agent-R improves language models' real-time error correction; MMVU evaluates expert-level multi-discipline video understanding

The 16 papers in this episode:
[00:24] 🤔 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
[00:59] 🎥 MMVU: Measuring Expert-Level Multi-Discipline Video Understanding
[01:35] ⚖ Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
[02:17] 🤖 UI-TARS: Pioneering Automated GUI Interaction with Native Agents
[02:55] 🤖 Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks
[03:31] 🎨 TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space
[04:14] 🏆 InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
[04:57] 🎥 Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
[05:39] 🤖 Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments
[06:18] 🧠 Reasoning Language Models: A Blueprint
[06:58] 🎨 Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation
[07:40] 🧠 Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
[08:21] 🎥 EMO2: End-Effector Guided Audio-Driven Avatar Video Generation
[08:55] 🎥 Go-with-the-Flow: Motion-Controllable Video Diffusion Models Using Real-Time Warped Noise
[09:32] 🌍 GPS as a Control Signal for Image Generation
[10:11] ⚠ MSTS: A Multimodal Safety Test Suite for Vision-Language Models

11 minutes