2025.02.07 | Feature flow improves model interpretability; UltraIF advances instruction following.

HuggingFace Daily AI Paper Digest

This episode covers 21 papers:
[00:24] 🔄 Analyze Feature Flow to Enhance Interpretation and Steering in Language Models
[01:03] 🤖 UltraIF: Advancing Instruction Following from the Wild
[01:40] 🎥 DynVFX: Augmenting Real Videos with Dynamic Content
[02:16] 🌐 Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
[02:51] 🏃 MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
[03:31] 🤖 Great Models Think Alike and this Undermines AI Oversight
[04:07] 📚 MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion
[04:47] 🏆 Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2
[05:25] 🤖 ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization
[06:07] 🎙 Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis
[06:51] 🎥 MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation
[07:38] 📊 ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution
[08:18] 🧠 BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation
[09:01] 🔄 Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization
[09:45] 🌀 Weak-to-Strong Diffusion with Reflection
[10:26] 🤖 PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback
[11:04] 🔧 Enhancing Code Generation for Low-Resource Languages: No Silver Bullet
[11:48] 🔓 Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
[12:22] 🤖 PILAF: Optimal Human Preference Sampling for Reward Modeling
[13:05] 🎥 Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
[13:47] 🤖 Learning Real-World Action-Video Dynamics with Heterogeneous Masked Autoregression

[Follow us] You can also find us on the following platform for more beyond the podcast — Xiaohongshu: AI速递

14 min · 99+ plays · 8 months ago

2025.02.06 | Data-centric training boosts model performance; simulated markets reproduce complex behaviors.


This episode covers 10 papers:
[00:26] 🤖 SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
[01:08] 🌐 TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets
[01:45] 🧠 Demystifying Long Chain-of-Thought Reasoning in LLMs
[02:23] 🧠 LIMO: Less is More for Reasoning
[03:15] 🧠 Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking
[04:04] 🧠 A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
[04:47] 🔓 Jailbreaking with Universal Multi-Prompts
[05:25] 🎨 LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
[06:27] 🧠 Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
[07:09] 🧠 On Teacher Hacking in Language Model Distillation

8 min · 99+ plays · 9 months ago

2025.02.05 | Inverse bridge matching distillation speeds up diffusion; VideoJAM improves motion coherence.


This episode covers 9 papers:
[00:25] ⚡ Inverse Bridge Matching Distillation
[01:02] 🎥 VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models
[01:44] 🤖 ACECODER: Acing Coder RL via Automated Test-Case Synthesis
[02:25] 🧠 QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
[03:09] 📉 Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
[03:56] 🧠 Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
[04:46] 🖼 Generating Multi-Image Synthetic Data for Text-to-Image Customization
[05:31] 🤔 Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial?
[06:13] 🎯 Concept Steerers: Leveraging K-Sparse Autoencoders for Controllable Generations

7 min · 99+ plays · 9 months ago

2025.02.04 | Direct alignment algorithms gain ground; OmniHuman advances human animation.


This episode covers 20 papers:
[00:26] 🤔 The Differences Between Direct Alignment Algorithms are a Blur
[01:07] 🤖 OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
[01:48] 💡 Process Reinforcement through Implicit Rewards
[02:36] ⚖ Preference Leakage: A Contamination Problem in LLM-as-a-judge
[03:14] 🛡 SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
[04:02] 🚀 FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation
[04:50] 🌍 AIN: The Arabic INclusive Large Multimodal Model
[05:39] 🧠 DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
[06:30] 🤔 MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
[07:19] 🛡 Almost Surely Safe Alignment of Large Language Models at Inference-Time
[08:04] 🤔 ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning
[08:49] 🤔 The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
[09:38] 🎮 Improving Transformer World Models for Data-Efficient RL
[10:22] 💡 Improved Training Technique for Latent Consistency Models
[11:07] 🧠 Scaling Embedding Layers in Language Models
[11:42] 🎨 SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
[12:24] 🤔 PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models
[13:08] 🧠 Lifelong Sequential Knowledge Editing without Model Degradation
[13:46] 🔬 Current Pathology Foundation Models are unrobust to Medical Center Differences
[14:37] 🫀 A Study on the Performance of U-Net Modifications in Retroperitoneal Tumor Segmentation

15 min · 99+ plays · 9 months ago

2025.02.03 | Test-time scaling boosts reasoning; reward-guided decoding cuts compute.


This episode covers 9 papers:
[00:26] 🧠 s1: Simple test-time scaling
[01:18] ⚡ Reward-Guided Speculative Decoding for Efficient LLM Reasoning
[02:00] 🧠 Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models
[02:41] 🛡 Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming
[03:28] 🌍 DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
[04:13] 🧠 Trading Inference-Time Compute for Adversarial Robustness
[04:54] 🧠 INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation
[05:30] 📰 Unraveling the Capabilities of Language Models in News Summarization
[06:09] 🎥 Fast Encoder-Based 3D from Casual Videos via Point Track Processing

7 min · 92 plays · 9 months ago

[Month-End Special] January's Hottest AI Papers | DeepSeek-R1 boosts LLM reasoning via reinforcement learning; breakthroughs in long-context processing


This episode covers 10 papers:
[00:40] TOP1 (🔥281) | 🧠 DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
[03:13] TOP2 (🔥271) | ⚡ MiniMax-01: Scaling Foundation Models with Lightning Attention
[05:36] TOP3 (🔥249) | 🧠 rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
[08:13] TOP4 (🔥103) | 🧠 Evolving Deeper LLM Thinking
[10:28] TOP5 (🔥99) | 📚 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
[12:51] TOP6 (🔥90) | 🚀 REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
[15:15] TOP7 (🔥90) | 🧠 Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
[17:14] TOP8 (🔥89) | 📊 The Lessons of Developing Process Reward Models in Mathematical Reasoning
[19:33] TOP9 (🔥88) | 🤔 Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
[21:35] TOP10 (🔥87) | 🧠 The GAN is dead; long live the GAN! A Modern GAN Baseline

24 min · 99+ plays · 9 months ago

2025.01.29 | RL generalizes while SFT memorizes; FP4 quantization cuts training cost while preserving accuracy.


This episode covers 8 papers:
[00:26] 🧠 SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
[01:07] ⚡ Optimizing Large Language Model Training Using FP4 Quantization
[01:47] 📚 Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
[02:30] 🧠 Open Problems in Mechanistic Interpretability
[03:14] 🌐 DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
[03:58] 🔍 Low-Rank Adapters Meet Neural Architecture Search for LLM Compression
[04:41] 🌐 IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding
[05:27] 📚 Histoires Morales: A French Dataset for Assessing Moral Alignment

6 min · 99+ plays · 9 months ago

2025.01.28 | Baichuan's omni-modal model excels; long-context processing costs drop.


This episode covers 9 papers:
[00:26] 🎙 Baichuan-Omni-1.5 Technical Report
[01:03] 📚 Qwen2.5-1M Technical Report
[01:47] 🤖 Towards General-Purpose Model-Free Reinforcement Learning
[02:25] 🗣 Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation
[03:07] 🧠 ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer
[03:52] 🧠 iFormer: Integrating ConvNet and Transformer for Mobile Application
[04:38] 🧠 Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
[05:19] 🧠 Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity
[06:09] 📊 Feasible Learning

7 min · 82 plays · 9 months ago