Podcast: HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest) - EarsOnMe

About the podcast

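Ten minutes a day to catch up on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome.
📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou (小宇宙) or Apple Podcasts.
🖼 An illustrated text version is also available: search for and follow 【AI速递】 on Xiaohongshu (小红书).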

Creator

拨号上网 (1 podcast)

Episodes

2026.06.05 | ArcANE framework quantifies character arcs; TIDE model enables proactive insight discovery

HuggingFace 每日AI论文速递

[Contents] The 15 papers in this episode:
[00:31] 🎭 ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
[01:26] 🔍 TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration
[02:27] 🤖 AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
[03:14] 🎥 VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding
[04:09] 🤖 RobotValues: Evaluating Household Robots When Human Values Conflict
[05:01] 🌐 Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
[05:58] 🎬 LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing
[06:49] 📸 Personal AI Agent for Camera Roll VQA
[07:36] 🧠 Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
[08:27] ⚖ Complexity-Balanced Diffusion Splitting
[09:28] 🤖 Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
[10:33] 🔬 Unsupervised Skill Discovery for Agentic Data Analysis
[11:25] 🔍 LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs
[12:17] 🎯 Towards One-to-Many Temporal Grounding
[13:16] 💰 The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

[Follow us] You can also find us on the following platforms for more than what the podcast covers. Xiaohongshu (小红书): AI速递

[Sponsor] OpenClaw快报 (OpenClaw Briefing): five minutes a day on the latest OpenClaw developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43

14 min

45

2 weeks ago

2026.06.04 | A unified omnimodal framework; real-time proactive audio interaction

[Contents] The 15 papers in this episode:
[00:31] 🌌 Cosmos 3: Omnimodal World Models for Physical AI
[01:36] 🎧 Audio Interaction Model
[02:31] 🔍 Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
[03:30] 🔍 Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
[04:25] 🧭 OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs
[05:27] ⚡ Qwen-Image-Flash: Beyond Objective Design
[06:18] 🧠 M$^3$Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks
[07:13] 🎥 Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
[08:14] 🧠 ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
[09:08] 🧪 Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems
[10:15] ⚡ Streaming Communication in Multi-Agent Reasoning
[11:08] 🎯 Self-Distilled Policy Gradient
[12:13] 🧠 MemTrain: Self-Supervised Context Memory Training
[13:05] 🧩 Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching
[14:11] 🤖 MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

15 min

87

2 weeks ago

2026.06.03 | Trust regions teach small models; Humanoid-GPT tracks motion

[Contents] The 15 papers in this episode:
[00:31] 🎯 Trust Region On-Policy Distillation
[01:17] 🤖 Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
[02:07] 🧠 A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL
[03:06] 🧠 World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning
[03:57] 🏥 AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
[05:09] 🖼 Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation
[06:12] 😴 Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
[07:09] 🧩 TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
[08:07] 💬 $Ψ$-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues
[09:08] 🧩 Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
[10:05] 🎯 Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
[11:09] 📄 PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
[12:14] 🗺 PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps
[13:16] 🔍 Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
[14:05] 🎵 MERIT: Learning Disentangled Music Representations for Audio Similarity

15 min

82

2 weeks ago

2026.06.02 | Multi-agent framework generates editable figures; parameter-efficient fine-tuning supports a million personalized models

[Contents] The 15 papers in this episode:
[00:33] 🎨 Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
[01:39] 🧩 On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
[02:35] 🧪 A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks
[03:25] 🌐 K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
[04:21] ⚡ Draft-OPD: On-Policy Distillation for Speculative Draft Models
[05:10] 🎓 VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization
[06:18] 📡 X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding
[07:13] 🎬 VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
[07:59] 🤖 SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories
[08:54] 🧠 Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models
[09:51] 🧠 NITP: Next Implicit Token Prediction for LLM Pre-training
[10:50] 👀 Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?
[11:46] 🎬 LVSA: Training-Free Sparse Attention for Long Video Diffusion
[12:38] 🛑 ESPO: Early-Stopping Proximal Policy Optimization
[13:37] 🎤 StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration

14 min

84

2 weeks ago