HuggingFace Daily AI Papers Digest - Episode List

4 days ago

2026.06.19 | RATs lets robots learn skills through autonomous play; Moebius achieves 10B-level inpainting performance with 0.2B parameters

[Sponsor] OpenClaw Briefing (OpenClaw快报): five minutes a day on the latest developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43
[Contents] The 15 papers in this episode:
[00:33] 🤖 Playful Agentic Robot Learning
[01:22] 🎨 Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
[02:10] 🧠 S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence
[03:10] 📊 Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents
[04:05] 🎨 FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining
[05:06] 🪄 JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising
[05:58] 🤖 ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
[06:57] 👁 Thinking with Visual Grounding
[07:41] 🔍 Understanding the Behaviors of Environment-aware Information Retrieval
[08:37] 🤖 FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines
[09:28] 🧊 Adaptive Volumetric Mechanical Property Fields Invariant to Resolution
[10:23] 📸 DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis
[11:16] 🌍 Holo-World: Unified Camera, Object and Weather Control for Video World Model
[12:12] 🎨 ImageWAM: Do World Action Models Really Need Video Generation, or Just Image Editing?
[13:07] 🎯 Selective Synergistic Learning for Video Object-Centric Learning
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

5 days ago

2026.06.18 | Memory is a weak spot for multimodal large models; language instructions drive 3D trajectory prediction

[Contents] The 15 papers in this episode:
[00:32] 🧠 Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games
[01:29] 🎯 MolmoMotion: Forecasting Point Trajectories in 3D with Language Instruction
[02:15] 🌍 Kairos: A Native World Model Stack for Physical AI
[03:05] 🛠 Guava: An Effective and Universal Harness for Embodied Manipulation
[03:58] ⚡ EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts
[04:45] 🎯 The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL
[05:51] 🔍 SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior
[06:36] 🤖 From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning
[07:27] 🧠 Reinforcing Dual-Path Reasoning in Spatial Vision Language Models
[08:25] 🎯 Trust the Right Teacher: Quality-Aware Self-Distillation for GUI Grounding
[09:25] 👁 Native Active Perception as Reasoning for Omni-Modal Understanding
[10:15] 🐱 MaineCoon: Pursuing A Real-Time Audio-Visual Social World Model
[11:08] 🖌 Sumi: Open Uniform Diffusion Language Model from Scratch
[11:51] 🎯 STARE: Surprisal-Guided Token-Level Advantage Reweighting for Policy Entropy Stability
[12:46] 🌍 Beyond Alignment: Value Diversity as a Collective Property in Multicultural Agent Systems

6 days ago

2026.06.17 | A gradient perspective on RLVR collapse; the challenge of end-to-end game generation in a game engine

[Contents] The 9 papers in this episode:
[00:32] 🏆 A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization
[01:28] 🎮 GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?
[02:20] 🏥 TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs
[03:20] 🤖 LectūraAgents: A Multi-Agent Framework for Adaptive Personalized AI-Assisted Learning and Embodied Teaching
[04:02] 🌍 ActWorld: From Explorable to Interactive World Model via Action-Aware Memory
[04:57] 🖼 Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification
[06:01] 🔬 Aligning Quantum Operators with Large Language Models
[06:54] 🔄 Looped World Models
[07:50] 🌐 Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

9 minutes

2026.06.15 | Precise camera control for AI video; fine-grained optimization of agent reasoning

[Contents] The 15 papers in this episode:
[00:30] 🎥 OmniDirector: General Multi-Shot Camera Cloning without Cross-Paired Data
[01:25] 🤖 APPO: Agentic Procedural Policy Optimization
[02:23] 🧠 Memory is Reconstructed, Not Retrieved: Graph Memory for LLM Agents
[03:16] 🤖 From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI
[04:12] 🎼 Orchestra-o1: Omnimodal Agent Orchestration
[05:12] 🔧 HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry
[06:16] 🎥 Rethinking RAG in Long Videos: What to Retrieve and How to Use It?
[07:19] 🎬 OmniVideo-100K: A Dataset for Audio-Visual Reasoning through Structured Scripts and Evidence Chains
[08:36] 🤖 From AGI to ASI
[09:34] 🧠 Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO
[10:32] 🛡 RedAct: Redacting Agent Capability Traces for Procedural Skill Protection
[11:21] 👁 LLM Agents Can See Code Repositories
[12:14] 🩺 Measuring Epistemic Resilience of LLMs Under Misleading Medical Context
[13:15] 🔄 Skip a Layer or Loop It? Learning Program-of-Layers in LLMs
[14:12] 🎨 RepFusion: Leveraging Multimodal Priors for Denoising in Representation Space

15 minutes

[Weekend Special] Hottest AI papers of June, week 3 | ALE test reveals the real capabilities of AI agents, with a pass rate of only 26.2%

[Contents] The 5 papers in this episode:
[00:43] TOP1(🔥463) | 🌍 ABot-Earth 0.5: Generative 3D Earth Model
[02:51] TOP2(🔥331) | 📊 Agents' Last Exam
[05:41] TOP3(🔥182) | 🎥 Kwai Keye-VL-2.0 Technical Report
[07:58] TOP4(🔥121) | 🧠 EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
[09:52] TOP5(🔥117) | 🧠 Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

12 minutes

2026.06.12 | EvoArena tracks memory evolution, testing whether AI can discern changes in preferences

[Contents] The 15 papers in this episode:
[00:31] 🧠 EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments
[01:32] 🧠 SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning
[02:33] 🔍 FORT-Searcher: Synthesizing Shortcut-Resistant Search Tasks for Training Deep Search Agents
[03:31] 🛠 Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?
[04:27] 🔄 InterleaveThinker: Reinforcing Agentic Interleaved Generation
[05:14] 🧮 MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
[06:11] 🧠 MiniMax Sparse Attention
[06:50] 🖥 WeaveBench: A Long-Horizon, Real-World Benchmark for Computer-Use Agents with Hybrid Interfaces
[07:50] 🔬 LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
[08:55] 🦾 HYDRA-X: Native Unified Multimodal Models with Holistic Visual Tokenizers
[09:45] 🧩 N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization
[10:44] 🔬 EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery
[11:38] 🏃 VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
[12:35] 🔍 Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback
[13:26] 🔀 Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning

2026.06.11 | Manifold power iteration improves routers; hypothesis-tree refinement drives autonomous research

[Contents] The 15 papers in this episode:
[00:31] 🔀 Redesign Mixture-of-Experts Routers with Manifold Power Iteration
[01:16] 🌳 Toward Generalist Autonomous Research via Hypothesis-Tree Refinement
[02:06] 🧪 Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks
[03:12] 🌐 Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application
[04:10] 🎯 Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions
[05:13] 📊 TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders
[05:57] 🔄 Reason, Then Re-reason: Cross-view Revisiting Improves Spatial Reasoning
[06:45] 🧩 DeNovoSWE: Scaling Long-Horizon Environments for Generating Entire Repositories from Scratch
[07:42] 🤖 World Pilot: Steering Vision-Language-Action Models with World-Action Priors
[08:45] 🧠 On Subquadratic Architectures: From Applications to Principles
[09:31] 🧩 ComBench: A Benchmark for Rigorous Proof Reasoning and Constructive Realization in Olympiad-Level Combinatorics
[10:24] 🔓 Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code
[11:25] 🎥 InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning
[12:18] ⚡ Breaking Entropy Bounds: Accelerating RL Training via MTP with Rejection Sampling
[13:14] 🔍 ICA Lens: Interpreting Language Models Without Training Another Dictionary

2026.06.10 | Kwai Keye-VL-2.0 makes a breakthrough in long-video understanding; ABot-Earth generates 3D models in just ten minutes

[Contents] The 15 papers in this episode:
[00:32] 🎥 Kwai Keye-VL-2.0 Technical Report
[01:24] 🌍 ABot-Earth 0.5: Generative 3D Earth Model
[02:13] 🤖 Role-Agent: Bootstrapping LLM Agents via Dual-Role Evolution
[03:08] 🔧 Retrospective Harness Optimization: Improving LLM Agents via Self-Preference over Trajectory Rollouts
[04:06] 🐝 SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research
[05:02] 🎥 MemDreamer: Decoupling Perception and Reasoning for Long Video Understanding via Hierarchical Graph Memory and Agentic Retrieval Mechanism
[06:05] 📊 Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories
[07:01] 🎭 SCAIL-2: Unifying Controlled Character Animation with End-to-end In-Context Conditioning
[07:58] 🔀 Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
[09:05] 🏅 WorldOlympiad: Can Your World Model Survive a Triathlon?
[10:00] 🎯 Rethinking the Divergence Regularization in LLM RL
[10:56] 💋 Lip Forcing: Few-Step Autoregressive Diffusion for Real-time Lip Synchronization
[11:57] 🤖 EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents
[12:52] 🧠 One Token per Multimodal Evidence: Latent Memory for Resource-Constrained QA
[13:49] 🤖 Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

15 minutes

2026.06.09 | Weaknesses in code exploration exposed; the geometry of on-policy distillation revealed

[Contents] The 15 papers in this episode:
[00:32] 🔍 SWE-Explore: Benchmarking How Coding Agents Explore Repositories
[01:34] 🔍 On the Geometry of On-Policy Distillation
[02:26] 🧠 Latent Spatial Memory for Video World Models
[03:20] 🎬 CoVEBench: Can Video Editing Models Handle Complex Instructions?
[04:20] 🧠 LatentSkill: From In-Context Textual Skills to In-Weight Latent Skills for LLM Agents
[05:10] ⚡ FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention
[06:06] 🌍 SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks
[07:10] 🧠 Human Psychometric Questionnaires Mischaracterize LLM Behavior
[08:19] 🧠 Echo-Memory: A Controlled Study of Memory in Action World Models
[09:08] 🎮 OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics
[10:03] 🤖 AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing
[11:08] 🎥 SwiftVR: Real-Time One-Step Generative Video Restoration
[12:12] 🧠 Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses
[13:02] 🎬 OmniCap-IF: Benchmarking and Improving Instruction Following Abilities for Omni-Video Captioning
[14:14] 🎯 Skill-RM: Unifying Heterogeneous Evaluation Criteria via Agent Skill

15 minutes

2026.06.08 | The unembedding matrix is an implicit feature lens; EmbedFilter's linear filtering improves text embeddings

[Contents] The 15 papers in this episode:
[00:33] 🔍 Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings
[01:25] 🤝 SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations
[02:15] 🎧 MMAE: A Massive Multitask Audio Editing Benchmark
[03:15] 🧬 GENEB: Why Genomic Models Are Hard to Compare
[04:11] 🌍 AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization
[04:55] 🎨 Direct 3D-Aware Object Insertion via Decomposed Visual Proxies
[05:45] 🤖 Robots Need More than VLA and World Models
[06:42] 🛠 When Tools Fail: Benchmarking Dynamic Replanning and Anomaly Recovery in LLM Agents
[07:36] 🧠 SubtleMemory: A Benchmark for Fine-Grained Relational Memory Discrimination in Long-Horizon AI Agents
[08:30] 🧠 OpenSkill: Open-World Self-Evolution for LLM Agents
[09:22] 🌍 UniSHARP: Universal Sharp Monocular View Synthesis
[10:10] 🏃 LIMMT: Less is More for Motion Tracking
[10:57] 👁 Watch, Remember, Reason: Human-View Video Understanding with MLLMs
[11:56] 🎙 dots.tts Technical Report
[12:52] 🧠 Thinking with Imagination: Agentic Visual Spatial Reasoning with World Simulators

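[Weekend Special] Hottest AI papers of June, week 2 | Multi-agent collaboration generates editable scientific figures; PEFT enables a million personalized models at trillion-parameter scale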

[Contents] The 5 papers in this episode:
[00:50] TOP1(🔥190) | 🎨 Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
[03:13] TOP2(🔥175) | 🧩 On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
[06:08] TOP3(🔥140) | 🚀 Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding
[08:35] TOP4(🔥108) | 🧠 COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation
[10:55] TOP5(🔥102) | 🔍 GrepSeek: Training Search Agents for Direct Corpus Interaction

13 minutes