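HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

Trending AI papers in 10 minutes

拨号上网, 佚名
14.3K subscribers, 618 episodes, 2 weeks ago
About the podcast
In 10 minutes a day, get a quick overview of the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on Xiaoyuzhou (小宇宙) or Apple Podcasts. 🖼 An illustrated text edition is also available: search for and follow 【AI速递】 on Xiaohongshu (小红书).
Episodes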
2026.06.05 | ArcANE framework quantifies character arcs; TIDE model enables proactive insight discovery

[Contents] The 15 papers in this episode:
[00:31] 🎭 ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
[01:26] 🔍 TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration
[02:27] 🤖 AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
[03:14] 🎥 VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding
[04:09] 🤖 RobotValues: Evaluating Household Robots When Human Values Conflict
[05:01] 🌐 Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
[05:58] 🎬 LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing
[06:49] 📸 Personal AI Agent for Camera Roll VQA
[07:36] 🧠 Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
[08:27] ⚖ Complexity-Balanced Diffusion Splitting
[09:28] 🤖 Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
[10:33] 🔬 Unsupervised Skill Discovery for Agentic Data Analysis
[11:25] 🔍 LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs
[12:17] 🎯 Towards One-to-Many Temporal Grounding
[13:16] 💰 The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

[Follow us] You can also find us on the following platform for more than what the podcast covers: Xiaohongshu (小红书): AI速递
[Sponsor] OpenClaw快报: five minutes a day with OpenClaw快报 to catch up on the latest developments and industry discussion. Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43

14 min
45
2 weeks ago
2026.06.04 | Unified omnimodal framework; real-time proactive audio interaction

[Contents] The 15 papers in this episode:
[00:31] 🌌 Cosmos 3: Omnimodal World Models for Physical AI
[01:36] 🎧 Audio Interaction Model
[02:31] 🔍 Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
[03:30] 🔍 Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
[04:25] 🧭 OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs
[05:27] ⚡ Qwen-Image-Flash: Beyond Objective Design
[06:18] 🧠 M$^3$Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks
[07:13] 🎥 Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
[08:14] 🧠 ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
[09:08] 🧪 Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems
[10:15] ⚡ Streaming Communication in Multi-Agent Reasoning
[11:08] 🎯 Self-Distilled Policy Gradient
[12:13] 🧠 MemTrain: Self-Supervised Context Memory Training
[13:05] 🧩 Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching
[14:11] 🤖 MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

15 min
87
2 weeks ago
2026.06.03 | Trust regions teach small models; Humanoid-GPT tracks motion

[Contents] The 15 papers in this episode:
[00:31] 🎯 Trust Region On-Policy Distillation
[01:17] 🤖 Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
[02:07] 🧠 A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL
[03:06] 🧠 World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning
[03:57] 🏥 AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
[05:09] 🖼 Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation
[06:12] 😴 Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
[07:09] 🧩 TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
[08:07] 💬 $Ψ$-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues
[09:08] 🧩 Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
[10:05] 🎯 Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
[11:09] 📄 PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
[12:14] 🗺 PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps
[13:16] 🔍 Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
[14:05] 🎵 MERIT: Learning Disentangled Music Representations for Audio Similarity

15 min
82
2 weeks ago
2026.06.02 | Multi-agent framework generates editable figures; parameter-efficient fine-tuning supports a million personalized models

[Contents] The 15 papers in this episode:
[00:33] 🎨 Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
[01:39] 🧩 On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
[02:35] 🧪 A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks
[03:25] 🌐 K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
[04:21] ⚡ Draft-OPD: On-Policy Distillation for Speculative Draft Models
[05:10] 🎓 VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization
[06:18] 📡 X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding
[07:13] 🎬 VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
[07:59] 🤖 SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories
[08:54] 🧠 Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models
[09:51] 🧠 NITP: Next Implicit Token Prediction for LLM Pre-training
[10:50] 👀 Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?
[11:46] 🎬 LVSA: Training-Free Sparse Attention for Long Video Diffusion
[12:38] 🛑 ESPO: Early-Stopping Proximal Policy Optimization
[13:37] 🎤 StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration

14 min
84
2 weeks ago