HuggingFace Daily AI Paper Express (HuggingFace 每日AI论文速递) - Episode List

2026.04.17 | HY-World 2.0 unifies generation and reconstruction; DR³-Eval builds a reproducible research benchmark

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗www.xiaoyuzhoufm.com

[Contents] The 15 papers in this episode:
00:31 🌍 HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
01:24 🔬 DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation
02:16 🚗 RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
03:15 🤖 HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
04:04 🛡 ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
05:02 🧠 How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data
05:37 🌍 GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
06:20 🔍 UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
06:56 🧠 Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models
07:34 🛣 TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification
08:41 🤖 Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
09:25 🎬 Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction
10:13 🧭 Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
11:03 🚀 LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning
11:45 ⚡ KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

13 min
99+
3 weeks ago

2026.04.16 | Seedance 2.0 unifies multimodal generation; RationalRewards makes reward models reason

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗www.xiaoyuzhoufm.com

[Contents] The 15 papers in this episode:
00:31 🎬 Seedance 2.0: Advancing Video Generation for World Complexity
01:21 🧠 RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
02:08 🧠 SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
02:49 🧪 OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
03:39 🎮 GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
04:22 🧠 Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
05:21 🧠 From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
06:16 🎯 Target Policy Optimization
07:12 🧩 Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure
08:04 🤖 SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering
08:40 🔍 Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself
09:26 🔍 TIP: Token Importance in On-Policy Distillation
10:21 🔬 ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
11:15 🔍 UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
12:17 🤖 TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

13 min
99+
3 weeks ago

2026.04.15 | ClawGUI open-sources a full toolkit; KnowRL trims hints to boost efficiency

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗www.xiaoyuzhoufm.com

[Contents] The 15 papers in this episode:
00:33 🤖 ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
01:21 🧠 KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
02:16 🧠 Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
03:09 🤖 Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
04:01 🧠 SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
04:47 🤖 Toward Autonomous Long-Horizon Engineering for ML Research
05:33 ⚖ BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
06:17 🔍 Towards Long-horizon Agentic Multimodal Search
06:57 🌍 Lyra 2.0: Explorable Generative 3D Worlds
07:40 ⚡ Self-Adversarial One Step Generation via Condition Shifting
08:37 🤖 Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
09:20 ⚖ Many-Tier Instruction Hierarchy in LLM Agents
10:04 🚀 Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
10:52 🧠 Rethinking the Diffusion Model from a Langevin Perspective
11:44 🤖 LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

12 min
99+
3 weeks ago

2026.04.14 | Memory of penalized errors helps LLMs score higher; attention sinks fully explained

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

[Contents] The 15 papers in this episode:
[00:31] 🧠 The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
[01:20] 🔍 Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
[02:08] ⚛ QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
[02:59] 🎬 OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
[03:35] 🎨 Strips as Tokens: Artist Mesh Generation with Native UV Segmentation
[04:11] 🎬 Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
[05:13] 🔍 Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
[05:57] 🔍 CodeTracer: Towards Traceable Agent States
[06:45] 🧪 CocoaBench: Evaluating Unified Digital Agents in the Wild
[07:32] 🕸 Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
[08:17] 🤔 Introspective Diffusion Language Models
[09:12] 🧠 Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
[09:50] 🎬 Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation
[10:38] 🎵 Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
[11:33] ⚡ SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

13 min
99+
3 weeks ago

2026.04.13 | WildDet3D trained on a million images; industrial data forges the small but mighty FORGE

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

[Contents] The 15 papers in this episode:
[00:32] 🔍 WildDet3D: Scaling Promptable 3D Detection in the Wild
[01:39] 🔧 FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
[02:25] 🔍 RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
[02:58] 🔍 EXAONE 4.5 Technical Report
[03:56] 🎮 Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
[04:47] ⚡ ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
[05:40] ♻ ELT: Elastic Looped Transformers for Visual Generation
[06:25] 🔍 VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
[07:19] 🧠 Structured Causal Video Reasoning via Multi-Objective Alignment
[08:11] ⚠ Backdoor Attacks on Decentralised Post-Training
[08:58] 🧠 AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents
[09:55] ⚠ Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
[10:49] 🎭 Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video
[11:28] 🔍 ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery
[12:17] 🔍 p1: Better Prompt Optimization with Fewer Prompts

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

13 min
99+
3 weeks ago

2026.04.10 | Collective add-ons level up AI; an attention "insider" makes the numbers add up

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

[Contents] The 15 papers in this episode:
[00:33] 🧬 SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
[01:24] 🔢 When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
[02:22] 🎨 MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping
[03:15] 🤖 HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents
[04:07] 🧠 Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
[04:52] 🤖 ClawBench: Can AI Agents Complete Everyday Online Tasks?
[05:31] 📱 KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
[06:18] 🧠 Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
[07:09] 🎭 LPM 1.0: Video-based Character Performance Model
[07:58] 🧠 OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence
[08:50] 🧠 Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
[09:41] ⚡ DMax: Aggressive Parallel Decoding for dLLMs
[10:20] 🧠 Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
[11:02] 🧩 OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
[11:41] 🧠 OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

13 min
99+
4 weeks ago

2026.04.09 | The template sickness of RL agents; step-by-step image generation is more controllable

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

[Contents] The 15 papers in this episode:
[00:31] 🧠 RAGEN-2: Reasoning Collapse in Agentic RL
[01:21] 🎨 Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning
[02:00] ⚡ MARS: Enabling Autoregressive Models Multi-Token Generation
[02:51] 🌍 INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
[03:48] 🔬 SEVerA: Verified Synthesis of Self-Evolving Agents
[04:41] 🔍 TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
[05:26] ⚡ FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
[06:17] 🔄 FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching
[07:00] 🧠 Neural Computers
[07:37] 🎯 Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
[08:22] 💡 Learning to Hint for Reinforcement Learning
[09:11] 🧠 Fast Spatial Memory with Elastic Test-Time Training
[09:44] 🎬 MoRight: Motion Control Done Right
[10:21] 🌐 Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
[11:02] 📊 Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

12 min
99+
1 month ago

2026.04.08 | Video-MME-v2's hellish question bank grills models; Claw-Eval's end-to-end auditing safeguards trustworthy agents

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

[Contents] The 15 papers in this episode:
[00:34] 🎯 Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
[01:19] 🔬 Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
[02:06] 🤖 Learning to Retrieve from Agent Trajectories
[02:53] 🧪 ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
[03:42] 👗 Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
[04:31] ⏱ Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
[05:23] 🧠 ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
[06:03] 🔍 Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
[06:52] 🔍 How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
[07:33] 🚀 MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
[08:11] 🛠 DARE: Diffusion Large Language Models Alignment and Reinforcement Executor
[08:54] 🧠 In-Place Test-Time Training
[09:39] 🎬 Watch Before You Answer: Learning from Visually Grounded Post-Training
[10:13] 🔍 Demystifying When Pruning Works via Representation Hierarchies
[10:59] 🤖 Action Images: End-to-End Policy Learning via Multiview Video Generation

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

12 min
99+
1 month ago

2026.04.07 | A unified world-model framework; a small-model, big-data breakthrough

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

[Contents] The 15 papers in this episode:
[00:33] 🧠 OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
[01:26] 📊 MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
[02:07] 🧠 TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
[02:58] 🎥 AURA: Always-On Understanding and Real-Time Assistance via Video Streams
[03:42] 🔍 LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models
[04:24] 🎯 SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
[05:07] 📈 Adam's Law: Textual Frequency Law on Large Language Models
[05:56] 🗂 FileGram: Grounding Agent Personalization in File-System Behavioral Traces
[06:45] 🧪 ClawArena: Benchmarking AI Agents in Evolving Information Environments
[07:38] 🧠 LightThinker++: From Reasoning Compression to Memory Management
[08:12] 🔄 Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
[08:50] 🧠 SkillX: Automatically Constructing Skill Knowledge Bases for Agents
[09:39] 🤖 Self-Execution Simulation Improves Coding Models
[10:22] 🧠 Vero: An Open RL Recipe for General Visual Reasoning
[11:12] 🛡 Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

12 min
99+
1 month ago

2026.04.06 | Self-distilled RL plugs information leakage; a minimalist streaming-video baseline overtakes memory-based methods

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute: a weekly recap of the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

[Contents] The 15 papers in this episode:
[00:29] 🧠 Self-Distilled RLVR
[01:18] 🎯 A Simple Baseline for Streaming Video Understanding
[02:07] 🔍 Token Warping Helps MLLMs Look from Nearby Viewpoints
[03:06] 🔍 Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?
[03:57] 📈 Test-Time Scaling Makes Overtraining Compute-Optimal
[04:56] 🧠 Communicating about Space: Language-Mediated Spatial Integration Across Partial Views
[05:39] 🏆 GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning
[06:27] 🤖 InCoder-32B-Thinking: Industrial Code World Model for Thinking
[07:22] 🛡 AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks
[08:10] ⚠ AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents
[08:52] ⚡ Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression
[09:39] 🔍 VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors
[10:30] 📊 Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation
[11:16] 🎬 Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
[12:04] 🤝 CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning

[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

13 min
99+
1 month ago
