HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest) - Episode List

2026.04.13 | WildDet3D trained on a million images; FORGE, a compact powerhouse forged from industrial data

[Sponsor] Make your commute count with AI每周谈 (AI Weekly Talk): every week it recaps the previous week's biggest AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

[Contents] This episode covers the following 15 papers:
[00:32] 🔍 WildDet3D: Scaling Promptable 3D Detection in the Wild
[01:39] 🔧 FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
[02:25] 🔍 RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details
[02:58] 🔍 EXAONE 4.5 Technical Report
[03:56] 🎮 Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
[04:47] ⚡ ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
[05:40] ♻ ELT: Elastic Looped Transformers for Visual Generation
[06:25] 🔍 VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
[07:19] 🧠 Structured Causal Video Reasoning via Multi-Objective Alignment
[08:11] ⚠ Backdoor Attacks on Decentralised Post-Training
[08:58] 🧠 AgentSwing: Adaptive Parallel Context Management Routing for Long-Horizon Web Agents
[09:55] ⚠ Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
[10:49] 🎭 Cross-Modal Emotion Transfer for Emotion Editing in Talking Face Video
[11:28] 🔍 ScheMatiQ: From Research Question to Structured Data through Interactive Schema Discovery
[12:17] 🔍 p1: Better Prompt Optimization with Fewer Prompts

[Follow Us] You can also find us on the following platform for more beyond the podcast: Xiaohongshu: AI速递

13 min
99+
2 months ago
2026.04.10 | Collective add-ons level up AI; an attention "insider" makes the numbers count

[Contents] This episode covers the following 15 papers:
[00:33] 🧬 SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
[01:24] 🔢 When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
[02:22] 🎨 MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping
[03:15] 🤖 HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents
[04:07] 🧠 Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
[04:52] 🤖 ClawBench: Can AI Agents Complete Everyday Online Tasks?
[05:31] 📱 KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
[06:18] 🧠 Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
[07:09] 🎭 LPM 1.0: Video-based Character Performance Model
[07:58] 🧠 OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence
[08:50] 🧠 Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
[09:41] ⚡ DMax: Aggressive Parallel Decoding for dLLMs
[10:20] 🧠 Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
[11:02] 🧩 OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
[11:41] 🧠 OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

13 min
99+
2 months ago
2026.04.09 | Template syndrome in RL agents; step-by-step image generation is more controllable

[Contents] This episode covers the following 15 papers:
[00:31] 🧠 RAGEN-2: Reasoning Collapse in Agentic RL
[01:21] 🎨 Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning
[02:00] ⚡ MARS: Enabling Autoregressive Models Multi-Token Generation
[02:51] 🌍 INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling
[03:48] 🔬 SEVerA: Verified Synthesis of Self-Evolving Agents
[04:41] 🔍 TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
[05:26] ⚡ FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling
[06:17] 🔄 FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching
[07:00] 🧠 Neural Computers
[07:37] 🎯 Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization
[08:22] 💡 Learning to Hint for Reinforcement Learning
[09:11] 🧠 Fast Spatial Memory with Elastic Test-Time Training
[09:44] 🎬 MoRight: Motion Control Done Right
[10:21] 🌐 Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment
[11:02] 📊 Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval

12 min
99+
2 months ago
2026.04.08 | Video-MME-v2 grills models with a brutal question bank; Claw-Eval guards trustworthy agents with end-to-end auditing

[Contents] This episode covers the following 15 papers:
[00:34] 🎯 Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
[01:19] 🔬 Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
[02:06] 🤖 Learning to Retrieve from Agent Trajectories
[02:53] 🧪 ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
[03:42] 👗 Vanast: Virtual Try-On with Human Image Animation via Synthetic Triplet Supervision
[04:31] ⏱ Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning
[05:23] 🧠 ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement
[06:03] 🔍 Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework
[06:52] 🔍 How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings
[07:33] 🚀 MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU
[08:11] 🛠 DARE: Diffusion Large Language Models Alignment and Reinforcement Executor
[08:54] 🧠 In-Place Test-Time Training
[09:39] 🎬 Watch Before You Answer: Learning from Visually Grounded Post-Training
[10:13] 🔍 Demystifying When Pruning Works via Representation Hierarchies
[10:59] 🤖 Action Images: End-to-End Policy Learning via Multiview Video Generation

12 min
99+
2 months ago
2026.04.07 | A unified world-model framework; a small-model, big-data breakthrough

[Contents] This episode covers the following 15 papers:
[00:33] 🧠 OpenWorldLib: A Unified Codebase and Definition of Advanced World Models
[01:26] 📊 MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
[02:07] 🧠 TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
[02:58] 🎥 AURA: Always-On Understanding and Real-Time Assistance via Video Streams
[03:42] 🔍 LIBERO-Para: A Diagnostic Benchmark and Metrics for Paraphrase Robustness in VLA Models
[04:24] 🎯 SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
[05:07] 📈 Adam's Law: Textual Frequency Law on Large Language Models
[05:56] 🗂 FileGram: Grounding Agent Personalization in File-System Behavioral Traces
[06:45] 🧪 ClawArena: Benchmarking AI Agents in Evolving Information Environments
[07:38] 🧠 LightThinker++: From Reasoning Compression to Memory Management
[08:12] 🔄 Unifying Group-Relative and Self-Distillation Policy Optimization via Sample Routing
[08:50] 🧠 SkillX: Automatically Constructing Skill Knowledge Bases for Agents
[09:39] 🤖 Self-Execution Simulation Improves Coding Models
[10:22] 🧠 Vero: An Open RL Recipe for General Visual Reasoning
[11:12] 🛡 Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

12 min
99+
2 months ago
2026.04.06 | Self-distilled RL plugs information leakage; a minimalist streaming-video baseline overtakes memory-based methods

[Contents] This episode covers the following 15 papers:
[00:29] 🧠 Self-Distilled RLVR
[01:18] 🎯 A Simple Baseline for Streaming Video Understanding
[02:07] 🔍 Token Warping Helps MLLMs Look from Nearby Viewpoints
[03:06] 🔍 Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?
[03:57] 📈 Test-Time Scaling Makes Overtraining Compute-Optimal
[04:56] 🧠 Communicating about Space: Language-Mediated Spatial Integration Across Partial Views
[05:39] 🏆 GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning
[06:27] 🤖 InCoder-32B-Thinking: Industrial Code World Model for Thinking
[07:22] 🛡 AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks
[08:10] ⚠ AgentHazard: A Benchmark for Evaluating Harmful Behavior in Computer-Use Agents
[08:52] ⚡ Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression
[09:39] 🔍 VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors
[10:30] 📊 Xpertbench: Expert Level Tasks with Rubrics-Based Evaluation
[11:16] 🎬 Salt: Self-Consistent Distribution Matching with Cache-Aware Training for Fast Video Generation
[12:04] 🤝 CoME-VL: Scaling Complementary Multi-Encoder Vision-Language Learning

13 min
99+
2 months ago
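[Month-End Special] Hottest AI Papers of March | AI learns taste to pick good problems; a chain of denoising steps unravels video reasoning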

[Contents] This episode covers the following 10 papers:
[00:48] TOP1(🔥415) | 🧠 AI Can Learn Scientific Taste
[02:41] TOP2(🔥367) | 🧠 Demystifing Video Reasoning
[04:53] TOP3(🔥306) | 🏭 InCoder-32B: Code Foundation Model for Industrial Scenarios
[06:57] TOP4(🔥248) | 🗣 SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models
[08:40] TOP5(🔥210) | 🧠 Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
[10:38] TOP6(🔥191) | 🤝 Heterogeneous Agent Collaborative Reinforcement Learning
[12:28] TOP7(🔥184) | 🧩 Utonia: Toward One Encoder for All Point Clouds
[14:15] TOP8(🔥184) | 🎬 Helios: Real Real-Time Long Video Generation Model
[16:46] TOP9(🔥184) | 🤖 MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification
[18:47] TOP10(🔥171) | 🧠 Attention Residuals

21 min
99+
2 months ago
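[Weekend Special] Hottest AI Papers of the First Week of April | FIPO breaks the reasoning-length bottleneck; CARLA-Air unifies air and ground simulation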

[Contents] This episode covers the following 5 papers:
[00:40] TOP1(🔥309) | 🧠 FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
[02:58] TOP2(🔥302) | 🚁 CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
[05:23] TOP3(🔥170) | 🛡 ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers
[07:56] TOP4(🔥151) | 🎬 ShotStream: Streaming Multi-Shot Video Generation for Interactive Storytelling
[10:17] TOP5(🔥147) | 🧠 Out of Sight but Not Out of Mind: Hybrid Memory for Dynamic Video World Models

13 min
99+
2 months ago
2026.04.03 | DataFlex makes data snap together like LEGO; latent space becomes AI's new map

[Contents] This episode covers the following 15 papers:
[00:41] 🔄 DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models
[01:48] 🧠 The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook
[02:45] 🧠 SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
[03:22] 🎮 Generative World Renderer
[04:09] 👁 EgoSim: Egocentric World Simulator for Embodied Interaction Generation
[05:24] 🧠 LatentUM: Unleashing the Potential of Interleaved Cross-Modal Reasoning via a Latent-Space Unified Model
[06:06] 🧠 Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory
[06:47] 🚗 UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
[07:35] 🎯 Steerable Visual Representations
[08:12] 🎬 VOID: Video Object and Interaction Deletion
[09:06] 🤖 Investigating Autonomous Agent Contributions in the Wild: Activity Patterns and Code Change over Time
[09:47] 🚀 ASI-Evolve: AI Accelerates AI
[10:50] 🎭 Tex3D: Objects as Attack Surfaces via Adversarial 3D Textures for Vision-Language-Action Models
[11:36] 🤖 GPA: Learning GUI Process Automation from Demonstrations
[12:24] 🔍 VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification

13 min
99
2 months ago
2026.04.02 | ClawKeeper's three layers guard agent safety; terminal agents with lightweight APIs come out on top

[Contents] This episode covers the following 15 papers:
[00:27] 🛡 ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers
[01:20] 💻 Terminal Agents Suffice for Enterprise Automation
[02:03] 📊 MiroEval: Benchmarking Multimodal Deep Research Agents in Process and Outcome
[02:54] 🧠 ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
[03:40] 🔬 Vision2Web: A Hierarchical Benchmark for Visual Website Development with Agent Verification
[04:26] 📊 QuitoBench: A High-Quality Open Time Series Forecasting Benchmark
[05:12] 🧠 Reasoning Shift: How Context Silently Shortens LLM Reasoning
[05:59] 📊 HippoCamp: Benchmarking Contextual Agents on Personal Computers
[06:52] 🧠 PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning
[07:34] ⚡ Universal YOCO for Efficient Depth Scaling
[08:12] 🔄 Brevity Constraints Reverse Performance Hierarchies in Language Models
[08:48] 🧠 GaussianGPT: Towards Autoregressive 3D Gaussian Scene Generation
[09:25] 📝 Paper Reconstruction Evaluation: Evaluating Presentation and Hallucination in AI-written Papers
[10:11] 🚀 Embarrassingly Simple Self-Distillation Improves Code Generation
[10:54] 🤖 Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

12 min
99+
2 months ago
2026.04.01 | FIPO uses KL to elicit deep reasoning; LongCat unifies multimodal tokens

[Contents] This episode covers the following 15 papers:
[00:30] 🧠 FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization
[01:12] 🧩 LongCat-Next: Lexicalizing Modalities as Discrete Tokens
[01:48] 🚁 CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence
[02:31] 🧬 Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells
[03:33] 🤖 GEMS: Agent-Native Multimodal Generation with Memory and Skills
[04:12] 🎬 VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward
[05:04] 🤖 Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis
[05:45] 🔬 daVinci-LLM: Towards the Science of Pretraining
[06:19] 🎬 CutClaw: Agentic Hours-Long Video Editing via Music Synchronization
[07:10] 🔍 MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Models
[07:58] 🧬 FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration
[08:46] 🏙 Extend3D: Town-Scale 3D Generation
[09:28] 💭 Think Anywhere in Code Generation
[10:18] ⚙ OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training
[11:03] 🎨 VectorGym: A Multitask Benchmark for SVG Code Generation, Sketching, and Editing

12 min
93
2 months ago
