HuggingFace Daily AI Paper Digest - Episode List

2026.03.04 | Unified models' "alignment tax" drags down understanding; a universal point cloud encoder covers many scenarios at once

[Sponsor] Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

[Contents] The 15 papers in this episode:
[00:32] 🔍 UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?
[01:40] 🧩 Utonia: Toward One Encoder for All Point Clouds
[02:21] 🔍 BeyondSWE: Can Current Code Agent Survive Beyond Single-Repo Bug Fixing?
[03:00] 🔍 Beyond Language Modeling: An Exploration of Multimodal Pretraining
[03:53] 🧠 Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models
[04:40] 🎯 How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities
[05:16] 🎬 Kling-MotionControl Technical Report
[05:58] 🎬 Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance
[07:01] 🤖 Qwen3-Coder-Next Technical Report
[07:46] 🧠 PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference
[08:30] 🔍 InfoPO: Information-Driven Policy Optimization for User-Centric Agents
[09:29] 🔬 Surgical Post-Training: Cutting Errors, Keeping Knowledge
[10:14] 🎛 CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance
[10:53] 🎬 NOVA: Sparse Control, Dense Synthesis for Pair-Free Video Editing
[11:58] ⚡ Spilled Energy in Large Language Models

[Follow us] You can also find us on the following platform for more than the podcast itself: Xiaohongshu: AI速递

12 minutes
99+
3 months ago

2026.03.03 | Adaptive scaling saves compute; tokens turn into animations in seconds

[Contents] The 15 papers in this episode:
[00:30] ⚡ From Scale to Speed: Adaptive Test-Time Scaling for Image Editing
[01:16] 🎨 OmniLottie: Generating Vector Animations via Parameterized Lottie Tokens
[01:57] 🤖 OpenAutoNLU: Open Source AutoML Library for NLU
[02:37] 🧩 MMR-Life: Piecing Together Real-life Scenes for Multimodal Multi-image Reasoning
[03:32] 📊 RubricBench: Aligning Model-Generated Rubrics with Human Standards
[04:16] 🧠 CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning
[05:04] 🔍 VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection
[06:08] 🤖 CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification
[06:50] ⚙ SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
[07:37] 📊 Spectral Condition for $μ$P under Width-Depth Scaling
[08:21] 🎬 WorldStereo: Bridging Camera-Guided Video Generation and Scene Reconstruction via 3D Geometric Memories
[09:08] 🧠 LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model
[10:11] 🧠 Efficient RLVR Training via Weighted Mutual Information Data Selection
[10:48] 🧠 Learn Hard Problems During RL with Reference Guided Fine-tuning
[11:51] 🔬 When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains

13 minutes
99+
3 months ago

2026.03.02 | dLLM, a unified diffusion framework; SpatialScore teaches AI to understand space

[Contents] The 15 papers in this episode:
[00:29] 🛠 dLLM: Simple Diffusion Language Modeling
[01:15] 🧠 Enhancing Spatial Understanding in Image Generation via Reward Modeling
[02:11] 🌍 Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets
[03:08] ⚡ CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
[03:59] 🎬 Mode Seeking meets Mean Seeking for Fast Long Video Generation
[04:44] 🧩 Compositional Generalization Requires Linear, Orthogonal Representations in Vision Embedding Models
[05:31] ⚡ LK Losses: Direct Acceptance Rate Optimization for Speculative Decoding
[06:21] 🔍 CiteAudit: You Cited It, But Did You Read It? A Benchmark for Verifying Scientific References in the LLM Era
[07:16] ⚡ Accelerating Masked Image Generation by Learning Latent Controlled Dynamics
[08:00] 🧠 Memory Caching: RNNs with Growing Memory
[08:38] 📊 InfoNCE Induces Gaussian Distribution
[09:28] 🧠 Ref-Adv: Exploring MLLM Visual Reasoning in Referring Expression Tasks
[10:28] ⚡ SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching
[11:15] 🎬 LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
[11:53] ⚡ Vectorizing the Trie: Efficient Constrained Decoding for LLM-based Generative Retrieval on Accelerators

12 minutes
99+
3 months ago

2026.02.27 | Diagnosis-driven remedial training overtakes 72B; the consistency trinity stumps world models

[Contents] The 15 papers in this episode:
[00:31] 🔍 From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models
[01:16] 🌍 The Trinity of Consistency as a Defining Principle for General World Models
[01:49] 🧭 MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
[02:52] 🧠 OmniGAIA: Towards Native Omni-Modal AI Agents
[03:44] 🔍 Imagination Helps Visual Reasoning, But Not Yet in Latent Space
[04:26] 🧠 Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
[05:26] 🛡 AgentDropoutV2: Optimizing Information Flow in Multi-Agent Systems via Test-Time Rectify-or-Reject Pruning
[06:18] 🔍 Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization
[06:54] 🩺 MediX-R1: Open Ended Medical Reinforcement Learning
[07:42] ⚡ Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling
[08:43] 🤖 EmbodMocap: In-the-Wild 4D Human-Scene Reconstruction for Embodied Agents
[09:41] 🎮 AI Gamestore: Scalable, Open-Ended Evaluation of Machine General Intelligence with Human Games
[10:26] 🚶 Causal Motion Diffusion Models for Autoregressive Motion Generation
[11:09] ⚡ veScale-FSDP: Flexible and High-Performance FSDP at Scale
[11:51] 🚗 Risk-Aware World Model Predictive Control for Generalizable End-to-End Autonomous Driving

13 minutes
99+
3 months ago

2026.02.26 | Molecular graph generation breaks 99% chemical validity for the first time; DreamID-Omni cuts the multi-person face-voice mismatch rate to 8%

[Contents] The 15 papers in this episode:
[00:31] ⚗ MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models
[01:08] 🎭 DreamID-Omni: Unified Framework for Controllable Human-Centric Audio-Video Generation
[01:49] 🧪 ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
[02:40] ⚡ HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation
[03:22] 🎬 SkyReels-V4: Multi-modal Video-Audio Generation, Inpainting and Editing model
[04:10] 🎮 Solaris: Building a Multiplayer Video World Model in Minecraft
[05:20] 🤖 GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
[06:19] 🎬 JavisDiT++: Unified Modeling and Optimization for Joint Audio-Video Generation
[07:11] 🌐 Image Generation with a Sphere Encoder
[07:51] 🧭 World Guidance: World Modeling in Condition Space for Action Generation
[08:31] 🔍 NanoKnow: How to Know What Your Language Model Knows
[09:10] ⚡ DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
[10:11] 🧠 The Design Space of Tri-Modal Masked Diffusion Models
[10:46] 🔤 VecGlypher: Unified Vector Glyph Generation with Language Models
[11:20] ⚡ SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models

12 minutes
99+
3 months ago

2026.02.25 | Data engineering empowers small models; a lightweight reranker sets a new long-context SOTA

[Contents] The 15 papers in this episode:
[00:29] 🖥 On Data Engineering for Scaling LLM Terminal Capabilities
[01:20] 🧠 Query-focused and Memory-aware Reranker for Long Context Processing
[02:12] 🔗 From Perception to Action: An Interactive Benchmark for Vision Reasoning
[03:04] 🤖 PyVision-RL: Forging Open Agentic Vision Models via RL
[03:52] 📊 LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces
[04:41] 🔍 DREAM: Deep Research Evaluation with Agentic Metrics
[05:39] 📈 Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation
[06:49] ⚙ QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models
[07:35] 🤖 Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs
[08:20] 🚀 The Diffusion Duality, Chapter II: $Ψ$-Samplers and Efficient Curriculum
[09:05] 🧩 Communication-Inspired Tokenization for Structured Image Representations
[10:02] 🤖 Aletheia tackles FirstProof autonomously
[10:42] ⚡ Untied Ulysses: Memory-Efficient Context Parallelism via Headwise Chunking
[11:34] ⚡ The Art of Efficient Reasoning: Data, Reward, and Optimization
[12:13] 🔒 Adaptive Text Anonymization: Learning Privacy-Utility Trade-offs via Prompt Optimization

13 minutes
99+
4 months ago

2026.02.24 | VBVR's million videos supply reasoning training material; VLANeXt forges a VLA from twelve recipes

[Contents] The 14 papers in this episode:
[00:31] 🧠 A Very Big Video Reasoning Suite
[01:16] 🧪 VLANeXt: Recipes for Building Strong VLA Models
[02:06] 🧭 ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation
[02:54] 🤖 TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
[03:45] 📱 Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
[04:40] 🧠 DSDR: Dual-Scale Diversity Regularization for Exploration in LLM Reasoning
[05:54] 🎯 Learning Cross-View Object Correspondence via Cycle-Consistent Mask Prediction
[06:44] 🎻 SkillOrchestra: Learning to Route Agents via Skill Transfer
[07:28] 🤖 RoboCurate: Harnessing Diversity with Action-Verified Neural Trajectory for Robot Learning
[08:02] 🚀 K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model
[08:43] 🤖 SimVLA: A Simple VLA Baseline for Robotic Manipulation
[09:29] 🧠 tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
[10:23] 🗜 Nacrith: Neural Lossless Compression via Ensemble Context Modeling and High-Precision CDF Coding
[11:08] 🧬 AAVGen: Precision Engineering of Adeno-associated Viral Capsids for Renal Selective Targeting

12 minutes
99+
4 months ago

2026.02.23 | VESPO steadies off-policy RL; reasoning models learn when to stop

[Contents] The 10 papers in this episode:
[00:40] ⚖ VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
[01:45] 💭 Does Your Reasoning Model Implicitly Know When to Stop Thinking?
[02:44] 🎮 Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control
[03:24] 🤖 EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots
[04:11] 🤖 SARAH: Spatially Aware Real-time Agentic Humans
[05:05] 🎬 VidEoMT: Your ViT is Secretly Also a Video Segmentation Model
[05:51] ✂ Sink-Aware Pruning for Diffusion Language Models
[06:36] 🎯 Selective Training for Large Vision Language Models via Visual Information Gain
[07:18] 🧮 DeepVision-103K: A Visually Diverse, Broad-Coverage, and Verifiable Mathematical Dataset for Multimodal Reasoning
[08:16] 🤖 Learning Smooth Time-Varying Linear Policies with an Action Jacobian Penalty

9 minutes
99+
4 months ago

2026.02.20 | Cutting 95% of attention improves image quality; compressing while generating reaches FID 1.4

[Contents] The 15 papers in this episode:
[00:31] ⚡ SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning
[01:27] 🧠 Unified Latents (UL): How to train your latents
[02:05] 🤖 Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
[02:58] 🚗 "What Are You Doing?": Effects of Intermediate Feedback from Agentic LLM In-Car Assistants During Multi-Step Processing
[03:45] ⚠ Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report v1.5
[04:40] ⚡ DDiT: Dynamic Patch Scheduling for Efficient Diffusion Transformers
[05:39] 🧠 Arcee Trinity Large Technical Report
[06:23] 🖥 Computer-Using World Model
[07:20] 🔬 ArXiv-to-Model: A Practical Study of Scientific LM Training
[07:59] 🧬 Discovering Multiagent Learning Algorithms with Large Language Models
[08:42] 🖐 TactAlign: Human-to-Robot Policy Transfer via Tactile Alignment
[09:24] 🤖 FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment
[10:08] 🧠 World Models for Policy Refinement in StarCraft II
[10:46] ⚡ 2Mamba2Furious: Linear in Complexity, Competitive in Accuracy
[11:19] 🤿 StereoAdapter-2: Globally Structure-Consistent Underwater Stereo Depth Estimation

12 minutes
99+
4 months ago

2026.02.19 | Learnable routing plus quantization speeds up video diffusion; residual tracking gets humanoids to 90% grasping

[Contents] The 14 papers in this episode:
[00:30] ⚡ SLA2: Sparse-Linear Attention with Learnable Routing and QAT
[01:16] 🤖 Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
[02:02] 🧠 RynnBrain: Open Embodied Foundation Models
[02:46] 🔑 Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality
[03:33] 🕺 SAM 3D Body: Robust Full-Body Human Mesh Recovery
[04:41] 🤝 Multi-agent cooperation through in-context co-player inference
[05:28] 📊 MAEB: Massive Audio Embedding Benchmark
[06:04] 🤖 World Action Models are Zero-shot Policies
[06:44] 🔬 Towards a Science of AI Agent Reliability
[07:20] 🧠 MMA: Multimodal Memory Agent
[08:09] 🚀 Optimizing Few-Step Generation with Adaptive Matching Distillation
[08:56] 🧭 Learning Situated Awareness in the Real World
[09:28] ⚠ Visual Memory Injection Attacks for Multi-Turn Conversations
[10:10] 🤖 BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models

11 minutes
72
4 months ago