HuggingFace 每日AI论文速递 - 2025.10.13 | 桌面交互预训练解锁机器人潜能；统一模型赋予相机空间想象力 - EarsOnMe

主播

节目简介

来源：小宇宙

本期的 14 篇论文如下：

[00:20] 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI（D2E：利用桌面数据规模化视觉-动作预训练以迁移至具身智能）

[01:13] 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation（基于相机的统一多模态理解与生成模型）

[01:56] 🎨 TAG:Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling（TAG：抑制幻觉的扩散采样切向放大引导）

[02:31] 🧠 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs（多模态提示优化：为何不为多模态大模型释放全模态潜能）

[03:05] 🚀 AutoPR: Let's Automate Your Academic Promotion!（AutoPR：让学术晋升一键自动化！）

[03:39] 🧭 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?（R-HORIZON：你的大推理模型在广度与深度上究竟能走多远？）

[04:14] 🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels（Webscale-RL：把强化学习数据扩展到预训练体量的自动化流水线）

[04:56] 🛰 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km（SpaceVista：毫米到千米全尺度视觉空间推理）

[05:37] 🎥 StreamingVLM: Real-Time Understanding for Infinite Video Streams（StreamingVLM：面向无限视频流的实时理解框架）

[06:19] 🌐 KORMo: Korean Open Reasoning Model for Everyone（KORMo：人人可用的韩语开放推理模型）

[06:42] ♻ Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting（别浪费错误：通过置信度加权利用负RL组）

[07:25] 🧠 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization（从推理到学习的桥梁：以复杂度分布外泛化揭穿幻觉）

[08:16] ⚡ DISCO: Diversifying Sample Condensation for Efficient Model Evaluation（DISCO：以模型分歧为导向的样本浓缩加速评测）

[08:56] 🚗 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction（面向开放词汇占用预测的各向异性采样渐进高斯Transformer）

【关注我们】

您还可以在以下平台找到我们，获得播客内容以外更多信息

小红书: AI速递

2025.10.13 | 桌面交互预训练解锁机器人潜能；统一模型赋予相机空间想象力