HuggingFace 每日AI论文速递 (Daily AI Paper Digest) - Episode List

2026.01.26 | LongCat trains a 560B MoE to perfect agent scores; SWE-Pruner cuts 50% of tokens and runs faster

[Sponsor] Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:32] 🧠 LongCat-Flash-Thinking-2601 Technical Report
[01:13] ✂ SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
[02:08] 🧠 TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
[02:58] 🧠 VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
[03:58] 🧬 Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
[04:40] ⚡ Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
[05:32] ⚡ SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer
[06:11] 🧠 MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
[06:55] 🎬 Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory
[07:43] 🧠 Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
[08:22] 🚀 Endless Terminals: Scaling RL Environments for Terminal Agents
[09:09] 🧪 DSGym: A Holistic Framework for Evaluating and Training Data Science Agents
[10:11] 🧠 Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
[10:58] 💻 Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization
[11:39] ⚖ Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain
[Follow us] You can also find us on the following platforms for more beyond the podcast. Xiaohongshu: AI速递

12 min

[Weekend Special] Hottest AI papers of January, week 4 | Agentic LLMs evolve into doers; group RL corrects difficulty bias

[Contents] The 5 papers in this episode:
[00:44] TOP1(🔥159) | 🤖 Agentic Reasoning for Large Language Models
[03:02] TOP2(🔥138) | ⚖ Your Group-Relative Advantage Is Biased
[05:37] TOP3(🔥71) | 🤖 Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
[08:18] TOP4(🔥63) | 🚀 EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
[10:14] TOP5(🔥62) | ⚙ ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

12 min

2026.01.23 | BayesianVLA pushes models to "read minds"; diffusion models get smarter "in order"

[Contents] The 15 papers in this episode:
[00:32] 🤖 BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
[01:22] ⚠ The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
[02:26] 🎥 HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
[03:14] 🚀 EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
[04:02] 🧪 LLM-in-Sandbox Elicits General Agentic Intelligence
[04:54] 🚀 Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
[05:34] 🎭 SAMTok: Representing Any Mask with Two Words
[06:30] 🚀 Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
[07:23] 🔬 Learning to Discover at Test Time
[08:08] 🔍 Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
[09:06] ⚙ Towards Automated Kernel Generation in the Era of LLMs
[09:48] 🔄 OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
[10:45] 💻 Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
[11:29] 🗣 Qwen3-TTS Technical Report
[12:13] 🤖 Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

13 min

2026.01.22 | LLMs become digital secret agents; video models take the exam before training

[Contents] The 15 papers in this episode:
[00:30] 🤖 Agentic Reasoning for Large Language Models
[01:05] 🤖 Rethinking Video Generation Model for the Embodied World
[01:43] 🤖 Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance
[02:34] 📊 MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
[03:24] 🧠 Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
[04:03] 📄 Typhoon OCR: Open Vision-Language Model For Thai Document Extraction
[04:51] 🛡 FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
[05:41] ⚡ Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition
[06:45] 🔍 XR: Cross-Modal Agents for Composed Image Retrieval
[07:29] 🔊 Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
[08:19] 🤖 Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics
[09:15] 🤖 RoboBrain 2.5: Depth in Sight, Time in Mind
[10:16] 🔍 Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
[10:59] 🧠 AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization
[11:43] 🕳 The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems

12 min

2026.01.21 | AI bug fixing gets unified scoring; MLLMs still prone to blind guessing in future forecasting

[Contents] The 15 papers in this episode:
[00:31] 🤖 Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
[01:15] 🔮 FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs
[02:11] ⚡ Toward Efficient Agents: Memory, Tool learning, and Planning
[02:51] 🤖 Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
[03:40] 🎬 OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer
[04:28] 🧠 MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
[05:15] 🧠 Think3D: Thinking with Space for Spatial Reasoning
[06:06] 🫁 UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation
[07:08] ⚙ ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
[07:58] 🧠 Aligning Agentic World Models via Knowledgeable Experience Learning
[08:45] 🤖 Agentic-R: Learning to Retrieve for Agentic Search
[09:25] 🔤 LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
[10:14] 📊 PRiSM: Benchmarking Phone Realization in Speech Models
[11:02] 🔍 On the Evidentiary Limits of Membership Inference for Copyright Auditing
[11:46] 🔒 Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD

13 min

2026.01.20 | Passing the sandbox tests is what makes a real backend; branch-and-merge thinks more with fewer tokens

[Contents] The 8 papers in this episode:
[00:30] ⚙ ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
[01:15] 🧠 Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
[02:13] 🕺 CoDance: An Unbind-Rebind Paradigm for Robust Multi-Subject Animation
[03:01] 🧭 The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models
[03:30] 🧠 Spurious Rewards Paradox: Mechanistically Understanding How RLVR Activates Memorization Shortcuts in LLMs
[04:21] 🔬 SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature
[05:08] 🧭 YaPO: Learnable Sparse Activation Steering Vectors for Domain Adaptation
[05:56] 🧬 Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation

7 min

2026.01.19 | GRPO reward debiasing helps crack hard problems; "poisoned apple" AI disrupts markets before it is even used

[Contents] The 15 papers in this episode:
[00:33] ⚖ Your Group-Relative Advantage Is Biased
[01:20] 🍎 The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents
[02:08] 🛠 Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text
[03:14] 📊 RubricHub: A Comprehensive and Highly Discriminative Rubric Dataset via Automated Coarse-to-Fine Generation
[04:20] 🤔 When Personalization Misleads: Understanding and Mitigating Hallucinations in Personalized LLMs
[05:18] 🤖 ACoT-VLA: Action Chain-of-Thought for Vision-Language-Action Models
[06:07] 🚧 BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search
[07:04] 🎯 ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection
[08:01] 🤖 FrankenMotion: Part-level Human Motion Generation and Composition
[08:54] 🧠 Reasoning Models Generate Societies of Thought
[09:40] 🤖 PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records
[10:27] 🔍 Building Production-Ready Probes For Gemini
[11:21] ⚙ PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models
[12:31] 🧊 ShapeR: Robust Conditional 3D Shape Generation from Casual Captures
[13:24] 🚀 AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems

14 min

[Weekend Special] Hottest AI papers of January, week 3 | VideoDR tests models for evidence-search drift; BabyVision exposes visual weaknesses

The 5 papers in this episode:
[00:29] TOP1(🔥201) | 🔍 Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
[02:45] TOP2(🔥179) | 👶 BabyVision: Visual Reasoning Beyond Language
[05:00] TOP3(🔥158) | 🗺 Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
[07:03] TOP4(🔥140) | 🏙 Urban Socio-Semantic Segmentation with Vision-Language Reasoning
[09:07] TOP5(🔥134) | 🚀 STEP3-VL-10B Technical Report

11 min

2026.01.16 | A 10B model upsets 100B-scale giants; AI reads urban functions at a glance

The 15 papers in this episode:
[00:20] 🚀 STEP3-VL-10B Technical Report
[01:01] 🏙 Urban Socio-Semantic Segmentation with Vision-Language Reasoning
[01:42] 💡 Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs
[02:33] 🤖 Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
[03:14] 🧬 Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning
[03:59] 📊 DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset
[04:39] 🎨 CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
[05:33] 🧠 Toward Ultra-Long-Horizon Agentic Science: Cognitive Accumulation for Machine Learning Engineering
[06:12] 🤔 Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders
[06:48] 🔧 MatchTIR: Fine-Grained Supervision for Tool-Integrated Reasoning via Bipartite Matching
[07:29] 🛡 A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Doubao 1.8, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5
[08:09] 🛡 ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback
[08:59] 🎬 FlowAct-R1: Towards Interactive Humanoid Video Generation
[09:39] 🎨 VIBE: Visual Instruction Based Editor
[10:09] ⚡ Transition Matching Distillation for Fast Video Generation

11 min

2026.01.15 | Self-evolving algorithms take the crown; LLM lookahead saves tokens

The 15 papers in this episode:
[00:20] 🧬 Controlled Self-Evolution for Algorithmic Code Optimization
[00:52] 🧠 MAXS: Meta-Adaptive Exploration with LLM Agents
[01:27] 🧠 A³-Bench: Benchmarking Memory-Driven Scientific Reasoning via Anchor and Attractor Activation
[02:10] 🔍 DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation
[02:53] 🔬 SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL
[03:49] ⚡ Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning
[04:20] 🧊 OpenVoxel: Training-Free Grouping and Captioning Voxels for Open-Vocabulary 3D Scene Understanding
[05:03] 🧠 Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
[06:04] 🧠 ExpSeek: Self-Triggered Experience Seeking for Web Agents
[06:53] ⚠ Are LLMs Vulnerable to Preference-Undermining Attacks (PUA)? A Factorial Analysis Methodology for Diagnosing the Trade-off between Preference Alignment and Real-World Validity
[07:30] 🔄 EvoFSM: Controllable Self-Evolution for Deep Research with Finite State Machines
[08:04] 🧠 Imagine-then-Plan: Agent Learning from Adaptive Lookahead with World Models
[08:46] 🌐 TranslateGemma Technical Report
[09:22] 🧠 The AI Hippocampus: How Far are We From Human Memory?
[10:03] 🎯 FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection

11 min

2026.01.14 | Synthetic data raises a low-resource top student; AI self-played multi-turn dialogue proves more reliable

The 15 papers in this episode:
[00:20] 🌍 Solar Open Technical Report
[00:54] 🤖 User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale
[01:39] 🧠 MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences
[02:11] 🖱 ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
[02:44] 🧠 KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions
[03:15] 🏆 ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
[04:07] 🧠 Ministral 3
[04:51] ⚖ The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
[05:31] 🧭 VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory
[06:24] 🎬 End-to-End Video Character Replacement without Structural Guidance
[07:06] 🎬 Motion Attribution for Video Generation
[07:36] 🚀 SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
[08:12] ⚖ JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
[08:46] 📊 Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
[09:25] 🔍 Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking

10 min