Episode list: HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest) - EarsOnMe

2026.02.02 | ASTRA synthesizes trajectories to train tool use; THINKSAFE self-aligns for safety

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week it recaps the previous week's major AI news. Link 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:33] 🤖 ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas
[01:22] 🛡 THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
[02:18] 🧠 TTCS: Test-Time Curriculum Synthesis for Self-Evolving
[03:09] 🍌 PaperBanana: Automating Academic Illustration for AI Scientists
[03:51] 🔬 FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation
[04:40] 🧠 ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought
[05:22] 🎯 SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization
[06:02] 🎯 DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
[07:08] 🧠 Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification
[07:55] 📄 PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
[08:45] 🎬 DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning
[09:42] 🧠 MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
[10:24] 🦢 Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
[11:13] 📊 Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling
[12:00] ⚡ RM-RF: Reward Model for Run-Free Unit Test Evaluation
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

13 min

56

1 day ago

[Month-End Special] January's Hottest AI Papers | mHC stabilizes gradients; GDPO tackles multiple rewards

[Contents] The 10 papers in this episode:
[00:42] TOP1 (🔥292) | 🧠 mHC: Manifold-Constrained Hyper-Connections
[03:06] TOP2 (🔥212) | 📈 GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
[04:45] TOP3 (🔥209) | 🔍 Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
[06:59] TOP4 (🔥193) | 👶 BabyVision: Visual Reasoning Beyond Language
[08:57] TOP5 (🔥190) | 🚀 STEP3-VL-10B Technical Report
[10:39] TOP6 (🔥186) | 🤖 Agentic Reasoning for Large Language Models
[12:58] TOP7 (🔥181) | 🧹 Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
[15:19] TOP8 (🔥171) | 🧠 LongCat-Flash-Thinking-2601 Technical Report
[17:22] TOP9 (🔥165) | 🗺 Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
[19:17] TOP10 (🔥158) | 🧠 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives

22 min

98

2 days ago

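[Weekend Special] Hottest AI Papers, Week 1 of February | LLMs play housekeeper and turn data into ready-to-cook ingredients; LongCat trains agents to run dungeons online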

[Contents] The 5 papers in this episode:
[00:39] TOP1 (🔥181) | 🧹 Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
[02:50] TOP2 (🔥169) | 🧠 LongCat-Flash-Thinking-2601 Technical Report
[04:51] TOP3 (🔥138) | 🧠 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
[06:40] TOP4 (🔥123) | 🤖 daVinci-Dev: Agent-native Mid-training for Software Engineering
[08:51] TOP5 (🔥120) | 🛡 AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

11 min

99+

2 days ago

2026.01.30 | Spatial-intelligence benchmarking misses the mark; Idea2Story writes the paper in one click

[Contents] The 15 papers in this episode:
[00:29] 🧭 Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
[01:21] 🧠 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
[02:19] ⚡ Scaling Embeddings Outperforms Scaling Experts in Language Models
[02:58] 🔍 OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
[03:39] 🤖 DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
[04:33] 🧠 MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
[05:20] 🔺 PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction
[06:08] 🧠 ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation
[07:01] 🧩 AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts
[07:43] 🧠 Exploring Reasoning Reward Model for Agents
[08:39] 🎤 Qwen3-ASR Technical Report
[09:27] 🚀 Language-based Trial and Error Falls Behind in the Era of Experience
[10:16] 🌐 Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models
[11:02] ⚡ Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
[11:59] 🧠 MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

13 min

99+

4 days ago

2026.01.29 | Hard problems first to shore up math reasoning; LingBot generates interactive parallel worlds

[Contents] The 13 papers in this episode:
[00:33] 🧠 Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
[01:21] 🌍 Advancing Open-source World Models
[01:55] 🧠 DeepSeek-OCR 2: Visual Causal Flow
[02:58] 🚀 Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
[03:49] 🔬 Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
[04:34] 🔄 Linear representations in language models can change dramatically over a conversation
[05:26] 🚀 SERA: Soft-Verified Efficient Repository Agents
[06:01] 🤖 OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
[06:46] 🤖 GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection
[07:37] 🗣 SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper
[08:27] 📊 RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation
[09:16] ✏ SketchDynamics: Exploring Free-Form Sketches for Dynamic Intent Expression in Animation Generation
[10:07] 🚀 UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

11 min

99+

5 days ago

2026.01.28 | AgentDoG builds guardrails to diagnose the root causes of risk; AdaReasoner orchestrates tools so small models stage a comeback

[Contents] The 14 papers in this episode:
[00:30] 🛡 AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security
[01:21] 🧩 AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning
[02:11] 🤖 A Pragmatic VLA Foundation Model
[02:56] 🧠 Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models
[03:39] 🌍 World Craft: Agentic Framework to Create Visualizable Worlds via Text
[04:26] 🧠 AVMeme Exam: A Multimodal Multilingual Multicultural Benchmark for LLMs' Contextual and Cultural Knowledge and Thinking
[05:08] 🌲 FABLE: Forest-Based Adaptive Bi-Path LLM-Enhanced Retrieval for Multi-Document Reasoning
[05:55] 🛡 TriPlay-RL: Tri-Role Self-Play Reinforcement Learning for LLM Safety Alignment
[06:44] 🎯 Selective Steering: Norm-Preserving Control Through Discriminative Layer Selection
[07:17] ⚡ Revisiting Parameter Server in LLM Post-Training
[08:00] 🧠 Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
[08:38] 🧬 GPCR-Filter: a deep learning framework for efficient and precise GPCR modulator discovery
[09:39] ⚠ HalluCitation Matters: Revealing the Impact of Hallucinated References with 300 Hallucinated Papers in ACL Conferences
[10:38] 📊 Benchmarks Saturate When The Model Gets Smarter Than The Judge

11 min

99+

6 days ago

2026.01.27 | Agent-native training sets a new SWE-Bench record; LLMs reshape the data-cleaning pipeline

[Contents] The 15 papers in this episode:
[00:33] 🤖 daVinci-Dev: Agent-native Mid-training for Software Engineering
[01:21] 🧹 Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
[02:21] 🎬 The Script is All You Need: An Agentic Framework for Long-Horizon Dialogue-to-Cinematic Video Generation
[03:08] 🔬 Scientific Image Synthesis: Benchmarking, Methodologies, and Downstream Utility
[04:00] 🔬 iFSQ: Improving FSQ for Image Generation with 1 Line of Code
[04:42] ⚡ Elastic Attention: Test-time Adaptive Sparsity Ratios for Efficient Transformers
[05:36] 🎬 Self-Refining Video Sampling
[06:31] 🧠 Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
[07:23] 🎤 VIBEVOICE-ASR Technical Report
[08:06] 📊 CGPT: Cluster-Guided Partial Tables with LLM-Generated Supervision for Table Retrieval
[09:04] 📊 STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion
[09:51] 🧠 Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents
[10:26] 🚀 AR-Omni: A Unified Autoregressive Model for Any-to-Any Generation
[11:15] 🔍 SAGE: Steerable Agentic Data Generation for Deep Search with Execution Feedback
[12:04] 🤖 Agentic Very Long Video Understanding

13 min

99+

1 week ago

2026.01.26 | LongCat trains a 560-billion-parameter MoE agent to full marks; SWE-Pruner cuts half the tokens and runs faster

[Contents] The 15 papers in this episode:
[00:32] 🧠 LongCat-Flash-Thinking-2601 Technical Report
[01:13] ✂ SWE-Pruner: Self-Adaptive Context Pruning for Coding Agents
[02:08] 🧠 TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
[02:58] 🧠 VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
[03:58] 🧬 Inference-Time Scaling of Verification: Self-Evolving Deep Research Agents via Test-Time Rubric-Guided Verification
[04:40] ⚡ Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow
[05:32] ⚡ SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer
[06:11] 🧠 MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences
[06:55] 🎬 Memory-V2V: Augmenting Video-to-Video Diffusion Models with Memory
[07:43] 🧠 Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
[08:22] 🚀 Endless Terminals: Scaling RL Environments for Terminal Agents
[09:09] 🧪 DSGym: A Holistic Framework for Evaluating and Training Data Science Agents
[10:11] 🧠 Dancing in Chains: Strategic Persuasion in Academic Rebuttal via Theory of Mind
[10:58] 💻 Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization
[11:39] ⚖ Mecellem Models: Turkish Models Trained from Scratch and Continually Pre-trained for the Legal Domain

12 min

99+

1 week ago

[Weekend Special] Hottest AI Papers, Week 4 of January | Agentic LLMs evolve into doers; group RL corrects difficulty bias

[Contents] The 5 papers in this episode:
[00:44] TOP1 (🔥159) | 🤖 Agentic Reasoning for Large Language Models
[03:02] TOP2 (🔥138) | ⚖ Your Group-Relative Advantage Is Biased
[05:37] TOP3 (🔥71) | 🤖 Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
[08:18] TOP4 (🔥63) | 🚀 EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
[10:14] TOP5 (🔥62) | ⚙ ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development

12 min

99+

1 week ago

2026.01.23 | BayesianVLA forces models to "read minds"; diffusion models get smarter "in order"

[Contents] The 15 papers in this episode:
[00:32] 🤖 BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
[01:22] ⚠ The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
[02:26] 🎥 HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
[03:14] 🚀 EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
[04:02] 🧪 LLM-in-Sandbox Elicits General Agentic Intelligence
[04:54] 🚀 Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
[05:34] 🎭 SAMTok: Representing Any Mask with Two Words
[06:30] 🚀 Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
[07:23] 🔬 Learning to Discover at Test Time
[08:08] 🔍 Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
[09:06] ⚙ Towards Automated Kernel Generation in the Era of LLMs
[09:48] 🔄 OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
[10:45] 💻 Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
[11:29] 🗣 Qwen3-TTS Technical Report
[12:13] 🤖 Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning

13 min

99+

1 week ago

2026.01.22 | LLMs become digital agents; video models get tested before they train

[Contents] The 15 papers in this episode:
[00:30] 🤖 Agentic Reasoning for Large Language Models
[01:05] 🤖 Rethinking Video Generation Model for the Embodied World
[01:43] 🤖 Paper2Rebuttal: A Multi-Agent Framework for Transparent Author Response Assistance
[02:34] 📊 MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents
[03:24] 🧠 Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning
[04:03] 📄 Typhoon OCR: Open Vision-Language Model For Thai Document Extraction
[04:51] 🛡 FinVault: Benchmarking Financial Agent Safety in Execution-Grounded Environments
[05:41] ⚡ Typhoon ASR Real-time: FastConformer-Transducer for Thai Automatic Speech Recognition
[06:45] 🔍 XR: Cross-Modal Agents for Composed Image Retrieval
[07:29] 🔊 Quantifying Speaker Embedding Phonological Rule Interactions in Accented Speech Synthesis
[08:19] 🤖 Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics
[09:15] 🤖 RoboBrain 2.5: Depth in Sight, Time in Mind
[10:16] 🔍 Lost in the Prompt Order: Revealing the Limitations of Causal Attention in Language Models
[10:59] 🧠 AgentEHR: Advancing Autonomous Clinical Decision-Making via Retrospective Summarization
[11:43] 🕳 The Responsibility Vacuum: Organizational Failure in Scaled Agent Systems

12 min

99+

1 week ago

2026.01.21 | AI bug fixing gets unified scoring; MLLMs still prone to blind guessing in future forecasting

[Contents] The 15 papers in this episode:
[00:31] 🤖 Advances and Frontiers of LLM-based Issue Resolution in Software Engineering: A Comprehensive Survey
[01:15] 🔮 FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs
[02:11] ⚡ Toward Efficient Agents: Memory, Tool learning, and Planning
[02:51] 🤖 Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization
[03:40] 🎬 OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer
[04:28] 🧠 MemoryRewardBench: Benchmarking Reward Models for Long-Term Memory Management in Large Language Models
[05:15] 🧠 Think3D: Thinking with Space for Spatial Reasoning
[06:06] 🫁 UniX: Unifying Autoregression and Diffusion for Chest X-Ray Understanding and Generation
[07:08] ⚙ ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
[07:58] 🧠 Aligning Agentic World Models via Knowledgeable Experience Learning
[08:45] 🤖 Agentic-R: Learning to Retrieve for Agentic Search
[09:25] 🔤 LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
[10:14] 📊 PRiSM: Benchmarking Phone Realization in Speech Models
[11:02] 🔍 On the Evidentiary Limits of Membership Inference for Copyright Auditing
[11:46] 🔒 Fundamental Limitations of Favorable Privacy-Utility Guarantees for DP-SGD

13 min

99+

1 week ago