HuggingFace Daily AI Paper Express - Episode List

2026.01.14 | Synthetic data raises a low-resource top student; self-played multi-turn dialogue proves more reliable

HuggingFace Daily AI Paper Express

The 15 papers in this episode:
[00:20] 🌍 Solar Open Technical Report
[00:54] 🤖 User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale
[01:39] 🧠 MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences
[02:11] 🖱 ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands
[02:44] 🧠 KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions
[03:15] 🏆 ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking
[04:07] 🧠 Ministral 3
[04:51] ⚖ The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents
[05:31] 🧭 VLingNav: Embodied Navigation with Adaptive Reasoning and Visual-Assisted Linguistic Memory
[06:24] 🎬 End-to-End Video Character Replacement without Structural Guidance
[07:06] 🎬 Motion Attribution for Video Generation
[07:36] 🚀 SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices
[08:12] ⚖ JudgeRLVR: Judge First, Generate Second for Efficient Reasoning
[08:46] 📊 Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization
[09:25] 🔍 Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking
[Follow us] You can also find us on the following platform for more beyond the podcast: Xiaohongshu: AI速递

10 min
99+
3 months ago

2026.01.13 | VideoDR has models search while they reason; BabyVision exposes visual weaknesses

The 15 papers in this episode:
[00:20] 🔍 Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
[01:01] 👶 BabyVision: Visual Reasoning Beyond Language
[01:45] 🚀 PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning
[02:24] 🧠 X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
[03:03] ⚡ MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
[03:41] ⚡ GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
[04:17] 🤖 OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent
[05:20] 📉 Lost in the Noise: How Reasoning Models Fail with Contextual Distractors
[06:00] 🚀 Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models
[06:30] 🧠 Controllable Memory Usage: Balancing Anchoring and Innovation in Long-Term Human-Agent Interaction
[07:10] 🚗 DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving
[07:58] 🤖 MegaFlow: Large-Scale Distributed Orchestration System for the Agentic Era
[08:26] 🎨 Boosting Latent Diffusion Models via Disentangled Representation Alignment
[09:08] 🤔 What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
[09:45] 🔧 ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration

10 min
99+
3 months ago

2026.01.12 | Map-augmented AI with reinforced geolocalization; multimodal Lean formalization

The 15 papers in this episode:
[00:20] 🗺 Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
[01:03] 🧠 MMFormalizer: Multimodal Autoformalization in the Wild
[01:38] 🧬 The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
[02:21] 🎭 CaricatureGS: Exaggerating 3D Gaussian Splatting Faces With Gaussian Curvature
[03:04] 🔍 Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards
[03:47] ⚙ EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis
[04:22] 🔮 Can We Predict Before Executing Machine Learning Agents?
[04:59] 🖼 AgentOCR: Reimagining Agent History via Optical Self-Compression
[05:39] 🎬 VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction
[06:29] 🔍 Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking
[07:23] 🔍 Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
[08:07] 🔄 Orient Anything V2: Unifying Orientation and Rotation Understanding
[08:37] 🔍 SmartSearch: Process Reward-Guided Query Refinement for Search Agents
[09:23] ⚙ Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals
[10:11] 📊 Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

11 min
99+
3 months ago

2026.01.09 | GDPO decouples rewards to optimize multiple tasks; learnable multipliers unlock matrix scale

The 15 papers in this episode:
[00:21] 📈 GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
[01:05] ⚖ Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers
[01:33] 🌙 RL-AWB: Deep Reinforcement Learning for Auto White Balance Correction in Low-Light Night-time Scenes
[02:07] 🤖 RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation
[02:56] 🤝 RelayLLM: Efficient Reasoning via Collaborative Decoding
[03:31] 🌲 AT²PO: Agentic Turn-based Policy Optimization via Tree Search
[04:24] 🤔 VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
[04:57] 🎬 VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
[05:34] 🔍 The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
[06:09] 🎯 Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models
[06:40] 🎥 Plenoptic Video Generation
[07:12] ⚖ Agent-as-a-Judge
[07:43] 📄 DocDancer: Towards Agentic Document-Grounded Information Seeking
[08:20] 🧠 Re-Align: Structured Reasoning-guided Alignment for In-Context Image Generation and Editing
[09:05] 🧠 DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

10 min
99+
4 months ago

2026.01.08 | Entropy-weighted fine-tuning preserves prior knowledge; evolving skill networks keep advancing

The 15 papers in this episode:
[00:21] ⚖ Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting
[01:15] 🧠 Evolving Programmatic Skill Networks
[01:51] 🧠 Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning
[02:31] 📊 Benchmark^2: Systematic Evaluation of LLM Benchmarks
[03:12] 🎬 Klear: Unified Multi-Task Audio-Video Joint Generation
[03:53] 🎬 Choreographing a World of Dynamic Objects
[04:36] ✅ Agentic Rubrics as Contextual Verifiers for SWE Agents
[05:11] ⚗ MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
[05:55] 🚀 E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models
[06:53] 🛡 RedBench: A Universal Dataset for Comprehensive Red Teaming of Large Language Models
[07:36] 📊 EpiQAL: Benchmarking Large Language Models in Epidemiological Question Answering for Enhanced Alignment and Reasoning
[08:15] 🧠 Enhancing Linguistic Competence of Language Models through Pre-training with Language Learning Tasks
[08:48] 🔬 Why LLMs Aren't Scientists Yet: Lessons from Four Autonomous Research Attempts
[09:25] 🤖 ThinkRL-Edit: Thinking in Reinforcement Learning for Reasoning-Centric Image Editing
[10:17] 🧠 MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents

11 min
95
4 months ago

2026.01.07 | Infinite depth at arbitrary sampling; end-to-end speech transcription with speaker diarization

The 15 papers in this episode:
[00:25] 🔍 InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
[01:07] 🎙 MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization
[01:46] 🔬 SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
[02:32] 🎬 LTX-2: Efficient Joint Audio-Visual Foundation Model
[03:26] 🦄 UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
[04:06] 🎨 DreamStyle: A Unified Framework for Video Stylization
[04:38] 🧠 CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving
[05:25] ⚡ MiMo-V2-Flash Technical Report
[06:15] 🎮 NitroGen: An Open Foundation Model for Generalist Gaming Agents
[06:58] 🤖 SOP: A Scalable Online Post-Training System for Vision-Language-Action Models
[07:43] 🛡 OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs
[08:31] 📍 The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization
[09:14] 🔍 X-MuTeST: A Multilingual Benchmark for Explainable Hate Speech Detection and A Novel LLM-consulted Explanation Framework
[09:57] 🧠 Parallel Latent Reasoning for Sequential Recommendation
[10:27] 🤖 WebGym: Scaling Training Environments for Visual Web Agents with Realistic Tasks

11 min
99+
4 months ago

2026.01.06 | K-EXAONE MoE; NextFlow unifies sequential modeling across modalities

The 15 papers in this episode:
[00:21] 🧠 K-EXAONE Technical Report
[00:56] 🚀 NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
[01:36] 🎭 DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
[02:19] 🎨 VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation
[03:04] 🚀 GARDO: Reinforcing Diffusion Models without Reward Hacking
[03:41] 🎨 VINO: A Unified Visual Generator with Interleaved OmniModal Context
[04:17] ♾ InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams
[04:54] 🧠 Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
[05:23] 🚀 Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
[05:57] 🔄 Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes
[06:43] 🔄 Recursive Language Models
[07:12] 🧠 KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs
[07:51] ⚠ COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs
[08:52] 🛰 Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion
[09:40] 🧱 SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

10 min
99+
4 months ago

2026.01.05 | Agent pipelines speed up; 4D modeling becomes widely accessible

The 12 papers in this episode:
[00:22] 🤖 Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization
[00:52] 🌍 NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
[01:27] 🤖 Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
[01:59] 🚀 SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
[02:35] 🎭 Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
[03:14] 🎬 AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction
[03:47] 🧠 Deep Delta Learning
[04:11] 🧠 Nested Learning: The Illusion of Deep Learning Architectures
[04:47] 🧠 Diversity or Precision? A Deep Dive into Next Token Prediction
[05:23] 🧠 Fast-weight Product Key Memory
[05:58] 🧬 InfoSynth: Information-Guided Benchmark Synthesis for LLMs
[06:28] 🌀 MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing

7 min
99+
4 months ago

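[Month-End Special] December's Hottest AI Papers | Code intelligence lands across the full pipeline; open-source models make twin breakthroughs in reasoning and agents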

The 10 papers in this episode:
[00:29] TOP1 (🔥279) | 🧠 From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
[02:22] TOP2 (🔥242) | 🚀 DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
[04:45] TOP3 (🔥217) | 🚀 Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
[06:47] TOP4 (🔥195) | ⚙ DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
[09:19] TOP5 (🔥181) | 🎬 LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
[11:39] TOP6 (🔥167) | 🤖 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
[13:15] TOP7 (🔥163) | 🎬 Kling-Omni Technical Report
[15:12] TOP8 (🔥149) | 📊 DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
[17:44] TOP9 (🔥146) | 🧠 Qwen3-VL Technical Report
[20:34] TOP10 (🔥128) | 🎬 Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

23 min
99+
4 months ago
