Episode list: HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest) - EarsOnMe

2026.01.06 | K-EXAONE MoE; NextFlow unified sequential modeling for multimodality

The 15 papers in this episode:
[00:21] 🧠 K-EXAONE Technical Report
[00:56] 🚀 NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
[01:36] 🎭 DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
[02:19] 🎨 VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation
[03:04] 🚀 GARDO: Reinforcing Diffusion Models without Reward Hacking
[03:41] 🎨 VINO: A Unified Visual Generator with Interleaved OmniModal Context
[04:17] ♾ InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams
[04:54] 🧠 Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
[05:23] 🚀 Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
[05:57] 🔄 Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes
[06:43] 🔄 Recursive Language Models
[07:12] 🧠 KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs
[07:51] ⚠ COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs
[08:52] 🛰 Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion
[09:40] 🧱 SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

[Follow us] You can also find us on the following platforms for more information beyond the podcast. Xiaohongshu: AI速递

10 min

99+

4 weeks ago

2026.01.05 | Faster agent pipelines; 4D modeling for everyone

The 12 papers in this episode:
[00:22] 🤖 Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization
[00:52] 🌍 NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
[01:27] 🤖 Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation
[01:59] 🚀 SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning
[02:35] 🎭 Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation
[03:14] 🎬 AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction
[03:47] 🧠 Deep Delta Learning
[04:11] 🧠 Nested Learning: The Illusion of Deep Learning Architectures
[04:47] 🧠 Diversity or Precision? A Deep Dive into Next Token Prediction
[05:23] 🧠 Fast-weight Product Key Memory
[05:58] 🧬 InfoSynth: Information-Guided Benchmark Synthesis for LLMs
[06:28] 🌀 MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing

7 min

99+

4 weeks ago

[Weekend Special] Hottest AI papers of January, week 1 | mHC stabilizes gradients; mindscape-aware RAG reads long documents

The 5 papers in this episode:
[00:33] TOP1(🔥132) | 🧠 mHC: Manifold-Constrained Hyper-Connections
[02:32] TOP2(🔥100) | 🧠 Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding
[04:45] TOP3(🔥94) | 🎬 InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
[07:17] TOP4(🔥86) | 🔗 Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
[09:23] TOP5(🔥62) | 🎬 LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

11 min

99+

1 month ago

2026.01.02 | Semantic-density compression; diffusion that thinks while it draws

The 3 papers in this episode:
[00:19] 🧠 Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
[00:56] 🧠 DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
[01:27] 🔄 On the Role of Discreteness in Diffusion LLMs

2 min

99+

1 month ago

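[Month-End Special] Hottest AI papers of December | Code intelligence lands end to end; open models break through on both reasoning and agents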

The 10 papers in this episode:
[00:29] TOP1(🔥279) | 🧠 From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
[02:22] TOP2(🔥242) | 🚀 DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
[04:45] TOP3(🔥217) | 🚀 Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
[06:47] TOP4(🔥195) | ⚙ DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
[09:19] TOP5(🔥181) | 🎬 LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
[11:39] TOP6(🔥167) | 🤖 Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length
[13:15] TOP7(🔥163) | 🎬 Kling-Omni Technical Report
[15:12] TOP8(🔥149) | 📊 DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle
[17:44] TOP9(🔥146) | 🧠 Qwen3-VL Technical Report
[20:34] TOP10(🔥128) | 🎬 Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

23 min

99+

1 month ago

2026.01.01 | Small models get native agentic abilities too; a 30B-MoE agent closes in on large models

The 15 papers in this episode:
[00:22] 🚀 Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
[01:00] 🤖 Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
[01:52] 🧠 mHC: Manifold-Constrained Hyper-Connections
[02:25] 🔍 GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction
[03:11] 🔮 Scaling Open-Ended Reasoning to Predict the Future
[04:00] 🧠 AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
[04:32] 🎬 PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation
[05:16] 🦾 GR-Dexter Technical Report
[05:59] 🎬 SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time
[06:56] 🔍 Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process
[07:28] 🧠 BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts
[08:06] 🧭 Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
[08:47] 🧠 Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking
[09:20] 🎯 Factorized Learning for Temporally Grounded Video-Language Models
[09:59] 🎞 Pretraining Frame Preservation in Autoregressive Video Memory Compression

10 min

99+

1 month ago

2025.12.31 | UltraShape refines coarse shapes; DreamOmni3 edits from scribbles

The 6 papers in this episode:
[00:24] 🧊 UltraShape 1.0: High-Fidelity 3D Shape Generation via Scalable Geometric Refinement
[01:00] 🎨 DreamOmni3: Scribble-based Editing and Generation
[01:34] 🧠 End-to-End Test-Time Training for Long Context
[02:18] 🔬 Evaluating Parameter Efficient Methods for RLVR
[03:02] 🔍 GraphLocator: Graph-guided Causal Reasoning for Issue Localization
[03:35] ⚠ GateBreaker: Gate-Guided Attacks on Mixture-of-Expert LLMs

4 min

46

1 month ago

2025.12.30 | ERC couples routers and experts; LiveTalk real-time video conversation

The 15 papers in this episode:
[00:24] 🔗 Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
[01:07] 🎬 LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation
[01:55] 🌍 Yume-1.5: A Text-Controlled Interactive World Generation Model
[02:30] 🔍 SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents
[02:59] 🔮 Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation
[03:40] 🎯 SpotEdit: Selective Region Editing in Diffusion Transformers
[04:23] 🚀 Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone
[05:09] 🔍 GRAN-TED: Generating Robust, Aligned, and Nuanced Text Embedding for Diffusion Models
[05:56] 🤖 Act2Goal: From World Model To General Goal-conditioned Policy
[06:31] ⚡ Stream-DiffVSR: Low-Latency Streamable Video Super-Resolution via Auto-Regressive Diffusion
[06:59] 🌐 Web World Models
[07:34] 🚀 DiRL: An Efficient Post-Training Framework for Diffusion Language Models
[08:19] 🎬 Video-BrowseComp: Benchmarking Agentic Video Research on Open Web
[09:02] 🧠 Training AI Co-Scientists Using Rubric Rewards
[09:39] 🧩 Monadic Context Engineering

10 min

45

1 month ago

2025.12.29 | Bird's-eye retrieval boosts small models; 4D diffusion inserts realistic objects in one click

The 13 papers in this episode:
[00:27] 🧠 Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding
[01:07] 🎬 InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion
[01:46] 🤖 MAI-UI Technical Report: Real-World Centric Foundation GUI Agents
[02:22] 👁 UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
[03:04] 🎨 ProEdit: Inversion-based Editing From Prompts Done Right
[03:58] ⏱ TimeBill: Time-Budgeted Inference for Large Language Models
[04:37] 🧠 See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning
[05:16] 🌦 Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding
[05:48] 🧠 SVBench: Evaluation of Video Generation Models on Social Reasoning
[06:27] 🔍 InSight-o3: Empowering Multimodal Foundation Models with Generalized Visual Search
[07:15] 🎨 SlideTailor: Personalized Presentation Slide Generation for Scientific Papers
[08:11] 🤖 SWE-RM: Execution-free Feedback For Software Engineering Agents
[08:48] ⚡ A 58-Addition, Rank-23 Scheme for General 3x3 Matrix Multiplication

9 min

88

1 month ago

[Weekend Special] Hottest AI papers of December, week 5 | DataFlow data refinery goes live; AI scientists can't close the loop

The 5 papers in this episode:
[00:42] TOP1(🔥188) | ⚙ DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
[02:34] TOP2(🔥105) | 🔬 Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
[05:04] TOP3(🔥85) | 🎬 SemanticGen: Video Generation in Semantic Space
[07:03] TOP4(🔥73) | 🔍 Step-DeepResearch Technical Report
[09:31] TOP5(🔥71) | 🧠 PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

12 min

99+

1 month ago

2025.12.26 | Secret-code tokens boost visual reasoning; a 3D scratchpad gives video a memory

The 6 papers in this episode:
[00:19] 🧠 Latent Implicit Visual Reasoning
[00:56] 🎬 Spatia: Video Generation with Updatable Spatial Memory
[01:36] 🧠 Schoenfeld's Anatomy of Mathematical Reasoning by Language Models
[02:11] 🔍 How Much 3D Do Video Foundation Models Encode?
[02:58] 🎯 VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation
[03:36] 🚀 GTR-Turbo: Merged Checkpoint is Secretly a Free Teacher for Agentic VLM Training

4 min

85

1 month ago

2025.12.25 | 4D dynamic understanding advances VLMs; 200x faster HD video generation on a single GPU

The 14 papers in this episode:
[00:20] 🧠 Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
[01:11] ⚡ TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
[01:52] 🧭 T2AV-Compass: Towards Unified Evaluation for Text-to-Audio-Video Generation
[02:38] 🎬 DreaMontage: Arbitrary Frame-Guided One-Shot Video Generation
[03:21] 🔍 Beyond Memorization: A Multi-Modal Ordinal Regression Benchmark to Expose Popularity Bias in Vision-Language Models
[04:07] 🎬 HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
[04:52] 🚀 Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
[05:38] 🔍 TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
[06:12] 🚀 NVIDIA Nemotron 3: Efficient and Open Intelligence
[06:57] 🎬 Learning from Next-Frame Prediction: Autoregressive Video Modeling Encodes Effective Representations
[07:27] 🎬 Streaming Video Instruction Tuning
[08:02] 🧠 Multi-hop Reasoning via Early Knowledge Alignment
[08:43] 📊 SWE-EVO: Benchmarking Coding Agents in Long-Horizon Software Evolution Scenarios
[09:24] 🏆 LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

10 min

82

1 month ago