HuggingFace 每日AI论文速递 (Daily AI Paper Digest) - Episode List

2026.02.10 | ReAlign closes the image-text gap without training; MOVA generates video and audio in sync

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:34] 🔀 Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
[01:23] 🎬 MOVA: Towards Scalable and Synchronized Video-Audio Generation
[02:03] 📈 QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
[02:51] 🤖 Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
[03:24] 🎯 Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO
[04:22] ⚡ LLaDA2.1: Speeding Up Text Diffusion via Token Editing
[05:02] 📱 GEBench: Benchmarking Image Generation Models as GUI Environments
[05:52] 🎬 Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
[06:42] 🧠 Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
[07:20] 📈 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
[08:12] 📊 LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth
[08:59] 🔍 GISA: A Benchmark for General Information-Seeking Assistant
[09:56] 🧭 WorldCompass: Reinforcement Learning for Long-Horizon World Models
[10:35] 🧪 LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
[11:20] 🧭 Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
[Follow us] You can also find us on the following platforms for more beyond the podcast itself: Xiaohongshu: AI速递

12 min
78
2 months ago

2026.02.09 | AI takes patient histories like a resident physician; learning rules through interaction is true intelligence

[Contents] The 15 papers in this episode:
[00:32] 🩺 Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
[01:17] 🧭 OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions
[02:03] 📈 On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
[02:47] 🎯 F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
[03:48] ⚖ MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration
[04:33] 🤖 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
[05:14] 🧠 Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training
[06:07] 🧮 Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
[06:46] 🎯 POINTS-GUI-G: GUI-Grounding Journey
[07:45] 🧠 MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
[08:29] 🧠 Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
[09:18] 🎵 AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
[09:59] ⚡ Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers
[11:02] 🧠 InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
[11:49] 🧠 PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks

13 min
99+
3 months ago

2026.02.06 | RLVR sheds length bias; long shots keep their memory

[Contents] The 15 papers in this episode:
[00:29] 📊 Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR
[01:20] 🎬 Context Forcing: Consistent Autoregressive Video Generation with Long Context
[02:11] 🧠 RISE-Video: Can Video Generators Decode Implicit World Rules?
[02:57] 🔮 ProAct: Agentic Lookahead in Interactive Environments
[03:47] ⚡ Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
[04:39] 🧭 Steering LLMs via Scalable Interactive Oversight
[05:27] 🧠 Grounding and Enhancing Informativeness and Utility in Dataset Distillation
[06:13] 🧪 Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
[07:07] 🔍 Semantic Search over 9 Million Mathematical Theorems
[07:57] 🕷 Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening
[08:39] 🧪 CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
[09:30] 🤖 InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
[10:22] 🎬 Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
[11:14] 🔄 SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs
[12:20] 🔍 SAGE: Benchmarking and Improving Retrieval for Deep Research Agents

13 min
99+
3 months ago

2026.02.05 | ERNIE 5.0 unifies modalities; FASA sparse attention saves memory

[Contents] The 15 papers in this episode:
[00:29] 🧠 ERNIE 5.0 Technical Report
[01:11] ⚡ FASA: Frequency-aware Sparse Attention
[02:01] 📊 Training Data Efficiency in Multimodal Process Reward Models
[02:44] 🤖 WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
[03:28] ⚡ OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
[04:21] ⚡ HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
[05:02] 🤖 EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
[06:05] 🎬 Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
[06:59] 🤖 SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation
[07:44] 🔍 TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents
[08:21] 🧠 Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers
[09:12] 🤖 Rethinking the Trust Region in LLM Reinforcement Learning
[09:54] ♻ Residual Context Diffusion Language Models
[10:40] 🧱 HY3D-Bench: Generation of 3D Assets
[11:34] 🎨 AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations

12 min
99+
3 months ago

2026.02.04 | Reading code from images saves tokens; on-the-fly teams cut costs

[Contents] The 15 papers in this episode:
[00:32] 👁 CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
[01:18] 🤖 AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
[02:01] 🔍 No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs
[02:43] 🔗 daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
[03:23] 🧠 Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks
[04:06] 🎬 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
[04:56] 🤖 MARS: Modular Agent with Reflective Search for Automated AI Research
[05:41] 📊 CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
[06:25] ⚡ Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
[07:19] 🤖 SWE-World: Building Software Engineering Agents in Docker-Free Environments
[08:09] 🤖 SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
[09:14] 📊 Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation
[10:08] ⚡ Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing
[10:59] 🎯 Unified Personalized Reward Model for Vision Generation
[11:47] 🔍 WideSeek: Advancing Wide Research via Multi-Agent Scaling

12 min
99+
3 months ago

2026.02.03 | Staged training unifies the action space; MoE plus a vision encoder for parallel agents

[Contents] The 15 papers in this episode:
[00:32] 🤖 Green-VLA: Staged Vision-Language-Action Model for Generalist Robots
[01:24] 🤖 Kimi K2.5: Visual Agentic Intelligence
[02:09] 🔍 Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
[03:08] 🔍 Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
[03:57] 🔄 Closing the Loop: Universal Repository Representation with RPG-Encoder
[04:39] 🧠 UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing
[05:23] 📊 WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora
[06:28] 📚 FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents
[07:23] 🚀 SWE-Universe: Scale Real-World Verifiable Environments to Millions
[08:13] 📚 Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles
[08:58] ⚖ SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization
[09:45] 🎨 PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
[10:38] ⚙ RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System
[11:30] 🧠 Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation
[12:17] 🎬 PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards

13 min
99+
3 months ago

2026.02.02 | ASTRA synthesizes trajectories to hone tool use; THINKSAFE self-aligns for safety

[Contents] The 15 papers in this episode:
[00:33] 🤖 ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas
[01:22] 🛡 THINKSAFE: Self-Generated Safety Alignment for Reasoning Models
[02:18] 🧠 TTCS: Test-Time Curriculum Synthesis for Self-Evolving
[03:09] 🍌 PaperBanana: Automating Academic Illustration for AI Scientists
[03:51] 🔬 FourierSampler: Unlocking Non-Autoregressive Potential in Diffusion Language Models via Frequency-Guided Generation
[04:40] 🧠 ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought
[05:22] 🎯 SSL: Sweet Spot Learning for Differentiated Guidance in Agentic Optimization
[06:02] 🎯 DenseGRPO: From Sparse to Dense Reward for Flow Matching Model Alignment
[07:08] 🧠 Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification
[07:55] 📄 PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
[08:45] 🎬 DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning
[09:42] 🧠 MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
[10:24] 🦢 Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text
[11:13] 📊 Statistical Estimation of Adversarial Risk in Large Language Models under Best-of-N Sampling
[12:00] ⚡ RM-RF: Reward Model for Run-Free Unit Test Evaluation

13 min
95
3 months ago

[Month-End Special] January's hottest AI papers | mHC stabilizes gradients; GDPO untangles multiple rewards

[Contents] The 10 papers in this episode:
[00:42] TOP1 (🔥292) | 🧠 mHC: Manifold-Constrained Hyper-Connections
[03:06] TOP2 (🔥212) | 📈 GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
[04:45] TOP3 (🔥209) | 🔍 Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning
[06:59] TOP4 (🔥193) | 👶 BabyVision: Visual Reasoning Beyond Language
[08:57] TOP5 (🔥190) | 🚀 STEP3-VL-10B Technical Report
[10:39] TOP6 (🔥186) | 🤖 Agentic Reasoning for Large Language Models
[12:58] TOP7 (🔥181) | 🧹 Can LLMs Clean Up Your Mess? A Survey of Application-Ready Data Preparation with LLMs
[15:19] TOP8 (🔥171) | 🧠 LongCat-Flash-Thinking-2601 Technical Report
[17:22] TOP9 (🔥165) | 🗺 Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization
[19:17] TOP10 (🔥158) | 🧠 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives

22 min
99+
3 months ago

2026.01.30 | Spatial-intelligence benchmarks fall short; Idea2Story turns an idea into a paper in one click

[Contents] The 15 papers in this episode:
[00:29] 🧭 Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models
[01:21] 🧠 Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives
[02:19] ⚡ Scaling Embeddings Outperforms Scaling Experts in Language Models
[02:58] 🔍 OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
[03:39] 🤖 DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
[04:33] 🧠 MMFineReason: Closing the Multimodal Reasoning Gap via Open Data-Centric Methods
[05:20] 🔺 PLANING: A Loosely Coupled Triangle-Gaussian Framework for Streaming 3D Reconstruction
[06:08] 🧠 ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation
[07:01] 🧩 AgentLongBench: A Controllable Long Benchmark For Long-Contexts Agents via Environment Rollouts
[07:43] 🧠 Exploring Reasoning Reward Model for Agents
[08:39] 🎤 Qwen3-ASR Technical Report
[09:27] 🚀 Language-based Trial and Error Falls Behind in the Era of Experience
[10:16] 🌐 Typhoon-S: Minimal Open Post-Training for Sovereign Large Language Models
[11:02] ⚡ Scalable Power Sampling: Unlocking Efficient, Training-Free Reasoning for LLMs via Distribution Sharpening
[11:59] 🧠 MAD: Modality-Adaptive Decoding for Mitigating Cross-Modal Hallucinations in Multimodal Large Language Models

13 min
99+
3 months ago

2026.01.29 | Hard problems first to shore up math reasoning; LingBot generates interactive parallel worlds

[Contents] The 13 papers in this episode:
[00:33] 🧠 Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
[01:21] 🌍 Advancing Open-source World Models
[01:55] 🧠 DeepSeek-OCR 2: Visual Causal Flow
[02:58] 🚀 Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
[03:49] 🔬 Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
[04:34] 🔄 Linear representations in language models can change dramatically over a conversation
[05:26] 🚀 SERA: Soft-Verified Efficient Repository Agents
[06:01] 🤖 OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
[06:46] 🤖 GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection
[07:37] 🗣 SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper
[08:27] 📊 RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation
[09:16] ✏ SketchDynamics: Exploring Free-Form Sketches for Dynamic Intent Expression in Animation Generation
[10:07] 🚀 UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

11 min
99+
3 months ago
