2026.02.17 | Query-Anchored User Profiles; Quantum-Native Databases

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:29] 🧠 Query as Anchor: Scenario-Adaptive User Representation via Large Language Model
[01:14] ⚛ Qute: Towards Quantum-Native Database
[01:59] 🧠 InnoEval: On Research Idea Evaluation as a Knowledge-Grounded, Multi-Perspective Reasoning Problem
[03:05] 🔍 REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
[03:56] 🚀 BitDance: Scaling Autoregressive Generative Models with Binary Tokens
[04:38] 🧠 Experiential Reinforcement Learning
[05:24] 🧠 Embed-RL: Reinforcement Learning for Reasoning-Driven Multimodal Embeddings
[06:21] 🧩 UniWeTok: An Unified Binary Tokenizer with Codebook Size $2^{128}$ for Unified Multimodal Large Language Model
[07:13] 🔍 BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents
[08:18] 🧠 LaViDa-R1: Advancing Reasoning for Unified Multimodal Diffusion Language Models
[09:02] 🗣 Conversational Image Segmentation: Grounding Abstract Concepts with Scalable Supervision
[10:00] 🧠 Nanbeige4.1-3B: A Small General Model that Reasons, Aligns, and Acts
[10:49] 🎨 FireRed-Image-Edit-1.0 Technical Report
[11:26] 🧬 Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training
[12:04] 🌐 WebWorld: A Large-Scale World Model for Web Agent Training
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

13 min
79
1 month ago

2026.02.16 | Feature Activations Fill Data Gaps; Region Distillation Internalizes Zoom

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:30] 🧠 Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs
[01:19] 🔍 Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
[02:03] 🏥 MedXIAOHE: A Comprehensive Recipe for Building Medical MLLMs
[02:43] 🎯 OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
[03:29] 🔬 What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis
[04:18] 🤖 RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models
[05:05] 🤖 ABot-M0: VLA Foundation Model for Robotic Manipulation with Action Manifold Learning
[05:53] 🎬 Towards Universal Video MLLMs with Attribute-Structured and Quality-Verified Instructions
[06:55] 🤝 Intelligent AI Delegation
[07:49] 📍 GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics
[08:39] ⚙ BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models
[09:37] ⚡ FLAC: Maximum Entropy RL via Kinetic Energy Regularized Bridge Matching
[10:14] 🔍 On Robustness and Chain-of-Thought Consistency of RL-Finetuned VLMs
[11:03] ⚡ DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
[11:48] ⚡ CoPE-VideoLM: Codec Primitives For Efficient Video Language Models
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

13 min
53
1 month ago

2026.02.13 | Self-Evolving AI Struggles to Stay Safe; Unified Tokens for Audio Foundation Models

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:31] ⚠ The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies
[01:24] 🎵 MOSS-Audio-Tokenizer: Scaling Audio Tokenizers for Future Audio Foundation Models
[02:28] 🧠 Learning beyond Teacher: Generalized On-Policy Distillation with Reward Extrapolation
[03:05] 🤖 GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning
[03:56] ⚖ LawThinker: A Deep Research Legal Agent in Dynamic Environments
[04:33] 🔍 Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning
[05:16] 🎨 Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching
[06:01] 🚀 DeepGen 1.0: A Lightweight Unified Multimodal Model for Advancing Image Generation and Editing
[06:55] 🧩 Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models
[07:38] 🧠 Thinking with Drafting: Optical Decompression via Logical Reconstruction
[08:17] 🗳 dVoting: Fast Voting for dLLMs
[09:09] 🤖 RISE: Self-Improving Robot Policy with Compositional World Model
[09:54] 🤖 $\chi_{0}$: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies
[10:48] 🤖 EgoHumanoid: Unlocking In-the-Wild Loco-Manipulation with Robot-Free Egocentric Demonstration
[11:45] 🔍 Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

13 min
72
1 month ago

2026.02.12 | Sparse MoE Rivals GPT-5; GENIUS Measures Fluid Intelligence

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:28] ⚡ Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
[01:06] 🧠 GENIUS: Generative Fluid Intelligence Evaluation Suite
[01:46] 🤖 PhyCritic: Multimodal Critic Models for Physical AI
[02:18] ⚙ ASA: Training-Free Representation Engineering for Tool-Calling Agents
[02:59] 🧠 When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning
[03:38] 🧮 Towards Autonomous Mathematics Research
[04:15] 🎬 TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions
[05:12] 🧠 G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design
[06:02] ⚙ FeatureBench: Benchmarking Agentic Coding for Complex Feature Development
[06:44] 🧑 DataChef: Cooking Up Optimal Data Recipes for LLM Adaptation via Reinforcement Learning
[07:28] 🚀 ROCKET: Rapid Optimization via Calibration-guided Knapsack Enhanced Truncation for Efficient Model Compression
[08:27] 📈 Online Causal Kalman Filtering for Stable and Effective Policy Optimization
[09:24] 🧠 Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models
[10:06] 🗣 Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models
[10:47] 🔄 Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

11 min
99+
1 month ago

2026.02.11 | OPUS Selects Data Aligned with Updates; Code2World Previews GUIs with Code

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:33] 🚀 OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration
[01:17] 💻 Code2World: A GUI World Model via Renderable Code Generation
[02:05] 🤖 UI-Venus-1.5 Technical Report
[02:58] 🧠 Chain of Mindset: Reasoning with Adaptive Cognitive Modes
[03:52] 🧠 SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
[04:29] 🔬 P1-VL: Bridging Visual Perception and Scientific Reasoning in Physics Olympiads
[05:24] 🤖 Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning
[05:58] 🔍 Prism: Spectral-Aware Block-Sparse Attention
[06:41] ⚡ DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents
[07:23] 🎬 Olaf-World: Orienting Latent Actions for Video World Modeling
[08:18] 🎨 Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss
[09:09] 🍌 Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling
[09:50] 🎯 SCALE: Self-uncertainty Conditioned Adaptive Looking and Execution for Vision-Language-Action Models
[10:37] 🤖 BagelVLA: Enhancing Long-Horizon Manipulation via Interleaved Vision-Language-Action Generation
[11:31] 🎬 TokenTrim: Inference-Time Token Pruning for Autoregressive Long Video Generation
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

12 min
99+
1 month ago

2026.02.10 | ReAlign Bridges the Image-Text Gap Without Training; MOVA Generates Synchronized Video and Audio

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:34] 🔀 Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models
[01:23] 🎬 MOVA: Towards Scalable and Synchronized Video-Audio Generation
[02:03] 📈 QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
[02:51] 🤖 Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning
[03:24] 🎯 Alleviating Sparse Rewards by Modeling Step-Wise and Long-Term Sampling Effects in Flow-Based GRPO
[04:22] ⚡ LLaDA2.1: Speeding Up Text Diffusion via Token Editing
[05:02] 📱 GEBench: Benchmarking Image Generation Models as GUI Environments
[05:52] 🎬 Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition
[06:42] 🧠 Learning Query-Aware Budget-Tier Routing for Runtime Agent Memory
[07:20] 📈 Weak-Driven Learning: How Weak Agents make Strong Agents Stronger
[08:12] 📊 LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth
[08:59] 🔍 GISA: A Benchmark for General Information-Seeking Assistant
[09:56] 🧭 WorldCompass: Reinforcement Learning for Long-Horizon World Models
[10:35] 🧪 LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning
[11:20] 🧭 Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration?
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

12 min
78
1 month ago

2026.02.09 | AI Takes Patient Histories Like a Resident Physician; Learning Rules Through Interaction Is True Intelligence

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:32] 🩺 Baichuan-M3: Modeling Clinical Inquiry for Reliable Medical Decision-Making
[01:17] 🧭 OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions
[02:03] 📈 On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models
[02:47] 🎯 F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare
[03:48] ⚖ MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration
[04:33] 🤖 DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos
[05:14] 🧠 Self-Improving Multilingual Long Reasoning via Translation-Reasoning Integrated Training
[06:07] 🧮 Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
[06:46] 🎯 POINTS-GUI-G: GUI-Grounding Journey
[07:45] 🧠 MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
[08:29] 🧠 Back to Basics: Revisiting Exploration in Reinforcement Learning for LLM Reasoning via Generative Probabilities
[09:18] 🎵 AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders
[09:59] ⚡ Canzona: A Unified, Asynchronous, and Load-Balanced Framework for Distributed Matrix-based Optimizers
[11:02] 🧠 InftyThink+: Effective and Efficient Infinite-Horizon Reasoning via Reinforcement Learning
[11:49] 🧠 PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

13 min
99+
1 month ago

2026.02.06 | RLVR Without Length Bias; Long Takes That Keep Their Memory

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:29] 📊 Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR
[01:20] 🎬 Context Forcing: Consistent Autoregressive Video Generation with Long Context
[02:11] 🧠 RISE-Video: Can Video Generators Decode Implicit World Rules?
[02:57] 🔮 ProAct: Agentic Lookahead in Interactive Environments
[03:47] ⚡ Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
[04:39] 🧭 Steering LLMs via Scalable Interactive Oversight
[05:27] 🧠 Grounding and Enhancing Informativeness and Utility in Dataset Distillation
[06:13] 🧪 Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities
[07:07] 🔍 Semantic Search over 9 Million Mathematical Theorems
[07:57] 🕷 Spider-Sense: Intrinsic Risk Sensing for Efficient Agent Defense with Hierarchical Adaptive Screening
[08:39] 🧪 CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
[09:30] 🤖 InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
[10:22] 🎬 Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning
[11:14] 🔄 SwimBird: Eliciting Switchable Reasoning Mode in Hybrid Autoregressive MLLMs
[12:20] 🔍 SAGE: Benchmarking and Improving Retrieval for Deep Research Agents
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

13 min
99+
1 month ago

2026.02.05 | ERNIE 5.0 Unifies Modalities; FASA Sparse Attention Saves Memory

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:29] 🧠 ERNIE 5.0 Technical Report
[01:11] ⚡ FASA: Frequency-aware Sparse Attention
[02:01] 📊 Training Data Efficiency in Multimodal Process Reward Models
[02:44] 🤖 WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning
[03:28] ⚡ OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models
[04:21] ⚡ HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
[05:02] 🤖 EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
[06:05] 🎬 Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization
[06:59] 🤖 SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation
[07:44] 🔍 TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents
[08:21] 🧠 Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers
[09:12] 🤖 Rethinking the Trust Region in LLM Reinforcement Learning
[09:54] ♻ Residual Context Diffusion Language Models
[10:40] 🧱 HY3D-Bench: Generation of 3D Assets
[11:34] 🎨 AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

12 min
99+
1 month ago

2026.02.04 | Reading Code as Images Saves Tokens; Ad Hoc Teams Cut Costs

HuggingFace Daily AI Paper Digest

[Sponsor] Listen to AI Weekly Talk (AI每周谈) on your commute. Each week it recaps the previous week's major AI news. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:32] 👁 CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
[01:18] 🤖 AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration
[02:01] 🔍 No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs
[02:43] 🔗 daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently
[03:23] 🧠 Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks
[04:06] 🎬 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
[04:56] 🤖 MARS: Modular Agent with Reflective Search for Automated AI Research
[05:41] 📊 CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs
[06:25] ⚡ Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis
[07:19] 🤖 SWE-World: Building Software Engineering Agents in Docker-Free Environments
[08:09] 🤖 SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training
[09:14] 📊 Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation
[10:08] ⚡ Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing
[10:59] 🎯 Unified Personalized Reward Model for Vision Generation
[11:47] 🔍 WideSeek: Advancing Wide Research via Multi-Agent Scaling
[Follow us] You can also find us on the following platforms for more beyond the podcast: Xiaohongshu: AI速递

12 min
99+
1 month ago
