HuggingFace Daily AI Paper Digest - Episode List

[Contents] The 5 papers in this episode:
00:31 TOP1(🔥244) | 👗 Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
02:42 TOP2(🔥229) | 🔮 LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
05:07 TOP3(🔥154) | 🤖 AgentSPEX: An Agent SPecification and EXecution Language
07:06 TOP4(🔥96) | 🚀 Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
08:48 TOP5(🔥84) | 🚗 OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
[Follow us] You can also find us on the following platforms for more information beyond the podcast. Xiaohongshu: AI速递

11 min

1 month ago

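2026.04.24 | LLaTiSA teaches models to read time series through four difficulty levels; WorldMark evaluates video world models with a unified benchmark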

[Contents] The 15 papers in this episode:
00:23 📈 LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
01:11 🎮 WorldMark: A Unified Benchmark Suite for Interactive Video World Models
01:54 🤖 UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
02:44 🎨 StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
03:56 ⏩ Seeing Fast and Slow: Learning the Flow of Time in Videos
04:39 ⚡ TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
05:16 🧠 Hybrid Policy Distillation for LLMs
05:48 🧠 Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
06:44 🤖 VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
07:43 🧩 Context Unrolling in Omni Models
08:31 🎨 EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
09:34 🔗 UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
10:25 🌐 WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning
11:14 🔍 Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models
12:11 🔍 Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

2026.04.23 | LLaDA2.0 unifies multimodal understanding and generation; near-future experience as a plug-in for RL

[Contents] The 15 papers in this episode:
00:28 🔮 LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
01:17 🔮 Near-Future Policy Optimization
02:07 🤖 DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
02:53 🤖 DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
03:42 🎭 Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
04:36 🧠 Exploring Spatial Intelligence from a Generative Perspective
05:21 🤖 A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
06:18 🎤 WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training
07:06 🤖 SWE-chat: Coding Agent Interactions From Real Users in the Wild
07:53 🤖 Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
08:36 🧠 Convergent Evolution: How Different Language Models Learn Similar Number Representations
09:21 🤝 SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
09:57 🎬 ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
10:34 🔧 Visual Reasoning through Tool-supervised Reinforcement Learning
11:09 🤖 AI scientists produce results without reasoning scientifically

12 min

2026.04.22 | Virtual try-on generates HD results in 3.9 seconds; co-generation keeps HOI videos physically consistent

[Contents] The 15 papers in this episode:
00:23 👗 Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
01:05 🎬 CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
01:58 🤖 AgentSPEX: An Agent SPecification and EXecution Language
02:51 📐 AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
03:33 🚀 TEMPO: Scaling Test-time Training for Large Reasoning Models
04:26 🎮 PlayCoder: Making LLM-Generated GUI Code Playable
05:08 🕶 ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning
05:58 🤖 Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language
06:44 ⚖ AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation
07:31 🔄 Dual-View Training for Instruction-Following Information Retrieval
08:41 🔍 Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers
09:20 🔗 Understanding and Enforcing Weight Disentanglement in Task Arithmetic
10:00 ⚡ Speculative Decoding for Autoregressive Video Generation
11:01 🧠 Target-Oriented Pretraining Data Selection via Neuron-Activated Graph
11:41 🧩 UniMesh: Unifying 3D Mesh Understanding and Generation

12 min

2026.04.21 | One-step image generation that understands full sentences; single-step latent codes handle driving reasoning

[Contents] The 15 papers in this episode:
00:24 🚀 Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
01:08 🚗 OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
01:54 🤖 Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
02:41 🎮 OpenGame: Open Agentic Coding for Games
03:48 🤖 MultiWorld: Scalable Multi-Agent Multi-View Video World Models
04:44 🎬 EasyVideoR1: Easier RL for Video Understanding
05:42 🧭 WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
06:46 🧠 GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
07:34 🧠 SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
08:22 🧩 Crowded in B-Space: Calibrating Shared Directions for LoRA Merging
09:13 🧠 When Can LLMs Learn to Reason with Weak Supervision?
10:04 🤖 ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
10:52 🎬 OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
11:35 🧬 Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
12:26 🧮 MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

2026.04.20 | A training-free image-quality boost for DPMs; two bit flips wreck a model

[Contents] The 15 papers in this episode:
00:20 🔍 Elucidating the SNR-t Bias of Diffusion Probabilistic Models
01:00 💥 Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
01:45 🧠 PersonaVLM: Long-Term Personalized Multimodal LLMs
02:56 🧩 Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems
03:40 ✂ Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
04:32 🚀 Qwen3.5-Omni Technical Report
05:17 🧱 Repurposing 3D Generative Model for Autoregressive Layout Generation
06:02 🔍 (1D) Ordered Tokens Enable Efficient Test-Time Search
06:55 📈 QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies
07:36 🧠 Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
08:29 🔍 TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment
09:33 💡 Can Large Language Models Reinvent Foundational Algorithms?
10:17 📊 GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows
11:10 ⚡ AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
11:55 🎭 Hierarchical Codec Diffusion for Video-to-Speech Generation

[Weekend Special] Hottest AI papers of the third week of April | 3D detection from a single image; memory prevents collapse

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week, AI每周谈 recaps the major AI news of the previous week. Link: 🔗 www.xiaoyuzhoufm.com
[Contents] The 5 papers in this episode:
00:41 TOP1(🔥237) | 🔍 WildDet3D: Scaling Promptable 3D Detection in the Wild
02:54 TOP2(🔥135) | 🧠 The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
05:12 TOP3(🔥134) | 🤖 ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
07:55 TOP4(🔥134) | 🎬 Seedance 2.0: Advancing Video Generation for World Complexity
10:27 TOP5(🔥121) | ⚛ QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation

2026.04.17 | HY-World 2.0 unifies generation and reconstruction; DR³-Eval builds a reproducible deep research benchmark

[Contents] The 15 papers in this episode:
00:31 🌍 HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
01:24 🔬 DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation
02:16 🚗 RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
03:15 🤖 HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
04:04 🛡 ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
05:02 🧠 How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data
05:37 🌍 GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
06:20 🔍 UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
06:56 🧠 Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models
07:34 🛣 TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification
08:41 🤖 Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
09:25 🎬 Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction
10:13 🧭 Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
11:03 🚀 LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning
11:45 ⚡ KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs

2026.04.16 | Seedance 2.0 unifies multimodal generation; RationalRewards makes reward models reason

[Contents] The 15 papers in this episode:
00:31 🎬 Seedance 2.0: Advancing Video Generation for World Complexity
01:21 🧠 RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
02:08 🧠 SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
02:49 🧪 OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
03:39 🎮 GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
04:22 🧠 Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
05:21 🧠 From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space
06:16 🎯 Target Policy Optimization
07:12 🧩 Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure
08:04 🤖 SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering
08:40 🔍 Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself
09:26 🔍 TIP: Token Importance in On-Policy Distillation
10:21 🔬 ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
11:15 🔍 UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
12:17 🤖 TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration

2026.04.15 | ClawGUI open-sources a full toolkit; KnowRL trims hints to boost efficiency

[Contents] The 15 papers in this episode:
00:33 🤖 ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
01:21 🧠 KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
02:16 🧠 Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
03:09 🤖 Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
04:01 🧠 SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
04:47 🤖 Toward Autonomous Long-Horizon Engineering for ML Research
05:33 ⚖ BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
06:17 🔍 Towards Long-horizon Agentic Multimodal Search
06:57 🌍 Lyra 2.0: Explorable Generative 3D Worlds
07:40 ⚡ Self-Adversarial One Step Generation via Condition Shifting
08:37 🤖 Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
09:20 ⚖ Many-Tier Instruction Hierarchy in LLM Agents
10:04 🚀 Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
10:52 🧠 Rethinking the Diffusion Model from a Langevin Perspective
11:44 🤖 LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

12 min

2026.04.14 | Remembering penalized errors helps LLMs score higher; a full breakdown of the attention sink mechanism

[Sponsor] Listen to AI每周谈 (AI Weekly Talk) on your commute. Every week, AI每周谈 recaps the major AI news of the previous week. Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:31] 🧠 The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
[01:20] 🔍 Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
[02:08] ⚛ QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
[02:59] 🎬 OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
[03:35] 🎨 Strips as Tokens: Artist Mesh Generation with Native UV Segmentation
[04:11] 🎬 Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
[05:13] 🔍 Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
[05:57] 🔍 CodeTracer: Towards Traceable Agent States
[06:45] 🧪 CocoaBench: Evaluating Unified Digital Agents in the Wild
[07:32] 🕸 Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
[08:17] 🤔 Introspective Diffusion Language Models
[09:12] 🧠 Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
[09:50] 🎬 Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation
[10:38] 🎵 Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
[11:33] ⚡ SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding