HuggingFace Daily AI Paper Digest - 2026.04.09 | "Template Disease" in RL Agents; Step-by-Step Image Generation Is More Controllable - EarsOnMe

Episode description

Source: Xiaoyuzhou (小宇宙)

【Sponsor】

Listen to AI每周谈 (AI Weekly Talk) on your commute. Each week, it recaps the previous week's major AI news.

Link: 🔗 https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

【Contents】

The 15 papers in this episode:

[00:31] 🧠 RAGEN-2: Reasoning Collapse in Agentic RL

[01:21] 🎨 Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

[02:00] ⚡ MARS: Enabling Autoregressive Models Multi-Token Generation

[02:51] 🌍 INSPATIO-WORLD: A Real-Time 4D World Simulator via Spatiotemporal Autoregressive Modeling

[03:48] 🔬 SEVerA: Verified Synthesis of Self-Evolving Agents

[04:41] 🔍 TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders

[05:26] ⚡ FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling

[06:17] 🔄 FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching

[07:00] 🧠 Neural Computers

[07:37] 🎯 Personalized RewardBench: Evaluating Reward Models with Human Aligned Personalization

[08:22] 💡 Learning to Hint for Reinforcement Learning

[09:11] 🧠 Fast Spatial Memory with Elastic Test-Time Training

[09:44] 🎬 MoRight: Motion Control Done Right

[10:21] 🌐 Improving Semantic Proximity in Information Retrieval through Cross-Lingual Alignment

[11:02] 📊 Beyond Hard Negatives: The Importance of Score Distribution in Knowledge Distillation for Dense Retrieval

【Follow Us】

You can also find us on the following platforms for more beyond the podcast:

Xiaohongshu (小红书): AI速递
