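HuggingFace Daily AI Paper Digest - 2026.04.21 | Understanding a Sentence and Generating an Image in One Step; Single-Step Latent Codes for Driving Reasoning - EarsOnMe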


Episode Description

Source: Xiaoyuzhou

【Contents】
The 15 papers in this episode are:
00:24 🚀 Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
01:08 🚗 OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
01:54 🤖 Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
02:41 🎮 OpenGame: Open Agentic Coding for Games
03:48 🤖 MultiWorld: Scalable Multi-Agent Multi-View Video World Models
04:44 🎬 EasyVideoR1: Easier RL for Video Understanding
05:42 🧭 WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
06:46 🧠 GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
07:34 🧠 SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
08:22 🧩 Crowded in B-Space: Calibrating Shared Directions for LoRA Merging
09:13 🧠 When Can LLMs Learn to Reason with Weak Supervision?
10:04 🤖 ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
10:52 🎬 OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
11:35 🧬 Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
12:26 🧮 MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

【Follow Us】
You can also find us on the following platforms for more content beyond the podcast:
Xiaohongshu (RED): AI速递
