2025.06.04 | Reinforcement learning improves LLM performance; UniWorld unifies visual understanding and generation.
HuggingFace Daily AI Paper Digest
The 15 papers in this episode are:
[00:23] 💡 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
[01:09] 🖼 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
[01:53] 🧪 CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs
[02:37] 🤖 VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments
[03:15] 🧠 SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis
[04:01] 🧠 OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
[04:47] 🤖 Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
[05:24] 👀 MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs
[06:10] 🤖 GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
[06:48] 🎬 Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers
[07:27] 🧩 DINGO: Constrained Inference for Diffusion LLMs
[08:10] 🎬 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation
[08:47] 🤖 Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
[09:35] 🤖 Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
[10:21] 🖼 Native-Resolution Image Synthesis

[Follow us] You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递