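2026.04.28 | Reinforcement learning pushes video generation toward geometric consistency; AI companies assemble teams LEGO-style to cut costs and raise efficiency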
HuggingFace Daily AI Paper Digest
[Contents] The 15 papers in this episode:
[00:24] 🌍 World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
[01:29] 🏢 From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
[02:26] 🧠 ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
[03:23] 🛡 Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
[04:12] 🖼 Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
[05:02] 🤖 ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
[06:20] ✍ SketchVLM: Vision language models can annotate images to explain thoughts and guide users
[07:17] 🔬 Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
[08:24] ⚖ Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment
[09:20] 🔀 Efficient Agent Evaluation via Diversity-Guided User Simulation
[10:02] ⚡ For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
[11:04] 🎬 OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer
[12:03] 📷 UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
[12:49] 📄 TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction
[13:56] 🔄 How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models

[Follow us] You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递
The transcript for this episode is available on Xiaoyuzhou.