The 15 papers covered in this episode:
[00:21] 🎥 DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
[01:10] 🤖 Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills
[01:49] 🖼 DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
[02:38] 🖼 Edit Transfer: Learning Image Editing via Vision In-Context Relations
[03:12] 🖼 Personalize Anything for Free with Diffusion Transformer
[03:53] 🎬 WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
[04:30] 🎨 BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
[05:14] 🛡 reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs
[05:54] 🔬 MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
[06:31] 🧠 Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
[07:09] 🤖 Free-form language-based robotic reasoning and grasping
[07:45] 🧠 R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
[08:35] 🤔 V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
[09:18] 🎬 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
[09:51] 🖼 Rewards Are Enough for Fast Photo-Realistic Text-to-image Generation

【Follow Us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
