The 15 papers in this episode are:
[00:22] 🧠 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
[00:57] 🤖 On Path to Multimodal Generalist: General-Level and General-Bench
[01:40] 🤖 Flow-GRPO: Training Flow Matching Models via Online RL
[02:23] 🧠 Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
[03:05] 🧠 Scalable Chain of Thoughts via Elastic Reasoning
[03:41] 🔍 FG-CLIP: Fine-Grained Visual and Textual Alignment
[04:19] 🏞 3D Scene Generation: A Survey
[05:02] 🧮 ICon: In-Context Contribution for Automatic Data Selection
[05:39] 🎬 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
[06:19] 🤖 LiftFeat: 3D Geometry-Aware Local Feature Matching
[06:56] 🧱 Generating Physically Stable and Buildable LEGO Designs from Text
[07:38] 🧠 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
[08:22] 🌐 Crosslingual Reasoning through Test-Time Scaling
[09:04] 🖼 PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
[09:42] 🌐 BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese

[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递

