The 14 papers in this episode are:
[00:22] 🧠 Kimi-VL Technical Report
[01:05] 🎬 VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
[01:54] 🖼 MM-IFEngine: Towards Multimodal Instruction Following
[02:35] 🖼 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
[03:15] 🤔 DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
[03:54] 🧩 HoloPart: Generative 3D Part Amodal Segmentation
[04:36] 🤖 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing
[05:11] 🤖 MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations
[05:58] 🖼 Scaling Laws for Native Multimodal Models
[06:30] 🧠 SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
[07:16] 🖼 Towards Visual Text Grounding of Multimodal Large Language Model
[07:57] 🤖 MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
[08:39] 🧭 Compass Control: Multi Object Orientation Control for Text-to-Image Generation
[09:22] 📍 TAPNext: Tracking Any Point (TAP) as Next Token Prediction

[Follow us]
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu: AI速递
