Description
https://xiaoyuzhoufm.com

The 14 papers in this episode are:

[00:22] 🧠 Kimi-VL Technical Report

[01:05] 🎬 VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

[01:54] 🖼 MM-IFEngine: Towards Multimodal Instruction Following

[02:35] 🖼 VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning

[03:15] 🤔 DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

[03:54] 🧩 HoloPart: Generative 3D Part Amodal Segmentation

[04:36] 🤖 C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing

[05:11] 🤖 MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations

[05:58] 🖼 Scaling Laws for Native Multimodal Models

[06:30] 🧠 SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement

[07:16] 🖼 Towards Visual Text Grounding of Multimodal Large Language Model

[07:57] 🤖 MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection

[08:39] 🧭 Compass Control: Multi Object Orientation Control for Text-to-Image Generation

[09:22] 📍 TAPNext: Tracking Any Point (TAP) as Next Token Prediction

[Follow Us]

You can also find us on the following platform for more information beyond the podcast itself.

Xiaohongshu: AI速递

Host
拨号上网