Duration: 13 minutes
Plays: 136
Published: 1 week ago
【Sponsor】
Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news.
Link 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
【Contents】
The 15 papers in this episode:
[00:32] 🤖 BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
[01:22] ⚠ The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models
[02:26] 🎥 HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding
[03:14] 🚀 EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience
[04:02] 🧪 LLM-in-Sandbox Elicits General Agentic Intelligence
[04:54] 🚀 Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
[05:34] 🎭 SAMTok: Representing Any Mask with Two Words
[06:30] 🚀 Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
[07:23] 🔬 Learning to Discover at Test Time
[08:08] 🔍 Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing
[09:06] ⚙ Towards Automated Kernel Generation in the Era of LLMs
[09:48] 🔄 OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
[10:45] 💻 Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces
[11:29] 🗣 Qwen3-TTS Technical Report
[12:13] 🤖 Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning
【Follow us】
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu: AI速递