Episode description
Source: Xiaoyuzhou (小宇宙)
【Contents】
The 15 papers covered in this episode:
[00:25] 🏗 MinT: Managed Infrastructure for Training and Serving Millions of LLMs
[01:08] 📊 MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
[02:14] 🎬 AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
[03:02] 📚 Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
[03:48] 🤖 Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling
[04:27] 🖼 Qwen-Image-VAE-2.0 Technical Report
[05:05] 🎨 Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
[06:01] 🎯 TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
[06:57] 🧠 Many-Shot CoT-ICL: Making In-Context Learning Truly Learn
[07:58] 🎯 FrameSkip: Learning from Fewer but More Informative Frames in VLA Training
[08:52] 🌅 The DAWN of World-Action Interactive Models
[09:43] 🌊 Asymmetric Flow Models
[10:24] 🤖 Learning Agentic Policy from Action Guidance
[11:23] 💻 Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation
[12:13] 🎬 PresentAgent-2: Towards Generalist Multimodal Presentation Agents
【Follow Us】
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递