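HuggingFace Daily AI Paper Digest - 2026.05.14 | MinT tackles the large-model scaling problem with LoRA patches; on MulTaBench's aligned image-text tasks, small models beat large ones - EarsOnMe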

Episode description

Source: Xiaoyuzhou (小宇宙)

[Contents]
The 15 papers in this episode are:
[00:25] 🏗 MinT: Managed Infrastructure for Training and Serving Millions of LLMs
[01:08] 📊 MulTaBench: Benchmarking Multimodal Tabular Learning with Text and Image
[02:14] 🎬 AnyFlow: Any-Step Video Diffusion Model with On-Policy Flow Map Distillation
[03:02] 📚 Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
[03:48] 🤖 Predicting Decisions of AI Agents from Limited Interaction through Text-Tabular Modeling
[04:27] 🖼 Qwen-Image-VAE-2.0 Technical Report
[05:05] 🎨 Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
[06:01] 🎯 TrackCraft3R: Repurposing Video Diffusion Transformers for Dense 3D Tracking
[06:57] 🧠 Many-Shot CoT-ICL: Making In-Context Learning Truly Learn
[07:58] 🎯 FrameSkip: Learning from Fewer but More Informative Frames in VLA Training
[08:52] 🌅 The DAWN of World-Action Interactive Models
[09:43] 🌊 Asymmetric Flow Models
[10:24] 🤖 Learning Agentic Policy from Action Guidance
[11:23] 💻 Retrieval is Cheap, Show Me the Code: Executable Multi-Hop Reasoning for Retrieval-Augmented Generation
[12:13] 🎬 PresentAgent-2: Towards Generalist Multimodal Presentation Agents
[Follow us]
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu: AI速递
