HuggingFace Daily AI Paper Digest - 2025.05.08 | Unified multimodal models show strong potential; ZeroSearch improves LLM efficiency. - EarsOnMe

Duration: 10 minutes
Plays: 113
Published: 4 months ago
Host: 拨号上网

Description

The 14 papers covered in this episode are:
[00:21] 💡 Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
[01:02] 🤖 ZeroSearch: Incentivize the Search Capability of LLMs without Searching
[01:50] 🤔 Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
[02:31] 🎬 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
[03:15] 🧩 PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer
[04:04] 🤖 Benchmarking LLMs' Swarm intelligence
[04:49] 🤔 Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
[05:26] 🤖 OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
[05:58] 🌐 OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution
[06:36] 🖥 OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents
[07:19] 🧠 Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey
[08:04] 🎛 R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
[08:48] 🤝 Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation
[09:26] 📹 Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
[Follow us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu: AI速递
