HuggingFace Daily AI Paper Digest - 2025.08.22 | Scientific Multimodality Narrows the Gap; GUI Automation Tackles Challenges - EarsOnMe

Duration: 7 min
Plays: 116
Published: 2 weeks ago

Host: 拨号上网

Description:

The 15 papers in this episode:
[00:22] 🧪 Intern-S1: A Scientific Multimodal Foundation Model
[00:46] 🤖 Mobile-Agent-v3: Foundamental Agents for GUI Automation
[01:10] ✅ Deep Think with Confidence
[01:31] 🤔 LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
[02:01] 🎬 Waver: Wave Your Way to Lifelike Video Generation
[02:25] 🏞 SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass
[02:56] 📚 A Survey on Large Language Model Benchmarks
[03:20] 🤸 ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
[03:46] 🎨 Visual Autoregressive Modeling for Instruction-Guided Image Editing
[04:15] 🤖 aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists
[04:40] 🗺 "Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries
[05:12] 🔍 When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding
[05:44] 💰 Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models
[06:08] ⚡ Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds
[06:37] 🫂 INTIMA: A Benchmark for Human-AI Companionship Behavior
[Follow Us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
