Host
Episode description
Source: 小宇宙 (Xiaoyuzhou)
【Sponsor】
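Listen to AI每周谈 (AI Weekly Talk) on your commute. Each week, AI每周谈 recaps the previous week's biggest AI news.
Link 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
【Contents】
The 15 papers covered in this episode: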
[00:29] 🤖 MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification
[01:10] 🏭 InCoder-32B: Code Foundation Model for Industrial Scenarios
[02:08] 🧠 Qianfan-OCR: A Unified End-to-End Model for Document Intelligence
[02:50] 🤖 Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation
[03:28] 🧠 Demystifying Video Reasoning
[04:26] 🎮 WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation
[05:26] 🧠 TRUST-SQL: Tool-Integrated Multi-Turn Reinforcement Learning for Text-to-SQL over Unknown Schemas
[06:12] 🤔 Thinking in Uncertainty: Mitigating Hallucinations in MLRMs with Latent Entropy-Aware Decoding
[07:02] 🔄 Online Experiential Learning for Language Models
[07:54] 📊 FinToolBench: Evaluating LLM Agents for Real-World Financial Tool Use
[08:47] 🚀 Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training
[09:30] 🧭 WiT: Waypoint Diffusion Transformers via Trajectory Conflict Navigation
[10:20] 🔍 AgentProcessBench: Diagnosing Step-Level Process Quality in Tool-Using Agents
[11:03] 🎨 SegviGen: Repurposing 3D Generative Model for Part Segmentation
[11:59] 🗣 SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models
【Follow us】
You can also find us on the following platform for more than what's in the podcast:
Xiaohongshu: AI速递