Duration: 8 minutes
Plays: 87
Published: 2 days ago
Description
This episode covers the following 11 papers:
[00:23] 🧠 Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark
[01:03] 🕵 MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
[01:49] 🎞 REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
[03:02] 🧪 ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning
[03:43] 🔍 Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework
[04:16] 🤖 Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
[05:02] 🤖 Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
[05:32] ⚖ Mitigating Label Length Bias in Large Language Models
[06:14] 🧠 Agent READMEs: An Empirical Study of Context Files for Agentic Coding
[06:49] 🎧 Proactive Hearing Assistants that Isolate Egocentric Conversations
[07:20] 🎯 Error-Driven Scene Editing for 3D Grounding in Large Language Models
【Follow us】
You can also find us on the following platform for more beyond the podcast itself:
Xiaohongshu: AI速递