本期的 21 篇论文如下:
[00:26] 🔍 Differential Transformer(差分Transformer)
[01:04] 🧠 LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations(大语言模型知多于表:关于LLM幻觉的内在表征)
[01:50] 📹 VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide(视频指南:通过教师指导提升视频扩散模型无需训练)
[02:28] 📈 FAN: Fourier Analysis Networks(傅里叶分析网络)
[03:05] 🏥 Named Clinical Entity Recognition Benchmark(命名临床实体识别基准)
[03:37] 🔬 ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery(科学智能基准:面向数据驱动科学发现的语言智能体严格评估)
[04:19] 🎶 UniMuMo: Unified Text, Music and Motion Generation(统一文本、音乐与动作生成)
[04:55] 🔍 TLDR: Token-Level Detective Reward Model for Large Vision Language Models(TLDR:大视觉语言模型的令牌级侦探奖励模型)
[05:35] 🎵 Presto! Distilling Steps and Layers for Accelerating Music Generation(快速!加速音乐生成的步骤和层级蒸馏)
[06:08] 🖥 Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents(像人类一样导航数字世界:GUI代理的通用视觉基础)
[06:49] 🖼 OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction(全能展台:通过多模态指令学习图像合成的潜在控制)
[07:29] 🌀 MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion(MonST3R:一种在动态场景中估计几何的简单方法)
[08:09] 🧠 LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning(LLaMA-Berry:O1类奥林匹克级数学推理的成对优化)
[08:50] 📊 MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs(MathHay:LLMs长上下文数学推理自动化基准)
[09:39] 📊 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models(GSM-符号化:理解大型语言模型在数学推理中的局限性)
[10:34] 🤖 Autonomous Character-Scene Interaction Synthesis from Text Instruction(从文本指令自主合成角色场景互动)
[11:12] 🧩 TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles(TurtleBench:通过真实世界的Yes/No谜题评估顶级语言模型)
[12:00] 🤖 Grounding Language in Multi-Perspective Referential Communication(多视角指称通信中的语言接地)
[12:48] 🎯 SePPO: Semi-Policy Preference Optimization for Diffusion Alignment(SePPO:扩散模型对齐的半策略偏好优化)
[13:25] 🧩 What Matters for Model Merging at Scale?(大规模模型合并的关键因素是什么?)
[14:02] 📊 SELECT: A Large-Scale Benchmark of Data Curation Strategies for Image Classification(SELECT:图像分类数据策展策略的大规模基准)
【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递

空空如也
暂无小宇宙热门评论