Episode List: HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest) - EarsOnMe | Discover and listen to popular podcasts from Xiaoyuzhou

2024.10.31 Daily AI Papers | A new benchmark for multi-turn conversational evaluation; an efficient inference model for robotics tasks.

The 5 papers in this episode:
[00:29] 🗣 CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
[01:09] 🤖 A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
[01:50] 🔍 Stealing User Prompts from Mixture of Experts
[02:26] 🩺 AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels
[02:58] 🔄 TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
[Follow Us] Find us on the following platform for more beyond the podcast: Xiaohongshu (小红书): AI速递

4 min

2024.10.30 Daily AI Papers | Multimodal unlearning remains a major challenge; AutoKaggle boosts data-science efficiency.

The 8 papers in this episode:
[00:33] 🧠 CLEAR: Character Unlearning in Textual and Visual Modalities
[01:10] 🤖 AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
[01:46] 🤖 SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization
[02:26] 🌐 OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization
[03:13] 🧠 Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
[03:52] 🚀 ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
[04:31] 🤖 Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Dataset
[05:17] 🤖 Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning

6 min

2024.10.29 Daily AI Papers | Performance gains for a Polish language model; innovation in heterogeneous agent systems.

The 17 papers in this episode:
[00:24] 🇵🇱 Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation
[01:00] 🤖 AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant
[01:39] 🤖 GPT-4o System Card
[02:21] 📄 Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
[03:08] 🤖 LongReward: Improving Long-context Large Language Models with AI Feedback
[03:43] 🎥 MarDini: Masked Autoregressive Diffusion for Video Generation at Scale
[04:22] 🌟 DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
[05:10] 🧩 GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
[05:49] 📚 A Survey of Small Language Models
[06:23] 💾 COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
[06:58] ⚡ Fast Best-of-N Decoding via Speculative Rejection
[07:36] 🔍 Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
[08:25] 🎥 LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
[09:00] 🤖 Neural Fields in Robotics: A Survey
[09:40] 🗣 Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction
[10:15] 🩺 Language Models And A Second Opinion Use Case: The Pocket Professional
[10:55] 🤖 Leveraging Locality to Boost Sample Efficiency in Robotic Manipulation

12 min

2024.10.28 Daily AI Papers | Visual-temporal prompting improves interaction; continuous diffusion models enhance speech synthesis.

The 13 papers in this episode:
[00:25] 🚀 ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
[01:14] 🗣 Continuous Speech Synthesis using per-token Latent Diffusion
[01:55] ⚡ Teach Multimodal LLMs to Comprehend Electrocardiographic Images
[02:39] 🌐 Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
[03:23] ⚡ FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
[03:56] 🎧 MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
[04:34] 🧠 Counting Ability of Large Language Models and Impact of Tokenization
[05:08] 🧠 Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning
[05:46] 🤖 Reflection-Bench: probing AI intelligence with reflection
[06:23] 🤖 Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
[06:57] 🔍 Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
[07:35] 🔍 Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
[08:15] 🤖 Dynamic 3D Gaussian Tracking for Graph-Based Neural Dynamics Modeling

9 min

[Weekend Special] Hottest AI Papers of Week 4, October | Fast convergence for few-shot NeRF; more accurate long-video segmentation.

The 5 papers in this episode:
[00:44] TOP1(🔥79) | ⚡ FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
[02:42] TOP2(🔥60) | 🌳 SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
[04:19] TOP3(🔥58) | 🚀 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
[06:11] TOP4(🔥55) | 🤖 CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
[08:28] TOP5(🔥52) | 💼 UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

10 min

2024.10.25 Daily AI Papers | Significant memory-efficiency gains; stronger long-context alignment.

The 21 papers in this episode:
[00:26] 🚀 Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss
[01:09] 🔄 LOGO -- Long cOntext aliGnment via efficient preference Optimization
[01:45] 🧠 Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch
[02:30] 🤔 Can Knowledge Editing Really Correct Hallucinations?
[03:17] 🎮 Unbounded: A Generative Infinite Game of Character Life Simulation
[04:02] 🎥 Framer: Interactive Frame Interpolation
[04:48] 📊 Distill Visual Chart Reasoning Ability from LLMs to MLLMs
[05:35] 📉 Why Does the Effective Context Length of LLMs Fall Short?
[06:14] 🔒 Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
[06:52] 🔧 Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
[07:27] 🌍 CAMEL-Bench: A Comprehensive Arabic LMM Benchmark
[08:09] 📊 Should We Really Edit Language Models? On the Evaluation of Edited Language Models
[08:43] 🌐 ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
[09:20] 🌐 WAFFLE: Multi-Modal Model for Automated Front-End Development
[09:52] 📚 CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models
[10:30] 🔄 Stable Consistency Tuning: Understanding and Improving Consistency Models
[11:10] 🧮 Language Models are Symbolic Learners in Arithmetic
[12:00] 🐍 Taipan: Efficient and Expressive State Space Language Models with Selective Attention
[12:44] 🔄 Value Residual Learning For Alleviating Attention Concentration In Transformers
[13:23] 📚 Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits
[14:03] 🤖 Data Scaling Laws in Imitation Learning for Robotic Manipulation

15 min

2024.10.24 Daily AI Papers | Optimization for multi-image tasks; evaluating video generation models.

The 10 papers in this episode:
[00:25] 🖼 MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
[01:09] 🌍 WorldSimBench: Towards Video Generation Models as World Simulators
[01:47] 🌊 Scaling Diffusion Language Models via Adaptation from Autoregressive Models
[02:20] 📱 Lightweight Neural App Control
[03:01] 🏠 ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
[03:47] 🖼 Scalable Ranked Preference Optimization for Text-to-Image Generation
[04:23] 🌆 DynamicCity: Large-Scale LiDAR Generation from Dynamic Scenes
[05:05] 🩺 MedINST: Meta Dataset of Biomedical Instructions
[05:52] 🌍 M-RewardBench: Evaluating Reward Models in Multilingual Settings
[06:27] 📊 TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts

7 min

2024.10.23 Daily AI Papers | Reducing visual redundancy boosts efficiency; dynamic 3D reconstruction of specular scenes.

The 8 papers in this episode:
[00:27] 🔍 PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
[01:09] 🌟 SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
[01:48] 🤖 Aligning Large Language Models via Self-Steering Optimization
[02:30] 🇯🇵 JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
[03:11] 🧬 EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search
[03:53] 🧠 MiniPLM: Knowledge Distillation for Pre-Training Language Models
[04:30] 🔍 Mitigating Object Hallucination via Concentric Causal Attention
[05:19] 🧠 Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes

6 min

2024.10.22 Daily AI Papers | CompassJudger speeds up model evaluation; SAM2Long improves long-video segmentation accuracy.

The 21 papers in this episode:
[00:24] 🤖 CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution
[01:11] 🌲 SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
[01:55] 🌐 PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
[02:37] 🤖 AutoTrain: No-code training for state-of-the-art models
[03:10] ⚡ FrugalNeRF: Fast Convergence for Few-shot Novel View Synthesis without Learned Priors
[03:56] 📊 Baichuan Alignment Technical Report
[04:39] 🌍 Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages
[05:21] 🔍 RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
[06:05] 📚 Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception
[06:41] 🔍 Pre-training Distillation for Large Language Models: A Design Space Exploration
[07:16] 🔬 Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation
[07:55] 🔄 SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation
[08:31] 📚 Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement
[09:11] 🤖 Zero-shot Model-based Reinforcement Learning using Large Language Models
[09:53] 🗣 Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
[10:28] 🧠 CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy
[11:12] 🛠 Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers
[11:58] 🧠 Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training
[12:45] 🌍 Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
[13:25] 🗣 DM-Codec: Distilling Multimodal Representations for Speech Tokenization
[14:17] 🧠 In-context learning and Occam's razor

15 min

2024.10.21 Daily AI Papers | Higher web-navigation success rates; finer-grained image generation.

The 12 papers in this episode:
[00:27] 🌐 Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
[01:11] 👗 MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
[01:48] 💼 UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
[02:37] 🧠 NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
[03:12] 🧠 SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs
[03:54] 📊 Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts
[04:25] 🌐 Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion
[05:08] 🎥 DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
[05:50] 🔄 A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
[06:31] 🧬 DPLM-2: A Multimodal Diffusion Protein Language Model
[07:12] 📰 Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media
[07:56] 🧠 How Do Training Methods Influence the Utilization of Vision Models?

8 min

[Weekend Special] Hottest AI Papers of Week 3, October | Multimodal LLM innovation; unified evaluation standards.

The 5 papers in this episode:
[00:45] TOP1(🔥80) | 🌐 Baichuan-Omni Technical Report
[02:20] TOP2(🔥58) | 📊 MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures
[04:20] TOP3(🔥58) | 🎥 Movie Gen: A Cast of Media Foundation Models
[06:27] TOP4(🔥53) | 🤖 LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
[08:23] TOP5(🔥48) | 🌐 MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

10 min