This episode covers the following 10 papers:

[00:20] 🌲 DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking
[00:55] ✍ Chain of Draft: Thinking Faster by Writing Less
[01:39] 🧠 ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents
[02:20] 🧠 SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers
[03:09] 🧠 Optimal Brain Apoptosis
[03:51] 🧠 Tell me why: Visual foundation models as self-explainable classifiers
[04:31] 🤖 Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids
[05:09] ⚡ LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
[05:47] 🎥 HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models
[06:26] 🥬 LettuceDetect: A Hallucination Detection Framework for RAG Applications

This episode covers the following 10 papers:

[00:39] TOP1(🔥196) | 🤖 SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
[02:32] TOP2(🔥183) | 🎥 OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
[05:02] TOP3(🔥182) | 🦜 The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
[06:41] TOP4(🔥167) | 🧠 MLGym: A New Framework and Benchmark for Advancing AI Research Agents
[09:03] TOP5(🔥152) | 🌐 Qwen2.5-VL Technical Report
[11:48] TOP6(🔥152) | 🔍 LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
[13:41] TOP7(🔥142) | 🚀 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU
[16:06] TOP8(🔥140) | 🤔 Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
[18:40] TOP9(🔥137) | ⚡ Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
[20:46] TOP10(🔥125) | 💼 Expect the Unexpected: FailSafe Long Context QA for Finance

This episode covers the following 5 papers:

[00:50] TOP1(🔥152) | 🔍 LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
[03:08] TOP2(🔥89) | 📚 SurveyX: Academic Survey Automation via Large Language Models
[05:42] TOP3(🔥65) | 🎥 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
[07:24] TOP4(🔥64) | 📖 Thus Spake Long-Context Large Language Model
[09:35] TOP5(🔥61) | 🤖 OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

This episode covers the following 19 papers:

[00:23] 🧠 Self-rewarding correction for mathematical reasoning
[01:03] 🧠 MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
[01:53] 🧠 R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts
[02:34] 🧬 LongRoPE2: Near-Lossless LLM Context Window Scaling
[03:11] 🧠 FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
[04:02] 🤖 CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale
[04:48] 🚀 Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance
[05:33] 🧩 UniTok: A Unified Tokenizer for Visual Generation and Understanding
[06:12] 🚀 NeoBERT: A Next-Generation BERT
[06:47] 🌀 FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
[07:30] 🛠 SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning
[08:07] 🤖 Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
[08:45] 🎨 Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think
[09:30] 🎥 Mobius: Text to Seamless Looping Video Generation via Latent Shift
[10:08] 🛡 Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System
[10:49] 🤖 R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
[11:29] 🧠 On Relation-Specific Neurons in Large Language Models
[12:05] 🔄 Training Consistency Models with Variational Noise Coupling
[12:46] ⚡ Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling

This episode covers the following 18 papers:

[00:23] 🌐 Kanana: Compute-efficient Bilingual Language Models
[00:54] 👤 GHOST 2.0: generative high-fidelity one shot transfer of heads
[01:43] 🎥 TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding
[02:21] 🤖 Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
[03:02] 🤖 Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
[03:47] 🌍 Language Models' Factuality Depends on the Language of Inquiry
[04:27] 🧠 Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation
[05:11] 🤖 Towards an AI co-scientist
[05:52] 🇬🇷 Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance
[06:38] 🤖 VEM: Environment-Free Exploration for Training GUI Agent with Value Environment Model
[07:12] 📏 Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator
[07:52] 📚 Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs
[08:35] 🛡 AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
[09:23] 🧠 BIG-Bench Extra Hard
[10:07] 🔍 CritiQ: Mining Data Quality Criteria from Human Preferences
[10:44] 🔬 MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
[11:28] 📄 PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
[12:08] 🧠 DOEI: Dual Optimization of Embedding Information for Attention-Enhanced Class Activation Maps

This episode covers the following 14 papers:

[00:23] 🤖 OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
[01:06] ⚡ SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
[01:53] 🖼 KV-Edit: Training-Free Image Editing for Precise Background Preservation
[02:32] 🌈 ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
[03:08] 🤖 SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
[03:51] 📊 Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
[04:30] 🧠 Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models
[05:11] 🔄 K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
[05:51] 🌐 WebGames: Challenging General-Purpose Web-Browsing AI Agents
[06:29] 🧠 Introducing Visual Perception Token into Multimodal Large Language Model
[07:07] 🎰 The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
[07:47] 🧠 AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
[08:26] 🔍 LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models
[09:07] 🧠 Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI

This episode covers the following 20 papers:

[00:27] 📖 Thus Spake Long-Context Large Language Model
[01:09] 🌈 DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
[01:48] 🚀 Slamming: Training a Speech Language Model on One GPU in a Day
[02:32] 🎥 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing
[03:11] 🎧 Audio-FLAN: A Preliminary Release
[03:43] 🧠 CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
[04:28] 🎨 GCC: Generative Color Constancy via Diffusing a Color Checker
[05:11] 📊 Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
[05:57] 🚀 Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
[06:38] 🧠 Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
[07:23] 🎥 RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers
[08:01] 📱 Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration
[08:45] ⏳ Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties
[09:31] 🤖 Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
[10:02] 🔄 Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
[10:43] 📝 Can Community Notes Replace Professional Fact-Checkers?
[11:24] 📈 Forecasting Open-Weight AI Model Growth on Hugging Face
[12:08] 🔑 Beyond Release: Access Considerations for Generative AI Systems
[12:49] 🌐 TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
[13:30] 💃 X-Dancer: Expressive Music to Human Dance Video Generation

This episode covers the following 20 papers:

[00:23] 📚 SurveyX: Academic Survey Automation via Large Language Models
[01:10] 🔍 LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers
[01:50] 🚗 MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
[02:28] 🧬 Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
[03:12] 🎨 PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data
[03:55] 🔗 VLM²-Bench: A Closer Look at How Well VLMs Implicitly Link Explicit Matching Visual Cues
[04:42] 📌 SIFT: Grounding LLM Reasoning in Contexts via Stickers
[05:27] 🧠 LightThinker: Thinking Step-by-Step Compression
[05:59] 🗂 StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following
[06:48] 🛡 Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
[07:40] 📚 KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding
[08:30] 🧬 ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
[09:11] 🧠 MoBA: Mixture of Block Attention for Long-Context LLMs
[09:49] 🤖 InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
[10:37] 🧠 The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer
[11:20] 📚 Evaluating Multimodal Generative AI with Korean Educational Standards
[11:54] ⚠ Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
[12:29] ⚡ One-step Diffusion Models with f-Divergence Distribution Matching
[13:09] 🧠 Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence
[13:52] 🧠 MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models

This episode covers the following 5 papers:

[00:42] TOP1(🔥138) | 🧠 MLGym: A New Framework and Benchmark for Advancing AI Research Agents
[03:30] TOP2(🔥131) | 🌐 Qwen2.5-VL Technical Report
[06:56] TOP3(🔥130) | ⚡ Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
[09:20] TOP4(🔥91) | 🌐 SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
[11:50] TOP5(🔥86) | 📚 SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

This episode covers the following 20 papers:

[00:26] 🧠 MLGym: A New Framework and Benchmark for Advancing AI Research Agents
[01:18] 📚 SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines
[02:04] 🌐 SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features
[02:52] 🧠 How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
[03:49] 🚀 S*: Test Time Scaling for Code Generation
[04:35] ⏳ Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information
[05:28] 📄 LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
[06:17] 🧠 Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning
[07:13] 🖥 PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
[08:07] 🧠 S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
[09:01] 🧠 Discovering highly efficient low-weight quantum error-correcting codes with reinforcement learning
[09:55] 🎥 Dynamic Concepts Personalization from Single Videos
[10:38] 🖼 Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
[11:23] 🌍 NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization
[12:13] 🧠 AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
[13:06] 🌍 How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild
[13:52] 🌍 Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework
[14:55] 🌐 RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
[15:54] 🧠 Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data
[16:41] 🤖 LLM-based User Profile Management for Recommender System

This episode covers the following 8 papers:

[00:24] 🧠 Are Your LLMs Capable of Stable Reasoning?
[01:06] 📊 Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models
[01:52] 📊 OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
[02:33] 🧠 Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers
[03:16] 🤖 Proposer-Agent-Evaluator (PAE): Autonomous Skill Discovery For Foundation Model Internet Agents
[04:00] 📊 VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
[04:39] 🤔 When to Speak, When to Abstain: Contrastive Decoding with Abstention
[05:18] 🎥 MIVE: New Design and Benchmark for Multi-Instance Video Editing

This episode covers the following 18 papers:

[00:23] 🧠 RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation
[01:05] ⚡ Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
[01:45] 🎨 BrushEdit: All-In-One Image Inpainting and Editing
[02:27] 🎨 ColorFlow: Retrieval-Augmented Image Sequence Colorization
[03:10] 🧩 Byte Latent Transformer: Patches Scale Better Than Tokens
[03:56] 🧠 Causal Diffusion Transformers for Generative Modeling
[04:33] 🤖 Smaller Language Models Are Better Instruction Evolvers
[05:16] 🌟 IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
[06:02] 🌳 SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
[06:47] 🌌 Wonderland: Navigating 3D Scenes from a Single Image
[07:32] 🔬 GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
[08:18] ⚡ SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
[09:06] 🧠 Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture
[09:46] 👩 StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors
[10:35] 🌐 MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
[11:19] 🎵 Whisper-GPT: A Hybrid Representation Audio Large Language Model
[12:10] 🤖 TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning
[13:01] 🔒 Just a Simple Transformation is Enough for Data Protection in Vertical Federated Learning

[Follow us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递