2025.05.22 | Improving web navigation efficiency; optimizing quantization error.

This episode covers 15 papers:

[00:25] 🤖 Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
[01:13] 🧮 Scaling Law for Quantization-Aware Training
[01:53] 🤖 UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
[02:28] 🎨 MMaDA: Multimodal Large Diffusion Language Models
[03:04] 🔄 Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
[03:44] 💻 Efficient Agent Training for Computer Use
[04:26] 🧠 Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
[05:08] 💡 When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
[05:39] 🤖 Vid2World: Crafting Video Diffusion Models to Interactive World Models
[06:16] 🖼 IA-T2I: Internet-Augmented Text-to-Image Generation
[06:49] 🧠 Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs
[07:31] 🎮 lmgame-Bench: How Good are LLMs at Playing Games?
[08:18] 🏙 Constructing a 3D Town from a Single Image
[08:58] 🚀 dKV-Cache: The Cache for Diffusion Language Models
[09:40] 🛡 How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study

【Follow us】 You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

10 min · 46 · 20 hours ago

2025.05.21 | Multimodal pretraining boosts complex-task capability; attention optimizations improve inference and training efficiency.

This episode covers 15 papers:

[00:22] 💡 Emerging Properties in Unified Multimodal Pretraining
[01:03] 🚀 SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
[01:42] 🖼 VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
[02:23] 🤖 Visual Agentic Reinforcement Fine-Tuning
[03:01] 🧪 The Aloe Family Recipe for Open and Specialized Healthcare LLMs
[03:40] 🧮 Optimizing Anytime Reasoning via Budget Relative Policy Optimization
[04:25] 🧠 Neurosymbolic Diffusion Models
[05:02] 🌊 Latent Flow Transformer
[05:40] 🧑 Exploring Federated Pruning for Large Language Models
[06:23] 👁 Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
[07:05] 🧠 General-Reasoner: Advancing LLM Reasoning Across All Domains
[07:45] 🤔 Reasoning Models Better Express Their Confidence
[08:20] 🚀 Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
[09:07] 🖼 Training-Free Watermarking for Autoregressive Image Generation
[09:48] 🤔 VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation

10 min · 66 · 1 day ago

2025.05.20 | Chain-of-Model learning improves efficiency; AdaptThink optimizes reasoning speed.

This episode covers 15 papers:

[00:23] 🔗 Chain-of-Model Learning for Language Model
[00:58] 🤔 AdaptThink: Reasoning Models Can Learn When to Think
[01:45] 🧠 AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
[02:21] 🚀 Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
[03:04] 🖥 Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
[03:43] 🤔 Thinkless: LLM Learns When to Think
[04:23] 💡 Seek in the Dark: Reasoning via Test-Time Instance-Level Policy Gradient in Latent Space
[05:00] 🧮 MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision
[05:39] ✨ Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation
[06:15] 🛡 FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA
[07:00] 🧩 Model Merging in Pre-training of Large Language Models
[07:53] 🤖 CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models
[08:36] 🎬 Faster Video Diffusion with Trainable Sparse Attention
[09:23] 🧠 Fractured Chain-of-Thought Reasoning
[10:03] 🧠 Neuro-Symbolic Query Compiler

11 min · 74 · 2 days ago

2025.05.19 | Qwen3 advances LLM performance; GuardReasoner-VL strengthens VLM safety.

This episode covers 15 papers:

[00:24] 🤖 Qwen3 Technical Report
[01:14] 🛡 GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
[02:01] 🖼 MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly
[02:40] 🖼 Visual Planning: Let's Think Only with Images
[03:25] 💡 Simple Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
[04:09] 🧠 Group Think: Multiple Concurrent Reasoning Agents Collaborating at Token Level Granularity
[04:53] 🧬 Mergenetic: a Simple Evolutionary Model Merging Library
[05:35] 💡 MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
[06:14] 🧮 Multi-Token Prediction Needs Registers
[06:48] 🤔 Scaling Reasoning can Improve Factuality in Large Language Models
[07:25] 🧪 MatTools: Benchmarking Large Language Models for Materials Science Tools
[08:04] 🤔 Humans expect rationality and cooperation from LLM opponents in strategic games
[08:45] 🤝 Learning Dense Hand Contact Estimation from Imbalanced Data
[09:26] 🩻 CheXGenBench: A Unified Benchmark For Fidelity, Privacy and Utility of Synthetic Chest Radiographs
[10:11] 🤝 From Trade-off to Synergy: A Versatile Symbiotic Watermarking Framework for Large Language Models

11 min · 99+ · 3 days ago

2025.05.16 | Meta-ability gains in reasoning models; system prompt optimization and robustness improvements

This episode covers 15 papers:

[00:24] 💡 Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
[01:02] 🤖 System Prompt Optimization with Meta-Learning
[01:47] 🤖 EnerVerse-AC: Envisioning Embodied Environments with Action Condition
[02:29] 🧠 The CoT Encyclopedia: Analyzing, Predicting, and Controlling how a Reasoning Model will Think
[03:17] 🤖 EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
[03:57] 🖼 End-to-End Vision Tokenizer Tuning
[04:34] 📈 WorldPM: Scaling Human Preference Modeling
[05:13] 🤖 MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
[06:01] 🧩 Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning
[06:43] 🎨 Style Customization of Text-to-Vector Generation with Image Diffusion Priors
[07:25] 🧠 J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
[08:07] 👉 PointArena: Probing Multimodal Grounding Through Language-Guided Pointing
[08:47] 🖼 Depth Anything with Any Prior
[09:29] 🖼 OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
[10:14] 🚀 Parallel Scaling Law for Language Models

11 min · 99+ · 6 days ago

2025.05.15 | Decoupled learning improves dense perception; multimodal models optimize image generation.

This episode covers 11 papers:

[00:23] 🖼 DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
[01:02] 🖼 BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset
[01:41] 💡 Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
[02:24] 🎨 Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
[03:00] 🤖 UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
[03:42] 🐛 SweRank: Software Issue Localization with Code Ranking
[04:23] 🤔 VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models
[05:14] 🖼 CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
[05:49] 🤔 Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
[06:27] 🤔 Visually Interpretable Subtask Reasoning for Visual Question Answering
[06:59] 🚁 DetReIDX: A Stress-Test Dataset for Real-World UAV-Based Person Recognition

8 min · 85 · 1 week ago

2025.05.14 | New zero-shot speech synthesis model; multi-dimensional evaluation of LLM instruction following

This episode covers 8 papers:

[00:25] 🗣 MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
[01:00] 🤖 A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
[01:47] 🎮 Measuring General Intelligence with Generated Games
[02:29] 🎦 SkillFormer: Unified Multi-View Video Understanding for Proficiency Estimation
[03:14] 🤖 NavDP: Learning Sim-to-Real Navigation Diffusion Policy with Privileged Information Guidance
[03:51] 🔍 Optimizing Retrieval-Augmented Generation: Analysis of Hyperparameter Impact on Performance and Efficiency
[04:28] 🇻 ViMRHP: A Vietnamese Benchmark Dataset for Multimodal Review Helpfulness Prediction via Human-AI Collaborative Annotation
[05:04] 📖 Advancing Arabic Reverse Dictionary Systems: A Transformer-Based Approach with Dataset Construction Guidelines

6 min · 99+ · 1 week ago