Episode list: HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest) - EarsOnMe

2025.10.16 | UniMoE unifies speech and music; attention maps illuminate LLM reasoning

The 15 papers in this episode:
[00:21] 🎧 UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
[00:57] 🔍 Attention Illuminates LLM Reasoning: The Preplan-and-Anchor Rhythm Enables Fine-Grained Policy Optimization
[01:38] ⚡ FlashWorld: High-quality 3D Scene Generation within Seconds
[02:06] 🐝 Bee: A High-Quality Corpus and Full-Stack Suite to Unlock Advanced Fully Open MLLMs
[02:37] 🗣 InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
[03:24] 🌍 PhysMaster: Mastering Physical Representation for Video Generation via Reinforcement Learning
[04:00] 🧪 LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
[04:43] 🚗 CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving
[05:21] 🔍 Generative Universal Verifier as Multimodal Meta-Reasoner (a reflection engine for multimodal meta-reasoning)
[06:07] ⚖ ParallelBench: Understanding the Trade-offs of Parallel Decoding in Diffusion LLMs
[06:43] 🎞 Trace Anything: Representing Any Video in 4D via Trajectory Fields (recovers a continuous spatio-temporal path for every pixel in a single feed-forward pass)
[07:27] 🌍 Reasoning in Space via Grounding in the World
[07:54] 🧠 The Role of Computing Resources in Publishing Foundation Model Research
[08:28] ⚖ UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
[09:05] 🤖 InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

10 minutes

99+

3 months ago

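2025.10.15 | Pixel-level self-supervised ViT sets new generation benchmark records; multi-agent evaluation offers a new yardstick for web novel translation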

The 14 papers in this episode:
[00:20] 🖼 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training
[00:53] 📚 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
[01:41] 🌐 Scaling Language-Centric Omnimodal Representation Learning
[02:29] 🎯 Detect Anything via Next Point Prediction
[03:02] ⚡ FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution
[03:40] 🎯 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models
[04:16] 🧠 Dr.LLM: Dynamic Layer Routing in LLMs
[05:03] 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
[05:50] 🤖 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
[06:35] 🤖 Robot Learning: A Tutorial (from reinforcement learning to multi-task generalist models)
[07:27] 🔄 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models
[08:01] 🧠 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
[09:06] 🖼 UniFusion: Vision-Language Model as Unified Encoder in Image Generation
[09:43] 🧠 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks

10 minutes

95

3 months ago

2025.10.14 | Quantization error becomes reward, training a 32B model on a single GPU; an audio-visual evaluation benchmark for multimodal LLMs

The 15 papers in this episode:
[00:23] 🚀 QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
[01:22] 🧠 Diffusion Transformers with Representation Autoencoders
[02:12] 🎬 OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
[02:41] 🔄 Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
[03:18] 🌊 RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
[04:11] 🔍 Spotlight on Token Perception for Multimodal Reinforcement Learning
[04:50] 🎬 AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
[05:25] 🌐 DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training (hybrid training on perspective and panoramic data)
[05:56] 🧠 Demystifying Reinforcement Learning in Agentic Reasoning
[06:51] 🧮 Making Mathematical Reasoning Adaptive
[07:26] 🛡 Building a Foundational Guardrail for General Agentic Systems via Synthetic Data (a pre-execution safety framework)
[08:05] 🧠 ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
[08:43] 🎨 InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models (SVG understanding, editing, and generation)
[09:23] 🧾 FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs
[10:09] 🧠 GIR-Bench: Versatile Benchmark for Generating Images with Reasoning

11 minutes

99+

3 months ago

2025.10.13 | Desktop-interaction pretraining unlocks robot potential; a unified model gives cameras spatial imagination

The 14 papers in this episode:
[00:20] 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
[01:13] 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
[01:56] 🎨 TAG:Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
[02:31] 🧠 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
[03:05] 🚀 AutoPR: Let's Automate Your Academic Promotion!
[03:39] 🧭 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
[04:14] 🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
[04:56] 🛰 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
[05:37] 🎥 StreamingVLM: Real-Time Understanding for Infinite Video Streams
[06:19] 🌐 KORMo: Korean Open Reasoning Model for Everyone
[06:42] ♻ Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
[07:25] 🧠 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
[08:16] ⚡ DISCO: Diversifying Sample Condensation for Efficient Model Evaluation (sample condensation guided by model disagreement)
[08:56] 🚗 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction

10 minutes

99+

3 months ago

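[Weekend Special] Hottest AI papers of the 2nd week of October | A tiny recursive model tops the reasoning leaderboards; future experience lights up zero-reward learning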

The 5 papers in this episode:
[00:33] TOP1(🔥300) | 🧠 Less is More: Recursive Reasoning with Tiny Networks
[02:16] TOP2(🔥164) | 🌱 Agent Learning via Early Experience
[04:15] TOP3(🔥105) | 🧠 Apriel-1.5-15b-Thinker (a 15B open-source model that reaches frontier-level multimodal reasoning at small scale)
[06:17] TOP4(🔥97) | 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
[08:45] TOP5(🔥88) | 🎬 Paper2Video: Automatic Video Generation from Scientific Papers

11 minutes

99+

3 months ago

2025.10.10 | Agent Learning from early experience; interleaved image-text reflection chains jump to 24.9%

The 14 papers in this episode:
[00:16] 🌱 Agent Learning via Early Experience
[00:50] 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization
[01:42] 🧪 From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning
[02:19] 🎬 UniVideo: Unified Understanding, Generation, and Editing for Videos
[03:01] 🧠 When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
[03:43] 🧠 Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning
[04:25] 🧠 MemMamba: Rethinking Memory Patterns in State Space Model
[05:17] 🛡 The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
[05:53] 🎯 Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense
[06:40] 🧪 NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents
[07:17] 🪚 DeepPrune: Parallel Scaling without Inter-trace Redundancy
[07:54] 🚀 Training-Free Group Relative Policy Optimization
[08:24] 🪄 ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
[08:55] 🤥 LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions

10 minutes

99+

3 months ago

2025.10.09 | Ming-UniVision unifies the visual vocabulary; a direct KV-cache link lets LLMs communicate instantly

The 15 papers in this episode:
[00:21] 🔄 Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer
[00:59] 🧠 Cache-to-Cache: Direct Semantic Communication Between Large Language Models
[01:32] 🌀 Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding (a discrete diffusion LLM)
[02:07] 🧠 SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models
[03:06] 🤖 RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
[04:02] 🎬 MATRIX: Mask Track Alignment for Interaction-aware Video Generation
[04:51] 🎯 Vibe Checker: Aligning Code Evaluation with Human Preference
[05:44] 🤖 Multi-Agent Tool-Integrated Policy Optimization
[06:24] 🧠 CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling (a lightweight corrective framework)
[06:59] ✂ OBS-Diff: Accurate Pruning For Diffusion Models in One-Shot
[07:52] 🧠 Artificial Hippocampus Networks for Efficient Long-Context Modeling
[08:30] 🔍 Revisiting Long-context Modeling from Context Denoising Perspective
[09:11] 🧠 Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought
[09:51] 💥 Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
[10:37] ⚡ Native Hybrid Attention for Efficient Sequence Modeling

11 minutes

99+

3 months ago

2025.10.08 | TaTToo beats large models with external code tools; a 4B small model approaches closed-source giants in 32 steps

The 15 papers in this episode:
[00:24] 📊 TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
[00:57] 🔍 Fathom-DeepResearch: Unlocking Long Horizon Information Retrieval and Synthesis for SLMs
[01:39] 🚀 Fast-dLLM v2: Efficient Block-Diffusion LLM
[02:30] 🧑 CoDA: Coding LM via Diffusion Adaptation
[03:01] 🧩 Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning
[03:52] ⚖ ASPO: Asymmetric Importance Sampling Policy Optimization
[04:34] 🔗 Mixing Mechanisms: How Language Models Retrieve Bound Entities In-Context
[05:15] 🧠 AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems
[05:51] 🪂 Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning?
[06:35] 🌍 HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
[07:22] ⚡ TensorBLEU: Vectorized GPU-based BLEU Score Implementation for Per-Sentence In-Training Evaluation
[08:09] 🎯 Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
[09:00] 🩺 Discrete Diffusion Models with MLLMs for Unified Medical Multimodal Generation
[09:46] 🧠 MixReasoning: Switching Modes to Think
[10:20] ⚡ LightCache: Memory-Efficient, Training-Free Acceleration for Video Generation

11 minutes

99+

3 months ago

2025.10.07 | Papers become talks in seconds; a breakthrough in Video-LMM post-training

The 15 papers in this episode:
[00:21] 🎬 Paper2Video: Automatic Video Generation from Scientific Papers
[00:55] 🎬 Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
[01:38] 🎬 VChain: Chain-of-Visual-Thought for Reasoning in Video Generation
[02:14] 👻 Imperceptible Jailbreaking against Large Language Models
[02:56] 🌳 MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
[03:30] 🧬 Hybrid Architectures for Language Models: Systematic Analysis and Design Insights
[04:07] 📊 Factuality Matters: When Image Generation and Editing Meet Structured Visuals
[04:59] 🔄 Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
[05:55] ⚖ Judging with Confidence: Calibrating Autoraters to Preference Distributions
[06:44] 🎯 Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training
[07:27] 📏 Optimal Scaling Needs Optimal Norm
[07:51] 🔬 Code4MeV2: a Research-oriented Code-completion Platform
[08:31] 🪞 Self-Reflective Generation at Test Time
[09:15] 🔄 SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs
[10:00] 👀 Watch and Learn: Learning to Use Computers from Online Videos

11 minutes

99+

3 months ago

2025.10.06 | A 15B small model matches DeepSeek-R1; progressive distillation with 128 tokens saves 80% of compute

The 15 papers in this episode:
[00:28] 🧠 Apriel-1.5-15b-Thinker (a 15B open-source model that reaches frontier-level multimodal reasoning at small scale)
[01:04] 🚀 Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
[01:42] 🧩 Compose Your Policies! Improving Diffusion-based or Flow-based Robot Policies via Test-time Distribution-level Composition
[02:19] 🪞 Self-Improvement in Multimodal Large Language Models: A Survey
[02:59] 🧬 Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
[03:38] 📊 CoDA: Agentic Systems for Collaborative Data Visualization
[04:21] 🧐 SurveyBench: How Well Can LLM(-Agents) Write Academic Surveys?
[05:06] 🔧 REPAIR: Robust Editing via Progressive Adaptive Intervention and Reintegration
[05:53] 🔍 OrtSAE: Orthogonal Sparse Autoencoders Uncover Atomic Features
[06:38] 🔍 FocusAgent: Simple Yet Effective Ways of Trimming the Large Context of Web Agents (using a lightweight retriever)
[07:14] 🎯 Improving GUI Grounding with Explicit Position-to-Coordinate Mapping
[08:05] 📏 LSPO: Length-aware Dynamic Sampling for Policy Optimization in LLM Reasoning
[08:45] 🤖 WAInjectBench: Benchmarking Prompt Injection Detections for Web Agents
[09:19] 🍱 Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
[09:54] 🎯 LEAML: Label-Efficient Adaptation to Out-of-Distribution Visual Tasks for Multimodal Large Language Models

11 minutes

99+

3 months ago

[Weekend Special] Hottest AI papers of the 1st week of October | The Transformer grows a brain-like shell; LongLive turns long video into a live stream

The 5 papers in this episode:
[00:43] TOP1(🔥323) | 🐣 The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
[02:38] TOP2(🔥167) | 🎬 LongLive: Real-time Interactive Long Video Generation
[05:04] TOP3(🔥150) | 🔥 MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use
[07:24] TOP4(🔥124) | 🧠 EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
[09:18] TOP5(🔥122) | 🎮 Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

12 minutes

99+

3 months ago

2025.10.03 | LongCodeZip prunes fast and accurately; toward minute-scale high-quality video generation

The 15 papers in this episode:
[00:22] 🗜 LongCodeZip: Compress Long Context for Code Language Models
[00:56] 🎬 Self-Forcing++: Towards Minute-Scale High-Quality Video Generation
[01:38] 🧠 ExGRPO: Learning to Reason from Experience (experience-based group relative policy optimization)
[02:32] 🥷 StealthAttack: Robust 3D Gaussian Splatting Poisoning via Density-Guided Illusions
[03:32] 🎛 Interactive Training: Feedback-Driven Neural Network Optimization
[04:24] 📈 StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets?
[05:07] 🔍 VOGUE: Guiding Exploration with Visual Uncertainty Improves Multimodal Reasoning
[05:44] 🪓 The Rogue Scalpel: Activation Steering Compromises LLM Safety
[06:21] 🔍 CLUE: Non-parametric Verification from Experience via Hidden-State Clustering
[07:09] 🔍 ModernVBERT: Towards Smaller Visual Document Retrievers
[07:54] 🗺 RewardMap: Tackling Sparse Rewards in Fine-grained Visual Reasoning via Multi-Stage Reinforcement Learning
[08:37] 🚀 F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data
[09:13] 🧠 RLP: Reinforcement as a Pretraining Objective
[09:45] 🖱 DragFlow: Unleashing DiT Priors with Region Based Supervision for Drag Editing
[10:19] 🚀 The Unreasonable Effectiveness of Scaling Agents for Computer Use

11 minutes

99+

4 months ago