HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

A 10-minute speed read of trending AI papers

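拨号上网 (Dial-Up Internet), 佚名 (Anonymous)
13,300 subscribers · 545 episodes · 4 days ago
About the podcast
Ten minutes a day to get you up to speed on the day's trending AI papers on HuggingFace. Updated every weekday; subscriptions welcome. 📢 To find the podcast, search for 【HuggingFace 每日AI论文速递】 on 小宇宙 (Xiaoyuzhou) or Apple Podcasts. 🖼 An illustrated text version is also available: search for and follow 【AI速递】 on 小红书 (Xiaohongshu).
Episodes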

2026.03.16 | LMEB fills the blind spot in long-horizon memory evaluation; Cheers decouples semantics from details for multimodal unification

【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute. Each week, AI每周谈 recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

【Contents】 The 15 papers in this episode:
[00:28] 🧠 LMEB: Long-horizon Memory Embedding Benchmark
[01:12] 🔄 Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation
[01:59] 🐳 daVinci-Env: Open SWE Environment Synthesis at Scale
[02:46] 🔍 Can Vision-Language Models Solve the Shell Game?
[03:26] ⚡ OmniForcing: Unleashing Real-time Joint Audio-Visual Generation
[04:14] 🎯 Visual-ERM: Reward Modeling for Visual Equivalence
[05:11] 🔍 MM-CondChain: A Programmatically Verified Benchmark for Visually Grounded Deep Compositional Reasoning
[06:18] 🌉 V-Bridge: Bridging Video Generative Priors to Versatile Few-shot Image Restoration
[07:05] 🔍 Multimodal OCR: Parse Anything from Documents
[07:49] 🧠 Video Streaming Thinking: VideoLLMs Can Watch and Think Simultaneously
[08:22] ⚠ HomeSafe-Bench: Evaluating Vision-Language Models on Unsafe Action Detection for Embodied Agents in Household Scenarios
[09:13] 🔍 From Sparse to Dense: Multi-View GRPO for Flow Models via Augmented Condition Space
[09:59] ⚡ HybridStitch: Pixel and Timestep Level Model Stitching for Diffusion Acceleration
[11:04] 🧠 Steve-Evolving: Open-World Embodied Self-Evolution via Fine-Grained Diagnosis and Dual-Track Knowledge Distillation
[11:54] 🎬 VQQA: An Agentic Approach for Video Evaluation and Quality Improvement

【Follow us】 You can also find us on the following platform for more beyond the podcast itself:
小红书 (Xiaohongshu): AI速递

13 min
90
4 days ago

2026.03.13 | A small 2B model with streaming spatial memory stages an upset; AI's "brute-force" page flipping loses to human strategy

【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute. Each week, AI每周谈 recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

【Contents】 The 15 papers in this episode:
[00:32] 🧠 Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training
[01:17] 🤔 Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections
[02:11] ⚡ IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse
[02:54] 🎬 Video-Based Reward Modeling for Computer-Use Agents
[03:55] 🎬 DreamVideo-Omni: Omni-Motion Controlled Multi-Subject Video Customization with Latent Identity Reinforcement Learning
[04:46] 🎯 Trust Your Critic: Robust Reward Modeling and Reinforcement Learning for Faithful Image Editing and Generation
[05:40] 🎬 DVD: Deterministic Video Depth Estimation with Generative Priors
[06:29] 🖼 WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
[07:29] 🎬 ShotVerse: Advancing Cinematic Camera Control for Text-Driven Multi-Shot Video Creation
[08:24] 🧠 GRADE: Benchmarking Discipline-Informed Reasoning in Image Editing
[09:08] 🎬 EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation
[09:55] ⚡ One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers
[10:46] 🤖 OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams
[11:29] 🧠 EndoCoT: Scaling Endogenous Chain-of-Thought Reasoning in Diffusion Models
[12:37] 🧠 XSkill: Continual Learning from Experience and Skills in Multimodal Agents

【Follow us】 You can also find us on the following platform for more beyond the podcast itself:
小红书 (Xiaohongshu): AI速递

13 min
99+
1 week ago

2026.03.12 | Training agents while chatting; GPUs solve hundred-million-scale K-means in seconds

【Sponsor】 Listen to AI每周谈 (AI Weekly Talk) on your commute. Each week, AI每周谈 recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd

【Contents】 The 15 papers in this episode:
[00:29] 🤖 OpenClaw-RL: Train Any Agent Simply by Talking
[01:17] ⚡ Flash-KMeans: Fast and Memory-Efficient Exact K-Means
[02:01] 👁 MA-EgoQA: Question Answering over Egocentric Videos from Multiple Embodied Agents
[02:43] 🧠 In-Context Reinforcement Learning for Tool Use in Large Language Models
[03:19] 🧠 ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning
[04:10] 📊 Can Large Language Models Keep Up? Benchmarking Online Adaptation to Continual Knowledge Streams
[05:00] 🧠 RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback
[05:50] 🔬 CodePercept: Code-Grounded Visual STEM Perception for MLLMs
[06:44] 🎯 Prism-$Δ$: Differential Subspace Steering for Prompt Highlighting in Large Language Models
[07:31] 🧠 LLM2Vec-Gen: Generative Embeddings from Large Language Models
[08:22] ⚖ $V_{0.5}$: Generalist Value Model as a Prior for Sparse RL Rollouts
[09:05] ⚡ Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers
[09:47] 🧠 Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning
[10:39] 💬 RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation
[11:14] 🧠 Hindsight Credit Assignment for Long-Horizon LLM Agents

【Follow us】 You can also find us on the following platform for more beyond the podcast itself:
小红书 (Xiaohongshu): AI速递

12 min
99+
1 week ago