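2025.08.04 | Variable-length denoising for diffusion language models saves compute; PixNerd delivers efficient, high-quality image diffusion.

HuggingFace Daily AI Paper Express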

This episode covers the following 11 papers:
[00:22] 🔄 Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models
[00:44] 🎨 PixNerd: Pixel Neural Field Diffusion
[01:11] 💡 SWE-Exp: Experience-Driven Software Issue Resolution
[01:38] 🔍 Multimodal Referring Segmentation: A Survey
[01:59] 🧠 3D-R1: Enhancing Reasoning in 3D VLMs for Unified Scene Understanding
[02:40] 🤖 SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
[03:05] ⚖ Learning an Efficient Multi-Turn Dialogue Evaluator from Multiple Judges
[03:33] 🤯 Investigating Hallucination in Conversations for Low Resource Languages
[04:00] 🧭 IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
[04:30] 🎧 SpA2V: Harnessing Spatial Auditory Cues for Audio-driven Spatially-aware Video Generation
[04:55] 🎮 Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

5 min
94
3 months ago

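[Month-End Special] July's Hottest AI Papers | GSPO stabilizes training; sequence-level clipping reduces variance; a context engineering survey on dynamically assembling information flows

HuggingFace Daily AI Paper Express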

This episode covers the following 10 papers:
[00:30] TOP1 (🔥257) | 🚀 Group Sequence Policy Optimization
[02:21] TOP2 (🔥227) | 🧮 A Survey of Context Engineering for Large Language Models
[03:33] TOP3 (🔥207) | 🧠 GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
[05:02] TOP4 (🔥151) | 🎬 Scaling RL to Long Videos
[06:57] TOP5 (🔥144) | 🧠 MemOS: A Memory OS for AI System
[08:47] TOP6 (🔥126) | 🎬 Kwai Keye-VL Technical Report
[10:41] TOP7 (🔥126) | 🎯 GUI-G$^2$: Gaussian Reward Modeling for GUI Grounding
[12:38] TOP8 (🔥121) | 🤖 Agentic Reinforced Policy Optimization
[14:21] TOP9 (🔥120) | 🧮 MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
[15:53] TOP10 (🔥118) | ⚡ $\nabla$NABLA: Neighborhood Adaptive Block-Level Attention

18 min
99+
3 months ago

2025.08.01 | Seed-Prover uses LLMs to solve IMO math problems; Phi-Ground improves GUI perception accuracy.

HuggingFace Daily AI Paper Express

This episode covers the following 15 papers:
[00:22] 🏆 Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving
[01:04] 🎯 Phi-Ground Tech Report: Advancing Perception in GUI Grounding
[01:30] 🤔 C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
[02:07] 🚀 RecGPT Technical Report
[02:36] 🤖 villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
[03:14] 🤖 Scalable Multi-Task Reinforcement Learning for Generalizable Spatial Intelligence in Visuomotor Agents
[04:07] ⚖ Persona Vectors: Monitoring and Controlling Character Traits in Language Models
[04:41] 🚀 iLRM: An Iterative Large 3D Reconstruction Model
[05:32] ✅ TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs
[06:02] 💡 On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective
[06:29] 🤝 NeRF Is a Valuable Assistant for 3D Gaussian Splatting
[07:05] 🌾 AgroBench: Vision-Language Model Benchmark in Agriculture
[07:36] 🎨 Beyond Linear Bottlenecks: Spline-Based Knowledge Distillation for Culturally Diverse Art Style Classification
[08:15] 🔎 Enhanced Arabic Text Retrieval with Attentive Relevance Scoring
[08:45] 🌊 Flow Equivariant Recurrent Neural Networks

9 min
79
3 months ago

2025.07.31 | ScreenCoder automates UI-to-code conversion; Falcon-H1's hybrid architecture improves long-sequence efficiency.

HuggingFace Daily AI Paper Express

This episode covers the following 9 papers:
[00:22] 💻 ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents
[01:02] 🚀 Falcon-H1: A Family of Hybrid-Head Language Models Redefining Efficiency and Performance
[01:33] 💥 BANG: Dividing 3D Assets via Generative Exploded Dynamics
[02:17] 🧠 VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning
[02:51] 🚁 Adapting Vehicle Detectors for Aerial Imagery to Unseen Domains with Weak Supervision
[03:34] 🧩 Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
[04:04] 🚀 Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
[04:56] 🛠 Repair-R1: Better Test Before Repair
[05:33] 🌍 MetaCLIP 2: A Worldwide Scaling Recipe

6 min
81
3 months ago

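2025.07.30 | HunyuanWorld generates immersive 3D worlds from words or pixels; X-Omni uses reinforcement learning to improve image generation quality.

HuggingFace Daily AI Paper Express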

This episode covers the following 8 papers:
[00:23] 🌍 HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels
[00:56] ✨ X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
[01:59] 🚀 CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
[02:43] ✨ MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge
[03:32] 🐾 AnimalClue: Recognizing Animals by their Traces
[04:04] 🏃 MOVE: Motion-Guided Few-Shot Video Object Segmentation
[04:31] 🤥 MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions
[04:59] 🐘 Evaluating Deep Learning Models for African Wildlife Image Classification: From DenseNet to Vision Transformers

6 min
94
3 months ago

2025.07.29 | ARPO improves LLM tool-interaction performance; ARC-Hunyuan-Video-7B focuses on short-video understanding.

HuggingFace Daily AI Paper Express

This episode covers the following 15 papers:
[00:23] 🤖 Agentic Reinforced Policy Optimization
[00:55] 🧠 ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
[01:35] 🚀 Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning
[02:03] 🌐 Reconstructing 4D Spatial Intelligence: A Survey
[02:55] 💡 SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment
[03:35] 🚀 A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
[04:17] ⚖ Geometric-Mean Policy Optimization
[04:59] 🎯 Region-based Cluster Discrimination for Visual Representation Learning
[05:38] ✨ GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
[06:18] 🚀 UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities
[06:47] ⚡ Met$^2$Net: A Decoupled Two-Stage Spatio-Temporal Forecasting Model for Complex Meteorological Systems
[07:18] ✨ ForCenNet: Foreground-Centric Network for Document Image Rectification
[07:52] 🎨 ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
[08:43] 🏆 Music Arena: Live Evaluation for Text-to-Music
[09:13] 🎶 JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment

10 min
90
3 months ago

2025.07.25 | GSPO addresses training collapse in large models; MUR improves LLM reasoning efficiency.

HuggingFace Daily AI Paper Express

This episode covers the following 15 papers:
[00:24] 🚀 Group Sequence Policy Optimization
[00:53] 🧠 MUR: Momentum Uncertainty guided Reasoning for Large Language Models
[01:30] 🧠 LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
[02:09] 🎬 Captain Cinema: Towards Short Movie Generation
[02:58] 📈 TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation
[03:36] 🌍 EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion
[04:23] 💡 Hierarchical Budget Policy Optimization for Adaptive Reasoning
[04:48] 🔄 DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts
[05:17] 🚀 Technical Report of TeleChat2, TeleChat2.5 and T1
[06:00] 📈 DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis
[06:31] ✨ A New Pair of GloVes
[07:10] 🚀 GLiNER2: An Efficient Multi-Task Information Extraction System with Schema-Driven Interface
[07:38] ⚡ TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
[08:22] ⚕ SegDT: A Diffusion Transformer-Based Segmentation Model for Medical Imaging
[08:52] 🧩 Discovering and using Spelke segments

9 min
92
3 months ago

2025.07.24 | MLLMs still fall short on visual perception; the Yume model generates interactive virtual worlds.

HuggingFace Daily AI Paper Express

This episode covers the following 9 papers:
[00:23] 👁 Pixels, Patterns, but No Poetry: To See The World like Humans
[00:56] 🌌 Yume: An Interactive World Generation Model
[01:29] ✨ DesignLab: Designing Slides Through Iterative Detection and Correction
[02:14] 🧠 Can One Domain Help Others? A Data-Centric Study on Multi-Domain Reasoning via Reinforcement Learning
[02:59] ✅ Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny
[03:35] 🔍 RAVine: Reality-Aligned Evaluation for Agentic Search
[04:13] ⚡ Ultra3D: Efficient and High-Fidelity 3D Generation with Part Attention
[04:59] ✨ Elevating 3D Models: High-Quality Texture and Geometry Refinement from a Low-Quality Model
[05:31] 🔍 Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed

6 min
99+
3 months ago

2025.07.23 | The TIM model breaks through LLM context limits; Step-Audio 2 improves multimodal spoken dialogue.

HuggingFace Daily AI Paper Express

This episode covers the following 15 papers:
[00:24] ♾ Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
[01:05] 🔊 Step-Audio 2 Technical Report
[01:41] 🚀 MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
[02:23] ⚡ Upsample What Matters: Region-Adaptive Latent Sampling for Accelerated Diffusion Transformers
[03:17] 🧠 Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning
[03:56] 🧩 Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning
[04:36] 🤔 ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning
[05:03] 🤖 Experience is the Best Teacher: Grounding VLMs for Robotics through Self-Generated Memory
[05:56] ✨ HOComp: Interaction-Aware Human-Object Composition
[06:54] 🧐 RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback
[07:36] 🚀 Task-Specific Zero-shot Quantization-Aware Training for Object Detection
[08:06] 🔍 SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search
[08:35] ⚠ Does More Inference-Time Compute Really Help Robustness?
[09:16] 🧭 Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
[10:02] 🧠 ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

11 min
99+
3 months ago