HuggingFace Daily AI Paper Digest - Episode List

2025.11.19 | Pixel actors struggle to reason; misleading visuals put models to the test

HuggingFace Daily AI Paper Digest

This episode covers the following 11 papers:
[00:23] 🧠 Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark
[01:03] 🕵 MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
[01:49] 🎞 REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding
[03:02] 🧪 ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning
[03:43] 🔍 Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework
[04:16] 🤖 Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
[05:02] 🤖 Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution
[05:32] ⚖ Mitigating Label Length Bias in Large Language Models
[06:14] 🧠 Agent READMEs: An Empirical Study of Context Files for Agentic Coding
[06:49] 🎧 Proactive Hearing Assistants that Isolate Egocentric Conversations
[07:20] 🎯 Error-Driven Scene Editing for 3D Grounding in Large Language Models
[Follow us] You can also find us on the following platform for more than what the podcast covers:
Xiaohongshu (RedNote): AI速递

8 min

2025.11.18 | RL takes Physics Olympiad gold; Uni-MoE 2.0 makes an omnimodal leap

This episode covers the following 14 papers:
[00:17] 🏅 P1: Mastering Physics Olympiads with Reinforcement Learning
[00:56] 🌐 Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
[01:42] 🧩 Part-X-MLLM: Part-aware 3D Multimodal Large Language Model
[02:22] 🧠 TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models
[03:08] 🚀 GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning
[03:49] 🧩 PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image
[04:28] 🌌 UFO³: Weaving the Digital Agent Galaxy
[04:59] 🍲 Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
[05:38] 🌍 OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation
[06:19] 🔄 Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly?
[06:51] 🚀 MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling
[07:36] 🎯 Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models
[08:19] 🧠 WebCoach: Self-Evolving Web Agents with Cross-Session Memory Guidance
[09:10] 🧬 Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

10 min

2025.11.17 | Denoising RoPE rescues long context; AI fast-tracks ionic liquid screening

This episode covers the following 13 papers:
[00:24] 🧹 DoPE: Denoising Rotary Position Embedding
[00:58] 🧪 AIonopedia: an LLM agent orchestrating multimodal learning for ionic liquid discovery
[01:44] 🖼 UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation
[02:20] 🚀 Virtual Width Networks
[02:56] ⚡ LiteAttention: A Temporal Sparse Attention for Diffusion Transformers
[03:32] 🌐 Simulating the Visual World with Artificial Intelligence: A Roadmap
[04:12] 📐 GGBench: A Geometric Generative Reasoning Benchmark for Unified Multimodal Models
[05:00] 🧏 HI-TransPA: Hearing Impairments Translation Personal Assistant
[05:35] 🚀 MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
[06:38] 🎭 EmoVid: A Multimodal Emotion Video Dataset for Emotion-Centric Video Understanding and Generation
[07:18] 🧭 SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards
[07:55] 📊 Workload Schedulers -- Genesis, Algorithms and Differences
[08:51] 🚗 CATS-V2V: A Real-World Vehicle-to-Vehicle Cooperative Perception Dataset with Complex Adverse Traffic Scenarios

10 min

2025.11.13 | Genshin Impact data forges a 7B generalist agent; training-free trajectories become an instant video remote control

This episode covers the following 9 papers:
[00:19] 🌍 Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
[00:54] 🎬 Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising
[01:31] ⚡ TiDAR: Think in Diffusion, Talk in Autoregression
[02:15] 🔄 LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
[02:51] 🤖 WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
[03:33] 🖥 WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
[04:19] 🎯 Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance
[04:55] 🤖 Agentic Refactoring: An Empirical Study of AI Coding Agents
[05:31] 🛡 Stemming Hallucination in Language Models Using a Licensing Oracle

6 min

2025.11.12 | A 1.5B small model overtakes a 671B giant; multi-agent quality checks for chatbots

This episode covers the following 9 papers:
[00:24] 🧠 Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
[00:59] 🤝 Adaptive Multi-Agent Response Refinement in Conversational Systems
[01:30] 🧩 Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora
[02:17] ⚡ KLASS: KL-Guided Fast Inference in Masked Diffusion Models
[02:53] 🖥 Grounding Computer Use Agents on Human Demonstrations
[03:37] 🎥 VideoSSR: Video Self-Supervised Reinforcement Learning
[04:19] 🚪 The Path Not Taken: RLVR Provably Learns Off the Principals
[05:14] 🔗 BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives
[05:56] 🤹 Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

6 min

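2025.11.11 | Small context windows with frequent summaries reinvent deep research; casting a wide net before tackling hard problems boosts competitive coding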

This episode covers the following 13 papers:
[00:25] 🧩 IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
[01:16] 🏆 DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
[02:03] 🔬 The Station: An Open-World Environment for AI-Driven Discovery
[02:43] 🚀 RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
[03:15] 🧠 SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
[03:53] 🧭 Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
[04:30] 🔍 Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
[05:10] 🎬 MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
[05:50] 🎨 MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
[06:57] 🔄 RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
[07:36] 🤖 Robot Learning from a Physical World Model
[08:21] 🛠 NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling
[08:52] 🚀 SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

9 min

2025.10.27 | DeepAgent: unified single-pass reasoning + ToolPO; Video-As-Prompt gives a DiT instant control over a hundred kinds of semantics

This episode covers the following 15 papers:
[00:27] 🧠 DeepAgent: A General Reasoning Agent with Scalable Toolsets
[01:01] 🎬 Video-As-Prompt: Unified Semantic Control for Video Generation
[01:35] 🔧 From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model
[02:14] 🧩 Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
[02:51] 🧠 A Definition of AGI
[03:23] 🧩 Sparser Block-Sparse Attention via Token Permutation
[04:14] 🧭 UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
[04:57] 🧠 Reasoning with Sampling: Your Base Model is Smarter Than You Think
[05:30] 🧠 RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
[06:08] 📐 Visual Diffusion Models are Geometric Solvers
[06:56] 🌍 WorldGrow: Generating Infinite 3D World
[07:35] 🎬 RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
[08:14] 🔗 Model Merging with Functional Dual Anchors
[08:49] 🧭 Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
[09:34] 📊 Document Understanding, Measurement, and Manipulation Using Category Theory

10 min
