2025.11.13 | Genshin Impact data forges a 7B generalist AI; training-free trajectories turn into an instant video remote control

HuggingFace Daily AI Paper Express (HuggingFace 每日AI论文速递)

The 9 papers in this episode:
[00:19] 🌍 Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds
[00:54] 🎬 Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising
[01:31] ⚡ TiDAR: Think in Diffusion, Talk in Autoregression
[02:15] 🔄 LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls
[02:51] 🤖 WMPO: World Model-based Policy Optimization for Vision-Language-Action Models
[03:33] 🖥 WebVIA: A Web-based Vision-Language Agentic Framework for Interactive and Verifiable UI-to-Code Generation
[04:19] 🎯 Toward the Frontiers of Reliable Diffusion Sampling via Adversarial Sinkhorn Attention Guidance
[04:55] 🤖 Agentic Refactoring: An Empirical Study of AI Coding Agents
[05:31] 🛡 Stemming Hallucination in Language Models Using a Licensing Oracle
[Follow us] You can also find us on the following platforms for more information beyond the podcast. Xiaohongshu: AI速递

6 min
99+
1 month ago

2025.11.12 | A 1.5B small model overtakes a 671B large model; multi-agent quality checks for chatbots

HuggingFace Daily AI Paper Express

The 9 papers in this episode:
[00:24] 🧠 Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
[00:59] 🤝 Adaptive Multi-Agent Response Refinement in Conversational Systems
[01:30] 🧩 Wasm: A Pipeline for Constructing Structured Arabic Interleaved Multimodal Corpora
[02:17] ⚡ KLASS: KL-Guided Fast Inference in Masked Diffusion Models
[02:53] 🖥 Grounding Computer Use Agents on Human Demonstrations
[03:37] 🎥 VideoSSR: Video Self-Supervised Reinforcement Learning
[04:19] 🚪 The Path Not Taken: RLVR Provably Learns Off the Principals
[05:14] 🔗 BiCA: Effective Biomedical Dense Retrieval with Citation-Aware Hard Negatives
[05:56] 🤹 Walking the Tightrope of LLMs for Software Development: A Practitioners' Perspective

6 min
99+
1 month ago

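2025.11.11 | Small windows with frequent summaries advance deep research; casting a wide net before tackling hard problems boosts competitive coding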

HuggingFace Daily AI Paper Express

The 13 papers in this episode:
[00:25] 🧩 IterResearch: Rethinking Long-Horizon Agents via Markovian State Reconstruction
[01:16] 🏆 DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation
[02:03] 🔬 The Station: An Open-World Environment for AI-Driven Discovery
[02:43] 🚀 RedOne 2.0: Rethinking Domain-specific LLM Post-Training in Social Networking Services
[03:15] 🧠 SofT-GRPO: Surpassing Discrete-Token LLM Reinforcement Learning via Gumbel-Reparameterized Soft-Thinking Policy Optimization
[03:53] 🧭 Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
[04:30] 🔍 Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads
[05:10] 🎬 MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
[05:50] 🎨 MPJudge: Towards Perceptual Assessment of Music-Induced Paintings
[06:57] 🔄 RLoop: An Self-Improving Framework for Reinforcement Learning with Iterative Policy Initialization
[07:36] 🤖 Robot Learning from a Physical World Model
[08:21] 🛠 NURBGen: High-Fidelity Text-to-CAD Generation through LLM-Driven NURBS Modeling
[08:52] 🚀 SWE-fficiency: Can Language Models Optimize Real-World Repositories on Real Workloads?

9 min
99+
1 month ago

2025.10.27 | DeepAgent single-pass reasoning plus ToolPO; a video-as-prompt DiT controls a hundred semantics in seconds

HuggingFace Daily AI Paper Express

The 15 papers in this episode:
[00:27] 🧠 DeepAgent: A General Reasoning Agent with Scalable Toolsets
[01:01] 🎬 Video-As-Prompt: Unified Semantic Control for Video Generation
[01:35] 🔧 From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model
[02:14] 🧩 Sample By Step, Optimize By Chunk: Chunk-Level GRPO For Text-to-Image Generation
[02:51] 🧠 A Definition of AGI
[03:23] 🧩 Sparser Block-Sparse Attention via Token Permutation
[04:14] 🧭 UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
[04:57] 🧠 Reasoning with Sampling: Your Base Model is Smarter Than You Think
[05:30] 🧠 RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
[06:08] 📐 Visual Diffusion Models are Geometric Solvers
[06:56] 🌍 WorldGrow: Generating Infinite 3D World
[07:35] 🎬 RAPO++: Cross-Stage Prompt Optimization for Text-to-Video Generation via Data Alignment and Test-Time Scaling
[08:14] 🔗 Model Merging with Functional Dual Anchors
[08:49] 🧭 Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs
[09:34] 📊 Document Understanding, Measurement, and Manipulation Using Category Theory

10 min
99+
1 month ago

2025.10.24 | AdaSPEC selects 40% of tokens for a 20% speedup; AutoPage generates interactive web pages for 10 cents

HuggingFace Daily AI Paper Express

The 15 papers in this episode:
[00:23] 🎯 AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
[00:57] 🤖 Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1
[01:35] 🔍 Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
[02:06] 🎬 HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives
[02:52] 🌀 Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall
[03:33] 💎 Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values
[04:06] ⚖ The Massive Legal Embedding Benchmark (MLEB)
[04:48] 🔍 DyPE: Dynamic Position Extrapolation for Ultra High Resolution Diffusion
[05:33] 🕵 Conan: Progressive Learning to Reason Like a Detective over Multi-Scale Visual Evidence
[06:12] 🤖 Search Self-play: Pushing the Frontier of Agent Capability without Supervision
[06:56] 🎭 Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
[07:42] 🖼 LayerComposer: Interactive Personalized T2I via Spatially-Aware Layered Canvas
[08:10] 🎧 SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
[08:51] 🖼 ARGenSeg: Image Segmentation with Autoregressive Image Generation Model
[09:39] 🧩 Seed3D 1.0: From Images to High-Fidelity Simulation-Ready 3D Assets

10 min
99+
1 month ago

2025.11.07 | A new paradigm for video reasoning; interacting with images to drive thinking

HuggingFace Daily AI Paper Express

The 12 papers in this episode:
[00:21] 🎬 Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm
[00:58] 🧠 V-Thinker: Interactive Thinking with Images
[01:39] 🧠 Scaling Agent Learning via Experience Synthesis
[02:23] 🧠 Cambrian-S: Towards Spatial Supersensing in Video
[03:06] 🖥 GUI-360: A Comprehensive Dataset and Benchmark for Computer-Using Agents
[03:51] 📄 NVIDIA Nemotron Nano V2 VL (an efficient vision-language model for document and long-video understanding)
[04:28] 🎟 The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms
[05:12] 🕵 Benchmark Designers Should "Train on the Test Set" to Expose Exploitable Non-Visual Shortcuts
[05:48] ⚽ Learning Vision-Driven Reactive Soccer Skills for Humanoid Robots
[06:18] 🔍 Contamination Detection for VLMs using Multi-Modal Semantic Perturbation
[06:53] 🎧 How to Evaluate Speech Translation with Source-Aware Neural MT Metrics
[07:32] 🚀 RDMA Point-to-Point Communication for LLM Systems

8 min
95
1 month ago

2025.11.06 | Diffusion models get more out of limited data; lip-synced audio-video generation

HuggingFace Daily AI Paper Express

The 9 papers in this episode:
[00:17] 🚀 Diffusion Language Models are Super Data Learners
[01:06] 🎬 UniAVGen: Unified Audio and Video Generation with Asymmetric Cross-Modal Interactions
[01:42] 🧩 LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation
[02:25] 📊 Orion-MSP: Multi-Scale Sparse Attention for Tabular In-Context Learning
[03:15] 📊 TabTune: A Unified Library for Inference and Fine-Tuning Tabular Foundation Models
[03:46] 🦾 Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects
[04:30] 🧠 MME-CC: A Challenging Multi-Modal Evaluation Benchmark of Cognitive Capacity
[05:06] 📈 LiveTradeBench: Seeking Real-World Alpha with Large Language Models
[05:55] 🔍 Let Multimodal Embedders Learn When to Augment Query via Adaptive Query Augmentation

7 min
64
1 month ago