HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递) - Episode List

2026.06.05 | ArcANE framework quantifies character arcs; TIDE model delivers proactive insight

【Contents】 The 15 papers in this episode:
[00:31] 🎭 ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time?
[01:26] 🔍 TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration
[02:27] 🤖 AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
[03:14] 🎥 VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding
[04:09] 🤖 RobotValues: Evaluating Household Robots When Human Values Conflict
[05:01] 🌐 Reinforcement Learning Elicits Contextual Learning of Unseen Language Translation
[05:58] 🎬 LoomVideo: Unifying Multimodal Inputs into Video Generation and Editing
[06:49] 📸 Personal AI Agent for Camera Roll VQA
[07:36] 🧠 Rethinking Continual Experience Internalization for Self-Evolving LLM Agents
[08:27] ⚖ Complexity-Balanced Diffusion Splitting
[09:28] 🤖 Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
[10:33] 🔬 Unsupervised Skill Discovery for Agentic Data Analysis
[11:25] 🔍 LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs
[12:17] 🎯 Towards One-to-Many Temporal Grounding
[13:16] 💰 The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs

【Follow Us】 You can also find us on the following platform for more than the podcast itself:
Xiaohongshu: AI速递

【Sponsor】 OpenClaw快报: five minutes a day with OpenClaw快报 to catch up on the latest developments and industry discussion.
Link: https://www.xiaoyuzhoufm.com/podcast/6a1732a2dffa135d0ab5ef43

14 min
45
2 weeks ago
2026.06.04 | A unified omnimodal framework; real-time proactive audio interaction

【Contents】 The 15 papers in this episode:
[00:31] 🌌 Cosmos 3: Omnimodal World Models for Physical AI
[01:36] 🎧 Audio Interaction Model
[02:31] 🔍 Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories
[03:30] 🔍 Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning
[04:25] 🧭 OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs
[05:27] ⚡ Qwen-Image-Flash: Beyond Objective Design
[06:18] 🧠 M$^3$Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks
[07:13] 🎥 Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation
[08:14] 🧠 ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning
[09:08] 🧪 Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems
[10:15] ⚡ Streaming Communication in Multi-Agent Reasoning
[11:08] 🎯 Self-Distilled Policy Gradient
[12:13] 🧠 MemTrain: Self-Supervised Context Memory Training
[13:05] 🧩 Eliciting Complex Spatial Reasoning in MLLMs through Wide-Baseline Matching
[14:11] 🤖 MMG2Skill: Can Agents Distill In-the-Wild Guides into Self-Evolving Skills?

15 min
87
2 weeks ago
2026.06.03 | Trust regions teach small models; Humanoid-GPT tracks motion

【Contents】 The 15 papers in this episode:
[00:31] 🎯 Trust Region On-Policy Distillation
[01:17] 🤖 Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking
[02:07] 🧠 A Local Perturbation Theory for Cross-Domain Interference and Recovery in Multi-Domain RL
[03:06] 🧠 World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning
[03:57] 🏥 AutoMedBench: Towards Medical AutoResearch with Agentic AI Models
[05:09] 🖼 Decoupled Residual Denoising Diffusion Models for Unified and Data Efficient Image-to-Image Translation
[06:12] 😴 Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories
[07:09] 🧩 TRON: Targeted Rule-Verifiable Online Environments for Visual Reasoning RL
[08:07] 💬 $Ψ$-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues
[09:08] 🧩 Decentralized Instruction Tuning: Conflict-Aware Splitting and Weight Merging
[10:05] 🎯 Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling
[11:09] 📄 PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training
[12:14] 🗺 PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps
[13:16] 🔍 Diagnosing Harmful Continuation in Answer-Correct Long-CoT Training Traces
[14:05] 🎵 MERIT: Learning Disentangled Music Representations for Audio Similarity

15 min
82
2 weeks ago
2026.06.02 | Multi-agent framework generates editable figures; parameter-efficient fine-tuning supports a million personalized models

【Contents】 The 15 papers in this episode:
[00:33] 🎨 Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs
[01:39] 🧩 On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters
[02:35] 🧪 A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks
[03:25] 🌐 K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts
[04:21] ⚡ Draft-OPD: On-Policy Distillation for Speculative Draft Models
[05:10] 🎓 VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization
[06:18] 📡 X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding
[07:13] 🎬 VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
[07:59] 🤖 SkillAdaptor: Self-Adapting Skills for LLM Agents from Trajectories
[08:54] 🧠 Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models
[09:51] 🧠 NITP: Next Implicit Token Prediction for LLM Pre-training
[10:50] 👀 Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?
[11:46] 🎬 LVSA: Training-Free Sparse Attention for Long Video Diffusion
[12:38] 🛑 ESPO: Early-Stopping Proximal Policy Optimization
[13:37] 🎤 StreamChar: Long-Horizon Streaming Character Audio-Video Generation with Decoupled Orchestration

14 min
84
3 weeks ago
2026.06.01 | Knowledge distillation forges skills; Representation Forcing breaks the bottleneck

【Contents】 The 15 papers in this episode:
[00:30] 🧠 COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation
[01:17] 🧠 Representation Forcing for Bottleneck-Free Unified Multimodal Models
[02:07] 🎙 SwanVoice: Expressive Long-Form Zero-Shot Speech Synthesis for Both Monologue and Dialogue
[02:58] 🔍 LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
[03:59] 🎧 Towards Streaming Synchronized Spatial Audio Generation via Autoregressive Diffusion Transformer
[04:48] 🖼 GGT-100K: Generative Ground Truth for Generalizable Real-World Image Restoration
[05:39] 🎤 Comprehensive Benchmarking of Long-Form Speech Generation in Diverse Scenarios
[06:46] 🛋 Function2Scene: 3D Indoor Scene Layout from Functional Specifications
[07:36] 🎥 SANA-Streaming: Real-time Streaming Video Editing with Hybrid Diffusion Transformer
[08:29] 🧠 Task-Focused Memorization for Multimodal Agents
[09:30] 🤖 Exploring Autonomous Agentic Data Engineering for Model Specialization
[10:15] 🎓 Not All Disagreement Is Learnable: Token Teachability in On-Policy Distillation
[11:10] 🧩 dMoE: dLLMs with Learnable Block Experts
[12:12] 🛠 Recovering Policy-Induced Errors: Benchmarking and Trajectory Synthesis for Robust GUI Agents
[13:07] 🛡 From Prompt Injection to Persistent Control: Defending Agentic Harness Against Trojan Backdoors

14 min
99+
3 weeks ago
【Month-End Special】 May's Hottest AI Papers | Multi-agent world modeling; open-source robot VLA model

【Contents】 The 10 papers in this episode:
[00:45] TOP1 (🔥407) | 🌍 Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
[03:09] TOP2 (🔥347) | 🤖 MolmoAct2: Action Reasoning Models for Real-world Deployment
[05:30] TOP3 (🔥269) | 🔍 CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence
[07:51] TOP4 (🔥231) | 🧠 Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers
[10:04] TOP5 (🔥219) | 🏗 MinT: Managed Infrastructure for Training and Serving Millions of LLMs
[11:59] TOP6 (🔥217) | 🧠 Heterogeneous Scientific Foundation Model Collaboration
[14:17] TOP7 (🔥210) | 🤖 Code as Agent Harness
[16:26] TOP8 (🔥210) | 🧠 SkillOpt: Executive Strategy for Self-Evolving Agent Skills
[18:39] TOP9 (🔥204) | 🎯 DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards
[20:25] TOP10 (🔥195) | 🧠 Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

23 min
99+
3 weeks ago
2026.05.29 | AgentDoG 1.5 achieves millisecond-level safety protection; Qwen-VLA unifies cross-task action modeling

【Contents】 The 14 papers in this episode:
[00:25] 🛡 AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security
[01:06] 🤖 Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
[02:02] 🌐 OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources
[02:52] 🎨 CollectionLoRA: Collecting 50 Effects in 1 LoRA via Multi-Teacher On-Policy Distillation
[03:47] 🎬 minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models
[04:39] 🎥 YoCausal: How Far is Video Generation from World Model? A Causality Perspective
[05:42] 🎨 GenClaw: Code-Driven Agentic Image Generation
[06:40] ⚡ EarlyTom: Early Token Compression Completes Fast Video Understanding
[07:37] 🎯 UniSteer: Text-Guided Flow Matching in Activation Space for Versatile LLM Steering
[08:25] 🧠 How LoRA Remembers? A Parametric Memory Law for LLM Finetuning
[09:20] 🔗 LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
[10:24] 🔍 LaRA: Layer-wise Representation Analysis for Detecting Data Contamination in RL Post-Training
[11:16] 🧠 Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
[12:17] 🔍 Xetrieval: Mechanistically Explaining Dense Retrieval

13 min
99+
3 weeks ago
2026.05.28 | ProRL proactively guides recommendation; γ-World achieves zero-shot multi-agent generalization

【Contents】 The 15 papers in this episode:
[00:24] 🎯 ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
[01:27] 🌍 Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players
[02:28] 🤖 Agent Explorative Policy Optimization for Multimodal Agentic Reasoning
[03:24] 👁 From Pixels to Words -- Towards Native One-Vision Models at Scale
[04:19] 🔍 Self-Improving Language Models with Bidirectional Evolutionary Search
[05:01] 🧮 ResearchMath-14K: Scaling Research-Level Mathematics via Agents
[06:03] 🔍 MemTrace: Tracing and Attributing Errors in Large Language Model Memory Systems
[06:58] 🛠 DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes
[07:54] 🤖 GEM: Generative Supervision Helps Embodied Intelligence
[08:41] 🎯 Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents
[09:30] 🔗 ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence
[10:27] 🔬 AI Research Agents Narrow Scientific Exploration
[11:17] 🧠 Rethinking Memory as Continuously Evolving Connectivity
[12:15] 🎥 OSP-Next: Efficient High-Quality Video Generation with Sparse Sequence Parallelism, HiF8 Quantization, and Reinforcement Learning
[13:08] ⚖ Long Live The Balance: Information Bottleneck Driven Tree-based Policy Optimization

14 min
70
3 weeks ago
2026.05.27 | Parallel box decoding gives a tenfold speedup; spatial benchmark exposes model weaknesses

【Contents】 The 15 papers in this episode:
[00:24] 🎯 LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
[01:13] 🧩 SpatialBench: Is Your Spatial Foundation Model an All-Round Player?
[02:07] 🎬 EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation
[03:06] 📱 MobileGym: A Verifiable and Highly Parallel Simulation Platform for Mobile GUI Agent Research
[04:05] 🏗 Geometry-Aware Representation Denoising for Robust Multi-view 3D Reconstruction
[05:00] 🎬 LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV
[05:59] 🛡 $D^2$-Monitor: Dynamic Safety Monitoring for Diffusion LLMs via Hesitation-Aware Routing
[06:54] 🤖 The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence
[07:51] 🤝 Share More, Search Less: Collaborative Parallel Thinking for Efficient Test-Time Scaling
[08:46] 🎬 Soap2Soap: Long Cinematic Video Remaking via Multi-Agent Collaboration
[09:46] 👁 LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence
[10:37] 🤖 VitaBench 2.0: Evaluating Personalized and Proactive Agents in Long-Term User Interactions
[11:42] 👁 Does Seeing More Mean Knowing More? Mono-Anchored Advantage Normalization for Multi-Source Visual Reasoning
[12:34] 🔮 JLT: Clean-Latent Prediction in Latent Diffusion Transformers
[13:17] 🧠 Efficient Agentic Reinforcement Learning with On-Policy Intrinsic Knowledge Boundary Enhancement

14 min
72
3 weeks ago
2026.05.26 | DVAO dynamically balances multiple objectives; WBench fills the gap in interactive evaluation

【Contents】 The 15 papers in this episode:
[00:25] 🎯 DVAO: Dynamic Variance-adaptive Advantage Optimization for Multi-reward Reinforcement Learning
[01:15] 🎬 WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
[02:13] 🖥 Macaron-A2UI: A Model for Generative UI in Personal Agents
[02:56] 🤝 Foundation Protocol: A Coordination Layer for Agentic Society
[04:02] 🔺 TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction
[05:05] 🎬 ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
[06:00] 🧠 Toward Native Multimodal Modeling: A Roadmap
[06:50] 🔍 QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks
[07:43] 🎯 ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention
[08:50] 🔬 AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery
[09:46] 🧠 Your Embedding Model is SMARTer Than You Think
[10:25] 💡 ControlLight: Towards Controllable, Consistent, and Generalizable Low-Light Enhancement
[11:22] 🌐 Pantheon360: Taming Digital Twin Generation via 3D-Aware 360° Video Diffusion
[12:09] 🤖 CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
[13:01] 🤖 Anticipate and Learn: Unleashing Idle-Time Compute in Proactive Agents

14 min
70
4 weeks ago
2026.05.25 | SkillOpt enables self-evolving skills; Lens improves text-to-image training efficiency

【Contents】 The 15 papers in this episode:
[00:25] 🧠 SkillOpt: Executive Strategy for Self-Evolving Agent Skills
[01:16] 🔍 Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
[02:20] 🔀 Rethinking Cross-Layer Information Routing in Diffusion Transformers
[03:01] 🧠 SciAtlas: A Large-Scale Knowledge Graph for Automated Scientific Research
[03:56] 🎙 StepAudio 2.5 Technical Report
[04:51] 👁 See What I Mean: Aligning Vision and Language Representations for Video Fine-grained Object Understanding
[05:50] 📸 PhotoFlow: Agentic 3D Virtual Photography Missions
[06:29] 🧠 From Raw Experience to Skill Consumption: A Systematic Study of Model-Generated Agent Skills
[07:28] 🎥 VGenST-Bench: A Benchmark for Spatio-Temporal Reasoning via Active Video Synthesis
[08:29] ⚡ PiD: Fast and High-Resolution Latent Decoding with Pixel Diffusion
[09:35] 🎨 RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution
[10:30] ✂ ETCHR: Editing To Clarify and Harness Reasoning
[11:26] 🎮 SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
[12:14] 📡 LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws
[13:03] 🎥 Geo-Align: Video Generation Alignment via Metric Geometry Reward

14 min
99+
4 weeks ago
