HuggingFace Daily AI Paper Digest - Episode List

2026.04.27 | A Coordinate System to Unify World Models; Diffusion Reconstruction Speeds Up Clinical CT

[Contents] The 11 papers in this episode:
[00:31] 🌍 Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
[01:24] 🩻 DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction
[02:10] 🛡 LLM Safety From Within: Detecting Harmful Content with Internal Representations
[02:50] 🎬 FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing
[03:34] 📚 Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
[04:23] 🔍 AgentSearchBench: A Benchmark for AI Agent Search in the Wild
[05:03] 🎬 Building a Precise Video Language with Human-AI Oversight
[06:11] 🤖 dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
[06:52] 🔍 Sessa: Selective State Space Attention
[07:32] 🌾 AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
[08:19] 🔦 Learning Evidence Highlighting for Frozen LLMs
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

9 minutes
97
1 month ago
2026.04.24 | LLaTiSA Teaches Models to Read Time Series Through Four Difficulty Levels; WorldMark Tests Video World Models with a Unified Benchmark

[Contents] The 15 papers in this episode:
00:23 📈 LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
01:11 🎮 WorldMark: A Unified Benchmark Suite for Interactive Video World Models
01:54 🤖 UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
02:44 🎨 StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
03:56 ⏩ Seeing Fast and Slow: Learning the Flow of Time in Videos
04:39 ⚡ TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
05:16 🧠 Hybrid Policy Distillation for LLMs
05:48 🧠 Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
06:44 🤖 VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
07:43 🧩 Context Unrolling in Omni Models
08:31 🎨 EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
09:34 🔗 UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
10:25 🌐 WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning
11:14 🔍 Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models
12:11 🔍 Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

13 minutes
99+
2 months ago
2026.04.23 | LLaDA2.0 Unifies Multimodality; Future Experience as a Plug-In for RL

[Contents] The 15 papers in this episode:
00:28 🔮 LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
01:17 🔮 Near-Future Policy Optimization
02:07 🤖 DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
02:53 🤖 DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
03:42 🎭 Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
04:36 🧠 Exploring Spatial Intelligence from a Generative Perspective
05:21 🤖 A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
06:18 🎤 WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training
07:06 🤖 SWE-chat: Coding Agent Interactions From Real Users in the Wild
07:53 🤖 Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
08:36 🧠 Convergent Evolution: How Different Language Models Learn Similar Number Representations
09:21 🤝 SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
09:57 🎬 ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
10:34 🔧 Visual Reasoning through Tool-supervised Reinforcement Learning
11:09 🤖 AI scientists produce results without reasoning scientifically
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

12 minutes
99+
2 months ago
2026.04.22 | Virtual Try-On Generates High-Definition Results in 3.9 Seconds; Co-Generation Makes HOI Videos Physically Consistent

[Contents] The 15 papers in this episode:
00:23 👗 Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
01:05 🎬 CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
01:58 🤖 AgentSPEX: An Agent SPecification and EXecution Language
02:51 📐 AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
03:33 🚀 TEMPO: Scaling Test-time Training for Large Reasoning Models
04:26 🎮 PlayCoder: Making LLM-Generated GUI Code Playable
05:08 🕶 ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning
05:58 🤖 Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language
06:44 ⚖ AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation
07:31 🔄 Dual-View Training for Instruction-Following Information Retrieval
08:41 🔍 Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers
09:20 🔗 Understanding and Enforcing Weight Disentanglement in Task Arithmetic
10:00 ⚡ Speculative Decoding for Autoregressive Video Generation
11:01 🧠 Target-Oriented Pretraining Data Selection via Neuron-Activated Graph
11:41 🧩 UniMesh: Unifying 3D Mesh Understanding and Generation
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

12 minutes
99+
2 months ago
2026.04.21 | One-Step Image Generation from Sentences; Single-Step Latent Codes Handle Driving Reasoning

[Contents] The 15 papers in this episode:
00:24 🚀 Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
01:08 🚗 OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
01:54 🤖 Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
02:41 🎮 OpenGame: Open Agentic Coding for Games
03:48 🤖 MultiWorld: Scalable Multi-Agent Multi-View Video World Models
04:44 🎬 EasyVideoR1: Easier RL for Video Understanding
05:42 🧭 WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
06:46 🧠 GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
07:34 🧠 SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
08:22 🧩 Crowded in B-Space: Calibrating Shared Directions for LoRA Merging
09:13 🧠 When Can LLMs Learn to Reason with Weak Supervision?
10:04 🤖 ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
10:52 🎬 OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
11:35 🧬 Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
12:26 🧮 MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

13 minutes
99+
2 months ago
2026.04.20 | A Training-Free Image Quality Boost for DPMs; Two Bit Flips Wreck a Model

[Contents] The 15 papers in this episode:
00:20 🔍 Elucidating the SNR-t Bias of Diffusion Probabilistic Models
01:00 💥 Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
01:45 🧠 PersonaVLM: Long-Term Personalized Multimodal LLMs
02:56 🧩 Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems
03:40 ✂ Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
04:32 🚀 Qwen3.5-Omni Technical Report
05:17 🧱 Repurposing 3D Generative Model for Autoregressive Layout Generation
06:02 🔍 (1D) Ordered Tokens Enable Efficient Test-Time Search
06:55 📈 QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies
07:36 🧠 Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
08:29 🔍 TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment
09:33 💡 Can Large Language Models Reinvent Foundational Algorithms?
10:17 📊 GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows
11:10 ⚡ AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
11:55 🎭 Hierarchical Codec Diffusion for Video-to-Speech Generation
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

13 minutes
99+
2 months ago
2026.04.17 | HY-World2.0 Unifies Generation and Reconstruction; DR³-Eval Builds a Reproducible Research Benchmark

[Sponsor] Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news. Link: 🔗www.xiaoyuzhoufm.com
[Contents] The 15 papers in this episode:
00:31 🌍 HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds
01:24 🔬 DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation
02:16 🚗 RAD-2: Scaling Reinforcement Learning in a Generator-Discriminator Framework
03:15 🤖 HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
04:04 🛡 ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
05:02 🧠 How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data
05:37 🌍 GlobalSplat: Efficient Feed-Forward 3D Gaussian Splatting via Global Scene Tokens
06:20 🔍 UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards
06:56 🧠 Switch-KD: Visual-Switch Knowledge Distillation for Vision-Language Models
07:34 🛣 TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification
08:41 🤖 Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
09:25 🎬 Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction
10:13 🧭 Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
11:03 🚀 LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning
11:45 ⚡ KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

13 minutes
99+
2 months ago
2026.04.16 | Seedance 2.0 Unifies Multimodal Generation; RationalRewards Makes Reward Models Reason

[Sponsor] Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news. Link: 🔗www.xiaoyuzhoufm.com
[Contents] The 15 papers in this episode:
00:31 🎬 Seedance 2.0: Advancing Video Generation for World Complexity
01:21 🧠 RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
02:08 🧠 SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
02:49 🧪 OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models
03:39 🎮 GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
04:22 🧠 Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
05:21 🧠 From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space
06:16 🎯 Target Policy Optimization
07:12 🧩 Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure
08:04 🤖 SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering
08:40 🔍 Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself
09:26 🔍 TIP: Token Importance in On-Policy Distillation
10:21 🔬 ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video
11:15 🔍 UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
12:17 🤖 TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

13 minutes
99+
2 months ago
2026.04.15 | ClawGUI's Open-Source Full Suite; KnowRL Boosts Efficiency with Leaner Hints

[Sponsor] Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news. Link: 🔗www.xiaoyuzhoufm.com
[Contents] The 15 papers in this episode:
00:33 🤖 ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents
01:21 🧠 KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
02:16 🧠 Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
03:09 🤖 Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
04:01 🧠 SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
04:47 🤖 Toward Autonomous Long-Horizon Engineering for ML Research
05:33 ⚖ BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
06:17 🔍 Towards Long-horizon Agentic Multimodal Search
06:57 🌍 Lyra 2.0: Explorable Generative 3D Worlds
07:40 ⚡ Self-Adversarial One Step Generation via Condition Shifting
08:37 🤖 Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting
09:20 ⚖ Many-Tier Instruction Hierarchy in LLM Agents
10:04 🚀 Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
10:52 🧠 Rethinking the Diffusion Model from a Langevin Perspective
11:44 🤖 LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

12 minutes
99+
2 months ago
2026.04.14 | Memory of Penalized Errors Helps Large Models Score Higher; Attention Sink Mechanisms Fully Explained

[Sponsor] Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's major AI news. Link: 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
[Contents] The 15 papers in this episode:
[00:31] 🧠 The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
[01:20] 🔍 Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
[02:08] ⚛ QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
[02:59] 🎬 OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
[03:35] 🎨 Strips as Tokens: Artist Mesh Generation with Native UV Segmentation
[04:11] 🎬 Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
[05:13] 🔍 Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
[05:57] 🔍 CodeTracer: Towards Traceable Agent States
[06:45] 🧪 CocoaBench: Evaluating Unified Digital Agents in the Wild
[07:32] 🕸 Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
[08:17] 🤔 Introspective Diffusion Language Models
[09:12] 🧠 Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
[09:50] 🎬 Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation
[10:38] 🎵 Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
[11:33] ⚡ SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
[Follow us] You can also find us on the following platform for more beyond the podcast. Xiaohongshu: AI速递

13 minutes
99+
2 months ago
