HuggingFace Daily AI Paper Express - Episode List

[Contents] The 15 papers in this episode:
[00:25] 🧠 Heterogeneous Scientific Foundation Model Collaboration
[01:24] 🌍 Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
[02:04] 🧬 Co-Evolving Policy Distillation
[02:47] 🤖 ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
[03:38] 🚀 Efficient Training on Multiple Consumer GPUs with RoundPipe
[04:17] 🧠 Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
[05:08] 🎨 Leveraging Verifier-Based Reinforcement Learning in Image Editing
[06:18] 📏 Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
[07:15] 🔬 Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
[08:31] 🌐 InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?
[09:15] 🎨 Representation Fréchet Loss for Visual Generation
[10:05] 🖥 Synthetic Computers at Scale for Long-Horizon Productivity Simulation
[10:52] 🧠 Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models
[11:25] 🤖 The Last Human-Written Paper: Agent-Native Research Artifacts
[12:14] 💃 MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
[Follow us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递

67

2026.04.30 | GLM-5V trains multimodal capabilities in one pass; latent-distillation sampling saves samples

[Contents] The 11 papers in this episode:
[00:22] 🤖 GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
[01:26] 🔬 Large Language Models Explore by Latent Distilling
[02:16] 🌊 Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
[03:02] 🦾 ClawGym: A Scalable Framework for Building Effective Claw Agents
[03:49] 🤖 RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
[04:35] 🧩 Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion
[05:20] 🚀 Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
[06:08] 🌍 Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
[07:02] 💬 A Survey on LLM-based Conversational User Simulation
[07:55] 👗 FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing
[08:43] 🧩 Probing Visual Planning in Image Editing Models

9 minutes

52

2026.04.29 | Nested recursive multi-agent systems speed things up; programming with data brings Git-style self-improvement

[Contents] The 15 papers in this episode:
[00:25] 🔄 Recursive Multi-Agent Systems
[01:01] 🔧 Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
[01:55] 📊 DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
[02:36] 🔬 AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
[03:23] 🖼 Meta-CoT: Enhancing Granularity and Generalization in Image Editing
[04:07] 🎨 Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
[05:03] 🎥 Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
[05:46] 🎧 Step-Audio-R1.5 Technical Report
[06:26] 🎬 Co-Director: Agentic Generative Video Storytelling
[07:13] 🖥 Toward Scalable Terminal Task Synthesis via Skill Graphs
[07:57] 🎓 TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
[08:53] 🛡 BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate
[09:36] 🎓 MAIC-UI: Making Interactive Courseware with Generative UI
[10:35] 🎨 V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
[11:15] 🏃 IAM: Identity-Aware Human Motion and Shape Joint Generation

12 minutes

88

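2026.04.28 | Reinforcement learning forces geometrically consistent video; an AI company assembles teams LEGO-style to cut costs and raise efficiency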

[Contents] The 15 papers in this episode:
[00:24] 🌍 World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
[01:29] 🏢 From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
[02:26] 🧠 ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
[03:23] 🛡 Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
[04:12] 🖼 Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
[05:02] 🤖 ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
[06:20] ✍ SketchVLM: Vision language models can annotate images to explain thoughts and guide users
[07:17] 🔬 Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
[08:24] ⚖ Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment
[09:20] 🔀 Efficient Agent Evaluation via Diversity-Guided User Simulation
[10:02] ⚡ For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
[11:04] 🎬 OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer
[12:03] 📷 UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
[12:49] 📄 TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction
[13:56] 🔄 How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models

15 minutes

2026.04.27 | A coordinate framework unifies world models; diffusion reconstruction speeds up clinical CT

[Contents] The 11 papers in this episode:
[00:31] 🌍 Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
[01:24] 🩻 DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction
[02:10] 🛡 LLM Safety From Within: Detecting Harmful Content with Internal Representations
[02:50] 🎬 FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing
[03:34] 📚 Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
[04:23] 🔍 AgentSearchBench: A Benchmark for AI Agent Search in the Wild
[05:03] 🎬 Building a Precise Video Language with Human-AI Oversight
[06:11] 🤖 dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
[06:52] 🔍 Sessa: Selective State Space Attention
[07:32] 🌾 AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
[08:19] 🔦 Learning Evidence Highlighting for Frozen LLMs

9 minutes

88

[Weekend Special] Hottest AI papers of the 4th week of April | Tstars-Tryon tops virtual try-on; LLaDA2.0-Uni unifies multimodal generation

[Contents] The 5 papers in this episode:
00:31 TOP1(🔥244) | 👗 Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
02:42 TOP2(🔥229) | 🔮 LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
05:07 TOP3(🔥154) | 🤖 AgentSPEX: An Agent SPecification and EXecution Language
07:06 TOP4(🔥96) | 🚀 Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
08:48 TOP5(🔥84) | 🚗 OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

11 minutes

2026.04.24 | LLaTiSA teaches models to read time series through four difficulty levels; WorldMark tests video world models with a unified benchmark

[Contents] The 15 papers in this episode:
00:23 📈 LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
01:11 🎮 WorldMark: A Unified Benchmark Suite for Interactive Video World Models
01:54 🤖 UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
02:44 🎨 StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
03:56 ⏩ Seeing Fast and Slow: Learning the Flow of Time in Videos
04:39 ⚡ TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
05:16 🧠 Hybrid Policy Distillation for LLMs
05:48 🧠 Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
06:44 🤖 VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
07:43 🧩 Context Unrolling in Omni Models
08:31 🎨 EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
09:34 🔗 UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
10:25 🌐 WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning
11:14 🔍 Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models
12:11 🔍 Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

91

2026.04.23 | LLaDA2.0 unifies multimodal understanding and generation; future experience as a plug-in for RL

[Contents] The 15 papers in this episode:
00:28 🔮 LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
01:17 🔮 Near-Future Policy Optimization
02:07 🤖 DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
02:53 🤖 DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
03:42 🎭 Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
04:36 🧠 Exploring Spatial Intelligence from a Generative Perspective
05:21 🤖 A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
06:18 🎤 WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training
07:06 🤖 SWE-chat: Coding Agent Interactions From Real Users in the Wild
07:53 🤖 Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
08:36 🧠 Convergent Evolution: How Different Language Models Learn Similar Number Representations
09:21 🤝 SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
09:57 🎬 ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
10:34 🔧 Visual Reasoning through Tool-supervised Reinforcement Learning
11:09 🤖 AI scientists produce results without reasoning scientifically

12 minutes

2026.04.22 | Virtual try-on generates high-definition results in 3.9 seconds; co-generation keeps HOI videos physically consistent

[Contents] The 15 papers in this episode:
00:23 👗 Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
01:05 🎬 CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
01:58 🤖 AgentSPEX: An Agent SPecification and EXecution Language
02:51 📐 AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
03:33 🚀 TEMPO: Scaling Test-time Training for Large Reasoning Models
04:26 🎮 PlayCoder: Making LLM-Generated GUI Code Playable
05:08 🕶 ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning
05:58 🤖 Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language
06:44 ⚖ AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation
07:31 🔄 Dual-View Training for Instruction-Following Information Retrieval
08:41 🔍 Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers
09:20 🔗 Understanding and Enforcing Weight Disentanglement in Task Arithmetic
10:00 ⚡ Speculative Decoding for Autoregressive Video Generation
11:01 🧠 Target-Oriented Pretraining Data Selection via Neuron-Activated Graph
11:41 🧩 UniMesh: Unifying 3D Mesh Understanding and Generation

12 minutes

2026.04.21 | One-step image generation that understands full sentences; single-step latent codes handle driving reasoning

[Contents] The 15 papers in this episode:
00:24 🚀 Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
01:08 🚗 OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
01:54 🤖 Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
02:41 🎮 OpenGame: Open Agentic Coding for Games
03:48 🤖 MultiWorld: Scalable Multi-Agent Multi-View Video World Models
04:44 🎬 EasyVideoR1: Easier RL for Video Understanding
05:42 🧭 WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
06:46 🧠 GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
07:34 🧠 SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
08:22 🧩 Crowded in B-Space: Calibrating Shared Directions for LoRA Merging
09:13 🧠 When Can LLMs Learn to Reason with Weak Supervision?
10:04 🤖 ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
10:52 🎬 OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
11:35 🧬 Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
12:26 🧮 MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

2026.04.20 | A training-free image-quality boost for DPMs; two bit flips wreck a model

[Contents] The 15 papers in this episode:
00:20 🔍 Elucidating the SNR-t Bias of Diffusion Probabilistic Models
01:00 💥 Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
01:45 🧠 PersonaVLM: Long-Term Personalized Multimodal LLMs
02:56 🧩 Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems
03:40 ✂ Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
04:32 🚀 Qwen3.5-Omni Technical Report
05:17 🧱 Repurposing 3D Generative Model for Autoregressive Layout Generation
06:02 🔍 (1D) Ordered Tokens Enable Efficient Test-Time Search
06:55 📈 QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies
07:36 🧠 Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
08:29 🔍 TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment
09:33 💡 Can Large Language Models Reinvent Foundational Algorithms?
10:17 📊 GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows
11:10 ⚡ AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
11:55 🎭 Hierarchical Codec Diffusion for Video-to-Speech Generation