HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递) - Episode List

2026.05.01 | Eywa pairs LLMs with domain models for a 30% efficiency gain; visual generation's five-level climb is still stuck at level three

HuggingFace Daily AI Paper Digest

[Contents] The 15 papers in this episode:
[00:25] 🧠 Heterogeneous Scientific Foundation Model Collaboration
[01:24] 🌍 Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
[02:04] 🧬 Co-Evolving Policy Distillation
[02:47] 🤖 ExoActor: Exocentric Video Generation as Generalizable Interactive Humanoid Control
[03:38] 🚀 Efficient Training on Multiple Consumer GPUs with RoundPipe
[04:17] 🧠 Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
[05:08] 🎨 Leveraging Verifier-Based Reinforcement Learning in Image Editing
[06:18] 📏 Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling
[07:15] 🔬 Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists
[08:31] 🌐 InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation?
[09:15] 🎨 Representation Fréchet Loss for Visual Generation
[10:05] 🖥 Synthetic Computers at Scale for Long-Horizon Productivity Simulation
[10:52] 🧠 Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models
[11:25] 🤖 The Last Human-Written Paper: Agent-Native Research Artifacts
[12:14] 💃 MoCapAnything V2: End-to-End Motion Capture for Arbitrary Skeletons
[Follow us] You can also find us on the following platform for more than the podcast itself:
Xiaohongshu: AI速递

13 minutes
67
1 week ago

2026.04.30 | GLM-5V trains multimodality all in one pot; latent-distillation sampling saves samples

[Contents] The 11 papers in this episode:
[00:22] 🤖 GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
[01:26] 🔬 Large Language Models Explore by Latent Distilling
[02:16] 🌊 Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
[03:02] 🦾 ClawGym: A Scalable Framework for Building Effective Claw Agents
[03:49] 🤖 RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments
[04:35] 🧩 Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion
[05:20] 🚀 Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding
[06:08] 🌍 Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising
[07:02] 💬 A Survey on LLM-based Conversational User Simulation
[07:55] 👗 FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing
[08:43] 🧩 Probing Visual Planning in Image Editing Models

9 minutes
52
1 week ago

2026.04.29 | Recursive multi-agent nesting speeds things up; programming with data brings Git-style self-improvement

[Contents] The 15 papers in this episode:
[00:25] 🔄 Recursive Multi-Agent Systems
[01:01] 🔧 Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
[01:55] 📊 DV-World: Benchmarking Data Visualization Agents in Real-World Scenarios
[02:36] 🔬 AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
[03:23] 🖼 Meta-CoT: Enhancing Granularity and Generalization in Image Editing
[04:07] 🎨 Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
[05:03] 🎥 Mutual Forcing: Dual-Mode Self-Evolution for Fast Autoregressive Audio-Video Character Generation
[05:46] 🎧 Step-Audio-R1.5 Technical Report
[06:26] 🎬 Co-Director: Agentic Generative Video Storytelling
[07:13] 🖥 Toward Scalable Terminal Task Synthesis via Skill Graphs
[07:57] 🎓 TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents
[08:53] 🛡 BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate
[09:36] 🎓 MAIC-UI: Making Interactive Courseware with Generative UI
[10:35] 🎨 V-GRPO: Online Reinforcement Learning for Denoising Generative Models Is Easier than You Think
[11:15] 🏃 IAM: Identity-Aware Human Motion and Shape Joint Generation

12 minutes
88
1 week ago

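2026.04.28 | Reinforcement learning forces geometrically consistent video; an AI company assembles agent teams LEGO-style to cut costs and raise efficiency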

[Contents] The 15 papers in this episode:
[00:24] 🌍 World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
[01:29] 🏢 From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company
[02:26] 🧠 ReVSI: Rebuilding Visual Spatial Intelligence Evaluation for Accurate Assessment of VLM 3D Reasoning
[03:23] 🛡 Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
[04:12] 🖼 Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
[05:02] 🤖 ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents
[06:20] ✍ SketchVLM: Vision language models can annotate images to explain thoughts and guide users
[07:17] 🔬 Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis
[08:24] ⚖ Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment
[09:20] 🔀 Efficient Agent Evaluation via Diversity-Guided User Simulation
[10:02] ⚡ For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs
[11:04] 🎬 OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer
[12:03] 📷 UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models
[12:49] 📄 TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction
[13:56] 🔄 How Much Is One Recurrence Worth? Iso-Depth Scaling Laws for Looped Language Models

15 minutes
99+
1 week ago

2026.04.27 | A coordinate system to unify world models; diffusion reconstruction speeds up clinical CT

[Contents] The 11 papers in this episode:
[00:31] 🌍 Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
[01:24] 🩻 DiffNR: Diffusion-Enhanced Neural Representation Optimization for Sparse-View 3D Tomographic Reconstruction
[02:10] 🛡 LLM Safety From Within: Detecting Harmful Content with Internal Representations
[02:50] 🎬 FlowAnchor: Stabilizing the Editing Signal for Inversion-Free Video Editing
[03:34] 📚 Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
[04:23] 🔍 AgentSearchBench: A Benchmark for AI Agent Search in the Wild
[05:03] 🎬 Building a Precise Video Language with Human-AI Oversight
[06:11] 🤖 dWorldEval: Scalable Robotic Policy Evaluation via Discrete Diffusion World Model
[06:52] 🔍 Sessa: Selective State Space Attention
[07:32] 🌾 AgriIR: A Scalable Framework for Domain-Specific Knowledge Retrieval
[08:19] 🔦 Learning Evidence Highlighting for Frozen LLMs

9 minutes
88
1 week ago

2026.04.24 | LLaTiSA teaches models to read time series through four difficulty levels; WorldMark benchmarks video world models with a unified suite

[Contents] The 15 papers in this episode:
[00:23] 📈 LLaTiSA: Towards Difficulty-Stratified Time Series Reasoning from Visual Perception to Semantics
[01:11] 🎮 WorldMark: A Unified Benchmark Suite for Interactive Video World Models
[01:54] 🤖 UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling
[02:44] 🎨 StyleID: A Perception-Aware Dataset and Metric for Stylization-Agnostic Facial Identity Recognition
[03:56] ⏩ Seeing Fast and Slow: Learning the Flow of Time in Videos
[04:39] ⚡ TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale
[05:16] 🧠 Hybrid Policy Distillation for LLMs
[05:48] 🧠 Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
[06:44] 🤖 VLAA-GUI: Knowing When to Stop, Recover, and Search, A Modular Framework for GUI Automation
[07:43] 🧩 Context Unrolling in Omni Models
[08:31] 🎨 EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model
[09:34] 🔗 UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
[10:25] 🌐 WebGen-R1: Incentivizing Large Language Models to Generate Functional and Aesthetic Websites with Reinforcement Learning
[11:14] 🔍 Trust but Verify: Introducing DAVinCI -- A Framework for Dual Attribution and Verification in Claim Inference for Language Models
[12:11] 🔍 Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

13 minutes
91
2 weeks ago

2026.04.23 | LLaDA2.0 unifies multimodality; near-future experience as a plug-in for RL

[Contents] The 15 papers in this episode:
[00:28] 🔮 LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model
[01:17] 🔮 Near-Future Policy Optimization
[02:07] 🤖 DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data
[02:53] 🤖 DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation
[03:42] 🎭 Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges
[04:36] 🧠 Exploring Spatial Intelligence from a Generative Perspective
[05:21] 🤖 A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
[06:18] 🎤 WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training
[07:06] 🤖 SWE-chat: Coding Agent Interactions From Real Users in the Wild
[07:53] 🤖 Cortex 2.0: Grounding World Models in Real-World Industrial Deployment
[08:36] 🧠 Convergent Evolution: How Different Language Models Learn Similar Number Representations
[09:21] 🤝 SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution
[09:57] 🎬 ReImagine: Rethinking Controllable High-Quality Human Video Generation via Image-First Synthesis
[10:34] 🔧 Visual Reasoning through Tool-supervised Reinforcement Learning
[11:09] 🤖 AI scientists produce results without reasoning scientifically

12 minutes
99+
2 weeks ago

2026.04.22 | Virtual try-on generates high-definition results in 3.9 seconds; co-generation makes HOI videos physically consistent

[Contents] The 15 papers in this episode:
[00:23] 👗 Tstars-Tryon 1.0: Robust and Realistic Virtual Try-On for Diverse Fashion Items
[01:05] 🎬 CoInteract: Physically-Consistent Human-Object Interaction Video Synthesis via Spatially-Structured Co-Generation
[01:58] 🤖 AgentSPEX: An Agent SPecification and EXecution Language
[02:51] 📐 AnyRecon: Arbitrary-View 3D Reconstruction with Video Diffusion Model
[03:33] 🚀 TEMPO: Scaling Test-time Training for Large Reasoning Models
[04:26] 🎮 PlayCoder: Making LLM-Generated GUI Code Playable
[05:08] 🕶 ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning
[05:58] 🤖 Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language
[06:44] ⚖ AJ-Bench: Benchmarking Agent-as-a-Judge for Environment-Aware Evaluation
[07:31] 🔄 Dual-View Training for Instruction-Following Information Retrieval
[08:41] 🔍 Code-Switching Information Retrieval: Benchmarks, Analysis, and the Limits of Current Retrievers
[09:20] 🔗 Understanding and Enforcing Weight Disentanglement in Task Arithmetic
[10:00] ⚡ Speculative Decoding for Autoregressive Video Generation
[11:01] 🧠 Target-Oriented Pretraining Data Selection via Neuron-Activated Graph
[11:41] 🧩 UniMesh: Unifying 3D Mesh Understanding and Generation

12 minutes
99+
2 weeks ago

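2026.04.21 | One-step generation that understands a sentence and outputs an image; single-step latent codes handle driving reasoning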

[Contents] The 15 papers in this episode:
[00:24] 🚀 Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation
[01:08] 🚗 OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
[01:54] 🤖 Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
[02:41] 🎮 OpenGame: Open Agentic Coding for Games
[03:48] 🤖 MultiWorld: Scalable Multi-Agent Multi-View Video World Models
[04:44] 🎬 EasyVideoR1: Easier RL for Video Understanding
[05:42] 🧭 WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
[06:46] 🧠 GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification
[07:34] 🧠 SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents
[08:22] 🧩 Crowded in B-Space: Calibrating Shared Directions for LoRA Merging
[09:13] 🧠 When Can LLMs Learn to Reason with Weak Supervision?
[10:04] 🤖 ClawEnvKit: Automatic Environment Generation for Claw-Like Agents
[10:52] 🎬 OmniScript: Towards Audio-Visual Script Generation for Long-Form Cinematic Video
[11:35] 🧬 Training LLM Agents for Spontaneous, Reward-Free Self-Evolution via World Knowledge Exploration
[12:26] 🧮 MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

13 minutes
99+
2 weeks ago

2026.04.20 | A training-free image-quality sweetener for DPMs; two bit flips wreck a model

[Contents] The 15 papers in this episode:
[00:20] 🔍 Elucidating the SNR-t Bias of Diffusion Probabilistic Models
[01:00] 💥 Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips
[01:45] 🧠 PersonaVLM: Long-Term Personalized Multimodal LLMs
[02:56] 🧩 Web Retrieval-Aware Chunking (W-RAC) for Efficient and Cost-Effective Retrieval-Augmented Generation Systems
[03:40] ✂ Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
[04:32] 🚀 Qwen3.5-Omni Technical Report
[05:17] 🧱 Repurposing 3D Generative Model for Autoregressive Layout Generation
[06:02] 🔍 (1D) Ordered Tokens Enable Efficient Test-Time Search
[06:55] 📈 QuantCode-Bench: A Benchmark for Evaluating the Ability of Large Language Models to Generate Executable Algorithmic Trading Strategies
[07:36] 🧠 Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
[08:29] 🔍 TIPSv2: Advancing Vision-Language Pretraining with Enhanced Patch-Text Alignment
[09:33] 💡 Can Large Language Models Reinvent Foundational Algorithms?
[10:17] 📊 GTA-2: Benchmarking General Tool Agents from Atomic Tool-Use to Open-Ended Workflows
[11:10] ⚡ AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
[11:55] 🎭 Hierarchical Codec Diffusion for Video-to-Speech Generation

13 minutes
99+
2 weeks ago
