2025.04.25 | Open-source models surpass closed-source; new evaluation metrics improve generation quality.

This episode covers 15 papers:

[00:24] 🖼 Step1X-Edit: A Practical Framework for General Image Editing
[01:05] 🖼 RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation
[01:48] 🤖 Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
[02:22] 🖼 Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
[03:02] 🧠 Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
[03:42] ⚖ QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
[04:19] 🖼 Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models
[04:58] 🖼 Distilling semantically aware orders for autoregressive image generation
[05:38] 🗜 DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
[06:17] 🇪 IberBench: LLM Evaluation on Iberian Languages
[07:01] 🧠 Process Reward Models That Think
[07:46] 🎨 Boosting Generative Image Modeling via Joint Image-Feature Synthesis
[08:21] 🎬 ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting
[09:02] 👗 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models
[09:44] 📹 TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

[Follow us] You can also find us on the following platforms for more content beyond the podcast. 小红书: AI速递

Duration: 10 minutes

2025.04.24 | New benchmark for visual reasoning evaluation; high-fidelity face swapping.

This episode covers 14 papers:

[00:23] 👁 VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models
[01:08] 🎭 DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning
[01:46] 🌐 Trillion 7B Technical Report
[02:30] 💡 Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
[03:11] 🧩 I-Con: A Unifying Framework for Representation Learning
[03:50] 🧩 Decoupled Global-Local Alignment for Improving Compositional Understanding
[04:30] 🎨 DreamO: A Unified Framework for Image Customization
[05:12] 💡 Tina: Tiny Reasoning Models via LoRA
[05:49] 🛡 A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment
[06:30] 🧐 RePOPE: Impact of Annotation Errors on the POPE Benchmark
[07:06] 💡 Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading
[07:46] 🛠 CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
[08:29] ✅ Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA
[09:21] 🖼 Progressive Language-guided Visual Learning for Multi-Task Visual Grounding

Duration: 10 minutes

2025.04.23 | Improved Arabic-language performance; significant gains on reasoning tasks.

This episode covers 15 papers:

[00:22] 💡 Kuwain 1.5B: An Arabic SLM via Language Injection
[00:58] 🤖 TTRL: Test-Time Reinforcement Learning
[01:40] 🌍 The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
[02:23] 🖼 Describe Anything: Detailed Localized Image and Video Captioning
[03:00] 💡 Learning Adaptive Parallel Reasoning with Language Models
[03:34] 🖼 IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs
[04:19] 📖 BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation
[05:10] 🚀 Efficient Pretraining Length Scaling
[05:49] 🩻 CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
[06:26] 🖼 Personalized Text-to-Image Generation with Auto-Regressive Models
[07:08] 🗣 LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
[07:47] 🎬 Vidi: Large Multimodal Models for Video Understanding and Editing
[08:27] 🖼 From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
[09:03] 🤖 LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities
[09:44] 🤖 WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Duration: 10 minutes

2025.04.22 | LUFFY boosts reasoning performance; FlowReasoner improves system adaptability.

This episode covers 15 papers:

[00:25] 🧠 Learning to Reason under Off-Policy Guidance
[01:00] 🤖 FlowReasoner: Reinforcing Query-Level Meta-Agents
[01:40] 🦅 Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
[02:22] 🧰 ToolRL: Reward is All Tool Learning Needs
[03:07] 🌐 SphereDiff: Tuning-free Omnidirectional Panoramic Image and Video Generation via Spherical Latent Representation
[03:39] 🎨 StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians
[04:18] 🤖 X-Teaming: Multi-Turn Jailbreaks and Defenses with Adaptive Multi-Agents
[04:57] 🤖 UFO2: The Desktop AgentOS
[05:34] 🧑 LeetCodeDataset: A Temporal Dataset for Robust Evaluation and Efficient Training of Code LLMs
[06:18] 👀 Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs
[07:02] 🤖 InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners
[07:42] 🕹 EasyEdit2: An Easy-to-use Steering Framework for Editing Large Language Models
[08:23] 📱 LearnAct: Few-Shot Mobile GUI Agent with a Unified Demonstration Benchmark
[09:06] 🖼 LookingGlass: Generative Anamorphoses via Laplacian Pyramid Warping
[09:50] 🎵 DRAGON: Distributional Rewards Optimize Diffusion Generative Models

Duration: 10 minutes

2025.04.21 | Reinforcement learning does not unlock new reasoning abilities; MIG optimizes data selection for instruction tuning.

This episode covers 9 papers:

[00:22] 🤔 Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
[00:59] 🧠 MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space
[01:41] 🤔 Could Thinking Multilingually Empower LLM Reasoning?
[02:25] 🏙 AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
[03:09] 🏠 HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation
[03:52] 💡 NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes
[04:30] 🧠 It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
[05:07] 🏞 Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images
[05:51] 🧠 Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models

Duration: 6 minutes

2025.04.18 | CLIMB improves domain model performance; antidistillation sampling deters model theft.

This episode covers 15 papers:

[00:23] 🗂 CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
[01:03] 🧪 Antidistillation Sampling
[01:41] 🤝 A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis
[02:26] 🎬 Packing Input Frame Context in Next-Frame Prediction Models for Video Generation
[03:02] 🤖 Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
[03:43] 🧠 WORLDMEM: Long-term Consistent World Simulation with Memory
[04:27] 🎬 VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
[05:01] 🤖 NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
[05:43] 🎨 DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging
[06:20] 📊 ChartQAPro: A More Diverse and Challenging Benchmark for Chart Question Answering
[07:07] 🤖 Exploring Expert Failures Improves LLM Agent Tuning
[07:48] 🎨 InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework
[08:26] 📸 CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy
[09:06] 🎬 FocusedAD: Character-centric Movie Audio Description
[09:39] 🤔 Retrieval-Augmented Generation with Conflicting Evidence

Duration: 10 minutes

2025.04.17 | ColorBench tests VLMs' color understanding; BitNet improves compute efficiency.

This episode covers 11 papers:

[00:27] 🎨 ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness
[01:09] 💡 BitNet b1.58 2B4T Technical Report
[01:50] 🎨 Cobra: Efficient Line Art COlorization with BRoAder References
[02:28] 🚀 AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference
[03:05] 🗣 SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
[03:51] 🧰 ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
[04:31] 🚀 REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers
[05:09] 📹 Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
[05:51] 🤖 Robust and Fine-Grained Detection of AI Generated Texts
[06:34] 🧠 Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution
[07:18] 🖼 BlockGaussian: Efficient Large-Scale Scene Novel View Synthesis via Adaptive Block-Based Gaussian Splatting

Duration: 8 minutes

2025.04.16 | Genius improves LLM reasoning; xVerify verifies reasoning models efficiently.

This episode covers 15 papers:

[00:22] 🧠 Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning
[01:06] ✅ xVerify: Efficient Answer Verifier for Reasoning Model Evaluations
[01:52] 🖼 Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
[02:37] ✅ Heimdall: test-time scaling on the generative verification
[03:23] 🎨 Seedream 3.0 Technical Report
[04:07] 📊 How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients
[04:54] 🎮 TextArena (a collection of competitive text games for training and evaluating agentic behavior in LLMs)
[05:43] 🧠 The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
[06:22] 🤖 Efficient Process Reward Model Training via Active Learning
[07:01] 🚀 Efficient Generative Model Training via Embedded Representation Warmup
[07:43] 🎥 NormalCrafter: Learning Temporally Consistent Normals from Video Diffusion Priors
[08:23] 🧠 A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
[09:00] 🧮 DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning
[09:43] 🚗 Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion
[10:25] 📹 PVUW 2025 Challenge Report: Advances in Pixel-level Understanding of Complex Videos in the Wild

Duration: 11 minutes

2025.04.15 | Multimodal model performance gains; accelerated inference on low-resource hardware.

This episode covers 15 papers:

[00:23] 🖼 InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
[01:03] 🏠 PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
[01:46] 🖼 FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
[02:26] 🤔 VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
[03:07] 🤖 Iterative Self-Training for Code Generation via Reinforced Re-Ranking
[03:51] 🎬 Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
[04:28] 🤖 AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories
[05:13] 🧠 S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models
[05:56] 🤔 Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
[06:42] 🤖 DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
[07:22] 🌍 SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users
[08:11] 🤖 Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
[08:56] 💡 TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
[09:40] 🧪 LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
[10:21] 🛡 EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety

Duration: 11 minutes

2025.04.14 | Cost-effective video generation; scaling autoregressive image generation.

This episode covers 13 papers:

[00:24] 🎬 Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
[01:00] 🖼 GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
[01:42] 🎮 MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft
[02:25] 🖼 PixelFlow: Pixel-Space Generative Models with Flow
[03:05] 🤖 SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
[03:51] 🎨 FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation
[04:30] 🎬 In-2-4D: Inbetweening from Two Single-View Images to 4D Generation
[05:05] 🤔 ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance
[05:42] 🚀 Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs
[06:21] 🤔 Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models
[07:11] 🛡 SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs
[07:52] 🤝 CoRAG: Collaborative Retrieval-Augmented Generation
[08:29] 🤝 InteractVLM: 3D Interaction Reasoning from 2D Foundational Models

Duration: 9 minutes