This episode covers the following 15 papers:
[00:24] 💡 Seed1.5-VL Technical Report
[01:04] 🧠 MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
[01:48] 🖼 Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
[02:29] 🤝 Learning from Peers in Reasoning Models
[03:08] 🎨 Unified Continuous Generative Models
[03:49] 🤖 REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
[04:44] 💃 DanceGRPO: Unleashing GRPO on Visual Generation
[05:25] 🧠 AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection
[06:10] 🌐 WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
[06:53] 📈 Learning Dynamics in Continual Pre-Training for Large Language Models
[07:28] 🏆 Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
[08:11] 🧠 Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent
[08:50] 🤖 H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
[09:36] 🎨 Continuous Visual Autoregressive Generation via Score Maximization
[10:26] 🧠 Overflow Prevention Enhances Long-Context Recurrent LLMs
[Follow us] You can also find us on the following platform for more information beyond the podcast content. Xiaohongshu: AI速递
This episode covers the following 7 papers:
[00:23] 🇵🇱 Bielik v3 Small: Technical Report
[01:07] 🇵🇱 Bielik 11B v2 Technical Report
[01:42] 🤖 UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
[02:30] 🎨 G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
[03:16] ⭐ Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
[03:55] ⚕ Healthy LLMs? Benchmarking LLM Knowledge of UK Government Public Health Information
[04:37] 🖼 A Preliminary Study for GPT-4o on Image Restoration
This episode covers the following 5 papers:
[00:42] TOP1(🔥93) | 🚀 Absolute Zero: Reinforced Self-play Reasoning with Zero Data
[02:38] TOP2(🔥91) | 🧠 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
[04:44] TOP3(🔥83) | 🧠 Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
[06:35] TOP4(🔥77) | 🤖 Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
[08:52] TOP5(🔥77) | 🧠 Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
This episode covers the following 15 papers:
[00:22] 🧠 Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models
[00:57] 🤖 On Path to Multimodal Generalist: General-Level and General-Bench
[01:40] 🤖 Flow-GRPO: Training Flow Matching Models via Online RL
[02:23] 🧠 Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
[03:05] 🧠 Scalable Chain of Thoughts via Elastic Reasoning
[03:41] 🔍 FG-CLIP: Fine-Grained Visual and Textual Alignment
[04:19] 🏞 3D Scene Generation: A Survey
[05:02] 🧮 ICon: In-Context Contribution for Automatic Data Selection
[05:39] 🎬 StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
[06:19] 🤖 LiftFeat: 3D Geometry-Aware Local Feature Matching
[06:56] 🧱 Generating Physically Stable and Buildable LEGO Designs from Text
[07:38] 🧠 X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains
[08:22] 🌐 Crosslingual Reasoning through Test-Time Scaling
[09:04] 🖼 PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
[09:42] 🌐 BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese
This episode covers the following 14 papers:
[00:21] 💡 Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
[01:02] 🤖 ZeroSearch: Incentivize the Search Capability of LLMs without Searching
[01:50] 🤔 Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models
[02:31] 🎬 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
[03:15] 🧩 PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer
[04:04] 🤖 Benchmarking LLMs' Swarm intelligence
[04:49] 🤔 Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
[05:26] 🤖 OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
[05:58] 🌐 OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution
[06:36] 🖥 OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents
[07:19] 🧠 Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey
[08:04] 🎛 R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
[08:48] 🤝 Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation
[09:26] 📹 Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection
This episode covers the following 14 papers:
[00:24] 🧠 Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
[01:10] 🤖 Absolute Zero: Reinforced Self-play Reasoning with Zero Data
[01:52] 🤸 FlexiAct: Towards Flexible Action Control in Heterogeneous Scenarios
[02:33] 🚀 RADLADS: Rapid Attention Distillation to Linear Attention Decoders at Scale
[03:07] 🚀 RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
[03:45] 👁 Decoding Open-Ended Information Seeking Goals from Eye Movements in Reading
[04:30] 🗜 An Empirical Study of Qwen3 Quantization
[05:09] ⚽ Multi-Agent System for Comprehensive Soccer Understanding
[05:52] 🗣 VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
[06:36] 🗺 Geospatial Mechanistic Interpretability of Large Language Models
[07:12] 🧑 InfoVids: Reimagining the Viewer Experience with Alternative Visualization-Presenter Relationships
[07:54] 🤖 Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering
[08:32] 🥽 HoloTime: Taming Video Diffusion Models for Panoramic 4D Scene Generation
[09:18] 🤖 Auto-SLURP: A Benchmark Dataset for Evaluating Multi-Agent Frameworks in Smart Personal Assistant
This episode covers the following 15 papers:
[00:22] 🤖 Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
[01:09] 🤔 RM-R1: Reward Modeling as Reasoning
[01:52] 🧠 Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers
[02:32] 🧮 FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models
[03:17] ✂ ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations
[03:59] 🧠 Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
[04:39] 🚀 Practical Efficiency of Muon for Pretraining
[05:18] ⚙ A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
[06:01] 🤖 R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
[06:44] 🤔 Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents
[07:24] 🤖 SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations
[08:03] 🤖 Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
[08:50] 🖼 SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
[09:30] 🧮 Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities
[10:11] 🎨 Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction
This episode covers the following 8 papers:
[00:21] 🖼 PixelHacker: Image Inpainting with Structural and Semantic Consistency
[01:01] 🎨 Improving Editability in Image Generation with Layer-wise Memory
[01:35] 🤖 Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
[02:18] 💡 Llama-Nemotron: Efficient Reasoning Models
[03:02] 🧩 CORG: Generating Answers from Complex, Interrelated Contexts
[03:45] 🤖 Real-World Gaps in AI Governance Research
[04:26] 🤖 TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching
[05:02] 🔄 X-Cross: Dynamic Integration of Language Models for Cross-Domain Sequential Recommendation
This episode covers the following 5 papers:
[00:43] TOP1(🔥149) | 🎥 Towards Understanding Camera Motions in Any Video
[03:05] TOP2(🔥74) | 🧠 Reinforcement Learning for Reasoning in Large Language Models with One Training Example
[05:48] TOP3(🔥54) | 🎭 The Leaderboard Illusion
[07:58] TOP4(🔥51) | 🔍 UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities
[10:29] TOP5(🔥50) | 🧠 Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
This episode covers the following 8 papers:
[00:28] 🎮 A Survey of Interactive Generative Video
[01:05] 🧐 DeepCritic: Deliberate Critique with Large Language Models
[01:38] 🖼 T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT
[02:15] 👄 KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution
[02:50] 🧠 AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
[03:31] 📚 TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
[04:15] 🚀 LLMs for Engineering: Teaching Models to Design High Powered Rockets
[05:09] 🩻 MediAug: Exploring Visual Augmentation in Medical Imaging
This episode covers the following 14 papers:
[00:21] 🗣 Sadeed: Advancing Arabic Diacritization Through Small Language Model
[01:05] 🔎 WebThinker: Empowering Large Reasoning Models with Deep Research Capability
[01:43] 🧮 Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
[02:20] 💡 Softpick: No Attention Sink, No Massive Activations with Rectified Softmax
[03:00] 🤔 Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
[03:38] 🧠 Phi-4-reasoning Technical Report
[04:21] 🧩 COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
[04:59] 💡 Taming the Titans: A Survey of Efficient LLM Inference Serving
[05:34] 🤖 Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions
[06:09] 🤖 RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
[06:49] 🎬 ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction
[07:32] 🛡 Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report
[08:08] 🩻 UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
[08:53] 🗳 Selecting Optimal Candidate Profiles in Adversarial Environments Using Conjoint Analysis and Machine Learning
This episode covers the following 12 papers:
[00:24] 🔍 UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities
[01:06] 🧠 Reinforcement Learning for Reasoning in Large Language Models with One Training Example
[01:52] 🧠 ReasonIR: Training Retrievers for Reasoning Tasks
[02:31] 🤖 Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models
[03:20] 🤖 TesserAct: Learning 4D Embodied World Models
[04:01] 🎭 The Leaderboard Illusion
[04:37] 🖼 Yo'Chameleon: Personalized Vision and Language Generation
[05:17] 🛡 Certified Mitigation of Worst-Case LLM Copyright Infringement
[05:50] 🎭 ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting
[06:29] 🧩 X-Fusion: Introducing New Modality to Frozen Large Language Models
[07:14] 🎭 Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation
[07:53] 🌳 TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering