The 15 papers in this episode are: [00:22] 🤖 Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play [01:09] 🤔 RM-R1: Reward Modeling as Reasoning [01:52] 🧠 Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers [02:32] 🧮 FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models [03:17] ✂ ReplaceMe: Network Simplification via Layer Pruning and Linear Transformations [03:59] 🧠 Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL [04:39] 🚀 Practical Efficiency of Muon for Pretraining [05:18] ⚙ A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency [06:01] 🤖 R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning [06:44] 🤔 Think on your Feet: Adaptive Thinking via Reinforcement Learning for Social Agents [07:24] 🤖 SkillMimic-V2: Learning Robust and Generalizable Interaction Skills from Sparse and Noisy Demonstrations [08:03] 🤖 Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning [08:50] 🖼 SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing [09:30] 🧮 Low-Precision Training of Large Language Models: Methods, Challenges, and Opportunities [10:11] 🎨 Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction [Follow Us] You can also find us on the following platform for more information beyond the podcast. Xiaohongshu: AI速递
The 8 papers in this episode are: [00:21] 🖼 PixelHacker: Image Inpainting with Structural and Semantic Consistency [01:01] 🎨 Improving Editability in Image Generation with Layer-wise Memory [01:35] 🤖 Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts [02:18] 💡 Llama-Nemotron: Efficient Reasoning Models [03:02] 🧩 CORG: Generating Answers from Complex, Interrelated Contexts [03:45] 🤖 Real-World Gaps in AI Governance Research [04:26] 🤖 TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching [05:02] 🔄 X-Cross: Dynamic Integration of Language Models for Cross-Domain Sequential Recommendation
The 5 papers in this episode are: [00:43] TOP1 (🔥149) | 🎥 Towards Understanding Camera Motions in Any Video [03:05] TOP2 (🔥74) | 🧠 Reinforcement Learning for Reasoning in Large Language Models with One Training Example [05:48] TOP3 (🔥54) | 🎭 The Leaderboard Illusion [07:58] TOP4 (🔥51) | 🔍 UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities [10:29] TOP5 (🔥50) | 🧠 Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning
The 8 papers in this episode are: [00:28] 🎮 A Survey of Interactive Generative Video [01:05] 🧐 DeepCritic: Deliberate Critique with Large Language Models [01:38] 🖼 T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT [02:15] 👄 KeySync: A Robust Approach for Leakage-free Lip Synchronization in High Resolution [02:50] 🧠 AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization [03:31] 📚 TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models [04:15] 🚀 LLMs for Engineering: Teaching Models to Design High Powered Rockets [05:09] 🩻 MediAug: Exploring Visual Augmentation in Medical Imaging
The 14 papers in this episode are: [00:21] 🗣 Sadeed: Advancing Arabic Diacritization Through Small Language Model [01:05] 🔎 WebThinker: Empowering Large Reasoning Models with Deep Research Capability [01:43] 🧮 Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math [02:20] 💡 Softpick: No Attention Sink, No Massive Activations with Rectified Softmax [03:00] 🤔 Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think [03:38] 🧠 Phi-4-reasoning Technical Report [04:21] 🧩 COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning [04:59] 💡 Taming the Titans: A Survey of Efficient LLM Inference Serving [05:34] 🤖 Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions [06:09] 🤖 RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning [06:49] 🎬 ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction [07:32] 🛡 Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report [08:08] 🩻 UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation [08:53] 🗳 Selecting Optimal Candidate Profiles in Adversarial Environments Using Conjoint Analysis and Machine Learning
The 12 papers in this episode are: [00:24] 🔍 UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities [01:06] 🧠 Reinforcement Learning for Reasoning in Large Language Models with One Training Example [01:52] 🧠 ReasonIR: Training Retrievers for Reasoning Tasks [02:31] 🤖 Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models [03:20] 🤖 TesserAct: Learning 4D Embodied World Models [04:01] 🎭 The Leaderboard Illusion [04:37] 🖼 YoChameleon: Personalized Vision and Language Generation [05:17] 🛡 Certified Mitigation of Worst-Case LLM Copyright Infringement [05:50] 🎭 ISDrama: Immersive Spatial Drama Generation through Multimodal Prompting [06:29] 🧩 X-Fusion: Introducing New Modality to Frozen Large Language Models [07:14] 🎭 Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation [07:53] 🌳 TreeHop: Generate and Filter Next Query Embeddings Efficiently for Multi-hop Question Answering
The 11 papers in this episode are: [00:23] ✍ RepText: Rendering Visual Text via Replicating [01:02] 📱 LLM-Powered GUI Agents in Phone Automation: Surveying Progress and Prospects [01:44] 🔐 CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges [02:30] 🤔 Clinical knowledge in LLMs does not translate to human interactions [03:16] ⬇ Group Downsampling with Equivariant Anti-aliasing [03:59] 📐 TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving [04:39] 🤖 SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning [05:30] 🖼 Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency [06:15] 🚀 MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention [06:49] 🔑 ICL CIPHERS: Quantifying "Learning" in In-Context Learning via Substitution Ciphers [07:30] 💡 ChiseLLM: Unleashing the Power of Reasoning LLMs for Chisel Agile Hardware Development
The 11 papers in this episode are: [00:22] 🎥 Towards Understanding Camera Motions in Any Video [01:04] 🧠 Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning [01:49] 💡 BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs [02:28] 🌍 VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension [03:13] 🗣 Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark [03:48] 🤔 The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs [04:23] 🎬 Subject-driven Video Generation via Disentangled Identity and Motion [05:00] 🧠 DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models [05:34] 🔲 DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency [06:12] 🔊 Kimi-Audio Technical Report [06:43] 🇮 Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation
The 5 papers in this episode are: [00:33] TOP1 (🔥108) | 💡 Kuwain 1.5B: An Arabic SLM via Language Injection [02:43] TOP2 (🔥98) | 🤔 Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? [04:58] TOP3 (🔥78) | 🤖 TTRL: Test-Time Reinforcement Learning [07:12] TOP4 (🔥71) | 💡 Learning to Reason under Off-Policy Guidance [09:12] TOP5 (🔥62) | 🦅 Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models
The 15 papers in this episode are: [00:24] 🖼 Step1X-Edit: A Practical Framework for General Image Editing [01:05] 🖼 RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation [01:48] 🤖 Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [02:22] 🖼 Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs [03:02] 🧠 Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation [03:42] ⚖ QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining [04:19] 🖼 Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models [04:58] 🖼 Distilling semantically aware orders for autoregressive image generation [05:38] 🗜 DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs [06:17] 🇪 IberBench: LLM Evaluation on Iberian Languages [07:01] 🧠 Process Reward Models That Think [07:46] 🎨 Boosting Generative Image Modeling via Joint Image-Feature Synthesis [08:21] 🎬 ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting [09:02] 👗 3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models [09:44] 📹 TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
The 14 papers in this episode are: [00:23] 👁 VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models [01:08] 🎭 DreamID: High-Fidelity and Fast diffusion-based Face Swapping via Triplet ID Group Learning [01:46] 🌐 Trillion 7B Technical Report [02:30] 💡 Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model [03:11] 🧩 I-Con: A Unifying Framework for Representation Learning [03:50] 🧩 Decoupled Global-Local Alignment for Improving Compositional Understanding [04:30] 🎨 DreamO: A Unified Framework for Image Customization [05:12] 💡 Tina: Tiny Reasoning Models via LoRA [05:49] 🛡 A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment [06:30] 🧐 RePOPE: Impact of Annotation Errors on the POPE Benchmark [07:06] 💡 Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading [07:46] 🛠 CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation [08:29] ✅ Unchecked and Overlooked: Addressing the Checkbox Blind Spot in Large Language Models with CheckboxQA [09:21] 🖼 Progressive Language-guided Visual Learning for Multi-Task Visual Grounding
The 15 papers in this episode are: [00:22] 💡 Kuwain 1.5B: An Arabic SLM via Language Injection [00:58] 🤖 TTRL: Test-Time Reinforcement Learning [01:40] 🌍 The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks [02:23] 🖼 Describe Anything: Detailed Localized Image and Video Captioning [03:00] 💡 Learning Adaptive Parallel Reasoning with Language Models [03:34] 🖼 IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs [04:19] 📖 BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation [05:10] 🚀 Efficient Pretraining Length Scaling [05:49] 🩻 CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning [06:26] 🖼 Personalized Text-to-Image Generation with Auto-Regressive Models [07:08] 🗣 LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale [07:47] 🎬 Vidi: Large Multimodal Models for Video Understanding and Editing [08:27] 🖼 From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning [09:03] 🤖 LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities [09:44] 🤖 WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents