The 5 papers in this episode:
[00:43] TOP1 (🔥205) | 🌐 Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model
[03:10] TOP2 (🔥139) | 🗜 Shifting AI Efficiency From Model-Centric to Data-Centric Compression
[04:55] TOP3 (🔥106) | 📊 TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
[07:01] TOP4 (🔥100) | 🤖 The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
[09:30] TOP5 (🔥97) | 🧪 ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

【Follow us】 You can also find us on the following platform for more content beyond the podcast. Xiaohongshu: AI速递
The 15 papers in this episode:
[00:22] 📊 Table-R1: Inference-Time Scaling for Table Reasoning
[01:02] 🤖 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos
[01:45] 🧠 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
[02:25] 🧠 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason
[03:11] 🤖 ZeroGUI: Automating Online GUI Learning at Zero Human Cost
[03:45] 🤔 VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?
[04:39] 🧬 Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering
[05:15] 🤔 Are Reasoning Models More Prone to Hallucination?
[05:51] 🤖 cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning
[06:29] 🎨 D-AR: Diffusion via Autoregressive Models
[07:16] 📸 AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views
[07:53] 🛠 SWE-bench Goes Live! (SWE-bench-Live: a continuously updated issue-resolving benchmark)
[08:36] 💡 Multi-Domain Explainability of Preferences
[09:16] 🤖 UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
[10:01] 🗣 FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian
The 15 papers in this episode:
[00:22] 🤖 The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
[00:56] 🛣 R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
[01:40] 🧠 Skywork Open Reasoner 1 Technical Report
[02:20] 🔍 Sherlock: Self-Correcting Reasoning in Vision-Language Models
[02:55] 🤖 Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
[03:35] 🤖 SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
[04:25] 🚀 SageAttention2++: A More Efficient Implementation of SageAttention2
[05:12] 🧠 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
[05:59] 🎬 Fostering Video Reasoning via Next-Event Prediction
[06:42] 💡 RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
[07:25] 🔬 DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
[08:16] 🖼 Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
[08:58] 🧩 Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
[09:38] 🚚 SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem
[10:26] 🌐 Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
The 15 papers in this episode:
[00:23] 🧪 ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
[01:09] 🤔 MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
[01:51] 🖼 Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
[02:28] 🎨 OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
[03:06] 🎬 OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation
[03:50] 🧠 SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond
[04:32] 💡 Exploring the Latent Capacity of LLMs for One-Step Text Generation
[05:13] 🧠 VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization
[05:48] 🤔 Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning
[06:29] 🤔 MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks
[07:09] 🤖 UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
[07:52] 🎬 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
[08:28] 📹 MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
[09:16] 🧩 GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
[10:02] 🕵 Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
The 15 papers in this episode:
[00:24] 🗜 Shifting AI Efficiency From Model-Centric to Data-Centric Compression
[01:05] 🌐 Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model
[02:00] 📊 BizFinBench: A Business-Driven Real-World Financial Benchmark for Evaluating LLMs
[02:40] 🖼 Alchemist: Turning Public Text-to-Image Data into Generative Gold
[03:18] 🧠 Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance
[03:59] 🧠 PATS: Process-Level Adaptive Thinking Mode Switching
[04:52] 🧠 ARM: Adaptive Reasoning Model
[05:37] 🧩 Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
[06:18] 🤖 B-score: Detecting biases in large language models using response history
[06:58] 🧠 Deciphering Trajectory-Aided LLM Reasoning: An Optimization Perspective
[07:39] 🛡 Lifelong Safety Alignment for Language Models
[08:14] 🧪 MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search
[09:00] 🗺 Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps
[09:43] 🧮 Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
[10:28] 🧠 Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models
The 15 papers in this episode:
[00:23] 📊 TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
[00:59] 🧠 QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
[01:43] 🤔 Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
[02:19] 🚀 Quartet: Native FP4 Training Can Be Optimal for Large Language Models
[03:01] 🤖 One RL to See Them All: Visual Triple Unified Reinforcement Learning
[03:36] 🤖 Distilling LLM Agent into Small Models with Retrieval and Code Tools
[04:21] 🤔 PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
[05:02] ♾ QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization
[05:46] 🧬 Scaling Image and Video Generation via Test-Time Evolutionary Search
[06:21] 🎬 Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
[07:06] 🤔 VeriThinker: Learning to Verify Makes Reasoning Model Efficient
[07:45] 🧪 MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
[08:27] 🎧 AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
[09:10] 💻 FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow
[09:51] 🤥 Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
The 5 papers in this episode:
[00:42] TOP1 (🔥146) | 🤖 Qwen3 Technical Report
[03:08] TOP2 (🔥114) | 💡 Emerging Properties in Unified Multimodal Pretraining
[05:22] TOP3 (🔥105) | 🔗 Chain-of-Model Learning for Language Model (a novel learning paradigm for language models)
[07:34] TOP4 (🔥93) | 🧪 NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
[09:57] TOP5 (🔥91) | 🤖 Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
The 15 papers in this episode:
[00:22] 🧪 NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification
[01:05] 🤔 Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
[01:50] 🤖 Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning
[02:30] 🖼 KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
[03:16] 🖼 Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
[04:03] ⏱ QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
[04:55] 🖼 GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
[05:39] 🖼 LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
[06:15] 📉 Risk-Averse Reinforcement Learning with Itakura-Saito Loss
[06:54] 🚀 Scaling Diffusion Transformers Efficiently via $μ$P
[07:33] 🖼 Understanding Generative AI Capabilities in Everyday Image Editing Tasks
[08:19] 🧠 Let LLMs Break Free from Overthinking via Self-Braking Tuning
[08:56] 🧠 Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning
[09:37] 🎮 VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
[10:23] 💡 Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
The 15 papers in this episode:
[00:25] 🤖 Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
[01:13] 🧮 Scaling Law for Quantization-Aware Training
[01:53] 🤖 UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
[02:28] 🎨 MMaDA: Multimodal Large Diffusion Language Models
[03:04] 🔄 Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
[03:44] 💻 Efficient Agent Training for Computer Use
[04:26] 🧠 Learn to Reason Efficiently with Adaptive Length-based Reward Shaping
[05:08] 💡 When to Continue Thinking: Adaptive Thinking Mode Switching for Efficient Reasoning
[05:39] 🤖 Vid2World: Crafting Video Diffusion Models to Interactive World Models
[06:16] 🖼 IA-T2I: Internet-Augmented Text-to-Image Generation
[06:49] 🧠 Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs
[07:31] 🎮 lmgame-Bench: How Good are LLMs at Playing Games?
[08:18] 🏙 Constructing a 3D Town from a Single Image
[08:58] 🚀 dKV-Cache: The Cache for Diffusion Language Models
[09:40] 🛡 How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
The 15 papers in this episode:
[00:22] 💡 Emerging Properties in Unified Multimodal Pretraining
[01:03] 🚀 SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
[01:42] 🖼 VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank
[02:23] 🤖 Visual Agentic Reinforcement Fine-Tuning
[03:01] 🧪 The Aloe Family Recipe for Open and Specialized Healthcare LLMs
[03:40] 🧮 Optimizing Anytime Reasoning via Budget Relative Policy Optimization
[04:25] 🧠 Neurosymbolic Diffusion Models
[05:02] 🌊 Latent Flow Transformer
[05:40] 🧑 Exploring Federated Pruning for Large Language Models
[06:23] 👁 Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
[07:05] 🧠 General-Reasoner: Advancing LLM Reasoning Across All Domains
[07:45] 🤔 Reasoning Models Better Express Their Confidence
[08:20] 🚀 Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
[09:07] 🖼 Training-Free Watermarking for Autoregressive Image Generation
[09:48] 🤔 VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation