00:00:31 Is AI's "thinking" just a clever "straight-A student" act?
00:04:41 AI-written copy: which matters more, speed or quality?
00:09:15 AI evolution: how do you train a "ragtag crew" into a "dream team"?
00:13:51 AI's "self-play": how can it evolve without human input?
00:17:31 AI "mind reading": can we understand its "train of thought"?

The five papers covered in this episode:

[LG] Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens [Arizona State University] https://arxiv.org/abs/2508.01191
---
[CL] Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference [ByteDance Seed & Tsinghua University] https://arxiv.org/abs/2508.02193
---
[CL] Multi-module GRPO: Composing Policy Gradients and Prompt Optimization for Language Model Programs [University of Notre Dame & Stanford University & UC Berkeley] https://arxiv.org/abs/2508.04660
---
[CL] R-Zero: Self-Evolving Reasoning LLM from Zero Data [Tencent AI Seattle Lab] https://arxiv.org/abs/2508.05004
---
[LG] Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning [Saarland University] https://arxiv.org/abs/2508.01916
Giving up a "lifestyle that doesn't suit you" means a "brand-new beginning."
00:41:15 An anti-nonsense guide for AI: how do you keep a clever machine from talking rubbish?
00:05:37 Want to get stronger? Stop grinding old problems — learn to "forge" hard ones yourself
00:10:06 The secret to leveling up AI: how one line of code makes the "top student" truly get it
00:14:46 A new role for AI: from "porter" to "detective"
00:19:10 Does AI "overthink" too? On giving a model a "calming pill"

The five papers covered in this episode:

[CL] Learning to Reason for Factuality [FAIR at Meta] https://arxiv.org/abs/2508.05618
---
[CL] MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy [Tsinghua University] https://arxiv.org/abs/2508.05592
---
[LG] On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification [Southeast University & University of California, Los Angeles] https://arxiv.org/abs/2508.05629
---
[LG] GRAIL: Learning to Interact with Large Knowledge Graphs for Retrieval Augmented Reasoning [Tsinghua University] https://arxiv.org/abs/2508.05498
---
[CL] Efficient Reasoning for Large Reasoning Language Models via Certainty-Guided Reflection Suppression [Peking University & The Hong Kong University of Science and Technology] https://arxiv.org/abs/2508.05337
From a faint intention to a habit rooted deep in your soul — this journey is a kind of alchemy.
00:00:37 AI needs to "declutter" too
00:05:14 AI knows shortcuts as well: when your computer assistant learns to code
00:09:00 Your next instrument doesn't have to be an instrument
00:12:50 How do you give AI a "high-EQ" brain?
00:17:01 Your next coworker might be your computer itself

The five papers covered in this episode:

[CL] Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management [Tsinghua University] https://arxiv.org/abs/2508.04664
---
[AS] Live Music Models [Google DeepMind] https://arxiv.org/abs/2508.04651
---
[CL] CoAct-1: Computer-using Agents with Coding as Actions [University of Southern California & Salesforce Research] https://arxiv.org/abs/2508.03923
---
[CL] Sotopia-RL: Reward Design for Social Intelligence [University of Illinois Urbana-Champaign & Carnegie Mellon University] https://arxiv.org/abs/2508.03905
---
[LG] OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use https://arxiv.org/abs/2508.04482
Kant's footsteps measured the ground of Königsberg — and the boundaries of human reason.
00:00:26 AI's self-cultivation: when machines start writing their own exam questions
00:03:54 Giving AI a brain that "second-guesses itself"
00:08:57 Why does your AI customer-service bot keep getting led astray?
00:13:36 AI's "crash course": one simple move that lets a machine instantly read people
00:17:36 How do you make the AI around you "get it"?

The papers covered in this episode:

[LG] Self-Questioning Language Models [CMU] https://arxiv.org/abs/2508.03682
---
[LG] Cognitive Loop via In-Situ Optimization: Self-Adaptive Reasoning for Science [Microsoft] https://arxiv.org/abs/2508.02789
---
[CL] Highlight & Summarize: RAG without the jailbreaks [Microsoft] https://arxiv.org/abs/2508.02872
---
[CL] Cropping outperforms dropout as an augmentation strategy for training self-supervised text embeddings [University of Tübingen] https://arxiv.org/abs/2508.03453
---
[LG] Agent Lightning: Train ANY AI Agents with Reinforcement Learning [Microsoft Research] https://arxiv.org/abs/2508.03680
Real growth — for humans and AI alike — comes not from smooth, frictionless instruction, but from the deep imprint left each time a "necessary evil" is successfully overcome.
00:00:30 AI's "mistake notebook": how are masters made?
00:03:50 AI's self-cultivation: how to keep improving like an expert?
00:07:10 AI as apprentice: how can a machine learn to surpass its teacher?
00:11:25 AI's art of "playing dumb": can we still trust it?

The four papers covered in this episode:

[CL] WarriorMath: Enhancing the Mathematical Ability of Large Language Models with a Defect-aware Framework [Microsoft & Peking University] https://arxiv.org/abs/2508.01245
---
[LG] Refine-n-Judge: Curating High-Quality Preference Chains for LLM-Fine-Tuning [Meta Reality Labs] https://arxiv.org/abs/2508.01543
---
[LG] CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search [University of Washington & DeepReinforce Team] https://arxiv.org/abs/2508.02091
---
[LG] LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring [University College London] https://arxiv.org/abs/2508.00943
In an era when AI can write poetry, compose music, and plan everything for us, perhaps what we most need to master is the one "emotional algorithm" AI can never compute.
00:00:32 Does AI have "weak subjects" too? How experts step out of their comfort zone
00:05:55 AI evolution: from "asking for handouts" to "the expert's secret"
00:10:42 A new kind of "checkup" for AI: how to see through a model's "hidden intentions"
00:15:17 What one million students taught us: simple may be optimal

The four papers covered in this episode:

[LG] RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization [Tongyi Lab, Alibaba Group & Peking University] https://arxiv.org/abs/2508.00222
---
[LG] MetaAgent: Toward Self-Evolving Agent via Tool Meta-Learning [BAAI] https://arxiv.org/abs/2508.00271
---
[LG] Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs [Carnegie Mellon University (CMU)] https://arxiv.org/abs/2508.00161
---
[LG] Learning to Optimize Feedback for One Million Students: Insights from Multi-Armed and Contextual Bandits in Large-Scale Online Tutoring [Carnegie Mellon University (CMU) & CK-12 Foundation] https://arxiv.org/abs/2508.00270
What we spend our lives searching for is perhaps not a precise destination, but a process we embrace wholeheartedly.