Episode Description
Source: Xiaoyuzhou (小宇宙)
Have you ever wondered why an AI that grinds through more and more practice problems becomes more likely to trip up on the easy ones? In this episode, we dive into the inner world of AI models to see how they fall into the trap of "teaching to the test," and how they get stuck in logical dead loops like "rock-paper-scissors." More importantly, we'll see how researchers, with clever ideas like "mind reading" and a "grudge notebook," teach AI to learn from failure and find a smarter way out of these traps. Get ready: a deep dive into how AI learns and how it is evaluated starts now.
00:00:35 Why does an AI's first-try accuracy drop the more problems it practices?
00:05:35 An AI's "good memory" versus its "trusty notebook"
00:10:06 The "teaching to the test" trap for AI programmers
00:14:17 The "rock-paper-scissors" dilemma in the world of AI
00:19:08 The robot coach's "mind-reading" trick
Papers covered in this episode:
[LG] Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training
[Singapore University of Technology and Design & University of Maryland]
https://arxiv.org/abs/2602.21189
---
[LG] Exploratory Memory-Augmented LLM Agent via Hybrid On- and Off-Policy Optimization
[Microsoft Research]
https://arxiv.org/abs/2602.23008
---
[LG] ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?
[Lossfunk]
https://arxiv.org/abs/2602.19594
---
[LG] Back to Blackwell: Closing the Loop on Intransitivity in Multi-Objective Preference Fine-Tuning
[CMU]
https://arxiv.org/abs/2602.19041
---
[RO] TOPReward: Token Probabilities as Hidden Zero-Shot Rewards for Robotics
[University of Washington & Amazon]
https://arxiv.org/abs/2602.19313