Episode description
Source: Xiaoyuzhou (小宇宙)
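Today we're talking about something a little different: how scientists are borrowing everyday wisdom to teach AI to learn smarter. We'll explore five recent papers. First, how to give AI a reliable "math translator" so it stops talking nonsense. Next, how to act like a top personal tutor, using an "error notebook" and "pick one of two" choices to teach each student at their own level. Then we'll unpack a curious "reverse learning" method that lets AI outperform its teacher just by observing. Finally, we'll discuss why giving a large model an "hourglass figure" is more efficient than the traditional "bucket figure". Ready? Let's go!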
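00:00:36 Giving AI a reliable math translator
00:05:03 "Teaching to the student's level" in AI: how can small models learn smarter?
00:10:32 How can a machine learn "in reverse" and become smarter than its teacher?
00:16:08 If you only look at the outcome, you may miss the real first place
00:22:15 A new figure for large AI models: why is the "hourglass" better than the "bucket"?
Papers covered in this episode: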
[LG] Visored: A Controlled-Natural-Language Prover for LLM-Generated Mathematics
[University of Washington]
https://arxiv.org/abs/2606.17581
---
[CL] Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients
[NVIDIA]
https://arxiv.org/abs/2606.18216
---
[LG] Reversal Q-Learning
[UC Berkeley]
https://arxiv.org/abs/2606.17551
---
[LG] Offline Preference-Based Trajectory Evaluation
[CMU]
https://arxiv.org/abs/2606.17541
---
[CL] Variable-Width Transformers
[MIT]
https://arxiv.org/abs/2606.18246