Episode description
Source: Xiaoyuzhou (小宇宙)
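Have you ever wondered whether AI is helping you analyze, or persuading you in a sophisticated way? We always hope AI will be like a perfect teacher, but what happens if it only gives standard answers, and even inherits the teacher's biases along the way? And to help AI learn better, we not only need to give its "memory" a checkup, we also need to teach it a piece of advanced human wisdom: knowing when to give up. Today, starting from five recent papers, we look at how AI is undergoing a quiet cognitive revolution at the boundaries of persuasion, learning, and thinking.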
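00:00:33 When AI learns "advanced persuasion," is your brain still up to the task?
00:06:00 How do you give an AI a "memory checkup"?
00:12:34 An AI that only gives "standard answers"? That would be dangerous
00:18:04 Learning from a master: how do you avoid being led astray by your teacher?
00:23:19 The essence of training AI: learn to give up in order to gain more
Papers covered in this episode: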
[AI] Evaluating Language Models for Harmful Manipulation
[Google DeepMind & Google]
https://arxiv.org/abs/2603.25326
---
[CL] Estimating near-verbatim extraction risk in language models with decoding-constrained beam search
[Stanford & Cornell]
https://arxiv.org/abs/2603.24917
---
[LG] Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
[MIT]
https://arxiv.org/abs/2603.24844
---
[LG] Residual-as-Teacher: Mitigating Bias Propagation in Student-Teacher Estimation
[MIT]
https://arxiv.org/abs/2603.25466
---
[CL] Prune as You Generate: Online Rollout Pruning for Faster and Better RLVR
[University of Illinois at Urbana-Champaign]
https://arxiv.org/abs/2603.24840