Episode Description
Source: Xiaoyuzhou (小宇宙)
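Have you ever wondered how an AI, when generating long text, can avoid the clumsy approach of "rereading the whole book for every word it writes"? And how do we teach AI to work like an expert, laying out the track before sprinting, rather than stirring every rule into one pot? In this episode, we unpack the clever ideas in several recent papers: from a shared index that only has to be "smart" once, to "bone-setting" a model to speed up training, to the new possibility of AI doing its subconscious thinking with its "brain" rather than its "mouth". Let's see how AI is carrying out a profound "process revolution" on the inside.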
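00:00:35 AI's long-thinking problem: how to be smart only once?
00:05:13 How do you train a stronger AI over a slower network cable?
00:10:20 "Bone-setting" for AI models: a clever idea that makes training 2x faster
00:15:05 Lay out the track first, then talk about the 100-meter sprint
00:20:31 When large models think, do they use the mouth or the brain?
Papers covered in this episode: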
[CL] You Only Index Once: Cross-Layer Sparse Attention with Shared Routing
[Microsoft Research]
https://arxiv.org/abs/2606.06467
---
[LG] Learned Subspace Compression for Communication-Efficient Pipeline Parallelism
[Concordia University & Sorbonne University]
https://arxiv.org/abs/2606.05484
---
[LG] PC Layer: Polynomial Weight Preconditioning for Improving LLM Pre-Training
[The Chinese University of Hong Kong & Google LLC]
https://arxiv.org/abs/2606.06470
---
[LG] Multi-ResNets for Subspace Preconditioning in Constrained Optimization
[UCLA & University of Oxford & Stanford University]
https://arxiv.org/abs/2606.06300
---
[CL] Latent Reasoning with Normalizing Flows
[University of Pennsylvania]
https://arxiv.org/abs/2606.06447