Episode description
Source: Xiaoyuzhou (小宇宙)
[LG] Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
[Max Planck Institute for Intelligent Systems]
https://arxiv.org/abs/2506.12543