Episode description
Source: 小宇宙 (Xiaoyuzhou)
[LG] Optimal Embedding Learning Rate in LLMs: The Effect of Vocabulary Size
[UC Berkeley & Microsoft Research]
https://arxiv.org/abs/2506.15025