[CL] Bridging Offline and Online Reinforcement Learning for LLMs [FAIR at Meta] https://arxiv.org/abs/2506.21495
[LG] The Ideation-Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas [Stanford University] https://arxiv.org/abs/2506.20803
[CL] Potemkin Understanding in Large Language Models [MIT & University of Chicago & Harvard University] https://arxiv.org/abs/2506.21521
[CL] Can Gradient Descent Simulate Prompting? [MIT CSAIL] https://arxiv.org/abs/2506.20989
[CL] OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling [Shanghai Jiao Tong University] https://arxiv.org/abs/2506.20512
[LG] Overtuning in Hyperparameter Optimization [LMU Munich] https://arxiv.org/abs/2506.19540
[LG] Distilling Normalizing Flows [University of Oregon & HSE University & Picsart AI Research] https://arxiv.org/abs/2506.21003
[LG] Gaussian Invariant Markov Chain Monte Carlo [Google DeepMind & UCL] https://arxiv.org/abs/2506.21511
[LG] Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards [FAIR at Meta] arxiv.org
[LG] Mastering Multiple-Expert Routing: Realizable H-Consistency and Strong Guarantees for Learning to Defer [Courant Institute of Mathematical Sciences & Google Research] arxiv.org
[CL] Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs [Harvard University] arxiv.org
[LG] Language Modeling by Language Models [Allen Institute for AI] arxiv.org
[CL] DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation [Apple] arxiv.org
[CL] Data Efficacy for Language Model Training [Microsoft Research] https://arxiv.org/abs/2506.21545