Foundations of Post-training from Base Models
Audrey Huang, University of Illinois Urbana-Champaign
Abstract: Post-training consistently elicits complex reasoning behavior from pre-trained base language models, yet we lack a principled understanding of this process: how does post-training interact with the base model, and which algorithms maximally leverage its capabilities? In this talk, we first identify fundamental quantities, such as the base model's coverage of high-quality responses, that characterize the behavior of post-training, and then develop optimal algorithms within this framework. Our results raise new questions and lay the groundwork for further investigation into how reinforcement learning can incentivize capabilities beyond those of the base model.
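
The abstract does not pin down a definition of coverage, but a minimal sketch of one standard formalization from the RL theory literature is the coverage coefficient of the base policy relative to a high-quality target policy; the symbols \pi_{\mathrm{base}} and \pi^{\star} below are assumed names, not taken from the talk.

% One common formalization (an assumption here, not stated in the abstract):
% the coverage coefficient of the base policy \pi_{\mathrm{base}}
% relative to a high-quality target policy \pi^{\star}.
\[
  C_{\mathrm{cov}} \;=\; \sup_{x,\, y} \frac{\pi^{\star}(y \mid x)}{\pi_{\mathrm{base}}(y \mid x)}
\]
% A small C_cov means high-quality responses already carry non-negligible
% probability under the base model, so post-training (e.g., best-of-N
% sampling or RL fine-tuning) can surface them with fewer samples;
% sample-complexity guarantees in this literature typically scale with C_cov.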
