Research Fellow Candidate Presentations
Data-centric Approaches to Predicting and Improving Model Capabilities
Kimia Hamidieh, Massachusetts Institute of Technology
Abstract: Understanding how data composition shapes model capabilities is fundamental to building capable and safe AI models. We find that data domains interact: certain domain combinations act synergistically and unlock emergent capabilities, while others create interference. We introduce domain-aware scaling laws to predict these effects. Beyond prediction, we show that removing specific training examples substantially improves model reliability.
