Data-centric Approaches to Predicting and Improving Model Capabilities

Kimia Hamidieh, Massachusetts Institute of Technology

Date: Thursday, November 20, 2025
Time: 10:00 - 10:45am
Virtual Link

Abstract: Understanding how data composition shapes model capabilities is fundamental to building capable and safe AI models. We find that data domains interact: certain domain combinations unlock emergent capabilities, while others create interference. We introduce domain-aware scaling laws to predict these effects. Beyond prediction, we show that removing specific training examples substantially improves model reliability.