Generalization in Attention-Based Models: Insights from Solvable High-Dimensional Models
Lenka Zdeborová
Join us for a talk by Lenka Zdeborová, Associate Professor of Physics and Computer Science at École Polytechnique Fédérale de Lausanne. This talk is part of the Kempner Seminar Series, a research-level seminar series that covers topics related to the basis of intelligence in natural and artificial systems.
Statistical physics has long provided a framework for understanding learning in high dimensions, uncovering phase transitions, fundamental limits on generalization, and gaps between optimal inference and practical algorithms. In this talk, I will discuss recent efforts to extend this perspective from classical perceptron-type models to attention-based architectures that operate on sequences, as in transformers. Using solvable high-dimensional models, we obtain analytical insights into generalization, scaling laws, and spectral properties of learned representations. While these models are necessarily simplified, they offer a controlled setting in which theoretical predictions can be derived and compared with empirical phenomena observed in modern AI systems.
