12 July 2024
Anything but SGD: Evaluating Optimizers for LLM Training
By: Rosie Zhao, Depen Morwani, David Brandfonbrener and Nikhil Vyas
The authors perform a rigorous study of a variety of LLM training optimizers and find that they are all fairly similar except for SGD, which is notably worse.
18 June 2024
Transcendence: Generative Models Can Outperform the Experts That Train Them
By: Edwin Zhang and Eran Malach
The authors theoretically and empirically demonstrate that, via low-temperature sampling, generative models trained on chess games can outperform the experts who produced their training data.
12 June 2024
A Dynamical Model of Neural Scaling Laws
Part 2 of a two-part blog post covering recent findings from the authors
By: Blake Bordelon, Alex Atanasov, and Cengiz Pehlevan
By analyzing the dynamics of a random feature model trained with gradient descent, the authors reproduce and explain many observations about neural scaling laws, including compute optimal scaling of model size and training time.
13 May 2024
Infinite Limits of Neural Networks
Part 1 of a two-part blog post covering recent findings from the authors
By: Alex Atanasov, Blake Bordelon, and Cengiz Pehlevan
A discussion of expository material and the authors' recent papers on the large-width and large-depth limits of neural networks.
16 April 2024
Distinguishing the Knowable from the Unknowable with Language Models
By: Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, and Ben Edelman
A new way to label different types of uncertainty in unconstrained text and simple methods to predict those labels, including a completely unsupervised approach.
5 February 2024
Repeat After Me: Transformers are Better than State Space Models at Copying
By: Samy Jelassi, David Brandfonbrener, Sham Kakade and Eran Malach
The improved efficiency of state space models sacrifices some capabilities that are core to modern LLMs, such as copying from the input context.
7 December 2023
A Next-Generation Architecture for Elastic and Conditional Computation
The Matryoshka Way
By: Aditya Kusupati, Sneha Kudugunta, Devvrit, and Tim Dettmers
Introducing an algorithmic method for elastically deploying large models: the MatFormer.
15 November 2023
Where Do Features Come From?
A story of sinusoids and inductive biases
By: Ben Edelman, Depen Morwani, Costin Oncescu, and Rosie Zhao
Mechanistic interpretability results explained using known inductive biases.
9 November 2023
Watermarking in the Sand
By: Ben Edelman, Hanlin Zhang and Boaz Barak
Robust watermarking of AI-generated content is impossible under natural assumptions.