Deeper Learning

22 November 2024

How Does Critical Batch Size Scale in Pre-training? (Decoupling Data and Model Size)

By: Hanlin Zhang, Depen Morwani, Nikhil Vyas, Udaya Ghai, Jingfeng Wu and Difan Zou

The authors show empirically that the critical batch size for pre-training scales with data size rather than model size, and they provide theoretical justification for this finding.

Preprint

28 October 2024

Mixture of Parrots 🦜🦜🦜: Experts Improve Memorization More Than Reasoning

By: Samy Jelassi and Eran Malach

The authors demonstrate, through theory and experiments, that in mixture-of-experts (MoE) models, experts improve memorization more than reasoning.

Preprint

25 September 2024

Contrastive Learning Explains the Emergence and Function of Visual Category Selectivity

By: Jacob Prince, George Alvarez, and Talia Konkle

The authors introduce an updated framework for understanding visual object recognition and category selectivity: contrastive coding.

Paper

16 August 2024

Context Matters for Foundation Models in Biology

By: Michelle M. Li and Marinka Zitnik

Introducing PINNACLE, a novel contextual AI model for single-cell protein biology that supports a broad array of biomedical AI tasks by tailoring outputs to the cell type context in which the model makes predictions.

12 July 2024

Anything but SGD: Evaluating Optimizers for LLM Training

By: Rosie Zhao, Depen Morwani, David Brandfonbrener and Nikhil Vyas

The authors perform a rigorous study of a variety of LLM training optimizers and find that they are all fairly similar except for SGD, which is notably worse.

Paper

18 June 2024

Transcendence: Generative Models Can Outperform the Experts That Train Them

By: Edwin Zhang and Eran Malach

The authors demonstrate, theoretically and empirically, that with low-temperature sampling, generative models in chess can outperform the experts that train them.

12 June 2024

A Dynamical Model of Neural Scaling Laws

Part 2 of a two-part blog post covering recent findings from the authors

By: Blake Bordelon, Alex Atanasov, and Cengiz Pehlevan

By analyzing the dynamics of a random feature model trained with gradient descent, the authors reproduce and explain many observations about neural scaling laws, including compute optimal scaling of model size and training time.

13 May 2024

Infinite Limits of Neural Networks

Part 1 of a two-part blog post covering recent findings from the authors

By: Alex Atanasov, Blake Bordelon, and Cengiz Pehlevan

A discussion of expository material and the authors' recent papers related to large width and depth limits.

16 April 2024

Distinguishing the Knowable from the Unknowable with Language Models

By: Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, and Ben Edelman

A new way to label different types of uncertainty in unconstrained text and simple methods to predict those labels, including a completely unsupervised approach.

Code
Paper

5 February 2024

Repeat After Me: Transformers are Better than State Space Models at Copying

By: Samy Jelassi, David Brandfonbrener, Sham Kakade and Eran Malach

The improved efficiency of state space models comes at the cost of some capabilities that are core to modern LLMs.

Preprint
