19 December 2024
ProCyon: A Multimodal Foundation Model for Protein Phenotypes
By: Owen Queen, Robert Calef, and Marinka Zitnik
The authors introduce ProCyon, a multimodal foundation model for modeling, generating, and predicting protein phenotypes.
9 December 2024
Loss-to-Loss Prediction
Scaling Laws for All Datasets
By: David Brandfonbrener, Nikhil Anand, Nikhil Vyas, Eran Malach, and Sham Kakade
The authors develop a method to predict how large language models scale with compute across different datasets, enabling more efficient training and better understanding of data-compute tradeoffs.
22 November 2024
How Does Critical Batch Size Scale in Pre-training? (Decoupling Data and Model Size)
By: Hanlin Zhang, Depen Morwani, Nikhil Vyas, Udaya Ghai, Jingfeng Wu, and Difan Zou
The authors empirically show that the critical batch size for pre-training scales with data size rather than model size, and provide theoretical justification for this finding.
28 October 2024
Mixture of Parrots 🦜🦜🦜: Experts Improve Memorization More Than Reasoning
By: Samy Jelassi and Eran Malach
The authors demonstrate, through theory and experiments, that in Mixture-of-Experts (MoE) models, experts improve memorization more than reasoning.
25 September 2024
Contrastive Learning Explains the Emergence and Function of Visual Category Selectivity
By: Jacob Prince, George Alvarez, and Talia Konkle
The authors introduce an updated framework for understanding visual object recognition and category selectivity: contrastive coding.
16 August 2024
Context Matters for Foundation Models in Biology
By: Michelle M. Li and Marinka Zitnik
Introducing PINNACLE, a novel contextual AI model for single-cell protein biology that supports a broad array of biomedical AI tasks by tailoring outputs to the cell type context in which the model makes predictions.
12 July 2024
Anything but SGD: Evaluating Optimizers for LLM Training
By: Rosie Zhao, Depen Morwani, David Brandfonbrener, and Nikhil Vyas
The authors perform a rigorous study of a variety of LLM training optimizers and find that they are all fairly similar except for SGD, which is notably worse.
18 June 2024
Transcendence: Generative Models Can Outperform the Experts That Train Them
By: Edwin Zhang and Eran Malach
The authors theoretically and empirically demonstrate that generative models trained on chess can, via low-temperature sampling, outperform the experts that train them.
12 June 2024
A Dynamical Model of Neural Scaling Laws
Part 2 of a two-part blog post covering recent findings from the authors
By: Blake Bordelon, Alex Atanasov, and Cengiz Pehlevan
By analyzing the dynamics of a random feature model trained with gradient descent, the authors reproduce and explain many observations about neural scaling laws, including compute optimal scaling of model size and training time.
13 May 2024
Infinite Limits of Neural Networks
Part 1 of a two-part blog post covering recent findings from the authors
By: Alex Atanasov, Blake Bordelon, and Cengiz Pehlevan
A discussion of expository material and the authors' recent papers related to large width and depth limits.