16 March 2026
Energy-Based Fine-Tuning: Beyond Next-Token Prediction
By: Samy Jelassi*, Mujin Kwun*, Rosie Zhao*, Yuanzhi Li, Nicolo Fusi, Yilun Du, Sham M. Kakade and Carles Domingo-Enrich*
The authors introduce Energy-Based Fine-Tuning (EBFT), a method that matches the long-range statistics of model generations to those of ground-truth sequences in high-dimensional feature spaces. EBFT corrects the error amplification caused by standard teacher-forced next-token training while improving downstream task performance. Experiments show that EBFT matches the accuracy improvements of RLVR while requiring no external reward signal or verifier. In contrast to RLVR, EBFT simultaneously improves validation cross-entropy.
9 March 2026
InputDSA: Demixing then comparing recurrent and externally driven dynamics in complex systems
By: Ann Huang and Kanaka Rajan
The authors introduce InputDSA, a new method for measuring the similarity between two complex systems driven by external inputs, such as biological neural circuits or reinforcement learning agents. The method disentangles each system’s intrinsic dynamics from its input-driven effects, enabling highly accurate, robust, and efficient comparisons of those components.
4 March 2026
Structure, Disorder, and Dynamics in Task-Trained Recurrent Neural Circuits
By: David Clark*, Blake Bordelon*, Jacob Zavatone-Veth*, Cengiz Pehlevan
The authors develop a mean-field theory of task-trained recurrent networks that continuously interpolates between structured and disordered dynamical regimes, and find evidence that macaque motor cortex is best captured by an intermediate level of task-specific recurrent restructuring.
6 February 2026
Forecasting the Brain: Scalable Neural Prediction with POCO
By: Yu Duan and Kanaka Rajan
Predicting future neural activity is a critical step toward achieving real-time, closed-loop neurotechnologies. To this end, we introduce POCO, a unified forecasting model trained on…
4 February 2026
Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging
By: Alexandru Meterez*, Pranav Ajit Nair*, Depen Morwani*, Cengiz Pehlevan, Sham Kakade
The authors provide a theoretical analysis demonstrating the existence of anytime learning-rate schedules for overparameterized linear regression, and highlight the central role of weight averaging—also known as model merging—in achieving the optimal convergence rates of stochastic gradient descent.
3 February 2026
Measuring and Controlling Solution Degeneracy Across Task-Trained Recurrent Neural Networks
By: Ann Huang and Kanaka Rajan
Despite reaching equal performance when trained on the same task, artificial neural networks can develop dramatically different internal solutions, much like different students solving the same math problem using completely different approaches. The study introduces a unified framework to quantify this variability across Recurrent Neural Network (RNN) solutions, termed solution degeneracy, and analyzes what factors shape it across thousands of recurrent networks trained on memory and decision-making tasks.
26 January 2026
PROTON: A Relational Foundation Model for Neurological Discovery
By: Ayush Noori and Marinka Zitnik
This work introduces a relational foundation model for neurological discovery and evaluates it through discovery loops that connect AI predictions to experiments in Parkinson’s disease, bipolar disorder, and Alzheimer’s disease.
5 January 2026
Large Video Planner: A New Foundation Model for General-Purpose Robots
By: Yilun Du
This work explores using video as the primary modality for robot foundation models. Unlike static images, videos naturally encode physical dynamics and semantics of the world, providing a rich prior for physical decision-making.
24 November 2025
Into the Rabbit Hull – Part II
From Linear Directions to Convex Geometry
By: Thomas Fel*, Binxu Wang*, Michael A. Lepori, Matthew Kowal, Andrew Lee, Randall Balestriero, Sonia Joseph, Ekdeep S. Lubana, Talia Konkle, Demba Ba, Martin Wattenberg
The authors ask a fundamental question: is the linear view of DINOv2 under the Linear Representation Hypothesis (LRH) sufficient to describe how deep vision models organize information? They examine the geometry and statistics of the learned concepts themselves, and the results suggest that representations are organized beyond linear sparsity alone.
12 November 2025
Into the Rabbit Hull – Part I
By: Thomas Fel*, Binxu Wang*, Michael A. Lepori, Matthew Kowal, Andrew Lee, Randall Balestriero, Sonia Joseph, Ekdeep S. Lubana, Talia Konkle, Demba Ba, Martin Wattenberg
The authors offer an interpretability deep dive, examining the most important concepts emerging in one of today’s central vision foundation models, DINOv2.