“A Beautiful Puzzle”: Looking Inside AI Models and Trying to Understand What We See

By Deborah Apsel Lang | February 26, 2026

Thomas Fel, a rising star in AI vision models, discusses his work at the Kempner Institute, and what’s next for him, and for AI


Thomas Fel’s research seeks to understand the inner workings of AI models. Of the models he says, “We can see inside, all the weights and numbers and geometry. We can see it, we just don’t know what it means yet.”

Photo: Anna Olivella

Thomas Fel, a research fellow at the Kempner Institute, wants to help solve what he describes as one of the most fascinating puzzles of our generation—understanding the inner workings of AI models, and uncovering the mathematical principles that underlie their abilities.

In particular, Fel works on large vision models, advanced AI systems trained on massive amounts of visual data that can interpret images and video and, in many cases, generate new visual content. The models themselves are often described as “black boxes” because, while they can make accurate predictions or decisions, their internal reasoning remains a mystery.

For Fel, however, the idea of a black box doesn’t quite capture the actual nature of the puzzle.

“I think of it more as a glass box,” says Fel. “We can see inside, all the weights and numbers and geometry. We can see it, we just don’t know what it means yet.”

Fel’s research is part of the growing AI subfield of interpretability, which seeks to develop principled, mathematical accounts of how complex AI models encode and represent information internally and arrive at their predictions.

Most of today’s AI vision models are built using neural networks, layered computational systems that form complex representations by learning from large amounts of data. So, when Fel and his colleagues “look inside” a vision model, they’re really trying to understand how its neural network has learned to represent and combine visual features.

“It is a beautiful puzzle,” says Fel.

As he prepares to depart the Kempner for a new role at Goodfire AI, a research company dedicated to interpretability, Fel spoke with us about his last two years of work probing the mathematical structure of large vision models, what he has learned about this puzzle, and where he believes the most important pieces still lie hidden.

The following interview has been edited and condensed.


Kempner: You have been at the Kempner Institute for almost two years as a research fellow. What is the major accomplishment you’ll take away from your time at the Kempner?

Fel: It’s not so much one accomplishment as a series of interpretability papers that I’ve been proud to be part of, examining the implicit inductive biases that underlie the methods our field uses to interpret AI models, and trying to surface and formalize assumptions that had previously gone unexamined. We were able to uncover some really interesting things about the inner workings of these models, in particular how they represent information geometrically.

Kempner: A lot of your work focuses on trying to understand large vision models. Can you tell me a little about what got you interested in vision models in the first place?

Fel: My first encounter was with my Ph.D. advisor, Thomas Serre at Brown University, who is a visual neuroscientist, and I just fell in love with it directly. Working on vision is a luxury. I get to work with beautiful images every day.

I remember there was a blog post on Distill, basically by some of the first people doing interpretability. It was late at night and cold outside, and I remember sitting there, eating my little piece of chocolate, reading the blog post and just thinking: This is so beautiful. This is a beautiful puzzle. Understanding how these models are working is the most exciting puzzle of the century.

Kempner: Can you talk a little about how you blend computational techniques with insights from neuroscience to better understand the inner workings of these vision models?

At the Kempner, Fel has collaborated with researchers across disciplines to investigate and analyze large vision models, and to develop a mathematical definition of the geometric space inside the models. Photo: Anna Olivella

Fel: When I arrived at the Kempner two years ago, we had a lot of methods [concrete technical recipes for building, training and evaluating models], but one open question was: why do these methods work? Is there a common ground for all of them? And what we’ve discovered is that the method that works for [artificial] neural network models, and has been used for years, is actually exactly the same as the method used in neuroscience to study learning in the brain. It is called dictionary learning.

Dictionary learning is a classical problem in signal processing and applied mathematics: you start with a set of observations, find a set of simple patterns that show up again and again, and those patterns become entries in the dictionary. In that sense, it formalizes something like memory: the idea that complex signals can be reconstructed from a small number of basic building blocks.
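The idea can be sketched in a few lines of NumPy. This is a toy illustration of generic dictionary learning, not the specific method used in Fel’s papers: signals are built from a handful of hidden “atoms,” and alternating between sparse coding and atom refitting recovers a dictionary that reconstructs them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each signal is a sparse combination of 8 hidden "atoms".
n_atoms, dim, n_signals = 8, 16, 500
true_D = rng.normal(size=(n_atoms, dim))
true_D /= np.linalg.norm(true_D, axis=1, keepdims=True)
mask = rng.random((n_signals, n_atoms)) < 0.25        # ~2 active atoms per signal
X = (rng.normal(size=(n_signals, n_atoms)) * mask) @ true_D

def sparse_code(X, D, k=2):
    """Keep only the k largest-magnitude coefficients per signal."""
    A = X @ D.T
    thresh = np.sort(np.abs(A), axis=1)[:, -k][:, None]
    return np.where(np.abs(A) >= thresh, A, 0.0)

def rel_error(X, D, k=2):
    A = sparse_code(X, D, k)
    return np.linalg.norm(X - A @ D) / np.linalg.norm(X)

# Random initial dictionary, then alternate: code sparsely, refit atoms.
D = rng.normal(size=(n_atoms, dim))
D /= np.linalg.norm(D, axis=1, keepdims=True)
err_init = rel_error(X, D)
for _ in range(30):
    A = sparse_code(X, D)
    D = np.linalg.lstsq(A, X, rcond=None)[0]          # refit atoms to the data
    D /= np.linalg.norm(D, axis=1, keepdims=True) + 1e-12
err_final = rel_error(X, D)
```

The learned dictionary reconstructs the signals far better than a random one, even though each signal is explained by only two atoms, which is the “memory from building blocks” intuition in miniature.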

And this is where I met Demba Ba [Kempner Associate Faculty member and Gordon McKay Professor of Electrical Engineering at Harvard]. He does incredible work on dictionary learning, and I told him I thought there was a strong application for it in interpretability. The big insight was building that bridge: we found dictionary learning is a very helpful tool for interpreting and understanding how artificial brains learn patterns from masses of data.

Kempner: What kind of insights has this allowed for?

Fel: It took us down two paths. The first is analysis. We asked: given these tools, how much can we understand about the inner workings of neural networks? So, we took the best vision model in the world and spent three months using dictionary learning to inspect the concepts inside it: how the model organizes those concepts, and the geometry of each one. We were trying to extract as much insight as we could about why the model behaves the way it does.

There are many dictionary learning methods, hundreds of them, so the second question was: which should we use? Or, more importantly: what assumptions underlie each method? And what can we learn about the geometry of the model itself from the assumptions behind the methods that work well at interpreting it?

It turns out that the dictionary learning methods that perform well on these models share a common set of structural assumptions. Rather than converging by design [i.e. being designed with a common set of assumptions], they converge implicitly [i.e. consistently uncover the same underlying structure of the data without being crafted to do so], which itself tells us something: those shared assumptions are likely not arbitrary, but reflect genuine properties of the underlying representational geometry.

And this allowed us to say something important about neural networks. We had discovered something hidden about the geometry of this neural network model, because those common assumptions, they were geometrical assumptions. They tell us something about how the space inside these models is organized.

And so we were able to properly formalize this, to write the exact mathematical definition of this geometric space inside the model, which basically states that a model is able to encode far more features than it has [artificial] neurons, and that these features are arranged in a quasi-orthogonal fashion within the representational space. This is a precise, falsifiable geometric claim about how information is structured inside the model that goes beyond just a qualitative observation.
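A small NumPy sketch shows the general phenomenon behind this claim (a standard superposition illustration, not Fel’s specific formalization): if feature directions are quasi-orthogonal, a space can hold many more features than it has dimensions, and a sparse set of active features can still be read out with little interference.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 256, 1024                           # 1024 "features" in a 256-neuron space
F = rng.normal(size=(n, d)) / np.sqrt(d)   # random unit-ish directions: pairwise nearly orthogonal
active = rng.choice(n, size=5, replace=False)
x = F[active].sum(axis=0)                  # superpose a sparse set of active features
scores = F @ x                             # linear readout of every feature
inactive = np.delete(scores, active)
# Active features read out near 1; interference on the other 1019 stays small.
```

The readout works precisely because the random directions are only *quasi*-orthogonal: the small residual overlaps add noise, but not enough to drown out the active features, which is why a model can encode far more features than neurons.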

Kempner: Why does this geometry matter?

Fel: We think the geometry has a motivation. We think it happens for a reason.

The geometry of deep learning [the learning of neural networks] seems to resemble something called the Grassmannian frame [a geometric concept describing a set of vectors distributed as uniformly as possible in high-dimensional space, optimally spread in every direction simultaneously].

I did this work with my advisors Talia Konkle, Demba Ba, and Martin Wattenberg. We were really impressed that everything fit together pretty well; that the geometry of deep learning is a geometry that resembles something very well known in mathematics and appears all over the place in the natural world. It’s a beautiful fact.

Fel presents his research on the geometry of large vision models to Kempner community members. Fel and his collaborators have been able to describe the geometric space inside a vision model. “We think the geometry has a motivation,” says Fel. “We think it happens for a reason.” Photo: Anna Olivella

This same geometric principle emerges across natural systems. In biology, it appears in the Tammes problem, which describes how pores arrange themselves on the surface of a pollen grain to maximize spacing and enable efficient germination. In physics, it appears in the Thomson problem, where electrons distributed across a sphere adopt this same quasi-uniform configuration to minimize their mutual repulsion. It seems that nature keeps arriving at the same elegant solution, and it is striking to find that deep learning models appear to arrive there too.
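The “spread out as uniformly as possible” principle can be illustrated numerically. The sketch below is a generic frame-potential descent, not the authors’ actual analysis: starting from random unit vectors, gradient descent on the sum of squared pairwise inner products pushes the vectors toward a maximally uniform, tight-frame configuration, much like electrons repelling on a sphere.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 12                               # 12 unit vectors in a 4-dimensional space
V = rng.normal(size=(n, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)

def frame_potential(V):
    """Sum of squared pairwise inner products (diagonal excluded)."""
    G = V @ V.T
    return float((G ** 2).sum() - len(V))

fp_init = frame_potential(V)
for _ in range(500):                       # projected gradient descent on the sphere
    G = V @ V.T
    np.fill_diagonal(G, 0.0)
    V -= 0.01 * 4.0 * (G @ V)              # gradient of the frame potential
    V /= np.linalg.norm(V, axis=1, keepdims=True)
fp_final = frame_potential(V)
# A tight frame attains the minimum n**2/d - n = 24; random starts sit higher.
```

The potential drops as the vectors mutually repel into a quasi-uniform arrangement, the same kind of configuration seen in the Tammes and Thomson problems.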

Kempner: You’ve been able to describe the geometric space inside vision models. Now what?

Fel: We searched for two years to characterize some geometric phenomena, and I would like to explore an algorithmic angle now. I think geometry and algorithm are really two faces of the same coin. A neural network is, in some sense, a “differentiable programming language”—a system that writes its own algorithm through training—in a language we do not yet know how to read. The geometric structure we uncovered is like the syntax of that language. Now I want to find the algorithm itself.

And we now have a concrete reason to think that algorithm is simple. In recent work led by Mozes Jacob, Richard Hakim, Alessandra Brondetta, Demba Ba and Andy Keller, we showed that large deep vision models can be compressed into surprisingly small recurrent blocks, without meaningful loss of performance. That tells us that these models have low Kolmogorov complexity, meaning the algorithm they are executing is, in some formal sense, compact. And it suggests that some geometric structures that we have studied and algorithmic simplicity may be more linked than we thought. They may be two views of the same underlying meta-structure.

So now I want to turn the coin over. We have described the geometric face. The next step is to find the algorithm, something precise enough to write down, and simple enough that a human can actually read it and understand what the model is doing. That, for me, is the most exciting open problem right now.

Kempner: You are heading to Goodfire AI, a relatively small and early-stage research company focused on interpretability. Can you tell us a little about what you are planning on doing there and why you chose this as your next step?

Fel: I chose Goodfire over the others because of the research: for at least three years I can continue to do my own research. They are taking a bet on me to develop new methods in interpretability.

The second reason is I like to follow people rather than companies and I work a lot with Ekdeep Singh [Lubana]. He’s a true genius, and I think he might be one of the people who can crack this puzzle, and he is there. And they are building this team at Goodfire and bringing in some rock stars on interpretability. I want to be close to those people and learn from them.

And the third reason is that because it is small, I can potentially make a difference there. I’ll be continuing to investigate algorithmic assumptions more precisely to understand neural networks, and continuing the geometric work, exploring the algorithmic and geometric aspects in parallel.

Kempner: Your field is changing and advancing at a rapid clip. What do you understand about vision models today that you didn’t know in 2024 when you arrived at the Kempner?

Fel: It depends on whether you ask me in the morning or at night! In the morning, I’m very optimistic, and I think, of course, all we have learned, all the phenomena that we have observed, transfers well. What we now understand about some neural networks helps us to better understand neural networks more generally.

It is a little like comparative anatomy: the species are different, but the underlying structural principles recur. What we learn about one architecture genuinely constrains what is possible in others, because the geometry we are finding appears to be a property of the learning process itself, not of any particular model.

But, at night, when I’m going to sleep and I’m a bit desperate, I think: We have to start all over again every time there is a new model! And sometimes, of course, it’s a bit daunting. So, I’d say it depends on the day. But right now, I’m optimistic. I think we’re going to make it.

Kempner: What does “making it” look like?

Fel: For me, it would be taking the weights of a trained model and producing a structured, minimal description of the computation it is performing, ideally one compact enough that a researcher could inspect it, critique it, and understand not just what the model predicts, but why. That would be a great step.