Traveling Waves Integrate Spatial Information Through Time

March 10, 2025 By: Mozes Jacobs, Roberto Budzinski, Lyle Muller, Demba Ba, and T. Anderson Keller

The act of vision is a coordinated activity involving millions of neurons in the visual cortex, which communicate over distances spanning up to centimeters on the cortical surface. How do these neurons perform this coordinated computation and effectively share information over such long distances?

Traveling Waves in the Brain

Traveling waves of neural activity have been observed throughout the brain, spanning a range of scales and brain regions (Muller et al., 2018)^[1]. Surprisingly, these waves have even been measured in visual cortex in response to stimuli, propagating over the otherwise highly spatially organized visual field (Cowey, 1964)^[2]. Recent advances in recording technologies, such as calcium imaging and multielectrode arrays, have allowed neuroscientists to measure these waves with unprecedented precision (Muller et al., 2014)^[3], (Davis et al., 2020)^[4]; however, due to the inherent complexity of these measurements and the multitude of potential confounding variables, elucidating a causal role for these dynamics has remained highly challenging – especially in the context of vision.

A leading hypothesis suggests these waves may transfer and combine information across spatially distant regions of the visual cortex (Sato et al., 2012)^[5]. For example, Kitano et al. (1994)^[6] and Bringuier et al. (1999)^[7] found that neurons could be elicited to respond to stimuli far outside their classical receptive fields, but with a delay that increased as a function of distance – as if the information were slowly transmitted over time between adjacent neurons. Despite these promising observations, it remains unclear how neurons might actually use this long-range information, leading many to argue that these effects are largely ‘epiphenomenal’; in other words, that they result from some separate causal process and have no causal role of their own.

Figure 1: Overview of traveling wave-based spatial information integration. An input stimulus triggers an initial condition and sets the response properties of a lattice of neurons with both local input receptive fields and local recurrent connectivity. This initial condition evolves over time under the recurrent wave dynamics, and the resulting timeseries at each neuron eventually contains information about the entire image – effectively becoming a globally integrated representation of the visual stimulus.

In a recent preprint (Jacobs et al. 2025)^[8], we take a machine learning approach to directly measure the computational role of traveling waves in the spatial integration of information. Specifically, we introduce a set of convolutional recurrent neural networks that can generate traveling waves in their hidden states when processing visual inputs. We observe that these waves effectively expand neurons’ receptive fields, allowing distant features within an image to be shared more readily between neurons with otherwise locally constrained receptive fields. We demonstrate that these models can tackle visual segmentation tasks requiring global integration, outperforming local feed-forward models and even rivaling the popular non-local U-Net architecture (Ronneberger et al., 2015)^[9] (now widely used in diffusion models), despite having fewer parameters. In this blog post, we give a high-level overview of this work and the theory that motivates it.

What does it mean to “Integrate Spatial Information”?

At the simplest level, the integration of spatial information means that a neuron at one spatial location has access to signals from far-flung regions of the input. For an image, that could mean that a neuron at the bottom of an image is able to use information from the top of the image in order to figure out if the given blue patch it is looking at is part of the sky, or just a reflection off a lake; while for language, it might involve linking words from the beginning and end of a sentence to figure out the meaning of a word in the middle (Figure 2).

Figure 2: Examples of images and sentences which require global integration of information in order to make sense of local structure.

Modern artificial vision systems do not use anything like wave dynamics to integrate spatial information. Instead, most systems use a large number of layers (such as deep convolutional neural networks), bottleneck layers (such as in U-Nets), and global connectivity (such as in the all-to-all attention of Transformers). However, each of these approaches comes with its own limitations. For example, deeper networks have more challenging gradient propagation (requiring the use of residual connections), bottlenecks inherently limit the capacity of neural representations (requiring U-Net like skip connections), and all-to-all global connectivity is extremely computationally expensive (in terms of run time and memory usage, limiting scalability).

In this work, we ask whether traveling wave integration of spatial information might be an efficient, biologically plausible alternative to these methods. Furthermore, in practical terms, how might waves actually perform this integration? Specifically, in what format is this information transmitted such that downstream neurons can best read it out?

How Might Waves Integrate Spatial Information?

One well-known mechanism by which waves can integrate spatial information is exemplified by the famous mathematical question posed by Mark Kac in 1966: “Can One Hear the Shape of a Drum?”^[10]. At a high level, Kac wondered whether the shape of a drumhead’s boundary can be uniquely determined from the set of natural frequencies at which it vibrates.

Intuitively, when you strike a drumhead, the initial disturbance will propagate outwards as a transient traveling wave until it reaches the fixed boundary conditions where it will reflect with a phase shift. This reflected wave will thus have collected information about the boundary, and serves to bring it back towards the center. After repeated reflections and collisions, the wave activity eventually settles into patterns (normal modes) determined by the drumhead’s global geometry.

In a broad sense then, the answer to the famous question is yes: the shape of a drum does determine the sound that it will produce, although two different shapes can occasionally produce the same set of frequencies (see Gordon et al. (1992)^[11] for the famous counterexample of ‘isospectral drum shapes’).

In the videos below, we show simulations of exactly these dynamics for square drumheads of different sizes.

Simulation of wave dynamics for drum size L=13

Simulation of wave dynamics for drum size L=23

Simulation of wave dynamics for drum size L=33

Simulation of wave dynamics for drum size L=43

But how exactly is the drum’s geometry encoded in these wave dynamics? Taking inspiration from Kac’s famous question, in Figure 3 we look at the ‘sound’ of these waveforms through the time-series of the drumhead’s displacement at a chosen point on the drum, for example from a point just off the center.

Figure 3: Plot of displacement of an individual point on the drum head (an individual neuron) as a function of time for the four different drum sizes. We can begin to see just from eyeballing these dynamics that the larger drums (L=33, 43) have lower frequency oscillations than the smaller ones (L=13, 23).

Often, a more natural way to think about sound is in terms of its frequency components, which we can compute by taking the Fourier transform of the displacement. In doing so, we arrive at the ‘spectral representations’ of each shape, which we plot in Figure 4. Mathematically, for a square drum, one can derive that the lowest frequency at which a drum head will vibrate is inversely proportional to its side length – and indeed, if we look at the first peak in the frequency spectra plotted, we can see that they gradually shift lower as the side lengths increase. This aligns with our real-world intuition that larger drums make deeper, lower-pitched sounds.
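The inverse scaling can be made explicit with the standard textbook solution for an ideal square membrane clamped along its edges. The wave speed $c$ and the mode indices $m, n$ are symbols introduced here for illustration; they do not appear in the post, and the units used in the simulation are not specified.

```latex
% Normal modes of an ideal square membrane with side length L, clamped at the edges
u_{mn}(x, y, t) = \sin\!\left(\frac{m \pi x}{L}\right) \sin\!\left(\frac{n \pi y}{L}\right) \cos\!\left(2 \pi f_{mn} t\right),
\qquad m, n = 1, 2, 3, \dots

% Frequency of mode (m, n)
f_{mn} = \frac{c}{2L} \sqrt{m^2 + n^2}

% Lowest (fundamental) frequency: m = n = 1
f_{11} = \frac{c}{\sqrt{2}\, L} \;\propto\; \frac{1}{L}
```

Under these assumptions, going from L=13 to L=43 lowers the fundamental frequency by a factor of 43/13, or roughly 3.3.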

Figure 4: Fourier transform coefficient amplitudes for the point displacements plotted in Figure 3. We see that the measured fundamental frequency (first peak) from our simulation matches the theoretical value for each drum size, and decreases as drum size increases.
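To make the procedure concrete, here is a minimal sketch of the same kind of experiment: simulate a clamped square drumhead, record the displacement at one point, and compare the strongest spectral peak against the ideal-membrane fundamental. This is not the authors' simulation code. The leapfrog finite-difference scheme, the Gaussian "strike", the step size, the wave speed, the function names, and the convention that a drum of size `side` spans `side + 1` grid spacings are all assumptions made for illustration. It uses only the Python standard library, so it is limited to the two smaller drum sizes to keep the run time short.

```python
import cmath
import math

C = 1.0   # assumed wave speed (grid spacings per unit time)
DT = 0.5  # assumed time step


def simulate_drum(side, steps, c=C, dt=DT):
    """Leapfrog simulation of a square drumhead clamped at its boundary.

    `side` is the number of free grid points per edge (grid spacing 1).
    Returns the displacement over time at a point just off the center.
    """
    n = side + 2  # free points plus one clamped boundary point on each edge
    sigma = side / 4.0
    ci, cj = n // 2 - 1, n // 2 + 1  # center of the initial "strike"
    curr = [[0.0] * n for _ in range(n)]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            dist2 = (i - ci) ** 2 + (j - cj) ** 2
            curr[i][j] = math.exp(-dist2 / (2.0 * sigma ** 2))
    prev = [row[:] for row in curr]  # same state twice: start (nearly) at rest
    # Squared Courant number; this explicit 2D scheme is stable when it is <= 0.5.
    r2 = (c * dt) ** 2
    probe_i, probe_j = n // 2, n // 2 + 1
    series = []
    for _ in range(steps):
        nxt = [[0.0] * n for _ in range(n)]  # boundary entries stay at zero
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                lap = (curr[i + 1][j] + curr[i - 1][j] + curr[i][j + 1]
                       + curr[i][j - 1] - 4.0 * curr[i][j])
                nxt[i][j] = 2.0 * curr[i][j] - prev[i][j] + r2 * lap
        prev, curr = curr, nxt
        series.append(curr[probe_i][probe_j])
    return series


def magnitude_spectrum(series):
    """Magnitudes of a plain O(T^2) discrete Fourier transform, bins 0..T//2."""
    t_len = len(series)
    twiddle = [cmath.exp(-2j * math.pi * k / t_len) for k in range(t_len)]
    return [abs(sum(x * twiddle[(k * t) % t_len] for t, x in enumerate(series)))
            for k in range(t_len // 2 + 1)]


def dominant_frequency(series, dt=DT):
    """Frequency of the largest non-DC peak in the magnitude spectrum."""
    mags = magnitude_spectrum(series)
    peak_bin = max(range(1, len(mags)), key=mags.__getitem__)
    return peak_bin / (len(series) * dt)


results = {}
for side in (13, 23):
    measured = dominant_frequency(simulate_drum(side, steps=1024))
    # Ideal-membrane fundamental c / (sqrt(2) * L), with L = side + 1 spacings.
    theory = C / (math.sqrt(2) * (side + 1))
    results[side] = (measured, theory)
    print(f"side={side}: measured {measured:.4f}, theory {theory:.4f}")
```

The smooth, wide strike is a deliberate choice: it puts most of the energy into the lowest mode, so the strongest peak coincides with the fundamental. A sharper strike would excite many more modes, and the first peak would then have to be located explicitly rather than by taking the maximum.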

But how could these ideas possibly relate to the brain or even artificial neural networks? First, we note that the equation we used to simulate the above dynamics (the wave equation^[12]), when discretized over space and time, closely resembles the equations describing the time evolution of a recurrent neural network’s hidden state (as noted by other authors, such as Keller et al. (2024)^[13] and Hughes et al. (2019)^[14]). Second, from this conceptual framework, we can see that the desired ‘integrated information’ is actually only present in the time-series history of activity at each location, and is best read out through a linear transformation of this history (e.g. through the Fourier transform).
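One way to see the resemblance is to write out a standard finite-difference discretization of the wave equation. The following is a sketch under stated assumptions: the symbols $c$, $\Delta t$, $\Delta x$, $\gamma$, $K$, $W$, $V$ and $\sigma$ are introduced here for illustration, and the exact parameterization used in the cited works may differ.

```latex
% Two-dimensional wave equation for displacement u(x, y, t) with wave speed c
\frac{\partial^2 u}{\partial t^2} = c^2 \nabla^2 u

% Central differences in time (step \Delta t) and space (grid spacing \Delta x)
u^{t+1}_{i,j} = 2 u^{t}_{i,j} - u^{t-1}_{i,j}
  + \gamma^2 \left( u^{t}_{i+1,j} + u^{t}_{i-1,j} + u^{t}_{i,j+1} + u^{t}_{i,j-1} - 4 u^{t}_{i,j} \right),
\qquad \gamma = \frac{c\, \Delta t}{\Delta x}

% Stacking two time steps into one hidden state h^t gives a linear recurrence,
% where K is the 5-point Laplacian stencil applied as a local convolution
h^{t} = \begin{pmatrix} u^{t} \\ u^{t-1} \end{pmatrix},
\qquad
h^{t+1} = \begin{pmatrix} 2I + \gamma^2 K & -I \\ I & 0 \end{pmatrix} h^{t}

% Generic recurrent network update, for comparison
h^{t+1} = \sigma\!\left( W h^{t} + V x^{t} \right)
```

Read this way, the discretized wave equation is a recurrent update whose recurrent weights are a fixed local convolution, with no nonlinearity and no input after the initial condition.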

Taking inspiration from this analogy, in our recent work, we therefore propose to investigate whether wave-based RNN architectures might similarly be able to integrate spatial information into temporally extended representations.

Can a Wave-based RNN Hear the Shape of a Polygon?

We start with the simplest learned variant of this question: can a wave-based RNN hear the shape of a polygon?

To test this, we provide images of black polygons on white backgrounds to a wave-generating RNN architecture, and measure the Fourier coefficients of the resulting dynamics at each point. We then train the model to use this ‘spectral representation’ at each pixel location to classify the pixel as belonging either to the background or to one of the n-sided shapes. Crucially, for pixels inside a shape, the model must identify which shape the pixel belongs to (e.g. triangle vs. square vs. hexagon). This requires information from a significantly larger portion of the image than each neuron’s immediate single-step receptive field (analogous to the examples in Figure 2 above).
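The readout step described above can be sketched as follows: take the time series of activity at each pixel, compute its Fourier magnitudes, and pass them through one classifier shared across pixels. This is an illustrative sketch, not the model from the preprint. The linear form of the readout, the function names, the `history[t][row][col]` layout, and the hand-set weights are all assumptions; in a trained model the wave dynamics and the readout weights would both be learned.

```python
import cmath
import math


def magnitude_spectrum(series):
    """DFT magnitudes (bins 0..T//2) of one neuron's activity over time."""
    t_len = len(series)
    twiddle = [cmath.exp(-2j * math.pi * k / t_len) for k in range(t_len)]
    return [abs(sum(x * twiddle[(k * t) % t_len] for t, x in enumerate(series)))
            for k in range(t_len // 2 + 1)]


def spectral_readout(history, weights, biases):
    """Classify every pixel from the spectrum of its own time series.

    history: hidden states indexed as history[t][row][col].
    weights: weights[cls][freq_bin], one linear readout shared by all pixels.
    biases:  one bias per class.
    Returns a label map indexed as labels[row][col].
    """
    rows, cols = len(history[0]), len(history[0][0])
    labels = []
    for i in range(rows):
        row_labels = []
        for j in range(cols):
            spectrum = magnitude_spectrum([frame[i][j] for frame in history])
            logits = [sum(w * s for w, s in zip(w_cls, spectrum)) + b
                      for w_cls, b in zip(weights, biases)]
            row_labels.append(max(range(len(logits)), key=logits.__getitem__))
        labels.append(row_labels)
    return labels


# Toy check: a 1x2 "image" whose two pixels oscillate at DFT bins 2 and 5.
T = 32
history = [[[math.cos(2 * math.pi * 2 * t / T),
             math.cos(2 * math.pi * 5 * t / T)]] for t in range(T)]
# Hand-set weights for illustration: class 0 listens to bin 2, class 1 to bin 5.
weights = [[1.0 if k == 2 else 0.0 for k in range(T // 2 + 1)],
           [1.0 if k == 5 else 0.0 for k in range(T // 2 + 1)]]
labels = spectral_readout(history, weights, biases=[0.0, 0.0])
print(labels)  # [[0, 1]]
```

The point of the toy example is that the class decision depends only on how each pixel oscillates over time, not on its instantaneous value, which is why the integrated information has to be read out from the time-series history.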

Figure 5: Resulting wave-dynamics learned by the model on an example hexagon image.

In Figure 5, we show the resulting wave-dynamics learned by the model on an example hexagon image. We see the model has learned to use differing natural frequencies inside and outside the shape to induce soft boundaries, causing reflection, thereby yielding different internal dynamics based on shape.

In Figure 6 (below), we show the resulting predicted segmentation masks, and a select set of Fourier coefficient magnitudes at all spatial locations for an example image of a triangle and a hexagon. We see that the model does indeed learn to accomplish this segmentation task, and further separates different shapes into different parts of frequency space. On the right, we plot the full frequency spectrum for each shape in the dataset, averaged over all pixels containing that class label. We see that different shapes have qualitatively different frequency spectra, allowing for > 99% pixel-wise classification accuracy on a test set.

Figure 6: Wave-based models learn to separate distinct shapes in frequency space. (Left) Plot of predicted semantic segmentation and select set of frequency bins for each pixel of a given test image. (Right) The full frequency spectrum for each shape in the dataset.

In a sense then, this model can actually be understood to be ‘hearing the shape’ of each polygon, by propagating waves around it in latent space, and extracting the Fourier coefficients of the resulting oscillations. To see what we mean, take a listen for yourself. Below are the different sounds that the model has learned to produce for each of the shapes in the dataset, as well as for the background, synthesized from the above Fourier spectra:

Background
