Poster Abstracts

Comparing LLM and Human Semantic Representations in Abstract Reasoning, Caroline Ahn
- Despite recent advancements, AI systems struggle to represent abstract knowledge in ways that align with human cognition. This study evaluates how a large language model (LLM), OpenAI’s GPT-4o, approaches abstract rule-learning and problem-solving compared to humans. A total of 184 online participants completed 40 visuospatial puzzles adapted from the Abstraction and Reasoning Corpus, optimized for human cognitive testing. GPT-4o, a multimodal transformer pretrained on internet-scale data, received the puzzles as structured JSON data, where the input-output of each puzzle was represented as discrete grids of integers. Both humans and AI were prompted to select abstract keywords relevant to each puzzle’s rules from a predefined list. On average, human participants solved 42.3% of tasks (SD = 30.4%), whereas GPT-4o solved only 2 out of 40 (5%), demonstrating limited LLM generalization in abstract reasoning. However, human performance varied widely, suggesting individual differences in abstract rule learning. AI-selected descriptors diverged from human choices—humans prioritized Color, Pattern, and Object Relationships, while GPT-4o overemphasized spatial structure and low-level features. In the two solved tasks, AI’s selections aligned more closely with humans, particularly in the Color keyword, whereas in failed tasks, AI overemphasized the Position keyword and tended to choose structural concepts humans largely ignored. These findings highlight GPT-4o’s limitations as an LLM trained predominantly on linguistic data, revealing its struggles with non-text-based abstraction. By identifying where AI reasoning diverges from human semantic understanding, this study provides insight into how LLMs process abstract problem-solving and informs efforts to improve their reasoning abilities. (Back to top>>)
Loss-to-Loss Prediction: Scaling Laws for All Datasets, Nikhil Anand
- While scaling laws provide a reliable methodology for predicting train loss across compute scales for a single data distribution, less is known about how these predictions should change as we change the distribution. In this paper, we derive a strategy for predicting one loss from another and apply it to predict across different pre-training datasets and from pre-training data to downstream task data. Our predictions extrapolate well even at 20x the largest FLOP budget used to fit the curves. More precisely, we find that there are simple shifted power law relationships between (1) the train losses of two models trained on two separate datasets when the models are paired by training compute (train-to-train), (2) the train loss and the test loss on any downstream distribution for a single model (train-to-test), and (3) the test losses of two models trained on two separate train datasets (test-to-test). The results hold up for pre-training datasets that differ substantially (some are entirely code and others have no code at all) and across a variety of downstream tasks. Finally, we find that in some settings these shifted power law relationships can yield more accurate predictions than extrapolating single-dataset scaling laws. (Back to top>>)
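As a rough illustration of what a shifted power law between two losses can look like, the sketch below assumes the form L_B ≈ K · (L_A − E_A)^κ + E_B, where E_A and E_B play the role of irreducible losses. The parameter names, the log-space fitting routine, and all numbers are assumptions chosen for illustration; they are not values or code from the paper.

```python
import math

def shifted_power_law(loss_a, K, kappa, E_a, E_b):
    """Map a loss on dataset A to a predicted loss on dataset B."""
    return K * (loss_a - E_a) ** kappa + E_b

def fit_K_kappa(pairs, E_a, E_b):
    """Least-squares fit of K and kappa in log space, with E_a and E_b assumed known."""
    xs = [math.log(a - E_a) for a, _ in pairs]
    ys = [math.log(b - E_b) for _, b in pairs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    kappa = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    K = math.exp(mean_y - kappa * mean_x)
    return K, kappa

# Synthetic (hypothetical) parameters and compute-paired losses.
TRUE = dict(K=1.3, kappa=0.9, E_a=1.8, E_b=2.1)
losses_a = [3.5, 3.2, 2.9, 2.7, 2.5]
pairs = [(a, shifted_power_law(a, **TRUE)) for a in losses_a]

K_hat, kappa_hat = fit_K_kappa(pairs, TRUE["E_a"], TRUE["E_b"])

# Extrapolate to a lower loss on A than any used in the fit.
predicted_b = shifted_power_law(2.2, K_hat, kappa_hat, TRUE["E_a"], TRUE["E_b"])
```

In practice the irreducible terms would also have to be estimated, for example from single-dataset scaling-law fits; fixing them here keeps the sketch short.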
Deep Learning Needs Deep Analysis: Evidence from ForageWorld, Ryan Badman
- Spatial navigation, memory, and planning are well-studied in both neuroscience and computer science, but rarely together in the same framework. Deep reinforcement learning (DRL) agents, containing neural networks that approach the size and computational power of insect brains, are attractive systems to examine cognition in naturalistic, goal-oriented spatial tasks. We created ForageWorld, a foraging task suite, in which DRL agents learn the location of resource patches that deplete and replenish, while avoiding predators, in large procedurally generated and partially observable arenas. Despite their relatively small size and vanilla architectures, DRL agents in ForageWorld demonstrated sophisticated time-extended behaviors, such as mapping locational features, strategic exploration, and predator defense. Crucially, these agents were not explicitly reward-shaped for these abilities and contained recurrent networks with only simple memory. Furthermore, the path trajectories of ForageWorld agents mimicked the exploration paths documented in insects, and analyses of the agents’ neural activations revealed multiplexed encoding of reward, location, and patch features, similar to what is known in neural circuits in biological brains. We tested the often-proposed hypothesis in neuroscience and computer science that more naturalistic and difficult tasks automatically lead to models converging to more similar behavioral and neural solutions across architectures. Our results suggest that deeper behavioral and neural analyses of DRL agents in naturalistic tasks are critical both for advancing agent-based models and for probing the capabilities of small brains.
Foveated sensing with KNN convolutional neural networks, Nicholas Blauch
- Human vision prioritizes the center of gaze through spatially-variant retinal sampling, leading to magnification of the fovea in cortical visual maps. In contrast, deep neural network models (DNNs) typically operate on spatially uniform inputs, a mismatch that limits their use in understanding human vision. While some work has explored foveated sampling in DNNs, these methods have been forced to wrangle retinal samples into grid-like representations, sacrificing faithful cortical retinotopy and leading to warped receptive field shapes that depend on eccentricity. Here, we enable realistic foveated encoding of visual space by adapting the model architecture. First, we create a spatially-variant input sensor based on the principle of isotropic cortical magnification (Daniel & Whitteridge, 1961; Schwartz, 1980), embedded in a 3D cortical model (Rovamo & Virsu, 1984). To process images on the curved sensor manifold, we convert spatial kernels for convolution and pooling into k-nearest neighborhoods (KNNs), and generalize convolution to KNNs. Filters are learned in a canonical reference frame, and are spatially mapped into each neighborhood for perception. This approach enables the construction of hierarchical KNN convolutional neural networks (KNN-CNNs) closely matched to their CNN counterparts. Broadly, this model class offers a more biologically-aligned sampling of the visual world, enabling future computational work to explore the active processes by which humans decide where to look, and how information is integrated over multiple fixations. Last, this approach holds promise in building more neurally mappable models.
Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines, Wilka Carvalho
- We live in a world filled with many co-occurring tasks: stove and fridge tasks are commonly co-located in kitchens, coffee shops and shopping centers are commonly co-located in city centers, and colleagues with similar specialties may sit near each other in office buildings. We hypothesize that humans leverage experience on one task to preemptively learn solutions to other tasks that were accessible but not achieved. We formalize this with “Multitask Preplay”, a novel computational theory that replays experience on one task as the starting point to “preplay” behavior, which aims to solve other available tasks. We first show that, compared to traditional preplay and predictive representation methods, Multitask Preplay better predicts how humans generalize to tasks that were accessible but not accomplished in a small grid world, including, surprisingly, even when participants didn’t know they would need to generalize to these tasks. We then show these predictions generalize to Craftax, a large, partially observable 2D open-world environment. Finally, to showcase the utility of Multitask Preplay as a theory for human intelligence, we leverage Craftax to demonstrate that, compared to traditional preplay and predictive representation methods, Multitask Preplay enables AI agents to learn behaviors that best transfer to novel worlds that share task co-occurrence structure. Together these findings highlight that Multitask Preplay is a scalable theory for human intelligence which can also improve how AI generalizes to novel environments.
RNN Replay: Leakage and Underdamped Dynamics, Josue Casco-Rodriguez
- Neural replay is a feature of biological neural systems that is indicative of how biological neural networks, like the hippocampus, process and store memories. Artificial models of networks like the hippocampus should thus also be able to replay previously seen event sequences. One useful task for studying hippocampal function and neural replay is path integration, wherein neural networks must track a latent variable using noisy observations. The standard hippocampal models for such tasks are continuous attractor neural networks (CANNs), but they are handcrafted instead of learned from data. A few recent works have described hippocampal neural replay as a byproduct of RNNs trained to path-integrate via predictive learning, since they form attractors that can be described through Langevin sampling of score functions. However, the current perspective of neural replay as Langevin sampling has yet to identify the nature of RNN score functions, explain recently proposed replay techniques like negative feedback (adaptation) and multi-step predictive training, or prescribe new methods to improve neural replay in RNNs. We rectify these shortcomings with three key results: (1) The RNN score function is time-variant and thus difficult to estimate, even for simple distributions—however, its nature does motivate the use of leaky dynamics. (2) Multi-step predictive training forces the RNN to learn both its score function and the dynamics of observed events; adaptation destabilizes attractors, which diversifies but also slows replay. (3) We propose a novel underdamped Langevin sampling scheme that temporally compresses neural replay events, a phenomenon which is repeatedly observed in vivo. (Back to top>>)
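To make the distinction between ordinary (overdamped) and underdamped Langevin sampling concrete, here is a minimal sketch on a toy target, a 1D standard Gaussian whose score is known in closed form. This is a generic textbook discretization, not the authors' scheme; the function names, step size, friction value, and the choice of target are all assumptions for illustration.

```python
import math
import random

def score(x):
    # Score of a standard 1D Gaussian: d/dx log p(x) = -x.
    return -x

def overdamped_langevin(n_steps, dt, rng):
    """Position-only update: drift along the score plus Gaussian noise."""
    x, samples = 0.0, []
    for _ in range(n_steps):
        x += dt * score(x) + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

def underdamped_langevin(n_steps, dt, friction, rng):
    """Adds a velocity variable: the score acts as a force, friction damps it."""
    x, v, samples = 0.0, 0.0, []
    for _ in range(n_steps):
        noise = math.sqrt(2.0 * friction * dt) * rng.gauss(0.0, 1.0)
        v += dt * (score(x) - friction * v) + noise
        x += dt * v  # position moves with the updated velocity
        samples.append(x)
    return samples

def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)

rng = random.Random(0)
over = overdamped_langevin(50_000, 0.05, rng)
under = underdamped_langevin(50_000, 0.05, 1.0, rng)

# Both chains should have sample variance close to 1 (the target variance).
var_over, var_under = variance(over), variance(under)
```

The velocity variable gives the underdamped sampler momentum, so it can traverse the state space along smoother trajectories than the purely diffusive overdamped update; how this relates to temporally compressed replay is the subject of the abstract and is not demonstrated by this toy.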
Balancing Excitation and Inhibition for Adaptive Decision-Making, Veronica Chelu
- Neural circuits rely on excitatory–inhibitory (E/I) interactions to support adaptive learning and decision-making [5]. Here, we investigate the computational role of the E/I balance in shaping neural circuits for decision-making and continual reinforcement learning (RL). First, using a mean-field E/I model of two-choice decision-making [2], we show that excitatory recurrence enhances responsiveness by amplifying signals and driving circuits toward near-critical states, while inhibition stabilizes or refines decision-selective activity. Next, we integrate a bio-inspired E/I mechanism in an Actor-Critic[3] agent to examine how a well-tuned E/I balance supports adaptive learning in environments with shifting contingencies. By modulating inhibitory feedback, we show that E/I interactions regulate the speed-accuracy trade-off, enabling flexible control over adaptation rate. Our results suggest that E/I tuning is essential for maintaining an optimal balance between flexibility and stability, thus supporting efficient continual learning. We then compare two E/I recurrent neural network (RNN) approaches in a continual RL setting for decision-making tasks. A naive partitioning of excitatory and inhibitory populations [1] proves insufficient for dynamic tasks, resulting in poor performance. In contrast, Dale’s Artificial Neural Networks (DANNs)[4]—which incorporate biologically realistic E/I constraints—demonstrate robust learning and reliable adaptation across varying task demands. Finally, we discuss these findings in the context of neuromodulation, highlighting how E/I balance can serve as a fundamental mechanism for regulating cognitive function. Overall, our work underscores the computational importance of E/I interactions in shaping neural circuit dynamics, enabling adaptive decision-making and continual reinforcement learning in biologically inspired models. (Back to top>>)
Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning, Feng Chen
- Recent progress in large language models (LLMs) highlights the power of scaling test-time compute to achieve strong performance on complex tasks, such as mathematical reasoning and code generation. This raises a critical question: how should model training be modified to optimize performance under a subsequent test-time compute strategy and budget? To explore this, we focus on pass@N, a simple test-time strategy that searches for a correct answer in N independent samples. We show, surprisingly, that training with cross-entropy (CE) loss can be misaligned with pass@N in that pass@N accuracy decreases with longer training. We explain the origins of this misalignment in terms of model overconfidence induced by CE, and experimentally verify our prediction of overconfidence as an impediment to scaling test-time compute via pass@N. Furthermore we suggest a principled, modified training loss that is better aligned to pass@N by limiting model confidence and rescuing pass@N test performance. Our algorithm demonstrates improved mathematical reasoning on MATH and MiniF2F benchmarks under several scenarios: (1) providing answers to math questions; and (2) proving theorems by searching over proof trees of varying shapes. Overall, our work underscores the importance of co-designing two traditionally separate phases of LLM development: training-time protocols and test-time search and reasoning strategies. (Back to top>>)
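The tension between confidence and pass@N can be seen in a toy calculation. If a model answers a problem correctly with probability p on each independent sample, the chance that at least one of N samples is correct is 1 − (1 − p)^N. The sketch below compares two hypothetical models; the per-problem probabilities are invented for illustration and are not results from the paper.

```python
def pass_at_n(p_correct, n):
    """Probability that at least one of n independent samples is correct."""
    return 1.0 - (1.0 - p_correct) ** n

def mean_pass_at_n(per_problem_p, n):
    """Average pass@n over a set of problems."""
    return sum(pass_at_n(p, n) for p in per_problem_p) / len(per_problem_p)

# Hypothetical "overconfident" model: certain on every problem,
# right on 6 of 10 and confidently wrong on the other 4.
confident = [1.0] * 6 + [0.0] * 4

# Hypothetical "less confident" model: 50% per-sample accuracy on every problem.
hedged = [0.5] * 10

for n in (1, 16):
    print(n, mean_pass_at_n(confident, n), mean_pass_at_n(hedged, n))
```

At N = 1 the overconfident model is ahead (0.6 versus 0.5), but its pass@N cannot improve with more samples, while the less confident model approaches 1 as N grows.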
Geometry Linked to Untangling Efficiency Reveals Structure and Task-Relevant Information in Neural Populations, Chi-Ning Chou
- From an eagle spotting a fish in shimmering water to a scientist extracting patterns from noisy data, many cognitive tasks require untangling overlapping signals. Neural circuits achieve this by transforming complex sensory inputs into distinct, separable representations that guide behavior. Data-visualization techniques convey the geometry of these transformations, and decoding approaches quantify performance efficiency, but we lack a framework for linking these two key aspects. Here we address this gap by introducing Geometry Linked to Untangling Efficiency (GLUE) via manifold capacity theory, a data-driven analysis framework that links changes in the geometrical properties of neural activity patterns to representational untangling. We applied GLUE to over seven neuroscience datasets—spanning multiple organisms, tasks, and recording techniques—and found that task-relevant representations untangle in many domains, including along the cortical hierarchy, throughout learning, and during neural dynamics. Furthermore, GLUE can characterize the underlying geometric mechanisms of representational untangling, and explain how it facilitates efficient and robust computation. Beyond neuroscience, GLUE provides a powerful framework for quantifying information organization in data-intensive fields such as structural genomics and interpretable AI, where analyzing high-dimensional representations remains a fundamental challenge. (Back to top>>)
Implicit Generative Modeling by Kernel Similarity Matching, Shubham Choudhary
- Understanding how the brain encodes stimuli has been a fundamental problem in computational neuroscience. Insights into this problem have led to the design and development of artificial neural networks that learn representations by incorporating brain-like learning abilities. Recently, learning representations by capturing similarity between input samples has been studied to tackle this problem. This approach, however, has thus far been used to only learn downstream features from an input and has not been studied in the context of a generative paradigm, where one can map the representations back to the input space, incorporating not only bottom-up interactions (stimuli to latent) but also learning features in a top-down manner (latent to stimuli). We investigate a kernel similarity matching framework for generative modeling. Starting with a modified sparse coding objective for learning representations proposed in prior work, we demonstrate that representation learning in this context is equivalent to maximizing similarity between the input kernel and a latent kernel. We show that an implicit generative model arises from learning the kernel structure in the latent space and show how the framework can be adapted to learn manifold structures, potentially providing insights as to how task representations can be encoded in the brain. To solve the objective, we propose a novel algorithm based on the Alternating Direction Method of Multipliers (ADMM) and discuss the interpretation of the optimization process. Finally, we discuss how this representation learning problem can lead towards a biologically plausible architecture to learn the model parameters that ties together representation learning using similarity matching (a bottom-up approach) with predictive coding (a top-down approach).
Alignment Between Human Attention and a Gated Attention Mechanism in Social and Descriptive Captioning Tasks, Isaac Christian
- The concept of attention has transformed artificial intelligence, serving as the computational backbone of modern large language and vision models. However, the extent to which machine attention approximates human attentional mechanisms remains an open question. While neural networks can replicate some human-like attentional mechanisms, they struggle with top-down attention – the ability to direct focus based on prior experience and knowledge rather than extrinsic features. To investigate this disparity, we examined the correspondence of a gated attention mechanism to human attention. We opted for gated attention as it may capture an aspect of top down attention selection (multiplicative gain) that transformer (similarity based) attention may not. Eye-tracking data was collected as participants engaged in two tasks: (1) inferring an agent’s focus of attention (theory of mind) and (2) generating descriptive captions for naturalistic images. The model was sensitive to context (task objective and semantic production), exhibited behavior similar to hemi-spatial neglect (a neurological condition) when perturbed, and proved most useful in noisy environments – aligning with signatures of human attention. However, when presented with social scenes that were ambiguous, requiring more complex social reasoning, the model’s performance worsened, revealing a gap between strategies learned by the model and humans. Our work provides evidence that a gated attention mechanism may account for a number of behavioral effects associated with attention. More broadly, we provide evidence that multiplicative gain is computationally distinct and may produce separate behavior from transformer attention. (Back to top>>)
Lasting Representations in RL via Successor Features and Memory Consolidation, Raymond Chua
- A fundamental hypothesis in computational neuroscience proposes that biological agents mitigate catastrophic interference through learning and memory consolidation occurring over multiple timescales. Recent work in reinforcement learning (RL) has demonstrated that multi-timescale consolidation mechanisms significantly improve continual learning. In this study, we investigate how learning representations through Successor Features coupled with multi-timescale consolidation inspired by synaptic consolidation can produce representations robust to drastic changes in environment dynamics. We evaluated this hypothesis using sparse-reward 3D mazes with pixel observations and MuJoCo control tasks with state observations. Results demonstrate that our proposed approach mitigates catastrophic forgetting and enhances continual task learning. Furthermore, we observed through self-attention mechanisms that slower-timescale representations increasingly dominate attention over faster timescales as agents re-encounter previously experienced tasks, aligning with biological evidence of memory consolidation. Our findings contribute to understanding the functional roles of synaptic plasticity operating at multiple timescales, demonstrating improved RL performance by capturing essential features from biological memory systems.
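For readers unfamiliar with Successor Features, the sketch below shows the core idea in a tabular toy: each state's successor features ψ(s) are the expected discounted sum of future features φ, learned with a temporal-difference update, and values are recovered as a dot product with a reward weight vector w. The three-state cycle, one-hot features, and hyperparameters are assumptions for illustration; the multi-timescale consolidation studied in the abstract is not modeled here.

```python
GAMMA = 0.5    # discount factor (assumed)
ALPHA = 0.5    # TD learning rate (assumed)
N_STATES = 3

def phi(state):
    """One-hot state features."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES)]

def next_state(state):
    """Deterministic cycle 0 -> 1 -> 2 -> 0 under a fixed policy."""
    return (state + 1) % N_STATES

# psi[s] approximates E[sum_t GAMMA^t * phi(s_t) | s_0 = s].
psi = [[0.0] * N_STATES for _ in range(N_STATES)]

for _ in range(200):
    for s in range(N_STATES):
        s_next = next_state(s)
        # TD target: current features plus discounted successor features.
        target = [f + GAMMA * p for f, p in zip(phi(s), psi[s_next])]
        psi[s] = [p + ALPHA * (t - p) for p, t in zip(psi[s], target)]

def value(state, w):
    """Value under reward weights w, with no relearning of psi."""
    return sum(p * wi for p, wi in zip(psi[state], w))

v_goal_2 = value(0, [0.0, 0.0, 1.0])  # reward only in state 2
v_goal_1 = value(0, [0.0, 1.0, 0.0])  # reward moved to state 1
```

Because ψ depends on the dynamics and policy but not on the reward, changing w re-evaluates every state immediately, which is one reason the representation is attractive for continual learning across tasks.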
Biologically Interpretable Machine Learning Approaches for Analyzing Neural Data, Madelyn Cruz
- While deep neural networks (DNNs) can achieve impressive classification performance, they operate like “black boxes” and are often difficult to interpret. This study explores biological neural networks (BNNs) by applying backpropagation to biophysically accurate neuron models. Using BNNs, we classify electroencephalogram (EEG) and non-EEG signals, generate EEG signals, and analyze EEG neurophysiology through model-derived parameters.
- Our BNNs achieve strong performance in classifying handwritten digits from the MNIST Digits Dataset, learning faster than traditional neural networks. The same BNN architecture also performs well on time-series neuronal datasets, requiring fewer parameter adjustments while accurately capturing temporal dynamics, leading to faster learning and improved interpretability. This is demonstrated by directly using BNNs to classify EEG recordings that correspond to alertness vs. fatigue and varying levels of consciousness, as well as power spectral densities from EEG recordings associated with different workload levels. By analyzing the gradients from backpropagation, we find similarities between these numerically efficient learning mechanisms and Hebbian learning in the brain, in terms of how weights change the loss function and how changing the weights at specific time intervals affects the loss function. Additionally, we trained our BNNs to exhibit different frequencies observed in EEG recordings and found that the variability of synaptic weights and applied currents increased with the target frequency range. Overall, applying backpropagation to accurate ordinary differential equation models can enhance our ability to classify neuronal data more quickly and to better understand neuronal activity and neural network learning.
Anatomically and Functionally Constrained Bio-Inspired Recurrent Neural Networks Outperform Traditional RNN Models, Nima Dehghani
- Introduction: Understanding how neural circuits drive sensory processing and decision-making is a core challenge in neuroscience. While traditional Recurrent Neural Networks (RNNs) excel at modeling temporal dynamics, they fall short in capturing the structured synaptic architecture observed in biological networks. Recent advances, such as spatially embedded RNNs (seRNNs), have integrated spatial constraints to enhance biological relevance. However, there remains a gap in leveraging comprehensive anatomical and functional data to improve both task performance and alignment with neural principles.
- Methods: We present a bio-inspired RNN that bridges this gap by integrating detailed anatomical connectivity and functional imaging data from the MICrONS dataset. This dataset provides nanometer-scale reconstructions of neural circuits and functional activity recordings from mouse visual cortex. Using neuronal positions, synaptic connections, functional correlations, and Spike Time Tiling Coefficients (STTC) to quantify synchrony, we constrained our model with biologically grounded weight initialization, communicability calculations, and a regularizer to favor realistic network properties.
- Results: Our bio-inspired RNN outperformed baseline models across three decision-making tasks: 1-step inference, Go/No-Go, and perceptual decision-making. The anatomically and functionally constrained model achieved the highest accuracies—89.4%, 96.9%, and 86.7%, respectively—and exhibited superior metrics like modularity and small-worldness, reflecting biologically relevant network structures.
- Discussion: These findings demonstrate that incorporating anatomical and functional constraints not only improves task performance but also fosters biologically meaningful network properties. Future work should extend this framework to visual processing and diverse architectures, paving the way for deeper insights into neural dynamics. (Back to top>>)
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru, Arturo Deza
- As multimodal foundational models start being deployed experimentally in self-driving cars, a reasonable question is how similarly to humans these systems respond in certain driving situations, especially those that are out-of-distribution. To study this, we use dashcam video data from Peru, a country with some of the “worst” (most aggressive) drivers in the world, a high traffic index, and a high ratio of bizarre to non-bizarre street objects likely never seen in training. In particular, to test at a cognitive level how well Foundational Visual-Language Models (VLMs) compare to humans in autonomous driving, we move away from bounding boxes, segmentation maps, occupancy maps, or trajectory estimation to multimodal Visual Question Answering (VQA), comparing humans and machines through a popular method in systems neuroscience known as Representational Similarity Analysis (RSA). Depending on the type of questions we ask and the answers these systems give, we will show in which cases VLMs and humans converge or diverge, allowing us to probe their cognitive alignment.
Unified Predictive Model for Whole-Brain Neural Dynamics of Larval Zebrafish, Yu Duan
- Predicting future neural dynamics is critical to building mechanistic models of brain function and for applications such as closed-loop optogenetic control and brain-machine interfaces. Models that effectively forecast future neural activity over large spatial scales, rather than merely fit existing data, are essential for this endeavor. Yet, most neuroscience models focus on capturing dynamics across a few regions in lab animals in short trials of stereotyped behavioral tasks. More holistic models require insights into brainwide dynamics during spontaneous behaviors in task-free settings. Furthermore, time-series forecasting (TSF) in complex systems is an unsolved problem in machine learning. Here, we develop a unified predictive model trained to forecast spontaneous brainwide activity recorded from larval zebrafish. We introduce Predict-POYO, a novel model based on neural data from multiple larval zebrafish that predicts neural dynamics at cellular resolution. To build Predict-POYO, we adapt the tokenization scheme of a transformer-based model previously used for behavioral decoding of primate data [5]. Predict-POYO uses recent calcium imaging data to successfully predict whole-brain neural dynamics several timesteps into the future. We evaluate Predict-POYO against standard baselines and state-of-the-art models for dynamical system reconstruction (DSR) and TSF. We find that in real data, only Predict-POYO significantly outperforms a linear baseline, highlighting the advantage of dedicated architectures for modeling the structured noise present in neural data. By comparison, DSR and TSF models only perform with idealized, simulated data, and drastically underperform on real brain data. Our analyses reveal that high-quality neural datasets – longer recordings from multiple animals in naturalistic settings – will better facilitate neural prediction models that capture features shared across the brains of multiple individuals. 
- Creating the first zebrafish-based neural prediction model that generalizes across multiple individuals’ brains holds significant promise for uncovering both conserved and unique neural dynamics, and designing closed-loop interventions that are effective across individuals.
A Model of Continuous Phoneme Recognition Reveals the Role of Context in Human Speech Perception, Gasser Elbanna
- Humans excel at transforming acoustic waveforms into meaningful linguistic representations, despite the inherent variability in speech signals. However, the underlying mechanisms that enable such robust perception remain unclear. One bottleneck is the absence of models that replicate human performance and that could be used to probe for mechanistic hypotheses.
- To address this bottleneck, we developed PARROT, an artificial neural network model of continuous speech perception. PARROT maps acoustic inputs from a simulated cochlear front-end into linguistic units.
- To evaluate human-model alignment, we designed a novel behavioral experiment in which participants transcribed spoken nonwords. This experiment allowed us to compute the first full phoneme confusion matrix in humans, enabling a systematic comparison of human and model phoneme confusions. We found that PARROT's patterns of phoneme confusions (r=0.93) and of phoneme accuracy (r=0.97) were similar to those of humans.
- To study the role of contextual cues in human speech perception, we manipulated the model’s access to surrounding context. We found that models with access to both future and past context aligned more with human phonemic judgments than those using past or future alone. This result provides evidence that humans integrate across a local time window extending into the future to disambiguate speech sounds.
- Overall, the results suggest that aspects of human-like speech perception emerge by optimizing for sub-lexical recognition from cochlear representations. Our work is a first step towards building biologically-plausible models that explain human speech encoding. (Back to top>>)
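One plausible way to compare human and model phoneme confusions, sketched below, is to correlate the off-diagonal entries of the two confusion matrices (the confusions) and, separately, the diagonal entries (per-phoneme accuracy). Whether the reported r values were computed exactly this way is an assumption, and the 3×3 matrices are made-up toy data, far smaller than a real phoneme inventory.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def off_diagonal(matrix):
    """Confusion entries: true phoneme i reported as a different phoneme j."""
    return [matrix[i][j] for i in range(len(matrix))
            for j in range(len(matrix)) if i != j]

def diagonal(matrix):
    """Accuracy entries: true phoneme i reported as i."""
    return [matrix[i][i] for i in range(len(matrix))]

# Toy row-normalized confusion matrices (rows: true phoneme, columns: response).
human = [[0.80, 0.15, 0.05],
         [0.10, 0.85, 0.05],
         [0.05, 0.10, 0.85]]
model = [[0.90, 0.05, 0.05],
         [0.20, 0.70, 0.10],
         [0.05, 0.05, 0.90]]

confusion_r = pearson_r(off_diagonal(human), off_diagonal(model))
accuracy_r = pearson_r(diagonal(human), diagonal(model))
```

Separating the diagonal from the off-diagonal entries keeps high overall accuracy from dominating the confusion comparison.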
How Memory Enables Rapid Learning in Animal Navigation: A Meta-Learning Perspective, Ching Fang
- Animals can quickly learn and adapt to new environments, a capability not well captured by standard reinforcement learning models. This rapid adaptation likely relies on episodic memory—forming and retrieving experiences in novel contexts to guide decisions. What computations support fast learning and what role does memory play? We investigate this question by meta-learning a transformer model for in-context reinforcement learning across branching tree mazes where mice demonstrate efficient navigation. Transformers, with their attention-based architecture mimicking key-value memory systems, provide a framework for studying the interplay of memory and learning. Our model learns quickly in new mazes, outperforming standard RL methods. Analysis reveals two key representation learning strategies: in-context learning, building reward-independent environmental maps, and cross-context learning, aligning representations across different mazes. These strategies parallel computations in the hippocampal-entorhinal system, regions in the brain crucial for episodic memory. Using integrated gradients, we show the model accesses “memories” from its current state and areas near goals when making decisions. These memories contain both state information and cached computations about environment structure. We hypothesize that the model employs a hybrid decision-making approach—using approximate location estimates when far from rewards and precise planning when closer. Our work demonstrates that rapid learning in animals may be best understood through meta-learning. We provide predictions about neural representations and memory activations expected in maze navigation. Finally, our findings propose that memory systems in the brain may store not just raw experience, but also cached computations to enable rapid learning in novel environments. (Back to top>>)
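Integrated gradients, the attribution method mentioned above, can be sketched on a toy function. The method averages the gradient along a straight path from a baseline input to the actual input and scales it by the input difference, so that attributions sum to the change in output. The function f, its hand-written gradient, and the inputs below are assumptions for illustration and have nothing to do with the transformer in the abstract.

```python
def f(x):
    """Toy differentiable 'model' with an interaction term and a linear term."""
    return x[0] * x[1] + 2.0 * x[2]

def grad_f(x):
    """Analytic gradient of f."""
    return [x[1], x[0], 2.0]

def integrated_gradients(x, baseline, steps=1000):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            avg_grad[i] += g / steps
    # Scale the path-averaged gradient by the input difference.
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(x, baseline)

# Completeness: attributions sum to f(x) - f(baseline).
total = sum(attributions)
```

Here the interaction term x[0] * x[1] is split evenly between the first two inputs, and the linear term is attributed entirely to the third.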
An optimal control principle for goal-directed behavior, Geoffrey Goodhill
- Animal behavior is shaped by numerous constraints, among which energy consumption is fundamental. Neural circuits must optimize movement strategies to balance energy expenditure with task demands, a challenge that is particularly acute in freely moving animals. Here, we reveal how energy constraints shape prey-hunting behaviors in larval zebrafish, a model system with well-defined sensorimotor control circuits. Using large-scale computational fluid dynamics simulations we quantified the energetic costs in Joules associated with experimentally-measured movement patterns and found that, while hunting prey, zebrafish match the usage frequency of different patterns to the energy they consume. This remained true at three different stages of development despite changing energy costs, suggesting a process of active adaptation. An artificial agent trained with reinforcement learning to catch prey using minimal energy reproduced this matching strategy, demonstrating that it represents a minimum-energy solution for prey capture. Together this work suggests a general principle for optimal motor control in the face of changing energy costs due to developmental or environmental changes. (Back to top>>)
Convolutional Neural Networks Can (Meta-)Learn the Same-Different Relation, Max Gupta
- While convolutional neural networks (CNNs) have come to match and exceed human performance in many settings, the tasks these models optimize for are largely constrained to the level of individual objects, such as classification and captioning. Humans remain vastly superior to CNNs in visual tasks involving relations, including the ability to identify two objects as ‘same’ or ‘different’. A number of studies have shown that while CNNs can be coaxed into learning the same-different relation in some settings, they tend to generalize poorly to other instances of this relation. In this work we show that the same CNN architectures that fail to generalize the same-different relation with conventional training are able to succeed when trained via meta-learning, which explicitly encourages abstraction and generalization across tasks. (Back to top>>)
Hierarchical Development, Learning, and Robustness in Deep Networks as a Model for the Primate Visual System, Dianna Hidalgo
- Understanding how hierarchical processing, maturation, and robustness emerge in the primate visual system is a central challenge in neuroscience. Directly studying these processes in biological neural circuits remains difficult due to current technological limitations. Here, we use deep convolutional neural networks (DCNNs) as biologically inspired models of the primate visual system, leveraging their accessible connectivity patterns, activations, and performance metrics to generate concrete, experimentally testable predictions about cortical function.
- We investigate three biological hypotheses using DCNNs: (1) hierarchical maturation of visual areas during development, (2) structured changes in network representations during learning new visual tasks (plasticity), and (3) robustness differences across cortical areas when facing synaptic or neuronal perturbations. Our findings reveal that hierarchical maturation emerges naturally in DCNNs, with earlier visual layers maturing faster than later ones—consistent with developmental patterns observed in primates. Furthermore, during transfer learning, higher layers adapt more readily to new tasks, suggesting a hierarchical structure in visual cortical plasticity. Finally, robustness tests demonstrate that higher-level representations in strictly feedforward architectures (e.g., AlexNet) are significantly more resilient to synaptic and node perturbations compared to lower layers, whereas architectures with skip connections (e.g., ResNet) show robustness dependent on circuit connectivity rather than strict hierarchical depth.
- These findings hold consistently across multiple datasets and training methods, including alternative biologically plausible training schemes such as forward direct feedback alignment (FDFA). By bridging biological theory and deep learning, our results propose testable predictions about visual cortical development, learning, and robustness, guiding future experimental approaches in NeuroAI research. (Back to top>>)
Biologically Realistic Computational Primitives of Neocortex Implemented on Neuromorphic Hardware Improve Vision Transformer Performance, Suraj Honnuraiah
- Understanding and replicating the computational principles of the brain using neuromorphic hardware and modern deep learning architectures is vital for advancing neuro-inspired AI (NeuroAI). In this study, we developed an experimentally-constrained biophysical network model focusing on neocortical circuit motifs in layers 2-3 of the primary visual cortex (V1). Specifically, we investigated how four major classes of cortical interneurons—Parvalbumin (feedforward inhibition), Somatostatin (feedback inhibition), VIP (disinhibition), and LAMP5 (gain normalization)—contribute to soft winner-take-all (sWTA) computations critical for gain modulation, signal restoration, and context-dependent multistability. Employing a novel parameter mapping technique, we successfully implemented sWTA computations on IBM’s TrueNorth (TN) neuromorphic chip, closely replicating biological neural dynamics. Remarkably, retrospective analysis revealed a strong correspondence between parameters of our biophysical model and those implemented on the TN hardware, affirming the functional roles of these inhibitory neuron classes. Furthermore, sparse coupling within this sWTA motif enabled the simulation of a two-state neural state machine on the TN chip, replicating essential working memory dynamics for cognitive functions. Additionally, integrating this sWTA computational motif as a preprocessing layer significantly enhanced the performance of the Vision Transformer (ViT) architecture on the MNIST digit classification task, notably improving generalization to unseen data—suggesting potential mechanisms analogous to zero-shot learning. Our work establishes a robust framework for translating biologically-inspired computations to neuromorphic platforms, offering valuable insights for applications on advanced hardware such as Intel’s Loihi2 and IBM’s Northpole, thus paving a clear path forward for NeuroAI. (Back to top>>)
AgentDSA: Dynamical Similarity Analysis for Neural Controllers, Ann Huang
- Quantitative comparison of different reinforcement learning (RL) agents is a crucial step towards scientific understanding and algorithmic improvement. In complex, partially observed environments, agents can be modeled as dynamical systems (e.g. recurrent neural networks), which motivates the development of comparison metrics based on dynamical systems theory. Recently, Dynamical Similarity Analysis (DSA) was introduced, which compares the dynamics of autonomous dynamical systems or those driven by relatively simple inputs. However, many systems of interest, especially in RL and neuroscience, are non-autonomous. They are driven by complex external inputs and receive observations that are contingent on the outputs produced by the system itself. Here, we introduce AgentDSA, a novel extension of DSA, designed for comparing the dynamics of input-driven systems. While DSA applies the Dynamic Mode Decomposition (DMD) to efficiently estimate intrinsic dynamics, AgentDSA leverages DMD with Control (DMDc) to distinguish the contributions of intrinsic and input-driven dynamics. We additionally extend the original DSA metric to account for comparing non-autonomous systems. AgentDSA thus provides a robust framework for comparing RL agents by treating them as non-autonomous dynamical systems.
- We demonstrate AgentDSA on the learning dynamics of recurrent neural networks trained using RL to solve the Inverted Pendulum problem. AgentDSA uncovers a phase transition in the controller that aligns with a sharp increase in reward. Notably, this transition is not apparent in standard methods of analysis such as fixed point analysis and is less evident via the original DSA. AgentDSA additionally reveals that the transition is driven by changes in the input-driven dynamics, although both intrinsic and input-driven dynamics transition concomitantly. In future work, we will explore AgentDSA’s application in more complex environments, including neuroscience tasks and multiagent settings. (Back to top>>)
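As a rough illustration of the DMD versus DMDc distinction drawn above, the sketch below fits a scalar input-driven system x[t+1] = a*x[t] + b*u[t] by least squares. It is a toy under stated assumptions, not the AgentDSA method: real DMDc operates on state and input matrices (typically with a truncated SVD), and the DSA similarity metric is not shown. The function names `simulate`, `fit_scalar_dmd`, and `fit_scalar_dmdc` are hypothetical.

```python
def simulate(a, b, u, x0):
    """Roll out x[t+1] = a*x[t] + b*u[t] from x0; returns len(u) + 1 states."""
    x = [x0]
    for ut in u:
        x.append(a * x[-1] + b * ut)
    return x


def fit_scalar_dmd(x):
    """DMD-style fit that ignores inputs: least squares for x[t+1] ~ a*x[t]."""
    xs, ys = x[:-1], x[1:]
    return sum(p * q for p, q in zip(xs, ys)) / sum(p * p for p in xs)


def fit_scalar_dmdc(x, u):
    """DMDc-style fit: least squares for x[t+1] ~ a*x[t] + b*u[t]; returns (a, b)."""
    xs, us, ys = x[:-1], u[:len(x) - 1], x[1:]
    sxx = sum(p * p for p in xs)
    suu = sum(p * p for p in us)
    sxu = sum(p * q for p, q in zip(xs, us))
    sxy = sum(p * q for p, q in zip(xs, ys))
    suy = sum(p * q for p, q in zip(us, ys))
    # Solve the 2x2 normal equations by Cramer's rule.
    det = sxx * suu - sxu * sxu
    if abs(det) < 1e-12:
        raise ValueError("state and input sequences are collinear")
    a = (sxy * suu - sxu * suy) / det
    b = (sxx * suy - sxu * sxy) / det
    return a, b


# Illustrative data: intrinsic coefficient 0.8, input coefficient 0.5.
inputs = [1.0, -1.0, 0.5, 2.0, -0.3, 0.0, 1.5, -2.0]
states = simulate(0.8, 0.5, inputs, x0=1.0)
a_hat, b_hat = fit_scalar_dmdc(states, inputs)
a_no_input = fit_scalar_dmd(states)  # absorbs input effects into the "intrinsic" term
```

Fitting with the input term separates the intrinsic coefficient from the input-driven one, whereas the input-free fit folds both into a single estimate.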
Sex differences in the brain’s structural and functional connectivity across the lifespan, Ke Huang
- Understanding sex differences in functional and structural brain connectivity across different age ranges is essential for unraveling the complex interplay between biological and developmental factors that shape the human brain. Research has shown that males and females exhibit distinct patterns of brain connectivity, but these differences are not static; they evolve dynamically over the lifespan, influenced by key developmental stages such as puberty, reproductive maturity, and aging. Examining these patterns helps identify critical periods when sex differences in connectivity are most pronounced and provides insights into their role in development and aging. Moreover, determining which brain regions or networks contribute most to these differences has broader implications for neuroscience and brain health in both men and women.
- To investigate these patterns, we employ logistic regression with an ensemble technique and Krakencoder, a multi-modality connectome fusion and translation tool, to predict sex based on brain connectivity features. Our results reveal that these features are most distinct during young adulthood. To further explore network contributions, we conduct sensitivity analyses by systematically ablating all regional connections within a network—replacing them with the mean—and performing sex classification on the modified data. This approach allows us to identify the brain networks most critical for sex prediction. Our findings indicate that higher-order networks play a particularly significant role in distinguishing sex, especially during developmental and aging transitions. (Back to top>>)
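The ablation step described above can be sketched as follows. This is a minimal illustration only: it assumes each subject is a flat vector of connectivity features and that "the mean" is the across-subject mean of each ablated feature, which the abstract does not specify. The function name `ablate_network` and the data layout are hypothetical, and the classifier itself is omitted.

```python
def ablate_network(subjects, network_edges):
    """Return a copy of the data with the given edge columns set to their mean.

    subjects: list of equal-length feature lists, one connectivity vector per subject.
    network_edges: indices of the edges that belong to the network being ablated.
    Assumption: the replacement value is the across-subject mean of each edge.
    """
    n = len(subjects)
    means = {j: sum(row[j] for row in subjects) / n for j in network_edges}
    # Ablated columns carry no subject-specific information; others are untouched.
    return [[means.get(j, value) for j, value in enumerate(row)] for row in subjects]


# Toy data: two subjects, three edges; edges 0 and 2 belong to the ablated network.
toy = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
ablated = ablate_network(toy, [0, 2])
```

Running a fixed sex classifier on the ablated data and comparing its accuracy with the unablated baseline then gives one sensitivity score per network.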
Can Your Neurons Hear the Shape of an Object?, Mozes Jacobs
- Traveling waves of neural activity are widely observed in the brain, but their precise computational function remains unclear. One prominent hypothesis is that they enable the transfer and integration of spatial information across neural populations. However, few computational models have explored how traveling waves might be harnessed to perform such integrative processing. Drawing inspiration from the famous “Can one hear the shape of a drum?” problem — which highlights how normal modes of wave dynamics encode geometric information — we investigate whether similar principles can be leveraged in artificial neural networks. Specifically, we introduce convolutional recurrent neural networks that learn to produce traveling waves in their hidden states in response to visual stimuli, enabling spatial integration. By then treating these wave-like activation sequences as visual representations themselves, we obtain a powerful representational space that outperforms local feed-forward networks on tasks requiring global spatial context. In particular, we observe that traveling waves effectively expand the receptive field of locally connected neurons, supporting long-range encoding and communication of information. We demonstrate that models equipped with this mechanism solve visual semantic segmentation tasks demanding global integration, significantly outperforming local feed-forward models and rivaling non-local U-Net models with fewer parameters. As a first step toward traveling-wave-based communication and visual representation in artificial networks, our findings suggest wave-dynamics may provide efficiency and training stability benefits, while simultaneously offering a new framework for connecting models to biological recordings of neural activity. (Back to top>>)
Active Electrosensing in Artificial Fish Collectives, Sonja Johnson-Yu
- Weakly electric fish, like Gnathonemus petersii, use a remarkable electrical modality for active sensing and communication, but studying their rich collective behavior and associated neural activity in naturalistic settings remains challenging. Here, we present a novel computational framework to study these behaviors using a biologically-inspired model of fish collectives, where recurrent neural network (RNN) based artificial agents trained via multi-agent reinforcement learning (MARL) modulate their electric organ discharges (EODs) and movement patterns to collectively forage in virtual environments. We show that our virtual agents recapitulate the behavior of real weakly electric fish on a “homing” task by learning to orient themselves with respect to the electric field lines. Trained agents also demonstrate several emergent social behaviors consistent with real fish collectives, including shifts in EOD interval distributions across competitive and non-competitive environments, and social interaction patterns like ‘freeloading,’ where agents reduce their EOD rates while benefiting from neighboring agents’ active sensing. Notably, these behaviors emerge through rewards for individual fitness and emergent inter-agent interactions, rather than through rewarding agents explicitly for social interactions. Analyzing the neural activity from the RNNs in the trained agents reveals high dimensional dynamics, with key sensory inputs–including proximity to other agents and food–most clearly represented. These analyses reveal the neural basis of dynamic interactions between multiple interacting fish-like agents and with the environment, and do so in an interpretable manner. This work has broad implications for the neuroethology of weakly electric fish, as well as other social animals in which extensive neural recordings from multiple individuals, and thus traditional data-driven modeling, are intractable. (Back to top>>)
No longer available
An analytic theory of creativity in convolutional diffusion models, Mason Kamb
- We obtain an analytic, interpretable and predictive theory of creativity in convolutional diffusion models. Indeed, score-based diffusion models can generate highly creative images that lie far from their training data. But optimal score-matching theory suggests that these models should only be able to produce memorized training examples. To reconcile this theory-experiment gap, we identify two simple inductive biases, locality and equivariance, that: (1) induce a form of combinatorial creativity by preventing optimal score-matching; (2) result in a fully analytic, completely mechanistically interpretable, equivariant local score (ELS) machine that, (3) without any training can quantitatively predict the outputs of trained full convolutional diffusion models (like ResNets and UNets) with high accuracy (median r² of 0.90, 0.91, 0.94 on CIFAR10, FashionMNIST, and MNIST). Our ELS machine reveals a locally consistent patch mosaic model of creativity, in which diffusion models create exponentially many novel images by mixing and matching different local training set patches in different image locations. Our theory partially predicts the outputs of pre-trained self-attention enabled UNets (median r² ∼ 0.75 on CIFAR10), revealing an intriguing role for attention in carving out semantic coherence from local patch mosaics. Our theory also reveals for the first time the mechanism behind common spatial inconsistencies in diffusion-generated images, theoretically predicting common defects such as incorrect numbers of limbs and digits as a consequence of excessive spatial locality in generation. (Back to top>>)
Ken Utilization Layer for Enhanced Knowledge Tracing and Adaptive Learning, Grey Kuling
- We develop a novel approach to knowledge tracing—tracking a learner’s evolving mastery of knowledge concepts—using neural simulations inspired by cognitive neuroscience. Our method, the Ken Utilization Layer for Graph Knowledge Tracing (KUL-GKT), introduces an online updating mechanism that dynamically refines student knowledge states as they respond to assessment items. Unlike traditional Graph Knowledge Tracing (GKT), which requires training on large datasets (500K responses) before deployment, KUL-GKT updates in real-time, which substantially reduces data requirements and better reflects real-world learning.
- KUL-GKT, inspired by the Go-CLS algorithm, uses Hebbian-inspired associative memory retrieval, consolidation, and forgetting within a Hopfield network structure, capturing student understanding as it evolves. We evaluated its predictive performance against a GKT model trained offline on the full dataset. Despite using only the last 10 responses per student, KUL-GKT outperformed the fully trained baseline for mean reciprocal rank (MRR) and normalized discounted cumulative gain (nDCG).
- These results show that KUL-GKT can generalize from minimal data and dynamically adapt to new students, making it well-suited for educational applications. This work highlights the potential of cognitively inspired online learning frameworks to replace static, large-scale training datasets with incremental, adaptive models that better mirror student learning. (Back to top>>)
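For readers unfamiliar with the two ranking metrics named above, the sketch below gives their standard textbook definitions. How the rankings and relevance labels were constructed in this study is not stated in the abstract, so the inputs here are purely illustrative and the function names are hypothetical.

```python
import math


def reciprocal_rank(ranked_items, relevant):
    """1/rank of the first relevant item in the ranking, or 0.0 if none appears."""
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings, relevant_sets):
    """Average reciprocal rank over a batch of rankings."""
    scores = [reciprocal_rank(r, s) for r, s in zip(rankings, relevant_sets)]
    return sum(scores) / len(scores)


def dcg(gains):
    """Discounted cumulative gain: position i (0-based) is discounted by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering of the same gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

Both metrics reward placing relevant items near the top of a ranking; MRR looks only at the first relevant item, while nDCG credits every relevant item with a position-dependent discount.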
Test-time scaling meets associative memory: Challenges in subquadratic models, Mohit Kulkarni
- The emerging paradigm of scaling test-time compute, enhancing model performance by scaling up chain-of-thought reasoning, is gaining significant traction in the deep learning community. While effective, these methods incur substantial computational costs at inference time due to the quadratic memory complexity of Transformers with respect to sequence length. Recently, subquadratic architectures such as Mamba have emerged that approach the performance of Transformers on language tasks while showcasing significant improvements in computational efficiency on long sequences. In this paper, we present the first empirical investigation into test-time compute scaling for subquadratic architectures. Our findings reveal that while these models do benefit from increased test-time compute, their gains are consistently lower than those observed in Transformers. We find that this limitation is correlated with their reduced capabilities for in-context associative memory, which hinder reasoning over extended sequences. These results shed light on the trade-offs between computational efficiency and reasoning capabilities in modern architectures, providing a foundation for future research on designing models for both test-time compute scalability and long-chain reasoning. (Back to top>>)
Adaptive kernel predictors from feature-learning infinite limits of neural networks, Clarissa Lauditi
- Previous influential work showed that infinite width limits of neural networks in the lazy training regime are described by kernel machines. Here, we show that neural networks trained in the rich, feature learning infinite-width regime in two different settings are also described by kernel machines, but with data-dependent kernels. For both cases, we provide explicit expressions for the kernel predictors and prescriptions to numerically calculate them. To derive the first predictor, we study the large-width limit of feature-learning Bayesian networks, showing how feature learning leads to task-relevant adaptation of layer kernels and preactivation densities. The saddle point equations governing this limit result in a min-max optimization problem that defines the kernel predictor. To derive the second predictor, we study gradient flow training of randomly initialized networks trained with weight decay in the infinite-width limit using dynamical mean field theory (DMFT). The fixed point equations of the arising DMFT define the task-adapted internal representations and the kernel predictor. We compare our kernel predictors to kernels derived from the lazy regime and demonstrate that our adaptive kernels achieve lower test loss on benchmark datasets. (Back to top>>)
Train-Test Task Alignment in a Solvable model of In-Context Learning, Mary Letey
- Modern sequence models and large language models exhibit in-context learning (ICL), the ability to perform inference on new tasks using only input-provided information, without explicit prior training on those tasks. Recent theoretical work has analyzed ICL in linear regression settings using linear attention architectures, providing sharp asymptotic characterizations and insights into implicit learning algorithms for isotropic data. We extend this analysis to structured data, focusing on the alignment between training and test task structures. We expect this will provide insight for designing pretraining data to optimize ICL performance on structured regression tasks at test time. (Back to top>>)
Training mechanistic models of neural computations, Jakob Macke
- Understanding how neural circuits perform behaviorally relevant computations requires models that integrate experimental observations with biophysically grounded mechanisms. Large-scale neural recordings and connectomic reconstructions provide a wealth of data, but building interpretable models that align with experimental data remains a major challenge.
- We have developed machine learning methods and differentiable simulators that make it possible to algorithmically identify models that link biophysical mechanisms, neural data, and behaviour. In particular, we have developed a simulation toolbox, Jaxley, which implements numerical solvers required for biophysical simulations in Jax. This makes it possible to efficiently simulate biophysical models and train thousands of parameters with gradient descent. We applied Jaxley to a range of datasets and models, e.g. building data-constrained biophysical models of retinal ganglion cells and training large-scale networks of biophysically detailed neurons on challenging tasks. I will also show how differentiable simulators, in combination with modern connectomic measurements, make it possible to build large-scale mechanistic models of the fruit fly visual system, and how such a model can make experimentally testable predictions for each neuron in the system.
- By leveraging differentiable simulators and inference techniques, we provide a scalable approach for integrating neural data with computational models, bringing us closer to understanding the neural mechanisms underlying behavior. (Back to top>>)
Reverse Engineering Neural Circuits with Artificial Networks: Insights from Loss Landscape Geometry, Flavio Martinelli
- Understanding the mechanisms of brain circuits with artificial networks is a central goal of NeuroAI. One common approach is to fit the output of each model neuron to the activity of a corresponding neuron recorded from the brain. However, how can we ensure that the solution(s) found by the neural network match the brain’s mechanisms? Specifically, how can we ensure that the connections between model neurons are equivalent to their biological counterparts?
- To address this, we explore the space of weight configurations in feed-forward network models, where a “student” network is fitted to the input-output activations of a “teacher” network. Due to the non-convexity of the optimisation problem, a wide range of solutions emerge, many of which fail to replicate the teacher’s connectivity.
- We make two important contributions to the understanding of loss landscapes in deep learning:
- 1) Bad solutions of narrow networks generate manifolds of saddle points in the loss landscapes of wider networks. Parallel to these saddle manifolds, nearly-flat loss channels can guide gradient dynamics to poor minima at infinite parameter norm.
- 2) By increasing the hidden layer width of the student network beyond the teacher’s, the ratio of degenerate global minima to problematic saddle manifolds grows exponentially. This increases the likelihood of reaching a global minimum. We then design an algorithm to bypass the non-convexity of the problem, compress degenerate solutions and recover the teacher’s original connectivity.
- These insights are a step towards a more reliable alignment of artificial networks with brain circuits, improving NeuroAI’s interpretability. (Back to top>>)
Broadly-projecting mesolimbic dopamine neurons implement a distributional critic across the striatum, Sara Matias
- Animal behavior is controlled through the coordinated action of multiple learning systems in the brain. One of these systems, the basal ganglia, instantiates a reinforcement learning (RL) algorithm in which dopamine (DA) neurons transmit reward prediction error (RPE) signals—the difference between actual and expected rewards—to enable value learning via cortico-striatal plasticity. Recent studies have highlighted two novel aspects: first, that RPE signals from midbrain DA neurons can encode entire reward distributions through a distributional RL algorithm that mirrors cutting-edge machine learning approaches, and second, that dopamine axons projecting to different regions of the striatum exhibit functional heterogeneity, indicating that not all DA neurons encode RPE. To examine the functional and anatomical organization of RPE and non-RPE dopamine signals, we conducted multi-fiber photometry recordings of dopamine axonal activity across the entire striatum. We observed that while RPE signals are present throughout the striatum in a reward-based task, aversive signals are heterogeneous. For example, DA in the dorsomedial striatum is activated by airpuffs, while that in the dorsolateral striatum conveys a brief biphasic response. However, fiber photometry recordings cannot disentangle whether the recorded signals are generated from a uniform population of dopamine axons or if, on the contrary, functionally heterogeneous axons intermingle in any particular striatal area. To overcome this limitation, we performed projection-identified electrophysiological recordings from midbrain DA neurons, to investigate if all dopamine neurons, projecting to all striatal regions, encode the reward distribution. We found that pure RPE-encoding DA neurons project to the lateral nucleus accumbens shell (lAcbSh), and broadly across the striatum. 
- Moreover, lAcbSh- and broadly-projecting DA neurons show structured RPE heterogeneity consistent with distributional RL predictions for a quantile-like population code. Our findings suggest that dopamine-based RL is organized through a “distributional critic” architecture that is superimposed on other outcome-specific information, supporting continuous, reward-informed, behavioral control. (Back to top>>)
Learning the Language of Smell: Foundation Models for Protein-Odor Interactions, Brian DePasquale
- Foundation models are large neural networks pre-trained on unlabeled datasets to learn rich, generalizable data embeddings that enhance performance in downstream tasks. These models are particularly valuable in low-data regimes, where they mitigate overfitting and improve generalization. In olfaction, datasets that record how an odorant interacts with olfactory receptors (ORs) are sparse, typically testing on the order of hundreds of odorants compared to the billions [1] that exist. The discrepancy between the size of odor space and the amount of available data raises the question of whether foundation models, trained on millions of unlabeled data points, can improve prediction in this domain. We utilize various foundation models to predict interactions between odorants and ORs across 3 datasets from different species, leading to two main findings: (1) Molecular information alone is insufficient for accurately predicting olfactory receptor neuron (ORN) activity, suggesting that individual neural selectivity cannot be captured solely from molecular features given current data limitations. (2) Integrating protein embeddings from a protein foundation model [2] restores predictive performance, addressing the limitations of molecular information alone. These findings suggest that multimodal models, which integrate both chemical and protein data embeddings, offer a promising path forward for leveraging foundation models in olfactory processing. (Back to top>>)
Automated online spike sorting algorithm for high-channel neural recordings, Zeinab Mohammadi
- Sorting data from high-channel probes online presents significant challenges due to data volume. Most existing algorithms are designed for post-hoc sorting, prioritizing accuracy but falling short of the speed required for rapid neural analysis [1-4]. To address this, we introduce Graph nEtwork Multichannel sorting (GEMsort) [5], a novel spike sorting algorithm designed for online sorting across multiple channels. GEMsort introduces several innovative features, including leveraging graph networks’ flexibility to efficiently track neural data and avoid analyzing duplicated spikes from adjacent sites. Additionally, the spike’s originating channel is used as an additional feature to enhance sorting accuracy.
- We evaluated GEMsort on synthetic and experimental Neuropixels data [6], comparing its performance against Kilosort and Mountainsort. GEMsort demonstrated comparable accuracy while reducing sorting time, making it well-suited for online applications. This rapid sorting capability, where all calculations are performed in parallel and only representations of spiking groups (the algorithm’s nodes and edges) are stored in memory, is valuable for rapid data analysis, such as closed-loop experiments requiring immediate feedback. GEMsort operates without manual intervention, a critical feature for closed-loop control in Brain-Machine Interfaces (BMI). Its lightweight computational demands allow it to run efficiently on digital hardware, which is particularly advantageous for high-channel recordings, such as Neuropixels. In conclusion, GEMsort provides an efficient solution for rapid sorting, combining graph-based clustering, duplicate spike elimination, and channel-specific features. Its flexibility in capturing data structure also helps overcome challenges like electrode drift. These properties make it an ideal tool for neuroscience research, especially in data processing for BMI. (Back to top>>)
Three-Dimensional Micro-Instrumented Neural Network for Neuromorphic Computing, Kumar Mritunjay
- Three-dimensional (3D) cultured neural networks that emulate the structure and computational principles of the brain could offer valuable insights into brain-inspired computing, artificial intelligence, neural development, and disease progression. However, fully realizing the potential of 3D cultured neural networks requires a true 3D device—3D neural network interface, enabling stable long-term volumetric mapping and modulation of the neural network dynamics with single-neuron resolution. Existing technologies like patch-clamps, planar multielectrode arrays and rigid penetrating probes fail to achieve this. We introduce a 3D Micro-Instrumented Neural network (3D-MINe), where we folded a photolithographically patterned flexible microelectrode array into multiple layers and seamlessly integrated it with a 3D culture of rat embryonic hippocampal neurons that replicates key properties of the brain. 3D-MINe’s volumetric electrical mapping capability allows stable action potential recordings from multiple planes over six months, enabling quantitative tracing of the neural network’s evolving connectivity maps and real-time tracking of neuronal activity modulated by pharmacological stimulations, including Bicuculline and Tetrodotoxin. Furthermore, chronic electrical stimulation allows systematic training of the 3D neural network by tuning the connectivity strength between neurons, confirming long-term plasticity. This enabled us to construct a reservoir-based 3D neural network for classification of both spatial and temporal inputs with very high accuracy. 3D neural networks outperformed 2D neural networks by 17% for similar classifications and showed much faster learning. Our 3D-MINe establishes the much-needed 3D device—3D neural network interface for monitoring and modulation of neural networks, offering the potential to impact brain-inspired computing, brain-machine interfaces and neurological diseases research. (Back to top>>)
New News: Evaluating single-shot in-weight learning abilities of large language models, Core-Francisco Park
- The rapid advancement of foundation models in artificial intelligence has revolutionized neural data interpretation, yet their epistemic limits remain underexplored—particularly in the context of subcortical neural dynamics. While cortical processes have dominated NeuroAI research, subcortical structures, which are crucial for fundamental cognitive and motor functions, pose unique challenges due to their deep, nonlinear activity patterns. This study conducts a meta-analysis to evaluate the interpretative fidelity of foundation model latent spaces when applied to subcortical neuroimaging and electrophysiology data. By synthesizing findings from diverse studies, we identify systemic limitations in model-derived representations, questioning their reliability in capturing subcortical dynamics. Our analysis employs a systematic review of literature on foundation model applications in subcortical neural decoding, alongside a comparative evaluation of latent space visualizations and feature importance metrics across architectures. To quantify uncertainty, we introduce a novel metric—interpretative entropy—which assesses the degree of ambiguity in latent representations. Preliminary findings suggest that foundation models trained predominantly on cortical datasets struggle with generalizability in subcortical contexts, often misattributing high-dimensional neural patterns to cortical-like structures. These results underscore the necessity for model adaptation strategies that account for the neurobiological distinctiveness of subcortical regions. The study contributes to the ongoing discourse on the epistemic boundaries of AI-driven neuroscience, highlighting the need for theoretical and methodological refinements in NeuroAI. Ultimately, our findings call for a paradigm shift in foundation model design, advocating for approaches that integrate neurobiological priors to enhance interpretability and reliability in subcortical neural decoding. (Back to top>>)
Learning and Stability of Value Representation in Ventral Striatum, Farhad Pashakhanloo
- Studying the dynamics of neural representation and its stability over time is essential for understanding the nature of noise and computation in the brain. Enabled by recent advances that allow recording and tracking of large populations of neurons over long periods of time, it has been shown that neural activity in several areas of the brain can change drastically over weeks despite a fixed task performance. In this work, we use experimental data and theory to study learning and stability of value representation in the reinforcement learning (RL) setting. Specifically, in a Pavlovian association task we perform long-term calcium imaging in a region of Ventral Striatum called Olfactory Tubercle, which receives reinforcement signals from Ventral Tegmental Area and sensory information from olfactory areas. This is a particularly interesting area because it may represent value in the RL framework, which makes it a natural place to study how the dopamine signal may affect learning and stability. We find that after performance has plateaued, the self-similarity of representations decays gradually as a function of time between sessions, which is consistent with the drift phenomenon reported in other areas of the brain and outside the context of RL. We also find that the geometry of the drift is such that a downstream decoder has to alter itself to compensate for it. In a temporal difference (TD) learning model, we study the potential influence of the TD-error signal on the fluctuation of value, and study the implications for the plasticity and flexibility of a downstream decoder. Overall, our study sheds light on the nature of neural representation in the brain and RL. (Back to top>>)
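The TD-error signal mentioned in the abstract above can be illustrated with a minimal tabular TD(0) sketch. This is not the authors' model: the cue-delay-reward chain, the function name `td0_pavlovian`, and all parameter values are assumptions chosen only to show how the error term drives the value estimate.

```python
import numpy as np

def td0_pavlovian(n_trials=200, n_steps=5, alpha=0.1, gamma=1.0, reward=1.0):
    """Tabular TD(0) on a hypothetical cue -> delay -> reward chain."""
    V = np.zeros(n_steps + 1)              # V[n_steps] is the terminal state, kept at 0
    deltas = np.zeros((n_trials, n_steps))
    for trial in range(n_trials):
        for t in range(n_steps):
            # Reward arrives only on the last step of the trial.
            r = reward if t == n_steps - 1 else 0.0
            # TD error: reward plus discounted next value, minus current value.
            delta = r + gamma * V[t + 1] - V[t]
            V[t] += alpha * delta
            deltas[trial, t] = delta
    return V[:n_steps], deltas

values, td_errors = td0_pavlovian()
```

In this toy setting the value of every state approaches the reward magnitude and the TD errors shrink toward zero, so any later fluctuation in value would have to come from noise or from changes in the task.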
Free Energy Projective Simulation: active inference with interpretability, Josephine Pazem
- In active inference, an adaptive agent is modeled as building a representation of its environment. As the agent interacts with its surroundings, predictions about future sensations are emitted based on past experiences and its model is updated using Bayesian inference. In addition, the agent’s behavior is a function of its model, and aims at minimizing the expected surprise about future sensory states.
- We introduce an interpretable model for learning and agency: Free Energy Projective Simulation (FEPS). This model uses an active inference approach to learn a clone-based graph structure for modelling sensory sequences. Specifically, temporal associations between the nodes of this graph are reinforced based on how well they generate accurate predictions over extended periods of time. Behavior is a function of the model of the agent and its preferences. Nodes in the graph initially encode redundant information about observations, then gradually specialize to accurately represent hidden variables while remaining interpretable. The model can evaluate long-term preferences, enabling planning towards distant goals. Proof of principle is provided in two environments: a delayed-response task and a navigation task. As a next step, we plan to further investigate the biological plausibility of FEPS and make it a meaningful tool to conceptualize memory processes. (Back to top>>)
Psychological Imagination Networks in Humans and Large Language Models, Saurabh Ranjan
- Multi-modal associative learning requires an agent to be sufficiently integrated so that the experiences of its components become bound into memories belonging to the agent rather than its parts. Conversely, could associative conditioning increase integration in an agent undergoing learning? We analyzed gene regulatory networks, which learn to associate distinct stimuli using causal emergence, which captures the degree to which a system is an integrated whole that is more than the sum of its parts. Analyzing 29 biological (experimentally derived) networks before, during, and after training, we discovered that biological networks, significantly more than random controls, increase their causal emergence due to training. Clustering analysis uncovered five distinct ways GRNs’ emergence responds to training; these clusters do not map to traditional ways to characterize network structure and function but correlate to different biological categories. Our analysis reveals how learning can reify the existence of an agent emerging over its parts and suggests that this property is favored by evolution. Our data have implications for how diverse intelligence moves across the cognitive spectrum and for a biomedical roadmap to exploit these remarkable features in GRNs – a ubiquitous biological control system with numerous impacts on health and disease. (Back to top>>)
Massive activations in language reasoning models: What are they good for?, Shivam Raval
- We investigate a critical yet understudied phenomenon in Large Language Models (LLMs): massive activations, where a small subset of neurons (0.01%) produces activation values orders of magnitude larger than the median. Through systematic examination across LLaMA-2-7B, LLaMA-3.2-3B, and DeepSeek-R1-Distill-Llama-8B, we reveal that these activations predominantly occur at sequence boundaries and function as implicit bias terms.
- To characterize the information encoded in these neurons, we employ linear probes to predict structural features such as sentence boundaries and dialog role transitions from activation values, complemented by representational similarity analysis to quantify relationships between activation patterns and linguistic structures. Preliminary results suggest massive activations strongly correlate with syntactic features while showing limited association with semantic content.
- Building on these findings, our intervention experiments demonstrate the causal role of massive activations in model behavior—zeroing out these activations increases perplexity by 110-139×, with particularly severe disruption at sentence boundaries. Magnitude clipping experiments show function words and punctuation are more sensitive than content words, supporting our probing results that these activations encode structural rather than semantic information. Analysis of training dynamics in Pythia-1B shows these activations emerge in distinct phases, reinforcing their role as learned structural anchors rather than initialization artifacts.
- Our results establish massive activations as critical computational mechanisms that direct attention flow and maintain structural coherence in text generation. This work advances LLM interpretability by revealing how specific activation patterns encode the architectural scaffolding that guides model reasoning, offering new insights for developing more transparent and controllable language models. (Back to top>>)
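As a rough illustration of the detection and zero-ablation steps described above, the sketch below flags activations whose magnitude is far above the median and sets them to zero. The 100x threshold, the function names, and the synthetic activation matrix are assumptions for illustration; the abstract does not state the exact criterion or code used.

```python
import numpy as np

def find_massive(acts, ratio=100.0):
    """Boolean mask of entries whose magnitude exceeds `ratio` times the median magnitude."""
    mags = np.abs(acts)
    return mags > ratio * np.median(mags)

def zero_out(acts, mask):
    """Return a copy with the masked entries set to zero (zero-ablation)."""
    out = acts.copy()
    out[mask] = 0.0
    return out

rng = np.random.default_rng(0)
acts = rng.normal(size=(8, 1024))   # toy (tokens x hidden units) activations
acts[0, 7] = 2500.0                 # plant two outliers by hand
acts[0, 300] = -1800.0

mask = find_massive(acts)
ablated = zero_out(acts, mask)
```

In a real model the ablated tensor would be written back into the forward pass (for example through a hook) before measuring perplexity; that part depends on the framework and is omitted here.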
Dendritic Processing in Artificial Neural Networks based on Shunting Inhibition, Maceo Richards
- Dendritic computation plays a crucial role in neural information processing by facilitating complex and nonlinear integration of signals before they reach the soma. However, the exact nature of this processing and how information is integrated in dendrites are not well understood. This study introduces a novel artificial neural network architecture that emulates dendrites to better understand how they process information. The proposed architecture includes branching dendritic morphologies, non-linear synaptic interactions, shunting inhibition, and biologically plausible synaptic weight decay. Dendrites are built using an expandable and compartmentalized representation that allows the user to systematically explore the effects of changing dendritic morphologies and synaptic properties. This model of dendrite function and signaling was built to examine the contribution of biological features of real synapses and dendrites to neuronal computation while remaining compatible with standard artificial neural network training methods, such as gradient descent and backpropagation. Thus, it aims to examine computations under biological constraints without mimicking biological plasticity processes. A significant challenge encountered was the issue of vanishing gradients, which impede learning in distal dendritic branches. Extensive gradient manipulation is required to overcome this issue. This motivates the development of a brain-inspired and biologically realistic learning rule that is compatible with dendritic architectures. Findings from this study offer insights into the computational function of dendrites in biological neurons and may provide a more plausible learning rule for artificial neural networks that emulate dendritic computation. Furthermore, it provides a general platform for examining dendritic computation and plasticity using modern machine learning frameworks. (Back to top>>)
The world model emerges in cortex as the pattern of intersections of sparse distributed codes of individual inputs, Rod Rinkus
- Machine learning (ML) is modeled primarily as optimization. Backpropagation (BP) underlies not only supervised but also unsupervised learning in all mainstream models. Despite advances, e.g., stochastic gradient descent, binarizing variables, BP still requires (to first approximation) computing loss derivatives w.r.t. every synaptic weight for every input repeatedly through an offline (thus non-biological) training phase. This is the source of mainstream ML’s massive power needs. But what if unsupervised learning in the brain’s neocortex is not optimization? First, consider that the goal of unsupervised learning is to produce an internal world model, i.e., a model of all important statistics of the input space, i.e., estimates of the frequencies of occurrence of all features, all pairs of features, all triples, etc. By “important”, we mean statistics reflecting actual spatial/temporal structure of natural entities, not coincidences/noise. Is there a way to learn a world model more efficiently than by optimization? Yes. The first key is to represent input items as sparse distributed codes (SDCs), small subsets of binary units chosen from a much larger coding field, e.g., the population of pyramidals comprising the L2/3 compartment of a macrocolumn. The second key is an efficient method for preserving similarity, specifically for mapping more similar inputs to more highly intersecting SDCs. Thus, the similarity (statistical) structure of all orders present in the input space becomes embedded in the intersection structure over the SDCs. I’ll describe this system, providing results demonstrating its extremely efficient single-trial learning, immediate best-match retrieval, and further biologically relevant properties. (Back to top>>)
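The representation described above, in which similarity is carried by the size of the intersection between sparse codes, can be sketched as follows. This only illustrates the data structure: the abstract does not specify how codes are assigned to inputs, so the `perturb` helper, the field size of 1000 units, and the code size of 20 are hypothetical.

```python
import random

def random_sdc(n_units, k, rng):
    """Pick k active units out of a coding field of n_units binary units."""
    return frozenset(rng.sample(range(n_units), k))

def perturb(code, n_units, n_swap, rng):
    """Replace n_swap active units with units that were previously inactive."""
    dropped = rng.sample(sorted(code), n_swap)
    inactive = [u for u in range(n_units) if u not in code]
    added = rng.sample(inactive, n_swap)
    return (code - set(dropped)) | set(added)

def overlap(a, b):
    """Intersection size, used here as the similarity between two codes."""
    return len(a & b)

rng = random.Random(0)
base = random_sdc(1000, 20, rng)
near = perturb(base, 1000, 2, rng)    # stands in for a code assigned to a similar input
far = perturb(base, 1000, 15, rng)    # stands in for a code assigned to a dissimilar input
```

Here `overlap(base, near)` is 18 and `overlap(base, far)` is 5, so graded similarity can be read off from set intersections without any further computation.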
Sparse Hebbian Learning for High-Capacity Pattern Discrimination in Cerebellar Ensembles, Benjamin Ruben
- In the cerebellar cortex, mossy fiber inputs project to the granule cell layer, where they undergo massive expansion. The parallel fiber axons of these granule cells are then read out by Purkinje cells, which are considered the main locus of synaptic plasticity in the circuit. Purkinje cells subsequently converge on neurons in the deep cerebellar nuclei, providing the sole output of the cerebellum. Experimental observations suggest an important role for coordination among Purkinje cells during cerebellar learning and function. However, a theoretical framework to explain this coordination is lacking. Here, we extend simple models of Hebbian learning from “sparse expansions” of sensory inputs to the case of ensembles of Purkinje cell decoders. We show that an ensemble learning strategy can significantly improve the capacity of the cerebellar circuit in a simple pattern-separation task when learning signals are delivered to sparse subsets of Purkinje cells at each step and the Purkinje cells have nonlinear activation functions. Our analytical results show that overlaps in the sets of parallel fibers sampled by different Purkinje cells determine the correlations of noise in their output, limiting ensemble accuracy. Under noisy inputs, our proposed sparse Hebbian learning algorithm outperforms backpropagation with a cross-entropy loss function, suggesting that sparse Hebbian learning may provide an optimal strategy for high-capacity pattern recognition in biologically relevant contexts. (Back to top>>)
Using vine copulas to model and decompose higher-order and time-varying interactions, Houman Safaai
- Vine copulas offer a powerful framework to dissect high-dimensional interactions by factorizing multivariate dependencies into cascades of bivariate dependencies arranged across multiple trees into a graphical structure. We leveraged these vine-copula structures and developed methods to decompose entropies of multivariate densities into a sum of individual interaction orders, and we modeled their time-dependence using flow-based transformations. We showed that these models can not only fit high-dimensional interacting dynamics and generate samples from such systems, but also quantify and isolate the contribution of higher-order interactions in terms of entropy or mutual information and in terms of how they contribute to the dynamics of the system. This decomposition of the contribution of pairwise versus higher-order interactions can be used to understand the role of such interactions in the dynamics, and can shed light on how they contribute to information processing and network dynamics in systems such as highly interacting neural populations, including the way these populations process and represent external inputs and behavior. By revealing how higher-order interaction effects unfold over time and how they influence the dynamics and information content represented by a population of neural activities, vine copulas and flow-based transformations open the door to a richer understanding of collective dynamics in neural systems, including behaviors and effects that cannot be explained by pairwise interactions alone. (Back to top>>)
Model alignment using inter-modal bridges, Noor Sajid
- Foundation models have demonstrated remarkable performance across modalities such as language and vision. However, inter-modal model reuse remains limited due to the difficulty of aligning internal representations. Existing methods require extensive paired training data or are constrained to specific domains. We introduce a semi-supervised approach to align model spaces using conditional flow matching. The conditional flow between latent spaces of different modalities (e.g., text-to-image or biological-to-artificial neuronal activity) can be learned in two settings: (1) solving a (balanced or unbalanced) optimal transport problem (Peyré & Cuturi, 2019) using our inter-space bridge cost, and (2) performing memory-efficient alignment using labelled exemplars. We evaluate our approach on object recognition and image generation tasks across MNIST, ImageNet, and neural activity (Majaj et al., 2015) datasets. Despite being constrained by the original models’ capacity, our method matches the downstream performance of end-to-end trained models, particularly when labelled training data is scarce (<20%). For neural activity data, we show that using samples from the computed global optimal coupling (setting 1) achieves competitive downstream performance while avoiding over-fitting compared to the direct use of labelled pairs (setting 2). Our approach provides a data-efficient solution for inter-modal model alignment with minimal supervision, offering a practical way to use pre-trained models across different domains without requiring large amounts of paired data. However, the effectiveness of our method is limited by the quality of pre-trained feature extractors and the availability of paired samples. Therefore, future work should focus on developing more disentangled representations to improve model reusability across modalities. (Back to top>>)
A Computational Model of Learning and Memory Using Structurally Dynamic Cellular Automata, Jeet Singh
- In the fields of computation and neuroscience, much is still unknown about the underlying computations that enable key cognitive functions including learning, memory, abstraction and behavior. This paper proposes a mathematical and computational model of learning and memory based on a small set of bio-plausible functions that include coincidence detection, signal modulation, and reward/penalty mechanisms. Our theoretical approach proposes that these basic functions are sufficient to establish and modulate an information space over which computation can be carried out, generating signal gradients usable for inference and behavior. The computational method used to test this is a structurally dynamic cellular automaton with continuous-valued cell states and a series of recursive steps propagating over an undirected graph with the memory function embedded entirely in the creation and modulation of graph edges. The experimental results show: that the toy model can make near-optimal choices to re-discover a reward state after a single training run; that it can avoid complex penalty configurations; that signal modulation and network plasticity can generate exploratory behaviors in sparse reward environments; that the model generates context-dependent memory representations; and that it exhibits high computational efficiency because of its minimal, single-pass training requirements combined with flexible and contextual memory representation. (Back to top>>)
Impact of eye movements and orofacial movements on mouse visual cortex, Atika Syeda
- Previous studies have found that activity in the mouse primary visual cortex (V1) is correlated with various orofacial movements. However, recent work in primates suggests that monkey V1 is primarily modulated by eye movements, not orofacial movements (Talluri et al., 2023). In mice, it remains unclear how much eye movements contribute to the modulation of neural responses in the presence or absence of visual input compared to other orofacial movements. To determine the contribution of eye movements to mouse V1 activity, we recorded the activity of thousands of V1 neurons using two-photon calcium imaging, either while presenting a visual stimulus or in darkness, while monitoring eye movements and orofacial behaviors with a camera. We utilized Facemap, our AI-based framework, to precisely track orofacial movements and predict neural activity from mouse behavioral videos. Facemap uses a convolutional neural network to track keypoints on the mouse face and extract precise movement information. In addition, Facemap uses a deep neural network model with temporal convolutions to capture movements at various timescales for neural activity prediction, outperforming previous linear models. With this model, we found that eye movements influence a small fraction of activity in the mouse visual cortex compared to other orofacial behaviors. In the presence of visual input, eye position accounts for ~10% of variance explained after accounting for retinal input, whereas orofacial behaviors account for 40-50%. These results suggest that orofacial movement signals play a larger role in visual cortical processing in mice compared to primates. (Back to top>>)
Features are Fate: A Theory of Transfer Learning in High Dimensional Regression, Javan Tahir
- With the emergence of large-scale pre-trained neural networks, methods to adapt such “foundation” models to data-limited downstream tasks have become a necessity. Fine-tuning, preference optimization, and transfer learning have all been successfully employed for these purposes when the target task closely resembles the source task, but a precise theoretical understanding of “task similarity” is still lacking. While conventional wisdom suggests that simple measures of similarity between source and target distributions, such as ϕ-divergences or integral probability metrics, can directly predict the success of transfer, we prove the surprising fact that, in general, this is not the case. We adopt a feature-centric viewpoint on transfer learning and establish a number of theoretical results that demonstrate that when the target task is well represented by the feature space of the pre-trained model, transfer learning outperforms training from scratch. We study deep linear networks as a minimal model of transfer learning in which we can analytically characterize the transferability phase diagram as a function of the target dataset size and the feature space overlap. For this model, we establish rigorously that when the feature space overlap between the source and target tasks is sufficiently strong, both linear transfer and fine-tuning improve performance, especially in the low data limit.
- These results build on an emerging understanding of feature learning dynamics in deep linear networks, and we demonstrate numerically that the rigorous results we derive for the linear case also apply to nonlinear networks. (Back to top>>)
Learning richness modulates equality reasoning in neural networks, William Tong
- Equality reasoning is ubiquitous and purely abstract: sameness or difference may be evaluated no matter the nature of the underlying objects. As a result, same-different tasks (SD) have been extensively studied as a starting point for understanding abstract reasoning in humans and across animal species. With the rise of neural networks (NN) that exhibit striking apparent proficiency for abstractions, equality reasoning in NNs has also gained interest. Yet despite extensive study, conclusions about equality reasoning vary widely with little consensus. To clarify the underlying principles in learning SD, we develop a theory of equality reasoning in multi-layer perceptrons (MLP). Following observations in comparative psychology, we propose a spectrum of behavior on equality reasoning that ranges from conceptual to perceptual outcomes. Conceptual behavior is characterized by symbolic representations, efficient learning, and insensitivity to perceptual details. Perceptual behavior is characterized by statistical, associative learning that requires exhaustive training and remains strongly influenced by perceptual details. We develop a mathematical theory to show that an MLP’s position along this spectrum is driven by learning richness. Rich regime MLPs exhibit conceptual behavior, whereas lazy regime MLPs exhibit perceptual behavior. We validate our theoretical findings in vision SD tasks where neural networks have been documented to struggle, instead showing that rich feature learning promotes success by encouraging hallmarks of conceptual behavior. Overall, our work identifies feature learning richness as a key parameter modulating equality reasoning, and suggests that equality reasoning in humans and animals may similarly depend on learning richness in neural circuits. (Back to top>>)
Global Neuron Shape Reasoning with Point Affinity Transformers, Jakob Troidl
- Connectomics is a subfield of neuroscience that aims to map the brain’s intricate wiring diagram. Accurate neuron segmentation from microscopy volumes is essential for automating connectome reconstruction. However, current state-of-the-art algorithms use image-based convolutional neural networks that are limited to local neuron shape context. Thus, we introduce a new framework that reasons over global neuron shape with a novel point affinity transformer. Our framework embeds a (multi-)neuron point cloud into a fixed-length feature set from which we can decode the affinity of any point pair, enabling clustering of neuron point clouds for automatic proofreading. We also show that the learned feature set can easily be mapped to a contrastive embedding space that enables neuron type classification using a simple KNN classifier. Our approach excels in two demanding connectomics tasks: proofreading segmentation errors and classifying neuron types. Evaluated on three benchmark datasets derived from state-of-the-art connectomes, our method outperforms point transformers, graph neural networks, and unsupervised clustering baselines. (Back to top>>)
Situated Projective Simulation: From Memory Trace Encoding toward Episodic Memory, Alexander Vining
- Certain insightful behaviors in animals are often attributed to “episodic-like” memory, but it remains unclear whether – or when – the processes underlying these behaviors amount to the explicit memory retrieval associated with human episodic memory. Situated Projective Simulation is a theoretical model of episodic memory that integrates an “episodic-like” predictive network with a deliberative “Projective Simulation” (1) process. The episodic-like network updates the weights of feed-forward and recurrent connections as a function of inputs, such that certain nodes can be interpreted as “memory traces” that encode sequences of inputs. Network outputs are used to compute surprise and guide the actions of an agent toward predictable or expected futures. “Projective Simulation” is a random walk along associative edges between memory traces that models the explicit activation of memories. The effect of this activation on the predictive processes of the episodic-like network can be scaled, providing a framework for studying the conditions under which a transition from “episodic-like” to episodic memory may be adaptive.
- Our simulations in multiple environments designed to reflect paradigmatic tests of animal cognition reveal that the episodic-like network alone is sufficient for agents to discover and exploit properties of noisy or aliased “situations” within their environment. This capacity is limited, however, and various features of Projective Simulation may help or hinder, depending on the environment. Because the network dynamics are interpretable, these results are useful for predicting when and why properties of episodic memory might evolve in natural systems, and for developing autonomously explorative AI to learn about unknown environments. (Back to top>>)
Dissociation of prediction and control in visual neuronal model, Binxu Wang
- Understanding how visual stimuli are transformed into neural representations often involves building image-computable models that predict neuronal responses. As diverse models achieve comparable prediction accuracies (R²) on benchmarks like BrainScore, this metric may obscure crucial differences between the models. Motivated by the idea that a true “digital twin” of neurons should enable the synthesis of stimuli that precisely control the firing rates of biological neurons, we designed a closed-loop, two-phase experiment to compare the control capabilities of different models.
- In phase one, we recorded neuronal activity in monkey inferotemporal cortex (IT) as subjects fixated on approximately 1000 images from the Natural Scene Dataset (NSD). For each channel, we fitted prediction models using Ridge regression on two backbones: ResNet50 and ResNet50-robust. We then leveraged an explainable AI (xAI) technique called feature accentuation. Namely, we let each model synthesize image sets that it predicts will drive neuronal activity at 10 levels spanning the full dynamic range. Measuring the neuronal responses to these images tests the control capabilities of the models. Additionally, we synthesized “controversial stimuli”, which one model predicted to induce high activation while the other predicted to suppress it. In phase two, neuronal responses to NSD images, accentuated stimuli, and controversial stimuli were recorded.
- Although all models predicted responses to NSD images at similar accuracy, accentuated stimuli from ResNet50-robust modulated neuronal firing more precisely. Further, controversial images that maximize ResNet50 while minimizing ResNet50-robust prediction suppressed the neuronal activity, whereas the opposite increased it. This indicates that while both models share key features for predicting NSD responses, the features unique to ResNet50-robust are more brain-aligned, while those unique to ResNet50 are somewhat spurious. Overall, our results demonstrate a dissociation between the prediction and control capability in models of visual neurons. Further, evaluating models by their neural control capability offers a stricter measure of brain-machine alignment, and provides insights into the factors that render deep networks brain-aligned. (Back to top>>)
When Implants meets AI: A Dual-Loop System for Neuromodulation and Naturalistic Cognitive Research, Edward Wang
- We propose a novel dual-loop system that synergistically combines responsive neurostimulation (RNS) implants with artificial intelligence-driven wearable devices for treating post-traumatic stress disorder (PTSD) and enabling naturalistic brain research. In PTSD Therapy Mode, an implanted closed-loop neural device monitors amygdala activity and provides on-demand stimulation upon detecting pathological theta oscillations, while an ensemble of wearables (smart glasses, smartwatches, smartphones) uses multimodal large language model (LLM) analysis of sensory data to detect environmental or physiological PTSD triggers and deliver timely audiovisual interventions. Logged events from both the neural and wearable loops are analyzed to personalize trigger detection and progressively transition patients to non-invasive interventions. In Neuroscience Research Mode, the same platform is adapted for real-world brain activity capture. Wearable-LLM systems recognize naturalistic events (social interactions, emotional situations, compulsive behaviors, decision making) and signal implanted RNS devices (via wireless triggers) to record synchronized intracranial data during these moments. This approach builds on recent advances in mobile intracranial EEG recording and closed-loop neuromodulation in humans. We discuss how our interdisciplinary system could revolutionize PTSD therapy and cognitive neuroscience by enabling 24/7 monitoring, context-aware intervention, and rich data collection outside traditional labs. The vision is a future where AI-enhanced devices continuously collaborate with the human brain, offering therapeutic support and deep insights into neural function, with the resulting real-world, context-rich neural data in turn accelerating the development of more biologically grounded and human-centric AI. (Back to top>>)
BlindNeuralGen: Brain-Inspired Architecture for Synthesizing EEG Signals from Visual Inputs, Sophia Wong
- This research introduces BlindNeuralGen, a brain-inspired deep learning framework that synthesizes electroencephalogram (EEG) signals from visual inputs, bridging artificial and biological vision systems. Inspired by neurophysiological processes, the architecture incorporates neural mechanisms found in the visual cortex through a hybrid design combining convolutional neural networks with Gated Recurrent Units (GRUs) to process both spatial and temporal dependencies in visual information processing.
- The framework’s key innovation is applying contrastive learning to EEG representation, ensuring that visually similar inputs produce corresponding neural activity patterns in feature space. This approach reflects how the brain forms categorical neural representations while maintaining distinct patterns across visual classes. Additionally, the implementation of InstanceNorm normalization preserves individual signal characteristics while stabilizing training, mimicking the brain’s ability to normalize neural responses.
- Trained on paired image-EEG data from 23 participants viewing character images, the model (10.1 million parameters) outperformed traditional CNN baselines with a Root Mean Square Error of 83.445 and Peak Signal-to-Noise Ratio of 14.987. Most notably, performance improvements were highest in occipital electrodes E5 and E10 (33.6% and 75.7% respectively), precisely corresponding to the brain’s primary visual processing regions.
- Analysis of synthesized EEG signals reveals that rather than replicating exact responses to specific images, BlindNeuralGen generates patterns representing visual categories – effectively simulating how the brain forms conceptual neural representations. This research advances our understanding of neural encoding principles while demonstrating how artificial systems can implement brain-inspired computational strategies, contributing to the foundation for future non-invasive neural interfaces. (Back to top>>)
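The two metrics reported above, Root Mean Square Error and Peak Signal-to-Noise Ratio, can be computed as in the sketch below. The abstract does not say which peak value was used for PSNR on EEG signals, so taking the data range of the reference signal as the default peak is an assumption, as are the function names and the synthetic sine-wave signal.

```python
import numpy as np

def rmse(reference, estimate):
    """Root mean square error between a reference signal and an estimate."""
    return float(np.sqrt(np.mean((reference - estimate) ** 2)))

def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio in dB: 20 * log10(peak / RMSE)."""
    if peak is None:
        # Assumed convention: use the data range of the reference as the peak.
        peak = float(reference.max() - reference.min())
    err = rmse(reference, estimate)
    return float("inf") if err == 0 else float(20.0 * np.log10(peak / err))

t = np.linspace(0.0, 1.0, 256)
reference = 100.0 * np.sin(2 * np.pi * 10 * t)   # toy stand-in for a recorded signal
estimate = reference + 5.0                        # toy stand-in for a synthesized signal
```

Because PSNR depends on the chosen peak, values are only comparable between models when the same peak convention and signal units are used.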
Comparing train-test generalization in mouse olfactory learning to artificial neural networks, Ningjing Xia
- The field of machine learning treats generalization tests on held-out data as the gold standard for gauging performance. However, such train-test splits are rarely adopted in sensory neuroscience, despite the widespread belief that animals can generalize well from limited data. The olfactory mixture task developed in our group offers the potential to vary both the complexity of test stimuli and the difficulty of learning in a well-controlled experimental setting. In this task, mice are trained to detect which of two different target odors is present in a brief stimulus that also contains a highly variable number (up to 16) of background odors. The large number of stimulus combinations (~30,000) has allowed us to test whether and how mice generalize in this categorization task. We designed a series of behavioral experiments to probe how the statistics of the training data set determine generalization performance. We varied data diversity, label noise, and correlation level within the training data to test their effects on generalization. We also tested reversal learning and transfer learning; surprisingly, mouse behavior was not easily described by standard feature learning theories in ML. We are designing reinforcement learning agents that receive the same sequence of trials given to the mice, so that their performance can be compared directly. By comparing learning on the same task in animals and in ML agents, we hope to contrast the inductive biases of artificial and naturalistic intelligence.
When narrower is better: the narrow width limit of Bayesian parallel branching neural networks, Zechen Zhang
- The infinite width limit of random neural networks is known to result in the Neural Network Gaussian Process (NNGP) (Lee et al., 2018), characterized by task-independent kernels. It is widely accepted that larger network widths contribute to improved generalization (Park et al., 2019). However, this work challenges this notion by investigating the narrow width limit of the Bayesian Parallel Branching Neural Network (BPB-NN), an architecture that resembles neural networks with residual blocks. We demonstrate that when the width of a BPB-NN is significantly smaller than the number of training examples, each branch exhibits more robust learning due to a symmetry breaking of branches in kernel renormalization. Surprisingly, the performance of a BPB-NN in the narrow width limit is generally superior or comparable to that achieved in the wide width limit in bias-limited scenarios. Furthermore, the readout norms of each branch in the narrow width limit are mostly independent of the architectural hyperparameters but generally reflect the nature of the data. We demonstrate this phenomenon primarily in branching graph neural networks, where each branch represents a different order of convolutions of the graph; we also extend the results to more general architectures such as the residual-MLP and demonstrate that the narrow width effect is a general feature of branching networks. Our results characterize a newly defined narrow-width regime for parallel branching networks in general.
Does cell-type level connectivity support biologically plausible backpropagation? Zihan Zhang
- Backpropagation serves as the foundation for training and credit assignment in artificial neural networks. However, its mechanism violates several well-established constraints of biological signaling, including the weight transport problem and the requirement for synapse-wise, non-local information sharing across time. Previous studies have addressed these limitations through temporal truncations that reformulate the process as three-factor learning, relying on a top-down learning signal and eligibility traces. Subsequent research incorporates neuropeptidergic signaling across arbitrary timescales to enhance synaptic learning efficiency. Here, we investigate how heterogeneity in connectivity-based grouping strength among cell types can support biologically plausible learning schemes in neuroscience-related supervised learning tasks, particularly in regimes emphasizing recurrent interactions. We develop mathematical reasoning to explain how cell-type specific connectivity patterns lead to the decorrelation of synaptic weight deviations from type averages, thereby aligning the gradient step more closely with backpropagation on average. We demonstrate the effectiveness of this approach through simulation experiments that incorporate biological constraints, including Dale’s principle, sparsity control, and the restriction of excitatory projections between layers. Additionally, we validate our theoretical findings in online reinforcement learning settings. Overall, we explore how cell-type-specific modulatory signaling synergizes with cell-type-specific synaptic connection architectures, providing insights into learning mechanisms in biological networks.
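The three-factor learning mentioned above can be illustrated with a generic sketch: each synapse keeps a local eligibility trace of pre- and postsynaptic co-activity, and a top-down learning signal gates that trace into a weight change. This is a textbook-style illustration, not the authors' model; the function names, the decay constant, the learning rate, and the sign convention (the learning signal is treated as an error to descend) are all assumptions.

```python
def update_trace(trace, pre, post, decay=0.9):
    """Factors 1 and 2: low-pass filter the local pre/post co-activity.

    trace[i][j] belongs to the synapse from presynaptic unit j
    to postsynaptic unit i. `decay` is a hypothetical constant.
    """
    return [[decay * trace[i][j] + post[i] * pre[j]
             for j in range(len(pre))]
            for i in range(len(post))]


def three_factor_step(weights, trace, learning_signal, lr=0.1):
    """Factor 3: gate each trace by a per-neuron top-down signal.

    Assumption: learning_signal[i] is an error-like quantity for
    postsynaptic unit i, so the update descends along it.
    """
    return [[weights[i][j] - lr * learning_signal[i] * trace[i][j]
             for j in range(len(weights[i]))]
            for i in range(len(weights))]


# Toy usage with two presynaptic and two postsynaptic units.
weights = [[0.0, 0.0], [0.0, 0.0]]
trace = [[0.0, 0.0], [0.0, 0.0]]
trace = update_trace(trace, pre=[1.0, 0.0], post=[0.5, 1.0])
weights = three_factor_step(weights, trace, learning_signal=[1.0, -1.0])
print(weights)
```

Only synapses whose presynaptic unit was active accumulate a trace, so only those weights change; the top-down signal determines the direction of the change without requiring access to downstream weights.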