http://arxiv.org/abs/2508.08289v2 Understanding Transformers through the Lens of Pavlovian Conditioning 2026-05-05T20:31:16Z

Transformer architectures have revolutionized artificial intelligence (AI) through their attention mechanisms, yet the computational principles underlying their success remain opaque. We present a novel theoretical framework that reinterprets the core computation of attention as Pavlovian conditioning. Our model finds a direct mathematical analogue in linear attention, which simplifies the analysis of the underlying associative process. We demonstrate that attention's queries, keys, and values can be mapped to the three elements of classical conditioning: test stimuli that probe associations, conditional stimuli (CS) that serve as retrieval cues, and unconditional stimuli (US) that contain response information. Through this lens, we suggest that each attention operation constructs a transient associative memory via a Hebbian rule, where CS-US pairs form dynamic associations that test stimuli can later retrieve. Our framework yields several theoretical insights grounded in this linearized model: (1) a capacity theorem showing that attention heads can store $O(\sqrt{d_k})$ associations for worst-case, error-free retrieval, while average-case retrieval fidelity scales robustly as $O(d_k)$; (2) an error propagation analysis revealing fundamental architectural trade-offs of balancing model depth, width, and head redundancy to maintain reliability; and (3) an understanding of how biologically plausible learning rules could enhance transformer architectures. By establishing this deep connection, we suggest that the success of modern AI may stem not from architectural novelty alone, but from implementing computational principles analogous to those optimized by biology over millions of years of evolution.
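The mapping described in this abstract can be illustrated with a minimal NumPy sketch. It assumes unnormalized linear attention (no softmax, no scaling) and orthonormal keys so that retrieval is exact. The function names and dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np

def hebbian_memory(keys, values):
    """Sum of outer products v_i k_i^T: one Hebbian update per CS-US pair."""
    # keys: (n, d_k) conditional stimuli; values: (n, d_v) unconditional stimuli
    return values.T @ keys  # shape (d_v, d_k)

def retrieve(memory, query):
    """Probe the associative memory with a test stimulus (the query)."""
    return memory @ query

def linear_attention(queries, keys, values):
    """Unnormalized linear attention: (Q K^T) V, with no softmax."""
    return (queries @ keys.T) @ values

rng = np.random.default_rng(0)
d_k, d_v, n = 64, 8, 4  # illustrative sizes (assumption)

# Orthonormal keys make retrieval exact; random keys would add crosstalk.
keys = np.linalg.qr(rng.standard_normal((d_k, n)))[0].T  # (n, d_k)
values = rng.standard_normal((n, d_v))

M = hebbian_memory(keys, values)
recalled = retrieve(M, keys[2])  # probing with CS number 2 returns US number 2
```

With keys that are only approximately orthogonal, `recalled` would contain interference from the other stored pairs, which is the kind of retrieval error a capacity analysis has to bound.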

2025-08-05T05:00:00Z Mu Qiao http://arxiv.org/abs/2512.01073v2 The Modeler Schema Theory of Consciousness, with a Falsifiable Experiment 2026-05-05T19:09:14Z

We propose that consciousness arises from a single control agent, the Modelerschema. It monitors the brain's Modeler as that system constructs and updates the internal World Model. As part of that monitoring, the Modelerschema generates experience by applying a qualia-based consistency check to the Modeler's output. The Human Agent comprises three cooperating agents: Modeler, Controller, and Targeter, each paired with an associated regulatory "schema" agent. We also describe fast-Modelers and fast-Controllers, evolutionary shortcuts whose rapid actions precede awareness. Our core prediction is that the Modelerschema performs a qualia-based consistency check during saccades and issues a bottom-up attention request when a discrepancy is found. To test this prediction, we propose a saccadic change-detection experiment that distinguishes Modeler-generated from Modelerschema-generated bottom-up attention requests. Locating qualia in the Modelerschema ties experience to the regulation and refinement of internal representations, clarifies how awareness arises from model control, and suggests a path toward empirical falsification, thereby offering a concrete, testable proposal toward solving the Hard Problem of consciousness.

2025-11-30T20:47:38Z 39 pages, 5 figures Frank Heile http://arxiv.org/abs/2511.09290v2 Prediction horizon shapes representations in predictive learning 2026-05-05T12:11:24Z

Predictive learning has emerged as a central paradigm for training models across diverse data domains and is increasingly viewed as a foundation for modern artificial intelligence. A common intuition for this success is that accurate prediction requires models to capture the underlying dynamics of the environment, leading to the emergence of structured world models. However, predictive learning does not universally yield such representations, and a mechanistic account of when and why it does remains incomplete. In this work, we identify the prediction horizon as a critical, but often implicit, component of predictive learning objectives. We show that increasing the prediction horizon fundamentally shapes the effective structure of the learning problem. In a minimal setting, we demonstrate both theoretically and empirically that the model's implicit biases interact with this structural change to recover the latent geometry of the task. We then extend these empirical results to nonlinear architectures and more complex datasets, where similar phenomena persist. These findings provide a principled explanation for the emergence of structured representations in predictive learning paradigms and clarify the conditions under which such representations should be expected.

2025-11-12T12:57:19Z Aviv Ratzon Omri Barak http://arxiv.org/abs/2605.03606v1 Cusped singularities organize mixed-mode oscillations in mutually inhibitory slow-fast systems 2026-05-05T10:29:34Z

Mutual inhibition is a common motif in neural systems. Here, we establish that cusped singularities (folded singularities located at cusp points of critical manifolds) provide a universal organizing mechanism for mixed-mode oscillations (MMOs) in coupled slow-fast systems with mutual inhibition. We show that the geometric setup of these systems generically satisfies the conditions required by established geometric singular perturbation theory and blow-up methods, guaranteeing that such cusped singularities yield small-amplitude oscillations (SAOs). MMOs arise when the SAOs are combined with an appropriate return mechanism. Further, we show that the geometric presence of a cusped singularity is strictly related to the occurrence of a nearby singular Hopf bifurcation. We demonstrate the efficacy of this framework in two distinct neuronal models: the Curtu rate model of mutually inhibitory neural populations and coupled Morris-Lecar neurons with synaptic inhibition. In both cases, pushing the full system equilibrium near the cusped singularity triggers SAOs as the system passes near the cusp and approaches a full-system saddle-focus related to the singular Hopf bifurcation. Large-amplitude oscillations appear as the system spirals away from the saddle-focus, leading to MMOs, which may exhibit distinctive alternating patterns, in contrast to standard saddle-node induced MMOs. Our results establish cusped singularities as a generic, biologically relevant mechanism for complex oscillatory dynamics in inhibitory neural networks as well as for other inhibitory slow-fast systems.

2026-05-05T10:29:34Z Morten Gram Pedersen http://arxiv.org/abs/2605.04115v1 Learning reveals invisible structure in low-rank RNNs 2026-05-05T07:41:15Z

Learning in neural systems arises from synaptic changes that reshape the representations underlying behavior. While low-rank recurrent neural networks (RNNs) have emerged as a powerful framework for linking connectivity to function, a theoretical understanding of their learning process remains elusive. Here, we extend the low-rank framework from activity to learning by deriving gradient-descent dynamics directly in a reduced overlap space. We formulate a closed-form, low-dimensional system of ODEs that governs learning in this space, exact for linear RNNs and asymptotically exact for nonlinear RNNs in the large-$N$ Gaussian limit. Central to our analysis is a distinction between two classes of overlaps: loss-visible overlaps, which fully determine network activity, output, and loss, and loss-invisible overlaps, which do not affect function but are required to describe learning. We illustrate the consequences of this decomposition through two phenomena. First, we show that learning can serve as a perturbation that exposes differences in connectivity between functionally equivalent networks. Second, we show that loss-invisible overlaps can act as memory variables that encode training history, and characterize the conditions under which this occurs. Finally, we present several testable predictions for biological learning experiments derived from our theory.

2026-05-05T07:41:15Z 30 pages, 12 figures Yoav Ger Omri Barak http://arxiv.org/abs/2510.20839v2 On the evolutionary cognitive pressure for experiential awareness: do machines need it? 2026-05-05T03:52:49Z

The standing of artificial intelligence with respect to consciousness divides opinion across epistemological positions. Whether or not machines can be conscious, and whether we can ascertain the truth of such a proposition for any given case, has consequential ethical implications. This challenge is exacerbated by the lack of consensus on the nature of consciousness. We address an orthogonal problem: regardless of its nature, is consciousness \textit{required} for machines? Specifically, we focus on a constituent element of consciousness, experiential awareness, and examine from a computational perspective why it arose evolutionarily in biological organisms. We show that, because of evolutionary "baggage" (autonomous neurological reactions), experiential awareness is necessary for higher-level reasoning to be possible. The implication is that, given that artificial systems are architected without such legacy considerations, it is possible to design them with an arbitrary level of intelligence, without the need for experiential awareness. This possibility simplifies ethical considerations on artificial intelligence, and opens new approaches to the discernment of artificial consciousness.

2025-10-17T10:31:25Z Warisa Sritriratanarak Paulo Garcia http://arxiv.org/abs/2507.05561v2 Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines 2026-05-04T20:02:44Z

Humans can pursue a near-infinite variety of tasks, but typically can only pursue a small number at the same time. We hypothesize that humans leverage experience on one task to preemptively learn solutions to other tasks that were accessible but not pursued. We formalize this idea as Multitask Preplay, a novel algorithm that replays experience on one task as the starting point for "preplay" -- counterfactual simulation of an accessible but unpursued task. Preplay is used to learn a predictive representation that can support fast, adaptive task performance later on. We first show that, compared to traditional planning and predictive representation methods, multitask preplay better predicts how humans generalize to tasks that were accessible but not pursued in a small grid-world, even when people didn't know they would need to generalize to these tasks. We then show these predictions generalize to Craftax, a partially observable 2D Minecraft environment. Finally, we show that Multitask Preplay enables artificial agents to learn behaviors that transfer to novel Craftax worlds sharing task co-occurrence structure. These findings demonstrate that Multitask Preplay is a scalable theory of how humans counterfactually learn and generalize across multiple tasks; endowing artificial agents with the same capacity can significantly improve their performance in challenging multitask environments.

2025-07-08T00:55:47Z Wilka Carvalho Sam Hall-McMaster Honglak Lee Samuel J. Gershman http://arxiv.org/abs/2603.05612v2 Behavior-dLDS: A decomposed linear dynamical systems model for neural activity partially constrained by behavior 2026-05-04T19:39:32Z

Brain-wide recordings of large-scale networks of neurons now provide an unprecedented view into how the brain drives behavior. However, brain activity contains both information directly related to behavior and the potential for many internal computations. Moreover, observable behavior is executed not only by the brain, but also by the spinal cord and peripheral nervous system. Behavior is a coarse-grained product of neural activity, and we thus take the view that it can be best represented by lower-dimensional latent neural dynamics. Capturing this indirect relationship while disambiguating behavior-generating networks from internal computations running in parallel requires new modeling approaches that can embody the parallel and distributed nature of large-scale neural populations. We thus present behavior-decomposed linear dynamical systems (b-dLDS) to disentangle simultaneously recorded subsystems and identify how the latent neural subsystems relate to behavior. We demonstrate the ability of b-dLDS to decouple behavioral vs. internal computations on controlled, simulated data, showing improvements over a state-of-the-art model that uses behavior to supervise all dynamics. We also demonstrate b-dLDS's interpretability benefits on a task-driven RNN dataset featuring a nonlinear relationship between behavior and activations. We then show that b-dLDS can further scale up to tens of thousands of neurons by applying our model to a large-scale recording of a zebrafish hindbrain during the complex positional homeostasis behavior, wherein b-dLDS highlights asymmetry in behavior-related dynamic connectivity networks.

2026-03-05T19:11:42Z Eva Yezerets En Yang Misha B. Ahrens Adam S. Charles http://arxiv.org/abs/2605.02852v1 Inferring Active Neural Circuits Using Diffusion Scores 2026-05-04T17:30:17Z

In biological systems, neural circuits compute through directed, short-latency interactions whose effects unfold across multiple time scales and behavioral contexts. We address the problem of inferring these local, lag-specific interactions from sampled neural population activity under varying stimuli, without assuming a parametric form for the underlying dynamics. Our approach leverages denoising score models by estimating joint-window scores over consecutive activity snapshots (i.e., brain states) and converting these scores into calibrated, directed edge tests via cross-block score products. The key insight is that these products recover the Jacobian of the transition map between brain states under nonlinear dynamics. To cleanly separate lag-specific effects, we introduce minimal multi-block windows that condition on intermediate time points, avoiding the omitted-lag bias inherent in pairwise analyses. The resulting method, Score--Block Time Graphs (SBTG), identifies lag-specific directed interactions in sampled neuronal population data. We specifically apply SBTG to whole-brain C. elegans calcium imaging data to recover lag-specific circuit structure not resolved by current methods, including improved alignment with independent connectomes, cell-type-specific temporal organization, and neuromodulatory profiles consistent with known receptor kinetics. These findings highlight the potential for SBTG to serve as a practical ``AI for science'' tool by turning high-dimensional neural population recordings into statistically testable circuit hypotheses.

2026-05-04T17:30:17Z Savik Kinger Johannes Bertram Luciano Dyballa Eviatar Yemini Steven W. Zucker http://arxiv.org/abs/2409.20318v3 A Rosetta Stone Hypothesis for Neurophenomenology: Mathematical Predictions from Predictive Processing 2026-05-04T17:30:17Z

Consciousness science faces the challenge of bridging first-person experience with third-person empirical measurements. Neurophenomenology aims to build such `generative passages' connecting the content of experience with behavioural and neuroscientific data. However, the mathematical machinery for such bridges remains underdeveloped. Here we develop a Rosetta Stone hypothesis from predictive processing, where beliefs serve as a central hub connecting phenomenology, behaviour, and neural dynamics. This hinges on a central technical assumption that phenomenology is a function of beliefs. We pursue a conditional approach: if this assumption holds, then certain predictions mathematically follow. We derive predictions for subjective similarity judgements, cognitive metabolic cost, subjective cognitive effort, and time perception. We review the connection between beliefs and neural dynamics to complete the generative passage for neurophenomenology, omitting the connection between beliefs and behaviour as this is already well-documented elsewhere. Testing our predictions will inform the validity of the central assumption connecting beliefs and phenomenology, and advance the neurophenomenology research programme.

2024-09-30T14:20:23Z 12 pages, 4 figures Lancelot Da Costa Anil K. Seth Karl Friston Maxwell J. D. Ramstead Lars Sandved-Smith http://arxiv.org/abs/2605.02675v1 Online Generalised Predictive Coding 2026-05-04T14:55:37Z

This paper introduces an extension of generalised filtering for online applications. Generalised filtering refers to data assimilation schemes that jointly infer latent states, learn unknown model parameters, and estimate uncertainty in an integrated framework -- e.g., estimate state and observation noise -- at the same time (i.e., triple estimation). This framework appears across disciplines under different names, including variational Kalman-Bucy filtering in engineering, generalised predictive coding in neuroscience, and Dynamic Expectation Maximisation (DEM) in time-series analysis. Here, we specialise DEM for ``online'' data assimilation, through a separation of temporal scales. We describe the variational principles and procedures that allow one to assimilate data in a way that allows for a slow updating of parameters and precisions, which contextualise fast Bayesian belief updating about the dynamic hidden states. Using numerical studies, we demonstrate the validity of online DEM (ODEM) using a non-linear -- and potentially chaotic -- generative model, to show that the ODEM scheme can track the latent states of the generative process, even when its functional form differs fundamentally from the dynamics of the generative model. Framed from a neuro-mimetic predictive coding perspective, ODEM offers a biologically inspired solution to online inference, learning, and uncertainty estimation in dynamic environments.

2026-05-04T14:55:37Z 45 pages, 17 Figures Mehran H. Z. Bazargani Szymon Urbas Adeel Razi Thomas Brendan Murphy Karl Friston http://arxiv.org/abs/2605.02365v1 Modeling sequential cognitive states via population level cortical dynamics 2026-05-04T09:09:13Z

In this work, we present a mathematical model for cyclic and sequential patterns of brain activity, combining heteroclinic dynamics with discrete neural-field models. We first show that spatially discrete neural-field equations with biologically realistic equilibria cannot support heteroclinic cycles. On the other hand, heteroclinic dynamics often arise in Lotka-Volterra-type systems, but these equations do not directly correspond to neuronal processes. To address this, we use a version of the Universal Approximation Theorem to approximate any target dynamics by a neural network interpretable as a high-dimensional Amari-type neural-field system. When the target dynamics contains a heteroclinic cycle, the approximating vector field generates a periodic trajectory that closely follows the heteroclinic connection. As a case study, we consider the cognitive processes underlying focused-attention meditation. We show how the model reproduces sequential transitions among cognitive states, and we conclude by providing a neural interpretation of the approximating dynamics.

2026-05-04T09:09:13Z M Virginia Bolelli L2S Luca Greco L2S Dario Prandi CNRS, L2S http://arxiv.org/abs/2501.19106v3 Beyond variance: sensitivity-based dimensions in brain networks underlie individual differences in cognitive ability 2026-05-04T04:20:15Z

Explaining individual differences in cognitive abilities requires both identifying brain parameters that vary across individuals and understanding how brain networks are recruited for specific tasks. Typically, task performance relies on the integration and segregation of functional subnetworks, often captured by parameters like regional excitability and connectivity. Yet, the high dimensionality of these parameters hinders pinpointing their functional relevance. Here, we apply stiff-sloppy analysis to human brain data, revealing that certain subtle parameter combinations ("stiff dimensions") powerfully influence neural activity during task processing, whereas others ("sloppy dimensions") vary more extensively but exert minimal impact. Using a pairwise maximum entropy model of task fMRI, we show that even small deviations in stiff dimensions, derived through Fisher Information Matrix analysis, govern the dynamic interplay of segregation and integration between the default mode network (DMN) and a working memory network (WMN). Crucially, separating a 0-back task (vigilant attention) from a 2-back task (working memory updating) uncovers partially distinct stiff dimensions predicting performance in each condition, along with a global DMN-WMN segregation shared across both tasks. Altogether, stiff-sloppy analysis challenges the conventional focus on large parameter variability by highlighting these subtle yet functionally decisive parameter combinations.
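As a rough illustration of how stiff and sloppy dimensions can be read off a Fisher Information Matrix, the sketch below builds a tiny pairwise maximum entropy model by exact enumeration and eigendecomposes its FIM. It relies on the standard fact that the FIM of an exponential-family model equals the covariance of its sufficient statistics. The +/-1 unit coding, the model size, and all names are assumptions made for illustration, not the authors' pipeline.

```python
import itertools
import numpy as np

def maxent_fim(h, J):
    """Exact Fisher information of a pairwise maximum entropy model over +/-1 units."""
    n = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=n)))  # (2^n, n)
    pairs = list(itertools.combinations(range(n), 2))
    # Sufficient statistics: s_i for each unit, s_i * s_j for each pair.
    pair_stats = np.stack([states[:, i] * states[:, j] for i, j in pairs], axis=1)
    stats = np.hstack([states, pair_stats]).astype(float)
    theta = np.concatenate([h, [J[i, j] for i, j in pairs]])
    log_w = stats @ theta
    p = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    p /= p.sum()
    centered = stats - p @ stats
    # For exponential families the FIM is the covariance of the sufficient statistics.
    return (centered * p[:, None]).T @ centered

rng = np.random.default_rng(0)
n = 4  # toy number of regions (assumption)
h = rng.normal(scale=0.3, size=n)
J = rng.normal(scale=0.3, size=(n, n))
J = (J + J.T) / 2  # symmetric couplings

fim = maxent_fim(h, J)
eigvals, eigvecs = np.linalg.eigh(fim)  # eigenvalues in ascending order
stiff_direction = eigvecs[:, -1]   # largest eigenvalue: most sensitive combination
sloppy_direction = eigvecs[:, 0]   # smallest eigenvalue: least sensitive combination
```

Exact enumeration is only feasible for a handful of units; a model fitted to real fMRI data would need the statistics estimated from samples instead.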

2025-01-31T13:03:42Z Sida Chen Siqi Yang Zhao Chang Taro Toyoizumi Werner Sommer Lianchun Yu Qian-Yuan Tang Changsong Zhou http://arxiv.org/abs/2509.09152v2 LITcoder: A General-Purpose Library for Building and Comparing Encoding Models 2026-05-03T20:31:39Z

We introduce LITcoder, an open-source library for building and benchmarking neural encoding models. Designed as a flexible backend, LITcoder provides standardized tools for aligning continuous stimuli (e.g., text and speech) with brain data, transforming stimuli into representational features, mapping those features onto brain data, and evaluating the predictive performance of the resulting model on held-out data. The library implements a modular pipeline covering a wide array of methodological design choices, so researchers can easily compose, compare, and extend encoding models without reinventing core infrastructure. Such choices include brain datasets, brain regions, stimulus features (both neural-net-based and control features, such as word rate), downsampling approaches, and many others. In addition, the library provides built-in logging, plotting, and seamless integration with experiment tracking platforms such as Weights & Biases (W&B). We demonstrate the scalability and versatility of our framework by fitting a range of encoding models to three story listening datasets: LeBel et al. (2023), Narratives, and Little Prince. We also explore the methodological choices critical for building encoding models for continuous fMRI data, illustrating the importance of accounting for all tokens in a TR scan (as opposed to just taking the last one, even when contextualized), incorporating hemodynamic lag effects, using train-test splits that minimize information leakage, and accounting for head motion effects on encoding model predictivity. Overall, LITcoder lowers technical barriers to encoding model implementation, facilitates systematic comparisons across models and datasets, fosters methodological rigor, and accelerates the development of high-quality, high-performance predictive models of brain activity. Project page: https://litcoder-brain.github.io
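The generic encoding-model recipe named in this abstract (lagged features, a linear map onto brain data, evaluation on held-out data) can be sketched in plain NumPy. This is not LITcoder's API: every function name, the synthetic data, the delay set, and the ridge penalty are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def add_delays(features, delays):
    """Stack time-shifted copies of the features (a simple FIR lag model)."""
    n_t, n_f = features.shape
    out = np.zeros((n_t, n_f * len(delays)))
    for k, d in enumerate(delays):
        # Row t of block k holds the features from d time points earlier.
        out[d:, k * n_f:(k + 1) * n_f] = features[:n_t - d]
    return out

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def columnwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))

rng = np.random.default_rng(0)
features = rng.standard_normal((300, 5))   # synthetic stimulus features per TR
delays = [1, 2, 3]                         # lags in TRs (assumption)
X = add_delays(features, delays)
W_true = rng.standard_normal((X.shape[1], 3))
Y = X @ W_true + 0.1 * rng.standard_normal((300, 3))  # synthetic "voxel" responses

split = 240  # contiguous split: the test block is never interleaved with training
W = fit_ridge(X[:split], Y[:split], alpha=1.0)
test_corr = columnwise_corr(X[split:] @ W, Y[split:])
```

A contiguous split is used here because shuffling time points of an autocorrelated signal would leak information between the training and test sets.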

2025-09-11T05:14:14Z Accepted to the NeurIPS 2025 Workshop on Data on the Brain & Mind, Findings Track. OpenReview: https://openreview.net/forum?id=c9GUBrE5RV Taha Binhuraib Ruimin Gao Anna A. Ivanova http://arxiv.org/abs/2510.13768v2 Scaling Vision Transformers for Functional MRI with Flat Maps 2026-05-03T20:14:09Z

We study the problem of training self-supervised foundation models for functional MRI. Our main contributions are: (1) we introduce a new model family (CortexMAE) trained using the masked autoencoder framework on 2.1K hours of open fMRI data, and (2) we release the first open evaluation suite (Brainmarks) for fMRI foundation models. Our core innovation is simple: we adapt the Vision Transformer to fMRI by first converting each 3D fMRI volume to a 2D map using a cortical flat map projection. We directly compare flat maps to both parcellation and volume-based representations. While each has its advantages, flat maps generally perform best. We perform the first systematic scaling analysis for fMRI and observe strict power law scaling, albeit with limits. Finally, we use Brainmarks to do controlled benchmark comparisons. On subject-level trait prediction, we report a challenging null result: no single model achieves clear state-of-the-art performance. Moreover, all models struggle to outperform a simple functional connectivity baseline. On cognitive state decoding, we observe more robust performance, and in this setting our CortexMAE family outperforms prior models by a large margin. Code, models, and datasets are available at https://github.com/MedARC-AI/CortexMAE and https://github.com/MedARC-AI/Brainmarks.

2025-10-15T17:15:00Z Accepted at ICML 2026; Code: https://github.com/MedARC-AI/CortexMAE; Benchmark: https://github.com/MedARC-AI/Brainmarks; Discord: https://discord.gg/tVR4TWnRM9 Connor Lane Mihir Tripathy Leema Krishna Murali Ratna Sagari Grandhi Shamus Sim Zi Yang Sam Gijsen Debojyoti Das Manish Ram Utkarsh Kumar Singh Cesar Kadir Torrico Villanueva Yuxiang Wei Will Beddow Gianfranco Cortés Suin Cho Daniel Z. Kaplan Benjamin Warner Tanishq Mathew Abraham Paul S. Scotti