https://arxiv.org/api/gLVrn7jYKNbWOiiY+iiNeaX7Ww8 2026-03-24T23:15:38Z 11816 270 15 http://arxiv.org/abs/2602.03269v1 Systematic review of self-supervised foundation models for brain network representation using electroencephalography 2026-02-03T08:54:56Z Automated analysis of electroencephalography (EEG) has recently undergone a paradigm shift. The introduction of transformer architectures and self-supervised pretraining (SSL) has led to the development of EEG foundation models. These models are pretrained on large amounts of unlabeled data and can be adapted to a range of downstream tasks. This systematic review summarizes recent SSL-trained EEG foundation models that learn whole-brain representations from multichannel EEG rather than representations derived from a single channel. We searched PubMed, IEEE Xplore, Scopus, and arXiv through July 21, 2025. Nineteen preprints and peer-reviewed articles met inclusion criteria. We extracted information regarding pretraining datasets, model architectures, pretraining SSL objectives, and downstream task applications. While pretraining data heavily relied on the Temple University EEG corpus, there was significant heterogeneity in model architecture and training objectives across studies. Transformer architectures were identified as the predominant pretraining architecture with state-space models such as MAMBA and S4 as emerging alternatives. Concerning SSL objectives, masked auto-encoding was most common, and other studies incorporate contrastive learning. Downstream tasks varied widely and implemented diverse fine-tuning strategies, which made direct comparison challenging. Furthermore, most studies used single-task fine-tuning, and a generalizable EEG foundation model remains lacking. In conclusion, the field is advancing rapidly but still limited by limited dataset diversity and the absence of standardized benchmarks. Progress will likely depend on larger and more diverse pretraining datasets, standardized evaluation protocols, and multi-task validation. The development will advance EEG foundation models towards robust and general-purpose relevant to both basic and clinical applications. 2026-02-03T08:54:56Z 19 pages, 1 figure, 3 tables Hannah Portmann Yosuke Morishima http://arxiv.org/abs/2602.03896v1 A Hitchhiker's Guide to Poisson Gradient Estimation 2026-02-03T08:47:30Z Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: Exponential Arrival Time (EAT) simulation and Gumbel-SoftMax (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate), and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients), and substantially higher robustness to hyperparameter choices. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models. 2026-02-03T08:47:30Z Code: https://github.com/hadivafaii/PoissonGradientEstimation Michael Ibrahim Hanqi Zhao Eli Sennesh Zhi Li Anqi Wu Jacob L. Yates Chengrui Li Hadi Vafaii http://arxiv.org/abs/2602.03240v1 Estimating measures of information processing during cognitive tasks using functional magnetic resonance imaging 2026-02-03T08:19:31Z Cognition is increasingly framed in terms of information processing, yet most fMRI analyses focus on activation or functional connectivity rather than quantifying how information is stored and transferred. To remedy this problem, we propose a framework for estimating measures of information processing: active information storage (AIS), transfer entropy (TE), and net synergy from task-based fMRI. AIS measures information maintained within a region, TE captures directed information flow, and net synergy contrasts higher-order synergistic to redundant interactions. Crucially, to enable this framework we utilised a recently developed approach for calculating information-theoretic measures: the cross mutual information. This approach combines resting-state and task data to address the challenges of limited sample size, non-stationarity and context in task-based fMRI. We applied this framework to the working memory (N-back) task from the Human Connectome Project (470 participants). Results show that AIS increases in fronto-parietal regions with working memory load, TE reveals enhanced directed information flows across control pathways, and net synergy indicates a global shift to redundancy. This work establishes a novel methodology for quantifying information processing in task-based fMRI. 2026-02-03T08:19:31Z Chetan Gohil Oliver M. Cliff James M. Shine Ben D. Fulcher Joseph T. Lizier http://arxiv.org/abs/2602.03172v1 Adversarial construction as a potential solution to the experiment design problem in large task spaces 2026-02-03T06:41:56Z Despite decades of work, we still lack a robust, task-general theory of human behavior even in the simplest domains. In this paper we tackle the generality problem head-on, by aiming to develop a unified model for all tasks embedded in a task-space. In particular we consider the space of binary sequence prediction tasks where the observations are generated by the space parameterized by hidden Markov models (HMM). As the space of tasks is large, experimental exploration of the entire space is infeasible. To solve this problem we propose the adversarial construction approach, which helps identify tasks that are most likely to elicit a qualitatively novel behavior. Our results suggest that adversarial construction significantly outperforms random sampling of environments and therefore could be used as a proxy for optimal experimental design in high-dimensional task spaces. 2026-02-03T06:41:56Z 7 pages, 7 figures Prakhar Godara Frederick Callaway Marcelo G. Mattar http://arxiv.org/abs/2602.02920v1 A Reproducible Framework for Bias-Resistant Machine Learning on Small-Sample Neuroimaging Data 2026-02-02T23:47:57Z We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimization for small-sample neuroimaging data. Conventional cross-validation frameworks that reuse the same folds for both model selection and performance estimation yield optimistically biased results, limiting reproducibility and generalization. Demonstrated on a high-dimensional structural MRI dataset of deep brain stimulation cognitive outcomes, the framework achieved a nested-CV balanced accuracy of 0.660\,$\pm$\,0.068 using a compact, interpretable subset selected via importance-guided ranking. By combining interpretability and unbiased evaluation, this work provides a generalizable computational blueprint for reliable machine learning in data-limited biomedical domains. 2026-02-02T23:47:57Z Accepted to ISBI 2026, 5 pages with 1 figure Jagan Mohan Reddy Dwarampudi Jennifer L Purks Joshua Wong Renjie Hu Tania Banerjee http://arxiv.org/abs/2505.12387v4 Neural Thermodynamics: Entropic Forces in Deep and Universal Representation Learning 2026-02-02T21:27:46Z With the rapid discovery of emergent phenomena in deep learning and large language models, understanding their cause has become an urgent need. Here, we propose a rigorous entropic-force theory for understanding the learning dynamics of neural networks trained with stochastic gradient descent (SGD) and its variants. Building on the theory of parameter symmetries and an entropic loss landscape, we show that representation learning is crucially governed by emergent entropic forces arising from stochasticity and discrete-time updates. These forces systematically break continuous parameter symmetries and preserve discrete ones, leading to a series of gradient balance phenomena that resemble the equipartition property of thermal systems. These phenomena, in turn, (a) explain the universal alignment of neural representations between AI models and lead to a proof of the Platonic Representation Hypothesis, and (b) reconcile the seemingly contradictory observations of sharpness- and flatness-seeking behavior of deep learning optimization. Our theory and experiments demonstrate that a combination of entropic forces and symmetry breaking is key to understanding emergent phenomena in deep learning. 2025-05-18T12:25:42Z Published at NeurIPS 2025 Liu Ziyin Yizhou Xu Isaac Chuang http://arxiv.org/abs/2602.02494v1 MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training 2026-02-02T18:59:50Z Clinical brain-to-text interfaces are designed for paralysed patients who cannot provide extensive training recordings. Pre-training improves data-efficient generalisation by learning statistical priors across subjects, but these priors critically depend on context. While natural speech might unfold gradually over minutes, most methods pre-train with only a few seconds of context. Thus, we propose MEG-XL, a model pre-trained with 2.5 minutes of MEG context per sample, 5-300x longer than prior work, and equivalent to 191k tokens, capturing extended neural context. Fine-tuning on the task of word decoding from brain data, MEG-XL matches supervised performance with a fraction of the data (e.g. 1hr vs 50hrs) and outperforms brain foundation models. We find that models pre-trained with longer contexts learn representations that transfer better to word decoding. Our results indicate that long-context pre-training helps exploit extended neural context that other methods unnecessarily discard. Code, model weights, and instructions are available at https://github.com/neural-processing-lab/MEG-XL . 2026-02-02T18:59:50Z 19 pages, 8 figures, 5 tables Dulhan Jayalath Oiwi Parker Jones http://arxiv.org/abs/2509.15748v2 Hybrid Lie semi-group and cascade structures for the generalized Gaussian derivative model for visual receptive fields 2026-02-02T10:29:16Z Because of the variabilities of real-world image structures under the natural image transformations that arise when observing similar objects or spatio-temporal events under different viewing conditions, the receptive field responses computed in the earliest layers of the visual hierarchy may be strongly influenced by such geometric image transformations. One way of handling this variability is by basing the vision system on covariant receptive field families, which expand the receptive field shapes over the degrees of freedom in the image transformations. This paper addresses the problem of deriving relationships between spatial and spatio-temporal receptive field responses obtained for different values of the shape parameters in the resulting multi-parameter families of receptive fields. For this purpose, we derive both (i) infinitesimal relationships, roughly corresponding to a combination of notions from semi-groups and Lie groups, as well as (ii) macroscopic cascade smoothing properties, which describe how receptive field responses at coarser spatial and temporal scales can be computed by applying smaller support incremental filters to the output from corresponding receptive fields at finer spatial and temporal scales, structurally related to the notion of Lie algebras, although with directional preferences. The presented results provide (i) a deeper understanding of the relationships between spatial and spatio-temporal receptive field responses for different values of the filter parameters, which can be used for both (ii) designing more efficient schemes for computing receptive field responses over populations of multi-parameter families of receptive fields, as well as (iii)~formulating idealized theoretical models of the computations of simple cells in biological vision. 2025-09-19T08:23:44Z 27 pages, 9 figures Tony Lindeberg http://arxiv.org/abs/2506.17310v2 PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding 2026-02-02T06:05:34Z While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations causing information decay and unstructured feed-forward network (FFN) weights leading to semantic fragmentation. Inspired by the brain's working memory and cortical modularity, we propose PaceLLM, featuring two innovations: (1) a Persistent Activity (PA) Mechanism that mimics prefrontal cortex (PFC) neurons' persistent firing by introducing an activation-level memory bank to dynamically retrieve, reuse, and update critical FFN states, addressing contextual decay; and (2) Cortical Expert (CE) Clustering that emulates task-adaptive neural specialization to reorganize FFN weights into semantic modules, establishing cross-token dependencies and mitigating fragmentation. Extensive evaluations show that PaceLLM achieves 6% improvement on LongBench's Multi-document QA and 12.5-17.5% performance gains on Infinite-Bench tasks, while extending measurable context length to 200K tokens in Needle-In-A-Haystack (NIAH) tests. This work pioneers brain-inspired LLM optimization and is complementary to other works. Besides, it can be generalized to any model and enhance their long-context performance and interpretability without structural overhauls. 2025-06-18T09:17:06Z Accepted by NeurIPS2025 Kangcong Li Peng Ye Chongjun Tu Lin Zhang Chunfeng Song Jiamin Wu Tao Yang Qihao Zheng Tao Chen http://arxiv.org/abs/2409.13669v2 A Spatiotemporal Perspective on Dynamical Computation in Neural Information Processing Systems 2026-02-02T04:09:26Z Spatiotemporal flows of neural activity, such as traveling waves, have been observed throughout the brain since the earliest recordings; yet there is still little consensus on their functional role. Recent experiments and models have linked traveling waves to visual and physical motion, but these observations have been difficult to reconcile with standard accounts of topographically organized selectivity and feedforward receptive fields. Here, we introduce a theoretical framework that formalizes and generalizes the connection between 'motion' and flowing neural dynamics in the language of equivariant neural network theory. We consider 'motion' not only in physical or visual spaces, but also in more abstract representational spaces, and we argue that recurrent traveling-wave-like dynamics are not just useful but necessary for accurate and stable processing of any signal undergoing such motion. Formally, we show that for any non-trivial recurrent neural network to process a sequence undergoing a flow transformation (such as visual motion) in a structured equivariant manner, its hidden state dynamics must actively realize a homomorphic representation of the same flow through recurrent connectivity. In this ''spatiotemporal perspective on dynamical computation'', traveling waves and related flows are best understood as faithful dynamic representations of stimulus flows; and consequently the natural inclination of biological systems towards such dynamics may be viewed as an innate inductive bias towards efficiency and generalization in the spatiotemporally-structured dynamical world they inhabit. 2024-09-20T17:25:37Z T. Anderson Keller Lyle Muller Terrence J. Sejnowski Max Welling http://arxiv.org/abs/2602.02605v1 Fine-Tuning Language Models to Know What They Know 2026-02-02T04:08:13Z Metacognition is a critical component of intelligence, specifically regarding the awareness of one's own knowledge. While humans rely on shared internal memory for both answering questions and reporting their knowledge state, this dependency in LLMs remains underexplored. This study proposes a framework to measure metacognitive ability $d_{\rm{type2}}'$ using a dual-prompt method, followed by the introduction of Evolution Strategy for Metacognitive Alignment (ESMA) to bind a model's internal knowledge to its explicit behaviors. ESMA demonstrates robust generalization across diverse untrained settings, indicating a enhancement in the model's ability to reference its own knowledge. Furthermore, parameter analysis attributes these improvements to a sparse set of significant modifications. 2026-02-02T04:08:13Z Preprint Sangjun Park Elliot Meyerson Xin Qiu Risto Miikkulainen http://arxiv.org/abs/2602.01482v1 Community-Level Modeling of Gyral Folding Patterns for Robust and Anatomically Informed Individualized Brain Mapping 2026-02-01T23:13:01Z Cortical folding exhibits substantial inter-individual variability while preserving stable anatomical landmarks that enable fine-scale characterization of cortical organization. Among these, the three-hinge gyrus (3HG) serves as a key folding primitive, showing consistent topology yet meaningful variations in morphology, connectivity, and function. Existing landmark-based methods typically model each 3HG independently, ignoring that 3HGs form higher-order folding communities that capture mesoscale structure. This simplification weakens anatomical representation and makes one-to-one matching sensitive to positional variability and noise. We propose a spectral graph representation learning framework that models community-level folding units rather than isolated landmarks. Each 3HG is encoded using a dual-profile representation combining surface topology and structural connectivity. Subject-specific spectral clustering identifies coherent folding communities, followed by topological refinement to preserve anatomical continuity. For cross-subject correspondence, we introduce Joint Morphological-Geometric Matching, jointly optimizing geometric and morphometric similarity. Across over 1000 Human Connectome Project subjects, the resulting communities show reduced morphometric variance, stronger modular organization, improved hemispheric consistency, and superior alignment compared with atlas-based and landmark-based or embedding-based baselines. These findings demonstrate that community-level modeling provides a robust and anatomically grounded framework for individualized cortical characterization and reliable cross-subject correspondence. 2026-02-01T23:13:01Z Minheng Chen Tong Chen Yan Zhuang Chao Cao Jing Zhang Tianming Liu Lu Zhang Dajiang Zhu http://arxiv.org/abs/2602.01019v1 Inter- and Intra-Subject Variability in EEG: A Systematic Survey 2026-02-01T05:08:08Z Electroencephalography (EEG) underpins neuroscience, clinical neurophysiology, and brain-computer interfaces (BCIs), yet pronounced inter- and intra-subject variability limits reliability, reproducibility, and translation. This systematic review studies that quantified or modeled EEG variability across resting-state, event-related potentials (ERPs), and task-related/BCI paradigms (including motor imagery and SSVEP) in healthy and clinical cohorts. Across paradigms, inter-subject differences are typically larger than within-subject fluctuations, but both affect inference and model generalization. Stability is feature-dependent: alpha-band measures and individual alpha peak frequency are often relatively reliable, whereas higher-frequency and many connectivity-derived metrics show more heterogeneous reliability; ERP reliability varies by component, with P300 measures frequently showing moderate-to-good stability. We summarize major sources of variability (biological, state-related, technical, and analytical), review common quantification and modeling approaches (e.g., ICC, CV, SNR, generalizability theory, and multivariate/learning-based methods), and provide recommendations for study design, reporting, and harmonization. Overall, EEG variability should be treated as both a practical constraint to manage and a meaningful signal to leverage for precision neuroscience and robust neurotechnology. 2026-02-01T05:08:08Z Xuan-The Tran Thien-Nhan Vo Son-Tung Vu Thoa-Thi Tran Manh-Dat Nguyen Thomas Do Chin-Teng Lin http://arxiv.org/abs/2602.02562v1 A Distinct Communication Strategies Model of the Double Empathy Problem 2026-01-30T16:22:42Z The double empathy problem recasts the difficulty of forming empathy bonds in social interactions between autistic and neurotypical individuals as a bidirectional problem, rather than due to a deficit exclusive to the person on the spectrum. However, no explicit mechanism to explain such a phenomenon has been proposed. Here we build a feedback-loop mathematical model that would theoretically induce the empathy degradation observed during communication in neurotypical-autistic pairs solely due to differences in communication preferences between neurotypical and neurodivergent individuals. Numerical simulations of dyadic interactions show the model, whose mechanism is based solely on communication preferences, can illustrate the breakdown of empathic bonding observed clinically. Stability analysis of the model provides a way to predict the overall trajectory of the interaction in the empathy space. Furthermore, we suggest experimental designs to measure several parameters outlined here and discuss the future directions for testing the proposed model. 2026-01-30T16:22:42Z 16 pages, 5 figures Enrique Calderoli Maria Cristina Varriale Flávio Kapczinski http://arxiv.org/abs/2512.21881v3 SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis 2026-01-30T15:29:42Z Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data- and training-efficiency. Atlas-based methods aggregate voxel signals into fixed regions of interest, reducing data dimensionality but discarding fine-grained spatial details, and requiring extremely large cohorts to train effectively as general-purpose foundation models. Atlas-free methods, on the other hand, operate directly on voxel-level information - preserving spatial fidelity but are prohibitively memory- and compute-intensive, making large-scale pre-training infeasible. We introduce SLIM-Brain (Sample-efficient, Low-memory fMRI Foundation Model for Human Brain), a new atlas-free foundation model that simultaneously improves both data- and training-efficiency. SLIM-Brain adopts a two-stage adaptive design: (i) a lightweight temporal extractor captures global context across full sequences and ranks data windows by saliency, and (ii) a 4D hierarchical encoder (Hiera-JEPA) learns fine-grained voxel-level representations only from the top-$k$ selected windows, while deleting about 70% masked patches. Extensive experiments across seven public benchmarks show that SLIM-Brain establishes new state-of-the-art performance on diverse tasks, while requiring only 4 thousand pre-training sessions and approximately 30% of GPU memory comparing to traditional voxel-level methods. 2025-12-26T06:10:31Z release code Mo Wang Junfeng Xia Wenhao Ye Enyu Liu Kaining Peng Jianfeng Feng Quanying Liu Hongkai Wen