http://arxiv.org/abs/2602.03269v1
Systematic review of self-supervised foundation models for brain network representation using electroencephalography
2026-02-03T08:54:56Z
Automated analysis of electroencephalography (EEG) has recently undergone a paradigm shift. The introduction of transformer architectures and self-supervised pretraining (SSL) has led to the development of EEG foundation models. These models are pretrained on large amounts of unlabeled data and can be adapted to a range of downstream tasks. This systematic review summarizes recent SSL-trained EEG foundation models that learn whole-brain representations from multichannel EEG rather than representations derived from a single channel. We searched PubMed, IEEE Xplore, Scopus, and arXiv through July 21, 2025. Nineteen preprints and peer-reviewed articles met inclusion criteria. We extracted information regarding pretraining datasets, model architectures, pretraining SSL objectives, and downstream task applications. While pretraining data heavily relied on the Temple University EEG corpus, there was significant heterogeneity in model architecture and training objectives across studies. Transformer architectures were identified as the predominant pretraining architecture, with state-space models such as MAMBA and S4 as emerging alternatives. Concerning SSL objectives, masked auto-encoding was most common, while other studies incorporated contrastive learning. Downstream tasks varied widely and implemented diverse fine-tuning strategies, which made direct comparison challenging. Furthermore, most studies used single-task fine-tuning, and a generalizable EEG foundation model remains lacking. In conclusion, the field is advancing rapidly but is still constrained by limited dataset diversity and the absence of standardized benchmarks.
Progress will likely depend on larger and more diverse pretraining datasets, standardized evaluation protocols, and multi-task validation. These developments will make EEG foundation models more robust and general-purpose, and relevant to both basic and clinical applications.
2026-02-03T08:54:56Z
19 pages, 1 figure, 3 tables
Hannah Portmann, Yosuke Morishima

http://arxiv.org/abs/2602.03896v1
A Hitchhiker's Guide to Poisson Gradient Estimation
2026-02-03T08:47:30Z
Poisson-distributed latent variable models are widely used in computational neuroscience, but differentiating through discrete stochastic samples remains challenging. Two approaches address this: Exponential Arrival Time (EAT) simulation and Gumbel-SoftMax (GSM) relaxation. We provide the first systematic comparison of these methods, along with practical guidance for practitioners. Our main technical contribution is a modification to the EAT method that theoretically guarantees an unbiased first moment (exactly matching the firing rate), and reduces second-moment bias. We evaluate these methods on their distributional fidelity, gradient quality, and performance on two tasks: (1) variational autoencoders with Poisson latents, and (2) partially observable generalized linear models, where latent neural connectivity must be inferred from observed spike trains. Across all metrics, our modified EAT method exhibits better overall performance (often comparable to exact gradients), and substantially higher robustness to hyperparameter choices. Together, our results clarify the trade-offs between these methods and offer concrete recommendations for practitioners working with Poisson latent variable models.
2026-02-03T08:47:30Z
Code: https://github.com/hadivafaii/PoissonGradientEstimation
Michael Ibrahim, Hanqi Zhao, Eli Sennesh, Zhi Li, Anqi Wu, Jacob L.
Yates, Chengrui Li, Hadi Vafaii

http://arxiv.org/abs/2602.03240v1
Estimating measures of information processing during cognitive tasks using functional magnetic resonance imaging
2026-02-03T08:19:31Z
Cognition is increasingly framed in terms of information processing, yet most fMRI analyses focus on activation or functional connectivity rather than quantifying how information is stored and transferred. To remedy this problem, we propose a framework for estimating measures of information processing from task-based fMRI: active information storage (AIS), transfer entropy (TE), and net synergy. AIS measures information maintained within a region, TE captures directed information flow, and net synergy contrasts higher-order synergistic interactions with redundant ones. Crucially, to enable this framework we utilised a recently developed approach for calculating information-theoretic measures: the cross mutual information. This approach combines resting-state and task data to address the challenges of limited sample size, non-stationarity and context in task-based fMRI. We applied this framework to the working memory (N-back) task from the Human Connectome Project (470 participants). Results show that AIS increases in fronto-parietal regions with working memory load, TE reveals enhanced directed information flows across control pathways, and net synergy indicates a global shift to redundancy. This work establishes a novel methodology for quantifying information processing in task-based fMRI.
2026-02-03T08:19:31Z
Chetan Gohil, Oliver M. Cliff, James M. Shine, Ben D. Fulcher, Joseph T. Lizier

http://arxiv.org/abs/2602.03172v1
Adversarial construction as a potential solution to the experiment design problem in large task spaces
2026-02-03T06:41:56Z
Despite decades of work, we still lack a robust, task-general theory of human behavior even in the simplest domains. In this paper we tackle the generality problem head-on, by aiming to develop a unified model for all tasks embedded in a task-space.
In particular, we consider the space of binary sequence prediction tasks where the observations are generated by hidden Markov models (HMMs). As the space of tasks is large, experimental exploration of the entire space is infeasible. To solve this problem we propose the adversarial construction approach, which helps identify tasks that are most likely to elicit a qualitatively novel behavior. Our results suggest that adversarial construction significantly outperforms random sampling of environments and therefore could be used as a proxy for optimal experimental design in high-dimensional task spaces.
2026-02-03T06:41:56Z
7 pages, 7 figures
Prakhar Godara, Frederick Callaway, Marcelo G. Mattar

http://arxiv.org/abs/2602.02920v1
A Reproducible Framework for Bias-Resistant Machine Learning on Small-Sample Neuroimaging Data
2026-02-02T23:47:57Z
We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimization for small-sample neuroimaging data. Conventional cross-validation frameworks that reuse the same folds for both model selection and performance estimation yield optimistically biased results, limiting reproducibility and generalization. Demonstrated on a high-dimensional structural MRI dataset of deep brain stimulation cognitive outcomes, the framework achieved a nested-CV balanced accuracy of $0.660 \pm 0.068$ using a compact, interpretable subset selected via importance-guided ranking.
By combining interpretability and unbiased evaluation, this work provides a generalizable computational blueprint for reliable machine learning in data-limited biomedical domains.
2026-02-02T23:47:57Z
Accepted to ISBI 2026, 5 pages with 1 figure
Jagan Mohan Reddy Dwarampudi, Jennifer L Purks, Joshua Wong, Renjie Hu, Tania Banerjee

http://arxiv.org/abs/2505.12387v4
Neural Thermodynamics: Entropic Forces in Deep and Universal Representation Learning
2026-02-02T21:27:46Z
With the rapid discovery of emergent phenomena in deep learning and large language models, understanding their cause has become an urgent need. Here, we propose a rigorous entropic-force theory for understanding the learning dynamics of neural networks trained with stochastic gradient descent (SGD) and its variants. Building on the theory of parameter symmetries and an entropic loss landscape, we show that representation learning is crucially governed by emergent entropic forces arising from stochasticity and discrete-time updates. These forces systematically break continuous parameter symmetries and preserve discrete ones, leading to a series of gradient balance phenomena that resemble the equipartition property of thermal systems. These phenomena, in turn, (a) explain the universal alignment of neural representations between AI models and lead to a proof of the Platonic Representation Hypothesis, and (b) reconcile the seemingly contradictory observations of sharpness- and flatness-seeking behavior of deep learning optimization. Our theory and experiments demonstrate that a combination of entropic forces and symmetry breaking is key to understanding emergent phenomena in deep learning.
2025-05-18T12:25:42Z
Published at NeurIPS 2025
Liu Ziyin, Yizhou Xu, Isaac Chuang

http://arxiv.org/abs/2602.02494v1
MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training
2026-02-02T18:59:50Z
Clinical brain-to-text interfaces are designed for paralysed patients who cannot provide extensive training recordings.
Pre-training improves data-efficient generalisation by learning statistical priors across subjects, but these priors critically depend on context. While natural speech might unfold gradually over minutes, most methods pre-train with only a few seconds of context. Thus, we propose MEG-XL, a model pre-trained with 2.5 minutes of MEG context per sample, 5-300x longer than prior work, and equivalent to 191k tokens, capturing extended neural context. Fine-tuning on the task of word decoding from brain data, MEG-XL matches supervised performance with a fraction of the data (e.g. 1 hr vs 50 hrs) and outperforms brain foundation models. We find that models pre-trained with longer contexts learn representations that transfer better to word decoding. Our results indicate that long-context pre-training helps exploit extended neural context that other methods unnecessarily discard. Code, model weights, and instructions are available at https://github.com/neural-processing-lab/MEG-XL.
2026-02-02T18:59:50Z
19 pages, 8 figures, 5 tables
Dulhan Jayalath, Oiwi Parker Jones

http://arxiv.org/abs/2509.15748v2
Hybrid Lie semi-group and cascade structures for the generalized Gaussian derivative model for visual receptive fields
2026-02-02T10:29:16Z
Because of the variabilities of real-world image structures under the natural image transformations that arise when observing similar objects or spatio-temporal events under different viewing conditions, the receptive field responses computed in the earliest layers of the visual hierarchy may be strongly influenced by such geometric image transformations. One way of handling this variability is by basing the vision system on covariant receptive field families, which expand the receptive field shapes over the degrees of freedom in the image transformations.
This paper addresses the problem of deriving relationships between spatial and spatio-temporal receptive field responses obtained for different values of the shape parameters in the resulting multi-parameter families of receptive fields. For this purpose, we derive both (i) infinitesimal relationships, roughly corresponding to a combination of notions from semi-groups and Lie groups, as well as (ii) macroscopic cascade smoothing properties, which describe how receptive field responses at coarser spatial and temporal scales can be computed by applying smaller support incremental filters to the output from corresponding receptive fields at finer spatial and temporal scales, structurally related to the notion of Lie algebras, although with directional preferences.
The presented results provide (i) a deeper understanding of the relationships between spatial and spatio-temporal receptive field responses for different values of the filter parameters, which can be used for both (ii) designing more efficient schemes for computing receptive field responses over populations of multi-parameter families of receptive fields, as well as (iii) formulating idealized theoretical models of the computations of simple cells in biological vision.
2025-09-19T08:23:44Z
27 pages, 9 figures
Tony Lindeberg

http://arxiv.org/abs/2506.17310v2
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding
2026-02-02T06:05:34Z
While Large Language Models (LLMs) demonstrate strong performance across domains, their long-context capabilities are limited by transient neural activations causing information decay and unstructured feed-forward network (FFN) weights leading to semantic fragmentation. Inspired by the brain's working memory and cortical modularity, we propose PaceLLM, featuring two innovations: (1) a Persistent Activity (PA) Mechanism that mimics prefrontal cortex (PFC) neurons' persistent firing by introducing an activation-level memory bank to dynamically retrieve, reuse, and update critical FFN states, addressing contextual decay; and (2) Cortical Expert (CE) Clustering that emulates task-adaptive neural specialization to reorganize FFN weights into semantic modules, establishing cross-token dependencies and mitigating fragmentation. Extensive evaluations show that PaceLLM achieves a 6% improvement on LongBench's Multi-document QA and 12.5-17.5% performance gains on Infinite-Bench tasks, while extending measurable context length to 200K tokens in Needle-In-A-Haystack (NIAH) tests. This work pioneers brain-inspired LLM optimization and is complementary to other works.
Moreover, it can be generalized to any model, enhancing long-context performance and interpretability without structural overhauls.
2025-06-18T09:17:06Z
Accepted by NeurIPS 2025
Kangcong Li, Peng Ye, Chongjun Tu, Lin Zhang, Chunfeng Song, Jiamin Wu, Tao Yang, Qihao Zheng, Tao Chen

http://arxiv.org/abs/2409.13669v2
A Spatiotemporal Perspective on Dynamical Computation in Neural Information Processing Systems
2026-02-02T04:09:26Z
Spatiotemporal flows of neural activity, such as traveling waves, have been observed throughout the brain since the earliest recordings; yet there is still little consensus on their functional role. Recent experiments and models have linked traveling waves to visual and physical motion, but these observations have been difficult to reconcile with standard accounts of topographically organized selectivity and feedforward receptive fields. Here, we introduce a theoretical framework that formalizes and generalizes the connection between 'motion' and flowing neural dynamics in the language of equivariant neural network theory. We consider 'motion' not only in physical or visual spaces, but also in more abstract representational spaces, and we argue that recurrent traveling-wave-like dynamics are not just useful but necessary for accurate and stable processing of any signal undergoing such motion. Formally, we show that for any non-trivial recurrent neural network to process a sequence undergoing a flow transformation (such as visual motion) in a structured equivariant manner, its hidden state dynamics must actively realize a homomorphic representation of the same flow through recurrent connectivity.
In this "spatiotemporal perspective on dynamical computation", traveling waves and related flows are best understood as faithful dynamic representations of stimulus flows; and consequently the natural inclination of biological systems towards such dynamics may be viewed as an innate inductive bias towards efficiency and generalization in the spatiotemporally-structured dynamical world they inhabit.
2024-09-20T17:25:37Z
T. Anderson Keller, Lyle Muller, Terrence J. Sejnowski, Max Welling

http://arxiv.org/abs/2602.02605v1
Fine-Tuning Language Models to Know What They Know
2026-02-02T04:08:13Z
Metacognition is a critical component of intelligence, specifically regarding the awareness of one's own knowledge. While humans rely on shared internal memory for both answering questions and reporting their knowledge state, this dependency in LLMs remains underexplored. This study proposes a framework to measure metacognitive ability $d_{\rm{type2}}'$ using a dual-prompt method, followed by the introduction of Evolution Strategy for Metacognitive Alignment (ESMA) to bind a model's internal knowledge to its explicit behaviors. ESMA demonstrates robust generalization across diverse untrained settings, indicating an enhancement in the model's ability to reference its own knowledge. Furthermore, parameter analysis attributes these improvements to a sparse set of significant modifications.
2026-02-02T04:08:13Z
Preprint
Sangjun Park, Elliot Meyerson, Xin Qiu, Risto Miikkulainen

http://arxiv.org/abs/2602.01482v1
Community-Level Modeling of Gyral Folding Patterns for Robust and Anatomically Informed Individualized Brain Mapping
2026-02-01T23:13:01Z
Cortical folding exhibits substantial inter-individual variability while preserving stable anatomical landmarks that enable fine-scale characterization of cortical organization. Among these, the three-hinge gyrus (3HG) serves as a key folding primitive, showing consistent topology yet meaningful variations in morphology, connectivity, and function.
Existing landmark-based methods typically model each 3HG independently, ignoring that 3HGs form higher-order folding communities that capture mesoscale structure. This simplification weakens anatomical representation and makes one-to-one matching sensitive to positional variability and noise. We propose a spectral graph representation learning framework that models community-level folding units rather than isolated landmarks. Each 3HG is encoded using a dual-profile representation combining surface topology and structural connectivity. Subject-specific spectral clustering identifies coherent folding communities, followed by topological refinement to preserve anatomical continuity. For cross-subject correspondence, we introduce Joint Morphological-Geometric Matching, jointly optimizing geometric and morphometric similarity. Across over 1000 Human Connectome Project subjects, the resulting communities show reduced morphometric variance, stronger modular organization, improved hemispheric consistency, and superior alignment compared with atlas-based, landmark-based, and embedding-based baselines. These findings demonstrate that community-level modeling provides a robust and anatomically grounded framework for individualized cortical characterization and reliable cross-subject correspondence.
2026-02-01T23:13:01Z
Minheng Chen, Tong Chen, Yan Zhuang, Chao Cao, Jing Zhang, Tianming Liu, Lu Zhang, Dajiang Zhu

http://arxiv.org/abs/2602.01019v1
Inter- and Intra-Subject Variability in EEG: A Systematic Survey
2026-02-01T05:08:08Z
Electroencephalography (EEG) underpins neuroscience, clinical neurophysiology, and brain-computer interfaces (BCIs), yet pronounced inter- and intra-subject variability limits reliability, reproducibility, and translation. This systematic review covers studies that quantified or modeled EEG variability across resting-state, event-related potentials (ERPs), and task-related/BCI paradigms (including motor imagery and SSVEP) in healthy and clinical cohorts.
Across paradigms, inter-subject differences are typically larger than within-subject fluctuations, but both affect inference and model generalization. Stability is feature-dependent: alpha-band measures and individual alpha peak frequency are often relatively reliable, whereas higher-frequency and many connectivity-derived metrics show more heterogeneous reliability; ERP reliability varies by component, with P300 measures frequently showing moderate-to-good stability. We summarize major sources of variability (biological, state-related, technical, and analytical), review common quantification and modeling approaches (e.g., ICC, CV, SNR, generalizability theory, and multivariate/learning-based methods), and provide recommendations for study design, reporting, and harmonization. Overall, EEG variability should be treated as both a practical constraint to manage and a meaningful signal to leverage for precision neuroscience and robust neurotechnology.
2026-02-01T05:08:08Z
Xuan-The Tran, Thien-Nhan Vo, Son-Tung Vu, Thoa-Thi Tran, Manh-Dat Nguyen, Thomas Do, Chin-Teng Lin

http://arxiv.org/abs/2602.02562v1
A Distinct Communication Strategies Model of the Double Empathy Problem
2026-01-30T16:22:42Z
The double empathy problem recasts the difficulty of forming empathy bonds in social interactions between autistic and neurotypical individuals as a bidirectional problem, rather than due to a deficit exclusive to the person on the spectrum. However, no explicit mechanism to explain such a phenomenon has been proposed. Here we build a feedback-loop mathematical model that would theoretically induce the empathy degradation observed during communication in neurotypical-autistic pairs solely due to differences in communication preferences between neurotypical and neurodivergent individuals. Numerical simulations of dyadic interactions show the model, whose mechanism is based solely on communication preferences, can illustrate the breakdown of empathic bonding observed clinically.
Stability analysis of the model provides a way to predict the overall trajectory of the interaction in the empathy space. Furthermore, we suggest experimental designs to measure several parameters outlined here and discuss the future directions for testing the proposed model.
2026-01-30T16:22:42Z
16 pages, 5 figures
Enrique Calderoli, Maria Cristina Varriale, Flávio Kapczinski

http://arxiv.org/abs/2512.21881v3
SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis
2026-01-30T15:29:42Z
Foundation models are emerging as a powerful paradigm for fMRI analysis, but current approaches face a dual bottleneck of data- and training-efficiency. Atlas-based methods aggregate voxel signals into fixed regions of interest, reducing data dimensionality but discarding fine-grained spatial details, and requiring extremely large cohorts to train effectively as general-purpose foundation models. Atlas-free methods, on the other hand, operate directly on voxel-level information, preserving spatial fidelity, but are prohibitively memory- and compute-intensive, making large-scale pre-training infeasible. We introduce SLIM-Brain (Sample-efficient, Low-memory fMRI Foundation Model for Human Brain), a new atlas-free foundation model that simultaneously improves both data- and training-efficiency. SLIM-Brain adopts a two-stage adaptive design: (i) a lightweight temporal extractor captures global context across full sequences and ranks data windows by saliency, and (ii) a 4D hierarchical encoder (Hiera-JEPA) learns fine-grained voxel-level representations only from the top-$k$ selected windows, while discarding the roughly 70% of patches that are masked.
Extensive experiments across seven public benchmarks show that SLIM-Brain establishes new state-of-the-art performance on diverse tasks, while requiring only 4 thousand pre-training sessions and approximately 30% of the GPU memory compared to traditional voxel-level methods.
2025-12-26T06:10:31Z
release code
Mo Wang, Junfeng Xia, Wenhao Ye, Enyu Liu, Kaining Peng, Jianfeng Feng, Quanying Liu, Hongkai Wen