https://arxiv.org/api/a9PEwO8oyx0yiIU4efb9BvdCNOc 2026-06-21T22:37:51Z 12181 210 15 http://arxiv.org/abs/2605.19352v1 Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay 2026-05-19T04:40:14Z

Understanding how humans and artificial intelligence systems predict and plan by interacting with their environment is a fundamental challenge at the intersection of neuroscience and machine learning. Most brain-encoding studies focus on aligning artificial models with brain activity during language comprehension or passive visual processing, while interactive brain-alignment studies have to date been largely limited to reinforcement-learning (RL) agents and theory-based models. To address this gap, we study brain alignment of representative models from two foundation-model families, namely vision-language models (VLMs) and large-action models (LAMs), using fMRI recordings from participants playing naturalistic Atari-style video games. Specifically, we examine how action-focused and reasoning-focused prompts shape the models' internal representations and their alignment with fMRI brain activity. First, we find that both VLMs and LAMs exhibit significantly higher voxel-wise encoding performance than RL baselines, with the advantage holding even under matched feature dimensionality. Second, prompt-driven gains scale with the cortical processing hierarchy: the largest improvements appear in frontal-parietal and motor-planning regions, while early visual cortex gains roughly half as much. Third, variance partitioning reveals a qualitatively different representational organization: VLM is prompt-symmetric (12.5% unique action vs. 13.6% unique reasoning), whereas LAM is prompt-asymmetric (27% unique action vs. -5% unique reasoning), with the asymmetry strongest in frontal-motor cortex. Together, these results demonstrate that action-specialized fine-tuning reorganizes multimodal representations toward action-relevant neural computations even when whole-brain prediction accuracy is statistically equivalent between VLM and LAM.
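
The variance partitioning mentioned in this abstract can be illustrated with a minimal sketch. It assumes ordinary least-squares encoding models, in-sample R², and synthetic data; the names `action_feats`, `reasoning_feats`, `r2_ols`, and `partition_variance` are hypothetical, and the abstract does not describe the paper's actual regression pipeline.

```python
import numpy as np

def r2_ols(X, y):
    """In-sample R^2 of a least-squares fit of y on X (intercept included)."""
    X1 = np.column_stack([X, np.ones(len(X))])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def partition_variance(A, B, y):
    """Split the joint R^2 into parts unique to A, unique to B, and shared."""
    r2_a = r2_ols(A, y)
    r2_b = r2_ols(B, y)
    r2_ab = r2_ols(np.hstack([A, B]), y)
    return {
        "unique_a": r2_ab - r2_b,       # what A adds on top of B
        "unique_b": r2_ab - r2_a,       # what B adds on top of A
        "shared": r2_a + r2_b - r2_ab,  # explained by either feature set
        "joint": r2_ab,
    }

# Synthetic stand-ins for prompt-conditioned features and one voxel (assumed).
rng = np.random.default_rng(0)
n = 500
action_feats = rng.normal(size=(n, 3))
reasoning_feats = rng.normal(size=(n, 3))
voxel = action_feats[:, 0] + 0.5 * reasoning_feats[:, 0] + 0.1 * rng.normal(size=n)

parts = partition_variance(action_feats, reasoning_feats, voxel)
print(parts)
```

With nested in-sample OLS fits the unique terms cannot be negative; negative unique contributions can appear when R² is estimated on held-out data, which this sketch does not do.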

2026-05-19T04:40:14Z 21 pages, 11 figures Subba Reddy Oota Anant Khandelwal Khushbu Pahwa Satya Sai Srinath Namburi Tanmoy Chakraborty Bapi S. Raju Manish Gupta http://arxiv.org/abs/2604.01341v2 Perceptual misalignment of texture representations in convolutional neural networks 2026-05-18T21:38:16Z

Mathematical modeling of visual textures traces back to Julesz's intuition that texture perception in humans is based on local correlations between image features. An influential approach for texture analysis and generation generalizes this notion to linear correlations between the nonlinear features computed by convolutional neural networks (CNNs), compiled into Gram matrices. Given that CNNs are often used as models for the visual system, it is natural to ask whether such "texture representations" spontaneously align with the textures' perceptual content, and in particular whether those CNNs that are regarded as better models for the visual system also possess more human-like texture representations. Here we quantify the perceptual content captured by feature correlations computed for a diverse pool of CNNs, and we compare it to the models' perceptual alignment with the mammalian visual system as measured by Brain-Score. Surprisingly, we find that there is no connection between conventional measures of CNN quality as a model of the visual system and its alignment with human texture perception. We conclude that texture perception involves mechanisms that are distinct from those that are commonly modeled using approaches based on CNNs trained on object recognition, possibly depending on the integration of contextual information.
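
As a rough illustration of the Gram-matrix texture representation described here, the sketch below computes channel-by-channel feature correlations for one layer. The random array stands in for real CNN activations, and the normalization by the number of spatial positions is an assumption; the abstract does not state the exact convention used.

```python
import numpy as np

def gram_matrix(feature_maps):
    """Map (channels, height, width) activations to a (channels, channels) Gram matrix."""
    c, h, w = feature_maps.shape
    flat = feature_maps.reshape(c, h * w)  # one row per channel
    return flat @ flat.T / (h * w)         # average over spatial positions

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))      # stand-in for one CNN layer's output
G = gram_matrix(features)

# Shuffling spatial positions (identically across channels) leaves G unchanged:
# the Gram matrix keeps feature co-occurrence but discards spatial layout.
perm = rng.permutation(64)
shuffled = features.reshape(4, -1)[:, perm].reshape(4, 8, 8)
print(np.allclose(G, gram_matrix(shuffled)))
```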

2026-04-01T19:51:45Z Ludovica de Paolis Fabio Anselmi Alessio Ansuini Eugenio Piasini http://arxiv.org/abs/2605.19048v1 Conserved Kinematic Representations enable Zero-Shot Decoding in Handwriting BCIs 2026-05-18T19:13:42Z

While intracortical Brain-Computer Interfaces (iBCIs) that decode imagined handwriting have achieved high communication rates for Latin scripts, they rely on observing every character in the alphabet during training. This poses a challenge in scaling to logographic languages (e.g., Chinese, Japanese), where the character set runs to thousands of classes. The limitation highlights a fundamental question in motor neuroscience: does the motor cortex represent handwriting through the composition of shared kinematic primitives that can be exploited by decoders? We introduce a computational framework for aligning neural activity to imagined kinematics in large datasets, enabling the training of a zero-shot capable machine learning algorithm for decoding unseen characters. Our model achieves 64% hits@3 retrieval on unseen letters, suggesting that neural representations of kinematic strokes are robustly conserved across different character contexts. This study provides a framework for dissecting conserved neural dynamics in large-scale intracortical datasets and offers strong evidence for a compositional basis of complex motor control. It also establishes a new paradigm for open-vocabulary iBCI communication with minimal recalibration burden on the user, crucial to increasing adoption of neuroprosthetics in logographic languages.
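
The hits@3 retrieval metric quoted in this abstract can be sketched as follows. The candidate rankings and target letters are made-up toy data, and `hits_at_k` is a hypothetical helper; the paper's evaluation code is not shown in the abstract.

```python
def hits_at_k(ranked_candidates, targets, k=3):
    """Fraction of trials whose true label appears among the top-k ranked candidates."""
    hits = sum(t in ranked[:k] for ranked, t in zip(ranked_candidates, targets))
    return hits / len(targets)

# Each inner list is a decoder's ranking for one trial, best guess first (toy data).
ranked = [
    ["a", "o", "e", "c"],
    ["l", "t", "i", "j"],
    ["m", "n", "w", "u"],
    ["q", "g", "p", "y"],
]
truth = ["e", "i", "u", "g"]

score = hits_at_k(ranked, truth, k=3)
print(score)
```

In this toy case three of the four true letters fall within the top three guesses, so the score is 0.75.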

2026-05-18T19:13:42Z Srinivas Ravishankar Virginia de Sa http://arxiv.org/abs/2605.18616v1 Toward an Origin of Human Randomness: Interaction-Driven Enhancement in the Rock-Paper-Scissors Game 2026-05-18T16:25:55Z

Human-generated randomness is constrained by cognitive, motor, and strategic biases. This study examines how these constraints appear in individual behavior and how they may be modified through interaction with another human. We analyzed repeated rock-paper-scissors data from 9 participants, yielding 108 human-human matches and 216 individual player sequences. Using Lempel-Ziv complexity (LZC), we compared human-human sequences with the RNG-opponent condition. In the RNG-opponent condition, the maximum human LZC value was 84, which we used as an empirical reference. In the human-human condition, most sequences remained below this value, but a small number exceeded it, producing a small high-complexity tail that was not present in the RNG-opponent condition. We introduced a sensitivity measure that captures whether a player responds to the opponent's recent frequency bias by choosing the move that beats the opponent's most frequent recent move. Partial regression showed that focal-player sensitivity positively predicted future entropy in the opponent's move sequence after controlling for the opponent's current entropy. Circular-shift surrogate analyses indicated that this relation was most clearly interaction-specific when the opponent was in a low-entropy state, where the recent move distribution contained a clear frequency bias. These results suggest that human randomness is not only an isolated individual capacity, but can be shaped by interaction in a state-dependent manner. The findings identify a local mechanism by which interaction may destabilize biased behavior and increase entropy, providing a concrete basis for future causal experiments and generative models of high-complexity human behavior.
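
To make the complexity measure concrete, here is a minimal sketch of a Lempel-Ziv phrase count over a move sequence. It assumes LZ78-style incremental parsing; the abstract does not say which LZC variant the authors use, so values from this sketch are not comparable to the reported reference value of 84. The function name `lz78_complexity` is hypothetical.

```python
def lz78_complexity(seq):
    """Count phrases in an LZ78-style parse: each phrase is the shortest
    prefix of the remaining sequence not yet seen as a phrase."""
    phrases = set()
    current = ()
    count = 0
    for symbol in seq:
        current += (symbol,)
        if current not in phrases:
            phrases.add(current)
            count += 1
            current = ()
    if current:  # trailing, already-seen fragment counts as one more phrase
        count += 1
    return count

repetitive = "RRRRRRRRR"  # always rock
cyclic = "RPSRPSRPS"      # fixed rock-paper-scissors cycle
print(lz78_complexity(repetitive), lz78_complexity(cyclic))
```

More regular sequences parse into fewer phrases, which is why a low count indicates a more predictable player.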

2026-05-18T16:25:55Z 30 pages, 7 figures Song-Ju Kim Shoma Ohara Hiroaki Kurokawa http://arxiv.org/abs/2605.30368v1 Reinterpreting Safety Thresholds as Neuron Spiking Thresholds 2026-05-18T16:11:57Z

Surrogate Safety Measures (SSMs) are extensively utilised in the evaluation of traffic risk in automated driving contexts. However, the majority of SSM-based evaluations employ fixed thresholds that fail to capture the human response to sustained borderline conditions or the reaction to brief, high-risk peaks. The present work proposes a biologically inspired reinterpretation of SSM thresholds. This is modelled as spiking thresholds of leaky integrate-and-fire (LIF) neurons, with multiple SSM inputs combined into a spiking neural network (SNN). The SNN is trained to emit spikes that are aligned with human braking onsets. The training data was recorded in a controlled car-following experiment using the 3D-CoAutoSim platform with CARLA/Unreal and a 6-DOF motion platform, where induced critical events were generated. The results demonstrate that the learned spiking activity qualitatively aligns with braking behaviour across scenarios and captures reactions that are not consistently explained by threshold crossings alone. Analysis across participants further indicates that learned input thresholds remain relatively consistent, while learned decay factors encode different temporal sensitivities for the SSMs. The findings of this study indicate that spiking dynamics may serve as a mechanism to facilitate the convergence of objective SSMs with subjective human safety perception.
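
The contrast between a fixed threshold and a leaky integrate-and-fire threshold can be sketched in a few lines. This is a single discrete-time LIF unit with hand-picked parameters, assuming a risk signal scaled so that larger values mean higher risk; the paper's trained SNN, its inputs, and its learned thresholds and decay factors are not reproduced here.

```python
def lif_spikes(inputs, threshold=1.0, decay=0.8):
    """Discrete-time leaky integrate-and-fire: leak, integrate, spike, reset."""
    v = 0.0
    spikes = []
    for x in inputs:
        v = decay * v + x          # leak the old potential, add the new input
        if v >= threshold:
            spikes.append(1)
            v = 0.0                # reset after a spike
        else:
            spikes.append(0)
    return spikes

sustained = [0.3] * 10                  # borderline risk that never exceeds 1.0
brief_peak = [0.0, 0.0, 1.2, 0.0, 0.0]  # short, high-risk event

fixed_threshold_fires = any(x >= 1.0 for x in sustained)
print(fixed_threshold_fires, lif_spikes(sustained), lif_spikes(brief_peak))
```

The sustained input never crosses the fixed threshold, yet the integrating unit eventually spikes, while the brief peak triggers an immediate spike; the decay factor controls how long past risk is remembered.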

2026-05-18T16:11:57Z 6 pages Enrico Del Re Mohamed Sabry Cristina Olaverri-Monreal http://arxiv.org/abs/2605.18557v1 Self-supervised local learning rules learn the hidden hierarchical structure of high-dimensional data 2026-05-18T15:37:28Z

The brain learns abstract representations of high-dimensional sensory input, but the plasticity rules that enable such learning are unknown. We study biologically plausible algorithms on the Random Hierarchy Model (RHM), an artificial dataset designed to investigate how deep neural networks learn the intrinsic hierarchical structure of high-dimensional data. We focus on two types of local learning rules that avoid both a long convergence time and the use of a symmetric error network. The first type uses direct feedback signals to approximate error propagation from the output layer. The second type uses layerwise self-supervised contrastive or non-contrastive loss functions that do not explicitly approximate errors at the output layer. We show that all rules of the first type fail to solve the tasks of the RHM and trace this failure back to input-specific nonlinearities ('masking') that are implemented in full backpropagation and are essential for learning complex tasks. However, algorithms of the second type are able to learn the hierarchical hidden structure of the RHM tasks and are as data-efficient as supervised backpropagation training, while being compatible with known rules of synaptic plasticity in cortex.

2026-05-18T15:37:28Z Ariane Delrocq Wu S. Zihan Guillaume Bellec Wulfram Gerstner http://arxiv.org/abs/2605.18251v1 Subject-Specific Analysis of Self-Initiated Attention Shifts from EEG with Controlled Internal and External Attention Conditions 2026-05-18T11:52:22Z

Self-initiated attention shifts play a critical role in voluntary behavior but are difficult to study due to the absence of explicit temporal markers. While previous studies have examined their neural correlates, it remains unclear how multi-dimensional electroencephalography (EEG) features contribute to their characterization within an interpretable computational framework. In this study, we build on an experimental paradigm developed in our previous work, which enables controlled comparison between task-constrained self-initiated shifts and externally instructed shifts under identical visual stimulation. Within this setting, we investigate whether preparatory EEG activity can distinguish these two types of attention shifts. We adopt a machine learning-based approach and conduct two complementary analyses: (1) a performance-oriented assessment of frequency-specific topographic patterns, and (2) a model-based feature attribution analysis using SHapley Additive exPlanations (SHAP). These analyses provide a structured view of how spectral features across regions of interest contribute to model behavior. Our results demonstrate reliable within-subject classification performance, indicating that preparatory EEG activity contains subject-specific discriminative information within this paradigm. The analysis shows that higher-frequency bands and frontal regions contribute strongly to model decisions, although such contributions should be interpreted cautiously due to the potential influence of non-neural artifacts in high-frequency EEG signals. Overall, this work highlights the value of interpretable machine learning for analyzing subject-specific EEG signal patterns in a controlled experimental setting, with potential applications in personalized and asynchronous brain-machine interface systems.
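
The feature attribution idea behind SHAP can be illustrated with an exact Shapley computation on a tiny model. This sketch enumerates all feature orderings and replaces absent features with a baseline value; it does not use the `shap` library, and the scoring function and the band-power feature names (theta, alpha, gamma) are hypothetical.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)       # start with every feature "absent"
        prev = model(current)
        for i in order:
            current[i] = x[i]          # switch feature i on
            new = model(current)
            phi[i] += new - prev
            prev = new
    return [p / len(orders) for p in phi]

def toy_classifier_score(features):
    """Hypothetical decision score with one interaction term."""
    theta, alpha, gamma = features
    return 0.5 * theta + 1.0 * alpha + 2.0 * gamma + 1.5 * alpha * gamma

x = [1.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_classifier_score, x, baseline)
print(phi)
```

The attributions sum to the difference between the model output at `x` and at the baseline, and the interaction term is split evenly between the two features involved.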

2026-05-18T11:52:22Z Yuwen Zeng Dengzhe Hou Zhang Zhang Sai Sun Yongsong Huang Chia-huei Tseng Satoshi Shioiri http://arxiv.org/abs/2605.18118v1 Functional Whole-Brain Models: A New Framework for Unifying Brain Structure and Cognitive Function 2026-05-18T09:26:38Z

Contemporary computational neuroscience features two prominent modeling traditions. Bottom-up whole-brain modeling (WBM) builds biophysically detailed simulations of brain structure and dynamics, whereas top-down neuroconnectionism optimizes deep neural networks for functional performance. Each has achieved remarkable success yet remains incomplete, with WBMs lacking functional competence and neuroconnectionist models showing limited biological grounding. Here we propose functional whole-brain models (fWBMs) as a unified modeling paradigm that integrates structural and dynamical realism with task-performing capacity. fWBMs are defined by four minimal criteria: structural grounding in empirical connectomes and regional biology, continuous-time dynamical realism, functional competence across cognitive domains, and mappable observables to neuroimaging, electrophysiological and behavioral data. To formalize this integration, we establish a three-pillar roadmap across short-, mid-, and long-term horizons, and outline the scientific and clinical opportunities this paradigm enables. We argue that the disciplined pursuit of this integrative vision will generate the tools, common language, and cross-scale hypotheses needed to advance our understanding of the brain.

2026-05-18T09:26:38Z Mario Senden Leonardo Dalla Porta Jan Fousek Jorge F. Mejias Gorka Zamora-López http://arxiv.org/abs/2605.12485v2 Letting the neural code speak: Automated characterization of monkey visual neurons through human language 2026-05-18T04:19:54Z

Understanding what individual neurons encode is a core question in neuroscience. In primary visual cortex (V1), mathematical models (e.g., Gabor functions) capture neural selectivity, but no comparable framework exists for higher areas. We show that natural language can fill this role: across macaque V1 and V4, the selectivity of most neurons is captured by concise, verifiable semantic descriptions. Using digital twins of V1 and V4, we develop a closed-loop framework that translates each neuron's high- and low-activating images into dense captions, generates a semantic hypothesis and synthesized images, and verifies the hypothesis in silico. Descriptions range from oriented edges and spatial frequency in V1 to conjunctions of form, color, and texture in V4. In V4, images generated from activating and suppressing hypotheses drove 96.1% of neurons above the 95th and 97.6% below the 5th percentile of natural-image responses, respectively (vs. ~10% for random images); V1 activation results matched V4, while V1 suppression was less describable in language. Representational similarity analysis reveals partial alignment between neural activity, vision embeddings, and language embeddings, with vision most aligned to neural activity; alignment lost in the text bottleneck is recovered when hypotheses are rendered back into images, showing that linguistic compression is lossy yet semantically faithful. Together, these results show that combining generative models with neural digital twins enables interpretable, testable descriptions of neural function at scale, toward agentic scientific discovery.
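
The percentile-based verification reported in this abstract can be sketched as below. All responses are synthetic random numbers, and the function name and array shapes are assumptions; the digital-twin models and image generation steps are not reproduced.

```python
import numpy as np

def fraction_beyond_percentiles(natural, activating, suppressing):
    """natural: (neurons, images) responses to natural images.
    activating / suppressing: (neurons,) responses to hypothesis-driven images.
    Returns the fraction of neurons driven above their own 95th percentile
    and the fraction driven below their own 5th percentile."""
    upper = np.percentile(natural, 95, axis=1)
    lower = np.percentile(natural, 5, axis=1)
    return float(np.mean(activating > upper)), float(np.mean(suppressing < lower))

rng = np.random.default_rng(0)
natural = rng.normal(size=(100, 1000))                  # 100 neurons, 1000 images
activating = rng.normal(loc=3.0, scale=0.2, size=100)   # strongly driving images
suppressing = rng.normal(loc=-3.0, scale=0.2, size=100) # strongly suppressing images

frac_up, frac_down = fraction_beyond_percentiles(natural, activating, suppressing)
print(frac_up, frac_down)
```

Thresholds are computed per neuron, so each neuron is compared against its own natural-image response distribution rather than a population-wide cutoff.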

2026-05-12T17:58:22Z Vedang Lad Katrin Franke Tamar Rott Shaham Surya Ganguli Andreas S. Tolias Sophia Sanborn Nikos Karantzas http://arxiv.org/abs/2603.03190v3 Expectation and Acoustic Neural Network Representations Enhance Music Identification from Brain Activity 2026-05-18T03:37:10Z

During music listening, cortical activity encodes both acoustic and expectation-related information. Prior work has shown that ANN representations resemble cortical representations and can serve as supervisory signals for EEG recognition. Here we show that distinguishing acoustic and expectation-related ANN representations as teacher targets improves EEG-based music identification. Models pretrained to predict either representation outperform non-pretrained baselines, and combining them yields complementary gains that exceed strong seed ensembles formed by varying random initializations. These findings show that teacher representation type shapes downstream performance and that representation learning can be guided by neural encoding. This work points toward advances in predictive music cognition and neural decoding. Our expectation representation, computed directly from raw signals without manual labels, reflects predictive structure beyond onset or pitch, enabling investigation of multilayer predictive encoding across diverse stimuli. Its scalability to large, diverse datasets further suggests potential for developing general-purpose EEG models grounded in cortical encoding principles.

2026-03-03T17:47:09Z 47 pages, 12 figures Shogo Noguchi Taketo Akama Tai Nakamura Shun Minamikawa Natalia Polouliakh http://arxiv.org/abs/2603.09089v2 Sampling on Discrete Spaces with Temporal Point Processes 2026-05-17T23:58:13Z

Temporal point processes offer a powerful framework for sampling from discrete distributions, yet they remain underutilized in existing literature. We show how to construct, for any target multivariate count distribution with downward-closed support, a multivariate temporal point process whose event-count vector in a fixed-length sliding window converges in distribution to the target as time tends to infinity. Structured as a system of potentially coupled infinite-server queues with deterministic service times, the sampler exhibits a discrete form of momentum that suppresses random-walk behaviour. The admissible families of processes permit both reversible and non-reversible dynamics. As an application, we derive a recurrent stochastic neural network whose dynamics implement sampling-based computation and exhibit some biologically plausible features, including relative refractory periods and oscillations. The introduction of auxiliary randomness reduces the sampler to a birth-death process, establishing the latter as a degenerate case with the same limiting distribution. In simulations on 63 target distributions, our sampler always outperforms these birth-death processes and frequently outperforms Zanella processes in multivariate effective sample size, with further gains when normalized by CPU time.
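
The simplest uncoupled instance of this idea can be checked numerically: for a homogeneous Poisson process with rate λ (equivalently, a single infinite-server queue with deterministic service time T), the number of events in a window of length T is Poisson with mean λT. The sketch below simulates only that special case with arbitrary parameters; the paper's general construction for arbitrary count targets and coupled queues is not reproduced.

```python
import random
from statistics import mean, pvariance

def poisson_event_times(rate, horizon, rng):
    """Event times of a homogeneous Poisson process on [0, horizon)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)   # exponential inter-event gaps
        if t >= horizon:
            return times
        times.append(t)

def window_counts(times, window, horizon):
    """Read the window count at t = window, 2*window, ... (disjoint windows)."""
    n_windows = int(horizon // window)
    counts = [0] * n_windows
    for t in times:
        k = int(t // window)
        if k < n_windows:
            counts[k] += 1
    return counts

rng = random.Random(0)
rate, window, horizon = 2.0, 1.5, 3000.0
counts = window_counts(poisson_event_times(rate, horizon, rng), window, horizon)

# For a Poisson target, mean and variance should both be close to rate * window.
print(mean(counts), pvariance(counts))
```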

2026-03-10T01:58:49Z 20 pages, 1 figure. Minor revisions to wording, notation, and formatting. No substantive changes Cameron A. Stewart Gatsby Computational Neuroscience Unit, University College London, London, U.K Maneesh Sahani Gatsby Computational Neuroscience Unit, University College London, London, U.K http://arxiv.org/abs/2605.17399v1 Von Economo neurons enable reliable social skill acquisition in recurrent spiking neural networks: a computational account with clinical predictions 2026-05-17T11:39:58Z

Von Economo neurons (VENs) are selectively lost in behavioural-variant frontotemporal dementia (bvFTD) and reduced in autism spectrum conditions (ASC), yet their computational role in social learning remains unexplained. We train a spiking neural network (the VENCircuit) embedding VEN-like projection neurons (K=40, 2% of total) in a recurrent pyramidal circuit across 50 matched random initialisations with and without VENs. The network is trained on a controlled binary classification task; we make no claim to model social cognition directly. VEN-intact networks converged in 49/50 cases (98%) versus 35/50 (70%) for VEN-ablated networks (Fisher's exact OR=21.0, 95% CI 2.7-167, p=8.7e-5). Failed ablated networks showed complete absence of learning, inconsistent with a speed-of-learning account. Phase-ablation experiments show VEN removal is most disruptive during mid-training (epochs 5-25), when a co-adaptive dependency forms in the pyramidal circuit. We derive a formal account showing VENs provide a direct gradient pathway immune to Jacobian instabilities affecting the recurrent circuit. Inference-time VEN ablation caused a significant performance drop (Wilcoxon p=0.022), ranging from no change (16/20 networks) to catastrophic collapse (0.989 to 0.620). VENs function as acquisition scaffolds whose developmental absence produces stochastic learning failure - a computational analogue of variable social skill acquisition in ASC - with falsifiable predictions for organoid and electrophysiology studies.
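
The convergence comparison reported here (49/50 vs. 35/50) forms a 2x2 table, and the odds ratio can be recomputed directly. The sketch below also shows a Woolf (log-scale normal approximation) confidence interval and Fisher's exact test from the hypergeometric distribution; the abstract does not state which interval method or which sidedness the authors used, so those choices are assumptions.

```python
from math import comb, exp, log, sqrt

def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def woolf_ci(a, b, c, d, z=1.959964):
    """Approximate 95% CI for the odds ratio on the log scale (Woolf method)."""
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    centre = log(odds_ratio(a, b, c, d))
    return exp(centre - z * se), exp(centre + z * se)

def fisher_exact(a, b, c, d):
    """Return (upper-tail, two-sided) Fisher exact p-values."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    upper = sum(prob(x) for x in range(a, hi + 1))
    two_sided = sum(prob(x) for x in range(lo, hi + 1)
                    if prob(x) <= p_obs * (1 + 1e-9))
    return upper, two_sided

# Rows: VEN-intact, VEN-ablated; columns: converged, failed.
a, b, c, d = 49, 1, 35, 15
or_value = odds_ratio(a, b, c, d)
ci_low, ci_high = woolf_ci(a, b, c, d)
p_upper, p_two_sided = fisher_exact(a, b, c, d)
print(or_value, ci_low, ci_high, p_upper, p_two_sided)
```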

2026-05-17T11:39:58Z 21 pages, 5 figures, 4 tables Esila Keskin http://arxiv.org/abs/2603.03347v3 Efficient Coding Predicts Synaptic Conductance 2026-05-17T09:03:53Z

Synapses are information efficient in the sense that their natural conductance values convey as many bits per Joule as possible, but efficiency falls rapidly if the conductance is forced to deviate from its natural value (Harris et al, 2015). However, the exact manner in which efficiency falls as conductance deviates from its natural value remains unexplained. Recently, Malkin et al (2026) showed that synaptic noise is minimised given the available energy, consistent with a minimal energy boundary. This minimal energy boundary is a necessary, but not sufficient, condition for maximising information efficiency. By expressing the minimal energy boundary in terms of Shannon's information theory (Shannon, 1949), we show that synapses operate at signal-to-noise ratios which maximise information efficiency, and that this accurately predicts the decrease in efficiency values observed in Harris et al (2015) across a wide range of synaptic conductances. Crucially, the proposed model contains no free parameters because it is derived from the biophysics of the synapse. The results reported here are consistent with the general principle that neuronal systems in the brain have evolved to be as efficient as possible in terms of the number of bits per Joule.

2026-02-25T15:51:39Z James V Stone http://arxiv.org/abs/2605.17199v1 Geometric Phase Transition Enables Extreme Hippocampal Memory Capacity 2026-05-16T23:55:17Z

Memory systems can store vastly different amounts of information despite similar hardware constraints. Here, we show that superior spatial memory emerges from a discrete stiffening of hippocampal population geometry: a transition from disorganized to crystalline collective coding. Comparing food-caching chickadees to non-caching zebra finches, we found that the caching hippocampus maintains a topologically rigid, "crystalline" geometry with significantly higher geometric stability (Shesha 0.245 v 0.166) and nearly two-fold greater temporal coherence (Shesha 0.393 v 0.209), while the non-caching hippocampus resembles a disorganized "mist." This stability is actively constructed by synergistic circuit dynamics: excitatory neurons form the spatial scaffold while inhibitory populations contribute orthogonal decorrelation, a circuit motif in which excitatory and inhibitory populations occupy largely non-overlapping representational subspaces. A double dissociation with Valiant's Stable Memory Allocator, a model predicting that dedicated neuron ensembles underlie each memory, confirms this advantage reflects continuous topological organization rather than discrete neuron allocation: caching networks exhibit near-zero split-half allocation reliability despite their geometric superiority. Computational modeling across 10k configurations reveals topological rigidity as the mathematical prerequisite for scale: crystalline codes sustain high-fidelity readout beyond M=1k locations while mist codes fail below M=10, a >100-fold capacity advantage. This capacity requires a 169-fold representational redundancy: a "geometric tax" stabilizing the manifold against biological noise. These results establish geometric stability as a candidate organizing principle of biological memory: evolution achieves high-capacity memory not by proliferating neurons, but by engineering the geometry of the neural code itself.

2026-05-16T23:55:17Z Prashant C. Raju http://arxiv.org/abs/2605.17198v1 MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery 2026-05-16T23:53:43Z

To be useful for downstream applications, vision decoding models that are trained to reconstruct seen images from human brain activity must be able to generalize to internally generated visual representations, i.e., mental images. In an analysis of the recently released NSD-Imagery dataset, we demonstrated that while some modern vision decoders can perform quite well on mental image reconstruction, some fail, and that state-of-the-art (SOTA) performance on seen image reconstruction is no guarantee of SOTA performance on mental image reconstruction. Motivated by these findings, we developed MIRAGE, a method explicitly designed to train on vision datasets and cross-decode mental images from brain activity. MIRAGE employs a linear backbone and multi-modal text and image features as input to a diffusion model. Feature metrics and human raters establish MIRAGE as SOTA for mental image reconstruction on the NSD-Imagery benchmark. With ablation analysis we show that mental image reconstruction works best when decoders use image features with relatively few dimensions and include guidance from text-based and both high- and low-level image-based features. Our work indicates that--given the right architecture--existing large-scale datasets using external stimuli are viable training data for decoding mental images, and warrant optimism about the future success and utility of mental image reconstruction.

2026-05-16T23:53:43Z Reese Kneeland Cesar Kadir Torrico Villanueva Jordyn Ojeda Shuhb Khanna Jonathan Xu Paul S. Scotti Thomas Naselaris