https://arxiv.org/api/a9PEwO8oyx0yiIU4efb9BvdCNOc
2026-06-21T22:37:51Z

http://arxiv.org/abs/2605.19352v1
Brain alignment of reasoning and action representations from vision-language and action models during naturalistic gameplay
2026-05-19T04:40:14Z
Understanding how humans and artificial intelligence systems predict and plan by interacting with their environment is a fundamental challenge at the intersection of neuroscience and machine learning. Most brain-encoding studies focus on aligning artificial models with brain activity during language comprehension or passive visual processing, while interactive brain-alignment studies have to date been largely limited to reinforcement-learning (RL) agents and theory-based models. To address this gap, we study brain alignment of representative models from two foundation-model families, namely vision-language models (VLMs) and large-action models (LAMs), using fMRI recordings from participants playing naturalistic Atari-style video games. Specifically, we examine how action-focused and reasoning-focused prompts shape the models' internal representations and their alignment with fMRI brain activity. First, we find that both VLMs and LAMs exhibit significantly higher voxel-wise encoding performance than RL baselines, with the advantage holding even under matched feature dimensionality. Second, prompt-driven gains scale with the cortical processing hierarchy: the largest improvements appear in frontal-parietal and motor-planning regions, while early visual cortex gains roughly half as much. Third, variance partitioning reveals a qualitatively different representational organization: the VLM is prompt-symmetric (12.5% unique action vs. 13.6% unique reasoning), whereas the LAM is prompt-asymmetric (27% unique action vs. -5% unique reasoning), with the asymmetry strongest in frontal-motor cortex.
Together, these results demonstrate that action-specialized fine-tuning reorganizes multimodal representations toward action-relevant neural computations even when whole-brain prediction accuracy is statistically equivalent between VLM and LAM.
2026-05-19T04:40:14Z
21 pages, 11 figures
Subba Reddy Oota, Anant Khandelwal, Khushbu Pahwa, Satya Sai Srinath Namburi, Tanmoy Chakraborty, Bapi S. Raju, Manish Gupta

http://arxiv.org/abs/2604.01341v2
Perceptual misalignment of texture representations in convolutional neural networks
2026-05-18T21:38:16Z
Mathematical modeling of visual textures traces back to Julesz's intuition that texture perception in humans is based on local correlations between image features. An influential approach for texture analysis and generation generalizes this notion to linear correlations between the nonlinear features computed by convolutional neural networks (CNNs), compiled into Gram matrices. Given that CNNs are often used as models for the visual system, it is natural to ask whether such "texture representations" spontaneously align with the textures' perceptual content, and in particular whether those CNNs that are regarded as better models for the visual system also possess more human-like texture representations. Here we quantify the perceptual content captured by feature correlations computed for a diverse pool of CNNs, and we compare it to the models' perceptual alignment with the mammalian visual system as measured by Brain-Score. Surprisingly, we find that there is no connection between conventional measures of CNN quality as a model of the visual system and its alignment with human texture perception.
We conclude that texture perception involves mechanisms that are distinct from those that are commonly modeled using approaches based on CNNs trained on object recognition, possibly depending on the integration of contextual information.
2026-04-01T19:51:45Z
Ludovica de Paolis, Fabio Anselmi, Alessio Ansuini, Eugenio Piasini

http://arxiv.org/abs/2605.19048v1
Conserved Kinematic Representations enable Zero-Shot Decoding in Handwriting BCIs
2026-05-18T19:13:42Z
While intracortical Brain-Computer Interfaces (iBCIs) that decode imagined handwriting have achieved high communication rates for Latin scripts, they rely on observing every character in the alphabet during training. This poses a challenge in scaling to logographic languages (e.g., Chinese, Japanese), where the character set spans thousands of classes. The limitation highlights a fundamental question in motor neuroscience: does the motor cortex represent handwriting through the composition of shared kinematic primitives that can be exploited by decoders? We introduce a computational framework for aligning neural activity to imagined kinematics in large datasets, enabling the training of a zero-shot capable machine learning algorithm for decoding unseen characters. Our model achieves 64% hits@3 retrieval on unseen letters, suggesting that neural representations of kinematic strokes are robustly conserved across different character contexts. This study provides a framework for dissecting conserved neural dynamics in large-scale intracortical datasets and offers strong evidence for a compositional basis of complex motor control.
It also establishes a new paradigm for open-vocabulary iBCI communication with minimal recalibration burden on the user, crucial to increasing adoption of neuroprosthetics in logographic languages.
2026-05-18T19:13:42Z
Srinivas Ravishankar, Virginia de Sa

http://arxiv.org/abs/2605.18616v1
Toward an Origin of Human Randomness: Interaction-Driven Enhancement in the Rock-Paper-Scissors Game
2026-05-18T16:25:55Z
Human-generated randomness is constrained by cognitive, motor, and strategic biases. This study examines how these constraints appear in individual behavior and how they may be modified through interaction with another human. We analyzed repeated rock-paper-scissors data from 9 participants, yielding 108 human-human matches and 216 individual player sequences. Using Lempel-Ziv complexity (LZC), we compared human-human sequences with the RNG-opponent condition. In the RNG-opponent condition, the maximum human LZC value was 84, which we used as an empirical reference. In the human-human condition, most sequences remained below this value, but a small number exceeded it, producing a small high-complexity tail that was not present in the RNG-opponent condition. We introduced a sensitivity measure that captures whether a player responds to the opponent's recent frequency bias by choosing the move that beats the opponent's most frequent recent move. Partial regression showed that focal-player sensitivity positively predicted future entropy in the opponent's move sequence after controlling for the opponent's current entropy. Circular-shift surrogate analyses indicated that this relation was most clearly interaction-specific when the opponent was in a low-entropy state, where the recent move distribution contained a clear frequency bias. These results suggest that human randomness is not only an isolated individual capacity, but can be shaped by interaction in a state-dependent manner.
The findings identify a local mechanism by which interaction may destabilize biased behavior and increase entropy, providing a concrete basis for future causal experiments and generative models of high-complexity human behavior.
2026-05-18T16:25:55Z
30 pages, 7 figures
Song-Ju Kim, Shoma Ohara, Hiroaki Kurokawa

http://arxiv.org/abs/2605.30368v1
Reinterpreting Safety Thresholds as Neuron Spiking Thresholds
2026-05-18T16:11:57Z
Surrogate Safety Measures (SSMs) are extensively utilised in the evaluation of traffic risk in automated driving contexts. However, the majority of SSM-based evaluations employ fixed thresholds that fail to capture the human response to sustained borderline conditions or the reaction to brief, high-risk peaks. The present work proposes a biologically inspired reinterpretation of SSM thresholds. The thresholds are modelled as spiking thresholds of leaky integrate-and-fire (LIF) neurons, with multiple SSM inputs combined into a spiking neural network (SNN). The SNN is trained to emit spikes that are aligned with human braking onsets. The training data were recorded in a controlled car-following experiment using the 3D-CoAutoSim platform with CARLA/Unreal and a 6-DOF motion platform, where induced critical events were generated. The results demonstrate that the learned spiking activity qualitatively aligns with braking behaviour across scenarios and captures reactions that are not consistently explained by threshold crossings alone. Analysis across participants further indicates that learned input thresholds remain relatively consistent, while learned decay factors encode different temporal sensitivities for the SSMs.
The findings of this study indicate that spiking dynamics may serve as a mechanism to facilitate the convergence of objective SSMs with subjective human safety perception.
2026-05-18T16:11:57Z
6 pages
Enrico Del Re, Mohamed Sabry, Cristina Olaverri-Monreal

http://arxiv.org/abs/2605.18557v1
Self-supervised local learning rules learn the hidden hierarchical structure of high-dimensional data
2026-05-18T15:37:28Z
The brain learns abstract representations of high-dimensional sensory input, but the plasticity rules that enable such learning are unknown. We study biologically plausible algorithms on the Random Hierarchy Model (RHM), an artificial dataset designed to investigate how deep neural networks learn the intrinsic hierarchical structure of high-dimensional data. We focus on two types of local learning rules that avoid both a long convergence time and the use of a symmetric error network. The first type uses direct feedback signals to approximate error propagation from the output layer. The second type uses layerwise self-supervised contrastive or non-contrastive loss functions that do not explicitly approximate errors at the output layer. We show that all rules of the first type fail to solve the tasks of the RHM and trace this failure back to input-specific nonlinearities ('masking') that are implemented in full backpropagation and are essential for learning complex tasks. However, algorithms of the second type are able to learn the hierarchical hidden structure of the RHM tasks and are as data-efficient as supervised backpropagation training, while being compatible with known rules of synaptic plasticity in cortex.
2026-05-18T15:37:28Z
Ariane Delrocq, Wu S. Zihan, Guillaume Bellec, Wulfram Gerstner

http://arxiv.org/abs/2605.18251v1
Subject-Specific Analysis of Self-Initiated Attention Shifts from EEG with Controlled Internal and External Attention Conditions
2026-05-18T11:52:22Z
Self-initiated attention shifts play a critical role in voluntary behavior but are difficult to study due to the absence of explicit temporal markers. While previous studies have examined their neural correlates, it remains unclear how multi-dimensional electroencephalography (EEG) features contribute to their characterization within an interpretable computational framework. In this study, we build on an experimental paradigm developed in our previous work, which enables controlled comparison between task-constrained self-initiated shifts and externally instructed shifts under identical visual stimulation. Within this setting, we investigate whether preparatory EEG activity can distinguish these two types of attention shifts. We adopt a machine learning-based approach and conduct two complementary analyses: (1) a performance-oriented assessment of frequency-specific topographic patterns, and (2) a model-based feature attribution analysis using SHapley Additive exPlanations (SHAP). These analyses provide a structured view of how spectral features across regions of interest contribute to model behavior. Our results demonstrate reliable within-subject classification performance, indicating that preparatory EEG activity contains subject-specific discriminative information within this paradigm. The analysis shows that higher-frequency bands and frontal regions contribute strongly to model decisions, although such contributions should be interpreted cautiously due to the potential influence of non-neural artifacts in high-frequency EEG signals.
Overall, this work highlights the value of interpretable machine learning for analyzing subject-specific EEG signal patterns in a controlled experimental setting, with potential applications in personalized and asynchronous brain-machine interface systems.
2026-05-18T11:52:22Z
Yuwen Zeng, Dengzhe Hou, Zhang Zhang, Sai Sun, Yongsong Huang, Chia-huei Tseng, Satoshi Shioiri

http://arxiv.org/abs/2605.18118v1
Functional Whole-Brain Models: A New Framework for Unifying Brain Structure and Cognitive Function
2026-05-18T09:26:38Z
Contemporary computational neuroscience features two prominent modeling traditions. Bottom-up whole-brain modeling (WBM) builds biophysically detailed simulations of brain structure and dynamics, whereas top-down neuroconnectionism optimizes deep neural networks for functional performance. Each has achieved remarkable success yet remains incomplete, with WBMs lacking functional competence and neuroconnectionist models showing limited biological grounding. Here we propose functional whole-brain models (fWBMs) as a unified modeling paradigm that integrates structural and dynamical realism with task-performing capacity. fWBMs are defined by four minimal criteria: structural grounding in empirical connectomes and regional biology, continuous-time dynamical realism, functional competence across cognitive domains, and mappable observables to neuroimaging, electrophysiological and behavioral data. To formalize this integration, we establish a three-pillar roadmap across short-, mid-, and long-term horizons, and outline the scientific and clinical opportunities this paradigm enables. We argue that the disciplined pursuit of this integrative vision will generate the tools, common language, and cross-scale hypotheses needed to advance our understanding of the brain.
2026-05-18T09:26:38Z
Mario Senden, Leonardo Dalla Porta, Jan Fousek, Jorge F. Mejias, Gorka Zamora-López

http://arxiv.org/abs/2605.12485v2
Letting the neural code speak: Automated characterization of monkey visual neurons through human language
2026-05-18T04:19:54Z
Understanding what individual neurons encode is a core question in neuroscience. In primary visual cortex (V1), mathematical models (e.g., Gabor functions) capture neural selectivity, but no comparable framework exists for higher areas. We show that natural language can fill this role: across macaque V1 and V4, the selectivity of most neurons is captured by concise, verifiable semantic descriptions. Using digital twins of V1 and V4, we develop a closed-loop framework that translates each neuron's high- and low-activating images into dense captions, generates a semantic hypothesis and synthesized images, and verifies the hypothesis in silico. Descriptions range from oriented edges and spatial frequency in V1 to conjunctions of form, color, and texture in V4. In V4, images generated from activating and suppressing hypotheses drove 96.1% of neurons above the 95th and 97.6% below the 5th percentile of natural-image responses, respectively (vs. ~10% for random images); V1 activation results matched V4, while V1 suppression was less describable in language. Representational similarity analysis reveals partial alignment between neural activity, vision embeddings, and language embeddings, with vision most aligned to neural activity; alignment lost in the text bottleneck is recovered when hypotheses are rendered back into images, showing that linguistic compression is lossy yet semantically faithful. Together, these results show that combining generative models with neural digital twins enables interpretable, testable descriptions of neural function at scale, toward agentic scientific discovery.
2026-05-12T17:58:22Z
Vedang Lad, Katrin Franke, Tamar Rott Shaham, Surya Ganguli, Andreas S. Tolias, Sophia Sanborn, Nikos Karantzas

http://arxiv.org/abs/2603.03190v3
Expectation and Acoustic Neural Network Representations Enhance Music Identification from Brain Activity
2026-05-18T03:37:10Z
During music listening, cortical activity encodes both acoustic and expectation-related information. Prior work has shown that ANN representations resemble cortical representations and can serve as supervisory signals for EEG recognition. Here we show that distinguishing acoustic and expectation-related ANN representations as teacher targets improves EEG-based music identification. Models pretrained to predict either representation outperform non-pretrained baselines, and combining them yields complementary gains that exceed strong seed ensembles formed by varying random initializations. These findings show that teacher representation type shapes downstream performance and that representation learning can be guided by neural encoding. This work points toward advances in predictive music cognition and neural decoding. Our expectation representation, computed directly from raw signals without manual labels, reflects predictive structure beyond onset or pitch, enabling investigation of multilayer predictive encoding across diverse stimuli. Its scalability to large, diverse datasets further suggests potential for developing general-purpose EEG models grounded in cortical encoding principles.
2026-03-03T17:47:09Z
47 pages, 12 figures
Shogo Noguchi, Taketo Akama, Tai Nakamura, Shun Minamikawa, Natalia Polouliakh

http://arxiv.org/abs/2603.09089v2
Sampling on Discrete Spaces with Temporal Point Processes
2026-05-17T23:58:13Z
Temporal point processes offer a powerful framework for sampling from discrete distributions, yet they remain underutilized in existing literature.
We show how to construct, for any target multivariate count distribution with downward-closed support, a multivariate temporal point process whose event-count vector in a fixed-length sliding window converges in distribution to the target as time tends to infinity. Structured as a system of potentially coupled infinite-server queues with deterministic service times, the sampler exhibits a discrete form of momentum that suppresses random-walk behaviour. The admissible families of processes permit both reversible and non-reversible dynamics. As an application, we derive a recurrent stochastic neural network whose dynamics implement sampling-based computation and exhibit some biologically plausible features, including relative refractory periods and oscillations. The introduction of auxiliary randomness reduces the sampler to a birth-death process, establishing the latter as a degenerate case with the same limiting distribution. In simulations on 63 target distributions, our sampler always outperforms these birth-death processes and frequently outperforms Zanella processes in multivariate effective sample size, with further gains when normalized by CPU time.
2026-03-10T01:58:49Z
20 pages, 1 figure. Minor revisions to wording, notation, and formatting. No substantive changes
Cameron A. Stewart (Gatsby Computational Neuroscience Unit, University College London, London, U.K.), Maneesh Sahani (Gatsby Computational Neuroscience Unit, University College London, London, U.K.)

http://arxiv.org/abs/2605.17399v1
Von Economo neurons enable reliable social skill acquisition in recurrent spiking neural networks: a computational account with clinical predictions
2026-05-17T11:39:58Z
Von Economo neurons (VENs) are selectively lost in behavioural-variant frontotemporal dementia (bvFTD) and reduced in autism spectrum conditions (ASC), yet their computational role in social learning remains unexplained.
We train a spiking neural network (the VENCircuit) embedding VEN-like projection neurons (K=40, 2% of total) in a recurrent pyramidal circuit across 50 matched random initialisations with and without VENs. The network is trained on a controlled binary classification task; we make no claim to model social cognition directly. VEN-intact networks converged in 49/50 cases (98%) versus 35/50 (70%) for VEN-ablated networks (Fisher's exact OR=21.0, 95% CI 2.7-167, p=8.7e-5). Failed ablated networks showed complete absence of learning, inconsistent with a speed-of-learning account. Phase-ablation experiments show VEN removal is most disruptive during mid-training (epochs 5-25), when a co-adaptive dependency forms in the pyramidal circuit. We derive a formal account showing VENs provide a direct gradient pathway immune to Jacobian instabilities affecting the recurrent circuit. Inference-time VEN ablation caused a significant performance drop (Wilcoxon p=0.022), ranging from no change (16/20 networks) to catastrophic collapse (0.989 to 0.620). VENs function as acquisition scaffolds whose developmental absence produces stochastic learning failure, a computational analogue of variable social skill acquisition in ASC, with falsifiable predictions for organoid and electrophysiology studies.
2026-05-17T11:39:58Z
21 pages, 5 figures, 4 tables
Esila Keskin

http://arxiv.org/abs/2603.03347v3
Efficient Coding Predicts Synaptic Conductance
2026-05-17T09:03:53Z
Synapses are information efficient in the sense that their natural conductance values convey as many bits per Joule as possible, but efficiency falls rapidly if the conductance is forced to deviate from its natural value (Harris et al, 2015). However, the exact manner in which efficiency falls as conductance deviates from its natural value remains unexplained. Recently, Malkin et al (2026) showed that synaptic noise is minimised given the available energy, consistent with a minimal energy boundary.
This minimal energy boundary is a necessary, but not sufficient, condition for maximising information efficiency. By expressing the minimal energy boundary in terms of Shannon's information theory (Shannon, 1949), we show that synapses operate at signal-to-noise ratios which maximise information efficiency, and that this accurately predicts the decrease in efficiency values observed in Harris et al (2015) across a wide range of synaptic conductances. Crucially, the proposed model contains no free parameters because it is derived from the biophysics of the synapse. The results reported here are consistent with the general principle that neuronal systems in the brain have evolved to be as efficient as possible in terms of the number of bits per Joule.
2026-02-25T15:51:39Z
James V Stone

http://arxiv.org/abs/2605.17199v1
Geometric Phase Transition Enables Extreme Hippocampal Memory Capacity
2026-05-16T23:55:17Z
Memory systems can store vastly different amounts of information despite similar hardware constraints. Here, we show that superior spatial memory emerges from a discrete stiffening of hippocampal population geometry: a transition from disorganized to crystalline collective coding. Comparing food-caching chickadees to non-caching zebra finches, we found that the caching hippocampus maintains a topologically rigid, "crystalline" geometry with significantly higher geometric stability (Shesha 0.245 v 0.166) and nearly two-fold greater temporal coherence (Shesha 0.393 v 0.209), while the non-caching hippocampus resembles a disorganized "mist." This stability is actively constructed by synergistic circuit dynamics: excitatory neurons form the spatial scaffold while inhibitory populations contribute orthogonal decorrelation, a circuit motif in which excitatory and inhibitory populations occupy largely non-overlapping representational subspaces.
A double dissociation with Valiant's Stable Memory Allocator, a model predicting that dedicated neuron ensembles underlie each memory, confirms this advantage reflects continuous topological organization rather than discrete neuron allocation: caching networks exhibit near-zero split-half allocation reliability despite their geometric superiority. Computational modeling across 10k configurations reveals topological rigidity as the mathematical prerequisite for scale: crystalline codes sustain high-fidelity readout beyond M=1k locations while mist codes fail below M=10, a >100-fold capacity advantage. This capacity requires a 169-fold representational redundancy: a "geometric tax" stabilizing the manifold against biological noise. These results establish geometric stability as a candidate organizing principle of biological memory: evolution achieves high-capacity memory not by proliferating neurons, but by engineering the geometry of the neural code itself.
2026-05-16T23:55:17Z
Prashant C. Raju

http://arxiv.org/abs/2605.17198v1
MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery
2026-05-16T23:53:43Z
To be useful for downstream applications, vision decoding models that are trained to reconstruct seen images from human brain activity must be able to generalize to internally generated visual representations, i.e., mental images. In an analysis of the recently released NSD-Imagery dataset, we demonstrated that while some modern vision decoders can perform quite well on mental image reconstruction, some fail, and that state-of-the-art (SOTA) performance on seen image reconstruction is no guarantee of SOTA performance on mental image reconstruction. Motivated by these findings, we developed MIRAGE, a method explicitly designed to train on vision datasets and cross-decode mental images from brain activity. MIRAGE employs a linear backbone and multi-modal text and image features as input to a diffusion model.
Feature metrics and human raters establish MIRAGE as SOTA for mental image reconstruction on the NSD-Imagery benchmark. With ablation analysis we show that mental image reconstruction works best when decoders use image features with relatively few dimensions and include guidance from text-based and both high- and low-level image-based features. Our work indicates that, given the right architecture, existing large-scale datasets using external stimuli are viable training data for decoding mental images, and warrants optimism about the future success and utility of mental image reconstruction.
2026-05-16T23:53:43Z
Reese Kneeland, Cesar Kadir Torrico Villanueva, Jordyn Ojeda, Shuhb Khanna, Jonathan Xu, Paul S. Scotti, Thomas Naselaris
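
One statistic quoted in the listing above can be checked by hand: the Von Economo neuron entry (2605.17399v1) reports convergence in 49/50 VEN-intact versus 35/50 VEN-ablated networks with "Fisher's exact OR=21.0". The sketch below is a minimal reader-side illustration, not the author's analysis code. It computes the sample odds ratio of that 2x2 table and a one-sided Fisher exact tail probability from the hypergeometric distribution. The function names are hypothetical, and the choice of a one-sided test is an assumption: the abstract does not say which tail convention was used.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value: probability that the top-left cell
    is >= a when all row and column totals are held fixed (hypergeometric null)."""
    row1, row2 = a + b, c + d      # group sizes
    col1 = a + c                   # total number of "successes"
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)        # largest feasible top-left count
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total


# Counts taken from the abstract: rows = VEN-intact, VEN-ablated;
# columns = converged, failed.
intact_ok, intact_fail = 49, 1
ablated_ok, ablated_fail = 35, 15

print(odds_ratio(intact_ok, intact_fail, ablated_ok, ablated_fail))  # 21.0
print(fisher_one_sided(intact_ok, intact_fail, ablated_ok, ablated_fail))
```

The odds ratio is (49 x 15) / (1 x 35) = 21.0, matching the reported value. Under the one-sided assumption the tail probability comes out on the order of 9e-5, in line with the quoted p=8.7e-5; a two-sided convention would give roughly twice that.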