https://arxiv.org/api/kn41r8BkJkkoEDJzuuCNy+k8uF0 2026-06-22T02:27:45Z 12181 255 15 http://arxiv.org/abs/2509.02139v7 On sources to variabilities of simple cells in the primary visual cortex: A principled theory for the interaction between geometric image transformations and receptive field responses 2026-05-13T08:51:00Z

This paper gives an overview of a theory for modelling the interaction between geometric image transformations and receptive field responses for a visual observer that views objects and spatio-temporal events in the environment. This treatment is developed over combinations of (i) uniform spatial scaling transformations, (ii) spatial affine transformations, (iii) Galilean transformations and (iv) temporal scaling transformations. By postulating that the family of receptive fields should be covariant under these classes of geometric image transformations, it follows that the receptive field shapes should be expanded over the degrees of freedom of the corresponding image transformations, to enable a formal matching between the receptive field responses computed under different viewing conditions for the same scene or for a structurally similar spatio-temporal event. We conclude the treatment by discussing and providing potential support for a working hypothesis that the receptive fields of simple cells in the primary visual cortex ought to be covariant under these classes of geometric image transformations, and thus have the shapes of their receptive fields expanded over the degrees of freedom of the corresponding geometric image transformations.
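
A minimal numerical sketch of covariance under one of the four transformation classes, uniform spatial scaling. It assumes 1D Gaussian smoothing kernels whose variance s is the scale parameter, which is the Gaussian scale-space setting that Lindeberg's receptive field models build on. If the image is rescaled as f'(x) = f(Sx), the responses should match as L'(x; s) = L(Sx; S²s): the receptive field at scale s applied to the transformed image corresponds to the one at scale S²s applied to the original. The test signal `f`, the grid, and the values of `S` and `s` are arbitrary illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(s, dx):
    """Sampled, normalised 1D Gaussian kernel whose variance s is the scale parameter."""
    n = int(np.ceil(6.0 * np.sqrt(s) / dx))   # truncate at 6 standard deviations
    u = np.arange(-n, n + 1) * dx              # odd length, centred on 0
    g = np.exp(-u**2 / (2.0 * s))
    return g / g.sum()

def smooth(f_samples, s, dx):
    """Scale-space representation L(.; s) = g(.; s) * f on a uniform grid."""
    return np.convolve(f_samples, gaussian_kernel(s, dx), mode="same")

def f(u):
    """Hypothetical toy 1D 'image': a unit-height Gaussian blob."""
    return np.exp(-u**2 / 2.0)

dx = 0.01
x = np.arange(-1000, 1001) * dx   # grid on [-10, 10]
S, s = 2.0, 0.5                   # spatial scaling factor and receptive-field scale

# Left side: smooth the transformed image f'(x) = f(S x) at scale s.
L_transformed = smooth(f(S * x), s, dx)

# Right side: smooth the original image at the matched scale S^2 s, then sample at S x.
L_original = smooth(f(x), S**2 * s, dx)
L_matched = np.interp(S * x, x, L_original)

inner = np.abs(x) <= 4.0          # stay away from the grid boundary
max_err = np.max(np.abs(L_transformed[inner] - L_matched[inner]))
print(f"max |L'(x; s) - L(Sx; S^2 s)| = {max_err:.2e}")
```

The scale parameter has to be transformed together with the image (s to S²s); matching responses at the same s would fail, which is why the family of receptive fields must be expanded over the scale dimension.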

2025-09-02T09:41:55Z 40 pages, 19 figures Tony Lindeberg http://arxiv.org/abs/2605.12999v1 Implicit Behavioral Decoding from Next-Step Spike Forecasts at Population Scale 2026-05-13T04:55:03Z

Closed-loop brain-computer interfaces often require both a forecast of upcoming neural population activity and a readout of the animal's behavioral state. A single Mamba forecaster, trained only on next-step spike counts at Neuropixels scale, can deliver both in one forward pass. A lightweight per-session linear head reading the model's predicted rates decodes behavior better than the same linear classifier reading the raw spike counts, under matched temporal context. We test on the Steinmetz visual-discrimination benchmark, which spans 39 sessions, roughly 27,000 neurons, and 1,994 held-out trials. Across three training seeds, Mamba's predicted rates decode mouse choice at 75.7$\pm$0.2% trial-vote accuracy, roughly 2.3 times chance level, and stimulus side at 66.1$\pm$0.6%, about twice chance. Compared with a matched 500 ms-context linear decoder on the raw spike counts, the Mamba readout is 4-6 pp more accurate by trial vote on both response and stimulus side. A session-start calibration block of about 100-150 trials brings the readout within 1-2 pp of asymptote, and the full pipeline fits inside the 50 ms bin budget on workstation-class GPUs typical of tethered chronic Neuropixels recordings.
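
The abstract reports accuracy by "trial vote". A minimal sketch, assuming this means a majority vote over the per-bin outputs of the linear head within each trial; the aggregation rule, tie-breaking, integer class coding, and the function name `trial_vote_accuracy` are assumptions for illustration.

```python
import numpy as np

def trial_vote_accuracy(bin_predictions, trial_ids, trial_labels):
    """Fraction of trials whose majority-vote label matches the true label.

    bin_predictions: (n_bins,) integer class predicted for each time bin
    trial_ids:       (n_bins,) trial index (0..n_trials-1) of each bin
    trial_labels:    (n_trials,) true integer class of each trial
    """
    n_trials = len(trial_labels)
    n_classes = int(max(bin_predictions.max(), trial_labels.max())) + 1
    votes = np.zeros((n_trials, n_classes), dtype=int)
    np.add.at(votes, (trial_ids, bin_predictions), 1)  # count votes per (trial, class)
    trial_predictions = votes.argmax(axis=1)            # ties go to the lowest class index
    return float(np.mean(trial_predictions == trial_labels))

# Three toy trials of three bins each.
preds = np.array([0, 0, 1, 2, 2, 2, 1, 0, 1])
trials = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
labels = np.array([0, 2, 0])
print(trial_vote_accuracy(preds, trials, labels))
```

If chance is taken as 1/3 (three equiprobable classes, an assumption here), the quoted ratios follow directly: 75.7% / 33.3% ≈ 2.3 and 66.1% / 33.3% ≈ 2.0.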

2026-05-13T04:55:03Z 21 pages, 6 figures, 5 tables; submitted to NeurIPS 2026 Neuroscience & Cognitive Science Track John R. Minnick Jesus Gonzalez-Ferrer Kamran Hussain Jinghui Geng Ash Robbins Mohammed A. Mostajo-Radji David Haussler Jason Eshraghian Mircea Teodorescu http://arxiv.org/abs/2605.12992v1 SpikeProphecy: A Large-Scale Benchmark for Autoregressive Neural Population Forecasting 2026-05-13T04:45:35Z

Neural population models, which predict the joint firing of many simultaneously recorded neurons forward in time, are typically evaluated by a single aggregate Pearson correlation $r$ between predicted and actual spike counts, a number that masks critical structure. We argue that how we evaluate spike forecasting matters as much as what we build, and introduce SpikeProphecy, the first large-scale benchmark for causal, autoregressive spike-count forecasting on real electrophysiology recordings. Our core contribution is a population metric decomposition that separates aggregate performance into temporal fidelity, spatial pattern accuracy, and magnitude-invariant alignment. The decomposition surfaces aspects of the underlying data that an aggregate scalar collapses together. We apply the protocol to 105 Neuropixels sessions (Steinmetz 2019 + IBL Repeated Site; ~89,800 neurons) with seven architecture baselines spanning four structural families: four SSMs (three diagonal and one non-diagonal), a Transformer, an LSTM, and a spiking network. The decomposition surfaces a brain-region predictability ranking that reproduces across all seven baselines and survives ANCOVA correction for firing-statistics constraints (region $ΔR^2 = 0.018$ above the firing-statistics covariates). It also exposes a sub-Poisson evaluation floor where rigorous metrics combine with genuine biophysical constraints on regular spike trains, and yields a negative result on KL-on-output-rates distillation for ANN-to-SNN transfer in this Poisson count domain.
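
The abstract does not give the formulas behind its decomposition, so the sketch below uses two hypothetical stand-ins: a "temporal" score (Pearson r across time per neuron, averaged over neurons) and a "spatial" score (Pearson r across neurons per time bin, averaged over bins). It illustrates how a single pooled r can look good while temporal fidelity is zero: the toy prediction shuffles each neuron's counts in time, so it keeps only each neuron's mean rate. The magnitude-invariant term is not sketched.

```python
import numpy as np

def pooled_r(y, p):
    """Single aggregate Pearson r over all (time bin, neuron) entries."""
    return float(np.corrcoef(y.ravel(), p.ravel())[0, 1])

def temporal_r(y, p):
    """Mean over neurons (columns) of the Pearson r across time bins."""
    rs = [np.corrcoef(y[:, n], p[:, n])[0, 1] for n in range(y.shape[1])]
    return float(np.nanmean(rs))

def spatial_r(y, p):
    """Mean over time bins (rows) of the Pearson r across neurons."""
    rs = [np.corrcoef(y[t], p[t])[0, 1] for t in range(y.shape[0])]
    return float(np.nanmean(rs))

rng = np.random.default_rng(0)
rates = np.linspace(1.0, 20.0, 50)           # toy mean counts of 50 neurons
y = rng.poisson(rates, size=(500, 50))       # toy "recorded" counts, 500 bins
p = rng.permuted(y, axis=0)                  # each neuron's counts shuffled in time

r_pooled, r_temporal, r_spatial = pooled_r(y, p), temporal_r(y, p), spatial_r(y, p)
print(f"pooled {r_pooled:.2f}  temporal {r_temporal:.2f}  spatial {r_spatial:.2f}")
```

The pooled r is driven almost entirely by differences in mean firing rate between neurons, which is exactly the structure a single aggregate scalar collapses together.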

2026-05-13T04:45:35Z 26 pages, 4 figures, 12 tables; submitted to NeurIPS 2026 Datasets and Benchmarks Track; processed dataset at https://huggingface.co/datasets/mysteriousauthor/spikeprophecy-steinmetz (CC-BY-4.0); code at https://github.com/JohnMinnick/SpikeProphecy-A-Large-Scale-Benchmark-for-Autoregressive-Neural-Population-Forecasting John R. Minnick Jinghui Geng Kamran Hussain Jesus Gonzalez-Ferrer Ash Robbins Mohammed A. Mostajo-Radji David Haussler Jason K. Eshraghian Mircea Teodorescu http://arxiv.org/abs/2605.23967v1 Sensing Intelligence as a Trainable Metamaterial Property 2026-05-13T01:55:34Z

In biological systems, sensing is not performed by the brain alone: the body deforms, vibrates, and filters external stimuli before they are transduced into neural signals. In engineered systems, this processing burden is placed largely on electronics and computation, while the mechanical body is usually designed only for strength and stability. Here, we present sensing intelligence as a trainable property of the body. We show that the geometry of a metamaterial can be optimized to reshape external stimuli into internal signals that are easier for a neural network to interpret. Rather than hand-designing this physical preprocessing, we let the neural network train its own body for sensing by backpropagating the sensing loss to the body's design parameters through differentiable simulation. Across numerical and experimental sensing scenarios, the optimized body improves sensing accuracy by up to fivefold or reduces the number of required electronic sensors by nearly an order of magnitude.

2026-05-13T01:55:34Z Kyungmi Na Yifei Li Xinyi Yang Bolei Deng http://arxiv.org/abs/2605.13904v1 Feature Visualization Recovers Known Cortical Selectivity from TRIBE v2 2026-05-13T00:55:45Z

Brain encoder models predict cortical fMRI responses from the internal activations of pretrained vision and language networks, and are typically evaluated by held-out prediction accuracy. This is a useful signal for training but a poor one for interpretation: it tells us an encoder fits the data without telling us whether it has internalized the functional organization of the brain. We propose feature visualization -- gradient ascent on the encoder's predicted activation for a target region of interest (ROI) -- as a complementary interpretability technique, and apply it to TRIBE v2 composed with V-JEPA 2 (ViT-G, 40 layers), holding both frozen and synthesizing still images for seven regions spanning the ventral and dorsal visual hierarchies. Under identical hyperparameters, the probe recovers a visible progression of increasing spatial scale and feature complexity across V1 to V4, matching the ventral-stream hierarchy. It also produces three distinctive downstream regimes: radial "frozen-motion" streaks for the middle temporal area (MT) despite static-only optimization, face-like features for the fusiform face area (FFA), and consistent rectilinear line patterns for the parahippocampal place area (PPA). Optimized FFA stimuli drive the predicted region ~4x as much as a natural face photograph, consistent with feature visualization producing adversarial super-stimuli rather than canonical exemplars. The probe is simple, differentiable, and applicable to any brain encoder with a differentiable backbone, allowing for qualitative evaluation of brain encoders.
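
A minimal sketch of the probe's core loop: gradient ascent on the input image to maximize a frozen encoder's predicted ROI activation. The encoder here is a hypothetical toy (`v · tanh(W x)` on a 64-pixel image) with an analytic gradient and an L2 penalty standing in for image regularization; TRIBE v2, V-JEPA 2, and the paper's hyperparameters and regularizers are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_units = 64, 16
W = rng.normal(scale=1.0 / np.sqrt(n_pixels), size=(n_units, n_pixels))  # toy frozen backbone
v = rng.normal(size=n_units)                                              # toy ROI readout weights

def roi_activation(x):
    """Toy encoder: predicted ROI response v . tanh(W x) for a flattened image x."""
    return float(v @ np.tanh(W @ x))

def objective(x, lam):
    """ROI activation minus an L2 penalty that keeps the synthesized image bounded."""
    return roi_activation(x) - 0.5 * lam * float(x @ x)

def objective_grad(x, lam):
    """Analytic gradient of objective() w.r.t. the image (chain rule through tanh)."""
    h = np.tanh(W @ x)
    return W.T @ (v * (1.0 - h**2)) - lam * x

def feature_visualization(lam=0.1, lr=0.05, steps=500):
    """Gradient ascent on the image; the encoder weights W and v are never updated."""
    x = np.zeros(n_pixels)              # start from a blank image
    for _ in range(steps):
        x = x + lr * objective_grad(x, lam)
    return x

x_opt = feature_visualization()
print("predicted ROI activation of synthesized image:", roi_activation(x_opt))
```

Only the input moves; nothing in the objective ties the image to natural image statistics, which is consistent with the abstract's observation that optimized stimuli act as super-stimuli rather than canonical exemplars.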

2026-05-13T00:55:45Z 8 pages, 3 figures, 2 tables. Code available at https://github.com/recozers/Tribe-V2-Interp Stuart Bladon Brinnae Bent http://arxiv.org/abs/2605.12763v1 State-Space NTK Collapse Near Bifurcations 2026-05-12T21:20:27Z

Rich feature learning in tasks that unfold over time often requires the model to pass through bifurcations, constituting qualitative changes in the underlying model dynamics. We develop a local theory of gradient descent near these transitions through the empirical state-space neural tangent kernel (sNTK). Our central finding is that bifurcations both dominate and simplify learning dynamics: near bifurcations, we can reduce the sNTK to a rank-one operator corresponding to learning in a classical normal form system, providing an analytically tractable description of the local learning geometry, even for high-dimensional recurrent systems. Concretely, we give a procedure for decomposing the sNTK into bifurcation-relevant and residual channels, showing that near common codimension-1 bifurcations the relevant channel is a rank-one operator that is highly amplified. This amplification causes the bifurcation channel to dominate the full sNTK. Thus, bifurcations locally warp the learning landscape, funneling gradient descent into a few critical dynamical directions and making the nearby kernel and loss geometry predictable from classical normal forms. We illustrate this in a student-teacher recurrent neural network: the first learned bifurcation coincides with a sharp collapse in sNTK effective rank and the emergence of a dominant parameter direction whose restricted sNTK closely matches the landscape predicted by the scalar pitchfork normal form. Finally, we show that low-rank natural gradient methods resolve the resulting learning instability near bifurcations with very little overhead over SGD.
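
A toy illustration of why an amplified rank-one channel collapses effective rank. It assumes the entropy-based effective rank exp(H(p)), with p the normalized eigenvalues; the paper's exact effective-rank measure is not stated in the abstract. The kernel `I + alpha * u u^T` is a hypothetical stand-in for "residual channel plus bifurcation channel", not an actual sNTK.

```python
import numpy as np

def effective_rank(K):
    """Entropy-based effective rank exp(H(p)) of a symmetric PSD kernel matrix."""
    eig = np.clip(np.linalg.eigvalsh(K), 0.0, None)   # real, non-negative spectrum
    p = eig / eig.sum()
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))

def toy_kernel(n, alpha, seed=0):
    """Isotropic residual channel plus an amplified rank-one channel alpha * u u^T."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    u /= np.linalg.norm(u)
    return np.eye(n) + alpha * np.outer(u, u)

for alpha in (0.0, 10.0, 1000.0):
    print(alpha, round(effective_rank(toy_kernel(10, alpha)), 3))
```

As the amplification grows, the spectrum is dominated by a single eigenvalue and the effective rank falls from n toward 1, mirroring the "sharp collapse in sNTK effective rank" described above.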

2026-05-12T21:20:27Z James Hazelden Eric Shea-Brown http://arxiv.org/abs/2605.12732v1 Predictive Coding Light+: learning to predict visual sequences with spike timing-dependent plasticity and synaptic delays 2026-05-12T20:34:25Z

The ability to predict the future is of great value for biological and artificial cognitive systems alike. However, successfully predicting the future typically requires maintaining a memory of the recent past. It is currently unclear how biological or artificial spiking neural networks can learn to maintain past sensory information to help predict the future. Here we propose Predictive Coding Light+ (PCL+), a spiking neural network architecture for unsupervised sequence processing that learns recurrent excitatory connections with delays to enable short-term retention of information. We show that the PCL+ network reproduces classic findings on sequence learning in visual cortex. Furthermore, it learns to "fill in" missing input in a challenging gesture recognition task. Overall, our work shows how spiking neural networks can learn recurrent excitatory connections with delays to maintain a record of the recent past and successfully predict the future.
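
To show why synaptic delays interact with spike timing-dependent plasticity, here is a generic pair-based STDP rule in which timing is measured from the presynaptic spike's arrival (emission time plus delay). This is a textbook rule with illustrative amplitudes and time constants, not necessarily the learning rule used in PCL+.

```python
import math

def stdp_delta_w(t_pre, t_post, delay, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for one pre/post spike pair (times in ms).

    Timing uses the presynaptic spike's arrival at the synapse, t_pre + delay.
    """
    dt = t_post - (t_pre + delay)
    if dt > 0:    # input arrived before the postsynaptic spike: potentiation
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:    # input arrived after the postsynaptic spike: depression
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0

# Same emission times, different delays: the delay alone can flip the sign of learning.
for d in (0.0, 5.0, 15.0):
    print(d, stdp_delta_w(t_pre=0.0, t_post=10.0, delay=d))
```

A delayed connection is strengthened when an earlier event, arriving late, reliably precedes the postsynaptic spike, which is one way a network can come to hold a record of the recent past.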

2026-05-12T20:34:25Z 13 pages, 7 figures, 2 tables, preprint Antony W. N'dri Thomas Barbier Céline Teulière Jochen Triesch http://arxiv.org/abs/2510.01502v2 Behavioral Geometric Supervision Aligns Video Foundation Models with Human Social Perception 2026-05-12T18:52:59Z

Current video foundation models, including the strongest self-supervised models such as V-JEPA 2, fail to capture how humans organize social information in dynamic scenes. For example, across a range of diverse vision models tested, none were able to predict human similarity judgments to social video clips as well as a sentence embedding model of the caption text (MPNet). We show this gap in vision model performance can be closed by a compact behavioral supervisory signal. We introduce behavioral geometric supervision (BGS): a hybrid objective that constrains local and global pairwise embedding geometry to match the relational similarity structure across videos. We apply this method using a new human similarity dataset, containing 49,484 odd-one-out judgments from 250 naturalistic social video clips, and low-rank adaptation across four ViT backbones (V-JEPA 2/2.1, TimeSformer, VideoMAE, and CLIP). We find that one of the best fine-tuned models, V-JEPA 2.1, nearly triples in performance compared to the pre-trained baseline and reaches close to the noise ceiling, exceeding the strongest sentence-embedding baseline. In addition, fine-tuned models (i) capture unique variance in human judgments that caption-based language embeddings do not, (ii) develop interpretable social-affective attributes (valence, arousal, and dominance) despite never being trained on any of these attributes, (iii) transfer zero-shot to a separate dataset of out-of-distribution abstract social interactions, and (iv) shift spatial attention from scene context to socially informative regions (faces, gaze, and interacting bodies). A matched language-distillation control fails to reproduce these gains, ruling out caption transfer as the mechanism. Our results show how a modest amount of human behavioral data can steer video models toward human-like social visual understanding.
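
A minimal sketch of a hybrid "local plus global" geometric objective in the spirit of BGS, under stated assumptions: the local term is the common softmax odd-one-out likelihood over dot-product similarities (as used for THINGS-style triplet data), and the global term is 1 minus the Pearson correlation between model and human similarity matrices (an RSA-style score). The paper's exact terms, similarity function, and weighting are not given in the abstract; all function names are hypothetical.

```python
import numpy as np

def odd_one_out_loss(emb, triplets):
    """Mean negative log-likelihood of human odd-one-out choices.

    emb:      (n_items, d) embeddings
    triplets: (n, 3) int array; column 2 is the item humans chose as the odd one out,
              so the pair in columns 0 and 1 should be the most similar of the three.
    """
    i, j, k = triplets[:, 0], triplets[:, 1], triplets[:, 2]
    s_ij = np.sum(emb[i] * emb[j], axis=1)
    s_ik = np.sum(emb[i] * emb[k], axis=1)
    s_jk = np.sum(emb[j] * emb[k], axis=1)
    logits = np.stack([s_ij, s_ik, s_jk], axis=1)
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_p = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_p.mean())

def rsa_score(emb, human_sim):
    """Pearson r between model and human pairwise similarities (upper triangle)."""
    model_sim = emb @ emb.T
    iu = np.triu_indices(len(emb), k=1)
    return float(np.corrcoef(model_sim[iu], human_sim[iu])[0, 1])

def hybrid_loss(emb, triplets, human_sim, weight=1.0):
    """Local triplet term plus global RSA term (minimizing 1 - r maximizes r)."""
    return odd_one_out_loss(emb, triplets) + weight * (1.0 - rsa_score(emb, human_sim))

emb = 3.0 * np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # items 0 and 1 coincide
print(odd_one_out_loss(emb, np.array([[0, 1, 2]])), odd_one_out_loss(emb, np.array([[0, 2, 1]])))
```

In practice the embeddings would come from a LoRA-adapted backbone and the loss would be minimized by gradient descent; the numpy version only shows what each term measures.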

2025-10-01T22:29:55Z v2: Major revision. Retitled; expanded from TimeSformer alone to four backbones (V-JEPA 2/2.1, TimeSformer, VideoMAE, CLIP), with V-JEPA 2.1 nearly tripling pretrained performance. Adds zero-shot PHASE transfer, attention-rollout analysis, and a language-distillation control. Data (OOO sim. judgments) & core hybrid triplet+RSA LoRA method unchanged from v1. Prepared for NeurIPS 2026 submission Kathy Garcia Leyla Isik http://arxiv.org/abs/2605.12619v1 Human face perception reflects inverse-generative and naturalistic discriminative objectives 2026-05-12T18:06:57Z

The perceptual representations supporting our ability to recognize faces remain a computational mystery. Deep neural networks offer mechanistic hypotheses for human face perception, but theoretically distinct models often make indistinguishable representational predictions for randomly sampled faces. To expose diagnostic differences among these hypotheses, we compared six neural network models sharing an architecture but trained on distinct tasks, using face pairs optimized to elicit contrasting model predictions ("controversial" pairs) alongside randomly sampled pairs. We tested model predictions against face-dissimilarity judgments from 864 human participants across stimulus sets differing in realism and pose variation. Models prioritizing high-level, invariant structures (trained via inverse rendering, face identification, or object classification) most robustly matched human judgments. Furthermore, models trained on natural images typically outperformed synthetic-trained counterparts. Together, these findings suggest that human face perception is shaped by mechanisms that infer latent causes of facial appearance, discount nuisance variation, and are tuned by natural image statistics.
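
A simplified sketch of the idea behind "controversial" pairs: among candidate stimulus pairs, pick the one on which two models' predicted dissimilarities disagree most. The paper optimized face pairs; this search over a fixed candidate set, the Euclidean dissimilarity, the within-model z-scoring, and the toy embeddings are all assumptions for illustration.

```python
import numpy as np
from itertools import combinations

def zscore(d):
    return (d - d.mean()) / d.std()

def most_controversial_pair(emb_a, emb_b):
    """Return the candidate pair (and its score) on which models A and B disagree most.

    Dissimilarity = Euclidean distance in each model's embedding space, z-scored
    within each model so that the two models' scales are comparable.
    """
    pairs = list(combinations(range(len(emb_a)), 2))
    d_a = np.array([np.linalg.norm(emb_a[i] - emb_a[j]) for i, j in pairs])
    d_b = np.array([np.linalg.norm(emb_b[i] - emb_b[j]) for i, j in pairs])
    controversy = np.abs(zscore(d_a) - zscore(d_b))
    best = int(np.argmax(controversy))
    return pairs[best], float(controversy[best])

# Toy embeddings of 4 stimuli: model A puts stimuli 0 and 1 far apart, model B nearly on top of each other.
emb_a = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
emb_b = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(most_controversial_pair(emb_a, emb_b))
```

Human judgments on such a pair favor one model over the other, whereas randomly sampled pairs tend to receive similar predictions from both.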

2026-05-12T18:06:57Z 33 pages, 10 figures, 4 tables Wenxuan Guo Heiko H. Schütt Kamila Maria Jozwik Katherine R. Storrs Nikolaus Kriegeskorte Tal Golan http://arxiv.org/abs/2605.10818v2 On periodic distributed representations using Fourier embeddings 2026-05-12T17:19:23Z

Periodic signals are critical for representing physical and perceptual phenomena. Scalar, real-valued angular measures, e.g., radians and degrees, make it difficult to process and distinguish nearby angles, especially when their absolute numerical difference exceeds π (for example, 0.1 rad and 2π − 0.1 rad are only 0.2 rad apart on the circle). We can avoid this problem by using real-valued, periodic embeddings in a high-dimensional space. These representations also allow us to control the nature of their dot-product similarities, so that a variety of kernel shapes can be constructed. In this work, we aim to highlight how these representations can be constructed, and we focus on the formalization of Dirichlet and periodic Gaussian kernels using the neurally plausible representation scheme of Spatial Semantic Pointers.
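
A minimal sketch of how a real-valued periodic embedding yields a chosen dot-product kernel. It uses plain cosine/sine Fourier features rather than the paper's Spatial Semantic Pointer encoding, so treat it as an illustration of the kernel shapes, not of the SSP construction. With weights w_k on frequencies k = 1..K, the dot product is proportional to 1 + 2 Σ w_k² cos(kΔ): flat weights give the Dirichlet kernel sin((K + ½)Δ) / sin(Δ/2), and Gaussian weights w_k² = exp(−σ²k²/2) give a truncated wrapped (periodic) Gaussian.

```python
import numpy as np

def fourier_embedding(theta, n_freq, sigma=None):
    """Unit-norm, real-valued periodic embedding of an angle theta (radians).

    Components: [1, sqrt(2) w_k cos(k theta), sqrt(2) w_k sin(k theta)], k = 1..n_freq.
    sigma=None: flat spectrum w_k = 1 -> normalised Dirichlet kernel.
    sigma > 0:  w_k^2 = exp(-(sigma k)^2 / 2) -> truncated periodic Gaussian kernel.
    """
    k = np.arange(1, n_freq + 1)
    w2 = np.ones(n_freq) if sigma is None else np.exp(-0.5 * (sigma * k) ** 2)
    w = np.sqrt(w2)
    v = np.concatenate(([1.0], np.sqrt(2.0) * w * np.cos(k * theta), np.sqrt(2.0) * w * np.sin(k * theta)))
    return v / np.sqrt(1.0 + 2.0 * w2.sum())   # ||v||^2 = 1 + 2 sum_k w_k^2 before normalising

def dirichlet(delta, n_freq):
    """Closed-form normalised Dirichlet kernel D_K(delta) / (2K + 1), for delta != 0 mod 2 pi."""
    return np.sin((n_freq + 0.5) * delta) / np.sin(delta / 2.0) / (2 * n_freq + 1)

def similarity(a, b, n_freq, sigma=None):
    """Dot-product similarity of two angle embeddings; depends only on a - b mod 2 pi."""
    return float(fourier_embedding(a, n_freq, sigma) @ fourier_embedding(b, n_freq, sigma))

# As scalars these differ by about 6.08, but on the circle they are 0.2 rad apart.
print(similarity(0.1, 2 * np.pi - 0.1, 5), similarity(0.0, 0.2, 5))
```

Because cos(k(θ₁ − θ₂)) is unchanged by adding 2π to either angle, the wrap-around problem of scalar angles disappears, and the kernel shape is set entirely by the choice of spectral weights.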

2026-05-11T16:35:32Z Jakeb Chouinard http://arxiv.org/abs/2605.12404v1 Empirical scaling laws in balanced networks with conductance-based synapses 2026-05-12T17:02:52Z

Strongly coupled, recurrent, balanced network models have been successful in describing and predicting many phenomena observed in cortical neural recordings. However, most balanced network models use current-based synapse models in place of more realistic, conductance-based models. Conductance-based synapse models predict unrealistically small membrane potential variability. On the other hand, introducing realistic levels of spike time correlations to models with current-based synapses predicts unrealistically large membrane potential variability. We use computer simulations to show that these two effects can cancel: recurrent network models with conductance-based synapses and spike time correlations produce more realistic, moderate levels of membrane potential variability. Consistent with recent work on feedforward networks, our results show that including more realistic modeling assumptions produces more realistic dynamics, but only when the two modeling assumptions are included together.
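
One standard way to see why conductance-based synapses damp membrane potential fluctuations: for a passive membrane C dV/dt = −g_L(V − E_L) − g_E(V − E_E) − g_I(V − E_I), the effective time constant is C / (g_L + g_E + g_I) and the potential is pulled toward a conductance-weighted average of reversal potentials. Strong synaptic bombardment therefore shortens the integration window and shrinks the driving forces, whereas a current-based synapse has neither effect. The capacitance and reversal potentials below are illustrative assumptions, not values from the paper.

```python
def membrane_summary(g_L, g_E, g_I, C=200.0, E_L=-70.0, E_E=0.0, E_I=-80.0):
    """Effective time constant (ms) and steady-state potential (mV) of a passive
    membrane with conductances in nS and capacitance in pF (pF / nS = ms)."""
    g_tot = g_L + g_E + g_I
    tau_eff = C / g_tot
    V_ss = (g_L * E_L + g_E * E_E + g_I * E_I) / g_tot
    return tau_eff, V_ss

print("rest:             ", membrane_summary(g_L=10.0, g_E=0.0, g_I=0.0))
print("high conductance: ", membrane_summary(g_L=10.0, g_E=10.0, g_I=20.0))
```

In this toy setting, adding synaptic conductance cuts the effective time constant from 20 ms to 5 ms; spike time correlations, which add coherent input fluctuations, push variability in the opposite direction.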

2026-05-12T17:02:52Z Vicky Zhu Gabriel Ocker Robert Rosenbaum http://arxiv.org/abs/2605.13893v1 From Organization to Viability: A Multi-Level Analysis of Gait Dynamics Under Occlusal Constraint 2026-05-12T10:15:18Z

Clinical interpretation often assumes that observable performance provides sufficient information about the organization of an adaptive system. However, similar observable performance may correspond to distinct latent organizations. This study extends a previous multi-level framework by introducing a fourth analytical level centered on longitudinal viability. Using an exploratory single-case design in a Parkinsonian patient, gait data were recorded with instrumented insoles under three occlusal conditions: neutral natural occlusion (ONL), a 2.5-degree increase in vertical dimension of occlusion (OC2.5), and a 3-degree increase in vertical dimension of occlusion (OC3). Two measurement sessions were conducted eleven weeks apart, during which the participant underwent a structured sensorimotor intervention. The vertical dimension of occlusion was considered as an experimentally varied constraint applied to an adaptive neuromechanical system. Although observable performance remained globally comparable across conditions, PCA-based latent-space analysis revealed differentiated longitudinal centroid displacements. OC3 exhibited the smallest displacement, ONL an intermediate displacement, and OC2.5 the largest displacement. This hierarchy supports the relevance of a Level 4 framework centered on viability, understood here as an exploratory proxy for a configuration's capacity to maintain lower longitudinal reorganization over time. These findings remain within-subject, exploratory, and non-causal. They do not establish a validated clinical threshold, causal occlusal effect, or therapeutic optimum. More generally, the work suggests that clinical relevance cannot be inferred solely from instantaneous performance or static latent structure, but may also depend on the capacity of a configuration to sustain a coherent trajectory over time.
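
A minimal sketch of a longitudinal centroid-displacement measure in a PCA latent space, under assumptions: PCA is fitted on both sessions pooled so they share one latent space, each condition's centroid is computed per session, and displacement is the Euclidean distance between the two centroids. The abstract does not specify the gait features, the number of components, or the exact distance; the data and condition labels below are synthetic and hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Feature mean and top principal axes (as rows) of X via SVD."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_components]

def centroid_displacements(X1, labels1, X2, labels2, n_components):
    """Distance between each condition's latent centroid in session 1 and session 2."""
    mu, axes = pca_fit(np.vstack([X1, X2]), n_components)
    Z1, Z2 = (X1 - mu) @ axes.T, (X2 - mu) @ axes.T
    return {c: float(np.linalg.norm(Z2[labels2 == c].mean(axis=0) - Z1[labels1 == c].mean(axis=0)))
            for c in np.unique(labels1)}

# Synthetic example: 30 gait cycles x 4 features per condition, with a known shift per condition.
rng = np.random.default_rng(0)
conds = np.repeat(np.array(["A", "B", "C"]), 30)
X1 = rng.normal(size=(90, 4))
shift = {"A": np.array([0.5, 0, 0, 0]), "B": np.array([0, 1.5, 0, 0]), "C": np.array([0, 0, 0.2, 0])}
X2 = X1 + np.array([shift[c] for c in conds])
disp = centroid_displacements(X1, conds, X2, conds, n_components=4)
print(disp)
```

With all components kept, the projection is a rotation, so each displacement equals the norm of the injected shift; keeping fewer components would measure only the part of the reorganization that lies in the retained subspace.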

2026-05-12T10:15:18Z 16 pages, 2 figures. Exploratory single-case study at the interface of quantitative biology, gait analysis, occlusion, sensorimotor regulation, latent-space modeling, and machine learning Jacques Raynal Pierre Slangen Elsa Raynal Jacques Margerit http://arxiv.org/abs/2605.11885v1 From Clever Hans to Scientific Discovery: Interpreting EEG Foundational Transformers with LRP 2026-05-12T09:59:14Z

Emerging foundation models (FMs) in electroencephalography (EEG) promise a path to scale deep learning in diagnostics and brain-computer interfaces despite data scarcity, yet their opaque nature remains a barrier to wider adoption. We investigate attention-aware Layer-wise relevance propagation (LRP) as a post-hoc attribution method for EEG-FMs, extending LRP's use on convolutional neural network (CNN)-based EEG models to the Transformer architectures that current FMs are based on. We find that LRP can both verify EEG-FM decisions and surface novel, biologically plausible hypotheses from them. In motor imagery, it unmasks 'Clever Hans' behavior where models prioritize task-correlated ocular signals over the intended motor correlates. In a naturalistic paradigm for affect prediction, it reveals a recurring reliance on a central electrode cluster, suggesting a candidate sensorimotor signature of arousal. Though heatmap interpretation remains ambiguous in this complex domain, the results position LRP as a tool for both verification and exploration of EEG-FMs, a role that will grow in both importance and discovery potential as the underlying models mature.
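
For readers new to LRP, this is the basic LRP-ε rule for a single dense layer, which redistributes output relevance onto inputs in proportion to their contributions a_i w_ij to each pre-activation z_j. Attention-aware LRP for Transformers adds dedicated rules for softmax attention, matrix products, and normalization layers that are not shown here; this sketch and its function name are illustrative only.

```python
import numpy as np

def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """LRP-epsilon for z = a @ W + b: R_in[i] = sum_j a[i] W[i, j] / (z[j] + eps * sign(z[j])) * R_out[j]."""
    z = a @ W + b
    denom = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabiliser keeps the denominator away from 0
    s = R_out / denom
    return a * (W @ s)

rng = np.random.default_rng(0)
a = rng.random(8)              # toy input activations (e.g. 8 channels)
W = rng.normal(size=(8, 3))    # toy weights to 3 outputs
b = np.zeros(3)                # zero bias, so relevance is (almost) conserved
z = a @ W + b
R_in = lrp_epsilon_dense(a, W, b, R_out=z)   # start from the output scores themselves
print(R_in, R_in.sum(), z.sum())
```

With zero bias and small ε, the input relevances sum to the output relevance; chaining such rules layer by layer back to the electrodes produces the kind of heatmaps the abstract interprets.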

2026-05-12T09:59:14Z 18 pages, 6 figures Justus Meyer zu Bexten Nico Scherf Bogdan Franczyk Simon M. Hofmann http://arxiv.org/abs/2605.11718v1 Self-organized MT Direction Maps Emerge from Spatiotemporal Contrastive Optimization 2026-05-12T08:05:35Z

The spatial and functional organization of the primate visual cortex is a fundamental problem in neuroscience. While recent computational frameworks like the Topographic Deep Artificial Neural Network (TDANN) have successfully modeled spatial organization in the ventral stream, the computational origins of the dorsal stream's distinct topographies, such as direction-selective maps in the middle temporal (MT) area, remain largely unresolved. In this work, we present a spatiotemporal TDANN to investigate whether MT topography is governed by the same universal principles. By training a 3D ResNet on naturalistic videos via a Momentum Contrast (MoCo) self-supervised paradigm alongside a biologically inspired spatial loss, we demonstrate the spontaneous emergence of brain-like direction maps and topological pinwheel structures. Crucially, we reveal that MT tuning properties, characterized by strong direction selectivity paired with a residual axial component, arise from a strict optimization trade-off between task-driven discriminative pressure and spatial regularization. The model's representations quantitatively match in vivo macaque MT physiological baselines, including direction selectivity index, circular variance, and pinwheel density. These findings unify the computational origins of the ventral and dorsal streams, establishing a general mechanism for cortical self-organization.
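
Two of the physiological baselines named above can be computed from a direction tuning curve as follows. This sketch uses the common definitions DSI = (R_pref − R_null) / (R_pref + R_null) and circular variance CV = 1 − |Σ R_k e^{iθ_k}| / Σ R_k; some studies use other DSI denominators or subtract baseline firing, and the paper's exact conventions are not given in the abstract.

```python
import numpy as np

def direction_metrics(directions_deg, responses):
    """Direction selectivity index and circular variance of a direction tuning curve.

    Assumes evenly spaced directions that include the one opposite the preferred
    direction. CV is 0 for a perfectly tuned cell and 1 for an untuned cell.
    """
    directions_deg = np.asarray(directions_deg)
    theta = np.deg2rad(directions_deg)
    r = np.asarray(responses, dtype=float)
    pref = int(np.argmax(r))
    null_dir = (directions_deg[pref] + 180) % 360
    null = int(np.argmin(np.abs((directions_deg - null_dir + 180) % 360 - 180)))  # wrapped angular distance
    dsi = (r[pref] - r[null]) / (r[pref] + r[null])
    cv = 1.0 - np.abs(np.sum(r * np.exp(1j * theta))) / np.sum(r)
    return float(dsi), float(cv)

dirs = np.arange(0, 360, 45)
print(direction_metrics(dirs, 1.0 + np.cos(np.deg2rad(dirs))))   # cosine-tuned toy cell
```

A cell with equal peaks at θ and θ + 180° has DSI = 0 and direction CV near 1 even though it is sharply tuned to an axis, which is why the residual axial component reported in the abstract needs to be characterized separately.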

2026-05-12T08:05:35Z Zhaotian Gu Molan Li Jie Su Chang Liu Tianyi Qian Dahui Wang http://arxiv.org/abs/2605.11675v1 Accounting for Missed Events in the Bayesian Modeling of IP3R Multimodal Gating 2026-05-12T07:31:59Z

The inositol 1,4,5-trisphosphate receptor channel (IP3R) is an important calcium channel involved in calcium-induced calcium release, playing a prominent role in intracellular calcium signaling. However, accurately characterizing its gating behavior remains a challenge, particularly because the temporal resolution of patch-clamp techniques is not fine enough to detect all short-lived events. This limitation can significantly bias the inference of kinetic models describing the receptor activity. To address this issue, we focused on the quantitative analysis of IP3R gating behavior using patch-clamp data, with particular attention to missed events. We modeled IP3R channel gating using hierarchical Markov chains and used a Bayesian approach that integrates missed-event correction directly into the likelihood function, enabling more accurate parameter inference and model evaluation. We show that accounting for missed events substantially clarifies the multi-modal model that emerges from model selection. In this new model, the Park and Drive modes both consist of the same 3-state Markov model, with mode-dependent kinetic parameters: the Drive mode stabilizes the closed state directly connected to the open one, whereas the Park mode stabilizes the other closed state, which is not connected to the open one. Intermediate Ca2+ concentrations are found to strongly depress the Drive to Park transition rate, so that the IP3R channel undergoes frequent transitions to the Park mode only for __ 50 nM or micromolar Ca2+ concentrations. Overall, our approach provides a refined perspective on IP3R channel modeling and highlights the critical importance of accounting for missed events in model selection based on single-channel recordings.
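
To make the missed-event problem concrete, this sketch applies a simple dead-time filter to an idealized open/closed dwell sequence: any dwell shorter than the dead time goes undetected, its duration is absorbed into the preceding detected dwell, and neighbouring dwells at the same level are merged. This is only a first-order illustration of how brief events distort apparent dwell times; the paper instead builds missed-event correction into the likelihood of a hierarchical Markov model. Function name and numbers are hypothetical.

```python
def apply_dead_time(dwells, t_dead):
    """Idealised dwell sequence -> sequence as seen by a recorder with dead time t_dead.

    dwells: list of (level, duration), level 'O' (open) or 'C' (closed), durations in ms.
    """
    observed = []
    for level, duration in dwells:
        if observed and (duration < t_dead or level == observed[-1][0]):
            # Too brief to detect, or same level as the last detected dwell:
            # the record simply stays at the previous level for longer.
            prev_level, prev_duration = observed[-1]
            observed[-1] = (prev_level, prev_duration + duration)
        else:
            observed.append((level, duration))
    return observed

true_dwells = [("C", 5.0), ("O", 0.05), ("C", 3.0), ("O", 2.0), ("C", 0.02), ("O", 1.0)]
print(apply_dead_time(true_dwells, t_dead=0.1))
```

Six true dwells collapse into one long closed and one long open dwell, so fitting a kinetic model to the observed record without correction would overestimate dwell times and could favor the wrong model topology.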

2026-05-12T07:31:59Z Schayma Ben Marzougui AISTROSIGHT Audrey Denizot AISTROSIGHT Hugues Berry AISTROSIGHT