https://arxiv.org/api/gLVrn7jYKNbWOiiY+iiNeaX7Ww8
2026-06-22T03:29:26Z
1218127015

http://arxiv.org/abs/2605.13884v1
Consciousness as Uncommon Self-Knowledge: A Synergistic Information Framework
2026-05-11T18:17:27Z
We propose uncommon self-knowledge (USK) as a candidate criterion for consciousness: synergistic information a system carries about itself that exists only in the joint of its subsystems and is destroyed by decomposition. Drawing on Gottwald's partition-lattice grounding of Partial Information Decomposition (PID), where redundancy corresponds to Aumann's common knowledge and synergy to the gap between separate and joint observation, we propose the synergistic component of self-directed information as a candidate formal signature for conscious processing. If correct, the framework would (1) offer a clean separation between consciousness and metacognition (synergistic vs. redundant self-knowledge), (2) provide principled resolutions to counterexamples that challenge IIT, GWT, and HOT, (3) be operationalizable via Partial Information Rate Decomposition (PIRD) with self-targeting, and (4) generate distinctive empirical predictions, the strongest being a GWT timing dissociation (consciousness correlates with pre-broadcast synergy formation, not broadcast itself) and a specific dissociation between self-report disruption and task-performance disruption under middle-layer perturbation in LLMs.
The proposal is consistent with recent empirical findings that both anaesthesia and Alzheimer's disease specifically reduce synergistic information processing while preserving or increasing redundancy.
2026-05-11T18:17:27Z
Conceptual and formal paper on consciousness as uncommon self-knowledge, 8 pages, 2 tables
Krti Tallam

http://arxiv.org/abs/2605.10356v1
Cortico-cerebellar modularity as an architectural inductive bias for efficient temporal learning
2026-05-11T11:03:41Z
The cerebellum and cerebral cortex form tightly coupled circuits thought to support flexible and efficient temporal processing. How this interaction shapes cortical learning dynamics, and whether such heterogeneous modularity can benefit artificial systems, remains unclear. Here, we augment a recurrent neural network (RNN) with a cerebellar-inspired feedforward module and evaluate the resulting architecture on temporal tasks of varying difficulty. The cortico-cerebellar RNN (CB-RNN) learns faster and reaches higher maximum performance than parameter-matched fully recurrent baselines across a variety of regimes. Crucially, freezing the recurrent core after minimal training and delegating subsequent learning to the cerebellar module preserves superior learning efficiency, suggesting the cerebellar module is a primary driver of efficiency and that the cortical network can largely function as a fixed reservoir. Our results suggest that heterogeneous modular architectures can act as a powerful structural inductive bias in neural systems.
2026-05-11T11:03:41Z
Alexandra Voce, Emmanouil Giannakakis, Claudia Clopath

http://arxiv.org/abs/2506.04289v3
Relational reasoning and inductive bias in transformers and large language models
2026-05-11T09:35:19Z
Transformer-based models have demonstrated remarkable reasoning abilities, but the mechanisms underlying relational reasoning remain poorly understood.
We investigate how transformers perform transitive inference, a classic relational reasoning behavior from psychology which elicits inference about indirectly related items (e.g., if $A > B$ and $B > C$, then $A > C$). We compare in-weights learning (IWL) and in-context learning (ICL) behaviors and mechanisms on these tasks, and find profoundly different patterns of generalization. IWL models learn a linear embedding, which leads to transitive inference as well as other behavioral effects present in humans and animals. ICL models, in contrast, are capable of learning to generalize transitively, but only do so when it is necessitated by the training data, otherwise learning a match-and-copy strategy. Interestingly, pre-training ICL models on in-context linear regression tasks that provide them with a latent linear representation is sufficient to make the ICL behaviors and internal representations qualitatively and quantitatively more like IWL. In order to test whether the same inference patterns are present in large language models, we leverage a congruency paradigm which allows us to differentially probe IWL and ICL generalization patterns without access to their training data. We indeed see IWL reasoning leads to more transitive generalization than ICL. Moreover, we find that prompting the ICL models to use a linear mental map led to increased transitive inference over different geometric prompts. Together, these results reveal that both the training regime and the geometric structure of induced representations critically determine transformers' capacity for transitive inference.
2025-06-04T10:15:05Z
15 pages, 10 figures
Jesse Geerts, Andrew Liu, Stephanie Chan, Claudia Clopath, Kimberly Stachenfeld

http://arxiv.org/abs/2605.10178v1
Joint sparse coding and temporal dynamics support context reconfiguration
2026-05-11T08:29:00Z
Adaptive behavior requires the brain to transition between distinct contexts while maintaining representations of prior experience.
The ability to reconfigure neural representations without erasing previously acquired knowledge is central to learning in dynamic environments, yet the neural mechanisms that support this balance remain unclear. Understanding these mechanisms is also critical for addressing catastrophic forgetting in artificial systems designed for lifelong learning. Here, we identify joint sparse coding and temporal dynamics in both the mouse medial prefrontal cortex (mPFC) and computational networks as mechanisms that help preserve prior representations during context transitions. Specifically, sparsity in context-dependent representations reduces cross-context interference, whereas temporal dynamics within the network activity further enhance context separability across time. Strikingly, networks endowed with both properties, such as spiking neural networks, exhibit improved retention during lifelong learning without auxiliary heuristics. These findings establish joint sparse coding and temporal dynamics as a core mechanism supporting flexible context reconfiguration in lifelong learning and, through their activity-constraining nature, as an energy-efficient architectural principle for stable adaptation. Together, they provide a mechanistic framework for understanding how the brain preserves prior knowledge while flexibly adapting to new contexts.
2026-05-11T08:29:00Z
37 pages, 6 figures, 6 extended data figures. Preprint version
Qianqian Shi, Yue Che, Faqiang Liu, Hongyi Li, Mingkun Xu, Sandra Reinert, Pieter M. Goltstein, Rong Zhao, Luping Shi

http://arxiv.org/abs/2603.22705v5
Detecting outliers of pursuit eye movements: a preliminary analysis of autism spectrum disorder
2026-05-11T03:00:05Z
Background: Autism spectrum disorder (ASD) is characterized by significant clinical and biological heterogeneity. Conventional group-mean analyses of eye movements often mask individual atypicalities, potentially overlooking critical pathological signatures.
This study aimed to identify idiosyncratic oculomotor patterns in ASD using an "outlier analysis" of smooth pursuit eye movement (SPEM).
Methods: We recorded SPEM during a slow Lissajous pursuit task in 18 adults with ASD and 39 typically developed (TD) individuals. To quantify individual deviations, we derived an "outlier score" based on the Mahalanobis distance. This score was calculated from a feature vector, optimized via Principal Component Analysis (PCA), comprising the temporal lag ($Δ$t) and the spatial deviation ($Δ$s). An outlier was statistically defined as a score exceeding $\sqrt{10}$ (approximately 3.16$σ$) relative to the TD normative distribution.
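The outlier score described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `outlier_scores`, the synthetic data, and the use of the raw two-feature vector (skipping the PCA step the abstract mentions) are all assumptions made here. Only the Mahalanobis distance and the $\sqrt{10}$ cutoff come from the text.

```python
import numpy as np

# Cutoff quoted in the abstract: a score above sqrt(10) counts as an outlier.
THRESHOLD = np.sqrt(10.0)


def outlier_scores(reference, subjects):
    """Mahalanobis distance of each subject from the reference (TD) distribution.

    reference: (n_ref, n_features) array, e.g. columns (temporal lag, spatial deviation)
    subjects:  (n_subj, n_features) array scored against the reference
    """
    reference = np.asarray(reference, dtype=float)
    subjects = np.asarray(subjects, dtype=float)
    mean = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)  # sample covariance (ddof=1)
    cov_inv = np.linalg.inv(cov)
    diff = subjects - mean
    # d_i = sqrt(diff_i^T  cov^{-1}  diff_i), evaluated row by row
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))


# Hypothetical usage with synthetic numbers (not data from the study).
rng = np.random.default_rng(0)
td_features = rng.normal(size=(39, 2))        # stand-in for 39 TD participants
asd_features = rng.normal(size=(18, 2)) * 2   # stand-in for 18 ASD participants
scores = outlier_scores(td_features, asd_features)
outlier_rate = float(np.mean(scores > THRESHOLD))
```

Because the Mahalanobis distance is unchanged by any invertible linear transform of the features, a full-rank PCA rotation would give the same scores; a PCA step that drops components would not.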
Results: While the TD group exhibited a low outlier rate of 5.1%, the ASD group demonstrated a significantly higher prevalence of 38.9% (7/18) (binomial P = 0.0034). Furthermore, the mean outlier score was significantly elevated in the ASD group (3.00 $\pm$ 2.62) compared to the TD group (1.52 $\pm$ 0.80; P = 0.002). Notably, these extreme deviations were captured even when conventional mean-based comparisons showed limited sensitivity.
Conclusions: Our outlier analysis successfully visualized the high degree of idiosyncratic atypicality in ASD oculomotor control. By shifting the focus from group averages to individual deviations, this approach provides a sensitive metric for capturing the inherent heterogeneity of ASD, offering a potential baseline for identifying clinical subtypes.
2026-03-24T01:57:37Z
4 pages, 2 figures, 2 video files, Supplementary Materials (2 files), Supplementary Methods and Codes in GitHub
Emiko Shishido, Seiko Miyata, Tetsuya Yamamoto, Masaki Fukunaga, Ryota Hashimoto, Kenichiro Miura, Norio Ozaki

http://arxiv.org/abs/2509.21671v2
Neuroprobe: Evaluating Intracranial Brain Responses to Naturalistic Stimuli
2026-05-10T21:37:42Z
High-resolution neural datasets enable foundation models for the next generation of brain-computer interfaces and neurological treatments. The community requires rigorous benchmarks to discriminate between competing modeling approaches, yet no standardized evaluation frameworks exist for intracranial EEG (iEEG) recordings. To address this gap, we present Neuroprobe: a suite of decoding tasks for studying multi-modal language processing in the brain. Unlike scalp EEG, intracranial EEG requires invasive surgery to implant electrodes that record neural activity directly from the brain with minimal signal distortion. Neuroprobe is built on the BrainTreebank dataset, which consists of over 40 hours of iEEG recordings from 10 human subjects performing a naturalistic movie viewing task. Neuroprobe serves two critical functions. First, it is a source from which neuroscience insights can be drawn. The high temporal and spatial resolution of the labeled iEEG allows researchers to systematically determine when and where computations for each aspect of language processing occur in the brain by measuring the decodability of each feature across time and all electrode locations.
Using Neuroprobe, we visualize how information flows from key language and audio processing sites in the superior temporal gyrus to sites in the prefrontal cortex. We also demonstrate the time evolution of processing from simple auditory features (e.g., pitch and volume) to more complex language features (e.g., part of speech) in a purely data-driven manner. Second, as the field moves toward neural foundation models trained on large-scale datasets, Neuroprobe provides a rigorous framework for comparing competing architectures and training protocols. We make the code for Neuroprobe openly available, aiming to enable rapid progress in the field of iEEG foundation models. Public leaderboard: https://neuroprobe.dev/
2025-09-25T22:38:53Z
38 pages, 7 main figures, 16 supplementary figures, 13 tables
Andrii Zahorodnii, Christopher Wang, Geeling Chau, Bennett Stankovits, Charikleia Moraitaki, Eli Gross, Alexander Brady, Andrei Barbu, Boris Katz, Ila R Fiete

http://arxiv.org/abs/2605.09770v1
Encoding and Decoding Temporal Signals with Spiking Bandpass Wavelets
2026-05-10T21:32:44Z
Spike-based encodings are sparse and energy-efficient, but have largely been formulated probabilistically, disconnected from most signal processing literature. We recast spike encoders as time-causal wavelet frames with quantitative bandwidths and reconstruction error bounds. The proposed wavelets preserve the sparsity and locality of spiking representations, with reconstruction up to spike quantization and time discretization. We demonstrate reconstruction on ECG and audio datasets, achieving a normalized RMSE comparable to continuous wavelet transforms.
The spiking wavelets map directly to neuromorphic hardware.
2026-05-10T21:32:44Z
Jens Egholm Pedersen, Tony Lindeberg, Peter Gerstoft

http://arxiv.org/abs/2605.23952v1
Machine Psychometrics: A Mathematical Psychology of Artificial Intelligence
2026-05-10T21:15:53Z
Artificial agents now generate behavior rich enough to invite trust, surprise, and concern, yet our evaluation tools still privilege capability scores over psychological structure. This paper argues that the philosophical impasse between two symmetrical errors (Artificial Mind Blindness, which dismisses psychological organization in non-biological systems, and Artificial Mind Projection, which infers human-like inner life from fluent behavior alone) can be circumvented not by resolving the consciousness question, but by introducing a disciplined measurement layer beneath it. Drawing on Michael Levin's continuum view of cognition as goal-directed competency across substrates, and on the methodological repertoire of mathematical psychology (Item Response Theory, Signal Detection Theory, Bayesian cognitive modeling, calibration analysis, cognitive-bias batteries), the paper develops Machine Psychometrics as a measurement science of latent behavioral, metacognitive, communicative, and self-modeling dispositions in artificial agents. Its operational core is the Machine Mindprint: a multidimensional, domain-bounded, versioned profile spanning calibration, source integrity, suggestibility resistance, context stability, expressive alignment, tool integrity, drift monitoring, and distributional grounding. A complementary Trust Protocol turns Mindprints into deployment decisions through probe batteries, perturbation testing, reliability and validity analysis, and longitudinal monitoring across high-stakes domains. The philosophical contribution is a third stance, Artificial Mind Discipline, that neither anthropomorphizes nor dismisses, neither presupposes consciousness nor forecloses it.
The aim is not to humanize artificial agents, but to understand them precisely because they are not human, through measurement before judgment.
2026-05-10T21:15:53Z
45 pages, 11 figures
Alex Bogdan, Adrian de Valois-Franklin

http://arxiv.org/abs/2605.09409v1
Predictive and feedback signals differently shape the formation of group-level and individualized language representations
2026-05-10T08:19:30Z
Adults vary greatly in how effectively they learn a new language, but the signals driving the learning processes and individual differences remain unclear. Over seven days, we tracked behavioral learning and collected fMRI data from 102 adults as they learned an artificial language with corrective feedback. We trained matched transformer models with prediction, feedback, or combined objectives and compared their internal representations to brain activity. Representations derived from the prediction-focused model accounted for the largest share of unique neural variance at the group level, despite the human task being feedback-based. Throughout model training, both objectives showed a shift in brain-model alignment from sensory to higher-order language and associative networks, indicating abstraction processing. Conversely, neural patterns related to the feedback model were most useful for predicting individual generalization outcomes on Day 7. These findings support a multi-signal model of adult language learning, in which prediction shapes a common neural learning architecture across learners, whereas feedback-related mechanisms better explain individual differences over time.
2026-05-10T08:19:30Z
Shuguang Yang, Shaoyun Yu, Xin Jiang, Suiping Wang, Gangyi Feng

http://arxiv.org/abs/2605.09243v1
How Much is Brain Data Worth for Machine Learning?
2026-05-10T00:58:21Z
If a person can solve a task, can measuring their brain make it easier to train a model to solve that task too?
Recent NeuroAI work suggests that supplementing task training with neural recordings can modestly improve model performance and robustness. However, it is unclear when there should be a benefit from using neural data and how much benefit to expect. We formulate this question mathematically, and begin to address it theoretically using a simple, analytically tractable linear Gaussian model of task targets and neural recordings. For a multimodal estimator trained on both brain data and task labels, we derive scaling laws for how performance scales with the numbers of brain and task samples. From these laws we derive relative value and exchange rates between brain samples and task samples, quantifying how many extra task samples neural data is worth as a function of task-brain alignment, neural and task noise, latent dimension, and brain data sample size. We also analyze test distribution shift, to identify conditions where brain-regularized learning can produce substantial robustness gains through learned invariances. Finally, under a fixed collection budget, we characterize the regimes in which brain data is worth collecting. Our results provide a foundation for understanding how valuable brain data could be for improving machine learning.
2026-05-10T00:58:21Z
9 pages main text, 5 figures, 34 pages of appendix with detailed proofs
Lane Lewis, Zhixin Wang, David Schwab, Xaq Pitkow

http://arxiv.org/abs/2605.09152v1
Meow-Omni 1: A Multimodal Large Language Model for Feline Ethology
2026-05-09T20:30:15Z
Deciphering animal intent is a fundamental challenge in computational ethology, largely because of semantic aliasing, the phenomenon where identical external signals (e.g., a cat's purr) correspond to radically different internal states depending on physiological context. Existing Multimodal Large Language Models (MLLMs) are blind to high-frequency biological time-series data, restricting them to superficial behavioural pattern matching rather than genuine latent-state reasoning.
To bridge this gap, we introduce Meow-Omni 1, the first open-source, quad-modal MLLM purpose-built for computational ethology. It natively fuses video, audio, and physiological time-series streams with textual reasoning. Through targeted architectural adaptation, we integrate specialized scientific encoders into a unified backbone and formalize intent inference via physiologically grounded cross-modal alignment. Evaluated on MeowBench, a novel, expert-verified quad-modal benchmark, Meow-Omni 1 achieves state-of-the-art intent-recognition accuracy (71.16%), substantially outperforming leading vision-language and omni-modal baselines. We release the complete open-source pipeline including model weights, training framework, and the Meow-10K dataset, to establish a scalable paradigm for inter-species intent understanding and to advance foundation models toward real-world veterinary diagnostics and wildlife conservation.
2026-05-09T20:30:15Z
Jucheng Hu, Zhangquan Chen, Yulin Chen, Chengjie Hong, Liang Zhou, Tairan Wang, Sifei Li, Giulio Zhu, Feng Zhou, Yiheng Zeng, Suorong Yang, Dongzhan Zhou

http://arxiv.org/abs/2605.10994v1
Internally triggered retrospective learning in neural networks
2026-05-09T14:30:43Z
Learning in artificial neural networks usually relies on continuous, externally driven weight updates, in which parameters are modified at every step in response to incoming data, error signals or reward feedback. In this setting, routine and informative inputs contribute similarly to parameter adjustment. We introduce a learning approach in which parameter updates are governed by internally generated events arising from the network's own representational dynamics. During ongoing activity, synaptic interactions are accumulated as latent traces encoding recent coactivation patterns, without immediately modifying the underlying parameters.
In parallel, an internal predictive process estimates the evolving latent state, while a scalar measure of discrepancy between predicted and observed states is continuously computed. When discrepancy exceeds an adaptive threshold derived from recent error statistics, a learning event is triggered, inducing a retrospective update selectively integrating past activity into the current configuration. We performed simulations using a minimal neural network exposed to structured sequential inputs with transient perturbations. We found that learning occurs through sparse, temporally localized events associated with increases in prediction error, leading to stepwise changes in synaptic efficacy and discrete transitions in latent state organization. By selectively reorganizing parameters in response to internally detected discrepancies, our episodic updating may reduce unnecessary parameter drift while preserving informative patterns. Potential applications include systems requiring selective adaptation to rare or informative inputs such as physiological, industrial or environmental monitoring, edge computing under limited energy budgets, autonomous systems operating in dynamic conditions and sequential computational data processing.
2026-05-09T14:30:43Z
13 pages, 2 figures
Arturo Tozzi

http://arxiv.org/abs/2602.02494v2
MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training
2026-05-09T11:25:55Z
Clinical brain-to-text interfaces are designed for paralysed patients who cannot provide extensive training recordings. Pre-training improves data-efficient generalisation by learning statistical priors across subjects, but these priors critically depend on context. While natural speech might unfold gradually over minutes, most methods pre-train with only a few seconds of context. Thus, we propose MEG-XL, a model pre-trained with 2.5 minutes of MEG context per sample, 5-300x longer than prior work, and equivalent to 191k tokens, capturing extended neural context.
Fine-tuning on the task of word decoding from brain data, MEG-XL matches supervised performance with a fraction of the data (e.g. 1hr vs 50hrs) and outperforms brain foundation models. We find that models pre-trained with longer contexts learn representations that transfer better to word decoding. Our results indicate that long-context pre-training helps exploit extended neural context that other methods unnecessarily discard. Code, model weights, and instructions are available at https://github.com/neural-processing-lab/MEG-XL .
2026-02-02T18:59:50Z
Published as a conference paper at ICML 2026. 19 pages, 8 figures, 5 tables
Dulhan Jayalath, Oiwi Parker Jones

http://arxiv.org/abs/2605.21506v1
Canonical Functionalism: Defining Functional Structure without Observer-Relative Semantic Maps
2026-05-09T11:08:40Z
Computational functionalism about consciousness is often criticized for relying on observer-relative interpretations of physical systems. This paper proposes a mathematical refinement of functionalism that avoids this problem. The central idea is that consciousness-relevant functional organization should be identified not with arbitrary input-output mappings, semantic labels, or externally imposed computational descriptions, but with a system's canonical functional structure: the minimal state-transition structure obtained by identifying internal states that have identical future behavior under all possible continuations.
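For a finite deterministic machine, identifying states with identical future behavior is the classical state-minimization problem, and a small sketch may make the idea concrete. The abstract does not say how the canonical structure is computed, so everything below is an assumption made for illustration: the restriction to a finite Moore-style machine with observable per-state outputs, the function name `canonical_partition`, and the toy machine. The technique shown is standard partition refinement, not a method taken from the paper.

```python
def canonical_partition(states, inputs, step, output):
    """Group states that produce identical outputs under every input continuation.

    states: list of hashable state names
    inputs: tuple of input symbols
    step:   dict mapping (state, input) -> next state
    output: dict mapping state -> observable output
    """
    # Start by separating states whose immediate outputs differ.
    block_of = {s: output[s] for s in states}
    while True:
        # A state's signature: its current block plus the blocks it reaches.
        signature = {
            s: (block_of[s], tuple(block_of[step[s, a]] for a in inputs))
            for s in states
        }
        # Relabel signatures with small integers so they do not grow each round.
        ids = {}
        refined = {s: ids.setdefault(signature[s], len(ids)) for s in states}
        stable = len(ids) == len(set(block_of.values()))
        block_of = refined
        if stable:  # no block was split, so the partition is final
            break
    groups = {}
    for s in states:
        groups.setdefault(block_of[s], []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical five-state machine: A/B and C/D are behaviorally identical pairs.
STATES = ["A", "B", "C", "D", "E"]
INPUTS = (0, 1)
STEP = {
    ("A", 0): "C", ("A", 1): "A",
    ("B", 0): "D", ("B", 1): "B",
    ("C", 0): "A", ("C", 1): "C",
    ("D", 0): "B", ("D", 1): "D",
    ("E", 0): "E", ("E", 1): "E",
}
OUTPUT = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 0}
blocks = canonical_partition(STATES, INPUTS, STEP, OUTPUT)
```

Each resulting block is one state of the minimized machine. State `E` shares an immediate output with `A` and `B` but is split off because its successors behave differently.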
On this view, a state is functionally defined by its complete counterfactual role: how the system would evolve and respond from that state under possible future interactions. We call this position canonical functionalism. The framework does not claim to identify which systems are conscious, nor to show that functional organization is sufficient for consciousness. Rather, it identifies the canonical object over which functionalist theories of consciousness should be formulated: the task is to specify consciousness-relevant invariants, measures, or structural conditions over canonical functional structures, rather than over arbitrary semantic interpretations or superficial behavioral profiles. This reframes familiar objections about lookup tables, simulations, unfolding, and observer-relative computation: such cases do not by themselves refute functionalism, but force the functionalist to specify whether the relevant canonical structure is preserved, and if not, which additional structural features are missing.
2026-05-09T11:08:40Z
Ryota Kanai, Shuqin Ma

http://arxiv.org/abs/2605.12543v1
Why the Unfinished Keeps Returning: Canxianization and the Dynamics of Conscious Priority
2026-05-09T08:52:04Z
Some conscious contents disappear after access; others return repeatedly, long after their triggering conditions have ceased. We propose Canxianization as the process by which a perturbation becomes closure-resistant self-relevant unfinishedness and thereby acquires recurrent conscious priority. The theory distinguishes this phenomenon from emotional arousal, memory strength, the Zeigarnik effect, curiosity, prediction error, and intrusive thought. A perturbation becomes canxianized when it is attributed to the self-world boundary, value-marked, blocked from causal or action closure, and metacognitively coupled to the self-model.
We distinguish latent canxian strength from observed conscious recurrence, and introduce a Recurrent Priority Index and a Canxian Update Index to separate productive from pathological recurrence. Cold Canxianization, recurrence driven by structural incompleteness rather than affective arousal, is identified as a critical discriminant. Reset Resistance and Stake Transfer tests are proposed for artificial systems. Canxianization is not memory persistence; it is failed self-world repair. The unfinished does not merely remain. When it concerns the self and resists closure, it returns.
2026-05-09T08:52:04Z
Hengjin Cai, Tianqi Cai