Bayesian brain mapping: a population-informed framework for personalized functional network topography and connectivity

2026-05-04T04:52:15Z

The spatial topography of functional brain organization is increasingly recognized to play an important role in cognition and disease. Accounting for individual differences in functional topography is also crucial for accurately distinguishing spatial and temporal aspects of functional brain connectivity. Yet accurate estimation of personalized functional brain networks from functional magnetic resonance imaging (fMRI) without extensive scanning remains challenging due to high noise levels. Here, we describe Bayesian Brain Mapping (BBM), a technique for personalized functional topography and connectivity informed by population information. BBM relies on population-derived priors on both the spatial topography of networks and between-network functional connectivity to guide subject-level estimation and combat noise. These priors are based on existing spatial templates, such as parcellations or continuous network maps, which provides correspondence to those templates. Yet BBM is highly flexible: it avoids strong spatial or temporal constraints and allows for overlap between networks and heterogeneous patterns of engagement. BBM is designed for single-subject analysis, making it computationally efficient and translatable to clinical settings. We describe the BBM model and illustrate the use of the BayesBrainMap R package to construct population-derived priors, fit the model, and perform inference to identify engagements. A demo is provided in an accompanying GitHub repository. We also share priors derived from the Human Connectome Project and provide code to support the construction of priors from different data sources, lowering the barrier to adoption of BBM for studies of individual brain organization.
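To make the role of a population prior concrete, here is a toy normal-normal shrinkage sketch. It is not the BBM model (which also places priors on between-network connectivity and is fit with the BayesBrainMap R package); it only shows how a precision-weighted prior can reduce the error of a noisy single-subject map. The function name `posterior_mean_map`, the simulated maps, and the variance values are hypothetical.

```python
import numpy as np

def posterior_mean_map(subject_map, noise_var, prior_mean, prior_var):
    """Precision-weighted (normal-normal) shrinkage of a noisy subject map toward a population prior.

    Each location is treated independently:
        prior:      s ~ N(prior_mean, prior_var)
        likelihood: y | s ~ N(s, noise_var)
    """
    prior_prec = 1.0 / prior_var
    noise_prec = 1.0 / noise_var
    return (prior_prec * prior_mean + noise_prec * subject_map) / (prior_prec + noise_prec)

rng = np.random.default_rng(0)
n_loc = 5000
prior_mean = rng.normal(0.0, 1.0, n_loc)     # hypothetical population mean map
prior_var = np.full(n_loc, 0.25)             # hypothetical between-subject variance
noise_var = np.full(n_loc, 1.0)              # hypothetical noise level of a short scan

true_map = prior_mean + rng.normal(0.0, np.sqrt(prior_var))
noisy_map = true_map + rng.normal(0.0, np.sqrt(noise_var))
shrunk_map = posterior_mean_map(noisy_map, noise_var, prior_mean, prior_var)

mse_raw = np.mean((noisy_map - true_map) ** 2)       # roughly noise_var
mse_shrunk = np.mean((shrunk_map - true_map) ** 2)   # roughly 1 / (1/prior_var + 1/noise_var)
```

With these assumed variances the shrunken map's mean squared error should sit near 0.2, against about 1.0 for the raw map; the benefit fades as longer scans lower `noise_var`.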

Multimodal Fusion and Interpretability in Human Activity Recognition: A Reproducible Framework for Sensor-Based Modeling

2026-05-03T23:45:19Z

This research introduces a reproducible framework for transforming raw, heterogeneous sensor streams into aligned, semantically meaningful representations for multimodal human activity recognition. Grounded in the Carnegie Mellon University Multi-Modal Activity Database (CMU-MMAC) and focused on the naturalistic Subject 07 Brownie session, the study traces the full pipeline from data ingestion to modeling and interpretation. Instead of treating preprocessing as a black box, it proposes a unified workflow that temporally aligns video, audio, and RFID through resampling, grayscale conversion, sliding-window segmentation, and modality-specific normalization, producing standardized fused tensors suitable for downstream learning. Building on this foundation, the work systematically compares early, late, and hybrid fusion strategies using LSTM-based models implemented with PyTorch and TensorFlow, showing that late fusion consistently achieves the highest validation accuracy, with hybrid fusion outperforming early fusion. To evaluate interpretability and modality contribution, PCA and t-SNE visualizations reveal coherent temporal structure and confirm that video carries more discriminative power than audio, while their combination yields substantial performance gains. Incorporating sparse, asynchronous RFID signals further improves accuracy by over 50% and boosts macro-averaged ROC-AUC, demonstrating the added value of object-interaction cues. Overall, the framework contributes a modular, empirically validated approach to multimodal fusion that links preprocessing design, fusion architecture, and interpretability, offering a transferable template for intelligent systems operating in complex, real-world activity settings.
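The sliding-window segmentation and the difference between early and late fusion can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the study's pipeline: the feature dimensions, window length, step, and equal late-fusion weights are hypothetical, and resampling and grayscale conversion are assumed to have already produced per-frame features at a common rate.

```python
import numpy as np

def zscore(x):
    """Per-channel z-score normalization of a (time, channels) array."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)

def sliding_windows(x, window, step):
    """Segment a (time, channels) array into (n_windows, window, channels)."""
    starts = range(0, x.shape[0] - window + 1, step)
    return np.stack([x[s:s + window] for s in starts])

def early_fusion(*modalities):
    """Concatenate time-aligned windowed modalities along the feature axis."""
    return np.concatenate(modalities, axis=-1)

def late_fusion(*class_probs, weights=None):
    """Weighted average of per-modality class-probability arrays of shape (n, n_classes)."""
    probs = np.stack(class_probs)
    if weights is None:
        weights = np.full(len(class_probs), 1.0 / len(class_probs))
    return np.tensordot(np.asarray(weights), probs, axes=1)

rng = np.random.default_rng(0)
video = zscore(rng.normal(size=(300, 64)))   # hypothetical per-frame video features
audio = zscore(rng.normal(size=(300, 13)))   # hypothetical per-frame audio features
fused = early_fusion(sliding_windows(video, 30, 15), sliding_windows(audio, 30, 15))
# fused has shape (19, 30, 77): one tensor feeding a single early-fusion model

p_video = np.array([[0.8, 0.2]])             # hypothetical output of a video-only model
p_audio = np.array([[0.4, 0.6]])             # hypothetical output of an audio-only model
combined = late_fusion(p_video, p_audio)     # models are combined only at the output
```

Early fusion feeds one model with the concatenated features; late fusion trains one model per modality and combines only their predicted probabilities.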

Singular Bayesian Neural Networks

2026-05-03T22:46:39Z

Bayesian neural networks promise calibrated uncertainty, but standard mean-field Gaussian posteriors require $O(mn)$ parameters for each $m \times n$ weight matrix. We argue this cost is often unnecessary, particularly when weight matrices exhibit fast singular value decay. By parameterizing weights as $W = AB^{\top}$ with $A \in \mathbb{R}^{m \times r}$, $B \in \mathbb{R}^{n \times r}$, we induce a posterior that is singular with respect to the Lebesgue measure, concentrating on the rank-$r$ manifold. This singularity captures structured weight correlations through shared latent factors, which is geometrically distinct from the independence assumption of mean-field posteriors. We derive PAC-Bayes generalization bounds whose complexity term scales as $\sqrt{r(m+n)}$ instead of $\sqrt{m n}$, and prove loss bounds that decompose the error into optimization and rank-induced bias using the Eckart-Young-Mirsky theorem. We further adapt recent Gaussian complexity bounds for low-rank deterministic networks to Bayesian predictive means. Empirically, across MLPs, LSTMs, and Transformers on standard benchmarks, our method achieves competitive predictive performance while using up to $33\times$ fewer parameters than 5-member Deep Ensembles. It substantially improves OOD detection and often improves calibration relative to mean-field and perturbation baselines, while Deep Ensembles can still be stronger on in-distribution likelihood-based metrics.
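A minimal NumPy sketch of the parameterization, assuming independent Gaussian variational factors on $A$ and $B$ (the function names and initial scales are hypothetical). Every sampled $W = AB^{\top}$ has rank at most $r$, which is the sense in which the induced distribution over $W$ is singular, and the parameter count drops from $2mn$ to $2r(m+n)$ when each scalar carries a mean and a scale.

```python
import numpy as np

def meanfield_param_count(m, n):
    # one mean and one scale per weight
    return 2 * m * n

def lowrank_param_count(m, n, r):
    # one mean and one scale per entry of A (m x r) and B (n x r)
    return 2 * r * (m + n)

def sample_lowrank_weight(mu_A, log_sigma_A, mu_B, log_sigma_B, rng):
    """Draw W = A B^T with independent Gaussian factors (reparameterization trick)."""
    A = mu_A + np.exp(log_sigma_A) * rng.standard_normal(mu_A.shape)
    B = mu_B + np.exp(log_sigma_B) * rng.standard_normal(mu_B.shape)
    return A @ B.T

m, n, r = 256, 128, 8
rng = np.random.default_rng(0)
mu_A = 0.1 * rng.standard_normal((m, r))
log_sigma_A = np.full((m, r), -3.0)
mu_B = 0.1 * rng.standard_normal((n, r))
log_sigma_B = np.full((n, r), -3.0)

W = sample_lowrank_weight(mu_A, log_sigma_A, mu_B, log_sigma_B, rng)
```

For this layer size the count is 65,536 variational parameters for mean-field versus 6,144 for the rank-8 factorization.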

Cost-sensitive retraining via posterior learning debt

2026-05-03T19:49:58Z

Deployed prediction systems are often retrained on fixed calendars, even when model staleness and retraining burden vary over time. This short communication formulates retraining for Bayesian prediction systems as a cost-sensitive predictive-regret decision. The central monitoring state is posterior learning debt, defined as the Kullback-Leibler divergence from a reference shadow posterior to the deployed frozen posterior. In the decision layer, a retraining cost is compared with the expected one-period predictive regret of waiting. A continuous-severity version retrains when calibrated expected regret exceeds the retraining cost, while the familiar two-state excess-loss rule is a special case. The empirical study is an exact-state proof-of-concept in a synthetic conjugate simulation with warm-started deployed and shadow normal-inverse-gamma posteriors, separate update, monitoring, and evaluation batches, lagged deployment actions, expanded baseline grids, and score-unit sensitivity. Under the primary 75th-percentile score-unit scaling, an age-adjusted debt-threshold policy improves on tuned calendar retraining in all 72 non-stable scenario cells and on tuned CUSUM in 58 of 72 cells, with mean relative objectives 0.677 and 0.975, respectively. Debt-utility and hybrid-utility policies also improve strongly over tuned calendar retraining, but they do not dominate tuned CUSUM. Median and mean score-unit sensitivities show the same main calendar result, while the CUSUM comparison remains policy-dependent. The contribution is a transparent decision layer for deployed Bayesian prediction systems, not a universal replacement for drift detection.
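One way to compute posterior learning debt in a conjugate setting is sketched below, assuming a univariate normal model with unknown mean and variance, so that each posterior is normal-inverse-gamma with $\mu \mid \sigma^2 \sim N(m, \sigma^2/\lambda)$ and $\sigma^2 \sim \mathrm{IG}(a, b)$. The KL divergence splits into the divergence between the inverse-gamma marginals plus the expected divergence between the conditional normals. The direction KL(shadow || deployed), the tuple layout, and especially the linear regret calibration `kappa * debt` are assumptions for illustration, not the paper's calibrated rule.

```python
import math

def digamma(x):
    """Digamma function for x > 0 via upward recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - f * (1/12 - f * (1/120 - f * (1/252 - f * (1/240 - f / 132))))
    return result

def kl_nig(p, q):
    """KL(NIG(p) || NIG(q)) for tuples (m, lam, a, b):
    mu | s2 ~ N(m, s2 / lam) and s2 ~ Inverse-Gamma(a, b)."""
    m1, l1, a1, b1 = p
    m2, l2, a2, b2 = q
    # KL between the inverse-gamma marginals (equal to KL between the Gamma laws of 1/s2).
    kl_var = ((a1 - a2) * digamma(a1) - math.lgamma(a1) + math.lgamma(a2)
              + a2 * (math.log(b1) - math.log(b2)) + a1 * (b2 - b1) / b1)
    # Expected KL between the conditional normals, using E_p[1/s2] = a1 / b1.
    kl_mean = 0.5 * (l2 / l1 + l2 * (m1 - m2) ** 2 * a1 / b1 - 1.0 + math.log(l1 / l2))
    return kl_var + kl_mean

def learning_debt(shadow, deployed):
    return kl_nig(shadow, deployed)

def should_retrain(expected_regret, retrain_cost):
    return expected_regret > retrain_cost

shadow = (0.40, 30.0, 16.0, 15.0)     # hypothetical shadow posterior (m, lam, a, b)
deployed = (0.10, 20.0, 11.0, 9.0)    # hypothetical frozen deployed posterior
debt = learning_debt(shadow, deployed)
kappa = 0.8                           # hypothetical: expected regret per nat of debt
retrain = should_retrain(kappa * debt, retrain_cost=0.05)
```

The debt is zero when the two posteriors coincide and grows as the shadow posterior drifts away from the frozen one; everything specific to the paper lives in how expected regret is calibrated from that state.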

Deep learning-based pavement performance modeling using multiple distress indicators and road work history

2026-05-03T14:51:20Z

The deterioration of pavement is a complex and dynamic process determined by multiple factors, including material, environment, design, and other unobserved variables. Accurate predictions of pavement condition can help pavement management agencies make the most of available resources through better-coordinated preservation and maintenance activities. This paper uses deep neural networks, namely a convolutional neural network (CNN) and a long short-term memory (LSTM) network, to model the pavement deterioration process. The models draw on pavement condition data and maintenance and rehabilitation history collected by the Texas Department of Transportation over the past 18 years. Twenty-one flexible pavement condition indicators, including cracking, rutting, raveling, and roughness, collected from more than 100,000 pavement sections were included in the proposed models. Preliminary results are promising: in a case study, the proposed CNN model outperforms standard machine learning models in predicting pavement condition values.

Adaptive Influence-Based Borrowing Framework for Improving Treatment Effect Estimation in RCTs Using External Controls

2026-05-03T13:34:29Z

Randomized controlled trials (RCTs) often suffer from limited sample sizes due to high costs and lengthy recruitment periods, compromising precision in treatment effect estimation. External real-world control data offer a valuable opportunity for augmentation, but naïve integration may introduce bias without careful compatibility assessment. This paper presents a practical tutorial on the adaptive influence-based borrowing framework (Yang et al., 2026), which addresses this challenge through a principled, individual-level borrowing strategy. The core intuition is straightforward: rather than indiscriminately pooling all external controls (ECs), the framework first asks how much each external patient would perturb the outcome model fitted using RCT controls. External patients whose inclusion barely changes this model are deemed comparable and prioritized for borrowing, whereas those who substantially shift it are flagged as potentially incompatible. This individual-level compatibility metric, based on the influence score, is then used to construct a sequence of nested candidate subsets of ECs, from which the optimal subset is selected by minimizing the mean squared error of the treatment effect estimator, balancing the competing risks of bias from over-borrowing and imprecision from under-borrowing. When systematic differences between ECs and RCT controls are substantial, an optional outcome calibration step can align the two groups before influence-based selection proceeds. We provide a clear, step-by-step workflow with emphasis on methodological intuition, practical considerations, and visualization, thereby offering a principled, transparent, and practical method for leveraging ECs when RCTs alone are underpowered. Implementation is supported by an accompanying R package, InfluenceBorrowing.
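The influence-ranking step can be illustrated with ordinary least squares. In this sketch, assuming a linear outcome model, an external control's score is the Euclidean shift in the RCT-control coefficients when that one patient is added; this refit-based proxy, the function names, and the simulated data are hypothetical and may differ from the influence score defined by Yang et al. (2026). The MSE-based choice among the nested subsets is not implemented here.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients (intercept first)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef

def influence_scores(X_rct, y_rct, X_ec, y_ec):
    """Refit-based proxy: coefficient shift when one external control is added."""
    base = ols(X_rct, y_rct)
    scores = np.empty(len(y_ec))
    for i in range(len(y_ec)):
        coef_i = ols(np.vstack([X_rct, X_ec[i:i + 1]]), np.append(y_rct, y_ec[i]))
        scores[i] = np.linalg.norm(coef_i - base)
    return scores

def nested_subsets(scores):
    """Candidate borrowing sets: the j most compatible ECs, for j = 0..n_ec."""
    order = np.argsort(scores)
    return [order[:j] for j in range(len(scores) + 1)]

rng = np.random.default_rng(1)
beta_true = np.array([0.5, -0.3])
X_rct = rng.normal(size=(40, 2))
y_rct = 1.0 + X_rct @ beta_true + rng.normal(scale=0.5, size=40)
X_ec = rng.normal(size=(20, 2))
y_ec = 1.0 + X_ec @ beta_true + rng.normal(scale=0.5, size=20)
y_ec[-1] += 10.0            # one deliberately incompatible external control

scores = influence_scores(X_rct, y_rct, X_ec, y_ec)
candidates = nested_subsets(scores)
```

The shifted patient receives the largest score and therefore enters only the largest candidate set, which is the behavior the nested construction is designed to produce.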

The Catastrophic Consequences of Agnosticism for Life Searches and a Possible Workaround

2026-05-03T13:19:46Z

Planned and ongoing searches for life, both biological and technological, confront an epistemic barrier concerning false positives: we don't know what we don't know. The most defensible and agnostic approach is to adopt diffuse (uninformative) priors, not only for the prevalence of life but also for the prevalence of confounders. We evaluate the resulting Bayes factors between the null and life hypotheses for an idealized experiment with $N_{pos}$ positive labels (biosignature detections) among $N_{tot}$ targets under various priors. With diffuse priors, the consequences are catastrophic for life detection: at least ${\sim}10^4$ (for some priors ${\sim}10^{13}$) surveyed targets are required to ever obtain "strong evidence" for life. Accordingly, an HWO-scale survey with $N_{tot}{\sim}25$ would have no prospect of achieving this goal. A previously suggested workaround is to forgo the agnostic confounder prior, for example by asserting an upper limit on the confounder rate, but we find that the results can be highly sensitive to this choice, which is also difficult to justify. Instead, we suggest a novel solution that retains agnosticism: dividing the sample into two groups for which the prevalence of life differs but the confounder rate is global. We show that an $N_{tot}=24$ survey could expect 24% of possible outcomes to produce strong life detections with this strategy, rising to $\geq50$% for $N_{tot}\geq76$. However, this A/B-testing strategy introduces its own challenges for survey design, since it requires two groups with differing life prevalence rates (ideally greatly so) but a shared global confounder rate.
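To see why diffuse priors are so punishing, consider one simple version of the setup, assumed here for illustration: each target is positive with probability $c$ under the null, and with probability $1-(1-f)(1-c)$ under the life hypothesis, with uniform priors on both the life fraction $f$ and the confounder rate $c$. The sketch computes the Bayes factor by numerical integration. For the most favorable outcome, $N_{pos}=N_{tot}=N$, the integral has the closed form $H_{N+1}=\sum_{j=1}^{N+1} 1/j$, which grows only logarithmically in $N$.

```python
import math
import numpy as np

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bayes_factor_life(n_pos, n_tot, grid=1000):
    """Bayes factor (life vs. null) under uniform priors on f and c.

    Null: each target is positive with probability c.
    Life: each target is positive with probability 1 - (1 - f)(1 - c).
    The binomial coefficient cancels in the ratio.
    """
    # Null marginal likelihood: integral of c^k (1 - c)^(N - k) dc = Beta(k + 1, N - k + 1).
    null = math.exp(log_beta(n_pos + 1, n_tot - n_pos + 1))
    # Life marginal likelihood: 2-D midpoint rule over (f, c) in [0, 1]^2.
    mid = (np.arange(grid) + 0.5) / grid
    f, c = np.meshgrid(mid, mid, indexing="ij")
    p = 1.0 - (1.0 - f) * (1.0 - c)
    life = np.mean(p ** n_pos * (1.0 - p) ** (n_tot - n_pos))
    return life / null

bf_perfect = bayes_factor_life(25, 25)   # every one of 25 targets is positive
```

Under these assumptions even a perfect sweep of 25 targets yields a Bayes factor of only about 3.85, and reaching 10 requires $H_{N+1} \geq 10$, i.e. roughly $1.2 \times 10^4$ targets, in line with the order of magnitude quoted above; other diffuse priors give different numbers.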

A Model-Based Restricted Shapley Value to Measure the Players' Contribution to Shot Actions in Football

2026-05-03T07:10:36Z

This paper proposes a novel framework to assess individual player contributions in football, explicitly accounting for the cooperative nature of shot-ending offensive actions. By incorporating team interaction into player evaluation, it also supports economically sustainable decision-making, with practical implications for performance analysis and player scouting. Extending the expected Goal (xG) paradigm, we propose the expected Goal Action (xGA), a measure of shot quality that incorporates build-up play and passing networks. Furthermore, we adapt cooperative game theory and introduce the Player's Restricted Shapley (PRS) statistic, a contribution metric based on restricted coalition structures derived from observed passing interactions, in which xGA is used to compute the cohesion function. Unlike traditional Shapley approaches, the PRS restricts coalitions to tactically admissible player subsets, offering action-specific, interpretable measures of marginal contribution in a cooperative context. We apply the framework to 8,421 shot actions from the 2022/23 Italian Serie A season, and case studies of AC Milan and SSC Napoli reveal heterogeneity in contributions within teams. Finally, combining the PRS statistic with a final efficiency metric highlights discrepancies between cooperative engagement and goal conversion.
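A graph-restricted Shapley value in the spirit of Myerson's communication games can be sketched as follows: a coalition's worth is the sum of a cohesion function over its connected parts in the passing graph, so players who cannot link to the shooter add nothing on their own. This is an assumed stand-in for the PRS statistic, and the three-player passing chain and the xGA-like cohesion values are hypothetical.

```python
from itertools import combinations
from math import factorial

def components(players, edges):
    """Connected components of the passing subgraph induced by `players`."""
    remaining = set(players)
    comps = []
    while remaining:
        stack = [remaining.pop()]
        comp = set(stack)
        while stack:
            u = stack.pop()
            for v in list(remaining):
                if (u, v) in edges or (v, u) in edges:
                    remaining.remove(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(frozenset(comp))
    return comps

def restricted_value(coalition, edges, cohesion):
    """Graph-restricted game: a coalition is worth the sum of its connected parts."""
    return sum(cohesion(c) for c in components(coalition, edges))

def restricted_shapley(players, edges, cohesion):
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                gain = (restricted_value(set(s) | {i}, edges, cohesion)
                        - restricted_value(set(s), edges, cohesion))
                total += weight * gain
        phi[i] = total
    return phi

def toy_cohesion(group):
    # Hypothetical xGA-like value: only groups containing the shooter "S" create a chance.
    if "S" not in group:
        return 0.0
    return 0.3 + 0.1 * (len(group) - 1)

players = ["A", "B", "S"]
edges = {("A", "B"), ("B", "S")}          # A passes to B, B passes to the shooter S
phi = restricted_shapley(players, edges, toy_cohesion)
```

Here the shooter receives the largest share, the player linking the passer to the shooter receives more than the initial passer, and the values sum to the worth of the full chain (0.5), as Shapley efficiency requires.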

Data-driven time-frequency tessellation for signals with oscillatory amplitude envelopes and instantaneous frequency, with application to photoplethysmography

2026-05-03T02:49:17Z

Biomedical signals often comprise multiple non-sinusoidal oscillatory components whose amplitude modulation (AM) and instantaneous frequency (IF) may themselves be governed by additional (second-order) oscillatory dynamics with time-varying amplitude and frequency. We introduce a novel time-frequency (TF) analysis framework, Tessellation-based Ensembled Time-Frequency Representation via Integrated Shifting (TETRIS), built on a proposed generalized adaptive non-harmonic model to leverage second-order oscillatory information in this class of signals. We present the model and algorithm using the photoplethysmogram (PPG) as a canonical example, since its cardiac component is known to encode respiratory information in both AM and IF, and demonstrate how respiratory signals can be recovered from PPG. The central idea of TETRIS is to partition the TF plane along the estimated IF of the cardiac component and to process each partition adaptively to enhance representation quality. This tessellation enables a refined time-frequency representation (TFR), allowing more effective recovery of the respiratory modulation governing the AM of the cardiac component. We provide theoretical justification for the proposed method and validate its performance on semi-synthetic signals. Finally, we demonstrate that TETRIS enables improved reconstruction of multiple surrogate respiratory signals directly from PPG data. While the model and algorithm are developed with a focus on PPG, the framework is flexible and has potential to be applied to other signals.
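The second-order structure that TETRIS exploits can be illustrated with a synthetic single-component signal whose amplitude and instantaneous frequency both oscillate at a respiratory rate. The sketch recovers the AM and IF with the classical analytic signal (a Hilbert transform computed via the FFT); it is not TETRIS, and the sampling rate, cardiac and respiratory frequencies, and modulation depths are hypothetical. Real PPG is non-sinusoidal, which is exactly where this simple approach stops working.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal (discrete Hilbert transform construction)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

fs, duration = 100.0, 60.0
t = np.arange(int(fs * duration)) / fs
f_card, f_resp = 1.2, 0.25                                   # hypothetical rates (Hz)
am = 1.0 + 0.2 * np.cos(2 * np.pi * f_resp * t)              # respiration-driven AM
inst_freq = f_card + 0.05 * np.cos(2 * np.pi * f_resp * t)   # respiration-driven IF
phase = 2 * np.pi * (f_card * t + 0.05 / (2 * np.pi * f_resp) * np.sin(2 * np.pi * f_resp * t))
x = am * np.cos(phase)

z = analytic_signal(x)
envelope = np.abs(z)                                         # recovered AM
est_if = np.diff(np.unwrap(np.angle(z))) * fs / (2 * np.pi)  # recovered IF (Hz)
```

Both `envelope` and `est_if` oscillate at the respiratory rate, which is the information a respiratory surrogate is built from.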

Persistent Homology of Time Series through Complex Networks

2026-05-02T22:28:42Z

We present a unified pipeline for univariate time series classification via complex networks and persistent homology. A time series is mapped to a graph through one of five constructions drawn from three families (visibility graphs, both natural and horizontal; transition graphs; and proximity graphs), and the graph is converted to a dissimilarity matrix from which a Vietoris-Rips filtration yields persistence diagrams. These diagrams are vectorized into fixed-length features through persistence landscapes and topological summary statistics. Because the downstream processing is standardized, differences in classification performance can be attributed to the network construction and distance metric alone. Experiments on twelve UCR benchmarks show that (i) no single construction dominates: the optimal graph type depends on the signal's discriminative structure; (ii) the graph distance metric is a first-order design choice, with diffusion distance uniformly outperforming shortest-path alternatives; and (iii) persistence-based features degrade gracefully under noise, consistent with the classical stability theorem of persistent homology.
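A compact sketch of one branch of such a pipeline, assuming a horizontal visibility graph, a random-walk diffusion distance at time $t=2$, and only 0-dimensional homology, whose finite Vietoris-Rips bars die at the edge lengths of a minimum spanning tree. The diffusion-distance definition (weighted by the stationary distribution), the choice of $t$, and the function names are assumptions; higher-dimensional features would normally come from a dedicated persistent-homology library.

```python
import numpy as np

def horizontal_visibility_graph(x):
    """Adjacency: i ~ j iff every value strictly between them is below min(x[i], x[j])."""
    n = len(x)
    adj = np.zeros((n, n))
    for i in range(n - 1):
        for j in range(i + 1, n):
            if all(x[k] < min(x[i], x[j]) for k in range(i + 1, j)):
                adj[i, j] = adj[j, i] = 1.0
    return adj

def diffusion_distance(adj, t=2):
    """Diffusion distance at time t for the random walk P = D^-1 A."""
    deg = adj.sum(axis=1)
    P = adj / deg[:, None]
    Pt = np.linalg.matrix_power(P, t)
    pi = deg / deg.sum()                      # stationary distribution of the walk
    diff = Pt[:, None, :] - Pt[None, :, :]
    return np.sqrt(((diff ** 2) / pi).sum(axis=-1))

def h0_death_times(dist):
    """Finite H0 bars of the Vietoris-Rips filtration = MST edge lengths (Kruskal)."""
    n = dist.shape[0]
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    candidate_edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in candidate_edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(float(d))
    return deaths

x = np.array([1.0, 3.0, 2.0, 4.0])
adj = horizontal_visibility_graph(x)
dist = diffusion_distance(adj, t=2)
deaths = h0_death_times(dist)
edges = {(int(i), int(j)) for i, j in np.argwhere(np.triu(adj))}
```

With a shortest-path metric on a connected graph every finite H0 bar would die at 1, so the choice of distance directly controls how informative even these simplest features are.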

Threshold Exceedance Estimation in Spatially Correlated Areal Data Using Maxima-Nominated Sampling

2026-05-02T21:33:38Z

We study estimation of the proportion of areal units in a spatially correlated domain whose success probabilities exceed a prespecified threshold. Such problems arise in health surveillance, environmental monitoring, and social policy, where the goal is to estimate the fraction of high-risk areas. We propose a DUST-MNS design that combines maxima-nominated sampling (MNS) with the probability-proportional-to-size dependent unit sequential technique (pps-DUST), thereby promoting spatial spread while mitigating the effect of spatial autocorrelation. The design forms $n$ candidate sets of size $k$ and obtains final measurements only from the area judged to be at highest risk in each set, yielding $n$ measured areas from $nk$ screened candidates. Ranking may be based on expert judgment, prior surveys, or easily obtained auxiliary covariates. We derive a closed-form estimator of the exceedance probability $\theta$ from DUST-MNS data, establish its bias and variance, and show that, in the rare-to-moderate exceedance regime $\theta<\theta^\star(k)$, where the threshold $\theta^\star(k)$ depends only on $k$, the proposed DUST-MNS estimator outperforms its SRS and DUST-SRS counterparts. We also provide guidance on the choice of $k$, derive efficiency bounds under a Beta model, extend the method to imperfect ranking, and develop variance estimation and bootstrap confidence intervals. An application to county-level stroke prevalence data from CDC PLACES, using diabetes prevalence as the ranking concomitant, illustrates the proposed approach.
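The basic logic of maxima nomination can be sketched under strong simplifying assumptions: independent units (no spatial correlation and no DUST step), perfect ranking, and a measurement that reveals whether the nominated unit exceeds the threshold. Then the nominated unit exceeds the threshold exactly when some unit in its set does, so $P(\text{exceed}) = 1-(1-\theta)^k$, which can be inverted. The function name and simulation settings are hypothetical, and the paper's closed-form DUST-MNS estimator may differ.

```python
import random

def mns_exceedance_estimate(n_sets, k, theta, rng):
    """Simulate maxima-nominated sampling and invert P(max exceeds) = 1 - (1 - theta)^k.

    Assumes independent units and perfect ranking, so the nominated (top-ranked)
    unit exceeds the threshold iff at least one unit in its set does.
    """
    hits = 0
    for _ in range(n_sets):
        candidates = [rng.random() < theta for _ in range(k)]   # exceedance indicators
        hits += any(candidates)                                 # outcome for the nominated unit
    p_hat = hits / n_sets
    return 1.0 - (1.0 - p_hat) ** (1.0 / k)

estimate = mns_exceedance_estimate(n_sets=20000, k=3, theta=0.1, rng=random.Random(1))
```

A delta-method calculation under the same assumptions gives a variance of about $\theta/(kn)$ for small $\theta$, against $\theta(1-\theta)/n$ for simple random sampling with the same number of measurements, which hints at why the gains concentrate in the rare-exceedance regime.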

The Partial Testimony of Logs: Evaluation of Language Model Generation under Confounded Model Choice

2026-05-02T07:55:55Z

Offline evaluation of language models from usage logs is biased when model choice is confounded: the same user-side factors that influence which model is used can also influence how its output is judged, so raw comparisons of logged scores mix self-selected populations rather than estimating a common quantity of interest. A small randomized experiment can break this bias by overriding model choice, but in practice such experiments are scarce and costly. We study a three-source design that combines a large confounded observational log (OBS) for scale, a small randomized experiment (EXP) for unconfounded scoring, and an offline simulator (SIM) that replays candidate models on cached contexts. Our main result is an identification theorem showing that the randomized experiment and the simulator are together enough to recover causal model values; the observational log enters only afterward, to reduce estimation error rather than to make the causal comparison valid. Six estimator families are evaluated in a controlled semi-synthetic validation and in two real-task cached benchmarks for summarization and coding. No family dominates every regime; relative performance depends on the amount of unbiased EXP supervision and on how closely the target reward aligns with OBS-derived structure.
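The confounding problem itself is easy to reproduce in a toy simulation. In this sketch, which assumes a hypothetical user-side difficulty variable that both pushes users toward model B and lowers judged scores, the raw logged gap has the wrong sign, while a coin-flip experiment recovers the true gap of 0.2. It illustrates only why randomization helps; it does not implement the three-source identification result or the simulator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_gap = 0.2

def judged_score(model_b, difficulty, rng):
    # Hypothetical judged quality: model B is better by true_gap; harder prompts score lower.
    return true_gap * model_b - 0.5 * difficulty + rng.normal(scale=0.5, size=difficulty.shape)

# Observational log (OBS-like): users with harder prompts pick model B more often.
difficulty_obs = rng.normal(size=n)
chose_b = rng.random(n) < 1.0 / (1.0 + np.exp(-2.0 * difficulty_obs))
obs_scores = judged_score(chose_b, difficulty_obs, rng)
naive_gap = obs_scores[chose_b].mean() - obs_scores[~chose_b].mean()

# Randomized experiment (EXP-like): model choice overridden by a coin flip.
difficulty_exp = rng.normal(size=n)
assign_b = rng.random(n) < 0.5
exp_scores = judged_score(assign_b, difficulty_exp, rng)
randomized_gap = exp_scores[assign_b].mean() - exp_scores[~assign_b].mean()
```

Because model B's users face harder prompts in the log, the naive comparison mixes the model effect with the difficulty of the self-selected population.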

Factor State Space Modelling of the Ornstein-Uhlenbeck Process with Measurement Error and its Application

2026-05-02T05:41:48Z

Standard Ornstein-Uhlenbeck (OU) models often yield biased parameter estimates when measurement error is ignored. While the Ornstein-Uhlenbeck State Space Model (OUSSM) addresses this in univariate settings, multidimensional extensions remain limited. This paper introduces the factor OUSSM to model multidimensional, mean-reverting systems with observational noise. We resolve critical identifiability challenges in parameter estimation by establishing the necessary constraints, and we validate the method through extensive simulations. We demonstrate the model's versatility by analyzing human gut microbiome dynamics and North Atlantic Sea Surface Temperature (SST) data. The results reveal distinct latent temporal structures in both biological and environmental systems, establishing the factor OUSSM as a robust framework for multivariate time series analysis.
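Why ignoring measurement error biases OU estimates can be seen in a univariate simulation: fitting an AR(1) slope directly to noisy observations attenuates the autoregressive coefficient $e^{-\theta \Delta}$ toward zero, which inflates the estimated mean-reversion rate. The sketch uses the exact OU discretization; the parameter values and function names are hypothetical, and it does not implement the factor OUSSM, which would instead treat the latent process as the state of a state-space model.

```python
import numpy as np

def simulate_ou(n, dt, theta, mu, sigma, x0, rng):
    """Exact discretization of dX = theta (mu - X) dt + sigma dW."""
    phi = np.exp(-theta * dt)
    step_sd = sigma * np.sqrt((1 - phi ** 2) / (2 * theta))
    x = np.empty(n)
    x[0] = x0
    for t in range(1, n):
        x[t] = mu + phi * (x[t - 1] - mu) + step_sd * rng.standard_normal()
    return x

def naive_ar1_coef(y):
    """Least-squares AR(1) slope fitted directly to the observed series."""
    yc = y - y.mean()
    return np.dot(yc[1:], yc[:-1]) / np.dot(yc[:-1], yc[:-1])

rng = np.random.default_rng(0)
theta, mu, sigma, dt = 0.5, 0.0, 1.0, 0.1     # stationary variance sigma^2 / (2 theta) = 1
x = simulate_ou(20_000, dt, theta, mu, sigma, 0.0, rng)
y = x + rng.normal(scale=1.0, size=x.shape)   # measurement error with variance 1

phi_true = np.exp(-theta * dt)
phi_naive = naive_ar1_coef(y)
theta_naive = -np.log(phi_naive) / dt
```

With equal signal and noise variance the naive slope is roughly halved, so the implied mean-reversion rate comes out more than ten times too large.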

GEEPERs: Principal Stratification using Principal Scores and Stacked Estimating Equations

2026-05-01T20:45:44Z

Principal stratification is a framework for making sense of causal effects conditioned on variables that may themselves have been affected by the treatment. For instance, in an evaluation of an educational intervention, some subjects in the treatment group may not fully utilize the intervention, and researchers may be interested in how this subgroup is affected. Most principal stratification estimators rely on strong structural or modeling assumptions and often require advanced statistical training to fit and evaluate, making them inaccessible to many applied researchers. In this paper, we introduce a new principal effect estimator for one-way noncompliance based on a binary indicator. Estimates may be computed using conventional regression methods (though the standard errors require a specialized sandwich estimator) and do not rely on distributional assumptions. We present a simulation study that demonstrates the novel method's greater robustness compared to popular alternatives and illustrate the method through a real-data analysis.
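For orientation, a generic principal-score weighting estimator for one-way noncompliance is sketched below, assuming principal ignorability: compliance is observed only in the treatment arm, a logistic model for it supplies weights for the control arm, and the complier effect is a difference of means. This is a standard textbook construction, not the GEEPERs estimator, and it omits the sandwich standard errors the paper emphasizes; the function names and simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, s, n_iter=25):
    """Newton-Raphson logistic regression with an intercept."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (s - p)
        hess = Xd.T @ (Xd * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def principal_score_effect(X, z, s, y):
    """Complier effect under one-way noncompliance (s is used only where z == 1)."""
    beta = fit_logistic(X[z == 1], s[z == 1])
    Xd0 = np.column_stack([np.ones((z == 0).sum()), X[z == 0]])
    e0 = 1.0 / (1.0 + np.exp(-Xd0 @ beta))       # principal scores for control units
    treated_compliers = y[(z == 1) & (s == 1)].mean()
    weighted_controls = np.sum(e0 * y[z == 0]) / np.sum(e0)
    return treated_compliers - weighted_controls

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
s = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)   # would-be users of the intervention
z = (rng.random(n) < 0.5).astype(int)                          # randomized assignment
y = x + 1.0 * s * z + rng.normal(scale=0.5, size=n)            # effect of 1.0 among compliers only
estimate = principal_score_effect(x, z, s, y)
```

Weighting controls by their estimated probability of compliance reweights them to resemble the treated compliers on the covariates, which is what makes the simple difference of means interpretable.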

High-Dimensional Multivariate VAR Estimation with Spatio-Temporal Structure

2026-05-01T17:49:01Z

High-dimensional vector autoregressive (VAR) models provide a flexible framework for characterizing dynamic dependence in multivariate spatio-temporal systems, but their unrestricted estimation becomes infeasible when multiple variables are observed over many spatial locations. This paper develops a structured estimation procedure for high-dimensional multivariate VAR processes that explicitly incorporates spatial information. We decompose each block transition matrix into a cross-variable dependence coefficient and a spatial transition matrix, and constrain the spatial transition matrices through a pre-specified spatial graph. The resulting estimator is formulated as a weighted $\ell_1$-regularized least-squares problem, where the weights encode spatial proximity or topological similarity and induce stronger shrinkage on spatially implausible interactions. Since the objective function is biconvex, we estimate the cross-variable dependence matrix and the spatial transition matrices through an alternating convex-search algorithm implemented with ADMM. Under stability and restricted-eigenvalue-type conditions for high-dimensional VAR processes, we establish convergence to a blockwise stationary point in the subgradient sense and derive high-probability estimation error bounds for both components of the model. Simulation studies demonstrate that the proposed estimator accurately recovers sparse transition structures and improves over existing two-step $\ell_1$-regularized methods in support recovery and estimation accuracy. An application to North American climate data illustrates that the method recovers interpretable variable-dependence networks and spatial interaction patterns across different climate regions.
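The role of spatial weights in the penalty can be illustrated with a single-variable VAR(1) on a line of locations. This sketch swaps in plain proximal gradient descent (ISTA) for one spatial transition matrix instead of the paper's bilinear decomposition solved by alternating convex search with ADMM; the squared-graph-distance weights, penalty level, and simulated system are hypothetical. Soft-thresholding with location-specific thresholds is what drives spatially implausible coefficients exactly to zero.

```python
import numpy as np

def soft_threshold(a, thresh):
    return np.sign(a) * np.maximum(np.abs(a) - thresh, 0.0)

def weighted_lasso_var1(X, weights, lam, n_iter=1000):
    """ISTA for (1 / 2T) ||X_next - A X_prev||_F^2 + lam * sum_ij weights_ij |A_ij|."""
    X_prev, X_next = X[:, :-1], X[:, 1:]
    T = X_prev.shape[1]
    Sxx = X_prev @ X_prev.T / T
    Syx = X_next @ X_prev.T / T
    step = 1.0 / np.linalg.eigvalsh(Sxx).max()   # 1 / Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], X.shape[0]))
    for _ in range(n_iter):
        grad = A @ Sxx - Syx
        A = soft_threshold(A - step * grad, step * lam * weights)
    return A

p, T, burn = 10, 2000, 200
idx = np.arange(p)
dist = np.abs(idx[:, None] - idx[None, :])       # graph distance on a path of locations
A_true = np.where(dist == 0, 0.5, np.where(dist == 1, 0.2, 0.0))

rng = np.random.default_rng(0)
X = np.zeros((p, T + burn + 1))
for t in range(1, X.shape[1]):
    X[:, t] = A_true @ X[:, t - 1] + rng.standard_normal(p)
X = X[:, burn:]

weights = dist.astype(float) ** 2                # diagonal unpenalized, distant pairs shrunk hardest
A_hat = weighted_lasso_var1(X, weights, lam=0.03)
```

Because the penalty weight grows with graph distance, coefficients between far-apart locations are set exactly to zero while near-neighbor effects survive with only mild shrinkage.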