http://arxiv.org/abs/2605.12760v1 How long should a block be? 2026-05-12T21:14:09Z

The block maximum method, which is widely used in extreme value analysis, uses a generalized extreme value distribution to approximate that of the maximum of m observations. The quality of this approximation depends on the value of m and may be poor if m is too small. Surprisingly little attention has been paid to the choice of the block length, although a good choice is crucial to the success of the method. In this paper we assess the effect of taking excessively long blocks in terms of asymptotic relative efficiency, and propose likelihood-based approaches and graphical diagnostics to determine whether a proposed block length is suitable, allowing for potential rounding and left-censoring of observations. We investigate our ideas using simulation and illustrate them using wind speed, river flow and rainfall data.
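As a rough illustration of the setup described above, the following sketch extracts block maxima for a chosen block length m and evaluates the generalized extreme value distribution function used to approximate their distribution. The function names `block_maxima` and `gev_cdf` are hypothetical, and the sketch does not reproduce the paper's likelihood-based diagnostics; it only shows the objects the abstract refers to.

```python
import math


def block_maxima(xs, m):
    """Split xs into consecutive blocks of length m and return each block's maximum.

    An incomplete final block is dropped (an assumption of this sketch).
    """
    n_blocks = len(xs) // m
    return [max(xs[i * m:(i + 1) * m]) for i in range(n_blocks)]


def gev_cdf(x, mu, sigma, xi):
    """Generalized extreme value distribution function G(x; mu, sigma, xi).

    G(x) = exp(-(1 + xi * z)^(-1/xi)) with z = (x - mu) / sigma,
    and the Gumbel limit exp(-exp(-z)) when xi == 0.
    """
    z = (x - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: below the lower endpoint (xi > 0)
        # or above the upper endpoint (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


if __name__ == "__main__":
    data = [1, 5, 2, 8, 3, 4, 7]
    print(block_maxima(data, 3))       # maxima of [1, 5, 2] and [8, 3, 4]
    print(gev_cdf(0.0, 0.0, 1.0, 0.0)) # Gumbel case evaluated at the location
```

Longer blocks give fewer maxima to fit, which is the efficiency trade-off the abstract mentions.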

2026-05-12T21:14:09Z 18 pages, plus supplementary material Léo R. Belzile Anthony C. Davison http://arxiv.org/abs/2503.17606v2 Combining longitudinal cohort studies to examine cardiovascular risk factor trajectories across the adult lifespan 2026-05-12T17:07:47Z

We introduce a statistical framework for combining data from multiple large longitudinal cardiovascular cohorts to enable the study of long-term cardiovascular health starting in early adulthood. Using data from seven cohorts belonging to the Lifetime Risk Pooling Project (LRPP), we present a Bayesian hierarchical multivariate approach that jointly models multiple longitudinal risk factors over time and across cohorts. Because few cohorts in our project cover the entire adult lifespan, our strategy uses information from all risk factors to increase precision for each risk factor trajectory and borrows information across cohorts to fill in unobserved risk factors. We develop novel diagnostic testing and model validation methods to ensure that our model robustly captures and maintains critical relationships over time and across risk factors. Our modeling reveals substantial age-related variation in risk factor trajectories, with patterns that differ across life stages, subgroups, and cohorts, thereby highlighting key periods for cardiovascular prevention and monitoring. Keywords: Bayesian hierarchical models; Missing data; Model validation; Multiple imputation; Random effects.

2025-03-22T01:21:13Z Zeynab Aghabazaz Michael J Daniels Hongyan Ning Donald M. Lloyd-Jones Juned Siddique http://arxiv.org/abs/2604.16642v2 Geometric coherence of single-cell CRISPR perturbations reveals regulatory architecture and predicts cellular stress 2026-05-12T17:01:20Z

Genome engineering has achieved remarkable sequence-level precision, yet predicting the transcriptomic state that a cell will occupy after perturbation remains an open problem. Single-cell CRISPR screens measure how far cells move from their unperturbed state, but this effect magnitude ignores a fundamental question: do the cells move together? Two perturbations with identical magnitude can produce qualitatively different outcomes if one drives cells coherently along a shared trajectory while the other scatters them across expression space. We introduce a geometric stability metric, Shesha, that quantifies the directional coherence of single-cell perturbation responses as the mean cosine similarity between individual cell shift vectors and the mean perturbation direction. Across five CRISPR datasets (2,200+ perturbations spanning CRISPRa, CRISPRi, and pooled screens), stability correlates strongly with effect magnitude (Spearman $ρ=0.75-0.97$), with a calibrated cross-dataset correlation of 0.97. Crucially, discordant cases where the two metrics decouple expose regulatory architecture: pleiotropic master regulators such as CEBPA and GATA1 pay a "geometric tax," producing large but incoherent shifts, while lineage-specific factors such as KLF1 produce tightly coordinated responses. After controlling for magnitude, geometric instability is independently associated with elevated chaperone activation (HSPA5/BiP; $ρ_{partial}=-0.34$ and $-0.21$ across datasets), and the high-stability/high-stress quadrant is systematically depleted. The magnitude-stability relationship persists in scGPT foundation model embeddings, confirming it is a property of biological state space rather than linear projection. Perturbation stability provides a complementary axis for hit prioritization in screens, phenotypic quality control in cell manufacturing, and evaluation of in silico perturbation predictions.
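The stability metric described above (mean cosine similarity between each cell's shift vector and the mean perturbation direction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes shift vectors are computed as perturbed-cell expression minus a control centroid, and the names `directional_coherence` and `control_centroid` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def directional_coherence(perturbed, control_centroid):
    """Mean cosine similarity between per-cell shifts and the mean shift.

    perturbed: list of expression vectors, one per perturbed cell.
    control_centroid: mean expression vector of unperturbed cells (assumed given).
    """
    shifts = [[x - c for x, c in zip(cell, control_centroid)] for cell in perturbed]
    dim = len(control_centroid)
    mean_shift = [sum(s[i] for s in shifts) / len(shifts) for i in range(dim)]
    return sum(cosine(s, mean_shift) for s in shifts) / len(shifts)


if __name__ == "__main__":
    origin = [0.0, 0.0]
    coherent = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]    # all cells move the same way
    scattered = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]  # cells move in different directions
    print(directional_coherence(coherent, origin))
    print(directional_coherence(scattered, origin))
```

The two toy inputs differ in coherence even though individual shift magnitudes are comparable, which is the distinction between magnitude and stability that the abstract draws.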

2026-04-17T19:01:05Z Prashant C. Raju http://arxiv.org/abs/2605.12248v1 Time-variant reliability using time-dependent surrogate models 2026-05-12T15:15:14Z

Time-variant reliability analysis is a critical task for ensuring the safety of engineering dynamical systems subjected to stochastic excitations. However, assessing failure probability for realistic systems with Monte-Carlo simulation-based methods is often computationally intractable due to the high cost of the underlying models and the large number of simulations required. While surrogate models such as polynomial chaos expansions or Kriging are well-established for time-invariant reliability problems, their direct application to time-dependent systems remains challenging. This chapter introduces two advanced surrogate modeling frameworks designed specifically for dynamical systems: manifold-NARX (mNARX) and functional NARX (F-NARX). The mNARX approach constructs the surrogate on a reduced-order manifold of auxiliary state variables, enabling the efficient handling of high-dimensional inputs by embedding physical insight into a regression formulation. Conversely, the F-NARX framework exploits the functional nature of system trajectories, extracting principal component features from continuous time windows to mitigate issues associated with discrete lag selection and long-memory effects. We demonstrate the efficacy of these methods on two benchmark reliability problems: a stochastic quarter-car model and a hysteretic Bouc-Wen oscillator. The results highlight that, when combined with suitably biased experimental designs, both frameworks accurately capture the tail behavior of the system response, enabling precise and efficient estimation of first-passage probabilities.

2026-05-12T15:15:14Z Stefano Marelli Styfen Schär Bruno Sudret http://arxiv.org/abs/2501.16931v2 Beyond Point Estimates: Distributional Uncertainty in Machine Learning Performance Evaluation 2026-05-12T14:55:21Z

Machine learning models are often evaluated using point estimates of performance metrics such as accuracy, F1 score, or mean squared error. Such summaries fail to capture the inherent variability induced by stochastic elements of the training process, including data splitting, initialization, and hyperparameter optimization. This work proposes a distributional perspective on model evaluation by treating performance metrics as random quantities rather than fixed values. Instead of focusing solely on aggregate measures, empirical distributions of performance metrics are analyzed using quantiles and corresponding confidence intervals. The study investigates point and interval estimation of quantiles based on real-data use cases for classification and regression tasks, complemented by simulation studies for validation. Special emphasis is placed on small sample sizes, reflecting practical constraints in machine learning, where repeated training is computationally expensive. The results show that meaningful statistical inference on the underlying performance distribution is feasible even with sample sizes in the range of 10-25, while standard nonparametric confidence intervals remain applicable under these conditions. The proposed approach provides a more detailed characterization of variability and uncertainty compared to mean-based evaluation and enables a more differentiated comparison of models. In particular, it supports a risk-oriented interpretation of model performance, which is relevant in applications where reliability is critical. The presented methods are easy to implement and broadly applicable, making them a practical extension to standard performance evaluation procedures in machine learning.
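One common nonparametric interval for a quantile uses order statistics together with binomial probabilities. The sketch below assumes that this is the kind of "standard" interval meant above, which the abstract does not spell out; the function names are hypothetical. It returns the two order statistics and the exact coverage, or `None` when the sample is too small for the requested level.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(min(k, n) + 1))


def quantile_ci(sample, p, alpha=0.05):
    """Order-statistic confidence interval for the p-quantile.

    Picks ranks l < u with P(B <= l-1) <= alpha/2 and P(B <= u-1) >= 1 - alpha/2,
    so that [x_(l), x_(u)] has coverage of at least 1 - alpha for continuous data.
    Returns (lower, upper, exact_coverage), or None if no such ranks exist.
    """
    xs = sorted(sample)
    n = len(xs)
    lower = [l for l in range(1, n + 1) if binom_cdf(l - 1, n, p) <= alpha / 2]
    upper = [u for u in range(1, n + 1) if binom_cdf(u - 1, n, p) >= 1 - alpha / 2]
    if not lower or not upper:
        return None
    l, u = max(lower), min(upper)
    coverage = binom_cdf(u - 1, n, p) - binom_cdf(l - 1, n, p)
    return xs[l - 1], xs[u - 1], coverage


if __name__ == "__main__":
    scores = list(range(1, 21))            # 20 hypothetical performance values
    print(quantile_ci(scores, 0.5))        # interval for the median
    print(quantile_ci(range(10), 0.9))     # too few runs for a two-sided 95% interval
```

With 10 runs, a two-sided 95% interval for the 0.9-quantile does not exist under this construction, which illustrates why sample size matters for extreme quantiles.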

2025-01-28T13:21:34Z 21 pages, 9 figures Christoph Lehmann Yahor Paromau http://arxiv.org/abs/2503.15821v4 Temporal Point Process Modeling of Aggressive Behavior Onset in Psychiatric Inpatient Youths with Autism 2026-05-12T14:49:30Z

Aggressive behavior, including aggression towards others and self-injury, occurs in up to 80% of children and adolescents with autism, making it a leading cause of behavioral health referrals and a major driver of healthcare costs. Predicting when autistic youth will exhibit aggression can be challenging due to their communication difficulties. Many are minimally verbal or have poor emotional insight. Recent advances in Machine Learning and wearable biosensing demonstrate the ability to predict aggression within a limited future window (typically one to three minutes) in autistic individuals. However, existing works do not estimate aggression onset probability or the expected number of aggression onsets over longer periods, nor do they provide interpretable insights into onset dynamics. To address these limitations, we apply Temporal Point Processes (TPPs) - particularly self-exciting Hawkes processes - to model the timing of aggressive behavior onsets in psychiatric inpatient autistic youth. We benchmark several TPP models by evaluating their goodness-of-fit and predictive metrics. Our results demonstrate that self-exciting TPPs more accurately capture the irregular and clustered nature of aggression onsets, especially compared to traditional Poisson models. These incipient findings suggest that TPPs can provide interpretable, probabilistic forecasts of aggression onset along a time continuum, supporting future clinical decision-making and preemptive intervention.
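To make the self-exciting idea concrete, here is a minimal sketch of a Hawkes conditional intensity with an exponential kernel, together with its integral over an observation window. The exponential kernel and the parameter names `mu`, `alpha`, and `beta` are assumptions chosen for illustration; the abstract does not specify the kernels the authors benchmark.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity lambda(t) = mu + sum_{t_i < t} alpha * exp(-beta * (t - t_i)).

    mu: baseline rate; alpha: jump size after each event; beta: decay rate.
    """
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in event_times if s < t)


def hawkes_compensator(T, event_times, mu, alpha, beta):
    """Integral of the intensity over [0, T], given events observed in [0, T].

    Each past event contributes (alpha / beta) * (1 - exp(-beta * (T - t_i))).
    """
    excitation = sum(
        (alpha / beta) * (1.0 - math.exp(-beta * (T - s))) for s in event_times if s < T
    )
    return mu * T + excitation


if __name__ == "__main__":
    onsets = [0.0, 0.4, 0.5]  # hypothetical onset times, arbitrary units
    print(hawkes_intensity(0.6, onsets, mu=0.5, alpha=0.8, beta=1.2))
    print(hawkes_intensity(10.0, onsets, mu=0.5, alpha=0.8, beta=1.2))
    print(hawkes_compensator(10.0, onsets, mu=0.5, alpha=0.8, beta=1.2))
```

Shortly after a cluster of events the intensity sits well above the baseline and then decays back towards `mu`, which is how such a model represents clustered onsets; a homogeneous Poisson model corresponds to `alpha = 0`.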

2025-03-20T03:12:54Z Accepted to Nature Scientific Reports. Updated results on Hawkes Process with Power Law intensity, and made stricter conditions for sampling evaluation points in the Mean Absolute Percent Error and ROC-AUC calculations. Small notation discrepancies fixed Michael Potter Michael Everett Ashutosh Singh Georgios Stratis Yuna Watanabe Ahmet Demirkaya Deniz Erdogmus Tales Imbiriba Matthew S. Goodwin 10.1038/s41598-026-46996-8 http://arxiv.org/abs/2605.12577v1 Circula-based multivariate distributions on the flat torus, with applications in structural biology 2026-05-12T13:15:22Z

Modeling dependencies between random variables independently from their marginals is fundamental in applications ranging from finance to (structural) biology. In this work, we address this problem using circulae to model data living on the $d$-dimensional flat torus $\mathbb{T}^d$, making two contributions. First, using a low rank covariance structure to define circulae based on a latent variable model, we design the first closed-form normalized distribution on the flat torus $\mathbb{T}^d$ with covariance structure. Second, building on this framework, we propose the first models for joint distributions of torsion angles (backbone and side-chains) for neighboring amino acids in proteins. In practice, we fit mixtures on flat tori from $\mathbb{T}^{2}$ to $\mathbb{T}^{14}$, and show they are SOTA in terms of likelihood and sparsity. We anticipate that these models will prove fundamental for moving from discrete structural studies, as in AlphaFold2, to thermodynamics and kinetics, which are the ultimate goals in theoretical biophysics.

2026-05-12T13:15:22Z Guillaume Carrière Alix Lhéritier Frédéric Cazals http://arxiv.org/abs/2605.12089v1 Power Studies For Two-Sample and Goodness-of-Fit Methods For Multivariate Data 2026-05-12T13:10:26Z

We present the results of a large number of simulation studies regarding the power of various goodness-of-fit as well as non-parametric two-sample tests for multivariate data. In two dimensions this includes both continuous and discrete data, in higher dimensions continuous data only. In general, no single method can be relied upon to provide good power: any one method may be quite good for some combination of null hypothesis and alternative and may fail badly for another. Based on the results of these studies we propose a fairly small number of methods chosen such that for any of the case studies included here at least one of the methods has good power. The studies were carried out using the R packages MD2sample and MDgof, available from CRAN.

2026-05-12T13:10:26Z Wolfgang Rolke http://arxiv.org/abs/2605.11987v1 Random-Set Graph Neural Networks 2026-05-12T11:38:13Z

Uncertainty quantification has become an important factor in understanding the data representations produced by Graph Neural Networks (GNNs). Although their predictive capabilities are useful across industrial settings, the inherent uncertainty induced by the nature of the data is a major limiting factor for GNN performance. While aleatoric uncertainty is the result of noisy and incomplete stochastic data such as missing edges or over-smoothing, epistemic uncertainty arises from lack of knowledge about a system or model (e.g., a graph's topology or node feature representation), which can be reduced by gathering more data and information. In this paper, we propose a new framework in which node-level epistemic uncertainty is modelled in a belief function (finite random set) formalism. The resulting Random-Set Graph Neural Networks have a belief-function head predicting a random set over the list of classes, from which both a precise probability prediction and a measure of epistemic uncertainty can be obtained. Extensive experiments on 9 different graph learning datasets, including real-world autonomous driving benchmarks such as nuScenes and ROAD, demonstrate RS-GNN's superior uncertainty quantification capabilities.

2026-05-12T11:38:13Z 23 pages, 6 figures Tommy Woodley Shireen Kudukkil Manchingal Matteo Tolloso Davide Bacciu Fabio Cuzzolin http://arxiv.org/abs/2605.11926v1 An ensemble prediction method for forecasting sap flux density and water-use in temperate trees 2026-05-12T10:39:37Z

Efficient irrigation management is crucial to agriculture, forestry and horticulture, especially under climate change. Developments in novel sensors and Internet of Things technology provide an opportunity to carry out real-time monitoring of tree sap flux density, which, when coupled with advanced modelling techniques, enables online prediction of tree water-use suitable for irrigation planning. This manuscript proposes one such pipeline that integrates tree sap flow sensors, weather station sensors, and statistical models to predict tree daily water-use. In particular, an ensemble prediction approach based on additive models has been developed, using weather data as the main predictors of sap flux density. The method simultaneously considers the non-linear relationships and interactions between sap flux density and its environmental drivers, as well as the variability among individual trees over different growing seasons. Using field data collected on nine species of trees over the 2022, 2023 and 2024 growing seasons, this manuscript demonstrates the ability of the proposed ensemble prediction method to produce reliable daily water-use forecasts. The challenge of predicting tree water-use under climate stress, such as heatwaves, and the impact of tree size on prediction are also discussed. Despite the complexity of the problem, the proposed method provides a general framework which can be used in a variety of settings, from commercial tree growing to conservation work. The model can be integrated into an online monitoring platform, assisting real-time decision making on irrigation management.

2026-05-12T10:39:37Z Main manuscript: 18 pages, 6 figures. Supplementary document: 11 pages, 10 figures Mengyi Gong Rebecca Killick Andrew Hirons http://arxiv.org/abs/2605.11684v1 Partial Model Sharing Improves Byzantine Resilience in Federated Conformal Prediction 2026-05-12T07:42:20Z

We propose a Byzantine-resilient federated conformal prediction (FCP) method that leverages partial model sharing, where only a subset of model parameters is exchanged each round. Unlike existing robust FCP approaches that primarily harden the calibration stage, our method protects both the federated training and conformal calibration phases. During training, partial sharing inherently restricts the attack surface and attenuates poisoned updates while reducing communication. During calibration, clients compress their non-conformity scores into histogram-based characterization vectors, enabling the server to detect Byzantine clients via distance-based maliciousness scores and to estimate the conformal quantile using only benign contributors. Experiments across diverse Byzantine attack scenarios show that the proposed method achieves closer-to-nominal coverage with substantially tighter prediction intervals than standard FCP, establishing a robust and communication-efficient approach to federated uncertainty quantification.
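The calibration step mentioned above ends with a conformal quantile computed from benign contributors only. The sketch below shows the usual split-conformal quantile rule and why excluding a flagged client matters. It is a simplification under stated assumptions: raw non-conformity scores are pooled directly (the paper instead has clients send histogram-based characterization vectors), the set of benign clients is taken as given, and all function and client names are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1) * (1 - alpha))-th smallest score.

    Returns infinity when n is too small for the requested level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def pooled_benign_quantile(client_scores, benign_ids, alpha):
    """Pool scores from clients judged benign and compute the conformal quantile.

    client_scores: dict mapping client id to a list of non-conformity scores.
    benign_ids: ids retained after Byzantine detection (assumed given here).
    """
    pooled = [s for cid in benign_ids for s in client_scores[cid]]
    return conformal_quantile(pooled, alpha)


if __name__ == "__main__":
    client_scores = {
        "a": [1, 2, 3, 4, 5],
        "b": [6, 7, 8, 9, 10],
        "evil": [1000] * 5,  # hypothetical Byzantine client inflating its scores
    }
    print(pooled_benign_quantile(client_scores, ["a", "b"], alpha=0.1))
    print(pooled_benign_quantile(client_scores, ["a", "b", "evil"], alpha=0.1))
```

Including the inflated scores pushes the quantile, and hence the width of every prediction interval, far beyond what the benign data support.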

2026-05-12T07:42:20Z 5 pages, 4 figures, Accepted for presentation at the 34th European Signal Processing Conference (EUSIPCO 2026) in Bruges, Belgium Ehsan Lari Reza Arablouei Stefan Werner http://arxiv.org/abs/2605.11614v1 Fairness Testing for Algorithmic Pricing 2026-05-12T06:43:46Z

Algorithmic systems now set prices across auto insurance, credit, and lending markets, and regulators increasingly require firms to demonstrate that these systems do not discriminate against protected groups. The standard audit regresses pricing output on a protected attribute and legitimate rating factors, then tests the resulting coefficient using ordinary least squares standard errors. We show that this approach is structurally invalid. Pricing algorithms are usually deterministic, so residuals reflect approximation error rather than sampling variability, rendering classical standard errors invalid in both direction and magnitude. We derive correct asymptotic variance estimators for OLS and GLM audit regressions and the correct cross-covariance formula for proxy discrimination testing. Applied to quoted premiums from 34 Illinois auto insurers, every insurer fails the conditional demographic parity test, with minority zip codes paying $34-$158 more per year than comparable-risk white zip codes. The standard proxy discrimination formula flags zero insurers. However, our corrected formula identifies all 34 as statistically significant, of which 16 exceed the substantive threshold. Our framework provides statistically valid audit tools for any deterministic algorithmic system subject to regression-based fairness testing.

2026-05-12T06:43:46Z Fei Huang Giles Hooker http://arxiv.org/abs/2605.11531v1 Generative climate downscaling enables high-resolution compound risk assessment by preserving multivariate dependencies 2026-05-12T04:56:59Z

Physics-based climate projections using general circulation models are essential for assessing future risks, but their coarse resolution limits regional decision-making. Statistical downscaling can efficiently add detail, yet many methods treat variables independently, degrading inter-variable relationships that govern compound hazards such as heat stress, drought, and wildfire. Here we show that a diffusion-based multivariate generative framework, combined with bias correction, recovers degraded inter-variable correlations even under a 50$\times$ increase in linear resolution. When applied to five meteorological variables over Japan, the framework reduces inter-variable correlation errors by more than fourfold relative to existing baselines while improving both univariate and spatial accuracy, leading to more accurate detection of severe drought. These results demonstrate that multivariate generative downscaling improves the reliability of compound risk assessment under large resolution gaps.

2026-05-12T04:56:59Z Takuro Kutsuna Noriko N. Ishizaki Norihiro Oyama Hiroaki Yoshida http://arxiv.org/abs/2605.11394v1 Spatial Adapter: Structured Spatial Decomposition and Closed-Form Covariance for Frozen Predictors 2026-05-12T01:29:56Z

We present the Spatial Adapter, a parameter-efficient post-hoc layer that equips any frozen first-stage predictor with a structured spatial representation of its residual field and an induced closed-form spatial covariance. The adapter operates as a cascade second stage on residuals, jointly learning a spatially regularized orthonormal basis and per-sample scores via a tractable mini-batch ADMM procedure, without modifying any first-stage parameter. Because the first-stage parameters are frozen, the adapter does not retrain the backbone; its role is to supply a compressed distributional summary of the residual field. Smoothness, sparsity, and orthogonality together turn a generic low-rank factorization into an identifiable spatial representation whose induced residual covariance admits a closed-form low-rank-plus-noise estimator; the effective rank is determined data-adaptively by spectral thresholding, while the nominal rank K is an optimization-side upper bound only. This covariance enables kriging-style spatial prediction at unobserved locations, with plug-in uncertainty quantification as a secondary downstream use. Across synthetic data, Weather2K for spatial-holdout prediction, and GWHD patch grids as a basis-transferability diagnostic, the adapter recovers residual spatial structure when paired with frozen first stages from linear models to deep spatiotemporal and vision backbones; the added representation uses fewer than K(N+T) parameters alongside a compact residual-trend network.

2026-05-12T01:29:56Z Preprint. 10 pages main text, with appendices Wen-Ting Wang Wei-Ying Wu Hao-Yun Huang Xuan-Chun Wang http://arxiv.org/abs/2605.11371v1 Statistical evaluation of measurement precision in linear dose-response relationships via interlaboratory studies 2026-05-12T00:48:38Z

This paper proposes a framework for evaluating the statistical precision of measurement methods from interlaboratory studies where the outcome is a dose-response relationship summarized by a regression line. For such measurement methods, where a linear mixed-effects model is applied that allows laboratories to differ in both baseline level and dose-response slope, we define precision evaluation metrics specified in ISO 5725, repeatability and between-laboratory variances. These are method-level precision metrics, and the latter are constructed as design-averaged dose-specific between-laboratory variances over the dose levels and the participating laboratories. For fully balanced designs with common dose levels and equal replication, we obtain an exact decomposition of the total sum of squares, closed-form analysis of variance (ANOVA) estimators of the precision variances, and three associated $F$-tests targeting (i) the overall dose-response trend, (ii) homogeneity of intercepts, and (iii) homogeneity of slopes across laboratories. This formulation enables precision to be quantified and estimated directly and supports an evaluation of whether between-laboratory discrepancies are caused primarily by baseline shifts or by differences in sensitivity, in contrast to fixed-effect comparisons that only detect the presence of differences. Furthermore, we analyze data obtained from an interlaboratory study on observations in bronchoalveolar lavage fluid from experiments involving the intratracheal administration of nanomaterials to rats, using the proposed method as a case study.

2026-05-12T00:48:38Z Jun-ichi Takeshita Yuto Ikeuchi Tomomichi Suzuki