http://arxiv.org/abs/2604.21647v1 Exploring climate change effects on concurrent floods and concurrent droughts via statistical deep learning 2026-04-23T13:09:44Z

Concurrent floods and concurrent droughts in nearby catchments pose challenges to risk assessment and water management. Climate change is affecting extremely high and low discharge, but the complex interplay between changes in individual catchments and in the dependence across catchments makes it difficult to provide accurate assessments of the occurrence probabilities of concurrent extremes. In this work, we use a contemporary statistical deep learning model (the deep SPAR framework) to capture concurrent river floods and droughts in four catchments in the Upper Danube basin, based on discharge simulated by a hydrological model driven with large ensemble climate model output. The statistical model is able to accurately capture the multivariate extremes of the simulated discharge, which we assess by making use of the large available sample size. We subsequently use our statistical model to study changes in joint tail behaviour of discharge over time, finding that both compound flooding and drought-like conditions are becoming increasingly likely towards the end of the 21st century under a high-emission scenario. In particular, our results highlight that changes in the dependence structure of extremes strongly contribute to the detected changes, an aspect that would be difficult to capture with traditional approaches. This work paves the way for highly flexible, general inference on compound extremes in hydrological applications, and demonstrates key advantages of using statistical deep learning in this setting.

2026-04-23T13:09:44Z C. J. R. Murphy-Barltrop J. Richards B. Poschlod A. Sasse J. Zscheischler http://arxiv.org/abs/2604.21545v1 Informed Asymmetric Dirichlet Priors for Multivariate Bernoulli Mixture Models 2026-04-23T11:11:41Z

Clustering multivariate binary data is of interest in many scientific fields, including ecology, biomedicine, and social policy. Beyond heuristic clustering algorithms, such data can be modelled using multivariate Bernoulli mixture models. Many Bayesian implementations of these models involve a trade-off between computational efficiency and full posterior inference. We propose instead a Bayesian approach able to provide both aspects. The method fixes the total number of components to a large value and employs an asymmetric Dirichlet prior on the mixture weights. The asymmetric Dirichlet hyperparameters are elicited using the popular Penalized Complexity prior framework, which provides an intuitive way for users to inform the induced distribution of the number of clusters. An efficient MCMC algorithm is then developed to fit the model. Simulations and real-world applications demonstrate that the method is competitive with existing alternatives and can outperform them in certain settings. The proposal is illustrated using an ecological dataset about presence-absence of species across multiple sites, where cluster-specific parameters are modelled on the basis of environmental conditions. Overall, the proposed method provides a computationally efficient, fully Bayesian, and interpretable framework for clustering multivariate binary data, with potential applications across diverse scientific domains.

2026-04-23T11:11:41Z 44 pages, 11 figures Luisa Ferrari Maria Franco Villoria Garritt L. Page Alex Laini http://arxiv.org/abs/2604.21498v1 Analyzing directional errors in spatial orientation using nonparametric circular regression with mixed covariates 2026-04-23T10:01:03Z

Spatial orientation is a fundamental cognitive skill that relies on sensory information to update perceived direction. Understanding how sensory conditions influence directional accuracy is important for both cognitive science and the design of assistive technologies. We analyze experimental data in which blind, low-vision, and sighted participants performed spatial updating tasks under five sensory conditions, with signed angular error as the response. To model these data, we propose a nonparametric circular regression framework that accommodates both continuous and categorical predictors via a product-kernel estimator. Bandwidth selection is crucial in this setting, yet developing practical data-driven methods remains challenging. We derive asymptotic bias and variance expressions for the estimator, though these results do not directly lead to a feasible plug-in bandwidth selector. To address this, we develop a bootstrap bandwidth selection criterion tailored to the cosine loss and compare it with cross-validation and rule-of-thumb approaches in simulation studies. Applied to the spatial updating data, the proposed framework reveals nonlinear, condition-specific patterns and quantifies uncertainty via simultaneous bootstrap confidence bands. Across the scenarios considered, the proposed bootstrap selector achieves a favorable bias-variance trade-off and yields stable inference relative to the competing methods. An implementation is available in the R package circMixedReg.
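
The product-kernel idea in the abstract above can be illustrated with a local-constant (Nadaraya-Watson type) estimator for a circular response. This is only a minimal sketch under stated assumptions, not the circMixedReg implementation: the Gaussian kernel for the continuous predictor, the simple "1 if equal, else lam" kernel for the categorical predictor, and all function and parameter names are hypothetical choices made here for illustration.

```python
import math


def gaussian_kernel(u):
    # Unnormalised Gaussian kernel; the normalising constant cancels in atan2.
    return math.exp(-0.5 * u * u)


def categorical_kernel(a, b, lam):
    # Assumed categorical kernel: full weight for the same category,
    # weight lam in [0, 1] for a different one.
    return 1.0 if a == b else lam


def circular_nw(x0, c0, xs, cs, thetas, h, lam):
    """Local-constant estimate of the mean direction at (x0, c0).

    xs: continuous predictor values, cs: category labels,
    thetas: angular responses in radians, h: bandwidth, lam: category smoothing.
    """
    sin_sum = 0.0
    cos_sum = 0.0
    for x, cat, theta in zip(xs, cs, thetas):
        # Product kernel: continuous weight times categorical weight.
        w = gaussian_kernel((x - x0) / h) * categorical_kernel(cat, c0, lam)
        sin_sum += w * math.sin(theta)
        cos_sum += w * math.cos(theta)
    # atan2 of the weighted sine and cosine sums respects the wrap-around at pi.
    return math.atan2(sin_sum, cos_sum)
```

Averaging sines and cosines rather than the raw angles is what keeps two errors near +pi and -pi from averaging to zero; the bandwidth h and the smoothing weight lam are exactly the quantities a selector such as the bootstrap criterion would have to choose.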

2026-04-23T10:01:03Z 33 pages, 13 figures, 3 tables Mario Francisco-Fernández Andrea Meilán-Vila http://arxiv.org/abs/2604.21491v1 Benchmarking the Utility of Privacy-Preserving Cox Regression Under Data-Driven Clipping Bounds: A Multi-Dataset Simulation Study 2026-04-23T09:53:15Z

Differential privacy (DP) is a mathematical framework that guarantees individual privacy; however, systematic evaluation of its impact on statistical utility in survival analyses remains limited. In this study, we systematically evaluated the impact of DP mechanisms (Laplace mechanism and Randomized Response) with data-driven clipping bounds on the Cox proportional hazards model, using 5 clinical datasets ($n = 168$--$6{,}524$), 15 levels of $\varepsilon$ (0.1--1000), and $B = 1{,}000$ Monte Carlo iterations. The data-driven clipping bounds used here are observed min/max and therefore do not provide formal $\varepsilon$-DP guarantees; the results represent an optimistic lower bound on utility degradation under formal DP. We compared three types of input perturbations (covariates only, all inputs, and the discrete-time model) with output perturbations (dfbeta-based sensitivity), using loss of significance rate (LSR), C-index, and coefficient bias as metrics. At standard DP levels ($\varepsilon \leq 1$), approximately 90% (90--94%) of the significant covariates lost significance, even in the largest dataset ($n = 6{,}524$), and the predictive performance approached random levels (test C-index $\approx 0.5$) under many conditions. Among the input perturbation approaches, perturbing only covariates preserved the risk-set structure and achieved the best recovery, whereas output perturbation (dfbeta-based sensitivity) maintained near-baseline performance at $\varepsilon \geq 5$. At $n \approx 3{,}000$, the significance recovered rapidly at $\varepsilon = 3$--10; however, in practice, $\varepsilon \geq 10$ (for predictive performance) to $\varepsilon \geq 30$--60 (for significance preservation) is required. In the moderate-to-high $\varepsilon$ range, false-positive rates increased for variables whose baseline $p$-values were near the significance threshold.
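
As a rough illustration of input perturbation with clipping, the sketch below clips each value of one covariate to given bounds and adds Laplace noise with scale (upper - lower) / epsilon. It assumes per-record noise and a sensitivity equal to the clipping range; the abstract does not spell out these details, so they are assumptions, as are the function names. As the abstract notes, bounds taken from the observed min/max do not provide a formal epsilon-DP guarantee.

```python
import random


def clip(value, lower, upper):
    # Force the value into [lower, upper].
    return max(lower, min(upper, value))


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_column(values, lower, upper, epsilon, rng):
    """Clip each value and add Laplace noise (assumed per-record mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if upper <= lower:
        raise ValueError("upper must exceed lower")
    scale = (upper - lower) / epsilon  # assumed sensitivity: the clipping range
    return [clip(v, lower, upper) + laplace_noise(scale, rng) for v in values]
```

A call such as `perturb_column(ages, min(ages), max(ages), 1.0, random.Random(0))` (with a hypothetical `ages` list) mimics the data-driven bounds; at epsilon = 1 the noise scale equals the full range of the covariate, which gives some intuition for the loss of significance reported at small epsilon.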

2026-04-23T09:53:15Z 11 pages, 6 figures, 5 tables. Supplementary material (5 pages, 2 figures, 3 tables) included as ancillary file. Submission to IEEE Journal of Biomedical and Health Informatics (J-BHI) Keita Fukuyama Yukiko Mori Tomohiro Kuroda Hiroaki Kikuchi http://arxiv.org/abs/2506.04292v3 GARG-AML against Smurfing: A Scalable and Interpretable Graph-Based Framework for Anti-Money Laundering 2026-04-23T09:21:51Z

Purpose: We introduce GARG-AML, a fast and transparent graph-based method to catch 'smurfing', a common money-laundering tactic. It assigns a single, easy-to-understand risk score to every account in both directed and undirected networks. Unlike overly complex models, it balances detection power with the speed and clarity that investigators require. Methodology: The method maps an account's immediate and secondary connections (its second-order neighbourhood) into an adjacency matrix. By measuring the density of specific blocks within this matrix, GARG-AML flags patterns that mimic smurfing behaviour. We further boost the model's performance using decision trees and gradient-boosting classifiers, testing the results against the current state of the art on both synthetic and open-source data. Findings: GARG-AML matches or beats state-of-the-art performance across all tested datasets. Crucially, it easily processes the massive transaction graphs typical of large financial institutions. By leveraging only the adjacency matrix of the second-order neighbourhood and basic network features, this work highlights the potential of fundamental network properties for advancing fraud detection. Originality: The originality lies in the translation of human expert knowledge of smurfing directly into a simple network representation, rather than relying on uninterpretable deep learning. Because GARG-AML is built expressly for the real-world business demands of scalability and interpretability, banks can easily incorporate it into their existing AML solutions.

2025-06-04T11:30:37Z Bruno Deprez Bart Baesens Tim Verdonck Wouter Verbeke http://arxiv.org/abs/2604.21457v1 Context-Aware Displacement Estimation from Mobile Phone Data: A Methodological Framework 2026-04-23T09:14:11Z

Timely population displacement estimates are critical for humanitarian response during disasters, but traditional surveys and field assessments are slow. Mobile phone data enables near real-time tracking, yet existing approaches apply uniform displacement definitions regardless of individual mobility patterns, misclassifying regular commuters as displaced. We present a methodological framework addressing this through three innovations: (1) mobility profile classification distinguishing local residents from commuter types, (2) context-aware between-municipality displacement detection accounting for expected location by user type and day of week, and (3) operational uncertainty bounds derived from baseline coefficient of variation with a disaster adjustment factor, intended for humanitarian decision support rather than formal statistical inference. The framework produces three complementary metrics scaled to population with uncertainty bounds: displacement rates, origin-destination flows, and return dynamics. An Aparri case study following Super Typhoon Nando (2025, Philippines) applies the framework to vendor-provided daily locations from Globe Telecom. Context-aware detection reduced estimated between-municipality displacement by 1.6-2.7 percentage points on weekdays versus naive methods, attributable to the commuter exception but not independently validated. The method captures between-municipality displacement only. Within-municipality evacuation falls outside scope. The single-case demonstration establishes proof of concept. External validity requires application across multiple events and locations. The framework provides humanitarian actors with operational displacement information while preserving individual privacy through aggregation.

2026-04-23T09:14:11Z 24 pages, 4 figures, 14 tables. Case study: Super Typhoon Nando, Philippines (2025) Rajius Idzalika Muhammad Rheza Muztahid Radityo Eko Prasojo http://arxiv.org/abs/2604.21372v1 Optimal basis risk weighting in expectile-based parametric insurance 2026-04-23T07:35:29Z

Parametric insurance contracts translate index measurements to compensation for policyholders' losses using predefined payment schemes. These need to be designed carefully to keep basis risk, i.e. the disparity between payouts and true damages, small. Previous research has motivated the use of conditional expectiles as payment schemes, whose compensation is impacted by the policyholder's potentially unknown attitude towards basis risk. To alleviate this model uncertainty and to investigate the impact of (hidden) influencing factors, we characterize existence and uniqueness of the optimal basis risk weighting in a utility-maximization framework through a set of boundary conditions. In the absence of an optimal solution, we provide comparisons to the utility of no insurance and full indemnity coverage. We establish a link between location-scale distributions and separability of conditional expectiles' derivatives, thus improving the understanding of these statistical functionals. A simulation study on parametric hurricane insurance visualizes our results, investigates the influence of premium loading and risk aversion on the optimal weighting, and comments on the challenge of (spatial) loss dependence.
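
For readers unfamiliar with expectiles, the sketch below computes an unconditional sample expectile as the minimiser of an asymmetrically weighted squared loss. It is a generic illustration, not the paper's conditional payment scheme: the function name and the fixed-point iteration are choices made here, and whether the asymmetry level `tau` matches the paper's basis risk weighting is an assumption.

```python
def sample_expectile(values, tau, tol=1e-12, max_iter=1000):
    """Sample expectile at level tau in (0, 1).

    Minimises sum_i |tau - 1{v_i <= e}| * (v_i - e)**2 over e by iterating
    the weighted-mean fixed point implied by the first-order condition.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(values) / len(values)  # start at the mean (the tau = 0.5 expectile)
    for _ in range(max_iter):
        # Observations above the current estimate get weight tau, others 1 - tau.
        weights = [tau if v > e else 1.0 - tau for v in values]
        new_e = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e
```

With `tau = 0.5` the weights are symmetric and the result is the ordinary mean; moving `tau` towards 1 penalises shortfalls below the realised value more heavily and pushes the expectile upwards, which is the mechanism by which an attitude towards basis risk can shift the payout.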

2026-04-23T07:35:29Z Markus Johannes Maier Matthias Scherer http://arxiv.org/abs/2604.21292v1 Large values in time series and additive combinatorics 2026-04-23T05:11:05Z

It is well-known in industrial data science that large values of real-life time series tend to be structured and often follow concrete and visible patterns. In this paper, we use ideas from additive combinatorics and discrete Fourier analysis to give this heuristic a mathematical foundation. Our main tool is the Fourier ratio, a complexity measure previously used in compressed sensing, combined with a generalized version of Chang's lemma from additive combinatorics. Together, these yield a precise prediction: when the Fourier ratio of a time series is small, the set of its largest values can be additively generated by a very small set using only $\{-1,0,1\}$ coefficients. We test this prediction on US inflation data and Delhi climate data, both in their original form and after mean-centering. The numerical results confirm the predicted structure: a generating set of size $4$--$7$ suffices to span large spectra containing dozens of points, even when the Fourier ratio is large enough that our theoretical bounds become loose. These findings provide a rigorous explanation for why extreme values in real-world data are information-rich and structurally significant.

2026-04-23T05:11:05Z 13 pages, 6 figures Alex Iosevich Vishal Gupta http://arxiv.org/abs/2604.21115v1 Complex Approximate Message Passing with Non-separable Denoising 2026-04-22T21:58:28Z

Approximate Message Passing (AMP) is a general framework for iterative algorithms, originally developed for compressed sensing and later extended to a wide range of high-dimensional inference problems. Although recent work has advanced matrix AMP, complex AMP, and AMP for non-separable functions independently, a unified state evolution theory for complex AMP with non-separable denoisers has been lacking. This article fills that gap by establishing state evolution in the setting of complex, non-separable denoising functions. The proposed approach constructs an augmented real-valued system that lifts the problem to a higher-dimensional space, then recovers the complex domain through a many-to-one canonical transformation. Under this construction, the Onsager correction naturally involves Wirtinger derivatives, and the resulting state evolution reduces to scalar complex recursions despite the non-separable structure of the denoisers. The framework extends to the matrix-valued setting, accommodating multiple feature vectors simultaneously. This generalization enables AMP to exploit joint structural constraints, such as simultaneous group and element sparsity, in complex-valued recovery problems. The complex sparse group least absolute shrinkage and selection operator (LASSO) serves as a key instantiation, motivated by preamble detection in Orthogonal Time-Frequency Space (OTFS)-based unsourced random access. Numerical experiments confirm that state evolution accurately predicts performance and show that complex non-separable denoising can produce significant gains over separable and real-valued alternatives.

2026-04-22T21:58:28Z Vishnu Teja Kunde Alessandro Mirri Jean-Francois Chamberland Enrico Paolini http://arxiv.org/abs/2604.21067v1 The geometry of conflict : 3D Spatio-temporal patterns in fatalities prediction 2026-04-22T20:20:58Z

Understanding how conflict events spread over time and space is crucial for predicting and mitigating future violence. However, progress in this area has been limited by the lack of methods capable of capturing the intricate, dynamic patterns of conflict diffusion. Untangling these complex trends requires flexible models. This study addresses this gap by analyzing spatio-temporal conflict fatality data using an innovative approach that transforms the data into three-dimensional patterns at the Prio-Grid level. In this paper, a shape-based model called ShapeFinder is adapted. By applying the Earth Mover's Distance (EMD) algorithm, we detect and classify these patterns, allowing us to compare and match patterns with high adaptive capacity in all dimensions. Using similar historical patterns, we generate predictions of conflict fatalities and compare these with forecasts from the Views ensemble model, a leading benchmark. Our findings demonstrate that recognizing and analyzing conflict diffusion patterns significantly improves predictive accuracy, outperforming the benchmark model. This research contributes to the study of conflict dynamics by introducing a novel pattern recognition framework that enhances the analysis of spatio-temporal data and offers practical applications for early warning systems.
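
To give a feel for the distance used for pattern matching, the sketch below computes the Earth Mover's Distance between two one-dimensional histograms on the same equally spaced bins. This is a deliberate simplification: the paper works with three-dimensional patterns, and the function name and the normalisation to unit mass are assumptions made for this illustration.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two histograms on identical unit-width bins.

    Both histograms are normalised to total mass 1; in one dimension the
    distance is the sum of absolute differences of the cumulative sums.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")
    diffs = [a / total_p - b / total_q for a, b in zip(p, q)]
    # Each running sum is the mass that still has to cross that bin boundary.
    return sum(abs(c) for c in accumulate(diffs))
```

Unlike a bin-by-bin comparison, this distance grows with how far mass has to move, so a fatality peak shifted by one time step counts as closer than one shifted by five, which is the property that makes it suitable for matching shapes.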

2026-04-22T20:20:58Z 68 Pages, 34 figures Thomas Schincariol http://arxiv.org/abs/2409.07609v2 Survival of the Cheapest: Cost-Aware Hardware Adaptation for Adversarial Robustness 2026-04-22T17:36:44Z

Deploying adversarially robust machine learning systems requires continuous trade-offs between robustness, cost, and latency. We present an autonomic decision-support framework providing a quantitative foundation for adaptive hardware selection and hyper-parameter tuning in cloud-native deep learning. The framework applies accelerated failure time (AFT) models to quantify the effect of hardware choice, batch size, epochs, and validation accuracy on model survival time. This framework can be naturally integrated into an autonomic control loop (monitor--analyse--plan--execute, MAPE-K), where system metrics such as cost, robustness, and latency are continuously evaluated and used to adapt model configurations and hardware selection. Experiments across three GPU architectures confirm the framework is both sound and cost-effective: the Nvidia L4 yields a 20% increase in adversarial survival time while costing 75% less than the V100, demonstrating that expensive hardware does not necessarily improve robustness. The analysis further reveals that model inference latency is a stronger predictor of adversarial robustness than training time or hardware configuration.
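
For orientation, a generic accelerated failure time model with the covariates named in the abstract can be written as below. The linear form, the coefficient symbols, and the unspecified error distribution are assumptions of this sketch; the abstract does not state which AFT family the authors fit, and a categorical hardware variable would in practice enter through indicator terms.

```latex
\log T_i = \beta_0
  + \beta_1\,\mathrm{hardware}_i
  + \beta_2\,\mathrm{batch\ size}_i
  + \beta_3\,\mathrm{epochs}_i
  + \beta_4\,\mathrm{accuracy}_i
  + \sigma\,\varepsilon_i
```

Here $T_i$ is the survival time of model configuration $i$, $\varepsilon_i$ is an error term with a fixed distribution, and $\sigma$ is a scale parameter. In this form, $\exp(\beta_j)$ is the factor by which survival time is stretched or shrunk per unit increase in covariate $j$.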

2024-09-11T20:43:59Z Charles Meyers Mohammad Reza Saleh Sedghpour Tommy Löfstedt Erik Elmroth http://arxiv.org/abs/2604.20625v1 Dynamic Prediction of the Target Survival Time in Metastatic Solid Tumor Cancer Clinical Trials 2026-04-22T14:40:03Z

Overall survival (OS) is the gold standard for assessing patient benefit and cost-effectiveness of new cancer drugs. However, it is often difficult to use OS as the primary endpoint in randomized clinical trials (RCTs) for patients with metastatic cancer, for multiple reasons. In recent years, progression-free survival (PFS) has increasingly been used as the primary endpoint in metastatic cancer RCTs to accelerate development. However, regulatory authorities often seek mature OS data for approval. Therefore, it is critical to determine the target time when OS data are expected to be mature for reliable statistical inference. Motivated by an advanced renal cell carcinoma (RCC) clinical trial, we develop and investigate different prediction models leveraging information from disease progression to improve target OS prediction times. We propose a multivariate joint modeling approach considering components of progression and OS and extend three models commonly used for association to be used for OS prediction. To the best of our knowledge, this is the first comprehensive statistical study exploring the prediction of OS using different levels of information on disease progression and illustrating these models using a real, complex dataset. Our findings have significant implications for OS prediction.

2026-04-22T14:40:03Z Sidi Wang Kelley Kidwell Bo Huang Satrajit Roychoudhury http://arxiv.org/abs/2604.20611v1 Bayesian Inference for Incomplete 2x2 Diagnostic Tables 2026-04-22T14:26:32Z

Incomplete reporting of diagnostic accuracy data remains a persistent problem in medical research. In many studies, only part of the 2x2 diagnostic table is reported, leaving denominators for diseased and non-diseased groups unknown and preventing direct calculation of sensitivity, specificity, predictive values, and related operating characteristics. To address this limitation, we develop hierarchical Bayesian models for reconstructing incomplete 2x2 diagnostic tables from such partial information. Two motivating scenarios are considered: one in which only a single test-outcome row is observed, and another in which true positives, false positives, and the total sample size are reported but the remaining cells are missing. The proposed models are illustrated on a benchmark breast MRI study with complete counts, treated as partially observed in order to assess reconstruction performance under controlled missingness. The framework yields posterior inference for the missing cell counts and associated diagnostic measures, together with uncertainty quantification in weakly identified settings.
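
The sketch below shows why partial reporting blocks the usual calculations. It computes the standard measures from a complete 2x2 table, then enumerates every completion consistent with the second scenario in the abstract (true positives, false positives, and total sample size known). The function names are hypothetical, and the enumeration only illustrates the identification problem; it is not the hierarchical Bayesian model itself.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Standard operating characteristics from a complete 2x2 table.

    Assumes every denominator is positive (no empty row or column).
    """
    return {
        "sensitivity": tp / (tp + fn),  # true positives among the diseased
        "specificity": tn / (tn + fp),  # true negatives among the non-diseased
        "ppv": tp / (tp + fp),          # diseased among test positives
        "npv": tn / (tn + fn),          # non-diseased among test negatives
    }


def feasible_completions(tp, fp, n):
    """All (fn, tn) pairs consistent with reported tp, fp and total n."""
    remaining = n - tp - fp  # test-negative subjects, split between fn and tn
    if remaining < 0:
        raise ValueError("tp + fp exceeds the total sample size")
    return [(fn, remaining - fn) for fn in range(remaining + 1)]
```

In this scenario the positive predictive value is fixed by the reported cells, but sensitivity and specificity vary across the feasible completions, so any inference about them has to come from prior structure, consistent with the abstract's reference to weakly identified settings.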

2026-04-22T14:26:32Z 21 pages, 10 tables. Supplementary materials and reproducible code available at https://github.com/saraantonijevic/bayesian_diagnostic_table-reconstruction Sara Antonijevic Danielle Sitalo Brani Vidakovic http://arxiv.org/abs/2505.13106v5 How to optimise tournament draws: The case of the FIFA World Cup 2026-04-22T13:12:46Z

The organisers of major sports competitions use different policies with respect to constraints in the group draw. Our paper aims to rationalise these choices by analysing the trade-off between attractiveness (the number of games played by teams from the same geographic zone) and fairness (the departure of the draw mechanism from a uniform distribution). A parametric optimisation model is formulated and applied to the 2018 and 2022 FIFA World Cup draws. A flaw of the draw procedure is identified: the pre-assignment of the host to a group unnecessarily increases the distortions. All Pareto efficient sets of draw constraints are determined via simulations. The proposed framework can be used to find the optimal draw rules and justify the non-uniformity of the draw procedure for the stakeholders.

2025-05-19T13:36:00Z 32 pages, 8 figures, 6 tables International Transactions in Operational Research, 2026, forthcoming László Csató http://arxiv.org/abs/2503.16744v3 Modeling and forecasting subnational age distribution of death counts 2026-04-22T12:18:54Z

Existing mortality forecasting methods focus on age-specific mortality rates, which lie in an unconstrained space and overlook the distributional nature of life-table death counts. Few studies have developed and compared forecasting methods that model the shape and dynamics of the age distribution of deaths, especially at the subnational level, where data quality varies greatly. This paper presents several forecasting methods to model and forecast the subnational age distribution of death counts. The age distribution of death counts has many similarities to probability density functions, which are non-negative and have a constrained integral, and thus live in a constrained nonlinear space. To address the nonlinear nature of these objects, we implement a cumulative distribution function transformation that is scale-free and monotone. Using subnational Japanese life-table death counts from the Japanese Mortality Database (2025), we evaluate the forecast accuracy of the transformation and forecasting methods. The improved forecast accuracy of life-table death counts implemented here will be of great interest to demographers in estimating regional age-specific survival probabilities and life expectancy, and to actuaries as a foundation for exploring potential applications in determining annuity prices for various ages and maturities.
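
The cumulative-sum step behind a CDF transformation can be sketched as follows. This covers only the mapping from death counts to a cumulative distribution and back; any further transformation or forecasting step in the paper is not reproduced, and the function names and the `radix` argument are assumptions of this illustration.

```python
from itertools import accumulate


def to_cdf(death_counts):
    """Map life-table death counts by age to a cumulative distribution.

    Dividing by the total makes the result scale-free, and because the
    counts are non-negative the output is non-decreasing and ends at 1.
    """
    total = sum(death_counts)
    if total <= 0:
        raise ValueError("death counts must have a positive total")
    return [c / total for c in accumulate(death_counts)]


def from_cdf(cdf, radix):
    """Recover death counts from a CDF by differencing and rescaling to radix."""
    counts = []
    previous = 0.0
    for value in cdf:
        counts.append((value - previous) * radix)
        previous = value
    return counts
```

Working on the cumulative scale means a forecast only has to stay monotone between 0 and 1, and differencing afterwards automatically returns non-negative counts that sum to the chosen radix.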

2025-03-20T23:11:50Z 45 pages, 9 figures, 7 tables Han Lin Shang Cristian F. Jiménez-Varón