http://arxiv.org/abs/2602.10125v6 How segmented is my network? 2026-04-29T20:02:58Z

Network segmentation is a popular security practice for limiting lateral movement, yet practitioners lack a metric to measure how segmented a network actually is. We define segmentedness as the fraction of potential node-pair communications disallowed by policy -- equivalently, the complement of graph edge density -- and show it to be the first statistically principled scalar metric for this purpose. Then, we derive a normalized estimator for segmentedness and evaluate its uncertainty using confidence intervals. For a 95\% confidence interval with a margin-of-error of $\pm 0.1$, we show that a minimum of $M=97$ sampled node pairs is sufficient. This result is independent of the total number of nodes in the network, provided that node pairs are sampled uniformly at random. We evaluate the estimator through Monte Carlo simulations on Erdős--Rényi, stochastic block models, and real-world enterprise network datasets, demonstrating accurate estimation. Finally, we discuss applications of the estimator, such as baseline tracking, zero trust assessment, and merger integration.
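The sample-size figure can be checked against the standard normal-approximation bound for a proportion. The sketch below is an illustration, not the paper's estimator: it assumes the worst-case variance $p(1-p)=0.25$, treats node pairs as unordered, and uses the plain sample proportion as a stand-in for the normalized estimator, whose exact form the abstract does not give. The names `min_sample_size`, `estimate_segmentedness`, and the `is_allowed(u, v)` policy callback are hypothetical.

```python
import math
import random
from statistics import NormalDist


def min_sample_size(margin: float, confidence: float = 0.95) -> int:
    """Worst-case (p = 0.5) sample size for a normal-approximation interval."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided critical value
    return math.ceil(z * z * 0.25 / (margin * margin))


def estimate_segmentedness(n_nodes, is_allowed, m, rng):
    """Fraction of m uniformly sampled node pairs that the policy disallows.

    is_allowed(u, v) is a hypothetical callback returning True when the
    policy permits communication between nodes u and v.
    """
    blocked = 0
    for _ in range(m):
        u, v = rng.sample(range(n_nodes), 2)  # two distinct nodes, uniform over pairs
        if not is_allowed(u, v):
            blocked += 1
    return blocked / m


if __name__ == "__main__":
    print(min_sample_size(0.1))  # sample size for a +/-0.1 margin at 95% confidence
```

With `margin=0.1` and 95% confidence the bound evaluates to 97, in line with $M=97$, and it contains no term for the number of nodes.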

2026-01-31T15:28:09Z 5 Tables, 5 Figures Rohit Dube

http://arxiv.org/abs/2604.26884v1 Improving Bias Correction Methods for Daily Rainfall Using a Markov Chain Approach 2026-04-29T16:53:05Z

Accurate, localised rainfall information is essential for applications such as agricultural planning, climate risk assessment, and water resources management. Gridded climate products provide rainfall information over large areas but can lack the accuracy needed at local scales, often requiring bias correction before use in local impact studies. Bias correction of daily rainfall is particularly challenging due to its complex characteristics. Local intensity scaling (LOCI) and quantile mapping (QM) are two widely used bias correction methods which adjust both rainfall frequency and intensity, but do not account for the temporal structure of daily rainfall. This can lead to biases in the representation of wet and dry spells. This study proposes integrating a two-state first-order Markov chain directly into existing bias correction methods through state-dependent rain day thresholds and rainfall adjustments, aimed at improving the temporal structure of rainfall. Two implementations of this framework are presented: Markov chain local intensity scaling (MC LOCI) and Markov chain quantile mapping (MC QM). The proposed methods were applied to AgERA5 reanalysis data with rainfall data from five stations in Zimbabwe. Results showed that the Markov chain methods outperformed LOCI and QM by improving the representation of rainfall persistence, onset, and wet and dry spell characteristics, while maintaining improvements in rain day frequency and overall rainfall statistics. These results demonstrate that the proposed methods could be beneficial for applications such as crop simulation, hydrological modelling and other applications which rely on accurate representation of rainfall sequencing.
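The temporal structure described by a two-state first-order Markov chain can be summarised by two transition probabilities: the chance of a wet day after a dry day, and after a wet day. The sketch below estimates both from a daily series. It is an illustration only and is not MC LOCI or MC QM; the function name and the 1.0 mm rain-day threshold are assumptions.

```python
import math


def wet_dry_transitions(rain_mm, threshold=1.0):
    """Return (P(wet | previous day dry), P(wet | previous day wet)).

    A day counts as wet when rainfall >= threshold (assumed 1.0 mm here).
    """
    wet = [r >= threshold for r in rain_mm]
    n_prev = {False: 0, True: 0}      # days observed in each previous state
    n_wet_next = {False: 0, True: 0}  # of those, how many were followed by a wet day
    for prev, cur in zip(wet, wet[1:]):
        n_prev[prev] += 1
        n_wet_next[prev] += int(cur)

    def ratio(state):
        # NaN when a state never occurs as a previous day
        return n_wet_next[state] / n_prev[state] if n_prev[state] else math.nan

    return ratio(False), ratio(True)
```

Comparing these two probabilities between a gridded product and station data is one simple way to see whether wet and dry spell persistence is misrepresented.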

2026-04-29T16:53:05Z 42 pages, 19 figures Danny Parsons David Stern Mouhamadou Bamba Sylla James Musyoka John Bagiliko Lily Clements John Mupuro Denis Ndanguza

http://arxiv.org/abs/2604.22140v3 Concave Statistical Utility Maximization Bandits via Influence-Function Gradients 2026-04-29T11:34:22Z

We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon problem reduces to optimizing over stationary mixed policies: each weight vector $w$ on the simplex induces a mixture law $P^w$, and performance is measured by the concave utility $U(w)=\mathfrak U(P^w)$. For differentiable statistical utilities, we use influence-function calculus to derive stochastic gradient estimators from bandit feedback. This leads to an entropic mirror-ascent algorithm on a truncated simplex, implemented through multiplicative-weights updates and plug-in estimates of the influence function. We establish regret bounds that separate the mirror-ascent optimization error from the bias caused by estimating the influence function. The framework is developed for general concave distributional utilities and illustrated through variance and Wasserstein objectives, with numerical experiments comparing exact and plug-in influence-function implementations.
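To make the multiplicative-weights form of entropic mirror ascent concrete, the sketch below maximizes the variance of the mixture law over the simplex. It is a simplified illustration under strong assumptions that differ from the paper's setting: arm means and variances are taken as known, so the gradient is exact rather than estimated from bandit feedback, and the simplex is not truncated. All function names are hypothetical.

```python
import math


def mixture_variance_gradient(w, means, variances):
    """Gradient of U(w) = sum_i w_i (v_i + m_i^2) - (sum_i w_i m_i)^2."""
    mu = sum(wi * m for wi, m in zip(w, means))
    return [v + m * m - 2.0 * mu * m for m, v in zip(means, variances)]


def mirror_ascent_step(w, grad, step):
    """Entropic mirror ascent: multiplicative update followed by renormalization."""
    scaled = [wi * math.exp(step * g) for wi, g in zip(w, grad)]
    total = sum(scaled)
    return [s / total for s in scaled]


def maximize_mixture_variance(means, variances, w0, step=0.5, iters=200):
    """Iterate the update from the starting weights w0 (must lie in the open simplex)."""
    w = list(w0)
    for _ in range(iters):
        grad = mixture_variance_gradient(w, means, variances)
        w = mirror_ascent_step(w, grad, step)
    return w
```

For two point-mass arms at 0 and 1, the mixture variance is $w_1(1-w_1)$, so the iterates should approach equal weights.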

2026-04-24T01:13:19Z Matías Carrasco Alejandro Cholaquidis

http://arxiv.org/abs/2505.18441v2 DB-KSVD: Scalable Alternating Optimization for Disentangling High-Dimensional Embedding Spaces 2026-04-29T10:08:45Z

Dictionary learning has recently emerged as a promising approach for mechanistic interpretability of large transformer models. Disentangling high-dimensional transformer embeddings requires algorithms that scale to high-dimensional data with large sample sizes. Recent work has explored sparse autoencoders (SAEs) for this problem. However, SAEs use a simple linear encoder to solve the sparse encoding subproblem, which is known to be NP-hard. It is therefore interesting to understand whether this approach is sufficient to find good solutions to the dictionary learning problem or if a more sophisticated algorithm could find better solutions. In this work, we propose Double-Batch KSVD (DB-KSVD), a scalable dictionary learning algorithm that adapts the classic KSVD algorithm. DB-KSVD is informed by the rich theoretical foundations of KSVD but scales to datasets with millions of samples and thousands of dimensions. We demonstrate the efficacy of DB-KSVD by disentangling text embeddings of the Gemma-2-2B and Pythia-160M models and evaluating on six metrics from the SAEBench benchmark, where we achieve competitive results when compared to established approaches based on SAEs. We further show similar results when disentangling image embeddings obtained from the DINOv2-S and DINOv2-B models, solidifying our findings. By matching SAE performance with an entirely different optimization approach, our results suggest that (i) SAEs do find strong solutions to the dictionary learning problem and (ii) traditional optimization approaches can be scaled to the required problem sizes, offering a promising avenue for further research. We make an implementation of DB-KSVD available at https://github.com/romeov/ksvd.jl.

2025-05-24T00:32:50Z 8 pages + 10 pages appendix. Updated with additional vision transformer experiments Romeo Valentin Sydney M. Katz Vincent Vanhoucke Mykel J. Kochenderfer

http://arxiv.org/abs/2509.10736v2 Adaptive Bayesian computation for efficient biobank-scale genomic inference 2026-04-29T09:29:13Z

Motivation: Modern biobanks, with unprecedented sample sizes and phenotypic diversity, have become foundational resources for genomic studies, enabling powerful cross-phenotype and population-scale analyses. As studies grow in complexity, Bayesian hierarchical models offer a principled framework for jointly modeling multiple units such as cells, traits, and experimental conditions, increasing statistical power through information sharing. However, adoption of Bayesian hierarchical models in biobank-scale studies remains limited due to computational inefficiencies, particularly in posterior inference over high-dimensional parameter spaces. Deterministic approximations such as variational inference provide scalable alternatives to Markov Chain Monte Carlo, yet current implementations do not fully exploit the structure of genome-wide multi-unit modeling, especially when biological effects of interest are concentrated in a few units. Results: We propose an adaptive focus (AF) strategy within a block coordinate ascent variational inference (CAVI) framework that selectively updates subsets of parameters at each iteration, corresponding to units deemed relevant based on current estimates. We illustrate this approach in protein quantitative trait locus (pQTL) mapping using a joint model of hierarchically linked regressions with shared parameters across traits. In both simulated data and real proteomic data from the UK Biobank, AF-CAVI achieves up to a 50\% reduction in runtime while maintaining statistical performance. We also provide a genome-wide pipeline for multi-trait pQTL mapping across thousands of traits, demonstrating AF-CAVI as an efficient scheme for large-scale, multi-unit Bayesian analysis in biobanks.

2025-09-12T22:52:38Z Yiran Li John Whittaker Sylvia Richardson Helene Ruffieux

http://arxiv.org/abs/2604.26471v1 A simple strategy for valid inference in target trial emulations 2026-04-29T09:28:12Z

Target trial emulation has improved comparative effectiveness research by making the causal question, assumptions, and analysis plan explicit. However, target trial protocols are usually developed iteratively. After examining the data, investigators revise the protocol to reflect which target trials the observational data can realistically support. While this iterative procedure is part of normal scientific practice, it raises concerns about selective choices and invalid statistical inference. A simple procedure can address these concerns. This procedure is based on sample splitting. In the initial split, investigators explore the data to define a target trial protocol. When these choices are made, the target trial protocol is implemented on the second split. Although the investigators made data-informed choices to select the target trial protocol, the inference has the usual coverage guarantees. The procedure is created to mirror how trialists move from pilot studies to a phase 3 trial. First, they use data from pilots and early-phase trials to learn and decide on a final protocol. Then they implement this protocol and analyze a new set of data in a phase 3 trial.
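The splitting step itself is small. The sketch below shows one way to partition units at random into an exploration split and a confirmation split; the function name, the 50/50 default, and the fixed seed are assumptions, and the manuscript may recommend different proportions.

```python
import random


def split_sample(unit_ids, explore_fraction=0.5, seed=0):
    """Randomly partition units into (exploration split, confirmation split)."""
    shuffled = list(unit_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * explore_fraction)
    return shuffled[:cut], shuffled[cut:]
```

The protocol is chosen using only the first list; the second list is left untouched until the protocol is fixed.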

2026-04-29T09:28:12Z This is a short, non-technical manuscript that outlines how valid inference can be ensured in target trials, using existing ideas for sample splitting Mats Julius Stensrud

http://arxiv.org/abs/2604.26410v1 Longitudinal Outcomes Truncated by Death: Causal Estimands and Bayesian Estimators 2026-04-29T08:23:47Z

Defining a causal estimand for a longitudinal outcome truncated by death is challenging, because the outcome may be undefined at the end of follow-up. Although a range of estimands and several estimators have been proposed, guidance on the underlying causal assumptions and on the contexts in which each estimand is most appropriate remains limited. We propose a framework to clarify the challenges of defining causal estimands in a longitudinal setting with censoring due to death. Within this framework, we review existing estimands and make explicit the assumptions required for their identification and estimation. We develop Bayesian estimators for each estimand and compare their behavior in a simulation study. Finally, we illustrate the proposed approach using data from a randomized controlled trial in amyotrophic lateral sclerosis. We show that the main difficulty arises from the lack of a natural notion of ordering and distance for outcomes truncated by death. This leads to an inherently multifactorial problem. In this context, the stratified average causal effect, combined with restricted mean survival time, provides a more complete characterisation of treatment effects.

2026-04-29T08:23:47Z Juliette Ortholand Young Lee Marie-Abele C Bind

http://arxiv.org/abs/2604.26359v1 A spatio-temporal statistical framework for heatwave attribution under climate change 2026-04-29T07:13:54Z

We develop a unified statistical framework for attributing heatwaves as spatio-temporal phenomena under climate change. We quantify the impact of anthropogenic forcing on the probability and persistence of heatwaves not captured by standard marginal extreme-value approaches. Our methodology constructs a generative model for daily temperature fields that separates marginal nonstationarity from spatio-temporal dependence. We combine three components: a Bayesian spatial quantile regression model for the bulk of the data; a nonstationary spatial generalized extreme value model for tail behavior; and a copula-based model capturing both asymptotic dependence and independence in the extremes. The framework is applied to the CMIP6 MRI-ESM2 climate model, contrasting factual and counterfactual scenarios for probabilistic attribution. Our results show that the approach captures key heatwave characteristics inaccessible to traditional methods, enabling direct estimation of event-level attribution metrics. Overall, it provides a flexible basis for analyzing and attributing complex climate extremes as space-time objects.

2026-04-29T07:13:54Z Kamal Gasser Johan Segers Francesco Ragone

http://arxiv.org/abs/2604.26268v1 The Difference Between "Replicable" and "Not replicable" is not Itself Scientifically Replicable 2026-04-29T03:50:20Z

Replication studies estimate the replicability rate of scientific results by aggregating binary verdicts of experiments. Exact replications are rarely attainable, so most replication sequences are non-exact. Experiments differ in ways that matter and do not share a single data-generating process. We formalize two statistical interpretations of non-exactness. In a shared latent rate (benchmark) model, experiments are exchangeable and depend on a common random replicability rate. In a conditionally independent rates (operational) model, each experiment has its own replicability rate drawn from a population distribution. Under the benchmark model, even small variability among replicability rates induces an irreducible variance floor on the estimated mean replicability rate that no amount of replication can eliminate. Under the operational model, the degree of non-exactness is not identifiable from standard replication data, because one binary verdict per experiment carries no information about between-experiment heterogeneity. Researchers cannot tell which precision regime they are in or whether high- and low-replicability sequences can be distinguished in principle. The usual data structure cannot support reliable demarcation between "replicable" and "not replicable" results and systematically understates uncertainty, making high- and low-replicability sequences appear discriminable when they are not. We show how common sources of heterogeneity amplify these problems and demonstrate practical consequences in a reanalysis of Many Labs 4. Aggregating replicability rates across heterogeneous literatures produces averages that conflate incommensurable regimes and lack a stable interpretation. Replicability rate is not a reliable demarcation criterion. The replication crisis, if there is one, cannot be established by the methods used to declare it.
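The variance floor in the benchmark model follows from the law of total variance. As a sketch, assume (the abstract does not spell this out) that, given the shared rate $\theta$, the $n$ binary verdicts $Y_1,\dots,Y_n$ are i.i.d. Bernoulli($\theta$), and write $\bar Y_n$ for their mean:

```latex
\operatorname{Var}(\bar Y_n)
  = \mathbb{E}\!\left[\operatorname{Var}(\bar Y_n \mid \theta)\right]
  + \operatorname{Var}\!\left(\mathbb{E}[\bar Y_n \mid \theta]\right)
  = \frac{\mathbb{E}[\theta(1-\theta)]}{n} + \operatorname{Var}(\theta)
  \;\xrightarrow[n\to\infty]{}\; \operatorname{Var}(\theta).
```

The first term shrinks as replications accumulate; the second does not depend on $n$, so any variability in $\theta$ sets a floor on the variance of the estimated mean rate.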

2026-04-29T03:50:20Z Berna Devezer Erkan O. Buzbas

http://arxiv.org/abs/2503.05023v3 A Behavioral Scorecard Model Using Survival Analysis 2026-04-29T01:05:25Z

Credit risk assessment is a crucial aspect of financial decision-making, enabling institutions to predict the likelihood of default and make informed lending decisions. Two prominent methodologies in credit risk modeling are logistic regression and survival analysis. Logistic regression is widely used in scorecard development due to its simplicity, interpretability, and effectiveness in estimating the probability of binary outcomes, such as default versus non-default. In contrast, survival analysis -- particularly within the hazard rate framework -- provides insights into the timing of events, such as the time to default. By integrating logistic regression with survival analysis, traditional scorecard models can be enhanced to capture not only the probability of default but also the dynamics of default over time. This combined approach offers a more comprehensive view of credit risk, enabling institutions to manage risk proactively and tailor strategies to individual borrower profiles. This article presents the process of developing a monthly hazard rate model using logistic regression and augmented data with survival analysis techniques to incorporate time-varying risk factors. The process includes data preparation, model construction, and the evaluation of performance metrics. Monthly hazard rates are then converted into default probabilities. Finally, a behavioral scorecard is developed using offset adjustment.
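The conversion from monthly hazard rates to default probabilities can be illustrated with the standard discrete-time survival identity: the probability of default by month $t$ is one minus the product of the monthly survival probabilities. The sketch below assumes this identity is the conversion intended; the function name is hypothetical.

```python
def cumulative_default_probability(monthly_hazards):
    """Cumulative default probability by the end of each month.

    monthly_hazards[t] is the conditional probability of default in month t
    given survival up to the start of that month.
    """
    survival = 1.0
    cumulative = []
    for h in monthly_hazards:
        survival *= 1.0 - h               # survive this month as well
        cumulative.append(1.0 - survival)  # default at or before this month
    return cumulative
```

For hazards of 0.1 and 0.2 in consecutive months, the two-month default probability is $1 - 0.9 \times 0.8 = 0.28$.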

2025-03-06T22:48:44Z Cheng Lee Hsi Lee

http://arxiv.org/abs/2604.26198v1 Pricing Global Macroeconomic Risk in Equity Markets: Evidence from Selected G20 Economies 2026-04-29T01:00:51Z

This study investigates whether international equity markets systematically price global macroeconomic risks. The empirical analysis is conducted using monthly excess returns for ten G20 countries over the period 2000-2024. A Dynamic Factor Model (DFM) is employed to extract latent global factors from a set of macroeconomic variables capturing global inflation, real activity, monetary policy, term structure, exchange rates, volatility, and oil prices. The model selection criteria of the dynamic factor framework support a parsimonious three-factor specification. However, Fama-MacBeth regressions demonstrate the low explanatory power of the three-factor model. In contrast, a four-factor specification results in economically large and statistically significant factor loadings, an obvious rise in explanatory power, and a significant improvement in model performance. The results indicate that the four-factor specification provides the best balance between explanatory power and model stability, significantly improving the ability to explain cross-sectional variation in excess returns, with all factors statistically significant. The Capital Asset Pricing Model (CAPM), while offering a parsimonious and stable benchmark with consistently significant market betas, exhibits limited explanatory power due to its single-factor structure. Overall, the findings suggest that macro-driven latent factors extracted through the DFM provide a more comprehensive and empirically robust framework for international asset pricing than the CAPM, highlighting the importance of incorporating multiple sources of systematic risk in explaining cross-country equity returns.

2026-04-29T01:00:51Z Vivek Mishra

http://arxiv.org/abs/2602.21876v2 Comparative Evaluation of Machine Learning Models for Predicting Donor Kidney Discard 2026-04-29T00:43:40Z

A kidney transplant can improve the life expectancy and quality of life of patients with end-stage renal failure. Even more patients could be helped with a transplant if the rate of kidneys that are discarded and not transplanted could be reduced. Machine learning (ML) can support decision-making in this context by early identification of donor organs at high risk of discard, for instance to enable timely interventions to improve organ utilization such as rescue allocation. Although various ML models have been applied, their results are difficult to compare due to heterogeneous datasets and differences in feature engineering and evaluation strategies. This study aims to provide a systematic and reproducible comparison of ML models for donor kidney discard prediction. We trained five commonly used ML models (Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, and Deep Learning) along with an ensemble model on data from 4,080 deceased donors (death determined by neurologic criteria) in Germany. A unified benchmarking framework was implemented, including standardized feature engineering and selection, and Bayesian hyperparameter optimization. Model performance was assessed for discrimination (MCC, AUC, F1), calibration (Brier score), and explainability (SHAP). The ensemble achieved the highest discrimination performance (MCC=0.76, AUC=0.87, F1=0.90), while individual models such as Logistic Regression, Random Forest, and Deep Learning performed comparably and better than Decision Trees. Platt scaling improved calibration for tree- and neural network-based models. SHAP consistently identified donor age and renal markers as dominant predictors across models, reflecting clinical plausibility. This study demonstrates that consistent data preprocessing, feature selection, and evaluation can be more decisive for predictive success than the choice of the ML algorithm.
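Two of the reported metrics have short closed forms. The sketch below computes the Matthews correlation coefficient (MCC) from confusion-matrix counts and the Brier score from predicted probabilities, using their textbook definitions. The function names are hypothetical and the study's own evaluation code is not shown in the abstract.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def brier_score(outcomes, probabilities):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it informative when discard and non-discard classes are imbalanced; a lower Brier score indicates better-calibrated probabilities.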

2026-02-25T13:00:05Z Peer Schliephacke Hannah Schult Leon Mizera Judith Würfel Gunter Grieser Axel Rahmel Carl-Ludwig Fischer-Fröhlich Antje Jahn-Eimermacher

http://arxiv.org/abs/2512.08824v2 Commanding the Foul Shot: A New Ensemble of Free Throw Metrics 2026-04-28T21:20:17Z

With the NBA's adoption of in-game limb tracking in 2023, Sony's Hawk-Eye system now captures high-resolution, 3D poses of players and the ball 60 times per second. Linking these data to key events opens a new era in NBA analytics. Here, we leverage a large dataset of 21,964 shot attempts from 72 NBA players to introduce a novel ensemble of metrics for evaluating free-throw shooting. Inspired by baseball analytics, we introduce command, which quantifies the quality of a free throw by measuring a shooter's accuracy and precision near the basket's bullseye. This metric recognizes that some makes (or misses) are better than others and captures a player's ability to execute quality attempts consistently. We demonstrate that command captures underlying skill more effectively than traditional make-or-miss statistics; early-season command predicts late-season success more reliably than traditional shooting percentage. To identify what drives command, we define launch-based metrics assessing consistency in release velocity, angle, and 3D position. Players with greater touch, i.e., more consistent launch dynamics, exhibit stronger command as they can reliably control their shot trajectory. Finally, we develop a physics model to identify the range of launch conditions that result in a make and to determine which launch conditions are most robust to small perturbations. This framework reveals "safe" launch regions and explains why certain players excel at free throws, providing actionable insights for player development.

2025-12-09T17:15:01Z Jake McGrath Amanda Glazer Vanna Bushong Michelle Nguyen Kirk Goldsberry

http://arxiv.org/abs/2604.18898v2 A Review of Statistical Methods for Spontaneous Reporting System Data Mining: Signal Detection and Beyond 2026-04-28T15:56:00Z

Postmarketing safety surveillance relies on data from spontaneous reporting systems (SRS) such as FAERS, EudraVigilance and VigiBase, and commonly uses SRS data mining methods to assess the associations between drugs and adverse events (AEs). Traditionally, these analyses have focused on signal detection framed as a binary decision problem, whereas more recent work has emphasized more nuanced inference involving signal strength estimation and uncertainty quantification. In this paper, we review contemporary SRS data mining approaches and their statistical underpinnings for safety assessment using data from major pharmacovigilance databases worldwide. In addition to the methodological review, we provide practical guidance on data preprocessing for such analysis, including construction of SRS contingency tables using only aggregated AE-drug counts, as are publicly available from databases such as VigiBase and EudraVigilance. We illustrate the guidance via opioid-related datasets obtained from FAERS and VigiBase, accompanied by subsequent downstream SRS data analyses.
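Building a 2x2 table from aggregated counts is a matter of subtraction. The sketch below assumes four aggregate inputs counted in the same units (for example, reports): the drug-AE count, the total for the drug, the total for the AE, and the grand total. It then computes two standard disproportionality measures, the reporting odds ratio (ROR) and the proportional reporting ratio (PRR). The function names are hypothetical, and the review's own preprocessing may differ.

```python
def contingency_table(n_drug_ae, n_drug, n_ae, n_total):
    """Return (a, b, c, d) for one drug-AE pair from marginal totals.

    a: drug and AE, b: drug without AE, c: AE without drug, d: neither.
    """
    a = n_drug_ae
    b = n_drug - a
    c = n_ae - a
    d = n_total - a - b - c
    if min(a, b, c, d) < 0:
        raise ValueError("inconsistent aggregate counts")
    return a, b, c, d


def reporting_odds_ratio(a, b, c, d):
    """ROR = (a * d) / (b * c); undefined (ZeroDivisionError) if b or c is zero."""
    return (a * d) / (b * c)


def proportional_reporting_ratio(a, b, c, d):
    """PRR = [a / (a + b)] / [c / (c + d)]; undefined if c is zero."""
    return (a / (a + b)) / (c / (c + d))
```

For example, 10 drug-AE reports, 100 reports for the drug, 50 for the AE, and 1,000 overall give the table (10, 90, 40, 860).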

2026-04-20T22:57:22Z Yihao Tan Marianthi Markatou Saptarshi Chakraborty

http://arxiv.org/abs/2604.25710v1 Adaptive Meta-Learning Stochastic Gradient Hamiltonian Monte Carlo Simulation for Bayesian Updating of Structural Dynamic Models 2026-04-28T14:34:48Z

In the last few decades, Markov chain Monte Carlo (MCMC) methods have been widely applied to Bayesian updating of structural dynamic models in the field of structural health monitoring. Recently, several MCMC algorithms have been developed that incorporate neural networks to enhance their performance for specific Bayesian model updating problems. However, a common challenge with these approaches lies in the fact that the embedded neural networks often necessitate retraining when faced with new tasks, a process that is time-consuming and significantly undermines the competitiveness of these methods. This paper introduces a newly developed adaptive meta-learning stochastic gradient Hamiltonian Monte Carlo (AM-SGHMC) algorithm. The idea behind AM-SGHMC is to optimize the sampling strategy by training adaptive neural networks, and due to the adaptive design of the network inputs and outputs, the trained sampler can be directly applied to various Bayesian updating problems of the same type of structure without further training, thereby achieving meta-learning. Additionally, practical issues for the feasibility of the AM-SGHMC algorithm for structural dynamic model updating are addressed, and two examples involving Bayesian updating of multi-story building models with different model fidelity are used to demonstrate the effectiveness and generalization ability of the proposed method.

2026-04-28T14:34:48Z Comput Meth Appl Mech Eng; 437: 117753 (2025) Xianghao Meng James L. Beck Yong Huang Hui Li 10.1016/j.cma.2025.117753