https://arxiv.org/api/mRTVCBqzVtfaym4zXCZjbozFUTU 2026-03-21T02:18:37Z 34634 150 15 http://arxiv.org/abs/2509.01437v2 Sampling as Bandits: Evaluation-Efficient Design for Black-Box Densities 2026-03-14T01:44:24Z We propose bandit importance sampling (BIS), a powerful importance sampling framework tailored for settings in which evaluating the target density is computationally expensive. BIS facilitates accurate sampling while minimizing the required number of target-density evaluations. In contrast to adaptive importance sampling, which optimizes a proposal distribution, BIS directly optimizes the set of samples through a sequential selection process driven by multi-armed bandits. BIS serves as a general framework that accommodates user-defined bandit strategies. Theoretically, the weak convergence of the weighted samples, and thus the consistency of the Monte Carlo estimator, is established regardless of the specific strategy employed. In this paper, we present a practical strategy that leverages Gaussian process surrogates to guide sample selection, adapting the principles of Bayesian optimization for sampling. Comprehensive numerical studies demonstrate the superior performance of BIS across multimodal, heavy-tailed distributions, and real-world Bayesian inference tasks involving Markov random fields. 2025-09-01T12:47:32Z Takuo Matsubara Andrew Duncan Simon Cotter Konstantinos Zygalakis http://arxiv.org/abs/2603.13662v1 Fast Uncertainty Quantification for Kernel-Based Estimators in Large-Scale Causal Inference 2026-03-14T00:08:15Z Kernel methods are widely used in causal inference for tasks such as treatment effect estimation, policy evaluation, and policy learning. The bootstrap is a standard tool for uncertainty quantification because of its broad applicability. As increasingly large datasets become available, such as the 2023 U.S. 
Natality data from the National Vital Statistics System (NVSS), which includes 3,596,017 registered births, the computational demands of these methods increase substantially. Kernel methods are known to scale poorly with sample size, and this limitation is further exacerbated by the repeated re-fitting required by the bootstrap. As a result, bootstrap-based inference for kernel-based estimators can become computationally infeasible in large-scale settings. In this paper, we address these challenges by extending the causal Bag of Little Bootstraps (cBLB) algorithm to kernel methods. Our approach achieves computational scalability by combining subsampling and resampling while preserving first-order uncertainty quantification and asymptotically correct coverage. We evaluate the method across three representative implementations: kernelized augmented outcome-weighted learning, kernel-based minimax weighting, and double machine learning with kernel support vector machines. We show in simulations that our method yields confidence intervals with nominal coverage at a fraction of the computational cost. We further demonstrate its utility in a real-world application by estimating the effect of any amount of smoking on birth weight, as well as the optimal treatment regime, using the NVSS dataset, where the standard bootstrap is prohibitively expensive computationally and effectively infeasible at this scale. 2026-03-14T00:08:15Z 47 pages Matthew Kosko Falco J Bargagli-Stoffi Lin Wang Michele Santacatterina http://arxiv.org/abs/2603.13646v1 Surrogate-Based Bayesian Inference: Uncertainty Quantification and Active Learning 2026-03-13T23:06:44Z Surrogate models - also called emulators - are widely used to facilitate Bayesian inference in settings where computational costs preclude the use of standard posterior inference algorithms. Their deployment is now standard practice across many scientific domains. 
However, integrating surrogates in statistical analyses introduces unique challenges that complicate established Bayesian workflow principles. While significant progress has been made in addressing these issues, the relevant developments are scattered across several distinct research communities, with different emphases and perspectives. We present a unifying review that synthesizes the literature into a coherent framework, aiming to benefit both practitioners and methods developers. We place particular emphasis on propagating surrogate uncertainty and sequentially refining emulators via active learning, two key components of a robust surrogate-based Bayesian workflow. 2026-03-13T23:06:44Z Andrew Gerard Roberts Michael C. Dietze Jonathan H. Huggins http://arxiv.org/abs/2406.19152v2 Mixture priors for replication studies 2026-03-13T22:54:20Z Replication of scientific studies is important for assessing the credibility of their results. However, there is no consensus on how to quantify the extent to which a replication study replicates an original result. We propose a novel Bayesian approach for replication studies based on mixture priors. The idea is to use a mixture of the posterior distribution based on the original study and a non-informative distribution as the prior for the analysis of the replication study. The mixture weight then determines the extent to which the original and replication data are pooled. Two distinct strategies are presented: one with fixed mixture weights, and one that introduces uncertainty by assigning a prior distribution to the mixture weight itself. Furthermore, it is shown how within this framework Bayes factors can be used for formal testing of relevant scientific hypotheses, such as tests on the presence or absence of an effect or whether the mixture weight equals zero (completely discounting the original data) or one (fully pooling with the original data). 
To showcase the practical application of the methodology, we analyze data from three replication studies. Our findings suggest that mixture priors are a valuable and intuitive alternative to other Bayesian methods for analyzing replication studies, such as hierarchical models and power priors. We provide the free and open source R package repmix that implements the proposed methodology. 2024-06-27T13:11:15Z Roberto Macrì-Demartino Leonardo Egidi Leonhard Held Samuel Pawel http://arxiv.org/abs/2603.13622v1 The Continuous Rank Probability Score of a Generalized Beta-Prime Distribution and Some Special Cases 2026-03-13T22:00:42Z This working paper describes new results in derivations of the Continuous Ranked Probability Score of a generalized beta-prime distribution and several special cases, such as the Dagum distribution and Singh-Maddala distribution. Comparison with Monte Carlo estimates is also presented. 2026-03-13T22:00:42Z 9 pages, no figures. Work in progress Matthew LeDuc http://arxiv.org/abs/2603.13614v1 Measuring Extreme Tail Association 2026-03-13T21:43:25Z Simultaneous occurrences of extreme events need not imply symmetric or reciprocal tail dependence. However, most existing measures of extremal dependence are inherently symmetric and hence often fail to capture directional influence in tail association. We introduce a rank-based measure of Extreme Tail Association (ETA) for bivariate data quantifying such directional influence of one variable on another in extreme tail regions. The proposed estimator is easily computable, consistent with its population counterpart, and asymptotically normal under mild conditions, allowing for statistical inference. We further develop a formal test for asymmetry in tail association based on a multiplier bootstrap procedure. The practical relevance of the methodology is illustrated using data on extreme price movements in major cryptocurrencies. 
Beyond providing a flexible tool for extremal association, the proposed framework offers a substantive argument for investigating causal relationships in extreme scenarios. 2026-03-13T21:43:25Z 38 pages, 13 figures, includes appendix Bikramjit Das Xiangyu Liu http://arxiv.org/abs/2507.01375v2 Mixtures of Neural Network Experts with Application to Phytoplankton Flow Cytometry Data 2026-03-13T21:41:41Z Flow cytometry is a valuable technique that measures the optical properties of particles at a single-cell resolution. When deployed in the ocean, flow cytometry allows oceanographers to study different types of photosynthetic microbes called phytoplankton. It is of great interest to study how phytoplankton properties change in response to environmental conditions. In our work, we develop a nonlinear mixture of experts model to estimate separate regression functions for each subpopulation utilizing random-weight neural networks. Our model allows one to flexibly estimate how cell properties and relative abundances depend on environmental covariates in each segment of a heterogeneous sample, without the computational burden of backpropagation. We show that the proposed model provides superior predictive performance in simulated examples compared to a mixture of linear experts. Also, applying our model to real data, we show that our model has (1) comparable out-of-sample prediction performance, and (2) more realistic estimates of phytoplankton behavior. 2025-07-02T05:26:29Z 46 pages, 20 figures. Under revisions by Environmetrics Ethan Pawl François Ribalet Paul A. Parker Sangwon Hyun http://arxiv.org/abs/2409.06680v3 Sequential stratified inference for the mean 2026-03-13T21:19:16Z We develop conservative tests for the mean of a bounded population under stratified sampling and apply them to risk-limiting post-election audits. The tests are ``anytime valid'' under sequential sampling, allowing optional stopping in each stratum. 
Our core method expresses a global hypothesis about the population mean as a union of intersection hypotheses describing within-stratum means. It tests each intersection hypothesis using independent test supermartingales (TSMs) combined across strata by multiplication. A $P$-value for each intersection hypothesis is the reciprocal of that test statistic, and the largest $P$-value in the union is a $P$-value for the global hypothesis. This approach has two primary moving parts: the rule selecting which stratum to draw from next given the sample so far, and the form of the TSM within each stratum. These rules may vary over intersection hypotheses. We construct the test with the smallest expected stopping time and present a few strategies for approximating that optimum. In instances that arise in auditing and other applications, its expected sample size is substantially smaller than that of previous methods. 2024-09-10T17:44:38Z 22 pages, 5 figures, submitted to Annals of Applied Statistics Jacob V. Spertus Mayuri Sridhar Philip B. Stark http://arxiv.org/abs/2603.13583v1 Confidence intervals for two-stage adaptive designs with subpopulation selection 2026-03-13T20:47:10Z We consider clinical trials in which an experimental treatment is compared with a control in pre-specified patient subpopulations. In such settings, adaptive enrichment designs allow the enrolled population to be modified at an interim analysis, with subpopulations selected according to preplanned rules. Since these interim decisions are data-dependent, valid statistical inference must account for them. We focus on constructing confidence intervals for the treatment effect in the selected population. Confidence interval methods that ignore the possibility of population modification may fail to achieve the desired coverage probability. We propose a new approach that constructs confidence intervals with exact nominal coverage conditional on the interim decision. 
Importantly, our method applies to a broad class of adaptive enrichment designs, rather than a single specific design. Our method involves deriving the distribution of the naive estimator of the treatment effect in the selected population conditional on the interim decision and inverting uniformly most accurate unbiased tests to obtain the confidence interval. We provide an efficient computational procedure and show through extensive simulations that the resulting confidence intervals satisfy the theoretical coverage guarantees. 2026-03-13T20:47:10Z Enyu Li Clinical Trials Unit, University of Warwick, Coventry, UK Nigel Stallard Clinical Trials Unit, University of Warwick, Coventry, UK Ekkehard Glimm Advanced Methodology and Data Science, Novartis Pharma AG, Basel, Switzerland Dominic Magirr Advanced Methodology and Data Science, Novartis Pharma AG, Basel, Switzerland Peter K. Kimani Clinical Trials Unit, University of Warwick, Coventry, UK http://arxiv.org/abs/2603.13561v1 Addressing both variable selection and misclassified responses with parametric and semiparametric methods 2026-03-13T19:58:13Z While variable selection has received extensive attention in the literature, it remains underexplored in the presence of response measurement error. In this paper, we investigate this important problem within the context of binary classification with error-prone responses. We present valid variable selection procedures to address the complexities of response errors. Leveraging validation data, we introduce both parametric and semiparametric methodologies to accommodate the mismeasurement effects. By rigorously establishing theoretical results, we offer insights and justifications of the validity of the proposed methods. By properly choosing the penalty function and regularization parameter, we demonstrate that the resulting estimators possess the oracle property. 
To assess the finite sample properties of the proposed methods, we conduct numerical studies that confirm their effectiveness. 2026-03-13T19:58:13Z Hui Guo Grace Y. Yi Boyu Wang http://arxiv.org/abs/2603.13542v1 Robust Inferential Methodology for Multidimensional Diffusion Processes 2026-03-13T19:27:05Z We investigate robust parameter estimation and testing procedures for multivariate diffusion processes observed at high frequency via the minimum density power divergence estimator (MDPDE). Within a general diffusion framework and under standard regularity conditions, we establish consistency and asymptotic normality for the estimators of both drift and diffusion parameters. The drift estimator converges at the $\sqrt{n h_n}$ rate, whereas the diffusion estimator attains the standard $\sqrt{n}$ rate, and the two estimators are shown to be asymptotically independent. The proposed methodology constitutes a robust alternative to quasi-likelihood and ordinary least squares based approaches, offering resilience against outliers, local contamination, and mild model misspecification, while remaining asymptotically equivalent to classical methods in the absence of contamination. Simulation studies demonstrate that the MDPDE achieves reliable finite-sample performance and enhanced numerical stability relative to likelihood-based estimators. These results underscore the practical relevance of divergence-based estimation for high-frequency diffusion models and point to natural extensions to more complex continuous-time settings. 2026-03-13T19:27:05Z Sourojyoti Barick http://arxiv.org/abs/2603.13464v1 Modeling Heterogeneous Mediation Effects in Survival Analysis via an Interpretable M-Learner Framework 2026-03-13T17:34:15Z Mediation analysis is a useful tool to evaluate surrogate endpoints in clinical trials. We propose a novel method, the M-survival learner, for estimating heterogeneous indirect treatment effects in the presence of censored outcomes. 
The proposed approach enables the identification of interpretable patient subgroups characterized by distinct mediation pathways. To distinguish heterogeneous from homogeneous mediation effects, we introduce a new statistical criterion specifically designed for survival data. The method provides a principled framework for evaluating heterogeneity in surrogate biomarker performance across patient populations, offering evidence to support accelerated drug approval. By explicitly assessing subgroup-specific surrogate validity, the proposed approach addresses key regulatory concerns regarding the reliability of surrogate endpoints. We further establish theoretical properties of the method to justify its statistical guarantees. We apply the approach to data from a Phase III randomized clinical trial of HIV treatment, demonstrating its practical utility in real-world settings. Extensive simulation studies further evaluate and demonstrate its finite-sample performance. 2026-03-13T17:34:15Z Xingyu Li Qing Liu Xun Jiang Hong Amy Xia Brian P. Hobbs Peng Wei http://arxiv.org/abs/2603.13156v1 When Your Model Stops Working: Anytime-Valid Calibration Monitoring 2026-03-13T16:50:14Z Practitioners monitoring deployed probabilistic models face a fundamental trap: any fixed-sample test applied repeatedly over an unbounded stream will eventually raise a false alarm, even when the model remains perfectly stable. Existing methods typically lack formal error guarantees, conflate alarm time with changepoint location, and monitor indirect signals that do not fully characterize calibration. We present PITMonitor, an anytime-valid calibration-specific monitor that detects distributional shifts in probability integral transforms via a mixture e-process, providing Type I error control over an unbounded monitoring horizon as well as Bayesian changepoint estimation. 
On river's FriedmanDrift benchmark, PITMonitor achieves detection rates competitive with the strongest baselines across all three scenarios, although detection delay is substantially longer under local drift. 2026-03-13T16:50:14Z Tristan Farran http://arxiv.org/abs/2311.07733v2 Credible Intervals for Probability of Failure with Gaussian Processes 2026-03-13T16:16:37Z Estimating the probability of failure for expensive simulations is a central task in reliability analysis for structural design, power grid design, and safety certification, among other areas. This work derives credible intervals on the probability of failure by modeling the simulation as a realization of a Gaussian process surrogate. These intervals are governed by the pointwise binary classification error of the surrogate and are compatible with the broad class of adaptive sampling schemes proposed in the literature. We further propose a novel batch sampling scheme that suggests multiple evaluation points per iteration, enabling parallel simulation on HPC systems. The method is empirically validated using our scalable, open-source implementation on a variety of test problems including a Tsunami model where failure is quantified in terms of maximum wave height. 2023-11-13T20:35:21Z Aleksei G. Sorokin Vishwas Rao http://arxiv.org/abs/2510.01112v3 The causal structure of galactic astrophysics 2026-03-13T15:11:17Z Data-driven astrophysics currently relies on the detection and characterisation of correlations between objects' properties, which are then used to test physical theories that make predictions for them. This process fails to utilise information in the data that forms a crucial part of the theories' predictions, namely which variables are directly correlated (as opposed to accidentally correlated through others), the directions of these determinations, and the presence or absence of confounders that correlate variables in the dataset but are themselves absent from it. 
We propose to recover this information through causal discovery, a well-developed methodology for inferring the causal structure of datasets that is, however, almost entirely unknown to astrophysics. We develop a causal discovery algorithm suitable for large astrophysical datasets and illustrate it on $\sim$4.5$\times10^5$ nearby galaxies from the NASA-Sloan Atlas, demonstrating its ability to distinguish physical mechanisms that are degenerate on the basis of correlations alone. 2025-10-01T16:55:49Z 10 pages, 4 figures; published in the Open Journal of Astrophysics Open Journal of Astrophysics, Vol 9 (2026) Harry Desmond Joseph Ramsey 10.33232/001c.159080