Feed: https://arxiv.org/api/wU2wU5XxYMShACXi3kHRuehEn4A (updated 2026-06-18T21:59:51Z)

http://arxiv.org/abs/2605.20145v1
Goal-Oriented Lower-Tail Calibration of Gaussian Processes for Bayesian Optimization
Updated: 2026-05-19T17:32:25Z
Bayesian optimization (BO) selects evaluation points for expensive black-box objectives using Gaussian process (GP) predictive distributions. Kernel choice and hyperparameter selection can lead to miscalibrated predictive distributions and an inappropriate exploration-exploitation trade-off. For minimization, sampling criteria such as expected improvement (EI) depend on the predictive distribution below the current best value, so lower-tail miscalibration directly affects the sampling decision. This article studies goal-oriented calibration of GP predictive distributions below a low threshold $t$ in the noiseless setting, for standard GP models with hyperparameters selected by maximum likelihood. A framework for predictive reliability below $t$ is introduced, based on two notions of spatial calibration: occurrence calibration over the design space and thresholded $\mu$-calibration on sublevel sets of the form $\{x\in\mathbb{X}, f(x)\le t\}$. Building on this framework, we propose tcGP, a post-hoc method that calibrates GP predictive distributions below $t$, and we show that the resulting EI-based global optimization algorithm remains dense in the design space.
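As a reference point for the lower-tail argument above: for minimization with current best observed value $m$ and a Gaussian predictive distribution $\mathcal{N}(\mu(x), \sigma^2(x))$, EI equals $\mathbb{E}[(m - \xi(x))_+]$, so only predictive mass below $m$ contributes. The sketch below is the standard closed-form EI, not tcGP. The function names are illustrative, and the quadrature helper integrates only over predictive values below `best` to make that dependence explicit.

```python
import math


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float) -> float:
    """Closed-form EI for minimization: E[max(best - xi, 0)] with xi ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # Zero predictive variance (e.g. an already evaluated point in the noiseless setting).
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    return (best - mu) * norm_cdf(z) + sigma * norm_pdf(z)


def ei_by_quadrature(mu: float, sigma: float, best: float,
                     n: int = 20_000, width: float = 10.0) -> float:
    """Midpoint-rule EI (sigma > 0) integrating only over predictive values below `best`."""
    lo = mu - width * sigma  # truncate the far lower tail at `width` standard deviations
    if best <= lo:
        return 0.0
    h = (best - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * h
        total += (best - y) * norm_pdf((y - mu) / sigma) / sigma
    return total * h


print(expected_improvement(1.0, 0.5, 0.2), ei_by_quadrature(1.0, 0.5, 0.2))
```

For example, `expected_improvement(0.0, 1.0, 0.0)` equals $1/\sqrt{2\pi} \approx 0.3989$, and the two functions agree to quadrature accuracy for any $\sigma > 0$.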
Experiments on standard benchmarks show improved lower-tail calibration and BO performance relative to standard GP models and globally calibrated GP models.
Published: 2026-05-19T17:32:25Z
Comments: ICML 2026
Authors: Aurélien Pion, Emmanuel Vazquez

http://arxiv.org/abs/2605.20135v1
Quantile-Based Effectiveness Persistence Function: A Tail-Focused Metric with Theory, Estimation, and Application to Biosimilar Evaluation
Updated: 2026-05-19T17:19:38Z
In clinical studies, persistence, which measures the duration of time a patient continues to take a prescribed medication without discontinuation, is increasingly recognized as a critical indicator of adherence to medication. Adherence encompasses not only whether a patient takes their medication as prescribed but also the consistency and duration with which they do so. Among the various metrics used to evaluate adherence, persistence stands out as a particularly robust measure because it provides a temporal dimension, reflecting the sustained commitment of patients to their therapeutic regimens. This focus on persistence offers unique insights into adherence-related quality and performance, shedding light on the challenges and opportunities to optimize long-term medication use. The comparison of upper-tail clinical performance, which measures the extent to which very large responses persist among top responders, is often more decisive in therapy evaluation than conventional summaries. In this paper, we introduce the quantile-based effectiveness persistence function defined as the ratio between the tail mean and the quantile function. The notion parallels expected shortfall in risk theory and is tailored to detect clinically meaningful deviations in the upper tail. We establish key properties and show that the function is equivalent to the first L-moment of the scaled tail, yielding robust inference tools. We derive a simple nonparametric estimator of the function and develop a bootstrap-calibrated two-sample (upper-tail) equivalence test.
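Assuming the upper-tail definition $\phi(p) = T(p)/Q(p)$ with $T(p) = (1-p)^{-1}\int_p^1 Q(u)\,du$ (the expected-shortfall analogue named above) and positive responses, a natural plug-in estimator replaces $Q$ by the empirical quantile function. This is a minimal Python sketch under those assumptions; the paper's exact definition and estimator may differ, and all function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def empirical_quantile(xs_sorted: Sequence[float], p: float) -> float:
    """Empirical quantile Q_n(p) = X_(ceil(n p)) for 0 < p < 1 (1-based order statistic)."""
    n = len(xs_sorted)
    k = max(1, math.ceil(n * p))
    return xs_sorted[k - 1]


def empirical_tail_mean(xs_sorted: Sequence[float], p: float) -> float:
    """(1 - p)^-1 times the integral of Q_n(u) over (p, 1], exact for the step function Q_n."""
    n = len(xs_sorted)
    total = 0.0
    for k in range(1, n + 1):
        # Q_n(u) equals X_(k) on ((k - 1)/n, k/n]; keep only the part of that interval above p.
        lo = max((k - 1) / n, p)
        hi = k / n
        if hi > lo:
            total += xs_sorted[k - 1] * (hi - lo)
    return total / (1.0 - p)


def persistence_ratio(xs: Sequence[float], p: float) -> float:
    """Hypothetical plug-in estimate of tail mean / quantile at level p (positive data)."""
    xs_sorted: List[float] = sorted(xs)
    q = empirical_quantile(xs_sorted, p)
    if q <= 0.0:
        raise ValueError("the quantile must be positive for the ratio to be meaningful")
    return empirical_tail_mean(xs_sorted, p) / q


rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(100_000)]
print(persistence_ratio(sample, 0.9), 1.0 + 1.0 / math.log(10.0))
```

For Exp(1) data, memorylessness gives $T(p) = Q(p) + 1$ with $Q(p) = -\log(1-p)$, so $\phi(0.9) = 1 + 1/\log 10 \approx 1.434$; the simulated estimate should be close to this value.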
Simulation studies and real-data analysis illustrate that the proposed measures capture clinically relevant tail persistence that complements median- and mean-based summaries.
Published: 2026-05-19T17:19:38Z
Authors: Sankaran P. G., Prasanth V. P., Midhu N. N

http://arxiv.org/abs/2605.20125v1
Federated Learning with Incomplete Data: When to Use Complete Cases and When to Weight
Updated: 2026-05-19T17:12:37Z
Privacy constraints have driven the rise of federated learning (FL), which enables multi-site analyses without sharing individual participant data. We develop a framework for FL with missing data, identifying conditions under which the complete case (CC) estimator is preferred over the inverse probability weighting (IPW) estimator. For settings where the CC estimator fails, we introduce a calibrated weight estimation approach that combines candidate weighting models across sites and remains consistent if at least one is correctly specified. Consistency conditions are stated at the site level, ensuring that the federated estimator inherits validity from local properties. We derive a sandwich variance estimator that accounts for uncertainty in weight estimation, and illustrate the framework by evaluating risk factors for 90-day mortality among patients with pleural infections treated with intrapleural enzyme therapy.
Published: 2026-05-19T17:12:37Z
Authors: Jesus E. Vazquez, Yicheng Shen, Jason Akulian, Chad Hochberg, Theodore J. Iwashyna, Elizabeth A. Stuart, Jiayi Tong

http://arxiv.org/abs/2605.20099v1
A Goodness-of-Fit Test for Independent Component Models in High Dimensions
Updated: 2026-05-19T16:50:14Z
Independent component (IC) models are a standard tool for representing multivariate data in statistics, signal processing, and machine learning. Despite the extensive use of IC models, much less attention has been given to goodness-of-fit tests for assessing their compatibility with data.
We develop the first goodness-of-fit test for IC models that is supported by a theoretical validity guarantee when the data dimension and sample size diverge proportionally. This is made possible by the fact that the test does not rely on a pre-whitening step, which often limits the applicability of other goodness-of-fit tests in high dimensions. Our theoretical analysis is complemented with numerical experiments that demonstrate the test's size control and power under a range of conditions. In addition, we provide examples involving gene-expression data to illustrate that the test has potential for effective diagnostic use in practice.
Published: 2026-05-19T16:50:14Z
Authors: Mingshuo Liu, Siyao Wang, Miles E. Lopes

http://arxiv.org/abs/2602.09061v2
Optimal information deletion and Bayes' theorem
Updated: 2026-05-19T16:33:49Z
Arnold Zellner published a seminal paper on Bayes' theorem as an optimal information processing rule, a result that led to the variational formulation of Bayes' theorem, a central idea in generalized variational inference. Almost 40 years later, we revisit these ideas, but from the perspective of information deletion. We investigate rules that update a posterior distribution into an antedata distribution when a portion of the data is removed. In this context, a rule that neither destroys information nor creates nonexistent information is called the optimal information deletion rule, and we prove that it coincides with the leave-data-out posterior from Bayes' theorem.
Published: 2026-02-08T23:43:29Z
Authors: Hans Montcho, Håvard Rue

http://arxiv.org/abs/2510.25632v3
Automatic selection of hyper-parameters via the use of softened profile likelihood
Updated: 2026-05-19T15:44:52Z
We extend a heuristic method for automatic dimensionality selection, which maximizes a profile likelihood to identify "elbows" in scree plots. Our extension enables researchers to make automatic choices of multiple hyper-parameters simultaneously. To facilitate our extension to multiple dimensions, we propose a "softened" profile likelihood.
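The unsoftened heuristic being extended (to our understanding, the scree-plot rule of Zhu and Ghodsi, 2006) splits the sorted eigenvalues into a leading and a trailing group, models each group as Gaussian with its own mean and a shared variance, and picks the split that maximizes the profile log-likelihood. A minimal Python sketch of that hard-split rule, not of the softened version proposed here; function names are illustrative.

```python
import math
from typing import Sequence


def profile_loglik(values: Sequence[float], q: int) -> float:
    """Profile log-likelihood of splitting sorted `values` after the first q entries.

    Group 1 = values[:q], group 2 = values[q:]; each group is Gaussian with its own
    mean, both share one variance, and all three are set to their MLEs.
    """
    g1, g2 = values[:q], values[q:]
    m1 = sum(g1) / len(g1)
    m2 = sum(g2) / len(g2)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m2) ** 2 for v in g2)
    p = len(values)
    var = max(ss / p, 1e-12)  # pooled MLE of the shared variance, floored to avoid log(0)
    return -0.5 * p * math.log(2.0 * math.pi * var) - 0.5 * ss / var


def elbow(values: Sequence[float]) -> int:
    """Number of leading values (between 1 and p - 1) maximizing the profile log-likelihood."""
    vals = sorted(values, reverse=True)
    if len(vals) < 2:
        raise ValueError("need at least two values to split")
    return max(range(1, len(vals)), key=lambda q: profile_loglik(vals, q))


print(elbow([10, 9, 8.5, 1, 0.9, 0.8, 0.7]))
```

Here `elbow([10, 9, 8.5, 1, 0.9, 0.8, 0.7])` returns 3. With maximum-likelihood plug-ins the objective depends on the split only through the pooled within-group sum of squares, so the rule equivalently minimizes that quantity.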
We present two distinct parameterizations of our solution and demonstrate our approach on elastic nets, support vector machines, and neural networks. We also report a small simulation study to investigate violations of an assumption we make, and briefly discuss applications of our method to data-analytic tasks other than hyper-parameter selection.
Published: 2025-10-29T15:36:43Z
Comments: Replaced first example (Section 3.1). Added simulation study (Appendix D). Included URL to Python code on GitHub
Authors: Gengyang Chen, Mu Zhu

http://arxiv.org/abs/2605.20007v1
Identifying Interventional Joint Distributions via Extended Bridge Functions
Updated: 2026-05-19T15:37:01Z
Existing identification results in proximal causal inference often focus on marginal interventional distributions using standard outcome or treatment bridge functions. These methods do not generally identify joint interventional distributions that contain all proxy variables that were used to define the corresponding bridge functions. In many applications, however, these joint interventional distributions are a natural target of interest. We introduce extended bridge functions and derive new identification results for joint interventional distributions that may retain all relevant proxy variables. We then apply these results to proximal identification algorithms, where interventional kernels naturally arise as intermediate objects, yielding a generalized framework based on kernel operations.
Published: 2026-05-19T15:37:01Z
Comments: 38 pages, 4 figures
Authors: Constantin Schott

http://arxiv.org/abs/2605.20003v1
Estimating treatment duration effects via clone-censor-weight: a breast cancer case study
Updated: 2026-05-19T15:34:49Z
In this work, we study the estimation of treatment duration effects in observational survival data, where treatment and covariate histories evolve over time and longer observed durations are only attainable among individuals who remain event-free and under follow-up, leading to immortal time bias under naive analyses.
The cloning-censoring-weighting (CCW) framework provides a practical approach to emulate target trials of treatment duration strategies, but several methodological aspects remain insufficiently understood.
We focus on static treatment duration strategies under two settings of increasing complexity: baseline confounding only, and confounding with time-varying covariates. We formalize the assumptions underlying CCW, with particular emphasis on treatment admissibility, relaxed intervention rules, and the distinction between artificial and natural censoring. We then compare several estimation approaches after cloning and censoring, including inverse probability of censoring weighting (IPCW), the G-formula, and doubly robust estimators, through simulation studies assessing robustness, variability, and sensitivity to censoring model misspecification.
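The clone-and-censor step for static duration strategies can be sketched as follows, assuming one observed treatment stop time per patient (equal to the end of follow-up if treatment was ongoing) and a hypothetical symmetric `grace` window as one simple relaxed intervention rule. Each patient is copied once per strategy, and a clone is artificially censored at the first time its observed treatment deviates from that strategy; events and natural censoring that occur before any deviation are kept. The weighting step (for example IPCW) is not shown, and all names are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Patient:
    pid: int
    treat_stop: float  # time treatment stopped; equals follow_up if still treated at the end
    follow_up: float   # time of the event or of natural censoring
    event: bool        # True if follow_up is an event time


@dataclass(frozen=True)
class Clone:
    pid: int
    duration: float              # strategy: treat for `duration` years, then stop
    time: float                  # end of this clone's follow-up
    event: bool                  # event observed while still consistent with the strategy
    artificially_censored: bool  # True if follow-up ended because of a deviation


def clone_and_censor(patients: Iterable[Patient], durations: Iterable[float],
                     grace: float = 0.0) -> List[Clone]:
    """Copy each patient once per strategy and censor each copy at its first deviation."""
    strategies = list(durations)
    clones: List[Clone] = []
    for p in patients:
        for d in strategies:
            if p.treat_stop < d - grace:
                deviation = p.treat_stop   # stopped earlier than the strategy allows
            elif p.treat_stop > d + grace:
                deviation = d + grace      # still treated after the allowed stopping window
            else:
                deviation = float("inf")   # observed history is consistent with the strategy
            if p.follow_up <= deviation:
                # Event or natural censoring occurs before any deviation: keep it as observed.
                clones.append(Clone(p.pid, d, p.follow_up, p.event, False))
            else:
                clones.append(Clone(p.pid, d, deviation, False, True))
    return clones


cohort = [
    Patient(1, treat_stop=1.5, follow_up=1.5, event=True),   # event at 1.5 y while treated
    Patient(2, treat_stop=2.0, follow_up=4.0, event=True),   # stopped at 2 y, event at 4 y
    Patient(3, treat_stop=5.0, follow_up=8.0, event=False),  # stopped at 5 y, censored at 8 y
    Patient(4, treat_stop=3.0, follow_up=3.0, event=False),  # still treated when lost at 3 y
]
for c in clone_and_censor(cohort, [2.0, 5.0]):
    print(c)
```

With strategies of 2 and 5 years, patient 1 contributes the event at 1.5 years to both arms, which is how cloning avoids immortal time bias; patient 4 is artificially censored at 2 years in the 2-year arm but naturally censored at 3 years in the 5-year arm.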
Finally, we apply the framework to a breast cancer cohort to emulate a target trial comparing 2 versus 5 years of adjuvant tamoxifen in early-stage breast cancer. Due to the small number of events and limited support for the 2-year strategy, estimates are associated with substantial uncertainty. These findings highlight both the practical relevance and the limitations of CCW, and underscore the importance of sensitivity analyses in complex longitudinal observational settings.
Published: 2026-05-19T15:34:49Z
Authors: Charlotte Voinot, Noémie Simon-Tillaux, Emma Torrini, Stefan Michiels, Bernard Sebastien, Clément Berenfeld, Julie Josse

http://arxiv.org/abs/2512.24139v5
Colorful Pinball: Density-Weighted Quantile Regression for Conditional Guarantee of Conformal Prediction
Updated: 2026-05-19T15:14:11Z
Although conformal prediction provides robust marginal coverage guarantees, achieving reliable conditional coverage for specific inputs remains challenging. While exact distribution-free conditional coverage is impossible with finite samples, recent work has focused on improving the conditional coverage of standard conformal procedures. Distinct from approaches that target relaxed notions of conditional coverage, we directly target the mean squared error of conditional coverage by refining the quantile regression components that underpin many conformal methods. Leveraging a Taylor expansion, we derive a sharp surrogate objective for quantile regression: a density-weighted pinball loss, where the weights are given by the conditional density of the nonconformity score evaluated at the true quantile. We propose a three-headed quantile network that estimates these weights via finite differences using auxiliary quantile levels at $1-\alpha\pm\delta$, subsequently fine-tuning the central quantile by optimizing the weighted loss. We provide a theoretical analysis with exact non-asymptotic guarantees characterizing the resulting excess risk.
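The weight in the density-weighted pinball loss is the conditional density of the score at the target quantile. Because $dQ(\tau)/d\tau = 1/f(Q(\tau))$, auxiliary heads at levels $1-\alpha\pm\delta$ give the central-difference estimate $\hat f \approx 2\delta / (\hat q_{1-\alpha+\delta} - \hat q_{1-\alpha-\delta})$. A plain-Python sketch of the loss under that reading; the `min_gap` floor against crossing quantiles is a hypothetical safeguard, not taken from the source, and the function names are illustrative.

```python
import math
from typing import Sequence


def pinball(y: float, q: float, tau: float) -> float:
    """Standard pinball (quantile) loss of prediction q for outcome y at level tau."""
    diff = y - q
    return max(tau * diff, (tau - 1.0) * diff)


def fd_density(q_minus: float, q_plus: float, delta: float, min_gap: float = 1e-6) -> float:
    """Central-difference density estimate at the middle quantile: f ~ 2 delta / (q+ - q-)."""
    return 2.0 * delta / max(q_plus - q_minus, min_gap)


def density_weighted_pinball(ys: Sequence[float], q_mid: Sequence[float],
                             q_minus: Sequence[float], q_plus: Sequence[float],
                             alpha: float, delta: float) -> float:
    """Average of f_i * pinball(y_i, q_i, 1 - alpha), with f_i from the auxiliary heads."""
    tau = 1.0 - alpha
    total = 0.0
    for y, qm, qlo, qhi in zip(ys, q_mid, q_minus, q_plus):
        weight = fd_density(qlo, qhi, delta)  # per-input weight from levels 1 - alpha -/+ delta
        total += weight * pinball(y, qm, tau)
    return total / len(ys)


# Exp(1) scores: Q(tau) = -log(1 - tau), and the true density at Q(0.9) is 0.1.
print(fd_density(-math.log(1 - 0.89), -math.log(1 - 0.91), delta=0.01))
```

For Exp(1) scores at $\tau = 0.9$, the true density at the quantile is $1 - \tau = 0.1$, and the finite-difference estimate with $\delta = 0.01$ is about 0.0997.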
Extensive experiments on diverse high-dimensional real-world datasets demonstrate remarkable improvements in conditional coverage performance.
Published: 2025-12-30T11:02:35Z
Comments: ICML 2026
Authors: Qianyi Chen, Bo Li

http://arxiv.org/abs/2603.25690v3
Approximate Bayesian Inference for Structural Equation Models using Integrated Nested Laplace Approximations
Updated: 2026-05-19T14:58:45Z
Markov chain Monte Carlo (MCMC) methods remain the mainstay of Bayesian estimation of structural equation models (SEM), though they often incur a high computational cost. We present a bespoke approximate Bayesian approach to SEM, drawing on ideas from the integrated nested Laplace approximation (INLA, Rue et al., 2009, J. R. Stat. Soc. Series B Stat. Methodol.) framework. We implement a simplified Laplace approximation that efficiently profiles the posterior density in each parameter direction while correcting for asymmetry, allowing for parametric skew-normal estimation of the marginals. Furthermore, we apply a variational Bayes correction to shift the marginal locations, thereby better capturing the posterior mass. Essential quantities, including factor scores and model-fit indices, are obtained via an adjusted Gaussian copula sampling scheme. For normal-theory SEM, this approach offers a highly accurate alternative to sampling-based inference, achieving near-'maximum likelihood' speeds while retaining the precision of full Bayesian inference.
Published: 2026-03-26T17:39:01Z
Authors: Haziq Jamil, Håvard Rue

http://arxiv.org/abs/2403.20200v5
High-dimensional analysis of ridge regression for non-identically distributed data with a variance profile
Updated: 2026-05-19T14:46:24Z
High-dimensional linear regression has been thoroughly studied in the context of independent and identically distributed data. We propose to investigate high-dimensional regression models for independent but non-identically distributed data.
To this end, we suppose that the set of observed predictors (or features) is a random matrix with a variance profile and with dimensions growing at a proportional rate. Assuming a random effect model, we study the predictive risk of the ridge estimator for linear regression with such a variance profile. In this setting, we provide deterministic equivalents of this risk and of the degrees of freedom of the ridge estimator. For a certain class of variance profiles, our work highlights the emergence of the well-known double descent phenomenon in high-dimensional regression for the minimum-norm least-squares estimator when the ridge regularization parameter goes to zero. We also exhibit variance profiles for which the shape of this predictive risk differs from double descent. The proofs of our results are based on tools from random matrix theory in the presence of a variance profile that have not been considered so far to study regression models. Numerical experiments are provided to show the accuracy of the aforementioned deterministic equivalents for computing the predictive risk of ridge regression. We also investigate the similarities and differences that exist with the standard setting of independent and identically distributed data.
Published: 2024-03-29T14:24:49Z
Authors: Jérémie Bigot, Issa-Mbenard Dabo, Camille Male

http://arxiv.org/abs/2509.19707v2
Diffusion and Flow-based Copulas: Forgetting and Remembering Dependencies
Updated: 2026-05-19T14:33:26Z
Copulas are a fundamental tool for modelling multivariate dependencies in data, forming the method of choice in diverse fields and applications. However, the adoption of existing models for multimodal and high-dimensional dependencies is hindered by restrictive assumptions and poor scaling. In this work, we present methods for modelling copulas based on the principles of diffusions and flows.
We design two processes that progressively forget inter-variable dependencies while leaving dimension-wise distributions unaffected, provably defining valid copulas at all times. We show how to obtain copula models by learning to remember the forgotten dependencies from each process, theoretically recovering the true copula at optimality. The first instantiation of our framework focuses on direct density estimation, while the second specialises in expedient sampling. Empirically, we demonstrate the superior performance of our proposed methods over state-of-the-art copula approaches in modelling complex and high-dimensional dependencies from scientific datasets and images. Our work enhances the representational power of copula models, empowering applications and paving the way for their adoption on larger scales and more challenging domains.
Published: 2025-09-24T02:33:29Z
Comments: Published as a conference paper at ICLR 2026
Authors: David Huk, Theodoros Damoulas

http://arxiv.org/abs/2605.19878v1
Sample Size Determination Under Selection Bias: Robust Tolerance Limits for Prevalent Cohort Data
Updated: 2026-05-19T14:09:24Z
Tolerance limits have received considerable attention in the statistical literature, with applications reaching far beyond their initial role in quality control. The well-known formula of Scheffé and Tukey (1944) establishes a simple, distribution-free relation between sample size and population coverage by two given order statistics and a given confidence level. A key requirement in applying this formula is the availability of an unbiased, representative sample from the population of interest. However, as often happens in biological and medical applications, various logistical constraints may preclude the possibility of obtaining an unbiased sample. We derive extensions of this formula which accommodate a large class of biased sampling schemes including weight bias and censoring. The modified formulae are validated through a simulation study and compared to their unmodified counterpart.
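For an unbiased i.i.d. sample from a continuous distribution, the classical distribution-free relation behind such formulas is: if the interval runs from the $r$-th smallest to the $s$-th largest observation and $m = r + s$, its population coverage $C$ follows $\mathrm{Beta}(n-m+1, m)$, so $P(C \ge \beta) = P(\mathrm{Bin}(n, \beta) \le n - m)$. The Scheffé and Tukey formula approximates the smallest $n$ that reaches a confidence level $\gamma$; the sketch below instead solves the exact relation by direct search, and it does not implement the biased-sampling extensions of this entry. Function names are illustrative.

```python
import math


def coverage_confidence(n: int, beta: float, m: int) -> float:
    """P(coverage >= beta) for the interval [X_(r), X_(n+1-s)] with m = r + s.

    Coverage ~ Beta(n - m + 1, m), so P(C >= beta) = P(Bin(n, beta) <= n - m),
    computed as 1 minus the m upper binomial terms (in log space to avoid overflow).
    Requires 0 < beta < 1 and n >= m >= 1.
    """
    tail = 0.0
    for k in range(n - m + 1, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(beta) + (n - k) * math.log1p(-beta))
        tail += math.exp(log_term)
    return 1.0 - tail


def min_sample_size(beta: float, gamma: float, m: int = 2, n_max: int = 1_000_000) -> int:
    """Smallest n whose order-statistic interval covers >= beta with confidence >= gamma."""
    for n in range(m, n_max + 1):
        if coverage_confidence(n, beta, m) >= gamma:
            return n
    raise ValueError("no n <= n_max reaches the requested confidence")


print(min_sample_size(0.95, 0.95, m=2), min_sample_size(0.95, 0.95, m=1))
```

With $\beta = \gamma = 0.95$, using the sample minimum and maximum ($m = 2$) requires $n = 93$, and a one-sided limit ($m = 1$) requires $n = 59$, matching the familiar textbook values.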
We illustrate the use of the modified formulae with partially observed failure times for individuals with dementia, using data collected from the Canadian Study of Health and Aging.
Published: 2026-05-19T14:09:24Z
Comments: 11 pages, 3 figures, 1 table
Authors: James H. McVittie, Martin Lysy, Masoud Asgharian

http://arxiv.org/abs/2605.19861v1
Stationary subspace analysis for spatial data
Updated: 2026-05-19T13:53:02Z
Stationary subspace analysis (SSA) is a blind source separation framework that decomposes linearly mixed multivariate data into stationary and nonstationary components. We extend SSA to spatially indexed data by introducing spatial stationary subspace analysis (spSSA), which explicitly accounts for spatial dependence. We propose three estimation procedures for the unmixing matrix based on first- and second-order spatial statistics. Each procedure targets a different type of nonstationarity and can be formulated as the solution to a generalized eigenvalue problem. To address situations where multiple forms of nonstationarity are present simultaneously, we combine the three procedures using approximate joint diagonalization. Simulation studies demonstrate that this combined approach yields superior separation performance. When the dimension of the nonstationary subspace is known, the proposed methods reliably recover the latent stationary and nonstationary components. However, determining this dimension remains a fundamental challenge in SSA, for which no generally accepted solution currently exists. Building on our estimation procedures, we propose a novel data augmentation approach to estimate the dimension of the nonstationary subspace and demonstrate its effectiveness through simulation studies. The proposed methodology is easily transferable to time series settings, making it of broader methodological interest.
Published: 2026-05-19T13:53:02Z
Authors: Perttu Saarela, Klaus Nordhausen, Jaakko Pere, Anne M. Ruiz

http://arxiv.org/abs/2409.02311v3
A simple distributional difference-in-differences estimator for univariate and bivariate outcomes
Updated: 2026-05-19T13:38:25Z
We provide a simple distribution regression estimator for treatment effects in the difference-in-differences (DiD) design. Our procedure is particularly useful when the treatment effect differs across the distribution of the outcome variable. Our proposed estimator easily incorporates covariates and, importantly, can be extended to settings where the treatment potentially affects the joint distribution of multiple outcomes. Our key identifying restriction is that the untreated outcome distribution does not exhibit an interaction effect of group and time. This assumption results in a parallel trend assumption on a transformation of the distribution. We highlight the relationship between our procedure and assumptions and the changes-in-changes approach of Athey and Imbens (2006). We also reexamine the Card and Krueger (1994) study of the impact of minimum wages on employment to illustrate the utility of our approach.
Published: 2024-09-03T21:47:48Z
Comments: 43 pages, 3 figures, 4 tables; new section on asymptotic theory with respect to previous version
Authors: Iván Fernández-Val, Jonas Meier, Aico van Vuuren, Francis Vella
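For the distributional difference-in-differences entry above, the no-interaction restriction can be illustrated in the simplest saturated case: binary group $g$ and period $t$, no covariates, and an assumed probit link $\Lambda = \Phi$. If $\Lambda^{-1}(F_{gt}(y)) = a(y) + b(y)g + c(y)t$ for untreated outcomes, the counterfactual untreated CDF of the treated group after treatment is $\Lambda(\Lambda^{-1}F_{10}(y) + \Lambda^{-1}F_{01}(y) - \Lambda^{-1}F_{00}(y))$, a parallel trend on the probit scale. A minimal sketch with empirical CDFs; names and the clipping constant are illustrative, and the estimator in the paper, with covariates and bivariate outcomes, is more general.

```python
import bisect
from statistics import NormalDist
from typing import Sequence

STD_NORMAL = NormalDist()  # standard normal: the assumed probit link


def ecdf(sample: Sequence[float], y: float) -> float:
    """Empirical CDF of `sample` at y."""
    s = sorted(sample)
    return bisect.bisect_right(s, y) / len(s)


def probit(p: float, eps: float = 1e-6) -> float:
    """Inverse standard normal CDF, clipped away from 0 and 1 (eps is an arbitrary choice)."""
    return STD_NORMAL.inv_cdf(min(max(p, eps), 1.0 - eps))


def counterfactual_cdf(y: float, y00: Sequence[float], y01: Sequence[float],
                       y10: Sequence[float]) -> float:
    """Untreated CDF at y for the treated group (g = 1) in the post period (t = 1).

    y_gt holds outcomes of group g in period t. With no group-by-time interaction on the
    probit scale: probit(F11_0(y)) = probit(F10(y)) + probit(F01(y)) - probit(F00(y)).
    """
    z = probit(ecdf(y10, y)) + probit(ecdf(y01, y)) - probit(ecdf(y00, y))
    return STD_NORMAL.cdf(z)


control_pre, control_post = [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]
treated_pre = [1.5, 2.5, 3.5, 4.5]
for y in (2.0, 3.0, 4.0):
    print(y, counterfactual_cdf(y, control_pre, control_post, treated_pre))
```

Comparing this counterfactual CDF with the observed CDF of the treated group in the post period, threshold by threshold, gives a distributional treatment effect for the treated group.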