Approximating full conformal prediction: distribution free guarantees via the tournament correction

2026-06-02T03:37:06Z

Conformal prediction is a framework for constructing prediction intervals with distribution-free validity: predictive coverage is guaranteed for data drawn from any distribution. Its two main variants are full conformal prediction and split conformal prediction (also called transductive and inductive, respectively). Full conformal prediction is widely considered statistically more efficient, because split conformal prediction requires data splitting, and the resulting loss in sample size can lead to wider prediction intervals. However, full conformal prediction is computationally prohibitive, as it requires the underlying model to be refit for every candidate value in the response space. Existing computational shortcuts, such as approximating the full conformal construction on a discrete grid of values, frequently lack theoretical guarantees on marginal coverage and can fail in practice. To address this limitation, we introduce a novel class of approximations to the full conformal prediction method, based on the idea of \emph{tournaments}, which enables the construction of prediction sets with a rigorous marginal coverage guarantee of $1-2\alpha$. Under stability conditions, the coverage guarantee tightens to approximately $1-\alpha$. This new framework generalizes the existing method of leave-one-out cross-conformal prediction, while allowing flexible use of various existing approximation strategies.
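
To see why full conformal prediction is expensive, and what the grid shortcut looks like, here is a minimal Python sketch of grid-based full conformal prediction. It assumes the sample mean as the fitted model and absolute residuals as conformity scores. Both are illustrative choices, not the paper's. It implements only the plain grid approximation, not the tournament correction, and the function name is hypothetical.

```python
import numpy as np

def full_conformal_grid(y, alpha, grid):
    """Grid approximation to a full conformal prediction set for Y_{n+1}.

    The 'model' is just the sample mean, refit on the augmented data
    (y_1, ..., y_n, y_cand) for every candidate value y_cand in `grid`.
    """
    kept = []
    for y_cand in grid:
        aug = np.append(y, y_cand)           # augmented sample of size n + 1
        mu = aug.mean()                      # refit the model for this candidate
        scores = np.abs(aug - mu)            # absolute-residual conformity scores
        # Conformal p-value: share of the n + 1 scores at least as large as the candidate's.
        p_value = np.mean(scores >= scores[-1])
        if p_value > alpha:                  # keep candidates that do not look too extreme
            kept.append(y_cand)
    return np.array(kept)

rng = np.random.default_rng(0)
y = rng.normal(size=50)
grid = np.linspace(-5, 5, 1001)
pred_set = full_conformal_grid(y, alpha=0.1, grid=grid)
print(pred_set.min(), pred_set.max())
```

The loop makes the cost explicit: there is one refit per grid point. Membership is only checked at the grid points themselves, which is where a grid approximation can lose the exact construction's coverage guarantee.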

Learning to Bet for Horizon-Aware Anytime-Valid Testing

2026-06-02T03:12:53Z

We develop horizon-aware anytime-valid tests and confidence sequences for bounded means under a strict deadline $N$. Using the betting/e-process framework, we cast horizon-aware betting as a finite-horizon optimal control problem with state space $(t, \log W_t)$, where $t$ is the time and $W_t$ is the test martingale value. We first show that in certain interior regions of the state space, policies that deviate significantly from Kelly betting are provably suboptimal, while Kelly betting reaches the threshold with high probability. We then identify sufficient conditions showing that outside these regions, betting more aggressively than Kelly can be better if the bettor is behind schedule, and betting less aggressively can be better if the bettor is ahead. Taken together, these results suggest a simple phase diagram in the $(t, \log W_t)$ plane, delineating regions where Kelly, fractional Kelly, and aggressive betting may be preferable. Guided by this phase diagram, we introduce a deep reinforcement learning approach based on a universal Deep Q-Network (DQN) agent that learns a single policy from synthetic experience and maps simple statistics of past observations to bets across horizons and null values. In limited-horizon experiments, the learned DQN policy yields state-of-the-art results.
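
The betting construction can be sketched in a few lines of Python. This is a minimal, horizon-agnostic sketch. It assumes observations in [0, 1], a null mean `m`, and a plug-in second-order approximation to the Kelly bet scaled by a hypothetical `fraction` parameter: a fraction below 1 gives fractional Kelly, and a fraction above 1 gives more aggressive betting. The paper's horizon-aware and DQN policies are not reproduced here, and all names are hypothetical.

```python
import numpy as np

def betting_test(x, m, alpha=0.05, fraction=1.0, max_frac=0.5):
    """Sequential test of H0: E[X] = m for X in [0, 1] using a betting martingale.

    Under H0, wealth W_t = prod_s (1 + lam_s * (x_s - m)) is a nonnegative
    martingale, so rejecting once W_t >= 1 / alpha is anytime-valid (Ville).
    Returns (rejection time or None, final log-wealth).
    """
    lo, hi = -max_frac / (1 - m), max_frac / m   # each wealth factor stays >= 1 - max_frac
    log_w = 0.0
    s1 = s2 = 0.0        # running sums of (x - m) and (x - m)**2 over past data
    for t, xt in enumerate(x, start=1):
        # Predictable bet: uses only observations strictly before time t.
        if t == 1:
            lam = 0.0
        else:
            d = s1 / (t - 1)                      # estimated E[X] - m
            v = s2 / (t - 1)                      # estimated E[(X - m)**2]
            lam = fraction * d / max(v, 1e-12)    # second-order approximation to Kelly
        lam = min(max(lam, lo), hi)
        log_w += np.log1p(lam * (xt - m))
        s1 += xt - m
        s2 += (xt - m) ** 2
        if log_w >= np.log(1 / alpha):
            return t, log_w
    return None, log_w

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.8, size=500).astype(float)   # true mean 0.8, null 0.5
t_reject, log_wealth = betting_test(x, m=0.5)
print(t_reject, log_wealth)
```

Clipping the bet keeps every wealth factor at least `1 - max_frac`, so wealth never hits zero. Any predictable policy, including a horizon-aware one, could replace the plug-in rule without affecting validity.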

Marginalised Poisson Hurdle Model for Cross-Sectional Count Data with Excess Zeros

2026-06-02T01:57:05Z

Count data with excess zeros arise frequently in health economics and epidemiology. The standard Poisson Hurdle Model (PHM) parametrises the underlying Poisson rate directly, so its count-component coefficients are log-rate ratios rather than log-ratios of the marginal mean. Consequently, the incidence density ratio (IDR) from the PHM is neither exact nor constant across covariate profiles, complicating applied reporting. We propose the Marginalised Poisson Hurdle Model (MPHM), which reparametrises the count component so that the coefficient vector beta directly governs the marginal mean E[Y]. A nonlinear connector equation links the structural Poisson rate to this parametrised mean. We prove existence and uniqueness of the connector solution, develop a vectorised Brent's-method solver, derive the score equations and block-diagonal Fisher information, establish asymptotic normality, and prove that exp(beta) is exactly constant across all covariate values. A simulation study with n in {100, 250, 500, 1000}, zero proportion pi in {0.2, 0.4, 0.6, 0.8}, and R = 200 replications confirms consistency, near-zero bias, and 95% Wald coverage of 0.905-0.975 across all 16 scenarios. Applied to the NMES1988 physician visit data (n = 4,406), the MPHM yields IDR = 1.163 (95% CI: 1.150-1.177) per additional chronic condition, an exact, population-wide effect not derivable from the PHM. The MPHM resolves the non-constant IDR problem by directly parametrising E[Y]. The resulting IDR holds for every individual and for the whole population without further marginalisation, substantially simplifying the reporting of covariate effects in health utilisation research.
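
This example assumes the usual hurdle parametrisation: P(Y = 0) = pi, and Y | Y > 0 follows a zero-truncated Poisson(lambda). Under that assumption, the marginal mean is E[Y] = (1 - pi) * lambda / (1 - exp(-lambda)). The connector equation then asks for the lambda that reproduces a target mean mu = exp(x'beta). The Python sketch below solves it by vectorised bisection, a plainer stand-in for the paper's Brent's-method solver. The function names and coefficient values are hypothetical.

```python
import numpy as np

def hurdle_mean(lam, pi):
    """E[Y] for a Poisson hurdle: P(Y=0) = pi, Y | Y>0 ~ zero-truncated Poisson(lam)."""
    return (1 - pi) * lam / (-np.expm1(-lam))

def solve_connector(mu, pi, tol=1e-12, max_iter=200):
    """Vectorised bisection for lam such that hurdle_mean(lam, pi) == mu.

    lam / (1 - exp(-lam)) increases from 1 (as lam -> 0) to infinity, so a
    unique root exists exactly when mu > 1 - pi.
    """
    mu, pi = np.broadcast_arrays(np.asarray(mu, float), np.asarray(pi, float))
    if np.any(mu <= 1 - pi):
        raise ValueError("connector has no solution: need mu > 1 - pi")
    target = mu / (1 - pi)            # solve lam / (1 - exp(-lam)) = target
    lo = np.zeros_like(target)        # the ratio tends to 1 < target as lam -> 0
    hi = target.copy()                # the ratio exceeds lam, so the root lies below target
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        too_small = mid / (-np.expm1(-mid)) < target
        lo = np.where(too_small, mid, lo)
        hi = np.where(too_small, hi, mid)
        if np.max(hi - lo) < tol:
            break
    return 0.5 * (lo + hi)

# Hypothetical coefficients: log E[Y] = 0.1 + 0.15 * x, with zero probability pi = 0.4.
x = np.arange(5)
beta0, beta1, pi = 0.1, 0.15, 0.4
mu = np.exp(beta0 + beta1 * x)
lam = solve_connector(mu, pi)
print(hurdle_mean(lam[1:], pi) / hurdle_mean(lam[:-1], pi))  # match exp(beta1) up to solver tolerance
```

Because beta enters only through mu, the ratio of marginal means for a one-unit covariate change is exp(beta1) by construction, whatever lambda turns out to be.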

A Fast Screening Approach for High-dimensional Outcomes and High-dimensional Predictors

2026-06-02T01:49:02Z

Modeling interactions among multimodal, high-dimensional data is intrinsically challenging due to ultra-high dimensionality and complex dependence structures with high levels of noise. Screening methods are effective for reducing dimensionality, but most existing approaches shrink only the predictor space while retaining all outcomes. In cross-modal analyses, different outcomes often select different predictor subsets, so the union of selected predictors remains large and the response dimension is unchanged, limiting the practical benefit of screening. The result is a heavy computational burden and poor interpretability. To address these limitations, we propose a new screening framework, Graph Independence Dual Screening (GIDS), which simultaneously reduces the dimensionality of response variables and predictors. We design computationally efficient algorithms that facilitate downstream selection procedures, improving accuracy and scalability, and establish supporting theoretical results. Extensive simulation studies demonstrate that GIDS outperforms existing methods that screen only predictors. To illustrate its utility, we applied GIDS to the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, analyzing interactions between 865,353 genome-wide DNA methylation variables and 49,386 transcriptomic variables. GIDS reduced the feature space to approximately 9,000 CpGs and 2,000 transcripts, uncovering blockwise interaction structures: clusters of CpG sites and gene transcripts with strong associations. This reduction not only improves computational tractability but also yields interpretable biological insights, highlighting coordinated regulatory mechanisms underlying Alzheimer's disease.
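
The idea of shrinking both the outcome and the predictor dimensions can be illustrated with a deliberately simple stand-in. The Python sketch below ranks every predictor and every outcome by its strongest marginal cross-correlation and keeps the top few of each. This is not the GIDS procedure. The function name and the top-k rule are hypothetical choices made for illustration.

```python
import numpy as np

def dual_screen(X, Y, k_x, k_y):
    """Keep the k_x predictors and k_y outcomes with the largest marginal |correlation|."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    C = np.abs(Xs.T @ Ys) / n                      # |corr(X_j, Y_k)|, shape (p, q)
    x_score = C.max(axis=1)                        # strongest link of each predictor
    y_score = C.max(axis=0)                        # strongest link of each outcome
    keep_x = np.sort(np.argsort(x_score)[::-1][:k_x])
    keep_y = np.sort(np.argsort(y_score)[::-1][:k_y])
    return keep_x, keep_y

# Toy data: outcomes 0-4 depend on predictors 0-4; everything else is noise.
rng = np.random.default_rng(0)
n, p, q = 200, 1000, 300
X = rng.normal(size=(n, p))
Y = rng.normal(size=(n, q))
Y[:, :5] = X[:, :5] + 0.3 * rng.normal(size=(n, 5))
keep_x, keep_y = dual_screen(X, Y, k_x=5, k_y=5)
print(keep_x, keep_y)
```

The cross-correlation matrix here is p x q. At the ADNI dimensions quoted above, it would have to be computed blockwise rather than held in memory at once.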

Powerful Switchback Experiments -- Or Not?

2026-06-02T01:33:19Z

Switchback experiments -- in which treatment is assigned at the level of a cluster crossed with a time period -- are widely used in marketplace and platform settings, yet no closed-form power formula exists for them. We fill this gap by deriving a closed-form, multi-level asymptotic variance approximation for the individual-level OLS estimator, facilitating power budgeting. Using this formula, we reveal a structural floor on statistical power: while idiosyncratic noise vanishes with observation density, macro-level shocks are multiplicatively penalized by cluster size imbalance. We confirm through analytical derivations and Monte Carlo simulations that the formula is exact across typical parameters and serves as a mathematically conservative upper bound in extreme boundary regimes. We study three methodological applications. First, we prove that advanced assignment designs like stratification only partially eliminate the penalty of cluster size imbalance on power. Second, we demonstrate that variance reduction techniques targeting macro-level shocks yield disproportionately greater efficiency gains than those targeting residual noise. Third, we formalize the finite-sample power trade-offs between individual-level and cell-level estimators.
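
The power floor can be illustrated with a simplified two-level variance model of our own; it is not the paper's formula. In this model, cells (cluster x period) of sizes n_c are split evenly between arms. Each cell carries a shock with variance sigma_cell^2, and each individual adds noise with variance sigma_ind^2. For the individual-level difference in means, this gives approximately Var = 4 * (sigma_cell^2 * sum(n_c^2) / N^2 + sigma_ind^2 / N), where N = sum(n_c). The second term vanishes as cells fill up. The first term is at least sigma_cell^2 / K for K cells and grows with cell-size imbalance. The function name and all parameter values are hypothetical.

```python
from statistics import NormalDist
import numpy as np

def switchback_power(effect, cell_sizes, sigma_cell, sigma_ind, alpha=0.05):
    """Approximate two-sided power of the individual-level difference in means.

    Hypothetical simplified model: half of the cells treated, one shock per cell
    (variance sigma_cell**2) plus independent individual noise (sigma_ind**2).
    """
    n = np.asarray(cell_sizes, dtype=float)
    N = n.sum()
    var = 4 * (sigma_cell**2 * np.sum(n**2) / N**2 + sigma_ind**2 / N)
    se = np.sqrt(var)
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    ratio = abs(effect) / se
    return std_normal.cdf(ratio - z) + std_normal.cdf(-ratio - z)

K = 20
balanced = [100] * K
imbalanced = [10] * (K // 2) + [190] * (K // 2)      # same total, unequal cells
print(switchback_power(0.5, balanced, 1.0, 1.0))
print(switchback_power(0.5, imbalanced, 1.0, 1.0))
# Even with a million observations per cell, power stays near its floor:
print(switchback_power(0.2, [10**6] * K, 1.0, 1.0))
```

The imbalance enters only through sum(n_c^2) / N^2. By Cauchy-Schwarz, this quantity equals 1/K for equal cells and is larger otherwise, so it acts as a multiplicative penalty on the cell-shock term.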

Self-Certifying Transport MCMC via Dual Spectral-Gap Certificates

2026-06-02T01:21:49Z

We propose CerT-MCMC, a framework that equips learned-transport Markov chain Monte Carlo with automatic, rigorous convergence certificates. A normalising flow maps a Gaussian reference to an approximation of the target posterior; the same flow then serves as both the independence Metropolis-Hastings proposal and the basis for a computable spectral-gap bound. We develop two complementary certificates. The covering certificate bounds the weight-ratio oscillation over the full proposal support via finite-sample covering arguments, yielding full-support spectral-gap bounds when a conservative gradient bound is available; its correction term scales as O(n^{-1/D}), so it weakens rapidly and eventually becomes vacuous as the dimension increases. We prove a matching Omega(n^{-1/D}) lower bound, establishing that this barrier is intrinsic to pointwise Lipschitz certification. The quantile-core certificate restricts attention to a high-probability residual core on which the oscillation is controlled by one-dimensional empirical quantiles, with a finite-sample probability slack of O(n^{-1/2}), independent of the ambient dimension. On synthetic targets (D=2-20), structural-engineering posteriors (D=6,8), real-data logistic regression on the Heart Disease data set (D=13), and synthetic Bayesian logistic regression (D=20), the quantile-core certificate delivers non-vacuous spectral-gap bounds where the covering certificate is vacuous, and its spectral-gap proxy tracks empirical effective sample sizes within 7%. A negative-control experiment confirms that the certificate discriminates flow quality by a factor exceeding 10x, whereas acceptance rates differ by only 1.15x. To our knowledge, the dual-certificate framework is the first to provide automatic, dimension-aware convergence certificates for learned-transport MCMC, distinguishing genuine transport failure from proof-technique limitations.
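
The chain being certified is an independence Metropolis-Hastings (IMH) sampler whose proposal is the trained flow. A classical fact about IMH is that when the weight w = pi/q (normalised densities) is bounded by W, the chain is uniformly ergodic with rate 1 - 1/W, which corresponds to a spectral gap of at least 1/W. The Python sketch below makes two simplifications. It replaces the flow with a fixed Gaussian proposal, an assumption made for illustration. It also computes only a naive plug-in estimate of 1/sup w from proposal draws. Unlike the paper's certificates, this estimate is not a rigorous bound, because the sample maximum can miss the true supremum and so overstate the gap. All names are hypothetical.

```python
import numpy as np

PROP_SD = 1.5   # Gaussian proposal standing in for a trained normalising flow

def log_target(x):
    """Unnormalised N(0, 1) log-density (toy target)."""
    return -0.5 * x**2

def log_proposal(x):
    """N(0, PROP_SD**2) log-density; omits the same -0.5*log(2*pi) as log_target."""
    return -0.5 * (x / PROP_SD) ** 2 - np.log(PROP_SD)

def imh(n_steps, rng):
    """Independence MH: propose from q, accept w.p. min(1, w(y) / w(x))."""
    x = rng.normal(scale=PROP_SD)
    log_w = log_target(x) - log_proposal(x)
    chain = np.empty(n_steps)
    for t in range(n_steps):
        y = rng.normal(scale=PROP_SD)
        log_w_y = log_target(y) - log_proposal(y)
        if np.log(rng.uniform()) < log_w_y - log_w:
            x, log_w = y, log_w_y
        chain[t] = x
    return chain

def naive_gap_proxy(n_draws, rng):
    """Plug-in estimate of 1 / sup(pi / q); NOT a certificate."""
    y = rng.normal(scale=PROP_SD, size=n_draws)
    w = np.exp(log_target(y) - log_proposal(y))
    return w.mean() / w.max()      # dividing by mean(w) normalises the weights

rng = np.random.default_rng(0)
chain = imh(20_000, rng)
print(chain.mean(), chain.var())
print(naive_gap_proxy(10_000, np.random.default_rng(1)))  # roughly 1 / 1.5 for this toy pair
```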

Estimation of Treatment Effects Under Nonstationarity via the Truncated Policy Gradient Estimator

2026-06-02T00:54:37Z

Randomized experiments (or A/B tests) are widely used to evaluate interventions in dynamic systems such as recommendation platforms, marketplaces, and digital health. In these settings, interventions affect both current and future system states, so estimating the global average treatment effect (GATE) requires accounting for temporal dynamics, which is especially challenging in the presence of nonstationarity; existing approaches suffer from high bias, high variance, or both. In this paper, we address this challenge via the novel Truncated Policy Gradient (TPG) estimator, which replaces instantaneous outcomes with short-horizon outcome trajectories. The estimator admits a policy gradient interpretation: it is a truncation of the first-order approximation to the GATE, yielding provable reductions in bias and variance in nonstationary Markovian settings. We further establish a central limit theorem for the TPG estimator and develop a consistent variance estimator that remains valid under nonstationarity with single-trajectory data. We validate our theory with two real-world case studies. The results show that relative to existing approaches, a well-calibrated TPG estimator can achieve a favorable balance between bias and variance in nonstationary settings, highlighting the value of the policy-gradient perspective for designing effective estimators under complex dynamics.
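
As a rough illustration of "replacing instantaneous outcomes with short-horizon outcome trajectories", consider Bernoulli(p) treatment assignment over time. In an inverse-propensity-style estimator, each period's outcome can be replaced by the sum of the next k + 1 outcomes. This is our own simplified reading, not necessarily the paper's exact TPG estimator. The toy system below is stationary with a one-period carryover, so the all-treated versus all-control effect is tau0 + tau1. The function name and parameters are hypothetical.

```python
import numpy as np

def truncated_ipw(y, w, p, k):
    """Mean of (w_t - p) / (p (1 - p)) * (y_t + ... + y_{t+k}) over t with a full window."""
    T = len(y) - k
    csum = np.concatenate([[0.0], np.cumsum(y)])
    window = csum[k + 1 : k + 1 + T] - csum[:T]      # window[t] = sum of y[t : t + k + 1]
    score = (w[:T] - p) / (p * (1 - p))              # centred, propensity-scaled assignment
    return np.mean(score * window)

# Toy system: treatment at t raises y_t by tau0 and y_{t+1} by tau1 (carryover).
rng = np.random.default_rng(0)
T, p, tau0, tau1 = 200_000, 0.5, 1.0, 0.5
w = rng.binomial(1, p, size=T).astype(float)
y = tau0 * w + tau1 * np.concatenate([[0.0], w[:-1]]) + rng.normal(size=T)
print(truncated_ipw(y, w, p, k=0))   # targets tau0 only: misses the carryover
print(truncated_ipw(y, w, p, k=1))   # targets tau0 + tau1
```

Longer windows pick up more of the carryover (less bias) but sum more noisy outcomes (more variance). This is the kind of bias-variance balance the abstract refers to.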

Beyond Empirical Bayes: A Hierarchical Bayesian Approach to Crash Rate Estimation with Missing Traffic Volume

2026-06-01T20:36:56Z

The Empirical Bayes (EB) procedure of Hauer et al. (2002) is the workhorse of highway safety analysis: it combines a Safety Performance Function (SPF) with observed crash counts to produce shrinkage estimates of segment-level crash rates. EB delivers practicality by holding several quantities fixed at calibration: SPF coefficients, per-type overdispersion, observed ADT, and a fixed exposure exponent. These assumptions strain when ADT is missing on a majority of segments. We present a fully Bayesian hierarchical model that moves beyond EB by relaxing each of these assumptions in a single joint inference. Fit on Ohio's road inventory (408,304 segments, 2.9 million crashes, 2013-2025), the model jointly imputes missing ADT and estimates per-segment crash rates with uncertainty. Posterior predictive checks of an initial fixed-exposure model expose a tail misfit; relaxing the exposure structure to a per-functional-class exposure exponent and an estimated length exponent, in place of a single scalar and a fixed offset, resolves it and improves out-of-sample predictive accuracy (PSIS-LOO $\Delta\mathrm{elpd}$ = 9,394, SE 238). Crash count is sublinear in traffic in every class (exposure exponents 0.49-0.70, all $<1$, the safety-in-numbers effect) and sublinear in segment length ($\beta_{\mathrm{len}} = 0.69$). Partial pooling substantially improves out-of-sample predictive accuracy over complete pooling (PSIS-LOO $\Delta\mathrm{elpd}$ = 4,780, SE 225). The Bayesian ADT submodel attains $R^2_{\log} = 0.756$ by encoding county and functional class as hierarchical priors, versus $0.653$ for a LightGBM model restricted to the same continuous predictors. The output is a posterior crash rate distribution per segment, replacing the median-by-type point estimates used in our prior risk-aware routing framework.
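
For reference, the EB step that the hierarchical model moves beyond can be written in a few lines of Python. This sketch assumes a log-linear SPF with an exposure exponent on ADT and a length exponent, mirroring the exposure structure described above. It uses the standard negative binomial EB weight w = 1 / (1 + k * mu), where k is the overdispersion parameter. All coefficient values are hypothetical.

```python
import numpy as np

def spf(adt, length, b0, b_adt, b_len):
    """Predicted crashes: exp(b0) * ADT**b_adt * length**b_len (log-linear SPF)."""
    return np.exp(b0) * np.asarray(adt, float) ** b_adt * np.asarray(length, float) ** b_len

def eb_estimate(observed, mu, k):
    """Empirical Bayes shrinkage of observed counts toward the SPF prediction mu.

    k is the negative binomial overdispersion (Var = mu + k * mu**2).
    """
    w = 1.0 / (1.0 + k * mu)          # weight on the SPF prediction
    return w * mu + (1.0 - w) * observed

# Hypothetical coefficients with sublinear exposure (b_adt < 1, b_len < 1).
mu = spf(adt=[2_000, 4_000], length=[1.0, 1.0], b0=-5.0, b_adt=0.6, b_len=0.7)
print(mu[1] / mu[0])                  # doubling ADT multiplies predicted crashes by 2**0.6
print(eb_estimate(observed=10, mu=4.0, k=0.5))
```

Here the SPF coefficients, k, and the ADT input are fixed numbers. Those are exactly the quantities the hierarchical model instead treats as uncertain and estimates jointly.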

Identification, Estimation, and Inference for Sequential Causally Ordered Mediation Pathways

2026-06-01T19:52:44Z

Mediation analysis plays an essential role in uncovering the mechanisms by which an exposure influences an outcome through intermediate pathways. While methodological advances for single-mediator settings are well established, rigorous tools for handling multiple, sequentially ordered mediators remain underdeveloped. Such settings are common in applications like longitudinal cohort studies, where exposures operate through complex chains of mediators over time. In this paper, we establish a general framework for sequentially ordered mediators that enables the identification and formal decomposition of the total effect into component path-specific effects. We also develop estimation procedures for mediation estimands with both continuous and categorical outcomes. Furthermore, we introduce a new testing strategy to conduct inference using a studentized statistic combined with data-splitting. This approach achieves valid Type I error control under the composite null across diverse data-generating mechanisms. Through extensive simulations and applications to two large-scale empirical studies, we demonstrate that the proposed methodology provides reliable estimation, valid inference, and improved power for discovering novel mediation pathways.
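
Consider the special case of linear models without exposure-mediator interactions, an assumption the paper does not require. There, the decomposition of the total effect for two sequentially ordered mediators, A -> M1 -> M2 -> Y, reduces to products of regression coefficients. The Python sketch below fits the three regressions by least squares and checks that the path-specific pieces add up to the total effect. The data-generating coefficients are hypothetical.

```python
import numpy as np

def ols(y, *covariates):
    """Least-squares coefficients (intercept dropped) of y on the given covariates."""
    X = np.column_stack([np.ones_like(y), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

rng = np.random.default_rng(0)
n = 5_000
A = rng.binomial(1, 0.5, size=n).astype(float)
M1 = 0.8 * A + rng.normal(size=n)
M2 = 0.5 * A + 0.6 * M1 + rng.normal(size=n)
Y = 0.3 * A + 0.4 * M1 + 0.7 * M2 + rng.normal(size=n)

(a1,) = ols(M1, A)                 # A -> M1
a2, d = ols(M2, A, M1)             # A -> M2 and M1 -> M2
c_direct, b1, b2 = ols(Y, A, M1, M2)

paths = {
    "A -> Y": c_direct,
    "A -> M1 -> Y": a1 * b1,
    "A -> M2 -> Y": a2 * b2,
    "A -> M1 -> M2 -> Y": a1 * d * b2,
}
(total,) = ols(Y, A)
print(paths)
print(total, sum(paths.values()))  # equal up to rounding error
```

In this linear case, the identity total = direct + sum of path products holds exactly in-sample. With nonlinear outcome models or categorical outcomes, the path-specific effects no longer factor this way, which is where a general identification framework is needed.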

Emulators for Large-scale Computer Experiments with Quantitative and Qualitative Inputs

2026-06-01T18:38:51Z

Computer experiments with both quantitative and qualitative inputs have become common across various areas. However, constructing accurate and computationally efficient emulators for such experiments at large scales remains a significant challenge. We propose a novel, scalable framework for emulating computer experiments with mixed inputs. Our approach is based on a new covariance function integrating additive Gaussian Processes (GPs) to handle the mixed inputs, with Vecchia approximation for scalability. We demonstrate that methods for large-scale computer experiments can be effectively extended when paired with our proposed modeling framework.
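
The Python sketch below illustrates the two ingredients for a single qualitative input. The first is an additive covariance: a squared-exponential kernel on the quantitative inputs plus an exchangeable kernel on the qualitative level. The second is a Vecchia log-likelihood that conditions each observation on at most m nearest earlier points. The kernel form, hyperparameters, and brute-force neighbour search are illustrative assumptions, not the paper's covariance function. With m at least n - 1, the Vecchia product reduces to the exact Gaussian likelihood, which gives a simple correctness check.

```python
import numpy as np

def mixed_kernel(Xa, ca, Xb, cb, ls=1.0, s2_q=1.0, s2_c=0.5, rho=0.3):
    """Additive covariance for mixed inputs.

    Squared-exponential part on the quantitative columns plus an exchangeable
    part on one qualitative input: s2_c if levels match, rho * s2_c otherwise.
    """
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
    k_quant = s2_q * np.exp(-0.5 * d2 / ls**2)
    k_qual = s2_c * np.where(ca[:, None] == cb[None, :], 1.0, rho)
    return k_quant + k_qual

def exact_loglik(y, X, c, nugget=1e-3, **kw):
    K = mixed_kernel(X, c, X, c, **kw) + nugget * np.eye(len(y))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (logdet + y @ np.linalg.solve(K, y) + len(y) * np.log(2 * np.pi))

def vecchia_loglik(y, X, c, m, nugget=1e-3, **kw):
    """Sum of log p(y_i | y_N(i)), with N(i) = the m nearest earlier points."""
    total = 0.0
    for i in range(len(y)):
        prev = np.arange(i)
        if i > 0:
            dist = ((X[prev] - X[i]) ** 2).sum(axis=-1)
            prev = prev[np.argsort(dist)[:m]]
        idx = np.append(prev, i)
        K = mixed_kernel(X[idx], c[idx], X[idx], c[idx], **kw) + nugget * np.eye(len(idx))
        if len(prev) > 0:
            coef = np.linalg.solve(K[:-1, :-1], K[:-1, -1])
            mean = coef @ y[prev]
            var = K[-1, -1] - K[:-1, -1] @ coef
        else:
            mean, var = 0.0, K[-1, -1]
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total

rng = np.random.default_rng(0)
n = 40
X = rng.uniform(size=(n, 2))
c = rng.integers(0, 3, size=n)            # qualitative input with 3 levels
y = np.sin(3 * X[:, 0]) + 0.5 * (c == 1) + 0.05 * rng.normal(size=n)
print(exact_loglik(y, X, c), vecchia_loglik(y, X, c, m=n - 1))  # identical up to rounding
print(vecchia_loglik(y, X, c, m=5))       # cheap approximation
```

Each Vecchia term needs only an m x m solve, so the cost grows linearly in n for fixed m, versus the cubic cost of the exact likelihood.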

Space-Filling One-Factor-At-A-Time Designs

2026-06-01T17:39:45Z

Space-filling designs are commonly used in deterministic computer experiments. However, they are ineffective for factor screening, which makes them inefficient when only a small subset of input factors influences the output. Recently developed screening designs, such as MOFAT designs, are effective at identifying important factors but lack space-filling properties, limiting their usefulness for surrogate modeling. In this article, we propose a new class of screening designs that improves space-filling properties while retaining screening capability. Through several numerical examples, we demonstrate that the proposed designs offer clear advantages over existing designs.
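
The two properties being traded off can both be computed directly. Screening is measured with elementary effects from one-factor-at-a-time moves. Space-filling is measured with the smallest pairwise distance between design points, a maximin criterion. The Python sketch below uses Morris-style OFAT trajectories as a stand-in. It is not the MOFAT construction or the proposed design, and the test function and all names are hypothetical.

```python
import numpy as np

def ofat_trajectory(base, delta, order):
    """Points visited when moving one factor at a time by +delta, in `order`."""
    points = [base.copy()]
    for j in order:
        nxt = points[-1].copy()
        nxt[j] += delta
        points.append(nxt)
    return np.array(points)

def elementary_effects(f, traj, order, delta):
    """EE_j = (f(after moving factor j) - f(before)) / delta for each factor."""
    y = np.array([f(pt) for pt in traj])
    ee = np.empty(len(order))
    for step, j in enumerate(order):
        ee[j] = (y[step + 1] - y[step]) / delta
    return ee

def maximin_distance(design):
    """Smallest pairwise Euclidean distance (larger means more space-filling)."""
    diff = design[:, None, :] - design[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(design), k=1)].min()

def f(x):
    return 5 * x[0] + 3 * x[1]        # hypothetical test function; factor 2 is inert

rng = np.random.default_rng(0)
delta, d = 0.5, 3
trajs, effects = [], []
for _ in range(4):                     # 4 random trajectories of d + 1 runs each
    base = rng.uniform(0, 1 - delta, size=d)
    order = rng.permutation(d)
    traj = ofat_trajectory(base, delta, order)
    trajs.append(traj)
    effects.append(elementary_effects(f, traj, order, delta))

print(np.mean(np.abs(effects), axis=0))    # screening: mean |EE| per factor
print(maximin_distance(np.vstack(trajs)))  # space-filling of the combined design
```

Consecutive OFAT runs differ in a single coordinate by delta, so trajectories tend to cluster. This is the tension between screening and space-filling that the abstract describes.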

Adaptive clinical trial design with delayed treatment effects using elicited prior distributions

2026-06-01T17:37:37Z

Clinical trials with time-to-event endpoints, such as overall survival (OS) or progression-free survival (PFS), are fundamental for evaluating new treatments, particularly in immuno-oncology. However, modern therapies, such as immunotherapies and targeted treatments, often exhibit delayed effects that challenge traditional trial designs. These delayed effects violate the proportional hazards assumption, which underpins standard statistical methods like the Cox proportional hazards model and the log-rank test. Careful planning is essential to ensure trials are appropriately designed to account for the timing and magnitude of these effects. Without this planning, interim analyses may lead to premature trial termination if the treatment effect is underestimated early in the study. We present an adaptive trial design framework that incorporates prior distributions, elicited from experts, for delayed treatment effects. By addressing the uncertainty surrounding delayed treatment effects, our approach enhances trial efficiency and robustness, minimizing the risk of premature termination and improving the detection of treatment benefits over time. We present an example illustrating how interim analyses, informed by prior distributions, can guide early stopping decisions. To facilitate the implementation of our framework, we have developed free, open-source software that enables researchers to integrate prior distributions into trial planning and decision-making. This software provides a flexible, accessible tool for designing trials that more accurately evaluate modern therapies through adaptive trial designs.
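
A delayed treatment effect is often modelled with piecewise hazards: the hazard ratio is 1 until a delay T and HR afterwards. The Python sketch below assumes exponential control survival with hazard lam. It draws (T, HR) from hypothetical prior distributions standing in for elicited ones and summarises the implied survival difference at an early and a late time point. It is an illustration only, not the authors' software.

```python
import numpy as np

def survival_delayed(t, lam, delay, hr):
    """S(t) when the hazard is lam before `delay` and hr * lam afterwards."""
    t = np.asarray(t, dtype=float)
    cum_hazard = lam * np.minimum(t, delay) + hr * lam * np.maximum(t - delay, 0.0)
    return np.exp(-cum_hazard)

lam = np.log(2) / 12                  # control median survival of 12 months (hypothetical)
rng = np.random.default_rng(0)
n_draws = 10_000
delay = rng.gamma(shape=6.0, scale=0.5, size=n_draws)          # hypothetical prior, mean 3 months
hr = rng.lognormal(mean=np.log(0.7), sigma=0.1, size=n_draws)  # hypothetical prior around 0.7

for t in (3.0, 24.0):
    diff = survival_delayed(t, lam, delay, hr) - np.exp(-lam * t)
    print(t, np.mean(diff), np.mean(diff > 0.05))  # mean gain, P(gain > 5 points)
```

Averaged over the prior, the early survival difference is close to zero even though the late one is substantial. Prior-averaged summaries of this kind indicate how little separation to expect at an early interim look when the treatment does work.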

Optimal sequential two-stage Bayes Factor Design for two-arm clinical Phase II Trials with binary Endpoints

2026-06-01T15:53:21Z

Two-arm phase II clinical trials often benefit from an interim analysis that allows early stopping for futility, but Bayesian calibration of such designs is usually based on computationally intensive Monte Carlo simulation. In this work, a simulation-free methodology is developed to obtain Bayesian optimal two-stage designs in two-arm phase II trials with binary endpoints using Bayes factors as the primary measure of evidence. Building on recent matrix-search methods for fixed-sample two-arm Bayes factor designs and earlier correction formulas for one-arm two-stage designs, the proposed approach derives exact expressions for the operating characteristics of a two-stage two-arm design with a single futility interim. Bayesian power and type-I error are obtained by correcting the corresponding fixed-sample quantities for trajectories that would have been removed by early stopping, yielding a fully numerical calibration procedure that avoids Monte Carlo error entirely. The resulting method searches over admissible interim and final sample sizes to identify the optimal design that satisfies target constraints on Bayesian power, type-I error, and the probability of compelling evidence in favour of the null hypothesis, while minimizing the expected sample size under the null hypothesis. The methodology is illustrated in realistic phase II settings, including a detailed re-analysis of the riociguat trial in systemic sclerosis. Overall, the approach extends simulation-free Bayes factor design methodology to the practically important setting of two-arm two-stage phase II trials and provides a transparent basis for Bayesian design calibration and sensitivity analysis.
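
The simulation-free idea rests on exact enumeration: with binomial outcomes, every possible trial result can be listed and its probability computed. The Python sketch below does this for a fixed-sample two-arm design with no interim analysis. Its Bayes factor compares a common response rate p ~ Beta(a, b) under H0 with independent Beta(a, b) rates under H1. The priors, the evidence threshold k, and the point values of the response rates are hypothetical. The paper's two-stage correction and its design-prior averaging are not reproduced.

```python
from math import comb, lgamma, log

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_bf10(x, n1, y, n2, a=1.0, b=1.0):
    """log BF10: independent Beta(a, b) rates (H1) vs one shared Beta(a, b) rate (H0).

    Binomial coefficients cancel between the two marginal likelihoods.
    """
    log_m1 = log_beta(a + x, b + n1 - x) + log_beta(a + y, b + n2 - y) - 2 * log_beta(a, b)
    log_m0 = log_beta(a + x + y, b + n1 + n2 - x - y) - log_beta(a, b)
    return log_m1 - log_m0

def prob_compelling(n1, n2, p1, p2, k=3.0):
    """Exact P(BF10 > k) when the true response rates are p1 (control) and p2 (treatment)."""
    total = 0.0
    for x in range(n1 + 1):
        px = comb(n1, x) * p1**x * (1 - p1) ** (n1 - x)
        for y in range(n2 + 1):
            if log_bf10(x, n1, y, n2) > log(k):
                total += px * comb(n2, y) * p2**y * (1 - p2) ** (n2 - y)
    return total

n = 40
print(prob_compelling(n, n, 0.3, 0.3))   # type-I-error-like rate under equal rates
print(prob_compelling(n, n, 0.2, 0.6))   # power-like rate under a large difference
```

Because the sums run over all outcomes, the results carry no Monte Carlo error. Averaging them over a design prior for the rates gives the Bayesian versions of power and type-I error.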

Bayesian Mixed Multidimensional Scaling for Auditory Processing

2026-06-01T15:36:21Z

The human brain distinguishes speech sounds by mapping acoustic signals into a latent perceptual space. This space can be estimated via multidimensional scaling (MDS), which preserves the similarity structure in lower dimensions. However, individual- and group-level heterogeneity, especially between native and non-native listeners, remains poorly understood. Prior approaches often ignore such variability or cannot capture shared structure, limiting principled comparisons. Moreover, the literature often focuses on latent distances rather than the underlying features themselves. To address these issues, we develop a Bayesian mixed MDS method that accounts for both subject- and group-level heterogeneity, recovers unique, identifiable latent features that facilitate biological interpretation, and determines the effective dimensionality of the latent space in an automated, data-adaptive manner. Simulations and an auditory neuroscience application demonstrate how these features reconstruct observed distances and vary with individual and language background, revealing novel insights.
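
Classical (Torgerson) MDS recovers coordinates from a distance matrix by double-centring the squared distances and taking the top eigenvectors. The Bayesian mixed model in the abstract goes well beyond this: it adds subject- and group-level structure and chooses the dimension automatically. The Python sketch below shows only that deterministic baseline, with hypothetical coordinates.

```python
import numpy as np

def classical_mds(D, dim):
    """Coordinates whose Euclidean distances approximate the distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                 # double-centred squared distances (Gram matrix)
    eigval, eigvec = np.linalg.eigh(B)          # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:dim]
    return eigvec[:, top] * np.sqrt(np.maximum(eigval[top], 0.0))

def pairwise(X):
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))                    # hypothetical 2-D perceptual coordinates
Z = classical_mds(pairwise(X), dim=2)
print(np.abs(pairwise(Z) - pairwise(X)).max())  # distances are reproduced
```

The recovered `Z` matches `X` only up to rotation, reflection, and translation. This is why distance-based analyses cannot by themselves pin down interpretable latent features, the identifiability issue the abstract raises.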

Robust Wasserstein barycenter

2026-06-01T14:52:59Z

In this paper, we address a fundamental limitation of the classical Wasserstein barycenter -- its sensitivity to outliers. To overcome this limitation, we propose the robust Wasserstein barycenter (RWB), based on the recent concept of robust optimal transport. Theoretical guarantees, including existence and consistency, are established for the proposed RWB. Through extensive numerical experiments on both simulated and real-world data -- including image processing and financial data analysis -- we demonstrate that the RWB exhibits superior robustness compared to the classical Wasserstein barycenter.
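
The outlier sensitivity is easy to see in one dimension. There, the 2-Wasserstein barycenter of equal-size empirical distributions with equal weights is obtained by averaging their sorted samples (quantile averaging). The Python sketch below shows the classical barycenter only and does not implement the RWB. The function name and data are hypothetical.

```python
import numpy as np

def barycenter_1d(samples):
    """Equal-weight W2 barycenter of equal-size 1-D empirical measures: average the sorted samples."""
    return np.mean([np.sort(s) for s in samples], axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=200)
clean = barycenter_1d([x, x + 4.0])              # the sample x shifted by 2
contaminated = x + 4.0
contaminated[:10] = 1000.0                       # 5% gross outliers in the second sample
dirty = barycenter_1d([x, contaminated])
print(clean.max(), dirty.max())                  # the outliers drag the barycenter's upper tail
```

Every quantile of every input enters the average with equal weight, so a handful of gross outliers moves the barycenter's tail by hundreds of units.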