https://arxiv.org/api/3FBm3ZHVsKhxQCLWDOaJ0YSMgOI
2026-06-18T07:33:16Z

http://arxiv.org/abs/2409.19712v2
Posterior Conformal Prediction
2026-05-27T06:52:51Z
Conformal prediction is a popular technique for constructing prediction intervals with distribution-free coverage guarantees. The coverage is marginal, meaning it only holds on average over the entire population but not necessarily for any specific subgroup. This article introduces posterior conformal prediction (PCP), which generates prediction intervals with both marginal and approximate conditional validity for clusters (or subgroups) naturally discovered in the data. PCP achieves these guarantees by modelling the conditional nonconformity score distribution as a mixture of cluster distributions. Compared to other methods with approximate conditional validity, this approach produces tighter intervals, particularly when the test data is drawn from clusters that are well represented in the validation data. PCP can also be applied to guarantee conditional coverage on user-specified subgroups, in which case it further ensures coverage for underrepresented individuals in each subgroup. When the response variable is categorical, PCP can adjust the coverage level based on the classifier's predictive probabilities, yielding low-cardinality prediction sets if the classifier is well calibrated. We demonstrate enhanced performance on datasets from socioeconomics, materials science, and healthcare.
2024-09-29T14:09:07Z
67 pages, 17 figures
Yao Zhang, Emmanuel J. Candès

http://arxiv.org/abs/2603.08276v2
A Unified Framework for Density Estimation under Right-Censored Point-Centred Quarter Sampling
2026-05-27T06:24:13Z
While the point-centred quarter method (PCQM) is widely used for density estimation, existing methods for handling right-censored data from truncated search radii rely primarily on a Poisson model assuming complete spatial randomness (CSR), leaving a critical gap for spatially aggregated populations.
To address this limitation, we develop a unified likelihood- and moment-based framework for right-censored point-centred quarter sampling under both Poisson and negative binomial distribution (NBD) models. In particular, the proposed NBD-based estimators explicitly account for spatial aggregation and censoring simultaneously, extending distance-based inference beyond the CSR setting. Extensive simulations and applications to fully mapped forest plots reveal that the NBD-based MLE delivers the most robust overall performance across diverse ecological scenarios. Across more than 100 species from fully mapped forest plots, the proposed NBD-based MLE reduced absolute relative bias by a median of approximately 0.10 compared with existing censored estimators, representing a relative improvement of over 30%. Ultimately, our framework provides a rigorously validated and practically useful toolkit for analysing censored point-to-tree distance data.
2026-03-09T11:47:55Z
42 pages, 28 figures, 4 tables
Wenzhe Huang, Guochun Shen, Dingliang Xing, Jiangyan Zhao

http://arxiv.org/abs/2605.27967v1
Multi-Teacher Knowledge Distillation via Teacher-Informed Mixture Priors
2026-05-27T05:03:24Z
Knowledge distillation is a powerful method for model compression, enabling the efficient deployment of complex deep learning models (teachers), including large language models. However, its underlying statistical mechanisms remain unclear, and uncertainty evaluation is often overlooked, especially in real-world scenarios requiring diverse teacher expertise. To address these challenges, we introduce \textit{Multi-Teacher Bayesian Knowledge Distillation} (MT-BKD), where a distilled student model learns from multiple teachers within the Bayesian framework. Our approach leverages Bayesian inference to capture inherent uncertainty in the distillation process.
We introduce a teacher-informed prior, integrating external knowledge from teacher models and task-specific training data, offering better generalization, robustness, and scalability. Additionally, an entropy-based weighting mechanism adaptively adjusts each teacher's influence, allowing the student to combine multiple sources of expertise effectively. MT-BKD enhances the interpretability of the student model's learning process, improves predictive accuracy, and provides uncertainty quantification. We validate MT-BKD on both synthetic and real-world tasks, including protein subcellular location prediction and image classification. Our experiments show improved performance and robust uncertainty quantification, highlighting the strengths of our MT-BKD framework.
2026-05-27T05:03:24Z
Luyang Fang, Yongkai Chen, Jiazhang Cai, Ping Ma, Wenxuan Zhong

http://arxiv.org/abs/2605.27925v1
Finite-size occupancy scaling of apparent fractal dimensions in stochastic trajectories
2026-05-27T03:56:52Z
Estimating a fractal dimension from a finite stochastic trajectory is a finite-size scaling problem: the apparent box-counting exponent is shaped by an occupancy crossover between the resolved range of scales and the finite number of sampled points, and need not equal the dimension of the limiting process. We model this crossover with a balls-in-boxes occupancy law, which predicts the box-count curve, the finite-size saturation scale, and a scaling function for the normalized local slope. Across random-walk traces, fractional Brownian graphs, and Levy flights, the normalized local slope collapses onto a single crossover curve, while the windowed box-counting bias collapses when the regression window is positioned relative to the saturation scale. Inverting the occupancy model gives a finite-size bias correction that reduces error on controlled stochastic trajectories and transfers across held-out model classes.
Comparisons with correlation dimension, detrended fluctuation analysis, the variogram, and Higuchi's method show that the dominant bias is specific to point-sampled box-counting over finite scale windows, and that local-slope stability alone is not a reliable diagnostic. A DNA-walk example illustrates the workflow on measured data, and all figures, tables, and in-text numbers are regenerated from released single-seed code.
2026-05-27T03:56:52Z
Main text: 30 pages, 5 figures; supplementary material included
Bon A. Koo (University of Pennsylvania), Edward Ju (California Institute of Technology)

http://arxiv.org/abs/2603.19745v3
Invariant quantile regression for heterogeneous environments
2026-05-27T02:25:45Z
In this paper, we propose an invariant quantile regression (IQR) framework specifically designed for multi-environment datasets, which captures the invariance across different environments. This framework is closely related to transfer learning, causal inference, and fair machine learning, and is motivated by scenarios in which the conditional probability of the response given covariates varies, while certain key variables remain invariant. This perspective differs notably from previous works that restrict attention to the conditional mean, which is often insufficient to capture the full causal relationships between covariates and the response in heterogeneous environments. In contrast, quantile-based invariance naturally accommodates heterogeneity, and aligns more closely with structural causal models, in which variables invariant across environments at one or multiple quantile levels directly indicate potential and stable causal variables. Moreover, we show that IQR may yield a larger set of endogenous variables compared to the conditional mean framework, which in turn promotes more effective exclusion of spurious (non-causal) variables.
To achieve this, we introduce a Kernel-Smoothed Invariant Quantile Regression (KS-IQR) estimator, which leverages the underlying invariance structure and heterogeneity among environments, ensuring stable estimation across multiple environments. We establish the causal discovery properties of our method, demonstrate its ability to overcome the ``curse of endogeneity'', and derive an $\ell_2$ error bound for our estimator, all in a non-asymptotic framework. We apply our method to real data for causal discovery and obtain biologically meaningful relationships, recovering known signaling pathways and revealing additional quantile-specific effects.
2026-03-20T08:29:51Z
25 pages, 4 figures
Bo Fu, Dandan Jiang

http://arxiv.org/abs/2605.27844v1
A Parameterization-Invariant DIC
2026-05-27T02:01:26Z
The classic Deviance Information Criterion (DIC) is not invariant to reparameterization and can have a negative and unstable effective number of parameters. The effective number of parameters becomes negative because the plug-in deviance becomes excessively large when the posterior means of the model parameters differ dramatically from the maximum likelihood estimates. In latent variable models, the cause can be identifiability issues that lead to meaningless and unstable plug-in estimates. Specifically, nonidentifiability means that distinct parameter points can have the same likelihood, and switching between such points within or between MCMC chains produces unstable and meaningless posterior means. To address this issue, we propose a plug-in-free, parameterization-invariant version of the DIC, denoted DIC$_i$, and show that it is asymptotically equivalent to the Watanabe-Akaike Information Criterion (WAIC). Simulations demonstrate that DIC$_i$ aligns with WAIC in factor analysis and growth mixture models where the classic DIC breaks down.
These results suggest that DIC$_i$ is a useful, computationally efficient alternative to the DIC when WAIC is not applicable or not available.
2026-05-27T02:01:26Z
Xingyao Xiao (Stanford University), Sophia Rabe-Hesketh (University of California, Berkeley)

http://arxiv.org/abs/2605.27794v1
Learning to target with network interference
2026-05-27T00:28:52Z
This paper studies adaptive targeting under network interference in a bandit setting, where treatments applied to one individual may affect others through spillover effects. We consider a linear model in a sparse regime, where each individual's outcome can be affected by at most a few others. We first establish a regret lower bound showing that ignoring the network structure and reducing the problem to a standard linear bandit inevitably leads to inefficient learning, particularly in large populations. To understand how structural information can be leveraged, we analyze regimes with varying levels of knowledge of the interference structure: (1) full support knowledge, (2) knowledge of the column support sizes, and (3) no prior knowledge. For each regime, we establish regret lower bounds characterizing the fundamental limits of learning, and develop algorithms that achieve near-optimal regret. Together, our results provide a unified view of how knowledge of the interference structure governs the efficiency of online learning under interference, and offer practical adaptive targeting algorithms in each setting. Numerical experiments on synthetic and real-world data demonstrate the practical benefits of our algorithms.
2026-05-27T00:28:52Z
Xiaomeng Wang, Hamsa Bastani, Osbert Bastani, Zhimei Ren

http://arxiv.org/abs/2605.27718v1
Robust Moment-Based Estimation via Spectral Gradient Reweighting
2026-05-26T21:44:02Z
Moment-based estimation is a theoretically attractive approach to parametric inference, especially when likelihood-based estimation is unavailable, misspecified, or computationally inconvenient.
However, the moment equations involve sample averages, which makes moment-based estimation sensitive to outliers. We propose the SGR-GMM algorithm, a robust generalized method of moments (GMM) procedure that uses a spectral gradient reweighting (SGR) primitive to soft-reweight the per-observation gradients during the moment-matching optimization. Our analysis has three layers. First, for a fixed center, the SGR primitive is formulated as an entropy-regularized spectral game between a sample-weight player and a density-matrix player, which is analyzed using classical multiplicative-weights and matrix-multiplicative-weights regret bounds. Second, we establish an explicit convergence radius and a finite termination bound for the fixed-center updates in the SGR primitive. Third, we prove a local finite-sample parameter estimation error bound with explicit dependence on the contamination fraction, inlier gradient stability, local GMM identification strength, and optimization accuracy. We further specialize the SGR-GMM algorithm to obtain a robust diagonally-weighted GMM (DGMM) estimator for estimating heteroscedastic low-rank Gaussian mixtures observed under additive Gaussian noise and strong contamination. In the numerical experiments, the SGR primitive produces nearly-oracle gradient estimation and the robust DGMM specialization substantially improves over non-robust moment baselines. The code and data are available at https://github.com/liu-lzhang/sgr-gmm.
2026-05-26T21:44:02Z
Liu Zhang, Amit Singer

http://arxiv.org/abs/2605.27711v1
Improving Power in Randomized Controlled Trials with Time-to-Event Endpoints: A Risk-Free Approach
2026-05-26T21:36:00Z
Leveraging external or historical data to improve the efficiency of randomized clinical trials without introducing bias or inflating the Type I error rate remains challenging. Recent work on externally trained prognostic scores, such as PROCOVA for continuous endpoints, has demonstrated a risk-free approach via covariate adjustment.
However, extending this paradigm to time-to-event endpoints is nontrivial due to the non-collapsibility of the marginal hazard ratio (HR). In this paper, we address this challenge by proposing a unified framework for incorporating complex, high-dimensional prognostic information learned from external data into the primary analysis of RCTs with time-to-event endpoints, while targeting the marginal hazard ratio. The proposed procedure proceeds in two steps. First, a prognostic score is estimated from external or historical data by regressing martingale residuals on baseline covariates using flexible supervised learning methods. Second, the fitted score is included as an additional covariate in the nonparametric covariate-adjusted log-rank test and the associated marginal HR estimator of Ye et al. [2024]. The proposed method controls Type I error and provides asymptotically unbiased estimation of the marginal HR, irrespective of prognostic model misspecification or population heterogeneity between external/historical and trial data. We show that the variance reduction, and corresponding event count savings, are approximately equal to the squared correlation between the prognostic score and the martingale pseudo-outcome in the trial. Extensions to stratified randomization are straightforward. Simulation studies demonstrate satisfactory finite-sample performance and meaningful efficiency gains when historical prognostic information is informative.
2026-05-26T21:36:00Z
Junyi Zhou, Qing Liu, May Mo, Amy Xia

http://arxiv.org/abs/2206.15475v3
Causal Machine Learning: A Survey and Open Problems
2026-05-26T21:14:13Z
Causal Machine Learning (CausalML) is an umbrella term for machine learning methods that formalize the data-generation process as a structural causal model (SCM). This perspective enables us to reason about the effects of changes to this process (interventions) and what would have happened in hindsight (counterfactuals).
We categorize work in CausalML into five groups according to the problems they address: (1) causal supervised learning, (2) causal generative modeling, (3) causal explanations, (4) causal fairness, and (5) causal reinforcement learning. We systematically compare the methods in each category and point out open problems. Further, we review data-modality-specific applications in computer vision, natural language processing, and graph representation learning. Finally, we provide an overview of causal benchmarks and a critical discussion of the state of this nascent field, including recommendations for future work.
2022-06-30T17:59:15Z
v03. Work in progress. Feedback and comments are highly appreciated!
Jean Kaddour, Aengus Lynch, Qi Liu, Matt J. Kusner, Ricardo Silva

http://arxiv.org/abs/2605.27664v1
BOOST: Power-Optimal Strong-FWER Testing for Block-Structured Multiplicity
2026-05-26T20:30:36Z
Structured multiple-testing problems (gatekeeping trials, dose-finding, multi-tissue eQTL mapping, bundled-challenger A/B experiments) organize hypotheses into design-imposed blocks and demand strong family-wise error rate (FWER) control for confirmatory claims. Practitioners currently use objective-agnostic stepwise rules (Bonferroni, Holm, Hochberg, Hommel), closed-testing and graphical extensions, or hierarchical and resampling methods; none is power-optimal within the block-separable class these designs induce.
We introduce BOOST (Block-Optimal Objective-driven Strong-FWER Testing), the power-optimal strong-FWER procedure for block size three, with three guarantees: (i) finite-sample strong-FWER validity at $O(K)$ cost (versus $O(K^2)$ for general closed testing) without independence assumptions, with a strict Sidak improvement under cross-block independence; (ii) power-optimal allocation across heterogeneous blocks via an equalized-marginal KKT condition, solvable by bisection in $O(B\log(1/\varepsilon))$; and (iii) a sample-split plug-in variant for unknown alternative density $g$, attaining $\alpha$-control up to $O(B_T \mathbb E\|g-\widehat g\|_\infty)$ inflation with per-hypothesis power deficit independent of $B_T$. Simulations across independent, equicorrelated, sparse, and mis-specified regimes show 1.4-1.7$\times$ power gains over the strongest existing baseline at calibrated FWER. On two published datasets (BLUEPRINT cross-lineage cis-eQTL and Upworthy bundled-challenger A/B experiments), BOOST certifies an order of magnitude more full-block discoveries than existing baselines at controlled FWER.
2026-05-26T20:30:36Z
Prasanjit Dubey, Xiaoming Huo

http://arxiv.org/abs/2606.07578v1
MST-Direct at Scale: Multivariate and Conditional Geostatistical Simulation via Sinkhorn Optimal Transport
2026-05-26T20:20:06Z
This paper extends MST-Direct, a Matching-via-Sinkhorn-Transport approach for multivariate geostatistical simulation, from the original bivariate, unconditional, small-grid formulation to multivariate, conditional, and large-grid settings.
We address the three main limitations identified in the original work: (i) scalability beyond a few thousand nodes through a sparse, candidate-restricted Sinkhorn matcher with O(nC) memory complexity; (ii) extension to multiple variables by matching target value tuples onto an independent FFT-MA Gaussian backbone that reproduces a prescribed variogram; and (iii) hard-data conditioning by fixing observed data tuples at their spatial locations while conditioning the backbone through kriging. Because the transport plan remains a permutation of the target tuples, the multivariate joint distribution is preserved exactly.
The method is validated using the same six-variate, heteroscedastic, strongly nonlinear reference distribution employed in Direct Multivariate Simulation (DMS), under both unconditional (200x200) and conditional (100x100, 200 hard-data samples) scenarios, and is benchmarked against the Projection Pursuit Multivariate Transform (PPMT). Results show that MST-Direct reproduces the joint distribution with zero histogram error, exactly honours hard data, and accurately reproduces the prescribed spatial correlation structure, whereas PPMT remains an approximation.
Index Terms: Optimal transport, Sinkhorn algorithm, geostatistical simulation, multivariate simulation.
2026-05-26T20:20:06Z
Tcharlies Bachmann Schmitz

http://arxiv.org/abs/2509.22446v2
Rescuing double robustness: safe estimation under complete misspecification
2026-05-26T20:15:50Z
Double robustness is a major selling point of semiparametric and missing data methodology. Its virtues lie in protection against partial nuisance misspecification and asymptotic semiparametric efficiency under correct nuisance specification. However, in many applications, complete nuisance misspecification should be regarded as the norm (or at the very least the expected default), and thus doubly robust estimators may behave fragilely. In fact, it has been amply verified empirically that these estimators can perform poorly when all nuisance functions are misspecified. Here, we first characterize this phenomenon of double fragility, and then propose a solution based on adaptive correction clipping (DR+ACC). We argue that our DR+ACC proposal is safe, in that it inherits the favorable properties of doubly robust estimators under correct nuisance specification, but its error is guaranteed to be bounded by a convex combination of the individual nuisance model errors, which prevents the instability caused by the compounding product of errors of doubly robust estimators. We also show that our proposal comes with no reduction in semiparametric efficiency compared to doubly robust estimators, and thus valid inference based on asymptotic normality can be conducted when nuisances are well-specified.
We showcase the efficacy of our DR+ACC estimator both through extensive simulations and by applying it to the analysis of Alzheimer's disease proteomics data.
2025-09-26T15:03:18Z
23 pages, 4 figures
Lorenzo Testa, Francesca Chiaromonte, Kathryn Roeder

http://arxiv.org/abs/2605.27655v1
Implementing the principal stratum strategy for intercurrent events with survival outcomes: a tutorial
2026-05-26T20:14:51Z
The International Council for Harmonization (ICH) E9 (R1) addendum provides the estimand framework to formulate treatment effects in a clinical trial. One of the attributes of an estimand that the framework describes is intercurrent events. Among the five strategies for intercurrent events that the guidance lists, the principal stratum strategy is the most conceptually and technically challenging because it defines treatment effects on unobserved strata. Its application to survival outcomes is particularly inaccessible to practitioners. This tutorial reviews the methodology and implementation of the estimand framework with the principal stratum strategy to address intercurrent events with survival outcomes. We illustrate using a clinical trial in oncology and focus on a simple case with binary treatment and a single binary intercurrent event of discontinuation of the assigned treatment. We define the causal effects and review two main methods for estimating the effects: the mixture model method and the weighting method. For each method, we elaborate the associated assumptions, models, sensitivity analysis, and software, and provide example R code.
We conduct simulation studies that mimic the real study to assess the operating characteristics of these methods.
2026-05-26T20:14:51Z
Xiaoxiao Zhou, Joyce Chen, Pallavi Mishra-Kalyani, Xiaoxue Li, Yuan Li Shen, Shu Wang, Susan Halabi, Fan Li

http://arxiv.org/abs/2605.27650v1
Bayesian Imputation for Unplayed Games in Round-Robin Chess Tournaments: Application to Grand Chess Tour, Bucharest 2026
2026-05-26T20:10:05Z
When a player withdraws mid-tournament from a round-robin chess event, organizers face a fundamental problem: how should scores be assigned for games that were never played? Current FIDE guidelines specify annulment if withdrawal occurs before 50% of games are completed, and forfeit (awarding unplayed opponents a full point) thereafter. This dichotomous rule creates arbitrary discontinuities and can substantially distort final standings. We develop a Bayesian framework based on best linear unbiased prediction (BLUP) that optimally combines pre-tournament ratings with observed performance, producing imputed scores that reflect both the withdrawn player's current form and the strength differentials among unplayed opponents. The estimator is consistent, point-conserving, and minimizes mean squared error among linear unbiased predictors. A Monte Carlo simulation study on 180,000 simulated tournaments demonstrates that Bayesian BLUP imputation reduces prediction error by 26% overall compared to FIDE's current rule, with improvements of 41% over forfeit and 12% over annulment. The largest gains occur when the withdrawn player is underperforming, the most common withdrawal scenario. We further show that annulment achieves 15-45% lower RMSE than forfeit across all scenarios. The methodology is applied to GM Alireza Firouzja's withdrawal at Grand Chess Tour, Bucharest 2026, where Bayesian imputation would have awarded unplayed opponents 0.55-0.70 points rather than the 1.0 awarded under forfeit rules. An open-source R Shiny application is provided for tournament organizers.
We recommend that FIDE adopt Bayesian imputation for World Championship cycle events, or at minimum replace the current dichotomous rule with uniform annulment.
2026-05-26T20:10:05Z
Ravi Varadhan