https://arxiv.org/api/B8sNcrOZtE4RK0/hSYTeBDcLKuM 2026-03-20T14:32:32Z 34634 30 15 http://arxiv.org/abs/2507.03681v3 Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data 2026-03-18T16:15:55Z Randomized trials are typically designed to detect average treatment effects but often lack the statistical power to uncover individual-level treatment effect heterogeneity, limiting their value for personalized decision-making. To address this, we propose the QR-learner, a model-agnostic learner that estimates conditional average treatment effects (CATE) within the trial population by leveraging external data from other trials or observational studies. The proposed method is robust: it can reduce the mean squared error relative to a trial-only CATE learner, and is guaranteed to recover the true CATE even when the external data are not aligned with the trial. Moreover, we introduce a procedure that combines the QR-learner with a trial-only CATE learner and show that it asymptotically matches or exceeds both component learners in terms of mean squared error. We examine the performance of our approach in simulation studies and apply the methods to a real-world dataset, demonstrating improvements in both CATE estimation and statistical power for detecting heterogeneous effects. 2025-07-04T16:01:05Z Accepted to AISTATS 2026. 24 pages, including references and appendix Rickard Karlsson Piersilvio De Bartolomeis Issa J. Dahabreh Jesse H. Krijthe http://arxiv.org/abs/2603.08511v2 Kantorovich Regression Analysis of Random Distributions with Mixed Predictors 2026-03-18T15:59:21Z We study regression problems with distribution-valued responses and mixed distributional and Euclidean predictors. In quadratic cost, the negative gradient of the Kantorovich potential represents, at each source location, the displacement to its matched location under the optimal transport map. 
By constructing potentials from the Wasserstein barycenter to individual distributions, the proposed Kantorovich regression model approximates the response displacement field as a sum of predictor displacement fields, each adjusted by a functional parameter. Owing to the linear structure, Euclidean predictors can enter as scaling coefficients of $c$-concave parameter potentials. We characterize functional parameter classes ensuring the intrinsic structure of the model, establish asymptotic theory through uniform convergence of the empirical Wasserstein loss, and derive Gâteaux derivatives leading to first-order optimization algorithms. Real data applications include a mixed-predictor analysis of housing price distributions and an analysis of two-dimensional temperature distributions, demonstrating the flexibility and interpretability of the proposed framework. 2026-03-09T15:43:20Z Kaheon Kim Changbo Zhu http://arxiv.org/abs/2603.17866v1 Bayesian multilevel step-and-turn models for evaluating player movement in American football 2026-03-18T15:54:28Z In sports analytics, player tracking data have driven significant advancements in the task of player evaluation. We present a novel generative framework for evaluating the observed frame-by-frame player positioning against a distribution of hypothetical alternatives. We illustrate our approach by modeling the within-play movement of an individual ball carrier in the National Football League (NFL). Specifically, we develop Bayesian multilevel models for frame-level player movement based on two components: step length (distance between successive locations) and turn angle (change in direction between successive steps). Using the step-and-turn models, we perform posterior predictive simulation to generate hypothetical ball carrier steps at each frame during a play. This enables comparison of the observed player movement with a distribution of simulated alternatives using common valuation measures in American football. 
We apply our framework to tracking data from the first nine weeks of the 2022 NFL season and derive novel player performance metrics based on hypothetical evaluation. 2026-03-18T15:54:28Z Quang Nguyen Ronald Yurko http://arxiv.org/abs/2603.17864v1 Bivariate deconvolution for cancer detection after surgery 2026-03-18T15:52:28Z Detection of minimal residual disease (MRD) in cancer patients after surgery can provide an early marker for disease recurrence and guide subsequent treatment decisions. Accurate and sensitive estimation of tumour burden after cancer surgery may be obtained through liquid biopsies, measuring circulating tumour DNA (ctDNA) using, for example, mutation-based Variant Allele Frequency (VAF) values. However, to be applicable to all patients this either requires tumour-informed, patient-specific mutation panels or sensitive, tumour-agnostic genome-wide measurements. We propose a solution that accounts for patient-specific characteristics in genome-wide screens. For that, we introduce a bivariate deconvolution model to estimate tumour proportion from circulating cell-free DNA (cfDNA) methylation profiles of patients before and after surgery. The observations are modelled as a convolution of two bivariate latent variables, corresponding to tumour and background signals, mixed by the tumour proportion at each measurement. This bivariate approach links pre- and post-surgery measurements, improving estimation of the tumour proportion after surgery, when the tumour signal is potentially very weak or absent. We approximate the likelihood of the convolution through a discretisation of the bivariate density for each latent variable into a two-dimensional grid for each pair of observations, which allows for fast maximum likelihood estimation. We evaluate the predictive performance of the estimated post-surgery tumour proportions based on cfDNA methylation against available mutation-based VAF values in one-year recurrence-free survival. 
2026-03-18T15:52:28Z 11 pages, 3 figures and appendix Nuria Senar Stavros Makrodimitris Michel H. Hof Cornelis Verhoef Saskia M. Wilting Mark A. van de Wiel http://arxiv.org/abs/2505.17300v2 Statistical Inference for Online Algorithms 2026-03-18T14:52:57Z The construction of confidence intervals and hypothesis tests for functionals is a cornerstone of statistical inference. Traditionally, the most efficient procedures - such as the Wald interval or the Likelihood Ratio Test - require both a point estimator and a consistent estimate of its asymptotic variance. However, when estimators are derived from online or sequential algorithms, computational constraints often preclude multiple passes over the data, complicating variance estimation. In this article, we propose a computationally efficient, rate-optimal wrapper method (HulC) that wraps around any online algorithm to produce asymptotically valid confidence regions bypassing the need for explicit asymptotic variance estimation. The method is provably valid for any online algorithm that yields an asymptotically normal estimator. We evaluate the practical performance of the proposed method primarily using Stochastic Gradient Descent (SGD) with Polyak-Ruppert averaging. Furthermore, we provide extensive numerical simulations comparing the performance of our approach (HulC) when used with other online algorithms, including implicit-SGD and ROOT-SGD. 2025-05-22T21:31:49Z 1) Adding to ASGD simulations, we add 5 other SGD algorithms: averaged-implicit-SGD, last-iterate-implicit-SGD, ROOT-SGD, truncated-SGD, and noisy-truncated-SGD. 2) We modify links to the online viz/GitHub pages. 
3) We qualify previous conclusions on ASGD: e.g., we claim that logistic regression is sometimes more challenging "in terms of achieving the target coverage" than linear regression Selina Carter Arun K Kuchibhotla http://arxiv.org/abs/2603.18114v1 Transfer Learning for Contextual Joint Assortment-Pricing under Cross-Market Heterogeneity 2026-03-18T14:48:04Z We study transfer learning for contextual joint assortment-pricing under a multinomial logit choice model with bandit feedback. A seller operates across multiple related markets and observes only posted prices and realized purchases. While data from source markets can accelerate learning in a target market, cross-market differences in customer preferences may introduce systematic bias if pooled indiscriminately. We model heterogeneity through a structured utility shift, where markets share a common contextual utility structure but differ along a sparse set of latent preference coordinates. Building on this, we develop Transfer Joint Assortment-Pricing (TJAP), a bias-aware framework that combines aggregate-then-debias estimation with a UCB-style policy. TJAP constructs two-radius confidence bounds that separately capture statistical uncertainty and transfer-induced bias, uniformly over continuous prices. We establish matching minimax regret bounds of order $\tilde{O}\!\left(d\sqrt{\frac{T}{1+H}} + s_0\sqrt{T}\right)$, revealing a transparent variance-bias tradeoff: transfer accelerates learning along shared preference directions, while heterogeneous components impose an irreducible adaptation cost. Numerical experiments corroborate the theory, showing that TJAP outperforms both target-only learning and naive pooling while remaining robust to cross-market differences. 
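As a rough illustration of the multinomial logit (MNL) choice model named in the abstract above, the following minimal Python sketch computes purchase probabilities for an offered assortment and the resulting expected revenue. It uses the textbook MNL form with a no-purchase option of utility 0; the linear price term and the names `base_utilities`, `price_sensitivity`, `mnl_choice_probabilities`, and `expected_revenue` are assumptions made for illustration, not the paper's parameterization or the TJAP algorithm.

```python
import math


def mnl_choice_probabilities(utilities):
    """Return (purchase_probs, no_purchase_prob) under a textbook MNL model.

    Assumption: the outside (no-purchase) option has utility 0.
    """
    # Shift by the largest utility (including the outside option's 0)
    # so that the exponentials cannot overflow.
    shift = max([0.0] + list(utilities))
    weights = [math.exp(u - shift) for u in utilities]
    outside = math.exp(-shift)
    denom = outside + sum(weights)
    return [w / denom for w in weights], outside / denom


def expected_revenue(base_utilities, prices, price_sensitivity):
    """Expected revenue of an assortment under the hypothetical utility
    u_i = base_utilities[i] - price_sensitivity * prices[i]."""
    utilities = [v - price_sensitivity * p
                 for v, p in zip(base_utilities, prices)]
    probs, _ = mnl_choice_probabilities(utilities)
    return sum(p * q for p, q in zip(prices, probs))
```

With two products of equal utility 0, each is bought with probability 1/3 and the customer leaves without buying with probability 1/3; raising a price in this sketch lowers that product's purchase probability, which is the tension a joint assortment-pricing policy has to balance.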
2026-03-18T14:48:04Z Elynn Chen Xi Chen Yi Zhang http://arxiv.org/abs/2302.02415v3 On Separability of Covariance in Multiway Data Analysis 2026-03-18T14:18:26Z Multiway data analysis aims to uncover patterns in data structured as multi-indexed arrays, with multiway covariance playing a crucial role in many applications. However, the high dimensionality of multiway covariance presents significant computational challenges. To overcome these challenges, factorized covariance models have been proposed that rely on a separability assumption: the multiway covariance can be accurately expressed as a sum of Kronecker products of mode-wise covariances. This paper addresses the representability, certification, and approximation of such separable models, leaving statistical estimation or finite-sample properties aside. We reduce the question of whether a given covariance can be decomposed into a separable multiway form to an equivalent question about the separability of quantum states. Leveraging results from quantum information theory, we show that generic multiway covariances are typically \emph{not} separable and that determining the best separable approximation is NP-hard. These findings suggest that factorized covariance models can be overly restrictive and difficult to fit without additional structural assumptions. Nevertheless, our numerical experiments indicate that standard iterative algorithms, namely Frank-Wolfe and gradient descent, often converge close to the best separable approximation. As NP-hardness concerns worst-case computational complexity, Kronecker-separable approximations to multiway covariance could still be tractable to apply for analyzing many real-world datasets. 2023-02-05T15:54:13Z 45 pages, 8 figures, 3 tables Dogyoon Song Alfred O. Hero http://arxiv.org/abs/2502.05021v7 Gradient-based filtering under misspecification: Stability and error bounds 2026-03-18T14:16:26Z Can stochastic gradient methods track a moving target? 
We study the problem of tracking multidimensional time-varying parameters under noisy observations and possible model misspecification. Gradient-based filters update the time-varying parameters using the gradient of a postulated objective function. A natural filtering objective is the logarithm of the postulated observation density, which gives rise to the widely used class of score-driven filters. As in the optimization literature, these filters come in two forms: explicit filters evaluate the gradient at the predicted parameter, whereas implicit filters evaluate it at the updated parameter. For both filter types, we derive novel sufficient conditions for exponential stability of the filtered parameter path, showing that stability can be guaranteed independently of the data-generating process. Under mild additional moment conditions on the data-generating process, we also obtain finite-sample and asymptotic mean squared error bounds relative to the pseudo-true parameter path. For implicit filters, these guarantees hold under weak parameter restrictions. For explicit filters, they additionally require Lipschitz continuity of the score and a sufficiently small learning rate. Simulation studies support our theoretical findings and show that implicit gradient filters outperform explicit ones in both accuracy and stability. 2025-02-07T15:48:16Z 62 pages Simon Donker van Heel Rutger-Jan Lange Bram van Os Dick van Dijk http://arxiv.org/abs/2603.17663v1 More with Less - Bethel Allocation and Precision-Preserving Sample Size Reduction via Hierarchical Bayes Modelling 2026-03-18T12:28:55Z Statistical offices face a familiar and intensifying dilemma: rising demand for detailed regional and domain-level estimates under budgets that are fixed or shrinking. 
National statistical offices (NSOs) either ignore the problem of optimal sample allocation for multiple target variables when designing a multi-purpose survey, or address it incorrectly - relying on ad hoc approaches such as computing Neyman allocations separately per variable and taking the element-wise maximum, a practice that simultaneously wastes budget and fails to guarantee precision across all domains. This paper presents a practical two-stage strategy that reframes the question: not how to allocate a given sample, but how small the sample can be made while still meeting pre-defined precision targets for all target variables across all geographic domains at once. The innovation lies not in inventing new methods, but in the novel combination of two well-established techniques applied to this cost-reduction problem: (i) multivariate constrained optimisation via Bethel allocation, which finds the globally minimum sample satisfying all precision constraints simultaneously; and (ii) Hierarchical Bayes (HB) small area modelling, which borrows strength across strata and permits a further reduction of the Bethel sample. The approach is validated using a Monte Carlo study (B = 1,000 replications) based on a synthetic labour-force population of one million individuals, where known population truth allows rigorous evaluation of precision, accuracy, and credible-interval coverage. Keywords: Bethel allocation; Hierarchical Bayes; small area estimation; sample size reduction; multivariate optimisation; labour force survey; coefficient of variation. 2026-03-18T12:28:55Z 29 pages, 9 tables and 1 appendix Siu-Ming Tam http://arxiv.org/abs/2603.17628v1 rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks 2026-03-18T11:47:46Z Neural networks are central to modern artificial intelligence, yet their training remains highly sensitive to data contamination. 
Standard neural classifiers are trained by minimizing the categorical cross-entropy loss, corresponding to maximum likelihood estimation under a multinomial model. While statistically efficient under ideal conditions, this approach is highly vulnerable to contaminated observations, including label noise corrupting supervision in the output space and adversarial perturbations inducing worst-case deviations in the input space. In this paper, we propose a unified and statistically grounded framework for robust neural classification that addresses both forms of contamination within a single learning objective. We formulate neural network training as a minimum-divergence estimation problem and introduce rSDNet, a robust learning algorithm based on the general class of $S$-divergences. The resulting training objective inherits robustness properties from classical statistical estimation, automatically down-weighting aberrant observations through model probabilities. We establish essential population-level properties of rSDNet, including Fisher consistency, classification calibration implying Bayes optimality, and robustness guarantees under uniform label noise and infinitesimal feature contamination. Experiments on three benchmark image classification datasets show that rSDNet improves robustness to label corruption and adversarial attacks while maintaining competitive accuracy on clean data. Our results highlight minimum-divergence learning as a principled and effective framework for robust neural classification under heterogeneous data contamination. 2026-03-18T11:47:46Z Pre-print; under review Suryasis Jana Abhik Ghosh http://arxiv.org/abs/2603.17599v1 Prediction with Missing Data: Target Probabilities and Missingness Mechanisms 2026-03-18T11:13:41Z Conditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. 
In prediction, similar principles are often assumed to apply. However, methods considered biased in inference, such as pattern sub-modelling or unconditional imputation, have been shown to achieve optimal predictive performance under any missingness mechanism, including non-MAR (MNAR). To explain this apparent contradiction, we introduce a new formal framework for describing missingness in prediction. Central to this framework is a distinction between two prediction targets, defined according to whether or not the indicator of observation of the predictors is exploited to predict the outcome. This distinction leads to a classification of the missingness mechanisms describing the conditions under which these targets are equal, and when consistent prediction of each is achievable. A key result is that both targets may be consistently predicted under conditions weaker than MAR. We discuss the implications of this paradigm for handling missing data in prediction, distinguishing between missingness at development, validation and deployment of a forecaster. The findings are illustrated using simulated data and a real-world application with the prediction of significant injury after trauma upon arrival at the emergency department. 2026-03-18T11:13:41Z 55 pages (including 40 pages for the main article and 15 pages for the supplementary material) Pierre Catoire Robin Genuer Cecile Proust-Lima http://arxiv.org/abs/2603.17502v1 A lightweight framework for characterising extreme precipitation events in climate ensembles 2026-03-18T09:06:13Z This article summarises the methods used by the team "Ca' Foscari" for the EVA 2025 Data Challenge. The questions of the challenge concern the estimation of exceedance probabilities across several locations. Rather than modelling the spatial dependence structure, we reduce the problems to univariate ones by considering relevant spatial order statistics across the sites. 
Within a Peaks over Threshold framework, we model the marginal distributions of exceedances using generalised Pareto distributions. Generalised additive models are employed to allow the parameters to vary as functions of external predictors, which for all questions are reduced to the month. For questions 1 and 2, the required estimates and confidence intervals are obtained by generating samples from our fitted models. Question 3 involves the dependence between two consecutive observed statistics. To account for this temporal dependence, we fit a conditional extreme value model and derive empirical estimates of persistent extreme events by simulating from this model. 2026-03-18T09:06:13Z Dáire Healy Isadora Antoniano-Villalobos Claudia Collarin Nathan Huet Ilaria Prosdocimi Emilia Siviero http://arxiv.org/abs/2507.00641v3 Hebbian Physics Networks: A Self-Organizing Computational Architecture Based on Local Physical Laws 2026-03-18T08:55:11Z Physical transport processes organize through local interactions that redistribute imbalance while preserving conservation. Classical solvers enforce this organization by applying fixed discrete operators on rigid grids. We introduce the Hebbian Physics Network (HPN), a computational framework that replaces this rigid scaffolding with a plastic transport geometry. An HPN is a coupled dynamical system of physical states on nodes and constitutive weights on edges in a graph. Residuals--local violations of continuity, momentum balance, or energy conservation--act as thermodynamic forces that drive the joint evolution of both the state and the operator (i.e. the adaptive weights). The weights adapt through a three-factor Hebbian rule, which we prove constitutes a strictly local gradient descent on the residual energy. 
This mechanism ensures thermodynamic stability: near equilibrium, the learned operator naturally converges to a symmetric, positive-definite form, rigorously reproducing Onsager's reciprocal relations without explicit enforcement. Far from equilibrium, the system undergoes a self-organizing search for a transport topology that restores global coercivity. Unlike optimization-based approaches that impose physics through global loss functions, HPNs embed conservation intrinsically: transport is restored locally by the evolving operator itself, without a global Poisson solve or backpropagated objective. We demonstrate the framework on scalar diffusion and incompressible lid-driven cavity flow, showing that physically consistent transport geometries and flow structures emerge from random initial conditions solely through residual-driven local adaptation. HPNs thus reframe computation not as the solution of a fixed equation, but as a thermodynamic relaxation process where the constitutive geometry and physical state co-evolve. 2025-07-01T10:34:14Z 16 pages, 3 figures Gunjan Auti Hirofumi Daiguji Gouhei Tanaka 10.1103/tzgk-jqj4 http://arxiv.org/abs/2603.17469v1 Fast and scalable inference in hidden Markov models with Gaussian fields 2026-03-18T08:22:46Z Hidden Markov models (HMMs) are powerful tools for analysing time series data that depend on discrete underlying but unobserved states. As such, they have gained prominence across numerous empirical disciplines, in particular ecology, medicine, and economics. However, the increasing complexity of empirical data is often accompanied by additional latent structure such as spatial effects, temporal trends, or measurement perturbations. Gaussian fields provide an attractive building block for incorporating such structured latent variation into HMMs. Fast inference methods for Gaussian fields have emerged through the stochastic partial differential equation (SPDE) approach. 
Due to their sparse representation, these integrate well with novel frequentist estimation methods for random-effects models via the use of automatic differentiation and the Laplace approximation. Scaling to high dimensions requires tools such as (R)TMB to exploit sparsity in the Hessian w.r.t. the latent variables - a property satisfied by SPDE fields but violated by the HMM likelihood. We present a modified forward algorithm to compute the HMM likelihood, constructing sparsity in the Hessian and consequently enabling fast and scalable inference. We demonstrate the practical feasibility and the usefulness through simulations and two case studies exploring the detection of stellar flares as well as modelling the movement of lions. 2026-03-18T08:22:46Z 37 pages, 13 figures Jan-Ole Fischer http://arxiv.org/abs/2603.17460v1 Algorithms for Models with Intractable Normalizing Functions 2026-03-18T08:05:17Z In this paper we discuss a well known computing problem -- inference for models with intractable normalizing functions. Models with intractable normalizing functions arise in a wide variety of areas, for instance network models, models for spatial data on lattices, spatial point processes, flexible models for count data and gene expression, and models for permutations. Simulating from these models for fixed parameter values is well studied, starting with work dating back seventy years to the origin of the Metropolis algorithm. On the other hand some of the most practical and theoretically justified algorithms for inference, particularly Bayesian inference, have only been developed within the past two decades. The most computationally efficient algorithms often do not have well developed theory and few if any approaches exist for assessing the quality of approximations based on them. For many problems even the best algorithms can be computationally infeasible. Hence, this is an exciting area of research with many open problems. 
We explain several key algorithms, providing connections and touching upon practical advantages and disadvantages of each, with some discussion of theoretical properties where they impact practice. We discuss an approach for assessing the accuracy of approximations produced by these algorithms; this diagnostic is particularly valuable for algorithm tuning. While our focus is largely on models with intractable normalizing functions, we also discuss algorithms that are more broadly applicable to models where the entire likelihood function is intractable; these methods are of course also applicable to intractable normalizing function problems. 2026-03-18T08:05:17Z Murali Haran Bokgyeong Kang Jaewoo Park