https://arxiv.org/api/sEVnJsHmwNFF7ovXP+QGa4Y+bCE
2026-04-06T11:42:26Z

http://arxiv.org/abs/2511.09500v4
Distributional Shrinkage I: Universal Denoiser Beyond Tweedie's Formula
2026-03-24T22:28:38Z
We study the problem of denoising when only the noise level is known, not the noise distribution. Independent noise $Z$ corrupts a signal $X$, yielding the observation $Y = X + σZ$ with known $σ\in (0,1)$. We propose \emph{universal} denoisers, agnostic to both signal and noise distributions, that recover the signal distribution $P_X$ from $P_Y$. When the focus is on distributional recovery of $P_X$ rather than on individual realizations of $X$, our denoisers achieve order-of-magnitude improvements over the Bayes-optimal denoiser derived from Tweedie's formula, which achieves $O(σ^2)$ accuracy. They shrink $P_Y$ toward $P_X$ with $O(σ^4)$ and $O(σ^6)$ accuracy in matching generalized moments and densities. Drawing on optimal transport theory, our denoisers approximate the Monge--Ampère equation with higher-order accuracy and can be implemented efficiently via score matching.
Let $q$ denote the density of $P_Y$. For distributional denoising, we propose replacing the Bayes-optimal denoiser, $$\mathbf{T}^*(y) = y + σ^2 \nabla \log q(y),$$ with denoisers exhibiting less-aggressive distributional shrinkage, $$\mathbf{T}_1(y) = y + \frac{σ^2}{2} \nabla \log q(y),$$ $$\mathbf{T}_2(y) = y + \frac{σ^2}{2} \nabla \log q(y) - \frac{σ^4}{8} \nabla \!\left( \frac{1}{2} \| \nabla \log q(y) \|^2 + \nabla \cdot \nabla \log q(y) \right)\!.$$
2025-11-12T17:20:42Z
27 pages, 5 figures
Tengyuan Liang

http://arxiv.org/abs/2603.23726v1
Inverse Probability Weighting of Count Exposures in the Presence of Missing Data: A Simulation Study
2026-03-24T21:22:52Z
Inverse probability of treatment weighting (IPTW) is widely used to estimate causal effects, but guidance is limited for count exposures. It is also unclear how IPTW performs when combined with multiple imputation in this context. In this study, we evaluated five IPTW methods applied to count exposures: multinomial binning, parametric and non-parametric covariate balancing propensity scores (CBPS, npCBPS), generalised boosted models (GBM), and energy balancing. Our simulations were informed by an example using data from the 1970 British Cohort Study, aiming to estimate the effect of psychological distress, measured as a count of symptoms at age 34, on self-reported longstanding illness at age 42. We compared these approaches on bias, coverage, effective sample size, and other metrics under truncated negative binomial and Poisson exposure distributions. We also assessed the performance of Rubin's rules under different missingness mechanisms. Under complete data, multinomial, CBPS, GBM, and energy weights produced low bias and near-nominal coverage, whereas npCBPS resulted in bias and poor coverage due to extreme weights. When data were missing completely at random, similar performance patterns were observed for IPTW with multiple imputation. 
Under missing at random, bias increased with higher missingness, but this was present for both IPTW and covariate-adjusted regression, possibly reflecting a limitation of the imputation model rather than a failure of IPTW. Overall, these findings support the use of multinomial, CBPS, GBM, and energy weights for count exposures in similar settings while highlighting trade-offs between these methods and the need for imputation models accommodating right-truncated overdispersed counts.
2026-03-24T21:22:52Z
Martin N. Danka, Jessica K. Bone, George B. Ploubidis, Richard J. Silverwood

http://arxiv.org/abs/2409.16003v5
Easy Conditioning far beyond Gaussian
2026-03-24T20:45:59Z
Multivariate Gaussian distributions enjoy Gaussian conditional distributions that make conditioning easy: conditioning boils down to implementing analytical formulae for conditional means and covariances. For more general distributions, however, conditional distributions may not be available in analytical form and require demanding and approximate numerical approaches. Primarily motivated by probabilistic imputation problems, we review and discuss families of multivariate distributions that do enjoy analytical conditioning, also providing a few counter-examples. Proving that trans-dimensional stability under conditioning extends to mixtures and transformations, we demonstrate that a broader class of multivariate distributions inherits easy conditioning properties. Building on this insight, we develop a generative method to estimate conditional distributions from data by first fitting a flexible joint distribution using copulas and then performing analytical conditioning in a latent space. In our applications, we specifically opt for Gaussian Mixture Copula Models (GMCM), comparing in turn various fitting strategies. Through simulations and real-world data experiments, we showcase the efficacy of our method in tasks involving conditional density estimation and data imputation. 
We also touch upon links to Gaussian process modelling and how stability under mixtures and transformations carries over to easy conditioning of non-Gaussian processes.
2024-09-24T12:04:28Z
36 pages, 13 figures
Antoine Faul, David Ginsbourger, Ben Spycher

http://arxiv.org/abs/2507.23743v2
Relative Bias Under Imperfect Identification in Observational Causal Inference
2026-03-24T20:23:47Z
To conduct causal inference in observational settings, researchers must rely on certain identifying assumptions. In practice, these assumptions are unlikely to hold exactly. This paper considers the bias of selection-on-observables, instrumental variables, and proximal inference estimates under violations of their identifying assumptions. We develop bias expressions for IV and proximal inference that show how violations of their respective assumptions are amplified by any unmeasured confounding in the outcome variable. We propose a set of sensitivity tools that quantify the sensitivity of different identification strategies, and an augmented bias contour plot visualizes the relationship between these strategies. We argue that the act of choosing an identification strategy implicitly expresses a belief about the degree of violations that must be present in alternative identification strategies. Even when researchers intend to conduct an IV or proximal analysis, a sensitivity analysis comparing different identification strategies can help to better understand the implications of each set of assumptions. 
Throughout, we compare the different approaches on a re-analysis of the impact of state surveillance on the incidence of protest in Communist Poland.
2025-07-31T17:29:20Z
20 pages, 3 figures, plus references and appendices
Melody Huang, Cory McCartan

http://arxiv.org/abs/2603.23688v1
Adaptive Gaussian Process Search for Simulation-Based Sample Size Estimation in Clinical Prediction Models: Validation of the pmsims R Package
2026-03-24T19:53:37Z
Background: Determining an adequate sample size is essential for developing reliable and generalisable clinical prediction models, yet practical guidance on selecting appropriate methods remains limited. Existing analytical and simulation-based approaches often rely on restrictive assumptions and focus on mean-based criteria. We present and validate pmsims, an R package that uses Gaussian process surrogate modelling to provide a flexible and computationally efficient simulation-based framework for sample size determination across diverse prediction settings. Methods: We conducted a comprehensive simulation study with two aims. First, we compared three search engines implemented in pmsims: a Gaussian process-based adaptive method, a deterministic bisection method, and a hybrid approach, across binary, continuous, and survival outcomes. Second, we benchmarked the best-performing pmsims engine against existing analytical (pmsampsize) and simulation-based (samplesizedev) methods, evaluating recommended sample sizes, computational time, and achieved performance on large independent validation datasets. Results: The Gaussian process-based method consistently produced the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking, pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches and outperforming analytical methods in more challenging scenarios. 
Conclusions: pmsims provides an efficient and flexible framework for principled sample size planning in clinical prediction modelling, requiring fewer model evaluations than non-adaptive simulation approaches.
2026-03-24T19:53:37Z
27 pages, 2 main-text figures, 16 supplementary figures, 9 tables, preprint
Oyebayo Ridwan Olaniran, Diana Shamsutdinova, Sarah Markham, Felix Zimmer, Daniel Stahl, Gordon Forbes, Ewan Carr

http://arxiv.org/abs/2002.12586v7
Nonparametric Empirical Bayes Estimation on Heterogeneous Data
2026-03-24T18:24:57Z
The simultaneous estimation of many parameters based on data collected from corresponding studies is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous data where heterogeneity is captured by a nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the ``Nonparametric Empirical Bayes Structural Tweedie'' (NEST) estimator, which efficiently estimates the unknown effect sizes and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. For the normal means problem, NEST simultaneously handles the two main selection biases introduced by heterogeneity: one, the selection bias in the mean, which cannot be effectively corrected without also correcting for, two, selection bias in the variance. We develop theory to show that NEST is asymptotically as good as the optimal Bayes rule that uniquely minimizes a weighted squared error loss. In our simulation studies NEST outperforms competing methods, with substantial efficiency gains in many settings. The proposed method is demonstrated on estimating the batting averages of baseball players and Sharpe ratios of mutual fund returns. 
Extensions to other members of the two-parameter exponential family are discussed.
2020-02-28T07:48:39Z
Proof of Theorem 1 revised
Trambak Banerjee, Luella J. Fu, Gareth M. James, Gourab Mukherjee, Wenguang Sun

http://arxiv.org/abs/2510.16673v2
Identification and estimation of causal mechanisms in cluster-randomized trials with post-treatment confounding using Bayesian nonparametrics
2026-03-24T17:32:17Z
Causal mediation analysis in cluster-randomized trials (CRTs) is essential for explaining how cluster-level interventions affect individual outcomes, yet it is complicated by interference, post-treatment confounding, and hierarchical covariate adjustment. We develop a Bayesian nonparametric framework that simultaneously accommodates interference and a post-treatment confounder that precedes the mediator. Identification is achieved through a multivariate Gaussian copula that replaces cross-world independence with a single dependence parameter, yielding a built-in sensitivity analysis to residual post-treatment confounding. For estimation, we introduce a nested common atoms enriched Dirichlet process (CA-EDP) prior that integrates the Common Atoms Model (CAM) to share information across clusters while capturing between- and within-cluster heterogeneity, and an Enriched Dirichlet Process (EDP) structure delivering robust covariate adjustment without impacting the outcome model. We provide formal theoretical support for our prior by deriving the model's key distributional properties, including its partially exchangeable partition structure, and by establishing convergence guarantees for the practical truncation-based posterior inference strategy. We demonstrate the performance of the proposed methods in simulations and provide further illustration through a reanalysis of a completed CRT.
2025-10-19T00:31:43Z
78 pages
Yuki Ohnishi, Michael J. Daniels, Lei Yang, Fan Li

http://arxiv.org/abs/2603.16146v2
Deep Adaptive Model-Based Design of Experiments
2026-03-24T16:20:48Z
Model-based design of experiments (MBDOE) is essential for efficient parameter estimation in nonlinear dynamical systems. However, conventional adaptive MBDOE requires costly posterior inference and design optimization between each experimental step, precluding real-time applications. We address this by combining Deep Adaptive Design (DAD), which amortizes sequential design into a neural network policy trained offline, with differentiable mechanistic models. For dynamical systems with known governing equations but uncertain parameters, we extend sequential contrastive training objectives to handle nuisance parameters and propose a transformer-based policy architecture that respects the temporal structure of dynamical systems. We demonstrate the approach on four systems of increasing complexity: a fed-batch bioreactor with Monod kinetics, a Haldane bioreactor with uncertain substrate inhibition, a two-compartment pharmacokinetic model with nuisance clearance parameters, and a DC motor for real-time deployment.
2026-03-17T05:53:09Z
Arno Strouwen, Sebastian Micluţa-Câmpeanu

http://arxiv.org/abs/2602.17503v2
An extension to reversible jump Markov chain Monte Carlo for change point problems with heterogeneous temporal dynamics
2026-03-24T16:07:39Z
Detecting brief changes in time-series data remains a major challenge in fields where short-lived states carry meaning. In single-molecule localisation microscopy, this problem is particularly acute as fluorescent molecules used to tag protein oligomers display heterogeneous photophysical behaviour that can complicate photobleach step analysis, a key step in resolving nanoscale protein organisation. Existing methods often require extensive filtering or prior calibration, and can fail to accurately account for blinking or reversible dark states that may contaminate downstream analysis. 
In this paper, an extension to RJMCMC is proposed for change point detection with heterogeneous temporal dynamics. This approach is applied to the problem of estimating per-frame active fluorophore counts from one-dimensional integrated intensity traces derived from Fluorescence Localisation Imaging with Photobleaching (FLImP), where compound change point pair moves are introduced to better account for short-lived events known as blinking and dark states. The approach is validated using simulated and experimental data, demonstrating improved accuracy and robustness when compared with current photobleach step analysis methods and with the existing analysis approach for FLImP data. This Compound RJMCMC (CRJMCMC) algorithm performs reliably across a wide range of fluorophore counts and signal-to-noise conditions, with signal-to-noise ratio (SNR) down to 0.001 and counts as high as nineteen fluorophores, while also effectively estimating low counts observed when studying EGFR oligomerisation. Beyond single molecule imaging, this work has applications for a variety of time series change point detection problems with heterogeneous state persistence, for example electrocorticography brain-state segmentation, fault detection in industrial process monitoring, and realised volatility in financial time series.
2026-02-19T16:18:10Z
Emily Gribbin, Benjamin Davis, Daniel Rolfe, Hannah Mitchell

http://arxiv.org/abs/2603.23374v1
Shape-Adaptive Conditional Calibration for Conformal Prediction via Minimax Optimization
2026-03-24T16:05:43Z
Achieving valid conditional coverage in conformal prediction is challenging due to the theoretical difficulty of satisfying pointwise constraints in finite samples. 
Building upon the characterization of conditional coverage through marginal moment restrictions, we introduce Minimax Optimization Predictive Inference (MOPI), a framework that generalizes prior work by optimizing over a flexible class of set-valued mappings during the calibration phase, rather than simply calibrating a fixed sublevel set. This minimax formulation effectively circumvents the structural constraints of predefined score functions, achieving superior shape adaptivity while maintaining a principled connection to the minimization of mean squared coverage error. Theoretically, we provide non-asymptotic oracle inequalities and show that the convergence rate of the coverage error attains the optimal order under regular conditions. MOPI also enables valid inference conditional on sensitive attributes that are available during calibration but unobserved at test time. Empirical results on complex, non-standard conditional distributions demonstrate that MOPI produces more efficient prediction sets than existing baselines.
2026-03-24T16:05:43Z
Yajie Bao, Chuchen Zhang, Zhaojun Wang, Haojie Ren, Changliang Zou

http://arxiv.org/abs/2312.10618v2
Sparse Learning and Class Probability Estimation with Weighted Support Vector Machines
2026-03-24T16:03:19Z
Classification and probability estimation are fundamental tasks with broad applications across modern machine learning and data science, spanning fields such as biology, medicine, engineering, and computer science. Recent development of weighted Support Vector Machines (wSVMs) has demonstrated considerable promise in robustly and accurately predicting class probabilities and performing classification across a variety of problems (Wang et al., 2008). However, the existing framework relies on an $\ell^2$-norm regularized binary wSVMs optimization formulation, which is designed for dense features and exhibits limited performance in the presence of sparse features with redundant noise. 
Effective sparse learning thus requires prescreening of important variables for each binary wSVM to ensure accurate estimation of pairwise conditional probabilities. In this paper, we propose a novel class of wSVMs frameworks that incorporate automatic variable selection with accurate probability estimation for sparse learning problems. We developed efficient algorithms for variable selection by solving either the $\ell^1$-norm or elastic net regularized wSVMs optimization problems. Class probability is then estimated either via the $\ell^2$-norm regularized wSVMs framework applied to the selected variables, or directly through elastic net regularized wSVMs. The two-step approach offers a strong advantage in simultaneous automatic variable selection and reliable probability estimators with competitive computational efficiency. The elastic net regularized wSVMs achieve superior performance in both variable selection and probability estimation, with the added benefit of variable grouping, at the cost of increased computation time in high-dimensional settings. The proposed wSVMs-based sparse learning methods are broadly applicable and can be naturally extended to $K$-class problems through ensemble learning.
2023-12-17T06:12:33Z
Liyun Zeng, Hao Helen Zhang

http://arxiv.org/abs/2511.15427v2
Tractable Estimation of Nonlinear Panels with Interactive Fixed Effects
2026-03-24T15:21:16Z
Interactive fixed effects are routinely controlled for in linear panel models. While an analogous fixed effects (FE) estimator for nonlinear models has been available in the literature (Chen, Fernandez-Val and Weidner, 2021), it sees much more limited use in applied research because its implementation involves solving a high-dimensional non-convex problem. In this paper, we complement the theoretical analysis of Chen, Fernandez-Val and Weidner (2021) by providing a new computationally efficient estimator that is asymptotically equivalent to their estimator. 
Unlike the previously proposed FE estimator, our estimator avoids solving a high-dimensional optimization problem and can be feasibly computed in large nonlinear panels. Our proposed method involves two steps. In the first step, we convexify the optimization problem using nuclear norm regularization (NNR) and obtain preliminary NNR estimators of the parameters, including the fixed effects. Then, we find the global solution of the original optimization problem using a standard gradient descent method initialized at these preliminary estimates. Thus, in practice, one can simply combine our computationally efficient estimator with the inferential theory provided in Chen, Fernandez-Val and Weidner (2021) to construct confidence intervals and perform hypothesis testing; we also provide an R package for empirical implementation.
2025-11-19T13:26:48Z
Andrei Zeleneev, Weisheng Zhang

http://arxiv.org/abs/2603.23309v1
Tail-Calibrated Estimation of Extreme Quantile Treatment Effects
2026-03-24T15:13:44Z
Extreme quantile treatment effects (eQTEs) measure the causal impact of a treatment on the tails of an outcome distribution and are central for studying rare, high-impact events. Standard QTE methods often fail in extreme regimes due to data sparsity, while existing eQTE methods rely on restrictive tail assumptions or on interior-quantile theory. We propose the Tail-Calibrated Inverse Estimating Equation (TIEE) framework, which combines information across quantile levels and anchors the tail using extreme value models within a unified estimating equation approach. We establish asymptotic properties of the resulting estimator and evaluate its performance through simulation under different tail behaviours and model misspecifications. An application to extreme precipitation in the Austrian Alps illustrates how TIEE enables observational causal attribution for very rare events under anthropogenic warming. 
More broadly, the proposed framework establishes a new foundation for causal inference on rare, high-impact outcomes, with relevance across environmental risk, economics, and public health.
2026-03-24T15:13:44Z
Mengran Li, Daniela Castro-Camilo

http://arxiv.org/abs/2603.23294v1
Granger Causality in Expectiles: an M-vine copula test
2026-03-24T14:56:07Z
A model-free measure of Granger causality in expectiles is proposed, generalizing the traditional mean-based measure to arbitrary positions of the conditional distribution. Expectiles are the only law-invariant risk measures that are both coherent and elicitable, making them particularly well-suited for studying distributional Granger causality where risk quantification and forecast evaluation are both relevant. Based on this measure, a test is developed using M-vine copula models that accounts for multivariate Granger causality with $d+1$ series under non-linear and non-Gaussian dependence, without imposing parametric assumptions on the joint distribution. Strong consistency of the test statistic is established under some regularity conditions. In finite samples, simulations show accurate size control and power increasing with sample size. A key advantage is the joint testing capability: causal relationships invisible to pairwise tests can be detected, as demonstrated both theoretically and empirically. Two applications to international stock market indices at the global and Asian regional level illustrate the practical relevance of the proposed framework.
2026-03-24T14:56:07Z
Roberto Fuentes-Martínez, Irene Crimaldi

http://arxiv.org/abs/2603.23277v1
A reduced rank model for spatial categorical data with many classes
2026-03-24T14:40:46Z
We develop an identifiable reduced-rank spatial multinomial model for categorical data with many classes. 
The model represents class-specific spatial effects through a low-dimensional set of shared latent factors, substantially reducing parameter dimension while preserving joint dependence across classes. Because standard conjugate and Pólya-Gamma methods fail under this factorization, we propose a Gibbs sampler using Laplace-approximation proposals within Metropolis-Hastings updates. Simulation studies examine dimension selection and the accuracy of the Laplace proposals. An application to dominant tree species mapping in the Blue Ridge Mountains demonstrates scalable inference and flexible joint predictions for individual classes, class unions, and area-level summaries.
2026-03-24T14:40:46Z
Paul B May, Andrew Simpson, Semhar Michael
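
Worked check for the first entry (arXiv 2511.09500): the three denoisers $\mathbf{T}^*$, $\mathbf{T}_1$, $\mathbf{T}_2$ quoted there can be compared in closed form in a one-dimensional Gaussian toy case. The sketch below is illustrative only and is not taken from the paper. It assumes $X \sim N(0,1)$ and $Z \sim N(0,1)$, so that $q$ is the $N(0, s)$ density with $s = 1 + σ^2$ and $\nabla \log q(y) = -y/s$. Under that assumption each denoiser is linear, $T(y) = c\,y$, and the variance of $T(Y)$ is $s c^2$, which should be compared with $\mathrm{Var}(X) = 1$. The function name `pushforward_variance_errors` is hypothetical.

```python
def pushforward_variance_errors(sigma):
    """Return |Var[T(Y)] - Var[X]| for T*, T_1, T_2 in a 1-D Gaussian toy case.

    Assumption (not from the source): X ~ N(0, 1) and Z ~ N(0, 1), so
    Y = X + sigma * Z ~ N(0, s) with s = 1 + sigma^2 and score -y / s.
    """
    s = 1.0 + sigma**2
    a = sigma**2 / s  # sigma^2 times the magnitude of the score slope

    # With score(y) = -y / s, each denoiser reduces to T(y) = c * y:
    #   T*(y)  = y + sigma^2 * score              -> c = 1 - a
    #   T_1(y) = y + (sigma^2 / 2) * score        -> c = 1 - a / 2
    #   T_2(y) = T_1(y) - (sigma^4 / 8) * d/dy(score^2 / 2 + score')
    #          = T_1(y) - (sigma^4 / 8) * y / s^2 -> c = 1 - a / 2 - a^2 / 8
    coeffs = {
        "T_star": 1.0 - a,
        "T_1": 1.0 - a / 2.0,
        "T_2": 1.0 - a / 2.0 - a**2 / 8.0,
    }
    # Var[c * Y] = c^2 * s; the target variance is Var[X] = 1.
    return {name: abs(s * c**2 - 1.0) for name, c in coeffs.items()}


if __name__ == "__main__":
    for sigma in (0.4, 0.2, 0.1):
        errs = pushforward_variance_errors(sigma)
        print(sigma, {name: f"{err:.2e}" for name, err in errs.items()})
```

In this toy case the exact variance-matching map is $y \mapsto y\sqrt{1-a}$ with $a = σ^2/s$, and the coefficients of $\mathbf{T}_1$ and $\mathbf{T}_2$ are the first- and second-order Taylor truncations of $\sqrt{1-a}$. This is consistent with the $O(σ^4)$ and $O(σ^6)$ rates stated in the abstract, against $O(σ^2)$ for $\mathbf{T}^*$.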