http://arxiv.org/abs/2401.03834v4 On the error control of invariant causal prediction 2026-05-20T16:57:43Z

Invariant causal prediction provides a useful framework for identifying causal predictors of a response using heterogeneous data from multiple environments. One valuable property of the original invariant causal prediction method is that it guarantees no false causal discoveries with high probability. Such a guarantee, however, can be overly conservative in some applications, resulting in few or no causal discoveries. This raises a natural question: can invariant causal prediction be equipped with less conservative error guarantees and thereby extract more causal information from the data? In this paper, we address this question by focusing on two widely used and more liberal guarantees: false discovery rate control and simultaneous true discovery bounds. A key step in our approach is to reformulate invariant causal prediction as a multiple testing problem. We then adopt the e-Closure principle to obtain (simultaneous) false discovery rate control, together with new p-to-e calibrators tailored to this setting. We also derive simultaneous true discovery bounds via closed testing, which provide additional causal information without requiring extra assumptions and retain all discoveries from the original invariant causal prediction method. Through simulations and a real data application on educational attainment of teenagers in the United States, we show that these more liberal error control guarantees can improve the practical usefulness of invariant causal prediction.
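
As a minimal sketch of what a p-to-e calibrator does, the Python snippet below implements the standard power family f(p) = kappa * p^(kappa - 1) from the e-value literature. This is an assumed, generic example and not the new calibrators proposed in the paper; the function name `calibrate` and the default `kappa = 0.5` are illustrative choices.

```python
def calibrate(p: float, kappa: float = 0.5) -> float:
    """Map a p-value to an e-value via f(p) = kappa * p**(kappa - 1).

    For kappa in (0, 1), f is decreasing and integrates to 1 over [0, 1],
    so f(P) has expectation at most 1 whenever P is a valid p-value.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if not 0.0 < kappa < 1.0:
        raise ValueError("kappa must lie in (0, 1)")
    return kappa * p ** (kappa - 1.0)


# Small p-values map to large e-values; p-values near 1 map below 1.
e_values = [calibrate(p) for p in (0.0001, 0.01, 0.5)]
```

Smaller values of `kappa` reward very small p-values more strongly at the cost of lower e-values for moderate p-values.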

2024-01-08T11:46:20Z Jinzhou Li Jelle J Goeman

http://arxiv.org/abs/2605.21387v1 Clustering Craters on the Moon with Dysfunctional Families 2026-05-20T16:43:19Z

Summaries of craters on terrestrial bodies, such as the number and size distribution, are essential for understanding the history of the Solar System. Identifying craters, however, has not been automated and thus relies on expert crater-counters marking static images. Robbins et al. (2014) (hereafter R14) showed that, contrary to previously held assumptions, there exists large variability across expert crater-counters' identified crater lists. How best to combine identified crater lists across multiple experts for the purposes of learning about the Solar System is an open and consequential question. R14 combined identified crater lists via clustering through a modification of the popular DBSCAN clustering method. Their approach did not, however, make use of all the constraining information available nor did it provide an estimate of clustering uncertainty. To address the shortcomings of the DBSCAN method, we present a novel clustering approach that can combine multiple lists of identified objects of interest from the same image. The key innovation is incorporating a dysfunctional family constraint into the Bayesian nonparametric clustering approach, the Chinese restaurant process (CRP), which naturally takes into account information about the crater identifier. The dysfunctional family Chinese restaurant process (DFCRP) provides an estimate of clustering uncertainty. In this work, we provide guidance on hyperparameter specification, present a Gibbs sampler, and perform a simulation study to compare the performance of the DFCRP to the CRP. Finally, we apply the DFCRP to the crater identification problem of R14, comparing results, and also demonstrate the types of analyses that can be performed with posterior draws of cluster assignments.
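
For readers unfamiliar with the Chinese restaurant process, the following Python sketch draws one partition from the plain CRP prior. It does not implement the dysfunctional family constraint or the Gibbs sampler described in the abstract; the function name `sample_crp` and the concentration parameter name `alpha` are assumptions made for illustration.

```python
import random


def sample_crp(n: int, alpha: float, seed: int = 0) -> list[int]:
    """Draw cluster labels for n items from a plain CRP with concentration alpha."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    assignments: list[int] = []
    table_sizes: list[int] = []
    for _ in range(n):
        # An existing table is chosen with weight equal to its current size,
        # and a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[table] += 1
        assignments.append(table)
    return assignments


labels = sample_crp(20, alpha=1.0, seed=42)
```

Larger `alpha` tends to produce more clusters, which is one reason hyperparameter guidance matters in this kind of model.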

2026-05-20T16:43:19Z Nathan Weed Emily Castleton Dave Osthus Brian Weaver Richard L. Warr

http://arxiv.org/abs/2512.23943v2 Statistical Guarantees in the Search for Less Discriminatory Algorithms 2026-05-20T15:53:32Z

U.S. discrimination law can impose liability on firms that fail to adopt a less discriminatory alternative (LDA): a decision policy that achieves the same business objectives while reducing disparate impact on legally protected groups. Recent scholarship argues that this doctrine has direct implications for algorithmic decision-making in high-stakes domains such as employment, lending, and housing, potentially obligating firms to search for "less discriminatory algorithms" (Black et al., 2024). Regulators have at times encouraged proactive LDA searches, reinforcing the expectation of a good-faith effort to identify equally performant models with lower disparate impact. Model multiplicity makes such searches plausible: retraining with different random seeds can yield models with comparable predictive performance but materially different disparate impacts. Yet firms cannot retrain indefinitely, raising a central question: when is the search sufficient to demonstrate good faith? We formalize LDA search under multiplicity as an optimal stopping problem in which a developer seeks to produce evidence that further search is unlikely to yield meaningful improvements. Our main contribution is an adaptive stopping algorithm that provides a high-probability upper bound on the best disparate-impact gains attainable through continued retraining, enabling developers to certify (e.g., to a court) that additional search is unlikely to help. We also show how stronger distributional assumptions over the model space can yield tighter bounds, and we validate the approach on real-world credit and housing datasets.

2025-12-30T02:20:52Z 38 pages, 10 figures Chris Hays Ben Laufer Solon Barocas Manish Raghavan

http://arxiv.org/abs/2605.21307v1 The Bayesian Gaussian Process Latent Variable Model for Spatio-Temporal Stream Networks 2026-05-20T15:35:35Z

A variational inference-based framework for training a multi-output Gaussian process latent variable model, specifically tailored to the tails-up spatio-temporal stream network, is developed. Training, given a censored observational data set subject to missing values, proceeds by maximising a secondary variational lower bound on the model log marginal likelihood using gradient-based optimisation. Consequently, the theoretical development of a new family of tails-up spatio-temporal stream network models is introduced; these models rely on the sparse Gaussian process inducing variable framework, the Bayesian Gaussian process latent variable model, and local variational methods. These spatio-temporal models use stream distance instead of Euclidean distance and capture spatial and temporal dependencies using auto/cross-correlation and process convolution, respectively, which allows for the development of valid separable spatio-temporal stream network-based covariance functions. Results from the simulation-based case studies indicate that the proposed framework performs well in benchmark comparisons across several performance metrics.

2026-05-20T15:35:35Z Marno Basson Tobias M. Louw Theresa R. Smith

http://arxiv.org/abs/2605.21304v1 How does limma-trend work? An empirical partially Bayes perspective 2026-05-20T15:33:59Z

In high-throughput biology, it is common to fit thousands of linear regressions -- one per gene, protein, or other unit -- with very few samples per unit. Limma-trend, one of the most widely used methods in this setting, improves power by shrinking variance estimates parametrically toward a fitted curve (the trend) relating variance to a unit-level summary (e.g., average intensity, peptide count), before computing p-values and applying the Benjamini-Hochberg procedure to control the false discovery rate (FDR). We study limma-trend through the lens of empirical partially Bayes inference, a paradigm in which a prior is posited and estimated for the nuisance parameters while parameters of interest remain fixed. From this perspective, limma-trend computes approximate partially Bayes p-values that condition on the residual sample variance and the unit-level summary. The same framework explains why MAnorm2, a popular variant for ChIP-seq, can sometimes fail to control FDR. We then derive a nonparametric generalization of limma-trend that estimates the residual variance prior using nonparametric maximum likelihood. Under dense signals, this procedure asymptotically controls the FDR -- even when the trend is misspecified or inconsistently estimated. To allow the full shape of the conditional variance distribution to depend on the unit-level summary, we develop a second procedure that learns it directly.
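
The final step mentioned in this abstract, the Benjamini-Hochberg step-up procedure, can be sketched in a few lines of Python. This shows only the generic BH rule applied to a list of p-values, not limma-trend's variance shrinkage or the partially Bayes p-values; the function name `benjamini_hochberg` is an illustrative assumption.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by the BH procedure."""
    m = len(p_values)
    # Indices of the p-values in increasing order of p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is at most
        # alpha * rank / m.
        if p_values[idx] <= alpha * rank / m:
            k = rank
    # Reject the k smallest p-values.
    return sorted(order[:k])


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p, alpha=0.05)
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value meets a later threshold.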

2026-05-20T15:33:59Z Sagnik Nandy Wanyi Ling Nikolaos Ignatiadis

http://arxiv.org/abs/2605.21283v1 A continuous-time Markov chain framework for population size estimation from multi-list data: accounting for absorbing lists and asymmetric interactions 2026-05-20T15:15:56Z

We introduce a continuous-time Markov chain framework for estimating population size from multi-list data, which allows directional interactions to be modelled and can accommodate absorbing lists, such as death records, or more general data collection processes. The standard model of the continuous-time Markov chain framework and the log-linear model for multi-list data are equivalent when lists are independent, and we show empirically that they give similar results in the presence of dependencies between lists. Through a simulation study, we highlight the need to account for an absorbing list by using the Markov model or the log-linear model with forced absorbing interactions, observing biased estimates of the population size otherwise. We motivate our approach with an epidemiological dataset concerning individuals suffering from a first-ever stroke in North-West England, in which one of the lists is a death record. We illustrate a further use of our approach by considering a case of ordered lists on drug use data from the City of London.

2026-05-20T15:15:56Z Ophélie Schaller Andrew Titman Rachel McCrea

http://arxiv.org/abs/2502.17773v5 How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective 2026-05-20T14:56:31Z

Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by the human-LLM misalignment. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield overly wide and uninformative sets dominated by stochastic noise. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous simulation fidelity across different LLMs and domains.

2025-02-25T02:07:29Z 63 pages, 13 figures Chengpiao Huang Yuhang Wu Kaizheng Wang

http://arxiv.org/abs/2503.00565v3 Batched Single-Index Global Multi-Armed Bandits with Covariates 2026-05-20T14:56:14Z

The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as personalized medicine and recommendation systems, contextual information is available at the time of decision-making, rewards from different arms are related rather than independent, and feedback is provided in batches. We propose a novel semi-parametric framework for batched bandits with covariates that incorporates a shared parameter across arms. We leverage the single-index regression (SIR) model to capture relationships between arm rewards while balancing interpretability and flexibility. Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy with a dynamic binning mechanism guided by the single-index direction. We consider two settings: one where a pilot direction is available and another where the direction is estimated from data, deriving theoretical regret bounds for both cases. When a pilot direction is available with sufficient accuracy and the number of arms $K$ is fixed, our approach achieves minimax-optimal rates (with $d = 1$) for nonparametric batched bandits, circumventing the curse of dimensionality. Extensive experiments on simulated and real-world datasets demonstrate the effectiveness of our algorithm compared to the nonparametric batched bandit method introduced by \cite{jiang2025batched}.

2025-03-01T17:23:55Z Sakshi Arya Hyebin Song

http://arxiv.org/abs/2604.19169v2 A Finite Mixture Failure-rate based Heterogeneous Step-stress Accelerated Life Testing (h-SSALT) Model 2026-05-20T14:52:38Z

Traditional step-stress accelerated life testing models assume that test units originate from a homogeneous population. Recently, Lu and Kateri (2025) proposed a heterogeneous cumulative exposure based SSALT model to account for the inhomogeneous aging patterns among test units belonging to the same production batch. This paper introduces an alternative yet flexible failure-rate based heterogeneous simple SSALT (h-SSALT) model with Weibull-distributed Type-II censored failure times, allowing heterogeneity to emerge at the second stress level through a finite mixture of m latent subgroups, each characterized by its own failure behavior. The expectation-maximization algorithm is developed for maximum likelihood estimation of the model parameters, exploiting the incomplete data structure arising from both unknown group membership and Type-II censoring. Interval estimation is performed using the missing information identity of Louis (1982) with transformation-based confidence intervals respecting parameter constraints. An extensive simulation study evaluates the finite-sample performance of the proposed estimators and demonstrates, through a quantile-based comparison, that ignoring population heterogeneity leads to systematic bias in lifetime predictions across the entire quantile range, with the most severe consequences at early failure quantiles of direct relevance to warranty period design. A special case comparison confirms that the proposed Weibull failure-rate based formulation reduces to the existing model of Lu and Kateri (2025) when the shape parameter equals unity, validating the proposed framework as a proper generalization. The practical application of the model is further illustrated through simulated and real data analysis examples.

2026-04-21T07:28:30Z 44 pages, 7 figures, 12 tables. Version 2: we have added interval estimation using Louis' missing information method with transformation-based confidence intervals, and an additional real data analysis example Pranoy Palit Ayan Pal Kiran Prajapat

http://arxiv.org/abs/2511.01705v2 Z-Dip: a standardized measure for data modality assessment 2026-05-20T14:31:20Z

Detecting multimodality in empirical distributions is a fundamental problem in statistics and data analysis, with applications ranging from clustering to the study of complex systems. In practice, however, assessing departures from unimodality in a consistent and comparable way remains challenging. Widely used methods such as Hartigan and Hartigan's Dip Test illustrate these difficulties: their statistics depend strongly on sample size, require calibration to determine significance, and, for large samples, exhibit increasing sensitivity, leading to rejection of unimodality for arbitrarily small deviations from the null. We introduce Z-Dip, a standardized measure of multimodality that addresses these limitations. By treating the Dip statistic as a random variable under the null hypothesis of unimodality and standardizing its observed value, the proposed approach yields scores that are directly comparable across datasets of different sizes. Using simulation-based calibration, we derive a universal decision threshold that closely reproduces classical Dip Test decisions without requiring sample-size-specific adjustments. Extensive validation on simulated data and on more than 88,000 empirical opinion distributions shows near-perfect agreement with the classical Dip Test while providing a more interpretable and comparable measure of modality. Finally, we propose a downsampling-based correction that mitigates residual sensitivity in extremely large samples. Open-source software and reference tables are provided to facilitate practical adoption.

2025-11-03T16:13:25Z Edoardo Di Martino Matteo Cinelli Roy Cerqueti

http://arxiv.org/abs/2605.21197v1 Laplace Approximations for Mixed-Effects and Gaussian Process Quantile Regression 2026-05-20T13:57:55Z

Laplace approximations are a standard tool for computationally efficient inference in latent Gaussian models, but they fail for quantile regression with the asymmetric Laplace likelihood because the observed Hessian vanishes almost everywhere. We show that this obstacle can be overcome without smoothing the likelihood: the relevant local curvature is given not by the observed Hessian, but by the Fisher information when the model is correctly specified and by the population curvature of the expected loss under misspecification. On this basis, we develop a Laplace approximation framework for quantile regression with mixed-effects and Gaussian process models. We propose practical curvature estimators, including the triangular kernel curvature (TKC) estimator, that yield approximations for posterior distributions and marginal likelihoods, and we establish their asymptotic validity. Empirically, the proposed methods are scalable and numerically stable, and for latent Gaussian models, they achieve accuracy comparable to or better than MCMC and variational competitors at substantially lower computational costs. More broadly, the framework clarifies how Laplace approximations can be justified for non-smooth generalized posteriors through local quadratic behavior of the expected loss.
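
The vanishing-Hessian obstacle can be made concrete with the standard definitions of the check loss and the asymmetric Laplace density. The notation below ($\rho_\tau$, $\mu$, $\sigma$) is assumed for illustration and may differ from the paper's.

```latex
% Check (pinball) loss at quantile level \tau \in (0, 1):
\rho_\tau(u) = u \,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)

% Asymmetric Laplace density with location \mu and scale \sigma > 0:
f(y \mid \mu, \sigma, \tau)
  = \frac{\tau (1 - \tau)}{\sigma}
    \exp\!\left\{ -\rho_\tau\!\left( \frac{y - \mu}{\sigma} \right) \right\}

% The check loss is piecewise linear, so for every u \neq 0:
\rho_\tau'(u) = \tau - \mathbf{1}\{u < 0\},
\qquad
\rho_\tau''(u) = 0

% Hence the observed Hessian of the negative log-likelihood in \mu is zero
% wherever y \neq \mu:
-\frac{\partial^2}{\partial \mu^2} \log f(y \mid \mu, \sigma, \tau) = 0
```

Since the second derivative carries no curvature information almost everywhere, a Laplace approximation built on the observed Hessian has nothing to work with, which is why a different notion of curvature is needed.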

2026-05-20T13:57:55Z Andrea Nava Fabio Sigrist

http://arxiv.org/abs/2602.04907v2 Causal Discovery from Heteroscedastic Stochastic Dynamical Systems under Imperfect Physical Models 2026-05-20T13:43:01Z

Causal discovery is a data-driven paradigm for analyzing complex systems, while physics-based models, such as ordinary differential equations (ODEs), provide mechanistic structure for real-world dynamical processes. Integrating these paradigms can improve identifiability, stability, and robustness. However, real dynamical systems often exhibit cyclic interactions and nonstationarity, whereas many causal discovery methods rely on acyclicity, stationarity, or equilibrium assumptions. We propose an integrative causal discovery framework for dynamical systems that leverages partial physical knowledge through stochastic differential equations (SDEs). The drift term encodes known ODE dynamics, while the diffusion term captures unknown causal couplings beyond the prescribed physics. We develop a scalable sparsity-inducing maximum quasi-likelihood estimator with a theoretically justified stabilization technique to improve the optimization landscape. Under mild conditions, we establish causal graph recovery guarantees for both stable and unstable SDEs. We also analyze robustness of our causal graph estimate to ODE misspecification and clarify how the introduced stabilization technique balances numerical stability and statistical recoverability. Experiments on linear SDEs and nonlinear benchmarks, including Lotka-Volterra and Lorenz dynamics with acyclic and cyclic structures, show improved graph recovery and robustness over data-driven baselines. We also demonstrate practical utility on real-world epidemic data by reconstructing stochastic SIR dynamics within our causal discovery framework.

2026-02-03T23:42:01Z 101 pages Jianhong Chen Naichen Shi Xubo Yue

http://arxiv.org/abs/2605.21041v1 Conditioning Gaussian Processes on Almost Anything 2026-05-20T11:23:42Z

Gaussian processes (GPs) offer a principled probabilistic model over functions, but exact inference is restricted to the linear-Gaussian regime. We establish an explicit equivalence between GPs and a class of linear diffusion models, recasting predictive sampling as an ODE with closed-form Gaussian dynamics and a likelihood-dependent guidance term that admits a simple Monte Carlo approximation. In the linear-Gaussian setting, we recover standard GP conditioning exactly; beyond conjugacy, the same machinery handles any conditioning statement admitting point-wise likelihood evaluation -- including non-linear physics, and, for the first time, natural language via large language models. Whitening isolates the irreducible non-Gaussian dynamics, minimising Wasserstein-2 transport cost and eliminating numerical stiffness. The result is a general-purpose GP inference scheme requiring no bespoke derivations. Together, these results provide a general mechanism for incorporating the full richness of real-world knowledge as conditioning information, opening a new frontier for the probabilistic modelling of real-world problems.
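
For reference, the linear-Gaussian regime in which exact inference is available corresponds to the textbook GP regression formulas. The notation below (mean function $m$, kernel $k$, noise variance $\sigma^2$, training inputs $X$, test input $x_*$) is assumed for illustration and is not taken from the paper.

```latex
% Prior and observation model:
f \sim \mathcal{GP}(m, k),
\qquad
y = f(X) + \varepsilon,
\qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I)

% With K = k(X, X), the posterior is again a GP with
\mathbb{E}\bigl[f(x_*) \mid y\bigr]
  = m(x_*) + k(x_*, X) \,(K + \sigma^2 I)^{-1} \bigl(y - m(X)\bigr)

\operatorname{Cov}\bigl[f(x_*), f(x_*') \mid y\bigr]
  = k(x_*, x_*') - k(x_*, X) \,(K + \sigma^2 I)^{-1} k(X, x_*')
```

These closed forms depend on the likelihood being Gaussian and linear in $f$; any other conditioning statement breaks conjugacy, which is the gap the abstract describes.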

2026-05-20T11:23:42Z Henry Moss Lachlan Astfalck Thomas Cowperthwaite Colin Doumont Sam Willis Philipp Hennig Christopher Nemeth Andrew Zammit-Mangion

http://arxiv.org/abs/2605.20943v1 Missing data and cluster graphs: cluster-level missingness vs variable-level missingness 2026-05-20T09:29:24Z

Missing data is pervasive in many scientific domains such as public health, environmental science, and the social sciences. Recoverability from missing data is typically studied using fully specified variable-level missingness models, even though, in many applications, only coarse structural information is available, for instance when variables are grouped into clusters due to limited knowledge or for interpretability reasons. In this paper, we investigate recoverability from such abstract representations. We introduce two classes of cluster-based missingness graphs: the m-C-DMG, which retains variable-specific missingness indicators, and the cm-C-DMG, which aggregates missingness mechanisms at the cluster level. We formalize the notion of compatibility between these abstract graphs and underlying variable-level missingness models, and study how this abstraction affects the recoverability of probabilistic and causal queries. In particular, we give graphical conditions for recovering the joint distribution as well as graphical conditions for recovering a macro causal effect. Overall, our results clarify when cluster-level missingness information is sufficient for valid inference, and when finer-grained modeling is necessary.

2026-05-20T09:29:24Z Willow Scott Eugenio Valdano Charles Assaad

http://arxiv.org/abs/2601.14991v2 Consistency of Honest Decision Trees and Random Forests 2026-05-20T09:24:04Z

We study various types of consistency of honest decision trees and random forests in the regression setting. In contrast to related literature, our proofs are elementary and follow the classical arguments used for smoothing methods. Under mild regularity conditions on the regression function and data distribution, we establish weak and almost sure convergence of honest trees and honest forest averages to the true regression function, and moreover we obtain uniform convergence over compact covariate domains. The framework naturally accommodates ensemble variants based on subsampling and also a two-stage bootstrap sampling scheme. Our treatment synthesizes and simplifies existing analyses, in particular recovering several results as special cases. The elementary nature of the arguments clarifies the close relationship between data-adaptive partitioning and kernel-type methods, providing an accessible approach to understanding the asymptotic behavior of tree-based methods.

2026-01-21T13:40:36Z Martin Bladt Rasmus Frigaard Lemvig