http://arxiv.org/abs/2507.05064v4
Vecchia-Inducing-Points Full-Scale Approximations for Gaussian Processes
2026-05-22T08:53:32Z
Gaussian processes are flexible, probabilistic, non-parametric models widely used in machine learning and statistics. However, their scalability to large data sets is limited by computational constraints. To overcome these challenges, we propose Vecchia-inducing-points full-scale (VIF) approximations combining the strengths of global inducing points and local Vecchia approximations. Vecchia approximations excel in settings with low-dimensional inputs and moderately smooth covariance functions, while inducing point methods are better suited to high-dimensional inputs and smoother covariance functions. Our VIF approach bridges these two regimes by using an efficient correlation-based neighbor-finding strategy for the Vecchia approximation of the residual process, implemented via a modified cover tree algorithm. We further extend our framework to non-Gaussian likelihoods by introducing iterative methods that reduce computational costs for training and prediction by several orders of magnitude compared to Cholesky-based computations when using a Laplace approximation. In particular, we propose and compare novel preconditioners and provide theoretical convergence results. Extensive numerical experiments on simulated and real-world data sets show that VIF approximations are computationally efficient as well as more accurate and numerically stable than state-of-the-art alternatives.
All methods are implemented in the open source C++ library GPBoost with high-level Python and R interfaces.
2025-07-07T14:49:06Z
Tim Gyger, Reinhard Furrer, Fabio Sigrist

http://arxiv.org/abs/2502.07646v3
Causal Additive Models with Unobserved Causal Paths and Backdoor Paths
2026-05-22T07:57:31Z
Causal additive models provide a tractable yet expressive framework for causal discovery in the presence of hidden variables. When unobserved backdoor or causal paths exist between two variables, their causal relationship is often unidentifiable under existing theories. We establish sufficient conditions under which causal directions can be identified in many such cases. These conditions rely on new characterizations of regression sets to determine independence among regression residuals and conditional independencies among observed variables. Building on these results, we introduce a search algorithm that incorporates these innovations and prove its soundness and completeness. Empirical evaluations demonstrate its competitive performance against state-of-the-art methods.
2025-02-11T15:35:15Z
23 pages
Proceedings of AISTATS 2026
Thong Pham, Takashi Nicholas Maeda, Shohei Shimizu

http://arxiv.org/abs/2605.23318v1
Generalized Rank Regression
2026-05-22T07:36:08Z
Rank regression offers robustness to outliers and heavy-tailed response distributions, invariance to monotonic transformations, and improved efficiency under non-Gaussian errors, making it a versatile tool for analyzing complex data. This paper introduces Generalized Rank Regression (GRR), an extension of classical rank-based methods that accommodates non-monotonic score functions. While aimed at enhancing the statistical efficiency of robust estimators, this generalization results in a potentially non-convex and non-smooth objective function, presenting challenges for both theoretical analysis and algorithmic implementation.
We derive a non-asymptotic Bahadur representation of the proposed estimator and establish its asymptotic normality under mild conditions. To address the optimization challenges, we propose a new two-stage sub-gradient descent algorithm that enables efficient computation of GRR estimators with desirable statistical properties. Furthermore, we develop a multiplier bootstrap procedure for conducting statistical inference. A close connection between GRR and variants of quantile regression is uncovered, which demonstrates that GRR and composite quantile regression share asymptotically equivalent variances. The advantages of GRR are illustrated through extensive simulation studies and a real data application.
2026-05-22T07:36:08Z
29 pages, 10 figures
Jiyuan Tu, Suqi Wu, Yichen Zhang, Wen-Xin Zhou

http://arxiv.org/abs/2601.20192v2
Online Change Point Detection for Multivariate Inhomogeneous Poisson Processes Time Series
2026-05-22T05:02:07Z
We study online change point detection for multivariate inhomogeneous Poisson point process time series. This setting arises commonly in applications such as earthquake seismology, climate monitoring, and epidemic surveillance, yet remains underexplored in the machine learning and statistics literature. We propose a method that uses low-rank matrices to represent the multivariate Poisson intensity functions, resulting in an adaptive nonparametric detection procedure. Our algorithm is single-pass and requires only constant computational cost per new observation, independent of the elapsed length of the time series. We provide theoretical guarantees to control the overall false alarm probability and characterize the detection delay under temporal dependence. We also develop a new Matrix Bernstein inequality for temporally dependent Poisson point process time series, which may be of independent interest.
Numerical experiments demonstrate that our method is both statistically robust and computationally efficient.
2026-01-28T02:42:33Z
Xiaokai Luo, Haotian Xu, Carlos Misael Madrid Padilla, Oscar Hernan Madrid Padilla

http://arxiv.org/abs/2605.23210v1
Fundamental Bounds and Efficient Estimation for Dead-Time-Constrained Event Detection, with Application to Single-Photon Lidar
2026-05-22T03:55:30Z
We develop an asymptotic statistical theory for parameter estimation from a class of non-i.i.d. periodic binary event-detection processes subject to nonparalyzable dead time and gating, which we call "dead-time event detection" (DED) processes. Such processes arise in single-photon lidar, fluorescence lifetime imaging, X-ray astronomy, and particle or radiation flux measurements in nuclear physics, where each detection renders the radiation/particle detector inactive for a recovery interval. Our theory quantifies how dead time and gating affect the fundamental lower bounds of estimation and identifies practical estimators that attain these bounds. First, we identify a sufficient statistic, showing in particular that activation counts can carry statistically useful information discarded by conventional histogramming hardware. We then prove local asymptotic normality and derive the corresponding Fisher-information rate, thereby obtaining fundamental lower bounds for estimation from DED processes. We prove that the maximum likelihood estimator (MLE), widely used in DED applications, attains these lower bounds. Since computing the MLE typically requires solving a nonconvex optimization problem, we also propose Le Cam one-step estimators, which attain the same asymptotic bounds with only a single local correction rather than iterative optimization. We illustrate the validity of our asymptotic theory and the practical usefulness of one-step estimators through the example of single-photon lidar in both simulations and real-data experiments.
2026-05-22T03:55:30Z
24 pages, 5 figures
Frederic J. N. Jorgensen, Steven G. Johnson

http://arxiv.org/abs/2605.23208v1
A Direct Variance Estimation (DiVE) for Meta-Analysis of Median Differences
2026-05-22T03:52:33Z
Meta-analyses of two-group studies that report median differences typically rely on methods that require, in addition to the median difference and sample size, summary measures of dispersion such as quartiles or ranges. Studies that do not report such statistics are often excluded from the meta-analysis. Existing two-stage approaches first estimate the asymptotic variance of the median difference within each study under parametric assumptions, and then combine these study-specific estimates to obtain the pooled median difference and its variance. We propose Direct Variance Estimation (DiVE), a method that directly estimates the variance of the pooled difference using only study-level median differences and their sample sizes. A comprehensive simulation study across a wide range of distributional scenarios shows that DiVE performs comparably to or better than conventional two-stage methods, with clear advantages when the number of studies is small. A re-analysis of published meta-analyses demonstrates that DiVE enables the inclusion of studies lacking dispersion statistics, leading to a more comprehensive and potentially less biased synthesis of evidence.
2026-05-22T03:52:33Z
Tadahisa Okuda, Masataka Taguri, Kenichi Hayashi

http://arxiv.org/abs/2605.23207v1
Mixture-of-Finite-Mixtures Wishart Model for Clustering Covariance Matrices with an Application to Brain Functional Connectivity
2026-05-22T03:52:19Z
Data represented as covariance-type matrices arise in many fields, including brain functional connectivity and diffusion tensor imaging. We develop the MFM-Wishart, a Bayesian model-based clustering approach for such data that combines Wishart mixture components with a mixture-of-finite-mixtures (MFM) prior, allowing joint posterior inference on both the number of clusters and clustering assignments.
Theoretically, we study the properties of Wishart kernels in the context of mixture models and then establish results for posterior consistency for the number of clusters and posterior contraction of the mixing measure under standard regularity conditions. Computationally, we develop an efficient Markov chain Monte Carlo (MCMC) algorithm for posterior inference. Simulation studies show competitive clustering performance and accurate recovery of the number of clusters, even under model misspecification. We apply MFM-Wishart to cluster infants based on functional connectivity during sleep, estimated from functional near-infrared spectroscopy (fNIRS) data, illustrating the practical utility of the model and revealing interpretable heterogeneity.
2026-05-22T03:52:19Z
Zongyu Li, Stefano Castruccio, Zhiyong Zhang

http://arxiv.org/abs/2605.23145v1
Operationalizing Individual Fairness via Gradient Descent and Bradley-Terry Models
2026-05-22T01:50:38Z
Individual fairness, the notion that "similar individuals should be treated similarly," provides a strong and flexible fairness guarantee for algorithmic decision makers. However, a barrier to implementing individual fairness in practice is the difficulty of learning the similarity metric over individuals. In this work, we present an algorithm for learning a Mahalanobis similarity metric from triplet queries of the form "is individual $i$ more similar to individual $j$ or $k$?" We work in the standard Bradley-Terry model for pairwise comparisons. Our algorithm consists of a spectral initialization step followed by gradient descent. We provide extensive theoretical guarantees on our algorithm, showing that it converges quickly to the ground truth metric despite the non-convexity of the loss in our model. Because our focus is on fairness, we also show that individual fairness with respect to an estimated metric is sufficient to achieve similar fairness with respect to the true metric.
We also discuss potential applications of our work to AI model tuning. Finally, we present experimental results that demonstrate the convergence of our algorithm and the fairness performance of downstream fair predictors trained on our estimated metric.
2026-05-22T01:50:38Z
60 pages, 2 figures
Conlan Olson, Linjun Zhang, Zhun Deng, Pragya Sur

http://arxiv.org/abs/2605.23102v1
LLM Sparsity Prior for Robust Feature Selection
2026-05-21T23:34:04Z
Large language models (LLMs) offer a scalable mechanism to elicit domain-informed prior information for high-dimensional variable selection. However, existing methods such as LLM-Lasso are sensitive to weight quality, with performance degrading substantially when LLM-generated weights are inaccurate. To address this challenge, we first introduce a framework for quantifying the quality of LLM-generated weights, enabling rigorous evaluation of LLM-informed methods across varying weight regimes. We then propose the LLM Sparsity Prior (LSP), which integrates LLM-generated weights into the prior inclusion probabilities of Spike-and-Slab and Spike-and-Slab Lasso models via two interpretable hyperparameters governing global sparsity and weight concentration. Hierarchical hyperpriors on these parameters allow the model to dynamically discount uninformative or misleading weights, improving robustness without sacrificing gains when weights are accurate. Finally, we develop principled prompt engineering strategies and validate the method on a private medical dataset studying Acute Kidney Injury.
LSP improves prediction accuracy and identifies clinically relevant features missed by the baselines, with robustness to prompt variation and particular effectiveness in low-data regimes.
2026-05-21T23:34:04Z
Caleb Skinner, Yihan Guo, Meng Li

http://arxiv.org/abs/2605.23048v1
StanBKT: Rethinking Parameter Estimation in Bayesian Knowledge Tracing
2026-05-21T21:27:10Z
Bayesian Knowledge Tracing (BKT) is a widely used and interpretable student modeling approach in intelligent tutoring systems and educational data mining. However, most implementations rely on expectation-maximization or related optimization methods that yield only point estimates, limiting uncertainty quantification and principled comparisons across learners and conditions. We introduce StanBKT, an open-source Python package for estimating BKT models using Bayesian inference in Stan. StanBKT provides a unified framework supporting Hamiltonian Monte Carlo, variational inference, Pathfinder, and optimization-based estimation while preserving the hidden Markov structure and interpretability of classical BKT. It supports standard, grouped, and hierarchical BKT models, flexible prior specification, posterior predictive inference, and utilities for visualization and diagnostics. We evaluate StanBKT on large-scale observational and controlled educational datasets. On the ASSISTments 2020 dataset, we show that supported inference methods achieve comparable predictive performance while differing in computational efficiency and posterior fidelity. We further demonstrate how posterior inference enables principled comparison of condition-specific parameters in an educational intervention involving perceptual cue manipulations. Results illustrate how uncertainty quantification facilitates more reliable interpretation of differences in learning, forgetting, guessing, and slipping parameters across experimental conditions.
Overall, StanBKT extends BKT beyond point estimation by providing a flexible framework for probabilistic student modeling, uncertainty quantification, and hierarchical inference in educational data mining.
2026-05-21T21:27:10Z
5 figures, 7 tables
Siddhartha Pradhan, Yanping Pei, Morgan Lee, Puyuan Zhang, Erin Ottmar, Adam C. Sales

http://arxiv.org/abs/2605.23016v1
Sample correlation adjustments for robust Multi-fidelity Monte Carlo under limited pilot sampling
2026-05-21T20:38:08Z
Multi-fidelity Monte Carlo (MFMC) is a variance reduction method that leverages a multi-fidelity ensemble of models of varying cost and accuracy levels. Constructing an MFMC estimator with optimal variance requires knowledge of the correlation coefficients between the different fidelity models, which are usually not known in practice. The correlations are typically estimated using offline pilot samples and the sample correlation formula, after which the MFMC method proceeds as if the estimated correlations were the true correlations. Computational cost often restricts the number of pilot samples used, leading to poor correlation estimates and suboptimal estimators. Leveraging the MFMC problem setting and probabilistic information about the sample covariance matrix, we present a method to improve standard sample-based correlation estimates in the presence of limited pilot samples. We define a novel discrepancy function quantifying the estimator suboptimality, which in turn facilitates selecting a correlation estimator minimizing the worst-case expected discrepancy, where the expectation is taken with respect to the pilot sampling variability.
Through a simple bivariate Gaussian example and a multi-fidelity modeling application from a NASA Entry, Descent, and Landing (EDL) problem, we show that this method produces better MFMC estimators than the standard sample covariance under small pilot sample sizes and limited total budgets.
2026-05-21T20:38:08Z
Michael Stanley, Thomas Coons, Geoffrey Bomarito, Patrick Leser, Joshua Pribe, James Warner

http://arxiv.org/abs/2506.10152v2
Robust copula estimation for one-shot devices with correlated failure modes
2026-05-21T19:21:52Z
This paper presents a robust method for estimating copula models to evaluate dependence between failure modes in one-shot devices, that is, systems designed for single use and destroyed upon activation. Traditional approaches, such as maximum likelihood estimation (MLE), often produce unreliable results when faced with outliers or model misspecification. To overcome these limitations, we introduce a divergence-based estimation technique that enhances robustness and provides a more reliable characterization of the joint failure-time distribution. Extensive simulation studies confirm the robustness of the proposed method. Additionally, we illustrate its practical utility through the analysis of a real-world dataset.
2025-06-11T20:12:45Z
E. Castilla, P. J. Chocano

http://arxiv.org/abs/2602.08927v3
Online monotone density estimation and log-optimal calibration
2026-05-21T19:20:17Z
We study the problem of online monotone density estimation, where density estimators must be constructed in a predictable manner from sequentially observed data. We propose two online estimators: an online analogue of the classical Grenander estimator, and an expert aggregation estimator inspired by exponential weighting methods from the online learning literature. In the well-specified stochastic setting, where the underlying density is monotone, we show that the expected cumulative log-likelihood gap between the online estimators and the true density admits an $O(n^{1/3})$ bound.
We further establish a $\sqrt{n\log{n}}$ pathwise regret bound for the expert aggregation estimator relative to the best offline monotone estimator chosen in hindsight, under minimal regularity assumptions on the observed sequence. As an application of independent interest, we show that the problem of constructing log-optimal p-to-e calibrators for sequential hypothesis testing can be formulated as an online monotone density estimation problem. We adapt the proposed estimators to build empirically adaptive p-to-e calibrators and establish their optimality. Numerical experiments illustrate the theoretical results.
2026-02-09T17:29:39Z
31 pages, 2 figures
Rohan Hore, Ruodu Wang, Aaditya Ramdas

http://arxiv.org/abs/2512.15244v2
Non-parametric Causal Inference in Dynamic Thresholding Designs
2026-05-21T18:55:36Z
We consider causal inference in dynamic settings where treatment is assigned by thresholding a state variable that can change over time. There is a large literature on regression-discontinuity methods building on the fact that, in the static setting, treatment assignment via threshold crossing induces a quasi-experimental design that enables pragmatic causal inference. But dynamic settings involve challenges not present in the static setting, e.g., past treatments may affect current state and thus future treatments, and so existing regression-discontinuity methods do not apply. Here, we show that dynamic thresholding designs identify a marginal policy effect that nests the classical regression-discontinuity parameter in the static setting; and propose a tailored local linear regression estimator that is consistent for this marginal policy effect.
We demonstrate our approach using an experiment that emulates real-world optimization of thresholds for continuous glucose monitoring using data generated from an FDA-approved simulator.
2025-12-17T09:43:20Z
Aditya Ghosh, Stefan Wager

http://arxiv.org/abs/2605.22950v1
Diffusion-based Denoising Beats Vanilla Score Matching in Parameter Estimation: A Theoretical Explanation
2026-05-21T18:25:22Z
Score matching is an alternative to maximum likelihood estimation when the normalizing constant is unknown or too costly to evaluate. However, vanilla score matching has been shown to be inefficient relative to maximum likelihood estimation for multimodal distributions with well-separated modes, which are commonly encountered in practical applications. We compare a novel diffusion-based denoising score matching estimator (DDSME) to the vanilla score matching estimator (SME) in this scenario. In particular, we prove statistical guarantees for both estimators, showing that the error bound for the vanilla SME worsens as the separation between the modes increases, which can be avoided in the case of the DDSME with suitable hyperparameter tuning. This provides a novel theoretical explanation for the superior behavior of diffusion-based score matching over the vanilla version.
2026-05-21T18:25:22Z
Benedikt Lütke Schwienhorst, Nadja Klein, Johannes Lederer