https://arxiv.org/api/SU1tktm1aRcAYAj2bFbeHxP8ELk 2026-06-10T12:31:15Z 36124 210 15 http://arxiv.org/abs/2606.04520v1 Beyond First-order Asymptotics in Sequential Mean Testing 2026-06-03T06:56:15Z

We revisit the problem of sequentially testing the mean of bounded distributions in a level-$α$ power-one framework. We study a $\mathrm{KL_{inf}}$-based sequential test that is known to attain the information-theoretic lower bound on the expected stopping time with exact constants as $α\to 0$. Going beyond first-order asymptotics, we establish a central limit theorem (CLT) for the stopping time of this test. Our analysis proceeds in two steps. First, we prove a novel CLT for the $\mathrm{KL_{inf}}$ statistic itself, characterizing its fluctuations around its deterministic limit. We then leverage this result to show that the stopping time, centered appropriately and scaled by $\sqrt{\log(1/α)}$, converges in distribution to a Gaussian limit with an explicit variance. This yields a second-order characterization of an asymptotically optimal sequential test for bounded distributions. Finally, we present numerical experiments that corroborate our theoretical findings.

2026-06-03T06:56:15Z Vikas Deep Shubhada Agrawal http://arxiv.org/abs/2606.04495v1 Fused Spatial Latent Block Models for Co-Clustering 2026-06-03T06:24:06Z

Spatial transcriptomics is a rapidly growing technique that captures gene expression together with spatial coordinates in intact tissue sections, enabling in situ mapping of transcriptional activity. This technology offers unprecedented opportunities to study tissue heterogeneity and spatial gene expression patterns. Uncovering the associations between spatially variable gene modules and spot types can advance our understanding of pathological mechanisms. However, rigorous statistical methods that exploit spatial information to achieve spatially coherent co-clustering of spots and genes are still lacking, and theoretical investigations in this direction remain limited. We propose a fused spatial latent block model (F-SpLBM). Our model uses the LBM to uncover co-expression patterns between spots and genes, penalized fusion to automatically determine the number of co-clusters, and the Potts model to incorporate spatial information. We establish that the fusion-based procedure recovers the true block structure with the misclassification rate converging at a super-polynomial rate. We also prove asymptotic normality of the parameter estimators and quantify the accuracy gain from spatial smoothing. Simulations and real-data analyses demonstrate that F-SpLBM yields spatially coherent and biologically interpretable clustering results.

2026-06-03T06:24:06Z Biao Cai Yuanxing Chen Kuangnan Fang Xiaolong Lin http://arxiv.org/abs/2606.04417v1 saCI: An R Package for Stochastic Approximation Confidence Intervals for Correlation Coefficients 2026-06-03T03:51:05Z

This paper presents saCI, an R package that implements the stochastic approximation method for constructing nonparametric confidence intervals for Pearson's correlation coefficient. The package is based on the algorithm proposed by Garthwaite (1996) and further developed by Xiong & Xu (2016). The implementation provides both the stochastic approximation (SA) method and the bootstrap BCa method for comparison, along with an interactive Shiny application for exploratory analysis. The package has been successfully published on CRAN, demonstrating its compliance with R package standards and reproducibility.

2026-06-03T03:51:05Z 8 pages, 1 figure, R package Pengyu Chen Yifan Jiang Jiashuo Shao http://arxiv.org/abs/2606.04416v1 Powerful Multivariate Sensitivity Analysis via Sample Splitting in an Observational Study of the Effects of Poverty on Cardiovascular Disease Risk Factors 2026-06-03T03:50:09Z

When assessing the causal effect of an exposure on two or more outcomes in an observational study, a linear combination of outcomes may lessen the sensitivity of a test of the global null hypothesis to potential unmeasured biases. While all linear combinations of scored outcomes can be considered using Scheffé projections or constrained variants thereof, finding the combination that minimizes sensitivity to unmeasured biases requires corrections for multiple testing, which can erode power, especially when many outcomes are of interest. To mitigate this issue, we propose splitting the sample into a planning sample to identify an optimal linear combination and an analysis sample to conduct inference. We provide a novel characterization of the set of linear combinations for which this approach is guaranteed to achieve the same asymptotic power as full-sample alternatives and conduct extensive simulation studies that demonstrate enhanced power in finite samples. Finally, we apply our method to investigate the effects of poverty on the emergence of cardiovascular disease risk factors in children and adolescents. We discover adverse consequences on outcomes related to body composition, physical activity, and tobacco exposure. Although the impact of poverty on elevated tobacco exposure shows some robustness to unmeasured confounding, the other findings remain sensitive to potential biases.

2026-06-03T03:50:09Z William Bekerman Anurag Mehta Rebecca E. Hasson Leah E. Robinson Dylan S. Small Colin B. Fogarty http://arxiv.org/abs/2603.21180v4 ALMAB-DC: Active Learning, Multi-Armed Bandits, and Distributed Computing for Sequential Experimental Design and Black-Box Optimization 2026-06-03T03:38:40Z

Sequential experimental design under expensive, gradient-free objectives is a central challenge in computational statistics: evaluation budgets are tightly constrained and information must be extracted efficiently from each observation. We propose ALMAB-DC, a GP-based sequential design framework combining active learning, multi-armed bandits (MAB), and distributed asynchronous computing for expensive black-box experimentation. A Gaussian process surrogate with uncertainty-aware acquisition identifies informative query points; a UCB or Thompson-sampling bandit controller allocates evaluations across parallel workers; and an asynchronous scheduler handles heterogeneous runtimes. We present cumulative regret bounds for the bandit components and characterize parallel scalability via Amdahl's Law. We validate ALMAB-DC on five benchmarks. On the two statistical experimental-design tasks, ALMAB-DC achieves lower simple regret than Equal Spacing, Random, and D-optimal designs in dose-response optimization, and in adaptive spatial field estimation matches the Greedy Max-Variance benchmark while outperforming Latin Hypercube Sampling; at $K=4$ the distributed setting reaches target performance in one-quarter of sequential wall-clock rounds. On three ML/engineering tasks (CIFAR-10 HPO, CFD drag minimization, MuJoCo RL), ALMAB-DC achieves 93.4% CIFAR-10 accuracy (outperforming BOHB by 1.7 pp and Optuna by 1.1 pp), reduces airfoil drag to $C_D = 0.059$ (36.9% below Grid Search), and improves RL return by 50% over Grid Search. All advantages over non-ALMAB baselines are statistically significant under Bonferroni-corrected Mann-Whitney $U$ tests. Distributed execution achieves $7.5\times$ speedup at $K = 16$ agents, consistent with Amdahl's Law.
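
As a reading aid for the closing speedup claim, the sketch below inverts the textbook form of Amdahl's Law, $S(K) = 1 / ((1 - p) + p/K)$, to find the parallel fraction $p$ that a 7.5x speedup at $K = 16$ would imply. The function names are hypothetical, and the abstract does not say which variant of the law or which fitted $p$ the authors use, so this is only an illustration of the arithmetic.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Textbook Amdahl's Law: speedup on `workers` when a fraction of work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def implied_parallel_fraction(speedup: float, workers: int) -> float:
    """Solve S = 1 / ((1 - p) + p / K) for p, given an observed speedup S at K workers."""
    return (1.0 - 1.0 / speedup) * workers / (workers - 1)


# Figures quoted in the abstract: 7.5x speedup at K = 16.
p = implied_parallel_fraction(7.5, 16)
print(round(p, 3))  # 0.924
```

Under this textbook form, roughly 92% of the work would have to parallelize for the quoted figure to hold.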

2026-03-22T11:47:20Z 33 pages, and 13 figures Foo Hui-Mean Yuan-chin I Chang http://arxiv.org/abs/2603.19005v3 AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science 2026-06-03T02:36:25Z

Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data science workflows. However, it remains unclear to what extent AI agents can match the performance of human experts on domain-specific data science tasks, and in which aspects human expertise continues to provide advantages. We introduce AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration performance in domain-specific data science. AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking. We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines. Our results show that current AI agents struggle with domain-specific reasoning. AI-only baselines perform below the top quartile of competition participants, while the strongest solutions arise from human-AI collaboration. These findings challenge the narrative of complete automation by AI and underscore the enduring importance of human expertise in data science, while illuminating directions for the next generation of AI. Visit the AgentDS website here: https://agentds.org/ and open-source datasets here: https://huggingface.co/datasets/lainmn/AgentDS .

2026-03-19T15:11:13Z An Luo Jin Du Xun Xian Robert Specht Fangqiao Tian Ganghua Wang Xuan Bi Charles Fleming Ashish Kundu Jayanth Srinivasa Mingyi Hong Rui Zhang Tianxi Li Galin Jones Jie Ding http://arxiv.org/abs/2606.03656v2 Beyond Point Estimates: Reliable Evaluation of Prediction Performance Metrics under Clustered Data 2026-06-03T02:21:51Z

Prediction performance metrics such as accuracy and the F1 score are typically reported as single numbers, with no measure of uncertainty. The omission has been tolerable in exploratory settings, where model evaluation is used for informal comparison rather than formal decision-making. But as machine learning is deployed in real-world applications, evaluation results are increasingly used to support binary decisions -- whether a model meets a required standard or not -- making uncertainty quantification essential. The problem is compounded when data are dependent, as in repeated measurements, clustered subjects, or time series, where variability is harder to assess and easy to underestimate. We develop a unified framework that links a broad class of performance metrics through their representation as smooth functionals of confusion-matrix probabilities. This representation allows the use of the cluster-robust sandwich variance estimator to obtain asymptotically valid confidence intervals, hypothesis tests, and paired model comparisons for both binary and multiclass problems under clustered data. We also provide power and sample size approximations based on pilot data, enabling principled study design for model evaluation. Simulations show that the proposed methods achieve near-nominal coverage across a range of dependence structures, while naive methods underestimate variability. A real-data application further illustrates how accounting for clustering can materially change conclusions. These results offer a practical foundation for uncertainty quantification and study design in prediction performance evaluation, in settings where decisions should be justified under dependent and clustered data.
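
To make the clustered-data point concrete, here is a minimal sketch for the simplest metric, accuracy, which is the mean of per-observation correctness indicators. It compares the naive binomial standard error with a cluster-robust one built from within-cluster residual sums. The function name and the toy data are hypothetical, no small-sample correction is applied, and the paper's framework (delta-method treatment of general confusion-matrix functionals) is broader than this special case.

```python
import numpy as np


def accuracy_with_cluster_se(correct, cluster_ids):
    """Return (accuracy, cluster-robust SE, naive binomial SE).

    `correct` holds 0/1 indicators of a correct prediction;
    `cluster_ids` labels the cluster (e.g. subject) of each observation.
    """
    correct = np.asarray(correct, dtype=float).ravel()
    cluster_ids = np.asarray(cluster_ids).ravel()
    n = correct.size
    acc = correct.mean()
    resid = correct - acc
    # Sum residuals within each cluster, then square and add across clusters.
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    cluster_sums = np.bincount(inverse, weights=resid)
    cluster_se = np.sqrt(np.sum(cluster_sums ** 2)) / n
    naive_se = np.sqrt(acc * (1.0 - acc) / n)
    return acc, cluster_se, naive_se


# Toy data: 4 clusters of 5 observations, perfectly correlated within cluster.
correct = [1] * 5 + [0] * 5 + [1] * 5 + [1] * 5
clusters = np.repeat([0, 1, 2, 3], 5)
acc, cluster_se, naive_se = accuracy_with_cluster_se(correct, clusters)
print(acc, cluster_se > naive_se)  # 0.75 True
```

In this toy case the naive standard error treats 20 observations as independent, while the cluster-robust version effectively sees four.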

2026-06-02T13:45:59Z Taekwon Hong Daeyoung Lim Woojung Bae http://arxiv.org/abs/2606.04357v1 A New Perspective on Reverse Diffusion for Monte Carlo Sampling 2026-06-03T02:17:50Z

This paper introduces a novel perspective on the use of reverse diffusion processes for sampling from unnormalized densities. The central idea is to embed the target density as the marginal at the initial time of a suitably constructed diffusion process evolving over a finite horizon. In contrast to existing approaches, the proposed methodology involves neither time discretization error nor score function estimation, so that Monte Carlo variability is the only source of approximation. A key theoretical result characterizes the Radon-Nikodym derivative of the reverse diffusion transition distribution with respect to that of an Ornstein-Uhlenbeck (OU) process. This representation provides a tractable change-of-measure formulation and serves as the foundation for two distinct classes of Monte Carlo algorithms. The first class approximates the reverse transition distribution via a sequence of pseudo-marginal Metropolis-Hastings MCMC algorithms. The resulting scheme produces an approximate i.i.d. sample from the target distribution and is fully parallelizable, as trajectories can be generated independently. The second class consists of MCMC algorithms targeting the joint law of the whole diffusion path in $[0,T]$, for a suitably chosen horizon $T$. The proposed samplers combine three types of updates. One update simulates the diffusion forward in time according to an OU dynamics, conditional on its initial value. The remaining two update the backward component via Metropolis-type steps: one conditions on the terminal value at time $T$ and the other one does not. In both cases, acceptance probabilities are implemented using Barker-type Bernoulli factory constructions. The proposed methods perform well for targets with multimodality and complex dependence structures, providing a scalable and efficient alternative to the widely used random-walk Metropolis algorithm.

2026-06-03T02:17:50Z Jairon H. N. Batista Flávio B. Gonçalves Yuri F. Saporito Rodrigo S. Targino http://arxiv.org/abs/2606.04322v1 Robust Prediction Variance Estimation for Gaussian Process Regression Under Covariance Smoothness Misspecification 2026-06-03T00:51:02Z

Best Linear Unbiased Prediction (BLUP) has been a dominant approach in Generalized Linear Mixed Models, spatial models, and Gaussian Process Regression (GPR). In addition to their optimal properties, BLUP procedures quantify prediction uncertainty. However, BLUP is generally implemented as follows: (i) assume the probability distribution and covariance function are known and that only the covariance parameter values are unknown; (ii) plug parameter estimates into the BLUP equations to obtain the Estimated Best Linear Unbiased Prediction (EBLUP) and its variance. In applications, the true covariance function of the process is unknown, and choosing the wrong covariance model, particularly its smoothness, to estimate parameters yields a quasi-EBLUP whose prediction variance is biased downward. Focusing on a GPR context, in this paper we first demonstrate that the effect of misspecification on the mean squared prediction error (MSPE) of the quasi-EBLUP converges to a positive constant when the working and true measures are non-equivalent, and is smooth in the prediction location. We then propose a new way to estimate the MSPE of the quasi-EBLUP that accounts for covariance function uncertainty. Our new estimator is compared to four other prediction variance estimators. The new prediction variance estimator generally performs better than all other competitors, and the larger the misspecification of the covariance smoothness, the wider the difference among MSPE estimators.

2026-06-03T00:51:02Z Roberto Rivera http://arxiv.org/abs/2606.04307v1 Folded Transport MCMC: Certifiable Quotient Posterior Computation for Symmetric Bayesian Models 2026-06-03T00:26:46Z

Bayesian models with finite symmetry (mixture models with exchangeable components, structural identification with closely-spaced modes) define posteriors that are invariant under a group of label permutations, creating redundant multimodality that degrades MCMC convergence diagnostics. We introduce Folded Transport MCMC (FolT-MCMC), which performs inference directly on the quotient posterior by constructing an independence sampler on the fundamental domain of the symmetry group. The quotient proposal is formed by symmetrising a learned normalising flow over the group orbits. We prove that the LCNF oscillation-based certification framework transfers to the quotient metric with a stabiliser-corrected ball-mass bound and improved covering radius, and that the quantile-core certified lower bound improves whenever the unfolded flow exhibits cross-mode proposal deficiency. On Gaussian mixtures (d = 2 to 20), label-switching targets (up to 24 equivalent modes), and a standard Bayesian three-component mixture posterior, the quantile-core certified improvement ratio ranges from 2x to 145x, with the folded certificate empirically nearly dimension-free. On real accelerometer data from a supertall building during Typhoon Mangkhut, FolT-MCMC yields a non-vacuous quantile-core certificate where the unfolded certificate is vacuous.

2026-06-03T00:26:46Z 48 pages (including supplementary material), 5 figures, 6 tables. Submitted to Journal of the Royal Statistical Society: Series B Jun Hu http://arxiv.org/abs/2209.15448v3 Blessing from Human-AI Interaction: Super Reinforcement Learning in Confounded Environments 2026-06-02T22:28:41Z

As AI becomes more prevalent throughout society, effective methods of integrating humans and AI systems that leverage their respective strengths and mitigate risk have become an important priority. In this paper, we introduce the paradigm of super policy learning that takes advantage of human-AI interaction for data-driven sequential decision making. This approach utilizes the observed action, either from AI or humans, as input for achieving a stronger oracle in policy learning for the decision maker (humans or AI). In the decision process with unmeasured confounding, the actions taken by past agents can offer valuable insights into undisclosed information. By including this information for the policy search in a novel and legitimate manner, the proposed super policy learning will yield a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior policy (e.g., past agents' actions). We call this stronger oracle a blessing from human-AI interaction. Furthermore, to address the issue of unmeasured confounding in finding super-policies using the batch data, a number of nonparametric and causal identifications are established under the framework of proximal causal inference. Building upon these novel identification results, we develop several super-policy learning algorithms and systematically study their theoretical properties such as finite-sample regret guarantees. Finally, we illustrate the effectiveness of our proposal through extensive simulations and real-world applications.

2022-09-29T16:03:07Z Jiayi Wang Zhengling Qi Chengchun Shi http://arxiv.org/abs/2606.04250v1 Locally Equivalent Weights for Multilevel Regression and Poststratification 2026-06-02T21:59:03Z

Multilevel regression and poststratification (MrP) has become a workhorse method for estimating population quantities from non-probability surveys, and is the primary model-based alternative to traditional survey calibration weighting methods, such as raking. For simple linear regression models, MrP methods admit ``equivalent weights'', allowing for direct comparisons between MrP and traditional calibration weighting. Such weights, however, have been unavailable for the most widely used MrP models, such as logistic regression. In this paper, we develop a natural generalization, ``MrP locally equivalent weights'' (MrPlew), which represent MrP as a weighting-style estimator that is locally equivalent to calibration weights near the observed responses. This enables a suite of standard weighting diagnostics, including frequentist sampling variability, covariate balance, and subgroup contribution. We formally justify the use of MrPlew in these cases: we prove the MrPlew-based variance estimator is asymptotically equivalent to the infinitesimal jackknife for common exponential family models, and we introduce a novel class of model checks based on invariance to data perturbations that generalize covariate balance and subgroup contribution to nonlinear models. We further show that MrPlew can be computed easily using existing MCMC samples and provide open-source software to compute MrPlew using the output of standard software. We illustrate our approach for several canonical studies that use MrP, including via a logistic regression outcome model, showing that implied covariate balance can sometimes be worse for MrP than for raking. Given the ease of computing, we recommend making MrPlew a standard part of the MrP model interrogation workflow.

2026-06-02T21:59:03Z 60 pages Ryan Giordano Alice Cima Jared Murray Erin Hartman Avi Feller http://arxiv.org/abs/2606.04237v1 Constrained Weighted Bayesian Bootstrap 2026-06-02T21:37:10Z

We prove the weighted Bayesian bootstrap, a method for approximate sampling of a posterior distribution, can be extended to sample from general constrained posterior distributions under mild assumptions. The method entails a simple algorithm that can take advantage of fast tools from convex optimization. Under regularity conditions, we show the asymptotic distribution of samples from the constrained weighted Bayesian bootstrap has a covariance matching the restricted maximum likelihood estimator, an efficient estimator. We assess the method empirically on a variety of constrained Bayesian problems, demonstrating broad applicability of the method as well as advantages over existing peer methods. The constrained weighted Bayesian bootstrap quickly samples from constrained posteriors, providing adequate uncertainty quantification for problems typically solved via optimization methods designed to deliver only a point estimate. As a case study, using constraints required in European-style option prices, uncertainty estimates of an option pricing surface are derived with constrained weighted Bayesian bootstrap.
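
The general idea of pairing random observation weights with a constrained optimizer can be sketched on a toy problem. The example below assumes i.i.d. Exponential(1) weights, squared-error loss, no prior term, and a single constraint $\theta \ge 0$ on a scalar mean, so each draw has a closed form: the weighted mean projected onto $[0, \infty)$. All of these choices and the function name are illustrative assumptions; the abstract does not spell out the paper's algorithm, which presumably calls a convex solver for general constraints.

```python
import numpy as np


def constrained_wbb_mean(y, n_draws=2000, seed=0):
    """Toy constrained weighted-bootstrap draws for a mean restricted to theta >= 0.

    Each draw solves argmin_{theta >= 0} sum_i w_i * (y_i - theta)^2 with
    w_i ~ Exponential(1); the solution is max(0, weighted mean of y).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    weights = rng.exponential(1.0, size=(n_draws, y.size))
    weighted_means = (weights @ y) / weights.sum(axis=1)
    # Project onto the constraint set [0, inf).
    return np.maximum(weighted_means, 0.0)


# Hypothetical data whose sample mean sits near the boundary.
y = np.array([-0.4, 0.1, 0.3, -0.2, 0.5, 0.0, -0.1, 0.2])
draws = constrained_wbb_mean(y)
print(draws.shape, float(draws.min()) >= 0.0)  # (2000,) True
```

Because the sample mean is close to zero here, a share of the draws lands exactly on the boundary, which is the kind of constraint-aware uncertainty a point estimate alone would not show.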

2026-06-02T21:37:10Z 24 Pages, 8 Figures. Accepted to 42nd Conference on Uncertainty in Artificial Intelligence (uai2026) Sam Rosen Jason Xu http://arxiv.org/abs/2605.19271v2 Ranking with Confidence: A Probabilistic Framework for Deterministic Ranking Methods 2026-06-02T21:27:38Z

Rankings are central to decision-making in fields ranging from education to online platforms, yet classical deterministic methods such as the Borda count or Copeland-type pairwise methods ignore uncertainty due to sampling noise or incomplete data. We propose a probabilistic framework that treats true ranks as latent random variables, enabling quantification of ranking uncertainty. We introduce new ranking criteria based on pairwise dominance probabilities, derive approximate inference procedures, and provide a novel Worst Best rank method to construct simultaneous and individual confidence intervals for ranks. Our approach is the first to provide formal uncertainty quantification for classical deterministic rankings. It is inherently robust to missing data: unlike Copeland-type methods, which penalize entities with fewer observed comparisons by assigning them fewer wins, our pairwise probability model adjusts for incompleteness, eliminating bias toward items with more complete records. The resulting rankings reflect underlying performance rather than data availability, enhancing fairness, transparency, and statistical reliability in high-stakes applications.
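
The missing-data issue can be seen in a few lines. The sketch below tallies raw win counts, in the spirit of a Copeland-type score, next to a simple per-item win rate. The win rate is only a crude stand-in chosen for illustration; it is not the paper's pairwise probability model, and the function name and comparison data are hypothetical.

```python
from collections import defaultdict


def wins_and_win_rates(comparisons):
    """Map each item to (raw win count, wins / comparisons observed).

    `comparisons` is an iterable of (winner, loser) pairs.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    return {item: (wins[item], wins[item] / games[item]) for item in games}


# Item A was compared four times, item E only twice.
comparisons = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "A"), ("E", "B"), ("E", "C")]
scores = wins_and_win_rates(comparisons)
print(scores["A"], scores["E"])  # (3, 0.75) (2, 1.0)
```

Raw wins place A above E simply because A has more recorded comparisons, while the win rate reverses the order.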

2026-05-19T02:37:34Z Shunpu Zhang http://arxiv.org/abs/1812.05678v5 Objective-Driven Ensembles: Bridging the Gap Between Interpretable Sparsity and Algorithmic Prediction 2026-06-02T20:52:21Z

Sparse methods (e.g., Best Subset Selection, Elastic Net) are the standard approach for obtaining interpretable models, but they can suffer from high variance and vulnerability to spurious correlations. Alternatively, algorithmic ensembles (e.g., Random Forests, Gradient Boosting) achieve high prediction accuracy but yield uninterpretable black boxes driven by randomization or sequential residual fitting. In recent years, a unifying paradigm has emerged: Objective-Driven Ensembles. By generalizing best subset selection into a joint mathematical optimization problem, this approach generates interpretable ensembles by optimally splitting predictors across a small number of diverse models. In this paper, we synthesize this growing body of literature and provide theoretical insights into its empirical success. Specifically, we show that penalizing predictor overlap mathematically bounds prediction covariance and mitigates the impact of finite-sample spurious correlations. We demonstrate these properties using an exact combinatorial oracle, and review how recent computational approximations have successfully scaled this framework to a variety of domains, including high-dimensional data, classification tasks, and settings with casewise or cellwise contamination, achieving machine-learning-level accuracy while retaining the interpretability of sparse models.

2018-12-13T20:36:38Z Anthony Christidis Stefan Van Aelst Ruben Zamar