https://arxiv.org/api/pUs8yj4XFdAelrVnDDan3tFCfwA
2026-06-14T02:46:26Z
3617173515

http://arxiv.org/abs/2511.21836v2
A simple and powerful test of vaccine waning
2026-05-20T09:07:53Z
Determining whether vaccine efficacy wanes is important for individual and public decision making. Yet, quantification of waning is a subtle task. The classical approaches cannot be interpreted as measures of declining efficacy unless we impose unreasonable assumptions. Recently, formal causal estimands designed to quantify vaccine waning have been proposed. These estimands can be bounded under weaker assumptions, but the bounds are often too wide to make claims about the presence of waning. We propose a different approach: a formal test to assess whether a treatment effect is constant over time at the individual level. This test provides a considerable power gain over existing approaches and is valid under interpretable assumptions in vaccine trials. We illustrate the increase in power through real and simulated examples, using three different approaches to compute the test statistics. Two of these approaches are based solely on summary data, accessible from existing clinical trials. Beyond our test, we also give new results that bound the waning effect. We use our methods to reanalyze data from a randomized controlled trial of the BNT162b2 COVID-19 vaccine. While prior analysis did not establish waning, our test rejects the null hypothesis of no waning.
2025-11-26T19:06:15Z
Gellért Perényi, Matias Janvin, Mats J. Stensrud

http://arxiv.org/abs/2605.20817v1
Topics in Nonparametric Bayesian Statistics
2026-05-20T07:13:49Z
The intersection set of Bayesian and nonparametric statistics was almost empty until about 1973, but now is growing at a healthy rate.
This chapter, for the Highly Structured Stochastic Systems book (Oxford University Press, 2003), gives an overview of various theoretical and applied research themes inside this field, partly complementing and extending recent reviews of Dey, Müller and Sinha (1998) and Walker, Damien, Laud and Smith (1999). The intention is not to be complete or exhaustive, but rather to touch on research areas of interest, partly by example.
2026-05-20T07:13:49Z
23 pages, no figures. Published, in modified form, as Chapter 15 in the book 'Highly Structured Stochastic Systems' (Oxford University Press, 2003, eds. P.J. Green, N.L. Hjort, S. Richardson)
Nils Lid Hjort

http://arxiv.org/abs/2605.20806v1
Evaluation of the number of clusters in a data set using $p$-values from Multiple Tests of Hypotheses
2026-05-20T06:58:04Z
This paper proposes a novel, nonparametric, interpoint distance-based measure to investigate whether there exist any groups in a given data set and, if so, how many groups are present in total. It is a cluster accuracy index useful for data sets of arbitrary dimension, in association with any clustering algorithm that has the number of groups specified a priori. We perform univariate, nonparametric, multiple statistical tests of hypotheses, where as many dependent tests as the sample size are carried out using the interpoint distances. They possess $p$-values to be combined to reach a decision, which is taken in a step-wise process for a possible number of clusters. It reduces unnecessary computations compared with the other accuracy measures from the literature.
A data study establishes the proposed index's efficiency and superiority.
2026-05-20T06:58:04Z
Communications in Statistics - Theory and Methods (2024), 53, 8878-8889
Soumita Modak
10.1080/03610926.2024.2309967

http://arxiv.org/abs/2605.20154v2
Component over Composite: Mitigating Type I Error Inflation when Imputing "Days Alive and at Home"
2026-05-20T06:56:01Z
Background: Days Alive and at Home (DAH) over a pre-defined follow-up period is a novel post-intervention composite outcome that combines data from at least three components: (i) initial length of hospital stay, (ii) length of total readmissions or other post-discharge care and (iii) mortality. Missing values bring unique challenges to the analysis of trials with the DAH outcome as the three components may have different rates of missingness caused by distinct missing data mechanisms. Current approaches define DAH as missing if any of the components are missing, and proceed with complete cases or Multiple Imputation (MI) of the composite. Methods: Through a simulation study motivated by the NOTACS trial, we compare several methods of handling missing data, including complete case analysis, MI of the composite, and MI of the components when the primary analysis is a Mann-Whitney-Wilcoxon test. Results: MI on the component level has good properties in terms of type I error control and power. We caution against the use of MI on the composite level with Predictive Mean Matching, which can lead to type I error inflation. Conclusions: Given the complex distributional characteristics of DAH, naive approaches such as defining missingness on the composite level and directly imputing the composite with Predictive Mean Matching can lead to type I error inflation.
Imputing on the component level is recommended. Suggested future work includes imputation approaches that are compatible with more complex definitions of DAH, as well as recommendations for sensitivity analyses to the Missing at Random assumption.
2026-05-19T17:43:03Z
Mia S. Tackney, Sarah Dawson, Letao Yuan, Dominique-Laurent Couturier, Sofia S. Villar

http://arxiv.org/abs/2605.20767v1
The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study
2026-05-20T06:09:41Z
Large language models (LLMs) show potential as simulators of human behavior, offering a scalable way to study responses to interventions. However, because LLMs are trained largely on observational data, interventions in experiments with LLM-simulated synthetic users can induce unintended shifts in latent user attributes, causing user drift where the implicit simulated population differs across treatment conditions, potentially distorting effect estimates. We formalize the confounding or selection bias that can arise due to user drift and show how intervention-dependent shifts can inflate or attenuate observed differences in user responses under intervention. To diagnose confounding, we propose using negative control outcomes--attributes that should remain invariant under intervention--to identify distribution shifts across intervention conditions, providing evidence of user drift.
To mitigate drift, we study adjusting the persona specification by eliciting additional confounders, finding that targeted, setting-relevant confounders can substantially reduce bias across survey-style and multi-turn agent evaluations.
2026-05-20T06:09:41Z
Victoria Lin, Taedong Yun, Maja Matarić, John Canny, Arthur Gretton, Alexander D'Amour

http://arxiv.org/abs/2605.20726v1
Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference
2026-05-20T05:24:39Z
Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection.
We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.
2026-05-20T05:24:39Z
31 pages, 12 figures. Code available at https://github.com/sza919/everywhere-valid-fdp-bounds-in-conformal-inference
Ziang Song, Ying Jin, Emmanuel J. Candès

http://arxiv.org/abs/2605.20710v1
Assessing Estimate of CATE from Observational Data via an RCT Study
2026-05-20T05:07:54Z
Conditional average treatment effects (CATEs) are increasingly estimated from observational data and used to guide policy and individualized treatment decisions. Before such estimates can be trusted in practice, their predictive fitness needs to be assessed, yet observational data alone offer limited opportunities for doing so. We propose CATE Assessment via Fitness Evaluation (CAFE), a formal framework for directly assessing the goodness-of-fit of a CATE estimate learned from observational data, rather than the full underlying outcome model, using evidence from a randomized trial. CAFE partitions the trial covariate space according to estimated propensity scores (or the like) and compares observationally derived conditional treatment effects with group-level experimental averages. The framework accommodates a broad class of CATE learners, including parametric models and flexible machine learning methods such as causal forest and boosting. We establish theoretical guarantees under both the null and alternative hypotheses, and introduce a maximum-type extension to improve sensitivity to localized lack of fit. When both randomized trial and observational data are available, we further develop a two-stage procedure to detect the existence of unobserved confounders.
Extensive numerical studies show the utility of the CAFE approach when assessing observational-derived CATE estimates.
2026-05-20T05:07:54Z
34 pages, 5 figures
Bosen Cui, Yuhong Yang

http://arxiv.org/abs/2605.20692v1
Inferring infectiousness: a joint model of the within-host viral kinetics of SARS-CoV-2
2026-05-20T04:39:53Z
During an infectious disease outbreak, providing accurate answers to policy questions about transmission requires a detailed model of the natural history of infectiousness. Unfortunately, direct measures of infectiousness are generally unavailable. Instead, we often rely on indirect proxies, such as viral load measured by PCR or antigen tests, viral culture to detect replication-competent virus, or symptom onset, each of which reflects different aspects of viral dynamics or host response. However, these proxies vary in terms of the ease of collection, scalability, and their relationship to viral shedding and therefore underlying infectiousness. Here, we use data from five prospective, densely sampled cohorts with longitudinal data on multiple proxies of viral shedding for approximately 2,000 infections to develop a Bayesian joint model for the within-host viral kinetics of SARS-CoV-2 infection. Modeling the joint distribution allows us to infer the trajectory of infectious virus shedding -- the most direct correlate of infectiousness -- for individuals who contribute only PCR data, and to compute derived quantities that are inaccessible from any single proxy alone. These include the population-level probability and expected duration of ongoing infectiousness as a function of time since diagnosis, stratified by variant, vaccination status, and infection history; the residual risk of releasing an individual from isolation; and personalized, real-time estimates of infectiousness that are sequentially updated as new test results become available.
2026-05-20T04:39:53Z
Christopher B. Boyer, Stephen M.
Kissler, Seran Hakki, Jakob Jonnerby, Ajit Lalvani, Marc Lipsitch

http://arxiv.org/abs/2605.20681v1
Scale-Calibrated Median-of-Means for Robust Distributed Principal Component Analysis
2026-05-20T03:48:31Z
Distributed principal component analysis (PCA) produces node-level estimates of both a mean vector and a principal subspace. Robustly aggregating these heterogeneous objects requires a relative scale between mean error and subspace error. We study a scale-calibrated median-of-means estimator for this problem using the product geometry of Euclidean space and the Grassmann manifold. A node-level PCA expansion shows that the mean component has the usual linear influence, whereas the subspace component is an eigengap-weighted covariance perturbation. We prove a local reduction showing that the proposed product-manifold median-of-means estimator is asymptotically equivalent to a scaled spatial median of node influence errors. This yields fixed-node non-Gaussian limits, growing-node Gaussian limits with finite-block bias, and an explicit scale-dependent covariance formula. We propose robust block-scale and inference-optimal calibration rules, establish high-probability median-of-means bounds, characterize factorwise bad-node influence, and prove node-bootstrap validity. Simulations and large-scale single-cell RNA-seq data show that scale calibration adapts to eigengap-driven subspace uncertainty and provides a robust distributed PCA summary.
2026-05-20T03:48:31Z
Kisung You

http://arxiv.org/abs/2605.20634v1
New Confidence Regions for Linear Regression Parameters with Stationary-Ergodic Dependent Errors
2026-05-20T02:40:35Z
We develop joint confidence regions for linear regression coefficients when the regressors and errors are jointly stationary and ergodic with unspecified serial dependence. The method applies random smoothing, using an independent auxiliary sample and shrinking bandwidth, to a vector of regression and second-moment statistics.
Under stationarity, ergodicity, and finite second moments, the estimator is asymptotically normal and yields Wald confidence regions and simultaneous confidence intervals without direct long-run variance estimation or a parametric dependence model. For implementation, we introduce a scaled estimator with data-driven bandwidth selection and a mild truncation that improves finite-sample stability. Simulations under ARMA, ARFIMA, copula-based Markov errors, and fractional Gaussian noise, with Gaussian and heavy-tailed margins, show near-nominal coverage and competitive region volumes relative to Newey-West HAC and MAC. A winter Beijing PM2.5 application illustrates the procedure. Keywords: Random smoothing, Joint inference, Confidence regions, Dependent errors, Long memory, Regression inference
2026-05-20T02:40:35Z
Mous-Abou Hamadou, Martial Longla, Mathias Nthiani Muia, Mahmud Hasan

http://arxiv.org/abs/2605.20633v1
Application of Propensity Score Models and Causal Estimators in Observational Studies under Model Misspecification
2026-05-20T02:36:46Z
Propensity score (PS) methods are widely used in observational studies to reduce confounding and estimate causal treatment effects. However, the validity of PS-based causal estimators depends heavily on correct model specification, and model misspecification may lead to substantial bias and instability. In this study, we systematically evaluate the performance of commonly used causal estimators, including response surface modeling (RSM), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW), under varying levels of PS and outcome model misspecification. We compare classical logistic regression with several machine learning approaches for PS estimation, including random forests (RF), support vector machines (SVM), and linear discriminant analysis (LDA).
Extensive simulation studies were conducted under multiple scenarios defined by combinations of correctly specified and misspecified PS and outcome models, varying sample sizes, and different covariate correlation structures. Estimator performance was assessed using bias, absolute bias, root mean squared error, empirical standard error, and confidence interval width. Results demonstrate that AIPW consistently provides robust and stable estimates across most scenarios due to its doubly robust property, whereas IPW is highly sensitive to PS misspecification and unstable PS estimates produced by flexible machine learning methods. RSM performs well only when the outcome model is correctly specified. Real-world applications using the ACTG175 clinical trial and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset further illustrate the practical implications of estimator choice and PS modeling strategy. Overall, our findings highlight the importance of integrating flexible machine learning approaches within doubly robust frameworks to improve causal effect estimation in observational studies.
2026-05-20T02:36:46Z
24 pages, 4 figures
Apu Chandra Das, Sakib Salam, Md Robiul Islam Talukder, Ashim Chandra Das, Antar Chandra Das, Rakhi Chowdhury

http://arxiv.org/abs/2504.05431v3
A Generalized Tangent Approximation based Variational Inference Framework for Strongly Super-Gaussian Likelihoods
2026-05-20T02:16:55Z
Variational inference, as an alternative to Markov chain Monte Carlo sampling, has played a transformative role in enabling scalable computation for complex Bayesian models. Nevertheless, existing approaches often depend on either rigid model-specific formulations or stochastic black-box optimization routines. Tangent approximation is a principled class of structured variational methods that exploits the geometry of the underlying probability model. However, its utility has largely been confined to logistic regression and related modeling regimes.
In this article, we propose a novel variational framework based on tangent transformation for a broad class of probability models characterized by strongly super-Gaussian likelihoods. Our method leverages convex duality to construct tangent minorants of the log-likelihood, thereby inducing conjugacy with Gaussian priors over model parameters in an otherwise intractable setup. Under mild assumptions on the data-generating mechanism, we establish algorithmic convergence guarantees, a contribution that stands in contrast to the limited theoretical assurances typically available for black-box variational methods. Additionally, we derive near-minimax optimal bounds for the variational risk. Superior performance of our proposed methodology is illustrated on simulated and real-data scenarios that challenge state-of-the-art variational algorithms in terms of scalability and their ability to consistently capture complex underlying data structure.
2025-04-07T18:54:05Z
135 pages, 51 figures, 13 tables, Revision Submitted
Somjit Roy, Pritam Dey, Debdeep Pati, Bani K. Mallick

http://arxiv.org/abs/2605.20621v1
Changepoint Detection in Categorical Time Series with Application to Daily Total Cloud Cover in Canada
2026-05-20T02:14:24Z
Changepoints are essential for homogenizing categorical time series and analyzing their trends and variations. The original total cloud cover in Canada was recorded hourly in tenths (or eighths), exhibiting inherent seasonality and serial correlation. Lu and Wang (2012) introduced an extended cumulative logit model to detect shifts in the annual frequencies of cloud cover conditions. While annual aggregation mitigates seasonality and serial correlation, it shortens the time series and may lead to overdispersion. This article introduces a marginalized transition model to detect a single changepoint in periodic and serially correlated categorical time series.
The model captures serial dependence using a first-order Markov chain and enables category-specific changepoint specification. To enhance computational efficiency, we develop a new parameter estimation procedure for obtaining maximum likelihood estimates. A maximally selected likelihood ratio test statistic is then proposed to test for sudden changes in categorical time series, and the method is illustrated using daily total cloud cover observations recorded at 9 a.m. and 3 p.m. at Fort St. John Airport, British Columbia, Canada.
2026-05-20T02:14:24Z
31 pages, 16 figures, 5 tables; includes supplementary material; R/Rcpp code available in the linked GitHub repository
Mo Li, QiQi Lu, XiaoLan Wang

http://arxiv.org/abs/2605.20604v1
Conditional regularized halfspace depth for sparse functional data and its applications
2026-05-20T01:48:32Z
Many functional datasets are observed sparsely and irregularly. Ordering such data is challenging because only limited information is available from each observation, while the underlying trajectories remain infinite-dimensional. This paper develops a novel depth notion for sparse functional data, called the conditional regularized halfspace depth (CRHD). CRHD is defined as the infimum of conditional halfspace probabilities of the underlying trajectory given the observed sparse measurements, thereby enabling depth evaluation directly at sparse observations without requiring trajectory reconstruction. We study several basic theoretical properties of CRHD that clarify its behavior as a depth measure. The proposed depth is applicable even to extremely sparsely observed functional data, overcoming key limitations of existing sparse functional depths that often rely on reconstructed curves. In addition, CRHD induces meaningful rankings for complex functional data.
Its numerical performance is demonstrated through rank-based tests, and its practical utility is illustrated using an infant growth dataset.
2026-05-20T01:48:32Z
Hyemin Yeon, Xiongtao Dai, Sara Lopez-Pintado

http://arxiv.org/abs/2602.04092v2
Time-to-Event Estimation with Unreliably Reported Events in Medicare Health Plan Payment
2026-05-20T01:38:36Z
OBJECTIVE: To propose time-to-event estimators that help evaluate incident diagnostic coding and possible upcoding in Medicare as well as introduce an open-source software package that enables more reproducible methods development relevant to Medicare billing behavior. STUDY SETTING AND DESIGN: Observational analysis of simulated upcoding based on coding by insurers or providers that may be incentivized by Medicare Advantage risk adjustment. DATA SOURCES AND ANALYTIC SAMPLE: Two years of separately simulated incident health condition coding data for a Medicare Advantage population and a Traditional Medicare population where coding patterns are aligned with known practices in each program. PRINCIPAL FINDINGS: We propose several novel time-to-event estimators of incident coding intensity and possible upcoding in Medicare Advantage, including accounting for unreliable reporting. We demonstrate estimator performance in simulated data leveraging the National Institutes of Health's All of Us study and also develop an open source R package to simulate longitudinal realistic labeled upcoding data, which were not previously available for researchers. In simulations, our novel estimators recovered differences in upcoding within and across monitoring periods. Undercoding had a limited effect on our novel estimators while an existing estimator was more sensitive to undercoding. CONCLUSIONS: Our proposed estimators can help researchers and policymakers track new coding behaviors (e.g., as may be incentivized by risk adjustment formula updates) earlier and at scale while accounting for several real-world data considerations.
Further, the R package we provide can be used to improve the development, accessibility, and reproducible evaluation of coding intensity and upcoding methodology.
2026-02-04T00:04:44Z
44 pages, 10 figures
Oana M. Enache, Sherri Rose