https://arxiv.org/api/mK0YzploqWXzyQlPYJhM4+U3n04
2026-06-13T22:23:11Z
3617167515

http://arxiv.org/abs/2605.23208v1
A Direct Variance Estimation (DiVE) for Meta-Analysis of Median Differences
2026-05-22T03:52:33Z
Meta-analyses of two-group studies that report median differences typically rely on methods that require, in addition to the median difference and sample size, summary measures of dispersion such as quartiles or ranges. Studies that do not report such statistics are often excluded from the meta-analysis. Existing two-stage approaches first estimate the asymptotic variance of the median difference within each study under parametric assumptions, and then combine these study-specific estimates to obtain the pooled median difference and its variance. We propose Direct Variance Estimation (DiVE), a method that directly estimates the variance of the pooled difference using only study-level median differences and their sample sizes. A comprehensive simulation study across a wide range of distributional scenarios shows that DiVE performs comparably to or better than conventional two-stage methods, with clear advantages when the number of studies is small. A re-analysis of published meta-analyses demonstrates that DiVE enables the inclusion of studies lacking dispersion statistics, leading to a more comprehensive and potentially less biased synthesis of evidence.
2026-05-22T03:52:33Z
Tadahisa Okuda, Masataka Taguri, Kenichi Hayashi

http://arxiv.org/abs/2605.23207v1
Mixture-of-Finite-Mixtures Wishart Model for Clustering Covariance Matrices with an Application to Brain Functional Connectivity
2026-05-22T03:52:19Z
Data represented as covariance-type matrices arise in many fields, including brain functional connectivity and diffusion tensor imaging.
We develop the MFM-Wishart, a Bayesian model-based clustering approach for such data that combines Wishart mixture components with a mixture-of-finite-mixtures (MFM) prior, allowing joint posterior inference on both the number of clusters and clustering assignments. Theoretically, we study the properties of Wishart kernels in the context of mixture models and then establish results for posterior consistency for the number of clusters and posterior contraction of the mixing measure under standard regularity conditions. Computationally, we develop an efficient Markov chain Monte Carlo (MCMC) algorithm for posterior inference. Simulation studies show competitive clustering performance and accurate recovery of the number of clusters, even under model misspecification. We apply MFM-Wishart to cluster infants based on functional connectivity during sleep, estimated from functional near-infrared spectroscopy (fNIRS) data, illustrating the practical utility of the model and revealing interpretable heterogeneity.
2026-05-22T03:52:19Z
Zongyu Li, Stefano Castruccio, Zhiyong Zhang

http://arxiv.org/abs/2605.23145v1
Operationalizing Individual Fairness via Gradient Descent and Bradley-Terry Models
2026-05-22T01:50:38Z
Individual fairness, the notion that "similar individuals should be treated similarly," provides a strong and flexible fairness guarantee for algorithmic decision makers. However, a barrier to implementing individual fairness in practice is the difficulty of learning the similarity metric over individuals. In this work, we present an algorithm for learning a Mahalanobis similarity metric from triplet queries of the form "is individual $i$ more similar to individual $j$ or $k$?" We work in the standard Bradley-Terry model for pairwise comparisons. Our algorithm consists of a spectral initialization step followed by gradient descent.
We provide extensive theoretical guarantees on our algorithm, showing that it converges quickly to the ground truth metric despite the non-convexity of the loss in our model. Because our focus is on fairness, we also show that individual fairness with respect to an estimated metric is sufficient to achieve similar fairness with respect to the true metric. We also discuss potential applications of our work to AI model tuning. Finally, we present experimental results that demonstrate the convergence of our algorithm and the fairness performance of downstream fair predictors trained on our estimated metric.
2026-05-22T01:50:38Z
60 pages, 2 figures
Conlan Olson, Linjun Zhang, Zhun Deng, Pragya Sur

http://arxiv.org/abs/2605.23102v1
LLM Sparsity Prior for Robust Feature Selection
2026-05-21T23:34:04Z
Large language models (LLMs) offer a scalable mechanism to elicit domain-informed prior information for high-dimensional variable selection. However, existing methods such as LLM-Lasso are sensitive to weight quality, with performance degrading substantially when LLM-generated weights are inaccurate. To address this challenge, we first introduce a framework for quantifying the quality of LLM-generated weights, enabling rigorous evaluation of LLM-informed methods across varying weight regimes. We then propose the LLM Sparsity Prior (LSP), which integrates LLM-generated weights into the prior inclusion probabilities of Spike-and-Slab and Spike-and-Slab Lasso models via two interpretable hyperparameters governing global sparsity and weight concentration. Hierarchical hyperpriors on these parameters allow the model to dynamically discount uninformative or misleading weights, improving robustness without sacrificing gains when weights are accurate. Finally, we develop principled prompt engineering strategies and validate the method on a private medical dataset studying Acute Kidney Injury.
LSP improves prediction accuracy and identifies clinically relevant features missed by the baselines, with robustness to prompt variation and particular effectiveness in low-data regimes.
2026-05-21T23:34:04Z
Caleb Skinner, Yihan Guo, Meng Li

http://arxiv.org/abs/2605.23048v1
StanBKT: Rethinking Parameter Estimation in Bayesian Knowledge Tracing
2026-05-21T21:27:10Z
Bayesian Knowledge Tracing (BKT) is a widely used and interpretable student modeling approach in intelligent tutoring systems and educational data mining. However, most implementations rely on expectation-maximization or related optimization methods that yield only point estimates, limiting uncertainty quantification and principled comparisons across learners and conditions. We introduce StanBKT, an open-source Python package for estimating BKT models using Bayesian inference in Stan. StanBKT provides a unified framework supporting Hamiltonian Monte Carlo, variational inference, Pathfinder, and optimization-based estimation while preserving the hidden Markov structure and interpretability of classical BKT. It supports standard, grouped, and hierarchical BKT models, flexible prior specification, posterior predictive inference, and utilities for visualization and diagnostics. We evaluate StanBKT on large-scale observational and controlled educational datasets. On the ASSISTments 2020 dataset, we show that supported inference methods achieve comparable predictive performance while differing in computational efficiency and posterior fidelity. We further demonstrate how posterior inference enables principled comparison of condition-specific parameters in an educational intervention involving perceptual cue manipulations. Results illustrate how uncertainty quantification facilitates more reliable interpretation of differences in learning, forgetting, guessing, and slipping parameters across experimental conditions.
Overall, StanBKT extends BKT beyond point estimation by providing a flexible framework for probabilistic student modeling, uncertainty quantification, and hierarchical inference in educational data mining.
2026-05-21T21:27:10Z
5 figures, 7 tables
Siddhartha Pradhan, Yanping Pei, Morgan Lee, Puyuan Zhang, Erin Ottmar, Adam C. Sales

http://arxiv.org/abs/2605.23016v1
Sample correlation adjustments for robust Multi-fidelity Monte Carlo under limited pilot sampling
2026-05-21T20:38:08Z
Multi-fidelity Monte Carlo (MFMC) is a variance reduction method that leverages a multi-fidelity ensemble of models of varying cost and accuracy levels. Constructing an MFMC estimator with optimal variance requires knowledge of the correlation coefficients between the different fidelity models, which are not usually known in practice. The correlations are typically estimated using offline pilot samples and the sample correlation formula, after which the MFMC method proceeds as if the estimated correlations are the true correlations. Computational cost often restricts the number of pilot samples used, leading to poor correlation estimates and suboptimal estimators. Leveraging the MFMC problem setting and probabilistic information about the sample covariance matrix, we present a method to improve standard sample-based correlation estimates in the presence of limited pilot samples. We define a novel discrepancy function quantifying the estimator suboptimality, which in turn facilitates selecting a correlation estimator minimizing the worst-case expected discrepancy, where the expectation is taken with respect to the pilot sampling variability.
Through a simple bivariate Gaussian example and a multi-fidelity modeling application from a NASA Entry, Descent, and Landing (EDL) problem, we show that this method produces better MFMC estimators than the standard sample covariance under small pilot sample sizes and limited total budgets.
2026-05-21T20:38:08Z
Michael Stanley, Thomas Coons, Geoffrey Bomarito, Patrick Leser, Joshua Pribe, James Warner

http://arxiv.org/abs/2506.10152v2
Robust copula estimation for one-shot devices with correlated failure modes
2026-05-21T19:21:52Z
This paper presents a robust method for estimating copula models to evaluate dependence between failure modes in one-shot devices (systems designed for single use and destroyed upon activation). Traditional approaches, such as maximum likelihood estimation (MLE), often produce unreliable results when faced with outliers or model misspecification. To overcome these limitations, we introduce a divergence-based estimation technique that enhances robustness and provides a more reliable characterization of the joint failure-time distribution. Extensive simulation studies confirm the robustness of the proposed method. Additionally, we illustrate its practical utility through the analysis of a real-world dataset.
2025-06-11T20:12:45Z
E. Castilla, P. J. Chocano

http://arxiv.org/abs/2602.08927v3
Online monotone density estimation and log-optimal calibration
2026-05-21T19:20:17Z
We study the problem of online monotone density estimation, where density estimators must be constructed in a predictable manner from sequentially observed data. We propose two online estimators: an online analogue of the classical Grenander estimator, and an expert aggregation estimator inspired by exponential weighting methods from the online learning literature. In the well-specified stochastic setting, where the underlying density is monotone, we show that the expected cumulative log-likelihood gap between the online estimators and the true density admits an $O(n^{1/3})$ bound.
We further establish a $\sqrt{n\log{n}}$ pathwise regret bound for the expert aggregation estimator relative to the best offline monotone estimator chosen in hindsight, under minimal regularity assumptions on the observed sequence. As an application of independent interest, we show that the problem of constructing log-optimal p-to-e calibrators for sequential hypothesis testing can be formulated as an online monotone density estimation problem. We adapt the proposed estimators to build empirically adaptive p-to-e calibrators and establish their optimality. Numerical experiments illustrate the theoretical results.
2026-02-09T17:29:39Z
31 pages, 2 figures
Rohan Hore, Ruodu Wang, Aaditya Ramdas

http://arxiv.org/abs/2512.15244v2
Non-parametric Causal Inference in Dynamic Thresholding Designs
2026-05-21T18:55:36Z
We consider causal inference in dynamic settings where treatment is assigned by thresholding a state variable that can change over time. There is a large literature on regression-discontinuity methods building on the fact that, in the static setting, treatment assignment via threshold crossing induces a quasi-experimental design that enables pragmatic causal inference. But dynamic settings involve challenges not present in the static setting, e.g., past treatments may affect current state and thus future treatments, and so existing regression-discontinuity methods do not apply. Here, we show that dynamic thresholding designs identify a marginal policy effect that nests the classical regression-discontinuity parameter in the static setting; and propose a tailored local linear regression estimator that is consistent for this marginal policy effect.
We demonstrate our approach using an experiment that emulates real-world optimization of thresholds for continuous glucose monitoring using data generated from an FDA-approved simulator.
2025-12-17T09:43:20Z
Aditya Ghosh, Stefan Wager

http://arxiv.org/abs/2605.22950v1
Diffusion-based Denoising Beats Vanilla Score Matching in Parameter Estimation: A Theoretical Explanation
2026-05-21T18:25:22Z
Score matching is an alternative to maximum likelihood estimation when the normalizing constant is unknown or too costly to evaluate. However, vanilla score matching has been shown to be inefficient relative to maximum likelihood estimation for multimodal distributions with well-separated modes, which are commonly encountered in practical applications. We compare a novel diffusion-based denoising score matching estimator (DDSME) to the vanilla score matching estimator (SME) in this scenario. In particular, we prove statistical guarantees for both estimators, showing that the error bound for the vanilla SME worsens when the separation between the modes increases, which can be avoided in the case of the DDSME with suitable hyperparameter tuning. This provides a novel theoretical explanation for the superior behavior of diffusion-based score matching over the vanilla version.
2026-05-21T18:25:22Z
Benedikt Lütke Schwienhorst, Nadja Klein, Johannes Lederer

http://arxiv.org/abs/2605.22640v1
Positive-definiteness in separable priors: effects on prior interpretability and inference
2026-05-21T15:45:31Z
A popular class of priors for symmetric positive-definite matrices assumes independent entries and adds a truncation to ensure positive-definiteness. While conceptually simple and often computationally convenient, unless done carefully this truncation can have unintended effects.
If the truncated prior or its margins are significantly different from their untruncated counterpart, then its interpretability may suffer, its shrinkage properties become harder to characterise, and posterior inference may be affected in unanticipated ways. We investigate the effect of the truncation both for dense and sparse matrices, and show how to set prior parameters such as the variance of off-diagonal entries such that said effect is mitigated as the matrix dimension grows. We pay particular attention to sparse inference where, unless prior parameters are set carefully, the truncated prior and hence its corresponding posterior assign systematically higher mass to sparser structures than the untruncated prior.
2026-05-21T15:45:31Z
32 pages, 3 figures
Jack Storror Carter, David Rossell

http://arxiv.org/abs/2601.11845v2
Reevaluating Causal Estimation Methods with Data from a Product Release
2026-05-21T15:43:45Z
Recent developments in causal machine learning methods have made it easier to estimate flexible relationships between confounders, treatments and outcomes, making unconfoundedness assumptions in causal analysis more palatable. How successful are these approaches in recovering ground truth baselines? In this paper we analyze a new data sample including an experimental rollout of a new feature at a large technology company and a simultaneous sample of users who endogenously opted into the feature. We find that recovering ground truth causal effects is feasible -- but only with careful modeling choices.
Our results build on the observational causal literature beginning with LaLonde (1986), offering best practices for more credible treatment effect estimation in modern, high-dimensional datasets.
2026-01-17T00:25:03Z
53 pages
Justin Young, Eleanor Wiske Dillon

http://arxiv.org/abs/2509.05443v3
Multidimensional constructs and moderated linear and nonlinear factor analysis
2026-05-21T15:30:00Z
Multidimensional factor models with moderations on all model parameters have so far been limited to single-factor and two-factor models. This does not align well with existing psychological measures, which are commonly intended to assess 3-5 dimensions of a latent construct. In this paper, I introduce a multidimensional MNLFA model that permits the moderation of item intercepts, loadings, residual variances, factor means, variances, and correlations across three or more latent factors. I describe efforts to implement the model using Bayesian methods through Stan and penalized maximum likelihood approaches to stabilize estimation and detect partial measurement non-invariance while preserving model interpretability. Closed-form analytic gradients of the likelihood are provided, eliminating the need for costly numerical or MCMC-based approximations. I conclude by discussing the theoretical implications of penalization for measurement invariance, computational considerations, and future directions for extending the framework to categorical indicators, longitudinal data, and applied research contexts.
2025-09-05T18:49:54Z
22 pages, 2 figures
R. Noah Padgett

http://arxiv.org/abs/2605.22595v1
A new class of functional conditional autoregressive models
2026-05-21T15:12:35Z
We introduce a new class of conditional autoregressive models for spatially dependent functional data, formulated through conditional means given neighboring functional observations and characterized by a covariance operator and a spatial dependence parameter.
Our estimation strategy consists of three components: (i) estimating the covariance operator using conditionally centered data, (ii) estimating the spatial dependence parameter by maximizing the likelihood of projected observations, and (iii) applying a novel profile-based approach to obtain the final estimators. Under an expanding lattice framework, we establish two key theoretical results. First, we establish the consistency of the proposed covariance estimator, which is not attainable using naive methods based on marginally centered data. Second, we prove that the spatial dependence parameter estimator is superconsistent and asymptotically normal, where the latter property enables statistical inference for spatial dependence in functional data -- a contribution that is novel in the existing literature. Numerical studies support the theoretical results and demonstrate the computational efficiency of our method. Finally, we illustrate its practical utility by analyzing weekly PM$_{2.5}$ concentration trajectories in 2019 across counties in the Midwestern United States.
2026-05-21T15:12:35Z
Sooran Kim

http://arxiv.org/abs/2505.18391v4
Bayesian Estimation of Cohort-Time-Stratum Specific Effects in Staggered Difference-in-Differences
2026-05-21T15:10:12Z
Difference-in-Differences designs with staggered treatment adoption are widely used to study heterogeneous treatment effects across cohorts and time periods. We develop a probabilistic framework for estimating potentially high-dimensional ATT arrays that vary across cohorts, periods, and strata defined by baseline covariates. The framework jointly estimates subgroup-specific treatment effects through a unified likelihood-based model, stabilizing inference in sparse cohort-by-time-by-stratum settings. We establish a Bernstein-von Mises theorem for the ATT array, implying asymptotically valid frequentist coverage of posterior credible intervals.
Simulations and an application to minimum wage increases and teen employment demonstrate meaningful finite-sample improvements and important subgroup heterogeneity.
2025-05-23T21:38:32Z
Siddhartha Chib, Kenichi Shimizu