http://arxiv.org/abs/2401.07152v3 2025-04-19T04:12:09Z 2024-01-13T19:52:19Z Inference for Synthetic Controls via Refined Placebo Tests The synthetic control method is often applied to problems with one treated unit and a small number of control units. A common inferential task in this setting is to test null hypotheses regarding the average treatment effect on the treated. Inference procedures that are justified asymptotically are often unsatisfactory due to (1) small sample sizes that render large-sample approximation fragile and (2) simplification of the estimation procedure that is implemented in practice. An alternative is permutation inference, which is related to a common diagnostic called the placebo test. It has provable Type-I error guarantees in finite samples without simplification of the method, when the treatment is uniformly assigned. Despite this robustness, the placebo test suffers from low resolution since the null distribution is constructed from only $N$ reference estimates, where $N$ is the sample size. This creates a barrier for statistical inference at a common level like $\alpha = 0.05$, especially when $N$ is small. We propose a novel leave-two-out procedure that bypasses this issue, while still maintaining the same finite-sample Type-I error guarantee under uniform assignment for a wide range of $N$. Unlike the placebo test whose Type-I error always equals the theoretical upper bound, our procedure often achieves a lower unconditional Type-I error than theory suggests; this enables useful inference in the challenging regime when $\alpha < 1/N$. Empirically, our procedure achieves a higher power when the effect size is reasonably large and a comparable power otherwise. We generalize our procedure to non-uniform assignments and show how to conduct sensitivity analysis. 
From a methodological perspective, our procedure can be viewed as a new type of randomization inference different from permutation or rank-based inference, which is particularly effective in small samples. Lihua Lei Timothy Sudijono 43 pages. V3: New results + references http://arxiv.org/abs/2503.21138v3 2025-04-19T04:06:47Z 2025-03-27T04:00:49Z A Computational Theory for Efficient Model Evaluation with Causal Guarantees To reduce the cost of experimental evaluation for models, we introduce a computational theory of evaluation for prediction and decision models: build an evaluation model to accelerate the evaluation procedure. We prove upper bounds on the generalized error and the generalized causal effect error of given evaluation models. We also prove efficiency and consistency of the causal effect, estimated by prediction, from the deployed subject to the evaluation metric. To learn evaluation models, we propose a meta-learner to handle the problem of a heterogeneous evaluation-subject space. Compared with existing evaluation approaches, our (conditional) evaluation model reduced evaluation errors by 24.1\%-99.0\% across 12 scenes, including individual medicine, scientific simulation, social experiment, business activity, and quantum trade. The evaluation time is reduced by 3-7 orders of magnitude compared with experiments or simulations. Hedong Yan http://arxiv.org/abs/2504.14161v1 2025-04-19T03:19:51Z 2025-04-19T03:19:51Z Robust Estimation in metric spaces: Achieving Exponential Concentration with a Fréchet Median There is growing interest in developing statistical estimators that achieve exponential concentration around a population target even when the data distribution has heavier than exponential tails. More recent activity has focused on extending such ideas beyond Euclidean spaces to Hilbert spaces and Riemannian manifolds. 
In this work, we show that such exponential concentration in the presence of heavy tails can be achieved over a broader class of parameter spaces called CAT($\kappa$) spaces, a very general class of metric spaces equipped with the minimal essential geometric structure for our purpose, while being sufficiently broad to encompass most typical examples encountered in statistics and machine learning. The key technique is to develop and exploit a general concentration bound for the Fréchet median in CAT($\kappa$) spaces. We illustrate our theory through a number of examples, and provide empirical support through simulation studies. Jakwang Kim Jiyoung Park Anirban Bhattacharya Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS) 2025, PMLR 258 http://arxiv.org/abs/2504.10719v2 2025-04-18T23:58:41Z 2025-04-14T21:24:22Z Power properties of the two-sample test based on the nearest neighbors graph In this paper, we study the problem of testing the equality of two multivariate distributions. One class of tests used for this purpose utilizes geometric graphs constructed using inter-point distances. So far, the asymptotic theory of these tests applies only to graphs which fall under the stabilizing graphs framework of Penrose and Yukich (2003). We study the case of the $K$-nearest neighbors graph where $K=k_N$ increases with the sample size, which does not fall under the stabilizing graphs framework. Our main result gives detection thresholds for this test in parametrized families when $k_N = o(N^{1/4})$, thus extending the family of graphs where the theoretical behavior is known. We propose a 2-sided version of the test which removes an exponent gap that plagues the 1-sided test. Our result also shows that increasing the number of nearest neighbors boosts the power of the test. This provides theoretical justification for using denser graphs in testing equality of two distributions. Rahul Raphael Kanekar 62 pages, 12 figures. 
Author's contact information added, minor changes done to make results easier to understand http://arxiv.org/abs/2405.15074v3 2025-04-18T21:56:39Z 2024-05-23T21:50:54Z 4+3 Phases of Compute-Optimal Neural Scaling Laws We consider the solvable neural scaling model with three parameters: data complexity, target complexity, and model-parameter-count. We use this neural scaling model to derive new predictions about the compute-limited, infinite-data scaling law regime. To train the neural scaling model, we run one-pass stochastic gradient descent on a mean-squared loss. We derive a representation of the loss curves which holds over all iteration counts and improves in accuracy as the model parameter count grows. We then analyze the compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in the data-complexity/target-complexity phase-plane. The phase boundaries are determined by the relative importance of model capacity, optimizer noise, and embedding of the features. We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of these phases, in particular computing the optimal model-parameter-count as a function of floating point operation budget. Elliot Paquette Courtney Paquette Lechao Xiao Jeffrey Pennington http://arxiv.org/abs/2504.14077v1 2025-04-18T21:01:17Z 2025-04-18T21:01:17Z Asymptotic well-calibration of the posterior predictive $p$-value under the modified Kolmogorov-Smirnov test The posterior predictive $p$-value is a widely used tool for Bayesian model checking. However, under most test statistics, its asymptotic null distribution is more concentrated around 1/2 than uniform. Consequently, its finite-sample behavior is difficult to interpret and tends to lack power, which is a well-known issue among practitioners. A common choice of test statistic is the Kolmogorov-Smirnov test with plug-in estimators. 
It provides a global measure of model-data discrepancy for real-valued observations and is sensitive to model misspecification. In this work, we establish that under this test statistic, the posterior predictive $p$-value converges in distribution to uniform under the null. We further use numerical experiments to demonstrate that this $p$-value is well-behaved in finite samples and can effectively detect a wide range of alternative models. Yueming Shen http://arxiv.org/abs/2212.11385v2 2025-04-18T19:22:45Z 2022-12-21T22:03:06Z Online Statistical Inference in Decision-Making with Matrix Context The study of online decision-making problems that leverage contextual information has drawn notable attention due to their significant applications in fields ranging from healthcare to autonomous systems. In modern applications, contextual information can be rich and is often represented as a matrix. Moreover, while existing online decision algorithms mainly focus on reward maximization, less attention has been devoted to statistical inference. To address these gaps, in this work, we consider an online decision-making problem with a matrix context where the true model parameters have a low-rank structure. We propose a fully online procedure to conduct statistical inference with adaptively collected data. The low-rank structure of the model parameter and the adaptive nature of the data collection process make this difficult: standard low-rank estimators are biased and cannot be obtained in a sequential manner while existing inference approaches in sequential decision-making algorithms fail to account for the low-rankness and are also biased. To overcome these challenges, we introduce a new online debiasing procedure to simultaneously handle both sources of bias. Our inference framework encompasses both parameter inference and optimal policy value inference. 
In theory, we establish the asymptotic normality of the proposed online debiased estimators and prove the validity of the constructed confidence intervals for both inference tasks. Our inference results are built upon a newly developed low-rank stochastic gradient descent estimator and its convergence result, which are also of independent interest. Qiyu Han Will Wei Sun Yichen Zhang The paper has been accepted by the Annals of Statistics http://arxiv.org/abs/2311.02040v3 2025-04-18T17:56:34Z 2023-11-03T17:14:12Z Spectral Properties of Elementwise-Transformed Spiked Matrices This work concerns elementwise-transformations of spiked matrices: $Y_n = n^{-1/2} f( \sqrt{n} X_n + Z_n)$. Here, $f$ is a function applied elementwise, $X_n$ is a low-rank signal matrix, and $Z_n$ is white noise. We find that principal component analysis is powerful for recovering signal under highly nonlinear or discontinuous transformations. Specifically, in the high-dimensional setting where $Y_n$ is of size $n \times p$ with $n,p \rightarrow \infty$ and $p/n \rightarrow \gamma > 0$, we uncover a phase transition: for signal-to-noise ratios above a sharp threshold -- depending on $f$, the distribution of elements of $Z_n$, and the limiting aspect ratio $\gamma$ -- the principal components of $Y_n$ (partially) recover those of $X_n$. Below this threshold, the principal components of $Y_n$ are asymptotically orthogonal to the signal. In contrast, in the standard setting where $X_n + n^{-1/2}Z_n$ is observed directly, the analogous phase transition depends only on $\gamma$. A similar phenomenon occurs with $X_n$ square and symmetric and $Z_n$ a generalized Wigner matrix. Michael J. 
Feldman http://arxiv.org/abs/2312.02849v3 2025-04-18T15:55:11Z 2023-12-05T16:02:04Z Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space We develop a theory of finite-dimensional polyhedral subsets over the Wasserstein space and optimization of functionals over them via first-order methods. Our main application is to the problem of mean-field variational inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$ by a product measure $\pi^\star$. When $\pi$ is strongly log-concave and log-smooth, we provide (1) approximation rates certifying that $\pi^\star$ is close to the minimizer $\pi^\star_\diamond$ of the KL divergence over a polyhedral set $\mathcal{P}_\diamond$, and (2) an algorithm for minimizing $\text{KL}(\cdot\|\pi)$ over $\mathcal{P}_\diamond$ based on accelerated gradient descent over $\mathbb{R}^d$. As a byproduct of our analysis, we obtain the first end-to-end analysis for gradient-based algorithms for MFVI. Yiheng Jiang Sinho Chewi Aram-Alexandre Pooladian 49 pages http://arxiv.org/abs/2504.13620v1 2025-04-18T10:51:25Z 2025-04-18T10:51:25Z Set-valued conditional functionals of random sets Many key quantities in statistics and probability theory such as the expectation, quantiles, expectiles and many risk measures are law-determined maps from a space of random variables to the reals. We call such a law-determined map, which is normalised, positively homogeneous, monotone and translation equivariant, a gauge function. Considered as a functional on the space of distributions, we can apply such a gauge to the conditional distribution of a random variable. This results in conditional gauges, such as conditional quantiles or conditional expectations. In this paper, we apply such scalar gauges to the support function of a random closed convex set $\boldsymbol{X}$. This leads to a set-valued extension of a gauge function. 
We also introduce a conditional variant whose values are themselves random closed convex sets. In special cases, this functional becomes the conditional set-valued quantile or the conditional set-valued expectation of a random set. In particular, in the unconditional setup, if $\boldsymbol{X}$ is a random translation of a deterministic cone and the gauge is either a quantile or an expectile, we recover the cone distribution functions studied by Andreas Hamel and his co-authors. In the conditional setup, the conditional quantile of a random singleton yields the conditional version of the half-space depth-trimmed regions. Tobias Fissler Ilya Molchanov 30 pages http://arxiv.org/abs/2504.13520v1 2025-04-18T07:18:51Z 2025-04-18T07:18:51Z Bayesian Model Averaging in Causal Instrumental Variable Models Instrumental variables are a popular tool to infer causal effects under unobserved confounding, but choosing suitable instruments is challenging in practice. We propose gIVBMA, a Bayesian model averaging procedure that addresses this challenge by averaging across different sets of instrumental variables and covariates in a structural equation model. Our approach extends previous work through a scale-invariant prior structure and accommodates non-Gaussian outcomes and treatments, offering greater flexibility than existing methods. The computational strategy uses conditional Bayes factors to update models separately for the outcome and treatments. We prove that this model selection procedure is consistent. By explicitly accounting for model uncertainty, gIVBMA allows instruments and covariates to switch roles and provides robustness against invalid instruments. In simulation experiments, gIVBMA outperforms current state-of-the-art methods. We demonstrate its usefulness in two empirical applications: the effects of malaria and institutions on income per capita and the returns to schooling. A software implementation of gIVBMA is available in Julia. 
Gregor Steiner Mark Steel http://arxiv.org/abs/2504.15150v1 2025-04-18T07:11:00Z 2025-04-18T07:11:00Z Prevalence estimation in infectious diseases with imperfect tests: A comparison of Frequentist and Bayesian Logistic Regression methods with misclassification correction Accurate estimation of disease prevalence is essential for guiding public health strategies. Imperfect diagnostic tests can cause misclassification errors, namely false positives (FP) and false negatives (FN), that may skew estimates if unaddressed. This study compared four statistical methods for estimating the prevalence of sexually transmitted infections (STIs) and associated factors, while correcting for misclassification. The methods were: (1) Standard Logistic Regression with external correction using known sensitivity and specificity; (2) the Liu et al. model, which jointly estimates FP and FN rates; (3) Bayesian Logistic Regression with external correction; and (4) a Bayesian model with internal correction using informative priors on diagnostic accuracy. Data came from 11,452 participants in a voluntary screening campaign for HIV, syphilis, and hepatitis B (2020-2024). Prevalence estimates and regression coefficients were compared across models using relative changes from crude estimates, confidence interval (CI) width, and coefficient variability. The Liu model produced higher prevalence estimates but had wider CIs and convergence issues in low-prevalence settings. The Bayesian model with internal correction gave intermediate estimates with the narrowest CIs and more stable intercepts, suggesting improved baseline prevalence estimation. Informative or weakly informative priors helped regularize estimates, especially in small-sample or rare-event contexts. Accounting for misclassification influenced both prevalence and covariate associations. While the Liu model offers theoretical strengths, its practical limitations in sparse data settings reduce its utility. 
Bayesian models with misclassification correction emerge as robust and flexible tools, particularly valuable in low-prevalence contexts where diagnostic uncertainty is high. Jorge Mario Estrada Alvarez Henan F. Garcia Miguel Ángel Montero-Alonso Juan de Dios Luna del Castillo 11 pages, 7 tables http://arxiv.org/abs/2504.13502v1 2025-04-18T06:45:30Z 2025-04-18T06:45:30Z Continuous-time filtering in Lie groups: estimation via the Fréchet mean of solutions to stochastic differential equations We compute the Fréchet mean $\mathscr{E}_t$ of the solution $X_{t}$ to a continuous-time stochastic differential equation in a Lie group. It provides an estimator with minimal variance of $X_{t}$. We use it in the context of Kalman filtering and more precisely to infer rotation matrices. In this paper, we focus on the prediction step between two consecutive observations. Compared to state-of-the-art approaches, our assumptions on the model are minimal. Magalie Bénéfice (IECL, UL) Marc Arnaudon (IMB, UB) Audrey Giremus (IMS, UB) http://arxiv.org/abs/2501.06969v2 2025-04-18T03:52:48Z 2025-01-12T23:00:16Z Doubly Robust Inference on Causal Derivative Effects for Continuous Treatments Statistical methods for causal inference with continuous treatments mainly focus on estimating the mean potential outcome function, commonly known as the dose-response curve. However, it is often not the dose-response curve but its derivative function that signals the treatment effect. In this paper, we investigate nonparametric inference on the derivative of the dose-response curve with and without the positivity condition. Under the positivity and other regularity conditions, we propose a doubly robust (DR) inference method for estimating the derivative of the dose-response curve using kernel smoothing. When the positivity condition is violated, we demonstrate the inconsistency of conventional inverse probability weighting (IPW) and DR estimators, and introduce novel bias-corrected IPW and DR estimators. 
In all settings, our DR estimator achieves asymptotic normality at the standard nonparametric rate of convergence with nonparametric efficiency guarantees. Additionally, our approach reveals an interesting connection to nonparametric support and level set estimation problems. Finally, we demonstrate the applicability of our proposed estimators through simulations and a case study of evaluating a job training program. Yikun Zhang Yen-Chi Chen Revision with added nonparametric efficiency theory. The updated version has 117 pages (25 pages for the main paper), 10 figures http://arxiv.org/abs/2504.13423v1 2025-04-18T02:49:24Z 2025-04-18T02:49:24Z Mixed Fractional Information: Consistency of Dissipation Measures for Stable Laws Symmetric alpha-stable (S alpha S) distributions with alpha<2 lack finite classical Fisher information. Building on Johnson's framework, we define Mixed Fractional Information (MFI) via the initial rate of relative entropy dissipation during interpolation between S alpha S laws with differing scales, v and s. We demonstrate two equivalent formulations for MFI in this specific S alpha S-to-S alpha S setting. The first involves the derivative D'(v) of the relative entropy between the two S alpha S densities. The second uses an integral expectation E_gv[u(x,0) (pF_v(x) - pF_s(x))] involving the difference between Fisher scores (pF_v, pF_s) and a specific MMSE-related score function u(x,0) derived from the interpolation dynamics. Our central contribution is a rigorous proof of the consistency identity: D'(v) = (1/(alpha v)) E_gv[X (pF_v(X) - pF_s(X))]. This identity mathematically validates the equivalence of the two MFI formulations for S alpha S inputs, establishing MFI's internal coherence and directly linking entropy dissipation rates to score function differences. We further establish MFI's non-negativity (zero if and only if v=s), derive its closed-form expression for the Cauchy case (alpha=1), and numerically validate the consistency identity. 
MFI provides a finite, coherent, and computable information-theoretic measure for comparing S alpha S distributions where classical Fisher information fails, connecting entropy dynamics to score functions and estimation concepts. This work lays a foundation for exploring potential fractional I-MMSE relations and new functional inequalities tailored to heavy-tailed systems. William Cook 20 pages, 1 figure
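For readability, the consistency identity quoted in plain text in the Mixed Fractional Information abstract above can be typeset as follows. This is only a transcription of the abstract's formula, not a derivation. Reading `E_gv` as an expectation under a density $g_v$ and writing the Fisher scores `pF_v`, `pF_s` as $\mathrm{pF}_v$, $\mathrm{pF}_s$ are assumptions about the abstract's notation.

$$
D'(v) \;=\; \frac{1}{\alpha v}\,\mathbb{E}_{g_v}\!\left[\,X\,\bigl(\mathrm{pF}_v(X) - \mathrm{pF}_s(X)\bigr)\right]
$$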