http://arxiv.org/api/yRH0TaNdyYSoZt2fbBY973VxTKM
2025-04-22T00:00:00-04:00
248471515

http://arxiv.org/abs/2401.07152v3
Updated: 2025-04-19T04:12:09Z
Published: 2024-01-13T19:52:19Z
Title: Inference for Synthetic Controls via Refined Placebo Tests

The synthetic control method is often applied to problems with one treated
unit and a small number of control units. A common inferential task in this
setting is to test null hypotheses regarding the average treatment effect on
the treated. Inference procedures that are justified asymptotically are often
unsatisfactory due to (1) small sample sizes that render large-sample
approximation fragile and (2) simplification of the estimation procedure that
is implemented in practice. An alternative is permutation inference, which is
related to a common diagnostic called the placebo test. It has provable Type-I
error guarantees in finite samples without simplification of the method, when
the treatment is uniformly assigned. Despite this robustness, the placebo test
suffers from low resolution since the null distribution is constructed from
only $N$ reference estimates, where $N$ is the sample size. This creates a
barrier for statistical inference at a common level like $\alpha = 0.05$,
especially when $N$ is small. We propose a novel leave-two-out procedure that
bypasses this issue, while still maintaining the same finite-sample Type-I
error guarantee under uniform assignment for a wide range of $N$. Unlike the
placebo test whose Type-I error always equals the theoretical upper bound, our
procedure often achieves a lower unconditional Type-I error than theory
suggests; this enables useful inference in the challenging regime when $\alpha
< 1/N$. Empirically, our procedure achieves a higher power when the effect size
is reasonably large and a comparable power otherwise. We generalize our
procedure to non-uniform assignments and show how to conduct sensitivity
analysis. From a methodological perspective, our procedure can be viewed as a
new type of randomization inference different from permutation or rank-based
inference, which is particularly effective in small samples.
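
The resolution limit of the classical placebo test described in this abstract can be illustrated with a minimal sketch. It assumes one test statistic per unit (the numbers below are hypothetical, e.g. post/pre RMSPE ratios) and a uniformly assigned treatment; it shows only the standard placebo p-value, not the leave-two-out procedure proposed in the paper.

```python
from fractions import Fraction


def placebo_p_value(stats, treated_index):
    """Placebo (permutation) p-value: the share of units whose statistic is
    at least as large as the treated unit's, counting the treated unit."""
    t = stats[treated_index]
    n_at_least = sum(1 for s in stats if s >= t)
    return Fraction(n_at_least, len(stats))


# Hypothetical statistics for N = 10 units; unit 0 is treated and has the
# most extreme value, so this is the smallest p-value the test can return.
stats = [9.1, 1.2, 0.8, 2.5, 1.9, 0.6, 3.0, 1.1, 2.2, 0.9]
p = placebo_p_value(stats, treated_index=0)
print(p)          # 1/10
print(p <= 0.05)  # False: rejection at alpha = 0.05 is impossible when N = 10
```

Because the p-value is always a multiple of $1/N$, the test cannot reject at level $\alpha$ whenever $\alpha < 1/N$, which is the regime the abstract calls challenging.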
Authors: Lihua Lei, Timothy Sudijono
Comments: 43 pages. V3: New results + references

http://arxiv.org/abs/2503.21138v3
Updated: 2025-04-19T04:06:47Z
Published: 2025-03-27T04:00:49Z
Title: A Computational Theory for Efficient Model Evaluation with Causal Guarantees

In order to reduce the cost of experimental evaluation for models, we
introduce a computational theory of evaluation for prediction and decision
models: building an evaluation model to accelerate evaluation procedures. We
prove upper bounds on the generalized error and the generalized causal effect
error of given evaluation models. We also prove efficiency and consistency for
the causal effect of a deployed subject on the evaluation metric when it is
estimated by prediction. To learn evaluation models, we propose a meta-learner
to handle the problem of heterogeneous evaluation subject spaces. Compared with
existing evaluation approaches, our (conditional) evaluation model reduces
evaluation errors by 24.1\%-99.0\% across 12 scenes, including individual
medicine, scientific simulation, social experiment, business activity, and
quantum trade. The evaluation time is reduced by 3-7 orders of magnitude
compared with experiments or simulations.
Authors: Hedong Yan

http://arxiv.org/abs/2504.14161v1
Updated: 2025-04-19T03:19:51Z
Published: 2025-04-19T03:19:51Z
Title: Robust Estimation in metric spaces: Achieving Exponential Concentration with a Fréchet Median

There is growing interest in developing statistical estimators that achieve
exponential concentration around a population target even when the data
distribution has heavier than exponential tails. More recent activity has
focused on extending such ideas beyond Euclidean spaces to Hilbert spaces and
Riemannian manifolds. In this work, we show that such exponential concentration
in the presence of heavy tails can be achieved over a broader class of parameter
spaces called CAT($\kappa$) spaces, a very general class of metric spaces equipped with
the minimal essential geometric structure for our purpose, while being
sufficiently broad to encompass most typical examples encountered in statistics
and machine learning. The key technique is to develop and exploit a general
concentration bound for the Fr\'echet median in CAT($\kappa$) spaces. We
illustrate our theory through a number of examples, and provide empirical
support through simulation studies.
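
As a rough illustration of why a Fréchet median is robust, the sketch below computes it in the simplest special case: Euclidean space (a CAT(0) space), where it is the geometric median. The Weiszfeld iteration used here is a standard algorithm chosen for illustration and is not taken from the paper; the data points are hypothetical.

```python
import math


def geometric_median(points, iters=200, eps=1e-12):
    """Weiszfeld iteration for argmin_m sum_i ||x_i - m|| in R^d."""
    d = len(points[0])
    # Start from the coordinate-wise mean.
    m = [sum(p[j] for p in points) / len(points) for j in range(d)]
    for _ in range(iters):
        num = [0.0] * d
        den = 0.0
        for p in points:
            # Inverse-distance weight, guarded against division by zero.
            w = 1.0 / max(math.dist(p, m), eps)
            den += w
            for j in range(d):
                num[j] += w * p[j]
        m = [v / den for v in num]
    return m


# Four corners of the unit square plus one gross outlier (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1000.0, 1000.0)]
mean = [sum(p[j] for p in points) / len(points) for j in range(2)]
median = geometric_median(points)
print(mean)    # about (200.4, 200.4): the mean is dragged towards the outlier
print(median)  # about (0.789, 0.789): the median stays inside the unit square
```

For this configuration the minimizer lies on the diagonal at $t = (1 + 1/\sqrt{3})/2 \approx 0.789$, so a single outlier barely moves the median while it shifts the mean by two orders of magnitude.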
Authors: Jakwang Kim, Jiyoung Park, Anirban Bhattacharya
Journal reference: Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS) 2025, PMLR 258

http://arxiv.org/abs/2504.10719v2
Updated: 2025-04-18T23:58:41Z
Published: 2025-04-14T21:24:22Z
Title: Power properties of the two-sample test based on the nearest neighbors graph

In this paper, we study the problem of testing the equality of two
multivariate distributions. One class of tests used for this purpose utilizes
geometric graphs constructed using inter-point distances. So far, the
asymptotic theory of these tests applies only to graphs which fall under the
stabilizing graphs framework of Penrose and Yukich (2003). We study the
case of the $K$-nearest neighbors graph where $K=k_N$ increases with the sample
size, which does not fall under the stabilizing graphs framework. Our main
result gives detection thresholds for this test in parametrized families when
$k_N = o(N^{1/4})$, thus extending the family of graphs where the theoretical
behavior is known. We propose a 2-sided version of the test which removes an
exponent gap that plagues the 1-sided test. Our result also shows that
increasing the number of nearest neighbors boosts the power of the test. This
provides theoretical justification for using denser graphs in testing equality
of two distributions.
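
A nearest-neighbour two-sample statistic of the kind studied here can be sketched as follows. This is a minimal, unnormalized version in the spirit of the classical $K$-nearest-neighbours test: it pools both samples and reports the fraction of directed $K$-NN edges that join points from the same sample. The paper's exact statistic, centering and scaling are not reproduced, and the function name and data are hypothetical.

```python
import math


def knn_same_sample_fraction(xs, ys, k):
    """Fraction of directed k-NN edges joining points from the same sample."""
    pooled = [(p, 0) for p in xs] + [(p, 1) for p in ys]
    same = 0
    for i, (p, label) in enumerate(pooled):
        # Sort all other points by distance to p and keep the k closest.
        others = sorted(
            (math.dist(p, q), other_label)
            for j, (q, other_label) in enumerate(pooled)
            if j != i
        )
        same += sum(1 for _, other_label in others[:k] if other_label == label)
    return same / (k * len(pooled))


# Well-separated samples: every neighbour comes from the same sample.
xs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
ys = [(100.0, 100.0), (100.0, 101.0), (101.0, 100.0), (101.0, 101.0)]
print(knn_same_sample_fraction(xs, ys, k=2))  # 1.0

# Perfectly interleaved samples on a line: every nearest neighbour is from
# the other sample.
us = [(0.0,), (2.0,), (4.0,)]
vs = [(1.0,), (3.0,), (5.0,)]
print(knn_same_sample_fraction(us, vs, k=1))  # 0.0
```

Unusually large values point to a difference between the two distributions, which corresponds to a 1-sided test; a 2-sided version would also flag unusually small values such as the interleaved case above.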
Authors: Rahul Raphael Kanekar
Comments: 62 pages, 12 figures. Author's contact information added, minor changes done to make results easier to understand

http://arxiv.org/abs/2405.15074v3
Updated: 2025-04-18T21:56:39Z
Published: 2024-05-23T21:50:54Z
Title: 4+3 Phases of Compute-Optimal Neural Scaling Laws

We consider the solvable neural scaling model with three parameters: data
complexity, target complexity, and model-parameter-count. We use this neural
scaling model to derive new predictions about the compute-limited,
infinite-data scaling law regime. To train the neural scaling model, we run
one-pass stochastic gradient descent on a mean-squared loss. We derive a
representation of the loss curves which holds over all iteration counts and
improves in accuracy as the model parameter count grows. We then analyze the
compute-optimal model-parameter-count, and identify 4 phases (+3 subphases) in
the data-complexity/target-complexity phase-plane. The phase boundaries are
determined by the relative importance of model capacity, optimizer noise, and
embedding of the features. We furthermore derive, with mathematical proof and
extensive numerical evidence, the scaling-law exponents in all of these phases,
in particular computing the optimal model-parameter-count as a function of
floating point operation budget.
Authors: Elliot Paquette, Courtney Paquette, Lechao Xiao, Jeffrey Pennington

http://arxiv.org/abs/2504.14077v1
Updated: 2025-04-18T21:01:17Z
Published: 2025-04-18T21:01:17Z
Title: Asymptotic well-calibration of the posterior predictive $p$-value under the modified Kolmogorov-Smirnov test

The posterior predictive $p$-value is a widely used tool for Bayesian model
checking. However, under most test statistics, its asymptotic null distribution
is more concentrated around 1/2 than uniform. Consequently, its finite-sample
behavior is difficult to interpret and tends to lack power, which is a
well-known issue among practitioners. A common choice of test statistic is the
Kolmogorov-Smirnov test with plug-in estimators. It provides a global measure
of model-data discrepancy for real-valued observations and is sensitive to
model misspecification. In this work, we establish that under this test
statistic, the posterior predictive $p$-value converges in distribution to
uniform under the null. We further use numerical experiments to demonstrate
that this $p$-value is well-behaved in finite samples and can effectively
detect a wide range of alternative models.
Authors: Yueming Shen

http://arxiv.org/abs/2212.11385v2
Updated: 2025-04-18T19:22:45Z
Published: 2022-12-21T22:03:06Z
Title: Online Statistical Inference in Decision-Making with Matrix Context

The study of online decision-making problems that leverage contextual
information has drawn notable attention due to their significant applications
in fields ranging from healthcare to autonomous systems. In modern
applications, contextual information can be rich and is often represented as a
matrix. Moreover, while existing online decision algorithms mainly focus on
reward maximization, less attention has been devoted to statistical inference.
To address these gaps, in this work, we consider an online decision-making
problem with a matrix context where the true model parameters have a low-rank
structure. We propose a fully online procedure to conduct statistical inference
with adaptively collected data. The low-rank structure of the model parameter
and the adaptive nature of the data collection process make this difficult:
standard low-rank estimators are biased and cannot be obtained in a sequential
manner while existing inference approaches in sequential decision-making
algorithms fail to account for the low-rankness and are also biased. To
overcome these challenges, we introduce a new online debiasing procedure to
simultaneously handle both sources of bias. Our inference framework encompasses
both parameter inference and optimal policy value inference. In theory, we
establish the asymptotic normality of the proposed online debiased estimators
and prove the validity of the constructed confidence intervals for both
inference tasks. Our inference results are built upon a newly developed
low-rank stochastic gradient descent estimator and its convergence result,
which are also of independent interest.
Authors: Qiyu Han, Will Wei Sun, Yichen Zhang
Comments: The paper has been accepted by the Annals of Statistics

http://arxiv.org/abs/2311.02040v3
Updated: 2025-04-18T17:56:34Z
Published: 2023-11-03T17:14:12Z
Title: Spectral Properties of Elementwise-Transformed Spiked Matrices

This work concerns elementwise-transformations of spiked matrices: $Y_n =
n^{-1/2} f( \sqrt{n} X_n + Z_n)$. Here, $f$ is a function applied elementwise,
$X_n$ is a low-rank signal matrix, and $Z_n$ is white noise. We find that
principal component analysis is powerful for recovering signal under highly
nonlinear or discontinuous transformations. Specifically, in the
high-dimensional setting where $Y_n$ is of size $n \times p$ with $n,p
\rightarrow \infty$ and $p/n \rightarrow \gamma > 0$, we uncover a phase
transition: for signal-to-noise ratios above a sharp threshold -- depending on
$f$, the distribution of elements of $Z_n$, and the limiting aspect ratio
$\gamma$ -- the principal components of $Y_n$ (partially) recover those of
$X_n$. Below this threshold, the principal components of $Y_n$ are
asymptotically orthogonal to the signal. In contrast, in the standard setting
where $X_n + n^{-1/2}Z_n$ is observed directly, the analogous phase transition
depends only on $\gamma$. A similar phenomenon occurs with $X_n$ square and
symmetric and $Z_n$ a generalized Wigner matrix.
Authors: Michael J. Feldman

http://arxiv.org/abs/2312.02849v3
Updated: 2025-04-18T15:55:11Z
Published: 2023-12-05T16:02:04Z
Title: Algorithms for mean-field variational inference via polyhedral optimization in the Wasserstein space

We develop a theory of finite-dimensional polyhedral subsets over the
Wasserstein space and optimization of functionals over them via first-order
methods. Our main application is to the problem of mean-field variational
inference, which seeks to approximate a distribution $\pi$ over $\mathbb{R}^d$
by a product measure $\pi^\star$. When $\pi$ is strongly log-concave and
log-smooth, we provide (1) approximation rates certifying that $\pi^\star$ is
close to the minimizer $\pi^\star_\diamond$ of the KL divergence over a
\emph{polyhedral} set $\mathcal{P}_\diamond$, and (2) an algorithm for
minimizing $\text{KL}(\cdot\|\pi)$ over $\mathcal{P}_\diamond$ based on
accelerated gradient descent over $\mathbb{R}^d$. As a byproduct of our analysis, we
obtain the first end-to-end analysis for gradient-based algorithms for MFVI.
Authors: Yiheng Jiang, Sinho Chewi, Aram-Alexandre Pooladian
Comments: 49 pages

http://arxiv.org/abs/2504.13620v1
Updated: 2025-04-18T10:51:25Z
Published: 2025-04-18T10:51:25Z
Title: Set-valued conditional functionals of random sets

Many key quantities in statistics and probability theory such as the
expectation, quantiles, expectiles and many risk measures are law-determined
maps from a space of random variables to the reals. We call such a
law-determined map, which is normalised, positively homogeneous, monotone and
translation equivariant, a gauge function. Considered as a functional on the
space of distributions, we can apply such a gauge to the conditional
distribution of a random variable. This results in conditional gauges, such as
conditional quantiles or conditional expectations. In this paper, we apply such
scalar gauges to the support function of a random closed convex set $\mathbf{X}$. This
leads to a set-valued extension of a gauge function. We also introduce a
conditional variant whose values are themselves random closed convex sets. In
special cases, this functional becomes the conditional set-valued quantile or
the conditional set-valued expectation of a random set. In particular, in the
unconditional setup, if $\mathbf{X}$ is a random translation of a deterministic cone
and the gauge is either a quantile or an expectile, we recover the cone
distribution functions studied by Andreas Hamel and his co-authors. In the
conditional setup, the conditional quantile of a random singleton yields the
conditional version of the half-space depth-trimmed regions.
Authors: Tobias Fissler, Ilya Molchanov
Comments: 30 pages

http://arxiv.org/abs/2504.13520v1
Updated: 2025-04-18T07:18:51Z
Published: 2025-04-18T07:18:51Z
Title: Bayesian Model Averaging in Causal Instrumental Variable Models

Instrumental variables are a popular tool to infer causal effects under
unobserved confounding, but choosing suitable instruments is challenging in
practice. We propose gIVBMA, a Bayesian model averaging procedure that
addresses this challenge by averaging across different sets of instrumental
variables and covariates in a structural equation model. Our approach extends
previous work through a scale-invariant prior structure and accommodates
non-Gaussian outcomes and treatments, offering greater flexibility than
existing methods. The computational strategy uses conditional Bayes factors to
update models separately for the outcome and treatments. We prove that this
model selection procedure is consistent. By explicitly accounting for model
uncertainty, gIVBMA allows instruments and covariates to switch roles and
provides robustness against invalid instruments. In simulation experiments,
gIVBMA outperforms current state-of-the-art methods. We demonstrate its
usefulness in two empirical applications: the effects of malaria and
institutions on income per capita and the returns to schooling. A software
implementation of gIVBMA is available in Julia.
Authors: Gregor Steiner, Mark Steel

http://arxiv.org/abs/2504.15150v1
Updated: 2025-04-18T07:11:00Z
Published: 2025-04-18T07:11:00Z
Title: Prevalence estimation in infectious diseases with imperfect tests: A comparison of Frequentist and Bayesian Logistic Regression methods with misclassification correction

Accurate estimation of disease prevalence is essential for guiding public
health strategies. Imperfect diagnostic tests can cause misclassification
errors, namely false positives (FP) and false negatives (FN), that may skew
estimates if unaddressed. This study compared four statistical methods for estimating the
prevalence of sexually transmitted infections (STIs) and associated factors,
while correcting for misclassification. The methods were: (1) Standard Logistic
Regression with external correction using known sensitivity and specificity;
(2) the Liu et al. model, which jointly estimates FP and FN rates; (3) Bayesian
Logistic Regression with external correction; and (4) a Bayesian model with
internal correction using informative priors on diagnostic accuracy. Data came
from 11,452 participants in a voluntary screening campaign for HIV, syphilis,
and hepatitis B (2020-2024). Prevalence estimates and regression coefficients
were compared across models using relative changes from crude estimates,
confidence interval (CI) width, and coefficient variability. The Liu model
produced higher prevalence estimates but had wider CIs and convergence issues
in low-prevalence settings. The Bayesian model with internal correction gave
intermediate estimates with the narrowest CIs and more stable intercepts,
suggesting improved baseline prevalence estimation. Informative or weakly
informative priors helped regularize estimates, especially in small-sample or
rare-event contexts. Accounting for misclassification influenced both
prevalence and covariate associations. While the Liu model offers theoretical
strengths, its practical limitations in sparse data settings reduce its
utility. Bayesian models with misclassification correction emerge as robust and
flexible tools, particularly valuable in low-prevalence contexts where
diagnostic uncertainty is high.
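
The idea of an external correction with known sensitivity and specificity can be sketched with the standard Rogan-Gladen estimator for a single prevalence. This is an illustrative assumption: the study applies its corrections inside logistic regression models, which may differ in detail, and the numbers below are hypothetical rather than taken from the screening data.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an apparent (test-positive) prevalence for misclassification.

    Solves p_obs = Se * p + (1 - Sp) * (1 - p) for the true prevalence p and
    clips the result to the interval [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must be informative: Se + Sp must exceed 1")
    corrected = (apparent_prevalence + specificity - 1.0) / denom
    return min(1.0, max(0.0, corrected))


# Hypothetical test with Se = 0.90 and Sp = 0.95, 10% of samples positive.
print(rogan_gladen(0.10, 0.90, 0.95))  # about 0.0588

# When the apparent prevalence is below the false-positive rate (1 - Sp),
# the raw estimate is negative and is clipped to 0.
print(rogan_gladen(0.02, 0.90, 0.95))  # 0.0
```

The second call shows why low-prevalence settings are delicate: when the observed positive rate is close to the false-positive rate, the corrected estimate becomes unstable, which is one motivation for the regularizing priors discussed in the abstract.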
Authors: Jorge Mario Estrada Alvarez, Henan F. Garcia, Miguel Ángel Montero-Alonso, Juan de Dios Luna del Castillo
Comments: 11 pages, 7 tables

http://arxiv.org/abs/2504.13502v1
Updated: 2025-04-18T06:45:30Z
Published: 2025-04-18T06:45:30Z
Title: Continuous-time filtering in Lie groups: estimation via the Fréchet mean of solutions to stochastic differential equations

We compute the Fr\'echet mean $\mathscr{E}_t$ of the solution $X_{t}$ to a
continuous-time stochastic differential equation in a Lie group. It provides an
estimator with minimal variance of $X_{t}$. We use it in the context of Kalman
filtering and more precisely to infer rotation matrices. In this paper, we
focus on the prediction step between two consecutive observations. Compared to
state-of-the-art approaches, our assumptions on the model are minimal.
Authors: Magalie Bénéfice (IECL, UL), Marc Arnaudon (IMB, UB), Audrey Giremus (IMS, UB)

http://arxiv.org/abs/2501.06969v2
Updated: 2025-04-18T03:52:48Z
Published: 2025-01-12T23:00:16Z
Title: Doubly Robust Inference on Causal Derivative Effects for Continuous Treatments

Statistical methods for causal inference with continuous treatments mainly
focus on estimating the mean potential outcome function, commonly known as the
dose-response curve. However, it is often not the dose-response curve but its
derivative function that signals the treatment effect. In this paper, we
investigate nonparametric inference on the derivative of the dose-response
curve with and without the positivity condition. Under the positivity and other
regularity conditions, we propose a doubly robust (DR) inference method for
estimating the derivative of the dose-response curve using kernel smoothing.
When the positivity condition is violated, we demonstrate the inconsistency of
conventional inverse probability weighting (IPW) and DR estimators, and
introduce novel bias-corrected IPW and DR estimators. In all settings, our DR
estimator achieves asymptotic normality at the standard nonparametric rate of
convergence with nonparametric efficiency guarantees. Additionally, our
approach reveals an interesting connection to nonparametric support and level
set estimation problems. Finally, we demonstrate the applicability of our
proposed estimators through simulations and a case study of evaluating a job
training program.
Authors: Yikun Zhang, Yen-Chi Chen
Comments: Revision with added nonparametric efficiency theory. The updated version has 117 pages (25 pages for the main paper), 10 figures

http://arxiv.org/abs/2504.13423v1
Updated: 2025-04-18T02:49:24Z
Published: 2025-04-18T02:49:24Z
Title: Mixed Fractional Information: Consistency of Dissipation Measures for Stable Laws

Symmetric alpha-stable (S alpha S) distributions with alpha<2 lack finite
classical Fisher information. Building on Johnson's framework, we define Mixed
Fractional Information (MFI) via the initial rate of relative entropy
dissipation during interpolation between S alpha S laws with differing scales,
v and s. We demonstrate two equivalent formulations for MFI in this specific S
alpha S-to-S alpha S setting. The first involves the derivative D'(v) of the
relative entropy between the two S alpha S densities. The second uses an
integral expectation E_gv[u(x,0) (pF_v(x) - pF_s(x))] involving the difference
between Fisher scores (pF_v, pF_s) and a specific MMSE-related score function
u(x,0) derived from the interpolation dynamics. Our central contribution is a
rigorous proof of the consistency identity: D'(v) = (1/(alpha v)) E_gv[X
(pF_v(X) - pF_s(X))]. This identity mathematically validates the equivalence of
the two MFI formulations for S alpha S inputs, establishing MFI's internal
coherence and directly linking entropy dissipation rates to score function
differences. We further establish MFI's non-negativity (zero if and only if
v=s), derive its closed-form expression for the Cauchy case (alpha=1), and
numerically validate the consistency identity. MFI provides a finite, coherent,
and computable information-theoretic measure for comparing S alpha S
distributions where classical Fisher information fails, connecting entropy
dynamics to score functions and estimation concepts. This work lays a
foundation for exploring potential fractional I-MMSE relations and new
functional inequalities tailored to heavy-tailed systems.
Authors: William Cook
Comments: 20 pages, 1 figure