http://arxiv.org/api/Q7Ua4lM55UnRMrZTTKi05SdWUZo
2025-04-22T00:00:00-04:00

http://arxiv.org/abs/2504.15251v1
Updated 2025-04-21T17:31:55Z; published 2025-04-21T17:31:55Z
On Learning Parallel Pancakes with Mostly Uniform Weights
We study the complexity of learning $k$-mixtures of Gaussians ($k$-GMMs) on
$\mathbb{R}^d$. This task is known to have complexity $d^{\Omega(k)}$ in full
generality. To circumvent this exponential lower bound on the number of
components, research has focused on learning families of GMMs satisfying
additional structural properties. A natural assumption posits that the
component weights are not exponentially small and that the components have the
same unknown covariance. Recent work gave a $d^{O(\log(1/w_{\min}))}$-time
algorithm for this class of GMMs, where $w_{\min}$ is the minimum weight. Our
first main result is a Statistical Query (SQ) lower bound showing that this
quasi-polynomial upper bound is essentially best possible, even for the special
case of uniform weights. Specifically, we show that it is SQ-hard to
distinguish between such a mixture and the standard Gaussian. We further
explore how the distribution of weights affects the complexity of this task.
Our second main result is a quasi-polynomial upper bound for the aforementioned
testing task when most of the weights are uniform while a small fraction of the
weights are potentially arbitrary.
Ilias Diakonikolas, Daniel M. Kane, Sushrut Karmalkar, Jasper C. H. Lee, Thanasis Pittas

http://arxiv.org/abs/2504.15186v1
Updated 2025-04-21T15:55:29Z; published 2025-04-21T15:55:29Z
Sum of Independent XGamma Distributions
The XGamma distribution is generated from a mixture of
Exponential and Gamma distributions. It is found that in many cases the XGamma
has more flexibility than the Exponential distribution. In this paper we
consider the sum of independent XGamma distributions with different parameters.
We show that the probability density function of this distribution is a sum
of the probability density functions of Erlang distributions. As a
consequence, we find exact closed-form expressions for the other related statistical
functions. Next, we examine the estimation of the parameters by maximum
likelihood. In an application to a real data set, we observe
that this model provides a better fit to the data than the sum of
Exponential distributions, i.e., the Hypoexponential model.
Therrar Kadri, Rahil Omairi, Khaled Smaili, Seifedine Kadry
14 pages, 1 figure

http://arxiv.org/abs/2504.11834v2
Updated 2025-04-21T07:27:30Z; published 2025-04-16T07:45:44Z
Estimation and inference in error-in-operator model
Many statistical problems can be reduced to a linear inverse problem in which
only a noisy version of the operator is available. Particular examples include
random design regression, deconvolution, instrumental variable
regression, functional data analysis, error-in-variable regression, drift
estimation in stochastic diffusion, and many others. The pragmatic plug-in
approach can be well justified in the classical asymptotic setup with a growing
sample size. However, recent developments in high dimensional inference reveal
some new features of this problem. In high dimensional linear regression with a
random design, the plug-in approach is questionable but the use of a simple
ridge penalization yields a benign overfitting phenomenon; see
\cite{baLoLu2020}, \cite{ChMo2022}, \cite{NoPuSp2024}. This paper revisits the
general Error-in-Operator problem for finite samples and high dimension of the
source and image spaces. A particular focus is on the choice of a proper
regularization. We show that a simple ridge penalty (Tikhonov regularization)
works properly in the case when the operator is more regular than the signal.
In the opposite case, some model reduction technique like spectral truncation
should be applied.
Vladimir Spokoiny

http://arxiv.org/abs/2305.12789v3
Updated 2025-04-21T04:26:48Z; published 2023-05-22T07:37:12Z
The Decaying Missing-at-Random Framework: Model Doubly Robust Causal Inference with Partially Labeled Data
In modern large-scale observational studies, data collection constraints
often result in partially labeled datasets, posing challenges for reliable
causal inference, especially due to potential labeling bias and relatively
small size of the labeled data. This paper introduces a decaying
missing-at-random (decaying MAR) framework and associated approaches for doubly
robust causal inference on treatment effects in such semi-supervised (SS)
settings. This simultaneously addresses selection bias in the labeling
mechanism and the extreme imbalance between labeled and unlabeled groups,
bridging the gap between the standard SS and missing data literatures, while
throughout allowing for confounded treatment assignment and high-dimensional
confounders under appropriate sparsity conditions. To ensure robust causal
conclusions, we propose a bias-reduced SS (BRSS) estimator for the average
treatment effect, a type of 'model doubly robust' estimator appropriate for
such settings, establishing asymptotic normality at the appropriate rate under
decaying labeling propensity scores, provided that at least one nuisance model
is correctly specified. Our approach also relaxes sparsity conditions beyond
those required in existing methods, including standard supervised approaches.
Recognizing the asymmetry between labeling and treatment mechanisms, we further
introduce a de-coupled BRSS (DC-BRSS) estimator, which integrates inverse
probability weighting (IPW) with bias-reducing techniques in nuisance
estimation. This refinement further weakens model specification and sparsity
requirements. Numerical experiments confirm the effectiveness and adaptability
of our estimators in addressing labeling bias and model misspecification.
Yuqian Zhang, Abhishek Chakrabortty, Jelena Bradic

http://arxiv.org/abs/2504.01318v2
Updated 2025-04-21T03:25:44Z; published 2025-04-02T03:05:28Z
Tail Bounds for Canonical $U$-Statistics and $U$-Processes with Unbounded Kernels
In this paper, we prove exponential tail bounds for canonical (or degenerate)
$U$-statistics and $U$-processes under exponential-type tail assumptions on the
kernels. Most existing results in the relevant literature assume
bounded kernels or obtain sub-optimal tail behavior under unbounded kernels. We
obtain sharp rates and optimal tail behavior under sub-Weibull kernel
functions. Some examples from nonparametric and semiparametric statistics
literature are considered.
Abhishek Chakrabortty, Arun K. Kuchibhotla
This is a slightly edited version of the 2018 draft available at
https://faculty.wharton.upenn.edu/wp-content/uploads/2018/10/Chakrabortty-UStat-Draft.pdf.
Added more comments on the assumptions and the proof technique of Theorem 1.
Corrected a few typos. More improvements to follow in the future for the
U-process results.

http://arxiv.org/abs/2406.06941v3
Updated 2025-04-21T00:07:13Z; published 2024-06-11T04:49:04Z
Efficient estimation and data fusion under general semiparametric restrictions on outcome mean functions
We provide a novel characterization of semiparametric efficiency in a generic
supervised learning setting where the outcome mean function -- defined as the
conditional expectation of the outcome of interest given the other observed
variables -- is restricted to lie in some known semiparametric function class.
The primary motivation is causal inference where a researcher running a
randomized controlled trial often has access to an auxiliary observational
dataset that is confounded or otherwise biased for estimating causal effects.
Prior work has imposed various bespoke assumptions on this bias in an attempt
to improve precision via data fusion. We show how many of these assumptions can
be formulated as restrictions on the outcome mean function in the concatenation
of the experimental and observational datasets. Then our theory provides a
unified framework to maximally leverage such restrictions for precision gain by
constructing efficient estimators in all of these settings as well as in a wide
range of others that future investigators might be interested in. For example,
when the observational dataset is subject to outcome-mediated selection bias,
we show our novel efficient estimator dominates an existing control variate
approach both asymptotically and in numerical studies.
Harrison H. Li
52 pages, 4 figures. Substantially rewritten for clarity from previous version.

http://arxiv.org/abs/2504.14659v1
Updated 2025-04-20T15:42:41Z; published 2025-04-20T15:42:41Z
Markovian Continuity of the MMSE
Minimum mean square error (MMSE) estimation is widely used in signal
processing and related fields. While it is known to be non-continuous with
respect to all standard notions of stochastic convergence, it remains robust in
practical applications. In this work, we review the known counterexamples to
the continuity of the MMSE. We observe that, in these counterexamples, the
discontinuity arises from an element in the converging measurement sequence
providing more information about the estimand than the limit of the measurement
sequence. We argue that this behavior is uncharacteristic of real-world
applications and introduce a new stochastic convergence notion, termed
Markovian convergence, to address this issue. We prove that the MMSE is, in
fact, continuous under this new notion. We supplement this result with
semi-continuity and continuity guarantees of the MMSE in other settings and
prove the continuity of the MMSE under linear estimation.
Elad Domanovitz, Anatoly Khina

http://arxiv.org/abs/2503.14747v2
Updated 2025-04-20T15:15:52Z; published 2025-03-18T21:22:55Z
Testing Conditional Stochastic Dominance at Target Points
This paper introduces a novel test for conditional stochastic dominance (CSD)
at specific values of the conditioning covariates, referred to as target
points. The test is relevant for analyzing income inequality, evaluating
treatment effects, and studying discrimination. We propose a
Kolmogorov--Smirnov-type test statistic that utilizes induced order statistics
from independent samples. Notably, the test features a data-independent
critical value, eliminating the need for resampling techniques such as the
bootstrap. Our approach avoids kernel smoothing and parametric assumptions,
instead relying on a tuning parameter to select relevant observations. We
establish the asymptotic properties of our test, showing that the induced order
statistics converge to independent draws from the true conditional
distributions and that the test is asymptotically of level $\alpha$ under weak
regularity conditions. While our results apply to both continuous and discrete
data, in the discrete case, the critical value only provides a valid upper
bound. To address this, we propose a refined critical value that significantly
enhances power, requiring only knowledge of the support size of the
distributions. Additionally, we analyze the test's behavior in the limit
experiment, demonstrating that it reduces to a problem analogous to testing
unconditional stochastic dominance in finite samples. This framework allows us
to prove the validity of permutation-based tests for stochastic dominance when
the random variables are continuous. Monte Carlo simulations confirm the strong
finite-sample performance of our method.
Federico A. Bugni, Ivan A. Canay, Deborah Kim

http://arxiv.org/abs/2403.11343v3
Updated 2025-04-20T15:05:22Z; published 2024-03-17T21:04:48Z
Federated Transfer Learning with Differential Privacy
Federated learning has emerged as a powerful framework for analysing
distributed data, yet two challenges remain pivotal: heterogeneity across sites
and privacy of local data. In this paper, we address both challenges within a
federated transfer learning framework, aiming to enhance learning on a target
data set by leveraging information from multiple heterogeneous source data sets
while adhering to privacy constraints. We rigorously formulate the notion of
federated differential privacy, which offers privacy guarantees for each data
set without assuming a trusted central server. Under this privacy model, we
study three classical statistical problems: univariate mean estimation,
low-dimensional linear regression, and high-dimensional linear regression. By
investigating the minimax rates and quantifying the cost of privacy in each
problem, we show that federated differential privacy is an intermediate privacy
model between the well-established local and central models of differential
privacy. Our analyses account for data heterogeneity and privacy, highlighting
the fundamental costs associated with each factor and the benefits of knowledge
transfer in federated learning.
Mengchu Li, Ye Tian, Yang Feng, Yi Yu
89 pages, 4 figures

http://arxiv.org/abs/2503.22080v2
Updated 2025-04-20T11:16:34Z; published 2025-03-28T01:57:15Z
An Improved Satterthwaite Effective Degrees of Freedom Correction for Weighted Syntheses of Variance
This article presents an improved approximation for the effective degrees of
freedom in the Satterthwaite (1941, 1946) method which estimates the
distribution of a weighted combination of variance components. The standard
Satterthwaite approximation assumes a scaled chi-square distribution for the
composite variance estimator but is known to be biased downward when component
degrees of freedom are small. Building on recent work by von Davier (2025), we
propose an adjusted estimator that corrects this bias by modifying both the
numerator and denominator of the traditional formula. The new approximation
incorporates a weighted average of component degrees of freedom and a scaling
factor that ensures consistency as the number of components or their degrees of
freedom increases. We demonstrate the utility of this adjustment in practical
settings, including Rubin's (1987) total variance estimation in multiple
imputations, where weighted variance combinations are common. The proposed
estimator generalizes and further improves von Davier's (2025) unweighted case
and more accurately approximates synthetic variance estimators with arbitrary
weights.
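
For orientation, the classical Satterthwaite approximation that this abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the textbook formula only, $\nu_{\mathrm{eff}} = (\sum_i w_i s_i^2)^2 / \sum_i (w_i s_i^2)^2/\nu_i$; it is not the adjusted estimator proposed in the paper, whose exact form the abstract does not give. The function name `satterthwaite_df` and its argument names are hypothetical.

```python
def satterthwaite_df(weights, variances, dfs):
    """Classical Satterthwaite effective degrees of freedom.

    Approximates the distribution of sum_i w_i * s_i^2 by a scaled
    chi-square distribution with the returned degrees of freedom.
    NOTE: this is the standard formula, not the paper's improved correction.
    """
    if not (len(weights) == len(variances) == len(dfs)):
        raise ValueError("weights, variances and dfs must have equal length")
    if not weights:
        raise ValueError("at least one variance component is required")

    # Weighted variance components w_i * s_i^2.
    terms = [w * s2 for w, s2 in zip(weights, variances)]

    # Numerator: square of the composite variance estimate.
    numerator = sum(terms) ** 2
    # Denominator: each squared component divided by its own degrees of freedom.
    denominator = sum(t ** 2 / nu for t, nu in zip(terms, dfs))
    return numerator / denominator


if __name__ == "__main__":
    # Two equally weighted components with 10 degrees of freedom each.
    print(satterthwaite_df([1.0, 1.0], [1.0, 1.0], [10, 10]))
```

With two identical components of 10 degrees of freedom each, the formula returns 20 effective degrees of freedom, and with a single component it returns that component's own degrees of freedom.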
Matthias von Davier

http://arxiv.org/abs/2504.14555v1
Updated 2025-04-20T10:03:08Z; published 2025-04-20T10:03:08Z
Nonparametric Estimation in Uniform Deconvolution and Interval Censoring
In the uniform deconvolution problem one is interested in estimating the
distribution function $F_0$ of a nonnegative random variable, based on a sample
with additive uniform noise. A peculiar and not well understood phenomenon of
the nonparametric maximum likelihood estimator in this setting is the dichotomy
between the situations where $F_0(1)=1$ and $F_0(1)<1$. If $F_0(1)=1$, the MLE
can be computed in a straightforward way and its asymptotic pointwise behavior
can be derived using the connection to the so-called current status problem.
However, if $F_0(1)<1$, one needs an iterative procedure to compute it and the
asymptotic pointwise behavior of the nonparametric maximum likelihood estimator
is not known. In this paper we describe the problem, connect it to interval
censoring problems and a more general model studied in Groeneboom (2024) to
state two competing naturally occurring conjectures for the case $F_0(1)<1$.
Asymptotic arguments related to smooth functional theory and extensive
simulations lead us to bet on one of these two conjectures.
Piet Groeneboom, Geurt Jongbloed
16 pages, 4 figures

http://arxiv.org/abs/2502.09832v2
Updated 2025-04-20T00:32:19Z; published 2025-02-14T00:24:51Z
Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs
In this paper, assuming a natural strengthening of the low-degree conjecture,
we provide evidence of computational hardness for two problems: (1) the
(partial) matching recovery problem in the sparse correlated Erd\H{o}s-R\'enyi
graphs $\mathcal G(n,q;\rho)$ when the edge-density $q=n^{-1+o(1)}$ and the
correlation $\rho<\sqrt{\alpha}$ lies below Otter's threshold, solving a
remaining problem in \cite{DDL23+}; (2) the detection problem between the
correlated sparse stochastic block model $\mathcal
S(n,\tfrac{\lambda}{n};k,\epsilon;s)$ and a pair of independent stochastic
block models $\mathcal S(n,\tfrac{\lambda s}{n};k,\epsilon)$ when $\epsilon^2
\lambda s<1$ lies below the Kesten-Stigum (KS) threshold and $s<\sqrt{\alpha}$
lies below Otter's threshold, solving a remaining problem in
\cite{CDGL24+}.
One of the main ingredients in our proof is the derivation of certain forms of
\emph{algorithmic contiguity} between two probability measures based on bounds
on their low-degree advantage. To be more precise, consider the
high-dimensional hypothesis testing problem between two probability measures
$\mathbb{P}$ and $\mathbb{Q}$ based on the sample $\mathsf Y$. We show that if
the low-degree advantage $\mathsf{Adv}_{\leq D} \big(
\frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}} \big)=O(1)$, then (assuming
the low-degree conjecture) there is no efficient algorithm $\mathcal A$ such
that $\mathbb{Q}(\mathcal A(\mathsf Y)=0)=1-o(1)$ and $\mathbb{P}(\mathcal
A(\mathsf Y)=1)=\Omega(1)$. This framework provides a useful tool for
performing reductions between different inference tasks.
Zhangsong Li
37 pages. Fixed several typos and added a proof of Theorem~3.2
assuming only the original low-degree conjecture in Appendix~C.

http://arxiv.org/abs/2310.20460v3
Updated 2025-04-19T23:59:36Z; published 2023-10-31T13:54:11Z
Aggregating Dependent Signals with Heavy-Tailed Combination Tests
Combining dependent p-values poses a long-standing challenge in statistical
inference, particularly when aggregating findings from multiple methods to
enhance signal detection. Recently, p-value combination tests based on
regularly varying-tailed distributions, such as the Cauchy combination test and
harmonic mean p-value, have attracted attention for their robustness to unknown
dependence. This paper provides a theoretical and empirical evaluation of these
methods under an asymptotic regime where the number of p-values is fixed and
the global test significance level approaches zero. We examine two types of
dependence among the p-values. First, when p-values are pairwise asymptotically
independent, such as with bivariate normal test statistics with no perfect
correlation, we prove that these combination tests are asymptotically valid.
However, they become equivalent to the Bonferroni test as the significance
level tends to zero for both one-sided and two-sided p-values. Empirical
investigations suggest that this equivalence can emerge at moderately small
significance levels. Second, under pairwise quasi-asymptotic dependence, such
as with bivariate t-distributed test statistics, our simulations suggest that
these combination tests can remain valid and exhibit notable power gains over
Bonferroni, even as the significance level diminishes. These findings highlight
the potential advantages of these combination tests in scenarios where p-values
exhibit substantial dependence. Our simulations also examine how test
performance depends on the support and tail heaviness of the underlying
distributions.
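
The two combination rules named in this abstract can be sketched in a few lines. This is a minimal illustration assuming the standard textbook definitions: the Cauchy combination statistic $T=\sum_i w_i \tan((0.5-p_i)\pi)$ converted back through the standard Cauchy tail, and the raw weighted harmonic mean $1/\sum_i (w_i/p_i)$. The function names and the default equal weights are hypothetical choices, and the raw harmonic mean is shown without the calibration step that a formal test would require.

```python
import math


def _normalized_weights(n, weights):
    """Return weights summing to one (equal weights by default)."""
    if weights is None:
        return [1.0 / n] * n
    total = float(sum(weights))
    return [w / total for w in weights]


def cauchy_combination(p_values, weights=None):
    """Combined p-value from the Cauchy combination statistic.

    Each p-value in (0, 1) is mapped to a standard Cauchy variate,
    the variates are averaged, and the average is mapped back to a p-value.
    """
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")
    w = _normalized_weights(len(p_values), weights)
    t = sum(wi * math.tan((0.5 - p) * math.pi) for wi, p in zip(w, p_values))
    # Upper tail probability of a standard Cauchy variable at t.
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_p(p_values, weights=None):
    """Raw weighted harmonic mean of the p-values (not calibrated)."""
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    w = _normalized_weights(len(p_values), weights)
    return 1.0 / sum(wi / p for wi, p in zip(w, p_values))


if __name__ == "__main__":
    ps = [0.02, 0.04, 0.30]
    print(cauchy_combination(ps), harmonic_mean_p(ps))
```

Both rules are dominated by the smallest p-values, which is consistent with the abstract's remark that they behave like the Bonferroni test as the significance level tends to zero.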
Lin Gui, Yuchao Jiang, Jingshu Wang

http://arxiv.org/abs/2401.13760v2
Updated 2025-04-19T19:50:12Z; published 2024-01-24T19:18:46Z
Early Detection of Treatments Side Effect: A Sequential Approach
With the emergence and spread of infectious diseases with pandemic potential,
such as COVID-19, the urgency of vaccine development has led to
unprecedentedly compressed and accelerated schedules that shortened the standard
development timeline. In a relatively short time, the leading pharmaceutical
companies received an Emergency Use Authorization (EUA) for the vaccine's
en masse deployment. To monitor the potential side effect(s) of the vaccine
during the (initial) vaccination campaign, we developed an optimal sequential
test that allows for the early detection of potential side effect(s). This test
employs a rule to stop the vaccination process once the observed number of side
effect incidents exceeds a certain (pre-determined) threshold. The optimality
of the proposed sequential test is justified when compared with the
$(\alpha, \beta)$ optimality of the non-randomized fixed-sample Uniformly Most Powerful
(UMP) test. In the case of a single side effect, we study the properties of the
sequential test and derive the exact expressions of the Average Sample Number
(ASN) curve of the stopping time (and its variance) via the regularized
incomplete beta function. Additionally, we derive the asymptotic distribution
of the relative savings in ASN as compared to the maximal sample size. Moreover, we
construct the post-test parameter estimate and study its sampling properties,
including its asymptotic behavior under local-type alternatives. These limiting
results establish the consistency and asymptotic normality of the post-test
parameter estimator. We conclude the paper with a small simulation study
illustrating the asymptotic performance of the point and interval estimation
and provide a detailed example of our suggested testing procedure, based on
COVID-19 side effect data (see Beatty et al. (2021)).
Jiayue Wang, Ben Boukai
There are 21 pages, 8 pictures and 4 tables

http://arxiv.org/abs/2409.15597v2
Updated 2025-04-19T14:53:49Z; published 2024-09-23T23:04:47Z
Higher-criticism for sparse multi-stream change-point detection
We study a statistical procedure based on higher criticism (HC) to address
the sparse multi-stream quickest change-point detection problem. Namely, we aim
to detect a potential change in the distribution of multiple data streams at
some unknown time. If a change occurs, only a few streams are affected, and
the identity of the affected streams is unknown. The HC-based procedure
involves testing for a change point in individual streams and combining
multiple tests using higher criticism. Relying on HC thresholding, the
procedure also indicates a set of streams suspected to be affected by the
change. We provide a theoretical analysis under a sparse heteroscedastic normal
change-point model. We establish an information-theoretic detection delay lower
bound when individual tests are based on the likelihood ratio or the
generalized likelihood ratio statistics and show that the delay of the HC-based
method converges in distribution to this bound. In the special case of constant
variance, our bound coincides with known results of Chan (2017). We
demonstrate the effectiveness of the HC-based method compared to other methods
in detecting sparse changes through extensive numerical evaluations.
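
The step of combining per-stream tests with higher criticism can be sketched as follows. This is a minimal illustration assuming the classical Donoho-Jin form of the statistic, $\mathrm{HC}=\max_i \sqrt{n}\,(i/n-p_{(i)})/\sqrt{p_{(i)}(1-p_{(i)})}$ over the smallest ordered p-values; the variant used in the paper, and how its per-stream p-values are computed, may differ. The function name `higher_criticism` and the `fraction` parameter are hypothetical.

```python
import math


def higher_criticism(p_values, fraction=0.5):
    """Higher criticism statistic and the associated HC threshold.

    p_values: one p-value per data stream, each strictly between 0 and 1.
    fraction: share of the smallest p-values over which to maximize
              (an assumed convention; 0.5 is a common choice).
    Returns (hc_statistic, threshold), where threshold is the ordered
    p-value at which the maximum is attained.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p-values must lie strictly between 0 and 1")

    ordered = sorted(p_values)
    limit = max(1, int(fraction * n))

    best_stat = -math.inf
    best_index = 1
    for i in range(1, limit + 1):
        p = ordered[i - 1]
        # Standardized gap between the empirical and uniform fractions.
        stat = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        if stat > best_stat:
            best_stat, best_index = stat, i

    return best_stat, ordered[best_index - 1]


if __name__ == "__main__":
    stream_p_values = [0.001, 0.5, 0.6, 0.7]
    hc, threshold = higher_criticism(stream_p_values)
    # Streams whose p-value does not exceed the threshold are flagged.
    suspected = [k for k, p in enumerate(stream_p_values) if p <= threshold]
    print(hc, threshold, suspected)
```

In this sketch a large HC value signals a change, and the streams whose p-values fall at or below the returned threshold play the role of the set of streams suspected to be affected.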
Tingnan Gong, Alon Kipnis, Yao Xie
Authors are listed in alphabetical order