http://arxiv.org/abs/2504.15251v1 2025-04-21T17:31:55Z 2025-04-21T17:31:55Z On Learning Parallel Pancakes with Mostly Uniform Weights We study the complexity of learning $k$-mixtures of Gaussians ($k$-GMMs) on $\mathbb{R}^d$. This task is known to have complexity $d^{\Omega(k)}$ in full generality. To circumvent this exponential lower bound on the number of components, research has focused on learning families of GMMs satisfying additional structural properties. A natural assumption posits that the component weights are not exponentially small and that the components have the same unknown covariance. Recent work gave a $d^{O(\log(1/w_{\min}))}$-time algorithm for this class of GMMs, where $w_{\min}$ is the minimum weight. Our first main result is a Statistical Query (SQ) lower bound showing that this quasi-polynomial upper bound is essentially best possible, even for the special case of uniform weights. Specifically, we show that it is SQ-hard to distinguish between such a mixture and the standard Gaussian. We further explore how the distribution of weights affects the complexity of this task. Our second main result is a quasi-polynomial upper bound for the aforementioned testing task when most of the weights are uniform while a small fraction of the weights are potentially arbitrary. Ilias Diakonikolas Daniel M. Kane Sushrut Karmalkar Jasper C. H. Lee Thanasis Pittas http://arxiv.org/abs/2504.15186v1 2025-04-21T15:55:29Z 2025-04-21T15:55:29Z Sum of Independent XGamma Distributions The XGamma distribution is a generated distribution from a mixture of Exponential and Gamma distributions. It is found that in many cases the XGamma has more flexibility than the Exponential distribution. In this paper we consider the sum of independent XGamma distributions with different parameters. 
We show that the probability density function of this distribution is a sum of probability density functions of Erlang distributions. As a consequence, we find exact closed-form expressions for the other related statistical functions. Next, we examine the estimation of the parameters by maximum likelihood. In an application to a real data set, we observe that this model provides a better fit to the data than the sum of Exponential distributions, i.e., the Hypoexponential model. Therrar Kadri Rahil Omairi Khaled Smaili Seifedine Kadry 14 pages, 1 figure http://arxiv.org/abs/2504.11834v2 2025-04-21T07:27:30Z 2025-04-16T07:45:44Z Estimation and inference in error-in-operator model Many statistical problems can be reduced to a linear inverse problem in which only a noisy version of the operator is available. Particular examples include random design regression, deconvolution, instrumental variable regression, functional data analysis, error-in-variable regression, drift estimation in stochastic diffusion, and many others. The pragmatic plug-in approach can be well justified in the classical asymptotic setup with a growing sample size. However, recent developments in high dimensional inference reveal some new features of this problem. In high dimensional linear regression with a random design, the plug-in approach is questionable but the use of a simple ridge penalization yields a benign overfitting phenomenon; see \cite{baLoLu2020}, \cite{ChMo2022}, \cite{NoPuSp2024}. This paper revisits the general Error-in-Operator problem for finite samples and high dimension of the source and image spaces. A particular focus is on the choice of a proper regularization. We show that a simple ridge penalty (Tikhonov regularization) works properly in the case when the operator is more regular than the signal. In the opposite case, a model reduction technique such as spectral truncation should be applied. 
Vladimir Spokoiny http://arxiv.org/abs/2305.12789v3 2025-04-21T04:26:48Z 2023-05-22T07:37:12Z The Decaying Missing-at-Random Framework: Model Doubly Robust Causal Inference with Partially Labeled Data In modern large-scale observational studies, data collection constraints often result in partially labeled datasets, posing challenges for reliable causal inference, especially due to potential labeling bias and relatively small size of the labeled data. This paper introduces a decaying missing-at-random (decaying MAR) framework and associated approaches for doubly robust causal inference on treatment effects in such semi-supervised (SS) settings. This simultaneously addresses selection bias in the labeling mechanism and the extreme imbalance between labeled and unlabeled groups, bridging the gap between the standard SS and missing data literatures, while throughout allowing for confounded treatment assignment and high-dimensional confounders under appropriate sparsity conditions. To ensure robust causal conclusions, we propose a bias-reduced SS (BRSS) estimator for the average treatment effect, a type of 'model doubly robust' estimator appropriate for such settings, establishing asymptotic normality at the appropriate rate under decaying labeling propensity scores, provided that at least one nuisance model is correctly specified. Our approach also relaxes sparsity conditions beyond those required in existing methods, including standard supervised approaches. Recognizing the asymmetry between labeling and treatment mechanisms, we further introduce a de-coupled BRSS (DC-BRSS) estimator, which integrates inverse probability weighting (IPW) with bias-reducing techniques in nuisance estimation. This refinement further weakens model specification and sparsity requirements. Numerical experiments confirm the effectiveness and adaptability of our estimators in addressing labeling bias and model misspecification. 
Yuqian Zhang Abhishek Chakrabortty Jelena Bradic http://arxiv.org/abs/2504.01318v2 2025-04-21T03:25:44Z 2025-04-02T03:05:28Z Tail Bounds for Canonical $U$-Statistics and $U$-Processes with Unbounded Kernels In this paper, we prove exponential tail bounds for canonical (or degenerate) $U$-statistics and $U$-processes under exponential-type tail assumptions on the kernels. Most existing results in the relevant literature either assume bounded kernels or obtain sub-optimal tail behavior under unbounded kernels. We obtain sharp rates and optimal tail behavior under sub-Weibull kernel functions. Some examples from the nonparametric and semiparametric statistics literature are considered. Abhishek Chakrabortty Arun K. Kuchibhotla This is a slightly edited version of the 2018 draft available at https://faculty.wharton.upenn.edu/wp-content/uploads/2018/10/Chakrabortty-UStat-Draft.pdf. Added more comments on the assumptions and the proof technique of Theorem 1. Corrected a few typos. More improvements to follow in the future for the U-process results http://arxiv.org/abs/2406.06941v3 2025-04-21T00:07:13Z 2024-06-11T04:49:04Z Efficient estimation and data fusion under general semiparametric restrictions on outcome mean functions We provide a novel characterization of semiparametric efficiency in a generic supervised learning setting where the outcome mean function -- defined as the conditional expectation of the outcome of interest given the other observed variables -- is restricted to lie in some known semiparametric function class. The primary motivation is causal inference where a researcher running a randomized controlled trial often has access to an auxiliary observational dataset that is confounded or otherwise biased for estimating causal effects. Prior work has imposed various bespoke assumptions on this bias in an attempt to improve precision via data fusion. 
We show how many of these assumptions can be formulated as restrictions on the outcome mean function in the concatenation of the experimental and observational datasets. Then our theory provides a unified framework to maximally leverage such restrictions for precision gain by constructing efficient estimators in all of these settings as well as in a wide range of others that future investigators might be interested in. For example, when the observational dataset is subject to outcome-mediated selection bias, we show our novel efficient estimator dominates an existing control variate approach both asymptotically and in numerical studies. Harrison H. Li 52 pages, 4 figures. Substantially rewritten for clarity from previous version http://arxiv.org/abs/2504.14659v1 2025-04-20T15:42:41Z 2025-04-20T15:42:41Z Markovian Continuity of the MMSE Minimum mean square error (MMSE) estimation is widely used in signal processing and related fields. While it is known to be non-continuous with respect to all standard notions of stochastic convergence, it remains robust in practical applications. In this work, we review the known counterexamples to the continuity of the MMSE. We observe that, in these counterexamples, the discontinuity arises from an element in the converging measurement sequence providing more information about the estimand than the limit of the measurement sequence. We argue that this behavior is uncharacteristic of real-world applications and introduce a new stochastic convergence notion, termed Markovian convergence, to address this issue. We prove that the MMSE is, in fact, continuous under this new notion. We supplement this result with semi-continuity and continuity guarantees of the MMSE in other settings and prove the continuity of the MMSE under linear estimation. 
Elad Domanovitz Anatoly Khina http://arxiv.org/abs/2503.14747v2 2025-04-20T15:15:52Z 2025-03-18T21:22:55Z Testing Conditional Stochastic Dominance at Target Points This paper introduces a novel test for conditional stochastic dominance (CSD) at specific values of the conditioning covariates, referred to as target points. The test is relevant for analyzing income inequality, evaluating treatment effects, and studying discrimination. We propose a Kolmogorov--Smirnov-type test statistic that utilizes induced order statistics from independent samples. Notably, the test features a data-independent critical value, eliminating the need for resampling techniques such as the bootstrap. Our approach avoids kernel smoothing and parametric assumptions, instead relying on a tuning parameter to select relevant observations. We establish the asymptotic properties of our test, showing that the induced order statistics converge to independent draws from the true conditional distributions and that the test is asymptotically of level $\alpha$ under weak regularity conditions. While our results apply to both continuous and discrete data, in the discrete case, the critical value only provides a valid upper bound. To address this, we propose a refined critical value that significantly enhances power, requiring only knowledge of the support size of the distributions. Additionally, we analyze the test's behavior in the limit experiment, demonstrating that it reduces to a problem analogous to testing unconditional stochastic dominance in finite samples. This framework allows us to prove the validity of permutation-based tests for stochastic dominance when the random variables are continuous. Monte Carlo simulations confirm the strong finite-sample performance of our method. Federico A. Bugni Ivan A. 
Canay Deborah Kim http://arxiv.org/abs/2403.11343v3 2025-04-20T15:05:22Z 2024-03-17T21:04:48Z Federated Transfer Learning with Differential Privacy Federated learning has emerged as a powerful framework for analysing distributed data, yet two challenges remain pivotal: heterogeneity across sites and privacy of local data. In this paper, we address both challenges within a federated transfer learning framework, aiming to enhance learning on a target data set by leveraging information from multiple heterogeneous source data sets while adhering to privacy constraints. We rigorously formulate the notion of federated differential privacy, which offers privacy guarantees for each data set without assuming a trusted central server. Under this privacy model, we study three classical statistical problems: univariate mean estimation, low-dimensional linear regression, and high-dimensional linear regression. By investigating the minimax rates and quantifying the cost of privacy in each problem, we show that federated differential privacy is an intermediate privacy model between the well-established local and central models of differential privacy. Our analyses account for data heterogeneity and privacy, highlighting the fundamental costs associated with each factor and the benefits of knowledge transfer in federated learning. Mengchu Li Ye Tian Yang Feng Yi Yu 89 pages, 4 figures http://arxiv.org/abs/2503.22080v2 2025-04-20T11:16:34Z 2025-03-28T01:57:15Z An Improved Satterthwaite Effective Degrees of Freedom Correction for Weighted Syntheses of Variance This article presents an improved approximation for the effective degrees of freedom in the Satterthwaite (1941, 1946) method, which estimates the distribution of a weighted combination of variance components. The standard Satterthwaite approximation assumes a scaled chi-square distribution for the composite variance estimator but is known to be biased downward when component degrees of freedom are small. 
Building on recent work by von Davier (2025), we propose an adjusted estimator that corrects this bias by modifying both the numerator and denominator of the traditional formula. The new approximation incorporates a weighted average of component degrees of freedom and a scaling factor that ensures consistency as the number of components or their degrees of freedom increases. We demonstrate the utility of this adjustment in practical settings, including Rubin's (1987) total variance estimation in multiple imputations, where weighted variance combinations are common. The proposed estimator generalizes and further improves von Davier's (2025) unweighted case and more accurately approximates synthetic variance estimators with arbitrary weights. Matthias von Davier http://arxiv.org/abs/2504.14555v1 2025-04-20T10:03:08Z 2025-04-20T10:03:08Z Nonparametric Estimation in Uniform Deconvolution and Interval Censoring In the uniform deconvolution problem one is interested in estimating the distribution function $F_0$ of a nonnegative random variable, based on a sample with additive uniform noise. A peculiar and not well understood phenomenon of the nonparametric maximum likelihood estimator in this setting is the dichotomy between the situations where $F_0(1)=1$ and $F_0(1)<1$. If $F_0(1)=1$, the MLE can be computed in a straightforward way and its asymptotic pointwise behavior can be derived using the connection to the so-called current status problem. However, if $F_0(1)<1$, one needs an iterative procedure to compute it and the asymptotic pointwise behavior of the nonparametric maximum likelihood estimator is not known. In this paper we describe the problem and connect it to interval censoring problems and a more general model studied in Groeneboom (2024) to state two competing, naturally occurring conjectures for the case $F_0(1)<1$. Asymptotic arguments related to smooth functional theory and extensive simulations lead us to bet on one of these two conjectures. 
Piet Groeneboom Geurt Jongbloed 16 pages, 4 figures http://arxiv.org/abs/2502.09832v2 2025-04-20T00:32:19Z 2025-02-14T00:24:51Z Algorithmic contiguity from low-degree conjecture and applications in correlated random graphs In this paper, assuming a natural strengthening of the low-degree conjecture, we provide evidence of computational hardness for two problems: (1) the (partial) matching recovery problem in the sparse correlated Erd\H{o}s-R\'enyi graphs $\mathcal G(n,q;\rho)$ when the edge-density $q=n^{-1+o(1)}$ and the correlation $\rho<\sqrt{\alpha}$ lies below Otter's threshold, solving a remaining problem in \cite{DDL23+}; (2) the detection problem between the correlated sparse stochastic block model $\mathcal S(n,\tfrac{\lambda}{n};k,\epsilon;s)$ and a pair of independent stochastic block models $\mathcal S(n,\tfrac{\lambda s}{n};k,\epsilon)$ when $\epsilon^2 \lambda s<1$ lies below the Kesten-Stigum (KS) threshold and $s<\sqrt{\alpha}$ lies below Otter's threshold, solving a remaining problem in \cite{CDGL24+}. One of the main ingredients in our proof is the derivation of certain forms of \emph{algorithmic contiguity} between two probability measures based on bounds on their low-degree advantage. To be more precise, consider the high-dimensional hypothesis testing problem between two probability measures $\mathbb{P}$ and $\mathbb{Q}$ based on the sample $\mathsf Y$. We show that if the low-degree advantage $\mathsf{Adv}_{\leq D} \big( \frac{\mathrm{d}\mathbb{P}}{\mathrm{d}\mathbb{Q}} \big)=O(1)$, then (assuming the low-degree conjecture) there is no efficient algorithm $\mathcal A$ such that $\mathbb{Q}(\mathcal A(\mathsf Y)=0)=1-o(1)$ and $\mathbb{P}(\mathcal A(\mathsf Y)=1)=\Omega(1)$. This framework provides a useful tool for performing reductions between different inference tasks. Zhangsong Li 37 pages. 
Fixed several typos and added a proof of Theorem~3.2 assuming only the original low-degree conjecture in Appendix~C http://arxiv.org/abs/2310.20460v3 2025-04-19T23:59:36Z 2023-10-31T13:54:11Z Aggregating Dependent Signals with Heavy-Tailed Combination Tests Combining dependent p-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, p-value combination tests based on regularly varying-tailed distributions, such as the Cauchy combination test and harmonic mean p-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of p-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the p-values. First, when p-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided p-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise quasi-asymptotic dependence, such as with bivariate t-distributed test statistics, our simulations suggest that these combination tests can remain valid and exhibit notable power gains over Bonferroni, even as the significance level diminishes. These findings highlight the potential advantages of these combination tests in scenarios where p-values exhibit substantial dependence. Our simulations also examine how test performance depends on the support and tail heaviness of the underlying distributions. 
Lin Gui Yuchao Jiang Jingshu Wang http://arxiv.org/abs/2401.13760v2 2025-04-19T19:50:12Z 2024-01-24T19:18:46Z Early Detection of Treatments Side Effect: A Sequential Approach With the emergence and spread of infectious diseases with pandemic potential, such as COVID-19, the urgency of vaccine development has led to unprecedentedly compressed and accelerated schedules that shortened the standard development timeline. In a relatively short time, the leading pharmaceutical companies received an Emergency Use Authorization (EUA) for the vaccines' en masse deployment. To monitor the potential side effect(s) of the vaccine during the (initial) vaccination campaign, we developed an optimal sequential test that allows for the early detection of potential side effect(s). This test employs a rule to stop the vaccination process once the observed number of side effect incidents exceeds a certain (pre-determined) threshold. The optimality of the proposed sequential test is justified by comparison with the $(\alpha, \beta)$ optimality of the non-randomized fixed-sample Uniformly Most Powerful (UMP) test. In the case of a single side effect, we study the properties of the sequential test and derive the exact expressions of the Average Sample Number (ASN) curve of the stopping time (and its variance) via the regularized incomplete beta function. Additionally, we derive the asymptotic distribution of the relative savings in ASN as compared to the maximal sample size. Moreover, we construct the post-test parameter estimate and study its sampling properties, including its asymptotic behavior under local-type alternatives. These limiting results comprise the consistency and asymptotic normality of the post-test parameter estimator. We conclude the paper with a small simulation study illustrating the asymptotic performance of the point and interval estimation and provide a detailed example, based on COVID-19 side effect data (see Beatty et al. 
(2021)) of our suggested testing procedure. Jiayue Wang Ben Boukai There are 21 pages, 8 pictures and 4 tables http://arxiv.org/abs/2409.15597v2 2025-04-19T14:53:49Z 2024-09-23T23:04:47Z Higher-criticism for sparse multi-stream change-point detection We study a statistical procedure based on higher criticism (HC) to address the sparse multi-stream quickest change-point detection problem. Namely, we aim to detect a potential change in the distribution of multiple data streams at some unknown time. If a change occurs, only a few streams are affected, whereas the identity of the affected streams is unknown. The HC-based procedure involves testing for a change point in individual streams and combining multiple tests using higher criticism. Relying on HC thresholding, the procedure also indicates a set of streams suspected to be affected by the change. We provide a theoretical analysis under a sparse heteroscedastic normal change-point model. We establish an information-theoretic detection delay lower bound when individual tests are based on the likelihood ratio or the generalized likelihood ratio statistics and show that the delay of the HC-based method converges in distribution to this bound. In the special case of constant variance, our bound coincides with known results in (Chan, 2017). We demonstrate the effectiveness of the HC-based method compared to other methods in detecting sparse changes through extensive numerical evaluations. Tingnan Gong Alon Kipnis Yao Xie Authors are listed in alphabetical order