http://arxiv.org/abs/2605.14432v2
Singular Asymptotics of SPADE in Quantum Source Discrimination
2026-05-31T07:19:17Z
We study far-field discrimination between one and two incoherent point sources in the singular regime of weak and closely spaced emitters. Under ideal alignment, spatial-mode demultiplexing (SPADE) attains the quantum-optimal large-sample Stein exponent, but the finite-photon behavior near the one-source boundary and the effect of realistic imperfections remain less understood. Using singular learning theory, we analyze both the aligned and misaligned problems. In the aligned Gaussian case, we derive the zeta-function poles for direct imaging and SPADE, show that both share the same real log canonical threshold $λ=1/2$ but differ in multiplicity, and obtain the corresponding Bayes free-energy asymptotics. This yields a universal subleading advantage of aligned SPADE in the local prior-weighted regime. In the misaligned setting, we study a physically motivated binary-SPADE reduction that retains the full leading $O(s^2)$ leakage contrast near alignment, with corrections from the detailed higher-mode redistribution entering only at $O(s^4)$. We show that misaligned binary-SPADE and direct imaging acquire nontrivial local power on different intrinsic scales, $s=O(n^{-1/4})$ and $s=O(n^{-1/2})$, respectively. However, finite-$n$ Neyman--Pearson comparisons under common physical conditions reveal that direct imaging is stronger on the plotted grids and that misaligned binary-SPADE exhibits an exact blind separation $s^\ast=2θ$, where its power collapses to $α$.
These results identify model singularity as a structural organizing principle for finite-photon quantum discrimination and clarify how ideal aligned SPADE benchmarks can fail to translate into finite-$n$ advantages under misalignment.
2026-05-14T06:26:41Z
13 pages, 2 figures
Natsuki Kariya

http://arxiv.org/abs/2606.01034v1
A Finite-Calibration Regime Map for LLM Judge Panels
2026-05-31T05:50:27Z
We study when LLM judge panels should be calibrated with low-dimensional stackers versus joint output tables under finite human-label budgets. Low-dimensional stackers have small estimation cost but miss interactions, whereas joint-table calibrators can represent interactions but pay for cell counts and unseen patterns. We cast this tradeoff as a finite-calibration regime map and instantiate it as Finite-Calibration Panel Selection, a deployable validation selector over judge path, prefix size, and aggregator family with table and parametric estimation diagnostics. On RewardBench, LLMBar, SummEval, and Arena100K with a seven-judge pool including DeepSeek V4 Flash, scalar/reliability aggregation wins 16 of 20 real dataset--budget cells, indicating that current judge outputs are often additive or redundant. Controlled calibration-growth data show the complementary regime: additive labels remain scalar-favored, whereas a six-way interaction selects a larger joint table and its test MSE drops from 0.224 to 0.061 once unseen mass vanishes.
Thus the practical question is not ``how many judges?'' but whether the next judge's information is estimable under the available human labels.
2026-05-31T05:50:27Z
Work in Progress
Bin Zhu, Yanghui Rao

http://arxiv.org/abs/2606.01011v1
Semiparametric Efficiency of Residual Correlation Testing under Gaussian Additive Noise Models
2026-05-31T04:59:33Z
This paper studies conditional independence testing under the Gaussian additive noise model (GANM), where two variables are modeled as nonlinear functions of covariates with independent bivariate Gaussian regression errors. Under this framework, conditional independence can be characterized by the correlation coefficient of the regression errors, which motivates a test based on the Pearson correlation coefficient computed from the fitted residuals. Despite its simple form, the asymptotic behavior and statistical efficiency of the resulting test have not been well understood. In this paper, we develop the semiparametric efficiency theory under GANM and show, surprisingly, that the efficient estimator coincides exactly with the ordinary residual Pearson correlation estimator. We further establish the asymptotic properties of the proposed test and develop the corresponding inference procedure. Simulation studies demonstrate that the proposed method achieves near-oracle efficiency and competitive empirical power while maintaining valid Type I error control. We further apply the proposed test to conditional dependence analysis of U.S. stock returns.
2026-05-31T04:59:33Z
Yin Tang, Yanyuan Ma, Bing Li

http://arxiv.org/abs/2606.01002v1
Theoretical Analysis of Engression and Reverse Markov Engression
2026-05-31T04:37:44Z
Engression is a recently proposed and effective framework for conditional distribution learning. Its multi-step Reverse Markov extension further improves generative flexibility by decomposing complex conditional sampling into sequential reverse transitions.
Despite their strong empirical performance, rigorous finite-sample statistical guarantees for these methods remain unavailable. In this paper, under deep neural network parameterizations, we establish nonasymptotic convergence bounds for Engression by directly controlling the Energy Distance between the learned and target conditional distributions. For the Reverse Markov framework, we further develop an Energy-Distance-based chain rule that enables a rigorous analysis of error propagation across reverse steps. Our analysis yields corresponding excess-risk bounds that are near-optimal up to logarithmic factors relative to the classical minimax rate over a general Hölder class.
2026-05-31T04:37:44Z
Jiaqi Huang, Gongjun Xu, Ji Zhu

http://arxiv.org/abs/2606.00965v1
Design-based edge-level causal inference with machine learning assisted covariate adjustment
2026-05-31T02:47:56Z
We study design-based causal inference for edge-level outcomes in directed networks under dyadic interference. In this setting, outcomes are defined on directed edges and depend on the joint treatment assignments of pairs of units, inducing a complex dependence structure that invalidates standard estimation and inference procedures developed for node-level data. We construct Horvitz--Thompson estimators for a general class of edge-level causal effects and establish their asymptotic normality under mild regularity conditions. To enable valid inference, we develop variance estimators that exploit identifiable components of network dependence, yielding substantially less conservative bounds than classical approaches. To improve efficiency, we incorporate auxiliary covariates through a sample splitting and cross-fitting procedure. A key technical challenge is that standard two-fold sample splitting fails in the presence of edge-level outcomes due to the dependence induced by shared units.
To address this issue, we introduce a three-fold sample splitting and cross-fitting scheme that restores the conditional independence required for unbiased estimation. Under a stability condition, the resulting covariate-adjusted estimator is asymptotically normal and accommodates both linear adjustment and flexible machine learning methods. We further introduce a calibration step that guarantees no asymptotic efficiency loss relative to the unadjusted estimator. Simulation studies and a real-data application confirm the theoretical results and demonstrate substantial efficiency gains.
2026-05-31T02:47:56Z
Haoyang Yu, Yilin Li, Lu Deng, Yong Wang, Xin Lu, Hanzhong Liu

http://arxiv.org/abs/2606.00934v1
Efficient Synthetic Network Generation via Latent Embedding Reconstruction
2026-05-31T00:01:13Z
Network data are ubiquitous across the social sciences, biology, and information systems. Generating realistic synthetic network data has broad applications from network simulation to scientific discovery. However, many existing black-box approaches for network generation tend to overfit observed data while overlooking characteristic network structure, and incur substantial computational overhead at scale. These practical challenges call for synthetic network generation methods that are both efficient and capable of capturing structural properties of networks. In this paper, we introduce Synthetic Network Generation via Latent Embedding Reconstruction (SyNGLER), a general and efficient framework for synthetic network generation that builds on latent space network models. Given an observed network, SyNGLER first learns low-dimensional latent node embeddings via a latent space network model and then reconstructs the latent space by building a distribution-free generator over these embeddings. For generation, SyNGLER first samples (or resamples) node embeddings from the generator in the latent space and then produces synthetic networks using the latent space network model.
Through the latent space framework, SyNGLER preserves unique characteristics in networks such as sparsity and node degree heterogeneity, while allowing for efficient training with lower computational cost than many existing deep architectures. We provide theoretical guarantees by developing consistency results on the distance between the true and synthetic edge distributions. Empirical studies further demonstrate the effectiveness of SyNGLER, which efficiently produces networks that better preserve key network characteristics such as network moments and degree distributions compared with existing approaches. Code is available at https://github.com/FeifanJiang/syngler.
2026-05-31T00:01:13Z
Feifan Jiang, Yinan Bu, Shihao Wu, Gongjun Xu, Ji Zhu

http://arxiv.org/abs/2210.02850v2
Evaluating the Impact of COVID-19 Vaccination in the United Kingdom: A Gaussian Process Approach
2026-05-30T22:54:23Z
The rapid rollout of COVID-19 vaccines in the United Kingdom in early 2021 differed markedly from that of many other European countries, providing a natural setting to assess the impact of vaccination speed on public health outcomes. We evaluate the impact of the accelerated UK vaccination rollout and associated policy transition on COVID-19 mortality and transmission dynamics by constructing a probabilistic reference trajectory for the UK under a slower vaccination and reopening trajectory. The proposed framework combines ideas from interrupted time series analysis and synthetic control methods with flexible probabilistic modelling based on multi-output Gaussian processes. These models capture non-linear and heterogeneous dependence structures across countries and over time, while providing uncertainty quantification through predictive distributions.
A central feature of the methodology is a design-consistent validation strategy based on predictive performance in held-out pre-intervention periods, which is used both to guide model specification and to assess the plausibility of the reconstructed reference trajectory. The empirical results indicate a substantial reduction in COVID-19 mortality associated with the accelerated vaccination-policy transition, with little evidence of an effect on transmission rates. Generally, the framework illustrates how flexible probabilistic models and predictive validation can support causal and policy evaluation in complex time series settings.
2022-10-06T12:10:57Z
Gianluca Giudice, Sara Geneletti, Konstantinos Kalogeropoulos

http://arxiv.org/abs/2606.00900v1
Notes on Randomized Controlled Trials for Studying Social Media Harms
2026-05-30T21:46:00Z
Randomized controlled trials (RCTs) and person-level observational studies feature prominently in debates over social media harms. I highlight some under-acknowledged limitations of such evidence. Most important is that published RCTs typically identify effects of a \textit{local}, or small-scale, intervention: a person is assigned to quit social media, but her immediate peers continue using it in large numbers. Critics of social media, in contrast, focus on a \textit{global}, or large-scale, intervention: the mass adoption of social media among U.S. teenagers. Such global interventions alter both the proximal social environment and the broader culture, potentially harming teenagers who abstain from social media entirely. This paper discusses the local--global distinction at length and offers other notes on the limits of learning about social media harms from existing RCTs and person-level observational studies.
I suggest that triangulating different forms of imperfect evidence may provide the deepest insights about social media's aggregate effect on teen mental health.
2026-05-30T21:46:00Z
34 pages, 2 figures
Chris Felton

http://arxiv.org/abs/2606.00887v1
Hypothesis Testing for a Functional Parameter via Self-normalization
2026-05-30T20:45:33Z
Testing simple or composite hypotheses on a functional parameter has attracted considerable attention in time series analysis. To accommodate the unknown temporal dependence, classical nonparametric approaches such as block bootstrapping and subsampling all involve a bandwidth parameter, the choice of which can substantially affect the finite-sample performance. The self-normalization (SN) method is tuning-parameter free when applied to inference for a finite-dimensional parameter, but its applicability to a functional parameter is unknown.
In this paper, we propose a sample splitting based approach to generalize the SN method to hypothesis testing of a functional parameter. Our SS-SN (sample splitting plus self-normalization) idea is broadly applicable to many testing problems for functional parameters, including testing simple/composite hypotheses on the marginal cumulative distribution function, testing for time-reversibility and testing for a change point in the spectral distribution of a multivariate time series. Specifically, we derive the pivotal limiting distributions of our SS-SN test statistics under the null for both simple and composite null hypotheses, and derive the limiting power function under local alternatives. Numerical simulations show that our new tests tend to yield accurate size with competitive power compared to many existing ones.
2026-05-30T20:45:33Z
Journal of the American Statistical Association 2025 120 (552)
Yi Zhang, Xiaofeng Shao
10.1080/01621459.2025.2483483

http://arxiv.org/abs/2606.00878v1
Anytime-valid testing with e-values and confirmatory adaptive designs
2026-05-30T20:24:29Z
Confirmatory adaptive designs were introduced more than 30 years ago and enable, for example, sample size re-assessments and the selection of treatments, endpoints, and subpopulations during the course of a clinical trial. Recently, sequential tests based on e-values for anytime-valid inference have been developed, promising seemingly similar or even greater flexibility and utility. In this note, we compare these two independently developed concepts, shedding light on their formal and methodological connections and differences. Specifically, we show that adaptive design tools like conditional error functions and combination tests are formally equivalent to e-value based, anytime-valid sequential tests.
However, in spite of their common fundamental intention to bring flexibility into statistical inference, they have quite different emphases: while hypothesis testing with combination tests and conditional error functions usually aims to exhaust type I error rates under the offered flexibility, e-value based testing aims at additional flexibility with regard to optional continuation, the chosen level and, in recent extensions, the loss functions to be controlled. We also indicate how recent e-value achievements could enrich clinical trial methodology and how adaptive design methodology could inspire and improve e-value based testing.
2026-05-30T20:24:29Z
Werner Brannath, Lasse Fischer

http://arxiv.org/abs/2606.00864v1
Another Look at Bandwidth-free Inference: a Sample Splitting Approach
2026-05-30T19:40:37Z
Bandwidth-free tests/inferences for a multi-dimensional parameter have attracted considerable attention in the econometrics and statistics literature. These tests can be conveniently implemented due to their tuning-parameter-free nature and possess more accurate size than the traditional HAC-based approaches, which involve consistent long-run variance estimation. However, when the sample size is small or medium, these bandwidth-free tests exhibit large size distortion when both the dimension of the parameter and the magnitude of temporal dependence are moderate, making them unreliable in practice. In this paper, we propose a sample splitting based approach to reduce the dimension of the parameter to one for the subsequent bandwidth-free inference.
Our SS-SN (sample splitting plus self-normalization) idea is broadly applicable to many testing problems for time series, including mean testing, testing for zero autocorrelation, linear hypotheses testing in a time series regression model and testing for a change point in multivariate mean. Specifically, we propose $L_{\infty}$-type and $L_2$-type SS-SN test statistics, derive their limiting distributions under both the null and alternatives, and show their effectiveness in alleviating size distortion via simulations. As an important theoretical contribution, we obtain the limiting distributions for both SS-SN test statistics in the multivariate mean testing problem when the dimension is allowed to diverge as the sample size grows to infinity. In addition, we show the asymptotic independence of $L_{\infty}$-type and $L_2$-type SS-SN test statistics under the null in the growing dimensional setting.
2026-05-30T19:40:37Z
Journal of the Royal Statistical Society Series B: Statistical Methodology 2024 86 (1)
Yi Zhang, Xiaofeng Shao
10.1093/jrsssb/qkad108

http://arxiv.org/abs/2606.00858v1
Change-Point Detection for Object-valued Time Series
2026-05-30T19:20:06Z
This article is concerned with change point detection for object-valued data that reside in a metric space, which has attracted recent interest in the statistics and econometrics literature. The existing methods either focus on independent data or can only detect change in the Fréchet mean or variance. In this paper, we propose a self-normalization (SN, hereafter) based statistic for detecting a shift in the marginal distribution of object-valued time series. Our test is universally applicable to a wide range of object-valued data, such as distributional and network data, and can accommodate weak serial dependence. In addition, the proposed test statistic is almost tuning parameter free, has a pivotal limiting null distribution and only uses the pairwise distances.
When combined with the Wild Binary Segmentation algorithm (WBS, hereafter), our statistic can be used to estimate the number and locations of multiple change points. Asymptotic results for our SN based statistic are derived under both null and local alternatives in the single change point setting. For the first time, the WBS estimation consistency is shown for a broad class of object-valued time series and in a nonparametric setting, which requires new non-standard theoretical arguments. Extensive numerical experiments and real data analysis are conducted to illustrate the effectiveness and broad applicability of our proposed method.
2026-05-30T19:20:06Z
Journal of Business and Economic Statistics 2026 44 1
Yi Zhang, Changbo Zhu, Xiaofeng Shao
10.1080/07350015.2025.2520862

http://arxiv.org/abs/2606.00847v1
Partial Identification under High-Dimensional Potential Outcomes and Confounders via Optimal Transport
2026-05-30T18:48:32Z
Partial identification provides informative causal guarantees when point identification is impossible, but existing approaches based on optimal transport (OT) become computationally and statistically intractable in high-dimensional settings. This limitation is particularly severe when both potential outcomes and confounders are high-dimensional, where classical OT-based bounds suffer from the curse of dimensionality and unfavorable convergence rates. To address this challenge, we propose a novel estimator that decomposes the transport problem into a low-dimensional signal subspace and a high-dimensional residual subspace. Unlike existing projection-based methods that discard residual information, we recover the residual transport energy using the Sliced Wasserstein distance, which is computationally efficient and robust to high dimensions. We establish interpretable conditions controlling the approximation gap based on residual structure and provide a data-driven rule for signal dimension selection.
Empirical results show that our estimator consistently outperforms projection-only baselines by recovering lost transport energy, yielding more informative causal bounds while remaining computationally tractable in high dimensions.
2026-05-30T18:48:32Z
19 pages, 3 figures
Yunfeng Wang, Zhiheng Zhang, Zijun Gao

http://arxiv.org/abs/2606.00839v1
Sequential multiple testing with multiple hypotheses and prior information on the hypothesis configuration
2026-05-30T18:29:46Z
In this work, we study the problem of testing the marginal distributions of multiple independent, sequentially observed data streams, where for each stream there are multiple candidate hypotheses to select from, in the presence of prior information on the unknown hypothesis configuration. The goal is to understand the benefit of such information and to design a sequential testing procedure that effectively leverages it. We start with arbitrary prior information and specialize to concrete examples, including a known number or a known lower bound on the number of streams following each hypothesis, and the presence of exclusive hypotheses. The designed procedure is three-fold: (i) reliable, i.e., controlling all types of familywise error probabilities below arbitrary user-specified levels, (ii) computationally efficient, i.e., focusing on minimal sets of alternative hypothesis configurations in making decisions, and (iii) asymptotically optimal, i.e., achieving the minimum expected sample size among all reliable procedures asymptotically as the error levels go to zero.
Numerical studies are presented for illustration.
2026-05-30T18:29:46Z
20 pages, 4 figures
Yiming Xing

http://arxiv.org/abs/2507.21692v2
Signal Detection under Composite Hypotheses with Identical Distributions for Signals and for Noises
2026-05-30T18:28:23Z
In this paper, we consider the problem of detecting signals in multiple, sequentially observed data streams, where the distribution of each stream lies in one of two common composite spaces, depending on whether it is a signal or a noise. For this problem, we study a practical yet underexplored setting where it is a priori known that all signals have an identical distribution and so do all noises. Compared to the general setting where local distributions are free to take any values, this structure facilitates faster decision-making thanks to a smaller joint distribution space. However, it introduces additional challenges to the analysis of the problem and the design of tests, since the local distributions are now coupled. In this paper, we first establish a universal lower bound on the minimum expected sample size, which characterizes the essential difficulty of the problem and involves constants that are neither the minimum Kullback-Leibler divergences from the signal/noise distribution to the noise/signal distribution space, which appear in the lower bound for the general setting, nor the Kullback-Leibler divergences between the signal distribution and the noise distribution. In addition, we propose a test that controls the two types of familywise error rates below arbitrary levels and achieves the minimum expected sample size asymptotically as the levels go to zero. Numerical studies are presented to compare with the state-of-the-art test for the general setting and demonstrate robustness against model misspecification.
2025-07-29T11:15:48Z
9 pages, 3 figures
Yiming Xing, Anamitra Chaudhuri, Yifan Chen