https://arxiv.org/api/UHiOO8o1URzZm2j/R7u8mPJLOXs 2026-06-10T02:03:11Z 36124 75 15 http://arxiv.org/abs/2605.16866v3 Heavy Tails and Predictive Ability Testing 2026-06-07T13:11:45Z

We study the asymptotic behaviour of widely used tests for evaluating and comparing predictive accuracy when forecast errors exhibit heavy tails. In particular, when loss differentials have infinite variance, the Diebold-Mariano test statistic converges to a nonstandard limit involving non-Gaussian stable random variables. As a consequence, conventional critical values can yield severely distorted inference: a nominal 5$\%$ test may reject a true null as often as 70$\%$ of the time. To establish these results, we develop a new stable limit theorem for strongly mixing, infinite-variance time series processes. Building on this theory, we consider sub-sampling-based inference that remains valid irrespective of tail-heaviness and requires no estimation of long-run variances or tail indices. An application to risk forecasts for emerging-market exchange rates shows that accounting for heavy tails can substantially alter conclusions about predictive performance relative to standard procedures.
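
As an illustration of the statistic under study, the following minimal Python sketch computes a Diebold-Mariano statistic from a series of loss differentials, using a Bartlett-weighted long-run variance estimate. The function name `dm_statistic`, the `lags` parameter, and the Pareto-based simulation are assumptions made for illustration only; the sketch does not implement the paper's subsampling procedure or its limit theory.

```python
import math
import random


def dm_statistic(d, lags=0):
    """Diebold-Mariano statistic for a loss-differential series d.

    Uses a Bartlett-weighted (Newey-West style) long-run variance estimate;
    lags=0 reduces it to the plain sample variance (divisor n).
    """
    n = len(d)
    mean = sum(d) / n
    centered = [x - mean for x in d]
    lrv = sum(c * c for c in centered) / n
    for k in range(1, lags + 1):
        gamma_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / n
        lrv += 2.0 * (1.0 - k / (lags + 1)) * gamma_k
    return mean / math.sqrt(lrv / n)


# Hypothetical heavy-tailed example: the difference of two independent
# Pareto(1.5) losses has mean zero but infinite variance.
rng = random.Random(1)
d = [rng.paretovariate(1.5) - rng.paretovariate(1.5) for _ in range(500)]
stat = dm_statistic(d, lags=4)
reject_at_5_percent = abs(stat) > 1.96  # conventional Gaussian critical value
```

The comparison with 1.96 is exactly the conventional step whose validity the abstract questions when the loss differentials have infinite variance.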

2026-05-16T07:58:02Z 72 pages, 3 figures. Application in Econometrics Jonas F. Frederiksen Muneya Matsui Rasmus S. Pedersen http://arxiv.org/abs/2602.05553v2 Sensitivity analysis for contamination in egocentric-network randomized trials with interference 2026-06-07T10:51:15Z

Egocentric-Network Randomized Trials (ENRTs) are increasingly used to estimate causal effects under interference when measuring complete sociocentric network data is infeasible. ENRTs rely on egocentric network sampling, where a set of egos is first sampled, and each ego recruits a subset of its neighbors as alters. Treatments are then randomized across egos. While the observed ego-networks are disjoint by design, the underlying population network may contain edges connecting them, leading to contamination. Under a design-based framework, we show that the Horvitz-Thompson estimators of direct and indirect effects are biased whenever contamination is present. To address this, we derive bias-corrected estimators and propose a novel sensitivity analysis framework based on sensitivity parameters representing the probability or expected number of missing edges. This framework is implemented via both grid sensitivity analysis and probabilistic bias analysis, providing researchers with a flexible tool to assess the robustness of the causal estimators to contamination. We apply our methodology to the HIV Prevention Trials Network 037 study, finding that ignoring contamination may lead to underestimation of indirect effects and overestimation of direct effects.

2026-02-05T11:23:23Z Bar Weinstein Daniel Nevo http://arxiv.org/abs/2606.08560v1 CP-factorization for high dimensional tensor time series and double projection iterations 2026-06-07T10:34:03Z

We adopt the canonical polyadic (CP) decomposition to model high-dimensional tensor time series. Our primary goal is to identify and estimate the factor loadings in the CP decomposition. We propose a one-pass estimation procedure that applies standard eigen-analysis to a matrix constructed from the serial dependence structure of the data. The asymptotic properties of the proposed estimator are established under a general setting that requires only that the factor loading vectors be linearly independent; the factors may be correlated, and the factor loading vectors need not be nearly orthogonal. The procedure adapts to the sparsity of the factor loading vectors, accommodates weak factors, and demonstrates strong performance across a wide range of scenarios. To further reduce estimation errors, we also introduce an iterative algorithm based on a novel double projection approach. We theoretically justify the improved convergence rate of the iterative estimator, and derive the associated limiting distribution. A consistent estimator of the asymptotic variance is also provided, which plays a key role in the related inference problems. All results are validated through extensive simulations and two real data applications.

2026-06-07T10:34:03Z Jinyuan Chang Guanglin Huang Qiwei Yao Long Yu http://arxiv.org/abs/2606.08551v1 Enhanced localized conformal prediction with imperfect auxiliary information 2026-06-07T10:12:30Z

There is growing interest in constructing conformal prediction sets that provide approximate or asymptotic conditional coverage guarantees, capturing local data heterogeneity. However, methods like localized conformal prediction (LCP) may face challenges in ensuring reliable prediction sets in regions with sparse calibration data. This paper introduces Enhanced Localized Conformal Prediction (ELCP), a novel approach that incorporates auxiliary data to refine localized prediction sets while preserving finite-sample marginal coverage guarantees. By utilizing a density-ratio-weighted kernel estimator, ELCP seamlessly integrates auxiliary and calibration data, accommodating potential distributional shifts and improving the local reliability of prediction sets. Theoretical analysis confirms that ELCP maintains marginal coverage and enhances asymptotic test-conditional coverage. Simulation results demonstrate its superior local coverage and smaller prediction sets compared to standard LCP, highlighting its effectiveness in settings with limited calibration data but available auxiliary information from related tasks.

2026-06-07T10:12:30Z Yinjie Min Liuhua Peng Changliang Zou http://arxiv.org/abs/2606.08499v1 A Transferability Criterion for Null-Optimized Variance Reduction in Cumulant-Based Error-Independence Testing 2026-06-07T07:56:14Z

Control-variate and polynomial-maximization (PMM) estimators are optimized at a single fixed distribution, yet they are increasingly proposed to strengthen hypothesis tests, which decide between two regions of a parameter family. We give a closed-form criterion for when this transfer succeeds. For an H0-centered augmentation of a target moment statistic with null-optimized weight vector K0, the alternative-side expectation equals the target plus K0^T mu_a,H1, where mu_a,H1 is the alternative-side mean of the augmenting basis. Null-variance reduction therefore transfers without bias only under the orthogonality condition K0^T mu_a,H1 = 0; requiring each augmenting function to remain mean-zero is sufficient but not necessary. We instantiate the criterion on the recently proposed Wiedermann-Shi third-order cumulant test for measurement-error independence. A second-order PMM correction is unbiased and lower-variance under the null (relative efficiency >= 1 in all 36 conditions; aggregated mean ARE values 1.23-5.16; Type-I 0.04-0.09), yet provably inconsistent under the alternative: the antisymmetric polynomial auxiliaries acquire nonzero means, attenuating the target by a closed-form factor and costing 7-52 percentage points of power, worst where the test is strongest and worsening under heavy tails. A fourth-order variant reduces variance (ratio 1.127) but fails a nuisance guard (rejection 0.295 versus 0.10). We derive a reusable alternative-consistency acceptance gate for variance-reduced test statistics.
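
The transfer criterion stated above can be written out in one line. The symbols $T$ (target statistic), $a$ (the $H_0$-centered augmenting basis), and $T_{\mathrm{aug}}$ are notation assumed here for illustration; only $K_0$ and $\mu_{a,H_1}$ come from the abstract.

```latex
\begin{align*}
T_{\mathrm{aug}} &= T + K_0^{\top} a, \qquad \mathbb{E}_{H_0}[a] = 0, \\
\mathbb{E}_{H_1}[T_{\mathrm{aug}}] &= \mathbb{E}_{H_1}[T] + K_0^{\top}\mu_{a,H_1},
  \qquad \mu_{a,H_1} = \mathbb{E}_{H_1}[a], \\
\text{no alternative-side bias} &\iff K_0^{\top}\mu_{a,H_1} = 0 .
\end{align*}
```

For example, $K_0 = (1, -1)^{\top}$ and $\mu_{a,H_1} = (0.3, 0.3)^{\top}$ satisfy the orthogonality condition even though neither augmenting function is mean-zero under the alternative, which is why the mean-zero requirement is sufficient but not necessary.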

2026-06-07T07:56:14Z 16 pages; no figures; submitted manuscript version Serhii Zabolotnii http://arxiv.org/abs/2606.08498v1 Tests for Independence of High-Dimensional Nonstationary Time Series 2026-06-07T07:54:53Z

This manuscript studies the problem of independence testing between two high-dimensional time series without assuming weak stationarity, that is, allowing their autocovariances to vary over time. To this end, we propose a bimodal weighted-average test statistic that removes the bias induced by temporal dependence under the null hypothesis, thereby avoiding the need to whiten the time series prior to hypothesis testing -- a procedure that is challenging in high-dimensional and nonstationary settings. To facilitate statistical inference, we develop a dependent wild bootstrap procedure. On the theoretical side, we derive a concentration inequality for quadratic forms of time series data stemming from a class of high-dimensional, nonlinear, and nonstationary processes. This result enables us to derive the asymptotic null distribution of the proposed test statistic and to establish the validity of the bootstrap algorithm. Numerical results show that the proposed test attains desired size and good power performance even when the dimension exceeds the sample size or when the data-generating process exhibits time-varying autocovariances. In contrast, tests based on whitening time series fail to maintain correct size in the presence of unstable autocovariance structures. Since nonstationary autocovariances commonly arise in real-life time series data, our work offers a robust procedure for independence testing.

2026-06-07T07:54:53Z Yunyi Zhang http://arxiv.org/abs/2606.08475v1 Parameter uncertainty in dynamical models: a practical identifiability index 2026-06-07T06:44:39Z

Ordinary differential equation models are widely used to understand and forecast complex dynamical systems, but their predictive value depends on reliable parameter estimation. Structural identifiability assesses whether parameters can be uniquely recovered from ideal observations, whereas practical identifiability depends on finite, noisy and partially observed data. We introduce the Practical Identifiability Index (PII), a marginal uncertainty-width metric based on the logarithmic span of confidence intervals. Expressed on an order-of-magnitude scale, the PII summarises how tightly individual positive-valued parameters are constrained by available observations, enabling comparison across parameters, models, error structures and observation designs. The PII is intended as a complementary diagnostic, not a standalone identifiability test, and should be interpreted alongside coverage, profile likelihoods, posterior summaries, sensitivity analysis or structural identifiability results. Using parametric bootstrap experiments across growth and compartmental epidemic models, we identify consistent principles: uncertainty decreases as calibration windows become more informative, increases with observation noise and parameter coupling, and remains high for latent or indirectly observed processes. Parameters governing early observable dynamics become constrained sooner, while additional observables can improve constraint for latent progression and recovery parameters. The PII provides a simple, reportable summary of marginal parameter uncertainty for dynamical modelling.
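
A rough sketch of how such an index might be computed from bootstrap output is given below. It assumes that the logarithmic span is the base-10 logarithm of the ratio of the upper to the lower confidence limit; the paper's exact definition may differ, and the function names `percentile_interval` and `practical_identifiability_index` are hypothetical.

```python
import math


def percentile_interval(samples, level=0.95):
    """Simple percentile interval from bootstrap estimates (conservative rounding)."""
    s = sorted(samples)
    n = len(s)
    lo_idx = math.floor((1 - level) / 2 * (n - 1))
    hi_idx = math.ceil((1 + level) / 2 * (n - 1))
    return s[lo_idx], s[hi_idx]


def practical_identifiability_index(lower, upper):
    """Assumed form of the index: log10 span of a CI for a positive parameter."""
    if lower <= 0 or upper < lower:
        raise ValueError("need 0 < lower <= upper")
    return math.log10(upper / lower)


# Hypothetical bootstrap estimates of a positive rate parameter.
estimates = [0.8, 0.9, 1.0, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.0]
lower, upper = percentile_interval(estimates)
index = practical_identifiability_index(lower, upper)
```

Under this assumed form, an index of 1 means the interval spans one order of magnitude, and values near 0 indicate a tightly constrained parameter.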

2026-06-07T06:44:39Z Hamed Karami Alexandra Smirnova Sunmi Lee Gerardo Chowell http://arxiv.org/abs/2606.08468v1 Nonparametric undirected graphical model selection using diffusion models 2026-06-07T06:22:13Z

Undirected graphical models provide a fundamental framework for representing conditional independence structures among high-dimensional random variables. While undirected graphical model selection has become a central problem in high-dimensional statistics, most existing methods are restricted to parametric settings. In this paper, we develop a nonparametric approach to undirected graphical model selection based on diffusion models. Recent work has shown that diffusion models can adapt to the unknown graph structure of the underlying distribution, yet utilizing these models for explicit graph estimation remains unexplored. To bridge this gap, we introduce a novel diffusion-based method for nonparametric undirected graphical model selection. We establish the model selection consistency of the proposed method and demonstrate its empirical performance through extensive simulations and two real data analyses.

2026-06-07T06:22:13Z Hyeok Kyu Kwon Myeonggu Kang Minwoo Chae Wanjie Wang http://arxiv.org/abs/2103.11066v6 Treatment Allocation under Uncertain Costs 2026-06-07T03:51:53Z

We consider the problem of learning how to optimally allocate treatments whose cost is uncertain and can vary with pre-treatment covariates. This setting may arise in medicine if we need to prioritize access to a scarce resource that different patients would use for different amounts of time, or in marketing if we want to target discounts whose cost to the company depends on how much the discounts are used. Here, we show that the optimal treatment allocation rule under budget constraints is a thresholding rule based on priority scores (those with a higher score are treated first), and we propose a number of practical methods for learning these priority scores using data from a randomized trial. Our formal results leverage a statistical connection between our problem and that of learning heterogeneous treatment effects under endogeneity using an instrumental variable. We find our method to perform well in a number of empirical evaluations.
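
The thresholding rule can be sketched as follows. The sketch assumes that each unit's priority score is the ratio of its estimated benefit to its estimated cost and that both are already known; the paper's contribution is how to learn such scores from trial data, which is not shown here. The function name `allocate_by_priority` and the example numbers are hypothetical.

```python
def allocate_by_priority(units, budget):
    """Treat units in decreasing order of priority score until the budget binds.

    units: list of (name, expected_benefit, expected_cost) with positive costs.
    Assumed score: expected_benefit / expected_cost.
    """
    ranked = sorted(units, key=lambda u: u[1] / u[2], reverse=True)
    treated, spent = [], 0.0
    for name, benefit, cost in ranked:
        # Stop at the threshold: no remaining budget or no expected benefit.
        if benefit <= 0 or spent + cost > budget:
            break
        treated.append(name)
        spent += cost
    return treated


patients = [("A", 4.0, 2.0), ("B", 3.0, 1.0), ("C", 1.0, 2.0)]
chosen = allocate_by_priority(patients, budget=3.0)
```

The loop stops at the first unit that does not fit rather than skipping ahead, so the result is a pure threshold on the score; randomized treatment of the marginal unit is ignored in this sketch.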

2021-03-20T00:36:28Z Georgy Kalashnov Evan Munro Hao Sun Shuyang Du Stefan Wager http://arxiv.org/abs/2606.08418v1 TS-Neyman: Posterior Sampling for Adaptive Stratified Estimation 2026-06-07T02:36:16Z

Many model evaluation tasks reduce to estimating an average loss, error rate, or subgroup metric on a stratified pool when each label, human rating, or simulator call is costly. The precision-optimal Neyman allocation depends on within-stratum variances, which must be learned from the same observations used for estimation. We formulate this as a sequential allocation problem and use the exact one-step marginal variance reduction as the priority index. Replacing the unknown variances by independent inverse-chi-squared posterior draws yields TS-Neyman, a Thompson-sampling rule that preserves the oracle marginal-gain structure while randomizing over variance uncertainty. For any fixed finite number of strata, we prove almost-sure convergence of the TS-Neyman allocation proportions to the Neyman target, asymptotic optimality of the variance proxy, and a central limit theorem for the resulting adaptive stratified estimator. In two five-stratum budget-scaling benchmarks, one bounded-loss benchmark and one binary model-error benchmark in the spirit of Dai et al. 2023, TS-Neyman's relative efficiency stays within 5 percent of the oracle on the bounded-loss population and within about 15 percent on the binary benchmark. In an additional CivilComments real-data replay with confidence-based strata, it stays within about 8 percent of the oracle and improves on equal allocation by roughly 7 to 14 percent in MSE across budgets, while plug-in greedy and two-stage plug-in can degrade by over an order of magnitude under sparse pilots. Common-pilot warm-start and prior-sensitivity studies show that this behavior is stable under working-model and working-prior misspecification.
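
The ingredients described above can be sketched in a few lines of Python. The sketch assumes a stratified-mean variance of the form sum over strata of W_h^2 sigma_h^2 / n_h, so that one more draw in stratum h reduces it by W_h^2 sigma_h^2 / (n_h (n_h + 1)), and it assumes a scaled inverse-chi-squared posterior with n_h - 1 degrees of freedom under a noninformative prior. The function names are hypothetical, and the paper's priors and tie-breaking rules may differ.

```python
import random
import statistics


def neyman_proportions(weights, sigmas):
    """Neyman allocation: sampling share of stratum h is proportional to W_h * sigma_h."""
    total = sum(w * s for w, s in zip(weights, sigmas))
    return [w * s / total for w, s in zip(weights, sigmas)]


def marginal_gain(weight, var, n):
    """One-step variance reduction from adding a draw: W^2 var (1/n - 1/(n+1))."""
    return weight ** 2 * var / (n * (n + 1))


def ts_neyman_pick(weights, samples, rng):
    """Pick the next stratum to sample using posterior draws of each variance.

    samples[h] holds the observations seen so far in stratum h (at least two,
    not all equal). Assumed posterior: nu * s^2 / chi2_nu with nu = n_h - 1.
    """
    gains = []
    for w, obs in zip(weights, samples):
        n = len(obs)
        nu = n - 1
        s2 = statistics.variance(obs)
        chi2_draw = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared with nu d.o.f.
        gains.append(marginal_gain(w, nu * s2 / chi2_draw, n))
    return max(range(len(gains)), key=gains.__getitem__)


rng = random.Random(0)
weights = [0.5, 0.5]
pilot = [[0.1, 0.2, 0.15, 0.12], [1.0, 3.0, -2.0, 0.5]]
next_stratum = ts_neyman_pick(weights, pilot, rng)
```

Because the variances are drawn rather than plugged in, a stratum whose pilot variance happens to be small still has some chance of being selected, which is the randomization over variance uncertainty that the abstract contrasts with plug-in greedy rules.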

2026-06-07T02:36:16Z Kosuke Morikawa Mst Moushumi Pervin Jae Kwang Kim http://arxiv.org/abs/2606.08409v1 Matrix representations and distance metrics for unlabeled ranked phylogenetic networks 2026-06-07T02:17:03Z

Phylogenetic networks are graphs inferred from molecular sequence data that represent ancestral histories shaped by reticulate processes such as recombination, hybridization, and horizontal gene transfer. We introduce a family of distance metrics for rooted, ranked, unlabeled phylogenetic networks, extending a previously developed distance for ranked trees. Our approach relies on a bijective triangular matrix representation of phylogenetic networks that captures the temporal order of internal events, speciations, and hybridizations. Our metrics, defined as standard matrix norms, allow efficient quantitative comparisons of network topologies, timed networks and networks with differing numbers of hybridizations. Our distance can be used for both isochronous networks where all tips are sampled at one time point, and heterochronous networks where tips are allowed to be sampled at different time points. We show that our metrics capture biologically meaningful differences among evolutionary histories in both simulations and empirical posterior distributions of viral phylogenetic networks. These tools fill a methodological gap, enabling principled comparisons of ranked, unlabeled phylogenetic networks, including ancestral recombination graphs.

2026-06-07T02:17:03Z 25 pages, 11 figures. Submitted to the Proceedings of the National Academy of Sciences (PNAS) Jiayang Wang Julia A. Palacios Claudia Solís-Lemus http://arxiv.org/abs/2606.08407v1 Topological Effective Connectivity Modeling in Brain Networks 2026-06-07T02:11:09Z

Characterizing directed information flow in brain networks is difficult because neural circuits are full of recurrent feedback loops. Many existing tools for directed dependence assume a directed acyclic graph (DAG) structure to resolve directional ambiguity, and therefore cannot represent these loops. We present a nonparametric, information-theoretic framework that addresses this by coupling the discrete Hodge decomposition with lead-lag mutual information, splitting the resulting edge flow into three orthogonal components: a gradient term capturing hierarchical, feed-forward relationships; a curl term isolating triangle-level feedback loops; and a harmonic term capturing cyclic flow around topological holes. This separation makes it possible to disentangle feed-forward drive from recurrent circulation, which conventional measures conflate. We further develop a permutation-based hypothesis-testing layer that identifies nodes and triangular motifs whose information-flow signatures change significantly between conditions. We validate the framework on simulations with known ground-truth structure and apply it to local field potential recordings from a rodent model of focal ischemic stroke. In three of four animals, we find a post-stroke shift toward hierarchical, source-driven propagation at the expense of recurrent feedback, while the fourth shows no significant change.

2026-06-07T02:11:09Z 45 pages, 15 figures Anass El-Yaagoubi Moo K. Chung Hernando Ombao http://arxiv.org/abs/2603.21161v2 An information criterion for detecting periodicities in functional time series 2026-06-07T00:39:43Z

We propose an information criterion for determining the unknown number of periodic components in a functional time series. Identifying the number of frequencies in large-scale time series has long been a central problem. To address it, we suggest an iterative procedure based on the residual process obtained through least squares fitting, an approach that is broadly applicable. We establish the consistency of the estimator of the number of periodic components obtained by minimizing the information criterion. The efficacy of the procedure is illustrated through numerical simulations. In real data analysis, we apply the criterion to temperature data and sunspot data.

2026-03-22T10:28:54Z Computational Statistics & Data Analysis (2026) 108430 Rinka Sagawa Yan Liu Valentin Patilea 10.1016/j.csda.2026.108430 http://arxiv.org/abs/2306.06756v3 Semi-Parametric Inference for Doubly Stochastic Spatial Point Processes: An Approximate Penalized Poisson Likelihood Approach 2026-06-06T23:12:19Z

Doubly-stochastic point processes model the occurrence of events over a spatial domain as an inhomogeneous Poisson process conditioned on the realization of a random intensity function. They are flexible tools for capturing spatial heterogeneity and correlation. However, existing implementations of doubly-stochastic spatial models are computationally demanding, often have limited theoretical guarantees, and/or rely on restrictive assumptions. We propose a penalized regression method for estimating covariate effects in doubly-stochastic point processes that is computationally efficient and does not require a parametric form or stationarity of the underlying intensity. Our approach is based on an approximate (discrete and deterministic) formulation of the true (continuous and stochastic) intensity function. We show that consistency and asymptotic normality of the covariate effect estimates can be achieved despite the model misspecification, and develop a covariance estimator that leads to a valid, albeit conservative, statistical inference procedure. A simulation study shows the validity of our approach under less restrictive assumptions on the data generating mechanism, and an application to Seattle crime data demonstrates better prediction accuracy compared with existing alternatives.

2023-06-11T19:48:39Z Si Cheng Jon Wakefield Ali Shojaie http://arxiv.org/abs/2606.08322v1 Orthogonality and Dimensionality in Airline Cluster Analysis using PCA and Kernel PCA 2026-06-06T20:20:55Z

To characterize US airline profit cycles from 1995 to 2020, Renold et al. (2023) combine k-means clustering, principal component analysis, and system dynamics modelling. Using the dataset that the authors helpfully included in their paper, we replicate their clustering experiment in three spaces -- the original 7-dimensional raw-variable space, a 3-dimensional PC score space, and a 4-dimensional PC score space. We show that the six-cluster taxonomy is geometrically robust: k-means in 3-PC space produces cluster assignments identical to those in the 7D raw space. As a nonlinearity check we apply kernel PCA under six kernels spanning three families plus a linear baseline. All six kernels preserve the six-cluster assignment in 2D. A 1D diagnostic tightens this: the linear kernel conflates the COVID year C_3 with the peak-profit cluster C_0, whereas all five non-baseline kernels shift C_3 to overlap only the post-financial-crisis cluster C_5. Agreement across the kernel families confirms an intrinsically linear manifold with no hidden curvature. The silhouette criterion reveals that the dataset structurally supports only three clusters, not six. Collinearity in the raw 7D space suppresses the silhouette signal that would otherwise identify k=3 as the structurally motivated choice.

2026-06-06T20:20:55Z Andreas Schlapbach