https://arxiv.org/api/NqyrkpiGgoQDcpGlyBnRVAXeHx0 2026-03-31T08:21:51Z 34813 195 15 http://arxiv.org/abs/2603.22729v1 Behavioral Heterogeneity as Quantum-Inspired Representation 2026-03-24T02:48:18Z Driver heterogeneity is often reduced to labels or discrete regimes, compressing what is inherently dynamic into static categories. We introduce a quantum-inspired representation that models each driver as an evolving latent state, presented as a density matrix with structured mathematical properties. Behavioral observations are embedded via non-linear Random Fourier Features, while state evolution blends temporal persistence of behavior with context-dependent profile activation. We evaluate our approach on empirical driving data, Third Generation Simulation Data (TGSIM), showing how driving profiles are extracted and analyzed. 2026-03-24T02:48:18Z Mohammad Elayan Wissam Kontar http://arxiv.org/abs/2603.22719v1 A Frequency-Domain Approach for Integrating Multiple Functional Time Series 2026-03-24T02:28:01Z Integrative analysis of multivariate functional time series (MFTS) is both critical and challenging across many scientific domains. Such data often exhibit complex multi-way dependencies arising from within-curve structures, temporal correlations across curves, and cross-subject interactions, underscoring the need for efficient methods that can jointly capture these dependencies and support accurate downstream analyses. In this work, we propose a novel frequency-domain framework based on a marginal dynamic Karhunen--Loève expansion. The key idea is to integrate individual spectral densities of the MFTS to construct a marginal spectral operator, whose eigenfunctions yield optimal functional filters. These filters transform complex functional observations into a structured multivariate time series representation, providing a powerful foundation for joint modeling and estimation. 
Through extensive simulation studies, we demonstrate the superior performance of the proposed approach. We further validate its practical utility through an application to the imputation and forecasting of air pollutant concentration trajectories in China. 2026-03-24T02:28:01Z Stat 15.1 (2026): e70140 Zerui Guo Jianbin Tan Hui Huang 10.1002/sta4.70140 http://arxiv.org/abs/2603.20938v2 Refactor Analysis: Predictive Evaluations of Factor Models and Dimensionality 2026-03-24T00:59:28Z Unidimensional factor models justify some of the most consequential summaries in science -- single scores, single ranks, and single leaderboards -- yet unidimensionality is usually assessed indirectly by fitting and evaluating models on images of the data (e.g., correlation matrices) rather than on the response matrix itself. We introduce Refactor analysis, a data-first evaluation paradigm that converts a one-factor solution into a rank-1 prediction of the original matrix by estimating both respondent- and item-side structure from dual association images. We further introduce Verifactor analysis, which evaluates the same construction under bi-cross-validated (BCV) row-column partitions for improved generalization. In simulations where the data-generating mechanism is truly rank-1 and correlational, Refactor metrics align with classical unidimensionality indices, validating the approach. However, across 200 public dichotomous datasets, traditional fit and unidimensionality measures, though highly intercorrelated, are weakly related to data recoverability, especially out of sample. This gap exposes a methodological vulnerability: excellent image-based fit can coexist with poor data-level explanatory power. Finally, treating the association measure itself as a testable hypothesis, we compare $φ$, tetrachoric, and quadrant correlation, $q^\prime$, an important reintroduction. 
Quadrant correlation emerges as a simple, interpretable, and remarkably robust alternative, yielding consistently stronger reconstruction and more stable behavior under sample-size variation than commonly used correlations. Together, Refactor and Verifactor shift unidimensionality assessment from "does a one-factor model fit the correlation matrix?" to the question that matters for measurement and benchmarking: does a one-factor dependence structure recover and generalize the observed responses? 2026-03-21T20:41:45Z Michael Hardy http://arxiv.org/abs/2404.15654v5 Autoregressive networks with dependent edges 2026-03-24T00:51:25Z We propose an autoregressive framework for modelling dynamic networks with dependent edges. It encompasses models that accommodate, for example, transitivity, degree heterogeneity, and other stylized features often observed in real network data. By assuming the edges of networks at each time are independent conditionally on their lagged values, the models, which exhibit a close connection with temporal ERGMs, facilitate both simulation and the maximum likelihood estimation in a straightforward manner. Due to the possibly large number of parameters in the models, the natural MLEs may suffer from slow convergence rates. An improved estimator for each component parameter is proposed based on an iteration employing projection, which mitigates the impact of the other parameters (Chang et al., 2021; Chang et al., 2023). Leveraging a martingale difference structure, the asymptotic distribution of the improved estimator is derived without the assumption of stationarity. The limiting distribution is not normal in general, although it reduces to normal when the underlying process satisfies some mixing conditions. Illustration with a transitivity model was carried out in both simulation and a real network data set. 2024-04-24T05:16:12Z 33 pages, 2 tables, 3 figures Jinyuan Chang Qin Fang Eric D. Kolaczyk Peter W. 
MacDonald Qiwei Yao http://arxiv.org/abs/2603.22668v1 Fixed-level calibration of the Cauchy combination test 2026-03-24T00:42:51Z The Cauchy combination test (CCT) is widely used because it gives a closed-form combined $p$-value and is known to be asymptotically valid as the nominal level $α\downarrow0$ under broad dependence structures. We study a different asymptotic question: whether the usual Cauchy cutoff remains accurate at an ordinary fixed level when the number $K$ of combined $p$-values grows under dependence. Under a canonical one-factor equicorrelated Gaussian copula model, we show that the raw CCT is generally not asymptotically exact at fixed $α$. With fixed positive correlation, the statistic converges to a random latent-factor limit, so there is no universal fixed-level reference law. When the common correlation $ρ_K$ weakens with $K$, fixed-level behaviour is governed by the boundary-layer scale $s_K=\sqrt{ρ_K}(\log K)^{3/2}$, and the raw CCT is asymptotically exact if and only if $ρ_K(\log K)^3\to0$. Because the size distortion arises entirely from the reference law and not from the statistic, it can be corrected without modifying the test statistic itself. We propose the boundary-layer calibrated CCT (BL-CCT), which replaces the standard Cauchy reference by a one-parameter Gaussian-smoothed Cauchy family while keeping the statistic unchanged. This reference-law correction is fundamentally different from existing approaches that modify the test statistic. BL-CCT is asymptotically exact under the weaker condition $ρ_K\log K\to0$ and provides a useful finite-$K$ approximation on bounded boundary layers. Numerical experiments support the theory. 
2026-03-24T00:42:51Z Hirofumi Ota http://arxiv.org/abs/2603.22636v1 When lookout sees crackle: Anomaly detection via kernel density estimation 2026-03-23T23:28:20Z We present an updated version of lookout -- an algorithm for detecting anomalies using kernel density estimates with bandwidth based on Rips death diameters -- with theoretical guarantees. The kernel density estimator for updated lookout is shown to be consistent, and the proposed multivariate scaling is robust and efficient. We show our updated algorithm performs better than the previous version on diverse examples. 2026-03-23T23:28:20Z 30 pages Rob J Hyndman Sevvandi Kandanaarachchi Katharine Turner http://arxiv.org/abs/2509.12066v2 On the universal calibration of heavy-tailed combination tests 2026-03-23T23:03:16Z It is often of interest to test a global null hypothesis using multiple, possibly dependent $p$-values by combining their strengths while controlling the type-I error. Recently, several heavy-tailed combination tests, such as the harmonic mean test and the Cauchy combination test, have been proposed: they transform $p$-values into heavy-tailed random variables before combining them into a single test statistic. The resulting tests, which are calibrated under some form of independence assumption among the $p$-values, have been shown to be rather robust to dependence asymptotically as the $α$ level gets small. Yet, it has remained an open problem to understand this general phenomenon and characterize how such tests behave under dependence. Using the framework of multivariate regular variation from extreme value theory, we show that for a class of combination tests that are homogeneous, the asymptotic level of the test can be expressed using the angular measure under multivariate regular variation. This measure characterizes the dependence of the transformed heavy-tailed variables in their upper tails, or equivalently, the dependence of the $p$-values near zero. 
We use this result to study several tests. The harmonic mean test, which coincides with the Pareto linear combination test, is shown to be universally calibrated regardless of the tail dependence; further, this test is shown to be the only one that achieves universal calibration among all homogeneous heavy-tailed combination tests. In contrast, the Cauchy combination test is shown to be universally honest but often conservative; the Dunn-Šidák correction, also known as Tippett's method, while being honest, is calibrated if and only if the underlying $p$-values are independent near zero. These theoretical findings are corroborated with simulations and an application to independence testing with survey data. 2025-09-15T15:49:18Z 5 figures, 44 pages Parijat Chakraborty F. Richard Guo Kerby Shedden Stilian Stoev http://arxiv.org/abs/2603.14561v3 Refined Inference for Asymptotically Linear Estimators with Non-Negligible Second-Order Remainders 2026-03-23T22:55:04Z Asymptotically linear estimators in semiparametric models use a von Mises expansion to reduce inference to the influence-function variance. This reduction is valid when the second-order remainder is negligible in variance, a condition that is not implied by the product-rate requirement guaranteeing asymptotic linearity. When the remainder contributes non-negligible variance, the standard sandwich underestimates the total sampling variance and Wald intervals undercover; we call this the \emph{near-boundary regime}. We derive a finite-sample variance decomposition separating influence-function and remainder components, characterize the sandwich consistency condition sharply, and establish asymptotic validity of the leave-one-out jackknife and pairs cluster bootstrap in this regime. Jackknife validity follows from a self-normalization argument; bootstrap validity is established directly. An analytic formula for the amplification of the sandwich gap by intra-cluster correlation is derived for clustered data. 
A simulation study using a surrogate-assisted targeted learning estimator in stepped-wedge cluster-randomized trials illustrates the regime: the variance ratio $\hat{V}_{\rm JK}/\hat{V}_{\rm Sand}$ is 1.14--1.38 and persistent across cluster counts, and the refined procedures substantially improve coverage. 2026-03-15T19:23:26Z 22 pages, 1 supplement Lin Li Pengcheng Wu http://arxiv.org/abs/2603.12058v2 Low-Rank and Sparse Drift Estimation for High-Dimensional Lévy-Driven Ornstein--Uhlenbeck Processes 2026-03-23T22:28:41Z We study high-dimensional Ornstein--Uhlenbeck processes driven by Lévy noise and consider drift matrices that decompose into a low-rank plus sparse component, capturing a few latent factors together with a sparse network of direct interactions. For discrete-time observations under the localized, truncated contrast of Dexheimer and Jeszka, we analyze a convex estimator that minimizes this contrast with a combined nuclear-norm and $\ell_1$-penalty on the low-rank and sparse parts, respectively. Under a restricted strong convexity condition, a rank--sparsity incoherence assumption, and regime-specific choices of truncation level, horizon, and sampling mesh for the background driving Lévy process, we derive a non-asymptotic oracle inequality for the Frobenius risk of the estimator. The bound separates a discretization bias term of order $d^2Δ_n^2$ from a stochastic term of order $γ(Δ_n)T^{-1}(r \log d + s \log d)$, thereby showing that the low-rank-plus-sparse structure improves the dependence on the ambient dimension relative to purely sparse estimators while retaining the same discretization and truncation behavior across the four Lévy regimes. 
2026-03-12T15:26:35Z Marina Palaisti http://arxiv.org/abs/2603.22594v1 Making Effective Statistical Inferences: From Significance Testing to the Open Science Inference Ecosystem (2016-2026) 2026-03-23T21:38:25Z Statistical inference has undergone a profound transformation over the past decade, evolving from a significance-testing paradigm toward a comprehensive, transparency-driven framework embedded within the broader open science ecosystem. While traditional approaches such as null hypothesis significance testing (NHST) remain widely used, they have been increasingly criticised for fostering dichotomous thinking, misinterpretation, and irreproducible findings. This review synthesises developments from 2016 to 2026, integrating methodological advances -- including compatibility-based interpretation of p-values, S-values, equivalence testing with smallest effect sizes of interest (SESOI), Bayesian workflow, and sequential inference using e-values -- with systemic reforms such as preregistration, Registered Reports, multiverse analysis, and updated reporting standards (PRISMA 2020, CONSORT 2025). A central contribution of this article is the conceptual unification of statistical inference into two complementary domains: evidence-centric inference, which quantifies compatibility between data and models, and decision-centric inference, which guides actions under uncertainty. By embedding statistical tools within transparent and reproducible research workflows, the modern inferential paradigm moves beyond single-metric evaluation toward a multidimensional assessment of evidence and practical relevance. 
2026-03-23T21:38:25Z 22 pages, 1 Figure, 3 tables Aswini Kumar Patra http://arxiv.org/abs/2603.22569v1 Proxy-Reliance Control in Conformal Recalibration of One-Sided Value-at-Risk 2026-03-23T20:59:44Z We introduce a proxy-reliance-controlled conformal recalibration framework for one-sided Value-at-Risk (VaR), and study a question that existing state-aware methods do not usually isolate: how strongly should the recalibration adjustment depend on an imperfect volatility proxy? We formalize this through a proxy-reliance parameter that continuously interpolates between an approximately constant-shift correction and a fully proxy-scaled correction. This makes proxy reliance a distinct and practically interpretable design choice in one-sided VaR recalibration. We show theoretically that larger proxy reliance increases the responsiveness of the tail adjustment to proxy scale, but also increases stressed-state fragility when the proxy underreacts. Empirically, in rolling out-of-sample tests on a six-ETF panel with VIX-linked state variables, and with supporting evidence from SPY, we find that the empirical value of proxy-reliance control lies in improved stressed-state robustness rather than uniform overall dominance. In particular, when the baseline forecast remains exposed to proxy imperfection in stressed states, lower or intermediate proxy reliance can outperform fully proxy-scaled recalibration in stressed left-tail VaR control. 2026-03-23T20:59:44Z 44 pages, 4 figures, 9 tables, appendix included Tenghan Zhong http://arxiv.org/abs/2601.17145v3 Optimal Design under Interference, Homophily, and Robustness Trade-offs 2026-03-23T20:02:13Z To minimize the mean squared error (MSE) in global average treatment effect (GATE) estimation under network interference, a popular approach is to use a cluster-randomized design. However, in the presence of homophily, which is common in social networks, cluster randomization can instead increase the MSE. 
We develop a novel potential outcomes model that accounts for interference, homophily, and heterogeneous variation. In this setting, we establish a framework for optimizing designs for worst-case MSE under the Horvitz-Thompson estimator. This leads to an optimization problem over the covariance matrices of the treatment assignment, trading off interference, homophily, and robustness. We frame and solve this problem using two complementary approaches. The first involves formulating a semidefinite program (SDP) and employing Gaussian rounding, in the spirit of the Goemans-Williamson approximation algorithm for MAXCUT. The second is an adaptation of the Gram-Schmidt Walk, a vector-balancing algorithm which has recently received much attention. Finally, we evaluate the performance of our designs through various experiments on simulated network data and a real village network dataset. 2026-01-23T19:50:50Z Vydhourie Thiyageswaran Alex Kokot Jennifer Brennan Marina Meila Christina Lee Yu Maryam Fazel http://arxiv.org/abs/2603.22540v1 Variable Selection in Functional Linear Quantile Regression for Identifying Associations between Daily Patterns of Physical Activity and Cognitive Function 2026-03-23T20:01:12Z Quantile regression is useful for characterizing the conditional distribution of a response variable and understanding heterogeneity in the covariate effects at different quantiles. The rise of high-dimensional physiological data in biomedical research through wearable and sensor devices underscores the need for effective variable selection methods for interpretable and accurate quantile regression, which can offer robust insights into heterogeneous and dynamic covariate effects. We develop a flexible variable selection approach for functional linear quantile regression with multiple functional and scalar predictors. 
We use a smooth approximation of the quantile loss function and integrate functional principal component analysis (FPCA) with a group minimax concave penalty (MCP) to impose sparsity on the functional coefficients. A computationally efficient group descent algorithm is employed for optimization. Through numerical simulations, we demonstrate satisfactory selection, estimation, and prediction accuracy of the proposed method across different quantiles for both dense and sparsely observed functional data. The proposed method is applied to accelerometer data from the 2011-2014 National Health and Nutrition Examination Survey (NHANES) to identify key time-varying distributional patterns of physical activity and demographic predictors associated with cognitive function across different quantiles. Our analysis provides new insights into the complex relationship between the daily distributional patterns of physical activity and cognitive function among older adults, capturing heterogeneous associations across different quantiles. 2026-03-23T20:01:12Z Yuanzhen Yue Stella Self Yichao Wu Jiajia Zhang Rahul Ghosal http://arxiv.org/abs/2601.13419v2 Pathway-based Bayesian factor models for 'omics data 2026-03-23T19:39:34Z Interpreting RNA-sequencing data requires identifying coordinated gene expression patterns that correspond to biological pathways. Standard factor models provide useful dimension reduction but typically ignore existing pathway knowledge or incorporate it through restrictive assumptions, limiting interpretability and reproducibility. Here, we develop Bayesian Analysis with gene-Sets Informed Latent space (BASIL), a scalable framework for analyzing transcriptomic data that integrates annotated gene sets into latent variable inference. BASIL places structured priors on factor loadings, shrinking them toward combinations of annotated gene sets, enhancing biological interpretability and stability, while simultaneously learning new unstructured components. 
BASIL provides accurate covariance estimates and uncertainty quantification, without resorting to computationally expensive Markov chain Monte Carlo sampling, by exploiting a pre-training approach that pre-estimates the latent factors. An automatic empirical Bayes procedure eliminates the need for manual hyperparameter tuning, promoting reproducibility and usability in practice. Applying BASIL to the global fever transcriptomic cohort uncovers interpretable host-response modules, with phosphoinositide signaling and interferon-driven inflammation emerging as key drivers of gene-expression variability. 2026-01-19T22:03:12Z Lorenzo Mauri Federica Stolf Amy H. Herring Cameron Miller David B. Dunson http://arxiv.org/abs/2509.15197v2 Consistent Bayesian causal discovery for structural equation models with equal error variances 2026-03-23T18:09:26Z We consider the problem of recovering the true causal structure among a set of variables, generated by a linear acyclic structural equation model (SEM) with the error terms being independent, not necessarily Gaussian, and having equal variances. It is well-known that the true underlying directed acyclic graph (DAG) encoding the causal structure is uniquely identifiable under this assumption. Interestingly, in this setting, it further holds that the sum of minimum expected squared errors for every variable, while predicted by the best linear combination of its parent variables, is minimised if and only if the causal structure is represented by any supergraph of the true DAG. In this work, we propose a Bayesian DAG selection method, where the working model assumes Gaussian SEM with equal error variances, and employ independent g-priors on each set of SEM coefficients. Furthermore, we utilise the aforementioned key property to establish that the proposed method recovers the true graph consistently without any additional distributional assumption, and illustrate it with a simulation study. 
2025-09-18T17:52:26Z Anamitra Chaudhuri Yang Ni Anirban Bhattacharya
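
The key property quoted in the last abstract above (with equal error variances, the sum of minimum expected squared errors is minimised exactly by supergraphs of the true DAG) can be checked numerically at the population level. The sketch below is only an illustration of that property, not the authors' Bayesian selection method: the three-variable chain, the coefficients 0.8 and 0.5, the unit error variance, and all function names are assumptions introduced here.

```python
def sem_covariance(B, sigma2=1.0):
    """Covariance of X = B X + e with Var(e) = sigma2 * I.

    B must be strictly lower triangular, i.e. the variables are listed
    in a topological order of the (assumed) true DAG.
    """
    d = len(B)
    A = []  # rows of (I - B)^{-1}, built by forward substitution
    for j in range(d):
        row = [1.0 if m == j else 0.0 for m in range(d)]
        for k in range(j):
            for m in range(d):
                row[m] += B[j][k] * A[k][m]
        A.append(row)
    return [[sigma2 * sum(A[i][m] * A[j][m] for m in range(d))
             for j in range(d)] for i in range(d)]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for c in reversed(range(n)):
        tail = sum(aug[c][k] * x[k] for k in range(c + 1, n))
        x[c] = (aug[c][n] - tail) / aug[c][c]
    return x


def residual_variance(S, j, parents):
    """Minimum expected squared error of the best linear predictor of
    X_j from the given parents (variables are zero-mean)."""
    if not parents:
        return S[j][j]
    Spp = [[S[a][b] for b in parents] for a in parents]
    Spj = [S[a][j] for a in parents]
    coef = solve(Spp, Spj)
    return S[j][j] - sum(c * s for c, s in zip(coef, Spj))


def score(S, parent_sets):
    """Sum of residual variances over all variables of a candidate DAG."""
    return sum(residual_variance(S, j, pa) for j, pa in parent_sets.items())


# Hypothetical true model: X0 -> X1 -> X2, all error variances equal to 1.
B = [[0.0, 0.0, 0.0],
     [0.8, 0.0, 0.0],
     [0.0, 0.5, 0.0]]
S = sem_covariance(B)

candidates = {
    "true DAG":     {0: [], 1: [0], 2: [1]},
    "supergraph":   {0: [], 1: [0], 2: [0, 1]},
    "reversed":     {0: [1], 1: [2], 2: []},
    "missing edge": {0: [], 1: [0], 2: []},
}
scores = {name: score(S, pa) for name, pa in candidates.items()}
for name, value in scores.items():
    print(f"{name:13s} {value:.4f}")
```

In this toy model the true DAG and its supergraph both attain the minimum (three times the common error variance), while the reversed chain scores strictly higher even though it is Markov equivalent to the true chain; that gap is what makes the graph identifiable under the equal-variance assumption.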