https://arxiv.org/api/I0YhWWtWL+yPkNq5Q6YCOQ9MIPg 2026-03-21T03:43:37Z 34634 165 15 http://arxiv.org/abs/2410.17046v2 Mesoscale two-sample testing for networks 2026-03-13T14:34:50Z Networks arise naturally in many scientific fields as a representation of pairwise connections. Statistical network analysis has most often considered a single large network, but it is common in a number of applications to observe multiple networks on a shared node set. When these networks are grouped by case-control status or another categorical covariate, the classical statistical question of two-sample comparison arises. In this work, we address the problem of testing for statistically significant differences in a given arbitrary subset of connections. This general framework allows an analyst to focus on a single node, a specific region of interest, or compare whole networks. Our ability to conduct ``mesoscale'' testing on a meaningful group of edges is particularly relevant for applications such as neuroimaging and distinguishes our approach from prior work, which tends to focus either on a single node or the whole network. In this mesoscale setting, we develop statistically sound projection-based tests for two-sample comparison in both weighted and binary edge networks. The key to our approach is to leverage network information from outside the set of interest to learn informative low-rank projections which leads to more powerful tests. 2024-10-22T14:24:38Z 59 pages, 9 figures Peter W. MacDonald Elizaveta Levina Ji Zhu http://arxiv.org/abs/2603.13009v1 TwoTimeScales: An R-package for Smoothing Hazards with Two Time Scales 2026-03-13T14:16:03Z Background: Time-to-event data with multiple time scales are observed in many epidemiological and clinical studies. While models that allow for simultaneous consideration of multiple time scales for the hazard of an event have been proposed, their use is still not wide-spread in applied research. 
One reason for this might be the lack of convenient statistical software to estimate such models. Here we introduce the R-package TwoTimeScales. The package provides tools to estimate models for hazards that vary smoothly over two time scales, including proportional hazards models with such a two-dimensional baseline hazard. Extensions to competing risks models are implemented as well. Methodology is based on two-dimensional smoothing with P-splines. Results: We demonstrate the features of the R-package by analysing a freely available dataset containing post-surgery follow-up data on patients with breast cancer. We present two examples, a proportional hazards regression and a competing risks problem. Besides estimation, we illustrate the plotting utilities of the package. Conclusion: The R-package TwoTimeScales can be easily used to fit flexible hazard models with two time scales, allowing new perspectives in the analysis of time-to-event data with multiple time scales. 2026-03-13T14:16:03Z 15 pages, 6 figures Angela Carollo Paul H. C. Eilers Hein Putter Jutta Gampe http://arxiv.org/abs/2507.14389v2 Spatiotemporal Autoregressive Models for Areal Compositional Data 2026-03-13T13:54:35Z Compositional data, such as regional shares of economic sectors or property transactions, are central to understanding structural change in economic systems across space and time. This paper introduces a spatiotemporal multivariate autoregressive model tailored for panel data with composition-valued responses at each areal unit and time point. The proposed framework enables the joint modelling of temporal dynamics and spatial dependence under compositional constraints, and is estimated via a quasi-maximum likelihood approach. We build on recent theoretical advances to establish the identifiability and asymptotic properties of the estimator as both the number of regions and the number of time points grow. 
The utility and flexibility of the model are demonstrated through two applications: analysing property transaction compositions in an intra-city housing market (Berlin), and regional sectoral compositions in Spain's economy. These case studies highlight how the proposed framework captures key features of spatiotemporal economic processes that are often missed by conventional methods. 2025-07-18T22:31:18Z Matthias Eckardt Philipp Otto http://arxiv.org/abs/2303.07167v4 When Respondents Don't Care Anymore: Identifying the Onset of Careless Responding 2026-03-13T12:41:17Z Questionnaires in the behavioral sciences tend to be lengthy. However, literature suggests that survey length is a contributing factor to careless responding, with longer questionnaires yielding higher probability that participants start responding carelessly. Consequently, in long surveys a large number of participants may engage in careless responding, posing a major threat to internal validity. We propose a novel method for identifying the onset of careless responding (or an absence thereof) that searches for a changepoint in combined measurements of multiple dimensions in which carelessness may manifest, such as inconsistency and invariability. It is highly flexible, based on machine learning, and provides statistical guarantees for controlling the false positive rate. In simulation experiments, the proposed method achieves high accuracy in identifying carelessness onset and discriminates well between attentive and various types of careless responding, even when a large number of careless respondents are present. An empirical application highlights how identifying partial carelessness uncovers novel insights on careless responding behavior. Furthermore, we provide the freely available open source software package "carelessonset" to facilitate adoption by empirical researchers. 
2023-03-13T15:10:30Z Max Welz Andreas Alfons http://arxiv.org/abs/2208.13701v5 Data-Driven Influence Functions for Optimization-Based Causal Inference 2026-03-13T11:10:14Z We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing, with a focus on functionals that arise in causal inference. We study the case where probability distributions are not known a priori but need to be estimated from data. These estimated distributions lead to empirical Gateaux derivatives, and we study the relationships between empirical, numerical, and analytical Gateaux derivatives. Starting with a case study of the interventional mean (average potential outcome), we delineate the relationship between finite differences and the analytical Gateaux derivative. We then derive requirements on the rates of numerical approximation in perturbation and smoothing that preserve the statistical benefits of one-step adjustments, such as rate double robustness. We then study more complicated functionals such as dynamic treatment regimes, the linear-programming formulation for policy optimization in infinite-horizon Markov decision processes, and sensitivity analysis in causal inference. More broadly, we study optimization-based estimators, since this begets a class of estimands where identification via regression adjustment is straightforward but obtaining influence functions under minor variations thereof is not. The ability to approximate bias adjustments in the presence of arbitrary constraints illustrates the usefulness of constructive approaches for Gateaux derivatives. We also find that the statistical structure of the functional (rate double robustness) can permit less conservative rates for finite-difference approximation. This property, however, can be specific to particular functionals; e.g., it occurs for the average potential outcome (hence average treatment effect) but not the infinite-horizon MDP policy value. 
2022-08-29T16:16:22Z Revision Michael I. Jordan Yixin Wang Angela Zhou http://arxiv.org/abs/2506.20021v2 Speeding up the ordered allocation sampler 2026-03-13T10:53:37Z The ordered allocation sampler is a Gibbs sampler designed to explore the posterior distribution in nonparametric mixture models. It encompasses both infinite mixtures and finite mixtures with a random number of components, and it has been shown to possess mixing properties that pair well with collapsed, or marginal, samplers that integrate out the mixing distribution. The main advantage is that it adapts to mixing priors that do not enjoy tractable predictive structures needed for the implementation of marginal sampling methods. Thus it is as widely applicable as other conditional samplers while enjoying better algorithmic performance. In this paper we provide a modification of the ordered allocation sampler that substantially improves its performance while easing its implementation. In addition, exploiting the similarity with marginal samplers, we adapt the split-merge moves of Jain and Neal to the new version of the sampler. Simulation studies confirm these findings. 2025-06-24T21:36:36Z Change from v1: added acknowledgment Maria F. Gil-Leyva Fidel Selva Pierpaolo De Blasi http://arxiv.org/abs/2603.12881v1 Multivariate lattice deformation: A spatially explicit framework for assessing crop rotation impacts on soil nutrient dynamics 2026-03-13T10:32:45Z Crop rotation impacts on soil nutrients are typically assessed using field-averaged or single-nutrient analyses that ignore spatial heterogeneity and multivariate interactions. We propose a multivariate lattice model treating soil as a 4D tensor (space, time, and N, P, K channels). Crop rotations are represented as force vectors, with soil buffering capacity ("stiffness") varying spatially with texture. Lateral nutrient movement is introduced via kernel smoothing.
Cumulative impact is quantified by Euclidean distance in N-P-K space, with significance assessed via Cramer-von Mises permutation tests. Simulating a three-year corn-soybean-wheat rotation on a 20 x 20 heterogeneous grid shows mean stress of 0.63 after one cycle, with maximum 0.91 in sandy areas. Phosphorus depletion (17.9%) exceeds nitrogen (10.8%), dominating stress in 19% of cells - obscured by single-nutrient analyses. Continuous corn increases mean stress by 41%. Cramer-von Mises tests detect significant deviation (p <= 0.002), and Moran's I (0.29-0.30) confirms spatial autocorrelation. Our framework identifies risk zones and guides site-specific management, bridging geostatistics with mechanistic crop models. 2026-03-13T10:32:45Z 34 pages, 2 figures Marco Mandap http://arxiv.org/abs/2603.11829v2 Robust Sequential Hypothesis Testing with Generalized Estimating Equations for Incomplete Clustered and Longitudinal Data 2026-03-13T10:14:09Z Existing sequential generalized estimating equation methodology for longitudinal and group-correlated data focuses on narrow hypotheses concerning treatment efficacy and often makes modeling assumptions that impede the desirable robustness of the involved test statistics. Drawing upon the well-established theory of incremental information gain for well-posed sequential analyses, we develop an approach that does not rely on modeling assumptions that infringe upon the robustness of the resulting estimators while simultaneously testing a much wider range of hypotheses. Our methodology provides general submatrix-level asymptotic theory for the evaluation of joint covariance matrices of sequential test statistics. Moreover, this framework allows us to construct a novel approach to computing efficacy boundaries, the likes of which can be estimated with greater precision at later interim times. 
These constructions also accommodate accessible multiple imputation procedures, thereby allowing our approach to be applied to incomplete datasets. Type I error and power are assessed through a series of comprehensive simulations mirroring the simulations of recent work to facilitate a proper comparison. We conclude by applying our methods to a dataset from a longitudinal study concerning the impact of race on the efficacy of a treatment for hepatitis C. 2026-03-12T11:48:50Z VERSION 2: First version accidentally used older abbreviated title, this has been corrected. 24 pages; 1 figure Nathan T. Provost Abdus S. Wahed http://arxiv.org/abs/2603.12867v1 Breaking the Winner's Curse with Bayesian Hybrid Shrinkage 2026-03-13T10:13:38Z The widespread adoption of randomized controlled trials (A/B Tests) for decision-making has introduced a pervasive "Winner's Curse": experiments selected for launch often exhibit upwardly biased effect estimates and invalid confidence intervals. This selection bias leads to over-optimistic impact projections and undermines decision-making, particularly in low-power regimes. We propose Bayesian Hybrid Shrinkage (BHS), an empirical Bayes (EB) framework that leverages data-driven priors to mitigate selection bias and provides accurate uncertainty quantification. Unlike traditional EB methods that apply uniform shrinkage, BHS introduces an experiment-specific "local" shrinkage factor that incorporates individual experiment characteristics, improving robustness against prior misspecification. We also derive a closed-form inference strategy designed for high-throughput production environments. Extensive simulations and real-world evaluations at Meta Platforms demonstrate that BHS outperforms existing methods in terms of bias reduction and interval coverage, even under substantial violations of modeling assumptions.
2026-03-13T10:13:38Z Richard Mudd Abbas Zaidi Rina Friedberg Ilya Gorbachev Anchal Choubey Houssam Nassif http://arxiv.org/abs/2603.12843v1 The geometry of Stein's method of moments: A canonical decomposition via score matching 2026-03-13T09:45:03Z In this paper, we elucidate the geometry of Stein's method of moments (SMoM). SMoM is a parameter estimation method based on the Stein operator, and yields a wide class of estimators that do not depend on the normalizing constant. We present a canonical decomposition of an SMoM estimator after centering the score matching estimator, which sheds light on the central role of the score matching within the SMoM framework. Using this decomposition, we construct an SMoM estimator that improves upon the score matching estimator in the asymptotic variance. We also discuss the connection between SMoM and the Wasserstein geometry. Specifically, using the Wasserstein score function, we provide a geometrical interpretation of the gap in the asymptotic variance between the score matching estimator and the maximum likelihood estimator. Furthermore, it is shown that the score matching estimator is asymptotically efficient if and only if the Fisher score functions span the same space as the Wasserstein score functions. 2026-03-13T09:45:03Z Mitsuki Nagai Keisuke Yano http://arxiv.org/abs/2603.12753v1 Balancing the privacy-utility trade-off: How to draw reliable conclusions from private data 2026-03-13T07:54:08Z Absolute anonymization, conceived as an irreversible transformation that prevents re-identification and sensitive value disclosure, has proven to be a broken promise. Consequently, modern data protection must shift toward a privacy-utility trade-off grounded in risk mitigation. Differential Privacy (DP) offers a rigorous mathematical framework for balancing quantified disclosure risk with analytical usefulness. 
Nevertheless, widespread adoption remains limited, largely because effective translation of complex technical concepts, such as privacy-loss parameters, into forms meaningful to non-technical stakeholders has yet to be achieved. This difficulty arises from the inherent use of randomization: both legitimate analysts and potential adversaries must draw conclusions from uncertain observations rather than deterministic values. In this work, we propose a new interpretation of the privacy-utility trade-off based on hypothesis testing. This perspective explicitly accounts for the uncertainty introduced by randomized mechanisms in both membership inference scenarios and general data analysis. In particular, we introduce the concept of relative disclosure risk to quantify the maximum reduction in uncertainty an adversary can obtain from protected outputs, and we show that this measure is directly related to standard privacy-loss parameters. At the same time, we analyze how DP affects analytical validity by studying its impact on hypothesis tests commonly used to assess the statistical significance of empirical results. Finally, we provide practical guidance, accessible to non-experts, for navigating the privacy-utility trade-off, aiding in the selection of suitable protection mechanisms and the values for the privacy-loss parameters. 2026-03-13T07:54:08Z Raphaël de Fondeville http://arxiv.org/abs/2603.12561v1 Consistent and powerful CUSUM change-point test for panel data with changes in variance 2026-03-13T01:49:54Z This paper investigates variance change-points in panel data models with $α$-mixing time series. Based on the cumulative sum (CUSUM) method and the individual differences, we construct a CUSUM test for panel data models to detect variance changes. Under the null hypothesis, we derive the limit distribution of this test, which can be used to detect the change-point of variance.
Under the alternative hypothesis, the limit behavior of the CUSUM test is also derived. To validate the performance of the test, we conducted simulation analyses with Gaussian and Gamma errors. The results demonstrate that this testing method significantly outperforms existing approaches, particularly in detecting sparse variance changes. Finally, we conducted a practical case study using panel data from the Shanghai Shenzhen CSI 300 Index Components. Not only did we successfully identify the change-points of variance, but we also delved deeper into the underlying economic drivers behind these changes. 2026-03-13T01:49:54Z Wenzhi Yang Yueting Xu Xiaoping Shi Qiong Li http://arxiv.org/abs/2603.12523v1 Inference for function-on-function regression: central limit theorem and residual bootstrap 2026-03-12T23:50:53Z We investigate asymptotic inference in a linear regression model where both response and regressors are functions, using an estimator based on functional principal components analysis. Although this approach is widely used in functional data analysis, there remains significant room for developing its asymptotic properties for function-on-function regression. Our study targets the mean response at a new regressor with two primary aims. First, we refine the existing central limit theorem by relaxing certain technical conditions, which include generalizing the scaling factor, resulting in incorporating a broader class of random functions beyond those having scores with independence or finite higher moments. Second, we introduce a residual bootstrap method that enhances the calibration of various confidence sets for quantities related to mean response, while its consistency is rigorously verified. Numerical studies compare the finite sample performance of both asymptotic and bootstrap approaches, demonstrating higher accuracy of the latter. To illustrate bootstrap inference for mean response, we apply it to the Canadian weather dataset.
2026-03-12T23:50:53Z Hyemin Yeon http://arxiv.org/abs/2603.12518v1 Gaussian and bootstrap approximations for functional principal component regression 2026-03-12T23:32:48Z Asymptotic inference using functional principal component regression (FPCR) has long been considered difficult, largely because, upon any scalar scaling, the FPCR estimator fails to satisfy a central limit theorem, leading to the prevailing belief that it is unsuitable for direct statistical inference. In this paper, we upend this traditional viewpoint by establishing a new result: upon suitable operator scaling, valid Gaussian and bootstrap approximations hold for the FPCR estimator. We apply this surprising finding to hypothesis testing for the significance of the slope function in functional regression models and demonstrate the strong numerical performance of the resulting tests. While concise, our results yield powerful inferential tools for functional regression. We believe they pave the way for new lines of inferential methodology for more complex functional regression settings. 2026-03-12T23:32:48Z Hyemin Yeon http://arxiv.org/abs/2508.21742v2 Orientability of Causal Relations in Time Series using Summary Causal Graphs and Faithful Distributions 2026-03-12T20:30:17Z Understanding causal relations between temporal variables is a central challenge in time series analysis, particularly when the full causal structure is unknown. Even when the full causal structure cannot be fully specified, experts often succeed in providing a high-level abstraction of the causal graph, known as a summary causal graph, which captures the main causal relations between different time series while abstracting away micro-level details.
In this work, we present conditions that guarantee the orientability of micro-level edges between temporal variables given the background knowledge encoded in a summary causal graph and assuming access to a faithful and causally sufficient distribution with respect to the true unknown graph. Our results provide theoretical guarantees for edge orientation at the micro-level, even in the presence of cycles or bidirected edges at the macro-level. These findings offer practical guidance for leveraging summary causal graphs to inform causal discovery in complex temporal systems and highlight the value of incorporating expert knowledge to improve causal inference from observational time series data. 2025-08-29T16:08:35Z Accepted to AISTATS 2026 Timothée Loranchet Charles K. Assaad