https://arxiv.org/api/7zmvUm/5ZzC961ExWbQuVNd1+dA 2026-04-06T08:32:32Z 34888 270 15 http://arxiv.org/abs/2512.10069v2 Information Borrowing from Partially Compatible Trajectories for Estimation of Dynamic Treatment Regimes 2026-03-25T16:27:39Z Dynamic Treatment Regimes (DTRs) provide a systematic framework for optimizing sequential decision-making in chronic disease management, where therapies must adapt to patients' evolving clinical profiles. Inverse probability weighting (IPW) is a cornerstone methodology for estimating regime values from observational data due to its intuitive formulation and established theoretical properties, yet standard IPW estimators face significant limitations, including variance instability and data inefficiency. A fundamental but underexplored source of inefficiency lies in the strict alignment requirement between observed and target treatment trajectories, which fails to account for partial compatibility and discards substantial information from individuals with only minimal deviations from the regime. We propose two novel methodologies that relax the strict inclusion rule through flexible compatibility mechanisms. Both methods provide computationally tractable alternatives that can be easily integrated into existing IPW workflows, offering more efficient approaches to DTR estimation. Theoretical analysis demonstrates that both estimators preserve consistency while achieving superior finite-sample efficiency compared to standard IPW, and comprehensive simulation studies confirm improved stability. We illustrate the practical utility of our methods through an application to HIV treatment data from the AIDS Clinical Trials Group Study 175 (ACTG175). 2025-12-10T20:43:40Z Chloe Si David A. Stephens Erica E. M. 
Moodie http://arxiv.org/abs/2603.24439v1 Distributionally balanced sampling designs via minimum tactical configurations 2026-03-25T15:50:57Z Distributionally balanced sampling designs are low-discrepancy probability designs obtained by minimizing the expected discrepancy between the auxiliary-variable distribution of a random sample and the target population distribution. Existing constructions rely on circular population sequences, which restrict the design space by forcing samples to be contiguous blocks of a sequence. We propose a new construction based on minimum tactical configurations that removes this topological constraint. The resulting designs are fixed-size, have equal inclusion probabilities, and belong to the class with minimum feasible configuration size. We develop both a simple initialization valid for arbitrary population and sample sizes and a spatial initialization that yields a lower initial expected discrepancy, together with a simulated annealing algorithm for optimization within this class. In simulations and empirical examples, the proposed method outperforms state-of-the-art alternatives in terms of distributional fit, balance, and spatial spread. 2026-03-25T15:50:57Z 15 pages, 3 figures Anton Grafström Wilmer Prentius http://arxiv.org/abs/2603.24421v1 E-values as statistical evidence: A comparison to Bayes factors, likelihoods, and p-values 2026-03-25T15:32:53Z A recurring debate in the philosophy of statistics concerns what, exactly, should count as a measure of evidence for or against a given hypothesis. P-values, likelihood ratios, and Bayes factors all have their defenders. In this paper we add two additional candidates to this list: the e-value and its sequential analogue, the e-process. 
E-values enjoy several desirable properties as measures of evidence: they combine naturally across studies, handle composite hypotheses, provide long-run error rates, and admit a useful interpretation as the wealth accrued by a bettor in a game against the null distribution. E-processes additionally handle optional stopping and optional continuation. This work examines the extent to which e-values and e-processes satisfy the evidential desiderata of different statistical traditions, concluding that they combine attractive features of p-values, likelihood ratios, and Bayes factors, and merit serious consideration as interpretable and intuitive measures of statistical evidence. 2026-03-25T15:32:53Z 34 pages Ben Chugg Aaditya Ramdas Peter Grünwald http://arxiv.org/abs/2506.03462v2 Robust domain selection for functional data via interval-wise testing and effect size mapping 2026-03-25T15:28:38Z Among inferential problems in functional data analysis, domain selection is of practical interest: it aims to identify the sub-interval(s) of the domain where desired functional features are displayed. Motivated by applications in quantitative ultrasound signal analysis, we propose a robust domain selection method that aims to discover a subset of the domain where location parameters behave differently across groups. By extending the interval testing approach, we take into account multiple aspects of functional features simultaneously to detect a practically interpretable domain. To further handle potential outliers and missing segments in the collected functional trajectories, we perform interval testing with a test statistic based on functional M-estimators. 
In addition, we introduce the effect size heatmap by calculating robustified effect sizes from the smallest to the largest scales over the domain to reflect dynamic functional behaviors among groups, so that clinicians can gain a comprehensive understanding and select practically meaningful sub-interval(s). The performance of the proposed method is demonstrated through simulation studies and an application to the motivating quantitative ultrasound measurements. 2025-06-04T00:01:16Z Journal of the Royal Statistical Society Series C: Applied Statistics (2026) Yeonjoo Park Aiguo Han 10.1093/jrsssc/qlag014 http://arxiv.org/abs/2012.08371v3 Limiting laws and consistent estimation criteria for fixed and diverging number of spiked eigenvalues 2026-03-25T15:10:49Z In this paper, we study limiting laws and consistent estimation criteria for the extreme eigenvalues in a spiked covariance model of dimension $p$. Firstly, for fixed $p$, we propose a generalized estimation criterion that can consistently estimate $k$, the number of spiked eigenvalues. Compared with the existing literature, we show that consistency can be achieved under weaker conditions on the penalty term. Next, allowing both $p$ and $k$ to diverge, we derive limiting distributions of the spiked sample eigenvalues using random matrix theory techniques. Notably, our results do not require the spiked eigenvalues to be uniformly bounded from above or to tend to infinity, as has been assumed in the existing literature. Based on these results, we formulate a generalized estimation criterion and show that it can consistently estimate $k$, where $k$ may be fixed or grow at the rate $k=o(n^{1/3})$. We further show that the results in our work continue to hold under a general population distribution without assuming normality. The efficacy of the proposed estimation criteria is illustrated through comparative simulation studies. 
2020-12-15T15:36:03Z Jianwei Hu Jingfei Zhang Jianhua Guo Ji Zhu http://arxiv.org/abs/2603.24392v1 Federated fairness-aware classification under differential privacy 2026-03-25T15:09:12Z Privacy and algorithmic fairness have become two central issues in modern machine learning. Although each has separately emerged as a rapidly growing research area, their joint effect remains comparatively under-explored. In this paper, we systematically study the joint impact of differential privacy and fairness on classification in a federated setting, where data are distributed across multiple servers. Targeting demographic disparity constrained classification under federated differential privacy, we propose a two-step algorithm, namely FDP-Fair. In the special case where there is only one server, we further propose a simple yet powerful algorithm, namely CDP-Fair, serving as a computationally-lightweight alternative. Under mild structural assumptions, theoretical guarantees on privacy, fairness and excess risk control are established. In particular, we disentangle the source of the private fairness-aware excess risk into a) intrinsic cost of classification, b) cost of private classification, c) non-private cost of fairness and d) private cost of fairness. Our theoretical findings are complemented by extensive numerical experiments on both synthetic and real datasets, highlighting the practicality of our designed algorithms. 2026-03-25T15:09:12Z Gengyu Xue Yi Yu http://arxiv.org/abs/2503.13191v2 Stein's method of moment estimators for local dependency exponential random graph models 2026-03-25T14:43:36Z Providing theoretical guarantees for parameter estimation in exponential random graph models is a largely open problem. While maximum likelihood estimation has theoretical guarantees in principle, verifying the assumptions for these guarantees to hold can be very difficult. 
Moreover, in complex networks, numerical maximum likelihood estimation is computer-intensive and may not converge in reasonable time. To ameliorate this issue, local dependency exponential random graph models have been introduced, which assume that the network consists of many independent exponential random graphs. In this setting, progress towards maximum likelihood estimation has been made. However, the estimation is still computer-intensive. Instead, we propose to use so-called Stein estimators: we use the Stein characterizations to obtain new estimators for local dependency exponential random graph models. 2025-03-17T14:01:11Z Updated version with detailed connection to MPLE Adrian Fischer Gesine Reinert Wenkai Xu http://arxiv.org/abs/2603.24333v1 Notes on Forré's Notion of Conditional Independence and Causal Calculus for Continuous Variables 2026-03-25T14:13:23Z Recently, Forré (arXiv:2104.11547, 2021) introduced transitional conditional independence, a notion of conditional independence that provides a unified framework for both random and non-stochastic variables. The original paper establishes a strong global Markov property connecting transitional conditional independencies with suitable graphical separation criteria for directed mixed graphs with input nodes (iDMGs), together with a version of causal calculus for iDMGs in a general measure-theoretic setting. These notes aim to further illustrate the motivations behind this framework and its connections to the literature, highlight certain subtleties in the general measure-theoretic causal calculus, and extend the "one-line" formulation of the ID algorithm of Richardson et al. (Ann. Statist. 51(1):334--361, 2023) to the general measure-theoretic setting. 
2026-03-25T14:13:23Z Leihao Chen http://arxiv.org/abs/2603.20518v2 Multi-dimensional Mortality (MDMx): Sex-Age-Specific Model Life Tables, Fitting, Prediction from Summary Mortality Indicators, and Forecasting 2026-03-25T13:52:28Z Demographers rely on a variety of tools and methods to work with mortality schedules (model life tables, fitting methods, summary-indicator prediction, and forecasting) that were largely developed independently and do not provide structurally coherent sex-specific outputs. The multi-dimensional mortality model (MDMx) unifies all four within one Tucker tensor decomposition demonstrated using the Human Mortality Database (HMD). Period life tables from the HMD are organized as a four-way tensor of logit(${}_1q_x$) indexed by sex, age, country, and year. Shared factor matrices for sex and age make every output schedule structurally coherent by construction. From this decomposition four capabilities emerge: model life tables via clustering and smooth within-regime trajectories; life table fitting via a three-stage algorithm with Bayes-factor disruption detection; summary-indicator prediction mapping child or adult mortality to complete schedules, reformulating SVD-Comp in tensor coordinates; and forecasting via a damped local linear trend Kalman filter on PCA-reduced core matrices with hierarchical drift. 2026-03-20T21:35:35Z Samuel J. Clark http://arxiv.org/abs/2603.24299v1 Mortality Forecasting as a Flow Field in Tucker Decomposition Space 2026-03-25T13:38:25Z Mortality forecasting methods in the Lee-Carter tradition extrapolate temporal components via time-series models, producing forecasts that can systematically underpredict life expectancy at long horizons and require ad hoc adjustments for sex coherence. We reframe forecasting as integrating a flow field through the low-dimensional score space of a Tucker tensor decomposition of multi-population mortality data from the Human Mortality Database. 
PCA reduction of the effective core matrices reveals that the mortality transition is essentially a one-dimensional flow: a scalar speed function advances the level, trajectory functions supply the structural scores, and the Tucker reconstruction produces complete sex-specific mortality schedules at each horizon. An era-weighted speed function adapts to contemporary dynamics at each forecast origin, and empirically calibrated convergence rates control relaxation from country-specific to canonical mortality structure. The system is evaluated by leave-country-out cross-validation with a 50-year horizon against Lee-Carter and Hyndman-Ullah benchmarks. 2026-03-25T13:38:25Z Samuel J. Clark http://arxiv.org/abs/2603.24276v1 Rethinking Individual Risk and Aggregation in Survival Analysis: A Latent Mechanism Framework 2026-03-25T13:08:57Z Survival analysis provides a well-established framework for modeling time-to-event data, with hazard and survival functions formally defined as population-level quantities. In applied work, however, these quantities are often interpreted as representing individual-level risk, despite the absence of a clear generative account linking individual risk mechanisms to observed survival data. This paper develops a latent hazard framework that makes this relationship explicit by modeling event times as arising from unobserved, individual-specific hazard mechanisms and viewing population-level survival quantities as aggregates over heterogeneous mechanisms. Within this framework, we show that individual hazard trajectories are not identifiable from survival data under partial information. More generally, the conditional distribution of latent hazard mechanisms given covariates is structurally non-identifiable, even when population-level survival functions are fully known. This non-identifiability arises from the aggregation inherent in survival data and persists independently of model flexibility or estimation strategy. 
Finally, we show that classical survival models can be systematically reinterpreted according to how they handle this unresolved conditional mechanism distribution. This paper provides a unified framework for understanding heterogeneity, identifiability, and interpretation in survival analysis, and clarifies how population-level survival models should be interpreted when individual risk mechanisms are only partially observed, thereby establishing explicit information constraints for principled modeling and inference. 2026-03-25T13:08:57Z Xijia Liu http://arxiv.org/abs/2603.24263v1 XT-REM: A Two-Component Model for Meta-Analysis of Extreme Event Proportions 2026-03-25T12:55:30Z In this paper, we introduce a novel model for the meta-analysis of proportions that integrates the standard random-effects model (REM) with an extreme value theory (EVT)-based component. The proposed model, named XT-REM (Extreme-Tail Random Effects Model), extends the classical REM framework by explicitly accounting for extreme proportions through a partial segmentation of the study set based on a predefined threshold. While the majority of proportions are modeled using REM, proportions exceeding the threshold are analyzed using the Generalized Pareto Distribution (GPD). This formulation enables a dual interpretation of meta-analytic results, providing both an aggregate estimate for the central bulk of studies and a separate characterization of tail behavior. The XT-REM framework accommodates heteroskedastic variance structures inherent to proportion data, while preserving identifiability and consistency. Using real-world data on immunotherapy-related adverse events, together with simulation studies calibrated to empirical settings, we demonstrate that XT-REM yields a comparable central estimate while enabling a more explicit assessment of tail behavior, including high-percentile extreme proportions. 
Compared with the classical REM, XT-REM achieves higher log-likelihood values and lower AIC in the considered scenarios, indicating a better fit within this modeling framework. In summary, XT-REM offers a theoretically grounded and practically useful extension of random-effects meta-analysis, with potential relevance to clinical contexts in which extreme event rates carry important implications for risk assessment. 2026-03-25T12:55:30Z Under preparation for submission to Computational Statistics & Data Analysis. Includes simulation study and real-world application of the XT-REM model Jovana Dedeić Jelena Ivetić Srđan Milićević Katarina Vidojević Marija Delić http://arxiv.org/abs/2402.08151v4 Perturbative adaptive importance sampling for Bayesian LOO cross-validation 2026-03-25T12:53:33Z Importance sampling (IS) is an efficient stand-in for model refitting when performing leave-one-out (LOO) cross-validation (CV) on a Bayesian model. IS inverts the Bayesian update for a single observation by reweighting posterior samples. The so-called importance weights have high variance -- we resolve this issue through adaptation by transformation. We observe that removing a single observation perturbs the posterior by $\mathcal{O}(1/n)$, motivating bijective transformations of the form $T(\theta)=\theta + hQ(\theta)$ for $0<h\ll 1$. We introduce several such transformations: partial moment matching, which generalizes prior work on affine moment-matching with a tunable step size; log-likelihood descent, which partially inverts the Bayesian update for an observation; and gradient flow steps that minimize the KL divergence or IS variance. The gradient flow and likelihood descent transformations require Jacobian determinants, which are available via auto-differentiation; we additionally derive closed-form expressions for logistic regression and shallow ReLU networks. 
We tested the methodology on classification ($n\ll p$), count regression (Poisson and zero-inflated negative binomial), and survival analysis problems, finding that no single transformation dominates but their combination nearly eliminates the need to refit. 2024-02-13T01:03:39Z Submitted Joshua C Chang Xiangting Li Tianyi Su Shixin Xu Hao-Ren Yao Julia Porcino Carson Chow http://arxiv.org/abs/2510.26485v3 Discovering Causal Relationships Between Time Series With Spatial Structure 2026-03-25T12:20:49Z Causal discovery is the subfield of causal inference concerned with estimating the structure of cause-and-effect relationships in a system of interrelated variables, as opposed to quantifying the strength or describing the form of causal effects. As interest in causal discovery builds in fields such as ecology, public health, and environmental sciences where data are regularly collected with spatial and temporal structures, approaches must evolve to manage autocorrelation and complex confounding. As it stands, the few proposed causal discovery algorithms for spatiotemporal data require summarizing across locations, ignore spatial autocorrelation, and/or scale poorly to high dimensions. Here, we introduce our developing framework that extends time-series causal discovery to systems with spatial structure, building upon work on causal discovery across contexts and methods for handling spatial confounding in causal effect estimation. We close by outlining remaining gaps in the literature and directions for future research. 2025-10-30T13:38:08Z 10 pages, 2 figures Rebecca F. 
Supple School of Mathematics and Statistics, University of St Andrews Centre for Research into Ecological and Environmental Modelling, University of St Andrews Hannah Worthington School of Mathematics and Statistics, University of St Andrews Centre for Research into Ecological and Environmental Modelling, University of St Andrews Ben Swallow School of Mathematics and Statistics, University of St Andrews Centre for Research into Ecological and Environmental Modelling, University of St Andrews http://arxiv.org/abs/2603.24227v1 Identification of NMF by choosing maximum-volume basis vectors 2026-03-25T12:00:53Z In nonnegative matrix factorization (NMF), minimum-volume-constrained NMF is a widely used framework for identifying the solution of NMF by making basis vectors as similar as possible. This typically induces sparsity in the coefficient matrix, with each row containing zero entries. Consequently, minimum-volume-constrained NMF may fail for highly mixed data, where such sparsity does not hold. Moreover, the estimated basis vectors in minimum-volume-constrained NMF may be difficult to interpret as they may be mixtures of the ground truth basis vectors. To address these limitations, in this paper we propose a new NMF framework, called maximum-volume-constrained NMF, which makes the basis vectors as distinct as possible. We further establish an identifiability theorem for maximum-volume-constrained NMF and provide an algorithm to estimate it. Experimental results demonstrate the effectiveness of the proposed method. 2026-03-25T12:00:53Z Qianqian Qi Zhongming Chen Peter G. M. van der Heijden