https://arxiv.org/api/rolOiv5wmaKbwSXR4LqCGwhqjcI
2026-06-18T19:16:52Z

http://arxiv.org/abs/2605.20621v1
Changepoint Detection in Categorical Time Series with Application to Daily Total Cloud Cover in Canada
2026-05-20T02:14:24Z
Changepoints are essential for homogenizing categorical time series and analyzing their trends and variations. The original total cloud cover in Canada was recorded hourly in tenths (or eighths), exhibiting inherent seasonality and serial correlation. Lu and Wang (2012) introduced an extended cumulative logit model to detect shifts in the annual frequencies of cloud cover conditions. While annual aggregation mitigates seasonality and serial correlation, it shortens the time series and may lead to overdispersion. This article introduces a marginalized transition model to detect a single changepoint in periodic and serially correlated categorical time series. The model captures serial dependence using a first-order Markov chain and enables category-specific changepoint specification. To enhance computational efficiency, we develop a new parameter estimation procedure for obtaining maximum likelihood estimates. A maximally selected likelihood ratio test statistic is then proposed to test for sudden changes in categorical time series, and the method is illustrated using daily total cloud cover observations recorded at 9 a.m. and 3 p.m. at Fort St. John Airport, British Columbia, Canada.
2026-05-20T02:14:24Z
31 pages, 16 figures, 5 tables; includes supplementary material; R/Rcpp code available in the linked GitHub repository
Mo Li, QiQi Lu, XiaoLan Wang

http://arxiv.org/abs/2602.04092v2
Time-to-Event Estimation with Unreliably Reported Events in Medicare Health Plan Payment
2026-05-20T01:38:36Z
OBJECTIVE: To propose time-to-event estimators that help evaluate incident diagnostic coding and possible upcoding in Medicare as well as introduce an open-source software package that enables more reproducible methods development relevant to Medicare billing behavior. STUDY SETTING AND DESIGN: Observational analysis of simulated upcoding based on coding by insurers or providers that may be incentivized by Medicare Advantage risk adjustment. DATA SOURCES AND ANALYTIC SAMPLE: Two years of separately simulated incident health condition coding data for a Medicare Advantage population and a Traditional Medicare population where coding patterns are aligned with known practices in each program. PRINCIPAL FINDINGS: We propose several novel time-to-event estimators of incident coding intensity and possible upcoding in Medicare Advantage, including accounting for unreliable reporting. We demonstrate estimator performance in simulated data leveraging the National Institutes of Health's All of Us study and also develop an open source R package to simulate longitudinal realistic labeled upcoding data, which were not previously available for researchers. In simulations, our novel estimators recovered differences in upcoding within and across monitoring periods. Undercoding had a limited effect on our novel estimators while an existing estimator was more sensitive to undercoding. CONCLUSIONS: Our proposed estimators can help researchers and policymakers track new coding behaviors (e.g., as may be incentivized by risk adjustment formula updates) earlier and at scale while accounting for several real-world data considerations.
Further, the R package we provide can be used to improve the development, accessibility, and reproducible evaluation of coding intensity and upcoding methodology.
2026-02-04T00:04:44Z
44 pages, 10 figures
Oana M. Enache, Sherri Rose

http://arxiv.org/abs/2605.21536v1
High-Volume Plaintiff-Side Counsel and Single-Appearance Eviction Cases in Philadelphia
2026-05-20T01:32:51Z
Among 755,004 Philadelphia landlord--tenant records filed during 1969-2022, 396,163 residential cases involve tenants who appear exactly once in the observed docket. In unadjusted comparisons, single-appearance cases handled by high-volume plaintiff-side counsel are more likely to advance to the writ-of-possession and served-writ stages, but no more likely to end in default. Comparisons within the same plaintiff, and within the same plaintiff at the same property, show no broad premium on adverse case outcomes such as default, judgment, or fees. The clearer pattern is organizational: after a plaintiff adopts or switches into high-volume counsel, monthly filings rise by about 2-5% and the number of distinct buildings reached rises by a similar margin; near the prior-year top-10 attorney threshold, cases display local differences in default and enforcement; and continuances under specialist counsel are more closely linked to default. Non-flat pre-treatment trends and imprecise reverse-direction estimates from attorney exits restrict the strength of any causal claim.
High-volume plaintiff-side counsel therefore functions as a mechanism of filing scale and procedural sequence, not as a uniform escalator of case outcomes or as a cause of any individual tenant becoming single-appearance.
2026-05-20T01:32:51Z
Preprint
Marios Papamichalis, Regina Ruane

http://arxiv.org/abs/2605.20559v1
Group-Aware Matrix Estimation and Latent Subspace Recovery
2026-05-19T23:22:32Z
Modern matrix completion problems often involve heterogeneous data whose rows simultaneously belong to many meta-categories, such as demographic and age groups in recommendation systems, or region and recording session labels in neural electrophysiological experiments. Standard low-rank estimators impose a single global latent geometry, which can recover average structure but may smooth away subgroup-specific variation, especially when observations are unevenly distributed across groups. We introduce Group-Aware Matrix Estimation (GAME), a convex estimator for overlapping subgroup-wise low-rank matrix estimation. GAME regularizes category-specific submatrices through overlapping nuclear-norm penalties, allowing related groups to borrow information while preserving local latent structure in a shared coordinate system. We provide finite-sample guarantees for both reconstruction error and subgroup-specific subspace recovery, showing how performance depends on sampling density, subgroup rank, and overlap structure. Experiments on synthetic, recommendation, ecological, and neuroscience datasets show that GAME is most beneficial in structured missingness regimes, where subgroup-aware regularization improves both reconstruction accuracy and latent subspace fidelity. Across these benchmarks, GAME is competitive or best among global low-rank, side-information, and modern imputation baselines, with the largest gains when subgroups exhibit distinct low-rank structure.
2026-05-19T23:22:32Z
12 pages, 6 main figures, 1 main algorithm
Hamza Golubovic, Matthew Shen, Genevera I. Allen, Tarek M. Zikry

http://arxiv.org/abs/2512.02182v3
Two-phase validation sampling via principal components to improve efficiency in multi-model estimation from error-prone biomedical databases
2026-05-19T21:52:15Z
Two-phase sampling offers a cost-effective way to validate error-prone covariate measurements in biomedical databases. Inexpensive or easy-to-obtain information is collected for the entire study in Phase I. Then, a subset of patients undergoes cost-intensive validation (e.g., expert chart review) to collect more accurate data in Phase II. When balancing primary and secondary analyses, competing models and priorities can result in poorly defined objectives for the most informative Phase II sampling criterion. Extreme tail sampling (ETS), wherein patients with the smallest and largest values of a particular quantity (like a covariate or residual) are selected, can offer great statistical efficiency in two-phase studies when focusing on a single analytic objective by targeting observations with the biggest contributions to the Fisher information. We propose an intuitive, easy-to-use approach that extends ETS to balance and prioritize explaining the largest amount of variability across multiple models of interest. Using principal components analysis, we succinctly summarize the inherent variability of all models' error-prone exposures. Then, we sample patients with the most extreme values of the first principal component for validation. Through extensive simulations and an application to the National Health and Nutrition Examination Survey (NHANES), the proposed strategy offered simultaneous efficiency gains across multiple models of interest. Its advantages persisted across various real-world scenarios, including correlated or heterogeneous measurement error. When designing a validation study, concentrating on a single model may be short-sighted. Strategically allocating resources more broadly balances multiple analytical goals simultaneously.
Employing dimension reduction before sampling will allow this strategy to scale up well to big-data applications with many error-prone exposures.
2025-12-01T20:22:34Z
22 pages, 5 figures, 2 tables, GitHub repositories with R package and simulation/analysis code
Sarah C. Lotspeich, Cole Manschot

http://arxiv.org/abs/2605.20508v1
Compensator-Based Inference for Signal Detection Under Unknown Background
2026-05-19T21:24:45Z
The problem of detecting new signals in the presence of an unknown background is ubiquitous in scientific discoveries and is especially prominent in the physical sciences. Most solutions proposed thus far to address the problem focus on estimating the background distribution and using that estimate to infer the signal. By studying the geometry of the problem, this article demonstrates that estimating the background distribution is somewhat unnecessary for inferring the signal intensity. Instead, it suffices to estimate a single parameter, referred to as the compensator, to account for the incomplete knowledge on the background, substantially simplifying the problem's complexity and enabling proper uncertainty propagation. Such a compensator is shown to govern the conservativeness of the inference, both in the proposed setup and in likelihood-based approaches.
2026-05-19T21:24:45Z
Aritra Banerjee, Sara Algeri

http://arxiv.org/abs/2605.20502v1
Tippett-minimum Fusion of Representation-space Diffusion Models for Multi-Encoder Out-of-Distribution Detection
2026-05-19T21:18:49Z
We address out-of-distribution (OOD) detection across the full spectrum of distribution shifts -- global domain changes, semantic divergence, texture differences, and covariate corruptions -- through a multi-encoder fusion of per-encoder representation-space diffusion models (RDMs).
We statistically identify each encoder's sensitivity to specific shift types from ID data alone and introduce EncMin2L -- an encoder-agnostic two-level $\min(\cdot)$-gate that combines and calibrates per-encoder diffusion-based likelihood detectors without OOD labels, outperforming monolithic multi-encoder baselines at $2.3\times$ lower parameter cost. Two ID-data diagnostics, $\eta^2$ (class-conditional F-test) and $\Delta\mu$ (log-likelihood shift under synthetic corruptions), quantify encoder specialization, while a Tippett minimum $p$-value combination aggregates per-encoder scores into a single, calibration-stable OOD signal. EncMin2L achieves $\geq 0.94$ AUROC across all four shift types simultaneously, outperforming the state-of-the-art representation-space diffusion OOD detectors across overlapping benchmarks.
2026-05-19T21:18:49Z
14 pages
Neelkamal Bhuyan

http://arxiv.org/abs/2605.20494v1
A 10,000-Year Global Stochastic Tropical Cyclone Catalog with Wind-Dependent Track Transitions (WHITS)
2026-05-19T20:58:36Z
Reliable assessment of tropical cyclone (TC) risk is limited by the brevity and spatial sparsity of the historical record, particularly for the rare, high-intensity landfalls that dominate insured loss. We present WHITS (Wind-focused Hurricane Interactive Track Simulator), a non-parametric semi-Markov track generator that extends the HITS framework of Nakamura et al. (2015) in three ways: transitions between historical track segments are conditioned on local wind speed in addition to position, age, and forward vector; the kernel selection on the comparative-vector term is sharpened to suppress dynamically inconsistent jumps; and a short smoothing window is applied across each transition to remove the position and wind discontinuities reported by downstream surge users. WHITS is fit to the full available best-track record in each of six basins in IBTrACS, extending in the North Atlantic to 1851 and in other basins to the earliest year of reliable best-track data.
The resulting 10,000-yr global synthetic catalog reproduces observed track density and the annual hurricane/typhoon-force wind-hit probability across all basins. The catalog is intended for catastrophe-risk applications where a large, low-bias sample of physically plausible tracks is more useful than a small, statistically corrected one.
2026-05-19T20:58:36Z
Jennifer Nakamura, Upmanu Lall

http://arxiv.org/abs/2605.20429v1
Design and Validation of a Grid-based Home Detection via Stay-Time (GHOST) Software for Mobile Location Data
2026-05-19T19:25:52Z
Accurately detecting home locations from GPS data generated by mobile devices is a foundational step in human mobility research, with significant implications for transportation planning, public health, and emergency response. However, existing home detection algorithms often produce unreliable results for noisy real-world data and are barely validated due to a lack of ground-truth benchmarks. To tackle these limitations, this study presents the development and validation of a Grid-based home detection via Stay-Time (GHOST) algorithm, implemented as an open-source Python package. The algorithm infers proxy home locations by identifying the most frequently visited nighttime or weekend daytime grid cells based on customizable spatial and temporal filters. To validate its performance, we use the large-scale BostonWalks dataset, which includes over 155,000 trips from 377 participants in the Boston metropolitan area, to test robustness to noisy data. Additionally, we collected a ground-truth dataset for ten volunteers across different regions in the U.S., including Florida, Mississippi, and Colorado, along with their self-reported home coordinates, to evaluate GHOST across diverse mobility patterns and sampling conditions.
We compared GHOST accuracy to that of 5 well-established home detection algorithms: All-time clustering method, Stay-point method, DBSCAN, K-MEANS++, and SciKit-Mobility Home Detection, across multiple parameter settings. Results show that GHOST outperforms all algorithms in accuracy and robustness, with average errors as low as 22.3 meters under optimal configurations. Our findings highlight the high accuracy and flexibility of our algorithm, with grid size being the most influential parameter during validation, demonstrating the potential of this algorithm for real-world mobile location data analysis.
2026-05-19T19:25:52Z
Alessandra Recalde, Mustafa Sameen, Xiaojian Zhang, Xilei Zhao

http://arxiv.org/abs/2605.20400v1
Understanding Deterioration Random Effects for Causal Discovery in Infrastructure Management
2026-05-19T18:57:35Z
Infrastructure deterioration poses significant challenges for asset management, yet existing approaches rely on population-averaged models that overlook equipment-specific heterogeneity. We present a novel framework that combines Bayesian hierarchical hazard modeling with causal discovery to identify operational patterns that drive heterogeneous deterioration rates in pump equipment. Our approach first estimates pump-specific random effects $u_i$ using GPU-accelerated No-U-Turn Sampling (NUTS), achieving 3--5$\times$ speedup over CPU implementations. We then employ DirectLiNGAM to discover causal relationships between 22 engineered time-series features and deterioration rates, stratified by positive ($u_i > 0$, faster deterioration) versus negative ($u_i \leq 0$, slower deterioration) random effects. Analyzing 112 pumps with 92,861 observations over 650 days, we uncover striking heterogeneity: the negative group exhibits causal effects 400$\times$ larger than the positive group, with standard deviation (std) showing a strong positive causal effect ($+1.515$) on deterioration rates in low-risk equipment.
We validate linearity assumptions through NonlinearLiNGAM comparison and demonstrate practical scalability through GPU acceleration. Our findings enable targeted maintenance strategies by revealing that different operational regimes require fundamentally distinct management approaches, advancing predictive maintenance from population-averaged to heterogeneity-aware decision making.
2026-05-19T18:57:35Z
20 pages, 7 figures, 4 tables
Takato Yasuno

http://arxiv.org/abs/2605.20399v1
A duration-augmented binary Markov chain for rainfall occurrence with long dry spells
2026-05-19T18:53:55Z
Simulating realistic wet and dry spells is central in weather generators and climate-impact studies. While finite-order Markov chains are standard, they often fail to reproduce persistent dry conditions due to their inherent subexponential decay. We model rainfall occurrence by introducing a duration-augmented binary Markov chain. We establish a link with alternating renewal chains, enabling flexible parametric modelling of wet and dry spell duration distribution. We model those using two regime-adapted specifications from the general class of extended Generalized Pareto Distributions, yielding flexible tail behaviour across various climates. We use estimation methods adapted to each specification. Our model is applied to around 200 stations in the South of Europe spanning diverse Mediterranean and continental climates. We compare this framework to standard Markov models in characterising persistence and high-quantile extrapolation.
The approach is generic, extending naturally to multi-state settings or other binary sequence applications in environmental statistics.
2026-05-19T18:53:55Z
Antoine Doizé (LPSM, SU), Denis Allard (BioSP), Philippe Naveau (LSCE, ESTIMR), Olivier Wintenberger (LPSM, SU)

http://arxiv.org/abs/2412.15076v4
Digital N-of-1 Trials and their Application in Experimental Physiology
2026-05-19T18:40:37Z
Traditionally, studies in experimental physiology have been conducted in small groups of human participants, animal models or cell lines. Identifying optimal study designs that achieve sufficient power for drawing proper statistical inferences to detect group level effects with small sample sizes has been challenging. Moreover, average effects derived from traditional group-level inference do not necessarily apply to individual participants. Here, we introduce N-of-1 trials as an innovative study design that can be used to draw valid statistical inference about the effects of interventions on individual participants and can be aggregated across multiple study participants to provide population-level inferences more efficiently than standard group randomized trials. N-of-1 trials have been used in healthcare settings since the late 1980s, but without large-scale adoption and with few applications in experimental physiology research settings. In this manuscript, we introduce the key components and design features of N-of-1 trials, describe statistical analysis and interpretations of the results, and describe some available digital tools to facilitate their use using examples from experimental physiology.
2024-12-19T17:26:02Z
Accepted in Experimental Physiology. https://doi.org/10.1113/EP092753
Stefan Konigorski, Mathias Ried-Larsen, Christopher H Schmid

http://arxiv.org/abs/2605.20142v1
Mining Financial Data using Mixtures of Mirrored Weibull Distributions
2026-05-19T17:30:53Z
Risk management is an important part of financial practice, essential for protecting assets and investments in modern-day volatile markets.
This paper proposes a mixture of mirrored Weibull (MMW) distribution for modelling stock returns and estimating risk measures. Unlike common practices, which are typically based on the normal distribution, the MMW model can flexibly accommodate non-normal features frequently exhibited in financial data. It also enjoys appealing properties such as having a simple density expression and fast parameter estimation. We demonstrate the effectiveness of our model by assessing its performance in Value-at-Risk (VaR) estimation of three S&P500 stocks. The MMW model compares favourably to the Gaussian mixture model and the t-mixture model, with significant improvements in VaR estimation and prediction.
2026-05-19T17:30:53Z
Zijun Jia, Sharon X. Lee
10.1109/ICIBA62489.2024.10868849

http://arxiv.org/abs/2605.20003v1
Estimating treatment duration effects via clone-censor-weight: a breast cancer case study
2026-05-19T15:34:49Z
In this work, we study the estimation of treatment duration effects in observational survival data, where treatment and covariate histories evolve over time and longer observed durations are only attainable among individuals who remain event-free and under follow-up, leading to immortal time bias under naive analyses. The cloning-censoring-weighting (CCW) framework provides a practical approach to emulate target trials of treatment duration strategies, but several methodological aspects remain insufficiently understood.
We focus on static treatment duration strategies under two settings of increasing complexity: baseline confounding only, and confounding with time-varying covariates. We formalize the assumptions underlying CCW, with particular emphasis on treatment admissibility, relaxed intervention rules, and the distinction between artificial and natural censoring. We then compare several estimation approaches after cloning and censoring, including inverse probability of censoring weighting (IPCW), the G-formula, and doubly robust estimators, through simulation studies assessing robustness, variability, and sensitivity to censoring model misspecification.
Finally, we apply the framework to a breast cancer cohort to emulate a target trial comparing 2 versus 5 years of adjuvant tamoxifen in early-stage breast cancer. Due to the small number of events and limited support for the 2-year strategy, estimates are associated with substantial uncertainty. These findings highlight both the practical relevance and the limitations of CCW, and underscore the importance of sensitivity analyses in complex longitudinal observational settings.
2026-05-19T15:34:49Z
Charlotte Voinot, Noémie Simon-Tillaux, Emma Torrini, Stefan Michiels, Bernard Sebastien, Clément Berenfeld, Julie Josse

http://arxiv.org/abs/2605.19812v1
FLUXtrapolation: A benchmark on extrapolating ecosystem fluxes
2026-05-19T13:09:05Z
We introduce FLUXtrapolation, a benchmark for extrapolating ecosystem fluxes under progressively harder distribution shifts. Ecosystem fluxes are central to understanding the carbon, water, and energy cycles, yet they can only be measured directly at sparsely located measurement towers. Producing global flux estimates therefore requires training models on observed sites using globally available covariates and predicting in unobserved regions, that is, upscaling. Flux upscaling is a challenging domain generalization problem that is affected by a shift in covariate distribution across climates, ecosystem types, and environmental conditions, as well as by conditional shift: important drivers remain unobserved at global scale. We provide a quantitative analysis of both these shifts in $P_X$ and $P_{Y\mid X}$. FLUXtrapolation is designed based on domain expertise on flux upscaling: it defines temporal, spatial, and temperature-based extrapolation scenarios and evaluates performance across held-out domains, temporal aggregations, and tail errors. In a pilot study, we find that baselines perform similarly under median hourly RMSE, but separate under the proposed tail-focused and multi-scale evaluation.
FLUXtrapolation therefore poses a realistic and thus relevant challenge for machine learning methods under distribution shift; at the same time, progress on this benchmark would directly support the scientific goal of improving flux upscaling.
2026-05-19T13:09:05Z
Anya Fries, Jacob A Nelson, Martin Jung, Markus Reichstein, Jonas Peters