https://arxiv.org/api/E6/sVMfYG3H687kQXVXoY/Qh8Bo
Feed updated: 2026-06-18T18:21:02Z

http://arxiv.org/abs/2405.17032v4
Exact phylodynamic likelihood via structured Markov genealogy processes
Updated: 2026-05-20T21:34:15Z
We show that each member of a broad class of Markovian population models induces a unique stochastic process on the space of genealogies. We construct this genealogy process and derive exact expressions for the likelihood of an observed genealogy in terms of a filter equation, the structure of which is completely determined by the population model. We show that existing phylodynamic methods based on the coalescent and linear birth-death processes are special cases. We derive some properties of filter equations and describe a class of algorithms that can be used to numerically solve them. Importantly, because these algorithms rely only on simulation of the population model, they retain the plug-and-play property upon which simulation-based inference depends. Our results open the door to statistically efficient likelihood-based phylodynamic inference for a much wider class of models than is currently possible.
Published: 2024-05-27T10:39:18Z
Authors: Aaron A. King, Qianying Lin, Edward L. Ionides

http://arxiv.org/abs/2605.16108v2
Estimating Association Between Paired Outcomes in Clustered Data with Informative Subgroup Size
Updated: 2026-05-20T20:43:50Z
Informative cluster size (ICS) and informative subgroup size (ISS) can distort marginal association estimates when the number of observed units, or their distribution across outcome-defined categories, is related to the outcomes under study. This issue is especially relevant for paired outcomes, where the observed association can depend on cluster size, paired-category composition, and the process by which units become available for analysis. We propose three weighted estimating approaches for marginal association between paired outcomes in clustered data. 
The weights are derived from within-cluster resampling arguments and extend inverse cluster-size and subgroup-size weighting to paired outcome categories. We also modify an existing ISS testing procedure by utilizing Stouffer's method to reduce computational burden. To evaluate the methods, we develop a simulator for clustered paired outcomes that separates unit-level association, latent cluster-level association, and outcome-dependent retention. Simulations show that pair-based weighting can reduce bias when association arises through unit-level dependence and subgroup composition is informative, but can attenuate association carried by latent cluster-level structure. Typical inverse-cluster weighting remains more stable when the association is primarily cluster-level. Application to NHANES oral-health data shows small positive periodontal and caries associations overall, with filled-surface outcomes showing stronger ISS evidence and greater sensitivity to pair-based weighting than decayed-surface outcomes. These results indicate that marginal association under ICS and ISS should be interpreted in relation to the source of association, observed-unit structure, and assumptions used to choose the weighting scheme.
Published: 2026-05-15T15:56:03Z
Authors: Owen Visser, Somnath Datta

http://arxiv.org/abs/2602.16195v2
Phase Transitions in Collective Damage of Civil Structures under Natural Hazards
Updated: 2026-05-20T18:21:34Z
The fate of cities under natural hazards depends not only on hazard intensity but also on the coupling of structural damage, a collective process that remains poorly understood. Here we show that urban structural damage exhibits phase-transition phenomena. As hazard intensity increases, the system can shift abruptly from a largely safe to a largely damaged state, analogous to a first-order phase transition in statistical physics. 
Higher diversity in the building portfolio smooths this transition, but multiscale damage clustering traps the system in an extended critical-like regime (analogous to a Griffiths phase), suppressing the emergence of a more predictable disordered (Gaussian) phase. These phenomenological patterns are characterized by a random-field Ising model, with the external field, disorder strength, and temperature interpreted as the effective hazard demand, structural diversity, and modeling uncertainty, respectively. Applying this framework to real urban inventories reveals that widely used engineering modeling practices can shift urban damage patterns between synchronized and volatile regimes, systematically biasing exceedance-based risk metrics by up to 50% under moderate earthquakes ($M_w \approx 5.5$--$6.0$), equivalent to a several-fold gap in repair costs. This phase-aware description turns the collective behavior of civil infrastructure damage into actionable diagnostics for urban risk assessment and planning.
Published: 2026-02-18T05:31:35Z
Authors: Sebin Oh, Jinyan Zhao, Raul Rincon, Jamie E. Padgett, Ziqi Wang

http://arxiv.org/abs/2605.21464v1
Assessing the impact of tourist attractions through the integration of causal inference and demand-side economic analysis: A case study of the Sensoria experience museum in Holzminden, Germany
Updated: 2026-05-20T17:51:42Z
This research note investigates the impact of the experience museum Sensoria, opened in September 2024 in Holzminden, Germany, on local tourism demand and related direct and indirect effects. To this end, the study employs a novel approach by combining causal inference and demand-side economic analysis. A difference-in-differences approach is employed to quantify the number of additional guest overnight stays in the treatment city; the results are converted into industry-specific expenditures, from which the direct and indirect effects of Sensoria are determined. 
A positive and significant impact, corresponding to 4,691 additional overnight stays, can be detected in the first year of operation of the new tourist attraction, resulting in an additional gross turnover of approximately 0.56 million EUR across the hospitality and retail industries and other services. The direct effects and indirect effects amount to approximately 0.23 and 0.21 million EUR, respectively. However, long-term effects cannot (yet) be determined. Additionally, positive effects from small and large events in the cities studied can be demonstrated. This brief study demonstrates that combining the two approaches holds promise but requires a more in-depth analysis; suggestions for how such an analysis could be conducted are also discussed.
Published: 2026-05-20T17:51:42Z
Comment: v1.0.0
Authors: Thomas Wieland

http://arxiv.org/abs/2508.04074v3
Matrix Factorization-Based Solar Spectral Irradiance Missing Data Imputation with Uncertainty Quantification
Updated: 2026-05-20T16:23:58Z
The solar spectral irradiance (SSI) depicts the spectral distribution of solar energy flux reaching the top of the Earth's atmosphere. Daily SSI measurements constitute a matrix with spectrally (rows) and temporally (columns) resolved solar energy flux measurements. The most recent SSI measurements have been made by NASA's Total and Spectral Solar Irradiance Sensor-1 (TSIS-1) Spectral Irradiance Monitor (SIM) since March 2018. These data exhibit considerable missingness due to both random factors and instrument downtime, a periodic trend related to the Sun's cyclical magnetic activity, and varying degrees of correlation among the spectra, some approaching unity. We propose a low-rank matrix factorization method for SSI reconstruction that incorporates autoregressive temporal regularization, periodic spline detrending, and cross-spectral covariance information. 
The method is implemented as a two-stage procedure designed to address scattered missingness and extended downtime missingness, respectively, and is fitted using efficient alternating optimization algorithms. We further accompany the reconstructed SSI values with a distribution-free interval estimation procedure based on conformal prediction. Through synthetic experiments and real-data analyses, we compare this method with Gaussian process regression, linear time series smoothing, and existing matrix-completion approaches in terms of imputation accuracy, interval coverage, interval length, and computational efficiency. The results show that exploiting the periodic, temporal, and cross-spectral structure of SSI substantially improves reconstruction performance and yields calibrated uncertainty intervals, producing a reconstructed SSI data product suitable for downstream climate science studies.
Published: 2025-08-06T04:20:14Z
Authors: Yuxuan Ke, Xianglei Huang, Odele Coddington, Yang Chen

http://arxiv.org/abs/2605.21316v1
Bitcoin's Power Law: Weak Structure, Strong Forecasts
Updated: 2026-05-20T15:46:46Z
Bitcoin's price has been described as following a power law (PL) in time, $P \sim t^{\beta}$ with $\hat{\beta} \approx 5.7$ over 2010-2026. We test this claim using the Clauset-Shalizi-Newman protocol applied to Bitcoin's tail-relevant distributional series, and develop three principled time-domain adaptations of the protocol. 
We find that (i) the distributional power law is rejected on UTXO balances and daily |returns|, with lognormal preferred decisively; (ii) the fitted time-domain exponent varies by nearly a factor of three across reasonable shifts of the time origin -- it is not specification-robust in the sense required for a shift-invariant structural reading; (iii) standard residual diagnostics and scale-invariance tests proposed in earlier work cannot distinguish a power law from a multi-component sigmoid stack fit to the same data; (iv) Bitcoin price stands apart in a cross-asset comparison spanning Bitcoin on-chain metrics and traditional asset classes: it is the only series in the nine-series in-sample test where no single-component growth curve improves on the power law, and the quarterly $K=3$ wave-stability bootstrap rejects the PL+AR(1) null on Bitcoin at $p = 0.015$ (strict 15% CV threshold) -- a clear cross-asset separation, although not a Bonferroni-robust rejection; and (v) walk-forward Diebold-Mariano evaluation against ten candidates -- including standard time-series baselines (RW with drift, auto-ARIMA, ETS, local-linear-trend) -- shows the in-sample winner (multi-sigmoid) is among the worst long-horizon forecasters, while the simple power law dominates 12-24 month horizons against every standard baseline at $p < 0.05$, precisely because it does not commit to specific wave shapes. 
The fit-prediction tradeoff is the practical counterpart of the descriptive findings.
Published: 2026-05-20T15:46:46Z
Authors: Carlos Baquero, Raquel Menezes

http://arxiv.org/abs/2605.21283v1
A continuous-time Markov chain framework for population size estimation from multi-list data: accounting for absorbing lists and asymmetric interactions
Updated: 2026-05-20T15:15:56Z
We introduce a continuous-time Markov chain framework for estimating population size from multi-list data, which allows directional interactions to be modelled and can accommodate absorbing lists, such as death records, or more general data collection processes. The standard model of the continuous-time Markov chain framework and the log-linear model for multi-list data are equivalent when lists are independent and we show empirically that they give similar results in the presence of dependencies between lists. Through a simulation study, we highlight the need to account for an absorbing list by using the Markov model or the log-linear model with forced absorbing interactions, observing biased estimates of the population size otherwise. We motivate our approach with an epidemiological dataset concerning individuals suffering from a first ever stroke in North-West England, in which one of the lists is a death record. We illustrate a further use of our approach by considering a case of ordered lists on drug use data from the City of London.
Published: 2026-05-20T15:15:56Z
Authors: Ophélie Schaller, Andrew Titman, Rachel McCrea

http://arxiv.org/abs/2511.01705v2
Z-Dip: a standardized measure for data modality assessment
Updated: 2026-05-20T14:31:20Z
Detecting multimodality in empirical distributions is a fundamental problem in statistics and data analysis, with applications ranging from clustering to the study of complex systems. In practice, however, assessing departures from unimodality in a consistent and comparable way remains challenging. 
Widely used methods such as Hartigan and Hartigan's Dip Test illustrate these difficulties, as the interpretation of their statistics depends strongly on sample size, requires calibration to determine significance, and, for large samples, exhibits increasing sensitivity, leading to rejection of unimodality for arbitrarily small deviations from the null. We introduce Z-Dip, a standardized measure of multimodality that addresses these limitations. By treating the Dip statistic as a random variable under the null hypothesis of unimodality and standardizing its observed value, the proposed approach yields scores that are directly comparable across datasets of different sizes. Using simulation-based calibration, we derive a universal decision threshold that closely reproduces classical Dip Test decisions without requiring sample-size-specific adjustments. Extensive validation on simulated data and on more than 88,000 empirical opinion distributions shows near-perfect agreement with the classical Dip Test while providing a more interpretable and comparable measure of modality. Finally, we propose a downsampling-based correction that mitigates residual sensitivity in extremely large samples. Open-source software and reference tables are provided to facilitate practical adoption.
Published: 2025-11-03T16:13:25Z
Authors: Edoardo Di Martino, Matteo Cinelli, Roy Cerqueti

http://arxiv.org/abs/2603.26184v2
Why decision curves go above or below treat-all and treat-none: a PPV- and calibration-based guide for clinical prediction models
Updated: 2026-05-20T12:16:39Z
Net benefit is widely used and reported to evaluate the clinical utility of prediction models, yet its interpretation often remains difficult in practice. In this didactical note, we develop two complementary interpretations that make net benefit easier to understand for clinical audiences. 
We show that comparisons with treat-none and treat-all can be expressed through threshold-specific observed risk in patients above and below the decision threshold, linking decision-curve performance to calibration in clinically relevant subgroups. We also show how net benefit relates to positive predictive value, offering a more intuitive explanation of when acting on model predictions is justified. We derive and illustrate these results and propose positive predictive value curves as a practical complement to decision curves.
Published: 2026-03-27T08:54:51Z
Comment: Comments welcome
Authors: Linard Hoessly

http://arxiv.org/abs/2511.20183v2
Multi-fidelity Gaussian process regression for noisy outputs and non-nested experimental designs: a comparison between the recursive and non-recursive formulations
Updated: 2026-05-20T08:09:33Z
This paper investigates a recursive formulation of auto-regressive multi-fidelity Gaussian process regression in the challenging setting of noisy and non-nested high- and low-fidelity data. We propose a decoupled optimization strategy based on the expectation-maximization algorithm, which exploits the structure of the recursive model. In particular, we derive closed-form update formulas when the scaling factor is modeled as a parametric linear predictor. This approach is compared with the fully coupled likelihood maximization of the classical non-recursive formulation introduced by Kennedy and O'Hagan. A series of benchmark experiments, covering applications of increasing complexity, highlights the performance of both approaches. 
The results demonstrate that the proposed recursive strategy significantly reduces training time, especially when large low-fidelity datasets are available, while maintaining competitive predictive accuracy and uncertainty estimation.
Published: 2025-11-25T11:06:08Z
Authors: Nils Baillie, Baptiste Kerleguer, Cyril Feau, Josselin Garnier

http://arxiv.org/abs/2605.20806v1
Evaluation of the number of clusters in a data set using $p$-values from Multiple Tests of Hypotheses
Updated: 2026-05-20T06:58:04Z
This paper proposes a novel, nonparametric, interpoint distance-based measure to investigate whether any groups exist in a given data set and, if so, how many there are in total. It is a cluster accuracy index useful for data sets of arbitrary dimension, in association with any clustering algorithm that requires the number of groups to be specified a priori. We perform univariate, nonparametric, multiple statistical tests of hypotheses, where as many dependent tests as the sample size are carried out using the interpoint distances. Their $p$-values are combined to reach a decision, which is taken in a step-wise process over the possible numbers of clusters. This reduces unnecessary computation compared with other accuracy measures from the literature. A data study demonstrates the proposed index's efficiency and superiority.
Published: 2026-05-20T06:58:04Z
Journal reference: Communications in Statistics - Theory and Methods (2024), 53, 8878-8889
Authors: Soumita Modak
DOI: 10.1080/03610926.2024.2309967

http://arxiv.org/abs/2605.20154v2
Component over Composite: Mitigating Type I Error Inflation when Imputing "Days Alive and at Home"
Updated: 2026-05-20T06:56:01Z
Background: Days Alive and at Home (DAH) over a pre-defined follow-up period is a novel post-intervention composite outcome that combines data from at least three components: (i) initial length of hospital stay, (ii) length of total readmissions or other post-discharge care and (iii) mortality. 
Missing values bring unique challenges to the analysis of trials with the DAH outcome as the three components may have different rates of missingness caused by distinct missing data mechanisms. Current approaches define DAH as missing if any of the components are missing, and proceed with complete cases or Multiple Imputation (MI) of the composite. Methods: Through a simulation study motivated by the NOTACS trial, we compare several methods of handling missing data, including complete case analysis, MI of the composite, and MI of the components when the primary analysis is a Mann-Whitney-Wilcoxon test. Results: MI on the component level has good properties in terms of type I error control and power. We caution against the use of MI on the composite level with Predictive Mean Matching, which can lead to type I error inflation. Conclusions: Given the complex distributional characteristics of DAH, naive approaches, such as defining missingness on the composite level and directly imputing the composite with Predictive Mean Matching, can lead to type I error inflation. Imputing on the component level is recommended. Suggested future work includes imputation approaches that are compatible with more complex definitions of DAH, as well as recommendations for sensitivity analyses to the Missing at Random assumption.
Published: 2026-05-19T17:43:03Z
Authors: Mia S. Tackney, Sarah Dawson, Letao Yuan, Dominique-Laurent Couturier, Sofia S. Villar

http://arxiv.org/abs/2605.20692v1
Inferring infectiousness: a joint model of the within-host viral kinetics of SARS-CoV-2
Updated: 2026-05-20T04:39:53Z
During an infectious disease outbreak, providing accurate answers to policy questions about transmission requires a detailed model of the natural history of infectiousness. Unfortunately, direct measures of infectiousness are generally unavailable. 
Instead, we often rely on indirect proxies, such as viral load measured by PCR or antigen tests, viral culture to detect replication-competent virus, or symptom onset, each of which reflects different aspects of viral dynamics or host response. However, these proxies vary in terms of the ease of collection, scalability, and their relationship to viral shedding and therefore underlying infectiousness. Here, we use data from five prospective, densely sampled cohorts with longitudinal data on multiple proxies of viral shedding for approximately 2,000 infections to develop a Bayesian joint model for the within-host viral kinetics of SARS-CoV-2 infection. Modeling the joint distribution allows us to infer the trajectory of infectious virus shedding -- the most direct correlate of infectiousness -- for individuals who contribute only PCR data, and to compute derived quantities that are inaccessible from any single proxy alone. These include the population-level probability and expected duration of ongoing infectiousness as a function of time since diagnosis, stratified by variant, vaccination status, and infection history; the residual risk of releasing an individual from isolation; and personalized, real-time estimates of infectiousness that are sequentially updated as new test results become available.
Published: 2026-05-20T04:39:53Z
Authors: Christopher B. Boyer, Stephen M. Kissler, Seran Hakki, Jakob Jonnerby, Ajit Lalvani, Marc Lipsitch

http://arxiv.org/abs/2605.20633v1
Application of Propensity Score Models and Causal Estimators in Observational Studies under Model Misspecification
Updated: 2026-05-20T02:36:46Z
Propensity score (PS) methods are widely used in observational studies to reduce confounding and estimate causal treatment effects. However, the validity of PS-based causal estimators depends heavily on correct model specification, and model misspecification may lead to substantial bias and instability. 
In this study, we systematically evaluate the performance of commonly used causal estimators, including response surface modeling (RSM), inverse probability weighting (IPW), and augmented inverse probability weighting (AIPW), under varying levels of PS and outcome model misspecification. We compare classical logistic regression with several machine learning approaches for PS estimation, including random forests (RF), support vector machines (SVM), and linear discriminant analysis (LDA). Extensive simulation studies were conducted under multiple scenarios defined by combinations of correctly specified and misspecified PS and outcome models, varying sample sizes, and different covariate correlation structures. Estimator performance was assessed using bias, absolute bias, root mean squared error, empirical standard error, and confidence interval width. Results demonstrate that AIPW consistently provides robust and stable estimates across most scenarios due to its doubly robust property, whereas IPW is highly sensitive to PS misspecification and unstable PS estimates produced by flexible machine learning methods. RSM performs well only when the outcome model is correctly specified. Real-world applications using the ACTG175 clinical trial and the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset further illustrate the practical implications of estimator choice and PS modeling strategy. 
Overall, our findings highlight the importance of integrating flexible machine learning approaches within doubly robust frameworks to improve causal effect estimation in observational studies.
Published: 2026-05-20T02:36:46Z
Comment: 24 pages, 4 figures
Authors: Apu Chandra Das, Sakib Salam, Md Robiul Islam Talukder, Ashim Chandra Das, Antar Chandra Das, Rakhi Chowdhury

http://arxiv.org/abs/2604.21212v3
Legal Infrastructure Organizes Eviction: Evidence from Philadelphia
Updated: 2026-05-20T02:28:28Z
We analyze the filing-side legal infrastructure of eviction using 755,004 Philadelphia Municipal Court landlord-tenant records filed between 1969 and 2022, of which 747,125 are residential. Eviction in Philadelphia is organized upstream by a concentrated plaintiff-side bar, durable plaintiff-attorney dependence, repeated use of the same properties, and recurring tenant-name exposure. Between 1983 and 2022, the ten most active plaintiff attorneys handled 82.2% of represented plaintiff-side cases per year on average, compared with 14.8% for the ten most active plaintiffs. Large plaintiffs depend heavily on a single attorney: among plaintiffs filing at least 101 cases, 78.3% of each plaintiff's filings are handled by that plaintiff's most-used attorney, on average. Repetition is likewise central to the docket. Across the residential filing universe, 48.8% of cases occur at addresses with a prior filing in the preceding year, and 23.6% at addresses with six or more prior filings; these repeats are usually filed by the same plaintiff and follow a more default-heavy, less agreement-heavy pathway. We further examine a narrower mechanism: strict switches into specialist plaintiff-side counsel, defined as a plaintiff changing attorney to one in the prior-year top ten. Filing counts rise around the switch with non-flat pre-trends, indicating organizational reconfiguration rather than a clean exogenous shock. 
Within-plaintiff and within-plaintiff-property comparisons yield more stable estimates: judgment by agreement, fee share, waiver language, and corrected lockout-trigger language decline, while deadline language rises. We interpret eviction as a layered upstream process in which concentrated counsel, repeated places, and recurring tenants produce filings before any courtroom bargaining or adjudication occurs.
Published: 2026-04-23T02:06:36Z
Comment: This is a preprint before submission
Authors: Marios Papamichalis, Regina Ruane