Causal Fairness for Survival Analysis

2026-05-12T00:29:37Z

In the data-driven era, large-scale datasets are routinely collected and analyzed using machine learning (ML) and artificial intelligence (AI) to inform decisions in high-stakes domains such as healthcare, employment, and criminal justice, raising concerns about the fairness behavior of these systems. Existing works in fair ML cover tasks such as bias detection, fair prediction, and fair decision-making, but largely focus on static settings. At the same time, fairness in temporal contexts, particularly survival/time-to-event (TTE) analysis, remains relatively underexplored, with current approaches to fair survival analysis adopting statistical fairness definitions, which, even with unlimited data, cannot disentangle the causal mechanisms that generate disparities. To address this gap, we develop a causal framework for fairness in TTE analysis, enabling the decomposition of disparities in survival into contributions from direct, indirect, and spurious pathways. This provides a human-understandable explanation of why disparities arise and how they evolve over time. Our non-parametric approach proceeds in four steps: (1) formalizing the necessary assumptions about censoring and lack of confounding using a graphical model; (2) recovering the conditional survival function given covariates; (3) applying the Causal Reduction Theorem to reframe the problem in a form amenable to causal pathway decomposition; (4) estimating the effects efficiently. Finally, our approach is used to analyze the temporal evolution of racial disparities in outcome after admission to an intensive care unit (ICU).

A Data-Consistent Approach to Ensemble Filtering

2026-05-11T22:09:55Z

Ensemble filtering of chaotic, partially observed systems is often performed with ensembles far smaller than the state dimension, resulting in empirical covariances that are low rank. In such regimes, stochastic observation perturbations can degrade both accuracy and probabilistic calibration. We develop a data-consistent perspective on ensemble filtering and introduce the Quantity-of-Interest Principal Component Analysis Ensemble Data Consistent Filter (QPCA-EnDCF), a deterministic method that replaces perturbed observations with a spectrally regularized update in observation space. The method whitens forecast--observation residuals, computes an empirical eigendecomposition of the residual covariance, and restricts the correction to a rank-$κ$ subspace before mapping the increment back to state space through an empirical gain. We establish a theoretical framework that separates population and finite-ensemble objects and yields a bias--variance decomposition for the analysis mean. The analysis shows that stochastic EnKF variants incur an irreducible $\mathcal{O}(1/N)$ variance contribution from observation perturbations, whereas QPCA-EnDCF replaces this term with projector-estimation variability that is also $\mathcal{O}(1/N)$ but depends on the retained rank and the cutoff gap through eigenspace stability. Numerical experiments on the Lorenz--96 system in strongly undersampled regimes demonstrate that QPCA-EnDCF substantially improves spread--skill behavior, temporal tracking between spread and error, and rank-histogram reliability relative to sequential and four-dimensional stochastic EnKF. Under the baseline configuration, these calibration gains are accompanied by lower RMSE.

Information-Theoretic Grid Topology Reconstruction using Low-Precision Smart Meter Data

2026-05-11T21:55:35Z

Accurate knowledge of power grid topology is a prerequisite for effective state estimation and grid stability. While data-driven methods for topology reconstruction exist, the minimum requirements for measurement quality, specifically regarding quantization, precision, and sampling frequency, remain under-explored. This study investigates the data fidelity required to reconstruct distribution grid topologies using voltage magnitude measurements. Adopting an information-theoretic approach, we utilize the Chow-Liu algorithm to generate maximum spanning trees based on mutual information. Rather than proposing a new reconstruction algorithm, our primary contribution is a comprehensive sensitivity analysis of the measurement data itself. We systematically evaluate the impact of data bit-depth, significant digit truncation, time-window length, and different mutual information estimators on reconstruction accuracy. We validate this approach using IEEE test cases (via MATPOWER) and time-series data from GridLAB-D. Our results demonstrate that grid topology can be successfully recovered even with highly quantized 8-bit data or millivolt-level precision. However, performance degrades significantly when downsampling intervals exceed 20 minutes or when data availability is limited to short durations. These findings establish an optimistic theoretical lower bound, suggesting that costly high-precision instrumentation may not be strictly necessary for structural inference under ideal conditions. This rigorous baseline provides a foundation for future evaluations of noisy real-world smart meter data and hybrid approaches that incorporate existing engineering priors.
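
The pipeline described above (quantize voltage magnitudes, estimate pairwise mutual information, take a maximum spanning tree) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names, the histogram (plug-in) mutual information estimator, the 16-bin choice, and the synthetic four-node chain are all hypothetical choices made for this note.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize a 1-D signal to 2**bits integer levels over its own range."""
    levels = 2 ** bits
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros(x.shape, dtype=int)
    q = np.floor((x - lo) / (hi - lo) * levels).astype(int)
    return np.clip(q, 0, levels - 1)

def mutual_information(a, b, bins=16):
    """Plug-in (histogram) estimate of mutual information, in nats."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def chow_liu_edges(data, bins=16):
    """Maximum spanning tree over pairwise MI via Prim's algorithm.

    data has shape (T, n): T time samples for n nodes.
    Returns a sorted list of undirected edges (i, j) with i < j.
    """
    n = data.shape[1]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_information(data[:, i], data[:, j], bins)
    in_tree, edges = [0], []
    while len(in_tree) < n:
        best = None
        for u in in_tree:
            for v in range(n):
                if v not in in_tree and (best is None or mi[u, v] > best[0]):
                    best = (mi[u, v], u, v)
        edges.append((min(best[1], best[2]), max(best[1], best[2])))
        in_tree.append(best[2])
    return sorted(edges)

# Synthetic stand-in for voltage data: a chain 0-1-2-3 where each node
# equals its upstream neighbour plus independent noise (an assumption).
rng = np.random.default_rng(0)
T = 5000
series = [rng.normal(size=T)]
for _ in range(3):
    series.append(series[-1] + 0.5 * rng.normal(size=T))
voltages = np.column_stack(series)

# Reduce every node's series to 8-bit integers before estimating MI.
quantized = np.column_stack([quantize(voltages[:, k], 8) for k in range(4)])
print(chow_liu_edges(quantized))
```

Lowering `bits`, shortening `T`, or subsampling rows of `voltages` gives a rough feel for the kind of sensitivity analysis the abstract describes, although real feeder data will behave differently from this toy chain.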

Prediction Markets Underperform Simple Baselines For Infectious Disease Forecasting

2026-05-11T20:32:19Z

Prediction markets (e.g., Polymarket, Kalshi) allow participants to bet on future events, producing real-time forecasts based on collective judgment. In domains such as elections and finance, markets have been effective at aggregating information, often rivaling or outperforming expert forecasters or polls. Whether this performance extends to infectious disease dynamics is unclear. Participants are self-selected and typically lack epidemiological expertise. However, markets can respond in real time to emerging news and unstructured signals in ways that standard forecasting pipelines cannot. Also, substantial financial stakes encourage participants to make an effort to be accurate. We evaluate Polymarket forecasts during 2025 and 2026 for two settings: weekly cumulative influenza hospitalizations in the US, which have an established expert-curated forecasting ensemble (CDC FluSight), and monthly measles cases, which do not. Across both settings, prediction markets fail to outperform standard benchmarks. For influenza, markets are competitive with low-performing individual FluSight models but are dominated by the FluSight ensemble: even when we combine market forecasts with the ensemble, the best combination puts zero weight on the markets. For measles, markets are outperformed by simple statistical baselines. We diagnose two sources of market inefficiency: placement of probability mass on impossible outcomes (e.g., decreasing values in cumulative forecasts) and low trading volume. These results suggest that current prediction markets are not reliable forecasters of infectious disease dynamics on their own or useful as complementary features for existing forecasting systems.

Prediction of linear fractional stable motions using codifference, with application to non-Gaussian rough volatility

2026-05-11T14:20:57Z

The linear fractional stable motion (LFSM) extends the fractional Brownian motion (fBm) by considering $α$-stable increments. We propose a method to forecast future increments of the LFSM from past discrete-time observations, using the conditional expectation when $α>1$ or a semimetric projection otherwise. It relies on the codifference, which describes the serial dependence of the process, instead of the covariance. Indeed, covariance is commonly used for predicting an fBm, but it is infinite when $α<2$. Some theoretical properties of the method and of its accuracy are studied, and both a simulation study and an application to real volatility data, with a comparison to the fBm and to the heterogeneous auto-regressive model, confirm the relevance of the approach. The LFSM-based method shows promising performance in forecasting time series of volatilities, properly decomposing, in the fractal dynamic of rough volatilities, the contribution of the kurtosis of the increments and the contribution of their serial dependence. Moreover, the analysis of hit ratios suggests that, besides independence, persistence, and antipersistence, a fourth regime of serial dependence exists for fractional processes, characterized by a selective memory controlled by a few large increments.

Extending Evidence Accumulation Models to Bounded Continuous Self-report Data

2026-05-11T14:16:44Z

Evidence accumulation models (EAMs) provide a powerful framework for inferring latent cognitive processes from choice and reaction time data. While EAMs are traditionally limited to binary choices, recent developments have extended them to rotationally symmetric continuous responses via the circular diffusion model \citep{smith2016diffusion} and the spatially continuous diffusion model \citep{ratcliff2018decision}. Yet, such extensions are limited in scope, as many psychological constructs are measured on bounded non-rotational scales. In this paper, we bridge this gap by presenting and comparing two adaptations designed for bounded continuous data: the Half-Circular Diffusion Model (HCDM) and the Beta Drift Diffusion Model (BDDM). Because both models have intractable likelihoods, we fit them using Amortized Bayesian Inference (ABI) and compare them using Amortized Bayesian Model Comparison (ABMC). We demonstrate the complete workflow on an empirical affect dataset (N = 215), including parameter recovery, simulation-based calibration, posterior predictive checks, and model comparison. Both models accurately capture the joint distribution of responses and reaction times and yield interpretable parameters that can be reliably recovered. The model comparison further reveals a simple diagnostic for choosing between them: the dispersion of the rating distribution, with HCDM preferred for moderate spread and BDDM for highly concentrated or highly dispersed ratings. This work extends the EAM framework to a new application context, bounded continuous self-report data, and offers researchers a user-friendly toolkit for modeling the cognitive dynamics of continuous responses. We release fully documented Python code with both GPU and CPU implementations, along with example datasets.

The Global Carbon Budget as a cointegrated system

2026-05-11T12:21:19Z

The Global Carbon Budget, maintained by the Global Carbon Project, summarizes Earth's global carbon cycle through four annual time series beginning in 1959: atmospheric CO$_2$ concentrations, anthropogenic CO$_2$ emissions, and CO$_2$ uptake by land and by ocean. We analyze these four time series as a multivariate (cointegrated) system. Statistical tests show that the four time series are cointegrated with rank three and identify anthropogenic CO$_2$ emissions as the single stochastic trend driving the nonstationary dynamics of the system. The three cointegrating relations correspond to the physical relations that the sinks are linearly related to atmospheric concentrations and that the change in concentrations equals emissions minus the combined uptake by land and ocean. Furthermore, likelihood ratio tests show that a parametrically restricted error-correction model that embodies these physical relations cannot be rejected given the data. The model can be used for both in-sample and out-of-sample analysis. In an application of the latter, we demonstrate that projections based on this model, using Shared Socioeconomic Pathways scenarios, yield results consistent with established climate science.
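
Written out, the three physical relations named above take roughly the following form. The notation is an assumption of this note rather than the paper's: $C_t$ is the atmospheric concentration, $E_t$ anthropogenic emissions, $S^{L}_t$ and $S^{O}_t$ the land and ocean uptake (all in consistent units), $a$ and $b$ are unknown coefficients, and $u^{L}_t$, $u^{O}_t$ are stationary deviations.

```latex
\begin{align}
  \Delta C_t &= E_t - S^{L}_t - S^{O}_t, \\
  S^{L}_t    &= a_L + b_L\, C_t + u^{L}_t, \\
  S^{O}_t    &= a_O + b_O\, C_t + u^{O}_t.
\end{align}
```

With four nonstationary series and three such stationary relations, one common stochastic trend remains, which matches the abstract's identification of emissions as the single driving trend.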

Generative AI Fuels Solo Entrepreneurship, but Teams Still Lead at the Top

2026-05-11T09:53:44Z

Recent advances in generative artificial intelligence (AI) are reshaping who enters entrepreneurship, but not who reaches the top of the quality distribution. Using data on over 160,000 product launches on Product Hunt, we find that entrepreneurial entry increased sharply following the public release of ChatGPT-3.5, driven disproportionately by solo entrepreneurs. This shift toward solo entry is particularly pronounced in categories that historically favored team-based ventures. However, much of this growth reflects low-commitment, experimental entry and does not translate into greater representation among the highest-quality outcomes. Team-based ventures are increasingly dominant in the top tiers of platform rankings. These findings suggest that generative AI lowers barriers to solo entrepreneurship while reinforcing team-based advantages.

Rethinking Factor Loading Thresholds: A Case for a Strict λ >= .70 Rule

2026-05-11T09:28:16Z

This paper challenges the prevailing practice of accepting standardized factor loadings as low as .50 in confirmatory factor analysis. Drawing on the logic of Average Variance Extracted (AVE) and communality, the author argues for a stricter item-level threshold: only indicators with loadings of λ >= .70 (implying λ² >= .50) should be retained in final measurement models. The rationale is that indicators with λ < .70 contain more error than explained variance, undermining both construct validity and the stability of factor solutions. The paper reviews theoretical foundations, simulation evidence, and implications for structural equation modeling, showing that weak loadings degrade measurement quality, factor score determinacy, and model fit. Adopting a minimum λ >= .70 rule aligns item-level standards with established construct-level criteria and enhances the rigor and interpretability of latent variable models.
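
The arithmetic behind the threshold can be checked directly. The sketch below assumes standardized loadings on a single factor, so that the squared loading is the explained share of an indicator's variance and AVE is the mean squared loading; the function names and the example loadings are hypothetical. Note that .70 squared is .49, so the .70 rule is the conventional rounding of the exact break-even point √.50 ≈ .707.

```python
def explained_and_error_variance(loading):
    """Split a standardized indicator's unit variance into explained and error parts."""
    explained = loading ** 2          # communality of a single-factor indicator
    return explained, 1.0 - explained

def average_variance_extracted(loadings):
    """AVE computed as the mean squared standardized loading."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Compare a lenient loading, the proposed cutoff, and a strong loading.
for lam in (0.50, 0.70, 0.80):
    explained, error = explained_and_error_variance(lam)
    print(f"lambda={lam:.2f}  explained={explained:.2f}  error={error:.2f}")

# Hypothetical four-item scale: AVE falls below .50 once weak items are kept.
example_loadings = [0.50, 0.60, 0.70, 0.80]
print(f"AVE with all items:   {average_variance_extracted(example_loadings):.3f}")
print(f"AVE with lambda>=.70: {average_variance_extracted([0.70, 0.80]):.3f}")
```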

Estimating Consensus Epidemic Trajectories via a Constrained Power Fréchet Mean with Functional Registration

2026-05-11T06:47:48Z

We propose a method for summarizing multiple solutions to SEIR-type compartmental models on a functional space by computing a constrained power Fréchet mean with functional registration to obtain consensus epidemic trajectories with partial mechanistic interpretability. In our method, we regard the pairs of exposed and infectious compartments as objects in a Hilbert space, and the consensus trajectory is defined as the solution to a constrained optimization problem. Differential equation constraints and population constraints are incorporated in the optimization to preserve a partially mechanistic interpretation regarding the infectious compartment. The full dynamics with additional susceptible and removed compartments can then be recovered from the estimated trajectories and parameters. We develop an efficient block-optimization algorithm based on functional data analysis and illustrate the method using simulated and literature-derived epidemiological parameters for COVID-19 in the early phase of the pandemic that began in 2020. The proposed approach provides a generalized trajectory-summarization framework that includes mean- and median-type estimators on a functional space and holds potential for model averaging and ensemble forecasting in infectious disease modeling.

A Statistical Framework for Learning Preferences from the Past

2026-05-11T06:11:09Z

In many real-world settings such as online recommendation or consumer choice modeling, individuals make repeated choices from a fixed set of options. Accurately estimating their underlying preferences is essential for generating personalized future recommendations. Probabilistic models for understanding user choice behavior from past decisions can serve as a valuable addition to existing recommender systems and choice prediction methods. To this end, in this article, we introduce a novel statistical framework for predicting user preferences based on their past choices, under a natural monotonicity assumption: options that were chosen more frequently or more intensely in the past are more likely to be chosen again in the future. Our approach builds on a parametric model proposed by Le Goff and Soulier (2017), originally used to describe how ants in an ant colony select a path among many pre-existing paths. We propose a non-parametric generalization of this model, drawing inspiration from the generalized elephant random walk introduced by Maulik et al. (2024). We develop a method of maximum likelihood estimation of the user preference probabilities under the above-mentioned monotonicity constraint. We also derive theoretical guarantees for our estimator and demonstrate the effectiveness of our method through both simulated experiments and real-world datasets.

Decision Making in Drug Development via Inference on Power

2026-05-10T22:03:04Z

A typical power calculation is performed by replacing unknown population-level quantities in the power function with what is observed in external studies. Many authors and practitioners view this as an assumed value of power and offer the Bayesian quantity known as probability of success, or assurance, as an alternative. The claim is that, by averaging over a prior or posterior distribution, probability of success transcends power by capturing the uncertainty around the unknown true treatment effect and any other population-level parameters. We use p-value functions to frame both the probability of success calculation and the typical power calculation as merely producing two different point estimates of power. We demonstrate that Go/No-Go decisions based on either point estimate of power do not adequately quantify and control the risk involved, and instead we argue for Go/No-Go decisions that utilize inference on power for better risk management and decision making.
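
The contrast between the two point estimates can be made concrete for a one-sided two-sample z-test with known standard deviation. This is a minimal sketch, not the paper's procedure: the function names, the normal prior, and all numeric inputs are assumptions, and the interval at the end simply maps a confidence interval for the effect through the monotone power function, as one simple way to attach uncertainty to power.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_sided_z(delta, sigma, n_per_arm, alpha=0.025):
    """Power of a one-sided two-sample z-test when the true mean difference is delta."""
    se = sigma * (2.0 / n_per_arm) ** 0.5
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(delta / se - z_crit)

def probability_of_success(prior_mean, prior_sd, sigma, n_per_arm,
                           alpha=0.025, grid=2001):
    """Average the power function over a normal prior on delta (trapezoid rule)."""
    prior = NormalDist(prior_mean, prior_sd)
    lo, hi = prior_mean - 6 * prior_sd, prior_mean + 6 * prior_sd
    step = (hi - lo) / (grid - 1)
    total = 0.0
    for k in range(grid):
        d = lo + k * step
        weight = 0.5 if k in (0, grid - 1) else 1.0
        total += weight * prior.pdf(d) * power_one_sided_z(d, sigma, n_per_arm, alpha)
    return total * step

def power_interval(delta_hat, se_delta_hat, sigma, n_per_arm,
                   alpha=0.025, level=0.95):
    """Map a confidence interval for delta through the (increasing) power function."""
    z = STD_NORMAL.inv_cdf(0.5 + level / 2.0)
    lower = power_one_sided_z(delta_hat - z * se_delta_hat, sigma, n_per_arm, alpha)
    upper = power_one_sided_z(delta_hat + z * se_delta_hat, sigma, n_per_arm, alpha)
    return lower, upper

# Hypothetical external-study estimate: effect 0.4 with standard error 0.15.
plug_in = power_one_sided_z(0.4, sigma=1.0, n_per_arm=100)
assurance = probability_of_success(0.4, 0.15, sigma=1.0, n_per_arm=100)
interval = power_interval(0.4, 0.15, sigma=1.0, n_per_arm=100)
print(f"plug-in power:          {plug_in:.3f}")
print(f"probability of success: {assurance:.3f}")
print(f"95% interval for power: ({interval[0]:.3f}, {interval[1]:.3f})")
```

Both single numbers sit inside a wide interval for power, which is the sense in which either one alone may understate the risk of a Go decision.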

Detecting and Correcting Sample-by-Sample Scale Distortion in RNA Sequencing Data

2026-05-10T21:00:07Z

RNA sequencing (RNA-seq) is the conventional genome-scale approach used to capture the expression levels of all detectable genes in a biological sample. It is now regularly used for population-based studies designed to identify genetic determinants of various diseases. Naturally, the accuracy of these tests should be verified and improved if possible. In this study, we aimed to detect and correct for expression-level-dependent errors that vary from sample to sample and are not corrected by conventional normalization techniques. We examined several RNA-seq datasets from the Cancer Genome Atlas (TCGA), Stand Up 2 Cancer (SU2C), and GTEx databases with various types of preprocessing. By applying local averaging, we found sample-by-sample, expression-level-dependent biases in all datasets studied. Using simulations, we show that these biases corrupt gene-gene correlation estimates and $t$-tests between subpopulations. To mitigate these biases, we introduce two different nonlinear transforms, based on statistical considerations, that correct them. We demonstrate that these transforms effectively remove the observed per-sample biases, reduce sample-to-sample variance, and improve the characteristics of gene-gene correlation distributions. Using a novel simulation methodology that creates controlled differences between subpopulations, we show that these transforms reduce variability and increase the sensitivity of two-population tests. The improvements in sensitivity and specificity were of the order of 3-5\% in most instances after the data were corrected for bias. Altogether, these results improve our capacity to understand gene-gene relationships, and may lead to novel ways to utilize the information derived from clinical tests.

Sequential Randomization Tests Using e-values: Applications for trial monitoring

2026-05-10T20:57:17Z

Sequential monitoring of randomized trials traditionally relies on parametric assumptions or asymptotic approximations. We discuss a family of nonparametric sequential tests, collectively called e-RT, for binary, event-only, and continuous endpoints. All active variants derive validity from the randomization mechanism. Using a betting framework, each test constructs a test martingale by sequentially wagering on randomized assignments or observed event labels before using the current label in the wealth update. Under the null hypothesis of no treatment effect, the expected wealth cannot grow, guaranteeing anytime-valid Type I error control regardless of the stopping rule. The default e-RT posture is effect-size agnostic: monitoring can begin without specifying a hypothesized treatment effect. Alternatively, fixed design-calibrated wagers, including growth-rate-optimal (GROW) wagers, may be used as optional efficiency tools when a clinically meaningful design alternative is credible. We present simulation studies demonstrating calibration and power, and discuss the principled asymmetry in betting strategies across outcome types. These methods provide a conservative, assumption-light complement to model-based sequential analyses.
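
The betting construction can be illustrated for an event-only endpoint under 1:1 randomization. This is a hedged sketch of the general idea, not the e-RT procedure itself: the function name, the plug-in wagering rule, the `max_bet` cap, and the simulated trial are assumptions of this note. The wager for each event is fixed from past event labels only, so under the null (arm is a fair coin flip independent of the outcome) each wealth multiplier has expectation one, and rejecting when wealth reaches 1/alpha controls the Type I error at any stopping time by Ville's inequality.

```python
import random

def sequential_randomization_test(assignments, outcomes, alpha=0.05, max_bet=0.5):
    """Betting-style sequential test for 1:1 randomized trials (illustrative only).

    assignments: iterable of 0/1 arm labels (1 = treatment).
    outcomes:    iterable of 0/1 event indicators, in enrolment order.
    Returns (final_wealth, rejected, participants_used).
    """
    wealth = 1.0
    threshold = 1.0 / alpha
    events_treat = events_ctrl = 0
    used = 0
    for arm, event in zip(assignments, outcomes):
        used += 1
        if event == 1:
            n_events = events_treat + events_ctrl
            # Wager chosen BEFORE looking at this participant's arm:
            # it depends only on the arm labels of earlier events.
            frac = (events_treat - events_ctrl) / n_events if n_events else 0.0
            bet = max(-max_bet, min(max_bet, frac))
            # Win if the arm matches the sign of the bet, lose otherwise.
            wealth *= 1.0 + bet * (1 if arm == 1 else -1)
            if arm == 1:
                events_treat += 1
            else:
                events_ctrl += 1
        if wealth >= threshold:
            return wealth, True, used
    return wealth, False, used

# Simulated trial (hypothetical rates): events are rarer under treatment.
random.seed(0)
arms = [random.randint(0, 1) for _ in range(400)]
events = [int(random.random() < (0.15 if a == 1 else 0.40)) for a in arms]
final_wealth, rejected, n_used = sequential_randomization_test(arms, events)
print(f"wealth={final_wealth:.2f} rejected={rejected} after {n_used} participants")
```

Capping the wager below one keeps wealth strictly positive, so a single lost bet can never end the monitoring; a GROW-style wager would replace the plug-in `frac` with a value calibrated to a design alternative.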

Mixture of Finite Mixtures Model for Basket Trial

2026-05-10T18:11:40Z

With the recent paradigm shift in oncology drug development from cytotoxic drugs to a new generation of targeted therapies and immuno-oncology therapies, patients with various cancer (sub)types may be eligible to participate in a basket trial if they share the same molecular target. Bayesian hierarchical models (BHMs) are widely used in basket trial data analysis, where they adaptively borrow information among different cohorts (subtypes) rather than fully pooling the data or performing a stratified analysis of each cohort. Those approaches, however, risk over-shrinkage of the estimates when the exchangeability assumption is violated. We propose a two-step procedure to strike a balance between pooled and stratified analysis. In the first step, we treat the task as a clustering problem, grouping cohorts into clusters that share a similar treatment effect. In the second step, we use the shrinkage estimator from the BHM to estimate treatment effects for cohorts within each cluster under the exchangeability assumption. For the clustering step, we adapt the mixture of finite mixtures (MFM) approach to obtain a consistent estimate of the number of clusters. We investigate the performance of our proposed method in simulation studies and apply it to the analysis of the Vemurafenib basket trial data.