A longitudinal Bayesian framework for estimating causal dose-response relationships

2026-06-01T08:23:36Z

Existing causal methods for time-varying exposure and time-varying confounding focus on estimating the average causal effect of a time-varying binary treatment on an end-of-study outcome, offering limited tools for characterizing marginal causal dose-response relationships under continuous exposures. We propose a scalable, nonparametric Bayesian framework for estimating marginal longitudinal causal dose-response functions with repeated outcome measurements. Our approach targets the average potential outcome at any fixed dose level and accommodates time-varying confounding through the generalized propensity score. The proposed approach embeds a Dirichlet process specification within a generalized estimating equations structure, capturing temporal correlation while making minimal assumptions about the functional form of the continuous exposure. We apply the proposed methods to monthly metro ridership and COVID-19 case data from major international cities, identifying causal relationships and the dose-response patterns between higher ridership and increased case counts.

Mapping the Storm: Geospatial Impacts of Severe Weather on LEO Network Performance

2026-06-01T05:41:55Z

LEO satellite constellations, led by deployments such as Starlink, are playing an increasingly pivotal role in enabling global broadband connectivity. However, the reliability and performance of these space-based networks are highly sensitive to environmental dynamics, particularly localized weather phenomena that exhibit strong spatio-temporal variability. In this study, we present a continental-scale geospatial analysis of weather-induced performance degradation in the Starlink LEO network, with a focus on the contiguous United States. Leveraging a unique dataset comprising more than 870,000 terminal hours of minute-level telemetry from 1,292 Starlink terminals, we integrate high-resolution localized weather observations to quantify the impact of various meteorological conditions. We evaluate key performance indicators (KPIs), including ping latency, ping drop rate, and signal quality, using spatial join techniques and time-aligned correlation with classified weather events. Our analysis reveals that severe weather events, such as thunderstorms with heavy rain or snow, have a pronounced effect on network performance. In particular, more than 55% of affected terminals experienced substantial degradation. Temporal continuity analysis at the minute level shows that such degradation can lead to sustained impairments or full service outages lasting from several minutes to multiple hours. This work contributes the first large-scale empirical study linking LEO satellite Internet performance with fine-grained weather data in both space and time. Our findings offer actionable insights for geospatial predictive modeling, weather-aware network provisioning, and resilient satellite communication system design. We also propose a framework for incorporating weather-inferred performance variability into future geospatial planning and service-level forecasting tools for LEO-based Internet systems.

Geometry-preserving and interpretable dimension reduction for compositional data

2026-06-01T01:59:02Z

High-dimensional compositional data pose unique statistical challenges due to the simplex constraint and excess zeros. While dimension reduction is indispensable for analyzing such data, conventional approaches often rely on log-ratio transformations that compromise interpretability and distort the data through ad hoc zero replacements. To address these issues, we introduce a geometry-preserving framework for dimension reduction of compositional data, mapping high-dimensional compositions directly to a lower-dimensional simplex. This framework is interpretable as a softened amalgamation of compositions and enables dual visualization -- showing both projected data and how variables contribute to reduced components -- for at-a-glance interpretation. Within this geometry, we define a new sufficient dimension reduction (SDR) approach for compositional predictors, whose identifiable object, termed the central compositional subspace, differs from the classical central subspace in Euclidean SDR. For estimation, we propose a kernel-based method that yields sparse solutions and comes with an intrinsic predictive model for direct downstream analyses. We prove consistency through a new subspace-comparison argument that allows the estimated and target subspaces to have different dimensions. Applications to real microbiome datasets demonstrate that our approach provides a powerful graphical exploration tool for uncovering meaningful biological patterns in high-dimensional compositional data.

Multiview Graph Fusion with Covariates

2026-06-01T01:25:56Z

Joint modeling of multiview graphs with a common set of nodes between views and auxiliary predictors is an essential, yet less explored, area in statistical methodology. Traditional approaches often treat graphs in different views as independent or fail to adequately incorporate predictors, potentially missing complex dependencies within and across graph views and leading to reduced inferential accuracy. Motivated by such methodological shortcomings, we introduce an integrative Bayesian approach for joint learning of a multiview graph with vector-valued predictors. Our modeling framework assumes a common set of nodes for each graph view while allowing for diverse interconnections or edge weights between nodes across graph views, accommodating both binary and continuous valued edge weights. By adopting a hierarchical Bayesian modeling approach, our framework seamlessly integrates information from diverse graphs through carefully designed prior distributions on model parameters. This approach enables the estimation of crucial model parameters defining the relationship between these graph views and predictors, as well as offers predictive inference of the graph views. Crucially, the approach provides uncertainty quantification in all such inferences. Theoretical analysis establishes that the posterior predictive density for our model asymptotically converges to the true data-generating density, under mild assumptions on the true data-generating density and the growth of the number of graph nodes relative to the sample size. Simulation studies validate the inferential advantages of our approach over predictor-dependent tensor learning and independent learning of different graph views with predictors. We further illustrate model utility by analyzing functional connectivity graphs in neuroscience under cognitive control tasks, relating task-related brain connectivity with phenotypic measures.

The Information Content of Quasar Variability Light Curves: How Well Can we Infer Stochastic Model Parameters?

2026-05-31T23:28:30Z

Quasar variability, driven by multi-scale physical processing within a relativistic accretion disk, is commonly modelled with stochastic time series models. The simplest of these is the Damped Random Walk (DRW), also known as the Ornstein-Uhlenbeck (OU) process. Here, we demonstrate that, when fitting such a model to quasar light curve data, the mean of the light curve, $μ$, should not be fixed (which is the typical approach), as this leads to overconfident inferences about the variability timescale $τ$, with substantially underestimated uncertainties. However, the short term volatility parameter $η$ is typically very well constrained from short light curves. Through simulations, we compute information theoretic quantities such as the conditional entropy and the mutual information, confirming that light curves provide much more information about $η$ than about $τ$. As a result, we recommend that future quasar variability studies focus on $η$ rather than $τ$. To demonstrate this approach, we fit a hierarchical Bayesian regression model for $η$ as a function of bolometric luminosity and rest wavelength to a dataset of 570 light curves measured over decades. We perform the fit using a likelihood function that uses the light curves directly, rather than using intermediate $η$ values from individual light curve fits. We find that volatility decreases as a function of both bolometric luminosity and rest wavelength. The volatility also decreases more steeply with redshift than time dilation alone would suggest, pointing to an increase in intrinsic volatility as quasars evolve over cosmic time.
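
As an illustration of the model class discussed above, the following is a minimal sketch of simulating a DRW/OU light curve on irregularly spaced epochs, using the exact discretization of the OU process. The function name `simulate_drw` and the parametrization by a diffusion coefficient `sigma` are assumptions made here for illustration. The abstract does not state how $η$ is defined, so `sigma` should not be read as the authors' $η$.

```python
import math
import random


def simulate_drw(times, mu, tau, sigma, seed=0):
    """Simulate an OU / damped-random-walk path at the given (sorted) times.

    Assumed parametrization (not taken from the abstract):
        dX = -(X - mu) / tau * dt + sigma * dW
    so the stationary variance is sigma**2 * tau / 2.
    """
    rng = random.Random(seed)
    stationary_sd = sigma * math.sqrt(tau / 2.0)
    # Draw the first point from the stationary distribution.
    path = [rng.gauss(mu, stationary_sd)]
    for t_prev, t_next in zip(times, times[1:]):
        # Exact transition: the deviation from mu decays by exp(-dt / tau).
        decay = math.exp(-(t_next - t_prev) / tau)
        step_sd = stationary_sd * math.sqrt(1.0 - decay * decay)
        path.append(mu + (path[-1] - mu) * decay + rng.gauss(0.0, step_sd))
    return path


# Irregular sampling, as is typical for ground-based light curves.
epochs = [0.0, 1.0, 2.5, 7.0, 7.5, 30.0]
light_curve = simulate_drw(epochs, mu=19.0, tau=200.0, sigma=0.02, seed=1)
```

Under this assumed parametrization, for sampling intervals much shorter than `tau` the variance of successive differences is approximately `sigma**2 * dt`, independent of `tau`. This gives some intuition for why short light curves can constrain a short-term volatility better than the timescale.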

Model complexity in econometrics - a combinatorial analysis

2026-05-31T23:14:18Z

Regression models and Vector Autoregressive Models (VARs) play crucial roles in econometrics by allowing the analysis of multiple variables simultaneously. Despite their utility, these models face challenges like underfitting and overfitting, especially when determining the optimal model specification, which can lead to significant computational costs. To address these challenges, econometricians often rely on widely adopted model selection criteria such as the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC). These criteria help balance model complexity and goodness of fit, aiding in the selection of the most suitable model specification for the given data. Nonetheless, there is a notable gap in existing research concerning the correct specification of these models, particularly in determining the optimal number of states a system can assume. Addressing this gap, we introduce a combinatorial framework designed to calculate the potential number of states in such econometric models. Our approach involves delineating four distinct stages in model development, each offering a range of specifications. This method enables a comprehensive combinatorial calculation of all possible states. The aim of this paper is to highlight this overlooked aspect of model specification and to spark a constructive dialogue within the empirical research community. By doing so, we hope to inspire further research that enhances the precision and applicability of econometric models. A theoretical complexity criterion is necessary to elucidate fundamental limitations and propose new objectives to pursue.
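
For reference, the two selection criteria named above have standard closed forms: AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the sample size, and L the maximized likelihood. The sketch below only evaluates these formulas; the function names and the example numbers are illustrative and are not taken from the paper.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


# Illustrative values only: a model with 3 parameters fitted to 100 observations.
example_aic = aic(log_likelihood=-100.0, n_params=3)
example_bic = bic(log_likelihood=-100.0, n_params=3, n_obs=100)
```

Because ln n exceeds 2 once n is at least 8, BIC penalizes each additional parameter more heavily than AIC for all but very small samples.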

Logistic regression is not enough: The need for Bayesian nonparametric modelling for causal inference using observational data, exemplified by the 'gateway' effect

2026-05-31T22:41:50Z

Introduction: Logistic regression (LR)-type model limitations for causal inference are explained theoretically and empirically through the lens of the purported gateway effect from e-cigarette use to smoking. Previous studies have reported that baseline e-cigarette use quadruples odds of follow-up smoking (binarized) in LR-type models of adolescent longitudinal cohorts (LCs), such that increased e-cigarette use would counteract smoking declines. However, US population-level trends show accelerated smoking declines to record-lows when e-cigarette use increased, presenting an apparent paradox. Methods: Population Assessment of Tobacco and Health (USA) Youth Waves 3 to 4 were analyzed with Bayesian Additive Regression Trees (BART) to model baseline e-cigarette use (treatment) and change in number of days smoking from baseline to follow-up (numerical response) among never- and ever-smoking respondents (group effects), adjusting for confounding risk factors (socio-demographic, intra-individual, behavioural, peer influence, and family background). Unlike LR-type models, BART provides nonlinear, nonparametric modelling with counterfactuals and provides causal effect estimates with principled uncertainty estimation. Results: The average effect of e-cigarette use on smoking was both clinically and statistically significant among ever-smoking adolescents (-2 days smoking [diversionary effect; opposite to gateway]) and was not clinically significant among never-smoking adolescents (<1-day absolute change in days smoking [null effect]). Conclusions: When LC data are analyzed with causal inference techniques, the gateway effect disappears, consistent with population-level trends. This likely explains why gateway effects predicted in previous LR-type studies have not materialized in a population-level reversal/unexpected slowing of the US adolescent smoking decline, resolving the paradox.

Quantifying Evidential Rigor in Meta-Analytic Corpora: A Simulation-Characterized, Bias-Robust Bayesian Workflow with a Nutrition Case Study

2026-05-31T19:56:12Z

Conventional meta-analysis summarizes evidence through pooled estimates, intervals, and p-values, but these outputs do not directly measure evidence for an effect, evidence for no effect, or the degree to which conclusions depend on publication selection or small-study effects. We introduce a corpus-scale Bayesian evidential-audit workflow for meta-analytic corpora. The workflow reconstructs or accepts study-level effects and standard errors, harmonizes directions, fits a matched Bayesian random-effects baseline and a bias-aware model-averaged ensemble, and reports paired estimates with component and joint model-family evidence. The central estimand is rigor: a joint Bayes-factor summary combining resolved effect/no-effect evidence with absence of an explicit bias component in the fitted ensemble. Rigor is not a positive-finding score; no-effect evidence can score highly, whereas inconclusive or bias-dependent evidence scores poorly. We characterize the workflow using an ADEMP-framed simulation/resampling design with known-cell synthetic simulation, empirical registry resampling, and empirical fitted-profile-weighted synthetic sampling. A nutrition intervention corpus provides the worked case study, where bias-aware fitting often attenuates conventional estimates and many nominally meaningful effects lose clean evidential support. A public companion repository provides empirical inputs, generated artifacts, simulation source/design files, and documentation for reproducing and adapting the audit.

Domain-Shift-Aware Conformal Prediction for Large Language Models

2026-05-31T19:40:48Z

Large language models have achieved impressive performance across diverse tasks. However, their tendency to produce overconfident and factually incorrect outputs, known as hallucinations, poses risks in real-world applications. Conformal prediction provides finite-sample, distribution-free coverage guarantees, but standard conformal prediction breaks down under domain shift, often leading to under-coverage and unreliable prediction sets. We propose a new framework called Domain-Shift-Aware Conformal Prediction (DS-CP). Our framework adapts conformal prediction to large language models under domain shift, by systematically reweighting calibration samples based on their proximity to the test prompt, thereby preserving validity while enhancing adaptivity. Our theoretical analysis and experiments on the MMLU benchmark demonstrate that the proposed method delivers more reliable coverage than standard conformal prediction, especially under substantial distribution shifts, while maintaining efficiency. This provides a practical step toward trustworthy uncertainty quantification for large language models in real-world deployment.
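
The reweighting idea described above can be illustrated with a generic weighted split-conformal threshold. This is a minimal sketch, not the DS-CP procedure itself. The abstract does not say how proximity to the test prompt is turned into weights, so the weights are taken as given inputs, and all function names here are hypothetical.

```python
import math


def weighted_conformal_threshold(scores, weights, test_weight, alpha):
    """Weighted (1 - alpha) quantile of calibration nonconformity scores.

    Each calibration score carries a non-negative weight (assumed here to
    reflect proximity to the test prompt). The test point's own weight is
    placed at +infinity, as in standard weighted conformal prediction.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    # Not enough calibration mass: the prediction set must contain everything.
    return math.inf


def prediction_set(candidate_scores, threshold):
    """Keep every candidate answer whose nonconformity score is within the threshold."""
    return [label for label, score in candidate_scores.items() if score <= threshold]


# With equal weights this reduces to ordinary split conformal prediction.
uniform = weighted_conformal_threshold([0.4, 0.1, 0.3, 0.2], [1, 1, 1, 1], 1, alpha=0.5)
# Up-weighting the calibration sample closest to the test prompt shifts the threshold.
shifted = weighted_conformal_threshold([0.1, 0.2, 0.3, 0.4], [4, 1, 1, 1], 1, alpha=0.5)
answers = prediction_set({"A": 0.05, "B": 0.3, "C": 0.9}, uniform)
```

Returning infinity when the calibration mass is insufficient keeps the set conservative, at the cost of an uninformative prediction set.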

Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages

2026-05-31T14:23:21Z

Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning framework for NAFLD risk prediction coupling gradient-boosted decision trees with conformal prediction to yield calibrated, distribution-free coverage guarantees on individual risk estimates. It integrates a mutual-information-based stability selection procedure to identify a compact, clinically interpretable feature subset via bootstrap resampling, constructing prediction sets whose marginal coverage provably exceeds a user-specified confidence level. We evaluated Method on a multicenter cohort from Guangzhou, China (primary n=2,187; external validation n=412) using 78 candidate features across demographics, metabolic biomarkers, and lifestyle factors. Method achieves an AUROC of 0.912 internally and 0.891 externally, outperforming deep neural networks, TabNet, support vector machines, and logistic regression. Conformal prediction sets achieve 91.3% empirical coverage at the 90% nominal level. A three-tier risk stratification derived from these scores separates the population into distinct groups, with the high-risk subgroup showing a 12-month progression rate 4.7 times that of the low-risk tier. The selected features -- notably waist circumference, ALT, GGT, triglycerides, fasting glucose, and BMI -- align with established metabolic risk factors, providing biological plausibility.

Analysis of Ethnic Disparities in Autism Spectrum Disorder among Toddlers

2026-05-31T13:10:41Z

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by challenges in communication and behavior. This study examines the relationship between ethnicity and ASD traits, along with behavioural scores, sex, and neonatal jaundice, across three ethnic groups: White Europeans, Asians, and Middle Eastern individuals. We perform a logistic regression and show that ethnicity has a significant effect on the incidence of ASD. White Europeans are at 81% increased risk of ASD and Middle Easterners are at 79% reduced risk of ASD compared to Asians. We also confirm earlier studies which show that neonatal jaundice is a significant predictor of ASD, while male children are at much higher risk of ASD compared to female children. These results suggest the need for diagnostic frameworks and interventions that account for ethnic differences in the presentation and assessment of ASD traits.

Markovianity-Based Conditioning Depth Diagnostics for Hidden Confounding in Observational Datasets

2026-05-31T13:03:58Z

Reliable causal discovery in time series depends on whether the conditioning set adequately represents the system state. If relevant history or unobserved processes are omitted, residual dependence can appear as direct causal links. We study this failure mode in prominent constraint-based causal discovery methods through a simple premise: how much does the inferred graph change as conditioning depth increases? When the observed process is described approximately by a finite-order Markovian representation, inferred graphs should stabilize once sufficient past observations are included. Under hidden confounding and other hidden-memory mechanisms, where the observed state is incomplete, the inferred graph should instead remain sensitive to depth. We formalise this behavior with graph instability statistics computed over the conditioning-depth grid. The empirical study covers synthetic systems with known ground truth and calcium imaging recordings with unknown causal structure. In simulations, both Markovian and non-Markovian systems broadly upheld our premise. With known ground truth, we evaluate recovery using confusion-matrix metrics; in real data without ground truth, we use descriptive graph instability summaries. Across synthetic Markovian and hidden-memory systems, c-GC variants give the clearest separation, while PCMCI variants show weaker but compatible trends. In real data, inferred connectivity drops sharply with conditioning depth and then levels off. This method, however, does not recover latent graphs, nor does it clearly separate latent confounding from lag-order misspecification, non-stationarity, or measurement error. Its contribution is more modest and practical: an explicit model-checking tool for deciding when causal claims are stable and when they should be treated cautiously.
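
One simple way to picture a graph instability statistic over a conditioning-depth grid is the fraction of possible edges that change between successive depths. The sketch below is an assumption made for illustration: the abstract does not define the statistics used, and the function names and toy graphs are hypothetical.

```python
def edge_change_fraction(edges_a, edges_b, n_possible_edges):
    """Fraction of possible directed edges present in exactly one of two graphs."""
    return len(set(edges_a) ^ set(edges_b)) / n_possible_edges


def instability_profile(graphs_by_depth, n_possible_edges):
    """Edge-change fraction between each pair of successive conditioning depths."""
    depths = sorted(graphs_by_depth)
    return {
        (shallow, deep): edge_change_fraction(
            graphs_by_depth[shallow], graphs_by_depth[deep], n_possible_edges
        )
        for shallow, deep in zip(depths, depths[1:])
    }


# Toy example: three nodes, so 3 * 2 = 6 possible directed edges (no self-loops).
toy_graphs = {
    1: {("x", "y"), ("y", "z"), ("x", "z")},  # extra x -> z link at shallow depth
    2: {("x", "y"), ("y", "z")},
    3: {("x", "y"), ("y", "z")},
}
profile = instability_profile(toy_graphs, n_possible_edges=6)
```

In this toy case the profile drops to zero after depth 2, which is the stabilizing pattern expected for a finite-order Markovian system; a profile that stays positive across the grid would be the warning sign.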

How to Correctly Report LLM-as-a-Judge Evaluations

2026-05-31T12:00:00Z

Large language models (LLMs) are widely used as scalable evaluators of model responses in lieu of human annotators. However, imperfect sensitivity and specificity of the LLM judges induce bias in naive evaluation scores. We propose a simple plug-in framework that corrects this bias and enables statistically principled uncertainty quantification. Our framework constructs confidence intervals that account for uncertainty from both the test dataset and a human-labeled calibration dataset. Additionally, it uses an adaptive strategy to allocate calibration samples for tighter intervals. Importantly, we characterize parameter regimes defined by the true evaluation score and the LLM judge's sensitivity and specificity in which our LLM-based evaluation yields more reliable estimates than human-only evaluation. Moreover, we show that our framework remains unbiased under distribution shift between the test and calibration datasets, in contrast to existing approaches.
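
The bias described above can be made concrete with the classical misclassification correction (often called the Rogan-Gladen estimator). Whether the paper's plug-in estimator has exactly this form is an assumption here, and the function name is hypothetical. If the true score is θ, a judge with sensitivity q1 and specificity q0 reports an expected score of θ·q1 + (1 - θ)·(1 - q0), which can be inverted for θ.

```python
def corrected_score(judge_score, sensitivity, specificity):
    """Invert E[judge_score] = theta * sensitivity + (1 - theta) * (1 - specificity).

    judge_score: fraction of responses the LLM judge marked as correct.
    sensitivity: P(judge says correct | truly correct).
    specificity: P(judge says incorrect | truly incorrect).
    """
    denominator = sensitivity + specificity - 1.0
    if denominator <= 0.0:
        raise ValueError("judge must beat chance: sensitivity + specificity > 1")
    theta = (judge_score + specificity - 1.0) / denominator
    # Sampling noise can push the estimate outside [0, 1]; clip it back.
    return min(1.0, max(0.0, theta))


# A true score of 0.60 with sensitivity 0.9 and specificity 0.8 yields a naive
# judge score of 0.6 * 0.9 + 0.4 * 0.2 = 0.62; the correction recovers 0.60.
recovered = corrected_score(0.62, sensitivity=0.9, specificity=0.8)
```

In practice the sensitivity and specificity are themselves estimated from the human-labeled calibration set, which is why the abstract stresses intervals that account for both sources of uncertainty.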

Incorporating estimands into meta-analyses of clinical trials

2026-05-31T11:47:23Z

The estimand framework is increasingly established to pose research questions in confirmatory clinical trials. In evidence synthesis, the uptake of estimands has been modest, and the PICO (Population, Intervention, Comparator, Outcome) framework is more often applied. While PICOs and estimands have overlapping elements, the estimand framework explicitly considers different strategies for intercurrent events. We propose a pragmatic framework for the use of estimands in meta-analyses of clinical trials, highlighting the value of estimands to systematically identify and mitigate key sources of quantitative heterogeneity, and to enhance the applicability or external validity of pooled estimates. Focus is placed on the role of strategies for intercurrent events, within the specific context of meta-analyses for health technology assessment. We apply the estimand framework to a network meta-analysis of clinical trials, comparing the efficacy of semaglutide versus dulaglutide in type 2 diabetes. We explore the impact of a treatment policy strategy for treatment discontinuation or initiation of rescue medication versus a hypothetical strategy for the corresponding intercurrent events. The specification of different target estimands at the meta-analytical level allows us to be explicit about the source of heterogeneity, the intercurrent event strategy, driving any potential differences in results. We advocate for the integration of estimands into the planning of meta-analyses, while acknowledging that potential challenges exist in the absence of subject-level data. Estimands can complement PICOs to strengthen communication between stakeholders about what evidence syntheses seek to demonstrate, and to ensure that the generated evidence is maximally relevant to healthcare decision-makers.

HR-VILAGE-3K3M: A Human Respiratory Viral Immunization Longitudinal Gene Expression Dataset for Systems Immunity

2026-05-31T06:33:45Z

Respiratory viral infections pose a global health burden, yet the cellular immune mechanisms underlying protection and pathology remain unclear. Natural infection cohorts often lack pre-exposure baselines and time-controlled sampling, whereas inoculation and vaccination trials generate well-structured longitudinal transcriptomic data. However, these datasets are scattered across repositories and processed inconsistently, hindering integrative and AI-driven analyses. To address these challenges, we developed the Human Respiratory Viral Immunization LongitudinAl Gene Expression (HR-VILAGE-3K3M) repository: an AI-ready resource integrating bulk and single-cell transcriptomic profiles from 3,178 subjects across 66 studies. The dataset spans vaccination, inoculation, and mixed exposures, with samples from blood and nasal swabs collected from public repositories including GEO, ImmPort, and ArrayExpress. We curated and harmonized subject-level metadata, standardized outcome measures, and applied unified preprocessing with rigorous quality control. We further provide benchmark analyses illustrating its utility. This resource supports discovery of biomarkers, immune mechanisms, and methodological development. As one of the largest longitudinal transcriptomic resources for human respiratory viral immunization, HR-VILAGE-3K3M enables reproducible and scalable analyses to accelerate vaccine and antiviral research.