http://arxiv.org/abs/2606.11144v1 OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib 2026-06-09T17:33:24Z

Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computational models on the corresponding longitudinal patient trajectories. We introduce OncoTraj, a public benchmark of 813 EGFR-mutant NSCLC patients receiving first-line osimertinib, harmonized from three real-world clinical-genomic sources: MSK-CHORD (672 patients), AACR Project GENIE BPC NSCLC (34 patients), and the FLAURA molecular-resistance supplement (107 patients). OncoTraj defines three locked tasks: (A) binary classification of progression by a fixed 12-month landmark, (B) regression of time-to-first-progression in days, and (C) six-class classification of the dominant resistance mechanism. We release the harmonized dataset, patient-level train/validation/test splits with an audited no-leakage guarantee, an open-source evaluation harness, and six reference baselines spanning a majority-class predictor, logistic regression, random forest, XGBoost, an LSTM, and a multi-task transformer. With v1's single-timepoint snapshot features, no task clears chance on clean within-source evaluation: the uniformity of this ceiling across every model class localizes the limit to the input modality (single-snapshot tissue NGS rather than serial ctDNA), not the algorithm. The benchmark does recover a reproducible literature-consistent association: TP53 co-mutation raises the 12-month progression rate from 29% to 59% cohort-wide. OncoTraj establishes a reproducible, leakage-audited baseline and converts the modality limit into concrete design requirements for a serial-ctDNA-enriched v2.
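
The patient-level splitting and leakage audit described above can be illustrated with a minimal sketch. It is not the OncoTraj harness: the function names, the hash-based assignment, the `salt` value, and the 70/15/15 proportions are all assumptions made for illustration. The idea shown is only that every record of a patient must land in the same split.

```python
import hashlib

def assign_split(patient_id, val_frac=0.15, test_frac=0.15, salt="demo"):
    """Deterministically map a patient ID to a split (fractions are hypothetical)."""
    digest = hashlib.sha256(f"{salt}:{patient_id}".encode()).hexdigest()
    u = int(digest[:8], 16) / 0x100000000  # pseudo-uniform value in [0, 1)
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "validation"
    return "train"

def audit_no_leakage(records):
    """Return True if no patient ID appears in more than one split.

    `records` is an iterable of (patient_id, split) pairs, one per row.
    """
    seen = {}
    for patient_id, split in records:
        if seen.setdefault(patient_id, split) != split:
            return False
    return True

# Hypothetical patient IDs, three rows (e.g. three samples) per patient.
records = [(pid, assign_split(pid)) for pid in ["P001", "P002", "P003"] for _ in range(3)]
clean = audit_no_leakage(records)
leaky = audit_no_leakage([("P001", "train"), ("P001", "test")])
```

Because the split is a pure function of the patient ID, re-running the assignment on an updated data release keeps existing patients in their original split.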

2026-06-09T17:33:24Z 24 pages, 7 figures, 4 tables. Code, data, and trained model weights: https://github.com/span-ai-labs/oncotraj. Python package: pip install oncotraj. Dataset: https://huggingface.co/datasets/span-ai-labs/oncotraj-v1 Abhijoy Sarkar Aarchi Singh Thakur http://arxiv.org/abs/2606.11140v1 Data assimilation for subsurface flow using latent diffusion model parameterization: performance of ensemble-Kalman and Monte Carlo techniques 2026-06-09T17:29:47Z

Data assimilation (DA) in subsurface flow entails calibrating model parameters to match observed data, typically at wells, while preserving geological realism. Latent diffusion models (LDMs) provide efficient mappings from high-dimensional geological model space to a low-dimensional latent variable, reducing the dimensionality of the inverse problem while maintaining plausibility in posterior geomodels. However, the high nonlinearity in the LDM mapping may degrade the performance of Kalman-gain-based ensemble updates. We present a systematic comparison of DA algorithms applied to large-scale 3D channelized geomodels with hierarchical geological uncertainty. We compare model-space and latent-space DA using the ensemble smoother with multiple data assimilation (ESMDA), and demonstrate a key trade-off: model-space updates achieve significant uncertainty reduction but produce geologically unrealistic posterior models, while latent-space updates preserve realism but exhibit limited uncertainty reduction. Motivated by this, we explore rigorous Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) algorithms in the 3D-LDM latent space. To accommodate their high computational demands, we develop a fast surrogate flow model that approximates well-rate responses. MCMC and SMC are evaluated against ESMDA across three synthetic test cases, with DA performed in the LDM latent space. All models maintain geological realism due to the LDM parameterization. MCMC and SMC are consistent with one another and achieve lower data mismatch and more uncertainty reduction than latent-space ESMDA. Our overall results demonstrate that ensemble Kalman methods may provide overestimated posterior uncertainty with highly nonlinear parameterizations, while rigorous Monte Carlo sampling, enabled by fast surrogate models, can provide a more reliable alternative.
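
For orientation, the ESMDA update compared above is commonly written as follows. This is the standard formulation from the ESMDA literature, not notation taken from this paper, so the symbols are assumptions: $m_j^{i}$ is ensemble member $j$ at assimilation step $i$ (in latent-space DA it would be the LDM latent variable), $d_j^{i}$ the corresponding simulated data, $C_{MD}^{i}$ and $C_{DD}^{i}$ the ensemble cross-covariance and data auto-covariance, $C_D$ the measurement-error covariance, and $\alpha_i$ the inflation coefficients.

```latex
\begin{align}
  m_j^{i+1} &= m_j^{i}
     + C_{MD}^{i}\left(C_{DD}^{i} + \alpha_i C_D\right)^{-1}
       \left(d_{\mathrm{obs}} + \sqrt{\alpha_i}\, C_D^{1/2} z_j - d_j^{i}\right),
     \qquad z_j \sim \mathcal{N}(0, I), \\
  \sum_{i=1}^{N_a} \frac{1}{\alpha_i} &= 1 .
\end{align}
```

The gain $C_{MD}^{i}(C_{DD}^{i} + \alpha_i C_D)^{-1}$ is linear in the data mismatch, which is why a strongly nonlinear map from latent variable to flow response can degrade the update.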

2026-06-09T17:29:47Z Guido Di Federico Wenchao Teng Louis J. Durlofsky http://arxiv.org/abs/2606.11282v1 The Statistical Compass 2026-06-09T14:39:29Z

This monograph develops probability and stochastic-process ideas as a translation language for statistics: from designed observations and data objects to targets, stability statements, inference, and use. The chapters move from motivating examples and randomization through probability measures, kernels, likelihoods, data objects, weak convergence, empirical fields, functional data, M- and Z-estimation, testing, local approximations, event-time processes, and prediction. Historical and biomedical examples are used to keep abstract objects tied to records, mechanisms, and decisions. The aim is to give readers a common grammar for classical probability, modern data structures, and statistical practice.

2026-06-09T14:39:29Z 669 pages, 23 figures; textbook/monograph working manuscript Eliuvish Han Cui http://arxiv.org/abs/2603.08924v2 Quantifying Uncertainty in AI Visibility: A Statistical Framework for Generative Search Measurement 2026-06-09T14:18:14Z

AI-powered answer engines are inherently non-deterministic: identical queries submitted at different times can produce different responses and cite different sources. Despite this stochastic behavior, current approaches to measuring domain visibility in generative search typically rely on single-run point estimates of citation share and prevalence, implicitly treating them as fixed values. This paper argues that citation visibility metrics should be treated as sample estimators of an underlying response distribution rather than fixed values. We conduct an empirical study of citation variability across three generative search platforms--Perplexity Search, OpenAI SearchGPT, and Google Gemini--using repeated sampling across three consumer product topics. Two sampling regimes are employed: daily collections over nine days and high-frequency sampling at ten-minute intervals. We show that citation distributions follow a power-law form and exhibit substantial variability across repeated samples. Bootstrap confidence intervals reveal that many apparent differences between domains fall within the noise floor of the measurement process. Distribution-wide rank stability analysis further demonstrates that citation rankings are unstable across samples, not only among top-ranked domains but throughout the frequently cited domain set. These findings demonstrate that single-run visibility metrics provide a misleadingly precise picture of domain performance in generative search. We argue that citation visibility must be reported with uncertainty estimates and provide practical guidance for sample sizes required to achieve interpretable confidence intervals.
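
The bootstrap confidence intervals mentioned above can be sketched as follows. This is a minimal percentile bootstrap under stated assumptions, not the paper's procedure: each sampled response is represented as a set of cited domains, prevalence is taken to be the fraction of responses citing a domain, and the data, function names, and defaults are hypothetical.

```python
import random

def prevalence(responses, domain):
    """Fraction of responses that cite `domain` at least once."""
    return sum(domain in r for r in responses) / len(responses)

def bootstrap_ci(responses, domain, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling whole responses with replacement."""
    rng = random.Random(seed)
    n = len(responses)
    stats = sorted(
        prevalence([responses[rng.randrange(n)] for _ in range(n)], domain)
        for _ in range(n_boot)
    )
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical sample: 30 repeated runs of one query.
responses = [{"a.com", "b.com"}] * 12 + [{"b.com"}] * 8 + [{"c.com"}] * 10
point = prevalence(responses, "a.com")   # 12 of 30 responses cite a.com
lo, hi = bootstrap_ci(responses, "a.com")
```

Reporting `(lo, hi)` alongside `point` makes it visible when two domains' estimates overlap and the apparent difference is within sampling noise.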

2026-03-09T20:47:22Z 39 pages, 13 figures Ronald Sielinski http://arxiv.org/abs/2606.10866v1 Adressing Separation: A Firth-corrected Joint Model for Longitudinal and Time-to-event Data with an Application on Dropout from Vocational Training 2026-06-09T13:44:48Z

Joint models for longitudinal and time-to-event data are frequently used to model endogenous longitudinal covariates alongside a time-to-event outcome. However, the model class inherits some limitations of the survival submodels, including the need for non-separation in each category of categorical covariates. We therefore incorporate Firth's correction into the frequentist estimation procedure of joint models in order to make the model class applicable in settings with separation. We derive the quantities needed for the correction term and implement it in the Expectation-Maximization algorithm for parameter estimation in joint models. Our simulation study shows that, in data situations with separation issues, the Firth-corrected estimation procedure yields less biased estimates and the respective coefficients approach the estimated values observed in the non-separation cases. The application to a data set on satisfaction with and dropout from vocational training demonstrates the advantages of the Firth-corrected joint model in a real-world data set with separation. The results add to the literature on dropout from vocational training in Germany by explicitly modeling direct effects of socioeconomic and training-specific factors on the risk of dropout as well as their indirect contribution via satisfaction with the training.

2026-06-09T13:44:48Z Sophie Potts Viola Deutscher Elisabeth Bergherr http://arxiv.org/abs/2606.10772v1 Structural Under-Representation of Women in News: Nonparametric Bayesian Mixtures Capture Time-Dependent Dynamics 2026-06-09T12:27:07Z

The under-representation of women as sources cited in news media is one prominent representation of gender bias. Understanding where gender bias concentrates and how it evolves is essential for targeted mitigation. Because gender representation varies across topics, time, and reported-on regions, creating complex dependencies that are difficult to capture parametrically, we employ a nonparametric model to uncover latent cluster structures and temporal dynamics. We combine time-dependent Bayesian mixture modeling techniques with a Beta mixture kernel tailored to female quote shares, bounded between 0 and 1. Fitted on Canadian news articles from 2019 to 2024, the model reveals structural under-representation of women across all clusters, with news topic driving differences in female quote shares more strongly than the reported-on region. More than 85% of topic-region time series show no improvement toward gender parity over the observation period. Dynamic density estimation confirms that the aggregate distribution of female quote shares remains stable between 2019 and 2024. Our application demonstrates that advanced probabilistic models not only reproduce findings in gender bias research but also reveal latent dependencies and structural patterns that simpler approaches miss, encouraging future adoption of model-based frameworks for studying media bias.

2026-06-09T12:27:07Z Isabella Habereder Thomas Kneib Isao Echizen Timo Spinde http://arxiv.org/abs/2606.07129v2 Collaborative estimation and evaluation of SARS-CoV-2 variant nowcasting in the United States 2026-06-09T09:00:36Z

The ability to estimate and predict pathogen variant dynamics can inform public health responses, including planning for increased transmission or severity, shifts in population immunity, or changes to vaccine or therapeutic effectiveness. The COVID-19 pandemic demonstrated the importance of monitoring SARS-CoV-2 variant evolution through viral genome sequencing, enabling predictive models to estimate variant frequencies in the recent past, present, and short-term future. Collaborative forecasting Hubs provided a valuable way to centralize predictive modeling of epidemiological indicators such as cases, hospitalizations, and deaths during the pandemic; however, none existed for variant dynamics. Here, we discuss the creation of the United States SARS-CoV-2 Variant Nowcast Hub, designed to solicit estimates of the relative abundance of a specified set of SARS-CoV-2 variants at the U.S. state level. We discuss the design decisions and challenges in building the Hub and its scoring procedures. Using submissions from the Hub's first respiratory virus season (nowcast dates October 9th, 2024 to June 4th, 2025), we evaluate five individual models and a baseline model. We found that the baseline model, which pools sequences across the U.S., performs well overall, with most individual models performing similarly or slightly worse. Locations with lower sequencing volumes exhibited greater variability in model performance. Models submitted for a single location outperformed those submitted for all locations, potentially due to greater timeliness and magnitude of local data. Much remains to be investigated regarding relative model performance across different phases of variant emergence, and we conclude by proposing future directions within and beyond this Hub.

2026-06-05T10:41:24Z 32 pages, 9 figures Isaac MacArthur Thomas Robacker Bren Case Spencer J. Fox Dylan H. Morris Evan L. Ray Benjamin Rogers Becky Sweger Natalie M. Linton John Huddleston Andrew Magee Zachary Susswein Jover Lee Trevor Bedford Marlin D. Figgins Ehsan Suez Rajath Prabhakar Tomas Leon Brent Siegel Mugdha Thakur Christopher M. Hoover Rahil Ryder Jesse Elder Michael Kupperman Ruian Ke Emma Goldberg Sebastian Funk Maryclare Griffin Nicholas G. Reich Kaitlyn E. Johnson http://arxiv.org/abs/2606.10574v1 Two-stage imputation of longitudinal anthropometric data with cross-reference harmonisation: a simulation study 2026-06-09T08:41:09Z

Objective. Longitudinal datasets frequently contain missing weight and height measurements, and studies that combine data sources may index measurements against different growth reference standards (e.g., the WHO reference and CDC charts). We describe and evaluate a reproducible two-stage method that imputes missing anthropometry while making the choice of reference standard an explicit parameter. Methods. Stage 1 applies within-subject linear interpolation across visit dates (interior gaps only, no extrapolation). Stage 2 imputes remaining values from an age- and sex-specific growth reference using the LMS method by estimating each subject's centile, carrying it forward and backwards within the subject, defaulting to the 50th centile when a subject is never measured, and reading the expected value off the reference at the visit age. Different references can be supplied per data source so that the standard applied is recorded and auditable. We assessed recovery accuracy by masking and re-imputing a random 20% of observed values. All evaluations used computer-generated synthetic data. Results. On synthetic data (n = 60 subjects, 288 visits, 30% missing), the method resolved missingness to 100% completeness. Masked-value recovery gave a mean absolute error of 1.78 kg for weight (3.5% mean absolute percentage error) and 2.84 cm for height (2.0%), with negligible bias. Values recovered by within-subject interpolation were more accurate than those recovered from the growth reference, as expected, supporting the two-stage ordering. Conclusion. The method offers a simple, dependency-free, and auditable approach to anthropometric imputation, with explicit handling of differing reference standards and per-value provenance. Application to empirical data and propagation of imputation uncertainty into downstream models are the necessary next steps before use in substantive analyses.
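
The two stages described above can be sketched as follows. This is an illustrative reading of the abstract, not the author's code: the function names are hypothetical, and `toy_ref` returns made-up LMS parameters rather than WHO or CDC values. The LMS transforms themselves are the standard ones, $z = ((x/M)^L - 1)/(LS)$ with inverse $x = M(1 + LSz)^{1/L}$.

```python
import math

def lms_z(x, L, M, S):
    """Measurement -> z-score under the LMS method."""
    if abs(L) < 1e-12:
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)

def lms_value(z, L, M, S):
    """z-score -> measurement (inverse of lms_z)."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)

def interpolate_interior(ages, values):
    """Stage 1: linear interpolation between observed neighbours, no extrapolation."""
    out = list(values)
    obs = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(obs, obs[1:]):
        for i in range(a + 1, b):
            w = (ages[i] - ages[a]) / (ages[b] - ages[a])
            out[i] = values[a] + w * (values[b] - values[a])
    return out

def impute_from_reference(ages, values, ref):
    """Stage 2: carry the nearest observed z-score forward, else backward;
    use z = 0 (50th centile) when the subject was never measured."""
    z = [None if v is None else lms_z(v, *ref(a)) for a, v in zip(ages, values)]
    obs = [i for i, zi in enumerate(z) if zi is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is None:
            if not obs:
                zi = 0.0
            else:
                earlier = [j for j in obs if j < i]
                zi = z[earlier[-1]] if earlier else z[obs[0]]
            out[i] = lms_value(zi, *ref(ages[i]))
    return out

def toy_ref(age):
    """Hypothetical weight-for-age reference: returns (L, M, S). Not real data."""
    return (-0.2, 10.0 + 2.0 * age, 0.12)

ages = [1.0, 2.0, 3.0, 4.0, 5.0]          # years
weights = [None, 12.0, None, 16.0, None]  # kg, None = missing
stage1 = interpolate_interior(ages, weights)
full = impute_from_reference(ages, stage1, toy_ref)
```

Passing a different `ref` per data source is what would make the choice of growth standard an explicit, recorded parameter in this sketch.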

2026-06-09T08:41:09Z Flavia Alves http://arxiv.org/abs/2512.08232v2 Wishart kernel density estimation for strongly mixing time series on the cone of positive definite matrices 2026-06-09T07:04:04Z

A Wishart kernel density estimator (KDE) is introduced for density estimation in the cone of positive definite matrices. The estimator is boundary-aware and mitigates the boundary bias suffered by conventional KDEs, while remaining simple to implement. Its mean squared error, uniform strong consistency on expanding compact sets, and asymptotic normality are established under the Lebesgue measure and suitable mixing conditions. This work represents the first study of density estimation for dependent data on this space under any metric. For independent observations, an asymptotic upper bound on the mean absolute error is also derived. A simulation study compares the performance of the Wishart KDE with that of the log-Gaussian KDE, another boundary-aware estimator based on the matrix-variate lognormal distribution proposed by Schwartzman [Int. Stat. Rev., 2016, 84(3), 456--486], and with the naive Gaussian KDE on the ambient Euclidean space. When estimating the stationary marginal density of a Wishart autoregressive process for several autoregressive coefficient matrices and innovation covariance matrices, the Wishart KDE exhibits the best overall accuracy and stability. The practical utility of the Wishart KDE is illustrated by estimating the marginal density of a one-year time series of realized covariance matrices computed from 5-minute intra-day returns on Amazon Corp. shares and on the Standard & Poor's 500 exchange-traded fund. All code is publicly available via the R package ksm to facilitate implementation of the method and reproducibility of the findings.

2025-12-09T04:17:16Z 43 pages, 4 figures, 2 tables Léo R. Belzile Christian Genest Frédéric Ouimet Donald Richards http://arxiv.org/abs/2606.10342v1 Binomial Smoothing for Inventory and Information Control in Supply Chains 2026-06-09T02:51:51Z

In many decentralized supply chains, upstream firms do not observe market demand directly and instead infer downstream conditions from the order stream. A retailer's replenishment policy therefore plays a dual role: it governs inventory replenishment and shapes the information available for upstream forecasting. This creates a fundamental trade-off. Smoother orders improve upstream predictability, but delaying the response to demand can increase downstream inventory costs. We study how a retailer should optimally smooth demand in a two-tier supply chain with one retailer and one manufacturer when the manufacturer forecasts future orders from the retailer's order history. We propose Binomial Smoothing, a class of replenishment policies that implements delayed demand response by spreading each unit of demand over a finite horizon using binomial weights. The class is interpretable, easy to calibrate, and analytically tractable. Under weakly stationary Gaussian demand satisfying mild regularity conditions, we show that, for any fixed smoothing horizon, the Binomial policy minimizes the manufacturer's forecast error among all policies with the same degree of smoothing. It remains invertible, so the manufacturer can recover demand history from observed orders. More generally, Binomial Smoothing achieves a constant-factor approximation guarantee relative to an optimal policy. Our results yield a broader insight: replenishment policies should be designed not merely to reduce order variance, as in the traditional bullwhip measure, but to reduce the unpredictable component of orders. Carefully designed smoothing can improve supply-chain performance and partially substitute for information sharing, providing a concrete mechanism for coordination without collaboration.
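
The idea of spreading each unit of demand over a finite horizon with binomial weights can be sketched as follows. The abstract does not give the exact parameterization, so the weight formula $w_k = \binom{n}{k} p^k (1-p)^{n-k}$, the default $p = 0.5$, and the function names are assumptions made for illustration; inventory costs and the manufacturer's forecast are not modelled.

```python
from math import comb

def binomial_weights(n, p=0.5):
    """Binomial(n, p) probabilities for lags k = 0..n (assumed weight form)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def smooth_orders(demand, n, p=0.5):
    """Spread each period's demand over the next n periods using binomial weights."""
    w = binomial_weights(n, p)
    orders = [0.0] * (len(demand) + n)
    for t, d in enumerate(demand):
        for k, wk in enumerate(w):
            orders[t + k] += wk * d
    return orders

# Hypothetical lumpy demand stream.
demand = [10, 0, 0, 20, 0, 0, 0]
orders = smooth_orders(demand, n=2)
```

With `n=2` the weights are 0.25, 0.5, 0.25, so a demand spike of 10 becomes orders of 2.5, 5, and 2.5 over three periods. The weights sum to one, so total ordered quantity equals total demand and only the timing changes.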

2026-06-09T02:51:51Z 59 pages, 7 figures, 4 tables Rene Caldentey Avi Giloni Clifford Hurvich Prem Talwai Yichen Zhang http://arxiv.org/abs/2606.10330v1 The Power of Altruism in Sticker Economics: Generosity Minimizes Collective Costs and Overprotective Norms Fuel Inefficiency 2026-06-09T02:24:59Z

Collecting the FIFA World Cup sticker album presents a classic public-goods and collective-action dilemma, in which completing a collection on one's own is highly inefficient. To evaluate how localized community norms shape collective efficiency, we use agent-based modeling and Monte Carlo simulations, parameterized with empirical field observations from exchange meetups in Natal, Brazil. Reflecting the tournament's recent expansion, the Panini 2026 album features 980 individual stickers, including 68 metallic specials. We contrast a standard baseline economy (1:2 special-to-normal exchange ratio) with an overprotective, strict strategy (exclusive special-for-special trading) and an altruistic, generous strategy (in which advanced players surrender needed duplicates to assist peers). Our findings reveal that overprotective rules trap liquidity and drive network-wide inefficiency. The strict strategy increases median completion costs by 10 packs and severely penalizes the least fortunate 5% of collectors, adding 20 packs in large cities and 30 in small communities. Conversely, widespread generosity optimizes network liquidity and dramatically compresses the long tail of bad luck. Introducing the generous strategy reduces required purchases for the 5th percentile by 90 packs in large-scale configurations and 130 packs in smaller clusters. Furthermore, widespread altruism triggers a strong functional coupling that effectively synchronizes completion rates across the network. This study demonstrates that while rigid, protective norms degrade collective welfare, generosity successfully mitigates pack-draw variance, transforming an expensive, isolated hobby into a resilient, highly efficient public good.
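
The inefficiency of collecting alone can be illustrated with a small Monte Carlo sketch of a single collector who never trades. It is far simpler than the agent-based model described above: the 980-sticker album size comes from the abstract, but the pack size of 7, uniform draws with replacement, and all function names are assumptions made for illustration.

```python
import random
import statistics

def packs_to_complete(n_stickers=980, pack_size=7, rng=random):
    """Packs a lone collector buys before owning every sticker (no trading)."""
    owned = set()
    packs = 0
    while len(owned) < n_stickers:
        # Assumption: each sticker in a pack is drawn uniformly at random.
        owned.update(rng.randrange(n_stickers) for _ in range(pack_size))
        packs += 1
    return packs

def simulate(n_runs=200, seed=0, **kwargs):
    """Sorted pack counts over independent simulated collectors."""
    rng = random.Random(seed)
    return sorted(packs_to_complete(rng=rng, **kwargs) for _ in range(n_runs))

runs = simulate(n_runs=200)
median_packs = statistics.median(runs)
unlucky_packs = runs[int(0.95 * len(runs)) - 1]  # cost near the unluckiest 5%
```

Comparing `median_packs` and `unlucky_packs` with the 140-pack minimum (980 stickers at 7 per pack) shows the long tail that exchange rules would then compress or widen.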

2026-06-09T02:24:59Z Luana Ferraz Alvarenga Caetano Alvarenga Costa César Rennó-Costa http://arxiv.org/abs/2603.01374v2 Multi-pathogen situational assessment and forecasting of respiratory disease in Aotearoa New Zealand 2026-06-09T02:05:57Z

Real-time analysis of epidemic trends and forecasts can help support public health planning and the response to seasonal respiratory disease. Here, we present two models that were used in a 2025 New Zealand winter situational assessment programme for three respiratory pathogens: SARS-CoV-2, influenza and respiratory syncytial virus (RSV). Data on SARS-CoV-2 were obtained from the national Covid-19 surveillance system; data on influenza and RSV were limited to a sentinel hospital surveillance programme. Models were run weekly from May to October 2025 on these real-time disease surveillance data and provided a quantitative representation of the current epidemic trend, along with estimates of the epidemic growth rate and 28-day ahead forecasts of case incidence. Model results and interpretation were provided in weekly reports to public health partners as part of a trans-Tasman winter programme run by the Australia--Aotearoa Consortium for Epidemic Forecasting and Analytics (ACEFA). We compare in-season results that were included in these reports to a retrospective analysis of the complete data for the season. We conclude that real-time analyses performed reasonably well, and identify some areas for improvement in future winter situational assessment programmes.

2026-03-02T02:15:51Z M. J. Plank A. R. Young K. L. Senior R. J. Tobin M. O'Hara-Wild F. Callaghan F. Shearer O. Eales http://arxiv.org/abs/2310.09295v2 On the Impact of Insurance on Households Susceptible to Random Proportional Losses: An Analysis of Poverty Trapping 2026-06-08T23:54:18Z

The trapping probability, $ψ$, as defined in Kovacevic and Pflug (2011), is modelled by assuming proportional capital losses, both in the case where there is no insurance and in the case where insurance is purchased by the household. Insurance coverage is likewise proportional, mirroring the structure of quota-share contracts, which are both prevalent in practice and analytically convenient. New closed formulae for $ψ$ are obtained in the case of no insurance when the distribution of the remaining proportion of capital is a power law, extending the results in Kovacevic and Pflug (2011). When proportional insurance is acquired and the remaining proportion of capital is uniformly distributed on $[0,1]$, $ψ$ satisfies a non-local differential equation whose analysis is based on the properties of diffusion processes. The non-local nature of the equation can be addressed using iterative solution methods, leading to a constructive determination of the trapping probability. Constraints on the parameters governing the capital process are derived in both the uninsured and insured cases to prevent the certainty of trapping. Numerical calculations are used to determine the trapping probability for the insured process and to illustrate the impact of different parameters. Consequences on the trapping probability for vulnerable non-poor populations with initial capital slightly above the poverty line are discussed.

2023-09-22T14:00:02Z 42 pages, 9 figures Kira Henshaw Jorge Ramirez José Miguel Flores-Contró Enrique A. Thomann Sooie-Hoe Loke Corina Constantinescu http://arxiv.org/abs/2606.10256v1 Confidence, Statistical Evidence and Relative Belief with Applications to a Problem in Particle Physics 2026-06-08T23:48:47Z

Probability theory provides a clear definition of what is meant by evidence in favor, against or none either way, of an event occurring for an unobserved response, via the principle of evidence. This is immediately applicable when carrying out a proper Bayesian analysis. Even without a prior, this imposes restrictions on reported inferences as these need to reflect the likelihood ordering. Relative belief inferences satisfy this requirement and, when the errors in these inferences are controlled, they also satisfy repeated sampling, or frequentist, requirements such as achieving given confidence levels. Relative belief inferences are considered here for the construction of intervals for uncertainty quantification in the context of a Poisson model for a signal with background noise. These intervals are contrasted with the well-known Feldman-Cousins intervals for this problem.

2026-06-08T23:48:47Z Michael Evans Siqi Zheng http://arxiv.org/abs/2501.17835v2 An Estimator-Robust Design for Augmenting Randomized Controlled Trials with External Real-World Data 2026-06-08T22:54:16Z

Augmenting randomized controlled trials (RCTs) with external real-world data (RWD) has the potential to improve the finite sample efficiency of treatment effect estimators. We describe using adaptive targeted maximum likelihood estimation (A-TMLE) for estimating the average treatment effect (ATE) by decomposing the ATE estimand into two components: a pooled-ATE estimand that combines data from both the RCT and external sources, and a bias estimand that captures the conditional effect of RCT enrollment on the outcome. This approach views the RCT data as the reference and corrects for inconsistencies of any kind between the RCT and the external data source. Given the growing abundance of external RWD from modern electronic health records, determining the optimal strategy to select candidate external patients for data integration remains an open yet critical problem. In this work, we begin by studying the robustness property of the A-TMLE estimator and then propose a matching-based sampling strategy that attempts to improve the robustness of the estimator with respect to the target estimand. Our proposed strategy is outcome-blind and involves matching based on two one-dimensional scores: the trial enrollment score and the propensity score in the external data. We demonstrate in simulations that our sampling strategy improves the coverage and narrows the widths of confidence intervals produced by A-TMLE. We illustrate our method with a case study of augmenting the DEVOTE cardiovascular safety trial by using the Optum Clinformatics claims database.

2025-01-29T18:31:25Z Sky Qiu Jens Tarp Andrew Mertens Mark van der Laan