The Effect of Training Task Diversity on In-Context Learning through the Lens of Low-Dimensional Subspaces

2026-06-05T01:35:42Z

The transformer's emergent ability to perform in-context learning (ICL) has sparked a wide range of studies designed to understand its underlying mechanisms. Existing works often study how training task diversity, defined either as the number of ICL training task vectors or as the number of function classes from which the task vectors are drawn, shapes both the learning dynamics and generalization capabilities of ICL. While both definitions have uncovered many interesting phenomena, many observations under the latter definition remain theoretically unexplained. This paper presents a minimal analytical model under which these phenomena provably emerge from the properties of the training data. By modeling the training task vectors as a mixture of low-rank Gaussians, we show how training task diversity, defined by the number of non-overlapping columns between subspaces that parameterize the covariance matrices, improves both the generalization and optimization trajectory of ICL with linear attention. In particular, we show that our model can explain (i) why training with task diversity shortens the ICL plateau and (ii) why ICL appears to achieve out-of-distribution generalization. We conclude by empirically demonstrating how our results extend to nonlinear transformers and nonlinear function classes. Overall, our work presents a tractable framework to unify existing observations.

Interpreting Learning Under Competing Models: Joint and Stepwise Approaches for Dynamic Cognitive Diagnosis

2026-06-05T01:08:48Z

Digital learning environments record learners' responses to individual items, making it possible to study the development of specific skills rather than overall scores. Drawing conclusions about learning from these data requires a model that links responses to latent skills and tracks how mastery changes over time. When the skills measured by each item are unknown, the analyst must decide whether to estimate this structure, the Q-matrix, jointly with the learning process, or to establish it first and study learning afterwards. We show that this decision can change substantive conclusions about how learners develop. Using dynamic cognitive diagnostic models, we analyse data from two reading games measuring vocabulary and comprehension from Grade 2 to Grade 3, with item-text embeddings providing prior information for the unknown Q-matrix. A joint analysis and a bias-corrected stepwise analysis agree that most learners move toward mastering both skills, but disagree about how many remain only partially proficient at Grade 3, changing how reading progress would be reported. A simulation study identifies when the two analyses diverge and shows that joint analysis is more reliable when the item-skill structure is uncertain and the item pool changes between grades. We provide R code for both analyses.

Robust inference for cyclic-stress accelerated life tests under interval monitoring with lognormal lifetimes

2026-06-04T20:37:01Z

Highly reliable products are often tested under accelerated conditions to provoke failures within a feasible timeframe. For products whose service life involves repeated alternation between two stress levels, such as automotive air-conditioners, batteries, and aerospace components, cyclic-stress accelerated life testing (CyALT) provides a more realistic loading profile than conventional accelerated tests. In practice, failures are often recorded only at scheduled inspection times, leading to interval-censored counts rather than exact lifetimes. Moreover, traditional maximum likelihood estimation is sensitive to data contamination, which is a genuine concern in small-sample industrial experiments. This paper develops robust inferential procedures for CyALT models with lognormal lifetimes under interval monitoring. Robust estimators are obtained by minimizing a weighted density power divergence (WDPD), leading to the weighted minimum density power divergence estimator (WMDPDE). We establish the asymptotic distribution of the WMDPDE, derive influence function expressions to characterize the robustness, and present asymptotic and bootstrap confidence intervals for important lifetime characteristics. A simulation study confirms that the WMDPDE provides substantial protection against outliers while retaining high efficiency under clean data. The methodology is illustrated through the analysis of an air-conditioner reliability dataset, demonstrating the practical advantages of robust inference in the CyALT framework.

Disentangling Latent Risk Pathways via Bayesian Hypergraph Inference

2026-06-04T19:56:09Z

Electronic health records (EHR) pose large-scale multi-disease modeling problems in which many outcomes are rare and strongly influenced by shared risk factors. While modern approaches achieve strong predictive performance, they often treat diseases independently or rely on black-box architectures, offering limited insight into how risk factors organize disease risk and little principled uncertainty quantification. We introduce a Bayesian hypergraph inference framework that reframes multi-disease modeling around latent, risk-factor-modulated disease pathways. Risk factors act on hyperedges, latent disease subsets with shared risk patterns, allowing diseases to participate in multiple distinct pathways and enabling interpretable, higher-order structure beyond pairwise associations. A repulsion prior encourages parsimonious and identifiable structure, while posterior inference provides calibrated uncertainty over both disease groupings and risk-factor influence. To enable scalable inference on large EHR datasets, we develop a structured variational inference algorithm that preserves logical dependencies among hyperedge existence, disease membership, and pathway-level effects. Experiments on simulated data and UK Biobank demonstrate stable and interpretable disease pathway structure, well-calibrated uncertainty, improved estimation for rare diseases, and competitive predictive performance.

When Should Forecasting Models Be Re-Specified? A Cost-Sensitive Trigger for Adaptive Model-Form Updating

2026-06-04T19:43:13Z

Forecasting systems are commonly refreshed at every review period, and that refresh usually bundles two distinct operations: estimating parameters and selecting the model form. Recent evidence suggests the second operation is often unnecessary, since intermediate updating strategies can hold forecast accuracy roughly fixed while cutting computational cost and forecast instability. This technical note takes up the complementary question. Once a system has adopted a reduced-update policy, when should it interrupt that policy and re-specify the model form? We define specification debt as the evidence accumulated against the deployed model form, and we use it to build a cost-sensitive trigger for re-specification. In a closed discrete model space the trigger reduces to a threshold on the negative log posterior probability of the deployed specification. In open production settings the same decision rule can be run with predictive score gaps, stacking weights, or calibrated monitoring diagnostics. Fixed update frequencies turn out to be a special case of the rule, recovered when evidence against the deployed form accumulates at a constant rate. We illustrate the idea on 500 monthly M4 series, comparing full updating, fixed model-form update frequencies, parameter-only updating, and capped adaptive score-triggered updating, and within the finite ETS grid we also compute information-criterion analogues of specification debt from AIC and BIC weights over the candidate forms. In that illustration the best capped adaptive policy is comparable to full updating in accuracy, runs in about 28 percent of full-update computational time, lowers forecast instability, and behaves like a fixed schedule with a small number of evidence-based exceptions.
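
For the closed discrete model space, the trigger can be illustrated with a minimal sketch. It tracks posterior probabilities over a finite set of candidate forms and flags re-specification once the negative log posterior probability of the deployed form exceeds a threshold. The function names, the uniform prior, the two candidate forms, the log predictive scores, and the threshold value are all hypothetical and are not taken from the note.

```python
import math

def update_log_posterior(log_post, log_pred_scores):
    """Bayes update: add each form's log predictive score, then renormalize."""
    unnorm = {m: log_post[m] + log_pred_scores[m] for m in log_post}
    peak = max(unnorm.values())
    log_z = peak + math.log(sum(math.exp(v - peak) for v in unnorm.values()))
    return {m: v - log_z for m, v in unnorm.items()}

def specification_debt(log_post, deployed):
    """Debt = negative log posterior probability of the deployed form."""
    return -log_post[deployed]

def first_trigger_period(score_stream, forms, deployed, threshold):
    """Return the first review period whose debt exceeds the threshold, else None."""
    # Uniform prior over candidate forms (an assumption of this sketch).
    log_post = {m: -math.log(len(forms)) for m in forms}
    for period, scores in enumerate(score_stream, start=1):
        log_post = update_log_posterior(log_post, scores)
        if specification_debt(log_post, deployed) > threshold:
            return period
    return None

# Hypothetical example: the rival form gains 0.5 nats of evidence per period.
forms = ["form_a", "form_b"]
stream = [{"form_a": -1.0, "form_b": -0.5}] * 10
period = first_trigger_period(stream, forms, deployed="form_a", threshold=2.0)
print(period)
```

Because the evidence against the deployed form accumulates at a constant rate in this toy stream, the trigger fires at a fixed period, which mirrors the special case described in the abstract.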

Counting the uncounted: How many were killed in Guatemala, 1978-1995?

2026-06-04T17:53:49Z

In various application domains, there is a certain `null cell', inside a multinomial setup, where observations are recorded for the other cells, but where one cannot count the number of occurrences for the null cell. I develop inference theory for assessing such unknown numbers, counting the uncounted, in situations where counts are available for the other cells, via parametric modelling. The methods are used to estimate the number of persons killed in Guatemala during the Genocidio guatemalteco years 1978--1995. There are three carefully curated lists of killed people, where the information can be mapped to a Venn diagram with $2^3=8$ cells. Summing over the seven observed cells, $R=\hbox{47,803}$ killed individuals can be identified, but how big is $N_{0,0,0}$, and hence $N=N_{0,0,0}+R$?

Directional-Shift Dirichlet ARMA Models for Compositional Time Series with Structural Break Intervention

2026-06-04T17:44:22Z

Compositional time series frequently exhibit structural breaks due to external shocks, policy changes, or market disruptions. Standard methods either ignore such breaks or handle them through fixed effects that cannot extrapolate beyond the sample or through step-function dummies that impose instantaneous adjustment. We develop a Bayesian Dirichlet ARMA model augmented with a directional-shift intervention mechanism that captures structural breaks through three interpretable parameters: a direction vector specifying which components gain or lose share, an amplitude controlling redistribution magnitude, and a logistic gate governing transition timing and speed. The model preserves compositional constraints by construction, maintains DARMA dynamics for short-run dependence, and produces coherent probabilistic forecasts through and after structural breaks. The intervention trajectory corresponds to geodesic motion on the simplex and is invariant to the choice of ILR basis. A simulation study with 400 fits across 8 scenarios shows near-zero amplitude bias and nominal 80\% credible interval coverage when the shift direction is correctly identified (77.5\% of cases); supplementary studies confirm robustness across extreme transition speeds and non-monotone DGPs. Two empirical applications to COVID-era Airbnb data characterize performance relative to simpler alternatives. Where the break is monotone and ongoing, the intervention model achieves near-nominal calibration (79.6\%) while the fixed effect substantially under-covers (66.1\%). Where post-break dynamics are non-monotone, both models are acceptably calibrated and the fixed effect outperforms on point accuracy. The intervention model's advantages are thus specific to settings with roughly monotone structural transitions.
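
The three intervention parameters can be made concrete with a small sketch. It assumes the Aitchison geometry, in which a straight line in centred log-ratio (clr) coordinates is a geodesic on the simplex, and it moves a baseline composition along a direction vector by an amplitude scaled with a logistic gate. The abstract does not give the exact parameterization, so the function names, the gate form, and the example values here are hypothetical, and the DARMA dynamics are omitted.

```python
import math

def clr(x):
    """Centred log-ratio transform of a composition with positive parts."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [l - mean for l in logs]

def clr_inv(z):
    """Map clr coordinates back to the simplex (a softmax)."""
    peak = max(z)
    e = [math.exp(v - peak) for v in z]
    total = sum(e)
    return [v / total for v in e]

def logistic_gate(t, t0, speed):
    """Gate in (0, 1): t0 sets the transition midpoint, speed its sharpness."""
    return 1.0 / (1.0 + math.exp(-speed * (t - t0)))

def shifted_composition(base, direction, amplitude, t, t0, speed):
    """Move base along direction in clr space by amplitude * gate(t)."""
    mean_d = sum(direction) / len(direction)
    d = [v - mean_d for v in direction]  # centre so the shift stays in clr space
    g = logistic_gate(t, t0, speed)
    z = [b + amplitude * g * dv for b, dv in zip(clr(base), d)]
    return clr_inv(z)

# Hypothetical example: component 0 gains share at the expense of component 1.
base = [0.5, 0.3, 0.2]
early = shifted_composition(base, [1, -1, 0], 1.0, t=0, t0=10, speed=1.0)
late = shifted_composition(base, [1, -1, 0], 1.0, t=20, t0=10, speed=1.0)
print(early, late)
```

Since the shift is defined in clr coordinates, the resulting path does not depend on which ILR basis is later used to represent it, and every point on the path sums to one by construction.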

Leveraging External Controls for Treatment Switching in Randomized Controlled Trials: A Weighted Causal Inference Framework for Overall Survival

2026-06-04T17:41:40Z

In many oncology clinical trials where overall survival is a key endpoint, patients are permitted to switch from the control arm to the experimental treatment arm or other suitable therapies. Switching can occur for various reasons, including disease progression. This violates the causal guarantees of randomized treatment assignment, resulting in biased treatment effect estimates. Existing methods often require strong assumptions, complicated model specifications, or both. In this paper, we propose a general framework that incorporates external controls to account for treatment switching in randomized controlled trials. Leveraging the synthetic control method and balancing weights from observational causal inference, we propose several estimators that use multiple imputation and time-varying weights to adjust for treatment switching. We also discuss approaches to selecting the risk set of external controls to impute from. Through extensive simulation studies, we show that our proposed methods lead to meaningful statistical improvements relative to standard adjustment methods that utilize external controls in naive ways or those that do not utilize external controls at all. We then demonstrate the utility of our external control-based approaches with two phase III oncology trials.

Exact Geometric Typicality and Bipartite Entanglement from the Projected Central Limit Theorem on Hyperspheres

2026-06-04T13:53:10Z

Starting from the exact Projected Central Limit Theorem on hyperspheres, we rederive the Beta distribution for subsystem occupation probabilities and Lubkin's purity formula from elementary hyperspherical moments, quantifying the finite-size ``platykurtic'' suppression of tails relative to the Gaussian approximation used in standard eigenstate-thermalization and typicality treatments. Our main new result concerns the bipartite quantum mutual information $\langle I(A{:}B)\rangle$ for Haar-random pure states. We show that its full asymptotic expansion in $1/N$ admits a Bernoulli-factorized form in which every order $k \ge 1$ carries the symmetric factor $(d_A^{2k}-1)(d_B^{2k}-1)$ and all higher odd-order corrections vanish identically. Through an exact algebraic reorganization of Page's formula (conjectured in Ref.~\cite{Page1993} and subsequently proven~\cite{Foong1994, SanchezRuiz1995, Sen1996}), we establish that the leading finite-size correction separates into a dominant $\mathfrak{su}(d_A) \otimes \mathfrak{su}(d_B)$ bipartite quantum coherence contribution $(d_A^2 - 1)(d_B^2 - 1)/(2N)$ and a subtracted classical-probability (Cartan $\otimes$ Cartan) contribution $(d_A - 1)(d_B - 1)/(2N)$, and we trace this separation to the difference between diagonal and eigenvalue entropies via Schur's majorisation theorem, with the dimensional counts $(d-1)$ and $(d^2-1)$ acquiring meaning through the Cartan structure of the generalised Bloch decomposition. These results admit a single non-perturbative closed form: the exact typical mutual information factors as $\langle I(A{:}B)\rangle = (d_A^2-1)(d_B^2-1)\,\mathcal{G}(d_A,d_B,d_E)$, with $\mathcal{G}$ given by an explicit Bose--Einstein integral whose asymptotic expansion in $1/N$ reproduces the Bernoulli series.

Learning to model pediatric asthma exacerbation from multiple risk factors: a case study in coastal Virginia

2026-06-04T13:47:49Z

Childhood asthma is a common illness exacerbated by air pollution as well as meteorological and neighborhood-level socioeconomic factors. Modeling asthma exacerbation (AE) in large spatiotemporal datasets requires disentangling impacts from multiple contributors. In this case study, we compared three techniques that balance predictive power with interpretability to predict AE in Hampton Roads, a coastal Virginia region comprising 7 cities and over 1.5 million people. After collating ambient air pollution measurements, weather data, and measures of neighborhood opportunity, we modeled zip code-level acute AE visits to a regional children's hospital and affiliated providers from 2018-2023. Generalized linear models (GLM) provided a baseline while neural networks (NN) served as a maximally predictive target. To bridge between statistical models and deep learning, we developed a framework based on sparse dictionary learning to identify and interpret parsimonious nonlinear interacting equations. After comparing each model's predictive performance, we estimated relative risks for AE due to input exposure variables and found consensus across frameworks. Our work links statistical and interpretable machine learning models to highlight possible synergistic interactions influencing AE, and may enable future studies to guide public health interventions in coastal Virginia.

Non-Perturbative Closed Form for the Typical Bipartite Mutual Information of Haar-Random States

2026-06-04T13:46:02Z

The average bipartite quantum mutual information $\langle I(A{:}B)\rangle$ of Haar-random pure states can be expressed exactly through Page's formula in terms of digamma functions. We show that this quantity admits a single non-perturbative closed form: $\langle I(A{:}B)\rangle = (d_A^2-1)(d_B^2-1)\,\mathcal{G}(d_A,d_B,d_E)$, where $\mathcal{G}$ is given by an explicit convergent integral over a Bose--Einstein kernel. The overall factor $(d_A^2-1)(d_B^2-1)=\dim[\mathfrak{su}(d_A)]\cdot\dim[\mathfrak{su}(d_B)]$ is exact, not merely asymptotic. The asymptotic expansion of $\mathcal{G}$ in $1/N$ yields a Bernoulli-factorised series whose coefficients involve $\zeta(1{-}2k)$; this series diverges, and our integral is its exact Borel sum. The integral representation also makes $\langle I\rangle < (d_A^2{-}1)(d_B^2{-}1)/(2N)$ manifest via a scale-inversion symmetry of the kernel. Our derivation traces the mutual information's structure to an exact decomposition of Page's entropy into a diagonal (Dirichlet) contribution and a Schur-majorisation eigenvalue correction, whose assembly into the mutual information cleanly separates classical from quantum correlations.
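
The stated bound can be checked numerically with a short sketch. It evaluates Page's formula in its harmonic-sum form, $\langle S_m\rangle = \sum_{k=n+1}^{mn} 1/k - (m-1)/(2n)$ for $m \le n$ (in nats), and assembles $\langle I(A{:}B)\rangle = \langle S_A\rangle + \langle S_B\rangle - \langle S_{AB}\rangle$ with $\langle S_{AB}\rangle = \langle S_E\rangle$ for a pure state on $A \otimes B \otimes E$. Taking $N = d_A d_B d_E$ is an assumption of this sketch, as are the function names and the example dimensions; the closed form $\mathcal{G}$ itself is not reproduced here.

```python
def harmonic_tail(lo, hi):
    """Sum of 1/k for k = lo + 1, ..., hi."""
    return sum(1.0 / k for k in range(lo + 1, hi + 1))

def page_entropy(m, n):
    """Page's average entanglement entropy (nats) for an m x n bipartition."""
    if m > n:
        m, n = n, m  # the formula is written for the smaller subsystem
    return harmonic_tail(n, m * n) - (m - 1) / (2.0 * n)

def mean_mutual_information(d_a, d_b, d_e):
    """Average I(A:B) over Haar-random pure states on A, B, E."""
    n_total = d_a * d_b * d_e
    s_a = page_entropy(d_a, n_total // d_a)
    s_b = page_entropy(d_b, n_total // d_b)
    s_ab = page_entropy(d_a * d_b, d_e)  # S_AB equals S_E for a pure state
    return s_a + s_b - s_ab

def leading_term(d_a, d_b, d_e):
    """Leading finite-size term (d_A^2 - 1)(d_B^2 - 1) / (2N)."""
    n_total = d_a * d_b * d_e
    return (d_a ** 2 - 1) * (d_b ** 2 - 1) / (2.0 * n_total)

# Hypothetical dimensions with a large environment.
exact = mean_mutual_information(2, 3, 50)
lead = leading_term(2, 3, 50)
print(exact, lead, exact / lead)
```

For these dimensions the exact average sits slightly below the leading term, consistent with the inequality quoted in the abstract.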

From data to decisions: Bayesian modelling and global sensitivity analysis for flotation control

2026-06-04T13:45:15Z

This work presents a data-driven framework for interpretable modelling and decision support in flotation systems, integrating Gaussian Process (GP) regression with Global Sensitivity Analysis (GSA) via Sobol indices and local interpretability using SHapley Additive exPlanations (SHAP). Based on laboratory-scale experimental data, a static GP surrogate model is developed to capture how superficial air velocity, overflowing froth velocity, froth height over the lip, pulp height, bubble size, and tailings flowrate influence the measured air recovery. The trained GP enables the computation of Sobol indices to quantify the contribution of each variable and their interactions to the overall variance in air recovery. The combination of Bayesian inference and Sobol-based sensitivity metrics provides a systematic approach to identify the dominant and interacting variables governing air recovery. This study links Bayesian learning, sensitivity quantification, and explainability to provide a foundation for data-driven control and optimisation of flotation processes.
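
How first-order Sobol indices are obtained from a surrogate can be sketched with a plain Monte Carlo "pick-freeze" estimator. The surrogate below is a hypothetical three-input toy function standing in for the trained GP mean, and the inputs are assumed independent and uniform on [0, 1]; neither reflects the flotation variables or data of the study, and the interaction indices and SHAP step are not covered.

```python
import random

def surrogate(x):
    """Hypothetical stand-in for a trained GP mean prediction."""
    return x[0] + 2.0 * x[1] + 0.0 * x[2]

def first_order_sobol(f, dim, n, seed=0):
    """Estimate first-order Sobol indices with two independent sample matrices."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(dim)] for _ in range(n)]
    b = [[rng.random() for _ in range(dim)] for _ in range(n)]
    f_a = [f(row) for row in a]
    f_b = [f(row) for row in b]
    mean = sum(f_a + f_b) / (2 * n)
    var = sum((y - mean) ** 2 for y in f_a + f_b) / (2 * n - 1)
    indices = []
    for i in range(dim):
        # Evaluate on A with column i replaced by the matching column of B.
        f_ab = [f(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        # Centring f(B) leaves the expectation unchanged and reduces noise.
        v_i = sum((yb - mean) * (yab - ya)
                  for yb, yab, ya in zip(f_b, f_ab, f_a)) / n
        indices.append(v_i / var)
    return indices

sobol = first_order_sobol(surrogate, dim=3, n=20000)
print(sobol)
```

For this additive toy function the analytic first-order indices are 0.2, 0.8, and 0, so the estimates give a quick sanity check of the estimator before it is pointed at a real surrogate.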

Higher-Order Multivariate Environmental Influences in Structural Health Monitoring

2026-06-04T11:18:34Z

System outputs such as eigenfrequencies or strain data, often used in structural health monitoring (SHM), not only react to damage but also depend on environmental conditions. When trying to correct for these confounding effects, it is often (at least implicitly) assumed that only the expected, i.e., mean, output values are affected by environmental conditions. However, the evaluation of real-world SHM data indicates that environmental conditions may influence not only the mean output but also higher-order statistical moments, particularly the variances of, and the covariances and correlations between, output quantities such as eigenfrequencies of different modes or strain sensors at different locations. To address these issues, we discuss two approaches for identifying and quantifying multivariate confounding effects on output covariances and correlations: a random forest and a nonparametric, kernel-based approach. We compare the two competing methods on both artificial and real-world SHM data, finding that the kernel-based approach achieves higher accuracy, but the random forest produces estimates that are more robust and sometimes easier to interpret.

Bankruptcy Prediction from 10-K Narratives: Evidence from Interpretable Text Scores and Accounting Baselines

2026-06-04T02:49:47Z

Bankruptcy is a low-frequency but high-impact corporate event, making early risk identification important for creditors, investors, regulators, and risk managers. Traditional bankruptcy-prediction models rely primarily on accounting ratios, but these measures may reflect financial deterioration only after it appears in reported financial statements. Narrative disclosures in annual 10-K filings may therefore provide incremental warning signals about emerging distress. This study examines whether 10-K narratives improve bankruptcy prediction beyond conventional accounting variables. Using firm-year observations matched to 10-K text, SEC financial statement data, and bankruptcy events from the Florida-UCLA-LoPucki Bankruptcy Research Database, the analysis evaluates bankruptcy risk over the year following the 10-K filing date. The paper develops a transparent Pre-Bankruptcy Stress (PB Stress) Score, a dictionary-based measure designed to capture distress-specific language related to liquidity and funding stress, debt covenant and refinancing stress, operating deterioration, restructuring and legal distress, and business fragility. The score is evaluated against a five-variable accounting baseline and a Loughran-McDonald dictionary benchmark. In the primary one-year holdout test, adding the PB Stress Score increases AUC from 0.8323 to 0.9019 and raises top-decile bankruptcy capture from 44.12% to 64.71%. The positive incremental pattern remains visible across bootstrap inference, alternative accounting benchmarks, alternative outcome definitions, and out-of-time validation. The findings indicate that distress-specific 10-K narratives provide interpretable incremental information for bankruptcy-risk monitoring beyond conventional accounting ratios.
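
The two holdout metrics can be sketched as follows, assuming that AUC is the usual rank-based probability that a bankrupt firm-year outscores a non-bankrupt one and that top-decile capture is the share of all bankruptcies falling in the highest-scoring 10% of observations. The scores and labels below are hypothetical and unrelated to the paper's data.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def top_decile_capture(scores, labels):
    """Share of all positives found in the highest-scoring 10% of observations."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    k = max(1, len(ranked) // 10)
    captured = sum(y for _, y in ranked[:k])
    return captured / sum(labels)

# Hypothetical holdout: 20 firm-years, 3 bankruptcies.
scores = [i / 20 for i in range(20)]
labels = [0] * 20
for idx in (10, 18, 19):
    labels[idx] = 1

model_auc = auc(scores, labels)
capture = top_decile_capture(scores, labels)
print(model_auc, capture)
```

In this toy holdout two of the three bankruptcies land in the top decile, while the third sits mid-ranking and pulls the AUC below one.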

From Causal Discovery to Dynamic Causal Inference in Neural Time Series

2026-06-04T00:09:54Z

Time-varying causal models provide a powerful framework for studying dynamic scientific systems, yet most existing approaches assume that the underlying causal network is known a priori, an assumption rarely satisfied in real-world domains where causal structure is uncertain, evolving, or only indirectly observable. This limits the applicability of dynamic causal inference in many scientific settings. We propose Dynamic Causal Network Autoregression (DCNAR), a two-stage neural causal modeling framework that integrates data-driven causal discovery with time-varying causal inference. In the first stage, a neural autoregressive causal discovery model learns a sparse directed causal network from multivariate time series. In the second stage, this learned structure is used as a structural prior for a time-varying neural network autoregression, enabling dynamic estimation of causal influence without requiring pre-specified network structure. We evaluate the scientific validity of DCNAR using behavioral diagnostics that assess causal necessity, temporal stability, and sensitivity to structural change, rather than predictive accuracy alone. Experiments on multi-country panel time-series data demonstrate that learned causal networks yield more stable and behaviorally meaningful dynamic causal inferences than coefficient-based or structure-free alternatives, even when forecasting performance is comparable. These results position DCNAR as a general framework for using AI as a scientific instrument for dynamic causal reasoning under structural uncertainty.