Nonlinear Amplification of Finite-Sample Uncertainty in Capability-Based Decisions

2026-05-07T18:46:50Z

This paper studies the propagation of finite-sample uncertainty under nonlinear transformations commonly used in statistical decision systems. In particular, we consider process capability indices, which are widely used in manufacturing practice but are estimated from finite samples, rendering the resulting approval decisions inherently uncertain. We show that such uncertainty cannot be fully explained by estimator variability alone, but is substantially influenced by a nonlinear amplification mechanism through which capability uncertainty is transformed into defect-risk metrics. While capability estimators vary approximately linearly with process dispersion, defect probabilities depend on tail curvature, causing small estimation errors to be disproportionately amplified in measures such as defect probability and parts-per-million (PPM) rates. Consequently, capability assessments that appear stable in index space may exhibit substantial variability in defect-risk space, particularly near decision thresholds. This insight provides a unified explanation of finite-sample decision instability, motivates reliability-aware decision formulations, and links sample-size requirements directly to decision reliability. Monte Carlo simulations and industrial data analyses validate the proposed mechanism and demonstrate its practical implications, including the impact of distributional assumptions on defect-risk estimation.
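The amplification mechanism can be made concrete with a short simulation. This is a minimal Python sketch, assuming a normally distributed, centred process with two-sided specification limits at ±4σ (true Cpk = 4/3, about 63 PPM) and samples of n = 30. These settings are illustrative assumptions, not values from the paper. The sketch re-estimates Cpk and the implied PPM from repeated samples and compares their coefficients of variation.

```python
import math
import random
import statistics

def norm_cdf(x):
    # Standard normal CDF via the complementary error function
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def ppm_two_sided(mu, sigma, lsl, usl):
    # Expected parts-per-million outside [lsl, usl] for a normal process
    return 1e6 * (norm_cdf((lsl - mu) / sigma) + norm_cdf((mu - usl) / sigma))

def cpk(mu, sigma, lsl, usl):
    # Capability index: distance to the nearest limit in units of 3 sigma
    return min(usl - mu, mu - lsl) / (3.0 * sigma)

def cv(values):
    # Coefficient of variation (sd / mean)
    return statistics.stdev(values) / statistics.fmean(values)

def simulate(n=30, reps=2000, mu=0.0, sigma=1.0, lsl=-4.0, usl=4.0, seed=1):
    # Re-estimate Cpk and PPM from repeated finite samples of size n
    rng = random.Random(seed)
    cpks, ppms = [], []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        m, s = statistics.fmean(x), statistics.stdev(x)
        cpks.append(cpk(m, s, lsl, usl))
        ppms.append(ppm_two_sided(m, s, lsl, usl))
    return cv(cpks), cv(ppms)

cv_cpk, cv_ppm = simulate()
print(f"CV of Cpk estimate: {cv_cpk:.3f}")
print(f"CV of PPM estimate: {cv_ppm:.3f}")
```

Because PPM depends on the normal tail, a modest relative error in the estimated Cpk turns into a much larger relative error in PPM. Near Cpk ≈ 1.33 the elasticity of the tail probability is large.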

Penalized KLIC Model Selection for the Generalized Method of Moments in Longitudinal Data with Time-Dependent Covariates

2026-05-07T16:39:34Z

Model selection plays an important role in longitudinal data analysis, especially when models are estimated using the generalized method of moments (GMM) in the presence of time-dependent covariates. In this setting, the number of valid moment conditions can grow quickly and may lead to over-parameterized models. The Kullback-Leibler Information Criterion (KLIC) has been proposed as a model-selection tool for this framework; however, the original KLIC may favor overly complex models as the number of parameters or valid moment conditions increases. To address this limitation, this study proposes two penalized versions of KLIC that incorporate penalties based on both the number of model parameters and the number of valid moment conditions: the Moment-Parameter Product Penalty KLIC (MPPP-KLIC) and the Logarithmic Penalty KLIC (LP-KLIC). These criteria provide a theoretically motivated mechanism for balancing model fit and model complexity in GMM-based longitudinal models. Through an extensive simulation study involving both binary and continuous responses, the proposed criteria are shown to improve the ability of KLIC to distinguish among competing models and to reduce the selection of over-parameterized models. The performance of the proposed methods is further illustrated using the Filipino Child Morbidity dataset, a longitudinal study of child health in the Philippines. The results show that the proposed penalized criteria provide stable and interpretable model rankings and consistently identify age as the most important predictor of child morbidity. Overall, the proposed penalized KLIC criteria offer practical and theoretically grounded tools for model selection in GMM-based longitudinal data analysis with time-dependent covariates.

Bivariate Frank Copula: Some More Results on Point Estimation of the Association Parameter from a Bayesian Perspective and Revisiting the Goodness of Fit Tests with an Application to Model Groundwater Data from Dong Thap, Vietnam

2026-05-07T16:15:50Z

This work has two major parts. First, we extend the recent study of Pham et al. (2025) on point estimation of the association parameter of a bivariate Frank copula. We investigate two Bayes estimators, under the generalized flat prior and under the Jeffreys prior, and compare them with the maximum likelihood estimator (MLE). Simulation results show that, for small sample sizes (n ≤ 25), the Bayes estimator under the Jeffreys prior uniformly outperforms both the generalized flat prior estimator and the MLE in terms of mean squared error (MSE). For moderate and large sample sizes, all estimators perform very similarly in terms of bias and MSE. We also discuss computational issues in the R package implementation that may significantly affect the computation of the MLE for very small samples. In the second part, we apply the Frank copula to analyze the association between groundwater arsenic concentration and other hydrochemical variables using a recent dataset from Vietnam. We revisit the goodness-of-fit tests proposed by Genest et al. (2006), investigate several counterintuitive behaviors of the test statistics, and provide extensive tables of simulated critical values. Our results complement and refine the computational findings reported in the earlier literature.
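This is a minimal numpy sketch of likelihood-based estimation of the Frank association parameter θ. It uses the standard Frank copula density c(u, v; θ) = θ(1 - e^{-θ}) e^{-θ(u+v)} / [(1 - e^{-θ}) - (1 - e^{-θu})(1 - e^{-θv})]^2 and conditional-inversion sampling. Two estimators are illustrative assumptions: the grid-based MLE, and the posterior mean under a uniform prior on the grid range. They are not the generalized flat or Jeffreys prior estimators studied in the paper, and not the R package implementation it discusses.

```python
import numpy as np

def sample_frank(n, theta, rng):
    # Conditional inversion: draw u, then solve dC(u, v)/du = w for v
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    a = np.exp(-theta * u)
    v = -np.log1p(w * (-np.expm1(-theta)) / (w * (a - 1.0) - a)) / theta
    return u, v

def frank_loglik(u, v, thetas):
    # Log-likelihood of the Frank copula for every theta on a grid
    t = thetas[:, None]
    g = -np.expm1(-t)                                   # 1 - exp(-theta)
    d = g - np.expm1(-t * u) * np.expm1(-t * v)         # denominator term
    logc = np.log(t * g) - t * (u + v) - 2.0 * np.log(np.abs(d))
    return logc.sum(axis=1)

rng = np.random.default_rng(42)
theta_true = 5.0
u, v = sample_frank(3000, theta_true, rng)

# Grid with step 0.05 that skips theta = 0, where the density formula is 0/0
grid = np.linspace(-19.975, 19.975, 800)
ll = frank_loglik(u, v, grid)
theta_mle = grid[np.argmax(ll)]

# Posterior mean under a uniform prior on the grid range (an assumption)
post = np.exp(ll - ll.max())
post /= post.sum()
theta_post = float(np.sum(grid * post))
print(f"MLE: {theta_mle:.3f}  posterior mean: {theta_post:.3f}")
```

With n = 3000 the MLE and the posterior mean nearly coincide, consistent with the abstract's finding that estimators differ mainly for very small samples.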

Bayesian Modeling and Prediction of Generalized Contact Matrices

2026-05-07T14:30:57Z

Social contact matrices are essential tools in infectious disease epidemiology because they quantify close-range human contact patterns, which directly drive the transmission of airborne infectious diseases. In this work we propose a Bayesian modeling framework for inferring generalized contact matrices, which stratify contacts beyond the conventional age dimension. The model is designed to satisfy fundamental structural assumptions of contacts while leveraging tensor structures and smoothing constraints to make high-dimensional matrix estimation computationally feasible and statistically stable. We establish a link between multi-dimensional matrix stratification under structural constraints and the theory of contingency tables. This link enables us to address a challenging missing-data problem commonly encountered in real-world analyses, in which feature information on the contacts is unobserved. We benchmark the framework against existing methods through simulation studies and illustrate its practical utility on two real-world datasets: BICS (United States) and COVIMOD (Germany). Our models are implemented in an open-source Python package to facilitate adoption in the wider scientific community.
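One structural assumption commonly imposed on contact matrices is reciprocity. The total number of contacts from group i to group j must equal that from j to i, that is, N_i m_ij = N_j m_ji. The sketch below shows the common post-hoc symmetrisation of reported totals, with a hypothetical age-by-sex stratification flattened to a single index. It is not the paper's Bayesian tensor model. Treating reciprocity as one of the paper's structural assumptions is our assumption.

```python
import numpy as np

def enforce_reciprocity(m, pop):
    """Symmetrise total contacts so that pop[i] * m[i, j] == pop[j] * m[j, i].

    m[i, j]: mean number of contacts a person in stratum i reports with stratum j.
    pop[i]: population size of stratum i.
    """
    totals = pop[:, None] * m             # total contacts reported from i to j
    sym = 0.5 * (totals + totals.T)       # average the two reported directions
    return sym / pop[:, None]             # convert back to per-capita rates

# Hypothetical 2 age groups x 2 sexes, flattened to 4 strata (age-major order)
pop = np.array([300.0, 320.0, 500.0, 480.0])
m = np.array([
    [4.0, 3.0, 1.5, 1.0],
    [2.5, 4.5, 1.0, 1.2],
    [1.0, 0.8, 3.0, 2.0],
    [0.9, 1.1, 2.2, 3.5],
])
m_adj = enforce_reciprocity(m, pop)
T = pop[:, None] * m_adj                  # adjusted total-contact matrix
print(np.round(m_adj, 3))
```

Flattening the strata lets the same two-dimensional constraint apply to any number of stratifying features. A model-based approach would impose the constraint during estimation rather than afterwards.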

TinyBayes: Closed-Form Bayesian Inference via Jacobi Prior for Real-Time Image Classification on Edge Devices

2026-05-07T14:26:10Z

Cocoa (Theobroma cacao) is a critical cash crop for millions of smallholder farmers in West Africa, where Cocoa Swollen Shoot Virus Disease (CSSVD) and anthracnose cause devastating yield losses. Automated disease detection from leaf images is essential for early intervention, yet deploying such systems in resource-constrained settings demands models that are small, fast, and able to run without internet connectivity. Existing edge-deployable plant disease systems rely on end-to-end deep learning without uncertainty quantification, while Bayesian methods for edge devices focus on hardware-level inference architectures rather than agricultural applications. We bridge this gap with TinyBayes, the first framework to combine a closed-form Bayesian classifier with a mobile-grade computer vision pipeline for crop disease detection. Our pipeline uses YOLOv8-Nano (5.9 MB) for lesion localisation, MobileNetV3-Small (3.5 MB) for feature extraction, and, for classification, the Jacobi prior, a Bayesian method that provides closed-form, non-iterative estimators via projection. The Jacobi-DMR (Distributed Multinomial Regression) classifier adds only 13.5 KB to the pipeline, keeping the total model size under 9.5 MB. It achieves 78.7% accuracy on the Amini Cocoa Contamination Challenge dataset and enables end-to-end CPU inference in under 150 ms per image. We benchmark against seven classifiers (Random Forest, SVM, Ridge, Lasso, Elastic Net, XGBoost, and Jacobi-GP) and show that Jacobi-DMR offers the best trade-off between accuracy, model size, and inference speed for edge deployment. We also establish the asymptotic equivalence, consistency, asymptotic normality, and a bias correction for Jacobi-DMR. All data and code are available at https://github.com/shouvik-sardar/TinyBayes

Detecting Consumers' Financial Vulnerability using Open Banking Data: Evidence from UK Payday Loans

2026-05-07T13:47:53Z

This paper examines whether repeated payday loan use, commonly known as the debt trap, harms borrowers' financial wellbeing. Using Open Banking data from 1,815 UK borrowers observed between 2017 and 2018, we model borrowing intensity using a two-state hidden Markov model (HMM). The HMM outperforms single-regime alternatives and identifies two distinct borrowing patterns: occasional (low-intensity) and persistent (high-intensity) use. Each regime exhibits a characteristic relationship between borrowing intensity and wider transaction behaviour. We translate the decoded state sequence into a practical monitoring rule based on sustained high-intensity exposure. Defining a trigger event as 12 consecutive weeks in the high-intensity regime, we find that 36.4% of borrowers experience at least one such event. Among those who do, high-intensity weeks represent 17.8% of all borrower-week observations on average. Together, these results provide evidence for a persistent high-intensity borrowing pattern and demonstrate that it can serve as a simple, interpretable rule for monitoring prolonged reliance on payday loans.
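The monitoring rule amounts to a run-length check on each borrower's decoded state sequence. In this sketch the sequences are hypothetical, 1 marks the high-intensity regime, and "at least 12 consecutive weeks" is our reading of the trigger definition.

```python
def has_trigger(states, run_length=12, high=1):
    """True if `states` contains at least `run_length` consecutive weeks in `high`."""
    run = 0
    for s in states:
        run = run + 1 if s == high else 0   # extend or reset the current run
        if run >= run_length:
            return True
    return False

def high_share(states, high=1):
    # Fraction of a borrower's observed weeks spent in the high-intensity state
    return sum(s == high for s in states) / len(states)

# Hypothetical decoded sequences: 0 = occasional, 1 = persistent use
borrowers = {
    "a": [0] * 10 + [1] * 12 + [0] * 30,
    "b": [1, 1, 0] * 10,
    "c": [0] * 40,
}
triggered = [k for k, seq in borrowers.items() if has_trigger(seq)]
rate = len(triggered) / len(borrowers)
share = sum(high_share(borrowers[k]) for k in triggered) / len(triggered)
print(triggered, round(rate, 3), round(share, 3))
```

Borrower "b" spends two thirds of its weeks in the high state but never 12 in a row, so it does not trigger. The rule targets sustained rather than frequent exposure.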

Bayesian Fractional Polynomials for Optimal Dosage Estimation with Fish Nutrition Applications

2026-05-07T13:23:54Z

The problem of optimal dosage estimation arises in diverse scientific domains, from pharmacology and toxicology to aquaculture and environmental studies. Statistical modeling of nonlinear dose-response relationships is essential to quantify biological effects and determine response-optimal levels. This paper introduces a flexible Bayesian fractional polynomial (BFP) framework for modeling such relationships, allowing for model uncertainty quantification and robust prediction through Bayesian model averaging. Extensive simulations show that the proposed BFP approach estimates optimal dose levels accurately and significantly outperforms benchmark methods. The approach is demonstrated on real data from fish nutrient requirement experiments.
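This is a rough sketch of fractional-polynomial dose-response fitting with model averaging. It fits all FP1 and FP2 models over the conventional power set by least squares and averages them with BIC-based weights. That weighting only approximates the fully Bayesian fractional polynomial framework in the paper. The simulated dose-response curve, with its optimum at dose 3, is hypothetical.

```python
import itertools
import numpy as np

POWERS = [-2, -1, -0.5, 0, 0.5, 1, 2, 3]   # conventional FP power set

def fp_term(x, p):
    # Convention: power 0 denotes log(x)
    return np.log(x) if p == 0 else x ** p

def fp_design(x, powers):
    cols, prev = [np.ones_like(x)], None
    for p in powers:
        # A repeated power contributes x^p * log(x) as its second term
        cols.append(fp_term(x, p) * np.log(x) if p == prev else fp_term(x, p))
        prev = p
    return np.column_stack(cols)

def fit_fp_bma(x, y):
    # Least-squares fit of every FP1/FP2 model; BIC weights approximate BMA
    n = len(y)
    models = [(p,) for p in POWERS] + list(itertools.combinations_with_replacement(POWERS, 2))
    fits, bics = [], []
    for pw in models:
        X = fp_design(x, pw)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        bics.append(n * np.log(rss / n) + X.shape[1] * np.log(n))
        fits.append((pw, beta))
    bics = np.array(bics)
    w = np.exp(-0.5 * (bics - bics.min()))
    return fits, w / w.sum()

def averaged_curve(fits, weights, grid):
    return sum(wt * (fp_design(grid, pw) @ beta) for (pw, beta), wt in zip(fits, weights))

rng = np.random.default_rng(7)
dose = np.repeat(np.linspace(0.5, 6.0, 12), 5)          # doses must be positive
response = 2.0 + 1.5 * dose - 0.25 * dose ** 2 + rng.normal(0, 0.05, dose.size)
fits, weights = fit_fp_bma(dose, response)
grid = np.linspace(0.5, 6.0, 1101)
optimal_dose = grid[np.argmax(averaged_curve(fits, weights, grid))]
print(round(float(optimal_dose), 2))
```

The optimal dose is read off the model-averaged curve rather than a single selected model, so uncertainty about the functional form carries into the estimate.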

A Two-Level Plackett-Luce Model for preference modeling in smart mobility platforms

2026-05-07T13:23:23Z

The Plackett-Luce model is widely used to model choice and ranking probabilities in discrete choice settings. This paper introduces a novel two-level Plackett-Luce model combined with a multinomial logistic scheme, which provides the basis for the route choice module in a smart mobility platform. To this end, we develop Bayesian inference and prediction mechanisms that capture consumers' preferences for personalized route recommendations. The model is tested empirically, allowing for refinements and a discussion of its applicability. We also illustrate its practical relevance through several use cases, including relevant route selection, coordinated car pooling, incentive design, and synthetic data generation.
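For reference, the standard single-level Plackett-Luce model builds an ordering one position at a time. At each step it picks an item from those not yet ranked with probability proportional to its worth. The sketch computes that probability and draws rankings with the Gumbel-max trick. The route names and worths are hypothetical, and the paper's two-level structure and multinomial logistic layer are not represented.

```python
import itertools
import math
import random

def pl_probability(ranking, worth):
    """Probability of an ordering under Plackett-Luce with positive worths."""
    remaining = sum(worth[i] for i in ranking)
    prob = 1.0
    for item in ranking:
        prob *= worth[item] / remaining   # pick `item` among those still unranked
        remaining -= worth[item]
    return prob

def pl_sample(worth, rng):
    # Gumbel-max trick: sorting log-worth plus Gumbel noise gives a PL draw
    keys = {i: math.log(w) - math.log(-math.log(rng.random())) for i, w in worth.items()}
    return sorted(keys, key=keys.get, reverse=True)

worth = {"route_a": 3.0, "route_b": 2.0, "route_c": 1.0}   # hypothetical routes
total = sum(pl_probability(r, worth) for r in itertools.permutations(worth))
p_abc = pl_probability(("route_a", "route_b", "route_c"), worth)

rng = random.Random(0)
draws = [pl_sample(worth, rng) for _ in range(20_000)]
top_share = sum(d[0] == "route_a" for d in draws) / len(draws)
print(round(total, 10), round(p_abc, 4), round(top_share, 3))
```

The probability of a full ordering is 3/6 × 2/3 × 1/1 = 1/3 here. The sampled frequency of "route_a" ranked first should be close to its worth share of 0.5.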

Super-Level-Set Regression: Conditional Quantiles via Volume Minimization

2026-05-07T13:14:45Z

Constructing minimum-volume prediction regions that satisfy conditional coverage is a fundamental challenge in multivariate regression. Standard approaches rely on explicitly estimating the full conditional density and subsequently thresholding it. This two-step plug-in process is notoriously difficult, sensitive to estimation errors, and computationally expensive. One would like to instead optimize the region directly. Formulating a direct solution is challenging, however, because it requires minimizing a volume objective that is coupled with the conditional quantiles of the model's own estimation error. In this work, we address this challenge. We introduce super-level-set regression (SLS), a novel mathematical framework that successfully resolves this implicit coupling, allowing us to directly parameterize and optimize the geometric boundaries of the target conditional level sets. By bypassing full distribution estimation and leveraging flexible volume-preserving frontier functions, our approach natively captures complex, multimodal, and disjoint conditional structures end-to-end. Ultimately, SLS offers a new perspective on multivariate conditional quantile regression, replacing the restrictive assumptions of density-first methods with a direct geometric optimization strategy.
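To illustrate the target that SLS optimizes directly, the sketch below builds the density-first baseline the abstract describes, in an unconditional one-dimensional case. For a known density f, the minimum-volume region with coverage 1 - α is the super-level set {y : f(y) ≥ λ}, where λ is the α-quantile of f(Y). The bimodal mixture is an assumption chosen to produce a disjoint region; this is not an implementation of SLS.

```python
import numpy as np

MEANS, WEIGHTS, SD = (-2.0, 2.0), (0.5, 0.5), 0.6   # hypothetical bimodal mixture

def density(y):
    # Gaussian-mixture density; its high-density region splits into two intervals
    return sum(w * np.exp(-0.5 * ((y - m) / SD) ** 2) / (SD * np.sqrt(2 * np.pi))
               for w, m in zip(WEIGHTS, MEANS))

def sample(n, rng):
    comp = rng.choice(len(MEANS), size=n, p=WEIGHTS)
    return np.asarray(MEANS)[comp] + SD * rng.normal(size=n)

rng = np.random.default_rng(0)
alpha = 0.1
# Level lambda = alpha-quantile of f(Y), so {f >= lambda} has probability 1 - alpha
lam = np.quantile(density(sample(200_000, rng)), alpha)

grid = np.linspace(-6, 6, 24_001)
inside = density(grid) >= lam
volume = inside.sum() * (grid[1] - grid[0])
coverage = np.mean(density(sample(100_000, rng)) >= lam)
n_intervals = int(np.sum(np.diff(inside.astype(int)) == 1))
print(f"lambda={lam:.4f} volume={volume:.3f} coverage={coverage:.3f} intervals={n_intervals}")
```

The plug-in route needs the full density before it can threshold. In the conditional setting the abstract targets, that density must be estimated for every covariate value, and this is the step SLS is designed to bypass.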

Scalable model selection for count time series with structural breaks: application to solid-organ transplantation during and after COVID-19 in the USA and Italy

2026-05-07T12:53:08Z

Weekly healthcare activity data are typically non-negative counts with temporal dependence and occasional system-wide disruptions, settings in which Gaussian time-series models may be inadequate. Solid organ transplant (SOT) activity provides a representative case study of a count process affected by a large external shock. We analyse weekly SOT counts in the USA and Italy from 2014 to October 2024, stratified by donor type (deceased vs living) and organ (kidney and liver). We fit Poisson and negative-binomial count time-series models incorporating short-term dynamics, calendar effects (holiday weeks), and pre-specified pandemic-period level and/or slope indicators. Candidate specifications are screened within a pre-defined portfolio and selected using BIC within each training window. Forecasting performance is evaluated with an expanding-window design at horizons $h\in\{4,8,12\}$ weeks. Alongside RMSE, we report empirical coverage of nominal $95\%$ predictive intervals and interval widths to summarise calibration and forecast uncertainty. Across strata, selected models capture substantial pandemic-period deviations and varying post-period trajectories. Deceased-donor series are broadly consistent with a return towards pre-pandemic baselines in both countries, whereas the US living-donor series shows a more gradual convergence in this application. Within the explored model class and validation protocol, auxiliary covariates representing COVID burden and mortality add limited incremental predictive contribution beyond autoregressive and calendar components. Our analysis shows that donation time series behave largely as an unconditional process: auxiliary variables have a statistically negligible impact on donations. This allows attention to shift to more practical aspects of the ongoing challenges of the post-pandemic era, such as hospital overload and changes in public perception.
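This is a minimal sketch of BIC-based screening of pandemic-period level and slope indicators in a Poisson log-linear model. It is fitted by iteratively reweighted least squares on simulated weekly counts. The series, onset week, and effect sizes are hypothetical. The sketch omits the autoregressive terms, negative-binomial alternatives, and expanding-window forecasting described in the abstract.

```python
import math
import numpy as np

def fit_poisson(X, y, iters=50):
    # Iteratively reweighted least squares for a log-link Poisson GLM
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(iters):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # X^T W with W = diag(mu)
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    eta = X @ beta
    loglik = float(np.sum(y * eta - np.exp(eta)) - sum(math.lgamma(k + 1) for k in y))
    return beta, loglik

def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)

rng = np.random.default_rng(3)
t = np.arange(520)                        # ten years of weekly counts (hypothetical)
t0 = 320                                  # hypothetical pandemic onset week
holiday = (t % 52 == 51).astype(float)
post = (t >= t0).astype(float)            # level-shift indicator
ramp = post * (t - t0) / 52.0             # post-onset slope, in years
y = rng.poisson(np.exp(3.5 - 0.3 * holiday - 0.5 * post + 0.15 * ramp)).astype(float)

ones = np.ones(t.size)
candidates = {
    "calendar only": np.column_stack([ones, holiday]),
    "+ level": np.column_stack([ones, holiday, post]),
    "+ level + slope": np.column_stack([ones, holiday, post, ramp]),
}
scores = {name: bic(fit_poisson(X, y)[1], X.shape[1], len(y)) for name, X in candidates.items()}
best = min(scores, key=scores.get)
print({k: round(v, 1) for k, v in scores.items()}, "->", best)
```

Fitting each candidate on the same training window and keeping the lowest BIC mirrors the portfolio screening step.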

A Novel Exact Inference Approach for Log-Logistic Reliability Functions with Applications to Time-to-Event Data

2026-05-07T12:45:50Z

The log-logistic distribution is a flexible distribution that can model a wide range of failure patterns in electrical, electronic, and mechanical engineering, and it is often used in reliability inference. However, inference on the parameters and reliability function of the log-logistic distribution can be challenging, especially with small samples. In this paper, we propose a new inference framework based on least squares estimator-based generalized pivotal quantities (LSE-GPQ) for the parameters and reliability function of the log-logistic distribution, which provides better coverage in small-sample scenarios. We compare the performance of the proposed method with traditional methods, such as maximum likelihood estimation (MLE) and the parametric bootstrap, through simulation studies and real data applications.
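The generalized pivotal quantity idea can be sketched as follows, assuming Bernard's median-rank plotting positions for the least squares fit (a choice the abstract does not specify). If T is log-logistic with scale α and shape β, then log T = μ + σW with μ = log α, σ = 1/β, and W standard logistic. The least squares estimators (μ̂, σ̂) are location-scale equivariant. Hence σ̂/σ and (μ̂ - μ)/σ have parameter-free distributions, which can be simulated from standard logistic samples. Inverting them gives a GPQ for the reliability R(t) = 1/(1 + exp((log t - μ)/σ)). This is an illustrative construction, not the paper's exact procedure.

```python
import numpy as np

def lse_fit(sorted_logt):
    # Regress ordered log-times on standard-logistic quantiles of median ranks
    n = sorted_logt.size
    p = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
    z = np.log(p / (1 - p))
    slope, intercept = np.polyfit(z, sorted_logt, 1)
    return intercept, slope               # (mu_hat, sigma_hat) for log T

def gpq_reliability_ci(times, t, level=0.95, B=2000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.sort(np.log(times))
    mu_hat, sigma_hat = lse_fit(x)
    g = np.empty(B)
    for b in range(B):
        u = rng.uniform(size=x.size)
        mu_s, sigma_s = lse_fit(np.sort(np.log(u / (1 - u))))   # pivotal draws
        g_sigma = sigma_hat / sigma_s                           # GPQ for sigma
        g_mu = mu_hat - mu_s * g_sigma                          # GPQ for mu
        g[b] = 1.0 / (1.0 + np.exp((np.log(t) - g_mu) / g_sigma))
    lo, hi = np.quantile(g, [(1 - level) / 2, (1 + level) / 2])
    r_hat = 1.0 / (1.0 + np.exp((np.log(t) - mu_hat) / sigma_hat))
    return r_hat, lo, hi

rng = np.random.default_rng(11)
alpha, beta = 10.0, 3.0                   # hypothetical scale and shape
u = rng.uniform(size=15)                  # small sample, n = 15
times = alpha * (u / (1 - u)) ** (1 / beta)   # inverse CDF of the log-logistic
r_hat, lo, hi = gpq_reliability_ci(times, t=8.0)
print(f"R(8) estimate {r_hat:.3f}, 95% GPQ interval [{lo:.3f}, {hi:.3f}]")
```

Coverage of such intervals would be checked by repeating the whole procedure over many simulated samples, as the paper's simulation study does.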

Correcting heterogeneous diagnostic bias when developing clinical prediction models using causal hidden Markov models

2026-05-07T11:43:47Z

In routine care, individuals identified a priori as high-risk are usually tested for conditions more frequently. Protected attributes, such as sex or ethnicity, may also determine testing frequency. Such heterogeneous detection rates across a population induce label error, which causes systematic model error for specific groups and biases performance metrics during validation. This paper proposes a method to correct prediction models for the bias induced by differential diagnostic delay. We use a causal inference framework to define our target estimand: an individual's diagnosis probability in a counterfactual scenario where their diagnosis rate matches that of a reference group. We model the longitudinal process as a hidden Markov model, in which confirmatory test results are emissions from a latent progressive disease stage. We validate our approach in simulated data and apply it to a case study of chronic kidney disease prediction using electronic health records. In simulations, our method reduces prediction bias and improves calibration-in-the-large. The Observed:Expected ratio in the underdiagnosed group moves from 1.34 (standard deviation: 0.09), for a model developed without any correction for underdiagnosis bias, to 1.02 (0.09). Violations of assumptions in the simulation affected the estimation of model parameters, but the proposed approach nonetheless remained better calibrated than the standard model. In the clinical case study, we identify diabetes as the main driver of observability, with an odds ratio of 10.36 (95% confidence interval, 9.80-11.02) for the 6-month urine albumin-creatinine ratio testing rate. Using our approach to predict the counterfactual diagnostic rate in patients without diabetes, we improved the Observed:Expected ratio of a developed clinical prediction model from 1.55 (1.51-1.59) to 1.01 (0.98-1.04).
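A deliberately simplified discrete-time model shows how differential testing distorts observed diagnoses. The latent disease process has two progressive stages, and a diseased, undiagnosed person is diagnosed at a given step only if tested and the test is positive. Running the same model with the reference group's testing rate gives a counterfactual diagnosis probability. Calibration-in-the-large is summarized by the Observed:Expected ratio. Constant rates, perfect specificity, and all parameter values are hypothetical assumptions, not the paper's causal hidden Markov model.

```python
def diagnosis_probability(q, test_rate, sensitivity, steps):
    """P(diagnosed by `steps`) in a two-stage progressive model.

    Each step: healthy -> diseased with probability q (absorbing); then a test
    happens with probability `test_rate`, and a diseased, undiagnosed person
    tests positive (and is diagnosed) with probability `sensitivity`.
    """
    healthy, undiagnosed, diagnosed = 1.0, 0.0, 0.0
    for _ in range(steps):
        onset = healthy * q
        healthy -= onset
        undiagnosed += onset
        detected = undiagnosed * test_rate * sensitivity
        undiagnosed -= detected
        diagnosed += detected
    return diagnosed

def observed_expected(observed_events, predicted_probs):
    # Calibration-in-the-large: observed event count over expected event count
    return sum(observed_events) / sum(predicted_probs)

# Hypothetical rates: an under-tested group and a reference testing rate
under = diagnosis_probability(q=0.02, test_rate=0.15, sensitivity=0.9, steps=20)
counterfactual = diagnosis_probability(q=0.02, test_rate=0.6, sensitivity=0.9, steps=20)
print(round(under, 4), round(counterfactual, 4))
```

With identical disease risk, the under-tested group shows fewer observed diagnoses. A model trained on those labels would therefore underpredict relative to the counterfactual estimand.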

Towards Reliable LLM Evaluation: Correcting the Winner's Curse in Adaptive Benchmarking

2026-05-07T10:18:56Z

Adaptive prompt and program search makes LLM evaluation selection-sensitive. Once benchmark items are reused inside tuning, the observed winner's score need not estimate the fresh-data performance of the full tune-then-deploy procedure. We study inference for this procedure-level target under explicit tuning budgets. We propose SIREN, a selection-aware repeated-split reporting protocol that freezes the post-search shortlist, separates splitwise selection from held-out evaluation, and uses an item-level Gaussian multiplier bootstrap for uncertainty quantification. In a fixed-shortlist regime with smooth stabilized selection, the estimator admits a first-order item-level representation, and the bootstrap yields valid simultaneous inference on a finite budget grid. This supports confidence intervals for procedure-performance curves and pre-specified equal-budget and cross-budget comparisons. Controlled simulations and MMLU-Pro tuning experiments show that winner-based reporting can be optimistic and can change deployment conclusions, while SIREN remains close to the finite-sample reporting target.
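This is a simplified sketch of selection-aware reporting on a frozen shortlist. In each random split, the configuration that scores best on one half of the items is evaluated on the other half, and the held-out scores are averaged over splits. Treating each split's winner as fixed gives an item-level representation of the estimate, and a Gaussian multiplier bootstrap is applied to it. The simulated benchmark gives every configuration the same true accuracy, so any winner advantage is pure selection noise; it is hypothetical. The sketch ignores the stabilized selection and budget grid that SIREN uses.

```python
import numpy as np

def repeated_split_estimate(scores, n_splits, rng):
    """Select on one half of the items, evaluate the winner on the other half.

    scores: (n_items, n_configs) per-item scores for a frozen shortlist.
    Returns the split-averaged estimate and item-level contributions psi,
    treating each split's selected configuration as fixed.
    """
    n = scores.shape[0]
    contrib = np.zeros(n)
    for _ in range(n_splits):
        perm = rng.permutation(n)
        sel, held = perm[: n // 2], perm[n // 2:]
        winner = np.argmax(scores[sel].mean(axis=0))      # selection half
        contrib[held] += scores[held, winner] / held.size  # evaluation half
    contrib /= n_splits
    return contrib.sum(), n * contrib     # estimate equals psi.mean()

def multiplier_bootstrap_ci(psi, level=0.95, B=4000, rng=None):
    # Gaussian multiplier bootstrap for the item-level mean of psi
    if rng is None:
        rng = np.random.default_rng()
    xi = rng.standard_normal((B, psi.size))
    draws = xi @ (psi - psi.mean()) / psi.size
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return psi.mean() + lo, psi.mean() + hi

rng = np.random.default_rng(5)
# Hypothetical benchmark: 30 configurations, all with true accuracy 0.6
scores = (rng.uniform(size=(1000, 30)) < 0.6).astype(float)
naive = scores.mean(axis=0).max()         # winner's observed score on reused items
est, psi = repeated_split_estimate(scores, n_splits=50, rng=rng)
lo, hi = multiplier_bootstrap_ci(psi, rng=rng)
print(f"naive winner {naive:.3f}  split estimate {est:.3f}  CI [{lo:.3f}, {hi:.3f}]")
```

The naive winner score is the maximum of 30 noisy means and tends to exceed 0.6. The held-out estimate targets the performance of the whole select-then-deploy procedure.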

Errors-in-variables regression for dependent data with estimated error covariance matrix: To prewhiten or not?

2026-05-07T10:16:33Z

We consider statistical inference for errors-in-variables regression models with dependent observations when the error covariance matrix is high-dimensional. It is tempting to prewhiten the model and data even in the presence of measurement errors, since prewhitening leads to efficient weighted least squares estimation; this is the practice in the optimal fingerprinting approach in climate change studies. However, it is unclear to what extent the prewhitened estimator improves on the estimation efficiency of the unprewhitened estimator in errors-in-variables regression. We compare the prewhitened and unprewhitened estimators in terms of estimation efficiency and computational cost. We show that prewhitening does not necessarily improve the estimation efficiency of its unprewhitened counterpart. It does, however, require a larger ensemble for estimating the error covariance matrix to ensure asymptotic normality, and hence much more computational resources.
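The comparison can be made concrete with a single-predictor errors-in-variables model fitted by total least squares (TLS), the estimator used in optimal fingerprinting. The fit is done both with and without prewhitening by the Cholesky factor of an ensemble-estimated noise covariance. The AR(1) noise, sizes, and signal are hypothetical. The sketch also shows that the sample covariance from m ensemble members has rank at most m, so it is singular and prewhitening is undefined whenever m < n.

```python
import numpy as np

def tls_slope(x, y):
    # TLS for y = beta * x with equal noise in x and y: the right singular
    # vector for the smallest singular value is proportional to (beta, -1)
    _, _, vt = np.linalg.svd(np.column_stack([x, y]), full_matrices=False)
    v = vt[-1]
    return -v[0] / v[1]

def prewhitened_tls_slope(x, y, cov_hat):
    # Transform by the inverse Cholesky factor of the estimated covariance
    L = np.linalg.cholesky(cov_hat)
    return tls_slope(np.linalg.solve(L, x), np.linalg.solve(L, y))

rng = np.random.default_rng(2)
n, m, rho, beta = 100, 500, 0.6, 1.0                 # hypothetical settings
idx = np.arange(n)
cov = rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)   # AR(1)
chol = np.linalg.cholesky(cov)
signal = rng.normal(0.0, 3.0, n)                     # noise-free fingerprint
x = signal + chol @ rng.standard_normal(n)           # fingerprint with error
y = beta * signal + chol @ rng.standard_normal(n)    # observations
ensemble = (chol @ rng.standard_normal((n, m))).T    # m control-run replicates
cov_hat = ensemble.T @ ensemble / m                  # zero-mean sample covariance

b_raw = tls_slope(x, y)
b_white = prewhitened_tls_slope(x, y, cov_hat)
print(f"unprewhitened {b_raw:.3f}  prewhitened {b_white:.3f}")
```

Repeating the experiment over many replicates, and shrinking m towards n, would show the efficiency and ensemble-size trade-off that the abstract analyses.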

Generative AI-Based Monte Carlo Simulation for Method Evaluation Using Synthetic Multilevel Data

2026-05-07T06:45:17Z

AI-generated synthetic data has recently been used to support realistic Monte Carlo simulations. However, guidance on generating data with multilevel structures, and on designing simulations based on such data, is limited. This study proposes a general framework for AI-based simulation studies to evaluate the predictive performance and parameter recovery of quantitative methods, specifically using multilevel data commonly observed in the social sciences. Our proposed six-stage workflow consists of (i) specifying a method and real data, (ii) training Generative AI with real data, (iii) assessing synthetic data quality, (iv) designing and conducting simulations, (v) evaluating method performance, and (vi) checking robustness. To enhance fidelity in multilevel data generation, we also introduce targeted modifications to diffusion models and Generative Adversarial Networks (GANs). Furthermore, we develop a systematic quality evaluation framework that assesses both within-table and between-table fidelity. We also discuss how AI-based simulation designs should differ depending on whether the simulation's objective is predictive performance or parameter recovery. Finally, using empirical multilevel data and multilevel modeling methods, we demonstrate the utility of the proposed AI-based simulation framework. Unlike traditional simulation studies based on arbitrary simulated scenarios, this approach leads to more accurate and honest evaluations of how quantitative methods perform in the real world.
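One possible between-level fidelity check is the intraclass correlation ICC(1) from a one-way ANOVA, compared between real and synthetic data. This is our example, not necessarily part of the paper's framework. A generator that loses the clustering structure will show a markedly lower ICC. Both datasets below are simulated stand-ins with balanced clusters.

```python
import numpy as np

def icc1(values, groups):
    """One-way ANOVA ICC(1) for a balanced design (equal cluster sizes)."""
    labels = np.unique(groups)
    k = values.size // labels.size                       # cluster size
    means = np.array([values[groups == g].mean() for g in labels])
    msb = k * np.sum((means - values.mean()) ** 2) / (labels.size - 1)
    msw = sum(np.sum((values[groups == g] - mg) ** 2) for g, mg in zip(labels, means))
    msw /= values.size - labels.size
    return (msb - msw) / (msb + (k - 1) * msw)

def simulate(n_groups, k, tau, sigma, rng):
    # Random-intercept data: y_ij = u_j + e_ij, var(u) = tau^2, var(e) = sigma^2
    groups = np.repeat(np.arange(n_groups), k)
    y = rng.normal(0, tau, n_groups)[groups] + rng.normal(0, sigma, groups.size)
    return y, groups

rng = np.random.default_rng(8)
real_y, real_g = simulate(200, 10, 1.0, 1.0, rng)     # stand-in for real data
synth_y, synth_g = simulate(200, 10, 0.3, 1.0, rng)   # generator that lost clustering
icc_real, icc_synth = icc1(real_y, real_g), icc1(synth_y, synth_g)
print(round(icc_real, 3), round(icc_synth, 3))
```

The population ICCs here are 0.5 and about 0.08. A gap of that size would flag that the synthetic data understate cluster-level variation, which matters when the simulation's aim is parameter recovery for multilevel models.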