http://arxiv.org/abs/2602.15562v4 Either a Confidence Interval Covers, or It Doesn't (Or Does It?): A Model-Based View of Ex-Post Coverage Probability 2026-03-17T19:34:31Z

In Neyman's original formulation, a 1-alpha confidence interval procedure is justified by its long-run coverage properties, and a single realized interval is to be described only by the slogan that it either covers the parameter or it does not. On this view, post-data probability statements about the coverage of an individual interval are taken to be conceptually out of bounds. In this paper, I present two kinds of arguments against treating that "either-or" reading as the only legitimate interpretation of confidence. The first is informal, via a set of thought experiments in which the same joint probability model is used to compute both forward-looking and backward-looking probabilities for occurred-but-unobserved events. The second is more formal, recasting the standard confidence-interval construction in terms of infinite sequences of trials and their associated 0/1 coverage indicators. In that representation, the design-level coverage probability 1-alpha and the degenerate conditional probabilities given the full data appear simply as different conditioning levels of the same model. I argue that a strict behavioristic reading that privileges only the latter is in tension with the very mathematical machinery used to define long-run error rates. I then sketch an alternative view of confidence as a predictive probability (or forecast) about the coverage indicator, together with a simple normative rule for when intermediate probabilities for single coverage events should be allowed. Keywords: confidence intervals; coverage probability; frequentist inference; single-case probability; predictive probability; Neyman. Disclaimer: The findings and conclusions in this report are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
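
The two conditioning levels contrasted in this abstract can be illustrated with a small simulation. This is a minimal sketch, not the author's construction: it assumes a normal model with known standard deviation and a textbook z-interval, and the function name `coverage_indicators` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def coverage_indicators(mu=0.0, sigma=1.0, n=20, alpha=0.05, trials=5000, seed=0):
    """Return one 0/1 coverage indicator per simulated trial (assumed setup)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided standard normal quantile
    half_width = z * sigma / n ** 0.5        # known-sigma z-interval half width
    indicators = []
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # The interval xbar +/- half_width covers mu iff |xbar - mu| <= half_width.
        indicators.append(1 if abs(xbar - mu) <= half_width else 0)
    return indicators


indicators = coverage_indicators()
# Design-level view: the share of covering intervals is close to 1 - alpha.
design_level_coverage = sum(indicators) / len(indicators)
# Full-data view: every realized indicator is exactly 0 or 1.
realized_values = set(indicators)
print(round(design_level_coverage, 3), realized_values)
```

Averaging the indicators recovers the design-level probability, while conditioning on the full data (including the true mean) leaves only the degenerate values 0 and 1. Both summaries come from the same simulated model.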

2026-02-17T13:22:11Z Scott Lee http://arxiv.org/abs/2603.16317v1 Balance and Fairness through Multicalibration in Nonlife Insurance Pricing 2026-03-17T09:50:27Z

Autocalibration is known to be an important requirement for insurance premiums since it guarantees that premium income balances corresponding claims, on average, not only at portfolio level but also inside each group paying similar premiums. Also, fairness has become a major concern because unfair treatment may expose insurers to lawsuits or reputational damage. Translating fairness into conditional mean independence allows actuaries to combine autocalibration and fairness into the multicalibration concept. This paper studies the properties of multicalibration in an insurance context and proposes practical ways to implement it, through local regression or bias correction within groups including credibility adjustments. A case study based on motor insurance data illustrates the relevance of multicalibration in insurance pricing.
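
As a rough illustration of bias correction within groups, the sketch below rescales premiums so that premium income equals claims inside each cell formed by a premium bin and a group label. It is an assumption-laden toy, not the paper's procedure: the function `balance_within_cells`, the equal-size binning, and the data are hypothetical, and no credibility adjustment is applied.

```python
from collections import defaultdict


def balance_within_cells(premiums, claims, groups, n_bins=2):
    """Rescale premiums so each (premium bin, group) cell balances its claims."""
    n = len(premiums)
    # Rank policies by premium and cut the ranking into n_bins equal-sized bins.
    order = sorted(range(n), key=lambda i: premiums[i])
    bin_of = {i: rank * n_bins // n for rank, i in enumerate(order)}
    premium_total = defaultdict(float)
    claim_total = defaultdict(float)
    for i in range(n):
        cell = (bin_of[i], groups[i])
        premium_total[cell] += premiums[i]
        claim_total[cell] += claims[i]
    # Multiply each premium by its cell's claims-to-premium ratio
    # (assumes every cell has a positive premium total).
    corrected = []
    for i in range(n):
        cell = (bin_of[i], groups[i])
        corrected.append(premiums[i] * claim_total[cell] / premium_total[cell])
    return corrected, bin_of


# Hypothetical toy portfolio: premiums, observed claims, and a group label.
premiums = [100.0, 110.0, 120.0, 130.0, 300.0, 310.0, 320.0, 330.0]
claims = [0.0, 250.0, 200.0, 0.0, 900.0, 0.0, 0.0, 400.0]
groups = ["a", "b", "a", "b", "a", "b", "a", "b"]
corrected, bin_of = balance_within_cells(premiums, claims, groups)
```

With cells this small the correction ratios are very noisy, which is one reason a credibility-style shrinkage toward the portfolio-level ratio would normally be considered.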

2026-03-17T09:50:27Z Michel Denuit Marie Michaelides Julien Trufin http://arxiv.org/abs/2603.15215v1 Deepest voting on rankings 2026-03-16T12:52:49Z

This article presents a unified framework for ranking-based voting rules based on depth functions on permutations, as a counterpart to the deepest voting rules on evaluation introduced in Aubin et al. [2022]. It introduces the notion of depth functions on continuous sets and on permutation sets, the latter using the notion of Fréchet means. Deepest voting procedures are then formally defined, and some classical voting rules are expressed as deepest voting procedures, using a wide variety of distances on the set of permutations. Links are drawn between the mathematical properties of the depth functions and behaviours of the voting rule such as Neutrality, Anonymity, Universality, and the Condorcet winner/loser property, among others.
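
One familiar instance of a distance-based rule on permutations is the Kemeny rule, which selects a ranking minimizing the total Kendall tau distance to the ballots, i.e. a Fréchet mean for that distance. The sketch below treats "depth" as the negative total distance, which is an assumption made for illustration and not necessarily the paper's definition; the function names and ballots are hypothetical, and the brute-force search is only feasible for a handful of candidates.

```python
from itertools import combinations, permutations


def kendall_tau(r1, r2):
    """Number of candidate pairs ordered differently by the two rankings."""
    pos1 = {c: i for i, c in enumerate(r1)}
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )


def deepest_ranking(ballots):
    """Brute-force minimizer of the total Kendall tau distance to the ballots."""
    candidates = sorted(ballots[0])
    return min(
        permutations(candidates),
        key=lambda r: sum(kendall_tau(r, b) for b in ballots),
    )


ballots = [("A", "B", "C"), ("A", "B", "C"), ("B", "A", "C"), ("C", "A", "B")]
winner = deepest_ranking(ballots)
print(winner)
```

Swapping `kendall_tau` for another distance on permutations changes which ranking is deepest, which is the sense in which different distances give rise to different voting rules.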

2026-03-16T12:52:49Z Jean-Baptiste Aubin DEEP, ICJ, PSPM, INSA Lyon Antoine Rolland ERIC, UL2 Ioana Gavra IRMAR, UR2 Irène Gannaz G-SCOP_GROG, G-SCOP, Grenoble INP Jacques Anderson Kouassi G-SCOP_GROG, G-SCOP, Grenoble INP http://arxiv.org/abs/2508.06431v2 Nonparametric Learning Non-Gaussian Quantum States of Continuous Variable Systems 2026-03-16T06:26:11Z

Continuous-variable quantum systems are foundational to quantum computation, communication, and sensing. While traditional representations using wave functions or density matrices are often impractical, the tomographic picture of quantum mechanics provides an accessible alternative by associating quantum states with classical probability distribution functions called tomograms. Despite their advantages, including compatibility with classical statistical methods, tomographic methods remain underutilized due to a lack of robust estimation techniques. This work addresses this gap by introducing a non-parametric kernel quantum state estimation (KQSE) framework for reconstructing quantum states and their trace characteristics from noisy data, without prior knowledge of the state. In contrast to existing methods, KQSE yields estimates of the density matrix in various bases, as well as trace quantities such as purity, higher moments, overlap, and trace distance, with a near-optimal convergence rate of $\tilde{O}\bigl(T^{-1}\bigr)$, where $T$ is the total number of measurements. KQSE is robust for multimodal, non-Gaussian states, making it particularly well suited for characterizing states essential for quantum science.

2025-08-08T16:19:58Z Liubov A. Markovich Xiaoyu Liu Jordi Tura http://arxiv.org/abs/2603.11240v2 Statistical Methodology Groups in the Pharmaceutical Industry 2026-03-13T14:24:17Z

Research and Development is the largest budget item in the pharmaceutical industry, with clinical trials being a critical, yet costly and time-consuming component to inform decisions. Beyond drug efficacy, the probability of success and efficiency of research and development are highly dependent on the approaches used for designing, analyzing, and interpreting clinical trials. Deep understanding of statistical methodology and quantitative approaches is therefore essential. Consequently, dedicated methodology groups have emerged in mid-size and large pharmaceutical companies and CROs. Their remit is to lead the conception and implementation of innovative quantitative methodologies in order to improve drug development, often by addressing complexities or offering more efficient designs. To achieve this, they collaborate internally and externally (e.g., with academics, regulators) to identify common challenges and tear down silos in order to invest in methods with the highest impact on efficiency and value to the portfolio. Given the immense financial stakes of drug development, where delays carry massive implications, these groups represent a critical strategic investment. However, to realize this business impact, statistical innovations must be rigorously validated and seamlessly integrated. This manuscript explores the setup, remit, and value of dedicated methodology groups, alongside the critical organizational considerations and success factors required to maximize their impact on the speed, efficiency, and probability of success.

2026-03-11T19:05:51Z 39 pages, 2 figures, 1 table Jenny Devenport Tobias Mielke Mouna Akacha Kaspar Rufibach Alex Ocampo Vivian Lanius Marc Vandemeulebroecke Philip Hougaard Pierre Collin David Wright Jurgen Hummel Cornelia Ursula Kunz Mike Krams http://arxiv.org/abs/2603.10866v1 Beyond Reproducible Research: Building a Formal Representation of a Data Analysis 2026-03-11T15:18:22Z

Data analyses are often constructed in an imperative manner, where commands representing actions taken on the data are issued sequentially. The publication of these commands, along with the data, is essential to the reproducibility of the analysis by others. However, simply presenting the code and the results of running the code can hide important details about the data analyst's premises, expectations, and assumptions about the data. Understanding this analysis reasoning can be critical to evaluating the quality of an analysis and for suggesting possible improvements. We argue that a formal representation of a data analysis that externalizes its logical construction offers more useful information for statically illustrating an analyst's reasoning. Such a formal representation would allow for the evaluation of some aspects of a data analysis without the need for the data, the visualization of the logical connections leading to a conclusion, and the ability to assess the sensitivity of an analyst's assumptions to unexpected features in the data. In this paper we describe an implementation of this formal representation and how it might be applied to some common data analysis tasks.

2026-03-11T15:18:22Z Roger D. Peng http://arxiv.org/abs/2603.20254v1 AI Detectors Fail Diverse Student Populations: A Mathematical Framing of Structural Detection Limits 2026-03-11T06:38:31Z

Student experiences and empirical studies report that "black box" AI text detectors produce high false positive rates with disproportionate errors against certain student populations, yet theoretical analyses typically model detection as a test between two known distributions for human and AI prose. This framing omits the structural feature of university assessment whereby an assessor generally does not know the individual student's writing distribution, making the null hypothesis composite. Standard application of the variational characterisation of total variation distance to this composite null yields trade-off bounds showing that any text-only, one-shot detector with useful power must produce false accusations at a rate governed by the distributional overlap between student writing and AI output. This is a constraint arising from population diversity that is logically independent of AI model quality and cannot be overcome by better detector engineering or technology. A subgroup mixture bound connects these quantities to observable demographic groups, providing a theoretical basis for the disparate impact patterns documented empirically. We propose suggestions to improve policy and practice, and argue that detection scores should not serve as sole evidence in misconduct proceedings.
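
The variational characterisation of total variation distance can be checked numerically in the simplest case of two known discrete distributions. The sketch below covers only that simple-versus-simple case, not the composite null analysed in the paper; the three writing "styles" and their probabilities are hypothetical. For any set of outcomes flagged as AI, the detector's power minus its false positive rate cannot exceed the total variation distance, so the false positive rate is at least the power minus that distance.

```python
from itertools import combinations

# Hypothetical distributions over three coarse writing styles.
human = {"plain": 0.5, "formal": 0.3, "formulaic": 0.2}
ai = {"plain": 0.1, "formal": 0.3, "formulaic": 0.6}


def total_variation(p, q):
    """Total variation distance between two distributions on the same outcomes."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)


def detectors(outcomes):
    """Every deterministic detector, i.e. every set of outcomes flagged as AI."""
    items = list(outcomes)
    for size in range(len(items) + 1):
        for flagged in combinations(items, size):
            yield set(flagged)


tv = total_variation(human, ai)
gaps = []
for flagged in detectors(human):
    false_positive_rate = sum(human[k] for k in flagged)  # human text flagged
    power = sum(ai[k] for k in flagged)                   # AI text flagged
    gaps.append(power - false_positive_rate)
best_gap = max(gaps)
print(tv, best_gap)
```

The largest achievable gap equals the total variation distance, so the more a student's writing distribution overlaps with AI output, the less room any detector has to separate the two.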

2026-03-11T06:38:31Z Nathan Garland http://arxiv.org/abs/2603.09318v1 Anomaly detection using surprisals 2026-03-10T07:50:22Z

Anomaly detection methods are widely used but often rely on ad hoc rules or strong assumptions, and they often focus on tail events, missing "inlier" anomalies that occur in low-density gaps between modes. We propose a unified framework that defines an anomaly as an observation with unusually low probability under a (possibly misspecified) model. For each observation we compute its surprisal (the negative log generalized density) and define an anomaly score as the probability of a surprisal at least as large as that observed. This reduces anomaly detection for complex univariate or multivariate data to estimating the upper tail of a univariate surprisal distribution. We develop two model-robust estimators of these tail probabilities: an empirical estimator based on the observed surprisal distribution and an extreme-value estimator that fits a Generalized Pareto Distribution above a high threshold. For the empirical method we give conditions under which tail ordering is preserved and derive finite-sample confidence guarantees via the Dvoretzky–Kiefer–Wolfowitz inequality. For the GPD method we establish broad tail conditions ensuring classical extreme-value behavior. Simulations and applications to French mortality and Test-cricket data show the approach remains effective under substantial model misspecification.
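
The empirical version of the anomaly score can be sketched in a few lines. The example below assumes a known two-component normal mixture as the model density, which is a choice made here for illustration and not taken from the paper; the function names and the simulated data are hypothetical. A point at 0 lies between the two modes, so it is not a tail value, yet its surprisal is the largest in the sample.

```python
import math
import random
from statistics import NormalDist

# Assumed model: equal mixture of N(-5, 1) and N(5, 1).
LEFT, RIGHT = NormalDist(-5.0, 1.0), NormalDist(5.0, 1.0)


def surprisal(y):
    """Negative log density of y under the assumed mixture model."""
    density = 0.5 * LEFT.pdf(y) + 0.5 * RIGHT.pdf(y)
    return -math.log(density)


def anomaly_scores(values):
    """Empirical share of surprisals at least as large as each observed one."""
    s = [surprisal(y) for y in values]
    n = len(s)
    return [sum(1 for t in s if t >= si) / n for si in s]


rng = random.Random(1)
data = [rng.gauss(-5.0, 1.0) for _ in range(250)]
data += [rng.gauss(5.0, 1.0) for _ in range(250)]
data.append(0.0)  # an "inlier" in the low-density gap between the modes
scores = anomaly_scores(data)
inlier_score = scores[-1]
print(inlier_score)
```

Because the score depends on the data only through the one-dimensional surprisal, the same function would apply unchanged to multivariate observations once a density is available for them.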

2026-03-10T07:50:22Z Rob J Hyndman David T. Frazier http://arxiv.org/abs/2603.07742v1 A Cylindrical Galton Board at the Galton Board's 150th Anniversary 2026-03-08T17:32:41Z

The Galton board is a well-known device for showing how repeated Bernoulli trials on a triangular lattice produce an approximately normal distribution. Marking the 150th anniversary of Galton's 1875 construction, this paper revisits the original apparatus and extends it to a cylindrical setting in which the peg lattice is wrapped around a cylinder. This creates angular periodicity and leads to height-dependent behaviour that does not arise in the classical planar design. The cylindrical form links Galton's demonstration of variation and the emergence of the normal distribution with modern ideas in circular statistics, giving a physical realisation of binomial random walks on a circular-linear product space. We distinguish cases where the wrapped lattice covers only an arc from those that span the full circumference, and show how these geometries lead to wrapped binomial and wrapped normal behaviour. We describe the construction of our physical model, discuss practical issues for replication, and analyse its statistical and pedagogical properties as a modern reinterpretation of Galton's work.
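
A wrapped binomial distribution can be computed directly by folding the ordinary binomial probabilities around the circle. The sketch below is an idealisation that assumes the landing bin is the number of rightward deflections taken modulo the number of bins around the circumference; the function name and the row and bin counts are hypothetical and do not describe the authors' physical model.

```python
from math import comb


def wrapped_binomial_pmf(n_rows, n_bins, p=0.5):
    """Binomial(n_rows, p) probabilities folded onto n_bins circular bins."""
    pmf = [0.0] * n_bins
    for k in range(n_rows + 1):
        # Bin index is the count of rightward deflections, modulo the circle.
        pmf[k % n_bins] += comb(n_rows, k) * p ** k * (1 - p) ** (n_rows - k)
    return pmf


arc = wrapped_binomial_pmf(n_rows=6, n_bins=12)    # lattice covers only an arc
full = wrapped_binomial_pmf(n_rows=30, n_bins=12)  # lattice wraps past a full turn
arc_spread = max(arc) - min(arc)
full_spread = max(full) - min(full)
print(arc_spread, full_spread)
```

With few rows the lattice never wraps and the result is the ordinary binomial on part of the circle. With more rows the tails overlap and the distribution flattens toward uniform, which is one way to see the dependence on height.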

2026-03-08T17:32:41Z 18 pages, 8 Figures Kanti V. Mardia Colin Goodall John Rubbo http://arxiv.org/abs/2511.01040v3 From Structural Equation Modeling to Targeted Learning: A Tutorial Introduction to Targeted Maximum Likelihood Estimation for SEM Researchers 2026-03-07T02:48:07Z

Structural equation modeling (SEM) and path analysis have long been central tools for studying complex causal relationships in the social and behavioral sciences, yet their reliance on parametric assumptions can lead to biased inference under model misspecification. To bridge traditional SEM with modern causal machine learning, this paper introduces targeted maximum likelihood estimation (TMLE), a doubly robust framework built on nonparametric structural equation modeling. We formally connect TMLE to classical path analysis, showing that standard SEM estimators arise as special cases of TMLE under restrictive parametric specifications and that both approaches can estimate common causal quantities such as direct, indirect, and total effects. Through simulation studies under both correctly specified and misspecified models, we demonstrate that while the two methods perform similarly when models are correctly specified, TMLE consistently achieves lower bias, reduced mean squared error, and improved confidence interval coverage when parametric assumptions are violated. We further illustrate these differences using an applied mediation analysis examining the role of poverty in access to high school education, where path analysis suggests a significant direct effect, whereas TMLE does not, highlighting the practical consequences of robustness in causal inference. Overall, this tutorial offers SEM researchers a conceptual and practical introduction to targeted learning, providing guidance on leveraging TMLE to enhance causal analysis beyond traditional parametric frameworks.

2025-11-02T18:35:42Z Junjie Ma Xiaoya Zhang Guangye He Yuting Han Ting Ge Feng Ji http://arxiv.org/abs/2603.06871v1 Adaptive Bi-Level Variable Selection of Conditional Main Effects for Generalized Linear Models 2026-03-06T20:44:43Z

Understanding interaction effects among variables is important for regression modeling in various applications. The conventional approach of quantifying interactions as the product of variables often lacks clear interpretability, especially in complex systems. The concept of conditional main effects (CME) provides a more intuitive and interpretable framework for capturing interaction effects by quantifying the effect of one variable conditional on the level of another. A recent method called cmenet further considered the bi-level selection of CMEs by leveraging their natural grouping structure (e.g., sibling and cousin groups) through penalization. However, there are several limitations in the cmenet method, including the coupling ability of penalties for within-group CMEs, lack of adaptiveness for between-group penalties, and restriction to linear models with continuous responses. To overcome these limitations, we propose an adaptive cmenet method for CME selection under the generalized linear model (GLM) framework. The proposed method considers a penalized likelihood approach with adaptive weights to enable effective bi-level variable selection, improving both between-group and within-group selection. An efficient algorithm for parameter estimation is also developed by employing an iteratively reweighted least squares procedure. The performance of the proposed method is evaluated by both simulation studies and real-data studies in gene association analysis.
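
For two-level factors, a conditional main effect can be written as a model-matrix column. The sketch below assumes the usual ±1 coding from the CME literature, where the column for A given B at a level equals A on runs where B takes that level and 0 elsewhere; the helper `cme` and the 2x2 design are hypothetical illustrations, and no penalized selection is performed.

```python
from itertools import product


def cme(a, b, level):
    """Column for the effect of A conditional on B at `level` (+1 or -1)."""
    return [ai if bi == level else 0 for ai, bi in zip(a, b)]


# Full 2x2 factorial design in +/-1 coding.
runs = list(product([-1, 1], repeat=2))
A = [r[0] for r in runs]
B = [r[1] for r in runs]
AB = [x * y for x, y in zip(A, B)]  # conventional product interaction

A_given_B_plus = cme(A, B, +1)
A_given_B_minus = cme(A, B, -1)
print(A_given_B_plus, A_given_B_minus)
```

Under this coding the two conditional columns are (A + AB)/2 and (A - AB)/2, so they carry the same information as the main effect and product interaction but attach it to a specific level of B.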

2026-03-06T20:44:43Z Kexin Xie Xinwei Deng 10.1080/00401706.2026.2643213 http://arxiv.org/abs/2603.06820v1 Hippocratic Utility 2026-03-06T19:29:57Z

A utility function has been proposed that values more those lives that are saved by not imposing a harmful treatment and values less those lives that could be saved by treating people who would otherwise die. I do not dispute the ethical motivation behind this kind of asymmetry. However, as my example illustrates, the scope of applicability of such a decision criterion may be limited.

2026-03-06T19:29:57Z Tomasz Strzalecki http://arxiv.org/abs/2603.06328v1 Variable selection in linear mixed model meta-regression with suspected interaction effects -- How can tree-based methods help? 2026-03-06T14:40:23Z

Detecting interaction effects (IEs) in meta-regression is challenging, especially when few studies are available and many plausible interactions are considered. In many meta-analyses, interpretability is essential, which limits the use of complex machine learning methods. Tree-based approaches offer a potentially useful compromise, but their role in meta-regression with random effects is not yet well understood. This paper examines how traditional linear and tree-based methods can support variable selection for IEs in random effects meta-regression. We compare test-based and information-criterion-based linear selection procedures with meta-CART approaches. These include fixed effect and random effects trees and their stability-selected ensemble variants. All methods are evaluated using a real-world meta-analytic dataset and a plasmode simulation study. The data-generating process assumes linear IEs and is complemented by settings with nonlinear interactions. Our results show that under strictly linear interactions, linear selection methods perform as expected and achieve superior performance for IE detection. Tree-based methods are more conservative when the number of studies is small, but become competitive as sample size increases, particularly the stability-selected variants. When IEs deviate from strict linearity, even in simple ways, the performance of linear methods deteriorates, whereas tree-based approaches, especially stability-selected fixed effect trees, provide a more robust alternative. Overall, stability-selected random effects trees are useful complementary tools for IE detection in applied meta-regression, particularly for metric covariates. They are well suited for pre-selection and sensitivity analyses, and selection frequency patterns in tree ensembles can help reveal structural patterns in the data.

2026-03-06T14:40:23Z 25 pages, 5 figures. Supplementary Materials at https://doi.org/10.17877/TUDODATA-2026-3CDZSS Jan-Bernd Igelmann Paula Lorenz Markus Pauly http://arxiv.org/abs/2603.06072v1 A Hierarchical Bayesian Dynamic Game for Competitive Inventory and Pricing under Incomplete Information: Learning, Credible Risk, and Equilibrium 2026-03-06T09:24:49Z

We develop a hierarchical Bayesian dynamic game for competitive inventory and pricing under incomplete information. Two firms repeatedly choose order quantities and prices while facing two layers of uncertainty: unknown market demand and private rival characteristics. The framework combines Bayesian learning about demand and substitution with strategic belief updating about rival types. To make decisions robust to posterior uncertainty, we introduce a credible-risk criterion that rewards expected future profit while penalizing posterior predictive dispersion. This yields a conservative equilibrium concept in which firms learn, compete, and adapt simultaneously. The paper provides the model formulation, information structure, posterior updating mechanism, equilibrium definition, and a computational strategy based on belief-state dynamic programming. A simulation study shows that Bayesian learning is crucial for strong performance and that the credible-risk rule is especially effective as an operational regularizer under uncertainty. A real-data illustration on a high-dimensional protein-expression dataset demonstrates that the same uncertainty-aware Bayesian principle can produce biologically interpretable subgroup and latent-state findings. The proposed framework offers a unified bridge between Bayesian game theory and operations research, with practical relevance for competitive decision-making in uncertain and information-limited environments.

2026-03-06T09:24:49Z Debashis Chatterjee http://arxiv.org/abs/2603.05885v1 Bayesian Linear Programming under Learned Uncertainty: Posterior Feasibility Guarantees, Scenario Certification, and Applications 2026-03-06T04:07:44Z

Linear programming is widely used for decision-making in science, engineering, and operations research, yet in many modern applications the coefficients entering the constraints and objective are not known exactly and must be learned from data. Classical stochastic and robust optimization offer two influential paradigms for handling such uncertainty, but they typically treat the underlying uncertainty description as given and do not directly integrate priors, their updating to posteriors, and the resulting guarantees. This paper develops a Bayesian framework for linear programming in which uncertain quantities are modeled probabilistically, updated through observed data, and propagated into optimization through posterior feasibility requirements. We present two complementary computational strategies: a credible-region robustification that converts posterior uncertainty into deterministic protection, and a posterior-scenario approach that uses sampled posterior realizations to construct tractable optimization problems with finite-sample interpretability. We also propose a Monte Carlo certification procedure that provides conservative, data-conditioned assessments of residual infeasibility. Simulation experiments show that the proposed framework substantially improves safety relative to naive plug-in decisions, while a real-data study on single-cell transcriptomic data demonstrates that the approach can produce scientifically interpretable decisions together with explicit uncertainty-aware feasibility diagnostics. The proposed methodology offers a unified bridge between Bayesian learning, optimization under uncertainty, and practical decision certification.
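
The contrast between a plug-in decision and a robustified one can be illustrated on a one-variable problem. The sketch below is a toy under stated assumptions, not the paper's procedure: maximize x subject to a*x <= b, with a normal posterior for the coefficient a. The function `violation_certificate`, the posterior parameters, and the use of a one-sided Hoeffding bound for the conservative estimate are all choices made here for illustration.

```python
import math
import random
from statistics import NormalDist


def violation_certificate(x, b, post_mean, post_sd, draws=20000, delta=0.01, seed=0):
    """Monte Carlo estimate and conservative upper bound for P(a * x > b)."""
    rng = random.Random(seed)
    violations = sum(1 for _ in range(draws) if rng.gauss(post_mean, post_sd) * x > b)
    rate = violations / draws
    # One-sided Hoeffding bound, valid with probability >= 1 - delta over the draws.
    upper = min(1.0, rate + math.sqrt(math.log(1 / delta) / (2 * draws)))
    return rate, upper


# Hypothetical posterior for the constraint coefficient a.
b, post_mean, post_sd = 10.0, 1.0, 0.1
x_plugin = b / post_mean                   # treats the posterior mean as exact
z = NormalDist().inv_cdf(0.95)
x_robust = b / (post_mean + z * post_sd)   # protects against the upper 95% bound

plugin_rate, plugin_upper = violation_certificate(x_plugin, b, post_mean, post_sd)
robust_rate, robust_upper = violation_certificate(x_robust, b, post_mean, post_sd)
print(plugin_rate, robust_rate, robust_upper)
```

The plug-in decision sits on the constraint boundary at the posterior mean, so it is infeasible for roughly half of the posterior draws, while the robustified decision gives up some objective value in exchange for a violation rate near the chosen 5% level.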

2026-03-06T04:07:44Z Debashis Chatterjee