https://arxiv.org/api/R160X1/M4+dsETG51qgAff8j7A0
2026-03-22T08:34:58Z
1629015

http://arxiv.org/abs/2603.18928v1
A Bayesian Reinterpretation of Cornfield-Type Sensitivity Analysis: From Thresholds to Probabilities
2026-03-19T14:08:27Z
Sensitivity analysis for unmeasured confounding in observational studies is commonly based on threshold quantities, such as the Cornfield condition or the E-value, which quantify how strong a confounder must be to explain away an observed association. However, these approaches do not address a fundamental inferential question: how plausible is it that such a confounder exists? In this work, we propose a Bayesian reformulation of Cornfield-type sensitivity analysis in which the strength of unmeasured confounding is treated as a random variable. Within this framework, the E-value is reinterpreted as a threshold, and the central inferential quantity becomes the posterior probability that confounding exceeds this threshold. This transforms sensitivity analysis from a descriptive diagnostic into a probabilistic assessment of robustness. We develop a simple generative model linking observed effect estimates to true causal effects and confounding bias, and we specify prior distributions reflecting plausible confounding mechanisms. The resulting framework yields posterior measures of evidential vulnerability that are directly interpretable and applicable to summary-level data. Illustrations based on empirical case studies show that the proposed approach preserves the interpretability of the E-value while providing a more nuanced and decision-relevant characterization of robustness.
More broadly, the framework aligns sensitivity analysis with Bayesian principles of inference under uncertainty, offering a coherent alternative to purely threshold-based reasoning.
2026-03-19T14:08:27Z
Tommaso Costa

http://arxiv.org/abs/2503.19068v2
Minimum Volume Conformal Sets for Multivariate Regression
2026-03-18T17:44:07Z
Conformal prediction provides a principled framework for constructing predictive sets with finite-sample validity. While much of the focus has been on univariate response variables, existing multivariate methods either impose rigid geometric assumptions or rely on flexible but computationally expensive approaches that do not explicitly optimize prediction set volume. We propose an optimization-driven framework based on a novel loss function that directly learns minimum-volume covering sets while ensuring valid coverage. This formulation naturally induces a new nonconformity score for conformal prediction, which adapts to the residual distribution and covariates. Our approach optimizes over prediction sets defined by arbitrary norm balls, including single and multi-norm formulations. Additionally, by jointly optimizing both the predictive model and predictive uncertainty, we obtain prediction sets that are tight, informative, and computationally efficient, as demonstrated in our experiments on real-world datasets.
2025-03-24T18:54:22Z
Sacha Braun, Liviu Aolaritei, Michael I. Jordan, Francis Bach

http://arxiv.org/abs/2602.15562v4
Either a Confidence Interval Covers, or It Doesn't (Or Does It?): A Model-Based View of Ex-Post Coverage Probability
2026-03-17T19:34:31Z
In Neyman's original formulation, a 1-alpha confidence interval procedure is justified by its long-run coverage properties, and a single realized interval is to be described only by the slogan that it either covers the parameter or it does not. On this view, post-data probability statements about the coverage of an individual interval are taken to be conceptually out of bounds.
In this paper, I present two kinds of arguments against treating that "either-or" reading as the only legitimate interpretation of confidence. The first is informal, via a set of thought experiments in which the same joint probability model is used to compute both forward-looking and backward-looking probabilities for occurred-but-unobserved events. The second is more formal, recasting the standard confidence-interval construction in terms of infinite sequences of trials and their associated 0/1 coverage indicators. In that representation, the design-level coverage probability 1-alpha and the degenerate conditional probabilities given the full data appear simply as different conditioning levels of the same model. I argue that a strict behavioristic reading that privileges only the latter is in tension with the very mathematical machinery used to define long-run error rates. I then sketch an alternative view of confidence as a predictive probability (or forecast) about the coverage indicator, together with a simple normative rule for when intermediate probabilities for single coverage events should be allowed.
Keywords: confidence intervals; coverage probability; frequentist inference; single-case probability; predictive probability; Neyman.
Disclaimer: The findings and conclusions in this report are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
2026-02-17T13:22:11Z
Scott Lee

http://arxiv.org/abs/2603.16317v1
Balance and Fairness through Multicalibration in Nonlife Insurance Pricing
2026-03-17T09:50:27Z
Autocalibration is known to be an important requirement for insurance premiums since it guarantees that premium income balances corresponding claims, on average, not only at portfolio level but also inside each group paying similar premiums. Also, fairness has become a major concern because unfair treatment may expose insurers to lawsuits or reputational damage. Translating fairness into conditional mean independence allows actuaries to combine autocalibration and fairness into the multicalibration concept. This paper studies the properties of multicalibration in an insurance context and proposes practical ways to implement it, through local regression or bias correction within groups including credibility adjustments. A case study based on motor insurance data illustrates the relevance of multicalibration in insurance pricing.
2026-03-17T09:50:27Z
Michel Denuit, Marie Michaelides, Julien Trufin

http://arxiv.org/abs/2603.15426v1
Exact and limit results for the CTRW in presence of drift and position dependent noise intensity
2026-03-16T15:34:48Z
Continuous-time random walks (CTRWs) with drift and position-dependent jumps provide a highly general framework for describing a wide range of natural and engineered systems. We analyze the stochastic differential equation (SDE) associated with this class of models, in which the driving noise $ξ(t)$ consists of spike (shot) events, and we derive two exact analytical results. First, we obtain a closed-form expression for the $n$-time correlation functions of $ξ(t)$, expressed as a sum over all $2^{\,n-1}$ ordered partitions of the observation times (Proposition~2).
Second, using the $G$-cumulant formalism, we derive an \emph{exact} non-local master equation (ME) for the probability density function of the CTRW variable $x(t)$, valid without invoking diffusive limits, fractional scaling assumptions, or closure hypotheses (Proposition~3). In interaction representation, this ME retains the same structural form as that of the standard CTRW without drift or position-dependent jumps. Our main result is the emergence of a \textbf{universal local master equation}: at long times, the exact non-local ME is universally and accurately approximated by a time-local ME whose only coefficient is the instantaneous renewal rate $R(t)$. This approximation reproduces the exact Poissonian ME when $R$ is constant, and numerical experiments confirm its remarkable accuracy even far beyond regimes where a naive time-scale separation would justify it.
2026-03-16T15:34:48Z
76 pages, 12 Figures
Marco Bianucci, Mauro Bologna, Riccardo Mannella

http://arxiv.org/abs/2603.15215v1
Deepest voting on rankings
2026-03-16T12:52:49Z
This article aims to present a unified framework for ranking-based voting rules based on the use of depth functions on permutations, as a counterpart of deepest voting rules on evaluation introduced in Aubin et al. [2022]. It introduces the notion of depth functions, in continuous sets and in permutation sets, the latter using the notion of Fréchet means. Deepest voting procedures are then formally defined, and some classical voting rules are expressed as deepest voting procedures, using a large variety of distances on the set of permutations.
Links are drawn between the mathematical properties of the depth functions and some behaviours of the voting rule, such as Neutrality, Anonymity, Universality, the Condorcet winner/loser property, and so on.
2026-03-16T12:52:49Z
Jean-Baptiste Aubin (DEEP, ICJ, PSPM, INSA Lyon); Antoine Rolland (ERIC, UL2); Ioana Gavra (IRMAR, UR2); Irène Gannaz (G-SCOP\_GROG, G-SCOP, Grenoble INP); Jacques Anderson Kouassi (G-SCOP\_GROG, G-SCOP, Grenoble INP)

http://arxiv.org/abs/2508.06431v2
Nonparametric Learning Non-Gaussian Quantum States of Continuous Variable Systems
2026-03-16T06:26:11Z
Continuous-variable quantum systems are foundational to quantum computation, communication, and sensing. While traditional representations using wave functions or density matrices are often impractical, the tomographic picture of quantum mechanics provides an accessible alternative by associating quantum states with classical probability distribution functions called tomograms. Despite their advantages, including compatibility with classical statistical methods, tomographic methods remain underutilized due to a lack of robust estimation techniques. This work addresses this gap by introducing a non-parametric \emph{kernel quantum state estimation} (KQSE) framework for reconstructing quantum states and their trace characteristics from noisy data, without prior knowledge of the state. In contrast to existing methods, KQSE yields estimates of the density matrix in various bases, as well as trace quantities such as purity, higher moments, overlap, and trace distance, with a near-optimal convergence rate of $\tilde{O}\bigl(T^{-1}\bigr)$, where $T$ is the total number of measurements. KQSE is robust for multimodal, non-Gaussian states, making it particularly well suited for characterizing states essential for quantum science.
2025-08-08T16:19:58Z
Liubov A. Markovich, Xiaoyu Liu, Jordi Tura

http://arxiv.org/abs/2603.14757v1
The Rise of Null Hypothesis Significance Testing (NHST): Institutional Massification and the Emergence of a Procedural Epistemology
2026-03-16T02:41:26Z
It has long been a puzzle why, despite sustained reform efforts, many applied scientific fields remain dominated by Null Hypothesis Significance Testing (NHST), a framework that dichotomizes study results and privileges "statistically significant" findings. This paper examines that puzzle by situating the development and rise of NHST within its historical and institutional context. Taking Actor-Network Theory as a point of entry, the analysis identifies the conditions under which particular inferential technologies stabilize and endure. The analysis shows that, although NHST does not resolve the technical problem of statistical inference, it came to dominate as a social technology that addressed the most pressing institutional challenge of the postwar period: the mass expansion of scientific networks. Under conditions of rapid institutional growth, NHST's technical slippages--purging research context and replacing epistemic judgment with mechanical procedures--became functional features rather than flaws. These features enabled procedural self-sufficiency across settings marked by heterogeneous goals and uneven expertise, thereby sealing NHST's position as the obligatory passage point in many postwar scientific fields.
2026-03-16T02:41:26Z
29 pages, 6 figures
Carol Ting

http://arxiv.org/abs/2603.14273v1
Using large language models for sensitivity analysis in causal inference: cases studies on Cornfield inequality and E-value
2026-03-15T08:07:00Z
Sensitivity analysis methods such as the Cornfield inequality and the E-value were developed to assess the robustness of observed associations against unmeasured confounding -- a major challenge in observational studies.
However, the calculation and interpretation of these methods can be difficult for clinicians and interdisciplinary researchers. Recent advances in large language models (LLMs) offer accessible tools that could assist sensitivity analyses, but their reliability in this context has not been studied. We assess four widely used LLMs, ChatGPT, Claude, DeepSeek, and Gemini, on their ability to conduct sensitivity analyses using Cornfield inequalities and E-values. We first extract study-specific information (exposures, outcomes, measured confounders, and effect estimates) from four published observational studies in different fields. Using this information, we develop structured prompts to assess the performance of the LLMs in three aspects: (1) accuracy of E-value calculation, (2) qualitative interpretation of robustness to unmeasured confounding, and (3) suggestion of possible unmeasured confounders. To our knowledge, this is the first study to investigate the use of LLMs for sensitivity analysis. The results show that ChatGPT, Claude, and Gemini accurately reproduce the E-values, whereas DeepSeek shows small biases. Qualitative conclusions from all the LLMs align with the magnitude of the E-values and the reported effect sizes, and all models identify biologically and epidemiologically plausible unmeasured confounders. These findings suggest that, when guided by structured prompting, LLMs can effectively assist in evaluating unmeasured confounding, and thereby can support study design and decision-making in observational studies.
2026-03-15T08:07:00Z
Qingyan Xiang, Jiahao Zhang, Bojian Feng

http://arxiv.org/abs/2603.11240v2
Statistical Methodology Groups in the Pharmaceutical Industry
2026-03-13T14:24:17Z
Research and Development is the largest budget position in the pharmaceutical industry, with clinical trials being a critical, yet costly and time-consuming component to inform decisions.
Beyond drug efficacy, the probability of success and efficiency of research and development are highly dependent on the approaches used for designing, analyzing, and interpreting clinical trials. Deep understanding of statistical methodology and quantitative approaches is therefore essential. Consequently, dedicated methodology groups have emerged in mid-size and large pharmaceutical companies and CROs. Their remit is to lead the conception and implementation of innovative quantitative methodologies in order to improve drug development, often by addressing complexities or offering more efficient designs. To achieve this, they collaborate internally and externally (e.g., with academics, regulators) to identify common challenges and tear down silos in order to invest in methods with the highest impact on efficiency and value to the portfolio. Given the immense financial stakes of drug development -- where delays carry massive implications -- these groups represent a critical strategic investment. However, to realize this business impact, statistical innovations must be rigorously validated and seamlessly integrated. This manuscript explores the setup, remit, and value of dedicated methodology groups, alongside the critical organizational considerations and success factors required to maximize their impact on the speed, efficiency, and probability of success.
2026-03-11T19:05:51Z
39 pages, 2 figures, 1 table
Jenny Devenport, Tobias Mielke, Mouna Akacha, Kaspar Rufibach, Alex Ocampo, Vivian Lanius, Marc Vandemeulebroecke, Philip Hougaard, Pierre Collin, David Wright, Jurgen Hummel, Cornelia Ursula Kunz, Mike Krams

http://arxiv.org/abs/2603.10866v1
Beyond Reproducible Research: Building a Formal Representation of a Data Analysis
2026-03-11T15:18:22Z
Data analyses are often constructed in an imperative manner, where commands representing actions taken on the data are issued sequentially.
The publication of these commands, along with the data, is essential to the reproducibility of the analysis by others. However, simply presenting the code and the results of running the code can hide important details about the data analyst's premises, expectations, and assumptions about the data. Understanding this analysis reasoning can be critical to evaluating the quality of an analysis and for suggesting possible improvements. We argue that a formal representation of a data analysis that externalizes its logical construction offers more useful information for statically illustrating an analyst's reasoning. Such a formal representation would allow for the evaluation of some aspects of a data analysis without the need for the data, the visualization of the logical connections leading to a conclusion, and the ability to assess the sensitivity of an analyst's assumptions to unexpected features in the data. In this paper we describe an implementation of this formal representation and how it might be applied to some common data analysis tasks.
2026-03-11T15:18:22Z
Roger D. Peng

http://arxiv.org/abs/2603.09318v1
Anomaly detection using surprisals
2026-03-10T07:50:22Z
Anomaly detection methods are widely used but often rely on ad hoc rules or strong assumptions, and they often focus on tail events, missing ``inlier'' anomalies that occur in low-density gaps between modes. We propose a unified framework that defines an anomaly as an observation with unusually low probability under a (possibly misspecified) model. For each observation we compute its surprisal (the negative log generalized density) and define an anomaly score as the probability of a surprisal at least as large as that observed. This reduces anomaly detection for complex univariate or multivariate data to estimating the upper tail of a univariate surprisal distribution.
We develop two model-robust estimators of these tail probabilities: an empirical estimator based on the observed surprisal distribution and an extreme-value estimator that fits a Generalized Pareto Distribution above a high threshold. For the empirical method we give conditions under which tail ordering is preserved and derive finite-sample confidence guarantees via the Dvoretzky--Kiefer--Wolfowitz inequality. For the GPD method we establish broad tail conditions ensuring classical extreme-value behavior. Simulations and applications to French mortality and Test-cricket data show the approach remains effective under substantial model misspecification.
2026-03-10T07:50:22Z
Rob J Hyndman, David T. Frazier

http://arxiv.org/abs/2603.07742v1
A Cylindrical Galton Board at the Galton Board's 150th Anniversary
2026-03-08T17:32:41Z
The Galton board is a well known device for showing how repeated Bernoulli trials on a triangular lattice produce an approximately normal distribution. Marking the 150th anniversary of Galton's 1875 construction, this paper revisits the original apparatus and extends it to a cylindrical setting in which the peg lattice is wrapped around a cylinder. This creates angular periodicity and leads to height dependent behaviour that does not arise in the classical planar design. The cylindrical form links Galton's demonstration of variation and the emergence of the normal distribution with modern ideas in circular statistics, giving a physical realisation of binomial random walks on a circular linear product space. We distinguish cases where the wrapped lattice covers only an arc from those that span the full circumference, and show how these geometries lead to wrapped binomial and wrapped normal behaviour. We describe the construction of our physical model, discuss practical issues for replication, and analyse its statistical and pedagogical properties as a modern reinterpretation of Galton's work.
2026-03-08T17:32:41Z
18 pages, 8 Figures
Kanti V. Mardia, Colin Goodall, John Rubbo

http://arxiv.org/abs/2511.01040v3
From Structural Equation Modeling to Targeted Learning: A Tutorial Introduction to Targeted Maximum Likelihood Estimation for SEM Researchers
2026-03-07T02:48:07Z
Structural equation modeling (SEM) and path analysis have long been central tools for studying complex causal relationships in the social and behavioral sciences, yet their reliance on parametric assumptions can lead to biased inference under model misspecification. To bridge traditional SEM with modern causal machine learning, this paper introduces targeted maximum likelihood estimation (TMLE), a doubly robust framework built on nonparametric structural equation modeling. We formally connect TMLE to classical path analysis, showing that standard SEM estimators arise as special cases of TMLE under restrictive parametric specifications and that both approaches can estimate common causal quantities such as direct, indirect, and total effects. Through simulation studies under both correctly specified and misspecified models, we demonstrate that while the two methods perform similarly when models are correctly specified, TMLE consistently achieves lower bias, reduced mean squared error, and improved confidence interval coverage when parametric assumptions are violated. We further illustrate these differences using an applied mediation analysis examining the role of poverty in access to high school education, where path analysis suggests a significant direct effect, whereas TMLE does not, highlighting the practical consequences of robustness in causal inference.
Overall, this tutorial offers SEM researchers a conceptual and practical introduction to targeted learning, providing guidance on leveraging TMLE to enhance causal analysis beyond traditional parametric frameworks.
2025-11-02T18:35:42Z
Junjie Ma, Xiaoya Zhang, Guangye He, Yuting Han, Ting Ge, Feng Ji

http://arxiv.org/abs/2603.06871v1
Adaptive Bi-Level Variable Selection of Conditional Main Effects for Generalized Linear Models
2026-03-06T20:44:43Z
Understanding interaction effects among variables is important for regression modeling in various applications. The conventional approach of quantifying interactions as the product of variables often lacks clear interpretability, especially in complex systems. The concept of conditional main effects (CME) provides a more intuitive and interpretable framework for capturing interaction effects by quantifying the effect of one variable conditional on the level of another. A recent method called cmenet further considered the bi-level selection of CMEs by leveraging their natural grouping structure (e.g., sibling and cousin groups) through penalization. However, there are several limitations in the cmenet method, including the coupling ability of penalties for within-group CMEs, lack of adaptiveness for between-group penalties, and restriction to linear models with continuous responses. To overcome these limitations, we propose an adaptive cmenet method for CME selection under the generalized linear model (GLM) framework. The proposed method considers a penalized likelihood approach with adaptive weights to enable effective bi-level variable selection, improving both between-group and within-group selection. An efficient algorithm for parameter estimation is also developed by employing an iteratively reweighted least squares procedure. The performance of the proposed method is evaluated by both simulation studies and real-data studies in gene association analysis.
2026-03-06T20:44:43Z
Kexin Xie, Xinwei Deng