On the symmetry of evidential support

2026-03-31T14:23:55Z

For events $A$ and $B$, we have \[ \mathbb{P}(A\mid B) > \mathbb{P}(A\mid \neg B) \qquad\Longleftrightarrow\qquad \mathbb{P}(B\mid A) > \mathbb{P}(B\mid \neg A) \] whenever all four quantities are defined. In other words, $B$ is evidence for $A$ if and only if $A$ is evidence for $B$. This note gives seven different proofs of this fact -- by cross-multiplication, covariance, coupling parameters, odds ratios, pointwise mutual information, combinatorial double counting, and mixed discrete derivatives -- and develops a surrounding web of interpretations. Once the marginals $\mathbb{P}(A)$ and $\mathbb{P}(B)$ are fixed, a $2\times 2$ table has only one degree of freedom, so every scalar notion of positive association must be governed by the same signed parameter.
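As a quick numerical check of the stated equivalence, the sketch below evaluates both inequalities and the covariance $\mathbb{P}(A\cap B)-\mathbb{P}(A)\mathbb{P}(B)$ from a $2\times 2$ table of counts, using exact rational arithmetic. The function name `evidence_signs` and the count-based interface are illustrative assumptions and do not come from the note.

```python
from fractions import Fraction

def evidence_signs(n11, n10, n01, n00):
    """Return (B supports A, A supports B, covariance) for a 2x2 table.

    n11 = count(A and B),     n10 = count(A and not B),
    n01 = count(not A and B), n00 = count(neither).
    """
    total = n11 + n10 + n01 + n00
    p11, p10, p01 = (Fraction(n, total) for n in (n11, n10, n01))
    p_a = p11 + p10  # P(A)
    p_b = p11 + p01  # P(B)
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("all four conditional probabilities must be defined")
    b_supports_a = p11 / p_b > p10 / (1 - p_b)  # P(A|B) > P(A|not B)
    a_supports_b = p11 / p_a > p01 / (1 - p_a)  # P(B|A) > P(B|not A)
    covariance = p11 - p_a * p_b                # P(A and B) - P(A)P(B)
    return b_supports_a, a_supports_b, covariance

print(evidence_signs(3, 2, 1, 4))
```

For the counts `(3, 2, 1, 4)` both inequalities hold and the covariance is 1/10. Swapping the table to `(1, 4, 3, 2)` makes both fail together, as expected if a single signed parameter governs the association.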

Violence Against Women: a pilot study on the perception of Apulian High school students

2026-03-30T17:43:27Z

Violence Against Women (VAW) is a widespread issue deeply rooted in social and cultural structures. Affecting women of all ages and backgrounds, VAW is often underreported due to stigma and victim-blaming. This study explores young people's perceptions of VAW in the Apulia region (Southern Italy), using a local survey inspired by a national framework on gender stereotypes and attitudes towards VAW. The survey gathers insights into youth opinions on gender roles, the acceptability of violence, and awareness of VAW within their communities, aiming to uncover the underlying attitudes that perpetuate this issue. The analysis combines two methodological approaches to examine these data. A network-based approach explores relationships within item responses, allowing for an in-depth look at the direct interactions among youth attitudes. This approach is paired with a psychometric model based on Item Response Theory, specifically the Graded Response Model, which interprets attitudes as manifestations of latent traits, revealing how different factors shape perceptions of VAW. Together, these methods offer a comprehensive analysis of young people's views on VAW, highlighting both individual response patterns and broader cultural trends essential for designing effective interventions. Findings indicate a gradual shift in attitudes toward gender roles; however, traditional views remain prevalent, especially among young males. Socioeconomic factors, such as parents' employment status, also contribute to the persistence of stereotypes, underscoring the need for targeted interventions to address and reduce VAW in youth populations.
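For readers unfamiliar with the Graded Response Model, the sketch below computes category probabilities for one ordinal item under the standard logistic formulation, in which $P(Y \ge k \mid \theta)$ is a logistic function of $a(\theta - b_k)$. The function name and the parameter values are hypothetical and are not estimates from the study.

```python
import math

def grm_category_probs(theta, discrimination, thresholds):
    """Category probabilities for one item under the Graded Response Model.

    theta          : latent trait value of the respondent
    discrimination : item slope a (assumed positive)
    thresholds     : increasing list b_1 < ... < b_K; gives K + 1 categories
    """
    if any(b2 <= b1 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities P(Y >= k | theta), padded with 1 (k = 0) and 0.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-discrimination * (theta - b)))
                   for b in thresholds]
    cumulative.append(0.0)
    # P(Y = k) is the difference between adjacent cumulative probabilities.
    return [upper - lower for upper, lower in zip(cumulative, cumulative[1:])]

# Hypothetical four-category item (for example, a Likert-type response).
print(grm_category_probs(theta=0.3, discrimination=1.5, thresholds=[-1.0, 0.0, 1.2]))
```

The returned probabilities sum to one, and raising `theta` shifts mass toward the higher categories.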

Statistics 101, 201, and 202: Three Shiny Apps for Teaching Probability Distributions, Inferential Statistics, and Simple Linear Regression

2026-03-30T10:56:59Z

Statistics 101, 201, and 202 are three open-source interactive web applications built with R \citep{R} and Shiny \citep{shiny} to support the teaching of introductory statistics and probability. The apps help students carry out common statistical computations -- computing probabilities from standard probability distributions, constructing confidence intervals, conducting hypothesis tests, and fitting simple linear regression models -- without requiring prior knowledge of R or any other programming language. Each app provides numerical results, plots rendered with \texttt{ggplot2} \citep{ggplot2}, and inline mathematical derivations typeset with MathJax \citep{cervone2012mathjax}, so that computation and statistical reasoning appear side by side in a single interface. The suite is organised around a broad pedagogical progression: Statistics~101 introduces probability distributions and their properties; Statistics~201 addresses confidence intervals and hypothesis tests; and Statistics~202 covers the simple linear model. All three apps are freely accessible online and their source code is released under a CC-BY-4.0 license.

Statistical Compatibility, Refutational Information, and Acceptability

2026-03-29T14:53:07Z

This paper develops an interpretive framework for divergence P-values and S-values within a descriptive frequentist perspective. Statistical analysis is framed as operating within idealized worlds defined by a set of assumptions and a target hypothesis, where probabilities describe the behavior of data under the model but do not assign truth values to hypotheses. Within this view, P-values are interpreted as graded indices of compatibility between the observed result and the predictions generated by the assumed model; accordingly, small P-values should not be read as indicating logical impossibility or strict inconsistency of the model itself. Building on this distinction, the paper argues that practical inference requires moving beyond the internal logic of the model toward judgments of overall acceptability, which depend not only on data-model compatibility but also on multiple contextual considerations such as subject-matter knowledge, plausibility of assumptions, data quality, usefulness, and loss, all interpreted through the competence, intentions, perceptions, and moral values of the specific analyst. S-values are therefore interpreted not as evidence against the epistemic status of the model, but as a specific form of refutational information that contributes to the broader body of information used by the analyst to judge whether a model remains acceptable for an intended practical purpose. The paper also examines the linguistic and conceptual risks associated with the language of incompatibility, distinguishes probability from rarity, and clarifies different notions of surprise, including a possible definition of Shannon-type surprise that is to be distinguished from Bayesian belief revision. Overall, the article proposes a more cautious and explicit interpretation of frequentist measures, centered on model-based description, analyst responsibility, and decision acceptability.
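To make the relation between P-values and S-values concrete, the sketch below assumes the usual definition of the S-value as the binary Shannon surprisal $s = -\log_2 p$, measured in bits. The abstract does not restate this formula, so it is an assumption here, and the function name is illustrative.

```python
import math

def s_value(p_value):
    """Shannon surprisal of a P-value in bits, s = -log2(p)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    return -math.log2(p_value)

for p in (0.5, 0.25, 0.05, 0.005):
    print(f"p = {p:<6} s = {s_value(p):.2f} bits")
```

On this scale, p = 0.05 corresponds to about 4.3 bits, roughly the information in seeing four or five heads in a row from a coin assumed to be fair.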

Network Evolution and National Interests: Global Scientific Reorganization and the Rise of Scientific Nationalism

2026-03-28T17:54:31Z

The global network of scientific cooperation has undergone major restructuring over the past two decades, with important implications for geopolitics and science policy. China's integration into this network has redistributed positions of influence in ways that challenge zero-sum views of national competition and security. Drawing on structural holes theory and the Bianconi-Barabasi fitness model, we argue that China's entry accelerated an ongoing process of network maturation. As China's scientific capacity expanded, it formed direct collaborations that reduced reliance on U.S. intermediation. Network analysis shows a large decline in U.S. betweenness centrality, while weighted measures remain stable, indicating a loss of brokerage advantages but continued strong bilateral ties. Granger causality tests suggest that China's early participation predicted later structural changes across fields. Results are consistent across six major domains.

Hybrid physics-data driven spectral forecasts of semisubmersible response

2026-03-27T02:48:44Z

A framework for probabilistic forecasting of vessel motion is developed and validated for a semisubmersible operating in long period swell. Bayesian statistical methods are applied to predictions of the heave response from a physics model using numerical wave spectra and measured motion data. Model diagnostics motivate an additional level of complexity in the error structure of the Bayesian model, specifically to account for heteroskedasticity and time-correlated errors. The hybrid model forecasts were evaluated during periods when the heave resonance and cancellation frequencies were excited. The method is demonstrated to be effective for providing reliable quantification of uncertainty and correcting bias in the raw physics model predictions. This justifies its value for improving the efficiency and safety of offshore operations.

Identification of physiological shock in intensive care units via Bayesian regime switching models

2026-03-25T23:30:33Z

Detection of occult hemorrhage (i.e., internal bleeding) in patients in intensive care units (ICUs) can pose significant challenges for critical care workers. Because blood loss may not always be clinically apparent, clinicians rely on monitoring vital signs for specific trends indicative of a hemorrhage event. The inherent difficulties of diagnosing such an event can lead to late intervention by clinicians, which can have catastrophic consequences. Therefore, a methodology for early detection of hemorrhage has wide utility. We develop a Bayesian regime switching model (RSM) that analyzes trends in patients' vitals and labs to provide a probabilistic assessment of the underlying physiological state that a patient is in at any given time. This article is motivated by a comprehensive dataset of 33,924 real ICU patient encounters that we curated from Mayo Clinic. Longitudinal response measurements are modeled as a vector autoregressive process conditional on all latent states up to the current time point, and the latent states follow a Markov process. We present a novel Bayesian sampling routine to learn the posterior probability distribution of the latent physiological states, as well as develop an approach to account for pre-ICU-admission physiological changes. A simulation and real case study illustrate the effectiveness of our approach.

Adversarial Selection

2026-03-25T18:52:41Z

In many institutional settings, $k$ items are selected with the goal of representing the underlying distribution of claims, opinions, or characteristics in a large population. We study environments with two adversarial parties whose preferences over the selected items are commonly known and opposed. We propose the Quantile Mechanism: one party partitions the population into $k$ disjoint subsets, and the other selects one item from each subset. We show that this procedure is optimally representative among all feasible mechanisms, and illustrate its use in jury selection, multi-district litigation, and committee formation.
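The two-step protocol described above (one party partitions, the other picks one item per subset) can be sketched as follows. This is a toy illustration only: items are assumed to be comparable scalars, the contiguous partition of the sorted items is a hypothetical partitioner strategy, and nothing here reproduces the paper's optimality argument.

```python
def contiguous_partition(items, k):
    """Hypothetical partitioner strategy: sort the items and cut them
    into k contiguous blocks of near-equal size."""
    ordered = sorted(items)
    n = len(ordered)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of items")
    bounds = [i * n // k for i in range(k + 1)]
    return [ordered[bounds[i]:bounds[i + 1]] for i in range(k)]

def run_mechanism(items, k, partition, select):
    """Party 1 partitions the population; party 2 selects one item per block."""
    blocks = partition(items, k)
    return [select(block) for block in blocks]

population = range(1, 13)
print(run_mechanism(population, 3, contiguous_partition, min))  # selector prefers low values
print(run_mechanism(population, 3, contiguous_partition, max))  # selector prefers high values
```

With contiguous blocks, the selector's choices stay spread across the range whichever direction the selector prefers, which gives some intuition for why a partition of this kind limits how far the selection can be skewed.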

E-values as statistical evidence: A comparison to Bayes factors, likelihoods, and p-values

2026-03-25T15:32:53Z

A recurring debate in the philosophy of statistics concerns what, exactly, should count as a measure of evidence for or against a given hypothesis. P-values, likelihood ratios, and Bayes factors all have their defenders. In this paper we add two additional candidates to this list: the e-value and its sequential analogue, the e-process. E-values enjoy several desirable properties as measures of evidence: they combine naturally across studies, handle composite hypotheses, provide long-run error rates, and admit a useful interpretation as the wealth accrued by a bettor in a game against the null distribution. E-processes additionally handle optional stopping and optional continuation. This work examines the extent to which e-values and e-processes satisfy the evidential desiderata of different statistical traditions, concluding that they combine attractive features of p-values, likelihood ratios, and Bayes factors, and merit serious consideration as interpretable and intuitive measures of statistical evidence.
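The betting interpretation and the combination property can be illustrated with a likelihood-ratio e-value for coin flips. The sketch assumes a simple null (heads probability 0.5) and a simple alternative (0.6). Both values and the function name are illustrative choices, not taken from the paper.

```python
import itertools

def lr_e_value(flips, p_null=0.5, p_alt=0.6):
    """Wealth of a bettor who starts with 1 and bets on each flip
    at likelihood-ratio odds against the null (1 = heads, 0 = tails)."""
    wealth = 1.0
    for x in flips:
        wealth *= (p_alt / p_null) if x else ((1 - p_alt) / (1 - p_null))
    return wealth

# Under the null, the expected e-value is 1: average over all length-3 sequences.
expected = sum(lr_e_value(seq) * 0.5 ** 3
               for seq in itertools.product((0, 1), repeat=3))
print(expected)

# E-values from independent studies combine by multiplication.
study_1 = [1, 1, 0, 1, 1, 1]
study_2 = [1, 0, 1, 1]
combined = lr_e_value(study_1) * lr_e_value(study_2)
print(combined, lr_e_value(study_1 + study_2))
```

By Markov's inequality, rejecting the null when the e-value reaches $1/\alpha$ controls the type I error at level $\alpha$. In this example the product over the two studies equals the e-value of the pooled sequence, up to floating-point rounding.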

Exact and limit results for the CTRW in presence of drift and position dependent noise intensity

2026-03-24T11:45:15Z

Continuous-time random walks (CTRWs) with drift and position-dependent jumps provide a general framework for describing a wide range of natural and engineered systems. We analyze the stochastic differential equation associated with this class of models, in which the driving noise consists of spike (shot) events, and we derive two exact analytical results. First, we obtain a closed-form expression for the $n$-time correlation functions of the noise, expressed as a sum over all $2^{n-1}$ ordered partitions of the observation times (Proposition 2). Second, using the $G$-cumulant formalism, we derive an \emph{exact} non-local master equation (ME) for the probability density function (PDF) of the CTRW variable, valid without invoking diffusive limits, fractional scaling assumptions, or closure hypotheses (Proposition 3). In interaction representation, this ME retains the same structural form as that of the standard CTRW without drift or position-dependent jumps. Our main result is the emergence of a \emph{universal local master equation}: at long times, the exact non-local ME is universally and accurately approximated by a time-local ME whose only coefficient is the instantaneous renewal rate $R(t)$. From this equation, exact in the well-known Poissonian case, both local and global properties of the PDF can be readily inferred. For example, the temporal behavior of the PDF is directly controlled by that of the rate function $R(t)$: if the waiting-time distribution decays as a power law with exponent $\mu>2$, then $R(t)\to \mathrm{const}$ and the system converges to the Poissonian equilibrium. By contrast, for $\mu<2$, the rate decays in time and the effective diffusion induced by the noise slowly weakens, without leading to a stationary state. Numerical experiments confirm the remarkable accuracy of this local approximation even far beyond regimes where a naive time-scale separation would justify it.

The Rise of Null Hypothesis Significance Testing (NHST): Institutional Massification and the Emergence of a Procedural Epistemology

2026-03-23T13:00:25Z

It has long been a puzzle why, despite sustained reform efforts, many applied scientific fields remain dominated by Null Hypothesis Significance Testing (NHST), a framework that dichotomizes study results and privileges "statistically significant" findings. This paper examines that puzzle by situating the development and rise of NHST within its historical and institutional context. Taking Actor-Network Theory as a point of entry, the analysis identifies the conditions under which particular inferential technologies stabilize and endure. The analysis shows that, although NHST does not resolve the technical problem of statistical inference, it came to dominate as a social technology that addressed the most pressing institutional challenge of the postwar period: the mass expansion of scientific networks. Under conditions of rapid institutional growth, NHST's technical slippages--purging research context and replacing epistemic judgment with mechanical procedures--became functional features rather than flaws. These features enabled procedural self-sufficiency across settings marked by heterogeneous goals and uneven expertise, thereby sealing NHST's position as the obligatory passage point in many postwar scientific fields.

A Bayesian Reinterpretation of Cornfield-Type Sensitivity Analysis: From Thresholds to Probabilities

2026-03-19T14:08:27Z

Sensitivity analysis for unmeasured confounding in observational studies is commonly based on threshold quantities, such as the Cornfield condition or the E-value, which quantify how strong a confounder must be to explain away an observed association. However, these approaches do not address a fundamental inferential question: how plausible is it that such a confounder exists? In this work, we propose a Bayesian reformulation of Cornfield-type sensitivity analysis in which the strength of unmeasured confounding is treated as a random variable. Within this framework, the E-value is reinterpreted as a threshold, and the central inferential quantity becomes the posterior probability that confounding exceeds this threshold. This transforms sensitivity analysis from a descriptive diagnostic into a probabilistic assessment of robustness. We develop a simple generative model linking observed effect estimates to true causal effects and confounding bias, and we specify prior distributions reflecting plausible confounding mechanisms. The resulting framework yields posterior measures of evidential vulnerability that are directly interpretable and applicable to summary-level data. Illustrations based on empirical case studies show that the proposed approach preserves the interpretability of the E-value while providing a more nuanced and decision-relevant characterization of robustness. More broadly, the framework aligns sensitivity analysis with Bayesian principles of inference under uncertainty, offering a coherent alternative to purely threshold-based reasoning.
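For reference, the threshold that the framework reinterprets can be computed with the commonly used closed form for a risk ratio, $E = RR + \sqrt{RR\,(RR-1)}$, applied to $1/RR$ when the estimate is below one. The sketch covers only this threshold. The Bayesian step (a posterior probability that confounding exceeds it) depends on the paper's generative model and priors, which the abstract does not specify, so it is not attempted.

```python
import math

def e_value(risk_ratio):
    """E-value for an observed risk ratio (point estimate)."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))

for rr in (1.0, 1.5, 2.0, 0.5):
    print(f"RR = {rr:<4} E-value = {e_value(rr):.3f}")
```

An observed risk ratio of 2 gives an E-value of about 3.41: an unmeasured confounder would need risk-ratio associations of at least that size with both exposure and outcome to fully explain away the estimate.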

Minimum Volume Conformal Sets for Multivariate Regression

2026-03-18T17:44:07Z

Conformal prediction provides a principled framework for constructing predictive sets with finite-sample validity. While much of the focus has been on univariate response variables, existing multivariate methods either impose rigid geometric assumptions or rely on flexible but computationally expensive approaches that do not explicitly optimize prediction set volume. We propose an optimization-driven framework based on a novel loss function that directly learns minimum-volume covering sets while ensuring valid coverage. This formulation naturally induces a new nonconformity score for conformal prediction, which adapts to the residual distribution and covariates. Our approach optimizes over prediction sets defined by arbitrary norm balls, including single and multi-norm formulations. Additionally, by jointly optimizing both the predictive model and predictive uncertainty, we obtain prediction sets that are tight, informative, and computationally efficient, as demonstrated in our experiments on real-world datasets.
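As background for the norm-ball prediction sets mentioned above, the sketch below shows the standard split-conformal calibration step with a Euclidean norm of the residual as the nonconformity score. It is a baseline illustration, not the optimization-driven method the paper proposes, and the function name and sample residuals are hypothetical.

```python
import math

def conformal_radius(calibration_residuals, alpha):
    """Radius r such that the ball {y : ||y - y_hat(x)||_2 <= r} has
    marginal coverage >= 1 - alpha under exchangeability.

    calibration_residuals: iterable of residual vectors y_i - y_hat(x_i)
    computed on a held-out calibration set.
    """
    scores = sorted(math.hypot(*r) for r in calibration_residuals)
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]

residuals = [(3.0, 4.0), (0.0, 1.0), (6.0, 8.0), (0.0, 2.0)]
print(conformal_radius(residuals, alpha=0.5))
```

A fixed-radius Euclidean ball ignores both the shape of the residual distribution and the covariates. The abstract's learned, covariate-adaptive norm balls aim to address that limitation.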

Either a Confidence Interval Covers, or It Doesn't (Or Does It?): A Model-Based View of Ex-Post Coverage Probability

2026-03-17T19:34:31Z

In Neyman's original formulation, a $1-\alpha$ confidence interval procedure is justified by its long-run coverage properties, and a single realized interval is to be described only by the slogan that it either covers the parameter or it does not. On this view, post-data probability statements about the coverage of an individual interval are taken to be conceptually out of bounds. In this paper, I present two kinds of arguments against treating that "either-or" reading as the only legitimate interpretation of confidence. The first is informal, via a set of thought experiments in which the same joint probability model is used to compute both forward-looking and backward-looking probabilities for occurred-but-unobserved events. The second is more formal, recasting the standard confidence-interval construction in terms of infinite sequences of trials and their associated 0/1 coverage indicators. In that representation, the design-level coverage probability $1-\alpha$ and the degenerate conditional probabilities given the full data appear simply as different conditioning levels of the same model. I argue that a strict behavioristic reading that privileges only the latter is in tension with the very mathematical machinery used to define long-run error rates. I then sketch an alternative view of confidence as a predictive probability (or forecast) about the coverage indicator, together with a simple normative rule for when intermediate probabilities for single coverage events should be allowed. Keywords: confidence intervals; coverage probability; frequentist inference; single-case probability; predictive probability; Neyman. Disclaimer: The findings and conclusions in this report are those of the author and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
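The representation in terms of 0/1 coverage indicators can be simulated directly. The sketch below assumes a normal model with known standard deviation and the textbook 95% z-interval for the mean. These modelling choices, the function name, and the seed are illustrative assumptions rather than the paper's own construction.

```python
import math
import random

def coverage_indicators(mu=0.0, sigma=1.0, n=25, trials=10_000,
                        z=1.959964, seed=0):
    """Simulate repeated experiments and return the 0/1 indicator of
    whether each realized interval covers the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    indicators = []
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        indicators.append(int(abs(sample_mean - mu) <= half_width))
    return indicators

indicators = coverage_indicators()
print(indicators[:10])                    # each entry is 0 or 1 once the data are known
print(sum(indicators) / len(indicators))  # long-run frequency, close to 0.95
```

Each indicator is degenerate once the data and the parameter are known, while the average over trials estimates the design-level coverage. These are the two conditioning levels the abstract contrasts.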

Balance and Fairness through Multicalibration in Nonlife Insurance Pricing

2026-03-17T09:50:27Z

Autocalibration is known to be an important requirement for insurance premiums since it guarantees that premium income balances corresponding claims, on average, not only at portfolio level but also inside each group paying similar premiums. Also, fairness has become a major concern because unfair treatment may expose insurers to lawsuits or reputational damage. Translating fairness into conditional mean independence allows actuaries to combine autocalibration and fairness into the multicalibration concept. This paper studies the properties of multicalibration in an insurance context and proposes practical ways to implement it, through local regression or bias correction within groups including credibility adjustments. A case study based on motor insurance data illustrates the relevance of multicalibration in insurance pricing.
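One simple reading of bias correction within groups is a multiplicative rescaling that forces premium income to match claims inside every group. The sketch below takes the group labels as given (for instance, cells of similar premium crossed with a protected attribute) and omits the credibility adjustments mentioned above. It is a hypothetical illustration, not the paper's procedure.

```python
from collections import defaultdict

def rebalance_within_groups(premiums, claims, groups):
    """Rescale premiums so that total premium equals total claims in each group."""
    premium_total = defaultdict(float)
    claim_total = defaultdict(float)
    for p, c, g in zip(premiums, claims, groups):
        premium_total[g] += p
        claim_total[g] += c
    # Multiply each premium by its group's claims-to-premium ratio.
    return [p * claim_total[g] / premium_total[g]
            for p, g in zip(premiums, groups)]

premiums = [100.0, 200.0, 100.0, 300.0]
claims = [0.0, 360.0, 50.0, 350.0]
groups = ["a", "a", "b", "b"]
print(rebalance_within_groups(premiums, claims, groups))
```

In this toy portfolio, group `a` is underpriced by 20% and is scaled up, while group `b` is already balanced and is left unchanged. With small groups the raw ratio is noisy, which is the practical reason for credibility-type shrinkage.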