http://arxiv.org/abs/2605.30609v1 Rectified Linear Unit Regression 2026-05-28T22:02:55Z

This paper develops a regression framework for the direct estimation of integrated functionals of conditional outcome distributions. The proposed method, termed rectified linear unit (ReLU) regression, projects the ReLU-transformed outcome onto covariates and admits a closed-form estimator. Its population regression function coincides with the integrated conditional distribution function of the outcome, and its convex conjugate, obtained via the Legendre-Fenchel transformation, recovers the integrated conditional quantile function. Both the regression and its conjugate require only mild distributional assumptions and accommodate non-continuous outcomes. We establish the uniform asymptotic distribution of the estimator and develop inference for the conjugate functional via the delta method for Hadamard directionally differentiable maps. Building on these results, we establish identification and inference for average quantile treatment effects over arbitrary subintervals of probability levels. This broadens the set of distributional parameters available to empirical work.
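
The projection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' estimator: it assumes the ReLU transform is max(y - Y, 0) at a threshold y and that the projection is ordinary least squares on a single covariate with an intercept, neither of which the abstract spells out. It relies on the standard identity that, for integrable Y, E[max(y - Y, 0) | X = x] equals the integral of the conditional distribution function F(t | x) over t up to y. The function name `relu_regression` and the simulated data are hypothetical.

```python
import random

def relu_regression(xs, ys, threshold):
    """Closed-form OLS of max(threshold - Y, 0) on (1, X).

    Returns (intercept, slope). The transform and the linear
    specification are assumptions made for illustration.
    """
    zs = [max(threshold - y, 0.0) for y in ys]  # ReLU-transformed outcome
    n = len(xs)
    x_bar = sum(xs) / n
    z_bar = sum(zs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    return intercept, slope

# Simulated data (hypothetical): Y = X + standard normal noise.
random.seed(0)
xs = [random.random() for _ in range(2000)]
ys = [x + random.gauss(0.0, 1.0) for x in xs]

# Fit one regression per threshold and evaluate each fit at x = 0.5.
thresholds = [-1.0, 0.0, 1.0, 2.0]
fits = {t: relu_regression(xs, ys, t) for t in thresholds}
fitted_at_half = [fits[t][0] + fits[t][1] * 0.5 for t in thresholds]
```

Because an integrated distribution function is nondecreasing in its threshold, the fitted values in `fitted_at_half` should increase along the threshold grid.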

2026-05-28T22:02:55Z Tatsushi Oka http://arxiv.org/abs/2605.30607v1 Orthogonalized Kernel Regression for Spatial and Spatio-Temporal Residual Risk: Application to School Shootings in the Contiguous United States 2026-05-28T21:59:53Z

Distinguishing background heterogeneity from excess risk is a central challenge in case-control event data when both covariates and residual spatial or spatio-temporal structure matter. We develop a covariate-adjusted kernel regression framework that embeds an orthogonalized residual risk surface within a semiparametric binary model, and extend the approach from purely spatial to explicit spatio-temporal analysis. We apply the method to 959 gun violence incidents at public schools in the contiguous United States from 2000 to 2024, using incidents from the K-12 School Shooting Database linked to official school records for the corresponding year. The fitted models identify stable school-level associations, including markedly higher risk for larger schools and for middle and high schools, while also revealing substantial residual structure beyond the background distribution of schools. After adjustment for covariates, excess risk is found to remain concentrated in a persistent central-eastern corridor of the United States, with the strongest evidence appearing in recent years. More broadly, the analysis shows how residual risk surfaces can sharpen inference by separating background heterogeneity from anomalous structure in case-control event processes evolving over space and time.

2026-05-28T21:59:53Z 31 pages, 4 figures Tilman M. Davies Michael R. Desjardins Alexander Hohl Guangzhen Wu http://arxiv.org/abs/2606.00128v1 Exploring the periodicity of flight patterns 2026-05-28T19:25:40Z

Each year the American Statistical Association (ASA) hosts the Annual Data Challenge Expo, which tasks participants with analyzing a given dataset and presenting their work at the Joint Statistical Meetings (JSM). The 2025 Data Challenge Expo tasked participants with analyzing over 35 years of commercial flight data from the United States Bureau of Transportation Statistics (BTS). These data provide extensive geographic coverage and operational details for the U.S. domestic aviation market. For millions of past flights, there is information about the flight's date, origin, destination, carrier, plane, departure, and arrival. In this article, we present our analysis for the 2025 JSM Data Challenge Expo. We chose to explore patterns in the daily scheduling of departures and arrivals across airlines, airports, and time. In doing so, we observed distinct scheduling ``waves'', or periodic structures, at major airline hubs as well as large Federal Aviation Administration (FAA) hubs. In the remainder of this article, we detail the process of visualizing periodicity in flight scheduling as well as quantifying it through the calculation of Shannon entropy. An additional element of the 2025 Data Challenge Expo is the incorporation of a second dataset, to be decided by the participants. We detail the use of a BTS dataset with passenger enplanement (boarding) information to determine FAA hub classification (as opposed to airline-specific hubs). Furthermore, we discuss results from this visual and quantitative analysis, highlighting noticeable differences in scheduling periodicity and entropy across airports for the ``big four'', the four largest carriers in U.S. aviation: American Airlines, Delta Air Lines, United Airlines, and Southwest Airlines.
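
The entropy calculation mentioned in this abstract can be illustrated with a short sketch. The abstract does not say how departures are binned, so the hourly binning, the function name `shannon_entropy`, and the two toy schedules are assumptions for illustration only. The idea is that a schedule concentrated in a few departure waves has lower entropy than one spread evenly across the day.

```python
import math
from collections import Counter

def shannon_entropy(departure_hours, n_bins=24):
    """Shannon entropy (in bits) of the empirical distribution of
    integer departure hours, binned by hour of day (an assumption)."""
    counts = Counter(h % n_bins for h in departure_hours)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical schedules: three tight departure waves versus an even spread.
banked = [8] * 30 + [13] * 30 + [18] * 30
spread = list(range(6, 24)) * 5

entropy_banked = shannon_entropy(banked)   # three equally used hours
entropy_spread = shannon_entropy(spread)   # eighteen equally used hours
```

With equally weighted bins the entropy equals the base-2 logarithm of the number of occupied hours, so the banked schedule scores log2(3) bits and the evenly spread one log2(18) bits.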

2026-05-28T19:25:40Z Sarah M. Coleman H. Sherry Zhang Lydia R. Lucchesi Saptarshi Roy http://arxiv.org/abs/2605.30471v1 Multidimensional Item Response Theory under General Latent Distributions 2026-05-28T18:41:56Z

Multidimensional item response theory (MIRT) provides an important psychometric framework for modeling how multiple latent traits jointly influence observed item responses. In most existing estimation procedures, the latent trait distribution is assumed to be Gaussian. Although computationally convenient, this assumption can be restrictive in many applications where the latent distribution exhibits skewness, heavy tails, or multimodality. More importantly, misspecifying the latent distribution may bias the estimation of item parameters and latent traits. To address this limitation, we propose a data-driven flow-based framework for MIRT models that can capture a broad class of non-Gaussian latent distributions. The proposed approach represents the latent distribution as an invertible transformation of a simple base distribution. For efficient estimation, we further introduce a conditional flow as a function of both the observed response and the noise to approximate the posterior distribution. Under this framework, the item parameters, latent distribution, and posterior approximation can be learned jointly. Comprehensive simulation studies show that the proposed method improves item-parameter and latent-trait recovery when the true latent distribution is non-normal. An application to a personality dataset further illustrates the practical utility of the proposed framework for modeling complex latent trait distributions in large-scale data.

2026-05-28T18:41:56Z Chengyu Cui Taoyi Chen Chun Wang Gongjun Xu http://arxiv.org/abs/2602.19296v2 A Causal Framework for Estimating Heterogeneous Effects of On-Demand Tutoring 2026-05-28T18:03:51Z

This paper introduces a scalable causal inference framework for estimating the immediate, session-level effects of on-demand human tutoring embedded within adaptive learning systems. Because students seek assistance at moments of difficulty, conventional evaluation is confounded by self-selection and time-varying knowledge states. We address these challenges by integrating principled analytic sample construction with Deep Knowledge Tracing (DKT) to estimate latent mastery, followed by doubly robust estimation using Causal Forests. Applying this framework to over 5,000 middle-school mathematics tutoring sessions, we find that requesting human tutoring increases next-problem correctness by approximately 4 percentage points and accuracy on the subsequent skill encountered by approximately 3 percentage points, suggesting that the effects of tutoring have proximal transfer across knowledge components. This effect is robust to various forms of model specification and potential unmeasured confounders. Notably, these effects exhibit significant heterogeneity across sessions and students, with session-level effect estimates ranging from $-20.25pp$ to $+19.91pp$. Our follow-up analyses suggest that typical behavioral indicators, such as student talk time, do not consistently correlate with high-impact sessions. Furthermore, treatment effects are larger for students with lower prior mastery and slightly smaller for low-SES students. This framework offers a rigorous, practical template for the evaluation and continuous improvement of on-demand human tutoring, with direct applications for emerging AI tutoring systems.

2026-02-22T18:10:36Z Kirk Vanacore Danielle R Thomas Digory Smith Bibi Groot Justin Reich Rene Kizilcec http://arxiv.org/abs/2605.30289v1 Statistical Embeddings for Similarity, Retrieval, and Interpretable Alignment of Numeric Tabular Datasets 2026-05-28T17:40:42Z

Numeric tabular datasets are the dominant data format in scientific practice, yet large language models lack native mechanisms for representing numeric datasets in a meaningful way across heterogeneous feature spaces. Existing approaches either target predictive modeling over individual datasets, which requires a shared set of variable definitions, or lack mechanisms for interpretable cross-dataset alignment. The proposed methodology characterizes numeric tabular datasets through structured exploratory data analysis descriptors, embeds those descriptors into a shared vector space using a pretrained sentence transformer, and quantifies cross-dataset similarity via Canonical Correlation Analysis (CCA). Furthermore, a penalized formulation of CCA is applied to recover sparse, interpretable variable-level correspondences between datasets, identifying which statistical descriptors or variable-level quantities drive cross-dataset alignment without requiring shared variable names or feature conventions. Differential privacy is optionally applied to the descriptor set prior to embedding, supporting deployment in sensitive data contexts without requiring access to raw observations at time of comparison. The methodology is evaluated across 15 datasets spanning general-purpose benchmarks, materials informatics, and nuclear-grade graphite characterization. Results demonstrate a total P@1 score of 0.9, with known nearest-neighbor retrieval and cluster structure remaining robust across embedding ablations and differential privacy budgets. The proposed framework provides a principled pathway for integrating heterogeneous numeric data into retrieval-augmented generation pipelines while preserving statistical context, with direct applications to data-driven algorithm selection and simulation model initialization for unknown datasets.

2026-05-28T17:40:42Z M. Ross Kunz John Merickel Keith Wilson http://arxiv.org/abs/2605.30209v1 Betting Against Integrity: Identifying Match-Fixing Through In-Play Market Dynamics 2026-05-28T16:44:54Z

Match-fixing undermines the integrity of sport by eroding public trust and threatening the financial sustainability of clubs and leagues. The global expansion of sports betting markets has created new incentives and opportunities for manipulation, calling for rigorous, data-driven monitoring tools. Football, which accounts for the largest share of global betting turnover, remains particularly exposed: integrity reports continue to flag several suspicious matches, with past scandals in Italy and Turkey underlining the problem's persistence. This study uses high-frequency live-betting data from the Italian Serie B (2018/19-2020/21) to explore statistical approaches for detecting abnormal betting behaviour. A state-space modelling framework is employed to describe standard betting market dynamics and to predict expected betting volumes conditional on match characteristics. Deviations from these expectations can then be analysed using outlier detection techniques to identify potentially suspicious periods. The results demonstrate how statistical modelling can contribute to the early identification of irregular betting patterns, thereby supporting integrity assurance in live sports betting markets.

2026-05-28T16:44:54Z David Winkelmann Maya Vienken Christian Deutscher Roland Langrock http://arxiv.org/abs/2605.30167v1 Visual Spatial Learning: Single-Field Spatial Interpolation Using Convolutional Neural Networks 2026-05-28T16:19:21Z

Predicting a complete spatially correlated field from sparse observations is a fundamental challenge in spatial statistics and environmental modelling. Classical interpolation methods such as Kriging rely on Gaussian process assumptions and variography, which can limit their effectiveness in non-stationary settings and require substantial domain expertise. In this work, we leverage an architecture based on convolutional neural networks (CNNs) for spatial interpolation that is trained and applied on a single partially observed field, without access to external data or prior fields. The model is supervised directly on the observed locations and learns to predict values at unobserved points on the user-defined grid. Unlike Kriging, our method does not require explicit covariance modelling or variogram estimation, and it can flexibly capture local spatial patterns in a data-driven manner. This work demonstrates the potential of CNNs for single-instance spatial interpolation under sparse supervision, offering a practical alternative to classical geostatistical methods, and extending the use of CNNs to a new problem domain.

2026-05-28T16:19:21Z 53 pages, 10 figures Daniel Tinoco Raquel Menezes Carlos Baquero Alexandra Silva http://arxiv.org/abs/2605.01050v3 Trust Me, I'm a Doctor? 2026-05-28T16:17:47Z

Clinical trials usually target average treatment effects, but treatment decisions are made for individuals. This tension motivates a common criticism of evidence-based medicine: a treatment that is beneficial on average may be inappropriate for a particular patient, and skilled physicians may outperform rigid adherence to the strategy that performed best in a randomized trial. We consider how randomized and observational data from the same target population can be used to assess that possibility. Specifically, we study settings in which a randomized trial is nested within an observational cohort, so that outcomes are observed under treatment, control, and usual care. We ask what the observed data can reveal about how often physicians outperform the strategy suggested by the trial. We derive sharp bounds on the proportion of physicians whose personal strategies perform better than always choosing the better performing treatment from the trial under the assumption that no physician's strategy is worse than always choosing the worse performing treatment from the trial. These results shed light on when clinical data support relying on physician discretion over the trial-average recommendation and when stronger justification is required.

2026-05-01T19:25:32Z Zach Shahn Mats Stensrud http://arxiv.org/abs/2605.30157v1 Leveraging Large Language Models to Improve Precision in Randomized Controlled Trials 2026-05-28T16:16:53Z

Large language models (LLMs) are increasingly used in statistical research and applications. However, they are also notorious for producing unreliable or biased information. Here, we explore whether LLMs can be used to improve the precision of randomized controlled trials (RCTs) in a safe and rigorous way. Following similar work on leveraging observational data, we incorporate LLM predictions into an RCT analysis. While incorporating external predictions to improve precision is not new, the value of using LLM predictions in this manner is an open question. We develop a pipeline for best leveraging LLM predictions in this context and apply it to three different case studies. We find that these predictions can safely improve precision, particularly when the RCT lacks predictive covariates or contains covariates, such as text data, that are well-suited to LLMs.
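
One simple way external predictions can improve precision is sketched below. It is not the authors' pipeline, which the abstract does not describe; it assumes each prediction is computed from pre-treatment information only, so subtracting it from the outcome leaves the difference in means unbiased under randomization while reducing variance when the predictions are informative. The function names and the simulated trial are hypothetical.

```python
import random
import statistics

def diff_in_means(outcomes, treated):
    """Unadjusted treatment effect estimate: treated mean minus control mean."""
    t = [y for y, z in zip(outcomes, treated) if z]
    c = [y for y, z in zip(outcomes, treated) if not z]
    return statistics.fmean(t) - statistics.fmean(c)

def adjusted_effect(outcomes, treated, predictions):
    """Difference in means of residuals (outcome minus external prediction)."""
    residuals = [y - p for y, p in zip(outcomes, predictions)]
    return diff_in_means(residuals, treated)

# Simulated trials (hypothetical): true effect 0.5, predictions track the
# baseline outcome with some noise and never use the treatment assignment.
random.seed(1)
raw, adj = [], []
for _ in range(200):
    base = [random.gauss(0, 1) for _ in range(100)]
    preds = [b + random.gauss(0, 0.3) for b in base]
    z = [i % 2 == 0 for i in range(100)]
    random.shuffle(z)
    y = [b + 0.5 * zi for b, zi in zip(base, z)]
    raw.append(diff_in_means(y, z))
    adj.append(adjusted_effect(y, z, preds))

spread_raw = statistics.stdev(raw)  # sampling spread without adjustment
spread_adj = statistics.stdev(adj)  # sampling spread with adjustment
```

In this simulation the adjusted estimates cluster more tightly around the true effect than the unadjusted ones, which is the precision gain the abstract refers to. If the predictions were uninformative, the adjustment would add noise instead.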

2026-05-28T16:16:53Z Submitted to Machine Learning and Artificial Intelligence for Causal Inference in the Behavioral and Social Sciences: Methodological Advances and Applications, a topical issue of the Zeitschrift für Psychologie Jaylin Lowe Adam Sales Johann A. Gagnon-Bartsch http://arxiv.org/abs/2605.30034v1 Constructing Contact and Connectivity Matrices for Infectious Disease Modelling 2026-05-28T14:55:24Z

Contact (or mixing, or more generally connectivity) matrices are a fundamental component of modelling and inference for infectious disease epidemiology. Their structure and parametrisation directly account for the frequency of interactions between different subpopulations of individuals, as well as having the potential to encode dynamic heterogeneity in these interactions across demographic axes, space and time. Considerable research has been devoted to the structure and estimation of (components of) these matrices to help inform outbreak control and forecast disease spread. In this paper, we review the existing literature on the data types used to construct contact matrices and the methods for incorporating uncertainties and heterogeneities into them. We also highlight remaining challenges and future directions in the use of these contact matrices for epidemiological research.

2026-05-28T14:55:24Z Xiahui Li Dongni Zhang Neha Bansal Jessica R. E. Bridgen Chris Jewell Emma McBryde Glenn Marion Emily Nixon Philip D. O'Neill David J. Pascall Lorenzo Pellis Simon E. F. Spencer Panayiota Touloupou Lloyd Chapman Ben Swallow http://arxiv.org/abs/2605.26964v2 Semiparametric Inference for Causal Effects on Functional Outcomes 2026-05-28T14:21:32Z

Difference-in-differences (DiD) is a cornerstone of causal inference, yet extending it to functional outcomes is not a routine scalar generalization; rather, it entails three fundamental challenges in identification, inference, and observation. This paper develops a comprehensive semiparametric inference framework for functional DiD with discretely observed data. First, we define the functional average treatment effect under parallel trends and derive its efficient influence function (EIF), thereby establishing the semiparametric efficiency bound. Second, leveraging Neyman orthogonality and cross-fitting, we construct a debiased estimator that effectively mitigates regularization bias arising from nonparametric reconstruction. Third, we establish weak convergence of the estimator and propose an asymptotically valid uniform confidence band, enabling a rigorous transition from pointwise to curve-level inference. Finally, we demonstrate that reconstruction error under discrete sampling is asymptotically negligible for semiparametric inference, ensuring practical feasibility. Simulations and empirical applications confirm that the proposed method achieves superior coverage and testing power in finite samples, providing a theoretically grounded and computationally tractable foundation for causal evaluation with functional data.
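
For orientation, the basic pointwise difference-in-differences contrast on curves observed over a common grid can be sketched as follows. This is only the plug-in contrast implied by parallel trends, not the debiased, cross-fitted estimator or the uniform confidence band the abstract describes. The function names and the assumption that all curves share one grid are hypothetical.

```python
def mean_curve(curves):
    """Pointwise mean of curves sampled on a common grid."""
    n = len(curves)
    return [sum(c[j] for c in curves) / n for j in range(len(curves[0]))]

def functional_did(treated_pre, treated_post, control_pre, control_post):
    """Pointwise DiD: (treated post - pre) minus (control post - pre)."""
    tp, tq = mean_curve(treated_pre), mean_curve(treated_post)
    cp, cq = mean_curve(control_pre), mean_curve(control_post)
    return [(b - a) - (d - c) for a, b, c, d in zip(tp, tq, cp, cq)]

# Toy example (hypothetical): one curve per cell, three grid points.
effect_curve = functional_did(
    treated_pre=[[1.0, 1.0, 1.0]],
    treated_post=[[3.0, 4.0, 5.0]],
    control_pre=[[0.0, 0.0, 0.0]],
    control_post=[[1.0, 1.0, 1.0]],
)
```

The result is itself a curve, one effect value per grid point, which is why inference has to move from a single interval to a band over the whole domain.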

2026-05-26T12:52:11Z Junzhu Nie Chengxiu Ling Mengfei Ran http://arxiv.org/abs/2305.16842v8 Accounting statement analysis at industry level. A gentle introduction to the compositional approach 2026-05-28T14:19:41Z

Compositional data are contemporarily defined as positive vectors, the ratios among whose elements are of interest to the researcher. Financial statement analysis by means of accounting ratios a.k.a. financial ratios fulfils this definition to the letter. Compositional data analysis solves the major problems in statistical analysis of standard financial ratios at industry level, such as skewness, non-normality, non-linearity, outliers, and dependence of the results on the choice of which accounting figure goes to the numerator and to the denominator of the ratio. Despite this, compositional applications to financial statement analysis are still rare. In this article, we present some transformations within compositional data analysis that are particularly useful for financial statement analysis. We show how to compute industry or sub-industry means of standard financial ratios from a compositional perspective by means of geometric means. We show how to visualise firms in an industry with a compositional principal-component-analysis biplot; how to classify them into homogeneous financial performance profiles with compositional cluster analysis; and how to introduce financial ratios as variables in a statistical model, for instance to relate financial performance and firm characteristics with compositional regression models. We show an application to the accounting statements of Spanish wineries using the decomposition of return on equity by means of DuPont analysis, and a step-by-step tutorial to the compositional freeware CoDaPack.
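
The point about numerator and denominator choice can be made concrete with a short sketch. The abstract names geometric means for industry averages; the centred log-ratio (clr) transform shown here is a standard compositional transformation and is included as an assumption, since the abstract does not list which transformations the article covers. The firm figures are invented for illustration.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def clr(parts):
    """Centred log-ratio transform of one positive composition."""
    g = geometric_mean(parts)
    return [math.log(p / g) for p in parts]

# Hypothetical firms: (assets, liabilities) in arbitrary currency units.
firms = [(200.0, 50.0), (120.0, 90.0), (80.0, 10.0)]
ratios = [a / l for a, l in firms]    # assets over liabilities
inverse = [l / a for a, l in firms]   # liabilities over assets

industry_ratio = geometric_mean(ratios)
industry_inverse = geometric_mean(inverse)
clr_first_firm = clr(firms[0])
```

The geometric mean of a ratio is the reciprocal of the geometric mean of the inverted ratio, so the industry average no longer depends on which figure is placed in the numerator. An arithmetic mean does not have this property.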

2023-05-26T11:47:29Z Germà Coenders University of Girona Núria Arimany Serrat University of Vic - Central University of Catalonia http://arxiv.org/abs/2512.03109v2 E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing 2026-05-28T14:15:57Z

Agentic AI systems execute a sequence of actions, such as reasoning steps or tool calls, in response to a user prompt. To evaluate the success of their trajectories, researchers have developed verifiers, such as LLM judges and process-reward models, to score the quality of each action in an agent's trajectory. Although these heuristic scores can be informative, there are no guarantees of correctness when used to decide whether an agent will yield a successful output. Here, we introduce e-valuator, a method to convert any black-box verifier score into a decision rule with provable control of false alarm rates. We frame the problem of distinguishing successful trajectories (that is, a sequence of actions that will lead to a correct response to the user's prompt) and unsuccessful trajectories as a sequential hypothesis testing problem. E-valuator builds on tools from e-processes to develop a sequential hypothesis test that remains statistically valid at every step of an agent's trajectory, enabling online monitoring of agents over arbitrarily long sequences of actions. Empirically, we demonstrate that e-valuator provides greater statistical power and better false alarm rate control than other strategies across six datasets and three agents. We additionally show that e-valuator can be used to quickly terminate problematic trajectories and save tokens. Together, e-valuator provides a lightweight, model-agnostic framework that converts verifier heuristics into decision rules with statistical guarantees, enabling the deployment of more reliable agentic systems.
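
The general mechanism behind an e-process test can be sketched with a running likelihood ratio. This is not the e-valuator construction, which the abstract does not detail. It assumes that the null hypothesis is a successful trajectory, that verifier scores are independent across steps, and that they follow known Gaussian densities under each hypothesis; all parameter values and the function names are hypothetical. Under the null the running product is a nonnegative martingale, so by Ville's inequality it exceeds 1/alpha at any step with probability at most alpha.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def monitor(scores, alpha=0.05, null_mean=0.8, alt_mean=0.4, sd=0.2):
    """Return the 1-based step at which the e-process crosses 1/alpha,
    or None if it never does. Densities are illustrative assumptions."""
    log_e = 0.0  # log of the running likelihood-ratio product
    log_threshold = math.log(1.0 / alpha)
    for step, s in enumerate(scores, start=1):
        log_e += math.log(gaussian_pdf(s, alt_mean, sd))
        log_e -= math.log(gaussian_pdf(s, null_mean, sd))
        if log_e >= log_threshold:
            return step  # flag the trajectory as likely unsuccessful
    return None

# Hypothetical verifier scores for two trajectories.
flagged_at = monitor([0.4, 0.4, 0.4, 0.4, 0.4])   # consistently low scores
never_flagged = monitor([0.8, 0.8, 0.8, 0.8, 0.8])  # consistently high scores
```

Because the threshold can be checked after every action without inflating the false alarm rate, a monitor of this kind can stop a trajectory early, which is the token-saving use the abstract mentions.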

2025-12-02T05:59:18Z Shuvom Sadhuka Drew Prinster Clara Fannjiang Gabriele Scalia Bonnie Berger Aviv Regev Hanchen Wang http://arxiv.org/abs/2605.27265v2 Quantifying Social Inflation in Liability Insurance with Advanced Statistical Methods 2026-05-28T13:57:04Z

Social inflation, which is the rise in liability claim costs beyond general economic inflation, has become a major concern for insurers and reinsurers, yet it is difficult to quantify because litigation outcomes are heavy-tailed and the mix of cases reaching verdict versus settlement changes over time. Using a large database of US jury verdicts and settlements, we develop case-mix-adjusted social inflation measures through multiple channels that matter to reinsurers: plaintiff win rates (a frequency-type channel), settlement propensity (a frequency-type channel), and verdict/settlement severity. The approach combines rolling-window logistic regression for probabilities and quantile (value-at-risk) regression for severities, with uncertainty quantified via a random-weighted bootstrap. We find statistically significant relative increases in plaintiff win probability of approximately 20%-30% from 2009 to 2024, alongside a statistically significant relative decline in settlement probability of more than 10% over the same period. The dominant channel is verdict severity: Even after controlling for explanatory variables, verdict awards show a sharp rise after 2020, increasing by more than 100% from 2020 to 2024, whereas settlement amounts show limited and often statistically insignificant inflation. Therefore, inflation in total amounts payable to plaintiffs closely tracks verdict severity. Social inflation is more pronounced in corporate-defendant and uninsured-defendant cases and in states without tort caps or third-party litigation funding regulation. In addition, we find that social inflation has impacts not only on "nuclear verdicts" but also, in a similar manner, on moderate losses.

2026-05-26T16:42:07Z Tsz Chai Fung Lie Ma Liang Peng Fang Yang