http://arxiv.org/abs/2605.01050v3 Trust Me, I'm a Doctor? 2026-05-28T16:17:47Z

Clinical trials usually target average treatment effects, but treatment decisions are made for individuals. This tension motivates a common criticism of evidence-based medicine: a treatment that is beneficial on average may be inappropriate for a particular patient, and skilled physicians may outperform rigid adherence to the strategy that performed best in a randomized trial. We consider how randomized and observational data from the same target population can be used to assess that possibility. Specifically, we study settings in which a randomized trial is nested within an observational cohort, so that outcomes are observed under treatment, control, and usual care. We ask what the observed data can reveal about how often physicians outperform the strategy suggested by the trial. We derive sharp bounds on the proportion of physicians whose personal strategies perform better than always choosing the better performing treatment from the trial under the assumption that no physician's strategy is worse than always choosing the worse performing treatment from the trial. These results shed light on when clinical data support relying on physician discretion over the trial-average recommendation and when stronger justification is required.

2026-05-01T19:25:32Z Zach Shahn Mats Stensrud http://arxiv.org/abs/2605.30157v1 Leveraging Large Language Models to Improve Precision in Randomized Controlled Trials 2026-05-28T16:16:53Z

Large language models (LLMs) are increasingly used in statistical research and applications. However, they are also notorious for producing unreliable or biased information. Here, we explore whether LLMs can be used to improve the precision of randomized controlled trials (RCTs) in a safe and rigorous way. Following similar work on leveraging observational data, we incorporate LLM predictions into an RCT analysis. While incorporating external predictions to improve precision is not new, the value of using LLM predictions in this manner is an open question. We develop a pipeline for best leveraging LLM predictions in this context and apply it to three different case studies. We find that these predictions can safely improve precision, particularly when the RCT lacks predictive covariates or contains covariates, such as text data, that are well-suited to LLMs.

2026-05-28T16:16:53Z Submitted to Machine Learning and Artificial Intelligence for Causal Inference in the Behavioral and Social Sciences: Methodological Advances and Applications, a topical issue of the Zeitschrift für Psychologie Jaylin Lowe Adam Sales Johann A. Gagnon-Bartsch http://arxiv.org/abs/2605.30034v1 Constructing Contact and Connectivity Matrices for Infectious Disease Modelling 2026-05-28T14:55:24Z

Contact (or mixing, or more generally connectivity) matrices are a fundamental component of modelling and inference for infectious disease epidemiology. Their structure and parametrisation directly accounts for the frequency of interactions between different subpopulations of individuals, as well as having the potential to encode dynamic heterogeneity in these interactions across demographic axes, space and time. Considerable research has been devoted to the structure and estimation of (components of) these matrices to help inform outbreak control and forecast disease spread. In this paper, we review the existing literature on the data types used to construct contact matrices and the methods for incorporating uncertainties and heterogeneities into them. We also highlight remaining challenges and future directions in the use of these contact matrices for epidemiological research.

2026-05-28T14:55:24Z Xiahui Li Dongni Zhang Neha Bansal Jessica R. E. Bridgen Chris Jewell Emma McBryde Glenn Marion Emily Nixon Philip D. O'Neill David J. Pascall Lorenzo Pellis Simon E. F. Spencer Panayiota Touloupou Lloyd Chapman Ben Swallow http://arxiv.org/abs/2605.26964v2 Semiparametric Inference for Causal Effects on Functional Outcomes 2026-05-28T14:21:32Z

Difference-in-differences (DiD) is a cornerstone of causal inference, yet extending it to functional outcomes is not a routine scalar generalization; rather, it entails three fundamental challenges in identification, inference, and observation. This paper develops a comprehensive semiparametric inference framework for functional DiD with discretely observed data. First, we define the functional average treatment effect under parallel trends and derive its efficient influence function (EIF), thereby establishing the semiparametric efficiency bound. Second, leveraging Neyman orthogonality and cross-fitting, we construct a debiased estimator that effectively mitigates regularization bias arising from nonparametric reconstruction. Third, we establish weak convergence of the estimator and propose an asymptotically valid uniform confidence band, enabling a rigorous transition from pointwise to curve-level inference. Finally, we demonstrate that reconstruction error under discrete sampling is asymptotically negligible for semiparametric inference, ensuring practical feasibility. Simulations and empirical applications confirm that the proposed method achieves superior coverage and testing power in finite samples, providing a theoretically grounded and computationally tractable foundation for causal evaluation with functional data.

2026-05-26T12:52:11Z Junzhu Nie Chengxiu Ling Mengfei Ran http://arxiv.org/abs/2305.16842v8 Accounting statement analysis at industry level. A gentle introduction to the compositional approach 2026-05-28T14:19:41Z

Compositional data are contemporarily defined as positive vectors, the ratios among whose elements are of interest to the researcher. Financial statement analysis by means of accounting ratios a.k.a. financial ratios fulfils this definition to the letter. Compositional data analysis solves the major problems in statistical analysis of standard financial ratios at industry level, such as skewness, non-normality, non-linearity, outliers, and dependence of the results on the choice of which accounting figure goes to the numerator and to the denominator of the ratio. Despite this, compositional applications to financial statement analysis are still rare. In this article, we present some transformations within compositional data analysis that are particularly useful for financial statement analysis. We show how to compute industry or sub-industry means of standard financial ratios from a compositional perspective by means of geometric means. We show how to visualise firms in an industry with a compositional principal-component-analysis biplot; how to classify them into homogeneous financial performance profiles with compositional cluster analysis; and how to introduce financial ratios as variables in a statistical model, for instance to relate financial performance and firm characteristics with compositional regression models. We show an application to the accounting statements of Spanish wineries using the decomposition of return on equity by means of DuPont analysis, and a step-by-step tutorial to the compositional freeware CoDaPack.
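
The point about geometric means and the numerator/denominator choice can be illustrated with a minimal sketch. The function name and the firm-level figures below are hypothetical and are not taken from the article or from CoDaPack. The sketch only shows that the geometric mean of a ratio across firms is the exact reciprocal of the geometric mean of the inverted ratio, which is not true of the arithmetic mean.

```python
import math

def geometric_mean_ratio(numerators, denominators):
    """Geometric mean across firms of the ratio numerator / denominator."""
    log_ratios = [math.log(n / d) for n, d in zip(numerators, denominators)]
    return math.exp(sum(log_ratios) / len(log_ratios))

# Hypothetical accounting figures for three firms (illustration only).
net_income = [12.0, 3.0, 40.0]
equity = [100.0, 60.0, 160.0]

roe = geometric_mean_ratio(net_income, equity)           # income / equity
inverse_roe = geometric_mean_ratio(equity, net_income)   # equity / income

# The two industry means are reciprocals of each other.
print(roe * inverse_roe)
```

With an arithmetic mean, the industry average of equity over income would generally differ from the reciprocal of the average of income over equity, so conclusions could depend on which way the ratio is written.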

2023-05-26T11:47:29Z Germà Coenders University of Girona Núria Arimany Serrat University of Vic - Central University of Catalonia http://arxiv.org/abs/2512.03109v2 E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing 2026-05-28T14:15:57Z

Agentic AI systems execute a sequence of actions, such as reasoning steps or tool calls, in response to a user prompt. To evaluate the success of their trajectories, researchers have developed verifiers, such as LLM judges and process-reward models, to score the quality of each action in an agent's trajectory. Although these heuristic scores can be informative, there are no guarantees of correctness when used to decide whether an agent will yield a successful output. Here, we introduce e-valuator, a method to convert any black-box verifier score into a decision rule with provable control of false alarm rates. We frame the problem of distinguishing successful trajectories (that is, a sequence of actions that will lead to a correct response to the user's prompt) and unsuccessful trajectories as a sequential hypothesis testing problem. E-valuator builds on tools from e-processes to develop a sequential hypothesis test that remains statistically valid at every step of an agent's trajectory, enabling online monitoring of agents over arbitrarily long sequences of actions. Empirically, we demonstrate that e-valuator provides greater statistical power and better false alarm rate control than other strategies across six datasets and three agents. We additionally show that e-valuator can be used to quickly terminate problematic trajectories and save tokens. Together, e-valuator provides a lightweight, model-agnostic framework that converts verifier heuristics into decision rules with statistical guarantees, enabling the deployment of more reliable agentic systems.
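
As a rough illustration of how an e-process yields an anytime-valid stopping rule, the sketch below multiplies per-step likelihood ratios and raises an alarm once the running product reaches 1/alpha. By Ville's inequality, this keeps the false alarm probability at or below alpha at every step. This is a generic textbook construction, not the e-valuator procedure itself. The function names, the choice of null hypothesis (the trajectory is successful), and the toy score densities are all assumptions made for the example.

```python
from typing import Callable, Iterable, Optional

def first_alarm_step(
    scores: Iterable[float],
    likelihood_ratio: Callable[[float], float],
    alpha: float = 0.05,
) -> Optional[int]:
    """Return the 1-based step at which the e-process first reaches 1/alpha.

    Returns None if the threshold is never reached.
    """
    threshold = 1.0 / alpha
    e_value = 1.0
    for step, score in enumerate(scores, start=1):
        # Each factor must have expectation at most 1 under the null.
        e_value *= likelihood_ratio(score)
        if e_value >= threshold:
            return step
    return None

def toy_ratio(score: float) -> float:
    """Hypothetical ratio for verifier scores in (0, 1].

    Assumed densities: 2 * s for successful trajectories (null) and
    2 * (1 - s) for unsuccessful ones, so the ratio is (1 - s) / s.
    Clamping s from below only shrinks the ratio, which keeps it valid.
    """
    s = max(score, 1e-6)
    return (1.0 - s) / s

low_scores_alarm = first_alarm_step([0.1, 0.2, 0.1], toy_ratio, alpha=0.05)
high_scores_alarm = first_alarm_step([0.9, 0.8, 0.9], toy_ratio, alpha=0.05)
print(low_scores_alarm, high_scores_alarm)
```

In practice the score densities are unknown and would have to be estimated from calibration trajectories, which is where most of the real work lies.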

2025-12-02T05:59:18Z Shuvom Sadhuka Drew Prinster Clara Fannjiang Gabriele Scalia Bonnie Berger Aviv Regev Hanchen Wang http://arxiv.org/abs/2605.27265v2 Quantifying Social Inflation in Liability Insurance with Advanced Statistical Methods 2026-05-28T13:57:04Z

Social inflation, which is the rise in liability claim costs beyond general economic inflation, has become a major concern for insurers and reinsurers, yet it is difficult to quantify because litigation outcomes are heavy-tailed and the mix of cases reaching verdict versus settlement changes over time. Using a large database of US jury verdicts and settlements, we develop case-mix-adjusted social inflation measures through multiple channels that matter to reinsurers: plaintiff win rates (a frequency-type channel), settlement propensity (a frequency-type channel), and verdict/settlement severity. The approach combines rolling-window logistic regression for probabilities and quantile (value-at-risk) regression for severities, with uncertainty quantified via a random-weighted bootstrap. We find statistically significant relative increases in plaintiff win probability of approximately 20%-30% from 2009 to 2024, alongside a statistically significant relative decline in settlement probability of more than 10% over the same period. The dominant channel is verdict severity: Even after controlling for explanatory variables, verdict awards show a sharp rise after 2020, increasing by more than 100% from 2020 to 2024, whereas settlement amounts show limited and often statistically insignificant inflation. Therefore, inflation in total amounts payable to plaintiffs closely tracks verdict severity. Social inflation is more pronounced in corporate-defendant and uninsured-defendant cases and in states without tort caps or third-party litigation funding regulation. In addition, we find that social inflation has impacts not only on "nuclear verdicts" but also, in a similar manner, on moderate losses.

2026-05-26T16:42:07Z Tsz Chai Fung Lie Ma Liang Peng Fang Yang http://arxiv.org/abs/2406.15844v3 Bayesian modeling of multi-species labeling errors in ecological studies 2026-05-28T13:56:38Z

Ecological and conservation studies monitoring bird communities typically rely on species classification based on bird vocalizations. Historically, this has been based on expert volunteers going into the field and making lists of the bird species that they observe. Recently, machine learning algorithms have emerged that can accurately classify bird species based on audio recordings of their vocalizations. Such algorithms crucially rely on training data that are labeled by experts. Automated classification is challenging when multiple species are vocalizing simultaneously, there is background noise, and/or the bird is far from the microphone. In continuously monitoring different locations, the size of the audio data becomes immense, and it is only possible for human experts to label a tiny proportion of the available data. In addition, experts can vary in their accuracy and breadth of knowledge about different species. This article focuses on the important problem of combining sparse expert annotations to improve bird species classification while providing uncertainty quantification. We are additionally interested in providing expert performance scores to increase their engagement and encourage improvements. We propose a Bayesian hierarchical modeling approach and evaluate this approach on a new community science platform developed in Finland.

2024-06-22T13:16:38Z Haoxuan Wang Patrik Lauha David B. Dunson http://arxiv.org/abs/2603.15192v2 Benchmarking Formula 1 results using a normal model 2026-05-28T13:07:48Z

There is enduring interest in disentangling the effects of skill and luck in sport. A key issue in Formula 1 is distinguishing between car-level and driver-level effects. Four elite teams currently dominate Formula 1 and have won every major race for the last four years. In this paper we use univariate and bivariate normal models to quantify reasonable performance expectations at both driver and team levels, distinguishing between elite and non-elite teams. We illustrate our approach with an application to the 2025 season, the most recent fully completed season.

2026-03-16T12:27:34Z John Fry Silvio Fanzon Mark Austin Tom Brighton http://arxiv.org/abs/2605.28327v2 Insurance Pricing Optimization via Off-Policy Evaluation 2026-05-28T12:19:33Z

Traditional insurance pricing relies on risk-based principles that ensure actuarial fairness and solvency but do not explicitly account for policyholders' price sensitivity. We formulate insurance pricing as a decision-making problem and study it using tools from off-policy evaluation and stochastic control. We propose a kernelized inverse propensity score estimator that exploits local structure in the action space and yields variance reduction compared to the classical inverse propensity score estimator. Building on these value estimates, we investigate policy optimization and present two practical approaches for computing optimal pricing rules: an interpretable data-shared Lasso formulation and a flexible policy parameterization based on neural networks. Using a controlled synthetic travel insurance environment, we empirically confirm the theoretical results and show that neural networks outperform existing techniques for policy optimization.
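
To make the idea of kernel smoothing over the action space concrete, here is a minimal sketch of one common kernelized inverse propensity score estimator for continuous actions such as prices. It is not necessarily the estimator proposed in the paper. The function names, the Gaussian kernel, the log format, and the toy record are assumptions for illustration only.

```python
import math

def gaussian_kernel(u: float) -> float:
    """Standard normal density, used as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_ips_value(logs, target_policy, bandwidth: float) -> float:
    """Estimate the value of a deterministic pricing policy from logged data.

    Each log record is (context, logged_price, reward, logging_density),
    where logging_density is the density the logging policy assigned to
    the logged price given the context (assumed known here).
    """
    total = 0.0
    for context, logged_price, reward, logging_density in logs:
        # Weight records by how close the logged price is to the target price.
        u = (target_policy(context) - logged_price) / bandwidth
        total += gaussian_kernel(u) / bandwidth * reward / logging_density
    return total / len(logs)

# Hypothetical single-record log and a constant pricing rule.
toy_logs = [(0.0, 1.0, 2.0, 0.5)]
value = kernel_ips_value(toy_logs, lambda context: 1.0, bandwidth=1.0)
print(value)
```

Classical inverse propensity scoring would use only records whose logged action exactly matches the target action. The kernel borrows strength from nearby prices, trading some bias (controlled by the bandwidth) for lower variance.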

2026-05-27T11:27:32Z Sascha Günther Dimitri Semenovich Mario V. Wüthrich http://arxiv.org/abs/2605.29830v1 A Multi-factorial Innovation Model with Feature Interaction 2026-05-28T12:09:55Z

We introduce an Indian-buffet-type model for multi-factorial innovation in which each arriving agent may exhibit both previously observed and new features. The number of new features follows a power-law behavior, while the probability of selecting an old feature combines self-reinforcement, depending on the feature-specific popularity, with a mean-field interaction term depending on the average popularity of all observed features. The model is governed by the usual innovation parameters (mass, discount and concentration), together with two additional parameters: one controlling the strength of reinforcement against a forcing input toward zero, and one regulating the intensity of feature interaction. Although the growth of the total number of distinct observed features has the same behavior as in the three-parameter Indian buffet process, the interaction mechanism produces new asymptotic regimes. For aggregate quantities, including the predictive mean, the averaged number of features per agent, the mean inclusion probability, and the mean feature popularity, the phase transition is determined by the comparison between the discount parameter and the weight of the forcing input. For feature-specific quantities, a further transition appears according to the comparison between the interaction level and a critical threshold. In particular, high interaction leads to an asymptotic synchronization of feature-specific inclusion probabilities. We establish strong laws and second-order asymptotic results, including central limit theorems in regimes where martingale fluctuations compete with deterministic or random terms. The analysis relies on novel general results for recursive stochastic dynamics, which may be useful beyond the present framework.

2026-05-28T12:09:55Z Giacomo Aletti Irene Crimaldi Andrea Ghiglietti http://arxiv.org/abs/2602.05786v2 Selecting Hyperparameters for Tree-Boosting 2026-05-28T06:28:05Z

Tree-boosting is a widely used machine learning technique for tabular data. However, its out-of-sample accuracy is critically dependent on multiple hyperparameters. In this article, we empirically compare several popular methods for hyperparameter optimization for tree-boosting including random grid search, the tree-structured Parzen estimator (TPE), Gaussian-process-based Bayesian optimization (GP-BO), Hyperband, the sequential model-based algorithm configuration (SMAC) method, and deterministic full grid search using $59$ regression and classification data sets. We find that the SMAC method clearly outperforms all the other considered methods. We further observe that (i) a relatively large number of trials (more than $100$) is required for accurate tuning, (ii) using default values for hyperparameters yields very inaccurate models, (iii) all considered hyperparameters can have a material effect on the accuracy of tree-boosting, i.e., there is no small set of hyperparameters that is more important than others, and (iv) choosing the number of boosting iterations using early stopping yields more accurate results compared to including it in the search space for regression tasks.

2026-02-05T15:44:42Z Floris Jan Koster Fabio Sigrist http://arxiv.org/abs/2605.29424v1 Model-free estimation in scattering analysis of microscopy 2026-05-28T06:18:48Z

The mean squared displacement (MSD) of particles or probes is commonly estimated from microscopy videos using particle tracking approaches, which rely on tuning parameters manually, and are often unstable over the entire lag time range, especially in dense or low-contrast situations. In this work, we propose model-free ab initio uncertainty quantification (MF-AIUQ), a model-free method for scattering analysis of microscopy video based on a probabilistic framework, which estimates MSD without isolating particles and linking their trajectories. Based on the relationship between the intermediate scattering function (ISF) and the MSD derived from the cumulant theorem, MF-AIUQ estimates the MSD values by the marginal maximum likelihood estimator. To reduce the computational cost, the likelihood function is approximated by a subset of Fourier-transformed intensities. These intensities are equally spaced at the logarithmic values of Fourier basis functions and lag time points. We found that the ISF is smooth in this logarithmic input space, and the information of the ISF can be captured by this subset of inputs. We examine the method through simulation studies covering several representative stochastic processes and three experimental systems: a Newtonian fluid for evaluating performance in optically dense and bright-field settings, a gelation system with an evolving MSD shape, and snail mucin, a viscoelastic biopolymer, for modulus estimation. Across these studies, MF-AIUQ provides smooth and stable MSD estimates over the full lag time range and serves as a useful complementary approach in settings where particle tracking is unreliable or a parametric model of MSD is unavailable or unverifiable.

2026-05-28T06:18:48Z 18 pages, 6 figures Tong Lin Jinseok Lee Matt Helgeson Megan T. Valentine Yimin Luo Mengyang Gu http://arxiv.org/abs/2605.29413v1 From Classical Optimization to Bayesian Integration: A Comprehensive Analysis of Systematic Portfolio Management 2026-05-28T06:02:21Z

This paper compares a series of contemporary portfolio construction approaches using ten U.S. stocks (TSLA, WMT, BAC, GS, LLY, MRK, GOOG, META, AAPL and XOM) over the period from September 2023 to December 2025. The paper explores basic mean-variance optimization, constrained optimization, Fama-French five-factor regression modeling, Monte Carlo simulation, and the Black-Litterman model to determine how constraints on a solution, risk factors in a strategy, simulated approximations, and specific market views may each affect portfolio allocation, performance, and stability. Overall, the results show that standard optimization may result in highly concentrated portfolios; that constrained optimization changes portfolio allocations by altering the efficient frontier; that the five-factor regression models suggest a basic investment style of defensive large-value and profitability exposure; that Monte Carlo approximation is a viable technique for arriving at mean-variance optimal portfolios provided the number of simulations is high enough, especially under a box constraint; and that the Black-Litterman approach produces more economically intuitive allocations and greater stability than standard mean-variance optimization because it balances equilibrium returns with investor views.

2026-05-28T06:02:21Z Ajay Kumar Verma Shravya Barkam http://arxiv.org/abs/2605.29403v1 Power Estimation for Longitudinal Studies with Time Dependent Covariates Using Generalized Method of Moments 2026-05-28T05:56:01Z

Longitudinal studies frequently incorporate covariates that evolve over time, creating complex dependence structures between outcomes and predictors. When covariates are time dependent, standard power analysis tools--largely developed for generalized estimating equations (GEE)--can yield misleading results because they do not account for the moment based structure required for valid marginal inference. Generalized Method of Moments (GMM) provides a flexible and efficient framework for estimating marginal effects in the presence of time dependent covariates, yet no practical tools exist for conducting power analysis under GMM. This paper introduces a modern, implementable framework for power estimation in longitudinal studies with time dependent covariates using GMM. Two complementary approaches are developed: a Wald based method that leverages the asymptotic normality of GMM estimators, and a distance metric method based on quadratic forms of sample and population moment conditions. Both approaches require only limited distributional assumptions and rely on valid moment conditions rather than full likelihood specification. We outline the theoretical foundations, provide step by step implementation guidance, and illustrate the methods using data from the Osteoarthritis Initiative. A simulation framework is presented for evaluating empirical performance. These methods fill a critical gap in the longitudinal modeling literature by offering applied researchers a practical, distribution light approach to power estimation when time dependent covariates are present and GMM is the preferred estimation technique.
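
The logic behind a Wald-based power calculation can be sketched for the simplest case of a single coefficient. If the estimator is asymptotically normal, the Wald statistic is approximately normal with mean equal to the effect divided by its standard error, and the power of a two-sided test follows directly. This is a generic scalar simplification, not the paper's GMM procedure. The function name and inputs are hypothetical, and the standard error at the planned sample size is assumed to be supplied by the user.

```python
from statistics import NormalDist

def wald_power(effect: float, standard_error: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided Wald test for one coefficient.

    Assumes the estimator is normal with mean `effect` and standard
    deviation `standard_error` at the planned sample size.
    """
    standard_normal = NormalDist()
    critical_value = standard_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = effect / standard_error
    # Probability that the statistic falls in either rejection tail.
    upper_tail = standard_normal.cdf(shift - critical_value)
    lower_tail = standard_normal.cdf(-shift - critical_value)
    return upper_tail + lower_tail

power_at_null = wald_power(effect=0.0, standard_error=1.0)
power_small_se = wald_power(effect=0.5, standard_error=0.1)
power_large_se = wald_power(effect=0.5, standard_error=0.4)
print(power_at_null, power_small_se, power_large_se)
```

At a zero effect the function returns the significance level, as expected, and power rises as the standard error shrinks. For a vector of coefficients the analogous calculation would use a noncentral chi-square distribution rather than the normal.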

2026-05-28T05:56:01Z 27 pages with appendix, 16 pages main manuscript, 3 figures in main manuscript, 7 figures including figures in appendix Niloofar Ramezani Oliver Hurst