http://arxiv.org/abs/2412.12233v2
Russian roulette: The need for stochastic potential outcomes when utilities depend on counterfactuals
2025-09-16T16:23:51Z
It has been proposed in medical decision analysis to express the ``first do no harm'' principle as an asymmetric utility function in which the loss from killing a patient would count more than the gain from saving a life. Such a utility depends on unrealized potential outcomes, and we show how this yields a paradoxical decision recommendation in a simple hypothetical example involving games of Russian roulette. The problem is resolved if we abandon the stable unit treatment value assumption (SUTVA) and allow the potential outcomes to be random variables. This leads us to conclude that, if you are interested in this sort of asymmetric utility function, you need to move to the stochastic potential outcome framework. We discuss the implications of the choice of parameterization in this setting.
2024-12-16T15:29:18Z
Andrew Gelman, Jonas M. Mikhaeil
10.1093/biomet/asaf062

http://arxiv.org/abs/2302.08724v3
Piecewise Deterministic Markov Processes for Bayesian Neural Networks
2025-09-15T05:10:19Z
Inference on modern Bayesian Neural Networks (BNNs) often relies on a variational inference treatment, imposing violated assumptions of independence and the form of the posterior. Traditional MCMC approaches avoid these assumptions at the cost of increased computation due to their incompatibility with subsampling of the likelihood. New Piecewise Deterministic Markov Process (PDMP) samplers permit subsampling, though they introduce model-specific inhomogeneous Poisson processes (IPPs), which are difficult to sample from. This work introduces a new generic and adaptive thinning scheme for sampling from these IPPs, and demonstrates how this approach can accelerate the application of PDMPs for inference in BNNs.
Experimentation illustrates how inference with these methods is computationally feasible, can improve predictive accuracy and MCMC mixing performance, and can provide informative uncertainty measurements when compared against other approximate inference schemes.
2023-02-17T06:38:16Z
Includes correction to software and corrigendum note (fix supplementary references)
Ethan Goan, Dimitri Perrin, Kerrie Mengersen, Clinton Fookes

http://arxiv.org/abs/2412.14222v2
A Survey on Large Language Model-based Agents for Statistics and Data Science
2025-09-14T04:25:33Z
In recent years, data science agents powered by Large Language Models (LLMs), known as "data agents," have shown significant potential to transform the traditional data analysis paradigm. This survey provides an overview of the evolution, capabilities, and applications of LLM-based data agents, highlighting their role in simplifying complex data tasks and lowering the entry barrier for users without related expertise. We explore current trends in the design of LLM-based frameworks, detailing essential features such as planning, reasoning, reflection, multi-agent collaboration, user interface, knowledge integration, and system design, which enable agents to address data-centric problems with minimal human intervention. Furthermore, we analyze several case studies to demonstrate the practical applications of various data agents in real-world scenarios. Finally, we identify key challenges and propose future research directions to advance the development of data agents into intelligent statistical analysis software.
2024-12-18T15:03:26Z
Am. Statist. (2025) 1-14
Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang
10.1080/00031305.2025.2561140

http://arxiv.org/abs/2509.07147v2
On the Ambiguities of Incompatibility in Frequentist Inference
2025-09-12T05:17:41Z
The interpretation of the P-value and its monotone transform s=-log2(p), or S-value, remains debated despite decades of dedicated literature.
Within the neo-Fisherian framework, these values are often described as indices of (in)compatibility between the observed data and a set of ideal assumptions (i.e., the statistical model). In this regard, this paper proposes the distinction between two domains: the model domain, where assumptions are taken as perfectly true and every admissible outcome is, by construction, fully compatible with the model; and the real domain, where assumptions may fail and face empirical scrutiny. I argue that, although interpreted through an objective numerical index, any level of incompatibility can arise only in the latter domain, where the epistemic status of the model under examination is uncertain and a genuine conflict between data and hypotheses can therefore occur. The extent to which P- and S-values are taken as indicating incompatibility is a matter of contextual judgment. Within this framework, descriptive approaches serve to quantify the numerical values of P and S; these can be interpreted as indicative of a certain degree (or amount) of incompatibility between data and hypotheses once causal knowledge of the data-generating process and information about the costs and benefits of related decisions become clearer. Although the distinction between the model domain and the real domain may appear merely theoretical or even philosophical, I argue that this perspective is useful for developing a clear mental representation of how statistical estimates should be evaluated in practical settings and applications.
2025-09-08T18:53:19Z
Alessandro Rovetta

http://arxiv.org/abs/2409.16613v5
Oral exams in introductory statistics class with non-native English speakers
2025-09-12T00:08:31Z
Oral exams are a powerful tool to assess students' learning. This is particularly important in introductory statistics classes where students struggle to grasp various topics like the interpretation of probability, $p$-values and more.
The challenge of acquiring conceptual understanding is only heightened when students are learning in a second language. In this paper, I share my experience administering oral exams to an introductory statistics class of non-native English speakers at a Japanese university. I explain the context of the university and course, before detailing the exam. Of particular interest is the relationship between exam performance and English proficiency. The results showed little relationship between the two, meaning the exam seemed to truly test students' statistical knowledge rather than their English ability. I close with encouragements and recommendations for practitioners hoping to implement similar oral exams, focusing on the unique difficulties faced by students not learning in their mother tongue.
2024-09-25T04:31:07Z
Eric Yanchenko

http://arxiv.org/abs/2509.01778v2
Grid Transmission Evaluation for Solar Deployment and Data Center Growth
2025-09-11T13:07:26Z
The rapid growth of renewable energy deployment and data center demand in the United States has intensified challenges in grid interconnection, with project delays and escalating costs threatening both economic expansion and energy reliability. This study investigates transmission constraints using the IEEE 39-bus New England Power System model to evaluate the simultaneous interconnection of a 1 GW solar facility and a 1 GW data center load. Employing PSSE and Python-based automation (psspy), we conducted 1,560 load flow simulations across varying siting configurations to assess branch overloads and transmission line limits. Results revealed that only 14 configurations avoided overloads, while most scenarios highlighted recurring congestion on specific network branches, particularly between buses 21 and 22. Optimal siting was identified with the load at bus #35 and the generator at bus #39, yielding minimal overloads (maximum 91.1% loading).
Conversely, poor siting decisions resulted in severe congestion with maximum branch loading above 220%. The findings underscore the critical importance of optimized siting and modernized, automated interconnection studies to reduce delays and costs in renewable integration. This research demonstrates the potential of advanced modeling tools to accelerate interconnection processes, improve system reliability, and inform future strategies for balancing renewable energy deployment with rising data center demand.
2025-09-01T21:22:51Z
arXiv admin note: This paper has been withdrawn by arXiv due to disputed authorship
Kajal Sheth, Dhvanil Patel, Shyam Kareepadath Sajeev

http://arxiv.org/abs/2509.08744v1
Who has the best probabilities? Luck versus skill in prediction tournaments
2025-09-10T16:34:49Z
An informal and elementary introduction to probability scoring and forecast verification and improvement, slightly extended from Significance 22:3(2025)16, which might be useful for less mathematical readers as a prologue to the classic review by Gneiting and Raftery [Strictly proper scoring rules, prediction, and estimation, Journal of the American Statistical Association 102 (2007): 359].
2025-09-10T16:34:49Z
13 pages, 2 figures
Significance vol.22 no.3 (2025) 16-21
Niall MacKay
10.1093/jrssig/qmaf023

http://arxiv.org/abs/2509.08451v1
Comparing Methodologies for Ranking Alternatives: A case study in assessing bank financial performance
2025-09-10T09:48:12Z
Bank financial performance encapsulates an institution's capacity to effectively manage its assets, capital, and operational activities to generate profits and ensure stability. Evaluating this performance necessitates the integration of diverse metrics, including profitability indicators, loan growth rates, capital utilization efficiency, and more. Nevertheless, directly comparing the financial performance across different banks presents a complex challenge due to inherent disparities in their specific performance parameters.
Multi-criteria decision-making (MCDM) techniques are frequently employed to navigate this intricate assessment. This study undertakes a comparative analysis of various MCDM approaches in evaluating bank financial performance. Our investigation encompasses both a comparison of methods for assigning weights to criteria and a comparison of methodologies for ranking the alternatives (banks). We examine five distinct weighting methods: Equal, Entropy, MEREC, LOPCOW, and SPC. Concurrently, three alternative ranking methods (Probability, TOPSIS, and RAM) are compared. These comparisons are conducted within the context of a case study involving the performance assessment of 19 banks. The findings indicate that the highest degree of stability in ranking bank financial performance is achieved when the Entropy method is utilized for criteria weighting in conjunction with the Probability method for ranking alternatives.
2025-09-10T09:48:12Z
17 pages, 9 tables
Dong Trung Chinh, Nguyen Thi Thu Hien, Pham Huong Quynh, Vu Quang Minh

http://arxiv.org/abs/2509.08187v1
A Comparative Analysis of Multi-Criteria Decision-Making (MCDM) Methods
2025-09-09T23:20:18Z
Multi-Criteria Decision-Making (MCDM) techniques have found widespread application across diverse fields. The rapid evolution of MCDM has led to the development of hundreds of methods, each employing distinct approaches. However, due to inherent algorithmic differences, various MCDM methods often yield divergent results when applied to the same specific problem. This study undertakes a comparative analysis of four particular methods: RAM, MOORA, FUCA, and CURLI, within a defined case study. The evaluation context involves ranking 30 Vietnamese banks based on six criteria: capital adequacy, asset quality, management capability, earnings ability, liquidity, and sensitivity to market risk. Prior to this analysis, these banks had also been ranked by the CAMELS rating system.
The CAMELS rankings serve as a benchmark to assess the performance of the RAM, MOORA, FUCA, and CURLI methods. Our findings indicate that FUCA and CURLI are highly suitable methods for this application, demonstrating Spearman's rank correlation coefficients with CAMELS of 0.9996 and 0.9984, respectively. In contrast, both RAM and MOORA proved unsuitable, exhibiting very low Spearman's correlation coefficients of -1.0296 against the CAMELS ranking.
2025-09-09T23:20:18Z
10 pages, 3 tables
Engineering, Technology & Applied Science Research, Vol. 15, No. 5, 2025, 26369-26375
Nguyen Thi Thu Hien, Pham Huong Quynh, Vu Quang Minh
10.48084/etasr.12782

http://arxiv.org/abs/2509.08183v1
Chaotic Bayesian Inference: Strange Attractors as Risk Models for Black Swan Events
2025-09-09T23:11:23Z
We introduce a new risk modeling framework where chaotic attractors shape the geometry of Bayesian inference. By combining heavy-tailed priors with Lorenz and Rössler dynamics, the models naturally generate volatility clustering, fat tails, and extreme events. We compare two complementary approaches: Model A, which emphasizes geometric stability, and Model B, which highlights rare bursts using Fibonacci diagnostics. Together, they provide a dual perspective for systemic risk analysis, linking Black Swan theory to practical tools for stress testing and volatility monitoring.
2025-09-09T23:11:23Z
13 pages, 5 figures. Includes supplementary baseline diagnostics
Crystal Rust

http://arxiv.org/abs/2506.07437v2
One-dimensional quantile-stratified sampling and its application in statistical simulations
2025-09-06T06:04:39Z
In this paper we examine quantile-stratified samples from a known univariate probability distribution, with stratification occurring over a partition of the quantile regions in the distribution. We examine some general properties of this sampling method and we contrast it with standard IID sampling to highlight its similarities and differences.
We examine the applications of this sampling method to various statistical simulations including importance sampling. We conduct simulation analysis to compare the performance of standard importance sampling against the quantile-stratified importance sampling to see how they each perform on a range of functions.
2025-06-09T05:25:33Z
Ben O'Neill

http://arxiv.org/abs/2509.05277v1
Bridge Modal Identification using Single Moving Sensor under Random Traffic Loading
2025-09-05T17:34:21Z
This paper explores the feasibility of utilizing the response recorded by a single moving sensor to identify the modal parameters of a bridge system under different loading conditions, such as known excitation and unknown random traffic-induced vibrations. The sensor traverses the bridge and captures its dynamic response (acceleration). The natural frequencies and damping ratios are identified using the moving sensor data in the frequency domain. In the case of known inputs, these parameters are then used to obtain the mode shapes, expressed as a linear combination of basic orthonormal polynomials (BOPs), with the coefficients of the BOPs in the linear combinations obtained via optimization. A statistical formulation is proposed to estimate the mode shapes in the case of unknown random traffic-induced vibrations, including the effect of road roughness. It is shown that the absolute values of the mode shapes are proportional to the ensemble standard deviation (SD) of the modal responses. This approach requires the sensor to traverse the bridge multiple times, with the mode shapes identified both in the time domain using variances and in the frequency domain through the evolutionary power spectrum of these responses. The random traffic loading is modeled such that vehicle arrival times follow a Poisson distribution, while the mass and velocity of the vehicles are assumed to follow uniform distributions.
To incorporate the effect of road roughness, modeled as a homogeneous random field, a vehicle-bridge-interaction (VBI) model is utilized. Numerical validation under the different loading conditions demonstrates that a single moving sensor can be used to identify the modal parameters quite accurately, with high spatial resolution of the identified mode shapes, offering a cost-effective and efficient alternative for bridge health monitoring.
2025-09-05T17:34:21Z
56 pages, 18 figures
Dhiraj Ghosh, Suparno Mukhopadhyay, Shaily Jain

http://arxiv.org/abs/2509.04546v1
The Actuary's Final Word on Algorithmic Decision Making
2025-09-04T16:59:20Z
Paul Meehl's foundational work "Clinical versus Statistical Prediction" provided early theoretical justification and empirical evidence of the superiority of statistical methods over clinical judgment. Despite a century of empirical evidence supporting Meehl's central thesis, from early parole prediction studies in the 1920s to modern meta-analyses, confusion persists regarding when and why his troubling finding applies. This paper provides a contemporary theoretical justification for Meehl's result. Importantly, Meehl's prediction problems require a small set of possible outcomes and machine-readable data. Second, individual predictions and decisions are evaluated only on average. This formulation leads to a natural analysis from statistical decision theory, which shows that statistical rules are more accurate than clinical intuition almost by definition. Meehl's prediction paradox is an example of metrical determinism, where the rules of evaluation implicitly determine the best procedure.
The decision-theoretic analysis of Meehl's problem elucidates the utility of algorithmic systems as decision-support tools, but also reveals their natural shortcomings, inducing expertise erosion, decision fatigue, and the usurpation of discretionary judgment.
2025-09-04T16:59:20Z
Benjamin Recht

http://arxiv.org/abs/2508.07754v2
Asymptotic Consistency and Generalization in Hybrid Models of Regularized Selection and Nonlinear Learning
2025-08-31T10:45:21Z
This study explores how different types of supervised models perform in the task of predicting and selecting relevant variables in high-dimensional contexts, especially when the data is very noisy. We analyzed three approaches: regularized models (such as Lasso, Ridge, and Elastic Net), black-box models (such as Random Forest, XGBoost, LightGBM, CatBoost, and H2O GBM), and hybrid models that combine both approaches: regularization with nonlinear algorithms. Based on simulations inspired by the Friedman equation, we evaluated 23 models using three complementary metrics: RMSE, Jaccard index, and recall rate. The results reveal that, although black-box models excel in predictive accuracy, they lack interpretability and simplicity, essential factors in many real-world contexts. Regularized models, on the other hand, proved to be more sensitive to an excess of irrelevant variables. In this scenario, hybrid models stood out for their balance: they maintain good predictive performance, identify relevant variables more consistently, and offer greater robustness, especially as the sample size increases.
Therefore, we recommend using this hybrid framework in market applications, where it is essential that the results make sense in a practical context and support decisions with confidence.
2025-08-11T08:36:03Z
Luciano Ribeiro Galvão, Rafael de Andrade Mora

http://arxiv.org/abs/2508.21523v1
Quantile Function-Based Models for Neuroimaging Classification Using Wasserstein Regression
2025-08-29T11:23:50Z
We propose a novel quantile function-based approach for neuroimaging classification using Wasserstein-Fréchet regression, specifically applied to the detection of mild traumatic brain injury (mTBI) based on MEG and MRI data. Conventional neuroimaging classification methods for mTBI detection typically extract summary statistics from brain signals across the different epochs, which may result in the loss of important distributional information, such as variance, skewness, and kurtosis. Our approach treats complete probability density functions of epoch space results as functional response variables within a Wasserstein-Fréchet regression framework, thereby preserving the full distributional characteristics of epoch results from $L_{1}$ minimum norm solutions. The global Wasserstein-Fréchet regression model incorporating covariates (age and gender) allows us to directly compare the distributional patterns between healthy control subjects and mTBI patients. The classification procedure computes Wasserstein distances between estimated quantile functions from the control and patient groups. These distances are then used as the basis for diagnostic decisions. This framework offers a statistically principled approach to improving diagnostic accuracy in mTBI detection. In practical applications, the test accuracy on unseen data from Innovision IP's dataset achieves up to 98\%.
2025-08-29T11:23:50Z
17 pages, 2 figures
Jie Li, Gary Green, Jian Zhang