https://arxiv.org/api/DEAktAycGsbamIwoPp4Tc2BNWic 2026-03-24T11:10:47Z 22812 45 15 http://arxiv.org/abs/2510.17641v5 Are penalty shootouts better than a coin toss? Evidence from international club football in Europe 2026-03-20T10:17:08Z Penalty shootouts play a crucial role in the knockout stage of major football tournaments. Their importance has substantially increased from the 2021/22 season, when the Union of European Football Associations (UEFA) scrapped the away goals rule. Our paper examines whether the outcome of a penalty shootout can be predicted in UEFA club competitions. Based on all shootouts between 2000 and 2025, no evidence is found for the effect of the kicking order, the field of the match, or psychological momentum. In contrast to previous results, we do not detect any relationship between shootout success and relative team strength, quantified by differences in Elo ratings and the implied winning probability. Thus, the hypothesis that penalty shootouts are close to a coin toss in international competitions for European football clubs cannot be rejected. 2025-10-20T15:21:44Z 23 pages, 5 figures, 7 tables László Csató Dóra Gréta Petróczy http://arxiv.org/abs/2603.20345v1 Towards Improved Short-term Hypoglycemia Prediction and Diabetes Management based on Refined Heart Rate Data 2026-03-20T10:13:31Z Hypoglycemia is a severe condition of decreased blood glucose, specifically below 70 mg/dL (3.9 mmol/L). This condition can often be asymptomatic and challenging to predict in individuals with type 1 diabetes (T1D). Research on hypoglycemic prediction typically uses a combination of blood glucose readings and heart rate data to predict hypoglycemic events. Given that these features are collected through wearable sensors, they can sometimes have missing values, necessitating efficient imputation methods. 
This work makes significant contributions to the current state of the art by introducing two novel imputation techniques for imputing heart rate values over short-term horizons: Controlled Weighted Rational Bézier Curves (CRBC) and Controlled Piecewise Cubic Hermite Interpolating Polynomial with mapped peaks and valleys of Control Points (CMPV). In addition to these imputation methods, we employ two metrics to capture data patterns, alongside a combined metric that integrates the strengths of both individual metrics with RMSE scores for a comprehensive evaluation of the imputation techniques. According to our combined metric assessment, CMPV outperforms the alternatives with an average score of 0.33 across all time gaps, while CRBC follows with a score of 0.48. These findings clearly demonstrate the effectiveness of the proposed imputation methods in accurately filling in missing heart rate values. Moreover, this study facilitates the detection of abnormal physiological signals, enabling the implementation of early preventive measures for more accurate diagnosis. 2026-03-20T10:13:31Z 10 pages, 2 tables Vaibhav Gupta Florian Grensing Beyza Cinar Louisa van den Boom Maria Maleshkova http://arxiv.org/abs/2603.20343v1 A practical introduction to ODE modelling in Stan for biological systems 2026-03-20T10:02:39Z Integrating dynamical systems models with time series data is a central part of contemporary mathematical biology. With the rich variety of available models and data, numerous methods and computational tools have been developed for these purposes. One such tool is Stan, a freely available and open-source probabilistic programming framework that provides efficient methods for estimating model parameters from data using computational Bayesian inference algorithms. 
Stan includes built-in mechanisms for working with ordinary differential equation (ODE) models, which are widely used in mathematical biology and related fields to study simulated, experimental, and real-world systems that change over time. Through step-by-step worked examples, including both pedagogical toy models and applications with real data, this article provides a practical, self-contained introduction to performing parameter estimation and model evaluation for first-order linear and nonlinear ODE models in Stan. The article also explains key statistical methods that underpin Stan and discusses computational Bayesian modelling in the context of biological applications. 2026-03-20T10:02:39Z 23 pages, 10 figures Sara Hamis John Forslund Cici Chen Gu Jodie A. Cochrane http://arxiv.org/abs/2603.19756v1 Extraction of tabulated statistical results with tableParser 2026-03-20T08:41:37Z Tabulated content is omnipresent in scientific literature. This work presents the R package *tableParser*, designed to extract and postprocess tables from NISO-JATS-encoded XML, HTML, DOCX, and, with limitations, PDF documents. *tableParser* focuses on extracting and analyzing statistical test results reported in scientific publications. It can be used for large-scale analysis of effect sizes, reporting practices, or summarization of results, as well as for checking completeness and consistency of standard test results in unpublished documents. Documents can be processed in three decoding levels. *table2matrix()* compiles all tables into a list of character matrices with captions and footnotes. *table2text()* collapses the matrix contents into human-readable text, mimicking a screen reader. Optionally, many common codings that are reported within the table's caption and footnote can be used to decode and expand the table's content. 
The collapsed and decoded table content can be further processed to match an ideal input for the extraction of statistical standard results with the *standardStats()* function from the *JATSdecoder* package. The output of *table2stats()* is a data frame with all detected standard results as columns and, if calculation is possible, a recalculated p-value. If desired, an automated consistency check of the reported and the coded p-values with the recalculated p-value can be initiated. *tableParser* works best on barrier-free HTML tables encoded in NISO-JATS, where captions and footnotes are clearly identifiable. By guessing the tables' captions and footnotes conservatively, the processing of tables within HTML and DOCX documents is comparatively robust. Technically, tables in PDFs often fail to be correctly extracted, with captions and footnotes not detectable. Therefore, a decoding of codes is not possible, which lowers *tableParser*'s decoding accuracy on PDFs. 2026-03-20T08:41:37Z 16 pages, 14 tables Ingmar Böschen http://arxiv.org/abs/2502.04082v2 Market-based insurance ratemaking: application to pet insurance 2026-03-20T07:07:03Z This paper introduces a method for pricing insurance policies using market data. The approach is designed for scenarios in which the insurance company seeks to enter a new market (in our case, pet insurance) without historical data. The methodology involves an iterative two-step process. First, a suitable parameter is proposed to characterize the underlying risk. Second, the resulting pure premium is linked to the observed commercial premium using an isotonic regression model. To validate the method, comprehensive testing is conducted on synthetic data, followed by its application to a dataset of actual pet insurance rates. To facilitate practical implementation, we have developed an R package called IsoPriceR. 
By addressing the challenge of pricing insurance policies in the absence of historical data, this method helps enhance pricing strategies in emerging markets. 2025-02-06T13:46:52Z Pierre-Olivier Goffard Pierrick Piette Gareth W. Peters http://arxiv.org/abs/2603.19640v1 Logistic-aided Huber M-estimator for robust GNSS positioning 2026-03-20T04:45:35Z This paper develops a logistic-aided Huber (LAH) M-estimator for robust GNSS positioning under long-tailed, multipath-affected measurement errors. The key idea is to leverage a logistic measurement error assumption and establish a one-to-one approximation between the logistic-based loglikelihood (i.e., quasi-log-cosh) and the Huber kernel by matching their score functions. This yields closed-form tuning rules for the scale and threshold parameters in the Huber estimator, grounded on logistic error statistical properties. We further show that the proposed LAH estimator preserves comparable efficiency and robustness to the connected logistic-based least quasi-log-cosh (LQLC) estimator. Both Monte Carlo simulations with long-tailed measurement errors and a one-hour urban GNSS dataset confirm that the proposed logistic-statistics-based tuning improves positioning accuracy and precision while suppressing large error spikes. Specifically, LAH reduces the 2D RMSE/STD by 28.03%/38.83% versus conventional 95%-efficiency-based Huber tuning in simulation, and reduces the overall 3D RMSE/STD by 4.85%/16.68% in real-world experiments while suppressing large positioning error spikes by up to 51%. 2026-03-20T04:45:35Z Submitted to IEEE Transactions on Aerospace and Electronic Systems Zhengdao Li Penggao Yan Li-Ta Hsu http://arxiv.org/abs/2603.19549v1 Plagiarism or Productivity? Students Moral Disengagement and Behavioral Intentions to Use ChatGPT in Academic Writing 2026-03-20T01:21:34Z This study examined how moral disengagement influences Filipino college students' intention to use ChatGPT in academic writing. 
The model tested five mechanisms: moral justification, euphemistic labeling, displacement of responsibility, minimizing consequences, and attribution of blame. These mechanisms were analyzed as predictors of attitudes, subjective norms, and perceived behavioral control, which then predicted behavioral intention. A total of 418 students with ChatGPT experience participated. The results showed that several moral disengagement mechanisms influenced students' attitudes and sense of control. Among the predictors, attribution of blame had the strongest influence, while attitudes had the highest impact on behavioral intention. The model explained more than half of the variation in intention. These results suggest that students often rely on institutional gaps and peer behavior to justify AI use. Many believe it is acceptable to use ChatGPT for learning or when rules are unclear. This shows a need for clear academic integrity policies, ethical guidance, and classroom support. The study also recognizes that intention-based models may not fully explain student behavior. Emotional factors, peer influence, and convenience can also affect decisions. The results provide useful insights for schools that aim to support responsible and informed AI use in higher education. 2026-03-20T01:21:34Z 5 pages, 1 figure, 2 table, conference proceeding 2025 International Workshop on Artificial Intelligence and Education (2026) 383-387 John Paul P. Miranda Rhiziel P. Manalese Mark Anthony A. Castro Renen Paul M. Viado Vernon Grace M. Maniago Rudante M. Galapon Jovita G. Rivera Amado B. Martinez 10.1109/WAIE67422.2025.11381217 http://arxiv.org/abs/2506.12177v2 A proxy-based approach for unmeasured confounding in electronic health records research 2026-03-19T22:11:41Z Electronic health records (EHR) are widely used to study clinical decisions, yet unmeasured confounding remains a persistent challenge. Proxy variables offer a potential solution. 
In EHR data, clinicians already record many such measurements (e.g., vitals), each revealing something about a patient's underlying health. Despite this, proxy-based methods are rarely used in practice. We introduce a new way to use proxies to adjust for unmeasured confounding. Our approach uses a vector of proxies to construct covariates that capture aspects of the unmeasured confounder, which are then included in a regression model. As one implementation, we use factor analysis followed by regression. We compare this approach with existing methods, including proximal causal inference, across a range of realistic settings. In practice, assumptions rarely hold exactly, so we study what happens when models are misspecified and variables are used incorrectly: e.g., a confounder or instrument is treated as a proxy. Finally, we apply the method to EHR data to estimate the effect of hospital admission for older adults presenting to the emergency department with chest pain, a setting where unmeasured confounding is a substantial concern. This work provides a practical way to use proxies and may help bring proxy-based methods into broader use. 2025-06-13T18:57:04Z Haley Colgate Kottler Amy Cochran http://arxiv.org/abs/1911.01850v3 Stabilizing Variable Selection and Regression 2026-03-19T19:43:05Z We consider regression in which one predicts a response $Y$ with a set of predictors $X$ across different experiments or environments. This is a common setup in many data-driven scientific fields and we argue that statistical inference can benefit from an analysis that takes into account the distributional changes across environments. In particular, it is useful to distinguish between stable and unstable predictors, i.e., predictors which have a fixed or a changing functional dependence on the response, respectively. We introduce stabilized regression which explicitly enforces stability and thus improves generalization performance to previously unseen environments. 
Our work is motivated by an application in systems biology. Using multiomic data, we demonstrate how hypothesis generation about gene function can benefit from stabilized regression. We believe that a similar line of argument for exploiting heterogeneity in data can be powerful for many other applications as well. We draw a theoretical connection between multi-environment regression and causal models, which allows us to graphically characterize stable versus unstable functional dependence on the response. Formally, we introduce the notion of a stable blanket, which is a subset of the predictors that lies between the direct causal predictors and the Markov blanket. We prove that this set is optimal in the sense that a regression based on these predictors minimizes the mean squared prediction error given that the resulting regression generalizes to unseen new environments. 2019-11-05T15:04:33Z Niklas Pfister Evan G. Williams Jonas Peters Ruedi Aebersold Peter Bühlmann http://arxiv.org/abs/2603.19403v1 Evaluation of Individual and Trial Level Association Metrics in the Validation of a Binary Surrogate Endpoint for a True Time-to-Event Endpoint 2026-03-19T18:51:44Z Candidate binary endpoints are often considered as surrogates for time-to-event (TTE) clinical endpoints, primarily because they can be assessed at earlier time points. To be submitted for regulatory approval, candidate binary endpoints need to be validated. The most well-known method for performing such validation employs a meta-analytic framework to estimate individual-level and trial-level association. However, the performance of these association estimates in the context of a binary surrogate has not yet been examined through a comprehensive simulation study. This research aims to systematically investigate the performance of association estimates at the trial-level and at the individual-level under various trial design choices, using both simulation studies and clinical trial data, where available. 
2026-03-19T18:51:44Z 29 pages, 6 figures Renee Y. Ge Azadeh Shohoudi Malini Iyengar Quefeng Li Judy Li http://arxiv.org/abs/2603.19143v1 The Uncertain Policy Price of Scaling Direct Air Capture 2026-03-19T16:59:11Z Direct air carbon capture and storage (DACCS) is a promising CO2 removal technology, but its deployment at scale remains speculative. Yet, its technological, economic, and policy-related uncertainties have often been overlooked in mitigation pathways. This paper conducts the first uncertainty quantification and global sensitivity analysis of DACCS on technological, market, financial and public support drivers, using a detailed-process Integrated Assessment Model and newly developed sensitivity algorithms. We find that DACCS deployment exhibits a fat-tailed distribution: most scenarios show modest technology uptake, but there is a small, non-zero probability (4-6%) of achieving gigaton-scale removals by mid-century. Scaling DACCS to gigaton levels requires subsidies that always exceed 200-330 USD/tCO2 and are sustained for decades, resulting in a public support programme of 900-3000 billion USD. Such an effort pays back by mid-century, but only if accompanied by strong emission reduction policies. These findings highlight the critical role of climate policies in enabling a robust and economically sustainable CO2 removal strategy. 2026-03-19T16:59:11Z Leonardo Chiani Pietro Andreoni Laurent Drouet Tobias Schmidt Katrin Sievert Bjerne Steffen Massimo Tavoni http://arxiv.org/abs/2603.19055v1 Probabilistic multivariate statistical process control via kernel parameter uncertainty propagation 2026-03-19T15:48:48Z Kernel-based multivariate statistical process control (K-MSPC) extends classical monitoring to nonlinear industrial processes. Its performance depends critically on kernel parameters such as lengthscales and variance terms. 
In current practice, these parameters are typically selected by heuristics or deterministic optimisation, and then treated as fixed, despite being inferred from finite and noisy data. This can lead to overconfident control limits and unstable alarm behaviour when the kernel choice is uncertain. This work proposes a probabilistic K-MSPC framework that quantifies and propagates kernel parameter uncertainty to the monitoring statistics. The approach follows a two-stage workflow: (i) deterministic kernel calibration using supervised or unsupervised models, and (ii) Bayesian inference of kernel parameters via Markov chain Monte Carlo. Posterior samples are propagated through kernel Principal Component Analysis to produce probabilistic $T^2$ and squared prediction error control charts, together with uncertainty-aware contribution plots. The framework is evaluated on the Tennessee Eastman Process benchmark. Results show that posterior-mean monitoring often improves fault detection compared to deterministic prior-mean charts for the squared exponential kernel, while credible bands remain narrow in-control and widen under faults, reflecting amplified epistemic uncertainty in abnormal regimes. The automatic relevance determination kernel reduces posterior uncertainty and yields performance close to the deterministic baseline, whereas unsupervised calibration produces wider posterior bands but still robust fault detection. 2026-03-19T15:48:48Z Zina-Sabrina Duma Victoria Jorry Ayesha Safraz Maria Paola di Crosta Tuomas Sihvonen Lassi Roininen Satu-Pia Reinikainen http://arxiv.org/abs/2603.18781v1 SRRM: Improving Recursive Transport Surrogates in the Small-Discrepancy Regime 2026-03-19T11:32:28Z Recursive partitioning methods provide computationally efficient surrogates for the Wasserstein distance, yet their statistical behavior and their resolution in the small-discrepancy regime remain insufficiently understood. 
We study Recursive Rank Matching (RRM) as a representative instance of this class under a population-anchored reference. In this setting, we establish consistency and an explicit convergence rate for the anchored empirical RRM under the quadratic cost. We then identify a dominant mismatch mechanism responsible for the loss of resolution in the small-discrepancy regime. Based on this analysis, we introduce Selective Recursive Rank Matching (SRRM), which suppresses the resulting dominant mismatches and yields a higher-fidelity practical surrogate for the Wasserstein distance at moderate additional computational cost. 2026-03-19T11:32:28Z 29 pages, 20 figures Yufei Zhang Tao Wang Jingyi Zhang http://arxiv.org/abs/2507.04303v3 Forecasting age distribution of deaths across countries: Life expectancy and annuity valuation 2026-03-18T22:47:54Z In this paper, we provide a comprehensive cross-country validation study of compositional mortality modeling and forecasting methods. We consider two one-to-one transformations: the cumulative distribution function and the centered log-ratio transformation in compositional data analysis. Between the two transformations, the cumulative distribution function provides a scale-free way to visualize the gender gap and cross-country heterogeneity in the probability of dying by sex and country. Drawing on age-specific period life-table death counts from 24 countries in the Human Mortality Database (2025), we assess and compare the point and interval forecast accuracy of the two transformations, using the same forecasting method. Enhancing the forecast accuracy of period life-table death counts is of significant value to demographers, who rely on such forecasts to estimate survival probabilities and life expectancy, and to actuaries, who use them to price annuities across various entry ages and maturities. 
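The two one-to-one transformations named in the mortality forecasting abstract above can be illustrated with a minimal sketch. It assumes the life-table death counts have been normalized to strictly positive proportions that sum to one. The toy vector `deaths` and all function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def clr(p):
    """Centered log-ratio transform of a strictly positive composition."""
    logp = np.log(p)
    return logp - logp.mean(axis=-1, keepdims=True)

def clr_inverse(z):
    """Map clr coordinates back to proportions that sum to one."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def cdf_transform(p):
    """Cumulative distribution function over the age groups."""
    return np.cumsum(p, axis=-1)

def cdf_inverse(c):
    """Recover the proportions by differencing the cumulative values."""
    return np.diff(c, axis=-1, prepend=0.0)

# Hypothetical toy age distribution of deaths (five age groups).
deaths = np.array([0.05, 0.10, 0.25, 0.40, 0.20])

z = clr(deaths)            # unconstrained coordinates that sum to zero
c = cdf_transform(deaths)  # non-decreasing values ending at one

# Both transformations are invertible, so a forecast made on the
# transformed scale can be mapped back to a valid death distribution.
deaths_from_clr = clr_inverse(z)
deaths_from_cdf = cdf_inverse(c)
```

Because both maps are invertible, the same forecasting method can be applied on either scale and the results compared after back-transformation, which is the kind of comparison the abstract describes.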
2025-07-06T09:03:48Z 34 pages, 15 figures, 5 tables Han Lin Shang Steven Haberman http://arxiv.org/abs/2603.18279v1 Covariate-Dependent Functional Principal Component Analysis for SHM 2026-03-18T20:54:13Z In Structural Health Monitoring (SHM), sensor measurements and derived features such as eigenfrequencies often exhibit systematic daily patterns and can therefore be naturally represented as functional data. Furthermore, these patterns are typically influenced by environmental factors, particularly temperature, which can substantially affect the observed system response. While most existing methods for removing environmental effects assume that confounding influences affect only the mean response, it has been shown that environmental and operational factors may also alter the covariance structure of the residual process. To address this limitation in a functional data monitoring framework, we incorporate so-called covariate-dependent functional principal component analysis (CD-FPCA), which allows eigenfunctions and eigenvalues of the residual process to vary smoothly with covariates such as temperature. The proposed methodology is illustrated using an extended version of the KW51 railway bridge eigenfrequency dataset. This case study suggests that accounting for covariate effects beyond the functional mean can improve the robustness of the monitoring procedure, in particular by reducing environmentally induced (false) alarms under challenging low-temperature conditions. 2026-03-18T20:54:13Z 10 pages, 3 figures, conference Philipp Wittenberg Lizzie Neumann Kristof Maes Jan Gertheiss
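The idea of eigenvalues and eigenfunctions that vary smoothly with a covariate, as described in the CD-FPCA abstract above, can be sketched with a simplified stand-in: a Gaussian-kernel-weighted covariance of discretized daily curves, eigendecomposed at a chosen temperature. This is not the authors' estimator. The function `local_fpca`, the bandwidth, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def local_fpca(curves, covariate, target, bandwidth, n_components=2):
    """Eigendecompose a kernel-weighted covariance of discretized curves.

    curves:    (n_days, n_grid) array, one daily profile per row
    covariate: (n_days,) array, e.g. a daily temperature
    target:    covariate value at which the components are evaluated
    """
    # Gaussian kernel weights centred on the target covariate value.
    w = np.exp(-0.5 * ((covariate - target) / bandwidth) ** 2)
    w = w / w.sum()
    local_mean = w @ curves                     # locally weighted mean curve
    centered = curves - local_mean
    cov = (centered * w[:, None]).T @ centered  # locally weighted covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order]

# Synthetic data (hypothetical): the score variance is larger below 0 degrees.
rng = np.random.default_rng(0)
n_days, n_grid = 400, 24
grid = np.linspace(0.0, 1.0, n_grid)
temperature = rng.uniform(-10.0, 30.0, n_days)
score_sd = np.where(temperature < 0.0, 3.0, 1.0)
scores = rng.normal(0.0, 1.0, n_days) * score_sd
shape = np.sin(2.0 * np.pi * grid)
curves = scores[:, None] * shape + 0.1 * rng.normal(size=(n_days, n_grid))

vals_cold, vecs_cold = local_fpca(curves, temperature, target=-5.0, bandwidth=3.0)
vals_warm, vecs_warm = local_fpca(curves, temperature, target=20.0, bandwidth=3.0)
```

In this toy setting the leading eigenvalue at the cold target is larger than at the warm target, which mirrors the abstract's point that the covariance structure, and not only the mean, can depend on temperature. A monitoring scheme that assumed a single global covariance would miss that change.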