http://arxiv.org/abs/2603.00098v1 Profiling vs. Case-specific Evidence: A Probabilistic Analysis 2026-02-17T13:43:02Z

The use of profiling evidence in criminal trials is a longstanding controversy in legal epistemology and evidence law theory. Many scholars, even when they oppose its use at trial, still assume that profiling evidence can be probative of guilt. We reject that assumption. Profiling evidence may support a generic hypothesis, but is not evidence that the defendant is guilty of the specific crime of which they are accused. We contrast profiling evidence with case-specific evidence, which speaks more directly to the facts of the case. Our critique departs from others by grounding the argument in a probabilistic analysis of evidentiary value. We also explore the implications of our account for debates about stereotyping.

2026-02-17T13:43:02Z 16 pages Marcello Di Bello Nicolò Cangiotti Michele Loi http://arxiv.org/abs/2602.14284v1 Benchmarking AI Performance on End-to-End Data Science Projects 2026-02-15T19:16:04Z

Data science is an integrated workflow of technical, analytical, communication, and ethical skills, but current AI benchmarks focus mostly on constituent parts. We test whether AI models can generate end-to-end data science projects. To do this we create a benchmark of 40 end-to-end data science projects with associated rubric evaluations. We use these to build an automated grading pipeline that systematically evaluates the data science projects produced by generative AI models. We find the extent to which generative AI models can complete end-to-end data science projects varies considerably by model. Most recent models did well on structured tasks, but there were considerable differences on tasks that needed judgment. These findings suggest that while AI models could approximate entry-level data scientists on routine tasks, they require verification.

2026-02-15T19:16:04Z Evelyn Hughes Rohan Alexander http://arxiv.org/abs/2602.13565v1 An Improved Milstein Method for the Numerical Solution of Multidimensional Stochastic Differential Equations 2026-02-14T02:54:38Z

Stochastic differential equations (SDEs) offer powerful and accessible mathematical models for capturing both deterministic and probabilistic aspects of dynamic behavior across a wide range of physical, financial, and social systems. However, analytical solutions for many SDEs are often unavailable, necessitating the use of numerical approximation methods. The rate of convergence of such numerical methods is of great importance, as it directly influences both computational efficiency and accuracy. This paper presents a proposed theorem, along with its proof, that facilitates the numerical evaluation of the strong (and weak) order of convergence of a numerical scheme for an SDE when the analytical solution is unavailable. Additionally, we address the challenge of numerically computing the multiple stochastic integrals required by the Milstein method to achieve improved convergence rates for multidimensional SDEs. In this context, two newly proposed numerical techniques for computing these multiple stochastic integrals are introduced and compared with existing approaches in terms of efficiency and effectiveness. The methodologies are further illustrated through simulation studies and applications to widely used financial models.
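To make the convergence-order discussion concrete, here is a minimal sketch of the textbook scalar Milstein scheme, compared with Euler-Maruyama. It is not the paper's proposed theorem or its new multiple-integral techniques. It assumes geometric Brownian motion dX = mu X dt + sigma X dW with illustrative values mu = 1.5 and sigma = 1.0, chosen because the exact solution is available for computing the strong error. In one dimension the iterated integral I_(1,1) = (dW^2 - h)/2 has a closed form. In the multidimensional case, the cross integrals I_(j,k) with j != k do not, and that is the difficulty the abstract addresses. The function name `strong_errors` is hypothetical.

```python
import math
import random


def strong_errors(n_steps, mu=1.5, sigma=1.0, x0=1.0, T=1.0, n_paths=1000, seed=0):
    """Monte Carlo estimate of the mean absolute endpoint error E|X_N - X(T)|
    for Euler-Maruyama and Milstein applied to geometric Brownian motion
    dX = mu*X dt + sigma*X dW, whose exact solution is known in closed form."""
    rng = random.Random(seed)
    h = T / n_steps
    sqrt_h = math.sqrt(h)
    total_em = total_mil = 0.0
    for _ in range(n_paths):
        x_em = x_mil = x0
        w = 0.0  # Brownian path value W(t), reused for the exact solution
        for _ in range(n_steps):
            dw = rng.gauss(0.0, sqrt_h)
            w += dw
            # Euler-Maruyama: drift and diffusion increments only
            x_em += mu * x_em * h + sigma * x_em * dw
            # Milstein adds b(x) * b'(x) * I_(1,1), where the scalar iterated
            # integral I_(1,1) = (dW^2 - h) / 2, b(x) = sigma*x and b'(x) = sigma
            x_mil += (mu * x_mil * h + sigma * x_mil * dw
                      + 0.5 * sigma ** 2 * x_mil * (dw ** 2 - h))
        exact = x0 * math.exp((mu - 0.5 * sigma ** 2) * T + sigma * w)
        total_em += abs(x_em - exact)
        total_mil += abs(x_mil - exact)
    return total_em / n_paths, total_mil / n_paths


for n in (16, 64, 256):
    em, mil = strong_errors(n)
    print(f"N={n:4d}  Euler-Maruyama {em:.4f}  Milstein {mil:.4f}")
```

Quartering the step size should roughly halve the Euler-Maruyama error (strong order 1/2) and roughly quarter the Milstein error (strong order 1). Estimating these orders without an exact solution is the problem the paper's theorem targets.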

2026-02-14T02:54:38Z Paromita Banerjee Anirban Mondal http://arxiv.org/abs/2602.12216v1 Bayesian inference for the automultinomial model with an application to landcover data 2026-02-12T17:54:02Z

Multicategory lattice data arise in a wide variety of disciplines such as image analysis, biology, and forestry. We consider modeling such data with the automultinomial model, which can be viewed as a natural extension of the autologistic model to multicategory responses, or equivalently as an extension of the Potts model that incorporates covariate information into a pure-intercept model. The automultinomial model has the advantage of having a unique parameter that controls the spatial correlation. However, the model's likelihood involves an intractable normalizing function of the model parameters that poses serious computational problems for likelihood-based inference. We address this difficulty by performing Bayesian inference through the Double-Metropolis Hastings algorithm, and implement diagnostics to assess the convergence to the target posterior distribution. Through simulation studies and an application to land cover data, we find that the automultinomial model is flexible across a wide range of spatial correlations while maintaining a relatively simple specification. For large data sets we find it also has advantages over spatial generalized linear mixed models. To make this model practical for scientists, we provide recommendations for its specification and computational implementation.

2026-02-12T17:54:02Z Maria Paula Duenas-Herrera Stephen Berg Murali Haran http://arxiv.org/abs/2504.08263v2 A roadmap for systematic identification and analysis of multiple biases in causal inference 2026-02-10T21:49:53Z

Observational studies examining causal effects rely on unverifiable assumptions, the violation of which can induce multiple biases. Quantitative bias analysis (QBA) methods examine the sensitivity of findings to such violations, generally, by producing estimates under alternative assumptions, incorporating external information. Although substantial guidance exists for implementing QBA, there is limited guidance on how to systematically determine the assumptions underlying a primary causal analysis and the potential violations that should guide bias analysis. Consequently, many assumptions remain implicit, leading to selective and therefore misleading QBA. To address this gap, we propose a roadmap for systematically identifying and analysing multiple biases. Briefly, this consists of (1) articulating the assumptions underlying the primary analysis through specification and emulation of the ideal trial that defines the causal estimand and depicting these assumptions using a causal diagram; (2) extending the diagram to depict alternative assumptions under which biases may arise; (3) obtaining a single estimate that simultaneously corrects for all potential biases. We illustrate the roadmap using an investigation of the effect of breastfeeding on risk of childhood asthma, and through simulations illustrate the need for analysing multiple biases jointly rather than one at a time.

2025-04-11T05:30:32Z 12 Pages, 4 Figures Rushani Wijesuriya Rachael A. Hughes John B. Carlin Rachel L. Peters Jennifer J. Koplin Margarita Moreno-Betancur http://arxiv.org/abs/2506.05776v3 Analyzing the retraining frequency of global forecasting models: towards more stable forecasting systems 2026-02-10T14:56:53Z

Forecast stability, that is, the consistency of predictions over time, is essential in business settings where sudden shifts in forecasts can disrupt planning and erode trust in predictive systems. Despite its importance, stability is often overlooked in favor of accuracy. In this study, we evaluate the stability of point and probabilistic forecasts across several retraining scenarios using three large forecasting datasets and ten different global forecasting models. To analyze stability in the probabilistic setting, we propose a new model-agnostic, distribution-free, and scale-free metric that measures probabilistic stability: the Scaled Multi-Quantile Change (SMQC). The results show that less frequent retraining not only preserves but often improves forecast stability, challenging the need for frequent retraining. Moreover, the study shows that accuracy and stability are not necessarily conflicting objectives when adopting a global modeling approach. The study promotes a shift toward stability-aware forecasting practices, proposing a new metric to evaluate forecast stability effectively in probabilistic settings, and offering practical guidelines for building more stable and sustainable forecasting systems.

2025-06-06T06:13:29Z Marco Zanotti http://arxiv.org/abs/2509.14218v2 Adaptive Off-Policy Inference for M-Estimators Under Model Misspecification 2026-02-08T13:47:37Z

When data are collected adaptively, such as in bandit algorithms, classical statistical approaches such as ordinary least squares and $M$-estimation will often fail to achieve asymptotic normality. Although recent lines of work have modified the classical approaches to ensure valid inference on adaptively collected data, most of these works assume that the model is correctly specified. The misspecified setting poses unique challenges because the parameter of interest itself may not be well-defined over a non-stationary distribution of rewards. We therefore tackle the problem of \emph{off-policy} inference in adaptive settings, where we uniquely define a projected solution over a stationary evaluation policy. Our method provides valid inference for $M$-estimators that use adaptively collected bandit data with a possibly misspecified working model. A key ingredient in our approach is the use of flexible approaches to stabilize the variance induced by adaptive data collection. A major novelty is that the procedure enables the construction of valid confidence sets even in settings where treatment policies are unstable and non-converging, such as when there is no unique optimal arm and standard bandit algorithms are used. Empirical results on semi-synthetic datasets constructed from the Osteoarthritis Initiative demonstrate that the method maintains type I error control, while existing methods for inference in adaptive settings fail to achieve nominal coverage in the misspecified case.

2025-09-17T17:51:40Z 43 pages, 6 figures James Leiner Robin Dunn Aaditya Ramdas http://arxiv.org/abs/2502.11510v3 Here Be Dragons: Bimodal posteriors arise from numerical integration error in longitudinal models 2026-02-07T00:39:44Z

Longitudinal models with dynamics governed by differential equations may require numerical integration alongside parameter estimation. We have identified a situation where the numerical integration introduces error in such a way that it becomes a novel source of non-uniqueness in estimation. We obtain two very different sets of parameters, one of which is a good estimate of the true values and the other a very poor one. The two estimates have forward numerical projections statistically indistinguishable from each other because of numerical error. In such cases, the posterior distribution for parameters is bimodal, with a dominant mode closer to the true parameter value, and a second cluster around the errant value. We demonstrate that multi-modality exists both theoretically and empirically for an affine first order differential equation, that a simulation workflow can test for evidence of the issue more generally, and that Markov Chain Monte Carlo sampling with a suitable solution can avoid bimodality. The issue of multi-modal posteriors arising from numerical error has consequences for Bayesian inverse methods that rely on numerical integration more broadly.

2025-02-17T07:26:15Z 33 pages, 7 figures, 2 tables Tess O'Brien Matthew T. Moores David Warton Daniel Falster http://arxiv.org/abs/2407.11518v2 Ensemble Transport Filter via Optimized Maximum Mean Discrepancy 2026-02-06T17:48:27Z

In this paper, we present a new ensemble-based filter method by reconstructing the analysis step of the particle filter through a transport map, which directly transports prior particles to posterior particles. The transport map is constructed through an optimization problem described by the Maximum Mean Discrepancy loss function, which matches the expectation information of the approximated posterior and reference posterior. The proposed method inherits the accurate estimation of the posterior distribution from particle filtering while extending it to high-dimensional assimilation problems. To improve the robustness of Maximum Mean Discrepancy, a variance penalty term is used to guide the optimization. It prioritizes minimizing the discrepancy between the expectations of highly informative statistics for the reference posteriors. The penalty term significantly enhances the robustness of the proposed method and leads to a better approximation of the posterior. A few numerical examples are presented to illustrate the advantage of the proposed method over the ensemble Kalman filter.
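For readers unfamiliar with the loss, here is a minimal sketch of the empirical squared Maximum Mean Discrepancy between two particle sets. It uses the biased (V-statistic) estimator with a Gaussian kernel. The kernel choice, the bandwidth, and the function names are illustrative assumptions, not the paper's settings. The sketch also omits the transport map and the variance penalty term.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) for equal-length tuples."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples:
    mean k(x, x') + mean k(y, y') - 2 * mean k(x, y)."""
    def mean_kernel(a, b):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Illustrative 2-D particle sets (made-up numbers)
prior = [(0.0, 0.0), (1.0, 0.5), (-0.5, 1.0), (0.3, -0.7)]
shifted = [(x + 0.2, y) for x, y in prior]
far = [(x + 3.0, y) for x, y in prior]
print(mmd2(prior, prior), mmd2(prior, shifted), mmd2(prior, far))
```

The estimate is zero for identical samples and grows as the two particle clouds separate. In a transport-filter setting, a quantity of this kind would be minimized over the map's parameters so that transported particles match a reference posterior sample.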

2024-07-16T08:54:12Z 27 pages, 14 figures Dengfei Zeng Lijian Jiang 10.1016/j.jcp.2025.114582 http://arxiv.org/abs/2405.07102v6 Nested Instrumental Variables Analysis: Switcher Average Treatment Effect, Identification, Efficient Estimation and Generalizability 2026-02-05T00:45:31Z

Instrumental variables (IVs) are widely used to estimate causal effects from non-randomized data. A canonical example is a randomized trial with noncompliance, in which the randomized treatment assignment serves as an IV for the non-ignorable treatment received. Under a monotonicity assumption, a valid IV nonparametrically identifies the average treatment effect among a latent complier subgroup, whose generalizability is often under debate. In many studies, there exist multiple versions of an IV, for instance, different nudges to take the same treatment in different study sites in a multicenter clinical trial. These different versions of an IV may result in different compliance rates and offer a unique opportunity to study IV estimates' generalizability. In this article, we introduce a novel nested IV assumption and study identification of the average treatment effect among two latent subgroups: always-compliers and switchers, who are defined based on the joint potential treatment received under two versions of a binary IV. We derive the efficient influence function for the SWitcher Average Treatment Effect (SWATE) under a nonparametric model and propose efficient estimators. We then propose formal statistical tests of the generalizability of IV estimates under the nested IV framework. The proposed tests are flexible nonparametric generalizations of classical overidentification tests that allow estimating nuisance parameters using machine learning tools. We apply the proposed method to the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial and study the causal effect of colorectal cancer screening and its generalizability.
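As background for the single-IV starting point described above, here is a minimal sketch of the classical Wald estimator. With binary randomized assignment z and binary treatment received d, and under monotonicity, the ratio of the intention-to-treat effect on the outcome to the intention-to-treat effect on uptake estimates the complier average treatment effect. This is not the paper's SWATE estimator or its generalizability tests, which rely on a nested-IV assumption with two IV versions. The function name and toy data are hypothetical.

```python
def wald_complier_effect(z, d, y):
    """Wald IV estimate: ITT effect on the outcome divided by the ITT effect
    on treatment received, for binary assignment z and binary treatment d."""
    def mean_by_arm(values, arm):
        sel = [v for v, zi in zip(values, z) if zi == arm]
        if not sel:
            raise ValueError(f"no units with z == {arm}")
        return sum(sel) / len(sel)

    itt_outcome = mean_by_arm(y, 1) - mean_by_arm(y, 0)
    # Under monotonicity, this difference estimates the share of compliers
    itt_uptake = mean_by_arm(d, 1) - mean_by_arm(d, 0)
    if itt_uptake == 0:
        raise ValueError("assignment does not shift treatment received; effect not identified")
    return itt_outcome / itt_uptake


# Toy trial with noncompliance (illustrative numbers only)
z = [1, 1, 1, 1, 0, 0, 0, 0]   # randomized assignment (the IV)
d = [1, 1, 1, 0, 0, 0, 0, 1]   # treatment actually received
y = [3, 3, 3, 1, 1, 1, 1, 3]   # outcome
print(wald_complier_effect(z, d, y))  # ITT 1.0 / uptake 0.5 -> 2.0
```

Different versions of an IV can shift uptake by different amounts. This changes the complier subgroup each Wald ratio refers to, which is why the generalizability of such estimates is debated.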

2024-05-11T22:05:52Z Rui Wang Ying-Qi Zhao Oliver Dukes Bo Zhang http://arxiv.org/abs/2602.04762v1 Uncertainty in Island-based Ecosystem Services and Climate Change 2026-02-04T16:58:58Z

Small and medium-sized islands are acutely exposed to climate change and ecosystem degradation, yet the extent to which uncertainty is systematically addressed in scientific assessments of their ecosystem services remains poorly understood. This study revisits 226 peer-reviewed articles drawn from two global systematic reviews on island ecosystem services and climate change, applying a structured post hoc analysis to evaluate how uncertainty is treated across methods, service categories, ecosystem realms, and decision contexts. Studies were classified according to whether uncertainty was explicitly analysed, just mentioned, or ignored. Only 30 percent of studies incorporated uncertainty explicitly, while more than half did not address it at all. Scenario-based approaches dominated uncertainty assessment, whereas probabilistic and ensemble-based frameworks remained limited. Cultural ecosystem services and extreme climate impacts exhibited the lowest levels of uncertainty integration, and few studies connected uncertainty treatment to policy-relevant decision frameworks. Weak or absent treatment of uncertainty emerges as a structural challenge in island systems, where narrow ecological thresholds, strong land-sea coupling, limited spatial buffers, and reduced institutional redundancy amplify the consequences of decision-making under incomplete knowledge. Systematic mapping of how uncertainty is framed, operationalised, or neglected reveals persistent methodological and conceptual gaps and informs concrete directions for strengthening uncertainty integration in future island-focused ecosystem service and climate assessments. Embedding uncertainty more robustly into modelling practices, participatory processes, and policy tools is essential for enhancing scientific credibility, governance relevance, and adaptive capacity in insular socio-ecological systems.

2026-02-04T16:58:58Z Nazli Demirel Ioannis N. Vogiatzakis George Zittis Mirela Tase Attila D. Sandor Savvas Zotos Christos Zoumides Turgay Dindaroglu Mauro Fois Irene Christoforidi Valentini Stamatiadou Shiri Zemah-Shamir Tamer Albayrak Cigdem Kaptan Ayhan Paraskevi Manolaki Ina Sieber Ziv Zemah-Shamir Elli Tzirkalli Aristides Moustakas http://arxiv.org/abs/2602.04353v1 Anyone for chess? Analysing chess ratings above high thresholds 2026-02-04T09:22:40Z

Suppose some cleverness score parameter is sufficiently interesting to be defined and then measured, perhaps for different strata of specialists or for the broader population. Such scores could have Gaussian distributions across all players in a stratum, but when interest focuses on the very tails, that is, the top few percent above certain high thresholds, different models are called for, along with tools to analyse such data based on the listed top scores only. In this note I develop such models and tools, and apply them to the top-100 lists and the lists of players rated above 2100 points for regular chess ratings, for the currently active 14671 men and 753 women, as given by FIDE, January 2026. It is argued that even when two or more distributions have nearly identical expected values or medians, even smaller differences in variance may explain gaps among the very best.

2026-02-04T09:22:40Z 9 pages, 7 figures Nils Lid Hjort http://arxiv.org/abs/2602.04164v1 The Dynamics of Attention across Automated and Manual Driving Modes: A Driving Simulation Study 2026-02-04T02:57:40Z

The integration of autonomous vehicles (AVs) into transportation systems has introduced critical safety concerns, particularly regarding driver re-engagement during mode transitions. Past accidents underscore the risks of overreliance on automation and highlight the need to understand dynamic attention allocation to support safety in autonomous driving. This study explores the dynamics of driver attention to various zones, including the road, the central mirror, the embedded Human-Machine Interface (HMI), and the speedometer, across different driving modes in AVs. A high-fidelity driving simulation study was conducted. Eye-tracking technology was used to measure fixation duration, fixation count, and time to first fixation across distinct driving modes (automated, manual, and transition), which were then used to assess how drivers allocated attention to various areas of interest (AOIs). Findings show that drivers' attention varies significantly across driving modes. In manual mode, attention consistently focuses on the road, while in automated mode, drivers fixated on the embedded HMI for prolonged periods. During the handover and takeover phases, attention shifts dynamically between environmental and technological elements. The study reveals that driver attention allocation is mode-dependent. These findings inform the design of adaptive HMIs in AVs that align with drivers' attention patterns. By presenting relevant information according to the driving context, such systems can enhance driver-vehicle interaction, support effective transitions, and improve overall safety. Systematic analysis of visual attention dynamics across driving modes is gaining prominence, as it informs adaptive HMI designs and driver readiness interventions. The generalized linear mixed model (GLMM) findings can be directly applied to the design of adaptive HMIs or driver training programs to enhance attention and improve safety.

2026-02-04T02:57:40Z Yuan Cai Mustafa Demir Farzan Sasangohar Mohsen Zare http://arxiv.org/abs/2505.08395v2 Bayesian Estimation of Causal Effects Using Proxies of a Latent Interference Network 2026-02-03T14:59:43Z

Network interference occurs when treatments assigned to some units affect the outcomes of others. Traditional approaches often assume that the observed network correctly specifies the interference structure. However, in practice, researchers frequently only have access to proxy measurements of the interference network due to limitations in data collection or potential mismatches between measured networks and actual interference pathways. In this paper, we introduce a framework for estimating causal effects when only proxy networks are available. Our approach leverages a structural causal model that accommodates diverse proxy types, including noisy measurements, multiple data sources, and multilayer networks, and defines causal effects as interventions on population-level treatments. The latent nature of the true interference network poses significant challenges. To overcome them, we develop a Bayesian inference framework. We propose a Block Gibbs sampler with Locally Informed Proposals to update the latent network, thereby efficiently exploring the high-dimensional posterior space composed of both discrete and continuous parameters. The latent network updates are driven by information from the proxy networks, treatments, and outcomes. We illustrate the performance of our method through numerical experiments, demonstrating its accuracy in recovering causal effects even when only proxies of the interference network are available.

2025-05-13T09:46:30Z Bar Weinstein Daniel Nevo http://arxiv.org/abs/2602.03274v1 Six-Minute Man Sander Eitrem 5:58.52 -- first man below the 6:00.00 barrier 2026-02-03T08:58:23Z

In Calgary, November 2005, Chad Hedrick was the first to skate the 5,000 m below 6:10. His world record time 6:09.68 was then beaten a week later, in Salt Lake City, by Sven Kramer's 6:08.78. Further top races and world records followed over the ensuing seasons; up to and including the 2024-2025 season, a total of 126 races have been below 6:10, with Nils van der Poel's 2021 world record being 6:01.56. The appropriately hyped-up canonical question for the friends and followers and aficionados of speedskating has then been when (and by whom) we would for the first time witness a race below 6:00.00. In this note I first use extreme value statistics modelling to assess the state of affairs, as per the end of the 2024-2025 season, with predictions and probabilities for the 2025-2026 season. Under natural modelling assumptions the probability of seeing a new world record during this new season is shown to be about ten percent. We were indeed excited but in reality merely modestly surprised that a race better than van der Poel's record was clocked, by Timothy Loubineaud, in Salt Lake City, November 14, 2025. But Six-Minute Man Sander Eitrem's outstanding 5:58.52 in Inzell, on January 24, 2026, is truly beamonesquely shocking. I also use the modelling machinery to analyse the post-Eitrem situation, and suggest answers to the question of how fast the 5,000 m can ever be skated.

2026-02-03T08:58:23Z Nils Lid Hjort