https://arxiv.org/api/xElZ1BESEYYqHiKo0hjkkOoLem42026-06-10T04:09:54Z168610515http://arxiv.org/abs/2603.00098v1Profiling vs. Case-specific Evidence: A Probabilistic Analysis2026-02-17T13:43:02ZThe use of profiling evidence in criminal trials is a longstanding controversy in legal epistemology and evidence law theory. Many scholars, even when they oppose its use at trial, still assume that profiling evidence can be probative of guilt. We reject that assumption. Profiling evidence may support a generic hypothesis, but is not evidence that the defendant is guilty of the specific crime of which they are accused. We contrast profiling evidence with case-specific evidence, which speaks more directly to the facts of the case. Our critique departs from others by grounding the argument in a probabilistic analysis of evidentiary value. We also explore the implications of our account for debates about stereotyping.2026-02-17T13:43:02Z16 pagesMarcello Di BelloNicolò CangiottiMichele Loihttp://arxiv.org/abs/2602.14284v1Benchmarking AI Performance on End-to-End Data Science Projects2026-02-15T19:16:04ZData science is an integrated workflow of technical, analytical, communication, and ethical skills, but current AI benchmarks focus mostly on constituent parts. We test whether AI models can generate end-to-end data science projects. To do this we create a benchmark of 40 end-to-end data science projects with associated rubric evaluations. We use these to build an automated grading pipeline that systematically evaluates the data science projects produced by generative AI models. We find the extent to which generative AI models can complete end-to-end data science projects varies considerably by model. Most recent models did well on structured tasks, but there were considerable differences on tasks that needed judgment. These findings suggest that while AI models could approximate entry-level data scientists on routine tasks, they require verification.2026-02-15T19:16:04ZEvelyn HughesRohan Alexanderhttp://arxiv.org/abs/2602.13565v1An Improved Milstein Method for the Numerical Solution of Multidimensional Stochastic Differential Equations2026-02-14T02:54:38ZStochastic differential equations (SDEs) offer powerful and accessible mathematical models for capturing both deterministic and probabilistic aspects of dynamic behavior across a wide range of physical, financial, and social systems. However, analytical solutions for many SDEs are often unavailable, necessitating the use of numerical approximation methods. The rate of convergence of such numerical methods is of great importance, as it directly influences both computational efficiency and accuracy. This paper presents a proposed theorem, along with its proof, that facilitates the numerical evaluation of the strong (and weak) order of convergence of a numerical scheme for an SDE when the analytical solution is unavailable. Additionally, we address the challenge of numerically computing the multiple stochastic integrals required by the Milstein method to achieve improved convergence rates for multidimensional SDEs. In this context, two newly proposed numerical techniques for computing these multiple stochastic integrals are introduced and compared with existing approaches in terms of efficiency and effectiveness. The methodologies are further illustrated through simulation studies and applications to widely used financial models.2026-02-14T02:54:38ZParomita BanerjeeAnirban Mondalhttp://arxiv.org/abs/2602.12216v1Bayesian inference for the automultinomial model with an application to landcover data2026-02-12T17:54:02ZMulticategory lattice data arise in a wide variety of disciplines such as image analysis, biology, and forestry. We consider modeling such data with the automultinomial model, which can be viewed as a natural extension of the autologistic model to multicategory responses, or equivalently as an extension of the Potts model that incorporates covariate information into a pure-intercept model. The automultinomial model has the advantage of having a unique parameter that controls the spatial correlation. However, the model's likelihood involves an intractable normalizing function of the model parameters that poses serious computational problems for likelihood-based inference. We address this difficulty by performing Bayesian inference through the Double-Metropolis Hastings algorithm, and implement diagnostics to assess the convergence to the target posterior distribution. Through simulation studies and an application to land cover data, we find that the automultinomial model is flexible across a wide range of spatial correlations while maintaining a relatively simple specification. For large data sets we find it also has advantages over spatial generalized linear mixed models. To make this model practical for scientists, we provide recommendations for its specification and computational implementation.2026-02-12T17:54:02ZMaria Paula Duenas-HerreraStephen BergMurali Haranhttp://arxiv.org/abs/2504.08263v2A roadmap for systematic identification and analysis of multiple biases in causal inference2026-02-10T21:49:53ZObservational studies examining causal effects rely on unverifiable assumptions, the violation of which can induce multiple biases. Quantitative bias analysis (QBA) methods examine the sensitivity of findings to such violations, generally, by producing estimates under alternative assumptions, incorporating external information. Although substantial guidance exists for implementing QBA, there is limited guidance on how to systematically determine the assumptions underlying a primary causal analysis and the potential violations that should guide bias analysis. Consequently, many assumptions remain implicit, leading to selective and therefore misleading QBA. To address this gap, we propose a roadmap for systematically identifying and analysing multiple biases. Briefly, this consists of (1) articulating the assumptions underlying the primary analysis through specification and emulation of the ideal trial that defines the causal estimand and depicting these assumptions using a causal diagram; (2) extending the diagram to depict alternative assumptions under which biases may arise; (3) obtaining a single estimate that simultaneously corrects for all potential biases. We illustrate the roadmap using an investigation of the effect of breastfeeding on risk of childhood asthma, and through simulations illustrate the need for analysing multiple biases jointly rather than one at a time.2025-04-11T05:30:32Z12 Pages, 4 FiguresRushani WijesuriyaRachael A. HughesJohn B. CarlinRachel L. PetersJennifer J. KoplinMargarita Moreno-Betancurhttp://arxiv.org/abs/2506.05776v3Analyzing the retraining frequency of global forecasting models: towards more stable forecasting systems2026-02-10T14:56:53ZForecast stability, that is, the consistency of predictions over time, is essential in business settings where sudden shifts in forecasts can disrupt planning and erode trust in predictive systems. Despite its importance, stability is often overlooked in favor of accuracy. In this study, we evaluate the stability of point and probabilistic forecasts across several retraining scenarios using three large forecastingdatasets and ten different global forecasting models. To analyze stability in the probabilistic setting, we propose a new model-agnostic, distribution-free, and scale-free metric that measuresprobabilistic stability: the Scaled Multi-Quantile Change (SMQC). The results show that less frequent retraining not only preserves but often improves forecast stability, challenging the need for frequent retraining. Moreover, the study shows that accuracy and stability are not necessarily conflicting objectives when adopting a global modeling approach. The study promotes a shift toward stability-aware forecasting practices, proposing a new metric to evaluate forecast stability effectively in probabilistic settings, and offering practical guidelines for building more stable and sustainable forecasting systems.2025-06-06T06:13:29ZMarco Zanottihttp://arxiv.org/abs/2509.14218v2Adaptive Off-Policy Inference for M-Estimators Under Model Misspecification2026-02-08T13:47:37ZWhen data are collected adaptively, such as in bandit algorithms, classical statistical approaches such as ordinary least squares and $M$-estimation will often fail to achieve asymptotic normality. Although recent lines of work have modified the classical approaches to ensure valid inference on adaptively collected data, most of these works assume that the model is correctly specified. The misspecified setting poses unique challenges because the parameter of interest itself may not be well-defined over a non-stationary distribution of rewards. We therefore tackle the problem of \emph{off-policy} inference in adaptive settings, where we uniquely define a projected solution over a stationary evaluation policy. Our method provides valid inference for $M$-estimators that use adaptively collected bandit data with a possibly misspecified working model. A key ingredient in our approach is the use of flexible approaches to stabilize the variance induced by adaptive data collection. A major novelty is that the procedure enables the construction of valid confidence sets even in settings where treatment policies are unstable and non-converging, such as when there is no unique optimal arm and standard bandit algorithms are used. Empirical results on semi-synthetic datasets constructed from the Osteoarthritis Initiative demonstrate that the method maintains type I error control, while existing methods for inference in adaptive settings do not cover in the misspecified case.2025-09-17T17:51:40Z43 pages, 6 figuresJames LeinerRobin DunnAaditya Ramdashttp://arxiv.org/abs/2502.11510v3Here Be Dragons: Bimodal posteriors arise from numerical integration error in longitudinal models2026-02-07T00:39:44ZLongitudinal models with dynamics governed by differential equations may require numerical integration alongside parameter estimation. We have identified a situation where the numerical integration introduces error in such a way that it becomes a novel source of non-uniqueness in estimation. We obtain two very different sets of parameters, one of which is a good estimate of the true values and the other a very poor one. The two estimates have forward numerical projections statistically indistinguishable from each other because of numerical error. In such cases, the posterior distribution for parameters is bimodal, with a dominant mode closer to the true parameter value, and a second cluster around the errant value. We demonstrate that multi-modality exists both theoretically and empirically for an affine first order differential equation, that a simulation workflow can test for evidence of the issue more generally, and that Markov Chain Monte Carlo sampling with a suitable solution can avoid bimodality. The issue of multi-modal posteriors arising from numerical error has consequences for Bayesian inverse methods that rely on numerical integration more broadly.2025-02-17T07:26:15Z33 pages, 7 figures, 2 tablesTess O'BrienMatthew T. MooresDavid WartonDaniel Falsterhttp://arxiv.org/abs/2407.11518v2Ensemble Transport Filter via Optimized Maximum Mean Discrepancy2026-02-06T17:48:27ZIn this paper, we present a new ensemble-based filter method by reconstructing the analysis step of the particle filter through a transport map, which directly transports prior particles to posterior particles. The transport map is constructed through an optimization problem described by the Maximum Mean Discrepancy loss function, which matches the expectation information of the approximated posterior and reference posterior. The proposed method inherits the accurate estimation of the posterior distribution from particle filtering while gives an extension to high dimensional assimilation problems. To improve the robustness of Maximum Mean Discrepancy, a variance penalty term is used to guide the optimization. It prioritizes minimizing the discrepancy between the expectations of highly informative statistics for the reference posteriors. The penalty term significantly enhances the robustness of the proposed method and leads to a better approximation of the posterior. A few numerical examples are presented to illustrate the advantage of the proposed method over ensemble Kalman filter.2024-07-16T08:54:12Z27 pages, 14 figuresDengfei ZengLijian Jiang10.1016/j.jcp.2025.114582http://arxiv.org/abs/2405.07102v6Nested Instrumental Variables Analysis: Switcher Average Treatment Effect, Identification, Efficient Estimation and Generalizability2026-02-05T00:45:31ZInstrumental variables (IVs) are widely used to estimate causal effects from non-randomized data. A canonical example is a randomized trial with noncompliance, in which the randomized treatment assignment serves as an IV for the non-ignorable treatment received. Under a monotonicity assumption, a valid IV nonparametrically identifies the average treatment effect among a latent complier subgroup, whose generalizability is often under debate. In many studies, there exist multiple versions of an IV, for instance, different nudges to take the same treatment in different study sites in a multicenter clinical trial. These different versions of an IV may result in different compliance rates and offer a unique opportunity to study IV estimates' generalizability. In this article, we introduce a novel nested IV assumption and study identification of the average treatment effect among two latent subgroups: always-compliers and switchers, who are defined based on the joint potential treatment received under two versions of a binary IV. We derive the efficient influence function for the SWitcher Average Treatment Effect (SWATE) under a nonparametric model and propose efficient estimators. We then propose formal statistical tests of the generalizability of IV estimates under the nested IV framework. The proposed tests are flexible nonparametric generalizations of classical overidentification tests that allow estimating nuisance parameters using machine learning tools. We apply the proposed method to the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial and study the causal effect of colorectal cancer screening and its generalizability.2024-05-11T22:05:52ZRui WangYing-Qi ZhaoOliver DukesBo Zhanghttp://arxiv.org/abs/2602.04762v1Uncertainty in Island-based Ecosystem Services and Climate Change2026-02-04T16:58:58ZSmall and medium-sized islands are acutely exposed to climate change and ecosystem degradation, yet the extent to which uncertainty is systematically addressed in scientific assessments of their ecosystem services remains poorly understood. This study revisits 226 peer-reviewed articles drawn from two global systematic reviews on island ecosystem services and climate change, applying a structured post hoc analysis to evaluate how uncertainty is treated across methods, service categories, ecosystem realms, and decision contexts. Studies were classified according to whether uncertainty was explicitly analysed, just mentioned, or ignored. Only 30 percent of studies incorporated uncertainty explicitly, while more than half did not address it at all. Scenario-based approaches dominated uncertainty assessment, whereas probabilistic and ensemble-based frameworks remained limited. Cultural ecosystem services and extreme climate impacts exhibited the lowest levels of uncertainty integration, and few studies connected uncertainty treatment to policy relevant decision frameworks. Weak or absent treatment of uncertainty emerges as a structural challenge in island systems, where narrow ecological thresholds, strong land-sea coupling, limited spatial buffers, and reduced institutional redundancy amplify the consequences of decision-making under incomplete knowledge. Systematic mapping of how uncertainty is framed, operationalised, or neglected reveals persistent methodological and conceptual gaps and informs concrete directions for strengthening uncertainty integration in future island-focused ecosystem service and climate assessments. Embedding uncertainty more robustly into modelling practices, participatory processes, and policy tools is essential for enhancing scientific credibility, governance relevance, and adaptive capacity in insular socio-ecological systems.2026-02-04T16:58:58ZNazli DemirelIoannis N. VogiatzakisGeorge ZittisMirela TaseAttila D. SandorSavvas ZotosChristos ZoumidesTurgay DindarogluMauro FoisIrene ChristoforidiValentini StamatiadouShiri Zemah-ShamirTamer AlbayrakCigdem Kaptan AyhanParaskevi ManolakiIna SieberZiv Zemah-ShamirElli TzirkalliAristides Moustakashttp://arxiv.org/abs/2602.04353v1Anyone for chess? Analysing chess ratings above high thresholds2026-02-04T09:22:40ZSuppose some cleverness score parameter is sufficiently
interesting to be defined and then measured, perhaps for
different strata of specialists or for the broader population.
Such phenomena could have Gaussian distributions,
when it comes to all players in a stratum, but when interest
focuses on the very tails, for the top few percent,
those above certain high thresholds,
different models are called for, along with the need
to analyse such based on the listed top scores only.
In this note I develop such models and tools,
and apply them to the top-100 and above 2100 points
lists for regular chess ratings, for the currently active
14671 men and 753 women,
as given by the FIDE, January 2026.
It is argued that even when two or more distributions have
close to identical expected values, or medians,
even smaller differences in variance may explain gaps
for the few very best ones.2026-02-04T09:22:40Z9 pages, 7 figuresNils Lid Hjorthttp://arxiv.org/abs/2602.04164v1The Dynamics of Attention across Automated and Manual Driving Modes: A Driving Simulation Study2026-02-04T02:57:40ZThis study aims to explore the dynamics of driver attention to various zones, including the road, the central mirror, the embedded Human-Machine Interface (HMI), and the speedometer, across different driving modes in AVs. The integration of autonomous vehicles (AVs) into transportation systems has introduced critical safety concerns, particularly regarding driver re-engagement during mode transitions. Past accidents underscore the risks of overreliance on automation and highlight the need to understand dynamic attention allocation to support safety in autonomous driving. A high-fidelity driving simulation was conducted. Eye-tracking technology was used to measure fixation duration, fixation count, and time to first fixation across distinct driving modes (automated, manual, and transition), which were then used to assess how drivers allocated attention to various areas of interest (AOIs). Findings show that drivers' attention varies significantly across driving modes. In manual mode, attention consistently focuses on the road, while in automated mode, prolonged fixation on the embedded HMI was observed. During the handover and takeover phases, attention shifts dynamically between environmental and technological elements. The study reveals that driver attention allocation is mode-dependent. These findings inform the design of adaptive HMIs in AVs that align with drivers' attention patterns. By presenting relevant information according to the driving context, such systems can enhance driver-vehicle interaction, support effective transitions, and improve overall safety. Systematic analysis of visual attention dynamics across driving modes is gaining prominence, as it informs adaptive HMI designs and driver readiness interventions. The GLMM findings can be directly applied to the design of adaptive HMIs or driver training programs to enhance attention and improve safety.2026-02-04T02:57:40ZYuan CaiMustafa DemirFarzan SasangoharMohsen Zarehttp://arxiv.org/abs/2505.08395v2Bayesian Estimation of Causal Effects Using Proxies of a Latent Interference Network2026-02-03T14:59:43ZNetwork interference occurs when treatments assigned to some units affect the outcomes of others. Traditional approaches often assume that the observed network correctly specifies the interference structure. However, in practice, researchers frequently only have access to proxy measurements of the interference network due to limitations in data collection or potential mismatches between measured networks and actual interference pathways. In this paper, we introduce a framework for estimating causal effects when only proxy networks are available. Our approach leverages a structural causal model that accommodates diverse proxy types, including noisy measurements, multiple data sources, and multilayer networks, and defines causal effects as interventions on population-level treatments. The latent nature of the true interference network poses significant challenges. To overcome them, we develop a Bayesian inference framework. We propose a Block Gibbs sampler with Locally Informed Proposals to update the latent network, thereby efficiently exploring the high-dimensional posterior space composed of both discrete and continuous parameters. The latent network updates are driven by information from the proxy networks, treatments, and outcomes. We illustrate the performance of our method through numerical experiments, demonstrating its accuracy in recovering causal effects even when only proxies of the interference network are available.2025-05-13T09:46:30ZBar WeinsteinDaniel Nevohttp://arxiv.org/abs/2602.03274v1Six-Minute Man Sander Eitrem 5:58.52 -- first man below the 6:00.00 barrier2026-02-03T08:58:23ZIn Calgary, November 2005, Chad Hedrick was the first to skate the 5,000 m below 6:10. His world record time 6:09.68 was then beaten a week later, in Salt Lake City, by Sven Kramer's 6:08.78. Further top races and world records followed over the ensuing seasons; up to and including the 2024-2025 season, a total of 126 races have been below 6:10, with Nils van der Poel's 2021 world record being 6:01.56. The appropriately hyped-up canonical question for the friends and followers and aficionados of speedskating has then been when (and by whom we for the first time would witness a below 6:00.00 race. In this note I first use extreme value statistics modelling to assess the state of affairs, as per the end of the 2024-2025 season, with predictions and probabilities for the 2025-2026 season. Under natural modelling assumptions the probability of seeing a new world record during this new season is shown to be about ten percent. We were indeed excited but in reality merely modestly surprised that a race better than van der Poel's record was clocked, by Timothy Loubineaud, in Salt Lake City, November 14, 2025. But Six-Minute Man Sander Eitrem's outstanding 5:58.52 in Inzell, on January 24, 2026, is truly beamonesquely shocking. I also use the modelling machinery to analyse the post-Eitrem situation, and suggest answers to the question of how fast the 5,000 m ever can be skated.2026-02-03T08:58:23ZNils Lid Hjort