http://arxiv.org/abs/2606.08084v1 Assessing model calibration with boosting trees 2026-06-06T10:14:36Z

The main goal in regression modelling is to approximate the conditional mean of a response given a set of features. A regression function is said to be calibrated if the resulting mean estimates match the true conditional means for almost every set of features. Exact calibration seems unachievable in practice, as one typically deals with finite samples of noisy observations. A weaker notion is auto-calibration, which requires that the expected response among all observations receiving the same mean estimate equals that estimate. This notion is important, e.g., in insurance pricing, as it ensures no cross-subsidization between different price cohorts. In this paper, we show that boosting trees can be used to test necessary conditions for calibration and auto-calibration, respectively. The practical relevance of our approach is supported by a numerical example, in which the proposed tests prove to be very powerful on a large insurance dataset.
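
The auto-calibration property described above can be illustrated with a simple binning diagnostic. This is a minimal sketch under stated assumptions, not the boosting-tree test proposed in the paper: the simulated data, the function names and the choice of ten equal-sized bins are all hypothetical. Within each bin of similar predictions, an auto-calibrated predictor should have a mean response close to its mean prediction.

```python
import random
from statistics import mean

def autocalibration_gaps(predictions, responses, n_bins=10):
    """Sort by prediction, split into equal-sized bins, and return
    (mean prediction, mean response, gap) for each bin."""
    pairs = sorted(zip(predictions, responses))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = mean(p for p, _ in chunk)
        mean_resp = mean(y for _, y in chunk)
        rows.append((mean_pred, mean_resp, mean_resp - mean_pred))
    return rows

def simulate(n=20000, seed=0):
    """Hypothetical toy data: true conditional mean 1 + 2x, Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    true_means = [1.0 + 2.0 * x for x in xs]
    ys = [rng.gauss(m, 0.5) for m in true_means]
    return true_means, ys

true_means, ys = simulate()

# Predictor 1: the true conditional mean (auto-calibrated by construction).
calibrated = autocalibration_gaps(true_means, ys)

# Predictor 2: shrunk towards the overall mean. It is unbiased on average
# but over-predicts low-risk bins and under-predicts high-risk bins.
shrunk = [0.5 * m + 1.0 for m in true_means]
distorted = autocalibration_gaps(shrunk, ys)

max_gap_calibrated = max(abs(gap) for _, _, gap in calibrated)
max_gap_distorted = max(abs(gap) for _, _, gap in distorted)
print(f"largest bin gap, true mean:   {max_gap_calibrated:.3f}")
print(f"largest bin gap, shrunk mean: {max_gap_distorted:.3f}")
```

The shrunk predictor has the same overall mean as the responses, so a global bias check would not flag it; only the per-bin comparison reveals the cross-subsidization between cohorts.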

2026-06-06T10:14:36Z 36 pages Selim Gatti

http://arxiv.org/abs/2605.10406v2 Multi-Fidelity Quantile Regression 2026-06-06T08:18:07Z

High-fidelity (HF) data are often expensive to collect and therefore scarce, making conditional quantiles difficult to estimate accurately. We propose a two-stage, model-agnostic method for multi-fidelity quantile regression. The central idea is a local quantile link: at each covariate value, the HF quantile is represented as a low-fidelity (LF) quantile evaluated at a covariate-dependent level. This reformulation reduces the problem to estimating the level function, which can be smoother than the HF quantile itself when the LF and HF conditional distributions have similar shapes. We also study the complementary regime in which this advantage weakens and introduce a correction step to improve robustness. Our theory characterizes when the proposed estimator converges faster than direct quantile regression using HF data alone and when the correction step provides further improvement. Experiments on synthetic and real data show that our method yields more accurate quantile estimates and tighter conformal prediction intervals.
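
In symbols, the local quantile link described above can be written as follows. The notation is an assumption introduced here for illustration and need not match the paper's: $Q^{\mathrm{HF}}_{\tau}(x)$ and $Q^{\mathrm{LF}}_{\alpha}(x)$ denote conditional quantiles at covariate value $x$, $F^{\mathrm{LF}}(\cdot \mid x)$ is the LF conditional distribution function (assumed continuous and strictly increasing), and $\alpha_\tau(x)$ is the covariate-dependent level.

```latex
\[
  Q^{\mathrm{HF}}_{\tau}(x) \;=\; Q^{\mathrm{LF}}_{\alpha_\tau(x)}(x),
  \qquad
  \alpha_\tau(x) \;=\; F^{\mathrm{LF}}\!\bigl(Q^{\mathrm{HF}}_{\tau}(x) \,\big|\, x\bigr).
\]
```

If the HF and LF conditional distributions coincide, then $\alpha_\tau(x) = \tau$ for every $x$, which is the simplest case in which the level function is smoother than the HF quantile itself.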

2026-05-11T11:43:38Z 69 pages, 12 figures, 3 tables Yixiang Liu Yao Zhang

http://arxiv.org/abs/2604.06278v4 Predictive Volatility of Machine Learning in Micro-Samples: A Regularised Assessment of Regional Poverty 2026-06-06T07:22:29Z

Small regional datasets pose a dual statistical problem: correlated predictors inflate estimation variance, while flexible learners can become unstable because the available information per adaptive degree of freedom is limited. We examine this issue through predictive volatility, defined as the cross-sample dispersion and upper-tail behaviour of out-of-sample loss. Using simulation evidence reported for sparse linear, near-linear and heavy-tailed settings, we compare ordinary least squares, frequentist penalties, Bayesian shrinkage models, bounded-response and spatial specifications, and flexible machine-learning procedures. In the reported simulation results, regularised linear estimators generally dominate in the linear high-collinearity micro-sample settings and remain the most reliable overall, whereas tree-based methods become more competitive only when the signal is weakly nonlinear and the sample size is larger. In the empirical application to 34 Indonesian provinces, ridge yields the best leave-one-out performance, followed by elastic net and lasso. Across the Bayesian shrinkage specifications, ICT skills show the most consistent negative association with poverty, with the strongest support under horseshoe and spike-and-slab formulations. These results suggest that, in micro-sample regional modelling, the main constraint is limited information per effective degree of freedom rather than insufficient algorithmic flexibility.
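
The definition of predictive volatility given above (cross-sample dispersion and upper-tail behaviour of out-of-sample loss) can be made concrete with a short sketch. The specific summaries used here (sample standard deviation and a nearest-rank 95th percentile) are assumptions chosen for illustration, and the loss values are made-up toy numbers, not results from the paper.

```python
from statistics import mean, stdev

def predictive_volatility(losses, tail_pct=95):
    """Summarise out-of-sample losses collected over repeated samples.

    dispersion: sample standard deviation across samples.
    upper_tail: nearest-rank percentile (tail_pct) of the losses.
    """
    ordered = sorted(losses)
    # ceil(tail_pct * n / 100) computed with integer arithmetic
    rank = (tail_pct * len(ordered) + 99) // 100
    return {
        "mean_loss": mean(ordered),
        "dispersion": stdev(ordered),
        "upper_tail": ordered[max(rank, 1) - 1],
    }

# Hypothetical losses for two learners over ten resamples (toy numbers).
stable_losses = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]
unstable_losses = [0.6, 0.7, 0.5, 0.6, 3.5, 0.4, 0.6, 0.7, 0.5, 1.9]

stable = predictive_volatility(stable_losses)
unstable = predictive_volatility(unstable_losses)
print(stable)
print(unstable)
```

Both toy learners have the same average loss, so a comparison of means alone would not separate them; the dispersion and upper-tail summaries do.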

2026-04-07T09:41:12Z Corrections are needed A. H. Jamaluddin A. T. R. Dani N. I. Mahat V. Ratnasari S. S. M. Fauzi

http://arxiv.org/abs/2606.07994v1 The Rising Dominance of Methods Across Science 2026-06-06T06:16:22Z

Scientific progress is traditionally narrated through the interplay of theoretical insights and experimental findings. Yet this view of science underplays a third and central pillar of progress: the methods that underlie both conceptual advances and empirical evidence. By analysing more than 3 million articles across science published between 1980 and 2019, we find that science has undergone a fundamental structural transition. The share of papers that primarily contribute new methods (methods papers) has doubled across science over the past four decades, rising universally across disciplines and citation impact levels. Rather than a gradual evolution, this transition marks a pivotal shift beginning in the early 1990s, aligning with the computational revolution and the emergence of data-intensive science. The surge in methodological research is not confined to the most cited, elite publications; it spans the full spectrum of scientific output. These findings reveal a systemic reorientation of the scientific ecosystem where reusable methods increasingly serve as the essential infrastructure of scientific advances, challenging the traditional dichotomy of theory and experimental research. As science becomes increasingly methods-driven, our results call for rethinking how research is evaluated, funded and organised, towards better incentivising method innovations. This is especially the case as expanding AI must be effectively integrated with scientific instruments to realise its full potential.

2026-06-06T06:16:22Z Alexander Krauss Ariel Rosenfeld Lutz Bornmann

http://arxiv.org/abs/2503.02245v2 Identification of Genetic Factors Associated with Corpus Callosum Morphology: Conditional Strong Independence Screening for Non-Euclidean Responses 2026-06-06T02:54:36Z

The corpus callosum, the largest white matter structure in the brain, plays a critical role in interhemispheric communication. Variations in its morphology are associated with various neurological and psychological conditions, making it a key focus in neurogenetics. Age is known to influence the structure and morphology of the corpus callosum significantly, complicating the identification of specific genetic factors that contribute to its shape and size. We propose a conditional strong independence screening method to address these challenges for ultrahigh-dimensional predictors and non-Euclidean responses. Our approach incorporates prior knowledge, such as age. It introduces a novel concept of conditional metric dependence, quantifying non-linear conditional dependencies among random objects in metric spaces without relying on predefined models. We apply this framework to identify genetic factors associated with the morphology of the corpus callosum. Simulation results demonstrate the efficacy of this method across various non-Euclidean data types, highlighting its potential to drive genetic discovery in neuroscience.

2025-03-04T03:44:51Z Zhe Gao Jin Zhu Yue Hu Wenliang Pan Xueqin Wang

http://arxiv.org/abs/2606.07947v1 Bayesian Global Fréchet Regression via Weak Conditional Expectations 2026-06-06T02:34:09Z

Fréchet regression provides a versatile framework for modeling responses in metric spaces with Euclidean predictors, yet current methodologies rely almost exclusively on frequentist approaches. We propose a Bayesian framework for Fréchet regression that offers a principled way of incorporating prior information into nonlinear global Fréchet regression. By targeting a novel Fréchet Bayes rule, we reduce the object-valued regression problem to a collection of tractable scalar regression tasks. Our approach allows for a controlled interpolation between the prior and the data-driven frequentist estimate, facilitating effective shrinkage toward informed values. While initially derived under Gaussian assumptions, we demonstrate that our framework is robust to model misspecification by establishing its validity under moment conditions via weak conditional expectations. The numerical properties of the proposed methodology are demonstrated in simulation studies and an application to microbiome compositional data, where we show that leveraging an auxiliary cohort to inform the prior significantly enhances predictive performance in a targeted, small-scale study.

2026-06-06T02:34:09Z 34 pages, 4 figures Simon Fontaine Bing Li Lingzhou Xue

http://arxiv.org/abs/2601.01830v3 Confounder-robust causal discovery and inference in Perturb-seq using proxy and instrumental variables 2026-06-05T23:08:33Z

Emerging single-cell technologies that combine CRISPR-based genetic perturbations with single-cell RNA sequencing, such as Perturb-seq, offer unprecedented opportunities to uncover cause-and-effect relationships among genes. Nonetheless, Perturb-seq experiments are subject to unobserved factors that, if not properly handled, can severely bias the inferred causal relationships between genes. These latent factors may arise not only from intrinsic molecular features of the regulatory elements, but also from unmeasured genes omitted due to cost-constrained experimental designs. Although methods for analyzing large-scale Perturb-seq data are rapidly maturing, approaches that explicitly account for such unobserved confounders when inferring causal gene networks are still lacking. Here, we propose a novel approach to accurately reconstruct causal gene networks from Perturb-seq data even when important confounders are missing. Our framework leverages proxy and instrumental variable strategies to exploit the rich information embedded in the perturbations, enabling unbiased estimation of the underlying directed acyclic graph (DAG) of gene expression. Applications to both comprehensive synthetic data and real CRISPR interference experiments in K562 cells demonstrate that our method outperforms baseline approaches that lack principled adjustments for unmeasured confounding, yielding more accurate and biologically relevant recovery of the true causal DAGs.

2026-01-05T06:50:07Z Kwangmoon Park Hongzhe Li

http://arxiv.org/abs/2407.01765v2 A General Framework for Design-Based Treatment Effect Estimation in Paired Cluster-Randomized Experiments 2026-06-05T21:43:05Z

Paired cluster-randomized experiments (pCRTs) are common in education program impact evaluation trials. Despite their prevalence, there is surprisingly no clear consensus on how to analyze this randomization design to estimate average treatment effects. Variance estimation is also complicated by the dependency created through pairing clusters. Therefore, we aim to provide an intuitive and practical comparison between different estimation strategies for pCRTs to inform practitioners' choice of strategy. To this end, we present a general framework for design-based estimation of an average individual effect in pCRTs. This framework offers a novel and intuitive view on the bias-variance trade-off between point estimators and emphasizes the benefits of covariate adjustment for estimation with pCRTs. In addition to providing a general framework for estimation with pCRTs, the point and variance estimators we present support fixed-sample unbiased estimation with similar precision to a common regression model and conservative variance estimation. Through simulation studies based on an educational efficacy trial, we compare the performance of the point and variance estimators reviewed. Our analysis and simulation studies inform the choice of point and variance estimators for analyzing pCRTs in practice.

2024-07-01T19:57:31Z Charlotte Z. Mann Adam C. Sales Johann A. Gagnon-Bartsch

http://arxiv.org/abs/2606.07809v1 Sensitivity Analysis White Paper 2026-06-05T19:37:27Z

Sensitivity analysis is an important component of simulation-based decision support because it helps analysts determine which inputs most strongly influence model outcomes under uncertainty. This paper organizes the broad sensitivity analysis literature into a coherent framework for use in complex simulation settings, with particular attention to military applications. We review major classes of methods, including local and global approaches, variance-based techniques, screening methods, derivative-based methods, and uncertainty quantification tools, and relate them to common analytical objectives such as factor prioritization, factor fixing, variance reduction, and factor mapping. The paper also discusses sensitivity auditing as a complementary perspective that emphasizes transparency, assumption tracking, and responsible use of models in decision-relevant settings.
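
As an illustration of the variance-based techniques and the factor-prioritization objective mentioned above, the following minimal sketch estimates first-order Sobol indices with a pick-freeze (Saltelli-type) Monte Carlo estimator. The toy simulator `model`, the uniform input distributions and the sample size are assumptions made for this example; the white paper itself does not prescribe this code.

```python
import random
from statistics import pvariance

def model(x):
    """Hypothetical toy simulator: additive in two inputs on [0, 1]."""
    return x[0] + 2.0 * x[1]

def first_order_sobol(f, dim, n=50000, seed=1):
    """Estimate first-order Sobol indices S_i = Var(E[Y | X_i]) / Var(Y).

    Uses two independent sample matrices A and B. For input i, the rows
    of A have column i replaced by the value from B, and
    V_i is estimated by mean(f(B) * (f(A_B^i) - f(A))).
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(dim)] for _ in range(n)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n)]
    fA = [f(a) for a in A]
    fB = [f(b) for b in B]
    total_var = pvariance(fA + fB)
    indices = []
    for i in range(dim):
        acc = 0.0
        for a, b, ya, yb in zip(A, B, fA, fB):
            mixed = list(a)
            mixed[i] = b[i]          # share input i with row b
            acc += yb * (f(mixed) - ya)
        indices.append(acc / n / total_var)
    return indices

sobol = first_order_sobol(model, dim=2)
print([round(s, 3) for s in sobol])
```

For this additive toy model the exact indices are 0.2 and 0.8 (the variances of the two terms are 1/12 and 4/12), so the second input would be prioritized; the Monte Carlo estimates should land near those values.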

2026-06-05T19:37:27Z 12 pages, Nate Bade Lindsay Erickson

http://arxiv.org/abs/2602.09267v3 Estimating the distance at which narwhal respond to disturbance: a penalized threshold hidden Markov model 2026-06-05T18:09:08Z

Understanding behavioural responses to disturbances is vital for wildlife conservation. For example, in the Arctic, the decrease in sea ice has opened new shipping routes, increasing the need for impact assessments that quantify the distance at which marine mammals react to vessel presence. This information can then guide targeted mitigation policies, such as vessel slow-down regulations and delineation of avoidance areas. Using telemetry data to determine distances linked to deviations from normal behaviour requires advanced statistical models, such as threshold hidden Markov models (THMMs). While these are powerful tools, they do not assess whether the estimated threshold reflects a meaningful behavioural shift. We introduce a lasso-penalized THMM that builds on computationally efficient methods to impose penalties on HMMs and present a new, efficient penalized quasi-restricted maximum-likelihood estimator. Our framework is capable of estimating thresholds and assessing whether the disturbance effects are distinguishable from baseline behaviour. With simulations, we demonstrate that our lasso method effectively shrinks spurious threshold effects towards zero. When applied to narwhal movement data, our analysis suggests that narwhal react to vessels up to 4 kilometres away by decreasing movement persistence and spending more time in deeper waters (average maximum depth of 356 m). Overall, we provide a broadly applicable framework for quantifying behavioural responses to stimuli, with applications ranging from determining reaction thresholds to disturbance to estimating the distances at which terrestrial species, such as elephants, detect water.

2026-02-09T23:03:25Z 22 pages Fanny Dupont Marianne Marcoux Nigel E. Hussey Jackie Dawson Marie Auger-Méthé

http://arxiv.org/abs/2211.02192v3 A Mixed Model Approach for Estimating Regional Functional Connectivity from Voxel-level BOLD Signals 2026-06-05T16:28:55Z

Resting-state brain functional connectivity quantifies the synchrony between activity patterns of different brain regions. In functional magnetic resonance imaging, each region comprises a set of spatially contiguous voxels at which blood-oxygen-level-dependent signals are acquired. The ubiquitous Correlation of Averages (CA) estimator, and other similar metrics, are computed from spatially aggregated signals within each region, and remain the quantifications of inter-regional connectivity most used by neuroscientists. Their popularity is primarily due to computational simplicity despite their demonstrable bias and lack of statistically principled justification. By leveraging linear mixed-effects models, both inter-regional and intra-regional correlation and measurement error can be explicitly modeled as signal variability sources. A novel computational pipeline, focused on subject-level inter-regional correlation parameters of interest, is developed to address the challenges of applying maximum likelihood estimation to such structured, high-dimensional spatiotemporal data. Simulation results confirm the superiority of the proposed estimator relative to CA in terms of both decreased bias and accurate confidence interval coverage across simulation settings. The proposed method is also applied to construct individual human brain networks for subjects from a Human Connectome Project test-retest database. Concordances between inter-regional correlation estimates demonstrate the potentially substantial scientific benefits of the proposed approach that reliably produces more consistent results than CA for test-retest scans of the same subject.
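
For reference, the Correlation of Averages (CA) estimator that serves as the baseline above can be sketched in a few lines: average the voxel time series within each region, then take the Pearson correlation of the two regional averages. This is a minimal sketch of the baseline only, not of the proposed mixed-model estimator; the function names and the tiny toy signals are hypothetical.

```python
import math

def region_average(voxel_series):
    """Average a list of equal-length voxel time series into one regional series."""
    n_voxels = len(voxel_series)
    return [sum(values) / n_voxels for values in zip(*voxel_series)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_of_averages(region_a, region_b):
    """CA estimator: correlate the spatially averaged signals of two regions."""
    return pearson(region_average(region_a), region_average(region_b))

# Toy signals (hypothetical): rows are voxels, columns are time points.
region_1 = [[1.0, 2.0, 3.0, 2.0], [3.0, 4.0, 5.0, 4.0]]  # average: 2, 3, 4, 3
region_2 = [[2.0, 4.0, 6.0, 4.0]]                         # rises and falls with region_1
region_3 = [[6.0, 4.0, 2.0, 4.0]]                         # moves opposite to region_1

ca_12 = correlation_of_averages(region_1, region_2)
ca_13 = correlation_of_averages(region_1, region_3)
print(ca_12, ca_13)
```

Because the voxel signals are collapsed to a single average before correlating, the estimator discards the within-region variability and measurement error that the mixed-model approach represents explicitly.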

2022-11-04T00:00:26Z Ruobin Liu Chao Zhang Chau Tran Sophie Achard Wendy Meiring Alexander Petersen

http://arxiv.org/abs/2606.07364v1 S2A3: Thompson Sampling and Stochastic Exposure Control for High-Stakes CATs 2026-06-05T15:08:44Z

High-stakes computerized adaptive tests (CATs) require a continuous supply of calibrated items, yet traditional item piloting is slow, expensive, and operationally hazardous. We introduce the S2A3 framework -- Soft Scoring (S2) and Adaptive Adaptive Administration (A3) -- which unifies item calibration and test administration into a single online process. Thompson sampling enhances item selection by drawing provisional parameters from each item's posterior distribution and selecting the item maximizing expected Fisher information, naturally routing uncertain items to informative test-takers while maintaining measurement precision. Soft scoring integrates over parameter uncertainty so that incompletely calibrated items exert appropriately attenuated influence on ability estimates. A stochastic variant of Sympson-Hetter exposure control balances measurement efficiency against bank security via a tunable temperature parameter and item-specific weights. We validate S2A3 on Yes/No Vocabulary and Vocabulary-in-Context tasks from the Duolingo English Test, demonstrating rapid item calibration and preserved scoring reliability even when cold-start items constitute a significant fraction of the active pool.
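
The Thompson-sampling selection step described above can be sketched as follows. This is a minimal illustration under assumptions that are not stated in the abstract: a two-parameter logistic (2PL) item response model, independent normal posteriors for each item's discrimination and difficulty, and an ability posterior represented by a list of draws. The item names and numbers are hypothetical, and exposure control and soft scoring are not modeled.

```python
import math
import random

def fisher_information(a, b, theta):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at ability theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def thompson_select(item_posteriors, theta_draws, rng):
    """Draw provisional parameters for every item from its posterior, then
    return the item whose draw maximizes information averaged over theta_draws."""
    best_item, best_info = None, -math.inf
    for item_id, (a_mean, a_sd, b_mean, b_sd) in item_posteriors.items():
        a = max(rng.gauss(a_mean, a_sd), 1e-6)  # keep discrimination positive
        b = rng.gauss(b_mean, b_sd)
        info = sum(fisher_information(a, b, t) for t in theta_draws) / len(theta_draws)
        if info > best_info:
            best_item, best_info = item_id, info
    return best_item

rng = random.Random(7)
theta_draws = [rng.gauss(0.0, 0.3) for _ in range(50)]  # assumed ability posterior

# (a_mean, a_sd, b_mean, b_sd); zero standard deviation means fully calibrated.
pool = {
    "calibrated_matched": (1.5, 0.0, 0.0, 0.0),
    "calibrated_easy": (1.0, 0.0, -2.0, 0.0),
    "cold_start": (1.0, 1.0, 0.0, 1.0),
}

picks = [thompson_select(pool, theta_draws, rng) for _ in range(2000)]
cold_start_share = picks.count("cold_start") / len(picks)
print(f"share of administrations given to the cold-start item: {cold_start_share:.2f}")
```

In this sketch the uncertain item is administered only when its sampled parameters look more informative than the best calibrated item, so it collects responses without displacing well-matched calibrated items every time.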

2026-06-05T15:08:44Z James Sharpnack Alexander Tsigler J. R. Lockwood Steven Nydick Alina A. von Davier

http://arxiv.org/abs/2312.07762v3 Interpretable factorization of clinical questionnaires to identify latent factors of psychopathology 2026-06-05T14:27:01Z

Psychiatry research seeks to understand the manifestations of psychopathology in behavior, as measured in questionnaire data, by identifying a small number of latent factors that explain them. While factor analysis is the traditional tool for this purpose, the resulting factors may not be interpretable, and may also be subject to confounding variables. Moreover, missing data are common, and explicit imputation is often required. To overcome these limitations, we introduce interpretability constrained questionnaire factorization (ICQF), a non-negative matrix factorization method with regularization tailored for questionnaire data. Our method aims to promote factor interpretability and solution stability. We provide an optimization procedure with theoretical convergence guarantees, and an automated procedure to detect latent dimensionality accurately. We validate these procedures using realistic synthetic data. We demonstrate the effectiveness of our method on a widely used general-purpose questionnaire in two independent datasets (the Healthy Brain Network and Adolescent Brain Cognitive Development studies). Specifically, we show that ICQF improves interpretability, as defined by domain experts, while preserving diagnostic information across a range of disorders, and outperforms competing methods for smaller dataset sizes. This suggests that the regularization in our method matches domain characteristics. The Python implementation of ICQF is available at https://github.com/jefferykclam/ICQF.

2023-12-12T22:10:38Z Ka Chun Lam Bridget W Mahony Armin Raznahan Francisco Pereira

http://arxiv.org/abs/2606.07016v1 An Integrated Roadside Sensing and Communication Framework for Vulnerable Road User Safety at Signalized Intersections 2026-06-05T07:59:35Z

Vulnerable road users (VRUs) account for approximately half of urban traffic deaths globally, with intersections concentrating a disproportionate share of these casualties. Recent reviews of sensing technology for VRU protection have cataloged dozens of single-sensor and dual-sensor deployments, yet none of the surveyed systems couples multi-modal sensing with edge-side near-miss analytics and bidirectional vehicle-to-everything (V2X) and pedestrian-to-everything (P2X) messaging in a single intersection cabinet. This paper presents an integrated framework for VRU protection at signalized intersections, combining LiDAR, radar, RGB camera, and thermal camera at the perception layer, edge-based prediction and surrogate-safety analytics at the computation layer, V2X and P2X messaging at the communication layer, and adaptive signal control at the actuation layer. The framework is grounded in an empirical case study using R-LiViT, the first publicly released roadside LiDAR-Visual-Thermal dataset, which provides 200 multi-modal sequences and 2,400 annotated RGB-T frames at three German intersections. Analysis of 53,319 detection annotations reveals that VRUs comprise approximately 49% of all road-user observations, that day-to-night density drops by 38% for pedestrians and 45% for vehicles while the night distribution shows a higher close-proximity share, that per-frame close-proximity event counts vary approximately 10-fold across the eight unique locations at three intersections, and that 83% of pedestrian bounding boxes are small in image space, indicating that VRUs are typically far from any single sensor. These findings support multi-modal sensing, edge-side analytics, and adaptive context-sensitive deployment rather than uniform single-sensor solutions.

2026-06-05T07:59:35Z 17 pages, 5 figures, 2 tables. Preprint Parvez Anowar

http://arxiv.org/abs/2606.07014v1 Networked Spatial Effects in European Electricity Price Forecasting 2026-06-05T07:56:46Z

As European bidding zones are highly interconnected by physical transmission lines, spatial influences propagate across neighboring nodes through a network. This is reflected in the day-ahead electricity prices across European bidding zones, as the auction algorithm also uses information beyond each bidding zone's geographic boundary. To capture how this interconnection affects the electricity prices in neighboring bidding zones, we use a metric graph to map the spatial coverage of information using a well-defined neighborhood measure. We propose the Networked Spatio-Temporal Model (NSTM), which maps irregular spatial nodes into an ordered network, enabling the systematic incorporation of neighborhood information. We implement the NSTM across 39 bidding zones covering the majority of European electricity markets in a high-resolution, streaming-forecasting setup. The model uses autoregressive, cross-hour, and seasonal effects, along with fuel and emission prices and day-ahead forecasts of fundamentals, as interconnected information to predict the day-ahead prices for each bidding zone. A Europe-wide study presented in this paper shows that the NSTM consistently outperforms traditional island-based pure local models. This paper provides a framework that demonstrates the critical role the networked structure plays in propagating information across interconnected markets and its vast implications for day-ahead electricity price forecasting.

2026-06-05T07:56:46Z Sultan Mahmud Chomon Florian Ziel