https://arxiv.org/api/aG4w3uRguFYmCbDRK8UuwrVJp5c
2026-06-18T12:59:53Z
2357131515

http://arxiv.org/abs/2605.28762v1
Deep Neural Networks for Doubly Robust Estimation with Nonprobability Survey Samples
2026-05-27T17:21:50Z
Integrating probability and nonprobability survey samples is an important problem in modern survey sampling. Nonprobability samples often contain rich outcome information but may lack population representativeness, whereas probability samples provide design-based auxiliary information but may not contain the study variable. We propose a deep neural network (DNN)-assisted doubly robust framework for estimating the finite population mean from these two data sources. The proposed method models the logit sampling score for the nonprobability sample as an unknown nonparametric function and estimates it by maximizing a pseudo-likelihood that combines information from the nonprobability sample and a reference probability sample. The DNN parameters are optimized using the ADAM algorithm. The resulting DNN-estimated sampling scores are incorporated into a DNN-assisted inverse-probability weighted estimator and a deep doubly robust estimator. We establish consistency and convergence rates under regularity conditions and evaluate the finite-sample performance of the proposed estimators through simulation studies and an empirical application using Pew Research Center and Behavioral Risk Factor Surveillance System data. The results suggest that the proposed estimators can improve robustness to parametric propensity-score misspecification, especially when the true selection mechanism is nonlinear.
2026-05-27T17:21:50Z
29 pages, 1 figure
Yufang Dai, Shihua Luo, Wendy Lou, Zilin Wang, Xuewen Lu

http://arxiv.org/abs/2411.13479v4
Conformal Prediction for Hierarchical Data
2026-05-27T16:36:39Z
We consider conformal prediction for multivariate data and focus on hierarchical data, where some components are linear combinations of others.
Intuitively, the hierarchical structure can be leveraged to reduce the size of prediction regions for the same coverage level. We implement this intuition by including a projection step (also called a reconciliation step) in the split conformal prediction [SCP] procedure, and prove that the resulting prediction regions are indeed globally smaller. We do so both under the classic objective of joint coverage and under a new and challenging task: component-wise coverage, for which efficiency results are more difficult to obtain. The associated strategies and their analyses are based both on the literature of SCP and of forecast reconciliation, which we connect. We also illustrate the theoretical findings, for different scales of hierarchies on simulated data.
2024-11-20T17:26:26Z
39 pages, 4 figures
Guillaume Principato, Gilles Stoltz, Yvenn Amara-Ouali, Yannig Goude, Bachir Hamrouche, Jean-Michel Poggi

http://arxiv.org/abs/2605.28344v1
Capturing the Curve: Functional Data Analysis for Validated Digital Outcome Measures
2026-05-27T11:48:45Z
Digital health technologies enable high-frequency collection of data in near-continuous time and capture rich information about the health of individuals. The raw data collected by these devices often have a hierarchical functional structure: repeated physiological functions are observed over time and on multiple time scales (seconds, days, weeks). While many summaries can be derived from digital data, typically, only a small subset of pre-defined scalars is validated as outcome measures in clinical trials. We explore data-driven summaries based on between-subject scores from Multilevel Functional Principal Component Analysis (MFPCA), which are low-dimensional representations of functional data with robust statistical properties. Specifically, we compute MFPCA projection scores with respect to a reference population, summarising how individuals differ from the dominant directions of variation at each hierarchical level.
Through a simulation study based on smartwatch electrocardiogram (ECG) signals, we compare MFPCA scores with pre-specified summaries in terms of validation criteria, including test-retest reliability and known-groups discrimination. We demonstrate that MFPCA scores generally have high reliability and can discriminate between groups across simulated scenarios of change. This offers an advantage when digital tools enable the measurement of novel physiological signals and the characteristics of the change are not yet defined. Finally, using knee flexion-extension data from individuals living with Parkinson's disease, we demonstrate that one of the MFPCA scores more strongly correlates with established gold-standard metrics and can detect clinical change, compared to a pre-specified scalar. We conclude that MFPCA-derived scores retain more information than typical outcome measures and open the door to using learning representation strategies in clinical trial settings.
2026-05-27T11:48:45Z
Mia S. Tackney, Marcos Matabuena, Marco Palma, Michael Wester, Claire Maassen, Thomas Krammer, Julian Mustroph, Peter H. Charlton, James Carpenter, Sofia S. Villar

http://arxiv.org/abs/2605.28212v1
How to measure intra-physician variability in clinical decision-making?
2026-05-27T09:30:45Z
Intra-physician prescribing variability, the probability that one physician issues discordant decisions for two patients deemed comparable on observed covariates, has a major impact on quality of care, safety and cost. However, there are no known validated measurement methods. Here, we benchmark eight methods (Euclidean, Mahalanobis, Learned-Weights, Genetic Mahalanobis, Random Forest proximity, Mutual-Information-weighted, Latent Profile Analysis and Bayesian binomial generalized linear mixed model) against a synthetic ground truth across 94 experimental conditions. Learned-Weights matching achieves the lowest mean absolute error (0.027), followed by Mutual-Information-weighted matching (0.028) and RF Proximity (0.034).
All eight discordance-analysis methods preserve the physician rank ordering with high fidelity (Spearman > 0.89 versus the ground truth on the SCORE2 experiment), as long as the physician variability groups are well separated. Under a continuous-heterogeneity physician model, rank preservation degrades substantially for unsupervised methods (Spearman = [0.28, 0.35]) but is retained by supervised feature-weighted methods and the GLMM (Spearman = [0.62, 0.68]). This controlled methodological evaluation is a foundation for validation on observational prescribing data. Once validated on observational prescribing data, these evaluated open-source estimators could turn prescribing inconsistency into a routinely measurable clinician-level quality metric, systematically complementing the existing literature on between-physician variation.
2026-05-27T09:30:45Z
24 pages, 7 tables, 3 figures
Alaedine Benani, Pierre Meneton, Emmanuel Messas, Liza Hettal, Sai Sagireddy, Damien Grosgeorge, Jérôme Salomon, Sylvain Bodard, Xavier Tannier

http://arxiv.org/abs/2603.08276v2
A Unified Framework for Density Estimation under Right-Censored Point-Centred Quarter Sampling
2026-05-27T06:24:13Z
While the point-centred quarter method (PCQM) is widely used for density estimation, existing methods for handling right-censored data from truncated search radii rely primarily on a Poisson model assuming complete spatial randomness (CSR), leaving a critical gap for spatially aggregated populations. To address this limitation, we develop a unified likelihood- and moment-based framework for right-censored point-centred quarter sampling under both Poisson and negative binomial distribution (NBD) models. In particular, the proposed NBD-based estimators explicitly account for spatial aggregation and censoring simultaneously, extending distance-based inference beyond the CSR setting.
Extensive simulations and applications to fully mapped forest plots reveal that the NBD-based MLE delivers the most robust overall performance across diverse ecological scenarios. Across more than 100 species from fully mapped forest plots, the proposed NBD-based MLE approximately reduced absolute relative bias by a median of 0.10 compared with existing censored estimators, representing a relative improvement of over 30%. Ultimately, our framework provides a rigorously validated and practically useful toolkit for analysing censored point-to-tree distance data.
2026-03-09T11:47:55Z
42 pages, 28 figures, 4 tables
Wenzhe Huang, Guochun Shen, Dingliang Xing, Jiangyan Zhao

http://arxiv.org/abs/2601.07299v2
Cauchy-Gaussian Overbound for Heavy-tailed GNSS Measurement Errors
2026-05-27T01:37:24Z
Overbounds of heavy-tailed measurement errors are essential for meeting stringent navigation requirements in integrity-monitoring applications. This paper proposes to leverage the bounding sharpness of the Cauchy distribution in the core and the Gaussian distribution in the tails to tightly bound heavy-tailed global navigation satellite system measurement errors. We develop a procedure to determine the overbounding parameters for both symmetric unimodal (SU) and non-symmetric unimodal (NSU) heavy-tailed errors and prove that the overbounding property is preserved through convolution. Experimental results on both simulated and real-world data sets reveal that our method can sharply bound heavy-tailed errors in both the core and tail regions.
In the position domain, the proposed method reduces the average vertical protection level by 15% for SU heavy-tailed errors compared with the single-cumulative-density-function Gaussian overbound and by 21%-47% for NSU heavy-tailed errors compared with the navigation discrete envelope and two-step Gaussian overbounds.
2026-01-12T08:21:14Z
Published in NAVIGATION: Journal of the Institute of Navigation
Zhengdao Li, Penggao Yan, Weisong Wen, Li-Ta Hsu
10.33012/navi.749

http://arxiv.org/abs/2605.27796v1
Benchmarking Ultrasound Foundation Models for Fetal Plane Classification
2026-05-27T00:32:40Z
Ultrasound is widely used in obstetric care due to its safety, accessibility, and real-time imaging. However, interpretation remains operator-dependent and susceptible to noise and artifacts. Deep learning models have shown strong performance in addressing these problems, but they typically require large annotated datasets that are difficult to obtain in clinical ultrasound. Foundation models (FMs) offer an alternative, using a large number of ultrasound images to learn transferable representations that can generalize with limited labeled data. This work presents a comprehensive benchmark of ultrasound-specific FMs for fetal plane classification. We evaluated four ultrasound FMs (USFM, MOFO, UltraSAM, FetalCLIP) against two CNN baselines (ResNet50, EfficientNet-V2) and a ViT (DINOv3) pretrained on natural images. We trained all models under two complementary settings: full fine-tuning and linear probing with a frozen encoder. All models were trained using 5-fold patient-level cross-validation on a Spanish fetal ultrasound dataset and tested on both in-domain data and an external African cohort to assess cross-population generalization. We found that FetalCLIP achieved the best results in the linear probing setting (F1 = 0.9261 for in-domain, F1 = 0.9731 for out-of-domain), while USFM performed best in the full fine-tuning setting (F1 = 0.9476 for in-domain, F1 = 0.9515 for out-of-domain).
MOFO and UltraSAM degraded most in both settings, underperforming natural image pretrained models in some cases. These findings highlight how the choice of pretrained model strongly affects fetal plane classification performance, since different pretraining objectives lead to different levels of transferability.
2026-05-27T00:32:40Z
Leya Barrientos, Yuexi Du, Nicha C. Dvornek

http://arxiv.org/abs/2605.27781v1
Day-Ahead Electricity Price Forecasting Using a Multivariate Group Lasso Method
2026-05-27T00:08:44Z
Electricity price signals in modern power systems exhibit complex dependence structures that render forecasting inherently challenging. Our analysis of real-world pricing signals from the California Independent System Operator (CAISO) reveals complex temporal group effects, whereby the influence of explanatory variables on electricity prices persists across consecutive blocks of time due to underlying economic and operational drivers. In response, we propose a multivariate statistical method based on a Group Lasso formulation to forecast the vector of day-ahead electricity prices, by leveraging multi-feature temporal group effects. Our approach is evaluated on two full years of electricity prices from CAISO, demonstrating considerable improvements in point and probabilistic forecast metrics compared to a wide array of statistical and deep learning methods. Theoretical and empirical analyses confirm the effectiveness of the proposed approach in modeling realistic group effects, maintaining both interpretability and low computational complexity. When retrospectively evaluated on test data from a recent international electricity price forecasting challenge, the proposed method ranked in second place, despite having access to significantly less information than competing approaches.
Finally, the proposed method is independently validated against two operational electricity price forecasting systems in CAISO, demonstrating competitive predictive performance and practical relevance.
2026-05-27T00:08:44Z
Keyi Wang, Jiaxiang Ji, Mahan Mansouri, Ahmed Aziz Ezzat

http://arxiv.org/abs/2605.27720v1
Bayesian Deployment Approval for Learned Landing Controllers under Finite Rollout Validation
2026-05-26T21:45:17Z
Reinforcement learning and data-driven autonomous controllers are commonly evaluated using cumulative reward and empirical success frequency under finite simulation trajectories. However, such empirical metrics do not necessarily provide sufficient statistical evidence regarding deployment readiness under uncertainty. This work develops a Bayesian approval framework for learned autonomous landing controllers under finite rollout evidence. A probabilistic landing capability formulation is introduced based on touchdown safety satisfaction under uncertain operating conditions, while Bayesian posterior inference is used to quantify uncertainty regarding the true deployment capability of learned policies. Posterior approval probability and posterior deployment risk are further introduced for deployment-oriented evaluation, together with a sequential validation framework supporting approve/reject/continue decisions during progressive rollout testing. Simulation experiments using PPO and SAC controllers demonstrate that empirical success and reward optimization may produce overconfident deployment interpretation under limited validation evidence, whereas posterior approval inference provides a more uncertainty-calibrated assessment of deployment readiness.
The proposed framework provides a practical statistical connection between conventional reinforcement-learning evaluation and deployment-oriented validation under uncertainty and may be generalized to broader classes of learned autonomous systems.
2026-05-26T21:45:17Z
16 pages, 4 figures and 4 tables
Fei Jiang, Lei Yang

http://arxiv.org/abs/2605.27694v1
Likelihood-Free Inference for Multivariate Generalized Pareto Models
2026-05-26T21:13:50Z
Likelihood-based inference for multivariate extreme-value models is often unreliable or infeasible when likelihoods are intractable or supports are discrete. This challenge is particularly acute for multivariate discrete generalized Pareto models, where both marginal tail behavior and dependence must be inferred from sparse exceedance samples. We propose a two-stage likelihood-free inference procedure, termed AW--NBE (Adaptive Wasserstein Neural Bayes Estimator), that combines neural Bayes estimation with a targeted optimal transport refinement step based on the Sinkhorn discrepancy. In the first stage, a neural Bayes estimator trained on simulated data provides fast and stable initial parameter estimates. In the second stage, these estimates are locally refined by minimizing the Sinkhorn divergence between the empirical distributions of observed and simulated exceedances. This refinement reduces the Sinkhorn discrepancy between the empirical distributions of observed and simulated exceedances, while preserving dependence features learned by the neural estimator. Model adequacy is assessed using new optimal transport based multivariate Q--Q and potential diagnostics.
Applications to financial log-returns and Swiss dry spell exceedances suggest that AW--NBE can improve parameter inference compared with using solely either the Sinkhorn discrepancy or the standard neural Bayes estimators, and with censored likelihood estimation.
2026-05-26T21:13:50Z
Samira Aka, Marie Kratz, Philippe Naveau

http://arxiv.org/abs/2605.27648v1
Why pyrotechnics markets keep killing: a simple geometric argument for redesign
2026-05-26T20:09:24Z
Fires and explosions in pyrotechnics retail markets recur worldwide with predictable regularity, killing dozens to hundreds of people in single events. This paper argues that the global topology of the market is the dominant determinant of mortality, acting through two independent geometric channels. The first, propagation, concerns ballistic dispersal of ignited articles: the probability that fire spreads between blocks scales with the spatial density of blocks within the dispersal range. The second, evacuation, concerns the distance an occupant must traverse to reach the perimeter, which is set by the global geometry of the market footprint, not by any stall-level parameter. Because mortality risk grows approximately exponentially in evacuation time, topology amplifies modest differences in egress distance into large differences in casualties. Current standards in the United States, the European Union, and Mexico prescribe local parameters such as aisle width and stall separation, but leave the global topology of the market unregulated. We argue that topology should be a regulable design variable, and propose a market geometry that simultaneously slows propagation and shortens evacuation, derived from contact-process models of seed dispersal in spatial ecology.
2026-05-26T20:09:24Z
Nine pages, three figures
Carlos M. Hernandez-Suarez, Alonso Sanchez-Maldonado, Carlos A.
Robles-Hernandez

http://arxiv.org/abs/2605.27597v1
Purely analytic composites: Relative variance contributions of indicators corresponding to a priori indicator weights
2026-05-26T19:12:50Z
Composites are often created to facilitate the work of decision-makers. Therefore, practical or theoretical considerations may lead to a priori weights of the indicators forming a composite. Composites that are created as weighted aggregates are not the result of data analysis and may therefore be termed 'analytic composites'. However, it has already been shown that the variance contributions of indicators within analytic composites are affected by the indicator variance and indicator inter-correlations. In the present study, purely analytic composites are proposed, having exactly the variance contribution of indicators within the composites that are a priori defined by the indicator weights. An example based on simulated data illustrates the difference between analytic composites and purely analytic composites. As an application area, we propose that purely analytic composites could be of interest in the context of exchange-traded funds. An R-script for the computation of purely analytic composites is given in the Appendix.
2026-05-26T19:12:50Z
Andre Beauducel, Ned Kock

http://arxiv.org/abs/2605.27335v1
Beyond average warming: Two-sample inference for dense-sparse functional data reveals changes in intraday temperature patterns
2026-05-26T17:42:48Z
Modern weather stations in Germany record daily temperatures every 10 minutes, whereas measurements from historical reference periods are often only available at much coarser temporal resolutions, typically hourly. This discrepancy must be accounted for when comparing historical and current daily temperature patterns. Motivated by this problem, we develop two-sample inference procedures for functional data under sampling schemes where one sample is densely observed while the other is relatively sparse.
Building on recent ideas from transfer learning for functional data, we derive estimators of the difference of the mean functions that attain optimal convergence rates in the supremum norm. We further establish a functional central limit theorem in the space of continuous functions and develop multiplier bootstrap methods for constructing uniform confidence bands. Extensions to functional time series are also discussed. Applying the proposed methodology to daily temperature curves from German weather stations, analyzed separately by month, reveals that climate change has altered not only average temperatures but also intraday temperature patterns. In particular, for stations such as Berlin, warming from morning to early afternoon exceeds the daily average increase, whereas evening and nighttime temperatures exhibit comparatively smaller increases.
2026-05-26T17:42:48Z
Kevin Wilk, Hajo Holzmann

http://arxiv.org/abs/2605.27270v1
Inverse Control Constrained Optimization of Vessel Speed Decisions Under Environmental Risk: Evidence from Arctic Shipping
2026-05-26T16:46:27Z
Understanding how decision makers balance operational efficiency with environmental and ecological risks is central to vessel navigation. We model vessel speed as a control variable in a constrained optimization framework in which vessel operators balance multiple competing objectives, including transit efficiency, ice related navigational risk, and whale related ecological risk. The underlying risk parameters are estimated using over 14 million Automatic Identification System (AIS) observations from the United States Arctic (2010-2019), together with environmental covariates and spatially explicit whale density estimates. The framework incorporates a nonlinear risk objective, vessel heterogeneity, and regularization to ensure stable and interpretable results. The inferred trade offs reveal distinct decision making patterns across vessel groups and navigational statuses.
Vessel types such as Tug Tow and Cargo balance operational speed with environmental and ecological considerations. In contrast, several vessel groups, including Fishing, Passenger, and Unspecified vessels, are strongly influenced by ice related risk, while Pleasure Craft and Tankers exhibit higher sensitivity to whale related risk. Across navigational status categories, similar heterogeneity is observed. The dominant status, under way using engine, displays a clear trade off, whereas other statuses, such as aground and undefined, are strongly shaped by ice related constraints. Statuses including restricted maneuverability and engaged in fishing exhibit higher estimated sensitivity to whale related risk, though with substantial uncertainty. Sensitivity analysis indicates that increasing whale-related risk weighting produces limited changes in model-implied optimal speed, whereas increasing ice-related risk leads to more consistent reductions.
2026-05-26T16:46:27Z
Mauli Pant, Linda Fernandez, Indranil Sahoo

http://arxiv.org/abs/2605.27269v1
Transfer Learning using 66 Diseases for Disease Forecasting Applications
2026-05-26T16:45:21Z
Disease forecasting models typically rely on a single data stream, making models brittle when histories are short or noisy. Recent top-performing models have shown that synthesizing multiple reporting systems for the same disease improves performance. Other recent work takes this idea a step further, using transfer learning to train a forecasting model for one disease using data from a different disease. We expand upon each of these approaches greatly, training machine learning models on data that span 66 infectious diseases and several data streams. We investigate the value of incorporating different data streams for forecasting 20 different disease data streams. We find that incorporating other data streams improves forecasting in the vast majority (84.9%) of time series and model structures considered.
However, our work highlights that the quality of the added data matters, where adding data extremely different from the target data stream can sometimes degrade forecast performance. A major contribution of this work is in compiling a publicly-available database of data for use by the infectious disease forecasting community.
2026-05-26T16:45:21Z
Lauren J Beesley, Alexander C Murph, Dave Osthus, Lauren A Castro