https://arxiv.org/api/LqoXh9wU/OKSG2cn+OlhWbOto9U
2026-06-11T06:52:00Z

http://arxiv.org/abs/2406.15844v3
Bayesian modeling of multi-species labeling errors in ecological studies
2026-05-28T13:56:38Z
Ecological and conservation studies monitoring bird communities typically rely on species classification based on bird vocalizations. Historically, this has been based on expert volunteers going into the field and making lists of the bird species that they observe. Recently, machine learning algorithms have emerged that can accurately classify bird species based on audio recordings of their vocalizations. Such algorithms crucially rely on training data that are labeled by experts. Automated classification is challenging when multiple species are vocalizing simultaneously, there is background noise, and/or the bird is far from the microphone. In continuously monitoring different locations, the size of the audio data becomes immense and it is only possible for human experts to label a tiny proportion of the available data. In addition, experts can vary in their accuracy and breadth of knowledge about different species. This article focuses on the important problem of combining sparse expert annotations to improve bird species classification while providing uncertainty quantification. We additionally are interested in providing expert performance scores to increase their engagement and encourage improvements. We propose a Bayesian hierarchical modeling approach and evaluate this approach on a new community science platform developed in Finland.
2024-06-22T13:16:38Z
Haoxuan Wang, Patrik Lauha, David B. Dunson

http://arxiv.org/abs/2605.29922v1
Statistical Tapers for Correlation-Based Localization in Ensemble Data Assimilation
2026-05-28T13:33:49Z
Localization is essential in ensemble-based data assimilation because finite ensembles produce noisy covariance estimates, causing spurious updates and excessive loss of ensemble variance.
In subsurface applications, localization is usually based on spatial distance, but this criterion can be hard to justify when parameter-data relationships are controlled by flow dynamics, nonlinear operators, non-local parameters, or prior conditioning effects.
This work investigates correlation-based localization as an alternative strategy in which tapering coefficients are computed from the statistical reliability of estimated model-data correlations. We interpret localization as a shrinkage problem in correlation space and propose three tapers: a generalized power-law taper motivated by mean-square-error correction, a logistic taper derived from a Bayesian spike-and-slab formulation, and a discrepancy-based taper inspired by Morozov's principle.
The tapers are evaluated using synthetic reservoir data assimilation problems involving scalar and grid-based parameters, localized flow responses, non-trivial correlation patterns, and increasing model dimension. The results show that correlation-based localization can suppress spurious correlations while preserving meaningful parameter-data relationships. In several cases, the proposed power-law and logistic tapers retained more posterior ensemble variance than distance-based localization while maintaining acceptable data-match quality. The logistic taper provided the strongest variance preservation, whereas smoother tapers favored better data matches.
Overall, the results indicate that correlation-based localization is a statistically motivated alternative to distance-based localization, especially when spatial distance is unavailable or misleading.
2026-05-28T13:33:49Z
Alexandre A. Emerick, Vinicius Luiz Santos Silva

http://arxiv.org/abs/2510.25154v3
TabMGP: Martingale Posterior with TabPFN
2026-05-28T12:08:17Z
Bayesian inference provides principled uncertainty quantification but is often limited by the challenges of prior and likelihood elicitation. The martingale posterior (MGP) (Fong et al., 2023) offers an alternative by replacing these requirements with a predictive rule. In addition, the MGP focuses inference on parameters defined through a loss function. This framework is especially resonant in the era of foundation transformers; practitioners increasingly leverage models like TabPFN for their state-of-the-art capabilities, yet often require epistemic uncertainty for a scientific estimand $θ$ that need not parameterise the implicit latent model. The MGP provides a mechanism to recover these posterior distributions. We introduce TabMGP, an MGP built on TabPFN for tabular data. TabMGP produces credible sets with near-nominal coverage and often outperforms both handcrafted MGP constructions and standard Bayesian baselines.
2025-10-29T04:12:33Z
Accepted at ICML 2026. Extra plots in https://drive.google.com/drive/folders/1ct_effOoTEGpiWUf0_1xI3VqLWHtJY16 . Code in https://github.com/weiyaw/tabmgp
Kenyon Ng, Edwin Fong, David T. Frazier, Jeremias Knoblauch, Susan Wei

http://arxiv.org/abs/2605.29758v1
Fisher's ideas and the design of field experiments in agronomy and plant breeding
2026-05-28T11:06:59Z
R. A. Fisher was one of the greatest scientists of the last century. He made many ground-breaking contributions, so many indeed that it seems almost impossible to list all of them.
His revolutionary contributions to the design of experiments can mostly be traced to the early part of his academic career, and they are inextricably linked to his involvement with agricultural field experiments at Rothamsted Experiment Station. In this talk I will review Fisher's key ideas on experimental design and relate them to some of the work I am involved in, most of which directly focuses on field experiments in agriculture. Topics covered include systematic designs, row-column designs, augmented row-column designs, multi-environment trials, partially replicated designs, optimal allocation of trials to zones in sub-divided target populations of environments, and the connection of trialling systems across countries.
2026-05-28T11:06:59Z
31 pages, 2 tables
Hans-Peter Piepho

http://arxiv.org/abs/2505.07989v3
rd2d: Causal Inference in Boundary Discontinuity Designs
2026-05-28T10:56:25Z
Boundary Discontinuity (BD) designs are used in empirical research to learn about causal treatment effects along a continuous assignment boundary defined by a bivariate score. These designs are also known as multi-score regression discontinuity (RD) designs, and include geographic RD designs as a prominent example. This article introduces rd2d, a statistical software package for R, Python, and Stata that implements local polynomial estimation and inference for BD designs using either the bivariate score or a univariate signed distance-to-boundary score. The software covers sharp and fuzzy BD designs, providing automatic bandwidth selection, robust bias-corrected pointwise inference, uniform confidence bands, cluster-robust inference with joint or separate fitting conventions, covariate-adjusted efficiency improvements, mass-point checks, and covariance regularization, among other features.
We illustrate the package with an empirical application to Opportunity Zones, where eligibility has a strong first-stage effect on designation but no significant effects on early workplace-job growth.
2025-05-12T18:35:30Z
Matias D. Cattaneo, Rocio Titiunik, Ruiqi Rae Yu

http://arxiv.org/abs/2605.01665v2
Exact Likelihood Inference and Robust Filtering for Gauss-Cauchy Convolution Models
2026-05-28T10:14:31Z
The convolution of a Gaussian and a Cauchy distribution, known as the Voigt distribution, is widely used in spectroscopy and provides a natural framework for modeling heavy-tailed measurement noise. We derive analytical expressions for its density, score, Hessian, Fisher information, and conditional moments using the scaled complementary error function, enabling stable maximum likelihood estimation without numerical convolution, finite-difference derivatives, or pseudo-Voigt approximations. The conditional expectation of the latent Gaussian component is governed by a redescending location score, so extreme observations are automatically discounted rather than propagated. This structure leads to the Gauss-Cauchy Convolution (GCC) filter for state-space models with Gaussian latent dynamics and Voigt measurement errors, where the Masreliez Gaussian prediction approximation preserves a Voigt prediction-error density. In an application to log realized volatility for the Technology Select Sector SPDR Fund, the GCC filter separates persistent latent variation from transient measurement noise and attains the highest implemented prediction-error criterion among the Gaussian, Student-$t$, Huber, and related filtering specifications considered.
2026-05-03T01:34:26Z
Peter Reinhard Hansen, Chen Tong

http://arxiv.org/abs/2605.29702v1
A Jensen-Shannon divergence based $k$--$NN$ algorithm for missing value imputation in compositional data
2026-05-28T10:01:55Z
A novel nonparametric method to impute missing values in compositional data is developed.
The method is based on the $k$--$NN$ algorithm, utilizes the Jensen-Shannon divergence and employs the Fréchet mean to allow for more flexibility in the estimation process. As an extra feature, the hyper-parameters can be self-adaptive according to the pattern of missing values. Unlike restrictive parametric models, the proposed method makes no assumption about the structure of the data and, most importantly, it is applicable even when compositional data contain zero values. Through simulation studies using real data, it is shown that the proposed algorithm outperforms competing algorithms at various settings, not only in terms of accuracy but also in terms of computational efficiency.
2026-05-28T10:01:55Z
This is the preprint of the paper that was published in the Journal of Applied Statistics. https://www.tandfonline.com/doi/full/10.1080/02664763.2026.2677908
Michail Tsagris, Connie Stewart, Abdulaziz Alenazi
10.1080/02664763.2026.2677908

http://arxiv.org/abs/2605.29641v1
Experimentation for Different Scheduling Policies on Queues: Mixed Differences-in-Q Estimators Based on Little's Law
2026-05-28T09:07:44Z
In data centers, tasks are dispatched to various servers to evenly distribute the workload. When a data center considers implementing a new scheduling algorithm, it typically conducts an A/B test prior to deployment to assess the real-world impact of this new method. However, a straightforward A/B test might be subject to so-called "Markovian" interference. We utilized the Differences-in-Q estimator, as developed by Farias et al. (2022), and introduced mixed Differences-in-Q estimators grounded in Little's Law. We show that our A/B testing methods significantly reduce bias and variance when testing various scheduling policies. Extensive simulations were conducted under scenarios like non-stationary arrival rates, heterogeneous service rates, and communication delays.
These simulations highlight the robustness and efficacy of our A/B testing approach.
2026-05-28T09:07:44Z
Nanshan Jia, Ramesh Johari, Nian Si, Zeyu Zheng

http://arxiv.org/abs/2605.29611v1
Hierarchical forecasting: The role of information
2026-05-28T08:45:07Z
In hierarchical forecasting, the process of forecast reconciliation transforms a set of "base" or "raw" forecasts, which do not satisfy the hierarchical aggregation constraints in the real data, into a set of "coherent" forecasts, which do satisfy those constraints. The academic literature provides ample simulation evidence and real-world examples demonstrating the value of forecast reconciliation in improving forecasts of hierarchical time series. This improvement is attributed to the imposition of aggregation constraints. However, this evidence is derived from base forecasts, each generated using a distinct information set, usually the univariate information set corresponding to each time series. Since reconciliation algorithms combine forecasts, it is difficult to determine the extent to which the improvement is due to the imposition of constraints versus the combination of information carried by different forecasts.
In this paper, we demonstrate that when base forecasts are based on different information sets and historical data are available, there is scope for improving these forecasts by combining the information that each one carries, even when they are already coherent. We propose a new method, called the information combination (IComb) method, which combines the information content of forecasts during the reconciliation process. The method is regression-based and can be implemented using existing penalised regression packages. We provide simulation evidence to illustrate the role of information sets, as distinct from the role of aggregation constraints, in forecasting hierarchical time series. Finally, we apply our method to datasets previously used in the literature and demonstrate that it achieves superior results compared to traditional approaches.
2026-05-28T08:45:07Z
Minh Nguyen, Farshid Vahid, Shanika L Wickramasuriya

http://arxiv.org/abs/2605.29541v1
Change-point estimation for Weibull time series with copula-based Markov models
2026-05-28T07:55:36Z
We study offline change-point estimation for time series data exhibiting nonlinear serial dependence. To address this problem, we propose a copula-based Markov chain model with Weibull marginal distributions, which is suitable for modeling nonnegative data such as event times and volatility measures. Nonlinear dependence is incorporated through the Clayton and Joe copulas, allowing the model to capture asymmetric lower-tail and upper-tail dependence structures, respectively. We derive the corresponding likelihood function and estimate the change point and model parameters using maximum likelihood estimation implemented through the Newton-Raphson algorithm. Confidence intervals are constructed via a parametric bootstrap Monte Carlo procedure.
Extensive numerical studies are conducted to evaluate the finite-sample performance and robustness of the proposed method under different dependence structures and copula misspecification scenarios. The results demonstrate that the proposed estimators perform well in terms of RMSE and relative error, particularly for the estimation of the change point. An empirical application to the VIX index during the COVID-19 pandemic further illustrates the practical usefulness of the proposed approach in detecting structural changes in both the marginal distributions and serial dependence structure.
2026-05-28T07:55:36Z
Li-Hsien Sun, Zong-Yuan Huang, Yi-Ling Huang, Chi-Yang Chiu, Ning Ning

http://arxiv.org/abs/2604.13410v2
Estimating Continuous Treatment Effects with Two-Stage Kernel Ridge Regression
2026-05-28T07:51:17Z
We study the problem of estimating the effect function for a continuous treatment, which maps each treatment value to a population-averaged outcome. A central challenge in this setting is confounding: treatment assignment often depends on covariates, creating selection bias that makes direct regression of the response on treatment unreliable. To address this issue, we propose a two-stage kernel ridge regression method. In the first stage, we learn a model for the response as a function of both treatment and covariates; in the second stage, we use this model to construct pseudo-outcomes that correct for distribution shift, and then fit a second model to estimate the treatment effect. Although the response varies with both treatment and covariates, the induced effect function obtained by averaging over covariates is typically much simpler, and our estimator adapts to this structure. Our optimal learning bounds are achieved without estimating the conditional treatment density, thereby bypassing a major bottleneck in existing methods.
Furthermore, we introduce a fully data-driven model selection procedure that achieves provable adaptivity to both the unknown degree of overlap and the spectral decay of the underlying kernel.
2026-04-15T02:21:15Z
Seok-Jin Kim, Kaizheng Wang

http://arxiv.org/abs/2605.29516v1
Active learning strategy for excursion-set confidence regions of functional simulator outputs
2026-05-28T07:36:26Z
Estimating excursion set confidence regions seeks to identify regions where a function may exceed some threshold with a given confidence level. This paper focuses on estimating such confidence regions in cases where the function has random inputs and a functional output that is returned all at once. We develop a surrogate-based approach for estimating the confidence region, combining principal component analysis and Gaussian process regression. An active learning strategy is also introduced, based on a max-min criterion that selects new samples which are likely to reduce the uncertainty in the confidence region. This strategy leverages efficient sampling of the Gaussian process through a Karhunen-Loève expansion. The proposed approach is applied to estimate the confidence regions of three case studies: a synthetic function, the surface pressure coefficient distribution of a hypersonic vehicle, and the glide-back trajectory of a reusable launcher first stage. The method demonstrates efficiency in accurately estimating the confidence region while reducing sources of modeling uncertainties. It is benchmarked against reference methods from the literature.
Relevant metrics for assessing the confidence region estimation performance are discussed.
2026-05-28T07:36:26Z
Lucas Brunel, Mathieu Balesdent, Loïc Brevault, Rodolphe Le Riche, Bruno Sudret

http://arxiv.org/abs/2408.15451v3
Certified Causal Defense with Generalizable Robustness
2026-05-28T07:12:57Z
While machine learning models have proven effective across various scenarios, it is widely acknowledged that many models are vulnerable to adversarial attacks. Recently, there have emerged numerous efforts in adversarial defense. Among them, certified defense is well known for its theoretical guarantees against arbitrary adversarial perturbations on input within a certain range (e.g., $l_2$ ball). However, most existing works in this line struggle to generalize their certified robustness in other data domains with distribution shifts. This issue is rooted in the difficulty of eliminating the negative impact of spurious correlations on robustness in different domains. To address this problem, in this work, we propose a novel certified defense framework GLEAN, which incorporates a causal perspective into the generalization problem in certified defense. More specifically, our framework integrates a certifiable causal factor learning component to disentangle the causal relations and spurious correlations between input and label, and thereby exclude the negative effect of spurious correlations on defense. On top of that, we design a causally certified defense strategy to handle adversarial attacks on latent causal factors. In this way, our framework is not only robust against malicious noises on data in the training distribution but also can generalize its robustness across domains with distribution shifts. Extensive experiments on benchmark datasets validate the superiority of our framework in certified robustness generalization in different data domains.
Code is available in the supplementary materials.
2024-08-28T00:14:09Z
Accepted by AAAI 2025
Yiran Qiao, Yu Yin, Chen Chen, Jing Ma

http://arxiv.org/abs/2605.29411v1
The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction
2026-05-28T06:01:04Z
Under standard graphical assumptions, the Markov boundary of a target variable is the smallest set of features that renders every other feature redundant. Once the boundary is observed, the target is conditionally independent of the rest of the table. This is a tempting object for tabular prediction, since it names exactly the columns a model should need. Yet modern regressors are still trained on the full feature set. We ask whether the Markov boundary is genuinely useful for prediction on SCM3K, a 3,450-task synthetic SCM benchmark with feature counts from 40 to 1000 and six SCM families, evaluated with six regressors. The answer is more nuanced than the theory suggests. Restricting a regressor to the oracle boundary often improves prediction substantially, and the improvement grows as the feature space becomes larger and sparser. But the natural pipeline of recovering the boundary with causal discovery and training on the recovered mask does not deliver. Existing estimators exhaust the compute budget before reaching the regime where the boundary helps most, and even where they run they rarely beat the full feature set. We trace this to three causes. Discovery optimizes structural recovery rather than prediction. False negatives and false positives carry sharply asymmetric predictive cost. The exact boundary is only one of many feature sets that beat all features. We then develop what these facts imply for prediction-aligned feature selection and for tabular models that learn to use causal structure.
2026-05-28T06:01:04Z
11 pages, 9 figures, 2 tables. Preprint
Shu Wan, Abhinav Gorantla, Huan Liu, K. Selçuk Candan

http://arxiv.org/abs/2605.29403v1
Power Estimation for Longitudinal Studies with Time Dependent Covariates Using Generalized Method of Moments
2026-05-28T05:56:01Z
Longitudinal studies frequently incorporate covariates that evolve over time, creating complex dependence structures between outcomes and predictors. When covariates are time-dependent, standard power analysis tools, largely developed for generalized estimating equations (GEE), can yield misleading results because they do not account for the moment-based structure required for valid marginal inference. Generalized Method of Moments (GMM) provides a flexible and efficient framework for estimating marginal effects in the presence of time-dependent covariates, yet no practical tools exist for conducting power analysis under GMM. This paper introduces a modern, implementable framework for power estimation in longitudinal studies with time-dependent covariates using GMM. Two complementary approaches are developed: a Wald-based method that leverages the asymptotic normality of GMM estimators, and a distance metric method based on quadratic forms of sample and population moment conditions. Both approaches require only limited distributional assumptions and rely on valid moment conditions rather than full likelihood specification. We outline the theoretical foundations, provide step-by-step implementation guidance, and illustrate the methods using data from the Osteoarthritis Initiative. A simulation framework is presented for evaluating empirical performance. These methods fill a critical gap in the longitudinal modeling literature by offering applied researchers a practical, distribution-light approach to power estimation when time-dependent covariates are present and GMM is the preferred estimation technique.
2026-05-28T05:56:01Z
27 pages with appendix, 16 pages main manuscript, 3 figures in main manuscript, 7 figures including figures in appendix
Niloofar Ramezani, Oliver Hurst