https://arxiv.org/api/9gaCbow3iLgy4GA4cB/nYBvpQ1c2026-03-25T08:37:27Z3471618015http://arxiv.org/abs/2505.17300v2Statistical Inference for Online Algorithms2026-03-18T14:52:57ZThe construction of confidence intervals and hypothesis tests for functionals is a cornerstone of statistical inference. Traditionally, the most efficient procedures - such as the Wald interval or the Likelihood Ratio Test - require both a point estimator and a consistent estimate of its asymptotic variance. However, when estimators are derived from online or sequential algorithms, computational constraints often preclude multiple passes over the data, complicating variance estimation. In this article, we propose a computationally efficient, rate-optimal wrapper method (HulC) that wraps around any online algorithm to produce asymptotically valid confidence regions bypassing the need for explicit asymptotic variance estimation. The method is provably valid for any online algorithm that yields an asymptotically normal estimator. We evaluate the practical performance of the proposed method primarily using Stochastic Gradient Descent (SGD) with Polyak-Ruppert averaging. Furthermore, we provide extensive numerical simulations comparing the performance of our approach (HulC) when used with other online algorithms, including implicit-SGD and ROOT-SGD.2025-05-22T21:31:49Z1) Adding to ASGD simulations, we add 5 other SGD algorithms: averaged-implicit-SGD, last-iterate-implicit-SGD, ROOT-SGD, truncated-SGD, and noisy-truncated-SGD. 2) We modify links to the online viz/GitHub pages. 
3) We qualify previous conclusions on ASGD: e.g., we claim that logistic regression is sometimes more challenging "in terms of achieving the target coverage" than linear regressionSelina CarterArun K Kuchibhotlahttp://arxiv.org/abs/2603.18114v1Transfer Learning for Contextual Joint Assortment-Pricing under Cross-Market Heterogeneity2026-03-18T14:48:04ZWe study transfer learning for contextual joint assortment-pricing under a multinomial logit choice model with bandit feedback. A seller operates across multiple related markets and observes only posted prices and realized purchases. While data from source markets can accelerate learning in a target market, cross-market differences in customer preferences may introduce systematic bias if pooled indiscriminately.
We model heterogeneity through a structured utility shift, where markets share a common contextual utility structure but differ along a sparse set of latent preference coordinates. Building on this, we develop Transfer Joint Assortment-Pricing (TJAP), a bias-aware framework that combines aggregate-then-debias estimation with a UCB-style policy. TJAP constructs two-radius confidence bounds that separately capture statistical uncertainty and transfer-induced bias, uniformly over continuous prices.
We establish matching minimax regret bounds of order $\tilde{O}\!\left(d\sqrt{\frac{T}{1+H}} + s_0\sqrt{T}\right)$, revealing a transparent variance-bias tradeoff: transfer accelerates learning along shared preference directions, while heterogeneous components impose an irreducible adaptation cost. Numerical experiments corroborate the theory, showing that TJAP outperforms both target-only learning and naive pooling while remaining robust to cross-market differences.2026-03-18T14:48:04ZElynn ChenXi ChenYi Zhanghttp://arxiv.org/abs/2302.02415v3On Separability of Covariance in Multiway Data Analysis2026-03-18T14:18:26ZMultiway data analysis aims to uncover patterns in data structured as multi-indexed arrays, with multiway covariance playing a crucial role in many applications. However, the high dimensionality of multiway covariance presents significant computational challenges. To overcome these challenges, factorized covariance models have been proposed that rely on a separability assumption: the multiway covariance can be accurately expressed as a sum of Kronecker products of mode-wise covariances. This paper addresses the representability, certification, and approximation of such separable models, leaving statistical estimation or finite-sample properties aside. We reduce the question of whether a given covariance can be decomposed into a separable multiway form to an equivalent question about the separability of quantum states. Leveraging results from quantum information theory, we show that generic multiway covariances are typically \emph{not} separable and that determining the best separable approximation is NP-hard. These findings suggest that factorized covariance models can be overly restrictive and difficult to fit without additional structural assumptions. Nevertheless, our numerical experiments indicate that standard iterative algorithms, namely Frank-Wolfe and gradient descent, often converge close to the best separable approximation. 
As NP-hardness concerns worst-case computational complexity, Kronecker-separable approximations to multiway covariance could still be tractable to apply for analyzing many real-world datasets.2023-02-05T15:54:13Z45 pages, 8 figures, 3 tablesDogyoon SongAlfred O. Herohttp://arxiv.org/abs/2502.05021v7Gradient-based filtering under misspecification: Stability and error bounds2026-03-18T14:16:26ZCan stochastic gradient methods track a moving target? We study the problem of tracking multidimensional time-varying parameters under noisy observations and possible model misspecification. Gradient-based filters update the time-varying parameters using the gradient of a postulated objective function. A natural filtering objective is the logarithm of the postulated observation density, which gives rise to the widely used class of score-driven filters. As in the optimization literature, these filters come in two forms: explicit filters evaluate the gradient at the predicted parameter, whereas implicit filters evaluate it at the updated parameter. For both filter types, we derive novel sufficient conditions for exponential stability of the filtered parameter path, showing that stability can be guaranteed independently of the data-generating process. Under mild additional moment conditions on the data-generating process, we also obtain finite-sample and asymptotic mean squared error bounds relative to the pseudo-true parameter path. For implicit filters, these guarantees hold under weak parameter restrictions. For explicit filters, they additionally require Lipschitz continuity of the score and a sufficiently small learning rate. 
Simulation studies support our theoretical findings and show that implicit gradient filters outperform explicit ones in both accuracy and stability.2025-02-07T15:48:16Z62 pagesSimon Donker van HeelRutger-Jan LangeBram van OsDick van Dijkhttp://arxiv.org/abs/2603.17663v1More with Less - Bethel Allocation and Precision-Preserving Sample Size Reduction via Hierarchical Bayes Modelling2026-03-18T12:28:55ZStatistical offices face a familiar and intensifying dilemma: rising demand for detailed regional and domain-level estimates under budgets that are fixed or shrinking. National statistical offices (NSOs) either ignore the problem of optimal sample allocation for multiple target variables when designing a multi-purpose survey, or address it incorrectly - relying on ad hoc approaches such as computing Neyman allocations separately per variable and taking the element-wise maximum, a practice that simultaneously wastes budget and fails to guarantee precision across all domains. This paper presents a practical two-stage strategy that reframes the question: not how to allocate a given sample, but how small the sample can be made while still meeting pre-defined precision targets for all target variables across all geographic domains at once. The innovation lies not in inventing new methods, but in the novel combination of two well-established techniques applied to this cost-reduction problem: (i) multivariate constrained optimisation via Bethel allocation, which finds the globally minimum sample satisfying all precision constraints simultaneously; and (ii) Hierarchical Bayes (HB) small area modelling, which borrows strength across strata and permits a further reduction of the Bethel sample. The approach is validated using a Monte Carlo study (B = 1,000 replications) based on a synthetic labour-force population of one million individuals, where known population truth allows rigorous evaluation of precision, accuracy, and credible-interval coverage. 
Keywords: Bethel allocation; Hierarchical Bayes; small area estimation; sample size reduction; multivariate optimisation; labour force survey; coefficient of variation.2026-03-18T12:28:55Z29 pages, 9 tables and 1 appendixSiu-Ming Tamhttp://arxiv.org/abs/2603.17628v1rSDNet: Unified Robust Neural Learning against Label Noise and Adversarial Attacks2026-03-18T11:47:46ZNeural networks are central to modern artificial intelligence, yet their training remains highly sensitive to data contamination. Standard neural classifiers are trained by minimizing the categorical cross-entropy loss, corresponding to maximum likelihood estimation under a multinomial model. While statistically efficient under ideal conditions, this approach is highly vulnerable to contaminated observations, including label noise corrupting supervision in the output space and adversarial perturbations inducing worst-case deviations in the input space. In this paper, we propose a unified and statistically grounded framework for robust neural classification that addresses both forms of contamination within a single learning objective. We formulate neural network training as a minimum-divergence estimation problem and introduce rSDNet, a robust learning algorithm based on the general class of $S$-divergences. The resulting training objective inherits robustness properties from classical statistical estimation, automatically down-weighting aberrant observations through model probabilities. We establish essential population-level properties of rSDNet, including Fisher consistency, classification calibration implying Bayes optimality, and robustness guarantees under uniform label noise and infinitesimal feature contamination. 
Experiments on three benchmark image classification datasets show that rSDNet improves robustness to label corruption and adversarial attacks while maintaining competitive accuracy on clean data. Our results highlight minimum-divergence learning as a principled and effective framework for robust neural classification under heterogeneous data contamination.2026-03-18T11:47:46ZPre-print; under reviewSuryasis JanaAbhik Ghoshhttp://arxiv.org/abs/2603.17599v1Prediction with Missing Data: Target Probabilities and Missingness Mechanisms2026-03-18T11:13:41ZConditions ensuring optimal parameter estimation in the presence of missing data are well established in inference, typically relying on the Missing-at-Random (MAR) assumption. In prediction, similar principles are often assumed to apply. However, methods considered biased in inference, such as pattern sub-modelling or unconditional imputation, have been shown to achieve optimal predictive performance under any missingness mechanism, including non-MAR (MNAR). To explain this apparent contradiction, we introduce a new formal framework for describing missingness in prediction. Central to this framework is a distinction between two prediction targets, defined according to whether or not the indicator of observation of the predictors is exploited to predict the outcome. This distinction leads to a classification of the missingness mechanisms describing the conditions under which these targets are equal, and when consistent prediction of each is achievable. A key result is that both targets may be consistently predicted under conditions weaker than MAR. We discuss the implications of this paradigm for handling missing data in prediction, distinguishing between missingness at development, validation and deployment of a forecaster. 
The findings are illustrated using simulated data and a real-world application with the prediction of significant injury after trauma upon arrival at the emergency department.2026-03-18T11:13:41Z55 pages (including 40 pages for the main article and 15 pages for the supplementary material)Pierre CatoireRobin GenuerCecile Proust-Limahttp://arxiv.org/abs/2603.17502v1A lightweight framework for characterising extreme precipitation events in climate ensembles2026-03-18T09:06:13ZThis article summarises the methods used by the team ``Ca' Foscari" for the EVA 2025 Data Challenge. The questions of the challenge concern the estimation of exceedance probabilities across several locations. Rather than modelling the spatial dependence structure, we reduce the problems to univariate ones by considering relevant spatial order statistics across the sites. Within a Peaks over Threshold framework, we model the marginal distributions of exceedances using generalised Pareto distributions. Generalised additive models are employed to allow the parameters to vary as functions of external predictors, which for all questions are reduced to the month. For questions 1 and 2, the required estimates and confidence intervals are obtained by generating samples from our fitted models. Question 3 involves the dependence between two consecutive observed statistics. To account for this temporal dependence, we fit a conditional extreme value model and derive empirical estimates of persistent extreme events by simulating from this model.2026-03-18T09:06:13ZDáire HealyIsadora Antoniano-VillalobosClaudia CollarinNathan HuetIlaria ProsdocimiEmilia Sivierohttp://arxiv.org/abs/2507.00641v3Hebbian Physics Networks: A Self-Organizing Computational Architecture Based on Local Physical Laws2026-03-18T08:55:11ZPhysical transport processes organize through local interactions that redistribute imbalance while preserving conservation. 
Classical solvers enforce this organization by applying fixed discrete operators on rigid grids. We introduce the Hebbian Physics Network (HPN), a computational framework that replaces this rigid scaffolding with a plastic transport geometry. An HPN is a coupled dynamical system of physical states on nodes and constitutive weights on edges in a graph. Residuals--local violations of continuity, momentum balance, or energy conservation--act as thermodynamic forces that drive the joint evolution of both the state and the operator (i.e. the adaptive weights). The weights adapt through a three-factor Hebbian rule, which we prove constitutes a strictly local gradient descent on the residual energy. This mechanism ensures thermodynamic stability: near equilibrium, the learned operator naturally converges to a symmetric, positive-definite form, rigorously reproducing Onsager's reciprocal relations without explicit enforcement. Far from equilibrium, the system undergoes a self-organizing search for a transport topology that restores global coercivity. Unlike optimization-based approaches that impose physics through global loss functions, HPNs embed conservation intrinsically: transport is restored locally by the evolving operator itself, without a global Poisson solve or backpropagated objective. We demonstrate the framework on scalar diffusion and incompressible lid-driven cavity flow, showing that physically consistent transport geometries and flow structures emerge from random initial conditions solely through residual-driven local adaptation. 
HPNs thus reframe computation not as the solution of a fixed equation, but as a thermodynamic relaxation process where the constitutive geometry and physical state co-evolve.2025-07-01T10:34:14Z16 pages, 3 figuresGunjan AutiHirofumi DaigujiGouhei Tanaka10.1103/tzgk-jqj4http://arxiv.org/abs/2603.17469v1Fast and scalable inference in hidden Markov models with Gaussian fields2026-03-18T08:22:46ZHidden Markov models (HMMs) are powerful tools for analysing time series data that depend on discrete underlying but unobserved states. As such, they have gained prominence across numerous empirical disciplines, in particular ecology, medicine, and economics. However, the increasing complexity of empirical data is often accompanied by additional latent structure such as spatial effects, temporal trends, or measurement perturbations. Gaussian fields provide an attractive building block for incorporating such structured latent variation into HMMs. Fast inference methods for Gaussian fields have emerged through the stochastic partial differential equation (SPDE) approach. Due to their sparse representation, these integrate well with novel frequentist estimation methods for random-effects models via the use of automatic differentiation and the Laplace approximation. Scaling to high dimensions requires tools such as (R)TMB to exploit sparsity in the Hessian w.r.t. the latent variables - a property satisfied by SPDE fields but violated by the HMM likelihood. We present a modified forward algorithm to compute the HMM likelihood, constructing sparsity in the Hessian and consequently enabling fast and scalable inference. 
We demonstrate the practical feasibility and the usefulness through simulations and two case studies exploring the detection of stellar flares as well as modelling the movement of lions.2026-03-18T08:22:46Z37 pages, 13 figuresJan-Ole Fischerhttp://arxiv.org/abs/2603.17460v1Algorithms for Models with Intractable Normalizing Functions2026-03-18T08:05:17ZIn this paper we discuss a well known computing problem -- inference for models with intractable normalizing functions. Models with intractable normalizing functions arise in a wide variety of areas, for instance network models, models for spatial data on lattices, spatial point processes, flexible models for count data and gene expression, and models for permutations. Simulating from these models for fixed parameter values is well studied, starting with work dating back seventy years to the origin of the Metropolis algorithm. On the other hand some of the most practical and theoretically justified algorithms for inference, particularly Bayesian inference, have only been developed within the past two decades. The most computationally efficient algorithms often do not have well developed theory and few if any approaches exist for assessing the quality of approximations based on them. For many problems even the best algorithms can be computationally infeasible. Hence, this is an exciting area of research with many open problems. We explain several key algorithms, providing connections and touching upon practical advantages and disadvantages of each, with some discussion of theoretical properties where they impact practice. We discuss an approach for assessing the accuracy of approximations produced by these algorithms; this diagnostic is particularly valuable for algorithm tuning. 
While our focus is largely on models with intractable normalizing functions, we also discuss algorithms that are more broadly applicable to models where the entire likelihood function is intractable; these methods are of course also applicable to intractable normalizing function problems.2026-03-18T08:05:17ZMurali HaranBokgyeong KangJaewoo Parkhttp://arxiv.org/abs/2603.17327v1Empirical Likelihood Inference for Sen and Sen--Shorrocks--Thon Indices2026-03-18T03:44:37ZThe Sen index and the Sen-Shorrocks-Thon (SST) index are widely used poverty measures. Developing reliable inference for these measures enables us to compare them effectively across different populations of interest. It is important to construct confidence intervals for the Sen index and SST index that provide better coverage probability and shorter interval length. Motivated by this, we discuss empirical likelihood (EL) and jackknife empirical likelihood (JEL) based inference for the Sen index. To derive a JEL-based confidence interval for the Sen and SST indices, we propose a new estimator for the Sen index using the theory of U-statistics and examine its properties. The large sample properties of the EL and JEL ratio statistics are studied. We also discuss EL and JEL-based inference for the Sen-Shorrocks-Thon (SST) index. The finite sample performance of the EL and JEL-based confidence intervals of both Sen and SST indices is evaluated through a Monte Carlo simulation study. Finally, we illustrate our methods using individual-level data from the Panel Study of Income Dynamics (PSID) survey from the US as well as Indian household level income data for different states sourced from the Consumer Pyramids Household Survey (CPHS).2026-03-18T03:44:37ZSreelakshmi NSaparya SureshSudheesh K. 
Kattumannilhttp://arxiv.org/abs/2602.14286v3Online LLM watermark detection via e-processes2026-03-18T03:26:04ZWatermarking for large language models (LLMs) has emerged as an effective tool for distinguishing AI-generated text from human-written content. Statistically, watermark schemes induce dependence between generated tokens and a pseudo-random sequence, reducing watermark detection to a hypothesis testing problem on independence. We develop a unified framework for LLM watermark detection based on e-processes, providing anytime-valid guarantees for online testing. We propose various methods to construct empirically adaptive e-processes that can enhance the detection power. The proposed methods are applicable to any sequential testing problem where independent pivotal statistics are available. In addition, theoretical results are established to characterize the power properties of the proposed procedures. Some experiments demonstrate that the proposed framework achieves competitive performance compared to existing watermark detection methods.2026-02-15T19:37:06ZWeijie SuRuodu WangZinan Zhaohttp://arxiv.org/abs/2603.00269v3Robust Regression with Student's T: The Role of Degrees of Freedom2026-03-18T02:57:45ZLinear regression estimators are known to be sensitive to outliers, and one alternative to obtain a robust and efficient estimator of the regression parameter is to model the error with Student's $t$ distribution. In this article, we compare estimators of the degrees of freedom parameter in the $t$ distribution using frequentist and Bayesian methods, and then study properties of the corresponding estimated regression coefficient. We also include the comparison with some recommended approaches in the literature, including fixing the degrees of freedom and robust regression using the Huber loss. 
Our extensive simulations on both synthetic and real data demonstrate that estimating the degrees of freedom via the adjusted profile log-likelihood approach yields regression coefficient estimators with high accuracy, performing comparably to the maximum likelihood estimators where the degrees of freedom are fixed at their true values. These findings provide a detailed synthesis of $t$-based robust regression and underscore a key insight: the proper calibration of the degrees of freedom is as crucial as the choice of the robust distribution itself for achieving optimal performance. The {\tt R} package that implements our method is available at https://github.com/amanda-ng518/RobustTRegression.2026-02-27T19:37:38ZAmanda NgShangkai ZhuArcher Gong ZhangNancy Reidhttp://arxiv.org/abs/2603.17294v1Bayesian Scalar-on-Tensor Quantile Regression for Longitudinal Data on Alzheimer's Disease2026-03-18T02:43:01ZAs a general and robust alternative to traditional mean regression models, quantile regression avoids the assumption of normally distributed errors, making it a versatile choice when modeling outcomes such as cognitive scores that typically have skewed distributions. Motivated by an application to Alzheimer's disease data where the aim is to explore how brain-behavior associations change over time, we propose a novel Bayesian tensor quantile regression for high-dimensional longitudinal imaging data. The proposed approach distinguishes between effects that are consistent across visits and patterns unique to each visit, contributing to the overall longitudinal trajectory. A low-rank decomposition is employed on the tensor coefficients which reduces dimensionality and preserves spatial configurations of the imaging voxels. We incorporate multiway shrinkage priors to model the visit-invariant tensor coefficients and variable selection priors on the tensor margins of the visit-specific effects. 
For posterior inference, we develop a computationally efficient Markov chain Monte Carlo sampling algorithm. Simulation studies reveal significant improvements in parameter estimation, feature selection, and prediction performance when compared with existing approaches. In the analysis of the Alzheimer's disease data, the flexibility of our modeling approach brings new insights as it provides a fuller picture of the relationship between the imaging voxels and the quantile distributions of the cognitive scores.2026-03-18T02:43:01ZRongke LyuMarina VannucciSuprateek Kundu
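
The last entry above builds on quantile regression, which replaces the squared-error loss of mean regression with the check (pinball) loss. The Python sketch below illustrates that loss only; it is not the Bayesian tensor model of the abstract, and the function names `check_loss` and `sample_quantile_via_check_loss` are hypothetical names chosen for this illustration. It assumes plain lists of numbers and a quantile level `tau` strictly between 0 and 1.

```python
def check_loss(residuals, tau):
    """Average check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for u in residuals:
        # Positive residuals are weighted by tau, negative ones by (1 - tau).
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(residuals)


def sample_quantile_via_check_loss(y, tau):
    """Return the observation in y that minimises the average check loss.

    A minimiser of the check loss over constants is a tau-th sample quantile,
    so searching over the observed values is enough for this illustration.
    """
    return min(y, key=lambda c: check_loss([v - c for v in y], tau))


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 100]  # toy data with one extreme value
    print(sample_quantile_via_check_loss(scores, 0.5))
```

With `tau = 0.5` the minimiser is the sample median, which the single extreme value in the toy data barely affects; this is the robustness to skewed outcomes that the abstract attributes to quantile regression.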