http://arxiv.org/abs/2304.04724v3 When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm? 2026-02-11T18:51:51Z

We analyze the mixing time of Metropolized Hamiltonian Monte Carlo (HMC) with the leapfrog integrator to sample from a distribution on $\mathbb{R}^d$ whose log-density is smooth, has Lipschitz Hessian in Frobenius norm and satisfies isoperimetry. We bound the gradient complexity to reach $ε$ error in total variation distance from a warm start by $\tilde{O}(d^{1/4}\text{polylog}(1/ε))$ and demonstrate the benefit of choosing the number of leapfrog steps to be larger than 1. To surpass the previous analysis of the Metropolis-adjusted Langevin algorithm (MALA), which has $\tilde{O}(d^{1/2}\text{polylog}(1/ε))$ dimension dependency [WSC22], we reveal a key feature in our proof: the joint distribution of the location and velocity variables of the discretization of the continuous HMC dynamics stays approximately invariant. This key feature, when shown via induction over the number of leapfrog steps, enables us to obtain estimates on moments of various quantities that appear in the acceptance rate control of Metropolized HMC. Notably, our analysis does not require log-concavity or independence of the marginals, and only relies on an isoperimetric inequality. To illustrate the relevance of the Lipschitz Hessian in Frobenius norm assumption, several examples that fall into our framework are discussed.
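
The object studied in this abstract, a leapfrog-discretized HMC proposal followed by a Metropolis accept/reject step, can be sketched as below. This is only an illustration under assumptions that do not come from the paper: the target is a standard Gaussian, the names `potential`, `grad_potential`, `leapfrog`, and `hmc_step` are hypothetical, and the step size and number of leapfrog steps are arbitrary rather than the tuned choices the analysis concerns.

```python
import math
import random


def potential(x):
    """Negative log-density f(x) of a standard Gaussian (assumed target)."""
    return 0.5 * sum(xi * xi for xi in x)


def grad_potential(x):
    """Gradient of the potential above."""
    return list(x)


def leapfrog(x, v, grad_f, step, n_steps):
    """Integrate Hamiltonian dynamics for H(x, v) = f(x) + |v|^2 / 2."""
    x, v = list(x), list(v)
    g = grad_f(x)
    for _ in range(n_steps):
        v = [vi - 0.5 * step * gi for vi, gi in zip(v, g)]  # half step in velocity
        x = [xi + step * vi for xi, vi in zip(x, v)]        # full step in position
        g = grad_f(x)                                       # gradient at the new position
        v = [vi - 0.5 * step * gi for vi, gi in zip(v, g)]  # second half step in velocity
    return x, v


def hmc_step(x, f, grad_f, step, n_steps, rng):
    """One Metropolized HMC transition: fresh velocity, leapfrog, accept/reject."""
    v = [rng.gauss(0.0, 1.0) for _ in x]
    h_old = f(x) + 0.5 * sum(vi * vi for vi in v)
    x_new, v_new = leapfrog(x, v, grad_f, step, n_steps)
    h_new = f(x_new) + 0.5 * sum(vi * vi for vi in v_new)
    accept_prob = math.exp(min(0.0, h_old - h_new))
    return x_new if rng.random() < accept_prob else list(x)


rng = random.Random(0)
state = [3.0, -3.0]
samples = []
for _ in range(2000):
    state = hmc_step(state, potential, grad_potential, 0.2, 5, rng)
    samples.append(state)
```

The acceptance probability depends only on the change in the Hamiltonian along the leapfrog trajectory, which is why moment bounds on that energy error drive the acceptance-rate control mentioned in the abstract.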

2023-04-10T17:35:57Z 46 pages, fixed typos and minor issues Yuansi Chen Khashayar Gatmiry Minhui Jiang http://arxiv.org/abs/2602.11108v1 Large Scale High-Dimensional Reduced-Rank Linear Discriminant Analysis 2026-02-11T18:23:04Z

Reduced-rank linear discriminant analysis (RRLDA) is a foundational method of dimension reduction for classification that has been useful in a wide range of applications. The goal is to identify an optimal subspace to project the observations onto that simultaneously maximizes between-group variation while minimizing within-group differences. The solution is straightforward when the number of observations is greater than the number of features, but computational difficulties arise in both the high-dimensional setting, where there are more features than there are observations, and when the data are very large. Many works have proposed solutions for the high-dimensional setting, but these frequently involve additional assumptions or tuning parameters. We propose a fast and simple iterative algorithm for both classical and high-dimensional RRLDA on large data that is free from these additional requirements and that comes with guarantees. We also explain how RRLDA-RK provides implicit regularization towards the least norm solution without explicitly incorporating penalties. We demonstrate our algorithm on real data and highlight some results.

2026-02-11T18:23:04Z Jocelyn T. Chi http://arxiv.org/abs/2602.11090v1 Direct Learning of Calibration-Aware Uncertainty for Neural PDE Surrogates 2026-02-11T17:57:20Z

Neural PDE surrogates are often deployed in data-limited or partially observed regimes where downstream decisions depend on calibrated uncertainty in addition to low prediction error. Existing approaches obtain uncertainty through ensemble replication, fixed stochastic noise such as dropout, or post hoc calibration. Cross-regularized uncertainty learns uncertainty parameters during training using gradients routed through a held-out regularization split. The predictor is optimized on the training split for fit, while low-dimensional uncertainty controls are optimized on the regularization split to reduce train-test mismatch, yielding regime-adaptive uncertainty without per-regime noise tuning. The framework can learn continuous noise levels at the output head, within hidden features, or within operator-specific components such as spectral modes. We instantiate the approach in Fourier Neural Operators and evaluate on APEBench sweeps over observed fraction and training-set size. Across these sweeps, the learned predictive distributions are better calibrated on held-out splits and the resulting uncertainty fields concentrate in high-error regions in one-step spatial diagnostics.

2026-02-11T17:57:20Z 13 pages, 11 figures Carlos Stein Brito http://arxiv.org/abs/2602.10960v1 Integrating granular data into a multilayer network: an interbank model of the euro area for systemic risk assessment 2026-02-11T15:50:53Z

Micro-structural models of contagion and systemic risk emphasize that shock propagation is inherently multi-channel, spanning counterparty exposures, short-term funding and roll-over risk, securities cross-holdings, and common-asset (fire-sale) spillovers. Empirical implementations, however, often rely on stylized or simulated networks, or focus on a single exposure dimension, reflecting the practical difficulty of reconciling heterogeneous granular collections into a coherent representation with consistent identifiers and consolidation rules. We close part of this gap by constructing an empirically grounded multilayer network for euro area significant banking groups that integrates several supervisory and statistical datasets into layer-consistent exposure matrices defined on a common node set. Each layer corresponds to a distinct transmission channel, long- and short-term credit, securities cross-holdings, short-term secured funding, and overlapping external portfolios, and nodes are enriched with balance-sheet information to support model calibration. We document pronounced cross-layer heterogeneity in connectivity and centrality, and show that an aggregated (flattened) representation can mask economically relevant structure and misidentify the institutions that are systemically important in specific markets. We then illustrate how the resulting network disciplines standard systemic-risk analytics by implementing a centrality-based propagation measure and a micro-structural agent-based framework on real exposures. The approach provides a data-grounded basis for layer-aware systemic-risk assessment and stress testing across multiple dimensions of the banking network.

2026-02-11T15:50:53Z Adv Data Anal Classif (2026) Ilias Aarab Thomas Gottron Andrea Colombo Jörg Reddig Annalauro Ianiro 10.1007/s11634-026-00668-7 http://arxiv.org/abs/2602.10714v1 A Non-asymptotic Analysis for Learning and Applying a Preconditioner in MCMC 2026-02-11T10:19:56Z

Preconditioning is a common method applied to modify Markov chain Monte Carlo algorithms with the goal of making them more efficient. In practice it is often extremely effective, even when the preconditioner is learned from the chain. We analyse and compare the finite-time computational costs of schemes which learn a preconditioner based on the target covariance or the expected Hessian of the target potential with that of a corresponding scheme that does not use preconditioning. We apply our results to the Unadjusted Langevin Algorithm (ULA) for an appropriately regular target, establishing non-asymptotic guarantees for preconditioned ULA which learns its preconditioner. Our results are also applied to the unadjusted underdamped Langevin algorithm in the supplementary material. To do so, we establish non-asymptotic guarantees on the time taken to collect $N$ approximately independent samples from the target for schemes that learn their preconditioners under the assumption that the underlying Markov chain satisfies a contraction condition in the Wasserstein-2 distance. This approximate independence condition, that we formalize, allows us to bridge the non-asymptotic bounds of modern MCMC theory and classical heuristics of effective sample size and mixing time, and is needed to amortise the costs of learning a preconditioner across the many samples it will be used to produce.
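
As a rough illustration of the learn-then-apply idea described above, the sketch below runs ULA with a diagonal preconditioner estimated from a pilot run. Everything specific here is an assumption for illustration: the Gaussian target with variances (100, 1), the two-phase schedule, the step size, and the hypothetical helpers `preconditioned_ula_step` and `diagonal_variances`. It is not the scheme analysed in the paper, which considers preconditioners based on the target covariance or the expected Hessian in more generality.

```python
import math
import random


def preconditioned_ula_step(x, grad_u, precond_diag, step, rng):
    """x' = x - step * P * grad U(x) + sqrt(2 * step) * P^{1/2} * noise, with diagonal P."""
    g = grad_u(x)
    return [
        xi - step * p * gi + math.sqrt(2.0 * step * p) * rng.gauss(0.0, 1.0)
        for xi, p, gi in zip(x, precond_diag, g)
    ]


def diagonal_variances(samples):
    """Per-coordinate sample variances, used here as a learned diagonal preconditioner."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    return [sum((s[j] - means[j]) ** 2 for s in samples) / (n - 1) for j in range(d)]


# Assumed target: centred Gaussian with variances (100, 1),
# so U(x) = sum(x_j^2 / (2 * var_j)).
target_var = [100.0, 1.0]


def grad_u(x):
    return [xi / vi for xi, vi in zip(x, target_var)]


rng = random.Random(1)

# Phase 1: pilot run without preconditioning (P = identity) to estimate variances.
x = [0.0, 0.0]
pilot = []
for _ in range(5000):
    x = preconditioned_ula_step(x, grad_u, [1.0, 1.0], 0.1, rng)
    pilot.append(x)
learned = diagonal_variances(pilot)

# Phase 2: continue sampling with the learned diagonal preconditioner.
chain = []
for _ in range(5000):
    x = preconditioned_ula_step(x, grad_u, learned, 0.1, rng)
    chain.append(x)
```

In this toy setup the pilot chain mixes slowly in the wide coordinate, so the variance estimate it produces can be poor. That is the kind of learning cost that has to be amortised over the samples produced afterwards.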

2026-02-11T10:19:56Z Max Hird Florian Maire Jeffrey Negrea http://arxiv.org/abs/2412.20481v2 EM algorithms for optimization problems with polynomial objectives 2026-02-11T01:24:34Z

The EM (Expectation-Maximization) algorithm is regarded as an MM (Majorization-Minimization) algorithm for maximum likelihood estimation of statistical models. Expanding this view, this paper demonstrates that by choosing an appropriate probability distribution, even a nonstatistical optimization problem can be cast as a negative log-likelihood-like minimization problem, which can be approached by an EM (or MM) algorithm. When a polynomial objective is optimized over a simple polyhedral feasible set and an exponential family distribution is employed, the EM algorithm can be reduced to a natural gradient descent of the employed distribution with a constant step size. This is demonstrated through three examples. In this paper, we demonstrate the global convergence of specific cases with some exponential family distributions in a general form. In instances when the feasible set is not sufficiently simple, the use of MM algorithms can nevertheless be adequately described. When the objective is to minimize a convex quadratic function and the constraints are polyhedral, global convergence can also be established based on the existing results for an entropy-like proximal point algorithm.

2024-12-29T14:45:29Z Kensuke Asai Jun-ya Gotoh http://arxiv.org/abs/2505.18879v4 Efficient Online Random Sampling via Randomness Recycling 2026-02-10T15:26:36Z

This article studies the fundamental problem of using i.i.d. coin tosses from an entropy source to efficiently generate random variables $X_i \sim P_i$ $(i \ge 1)$, where $(P_1, P_2, \dots)$ is a random sequence of rational discrete probability distributions subject to an \textit{arbitrary} stochastic process. Our method achieves an amortized expected entropy cost within $\varepsilon > 0$ bits of the information-theoretically optimal Shannon lower bound using $O(\log(1/\varepsilon))$ space. This result holds both pointwise in terms of the Shannon information content conditioned on $X_i$ and $P_i$, and in expectation to obtain a rate of $\mathbb{E}[H(P_1) + \dots + H(P_n)]/n + \varepsilon$ bits per sample as $n \to \infty$ (where $H$ is the Shannon entropy). The combination of space, time, and entropy properties of our method improves upon the Knuth and Yao (1976) entropy-optimal algorithm and Han and Hoshi (1997) interval algorithm for online sampling, which require unbounded space. It also uses exponentially less space than the more specialized methods of Kozen and Soloviev (2022) and Shao and Wang (2025) that generate i.i.d. samples from a fixed distribution. Our online sampling algorithm rests on a powerful algorithmic technique called \textit{randomness recycling}, which reuses a fraction of the random information consumed by a probabilistic algorithm to reduce its amortized entropy cost. On the practical side, we develop randomness recycling techniques to accelerate a variety of prominent sampling algorithms. We show that randomness recycling enables state-of-the-art runtime performance on the Fisher-Yates shuffle when using a cryptographically secure pseudorandom number generator, and that it reduces the entropy cost of discrete Gaussian sampling. Accompanying the manuscript is a performant software library in the C programming language.
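
To make the notion of entropy cost concrete, the sketch below implements the plain Fisher-Yates shuffle on top of a coin-toss source and counts the tosses consumed. It is a baseline only: it does not implement randomness recycling, and the names `CountingBitSource`, `uniform_below`, and `fisher_yates_shuffle` are hypothetical and unrelated to the C library that accompanies the manuscript. Python's `random.Random` stands in for the entropy source.

```python
import random


class CountingBitSource:
    """Wraps a PRNG as a stream of fair coin tosses and counts how many are used."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.bits_used = 0

    def next_bit(self):
        self.bits_used += 1
        return self._rng.getrandbits(1)


def uniform_below(n, source):
    """Uniform integer in [0, n) by rejection sampling on ceil(log2 n) coin tosses."""
    k = (n - 1).bit_length()
    while True:
        value = 0
        for _ in range(k):
            value = (value << 1) | source.next_bit()
        if value < n:
            return value
        # A rejected draw throws its tosses away; nothing is recycled in this baseline.


def fisher_yates_shuffle(items, source):
    """Return a uniformly random permutation of items (Fisher-Yates, descending index)."""
    result = list(items)
    for i in range(len(result) - 1, 0, -1):
        j = uniform_below(i + 1, source)
        result[i], result[j] = result[j], result[i]
    return result


source = CountingBitSource(seed=0)
shuffled = fisher_yates_shuffle(range(10), source)
```

For 10 items this baseline always consumes at least 25 tosses, and more whenever a draw is rejected, whereas the Shannon lower bound is $\log_2(10!) \approx 21.8$ bits. The gap between the two is the waste that entropy-efficient samplers aim to reduce.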

2025-05-24T21:34:08Z Proceedings of the 2026 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 2473-2511. Society for Industrial and Applied Mathematics, 2026 Thomas L. Draper Feras A. Saad 10.1137/1.9781611978971.89 http://arxiv.org/abs/2602.09845v1 Estimating Individual Customer Lifetime Values with R: The CLVTools Package 2026-02-10T14:49:37Z

Customer lifetime value (CLV) describes a customer's long-term economic value for a business. This metric is widely used in marketing, for example, to select customers for a marketing campaign. However, modeling CLV is challenging. When relying on customers' purchase histories, the input data is sparse. Additionally, given its long-term focus, prediction horizons are often longer than estimation periods. Probabilistic models are able to overcome these challenges and, thus, are a popular option among researchers and practitioners. The latter also appreciate their applicability for both small and big data as well as their robust predictive performance without any fine-tuning requirements. Their popularity is due to three characteristics: data parsimony, scalability, and predictive accuracy. The R package CLVTools provides an efficient and user-friendly implementation framework to apply key probabilistic models such as the Pareto/NBD and Gamma-Gamma model. Further, it provides access to the latest model extensions to include time-invariant and time-varying covariates, parameter regularization, and equality constraints. This article gives an overview of the fundamental ideas of these statistical models and illustrates their application to derive CLV predictions for existing and new customers.

2026-02-10T14:49:37Z Markus Meierer Patrick Bachmann Jeffrey Näf Patrik Schilter René Algesheimer http://arxiv.org/abs/2507.15529v4 Algorithms for Approximating Conditionally Optimal Bounds 2026-02-10T13:17:57Z

This work develops algorithms for non-parametric confidence regions for samples from a univariate distribution whose support is a discrete mesh bounded on the left. We generalize the theory of Learned-Miller to preorders over the sample space. In this context, we show that the lexicographic low and lexicographic high orders are in some way extremal in the class of monotone preorders. From this theory we derive several approximation algorithms: 1) Closed form approximations for the lexicographic low and high orders with error tending to zero in the mesh size; 2) A polynomial-time approximation scheme for quantile orders with error tending to zero in the mesh size; 3) Monte Carlo methods for calculating quantile and lexicographic low orders applicable to any mesh size.

2025-07-21T11:55:54Z George Bissias http://arxiv.org/abs/2506.05905v2 Sequential Monte Carlo approximations of Wasserstein--Fisher--Rao gradient flows 2026-02-10T09:10:57Z

We consider the problem of sampling from a probability distribution $π$. It is well known that this can be written as an optimisation problem over the space of probability distributions in which we aim to minimise the Kullback--Leibler divergence from $π$. We consider several partial differential equations (PDEs) whose solution is a minimiser of the Kullback--Leibler divergence from $π$ and connect them to well-known Monte Carlo algorithms. We focus in particular on PDEs obtained by considering the Wasserstein--Fisher--Rao geometry over the space of probabilities and show that these lead to a natural implementation using importance sampling and sequential Monte Carlo. We propose a novel algorithm to approximate the Wasserstein--Fisher--Rao flow of the Kullback--Leibler divergence and conduct an extensive empirical study to identify when these algorithms outperform other popular Monte Carlo algorithms.

2025-06-06T09:24:46Z Changes from v1: the study of tempered dynamics was removed in favour of a larger experimental section Francesca R. Crucinio Sahani Pathiraja http://arxiv.org/abs/2602.09512v1 Continuous mixtures of Gaussian processes as models for spatial extremes 2026-02-10T08:11:37Z

Spatial modelling of extreme values allows studying the risk of joint occurrence of extreme events at different locations and is of significant interest in climatic and other environmental sciences. A popular class of dependence models for spatial extremes is that of random location-scale mixtures, in which a spatial "baseline" process is multiplied or shifted by a random variable, potentially altering its extremal dependence behaviour. Gaussian location-scale mixtures retain benefits of their Gaussian baseline processes while overcoming some of their limitations, such as symmetry, light tails and weak tail dependence. We review properties of Gaussian location-scale mixtures and develop novel constructions with interesting features, together with a general algorithm for conditional simulation from these models. We leverage their flexibility to propose extended extreme-value models that allow for appropriately modelling not only the tails but also the bulk of the data. This is important in many applications and avoids the need to explicitly select the events considered as extreme. We propose new solutions for likelihood inference in parametric models of Gaussian location-scale mixtures, in order to avoid the numerical bottleneck given by the latent location and scale variables that can lead to high computational cost of standard likelihood evaluations. The effectiveness of the models and of the inference methods is confirmed with simulated data examples, and we present an application to wildfire-related weather variables in Portugal. Although not detailed here, the approaches would also be straightforward to use for modelling multivariate (non-spatial) data.

2026-02-10T08:11:37Z Lorenzo Dell'Oro Carlo Gaetan Thomas Opitz http://arxiv.org/abs/2602.09247v1 Motivating REML via Prediction-Error Covariances in EM Updates for Linear Mixed Models 2026-02-09T22:24:48Z

We present a computational motivation for restricted maximum likelihood (REML) estimation in linear mixed models using an expectation--maximization (EM) algorithm. At each iteration, maximum likelihood (ML) and REML solve the same mixed-model equations for the best linear unbiased estimator (BLUE) of the fixed effects and the best linear unbiased predictor (BLUP) of the random effects. They differ only in the trace adjustments used in the variance-component updates: ML uses conditional covariances of the random effects given the data, whereas REML uses prediction-error covariances from Henderson's C-matrix, reflecting uncertainty from estimating the fixed effects. Short R code makes this switch explicit, exposes the key matrices for classroom inspection, and reproduces lme4 ML and REML fits.

2026-02-09T22:24:48Z Andrew T. Karl http://arxiv.org/abs/2105.13440v5 Non-negative matrix factorization algorithms generally improve topic model fits 2026-02-09T17:39:48Z

In an effort to develop topic modeling methods that can be quickly applied to large data sets, we revisit the problem of maximum-likelihood estimation in topic models. It is known, at least informally, that maximum-likelihood estimation in topic models is closely related to non-negative matrix factorization (NMF). Yet, to our knowledge, this relationship has not been exploited previously to fit topic models. We show that recent advances in NMF optimization methods can be leveraged to fit topic models very efficiently, often resulting in much better fits and in less time than existing algorithms for topic models. We also formally make the connection between the NMF optimization problem and maximum-likelihood estimation for the topic model, and using this result we show that the expectation maximization (EM) algorithm for the topic model is essentially the same as the classic multiplicative updates for NMF (the only difference being that the operations are performed in a different order). Our methods are implemented in the R package fastTopics.
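
The classic multiplicative updates that the abstract relates to EM can be sketched as follows for the generalized Kullback-Leibler (Poisson) objective. This is a minimal pure-Python illustration under assumed notation: `x` is a small made-up count matrix (documents by words), `w` and `h` are the non-negative factors, and the function names are hypothetical. It is not the fastTopics implementation, which relies on more recent NMF optimization methods.

```python
import math
import random


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kl_loss(x, w, h):
    """Generalized KL divergence D(X || WH), the Poisson NMF objective up to constants."""
    wh = matmul(w, h)
    total = 0.0
    for i, row in enumerate(x):
        for j, xij in enumerate(row):
            if xij > 0:
                total += xij * math.log(xij / wh[i][j])
            total += wh[i][j] - xij
    return total


def multiplicative_update(x, w, h):
    """One sweep of the multiplicative updates: first H, then W."""
    n, m, r = len(x), len(x[0]), len(h)
    wh = matmul(w, h)
    h = [[h[k][j] * sum(w[i][k] * x[i][j] / wh[i][j] for i in range(n))
          / sum(w[i][k] for i in range(n)) for j in range(m)] for k in range(r)]
    wh = matmul(w, h)
    w = [[w[i][k] * sum(h[k][j] * x[i][j] / wh[i][j] for j in range(m))
          / sum(h[k][j] for j in range(m)) for k in range(r)] for i in range(n)]
    return w, h


# Made-up 4 x 3 count matrix and a rank-2 factorization with positive random init.
x = [[5, 0, 1], [4, 1, 0], [0, 3, 6], [1, 2, 5]]
rng = random.Random(0)
w = [[rng.uniform(0.5, 1.5) for _ in range(2)] for _ in range(4)]
h = [[rng.uniform(0.5, 1.5) for _ in range(3)] for _ in range(2)]

losses = [kl_loss(x, w, h)]
for _ in range(20):
    w, h = multiplicative_update(x, w, h)
    losses.append(kl_loss(x, w, h))
```

Each sweep keeps the factors non-negative and does not increase the objective, which mirrors the monotone likelihood improvement of EM.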

2021-05-27T20:34:46Z Peter Carbonetto Abhishek Sarkar Zihao Wang Matthew Stephens http://arxiv.org/abs/2602.08577v1 An arithmetic method algorithm optimizing k-nearest neighbors compared to regression algorithms and evaluated on real world data sources 2026-02-09T12:17:16Z

Linear regression analysis focuses on predicting a numeric regressand value based on certain regressor values. In this context, k-Nearest Neighbors (k-NN) is a common non-parametric regression algorithm, which achieves efficient performance when compared with other algorithms in the literature. In this work, an optimization of the k-NN algorithm is proposed by exploiting an introduced arithmetic method, which can provide solutions for linear equations involving an arbitrary number of real variables. Specifically, an Arithmetic Method Algorithm (AMA) is adopted to assess the efficiency of the introduced arithmetic method, while an Arithmetic Method Regression (AMR) algorithm is proposed as an optimization of k-NN that builds on AMA. This algorithm is compared with other regression algorithms, according to an introduced optimal inference decision rule, and evaluated on certain publicly available real-world data sources. Results are promising since the proposed AMR algorithm has comparable performance with the other algorithms, while in most cases it achieves better performance than k-NN. The results indicate that the introduced AMR is an optimization of k-NN.
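
For reference, the baseline being optimized, plain k-NN regression, can be sketched in a few lines. This shows only the standard method with an unweighted mean over the nearest neighbours; the AMA and AMR algorithms from the paper are not reproduced here, and the function name `knn_regress` and the toy data are assumptions for illustration.

```python
def knn_regress(train_x, train_y, query, k):
    """Predict the mean target of the k training points nearest to query."""
    # Squared Euclidean distance gives the same neighbour ordering as the distance itself.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Toy data on the line y = 2x.
train_x = [[1.0], [2.0], [3.0], [4.0], [5.0]]
train_y = [2.0, 4.0, 6.0, 8.0, 10.0]
prediction = knn_regress(train_x, train_y, [2.5], k=2)  # neighbours x=2 and x=3 -> 5.0
```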

2026-02-09T12:17:16Z Nature Scientific Reports Theodoros Anagnostopoulos Evanthia Zervoudi Christos Anagnostopoulos Apostolos Christopoulos Bogdan Wierzbinski 10.1038/s41598-025-33966-9 http://arxiv.org/abs/2602.08544v1 Adaptive Markovian Spatiotemporal Transfer Learning in Multivariate Bayesian Modeling 2026-02-09T11:45:01Z

This manuscript develops computationally efficient online learning for multivariate spatiotemporal models. The method relies on matrix-variate Gaussian distributions, dynamic linear models, and Bayesian predictive stacking to efficiently share information across temporal data shards. The model facilitates effective information propagation over time while seamlessly integrating spatial components within a dynamic framework, building a Markovian dependence structure between datasets at successive time instants. This structure supports flexible, high-dimensional modeling of complex dependence patterns, as commonly found in spatiotemporal phenomena, where computational challenges arise rapidly with increasing dimensions. The proposed approach further manages exact inference through predictive stacking, enhancing accuracy and interoperability. Combining sequential and parallel processing of temporal shards, each unit passes assimilated information forward, which is then back-smoothed to improve posterior estimates, incorporating all available information. This framework advances the scalability and adaptability of spatiotemporal modeling, making it suitable for dynamic, multivariate, and data-rich environments.

2026-02-09T11:45:01Z Luca Presicce Sudipto Banerjee