http://arxiv.org/abs/2504.06903v2Network Cross-Validation and Model Selection via Subsampling2026-03-11T15:18:03ZComplex and larger networks are becoming increasingly prevalent in scientific applications in various domains. Although a number of models and methods exist for such networks, cross-validation on networks remains challenging due to the unique structure of network data. In this paper, we propose a general cross-validation procedure called NETCROP (NETwork CRoss-Validation using Overlapping Partitions). The key idea is to divide the original network into multiple subnetworks with a shared overlap part, producing training sets consisting of the subnetworks and a test set with the node pairs between the subnetworks. This train-test split provides the basis for a network cross-validation procedure that can be applied on a wide range of model selection and parameter tuning problems for networks. The method is computationally efficient for large networks as it uses smaller subnetworks for the training step. We provide methodological details and theoretical guarantees for several model selection and parameter tuning tasks using NETCROP. Numerical results demonstrate that NETCROP performs accurate cross-validation on a diverse set of network model selection and parameter tuning problems. The results also indicate that NETCROP is computationally much faster while being often more accurate than the existing methods for network cross-validation.2025-04-09T14:03:40ZSayan ChakrabartySrijan SenguptaYuguo Chenhttp://arxiv.org/abs/2509.18149v2Tensor Train Completion from Fiberwise Observations Along a Single Mode2026-03-11T12:02:52ZTensor completion is an extension of matrix completion aimed at recovering a multiway data tensor by leveraging a given subset of its entries (observations) and the pattern of observation. 
The low-rank assumption is key in establishing a relationship between the observed and unobserved entries of the tensor. The low-rank tensor completion problem is typically solved using numerical optimization techniques, where the rank information is used either implicitly (in the rank minimization approach) or explicitly (in the error minimization approach). Current theories concerning these techniques often study probabilistic recovery guarantees under conditions such as random uniform observations and incoherence requirements. However, if an observation pattern exhibits some low-rank structure that can be exploited, more efficient algorithms with deterministic recovery guarantees can be designed by leveraging this structure. This work shows how to use only standard linear algebra operations to compute the tensor train decomposition of a specific type of ``fiber-wise'' observed tensor, where some of the fibers of a tensor (along a single specific mode) are either fully observed or entirely missing, unlike the usual entry-wise observations. From an application viewpoint, this setting is relevant when it is easier to sample or collect a multiway data tensor along a specific mode (e.g., temporal). The proposed completion method is fast and is guaranteed to work under reasonable deterministic conditions on the observation pattern. Through numerical experiments, we showcase interesting applications and use cases that illustrate the effectiveness of the proposed approach.2025-09-16T09:42:33Z26 pages, 12 figuresMathematics 2026, 14(5), 922Shakir Showkat SofiLieven De Lathauwer10.3390/math14050922http://arxiv.org/abs/2603.10687v1A Python implementation of some geometric tools on Kendall 3D shape space for practical applications2026-03-11T11:57:43ZThis work addresses the challenge of analyzing geometric structures using Kendall's 3D Shape Space. 
While Riemannian geometry provides a robust framework for shape analysis (independent of scale, position, and orientation), the transition from theoretical manifolds to practical computational workflows remains difficult. Although Geomstats is currently the leading Python library for manifold-based statistics, it lacks specific utilities required for advanced 3D shape analysis. This article introduces tools designed to bridge this gap, translating complex mathematical abstractions into efficient, accessible software solutions for researchers.2026-03-11T11:57:43ZJorge ValeroVicent Gimeno i GarciaM. Victoría IbáñezPau MartinavarroAmelia Simóhttp://arxiv.org/abs/2408.09155v2Learning Robust Treatment Rules for Censored Data2026-03-11T11:33:58ZThere is a fast-growing literature on estimating optimal treatment rules directly by maximizing the expected outcome. In biomedical studies and operations applications, censored survival outcome is frequently observed, in which case the truncated mean survival time and survival probability are of great interest. In this paper, we propose two robust criteria for learning optimal treatment rules with censored survival outcomes; the former one targets an optimal treatment rule maximizing the truncated mean survival time, where the cutoff is specified by a given quantile such as median; the latter one targets an optimal treatment rule maximizing buffered survival probabilities, where the predetermined threshold is adjusted to account for the truncated mean survival time. We develop a sampling-based difference-of-convex algorithm for learning the proposed optimal treatment rules, and provide theoretical justifications for them. In simulation studies, our estimators show improved performance compared to existing methods. 
We also demonstrate the proposed method using AIDS clinical trial data.2024-08-17T09:58:58ZYifan CuiJunyi LiuTao ShenZhengling QiXi Chenhttp://arxiv.org/abs/2508.03059v3Two-sample comparison through additive tree models for density ratios2026-03-11T04:12:52ZThe ratio of two densities provides a direct characterization of their differences. We consider the two-sample comparison problem by estimating this ratio given i.i.d. observations from two distributions. To this end, we propose additive tree models for density ratio estimation along with efficient algorithms using a new loss function, the balancing loss. The loss allows tree-based models to be trained using several algorithms originally designed for supervised learning, such as forward-stagewise optimization and gradient boosting. Moreover, the balancing loss resembles an exponential family kernel, and it can serve as a pseudo-likelihood with conjugate priors. This property enables generalized Bayesian inference on the density ratio using backfitting samplers designed for Bayesian additive regression trees (BART). Our Bayesian strategy provides uncertainty quantification for the inferred density ratio, which is critical for applications involving high-dimensional and data-limited distributions with potentially substantial uncertainty. We further show connections of the balancing loss to the exponential loss in binary classification and to the variational form of f-divergence, particularly the squared Hellinger distance. Numerical experiments demonstrate that our method achieves both accuracy and computational efficiency, while uniquely providing uncertainty quantification. 
Finally, we demonstrate its application to assessing the quality of generative models for microbiome compositional data.2025-08-05T04:08:49ZNaoki AwayaYuliang XuLi Mahttp://arxiv.org/abs/2603.10382v1Gimbal Regression: Orientation-Adaptive Local Linear Regression under Spatial Heterogeneity2026-03-11T03:51:57ZLocal regression is widely used to explore spatial heterogeneity, but anisotropic or effectively low-dimensional neighborhoods can produce ill-conditioned local solves, causing coefficient variation driven by numerical artifacts rather than substantive structure. Such instability is often hidden when estimation relies on implicit tuning or optimization without exposing local diagnostics.
This paper proposes Gimbal Regression (GR), a deterministic, geometry-aware local regression framework for stable and auditable estimation. GR constructs directional weights from neighborhood geometry using explicit orientation objects and deterministic safeguards, and computes local coefficients by a closed-form solve. Theoretical results are stated conditional on the realized neighborhood configuration, under which the estimator is a deterministic linear operator with finite-perturbation stability bounds. Simulations and empirical examples demonstrate predictable computation, transparent diagnostics, and improved numerical stability relative to common local regression baselines.2026-03-11T03:51:57ZYuichiro Otanihttp://arxiv.org/abs/2603.10318v1Optimising two-block averaging kernels to speed up Markov chains2026-03-11T01:40:02ZWe study the problem of selecting optimal two-block partitions to accelerate the mixing of finite Markov chains under group-averaging transformations. The main objectives considered are the Kullback-Leibler (KL) divergence and the Frobenius distance to stationarity. We establish explicit connections between these objectives and the induced projection chain. In the case of the KL divergence, this reduction yields explicit decay rates in terms of the log-Sobolev constant. For the Frobenius distance, we identify a Cheeger-type functional that characterises optimal cuts. This formulation recasts two-block selection as a structured combinatorial optimisation problem admitting difference-of-submodular decompositions. We further propose several algorithmic approximations, including majorisation-minimisation and coordinate descent schemes, as computationally feasible alternatives to exhaustive combinatorial search. 
Our numerical experiments reveal that optimal cuts under the two objectives can substantially reduce total variation distance to stationarity and demonstrate the practical effectiveness of the proposed approximation algorithms.2026-03-11T01:40:02Z45 pages, 5 figuresRyan J. Y. LimMichael C. H. Choihttp://arxiv.org/abs/2411.08821v3Conditional Local Importance by Quantile Expectations2026-03-10T23:53:48ZGlobal variable importance measures are commonly used to interpret the results of machine learning models. Local variable importance techniques assess how variables contribute to individual observations. Current, popular methods, including LIME and SHAP, typically fail to accurately reflect locally dependent relationships between variables and instead focus on marginal importance values. Additionally, they are not natively adapted for multi-class classification problems. We propose a new model-agnostic method for calculating local variable importance, CLIQUE, that captures locally dependent relationships, provides improvements over permutation-based methods, and can be directly applied to multi-class classification problems. Simulated and real-world examples show that CLIQUE emphasizes locally dependent information, captures interaction behavior beyond what can be evaluated by correlations, and properly reduces bias in regions where variables do not affect the response.2024-11-13T17:59:44Z22 pages, 18 figuresKelvyn K. BladenAdele CutlerD. Richard CutlerKevin R. Moonhttp://arxiv.org/abs/2505.09828v2Optimally balancing exploration and exploitation to automate multi-fidelity statistical estimation2026-03-10T19:03:43ZMulti-fidelity methods that use an ensemble of models to compute a Monte Carlo estimator of the expectation of a high-fidelity model can significantly reduce computational costs compared to single-model approaches. These methods use oracle statistics, specifically the covariance between models, to optimally allocate samples to each model in the ensemble. 
However, in practice, the oracle statistics are estimated using additional model evaluations, whose computational cost and induced error are typically ignored. To address this issue, this paper proposes an adaptive algorithm to optimally balance the resources between oracle statistics estimation and final multi-fidelity estimator construction, leveraging ideas from multilevel best linear unbiased estimators in Schaden and Ullmann (2020) and a bandit-learning procedure in Xu et al. (2022). Under mild assumptions, we demonstrate that the multi-fidelity estimator produced by the proposed algorithm exhibits mean-squared error commensurate with that of the best linear unbiased estimator under the optimal allocation computed with oracle statistics. Our theoretical findings are supported by detailed numerical experiments, including a parametric elliptic PDE and an ice-sheet mass-change modeling problem.2025-05-14T22:15:32Z40 pagesThomas DixonAlex GorodetskyJohn JakemanAkil NarayanYiming Xuhttp://arxiv.org/abs/2506.09762v2Parallel computations for Metropolis Markov chains with Picard maps2026-03-10T07:41:55ZWe develop parallel algorithms for simulating zeroth-order (aka gradient-free) Metropolis Markov chains based on the Picard map. For Random Walk Metropolis Markov chains targeting log-concave distributions $π$ on $\mathbb{R}^d$, our algorithm generates samples close to $π$ in $\mathcal{O}(\sqrt{d})$ parallel iterations with $\mathcal{O}(\sqrt{d})$ processors, therefore speeding up the convergence of the corresponding sequential implementation by a factor $\sqrt{d}$. Furthermore, a modification of our algorithm generates samples from an approximate measure $π_r$ in $\mathcal{O}(1)$ parallel iterations and $\mathcal{O}(d)$ processors. We empirically assess the performance of the proposed algorithms in high-dimensional regression problems, an epidemic model where the gradient is unavailable and a real-world application in precision medicine. 
Our algorithms are straightforward to implement and may constitute a useful tool for practitioners seeking to sample from a prescribed distribution $π$ using only point-wise evaluations of $\logπ$ and parallel computing.2025-06-11T14:03:55Z37 pages, 9 figuresSebastiano GrazziGiacomo Zanellahttp://arxiv.org/abs/2601.05355v2An AI-powered Bayesian Generative Modeling Approach for Arbitrary Conditional Inference2026-03-10T04:22:45ZModern data analysis increasingly requires flexible conditional inference P(X_B | X_A) where (X_A, X_B) is an arbitrary partition of observed variable X. Existing approaches are either restricted to a fixed conditioning structure or depend strongly on the distribution of conditioning masks during training. To address these limitations, we introduce Bayesian generative modeling (BGM), a unified framework for arbitrary conditional inference. BGM learns a generative model of X via a stochastic iterative Bayesian updating algorithm in which model parameters and latent variables are updated until convergence. Once trained, any conditional distribution can be obtained without retraining. Empirically, BGM achieves superior predictive performance with posterior predictive intervals, demonstrating that a single learned model can serve as a universal engine for conditional prediction with principled uncertainty quantification. We provide theoretical guarantees for convergence of the stochastic iterative algorithm, statistical consistency, and conditional risk bounds. The proposed BGM framework leverages modern AI to capture complex relationships among variables while adhering to Bayesian principles, offering a promising approach for a wide range of applications in modern data science. Code for BGM is available at https://github.com/liuq-lab/bayesgm. 
Documentation for BGM is available at https://bayesgm.readthedocs.io.2026-01-08T20:14:30ZQiao LiuWing Hung Wonghttp://arxiv.org/abs/2603.09089v1Sampling on Discrete Spaces with Temporal Point Processes2026-03-10T01:58:49ZTemporal point processes offer a powerful framework for sampling from discrete distributions, yet they remain underutilized in existing literature. We show how to construct, for any target multivariate count distribution with downward-closed support, a multivariate temporal point process whose event-count vector in a fixed-length sliding window converges in distribution to the target as time tends to infinity. Structured as a system of potentially coupled infinite-server queues with deterministic service times, the sampler exhibits a discrete form of momentum that suppresses random-walk behaviour. The admissible families of processes permit both reversible and non-reversible dynamics. As an application, we derive a recurrent stochastic neural network whose dynamics implement sampling-based computation and exhibit some biologically plausible features, including relative refractory periods and oscillatory dynamics. The introduction of auxiliary randomness reduces the sampler to a birth-death process, establishing the latter as a degenerate case with the same limiting distribution. In simulations on 63 target distributions, our sampler always outperforms these birth-death processes and frequently outperforms Zanella processes in multivariate effective sample size, with further gains when normalized by CPU time.2026-03-10T01:58:49Z20 pages, 1 figureCameron A. 
StewartGatsby Computational Neuroscience Unit, University College London, London, U.KManeesh SahaniGatsby Computational Neuroscience Unit, University College London, London, U.Khttp://arxiv.org/abs/2603.08676v1Momentum SVGD-EM for Accelerated Maximum Marginal Likelihood Estimation2026-03-09T17:47:36ZMaximum marginal likelihood estimation (MMLE) can be formulated as the optimization of a free energy functional. From this viewpoint, the Expectation-Maximisation (EM) algorithm admits a natural interpretation as a coordinate descent method over the joint space of model parameters and probability measures. Recently, a significant body of work has adopted this perspective, leading to interacting particle algorithms for MMLE. In this paper, we propose an accelerated version of one such procedure, based on Stein variational gradient descent (SVGD), by introducing Nesterov acceleration in both the parameter updates and in the space of probability measures. The resulting method, termed Momentum SVGD-EM, consistently accelerates convergence in terms of required iterations across various tasks of increasing difficulty, demonstrating effectiveness in both low- and high-dimensional settings.2026-03-09T17:47:36ZAccepted to AISTATS 2026Adam RozzioRafael AthanasiadesO. Deniz Akyildizhttp://arxiv.org/abs/2405.08290v3MCMC using $\textit{bouncy}$ Hamiltonian dynamics: A unifying framework for Hamiltonian Monte Carlo and piecewise deterministic Markov process samplers2026-03-09T16:42:11ZPiecewise-deterministic Markov process (PDMP) samplers constitute a state-of-the-art Markov chain Monte Carlo paradigm in Bayesian computation, with examples including the zig-zag and bouncy particle sampler (bps). Recent work on the zig-zag has indicated its connection to Hamiltonian Monte Carlo (HMC), a version of the Metropolis algorithm that exploits Hamiltonian dynamics. Here we establish that, in fact, the connection between the two paradigms extends far beyond the specific instance. 
The key lies in (1) the fact that any time-reversible deterministic dynamics provides a valid Metropolis proposal and (2) how PDMPs' characteristic velocity changes constitute an alternative to the usual acceptance-rejection. We turn this observation into a rigorous framework for constructing rejection-free Metropolis proposals based on bouncy Hamiltonian dynamics which simultaneously possess Hamiltonian-like properties and generate discontinuous trajectories similar in appearance to PDMPs. When combined with periodic refreshment of the inertia, the dynamics converge strongly to PDMP equivalents in the limit of increasingly frequent refreshment. We demonstrate the practical implications of this new framework with a sampler based on a bouncy Hamiltonian dynamics closely related to the bps. The resulting sampler exhibits competitive performance on challenging real-data posteriors involving tens of thousands of parameters. As the sampler of choice in modern probabilistic programming languages, HMC plays a critical role in applied Bayesian modeling; by generalizing the paradigm and elucidating its connection to the leading competitor, our framework opens up opportunities for cross-pollination and innovation to further scale Bayesian inference.2024-05-14T03:13:55ZAndrew ChinAkihiko Nishimurahttp://arxiv.org/abs/2409.09787v5BNEM: A Boltzmann Sampler Based on Bootstrapped Noised Energy Matching2026-03-09T15:51:28ZDeveloping an efficient sampler capable of generating independent and identically distributed (IID) samples from a Boltzmann distribution is a crucial challenge in scientific research, e.g. molecular dynamics. In this work, we intend to learn neural samplers given energy functions instead of data sampled from the Boltzmann distribution. By learning the energies of the noised data, we propose a diffusion-based sampler, Noised Energy Matching, which theoretically has lower variance and more complexity compared to related works. 
Furthermore, a novel bootstrapping technique is applied to NEM to balance between bias and variance. We evaluate NEM and BNEM on a 2-dimensional 40 Gaussian Mixture Model (GMM) and a 4-particle double-well potential (DW-4). The experimental results demonstrate that BNEM can achieve state-of-the-art performance while being more robust.2024-09-15T16:41:30ZCamera-ready version for TMLR (03/2026)Transactions on Machine Learning Research (TMLR), 2026RuiKang OuYangBo QiangJosé Miguel Hernández-Lobato