https://arxiv.org/api/niSOIGhIUC4mfAon0DGMhSY7x98
Updated: 2026-03-21T05:37:40Z

http://arxiv.org/abs/2602.14280v1
Fast Compute for ML Optimization
Updated: 2026-02-15T19:09:58Z
We study optimization for losses that admit a variance-mean scale-mixture representation. Under this representation, each EM iteration is a weighted least squares update in which latent variables determine observation and parameter weights; these play roles analogous to Adam's second-moment scaling and AdamW's weight decay, but are derived from the model. The resulting Scale Mixture EM (SM-EM) algorithm removes user-specified learning-rate and momentum schedules. On synthetic ill-conditioned logistic regression benchmarks with $p \in \{20, \ldots, 500\}$, SM-EM with Nesterov acceleration attains up to $13\times$ lower final loss than Adam tuned by learning-rate grid search. For a 40-point regularization path, sharing sufficient statistics across penalty values yields a $10\times$ runtime reduction relative to the same tuned-Adam protocol. For the base (non-accelerated) algorithm, EM monotonicity guarantees nonincreasing objective values; adding Nesterov extrapolation trades this guarantee for faster empirical convergence.
Published: 2026-02-15T19:09:58Z
Authors: Nick Polson, Vadim Sokolov

http://arxiv.org/abs/2511.19628v2
Optimization and Regularization Under Arbitrary Objectives
Updated: 2026-02-15T13:56:11Z
This study investigates the limitations of applying Markov Chain Monte Carlo (MCMC) methods to arbitrary objective functions, focusing on a two-block MCMC framework which alternates between Metropolis-Hastings and Gibbs sampling. While such approaches are often considered advantageous for enabling data-driven regularization, we show that their performance critically depends on the sharpness of the employed likelihood form.
By introducing a sharpness parameter and exploring alternative likelihood formulations proportional to the target objective function, we demonstrate how likelihood curvature governs both in-sample performance and the degree of regularization inferred by the training data. Empirical applications are conducted on reinforcement learning tasks, including a navigation problem and the game of tic-tac-toe. The study concludes with a separate analysis examining the implications of extreme likelihood sharpness on arbitrary objective functions stemming from the classic game of blackjack, where the first block of the two-block MCMC framework is replaced with an iterative optimization step. The resulting hybrid approach achieves performance nearly identical to the original MCMC framework, indicating that excessive likelihood sharpness effectively collapses posterior mass onto a single dominant mode.
Published: 2025-11-24T19:03:43Z
Comment: 74 pages, 29 figures, 16 tables
Authors: Jared N. Lakhani, Etienne Pienaar

http://arxiv.org/abs/2602.14061v1
MPL-HMC: A Tunable Parameterized Leapfrog Framework for Robust Hamiltonian Monte Carlo
Updated: 2026-02-15T09:20:51Z
This article introduces the Modified Parameterized Leapfrog Hamiltonian Monte Carlo (MPL-HMC) method, a novel extension of HMC addressing key limitations through tunable integration parameters $α(δt)$ and $β(δt)$, enabling controlled perturbations to Hamiltonian dynamics. Theoretical analysis demonstrates MPL-HMC maintains approximate detailed balance. Extensive empirical evaluation reveals systematic performance improvements. The damping variant ($α_2=-0.1$, $β_2=-0.05$) achieves a 14-fold increase in effective sample size for Neal's funnel and 27\% better efficiency for pharmacokinetic models. The anti-damping variant ($α_2=0.1$, $β_2=0.05$) achieves $\hat{R}=1.026$ for Bayesian neural networks versus $\hat{R}=1.981$ for standard HMC.
We introduce aggressive MPL-HMC for multimodal distributions, employing extreme parameters ($α_2=8.0$--$15.0$, $β_2=5.0$--$8.0$) with enhanced sampling to achieve full mode exploration where standard methods fail. All variants maintain computational efficiency identical to standard HMC while providing systematic control over damping, exploration, stability, and accuracy. The article provides rigorous mathematical foundations, implementation specifications, parameter tuning strategies, and comprehensive performance comparisons, extending HMC's applicability to previously challenging domains.
Published: 2026-02-15T09:20:51Z
Comment: Feedback welcome
Authors: Sourabh Bhattacharya

http://arxiv.org/abs/2510.25154v2
TabMGP: Martingale posterior with TabPFN
Updated: 2026-02-15T07:32:35Z
Bayesian inference provides principled uncertainty quantification but is often limited by challenges of prior and likelihood elicitation. The martingale posterior (MGP) (Fong et al., 2023) offers an alternative by replacing these requirements with a predictive rule. Additionally, MGP focuses inference on parameters defined through a loss function. This framework is especially resonant in the era of foundation transformers; practitioners increasingly leverage models like TabPFN for their state-of-the-art capabilities, yet often require epistemic uncertainty for a scientific estimand $θ$ that need not parameterise the model's implicit latent model. The MGP provides the mechanism to recover these posterior distributions. We introduce TabMGP, an MGP built on TabPFN for tabular data. TabMGP produces credible sets with near-nominal coverage and often outperforms both handcrafted MGP constructions and standard Bayesian baselines.
Published: 2025-10-29T04:12:33Z
Comment: 11 pages (+3 reference, +22 appendix). Extra plots in https://drive.google.com/drive/folders/1ct_effOoTEGpiWUf0_1xI3VqLWHtJY16 . Code in https://github.com/weiyaw/tabmgp
Authors: Kenyon Ng, Edwin Fong, David T. Frazier, Jeremias Knoblauch, Susan Wei

http://arxiv.org/abs/2602.13888v1
Mixture-of-experts Wishart model for covariance matrices with an application to Cancer drug screening
Updated: 2026-02-14T21:07:14Z
Covariance matrices arise naturally in different scientific fields, including finance, genomics, and neuroscience, where they encode dependence structures and reveal essential features of complex multivariate systems. In this work, we introduce a comprehensive Bayesian framework for analyzing heterogeneous covariance data through both classical mixture models and a novel mixture-of-experts Wishart (MoE-Wishart) model. The proposed MoE-Wishart model extends standard Wishart mixtures by allowing mixture weights to depend on predictors through a multinomial logistic gating network. This formulation enables the model to capture complex, nonlinear heterogeneity in covariance structures and to adapt subpopulation membership probabilities to covariate-dependent patterns. To perform inference, we develop an efficient Gibbs-within-Metropolis-Hastings sampling algorithm tailored to the geometry of the Wishart likelihood and the gating network. We additionally derive an Expectation-Maximization algorithm for maximum likelihood estimation in the mixture-of-experts setting. Extensive simulation studies demonstrate that the proposed Bayesian and maximum likelihood estimators achieve accurate subpopulation recovery and estimation under a range of heterogeneous covariance scenarios. Finally, we present an innovative application of our methodology to a challenging dataset: cancer drug sensitivity profiles, illustrating the ability of the MoE-Wishart model to leverage covariance across drug dosages and replicate measurements.
Our methods are implemented in the \texttt{R} package \texttt{moewishart} available at https://github.com/zhizuio/moewishart .
Published: 2026-02-14T21:07:14Z
Authors: The Tien Mai, Zhi Zhao

http://arxiv.org/abs/2409.15307v2
An ILUES-based adaptive Gaussian process method for multimodal Bayesian inverse problems
Updated: 2026-02-14T11:46:30Z
Inverse problems are prevalent in both scientific research and engineering applications. In the context of Bayesian inverse problems, sampling from the posterior distribution can be particularly challenging when the forward models are computationally expensive. This challenge is further compounded when the posterior distribution is multimodal. To address this issue, we propose a Gaussian process (GP)-based method to indirectly build surrogates for the forward model. Specifically, the unnormalized posterior density is expressed as a product of an auxiliary density and an exponential GP surrogate. Iteratively, the auxiliary density converges to the posterior distribution, starting from an arbitrary initial density. However, the efficiency of GP regression is highly influenced by the quality of the training data. Therefore, we utilize the iterative local updating ensemble smoother (ILUES) to generate high-quality samples that are concentrated in regions with high posterior probability. Subsequently, based on the surrogate model and mode information extracted using a clustering method, Markov chain Monte Carlo (MCMC) with a Gaussian mixed (GM) proposal is used to draw samples from the auxiliary density. Through numerical examples, we demonstrate that the proposed method can accurately and efficiently represent the posterior with a limited number of forward simulations.
Published: 2024-09-05T02:38:10Z
Authors: Zhihang Xu, Xiaoyu Zhu, Daoji Li, Qifeng Liao

http://arxiv.org/abs/2505.21723v2
Are Statistical Methods Obsolete in the Era of Deep Learning? A Study of ODE Inverse Problems
Updated: 2026-02-14T01:51:01Z
In the era of AI, neural networks have become increasingly popular for modeling, inference, and prediction, largely due to their potential for universal approximation. With the proliferation of such deep learning models, a question arises: are leaner statistical methods still relevant? To shed insight on this question, we employ the mechanistic nonlinear ordinary differential equation (ODE) inverse problem as a testbed, using the physics-informed neural network (PINN) as a representative of the deep learning paradigm and manifold-constrained Gaussian process inference (MAGI) as a representative of statistically principled methods. Through case studies involving the SEIR model from epidemiology and the Lorenz model from chaotic dynamics, we demonstrate that statistical methods are far from obsolete, especially when working with sparse and noisy observations. On tasks such as parameter inference and trajectory reconstruction, statistically principled methods consistently achieve lower bias and variance, while using far fewer parameters and requiring less hyperparameter tuning. Statistical methods can also decisively outperform deep learning models on out-of-sample future prediction, where the absence of relevant data often leads overparameterized models astray. Additionally, we find that statistically principled approaches are more robust to accumulation of numerical imprecision and can represent the underlying system more faithfully to the true governing ODEs.
Published: 2025-05-27T20:11:21Z
Comment: 35 pages, 11 figures (main text)
Authors: Skyler Wu, Shihao Yang, S. C. Kou

http://arxiv.org/abs/2506.20523v4
Anytime-Valid Inference in Adaptive Experiments: Covariate Adjustment and Balanced Power
Updated: 2026-02-13T18:44:01Z
Adaptive experiments such as multi-armed bandits offer efficiency gains over traditional randomized experiments but pose two major challenges: invalid inference on the Average Treatment Effect (ATE) due to adaptive sampling and low statistical power for sub-optimal treatments. We address both issues by extending the Mixture Adaptive Design framework (arXiv:2311.05794). First, we propose MADCovar, a covariate-adjusted ATE estimator that is unbiased and preserves anytime-valid inference guarantees while substantially improving ATE precision. Second, we introduce MADMod, which dynamically reallocates samples to underpowered arms, enabling more balanced statistical power across treatments without sacrificing valid inference. Both methods retain MAD's core advantage of constructing asymptotic confidence sequences (CSs) that allow researchers to continuously monitor ATE estimates and stop data collection once a desired precision or significance criterion is met. Empirically, we validate both methods using simulations and real-world data. In simulations, MADCovar reduces CS width by up to 60% relative to MAD. In a large-scale political RCT with 32,000 participants, MADCovar achieves similar precision gains. MADMod improves statistical power and inferential precision across all treatment arms, particularly for suboptimal treatments. Simulations show that MADMod sharply reduces Type II error while preserving the efficiency benefits of adaptive allocation. Together, MADCovar and MADMod make adaptive experiments more practical, reliable, and efficient for applied researchers across many domains.
Our proposed methods are implemented through an open-source software package.
Published: 2025-06-25T15:09:03Z
Comment: 14 pages, 5 figures
Authors: Daniel Molitor, Samantha Gold

http://arxiv.org/abs/2102.03411v2
Cosine Series Representation
Updated: 2026-02-13T16:16:13Z
We present a functional data analysis (FDA) framework based on explicit orthonormal basis expansion for modeling and denoising complex biomedical signals. Observed functional data are represented as smooth functions in a Hilbert space, and statistical inference is performed directly on their basis coefficients. This formulation provides a transparent and flexible approach to smoothing, regularization, and hypothesis testing. Applications to diffusion tensor imaging tract modeling and EEG denoising demonstrate the advantages of explicit basis representations for scalable and interpretable functional modeling.
Published: 2021-02-05T20:22:12Z
Authors: Moo K. Chung

http://arxiv.org/abs/2506.04166v2
N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion
Updated: 2026-02-13T16:04:52Z
Nearest neighbor (NN) methods have re-emerged as competitive tools for matrix completion, offering strong empirical performance and recent theoretical guarantees, including entry-wise error bounds, confidence intervals, and minimax optimality. Despite their simplicity, recent work has shown that NN approaches are robust to a range of missingness patterns and effective across diverse applications. This paper introduces N$^2$, a unified Python package and testbed that consolidates a broad class of NN-based methods through a modular, extensible interface. Built for both researchers and practitioners, N$^2$ supports rapid experimentation and benchmarking. Using this framework, we introduce a new NN variant that achieves state-of-the-art results in several settings.
We also release a benchmark suite of real-world datasets, from healthcare and recommender systems to causal inference and LLM evaluation, designed to stress-test matrix completion methods beyond synthetic scenarios. Our experiments demonstrate that while classical methods excel on idealized data, NN-based techniques consistently outperform them in real-world settings.
Published: 2025-06-04T17:04:34Z
Comment: 21 pages, 6 figures
Authors: Caleb Chin, Aashish Khubchandani, Harshvardhan Maskara, Kyuseong Choi, Jacob Feitelberg, Albert Gong, Manit Paul, Tathagata Sadhukhan, Anish Agarwal, Raaz Dwivedi

http://arxiv.org/abs/2504.11761v3
Delayed Acceptance Markov Chain Monte Carlo for Robust Bayesian Analysis
Updated: 2026-02-13T06:27:41Z
This study introduces a computationally efficient algorithm, delayed acceptance Markov chain Monte Carlo (DA-MCMC), designed to improve posterior simulation in quasi-Bayesian inference. Quasi-Bayesian methods, which do not require fully specifying a probabilistic model, are often computationally expensive owing to the need to evaluate the inverse and determinant of large covariance matrices. DA-MCMC addresses this challenge by employing a two-stage process: In the first stage, proposals are screened using an approximate posterior, whereas a final acceptance or rejection decision is made in the second stage based on the exact target posterior. This reduces the need for costly matrix computations, thereby improving efficiency without sacrificing accuracy. We demonstrate the effectiveness of DA-MCMC through applications to both synthetic and real data. The results demonstrate that, although DA-MCMC slightly reduces the effective sample size per iteration compared with the standard MCMC, it achieves substantial improvement in terms of effective sample size per second, approximately doubling the efficiency. This makes DA-MCMC particularly useful for cases where posterior simulation is computationally intensive.
Thus, the DA-MCMC algorithm offers a significant advancement in computational efficiency for quasi-Bayesian inference, making it a valuable tool for robust Bayesian analysis.
Published: 2025-04-16T04:40:17Z
Comment: Accepted for publication in Springer Proceedings in Mathematics and Statistics (2025 8th International Conference on Mathematics and Statistics)
Authors: Masahiro Tanaka

http://arxiv.org/abs/2504.16585v2
Leveraging Noisy Manual Labels as Useful Information: An Information Fusion Approach for Enhanced Variable Selection in Penalized Logistic Regression
Updated: 2026-02-13T03:27:32Z
In large-scale supervised learning, penalized logistic regression (PLR) effectively mitigates overfitting through regularization, yet its performance critically depends on robust variable selection. This paper demonstrates that label noise introduced during manual annotation, often dismissed as a mere artifact, can serve as a valuable source of information to enhance variable selection in PLR. We theoretically show that such noise, intrinsically linked to classification difficulty, helps refine the estimation of non-zero coefficients compared to using only ground truth labels, effectively turning a common imperfection into a useful information resource. To efficiently leverage this form of information fusion in large-scale settings where data cannot be stored on a single machine, we propose a novel partition insensitive parallel algorithm based on the alternating direction method of multipliers (ADMM). Our method ensures that the solution remains invariant to how data is distributed across workers, a key property for reproducible and stable distributed learning, while guaranteeing global convergence at a sublinear rate.
Extensive experiments on multiple large-scale datasets show that the proposed approach consistently outperforms conventional variable selection techniques in both estimation accuracy and classification performance, affirming the value of intentionally fusing noisy manual labels into the learning process.
Published: 2025-04-23T10:05:54Z
Authors: Xiaofei Wu, Rongmei Liangse

http://arxiv.org/abs/2602.12435v1
Scalable Changepoint Detection for Large Spatiotemporal Data on the Sphere
Updated: 2026-02-12T21:48:27Z
We propose a novel Bayesian framework for changepoint detection in large-scale spherical spatiotemporal data, with broad applicability in environmental and climate sciences. Our approach models changepoints as spatially dependent categorical variables using a multinomial probit model (MPM) with a latent Gaussian process, effectively capturing complex spatial correlation structures on the sphere. To handle the high dimensionality inherent in global datasets, we leverage stochastic partial differential equations (SPDE) and spherical harmonic transformations for efficient representation and scalable inference, drastically reducing computational burden while maintaining high accuracy. Through extensive simulation studies, we demonstrate the efficiency and robustness of the proposed method for changepoint estimation, as well as the significant computational gains achieved through the combined use of the MPM and truncated spectral representations of latent processes. Finally, we apply our method to global aerosol optical depth data, successfully identifying changepoints associated with a major atmospheric event.
Published: 2026-02-12T21:48:27Z
Authors: Samantha Shi-Jun, Bo Li

http://arxiv.org/abs/2602.05716v2
MixMashNet: An R Package for Single and Multilayer Networks
Updated: 2026-02-12T10:39:04Z
The R package MixMashNet provides an integrated framework for estimating and analyzing single and multilayer networks using Mixed Graphical Models (MGMs), accommodating continuous, count, and categorical variables.
In the multilayer setting, layers may comprise different types and numbers of variables, and users can explicitly impose a predefined multilayer topology. Bootstrap procedures are implemented to quantify sampling uncertainty for edge weights and node-level centrality indices. In addition, the package includes tools to assess the stability of node community membership and to compute community scores that summarize the latent dimensions identified through network clustering. MixMashNet also offers interactive Shiny applications to support exploration, visualization, and interpretation of the estimated networks.
Published: 2026-02-05T14:39:49Z
Authors: Maria De Martino, Federico Triolo, Adrien Perigord, Alice Margherita Ornago, Davide Liborio Vetrano, Caterina Gregorio

http://arxiv.org/abs/2506.18846v2
Bayesian decomposition using Besov priors
Updated: 2026-02-12T10:35:07Z
In many inverse problems, the unknown is composed of multiple components with different regularities, for example, in imaging problems, where the unknown can have both rough and smooth features. We investigate linear Bayesian inverse problems, where the unknown consists of two components: one smooth and one piecewise constant. We model the unknown as a sum of two components and assign individual priors on each component to impose the assumed behavior. We propose and compare two prior models: (i) a combination of a Haar wavelet-based Besov prior and a smoothing Besov prior, and (ii) a hierarchical Gaussian prior on the gradient coupled with a smoothing Besov prior. To achieve a balanced reconstruction, we place hyperpriors on the prior parameters and jointly infer both the components and the hyperparameters. We propose Gibbs sampling schemes for posterior inference in both prior models. We demonstrate the capabilities of our approach on 1D and 2D deconvolution problems, where the unknown consists of smooth parts with jumps.
The numerical results indicate that our methods improve the reconstruction quality compared to single-prior approaches and that the prior parameters can be successfully estimated to yield a balanced decomposition.
Published: 2025-06-23T17:07:04Z
Comment: 28 pages, 13 figures, this is a preprint of an article submitted to the journal of Applied Numerical Mathematics
Authors: Andreas Horst, Babak Maboudi Afkham, Yiqiu Dong, Jakob Lemvig