https://arxiv.org/api/ZzdB8mTxrIAP05bQg6gPvcK1e/8
2026-06-15T07:11:39Z
7837961515

http://arxiv.org/abs/2606.00442v1
Exploiting weight-space symmetries for approximating curvature
2026-05-30T00:17:36Z
Many machine learning techniques rely on approximating a loss function's curvature, but this is notoriously hard to do at the scale of modern deep networks. Surprisingly, no previous work has exploited the curvature constraints that arise from well known weight-space symmetries in loss landscapes. By analytically averaging over group actions that leave the loss invariant, we construct structured Hessian approximations from single gradients that can be tractably estimated, stored, and inverted. The choice of user-specified symmetry group directly governs the trade-off between approximation accuracy and computational cost. Moreover, our framework provides a unifying theoretical lens for viewing existing methods; in particular, a specific choice of symmetry group recovers Shampoo/Muon-like curvature estimates. We validate our method on a range of network architectures, and deploy it to second-order optimization benchmarks, including a small language model. Our curvature estimation framework might find applications in other machine learning problems such as uncertainty estimation, continual learning, compression/pruning, training data attribution, and more.
2026-05-30T00:17:36Z
Published at ICML 2026. 35 pages, 11 figures. Code: https://github.com/mtkresearch/symm_opt
Artem Artemev, Rui Xia, Benjamin M. Boyd, Youjing Yu, Felix Dangel, Guillaume Hennequin, Alberto Bernacchia

http://arxiv.org/abs/2601.21959v2
Near-Optimal Private Tests for Simple and MLR Hypotheses
2026-05-30T00:15:04Z
We develop a near-optimal testing procedure under the framework of Gaussian differential privacy for simple as well as one- and two-sided tests under monotone likelihood ratio conditions.
Our mechanism is based on a private mean estimator with data-driven clamping bounds, whose population risk matches the private minimax rate up to logarithmic factors. Using this estimator, we construct private test statistics that achieve the same asymptotic relative efficiency as the non-private, most powerful tests while maintaining conservative type I error control. In addition to our theoretical results, our numerical experiments show that our private tests outperform competing DP methods and offer comparable power to the non-private most powerful tests, even at moderately small sample sizes and privacy loss budgets.
2026-01-29T16:36:21Z
Yu-Wei Chen, Raghu Pasupathy, Jordan Awan

http://arxiv.org/abs/2606.00436v1
Weighted Conformal Clustering
2026-05-29T23:58:56Z
Clustering is a central tool for discovering latent structure in unlabeled data; yet modern clustering pipelines often end with a hard assignment of each observation to a cluster without rigorous measures of assignment uncertainty. We propose a novel weighted conformal approach for constructing valid confidence sets for cluster labels. The key difficulty is that the labels available for calibration are not observed ground-truth labels, but synthetic labels produced by a data-dependent clustering algorithm. Our method develops a conformal inference algorithm that corrects the resulting mismatch with the latent target labels through weights by formulating conformal clustering as a conditional label-distribution shift problem. We first derive an oracle procedure that attains finite-sample marginal coverage and then develop a computationally tractable and implementable version using estimated conditional label probabilities and novel augmented calibration. We show that the coverage of the estimated-weight procedure depends on the estimator, giving an explicit bound on the loss relative to the nominal level.
Empirical studies demonstrate that the proposed weighted approach offers improvements over the recently proposed split conformal clustering procedure in terms of informative confidence set size, especially in nonlinear and high-dimensional clustering applications.
2026-05-29T23:58:56Z
Anirban Nath, YoonHaeng Hur, Genevera I. Allen

http://arxiv.org/abs/2606.00425v1
Empirical Likelihood with Generative AI
2026-05-29T23:21:50Z
Moment conditions are widely used to identify parameters in models where the full likelihood is either unknown or intentionally left unspecified. Empirical likelihood methods address this problem by assigning probability weights to the observed data so that the sample moment conditions hold exactly. Building on this idea, we propose a nonparametric Bayesian framework based on exponentially tilted empirical likelihood. This Bayesian formulation is particularly appealing in settings where prior information is more naturally specified on the observables rather than on the underlying parameters. Such settings arise in the presence of auxiliary data sources or synthetic data generated by modern generative AI models. Inference proceeds by projecting posterior draws from a Dirichlet process onto the moment-restricted model, yielding a computationally efficient procedure that is naturally amenable to parallelization. We establish new Bernstein--von Mises and consistency theorems for the resulting projection posterior under both vanishing-prior and persistent-prior regimes.
In an application to return prediction using overnight news headlines, we show that AI-generated auxiliary data can provide a useful source of indirect regularization when informative priors on the parameter itself are unavailable.
2026-05-29T23:21:50Z
Jiguang Li, Sid Kankanala, Veronika Rockova

http://arxiv.org/abs/2606.00413v1
Riemannian Stochastic Optimization for Sufficient Dimension Reduction
2026-05-29T23:06:44Z
Sufficient dimension reduction (SDR) makes high-dimensional regression tractable by projecting the covariates onto a low-dimensional subspace that preserves the conditional mean of the response. Existing gradient-based estimators either operate in the ambient space and suffer from the curse of dimensionality, or localize in the reduced space at a per-outer-iteration cost at least quadratic in the sample size. We show that minimizers of the population Minimum Average Variance Estimation (MAVE) risk approximate the same Grassmannian target as the Outer Product of Gradients (OPG), and recast the empirical criterion as a smooth maximization on the Stiefel manifold with closed-form Riemannian gradient. The resulting algorithm, SMAVE, combines sparse projected-space nearest-neighbor localization with Riemannian stochastic gradient ascent. A simplified version comes with almost-sure convergence and a non-asymptotic rate matching the standard non-convex stochastic first-order scaling. Empirically, SMAVE matches or improves on RMAVE's synthetic subspace recovery at moderate-to-high ambient dimension, and on four real datasets it uniformly improves over OPG and is competitive with or outperforms RMAVE at orders of magnitude lower runtime.
2026-05-29T23:06:44Z
Thibault Pautrel, François Portier

http://arxiv.org/abs/2602.07218v2
Collaborative and Efficient Fine-tuning: Leveraging Task Similarity
2026-05-29T20:42:03Z
Adaptability has been regarded as a central feature of foundation models, enabling them to effectively acclimate to unseen downstream tasks.
Parameter-efficient fine-tuning methods such as the celebrated LoRA facilitate efficient adaptation of large foundation models using labeled, high-quality and generally scarce task data. To mitigate data scarcity in fine-tuning of foundation models, we propose to leverage task similarity across multiple downstream users. Intuitively, users with similar tasks must be able to assist each other in boosting the effective fine-tuning data size. We propose Collaborative Low-Rank Adaptation, or CoLoRA, which exploits task similarity to collaboratively and efficiently fine-tune personalized foundation models. The main idea in CoLoRA is to train one shared adapter capturing underlying task similarities across all tasks, and personalized adapters tailored to user-specific tasks. We theoretically study CoLoRA on heterogeneous linear regression and provide provable guarantees for ground truth recovery. We also conduct several natural language experiments with varying task similarity, which further demonstrate that when trained together with similar tasks, individual performances are significantly boosted.
2026-02-06T21:59:40Z
Gagik Magakyan, Amirhossein Reisizadeh, Chanwoo Park, Pablo A. Parrilo, Asuman Ozdaglar

http://arxiv.org/abs/2603.01157v2
Adaptive Window Selection for Financial Risk Forecasting
2026-05-29T20:34:17Z
Risk forecasts in financial regulation and internal management are calculated through historical data. The unknown structural changes of financial data pose a substantial challenge in selecting an appropriate look-back window for risk modeling and forecasting. We develop a data-driven online learning method, called the bootstrap-based adaptive window selection (BAWS), that adaptively determines the window size in a sequential manner. A central component of BAWS is to compare the realized scores against a data-dependent threshold based on the bootstrap method.
We provide an asymptotic justification for the bootstrap threshold, covering non-smooth scores such as the VaR check loss and the joint VaR--ES score, with an extension to stationary weakly dependent data via the moving block bootstrap. A single-break analysis further shows that BAWS rejects overlong windows crossing sufficiently large breaks. The proposed method is applicable to the forecasting of risk measures that are elicitable individually or jointly, such as the Value-at-Risk (VaR) and the pair of VaR and the corresponding Expected Shortfall. Through simulation studies and an empirical analysis, we demonstrate that BAWS often improves upon the standard rolling window approach and the recently developed method of stability-based adaptive window selection, especially when there are structural changes in the data-generating process.
2026-03-01T15:42:52Z
Yinhuan Li, Chenxin Lyu, Ruodu Wang

http://arxiv.org/abs/2606.00343v1
Polar Depth for Potentially Heavy-Tailed Data
2026-05-29T20:32:56Z
Motivated by the analysis of the behaviour of extremes from multivariate heavy-tailed distributions, we introduce a novel notion of statistical depth, referred to as Polar Depth. The polar depth function is naturally expressed in polar coordinates, as is the limiting distribution of a regularly varying random variable, beyond asymptotically large thresholds, once its marginals have been appropriately normalized. Not only does the polar depth function make it easy to order the extreme values taken by a heavy-tailed random variable X and find natural applications in anomaly detection, but it is also possible to show, as we prove under appropriate assumptions in this article, that the polar depth of the largest observations, i.e. observations X whose norm is larger than t>0, converges to the polar depth of the limiting distribution as t converges to infinity.
Although designed to quantify the depth of multivariate extremes, the polar depth is interesting in its own right, insofar as this notion is more relevant for distributions whose support is included in a halfspace than the alternatives proposed in the literature, the halfspace depth in particular. Here, we demonstrate its properties and analyze statistical issues related to its estimation from both finite-sample and asymptotic points of view. We present numerical results to empirically demonstrate its relevance, particularly for the statistical analysis of extreme observations and more specifically for the identification of anomalies among them.
2026-05-29T20:32:56Z
Stephan Clemençon, Carlos Fernándes, Pavlo Mozharovskyi, Anne Sabourin

http://arxiv.org/abs/2506.21278v3
Hyperspherical Variational Autoencoders Using Efficient Spherical Cauchy Distribution
2026-05-29T20:26:44Z
We propose spherical Cauchy (spCauchy) latent variables for variational autoencoders on hyperspherical latent spaces. The spCauchy family has heavy-tailed global behavior and admits an exact differentiable reparameterization by applying a Möbius transformation to uniform samples on the sphere. We show that, in the high-concentration limit, spCauchy recovers the local tangent-space geometry of the von Mises-Fisher (vMF) distribution under an explicit concentration parameter mapping, while avoiding the high-order Bessel-function evaluations required by vMF implementations. For training, the Kullback-Leibler divergence to a uniform spherical prior admits rapidly convergent series, stable quadrature, and high-concentration asymptotic forms. We further establish monotonicity of the concentration-dependent KL core and derive analytic brackets with closed-form surrogates and error control, supporting stable approximation in extreme regimes. Stress-test benchmarks show that the resulting latent-layer objective remains stable and faster to evaluate than vMF baselines on CPU and GPU.
Experiments on image and molecular sequence data demonstrate that spCauchy-VAEs provide a robust and scalable alternative for generative modeling with hyperspherical latent representations.
2025-06-26T14:01:51Z
Lukas Sablica, Kurt Hornik

http://arxiv.org/abs/2510.06028v3
Generalization of Gibbs and Langevin Monte Carlo Algorithms in the Interpolation Regime
2026-05-29T20:25:59Z
This paper provides data-dependent bounds on the expected error of the Gibbs algorithm in the overparameterized interpolation regime, where low training errors are also obtained for impossible data, such as random labels in classification. The results show that generalization in the low-temperature regime is already signaled by small training errors in the noisier high-temperature regime. The bounds are stable under approximation with Langevin Monte Carlo algorithms. The analysis motivates the design of an algorithm to compute bounds, which on the MNIST, CIFAR-10, and SVHN datasets yield nontrivial, close predictions on the test error for true labeled data, while maintaining a correct upper bound on the test error for random labels.
2025-10-07T15:25:56Z
Andreas Maurer, Erfan Mirzaei, Massimiliano Pontil

http://arxiv.org/abs/2606.00329v1
Benchmarking Recursive-Collapse Warning Claims Under Matched False-Positive Control
2026-05-29T20:12:42Z
Recursive systems can enter collapse-like regimes -- self-reinforcing amplification, persistent recursion, and narrowing diversity that mask accelerating internal degradation -- before overt failure becomes visible. We introduce Loopzero, a claim-bounded benchmark framework for testing whether recursive failures follow a directional telemetry pattern: rising gain (G), recursive persistence (p), and declining diversity ($δ$). The claim boundary is specified in Lean; the Lean artifact does not verify real telemetry, benchmark validity, or detector performance.
We evaluate the bridge on two frozen public-artifact benchmarks: a segmented public-markets benchmark (Volmageddon 2018, COVID MWCB 2020) and a MovieLens-25M offline deterministic recommender replay. Detectors are evaluated under a locked equal-false-positive contract (FP $\in$ [0.03, 0.07], pre-registered) so all configurations face the same alert budget. Neither tested standard comparators nor Loopzero's pre-registered quantile detector achieved an accepted operating point. Directional witness alignment held on both canonical benchmarks, with adjacent-horizon and row-level limitations disclosed. Digitized Shumailov et al. (2024) LLM training-loop trajectories are directionally consistent with the pattern; matched-FP evaluation in that domain is deferred.
The contribution is a reproducible, falsifiable benchmark framework for evaluating recursive-collapse warning claims under an explicit alert-budget contract -- non-acceptance reported as a first-class scientific outcome.
2026-05-29T20:12:42Z
29 pages, 7 figures, 2 tables; supplementary materials: 9 pages, 1 figure, 4 tables. Code, derived data packets, and Lean artifact: https://github.com/davidmullett/loopzero-paper-public (release tag lean-v1.0)
David Mullett

http://arxiv.org/abs/2606.00327v1
Cluster Analysis with Resampling for Validation and Exploration (CARVE)
2026-05-29T20:09:20Z
Clustering is widely used across the sciences as the foundation for downstream data-driven scientific discoveries. However, clustering results are highly sensitive to the choice of algorithm, preprocessing, and the number of clusters $k$, producing scientific claims that are often not reproducible. The current state of the art for validating clustering solutions consists of clustering validation indices (CVIs) such as Silhouette, Davies-Bouldin, and Calinski-Harabasz, which rely on geometric assumptions that break down on the heavy-tailed, high-dimensional, and nonlinearly structured data encountered in biomedical research. Resampling-based alternatives - grounded in the ideas of clustering stability and generalizability - have been proposed but remain scattered across specialized tools with no unified, accessible software. We fill this gap with CARVE (Cluster Analysis with Resampling for Validation and Exploration), an open-source Python and R package that jointly evaluates multiple clustering algorithms and hyperparameters, returning stability and generalizability diagnostics at the global, cluster, and sample level together with principled selection rules and consensus-based cluster labels. Across six synthetic benchmarks CARVE consistently recovers near-optimal clusterings where classical indices degrade substantially.
On experimental genomics and proteomics data sets, CARVE recovers finer biological structure when classical CVIs collapse entirely. CARVE is available with a scikit-learn-compatible Python API and an analogous R interface compatible with Seurat workflows.
2026-05-29T20:09:20Z
Kai R. Wycik, Tiffany M. Tang, Tarek M. Zikry, Genevera I. Allen

http://arxiv.org/abs/2606.00322v1
Perturbative methods for non-parametric instrumental variable
2026-05-29T19:56:26Z
We introduce a perturbative approach for nonparametric instrumental variable (NPIV) estimation. By drawing inspiration from perturbation theory in physics, we extend standard kernel ridge methods with systematic higher perturbation order corrections that significantly improve estimation accuracy. Spectrally, the perturbation introduces mixing between different eigenmodes of the expectation integral operator, which becomes especially useful when the integral equation is ill-defined. One source for such ill-definedness can be the curse of dimensionality. Our method performs across various dimensionality regimes, particularly when the dimensionality parameter $β$, which is defined through the number of samples $n$ and dimension $d$ as $n^β = d$, becomes large. Experimental results show that our first-order perturbative corrections can reduce prediction error by up to 99\% in high-dimensional ill-defined cases ($β > 0.7$) compared to standard ridge regression approaches. The performance improvement is maintained across a wide range of dimensions, with the advantage becoming more pronounced as dimensionality increases.
2026-05-29T19:56:26Z
8+24 pages, 4 figures, comments welcomed
Wei Bu, Arthur Gretton

http://arxiv.org/abs/2601.21696v2
Independent Component Discovery in Temporal Count Data
2026-05-29T19:37:42Z
Advances in data collection are producing growing volumes of temporal count observations, making adapted modeling increasingly necessary.
In this work, we introduce a generative framework for independent component analysis of temporal count data, combining regime-adaptive dynamics with Poisson log-normal emissions. The model identifies disentangled components with regime-dependent contributions, enabling representation learning and perturbation analysis. Notably, we establish the identifiability of the model, supporting principled interpretation. To learn the parameters, we propose an efficient amortized variational inference procedure. Experiments on simulated data evaluate recovery of the mixing function and latent sources across diverse settings, while real-world applications to gut microbiome and climate datasets reveal co-variation patterns and regime shifts consistent with domain-specific knowledge.
2026-01-29T13:30:10Z
9 pages, 7 figures, Appendix provided
Alexandre Chaussard, Anna Bonnet, Sylvain Le Corff

http://arxiv.org/abs/2606.00309v1
Large-scale Uncertainty Quantification for Latent Variable Models Using Subsampling Markov Chain Monte Carlo
2026-05-29T19:34:28Z
Stochastic gradient Langevin dynamics combined with Gibbs updates (SGLD--Gibbs) provides a highly scalable approach to approximate Bayesian inference in latent variable models. However, it remains unclear how to tune the algorithm's hyperparameters in a principled manner to ensure the uncertainty estimates are statistically meaningful. In this work, we address this gap in tuning guidance by developing a statistical scaling limit theory for SGLD--Gibbs. We derive a joint asymptotic limit for the global parameters and latent variables under appropriate space-time rescaling. We show that global parameters converge to a diffusion-type limit, while each latent variable converges to a jump process, reflecting the use of intermittent Gibbs updates. This joint jump-diffusion structure reveals how latent-variable randomness contributes to the stationary distribution of the global parameters.
We leverage our results to propose explicit guidance on hyperparameter tuning for SGLD--Gibbs that ensures meaningful uncertainty quantification. Numerical experiments show that SGLD--Gibbs with our tuning guidance leads to better parameter estimates, uncertainty quantification, and predictive performance than stochastic variational inference.
2026-05-29T19:34:28Z
Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026
Xiaoyu Wang, Jonathan H. Huggins