https://arxiv.org/api/5I3QRjbNyv7JF/BvsZk3kMWtJB02026-06-21T19:33:20Z7851176515http://arxiv.org/abs/2606.02101v1It does what it says on the tin: safe synthetic data from coarsened margins2026-06-01T11:32:10ZThis paper proposes a method of creating synthetic data (SD) that will have two important advantages for the user compared to other methods currently available. The first is transparency; unlike other methods, the person in receipt of the SD will know which of the relationships between variables in the original data will be approximately maintained in the SD. The second is a guarantee that the SD is derived from information that has already been judged to be free of disclosure risk. This is achieved by first defining and calculating the margins where relationships between variables will be maintained in the SD. Each margin will then be subject to statistical disclosure control (SDC) to the standards defined by the data custodian, e.g. top-coding and bottom-coding, combination of small categories and/or modifying small counts. Further adjustment of the curated margins is advised by coarsening all counts in the table to multiples of the disclosure limit. These adjusted margins are used to create SD by the Iterative Proportional Fitting (IPF) algorithm. The practical steps involved in creating such SD are illustrated using data from the 1901 Census of Scotland.2026-06-01T11:32:10ZGillian M Raabhttp://arxiv.org/abs/2510.17303v2Symmetries in PAC-Bayesian Learning2026-06-01T11:25:07ZSymmetries are known to improve the empirical performance of machine learning models, yet theoretical guarantees explaining these gains remain limited. Prior work has focused mainly on compact group symmetries and often assumes that the data distribution itself is invariant, an assumption rarely satisfied in real-world applications. In this work, we extend generalization guarantees to the broader setting of non-compact symmetries, such as translations and to non-invariant data distributions. Building on the PAC-Bayes framework, we adapt and tighten existing bounds, demonstrating the approach on McAllester's PAC-Bayes bound while showing that it applies to a wide range of PAC-Bayes bounds. We validate our theory with experiments on several datasets with non-uniform and non-compact transformations, where the derived guarantees not only hold but also improve upon prior results. These findings provide theoretical evidence that, for symmetric data, symmetric models are preferable beyond the narrow setting of compact groups and invariant distributions, opening the way to a more general understanding of symmetries in machine learning.2025-10-20T08:45:57ZArmin BeckPeter Ochshttp://arxiv.org/abs/2605.31200v2Beyond Additive Decompositions: Interpretability Through Separability2026-06-01T11:12:14ZInterpretable machine learning requires models that are accurate and structurally faithful to the data. Existing explainability methods rely heavily on additive representations (e.g., Generalized Additive Models (GAMs), SHapley Additive exPlanations (SHAP), functional ANOVA), which can suffer from signal cancellation and off-support extrapolation in the presence of strong interactions. We propose Tensor Separation Learning (TSL), a regression model that learns a sum of rank-1 products of univariate per-feature functions via a stagewise greedy procedure with orthogonal refitting. By enforcing separability, TSL avoids the information loss inherent in additive projections caused by marginalizing higher-order interactions. The learned TSL model can be fully reconstructed from first-order partial dependence functions, up to constant factors. This stage-wise correspondence ensures that the resulting visualizations are faithful to the fitted components. We establish approximation-rate guarantees for functions with bounded mixed $p$-th order partial derivatives and demonstrate that TSL competes with black-box models on regression benchmarks.2026-05-29T12:08:14ZTo appear in Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)Jinyang LiuMunir Eberhardt Hiabuhttp://arxiv.org/abs/2606.02081v1Decision-calibrated prediction sets for robust power system operations2026-06-01T11:12:09ZRobust optimization offers a tractable approach to balance operating costs and reliability in power systems dominated by weather-dependent renewable uncertainty, but its performance depends critically on the uncertainty set. Standard data-driven approaches often calibrate uncertainty sets to attain predictive coverage, which can produce unnecessarily large sets and costly operating decisions. In contrast, we introduce decision-calibrated prediction sets and embed them as uncertainty sets in robust optimization problems; these are conditional multivariate prediction sets where calibration is defined in terms of the reliability of downstream decisions, rather than in terms of the coverage. First, we learn these conditional prediction sets as sub-level sets of norm-based score functions represented by partially input-convex neural networks, capturing contextual information and multivariate dependence while preserving convexity and tractability in downstream robust formulations. Second, inspired by conformal risk control, we calibrate a score-threshold parameter that sets the volume of the uncertainty set, thereby controlling the expected violations of downstream operational constraints. We apply our approach to 15-minute-ahead reserve scheduling with network-constrained deliverability, which we formulate as a robust DC optimal power flow problem with affine recourse. Numerical experiments show that decision-calibrated sets attain prescribed constraint-satisfaction targets within about three percentage points, whereas standard coverage-based calibration systematically exceeds these targets by more than eleven percentage points, leading to larger sets and higher operating costs.2026-06-01T11:12:09Z25 pages, 6 figuresAkylas StratigakosHonglin WenElina SpyrouPierre Pinsonhttp://arxiv.org/abs/2606.02055v1Query-Limited Community Recovery in Stochastic Block Models2026-06-01T10:44:00ZWe study exact community recovery in the two-community stochastic block model on $n$ vertices under limited and noisy access to network data. The learner may query a noisy neighborhood oracle that reveals each true neighbor of a queried vertex independently with fixed probability and never returns non-neighbors, subject to a finite query budget. We consider both oracle-only access and a combined model where the learner also observes a single subsampled copy of the underlying graph. For oracle-only access, balanced uniform querying gives a sharp non-adaptive benchmark: when each vertex is queried the same integer number of times, the observations reduce to an SBM with attenuated edge probabilities and the Abbe-Bandeira-Hall exact-recovery threshold applies. We show that this benchmark is not adaptively optimal: a two-stage adaptive strategy succeeds with $n+o(n)$ queries in a regime where balanced uniform querying requires $m n$ queries for some $m>1$. With an additional subsampled graph, we prove a sublinear-query adaptivity gap: balanced data-independent uniform querying with a sublinear budget does not improve over the subsampled graph alone, whereas adaptive querying can target a small set of uncertain vertices and achieve exact recovery. Thus adaptive data acquisition can strictly improve the information-theoretic limits of exact recovery.2026-06-01T10:44:00ZSabyasachi BasuManuj MukherjeeLutz OettershagenSuhas Thejaswihttp://arxiv.org/abs/2606.02047v1Convex Distance Operator Transport: A Convex and Geometry-Preserving Formulation2026-06-01T10:38:09ZWe introduce Convex Distance Operator Transport (CDOT), the first convex optimal transport framework that aligns distributions across heterogeneous domains by jointly preserving feature correspondence and intrinsic geometric structure. Specifically, CDOT employs an operator-based regularization that aligns aggregated distance structures by introducing distance and conditional expectation operators. Consequently, the proposed regularization improves the robustness to local geometric variations. We further prove that the resulting CDOT discrepancy is a valid pseudometric on the space of attributed compact metric-measure spaces. In addition, we characterize the relationship between CDOT and Gromov--Wasserstein (GW) through a new notion of dispersion gap, formally elucidating the geometric source of non-convexity in GW compared to the convexity of CDOT. In the finite-sample regime, we derive a non-asymptotic risk bound decomposed into optimization and statistical errors, establishing risk consistency under a globally convergent Frank--Wolfe algorithm. Experiments on synthetic point clouds, brain connectomes, and graph classification benchmarks demonstrate better performance over existing methods, with stable and reliable behavior in practice.2026-06-01T10:38:09ZThis paper is 41 pages long, contains 6 figures, and has been accepted to ICML 2026Junhyoung ChungEuijong SongWon Hwa KimGunwoong Parkhttp://arxiv.org/abs/2606.02017v1PliableBVS: A flexible Bayesian variable selection method for modeling interactions with mandatory modifying variables2026-06-01T10:07:53ZHigh-dimensional interaction models are useful for studying, for example, how a large set of variables of interest, such as gene expression or other omics features, interact with a smaller set of modifying variables, such as clinical covariates. In this context, the pliable lasso has recently been proposed as an efficient method for screening large numbers of potential interaction terms under an asymmetric weak hierarchical constraint. In this work, we extend this framework by introducing PliableBVS, a Bayesian variable selection approach that preserves the hierarchical structure of the pliable lasso while inducing sparsity through spike-and-slab priors. The proposed model combines the continuous shrinkage effect of Bayesian lasso with a hierarchical spike-and-slab prior formulation that has two layers of decision variables: one governing the inclusion of main effects and another controlling the inclusion of interaction effects which is conditional on the inclusion of the corresponding main effects. This structure enables simultaneous selection of high-dimensional main and interaction effects within a coherent probabilistic framework. In simulation studies the proposed method outperforms the original pliable lasso in identifying active main and interaction effects, reducing false discoveries, and improving prediction accuracy in most scenarios. Applications with data from a labor onset study and a preeclampsia study demonstrate that PliableBVS selects biologically meaningful features and interactions.2026-06-01T10:07:53ZTheophilus Quachie AsensoZhi ZhaoMaren-Helene Langeland DegnesMarie Cecilie Paasche RolandTrond Melbye MichelsenManuela Zucknickhttp://arxiv.org/abs/2606.02008v1Provable Data Scaling Law for Meta Learning via Complexity Minimization2026-06-01T10:02:29ZPre-training has become a fundamental paradigm in modern machine learning, with one of its key empirical benefits being reduced downstream sample complexity as the scale of pre-training data increases. However, existing theoretical frameworks for pre-training do not fully explain this phenomenon. In this paper, we introduce complexity minimization, a novel meta-representation learning framework designed to enable theoretical analysis of this scaling behavior, which learns representations by evaluating the downstream model complexity best suited to each domain and minimizing the worst-case such complexity across source domains. Our end-to-end theoretical analysis, spanning pre-training through downstream regression, shows that this framework provably captures this scaling behavior; in particular, we show that the error rate of few-shot adaptation improves as the amount of meta-training data grows. Empirically, we demonstrate that incorporating complexity regularization into existing meta-learning methods consistently improves downstream sample efficiency.2026-06-01T10:02:29ZKazuto FukuchiRyuichiro HatayaKota Matsuihttp://arxiv.org/abs/2606.02664v1State-Coupled Volatility in Latent Dynamical Systems: Recovery Under Partial Observation2026-06-01T10:00:26ZLatent state-space models are widely used to study partially observed dynamical systems, yet most formulations assume that process variability is independent of latent-state position. In many biological, behavioral, and physiological systems, however, variability may depend systematically on the underlying dynamical state, producing structured stochasticity that is not captured by constant-variance models. We introduce a state-coupled stochastic volatility framework in which latent process variance depends on displacement from a latent equilibrium. To estimate this relationship under partial observation, we develop a particle expectation-maximization procedure combining bootstrap particle filtering and backward trajectory smoothing. The model includes a coupling parameter, $γ$, that quantifies the strength of association between latent-state position and process variability. A large-scale simulation benchmark evaluated recovery and detection performance across varying coupling strengths, observation noise levels, trajectory lengths, and persistence regimes. The proposed framework consistently reduced recovery bias relative to an observed-state heteroskedastic proxy, with the largest improvements occurring under strong coupling. Recovery performance improved with increasing latent persistence, while detection performance remained competitive across a broad range of conditions and became increasingly advantageous as observation noise increased. Taken together, the results demonstrate that state-coupled volatility can be identified and estimated under partial observation when latent-state structure is explicitly modeled. The framework provides a practical methodological foundation for studying state-dependent variability and evaluating whether structured stochasticity contributes information about system dynamics beyond that contained in mean-state trajectories alone.2026-06-01T10:00:26Z40 pages, 16 figuresImani Becketthttp://arxiv.org/abs/2606.01954v1Flow-Transformed Implicit Processes for Function-Space Variational Inference2026-06-01T09:14:09ZImplicit-process priors define distributions over functions through flexible generative mechanisms, making them attractive for Bayesian function-space modelling. However, performing posterior inference with such priors is challenging because their induced function-space distributions are typically not available in closed form. One practical strategy is to approximate the prior using a finite collection of sampled functions, and then represent posterior functions as learned combinations of these samples. Existing approaches commonly place a Gaussian variational distribution over the combination weights. While tractable, this choice limits the shapes of posterior uncertainty that can be represented, especially when the true posterior is asymmetric, heavy-tailed, or multimodal. We propose Flow-Transformed Implicit Processes (FTIP), a variational inference method that makes this finite-dimensional function-space approximation more expressive. Instead of using a Gaussian distribution over the combination weights, FTIP uses a normalizing flow to define a richer variational distribution. This induces a flexible posterior distribution over functions while preserving tractable optimization. We train the model using a Black-Box α objective, allowing us to compare mass-covering and mode-seeking variational behaviour. Experiments show that FTIP captures asymmetric and multimodal posterior structure in function space that Gaussian coefficient approximations tend to smooth or collapse.2026-06-01T09:14:09Z24 pages, 4 figures, 10 tables. Pre-print submitted for revisionLuis A. OrtegaAndrés R. MasegosaThomas D. Nielsenhttp://arxiv.org/abs/2501.08640v2Quantum Reservoir Computing and Risk Bounds2026-06-01T09:06:53ZWe propose a way to bound the generalisation errors of several classes of quantum reservoirs using the Rademacher complexity. We give specific, parameter-dependent bounds for two particular quantum reservoir classes. We analyse how the generalisation bounds scale with growing numbers of qubits. Applying our results to classes with polynomial readout functions, we find that the risk bounds converge in the number of training samples. The explicit dependence on the quantum reservoir and readout parameters in our bounds can be used to control the generalisation error to a certain extent. It should be noted that the bounds scale exponentially with the number of qubits n. The upper bounds on the Rademacher complexity can be applied to other reservoir classes that fulfill a few hypotheses on the quantum dynamics and the readout function.2025-01-15T08:06:03ZNaomi Mona ChmielewskiL2SNina AminiL2S, CNRSJoseph Mikaelhttp://arxiv.org/abs/2411.12438v2Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures2026-06-01T08:02:06ZWe develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that, among several other applications, forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04].
As applications, we obtain an algorithm to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ samples and time for clustering such mixtures.
Our results may come as a surprise in the context of the $d^{Ω(k)}$ statistical query and sum-of-squares lower bounds [DKS17, DKPP24] for clustering non-spherical Gaussian mixtures. While these results are usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.2024-11-19T11:58:51Z67 pages, updated to match camera-ready version at COLT 2026Prashanti AndersonMitali BafnaRares-Darius BuhaiPravesh K. KothariDavid Steurerhttp://arxiv.org/abs/2512.02342v3Safeguarded Stochastic Polyak Step Sizes for Non-smooth Optimization: Robust Performance Without Small (Sub)Gradients2026-06-01T07:48:25ZThe stochastic Polyak step size (SPS) has proven to be a promising choice for stochastic gradient descent (SGD), delivering competitive performance relative to state-of-the-art methods on smooth convex and non-convex optimization problems, including deep neural network training. However, extensions of this approach to non-smooth settings remain in their early stages, often relying on interpolation assumptions or requiring knowledge of the optimal solution. In this work, we propose a novel SPS variant, Safeguarded SPS (SPS$_{safe}$), for the stochastic subgradient method, and provide rigorous convergence guarantees for non-smooth convex optimization with no need for strong assumptions. We further incorporate momentum into the update rule, yielding equally tight theoretical results. Comprehensive experiments on convex benchmarks and deep neural networks corroborate our theory: the proposed step size achieves competitive performance to existing adaptive baselines and exhibits stable behavior across a wide range of problem settings. Finally, in the context of deep neural network training, the gradient norms under our step size do not collapse to (near) zero, indicating robustness to vanishing gradients.2025-12-02T02:24:32Z43rd International Conference on Machine Learning (ICML 2026)Dimitris OikonomouNicolas Loizouhttp://arxiv.org/abs/2606.01827v1Adaptive Sharpness-Aware Minimization with a Polyak-type Step size: A Theory-Grounded Scheduler2026-06-01T07:42:01ZSharpness-Aware Minimization (SAM) has established itself as a powerful and widely adopted optimizer for training machine learning models. By explicitly minimizing the sharpness of the loss landscape, SAM often improves generalization while delivering strong empirical performance. However, SAM and its variants, like most training algorithms, are sensitive to the choice of learning rate, which is typically selected through extensive hyperparameter tuning or predefined schedulers. In this work, motivated by recent advances on the effectiveness of stochastic Polyak step sizes for Stochastic Gradient Descent (SGD), we derive Polyak schedulers tailored to SAM-style updates, yielding novel adaptive algorithms in both deterministic and stochastic settings. In the smooth setting, we prove linear convergence for strongly convex objectives and an $\mathcal{O}(1/T)$ convergence rate for convex objectives in the deterministic case. In the stochastic setting, we establish analogous convergence guarantees up to a neighborhood of the optimum. Numerical experiments demonstrate that the proposed Polyak schedulers achieve performance comparable to or better than carefully tuned SAM baselines, while substantially reducing the need for learning-rate tuning.2026-06-01T07:42:01Z43rd International Conference on Machine Learning (ICML 2026)Dimitris OikonomouNicolas Loizouhttp://arxiv.org/abs/2606.01799v1Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits2026-06-01T07:17:49ZWe study $N$-armed stochastic dueling bandits under the Condorcet-winner assumption, where three widely adopted objectives are considered: best-arm identification (BAI), weak regret, and strong regret. We propose Tree-Guided Identify-Then-Exploit (TG-ITE), the first unified framework to tackle all these objectives to our knowledge. Without requiring stronger assumptions, we propose a shared tree-guided identification approach to find a high-confidence incumbent within $O(N)$ comparisons. We further propose varied exploitation strategies to utilize this warm-start stage to optimize the specific objectives at hand. This methodology enables our approach to (1) achieve $O(N)$ sample complexity in BAI without commonly adopted stronger assumptions; (2) build the first winner-stays-style algorithm to achieve $O(N)$ weak regret; (3) enjoy the same $O(N \log T)$ guarantee as specialized strong-regret approaches; (4) realize the joint optimization of BAI and weak regret with $O(N)$ guarantees for both, eliminating the sub-optimal gap of $O(\log N)$ in the existing approach. Our results provide evidence that the trade-off between BAI and regret minimization is relatively benign in dueling bandits.2026-06-01T07:17:49ZPu WangYao-Xiang Ding