https://arxiv.org/api/aGeuck7CLk0IXhNQfT6rwYaAFFM2026-06-14T17:16:18Z7835442015http://arxiv.org/abs/2606.03553v1A Robust Optimization Approach to Sparse Principal Component Analysis2026-06-02T12:14:12ZWhile principal component analysis (PCA) is a fundamental tool for dimensionality reduction, its dense representations make it ill-suited for high-dimensional data. Existing methods address this by promoting sparsity through explicit $\ell_1$-penalties, but these are not obvious to tune due to the unsupervised nature of the task. In contrast, we propose Adversarial PCA (AdvPCA), which leverages robust optimization to achieve sparsity by optimizing the reconstruction objective against bounded, worst-case latent space perturbations. We show that this formulation admits a closed-form reduction, leading to a practical iterative algorithm that alternates between adversarial linear regression-style updates for the sparse encoder and orthogonal updates for the decoder. By theoretically characterizing the solution, we derive a data-adaptive parameterization that allows the algorithm to perform effectively out of the box. We validate these claims through numerical experiments on synthetic and real-world genomics data.2026-06-02T12:14:12ZDavid VävinggrenFrancis BachAndré M. H. TeixeiraDave ZachariahAntônio H. Ribeirohttp://arxiv.org/abs/2606.01340v2Sample Complexity and Decision-Theoretic Guarantees for Bayesian Model Averaging over Decision Trees with Catalan-Exponential Priors2026-06-02T11:54:11ZWe ask: when do Bayesian model averaging (BMA) weights over decision trees carry sufficient epistemic information to justify committed exploitation of the averaging distribution? We answer this question in closed form for Bayesian decision trees (BDTs) with Dirichlet-Multinomial leaf models and a Catalan-exponential tree-size prior (Schetinin&Jakaite, 2025), establishing a complete non-asymptotic theory of rational commitment thresholds.2026-05-31T16:43:54Z22 pages, 3 figures, Submitted to the Journal of Machine Learning ResearchLivija JakaiteVitaly Schetininhttp://arxiv.org/abs/2605.19805v2Latent Laplace Diffusion for Irregular Multivariate Time Series2026-06-02T11:20:56ZIrregular multivariate time series impose a trade-off for long-horizon forecasting: discrete methods can distort temporal structure via re-gridding, while continuous-time models often require sequential solvers prone to drift. To bridge this gap, we present Latent Laplace Diffusion (LLapDiff), a generative framework that models the target as a low-dimensional latent trajectory, enabling horizon-wide generation without step-by-step integration over physical time. We guide the reverse process utilizing a stable modal parameterization motivated by stochastic port-Hamiltonian dynamics, and parameterize its mean evolution in the Laplace domain via learnable complex-conjugate poles, enabling direct evaluation over irregular timestamps. We also link continuous dynamics to irregular observations through renewal-averaging analysis, which maps sampling gaps to effective event-domain poles and motivates a gap-aware history summarizer. Extensive experiments show that LLapDiff improves over baselines in long-horizon forecasting, and its continuous-time generative nature supports missing-value imputation by querying the same model at historical timestamps. Code is available at https://github.com/pixelhero98/LLapDiffusion.2026-05-19T13:04:53ZAccepted as a Spotlight at ICML 2026. The Version of Record will appear in Proceedings of Machine Learning Research (PMLR). 27 pages, 5 figures. Code: https://github.com/pixelhero98/LLapDiffusionZinuo YouJin ZhengJohn Cartlidgehttp://arxiv.org/abs/2602.10949v2Optimal Initialization in Depth: Lyapunov Initialization and Limit Theorems for Deep Leaky ReLU Networks2026-06-02T11:05:28ZEffective initialization in deep networks requires an understanding of random neural networks. In this work, a rigorous probabilistic analysis of deep bias-free random Leaky ReLU networks is provided. We prove a Law of Large Numbers and a Central Limit Theorem for the logarithm of the norm of network activations, establishing that, as the number of layers increases, their growth is governed by a parameter called the Lyapunov exponent. This parameter characterizes a sharp phase transition between vanishing and exploding activations, and we calculate the Lyapunov exponent explicitly for Gaussian or orthogonal weight matrices. Our results reveal that standard methods, such as He initialization or orthogonal initialization, do not guarantee activation stability for deep networks of low width. Based on these theoretical insights, we propose a novel initialization method, referred to as Lyapunov initialization, which sets the Lyapunov exponent to zero and thereby ensures that the neural network is as stable as possible, leading empirically to improved learning.2026-02-11T15:36:13ZPreprint, 44 pagesConstantin KoglerTassilo SchwarzSamuel Kittlehttp://arxiv.org/abs/2511.05050v3Estimating Bidirectional Causal Effects with Large Scale Online Kernel Learning2026-06-02T10:50:28ZIn this study, a scalable online kernel learning framework is proposed for estimating bidirectional causal effects in systems characterized by mutual dependence and heteroskedasticity. Traditional causal inference often focuses on unidirectional effects, overlooking the common bidirectional relationships in real-world phenomena. Building on heteroskedasticity-based identification, the proposed method integrates a quasi-maximum likelihood estimator for simultaneous equation models with large scale online kernel learning. It employs random Fourier feature approximations to flexibly model nonlinear conditional means and variances, while an adaptive online gradient descent algorithm ensures computational efficiency for streaming and high-dimensional data. Results from extensive simulations demonstrate that the proposed method achieves superior accuracy and stability than single equation and polynomial approximation baselines, exhibiting lower bias and root mean squared error across various data-generating processes. These results confirm that the proposed approach effectively captures complex bidirectional causal effects with near-linear computational scaling. By combining econometric identification with modern machine learning techniques, the proposed framework offers a practical, scalable, and theoretically grounded solution for large scale causal inference in natural/social science, policy making, business, and industrial applications.2025-11-07T07:44:06ZProceedings of the 2025 International Conference on Data Science and Intelligent Systems (DSIS 2025), Article 65, pp. 449-455Masahiro Tanaka10.1109/DSIS67228.2025.11390623http://arxiv.org/abs/2502.08006v3Greed is Good: A Unifying Perspective on Guided Generation2026-06-02T10:22:32ZTraining-free guided generation is a widely used and powerful technique that allows the end user to exert further control over the generative process of flow/diffusion models. Generally speaking, two families of techniques have emerged for solving this problem for gradient-based guidance: namely, posterior guidance (i.e., guidance via projecting the current sample to the target distribution via the target prediction model) and end-to-end guidance (i.e., guidance by performing backpropagation throughout the entire ODE solve). In this work, we show that these two seemingly separate families can actually be unified by looking at posterior guidance as a greedy strategy of end-to-end guidance. We explore the theoretical connections between these two families and provide an in-depth theoretical of these two techniques relative to the continuous ideal gradients. Motivated by this analysis we then show a method for interpolating between these two families enabling a trade-off between compute and accuracy of the guidance gradients. We then validate this work on several inverse image problems and property-guided molecular generation.2025-02-11T23:05:16ZAccepted at NeurIPS 2025Zander W. BlasingameChen Liuhttp://arxiv.org/abs/2502.08834v4Rex: A Family of Reversible Exponential (Stochastic) Runge-Kutta Solvers2026-06-02T10:04:14ZDeep generative models based on neural differential equations have become state-of-the-art for many generation tasks. These models rely on ODE/SDE solvers that integrate from a prior distribution to the data distribution; in many applications it is also highly desirable to integrate in the inverse direction. Standard solvers, however, accumulate discretization errors that prohibit exact inversion, an inaccuracy that is unacceptable in precision-critical applications. Existing inversion methods suffer from poor stability and low order of convergence, and are strictly limited to the ODE setting. In this work, we propose Rex, a family of reversible exponential (stochastic) Runge-Kutta solvers obtained by applying Lawson methods to convert any explicit (stochastic) Runge-Kutta scheme into an algebraically reversible one for both diffusion ODEs and SDEs. Beyond a rigorous theoretical analysis -- establishing arbitrary-order convergence and a non-zero region of linear stability -- we empirically demonstrate that Rex achieves near-machine-precision reconstruction and improves Boltzmann sampling with flow models as well as image generation and editing with diffusion models.2025-02-12T22:51:54ZAccepted as an Oral presentation at ICML 2026Zander W. BlasingameChen Liuhttp://arxiv.org/abs/2605.30253v2Wasserstein Contraction of Coordinate Ascent Variational Inference2026-06-02T09:51:10ZWe study the contraction in Wasserstein distance of the coordinate ascent variational inference algorithm. This is shown to hold under a transport-information inequality at the fixed points and a functional smoothness condition. The results are general and sharp, allow for local convergence guarantees, hold for general smooth manifolds, and also in some non-smooth spaces. We consider applications to Bayesian Gaussian Mixture Models, and high-dimensional Bayesian Probit Regression, and Logistic Regression with Pólya-Gamma random variables (i.e. Jaakkola-Jordan's algorithm).2026-05-28T17:16:22Z17 pages + 3 pages appendix, 3 figures. V2 fixes some citations not displaying properly in the appendix. No content change compared to prior versionRocco CaprioAdrien CorenflosSam Powerhttp://arxiv.org/abs/2510.20372v4Testing Most Influential Sets2026-06-02T09:35:36ZSmall influential data subsets can dramatically impact model conclusions, with a few data points overturning key findings. While recent work identifies these most influential sets, there is no formal way to tell when maximum influence is excessive rather than expected under natural random sampling variation. We address this gap by developing a principled framework for most influential sets. Focusing on linear least-squares, we derive a convenient exact influence formula and identify the extreme value distributions of maximal influence - the heavy-tailed Fréchet for constant-size sets and heavy-tailed data, and the well-behaved Gumbel for growing sets or light tails. This allows us to conduct rigorous hypothesis tests for excessive influence. We demonstrate through applications across economics, biology, and machine learning benchmarks, resolving contested findings and replacing ad-hoc heuristics with rigorous inference.2025-10-23T09:12:29ZPublished as a conference paper at ICLR 2026Lucas D. KonradNikolas Kuschnighttp://arxiv.org/abs/2605.11607v2Exact Stiefel Optimization for Probabilistic PLS: Closed-Form Updates, Error Bounds, and Calibrated Uncertainty2026-06-02T09:14:53ZProbabilistic partial least squares (PPLS) is a central likelihood-based model for two-view learning when one needs both interpretable latent factors and calibrated uncertainty. Building on the identifiable parameterization of Bouhaddani et al.\ (2018), existing fitting pipelines still face two practical bottlenecks: noise--signal coupling under joint EM/ECM updates and nontrivial handling of orthogonality constraints. Following the fixed-noise scalar-likelihood protocol, we develop an end-to-end framework that combines noise pre-estimation, constrained likelihood optimization, and prediction calibration in one pipeline. We estimate the observation noise from the low-eigenvalue noise subspace and enforce orthogonality through exact Stiefel-manifold optimization. The noise-subspace estimator attains a signal-strength-independent leading finite-sample rate and matches a minimax lower bound, whereas a full-spectrum noise estimator carries a deterministic bias under the same model. We further extend the framework to sub-Gaussian settings via optional Gaussianization and provide closed-form standard errors through a block-structured Fisher analysis. Across synthetic high-noise settings and two multi-omics benchmarks (TCGA-BRCA and PBMC CITE-seq), the method achieves near-nominal coverage without post-hoc recalibration, reaches Ridge-level point accuracy on TCGA-BRCA at rank $r=3$, matches or exceeds PO2PLS on cross-view prediction while providing native calibrated uncertainty, and improves stability of parameter recovery.2026-05-12T06:38:12ZHaoran HuXingce Wanghttp://arxiv.org/abs/2606.03347v1AugMask: Training Diffusion Models on Incomplete Tabular Data via Stochastic Augmentation and Masking2026-06-02T08:57:38ZScore-based diffusion models have emerged as prominent deep generative models; however, their application to tabular data remains challenging because their backbones assume fully specified inputs, whereas real-world tabular data often contain missing values. We propose AugMask, a plug-and-play training framework that adapts missing-unaware backbones to incomplete data by separating conditioning from supervision. AugMask 1) constructs numeric inputs via conditional stochastic augmentation using lightweight auxiliary models, and 2) applies denoising supervision only to observed coordinates. In effect, augmented missing entries serve as uncertain conditioning context rather than training targets. We connect this training rule to a Rao--Blackwellized objective and show that marginalizing missing entries yields a variance-weighted sensitivity penalty, discouraging over-reliance on uncertain completions. Across diverse datasets and missingness regimes, AugMask enables standard diffusion-based tabular generators to outperform specialized missing-aware baselines.2026-06-02T08:57:38ZJungkyu KimTaeyoung ParkKibok Leehttp://arxiv.org/abs/2506.13107v4Honesty in Causal Forests: When It Helps and When It Hurts2026-06-02T08:57:14ZCausal forests estimate how treatment effects vary across individuals, guiding personalized interventions in areas like marketing, operations, and public policy. A standard practice is honest estimation: dividing the data into two samples, one to define subgroups and another to estimate treatment effects within them. This is intended to reduce overfitting and is the default in many software packages. But is it the right choice? We show that honest estimation can reduce the accuracy of estimates of individual treatment effects, especially when effect heterogeneity is substantial and datasets are large enough to detect it. The reason is a bias-variance trade-off: honesty lowers the risk of overfitting but increases the risk of underfitting by limiting the data available to detect and model heterogeneity. Across more than 7,000 benchmark datasets, we find that the cost of using honesty by default can be as high as requiring 27% more data to match the performance of models trained without it. Honesty is best understood as a form of regularization. Whether to adopt it should depend on the goals of the application and its empirical performance, not on reflexive default use.2025-06-16T05:32:58ZYanfang HouCarlos Fernández-Loríahttp://arxiv.org/abs/2412.05109v2Generating Rectifiable Measures through Neural Networks2026-06-02T08:37:17ZWe derive universal approximation results for the class of (countably) $m$-rectifiable measures. Specifically, we prove that $m$-rectifiable measures can be approximated as push-forwards of the one-dimensional Lebesgue measure on $[0,1]$ using ReLU neural networks with arbitrarily small approximation error in terms of Wasserstein distance. What is more, the weights in the networks under consideration are quantized and bounded and the number of ReLU neural networks required to achieve an approximation error of $\varepsilon$ is no larger than $2^{b(\varepsilon)}$ with $b(\varepsilon)=\mathcal{O}(\varepsilon^{-m}\log^2(\varepsilon))$. This result improves Lemma IX.4 in Perekrestenko et al. as it shows that the rate at which $b(\varepsilon)$ tends to infinity as $\varepsilon$ tends to zero equals the rectifiability parameter $m$, which can be much smaller than the ambient dimension. We extend this result to countably $m$-rectifiable measures and show that this rate still equals the rectifiability parameter $m$ provided that, among other technical assumptions, the measure decays exponentially on the individual components of the countably $m$-rectifiable support set.2024-12-06T15:10:04ZErwin RieglerAlex BühlerYang PanHelmut Bölcskeihttp://arxiv.org/abs/2510.16462v3Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making2026-06-02T08:32:59ZThis work introduces MAYA, a sequential imitation learning model based on multi-armed bandits, designed to reproduce and predict individual bees' decisions in contextualized foraging tasks. The model accounts for bees' limited memory through a temporal window $τ$, whose optimal value is around 7 trials, with a slight dependence on weather conditions. Experimental results on real, simulated, and complementary (mice) datasets show that MAYA (particularly with the Wasserstein distance) outperforms imitation baselines and classical statistical models, while providing interpretability of individual learning strategies and enabling the inference of realistic trajectories for prospective ecological applications.2025-10-18T12:03:15ZEmmanuelle ClaeysElena KerjeanJean-Michel Loubeshttp://arxiv.org/abs/2507.10419v3Multiple Choice Learning of Low-Rank Adapters for Language Modeling2026-06-02T08:21:14ZWe propose LoRA-MCL, a training scheme that extends next-token prediction in language models with a method designed to decode diverse, plausible sentence continuations at inference time. Traditional language modeling is an intrinsically ill-posed problem: given a context, multiple futures may be equally plausible. Our approach leverages Multiple Choice Learning (MCL) and the winner-takes-all loss to efficiently handle ambiguity through Low-Rank Adaptation. We provide a theoretical interpretation of applying MCL to language modeling, assuming the data is generated from a mixture of distributions. We illustrate the proposed approach using mixtures of Markov chains. We then demonstrate with experiments on audio and visual captioning, as well as machine translation, that our method achieves high diversity and relevance in generated outputs. We release the code for applying LoRA-MCL to a wide range of language models.2025-07-14T16:00:51ZICML 2026Victor LetzelterHugo MalardMathieu FontaineGaël RichardSlim EssidAndrei BursucPatrick Pérez