https://arxiv.org/api/kG2bfjJ8KJIr5etbpPoRVPlti4I 2026-06-14T03:33:04Z 78354 225 15 http://arxiv.org/abs/2606.07926v1 Barycentric Projections of Optimal Transport Plans on Riemannian Manifolds 2026-06-06T01:31:58Z Optimal transport couplings are probabilistic objects, while many learning pipelines require deterministic maps. In Euclidean space, barycentric projection converts a coupling into a map by taking conditional expectations, but on a Riemannian manifold curvature and cut loci make this operation nontrivial. We develop a framework for barycentric projections of transport couplings on Riemannian manifolds. The intrinsic projection maps each source point to the conditional Fréchet mean of its destination law and is shown to be the best deterministic representative under squared geodesic loss. The corresponding minimum value is an integrated conditional Fréchet variance, which vanishes exactly for map-induced couplings and therefore defines a conditional-variance Monge defect. We also study a tangential log-exp projection, prove its Euclidean exactness, its compatibility with Brenier-McCann maps in the Monge case, and its interpretation as the first unit Riemannian gradient update for the intrinsic objective. For discrete couplings, both constructions decompose row-wise into weighted Fréchet mean and log-exp problems. Experiments on spherical data, synthetic SPD data, and real EEG covariance matrices support the proposed division of roles: the intrinsic projection is the variational representative, while the tangential projection is a useful local displacement surrogate. 2026-06-06T01:31:58Z Kisung You http://arxiv.org/abs/2606.07914v1 Identifiability and Estimation for Unlabeled Finite Mixtures under Marginal Independence 2026-06-06T00:29:35Z We study component recovery and mixing-matrix estimation from unlabeled finite mixtures whose observable distributions share the same latent components but have unknown mixing weights. The main identifying signal is marginal independence: each component is assumed to be independent on at least one coordinate pair, but no labels, clean component samples, or mixing weights are observed. We first prove a structural result for product components: under linear independence of the univariate marginals, any independent affine combination of the components must coincide with a single component. We then extend this principle to observable mixtures and show that, under full-rank and no-cancellation conditions, marginally independent affine combinations recover the corresponding latent components. When every component is independent on some coordinate pair, all components are identifiable, and the mixing matrix is recoverable under the stated completion conditions. Finally, we propose a Product-Marginal Maximum Mean Discrepancy (PM-MMD) estimator over affine combinations of the observable mixtures and prove uniform convergence and stability under approximate marginal independence. This framework also separates the empirical roles of the assumptions: irreducibility is, in general, not directly testable from the unlabeled mixtures alone, whereas marginal independence yields a candidate-level diagnostic through held-out PM-MMD. Controlled and flow-cytometry experiments show when marginal independence provides a useful recovery signal. In the reported multi-component comparisons, condition-aware representative selection stabilizes PM-MMD and improves recovery relative to clustering, factorization, and pairwise mixture-proportion baselines using the same unlabeled mixtures. 2026-06-06T00:29:35Z Takafumi Kanamori Yushi Hirose Shohei Yamamoto http://arxiv.org/abs/2606.00469v2 Constructive interpolation and generalization rates for neural ODEs: a control perspective 2026-06-05T23:48:46Z We study supervised regression with neural ODEs (NODEs) from a control-theoretic perspective to derive explicit population-risk bounds. We focus on a widely used class of non-autonomous models with constant parameters and explicit time dependence, which we call semi-autonomous NODEs (SA-NODEs). We constructively prove that SA-NODEs are capable of \emph{exact} interpolation of admissible finite datasets, and even satisfy a stronger property that we call \emph{simultaneous cell controllability} (SCC): their flows can map prescribed disjoint cells into arbitrarily small target balls. This property is the mechanism that upgrades interpolation into quantitative generalization, by allowing SA-NODEs to emulate piecewise-constant nonparametric estimators. Consequently, our risk bounds recover the rates of histogram and nearest-neighbor estimators, provided the network width satisfies a conservative scaling with the sample size. Numerical experiments show that trained SA-NODEs achieve competitive -- often lower -- test errors than these baselines. Finally, we show that the explicit time dependence is essential. Although two-layer autonomous NODEs can interpolate geometrically nondegenerate datasets, structural obstructions prevent them from achieving SCC. These limitations, further confirmed numerically, support the view that SA-NODEs provide a minimal effective architecture for learning. 2026-05-30T01:26:36Z 36 pages, 8 figures Antonio Álvarez-López Lorenzo Liverani Enrique Zuazua http://arxiv.org/abs/2606.07890v1 Partially Performative Prediction 2026-06-05T22:58:46Z Performative prediction studies feedback loops that arise when predictive models are deployed in consequential domains. In these settings, deploying a model can change the population whose patterns the model aims to predict, inducing a distribution shift that is endogenous to the learning system. This perspective departs from classical treatments of distribution shift, where shifts are typically modeled as exogenous changes in the data-generating process. Yet, in practice, distribution shift is rarely one or the other. Predictive models may influence future data through the decisions they support, while the world itself continues to drift for reasons beyond the learner's control. We study partially performative prediction, a framework that captures both endogenous and exogenous sources of distribution shift. The framework generalizes performative prediction by allowing the data distribution to evolve both in response to the deployed model and according to an external, time-varying process. We extend the central notions of performative stability and performative optimality to this setting by defining their online analogues that track the evolving partially performative environment. We analyze practical learning heuristics, including repeated retraining, and characterize when they successfully adapt to partially performative environments. 2026-06-05T22:58:46Z Jaewook Lee Tijana Zrnic http://arxiv.org/abs/2605.26703v2 Proper Calibeating 2026-06-05T22:02:25Z The classic concept of "calibrated forecasts" and its more recent refinement, "calibeating," are defined with respect to the standard quadratic scoring rule. We extend these notions to the class of $\textit{proper}$ scoring rules (for which the best forecast is the true distribution) and define $\textit{proper-calibration}$ and $\textit{proper-calibeating}$ by requiring the errors to converge to zero uniformly over all bounded proper scoring rules. We first establish that calibration always implies proper-calibration, whereas calibeating need not imply proper-calibeating. Second, we show how to guarantee proper-calibeating and proper-multicalibeating. Finally, we demonstrate the equivalence between proper-calibration and universal no regret when best replying to forecasts in decision-making under uncertainty. 2026-05-26T08:44:45Z v2: Updated section 6 "Decision Making Under Uncertainty" Dean P. Foster Sergiu Hart http://arxiv.org/abs/2606.07865v1 Instrumented data for causal scientific machine learning 2026-06-05T21:53:39Z Scientific machine learning is limited less by model size than by the data it is trained on. Observational data records what happened but not why; template synthetic data has a known generating process but only for the simulator's template, not the case a user faces. We argue a third option is now operationally feasible: instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that model, and an executable family of counterfactuals. Verification-and-validation (V&V) instrumented image-to-simulation pipelines are one realisation: a sensor observation becomes a fully specified, solver-backed simulation with explicit, editable parameters and a propagated aleatoric/epistemic uncertainty. The substrate is case-specific, mechanistically supervised, and supports causal interventions through Pearl's do-operator. Near-term consequences for validation, auditing, and surrogate training span computational biology, climate, materials, fluid mechanics, and medical imaging; a longer-term, falsifiable implication concerns foundation models for scientific reasoning. 2026-06-05T21:53:39Z 10 pages, 2 figures Daniel N. Wilke http://arxiv.org/abs/2606.07841v1 Large-scale empirical tuning and comparison of default optimizers for variational inference 2026-06-05T21:04:12Z Black-box variational inference (BBVI) is a methodology for posterior approximation that relies on stochastic optimization. In practice, the stochastic optimizers underpinning BBVI generally require extensive problem-specific tuning, which undermines its promise as a truly "black box" inference algorithm. However, over the past decade, many new adaptive stochastic optimization algorithms have been developed that reduce or remove entirely the need for tuning. In this work, we investigate this new collection of adaptive methods in the context of BBVI, with the goal of establishing the current state of the art in tuning-free optimization-based inference. In particular, we present a large-scale empirical evaluation of 56 stochastic gradient-based optimization algorithms applied to 1092 Bayesian inference optimization problems, involving over 550,000 individual optimization runs and 15 core-years of compute. The optimization algorithms we evaluate are chosen to represent a wide spectrum of recent approaches and the benchmark problems are chosen to span a range of difficulty, with posterior target dimension 1-10^4, condition number 1-10^8, and a range of variational families. Our results show that no single method dominates, but running a selection of 5 algorithms suffices to reliably get close to the best-possible observed performance. We thus provide a strong baseline for applications where expert tuning is not possible and for comparison when developing new stochastic optimization algorithms. 2026-06-05T21:04:12Z Trevor Campbell Jonathan H. Huggins Kyurae Kim Charles C. Margossian http://arxiv.org/abs/2311.05009v5 Consensus-based adaptive sampling and approximation for high-dimensional energy landscapes 2026-06-05T19:02:55Z We present a consensus-based framework that unifies phase space exploration with posterior-residual-based adaptive sampling for surrogate construction in high-dimensional energy landscapes. Unlike standard approximation tasks where sampling points can be freely queried, physical systems with complex energy landscapes such as molecular dynamics (MD) do not have direct access to arbitrary sampling regions due to the physical constraints and energy barriers; the surrogate construction further relies on the dynamical exploration of phase space, posing a significant numerical challenge. We formulate the problem as a minimax optimization that jointly adapts both the surrogate approximation and residual-enhanced sampling. The construction of free energy surfaces (FESs) for high-dimensional collective variables (CVs) of MD systems is used as a motivating example to illustrate the essential idea. Specifically, the maximization step establishes a stochastic interacting particle system to impose adaptive sampling through both exploitation of a Laplace approximation of the max-residual region and exploration of uncharted phase space via temperature control. The minimization step updates the FES surrogate with the new sample set. Numerical results demonstrate the effectiveness of the present approach for biomolecular systems with up to 30 CVs. While we focus on the FES construction, the developed framework is general for efficient surrogate construction for complex systems with high-dimensional energy landscapes. 2023-11-08T20:32:27Z Liyao Lyu Huan Lei http://arxiv.org/abs/2606.07789v1 A Framework for Evaluating and Benchmarking Concept Drift Detection Methods 2026-06-05T19:02:17Z Data stream mining is fundamentally challenged by concept drift, where distributional changes can degrade model performance. Despite the proliferation of drift detection methods, progress in the field is hindered by inconsistent evaluation practices: studies rely on oversimplified synthetic data generators, adopt incompatible metrics, and lack transparency in hyperparameter selection, making fair comparisons difficult. We address this gap with a novel benchmarking framework comprising three contributions: (1) a drift simulation method that injects controlled distributional changes into real-world datasets via Monte Carlo trials, enabling supervised evaluation while preserving real-world data complexity; (2) an evaluation protocol for drift detection with timing-aware criteria, including the derivation of new metrics (e.g., F1 detection score, normalized detection time) that are comparable across streams; and (3) we advocate for a leave-one-dataset-out hyperparameter optimization protocol for drift detection methods that promotes configuration robustness across heterogeneous stream dynamics. We benchmark 14 widely used drift detection methods on 7 realworld datasets across 4 drift types (class prior, label swap, feature permutation, feature filtering), each under both abrupt and gradual transitions. Our experimental results provide insights into the strengths and weaknesses of current drift detection approaches while establishing baseline performance metrics for future research in this area. All code and experiments are publicly available. 2026-06-05T19:02:17Z Accepted in KDD'26 Vitor Cerqueira Heitor Murilo Gomes Marco Heyden Bernhard Pfahringer Albert Bifet http://arxiv.org/abs/2602.18364v2 Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings 2026-06-05T18:59:00Z Maximum likelihood prediction (MLP) is a core task at the heart of modern large language models. Here, we study a quantum version of this task for a simplified data model consisting of independent and identically distributed samples, as a first step. The quantum maximum likelihood predictor is obtained by embedding of empirical probability distributions into quantum states and performing a minimization of quantum relative entropy over a given class of states. We provide an interpretation of this predictor in terms of quantum reverse information projection and quantum Pythagorean theorem when the class of quantum models is sufficiently expressive. We further derive non-asymptotic performance guarantees in terms of convergence rates and concentration inequalities, both in trace norm and quantum relative entropy. Our approach provides a unified framework to handle MLP within both classical and quantum LLMs. 2026-02-20T17:16:38Z 31+3 pages, 1 figure Sreejith Sreekumar Nir Weinberger http://arxiv.org/abs/2605.01446v3 Sequential Minimal Optimization for $\varepsilon$-SVR with MAPE Loss and Sample-Dependent Box Constraints 2026-06-05T18:30:10Z Support vector regression with Mean Absolute Percentage Error (MAPE) loss is theoretically well-motivated for forecasting applications where accuracy is evaluated in relative terms, but the sample-dependent dual box constraints it induces have not been addressed in the published SMO literature. We derive a Sequential Minimal Optimization algorithm for this setting and prove a structural-invariance result: the MAPE modification affects exactly two components of the SMO iteration -- working-set selection and analytic-update clipping -- leaving gradient bookkeeping and curvature computation identical to classical epsilon-SVR. Building on this invariance, we establish four efficiency improvements (asymmetric freeze-counters, warm-starting, block working-set updates of size four, and per-pair tolerance scaling) and resolve a previously-open convergence problem for the odd-symmetry kernel variant via adaptive spectral regularization. Numerical validation against three reference solvers across eleven synthetic configurations certifies solution agreement within standard tolerance. Wall-time benchmarks show the present algorithm achieves the lowest median runtime on every tested configuration against OSQP, MOSEK, and Clarabel. At production scale, the algorithm converges on the California Housing benchmark while the patched LIBSVM reference implementation reaches its iteration ceiling without satisfying optimality -- demonstrating the practical necessity of the theoretical efficiency mechanisms. An open-source R package and an explicit solver-adaptation recipe are provided. 2026-05-02T13:51:46Z 82 pages, 3 figure, 13 tables Pablo Benavides-Herrera Riemann Ruiz-Cruz Juan Diego Sánchez-Torres http://arxiv.org/abs/2602.02431v2 Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning 2026-06-05T18:19:06Z It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradient descent (GD, which reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) is not well-understood in nonlinear and non-convex settings, except for a loss modification mechanism achieved by the first two passes on the data. In this work, we consider learning a $d$-dimensional single-index model with a quadratic activation, for which it is known that one-pass SGD requires $n\gtrsim d\log d$ samples to achieve weak recovery. We first show that this $\log d$ factor in the sample complexity persists for full-batch spherical GD on the correlation loss; however, by simply truncating the activation, full-batch GD exhibits a favorable optimization landscape at $n \simeq d$ samples, thereby outperforming one-pass SGD (with the same activation) in statistical efficiency. We complement this result with a trajectory analysis of full-batch GD on the squared loss from small initialization, showing that $n \gtrsim d$ samples and $T \gtrsim\log d$ gradient steps suffice to achieve strong (exact) recovery. 2026-02-02T18:31:51Z Accepted to ICML 2026 Filip Kovačević Hong Chang Ji Denny Wu Mahdi Soltanolkotabi Marco Mondelli http://arxiv.org/abs/2606.07492v1 Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies 2026-06-05T17:46:36Z The ranking of recommendation algorithms is a challenging problem since model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This drives a demand for a proper methodology for fair comparison between algorithms. Naive aggregation of performance metrics (e.g., averaging NDCG over benchmarks) can yield misleading rankings, undermining practical selection. To address this problem, we introduce a novel, data-driven ranking methodology based on Bradley-Terry (BT) model. We demonstrate that the obtained ranking depends on key dataset statistics. Additionally, we propose a novel metric for evaluating ranking consistency and demonstrate robustness of our ranking to incomplete data. Finally, we introduce a dataset-specific methodology for ranking algorithms on unseen datasets without running the models, relying on extensions of the Bradley-Terry framework, including BT trees and BT models with covariates. 2026-06-05T17:46:36Z KDD'26 Ekaterina Grishina Stepan Kuznetsov Askar Tsyganov Ilya Ivanov Daria Korovaitceva Margarita Rusanova Uliana Parkina Alexander Derevyagin Evgeny Frolov Sergey Samsonov Anton Lysenko 10.1145/3770855.3817890 http://arxiv.org/abs/2606.07483v1 Network Recovery from Cascade Data: A Debiased Jacobian-Based Machine Learning Approach 2026-06-05T17:38:05Z Many important outcomes unfold as dynamic cascades, including product adoption, disease spread, financial distress, and information diffusion. A central challenge is to recover the hidden influence network behind these cascades. Existing methods typically assume a specific diffusion model, and their performance degrades substantially when that assumption is misspecified. We propose CascadeNet, a Jacobian-based machine learning framework for network recovery that does not require specifying a diffusion mechanism. The key idea is that the underlying influence structure can be characterized by the Jacobian of the one-step transition function. CascadeNet first constructs a flexible estimator of the transition function, and further applies Neyman-orthogonal debiasing via the Riesz representer, so that the debiased Jacobian is $\sqrt{n}$-consistent and asymptotically normal, enabling formal inference on the network structure. We validate CascadeNet in both a simulation exercise and a real-world empirical application. In simulations, where the data-generating process is known, CascadeNet achieves the highest network recovery accuracy across nine common data-generating processes. In an empirical application to COVID-19 transmission across Spain's 52 provinces, CascadeNet recovers transmission networks that are significantly correlated with the true inter-province mobility network, whereas networks recovered by baseline methods show no significant alignment with the ground truth. 2026-06-05T17:38:05Z Lei Huang http://arxiv.org/abs/2606.07457v1 Time series Foundation Models based on Physics-Informed Synthetic Histories for Cold-Start Photovoltaic Forecasting 2026-06-05T17:05:19Z At commissioning time, Photovoltaic (PV) operators must forecast production before target-site observations are available, limiting the direct use of standard supervised forecasters. This cold-start setting is addressed with a zero-shot pipeline that generates a synthetic production history from plant metadata and meteorological covariates, enabling time-series foundation models (TSFMs) to forecast through inference-time conditioning. Five TSFMs are benchmarked against classical baselines under strict Cold-Start Baseline, Real Feedback, and Self-Forecast Feedback strategies. The evaluation spans $440$ PV sites across four datasets and diverse climate regimes. Covariate-aware foundation models outperform baselines by approximately $1.7-2\times$: TabPFN-TS achieves the lowest error under Real Feedback (MAE $0.514$, RMSE $0.721$ $kWh$ ${kWp}^{-1}$ ${d}^{-1}$), while Chronos-2 is most robust under Self-Forecast Feedback. Performance is largely insensitive to the synthetic-history source, indicating that accuracy is driven more by the availability of plausible temporal context than by the specific generator. 2026-06-05T17:05:19Z To be published in the 2nd ICML Workshop on Foundation Models for Structured Data Lorenzo Longarini Alessandro Rongoni Simone Silenzi Emanuele Frontoni Riccardo Rosati