https://arxiv.org/api/kG2bfjJ8KJIr5etbpPoRVPlti4I
2026-06-14T03:33:04Z
7835422515

http://arxiv.org/abs/2606.07926v1
Barycentric Projections of Optimal Transport Plans on Riemannian Manifolds
2026-06-06T01:31:58Z
Optimal transport couplings are probabilistic objects, while many learning pipelines require deterministic maps. In Euclidean space, barycentric projection converts a coupling into a map by taking conditional expectations, but on a Riemannian manifold curvature and cut loci make this operation nontrivial. We develop a framework for barycentric projections of transport couplings on Riemannian manifolds. The intrinsic projection maps each source point to the conditional Fréchet mean of its destination law and is shown to be the best deterministic representative under squared geodesic loss. The corresponding minimum value is an integrated conditional Fréchet variance, which vanishes exactly for map-induced couplings and therefore defines a conditional-variance Monge defect. We also study a tangential log-exp projection, prove its Euclidean exactness, its compatibility with Brenier-McCann maps in the Monge case, and its interpretation as the first unit Riemannian gradient update for the intrinsic objective. For discrete couplings, both constructions decompose row-wise into weighted Fréchet mean and log-exp problems. Experiments on spherical data, synthetic SPD data, and real EEG covariance matrices support the proposed division of roles: the intrinsic projection is the variational representative, while the tangential projection is a useful local displacement surrogate.
2026-06-06T01:31:58Z
Kisung You

http://arxiv.org/abs/2606.07914v1
Identifiability and Estimation for Unlabeled Finite Mixtures under Marginal Independence
2026-06-06T00:29:35Z
We study component recovery and mixing-matrix estimation from unlabeled finite mixtures whose observable distributions share the same latent components but have unknown mixing weights.
The main identifying signal is marginal independence: each component is assumed to be independent on at least one coordinate pair, but no labels, clean component samples, or mixing weights are observed. We first prove a structural result for product components: under linear independence of the univariate marginals, any independent affine combination of the components must coincide with a single component. We then extend this principle to observable mixtures and show that, under full-rank and no-cancellation conditions, marginally independent affine combinations recover the corresponding latent components. When every component is independent on some coordinate pair, all components are identifiable, and the mixing matrix is recoverable under the stated completion conditions. Finally, we propose a Product-Marginal Maximum Mean Discrepancy (PM-MMD) estimator over affine combinations of the observable mixtures and prove uniform convergence and stability under approximate marginal independence. This framework also separates the empirical roles of the assumptions: irreducibility is, in general, not directly testable from the unlabeled mixtures alone, whereas marginal independence yields a candidate-level diagnostic through held-out PM-MMD. Controlled and flow-cytometry experiments show when marginal independence provides a useful recovery signal. In the reported multi-component comparisons, condition-aware representative selection stabilizes PM-MMD and improves recovery relative to clustering, factorization, and pairwise mixture-proportion baselines using the same unlabeled mixtures.
2026-06-06T00:29:35Z
Takafumi Kanamori, Yushi Hirose, Shohei Yamamoto

http://arxiv.org/abs/2606.00469v2
Constructive interpolation and generalization rates for neural ODEs: a control perspective
2026-06-05T23:48:46Z
We study supervised regression with neural ODEs (NODEs) from a control-theoretic perspective to derive explicit population-risk bounds.
We focus on a widely used class of non-autonomous models with constant parameters and explicit time dependence, which we call semi-autonomous NODEs (SA-NODEs). We constructively prove that SA-NODEs are capable of \emph{exact} interpolation of admissible finite datasets, and even satisfy a stronger property that we call \emph{simultaneous cell controllability} (SCC): their flows can map prescribed disjoint cells into arbitrarily small target balls. This property is the mechanism that upgrades interpolation into quantitative generalization, by allowing SA-NODEs to emulate piecewise-constant nonparametric estimators. Consequently, our risk bounds recover the rates of histogram and nearest-neighbor estimators, provided the network width satisfies a conservative scaling with the sample size. Numerical experiments show that trained SA-NODEs achieve competitive -- often lower -- test errors than these baselines. Finally, we show that the explicit time dependence is essential. Although two-layer autonomous NODEs can interpolate geometrically nondegenerate datasets, structural obstructions prevent them from achieving SCC. These limitations, further confirmed numerically, support the view that SA-NODEs provide a minimal effective architecture for learning.
2026-05-30T01:26:36Z
36 pages, 8 figures
Antonio Álvarez-López, Lorenzo Liverani, Enrique Zuazua

http://arxiv.org/abs/2606.07890v1
Partially Performative Prediction
2026-06-05T22:58:46Z
Performative prediction studies feedback loops that arise when predictive models are deployed in consequential domains. In these settings, deploying a model can change the population whose patterns the model aims to predict, inducing a distribution shift that is endogenous to the learning system. This perspective departs from classical treatments of distribution shift, where shifts are typically modeled as exogenous changes in the data-generating process. Yet, in practice, distribution shift is rarely one or the other.
Predictive models may influence future data through the decisions they support, while the world itself continues to drift for reasons beyond the learner's control. We study partially performative prediction, a framework that captures both endogenous and exogenous sources of distribution shift. The framework generalizes performative prediction by allowing the data distribution to evolve both in response to the deployed model and according to an external, time-varying process. We extend the central notions of performative stability and performative optimality to this setting by defining their online analogues that track the evolving partially performative environment. We analyze practical learning heuristics, including repeated retraining, and characterize when they successfully adapt to partially performative environments.
2026-06-05T22:58:46Z
Jaewook Lee, Tijana Zrnic

http://arxiv.org/abs/2605.26703v2
Proper Calibeating
2026-06-05T22:02:25Z
The classic concept of "calibrated forecasts" and its more recent refinement, "calibeating," are defined with respect to the standard quadratic scoring rule. We extend these notions to the class of $\textit{proper}$ scoring rules (for which the best forecast is the true distribution) and define $\textit{proper-calibration}$ and $\textit{proper-calibeating}$ by requiring the errors to converge to zero uniformly over all bounded proper scoring rules. We first establish that calibration always implies proper-calibration, whereas calibeating need not imply proper-calibeating. Second, we show how to guarantee proper-calibeating and proper-multicalibeating. Finally, we demonstrate the equivalence between proper-calibration and universal no regret when best replying to forecasts in decision-making under uncertainty.
2026-05-26T08:44:45Z
v2: Updated section 6 "Decision Making Under Uncertainty"
Dean P. Foster, Sergiu Hart

http://arxiv.org/abs/2606.07865v1
Instrumented data for causal scientific machine learning
2026-06-05T21:53:39Z
Scientific machine learning is limited less by model size than by the data it is trained on. Observational data records what happened but not why; template synthetic data has a known generating process but only for the simulator's template, not the case a user faces. We argue a third option is now operationally feasible: instrumented data, in which every datum carries the mechanistic model that produced it, an explicit uncertainty over that model, and an executable family of counterfactuals. Verification-and-validation (V&V) instrumented image-to-simulation pipelines are one realisation: a sensor observation becomes a fully specified, solver-backed simulation with explicit, editable parameters and a propagated aleatoric/epistemic uncertainty. The substrate is case-specific, mechanistically supervised, and supports causal interventions through Pearl's do-operator. Near-term consequences for validation, auditing, and surrogate training span computational biology, climate, materials, fluid mechanics, and medical imaging; a longer-term, falsifiable implication concerns foundation models for scientific reasoning.
2026-06-05T21:53:39Z
10 pages, 2 figures
Daniel N. Wilke

http://arxiv.org/abs/2606.07841v1
Large-scale empirical tuning and comparison of default optimizers for variational inference
2026-06-05T21:04:12Z
Black-box variational inference (BBVI) is a methodology for posterior approximation that relies on stochastic optimization. In practice, the stochastic optimizers underpinning BBVI generally require extensive problem-specific tuning, which undermines its promise as a truly "black box" inference algorithm. However, over the past decade, many new adaptive stochastic optimization algorithms have been developed that reduce or remove entirely the need for tuning.
In this work, we investigate this new collection of adaptive methods in the context of BBVI, with the goal of establishing the current state of the art in tuning-free optimization-based inference. In particular, we present a large-scale empirical evaluation of 56 stochastic gradient-based optimization algorithms applied to 1092 Bayesian inference optimization problems, involving over 550,000 individual optimization runs and 15 core-years of compute. The optimization algorithms we evaluate are chosen to represent a wide spectrum of recent approaches and the benchmark problems are chosen to span a range of difficulty, with posterior target dimension 1-10^4, condition number 1-10^8, and a range of variational families. Our results show that no single method dominates, but running a selection of 5 algorithms suffices to reliably get close to the best-possible observed performance. We thus provide a strong baseline for applications where expert tuning is not possible and for comparison when developing new stochastic optimization algorithms.
2026-06-05T21:04:12Z
Trevor Campbell, Jonathan H. Huggins, Kyurae Kim, Charles C. Margossian

http://arxiv.org/abs/2311.05009v5
Consensus-based adaptive sampling and approximation for high-dimensional energy landscapes
2026-06-05T19:02:55Z
We present a consensus-based framework that unifies phase space exploration with posterior-residual-based adaptive sampling for surrogate construction in high-dimensional energy landscapes. Unlike standard approximation tasks where sampling points can be freely queried, physical systems with complex energy landscapes such as molecular dynamics (MD) do not have direct access to arbitrary sampling regions due to the physical constraints and energy barriers; the surrogate construction further relies on the dynamical exploration of phase space, posing a significant numerical challenge.
We formulate the problem as a minimax optimization that jointly adapts both the surrogate approximation and residual-enhanced sampling. The construction of free energy surfaces (FESs) for high-dimensional collective variables (CVs) of MD systems is used as a motivating example to illustrate the essential idea. Specifically, the maximization step establishes a stochastic interacting particle system to impose adaptive sampling through both exploitation of a Laplace approximation of the max-residual region and exploration of uncharted phase space via temperature control. The minimization step updates the FES surrogate with the new sample set. Numerical results demonstrate the effectiveness of the present approach for biomolecular systems with up to 30 CVs. While we focus on the FES construction, the developed framework is general for efficient surrogate construction for complex systems with high-dimensional energy landscapes.
2023-11-08T20:32:27Z
Liyao Lyu, Huan Lei

http://arxiv.org/abs/2606.07789v1
A Framework for Evaluating and Benchmarking Concept Drift Detection Methods
2026-06-05T19:02:17Z
Data stream mining is fundamentally challenged by concept drift, where distributional changes can degrade model performance. Despite the proliferation of drift detection methods, progress in the field is hindered by inconsistent evaluation practices: studies rely on oversimplified synthetic data generators, adopt incompatible metrics, and lack transparency in hyperparameter selection, making fair comparisons difficult.
We address this gap with a novel benchmarking framework comprising three contributions: (1) a drift simulation method that injects controlled distributional changes into real-world datasets via Monte Carlo trials, enabling supervised evaluation while preserving real-world data complexity; (2) an evaluation protocol for drift detection with timing-aware criteria, including the derivation of new metrics (e.g., F1 detection score, normalized detection time) that are comparable across streams; and (3) a leave-one-dataset-out hyperparameter optimization protocol for drift detection methods that promotes configuration robustness across heterogeneous stream dynamics. We benchmark 14 widely used drift detection methods on 7 real-world datasets across 4 drift types (class prior, label swap, feature permutation, feature filtering), each under both abrupt and gradual transitions. Our experimental results provide insights into the strengths and weaknesses of current drift detection approaches while establishing baseline performance metrics for future research in this area. All code and experiments are publicly available.
2026-06-05T19:02:17Z
Accepted in KDD'26
Vitor Cerqueira, Heitor Murilo Gomes, Marco Heyden, Bernhard Pfahringer, Albert Bifet

http://arxiv.org/abs/2602.18364v2
Quantum Maximum Likelihood Prediction via Hilbert Space Embeddings
2026-06-05T18:59:00Z
Maximum likelihood prediction (MLP) is a core task at the heart of modern large language models. Here, we study a quantum version of this task for a simplified data model consisting of independent and identically distributed samples, as a first step. The quantum maximum likelihood predictor is obtained by embedding empirical probability distributions into quantum states and performing a minimization of quantum relative entropy over a given class of states.
We provide an interpretation of this predictor in terms of quantum reverse information projection and quantum Pythagorean theorem when the class of quantum models is sufficiently expressive. We further derive non-asymptotic performance guarantees in terms of convergence rates and concentration inequalities, both in trace norm and quantum relative entropy. Our approach provides a unified framework to handle MLP within both classical and quantum LLMs.
2026-02-20T17:16:38Z
31+3 pages, 1 figure
Sreejith Sreekumar, Nir Weinberger

http://arxiv.org/abs/2605.01446v3
Sequential Minimal Optimization for $\varepsilon$-SVR with MAPE Loss and Sample-Dependent Box Constraints
2026-06-05T18:30:10Z
Support vector regression with Mean Absolute Percentage Error (MAPE) loss is theoretically well-motivated for forecasting applications where accuracy is evaluated in relative terms, but the sample-dependent dual box constraints it induces have not been addressed in the published SMO literature. We derive a Sequential Minimal Optimization algorithm for this setting and prove a structural-invariance result: the MAPE modification affects exactly two components of the SMO iteration -- working-set selection and analytic-update clipping -- leaving gradient bookkeeping and curvature computation identical to classical epsilon-SVR. Building on this invariance, we establish four efficiency improvements (asymmetric freeze-counters, warm-starting, block working-set updates of size four, and per-pair tolerance scaling) and resolve a previously open convergence problem for the odd-symmetry kernel variant via adaptive spectral regularization. Numerical validation against three reference solvers across eleven synthetic configurations certifies solution agreement within standard tolerance. Wall-time benchmarks show the present algorithm achieves the lowest median runtime on every tested configuration against OSQP, MOSEK, and Clarabel.
At production scale, the algorithm converges on the California Housing benchmark while the patched LIBSVM reference implementation reaches its iteration ceiling without satisfying optimality -- demonstrating the practical necessity of the theoretical efficiency mechanisms. An open-source R package and an explicit solver-adaptation recipe are provided.
2026-05-02T13:51:46Z
82 pages, 3 figures, 13 tables
Pablo Benavides-Herrera, Riemann Ruiz-Cruz, Juan Diego Sánchez-Torres

http://arxiv.org/abs/2602.02431v2
Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
2026-06-05T18:19:06Z
It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. While this phenomenon has been extensively studied in linear regression, the benefit of multi-pass gradient descent (GD, which reuses all the data) over one-pass stochastic gradient descent (online SGD, which uses each data point only once) is not well-understood in nonlinear and non-convex settings, except for a loss modification mechanism achieved by the first two passes on the data. In this work, we consider learning a $d$-dimensional single-index model with a quadratic activation, for which it is known that one-pass SGD requires $n\gtrsim d\log d$ samples to achieve weak recovery. We first show that this $\log d$ factor in the sample complexity persists for full-batch spherical GD on the correlation loss; however, by simply truncating the activation, full-batch GD exhibits a favorable optimization landscape at $n \simeq d$ samples, thereby outperforming one-pass SGD (with the same activation) in statistical efficiency.
We complement this result with a trajectory analysis of full-batch GD on the squared loss from small initialization, showing that $n \gtrsim d$ samples and $T \gtrsim\log d$ gradient steps suffice to achieve strong (exact) recovery.
2026-02-02T18:31:51Z
Accepted to ICML 2026
Filip Kovačević, Hong Chang Ji, Denny Wu, Mahdi Soltanolkotabi, Marco Mondelli

http://arxiv.org/abs/2606.07492v1
Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies
2026-06-05T17:46:36Z
The ranking of recommendation algorithms is a challenging problem since model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This drives a demand for a proper methodology for fair comparison between algorithms. Naive aggregation of performance metrics (e.g., averaging NDCG over benchmarks) can yield misleading rankings, undermining practical selection. To address this problem, we introduce a novel, data-driven ranking methodology based on the Bradley-Terry (BT) model. We demonstrate that the obtained ranking depends on key dataset statistics. Additionally, we propose a novel metric for evaluating ranking consistency and demonstrate the robustness of our ranking to incomplete data. Finally, we introduce a dataset-specific methodology for ranking algorithms on unseen datasets without running the models, relying on extensions of the Bradley-Terry framework, including BT trees and BT models with covariates.
2026-06-05T17:46:36Z
KDD'26
Ekaterina Grishina, Stepan Kuznetsov, Askar Tsyganov, Ilya Ivanov, Daria Korovaitceva, Margarita Rusanova, Uliana Parkina, Alexander Derevyagin, Evgeny Frolov, Sergey Samsonov, Anton Lysenko
10.1145/3770855.3817890

http://arxiv.org/abs/2606.07483v1
Network Recovery from Cascade Data: A Debiased Jacobian-Based Machine Learning Approach
2026-06-05T17:38:05Z
Many important outcomes unfold as dynamic cascades, including product adoption, disease spread, financial distress, and information diffusion.
A central challenge is to recover the hidden influence network behind these cascades. Existing methods typically assume a specific diffusion model, and their performance degrades substantially when that assumption is misspecified. We propose CascadeNet, a Jacobian-based machine learning framework for network recovery that does not require specifying a diffusion mechanism. The key idea is that the underlying influence structure can be characterized by the Jacobian of the one-step transition function. CascadeNet first constructs a flexible estimator of the transition function, and further applies Neyman-orthogonal debiasing via the Riesz representer, so that the debiased Jacobian is $\sqrt{n}$-consistent and asymptotically normal, enabling formal inference on the network structure. We validate CascadeNet in both a simulation exercise and a real-world empirical application. In simulations, where the data-generating process is known, CascadeNet achieves the highest network recovery accuracy across nine common data-generating processes. In an empirical application to COVID-19 transmission across Spain's 52 provinces, CascadeNet recovers transmission networks that are significantly correlated with the true inter-province mobility network, whereas networks recovered by baseline methods show no significant alignment with the ground truth.
2026-06-05T17:38:05Z
Lei Huang

http://arxiv.org/abs/2606.07457v1
Time series Foundation Models based on Physics-Informed Synthetic Histories for Cold-Start Photovoltaic Forecasting
2026-06-05T17:05:19Z
At commissioning time, Photovoltaic (PV) operators must forecast production before target-site observations are available, limiting the direct use of standard supervised forecasters. This cold-start setting is addressed with a zero-shot pipeline that generates a synthetic production history from plant metadata and meteorological covariates, enabling time-series foundation models (TSFMs) to forecast through inference-time conditioning.
Five TSFMs are benchmarked against classical baselines under strict Cold-Start Baseline, Real Feedback, and Self-Forecast Feedback strategies. The evaluation spans $440$ PV sites across four datasets and diverse climate regimes. Covariate-aware foundation models outperform baselines by approximately $1.7-2\times$: TabPFN-TS achieves the lowest error under Real Feedback (MAE $0.514$, RMSE $0.721$ $kWh$ ${kWp}^{-1}$ ${d}^{-1}$), while Chronos-2 is most robust under Self-Forecast Feedback. Performance is largely insensitive to the synthetic-history source, indicating that accuracy is driven more by the availability of plausible temporal context than by the specific generator.
2026-06-05T17:05:19Z
To be published in the 2nd ICML Workshop on Foundation Models for Structured Data
Lorenzo Longarini, Alessandro Rongoni, Simone Silenzi, Emanuele Frontoni, Riccardo Rosati