https://arxiv.org/api/J63YguUOv2+0ExeYu7gAato4uyE 2026-06-13T13:38:59Z 78354 45 15 http://arxiv.org/abs/2606.10212v2 Intrinsic Riemannian Cross-covariance for Manifold-valued Random Objects 2026-06-10T17:56:45Z Covariance estimation yields a fundamental second-order statistic underlying representation learning, dimension reduction, and dependence modeling. While covariance has been well understood in Euclidean spaces, it is ill-defined for random objects residing on nonlinear Riemannian manifolds, which increasingly arise in modern machine learning applications involving shapes, symmetric positive definite (SPD) matrices, etc. This paper introduces an intrinsic Riemannian cross-covariance for manifold-valued random objects. Our approach defines covariance and correlation by transporting local variations to a common tangent space via parallel transport, yielding a second-order descriptor that is independent of arbitrary coordinate choices. We establish that the proposed covariance inherits desirable properties of its Euclidean counterparts and characterize its asymptotic behavior. Numerical studies on spheres and SPD manifolds, together with real-data experiments on heart valve shapes in Kendall's shape space, demonstrate the effectiveness of our estimators and verify the stated properties. Our results position the Riemannian covariance as a fundamental tool for second-order learning and analysis in non-Euclidean representation spaces. 2026-06-08T22:05:16Z 31 pages, 16 figures Carlos Soto Cheng Wang Yujing Huang Xiaoyu Chen http://arxiv.org/abs/2605.04893v2 Self-Attention as Transport: Limits of Symmetric Spectral Diagnostics 2026-06-10T17:09:08Z When a language model processes a hallucinated response, its attention routing tends to fail in one of two shapes: over-concentrating on a narrow set of positions, or spreading so diffusely that relevance is diluted, and the shape of the failure carries diagnostic signal. 
We study these shapes as a diagnostic characterization, computed from attention matrices under \emph{forced scoring} of benchmark-labeled responses rather than during live generation. A widely used family of spectral methods analyzes the symmetric component of the degree-normalized attention operator, which governs transport \emph{capacity}; we prove that every transpose-invariant spectral diagnostic of this operator is structurally \emph{orientation-blind} (it cannot distinguish an operator from its transpose, and therefore cannot detect information-flow direction), with a converse to the blindness theorem bounding any Lipschitz diagnostic's transpose sensitivity by the asymmetry coefficient $G$. Pairing this with a closed-form bipartite-Cheeger landscape for canonical causal architectures, we show that uniform causal attention satisfies an $n$-independent floor $φ\ge 1/5$, while window attention pierces the floor as $O(w/n)$; failure modes are shape-different, not just value-different. This floor is an idealized-architecture benchmark, not an empirical attractor: the fraction of real attention heads that pierce it is itself an architectural signature. The resulting two-axis diagnostic ($φ$ for capacity, $G$ for direction) yields a falsifiable polarity prediction: bottleneck- and diffuse-dominated benchmarks should exhibit opposite polarity. Under length-controlled evaluation, transport features retain interpretable signal (0.62-0.84 LC-AUROC) across the tested decoder-only, encoder-only, and encoder-decoder models, with polarity reversing as predicted between HaluEval and MedHallu. 
2026-05-06T13:25:13Z 48 pages, 6 figures, 7 tables; 81-page online supplement (proofs, additional experiments, dataset statistics) as an ancillary file Dominik Dahlem Diego Maniloff Mac Misiura http://arxiv.org/abs/2605.27478v3 Triangular-Reference Schrödinger Bridges for Time Series Generation 2026-06-10T16:05:43Z Schrödinger bridges for time series (SBTS) generate synthetic paths by projecting, in relative entropy, a Brownian reference onto the path laws that match the joint distribution of the data on the observation grid. The Brownian reference, however, fixes the quadratic variation of the generated paths, which is restrictive when stochastic volatility, correlated noise, or rank-deficient covariance structures must be reproduced. We introduce "Triangular-Reference Schrödinger Bridges for Time Series" (TR-SBTS), which keeps the entropy-projection backbone of SBTS but replaces the Brownian reference by a triangular, volatility-informed, intervalwise frozen reference on a state augmented with latent covariance descriptors. The construction remains a single entropy projection on the augmented state: the minimiser is the \(h\)-transform of the reference, and on each frozen interval the optimal drift has the logarithmic-gradient form \(b^\star(t,x)=A\,\nabla\log H(t,x)\), intrinsic to the active covariance directions when the frozen covariance \(A\) is degenerate. We prove stability of the frozen approximation and consistency of the associated regularised kernel estimators, describe a reference-aware Nadaraya--Watson implementation of the conditional next-increment law, and evaluate the construction on numerical experiments. 2026-05-26T12:05:11Z Gabriele Bocchi http://arxiv.org/abs/2606.12260v1 Market Design for AI: Beyond the Copyright Binary 2026-06-10T16:04:08Z How can we design a market of human-generated content for use in training AI models that both enables technological progress and preserves individual incentives for high-quality content creation? 
Existing approaches take polar positions: a "free-for-all" model based on fair use and a "strong intellectual property rights" model. We show that both fail: Free-for-all does not compensate creators, and -- by modeling as a static Stackelberg game -- strong intellectual property rights also underpower creative incentives. We find this especially true for more innovative creators, a phenomenon we term the "originality penalty." Extending this insight to a dynamic model, we find another market failure undermining AI model performance, even for an initially good model: Such a model induces greater reliance by humans on AI-assisted creation, resulting in homogenized content feeding back into training, which degrades model performance -- a "curse of precision." We further propose a market design with a data intermediary internalizing cross-creator externalities and subsidizing innovative contributions, thereby restoring efficiency. 2026-06-10T16:04:08Z Yan Dai Maryam Farboodi Negin Golrezaei Sepehr Shahshahani http://arxiv.org/abs/2601.14031v2 Intermittent time series forecasting: local vs global models 2026-06-10T15:11:19Z Forecasting intermittent time series, which contain zeros, is a crucial challenge in supply chains as inventory policies require probabilistic forecasts to establish safety levels. Intermittent time series are commonly forecast using local models, trained individually on each time series. In recent years, global models, trained on a large collection of time series, have become popular for time series forecasting. Global models are often based on neural networks or gradient boosted trees. We carry out the first study comparing state-of-the-art probabilistic local and global models on intermittent time series. For global models we consider three different distribution heads suitable for intermittent time series: negative binomial, hurdle-shifted negative binomial and Tweedie.
To the best of our knowledge, this is the first use of the latter two with neural networks. We perform experiments on five datasets comprising overall more than 40'000 real-world time series. Among global models, TiDE, a simple neural network architecture, achieves the best accuracy; it also consistently outperforms local models and has lower computational requirements. Large global models are instead much more computationally demanding and less accurate. Among the distribution heads, the Tweedie provides the best estimates of the highest quantiles. 2026-01-20T14:53:24Z Submitted to the Journal of the Operational Research Society Stefano Damato Nicolò Rubattu Dario Azzimonti Giorgio Corani http://arxiv.org/abs/2602.10908v2 SoftMatcha 2: A Fast and Soft Pattern Matcher for Trillion-Scale Corpora 2026-06-10T14:49:59Z We present SoftMatcha 2, an ultra-fast and flexible search algorithm that enables search over trillion-scale natural language corpora in under 0.3 seconds while allowing semantic variations in the form of substitution, insertion, and deletion. Our approach employs string matching based on suffix arrays that scales well with corpus size, and represents words as vectors, which underpin its semantic flexibility. To mitigate the combinatorial explosion induced by the semantic relaxation of queries, our method is built on two key algorithmic ideas: dynamic corpus-aware pruning and fast exact lookup enabled by a disk-aware design. We theoretically analyze the efficiency of the proposed method, indicating that it can mitigate exponential growth in the search space. Empirically, on FineWeb-Edu (Lozhkov et al., 2024) (1.4T tokens), it attains substantially lower search latency than existing methods: infini-gram (Liu et al., 2024), infini-gram mini (Xu et al., 2025), and SoftMatcha (Deguchi et al., 2025). 
As a practical application, our method uncovers benchmark contamination in training corpora that existing approaches miss, and it also benefits information retrieval and paraphrase detection. We also provide an online demo of fast, soft search across corpora in seven languages. 2026-02-11T14:40:15Z Accepted at ICML2026. Project Page & Web Interface: https://softmatcha.github.io/v2/, Source Code: https://github.com/softmatcha/softmatcha2 Masataka Yoneda Yusuke Matsushita Go Kamoda Kohei Suenaga Takuya Akiba Masaki Waga Sho Yokoi http://arxiv.org/abs/2408.07498v5 Wasserstein Gradient Flows of MMD Functionals with Distance Kernel and Cauchy Problems on Quantile Functions 2026-06-10T14:06:36Z We give a comprehensive description of Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals $\mathcal F_ν:= \text{MMD}_K^2(\cdot, ν)$ towards given target measures $ν$ on the real line, where we focus on the negative distance kernel $K(x,y) := -|x-y|$. In one dimension, the Wasserstein-2 space can be isometrically embedded into the cone $\mathcal C(0,1) \subset L_2(0,1)$ of quantile functions leading to a characterization of Wasserstein gradient flows via the solution of an associated Cauchy problem on $L_2(0,1)$. Based on the construction of an appropriate counterpart of $\mathcal F_ν$ on $L_2(0,1)$ and its subdifferential, we provide a solution of the Cauchy problem. For discrete target measures $ν$, this results in a piecewise linear solution formula. We prove invariance and smoothing properties of the flow on subsets of $\mathcal C(0,1)$. For certain $\mathcal F_ν$-flows this implies that initial point measures instantly become absolutely continuous, and stay so over time. Finally, we illustrate the behavior of the flow by various numerical examples using an implicit Euler scheme, which is easily computable by a bisection algorithm. For continuous targets $ν$, also the explicit Euler scheme can be employed, although with limited convergence guarantees. 
2024-08-14T12:28:21Z We corrected the implicit Euler scheme in our code and updated the plots. Also, a minor mistake in the def. (14) and an error in the proof of Thm. 3.5 have been corrected. We thank the anonymous contributors for their valuable feedback, further improving the clarity of the paper. 48 pages, 23 figures, comments welcome! Richard Duong Viktor Stein Robert Beinert Johannes Hertrich Gabriele Steidl http://arxiv.org/abs/2506.00330v3 Accurate Estimation of Mutual Information in High Dimensional Data 2026-06-10T13:50:48Z Mutual information (MI) quantifies statistical dependence between variables and is widely used across scientific disciplines, yet accurate estimation from finite data remains notoriously difficult. Common approaches fail in high-dimensional, undersampled regimes ($N \lesssim K$) typical of modern experiments, and no accepted tests exist to detect when neural network-based estimators fail, making them effectively unusable as scientific instruments. We show that neural MI estimators can be made reliable when the statistical dependencies admit a low-dimensional latent representation. Sample complexity is then governed by the latent dimensionality $K_Z \ll K$ rather than the ambient dimension -- a regime shift we confirm empirically and ground theoretically via random matrix theory. Building on this insight, we develop a practical protocol that provides neural estimators with explicit statistical consistency checks, bias correction, and confidence intervals. We additionally introduce a new class of probabilistic critics (the VSIB family) that substantially reduce bias and variance at higher MI values where standard estimators break down. We validate the protocol on synthetic benchmarks ($K=500$, $N$ as low as $256$), on the standard 40-dataset benchmark suite of Czyz et al. (2023), on noisy MNIST ($K=784$), and on CIFAR-10/100 ($K=3072$) with a ResNet-20 backbone. 
Our protocol consistently matches or exceeds existing methods while being the only approach to report confidence intervals and flag unreliable estimates, achieving reliable MI detection well below the ambient pixel dimension on real images. 2025-05-31T01:06:18Z 15 pages main text, 21 pages SI, 12 Figs overall Eslam Abdelaleem K. Michael Martini Ilya Nemenman http://arxiv.org/abs/2603.12901v2 A theory of learning data statistics in diffusion models, from easy to hard 2026-06-10T13:28:42Z While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity. 
2026-03-13T11:07:01Z ICML 2026 Lorenzo Bardone Claudia Merger Sebastian Goldt http://arxiv.org/abs/2606.12058v1 Phase Transitions in Attention: A Bayesian Theory of Copy Head Emergence 2026-06-10T13:26:56Z Attention is the key mechanism underlying in-context learning in transformers, and attention patterns have been observed empirically to emerge abruptly during training. We present a Bayesian theory of feature learning in attention; we then focus on how the copy subcircuit in the first layer of an induction head is learned by analyzing a single-layer softmax attention network trained on a copy task. We derive a closed-form posterior over the attention matrix and reduce it to a low-dimensional order parameter space. This reduction reveals a phase transition in the amount of training data, which we verify using both Bayesian sampling and standard training with Adam. We contrast our results with linear attention and find that softmax attention exhibits a \emph{first-order phase transition} while in linear attention an initial \emph{second-order phase transition} is followed by a smooth, continuous evolution toward the structured attention pattern (\emph{crossover}). Our work provides a first-principles theoretical account of the abrupt emergence of the copy subcircuit, reminiscent of the one observed in training large language models. 2026-06-10T13:26:56Z Itay Lavie Kirsten Fischer Andrey Lekov Frederic Van Maele Zohar Ringel Moritz Helias http://arxiv.org/abs/2505.00571v3 Discovery and inference beyond linearity for epidemiological data by integrating Bayesian regression, tree ensembles and Shapley values 2026-06-10T13:21:06Z Machine Learning (ML) is gaining popularity in epidemiology and healthcare studies for hypothesis-free discovery of risk and protective factors. ML is strong at discovering nonlinearities and interactions, but this power is compromised by a lack of reliable inference. 
Although Shapley values provide local measures of features' effects, valid uncertainty quantification for these effects is typically lacking, thus precluding statistical inference. We propose RuleSHAP, a framework that addresses this limitation by combining a dedicated Bayesian sparse regression model with an improved tree-based rule generator and Shapley value attribution. RuleSHAP provides detection of nonlinear and interaction effects, with uncertainty quantification at the individual level as a key contribution. We derive an efficient formula for computing marginal Shapley values within this framework. We apply RuleSHAP to data from an epidemiological cohort to detect and infer several effects for high cholesterol and blood pressure, such as nonlinear interaction effects between features like age, sex, ethnicity, BMI and glucose level. To conclude, we demonstrate the validity of our framework on simulated data. 2025-05-01T14:55:22Z Giorgio Spadaccini Marjolein Fokkema Mark A. van de Wiel http://arxiv.org/abs/2606.12047v1 Metadata-Aware Multi-Prompt Reasoning for Zero-Shot Accident Understanding 2026-06-10T13:12:40Z In this paper, we address the problem of zero-shot understanding of accidents from surveillance videos by identifying when an impact event occurs, what type of impact it is, and where in the frame it occurs using natural language. We propose a three-stage pipeline that decomposes the accident understanding into when, what, and where. The first stage extracts a short temporal window around the impact using vision-language similarity. In the second stage, we perform metadata-driven multi-prompt reasoning with five complementary views (baseline, motion, geometry, contrast, and tiebreaker) and resolve disagreement via an entropy-gated pairwise adjudicator. Finally, we localize the impact with an open-vocabulary detector queried on the predicted accident type and scene layout, and aggregate detections across keyframes using a score-weighted centroid.
Our pipeline achieves a substantial improvement in the harmonic-mean score over a centre-of-frame baseline on the zero-shot ACCIDENT @ CVPR benchmark. We show that decomposing zero-shot video understanding into temporal localization, semantic classification, and spatial grounding enables more reliable reasoning with vision-language models than direct prompting alone. 2026-06-10T13:12:40Z Accepted at the AUTOPILOT Workshop, CVPR 2026 (non-archival). Workshop Paper ID 15 Tarandeep Singh Soumyanetra Pal Soham Biswas Nishanth Chandran http://arxiv.org/abs/2508.17077v3 CP4SBI: Local Conformal Calibration of Credible Sets in Simulation-Based Inference 2026-06-10T13:08:19Z Experimental scientists increasingly rely on simulation-based inference (SBI) to invert complex non-linear models with intractable likelihoods. However, posterior approximations obtained with SBI are often miscalibrated, causing credible regions to undercover true parameters. We develop $\texttt{CP4SBI}$, a model-agnostic conformal calibration framework that constructs credible sets with local Bayesian coverage. Our two proposed variants, namely local calibration via regression trees and CDF-based calibration, enable finite-sample local coverage guarantees for any scoring function, including HPD, symmetric, and quantile-based regions. Experiments on widely used SBI benchmarks demonstrate that our approach improves the quality of uncertainty quantification for neural posterior estimators using both normalizing flows and score-diffusion modeling. 2025-08-23T16:13:10Z Luben M. C. Cabezas Vagner S. Santos Thiago R. Ramos Pedro L. C. Rodrigues Rafael Izbicki http://arxiv.org/abs/2603.08558v3 Impact of Connectivity on Laplacian Representations in Reinforcement Learning 2026-06-10T12:46:36Z Learning compact state representations in Markov Decision Processes (MDPs) has proven crucial for addressing the curse of dimensionality in large-scale reinforcement learning (RL) problems.
Existing principled approaches leverage structural priors on the MDP by constructing state representations as linear combinations of the state-graph Laplacian eigenvectors. When the transition graph is unknown or the state space is prohibitively large, the graph spectral features can be estimated directly via sample trajectories. In this work, we prove an upper bound on the approximation error of linear value function approximation under the learned spectral features. We show how this error scales with the algebraic connectivity of the state-graph, grounding the approximation quality in the topological structure of the MDP. We further bound the error introduced by the eigenvector estimation itself, leading to an end-to-end error decomposition across the representation learning pipeline. Additionally, our expression of the Laplacian operator for the RL setting, although equivalent to existing ones, prevents some common misunderstandings, of which we show some examples from the literature. Our results hold for general (non-uniform) policies without any assumptions on the symmetry of the induced transition kernel. We validate our theoretical findings with numerical simulations on gridworld environments. 2026-03-09T16:20:31Z Tommaso Giorgi Pierriccardo Olivieri Keyue Jiang Laura Toni Matteo Papini http://arxiv.org/abs/2606.11988v1 What Uncertainties Do We Need for Dynamical Systems? 2026-06-10T12:12:12Z The distinction between aleatoric and epistemic uncertainty has received considerable attention in machine learning research, mainly in the context of supervised learning but also in other settings such as generative modeling. In this paper, we offer a machine learning perspective on uncertainty modeling for dynamical systems, which has been studied much less so far. In particular, we ask: what uncertainties do we need for dynamical systems? 
We discuss sources of uncertainty, clarify their nature (aleatoric or epistemic), and consider how the objectives of representing and quantifying uncertainty vary across different tasks. 2026-06-10T12:12:12Z EIML@ICML Yusuf Sale Christopher Bülte Felix Czaja Joshua Stiller Eyke Hüllermeier