https://arxiv.org/api/+EY1nB2U+1oaNiaGavKiF3ItyQU 2026-03-26T08:31:30Z 76543 30 15

http://arxiv.org/abs/2603.23783v1
Probabilistic Geometric Alignment via Bayesian Latent Transport for Domain-Adaptive Foundation Models
Updated: 2026-03-24T23:35:08Z
Adapting large-scale foundation models to new domains with limited supervision remains a fundamental challenge due to latent distribution mismatch, unstable optimization dynamics, and miscalibrated uncertainty propagation. This paper introduces an uncertainty-aware probabilistic latent transport framework that formulates domain adaptation as a stochastic geometric alignment problem in representation space. A Bayesian transport operator is proposed to redistribute latent probability mass along Wasserstein-type geodesic trajectories, while a PAC-Bayesian regularization mechanism constrains posterior model complexity to mitigate catastrophic overfitting. The proposed formulation yields theoretical guarantees on convergence stability, loss landscape smoothness, and sample efficiency under distributional shift. Empirical analyses demonstrate substantial reduction in latent manifold discrepancy, accelerated transport energy decay, and improved covariance calibration compared with deterministic fine-tuning and adversarial domain adaptation baselines. Furthermore, bounded posterior uncertainty evolution indicates enhanced probabilistic reliability during cross-domain transfer. By establishing a principled connection between stochastic optimal transport geometry and statistical generalization theory, the proposed framework provides new insights into robust adaptation of modern foundation architectures operating in heterogeneous environments. These findings suggest that uncertainty-aware probabilistic alignment constitutes a promising paradigm for reliable transfer learning in next-generation deep representation systems.
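In one dimension, moving probability mass along a Wasserstein-2 geodesic has a simple closed form. Sort both samples and pair them in order, which is the optimal coupling for convex costs on the line. Then move each pair along a straight line; this is McCann's displacement interpolation. The sketch below illustrates only that geometric idea on hypothetical Gaussian "latent codes". It is not the paper's Bayesian transport operator, and the helper names are our own.

```python
import numpy as np

def w2_distance_1d(a, b):
    """Wasserstein-2 distance between two equal-size 1D empirical measures."""
    a, b = np.sort(a), np.sort(b)          # sorted (monotone) matching is optimal in 1D
    return float(np.sqrt(np.mean((a - b) ** 2)))

def displacement_interpolate(source, target, t):
    """Point at time t in [0, 1] on the W2 geodesic from source to target."""
    s, g = np.sort(source), np.sort(target)
    return (1.0 - t) * s + t * g           # each matched atom moves on a straight line

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, 5000)        # hypothetical source-domain latent codes
target = rng.normal(3.0, 0.5, 5000)        # hypothetical target-domain latent codes

total = w2_distance_1d(source, target)     # close to sqrt(3**2 + 0.5**2) for these Gaussians
for t in (0.0, 0.25, 0.5, 1.0):
    remaining = w2_distance_1d(displacement_interpolate(source, target, t), target)
    # On a geodesic the remaining distance is exactly (1 - t) times the total.
    print(t, round(remaining / total, 3))
```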
Published: 2026-03-24T23:35:08Z
Comment: 11 pages, 8 Figures, 25 Equations, 5 Tables and 3 Theorems
Authors: Kuepon Aueawatthanaphisut

http://arxiv.org/abs/2602.12023v2
Decomposition of Spillover Effects Under Misspecification: Pseudo-true Estimands and a Local-Global Extension
Updated: 2026-03-24T22:45:29Z
Applied work under interference typically models outcomes as functions of own treatment and a low-dimensional exposure mapping of others' treatments, even when that mapping may be misspecified. We ask what policy object such exposure-based procedures target. Taking the marginal policy effect as primitive, we show that any researcher-chosen exposure mapping induces a unique pseudo-true outcome model: the best approximation to the underlying potential outcomes within the class of functions that depend only on that mapping. This yields a decomposition of the marginal policy effect into exposure-based direct and spillover effects, and each component optimally approximates its oracle counterpart, with a sign-preserving interpretation under monotonicity. We then study a structured misspecification setting in which outcomes depend on both network spillovers and a global equilibrium channel, while the analyst may model only one. In this setting, we obtain a sharper asymptotic decomposition into direct, local, and global components, implying that existing estimators recover their respective oracle channel-specific effects even when the other channel is present but omitted from the maintained model. The analysis also yields phase transitions in convergence rates and higher-order expansions for Z-estimators. A semi-synthetic experiment calibrated to a large cash-transfer study illustrates the empirical relevance of the framework.
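To see what a pseudo-true outcome model can look like, the sketch below reads "best approximation" as the L2 projection under the randomization distribution. That reading is our assumption. Under it, the pseudo-true model is the conditional mean of outcomes given own treatment and the chosen exposure. The ring network, the outcome model and the exposure mapping (share of treated among two neighbours) are all hypothetical. By construction only the first neighbour truly matters, so the mapping is misspecified.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
d = rng.integers(0, 2, n)           # own treatment, i.i.d. Bernoulli(1/2)
first = np.roll(d, -1)              # treatment of neighbour i+1 on a ring
second = np.roll(d, -2)             # treatment of neighbour i+2 on a ring

# Hypothetical true outcomes: own effect 1, spillover only from the first neighbour.
y = 1.0 * d + 2.0 * first

# Analyst's misspecified exposure mapping: share of treated neighbours.
exposure = (first + second) / 2.0

# Pseudo-true model h(d, e) = E[Y | D = d, exposure = e], estimated by cell means.
h = {}
for di in (0, 1):
    for ei in (0.0, 0.5, 1.0):
        cell = (d == di) & (exposure == ei)
        h[(di, ei)] = y[cell].mean()

direct = h[(1, 0.5)] - h[(0, 0.5)]       # exposure-based direct effect at e = 0.5
spillover = h[(0, 1.0)] - h[(0, 0.0)]    # exposure-based spillover, e from 0 to 1
print(round(direct, 3), round(spillover, 3))
```

In this toy design the projection works out to h(d, e) = d + 2e. It spreads the first neighbour's effect over the exposure share, yet the spillover contrast from e = 0 to e = 1 still equals the oracle effect of treating both neighbours.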
Published: 2026-02-12T14:54:28Z
Authors: Yechan Park, Xiaodong Yang

http://arxiv.org/abs/2512.09295v3
Distributional Shrinkage II: Higher-Order Scores Encode Brenier Map
Updated: 2026-03-24T22:31:40Z
Consider the additive Gaussian model $Y = X + σZ$, where $X \sim P$ is an unknown signal, $Z \sim N(0,1)$ is independent of $X$, and $σ > 0$ is known. Let $Q$ denote the law of $Y$. We construct a hierarchy of denoisers $T_0, T_1, \ldots, T_\infty \colon \mathbb{R} \to \mathbb{R}$ that depend only on higher-order score functions $q^{(m)}/q$, $m \geq 1$, of $Q$ and require no knowledge of the law $P$. The $K$-th order denoiser $T_K$ involves scores up to order $2K{-}1$ and satisfies $W_r(T_K \sharp Q, P) = O(σ^{2(K+1)})$ for every $r \geq 1$; in the limit, $T_\infty$ recovers the monotone optimal transport map (Brenier map) pushing $Q$ onto $P$. We provide a complete characterization of the combinatorial structure governing this hierarchy through partial Bell polynomial recursions, making precise how higher-order score functions encode the Brenier map. We further establish rates of convergence for estimating these scores from $n$ i.i.d.\ draws from $Q$ under two complementary strategies: (i) plug-in kernel density estimation, and (ii) higher-order score matching. The construction reveals a precise interplay among higher-order Fisher-type information, optimal transport, and the combinatorics of integer partitions.
Published: 2025-12-10T03:41:06Z
Comment: 25 pages
Authors: Tengyuan Liang

http://arxiv.org/abs/2511.09500v4
Distributional Shrinkage I: Universal Denoiser Beyond Tweedie's Formula
Updated: 2026-03-24T22:28:38Z
We study the problem of denoising when only the noise level is known, not the noise distribution. Independent noise $Z$ corrupts a signal $X$, yielding the observation $Y = X + σZ$ with known $σ \in (0,1)$. We propose \emph{universal} denoisers, agnostic to both signal and noise distributions, that recover the signal distribution $P_X$ from $P_Y$.
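A Gaussian signal gives a closed-form check of the accuracy orders quoted in the two Distributional Shrinkage abstracts.

Assume, purely for illustration, $P = N(0, s^2)$, so that $Q = N(0, v)$ with $v = s^2 + σ^2$ and $\nabla \log q(y) = -y/v$. Every map is then linear. The Brenier map is $y \mapsto \sqrt{s^2/v}\, y$. Substituting the Gaussian score into the Bayes-optimal denoiser $\mathbf{T}^*$ and the denoisers $\mathbf{T}_1$ and $\mathbf{T}_2$ of Distributional Shrinkage I gives slopes $1 - x$, $1 - x/2$ and $1 - x/2 - x^2/8$, where $x = σ^2/v$. These are successive truncations of the expansion of $\sqrt{1 - x}$.

For centered Gaussians and a map $y \mapsto a y$, $W_2(T \sharp Q, P) = |a\sqrt{v} - s|$, so each error can be evaluated exactly. The function name and the choice $s = 1$ are ours.

```python
import numpy as np

def w2_errors(sigma, s=1.0):
    """W2(T # Q, P) for T*, T1, T2 when P = N(0, s^2) and Q = N(0, s^2 + sigma^2)."""
    v = s**2 + sigma**2
    x = sigma**2 / v
    slopes = {
        "T_star": 1.0 - x,                 # Tweedie: y + sigma^2 * score
        "T1": 1.0 - x / 2.0,               # y + (sigma^2 / 2) * score
        "T2": 1.0 - x / 2.0 - x**2 / 8.0,  # adds the sigma^4 / 8 correction term
    }
    # Each map is y -> a*y, so T # Q = N(0, a^2 v); its W2 distance to N(0, s^2) is |a*sqrt(v) - s|.
    return {name: abs(a * np.sqrt(v) - s) for name, a in slopes.items()}

for sigma in (0.2, 0.1, 0.05):
    print(sigma, {name: f"{err:.2e}" for name, err in w2_errors(sigma).items()})
```

Halving σ should shrink the three errors by factors of roughly 4, 16 and 64, consistent with $O(σ^2)$, $O(σ^4)$ and $O(σ^6)$ accuracy.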
When the focus is on distributional recovery of $P_X$ rather than on individual realizations of $X$, our denoisers achieve order-of-magnitude improvements over the Bayes-optimal denoiser derived from Tweedie's formula, which achieves $O(σ^2)$ accuracy. They shrink $P_Y$ toward $P_X$ with $O(σ^4)$ and $O(σ^6)$ accuracy in matching generalized moments and densities. Drawing on optimal transport theory, our denoisers approximate the Monge--Ampère equation with higher-order accuracy and can be implemented efficiently via score matching. Let $q$ denote the density of $P_Y$. For distributional denoising, we propose replacing the Bayes-optimal denoiser, $$\mathbf{T}^*(y) = y + σ^2 \nabla \log q(y),$$ with denoisers exhibiting less-aggressive distributional shrinkage, $$\mathbf{T}_1(y) = y + \frac{σ^2}{2} \nabla \log q(y),$$ $$\mathbf{T}_2(y) = y + \frac{σ^2}{2} \nabla \log q(y) - \frac{σ^4}{8} \nabla \!\left( \frac{1}{2} \| \nabla \log q(y) \|^2 + \nabla \cdot \nabla \log q(y) \right)\!.$$
Published: 2025-11-12T17:20:42Z
Comment: 27 pages, 5 figures
Authors: Tengyuan Liang

http://arxiv.org/abs/2510.00479v2
On the joint estimation of flow fields and particle properties from Lagrangian data
Updated: 2026-03-24T22:16:11Z
We numerically investigate the feasibility and limits of jointly estimating flow fields and unknown particle properties (e.g., position, size, and density) from Lagrangian particle tracking (LPT) data. LPT offers time-resolved, volumetric measurements of particle trajectories, which are markers of the carrier fluid motion. However, experimental tracks are spatially sparse and potentially noisy, and the problem of reconstructing flow fields may be further complicated by inertial particle transport, such that particle slip velocities must be determined to access the velocity field of the carrier fluid.
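The role of particle slip can be seen in the simplest point-particle model, Stokes drag: $dv/dt = (u - v)/τ_p$ with response time $τ_p = ρ_p d^2/(18 μ)$. This model, the uniform oscillating carrier flow $u(t) = \sin t$ and all parameter values below are illustrative assumptions. They are not the governing equations used in the paper, which may include further forces.

The sketch integrates the particle velocity and reports the RMS slip $u - v$ against its linear-response value $St/\sqrt{2(1 + St^2)}$. The slip grows with the Stokes number $St = τ_p/τ_f$. When tracks come from inertial particles, recovering the carrier velocity therefore requires knowing $τ_p$, and hence the particle diameter and density.

```python
import numpy as np

def rms_slip(stokes, n_periods=20, steps_per_period=2000):
    """RMS slip u - v of a Stokes-drag particle in the uniform oscillating flow u(t) = sin(t).

    The flow time scale is tau_f = 1, so the particle response time tau_p equals St.
    """
    tau_p = stokes
    dt = 2.0 * np.pi / steps_per_period
    t = np.arange(n_periods * steps_per_period) * dt
    u = np.sin(t)
    u_mid = np.sin(t[:-1] + 0.5 * dt)       # carrier velocity at step midpoints
    decay = np.exp(-dt / tau_p)
    v = np.zeros_like(u)                    # particle starts at rest
    for n in range(len(t) - 1):
        # Exact drag relaxation toward u, with u held at its midpoint value over the step.
        v[n + 1] = u_mid[n] + (v[n] - u_mid[n]) * decay
    half = len(t) // 2                      # discard the start-up transient
    return float(np.sqrt(np.mean((u[half:] - v[half:]) ** 2)))

for st in (0.01, 0.1, 1.0, 5.0):
    theory = st / np.sqrt(2.0 * (1.0 + st**2))   # linear-response RMS slip for this flow
    print(st, round(rms_slip(st), 4), round(float(theory), 4))
```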
To address this problem, we develop a data assimilation framework that couples an Eulerian representation of the flow with Lagrangian particle models, enabling the simultaneous inference of carrier fields and particle properties under the governing equations of disperse multiphase flow. We show that flow fields and particle properties can be jointly estimated in three representative regimes: (1) in a turbulent boundary layer with noisy tracer tracks (St → 0), flow fields and true particle positions are jointly estimated, which amounts to a physics-informed particle tracking problem; (2) in homogeneous isotropic turbulence seeded with inertial particles (St ~ 1-5), we demonstrate simultaneous recovery of flow states and particle diameters, showing the feasibility of implicit particle characterization; and (3) in a compressible, shock-dominated flow, we report the first joint reconstructions of velocity, pressure, density, and inertial particle properties (diameter and density), highlighting both the potential and certain limits of joint estimation in supersonic regimes. A systematic sensitivity study reveals how the seeding density, noise level, and Stokes number govern reconstruction accuracy for our method.
Published: 2025-10-01T04:00:52Z
Authors: Ke Zhou, Samuel J. Grauer

http://arxiv.org/abs/2603.23736v1
Wasserstein Parallel Transport for Predicting the Dynamics of Statistical Systems
Updated: 2026-03-24T21:45:37Z
Many scientific systems, such as cellular populations or economic cohorts, are naturally described by probability distributions that evolve over time. Predicting how such a system would have evolved under different forces or initial conditions is fundamental to causal inference, domain adaptation, and counterfactual prediction. However, the space of distributions often lacks the vector space structure on which classical methods rely. To address this, we introduce a general notion of parallel dynamics at a distributional level.
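In one dimension, the Wasserstein space is flat when distributions are represented by their quantile functions. Parallel transport along geodesics then reduces to carrying the change in quantile functions across unchanged.

The sketch below uses only that special case to build a counterfactual: the control group's quantile shift is added to the treated group's baseline quantiles. It assumes the result stays nondecreasing, i.e. remains a valid quantile function. This is not the paper's fanning scheme, which addresses the general curved setting, and all distributions are hypothetical. The last line compares the result with the classic parallel-trends prediction for averages.

```python
import numpy as np

levels = (np.arange(100) + 0.5) / 100          # quantile levels (midpoint grid)

def quantile_function(samples):
    """Empirical quantile function evaluated on the fixed grid of levels."""
    return np.quantile(samples, levels)

rng = np.random.default_rng(2)
control_before = rng.normal(0.0, 1.0, 20_000)  # hypothetical control population, time 0
control_after = rng.normal(1.0, 1.5, 20_000)   # control population, time 1
treated_before = rng.gamma(2.0, 1.0, 20_000)   # differently shaped treated baseline

# Control dynamics as a tangent vector in quantile coordinates (the 1D OT displacement).
control_change = quantile_function(control_after) - quantile_function(control_before)

# Quantile coordinates are flat in 1D, so parallel transport leaves the vector unchanged:
# add the control displacement to the treated baseline, level by level.
counterfactual = quantile_function(treated_before) + control_change
assert np.all(np.diff(counterfactual) >= 0), "counterfactual must remain a valid quantile function"

# Averages: the construction reduces to the classic parallel-trends prediction.
classic = treated_before.mean() + control_after.mean() - control_before.mean()
print(round(counterfactual.mean(), 3), round(classic, 3))
```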
We base this principle on parallel transport of tangent dynamics along optimal transport geodesics and call it ``Wasserstein Parallel Trends''. By replacing the vector subtraction of classic methods with geodesic parallel transport, we can provide counterfactual comparisons of distributional dynamics in applications such as causal inference, domain adaptation, and batch-effect correction in experimental settings. The main mathematical contribution is a novel notion of fanning scheme on the Wasserstein manifold that allows us to efficiently approximate parallel transport along geodesics while also providing the first theoretical guarantees for parallel transport in the Wasserstein space. We also show that Wasserstein Parallel Trends recovers the classic parallel trends assumption for averages as a special case and derive closed-form parallel transport for Gaussian measures. We deploy the method on synthetic data and two single-cell RNA sequencing datasets to impute gene-expression dynamics across biological systems.
Published: 2026-03-24T21:45:37Z
Authors: Tristan Luca Saidi, Gonzalo Mena, Larry Wasserman, Florian Gunsilius

http://arxiv.org/abs/2603.23688v1
Adaptive Gaussian Process Search for Simulation-Based Sample Size Estimation in Clinical Prediction Models: Validation of the pmsims R Package
Updated: 2026-03-24T19:53:37Z
Background: Determining an adequate sample size is essential for developing reliable and generalisable clinical prediction models, yet practical guidance on selecting appropriate methods remains limited. Existing analytical and simulation-based approaches often rely on restrictive assumptions and focus on mean-based criteria. We present and validate pmsims, an R package that uses Gaussian process surrogate modelling to provide a flexible and computationally efficient simulation-based framework for sample size determination across diverse prediction settings. Methods: We conducted a comprehensive simulation study with two aims.
First, we compared three search engines implemented in pmsims: a Gaussian process-based adaptive method, a deterministic bisection method, and a hybrid approach, across binary, continuous, and survival outcomes. Second, we benchmarked the best-performing pmsims engine against existing analytical (pmsampsize) and simulation-based (samplesizedev) methods, evaluating recommended sample sizes, computational time, and achieved performance on large independent validation datasets. Results: The Gaussian process-based method consistently produced the most stable sample size estimates, particularly in low-signal, high-dimensional settings. In benchmarking, pmsims achieved performance close to prespecified targets across all outcome types, matching simulation-based approaches and outperforming analytical methods in more challenging scenarios. Conclusions: pmsims provides an efficient and flexible framework for principled sample size planning in clinical prediction modelling, requiring fewer model evaluations than non-adaptive simulation approaches.
Published: 2026-03-24T19:53:37Z
Comment: 27 pages, 2 main-text figures, 16 supplementary figures, 9 tables, preprint
Authors: Oyebayo Ridwan Olaniran, Diana Shamsutdinova, Sarah Markham, Felix Zimmer, Daniel Stahl, Gordon Forbes, Ewan Carr

http://arxiv.org/abs/2509.24140v2
A signal separation view of classification
Updated: 2026-03-24T19:36:27Z
The problem of classification in machine learning has often been approached in terms of function approximation. In this paper, we propose an alternative approach for classification in arbitrary compact metric spaces which, in theory, yields both the number of classes and a perfect classification using a minimal number of queried labels. Our approach uses localized trigonometric polynomial kernels initially developed for the point source signal separation problem in signal processing. Rather than point sources, we argue that the various classes come from different probability measures.
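The sketch below gives a rough illustration of how a localized trigonometric polynomial kernel can reveal separate class supports. It substitutes the classical Fejér kernel for the paper's kernels, which differ and are more sharply localized; the Fejér kernel is a nonnegative trigonometric polynomial of degree N that concentrates near zero.

Points on the circle are summed through the kernel and the result is thresholded. The connected arcs above the threshold are then counted as candidate classes. The point sets, N and the 25% threshold are hypothetical choices. Unlike MASC, this ignores labels and does no hierarchical refinement for touching classes.

```python
import numpy as np

def fejer(t, degree):
    """Fejér kernel divided by its peak value, so fejer(0) == 1; t may be an array."""
    m = degree + 1
    num = np.sin(m * t / 2.0)
    den = m * np.sin(t / 2.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        val = (num / den) ** 2
    return np.where(np.isclose(np.sin(t / 2.0), 0.0), 1.0, val)   # fill the removable 0/0

def count_supports(points, degree=64, grid_size=4096, rel_threshold=0.25):
    """Count arcs of the circle where the averaged kernel response is large."""
    grid = np.linspace(0.0, 2 * np.pi, grid_size, endpoint=False)
    # Average kernel response over the sample: large only near the supports.
    field = fejer(grid[:, None] - points[None, :], degree).mean(axis=1)
    above = field >= rel_threshold * field.max()
    if above.all():
        return 1
    # Each arc starts at a rising edge; np.roll treats the grid as periodic.
    return int(np.sum(above & ~np.roll(above, 1)))

class_a = np.linspace(1.0, 1.5, 200)      # support of a hypothetical first class
class_b = np.linspace(3.0, 3.6, 200)      # support of a hypothetical second class
print(count_supports(np.concatenate([class_a, class_b])))
```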
The localized kernel technique developed for separating point sources is then shown to separate the supports of these distributions. This is done in a hierarchical manner in our MASC algorithm to accommodate touching/overlapping class boundaries. We illustrate our theory on several simulated and real-life datasets, including the Salinas and Indian Pines hyperspectral datasets and a document dataset.
Published: 2025-09-29T00:28:55Z
Authors: H. N. Mhaskar, Ryan O'Dowd

http://arxiv.org/abs/2511.18789v3
Perturbing the Derivative: Doubly Wild Refitting for Model-Free Evaluation of Opaque Machine Learning Predictors
Updated: 2026-03-24T19:34:22Z
We study the problem of excess risk evaluation for empirical risk minimization (ERM) under convex losses. We show that by leveraging the idea of wild refitting, one can upper bound the excess risk through the so-called "wild optimism," without relying on the global structure of the underlying function class but only assuming black-box access to the training algorithm and a single dataset. We begin by generating two sets of artificially modified pseudo-outcomes created by stochastically perturbing the derivatives with carefully chosen scaling. Using these pseudo-labeled datasets, we refit the black-box procedure twice to obtain two wild predictors and derive an efficient excess risk upper bound under the fixed design setting. Requiring no prior knowledge of the complexity of the underlying function class, our method is essentially model-free and holds significant promise for theoretically evaluating modern opaque deep neural networks and generative models, where traditional learning theory could be infeasible due to the extreme complexity of the hypothesis class.
Published: 2025-11-24T05:38:47Z
Authors: Haichen Hu, David Simchi-Levi

http://arxiv.org/abs/2412.07586v2
Paired Wasserstein Autoencoders for Conditional Sampling
Updated: 2026-03-24T17:53:34Z
Generative autoencoders learn compact latent representations of data distributions through jointly optimized encoder--decoder pairs.
In particular, Wasserstein autoencoders (WAEs) minimize a relaxed optimal transport (OT) objective, where similarity between distributions is measured through a cost-minimizing joint distribution (OT coupling). Beyond distribution matching, neural OT methods aim to learn mappings between two data distributions induced by an OT coupling. Building on the formulation of the WAE loss, we derive a novel loss that enables sampling from OT-type couplings via two paired WAEs with shared latent space. The resulting fully parametrized joint distribution yields (i) learned cost-optimal transport maps between the two data distributions via deterministic encoders. Under cost-consistency constraints, it further enables (ii) conditional sampling from an OT-type coupling through stochastic decoders. As a proof of concept, we use synthetic data with known and visualizable marginal and conditional distributions.
Published: 2024-12-10T15:22:26Z
Authors: Moritz Piening, Matthias Chung

http://arxiv.org/abs/2603.23398v1
Graph Energy Matching: Transport-Aligned Energy-Based Modeling for Graph Generation
Updated: 2026-03-24T16:35:25Z
Energy-based models for discrete domains, such as graphs, explicitly capture relative likelihoods, naturally enabling composable probabilistic inference tasks like conditional generation or enforcing constraints at test-time. However, discrete energy-based models typically struggle with efficient and high-quality sampling, as off-support regions often contain spurious local minima, trapping samplers and causing training instabilities. This has historically resulted in a fidelity gap relative to discrete diffusion models. We introduce Graph Energy Matching (GEM), a generative framework for graphs that closes this fidelity gap.
Motivated by the transport map optimization perspective of the Jordan-Kinderlehrer-Otto (JKO) scheme, GEM learns a permutation-invariant potential energy that simultaneously provides transport-aligned guidance from noise toward data and refines samples within regions of high data likelihood. Further, we introduce a sampling protocol that leverages an energy-based switch to seamlessly bridge (i) rapid, gradient-guided transport toward high-probability regions and (ii) a mixing regime for exploration of the learned graph distribution. On molecular graph benchmarks, GEM matches or exceeds strong discrete diffusion baselines. Beyond sample quality, explicit modeling of relative likelihood enables targeted exploration at inference time, facilitating compositional generation, property-constrained sampling, and geodesic interpolation between graphs.
Published: 2026-03-24T16:35:25Z
Authors: Michal Balcerak, Suprosana Shit, Chinmay Prabhakar, Sebastian Kaltenbach, Michael S. Albergo, Yilun Du, Bjoern Menze

http://arxiv.org/abs/2603.23397v1
Kinetic Langevin Splitting Schemes for Constrained Sampling
Updated: 2026-03-24T16:34:38Z
Constrained sampling is an important and challenging task in computational statistics, concerned with generating samples from a distribution under certain constraints. There are numerous types of algorithms aimed at this task, ranging from general Markov chain Monte Carlo to unadjusted Langevin methods. In this article, we propose a series of new sampling algorithms based on the latter, specifically on kinetic Langevin dynamics. Our algorithms are motivated by advanced numerical methods, namely splitting schemes, which include the BU and BAO families of splitting schemes. Their advantage lies in their favorable strong-order (bias) rates and computational efficiency. In particular, we provide a number of theoretical insights, including Wasserstein contraction and convergence results.
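In the BAO notation, kinetic Langevin dynamics is split into three steps: a momentum kick from the force (B), a position drift (A), and an exact Ornstein-Uhlenbeck update of the momentum (O). The sketch below shows one member of that family, BAOAB, for a standard Gaussian target restricted to $x \ge 0$.

The constraint is handled by reflecting position and momentum after each drift. This reflection is our assumption for illustration, not the constrained schemes proposed in the paper. The step size, friction and number of chains are likewise hypothetical.

```python
import numpy as np

def _reflect(x, v):
    """Enforce x >= 0 by mirroring escaped positions and flipping their momenta (in place)."""
    out = x < 0
    x[out] = -x[out]
    v[out] = -v[out]

def baoab_reflected(grad_u, x0, h=0.1, gamma=1.0, n_steps=400, seed=0):
    """Run vectorised BAOAB kinetic Langevin chains (unit mass and temperature) on x >= 0."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    v = rng.normal(size=x.shape)
    c = np.exp(-gamma * h)                     # OU momentum damping over one full step
    history = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        v -= 0.5 * h * grad_u(x)               # B: half kick from the force -grad U
        x += 0.5 * h * v                       # A: half drift
        _reflect(x, v)
        v = c * v + np.sqrt(1.0 - c**2) * rng.normal(size=x.shape)   # O: exact OU step
        x += 0.5 * h * v                       # A: half drift
        _reflect(x, v)
        v -= 0.5 * h * grad_u(x)               # B: half kick
        history[k] = x
    return history

# Target: standard Gaussian (U(x) = x**2 / 2) restricted to x >= 0, i.e. a half-normal.
samples = baoab_reflected(lambda x: x, x0=np.full(4000, 0.5))[100:]   # drop burn-in
print(round(samples.mean(), 3), round(np.sqrt(2.0 / np.pi), 3))
```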
We are able to demonstrate favorable results, such as improved complexity bounds over existing non-splitting methodologies. Our results are verified through numerical experiments on a range of models with constraints, which include a toy example and Bayesian linear regression.
Published: 2026-03-24T16:34:38Z
Comment: 35 pages
Authors: Neil K. Chada, Lu Yu

http://arxiv.org/abs/2603.16146v2
Deep Adaptive Model-Based Design of Experiments
Updated: 2026-03-24T16:20:48Z
Model-based design of experiments (MBDOE) is essential for efficient parameter estimation in nonlinear dynamical systems. However, conventional adaptive MBDOE requires costly posterior inference and design optimization between each experimental step, precluding real-time applications. We address this by combining Deep Adaptive Design (DAD), which amortizes sequential design into a neural network policy trained offline, with differentiable mechanistic models. For dynamical systems with known governing equations but uncertain parameters, we extend sequential contrastive training objectives to handle nuisance parameters and propose a transformer-based policy architecture that respects the temporal structure of dynamical systems. We demonstrate the approach on four systems of increasing complexity: a fed-batch bioreactor with Monod kinetics, a Haldane bioreactor with uncertain substrate inhibition, a two-compartment pharmacokinetic model with nuisance clearance parameters, and a DC motor for real-time deployment.
Published: 2026-03-17T05:53:09Z
Authors: Arno Strouwen, Sebastian Micluţa-Câmpeanu

http://arxiv.org/abs/2603.23374v1
Shape-Adaptive Conditional Calibration for Conformal Prediction via Minimax Optimization
Updated: 2026-03-24T16:05:43Z
Achieving valid conditional coverage in conformal prediction is challenging due to the theoretical difficulty of satisfying pointwise constraints in finite samples.
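For contrast with MOPI, the standard split-conformal recipe calibrates a single threshold on a fixed score. This guarantees marginal coverage but not conditional coverage. The sketch below makes three illustrative assumptions: synthetic heteroscedastic data, an absolute-residual score, and the true mean function used as the "fitted" predictor. Marginal coverage lands near the nominal 90%, while coverage is uneven across regions of the input space. That uneven coverage is the gap conditional-coverage methods target.

```python
import numpy as np

def split_conformal_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf                      # too few calibration points: trivial set
    return np.sort(scores)[k - 1]

rng = np.random.default_rng(3)

def sample(n):
    """Hypothetical heteroscedastic data: the noise standard deviation grows with x."""
    x = rng.uniform(0.0, 1.0, n)
    y = 2.0 * x + (0.1 + x) * rng.normal(size=n)
    return x, y

def predict(x):
    return 2.0 * x                         # stand-in for a fitted regression model

x_cal, y_cal = sample(2000)
q = split_conformal_threshold(np.abs(y_cal - predict(x_cal)), alpha=0.1)

x_test, y_test = sample(20_000)
covered = np.abs(y_test - predict(x_test)) <= q    # interval predict(x) +/- q
marginal = covered.mean()
low, high = covered[x_test < 0.2].mean(), covered[x_test > 0.8].mean()
print(round(marginal, 3), round(low, 3), round(high, 3))
```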
Building upon the characterization of conditional coverage through marginal moment restrictions, we introduce Minimax Optimization Predictive Inference (MOPI), a framework that generalizes prior work by optimizing over a flexible class of set-valued mappings during the calibration phase, rather than simply calibrating a fixed sublevel set. This minimax formulation effectively circumvents the structural constraints of predefined score functions, achieving superior shape adaptivity while maintaining a principled connection to the minimization of mean squared coverage error. Theoretically, we provide non-asymptotic oracle inequalities and show that the convergence rate of the coverage error attains the optimal order under regularity conditions. MOPI also enables valid inference conditional on sensitive attributes that are available during calibration but unobserved at test time. Empirical results on complex, non-standard conditional distributions demonstrate that MOPI produces more efficient prediction sets than existing baselines.
Published: 2026-03-24T16:05:43Z
Authors: Yajie Bao, Chuchen Zhang, Zhaojun Wang, Haojie Ren, Changliang Zou

http://arxiv.org/abs/2603.20904v2
Sparse Weak-Form Discovery of Stochastic Generators
Updated: 2026-03-24T16:03:23Z
The proposed algorithm seeks to provide a novel data-driven framework for the discovery of stochastic differential equations (SDEs) by applying the weak formulation to stochastic SINDy. This weak formulation provides a noise-robust methodology that avoids the traditional, noisy computation of derivatives by finite differences. An additional novelty is the adoption of spatial Gaussian test functions in place of temporal test functions: the use of the kernel weight $K_j(X_{t_n})$ guarantees unbiasedness in expectation and prevents the structural regression bias that otherwise affects temporal test functions.
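Both regressions can be sketched under assumptions that are ours, not the authors':

- an Ornstein-Uhlenbeck test SDE $dX = -X\,dt + dW$, simulated by Euler-Maruyama;
- a polynomial library $[1, x, x^2, x^3]$;
- Gaussian test functions $K_j$ centred on a fixed grid;
- sequentially thresholded least squares (the usual SINDy solver) with hand-picked thresholds.

Because each weight $K_j(X_{t_n})$ depends only on the current state, the Euler chain satisfies $E[K_j(X_{t_n})(X_{t_{n+1}} - X_{t_n})] = Δt\,E[K_j(X_{t_n}) f(X_{t_n})]$. This appears to be the unbiasedness the abstract refers to. The same weights applied to squared increments give the diffusion problem, which picks up a small extra $f^2 Δt$ term that the threshold removes here. This is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def simulate_ou(theta=1.0, sigma=1.0, dt=0.01, n_steps=1000, n_paths=1000, seed=0):
    """Euler-Maruyama paths of the test SDE dX = -theta X dt + sigma dW."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps + 1, n_paths))
    x[0] = rng.normal(0.0, sigma / np.sqrt(2.0 * theta), n_paths)   # stationary start
    for n in range(n_steps):
        x[n + 1] = x[n] - theta * x[n] * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)
    return x

def library(x):
    """Candidate terms [1, x, x^2, x^3], one column per term."""
    return np.stack([np.ones_like(x), x, x**2, x**3], axis=1)

def stlsq(G, b, threshold, n_iter=10):
    """Sequentially thresholded least squares, the usual SINDy sparse solver."""
    xi = np.linalg.lstsq(G, b, rcond=None)[0]
    for _ in range(n_iter):
        active = np.abs(xi) >= threshold
        xi[~active] = 0.0
        if active.any():
            xi[active] = np.linalg.lstsq(G[:, active], b, rcond=None)[0]
    return xi

def weak_sde_regression(x, dt, centers, width, drift_thr=0.2, diff_thr=0.05):
    """Build and solve the kernel-weighted drift and squared-diffusion problems."""
    x_now = x[:-1].ravel()                 # X_{t_n}
    dx = (x[1:] - x[:-1]).ravel()          # X_{t_{n+1}} - X_{t_n}
    lib = library(x_now)
    m = x_now.size
    G = np.empty((len(centers), lib.shape[1]))
    b_drift = np.empty(len(centers))
    b_diff = np.empty(len(centers))
    for j, c in enumerate(centers):
        k = np.exp(-0.5 * ((x_now - c) / width) ** 2)   # K_j(X_{t_n}) depends on X_{t_n} only
        G[j] = k @ lib / m
        b_drift[j] = k @ dx / (m * dt)                  # weak form of E[dX] = f dt
        b_diff[j] = k @ dx**2 / (m * dt)                # weak form of E[dX^2] ~ g^2 dt
    return stlsq(G, b_drift, drift_thr), stlsq(G, b_diff, diff_thr)

x = simulate_ou()
drift, diffusion_sq = weak_sde_regression(x, 0.01, centers=np.linspace(-1.5, 1.5, 9), width=0.3)
print("drift coefficients:", drift.round(3))          # generating model: [0, -1, 0, 0]
print("squared diffusion:", diffusion_sq.round(3))    # generating model: [1, 0, 0, 0]
```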
The proposed framework converts the SDE identification problem into two SINDy-based linear sparse identification problems. We validate the algorithm on three SDEs, for which we recover all active non-linear terms with coefficient errors below 4%, stationary-density total-variation distances below 0.01, and autocorrelation functions that faithfully reproduce the true relaxation timescales across all three benchmarks.
Published: 2026-03-21T18:28:10Z
Comment: 29 pages, 5 figures
Authors: Eshwar R A, Gajanan V. Honnavar