http://arxiv.org/api/1QTfKX9UVqSWtNbBEKiMa/38CdY
2025-05-05T00:00:00-04:00

http://arxiv.org/abs/2505.00818v1
Updated: 2025-05-01T19:19:29Z
Published: 2025-05-01T19:19:29Z
Dual Filter: A Mathematical Framework for Inference using Transformer-like Architectures

This paper presents a mathematical framework for causal nonlinear prediction
in settings where observations are generated from an underlying hidden Markov
model (HMM). Both the problem formulation and the proposed solution are
motivated by the decoder-only transformer architecture, in which a finite
sequence of observations (tokens) is mapped to the conditional probability of
the next token. Our objective is not to construct a mathematical model of a
transformer. Rather, our interest lies in deriving, from first principles,
transformer-like architectures that solve the prediction problem for which the
transformer is designed. The proposed framework is based on an original optimal
control approach, where the prediction objective (MMSE) is reformulated as an
optimal control problem. An analysis of the optimal control problem is
presented leading to a fixed-point equation on the space of probability
measures. To solve the fixed-point equation, we introduce the dual filter, an
iterative algorithm that closely parallels the architecture of decoder-only
transformers. These parallels are discussed in detail along with the
relationship to prior work on mathematical modeling of transformers as
transport on the space of probability measures. Numerical experiments are
provided to illustrate the performance of the algorithm using parameter values
used in research-scale transformer models.
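The prediction problem stated above (map a finite observation sequence to the conditional distribution of the next observation under an HMM) can be illustrated with the classical forward filter. This is a minimal sketch of that textbook recursion, not the dual filter proposed in the paper; the names `init`, `trans` and `emit` are illustrative assumptions for the initial law, transition matrix and emission matrix of a finite-state HMM.

```python
def predict_next_observation(init, trans, emit, observations):
    """Return P(next observation | observations so far) for a finite HMM.

    init[i]     = P(X_0 = i)                 (hypothetical parameter names)
    trans[i][j] = P(X_{t+1} = j | X_t = i)
    emit[i][z]  = P(Z_t = z | X_t = i)
    """
    n_states = len(init)
    n_symbols = len(emit[0])
    prior = list(init)  # law of the current hidden state given past observations
    for z in observations:
        # Bayes update with the newly observed symbol z
        post = [prior[i] * emit[i][z] for i in range(n_states)]
        total = sum(post)
        post = [p / total for p in post]
        # Propagate one step through the hidden Markov chain
        prior = [sum(post[i] * trans[i][j] for i in range(n_states))
                 for j in range(n_states)]
    # Push the predicted state law through the emission matrix
    return [sum(prior[i] * emit[i][z] for i in range(n_states))
            for z in range(n_symbols)]


# Two hidden states that are observed without noise (illustrative values only)
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[1.0, 0.0], [0.0, 1.0]]
next_law = predict_next_observation(init, trans, emit, [0])
```

With noiseless emissions, observing symbol 0 pins the hidden state, so the predicted next-symbol law is the first row of the transition matrix.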
Heng-Sheng Chang, Prashant G. Mehta
49 pages, 6 figures

http://arxiv.org/abs/2407.16261v2
Updated: 2025-05-01T18:10:53Z
Published: 2024-07-23T08:03:14Z
A martingale-type of characterisation of the Gaussian free field and fractional Gaussian free fields

We establish martingale-type characterisations for the continuum Gaussian
free field (GFF) and for fractional Gaussian free fields (FGFs), using their
connection to the stochastic heat equation and to fractional stochastic heat
equations. The main theorem on the GFF generalizes previous results of similar
flavour and the characterisation theorems on the FGFs are new. The proof
strategy is to link the resampling dynamics coming from a martingale-type of
decomposition property to the stationary dynamics of the desired field, i.e. to
the (fractional) stochastic heat equation.
Juhan Aru, Guillaume Woessner
28 pages, 1 figure

http://arxiv.org/abs/2505.00656v1
Updated: 2025-05-01T16:59:01Z
Published: 2025-05-01T16:59:01Z
The local coupling of noise technique and its application to lower error bounds for strong approximation of SDEs with irregular coefficients

In recent years, interest in approximation methods for stochastic
differential equations (SDEs) with non-Lipschitz continuous coefficients has
increased. We show lower bounds for the $L^p$-error of such methods in the case
of approximation at a single point in time or globally in time. On the one
hand, we show that for a large class of piecewise Lipschitz continuous drifts
and non-additive diffusions the best possible $L^p$-error rate for final time
approximation that can be achieved by any method based on finitely many
evaluations of the driving Brownian motion is at most $3/4$, which was
previously known only for additive diffusions. Moreover, we show that the best
$L^p$-error rate for global approximation that can be achieved by any method
based on finitely many evaluations of the driving Brownian motion is at most
$1/2$ when the drift is locally bounded and the diffusion is locally Lipschitz
continuous.
For the derivation of the lower bounds we introduce a new method of proof:
the local coupling of noise technique. Using this technique when approximating
a solution $X$ of the SDE at the final time, a lower bound for the $L^p$-error
of any approximation method based on evaluations of the driving Brownian motion
at the points $t_1 < \dots < t_n$ can be determined by the $L^p$-distances of
solutions of the same SDE on $[t_{i-1}, t_i]$ with initial values $X_{t_{i-1}}$
and driving Brownian motions that are coupled at $t_{i-1}, t_i$ and
independent, conditioned on the values of the Brownian motion at $t_{i-1},
t_i$.
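The lower bounds above apply to any method that uses finitely many evaluations of the driving Brownian motion. The simplest such method is the Euler-Maruyama scheme, sketched here for a scalar SDE $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$ on an equidistant grid. This is only an illustration of the class of methods considered, not a construction from the paper; the function name, the seed and the example drift are assumptions.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, T, n, rng=None):
    """Approximate X_T using n Brownian increments on an equidistant grid.

    The increments W_{t_i} - W_{t_{i-1}} are determined by the values of the
    Brownian motion at the grid points, so the scheme uses n evaluations of W.
    """
    rng = rng or random.Random(0)  # fixed seed, an assumption for reproducibility
    dt = T / n
    x = x0
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(dt))  # increment with law N(0, dt)
        x = x + drift(x) * dt + diffusion(x) * dW
    return x


# Illustrative irregular coefficient: a drift that jumps at zero
def jump_drift(x):
    return 1.0 if x < 0 else -1.0


x_T = euler_maruyama(jump_drift, lambda x: 1.0, x0=0.0, T=1.0, n=1000)
```

Refining the grid (larger `n`) is the only tuning parameter here; the abstract's results bound how fast the error of any such method can decay in `n`.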
Simon Ellinger

http://arxiv.org/abs/2505.00641v2
Updated: 2025-05-02T21:56:21Z
Published: 2025-05-01T16:30:55Z
Expected First Return Times for Random Walks on Bounded Grids

We derive a general formula for computing the expected first return time of a
random walk on a finite graph. Using this framework, we calculate the expected
first return time in various settings over bounded rectangular grids with
different boundary conditions.
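As a point of reference for the quantity studied above, the following sketch computes the expected first return time of a simple random walk on a rectangular grid in two ways: through the classical identity $E_v[T_v^+] = 1/\pi(v) = 2|E|/\deg(v)$ (Kac's formula for a simple random walk on a connected graph), and by solving the hitting-time equations exactly. It assumes one particular boundary condition, namely that the walker chooses uniformly among neighbours inside the grid, and it is not claimed to be the formula derived in the paper; all function names are illustrative.

```python
from fractions import Fraction


def grid_neighbors(rows, cols):
    """Adjacency lists of a rows x cols grid; moves never leave the grid."""
    nbrs = {}
    for r in range(rows):
        for c in range(cols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(i, j) for i, j in cand
                            if 0 <= i < rows and 0 <= j < cols]
    return nbrs


def return_time_kac(nbrs, v):
    """Expected return time via 1/pi(v), with pi(v) = deg(v) / (2|E|)."""
    total_degree = sum(len(ns) for ns in nbrs.values())  # equals 2|E|
    return Fraction(total_degree, len(nbrs[v]))


def return_time_linear(nbrs, v):
    """Solve h(u) = 1 + mean of h over neighbours (h(v) = 0) exactly."""
    others = [u for u in nbrs if u != v]
    idx = {u: i for i, u in enumerate(others)}
    n = len(others)
    # Augmented matrix of the system (I - P restricted to others) h = 1
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for u in others:
        i = idx[u]
        A[i][i] = Fraction(1)
        A[i][n] = Fraction(1)
        for w in nbrs[u]:
            if w != v:
                A[i][idx[w]] -= Fraction(1, len(nbrs[u]))
    # Gauss-Jordan elimination in exact rational arithmetic
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [x / pv for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [x - f * y for x, y in zip(A[r], A[col])]
    h = {u: A[idx[u]][n] for u in others}
    h[v] = Fraction(0)
    # One step away from v, then the expected time to hit v
    return 1 + sum(h[w] for w in nbrs[v]) / len(nbrs[v])


nbrs = grid_neighbors(3, 3)
corner = return_time_kac(nbrs, (0, 0))
center = return_time_linear(nbrs, (1, 1))
```

On the 3 x 3 grid there are 12 edges, so a corner (degree 2) has expected return time 24/2 = 12 and the center (degree 4) has 24/4 = 6; both routines agree on these values.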
Nan An

http://arxiv.org/abs/2202.09660v4
Updated: 2025-05-01T16:28:01Z
Published: 2022-02-19T19:03:57Z
The heat flow conjecture for polynomials and random matrices

We study the evolution of the roots of a polynomial of degree $N$, when the
polynomial itself is evolving according to the heat flow. We propose a general
conjecture for the large-$N$ limit of this evolution. Specifically, we propose
(1) that the log potential of the limiting root distribution should evolve
according to a certain first-order, nonlinear PDE, and (2) that the limiting
root distribution at a general time should be the push-forward of the initial
distribution under a certain explicit transport map. These results should hold
for sufficiently small times, that is, until singularities begin to form.
We offer three lines of reasoning in support of our conjecture. First, from a
random matrix perspective, the conjecture is supported by a deformation theorem
for the second moment of the characteristic polynomial of certain random matrix
models. Second, from a dynamical systems perspective, the conjecture is
supported by the computation of the second derivative of the roots with respect
to time, which is formally small before singularities form. Third, from a PDE
perspective, the conjecture is supported by the exact PDE satisfied by the log
potential of the empirical root distribution of the polynomial, which formally
converges to the desired PDE as $N\rightarrow \infty$. We also present a
"multiplicative" version of the conjecture, supported by similar arguments.
Finally, we verify rigorously that the conjectures hold at the level of the
holomorphic moments.
Brian C. Hall, Ching-Wei Ho
Final version: 45 pages, 8 figures. Further reorganization since previous version. To appear in Letters in Mathematical Physics

http://arxiv.org/abs/2505.00621v1
Updated: 2025-05-01T15:56:35Z
Published: 2025-05-01T15:56:35Z
Scaling limit of a weakly asymmetric simple exclusion process in the framework of regularity structures

We prove that a parabolically rescaled and suitably renormalised height
function of a weakly asymmetric simple exclusion process on a circle converges
to the Cole-Hopf solution of the KPZ equation. This is an analogue of the
celebrated result by Bertini and Giacomin from 1997 for the exclusion process
on a circle with any particle density. The main goal of this article is to
analyse the interacting particle system using the framework of regularity
structures without applying the Gaertner transform, a discrete version of the
Cole-Hopf transform which linearises the KPZ equation. Our analysis relies on
the discretisation framework for regularity structures developed by Erhard and
Hairer as well as estimates for iterated integrals with respect to cadlag
martingales derived by Grazieschi, Matetski and Weber. The main technical
challenge addressed in this work is the renormalisation procedure which
requires a subtle analysis of regularity preserving discrete convolution
operators.
Ruojun Huang, Konstantin Matetski, Hendrik Weber
111 pages

http://arxiv.org/abs/2505.00601v1
Updated: 2025-05-01T15:32:48Z
Published: 2025-05-01T15:32:48Z
A stochastic epidemic model with memory of the last infection and waning immunity

We adapt the article of Forien, Pang, Pardoux and Zotsa (arXiv preprint
arXiv:2210.04667, 2022), on epidemic models with varying infectivity and waning
immunity, to incorporate the memory of the last infection. To this end, we
introduce a parametric approach and consider a piecewise deterministic Markov
process modeling both the evolution of the parameter, also called the trait,
and the age of infection of individuals over time. At each new infection, a new
trait is randomly chosen for the infected individual according to a Markov
kernel, and their age is reset to zero. In the large population limit, we
derive a partial differential equation (PDE) that describes the density of
traits and ages. The main goal is to study the conditions under which endemic
equilibria exist for the deterministic PDE model and to establish an endemicity
threshold that depends on the model parameters. The local stability of these
equilibria is also analyzed. The endemicity threshold is computed for several
examples, including models that incorporate a vaccination policy, and a local
stability result is obtained for a memory-free SIS-type model.
Hélène Guérin, Arsene Brice Zotsa-Ngoufack
Stochastic epidemic model with memory; age-structured model; varying infectivity; varying immunity/susceptibility; endemicity; local stability

http://arxiv.org/abs/2502.12037v2
Updated: 2025-05-01T14:19:51Z
Published: 2025-02-17T17:07:06Z
Information geometry of tempered stable processes

We find the information geometry of tempered stable processes. Beginning with
the derivation of $\alpha$-divergence between two tempered stable processes, we
obtain the corresponding Fisher information matrices and the
$\alpha$-connections on their statistical manifolds. Furthermore, we explore
statistical applications of this geometric framework. Various tempered stable
processes such as generalized tempered stable processes, classical tempered
stable processes, and rapidly-decreasing tempered stable processes are
presented as illustrative examples.
Jaehyung Choi
19 pages

http://arxiv.org/abs/2501.14263v2
Updated: 2025-05-01T11:51:08Z
Published: 2025-01-24T06:04:30Z
Anticipated backward stochastic Volterra integral equations and their applications to nonzero-sum stochastic differential games

In [J. Wen, Y. Shi, Stat. Probab. Lett. 156 (2020) 108599] the authors first
introduced a kind of anticipated backward stochastic Volterra integral
equations (anticipated BSVIEs, for short). By virtue of the duality principle,
we show in this paper that anticipated BSVIEs can be applied to the
study of stochastic differential games. To this end, we investigate in depth
a more general class of anticipated BSVIEs whose generator includes
both pointwise time-advanced functions and average time-advanced functions. In
theory, the well-posedness and the comparison theorem of anticipated BSVIEs are
established, and some regularity results of adapted M-solutions are proved by
applying Malliavin calculus, which cover the previous results for BSVIEs.
Further, using linear anticipated BSVIEs as the adjoint equation, we present the maximum
principle for the nonzero-sum differential game system of stochastic delay
Volterra integral equations (SDVIEs, for short) for the first time. As one of
the applications of the theorem, a Nash equilibrium point of the
linear-quadratic differential game problem of SDVIEs is obtained.
Bixuan Yang, Tiexin Guo
41 pages

http://arxiv.org/abs/2505.00456v1
Updated: 2025-05-01T11:12:41Z
Published: 2025-05-01T11:12:41Z
Post-Lie deformations of pre-Lie algebras and their applications in Regularity Structures

In this paper, we study post-Lie deformations of a pre-Lie algebra, namely
deforming a pre-Lie algebra into a post-Lie algebra. We construct the
differential graded Lie algebra that governs post-Lie deformations of a pre-Lie
algebra. We also develop the post-Lie cohomology theory for a pre-Lie algebra,
by which we classify infinitesimal post-Lie deformations of a pre-Lie algebra
using the second cohomology group. The rigidity of such deformations is
also characterized using the second cohomology group. Finally, we apply this
deformation theory to Regularity Structures. We prove that the post-Lie
algebraic structure on the decorated trees which appears spontaneously in
Regularity Structures is a post-Lie deformation of a pre-Lie algebra.
Yvain Bruned, Yunhe Sheng, Rong Tang
16 pages

http://arxiv.org/abs/2505.00453v1
Updated: 2025-05-01T11:02:11Z
Published: 2025-05-01T11:02:11Z
Lévy processes under level-dependent Poissonian switching

In this paper, we derive identities for the upward and downward exit problems
and resolvents for a process whose motion changes between two Lévy processes
if it is above (or below) a barrier $b$ and coincides with a Poissonian arrival
time. This can be expressed in the form of a (hybrid) stochastic differential
equation, for which the existence of its solution is also discussed. All
identities are given in terms of new generalisations of scale functions
(counterparts of the scale functions from the theory of Lévy processes). To
illustrate the applicability of our results, the probability of ruin is
obtained for a risk process with delays in the dividend payments.
Noah Beelders, Lewis Ramsden, Apostolos D. Papaioannou
31 pages

http://arxiv.org/abs/2505.00451v1
Updated: 2025-05-01T10:53:49Z
Published: 2025-05-01T10:53:49Z
The iterated Dirichlet process and applications to Bayesian inference

Consider an i.i.d. sequence of random variables, taking values in some space
$S$, whose underlying distribution is unknown. In problems of Bayesian
inference, one models this unknown distribution as a random measure, and the
law of this random measure is the prior. When $S = \{0, 1\}$, a commonly used
prior is the uniform distribution on $[0, 1]$, or more generally, the beta
distribution. When $S$ is finite, the analogous choice is the Dirichlet
distribution. For a general space $S$, we are led naturally to the Dirichlet
process (see [Ferguson, 1973]).
Here, we consider an array of random variables, and in so doing are led to
what we call the iterated Dirichlet process (IDP). We define the IDP and then
show how to compute the posterior distribution, given a finite set of
observations, using the method of sequential imputation. Ordinarily, this
method requires the existence of certain joint density functions, which the IDP
lacks. We therefore present a new, more general proof of the validity of
sequential imputation, and show that the hypotheses of our proof are satisfied
by the IDP.
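The finite-space priors mentioned in the abstract above admit closed-form posterior updates, which the following sketch illustrates: a beta prior for $S = \{0, 1\}$ and a Dirichlet prior for finite $S$. This is standard conjugate updating, shown only as background; it does not implement the iterated Dirichlet process or sequential imputation. Function names are illustrative, and integer prior parameters are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def beta_posterior(a, b, observations):
    """Beta(a, b) prior on P(X = 1); observations are 0/1 values."""
    successes = sum(observations)
    failures = len(observations) - successes
    return a + successes, b + failures


def dirichlet_posterior(alpha, observations):
    """Dirichlet(alpha) prior; observations are indices 0..len(alpha)-1."""
    post = list(alpha)
    for x in observations:
        post[x] += 1  # each observation adds one count to its category
    return post


def predictive(params):
    """Posterior predictive law of the next draw: normalised parameters."""
    total = sum(params)
    return [Fraction(p, total) for p in params]


# Uniform prior on [0, 1] is Beta(1, 1); observe 1, 1, 0
a_post, b_post = beta_posterior(1, 1, [1, 1, 0])
prob_next_is_one = Fraction(a_post, a_post + b_post)

# Symmetric Dirichlet prior on three categories; observe categories 0, 0, 2
next_law = predictive(dirichlet_posterior([1, 1, 1], [0, 0, 2]))
```

In both cases the posterior is obtained by adding observed counts to the prior parameters, which is the property that fails to carry over directly when the required joint densities do not exist.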
Evan Donald, Jason Swanson
51 pages, 5 figures, 8 tables

http://arxiv.org/abs/2411.08295v2
Updated: 2025-05-01T09:38:16Z
Published: 2024-11-13T02:31:28Z
Improving the convergence of Markov chains via permutations and projections

This paper aims at improving the convergence to equilibrium of finite ergodic
Markov chains via permutations and projections. First, we prove that a specific
mixture of permuted Markov chains arises naturally as a projection under the KL
divergence or the squared-Frobenius norm. We then compare various mixing
properties of the mixture with other competing Markov chain samplers and
demonstrate that it enjoys improved convergence. This geometric perspective
motivates us to propose samplers based on alternating projections to combine
different permutations and to analyze their rate of convergence. We give
necessary, and under some additional assumptions also sufficient, conditions
for the projection to achieve stationarity in the limit in terms of the trace
of the transition matrix. We proceed to discuss tuning strategies of the
projection samplers when these permutations are viewed as parameters. Along the
way, we reveal connections between the mixture and a Markov chain Sylvester's
equation as well as assignment problems, and highlight how these can be used to
understand and improve Markov chain mixing. We provide two examples as
illustrations. In the first example, the projection sampler (with a suitable
choice of the permutation) improves upon Metropolis-Hastings in a discrete
bimodal distribution with a reduced relaxation time from exponential to
polynomial in the system size, while in the second example, the mixture of
permuted Markov chains yields a mixing time that is logarithmic in system size
(with high probability under random permutation), compared to a linear mixing
time in the Diaconis-Holmes-Neal sampler. Finally, we provide numerical
experiments on statistical physics models to illustrate the improved mixing
performance of the proposed projection samplers over standard
Metropolis-Hastings.
Michael C. H. Choi, Max Hird, Youjia Wang
52 pages, 5 figures

http://arxiv.org/abs/2310.02765v3
Updated: 2025-05-01T09:25:32Z
Published: 2023-10-04T12:28:12Z
Solutions to the stochastic thin-film equation for the range of mobility exponents $n\in (2,3)$

Recently, many existence results for the stochastic thin-film equation were
established in the case of a quadratic mobility exponent $n=2$, in which the
noise term $\partial_x(u^\frac{n}{2}\mathcal{W})$ becomes linear. In the case
of a non-quadratic mobility exponent, results are only available in the
situation that $n\ge \frac{8}{3}$ leaving the interval of mobility exponents
$n\in (2,\frac{8}{3})$ untreated. In this article we resolve the current gap in
the literature by presenting a proof, which works under the assumption $n\in
(2,3)$, i.e., the regime of weak slippage. The key idea is to use that the
$\log$-entropy dissipation coincides with the energy production due to the
noise. To realize this idea, we approximate the stochastic thin-film equation
by stochastic thin-film equations with inhomogeneous mobility functions, which
behave like a higher power near $0$. As a consequence the approximate solutions
are non-negative, which is vital to use the $\log$-entropy estimate.
Max Sauerbrey
42 pages; improved presentation; accepted at Stochastics and Partial Differential Equations: Analysis and Computations

http://arxiv.org/abs/2505.00382v1
Updated: 2025-05-01T08:19:24Z
Published: 2025-05-01T08:19:24Z
Approximation to Deep Q-Network by Stochastic Delay Differential Equations

Despite the significant breakthroughs that the Deep Q-Network (DQN) has
brought to reinforcement learning, its theoretical analysis remains limited. In
this paper, we construct a stochastic delay differential equation (SDDE) based
on the DQN algorithm and estimate the Wasserstein-1 distance between them. We
provide an upper bound for the distance and prove that the distance between the
two converges to zero as the step size approaches zero. This result allows us
to understand DQN's two key techniques, the experience replay and the target
network, from the perspective of continuous systems. Specifically, the delay
term in the equation, corresponding to the target network, contributes to the
stability of the system. Our approach leverages a refined Lindeberg principle
and an operator comparison to establish these results.
Jianya Lu, Yingjun Mo