On Interpolation Formulas Describing Neural Network Generalization

2026-03-14T10:03:29Z

In 2020 Domingos introduced an interpolation formula valid for "every model trained by gradient descent". He concluded that such models behave approximately as kernel machines. In this work, we extend the Domingos formula to stochastic training. We introduce a stochastic gradient kernel that extends the deterministic version via a continuous-time diffusion approximation. We prove stochastic Domingos theorems and show that the expected network output admits a kernel-machine representation with optimizer-specific weighting. This representation reveals that training samples contribute through loss-dependent weights and gradient alignment along the training trajectory. We then link the generalization error to the null space of the integral operator induced by the stochastic gradient kernel. The same path-kernel viewpoint provides a unified interpretation of diffusion models and GANs: diffusion induces stage-wise, noise-localized corrections, whereas GANs induce distribution-guided corrections shaped by discriminator geometry. We visualize the evolution of implicit kernels during optimization and quantify out-of-distribution behavior through a series of numerical experiments. Our results support a feature-space memory view of learning: training stores data-dependent information in an evolving tangent feature geometry, and predictions at test time arise from kernel-weighted retrieval and aggregation of these stored features, with generalization governed by alignment between test points and the learned feature memory.

Linear dynamics of random products of operators

2026-03-14T08:06:42Z

We study the linear dynamics of the random sequence $(T_n(\cdot))_{n \geq 1}$ of the operators $T_n(ω) = T(τ^{n-1}ω) \dotsm T(τω) T(ω), n \geq 1$. These products depend on an ergodic measure-preserving transformation $τ: \mathbb{T} \to \mathbb{T}$ on the probability space $(\mathbb{T}, m)$ and on a strongly measurable map $T : \mathbb{T} \to \mathcal{B}(X)$, where $X$ is a separable Fréchet space. We focus on the case where $T(ω)$ is equal to an operator $T_1$ on $X$ for every $ω\in A_1$ and equal to an operator $T_2$ on $X$ for every $ω\in A_2$, where $A_1, A_2$ are two disjoint Borel subsets of $[0,1)$ such that $A_1 \cup A_2 = [0,1)$ and $m(A_k) > 0$ for $k = 1,2$. More precisely, we treat the case where the operators $T_1$ and $T_2$ are adjoints of multiplication operators on the Hardy space $H^2(\mathbb{D})$, as well as the case where $T_1$ and $T_2$ are entire functions of exponential type of the derivation operator on the space of entire functions. Finally, we study the linear dynamics of a random product $T_n(ω)$ for which the operators $T(τ^i ω), i \geq 0$, do not commute. We pay particular attention to the case where the ergodic transformation is an irrational rotation or the doubling map on $\mathbb{T}$.

Hierarchy of extreme-event predictability in turbulence revealed by machine learning

2026-03-14T06:39:29Z

Extreme-event predictability in turbulence is strongly state dependent, yet event-by-event predictability horizons are difficult to quantify without access to governing equations or costly perturbation ensembles. Here we train an autoregressive conditional diffusion model on direct numerical simulations of the two-dimensional Kolmogorov flow and use a CRPS-based skill score to define an event-wise predictability horizon. Enstrophy extremes exhibit a pronounced hierarchy: forecast skill persists from $\approx 1$ to $> 4$ Lyapunov times across events. Spectral filtering shows that these horizons are controlled predominantly by large-scale structures. Extremes are preceded by intense strain cores organizing quadrupolar vortex packets, whose lifetime sharply separates long- from short-horizon events. These results identify coherent-structure persistence as a governing mechanism for the predictability of turbulence extremes and provide a data-driven route to diagnose predictability limits from observations.
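
The CRPS-based, event-wise horizon described above can be illustrated with a minimal sketch. The ensemble CRPS estimator below is the standard one, $\mathrm{CRPS} = E|X - y| - \tfrac12 E|X - X'|$. The skill-score form, the reference forecast, and the threshold are assumptions made for illustration, since the abstract does not specify them. All function names are hypothetical.

```python
def crps_ensemble(members, obs):
    """Empirical CRPS of a scalar ensemble: E|X - y| - 0.5 * E|X - X'|."""
    m = len(members)
    term_obs = sum(abs(x - obs) for x in members) / m
    term_spread = sum(abs(x - y) for x in members for y in members) / (m * m)
    return term_obs - 0.5 * term_spread


def skill_score(crps_model, crps_reference):
    """Assumed skill score: 1 is a perfect forecast, 0 matches the reference."""
    return 1.0 - crps_model / crps_reference


def predictability_horizon(lead_times, skills, threshold=0.0):
    """First lead time whose skill is at or below the threshold (None if never).

    The threshold value is an assumption, not taken from the source.
    """
    for t, s in zip(lead_times, skills):
        if s <= threshold:
            return t
    return None
```

Under these assumptions, applying `predictability_horizon` to the skill curve of one forecast event gives a single horizon for that event, and repeating it over many events gives a distribution of horizons.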

Discretized Rotation with fixed initial points

2026-03-14T03:29:57Z

We prove that if $\max\{|a_0|,|a_1|\}\le 10$ and $λ\in\ ]-2,2[$, then the integer sequence $(a_n)$ defined by $$ 0 \le a_{n+2} +λa_{n+1}+a_n<1 $$ is periodic.
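
For integer terms, the inequality determines $a_{n+2}$ uniquely as $a_{n+2} = -\lfloor λ a_{n+1} \rfloor - a_n$, so the recurrence is easy to explore numerically. The sketch below is an illustration only. It assumes a rational $λ$ (passed as a `Fraction`) so that the floor is exact, and the function names and the step cap are hypothetical.

```python
from fractions import Fraction
from math import floor


def next_term(a_prev, a_curr, lam):
    """Unique integer a_next with 0 <= a_next + lam * a_curr + a_prev < 1."""
    return -floor(lam * a_curr) - a_prev


def period(a0, a1, lam, max_steps=100_000):
    """Period of the orbit of (a0, a1), or None if not found within max_steps.

    The map (a_n, a_{n+1}) -> (a_{n+1}, a_{n+2}) is a bijection of Z^2,
    so the sequence is periodic exactly when the orbit returns to its start.
    """
    start = (a0, a1)
    state = start
    for n in range(1, max_steps + 1):
        state = (state[1], next_term(state[0], state[1], lam))
        if state == start:
            return n
    return None
```

For example, `period(1, 0, Fraction(1))` returns 3 and `period(1, 0, Fraction(-1))` returns 6, matching the exact rotations of order 3 and 6. A return value of `None` only means that no period was found within `max_steps`.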

Infinite graph product of groups I: Geometry of the extension graph

2026-03-13T23:58:42Z

We introduce the extension graph of a graph product of groups and study its geometry. This enables us to study properties of graph products by exploiting the large-scale geometry of the defining graph. In particular, we show that the extension graph is isomorphic to the crossing graph of a canonical quasi-median graph and exhibits the same phenomenon regarding asymptotic dimension as the quasi-trees of metric spaces studied by Bestvina-Bromberg-Fujiwara. As an application of the extension graph, we prove relative hyperbolicity of graph-wreath products. This provides a new construction of relatively hyperbolic groups.

Misiurewicz points and subhyperbolicity in unicritical algebraic correspondences

2026-03-13T22:36:21Z

We provide the first definition of \emph{Misiurewicz parameter} for the unicritical family of algebraic correspondences $ z^r + c$, with $ r > 1$ rational, and prove that, at every Misiurewicz parameter, the correspondence uniformly expands the canonical orbifold metric on a neighborhood of the Julia set. This is achieved using Thurston's ideas on postcritically finite rational maps, regular branched coverings, and orbifolds, viewing the correspondence as a global analytic multifunction. This result provides the necessary tools for further investigations into the fine structure of the parameter space near Misiurewicz points, particularly in exploring similarities between the local geometry of the parameter space and the Julia sets at such parameters. Finally, we present both rigorous examples and empirical evidence suggesting that Misiurewicz parameters are abundant and may be detected by identifying increasingly small copies of the Multibrot set nested within itself: the smaller the copy, the closer it is likely to be to a Misiurewicz parameter.

Feedback Control and Local Convexification of Wasserstein Gradient Flows

2026-03-13T20:57:49Z

For free energies of the form \[ F(μ) = E(μ) + σ\int_Ωμ\logμ\,dx, \quad σ> 0, \] we study the Wasserstein gradient flow, a continuity equation also known as mean-field Langevin dynamics, around a stationary state $\barμ$ on the flat torus. Our first result identifies the Wasserstein Hessian of $F$ at $\barμ$ with a self-adjoint operator with compact resolvent on a Hilbert space of potential variables, and shows that, up to the natural Riesz isometry, this operator generates the linearized gradient flow. This spectral description allows us to design a finite-rank feedback law, via an algebraic Riccati equation, that shifts the closed-loop Hessian spectrum above any prescribed threshold $δ> 0$. As a consequence, the nonlinear closed-loop flow converges locally exponentially to $\barμ$ with rate $δ$. Under an additional second-order remainder assumption on the first variation, the corresponding closed-loop energy is also locally strongly convex in chart coordinates. We illustrate the framework on the flat torus and discuss extensions to multi-species systems, moment-constrained Fokker-Planck equations, and closed Riemannian manifolds.

On the connectedness of the singular set of holomorphic foliations

2026-03-13T17:04:18Z

Let $\mathcal{F}$ be a singular holomorphic foliation of dimension $k>1$ on a projective $n$-manifold $X$. Assume that the determinant of the normal sheaf of $\mathcal{F}$ is ample (as is always the case when $X=\mathbb{P}^{n}$), and that the singular set $Sing(\mathcal{F})$ has dimension $\leq k-1$. We show that the union of those irreducible components of $Sing(\mathcal{F})$ of dimension exactly $k-1$ is necessarily connected. Consequently, we obtain a Bott-type topological obstruction to the integrability of singular holomorphic distributions, echoing Bott's vanishing theorem, and we answer a question of Cerveau for codimension-one foliations on $\mathbb{P}^{3}$.

The Bianchi IX Attractor in Modified Gravity

2026-03-13T16:00:37Z

We consider vacuum anisotropic spatially homogeneous models in certain modified gravity theories (such as Hořava-Lifshitz, $λ$-$R$ or $f(R)$ gravity), which are expected to describe generic spacelike singularities for these theories. These models perturb the well-known Bianchi models in general relativity (GR) by a parameter $v\in (0,1)$ with GR recovered at $v=1/2$. We prove an analogue, for the supercritical theories, of the well-known Ringström attractor theorem in GR: for any $v\in (1/2,1)$, all solutions of Bianchi type $\mathrm{IX}$ converge to an analogue of the Mixmaster attractor, consisting of Bianchi type I solutions (Kasner states) and heteroclinic chains of Bianchi type II solutions. In contrast to GR, there are no solutions that converge to a set other than the Mixmaster attractor (such as the locally rotationally symmetric solutions in GR).

Li-Yorke chaotic weighted composition operators on Hardy and Bergman spaces over the unit disk

2026-03-13T15:37:25Z

We study Li--Yorke and mean Li--Yorke chaos for weighted composition operators $C_{w,\varphi}$ on Banach spaces of analytic functions on the unit disk $\mathbb{D}$. Under natural conditions on the space, we show that $C_{w,\varphi}$ is (densely) Li--Yorke chaotic if and only if it is not power-bounded, and (densely) mean Li--Yorke chaotic if and only if it is not absolutely Cesàro bounded. These results are applied to Hardy spaces $H^p(\mathbb{D})$, $1 \le p \le \infty$, and weighted Bergman spaces $A^p_β(\mathbb{D})$, $-1 < β< \infty$ and $1 < p < \infty$.

Fractals made Practical: Denoising Diffusion as Partitioned Iterated Function Systems

2026-03-13T15:15:50Z

What is a diffusion model actually doing when it turns noise into a photograph? We show that the deterministic DDIM reverse chain operates as a Partitioned Iterated Function System (PIFS) and that this framework serves as a unified design language for denoising diffusion model schedules, architectures, and training objectives. From the PIFS structure we derive three computable geometric quantities: a per-step contraction threshold $L^*_t$, a diagonal expansion function $f_t(λ)$ and a global expansion threshold $λ^{**}$. These quantities require no model evaluation and fully characterize the denoising dynamics. They structurally explain the two-regime behavior of diffusion models: global context assembly at high noise via diffuse cross-patch attention and fine-detail synthesis at low noise via patch-by-patch suppression release in strict variance order. Self-attention emerges as the natural primitive for PIFS contraction. The Kaplan-Yorke dimension of the PIFS attractor is determined analytically through a discrete Moran equation on the Lyapunov spectrum. Through the study of the fractal geometry of the PIFS, we derive three optimal design criteria and show that four prominent empirical design choices (the cosine schedule offset, resolution-dependent logSNR shift, Min-SNR loss weighting, and Align Your Steps sampling) each arise as approximate solutions to our explicit geometric optimization problems, turning theory into practice.
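
For readers unfamiliar with the Kaplan-Yorke dimension mentioned above, the following sketch computes it from a Lyapunov spectrum using the textbook definition $D_{KY} = j + \sum_{i \le j} λ_i / |λ_{j+1}|$, where $j$ is the largest index with a non-negative partial sum. This is not the discrete Moran equation of the abstract, whose form is not given here. The function name is hypothetical.

```python
def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke dimension from Lyapunov exponents (textbook definition)."""
    lams = sorted(exponents, reverse=True)
    partial = 0.0
    for j, lam in enumerate(lams):
        if partial + lam < 0:
            # The first j exponents have a non-negative sum; interpolate.
            return j + partial / abs(lam)
        partial += lam
    # The full spectrum has a non-negative sum: the dimension saturates.
    return float(len(lams))
```

For a spectrum such as `[1.0, -2.0]` this gives 1.5: one expanding direction plus half of the contracting one.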

Social Distancing Equilibria in Games under Conventional SI Dynamics

2026-03-13T14:06:41Z

The mathematical characterization of social-distancing games in classical epidemic theory remains an important question because of their applications to both infectious-disease theory and memetic theory. We consider a special case of the dynamic finite-duration SI social-distancing game where payoffs are accounted using Markov decision theory with zero-discounting, while distancing is constrained by threshold-linear running-costs, and the running-cost of perfect-distancing is finite. In this special case, we are able to construct strategic equilibria satisfying the Nash best-response condition explicitly by integration. Our constructions are obtained using a new change of variables which simplifies the geometry and analysis. As it turns out, there are no singular solutions, and a time-dependent bang-bang strategy consisting of a wait-and-see phase followed by a lock-down phase is always the unique strategic equilibrium. We also show that in a restricted strategy space the bang-bang Nash equilibrium is an ESS, and that the optimal public policy corresponds exactly to the equilibrium strategy.

Covering number on inhomogeneous graph-directed self-similar sets

2026-03-13T13:14:05Z

For a strongly connected inhomogeneous graph-directed self-similar set $K^C$ satisfying the strong open set condition, we characterize the asymptotic behaviour of the $r$-covering number $N_r(K^C)$ as $r \downarrow 0$ in terms of the Minkowski dimension $s_0(G)$ of the attractor. If $\int_0^\infty e^{-s_0(G)t}N_{e^{-t}}(C_i)\,\mathrm{d} t<\infty$ for all vertices $i$, then $e^{-s_0(G)t}N_{e^{-t}}(K^C)$ has a limit as $t\to\infty$, which is a positive constant when the log-contraction group $G_M$ is $\mathbb{R}$ and a positive periodic function when $G_M$ is a lattice; if the integral diverges for some $i$, the limit is infinite.

Multislicing and effective equidistribution for random walks on some homogeneous spaces

2026-03-13T13:09:34Z

We consider a random walk on a homogeneous space $G/Λ$ where $G$ is $\mathrm{SO}(2,1)$ or $\mathrm{SO}(3,1)$ and $Λ$ is a lattice. The walk is driven by a probability measure $μ$ on $G$ whose support generates a Zariski-dense subgroup. We show that for every starting point $x \in G/Λ$ which is not trapped in a finite $μ$-invariant set, the $n$-step distribution $μ^{*n}*δ_{x}$ of the walk equidistributes toward the Haar measure. Moreover, under arithmetic assumptions on the pair $(Λ, μ)$, we show the convergence occurs at an exponential rate, tempered by the obstructions that $x$ may be high in a cusp or close to a finite orbit. Our approach is substantially different from that of Benoist-Quint, whose equidistribution statements only hold in Cesàro average and are not quantitative, that of Bourgain-Furman-Lindenstrauss-Mozes concerning the torus case, and that of Lindenstrauss-Mohammadi-Wang and Yang about the analogous problem for unipotent flows. A key feature of our proof is the use of a new phenomenon which we call multislicing. It generalizes the discretized projection theorems à la Bourgain, and we believe it is of independent interest.

Fractal Patterns in Discrete Laplacians: Iterative Construction on 2D Square Lattices

2026-03-13T12:12:09Z

We investigate the iterative construction of discrete Laplacians on 2D square lattices, revealing emergent fractal-like patterns shaped by modular arithmetic. While classical 2222-style iterations reproduce known structures such as the Sierpinski triangle, our alternating binary-ternary (2322-style) process produces a novel class of aperiodic figures. These display low density variance, minimal connectivity loss, and non-repetitive organization reminiscent of Dekking's sequences. Fourier and autocorrelation analyses confirm their quasi-periodic nature, suggesting applications in self-assembly, sensor networks, and biological modeling. The findings open new paths toward structured randomness and fractal dynamics in discrete systems, as well as avenues for exploring higher-dimensional Laplacian constructions and their implications for quasicrystals, aperiodic tilings, and stochastic processes.
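
As a rough illustration of how modular arithmetic turns a discrete Laplacian into self-similar patterns, the sketch below iterates the four-neighbour Laplacian modulo 2 from a single seed cell. Modulo 2 the $-4$ centre term vanishes, so each step reduces to the neighbour sum. Reading the "2222-style" iteration as this all-binary rule is an assumption, the abstract's exact construction may differ, and the function names are hypothetical.

```python
def step_mod2(cells):
    """One step of the 4-neighbour Laplacian mod 2 on the infinite square lattice.

    `cells` is the set of lattice sites currently equal to 1.
    """
    nxt = set()
    for (i, j) in cells:
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            # Toggling membership adds 1 mod 2 to that neighbour's sum.
            nxt ^= {(i + di, j + dj)}
    return nxt


def iterate(steps, seed=((0, 0),)):
    """Apply step_mod2 `steps` times, starting from the seed sites."""
    cells = set(seed)
    for _ in range(steps):
        cells = step_mod2(cells)
    return cells


def render(cells, radius):
    """ASCII picture of the window [-radius, radius]^2."""
    return "\n".join(
        "".join("#" if (i, j) in cells else "." for j in range(-radius, radius + 1))
        for i in range(-radius, radius + 1)
    )
```

Under this rule the pattern collapses to exactly four sites after every $2^k$ steps, which is the source of the self-similarity. Calling `print(render(iterate(7), 8))` shows an intermediate stage.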