Non-normal spectral signatures of instability in neural network training dynamics

2026-05-22T10:36:48Z

Training instabilities in deep networks (loss spikes, oscillatory convergence, and gradient pathologies) are empirically prevalent but lack a rigorous operator-theoretic explanation. We show that the linearized update operators for practically used optimizers are generically non-normal: for Adam, non-normality is controlled by the commutator [H, M] between the Hessian and the diagonal adaptive preconditioner, while for SGD with momentum it arises from the augmented state-space structure of the update map. Applying non-normal stability theory to these operators, we derive a conservative pseudospectral precursor bound in which κ(V) serves as an early-warning indicator of transient amplification even when the spectral radius remains below one, and we establish that exceptional points of the update operator appear as the κ(V) → ∞ limiting case of this framework. Numerical experiments on two-layer networks confirm that the spectral radius ρ(J) provides no separation between stable and unstable training phases while κ(V) separates them by approximately one order of magnitude, complementing the classical sharpness criterion with a continuous severity measure of non-normal amplification. These results establish non-Hermitian operator theory as a useful and underexplored framework for neural network optimization stability, offering a diagnostic language and proof-of-concept benchmark for understanding adaptive optimization stability.
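The contrast between ρ(J) and κ(V) can be illustrated with a minimal sketch. It does not use the paper's update operators or its pseudospectral bound: the 2x2 matrix `toy_operator`, its entries, and the helper names are hypothetical, and the only bound invoked is the textbook estimate ||J^k|| ≤ κ(V) ρ(J)^k for a diagonalizable J with eigenvector matrix V.

```python
import numpy as np

def spectral_radius(J):
    """Largest eigenvalue modulus, rho(J)."""
    return float(np.max(np.abs(np.linalg.eigvals(J))))

def eigvec_condition(J):
    """2-norm condition number kappa(V) of the eigenvector matrix of J."""
    _, V = np.linalg.eig(J)
    return float(np.linalg.cond(V))

def peak_transient_norm(J, steps=200):
    """Largest spectral norm of J^k over k = 1..steps."""
    P = np.eye(J.shape[0])
    peak = 0.0
    for _ in range(steps):
        P = P @ J
        peak = max(peak, float(np.linalg.norm(P, 2)))
    return peak

def toy_operator(coupling):
    """Hypothetical upper-triangular stand-in for a linearized update map.

    The eigenvalues are 0.9 and 0.8 for every coupling, so rho(J) is fixed;
    only the off-diagonal entry (the source of non-normality) changes.
    """
    return np.array([[0.9, coupling], [0.0, 0.8]])

normal_case = toy_operator(0.0)      # diagonal, hence normal
nonnormal_case = toy_operator(10.0)  # same spectrum, strongly non-normal

for name, J in [("normal", normal_case), ("non-normal", nonnormal_case)]:
    print(name, spectral_radius(J), eigvec_condition(J), peak_transient_norm(J))
```

Both matrices have the same spectral radius, so ρ(J) alone cannot tell them apart, while κ(V) and the peak of ||J^k|| differ by orders of magnitude in this toy setting.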

Analysis of spin avalanches due to interplay of disorder and temperature

2026-05-22T09:47:18Z

The nonequilibrium zero-temperature Random Field Ising Model (RFIM) has been extensively studied to understand critical response and avalanches in disordered driven systems. The emergence of power-law behaviour is observed over a wide region around the critical point. These studies, however, are confined to zero-temperature dynamics. We study the role of temperature, which is inevitable in real experiments, in the context of the RFIM on triangular lattices. We explore the interplay of different parameters (temperature, random field strength, and relaxation time) that affect the prevalence of power-law behaviour on the lattice. The results indicate that power-law behaviour survives only in the regime of low temperature or small and intermediate disorder. Variations in temperature and disorder have similar effects on the avalanche-size distribution, indicating their strong correspondence. We also discuss how the power law blurs out with increasing temperature or disorder.

Starting from the amorphous ground state: linking landscape thermodynamics to slow dynamics and crossover

2026-05-22T09:39:54Z

A microscopic understanding of low-temperature thermodynamics and its relation to dynamical features such as a fragile-to-strong crossover (FSC) remains a central challenge in glass physics. Using swap Monte Carlo combined with a full potential-energy-landscape (PEL) analysis of a non-network-forming model, we obtain equilibrium data deep into the glassy regime and identify a finite system size that simultaneously reproduces bulk behaviour for $T \gtrsim T_g/2$ and allows complete sampling of the PEL down to its lowest-energy amorphous states. This enables the direct computation of the configurational entropy over the full temperature range of the finite system without relying on liquid-state thermodynamic integration. We find a pronounced depletion of low-energy states relative to the Gaussian regime of the PEL, which governs the low-temperature curvature of the configurational entropy. Numerically, the apparent activation energy of the diffusivity closely follows the temperature dependence of the mean inherent structure energy and exhibits a gradual crossover towards Arrhenius-like behaviour. This correlation is consistent with a trap-model description of the PEL, in which the FSC emerges naturally as a consequence of the depletion of low-energy states and thus of the lower bound of the PEL. We further argue, as illustrated analytically for a simple binomial model of the PEL, that the observability of a FSC depends on whether the depletion regime is reached within the accessible temperature window.

Two-lifetime model for the cuprates revisited

2026-05-22T09:20:17Z

Several models of the strange-metal state of the cuprate superconductors postulate the existence of strong inelastic forward scattering of the electrons, but direct evidence of such scattering is missing. Here, we show that angle-resolved photoemission spectroscopy (ARPES) provides a unique tool which can address this issue. We propose a two-lifetime phenomenological model of the superconducting state of the cuprates, and we show that it explains several salient low-energy features of the measured ARPES spectra. The model enables discrimination between forward- and large-angle scattering and, in addition, gives access to the magnitude of the gap function away from the Fermi surface.

Comment on "Spin-1/2 Kagome Heisenberg Antiferromagnet: Machine Learning Discovery of the Spinon Pair-Density-Wave Ground State"

2026-05-21T19:06:19Z

A recent article [Phys. Rev. X 15, 011047 (2025)] utilizes group-equivariant convolutional neural networks to study the ground state of the kagome Heisenberg antiferromagnet. On the largest finite-size cluster studied to date ($N=108$), the authors report variational energies significantly lower than other numerical methods, including state-of-the-art density matrix renormalization group (DMRG) calculations. In contrast to previous results suggesting a possible spin-liquid ground state, the authors observe a spinon pair-density-wave ground state. We find that: (i) the reported low energies are artifacts of broken ergodicity in the Metropolis--Hastings sampling, since the single-spin-flip update rule utilized by the authors effectively freezes the Markov chains; and (ii) when ergodic sampling is enforced via spin-exchange updates, the neural network converges to energies significantly higher than existing DMRG results, calling the paper's claims into question.

Persistence of asymptotic variance under transport: from hyperfluctuation to stealthy hyperuniformity

2026-05-21T17:54:43Z

We introduce $p$-uniformity to characterize the scaling of density fluctuations in spatial random systems in $\mathbb{R}^d$, ranging from hyperfluctuation to stealthy hyperuniformity. Our central theorem establishes sufficient conditions to preserve $p$-uniformity under transport. The first condition, a finite $(d+p)$-th moment of the transport distance, allows for a Taylor expansion of the transport. The second condition controls the corresponding terms. We thus solve a previously stated open problem; indeed we extend it, since our result applies to a general $p$-uniform source in any dimension, and the source and transport may be dependent. As an application, we construct new classes of point processes that are isotropic and $p$-uniform with arbitrarily high $p$, and that can be simulated in linear time. We conclude with an outlook on a converse statement.

Spin Glass Mapping of the Parallel Minority Game

2026-05-21T17:25:13Z

The parallel minority game (PMG) extends the classical minority game to many choices, with each agent restricted to two predetermined alternatives. In this setting, minimizing the population variance across all choices is a complex combinatorial optimization problem. We show that this minimization is exactly equivalent to finding the ground state of an Ising spin glass in the mean-field limit, i.e., the Sherrington-Kirkpatrick model. By encoding the agent choices as spin variables, the variance becomes a quadratic Hamiltonian with quenched random couplings $J_{ij}$ and random fields $h_i$. This mapping reveals inherent frustration and connects the PMG to the well-developed theory of spin glasses, providing a new perspective on the frozen, sub-optimal configurations observed in stochastic strategies.
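The spin encoding described above can be checked numerically with a small sketch. The conventions here are assumptions, not taken from the paper: spin $s_i=+1$ means agent $i$ picks its first alternative `a[i]` and $s_i=-1$ its second alternative `b[i]`, the alternatives are drawn uniformly at random, and the normalization of $J_{ij}$ and $h_i$ is one possible choice. The sketch only verifies that the variance is a quadratic form in the spins; it does not address the equivalence to the Sherrington-Kirkpatrick model.

```python
import numpy as np

def build_couplings(a, b, n_choices):
    """Return (J, h, const) such that
    sum_c (n_c - N/M)^2 = s^T J s + h^T s + const for spins s in {-1, +1}^N."""
    n_agents = len(a)
    onehot = np.eye(n_choices)
    u = 0.5 * (onehot[a] + onehot[b])   # spin-independent part of each agent's load
    v = 0.5 * (onehot[a] - onehot[b])   # spin-dependent part of each agent's load
    offset = u.sum(axis=0) - n_agents / n_choices
    J = v @ v.T                         # quenched couplings J_ij
    h = 2.0 * (v @ offset)              # quenched fields h_i
    const = float(offset @ offset)
    return J, h, const

def variance_direct(spins, a, b, n_choices):
    """Population variance across choices, computed by counting agents."""
    chosen = np.where(spins > 0, a, b)
    counts = np.bincount(chosen, minlength=n_choices)
    return float(np.sum((counts - len(a) / n_choices) ** 2)) / n_choices

def variance_quadratic(spins, J, h, const, n_choices):
    """The same variance, evaluated as a quadratic form in the spins."""
    return float(spins @ J @ spins + h @ spins + const) / n_choices

rng = np.random.default_rng(0)
N_AGENTS, N_CHOICES = 40, 7             # illustrative sizes only
a = rng.integers(0, N_CHOICES, size=N_AGENTS)
b = rng.integers(0, N_CHOICES, size=N_AGENTS)
spins = rng.choice([-1, 1], size=N_AGENTS)

J, h, const = build_couplings(a, b, N_CHOICES)
direct = variance_direct(spins, a, b, N_CHOICES)
quadratic = variance_quadratic(spins, J, h, const, N_CHOICES)
print(direct, quadratic)
```

The two evaluations agree for any spin configuration, because the occupation of choice $c$ is linear in the spins: $n_c = \sum_i u_{ic} + \sum_i s_i v_{ic}$.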

Spectral Dynamics in Deep Networks: Feature Learning, Outlier Escape, and Learning Rate Transfer

2026-05-21T16:56:03Z

We study the evolution of hidden-weight spectra in wide neural networks trained by (stochastic) gradient descent. We develop a two-level dynamical mean-field theory (DMFT) that jointly tracks bulk and outlier spectral dynamics for spiked ensembles whose spike directions remain statistically dependent on the random bulk. We apply this framework to two settings: (1) infinite-width nonlinear networks in mean-field/$μ$P scaling and (2) deep linear networks in the proportional high-dimensional limit, where width, input dimension, and sample size diverge with fixed ratios. Our theory predicts how outliers evolve with training time, width, output scale, and initialization variance. In deep linear networks, $μ$P yields width-consistent outlier dynamics and hyperparameter transfer, including width-stable growth of the leading NTK mode toward the edge of stability (EoS). In contrast, NTK parameterization exhibits strongly width-dependent outlier dynamics, despite converging to a stable large-width limit. We show that this bulk+outlier picture is descriptive of simple tasks with small output channels, but that tasks involving large numbers of outputs (ImageNet classification or GPT language modeling) are better described by a restructuring of the spectral bulk. We develop a toy model with extensive output channels that recapitulates this phenomenon and show that the edge of the spectrum still converges for sufficiently wide networks.

Directed extended-range percolation

2026-05-21T15:51:57Z

While for standard percolation directionality is known to increase the combinatorial complexity of percolation, here we show that when connectivity is ensured by paths of length $R\geq 2$, network directionality, impeding backtracking, can significantly reduce the complexity of percolation. To illustrate this finding, we introduce Directed Extended-Range Percolation (DERP), defined on directed networks with non-reciprocal edges, motivated by applications in quantum communication. In this framework, message transmission is enabled between trusted nodes separated by a directed path of length at most $R$. Using a message-passing approach, we show that directionality enables an exact determination of the percolation threshold and the anomalous critical indices on locally tree-like structures. On random directed networks we find that the critical behavior of DERP depends sensitively on degree correlations. These analytical predictions are corroborated by extensive Monte Carlo simulations, highlighting the profound impact of directionality and correlations on long-range connectivity in complex networks.

Ising surface defects can get dirty

2026-05-21T15:38:22Z

Real critical systems, such as uniaxial ferromagnets in the 3d Ising universality class, are constrained by boundaries and subject to random couplings. We consider the Wilson-Fisher fixed point in $4-ε$ dimensions subject to a random magnetic field localized on a two-dimensional surface, which becomes co-dimension 1 in the physical $ε\to1$ limit. Using the replica method for the disordered field, we find that the ordinary boundary condition is stable under disorder but also discover a non-trivial "dirty" boundary condition which can be reached by tuning the disorder strength or the local temperature. We also investigate the logarithmic structure of the defect spectrum and how it emerges via the replica formalism.

Exact and mean-field analysis of the role of Hubbard interactions on flux driven circular current in a quantum ring

2026-05-21T15:12:45Z

We investigate circular current in both ordered and disordered Hubbard quantum rings threaded by magnetic flux, employing exact diagonalization and the Hartree-Fock mean-field approach within the tight-binding framework. The influence of on-site and extended Hubbard interactions, disorder, and electron filling on the persistent current is systematically analyzed. To construct the full many-body Hamiltonian, we introduce a linear table formalism, which, to our knowledge, has been rarely used in this context. In ordered rings, the current decreases monotonically with increasing on-site repulsion, while the impact of the extended interaction depends strongly on the filling factor. At low filling, stronger extended interaction suppresses the current, whereas near half-filling, it enhances the current up to a critical ratio, half of the on-site strength, before reducing it. Disorder significantly modifies these behaviors, notably enhancing the current at less than quarter-filling with increasing extended interaction. The localization properties of eigenstates, examined via the inverse participation ratio, further support the crucial roles of filling and the interplay between on-site and extended interactions in governing persistent current.

A Boundary-Layer Mechanism for One-Third Scaling in Online Softmax Classification

2026-05-21T11:26:32Z

Hard-label classification is usually trained with smooth surrogate losses, most prominently softmax cross-entropy. We isolate an asymptotic mechanism by which this mismatch between smooth surrogate and discrete labels produces power-law learning curves in an online teacher-student model. After subtracting the mean logit, the thermodynamic-limit dynamics close in centered variables: a growing centered student-teacher alignment $D$ and the residual student variance $Δ$. At late times, examples away from teacher decision boundaries are already classified confidently and contribute exponentially little. Only boundary layers of width $O(D^{-1})$ remain active, while the noise of fixed-learning-rate online gradient descent maintains a nonzero $Δ$. As a function of the training time $α$, the late-time solution yields an $α^{-1/3}$ power law not only for the test loss but also for the generalization error $ε_g$, i.e., one minus test accuracy. This is much slower than the $α^{-1}$ Bayes-optimal reference for the same model. We further show that learning-rate schedules can improve the generalization error towards an $ε_g \sim α^{-1/2}$ power law. Simulations support the predicted order parameter dynamics and learning curves. Controlled experiments with correlated Gaussian inputs and whitened pretrained features show that data structure can dominate transients. Therefore, our result is an asymptotic, complementary mechanism rather than an alternative to spectral explanations of neural scaling laws.

Possible Topological Decoherence Transition in Relativistic Electron Beams Propagating through Coulomb-Disordered Media

2026-05-21T11:02:22Z

We show that the mutual coherence of a relativistic electron beam in a Coulomb-disordered medium is governed by an effective two-dimensional compact phase field with a logarithmic correlation function. The corresponding Gaussian free-field action exhibits a stiffness inversely proportional to the propagation length. When the compact nature of the phase is taken into account, the system supports vortex excitations that interact as a two-dimensional Coulomb gas. Renormalization-group analysis of this gas indicates the existence of a critical sample thickness $L_c$ at which a Berezinskii--Kosterlitz--Thouless (BKT) transition may occur, separating a regime of algebraic decoherence from one where free vortices proliferate and coherence is destroyed exponentially. The critical thickness is expressed through fundamental microscopic parameters and could be observed in transmission electron microscopy of liquid cells or cryogenic samples.

Berezinskii-Kosterlitz-Thouless-type Transition in Site Percolation on the Diamond Hierarchical Lattice

2026-05-21T07:39:04Z

We study site percolation on the diamond hierarchical lattice, a finite-dimensional fractal network, using an exact generating-function analysis. In contrast to bond percolation, site percolation on this lattice does not undergo a transition from a nonpercolating phase to a percolating phase. Instead, the system exhibits a nonpercolating phase for $p<p_{\rm c}$ and a critical phase for $p>p_{\rm c}$. In the critical phase, the size of the largest cluster remains subextensive, scaling as $N^{ψ(p)}$, where the fractal exponent $ψ(p)$ varies continuously with $p$. By analyzing the renormalization-group recursion relation in the vicinity of $p_{\rm c}$, we show that the correlation length exhibits a Berezinskii-Kosterlitz-Thouless-type essential singularity, $ξ(p)\sim \exp \left({\rm const}/\sqrt{p_{\rm c}-p}\right)$ for $p \to p_{\rm c}^-$, which is further confirmed by finite-size scaling analyses showing excellent data collapse. These results demonstrate that critical phases in percolation can emerge even on finite-dimensional networks and that exponential volume growth is not necessary for such phases to appear. We argue that the critical phase on the diamond hierarchical lattice stems from site dilution remaining relevant under renormalization.

How does Chain of Thought decompose complex tasks?

2026-05-21T04:59:39Z

Many language tasks can be modeled as classification problems where a large language model (LLM) is given a prompt and selects one among many possible answers. We show that the classification error in such problems scales as a power law in the number of classes. This has a dramatic consequence: the prediction error can be reduced substantially by splitting the overall task into a sequence of smaller classification problems, each with the same number of classes ("degree"). This tree-structured decomposition models chain-of-thought (CoT). It has been observed that CoT-based predictors perform better when they "think", i.e., when they develop a deeper tree, thus decomposing the problem into a larger number of steps. We identify a critical threshold for the degree, below which thinking is detrimental, and above which there exists an optimal depth that minimizes the error. It is impossible to surpass this minimal error by increasing the depth of thinking.
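The trade-off between depth and per-step error can be sketched with a toy calculation. Everything specific here is an assumption made for illustration and is not the paper's model: the per-step error is taken to be `prefactor * degree**exponent` (capped at 1), steps are assumed to fail independently, the degree is allowed to be non-integer so that `degree**depth` equals the total number of classes, and the numerical values are arbitrary.

```python
def chain_error(n_classes, depth, prefactor=1e-3, exponent=1.0):
    """Toy error of a depth-`depth` decomposition of an `n_classes`-way task.

    Each step is a classification among `degree` classes, where
    degree ** depth == n_classes. The per-step error follows an assumed
    power law in the degree, and the chain succeeds only if every step does.
    """
    degree = n_classes ** (1.0 / depth)
    step_error = min(1.0, prefactor * degree ** exponent)
    return 1.0 - (1.0 - step_error) ** depth

N_CLASSES = 1_000_000                   # illustrative task size
depths = list(range(1, 41))
errors = [chain_error(N_CLASSES, d) for d in depths]
best_depth = depths[errors.index(min(errors))]

print("best depth:", best_depth)
print("error at depth 1:", errors[0])
print("error at best depth:", min(errors))
print("error at depth 40:", errors[-1])
```

In this toy setting the error first falls as the tree deepens, because each step becomes an easier classification, and then rises again once the degree is so small that the accumulated per-step errors dominate, so the minimum is attained at an interior depth.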