Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

2026-05-20T16:24:54Z

We study the problem of testing $H_0: ξ^\topβ=t_0$ in high-dimensional sparse linear regression with Gaussian random design and unknown design covariance. The loading vector $ξ$ is arbitrary, and the exact sparsity level $k$ is unknown but bounded by a known value $k_u$. Tests are required to control Type I error uniformly over the $k_u$-sparse null, while power is evaluated against $k$-sparse alternatives. We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of $ξ$. In the ultra-sparse regime $k_u\lesssim \sqrt n/\log p$, these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary $ξ$. In the moderately sparse regime $\sqrt n/\log p\ll k_u\lesssim n/\log p$, these bounds match for several classes of loading vectors but may differ in general. In this regime, we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. This provides evidence that improving on the rate of the mixed test, if statistically possible, may be computationally hard. For flat sparse loadings, we complement this evidence with a polynomial-time reduction from sparse CCA. Finally, we examine how information about the design covariance affects the adaptive separation rate in two settings. Under a sparse signed-spiked covariance model, the information-theoretic lower bound is attainable up to logarithmic factors by a computationally inefficient procedure, while the low-degree lower bound and sparse-CCA reduction continue to apply, providing evidence for a statistical-computational gap. When the design covariance is known and diagonal, the adaptive separation rate takes the same form as in the ultra-sparse regime.

The Generation-Recognition Asymmetry: Six Dimensions of a Fundamental Divide in Formal Language Theory

2026-05-20T15:03:50Z

Every formal grammar defines a language and can in principle be used in three ways: to generate strings (production), to recognize them (parsing), or -- given only examples -- to infer the grammar itself (grammar induction). Generation and recognition are extensionally equivalent -- they characterize the same set -- but operationally asymmetric in multiple independent ways. Inference is a qualitatively harder problem: it does not have access to a known grammar. Despite the centrality of this triad to compiler design, natural language processing, and formal language theory, no survey has treated it as a unified, multidimensional phenomenon. We identify six dimensions along which generation and recognition diverge: computational complexity, ambiguity, directionality, information availability, grammar inference, and temporality. We show that the common characterization "generation is easy, parsing is hard" is misleading: unconstrained generation is trivial, but generation under constraints can be NP-hard. The real asymmetry is that parsing is always constrained (the input is given) while generation need not be. Two of these dimensions -- directionality and temporality -- have not previously been identified as dimensions of the generation-recognition asymmetry. We connect the temporal dimension to the surprisal framework of Hale (2001) and Levy (2008), arguing that surprisal formalizes the temporal asymmetry between a generator (surprisal = 0) and a parser that predicts under uncertainty (surprisal > 0). We review bidirectional systems in NLP and observe that bidirectionality has been available for fifty years yet has not transferred to most domain-specific applications. We conclude with a discussion of large language models, which architecturally unify generation and recognition while operationally preserving the asymmetry.
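
The surprisal contrast drawn above can be made concrete with a small sketch. It assumes the usual definition of surprisal as $-\log_2 p$ (in bits); the toy next-symbol distribution and the name `surprisal` are illustrative assumptions, not taken from the paper.

```python
import math


def surprisal(prob: float) -> float:
    """Surprisal in bits of an event with probability `prob` (0 < prob <= 1)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    # Return a clean 0.0 (not -0.0) for a certain event.
    return -math.log2(prob) if prob < 1.0 else 0.0


# Hypothetical beliefs a parser might hold about the next symbol after a prefix.
parser_belief = {"a": 0.5, "b": 0.25, "c": 0.25}

# A generator already knows which symbol it will emit, so from its own point
# of view that symbol has probability 1.
generator_surprisal = surprisal(1.0)

# A parser must predict under uncertainty, so every outcome costs some bits.
parser_surprisal = {sym: surprisal(p) for sym, p in parser_belief.items()}
```

In this toy setting the generator pays nothing, while the parser pays a strictly positive cost for any symbol it did not predict with certainty.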

Towards Single Exponential Time for Temporal and Spatial Reasoning: A Study via Redundancy and Dynamic Programming

2026-05-20T14:54:14Z

The region connection calculus ($RCC$) and Allen's interval algebra ($IA$) are two well-known NP-hard spatial-temporal qualitative reasoning problems. They are solvable in $2^{O(n \log n)}$ time, where $n$ is the number of variables, and $IA$ is additionally known to be solvable in $o(n)^n$ time. However, no improvement over exhaustive search is known for $RCC$, and it is unknown whether either problem is solvable in single exponential time $2^{O(n)}$. We investigate multiple avenues towards reaching such bounds. First, we show that branching is insufficient since there are too many non-redundant constraints. Concretely, we classify the maximum number of non-redundant constraints in $RCC$ and $IA$. Algorithmically, we make two significant contributions based on dynamic programming (DP). The first algorithm runs in $4^n$ time and is applicable to a non-trivial, NP-hard fragment of $IA$, which includes the well-known interval graph sandwich problem of Golumbic and Shamir (1993). For the richer $RCC$ problem with 8 basic relations we use a more sophisticated approach which asymptotically matches the $o(n)^n$ bound for $IA$.
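
For readers unfamiliar with the constraint language of $IA$, the following sketch computes which of the 13 basic Allen relations holds between two concrete intervals. It only illustrates the relations themselves; it is not the paper's dynamic programming algorithm, and the function name and relation labels are assumptions made for this example.

```python
def allen_relation(a, b):
    """Return the basic Allen relation between intervals a and b.

    Each interval is a (start, end) pair with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if s1 >= e1 or s2 >= e2:
        raise ValueError("intervals must satisfy start < end")
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "met-by"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: the intervals properly overlap.
    return "overlaps" if s1 < s2 else "overlapped-by"
```

An instance of the reasoning problem constrains each pair of interval variables to a set of such relations and asks whether concrete intervals satisfying all constraints exist.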

Resource bounded Kučera-Gács Theorems

2026-05-20T10:13:55Z

The Kučera-Gács theorem is a fundamental result in algorithmic randomness. It states that every infinite sequence $X$ is Turing reducible to a Martin-Löf random $R$. This paper studies resource-bounded analogues of the Kučera-Gács Theorem, at the resource bounds of polynomial-time and finite-state computation. We prove a quasi-polynomial-time Kučera-Gács Theorem, showing that every infinite sequence $X$ is quasi-polynomial-time reducible to a \emph{polynomial-time random} sequence $R$. We also show that for any $X$, the oracle use of $R$ is $n+o(n)$ bits for obtaining the first $n$ bits of $X$. We then study the relationship between compressibility and Turing reductions, in the polynomial-time setting. We establish that $ρ^-_{\mathsf{poly}}(X) = K_{poly}(X)$, demonstrating that the lower polynomial-time Turing decompression ratio is precisely characterized by the polynomial-time Kolmogorov complexity rate. We note that this characterization fails for the polynomial-time dimension if one-way functions exist, resolving an open problem from Doty's work. We use these results to strengthen the quasi-polynomial-time Kučera-Gács Theorem. We show that every infinite sequence $X$ is quasi-polynomial-time reducible to a polynomial-time random sequence $R$, where the lower oracle use rate of the reduction is less than $K_{poly}(X)$. We also show that any sequence extracted from the (even larger) set of \emph{normal sequences} by a finite-state reduction must have a convergent asymptotic frequency for its symbols. Since sequences lacking this invariant property exist, they cannot be finite-state reduced from any normal sequence. Hence we show that the Kučera-Gács theorem \emph{fails} for finite-state reductions.

Gradient Scalability and Taylor Surrogation of Quantum Cost Landscapes

2026-05-20T10:06:53Z

Variational Quantum Algorithms are promising candidates for near-term quantum computing, yet they face scalability challenges due to barren plateaus, where gradients vanish exponentially relative to system size. Recent conjectures suggest that avoiding these plateaus might inherently lead to classical simulability, thereby limiting the opportunities for quantum advantage. In this work, we advance the theoretical understanding of the relationship between gradient scalability at initialization and the computational complexity of variational quantum algorithms. We first present the Taylor surrogate, a classical simulation technique that matches Pauli path runtime guarantees on near-Clifford regions while offering runtime advantages in specific regimes. Leveraging this surrogate, we prove that beyond previously established classically simulable regions, the computational complexity is at least super-polynomial. Next, we introduce the Linear Clifford Encoder, a classically efficient ansatz modifier that ensures constant-scaling gradients within landscape regions close to Clifford circuits. Finally, numerical experiments on these modified landscapes provide preliminary empirical evidence of a transition zone where constant-scaling gradients may decay polynomially in super-polynomially complex regions rather than exponentially. These findings suggest speculative instances where non-vanishing gradients and super-polynomial complexity could potentially coexist, underscoring the need for future formal proofs.

Learning junta distributions, quantum junta states, and QAC$^0$ circuits

2026-05-20T09:36:40Z

In this work, we consider the problems of learning junta distributions, their quantum counterparts (quantum junta states) and $\mathsf{QAC}^0$ circuits, which we show to be close to juntas. (1) Junta distributions. A probability distribution $p:\{-1,1\}^n\to [0,1]$ is a $k$-junta if it only depends on $k$ bits. We show that they can be learned to error $\varepsilon$ in total variation distance from $O(2^k\log(n)/\varepsilon^2)$ samples, which quadratically improves the upper bound of Aliakbarpour et al. (COLT'16) and matches their lower bound in every parameter. (2) Junta states. We initiate the study of $n$-qubit states that are $k$-juntas, those that are the tensor product of a $k$-qubit state and an $(n-k)$-qubit maximally mixed state. We show that these states can be learned with error $\varepsilon$ in trace distance with $O(12^{k}\log(n)/\varepsilon^2)$ single copies. We also prove a lower bound of $Ω((4^k+\log (n))/\varepsilon^2)$ copies. Additionally, we show that, for constant $k$, $\tilde{Θ}(2^n/\varepsilon^2)$ copies are necessary and sufficient to test whether a state is $\varepsilon$-close or $7\varepsilon$-far from being a $k$-junta. (3) $\mathsf{QAC}^0$ circuits. Nadimpalli et al. (STOC'24) recently showed that the Pauli spectrum of $\mathsf{QAC}^0$ circuits (with a limited number of auxiliary qubits) is concentrated on low degree. We remark that they implied something stronger, namely that the Choi states of those circuits are close to being juntas. As a consequence, we show that $n$-qubit $\mathsf{QAC}^0$ circuits with size $s$, depth $d$ and $a$ auxiliary qubits can be learned from $2^{O(\log(s^22^a)^d)}\log (n)$ copies of the Choi state, improving the $n^{O(\log(s^22^a)^d)}$ bound of Nadimpalli et al.

Efficient Matrix Product State Learning in Logarithmic Depth

2026-05-20T05:02:41Z

Learning the closest matrix product state (MPS) representation of a quantum state enables useful tools for quantum machine learning and analysis of complex quantum systems. In this work, we study the problem of learning MPS in the following setting: given many copies of an input MPS, the task is to recover a classical description of the state. The best known polynomial-time algorithm, introduced by [LCLP10, CPF+10], requires linear circuit depth and $\widetilde O(n^5)$ samples, and has seen no improvement in over a decade. These costs, neither of which is known to be optimal, render existing algorithms impractical for near-term quantum devices with limited resources. We introduce parallel disentangling algorithms for MPS learning. For exact MPS learning, our algorithm runs in polynomial time and uses circuit depth $O(\log n)$ and sample complexity $\widetilde O(n^3)$, improving both the depth and the dependence on the system size $n$. The key idea is to exploit the bounded-rank structure of reduced states on middle blocks of an MPS and organize the disentangling operations in a tree structure. We further extend the algorithm to closest MPS learning, improving the sample complexity dependence on $n$ from $n^9$ to $n^7$, and complement the algorithms with an $Ω(n)$ product-state lower bound. We also investigate MPS learning under hardware constraints, including restricted measurements and geometric connectivity. Under the Learning Parity with Noise (LPN) assumption, we show computational hardness for learning an MPS(2) family with non-adaptive single-qubit measurements. Finally, we show that our algorithm can be implemented with depth $O(q n^{1/q})$ on a $q$-dimensional hypercubic lattice, giving an asymptotic reduction in depth. Together, our work provides a complete characterization of the quantum resources needed for efficient MPS learning.

Model-agnostic super-resolution in high dimensions

2026-05-20T02:36:27Z

The problem of super-resolution, roughly speaking, is to reconstruct an unknown signal to high accuracy, given (potentially noisy) information about its low-degree Fourier coefficients. Prior results on super-resolution have imposed strong modeling assumptions on the signal, typically requiring that it is a linear combination of spatially separated point sources. In this work we analyze a very general version of the super-resolution problem by considering completely general non-negative signals (equivalently, distributions) over the $d$-dimensional torus $[0,1)^d$; we do not assume any spatial separation between point sources, or even that the distribution is a finite linear combination of point sources. The question naturally arises: what can be said about super-resolution in such a general setting?
- As a warm-up, we first give a set of results for reconstructing distributions under the Wasserstein distance. We establish essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $κ$ of the noise for which accurate reconstruction is possible: we show that for $d$-dimensional distributions, estimates of $\approx \exp(d)$ many Fourier coefficients are both necessary and sufficient for accurate Wasserstein reconstruction.
- As our main result, we define a new notion of "heavy hitter" reconstruction for distributions, which essentially amounts to achieving high-accuracy reconstruction of all "sufficiently dense" regions of the distribution. We give essentially matching upper and lower bounds on the cutoff frequency $T$ and the magnitude $κ$ of the noise for which accurate reconstruction is possible under this notion. Our results show that (in sharp contrast with Wasserstein reconstruction) accurate estimates of only $\approx \exp(\sqrt{d})$ many Fourier coefficients are both necessary and sufficient for heavy hitter reconstruction.

Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation

2026-05-19T17:10:47Z

Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in low-dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $ε$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular Cartesian grid sketch of the samples. We show that (especially under $α$-Hölder smooth distributions) this sketch can compress the data without increasing asymptotic error, and that it also regularizes the structure, which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $ε$ error in $ε^{-\max(2,\frac{d+1+o(1)}{1+α})}$ time for $α$-Hölder smooth distributions $P,Q$ on $(0,1)^{d}$ with $0 < α < 1$; this is an optimal $Θ(ε^{-2})$ for $α > 1/2$ when $d=2$, and nearly optimal as $α \to 1$ when $d = 3$.
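
The "sketch" step of Sample-Sketch-Solve can be illustrated with a minimal example that snaps samples to the centers of a regular Cartesian grid and records the mass of each occupied cell. This is only a sketch under assumptions: the function name `grid_sketch`, the parameter `cells_per_axis`, and the uniform toy samples are hypothetical, and neither the paper's choice of grid resolution nor its solver is reproduced here.

```python
import random
from collections import Counter


def grid_sketch(samples, cells_per_axis):
    """Compress points in [0, 1)^d to weighted grid-cell centers.

    Returns a dict mapping each occupied cell center to the fraction of
    samples that fell into that cell.
    """
    counts = Counter()
    for point in samples:
        # Index of the grid cell containing the point, clamped to the grid.
        idx = tuple(
            min(int(x * cells_per_axis), cells_per_axis - 1) for x in point
        )
        counts[idx] += 1
    n = len(samples)
    return {
        tuple((i + 0.5) / cells_per_axis for i in idx): c / n
        for idx, c in counts.items()
    }


# Toy data: 1000 uniform samples in the unit square (an assumption).
rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(1000)]
sketch = grid_sketch(samples, cells_per_axis=8)
```

The sketched measure has at most `cells_per_axis ** d` support points regardless of the number of samples, and its support lies on a regular grid, which is the kind of structure the abstract says enables faster exact algorithms.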

A Measure-Theoretic Analysis of Reasoning: Structural Generalization and Approximation Limits

2026-05-19T15:00:51Z

While empirical scaling laws for LLM reasoning are well-documented, the theoretical mechanisms governing out-of-distribution (OOD) generalization remain elusive. We formalize reasoning via optimal transport, projecting discrete trajectories into a continuous metric space to quantify domain shifts using the Wasserstein-1 distance. Invoking Kantorovich duality, we bound OOD generalization via architectural Lipschitz continuity and functional approximation limits. This exposes two primary constraints. First, position-dependent attention (e.g., Absolute Positional Encoding) fails to preserve shift invariance, yielding an $Ω(1)$ Lipschitz constant and expected risk, whereas shift-invariant mechanisms (e.g., Rotary Embeddings) preserve equivariance and bound the error. Second, by mapping sequential backtracking to a Dyck-$k$ language, we establish a strict circuit depth lower bound for $\text{TC}^0$ Transformers. Scaling physical layer depth is necessary to avert representation collapse -- a constraint that scaling representation width cannot bypass due to irreducible approximation bounds in Barron spaces. Evaluations across 54 Transformer configurations on combinatorial search corroborate these bounds, demonstrating that generalization risk degrades monotonically with the Wasserstein domain shift.
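
To make the Dyck-$k$ language mentioned above concrete, the following sketch recognizes well-nested strings over $k$ bracket types with an explicit stack. It illustrates only the target language, not the paper's mapping from backtracking or its circuit lower bound; the function name `is_dyck` and the bracket alphabet are assumptions chosen for the example.

```python
def is_dyck(word, pairs):
    """Check whether `word` is well nested with respect to `pairs`.

    `pairs` maps each opening symbol to its closing symbol, so
    len(pairs) plays the role of k in Dyck-k.
    """
    closers = {close: open_ for open_, close in pairs.items()}
    stack = []
    for ch in word:
        if ch in pairs:
            # Opening bracket: remember it until its partner arrives.
            stack.append(ch)
        elif ch in closers:
            # Closing bracket: must match the most recent open bracket.
            if not stack or stack.pop() != closers[ch]:
                return False
        else:
            return False  # symbol outside the alphabet
    return not stack


DYCK_3 = {"(": ")", "[": "]", "{": "}"}
```

The stack can grow linearly with the nesting depth of the input, which gives some intuition for why nested structure of this kind is a natural test case for depth-limited models.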

Hive is PSPACE-Hard

2026-05-19T12:52:37Z

Hive is an abstract strategy game played on a table with hexagonal pieces. First published in 2001, it was and continues to be highly popular among both casual and competitive players. In this paper, we show that for a suitably generalized version of the game, the computational problem of determining whether a given player in an arbitrary position has a winning strategy is PSPACE-hard. We do this by reduction from a variant of Generalized Geography we call Formula Game Geography.
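
As background for the reduction, the following sketch solves the standard vertex version of Generalized Geography by exhaustive search: a token sits on a vertex of a directed graph, players alternately move it along an edge to a vertex not visited before, and a player who cannot move loses. The paper's Formula Game Geography variant is not reproduced here; the function name and graph encoding are assumptions for this example.

```python
def current_player_wins(graph, position, visited=frozenset()):
    """Return True if the player about to move from `position` can force a win.

    `graph` maps each vertex to an iterable of its out-neighbours.
    """
    visited = visited | {position}
    # The mover wins if some legal move leaves the opponent in a losing spot.
    return any(
        not current_player_wins(graph, nxt, visited)
        for nxt in graph.get(position, ())
        if nxt not in visited
    )


path = {"a": ["b"], "b": ["c"], "c": []}
fork = {"a": ["b", "c"], "b": ["c"], "c": []}
```

The recursion depth is bounded by the number of vertices, so the search uses polynomial space even though its running time can be exponential.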

A Hierarchy of Tinhofer Graphs: Separations and Membership Testing

2026-05-19T11:38:21Z

Color refinement is an important technique that works very well in practice for the graph isomorphism problem. Tinhofer graphs are the class of graphs for which refinement together with individualization correctly tests graph isomorphism against every other graph, irrespective of the choices of vertices made during individualization. Motivated by the fact that Tinhofer graphs form a natural boundary for efficient graph isomorphism tests based on color refinement, in this paper, we introduce a hierarchy of graph classes within the class of Tinhofer graphs. We call a graph $G$ $k$-Tinhofer if, after $k$ rounds of individualization and refinement, the resulting colored graphs remain isomorphic for every graph $H \cong G$, irrespective of the choices of vertices made during individualization. Arvind et al. (2017) studied a hierarchy of graph classes motivated by color refinement: discrete, amenable, Tinhofer, and refinable graphs. We show that the $k$-Tinhofer hierarchy lies between the class of all graphs and Tinhofer graphs, with refinable graphs coinciding with the first level of the hierarchy. We obtain two characterizations of $k$-Tinhofer graphs: an algebraic characterization in terms of orbit partitions induced by pointwise stabilizers of automorphism groups, and a combinatorial characterization in terms of individualization-refinement trees and quotient graphs. For every fixed integer $k \ge 0$, there exist vertex-colored graphs that are $k$-Tinhofer but not $(k + 1)$-Tinhofer. For every fixed integer $k \ge 0$, the problem of deciding whether a given $k$-Tinhofer graph is $(k + 1)$-Tinhofer is $P$-hard under uniform $\mathsf{AC^0}$ many-one reductions. We show that testing isomorphism between an $(n - k)$-Tinhofer graph $G$ and an arbitrary graph $H$ is fixed-parameter tractable with respect to the parameter $k$.
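
The basic color refinement procedure underlying these classes can be sketched as follows. This is a minimal illustration of plain refinement on uncolored graphs only: it omits individualization entirely, and the function names and adjacency-dict encoding are assumptions for this example rather than the paper's formalism.

```python
from collections import Counter


def color_refinement(adj):
    """Return the stable coloring of an undirected graph.

    `adj` maps each vertex to a list of its neighbours.
    """
    colors = {v: 0 for v in adj}
    while True:
        # A vertex's signature: its color plus the multiset of neighbour colors.
        sigs = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(sigs.values())))}
        new_colors = {v: palette[sigs[v]] for v in adj}
        # Each round refines the partition, so an unchanged class count
        # means the partition is stable.
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors


def distinguished_by_refinement(adj_g, adj_h):
    """True if refinement on the disjoint union separates the two graphs."""
    union = {("g", v): [("g", u) for u in nb] for v, nb in adj_g.items()}
    union.update({("h", v): [("h", u) for u in nb] for v, nb in adj_h.items()})
    colors = color_refinement(union)
    hist_g = Counter(c for (side, _), c in colors.items() if side == "g")
    hist_h = Counter(c for (side, _), c in colors.items() if side == "h")
    return hist_g != hist_h


six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {i: [j for j in range(3 * (i // 3), 3 * (i // 3) + 3) if j != i]
                 for i in range(6)}
path3 = {0: [1], 1: [0, 2], 2: [1]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

A 6-cycle and two disjoint triangles are both 2-regular, so plain refinement cannot tell them apart even though they are not isomorphic; cases like this are why refinement is combined with individualization.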

Limit on the computational power of $\mathrm{C}$-random strings

2026-05-19T09:35:17Z

We construct a universal decompressor $U$ for plain Kolmogorov complexity $\mathrm{C}_U$ such that the Halting Problem cannot be decided by any polynomial-time oracle machine with access to the set of random strings $R_{\mathrm{C}_U} = \{x : \mathrm{C}_U(x) \ge |x|\}$. This result resolves a problem posed by Eric Allender regarding the computational power of Kolmogorov complexity-based oracles.

Prover-Adversary games for systems over (non-deterministic) branching programs

2026-05-19T09:25:56Z

We introduce Pudlák-Buss style Prover-Adversary games to characterise proof systems reasoning over deterministic branching programs (BPs) and non-deterministic branching programs (NBPs). Our starting points are the proof systems eLDT and eLNDT, for BPs and NBPs respectively, previously introduced by Buss, Das and Knop. We prove polynomial equivalences between these proof systems and the corresponding games we introduce. This crucially requires access to a form of negation of branching programs which, for NBPs, requires us to formalise a non-uniform version of the Immerman-Szelepcsényi theorem that coNL = NL. Thanks to the techniques developed, we further obtain a proof complexity theoretic version of Immerman-Szelepcsényi, showing that eLNDT is polynomially equivalent to systems over boundedly alternating branching programs.

The Proof Analysis Problem

2026-05-19T08:26:37Z

Atserias and Müller (JACM, 2020) proved that for every unsatisfiable CNF formula $\varphi$, the formula $\operatorname{Ref}(\varphi)$, stating "$\varphi$ has small Resolution refutations", does not have subexponential-size Resolution refutations. Conversely, when $\varphi$ is satisfiable, Pudlák (TCS, 2003) showed how to construct a polynomial-size Resolution refutation of $\operatorname{Ref}(\varphi)$ given a satisfying assignment of $\varphi$. A question that remained open is: do all short Resolution refutations of $\operatorname{Ref}(\varphi)$ explicitly leak a satisfying assignment of $\varphi$? We answer this question affirmatively by giving a polynomial-time algorithm that extracts a satisfying assignment for $\varphi$ given any short Resolution refutation of $\operatorname{Ref}(\varphi)$. The algorithm follows from a new feasibly constructive proof of the Atserias-Müller lower bound, formalizable in Cook's theory $\mathsf{PV_1}$ of bounded arithmetic. Motivated by this, we introduce a computational problem concerning Resolution lower bounds: the Proof Analysis Problem (PAP). For a proof system $Q$, the Proof Analysis Problem for $Q$ asks, given a CNF formula $\varphi$ and a $Q$-proof of a Resolution lower bound for $\varphi$, encoded as $\neg \operatorname{Ref}(\varphi)$, whether $\varphi$ is satisfiable. In contrast to PAP for Resolution, we prove that PAP for Extended Frege (EF) is NP-complete. Our results yield new insights into proof complexity: (i) every proof system simulating EF is (weakly) automatable if and only if it is (weakly) automatable on formulas stating Resolution lower bounds; (ii) we provide Ref formulas exponentially hard for bounded-depth Frege systems; and (iii) for every strong enough theory of arithmetic $T$ we construct unsatisfiable CNF formulas exponentially hard for Resolution but for which $T$ cannot prove even a quadratic lower bound.