One-Shot Klein Cutting Planes for Lipschitz Geodesically Convex Optimization in Hyperbolic Space

2026-05-28T19:47:08Z

Motivated by the COLT 2023 open problem of Criscitiello, Martínez-Rubio, and Boumal on deterministic first-order methods for Lipschitz geodesically convex optimization on Hadamard manifolds, we study hyperbolic space \[ \mathbb{H}^d_{-\kappa^2} =\{X\in\mathbb{R}^{d+1}:\langle X,X\rangle_{\mathrm{L}}=-1,\ X_0>0\}, \qquad \langle U,V\rangle_X=\kappa^{-2}\langle U,V\rangle_{\mathrm{L}}. \] For every geodesically convex $M$-Lipschitz function \[ f:\bar B_{\mathbb{H}}(x_0,r)\to\mathbb{R},\qquad s=\kappa r, \] we give a one-shot Klein cutting-plane method returning a queried point $\hat x$ such that \[ f(\hat x)-\min_{\bar B_{\mathbb{H}}(x_0,r)}f\le \varepsilon Mr \] after at most \[ \left\lceil 2d(d+1)\log\!\left(\frac{16\sinh s\cosh s}{s\varepsilon}\right) \right\rceil \] oracle calls. For $d\ge2$, each localization step costs $O(d^2)$ arithmetic operations; for $d=1$, an interval variant gives the same oracle bound. Hence the number of oracle calls is \[ N=O\bigl(d^2(s+\log(e/\varepsilon))\bigr) =O\bigl(d^2\zeta_s\log(e/\varepsilon)\bigr), \qquad \zeta_s=s/\tanh s . \] Compared with the constant-curvature construction associated with the COLT problem, this replaces chained curvature--accuracy dependence by additive dependence. The proof does not rely on convexity of the Klein pullback, which is generally only quasiconvex. Instead, every Riemannian subgradient halfspace becomes an exact Euclidean central cut: for $\theta=\kappa\operatorname{dist}(X,Y)$, \[ \langle g,\log_XY\rangle_X =\frac{\theta}{\kappa^2\sinh\theta}\langle g,Y\rangle_{\mathrm{L}}, \] and tangency at $X$ converts $\langle g,Y\rangle_{\mathrm{L}}\le0$ into \[ \bar g^{\mathsf T}(u-c)\le0,\qquad u=\Phi(Y),\ c=\Phi(X). \] Thus one fixed Euclidean ellipsoid localizes the hyperbolic ball, and curvature enters only through \[ \log\!\left(\frac{\sinh s\cosh s}{s\varepsilon}\right) =\log(1/\varepsilon)+2s-\log(4s)+O(e^{-4s}). \] The general Hadamard-manifold problem remains open.
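
The two identities behind the cut, and the Euclidean update they feed, can be checked numerically. The following Python sketch assumes NumPy and uses hypothetical helper names (`minkowski`, `lift`, `klein`, `log_map`, `cut_residuals`, `central_cut`, `oracle_budget`). It takes $\bar g$ to be the spatial part $(g_1,\dots,g_d)$ of $g$, which is our assumption, and uses the textbook central-cut ellipsoid step as a stand-in for the paper's localization step, whose exact form may differ.

```python
import math
import numpy as np

def minkowski(u, v):
    """Lorentzian inner product <u, v>_L = -u_0 v_0 + sum_i u_i v_i."""
    return -u[0] * v[0] + u[1:] @ v[1:]

def lift(y):
    """Point of the unit hyperboloid <X, X>_L = -1, X_0 > 0, above y in R^d."""
    return np.concatenate(([math.sqrt(1.0 + y @ y)], y))

def klein(X):
    """Klein map Phi(X) = (X_1, ..., X_d) / X_0 into the open unit ball."""
    return X[1:] / X[0]

def log_map(X, Y):
    """Ambient log map for X != Y; returns (log_X Y, theta).

    cosh(theta) = -<X, Y>_L and log_X Y = theta / sinh(theta) * (Y - cosh(theta) X).
    Scaling the metric by kappa^{-2} leaves this ambient vector unchanged and
    makes dist(X, Y) = theta / kappa.
    """
    c = -minkowski(X, Y)
    theta = math.acosh(c)
    return theta / math.sinh(theta) * (Y - c * X), theta

def cut_residuals(X, Y, w, kappa):
    """Residuals of both identities for the tangent vector g = proj_X(w)."""
    g = w + minkowski(w, X) * X                  # now <g, X>_L = 0
    v, theta = log_map(X, Y)
    lhs = minkowski(g, v) / kappa**2             # <g, log_X Y>_X
    rhs = theta / (kappa**2 * math.sinh(theta)) * minkowski(g, Y)
    gbar = g[1:]                                 # assumed choice of g-bar
    cut = Y[0] * (gbar @ (klein(Y) - klein(X)))  # should equal <g, Y>_L
    return abs(lhs - rhs), abs(cut - minkowski(g, Y))

def central_cut(c, P, a):
    """Textbook central-cut ellipsoid step in R^d, d >= 2, with O(d^2) work.

    Returns the minimum-volume ellipsoid containing
    {u : (u - c)^T P^{-1} (u - c) <= 1 and a^T (u - c) <= 0}.
    """
    d = c.shape[0]
    b = P @ a / math.sqrt(a @ P @ a)
    c_new = c - b / (d + 1)
    P_new = d**2 / (d**2 - 1.0) * (P - 2.0 / (d + 1) * np.outer(b, b))
    return c_new, P_new

def oracle_budget(d, s, eps):
    """Oracle-call bound stated in the abstract."""
    return math.ceil(2 * d * (d + 1)
                     * math.log(16 * math.sinh(s) * math.cosh(s) / (s * eps)))
```

Because $Y_0>0$, a vanishing second residual means that $\langle g,Y\rangle_{\mathrm{L}}\le0$ holds exactly when $\bar g^{\mathsf T}(u-c)\le0$. Any tangent vector at $X$ works for the check; the method itself would use a Riemannian subgradient of $f$ at $X$.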

Faster PBWT prefix-array access via batching

2026-05-28T18:39:41Z

The positional Burrows-Wheeler Transform (PBWT) is commonly used to store haplotype panels compactly in such a way that, given a query haplotype, we can quickly find the set maximal exact matches (SMEMs) between the query and the haplotypes in a panel. There are generally two steps in this process: first we find the maximal substrings of the query that occur in the same positions in haplotypes in the panel and then, for each such substring, report the haplotypes in the panel in which the substring occurs in the same position as in the query. Very recently, Bonizzoni, Gagie and Gao (2026) gave two time-space tradeoffs for the second step: they use either $O ((r + h) \log n)$ bits and $O (\log \log \min (h, \ell) + k)$ time to report $k$ haplotypes in the panel, or $O (r \log h + h \log n)$ bits and $O (k \log \log h)$ time, where $r$ is the number of runs in the panel's PBWT and $h$, $\ell$ and $n = h \ell$ are the panel's height, length and size, respectively. We observe here that if we can batch queries until we have found $r \lg (h) / \lg r$ such substrings and we report an average of at least $\lg (r) / \lg h$ haplotypes in the panel per substring, for example, then for the second step we can easily use $O (r \log h)$ bits and constant time to report each haplotype. Our approach is based on an algorithm for constructing the prefix arrays quickly from the PBWT, which may be of independent interest.
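
For intuition about the prefix arrays mentioned above, here is a minimal Python sketch of the classic column-by-column construction in the style of Durbin's original PBWT paper. It works from the panel itself, not from the run-length PBWT as in the faster algorithm the abstract describes. The names `prefix_arrays` and `report` are hypothetical, and `report` assumes the first step has already produced the block `[lo, hi)` of haplotypes matching a query substring.

```python
def prefix_arrays(panel):
    """Positional prefix arrays of a binary haplotype panel.

    panel: list of h equal-length strings over {'0', '1'}, one per haplotype.
    Returns ell + 1 arrays; arrays[k] lists haplotype indices sorted by their
    length-k prefixes read right to left (reverse lexicographic order).
    """
    h = len(panel)
    ell = len(panel[0]) if h else 0
    a = list(range(h))
    arrays = [a]
    for k in range(ell):
        # Stable partition by the allele at column k: one radix-sort pass,
        # so ties keep the order given by the previous (shorter) reversed prefixes.
        zeros = [i for i in a if panel[i][k] == "0"]
        ones = [i for i in a if panel[i][k] == "1"]
        a = zeros + ones
        arrays.append(a)
    return arrays

def report(arrays, k, lo, hi):
    """Haplotypes in the contiguous block arrays[k][lo:hi] (hypothetical helper)."""
    return list(arrays[k][lo:hi])
```

Haplotypes that share a substring ending at column $k-1$ occupy a contiguous block of `arrays[k]`, which is why, once the prefix arrays are available, reporting reduces to reading a slice.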

On Language Generation in the Limit with Bounded Memory

2026-05-28T17:57:03Z

We study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language one at a time and must eventually output only new valid examples. Prior work assumes access to the entire history, a strong assumption since realistic algorithms retain limited past information. Classical work in learning theory shows that memory constraints dramatically alter learnability; we extend this line of work to language generation. First, we study memoryless generators. Under a mild enumeration restriction, every countable collection of infinite languages remains generable without memory. Without this restriction, we exactly characterize when memoryless generation is possible. For finite collections, we characterize the optimal minimax density achievable by memoryless generators -- the best density guaranteed against any collection of a given size. This combinatorial bound relies on Sperner's theorem and symmetric chain decompositions. We further show that a sliding window of the last $W$ examples does not improve this worst-case density, whereas allowing the generator to store $b$ adaptively chosen past examples improves the achievable density for every $b \geq 1$. Finally, we revisit identification in the limit, where the learner must converge to a single correct hypothesis for the target language. We focus on its incremental variant, where the learner remembers only its previous guess. Here, although exact identification fails on a collection of just three languages, a mild relaxation requiring convergence to an ``approximate'' version of the target is achievable for every finite collection. These results show that bounded memory affects these tasks differently: generation remains achievable for every countable collection, while density and identification are confined to finite collections, with guarantees weakening as the collection grows.

Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion

2026-05-28T17:55:23Z

A central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit?" rather than only how it affects units on average. We study this problem with panel data, where we observe $n$ units across $m$ times under unknown, non-uniform treatment assignments. The data in this setting is naturally represented as a matrix of all unit--time treatment effects. Estimating heterogeneous treatment effects can then be expressed as obtaining a good estimate of each row's average in this matrix. This allows us to formulate the problem as matrix completion, which can be solved under natural low-rankness assumptions. However, existing matrix-completion guarantees are not powerful enough to give meaningful bounds for the per-row guarantee required for estimating the heterogeneous treatment effect; roughly speaking, they are only useful for bounding the average treatment effect, as also illustrated in a recent line of work. We give a simple, computationally efficient estimator that, without knowledge of the propensities and under standard low-rankness and regularity assumptions, achieves a row-wise $\ell_2$ error of $\tilde{O}(\sqrt{\frac{1}{n} + \frac{n}{m^2}})$. Technically, our analysis establishes the first sharp row-wise $\ell_2$-perturbation bound for low-rank approximation, complementing existing spectral-, Frobenius-, and entrywise perturbation theory.
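
The reduction from per-unit effects to row averages of a low-rank matrix can be illustrated with a deliberately naive Python sketch (NumPy assumed; `row_means_via_low_rank` is a hypothetical name). It assumes uniformly random observation with an estimated rate, which is exactly the assumption the abstract's estimator avoids, so it only shows the shape of the pipeline: zero-fill, rescale, truncate the SVD, and average each row.

```python
import numpy as np

def row_means_via_low_rank(Y, mask, rank):
    """Estimate each row's average of a partially observed matrix.

    Naive sketch: assumes entries are observed uniformly at random, so the
    zero-filled matrix rescaled by 1 / p_hat is unbiased. Unknown, non-uniform
    propensities (the setting of the abstract) are NOT handled here.
    """
    p_hat = mask.mean()                              # fraction of observed entries
    Z = np.where(mask, Y, 0.0) / p_hat               # zero-fill and rescale
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]         # best rank-`rank` approximation
    return L.mean(axis=1)                            # per-row (per-unit) averages
```

With every entry observed and an exactly rank-`rank` matrix, the truncated SVD reproduces the matrix and the estimates equal the true row means; with missing entries the per-row error is what a row-wise guarantee such as the one above controls.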

Algorithms with Polynomially-Improved Approximation Factors for the $2 \rightarrow q$ Norm, and Applications

2026-05-28T17:16:43Z

The $2 \rightarrow q$ norm of a matrix $X \in \mathbb{R}^{n \times d}$ is defined as $\lVert X \rVert_{2 \rightarrow q} = \sup_{\lVert v \rVert_2 = 1} \lVert Xv \rVert_q$. We give polynomial-time multiplicative approximation algorithms for this norm when $q > 2$ (i.e. in the hypercontractive setting). This problem either directly captures or is closely related to long-standing open problems in combinatorial optimization and hardness of approximation (e.g. Small Set Expansion), quantum information (e.g. Best Separable State), and algorithmic statistics. Very little is known about what approximation factors we can achieve for this problem in polynomial time, even though such approximations have significant downstream consequences. Barak, Brandão, Harrow, Kelner, Steurer, and Zhou showed that no polynomial-time algorithm can achieve an approximation factor better than $2^{\sqrt{\log n}}$, assuming the Exponential Time Hypothesis (FOCS'12). On the other hand, a simple spectral algorithm gives a $d^{1/4}$-approximation as a baseline. We give, to the best of our knowledge, the first polynomial-time approximation algorithm beating this baseline by polynomial factors. For the important special case of $q = 4$ it achieves a $d^{1/8}$-approximation. All previous algorithms required additional assumptions on $X$, or only surpassed the baseline for small values of $n$. Moreover, we construct sum-of-squares certificates for the $2 \rightarrow q$ norm. This directly implies improved algorithms for robust mean and covariance estimation, robust regression, and clustering, when the data only satisfies a bound on its $q$-th moment.
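
To make the definition concrete, the following Python sketch (NumPy assumed; `two_to_q_bounds` is a hypothetical name) computes two elementary bounds valid for every $q\ge2$: the spectral norm from above, since $\lVert Xv\rVert_q\le\lVert Xv\rVert_2$, and from below the larger of the largest row norm and the value of $\lVert Xv\rVert_q$ at the top right singular vector. It is neither the paper's algorithm nor the $d^{1/4}$ spectral baseline, and the gap between the two bounds can be large.

```python
import numpy as np

def two_to_q_bounds(X, q):
    """Return (lower, upper) with lower <= ||X||_{2->q} <= upper, for q >= 2."""
    # Upper bound: ||Xv||_q <= ||Xv||_2 <= sigma_max(X) for unit v.
    upper = np.linalg.norm(X, 2)
    # Lower bound 1: v = x_i / ||x_i|| gives ||Xv||_q >= |<x_i, v>| = ||x_i||_2.
    row_lb = np.linalg.norm(X, axis=1).max()
    # Lower bound 2: evaluate the objective at the top right singular vector.
    _, _, Vt = np.linalg.svd(X)
    svd_lb = np.linalg.norm(X @ Vt[0], ord=q)
    return max(row_lb, svd_lb), upper
```

For the identity matrix both bounds equal 1, which is the exact value of $\lVert I\rVert_{2\to q}$.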

Parse indexing for discarding short pseudo-MEMs safely

2026-05-28T17:10:38Z

Brown et al. (2025) described a pre-processing step, called $k$-mer based breaking (KeBaB), that speeds up searching for long maximal exact matches (MEMs) between a pattern $P$ and an indexed repetitive text $T$. KeBaB produces a set of substrings of $P$ called pseudo-MEMs that often have total length much less than $|P|$ but are still guaranteed to contain all the MEMs of length at least a fixed parameter $k$. Brown et al. found that KeBaB can be particularly effective when we discard all but the longest pseudo-MEMs -- but then we risk also discarding the longest MEMs! In this paper we show how we can use parse indexing to generate pseudo-MEMs together with lower bounds on the lengths of the longest MEMs they must contain, allowing us to discard short pseudo-MEMs safely.
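
A simplified Python sketch of $k$-mer based breaking, as we understand the idea from the description above: any MEM of length at least $k$ has all of its $k$-mers in $T$, so it lies inside a maximal run of consecutive $k$-mer hits in $P$. An exact Python set stands in for whatever membership structure KeBaB actually uses, the name `pseudo_mems` is hypothetical, and the sketch does not compute the parse-index lower bounds that this paper contributes.

```python
def pseudo_mems(P, T, k):
    """Half-open intervals [start, end) of P in which every k-mer occurs in T.

    Any MEM of length >= k between P and T lies inside one returned interval.
    """
    kmers = {T[i:i + k] for i in range(len(T) - k + 1)}
    intervals = []
    start = end = None
    for i in range(len(P) - k + 1):
        if P[i:i + k] in kmers:
            if start is None:          # a new run of k-mer hits begins
                start = i
            end = i + k                # the run now covers P[start:i + k]
        elif start is not None:        # a miss closes the current run
            intervals.append((start, end))
            start = None
    if start is not None:
        intervals.append((start, end))
    return intervals
```

For example, `pseudo_mems("ACGTTTACG", "ACGTACGT", 3)` yields the intervals `(0, 4)` and `(5, 9)`, i.e. `ACGT` and `TACG`. Keeping only the longest intervals, e.g. `sorted(iv, key=lambda t: t[1] - t[0], reverse=True)[:n]`, is the kind of filtering whose safety the lower bounds are meant to certify.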

Quantum Algorithms on Edge Lists: Hiding, Shuffling, and Cycle Finding

2026-05-28T17:05:56Z

The edge list model is arguably the simplest input model for graphs, where the graph is specified by a list of its edges. In this model, we study the quantum query complexity of three variants of the triangle finding problem. The first asks whether there exists a triangle containing a target edge and raises general questions about the hiding of a problem's input among irrelevant data. The second asks whether there exists a triangle containing a target vertex and raises general questions about the shuffling of a problem's input. The third asks whether there exists a triangle; this problem bridges the $3$-distinctness and $3$-sum problems, which have been extensively studied by both cryptographers and complexity theorists. We provide tight or nearly tight results for these problems as well as some first answers to the general questions they raise. Furthermore, given any graph with low maximum degree, such as a typical random sparse graph, we prove that the quantum query complexity of finding a length-$k$ cycle in its length-$m$ edge list is $m^{3/4-1/(2^{k+2}-4)\pm o(1)}$, which matches the best-known upper bound for the quantum query complexity of $k$-distinctness on length-$m$ inputs up to an $m^{o(1)}$ factor. We prove the lower bound by developing new techniques within Zhandry's recording query framework [CRYPTO '19] as generalized by Hamoudi and Magniez [ToCT '23]. These techniques extend the framework to treat any non-product distribution that results from conditioning a product distribution on the absence of rare events. We prove the upper bound by adapting Belovs's learning graph algorithm for $k$-distinctness [FOCS '12]. Finally, assuming a plausible conjecture concerning only cycle finding, we show that the lower bound can be lifted to an essentially tight lower bound on the quantum query complexity of $k$-distinctness, which is a long-standing open question.
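
As a quick arithmetic check of the exponent $3/4-1/(2^{k+2}-4)$, the following Python snippet (the function name is hypothetical) evaluates it exactly with `fractions.Fraction`.

```python
from fractions import Fraction

def cycle_exponent(k):
    """Exponent 3/4 - 1/(2^(k+2) - 4) appearing in m^(exponent +/- o(1))."""
    return Fraction(3, 4) - Fraction(1, 2 ** (k + 2) - 4)

for k in range(3, 7):
    print(k, cycle_exponent(k))
```

For $k=3$ this gives $5/7$, for $k=4$ it gives $11/15$, and the exponent increases toward $3/4$ as $k$ grows. At $k=2$ the same formula gives $2/3$, the known exponent for element distinctness.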

Smallest Suffixient Sets: Effectiveness, Resilience, and Calculation

2026-05-28T15:52:57Z

A suffixient set is a novel combinatorial object that captures the essential information of repetitive strings in a way that, provided with a random access mechanism, supports various forms of pattern matching. In this paper, we study the size $χ$ of the smallest suffixient set as a repetitiveness measure. First, we study its sensitivity to various string operations. We show that $χ$ cannot increase by more than 2 after appending or prepending a character to the string. As a consequence, we are able to give simple linear-time online algorithms to compute smallest suffixient sets. We also show that, although reversing the string can increase $χ$ by an arbitrary $O(n)$ value, it always holds that $χ(T)/χ(T^R)\le 2$. We also prove lower and upper bounds for the additive or multiplicative increase of $χ$ after applying arbitrary edit operations, or rotating the text. In particular, we show that the additive increase can be as large as $Ω(\sqrt{n})$ for all those operations. Secondly, we place $χ$ in between known repetitiveness measures. In particular, we show $χ= O(r)$ (where $r$ is the number of runs in the Burrows-Wheeler Transform of the string), that there are string families where $χ=o(v)$ (where $v$ is the size of the smallest lexicographic parse of the string), and that $χ$ is incomparable to almost all reachable measures based on copy-paste mechanisms. In passing, we give precise bounds for $χ$ for some relevant string families, for example $χ\le σ+2$ on episturmian words over alphabets of size $σ$ (e.g., $χ\le 4$ on Fibonacci strings, for which we precisely characterize the only two smallest suffixient sets).

Low-degree estimation thresholds in planted hypergraphs and tensor PCA

2026-05-28T15:49:59Z

A central question in high-dimensional statistics is to understand statistical--computational gaps: regimes in which recovering a hidden signal is information-theoretically possible but conjectured to be computationally intractable. The low-degree framework offers a concrete way to study this gap by restricting attention to estimators that are polynomials of degree at most $D$ in the observed data. In this paper, we study low-degree estimation in planted dense subhypergraph, sparse tensor PCA, and tensor PCA with a general prior. For the planted dense subhypergraph model on $n$ vertices, we identify two regimes depending on whether the planted set is larger or smaller than $\sqrt{n}$. Above this scale, we identify a sharp threshold for low-degree estimation. Below this scale, we establish hardness in the regimes predicted by prior work, thereby resolving a question of Schramm and Wein (2022) and Sohn and Wein (2025). For sparse tensor PCA, we identify an analogous sharp phase transition. For tensor PCA with a general prior, we prove a low-degree estimation lower bound at the critical signal scale, matching the degree--signal tradeoff suggested by prior work. Our lower bounds apply to degree $D=n^δ$, where $n$ is the dimension and $δ>0$ is a constant, and we complement them with corresponding low-degree upper bounds. In addition, for planted dense subhypergraph and sparse tensor PCA above the $\sqrt{n}$ scale, we convert our upper bounds into polynomial-time algorithms that achieve almost exact recovery whenever the signal lies above the sharp threshold. Our proofs extend the framework of Sohn and Wein (2025) through a conditional variant that yields the correct signal-to-noise ratio in settings where the unconditional approach is insufficient.

A Radius-Sensitive Approximation Algorithm for Connected Submodular Maximization

2026-05-28T15:05:08Z

Connected Submodular Maximization (CSM) is a graph problem with important applications to wireless network deployment, path planning, epidemic outbreaks, and cancer genome studies. In CSM, we are given a graph $G$, a non-negative monotone submodular function $f$ on subsets of the vertex set of $G$, and an integer $k$. The goal is to select a tree in $G$, with $k$ edges, whose vertex set maximizes $f$. We also study the more general Directed and Directed Rooted variants of CSM (DCSM and DRCSM respectively). In both variants, $G$ is directed and the solution must be an out-tree in $G$, with $k$ edges, whose vertex set maximizes $f$; DRCSM further specifies a vertex to be the root of the selected out-tree. For CSM, several previous works have proposed polynomial-time approximation algorithms; the state-of-the-art polynomial-time algorithm achieves a $Ω(\frac{1}{\sqrt{k}})$-approximation. We can also parameterize the approximation factor by the radius of the optimal solution, denoted by $r$; the state-of-the-art polynomial-time algorithm achieves a $Ω(\frac{1}{r})$-approximation. In this paper, we improve on the state-of-the-art approximation factor for CSM with respect to $r$ as well as $k$, noting that $r \leq k$. We propose a polynomial-time framework that, for (Directed) CSM, achieves a $Ω(\frac{\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation for every constant $\varepsilon \in (0, 1]$. For DRCSM, our framework achieves a $Ω(\frac{δ\varepsilon^{3}}{{r}^{\varepsilon}})$-approximation that violates the size constraint by at most a factor of $1 + δ$ for every $δ\in [\frac{1}{k}, 1]$. A key component of our framework is GreedyRadius, which is an algorithm for DRCSM that takes another algorithm with a bicriteria approximation factor in terms of $k$ and outputs a solution with the same bicriteria approximation factor (up to constants) in terms of $r$.
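
To make the problem statement concrete, here is a Python sketch with a coverage function as the monotone submodular $f$ and a naive rooted greedy that grows a tree one edge at a time by marginal gain. All names are hypothetical, no approximation guarantee is claimed, and this is not the paper's GreedyRadius; if `adj` holds out-neighbour lists, the same code grows a rooted out-tree as in DRCSM.

```python
def coverage(S, cover):
    """f(S) = size of the union of cover[v] over v in S (monotone submodular)."""
    return len(set().union(*(cover[v] for v in S)))

def greedy_rooted_tree(adj, cover, root, k):
    """Naive baseline: grow a tree from `root` by adding up to k edges.

    Each step attaches the outside neighbour with the largest marginal gain in
    f, breaking ties by the smallest vertex id. Not the paper's GreedyRadius.
    """
    tree, edges = {root}, []
    for _ in range(k):
        base = coverage(tree, cover)
        best = None                        # (gain, parent, child)
        for u in sorted(tree):
            for v in sorted(adj[u]):
                if v in tree:
                    continue
                gain = coverage(tree | {v}, cover) - base
                if best is None or gain > best[0] or (gain == best[0] and v < best[2]):
                    best = (gain, u, v)
        if best is None:                   # no edge leaves the current tree
            break
        tree.add(best[2])
        edges.append((best[1], best[2]))
    return tree, edges
```

For example, with `adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}`, `cover = {0: {"a"}, 1: {"a"}, 2: {"b", "c"}, 3: {"b"}}`, root 0 and k = 2, the first step has zero marginal gain but is forced, and the second step adds vertex 2, giving edges `[(0, 1), (1, 2)]` and coverage 3. Such myopic zero-gain steps are one reason connected objectives need more than plain greedy.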

Elfs, transducers and quantum walks

2026-05-28T14:39:07Z

Electric flow sampling (elfs) is a new tool in the quantum walk toolbox and a useful primitive for solving search, sampling and optimization problems on graphs. We refine this tool by showing that there exists a zero-error transducer for implementing elfs. More broadly, we establish a zero-error transducer for reflecting about the intersection of two subspaces, yielding an error-free transducer version of the effective gap lemma. Building on this result, we obtain improved quantum walk algorithms for estimating effective resistances and span program witness sizes with an optimal error scaling, and for sampling from the random walk arrival distribution, via the composition of many elfs. Using this last algorithm, we obtain an up-to-quadratic quantum speedup for semi-supervised learning on expander graphs.
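
Classically, the effective resistances mentioned above are quadratic forms in the Laplacian pseudoinverse. The following NumPy sketch (hypothetical function name, dense linear algebra, small connected graphs only) computes $R_{\mathrm{eff}}(s,t)=(e_s-e_t)^{\mathsf T}L^{+}(e_s-e_t)$ together with the unit $s$-$t$ electric flow. It is a classical reference point, not a quantum algorithm and not the elfs primitive itself.

```python
import numpy as np

def effective_resistance(n, edges, s, t):
    """Return (R_eff(s, t), unit s-t electric flow on each edge).

    edges: list of (u, v, w) with conductance w > 0; the graph is assumed
    connected. Flow on edge (u, v) is positive in the direction u -> v.
    """
    L = np.zeros((n, n))                      # weighted graph Laplacian
    for u, v, w in edges:
        L[u, u] += w
        L[v, v] += w
        L[u, v] -= w
        L[v, u] -= w
    chi = np.zeros(n)
    chi[s], chi[t] = 1.0, -1.0                # inject one unit at s, extract at t
    phi = np.linalg.pinv(L) @ chi             # vertex potentials
    flows = [w * (phi[u] - phi[v]) for u, v, w in edges]  # Ohm's law
    return float(chi @ phi), flows
```

On the path 0-1-2 with unit conductances this gives $R_{\mathrm{eff}}(0,2)=2$ and one unit of flow on each edge; on a unit-conductance triangle any two vertices are at effective resistance $2/3$.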

Quadratic Sums-of-Powers for Fixed-Parameter Tractable Quantum-Circuit Simulation

2026-05-28T13:54:42Z

Strongly simulating a quantum circuit, that is, computing an output amplitude, amounts to summing the circuit's Feynman paths, a weighted count over assignments to the Boolean ``path'' variables. The circuit's gates induce correlations among these variables, forming a graph whose structure determines the hardness of the simulation task. This sum-of-powers viewpoint underlies recent simulators built on knowledge-representation tools from artificial intelligence, namely binary decision diagrams and weighted model counting. We show that the structural quantity most accurately governing the difficulty is the rank-width of the path-variable graph, and we give an algorithm that evaluates the amplitude in time that is exponential only in this rank-width and polynomial in the circuit size. Rank-width can be far smaller than the widths that control competing methods: as corollaries, our algorithm reproduces a recent decision-diagram simulation breakthrough as a special case and matches the Markov--Shi tensor-network contraction bound. To complement this, we exhibit circuit families on which our algorithm provably beats both competing methods. The new method applies to every circuit built from Hadamard and diagonal gates, in particular to circuits over Clifford+T. In practical terms, general-purpose decision-diagram and model-counting tools can serve as the workhorse, with our specialized algorithm dispatched to exploit a small rank-width of the associated graph when it is present.
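
The sum-over-paths view can be made concrete for circuits of Hadamard and diagonal gates. The Python sketch below (NumPy assumed; the gate encoding and function names are hypothetical) sums over all assignments to the variables introduced by H gates, each H on a wire holding value $a$ contributing a fresh variable $b$ with weight $(-1)^{ab}/\sqrt2$, and compares the result with a dense state-vector simulation. Its cost is exponential in the number of H gates; it does not implement the rank-width-based algorithm of the abstract.

```python
import cmath
import itertools
import math
import numpy as np

# Diagonal single-qubit gates diag(1, e^{i * angle}).
PHASES = {"Z": math.pi, "S": math.pi / 2, "T": math.pi / 4}

def path_sum_amplitude(gates, x, y):
    """<y|C|x> by brute-force summation over H-gate path variables."""
    h = sum(g[0] == "H" for g in gates)
    total = 0j
    for bits in itertools.product((0, 1), repeat=h):
        wires, angle, nxt = list(x), 0.0, 0
        for g in gates:
            if g[0] == "H":               # fresh variable b, weight (-1)^(a*b)
                b = bits[nxt]
                nxt += 1
                angle += math.pi * wires[g[1]] * b
                wires[g[1]] = b
            elif g[0] == "CZ":            # phase (-1)^(a1*a2)
                angle += math.pi * wires[g[1]] * wires[g[2]]
            else:                         # diagonal single-qubit phase
                angle += PHASES[g[0]] * wires[g[1]]
        if wires == list(y):              # path must end in the requested output
            total += cmath.exp(1j * angle)
    return total / math.sqrt(2) ** h

def statevector_amplitude(n, gates, x, y):
    """Reference dense simulation; qubit q is bit q of the basis index."""
    idx = np.arange(2 ** n)
    psi = np.zeros(2 ** n, dtype=complex)
    psi[sum(b << q for q, b in enumerate(x))] = 1.0
    for g in gates:
        q = g[1]
        bit = (idx >> q) & 1
        if g[0] == "H":
            lo, hi = psi[idx & ~(1 << q)], psi[idx | (1 << q)]
            psi = (lo + np.where(bit == 1, -1, 1) * hi) / math.sqrt(2)
        elif g[0] == "CZ":
            psi = psi * np.where(bit & ((idx >> g[2]) & 1), -1, 1)
        else:
            psi = psi * np.exp(1j * PHASES[g[0]] * bit)
    return psi[sum(b << q for q, b in enumerate(y))]
```

For instance, `path_sum_amplitude([("H", 0)], (1,), (1,))` returns $-1/\sqrt2$, the $\langle1|H|1\rangle$ entry.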

Min-Sum Set Cover on Parallel Machines

2026-05-28T13:44:35Z

Consider the classical Min-Sum Set Cover problem: We are given a universe $\mathcal{U}$ of $n$ elements and a collection $\mathcal{S}$ of $k$ subsets of $\mathcal{U}$. Moreover, a cost is associated with each set. The goal is to find a subsequence of sets in $\mathcal{S}$ that covers all elements in $\mathcal{U}$, such that the sum of the covering times of the elements is minimized. The covering time of an element $u$ is the total cost of the sets in the sequence up to and including the first set that covers $u$. This problem can be seen as a scheduling problem on a single machine, where each job represents a set and elements are represented by some kind of utility that is required to be provided by at least one of the jobs. The goal is to schedule the jobs so as to minimize the sum of provision times of the utilities. In this paper we consider a natural generalization of this problem to the case of $m$ machines, processing the jobs in parallel. We call this problem Parallel Min-Sum Set Cover. To obtain approximation algorithms for both related and unrelated machines, we use a crucial subproblem which we call Parallel Maximum Coverage. We give a randomized bicriteria $(1-1/e-ε, O(\log m/\log\log m))$-approximation algorithm for this problem based on a natural LP relaxation. This can then be used to obtain an $O(\log m/\log\log m)$-approximation algorithm for the Min-Sum Set Cover problem on unrelated machines. For related machines, we allow the aforementioned bicriteria approximation algorithm to run in FPT time and apply a technique that transforms a related-machines instance into one consisting of $O(\log m)$ unrelated machines, obtaining an $\frac{8e}{e-1}+ε<12.66$-approximation algorithm for this case. We also give a greedy algorithm for unit-cost sets subject to precedence constraints, with an $O(k^{2/3})$ approximation ratio.
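
The single-machine objective defined above, together with the classical greedy rule (pick the set covering the most uncovered elements per unit cost), can be sketched in Python as follows. The names are hypothetical. For unit costs this rule is the greedy algorithm of Feige, Lovász and Tetali, known to be a 4-approximation; the sketch says nothing about the parallel-machine setting studied in the paper, and it assumes the sets together cover the universe.

```python
def covering_cost(order, sets, cost):
    """Sum over elements of the cumulative cost up to the first set covering it."""
    covered, elapsed, total = set(), 0, 0
    for name in order:
        elapsed += cost[name]              # completion time of this set
        new = sets[name] - covered
        total += elapsed * len(new)        # newly covered elements finish now
        covered |= new
    return total

def greedy_order(universe, sets, cost):
    """Repeatedly pick the set with the most newly covered elements per unit cost."""
    covered, order = set(), []
    while covered != universe:
        name = max((s for s in sets if sets[s] - covered),
                   key=lambda s: len(sets[s] - covered) / cost[s])
        order.append(name)
        covered |= sets[name]
    return order
```

For example, with sets A = {1, 2, 3} and B = {4} of cost 1 and C = {1, 2, 3, 4} of cost 3, the greedy order is A, B, with total covering time 3*1 + 1*2 = 5, while choosing C alone gives 4*3 = 12.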

Selection Hyper-heuristics Can Automatically Adjust the Learning Period to Optimally Solve Pseudo-Boolean Problems

2026-05-28T13:31:16Z

The Random Gradient hyper-heuristic was recently shown to be able to learn the optimal neighbourhood size when optimizing the LeadingOnes benchmark via the Randomised Local Search (RLS) meta-heuristic. However, for this to happen, a learning period of a certain length $τ$ had to be used, unlike classic hyper-heuristics, which change their behaviour based on the success of only the previous iteration. In this paper, we show how to automatically set this new parameter value, relieving the user of the non-trivial task of controlling this novel algorithm parameter. We prove that the resulting hyper-heuristic selects the optimal neighbourhood size in a $1-o(1)$ fraction of the iterations and, consequently, optimises the LeadingOnes benchmark in the best possible time (apart from lower-order terms) achievable with these neighbourhood sizes.
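
A minimal Python sketch of the setting: LeadingOnes, RLS that flips exactly $k$ distinct bits, and a Random-Gradient-style controller that keeps the current neighbourhood size while it produces a strict improvement within $τ$ iterations and otherwise redraws it uniformly. This controller rule is our assumption about the Random Gradient hyper-heuristic, the names are hypothetical, and $τ$ is fixed by hand here, which is precisely the parameter the paper sets automatically.

```python
import random

def leading_ones(x):
    """Number of consecutive 1-bits at the start of x."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def random_gradient_rls(n, ks=(1, 2), tau=50, seed=0, max_iters=10**6):
    """RLS on LeadingOnes with a Random-Gradient-style choice of k (assumed rule).

    Flips exactly k distinct bits per step and accepts if fitness does not drop.
    k is kept while it yields strict improvements within tau steps; otherwise
    it is redrawn uniformly from ks. tau is fixed here (an assumption).
    """
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx = leading_ones(x)
    k = rng.choice(ks)
    failures = 0
    iters = 0
    while fx < n and iters < max_iters:
        iters += 1
        y = x[:]
        for i in rng.sample(range(n), k):
            y[i] ^= 1
        fy = leading_ones(y)
        failures = 0 if fy > fx else failures + 1
        if fy >= fx:                      # elitist acceptance
            x, fx = y, fy
        if failures >= tau:               # learning period over without success
            k, failures = rng.choice(ks), 0
    return x, fx, iters
```

With `ks=(1, 2)`, the run cannot finish using $k=2$ alone once only the last bit is missing, so it relies on the controller switching back to $k=1$ after $τ$ unsuccessful iterations.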

On the sensitivity of CDAWG-grammars

2026-05-28T13:13:07Z

The compact directed acyclic word graph (CDAWG) [Blumer et al. 1987] of a string is the minimal compact automaton that recognizes all the suffixes of the string. CDAWGs can be used for various string tasks including text pattern searching, data compression, and pattern discovery. The CDAWG-grammar [Belazzougui & Cunial 2017] is a grammar-based text compression scheme derived from the CDAWG, which allows for representing the CDAWG in $O(e)$ space without storing the string, where $e$ denotes the number of CDAWG edges. Let $g$ be the size of the CDAWG-grammar for the input string $T$. We show that the worst-case additive sensitivity of the CDAWG-grammar is lower bounded by $3g-21$ and upper bounded by $8 g + 4$.