https://arxiv.org/api/+X+HdK+nYxqbFKr3X6L88zXhi2A2026-06-13T23:23:22Z2896616515http://arxiv.org/abs/2605.07150v2Deterministic Monotone Min-Plus Product and Convolution2026-05-29T04:49:30ZThe Monotone Min-Plus Product problem is a useful primitive that has seen many algorithmic applications over the past decade. In this problem, we are given two $n\times n$ integer matrices $A$ and $B$, where each row of $B$ is a monotone non-decreasing sequence of integers from $\{1,\dots,n\}$, and the goal is to compute their Min-Plus product, defined as the $n\times n$ matrix $C$ with $C_{i,j} = \min_{k}\{A_{i,k} + B_{k,j}\}$. The fastest known algorithm for this task [Chi, Duan, Xie, and Zhang, STOC'22] runs in $n^{(ω+3)/2+o(1)} = O(n^{2.686})$ time, significantly improving over the brute-force cubic algorithm. However, its main disadvantage is that it requires randomization, which is then inherited by all downstream applications.
Our main result is a deterministic algorithm for Monotone Min-Plus product with the same time complexity $n^{(ω+3)/2+o(1)} = O(n^{2.686})$ as its randomized counterpart, improving upon the previous deterministic bound $O(n^{2.875})$ [Gu, Polak, Vassilevska Williams, and Xu, ICALP'21]. Our derandomization also applies to previously studied extensions and variants (e.g., [Dürr, IPL'23]), including rectangular matrices, bounded range $[n^μ]$, and column-monotone matrices. As an immediate consequence, we derandomize state-of-the-art algorithms for multiple problems, including Language Edit Distance, RNA Folding, Optimum Stack Generation, unweighted Tree Edit Distance, Batched Range Mode, and Approximate All-Pairs Shortest Paths.
Our techniques also yield a deterministic algorithm for the Monotone Min-Plus Convolution problem that runs in $n^{1.5+o(1)}$ time, nearly matching the best known randomized time complexity $\widetilde{O}(n^{1.5})$ [Chi, Duan, Xie, and Zhang, STOC'22]. This algorithm can be used to derandomize state-of-the-art algorithms for Jumbled Indexing for binary strings and several variants of Knapsack.2026-05-08T02:36:07ZTo appear in ICALP 2026. Abstract shortened to meet arXiv requirementsCe JinJaewoo ParkBarna SahaYinzhan Xuhttp://arxiv.org/abs/2605.30813v1Incremental BPE Tokenization2026-05-29T04:04:32ZWe propose a novel algorithm for incremental Byte Pair Encoding (BPE) tokenization. The algorithm processes each input byte in worst-case $\mathcal{O}(\log^2 t)$ time, leading to an overall complexity of $\mathcal{O}(n \log^2 t)$, where $n$ is the input length and $t$ is the maximum token length. The algorithm incrementally maintains BPE tokenization results for every prefix of the input text, implementing the standard BPE merge procedure defined by a fixed set of merge rules. This enables efficient partial tokenization in streaming settings. Functioning as a drop-in replacement for standard BPE, our approach achieves a speedup of up to ${\sim}3\times$ over Hugging Face's tokenizers, and demonstrates significant latency reductions over OpenAI's tiktoken on pathological inputs. We further introduce an eager output algorithm that enables streaming output, emitting tokens as soon as token boundaries are determined during incremental tokenization. Overall, our results demonstrate that BPE tokenization can be performed incrementally with strong worst-case guarantees, while providing practical latency benefits in modern large language model pipelines. Code: https://github.com/ModelTC/mtc-inc-bpe2026-05-29T04:04:32ZAccepted to ICML 2026 (Spotlight)Shenghu JiangRuihao Gonghttp://arxiv.org/abs/2605.28952v2Optimal Rates for Differentially Private Hypothesis Testing with E-values2026-05-29T03:39:02ZE-values have attracted considerable interest in recent years as flexible tools for enabling anytime-valid and adaptive data analysis. Hypothesis testing is at the core of many of these applications, which can often involve private or sensitive data. In this work, we answer a simple but important question: given two distributions $\mathbb{P}$ and $\mathbb{Q}$, what is the maximum achievable e-power when testing $X\sim \mathbb{P}^n$ against $X\sim\mathbb{Q}^n$ with e-values that satisfy $\varepsilon$-differential privacy? We characterize the optimal rate for this problem and provide an algorithm which matches it exactly. In the sequential setting, when observations arrive one-by-one and the analyst chooses when to halt, we give matching upper and lower bounds on the stopping times of any private e-process. Numerical experiments confirm the practicality of our algorithms, which require less data than the recently proposed DP-SPRT across a range of sequential testing problems and privacy levels.2026-05-27T18:00:13ZCorrected typos; updated references; generalized proposition 3.1Ben JacobsenTomas GonzalezGavin BrownKassem FawazAaditya Ramdashttp://arxiv.org/abs/2605.30564v1Listing Even Cycles Faster than the Submodular-Width Barrier2026-05-28T20:53:42ZA classic result of Alon, Yuster, and Zwick (AYZ, Algorithmica 1997) shows that all $2k$-cycles in an $m$-edge graph can be listed in $\tilde O(m^{2-1/k}+t)$ time, where $t$ is the output size. This bound underlies the {\em submodular width} of Marx (JACM 2013) and the PANDA framework of Abo Khamis, Ngo, and Suciu (PODS 2017), which extend AYZ to arbitrary conjunctive queries with degree constraints. A central open question is whether combinatorial algorithms can beat the submodular-width barrier.
Bringmann and Gorbachev (STOC 2025) gave lower-bound evidence that submodular width may be optimal for general conjunctive queries under combinatorial algorithms. The picture changes for $2k$-cycles on undirected graphs, whose queries have self-joins and symmetric EDBs: recent works improve on AYZ for even-cycle detection and listing. Pinning down the complexity of $C_{2k}$-detection and listing is thus a natural step toward overcoming the submodular-width barrier for such queries.
For detection, Dahlgaard, Knudsen, and St{ö}ckel (STOC 2017) solved $C_{2k}$-detection in $\tilde O(m^{2k/(k+1)})$ time. Listing is harder. Jin and Xu (STOC 2023), and independently Abboud, Khoury, Leibowitz, and Safier (FSTTCS 2023), listed 4-cycles in $\tilde O(m^{4/3}+t)$ time; Vassilevska~Williams and Westover (ITCS 2025) listed 6-cycles in $\tilde O(m^{8/5}+t)$ time, improving the AYZ bounds of $\tilde O(m^{3/2})$ and $\tilde O(m^{5/3})$. The general case has remained open for 30 years.
Building on these works, we list $2k$-cycles in $\tilde O(m^{(2k^2-k+1)/(k^2+1)}+t)$ time, improving AYZ for every $k\geq 3$. The key ingredient is an \emph{asymmetric supersaturation} result for even cycles. Our algorithms use only join and project operators over multiple tree-decomposition plans, making them naturally implementable in database systems, in contrast to prior BFS-based graph approaches.2026-05-28T20:53:42ZVasileios NakosHung Q. NgoAndreas Panayihttp://arxiv.org/abs/2605.30546v1Energy-Efficient Aggregation and Minimum-Degree Spanning Trees in Radio Networks2026-05-28T20:28:08ZWe study the aggregation problem in synchronous multi-hop radio networks with $O(\log n)$-bit messages and no collision detection. Each node initially holds a value, and the goal is to compute a global aggregate such as the sum of all values. Aggregation tasks arise naturally in wireless sensor networks, where nodes are often battery-powered and radio activity is the dominant source of energy consumption. Accordingly, our main objective is to minimize the energy complexity, defined as the maximum number of rounds in which any node is awake.
Our main result is a randomized distributed algorithm that, with high probability, constructs and executes an aggregation schedule in $O(n \operatorname{polylog} n)$ rounds and using $O(Δ^\ast \operatorname{polylog} n)$ energy, where $Δ^\ast$ is the minimum possible maximum degree of a spanning tree of the network graph. This guarantee is nearly optimal: for any aggregation schedule and any graph, there exists a node that must be awake for at least $Δ^\ast$ rounds.
As a by-product, the algorithm also computes a spanning tree whose maximum degree is within an $O(\log n)$ factor of $Δ^\ast$, with the same round and energy guarantees. For every tree edge, both endpoints learn that the edge belongs to the tree.2026-05-28T20:28:08ZYi-Jun ChangYang Ze Guanhttp://arxiv.org/abs/2605.27912v2Privately Estimating Monotone Statistics in Polynomial Time2026-05-28T20:19:03ZWe study efficient differentially private algorithms for estimating monotone statistics, i.e., statistics that are monotone under the addition of new observations. The starting point for our investigation is subsample-and-aggregate: a classical paradigm that partitions the dataset into blocks, estimates the statistic on each block, and then privately aggregates the estimates. While practical and generically applicable, this approach is quite data-hungry. We improve upon this framework for the class of monotone statistics -- compared to subsample-and-aggregate, our algorithms save a factor of $t$ in sample complexity and pay a factor of $e^t$ in running time, where $t>0$ is a tunable parameter. We complement our results with a query-complexity lower bound, showing that our algorithms are essentially optimal for this task. As an application, we obtain improved results for private eigenvalue estimation, private loss estimation, and privately estimating a single parameter of a high-dimensional model, e.g., in linear regression.2026-05-27T03:40:25ZGavin BrownEphraim LinderMahbod MajidVikrant Singhalhttp://arxiv.org/abs/2605.17540v4One-Shot Klein Cutting Planes for Lipschitz Geodesically Convex Optimization in Hyperbolic Space2026-05-28T19:47:08ZMotivated by the COLT 2023 open problem of Criscitiello, Martínez-Rubio, and Boumal on deterministic first-order methods for Lipschitz geodesically convex optimization on Hadamard manifolds, we study hyperbolic space \[ \HH^d_{-\kappaC^2} =\{X\in\R^{d+1}:\ipL{X}{X}=-1,\ X_0>0\}, \qquad \ip{U}{V}_X=\kappaC^{-2}\ipL{U}{V}. \] For every geodesically convex $M$-Lipschitz function \[ f:\bar B_{\HH}(x_0,r)\to\R,\qquad s=\kappaC r, \] we give a one-shot Klein cutting-plane method returning a queried point $\hat x$ such that \[ f(\hat x)-\min_{\bar B_{\HH}(x_0,r)}f\le \eps Mr \] after at most \[ \left\lceil 2d(d+1)\log\!\left(\frac{16\sinh s\cosh s}{s\eps}\right) \right\rceil \] oracle calls. For $d\ge2$, each localization step costs $O(d^2)$ arithmetic operations; for $d=1$, an interval variant gives the same oracle bound. Hence \[ N=O\bigl(d^2(s+\log(e/\eps))\bigr) =O\bigl(d^2ζ_s\log(e/\eps)\bigr), \qquad ζ_s=s/\tanh s . \] Compared with the constant-curvature construction associated with the COLT problem, this replaces chained curvature--accuracy dependence by additive dependence. The proof does not rely on convexity of the Klein pullback, which is generally only quasiconvex. Instead, every Riemannian subgradient halfspace becomes an exact Euclidean central cut: for $θ=\kappaC\dist(X,Y)$, \[ \ip{g}{\log_XY}_X =\fracθ{\kappaC^2\sinhθ}\ipL{g}{Y}, \] and tangency at $X$ converts $\ipL{g}{Y}\le0$ into \[ \gbar^{\mathsf T}(u-c)\le0,\qquad u=Φ(Y),\ c=Φ(X). \] Thus one fixed Euclidean ellipsoid localizes the hyperbolic ball, and curvature enters only through \[ \log\!\left(\frac{\sinh s\cosh s}{s\eps}\right) =\log(1/\eps)+2s-\log(4s)+O(e^{-4s}). \] The general Hadamard-manifold problem remains open.2026-05-17T16:48:58ZYutong ZhangYaoran YangYifan ZhuWentao Zhanghttp://arxiv.org/abs/2605.15819v3Faster PBWT prefix-array access via batching2026-05-28T18:39:41ZThe positional Burrows-Wheeler Transform (PBWT) is commonly used to store haplotype panels compactly in such a way that, given a query haplotype, we can quickly find the set maximal exact matches (SMEMs) between the query and the haplotypes in a panel. There are generally two steps in this process: first we find the maximal substrings of the query that occur in the same positions in haplotypes in the panel and then, for each such substring, report the haplotypes in the panel in which the substring occurs in the same position as in the query. Very recently, Bonizzoni, Gagie and Gao (2026) gave two time-space tradeoffs for the second step: they use either $O ((r + h) \log n)$ bits and $O (\log \log \min (h, \ell) + k)$ time to report $k$ haplotypes in the panel, or $O (r \log h + h \log n)$ bits and $O (k \log \log h)$ time, where $r$ is the number of runs in the panel's PBWT and $h$, $\ell$ and $n = h \ell$ are the panel's height, length and size, respectively. We observe here that if we can batch queries until we have found $r \lg (h) / \lg r$ such substrings and we report an average of at least $\lg (r) / \lg h$ haplotypes in the panel per substring, for example, then for the second step we can easily use $O (r \log h)$ bits and constant time to report each haplotype. Our approach is based on an algorithm for constructing the prefix arrays quickly from the PBWT, which may be of independent interest.2026-05-15T10:18:25ZTravis Gagiehttp://arxiv.org/abs/2605.30324v1On Language Generation in the Limit with Bounded Memory2026-05-28T17:57:03ZWe study language generation in the limit under bounded memory. In this task, a learner observes examples from an unknown target language one at a time and must eventually output only new valid examples. Prior work assumes access to the entire history, a strong assumption since realistic algorithms retain limited past information. Classical work in learning theory shows memory constraints dramatically alter learnability; we extend this to language generation.
First, we study memoryless generators. Under a mild enumeration restriction, every countable collection of infinite languages remains generable without memory. Without this restriction, we exactly characterize when memoryless generation is possible. For finite collections, we characterize the optimal minimax density achievable by memoryless generators -- the best density guaranteed against any collection of a given size. This combinatorial bound relies on Sperner's theorem and symmetric chain decompositions.
We further show that a sliding window of the last $W$ examples does not improve this worst-case density, whereas allowing it to store $b$ adaptively chosen past examples improves the achievable density for every $b \geq 1$.
Finally, we revisit identification in the limit, where the learner must converge to a single correct hypothesis for the target language. We focus on its incremental variant, where the learner remembers only its previous guess. Here, although exact identification fails on a collection of just three languages, a mild relaxation requiring convergence to an ``approximate'' version of the target is achievable for every finite collection.
These results show bounded memory affects these tasks differently: generation remains achievable for every countable collection, while density and identification are confined to finite collections, with guarantees weakening as the collection grows.2026-05-28T17:57:03ZThe abstract has been shortened to fit within the arXiv limitJon KleinbergAnay MehrotraAmin SaberiGrigoris Velegkashttp://arxiv.org/abs/2605.30319v1Improved Guarantees for Heterogeneous Treatment-Effect Estimation via Matrix Completion2026-05-28T17:55:23ZA central goal of modern causal inference is estimating heterogeneous treatment effects to answer questions like "how does an intervention affect each unit," rather than only on average. We study this problem with panel-data where we observe $n$ units across $m$ times under unknown, non-uniform treatment assignments. The data in this setting is naturally represented as a matrix of all unit--time treatment effects. Estimating heterogeneous treatment effects can then be expressed as obtaining a good estimation of each row's average in this matrix. This allows us to formulate the problem as matrix completion, which can be solved under natural low-rankness assumptions. However, existing matrix-completion guarantees are not powerful enough to get meaningful bounds for the per-row guarantee required for estimating the heterogeneous treatment effect; roughly speaking, they are only useful for estimating average treatment effect bounds, as also illustrated in a recent line of work. We give a simple, computationally efficient estimator that, without knowledge of the propensities and under standard low-rankness and regularity assumptions, achieves a row-wise $\ell_2$ error of $\tilde{O}(\sqrt{\frac{1}{n} + \frac{n}{m^2}})$. Technically, our analysis establishes the first sharp row-wise $\ell_2$-perturbation bound for low-rank approximation, complementing existing spectral-, Frobenius-, and entrywise perturbation theory.2026-05-28T17:55:23ZAnay MehrotraPhuc TranVan H. VuManolis Zampetakishttp://arxiv.org/abs/2605.25303v2Algorithms with Polynomially-Improved Approximation Factors for the $2 \rightarrow q$ Norm, and Applications2026-05-28T17:16:43ZThe $2 \rightarrow q$ norm of a matrix $X \in \mathbb{R}^{n \times d}$ is defined as $\lVert X \rVert_{2 \rightarrow q} = \sup_{\lVert v \rVert_2 = 1} \lVert Xv \rVert_q$. We give polynomial-time multiplicative approximation algorithms for this norm when $q > 2$ (i.e. in the hypercontractive setting). This problem either directly captures or is closely related to long-standing open problems in combinatorial optimization and hardness of approximation (e.g. Small Set Expansion), quantum information (e.g. Best Separable State), and algorithmic statistics.
Very little is known about what approximation factors we can achieve for this problem in polynomial time, even though such approximations have significant downstream consequences. Barak, Brandão, Harrow, Kelner, Steurer, and Zhou showed that no polynomial-time algorithm can achieve an approximation factor better than $2^{\sqrt{\log n}}$, assuming the Exponential Time Hypothesis (FOCS'12). On the other hand, a simple spectral algorithm gives a $d^{1/4}$-approximation as a baseline. We give, to the best of our knowledge, the first polynomial-time approximation algorithm beating this baseline by polynomial factors. For the important special case of $q = 4$ it achieves a $d^{1/8}$-approximation. All previous algorithms required additional assumptions on $X$, or only surpassed the baseline for small values of $n$.
Moreover, we construct sum-of-squares certificates for the $2 \rightarrow q$ norm. This directly implies improved algorithms for robust mean and covariance estimation, robust regression, and clustering, when the data only satisfies a bound on its $q$-th moment.2026-05-24T23:56:06Zv2 corrected minor typosSamuel B. HopkinsStefan Tiegelhttp://arxiv.org/abs/2605.17574v3Parse indexing for discarding short pseudo-MEMs safely2026-05-28T17:10:38ZBrown et al.\ (2025) described a pre-processing step, called $k$-mer based breaking (KeBaB), that speeds up searching for long maximal exact matches (MEMs) between a pattern $P$ and an indexed repetitive text $T$. KeBaB produces a set of substrings of $P$ called pseudo-MEMs that often have total length much less than $|P|$ but are still guaranteed to contain all the MEMs of length at least a fixed parameter $k$. Brown et al.\ found that KeBaB can be particularly effective when we discard all but the longest pseudo-MEMs -- but then we risk also discarding the longest MEMs! In this paper we show how we can use parse indexing to generate pseudo-MEMs together with lower bounds on the lengths of the longest MEMs they must contain, allowing us to discard short pseudo-MEMs safely.2026-05-17T18:01:33ZTravis Gagiehttp://arxiv.org/abs/2412.17786v4Quantum Algorithms on Edge Lists: Hiding, Shuffling, and Cycle Finding2026-05-28T17:05:56ZThe edge list model is arguably the simplest input model for graphs, where the graph is specified by a list of its edges. In this model, we study the quantum query complexity of three variants of the triangle finding problem. The first asks whether there exists a triangle containing a target edge and raises general questions about the hiding of a problem's input among irrelevant data. The second asks whether there exists a triangle containing a target vertex and raises general questions about the shuffling of a problem's input. The third asks whether there exists a triangle; this problem bridges the $3$-distinctness and $3$-sum problems, which have been extensively studied by both cryptographers and complexity theorists. We provide tight or nearly tight results for these problems as well as some first answers to the general questions they raise.
Furthermore, given any graph with low maximum degree, such as a typical random sparse graph, we prove that the quantum query complexity of finding a length-$k$ cycle in its length-$m$ edge list is $m^{3/4-1/(2^{k+2}-4)\pm o(1)}$, which matches the best-known upper bound for the quantum query complexity of $k$-distinctness on length-$m$ inputs up to an $m^{o(1)}$ factor. We prove the lower bound by developing new techniques within Zhandry's recording query framework [CRYPTO '19] as generalized by Hamoudi and Magniez [ToCT '23]. These techniques extend the framework to treat any non-product distribution that results from conditioning a product distribution on the absence of rare events. We prove the upper bound by adapting Belovs's learning graph algorithm for $k$-distinctness [FOCS '12]. Finally, assuming a plausible conjecture concerning only cycle finding, we show that the lower bound can be lifted to an essentially tight lower bound on the quantum query complexity of $k$-distinctness, which is a long-standing open question.2024-12-23T18:43:41Z80 pages; v2: major update to include new results on cycle finding; v3: corrected Mirroring Lemma and improved presentation; v4: changed title and improved presentationAmin Shiraz GilaniDaochen WangPei WuXingyu Zhouhttp://arxiv.org/abs/2506.05638v6Smallest Suffixient Sets: Effectiveness, Resilience, and Calculation2026-05-28T15:52:57ZA suffixient set is a novel combinatorial object that captures the essential information of repetitive strings in a way that, provided with a random access mechanism, supports various forms of pattern matching. In this paper, we study the size $χ$ of the smallest suffixient set as a repetitiveness measure.
First, we study its sensitivity to various string operations. We show that $χ$ cannot increase by more than 2 after appending or prepending a character to the string. As a consequence, we are able to give simple linear-time online algorithms to compute smallest suffixient sets. We also show that, although reversing the string can increase $χ$ by an arbitrary $O(n)$ value, it always holds $χ(T)/χ(T^R)\le 2$. We also prove lower and upper bounds for the additive or multiplicative increase of $χ$ after applying arbitrary edit operations, or rotating the text. In particular, we show that the additive increase can be as large as $Ω(\sqrt{n})$ for all those operations.
Secondly, we place $χ$ in between known repetitiveness measures. In particular, we show $χ= O(r)$ (where $r$ is the number of runs in the Burrows-Wheeler Transform of the string), that there are string families where $χ=o(v)$ (where $v$ is the size of the smallext lexicographic parse of the string), and that $χ$ is uncomparable to almost all reachable measures based on copy-paste mechanisms. In passing, we give precise bounds for $χ$ for some relevant string families, for example $χ\le σ+2$ on episturmian words over alphabets of size $σ$ (e.g., $χ\le 4$ on Fibonacci strings, for which we precisely characterize the only two smallest suffixient sets).2025-06-05T23:58:03ZExtended version of 'Smallest suffixient sets as a repetitiveness measure'(https://doi.org/10.1007/978-3-032-05228-5_18)Hiroto FujimaruGonzalo NavarroGiuseppe RomanaCristian Urbinahttp://arxiv.org/abs/2605.30113v1Low-degree estimation thresholds in planted hypergraphs and tensor PCA2026-05-28T15:49:59ZA central question in high-dimensional statistics is to understand statistical--computational gaps: regimes in which recovering a hidden signal is information-theoretically possible but conjectured to be computationally intractable. The low-degree framework offers a concrete way to study this gap by restricting attention to estimators that are polynomials of degree at most $D$ in the observed data. In this paper, we study low-degree estimation in planted dense subhypergraph, sparse tensor PCA, and tensor PCA with a general prior.
For the planted dense subhypergraph model on $n$ vertices, we identify two regimes depending on whether the planted set is larger or smaller than $\sqrt{n}$. Above this scale, we identify a sharp threshold for low-degree estimation. Below this scale, we establish hardness in the regimes predicted by prior work, thereby resolving a question of Schramm and Wein (2022) and Sohn and Wein (2025). For sparse tensor PCA, we identify an analogous sharp phase transition. For tensor PCA with a general prior, we prove a low-degree estimation lower bound at the critical signal scale, matching the degree--signal tradeoff suggested by prior work.
Our lower bounds apply to degree $D=n^δ$, where $n$ is the dimension and $δ>0$ is a constant, and we complement them with corresponding low-degree upper bounds. In addition, for planted dense subhypergraph and sparse tensor PCA above the $\sqrt{n}$ scale, we convert our upper bounds into polynomial-time algorithms that achieve almost exact recovery above the sharp threshold, yielding polynomial-time algorithms succeeding up to this threshold. Our proofs extend the framework of Sohn and Wein (2025) through a conditional variant that yields the correct signal-to-noise ratio in settings where the unconditional approach is insufficient.2026-05-28T15:49:59Z67 pages, 1 figureDaniel FuYoungtak Sohn