https://arxiv.org/api/xCY9tAlCcSxBK6asQ9q3RYvqwW0
2026-06-13T13:44:07Z

http://arxiv.org/abs/2505.05847v4
Smaller and More Flexible Cuckoo Filters
2026-06-08T13:40:57Z
Cuckoo filters are space-efficient approximate set membership data structures with a controllable false positive rate (FPR) and zero false negatives, similar to Bloom filters. In contrast to Bloom filters, Cuckoo filters store multi-bit fingerprints of keys in a hash table using variants of Cuckoo hashing, allowing each fingerprint to be stored at a small number of possible locations. Existing Cuckoo filters use fingerprints of $(k+3)$ bits per key and an additional space overhead factor of at least $1.05$ to achieve an FPR of $2^{-k}$. For $k=10$, this amounts to $1.365\, kn$ bits to store $n$ keys, which is better than $1.443\, kn$ bits for Bloom filters. The $+3$ for the fingerprint size is required to balance out the multiplied FPR caused by looking for the fingerprint at several locations. In the original Cuckoo filter, the number of hash table buckets is restricted to a power of 2, which may lead to much larger space overheads, up to $2.1\, (1+3/k)\, kn$ bits.
We present two improvements of Cuckoo filters. First, we remove the restriction that the number of buckets must be a power of 2 by using a different placement strategy. Second, we reduce the space overhead factor of Cuckoo filters to $1.06 \, (1+2/k)$ by using overlapping windows instead of disjoint buckets to maintain the load threshold of the hash table, while reducing the number of alternative slots where any fingerprint may be found.
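The space figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below only evaluates the formulas stated above, as multiples of $kn$ bits; the function names are illustrative and not taken from the paper.

```python
import math

# Space as a multiple of k*n bits, using only the formulas quoted in the abstract.
# All function names here are hypothetical helpers for illustration.

def bloom_factor() -> float:
    # Classical Bloom filter: 1/ln(2), about 1.443, times k*n bits.
    return 1.0 / math.log(2.0)

def cuckoo_factor(k: int) -> float:
    # Existing Cuckoo filters: (k+3)-bit fingerprints, overhead factor 1.05.
    return 1.05 * (1.0 + 3.0 / k)

def cuckoo_pow2_worst_factor(k: int) -> float:
    # Original Cuckoo filter with a power-of-2 bucket count, worst case.
    return 2.1 * (1.0 + 3.0 / k)

def windowed_cuckoo_factor(k: int) -> float:
    # Windowed variant described in the abstract: 1.06 * (1 + 2/k).
    return 1.06 * (1.0 + 2.0 / k)

if __name__ == "__main__":
    k = 10
    for name, value in [
        ("Bloom", bloom_factor()),
        ("Cuckoo", cuckoo_factor(k)),
        ("Cuckoo, power-of-2 worst case", cuckoo_pow2_worst_factor(k)),
        ("Windowed Cuckoo", windowed_cuckoo_factor(k)),
    ]:
        print(f"{name}: {value:.3f} * kn bits")
```

For $k=10$ the windowed formula evaluates to $1.272\, kn$ bits, compared with $1.365\, kn$ for existing Cuckoo filters.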
A detailed evaluation demonstrates that the alternative memory layout based on overlapping windows decreases the size of Cuckoo filters not only in theory, but also in practice. A comparison with other state-of-the-art filter types, Prefix filters and Vector Quotient filters (VQFs), shows that the reduced space overhead makes windowed Cuckoo filters the smallest filters supporting online insertions, with similarly fast queries, but longer insertion times.
2025-05-09T07:37:38Z
Johanna Elena Schmitz, Jens Zentgraf, Sven Rahmann

http://arxiv.org/abs/2511.20397v2
Model-Based Learning of Whittle indices
2026-06-08T13:11:50Z
We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP). Our approach relies on building an empirical estimate of the MDP and then computing its Whittle indices using an extended version of a state-of-the-art existing algorithm. We provide a proof of convergence to the Whittle indices we want to learn as well as a bound on the time needed to learn them with arbitrary precision. Moreover, we investigate its computational complexity. Our numerical experiments suggest that BLINQ significantly outperforms existing Q-learning approaches in terms of the number of samples needed to get an accurate approximation. In addition, it has a total computational cost even lower than Q-learning for any reasonably high number of samples. These observations persist even when the Q-learning algorithms are sped up using neural networks to predict Q-values.
2025-11-25T15:21:00Z
30 pages, 7 figures, submitted to TOMPECS
Joël Charles-Rebuffé, Nicolas Gast, Bruno Gaujal

http://arxiv.org/abs/2606.09318v1
Engineering Scalable Distributed List Ranking
2026-06-08T10:28:44Z
The list ranking problem is one of the classical problems of parallel computing, with nontrivial algorithms and many applications as a subroutine for solving other problems.
While it was intensively studied in the early days of parallel computing, little has happened in the last 20 years. In particular, there is little work on scaling list ranking to large machines and input sizes. We reconsider list ranking starting from the ground-breaking results of Sibeyn a quarter century ago. We employ algorithm and performance engineering to improve his sparse ruling-set algorithm, making it capable of scaling to many processors, and provide a more detailed analysis of the impact of the algorithm's parameters, further guiding our practical implementation.
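For readers unfamiliar with the problem itself, the following sketch shows list ranking via classical pointer jumping (Wyllie's technique), simulated sequentially. This is not the sparse ruling-set algorithm the abstract refers to. It only illustrates what is being computed: each element's distance to the end of the list. The successor-array encoding, with the tail pointing to itself, is an assumption made for this example.

```python
from typing import List

def list_rank(succ: List[int]) -> List[int]:
    """Return the distance of every element to the tail of a linked list.

    succ[i] is the successor of element i; the tail is the unique element
    with succ[i] == i (an encoding assumed for this sketch).
    Each loop iteration simulates one synchronous pointer-jumping round.
    """
    n = len(succ)
    nxt = list(succ)
    # Invariant: rank[i] is the distance from i to nxt[i].
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    # Stop once every pointer has reached the tail (the only fixed point).
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # Both updates read the old arrays, as in a synchronous parallel round.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank

if __name__ == "__main__":
    # List order 3 -> 0 -> 2 -> 1, with element 1 as the tail.
    print(list_rank([2, 1, 1, 0]))
```

Pointer jumping needs a logarithmic number of rounds but performs more total work than a sequential traversal, which is one reason work-efficient alternatives such as ruling-set approaches are of interest.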
We perform an extensive experimental study across a variety of input instances with different structural properties. We demonstrate that indirect communication, exploiting input locality, and message coalescing allow scaling to billions of elements on up to 24,576 cores.
2026-06-08T10:28:44Z
14 pages, 4 figures
Peter Sanders, Matthias Schimek, Tim Niklas Uhl, Thomas Weidmann

http://arxiv.org/abs/2606.09133v1
Multiversion Concurrency Control for Multiversion B-Trees
2026-06-08T07:31:22Z
Multiversion concurrency control (MVCC) enables scans to read data from a committed snapshot (version), reducing conflicts with write operations compared to traditional concurrency approaches. Currently, versioned records are often managed in a B$^+$-tree using version chains. However, version chains introduce overhead during scans and can still lead to conflicts between scans and writers. The multiversion B-tree (MVBT) was designed for optimal range scan performance on arbitrary versions, but has been considered impractical due to its structural complexity and, until recently, the lack of effective concurrency control. In this paper, we present the concurrent MVBT (cMVBT), a redesign of the MVBT featuring a novel concurrency control protocol that uses optimistic latches for write operations and requires no latches for range scans, while preserving all the optimality guarantees of the original MVBT. Additionally, cMVBT supports continuous garbage collection without activity spikes, seamlessly integrating free-space management.
Experiments with mixed workloads derived from a standard benchmark show that the cMVBT achieves low overhead, high write throughput, and excellent range scan performance, outperforming state-of-the-art methods based on version chains.
2026-06-08T07:31:22Z
Amir Tonta, Bernhard Seeger, Eljas Soisalon-Soininen

http://arxiv.org/abs/2606.01342v3
Towards Optimal Robustness in Learning-Augmented Paging
2026-06-08T06:33:53Z
Learning-augmented paging has been extensively studied in recent years. A key advantage over naive ML-based approaches is \emph{bounded robustness}, which guarantees worst-case performance even when predictions are inaccurate, making these algorithms valuable for real-world systems. Prior work achieves robustness bounds of $2H_k + O(1)$ in the randomized setting, leaving a gap to the optimal competitive ratio $H_k$.
In this paper, we study how to close this gap. We begin by reviewing online optimality and proving a new property of the latest $H_k$-competitive algorithm, which facilitates our analysis in the learning-augmented setting. Then, we review existing learning-augmented paging algorithms and introduce a unifying primitive, the \emph{relative prediction budget}, which captures the essence of establishing robustness and reveals that prior algorithms either overuse or underutilize predictions. Guided by the above analysis, we develop a new framework that achieves the best-possible robustness up to an additive constant for learning-augmented paging: $H_k + O(1)$. Experiments further demonstrate strong practical performance.
2026-05-31T16:49:36Z
ICML 2026
Peng Chen, Hailiang Zhao, Xueyan Tang, Yixuan Wang, Shuiguang Deng

http://arxiv.org/abs/2507.03980v4
Functional design of efficient and parallelizable combinatorial generators using convolution
2026-06-08T04:17:36Z
The application of program transformation and algebraic methods to the development of efficient combinatorial optimization (CO) algorithms relies on an exhaustive combinatorial generator for the problem specification, followed by the fusion of thinning or filtering processes into this specification. However, the effectiveness of such fusion transformations critically depends on the structural compatibility between the objective function and the generator, which is highly problem dependent. In practice, when the majority of candidate solutions remain unfiltered or are not eliminated, as is the case for most intractable CO problems, the overall efficiency of the resulting fused program is largely determined by the intrinsic efficiency of the combinatorial generator. Consequently, if the specification itself exhibits suboptimal performance, the fused program will inherit a correspondingly inferior level of efficiency.
We argue that a genuine design process should also account for hardware compatibility and parallelizability, particularly the ability to support efficient parallel execution on modern hardware architectures, including multi-level cache hierarchies and GPUs. However, does achieving formal correctness necessarily conflict with designing algebraically elegant algorithms that support fusion? Can we obtain both simultaneously?
In this paper, we show that techniques from functional programming provide powerful formal tools for the systematic construction of such hardware-compatible and parallelizable combinatorial generators. This paper investigates generators for two of the most fundamental combinatorial structures, combinations and permutations, together with their natural extension to nested generators (e.g., combinations/permutations of combinations/permutations).
2025-07-05T10:11:37Z
Xi He, Zhenjiang Hu, Max. A. Little

http://arxiv.org/abs/2606.08977v1
Online Learning with Recency: Algorithms for Sliding-window Streaming Multi-armed Bandits
2026-06-08T03:21:54Z
Motivated by the recency effect in online learning, we study algorithms for single-pass *sliding-window streaming multi-armed bandits (MABs)* in this paper. In this setting, we are given $n$ arms with unknown sub-Gaussian reward distributions and a parameter $W$. The arms arrive in a single-pass stream, and only the most recent $W$ arms are considered valid. The algorithm is required to perform pure exploration and regret minimization with limited memory, defined as the number of stored arms. The model is a natural extension of the streaming multi-armed bandits model (without the sliding window) that has been extensively studied in recent years. We provide a comprehensive analysis of both the pure exploration and regret minimization problems with the model. For pure exploration, we prove that finding the best arm is hard with sublinear memory while finding an approximate best arm admits an efficient algorithm. For regret minimization, we explore a new notion of regret and give sharp memory-regret trade-offs for any single-pass algorithm.
We complement our theoretical results with experiments, demonstrating the trade-offs between sample, regret, and memory.
2026-06-08T03:21:54Z
ICML 2026
Vladimir Braverman, Chen Wang, Liudeng Wang, Samson Zhou

http://arxiv.org/abs/2508.11874v2
Discovering Expert-Level Nash Equilibrium Algorithms with Large Language Models
2026-06-08T00:24:06Z
Designing polynomial-time algorithms for approximate Nash equilibria (ANE) with provable worst-case guarantees is a fundamental open problem in algorithmic game theory. While large language models (LLMs) can generate candidate algorithms at scale, certifying worst-case guarantees requires formal analysis over all game instances -- a task for which no automated system previously existed. Here, we present LegoNE, a framework encoding expert proof strategies into a symbolic language that automatically compiles any candidate algorithm into a finite optimization problem certifying its worst-case guarantee. Integrating LegoNE with a reasoning LLM, we rediscovered an algorithm matching the best polynomial-time guarantee for two-player games, and discovered a three-player algorithm improving the best guarantee from $0.6+\delta$ to $0.5+\delta$ -- provably beyond the reach of the extension technique, the only previously known multi-player ANE design paradigm. These results show that encoding domain-specific proof strategies into a machine-tractable language can support LLM-driven discovery of algorithms outside known human design paradigms.
2025-08-16T02:18:43Z
accepted by Nature Communications
Hanyu Li, Dongchen Li, Xiaotie Deng
10.1038/s41467-026-74003-1

http://arxiv.org/abs/2508.12536v3
jXBW: A Compressed Index for Structure-Aware JSONL Retrieval in Structured RAG
2026-06-07T16:31:47Z
Providing \textit{structured} information to large language models (LLMs) improves multi-step reasoning and factual grounding, and recent retrieval-augmented generation (RAG) systems therefore reconstruct structure from retrieved text on every query.
When the corpus is \emph{already} structured -- as in JSON Lines (JSONL), a popular format for LLM prompts, chemical compounds, and geospatial records -- this per-query rebuilding can be replaced by direct \emph{structural retrieval}. The core primitive is \textit{substructure search}: finding all JSON objects in a collection that contain a given query pattern. Existing approaches index each document separately, so both index space and query time grow with the total collection size; XML-based engines add conversion overhead and semantic mismatches. We propose \textbf{jXBW}, a compressed index for fast substructure search over JSONL, combining three innovations: (i) a merged tree representation that consolidates repeated structures across objects, (ii) a succinct tree index based on the eXtended Burrows--Wheeler Transform (XBW), and (iii) a newly developed three-phase substructure search algorithm that runs on this index. Together they achieve \textbf{query-dependent complexity}: the cost is determined by query characteristics rather than collection size, in compressed space. Experiments on seven real-world datasets, including PubChem ($10^6$ compounds) and OpenStreetMap ($6.6 \times 10^6$ objects), show that jXBW outperforms the strongest tree-based baseline by $\mathbf{16\times}$ on the smallest dataset and by up to $\mathbf{2{,}800\times}$ on the largest, and is more than $\mathbf{2 \times 10^6\times}$ faster than the XQuery engine Saxon. jXBW thus brings structural retrieval over million-record JSONL collections into the sub-millisecond range.
2025-08-18T00:14:24Z
Yasuo Tabei

http://arxiv.org/abs/2606.08713v1
The price of incrementality in k-center clustering
2026-06-07T16:09:04Z
The $k$-center problem is one of the best-studied and most intuitive clustering formulations. It asks, given a set of $n$ points in a metric space, for $k$ of the points to be designated as cluster centers, so that the maximum distance of an input point to its nearest center is minimized.
Gonzalez's greedy algorithm from 1985 is a simple and efficient way to find a $2$-approximate solution. The algorithm has the attractive feature of \emph{incrementality}: it outputs the centers one by one, with a guaranteed $2$-approximation for every prefix of the obtained sequence of centers.
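The greedy procedure described above is commonly stated as a farthest-first traversal. A minimal sketch follows, assuming Euclidean points given as coordinate tuples and an arbitrary first center at index 0. The function name and input format are choices made for this example, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

def gonzalez(points: Sequence[Tuple[float, ...]], k: int) -> Tuple[List[int], float]:
    """Farthest-first traversal for k-center.

    Returns the indices of the chosen centers in the order they were picked,
    and the resulting radius (largest distance of any point to its nearest center).
    Assumes 1 <= k <= len(points) and Euclidean distance.
    """
    centers = [0]  # the first center is arbitrary; index 0 is an assumption here
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [math.dist(p, points[0]) for p in points]
    while len(centers) < k:
        # Next center: the point currently farthest from all chosen centers.
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(far)
        dist = [min(d, math.dist(p, points[far])) for d, p in zip(dist, points)]
    return centers, max(dist)

if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    print(gonzalez(pts, 2))
```

Because the centers are returned in the order they were selected, the first $j$ entries form the greedy solution for $j$ centers, which is the incrementality property discussed in the abstract.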
Incrementality imposes a geometric constraint on how solutions can be built, and it is natural to ask whether this comes at a price in the quality of the solution. It is known that in polynomial time, the approximation ratio of $2$ is best possible, assuming $P \neq NP$. In this paper we show that even with \emph{unlimited} computational power, the factor $2$ cannot be improved, if the solution is required to be built incrementally. The lower bound construction imposes a tradeoff between all $n$ levels of the clustering simultaneously; it was obtained with the help of ChatGPT, an aspect we discuss in Section 3 of the paper.
2026-06-07T16:09:04Z
László Kozma

http://arxiv.org/abs/2606.08698v1
Quotient Admission Algorithms for Witness-Supported Graph Windows
2026-06-07T15:53:21Z
We formulate the quotient admission problem for finite graph-window rows. The input is a finite row set, an admissible evidence map, semantic labels, witness-support hypergraphs, and atom-level admissibility predicates. The output is a quotient decision on evidence atoms, with possible decisions certificate, residual, low-confidence, or blocked. The problem asks for the maximal guard-respecting atom-level decision map that uses no refinement beyond the admissible evidence partition. We prove an atom-union characterization of identifiable classes, give a witness-support hypergraph guard for certificate admission, characterize projected-label conflicts as blocked atoms, and present quotient admission algorithms with correctness, maximality, and complexity guarantees. With explicit evidence vectors and hyperedges, the algorithms run in expected O(B + I + n) time and space by hashing and deterministic O(B + I + n log n) time by sorting under a key-linear comparison model, where n is the number of rows, B is the total evidence encoding length, and I is the total hyperedge incidence size.
We also prove a magnitude-only indistinguishability lower bound: any evaluator that observes only residual magnitudes fails on instances whose evidence atoms require different residual decisions after the magnitudes collapse them.
2026-06-07T15:53:21Z
11 pages
Yushan Li

http://arxiv.org/abs/2606.08662v1
Uncertainty Principles for the Number Theoretic Transform
2026-06-07T15:05:53Z
Motivated by polynomial identity testing with exponentials (Li and Wu, ITCS'26), we study uncertainty principles for the number-theoretic transform (NTT). We show that the NTT satisfies strong sparsity tradeoffs: For every fixed prime $q$ and for all but finitely many primes $p \equiv 1 \pmod q$, every nonzero $f\in \mathbb F_p^{\mathbb Z_q}$ and its number-theoretic transform $\hat f$ satisfy \[ |\mathrm{Supp}(f)| + |\mathrm{Supp}(\hat f)| \ge q+1. \] Thus, a $k$-sparse function has transform support at least $q-k+1$. As our main technical contribution, we prove a probabilistic version of the above uncertainty principle, averaged over primes $p$, in the regime $p=q^{O(1)}$.
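The support inequality can be checked by brute force on a tiny instance. The sketch below assumes the convention $\hat f(y) = \sum_x f(x)\,\omega^{xy} \bmod p$ with $\omega$ a primitive $q$-th root of unity modulo $p$, and picks $q=3$, $p=7$, $\omega=2$ purely for illustration. The abstract allows finitely many exceptional primes, so the code tests this one pair rather than assuming the bound holds.

```python
from itertools import product
from typing import List, Sequence

def ntt(f: Sequence[int], omega: int, p: int) -> List[int]:
    """Naive number-theoretic transform of f over F_p (quadratic time).

    omega must be a primitive len(f)-th root of unity modulo p.
    """
    q = len(f)
    return [sum(f[x] * pow(omega, x * y, p) for x in range(q)) % p for y in range(q)]

def support_size(v: Sequence[int]) -> int:
    # Number of nonzero entries.
    return sum(1 for a in v if a != 0)

def min_support_sum(q: int, p: int, omega: int) -> int:
    """Minimum of |Supp(f)| + |Supp(f_hat)| over all nonzero f in F_p^q."""
    return min(
        support_size(f) + support_size(ntt(f, omega, p))
        for f in product(range(p), repeat=q)
        if any(f)
    )

if __name__ == "__main__":
    # q = 3, p = 7 (7 = 1 mod 3), omega = 2 since 2**3 = 8 = 1 mod 7.
    print(min_support_sum(3, 7, 2))
```

For this pair the minimum equals $q+1 = 4$, attained for instance by a function supported on a single point, whose transform has full support.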
As an application, we obtain a black-box identity test for $k$-sparse exponential polynomials of degree at most $d$ with vanishing soundness error, for $q$ moderately larger than $k$.
2026-06-07T15:05:53Z
Giulio Malavolta, Alon Rosen

http://arxiv.org/abs/2606.08646v1
The Arithmetic Circuit Combinatorial Nullstellensatz is NP-hard
2026-06-07T14:23:23Z
A multivariate polynomial on $n$ variables $x_1,\ldots,x_n$ of total degree $n$ over $\mathbf{Z}_2$ containing the multilinear monomial $\prod_{i=1}^n x_i$ is by the combinatorial nullstellensatz [Alon, Comb. Probab. Comput., 1999] known to always have a nonroot. We show that there cannot be a randomised polynomial time algorithm that given an arithmetic circuit of polynomial size formally computing such a polynomial, locates a nonroot with constant nonzero probability unless RP=NP. The result holds even when the individual degree of every variable in the input polynomial is at most two.
2026-06-07T14:23:23Z
Andreas Björklund

http://arxiv.org/abs/2412.16457v3
Robust Random Graph Matching in Dense Graphs via an Approximate Message Passing Type Algorithm
2026-06-07T12:25:36Z
In this paper, we focus on the matching recovery problem between a pair of correlated Gaussian Wigner matrices with a latent vertex correspondence. We are particularly interested in a robust version of this problem such that our observation is a perturbed input $(A+E,B+F)$ where $(A,B)$ is a pair of correlated Gaussian Wigner matrices and $E,F$ are adversarially chosen matrices supported on an unknown $\epsilon n \times \epsilon n$ principal minor of $A,B$, respectively. We propose an approximate message passing (AMP) type iterative algorithm that succeeds in polynomial time as long as the correlation $\rho$ between $(A,B)$ is a non-vanishing constant and $\epsilon = o\big( \tfrac{1}{(\log n)^{20}} \big)$.
A key distinction from standard AMP is the introduction of a time-dependent matrix multiplication step within the iteration, which simultaneously enlarges the feature dimension and cancels the correlation during the iteration.
The main methodological inputs for our result are the iterative random graph matching algorithm proposed in \cite{DL22+, DL23+} and the spectral preprocessing procedure proposed in \cite{IS24+}. To the best of our knowledge, our algorithm is the first efficient random graph matching type algorithm that is robust under any adversarial perturbations of $n^{1-o(1)}$ size.
2024-12-21T03:15:38Z
46 pages; accepted by IEEE Trans. Inf. Theory
Zhangsong Li

http://arxiv.org/abs/2606.08597v1
Kikuchi Graphs of Random Hypergraphs are Approximately Johnson
2026-06-07T12:20:36Z
We prove that level-$\ell$ Kikuchi graphs of random $2r$-uniform hypergraphs spectrally approximate the Kikuchi graph of the complete $2r$-uniform hypergraph at a sampling rate that is sharp up to a logarithmic factor, in the regime $r\leq \ell \leq n/2$. Our proof is based on the matrix Bernstein inequality, but, unlike prior works, we apply it to an appropriate collection of blocks of Johnson eigenspaces. Our analysis relies on a new, simple band-locality property for arbitrary Kikuchi graphs. As an application, we prove that the natural degree-$2\ell$ sum-of-squares relaxation for the Max $2r$-XOR problem is ``integral'' when the input is a planted noisy $2r$-XOR instance on a random hypergraph with $\gtrsim n \cdot (n/\ell)^{r-1} \log n$ hyperedges.
2026-06-07T12:20:36Z
Pravesh K. Kothari