https://arxiv.org/api/bXQdzN3ij3v2i5AeDuZFfRfjmpQ 2026-06-09T20:34:03Z 28933 0 15 http://arxiv.org/abs/2606.09729v1 Bayesian Probing on Graphs 2026-06-08T16:51:30Z We introduce a stochastic probing problem with correlated items. In our model, which we call Bayesian Probing, the correlations are modeled by an underlying graph $G$. Each vertex is independently active with a known probability. Each item corresponds to an edge in the graph. Probing an edge has some cost, gives some reward if both endpoints are active, and also reveals the state of its endpoints. Hence a probe induces a Bayesian update on the remaining edges. The goal is to adaptively probe items/edges subject to a knapsack constraint to maximize the expected total reward obtained from the probed edges. Bayesian Probing generalizes stochastic knapsack and stochastic probing by allowing correlations between items. Moreover, it gives a tractable model for the Bayesian Active Search problem, a popular problem considered in the machine learning community. In Bayesian Active Search, the goal is to find items in a particular class by adaptively probing at most, say $k$, items. Given a prior distribution over items, we want to compute a Bayesian policy to maximize the number of such items found. For this general problem with arbitrary priors, there are strong lower bounds on efficiently computing good policies. In this paper, we design efficient approximation algorithms for Bayesian Probing. These results give the first efficient approximation algorithms for Bayesian Active Search, for a class of practically-relevant prior distributions. 2026-06-08T16:51:30Z Anupam Gupta Benjamin Moseley Rudy Zhou http://arxiv.org/abs/2606.09728v1 Quantum Cut Sparsifiers 2026-06-08T16:51:08Z In this paper, we continue a line of research initiated by Basu, Brakensiek, and Putterman [2026] studying the sparsifiability of Hamiltonians. We focus particularly on the sparsifiability of the widely-studied Quantum Cut (QC) Hamiltonians. Our main result is that in an $n$-qubit system, any $n$-qubit QC Hamiltonian can be sparsified to $\widetilde{O}(n /\varepsilon^2)$ many terms while preserving the energy of every state up to a factor of $1 \pm \varepsilon$. Our result can be interpreted as giving an importance sampling scheme for the edges of an arbitrary graph $G$ such that the \emph{Kikuchi} graph at level $\ell$ of the sampled graph is a spectral approximation to the Kikuchi graph of $G$. Importantly, the \emph{same} sampling scheme works simultaneously for all $\ell$. The natural approach of leverage score sampling, analyzed via matrix concentration inequalities, yields a polynomially worse bound in our setting because the underlying matrices have dimension $\sim 2^n$. Instead, our approach relies on decomposing the action of these matrices into invariant subspaces. Then, by using an operator-valued inequality of Alon and Kozma [Ann. Henri Poincaré, 2020], itself building on an \emph{octopus inequality} of Caputo, Liggett, and Richthammer [J. AMS, 2010], we extend our sparsification technique to all expander graphs. We then invoke expander decomposition to extend our sparsifier to all graphs. 2026-06-08T16:51:08Z Arpon Basu Joshua Brakensiek Pravesh K. Kothari Aaron Putterman http://arxiv.org/abs/2505.05847v4 Smaller and More Flexible Cuckoo Filters 2026-06-08T13:40:57Z Cuckoo filters are space-efficient approximate set membership data structures with a controllable false positive rate (FPR) and zero false negatives, similar to Bloom filters. In contrast to Bloom filters, Cuckoo filters store multi-bit fingerprints of keys in a hash table using variants of Cuckoo hashing, allowing each fingerprint to be stored at a small number of possible locations. Existing Cuckoo filters use fingerprints of $(k+3)$ bits per key and an additional space overhead factor of at least $1.05$ to achieve an FPR of $2^{-k}$. For $k=10$, this amounts to $1.365\, kn$ bits to store $n$ keys, which is better than $1.443\, kn$ bits for Bloom filters. The $+3$ for the fingerprint size is required to balance out the multiplied FPR caused by looking for the fingerprint at several locations. In the original Cuckoo filter, the number of hash table buckets is restricted to a power of 2, which may lead to much larger space overheads, up to $2.1\, (1+3/k)\, kn$ bits. We present two improvements of Cuckoo filters. First, we remove the restriction that the number of buckets must be a power of 2 by using a different placement strategy. Second, we reduce the space overhead factor of Cuckoo filters to $1.06 \, (1+2/k)$ by using overlapping windows instead of disjoint buckets to maintain the load threshold of the hash table, while reducing the number of alternative slots where any fingerprint may be found. A detailed evaluation demonstrates that the alternative memory layout based on overlapping windows decreases the size of Cuckoo filters not only in theory, but also in practice. A comparison with other state-of-the art filter types, Prefix filters and Vector Quotient filters (VQFs), shows that the reduced space overhead makes windowed Cuckoo filters the smallest filters supporting online insertions, with similarly fast queries, but longer insertion times. 2025-05-09T07:37:38Z Johanna Elena Schmitz Jens Zentgraf Sven Rahmann http://arxiv.org/abs/2511.20397v2 Model-Based Learning of Whittle indices 2026-06-08T13:11:50Z We present BLINQ, a new model-based algorithm that learns the Whittle indices of an indexable, communicating and unichain Markov Decision Process (MDP). Our approach relies on building an empirical estimate of the MDP and then computing its Whittle indices using an extended version of a state-of-the-art existing algorithm. We provide a proof of convergence to the Whittle indices we want to learn as well as a bound on the time needed to learn them with arbitrary precision. Moreover, we investigate its computational complexity. Our numerical experiments suggest that BLINQ significantly outperforms existing Q-learning approaches in terms of the number of samples needed to get an accurate approximation. In addition, it has a total computational cost even lower than Q-learning for any reasonably high number of samples. These observations persist even when the Q-learning algorithms are speeded up using neural networks to predict Q-values. 2025-11-25T15:21:00Z 30 pages, 7 figures, submitted to TOMPECS Joël Charles-Rebuffé Nicolas Gast Bruno Gaujal http://arxiv.org/abs/2606.09318v1 Engineering Scalable Distributed List Ranking 2026-06-08T10:28:44Z The list ranking problem is one of the classical problems of parallel computing, with nontrivial algorithms and many applications as a subroutine for solving other problems. While it has been intensively studied in the early days of parallel computing, few things happened in the last 20 years. In particular, there is little work on scaling list ranking to large machines and input sizes. We reconsider list ranking starting from the ground-breaking results of Sibeyn a quarter century ago. We employ algorithm and performance engineering to improve his sparse ruling-set algorithm, making it capable of scaling to many processors, and provide a more detailed analysis of the impact of the algorithm's parameters, further guiding our practical implementation. We perform an extensive experimental study across a variety of input instances with different structural properties. We demonstrate that indirect communication, exploiting input locality, and message coalescing allows scaling to billions of elements on up to 24,576 cores. 2026-06-08T10:28:44Z 14 pages, 4 figures Peter Sanders Matthias Schimek Tim Niklas Uhl Thomas Weidmann http://arxiv.org/abs/2606.09133v1 Multiversion Concurrency Control for Multiversion B-Trees 2026-06-08T07:31:22Z Multiversion concurrency control (MVCC) enables scans to read data from a committed snapshot (version), reducing conflicts with write operations compared to traditional concurrency approaches. Currently, versioned records are often managed in a B$^+$-tree using version chains. However, version chains introduce overhead during scans and can still lead to conflicts between scans and writers. The multiversion B-tree (MVBT) was designed for optimal range scan performance on arbitrary versions, but has been considered impractical due to its structural complexity and, until recently, the lack of effective concurrency control. In this paper, we present the concurrent MVBT (cMVBT), a redesign of the MVBT featuring a novel concurrency control protocol that uses optimistic latches for write operations and requires no latches for range scans, while preserving all the optimality guarantees of the original MVBT. Additionally, cMVBT supports continuous garbage collection without activity spikes, seamlessly integrating free-space management. Experiments with mixed workloads derived from a standard benchmark show that the cMVBT achieves low overhead, high write throughput, and excellent range scan performance, outperforming state-of-the-art methods based on version chains. 2026-06-08T07:31:22Z Amir Tonta Bernhard Seeger Eljas Soisalon-Soininen http://arxiv.org/abs/2606.01342v3 Towards Optimal Robustness in Learning-Augmented Paging 2026-06-08T06:33:53Z Learning-augmented paging has been extensively studied in recent years. A key advantage over naive ML-based approaches is \emph{bounded robustness}, which guarantees worst-case performance even when predictions are inaccurate, making these algorithms valuable for real-world systems. Prior work achieves robustness bounds of $2H_k + O(1)$ in the randomized setting, leaving a gap to the optimal competitive ratio $H_k$. In this paper, we study how to close this gap. We begin by reviewing online optimality and proving a new property of the latest $H_k$-competitive algorithm, which facilitates our analysis in the learning-augmented setting. Then, we review existing learning-augmented paging algorithms and introduce a unifying primitive, the \emph{relative prediction budget}, which captures the essence of establishing robustness and reveals that prior algorithms either overuse or underutilize predictions. Guided by the above analysis, we develop a new framework that achieves the best-possible robustness up to an additive constant for learning-augmented paging: $H_k + O(1)$. Experiments further demonstrate strong practical performance. 2026-05-31T16:49:36Z ICML 2026 Peng Chen Hailiang Zhao Xueyan Tang Yixuan Wang Shuiguang Deng http://arxiv.org/abs/2507.03980v4 Functional design of efficient and parallelizable combinatorial generators using convolution 2026-06-08T04:17:36Z The application of program transformation and algebraic methods to the development of efficient combinatorial optimization (CO) algorithms relies on an exhaustive combinatorial generator for the problem specification, followed by the fusion of thinning or filtering processes into this specification. However, the effectiveness of such fusion transformations critically depends on the structural compatibility between the objective function and the generator, which is highly problem dependent. In practice, when the majority of candidate solutions remain unfiltered or are not eliminated-as is the case for most intractable CO problems-the overall efficiency of the resulting fused program is largely determined by the intrinsic efficiency of the combinatorial generator. Consequently, if the specification itself exhibits suboptimal performance, the fused program will inherit a correspondingly inferior level of efficiency. We argue that a genuine designed process should also account for hardware compatibility and parallelizability-particularly the ability to support efficient parallel execution on modern hardware architectures, including multi-level cache hierarchies and GPUs. However, does achieving formal correctness necessarily conflict with designing algebraically elegant algorithms that support fusion? Can we obtain both simultaneously? In this paper, we show that techniques from functional programming, provide powerful formal tools for the systematic construction of such hardware-compatible and parallelizable combinatorial generators. This paper investigates generators for two of the most fundamental combinatorial structures-combinations and permutations-together with their natural extension to nested generators (e.g., combinations/permutations of combinations/permutations). 2025-07-05T10:11:37Z Xi He Zhenjiang Hu Max. A. Little http://arxiv.org/abs/2606.08977v1 Online Learning with Recency: Algorithms for Sliding-window Streaming Multi-armed Bandits 2026-06-08T03:21:54Z Motivated by the recency effect in online learning, we study algorithms for single-pass *sliding-window streaming multi-armed bandits (MABs)* in this paper. In this setting, we are given $n$ arms with unknown sub-Gaussian reward distributions and a parameter $W$. The arms arrive in a single-pass stream, and only the most recent $W$ arms are considered valid. The algorithm is required to perform pure exploration and regret minimization with limited memory, defined as the number of stored arms. The model is a natural extension of the streaming multi-armed bandits model (without the sliding window) that has been extensively studied in recent years. We provide a comprehensive analysis of both the pure exploration and regret minimization problems with the model. For pure exploration, we prove that finding the best arm is hard with sublinear memory while finding an approximate best arm admits an efficient algorithm. For regret minimization, we explore a new notion of regret and give sharp memory-regret trade-offs for any single-pass algorithm. We complement our theoretical results with experiments, demonstrating the trade-offs between sample, regret, and memory. 2026-06-08T03:21:54Z ICML 2026 Vladimir Braverman Chen Wang Liudeng Wang Samson Zhou http://arxiv.org/abs/2508.11874v2 Discovering Expert-Level Nash Equilibrium Algorithms with Large Language Models 2026-06-08T00:24:06Z Designing polynomial-time algorithms for approximate Nash equilibria (ANE) with provable worst-case guarantees is a fundamental open problem in algorithmic game theory. While large language models (LLMs) can generate candidate algorithms at scale, certifying worst-case guarantees requires formal analysis over all game instances -- a task for which no automated system previously existed. Here, we present LegoNE, a framework encoding expert proof strategies into a symbolic language that automatically compiles any candidate algorithm into a finite optimization problem certifying its worst-case guarantee. Integrating LegoNE with a reasoning LLM, we rediscovered an algorithm matching the best polynomial-time guarantee for two-player games, and discovered a three-player algorithm improving the best guarantee from $0.6+δ$ to $0.5+δ$ -- provably beyond the reach of the extension technique, the only previously known multi-player ANE design paradigm. These results show that encoding domain-specific proof strategies into a machine-tractable language can support LLM-driven discovery of algorithms outside known human design paradigms. 2025-08-16T02:18:43Z accepted by Nature Communications Hanyu Li Dongchen Li Xiaotie Deng 10.1038/s41467-026-74003-1 http://arxiv.org/abs/2603.04689v3 Generalizing Fair Top-$k$ Selection: An Integrative Approach 2026-06-07T17:37:24Z Fair top-$k$ selection, which ensures appropriate proportional representation of members from minority or historically disadvantaged groups among the top-$k$ selected candidates, has drawn significant attention. We study the problem of finding a fair (linear) scoring function with multiple protected groups while also minimizing the disparity from a reference scoring function. This generalizes the prior setup, which was restricted to the single-group setting without disparity minimization. Previous studies imply that the number of protected groups may have a limited impact on the runtime efficiency. However, driven by the need for experimental exploration, we find that this implication overlooks a critical issue that may affect the fairness of the outcome. Once this issue is properly considered, our hardness analysis shows that the problem may become computationally intractable even for a two-dimensional dataset and small values of $k$. However, our analysis also reveals a gap in the hardness barrier, enabling us to recover the efficiency for the case of small $k$ when the number of protected groups is sufficiently small. Furthermore, beyond measuring disparity as the "distance" between the fair and the reference scoring functions, we introduce an alternative disparity measure$\unicode{x2014}$utility loss$\unicode{x2014}$that may yield a more stable scoring function under small weight perturbations. Through careful engineering trade-offs that balance implementation complexity, robustness, and performance, our augmented two-pronged solution demonstrates strong empirical performance on real-world datasets, with experimental observations also informing algorithm design and implementation decisions. 2026-03-05T00:06:47Z Guangya Cai http://arxiv.org/abs/2508.12536v3 jXBW: A Compressed Index for Structure-Aware JSONL Retrieval in Structured RAG 2026-06-07T16:31:47Z Providing \textit{structured} information to large language models (LLMs) improves multi-step reasoning and factual grounding, and recent retrieval-augmented generation (RAG) systems therefore reconstruct structure from retrieved text on every query. When the corpus is \emph{already} structured -- as in JSON Lines (JSONL), a popular format for LLM prompts, chemical compounds, and geospatial records -- this per-query rebuilding can be replaced by direct \emph{structural retrieval}. The core primitive is \textit{substructure search}: finding all JSON objects in a collection that contain a given query pattern. Existing approaches index each document separately, so both index space and query time grow with the total collection size; XML-based engines add conversion overhead and semantic mismatches. We propose \textbf{jXBW}, a compressed index for fast substructure search over JSONL, combining three innovations: (i) a merged tree representation that consolidates repeated structures across objects, (ii) a succinct tree index based on the eXtended Burrows--Wheeler Transform (XBW), and (iii) a newly developed three-phase substructure search algorithm that runs on this index. Together they achieve \textbf{query-dependent complexity}: the cost is determined by query characteristics rather than collection size, in compressed space. Experiments on seven real-world datasets, including PubChem ($10^6$ compounds) and OpenStreetMap ($6.6 \times 10^6$ objects), show that jXBW outperforms the strongest tree-based baseline by $\mathbf{16\times}$ on the smallest dataset and by up to $\mathbf{2{,}800\times}$ on the largest, and is more than $\mathbf{2 \times 10^6\times}$ faster than the XQuery engine Saxon. jXBW thus brings structural retrieval over million-record JSONL collections into the sub-millisecond range. 2025-08-18T00:14:24Z Yasuo Tabei http://arxiv.org/abs/2606.08713v1 The price of incrementality in k-center clustering 2026-06-07T16:09:04Z The $k$-center problem is one of the best-studied and most intuitive clustering formulations. It asks, given a set of $n$ points in a metric space, for $k$ of the points to be designated as cluster centers, so that the maximum distance of an input point to its nearest center is minimized. Gonzalez's greedy algorithm from 1985 is a simple and efficient way to find a $2$-approximate solution. The algorithm has the attractive feature of \emph{incrementality}: it outputs the centers one by one, with a guaranteed $2$-approximation for every prefix of the obtained sequence of centers. Incrementality imposes a geometric constraint on how solutions can be built, and it is natural to ask whether this comes at a price in the quality of the solution. It is known that in polynomial time, the approximation ratio of $2$ is best possible, assuming $P \neq NP$. In this paper we show that even with \emph{unlimited} computational power, the factor $2$ cannot be improved, if the solution is required to be built incrementally. The lower bound construction imposes a tradeoff between all $n$ levels of the clustering simultaneously; it was obtained with the help of ChatGPT, an aspect we discuss in Section 3 of the paper. 2026-06-07T16:09:04Z László Kozma http://arxiv.org/abs/2606.08698v1 Quotient Admission Algorithms for Witness-Supported Graph Windows 2026-06-07T15:53:21Z We formulate the quotient admission problem for finite graph-window rows. The input is a finite row set, an admissible evidence map, semantic labels, witness-support hypergraphs, and atom-level admissibility predicates. The output is a quotient decision on evidence atoms, with possible decisions certificate, residual, low-confidence, or blocked. The problem asks for the maximal guard-respecting atom-level decision map that uses no refinement beyond the admissible evidence partition. We prove an atom-union characterization of identifiable classes, give a witness-support hypergraph guard for certificate admission, characterize projected-label conflicts as blocked atoms, and present quotient admission algorithms with correctness, maximality, and complexity guarantees. With explicit evidence vectors and hyperedges, the algorithms run in expected O(B + I + n) time and space by hashing and deterministic O(B + I + n log n) time by sorting under a key-linear comparison model, where n is the number of rows, B is the total evidence encoding length, and I is the total hyperedge incidence size. We also prove a magnitude-only indistinguishability lower bound: any evaluator that observes only residual magnitudes fails on instances whose evidence atoms require different residual decisions after the magnitudes collapse them. 2026-06-07T15:53:21Z 11 pages Yushan Li http://arxiv.org/abs/2606.08662v1 Uncertainty Principles for the Number Theoretic Transform 2026-06-07T15:05:53Z Motivated by polynomial identity testing with exponentials (Li and Wu, ITCS'26), we study uncertainty principles for the number-theoretic transform (NTT). We show that the NTT satisfies strong sparsity tradeoffs: For every fixed prime $q$ and for all but finitely many primes $p \equiv 1 \pmod q$ every nonzero $f\in \mathbb F_p^{\mathbb Z_q}$ and its number-theoretic transform $\hat f$ satisfy \[ |\mathrm{Supp}(f)| + |\mathrm{Supp}(\hat f)| \ge q+1. \] Thus, a $k$-sparse function has transform support at least $q-k+1$. As our main technical contribution, we prove a probabilistic version of the above uncertainty principle, averaged over primes $p$, in the regime $p=q^{O(1)}$. As an application, we obtain a black-box identity test for $k$-sparse exponential polynomials of degree at most $d$ with vanishing soundness error, for $q$ moderately larger than $k$. 2026-06-07T15:05:53Z Giulio Malavolta Alon Rosen