https://arxiv.org/api/XEy+Vy7s5SQSQAPZ9eVPYzRJ5bA 2026-06-13T19:52:17Z 28966 120 15 http://arxiv.org/abs/2606.03129v1 Parallel Metric Skiplists and Nearest Neighbor Search 2026-06-02T04:15:35Z

The metric skip-list is a data structure designed for efficient nearest and $k$-nearest neighbor search in metric spaces. For many real-world datasets with reasonable distributions - specifically, those with a constant expansion rate - it supports $\tilde{O}(n)$ construction time and $O(k\log n)$ query time, where $n$ is the input size and $k$ is the number of nearest neighbors in queries. Notably, unlike alternative approaches, it does not require a bounded aspect ratio, making it more flexible for input data distributions. However, the inherently sequential nature of its original construction has, to our knowledge, precluded any existing parallel algorithm. In this paper, we present highly parallel and work-efficient algorithms for constructing metric skip lists. Under the assumption of a constant expansion rate, our approach achieves an expected work of $O(n \log n)$ and a polylogarithmic span with high probability. Our design is based on novel algorithmic insights that improves the sequential procedure, enabling a divide-and-conquer strategy that facilitates parallelism while maintaining efficiency. With our algorithms, we can also support improved bounds for relevant applications using nearest neighbor as building blocks, including bichromatic closest pair (BCP), density-based clustering, and $k$-NN graph construction, among others. To our knowledge, many of these results represent the first solutions to achieve both work efficiency and polylogarithmic span, relying solely on the assumption of a constant expansion rate.

2026-06-02T04:15:35Z Xiangyun Ding Rohin Garg Yan Gu Yihan Sun http://arxiv.org/abs/2604.01960v3 BBC: Improving Large-k Approximate Nearest Neighbor Search with a Bucket-based Result Collector 2026-06-02T03:45:45Z

Although Approximate Nearest Neighbor (ANN) search has been extensively studied, large-k ANN queries that aim to retrieve a large number of nearest neighbors remain underexplored, despite their numerous real-world applications. Existing ANN methods face significant performance degradation for such queries. In this work, we first investigate the reasons for the performance degradation of quantization-based ANN indexes: (1) the inefficiency of existing top-k collectors, which incurs significant overhead in candidate maintenance, and (2) the reduced pruning effectiveness of quantization methods, which leads to a costly re-ranking process. To address this, we propose a novel bucket-based result collector (BBC) to enhance the efficiency of existing quantization-based ANN indexes for large-k ANN queries. BBC introduces two key components: (1) a bucket-based result buffer that organizes candidates into buckets by their distances to the query. This design reduces ranking costs and improves cache efficiency, enabling high performance maintenance of a candidate superset and a lightweight final selection of top-k results. (2) two re-ranking algorithms tailored for different types of quantization methods, which accelerate their re-ranking process by reducing either the number of candidate objects to be re-ranked or cache misses. Extensive experiments on real-world datasets demonstrate that BBC accelerates existing quantization-based ANN methods by up to 3.8x at recall@k = 0.95 for large-k ANN queries.

2026-04-02T12:22:38Z Ziqi Yin Gao Cong Kai Zeng Jinwei Zhu Bin Cui http://arxiv.org/abs/2606.04038v1 Graphical and algebraic methods for Boolean factoring 2026-06-02T02:57:27Z

The problem of factoring Boolean polynomials has significant applications in both classical and quantum computing technology. In this paper we have developed novel algorithms for factoring both ESOP and SOP expressions. Our aim is to optimize the AND-count. The AND-count plays a key role in determining the number of AND and Toffoli gates required to implement a reversible function with classical and quantum circuits, respectively. The first type of algorithms that we develop, are graphical. We reduce the problem of Boolean factoring to covering a bipartite graph with bicliques, and so optimizing the number of bicliques required to cover the bipartite graph, leads to reduced number of factors, and hence AND-count. The second type of algorithm is algebraic, and is derived from multivariate Horner method. We have compared the performances of our algorithms with existing popular methods like EXORCISM-4 and EPOEM2, on random functions of up to 12 variables. We have observed that our multivariate Horner method is substantially faster, while our biclique-based method achieves the maximum AND-count reduction. In fact, compared to EXORCISM-4 our biclique based method achieves up to 5 times reduction in AND-count.

2026-06-02T02:57:27Z Simon Martiel Priyanka Mukhopadhyay http://arxiv.org/abs/2606.02948v1 From Non-Convex to Strongly Convex: Curvature-Adaptive FTPL for Online Optimization 2026-06-01T23:01:36Z

Curvature adaptivity is a classical theme in online optimization: for convex Lipschitz losses, adaptive methods interpolate between the optimal $O(\sqrt{T})$ regret for general convex losses and $O(\log T)$ regret under strong convexity. Recent work has shown that Follow-the-Perturbed-Leader (FTPL) achieves optimal $O(\sqrt{T})$ regret even for online non-convex Lipschitz losses, assuming access to an approximate offline-optimization oracle, but these guarantees do not exploit curvature. We show that FTPL can be made curvature-adaptive in the non-convex setting, without knowing in advance how curvature will accumulate over time. Our algorithm replaces the fixed perturbation scale of standard FTPL with a time-varying scale chosen using only past information. We give a simple follow-the-leader tuning rule for this scale and show that it competes, up to constants, with the best choice in hindsight. The resulting method achieves $O(\sqrt{T})$ regret for arbitrary non-convex Lipschitz losses and improves as cumulative curvature grows; with sufficiently accurate oracle calls, it achieves $O(\log T)$ regret when cumulative curvature grows linearly, which includes the classical strongly convex regime. We complement these upper bounds with matching lower bounds for prescribed cumulative-curvature sequences, already for one-dimensional convex losses, showing that the tradeoff between worst-case non-convex regret and curvature-driven fast rates is intrinsic.

2026-06-01T23:01:36Z Moses Charikar Chirag Pabbaraju Ambuj Tewari http://arxiv.org/abs/2504.01206v3 SplineSketch: Even More Accurate Quantiles with Error Guarantees 2026-06-01T20:24:47Z

Space-efficient streaming estimation of quantiles in massive datasets is a fundamental problem with numerous applications in data monitoring and analysis. While theoretical research led to optimal algorithms, such as the Greenwald-Khanna algorithm or the KLL sketch, practitioners often use other sketches that perform significantly better in practice but lack theoretical guarantees. Most notably, the widely used $t$-digest has unbounded worst-case error. In this paper, we seek to get the best of both worlds. We present a new quantile summary, SplineSketch, for numeric data, offering near-optimal theoretical guarantees, namely uniformly bounded rank error, and outperforming $t$-digest by a factor of 2-20 on a range of synthetic and real-world datasets. To achieve such performance, we develop a novel approach that maintains a dynamic subdivision of the input range into buckets while fitting the input distribution using monotone cubic spline interpolation. The core challenge is implementing this method in a space-efficient manner while ensuring strong worst-case guarantees.

2025-04-01T21:39:50Z Presented at SIGMOD'26. Changes since v2: Major revision of the theoretical properties and analysis Aleksander Łukasiewicz, Jakub Tětek, Pavel Veselý: SplineSketch: Even More Accurate Quantiles with Error Guarantees. Proc. ACM Manag. Data 3(6): 1-26 (2025) Aleksander Łukasiewicz Jakub Tětek Pavel Veselý 10.1145/3769827 http://arxiv.org/abs/2606.02805v1 On the gap of quiver representations 2026-06-01T19:24:49Z

The nullcone membership problem, deciding whether an orbit closure contains the origin, is fundamental in computational invariant theory. For self-adjoint groups, Bürgisser, Franks, Garg, Oliveira, Walter and Wigderson gave a geodesic optimization algorithm whose complexity is controlled by the gap, a condition number of the representation. We study the gap for quiver representations under the action of the special linear group. We prove that the inverse gap is polynomially bounded in the number of vertices and the maximum dimension for type A and $\hat{A}$, as well as tree quivers with uniform dimension vectors. Consequently, the algorithm of Bürgisser et al. solves the nullcone membership problem in polynomial time for these families. In contrast, we construct families of quivers and dimension vectors where the gap is exponentially small in the number of leaves, furthermore, for every connected quiver we exhibit dimension vectors such that the weight margin (a related condition number) is exponentially small in the number of vertices. We also extend our results to $σ$-semistability, thereby giving a new proof of a recent result of Iwamasa, Oki, and Soma.

2026-06-01T19:24:49Z John Maar http://arxiv.org/abs/2606.02752v1 Online K-d tree for approximate neighborhood search in data streams 2026-06-01T18:18:29Z

The k-Nearest Neighbors (kNN) algorithm has long been widely used in Machine Learning (ML) applications. However, the main concern when using it is the computational cost required for neighborhood search, which can make it unfeasible for large-scale applications. Optimization algorithms, such as the K-d tree, become an option in such scenarios. Under data streams, it can be challenging to maintain the properties of the K-d tree, as it requires inserting and deleting nodes on the fly. These operations can make maintaining the tree's balance and invariants difficult. Additionally, traditional K-d trees were initially designed for Minkowski-based distance functions. In this work, we describe an Online K-d tree and its adaptation to the Canberra distance that supports dynamic updates over data streams while preserving the structural invariants required for efficient traversal. Experimental analysis demonstrates that the Online K-d tree algorithm achieves faster processing time under data streams, and that adapting to the Canberra distance enabled effective subtree pruning, as evidenced by a minor loss in average accuracy and a substantial gain in instances processed per second. Our implementation can be found in our GitHub repository

2026-06-01T18:18:29Z Paper accepted to the ICPRAI 2026 Eduardo V. L. Barboza Robert Sabourin Rafael M. O. Cruz http://arxiv.org/abs/2606.02492v1 $O(n +f(k))$: Truly Linear FPT 2026-06-01T17:01:31Z

Parameterized complexity has always been concerned with practical computing: by confining combinatorial explosion to a secondary parameter $k$, one can uncover why and how many NP-hard problems are effectively tackled in practice. Today, however, the scale of data has changed: scientists study Big Data, which is so large that even quadratic dependence in the total input size $n$ is unaffordable. Therefore, what constitutes a practical algorithm has also changed. Classically, parameterized complexity is blind to the difference between defining fixed parameter tractability multiplicatively (i.e. $f(k) \cdot n^c$) or additively (i.e. $f(k) + n^c$). But what if the constant $c$ is one and we require true linearity, is this distinction still inconsequential? Here, we define and explore Truly Linear FPT (TLFPT) -- that is $O(n)+f(k)$ -- and show that it is a strict subset of Linear FPT (LFPT) -- that is $O(n) \cdot f(k)$ -- via diagonalization. Populating TLFPT requires careful consideration of linear-time algorithmics and data structures. We meet many inhabitants of TLFPT: SAT, Vertex Cover, Min-Max Matching, $(n-k)$-Coloring, Diverse Pair of Matchings, $k$-Path, and $H$-Coloring. Our parameterizations are equally varied. Beyond classical parameters like solution size, we leverage two parameters, treedepth and BFS-width, which are particularly well-suited to the TLFPT regime. We do so by developing techniques based on depth- and breadth-first search. For parameterized complexity to be of service to the scientific community, we need to contend with Big Data. For sufficiently large inputs, FPT beyond linear may not suffice. Thus, there is a practical and theoretical need for more ambitious goals. TLFPT is a first step forward.

2026-06-01T17:01:31Z 42 pages, 5 figures Benjamin Merlin Bumpus Rod Downey Tala Eagling-Vose Jessica Enright Michael R. Fellows David C. Kutner Laura Larios-Jones Barnaby Martin Frances Rosamond Ella Yates http://arxiv.org/abs/2606.02325v1 Terminal Steiner tree problem : Complexity and Algorithms 2026-06-01T14:37:12Z

Given a connected graph $G$ and a terminal set $R \subseteq V(G)$, the Steiner tree problem (ST) asks for a tree that spans all of $R$ with at most $r$ vertices from $V(G)\backslash R$, for some integer $r\geq 0$. It is known from (Garey et al.,1977 ) that ST is NP-complete. A Steiner tree in which all terminal vertices are constrained to be leaves is called a terminal Steiner tree. Our study addresses the existence of a terminal Steiner tree, its complexity across various graph classes, black-box applications of the ST, and a fixed-parameter tractable (FPT) algorithm with respect to the number of terminals.

2026-06-01T14:37:12Z Jyothish S Sadagopan Narasimhan http://arxiv.org/abs/2606.02263v1 Exact Sampling of Permutations with a Fixed Longest Increasing Subsequence 2026-06-01T13:50:04Z

We study exact uniform sampling of permutations of length $n$ whose longest increasing subsequence (LIS) has prescribed length $k$. For $k \in Θ(n)$, we give a direct rejection sampler whose expected running time is $O(n\log\log n)$ in the word-RAM model. The sampler uses an expanded proposal space consisting of permutations together with a specified increasing subsequence, and accepts exactly those proposals whose specified subsequence is the leftmost LIS. For arbitrary $1\le k\le n$, we give an exact sampler based on the Robinson--Schensted correspondence. The algorithm samples the corresponding Plancherel-conditioned shape by computing exact completion counts via determinant identities, and then samples two uniform tableaux of that shape. The direct implementation runs in $\tilde O(n^4k^5)$ expected time. We then show that the same sampler can be implemented in expected $\tilde O(n^3k^4)$ time by evaluating a determinant oracle through Hankel moment matrices.

2026-06-01T13:50:04Z Peter Clifford Raphaël Clifford http://arxiv.org/abs/2606.02183v1 Efficiently Listing Projected Trees, and Equivalence of Listing and Enumeration 2026-06-01T12:38:26Z

The subgraph isomorphism problem and its generalizations such as conjunctive queries, where some nodes are projected, are among the most fundamental problems in graph algorithms and database theory. In this paper, we study the listing and enumeration variants of these problems and present two main results. (1) We present the first algorithms for enumerating projected trees with polynomial preprocessing time ($\widetilde{O}(n^{17.42})$) and polylogarithmic delay ($\mathrm{polylog}(n)$). Prior to this work, all algorithms in the literature required time $Ω(n^{Ω(k)} + t)$ or $t \cdot n^{Ω(1)}$ to list all copies of a $k$-node tree with projections, where $t$ is the number of solutions. Our result generalizes to arbitrary projected hypergraphs, achieving enumeration in preprocessing time $\widetilde{O}(m^{17.42 \cdot \mathrm{subw}(H)})$ and polylogarithmic delay, where $\mathrm{subw}(H)$ is the submodular width of the pattern hypergraph $H$. We heavily rely on fast (rectangular and output-sensitive) matrix multiplication, which we complement by fine-grained lower bounds indicating that any algorithm beating time $Ω(n^{Ω(k)} + t)$ must rely on fast matrix multiplication. (2) As our second main result, we present a generic enumeration-to-listing reduction, establishing that listing and enumeration are equivalent under natural assumptions. For (colored) subgraph isomorphism, our reduction transforms any listing algorithm running in time $O(f(n,m) + t \cdot g(n,m))$ into an enumeration algorithm with preprocessing time $O((f(n,m)+g(n,m)+m) \log^2 n)$ and delay $O(g(n,m))$. We utilize this equivalence as a tool for proving our first main result, and we expect that our generic reduction will find many future applications.

2026-06-01T12:38:26Z Karl Bringmann Nick Fischer Yanheng Wang http://arxiv.org/abs/2606.02029v1 The Completion-Threshold Framework for Obligatory-Test Scheduling on Multiple Machines 2026-06-01T10:17:47Z

We study online scheduling with obligatory testing on $m$ identical machines with the objective of minimizing the sum of completion times. In this model, every job must undergo a test before its actual processing time is revealed. Consequently, the central algorithmic challenge is no longer whether to acquire information, but how to optimally balance machine capacity between revealing unknown jobs and processing currently known ones. While this tradeoff becomes structurally richer in the multiple-machine setting, the only prior explicit deterministic lower bound for this objective was $\sqrt{2}$, established strictly for a single machine in 2024 by Dogeas et al. [ESA 2024: 48:1-48:14]. Our core conceptual contribution is demonstrating that completion-threshold quantities, denoted $T_X$, serve as the fundamental analytical metric for this setting. Because every completed job must first pass through the testing phase, delayed revelation inherently forces delayed completion. By bounding these $T_X$ thresholds, we systematically derive strong lower bounds on the total completion time. Utilizing this framework, we establish the first substantial deterministic lower bounds for multiple machines, including a three-type bound of $1.4811$ and a multi-type dyadic construction that asymptotically approaches $3/2$. Finally, we complement these theoretical limits with a deterministic $2$-competitive list-scheduling algorithm for arbitrary test times.

2026-06-01T10:17:47Z Kao-Chuan Liang Ya-Chun Liang http://arxiv.org/abs/2411.12438v2 Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures 2026-06-01T08:02:06Z

We develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that, among several other applications, forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04]. As applications, we obtain an algorithm to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ samples and time for clustering such mixtures. Our results may come as a surprise in the context of the $d^{Ω(k)}$ statistical query and sum-of-squares lower bounds [DKS17, DKPP24] for clustering non-spherical Gaussian mixtures. While these results are usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.

2024-11-19T11:58:51Z 67 pages, updated to match camera-ready version at COLT 2026 Prashanti Anderson Mitali Bafna Rares-Darius Buhai Pravesh K. Kothari David Steurer http://arxiv.org/abs/2606.01809v1 A Near-Optimal Offline Algorithm for Dynamic All-Pairs Shortest Paths in Planar Digraphs 2026-06-01T07:27:10Z

In the planar, dynamic All-Pairs Shortest Paths (APSP) problem, a planar, weighted digraph $G$ undergoes a sequence of edge weight updates and the goal is to maintain a data structure on $G$, that can quickly answer distance queries between any two vertices $x,y \in V(G)$. The currently best algorithms for this problem require $\tilde{O}(n^{2/3})$ worst-case update and query time, while conditional lower bounds show that either update or query time $n^{0.5-δ}$ is needed for any constant $δ> 0$. In this article, we present the first algorithm with near-optimal $\tilde{O}(\sqrt{n})$ worst-case update and query time for the offline setting, where the update sequence is given initially. This result is obtained by giving the first offline dynamic algorithm for maintaining dense distance graphs (DDGs) faster than recomputing from scratch after each update. Further, we also present an \emph{online} algorithm for the incremental APSP problem with $\tilde{O}(\sqrt{n})$ worst-case update/ query time. This allows us to reduce the online dynamic APSP problem to the online decremental APSP problem, which constitutes partial progress even for the online version of this notorious problem.

2026-06-01T07:27:10Z Appeared in SODA'2022 Debarati Das Maximilian Probst Gutenberg Christian Wulff-Nilsen http://arxiv.org/abs/2606.01693v1 Scalable Concurrent Queues for GPU 2026-06-01T04:57:32Z

Concurrent queues can significantly impact supercomputing performance by being critical bottlenecks for task distribution, load balancing, and resource utilization. As HPC systems move beyond 10-million processor cores, the ability to rapidly move items between producer and consumer threads without excessive locking is essential for efficient queues, preventing idle cores, maximizing utilization, and achieving high parallel speedup. While concurrent queues are well studied on CPUs, they remain largely unexplored on modern GPUs, where SIMT execution, massive parallelism, and atomic contention reshape the design space. We present three linearizable GPU concurrent queues spanning from lock-free to wait-free guarantees: (1) G-WFQ-YMC, an adaptation of Yang and Mellor-Crummey's wait-free queue using preallocated segments; (2) G-LFQ, a bounded lock-free queue that uses wave-batched fast paths to maximize throughput; and (3) G-WFQ, a bounded wait-free queue that packs shared state into 64-bit compare-and-swap operations while preserving linearizability and bounded memory.

2026-06-01T04:57:32Z 10 pages, 5 figures Pratheek Prakash Shetty Thomas R. W. Scogland Wu-chun Feng