https://arxiv.org/api/QxGYYoBSPfVgW7VpoFFk+Bmn0us (feed updated 2026-06-13T16:09:01Z)

http://arxiv.org/abs/2606.07408v1
Earliest query answering over streamed trees
Updated: 2026-06-05T15:53:36Z

Streaming allows executing queries over massive JSON or XML documents whose size makes it infeasible to fully parse them into a tree. Earliest query answering is a radical approach to reducing latency and memory footprint. To minimize latency, a document node must be returned as soon as the node is guaranteed to be an answer regardless of how the document ends. Similarly, to minimize memory footprint, a node must be discarded as soon as it cannot become an answer regardless of how the document ends. For simple queries that select nodes based on the path from the root, the decision for each node can be made on the spot, but practical languages such as XPath or JSONPath support filters, which allow selecting nodes based on information collected from various parts of the document, possibly further down the stream. This makes earliest query answering a challenging task, as candidate nodes must be kept in memory until it becomes clear that they can be safely returned or discarded. We show that this can be done for all unary queries expressible in monadic second-order logic (MSO), while ensuring constant update time, provided that nodes are returned by passing a suitable iterator rather than one by one.

Published: 2026-06-05T15:53:36Z
Authors: Mateusz Gienieczko, Martín Muñoz, Filip Murlak, Charles Paperman

http://arxiv.org/abs/2605.31071v2
Tree Containment Parameterized by Scanwidth
Updated: 2026-06-05T14:52:12Z

TREE CONTAINMENT is a central decision problem in mathematical phylogenetics, asking whether a given rooted phylogenetic tree is embeddable in ("displayed by") a given rooted phylogenetic network. While the problem is NP-complete for general networks, many algorithmic advances have relied on structural parameters that capture how "tree-like" a network is.
In this paper we investigate TREE CONTAINMENT under the structural parameter scanwidth, a directed width measure generalizing popular parameters measuring tree-likeness of phylogenetic networks. We first present a parameterized algorithm that solves the problem in $O(4^{k + k\log{k}} n + nm^2)$ time, where $n$ and $m$ are the numbers of nodes and arcs in the network and $k$ is the width of a given tree-extension. Complementing this upper bound, we prove a matching lower bound under the Exponential-Time Hypothesis (ETH), showing that there is no algorithm for TREE CONTAINMENT that runs in $2^{o(c\log{c})} n^{O(1)}$ time, even on binary inputs, where $c$ is the directed cutwidth of the input network, which upper-bounds the scanwidth $k$.

Published: 2026-05-29T09:38:57Z
Authors: Leo van Iersel, Mark Jones, Mathias Weller

http://arxiv.org/abs/2202.05907v4
Fast and perfect sampling of subgraphs and polymer systems
Updated: 2026-06-05T12:44:58Z

We give an efficient perfect sampling algorithm for weighted, connected induced subgraphs (or graphlets) of rooted, bounded degree graphs. Our algorithm utilizes a vertex-percolation process with a carefully chosen rejection filter and works under a percolation subcriticality condition. We show that this condition is optimal in the sense that the task of (approximately) sampling weighted rooted graphlets becomes impossible in finite expected time for infinite graphs and intractable for finite graphs when the condition does not hold. We apply our sampling algorithm as a subroutine to give near linear-time perfect sampling algorithms for polymer models and weighted non-rooted graphlets in finite graphs, two widely studied yet very different problems.
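To make the "vertex-percolation process" concrete, here is a minimal Python sketch of the percolation step only: starting from the root, each newly reached vertex is kept independently with probability `p`, and the kept connected component containing the root is returned as a random rooted graphlet. The rejection filter that reweights this distribution toward the target graphlet weights is not specified in the abstract and is deliberately omitted; the function name `percolation_graphlet` and the parameter `p` are hypothetical.

```python
import random
from collections import deque

def percolation_graphlet(adj, root, p, rng=random):
    """Return the open cluster of `root` under site percolation with retention probability p.

    adj: dict mapping each vertex to an iterable of neighbours.
    The root is always kept; every other vertex has its coin flipped at most once,
    at the moment it is first reached, which matches independent site percolation.
    """
    kept = {root}
    examined = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in examined:
                continue
            examined.add(v)          # a closed vertex is never reconsidered
            if rng.random() < p:     # open with probability p
                kept.add(v)
                queue.append(v)
    return kept
```

In a bounded-degree graph, a small enough `p` keeps the explored cluster finite in expectation, which is the intuition behind a subcriticality condition; the exact condition and the filter are the paper's contribution and are not reproduced here.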
This new perfect sampling algorithm for polymer models gives improved sampling algorithms for spin systems at low temperatures on expander graphs and unbalanced bipartite graphs, among other applications.

Published: 2022-02-11T21:13:40Z
Authors: Antonio Blanca, Sarah Cannon, Will Perkins

http://arxiv.org/abs/2606.07205v1
Towards Tight Bounds for Streaming Attention
Updated: 2026-06-05T12:15:33Z

The attention mechanism is a cornerstone of modern transformer architectures. However, its expressive power comes at the cost of quadratic runtime and linear space usage. In particular, the classical transformer architecture explicitly stores all previously seen input elements (tokens) in order to generate the next one. The problem of implementing a transformer in limited space, known as KV cache compression, has received much interest over the past few years, spurring the development of powerful heuristics. Recent works of Haris et al. (COLT'25) and Kochetkova et al. (NeurIPS'25) formalized KV cache compression as the streaming attention approximation problem, providing both upper bounds (based on discrepancy theory) and information-theoretic lower bounds. However, those papers left open a significant gap between the upper and lower bounds. For example, the space usage of their algorithms increases with the precision parameter, but the lower bound does not get stronger.
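The linear-space baseline described above can be sketched as exact softmax attention over a KV cache that keeps every (key, value) pair. This is a minimal pure-Python illustration, not any of the cited compression schemes; the class name `ExactKVCache` is hypothetical, and the usual $1/\sqrt{d}$ score scaling is assumed to be folded into the keys.

```python
import math

class ExactKVCache:
    """Exact softmax attention over every stored (key, value) pair.

    Memory grows linearly with the number of tokens seen; KV cache
    compression aims to approximate attend() with far less storage.
    """

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(list(key))
        self.values.append(list(value))

    def attend(self, query):
        if not self.keys:
            raise ValueError("empty cache")
        # score_i = <query, key_i>; subtract the max before exp for numerical stability
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.keys]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        dim = len(self.values[0])
        # Output is the softmax-weighted average of the stored values
        return [sum(w * v[j] for w, v in zip(weights, self.values)) / total
                for j in range(dim)]
```

A query orthogonal to all keys averages the values uniformly, while a query aligned with one key concentrates almost all weight on that key's value.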
In this work, we revisit the streaming attention approximation problem and provide nearly tight bounds on its space complexity. On the algorithmic side, we achieve the result through a surprisingly tight interplay between three distinct methods for kernel density estimation: discrepancy-based coreset constructions (e.g., Charikar-Kapralov-Waingarten'24), the polynomial method (e.g., Greengard-Rokhlin'87, Alman-Song'23), and space partitioning (e.g., Andoni-Laarhoven-Razenshteyn-Waingarten'17, Charikar-Kapralov-Nouri-Siminelakis'20). On the lower bound side, our main technical contribution is a new technique for using the INDEX problem with a large amount of side information that we hope will prove useful in other high-dimensional geometric estimation problems.

Published: 2026-06-05T12:15:33Z
Authors: Justin Y. Chen, Ying Feng, Piotr Indyk, Michael Kapralov, Ekaterina Kochetkova, Boris Prokhorov

http://arxiv.org/abs/2602.22976v3
Efficient Parallel Algorithms for Hypergraph Matching
Updated: 2026-06-05T11:52:25Z

We present efficient parallel algorithms for computing maximal matchings in hypergraphs. Our algorithm finds locally maximal edges in the hypergraph and adds them in parallel to the matching. In the CRCW PRAM models our algorithms achieve $O(\log\log\Delta \cdot \log m)$ time with $O(\kappa \log m)$ work w.h.p., where $m$ is the number of hyperedges, $\kappa$ is the sum and $\Delta$ is the maximum of all vertex degrees. The CREW PRAM model algorithm has a running time of $O((\log\Delta + \log d)\log m)$ and requires $O(\kappa \log m)$ work w.h.p. It can be implemented work-optimally with $O(\kappa)$ work in $O((\log m + \log n)\log m)$ time. We prove a $1/d$-approximation guarantee for our algorithms.
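The "locally maximal edges" idea can be sketched sequentially: in each round every remaining hyperedge draws a random priority, an edge wins if no overlapping live edge has a higher priority, and all winners join the matching at once. Random priorities with index tie-breaking are an assumption made here for illustration (the abstract does not say how local maximality is defined), and the round loop simulates in sequence what the paper does in parallel; `maximal_hypergraph_matching` is a hypothetical name.

```python
import random

def maximal_hypergraph_matching(edges, rng=random):
    """Maximal matching in a hypergraph via rounds of locally maximal edges.

    edges: list of hyperedges, each an iterable of vertices.
    Returns the indices of the chosen hyperedges.
    """
    edges = [frozenset(e) for e in edges]
    alive = set(range(len(edges)))
    matching = []
    while alive:
        # Random priority per live edge; the index breaks ties so the order is strict.
        prio = {i: (rng.random(), i) for i in alive}
        # Index live edges by vertex to find edges that overlap a given edge.
        incident = {}
        for i in alive:
            for v in edges[i]:
                incident.setdefault(v, []).append(i)
        # An edge wins if it beats every live edge sharing one of its vertices.
        winners = [i for i in alive
                   if all(prio[i] >= prio[j] for v in edges[i] for j in incident[v])]
        # Two overlapping edges cannot both win, so winners are pairwise disjoint.
        matching.extend(winners)
        chosen = set(winners)
        covered = set().union(*(edges[i] for i in winners))
        # Drop the winners and every edge that now touches a matched vertex.
        alive = {i for i in alive if i not in chosen and not (edges[i] & covered)}
    return matching
```

Each round has at least one winner (the globally highest-priority edge), so the loop terminates, and an edge is discarded only when it meets a chosen edge, so the result is maximal. For hyperedges of size at most $d$, any maximal matching is a $1/d$-approximation, since each chosen edge can intersect at most $d$ pairwise disjoint edges of an optimal matching.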
We evaluate our algorithms experimentally by implementing and running the proposed algorithms on the GPU using CUDA and Kokkos. Our experimental evaluation demonstrates the practical efficiency of our approach on real-world hypergraph instances, yielding a speedup of up to 76 times compared to a single-core CPU algorithm.

Published: 2026-02-26T13:16:38Z
Authors: Henrik Reinstädtler, Christian Schulz, Nodari Sitchinava, Fabian Walliser

http://arxiv.org/abs/2606.06287v2
Quantum Algorithms for Triangle Cut Sparsification
Updated: 2026-06-05T03:00:06Z

Triangles capture higher-order structures in graphs and are fundamental to applications such as clustering and network analysis. To enable efficient use of such structures at scale, we study the problem of triangle cut sparsification, which aims to reduce the graph size while approximately preserving triangle counts across every cut. We investigate quantum algorithms for this problem, using triangle listing as our main technical ingredient. In particular, we present a quantum algorithm for triangle listing that, for a graph with $n$ vertices, $m$ edges, and $t$ triangles, runs in time $T_{\mathrm{q\text{-}list}} = \widetilde{O}\bigl(\min(n^{5/4}t^{7/12} + n^{7/6}t^{7/9},\ m + m^{3/4}t^{1/2},\ n^{3/2}t^{1/2})\bigr)$, improving upon the best known classical bounds over a broad range of parameters. Our algorithm is based on a heavy-light vertex partition and an extension of triangle detection via quantum walks and Grover search. Leveraging this result, we design a quantum algorithm for constructing $\varepsilon$-triangle cut sparsifiers of size $\widetilde{O}(n/\varepsilon^2)$ in time $\widetilde{O}(T_{\mathrm{q\text{-}list}} + \sqrt{mn}/\varepsilon)$.
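For orientation, here is a minimal classical sketch of the two ingredients named above: triangle listing (the textbook degree-ordering method, not the paper's quantum algorithm) and counting the triangles that cross a cut $(S, V \setminus S)$. It assumes a triangle counts as cut when its three vertices are not all on the same side; the function names are hypothetical.

```python
def list_triangles(edges):
    """List each triangle exactly once using the classical degree-ordering technique."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Rank vertices by (degree, id); orienting edges upward bounds out-degrees by O(sqrt(m)).
    rank = {v: (len(adj[v]), v) for v in adj}
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}
    triangles = []
    for u in adj:
        for v in out[u]:
            # A common out-neighbour w closes the triangle u < v < w in rank order.
            for w in out[u] & out[v]:
                triangles.append((u, v, w))
    return triangles

def triangles_crossing_cut(triangles, side):
    """Count triangles whose vertices are not all on the same side of the cut."""
    return sum(1 for t in triangles if len({v in side for v in t}) == 2)
```

A triangle cut sparsifier, as described above, is a smaller (reweighted) graph for which `triangles_crossing_cut` stays within a $1 \pm \varepsilon$ factor for every choice of `side`.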
Finally, we demonstrate applications to clustering algorithms based on triangle-related measures and prove a lower bound of $\Omega(n/\varepsilon^2)$ on the size of any $\varepsilon$-triangle cut sparsifier.

Published: 2026-06-04T15:25:34Z
Comment: ICML 2026
Authors: Shan Jiang, Pan Peng

http://arxiv.org/abs/2606.06686v1
On the Hardness of Optimal Motion on Trees
Updated: 2026-06-04T20:00:05Z

This paper presents a simple framework that settles the complexity of Multi-Agent Path Finding (MAPF) on trees across standard objectives (distance, makespan, and flowtime) for both labeled and colored variants. In MAPF, agents occupy the vertices of a graph and must move to target vertices without collisions while optimizing a given objective. In the labeled case, the agents are distinct and have respective targets; in the colored case, agents of the same color are interchangeable. While many MAPF variants are known to be intractable, several basic cases on trees have remained open. We prove NP-hardness on trees for both labeled and 2-colored MAPF under all three objectives. In particular, we resolve the classical Pebble Motion problem, where one pebble moves at a time to an adjacent empty vertex and the goal is to minimize the total number of moves. Despite being one of the most basic discrete motion models, its complexity on trees had remained open for several decades. Moreover, for colored Pebble Motion, we give the first hardness result on any graph class, already with two colors, which is tight.
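The labeled Pebble Motion objective described above can be made concrete with an exhaustive breadth-first search over configurations. This is only a brute-force reference for tiny trees (the state space is exponential, consistent with the NP-hardness result), not an algorithm from the paper; `min_pebble_moves` is a hypothetical name, and pebble labels are assumed to be sortable.

```python
from collections import deque

def min_pebble_moves(adj, start, target):
    """Fewest moves turning `start` into `target` in labeled Pebble Motion.

    adj: dict vertex -> list of neighbours (the tree).
    start, target: dicts pebble -> vertex, with distinct vertices.
    A move sends one pebble to an adjacent empty vertex.
    Returns None when the target configuration is unreachable.
    """
    pebbles = sorted(start)
    s0 = tuple(start[p] for p in pebbles)
    goal = tuple(target[p] for p in pebbles)
    dist = {s0: 0}
    queue = deque([s0])
    while queue:
        state = queue.popleft()
        if state == goal:
            return dist[state]
        occupied = set(state)
        for i, pos in enumerate(state):
            for nxt in adj[pos]:
                if nxt not in occupied:
                    # Move pebble i one step to an empty neighbour.
                    new = state[:i] + (nxt,) + state[i + 1:]
                    if new not in dist:
                        dist[new] = dist[state] + 1
                        queue.append(new)
    return None
```

On a path, pebbles can never pass each other, so swapping two of them is impossible; on a star with a spare leaf, swapping two pebbles takes six moves because one of them must detour into the free leaf.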
All of these results are established through the hardness of Stack Rearrangement, itself posed as an open problem, which asks to optimally rearrange items stored in stacks, and which we also prove to be NP-hard. Notably, the connection to stacks yields hardness already on very simple trees (subdivided stars) across all problems. Together, these results reveal a common tractability barrier that permeates several fundamental motion models, thereby unifying and strengthening prior hardness results.

Published: 2026-06-04T20:00:05Z
Author: Tzvika Geft

http://arxiv.org/abs/2606.06681v1
Online Span Minimization for Flexible Uniform Jobs
Updated: 2026-06-04T19:55:05Z

Motivated by the critical need for energy-efficient scheduling in cloud computing, this paper investigates Span Minimization, a fundamental variant of the well-studied BusyTime problem. In the general BusyTime problem, $n$ jobs characterized by release times, deadlines, and processing times must be partitioned into bundles of capacity $B$, where the objective is to minimize the total active duration of the virtual machines. Span minimization addresses the specific case of unbounded capacity ($B = \infty$), a problem that serves as a vital precursor for achieving high-performance approximation guarantees in more complex scheduling environments.
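With $B = \infty$, every job can share a single machine, so the cost reduces to the total time during which at least one job is running. The sketch below computes that span for chosen start times, assuming the usual model in which job $j$ with release $r_j$, deadline $d_j$, and length $p_j$ must start within $[r_j, d_j - p_j]$; the function name `schedule_span` is hypothetical.

```python
def schedule_span(jobs, starts):
    """Length of the union of the active intervals [s_j, s_j + p_j].

    jobs: list of (release, deadline, processing) triples.
    starts: chosen start time of each job; each must respect its window.
    """
    intervals = []
    for (r, d, p), s in zip(jobs, starts):
        if s < r or s + p > d:
            raise ValueError("start time outside the job's window")
        intervals.append((s, s + p))
    intervals.sort()
    span, cur_lo, cur_hi = 0, None, None
    for lo, hi in intervals:
        if cur_hi is None or lo > cur_hi:
            # Disjoint from the current merged block: close it and start a new one.
            if cur_hi is not None:
                span += cur_hi - cur_lo
            cur_lo, cur_hi = lo, hi
        else:
            # Overlapping or touching: extend the current block.
            cur_hi = max(cur_hi, hi)
    if cur_hi is not None:
        span += cur_hi - cur_lo
    return span
```

For jobs `[(0, 10, 2), (1, 10, 2), (5, 10, 2)]`, starting each job at its release time gives span 5, while delaying all three to time 5 gives span 2. An online algorithm must choose such delays without knowing which jobs will arrive later, which is where randomization helps.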
While previous research established a deterministic $2$-approximation for interval jobs and a $3$-approximation for the general BusyTime problem, the online landscape of span minimization remains less explored. In this paper, we focus on the online version of span minimization. We demonstrate that randomization can be leveraged to break the known deterministic competitive barrier of $2$. For uniform-length jobs, we derive a randomized competitive upper bound of $\frac{1}{\ln{2}}\approx 1.443$ and a lower bound of $\frac{\sqrt{3}+1}{2}\approx 1.366$. Furthermore, we show that by introducing the ability to restart jobs, we can achieve an optimal competitive ratio equal to the golden ratio ($\varphi \approx 1.618$). Our results provide new insights into the power of randomization and flexibility in online energy-aware scheduling.

Published: 2026-06-04T19:55:05Z
Comment: This paper will appear in ACM SPAA 2026 conference
Authors: Mozhengfu Liu, Samir Khuller, Xueyan Tang

http://arxiv.org/abs/2606.06439v1
Temporal matching in trees
Updated: 2026-06-04T17:41:21Z

We study maximum matching problems in temporal graphs whose underlying graph is a tree. We consider two temporal models. In a $\Delta$-matching, selected time edges sharing an endpoint must have time ticks differing by at least $\Delta$. In a $\gamma$-matching, the selected objects are blocks of $\gamma$ consecutive appearances of the same underlying edge. We also consider the related ordered static problem of $d$-distance matchings.
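The $\Delta$-matching condition stated above is easy to verify for a given selection. A minimal checker follows, assuming time edges are given as `(u, v, t)` triples; the function name `is_delta_matching` is hypothetical. Sorting the ticks at each vertex means only consecutive pairs need to be compared.

```python
from collections import defaultdict

def is_delta_matching(time_edges, delta):
    """Check the Delta-matching condition for selected time edges (u, v, t).

    Any two selected time edges sharing an endpoint must have time ticks
    differing by at least `delta`.
    """
    ticks = defaultdict(list)
    for u, v, t in time_edges:
        ticks[u].append(t)
        ticks[v].append(t)
    for ts in ticks.values():
        ts.sort()
        # After sorting, the smallest gap at a vertex is between neighbours in the list.
        if any(b - a < delta for a, b in zip(ts, ts[1:])):
            return False
    return True
```

For example, on the path a-b-c, selecting (a, b, 1) and (b, c, 3) is a 2-matching but not a 3-matching, because both time edges touch b and their ticks differ by exactly 2.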
We show that maximum $\Delta$-matching remains NP-hard on temporal trees for every $\Delta \geq 2$, even in the sparse case where each edge appears at most twice. Using a reduction between the temporal models, we obtain the analogous result for maximum $\gamma$-matching on temporal trees, even when each edge admits at most two $\gamma$-edges. We also show, via a reduction from $d$-distance matching, that maximum $\gamma$-matching is APX-hard even when the underlying graph is bipartite.
Complementing these hardness results, we identify several tractable cases. We prove that maximum $\Delta$-matching is polynomial-time solvable on temporal trees in which every edge appears exactly once, and that maximum $\gamma$-matching is polynomial-time solvable when each edge admits at most one $\gamma$-edge. We also give dynamic-programming algorithms under bounded local-use and local-sparsity assumptions, and derive polynomial-time solvability of maximum $d$-distance matching when the input bipartite graph is a tree. Finally, we prove that both maximum $\Delta$-matching and maximum $\gamma$-matching admit polynomial-time approximation schemes on temporal trees.

Published: 2026-06-04T17:41:21Z
Authors: Márk Hunor Juhász, Péter Madarasi

http://arxiv.org/abs/2606.06316v1
Quantum enhanced rare event discovery and sampling
Updated: 2026-06-04T15:54:53Z

Financial crashes, cascading failures in infrastructure, and critical errors in AI systems are frequently triggered by events that occur with extremely small probability. Efficiently discovering and sampling events with probability below a threshold is therefore of critical interest. Yet this task is highly non-trivial using existing classical or quantum methods. Being rare, such events require an immense sampling overhead to collect sufficient data samples. Moreover, because the rare events are not known in advance, they cannot be flagged for amplification using standard techniques. Here, we introduce a quantum algorithm for rare-event discovery and sampling without first learning which events are rare. The algorithm achieves the optimal quantum scaling with the rarity threshold.
We further demonstrate that this can achieve a quadratic speedup for heavy-tailed systems whose tail has nonvanishing total mass, and translates into a robust polynomial speedup for stationary stochastic processes, with the exponent determined by their entropy-rate structure.

Published: 2026-06-04T15:54:53Z
Comment: 36 pages (8+28)
Authors: Naixu Guo, Po-Wei Huang, Qisheng Wang, Jayne Thompson, Patrick Rebentrost, Mile Gu, Chengran Yang

http://arxiv.org/abs/2604.05152v2
Polynomial and Pseudopolynomial Algorithms for Two Classes of Bin Packing Instances
Updated: 2026-06-04T15:01:40Z

The Cutting Stock Problem (CSP) and Bin Packing Problem (BPP) are classical combinatorial optimization problems extensively studied since the 1960s. State-of-the-art exact algorithms are based on set-cover and arc-flow models whose linear relaxation, rounded up, matches the integer optimum for most benchmark instances, a condition known as the Integer Round-up Property (IRUP). In 2016, Delorme et al. showed that all existing instances could be solved within ten minutes by approaches exploiting this property. This motivated them to introduce two new classes, Augmented IRUP (AI) and Augmented Non-IRUP (ANI), designed to make IRUP less evident to state-of-the-art methods. Although these classes have motivated significant advances over the past decade, 13 out of 500 AI and ANI instances remain unsolved within standard time limits from the literature. In this paper, we show that while AI and ANI are particularly hard for MIP-based methods, the BPP restricted to these classes is not strongly NP-hard. We present polynomial-time algorithms for the AI class and pseudopolynomial-time algorithms for the ANI class, which solve all such instances orders of magnitude faster than previous approaches. They are also straightforward to adapt to the Skiving Stock Problem, the dual counterpart of the CSP.
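The IRUP concerns the linear relaxation of the set-cover model, which requires column generation to compute. As a much simpler illustration of how a rounded-up relaxation can certify optimality, the sketch below compares the continuous bound $\lceil \sum_i w_i / C \rceil$ (the rounded-up value of the relaxation in which items may be split) with the classical first-fit decreasing heuristic; when the two agree, the packing is provably optimal. This is neither the paper's algorithm nor the set-cover relaxation itself, the function names are hypothetical, and integer item sizes are assumed.

```python
def continuous_lower_bound(sizes, capacity):
    """Bins needed if items could be split across bins, rounded up (integer sizes)."""
    return -(-sum(sizes) // capacity)  # integer ceiling division

def first_fit_decreasing(sizes, capacity):
    """Classical FFD heuristic; returns the bins as lists of item sizes."""
    bins, loads = [], []
    for s in sorted(sizes, reverse=True):
        for i, load in enumerate(loads):
            if load + s <= capacity:
                # Place the item in the first bin with enough room.
                bins[i].append(s)
                loads[i] += s
                break
        else:
            # No existing bin fits: open a new one.
            bins.append([s])
            loads.append(s)
    return bins
```

For sizes `[6, 5, 4, 3, 2]` and capacity 10, both values equal 2, so FFD is optimal. For `[6, 6, 6]` the bound is 2 but three bins are needed, a gap that stronger relaxations such as the set-cover LP are designed to close.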
In addition, they can be used as preprocessing routines in exact methods, as their runtime is independent of the instance class, although they are guaranteed to return an optimality status only for instances belonging to the class for which they were designed.

Published: 2026-04-06T20:28:20Z
Authors: Renan Fernando Franco da Silva, Vinícius Loti de Lima, Rafael C. S. Schouery, Jean-François Côté, Manuel Iori

http://arxiv.org/abs/2606.06145v1
Workload-Aware Autotuning of Block Size in Square-Root Decomposition
Updated: 2026-06-04T13:22:11Z

The textbook choice B=sqrt(n) for square-root decomposition is asymptotically natural, but it is not always the fastest implementation choice. We study block-size autotuning as a reproducible algorithm-engineering problem and show that a learned workload model can improve over fixed sqrt(n) on the tested implementation. Under repeated grouped cross-validation, the best policy is a full-feature KNN-9 model that reduces mean regret from 1.2882 to 1.0646 and yields a paired geometric-mean speedup of 1.151x. A confidence gate retains most of that gain while reducing slowdowns. A family-free full-observation follow-up remains better than fixed blocking, which suggests that the model is learning from workload statistics rather than memorizing labels. In contrast, short-prefix variants do not produce a successful low-overhead online tuner in the current prototype. External validation is selective but supportive: Zipf-Hotspot is the strongest out-of-distribution case, and a six-window Baleen follow-up still improves over fixed blocking. Overall, block-size choice is workload aware and platform aware, and the fixed sqrt(n) rule leaves substantial performance on the table.

Published: 2026-06-04T13:22:11Z
Comment: 14 pages, 6 figures
Author: Ruize Zhao

http://arxiv.org/abs/2506.22728v2
Counting Distinct (Non-)Crossing Substrings in Optimal Time
Updated: 2026-06-04T12:55:27Z

Let $w$ be a string of length $n$.
The problem of counting factors crossing a position, Problem 64 from the textbook "125 Problems in Text Algorithms" [Crochemore, Lecroq, and Rytter, 2021], asks to count the number $\mathcal{C}(w,k)$ (resp. $\mathcal{N}(w,k)$) of distinct substrings in $w$ that have occurrences containing (resp. not containing) a position $k$ in $w$. The solutions provided in their textbook compute $\mathcal{C}(w,k)$ and $\mathcal{N}(w,k)$ in $O(n)$ time for a single position $k$ in $w$, and thus a direct application would require $O(n^2)$ time for all positions $k = 1, \ldots, n$ in $w$. Their solution is designed for constant-size alphabets. In this paper, we present new algorithms which compute $\mathcal{C}(w,k)$ in $O(n)$ total time for general ordered alphabets, and $\mathcal{N}(w,k)$ in $O(n)$ total time for linearly sortable alphabets, for all positions $k = 1, \ldots, n$ in $w$. We further derive model-dependent optimal bounds by separating the algorithms into preprocessing and linear-time postprocessing: for $\mathcal{C}$ the preprocessing is run reporting, and for $\mathcal{N}$ it is preprocessing based on longest previous non-overlapping factors (LPnF) and longest next factors (LNF). In particular, all values $\mathcal{C}(w,k)$ can be computed in $O(n\log n)$ time over general unordered alphabets in which direct accesses to alphabet characters are restricted to equality tests, and in $O(n\log\sigma)$ time in the word RAM model, where $\sigma$ denotes the number of distinct characters occurring in $w$. For $\mathcal{N}(w,k)$, the equality-testing complexity over general unordered alphabets is $\Theta(n^2)$.
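The definitions of $\mathcal{C}(w,k)$ and $\mathcal{N}(w,k)$ above can be checked against a naive reference that enumerates every distinct substring with its occurrences. This roughly cubic-time sketch is only meant for testing faster implementations on small inputs; it is not one of the paper's algorithms, and `crossing_counts` is a hypothetical name. Note that a substring with both kinds of occurrences is counted in both values.

```python
def crossing_counts(w):
    """Naive reference for C(w, k) and N(w, k) at every position k = 1..n.

    C(w, k): distinct substrings with at least one occurrence covering position k.
    N(w, k): distinct substrings with at least one occurrence avoiding position k.
    Returns two lists indexed by k - 1.
    """
    n = len(w)
    occ = {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            occ.setdefault(w[i:j], []).append(i)  # 0-based start positions
    C = [0] * n
    N = [0] * n
    for sub, starts in occ.items():
        m = len(sub)
        for p in range(n):  # p = k - 1 is the 0-based position
            if any(s <= p < s + m for s in starts):
                C[p] += 1
            if any(not (s <= p < s + m) for s in starts):
                N[p] += 1
    return C, N
```

For `w = "abc"`, position 2 is covered by `b`, `ab`, `bc`, and `abc` and avoided by `a` and `c`, so $\mathcal{C}(w,2) = 4$ and $\mathcal{N}(w,2) = 2$.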
We also show that our upper bounds are optimal for all of the aforementioned alphabet assumptions and computation models.

Published: 2025-06-28T02:20:41Z
Authors: Haruki Umezaki, Hiroki Shibata, Dominik Köppl, Yuto Nakashima, Shunsuke Inenaga, Hideo Bannai

http://arxiv.org/abs/2606.05809v1
Detecting Large Quasi-cliques on Dynamic Networks
Updated: 2026-06-04T07:39:52Z

Motivated by the problem of detecting large and cohesive groups of vertices in real networks, the task of finding large quasi-cliques has attracted considerable attention across different research areas. From a computational complexity perspective, strong inapproximability results are known for this problem, yet several heuristics have been proposed to identify large quasi-cliques in real-world networks. Recently, Pang et al. (WWW 2024) introduced a similarity-based approach that represents the current state of the art. In this work, we extend that approach to dynamic networks, thereby addressing an open problem posed by Pang et al. (WWW 2024).
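As a point of reference for what "large and cohesive" means, here is a minimal check of one common quasi-clique definition: a vertex set $S$ is a $\gamma$-quasi-clique when at least a $\gamma$ fraction of its $\binom{|S|}{2}$ vertex pairs are edges. The abstract does not state which definition Pang et al. use (degree-based variants also exist), so this edge-density version is an assumption; the function names are hypothetical and the adjacency sets are assumed symmetric.

```python
def edge_density(adj, S):
    """Fraction of the |S|(|S|-1)/2 vertex pairs inside S that are edges."""
    S = set(S)
    k = len(S)
    if k < 2:
        return 1.0
    # Each internal edge is seen from both endpoints, hence the halving.
    inside = sum(1 for u in S for v in adj[u] if v in S) // 2
    return inside / (k * (k - 1) / 2)

def is_quasi_clique(adj, S, gamma):
    """Edge-density gamma-quasi-clique test (one of several definitions in use)."""
    return edge_density(adj, S) >= gamma
```

A 4-cycle has 4 of the 6 possible edges, so it is a $0.6$-quasi-clique but not a $0.7$-quasi-clique; a dynamic algorithm must track how such quantities change as edges are inserted or deleted.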
We first present a Baseline fully dynamic algorithm where edges of the network can be both inserted and deleted. The algorithm exactly maintains the same quasi-clique returned by the algorithm of Pang et al. on the current graph, with update time $\widetilde{O}(\Delta)$, where $\Delta$ is the maximum degree. We then focus on the practically relevant incremental case, where only edge insertions are allowed, and design an algorithm with $O(\log \Delta)$ update time. This method leverages a novel technique for dynamically maintaining accurate estimates of vertex $\gamma$-degrees, a core component of the framework by Pang et al., and achieves up to $207\times$ speed-up over the Baseline while preserving comparable solution quality. Finally, we extend the approach to the fully dynamic setting, supporting both insertions and deletions, obtaining up to $21\times$ speed-up with limited and acceptable loss in quasi-clique size and density. We provide a formal analysis of our algorithms and validate them through an extensive set of experiments on real-world datasets.

Published: 2026-06-04T07:39:52Z
Authors: Luciano Gualà, Simone Pellegrini, Luca Pepè Sciarria, Alessandro Straziota

http://arxiv.org/abs/2606.05765v1
PivCo-Huffman
Updated: 2026-06-04T06:46:07Z

Huffman encoding has been an enduring technique for 70+ years, ubiquitous in compression algorithms since its invention. In this paper we propose a new approach to Huffman coding, based on a data structure from wavelet trees. The resulting pivot-coded Huffman (PivCo-Huffman) enables high-performance SIMD-friendly encoding and decoding operations. In our tests PivCo-Huffman consistently outperforms state-of-the-art Huffman codecs in decoding throughput. Additionally, we show how ANS-coding can be selectively applied to skewed nodes in this structure, yielding compression ratios approaching those of ANS-based codecs while preserving very high decompression speeds.

Published: 2026-06-04T06:46:07Z
Author: Marcin Zukowski
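For readers who want the baseline in front of them, here is the textbook Huffman construction that computes code lengths by repeatedly merging the two lightest subtrees. It is the classical algorithm, not the pivot-coded layout or the SIMD decoding scheme of PivCo-Huffman; `huffman_code_lengths` is a hypothetical name.

```python
import heapq
from collections import Counter

def huffman_code_lengths(text):
    """Return {symbol: code length} for a Huffman code built from symbol counts."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A lone symbol still needs a 1-bit code to be decodable.
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in this subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]
```

For `"aaaabbc"` this yields lengths 1, 2, and 2 for `a`, `b`, and `c`, for 10 bits in total; the lengths satisfy the Kraft equality $\sum_s 2^{-\ell_s} = 1$, as every Huffman code over two or more symbols does. The unique tie-breaker keeps `heapq` from ever comparing the dictionaries.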