https://arxiv.org/api/MVkLDLhc/ddw8VWCLh7knqdVPB42026-06-18T08:41:46Z2901319515http://arxiv.org/abs/2606.02752v1Online K-d tree for approximate neighborhood search in data streams2026-06-01T18:18:29ZThe k-Nearest Neighbors (kNN) algorithm has long been widely used in Machine Learning (ML) applications. However, the main concern when using it is the computational cost required for neighborhood search, which can make it unfeasible for large-scale applications. Optimization algorithms, such as the K-d tree, become an option in such scenarios. Under data streams, it can be challenging to maintain the properties of the K-d tree, as it requires inserting and deleting nodes on the fly. These operations can make maintaining the tree's balance and invariants difficult. Additionally, traditional K-d trees were initially designed for Minkowski-based distance functions. In this work, we describe an Online K-d tree and its adaptation to the Canberra distance that supports dynamic updates over data streams while preserving the structural invariants required for efficient traversal. Experimental analysis demonstrates that the Online K-d tree algorithm achieves faster processing time under data streams, and that adapting to the Canberra distance enabled effective subtree pruning, as evidenced by a minor loss in average accuracy and a substantial gain in instances processed per second. Our implementation can be found in our GitHub repository2026-06-01T18:18:29ZPaper accepted to the ICPRAI 2026Eduardo V. L. BarbozaRobert SabourinRafael M. O. Cruzhttp://arxiv.org/abs/2606.02492v1$O(n +f(k))$: Truly Linear FPT2026-06-01T17:01:31ZParameterized complexity has always been concerned with practical computing: by confining combinatorial explosion to a secondary parameter $k$, one can uncover why and how many NP-hard problems are effectively tackled in practice. Today, however, the scale of data has changed: scientists study Big Data, which is so large that even quadratic dependence in the total input size $n$ is unaffordable. Therefore, what constitutes a practical algorithm has also changed. Classically, parameterized complexity is blind to the difference between defining fixed parameter tractability multiplicatively (i.e. $f(k) \cdot n^c$) or additively (i.e. $f(k) + n^c$). But what if the constant $c$ is one and we require true linearity, is this distinction still inconsequential? Here, we define and explore Truly Linear FPT (TLFPT) -- that is $O(n)+f(k)$ -- and show that it is a strict subset of Linear FPT (LFPT) -- that is $O(n) \cdot f(k)$ -- via diagonalization. Populating TLFPT requires careful consideration of linear-time algorithmics and data structures. We meet many inhabitants of TLFPT: SAT, Vertex Cover, Min-Max Matching, $(n-k)$-Coloring, Diverse Pair of Matchings, $k$-Path, and $H$-Coloring. Our parameterizations are equally varied. Beyond classical parameters like solution size, we leverage two parameters, treedepth and BFS-width, which are particularly well-suited to the TLFPT regime. We do so by developing techniques based on depth- and breadth-first search. For parameterized complexity to be of service to the scientific community, we need to contend with Big Data. For sufficiently large inputs, FPT beyond linear may not suffice. Thus, there is a practical and theoretical need for more ambitious goals. TLFPT is a first step forward.2026-06-01T17:01:31Z42 pages, 5 figuresBenjamin Merlin BumpusRod DowneyTala Eagling-VoseJessica EnrightMichael R. FellowsDavid C. KutnerLaura Larios-JonesBarnaby MartinFrances RosamondElla Yateshttp://arxiv.org/abs/2606.02325v1Terminal Steiner tree problem : Complexity and Algorithms2026-06-01T14:37:12ZGiven a connected graph $G$ and a terminal set $R \subseteq V(G)$, the Steiner tree problem (ST) asks for a tree that spans all of $R$ with at most $r$ vertices from $V(G)\backslash R$, for some integer $r\geq 0$. It is known from (Garey et al.,1977 ) that ST is NP-complete. A Steiner tree in which all terminal vertices are constrained to be leaves is called a terminal Steiner tree. Our study addresses the existence of a terminal Steiner tree, its complexity across various graph classes, black-box applications of the ST, and a fixed-parameter tractable (FPT) algorithm with respect to the number of terminals.2026-06-01T14:37:12ZJyothish SSadagopan Narasimhanhttp://arxiv.org/abs/2606.02263v1Exact Sampling of Permutations with a Fixed Longest Increasing Subsequence2026-06-01T13:50:04ZWe study exact uniform sampling of permutations of length $n$ whose longest increasing subsequence (LIS) has prescribed length $k$. For $k \in Θ(n)$, we give a direct rejection sampler whose expected running time is $O(n\log\log n)$ in the word-RAM model. The sampler uses an expanded proposal space consisting of permutations together with a specified increasing subsequence, and accepts exactly those proposals whose specified subsequence is the leftmost LIS. For arbitrary $1\le k\le n$, we give an exact sampler based on the Robinson--Schensted correspondence. The algorithm samples the corresponding Plancherel-conditioned shape by computing exact completion counts via determinant identities, and then samples two uniform tableaux of that shape. The direct implementation runs in $\tilde O(n^4k^5)$ expected time. We then show that the same sampler can be implemented in expected $\tilde O(n^3k^4)$ time by evaluating a determinant oracle through Hankel moment matrices.2026-06-01T13:50:04ZPeter CliffordRaphaël Cliffordhttp://arxiv.org/abs/2606.02183v1Efficiently Listing Projected Trees, and Equivalence of Listing and Enumeration2026-06-01T12:38:26ZThe subgraph isomorphism problem and its generalizations such as conjunctive queries, where some nodes are projected, are among the most fundamental problems in graph algorithms and database theory. In this paper, we study the listing and enumeration variants of these problems and present two main results.
(1) We present the first algorithms for enumerating projected trees with polynomial preprocessing time ($\widetilde{O}(n^{17.42})$) and polylogarithmic delay ($\mathrm{polylog}(n)$). Prior to this work, all algorithms in the literature required time $Ω(n^{Ω(k)} + t)$ or $t \cdot n^{Ω(1)}$ to list all copies of a $k$-node tree with projections, where $t$ is the number of solutions. Our result generalizes to arbitrary projected hypergraphs, achieving enumeration in preprocessing time $\widetilde{O}(m^{17.42 \cdot \mathrm{subw}(H)})$ and polylogarithmic delay, where $\mathrm{subw}(H)$ is the submodular width of the pattern hypergraph $H$. We heavily rely on fast (rectangular and output-sensitive) matrix multiplication, which we complement by fine-grained lower bounds indicating that any algorithm beating time $Ω(n^{Ω(k)} + t)$ must rely on fast matrix multiplication.
(2) As our second main result, we present a generic enumeration-to-listing reduction, establishing that listing and enumeration are equivalent under natural assumptions. For (colored) subgraph isomorphism, our reduction transforms any listing algorithm running in time $O(f(n,m) + t \cdot g(n,m))$ into an enumeration algorithm with preprocessing time $O((f(n,m)+g(n,m)+m) \log^2 n)$ and delay $O(g(n,m))$. We utilize this equivalence as a tool for proving our first main result, and we expect that our generic reduction will find many future applications.2026-06-01T12:38:26ZKarl BringmannNick FischerYanheng Wanghttp://arxiv.org/abs/2606.02029v1The Completion-Threshold Framework for Obligatory-Test Scheduling on Multiple Machines2026-06-01T10:17:47ZWe study online scheduling with obligatory testing on $m$ identical machines with the objective of minimizing the sum of completion times. In this model, every job must undergo a test before its actual processing time is revealed. Consequently, the central algorithmic challenge is no longer whether to acquire information, but how to optimally balance machine capacity between revealing unknown jobs and processing currently known ones. While this tradeoff becomes structurally richer in the multiple-machine setting, the only prior explicit deterministic lower bound for this objective was $\sqrt{2}$, established strictly for a single machine in 2024 by Dogeas et al. [ESA 2024: 48:1-48:14]. Our core conceptual contribution is demonstrating that completion-threshold quantities, denoted $T_X$, serve as the fundamental analytical metric for this setting. Because every completed job must first pass through the testing phase, delayed revelation inherently forces delayed completion. By bounding these $T_X$ thresholds, we systematically derive strong lower bounds on the total completion time. Utilizing this framework, we establish the first substantial deterministic lower bounds for multiple machines, including a three-type bound of $1.4811$ and a multi-type dyadic construction that asymptotically approaches $3/2$. Finally, we complement these theoretical limits with a deterministic $2$-competitive list-scheduling algorithm for arbitrary test times.2026-06-01T10:17:47ZKao-Chuan LiangYa-Chun Lianghttp://arxiv.org/abs/2411.12438v2Dimension Reduction via Sum-of-Squares and Improved Clustering Algorithms for Non-Spherical Mixtures2026-06-01T08:02:06ZWe develop a new approach for clustering non-spherical (i.e., arbitrary component covariances) Gaussian mixture models via a subroutine, based on the sum-of-squares method, that finds a low-dimensional separation-preserving projection of the input data. Our method gives a non-spherical analog of the classical dimension reduction, based on singular value decomposition, that, among several other applications, forms a key component of the celebrated spherical clustering algorithm of Vempala and Wang [VW04].
As applications, we obtain an algorithm to (1) cluster an arbitrary total-variation separated mixture of $k$ centered (i.e., zero-mean) Gaussians with $n\geq \operatorname{poly}(d) f(w_{\min}^{-1})$ samples and $\operatorname{poly}(n)$ time, and (2) cluster an arbitrary total-variation separated mixture of $k$ Gaussians with identical but arbitrary unknown covariance with $n \geq d^{O(\log w_{\min}^{-1})} f(w_{\min}^{-1})$ samples and $n^{O(\log w_{\min}^{-1})}$ time. Here, $w_{\min}$ is the minimum mixing weight of the input mixture, and $f$ does not depend on the dimension $d$. Our algorithms naturally extend to tolerating a dimension-independent fraction of arbitrary outliers. Before this work, the techniques in the state-of-the-art non-spherical clustering algorithms needed $d^{O(k)} f(w_{\min}^{-1})$ samples and time for clustering such mixtures.
Our results may come as a surprise in the context of the $d^{Ω(k)}$ statistical query and sum-of-squares lower bounds [DKS17, DKPP24] for clustering non-spherical Gaussian mixtures. While these results are usually thought to rule out $d^{o(k)}$ cost algorithms for the problem, our results show that the lower bounds can in fact be circumvented for a remarkably general class of Gaussian mixtures.2024-11-19T11:58:51Z67 pages, updated to match camera-ready version at COLT 2026Prashanti AndersonMitali BafnaRares-Darius BuhaiPravesh K. KothariDavid Steurerhttp://arxiv.org/abs/2606.01809v1A Near-Optimal Offline Algorithm for Dynamic All-Pairs Shortest Paths in Planar Digraphs2026-06-01T07:27:10ZIn the planar, dynamic All-Pairs Shortest Paths (APSP) problem, a planar, weighted digraph $G$ undergoes a sequence of edge weight updates and the goal is to maintain a data structure on $G$, that can quickly answer distance queries between any two vertices $x,y \in V(G)$.
The currently best algorithms for this problem require $\tilde{O}(n^{2/3})$ worst-case update and query time, while conditional lower bounds show that either update or query time $n^{0.5-δ}$ is needed for any constant $δ> 0$.
In this article, we present the first algorithm with near-optimal $\tilde{O}(\sqrt{n})$ worst-case update and query time for the offline setting, where the update sequence is given initially. This result is obtained by giving the first offline dynamic algorithm for maintaining dense distance graphs (DDGs) faster than recomputing from scratch after each update.
Further, we also present an \emph{online} algorithm for the incremental APSP problem with $\tilde{O}(\sqrt{n})$ worst-case update/ query time. This allows us to reduce the online dynamic APSP problem to the online decremental APSP problem, which constitutes partial progress even for the online version of this notorious problem.2026-06-01T07:27:10ZAppeared in SODA'2022Debarati DasMaximilian Probst GutenbergChristian Wulff-Nilsenhttp://arxiv.org/abs/2606.01693v1Scalable Concurrent Queues for GPU2026-06-01T04:57:32ZConcurrent queues can significantly impact supercomputing performance by being critical bottlenecks for task distribution, load balancing, and resource utilization. As HPC systems move beyond 10-million processor cores, the ability to rapidly move items between producer and consumer threads without excessive locking is essential for efficient queues, preventing idle cores, maximizing utilization, and achieving high parallel speedup. While concurrent queues are well studied on CPUs, they remain largely unexplored on modern GPUs, where SIMT execution, massive parallelism, and atomic contention reshape the design space. We present three linearizable GPU concurrent queues spanning from lock-free to wait-free guarantees: (1) G-WFQ-YMC, an adaptation of Yang and Mellor-Crummey's wait-free queue using preallocated segments; (2) G-LFQ, a bounded lock-free queue that uses wave-batched fast paths to maximize throughput; and (3) G-WFQ, a bounded wait-free queue that packs shared state into 64-bit compare-and-swap operations while preserving linearizability and bounded memory.2026-06-01T04:57:32Z10 pages, 5 figuresPratheek Prakash ShettyThomas R. W. ScoglandWu-chun Fenghttp://arxiv.org/abs/2601.21237v2Characterizing the Effect of Noise in Language Generation in the Limit2026-06-01T03:47:51ZKleinberg and Mullainathan recently proposed a formal framework for studying the phenomenon of language generation, called language generation in the limit. In this model, an adversary gives an enumeration of example strings from an unknown target language, and the algorithm is tasked with correctly generating unseen strings from the target language within finite time. Refined notions of non-uniform and uniform generation were later introduced by Li, Raman, and Tewari (2025), and a noisy model was introduced by Raman and Raman (2025), which allows the adversary to insert extraneous strings. A natural question in the noisy model is to quantify the effect of noise, by studying the impact of each additional extraneous string. We show two complementary results in this setting. We first show that for both uniform and non-uniform generation, a single noisy string strictly reduces the set of collections that can be generated, thus answering an open question in Raman and Raman (2025). Then, we show for both uniform and non-uniform generation that generation with a single noisy string is equivalent to generation with any finite amount of noise, sharply contrasting with the strict hierarchy for noisy generation in the limit shown by Bai, Panigrahi, and Zhang (2026). Finally, we leverage our previous results to provide the first known characterization for non-uniform noise-dependent generatability.2026-01-29T03:58:40ZICML 2026Aaron LiIan Zhanghttp://arxiv.org/abs/2601.18115v2Robust Learning of a Group DRO Neuron2026-05-31T19:44:09ZWe study the problem of learning a single neuron under standard squared loss in the presence of arbitrary label noise and group-level distributional shifts, for a broad family of covariate distributions. Our goal is to identify a ''best-fit'' neuron parameterized by $\mathbf{w}_*$ that performs well under the most challenging reweighting of the groups. Specifically, we address a Group Distributionally Robust Optimization problem: given sample access to $K$ distinct distributions $\mathcal p_{[1]},\dots,\mathcal p_{[K]}$, we seek to approximate $\mathbf{w}_*$ that minimizes the worst-case objective over convex combinations of group distributions $\boldsymbolλ \in Δ_K$, where the objective is $\sum_{i \in [K]}λ_{[i]}\,\mathbb E_{(\mathbf x,y)\sim\mathcal p_{[i]}}(σ(\mathbf w\cdot\mathbf x)-y)^2 - νd_f(\boldsymbolλ,\frac{1}{K}\mathbf1)$ and $d_f$ is an $f$-divergence that imposes (optional) penalty on deviations from uniform group weights, scaled by a parameter $ν\geq 0$. We develop a computationally efficient primal-dual algorithm that outputs a vector $\widehat{\mathbf w}$ that is constant-factor competitive with $\mathbf{w}_*$ under the worst-case group weighting. Our analytical framework directly confronts the inherent nonconvexity of the loss function, providing robust learning guarantees in the face of arbitrary label corruptions and group-specific distributional shifts. The implementation of the dual extrapolation update motivated by our algorithmic framework shows promise on LLM pre-training benchmarks.2026-01-26T04:00:53ZGuyang CaoShuyao LiSushrut KarmalkarJelena Diakonikolashttp://arxiv.org/abs/2411.15363v4The Polymatroid Representation of a Greedoid, and Associated Galois Connections2026-05-31T19:37:15ZA greedoid is a generalization of a matroid allowing for more flexible analyses and modeling of combinatorial optimization problems. However, these structures decimate many matroid properties contributing to their pervasive nature. A polymatroid greedoid [KL85] presents an interesting middle ground, so we further develop this class. First we prove every local poset greedoid for which the greedy algorithm correctly solves linear optimizations over its basic words must have a polymatroid representation. For this, we use relationships between the lattices of greedoid flats and closed sets of a polymatroid to generalize concepts in [KL85]. Then, we show our generalization is defined by a Galois connection between the greedoid flats and closed sets of a representation. Finally, we apply this duality to identify a subclass of polymatroid greedoids with favorable properties, which we call strong polymatroid greedoids. As technical tools for our analyses, we introduce optimism and the Forking Lemma for interval greedoids. Both are pervasive in our work, and are of independent interest.2024-11-22T22:12:35Z38 pages, 8 figures, 4 appendices. In v1 there is an error in the proof of the main claim of an alternative description of polymatroid greedoids. This was noted in v2, and corrected in v3 by changing the main results to remove errors. In v4, some proofs were simplified by removing technical lemmas, and other minor improvements were madeRobert P. StreitVijay K. Garghttp://arxiv.org/abs/2606.01333v1Adversarial Configurations for the ReCom Transition Function2026-05-31T16:35:51ZReCom is a leading Markov Chain Monte Carlo algorithm for sampling balanced graph partitions in computational redistricting. At each step, its transition function proposes a new partition by merging two adjacent districts and if possible re-splitting the conjoined region. The transition function is efficient in practice, however, it is unknown whether it is guaranteed to run in polynomial time. In this report we exhibit an explicit family of 3-partitions on planar square grid graphs from which ReCom requires an exponentially large expected number of steps to re-split the graph (even if we admit approximately balanced splits), showing that in the worst case ReCom does not run in polynomial time. Notably, this result implies that ReCom is not technically rapidly mixing (if started from an adversarial configuration, ReCom requires exponential many steps to reach the stationary distribution).2026-05-31T16:35:51ZMicah Goldhttp://arxiv.org/abs/2606.01330v1On Thin Perfect Matchings up to Polylogarithmic Factors2026-05-31T16:31:52ZWe resolve the thin matching problem proposed by Anari, Charikar and Ramakrishnan [ACR23] up to polylogarithmic factors. Given a fractional perfect matching $x$, we say a perfect matching $M$ is $α$-thin w.r.t. $x$ if for any cut $(S,\overline{S})$, we have $$ |M \cap E(S,\overline{S})| \leq α\cdot x(S,\overline{S}).$$ [ACR23] conjectured that for any fractional perfect matching $x$, there exists a perfect matching $M$ which is $O(1)$-thin w.r.t. $x$.
First, we show that if $M$ is restricted to be in the support of $x$, then $α\geq Ω(n)$ and we complement this by designing an efficient algorithm that outputs an $O(n\log n)$-thin perfect matching where $n$ is the number of vertices.
Then, we relax this constraint and show that for any fractional perfect matching $x$, there is a perfect matching $M$ (which is not necessarily in the support of $x$) such that $M$ is $\text{polylog}(n)$-thin w.r.t. $x$. All results work for both bipartite and non-bipartite graphs. We also discuss applications to the metric distortion problem.2026-05-31T16:31:52ZAlireza HaqiShayan Oveis Gharanhttp://arxiv.org/abs/2606.01309v1Multiagent Matroid Upgrading: Greedy is Fair and Efficient2026-05-31T15:57:38ZThis paper introduces a general multiagent matroid upgrading problem that models a broad class of real-world resource allocation tasks. In this setting, there are multiple agents and a ground set of elements, where each element is assigned to a specific agent and has two associated costs: a default cost and a reduced (upgraded) cost. Upgrading an element lowers its cost to the upgraded value, while non-upgraded elements retain their default costs. Each agent is associated with its own matroid, with the goal of finding a minimum-cost basis. The central task is to select at most k elements to upgrade so as to minimize a non-decreasing convex function over the agents' minimum basis costs, capturing both efficiency and fairness objectives in multiagent systems.2026-05-31T15:57:38ZAppeared in AAMAS 2026Qingwen MaChao PengChangfeng XuChenyang XuRuilong Zhang