http://arxiv.org/abs/2605.04826v1
Faster Algorithms for Shortest Unique or Absent Substrings
Updated: 2026-05-06T12:18:33Z
We revisit two well-known algorithmic problems on strings: computing a shortest unique substring (SUS) and a shortest absent substring (SAS) of a string $S$ of length $n$. Both problems admit folklore $\mathcal{O}(n)$-time solutions using the suffix tree of $S$. However, for small alphabets, this complexity is not necessarily optimal in the word RAM model, where a string of length $n$ over alphabet $[0,σ)$ can be stored in $\mathcal{O}(n \log σ/\log n)$ space and read in $\mathcal{O}(n \log σ/\log n)$ time.
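To make the two problems concrete, here is a minimal brute-force sketch in Python. It is not the suffix-tree solution or the algorithm of the paper; it only illustrates the definitions. The function names, the leftmost and lexicographic tie-breaking, and the explicit `alphabet` argument are assumptions made for this illustration.

```python
from itertools import product


def shortest_unique_substring(s):
    """Return the leftmost shortest substring that occurs exactly once in s.

    Brute force: for each length, count every substring of that length
    (overlapping occurrences included). Returns None for the empty string.
    """
    n = len(s)
    for length in range(1, n + 1):
        counts = {}
        for i in range(n - length + 1):
            sub = s[i:i + length]
            counts[sub] = counts.get(sub, 0) + 1
        for i in range(n - length + 1):
            if counts[s[i:i + length]] == 1:
                return s[i:i + length]
    return None


def shortest_absent_substring(s, alphabet):
    """Return the lexicographically smallest shortest string over `alphabet`
    that does not occur in s (brute force over all candidate strings)."""
    letters = sorted(alphabet)
    length = 1
    while True:
        present = {s[i:i + length] for i in range(len(s) - length + 1)}
        for cand in product(letters, repeat=length):
            word = "".join(cand)
            if word not in present:
                return word
        length += 1
```

For `S = "abaab"` over the alphabet `{a, b}`, both letters repeat, so the shortest unique substring has length 2 (`"ba"` is the leftmost one) and the shortest absent substring is `"bb"`.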
We present an $\mathcal{O}(n \log σ/\sqrt{\log n})$-time algorithm for computing a SUS of $S$. This algorithm decomposes the problem according to the length and the period of the sought substring and uses several tools and techniques, such as synchronizing sets, the analysis of runs, and wavelet trees, to reduce the computation of a SUS to a simple geometric problem. Further, we adapt this algorithm and combine it with an efficient construction of de Bruijn sequences in order to obtain an $\mathcal{O}(n \log σ/\sqrt{\log n})$-time algorithm for computing a SAS of $S$.
Published: 2026-05-06T12:18:33Z
Comment: SWAT 2026
Authors: Panagiotis Charalampopoulos, Manal Mohamed, Solon P. Pissis, Hilde Verbeek, Wiktor Zuba

http://arxiv.org/abs/2603.26214v2
Optimal b-Colourings and Fall Colourings in $H$-Free Graphs
Updated: 2026-05-06T11:57:15Z
In a colouring of a graph, a vertex is b-chromatic if it is adjacent to a vertex of every other colour. We consider four well-studied colouring problems: b-Chromatic Number, Tight b-Chromatic Number, Fall Chromatic Number and Fall Achromatic Number, which fit into a framework based on whether every colour class has (i) at least one b-chromatic vertex, (ii) exactly one b-chromatic vertex, or (iii) all of its vertices being b-chromatic. By combining known and new results, we fully classify the computational complexity of b-Chromatic Number, Fall Chromatic Number and Fall Achromatic Number in $H$-free graphs. For Tight b-Chromatic Number in $H$-free graphs, we develop a general technique to determine new graphs $H$, for which the problem is polynomial-time solvable, and we also determine new graphs $H$, for which the problem is still NP-complete.
We show, for the first time, the existence of a graph $H$ such that in $H$-free graphs, b-Chromatic Number is NP-hard, while Tight b-Chromatic Number is polynomial-time solvable.
Published: 2026-03-27T09:37:31Z
Authors: Jungho Ahn, Tala Eagling-Vose, Felicia Lucke, David Manlove, Fabricio Mendoza, Daniël Paulusma

http://arxiv.org/abs/2605.04798v1
Online Orthogonal Vectors Revisited
Updated: 2026-05-06T11:48:10Z
We prove new upper and lower bounds for the Online Orthogonal Vectors Problem ($\mathsf{OnlineOV}_{n,d}$). In this problem, a preprocessing algorithm receives $n$ vectors $x_1,\ldots,x_n\in\{0,1\}^d$ and constructs a data structure of size $S$. A query algorithm subsequently receives a query vector $q\in\{0,1\}^d$ and in time $T$ decides whether $q$ is orthogonal to any of the input vectors $x_i$.
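As a baseline for the preprocessing/query split described above, the following Python sketch implements the trivial data structure: it stores the $n$ vectors as bitmasks and answers a query by a linear scan. It is not the data structure of the paper; the function names `preprocess` and `query` are hypothetical and chosen only for this illustration.

```python
def preprocess(vectors):
    """Pack each 0/1 vector into one integer bitmask.

    This is the trivial 'data structure': it stores the n input vectors
    unchanged, one machine-friendly integer per vector.
    """
    return [sum(bit << k for k, bit in enumerate(v)) for v in vectors]


def query(masks, q):
    """Return True if q is orthogonal to at least one stored vector.

    Two 0/1 vectors are orthogonal exactly when they share no coordinate
    in which both are 1, i.e. when the bitwise AND of their masks is zero.
    """
    qmask = sum(bit << k for k, bit in enumerate(q))
    return any((m & qmask) == 0 for m in masks)
```

With `masks = preprocess([[1, 0, 1], [0, 1, 1]])`, the query `[0, 1, 0]` is orthogonal to the first vector, while `[0, 0, 1]` is orthogonal to neither. The scan inspects all $n$ stored vectors per query, which is the linear query time that the bounds in the abstract are measured against.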
We design a new deterministic data structure for $\mathsf{OnlineOV}_{n,d}$. In low dimensions ($d = c \log n$), our data structure matches the performance of the best known randomized algorithm due to Chan [SoCG 2017]. Furthermore, in moderate dimensions ($d=n^{\varepsilon}$), we give the first improvement since Charikar, Indyk and Panigrahy [ICALP 2002]. Along the way, we give the first deterministic refutation of a conjecture on the hardness of $\mathsf{OnlineOV}$ posed by Goldstein, Lewenstein and Porat [ISAAC 2017]. This data structure also extends to a number of problems, including Partial Match, Orthogonal Range Search, and DNF Evaluation. We use a novel structure-versus-randomness decomposition to design our algorithm.
Under the Non-Uniform Strong Exponential Time Hypothesis, we also prove arbitrarily large polynomial space lower bounds for any $\mathsf{OnlineOV}$ data structure with sublinear query time even with computationally unbounded preprocessing. These lower bounds extend to several other problems, including Polynomial Evaluation, Partial Match, Orthogonal Range Search, and Approximate Nearest Neighbors. We also prove similar lower bounds for $\mathsf{3-SUM}$ with preprocessing under the Non-Uniform Hamiltonian Path Conjecture.
Published: 2026-05-06T11:48:10Z
Authors: Karthik Gajulapalli, Alexander Golovnev, Samuel King, Sidhant Saraogi

http://arxiv.org/abs/2604.17546v2
Homogeneous Network Caching is Fixed-Parameter Tractable Parameterized by the Number of Caches
Updated: 2026-05-06T09:12:54Z
Network caching asks how to place contents in distributed caches so that future requests are served close to their users. Ganian, Mc Inerney and Tsigkari recently initiated the parameterized-complexity study of the problem and, for the homogeneous unit-size variant (HomNC), isolated an unresolved family of six parameterizations: by the number of caches $C$, the number of users $U$, $U+K$, $C+U$, $C+λ$, and the vertex-cover number $\text{vc}(G)$, where $K$ is the maximum cache capacity and $λ$ is the maximum number of contents requested with nonzero probability by any user. Their interreducibility theorem showed that these six cases stand or fall together under parameterized reductions, and they conjectured the family to be W[1]-hard. We resolve this conjecture in the opposite direction. We prove that HomNC is fixed-parameter tractable parameterized by $C$ alone, and therefore fixed-parameter tractable for all six parameterizations. Our algorithm is based on an exact $n$-fold integer programming formulation that reveals a nontrivial block structure in homogeneous network caching, with the repeated part depending only on $C$.
Standard algorithms for $n$-fold integer programming then yield a running time of the form $f(C)\lvert I\rvert^{O(1)}$.
Published: 2026-04-19T17:20:06Z
Authors: József Pintér, Regina Stangl

http://arxiv.org/abs/2507.10467v3
Colorful Minors
Updated: 2026-05-06T08:10:07Z
We introduce the notion of colorful minors, which generalizes the classical concept of rooted minors in graphs. A $q$-colorful graph is defined as a pair $(G, χ),$ where $G$ is a graph and $χ$ assigns to each vertex a (possibly empty) subset of at most $q$ colors. The colorful minor relation enhances the classical minor relation by merging color sets at contracted edges and allowing the removal of colors from vertices. This framework naturally models algorithmic problems involving graphs with (possibly overlapping) annotated vertex sets. We develop a structural theory for colorful minors by establishing three core theorems characterizing $\mathcal{H}$-colorful minor-free graphs, where $\mathcal{H}$ consists either of a clique or a grid with all vertices assigned all colors, or of grids with colors segregated and ordered on the outer face. Our results reveal that when exclusion is imposed not only on graphs but also on the way colors are distributed in them, a more refined structural landscape appears.
Leveraging our structural insights, we provide a complete classification -- parameterized by the number $q$ of colors -- of all colorful graphs that exhibit the Erdős-Pósa property with respect to colorful minors. On the algorithmic side, we deduce that colorful minor testing is fixed-parameter tractable. Together with the fact that the colorful minor relation forms a well-quasi-order, this implies that every colorful minor-monotone parameter on colorful graphs admits a fixed-parameter algorithm. Furthermore, we derive two algorithmic meta-theorems (AMTs) whose structural conditions are linked to extensions of treewidth and Hadwiger number on colorful graphs. Our results suggest how known AMTs can be extended to incorporate not only the structure of the input graph but also the way the colored vertices are distributed in it.
Published: 2025-07-14T16:46:29Z
Authors: Evangelos Protopapas, Dimitrios M. Thilikos, Sebastian Wiederrecht

http://arxiv.org/abs/2512.04614v2
On Tight FPT Time Approximation Algorithms for k-Clustering Problems
Updated: 2026-05-06T05:26:58Z
Following recent advances in combining approximation algorithms with fixed-parameter tractability (FPT), we study FPT-time approximation algorithms for minimum-norm $k$-clustering problems, parameterized by the number $k$ of open facilities.
For the capacitated setting, we give a tight $(3+ε)$-approximation for the general-norm capacitated $k$-clustering problem in FPT-time parameterized by $k$ and $ε$. Prior to our work, such a result was only known for the capacitated $k$-median problem [CL, ICALP, 2019]. As a special case, our result yields an FPT-time $3$-approximation for capacitated $k$-center. The problem has not been studied in the FPT-time setting, with the previous best known polynomial-time approximation ratio being 9 [ABCG, MP, 2015].
In the uncapacitated setting, we consider the $top$-$cn$ norm $k$-clustering problem, where the goal of the problem is to minimize the $top$-$cn$ norm of the connection distance vector. Our main result is a tight $\big(1 + \frac 2{ec} + ε\big)$-approximation algorithm for the problem with $c \in \big(\frac1e, 1\big]$. (For the case $c \leq \frac1e$, there is a simple tight $(3+ε)$-approximation.) Our framework can be easily extended to give a tight $\left(3, 1+\frac2e + ε\right)$-bicriteria approximation for the ($k$-center, $k$-median) problem in FPT time, improving the previous best polynomial-time $(4, 8)$ guarantee [AB, WAOA, 2017].
All results are based on a unified framework: computing a $(1+ε)$-approximate solution using $O\left(\frac{k\log n}ε\right)$ facilities $S$ via LP rounding, sampling a few client representatives $R$ based on the solution $S$, guessing a few pivots from $S \cup R$ and some radius information on the pivots, and solving the problem using the guesses. We believe this framework can lead to further results on $k$-clustering problems.
Published: 2025-12-04T09:42:15Z
Comment: 35 pages, 1 figures; accepted to ICALP 2026
Authors: Han Dai, Shi Li, Sijin Peng

http://arxiv.org/abs/2501.16039v4
Complexity of Constructing Minimal Faithful Permutation Representations for Fitting-free Groups
Updated: 2026-05-06T03:54:25Z
In this paper, we investigate the complexity of computing minimal faithful permutation representations for groups without abelian normal subgroups (a.k.a. Fitting-free groups). When our groups are given as quotients of permutation groups, we exhibit a polynomial-time algorithm for constructing such representations. Furthermore, in the setting of permutation groups, we obtain an $\textsf{NC}$ procedure for computing the minimal faithful permutation degree, and a randomized $\textsf{NC}$ ($\textsf{RNC}$) algorithm for computing a minimal faithful permutation representation. This improves upon the work of Das and Thakkar (STOC 2024, SIAM J. Comput. 2026), who established a Las Vegas polynomial-time algorithm for computing the minimal faithful permutation degree for this class in the setting of permutation groups.
Published: 2025-01-27T13:34:52Z
Comment: In [v3], we computed the minimal faithful permutation degree. For this new version [v4], we also compute a minimal faithful permutation representation.
Version [v3] corresponds to our FCT 2025 paper
Authors: Michael Levet, Pranjal Srivastava, Dhara Thakkar

http://arxiv.org/abs/2106.05513v2
Deterministic Mincut in Almost-Linear Time
Updated: 2026-05-06T03:16:36Z
We present a deterministic (global) mincut algorithm for weighted, undirected graphs that runs in $m^{1+o(1)}$ time, answering an open question of Karger from the 1990s. To obtain our result, we de-randomize the construction of the \emph{skeleton} graph in Karger's near-linear time mincut algorithm, which is its only randomized component. In particular, we partially de-randomize the well-known Benczur-Karger graph sparsification technique by random sampling, which we accomplish by the method of pessimistic estimators. Our main technical component is designing an efficient pessimistic estimator to capture the cuts of a graph, which involves harnessing the expander decomposition framework introduced in recent work by Goranci et al. (SODA 2021). As a side-effect, we obtain a structural representation of all approximate mincuts in a graph, which may have future applications.
Published: 2021-06-10T06:01:25Z
Comment: STOC 2021, 31 pages. Fix technical error in Theorem 1.5 resulting in an $ε^{-7}$ term instead of $ε^{-4}$. Also fix formatting throughout the paper
Authors: Jason Li

http://arxiv.org/abs/2605.04428v1
Submodular Ground-Set Pruning: Monotone Tightness and a Non-Monotone Separation
Updated: 2026-05-06T02:43:13Z
Large-scale subset selection asks for a small useful set of examples, features, sensors, seed users, or context passages from an enormous ground set. Submodular maximization is a canonical model for such diminishing-returns problems, but rapidly growing datasets make even linear-time algorithms ever costlier. We study \emph{containment pruning}: first reduce the ground set to a smaller core $P$, then require that $P$ contain a near-optimal feasible solution for every downstream budget up to~$k$. Prior work has formulated many heuristics, but the theoretical limits of this preprocessing problem are largely unknown.
For monotone submodular objectives, we prove that $1-1/e$ is tight: greedy achieves this containment factor, and no algorithm can beat it even with a larger pruning budget. For non-monotone objectives, we give the first $1/2-\varepsilon$ containment algorithms under cardinality constraints and extend the approach to knapsack constraints. This $1/2$ factor exceeds the best known algorithmic ratio and the known hardness threshold for non-monotone maximization, showing that pruning can be provably easier than optimization. Empirically, pruning lets an exact IP solver run on the reduced MaxCut instance with a ${\approx}620\times$ speedup, and proof-of-concept experiments on LLM context selection demonstrate the utility of non-monotone submodular proxies and our proposed containment algorithms.
Published: 2026-05-06T02:43:13Z
Comment: 39 pages, 0 figures
Authors: Alan Kuhnle

http://arxiv.org/abs/2603.11379v2
Induced Minors and Coarse Tree Decompositions
Updated: 2026-05-06T01:07:43Z
Let $G$ be a graph, $S \subseteq V(G)$ be a vertex set in $G$ and $r$ be a positive integer. The distance $r$-independence number of $S$ is the size of the largest subset $I \subseteq S$ such that no pair $u$, $v$ of vertices in $I$ have a path on at most $r$ edges between them in $G$. It has been conjectured [Chudnovsky et al., arXiv, 2025] that for every positive integer $t$ there exist positive integers $c$, $d$ such that every graph $G$ that excludes both the complete bipartite graph $K_{t,t}$ and the grid $\boxplus_t$ as an induced minor has a tree decomposition in which every bag has (distance $1$) independence number at most $c(\log n)^d$. We prove a weaker version of this conjecture where every bag of the tree decomposition has distance $16(\log n + 1)$-independence number at most $c(\log n)^d$.
On the way we also prove a version of the conjecture where every bag of the decomposition has distance $8$-independence number at most $2^{c (\log n)^{1-(1/d)}}$.
Published: 2026-03-11T23:46:09Z
Authors: Maria Chudnovsky, Julien Codsi, Ajaykrishnan E S, Daniel Lokshtanov

http://arxiv.org/abs/2506.14734v6
Compressing Suffix Trees by Path Decompositions
Updated: 2026-05-05T22:17:18Z
The suffix tree is arguably the most fundamental data structure on strings: introduced by Weiner (SWAT 1973) and McCreight (JACM 1976), it allows solving a myriad of computational problems on strings in linear time. Motivated by its large space usage, subsequent research focused first on reducing its size by a constant factor via Suffix Arrays, and later on reaching space proportional to the size of the compressed string. Modern compressed indexes, such as the $r$-index (Gagie et al., SODA 2018), fit in space proportional to $r$, the number of runs in the Burrows-Wheeler transform (a strong and universal repetitiveness measure). These advances, however, came with a price: while modern compressed indexes boast optimal bounds in the RAM model, they are often orders of magnitude slower than uncompressed counterparts in practice due to catastrophic cache locality. This reality gap highlights that Big-O complexity in the RAM model has become a misleading predictor of real-world performance, leaving a critical question unanswered: can we design compressed indexes that are efficient in the I/O model of computation?
We answer this in the affirmative by introducing a new Suffix Array sampling technique based on particular path decompositions of the suffix tree. We prove that sorting the suffix tree leaves by specific priority functions induces a decomposition where the number of distinct paths (each corresponding to a string suffix) is bounded by $r$. This allows us to solve indexed pattern matching efficiently in the I/O model using a Suffix Array sample of size at most $r$, strictly improving upon the (tight) $2r$ bound of Suffixient Arrays, another recent compressed Suffix Array sampling technique.
Published: 2025-06-17T17:15:08Z
Comment: accepted at ICALP 2026
Authors: Ruben Becker, Davide Cenzato, Travis Gagie, Sung-Hwan Kim, Ragnar Groot Koerkamp, Giovanni Manzini, Nicola Prezza

http://arxiv.org/abs/2501.12549v2
An O(log n)-Approximation Algorithm for (p,q)-Flexible Graph Connectivity via Independent Rounding
Updated: 2026-05-05T22:12:38Z
In the Flexible Graph Connectivity (FGC) problem, we are given an undirected multigraph on $n$ vertices with nonnegative edge costs, where each edge is classified as either safe or unsafe. Given integer parameters $p$ and $q$, the goal in $(p,q)$-FGC is to purchase a minimum-cost set of edges such that the resulting spanning subgraph remains $p$-edge-connected after the removal of any set of up to $q$ unsafe edges.
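The feasibility condition in the $(p,q)$-FGC definition can be checked cut by cut: an adversary hurts a given cut most by deleting up to $q$ of its unsafe edges, so a purchased edge set is feasible exactly when every cut keeps at least $p$ edges after losing $\min(q, \text{unsafe})$ of them. The Python sketch below is an exponential-time check for tiny instances only, and is unrelated to the LP-based algorithm of the paper. The function name `is_fgc_feasible` and the `(u, v, is_safe)` edge encoding are assumptions made for this illustration.

```python
from itertools import combinations


def is_fgc_feasible(n, edges, p, q):
    """Check (p, q)-FGC feasibility of a multigraph by enumerating all cuts.

    n     -- number of vertices, labelled 0 .. n-1 (assumes n >= 2)
    edges -- list of (u, v, is_safe) triples; parallel edges are allowed
    The edge set is feasible iff every cut satisfies
        safe + max(0, unsafe - q) >= p,
    where safe/unsafe count the safe/unsafe edges crossing the cut.
    """
    # Enumerating one side of size <= n // 2 covers every cut, because a
    # vertex set and its complement define the same cut.
    for size in range(1, n // 2 + 1):
        for side in combinations(range(n), size):
            inside = set(side)
            safe = unsafe = 0
            for u, v, is_safe in edges:
                if (u in inside) != (v in inside):  # edge crosses the cut
                    if is_safe:
                        safe += 1
                    else:
                        unsafe += 1
            if safe + max(0, unsafe - q) < p:
                return False
    return True
```

For example, a 4-cycle of unsafe edges is feasible for $(p,q) = (1,1)$, since it stays connected after losing any single edge, but not for $(1,2)$, since deleting both edges at one vertex isolates it.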
Our main contribution is an $O(\log n)$-approximation algorithm based on independent rounding, improving the previous best approximation ratio of $O(q \log n)$. Central to our approach is a new linear programming formulation of feasible solutions that encodes knapsack cover inequalities as cut-capacity constraints. Unlike prior work, the capacity of an edge in a cut may depend on the partially purchased solution for this cut. We show that the resulting linear program admits a polynomial-time separation oracle. Scaling the fractional solution by $Θ(\log n)$ and applying independent rounding yields a feasible integral solution with constant probability; here, we leverage the knapsack cover inequalities to obtain strong concentration bounds for the rounded solution relative to any given partial solution. A key ingredient in both separation and rounding is the use of Karger's bound on the number of near-minimum cuts.
We also extend the $(p,q)$-FGC problem to model more than two safety tiers and show that our results and techniques extend naturally to this setting, albeit with increased approximation ratios and running times that scale with the number of tiers.
Published: 2025-01-22T00:03:25Z
Comment: 23 pages. A preliminary version appeared in the proceedings of the 26th International Conference on Integer Programming and Combinatorial Optimization (IPCO 2025)
Authors: Sharat Ibrahimpur, László A. Végh

http://arxiv.org/abs/2605.04258v1
Constructing Suffixient Arrays Revisited
Updated: 2026-05-05T19:49:38Z
Recently, Cenzato et al.\ proposed a new text index, called the \emph{suffixient array}, which is a subset of the suffix array and supports locating a single pattern occurrence or finding its maximal exact matches (MEMs), assuming random access to the input text $T[1..n]$ is available. They show that, given the suffix array, the longest common prefix array, and the Burrows--Wheeler transform (BWT) of the reverse of $T[1..n]$ over an alphabet $\{1,\ldots,σ\}$, a suffixient array can be constructed in linear time. However, their construction algorithms require multiple scans of these arrays. When restricted to a single pass over the arrays, they present an alternative construction algorithm running in $O(n + \overline{r} \log σ)$ time, where $\overline{r}$ is the number of runs in the BWT of the reversed text. In this paper, we present a new one-pass algorithm that constructs a suffixient array in linear time under the standard RAM model.
Published: 2026-05-05T19:49:38Z
Comment: To appear at CPM2026
Authors: Paola Bonizzoni, Younan Gao, Brian Riccardi

http://arxiv.org/abs/2605.04244v1
Faster Iterative $φ$ Queries on the Positional BWT
Updated: 2026-05-05T19:31:05Z
The Positional Burrows-Wheeler Transform (PBWT) is a fundamental data structure for the efficient representation and analysis of large-scale haplotype panels.
For a panel of $h$ sequences $\{S_1, \dots, S_h\}$ over $m$ sites, a key operation is the $φ_j(i)$ query, which returns the haplotype index immediately preceding $S_i$ in co-lexicographic order at site $j$. Efficient support for $k$ iterative queries $φ^1, \dots, φ^k$ is essential for haplotype matching and variation analysis.
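The $φ_j(i)$ query can be illustrated with a brute-force Python sketch that re-sorts the panel for every query. It is only meant to pin down the definition and has nothing to do with the refined-segment structure of the paper. The indexing convention is an assumption: haplotypes and sites are 0-indexed, the order at site `j` compares the reversed prefixes `S_i[0..j-1]`, ties are broken by haplotype index, and iterating the query keeps the site fixed. The function names are hypothetical.

```python
def prefix_order(panel, j):
    """Haplotype indices sorted co-lexicographically at site j.

    Assumed convention: compare the reversed prefixes panel[i][:j],
    breaking ties by the haplotype index i.
    """
    return sorted(range(len(panel)), key=lambda i: (panel[i][:j][::-1], i))


def phi(panel, j, i):
    """Index of the haplotype immediately preceding S_i at site j,
    or None if S_i comes first in the order."""
    order = prefix_order(panel, j)
    rank = order.index(i)
    return order[rank - 1] if rank > 0 else None


def iterate_phi(panel, j, i, k):
    """Apply phi up to k times at the same site, collecting each result;
    stops early once the first haplotype in the order is reached."""
    result = []
    current = i
    for _ in range(k):
        current = phi(panel, j, current)
        if current is None:
            break
        result.append(current)
    return result
```

For the panel `["0101", "1100", "0110", "1001"]` at site `j = 2`, the reversed prefixes are `"10"`, `"11"`, `"10"`, `"01"`, so the order is `[3, 0, 2, 1]`; hence `phi(panel, 2, 2)` is `0` and three iterations starting from haplotype 1 visit `[2, 0, 3]`. Each call here costs a full sort, which is the cost that constant or near-constant time per iterated query is meant to avoid.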
In this work, we introduce a simple and novel decomposition scheme that decomposes each haplotype row into sub-intervals, called refined segments, within which a haplotype's co-lexicographic predecessor for the sites remains unchanged. We show that refined segments satisfy two key properties: (i) each segment $[b,e]$ associated with $S_i$ overlaps with at most a constant number of segments of $S_{φ_e(i)}$, and (ii) the total number of segments is bounded by $O(\tilde{r} + h)$, where $\tilde{r}$ denotes the number of runs in the PBWT. Building on this decomposition, we present two space-time tradeoffs for supporting $k$ iterative $φ$ queries: (i) a structure using $O((\tilde{r} + h)\log n)$ bits of space that answers $k$ iterative queries in $O(\log \log_w \min(m,h) + k)$ time, where $n = m \cdot h$, and (ii) a more compact structure using $O(\tilde{r} \log h + h \log n)$ bits of space that supports queries in $O(k \log \log_w h)$ time.
Prior to our work, supporting these queries required $O((\tilde{r} + h)\log n)$ bits of space and $O(k \cdot \log \log_w m)$ time. Our second tradeoff is expected to be effective in practice for modern genomic datasets, where the number $h$ of haplotypes is typically much smaller than the number $m$ of sites.
Published: 2026-05-05T19:31:05Z
Authors: Paola Bonizzoni, Travis Gagie, Younan Gao

http://arxiv.org/abs/2605.04183v1
Nearly-Tight Bounds for Zonotope Containment and Beyond
Updated: 2026-05-05T18:24:10Z
We investigate the convex-body containment problem $\max\{s >0 : s Z \subseteq Q\}$, where the outer body $Q \subseteq \mathbb R^d$ is described by a membership oracle and the inner body $Z \subseteq \mathbb R^d$ is a zonotope. Our main result is a sampling-based $O(\sqrt{d})$-approximation algorithm for this problem that almost matches the lower bound of $Ω(\sqrt{d/\log d})$ by Khot and Naor in the oracle model. Assuming zonotopes can be sparsified by a linear number of generators, which is referred to as Talagrand's conjecture, our approach attains the optimal approximation factor of $Θ(\sqrt{d/\log d})$. Our second main result is a proof of Talagrand's conjecture for $Δ$-modular zonotopes whenever $Δ$ is constant. Those zonotopes are of the form $Z = \{ Wx \colon \| x\|_\infty \leq 1\}$ where the non-zero $d \times d$ sub-determinants of $W$ are between $1$ and $Δ$. This result establishes a connection between zonoid sparsification and spectral sparsification of Batson, Spielman and Srivastava. We complement these results with a universal $Ω(\sqrt{d/\log d})$ lower bound holding for all zonotopes.
Finally, we consider containment problems $\max\{s >0 : s K \subseteq Q\}$, for general convex bodies $K \subseteq \mathbb R^d$. A result of Naszódi on approximating $K \subseteq \mathbb R^d$ by a polytope implies a $Θ(d/\log d)$ approximation algorithm in polynomial time. We show the tightness of this approximation factor in the oracle model via a reduction to the circumradius computation. Our lower bound holds for centrally symmetric convex sets, implying that Barvinok's optimal $O(\sqrt{d})$-approximation of a centrally symmetric convex body by a polytope with a polynomial number of vertices cannot be computed in polynomial time.
Published: 2026-05-05T18:24:10Z
Authors: Friedrich Eisenbrand, Thomas Rothvoss, Matteo Russo, Ruben Skorupinski