A Dividing Line for Structural Kernelization of Component Order Connectivity via Distance to Bounded Pathwidth

2026-03-23T17:36:38Z

In this work we study a classic generalization of the Vertex Cover (VC) problem, called the Component Order Connectivity (COC) problem. In COC, given an undirected graph $G$ and integers $d \geq 1$ and $k$, the goal is to determine whether there is a set of at most $k$ vertices whose deletion results in a graph where each connected component has at most $d$ vertices. When $d=1$, this is exactly VC. This work is inspired by polynomial kernelization results with respect to structural parameters for VC. On the one hand, Jansen & Bodlaender [TOCS 2013] showed that VC admits a polynomial kernel when the parameter is the distance to treewidth-$1$ graphs. On the other hand, Cygan, Lokshtanov, Pilipczuk, Pilipczuk & Saurabh [TOCS 2014] showed that VC does not admit a polynomial kernel when the parameter is the distance to treewidth-$2$ graphs. Greilhuber & Sharma [IPEC 2024] showed that, for any $d \geq 2$, $d$-COC cannot admit a polynomial kernel when the parameter is the distance to a forest of pathwidth $2$. Here, $d$-COC is the same as COC except that $d$ is a fixed constant and not part of the input. We complement this result and show that, just as for VC the dividing line between structural parameterizations that allow and disallow polynomial kernelization lies between distance to treewidth-$1$ graphs and distance to treewidth-$2$ graphs, for COC this dividing line lies between distance to pathwidth-$1$ graphs and distance to pathwidth-$2$ graphs. The main technical result of this work is that COC admits a polynomial kernel parameterized by distance to pathwidth-$1$ graphs plus $d$.
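To make the COC definition concrete, the following is a minimal sketch of a solution verifier: it checks whether deleting a candidate vertex set leaves only components with at most $d$ vertices. The function name `is_coc_solution` and the adjacency-dictionary input format are illustrative assumptions, not part of the paper, and the sketch does not implement the kernelization itself.

```python
from collections import deque


def is_coc_solution(adj, deleted, d):
    """Return True if removing `deleted` leaves only components of size <= d.

    `adj` maps each vertex to a list of its neighbours (undirected graph).
    """
    deleted = set(deleted)
    seen = set()
    for start in adj:
        if start in deleted or start in seen:
            continue
        # Breadth-first search over the remaining graph, counting the
        # number of vertices in the component that contains `start`.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            if size > d:
                return False
            for v in adj[u]:
                if v not in deleted and v not in seen:
                    seen.add(v)
                    queue.append(v)
    return True


# Path a - b - c - d - e (hypothetical example instance).
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

print(is_coc_solution(path, {"c"}, 2))       # True: components {a, b} and {d, e}
print(is_coc_solution(path, {"b", "d"}, 1))  # True: with d = 1 this is a vertex cover
```

With $d=1$ the check accepts exactly the vertex covers of the graph, matching the remark that COC specializes to VC.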

Isolation critical graphs under multiple edge subdivision

2026-03-23T16:58:19Z

This paper introduces the notion of an $(ι,q)$-critical graph. The isolation number of a graph $G$, denoted by $ι(G)$ and also known as the vertex-edge domination number of $G$, is the size of a smallest subset $D$ of the vertex set of $G$ such that the subgraph induced by the set of vertices that are not in the closed neighbourhood of $D$ has no edges. A graph $G$ is $(ι,q)$-critical if every subdivision of $q$ edges of $G$ gives a graph whose isolation number is greater than $ι(G)$, and $G$ has $q-1$ edges such that subdividing them gives a graph whose isolation number is $ι(G)$. We show that an $(ι,q)$-critical graph exists for every integer $q \ge 1$. We prove that if $G$ is a connected $m$-edge non-star graph, then $G$ is $(ι,q)$-critical for some $q \le m - 1$. We show that this bound is best possible. We provide a general characterization of $(ι,1)$-critical graphs as well as a constructive characterization of $(ι,1)$-critical trees, demonstrating that $(ι,1)$-criticality can be checked in linear time for trees.
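The definitions of the isolation number and of edge subdivision can be illustrated with a brute-force sketch. It is exponential in the number of vertices and only meant for tiny graphs; the helper names (`is_isolating`, `isolation_number`, `subdivide`) are illustrative assumptions and not the paper's linear-time procedure for trees.

```python
from itertools import combinations


def is_isolating(adj, D):
    """True if the vertices outside the closed neighbourhood of D induce no edge."""
    closed = set(D)
    for u in D:
        closed.update(adj[u])
    for u in adj:
        if u in closed:
            continue
        for v in adj[u]:
            if v not in closed:
                return False  # edge uv survives outside N[D]
    return True


def isolation_number(adj):
    """Smallest size of an isolating set, found by exhaustive search."""
    vertices = list(adj)
    for size in range(len(vertices) + 1):
        for D in combinations(vertices, size):
            if is_isolating(adj, D):
                return size


def subdivide(adj, u, v, new):
    """Return a copy of the graph in which edge uv is replaced by the path u-new-v."""
    out = {x: list(nbrs) for x, nbrs in adj.items()}
    out[u] = [new if y == v else y for y in out[u]]
    out[v] = [new if y == u else y for y in out[v]]
    out[new] = [u, v]
    return out


# The 4-cycle: subdividing any single edge yields the 5-cycle.
c4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
c5 = subdivide(c4, 0, 1, 4)
print(isolation_number(c4), isolation_number(c5))  # 1 2
```

By symmetry every single-edge subdivision of the 4-cycle gives the 5-cycle, so under the definition above this small example is $(ι,1)$-critical.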

Stable Algorithms Lower Bounds for Estimation

2026-03-23T16:50:24Z

In this work, we show that for all statistical estimation problems, a natural MMSE instability (discontinuity) condition implies the failure of stable algorithms, serving as a version of the Overlap Gap Property (OGP) for estimation tasks. Using this criterion, we establish separations between stable and polynomial-time algorithms for the following MMSE-unstable tasks: (i) Planted Shortest Path, where Dijkstra's algorithm succeeds, (ii) random Parity Codes, where Gaussian elimination succeeds, and (iii) Gaussian Subset Sum, where lattice-based methods succeed. For all three, we further show that all low-degree polynomials are stable, yielding separations against low-degree methods and a new method to bound the low-degree MMSE. In particular, our technique highlights that MMSE instability is a common feature of Shortest Path, noiseless Parity Codes, and Gaussian Subset Sum. Last, we highlight that our work places rigorous algorithmic footing on the long-standing physics belief that first-order phase transitions -- which in this setting translate to MMSE instability -- impose fundamental limits on classes of efficient algorithms.

Optimal-Time Move Structure Balancing and LCP Array Computation from the RLBWT

2026-03-23T16:12:19Z

On repetitive text collections of size $n$, the Burrows-Wheeler Transform (BWT) tends to have relatively few runs $r$ in its run-length encoded form (RLBWT). This motivates many RLBWT-related algorithms and data structures that can be designed in compressed $O(r)$-space. These approaches often use the RLBWT-derived permutations LF, FL, $φ$, and $φ^{-1}$, which can be represented using a move structure to obtain optimal $O(1)$-time for each permutation step in $O(r)$-space. They are then used to construct compressed space text indexes supporting efficient pattern matching queries. However, move structure construction in $O(r)$-space requires an $O(r \log r)$-time balancing stage. The longest common prefix array (LCP) of a text collection is used to support pattern matching queries and data structure construction. Recently, it was shown how to compute the LCP array in $O(n + r \log r)$-time and $O(r)$ additional space from an RLBWT. However, the bottleneck remains the $O(r \log r)$-time move structure balancing stage. In this paper, we describe an optimal $O(r)$-time and space algorithm to balance a move structure. This result is then applied to LCP construction from an RLBWT to obtain an optimal $O(n)$-time algorithm in $O(r)$-space in addition to the output, which implies an optimal-time algorithm for LCP array enumeration in compressed $O(r)$-space.
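As a small illustration of the quantities $n$ and $r$, the sketch below computes a BWT by sorting all rotations and then run-length encodes it. This naive construction is an assumption made for readability only: it uses far more than $O(r)$ space and is unrelated to the move structure balancing algorithm of the paper. The function names and the `$` sentinel are illustrative.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Naive BWT: last column of the sorted rotations of text + sentinel."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse maximal runs of equal characters into (character, length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


transformed = bwt("banana")
runs = run_length_encode(transformed)
print(transformed)                  # annb$aa
print(len(transformed), len(runs))  # n = 7, r = 5
```

On highly repetitive inputs the number of runs is typically much smaller than the text length, which is what makes $O(r)$-space structures attractive.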

On the Complexity of Fundamental Problems for DAG-Compressed Graphs

2026-03-23T14:57:34Z

A DAG compression of a (typically dense) graph is a simple data structure that stores how vertex clusters are connected, where the clusters are described indirectly as sets of reachable sinks in a directed acyclic graph (DAG). They generalize tree compressions, where the clusters form a tree-like hierarchy, and we give the first proof that DAG compressions can achieve better compressions than tree compressions. Our interest in DAG compression stems from the fact that several simple standard algorithms, like breadth-first search on graphs, can be implemented so that they work directly on the compressed rather than on the original graph and so that, crucially, the runtime is relative to the (typically small) size of the compressed graph. We add another entry to the list of algorithms where this is possible, by showing that Kruskal's algorithm for computing minimum spanning trees can be adapted to work directly on DAG compressions. On the negative side, we answer the central open problem from previous work, namely how hard it is to compute a minimum-size DAG compression for a given graph: This is NP-hard; and this is even the case for the dynamic setting, where we must update the DAG compression optimally when a single edge is added or deleted in the input graph.

Approximate Butterfly Counting in Sublinear Time

2026-03-23T11:01:44Z

Bipartite graphs serve as a natural model for representing relationships between two different types of entities. When analyzing bipartite graphs, butterfly counting is a fundamental research problem that aims to count the number of butterflies (i.e., 2x2 bicliques) in a given bipartite graph. While this problem has been extensively studied in the literature, existing algorithms usually necessitate access to a large portion of the entire graph, presenting challenges in real scenarios where graphs are extremely large and I/O costs are expensive. In this paper, we study the butterfly counting problem under the query model, where the following query operations are permitted: degree query, neighbor query, and vertex-pair query. We propose TLS, a practical two-level sampling algorithm that can estimate the butterfly count accurately while accessing only a limited graph structure, achieving significantly lower query costs under the standard query model. TLS also incorporates several key techniques to control the variance, including "small-degree-first sampling" and "wedge sampling via small subsets". To ensure theoretical guarantees, we further introduce two novel techniques: "heavy-light partition" and "guess-and-prove", integrated into TLS. With these techniques, we prove that the algorithm can achieve a (1+eps) accuracy for any given approximation parameter 0 < eps < 1 on general bipartite graphs with a promised time and query complexity. In particular, the promised time is sublinear when the input graph is dense enough. Extensive experiments on 15 datasets demonstrate that TLS delivers robust estimates with up to three orders of magnitude lower query costs and runtime compared to existing solutions.
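For reference, the quantity being estimated can be computed exactly on small inputs. The sketch below is a simple exact baseline that reads the whole graph: it counts, for every pair of left vertices, their common right neighbours, and sums the number of ways to pick two of them. It is not the TLS sampling algorithm, and the function name and input format are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def count_butterflies(left_adj):
    """Exact number of 2x2 bicliques in a bipartite graph.

    `left_adj` maps each left vertex to an iterable of its right neighbours.
    """
    # Invert the adjacency so each right vertex lists its left neighbours.
    right_adj = {}
    for u, nbrs in left_adj.items():
        for w in set(nbrs):
            right_adj.setdefault(w, []).append(u)

    # common[(u, v)] = number of right vertices adjacent to both u and v.
    common = Counter()
    for ends in right_adj.values():
        for u, v in combinations(sorted(ends), 2):
            common[(u, v)] += 1

    # Any two common neighbours of a left pair close one butterfly.
    return sum(c * (c - 1) // 2 for c in common.values())


k33 = {u: ["a", "b", "c"] for u in ("u1", "u2", "u3")}
print(count_butterflies(k33))  # 9
```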

Charting the Diameter Computation Landscape of Geometric Intersection Graphs in Three Dimensions and Higher

2026-03-23T10:34:01Z

Recent research on computing the diameter of geometric intersection graphs has made significant strides, primarily focusing on the 2D case where truly subquadratic-time algorithms were given for simple objects such as unit-disks and (axis-aligned) squares. However, in three or higher dimensions, there is no known truly subquadratic-time algorithm for any intersection graph of non-trivial objects, even basic ones such as unit balls or (axis-aligned) unit cubes. This was partially explained by the pioneering work of Bringmann et al. [SoCG '22] which gave several truly subquadratic lower bounds, notably for unit balls or unit cubes in 3D when the graph diameter $Δ$ is at least $Ω(\log n)$, hinting at a pessimistic outlook for the complexity of the diameter problem in higher dimensions. In this paper, we substantially extend the landscape of diameter computation for objects in three and higher dimensions, giving a few positive results. Our highlighted findings include:
- A truly subquadratic-time algorithm for deciding if the diameter of unit cubes in 3D is at most 3 (Diameter-3 hereafter), the first algorithm of its kind for objects in 3D or higher dimensions. Our algorithm is based on a novel connection to pseudolines, which is of independent interest.
- A truly subquadratic time lower bound for Diameter-3 of unit balls in 3D under the Orthogonal Vector (OV) hypothesis, giving the first separation between unit balls and unit cubes in the small diameter regime. Previously, computing the diameter for both objects was known to be truly subquadratic hard when the diameter is $Ω(\log n)$.
- A near-linear-time algorithm for Diameter-2 of unit cubes in 3D, generalizing the previous result for unit squares in 2D.
- A truly subquadratic-time algorithm and lower bound for Diameter-2 and Diameter-3 of rectangular boxes (of arbitrary dimension and sizes), respectively.

Computing distances is FPT on graph associahedra and W[2]-hard on hypergraphic polytopes

2026-03-23T07:44:34Z

An elimination tree of a connected graph $G$ is a rooted tree on the vertices of $G$ obtained by choosing a root $v$ and recursing on the connected components of $G-v$ to obtain the subtrees of $v$. The graph associahedron of $G$ is a polytope whose vertices correspond to elimination trees of $G$ and whose edges correspond to tree rotations, a natural operation between elimination trees. These objects generalize associahedra, which correspond to the case where $G$ is a path. Ito et al. [ICALP 2023] recently proved that the problem of computing distances on graph associahedra is NP-hard. In this paper we prove that the problem, for a general graph $G$, is fixed-parameter tractable parameterized by the distance $k$. Prior to our work, only the case where $G$ is a path was known to be fixed-parameter tractable. To prove our result, we use a novel approach based on a marking scheme that restricts the search to a set of vertices whose size is bounded by a (large) function of $k$. On the negative side, we show that it is unlikely that FPT algorithms exist on a natural generalization of graph associahedra, namely hypergraphic polytopes, by proving that computing distances on them is W[2]-hard parameterized by the distance. We also prove that, on hypergraphic polytopes, the distance cannot be approximated in polynomial time within a factor $c \cdot \log(|V|+|\mathcal{E}|)$ for some constant $c > 0$ unless P = NP, where $H=(V, \mathcal{E})$ is the input hypergraph. This result strengthens the hardness result of Cardinal and Steiner [Combin. Theory 2025], who proved that the problem cannot be approximated within a factor $(1 + \varepsilon)$ for some absolute constant $\varepsilon > 0$ unless P = NP. Finally, we rule out the existence of polynomial kernels parameterized by the number of vertices of the input hypergraph, a parameter for which the problem is easily seen to be FPT.
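The recursive definition of an elimination tree can be sketched directly. In the code below the choice of root in each component is driven by a hypothetical `priority` list (the earliest listed vertex wins); this ordering is an assumption introduced for illustration, since the definition allows any root. The sketch only builds one elimination tree and does not compute rotation distances.

```python
def components(adj, vertices):
    """Connected components of the subgraph induced by `vertices`."""
    vertices = set(vertices)
    seen = set()
    comps = []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = {s}, [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v in vertices and v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def elimination_tree(adj, priority):
    """Parent map of an elimination tree of a connected graph.

    In every component the vertex that appears first in `priority` is
    chosen as the root; the construction then recurses on the
    components left after removing that root.
    """
    rank = {v: i for i, v in enumerate(priority)}
    parent = {}

    def build(vertices, par):
        root = min(vertices, key=rank.__getitem__)
        parent[root] = par
        for comp in components(adj, vertices - {root}):
            build(comp, root)

    build(set(adj), None)
    return parent


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(elimination_tree(path, ["b", "a", "c"]))  # b is the root; a and c are its children
```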

Non-Exclusive Notifications for Ride-Hailing at Lyft I: Single-Cycle Approximation Algorithms

2026-03-23T03:43:49Z

Ride-hailing platforms increasingly rely on non-exclusive notifications -- broadcasting a single request to multiple drivers simultaneously -- to mitigate inefficiencies caused by uncertain driver acceptance. In this paper, the first in a two-part collaboration with Lyft, we formally model the 'Notification Set Selection Problem' for a single decision cycle, where the platform determines the optimal subset of drivers to notify for each incoming ride request. We analyze this combinatorial optimization problem under two contention-resolution protocols: 'First Acceptance (FA)', which prioritizes speed by assigning the ride to the first responder, and 'Best Acceptance (BA)', which prioritizes match quality by selecting the highest-valued accepting driver. We show that welfare maximization under both mechanisms is strongly NP-hard, ruling out a Fully Polynomial Time Approximation Scheme (FPTAS). Despite this, we derive several positive algorithmic results. For FA, we present a Polynomial Time Approximation Scheme (PTAS) for the single-rider case and a constant-factor approximation (factor 4) for the general matching setting. We highlight that the FA valuation function can be viewed as a novel discrete choice model with theoretical properties of independent interest. For BA, we prove that the objective is monotone and submodular, admitting a standard $(1 - 1/e)$-approximation. Moreover, using a polynomial-time demand oracle that we design for this problem, we show it is possible to surpass the $(1 - 1/e)$ barrier. Finally, in the special case of homogeneous acceptance probabilities, we show that the BA problem can be solved exactly in polynomial time via a linear programming formulation. We validate the empirical performance of our algorithms through numerical experiments on synthetic data and on instances calibrated using real ride-sharing data from Lyft.

Stationary Online Contention Resolution Schemes

2026-03-23T03:43:31Z

Online contention resolution schemes (OCRSs) are a central tool in Bayesian online selection and resource allocation: they convert fractional ex-ante relaxations into feasible online policies while preserving each marginal probability up to a constant factor. Despite their importance, designing (near) optimal OCRSs is often technically challenging, and many existing constructions rely on indirect reductions to prophet inequalities and LP duality, resulting in algorithms that are difficult to interpret or implement. In this paper, we introduce "stationary online contention resolution schemes (S-OCRSs)," a permutation-invariant class of OCRSs in which the distribution of the selected feasible set is independent of arrival order. We show that S-OCRSs admit an exact distributional characterization together with a universal online implementation. We then develop a general "maximum-entropy" approach to construct and analyze S-OCRSs, reducing the design of online policies to constructing suitable distributions over feasible sets. This yields a new technical framework for designing simple and possibly improved OCRSs. We demonstrate the power of this framework across several canonical feasibility environments. In particular, we obtain an improved $(3-\sqrt{5})/2$-selectable OCRS for bipartite matchings, attaining the independence benchmark conjectured to be optimal and yielding the best known prophet inequality for this setting. We also obtain a $1-\sqrt{2/(πk)} + O(1/k)$-selectable OCRS for $k$-uniform matroids and a simple, explicit $1/2$-selectable OCRS for weakly Rayleigh matroids (including all $\mathbb{C}$-representable matroids such as graphic and laminar). While these guarantees match the best known bounds, our framework also yields concrete and systematic constructions, providing transparent algorithms in settings where previous OCRSs were implicit or technically involved.

Polynomial-size encoding of all cuts of small value in integer-valued symmetric submodular functions

2026-03-23T01:20:41Z

We study connectivity functions, that is, integer-valued symmetric submodular functions on a finite ground set attaining $0$ on the empty set. For a connectivity function $f$ on an $n$-element set $V$ and an integer $k\ge 0$, we show that the family of all sets $X\subseteq V$ with $f(X)=k$ admits a polynomial-size representation: it can be described by a list of at most $O(n^{4k})$ items, each consisting of a set to be included, another set to be excluded, and a partition of remaining elements, such that the union of some members of the partition and the set to be included are precisely all sets $X$ with $f(X)=k$. We also give an algorithm that constructs this representation in time $O(n^{2k+7}γ+n^{2k+8}+n^{4k+2})$, where $γ$ is the oracle time to evaluate $f$. This generalizes the low rank structure theorem of Bojańczyk, Pilipczuk, Przybyszewski, Sokołowski, and Stamoulis [Low rank MSO, arXiv, 2025] on cut-rank functions on graphs to general connectivity functions. As an application, for fixed $k$, we obtain a polynomial-time algorithm for finding a set $A$ with $f(A)=k$ and a prescribed cardinality constraint on $A$.

Hardening Confidential Federated Compute against Side-channel Attacks

2026-03-23T01:13:17Z

In this work, we identify a set of side-channels in our Confidential Federated Compute platform that a hypothetical insider could exploit to circumvent differential privacy (DP) guarantees. We show how DP can mitigate two of the side-channels, one of which has been implemented in our open-source library.

Finding Minimum Distance Preservers: A Parameterized Study

2026-03-22T23:14:42Z

For a given graph $G$ and a subset of vertices $S$, a distance preserver is a subgraph of $G$ that preserves shortest paths between the vertices of $S$. We distinguish between a subsetwise distance preserver, which preserves distances between all pairs in $S$, and a pairwise distance preserver, which preserves distances only between specific pairs of vertices in $S$, given in the input. While a large body of work is dedicated to upper and lower bounds on the size of distance preservers and, more generally, graph spanners, the computational complexity of finding the minimum distance preserver has received comparatively little attention. We consider the respective Subsetwise Distance Preserver (SDP) and Pairwise Distance Preserver (PDP) problems and initiate the study of their computational complexity. We provide a detailed complexity landscape with respect to natural parameters, including the number of terminals, solution size, vertex cover, and treewidth. Our main contributions are as follows:
- Both PDP and SDP are NP-hard even on subgraphs of the grid. Moreover, when parameterized by the number of terminals, the problems are W[1]-hard on subgraphs of the grid, while they become FPT on full grids.
- PDP is NP-hard on graphs of vertex cover $3$, while SDP is FPT when parameterized by the vertex cover of the graph. Thus, the vertex cover parameter distinguishes the two variants.
- Both problems are FPT when parameterized by the number of terminals and the treewidth of the graph.
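The difference between the pairwise and subsetwise variants can be made concrete with a small verifier. The sketch below assumes unweighted, undirected graphs given as edge lists and checks whether a candidate subgraph keeps the required distances; all function names are illustrative assumptions. It only verifies a given subgraph and does not search for a minimum one.

```python
from collections import deque
from itertools import combinations


def build_adj(edges):
    """Adjacency lists of an undirected graph given as a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def bfs_dist(adj, source):
    """Hop distances from `source`; unreachable vertices are absent."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_pairwise_preserver(edges, sub_edges, pairs):
    """True if every listed pair has the same distance in the subgraph as in G."""
    g, h = build_adj(edges), build_adj(sub_edges)
    return all(bfs_dist(g, s).get(t) == bfs_dist(h, s).get(t) for s, t in pairs)


def is_subsetwise_preserver(edges, sub_edges, terminals):
    """True if distances between all pairs of terminals are preserved."""
    return is_pairwise_preserver(edges, sub_edges, combinations(sorted(terminals), 2))


cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(is_subsetwise_preserver(cycle, [("a", "b"), ("b", "c")], {"a", "c"}))  # True
print(is_pairwise_preserver(cycle, [("a", "d"), ("d", "c"), ("c", "b")], [("a", "b")]))  # False
```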

Optimal-Cost Construction of Shallow Cuttings for 3-D Dominance Ranges in the I/O-Model

2026-03-22T17:31:57Z

Shallow cuttings are a fundamental tool in computational geometry and spatial databases for solving offline and online range searching problems. For a set $P$ of $N$ points in 3-D, at SODA'14, Afshani and Tsakalidis designed an optimal $O(N\log_2N)$ time algorithm that constructs shallow cuttings for 3-D dominance ranges in internal memory. Even though shallow cuttings are used in the I/O-model to design space- and query-efficient range searching data structures, no efficient construction of them has been known so far. In this paper, we design an optimal-cost algorithm to construct shallow cuttings for 3-D dominance ranges. The number of I/Os performed by the algorithm is $O\left(\frac{N}{B}\log_{M/B}\left(\frac{N}{B}\right) \right)$, where $B$ is the block size and $M$ is the memory size. As two applications of the optimal-cost construction algorithm, we design fast algorithms for offline 3-D dominance reporting and offline 3-D approximate dominance counting. We believe that our algorithm will find further applications in offline 3-D range searching problems and in improving construction cost of data structures for 3-D range searching problems.

The Library Theorem: How External Organization Governs Agentic Reasoning Capacity

2026-03-22T15:02:56Z

Externalized reasoning is already exploited by transformer-based agents through chain-of-thought, but structured retrieval -- indexing over one's own reasoning state -- remains underexplored. We formalize the transformer context window as an I/O page and prove that tool-augmented agents with indexed external memory achieve exponentially lower retrieval cost than agents restricted to sequential scanning: $O(\log_b N)$ versus $Ω(N)$ page reads per query, and $O(T \log_b T)$ versus $Θ(T^2)$ cumulative cost over $T$ reasoning steps -- a gap that widens as deliberation deepens. We test these predictions on a controlled lookup benchmark across three content types -- random hashes, ordered integers, and encyclopedia entries -- varying store size from 50 to 5,000 items, and replicate key conditions across two model generations (GPT-4o-mini and GPT-5.4). On abstract content, the indexed agent achieves median 1 page read regardless of store size, confirming the $O(1)$ prediction. Sorted pages without an index fail to close the gap: the weaker model cannot sustain binary search at scale, and the stronger model achieves near-optimal $\log_2 N$ search but still loses to the index by $5\times$. On familiar content (encyclopedia entries), a competing failure mode emerges: the model recognizes the domain, bypasses the retrieval protocol, and generates answers from parametric memory, producing catastrophic token expenditure even when the index is sound. This parametric memory competition dissociates the two cognitive operations that indexing combines: understanding content (where language models excel) and following navigational protocols (where they fail when understanding tempts them to shortcut). The result argues for a separation of concerns: use language models for index construction, where semantic understanding helps, and deterministic algorithms for index traversal, where it hurts.
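The asymptotic gap between indexed and sequential retrieval can be illustrated with a back-of-the-envelope cost model. The sketch below makes several simplifying assumptions that are not taken from the paper: a sequential scan reads one page per stored item in the worst case, the index is a balanced tree with branching factor `b`, and the store grows by one item per reasoning step. The function names are hypothetical.

```python
def index_reads(n, b):
    """Pages read to locate one of n items through a balanced b-ary index."""
    reads, reach = 1, b
    while reach < n:
        reach *= b
        reads += 1
    return reads


def scan_reads(n):
    """Worst-case pages read by a sequential scan (assumes one item per page)."""
    return n


def cumulative_cost(steps, b):
    """Total reads over `steps` reasoning steps, with one query per step
    against a store that holds t items at step t."""
    indexed = sum(index_reads(t, b) for t in range(1, steps + 1))
    scanned = sum(scan_reads(t) for t in range(1, steps + 1))
    return indexed, scanned


for n in (50, 500, 5000):
    print(n, index_reads(n, 10), scan_reads(n))

print(cumulative_cost(100, 10))  # (190, 5050)
```

Under these assumptions the per-query index cost grows by one page read each time the store size increases tenfold, while the cumulative scan cost grows quadratically in the number of steps.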