Redundancy Is All You Need (for CSP Sparsification)

2026-05-15T18:20:19Z

The seminal work of Benczúr and Karger demonstrated cut sparsifiers of near-linear size. Subsequent extensions have yielded sparsifiers for hypergraph cuts and, more recently, linear codes over Abelian groups. A decade ago, Kogan and Krauthgamer asked about the sparsifiability of arbitrary constraint satisfaction problems (CSPs). For this question, a trivial lower bound is the size of a non-redundant CSP instance, which admits, for each constraint, an assignment satisfying only that constraint (so that no constraint can be dropped by the sparsifier). For instance, for graph cuts, spanning trees are non-redundant instances. Our main result is that redundant clauses are sufficient for sparsification: for any CSP predicate R, every unweighted instance of CSP(R) has a sparsifier of size at most its non-redundancy (up to polylog and $1/ε$ factors). For weighted instances, we similarly pin down the sparsifiability to the so-called chain length of the predicate. These results precisely determine the extent to which any CSP can be sparsified. Our result is established in the general setting of non-linear codes, or equivalently set families, yielding a VC-type theorem for multiplicative error approximation. A key technical ingredient in our work is a novel application of the entropy method from Gilmer's recent breakthrough on the union-closed sets conjecture. As an immediate consequence of our main theorem, a number of results in the non-redundancy literature extend to CSP sparsification. We also contribute new techniques for understanding the non-redundancy of CSP predicates. By adapting methods from the matching vector codes literature in coding theory, we are able to construct an explicit predicate whose non-redundancy lies between $Ω(n^{1.5})$ and $\widetilde{O}(n^{1.6})$, the first example with a provably non-integral exponent.
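To make the notion of a non-redundant instance concrete, the following is a minimal brute-force sketch, not the paper's method. It assumes Boolean variables and an instance given as a list of variable tuples; the names `is_non_redundant` and `cut` are hypothetical and chosen for illustration. It checks the definition directly: every constraint needs an assignment that satisfies it and no other constraint.

```python
from itertools import product


def is_non_redundant(num_vars, constraints, predicate):
    """Brute-force check of non-redundancy (exponential in num_vars).

    constraints: list of tuples of variable indices (the scopes).
    predicate:   function taking the values on a scope, returning a bool.
    """
    for i in range(len(constraints)):
        witnessed = False
        for assignment in product((0, 1), repeat=num_vars):
            satisfied = [
                predicate(*(assignment[v] for v in scope)) for scope in constraints
            ]
            # Witness: constraint i holds and it is the only one that holds.
            if satisfied[i] and sum(satisfied) == 1:
                witnessed = True
                break
        if not witnessed:
            return False
    return True


def cut(a, b):
    """Graph-cut predicate: the edge is cut when its endpoints differ."""
    return a != b


# A path on three vertices is a spanning tree; a triangle contains a cycle.
path = [(0, 1), (1, 2)]
triangle = [(0, 1), (1, 2), (0, 2)]

print(is_non_redundant(3, path, cut))      # True
print(is_non_redundant(3, triangle, cut))  # False
```

The triangle fails because any vertex bipartition cuts an even number of edges on a cycle, so no assignment cuts exactly one edge.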

An $\mathcal{O}(\log N)$ Time Algorithm for the Generalized Egg Dropping Problem

2026-05-15T15:14:26Z

The generalized egg dropping problem is a classic challenge in sequential decision-making. Standard dynamic programming evaluates the minimax number of tests in $\mathcal{O}(K \cdot N^2)$ time. A known approach formulates the testable thresholds as a partial sum of binomial coefficients and applies binary search to reduce the time complexity to $\mathcal{O}(K \log N)$. In this paper, we demonstrate that binary search over the complete sequential test domain is suboptimal. By restricting the binary search to multiples of $K$, we isolate a dynamic structural envelope that guarantees convergence. We prove that this boundary balances the search depth against the combinatorial evaluation cost, cancelling the $K$ variable to strictly bound the search phase to $\mathcal{O}(\log N)$. Combined with an incremental traversal, our algorithm eliminates the standard bottlenecks. Furthermore, we formulate an explicit $\mathcal{O}(1)$ space policy to dynamically reconstruct the optimal decision tree.
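For reference, the known $\mathcal{O}(K \log N)$ baseline mentioned above can be sketched as follows. This is the standard binomial-sum formulation, not the paper's $\mathcal{O}(\log N)$ algorithm, and the function names are hypothetical. With $t$ tests and $K$ eggs, at most $\sum_{i=1}^{K} \binom{t}{i}$ floors can be distinguished, so the answer is the smallest $t$ for which this sum reaches $N$.

```python
def floors_covered(tests, eggs, cap):
    """Sum of C(tests, i) for i = 1..eggs, stopping early once it reaches cap."""
    total = 0
    term = 1  # C(tests, 0)
    for i in range(1, eggs + 1):
        # C(tests, i) = C(tests, i - 1) * (tests - i + 1) / i, exact in integers.
        term = term * (tests - i + 1) // i
        total += term
        if total >= cap:
            break
    return total


def min_tests(eggs, floors):
    """Smallest number of tests that suffices in the worst case (eggs >= 1)."""
    if floors == 0:
        return 0
    lo, hi = 1, floors  # a single egg never needs more than `floors` tests
    while lo < hi:
        mid = (lo + hi) // 2
        if floors_covered(mid, eggs, floors) >= floors:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(min_tests(2, 100))  # 14
```

Each feasibility check costs up to $K$ multiplications and the binary search makes $\mathcal{O}(\log N)$ checks, which is where the $K \log N$ bound comes from.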

On the parameterized complexity of Broadcast Independence and Broadcast Packing

2026-05-15T14:31:51Z

A broadcast on a connected graph is a function f that assigns each vertex v an integer f(v) with 0 <= f(v) <= ecc(v) where ecc(v) denotes the eccentricity of v. A vertex u hears a broadcasting vertex v (with f(v)>0) if u is at distance at most f(v) from v. Beyond the classical broadcast domination problem, where every vertex is required to hear at least one vertex, two variants raise intriguing combinatorial and algorithmic questions. In an independent broadcast, no broadcasting vertex hears another broadcasting vertex, while a broadcast packing requires that every vertex hears at most one broadcasting vertex. The corresponding problems Broadcast Independence and Broadcast Packing ask for broadcasts of value at least k under these constraints, where the value is the sum of the broadcast values. We initiate a systematic study of the parameterized complexity of such problems. We prove that Broadcast Independence and Broadcast Packing are FPT parameterized by the treewidth plus the diameter of G, with a family of dynamic-programming algorithms over nice tree decompositions. We obtain as a corollary that both problems are FPT parameterized by k and the treewidth of G, and XP parameterized by treewidth only. The latter result shows that the known algorithm for trees (Bessy and Rautenbach, DAM 2022) can indeed be extended to bounded treewidth graphs. On the negative side, we show that Broadcast Independence is W[1]-hard parameterized by the pathwidth of G. Note that this result completes the picture for parameter k and treewidth for Broadcast Independence since it is known to be W[1]-hard for k only. We complement these results by showing that a weighted version of both problems, where the input comes with a weight function on the edges, is W[1]-hard parameterized by the vertex cover of G. Finally, we provide a constant-factor approximation algorithm parameterized by treewidth for Broadcast Independence.
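The definitions above can be checked mechanically on small graphs. The sketch below is only a verifier for a given broadcast, not one of the paper's algorithms. It assumes an unweighted connected graph given as an adjacency dictionary, and it reads "hears at most one broadcasting vertex" as including a broadcasting vertex hearing itself; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def classify_broadcast(adj, f):
    """Return the value of broadcast f and whether it is independent / a packing."""
    dist = {v: bfs_distances(adj, v) for v in adj}
    for v in adj:
        ecc = max(dist[v].values())
        if not 0 <= f.get(v, 0) <= ecc:
            raise ValueError(f"f({v}) must lie between 0 and ecc({v}) = {ecc}")
    broadcasters = [v for v in adj if f.get(v, 0) > 0]
    # Independent: no broadcasting vertex lies within range of another one.
    independent = all(
        dist[v][u] > f[v] for v in broadcasters for u in broadcasters if u != v
    )
    # Packing: every vertex is within range of at most one broadcasting vertex.
    packing = all(
        sum(1 for v in broadcasters if dist[v][u] <= f[v]) <= 1 for u in adj
    )
    value = sum(f.get(v, 0) for v in adj)
    return {"value": value, "independent": independent, "packing": packing}


# Path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(classify_broadcast(path, {0: 1, 4: 1}))
print(classify_broadcast(path, {0: 3, 4: 3}))
```

On this path, the second broadcast is independent (the endpoints are at distance 4, beyond range 3) but not a packing, since the middle vertices hear both endpoints.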

Complexity of Non-Log-Concave Sampling in Fisher Information

2026-05-15T11:20:26Z

We study the query complexity of obtaining a relative Fisher information guarantee for sampling from a log-smooth non-log-concave distribution; this is a sampling analog of finding an approximate stationary point in optimization. Our algorithm is based on the proximal sampler, which is an implicit discretization of the Langevin diffusion, and requires an implementation of the backward step known as the restricted Gaussian oracle (RGO). We show that by leveraging the recent results for log-concave sampling with high-accuracy guarantees in Rényi divergence, we can obtain an approximate RGO implementation that -- when used with the proximal sampler -- yields a complexity guarantee in relative Fisher information that inherits the same dimension dependence as log-concave sampling, and improves upon prior work for non-log-concave sampling. We also show a converse reduction that any improvement in the dimension dependence in relative Fisher information for non-log-concave sampling will yield an improved dimension dependence for high-accuracy log-concave sampling.

Exploration of $k$-edge-deficient temporal graphs in linear time

2026-05-15T10:36:41Z

We study the Temporal Exploration problem, where an agent must visit all vertices of a temporal graph while traversing at most one available edge per time step. Unlike static graphs, which can be explored in linear time, temporal constraints can substantially increase exploration time even when every snapshot of the graph is connected. To better understand the source of this complexity, we focus on a near-static setting and consider always-connected $k$-edge-deficient temporal graphs, in which each snapshot is connected and differs from a fixed underlying $n$-vertex graph by at most $k$ edges. Although such graphs are structurally close to static graphs, they can still exhibit non-trivial temporal behaviour. Prior work showed that these graphs can be explored in $O(kn \log n)$ time steps and established a lower bound of $Ω(n \log k)$, leaving open whether linear-time exploration in $n$ is possible. We resolve this question by showing that any always-connected $k$-edge-deficient temporal graph admits an exploration schedule of length $O(nk \log k)$. Moreover, given such a temporal graph, the corresponding exploration schedule can be computed in polynomial time. The obtained bound is linear in the number of vertices up to a factor depending only on $k$, removes the extraneous logarithmic dependence on $n$, and is nearly optimal. In particular, for constant $k$, our result yields an order-optimal $Θ(n)$ exploration time, showing that temporal exploration in this near-static regime essentially retains the linear-time character of static graph traversal.

Optimizing Line Segment Inspection with Limited-Range Drones

2026-05-15T09:24:50Z

Optimization problems with drones are widely studied in a variety of civilian tasks, mainly due to their ability to traverse rough terrains and to carry cameras and other sensors for surveillance tasks. The limited battery life of these aerial robots poses challenges in operational research. In this paper, we address the following optimization problem. We are given a set of line segments (e.g. tubes in a solar plant) to be inspected by drones. The objective is to detect broken pipes using artificial intelligence, and path planning must be carried out efficiently. On the one hand, the limited capacity of the batteries necessitates periodic visits (tours) to a fixed base station. On the other hand, it is desirable to allocate a set of tours for each drone to ensure that the segments are covered as quickly as possible, aiming to minimize the makespan, which is the maximum time spent by any drone. We are able to prove that this optimization problem is strongly NP-hard even when the segments are positioned on a line and the scenario involves only two drones. Then, approximation algorithms are proposed. Our computational experiments demonstrate that the proposed algorithm achieves near-optimal performance across diverse operational scenarios.

The Robotaxi Placement Problem: Minimizing Expected ETA for Stochastic Demand

2026-05-15T08:54:32Z

Autonomous ride-hailing platforms must strategically position idle robotaxis to minimize the wait times of prospective riders. We formalize this as the \emph{robotaxi placement problem} ($k$-RP). Given a finite metric space and a demand distribution over its points, the goal is to position $k$ robotaxis to minimize the expected total distance in a perfect matching between the robotaxis and $k$ random riders. We present several theoretical results for this stochastic optimization problem. First, we observe that sampling robotaxi locations independently according to the demand distribution yields a randomized $2$-approximation algorithm. Second, we present an explicit inapproximability bound via a novel gap-preserving reduction from the maximum coverage problem. Furthermore, while it is not even clear whether the exact expected cost of a placement can be computed efficiently on general metrics, we design an exact polynomial-time dynamic programming algorithm for $k$-RP in tree metrics by decoupling the stochastic matching dependencies. Finally, empirical evaluations on real-world ride-hailing data reveal that a variance-reduced random placement strategy is highly effective in practice, yielding expected wait times that are very close to those obtained by computationally heavy exact algorithms for the uniform capacitated $k$-median problem.
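As an illustration of the objective, the expected matching cost of a fixed placement can be computed by exhaustive enumeration on tiny instances. The sketch below is not the paper's dynamic program. It assumes the $k$ riders are drawn independently from the demand distribution and uses brute force over all rider tuples and all matchings, so it is only usable for very small $k$; the function names are hypothetical.

```python
from itertools import permutations, product
from math import prod


def matching_cost(dist, taxis, riders):
    """Minimum total distance of a perfect matching, by trying all permutations."""
    return min(
        sum(dist[t][r] for t, r in zip(taxis, perm))
        for perm in permutations(riders)
    )


def expected_cost(dist, demand, taxis):
    """Exact expected matching cost when len(taxis) riders are i.i.d. from demand."""
    k = len(taxis)
    total = 0.0
    for riders in product(range(len(demand)), repeat=k):
        probability = prod(demand[r] for r in riders)
        total += probability * matching_cost(dist, taxis, riders)
    return total


# Three points on a line at positions 0, 1, 2 with uniform demand.
line = [[abs(i - j) for j in range(3)] for i in range(3)]
uniform = [1 / 3, 1 / 3, 1 / 3]
print(expected_cost(line, uniform, [1]))  # one taxi in the middle
print(expected_cost(line, uniform, [0]))  # one taxi at an endpoint
```

With a single taxi, the middle point gives expected distance 2/3 and an endpoint gives 1, matching the intuition that central placement helps under uniform demand.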

Fast and Memory Efficient Multimodal Journey Planning with Delays

2026-05-15T08:41:11Z

State-of-the-art multimodal journey-planning algorithms, such as ULTRA, have recently been adapted to account for delays. In this work, we extend this approach to be more memory-efficient, faster, and more accurate. We also adapt this framework to other state-of-the-art algorithms, like CSA and RAPTOR. We demonstrate a speedup of 1.9-4.2x over existing algorithms in the single-objective search (earliest arrival time). In the bicriteria setting, we achieve competitive speedups with greater accuracy. We also find that our method scales much better as the delay buffer Delta increases.

New Algorithms for Parity-SAT and Its Bounded-Occurrence Versions

2026-05-15T07:46:08Z

Parity-SAT is the problem of determining whether a given CNF formula has an odd number of satisfying assignments. As a canonical $\oplus$P-complete problem, it represents a fundamental variant of the exact model counting problem (#SAT). Under the Strong Exponential Time Hypothesis (SETH), Parity-SAT admits no $O^*((2-\varepsilon)^n)$-time or $O^*((2-\varepsilon)^m)$-time algorithm for any constant $\varepsilon>0$, where $n$ and $m$ denote the numbers of variables and clauses, respectively. Thus, breaking the $2^n$ or $2^m$ barrier appears impossible in full generality. In this work, we revisit this barrier through structural restrictions and a refined exploitation of parity. We study Parity-$d$-occ-SAT, where each variable appears in at most $d$ clauses, and obtain three main results. First, we design a randomized $O^*(2^{m(1-1/O(d))})$-time algorithm, thereby breaking the $2^m$ barrier for every fixed $d$. Second, for the special case $d=2$, we develop a significantly sharper branching algorithm running in $O^*(1.1193^n)$ time or $O^*(1.3248^m)$ time. Third, leveraging the structural insights underlying the $d=2$ case, we obtain an $O^*(1.1052^L)$-time algorithm for general Parity-SAT, where $L$ denotes the formula length. All algorithms use only polynomial space. Notably, our running-time bounds are better than the best known bounds for the corresponding exact counting counterparts, highlighting a genuine algorithmic advantage of parity over counting. Conceptually, our results demonstrate that parity admits finer structural reductions and more efficient branching than exact model counting, and that bounded occurrence can be systematically leveraged to circumvent classical exponential barriers.
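To fix the problem definition, here is a brute-force $O^*(2^n)$ reference implementation, which is the baseline the algorithms above improve on and not one of them. It assumes clauses are given DIMACS-style as lists of nonzero integers (a negative integer denotes a negated variable); the helper `max_occurrence` computes the parameter $d$ of Parity-$d$-occ-SAT. Both function names are hypothetical.

```python
from collections import Counter
from itertools import product


def parity_sat(num_vars, clauses):
    """Return the number of satisfying assignments modulo 2 (brute force)."""
    parity = 0
    for bits in product((False, True), repeat=num_vars):
        # A clause holds if some literal agrees with the assignment.
        if all(
            any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        ):
            parity ^= 1
    return parity


def max_occurrence(clauses):
    """Largest number of clauses in which any single variable appears."""
    counts = Counter(
        var for clause in clauses for var in {abs(lit) for lit in clause}
    )
    return max(counts.values(), default=0)


# (x1 or x2) has 3 models; adding (not x1 or not x2) leaves 2.
print(parity_sat(2, [[1, 2]]))             # 1
print(parity_sat(2, [[1, 2], [-1, -2]]))   # 0
print(max_occurrence([[1, 2], [-1, -2]]))  # 2
```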

Deterministic Coreset for Lp Subspace

2026-05-15T02:42:52Z

We introduce the first iterative algorithm for constructing an $\varepsilon$-coreset that guarantees deterministic $\ell_p$ subspace embedding for any $p \in [1,\infty)$ and any $\varepsilon > 0$. For a given full rank matrix $\mathbf{X} \in \mathbb{R}^{n \times d}$ where $n \gg d$, $\mathbf{X}' \in \mathbb{R}^{m \times d}$ is an $(\varepsilon,\ell_p)$-subspace embedding of $\mathbf{X}$, if for every $\mathbf{q} \in \mathbb{R}^d$, $(1-\varepsilon)\|\mathbf{Xq}\|_{p}^{p} \leq \|\mathbf{X'q}\|_{p}^{p} \leq (1+\varepsilon)\|\mathbf{Xq}\|_{p}^{p}$. Specifically, in this paper, $\mathbf{X}'$ is a weighted subset of rows of $\mathbf{X}$, which is commonly known in the literature as a coreset. In every iteration, the algorithm ensures that the loss on the maintained set is upper and lower bounded by the loss on the original dataset with appropriate scalings. So, unlike typical coreset guarantees, due to bounded loss, our coreset gives a deterministic guarantee for the $\ell_p$ subspace embedding. For an error parameter $\varepsilon$, our algorithm takes $O(\mathrm{poly}(n,d,\varepsilon^{-1}))$ time and returns a deterministic $\varepsilon$-coreset for $\ell_p$ subspace embedding whose size is $O\left(\frac{d^{\max\{1,p/2\}}}{\varepsilon^{2}}\right)$. Here, we remove the $\log$ factors in the coreset size, which had been a long-standing open problem. Our coresets are optimal as they are tight with the lower bound. As an application, our coreset can also be used for approximately solving the $\ell_p$ regression problem in a deterministic manner.

Online Algorithms for Repeated Optimal Stopping: Balancing Baseline Guarantees and Regret

2026-05-15T02:15:02Z

We study the repeated optimal stopping problem, in which the same optimal stopping instance with an unknown distribution is solved repeatedly over $T$ rounds. We aim to simultaneously achieve strong per-round performance guarantees relative to a given baseline and sublinear regret across all rounds. Our primary contribution is a comprehensive theoretical characterization of whether and when these two objectives are compatible. First, under standard semi-bandit feedback, we prove that maintaining the per-round guarantee forces regret of $Ω(T / \log T)$. Second, even under full feedback, we show that requiring almost-sure satisfaction of the per-round guarantee in every round is incompatible with sublinear regret. Third, under full feedback, we propose a general algorithmic framework that achieves both sublinear regret and the per-round guarantee with high probability. Our framework applies to canonical problems, including the prophet inequality, the secretary problem, and their variants under adversarial, random, and i.i.d. input models. For example, in the repeated prophet inequality problem, our method guarantees that, with high probability in each round, its expected reward is at least that of the classical single-sample algorithm, which achieves a $1/2$ competitive ratio, while simultaneously ensuring $\tilde{O}(\sqrt{T})$ regret. Furthermore, we establish a regret lower bound of $Ω(\sqrt{T})$ even in the i.i.d. model, which is nearly tight with respect to the number of rounds.
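The single-sample baseline referred to above is, as commonly described in the prophet inequality literature, a threshold rule: draw one sample from each distribution, set the threshold to the largest sample, and accept the first realized value that exceeds it. The sketch below shows one round of that baseline only, under this reading of the abstract; it is not the paper's repeated framework, and the function name is hypothetical.

```python
def single_sample_stop(samples, values):
    """One round of the single-sample threshold rule.

    samples: one independent sample per distribution (assumed given).
    values:  the realized values, revealed in order.
    Returns (index, value) of the accepted item, or (None, 0.0) if none is accepted.
    """
    threshold = max(samples)
    for index, value in enumerate(values):
        if value > threshold:
            return index, value
    return None, 0.0


print(single_sample_stop([3, 1, 4], [2, 5, 9]))  # (1, 5)
print(single_sample_stop([3, 1, 4], [1, 2, 3]))  # (None, 0.0)
```

Note that the rule stops at the first value above the threshold, so in the first example it accepts 5 even though 9 arrives later.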

High-accuracy log-concave sampling with stochastic queries

2026-05-14T22:18:10Z

We show that high-accuracy guarantees for log-concave sampling -- that is, iteration and query complexities which scale as $\mathrm{poly}\log(1/δ)$, where $δ$ is the desired target accuracy -- are achievable using stochastic gradients with subexponential tails. Notably, this exhibits a separation with the problem of convex optimization, where stochasticity (even additive Gaussian noise) in the gradient oracle incurs $\mathrm{poly}(1/δ)$ queries. We also give an information-theoretic argument that light-tailed stochastic gradients are necessary for high accuracy: for example, in the bounded variance case, we show that the minimax-optimal query complexity scales as $Θ(1/δ)$. Our framework also provides similar high accuracy guarantees under stochastic zeroth order (value) queries, and an improved complexity result for sampling from finite-sum potentials.

#CFG and #DNNF admit FPRAS

2026-05-14T20:12:37Z

We provide the first fully polynomial-time randomized approximation scheme for the following two counting problems: 1. Given a Context Free Grammar $G$ over alphabet $Σ$, count the number of words of length exactly $n$ generated by $G$. 2. Given a circuit $\varphi$ in Decomposable Negation Normal Form (DNNF) over the set of Boolean variables $X$, compute the number of assignments to $X$ such that $\varphi$ evaluates to 1. Finding polynomial time algorithms for the aforementioned problems has been a longstanding open problem. Prior work could either only obtain a quasi-polynomial runtime (SODA 1995) or a polynomial-time randomized approximation scheme for restricted fragments, such as non-deterministic finite automata (JACM 2021) or non-deterministic tree automata (STOC 2021).

Hybrid Sketching Methods for Dynamic Connectivity on Sparse Graphs

2026-05-14T17:57:03Z

Dynamic connectivity is a fundamental dynamic graph problem, and recent algorithmic breakthroughs on dynamic graph sketching have reshaped what is theoretically possible: by encoding the graph as per-vertex linear sketches, these algorithms solve dynamic connectivity in only $Θ(V \log^2 V)$ space, independent of the number of edges, outperforming lossless $Θ(V+E)$-space structures that grow as the graph becomes denser. Prior to this work, no practical dynamic connectivity algorithm has been able to translate these theoretical breakthroughs into space savings on real-world graphs. The main obstacle is that per-vertex sketches cost thousands of bytes per vertex, so sketching only pays off once the graph becomes extremely dense. We observe that sparse real-world graphs are often not uniformly sparse: these graphs can contain dense cores on a small subset of vertices that account for a large fraction of edges. We exploit this structure via hybrid sketching: sketch only the dense core, and store the sparse periphery losslessly. We design new hybrid algorithms for fully-dynamic and semi-streaming connectivity with space $O(\min\{V+E, V \log V \log(2+E/V)\})$ w.h.p., simultaneously matching the lossless bound on sparse graphs, the sketching bound on dense graphs, and improving on both in an intermediate regime. A key component is BalloonSketch, a new l0-sampler reducing per-vertex sketch sizes by up to 8x. We implement HybridSCALE, a modular system treating the lossless and sketch-based components as subroutines. HybridSCALE is the first sketch-based dynamic connectivity system to save space on common real-world graphs. Compared to the state-of-the-art lossless baseline, HybridSCALE saves up to 15% space on sparse graphs (average degree < 100), up to 92% on intermediate density graphs (average degree ~ 100-1000), and up to 97% on dense graphs (average degree > 1000).

Sharp Phase Transitions in Estimation with Low-Degree Polynomials

2026-05-14T17:03:15Z

High-dimensional planted problems, such as finding a hidden dense subgraph within a random graph, often exhibit a gap between statistical and computational feasibility. While recovering the hidden structure may be statistically possible, it is conjectured to be computationally intractable in certain parameter regimes. A powerful approach to understanding this hardness involves proving lower bounds on the efficacy of low-degree polynomial algorithms. We introduce new techniques for establishing such lower bounds, leading to novel results across diverse settings: planted submatrix, planted dense subgraph, the spiked Wigner model, and the stochastic block model. Notably, our results address the estimation task -- whereas most prior work is limited to hypothesis testing -- and capture sharp phase transitions such as the "BBP" transition in the spiked Wigner model (named for Baik, Ben Arous, and Péché) and the Kesten-Stigum threshold in the stochastic block model. Existing work on estimation either falls short of achieving these sharp thresholds or is limited to polynomials of very low (constant or logarithmic) degree. In contrast, our results rule out estimation with polynomials of degree $n^δ$ where $n$ is the dimension and $δ> 0$ is a constant, and in some cases we pin down the optimal constant $δ$. Our work resolves open problems posed by Hopkins & Steurer (2017) and Schramm & Wein (2022), and provides rigorous support within the low-degree framework for conjectures by Abbe & Sandon (2018) and Lelarge & Miolane (2019).