Estimating Random-Walk Probabilities in Directed Graphs

2026-05-18T16:49:13Z

We study discounted random walks in directed graphs. In each step, the walk either terminates with constant probability $α$ or proceeds to a random out-neighbor. Our goal is to estimate the probability $π(s, t)$ that a discounted random walk starting from $s$ terminates at $t$. This probability is also known as the Personalized PageRank (PPR) score, which measures the relevance of $t$ to $s$, for instance, when $s$ and $t$ are web pages on the Internet. We aim to estimate $π(s, t)$ within a constant relative error with constant probability. A variety of algorithms have been developed for several problem variants, such as single-pair, single-source, single-target, and single-node estimation, under both worst-case and average-case settings, and for different combinations of allowed graph queries. However, in many important cases, there remain polynomial gaps between known upper and lower bounds. In this paper, we establish tight upper and lower bounds (up to logarithmic factors in $n$) for all problem variants and query combinations, closing all existing gaps in both the worst-case and average-case settings. Below we give some examples for the worst-case setting. As an upper-bound example, the classic power method estimates $π(s,t)$ in time $O(m\log(1/δ))$ if it is above a threshold $δ$, but $π(s,t)$ can be as small as $1/n^{Θ(n)}$. In contrast, we propose algorithms that deterministically estimate arbitrarily small $π(s,t)$ in $O(m\log n)$ time. As a lower-bound example, we improve the lower bound for the single-pair problem from $Ω(\min\{n,1/δ\})$ to $Ω(\min\{m,1/δ\})$, which is optimal (up to logarithmic factors) since a simple Monte Carlo estimate takes $O(1/δ)$ time.
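
The simple Monte Carlo estimate mentioned above can be sketched as follows. This is only an illustration of the definition of $π(s, t)$, not one of the paper's algorithms. The function name, the adjacency-dictionary input, and the rule that a walk stuck at a node without out-neighbors terminates there are assumptions made for this sketch.

```python
import random

def estimate_ppr(out_neighbors, s, t, alpha, num_walks, rng=None):
    """Monte Carlo estimate of pi(s, t) for a discounted random walk.

    out_neighbors: dict mapping each node to a list of its out-neighbors
    (hypothetical input format chosen for this sketch).
    """
    if rng is None:
        rng = random.Random()
    hits = 0
    for _ in range(num_walks):
        v = s
        # In each step: terminate with probability alpha,
        # otherwise move to a random out-neighbor.
        while rng.random() >= alpha:
            nbrs = out_neighbors[v]
            if not nbrs:
                # Assumption: a walk at a dead end terminates there.
                break
            v = rng.choice(nbrs)
        if v == t:
            hits += 1
    return hits / num_walks
```

On the two-node graph `{0: [1], 1: [1]}` with `alpha = 0.5`, the walk from node 0 stops at node 0 with probability 0.5 and otherwise ends at node 1, so the estimate of $π(0, 1)$ should concentrate around 0.5 as `num_walks` grows.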

A Tight Double-Exponentially Lower Bound for High-Multiplicity Bin Packing

2026-05-18T14:43:04Z

Consider a high-multiplicity Bin Packing instance $I$ with $d$ distinct item types. In 2014, Goemans and Rothvoss gave an algorithm with runtime $|I|^{2^{O(d)}}$ for this problem~[SODA'14], where $|I|$ denotes the encoding length of the instance $I$. Although Jansen and Klein~[SODA'17] later developed an algorithm that improves upon this runtime in a special case, it has remained a major open problem, posed by Goemans and Rothvoss~[J.ACM'20], whether the doubly exponential dependency on $d$ is necessary. We solve this open problem by showing that unless the ETH fails, there is no algorithm solving the high-multiplicity Bin Packing problem in time $|I|^{2^{o(d)}}$. To prove this, we introduce a novel reduction from 3-SAT. The core of our construction is efficiently encoding all information from a 3-SAT instance with $n$ variables into an ILP with $O(\log(n))$ variables and constraints. This result confirms that the Goemans and Rothvoss algorithm is essentially best-possible for Bin Packing parameterized by the number $d$ of item sizes in the context of XP time algorithms.

Treewidth Parameterized by Feedback Vertex Number

2026-05-18T14:08:41Z

We provide the first algorithm for computing an optimal tree decomposition for a given graph $G$ that runs in single exponential time in the feedback vertex number of $G$, that is, in time $2^{O(\text{fvn}(G))}\cdot n^{O(1)}$, where $\text{fvn}(G)$ is the feedback vertex number of $G$ and $n$ is the number of vertices of $G$. On a classification level, this improves the previously known results by Chapelle et al. [Discrete Applied Mathematics '17] and Fomin et al. [Algorithmica '18], who independently showed that an optimal tree decomposition can be computed in single exponential time in the vertex cover number of $G$. One of the biggest open problems in the area of parameterized complexity is whether we can compute an optimal tree decomposition in single exponential time in the treewidth of the input graph. The currently best known algorithm by Korhonen and Lokshtanov [STOC '23] runs in $2^{O(\text{tw}(G)^2)}\cdot n^4$ time, where $\text{tw}(G)$ is the treewidth of $G$. Our algorithm improves upon this result on graphs $G$ where $\text{fvn}(G)\in o(\text{tw}(G)^2)$. On a different note, since $\text{fvn}(G)$ is an upper bound on $\text{tw}(G)$, our algorithm can also be seen either as an important step towards a positive resolution of the above-mentioned open problem, or, if its answer is negative, then a mark of the tractability border of single exponential time algorithms for the computation of treewidth.

A Note on Second-Order Expected Maximum-Load Bounds for Binary Linear Hashing

2026-05-18T12:51:10Z

Let $S\subseteq F_2^u$ have size $n=2^\ell$, and let $h:F_2^u\to F_2^\ell$ be a uniformly random linear map. For $y\in F_2^\ell$, write $Load_h(y):=|h^{-1}(y)\cap S|$, and let $M(S,h):=\max_{y\in F_2^\ell} Load_h(y)$ be the maximum load. Jaber, Kumar and Zuckerman (STOC 2025) proved that the expected maximum load of $h$ on $S$ is at most $16\log n/\log\log n$, matching the fully independent keys-into-bins scale up to constants. Their proof also gives the tail estimate \[ \Pr\left[ M(S,h)\ge R\frac{\log n}{\log\log n} \right] \le O\left(\frac{1}{R^{2}}\right). \] We record a base optimization in their exponential-potential method showing that binary linear hashing nearly matches fully independent hashing also at the level of the second-order maximum-load scale. For every $R>1$ satisfying $R\ell^{1-1/R}\ge D\ln\ell$, where $D$ is an absolute constant, we prove \[ \Pr\left[ M(S,h)\ge R\frac{\log n}{\log\log n} \right] \le O\left( \frac{(\log\log n)^2}{R^2(\log n)^{2-2/R}} \right). \] Integrating this tail yields \[ E[M(S,h)] \le \left( 1+ (1+o(1)) \frac{\log\log\log n}{\log\log n} \right) \frac{\log n}{\log\log n}. \] Thus binary linear hashing matches fully independent hashing in the leading term and matches the dominant second-order correction up to a $1+o(1)$ factor. We also prove, by an independent self-contained argument, a sharp tail bound for one prescribed bucket: for fixed $y\in F_2^\ell$, \[ \Pr[ Load_h(y)>2^a-2]\le γ^{-1}2^{-a^2}, \] where $ γ=\prod_{j\ge1}(1-2^{-j}) $. A subspace construction shows that this is asymptotically tight even in the leading constant as $ a\to\infty $. However, this controls only a fixed bucket; a direct union bound over all buckets loses a factor $ 2^\ell $.
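
The quantities $Load_h(y)$ and $M(S,h)$ defined above can be computed empirically with a small simulation. The sketch below is an assumption-laden illustration, not part of the paper's proof: vectors in $F_2^u$ are encoded as Python integers, the random linear map $h$ is stored as $\ell$ random row masks, and all function names are hypothetical.

```python
import random
from collections import Counter

def random_linear_map(u, ell, rng):
    """Sample a uniformly random ell x u matrix over F_2, one u-bit int per row."""
    return [rng.getrandbits(u) for _ in range(ell)]

def apply_map(rows, x):
    """Compute h(x): output bit i is the parity of <row_i, x> over F_2."""
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1
        y |= parity << i
    return y

def max_load(rows, keys):
    """Return M(S, h) = max over buckets y of |h^{-1}(y) ∩ S| for non-empty S."""
    loads = Counter(apply_map(rows, x) for x in keys)
    return max(loads.values())
```

For example, the identity map `[1, 2, 4]` on the keys `range(8)` gives maximum load 1, while the zero map `[0, 0, 0]` sends every key to bucket 0 and gives maximum load 8.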

Near-Linear Time Generalized Sinkhorn Algorithms for Bounded Genus Graphs

2026-05-18T11:42:00Z

We present GenusSink, a new class of approximate generalized Sinkhorn algorithms with shortest-path-distance costs for bounded genus (e.g. planar) graphs, providing near-linear time for (1) pre-processing, (2) each iteration step, and (3) querying the final transport plan matrix, as well as near-linear memory. Graphs handled by GenusSink include in particular planar graphs and bounded-genus meshes approximating 3D objects. GenusSink addresses the total quadratic time complexity of its brute-force counterpart by leveraging separator-based decomposition of graphs, computational geometry techniques, and new results on fast matrix-vector multiplications with generalized distance matrices, using, in particular, Fourier analysis and low displacement rank theory. It is inspired by recent breakthroughs in graph theory on approximating bounded genus metrics with small treewidth metrics \citep{minor-free-paper}. The graph-centric approach enables us to target optimal transport problems with distributions defined on manifolds approximated by weighted graphs and with cost functions given by geodesic distances. We conduct a rigorous theoretical analysis of GenusSink, provide practical implementations that leverage the \textit{separation graph field integrators} (S-GFIs) data structures introduced in this paper, and present empirical verification. GenusSink provides orders of magnitude more accurate computations than other efficient Sinkhorn algorithms, while still guaranteeing significant computational improvements compared to the baseline. As a by-product of the developed methods, we show that GenusSink is \textbf{numerically equivalent} to the brute-force geodesic Sinkhorn algorithm on $n$-vertex graphs with treewidth $O(\log \log (n))$ (e.g. on trees).
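
For orientation, a dense Sinkhorn iteration of the kind a brute-force approach would run can be sketched as below. This is the standard entropy-regularized Sinkhorn scheme with an explicit kernel matrix, not GenusSink and not necessarily the exact generalized variant studied in the paper. The function name, the regularization parameter `eps`, and the dense list-of-lists cost matrix are assumptions for this sketch. Every iteration touches all entries of the kernel, which is the quadratic cost the abstract refers to.

```python
import math

def sinkhorn(cost, a, b, eps, iters):
    """Standard Sinkhorn scaling with a dense kernel K = exp(-cost / eps).

    cost: n x m matrix (e.g. shortest-path distances between graph vertices)
    a, b: source and target distributions with positive entries
    Returns the approximate transport plan diag(u) K diag(v).
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Rescale rows to match a, then columns to match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

With the shortest-path distances of a three-vertex path as `cost`, the row and column sums of the returned plan approach `a` and `b` after enough iterations.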

An Entropy-Governed Speedup for Quantum Algorithms on Local Hamiltonians

2026-05-18T11:36:43Z

Low-energy estimation and state preparation for general $k$-local Hamiltonians are fundamental challenges in quantum complexity theory. For constant relative accuracy, Buhrman et al. (PRL 2025) recently broke the natural Grover bound $O(2^{n/2})$, where $n$ denotes the number of qubits, for both problems. In this paper, for any sufficiently small parameter $d\ge 0$, we present an even faster quantum algorithm that outputs a quantum state with energy bounded by the minimum energy over all depth-$d$ states (i.e., states obtained by applying a depth-$d$ circuit to the all-zero state), together with an estimate of this energy. For the class of Hamiltonians with depth-$d$ ground states, our algorithm furthermore achieves exactly the same energy guarantees as Buhrman et al. Our results also provide insight into the distinction between strongly entangled states and those admitting efficient classical descriptions.

PLS-complete problems with lexicographic cost functions: Max-$k$-SAT and Abelian Permutation Orbit Minimization

2026-05-18T09:56:28Z

How hard is it to find a local optimum? If we are given a graph and want to find a locally maximal cut--meaning that the number of edges in the cut can't be improved by moving a single vertex from one side to the other--then just iterating improving steps finds a local maximum in at most $|E|$ steps. If, on the other hand, the edges are weighted, this problem becomes hard for the class PLS (Polynomial Local Search). We are interested in optimization problems with lexicographic costs. For Max-Cut this would mean that the edges $e_1,\dots, e_m$ have costs $c(e_i) = 2^i$. For such a cost function finding a global Max-Cut is easy. In contrast, we show that it is PLS-complete to find an assignment for a 4-CNF formula that is locally maximal (when the clauses have lexicographic weights); and also for a 3-CNF when we allow switching two variables at a time. We use these results to answer a question of Scheder and Tantow, who showed that finding a lexicographic local minimum of a string $s \in \{0,1\}^n$ under the action of a list of given permutations $π_1, \dots, π_k \in S_{n}$ is PLS-complete. They ask whether the problem stays PLS-complete when the $π_1,\dots,π_k$ commute, i.e., generate an Abelian subgroup $G$ of $S_n$. We show that it does, and in fact stays PLS-complete even (1) when every element in $G$ has order two or (2) when $G$ is cyclic. Additionally, we use our results to further investigate the complexity of computing pure $α$-Nash equilibria in congestion games. Using lexicographic 4-SAT, we obtain a simple proof of the PLS-completeness originally shown by Skopalik and Vöcking that can be extended to exponential and polynomial delay functions with positive coefficients. The number of strategies per player and players per resource is bounded. However, the degree of the polynomials is not bounded by a constant.
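
The unweighted local search described at the start of this abstract can be sketched as follows. The code assumes a simple graph without self-loops, given as a vertex count and an edge list. The function names are hypothetical. Each flip increases the cut by at least one edge, which is why the number of flips never exceeds the number of edges.

```python
def local_max_cut(n, edges):
    """Find a locally maximal cut by repeatedly flipping improving vertices.

    Returns (side, steps): side[v] in {0, 1}, steps = number of flips made.
    Assumes a simple undirected graph without self-loops.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [0] * n
    steps = 0
    improved = True
    while improved:
        improved = False
        for v in range(n):
            same = sum(1 for w in adj[v] if side[w] == side[v])
            cross = len(adj[v]) - same
            # Flipping v turns 'same' edges into cut edges and vice versa,
            # so it is an improving step exactly when same > cross.
            if same > cross:
                side[v] ^= 1
                steps += 1
                improved = True
    return side, steps

def cut_size(side, edges):
    """Number of edges with endpoints on different sides."""
    return sum(1 for u, v in edges if side[u] != side[v])
```

On a triangle this procedure stops after a single flip with a cut of size 2, and on a 4-cycle it reaches the cut of size 4 after two flips.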

Complexity of Finding and Enumerating Interconnection Trees

2026-05-18T09:31:19Z

We study the problem of connecting the parts of a multipartite graph using a minimum number of edges under a matching constraint. We introduce interconnection trees, defined as matchings whose projections onto the quotient graph form a spanning tree. Motivated by applications in chemoinformatics, we investigate the decision, counting, and enumeration variants of this problem. We show that the decision problem is $NP$-complete. Nevertheless, it becomes tractable in several structured settings: it is fixed-parameter tractable in the number of parts, and admits polynomial or linear-time algorithms on complete, quasi-complete, and $t$-quasi-complete multipartite graphs. We also study enumeration, for which we design efficient flashlight-search based algorithms with optimal delay for complete multipartite graphs, and a weight-guided heuristic that prioritizes low-weight solutions and performs well in practice.

On efficient robust regression with subquadratic samples

2026-05-18T08:34:07Z

We revisit the problem of robust linear regression under Gaussian covariates with an unknown covariance matrix of condition number $κ$. For this fundamental problem, significant gaps remain in our understanding of the trade-offs among sample complexity, condition number, runtime, and prediction error for efficient algorithms. Our first result is a near-linear-time algorithm that uses $\widetilde{O}(d/ε^4)$ samples, where $d$ is the dimension and $ε$ is the corruption rate, and achieves prediction error $O(\sqrt{εκ})$ under the condition $εκ\lesssim 1$, improving over all prior works. We complement this result with a Statistical Query (SQ) lower bound showing that efficient SQ algorithms achieving error $o(\sqrt{εκ})$ when $εκ\lesssim 1$ require queries that take $Ω(d^2)$ samples to simulate. Finally, we prove a low-degree polynomial lower bound that gives fine-grained evidence that, without assumptions such as $εκ\lesssim 1$, efficient algorithms may require $\widetilde{Ω}\left(\min\{dε^{2}κ^{2},\ ε^{2}d^{2}\}\right)$ samples to significantly outperform the trivial estimator that always guesses $0$.

On Occurrence-Preserving Morphisms

2026-05-18T08:24:10Z

A \emph{morphism} is a mapping that transforms words through letter-wise substitution, where each symbol is consistently replaced by a fixed word. In the field of combinatorics on words, one topic that has attracted considerable attention is the characterization of morphisms that preserve specific properties, such as overlap-freeness, square-freeness, lexicographic order, and primitivity. Continuing this direction, we initiate the study of \emph{occurrence-preserving morphisms}, which address the following fundamental question: given a morphism $φ$, two words $u$ and $v$, and $k \geq 1$, under what conditions does the number of occurrences of $u$ in $v$ equal the number of occurrences of $φ^k(u)$ in $φ^k(v)$? To answer this question, we introduce the notion of \emph{interference-free morphisms}, examine their properties, develop an efficient algorithm for deciding interference-freeness, and uncover a connection to \emph{recognizable morphisms}. We then present a precise characterization of occurrence-preserving morphisms in terms of interference-freeness. As applications of our characterization, we first show that there exists a bijection between the starting positions of the occurrences of $u$ in $v$ and those of $φ^k(u)$ in $φ^k(v)$. We then apply the characterization to the Fibonacci and Thue-Morse words to identify their \emph{minimal unique substrings~(MUSs)}. Finally, we exploit the connection between MUSs and \emph{net occurrences} to simplify existing proofs on net occurrences in these words.
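
The question posed above can be explored by brute force on small inputs. The sketch below only applies a morphism $k$ times and counts (possibly overlapping) occurrences. It does not implement the paper's interference-freeness test. Representing $φ$ as a Python dictionary and the function names are assumptions for this illustration.

```python
def apply_morphism(phi, word, k=1):
    """Apply the morphism phi (dict: letter -> word) to `word` k times."""
    for _ in range(k):
        word = "".join(phi[c] for c in word)
    return word

def count_occurrences(u, v):
    """Count occurrences of u in v, allowing overlaps."""
    count = 0
    start = v.find(u)
    while start != -1:
        count += 1
        start = v.find(u, start + 1)
    return count

# Fibonacci morphism: a -> ab, b -> a
fib = {"a": "ab", "b": "a"}
u, v = "b", "abaab"
before = count_occurrences(u, v)
after = count_occurrences(apply_morphism(fib, u), apply_morphism(fib, v))
```

Here `u = "b"` occurs 2 times in `v = "abaab"`, while its image `"a"` occurs 5 times in the image `"abaababa"`, so the occurrence count is not preserved for this particular pair.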

Tolerant Testing for Unique Games

2026-05-18T02:28:41Z

We give tolerant testers with sublinear query complexity in the adjacency-list model for Unique Games. Prior tolerant testers required structural assumptions such as expansion or clusterability. For Unique Games, the tester distinguishes instances whose optimum fraction of violated constraints is at most $\varepsilon$ from those whose optimum is at least $ρ$, for $0<\varepsilon<ρ<1$, assuming $\varepsilon\log n\lesssimρ^4$. On instances with $n$ vertices and $m$ constraints, it uses $\widetilde O(\sqrt m\,ρ^{-13/2}+nρ^{-2}/\sqrt m)$ queries. We also give a specialized tester for bipartiteness, the $Q=2$ transposition case of Unique Games. Exploiting its signed structure, the tester achieves substantially better tolerance and query-complexity guarantees than the generic Unique Games tester. Writing $λ=ρ/(1+\log(1/ρ))$, the bipartiteness tester distinguishes graphs that can be made bipartite by deleting at most an $\varepsilon$ fraction of edges from graphs in which every bipartition has at least a $ρ$ fraction of edges with both endpoints on the same side, assuming $\varepsilon\log n\lesssimλ^2$, using $\widetilde O(\sqrt m/λ^2+n/(\sqrt m\,λ))$ queries.

Truthful Calibration Errors for Multi-Class Prediction

2026-05-17T22:00:23Z

Calibrated predictions are useful because their numerical values can be interpreted as probabilities. Calibration errors are therefore widely used to evaluate, compare, and tune probabilistic predictors. Recently, Haghtalab et al. (2024) introduced an additional requirement for such measures: truthfulness. A calibration measure is truthful if a predictor minimizes its expected measured error by reporting the true conditional label distribution. Many standard empirical calibration errors are non-truthful: a predictor may appear better calibrated by distorting its probabilities rather than reporting them truthfully. We study the practical role of truthfulness for calibration measurement in multiclass prediction. First, we introduce perfectly truthful calibration errors for multidimensional linear properties of the label distribution, generalizing the truthful calibration error for binary predictions in Hartline et al. (2025). This framework includes full multiclass calibration and classwise calibration. We also identify a truthful correction for confidence calibration. Second, we characterize the decision-theoretic implications of these truthful errors. For calibrated predictors, truthful calibration errors preserve Blackwell dominance: a more informative calibrated predictor receives no larger expected error. Third, we show that this decision-theoretic interpretation explains and mitigates the well-observed ranking robustness problem of binned calibration errors. Empirically, non-truthful confidence-based errors can reverse model rankings when the number of bins changes, while our truthful errors give more stable rankings across binning choices.

Sparse induced subgraphs in $P_7$-free graphs of bounded clique number

2026-05-17T20:43:26Z

Many natural computational problems, including e.g. Max Weight Independent Set, Feedback Vertex Set, or Vertex Planarization, can be unified under the umbrella of finding the largest sparse induced subgraph that satisfies some property definable in CMSO$_2$ logic. It is believed that each problem expressible with this formalism can be solved in polynomial time in graphs that exclude a fixed path as an induced subgraph. This belief is supported by the existence of a quasipolynomial-time algorithm by Gartland, Lokshtanov, Pilipczuk, Pilipczuk, and Rzążewski [STOC 2021], and a recent polynomial-time algorithm for $P_6$-free graphs by Chudnovsky, McCarty, Pilipczuk, Pilipczuk, and Rzążewski [SODA 2024]. In this work we extend polynomial-time tractability of all such problems to $P_7$-free graphs of bounded clique number.

Independent Set Reconfiguration Under Bounded-Hop Token

2026-05-17T20:02:49Z

The independent set reconfiguration problem (ISReconf) is the problem of determining, for given independent sets I_s and I_t of a graph G, whether I_s can be transformed into I_t by repeatedly applying a prescribed reconfiguration rule that transforms an independent set into another. As reconfiguration rules for the ISReconf, the Token Sliding (TS) model and the Token Jumping (TJ) model are commonly considered. While the TJ model admits the addition of any vertex (as long as the addition yields an independent set), the TS model admits the addition of only a neighbor of the removed vertex. It is known that the complexity status of the ISReconf differs between the TS and TJ models for some graph classes. In this paper, we analyze how changes in reconfiguration rules affect the computational complexity of reconfiguration problems. To this end, we generalize the TS and TJ models to a unified reconfiguration rule, called the k-Jump model, which admits the addition of a vertex within distance k from the removed vertex. Then, the TS and TJ models are the 1-Jump and D(G)-Jump models, respectively, where D(G) denotes the diameter of a connected graph G. We give the following three results: First, we show that the computational complexity of the ISReconf under the k-Jump model for general graphs is equivalent for all k >= 3. Second, we present a polynomial-time algorithm to solve the ISReconf under the 2-Jump model for split graphs. We note that the ISReconf under the 1-Jump (i.e., TS) model is PSPACE-complete for split graphs, and hence the complexity status of the ISReconf differs between k = 1 and k = 2. Third, we consider the optimization variant of the ISReconf, which computes the minimum number of steps of any transformation between I_s and I_t. We prove that this optimization variant under the k-Jump model is NP-complete for chordal graphs of diameter at most 2k + 1, for any k >= 3.
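
A single reconfiguration step under the k-Jump model can be checked as in the sketch below. It assumes that distance means shortest-path distance in G, ignoring the current token positions, and that the graph is given as an adjacency dictionary. Both choices, and the function names, are assumptions for this illustration rather than the paper's formal definitions.

```python
from collections import deque

def distance(adj, src, dst):
    """Breadth-first-search distance from src to dst, or None if unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if x == dst:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None

def is_valid_k_jump(adj, ind_set, u, v, k):
    """Check whether removing u and adding v is a legal k-Jump step.

    ind_set must be a set of vertices forming an independent set.
    """
    if u not in ind_set or v in ind_set:
        return False
    d = distance(adj, u, v)
    if d is None or d > k:
        return False
    # The result (ind_set - {u}) | {v} must still be independent.
    rest = ind_set - {u}
    return all(w not in rest for w in adj[v])
```

With k = 1 this check reduces to the TS rule (v must be a neighbor of u), and with k at least the diameter of a connected graph it places no distance restriction, as in the TJ rule.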

Finding the Balance Rate of Uncertain Signed Graphs

2026-05-17T15:01:11Z

Signed graphs are widely used to analyze complex systems such as social, political, and biological networks. The notion of balance, a key concept of signed graphs, reflects the stability of relationships. While it has been extensively studied in deterministic graphs, real-world networks often exhibit uncertainty in their connections, which traditional approaches struggle to address. To bridge this gap, we introduce the concept of balance rate, a metric for quantifying the degree of balance in uncertain signed graphs, and prove that computing it exactly is NP-hard, motivating the need for efficient estimation methods. We propose a novel Rao-Blackwellized spanning-tree estimator that achieves near-linear time complexity per sample by leveraging graph decomposition and structural properties. We also construct asymptotically justified confidence intervals using the Delta method. Experiments on real-world datasets demonstrate the efficiency and effectiveness of our approach, enabling scalable balance analysis in uncertain signed graphs.
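
As a point of reference, a naive sampling baseline can be sketched as follows. It is not the Rao-Blackwellized spanning-tree estimator proposed in the paper. The sketch assumes, purely for illustration, that each edge exists independently with a given probability and that the balance rate is the probability that a sampled graph is balanced. The abstract does not spell out the definition, so this is an assumption, as are the function names and input format. The balance check itself uses the classical characterization: a signed graph is balanced exactly when its vertices can be split into two sides with positive edges inside a side and negative edges across.

```python
import random
from collections import deque

def is_balanced(n, signed_edges):
    """Check balance via 2-coloring; signed_edges holds (u, v, sign), sign = +1 or -1."""
    adj = [[] for _ in range(n)]
    for u, v, sign in signed_edges:
        adj[u].append((v, sign))
        adj[v].append((u, sign))
    side = [None] * n
    for root in range(n):
        if side[root] is not None:
            continue
        side[root] = 0
        queue = deque([root])
        while queue:
            x = queue.popleft()
            for y, sign in adj[x]:
                # Positive edge: same side. Negative edge: opposite side.
                want = side[x] if sign > 0 else 1 - side[x]
                if side[y] is None:
                    side[y] = want
                    queue.append(y)
                elif side[y] != want:
                    return False
    return True

def naive_balance_rate(n, uncertain_edges, samples, rng):
    """Fraction of sampled graphs that are balanced.

    uncertain_edges holds (u, v, sign, prob); independence of edges is assumed.
    """
    balanced = 0
    for _ in range(samples):
        world = [(u, v, s) for u, v, s, p in uncertain_edges if rng.random() < p]
        balanced += is_balanced(n, world)
    return balanced / samples
```

For a triangle with signs +, +, - where every edge has probability 0.5, a sampled graph is unbalanced only when all three edges are present, so under these assumptions the estimate should approach 7/8.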