http://arxiv.org/abs/2606.01187v1 Dynamic Breadth First Search with Predictions 2026-05-31T12:05:01Z

Given a graph $G(V,E)$ having $n$ vertices and $m$ edges, we maintain its Breadth-First Search (BFS) tree from source $s$ under an online sequence of edge updates in the prediction model. Our approach leverages a predicted update sequence aiding online processing. We present algorithms for incremental (insertions-only), decremental (deletions-only), and fully dynamic (insertions and deletions) settings that maintain a BFS tree (parent and level information). Classically, the incremental and decremental BFS tree requires total $O(mn)$ time [JACM81], with amortized $O(n)$ and worst-case $O(m)$ update time. The combinatorial BMM conjecture restricts any polynomial improvement [FOCS14] even when the updates are known in advance [STOC15]. For fully dynamic BFS trees, only the trivial $O(m)$ time recomputation is known. Our complexity bounds are expressed in prediction error measures, where error vertices are those having incorrectly predicted distances, with the corresponding difference as their error. The vertex prediction error $η_{v}$ is the sum of degrees of error vertices, weighted vertex prediction error $η^*_{v}$ is error-weighted sum of degrees of error vertices, and $η_e$ counts the incorrectly predicted updates. For incremental and decremental BFS, our algorithm requires respectively $O(η_v + η_e)$ and $O(\min\{m,η^*_v + η_e\})$ worst case update time using $O(mn)$ preprocessing time and space, and total update time of $O(η^*_v + η_e)$. For fully-dynamic updates, our algorithm requires $O(\min\{m,η^*_v+η_e\})$ worst case update time. At its core, we extend the classical ES Trees [JACM81] for batch updates and fully dynamic updates. This simple extension is sufficient to give a competitive prediction algorithm, which may be generalized to other graph problems. We also consider space optimizations and error correction to improve our results.
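
To make the error measures above concrete, here is a minimal sketch that computes $η_v$ and $η^*_v$ for a single static graph from the definitions in the abstract. It is not the authors' algorithm: the function names, the adjacency-dict input, and the convention that an unreachable vertex has distance $n$ are all assumptions made for illustration.

```python
from collections import deque


def bfs_levels(adj, s):
    """Return the BFS level (hop distance from s) of every reachable vertex."""
    level = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
    return level


def vertex_prediction_errors(adj, s, predicted):
    """Compute (eta_v, eta_v_star) for predicted distances.

    eta_v      = sum of degrees of vertices whose distance was mispredicted
    eta_v_star = the same sum, with each degree weighted by the size of the error
    Assumption: unreachable vertices get distance n = len(adj).
    """
    n = len(adj)
    true = bfs_levels(adj, s)
    eta_v = 0
    eta_v_star = 0
    for v in adj:
        err = abs(predicted.get(v, n) - true.get(v, n))
        if err > 0:
            eta_v += len(adj[v])
            eta_v_star += err * len(adj[v])
    return eta_v, eta_v_star


# 4-cycle 0-1-3-2-0 with source 0; vertices 2 and 3 are mispredicted.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
predicted = {0: 0, 1: 1, 2: 2, 3: 4}
print(vertex_prediction_errors(adj, 0, predicted))
```

With perfect predictions both measures are zero, which is the regime in which the stated update bounds are smallest.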

2026-05-31T12:05:01Z Shahbaz Khan Shubham Kumar Verma Utkarsh Lohiya http://arxiv.org/abs/2606.01142v1 Repeated Descent: A Framework for Online Budget-Feasible Auctions 2026-05-31T10:23:36Z

We study budget feasible procurement auctions, in which $n$ agents, each with a privately held service cost, offer their services to an employer. The employer seeks to maximize a public submodular valuation function over the set of hired agents, while facing a hard budget constraint. We consider an online posted-price setting, in which agents arrive in a uniformly random order (a.k.a. \emph{secretary arrivals}) and the employer must make irrevocable take-it-or-leave-it offers upon their arrival. The employer does not get any feedback about the agent service costs other than whether they accept the offer or not. We introduce Repeated Descent (a.k.a. RED), a deterministic framework based on adaptive linear posted pricing. RED enforces budget feasibility by adaptively adjusting its pricing and balancing each pricing level with the number of agents considered in it. Using RED as the main building block, we obtain a $1046$-competitive posted-price mechanism for online budget feasible auctions with secretary agent arrivals and submodular valuations, thus improving on the previously best known ratio of (Charalampopoulos et al., EC 2025) by several orders of magnitude. Combining RED with random subsampling, we obtain the first constant-competitive posted-price budget feasible mechanism for non-monotone submodular valuations. On the negative side, we show that every online budget feasible mechanism with XOS valuations has a competitive ratio of $Ω\!\left(\tfrac{\log n}{(\log\log n)^2}\right)$.

2026-05-31T10:23:36Z Andreas Charalampopoulos Dimitris Fotakis Thanos Tolias http://arxiv.org/abs/2604.27548v2 Smallest suffixient set maintenance in near-real-time 2026-05-31T09:40:16Z

The size of the \textit{smallest suffixient set} of positions of a string recently emerged as a new measure of string \textit{repetitiveness} -- a measure reflecting how much repetitive content the string contains. We study how to maintain the smallest suffixient set online in near-real-time, that is, with small (in our case, polyloglog) worst-case time for processing each letter. Two frameworks are considered: the text is given letter-by-letter in either a right-to-left or a left-to-right direction. Our central algorithmic tool is Weiner's suffix tree algorithm and associated algorithmic primitives for its efficient implementation.

2026-04-30T07:54:25Z 19 pages, 5 figures Dominik Köppl Gregory Kucherov http://arxiv.org/abs/2506.07342v2 On Sketching Trimmed Statistics 2026-05-31T07:12:52Z

We study sketching trimmed statistics of a frequency vector, including the $F_p$ moment of the top-$k$ coordinates and of the trimmed-$k$ vector. Despite their natural role in robust analytics, this is the first time these problems have been studied in any sublinear space setting. For $p \in [0,2]$, we obtain $poly(\log n/\varepsilon)$-space algorithms for both tasks when $k$ is moderately large, and for general $k$ we identify a sharp structural threshold that characterizes exactly when sublinear space is possible: in particular, it is actually determined by the ratio between $a_k^2$ and $\|x_{-k}\|_2^2/k$. We extend these results to $p > 2$ and present several applications including algorithms for thresholded $F_p$ estimation and generalized impact indices. Notably, we improve the space bounds of Govindan, Monemizadeh, and Muthukrishnan (PODS 2017) for computing the $h$-index.
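
For orientation, the quantities being sketched can be computed exactly when the whole frequency vector fits in memory. The snippet below is such a linear-space baseline, not a sketching algorithm and not the authors' method; the function names are hypothetical, and the top-$k$ coordinates are taken to be the $k$ largest in absolute value, which is an assumption about the intended definition.

```python
def top_k_moment(x, k, p):
    """Exact F_p moment of the k largest-magnitude coordinates of x.

    For p = 0 this counts the nonzero coordinates among the top k.
    """
    mags = sorted((abs(v) for v in x), reverse=True)[:k]
    if p == 0:
        return sum(1 for m in mags if m != 0)
    return sum(m ** p for m in mags)


def h_index(x):
    """Largest h such that at least h coordinates of x are >= h."""
    h = 0
    for rank, value in enumerate(sorted(x, reverse=True), start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


print(top_k_moment([3, -4, 1, 0], k=2, p=2))
print(h_index([10, 8, 5, 4, 3]))
```

Both functions sort the full vector, so they use linear space; the abstract's contribution is achieving comparable estimates in sublinear space.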

2025-06-09T01:20:56Z PODS 2026 Honghao Lin Hoai-An Nguyen David P. Woodruff http://arxiv.org/abs/2605.09382v2 Learning-Augmented Scalable Linear Assignment Problem Optimization via Neural Dual Warm-Starts 2026-05-31T06:34:59Z

The Linear Assignment Problem is a fundamental combinatorial optimization task where classical exact solvers ensure optimality but suffer from an $\mathcal{O}(N^{3})$ bottleneck, while recent neural approximations struggle with scalability and exactness. We propose a learning-augmented framework that accelerates exact solvers by predicting dual variables to warm-start the search, backed by a fallback mechanism to preserve worst-case guarantees. Central to our approach is RowDualNet, a lightweight, row-independent architecture that avoids the $\mathcal{O}(N^{2})$ memory bottleneck of graph models, enabling scalable neural warm-starting up to $N=16{,}384$. Feasibility is guaranteed by construction via the Min-Trick mechanism, completely eliminating the need for costly iterative projections. Empirically, our method drastically reduces the search effort of the Jonker-Volgenant (LAPJV) algorithm, yielding robust zero-shot generalization with strict optimality and end-to-end speedups of over 2x on complex synthetic data, 1.25x on real-world tracking, and 1.5x on transportation networks.

2026-05-10T07:15:49Z Accepted to ICML 2026. 23 pages, 18 figures Ilay Yavlovich Jad Agbaria Muhamed Mhamed Nir Weinberger Jose Yallouz http://arxiv.org/abs/2606.00996v1 Constant-Stretch Rounding on the Hypersimplex 2026-05-31T04:24:33Z

We study correlated rounding on the hypersimplex, the base polytope of the uniform matroid. For each point $x$ in the hypersimplex, the goal is to sample a $k$-subset $A(x)$ with marginals $x$, while coupling the samples for all choices of $x$ so that nearby inputs produce nearby sets. We give a constant-stretch scheme. Our scheme samples the maximum-entropy $k$-subset distribution with prescribed marginals using a common random ordering and common uniform thresholds. For every $x,y\in[0,1]^n$ with $\sum_i x_i=\sum_i y_i=k$, it satisfies $\mathbb{E}[|A(x)\triangle A(y)|]\le 6\|x-y\|_1$. This improves the previous $O(\log k)$ bound for hypersimplex correlated rounding and answers an open question raised by Naor, Raju, Shetty, Srinivasan, Valieva, and Wajc. By adding dummy coordinates, the same result gives stretch at most $12$ for the at-most-$k$ polytope. The proof was found in a GPT 5.5 Pro Extended conversation prompted by the authors, and Codex was used to help assemble the manuscript under the authors' supervision.

2026-05-31T04:24:33Z 15 pages Nima Anari Alireza Haqi Eric Ma http://arxiv.org/abs/2403.18059v6 Optimality of Non-Adaptive Algorithms in Online Submodular Welfare Maximization with Stochastic Outcomes 2026-05-30T23:00:44Z

We generalize the problem of online submodular welfare maximization to incorporate various stochastic elements that have gained significant attention in recent years. We show that a non-adaptive Greedy algorithm, which is oblivious to the realization of these stochastic elements, achieves the best possible competitive ratio among all polynomial-time algorithms, including adaptive ones, unless NP$=$RP. This result holds even when the objective function is not submodular but instead satisfies the weaker submodular order property. Our results unify and strengthen existing competitive ratio bounds across well-studied settings and diverse arrival models, showing that, in general, adaptivity to stochastic elements offers no advantage in terms of competitive ratio. To establish these results, we introduce a technique that lifts known results from the deterministic setting to the generalized stochastic setting. The technique has broad applicability, enabling us to show that, in certain special cases, non-adaptive Greedy-like algorithms outperform the Greedy algorithm and achieve the optimal competitive ratio. We also apply the technique in reverse to derive new upper bounds on the performance of Greedy-like algorithms in deterministic settings by leveraging upper bounds on the performance of non-adaptive algorithms in stochastic settings.

2024-03-26T19:24:40Z Forthcoming in Operations Research Rajan Udwani http://arxiv.org/abs/2606.00770v1 Search-space Reduction for Boolean MinCSPs via Essential Constraints 2026-05-30T15:21:37Z

For a fixed set $\mathcal{F}$ of Boolean constraint types, a MinCSP$(\mathcal{F})$-instance consists of a formula $F$ that applies $m$ constraints from $\mathcal{F}$ to a set of $n$ Boolean variables. The goal is to remove a minimum subset of constraint applications from $F$ to make the remaining formula satisfiable. Previous work characterized how the choice of $\mathcal{F}$ affects its polynomial-time solvability and approximability. We extend a recently introduced preprocessing framework for graph problems to the problem above. Rephrased in the context of CSPs, this framework defines a constraint application from a given formula $F$ as $c$-essential if it is contained in all $c$-approximate solutions to $F$. Being able to efficiently detect these essential parts of a solution reduces the search space of any follow-up FPT algorithms parameterized by the solution size and yields an immediate asymptotic improvement to the runtime of such algorithms. In this work, we present a dichotomy theorem that distinguishes constraint sets $\mathcal{F}$ for which $c_\mathcal{F}$-essential constraint applications can be detected efficiently for some $c_{\mathcal{F}} \in \mathcal{O}(1)$, from those for which this task is intractable under established complexity-theoretic conjectures. Our results show that for any set $\mathcal{F}$ of bijunctive constraints, there is a polynomial-time algorithm that detects $\mathcal{O}(1)$-essential constraint applications. This contrasts the fact that constant-factor approximating a bijunctive MinCSP$(\mathcal{F})$-problem is intractable under the Unique Games Conjecture.

2026-05-30T15:21:37Z Conference version to appear at the 20th Scandinavian Symposium on Algorithm Theory (SWAT 2026) Bart M. P. Jansen Ruben F. A. Verhaegh http://arxiv.org/abs/2606.00725v1 Eulerian-spanning set and coboundary operator: An investigation of maxcut beyond planar graphs 2026-05-30T13:33:40Z

Using the concepts of Eulerian-spanning set and coboundary operator, we generalize Hadlock's conversion of the maxcut problem on planar graphs to one on general graphs with non-negative weights. Using our conversion, we can explore algorithms for maxcut beyond the class of planar graphs. We obtain a Fixed-Parameter Tractable algorithm for $k$-contraction apex graphs. Specifically, our algorithm can be applied to graphs with crossing number $k$, giving an $O(2^k(n+k)^{3/2}\log (n+k))$-time algorithm that matches the best known results when restricted to non-negative weights.

2026-05-30T13:33:40Z 6 figures Qiming Fang Sihong Shao Yuxuan Wu http://arxiv.org/abs/2605.21400v3 Space-Time Trade-off in Integer Linear Scaling Rounded to the Nearest Integer through Multiplicative and Additive Decomposition 2026-05-30T08:02:34Z

We formulate the problem of clock skew compensation as a special case of the integer linear scaling in the form of iD/A and propose two algorithms -- i.e., the multiplicative decomposition of integer division (MDID) and the additive decomposition of direct search (ADDS) -- for its nearest integer solution, which are not only immune to floating-point precision loss but also non-incremental unlike our prior approaches based on Bresenham's algorithm. Having theoretically established both decomposition algorithms based on a unified and rigorous formulation of the problem of the integer linear scaling rounded to the nearest integer, we discuss the space-time trade-off through the analysis of their computational complexities and non-overflow conditions. Through the numerical examples in a practical context of clock skew compensation under two different scenarios based on 32-bit and 64-bit integers, we observe that MDID can obtain the nearest integer solutions with the complexity of O(1) when D is much smaller than the maximum value of the underlying integer type but overflows otherwise; in comparison, ADDS can handle all the cases under both scenarios without overflows but at the expense of increased computational complexity when i approaches the maximum value of the underlying integer type. We also observe that ADDS based on 32-bit integers is equivalent to the clock skew compensation based on 64-bit double-precision floating-point arithmetic, while both algorithms based on 64-bit integers are equivalent to the clock skew compensation based on 128-bit quadruple-precision floating-point arithmetic, which highlights another trade-off between the bounded compensation errors and lower space complexity of the integer-based decomposition algorithms and the lower chances of overflows resulting from the wide ranges of numbers of the clock skew compensation based on floating-point arithmetic.
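
As a point of reference for the problem statement, the nearest integer to $iD/A$ can be written with integer arithmetic alone. The sketch below uses the textbook round-half-up identity and is neither MDID nor ADDS; the function name is hypothetical. Python integers have arbitrary precision, so the overflow conditions discussed in the abstract do not arise here: in a fixed-width language the intermediate product `2 * i * D` is exactly the term that can overflow.

```python
def round_nearest_scaled(i, D, A):
    """Return i*D/A rounded to the nearest integer (ties round up).

    Assumes i >= 0, D >= 0 and A > 0. Uses only integer operations,
    so no floating-point precision is lost.
    """
    if i < 0 or D < 0 or A <= 0:
        raise ValueError("expected i >= 0, D >= 0, A > 0")
    return (2 * i * D + A) // (2 * A)


print(round_nearest_scaled(7, 3, 2))   # 21/2 = 10.5
print(round_nearest_scaled(10, 1, 3))  # 10/3 = 3.33...

# A double cannot represent 2**53 + 1, so the float route loses the last unit.
big = 2**53 + 1
print(round_nearest_scaled(big, 1, 1) == big, int(float(big)) == big)
```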

2026-05-20T16:57:15Z 12 pages, 3 figures, under review for journal publication Kyeong Soo Kim http://arxiv.org/abs/2501.10918v2 A Min-Max Relation on Dicuts and Dijoins in Weighted Chordal Digraphs 2026-05-30T06:58:58Z

In a digraph, a dicut is a cut where all the arcs cross in one direction. A dijoin is a subset of arcs that intersects every dicut. Edmonds and Giles conjectured that in a weighted digraph, the minimum weight of a dicut is equal to the maximum size of a packing of dijoins. This has been disproved. However, the unweighted version conjectured by Woodall remains open. We prove that the Edmonds-Giles conjecture is true if the underlying undirected graph is chordal. We also give a strongly polynomial-time algorithm to construct such a packing.

2025-01-19T02:02:21Z Gérard Cornuéjols Siyue Liu R. Ravi http://arxiv.org/abs/2606.00500v1 Easy, robust approximate message passing for planted spike models 2026-05-30T03:18:52Z

We present a simple and efficient algorithm for robust approximate message passing (AMP) in the spiked matrix setting. In particular, let $\varepsilon$ be a sufficiently small constant, and suppose that $X \in \mathbb R^{n \times n}$ is a Gaussian matrix with a planted rank-$1$ spike, and $E \in \mathbb R^{n \times n}$ is an adversarially chosen matrix supported on an $\varepsilon n \times \varepsilon n$ principal minor. Let $v_{\mathrm{AMP}}(X)$ be the output of an AMP iteration on the uncorrupted matrix $X$. We give a procedure that, given access only to the corrupted matrix $Y = X + E$, computes a vector $v_{\mathrm{ALG}}(Y)$ which is $\tilde{O}(\sqrt{\varepsilon})$-close to $v_{\mathrm{AMP}}(X)$, for any of a class of AMP iterations which includes sparse Principal Component Analysis (PCA), non-negative PCA, and $\mathbb Z_2$ synchronization. Our algorithm consists of a spectral pre-processing step combined with a robust spectral initialization procedure; given these inputs, we prove that (perhaps surprisingly) AMP is robust out-of-the-box.

2026-05-30T03:18:52Z 32 pages Misha Ivkov Tselil Schramm http://arxiv.org/abs/2501.00614v14 A Minimum Counterexample Proof of the Seymour Second Neighborhood Conjecture via the Graph Level Order 2026-05-30T01:49:02Z

We provide a constructive proof of the Seymour Second Neighborhood Conjecture (SSNC) by reframing the problem as a set-packing optimization problem. The universal family of oriented graphs $\mathcal{O}$ is classified by their minimum out-degree $δ$. This shifts the objective to maximizing the number of non-Seymour vertices. A minimum counterexample (MCE) is a maximal packing of vertices that fail the SSNC. To prove such a packing is unsustainable, we introduce the Graph Level Order (GLOVER). This BFS-based coordinate system partitions $\mathcal{O}$ into rooted neighborhoods $R_i$ from a minimum out-degree node. Set-theoretic multiple parents resolve the double-counting that has plagued Seymour diamonds. This coordinate system also categorizes transitive triangles into eight distinct types and proves that seven are inconsistent in an MCE environment. Distinguishing it from BFS, the MCE environment forces cycles in the first neighborhood of every parent. These cause neighborhoods to become quadratically dense as they both decrease in size and need more arcs. The proof concludes with a supply-demand collision. Arc capacity is consumed when $i > \frac{δ}{3}$. This makes the packing of non-Seymour vertices unsustainable, forcing the appearance of a Seymour vertex in every graph of $\mathcal{O}$. The algorithm to identify these vertices is $O(|V|+|E|)$. This confirms that it can operate on large oriented networks that are dense and detectable in polynomial time.

2024-12-31T19:19:14Z 17 pages, 9 images. 4 tables. Cut out most of the fluff to trim the paper. Added section on further work Charles N. Glover http://arxiv.org/abs/2605.25280v2 Approximate Algorithms for Chamfer Distance Under Translation 2026-05-30T01:05:41Z

Given two sets of points A and B, $|A| = m$, $|B| = n$, the Chamfer distance from $A$ to $B$ is defined as $\operatorname{CD}(A,B) = \sum_{a\in A} \min_{b\in B} d(a,b)$, where $d$ is a distance metric. Chamfer distance is a popular measure of dissimilarity between two sets of points that has seen increasing usage in computer vision and information retrieval as a substitute for the more computationally demanding Earth Mover's distance. We propose a new problem, Chamfer distance under translation, defined as $\operatorname{CDuT}(A,B) :=\min_{t\in \mathbb{R}^d} \operatorname{CD}(A+t,B)$, where $A+t$ denotes the translation of every point in $A$ by $t$. Chamfer distance under translation is valuable in cases where translations capture aspects of the data unlikely to be relevant for dissimilarity, such as temporal, spatial, or other semantic information. For Chamfer distance under translation, we provide four algorithms: (1) an exact quadratic time algorithm in one dimension, (2) a near quadratic time ($2+\varepsilon$)-approximation algorithm in higher dimensions, (3) a $(1+\varepsilon)$-approximation algorithm with running time $\mathcal{O}(mn^2\varepsilon^{-(d+1)})$, and (4) a near-quadratic time $(1+\varepsilon)$-approximation algorithm for answering the decision version of $\operatorname{CDuT}$ given a separation assumption on $B$. We additionally explore the fine-grained complexity of $\operatorname{CDuT}$.
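
The two definitions above translate directly into code. The sketch below evaluates $\operatorname{CD}(A,B)$ by brute force with the Euclidean metric, and, for the one-dimensional case only, searches the candidate translations $t = b - a$. It is an illustrative baseline and not one of the four algorithms in the abstract: the function names are hypothetical, and the candidate search relies on the observation that in one dimension the objective is piecewise linear in $t$ with its convex kinks at translations aligning some $a$ with some $b$, so a minimum is attained at one of them. The search costs $O(m^2 n^2)$ time, well above the quadratic bound claimed for the exact algorithm.

```python
import math


def chamfer_distance(A, B):
    """CD(A, B): sum over a in A of the Euclidean distance to its nearest b in B.

    Points are equal-length tuples of coordinates.
    """
    return sum(min(math.dist(a, b) for b in B) for a in A)


def cdut_1d_bruteforce(A, B):
    """Chamfer distance under translation for 1-D points (plain numbers).

    Tries every translation t = b - a and returns (best_cost, best_t).
    """
    best_cost, best_t = math.inf, None
    for a in A:
        for b in B:
            t = b - a
            cost = sum(min(abs(x + t - y) for y in B) for x in A)
            if cost < best_cost:
                best_cost, best_t = cost, t
    return best_cost, best_t


print(chamfer_distance([(0, 0), (3, 4)], [(0, 0)]))
print(cdut_1d_bruteforce([0, 1], [10, 11]))
```

Note that $\operatorname{CD}$ is not symmetric: swapping the arguments generally changes the value, which is why the abstract speaks of the distance from $A$ to $B$.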

2026-05-24T22:20:33Z Preprint. 18 pages Gil Halevi Daniel Zhang Jason Zhang http://arxiv.org/abs/2602.00906v7 Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing 2026-05-29T20:54:28Z

Large language models often hallucinate with high confidence on "random facts" that lack inferable patterns. We formalize the memorization of such facts as a membership testing problem, unifying the discrete error metrics of Bloom filters with the continuous log-loss of LLMs. By analyzing this problem in the regime where facts are sparse in the universe of plausible claims, we establish a rate-distortion theorem: the optimal memory efficiency is characterized by the minimum KL divergence between score distributions on facts and non-facts. This theoretical framework provides a distinctive explanation for hallucination under an idealized setting: even with optimal training, perfect data, and a simplified "closed world" setting, the information-theoretically optimal strategy under limited capacity is not to abstain or forget, but to assign high confidence to some non-facts, resulting in hallucination. We validate this theory empirically on both synthetic and real-world data, showing that hallucinations persist as a natural consequence of lossy compression. The same theorem recovers and sharpens classical space lower bounds for Bloom-type filters, pinning down an additive constant left open for two-sided filters.

2026-01-31T21:18:28Z ICML 2026 Anxin Guo Jingwei Li