arXiv API feed: https://arxiv.org/api/efSR9Zo9QMJZ1brvz8puPD90mTc (updated 2026-06-13T22:18:38Z)

http://arxiv.org/abs/2605.21400v3
Space-Time Trade-off in Integer Linear Scaling Rounded to the Nearest Integer through Multiplicative and Additive Decomposition
Updated: 2026-05-30T08:02:34Z
We formulate the problem of clock skew compensation as a special case of integer linear scaling in the form of iD/A. We propose two algorithms for its nearest-integer solution: the multiplicative decomposition of integer division (MDID) and the additive decomposition of direct search (ADDS). Both algorithms are immune to floating-point precision loss. Unlike our prior approaches based on Bresenham's algorithm, they are also non-incremental. We first establish both decomposition algorithms theoretically, using a unified and rigorous formulation of integer linear scaling rounded to the nearest integer. We then discuss the space-time trade-off by analyzing their computational complexities and non-overflow conditions. We give numerical examples of clock skew compensation under two scenarios, one based on 32-bit integers and one based on 64-bit integers. When D is much smaller than the maximum value of the underlying integer type, MDID obtains the nearest-integer solutions with O(1) complexity, but it overflows otherwise. In comparison, ADDS handles all cases under both scenarios without overflow. The cost is higher computational complexity as i approaches the maximum value of the underlying integer type.
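The following minimal Python sketch shows how to round $iD/A$ to the nearest integer using only integer arithmetic. It uses an illustrative decomposition assumed for this example and is not the paper's MDID or ADDS. Write $D = q_D A + r_D$ and $i = q_i A + r_i$. Then only the term $r_i r_D / A$ needs rounding, so the only intermediate value that can exceed the final result is bounded by roughly $2A^2$, independently of $i$ and $D$. Ties are rounded up, which is an assumed convention. Both function names are hypothetical.

```python
def scale_nearest(i: int, D: int, A: int) -> int:
    """Return round(i * D / A) for i >= 0, D >= 0, A > 0, rounding ties up.

    Hypothetical decomposition (not necessarily the paper's MDID or ADDS):
    with D = qD*A + rD and i = qi*A + ri,
        i*D/A = i*qD + qi*rD + ri*rD/A,
    and only the last term has a fractional part.
    """
    if i < 0 or D < 0 or A <= 0:
        raise ValueError("expects i >= 0, D >= 0, A > 0")
    qD, rD = divmod(D, A)
    qi, ri = divmod(i, A)
    # i*qD and qi*rD never exceed the final result; the only other product,
    # 2*ri*rD + A, stays below 2*A*A because ri, rD < A.
    frac_rounded = (2 * ri * rD + A) // (2 * A)
    return i * qD + qi * rD + frac_rounded


def scale_nearest_reference(i: int, D: int, A: int) -> int:
    # Direct formula floor(i*D/A + 1/2); safe here only because Python ints are unbounded.
    return (2 * i * D + A) // (2 * A)
```

In a fixed-width language, the same arithmetic never forms the full product $iD$. That product is the intermediate value that overflows first.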
We also observe that ADDS based on 32-bit integers is equivalent to clock skew compensation based on 64-bit double-precision floating-point arithmetic. Both algorithms based on 64-bit integers are equivalent to clock skew compensation based on 128-bit quadruple-precision floating-point arithmetic. This highlights another trade-off. The integer-based decomposition algorithms offer bounded compensation errors and lower space complexity. Floating-point clock skew compensation is less likely to overflow because of its wide range of representable numbers.
Published: 2026-05-20T16:57:15Z
Comments: 12 pages, 3 figures, under review for journal publication
Authors: Kyeong Soo Kim

http://arxiv.org/abs/2501.10918v2
A Min-Max Relation on Dicuts and Dijoins in Weighted Chordal Digraphs
Updated: 2026-05-30T06:58:58Z
In a digraph, a dicut is a cut where all the arcs cross in one direction. A dijoin is a subset of arcs that intersects every dicut. Edmonds and Giles conjectured that in a weighted digraph, the minimum weight of a dicut is equal to the maximum size of a packing of dijoins. This has been disproved. However, the unweighted version conjectured by Woodall remains open. We prove that the Edmonds-Giles conjecture is true if the underlying undirected graph is chordal. We also give a strongly polynomial-time algorithm to construct such a packing.
Published: 2025-01-19T02:02:21Z
Authors: Gérard Cornuéjols, Siyue Liu, R. Ravi

http://arxiv.org/abs/2606.00500v1
Easy, robust approximate message passing for planted spike models
Updated: 2026-05-30T03:18:52Z
We present a simple and efficient algorithm for robust approximate message passing (AMP) in the spiked matrix setting. In particular, let $\varepsilon$ be a sufficiently small constant, and suppose that $X \in \mathbb R^{n \times n}$ is a Gaussian matrix with a planted rank-$1$ spike, and $E \in \mathbb R^{n \times n}$ is an adversarially chosen matrix supported on an $\varepsilon n \times \varepsilon n$ principal minor.
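The sketch below generates a corrupted observation $Y = X + E$ for experimenting with the setting just described. Several choices are assumptions of this example:

- the normalization, a $\pm 1$ spike $x$ scaled by $\lambda/n$ plus symmetric Gaussian noise with off-diagonal entry variance $1/n$, as in $\mathbb Z_2$ synchronization;
- the random large block that stands in for the adversarial $E$;
- the naive top-eigenvector estimate.

The paper's robust AMP procedure is not reproduced.

```python
import numpy as np

def planted_spike(n, lam, rng):
    """Spiked Wigner-style matrix X = (lam / n) x x^T + W with a hidden +/-1 vector x."""
    x = rng.choice([-1.0, 1.0], size=n)
    G = rng.normal(size=(n, n)) / np.sqrt(n)
    W = (G + G.T) / np.sqrt(2)  # symmetric noise, off-diagonal entries have variance 1/n
    return (lam / n) * np.outer(x, x) + W, x

def corrupt_principal_minor(X, eps, rng, scale=10.0):
    """Add a symmetric corruption E supported on a random eps*n x eps*n principal minor.

    A large random block stands in for the adversary; a real adversary may choose E freely.
    """
    n = X.shape[0]
    k = max(1, int(eps * n))
    S = rng.choice(n, size=k, replace=False)  # indices of the corrupted rows/columns
    B = rng.normal(scale=scale, size=(k, k))
    E = np.zeros_like(X)
    E[np.ix_(S, S)] = (B + B.T) / 2  # nonzero only on the S x S block
    return X + E, E, S

def top_eigenvector(M):
    """Unit eigenvector for the largest eigenvalue of a symmetric matrix (naive spectral estimate)."""
    _, vecs = np.linalg.eigh(M)  # eigenvalues come back in ascending order
    return vecs[:, -1]
```

To gauge how much the corruption distorts a naive spectral estimate, compare `abs(top_eigenvector(Y) @ x) / np.sqrt(n)` with the same quantity computed for `X`. This distortion is what the paper's spectral pre-processing is meant to address. No particular numbers are claimed here.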
Let $v_{\mathrm{AMP}}(X)$ be the output of an AMP iteration on the uncorrupted matrix $X$. We give a procedure that has access only to the corrupted matrix $Y = X + E$ and computes a vector $v_{\mathrm{ALG}}(Y)$ that is $\tilde{O}(\sqrt{\varepsilon})$-close to $v_{\mathrm{AMP}}(X)$. This holds for any AMP iteration in a class that includes sparse Principal Component Analysis (PCA), non-negative PCA, and $\mathbb Z_2$ synchronization. Our algorithm combines a spectral pre-processing step with a robust spectral initialization procedure. Given these inputs, we prove that (perhaps surprisingly) AMP is robust out of the box.
Published: 2026-05-30T03:18:52Z
Comments: 32 pages
Authors: Misha Ivkov, Tselil Schramm

http://arxiv.org/abs/2501.00614v14
A Minimum Counterexample Proof of the Seymour Second Neighborhood Conjecture via the Graph Level Order
Updated: 2026-05-30T01:49:02Z
We provide a constructive proof of the Seymour Second Neighborhood Conjecture (SSNC) by reframing the problem as a set-packing optimization problem. The oriented graphs in the universal family $\mathcal{O}$ are classified by their minimum out-degree $\delta$. This shifts the objective to maximizing the number of non-Seymour vertices.
A minimum counterexample (MCE) is a maximal packing of vertices that fail the SSNC. To prove such a packing is unsustainable, we introduce the Graph Level Order (GLOVER). This BFS-based coordinate system partitions $\mathcal{O}$ into rooted neighborhoods $R_i$ from a minimum out-degree node.
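Plain breadth-first search gives a picture of the rooted neighborhoods. The sketch below assumes that $R_i$ is the set of vertices at out-distance exactly $i$ from the root. It does not model GLOVER's multiple-parent bookkeeping. With that convention, $R_1 = N^+(v)$ and $R_2 = N^{++}(v)$. A vertex $v$ is therefore a Seymour vertex exactly when $|R_2| \ge |R_1|$ for the search rooted at $v$. The graph is an adjacency dict. The function names are hypothetical, and the per-vertex check is not the paper's $O(|V|+|E|)$ procedure.

```python
from collections import deque

def bfs_levels(adj, root):
    """Return levels where levels[i] lists the vertices at out-distance i from root."""
    dist = {root: 0}
    levels = [[root]]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                if dist[w] == len(levels):
                    levels.append([])  # first vertex discovered at a new depth
                levels[dist[w]].append(w)
                queue.append(w)
    return levels

def min_out_degree_root(adj):
    # Every vertex must appear as a key of adj, even if its out-list is empty.
    return min(adj, key=lambda v: len(adj[v]))

def is_seymour(adj, v):
    """Check |N^{++}(v)| >= |N^{+}(v)| using the BFS levels rooted at v."""
    levels = bfs_levels(adj, v)
    first = len(levels[1]) if len(levels) > 1 else 0
    second = len(levels[2]) if len(levels) > 2 else 0
    return second >= first
```

For the directed 3-cycle `{0: [1], 1: [2], 2: [0]}`, the levels from vertex 0 are `[[0], [1], [2]]`, and every vertex is a Seymour vertex.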
Set-theoretic multiple parents resolve the double-counting that has plagued Seymour diamonds. This coordinate system also categorizes transitive triangles into eight distinct types and proves that seven are inconsistent in an MCE environment.
Unlike plain BFS, the MCE environment forces cycles in the first neighborhood of every parent. These cycles make neighborhoods quadratically dense, because the neighborhoods both shrink in size and require more arcs.
The proof concludes with a supply-demand collision. Arc capacity is consumed when $i > \frac{\delta}{3}$. This makes the packing of non-Seymour vertices unsustainable and forces the appearance of a Seymour vertex in every graph of $\mathcal{O}$. The algorithm to identify these vertices runs in $O(|V|+|E|)$ time. This confirms that it can operate in polynomial time on large, dense oriented networks.
Published: 2024-12-31T19:19:14Z
Comments: 17 pages, 9 images, 4 tables. Cut out most of the fluff to trim the paper. Added section on further work.
Authors: Charles N. Glover

http://arxiv.org/abs/2605.25280v2
Approximate Algorithms for Chamfer Distance Under Translation
Updated: 2026-05-30T01:05:41Z
Given two sets of points $A$ and $B$ with $|A| = m$ and $|B| = n$, the Chamfer distance from $A$ to $B$ is defined as $\operatorname{CD}(A,B) = \sum_{a\in A} \min_{b\in B} d(a,b)$, where $d$ is a distance metric. Chamfer distance is a popular measure of dissimilarity between two sets of points. It is increasingly used in computer vision and information retrieval as a substitute for the more computationally demanding Earth Mover's distance. We propose a new problem, Chamfer distance under translation, defined as $\operatorname{CDuT}(A,B) := \min_{t\in \mathbb{R}^d} \operatorname{CD}(A+t,B)$, where $A+t$ denotes the translation of every point in $A$ by $t$. Chamfer distance under translation is valuable when translations capture aspects of the data that are unlikely to matter for dissimilarity, such as temporal, spatial, or other semantic information.
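The definitions above translate directly into code. The sketch below computes $\operatorname{CD}(A,B)$ by brute force with the Euclidean metric (`math.dist`, Python 3.8+). In one dimension with $d(a,b)=|a-b|$, it evaluates $\operatorname{CDuT}$ by trying every candidate translation $t = b - a$. That candidate set comes from a short argument of ours, not from the paper:

- $\operatorname{CD}(A+t,B)$ is piecewise linear in $t$.
- Its slope increases only at points $t = b - a$; the kinks where the nearest neighbour switches are concave.
- It grows without bound as $|t| \to \infty$.

A minimizer therefore exists among these $mn$ points. This brute force takes $O(m^2 n \log n)$ time and should not be read as the paper's quadratic-time algorithm. The function names are hypothetical.

```python
import math
from bisect import bisect_left

def chamfer(A, B, dist=math.dist):
    """CD(A, B) = sum over a in A of min over b in B of dist(a, b); points are tuples."""
    return sum(min(dist(a, b) for b in B) for a in A)

def chamfer_1d(A, sorted_B):
    """CD(A, B) on the real line with d(a, b) = |a - b|, using binary search in sorted B."""
    total = 0.0
    for a in A:
        k = bisect_left(sorted_B, a)
        best = math.inf
        if k < len(sorted_B):
            best = min(best, sorted_B[k] - a)      # nearest point at or right of a
        if k > 0:
            best = min(best, a - sorted_B[k - 1])  # nearest point left of a
        total += best
    return total

def cdut_1d_bruteforce(A, B):
    """Return (CDuT(A, B), minimizing t) in 1D by scanning all candidates t = b - a."""
    Bs = sorted(B)
    candidates = {b - a for a in A for b in B}
    return min((chamfer_1d([a + t for a in A], Bs), t) for t in candidates)
```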
For Chamfer distance under translation, we provide four algorithms:
(1) an exact quadratic-time algorithm in one dimension;
(2) a near-quadratic-time $(2+\varepsilon)$-approximation algorithm in higher dimensions;
(3) a $(1+\varepsilon)$-approximation algorithm with running time $\mathcal{O}(mn^2\varepsilon^{-(d+1)})$;
(4) a near-quadratic-time $(1+\varepsilon)$-approximation algorithm for the decision version of $\operatorname{CDuT}$, under a separation assumption on $B$.
We additionally explore the fine-grained complexity of $\operatorname{CDuT}$.
Published: 2026-05-24T22:20:33Z
Comments: Preprint. 18 pages
Authors: Gil Halevi, Daniel Zhang, Jason Zhang

http://arxiv.org/abs/2602.00906v7
Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing
Updated: 2026-05-29T20:54:28Z
Large language models often hallucinate with high confidence on "random facts" that lack inferable patterns. We formalize the memorization of such facts as a membership testing problem, unifying the discrete error metrics of Bloom filters with the continuous log-loss of LLMs. We analyze this problem in the regime where facts are sparse in the universe of plausible claims. In that regime we establish a rate-distortion theorem: the optimal memory efficiency is characterized by the minimum KL divergence between score distributions on facts and non-facts. This framework explains hallucination in an idealized setting, with optimal training, perfect data, and a simplified "closed world". Even there, the information-theoretically optimal strategy under limited capacity is not to abstain or forget. Instead, it assigns high confidence to some non-facts, which results in hallucination. We validate this theory empirically on both synthetic and real-world data, showing that hallucinations persist as a natural consequence of lossy compression.
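A Bloom filter is the classical discrete instance of the membership-testing problem above. It never forgets a stored item, but it answers "yes" for a small fraction of non-members. Loosely, a false positive is the filter analogue of confidently asserting a non-fact. The minimal sketch below uses the textbook sizing of $m = \lceil -n \ln \epsilon / (\ln 2)^2 \rceil$ bits and $k \approx \log_2(1/\epsilon)$ hash functions. That comes to about $1.44\,n\log_2(1/\epsilon)$ bits, compared with the classical lower bound of roughly $n\log_2(1/\epsilon)$. Deriving the $k$ positions from salted SHA-256 digests is an illustrative choice that the paper does not specify.

```python
import hashlib
import math

def bloom_parameters(n, eps):
    """Textbook sizing for n items at false-positive rate eps: (bits m, hash count k)."""
    m = math.ceil(-n * math.log(eps) / math.log(2) ** 2)
    k = max(1, round(math.log2(1 / eps)))
    return m, k

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # k bit positions from salted SHA-256 digests (illustrative hashing choice).
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        # No false negatives; a false positive occurs when other items set all k bits.
        return all((self.bits[p // 8] >> (p % 8)) & 1 for p in self._positions(item))
```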
The same theorem recovers and sharpens classical space lower bounds for Bloom-type filters, pinning down an additive constant left open for two-sided filters.
Published: 2026-01-31T21:18:28Z
Comments: ICML 2026
Authors: Anxin Guo, Jingwei Li

http://arxiv.org/abs/2412.05790v2
Acceleration by Random Stepsizes: Hedging, Equalization, and the Arcsine Stepsize Schedule
Updated: 2026-05-29T19:51:57Z
We show that for separable convex optimization, random stepsizes fully accelerate Gradient Descent. Specifically, using inverse stepsizes i.i.d. from the Arcsine distribution improves the convergence rate from $O(k)$ to $O(\sqrt{k})$, where $k$ is the condition number. No momentum or other algorithmic modifications are required. Our starting point is a remarkable "equalization property" of the Arcsine distribution: it yields an identical convergence rate for all quadratic functions. A key technical insight is that martingale arguments extend this phenomenon to all separable convex functions. We interpret this equalization as an extreme form of hedging: by using this random distribution over stepsizes, Gradient Descent converges at exactly the same rate for all functions in the function class.
Published: 2024-12-08T03:01:34Z
Comments: to appear in Foundations of Computational Mathematics
Authors: Jason M. Altschuler, Pablo A. Parrilo

http://arxiv.org/abs/2606.00292v1
High-Dimensional Expanders, the Sparsest Cut Problem, and Steurer's Conjecture
Updated: 2026-05-29T19:24:12Z
In 2010, Steurer conjectured that any family of $n$ unit-norm vectors $v_1,\dots,v_n$ with polynomially small average correlation $\mathbb{E}_{i,j}|\langle v_i,v_j\rangle|\leq n^{-\varepsilon}$ contains linear-sized constant-separated sets. We refute this conjecture in a strong sense using the machinery of sparse high-dimensional expanders: such vector families do not even have linear-sized $\frac{1}{\log^{1/4-o(1)}(n)}$-separated sets.
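The hypothesis in Steurer's conjecture is easy to evaluate for a concrete family. The sketch below computes $\mathbb{E}_{i,j}|\langle v_i,v_j\rangle|$ with $i$ and $j$ drawn independently and uniformly. The diagonal terms $i=j$, each equal to 1, are therefore included; that convention is an assumption of this example. For instance, an orthonormal basis of $\mathbb{R}^n$ gives exactly $n \cdot 1/n^2 = 1/n$, which satisfies the $n^{-\varepsilon}$ bound for every $\varepsilon \le 1$. The function name is hypothetical.

```python
import math

def average_abs_correlation(vectors):
    """E_{i,j} |<v_i, v_j>| over uniform ordered pairs (i, j), diagonal included.

    Input vectors are normalized to unit length first.
    """
    n = len(vectors)
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    total = sum(abs(sum(a * b for a, b in zip(u, w))) for u in unit for w in unit)
    return total / (n * n)
```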
Consequently, we show that there are families of vertex expanders on $n$ vertices for which the (average) $L_2$-mixing time to the uniform distribution of any reweighted simple random walk is at least $\log^{5/4-o(1)} n$.
Published: 2026-05-29T19:24:12Z
Comments: 10 pages
Authors: Farzam Ebrahimnejad, Shayan Oveis Gharan

http://arxiv.org/abs/2606.00289v1
Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms
Updated: 2026-05-29T19:21:16Z
Quantization is a fundamental tool for compressing datasets and neural network weights and for reducing memory usage in a range of computational tasks. Many downstream applications of vector quantization compute inner products with arbitrary inputs. This motivates the study of inner product aware quantization schemes, which approximately preserve inner products with unseen vectors rather than simply minimizing the mean-squared error.
In this work, we formulate objectives that capture natural desiderata and develop adaptive and unbiased quantization methods that approximately preserve inner products with worst-case and average-case inputs. An analysis of these objectives shows a tight connection with the well-studied notion of Adaptive Stochastic Quantization (ASQ).
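The building block behind unbiased quantization is stochastic rounding between neighbouring levels. For $x \in [\ell_j, \ell_{j+1}]$, rounding up with probability $(x-\ell_j)/(\ell_{j+1}-\ell_j)$ gives $\mathbb{E}[Q(x)] = x$. When each coordinate is quantized this way, linearity gives $\mathbb{E}[\langle Q(v), w\rangle] = \langle v, w\rangle$ for any fixed input $w$. The sketch below takes the levels as given. It does not show how to choose them adaptively to minimize the resulting variance, which, as we understand it, is what ASQ optimizes. The function names are hypothetical.

```python
import bisect
import random

def stochastic_quantize(x, levels, rng=random):
    """Round x to one of its two neighbouring levels so that E[Q(x)] = x.

    levels must be sorted ascending and satisfy levels[0] <= x <= levels[-1].
    """
    if not levels[0] <= x <= levels[-1]:
        raise ValueError("x must lie within the quantization range")
    j = bisect.bisect_left(levels, x)  # first index with levels[j] >= x
    if levels[j] == x:
        return x
    lo, hi = levels[j - 1], levels[j]
    p_up = (x - lo) / (hi - lo)  # probability of rounding up keeps the mean at x
    return hi if rng.random() < p_up else lo

def quantize_vector(v, levels, rng=random):
    # Coordinate-wise unbiased rounding, so E[<Q(v), w>] = <v, w> for any fixed w.
    return [stochastic_quantize(x, levels, rng) for x in v]
```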
We develop provably fast exact and approximate algorithms for our objectives. Our theoretical results inspire efficient practical algorithms that perform well across a variety of workload distributions. They also lead to practical algorithms for standard ASQ that are 2-10$\times$ faster than prior state-of-the-art methods while maintaining quality. These theoretical and empirical results help make adaptive quantization techniques more efficient and tractable in practical settings.
Published: 2026-05-29T19:21:16Z
Authors: Nathan White, Krish Singal

http://arxiv.org/abs/2606.00247v1
An Upper Bound on Grothendieck's Constant
Updated: 2026-05-29T18:28:09Z
We show that Grothendieck's real constant $K_G$ can be upper bounded by projecting vectors onto a random plane through the origin and thresholding a degree-five Hermite polynomial. This resolves a 2011 conjecture of Braverman-Makarychev-Makarychev-Naor. Their rounding scheme required an extra randomization step, and they proved $K_G<\frac{\pi}{2\log(1+\sqrt{2})}-10^{-500}$. As a corollary of our result, we prove the bound $K_G<\frac{\pi}{2\log(1+\sqrt{2})}-10^{-217}$ by thresholding degree-three Hermite polynomials in the plane. Finally, we give a rigorous computer-assisted proof that $K_G<\frac{\pi}{2\log(1+\sqrt{2})}-10^{-5}$ using interval arithmetic and degree-three Hermite polynomial thresholding.
Published: 2026-05-29T18:28:09Z
Comments: 37 pages, 2 figures
Authors: Steven Heilman

http://arxiv.org/abs/2605.31421v1
Neuro-symbolic Syntactic Parsing: Shaping a Neural Network with the CYK Algorithm
Updated: 2026-05-29T15:21:11Z
In this paper, we show that algorithms can be injected directly into neural network architectures.
We focus on a complex algorithm, namely the Cocke-Younger-Kasami (CYK) algorithm for parsing context-free grammars in Chomsky Normal Form. We propose CYKNN, a simple recurrent neural network architecture that encodes the CYK algorithm in trainable matrix-vector multiplications. We experimented with a very simple grammar in 4 variations. Our approach outperforms existing LLMs with more than 20B parameters in an in-context learning setting. It also outperforms smaller LLMs of the Qwen family fine-tuned with LoRA. Our attempt paves the way to a different approach to neuro-symbolic methodologies.
Published: 2026-05-29T15:21:11Z
Comments: 9 content pages
Authors: Fabio Massimo Zanzotto, Federico Ranaldi, Giorgio Satta

http://arxiv.org/abs/2605.31417v1
An Optimal Algorithm for Binary Closest String
Updated: 2026-05-29T15:18:49Z
We revisit the Binary Closest String problem, which asks, given a set of binary strings $X \subseteq \{0, 1\}^n$, to compute a string minimizing the maximum Hamming distance to $X$. A long line of work has focused on parameterized algorithms with respect to the optimal distance $d$, yielding a sequence of improvements from $O^*(d^d)$ through $O^*(16^d)$, $O^*(9.513^d)$, $O^*(8^d)$, $O^*(6.731^d)$ to the current best-known running time of $O^*(5^d)$ [Chen, Ma, Wang; Algorithmica '16].
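An exhaustive reference solver pins down the problem definition and is handy for testing faster algorithms on tiny inputs. The sketch below enumerates all $2^n$ candidate strings. Its running time is therefore exponential in the string length $n$, and it is unrelated to the $O^*(4^d)$ randomized algorithm. Strings are assumed to be equal-length Python `str` values over `"01"`, and the function names are hypothetical.

```python
from itertools import product

def hamming(s, t):
    """Number of positions where equal-length strings s and t differ."""
    return sum(a != b for a, b in zip(s, t))

def closest_string_bruteforce(X):
    """Return (optimal radius d, a center string) by trying every string in {0,1}^n."""
    n = len(X[0])
    best = None
    for bits in product("01", repeat=n):
        cand = "".join(bits)
        radius = max(hamming(cand, x) for x in X)  # max distance from cand to the input set
        if best is None or radius < best[0]:
            best = (radius, cand)
    return best
```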
We present a faster randomized algorithm running in time $O^*(4^d)$. Our result matches a recent fine-grained lower bound [Abboud, Fischer, Goldenberg, Karthik C.S., Safier; ESA '23], and is therefore conditionally optimal. As an extra benefit, our algorithm is remarkably simple.
Published: 2026-05-29T15:18:49Z
Authors: Nick Fischer, Mursalin Habib

http://arxiv.org/abs/2512.08376v2
A Distribution Testing Approach to Clustering Distributions
Updated: 2026-05-29T13:21:46Z
We study the following distribution clustering problem. We are given a hidden partition of $k$ distributions into two groups. The distributions within each group are the same, and the two distributions associated with the two clusters are $\varepsilon$-far in total variation. The goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when the distribution of one of the clusters is known, and (2) when both are unknown. Our bounds characterize how the sample complexity depends on the domain size $n$, the number of distributions $k$, the size $r$ of one of the clusters, and the distance $\varepsilon$. In particular, our bounds are tight with respect to $(n,k,r,\varepsilon)$ (up to an $O(\log k)$ factor) in all regimes.
Published: 2025-12-09T09:01:41Z
Authors: Gunjan Kumar, Yash Pote, Jonathan Scarlett

http://arxiv.org/abs/2605.31176v1
Retriever Portfolios: A Principled Approach to Adaptive RAG
Updated: 2026-05-29T11:43:05Z
Retrieval-augmented generation (RAG) systems typically rely on a single retriever and a single set of hyperparameters, even though they face highly heterogeneous queries, from simple factoid questions to complex multi-hop reasoning. We propose a method that automatically selects a small, diverse subset of retrievers (a portfolio) from a large pool of candidates to cover different regions of the target query distribution.
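Greedy selection is one standard way to optimize an expected best-of-$k$ objective over queries. Score a portfolio $S$ by $F(S) = \frac{1}{|Q|}\sum_{q\in Q}\max_{r\in S} s(r,q)$. With nonnegative scores, $F$ is a monotone submodular function, so greedy achieves a $(1-1/e)$ approximation under a cardinality constraint. The sketch below assumes a precomputed score table `scores[r][q]` over a sample of queries. It is a generic illustration and not necessarily the construction or guarantee used in the paper. The retriever names in the usage note are hypothetical.

```python
def greedy_portfolio(scores, k):
    """Greedily select up to k retrievers maximizing the mean over queries of the best score.

    scores maps a retriever name to a list of nonnegative per-query scores,
    all lists having the same length.
    """
    num_queries = len(next(iter(scores.values())))
    chosen = []
    best = [0.0] * num_queries  # best score per query so far; the empty portfolio scores 0
    for _ in range(min(k, len(scores))):
        r_star = max(
            (r for r in scores if r not in chosen),
            key=lambda r: sum(max(b, s) for b, s in zip(best, scores[r])),
        )
        chosen.append(r_star)
        best = [max(b, s) for b, s in zip(best, scores[r_star])]
    return chosen, sum(best) / num_queries
```

As a usage example, consider `{"bm25": [1.0, 0.0, 0.5], "dense": [0.0, 1.0, 0.5], "hybrid": [0.6, 0.6, 0.6]}` with `k=2`. Greedy picks `hybrid` first, adds one of the other two, and reaches $2.2/3$. The pair `{bm25, dense}` would reach $2.5/3$, so greedy is not always optimal, but the gap stays within the $(1-1/e)$ guarantee.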
We formalize this setting via an expected best-of-$k$ objective over the query distribution and show that it admits an efficient portfolio construction algorithm with near-optimal guarantees. Across multiple QA benchmarks, our learned portfolios and router pipeline consistently outperform single-retriever and naive multi-retriever baselines on both retrieval metrics and answer quality. We also compare against inference-time hyperparameter tuning approaches. Fixed portfolios enable parallel retrieval and LLM calls and achieve comparable (and sometimes better) accuracy with substantially lower latency and token cost.
Published: 2026-05-29T11:43:05Z
Comments: Accepted at ICML 2026. Code available at: https://github.com/mstou/retriever-portfolios
Authors: Miltiadis Stouras, Vincent Cohen-Addad, Silvio Lattanzi, Ola Svensson

http://arxiv.org/abs/2603.26176v2
Improved Approximation Algorithms and Hardness Results for Shortest Common Superstring with Reverse Complements
Updated: 2026-05-29T07:49:57Z
The Shortest Common Superstring (SCS) problem is a fundamental task in sequence analysis. In genome assembly, however, DNA is double-stranded, so each fragment may occur either in its original orientation or as its reverse complement. This motivates the Shortest Common Superstring with Reverse Complements (SCS-RC) problem. SCS-RC asks for a shortest string that contains, for each input string, either the string itself or its reverse complement as a substring. The previously best-known approximation ratio for SCS-RC was $\frac{23}{8}$. In this paper, we present a new approximation algorithm that achieves an improved ratio of $\frac{8}{3}$. Our approach computes an optimal constrained cycle cover by reducing the problem, via a novel gadget construction, to a maximum-weight perfect matching in a general graph. We also investigate the computational hardness of SCS-RC. While the decision version is known to be NP-complete, no explicit inapproximability results were previously established.
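The feasibility condition in SCS-RC is straightforward to check. The sketch below assumes the Watson-Crick complement (A with T, C with G) on uppercase DNA strings. It tests whether a candidate superstring contains each fragment or its reverse complement. It also computes the suffix-prefix overlap that greedy and cycle-cover approaches to SCS commonly use as edge weights, taking the better of the two orientations of the second string. It does not implement the paper's $8/3$-approximation, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(s):
    """Watson-Crick complement of s, read in reverse."""
    return s.translate(COMPLEMENT)[::-1]

def is_rc_superstring(superstring, fragments):
    """True if every fragment or its reverse complement occurs as a substring."""
    return all(f in superstring or reverse_complement(f) in superstring for f in fragments)

def overlap(s, t):
    """Length of the longest proper suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)) - 1, 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0

def best_oriented_overlap(s, t):
    # t may be used as given or reverse-complemented, as SCS-RC allows.
    return max(overlap(s, t), overlap(s, reverse_complement(t)))
```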
We show that the hardness of SCS carries over to SCS-RC through a polynomial-time reduction, implying that it is NP-hard to approximate SCS-RC within a factor better than $\frac{333}{332}$. Notably, this hardness result holds even for the DNA alphabet.
Published: 2026-03-27T08:46:58Z
Authors: Ryosuke Yamano, Tetsuo Shibuya