https://arxiv.org/api/3FsyW8lzUVN9SVo3s/e2G+lBR1s 2026-06-21T08:12:58Z 29019 450 15 http://arxiv.org/abs/2605.13810v1 Provable Quantization with Randomized Hadamard Transform 2026-05-13T17:38:18Z

Vector quantization via random projection followed by scalar quantization is a fundamental primitive in machine learning, with applications ranging from similarity search to federated learning and KV cache compression. While dense random rotations yield clean theoretical guarantees, they require $Θ(d^2)$ time. The randomized Hadamard transform $HD$ reduces this cost to $O(d \log d)$, but its discrete structure complicates analysis and leads to weaker or purely empirical compression guarantees. In this work, we study a variant of this approach: dithered quantization with a single randomized Hadamard transform. Specifically, the quantizer applies $HD$ to the input vector and subtracts a random scalar offset before quantizing, injecting additional randomness at negligible cost. We prove that this approach is unbiased and provides mean squared error bounds that asymptotically match those achievable with truly random rotation matrices. In particular, we prove that a dithered version of TurboQuant achieves mean squared error $\bigl(π\sqrt{3}/2 + o(1)\bigr) \cdot 4^{-b}$ at $b$ bits per coordinate, where the $o(1)$ term vanishes uniformly over all unit vectors and all dimensions as the number of quantization levels grows.
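The pipeline described in this abstract (apply $HD$, subtract a random scalar offset, then quantize) can be illustrated with a minimal sketch. It is not the paper's quantizer: it assumes a uniform grid with spacing `step` instead of TurboQuant's codebook, does not clip codes to $b$ bits, and requires a power-of-two dimension. The names `fwht`, `quantize`, and `dequantize` are hypothetical.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform in O(d log d); d must be a power of two."""
    y = np.asarray(x, dtype=float).copy()
    d = y.shape[0]
    assert d > 0 and d & (d - 1) == 0, "dimension must be a power of two"
    h = 1
    while h < d:
        y = y.reshape(-1, 2, h)          # blocks of length 2h, split into two halves
        a = y[:, 0, :] + y[:, 1, :]      # butterfly: sum of the halves
        b = y[:, 0, :] - y[:, 1, :]      # butterfly: difference of the halves
        y = np.stack([a, b], axis=1).reshape(d)
        h *= 2
    return y / np.sqrt(d)                # scaling makes the transform its own inverse

def quantize(x, signs, step, offset):
    """Apply HD, subtract the shared scalar dither, round to a uniform grid (assumed)."""
    z = fwht(signs * x)                  # D = diag(signs), then H
    return np.round((z - offset) / step).astype(int)

def dequantize(codes, signs, step, offset):
    """Add the dither back and undo HD (H is orthonormal and symmetric, D is +/-1)."""
    z_hat = codes * step + offset
    return signs * fwht(z_hat)

rng = np.random.default_rng(0)
d, step = 8, 0.25
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                           # unit vector, as in the abstract
signs = rng.choice([-1.0, 1.0], size=d)          # random diagonal D
offset = rng.uniform(-step / 2, step / 2)        # single scalar dither
x_hat = dequantize(quantize(x, signs, step, offset), signs, step, offset)
print(np.linalg.norm(x_hat - x))
```

In this sketch each rotated coordinate is off by at most `step / 2`, so the reconstruction error is at most `sqrt(d) * step / 2` in Euclidean norm. The sharper mean squared error bound stated in the abstract is a result of the paper and is not demonstrated by this code.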

2026-05-13T17:38:18Z Ying Feng Piotr Indyk Michael Kapralov Dmitry Krachun Boris Prokhorov http://arxiv.org/abs/2605.13806v1 Min-Max Optimization Requires Exponentially Many Queries 2026-05-13T17:34:24Z

We study the query complexity of min-max optimization of a nonconvex-nonconcave function $f$ over $[0,1]^d \times [0,1]^d$. We show that, given oracle access to $f$ and to its gradient $\nabla f$, any algorithm that finds an $\varepsilon$-approximate stationary point must make a number of queries that is exponential in $1/\varepsilon$ or $d$.

2026-05-13T17:34:24Z Martino Bernasconi Matteo Castiglioni Andrea Celli Alexandros Hollender http://arxiv.org/abs/2605.13800v1 Low-Cost Arborescence Under Edge Faults 2026-05-13T17:21:08Z

Our input is a directed graph $G = (V,E)$ on $n$ vertices and $m$ edges with a designated root vertex $r$ and a function $cost: E \rightarrow \mathbb{R}_{\geq 0}$. The problem is to maintain a min-cost arborescence in $G$ in the presence of edge faults (a single fault at a time). Edge faults are transient and once the faulty edge is repaired, the original min-cost arborescence $\mathcal{T}$ is restored. Whenever an edge fault happens, we need to update $\mathcal{T}$ to a min-cost arborescence in $G-f$, where $f$ is the faulty edge. Since computing a min-cost arborescence in $G - f$ takes $O(m + n\log n)$ time, we seek to construct a sparse subgraph $H$ in a preprocessing step such that in the event of any edge $f$ failing, it suffices to compute a min-cost arborescence in $H - f$ in order to find a low-cost arborescence in $G - f$. In the unweighted setting, this is the fault-tolerant subgraph problem for single-source {\em reachability}. Baswana, Choudhary, and Roditty (SICOMP, 2018) showed a $k$-fault tolerant reachability subgraph of size $O(2^kn)$, where $k$ is the number of edge faults. We show a simple polynomial-time algorithm to construct a subgraph $H$ of size $O(n^{3/2})$ such that, for any $f \in E$, a min-cost arborescence in $H-f$ is a 2-approximation of a min-cost arborescence in $G-f$. Thus whenever an edge fault happens, we can find a 2-approximate min-cost arborescence in $G-f$ in $O(n^{3/2})$ time. Our second problem is in the matroid setting. The input is a matroid $M = (E, {\cal I})$ with a function $cost: E \rightarrow \mathbb{R}$. The problem is to compute a sparse $S \subseteq E$ (called a $k$-fault tolerant preserver) such that for any $F \subseteq E$ with $|F| \le k$, the matroid $M|(S\setminus F)$ contains a min-cost basis of $M|(E\setminus F)$. We show a tight bound of $k.rank(E)$ on the size of a $k$-fault tolerant preserver.

2026-05-13T17:21:08Z Dipan Dey Telikepalli Kavitha http://arxiv.org/abs/2602.17346v2 Partial Optimality in the Preordering Problem 2026-05-13T16:02:27Z

Preordering is a generalization of clustering and partial ordering with applications in bioinformatics and social network analysis. Given a finite set $V$ and a value $c_{ab} \in \mathbb{R}$ for every ordered pair $ab$ of elements of $V$, the preordering problem asks for a preorder $\lesssim$ on $V$ that maximizes the sum of the values of those pairs $ab$ for which $a \lesssim b$. Building on the state of the art in solving this NP-hard problem partially, we contribute new partial optimality conditions and efficient algorithms for deciding these conditions. In experiments with real and synthetic data, these new conditions increase, in particular, the fraction of pairs $ab$ for which it is decided efficiently that $a \not\lesssim b$ in an optimal preorder.
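To make the objective concrete, here is a small brute-force sketch for tiny instances. It assumes values are given only for ordered pairs of distinct elements and enumerates every relation, which is exponential; it does not implement the partial optimality conditions of the paper. The function names are hypothetical.

```python
from itertools import product

def is_preorder(V, R):
    """R is a set of ordered pairs (a, b), read as 'a is below or equivalent to b'."""
    reflexive = all((a, a) in R for a in V)
    transitive = all((a, c) in R
                     for (a, b) in R for (b2, c) in R if b == b2)
    return reflexive and transitive

def preorder_value(c, R):
    """Sum of the values of the pairs of distinct elements contained in the preorder."""
    return sum(c.get((a, b), 0.0) for (a, b) in R if a != b)

def brute_force_preordering(V, c):
    """Try every relation on V and keep the best preorder (only viable for tiny V)."""
    pairs = [(a, b) for a in V for b in V if a != b]
    best_value, best_relation = float("-inf"), None
    for mask in product([False, True], repeat=len(pairs)):
        R = {p for p, keep in zip(pairs, mask) if keep} | {(a, a) for a in V}
        if is_preorder(V, R):
            value = preorder_value(c, R)
            if value > best_value:
                best_value, best_relation = value, R
    return best_value, best_relation

V = [0, 1, 2]
c = {(0, 1): 2, (1, 2): 2, (0, 2): -3, (1, 0): -1, (2, 1): -1, (2, 0): -1}
value, relation = brute_force_preordering(V, c)
print(value)
```

In this instance, taking both positive pairs forces the pair (0, 2) by transitivity, which costs more than it gains, so an optimal preorder keeps only one of them.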

2026-02-19T13:29:09Z David Stein Jannik Irmai Bjoern Andres http://arxiv.org/abs/2603.23193v3 Algorithms and Hardness for Geodetic Set on Tree-like Digraphs 2026-05-13T15:35:13Z

In the GEODETIC SET problem, an input is a (di)graph $G$ and integer $k$, and the objective is to decide whether there exists a vertex subset $S$ of size $k$ such that any vertex in $V(G)\setminus S$ lies on a shortest (directed) path between two vertices in $S$. The problem has been studied on undirected and directed graphs from both algorithmic and graph-theoretical perspectives. We focus on directed graphs and prove that GEODETIC SET admits a polynomial-time algorithm on ditrees, that is, digraphs with possible 2-cycles when the underlying undirected graph is a tree (after deleting possible parallel edges). This positive result naturally leads us to investigate cases where the underlying undirected graph is "close to a tree". Towards this, we show that GEODETIC SET on digraphs without 2-cycles and whose underlying undirected graph has feedback edge set number $\textsf{fen}$, can be solved in time $2^{\mathcal{O}(\textsf{fen})} \cdot n^{\mathcal{O}(1)}$, where $n$ is the number of vertices. To complement this, we prove that the problem remains NP-hard on DAGs (which do not contain 2-cycles) even when the underlying undirected graph has constant feedback vertex set number and constant pathwidth. Our last result significantly strengthens the result of Araújo and Arraes [Discrete Applied Mathematics, 2022] that the problem is NP-hard on DAGs when the underlying undirected graph is either bipartite, cobipartite or split.
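The definition at the start of this abstract can be checked directly for a given candidate set. The sketch below is a plain verifier for unweighted digraphs given as adjacency lists, assuming path length is the number of arcs; it is not the polynomial-time algorithm for ditrees or the $\textsf{fen}$-parameterized algorithm. The function names are hypothetical.

```python
from collections import deque

def bfs_dist(adj, src):
    """Directed BFS distances from src, counting arcs."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def is_geodetic_set(adj, S):
    """True if every vertex outside S lies on a shortest directed path between two vertices of S."""
    vertices = set(adj) | {v for nbrs in adj.values() for v in nbrs}
    dist = {u: bfs_dist(adj, u) for u in vertices}
    covered = set(S)
    for s in S:
        for t in S:
            if s == t or t not in dist[s]:
                continue  # no directed path from s to t
            for v in vertices:
                # v is on some shortest s-t path iff the distances add up exactly
                if v in dist[s] and t in dist[v] and dist[s][v] + dist[v][t] == dist[s][t]:
                    covered.add(v)
    return covered == vertices

path = {"a": ["b"], "b": ["c"], "c": []}
print(is_geodetic_set(path, {"a", "c"}), is_geodetic_set(path, {"a", "b"}))
```

The check runs one BFS per vertex, so it verifies a given $S$ in polynomial time; the hardness discussed in the abstract concerns finding such a set of size $k$.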

2026-03-24T13:41:12Z 27 pages, 4 figures Florent Foucaud Narges Ghareghani Lucas Lorieau Morteza Mohammad-Noori Rasa Parvini Oskuei Prafullkumar Tale http://arxiv.org/abs/2502.05157v3 Efficient distributional regression trees learning algorithms for calibrated non-parametric probabilistic forecasts 2026-05-13T15:19:13Z

The perspective of developing trustworthy AI for critical applications in science and engineering requires machine learning techniques that are capable of estimating their own uncertainty. In the context of regression, instead of estimating a conditional mean, this can be achieved by producing a predictive interval for the output, or even by learning a model of the conditional probability $p(y|x)$ of an output $y$ given input features $x$. While this can be done under parametric assumptions with, e.g., generalized linear models, such assumptions are typically too strong, and non-parametric models offer flexible alternatives. In particular, for scalar outputs, directly learning a model of the conditional cumulative distribution function of $y$ given $x$ can lead to more precise probabilistic estimates, and the use of proper scoring rules such as the weighted interval score (WIS) and the continuous ranked probability score (CRPS) leads to better coverage and calibration properties. This paper introduces novel algorithms for learning probabilistic regression trees for the WIS or CRPS loss functions. These algorithms are made computationally efficient thanks to an appropriate use of known data structures - namely min-max heaps, weight-balanced binary trees and Fenwick trees. Through numerical experiments, we demonstrate that the performance of our methods is competitive with alternative approaches. Additionally, our methods benefit from the inherent interpretability and explainability of trees. As a by-product, we show how our trees can be used in the context of conformal prediction and explain why they are particularly well-suited for achieving group-conditional coverage guarantees.
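As a point of reference for the CRPS loss mentioned above, the following sketch evaluates the CRPS of an equally weighted empirical distribution against an observation $y$, using the standard identity $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\,\mathbb{E}|X - X'|$. Treating a tree leaf's prediction as the empirical distribution of its training targets is an assumption made here for illustration; the paper's incremental split-search algorithms and data structures are not reproduced. The function name is hypothetical.

```python
def crps_empirical(samples, y):
    """CRPS of the equally weighted empirical distribution of `samples` at observation y."""
    xs = sorted(samples)
    n = len(xs)
    # First term: mean absolute error of the samples against the observation.
    mean_abs_err = sum(abs(x - y) for x in xs) / n
    # Second term: E|X - X'| computed from order statistics in O(n) after sorting,
    # using sum_{i<j} (x_(j) - x_(i)) = sum_i (2i - n - 1) * x_(i) with 1-based i.
    spread = 2.0 * sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * n)
    return mean_abs_err - 0.5 * spread

print(crps_empirical([0.0, 1.0], 0.5))  # 0.25
```

For a single sample the score reduces to the absolute error, which is one way to see CRPS as a generalization of the mean absolute error to distributional predictions.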

2025-02-07T18:39:35Z Quentin Duchemin Guillaume Obozinski http://arxiv.org/abs/2601.16156v2 All ascents exponential from valued constraint graphs of pathwidth three 2026-05-13T13:58:06Z

Many combinatorial optimization problems can be formulated as finding an assignment that maximizes some pseudo-Boolean function (that we call the fitness function). Strict local search starts with some assignment and follows some update rule to proceed to an adjacent assignment of strictly higher fitness. This means that strict local search algorithms follow ascents in the fitness landscape of the pseudo-Boolean function. The complexity of the pseudo-Boolean function (and the fitness landscapes that it represents) can be parameterized by properties of the valued constraint satisfaction problem (VCSP) that encodes the pseudo-Boolean function. We focus on properties of the constraint graphs of the VCSP, with the intuition that sparse graphs are less complex than dense ones. Specifically, we argue that pathwidth is the natural sparsity parameter for understanding limits on the power of strict local search. We show that prior constructions of sparse VCSPs where all ascents are exponentially long had pathwidth greater than or equal to four. We improve on this with our controlled doubling construction: a valued constraint satisfaction problem of pathwidth three where all ascents are exponentially long from a designated initial assignment. We conclude that all strict local search algorithms can be forced to take an exponential number of steps even on simple valued constraint graphs of pathwidth three.

2026-01-22T17:57:54Z 14 pages, 3 figures, 2 tables; slightly simplified construction and improved proof Artem Kaznatcheev Willemijn Volgering http://arxiv.org/abs/2507.18553v4 The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm 2026-05-13T13:18:37Z

Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidth is the de facto approach to deploy massive transformers onto more affordable accelerators. While GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale, its inner workings are described as a sequence of algebraic updates that obscure geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last to first dimension) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs. This equivalence is based on a sophisticated mathematical argument, and has two analytical consequences: first, the GPTQ error propagation step gains an intuitive geometric interpretation; second, GPTQ inherits the error upper bound of Babai's algorithm under the assumption that no weights are clipped. Leveraging this bound, we design post-training quantization methods that avoid clipping, and outperform the original GPTQ. In addition, we provide efficient GPU inference kernels for the resulting representation. Taken together, these results place GPTQ on a firm theoretical footing and open the door to importing decades of progress in lattice algorithms towards the design of future quantization algorithms for billion-parameter models. Source code is available at https://github.com/IST-DASLab/GPTQ-Babai.
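For readers unfamiliar with the lattice side of this equivalence, here is a minimal sketch of Babai's nearest plane algorithm for a generic full-rank basis. It is a textbook version built on a QR factorization, not the GPTQ implementation, and it does not construct the lattice from a layer's Hessian as the paper does. The function name and the example basis are hypothetical.

```python
import numpy as np

def babai_nearest_plane(B, t):
    """Approximate closest lattice point to t; the columns of B are the basis vectors.

    Returns the integer coefficients and the lattice point B @ coeffs.
    """
    B = np.asarray(B, dtype=float)
    t = np.asarray(t, dtype=float)
    Q, R = np.linalg.qr(B)       # columns of Q span the Gram-Schmidt directions
    y = Q.T @ t                  # target expressed in those directions
    n = B.shape[1]
    coeffs = np.zeros(n)
    # Process basis vectors from last to first, each time choosing the nearest
    # translate of the hyperplane spanned by the earlier basis vectors.
    for i in range(n - 1, -1, -1):
        residual = y[i] - R[i, i + 1:] @ coeffs[i + 1:]
        coeffs[i] = np.round(residual / R[i, i])
    return coeffs.astype(int), B @ coeffs

B = np.array([[2.0, 1.0],
              [0.0, 2.0]])
coeffs, point = babai_nearest_plane(B, [2.9, 2.1])
print(coeffs, point)
```

The back-to-front loop mirrors the order in which, according to the abstract, GPTQ must be executed for the equivalence to hold. The error bound referred to in the abstract additionally depends on no coefficient being clipped, which this unbounded integer rounding trivially satisfies.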

2025-07-24T16:22:18Z Published as a conference paper at the Fourteenth International Conference on Learning Representations (ICLR 2026): https://openreview.net/forum?id=NFB4QGGS65 Jiale Chen Yalda Shabanzadeh Elvir Crnčević Torsten Hoefler Dan Alistarh http://arxiv.org/abs/2605.13402v1 Fast and Compact Graph Cuts for the Boykov-Kolmogorov Algorithm 2026-05-13T11:57:35Z

Computing a minimum $s$-$t$ cut in a graph solves a wide range of computer vision problems, and is often done using the Boykov-Kolmogorov (BK) algorithm. In this paper, we revisit the BK algorithm from both a theoretical and practical point of view. We improve the analysis of the time complexity of the BK algorithm to $O(mn|C|)$ and propose a new algorithm, the fast and compact BK (fcBK) algorithm, with a time complexity of $O(m|C|)$, where $m$, $n$, and $|C|$ are the number of edges, number of vertices, and the capacity of the cut, respectively. We additionally propose a compact graph representation that allows our implementation to find a minimum $s$-$t$ cut in a graph with upwards of $10^9$ vertices and $10^{10}$ edges on a machine with 128 GB of memory. On a comprehensive set of benchmark datasets, we find our implementation to be the fastest available implementation of the BK algorithm, highlighting the importance of memory-efficient implementations. We make our implementations publicly available for further research and implementation development within minimum $s$-$t$ cut algorithms.

2026-05-13T11:57:35Z 15 pages, 6 figures, submitted to the IEEE for possible publication Christian Møller Mikkelstrup Anders Bjorholm Dahl Philip Bille Vedrana Andersen Dahl Inge Li Gørtz http://arxiv.org/abs/2605.13392v1 Tighter relaxations for MAP-MRF optimization via Singleton Arc Consistency 2026-05-13T11:50:58Z

We consider the MAP-MRF inference task, that is, minimizing a function of discrete variables represented as a sum of unary and pairwise terms. A prominent approach for tackling this NP-hard problem in practice is to solve its natural LP relaxation and then iteratively tighten the relaxation by adding clusters. Based on some theoretical observations, we propose a new technique for identifying such clusters. It works by running the Singleton Arc Consistency algorithm in a certain CSP instance. Experimental results indicate that the new tightening technique outperforms the previous approach by [Sontag et al. UAI 2012] that searches for frustrated cycles. Our code will be made available at https://github.com/vnk-ist/MAP-MRF/.

2026-05-13T11:50:58Z Asaf Lev-Ran Pavel Arkhipov Vladimir Kolmogorov http://arxiv.org/abs/2605.13917v1 Clustering with Locally Bounded Ignorance 2026-05-13T11:03:37Z

In Correlation Clustering, the input is a graph $G=(V,E)$ with weight function $ω: {V \choose 2}\to \mathbb{Z}$ and the task is to partition the vertex set into clusters such that the total weight of edges between clusters and missing edges inside clusters is minimized. Due to close connections between Correlation Clustering and Edge Multicut, deciding whether there is a partition with total cost at most $k$ is FPT with respect to $k$ but a polynomial kernel is presumably impossible. We study the influence of the structure of the fuzzy edge graph, that is, the graph induced by the weight-0 edges, on the problem complexity. We show in particular that Correlation Clustering admits a polynomial problem kernel when parameterized by $k+d$, where $d$ is the degeneracy of the fuzzy edge graph, and when parameterized by $k+c$, where $c$ is the closure of the fuzzy edge graph. We complement these positive results by showing hardness for several settings where the graph induced by the edges and nonedges has very restricted structure.
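The parameter $d$ used above can be computed directly from the weights. The sketch below extracts the fuzzy edge graph as the set of weight-0 pairs and computes its degeneracy by repeatedly removing a minimum-degree vertex. The input encoding (a dictionary from vertex pairs to integer weights) and the function names are assumptions for illustration; the kernelization itself is not shown.

```python
def fuzzy_edges(weights):
    """Pairs of weight 0, i.e. the edges of the fuzzy edge graph."""
    return {frozenset(pair) for pair, w in weights.items() if w == 0}

def degeneracy(vertices, edges):
    """Largest minimum degree seen while repeatedly deleting a minimum-degree vertex."""
    adj = {v: set() for v in vertices}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    d = 0
    while adj:
        v = min(adj, key=lambda x: len(adj[x]))   # current minimum-degree vertex
        d = max(d, len(adj[v]))
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return d

V = ["a", "b", "c", "d"]
# Hypothetical weights: a triangle of weight-0 (fuzzy) pairs plus three non-zero pairs.
w = {("a", "b"): 0, ("b", "c"): 0, ("a", "c"): 0,
     ("a", "d"): 3, ("b", "d"): -2, ("c", "d"): 1}
print(degeneracy(V, fuzzy_edges(w)))
```

This quadratic-time version is enough for illustration; a bucket-based implementation would reach linear time.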

2026-05-13T11:03:37Z Jaroslav Garvardt Christian Komusiewicz http://arxiv.org/abs/2605.13299v1 Strong Conflict-Free Vertex-Connection via Twin Cover: Kernelization and Chromatic Bounds 2026-05-13T10:13:01Z

A vertex-coloring of a connected graph $G$ is a strong conflict-free vertex-connection coloring if every two distinct vertices are joined by a shortest path on which some color appears exactly once. The minimum number of colors in such a coloring is the strong conflict-free vertex-connection number $\operatorname{svcfc}(G)$. We study this problem under the parameter twin cover. Let $X$ be a twin cover of $G$ of size $t$, and let $k$ be the target number of colors. In our first result, given $(G,k)$ together with a twin cover $X$, we reduce in polynomial time to an equivalent annotated instance on at most $\max\{2,t+(t+1)k2^{t+k-1}\}$ vertices. Hence the annotated version of Strong CFVC Number, in which a twin cover is supplied as part of the input, is fixed-parameter tractable parameterized by $t+k$. Using this bound, we then obtain a kernel parameterized by $\operatorname{tc}(G)+k$; in particular, for every fixed $k$, the problem is fixed-parameter tractable parameterized by the twin-cover number alone. In our second result, we prove every connected graph $G$ with twin cover $X$ of size $t$ satisfies $χ(G)\le \operatorname{svcfc}(G)\le χ(G)+t$. More generally, if $Y\subseteq X$ intersects every shortest path of length at least $3$, then $\operatorname{svcfc}(G)\le χ(G)+|Y|$. We also derive an exact expression for the chromatic number on graphs of bounded twin-cover number: for every proper coloring $\varphi$ of $G[X]$, the minimum number of colors needed to extend $\varphi$ to all of $G$ is $K_\varphi=\max_{S\subseteq X}(|\varphi(S)|+m(S))$, and hence $χ(G)=\min_{\varphi\text{ proper on }G[X]} K_\varphi$. Our results provide the first evidence that twin cover is a useful parameter for strong conflict-free vertex-connection and show that, once a twin cover is fixed, the remaining difficulty is concentrated in a bounded additive gap above the chromatic number.

2026-05-13T10:13:01Z Accepted to COCOON 2026; to appear in Springer LNCS Samuel German http://arxiv.org/abs/2602.13155v2 Learning to Approximate Uniform Facility Location via Graph Neural Networks 2026-05-13T09:47:05Z

Neural networks, particularly message-passing neural networks (MPNNs), are increasingly used as heuristics for hard combinatorial optimization problems. Yet many learning-based methods rely on supervision, reinforcement learning, or gradient estimators, causing high computational cost, unstable training, or limited guarantees. Classical approximation algorithms provide worst-case guarantees but are non-differentiable and cannot adapt to structure in natural input distributions. We study this tradeoff through Uniform Facility Location (UniFL), a problem with applications in clustering, summarization, logistics, and supply chains. We propose a fully differentiable MPNN that incorporates approximation-algorithmic principles without solver supervision or discrete relaxations. The model has provable approximation guarantees and empirically improves on standard approximation algorithms, narrowing the gap to integer linear programming.

2026-02-13T18:08:23Z ICML 2026 Chendi Qian Christopher Morris Stefanie Jegelka Christian Sohler http://arxiv.org/abs/2605.13264v1 Distributed Approximate Maximum Matching and Minimum Vertex Cover via Generalized Graph Decomposition 2026-05-13T09:45:29Z

The classic lower bound of Kuhn, Moscibroda and Wattenhofer [JACM 2016] states that approximate maximum matching and approximate vertex cover (among other problems) in the LOCAL model require $Ω(\min\{\sqrt{\frac{\log n}{\log\log n}}, \frac{\log Δ}{\log\log Δ}\})$ rounds, for any polylogarithmic or smaller approximation ratio. As a function of $Δ$, this complexity was subsequently matched for constant-approximate weighted vertex cover [Bar-Yehuda, Censor-Hillel and Schwartzman, JACM 2017] and constant-approximate maximum matching [Bar-Yehuda, Censor-Hillel, Ghaffari and Schwartzman, PODC 2017]. One might expect, therefore, that the true complexity should be $Θ(\frac{\log Δ}{\log\log Δ})$, and the $n$-dependent term in the lower bound is just an artefact of the proof method. We show that this is not the case, and a term dependent on $n$ is in fact required. Specifically, we show randomized algorithms for $(2+\varepsilon)$-approximate maximum matching and approximate (weighted) minimum vertex cover taking $O(\frac{\log n}{\log^2 \log n})$ rounds. Our algorithms are based on a novel graph decomposition result generalizing the method of Miller, Peng and Xu [SPAA 2013], which we use to reduce the `effective' degree of high-degree graphs. We expect that this decomposition may be of further use for other problems.

2026-05-13T09:45:29Z To appear at PODC 2026 Peter Davies-Peck 10.1145/3796701.3815909 http://arxiv.org/abs/2603.27405v2 DynamicLogLog: Faster, Smaller, and More Accurate Cardinality Estimation 2026-05-13T07:43:24Z

Cardinality estimation - calculating the number of distinct elements in a stream - is a longstanding problem with applications from networking to bioinformatics. HyperLogLog (HLL), the prevailing standard, has a well-known error spike in its transition region and requires 6 bits per bucket, with data structure size scaling as B*log(log(cardinality)). We present DynamicLogLog (DLL), which uses a shared exponent across all buckets, storing only relative leading-zero counts. This yields three benefits: (1) only 4 bits per bucket (33% memory reduction), (2) an early exit mask that rejects >99.9% of elements at high cardinality before any bucket access (over 10x faster than HLL when bandwidth-constrained), and (3) a flat error profile via Dynamic Linear Counting (DLC) and a Logarithmic Hybrid Blend that eliminates HLL's transition artifact. Squaring the maximum representable cardinality requires only a single additional bit of global state. At 2,048 buckets with 512k simulations, DLL4's hybrid estimate achieves 1.830% mean and 1.834% peak absolute error using 1,024 bytes, compared to 1.84% mean and 34.1% peak for HLL using 1,536 bytes. DLC achieves 1.90% mean without correction factors. DynamicUltraLogLog (UDLL6), a fusion of DLL and UltraLogLog, achieves ULL-level accuracy at 75% of the memory. History-corrected variants (Hybrid+n) and Layered DLC (LDLC) provide further improvements using per-state correction tables and anti-phase error cancellation.
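The memory figures quoted above follow from simple arithmetic on the bucket count and bits per bucket. The short check below reproduces them; it deliberately ignores any global state such as the shared exponent, and the helper name is hypothetical.

```python
def sketch_bytes(buckets, bits_per_bucket):
    """Size of the bucket array alone, in bytes (global state not included)."""
    return buckets * bits_per_bucket // 8

hll_bytes = sketch_bytes(2048, 6)   # HyperLogLog: 6 bits per bucket
dll_bytes = sketch_bytes(2048, 4)   # DynamicLogLog: 4 bits per bucket
reduction = 1 - dll_bytes / hll_bytes
print(hll_bytes, dll_bytes, round(100 * reduction, 1))  # 1536 1024 33.3
```

The result matches the 1,536-byte and 1,024-byte sizes and the 33% memory reduction stated in the abstract.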

2026-03-28T20:52:33Z 35 pages, 18 figures, 2 listings (code) Brian Bushnell