https://arxiv.org/api/3FsyW8lzUVN9SVo3s/e2G+lBR1s
Updated: 2026-06-21T08:12:58Z

http://arxiv.org/abs/2605.13810v1
Provable Quantization with Randomized Hadamard Transform
Updated: 2026-05-13T17:38:18Z
Vector quantization via random projection followed by scalar quantization is a fundamental primitive in machine learning, with applications ranging from similarity search to federated learning and KV cache compression. While dense random rotations yield clean theoretical guarantees, they require $Θ(d^2)$ time. The randomized Hadamard transform $HD$ reduces this cost to $O(d \log d)$, but its discrete structure complicates analysis and leads to weaker or purely empirical compression guarantees.
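To make the $O(d \log d)$ cost of $HD$ concrete, here is a minimal sketch under stated assumptions: $D$ is a random ±1 diagonal, $H$ is the Walsh-Hadamard matrix normalized by $1/\sqrt{d}$ and applied with the fast in-place butterfly, and $d$ is a power of two. It then applies the dithered variant this abstract describes, subtracting one random scalar offset before uniform rounding. The step size `delta` and all function names are hypothetical. This is not TurboQuant or the authors' quantizer, and it does not clip codes to a $b$-bit range.

```python
import numpy as np

def fwht(x: np.ndarray) -> np.ndarray:
    """Orthonormal fast Walsh-Hadamard transform in O(d log d); len(x) must be a power of two."""
    y = np.asarray(x, dtype=float).copy()
    d = y.shape[0]
    if d == 0 or d & (d - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < d:
        blocks = y.reshape(-1, 2, h)              # adjacent pairs of length-h halves
        a, b = blocks[:, 0, :], blocks[:, 1, :]
        y = np.stack((a + b, a - b), axis=1).reshape(d)
        h *= 2
    return y / np.sqrt(d)                          # normalized H is orthogonal and symmetric

def dithered_hd_quantize(x: np.ndarray, delta: float, rng: np.random.Generator):
    """Hypothetical helper: y = H D x, then round on a grid shifted by one random offset u."""
    d = x.shape[0]
    signs = rng.choice([-1.0, 1.0], size=d)        # random diagonal of D
    u = rng.uniform(0.0, delta)                    # single scalar dither shared by all coordinates
    y = fwht(signs * x)
    codes = np.rint((y - u) / delta).astype(np.int64)  # integers that would be stored
    return codes, signs, u

def dequantize(codes, signs, u, delta):
    y_hat = codes * delta + u                      # subtractive dither: add the offset back
    return signs * fwht(y_hat)                     # (HD)^{-1} = D H
```

With $u$ uniform over one grid step, each rotated coordinate's rounding error is uniform on $[-\delta/2, \delta/2]$ with mean zero, so the reconstruction is unbiased, and because $HD$ is orthogonal the total error norm is at most $\sqrt{d}\,\delta/2$.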
In this work, we study a variant of this approach: dithered quantization with a single randomized Hadamard transform. Specifically, the quantizer applies $HD$ to the input vector and subtracts a random scalar offset before quantizing, injecting additional randomness at negligible cost. We prove that this approach is unbiased and provides mean squared error bounds that asymptotically match those achievable with truly random rotation matrices. In particular, we prove that a dithered version of TurboQuant achieves mean squared error $\bigl(π\sqrt{3}/2 + o(1)\bigr) \cdot 4^{-b}$ at $b$ bits per coordinate, where the $o(1)$ term vanishes uniformly over all unit vectors and all dimensions as the number of quantization levels grows.
Published: 2026-05-13T17:38:18Z
Authors: Ying Feng, Piotr Indyk, Michael Kapralov, Dmitry Krachun, Boris Prokhorov

http://arxiv.org/abs/2605.13806v1
Min-Max Optimization Requires Exponentially Many Queries
Updated: 2026-05-13T17:34:24Z
We study the query complexity of min-max optimization of a nonconvex-nonconcave function $f$ over $[0,1]^d \times [0,1]^d$. We show that, given oracle access to $f$ and to its gradient $\nabla f$, any algorithm that finds an $\varepsilon$-approximate stationary point must make a number of queries that is exponential in $1/\varepsilon$ or $d$.
Published: 2026-05-13T17:34:24Z
Authors: Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alexandros Hollender

http://arxiv.org/abs/2605.13800v1
Low-Cost Arborescence Under Edge Faults
Updated: 2026-05-13T17:21:08Z
Our input is a directed graph $G = (V,E)$ on $n$ vertices and $m$ edges with a designated root vertex $r$ and a function $cost: E \rightarrow \mathbb{R}_{\geq 0}$. The problem is to maintain a min-cost arborescence in $G$ in the presence of edge faults (a single fault at a time). Edge faults are transient, and once the faulty edge is repaired, the original min-cost arborescence $\mathcal{T}$ is restored. Whenever an edge fault happens, we need to update $\mathcal{T}$ to a min-cost arborescence in $G-f$, where $f$ is the faulty edge.
Since computing a min-cost arborescence in $G - f$ takes $O(m + n\log n)$ time, we seek to construct a sparse subgraph $H$ in a preprocessing step such that in the event of any edge $f$ failing, it suffices to compute a min-cost arborescence in $H - f$ in order to find a low-cost arborescence in $G - f$.
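The min-cost arborescence that has to be recomputed in $G - f$ (or $H - f$) can be found with the classical Chu-Liu/Edmonds contraction algorithm. The sketch below is the simple $O(nm)$ variant, not the $O(m + n\log n)$ implementation behind the bound quoted above, and it returns only the optimal cost. The function name `min_arborescence_cost` and the `(u, v, cost)` edge-list format are assumptions for illustration.

```python
import math

def min_arborescence_cost(n, root, edges):
    """Chu-Liu/Edmonds (hypothetical helper name): cost of a min-cost arborescence rooted at root.

    n: number of vertices labelled 0..n-1; edges: list of (u, v, cost) for a directed edge u -> v.
    Returns None if some vertex is unreachable from the root.
    """
    edges = [(u, v, c) for u, v, c in edges if u != v]
    total = 0.0
    while True:
        # 1. Cheapest incoming edge of every non-root vertex.
        in_cost, pred = [math.inf] * n, [-1] * n
        for u, v, c in edges:
            if v != root and c < in_cost[v]:
                in_cost[v], pred[v] = c, u
        if any(v != root and in_cost[v] == math.inf for v in range(n)):
            return None
        in_cost[root] = 0.0
        # 2. Pay for the chosen edges and label the cycles they form.
        comp, seen, num = [-1] * n, [-1] * n, 0
        for v in range(n):
            total += in_cost[v]
            w = v
            while seen[w] != v and comp[w] == -1 and w != root:
                seen[w] = v
                w = pred[w]
            if w != root and comp[w] == -1:        # the walk from v closed a new cycle through w
                x = pred[w]
                while x != w:
                    comp[x] = num
                    x = pred[x]
                comp[w] = num
                num += 1
        if num == 0:                                # chosen edges already form an arborescence
            return total
        for v in range(n):                          # vertices outside cycles become singletons
            if comp[v] == -1:
                comp[v] = num
                num += 1
        # 3. Contract cycles; an edge entering v keeps only its extra cost over in_cost[v].
        edges = [(comp[u], comp[v], c - in_cost[v]) for u, v, c in edges if comp[u] != comp[v]]
        n, root = num, comp[root]
```

Handling a fault $f$ amounts to calling the function again on the edge list with $f$ removed; in the setting above, that list would come from the sparse subgraph $H$ rather than from all of $G$.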
In the unweighted setting, this is the fault-tolerant subgraph problem for single-source reachability. Baswana, Choudhary, and Roditty (SICOMP, 2018) showed a $k$-fault tolerant reachability subgraph of size $O(2^k n)$, where $k$ is the number of edge faults. We give a simple polynomial-time algorithm to construct a subgraph $H$ of size $O(n^{3/2})$ such that, for any $f \in E$, a min-cost arborescence in $H-f$ is a 2-approximation of a min-cost arborescence in $G-f$. Thus, whenever an edge fault happens, we can find a 2-approximate min-cost arborescence in $G-f$ in $O(n^{3/2})$ time.
Our second problem is in the matroid setting. The input is a matroid $M = (E, {\cal I})$ with a function $cost: E \rightarrow \mathbb{R}$. The problem is to compute a sparse $S \subseteq E$ (called a $k$-fault tolerant preserver) such that for any $F \subseteq E$ with $|F| \le k$, the matroid $M|(S\setminus F)$ contains a min-cost basis of $M|(E\setminus F)$. We show a tight bound of $k \cdot rank(E)$ on the size of a $k$-fault tolerant preserver.
Published: 2026-05-13T17:21:08Z
Authors: Dipan Dey, Telikepalli Kavitha

http://arxiv.org/abs/2602.17346v2
Partial Optimality in the Preordering Problem
Updated: 2026-05-13T16:02:27Z
Preordering is a generalization of clustering and partial ordering with applications in bioinformatics and social network analysis. Given a finite set $V$ and a value $c_{ab} \in \mathbb{R}$ for every ordered pair $ab$ of elements of $V$, the preordering problem asks for a preorder $\lesssim$ on $V$ that maximizes the sum of the values of those pairs $ab$ for which $a \lesssim b$. Building on the state of the art in solving this NP-hard problem partially, we contribute new partial optimality conditions and efficient algorithms for deciding these conditions. In experiments with real and synthetic data, these new conditions increase, in particular, the fraction of pairs $ab$ for which it is decided efficiently that $a \not\lesssim b$ in an optimal preorder.
Published: 2026-02-19T13:29:09Z
Authors: David Stein, Jannik Irmai, Bjoern Andres

http://arxiv.org/abs/2603.23193v3
Algorithms and Hardness for Geodetic Set on Tree-like Digraphs
Updated: 2026-05-13T15:35:13Z
In the GEODETIC SET problem, the input is a (di)graph $G$ and an integer $k$, and the objective is to decide whether there exists a vertex subset $S$ of size $k$ such that every vertex in $V(G)\setminus S$ lies on a shortest (directed) path between two vertices in $S$. The problem has been studied on undirected and directed graphs from both algorithmic and graph-theoretical perspectives.
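The definition can be checked directly on small instances: in an unweighted digraph, $v$ lies on a shortest directed path from $a$ to $b$ exactly when $d(a,v) + d(v,b) = d(a,b)$ with all three distances finite. The brute-force sketch below runs BFS from every vertex and tests whether a given $S$ is a geodetic set. The name `is_geodetic_set` and the adjacency-dict input are assumptions for illustration; the check is unrelated to the paper's algorithms.

```python
from collections import deque
from itertools import permutations

def bfs_dist(adj, src):
    """Directed BFS distances from src; unreachable vertices are absent from the dict."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def is_geodetic_set(adj, S):
    """Hypothetical helper. adj: dict vertex -> list of out-neighbours (every vertex is a key).
    True if every vertex outside S lies on a shortest directed path between two vertices of S."""
    dist = {v: bfs_dist(adj, v) for v in adj}
    covered = set(S)
    for a, b in permutations(S, 2):                # ordered pairs, since paths are directed
        if b not in dist[a]:
            continue
        for v in adj:
            if v in dist[a] and b in dist[v] and dist[a][v] + dist[v][b] == dist[a][b]:
                covered.add(v)
    return covered == set(adj)
```

For the directed path $0 \to 1 \to 2$, $S = \{0, 2\}$ is geodetic while $S = \{0, 1\}$ is not, since no vertex of $S$ reaches back from 2.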
We focus on directed graphs and prove that GEODETIC SET admits a polynomial-time algorithm on ditrees, that is, digraphs (possibly with 2-cycles) whose underlying undirected graph, after deleting parallel edges, is a tree. This positive result naturally leads us to investigate cases where the underlying undirected graph is "close to a tree".
Towards this, we show that GEODETIC SET on digraphs without 2-cycles whose underlying undirected graph has feedback edge set number $\textsf{fen}$ can be solved in time $2^{\mathcal{O}(\textsf{fen})} \cdot n^{\mathcal{O}(1)}$, where $n$ is the number of vertices. To complement this, we prove that the problem remains NP-hard on DAGs (which do not contain 2-cycles) even when the underlying undirected graph has constant feedback vertex set number and constant pathwidth. Our last result significantly strengthens the result of Araújo and Arraes [Discrete Applied Mathematics, 2022] that the problem is NP-hard on DAGs when the underlying undirected graph is either bipartite, cobipartite or split.
Published: 2026-03-24T13:41:12Z
Comment: 27 pages, 4 figures
Authors: Florent Foucaud, Narges Ghareghani, Lucas Lorieau, Morteza Mohammad-Noori, Rasa Parvini Oskuei, Prafullkumar Tale

http://arxiv.org/abs/2502.05157v3
Efficient distributional regression trees learning algorithms for calibrated non-parametric probabilistic forecasts
Updated: 2026-05-13T15:19:13Z
Developing trustworthy AI for critical applications in science and engineering requires machine learning techniques that can estimate their own uncertainty. In regression, instead of estimating a conditional mean, this can be achieved by producing a predictive interval for the output, or even by learning a model of the conditional probability $p(y|x)$ of an output $y$ given input features $x$. While this can be done under parametric assumptions, e.g., with generalized linear models, such assumptions are typically too strong, and non-parametric models offer flexible alternatives. In particular, for scalar outputs, directly learning a model of the conditional cumulative distribution function of $y$ given $x$ can lead to more precise probabilistic estimates, and the use of proper scoring rules such as the weighted interval score (WIS) and the continuous ranked probability score (CRPS) leads to better coverage and calibration properties.
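For reference, the CRPS of a predictive distribution $F$ at an observation $y$ is $\int (F(z) - \mathbf{1}\{z \ge y\})^2\,dz$, which for the empirical distribution of samples $x_1,\dots,x_n$ equals $\frac1n\sum_i |x_i - y| - \frac{1}{2n^2}\sum_{i,j}|x_i - x_j|$. The sketch below computes that, the interval score of a central $(1-\alpha)$ prediction interval, and WIS under one common normalization (weight $1/2$ on the median term, $\alpha_k/2$ on each interval, divided by $K + 1/2$). These are standard scoring formulas, not the paper's tree-learning algorithms; function names and the WIS normalization are assumptions.

```python
import numpy as np

def crps_empirical(samples, y):
    """CRPS of the empirical distribution of `samples` at observation y (O(n^2) form)."""
    x = np.asarray(samples, dtype=float)
    spread_to_obs = np.mean(np.abs(x - y))
    spread_within = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
    return spread_to_obs - spread_within

def interval_score(lower, upper, y, alpha):
    """Width of the interval plus a 2/alpha penalty per unit the observation falls outside it."""
    score = upper - lower
    if y < lower:
        score += 2.0 / alpha * (lower - y)
    elif y > upper:
        score += 2.0 / alpha * (y - upper)
    return score

def weighted_interval_score(median, intervals, y):
    """intervals: list of (alpha, lower, upper) for central (1 - alpha) prediction intervals."""
    total = 0.5 * abs(y - median)
    for alpha, lo, hi in intervals:
        total += alpha / 2.0 * interval_score(lo, hi, y, alpha)
    return total / (len(intervals) + 0.5)
```

A point mass at $x$ gives $\text{CRPS} = |x - y|$, so the score reduces to absolute error for deterministic forecasts.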
This paper introduces novel algorithms for learning probabilistic regression trees for the WIS or CRPS loss functions. These algorithms are made computationally efficient by an appropriate use of known data structures, namely min-max heaps, weight-balanced binary trees and Fenwick trees. Through numerical experiments, we demonstrate that the performance of our methods is competitive with alternative approaches. Additionally, our methods benefit from the inherent interpretability and explainability of trees. As a by-product, we show how our trees can be used in the context of conformal prediction and explain why they are particularly well suited for achieving group-conditional coverage guarantees.
Published: 2025-02-07T18:39:35Z
Authors: Quentin Duchemin, Guillaume Obozinski

http://arxiv.org/abs/2601.16156v2
All ascents exponential from valued constraint graphs of pathwidth three
Updated: 2026-05-13T13:58:06Z
Many combinatorial optimization problems can be formulated as finding an assignment that maximizes some pseudo-Boolean function (that we call the fitness function). Strict local search starts with some assignment and follows some update rule to proceed to an adjacent assignment of strictly higher fitness. This means that strict local search algorithms follow ascents in the fitness landscape of the pseudo-Boolean function. The complexity of the pseudo-Boolean function (and the fitness landscapes that it represents) can be parameterized by properties of the valued constraint satisfaction problem (VCSP) that encodes the pseudo-Boolean function. We focus on properties of the constraint graphs of the VCSP, with the intuition that sparse graphs are less complex than dense ones. Specifically, we argue that pathwidth is the natural sparsity parameter for understanding limits on the power of strict local search. We show that prior constructions of sparse VCSPs where all ascents are exponentially long had pathwidth greater than or equal to four.
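To make "following an ascent" concrete, the sketch below encodes a pseudo-Boolean fitness function as a binary VCSP, with unary and pairwise value tables over 0/1 variables, whose constraint graph has one edge per pairwise key. It then runs one particular strict local search rule: flip the lowest-index bit that strictly increases fitness, and stop at a local maximum. It counts the steps taken, i.e. the ascent length. The data layout and the choice of update rule are assumptions for illustration; the paper's lower bound concerns every strict update rule.

```python
def fitness(x, unary, pairwise):
    """unary: var -> (value if 0, value if 1); pairwise: (i, j) -> 2x2 table indexed [x_i][x_j]."""
    total = sum(table[x[i]] for i, table in unary.items())
    total += sum(table[x[i]][x[j]] for (i, j), table in pairwise.items())
    return total

def first_improvement_ascent(x, unary, pairwise):
    """One strict local search rule (hypothetical helper): returns (local maximum, ascent length)."""
    x = list(x)
    current = fitness(x, unary, pairwise)
    steps, improved = 0, True
    while improved:
        improved = False
        for i in range(len(x)):
            x[i] ^= 1                              # try flipping bit i
            value = fitness(x, unary, pairwise)
            if value > current:                    # strictly higher fitness: take the step
                current, steps, improved = value, steps + 1, True
                break
            x[i] ^= 1                              # undo a non-improving flip
    return x, steps
```

Each step strictly increases fitness, so the run always terminates; the constructions discussed in this abstract are about forcing such runs to be exponentially long.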
We improve on this with our controlled doubling construction: a valued constraint satisfaction problem of pathwidth three where all ascents are exponentially long from a designated initial assignment. We conclude that all strict local search algorithms can be forced to take an exponential number of steps even on simple valued constraint graphs of pathwidth three.
Published: 2026-01-22T17:57:54Z
Comment: 14 pages, 3 figures, 2 tables; slightly simplified construction and improved proof
Authors: Artem Kaznatcheev, Willemijn Volgering

http://arxiv.org/abs/2507.18553v4
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Updated: 2026-05-13T13:18:37Z
Quantizing the weights of large language models (LLMs) from 16-bit to lower bitwidth is the de facto approach to deploying massive transformers onto more affordable accelerators. While GPTQ emerged as one of the standard methods for one-shot post-training quantization at LLM scale, its inner workings are described as a sequence of algebraic updates that obscure any geometric meaning or worst-case guarantees. In this work, we show that, when executed back-to-front (from the last to the first dimension) for a linear layer, GPTQ is mathematically identical to Babai's nearest plane algorithm for the classical closest vector problem (CVP) on a lattice defined by the Hessian matrix of the layer's inputs. This equivalence is based on a sophisticated mathematical argument, and has two analytical consequences: first, the GPTQ error propagation step gains an intuitive geometric interpretation; second, GPTQ inherits the error upper bound of Babai's algorithm under the assumption that no weights are clipped. Leveraging this bound, we design post-training quantization methods that avoid clipping and outperform the original GPTQ. In addition, we provide efficient GPU inference kernels for the resulting representation.
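For readers unfamiliar with Babai's nearest plane algorithm, here is the textbook version as a sketch. The lattice basis vectors $b_1,\dots,b_n$ are the columns of `B`, and $t$ is the target. The algorithm walks back-to-front over the Gram-Schmidt directions, rounding the coefficient along each one after subtracting the contribution of the coefficients already fixed. A QR decomposition supplies the Gram-Schmidt data, since $|R_{ii}| = \lVert b_i^* \rVert$. This shows the CVP algorithm only; it does not reproduce GPTQ or the paper's Hessian-defined lattice, and the function name is hypothetical.

```python
import numpy as np

def babai_nearest_plane(B: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Return integer coefficients z such that B @ z is Babai's approximation of the lattice point nearest t."""
    Q, R = np.linalg.qr(B)             # B = Q R with R upper triangular
    y = Q.T @ t                        # target expressed along the Gram-Schmidt directions
    n = B.shape[1]
    z = np.zeros(n, dtype=np.int64)
    for i in range(n - 1, -1, -1):     # last basis vector first
        residual = y[i] - R[i, i + 1:] @ z[i + 1:]
        z[i] = int(np.rint(residual / R[i, i]))
    return z
```

The rounding at step $i$ leaves a residual of at most $|R_{ii}|/2$ along $b_i^*$. This per-direction guarantee is the source of Babai's error bound, $\lVert t - Bz \rVert^2 \le \frac14 \sum_i \lVert b_i^* \rVert^2$ for $t$ in the span of $B$.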
Taken together, these results place GPTQ on a firm theoretical footing and open the door to importing decades of progress in lattice algorithms into the design of future quantization algorithms for billion-parameter models. Source code is available at https://github.com/IST-DASLab/GPTQ-Babai.
Published: 2025-07-24T16:22:18Z
Comment: Published as a conference paper at the Fourteenth International Conference on Learning Representations (ICLR 2026): https://openreview.net/forum?id=NFB4QGGS65
Authors: Jiale Chen, Yalda Shabanzadeh, Elvir Crnčević, Torsten Hoefler, Dan Alistarh

http://arxiv.org/abs/2605.13402v1
Fast and Compact Graph Cuts for the Boykov-Kolmogorov Algorithm
Updated: 2026-05-13T11:57:35Z
Computing a minimum $s$-$t$ cut in a graph solves a wide range of computer vision problems and is often done using the Boykov-Kolmogorov (BK) algorithm. In this paper, we revisit the BK algorithm from both a theoretical and a practical point of view. We improve the analysis of the time complexity of the BK algorithm to $O(mn|C|)$ and propose a new algorithm, the fast and compact BK (fcBK) algorithm, with a time complexity of $O(m|C|)$, where $m$, $n$, and $|C|$ are the number of edges, the number of vertices, and the capacity of the cut, respectively. We additionally propose a compact graph representation that allows our implementation to find a minimum $s$-$t$ cut in a graph with upwards of $10^9$ vertices and $10^{10}$ edges on a machine with 128 GB of memory. We find our implementation to be the fastest available implementation of the BK algorithm when evaluated on a comprehensive set of benchmark datasets, highlighting the importance of memory-efficient implementations.
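As background only, the sketch below computes a minimum $s$-$t$ cut with the simple Edmonds-Karp (BFS augmenting path) max-flow algorithm and reads the source side of the cut off the final residual graph. It is neither the BK algorithm, which grows search trees from both terminals and reuses them between augmentations, nor fcBK. Its dense capacity matrix is the opposite of the compact representation described above. The function name and input format are assumptions.

```python
from collections import deque

def min_st_cut(capacity, s, t):
    """Edmonds-Karp max flow (hypothetical helper). capacity: n x n nested list of non-negative numbers.
    Returns (cut value, set of vertices on the source side of a minimum cut)."""
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:           # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:                        # t unreachable: the current flow is maximum
            break
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                              # push flow and update reverse residuals
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    source_side = {v for v in range(n) if parent[v] != -1}  # reachable from s in the residual graph
    return flow, source_side
```

By max-flow/min-cut duality, the returned flow value equals the capacity of the edges leaving `source_side`.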
We make our implementations publicly available to support further research on, and development of, minimum $s$-$t$ cut algorithms.
Published: 2026-05-13T11:57:35Z
Comment: 15 pages, 6 figures, submitted to the IEEE for possible publication
Authors: Christian Møller Mikkelstrup, Anders Bjorholm Dahl, Philip Bille, Vedrana Andersen Dahl, Inge Li Gørtz

http://arxiv.org/abs/2605.13392v1
Tighter relaxations for MAP-MRF optimization via Singleton Arc Consistency
Updated: 2026-05-13T11:50:58Z
We consider the MAP-MRF inference task, that is, minimizing a function of discrete variables represented as a sum of unary and pairwise terms. A prominent approach for tackling this NP-hard problem in practice is to solve its natural LP relaxation and then iteratively tighten the relaxation by adding clusters. Based on some theoretical observations, we propose a new technique for identifying such clusters. It works by running the Singleton Arc Consistency algorithm on a certain CSP instance. Experimental results indicate that the new tightening technique outperforms the previous approach by [Sontag et al. UAI 2012] that searches for frustrated cycles. Our code will be made available at https://github.com/vnk-ist/MAP-MRF/.
Published: 2026-05-13T11:50:58Z
Authors: Asaf Lev-Ran, Pavel Arkhipov, Vladimir Kolmogorov

http://arxiv.org/abs/2605.13917v1
Clustering with Locally Bounded Ignorance
Updated: 2026-05-13T11:03:37Z
In Correlation Clustering, the input is a graph $G=(V,E)$ with weight function $ω: {V \choose 2}\to \mathbb{Z}$
and the task is to partition the vertex set into clusters such that
the total weight of edges between clusters and missing edges
inside clusters is minimized. Due to close connections
between Correlation Clustering and Edge Multicut,
deciding whether there is a partition with total cost at most $k$ is
FPT with respect to $k$ but a polynomial kernel is presumably
impossible. We study the influence of the structure of the fuzzy
edge graph, that is, the graph induced by the weight-0 edges, on the
problem complexity. We show in particular that Correlation
Clustering admits a polynomial problem kernel when parameterized
by $k+d$, where $d$ is the degeneracy of the fuzzy edge graph, and when
parameterized by $k+c$, where $c$ is the closure of the fuzzy edge
graph. We complement these positive results by showing hardness for
several settings where the graph induced by the edges and nonedges has very restricted structure.
Published: 2026-05-13T11:03:37Z
Authors: Jaroslav Garvardt, Christian Komusiewicz

http://arxiv.org/abs/2605.13299v1
Strong Conflict-Free Vertex-Connection via Twin Cover: Kernelization and Chromatic Bounds
Updated: 2026-05-13T10:13:01Z
A vertex-coloring of a connected graph $G$ is a strong conflict-free vertex-connection coloring if every two distinct vertices are joined by a shortest path on which some color appears exactly once. The minimum number of colors in such a coloring is the strong conflict-free vertex-connection number $\operatorname{svcfc}(G)$. We study this problem under the parameter twin cover.
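On small graphs the definition of a strong conflict-free vertex-connection coloring can be checked by brute force. For each pair of vertices, the sketch enumerates all shortest paths by walking back through BFS layers and asks whether one of them has a color that occurs exactly once among its vertices. It assumes an undirected graph given as an adjacency dict and counts colors on all vertices of a path, endpoints included. The function names are hypothetical, and the enumeration is exponential in the worst case.

```python
from collections import Counter, deque
from itertools import combinations

def shortest_paths(adj, s, t):
    """Yield every shortest s-t path (as a vertex list) in an unweighted undirected graph."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if t not in dist:
        return

    def extend(v, suffix):                         # step back towards s one BFS layer at a time
        if v == s:
            yield [s] + suffix
            return
        for u in adj[v]:
            if dist.get(u) == dist[v] - 1:
                yield from extend(u, [v] + suffix)

    yield from extend(t, [])

def is_strong_cf_coloring(adj, color):
    """Hypothetical checker: every pair of distinct vertices needs a shortest path
    on which some color appears exactly once (endpoints included)."""
    for a, b in combinations(adj, 2):
        if not any(1 in Counter(color[v] for v in path).values()
                   for path in shortest_paths(adj, a, b)):
            return False
    return True
```

Adjacent vertices form a shortest path of two vertices, so any valid coloring must be proper, which is consistent with the lower bound $χ(G) \le \operatorname{svcfc}(G)$ stated in this abstract.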
Let $X$ be a twin cover of $G$ of size $t$, and let $k$ be the target number of colors. In our first result, given $(G,k)$ together with a twin cover $X$, we reduce in polynomial time to an equivalent annotated instance on at most $\max\{2,t+(t+1)k2^{t+k-1}\}$ vertices. Hence the annotated version of Strong CFVC Number, in which a twin cover is supplied as part of the input, is fixed-parameter tractable parameterized by $t+k$. Using this bound, we then obtain a kernel parameterized by $\operatorname{tc}(G)+k$; in particular, for every fixed $k$, the problem is fixed-parameter tractable parameterized by the twin-cover number alone.
In our second result, we prove that every connected graph $G$ with twin cover $X$ of size $t$ satisfies $χ(G)\le \operatorname{svcfc}(G)\le χ(G)+t$. More generally, if $Y\subseteq X$ intersects every shortest path of length at least $3$, then $\operatorname{svcfc}(G)\le χ(G)+|Y|$. We also derive an exact expression for the chromatic number on graphs of bounded twin-cover number: for every proper coloring $\varphi$ of $G[X]$, the minimum number of colors needed to extend $\varphi$ to all of $G$ is $K_\varphi=\max_{S\subseteq X}(|\varphi(S)|+m(S))$, and hence $χ(G)=\min_{\varphi\text{ proper on }G[X]} K_\varphi$. Our results provide the first evidence that twin cover is a useful parameter for strong conflict-free vertex-connection and show that, once a twin cover is fixed, the remaining difficulty is concentrated in a bounded additive gap above the chromatic number.
Published: 2026-05-13T10:13:01Z
Comment: Accepted to COCOON 2026; to appear in Springer LNCS
Authors: Samuel German

http://arxiv.org/abs/2602.13155v2
Learning to Approximate Uniform Facility Location via Graph Neural Networks
Updated: 2026-05-13T09:47:05Z
Neural networks, particularly message-passing neural networks (MPNNs), are increasingly used as heuristics for hard combinatorial optimization problems. Yet many learning-based methods rely on supervision, reinforcement learning, or gradient estimators, causing high computational cost, unstable training, or limited guarantees. Classical approximation algorithms provide worst-case guarantees but are non-differentiable and cannot adapt to structure in natural input distributions. We study this tradeoff through Uniform Facility Location (UniFL), a problem with applications in clustering, summarization, logistics, and supply chains. We propose a fully differentiable MPNN that incorporates approximation-algorithmic principles without solver supervision or discrete relaxations.
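In Uniform Facility Location every point may open a facility at the same cost $f$, and the objective over the set $F$ of open facilities is $f \cdot |F| + \sum_j \min_{i \in F} d(i,j)$. As a classical non-learned baseline, the sketch below evaluates that objective and implements a radius-based rule in the style of Mettu and Plaxton. Each point $i$ gets the radius $r_i$ solving $\sum_j \max(0, r_i - d(i,j)) = f$, and points are opened in increasing order of $r_i$ unless an open facility already lies within $2r_i$. The sketch assumes a metric given as a dense distance matrix `D`; it is not the paper's MPNN, and the function names are hypothetical.

```python
import numpy as np

def ufl_cost(D, opened, f):
    """f per open facility plus each point's distance to its nearest open facility (opened non-empty)."""
    cols = sorted(opened)
    return f * len(cols) + D[:, cols].min(axis=1).sum()

def radius(dists, f):
    """Solve sum_j max(0, r - d_j) = f for r, where dists includes the point's own distance 0 and f > 0."""
    d = np.sort(dists)
    prefix = 0.0
    for k in range(1, len(d) + 1):
        prefix += d[k - 1]
        r = (f + prefix) / k                       # root if exactly the k nearest points contribute
        if k == len(d) or r <= d[k]:
            return r

def radius_greedy(D, f):
    """Open points by increasing radius unless an already open facility is within twice the radius."""
    r = np.array([radius(row, f) for row in D])
    opened = []
    for i in np.argsort(r, kind="stable"):
        if all(D[i, j] > 2 * r[i] for j in opened):
            opened.append(int(i))
    return opened
```

On two tight, well-separated clusters with a modest $f$, the rule opens exactly one facility per cluster; the returned list follows radius order, so sort it before comparing with an expected set.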
The model has provable approximation guarantees and empirically improves on standard approximation algorithms, narrowing the gap to integer linear programming.
Published: 2026-02-13T18:08:23Z
Comment: ICML 2026
Authors: Chendi Qian, Christopher Morris, Stefanie Jegelka, Christian Sohler

http://arxiv.org/abs/2605.13264v1
Distributed Approximate Maximum Matching and Minimum Vertex Cover via Generalized Graph Decomposition
Updated: 2026-05-13T09:45:29Z
The classic lower bound of Kuhn, Moscibroda and Wattenhofer [JACM 2016] states that approximate maximum matching and approximate vertex cover (among other problems) in the LOCAL model require $Ω(\min\{\sqrt{\frac{\log n}{\log\log n}}, \frac{\log Δ}{\log\log Δ}\})$ rounds, for any polylogarithmic or smaller approximation ratio. As a function of $Δ$, this complexity was subsequently matched for constant-approximate weighted vertex cover [Bar-Yehuda, Censor-Hillel and Schwartzman, JACM 2017] and constant-approximate maximum matching [Bar-Yehuda, Censor-Hillel, Ghaffari and Schwartzman, PODC 2017]. One might expect, therefore, that the true complexity should be $Θ(\frac{\log Δ}{\log\log Δ})$, and that the $n$-dependent term in the lower bound is just an artefact of the proof method.
We show that this is not the case: a term dependent on $n$ is in fact required. Specifically, we give randomized algorithms for $(2+\varepsilon)$-approximate maximum matching and approximate (weighted) minimum vertex cover taking $O(\frac{\log n}{\log^2 \log n})$ rounds. Our algorithms are based on a novel graph decomposition result generalizing the method of Miller, Peng and Xu [SPAA 2013], which we use to reduce the "effective" degree of high-degree graphs. We expect that this decomposition may be of further use for other problems.
Published: 2026-05-13T09:45:29Z
Comment: To appear at PODC 2026
Authors: Peter Davies-Peck
DOI: 10.1145/3796701.3815909

http://arxiv.org/abs/2603.27405v2
DynamicLogLog: Faster, Smaller, and More Accurate Cardinality Estimation
Updated: 2026-05-13T07:43:24Z
Cardinality estimation, i.e., calculating the number of distinct elements in a stream, is a longstanding problem with applications from networking to bioinformatics. HyperLogLog (HLL), the prevailing standard, has a well-known error spike in its transition region and requires 6 bits per bucket, with data structure size scaling as $B \cdot \log(\log(\text{cardinality}))$. We present DynamicLogLog (DLL), which uses a shared exponent across all buckets, storing only relative leading-zero counts. This yields three benefits: (1) only 4 bits per bucket (33% memory reduction), (2) an early exit mask that rejects >99.9% of elements at high cardinality before any bucket access (over 10x faster than HLL when bandwidth-constrained), and (3) a flat error profile via Dynamic Linear Counting (DLC) and a Logarithmic Hybrid Blend that eliminates HLL's transition artifact. Squaring the maximum representable cardinality requires only a single additional bit of global state. At 2,048 buckets with 512k simulations, DLL4's hybrid estimate achieves 1.830% mean and 1.834% peak absolute error using 1,024 bytes, compared to 1.84% mean and 34.1% peak for HLL using 1,536 bytes. DLC achieves 1.90% mean without correction factors.
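For orientation, the sketch below is a plain HyperLogLog baseline of the kind DLL is compared against. A 64-bit hash picks one of $B = 2^p$ buckets from its top $p$ bits, and each bucket keeps the largest rank (leading zeros plus one) seen in the remaining bits. The estimate is the bias-corrected harmonic mean $\alpha_B B^2 / \sum_j 2^{-M_j}$, with linear counting $B \ln(B/V)$ used while the raw estimate is at most $2.5B$ and $V > 0$ buckets are still empty. The choice of `hashlib.blake2b` as the hash and the class layout are assumptions. The sketch does not implement DLL's shared exponent, early exit mask, or DLC.

```python
import hashlib
import math

class HyperLogLog:
    """Plain HLL baseline (illustrative, not DLL): one small integer register per bucket."""

    def __init__(self, p: int = 11):               # B = 2**p buckets; p = 11 gives 2,048
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8).digest(), "big")
        bucket = h >> (64 - self.p)                 # top p bits select the bucket
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros in `rest`, plus one
        self.registers[bucket] = max(self.registers[bucket], rank)

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # standard constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            return m * math.log(m / zeros)          # linear counting for small cardinalities
        return raw
```

With $p = 11$ the rank never exceeds $54$, which is why each register needs 6 bits, matching the per-bucket cost quoted for HLL above; re-adding items already seen leaves every register, and hence the estimate, unchanged.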
DynamicUltraLogLog (UDLL6), a fusion of DLL and UltraLogLog, achieves ULL-level accuracy at 75% of the memory. History-corrected variants (Hybrid+n) and Layered DLC (LDLC) provide further improvements using per-state correction tables and anti-phase error cancellation.
Published: 2026-03-28T20:52:33Z
Comment: 35 pages, 18 figures, 2 listings (code)
Authors: Brian Bushnell