https://arxiv.org/api/KMQt8xUQT4+TNLQL63m7ekWyc+w 2026-06-21T16:33:16Z 29019 555 15 http://arxiv.org/abs/2605.07091v1 Estimating Correlation Clustering Cost in Node-Arrival Stream 2026-05-08T01:16:21Z

We study the correlation clustering problem in the node-arrival data stream model. Unlike previous work, where the stream consists of the graph's edges, we focus on the setting in which the stream contains only the nodes. This model better reflects many real-world scenarios in which the data stream naturally consists of raw objects (e.g., images, tweets), and the similar/dissimilar edges are derived through a similarity function. We present C$^4$Approx, a streaming algorithm that approximates the cost of correlation clustering using sublinear space in the number of nodes and a constant number of passes. We further complement this result with lower bounds. Experiments on real-world datasets show that by storing only 2% of the nodes, our algorithm achieves performance comparable to the classic Pivot algorithm and the more recent PrunedPivot algorithm, even on sparse graphs.
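For orientation, the classic Pivot baseline mentioned in the abstract, together with the correlation clustering cost it is judged by, can be sketched as follows. This is not C$^4$Approx: the sketch keeps every node in memory, and the `similar` callback is a hypothetical stand-in for the similarity function from which the similar/dissimilar edges are derived.

```python
import random
from itertools import combinations


def pivot_clustering(nodes, similar, rng):
    """Classic Pivot: visit nodes in random order; each still-unclustered node
    becomes a pivot and absorbs every unclustered node similar to it."""
    order = list(nodes)
    rng.shuffle(order)
    assigned = set()
    clusters = []
    for pivot in order:
        if pivot in assigned:
            continue
        cluster = [pivot] + [
            v for v in order
            if v != pivot and v not in assigned and similar(pivot, v)
        ]
        assigned.update(cluster)
        clusters.append(cluster)
    return clusters


def clustering_cost(nodes, similar, clusters):
    """Number of disagreements: similar pairs that are split across clusters
    plus dissimilar pairs that are placed in the same cluster."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = 0
    for u, v in combinations(nodes, 2):
        same_cluster = label[u] == label[v]
        if similar(u, v) != same_cluster:
            cost += 1
    return cost


# Toy input (an assumption for illustration): two groups of three objects,
# where objects are similar exactly when they belong to the same group.
toy_nodes = list(range(6))


def toy_similar(u, v):
    return u // 3 == v // 3


toy_clusters = pivot_clustering(toy_nodes, toy_similar, random.Random(0))
toy_cost = clustering_cost(toy_nodes, toy_similar, toy_clusters)
```

On this toy input the similarity relation is already a disjoint union of cliques, so Pivot recovers the two groups with zero disagreements regardless of the random order.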

2026-05-08T01:16:21Z ICML 2026 Kaiwen Liu Seba Daniela Villalobos Qin Zhang http://arxiv.org/abs/2605.07080v1 Online Allocation with Unknown Shared Supply 2026-05-08T00:59:11Z

Many real-world resource allocation systems, such as humanitarian logistics and vaccine distribution, must preposition limited supply across multiple locations before demand is realized, while stockouts incur irreversible service losses. To study this, we introduce the Online Shared Supply Allocation (OSSA) problem, a stateful online model in which a central hub allocates a finite, unknown supply to multiple sites facing sequential demand under fixed-charge transportation costs and lost-sales penalties. Unlike classical make-to-stock or make-to-order inventory models, OSSA precludes backlogging, and replenishment only hedges against future demand. To tackle OSSA, we propose a deterministic threshold-proportional policy GPA and prove that it achieves a $4/3$-approximation to the offline optimum up to an additive term independent of the total supply. We complement this with matching lower bounds showing that the $4/3$ ratio is tight and that the additive-error dependence is unavoidable, even for randomized algorithms that know the total supply upfront. Finally, we develop a learning-augmented extension to GPA that incorporates, in a principled way, the imperfect forecasts (e.g., from human experts or ML models) commonly available in practice, enabling us to exploit high-quality advice while remaining robust against arbitrarily bad advice. Synthetic and real-world experiments show that GPA outperforms natural baselines when global supply is scarce.

2026-05-08T00:59:11Z Tzeh Yuan Neoh Davin Choo Mengchu Yue Milind Tambe http://arxiv.org/abs/2510.07622v2 Conjugate queries can help 2026-05-08T00:12:53Z

We give a natural problem over input quantum oracles $U$ which cannot be solved even with exponentially many black-box queries to $U$ and $U^\dagger$, but which can be solved with a constant number of queries to $U$ and $U^*$, or $U$ and $U^{\mathrm{T}}$. We also demonstrate a quantum commitment scheme that is secure against adversaries that query only $U$ and $U^\dagger$, but is insecure if the adversary can query $U^*$. These results show that conjugate and transpose queries do give more power to quantum algorithms, lending credence to the idea put forth by Zhandry that cryptographic primitives should prove security against these forms of queries. Our key lemma is that any circuit using $q$ forward and inverse queries to a state preparation unitary for a state $σ$ can be simulated to $\varepsilon$ error with $n = \mathcal{O}(q^2/\varepsilon)$ copies of $σ$. Consequently, for decision tasks, algorithms using (forward and inverse) state preparation queries only ever perform quadratically better than sample access. We also identify a motif, which we call the "acorn trick", where generically strengthening a quantum resource can be possible if the output is allowed to be random, bypassing no-go theorems for deterministic algorithms. We demonstrate this idea for several settings, including controlization and purification.

2025-10-08T23:44:30Z 28 pages; v2 expanding and clarifying discussion of the acorn trick and random purification Ewin Tang John Wright Mark Zhandry http://arxiv.org/abs/2310.02243v2 Learning quantum Hamiltonians at any temperature in polynomial time 2026-05-08T00:00:03Z

We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $ρ= e^{-βH}/\textrm{tr}(e^{-βH})$ at a known inverse temperature $β>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $ε$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [Alhambra'22 (arXiv:2204.08349)], [Anshu, Arunachalam'22 (arXiv:2204.08349)], with prior work only resolving this in the limited cases of high temperature [Haah, Kothari, Tang'21 (arXiv:2108.04842)] or commuting terms [Anshu, Arunachalam, Kuwahara, Soleimanifar'21]. We fully resolve this problem, giving a polynomial time algorithm for learning $H$ to precision $ε$ from polynomially many copies of the Gibbs state at any constant $β> 0$. Our main technical contribution is a new flat polynomial approximation to the exponential function, and a translation between multi-variate scalar polynomials and nested commutators. This enables us to formulate Hamiltonian learning as a polynomial system. We then show that solving a low-degree sum-of-squares relaxation of this polynomial system suffices to accurately learn the Hamiltonian.

2023-10-03T17:50:26Z 66 pages; v2 minor edits, clarification on locality Ainesh Bakshi Allen Liu Ankur Moitra Ewin Tang http://arxiv.org/abs/2405.00082v4 Structure learning of Hamiltonians from real-time evolution 2026-05-07T23:58:59Z

We study the problem of Hamiltonian structure learning from real-time evolution: given the ability to apply $e^{-\mathrm{i} Ht}$ for an unknown local Hamiltonian $H = \sum_{a = 1}^m λ_a E_a$ on $n$ qubits, the goal is to recover $H$. This problem is already well-understood under the assumption that the interaction terms, $E_a$, are given, and only the interaction strengths, $λ_a$, are unknown. But how efficiently can we learn a local Hamiltonian without prior knowledge of its interaction structure? We present a new, general approach to Hamiltonian learning that not only solves the challenging structure learning variant, but also resolves other open questions in the area, all while achieving the gold standard of Heisenberg-limited scaling. In particular, our algorithm recovers the Hamiltonian to $\varepsilon$ error with total evolution time $O(\log (n)/\varepsilon)$, and has the following appealing properties: (1) it does not need to know the Hamiltonian terms; (2) it works beyond the short-range setting, extending to any Hamiltonian $H$ where the sum of terms interacting with a qubit has bounded norm; (3) it evolves according to $H$ in constant time $t$ increments, thus achieving constant time resolution. As an application, we can also learn Hamiltonians exhibiting power-law decay up to accuracy $\varepsilon$ with total evolution time beating the standard limit of $1/\varepsilon^2$.

2024-04-30T18:00:00Z 52 pages; v2 discussed more literature, qualified some claims; v3 minor correction discussing prior work; v4 strengthened main theorem Ainesh Bakshi Allen Liu Ankur Moitra Ewin Tang 10.1109/FOCS61266.2024.00069 http://arxiv.org/abs/2602.11476v2 Bounded Local Generator Classes for Deterministic State Evolution 2026-05-07T23:12:01Z

We define a bounded local generator class (BLGC) for deterministic state evolution on graph-indexed systems. The construction consists of finite-range generators operating on bounded local state under deterministic composition. Each update acts only on a bounded-radius neighborhood and applies a bounded local transformation with projection onto a compact state domain. Under the BLGC constraints, per-step operator work remains independent of total system size $M$. Specifically, incremental update cost satisfies $W_t = O(1)$ with respect to $M \to \infty$ for fixed interaction radius $r$. The framework admits a Hilbert-space embedding in $\ell^2(V)\otimes \mathbb{R}^d$ and yields bounded operators under composition on admissible subspaces. The result establishes a structural decoupling between global state capacity and incremental computational work. The claims apply specifically to the bounded local generator class defined in this paper.

2026-02-12T01:24:27Z 42 pages, 1 figure. Introduces bounded local generator classes BLGC for deterministic locality-preserving state evolution with dimension-work decoupling under bounded interaction radius R. Jay Martin http://arxiv.org/abs/2502.09834v2 Optimal $k$-Secretary with Logarithmic Memory 2026-05-07T21:39:36Z

We study memory-bounded algorithms for the $k$-secretary problem. The algorithm of Kleinberg (SODA 2005) achieves an optimal competitive ratio of $1 - O(1/\sqrt{k})$, yet a straightforward implementation requires $Ω(k)$ memory. Our main result is a $k$-secretary algorithm that matches the optimal competitive ratio using $O(\log k)$ words of memory. We prove this result by establishing a general reduction from $k$-secretary to (random-order) quantile estimation, the problem of finding the $k$-th largest element in a stream. We show that a quantile estimation algorithm with an $O(k^α)$ expected error (in terms of the rank) gives a $(1 - O(1/k^{1-α}))$-competitive $k$-secretary algorithm with $O(1)$ extra words. We then introduce a new quantile estimation algorithm that achieves an $O(\sqrt{k})$ expected error bound using $O(\log k)$ memory. Of independent interest, we give a different algorithm that uses $O(\sqrt{k})$ words and finds the $k$-th largest element exactly with high probability, generalizing a result of Munro and Paterson (1980).
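To make the memory issue concrete, here is a minimal sketch of the exact baseline for the quantile task named in the abstract (finding the $k$-th largest element in a stream). It is not the paper's algorithm: it keeps a min-heap of the $k$ largest values seen so far and therefore uses on the order of $k$ words, which is the cost the memory-bounded results aim to avoid. The function name is a hypothetical choice.

```python
import heapq


def kth_largest_exact(stream, k):
    """Return the k-th largest element of the stream.

    Keeps a min-heap holding the k largest values seen so far, so memory
    grows linearly with k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    heap = []
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x displaces the smallest of the current top-k values.
            heapq.heapreplace(heap, x)
    if len(heap) < k:
        raise ValueError("stream has fewer than k elements")
    return heap[0]
```

The heap root is always the smallest of the retained values, so once the stream ends it is exactly the $k$-th largest element.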

2025-02-14T00:29:31Z To appear at ICALP 2026 Mingda Qiao Wei Zhang http://arxiv.org/abs/2605.06948v1 Modern column generation for estimating single- and multi-purchase ranked list choice models 2026-05-07T21:07:08Z

This paper studies the estimation of ranked-list discrete choice models with single and multiple purchases. In this setting, each consumer type is characterized by a ranking over a subset of products and a desired number of purchases, and the estimation task is to identify the set of consumer types and their probabilities that best explain the observed transactional data. This problem is computationally challenging due to the exponential number of possible consumer types and becomes more difficult when multiple purchases are allowed. We propose a column generation framework for this problem. Our main contribution is a dynamic programming algorithm for the column generation subproblem, which generalizes the linear ordering problem. The algorithm incorporates acceleration techniques to improve computational efficiency. To the best of our knowledge, this is the first dynamic programming-based approach for generating consumer types in non-parametric models. The proposed framework supports multiple model variants with minor modifications. Computational experiments on synthetic and real data show substantial speedups over existing methods while maintaining high solution quality, and demonstrate effectiveness in both estimation and assortment optimization.

2026-05-07T21:07:08Z 55 pages with appendices Luciano Costa Gerardo Berbeglia Claudio Contardo Jean-François Cordeau http://arxiv.org/abs/2409.05020v2 A Performance Bound for the Greedy Algorithm in a Generalized Class of String Optimization Problems 2026-05-07T20:31:45Z

We present a simple performance bound for the greedy scheme in string optimization problems that obtains strong results. Our approach vastly generalizes the group of previously established greedy curvature bounds by Conforti and Cornuéjols (1984). We consider three constants, $α_G$, $α_G'$, and $α_G''$ introduced by Conforti and Cornuéjols (1984), that are used in performance bounds of greedy schemes in submodular set optimization. We first generalize both of the $α_G$ and $α_G''$ bounds to string optimization problems in a manner that includes maximizing submodular set functions over matroids as a special case. We then derive a much simpler and computable bound that allows for applications to a far more general class of functions with string domains. We prove that our bound is superior to both the $α_G$ and $α_G''$ bounds and provide a counterexample to show that the $α_G'$ bound is incorrect under the assumptions in Conforti and Cornuéjols (1984). We conclude with two applications. The first is an application of our result to sensor coverage problems. We demonstrate our performance bound in cases where the objective function is set submodular and string submodular. The second is an application to a social welfare maximization problem with black-box utility functions.
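The greedy scheme whose performance is being bounded can be sketched generically: at each step, append the symbol that maximizes the objective of the extended string. The sketch below is an illustration under assumptions, not the paper's setup; the toy sensor-coverage objective, the names `greedy_string` and `coverage`, and the data are hypothetical.

```python
def greedy_string(actions, objective, horizon):
    """Build a string of length `horizon` by repeatedly appending the action
    that maximizes the objective of the extended string."""
    string = ()
    for _ in range(horizon):
        best = max(actions, key=lambda a: objective(string + (a,)))
        string = string + (best,)
    return string


# Toy sensor-coverage instance (hypothetical): each action activates a sensor
# that covers a set of cells, and the objective counts distinct covered cells.
SENSOR_CELLS = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {5},
}


def coverage(string):
    covered = set()
    for action in string:
        covered |= SENSOR_CELLS[action]
    return len(covered)


toy_string = greedy_string(["a", "b", "c"], coverage, horizon=2)
```

This toy objective ignores the order of the symbols, which corresponds to the set-function special case; an order-dependent objective can be passed in the same way, since the scheme only evaluates `objective` on candidate strings.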

2024-09-08T08:20:59Z This is the accepted version of the paper for IEEE Transactions on Automatic Control IEEE Transactions on Automatic Control, vol. 71, no. 4, pp. 2305-2315, April 2026 Brandon Van Over Bowen Li Edwin K. P. Chong Ali Pezeshki 10.1109/TAC.2025.3626265 http://arxiv.org/abs/2606.12417v1 Assessing Student Ability to Select an Algorithmic Paradigm 2026-05-07T20:19:52Z

Computer science students are expected to be able to look at a problem and select an appropriate algorithm design paradigm to use to produce a solution. However, there is little research on how students determine which algorithmic paradigm to use. Historically, researchers have relied on free-response questions or interviews to assess students' knowledge of algorithmic paradigm selection. To successfully evaluate and scale teaching interventions for selecting an algorithm design paradigm, we need to efficiently test a student's ability to select among different design paradigms. Here, we present the first attempt to assess students' ability to select an algorithm design paradigm using multiple-choice questions. We present the construction of the \textit{algorithmic paradigm selection assessment} (APSA) and preliminary data demonstrating its effectiveness as an assessment. We discuss the key lessons we learned during this process about writing multiple-choice questions for algorithm design paradigms. We tested the internal consistency of our assessment using Cronbach's $α$ and obtained a score of $0.73$, which is above the commonly used threshold of $0.7$. APSA can be used across institutions as a standardized way to assess students' ability to select different algorithm design paradigms. APSA will assist researchers in evaluating whether a theory helps students improve their knowledge of different algorithm design paradigms.
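For readers unfamiliar with the internal-consistency statistic, Cronbach's $α$ for $k$ items is $α = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $s_i^2$ is the variance of item $i$ and $s_T^2$ is the variance of the total scores. A minimal sketch follows; the score matrices are made-up toy data, not APSA results, and the function name is a hypothetical choice.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix.

    `scores` has one row per student and one column per item.
    Sample variances are used for both the items and the totals.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_columns = list(zip(*scores))
    sum_item_variance = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum_item_variance / total_variance)


# Toy data: two items that always agree, and two items that are unrelated.
consistent_scores = [[1, 1], [0, 0], [1, 1], [0, 0]]
unrelated_scores = [[1, 0], [0, 1], [1, 1], [0, 0]]
```

Items that always agree give $α = 1$, while the unrelated pair gives $α = 0$; values such as the reported $0.73$ fall between these extremes.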

2026-05-07T20:19:52Z Dip Kiran Pradhan Newar Michael Shindler Seth Poulsen http://arxiv.org/abs/2605.06900v1 Accelerated Relax-and-Round for Concave Coverage Problems 2026-05-07T20:00:40Z

We present an accelerated relax-and-round algorithm for concave coverage problems, which generalize the classic maximum coverage problem. Building on the relax-and-round framework of Barman et al. [STACS 2021], we propose two significant improvements. First, we replace the linear programming (LP) relaxation step with a projected accelerated gradient method applied to a smooth surrogate objective to achieve a $\widetilde{O}(mn \varepsilon^{-1})$ running time. Second, we use a specialized rounding scheme for the hypersimplex that combines the Carathéodory decomposition algorithm in Karalias et al. [NeurIPS 2025] with randomized swap rounding of Chekuri et al. [FOCS 2010]. We prove tight approximation ratios for new reward functions, including a $0.827$-approximation for the logarithmic reward $\varphi(x) = \log(1 + x)$. Finally, we conduct maximum multi-coverage experiments on synthetic and real-world graphs, demonstrating that our algorithm outperforms approaches that use state-of-the-art LP solvers.
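As a reading aid, the objective being maximized can be sketched under the usual formulation of concave coverage, stated here as an assumption: an element covered $x$ times by the chosen sets contributes a reward $\varphi(x)$, with $\varphi$ concave and $\varphi(0) = 0$. The sketch assumes unit element weights and only evaluates the objective; it does not implement the relax-and-round algorithm. The name `coverage_value` is hypothetical.

```python
import math
from collections import Counter


def coverage_value(chosen_sets, phi):
    """Sum of phi(number of chosen sets containing e) over covered elements e.

    Elements covered zero times contribute phi(0) = 0 and are skipped.
    """
    counts = Counter(e for s in chosen_sets for e in s)
    return sum(phi(c) for c in counts.values())


def max_coverage_reward(x):
    # Classic maximum coverage: an element counts once, however often covered.
    return min(1, x)


def log_reward(x):
    # The logarithmic reward phi(x) = log(1 + x) from the abstract.
    return math.log1p(x)


toy_sets = [{1, 2}, {2, 3}]
```

With `toy_sets`, element 2 is covered twice: the maximum coverage reward counts it once, whereas the logarithmic reward gives it $\log 3$ instead of $\log 2$, which is how diminishing returns for repeated coverage enter the objective.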

2026-05-07T20:00:40Z 47 pages, 6 figures Matthew Fahrbach Mehraneh Liaee Morteza Zadimoghaddam http://arxiv.org/abs/2605.06899v1 Polylogarithmic Approximation for Covering and Connecting Multi-Interface Networks 2026-05-07T19:59:51Z

We study problems related to connecting multi-interface networks of wireless devices. These problems are modeled using graphs, where vertices represent the devices and edges represent potential communication links. Each vertex can activate multiple interfaces, and a connection between two vertices is established if they share at least one common active interface. We consider two problems arising in multi-interface networks: Coverage and Connectivity. In the Coverage problem, every connection defined in the network must be established, while in the Connectivity problem, groups of terminals specified in the input should be connected. The solution should minimize the maximum cost incurred by a vertex or the total cost incurred by all vertices. In this work, we are interested in approximating the former of the two cost criteria. We model both problems using ILPs and we design approximation algorithms based on a randomized rounding of the solution of the linear programming relaxation. For the Coverage problem, this yields an $O(\log m)$-approximation algorithm, which is tight, since the problem generalizes Set-Cover. This improves upon the $O(b\cdot\log n)$-approximation algorithm, where $b$ is a certain graph parameter which can be as large as $Ω(n)$ [Algorithmica '12]. The same relaxation can also be used to get a $k$-approximation algorithm, where $k$ is the number of different interfaces. This generalizes a similar result for the uniform cost case. For the Connectivity problem, we obtain an $O(\log^2 m)$-approximation algorithm, which is the first non-trivial approximation algorithm for this problem. The algorithm is based on a similar LP relaxation with additional cut constraints to ensure connectivity. The rounding procedure is similar to the one for the Coverage problem but requires a more careful analysis to ensure that the connectivity constraints are satisfied.

2026-05-07T19:59:51Z 14 pages Michał Szyfelbein Camille Richer http://arxiv.org/abs/2605.06555v1 Fast decremental tree sums in forests 2026-05-07T16:49:35Z

We study two fundamental decremental dynamic graph problems. In both problems, we need to maintain a vertex-weighted forest of size $n$ under edge deletions, weight updates, and a certain information-retrieval query. Both problems can be solved in $O(\log n)$ time per update/query using standard dynamic forest data structures like top trees, even if additionally edge insertions are allowed. We investigate whether the deletion-only problem can be solved faster. First, we consider $\texttt{tree-sum}$ queries, where we ask for the sum of vertex weights in one of the connected components (i.e., trees) in the forest. We give a data structure with $O(n)$ preprocessing time and $O(\log^* n)$ time per operation, based on a micro-macro tree decomposition (Alstrup et al., 1997). If the forest is unweighted (i.e., all weights are 1 and cannot be changed), then the operation time can be improved to $O(1)$. Additionally, we give an asymptotically universally optimal algorithm. More specifically, our algorithm works in the group model, and processes $m$ operations on an initial forest $F$ in running time $O( \mathrm{OPT}(F, m) )$. Here $\mathrm{OPT}(F, m)$ is the number of weight additions and subtractions that a best possible algorithm performs to handle a worst-case instance for a fixed initial forest $F$ and a fixed number $m$ of operations. We achieve this with a combination of the aforementioned decomposition technique, precomputation of optimal data structures for very small instances, and some insights into the behavior of $\mathrm{OPT}$. Note that even the worst-case complexity of this algorithm remains unknown to us. Second, we consider $\texttt{subtree-sum}$ queries. Here, the forest is rooted, and a query $\texttt{subtree-sum}(v)$ returns the sum of weights in the subtree rooted at $v$. We show tight bounds for several variants of this problem. [...]
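The three operations of the first problem (edge deletion, weight update, and the $\texttt{tree-sum}$ query) can be pinned down with a deliberately naive reference implementation. This is only a baseline for fixing the interface, with $O(n)$ time per query via breadth-first search; it is none of the data structures described in the abstract, and the class and method names are hypothetical.

```python
from collections import deque


class NaiveDecrementalForest:
    """Vertex-weighted forest supporting edge deletions, weight updates,
    and tree-sum queries, each query answered by a full traversal."""

    def __init__(self, weights, edges):
        self.weight = dict(weights)
        self.adj = {v: set() for v in self.weight}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)

    def delete_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def update_weight(self, v, new_weight):
        self.weight[v] = new_weight

    def tree_sum(self, v):
        """Sum of vertex weights in the tree (connected component) of v."""
        seen = {v}
        queue = deque([v])
        total = 0
        while queue:
            x = queue.popleft()
            total += self.weight[x]
            for y in self.adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return total


# Toy instance: the path 1 - 2 - 3 - 4 with weight w(v) = v.
toy_forest = NaiveDecrementalForest({1: 1, 2: 2, 3: 3, 4: 4}, [(1, 2), (2, 3), (3, 4)])
```

Deleting the edge (2, 3) in the toy instance splits the path into two trees, after which queries on either side see only their own component.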

2026-05-07T16:49:35Z Benjamin Aram Berendsohn Marek Sokołowski http://arxiv.org/abs/2605.06398v1 On the Parameterized Approximability of (Mergeable) Sum of Radii Clustering 2026-05-07T15:10:33Z

The sum of radii problem ($k$-MSR) asks, given a metric space on $n$ points, to place $k$ balls covering all points so as to minimize the sum of their radii. Despite extensive study from the perspectives of approximation and parameterized algorithms, the exact parameterized complexity of the problem and the existence of efficient parameterized approximation schemes remained open. We advance this understanding on both the hardness and algorithmic fronts. We begin by showing that $k$-MSR is $W[2]$-hard parameterized by $k$, thereby pinpointing its location in the $W$-hierarchy. Moreover, via our reduction, we rule out efficient parameterized approximation schemes (EPAS)--that is, $(1+ε)$-approximations running in time $f(k,ε)\cdot \mathrm{poly}(n)$--unless $W[2] = FPT$. Assuming the Exponential Time Hypothesis, we further rule out such algorithms running in time $f(k,ε)\cdot n^{o(k)}$, strengthening recent lower bounds for the problem. On the algorithmic side, we study $k$-MSR under the framework of mergeable constraints, which captures a broad class of clustering constraints, including fairness, diversity, and lower bounds. We obtain an FPT $(\frac{8}{3}+ε)$-approximation, improving upon the previous best guarantee of $(4+ε)$. Moreover, given access to a suitable assignment subroutine, we achieve a $(2+ε)$-approximation, matching the best known bound for the unconstrained problem. This, in turn, yields $(2+ε)$ FPT-approximations for several important settings, including $(t,k)$-fair, $(α,β)$-fair, $\ell$-diversity, and private clustering.

2026-05-07T15:10:33Z 25 pages, 3 figures Ameet Gadekar http://arxiv.org/abs/2505.18879v5 Efficient Online Random Sampling via Randomness Recycling 2026-05-07T14:32:09Z

This article studies the fundamental problem of using i.i.d. coin tosses from an entropy source to efficiently generate random variables $X_i \sim P_i$ $(i \ge 1)$, where $(P_1, P_2, \dots)$ is a random sequence of rational discrete probability distributions subject to an \textit{arbitrary} stochastic process. Our method achieves an amortized expected entropy cost within $\varepsilon > 0$ bits of the information-theoretically optimal Shannon lower bound using $O(\log(1/\varepsilon))$ space. This result holds both pointwise in terms of the Shannon information content conditioned on $X_i$ and $P_i$, and in expectation to obtain a rate of $\mathbb{E}[H(P_1) + \dots + H(P_n)]/n + \varepsilon$ bits per sample as $n \to \infty$ (where $H$ is the Shannon entropy). The combination of space, time, and entropy properties of our method improves upon the Knuth and Yao (1976) entropy-optimal algorithm and Han and Hoshi (1997) interval algorithm for online sampling, which require unbounded space. It also uses exponentially less space than the more specialized methods of Kozen and Soloviev (2022) and Shao and Wang (2025) that generate i.i.d. samples from a fixed distribution. Our online sampling algorithm rests on a powerful algorithmic technique called \textit{randomness recycling}, which reuses a fraction of the random information consumed by a probabilistic algorithm to reduce its amortized entropy cost. On the practical side, we develop randomness recycling techniques to accelerate a variety of prominent sampling algorithms. We show that randomness recycling enables state-of-the-art runtime performance on the Fisher-Yates shuffle when using a cryptographically secure pseudorandom number generator, and that it reduces the entropy cost of discrete Gaussian sampling. Accompanying the manuscript is a performant software library in the C programming language.
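To see where entropy is lost in the first place, consider a plain Fisher-Yates shuffle driven by fair coin tosses, with each index drawn by rejection sampling. The sketch below is that baseline only and does not implement randomness recycling: rejected draws simply discard their bits, which is the kind of waste recycling is meant to reduce. `CoinSource` is a hypothetical wrapper that counts the bits consumed.

```python
import math
import random


class CoinSource:
    """Fair coin tosses from a seeded PRNG, counting how many are consumed."""

    def __init__(self, seed):
        self._rng = random.Random(seed)
        self.bits_used = 0

    def flip(self):
        self.bits_used += 1
        return self._rng.getrandbits(1)


def uniform_below(n, coins):
    """Uniform integer in [0, n), by rejection sampling on fixed-width draws."""
    if n == 1:
        return 0
    width = (n - 1).bit_length()
    while True:
        value = 0
        for _ in range(width):
            value = (value << 1) | coins.flip()
        if value < n:
            return value
        # Rejected: all `width` bits of this draw are thrown away.


def fisher_yates(items, coins):
    """Return a uniformly random permutation of `items`."""
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = uniform_below(i + 1, coins)
        a[i], a[j] = a[j], a[i]
    return a


toy_coins = CoinSource(seed=0)
toy_shuffled = fisher_yates(range(10), toy_coins)
toy_entropy_bound = math.log2(math.factorial(10))
```

A uniform permutation of 10 items carries $\log_2(10!) \approx 21.8$ bits of entropy, while this baseline consumes at least 25 coin tosses even when no draw is rejected; comparing `toy_coins.bits_used` with `toy_entropy_bound` shows the gap for a particular run.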

2025-05-24T21:34:08Z Proceedings of the 2026 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 2473-2511. Society for Industrial and Applied Mathematics, 2026 Thomas L. Draper Feras A. Saad 10.1137/1.9781611978971.89