Mistake-Bounded Language Generation

2026-05-11T16:32:01Z

We investigate the learning task of language generation in the limit, but shift focus from the traditional time-of-last-mistake metric of a generator's success to a new notion of "mistake-bounded generation." While existing results for language generation in the limit focus on guaranteeing eventual consistency, they are blind to the cumulative error incurred during the learning process. We address this by shifting the goal to minimizing the total number of invalid elements output by a generation algorithm. We establish a formal reduction to the Learning from Correct Demonstrations framework of Joshi et al. (2025), enabling a general recipe for deriving mistake bounds via weighted update rules. For finite classes, we provide an algorithm that simultaneously achieves an optimal last-mistake time of $\mathsf{Cdim}(L)$ and a mistake bound of $\lfloor \log_2 |L| \rfloor$, whereas for the non-uniform setting of countably infinite streams of languages, we prove a fundamental trade-off: achieving logarithmic mistakes $O(\log i)$ necessarily precludes convergence guarantees established in prior work. Finally, we show that our framework can be extended to accommodate noisy adversaries and guarantee mistake bounds that scale with the adversary's suboptimality.

An Efficient Algorithm for Minimizing Ordered Norms in Fractional Load Balancing

2026-05-11T15:41:23Z

We study the problem of minimizing an ordered norm of a load vector (indexed by a set of $d$ resources), where a finite number $n$ of customers $c$ contribute to the load of each resource by choosing a solution $x_c$ in a convex set $X_c \subseteq \mathbb{R}^d_{\geq 0}$; that is, we minimize $\|\sum_{c}x_c\|$ for some fixed ordered norm $\|\cdot\|$. We devise a randomized algorithm that computes a $(1+\varepsilon)$-approximate solution to this problem and makes, with high probability, $\mathcal{O}\left((n+d) (\varepsilon^{-2}+\log\log d)\log (n+d)\right)$ calls to oracles that minimize linear functions (with non-negative coefficients) over $X_c$. While this has been known for the $\ell_{\infty}$ norm via the multiplicative weights update method, existing proof techniques do not extend to arbitrary ordered norms. Our algorithm uses a resource price mechanism that is motivated by the follow-the-regularized-leader paradigm, and is expressed by smooth approximations of ordered norms. Our analysis relies on non-trivial stability properties of these approximations, which we prove and which may be of independent interest. For each customer, we define dynamic cost budgets, which evolve throughout the algorithm, to determine the allowed step sizes. This leads to non-uniform updates, and the algorithm may even reject certain oracle solutions. Using non-uniform sampling together with a martingale argument, we can guarantee sufficient expected progress in each iteration, and thus bound the total number of oracle calls with high probability.
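
As a small illustration of the objective only (not of the algorithm in the abstract), the sketch below evaluates an ordered norm of the total load. It assumes the standard definition: for non-increasing, non-negative weights $w_1 \geq \dots \geq w_d$, the norm is $\sum_i w_i |x|_{(i)}$, where $|x|_{(i)}$ is the $i$-th largest absolute entry. The function names `ordered_norm` and `total_load` and the example data are hypothetical.

```python
from typing import List, Sequence


def ordered_norm(x: Sequence[float], w: Sequence[float]) -> float:
    """Return sum_i w[i] * (i-th largest |x_j|) for non-increasing weights w >= 0."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    if any(w[i] < w[i + 1] for i in range(len(w) - 1)) or any(v < 0 for v in w):
        raise ValueError("weights must be non-negative and non-increasing")
    magnitudes = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * mi for wi, mi in zip(w, magnitudes))


def total_load(solutions: Sequence[Sequence[float]]) -> List[float]:
    """Sum the customers' solution vectors x_c coordinate-wise (one entry per resource)."""
    return [sum(column) for column in zip(*solutions)]


# Hypothetical instance: two customers, three resources.
load = total_load([[2, 0, 1], [1, 3, 0]])   # [3, 3, 1]
print(ordered_norm(load, [1, 0, 0]))        # ell_infinity norm: 3
print(ordered_norm(load, [1, 1, 1]))        # ell_1 norm: 7
print(ordered_norm(load, [1, 1, 0]))        # top-2 norm: 6
```

The weight vectors $(1,0,\dots,0)$ and $(1,\dots,1)$ recover the $\ell_\infty$ and $\ell_1$ norms, which is why the $\ell_\infty$ case mentioned in the abstract is a special case of this objective.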

On Identifying Critical Network Edges via Analyzing Changes in Shapes (Curvatures)

2026-05-11T15:35:10Z

In recent years, extensions of manifold Ricci curvature to discrete combinatorial objects such as graphs and hypergraphs (popularly called "network shapes") have found a plethora of applications in a wide spectrum of research areas, ranging from metabolic systems, transcriptional regulatory networks, protein-protein interaction networks, social networks, and brain networks to deep learning models. In contrast, they have been studied by relatively few researchers in the algorithms and computational complexity community. As an attempt to bring these network Ricci-curvature related problems under the lens of computational complexity and foster further inter-disciplinary interactions, we provide a formal framework for studying algorithmic and computational complexity issues for detecting critical edges in an undirected graph using Ollivier-Ricci curvatures, and we provide several algorithmic and inapproximability results for problems in this framework. Our results show some interesting connections between our problems, the exact perfect matching and perfect matching blocker problems for bipartite graphs, and two well-known combinatorial packing/covering problems.

Log-Averaged Mirror Prox for Fast, Large-Scale Optimal Transport in Linear Space

2026-05-11T15:32:20Z

We propose Log-Averaged Mirror Prox (LAMP), a linear-space primal-dual method for large-scale optimal transport. LAMP implements primal mirror prox updates by tracking an averaged dual sequence, reducing storage complexity from $O(nm)$ to $O(n+m)$ while preserving dense, GPU-friendly reductions. Consequently, LAMP preserves the last-iterate $\widetilde{O}(nm\varepsilon^{-1})$ arithmetic complexity of conservatively parameterized primal-dual mirror prox. We further analyze LAMP as a direct optimal transport solver in a more performant parameter regime, providing a last-iterate sub-optimality certificate dependent on infeasibility and an explicit $O(1/t)$ term. Moreover, we give a computable sufficient condition for best-iterate convergence to a saddle-point. Numerical experiments with an optimized CUDA implementation show that LAMP outperforms first-order baselines in several high-accuracy (entropic) optimal transport problems. LAMP is further shown to scale up to problems with $n=m=2^{18}$ marginal supports, which were previously beyond the reach of primal-dual first-order methods.

Handicap reduction for linear complementarity problems

2026-05-11T15:16:19Z

Linear Complementarity Problems (LCPs) with sufficient matrices form an important subclass of LCPs, and it remains a significant open question whether problems in this class can be solved in polynomial time. In 1991, Kojima, Megiddo, Noma, and Yoshise gave an Interior Point Algorithm (IPA) that can solve LCPs with sufficient matrices in time bounded polynomially in the input size and the so-called handicap number $\hat{κ}(M)$ of the coefficient matrix $M$. However, this value can be exponentially large in the bit encoding length. In fact, no upper bounds were previously known on $\hat{κ}(M)$. Settling an open question raised in de Klerk and E.-Nagy (Math Programming, 2011), we give an exponential upper bound on $\hat{κ}(M)$ in the bit-complexity of $M$. This is based on a new characterization of sufficient matrices. The new characterization also leads to a simple new proof of Väliaho's theorem on the equivalence of sufficient and $\mathcal{P}^*$-matrices (Linear Algebra and its Applications, 1996). Noting that one can obtain an equivalent LCP by rescaling the rows and columns by a positive diagonal matrix, we define $\hat{κ}^\star(M)$ as the best possible handicap number achievable under such rescalings. Our second main result is an algorithm for LCPs with sufficient matrices, where the running time is polynomially bounded in the input size and in the optimized value $\hat{κ}^\star(M)$. This algorithm is based on the observation that the set of near-optimal row-rescalings forms a convex set. Our algorithm combines the Ellipsoid Method over the set of row rescalings, and an IPA with running time dependent on the handicap number of the matrix. If the IPA fails to solve the LCP in the desired running time, it provides a separation oracle to the Ellipsoid Method to find a better rescaling.

Graded Projection Recursion (GPR): Corrections, Obstructions, and Conservative Approximate Matrix Multiplication

2026-05-11T14:55:16Z

Earlier versions proposed Graded Projection Recursion (GPR) as a deterministic packed-recursion framework for model-honest near-quadratic dense matrix multiplication. This revised version withdraws the exact dense matrix multiplication theorem and the downstream consequences that depended on it, and replaces them with a conservative approximate matrix multiplication (AMM) framework. The local ingredients remain useful as local tools: the three-band packing identity, scaled middle-band extraction under certified gaps, centering and reconstruction identities, and row/column normalization bounds. The gap in the earlier argument is global: the proof relied on a bounded active-state realization that would remove first-mismatch terms through the recursion. For arbitrary dense inputs this would require an exact equality filter over the inner index. We formulate this obstruction as a target-slice/equality-filter problem and give a rank/capacity argument against the natural separable active-state realization. The positive replacement is a conservative approximate matrix multiplication framework. For chosen protected left and right query subspaces, the low/marginal part of AB is computed exactly and an unbiased AMM primitive is applied only to the high/high residual. The resulting estimator is unbiased, preserves protected queries exactly in every realization, localizes stochastic error to the residual subspace, and inherits residual output-norm or query-risk guarantees from the underlying estimator.

Edge-weighted Online Stochastic Matching Under Jaillet-Lu LP

2026-05-11T13:44:42Z

The online stochastic matching problem was introduced by [FMMM09], together with the $(1-\frac1e)$-competitive Suggested Matching algorithm. In the most general edge-weighted setting, this ratio was not improved for more than a decade, until [Yan24] recently beat the $1-\frac1e$ bound and [QFZW23] further improved it to $0.650$. Both works measure the online competitiveness against the offline LP relaxation introduced by Jaillet and Lu [JL14]. The same LP has also played an important role in other settings as it is a natural choice for two-choice online algorithms. In this paper, we prove an upper bound of $0.663$ and a lower bound of $0.662$ for edge-weighted online stochastic matching under Jaillet-Lu LP. We propose a simple hard instance and identify the optimal online algorithm for this specific instance, which has a competitive ratio below $0.663$. Despite the simplicity of the instance, we then show that a near-optimal algorithm for it, which has a competitive ratio above $0.662$, can be generalized to work on all instances without any loss. As our algorithm is generalized from a real near-optimal algorithm instead of manually combining trivial strategies, it has two natural advantages compared with previous works: (1) its matching strategy varies from time to time; (2) it utilizes global information about offline vertices. On the other hand, the upper bound suggests that more powerful LPs and multiple-choice strategies are needed if we want to further improve the ratio by more than $0.001$. In addition to our main result, we also generalize the asymptotic equivalence between the Poisson arrival model and the original online stochastic matching established by [HS21], removing the requirement of approximate monotonicity for the online algorithm.

Faster Multi-Source Reachability and Approximate Distances via Shortcuts, Hopsets and Matrix Multiplication

2026-05-11T12:30:41Z

Given an $n$-vertex $m$-edge digraph $G = (V,E)$ and a subset $S \subseteq V$ of $|S| = n^σ$ (for some $0 \le σ \le 1$) designated sources, the $S \times V$ reachability problem is to compute the sets $\mathcal V_s$ of vertices reachable from $s$, for every $s \in S$. Naive centralized algorithms run BFS/DFS from each source in $O(m \cdot n^σ)$ time or compute $G$'s transitive closure in $\hat O(n^ω)$ time, where $ω \le 2.371552\ldots$ is the matrix multiplication exponent. Thus, the best known bound is $\hat O(n^{\min \{ 2 + σ, ω\}})$. Leveraging shortcut constructions by Kogan and Parter [SODA 2022, ICALP 2022], we develop a centralized algorithm with running time $\hat O(n^{1 + \frac{2}{3} ω(σ)})$, where $ω(σ)$ is the rectangular matrix multiplication exponent. Using current estimates on $ω(σ)$, our exponent improves upon $\min \{2 + σ, ω\}$ for $\tilde σ \leq σ \leq 0.53$, where $1/3 < \tilde σ < 0.3336$ is a universal constant. In a classical result, Cohen [Journal of Algorithms, 1996] devised parallel algorithms for $S \times V$ reachability on graphs admitting balanced recursive separators of size $n^ρ$ for $ρ < 1$, requiring polylogarithmic time and work $n^{\max \{ωρ, 2ρ + σ\} + o(1)}$. We significantly improve, extend, and generalize Cohen's result. First, our parallel algorithm for graphs with small recursive separators has lower work complexity than Cohen's in broad parameter ranges. Second, we generalize our algorithm to graphs of treewidth at most $n^ρ$ ($ρ < 1$) and provide a centralized algorithm that outperforms existing bounds for $S \times V$ reachability on such graphs. We also do this for some other graph families with small separators. Finally, we extend these results to $(1 + ε)$-approximate distance computation.
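
For reference, the naive baseline described in the abstract (one BFS per source, $O(m \cdot n^σ)$ time overall) can be sketched as follows. This is only the baseline, not the shortcut-based algorithm of the paper. The adjacency-dictionary representation and the function names are assumptions made for this sketch, and each source is counted as reachable from itself.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


def reachable_from(adj: Dict[Hashable, Iterable[Hashable]], source: Hashable) -> Set[Hashable]:
    """Breadth-first search over out-edges; returns every vertex reachable from source."""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def multi_source_reachability(adj, sources):
    """S x V reachability by running one independent BFS per source."""
    return {s: reachable_from(adj, s) for s in sources}


# Hypothetical digraph: 3 -> 0 -> 1 -> 2.
graph = {0: [1], 1: [2], 2: [], 3: [0]}
print(multi_source_reachability(graph, [0, 2]))
```

Each BFS touches every edge at most once, so $|S| = n^σ$ sources cost $O(m \cdot n^σ)$ in total, which is the bound the abstract compares against.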

Learning Confidence Ellipsoids and Applications to Robust Subspace Recovery

2026-05-11T12:24:28Z

We study the problem of finding confidence ellipsoids for an arbitrary distribution in high dimensions. Given samples from a distribution $D$ and a confidence parameter $α$, the goal is to find the smallest volume ellipsoid $E$ which has probability mass $\mathbb{P}_{D}[E] \ge 1-α$. Ellipsoids are a highly expressive class of confidence sets as they can capture correlations in the distribution, and can approximate any convex set. In statistics, this is the classic minimum volume estimator introduced by Rousseeuw as a robust non-parametric estimator of location and scatter. However, in high dimensions, it becomes NP-hard to obtain any non-trivial approximation factor in volume when the condition number $β$ of the ellipsoid (ratio of the largest to the smallest axis length) goes to $\infty$. This motivates the focus of our paper: can we efficiently find confidence ellipsoids with volume approximation guarantees when compared to ellipsoids of bounded condition number $β$? Our main result is a polynomial time algorithm that finds an ellipsoid $E$ whose volume is within an $O(β)^{γd}$ multiplicative factor of the volume of the best $β$-conditioned ellipsoid while covering at least $1-O(α/γ)$ probability mass for any $γ \in (0,1)$. In particular, setting $γ = o(1)$, this gives an $O(β)^{o(d)}$ volume approximation, with a multiplicative loss in miscoverage. We complement this with a computational hardness result that shows that such a dependence on $β$ seems necessary, even with some slack in coverage. The algorithm and analysis use the rich primal-dual structure of the minimum volume enclosing ellipsoid and the geometric Brascamp-Lieb inequality. As a consequence, we obtain the first polynomial time algorithm with approximation guarantees on worst-case instances of the robust subspace recovery problem.

Static to Dynamic Correlation Clustering

2026-05-11T12:04:54Z

Correlation clustering is a well-studied problem, first proposed by Bansal, Blum, and Chawla [Mach. Learn. '04]. The input is an unweighted, undirected graph. The problem is to cluster the vertices so as to minimize the number of edges between vertices in different clusters plus the number of missing edges between vertices inside the same cluster. This problem has wide applications in data mining and machine learning. We introduce a general framework that transforms existing static correlation clustering algorithms into fully-dynamic ones that work against an adaptive adversary. We show how to apply our framework to known efficient correlation clustering algorithms, starting from the classic 3-approximate Pivot algorithm from Ailon, Charikar and Newman [JACM'08]. Applied to the most recent sublinear $1.485$-approximation algorithm from Cao, Cohen-Addad, Lee, Li, Lolck, Newman, Thorup, Vogl, Yan and Zhang [STOC'25], we get a fully-dynamic $1.485$-approximation algorithm that works with worst-case constant update time. The original static algorithm gets its approximation factor with constant probability, and we get the same against an adaptive adversary in the sense that for any given update step, not known to our algorithm, our solution is a $1.485$-approximation with constant probability when we reach this update. Most previous dynamic algorithms, including the celebrated result from Behnezhad, Charikar, Ma and Tan [FOCS'19], had approximation factors around $3$ in expectation, and they could only handle an oblivious adversary. A recent algorithm by Braverman, Dharangutte, Pai, Shah, and Wang [AISTATS'25] could handle an adaptive adversary, but it has a large unspecified constant approximation ratio. This contrasts with our general transformation, which works with all the best approximation factors known for the static case.
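
To make the objective and the static starting point concrete, here is a minimal sketch of the classic Pivot algorithm (pick a random unclustered vertex, cluster it with its unclustered neighbors, repeat) together with the disagreement count it tries to minimize. This is only the static algorithm; it does not implement the dynamic framework described in the abstract. The function names and graph representation are assumptions of this sketch.

```python
import random
from itertools import combinations


def pivot_clustering(vertices, edges, rng=None):
    """Static Pivot: scan vertices in random order; each unclustered vertex
    becomes a pivot and absorbs its still-unclustered neighbors."""
    rng = rng or random.Random()
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = list(vertices)
    rng.shuffle(order)  # a random order is equivalent to repeated random pivot choice
    clustered, clusters = set(), []
    for pivot in order:
        if pivot in clustered:
            continue
        cluster = {pivot} | (adj[pivot] - clustered)
        clustered |= cluster
        clusters.append(cluster)
    return clusters


def disagreements(clusters, edges):
    """Edges across clusters plus missing edges inside clusters."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    edge_set = {frozenset(e) for e in edges}
    cut = sum(1 for e in edge_set if len({label[v] for v in e}) == 2)
    missing = sum(
        1
        for cluster in clusters
        for pair in combinations(cluster, 2)
        if frozenset(pair) not in edge_set
    )
    return cut + missing


# Hypothetical input: a path on three vertices.
path_edges = [(0, 1), (1, 2)]
clusters = pivot_clustering([0, 1, 2], path_edges, random.Random(0))
print(clusters, disagreements(clusters, path_edges))
```

On a path with three vertices every clustering has at least one disagreement, and Pivot always returns a clustering with exactly one, whichever vertex is chosen first.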

Temporal Graph Reconfiguration for Always-Connected Graphs

2026-05-11T09:59:49Z

Network redesign problems ask for modifications to the edges of a given graph to satisfy certain properties. In temporal graphs, where edges are only active at certain times, we are sometimes only allowed to modify when the edges are going to be active. In practice, we might not even be able to perform all of the necessary modifications at once; changes must be applied step-by-step while the network is still in operation, meaning that the network must continue to satisfy some properties. To initiate a study in this area, we introduce the class of temporal graph reconfiguration problems. As a starting point, we consider the Layered Connectivity Reconfiguration (LCR) problem: Given two always-connected temporal graphs G1 and G2, determine if it is possible to transform G1 into G2 by changing the time at which a single temporal edge is active in each step, such that every intermediate temporal graph is always-connected. We provide a dynamic programming algorithm for the LCR problem. We also show that finding the shortest reconfiguration sequence between two temporal graphs is APX-hard. Additionally, we show that the LCR problem is equivalent to the Spanning Tree Sequence Reconfiguration (STSR) problem introduced by Hanaka et al. Therefore, our results also answer the two open questions presented by the authors: (i) find a simpler algorithm for the STSR problem, (ii) show that the STSR problem is inapproximable up to some factor.
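
The reconfiguration step described above can be illustrated with a short feasibility check. The sketch assumes that "always-connected" means every snapshot (the static graph formed by the edges active at one time step) is connected on the full vertex set, and that a temporal graph is given as a set of triples `(u, v, t)` with time steps `1..lifetime`. Both the representation and the function names are assumptions of this sketch, not definitions taken from the paper.

```python
def is_connected(vertices, edges):
    """Depth-first search connectivity test on an undirected graph."""
    vertices = set(vertices)
    if not vertices:
        return True
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def is_always_connected(vertices, temporal_edges, lifetime):
    """Assumed definition: the snapshot at every time step 1..lifetime is connected."""
    return all(
        is_connected(vertices, [(u, v) for (u, v, t) in temporal_edges if t == step])
        for step in range(1, lifetime + 1)
    )


def move_edge(temporal_edges, edge, new_time):
    """One reconfiguration step: change the time at which a single temporal edge is active."""
    u, v, _ = edge
    return (set(temporal_edges) - {edge}) | {(u, v, new_time)}


# Hypothetical instance: a triangle at time 1 and a path at time 2.
V = {0, 1, 2}
G1 = {(0, 1, 1), (1, 2, 1), (0, 2, 1), (0, 1, 2), (1, 2, 2)}
print(is_always_connected(V, move_edge(G1, (0, 2, 1), 2), lifetime=2))
```

A reconfiguration sequence is valid in this model only if `is_always_connected` holds after every single `move_edge` step, which is the constraint that makes the problem non-trivial.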

Convex Optimization with Local Label Differential Privacy: Tight Bounds in All Privacy Regimes

2026-05-11T08:47:58Z

We study the problem of Stochastic Convex Optimization (SCO) under the constraint of local Label Differential Privacy (L-LDP). In this setting, the features are considered public, but the corresponding labels are sensitive and must be randomized by each user locally before being sent to an untrusted analyzer. Prior work for SCO under L-LDP (Ghazi et al., 2021) established an excess population risk bound with a \emph{linear} dependence on the size of the label space, $K$: $O\left({\frac{K}{ε\sqrt{n}}}\right)$ in the high-privacy regime ($ε\leq 1$) and $O\left({\frac{K}{e^ε \sqrt{n}}}\right)$ in the medium-privacy regime ($1 \leq ε\leq \ln K$). This left open whether this linear cost is fundamental to the L-LDP model. In this note, we resolve this question. First, we present a novel and efficient non-interactive L-LDP algorithm that achieves an excess risk of $O\left({\sqrt{\frac{K}{εn}}}\right)$ in the high-privacy regime ($ε\leq 1$) and $O\left({\sqrt{\frac{K}{e^ε n}}}\right)$ in the medium-privacy regime ($1 \leq ε\leq \ln K$). This quadratically improves the dependency on the label space size from $O(K)$ to $O(\sqrt{K})$. Second, we prove a matching information-theoretic lower bound across all privacy regimes for any sufficiently large $n$.
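
To illustrate what "randomizing a label locally" means, the sketch below implements $K$-ary randomized response, a textbook $ε$-locally-private mechanism: report the true label with probability $e^ε/(e^ε + K - 1)$ and otherwise a uniformly random different label. It is shown only as a familiar example of a local label randomizer. The abstract does not say which mechanism the new algorithm uses, and the function names are hypothetical.

```python
import math
import random


def keep_probability(num_labels: int, epsilon: float) -> float:
    """Probability of reporting the true label under K-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + num_labels - 1)


def randomized_response(label: int, num_labels: int, epsilon: float, rng: random.Random) -> int:
    """Randomize a label in {0, ..., num_labels - 1} on the user's side."""
    if rng.random() < keep_probability(num_labels, epsilon):
        return label
    # Draw uniformly from the num_labels - 1 labels different from the true one.
    other = rng.randrange(num_labels - 1)
    return other if other < label else other + 1


def response_probability(output: int, label: int, num_labels: int, epsilon: float) -> float:
    """P[report = output | true label = label]; requires num_labels >= 2."""
    keep = keep_probability(num_labels, epsilon)
    return keep if output == label else (1 - keep) / (num_labels - 1)


# Hypothetical usage: K = 5 labels, epsilon = 1.
rng = random.Random(0)
print([randomized_response(2, 5, 1.0, rng) for _ in range(10)])
```

For any output, the ratio of `response_probability` under two different true labels is at most $e^ε$, which is exactly the local differential privacy constraint on the labels.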

An Approximation Algorithm for 2-Vertex-Connectivity via Cycle-Restricted 2-Edge-Covers

2026-05-11T06:34:57Z

In the 2-Vertex-Connected Spanning Subgraph problem (2-VCSS), we are given an undirected graph $G$, and the objective is to find a 2-vertex-connected spanning subgraph $S$ of $G$ with the minimum number of edges. In the context of survivable network design, 2-VCSS is one of the most fundamental and well-studied problems. There has been active research on improving the approximation ratio of algorithms, and the current best ratio is $\frac{4}{3}$, achieved by Bosch-Calvo, Grandoni, and Jabal Ameli. In this paper, we improve the approximation ratio to $\frac{95}{72}+\varepsilon$ ($<1.32$). The key idea in our algorithm is to introduce a 2-edge-cover without certain cycle components, and use it as an initial solution.

A 4.509-Approximation Algorithm for Generalized Min Sum Set Cover

2026-05-11T05:56:44Z

We study the \emph{generalized min-sum set cover} (GMSSC) problem, where given a collection of hyperedges $E$ with arbitrary covering requirements $\{k_e \in \mathbb{Z}^+ : e \in E\}$, the objective is to find an ordering of the vertices that minimizes the total cover time of the hyperedges. A hyperedge $e$ is considered covered at the first time when $k_e$ of its vertices appear in the ordering. We present a $4.509$-approximation algorithm for GMSSC, improving upon the previous best-known guarantee of $4.642$ (Bansal, Batra, Farhadi, and Tetali, SODA'21). Our approach retains their general LP-based framework but provides an improved analysis that narrows the gap toward the lower bound of $4$-approximation assuming P$\neq$NP. Our analysis takes advantage of the constraints of the linear program in a nontrivial way, along with new lower-tail bounds for the sums of independent Bernoulli random variables, which could be of independent interest.
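
The objective can be made concrete with a few lines of code. The sketch below evaluates the total cover time of a given vertex ordering. It only scores an ordering and is not the approximation algorithm of the paper. The function name, the `(vertex_set, k)` representation of hyperedges, and the example instance are assumptions of this sketch.

```python
def total_cover_time(ordering, hyperedges):
    """Sum over hyperedges (vertex_set, k) of the first time step at which
    k of the hyperedge's vertices have appeared; time steps start at 1."""
    position = {v: t for t, v in enumerate(ordering, start=1)}
    total = 0
    for vertex_set, k in hyperedges:
        if not 1 <= k <= len(vertex_set):
            raise ValueError("covering requirement must satisfy 1 <= k <= |e|")
        appearance_times = sorted(position[v] for v in vertex_set)
        total += appearance_times[k - 1]  # time at which the k-th vertex of e appears
    return total


# Hypothetical instance with three hyperedges.
edges = [({"a", "c"}, 1), ({"b", "c", "d"}, 2), ({"a", "d"}, 2)]
print(total_cover_time(["a", "b", "c", "d"], edges))  # 1 + 3 + 4 = 8
print(total_cover_time(["d", "a", "c", "b"], edges))  # 2 + 3 + 2 = 7
```

Setting every $k_e = 1$ gives the classical min-sum set cover objective, and setting $k_e = |e|$ gives min-latency set cover, so the same function scores orderings for both special cases.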

Dynamic Rank, Basis, and Matching

2026-05-11T03:11:25Z

We study dynamic algorithms for maintaining fundamental algebraic properties of matrices, specifically, rank, basis, and full-rank submatrices, with applications to maximum matching on dynamic graphs. Prior dynamic algorithms for rank achieve subquadratic update times but scale with the matrix dimension $n$, and could not always maintain the corresponding objects such as a basis or maximum full-rank submatrix. We present the first dynamic rank algorithms whose update time scales with the matrix rank $r$, achieving $\tilde O(r^{1.405})$ time per entry-update and $\tilde O(r^{1.528}+ z)$ per column-update, where $z$ is the number of changed entries. This extends to $\tilde O(|M|^{1.405})$ edge-update time to maintain the size $|M|$ of a maximum matching. We also give dynamic algorithms for maintaining a column-basis subject to column-updates and a maximum full-rank submatrix subject to entry-updates.