Adaptive Multi-Round Allocation with Stochastic Arrivals

2026-05-12T13:29:06Z

We study a sequential resource allocation problem motivated by adaptive network recruitment, in which a limited budget of identical resources must be allocated over multiple rounds to individuals with stochastic referral capacity. Successful referrals endogenously generate future decision opportunities while allocating additional resources to an individual exhibits diminishing returns. We first show that the single-round allocation problem admits an exact greedy solution based on marginal survival probabilities. In the multi-round setting, the resulting Bellman recursion is intractable due to the stochastic, high-dimensional evolution of the frontier. To address this, we introduce a population-level surrogate value function that depends only on the remaining budget and frontier size. This surrogate enables an exact dynamic program via truncated probability generating functions, yielding a planning algorithm with polynomial complexity in the total budget. We further analyze robustness under model misspecification, proving a multi-round error bound that decomposes into a tight single-round frontier error and a population-level transition error. Finally, we evaluate our method on real-world inspired recruitment scenarios.

On the LSH Distortion of Ulam and Cayley Similarities

2026-05-12T10:36:46Z

Locality-sensitive hashing (LSH) has found widespread use as a fundamental primitive, particularly to accelerate nearest neighbor search. An LSH scheme for a similarity function $S:\mathcal{X} \times \mathcal{X} \to [0,1]$ is a distribution over hash functions on $\mathcal{X}$ with the property that the probability of collision of any two elements $x,y\in \mathcal{X}$ is exactly equal to $S(x,y)$. However, not all similarity functions admit exact LSH schemes. The notion of LSH distortion measures how multiplicatively close a similarity function is to having an LSH scheme. In this work, we study the LSH distortion of the Ulam and Cayley similarities, which are popular similarity measures on permutations of $n$ elements. We show that the Ulam similarity admits a sublinear LSH distortion of $O(n / \sqrt{\log n})$; we also prove a lower bound of $Ω(n^{0.12})$ on the best LSH distortion achievable. On the other hand, we show that the LSH distortion of the Cayley similarity is $Θ(n)$.

Maximizing Reachability via Shifting of Temporal Paths

2026-05-12T09:51:59Z

We examine the problem of maximizing the reachability of a given source in temporal graphs that are given as the union of k temporal paths, i.e., every given path is a sequence of edges with strictly increasing labels that denote availability in time. This type of temporal graphs represent train networks. We consider shifting operations on the labels of the paths that maintain their temporal continuity. This means that we can move the availability of a temporal edge later or earlier in time, and propagate the shifts to all other affected edges of the path in order to preserve its temporal connectivity. We study the parameterized complexity of the problem with respect to the number of paths k, and the total budget b, where b is the maximum number of shifts we are allowed to perform. Our results reveal that fixed parameter tractability can be achieved (1) when parameterized both by k and b, and (2) when parameterized by k, and b is unconstrained. In almost every other case, e.g., parameterized by a single parameter or parameterized by k, while having a bound on b, we establish intractability lower bounds that are matched by XP algorithms.

Connectivity augmentation is fixed-parameter tractable

2026-05-12T08:31:20Z

In the vertex connectivity augmentation problem, we are given an undirected $n$-vertex graph $G$, a set of links $L \subseteq \binom{V(G)}{2} \setminus E(G)$, and integers $λ$ and $k$. The task is to insert at most $k$ links from $L$ to $G$ to make $G$ $λ$-vertex-connected. We show that the problem is fixed-parameter tractable (FPT) when parameterized by $λ$ and $k$, by giving an algorithm with running time $2^{O(k \log (k + λ))} n^{O(1)}$. This improves upon a recent result of Carmesin and Ramanujan [SODA'26], who showed that the problem is FPT parameterized by $k$ but only when $λ\le 4$. We also consider the analogous edge connectivity augmentation problem, where the goal is to make $G$ $λ$-edge-connected. We show that the problem is FPT when parameterized by $k$ only, by giving an algorithm with running time $2^{O(k \log k)} n^{O(1)}$. Previously, such results were known only under additional assumptions on the edge connectivity of $G$.

Random Access in Grammar-Compressed Strings: Optimal Trade-Offs in Almost All Parameter Regimes

2026-05-12T08:31:04Z

A Random Access query to a string $T\in [0..σ)^n$ asks for the character $T[i]$ at a given position $i\in [0..n)$. In $O(n\logσ)$ bits of space, this fundamental task admits constant-time queries. While this is optimal in the worst case, much research has focused on compressible strings, hoping for smaller data structures that still admit efficient queries. We investigate the grammar-compressed setting, where $T$ is represented by a straight-line grammar. Our main result is a general trade-off that optimizes Random Access time as a function of string length $n$, grammar size (the total length of productions) $g$, alphabet size $σ$, data structure size $M$, and word size $w=Ω(\log n)$ of the word RAM model. For any $M$ with $g\log n 0$ [Belazzougui et al.; ESA'15], [Ganardi, Jeż, Lohrey; J. ACM 2021]. The only tight lower bound [Verbin and Yu; CPM'13] was $Ω(\frac{\log n}{\log\log n})$ for $w=Θ(\log n)$, $n^{Ω(1)}\le g\le n^{1-Ω(1)}$, and $M=g\log^{Θ(1)}n$. In contrast, our result yields tight bounds in all relevant parameters and almost all regimes. Our data structure admits efficient deterministic construction. It relies on novel grammar transformations that generalize contracting grammars [Ganardi; ESA'21]. Beyond Random Access, its variants support substring extraction, rank, and select.

Congestion bounds via Laplacian eigenvalues and their application to tensor networks with arbitrary geometry

2026-05-12T04:11:06Z

Embedding the vertices of arbitrary graphs into trees while minimizing some measure of overlap is an important problem with applications in computer science and physics. In this work, we consider the problem of bijectively embedding the vertices of an $n$-vertex graph $G$ into the \textit{leaves} of an $n$-leaf \textit{rooted binary tree} $\mathcal{T}$. The congestion of such an embedding is given by the largest size of the cut induced by the two components obtained by deleting any vertex of $\mathcal{T}$. We show that for any embedding, the congestion lies between $λ_2(G)\cdot 2n/9$ and $λ_n(G)\cdot n/4$, letting $0=λ_1(G)\le \cdots \le λ_n(G)$ be the Laplacian eigenvalues of $G$, and there is an embedding for which the congestion is at most $λ_n(G)\cdot 2n/9$. Beyond these general bounds, we determine the congestion exactly for hypercubes and lattice graphs, and obtain asymptotically tight bounds for random regular graphs and Erdős-Rényi graphs. We further introduce an efficient contraction procedure based on spectral ordering and dynamic programming, which produces low-congestion embeddings in practice. Numerical experiments on structured graphs, random graphs, and tensor network representations of quantum circuits validate our theoretical bounds and demonstrate the effectiveness of the proposed method. These results yield new spectral bounds on the memory and time complexity of exact tensor network contraction in terms of the underlying graph structure.

FractalSortCPU: Bandwidth-Efficient Compressed Radix Sort on CPU

2026-05-12T02:43:04Z

Cloud database systems, particularly their middleware and query execution layers, use sorting as a core operation in query processing, indexing and join execution. Distribution-dependence and limited parallelism are key issues inherent in state-of-the-art radix sort which is preferred for large datasets due to performance advantages over comparison-based algorithms. Multi-pass bucketing, stochastic sampling and dependence graph structures are common solutions to these problems that incur the cost of data pre-processing and increased memory footprint hence they are less appropriate for large-scale workloads common in cloud environments. In-place radix sort schemes increase the number of passes as precision increases, which negatively impacts latency. Our work solves these problems by introducing a CPU-adapted histogram compression scheme for radix sorting for arbitrary-precision keys implemented on the CPU for increased accessibility, providing state-of-the-art execution time, while limiting histogram growth. Fully parallel key-based histogram updates eliminate the need for input bucketing and data pre-processing further lowering latency, mitigating distribution-dependence and reducing complexity. With a parallelized sorting architecture utilizing SIMD-accelerated operations for low latency, the algorithm demonstrates improvement over the state-of-the-art on the CPU, GPU, and FPGA by 6x, 3x and 2.5x in bandwidth efficiency on 512MB to 32GB data sets at 16-bit precision.

The tractability landscape of diffusion alignment: regularization, rewards, and computational primitives

2026-05-12T00:25:57Z

Inference-time reward alignment asks how to turn a pre-trained diffusion model with base law $p$ into a sampler that favors a reward $r$ while remaining close to $p$. Since there is no canonical distributional distance for this closeness constraint, different choices lead to different "reward-aligned" laws and, just as importantly, different algorithmic problems. We develop a primitive-based approach to reward alignment: rather than assuming arbitrary reward-aligned laws can be sampled, we ask which simple algorithmic primitives suffice to implement alignment for non-trivial reward classes. If closeness is measured in KL distance, the target law is $q(x) \propto p(x) \exp(λ^{-1}r(x))$. For this setting, we show that linear exponential tilts of the form $q(x)\propto p(x)\exp(\langle θ, x \rangle)$ -- which according to recent work [MRR26] can be efficiently sampled from -- are a sufficient primitive for aligning to a very broad class of convex low-dimensional rewards. If closeness is measured in Wasserstein distance, the corresponding primitive is a proximal transport oracle: given $x$, solve $\mbox{argmax}_y \{r(y)- λc(x,y)\}$. This oracle can be efficiently implemented for concave or low-dimensional Lipschitz rewards $r(x)=f(Ax)$. Together, these results illustrate that the choice of distribution distance for alignment affects the computational primitive and the tractable reward class.

Finite and Corruption-Robust Regret Bounds in Online Inverse Linear Optimization under M-Convex Action Sets

2026-05-11T23:34:22Z

We study online inverse linear optimization, also known as contextual recommendation, where a learner sequentially infers an agent's hidden objective vector from observed optimal actions over feasible sets that change over time. The learner aims to recommend actions that perform well under the agent's true objective, and the performance is measured by the regret, defined as the cumulative gap between the agent's optimal values and those achieved by the learner's recommended actions. Prior work has established a regret bound of $O(d\log T)$, as well as a finite but exponentially large bound of $\exp(O(d\log d))$, where $d$ is the dimension of the optimization problem and $T$ is the time horizon, while a regret lower bound of $Ω(d)$ is known (Gollapudi et al. 2021; Sakaue et al. 2025). Whether a finite regret bound polynomial in $d$ is achievable or not has remained an open question. We partially resolve this by showing that when the feasible sets are M-convex -- a broad class that includes matroids -- a finite regret bound of $O(d\log d)$ is possible. We achieve this by combining a structural characterization of optimal solutions on M-convex sets with a geometric volume argument. Moreover, we extend our approach to adversarially corrupted feedback in up to $C$ rounds. We obtain a regret bound of $O((C+1)d\log d)$ without prior knowledge of $C$, by monitoring directed graphs induced by the observed feedback to detect corruptions adaptively.

Performance bounds for nearest neighbor search with k-d trees

2026-05-11T23:02:00Z

The $k$-d tree is one of the oldest and most widely used data structures for nearest neighbor search. It partitions Euclidean space into axis-aligned rectangular cells. There are two standard ways to find the nearest neighbor to a query in a $k$-d tree. Defeatist search returns the closest data point in the query's cell, while comprehensive search also searches other cells as needed to guarantee it finds the nearest neighbor. Both strategies are commonly believed to perform poorly in high dimensions, but there have been few theoretical results explaining this. We prove non-asymptotic bounds on the runtime of comprehensive search and the accuracy of defeatist search. Under mild distributional assumptions, when the dimension $d$ is at least polylogarithmic in the number of data points, defeatist search is no more likely to return the nearest neighbor than random guessing, and comprehensive search visits every cell with high probability. We also show that on uniform data, with high probability, comprehensive search visits at most $2^{\mathcal{O}(d)}$ cells when each cell contains at least logarithmically many data points, and defeatist search returns the nearest neighbor when each cell additionally contains at least $2^{\mathcal{O}(d \log d)}$ data points. Finally, for arbitrary absolutely continuous distributions, we upper bound the expected distance between the query and the point returned by defeatist search.

Computational Complexity Analysis of Interval Methods in Solving Uncertain Nonlinear Systems

2026-05-11T22:38:01Z

This paper analyzes the computational complexity of validated interval methods for uncertain nonlinear systems and steady-state enclosure. Interval analysis produces guaranteed enclosures that account for uncertainty and round-off, but its adoption is often limited by computational cost in high dimensions. We develop an algorithm-level worst-case framework that makes explicit the dependence on the problem dimension $n$, the initial search region size $\mathrm{Vol}(X_0)$, the target tolerance $\varepsilon$, and the costs of validated primitives (inclusion-function evaluation, Jacobian evaluation, and interval linear algebra). Within this framework, we derive worst-case time and space bounds for interval bisection, subdivision$+$filter, interval constraint propagation, interval Newton, and interval Krawczyk, and identify dominant cost drivers. We also show that the computation of the determinant and inverse of interval matrices via naive Laplace expansion exhibits factorial growth with increasing matrix dimension, motivating specialized interval linear algebra. We complement the worst-case bounds with computational results on two application-motivated biochemical steady-state models (a Hill-type regulatory network and an enzyme-saturation-based winner-take-all circuit) in dimensions $n\in\{2,5,10\}$, including instances that process millions of boxes. The resulting analysis and experiments support the practical design of validated solvers for uncertainty-aware steady-state screening tasks such as robust operating-point certification and multistability assessment.

A data structure for monomial ideals with applications to signature Gröbner bases

2026-05-11T21:55:07Z

We introduce monomial divisibility diagrams (MDDs), a data structure for monomial ideals that supports insertion of new generators and fast membership tests. MDDs stem from a canonical tree representation by maximally sharing equal subtrees, yielding a directed acyclic graph. We establish basic complexity bounds for membership and insertion, and study empirically the size of MDDs. As an application, we integrate MDDs into the signature Gröbner basis implementation of the Julia package AlgebraicSolving.jl. Membership tests in monomial ideals are used to detect some reductions to zero, and the use of MDDs leads to substantial speed-ups compared to the existing representation by lists of generators with divmasks.

Chasing Small Sets Optimally Against Adaptive Adversaries

2026-05-11T17:56:19Z

We study deterministic online algorithms for the problem of chasing sets of cardinality at most $k$ in a metric space, also known as metrical service systems and equivalent to width-$k$ layered graph traversal. We resolve the 30-year-old gap of $Ω(2^k)\cap O(k2^k)$ on the competitive ratio of this problem by giving an $O(2^k)$-competitive deterministic algorithm. This bound is optimal even among randomized algorithms against adaptive adversaries. We also (slightly) improve the deterministic lower bound to $D_k$, defined recursively by $D_1=1$ and $D_{k+1}=2D_k+\sqrt{8+8D_k}+3$, which we conjecture to be exactly tight. For $k=3$, we provide a matching upper bound of $D_3$. Our results imply slightly improved upper and lower bounds for distributed asynchronous collective tree exploration and for the $k$-taxi problem, respectively. Our algorithm generalizes the classical doubling strategy, previously known to be optimal for $k=2$. The previous best bound for general $k$ was achieved by the generalized work function algorithm (WFA), and was known to be tight for WFA. Our improved bound therefore implies that WFA is sub-optimal for chasing small sets.

The stochastic block model has the overlap graph property for modularity

2026-05-11T17:49:30Z

The overlap gap property (OGP) is a statement about the geometry of near-optimal solutions. Exhibiting OGP implies failure of a class of local algorithms; and has been observed to coincide with conjectured algorithmic limits in problems with statistical computational gap. We consider the Stochastic Block Model (SBM), where the graph has a planted partition with $k$ equal-size blocks which form the `communities', and where, for parameters $p>q$, vertices within the same community connect with probability $p$, while vertices in different communities connect with probability $q$, independently across pairs of vertices. Modularity--based clustering algorithms have become ubiquitous in applications. This article studies theoretical limits of local algorithms based on the modularity score on the SBM. We establish that modularity exhibits OGP on the SBM. This rules out a class of local algorithms based on modularity for recovery in the SBM, and shows slow mixing time for a related Markov Chain. Theoretically this is one of the few instances where OGP has been established for a `planted' model, as most such analyses to date consider the `null' model. As part of our analysis, we extend a result by Bickel and Chen 2009, who established that with high probability, the modularity optimal partition of SBM is $o(n)$ local moves away from the planted partition, where $n$ is the graph size. We show that, with high probability, any partition with modularity score sufficiently near the optimal value is close to the planted partition.

FPT Approximation Schemes for Min-Sum Radii and Min-Sum Diameters Clustering

2026-05-11T17:38:16Z

In the classical Min-Sum Radii problem (MSR) we are given a set $X$ of $n$ points in a metric space and a positive integer $k\in [n]$. Our goal is to partition $X$ into $k$ subsets (the clusters) so as to minimize the sum of the radii of these clusters. The Min-Sum Diameters problem (MSD) is defined analogously, where instead of the radii of the clusters we consider their diameters. For both problems we present FPT approximation schemes for the natural parameter $k$. Specifically, given $ε>0$, we show how to compute $(1+ε)$-approximations for both MSD and MSR in time $(1/ε)^kn^{O(1)}$ and $(1/ε)^{O(k/ε\log 1/ε)}n^{poly(1/ε)}$ respectively. The previous best FPT approximation algorithms for these problems have approximation factors $4+ε$ and $2+ε$, respectively, and finding an FPT approximation scheme for both these problems had been outstanding open problems.