Early Pruning for Public Transport Routing

2026-05-26T09:01:21Z

Routing algorithms for public transport, particularly the widely used RAPTOR and its variants, often face performance bottlenecks during the transfer relaxation phase when supporting unlimited transfers, especially on dense transfer graphs. This inefficiency arises from iterating over many potential inter-stop connections (walks, bikes, e-scooters, etc.). To maintain acceptable performance, practitioners often limit transfer distances or exclude certain transfer options, which can reduce path optimality and restrict the multimodal options presented to travellers. This paper introduces Early Pruning, a low-overhead technique that accelerates routing algorithms without compromising optimality. By pre-sorting transfer connections by duration and applying a pruning rule within the transfer loop, the method discards longer transfers at a stop once they cannot yield an earlier arrival than the current best solution. Early Pruning can be integrated with minimal changes to existing codebases and requires only a one-time preprocessing step. The technique preserves Pareto-optimality in extended-criteria settings whenever the additional optimization criteria are monotonically non-decreasing in transfer duration. Across multiple state-of-the-art RAPTOR-based algorithms (RAPTOR, ULTRA-RAPTOR, McRAPTOR, BM-RAPTOR, ULTRA-McRAPTOR, and UBM-RAPTOR), tested on the Switzerland and London transit networks, we achieve query time reductions of up to 57%. This approach provides a generalizable improvement to the efficiency of transit pathfinding algorithms.
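
The pruning rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `sort_transfers`, `relax_transfers`, and `best_target_arrival`, and the dict-based data layout, are hypothetical, and it covers the single-criterion (earliest arrival) case only. Transfers out of each stop are sorted once by duration; the inner loop then breaks as soon as a transfer can no longer beat the current best arrival, since every later transfer is at least as long.

```python
from math import inf


def sort_transfers(transfers):
    """One-time preprocessing: sort each stop's transfers by ascending duration.

    transfers: dict stop -> list of (to_stop, duration) pairs (assumed layout).
    """
    return {stop: sorted(edges, key=lambda e: e[1])
            for stop, edges in transfers.items()}


def relax_transfers(marked_stops, arrival, sorted_transfers,
                    best_target_arrival=inf):
    """Relax transfers from the marked stops with an early exit per stop.

    arrival: dict stop -> best known arrival time, updated in place.
    best_target_arrival: arrival time of the best solution found so far.
    Returns the number of transfers actually examined and kept.
    """
    # Snapshot the departure times so transfers are not chained within a phase.
    start_times = {stop: arrival[stop] for stop in marked_stops}
    relaxed = 0
    for stop, t0 in start_times.items():
        for to_stop, duration in sorted_transfers.get(stop, []):
            candidate = t0 + duration
            if candidate >= best_target_arrival:
                # Remaining transfers are at least as long: none can improve.
                break
            relaxed += 1
            if candidate < arrival.get(to_stop, inf):
                arrival[to_stop] = candidate
    return relaxed


# Usage: three transfers out of stop "A", best known target arrival is 106.
transfers = sort_transfers({"A": [("C", 10), ("B", 2), ("D", 5)]})
arrival = {"A": 100}
examined = relax_transfers(["A"], arrival, transfers, best_target_arrival=106)
```

In the usage above, the transfer to "C" (duration 10) is never examined because the loop stops at the first transfer that cannot arrive before time 106.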

PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting

2026-05-26T08:08:44Z

We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical multiclass PAC learning, but the learner does not observe the labels of the i.i.d. training examples. Instead, in each round, it receives an unlabeled instance, predicts its label, and receives bandit feedback indicating only whether the prediction is correct. Despite this restriction, the goal remains the same as in classical PAC learning. We provide a general characterization of the optimal sample complexity of this problem, sharp for every concept class up to logarithmic factors. Our characterization is based on a new combinatorial dimension, termed the bandit $\mathrm{DS}$ dimension, defined via generalized combinatorial structures we call pseudo-boxes. These extend the pseudo-cubes underlying the $\mathrm{DS}$ dimension by allowing a different number of neighbors in each coordinate. In contrast to the $\mathrm{DS}$ dimension, which governs the full-information setting by counting the number of coordinates in the pseudo-cube, the bandit $\mathrm{DS}$ dimension aggregates the number of neighbors across coordinates, leading to a characterization in which the sample complexity scales with the total number of neighbors. We also propose a general learning algorithm achieving the upper bound, based on an algorithmic principle called ListCascade, which connects bandit learning to list learning and may be of independent interest.

Low Soundness Linearity Testing on the Half-Slice

2026-05-26T02:01:37Z

Let $f: T\to \{ 0,1 \}$ be a Boolean function on the Boolean half-slice $T$, i.e., the elements of $\{0,1\}^n$ with Hamming weight $n/2$. We show that if $f(x)+f(y)=f(x+y)$ holds with probability $\frac{1+δ}{2}$ over a uniform pair $(x,y)$ such that $x,y,x+y\in T$, then $f$ agrees with some linear function on at least a $\frac{1+δ}{2}-o(1)$ fraction of the points in $T$. More generally, we show that if $f$ passes the natural $k$-query BLR test with probability $\frac{1+δ}{2}$ for any $k\geq3$, then it must agree with some affine function on a $\frac{1+δ^{\frac{1}{k-2}}}{2}-o(1)$ fraction of the points in $T$. The only other known linearity test for the slice in the low soundness regime (i.e., when $δ$ can be arbitrarily small) was given by Kalai, Lifshitz, Minzer, and Ziegler [FOCS'24]. Our result improves upon this result in two significant ways: firstly, it works for $k=3$ queries, instead of requiring $k\geq4$; secondly, our result is sharper, e.g., when $k=4$, we are able to conclude an agreement of $\frac{1+\sqrt{δ}}{2}-o(1)$ instead of $\frac{1+c\sqrt{δ}}{2}$ for $c\approx 0.0035$. In particular, our result matches (up to the $o(1)$ term) the conclusion one obtains over the full hypercube via the classical BLR analysis. Our main technical contribution is a new dense model theorem using bounds on Krawtchouk polynomials. Using these Krawtchouk polynomial bounds, we also obtain a simple $k$-query test ($k\geq 5$) that avoids any use of the dense model machinery. This simplified test naturally extends to the slice over the $q$-ary hypercube, giving the first such result over larger alphabets.
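
The 3-query test in the abstract can be simulated with a short sketch. The helper names `sample_pair` and `blr_pass_rate` are hypothetical, and points are represented as sets of coordinates equal to 1. The sampling relies on one elementary fact: if $x$ and $y$ have weight $n/2$, then $x+y$ has weight $n - 2|x\wedge y|$, so $x+y\in T$ exactly when $x$ and $y$ share $n/4$ coordinates (which requires $4 \mid n$).

```python
import random


def sample_pair(n, rng):
    """Sample a uniform pair (x, y) with x, y, and x+y all of weight n/2.

    Splits a random permutation of the coordinates into four blocks of size
    n/4: shared by x and y, only in x, only in y, and in neither.
    """
    assert n % 4 == 0, "the half-slice pair test needs n divisible by 4"
    coords = list(range(n))
    rng.shuffle(coords)
    q = n // 4
    common, only_x, only_y = coords[:q], coords[q:2 * q], coords[2 * q:3 * q]
    return frozenset(common + only_x), frozenset(common + only_y)


def blr_pass_rate(f, n, trials, seed=0):
    """Empirical probability that f(x) + f(y) = f(x + y) over F_2."""
    rng = random.Random(seed)
    passed = 0
    for _ in range(trials):
        x, y = sample_pair(n, rng)
        z = x ^ y  # symmetric difference is coordinatewise addition mod 2
        passed += (f(x) ^ f(y)) == f(z)
    return passed / trials


# Usage: a linear function (parity on a fixed coordinate set) always passes.
S = frozenset({0, 3, 5})
linear = lambda x: len(x & S) % 2
rate = blr_pass_rate(linear, n=8, trials=200)
```

A function that is far from every linear function would show a pass rate closer to 1/2, which is the low soundness regime the abstract addresses.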

A Dynamic, Self-balancing k-d Tree

2026-05-26T01:37:04Z

The original description of the k-d tree recognized that rebalancing techniques, used for building an AVL or red-black tree, are not applicable to a k-d tree, because these techniques involve cyclic exchange of tree nodes that violates the invariant of the k-d tree. For this reason, a static, balanced k-d tree is often built from all of the k-dimensional data en masse. However, it is possible to build a dynamic k-d tree that self-balances when necessary after insertion or deletion of each k-dimensional datum. This article describes insertion, deletion, and rebalancing algorithms for a dynamic, self-balancing k-d tree, and measures their performance.
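
To illustrate why a dynamic k-d tree needs a rebalancing strategy other than rotations, here is a minimal sketch that keeps the tree balanced by rebuilding an out-of-balance subtree from scratch, in the style of scapegoat trees. This is a swapped-in technique chosen for brevity, not the algorithm of the article, which the abstract does not spell out. All names (`Node`, `build`, `insert`, `alpha`) are hypothetical, and deletion is omitted.

```python
class Node:
    def __init__(self, point):
        self.point = point
        self.left = None
        self.right = None
        self.size = 1  # number of points in this subtree


def size(node):
    return node.size if node else 0


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def collect(node, out):
    """Append all points of the subtree to out and return it."""
    if node:
        collect(node.left, out)
        out.append(node.point)
        collect(node.right, out)
    return out


def build(points, depth=0):
    """Build a subtree by median split on the axis for this depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    # Keep the invariant: left keys are strictly smaller, right keys are >=.
    while mid > 0 and points[mid - 1][axis] == points[mid][axis]:
        mid -= 1
    node = Node(points[mid])
    node.left = build(points[:mid], depth + 1)
    node.right = build(points[mid + 1:], depth + 1)
    node.size = len(points)
    return node


def insert(node, point, depth=0, alpha=0.7):
    """Insert a point; rebuild any subtree on the path that is too lopsided."""
    if node is None:
        return Node(point)
    axis = depth % len(point)
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1, alpha)
    else:
        node.right = insert(node.right, point, depth + 1, alpha)
    node.size += 1
    if max(size(node.left), size(node.right)) > alpha * node.size:
        # Rebuilding respects the splitting axis, unlike an AVL-style rotation.
        return build(collect(node, []), depth)
    return node


# Usage: sorted input would degenerate into a chain without rebalancing.
root = None
for i in range(100):
    root = insert(root, (i, i))
```

Rebuilding costs time proportional to the subtree size, so this sketch trades simplicity for occasional expensive insertions.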

Rapid mixing in positively weighted restricted Boltzmann machines

2026-05-25T17:09:57Z

We show polylogarithmic mixing time bounds for the alternating-scan sampler for positively weighted restricted Boltzmann machines. This is done via analysing the same chain and the Glauber dynamics for ferromagnetic two-spin systems, where we obtain new mixing time bounds up to the critical thresholds.

Random-Access Ranked Retrieval and Similarity Search

2026-05-25T16:27:35Z

We extend Random Access, a fundamental operation that enables efficient search and exploration algorithms, to the modern interactive data systems based on Ranked Retrieval and Similarity Search, where orderings are dynamically defined over a high-dimensional feature space. This extension enables efficient solutions for a wide range of applications, from data analytics tools and database systems to recommendation systems and machine learning. We formalize the Random-Access Ranked Retrieval (RAR) problem, and extend it to Similarity Search. Our algorithmic innovations include the development of a theoretically efficient algorithm based on geometric arrangements, achieving logarithmic query time. However, this method suffers from exponential space complexity in high dimensions. Therefore, we develop a second class of algorithms based on $\varepsilon$-sampling, which use linear space. Since exactly locating the tuple at a specific rank is challenging due to its connection to the range counting problem, we introduce a relaxed variant called $κ$-Random-Access Ranked Retrieval, which returns a small subset of size $κ$ guaranteed to contain the target tuple. To solve this problem efficiently, we define an intermediate problem, Stripe Range Retrieval (SRR), and design a hierarchical sampling data structure tailored for narrow stripe range queries. Our method achieves practical scalability in both data size and dimensionality. We prove near-optimal bounds on the efficiency of our algorithms and validate their performance through extensive experiments on real and synthetic datasets, demonstrating scalability to millions of tuples and hundreds of dimensions.

Ineffectiveness for Search and Undecidability of PCSP Meta-Problems

2026-05-25T16:07:10Z

It is an open question whether the search and decision versions of promise CSPs are equivalent. Most known algorithms for PCSPs solve only their \emph{decision} variant, and it is unknown whether they can be adapted to solve \emph{search} as well. The main approaches, called BLP, AIP and BLP+AIP, handle a PCSP by finding a solution to a relaxation of some integer program. We prove that rounding those solutions to a proper search certificate can be as hard as any problem in the class TFNP. In other words, these algorithms are ineffective for search. Building on the algebraic approach to PCSPs, we find sufficient conditions that imply ineffectiveness for search. Our tools are tailored to algorithms that are characterized by minions in a suitable way, and can also be used to prove undecidability results for meta-problems. This way, we show that the families of templates solvable via BLP, AIP, and BLP+AIP are undecidable. Using the same techniques we also analyze several algebraic conditions that are known to guarantee the tractability of finite-template CSPs. We prove that several meta-problems related to cyclic polymorphisms and WNUs are undecidable for PCSPs. In particular, there is no algorithm deciding whether a finite PCSP template (1) admits a cyclic polymorphism, (2) admits a WNU.

On the Complexity of Bilevel Independent Set Problem

2026-05-25T15:07:38Z

We consider a bilevel optimization problem in which the ground set is partitioned between two decision makers, a leader and a follower, whose optimization problems are interleaved. We study the Bilevel Independent Set problem, and its special case, the Bilevel Interval Selection problem, on different variants emerging from a combination of the type of leader's objective function, the type of follower's objective function, and the setting in which the follower reacts, i.e., either optimistically or pessimistically. Here we consider sum and bottleneck type objective functions. We investigate the computational complexity of all these variants for the Bilevel Independent Set problem, and sort them into their respective levels of the polynomial hierarchy. Our results range from membership in $\mathsf{P}$ through $\mathsf{NP}$-completeness to $Σ_2^\mathsf{p}$-completeness. For the Bilevel Interval Selection problem, we give a dynamic programming algorithm running in time $\mathcal{O}(n^4\log n)$ for the variants in which the leader and the follower have objective functions of the sum type.

Mathematical Foundations for Peer-to-Peer Lattice Computation

2026-05-25T14:09:32Z

We give structured proofs for five mathematical propositions governing synchronous peer-to-peer computation on a finite grid graph embedded in $\mathbb{Z}^2$. Proposition 1 gives three lower bounds: a transport-work bound $\sum_i a_i \ell_i \geq W_1(μ,ν)$ attained by every shortest-path schedule; a completion-depth bound $D_{\min} \geq r_μ$ attained by non-congesting parallel routing; and a compressive-reduction edge bound $|E'| \geq \mathrm{St}_G(\mathrm{supp}(μ)\cup\{x_\star\})$. A negative result refutes naive $O(f_{\text{act}}P^{3/2})$ concentration for sink-trunk loads under corner-sink dimension-order routing, showing variance $Θ(f_{\text{act}}(1-f_{\text{act}})P^2)$. Proposition 2 establishes, under the $α$-$β$-$γ$ collective-communication model and a Mixture-of-Experts sparse-activation model, that the grid-to-cluster latency ratio improves monotonically as $f_{\text{act}}$ shrinks whenever cluster fixed overhead dominates the grid geometric constant. Proposition 3 identifies a sufficient algebraic criterion for schedule-independent reduction: update rules decomposing into a local map and an abelian-monoid merge, expressed as a product-preserving functor from the Lawvere theory of commutative monoids into the hardware-state category. Proposition 4 bounds the conditional expected route length under i.i.d. site failure in the subcritical regime $δ< p_c^{\text{site}}(\mathbb{Z}^2)$ by an additive detour, using Aizenman-Barsky exponential cluster-size decay. Proposition 5 augments the grid with $k$ uniform long-range shortcuts per node, collapsing the typical shortest-path length from $Θ(\sqrt{P})$ to $O(\log P)$ under a mean-field (Erdős-Rényi) universality argument -- rigorous for the 1-D-ring base (Newman-Watts-Strogatz), conjectural for the 2-D-grid base.

Weighted Clique and Independent Set in Edge-Distant Hereditary Graphs

2026-05-25T11:30:48Z

In this work, we investigate the algorithmic aspects of two natural extensions of hereditary classes: the edge-apex class and the edge-add class, recently introduced by Singh and Sivaraman. These are defined as the graph classes obtained by at most one edge deletion or one non-edge addition, respectively, from a hereditary class $\mathcal{G}$. Building on earlier results showing that both classes remain hereditary and admit finite forbidden induced subgraph characterizations whenever $\mathcal{G}$ does, we focus on the Weighted Maximum Clique Problem (WMCP) and the Weighted Maximum Independent Set Problem (WMISP). We first present algorithms for WMCP and WMISP on both the edge-apex and edge-add classes of hereditary graph classes. Extending this framework, we introduce the notion of the $\mathcal{G}$-edge distance of a graph $G$, denoted by $ξ_{\mathcal{G}}(G)$, which quantifies how far $G$ is from the class $\mathcal{G}$ in terms of the minimum number of edge deletions or non-edge additions needed to transform it into a member of $\mathcal{G}$. By parameterizing with respect to this distance, we show that both WMCP and WMISP can be solved in $O^*(2^k)$ time on graphs whose $\mathcal{G}$-edge distance is $k$, provided these problems admit polynomial-time algorithms within the class $\mathcal{G}$. This result extends earlier algorithmic characterizations of the single edge-apex and edge-add classes to the more general setting of $k$-edge-distant graphs. By combining our general results with known properties of transitive graphs, we show that WMCP and WMISP can be solved in $O^*(2^k)$ time for graphs with transitive-edge distance $k$.

Engineering Practical Succinct Bit Vectors: A Space-Time Pareto Analysis on Apple Silicon ARM64 Cores

2026-05-25T07:34:24Z

Succinct data structures use space close to the information-theoretic minimum while answering queries directly on the compressed representation. In this paper, we present a practical engineering study of rank and select queries on bit vectors. We evaluate a classic two-level block baseline (BlockBitVec), an asymmetric superblock implementation (FastBitVec), and an entropy-compressed representation (RRRBitVec) based on the Raman, Raman, and Rao (RRR) coding scheme. On Apple Silicon (M-series ARM architecture), we demonstrate a 1.4x speedup in rank queries through asymmetric 4096/256-bit block boundaries, with a rank index overhead of 7.8%. We investigate the empirical behavior of RRRBitVec and observe a symmetric density-dependent bell-curve for rank latency -- where queries at extreme densities (1% and 99%) run up to 39% faster due to offset elimination at boundary classes. We further show that RRRBitVec achieves a 4.9x speedup over classic binary-search select baselines, running in 33.7 ns at uniform density by using a superblock-level sampling index that restricts sequential scans to L1-cache lookups. All implementations are validated against a correctness fuzzer executing over 78 million assertions with no failures. Source code and test harnesses are publicly available.
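
A two-level rank index of the kind evaluated in the abstract can be sketched as follows. This is an illustrative sketch, not the paper's BlockBitVec or FastBitVec: the class name `TwoLevelRank` is hypothetical, the 4096/256-bit sizes are borrowed from the abstract as superblock and block sizes on the assumption that this is what the "asymmetric" boundaries refer to, and a Python list of 0/1 values stands in for packed machine words with hardware popcount.

```python
class TwoLevelRank:
    SUPER = 4096  # bits per superblock (assumed meaning of the 4096 boundary)
    BLOCK = 256   # bits per block (assumed meaning of the 256 boundary)

    def __init__(self, bits):
        self.bits = bits
        self.super_counts = []  # ones before the start of each superblock
        self.block_counts = []  # ones from superblock start to block start
        total = 0
        in_super = 0
        for start in range(0, len(bits), self.BLOCK):
            if start % self.SUPER == 0:
                self.super_counts.append(total)
                in_super = 0
            self.block_counts.append(in_super)
            ones = sum(bits[start:start + self.BLOCK])
            total += ones
            in_super += ones

    def rank1(self, i):
        """Return the number of 1 bits in bits[0:i]."""
        i = min(i, len(self.bits))
        if i <= 0:
            return 0
        # Clamp so that i == len(bits) falls into the last stored block.
        b = min(i // self.BLOCK, len(self.block_counts) - 1)
        start = b * self.BLOCK
        # Two table lookups plus a scan confined to a single block.
        return (self.super_counts[start // self.SUPER]
                + self.block_counts[b]
                + sum(self.bits[start:i]))


# Usage: rank over a small periodic bit pattern.
bits = [1, 0, 1, 1] * 2500  # 10000 bits, three ones in every group of four
index = TwoLevelRank(bits)
ones_before_5000 = index.rank1(5000)
```

The in-superblock counts stay below 4096, so a real implementation can store them in 16-bit fields while only the superblock counts need full-width integers; this is the usual motivation for a two-level layout.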

A Note on Approximability of Densest At-Least-k-Subgraph

2026-05-25T06:14:39Z

We study the Densest At-Least-$k$-Subgraph (DAL$k$S) problem, in which we are given an undirected graph $G$ and an integer $k$, and the goal is to find a subgraph of $G$ with at least $k$ vertices with maximum density. The best-known algorithm, independently discovered by Khuller and Saha (2009) and by Andersen (2007), yields a 2-approximation for DAL$k$S in polynomial time. In this note, we provide a (simple) reduction from Densest $k$-Subgraph (D$k$S) to Densest At-Least-$k$-Subgraph, which shows that, if D$k$S is hard to approximate to within any constant factor, then DAL$k$S is hard to approximate to within $(3/2 - \varepsilon)$ factor for every $\varepsilon > 0$. This holds in both the normal (non-parameterized) and the parameterized (by $k$) settings. We then generalize the reduction to provide a tight $(2 - \varepsilon)$ factor hardness of approximating Densest At-Least-$k$-Subgraph, albeit under a stronger hypothesis which roughly states that Densest $k$-Subgraph is hard to approximate to within $k^{1 - δ}$ factor for any constant $δ> 0$. Once again, this extends naturally to the parameterized setting. Previously, $(2 - \varepsilon)$ factor inapproximability for DAL$k$S was only known under the Small Set Expansion Hypothesis (Bergner, 2013; Manurangsi, 2017), which does not apply to the parameterized version of the problem. Furthermore, we show that the exact version of DAL$k$S is W[1]-hard (parameterized by $k$).

Dynamic Necklace Splitting

2026-05-25T06:02:45Z

The necklace splitting problem is a classic problem in fair division with many applications, including data-informed fair hash maps. We extend necklace splitting to a dynamic setting, allowing for relocation, insertion, and deletion of beads. We present linear-time, optimal algorithms for the two-color case that support all dynamic updates. For more than two colors, we give linear-time, optimal algorithms for relocation subject to a restriction on the number of agents. Finally, we propose a randomized algorithm for the two-color case that handles all dynamic updates, guarantees approximate fairness with high probability, and runs in polylogarithmic time when the number of agents is small.

Faster Mixing for Triangulations via Transport Flows

2026-05-24T16:39:38Z

We prove an $\widetilde O(n^2)$ bound for the relaxation time and the log-Sobolev time (inverse log-Sobolev constant) of the classical triangulation flip chain on a convex $(n+2)$-gon, implying a mixing time of $\widetilde O(n^2)$. The previous state of the art for the mixing time of this chain, due to Eppstein and Frishberg, was $\widetilde O(n^3)$, while the best known lower bound on the mixing time, due to Molloy, Reed, and Steiger, is $Ω(n^{3/2})$. Our relaxation time bound makes significant progress towards Aldous' conjectured bound of $Θ(n^{3/2})$ for the relaxation time. We improve upon the analysis of Eppstein and Frishberg by further developing the framework of transport flows introduced in the work of Chen et al. In this light, our results can be seen as a more efficient way of using combinatorial decompositions to obtain functional inequalities for Markov chains. We hope our ideas will find other applications in the future.

The Dirichlet Mechanism for rounding with strong negative correlation, with applications

2026-05-24T13:41:44Z

Many optimization and scheduling problems can be abstracted in terms of a bipartite ``assignment graph'' $G = (L \cup R, E)$, where the goal is to select exactly one edge for each right-node. For example, a right-node may correspond to a job, and a left-node to a possible machine assignment. A common strategy to solve such problems is to obtain a fractional relaxation $x_e$ for each edge $e$, and then have each right-node independently select an edge with probability $x_e$. However, this may cause the left-nodes to become unevenly loaded, leading to suboptimal solutions for some problems. To address this, a number of algorithms for dependent rounding with strong negative correlation have been developed, e.g. Bansal, Srinivasan & Svensson (2021), Im & Shadloo (2020), Im & Li (2023), Harris (2024), Naor, Srinivasan & Wajc (2025). We introduce a new method for this, which we call the \emph{Dirichlet mechanism}. It is based on having each left-node draw Dirichlet random variables for its edges, and then having each right-node select an edge based on these values. This achieves quantitatively stronger negative correlation than previous algorithms, and is also simpler since it avoids the need for a tie-breaking mechanism. We illustrate the mechanism with improved approximation ratios for two problems. For oblivious online dependent rounding, we achieve a $0.68$-approximation which improves upon the previous $0.652$-approximation of Naor, Srinivasan & Wajc (2025). For the problem of scheduling jobs on unrelated machines to minimize weighted completion time, we achieve a $1.387$-approximation which improves upon the $1.398$-approximation of Harris (2024). (A recent algorithm of Li (2025) based on iterated rounding also provides a $1.36$-approximation if the weights of each job are independent of machine.)