Diffusion-Network Alignment: An Efficient Algorithm and Explicit Probability Bounds

2026-06-11T04:08:39Z

This paper studies a variation of the classic network alignment problem, called diffusion-network alignment. The goal is to align the vertices of a rooted diffusion tree to the vertices of a network, where the diffusion tree could be from a communication trace or contact tracing, and the network could be an online or offline social network. Unlike classic network alignment, where both networks are fully observed, this model captures the information asymmetry between the two networks. To solve this problem, this paper presents an efficient algorithm based on tree correlation tests to extract alignment information from local neighborhoods. We analyze the performance of the algorithm in the sparse graph regime and show that with high probability, all matched pairs are correct. Furthermore, for each vertex on the diffusion tree, this paper establishes an explicit lower bound on the probability that the vertex is correctly matched. These lower bounds are depth-dependent and increase as vertices get closer to the root.

Adaptive Weighted Averaging

2026-06-11T00:03:10Z

We study the problem of selecting the largest among $n$ unknown values $x_1,\dots,x_n$ given only a single unbiased estimate $y_i$ for each $x_i$. We design strategies that are simultaneously admissible (not uniformly dominated by any other strategy) and also never worse than a given baseline such as uniform random selection. We provide an application to stochastic optimization, where we obtain online-to-batch conversion bounds with a desirable "no-compromise" guarantee: they are never worse than standard random iterate selection, and yet can be significantly better in benign settings.

A unified complexity bound for logconcave sampling

2026-06-10T21:28:30Z

We give a simple, unified, and nearly tight bound for sampling arbitrary logconcave distributions from a warm start using the In-and-Out algorithm along with exponential lifting. The main new ingredient in the analysis is an improved bound on the Poincaré constant of a lifted distribution. As a consequence, the resulting convergence rate is nearly tight for both constrained settings (e.g., Gaussian restricted to a convex body) and well-conditioned settings (e.g., strongly logconcave and smooth densities).

Random Proposals: A Softmax-Based Local-Improvement Framework for Maximum Weighted Matching

2026-06-10T21:27:48Z

We propose a randomized local-improvement algorithm for the Maximum Weighted Matching (MWM) problem. Our method introduces a softmax-based biased sampling mechanism that achieves local $\varepsilon$-dominance and yields an expected $\frac{1}{2}-\varepsilon$ approximation ratio. We prove convergence guarantees and show that the algorithm runs in $O\!\left(m\log(1/\varepsilon)/p_{\min}\right)$ time, where $p_{\min}$ is the minimum softmax proposal probability over all edges; under mild conditions on the bias parameter and weight range, this simplifies to $O(m\log(1/\varepsilon))$. The framework provides a tunable tradeoff between convergence speed and approximation quality.
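
To make the role of $p_{\min}$ concrete, the following is a minimal sketch of a softmax proposal distribution over a list of candidate edge weights. It is an illustration only: the abstract does not say which candidate set the probabilities are normalized over or how the bias parameter enters, so the function name `softmax_proposals`, the parameter `bias`, and the normalization over the whole list are all assumptions.

```python
import math


def softmax_proposals(weights, bias):
    """Return softmax probabilities proportional to exp(bias * w).

    Hypothetical helper: `weights` is whatever candidate set the sampler
    normalizes over, and `bias` is an assumed temperature-like parameter.
    """
    shift = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(bias * (w - shift)) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def min_proposal_probability(weights, bias):
    """Smallest proposal probability, the quantity called p_min in the text."""
    return min(softmax_proposals(weights, bias))
```

With `bias = 0` the proposals are uniform; increasing `bias` concentrates probability on heavy edges and shrinks the smallest probability, which is the tradeoff the running-time bound expresses through $p_{\min}$.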

The Parametrised Complexity of Counting Small Sub-Hypergraphs

2026-06-10T20:36:59Z

Subgraph counting is a fundamental and well-studied problem whose computational complexity is well understood. Quite surprisingly, the hypergraph version of subgraph counting has been almost ignored. In this work, we address this gap by investigating the most basic sub-hypergraph counting problem: given a (small) hypergraph $H$ and a (large) hypergraph $G$, compute the number of sub-hypergraphs of $G$ isomorphic to $H$. Formally, for a family $\mathcal{H}$ of hypergraphs, let #Sub($\mathcal{H}$) be the restriction of the problem to $H \in \mathcal{H}$; the induced variant #IndSub($\mathcal{H}$) is defined analogously. Our main contribution is a complete classification of the complexity of these problems. Assuming the Exponential Time Hypothesis, we prove that #Sub($\mathcal{H}$) is fixed-parameter tractable if and only if $\mathcal{H}$ has bounded fractional co-independent edge-cover number, a novel graph parameter we introduce. Moreover, #IndSub($\mathcal{H}$) is fixed-parameter tractable if and only if $\mathcal{H}$ has bounded fractional edge-cover number. Both results subsume pre-existing results for graphs as special cases. We also show that the fixed-parameter tractable cases of #Sub($\mathcal{H}$) and #IndSub($\mathcal{H}$) are unlikely to be in polynomial time, unless respectively #P = P and Graph Isomorphism $\in$ P. This shows a separation with the special case of graphs, where the fixed-parameter tractable cases are known to actually be in polynomial time.

First Order Logic on Pathwidth Revisited Again

2026-06-10T17:49:22Z

Courcelle's celebrated theorem states that all MSO-expressible properties can be decided in linear time on graphs of bounded treewidth. Unfortunately, the hidden constant implied by this theorem is a tower of exponentials whose height increases with each quantifier alternation in the formula. More devastatingly, this cannot be improved, under standard assumptions, even if we consider the much more restricted problem of deciding FO-expressible properties on trees. In this paper we revisit this well-studied topic and identify a natural special case where the dependence of Courcelle's theorem can, in fact, be improved. Specifically, we show that all FO-expressible properties can be decided with an elementary dependence on the input formula, if the input graph has bounded pathwidth (rather than treewidth). This is a rare example of treewidth and pathwidth having different complexity behaviors. Our result is also in sharp contrast with MSO logic on graphs of bounded pathwidth, where it is known that the dependence has to be non-elementary, under standard assumptions. Our work builds upon, and generalizes, a corresponding meta-theorem by Gajarský and Hliněný for the more restricted class of graphs of bounded tree-depth.

U-HNSW: An Efficient Graph-based Solution to ANNS Under Universal Lp Metrics

2026-06-10T17:47:19Z

Approximate nearest neighbor search under universal L_p metrics (ANNS-U-L_p) is an important and challenging research problem, as it requires answering queries under all possible p (0

The World's Fastest Matching Engine Algorithm

2026-06-10T15:47:40Z

A single CPU core sustains 32 million order messages per second at sub-microsecond median end-to-end host-path response latency, 4.7-11 times faster than the best available open-source matching engines on identical hardware. Scaled out, a single 96-core commodity server (~$1,630/month) sustains ~640 million messages per second across 10,000 symbols, over 20 times the provisioned capacity of the U.S. consolidated quote feed. We reach these numbers by attacking the storage layer that sets matching latency. The dominant order-book implementation, linked lists chained through a balanced tree, imposes two costs on every operation: pointer-chased traversal to the insertion point, and root-to-leaf search to locate the target price level. Under micro-bursts these costs produce tail-latency spikes that degrade market quality precisely when liquidity is most needed. We present two data-structure contributions that eliminate them. The first is the Priority-Indicated Node (PIN), a priority queue in which entries occupy fixed-capacity, contiguously addressable slots, with indicators encoding the entry's global priority status. Unlike heaps, which require O(log n) comparisons per operation, the PIN resolves insertion position directly from the indicators without comparing entries; indicator updates are O(1), independent of queue size. A depth-aware capacity model sizes each PIN so hot entries fit within L1 residency. The second targets a broader inefficiency: balanced search trees search from root to leaf on every insertion and deletion, even when the caller already knows the key's in-order neighbors, which in electronic trading are available at zero cost. Neighbor-aware insertion and deletion use known neighbor references to attach or remove a node with O(1) reference writes, followed by single-path rebalancing, across red-black, AVL, and B+-tree variants.

Nearly Instance Optimal Sparse Matrix Approximation from Matrix-Vector Products

2026-06-10T15:04:46Z

A large body of work studies the problem of learning an approximation to a matrix $A\in \mathbb{R}^{m\times n}$ that is only accessible implicitly via matrix-vector product queries (matvec queries) of the form ${x} \rightarrow {A}{x}$ or ${x} \rightarrow {A}^T{x}$. Of particular interest are methods that learn a near-optimal approximation with a fixed sparsity pattern. For example, we might want to learn a near-optimal diagonal, banded, or arrow-head approximation to an implicit matrix $A$. Naturally, the number of matvec queries required to solve this problem depends on the sparsity pattern, which can be encoded as a binary matrix ${S}\in \{0,1\}^{m\times n}$. The query complexity of previous algorithms scales with quantities like the total number of ones in ${S}$, its maximum column/row sparsity, or the chromatic number of its "conflict graph". These quantities are incomparable: for a given ${S}$, parameterizing by one might yield lower query complexity than another. In this work, we unify and tighten these prior results by providing a nearly sharp characterization of the matvec query complexity of sparse matrix approximation. Generalizing a definition from graph algorithms, let the degeneracy, ${degen}({S})$, denote the smallest number $k$ so that, if we iteratively delete all rows and columns of ${S}$ with $\leq k$ ones, we are left with an empty matrix. We show that a near-optimal approximation to $A$ with sparsity pattern $S$ can be learned with $\tilde{O}({degen}({S}))$ matrix-vector product queries, and $Ω({degen}({S}))$ queries are necessary, for any sparsity pattern ${S}$. Moreover, unlike prior work based on graph coloring, all of our methods run in polynomial time.
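
The degeneracy definition above can be evaluated directly by brute force. The sketch below is only meant to illustrate the definition, not the paper's learning algorithm; the function names `peels_to_empty` and `degeneracy` and the list-of-lists encoding of ${S}$ are assumptions made for the example.

```python
def peels_to_empty(S, k):
    """Iteratively delete rows and columns with at most k ones.

    Ones are counted only among rows and columns that are still present.
    Returns True if the matrix ends up empty.
    """
    rows = set(range(len(S)))
    cols = set(range(len(S[0]))) if S else set()
    changed = True
    while changed and (rows or cols):
        light_rows = {i for i in rows if sum(S[i][j] for j in cols) <= k}
        light_cols = {j for j in cols if sum(S[i][j] for i in rows) <= k}
        changed = bool(light_rows or light_cols)
        rows -= light_rows
        cols -= light_cols
    return not rows and not cols


def degeneracy(S):
    """Smallest k for which peeling with threshold k empties S."""
    k = 0
    while not peels_to_empty(S, k):
        k += 1
    return k
```

For instance, a diagonal pattern has degeneracy 1, a 4-by-4 arrow-head pattern (first row, first column, and diagonal) has degeneracy 2, and a dense 3-by-3 pattern has degeneracy 3.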

Extended Fourier analysis of signals

2026-06-10T13:13:25Z

This summary of the doctoral thesis provides a comprehensive formulation of the Extended Discrete Fourier Transform (EDFT), derived directly from the Fourier integral and its orthogonality properties. The method is obtained by solving weighted least-squares estimators in both continuous and discrete domains, yielding an adaptive frequency-domain representation that remains fully consistent with the classical Fourier framework. In the special case of uniformly sampled data on a uniform frequency grid of the same size, the EDFT reduces exactly to the classical Discrete Fourier Transform (DFT). However, when the analysis grid exceeds the number of observed samples, EDFT circumvents conventional zero-padding by optimizing the transformation basis over the extended frequency set. This enables accurate spectral estimation from incomplete or nonuniformly sampled data. Consequently, the EDFT achieves enhanced frequency resolution in regions of strong spectral content while maintaining global resolution balance, thereby remaining consistent with the uncertainty principle. The inverse EDFT reconstructs the original signal and produces extrapolated or interpolated samples wherever spectral information is available. The EDFT requires no explicit separation of deterministic and stochastic components and accurately captures broadband, transient, and sinusoidal features simultaneously. Simulation studies confirm its robustness under nonuniform sampling, multiple Nyquist zones, missing-data conditions, and signals with mixed spectra comprising both line and continuous components. Although iterative computation of the EDFT entails higher numerical cost compared to the classical DFT, this limitation - significant in the 1990s - has been largely mitigated by modern computational resources, rendering the EDFT practical for contemporary signal analysis applications.

Near-Optimal Distributed 2-Ruling Sets on Graphs with Low Arboricity

2026-06-10T11:53:00Z

Given a graph $G=(V,E)$, a $β$-ruling set is a subset of nodes $S\subseteq V$ that is independent, and each node in $V$ is at distance at most $β$ from some node in $S$. In this paper, we present almost optimal distributed algorithms for finding $2$-ruling sets in the classical LOCAL model. Our main contribution is a randomized algorithm that w.h.p. computes a $2$-ruling set on any $n$-node graph with bounded arboricity in $O(\log \log n)$ rounds. In fact, the algorithm works up to arboricity $O(\log\log n)$, improves exponentially over the prior state of the art that can be achieved by combining [Barenboim, Elkin, Pettie, Schneider; JACM'16], [Ghaffari; SODA'16], and [Bisht, Kothapalli and Pemmaraju; PODC'14], and nearly matches the lower bound of $Ω(\log \log n / \log \log \log n)$ [Balliu, Brandt, Kuhn, Olivetti; FOCS'20]. The domination parameter $β=2$ is optimal for algorithms with runtime $\log^{o(1)}n$: on graphs with arboricity $2$, there is a lower bound of $Ω(\sqrt{\log n})$ rounds for MIS (i.e., $β= 1$) [Khoury, Schild; FOCS'25]. Additionally, we obtain improved algorithms for larger arboricity. For general graphs with arboricity $α$, we present a randomized algorithm that computes a $2$-ruling set in $\widetilde{O}(\log^{5/8} α+\log^{5/3} \log n)$ rounds. This improves exponentially over the state of the art for a large range of non-constant arboricity. Our techniques extend beyond distributed computing. We present an $O(\log \log \log n)$-round algorithm in the low-space Massively Parallel Computation (MPC) model that w.h.p. computes a $2$-ruling set on any graph with arboricity up to $2^{poly (\log \log n)}$, improving exponentially over the state of the art from [Kothapalli, Pai, Pemmaraju; FSTTCS'20] combined with [Fischer, Giliberti, Grunau; SPAA'23].
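
The definition of a $β$-ruling set can be checked centrally with a multi-source breadth-first search. The sketch below is a sequential verifier for the definition only, not a distributed algorithm; the name `is_ruling_set` and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def is_ruling_set(adj, S, beta):
    """Check that S is independent and every node is within distance beta of S.

    `adj` maps each node to a list of its neighbors (undirected graph).
    """
    S = set(S)
    # Independence: no two nodes of S are adjacent.
    for u in S:
        if any(v in S for v in adj[u]):
            return False
    # Multi-source BFS from S, cut off at depth beta.
    dist = {u: 0 for u in S}
    queue = deque(S)
    while queue:
        u = queue.popleft()
        if dist[u] == beta:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Domination: every node must have been reached.
    return all(v in dist for v in adj)
```

On a path with five nodes, the middle node alone is a $2$-ruling set but not a $1$-ruling set, which matches the remark that $β=1$ corresponds to MIS.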

On finding exact solutions of linear programs in the oracle model

2026-06-10T08:54:38Z

We consider linear programming in the oracle model: $\max\{c^\top x \,:\, x\in P\}$, where the polyhedron $P=\{x\in\mathbb{R}^n\,:\, Ax\le b\}$ is given by a separation oracle. We present an algorithm that finds exact primal and dual solutions using $O(n^2\log(n/δ))$ oracle calls and $O(n^4\log(n/δ)+n^5\log\log(1/δ))$ arithmetic operations, where $δ$ is a geometric condition number associated with the system $(A,b)$. These bounds do not depend on the cost vector $c$ and do not require a priori knowledge of $δ$. For rational data, $\log(1/δ)$ is polynomially bounded in the encoding size of $(A,b)$, thus providing a polynomial-time algorithm. The algorithm works in a black box manner, requiring a subroutine for approximate primal and dual solutions; the above running times are achieved when using the cutting plane method of Jiang, Lee, Song, and Wong (STOC 2020) for this subroutine. Whereas approximate solvers may return primal solutions only, we develop a general framework for extracting dual certificates based on the work of Burrell and Todd (Math. Oper. Res. 1985). Our algorithm strengthens results by Grötschel, Lovász, and Schrijver (Prog. Comb. Opt. 1984), and by Frank and Tardos (Combinatorica 1987) that rely on bit-complexity arguments. Our algorithm avoids rounding-based arguments such as simultaneous Diophantine approximation and uses geometric arguments instead.

Approximation of Spanning Tree Congestion using Hereditary Bisection

2026-06-10T08:25:42Z

The Spanning Tree Congestion (STC) problem is the following NP-hard problem: given a graph $G$, construct a spanning tree $T$ of $G$ minimizing its maximum edge congestion, where the congestion of an edge $e\in T$ is the number of edges $uv$ in $G$ such that the unique path between $u$ and $v$ in $T$ passes through $e$; the optimal value for a given graph $G$ is denoted $STC(G)$. It is known that every spanning tree is an $n/2$-approximation for the STC problem. A long-standing problem is to design a better approximation algorithm. Our contribution towards this goal is an $O(Δ\cdot\log^{3/2}n)$-approximation algorithm where $Δ$ is the maximum degree in $G$ and $n$ the number of vertices. For graphs with a maximum degree bounded by a polylog of the number of vertices, this is an exponential improvement over the previous best approximation. Our main tool for the algorithm is a new lower bound on the spanning tree congestion which is of independent interest. Denoting by $hb(G)$ the hereditary bisection of $G$ which is the maximum bisection width over all subgraphs of $G$, we prove that for every graph $G$, $STC(G)\geq Ω(hb(G)/Δ)$.
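
The congestion of a given spanning tree, as defined above, can be computed by routing every graph edge along its tree path and counting how often each tree edge is used. The sketch below evaluates that objective for a fixed tree; it does not search for a good tree, and the function name `tree_congestion`, the edge-list inputs, and the `root` parameter are assumptions made for the example.

```python
from collections import defaultdict, deque


def tree_congestion(graph_edges, tree_edges, root=0):
    """Maximum edge congestion of the spanning tree given by `tree_edges`."""
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS from the root to get parent pointers and depths in the tree.
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    # Route each graph edge along its unique tree path.
    load = defaultdict(int)
    for u, v in graph_edges:
        while u != v:
            if depth[u] < depth[v]:
                u, v = v, u  # always move the deeper endpoint upward
            load[frozenset((u, parent[u]))] += 1
            u = parent[u]
    return max(load.values(), default=0)
```

On the complete graph with four vertices, a star tree has congestion 3 while a path tree has congestion 4, so the choice of tree matters even on very small instances.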

A Fast Gaussian Mechanism under Continual Observation, with Applications

2026-06-10T07:36:58Z

We consider the problem of privately releasing a $k$-dimensional vector under updates: Starting with a zero vector, at times $t_1, t_2,\dots$ the vector is updated by adding $x^{(1)}, x^{(2)},\dots$, respectively. For positive integers $T$, $k$ we model the updates as a data set $\{(t_i, x^{(i)})\}_i$, where $t_i \in [T]$ and $x^{(i)} \in B_k$ (the $k$-dimensional unit ball). Two such data sets are said to be neighboring if their symmetric difference has size at most $1$. The continual release consists of the sum $A^{(t)} = \sum_{i \; : \; t_i \leq t} x^{(i)}$ for each time step $t=1,\dots,T$. Classical continual release techniques allow us to release an approximation of $A^{(1)},\dots,A^{(T)}$ with additive noise of magnitude $\text{polylog}(T)$, computed in time $O(kT)$, even in the on-line, adaptive case where data is continually revealed for the current time step. Motivated by private sketching techniques, we consider the setting where only a subset of entries in $A^{(t)}$ need to be released at time step $t$. Our new result is that it is possible to sample any desired entry in a given noise vector in constant time while reproducing exactly the distribution of the binary tree mechanism with Gaussian noise. The improvement on the known time bound of $O(\log T)$ comes from a new data structure that allows us to sample a new noise value with the correct correlations in constant time using Brownian bridges. We present two data management applications, of independent interest, that use our technique in conjunction with differentially private CountSketches: 1) A dynamic data structure for orthogonal range counting queries with a better privacy/accuracy/space trade-off than previous data structures, and 2) Join size estimation, where in addition we show improved high-probability bounds.
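
For context, the classical binary tree mechanism that serves as the $O(\log T)$ baseline can be sketched as follows for a single coordinate. This is the textbook construction, not the constant-time Brownian-bridge data structure of the paper; the class name `TreeNoise`, the helper `dyadic_intervals`, and the noise scale `sigma` (whose privacy calibration is not shown) are assumptions made for illustration.

```python
import random


def dyadic_intervals(t):
    """Decompose [1, t] into disjoint dyadic intervals, one per set bit of t."""
    intervals = []
    start = 1
    for bit in reversed(range(t.bit_length())):
        size = 1 << bit
        if t & size:
            intervals.append((start, start + size - 1))
            start += size
    return intervals


class TreeNoise:
    """Lazily sampled Gaussian noise attached to nodes of the binary tree."""

    def __init__(self, sigma, seed=0):
        self.sigma = sigma  # assumed noise scale, calibration not shown
        self.rng = random.Random(seed)
        self.node_noise = {}

    def prefix_noise(self, t):
        """Noise added to the prefix sum A^(t): one term per dyadic interval."""
        total = 0.0
        for interval in dyadic_intervals(t):
            if interval not in self.node_noise:
                self.node_noise[interval] = self.rng.gauss(0.0, self.sigma)
            total += self.node_noise[interval]
        return total
```

Each call touches one tree node per set bit of $t$, which is where the logarithmic cost of the baseline comes from; noise values are shared between prefixes that use the same node, which produces the correlations the abstract refers to.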

Beyond Frequency Marching: Orbit Recovery in Dihedral and Projected Multireference Alignment

2026-06-10T06:23:49Z

Multireference alignment (MRA) is the task of recovering a hidden "signal" vector, given many noisy copies that have been cyclically shifted by unknown offsets. This task belongs to the class of orbit recovery problems, in which the observed samples are affected by some group action. These problems have a variety of practical motivations, including the reconstruction of 3-dimensional molecular structure from cryogenic electron microscopy (cryo-EM) images. We consider two variants of MRA: dihedral MRA, where the cyclic group is replaced by the dihedral group, allowing for reversals of the vector in addition to shifts; and projected MRA, where the observations are passed through a projection operator akin to the tomographic projection present in cryo-EM. We apply the method of moments and aim to recover the signal from the third moment tensor of the samples. This inverse problem is well understood for basic MRA, but for the variants we consider there is no polynomial-time algorithm known to succeed for generic signals. We give the first such algorithm for both of these variants. Our method requires the signal length to be a power of two, and recursively subdivides the problem into smaller problems of half the size. The algorithm's success for generic signals is proven, conditional on a conjecture about the rank of a certain symbolic matrix of polynomials. For any given problem size, this conjecture can be verified on a computer.
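
The basic MRA observation model and the empirical third moment can be sketched in a few lines. This only illustrates the data model for the cyclic case and the shift-invariance that makes moments useful; it is not the recovery algorithm, and the function names, the Gaussian noise with scale `sigma`, and the uniform shift distribution are assumptions made for the example.

```python
import random


def cyclic_shift(x, s):
    """Shift the vector x cyclically by s positions."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


def observe(x, sigma, rng):
    """One noisy observation: a uniformly random cyclic shift plus Gaussian noise."""
    s = rng.randrange(len(x))
    return [v + rng.gauss(0.0, sigma) for v in cyclic_shift(x, s)]


def third_moment(samples):
    """Empirical third moment tensor: the average of y_i * y_j * y_k over samples."""
    n = len(samples[0])
    m = len(samples)
    return [
        [
            [sum(y[i] * y[j] * y[k] for y in samples) / m for k in range(n)]
            for j in range(n)
        ]
        for i in range(n)
    ]
```

Averaging over the full orbit of a signal gives the same third moment tensor as averaging over the orbit of any shifted copy, so the moments determine the signal at best up to a cyclic shift.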