http://arxiv.org/abs/2606.11469v1
Density estimation for Hellinger via minimum-distance estimators: mixtures of Gaussians, log-concave, and more
Updated: 2026-06-09T21:57:20Z

We study the task of density estimation, where we hope to accurately estimate a probability density from $n$ samples. A textbook method for density estimation in total variation distance is the minimum-distance estimator approach, where both the algorithm and its analysis follow merely from bounding the VC dimension of a particular concept class (the so-called Yatracos class). While this technique originally yielded sharp guarantees primarily for total variation distance, in this work we extend the minimum-distance estimator approach to learning in Hellinger distance. Our main observation is that we can produce an analogous recipe for Hellinger distance (where we only need to bound the VC dimension of a related concept class) by drawing connections to recent results establishing reverse data processing inequalities. This recipe is flexible enough to accommodate fast algorithms originally designed for total variation distance; by modifying the approach of Acharya et al. (2017), we obtain the first near-linear-time algorithm for learning classes including univariate mixtures of log-concave densities and mixtures of Gaussians (with arbitrary variances), with near-optimal sample complexity.

Published: 2026-06-09T21:57:20Z
Authors: Spencer Compton, Jerry Li

http://arxiv.org/abs/2606.11448v1
A Unified Lower Bound on the Noisy Query Complexity of Boolean Functions
Updated: 2026-06-09T21:00:16Z

We study the query complexity of Boolean functions $f: \{0, 1\}^n \rightarrow \{0, 1\}$ in the noisy query model introduced by Feige, Raghavan, Peleg and Upfal [SICOMP 1994].
In this model, an algorithm can adaptively query the bits of an input vector, but each query result is independently flipped with constant probability $p \in (0, 1/2)$; repeated queries are allowed. The noisy query complexity $\mathsf{N}_p(f)$ of a function $f$ is defined as the minimum expected number of queries needed to compute $f(x)$ with error probability at most $1/3$ for the worst-case input $x$. We prove a general lower bound on $\mathsf{N}_p(f)$ based on degree statistics of certain subgraphs of the Boolean hypercube. This is the first general lower bound beyond those implied by the simple observation that $\mathsf{N}_p(f)$ is lower bounded by the randomized query complexity. We show that it recovers (up to a constant factor) most previously known lower bounds on the noisy query complexity of Boolean functions, providing a unified framework for understanding these results and simplifying the proofs in several cases. Furthermore, it resolves in the affirmative an open problem of Gu, Li and Xu [COLT 2025] by showing that $\mathsf{N}_p(f) = \Omega(\mathsf{I}(f) \log \mathsf{I}(f))$, where $\mathsf{I}(f)$ denotes the total influence of $f$. We also apply our general lower bound to obtain tight bounds on the noisy query complexity of several new functions.

Published: 2026-06-09T21:00:16Z
Comment: COLT 2026
Authors: Yuzhou Gu, Xin Li, Yinzhan Xu

http://arxiv.org/abs/2606.11437v1
The Power of Test-Time Training for Approximate Sampling
Updated: 2026-06-09T20:48:48Z

Efficiently sampling from a complex probability distribution is a fundamental problem that has become increasingly pertinent in recent years with the rise of generative AI, as sophisticated procedures for sampling from LLMs have been proposed to solve challenging reasoning problems. The efficacy of such sampling algorithms is limited, however, by the relationship between the LLM and the particular sampling task at hand, which has motivated the framework of test-time training (TTT).
TTT works by updating a model's weights in response to partial generations and reward feedback received at inference time, thus adapting to the particular problem. In this work, we propose a formalization of TTT as the problem of producing a sample from a given probability measure $\mu^\star$ belonging to a known class $F$ of distributions, given an oracle $\hat\mu$ that yields approximate density estimates for $\mu^\star$. This is closely related to the problem of reducing sampling to approximate counting studied in the seminal works of Jerrum, Valiant & Vazirani (1986) and Jerrum & Sinclair (1989): namely, when $F$ is the class of all distributions, it coincides exactly with the aforementioned counting-to-sampling reduction. In this paper, we first show a quadratic lower bound on the query complexity of sampling from $\mu^\star$ given query access to $\hat\mu$ (for sufficiently large classes $F$), thus showing that the random walk approach proposed by Jerrum & Sinclair (1989) and refined by Hayes & Sinclair (2010) is optimal. This answers an open question posed by Hayes & Sinclair. We then show that this lower bound can be circumvented if the size of $F$ is appropriately bounded. As we discuss, this latter result can be viewed as an abstraction of TTT, and thus represents a starting point for the development of a principled theoretical framework for TTT.

Published: 2026-06-09T20:48:48Z
Authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi

http://arxiv.org/abs/2602.20949v2
Successor right-special strings with few Burrows--Wheeler Transform runs
Updated: 2026-06-09T17:09:36Z

We study successor right-special strings over an alphabet $\Sigma$ of size $\sigma$, a minimal-branching analogue of de Bruijn strings, and ask how few Burrows--Wheeler transform (BWT) runs are possible. In a de Bruijn string of order $k$, every $(k-1)$-context has all $\sigma$ right-extensions; here, every context is still right-special but has exactly two right-extensions, chosen by a successor rule.
For order $3$, we construct an explicit family $B_\sigma^{(3)}$, for every $\sigma \geq 2$, whose cyclic BWT has $r_c = \sigma^2 + 2$ runs. A suitable terminated linearization has the same run count, $r = r_c = \sigma^2 + 2$, while the smallest suffixient set has size $\chi = 2\sigma^2 + 1$. The ratio $\chi/r = 2 - 3/(\sigma^2 + 2)$ then quantifies how nearly this forced branching saturates the known bound $\chi/r \leq 2$, which we have previously shown to be asymptotically tight. Compared with our earlier alphabet-growing construction, this improves the gap from $O(1/\sigma)$ to $O(1/\sigma^2)$. We also show that the order-$3$ pattern appears as a blockwise two-row projection of normalized linear-feedback shift register (LFSR) de Bruijn sequences over $\mathbb{F}_q$, when such primitive trinomials $x^3 - x + c$ exist. For higher orders, we analyze the natural boundary-merged candidate $L_{\sigma,k}$ using the last-to-first (LF) permutation: it fails for $k = 4$ and all $\sigma \geq 3$, while verified $k = 5$ instances for $\sigma \in \{3, 4\}$ yield $\chi/r$ ratios exceeding $1.96$.

Published: 2026-02-24T14:33:55Z
Comment: 13 pages, 1 table. Major revision of the previous version: reframed around successor right-special strings; improved the terminated BWT run count; added the LFSR interpretation and higher-order boundary-merged candidates
Authors: Vinicius Tikara Venturi Date, Leandro Miranda Zatesko

http://arxiv.org/abs/2603.04689v4
Generalizing Fair Top-$k$ Selection: An Integrative Approach
Updated: 2026-06-09T17:00:32Z

Fair top-$k$ selection, which ensures appropriate proportional representation of members from minority or historically disadvantaged groups among the top-$k$ selected candidates, has drawn significant attention. We study the problem of finding a fair (linear) scoring function with multiple protected groups while also minimizing the disparity from a reference scoring function. This generalizes the prior setup, which was restricted to the single-group setting without disparity minimization.
Previous studies imply that the number of protected groups may have a limited impact on runtime efficiency. However, driven by the need for experimental exploration, we find that this implication overlooks a critical issue that may affect the fairness of the outcome. Once this issue is properly considered, our hardness analysis shows that the problem may become computationally intractable even for a two-dimensional dataset and small values of $k$. However, our analysis also reveals a gap in the hardness barrier, enabling us to recover efficiency for small $k$ when the number of protected groups is sufficiently small. Furthermore, beyond measuring disparity as the "distance" between the fair and the reference scoring functions, we introduce an alternative disparity measure, utility loss, that may yield a more stable scoring function under small weight perturbations. Through careful engineering trade-offs that balance implementation complexity, robustness, and performance, our augmented two-pronged solution demonstrates strong empirical performance on real-world datasets, with experimental observations also informing algorithm design and implementation decisions.

Published: 2026-03-05T00:06:47Z
Authors: Guangya Cai

http://arxiv.org/abs/2606.11067v1
Enumerating Inclusion-Maximal Arithmetic Progressions
Updated: 2026-06-09T16:30:26Z

We present a simple $\mathcal{O}\left( n^2 \frac{\log N}{\log \log N} + N \right)$ enumeration algorithm for a problem from mathematical and computational music analysis: given a strictly increasing integer sequence $S$ with $n$ entries and maximum value $N$, enumerate all $m$ inclusion-maximal arithmetic progressions (IMAPs) in this sequence.
An IMAP is a subsequence $S' \subseteq S$ of $k > 2$ integers in which (i) the difference between any two consecutive integers is the same number $d$ (i.e., $S'$ is an arithmetic progression), (ii) $S'$ cannot be extended further to the left or to the right by any additional integer from $S$ while remaining an arithmetic progression (i.e., $S'$ is a maximal arithmetic progression), and (iii) no other maximal arithmetic progression $S'' \subseteq S$ properly contains $S'$ (i.e., $S'$ is an inclusion-maximal arithmetic progression). We further derive the expected number of IMAPs in random integer sequences $S$ and prove a bound on its order of growth. Finally, we report experiments comparing (a) the practical running time of the proposed algorithm against that of a previously known algorithm with higher time complexity $\mathcal{O}(N^{2+o(1)}n)$, and (b) the actual number of enumerated IMAPs against their mathematically expected number. Notably, the proposed algorithm demonstrates a significant improvement in running time over the previously known algorithm and, in immediate practical applications, will allow for more efficient analysis of large and rhythmically complex musical pieces.

Published: 2026-06-09T16:30:26Z
Authors: Brian Bemman, Maximilien Gadouleau, Oliver W. Gnilke, George B. Mertzios

http://arxiv.org/abs/2606.11283v1
Fixed-Parameter Tractability of Private Synthetic Data Generation
Updated: 2026-06-09T15:14:11Z

We study the problem of generating synthetic data under differential privacy. We establish fixed-parameter tractability (FPT) for this problem, where the parameter is the treewidth of the query family's incidence graph.
Our algorithms attain optimal error rates across all regimes and are realized by two different approaches: the first is based on linear programming (LP) and the FPT of the separation problem for the LP dual; the second is based on a subsampled private multiplicative weights method, for which we show that sampling from Gibbs distributions is FPT. Both approaches are unified by a dynamic programming framework over a tree decomposition.

Published: 2026-06-09T15:14:11Z
Authors: Badih Ghazi, Cristóbal Guzmán, Pritish Kamath, Alexander Knop, Ravi Kumar, Pasin Manurangsi

http://arxiv.org/abs/2502.11561v4
Resident fitness computation in linear time and other algorithmic aspects of interacting trajectories
Updated: 2026-06-09T14:51:08Z

Systems of interacting trajectories were recently studied in [HGSTW24]. Such a system of $[0,1]$-valued piecewise linear trajectories arises as a scaling limit of the system of logarithmic subpopulation sizes in a population-genetic model (more precisely, a Moran model) with mutation and selection. By definition, the resident fitness is initially 0 and afterwards increases by the ultimate slope of each trajectory that reaches height 1. We show that although the interaction of $n$ trajectories may yield $\Omega(n^2)$ slope changes in total, the resident fitness function can be computed in $O(n)$ time. Our algorithm uses the so-called continued lines representation of the system of interacting trajectories. In the special case of Poissonian interacting trajectories, where the birth times of the trajectories form a Poisson process and the initial slopes are random and i.i.d., we provide a linear bound on the expected total number of slope changes.

Published: 2025-02-17T08:48:29Z
Comment: 19 pages, 4 figures
Authors: Katalin Friedl, Viktória Nemkin, András Tóbiás

http://arxiv.org/abs/2606.10944v1
Express Language Modeling
Updated: 2026-06-09T14:48:56Z

We introduce a new tool, Express, for converting a non-causal attention approximation into a causal approximation with matching approximation guarantees.
When combined with the state-of-the-art Thinformer approximation, Express improves upon the best known causal attention guarantees, delivering $\log^{3/2}(n)/s$ approximation error with only $O(s)$ memory and $O(s^2 \log^2(n))$ compression overhead for a sequence of length $n$. We pair these developments with an efficient I/O-aware Triton implementation, demonstrate substantial speedups over FlashAttention 2, and use Express to overcome four resource bottlenecks in the language modeling pipeline: long-context prefill, KV cache compression, long-form memory-constrained decoding, and long-form compute-constrained decoding.

Published: 2026-06-09T14:48:56Z
Authors: Albert Gong, Annabelle Michael Carrell, Raaz Dwivedi, Lester Mackey

http://arxiv.org/abs/2606.10670v1
On the Complexity of Signed Domination
Updated: 2026-06-09T10:20:43Z

Given a graph $G = (V, E)$, a signed dominating function is a function $f: V \rightarrow \{-1, 1\}$ such that for every vertex $u \in V$, $\sum\limits_{v \in N[u]} f(v) \geq 1$. The weight of $f$ is defined as $\sum\limits_{u \in V} f(u)$. The objective of the Signed Domination problem is to compute a signed dominating function $f$ of minimum weight. The problem is known to be NP-complete even when restricted to bipartite graphs, chordal graphs, or planar graphs. In this paper, we extend the known complexity results for the Signed Domination problem. Since the problem is NP-complete on chordal graphs, we study its complexity on split graphs, a subclass of chordal graphs, and show that it remains NP-complete. Moreover, as the problem is W[2]-hard when parameterized by weight, we investigate its parameterized complexity with respect to structural parameters. We prove that the problem is W[1]-hard when parameterized by feedback vertex set number (and hence by treewidth and clique-width). Motivated by this hardness result, we consider more restrictive parameters, neighbourhood diversity and twin cover number, and present FPT algorithms.
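To make the definition of a signed dominating function concrete, here is a minimal Python sketch (not taken from the paper) that checks whether a labelling $f$ is signed dominating and finds a minimum-weight one by exhaustive search. The helper names `is_signed_dominating` and `min_signed_domination` are hypothetical, the graph is assumed to be given as a symmetric adjacency dictionary, and the search is exponential in $|V|$, so it is only meant for tiny examples.

```python
from itertools import product


def is_signed_dominating(adj, f):
    """Check sum_{v in N[u]} f(v) >= 1 for every vertex u.

    adj: dict mapping each vertex to a list of its neighbours (assumed symmetric).
    f:   dict mapping each vertex to -1 or +1.
    N[u] is the closed neighbourhood, i.e. u together with its neighbours.
    """
    for u, nbrs in adj.items():
        closed_nbhd = {u, *nbrs}
        if sum(f[v] for v in closed_nbhd) < 1:
            return False
    return True


def min_signed_domination(adj):
    """Brute force over all 2^|V| labellings in {-1, +1}; only usable for tiny graphs."""
    vertices = list(adj)
    best_weight, best_f = None, None
    for signs in product((-1, 1), repeat=len(vertices)):
        f = dict(zip(vertices, signs))
        if is_signed_dominating(adj, f):
            weight = sum(signs)  # weight of f = sum of f(u) over all vertices
            if best_weight is None or weight < best_weight:
                best_weight, best_f = weight, f
    return best_weight, best_f


if __name__ == "__main__":
    # 4-cycle 0-1-2-3-0: each closed neighbourhood has 3 vertices,
    # so it may contain at most one vertex labelled -1.
    cycle4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(min_signed_domination(cycle4))
```

The all-ones labelling is always signed dominating, so the search never returns `None` on a non-empty graph. The NP-completeness results above mean that, in general, one should not expect a polynomial-time replacement for this search unless P = NP.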
Published: 2026-06-09T10:20:43Z
Comment: Extended abstract of this paper has appeared in IWOCA 2026
Authors: Sangam Balchandar Reddy
DOI: 10.1007/978-3-032-27732-9_34

http://arxiv.org/abs/2606.10446v1
Proportionality from Sampled Approvals
Updated: 2026-06-09T05:50:09Z

How much voter input is necessary to ensure representation in multiwinner elections? If voters are randomly selected from an underlying population, how many draws are necessary to find a proportional committee of $k$ candidates with high probability? Sample-based adaptations of standard multiwinner voting rules that satisfy the justified representation (JR) proportionality axiom use $\tilde O(k^5 \log \frac{m}{\delta})$ sampled approval ballots over $m$ candidates, where $\delta$ is the failure probability and $\tilde O$ suppresses $\mathrm{polylog}(k)$ factors. We present a rule for which the sample complexity of JR-family proportional committee selection is $\tilde O(k^{4}\log \frac{m}{\delta})$. This separates the sample complexity of JR from that of the natural corresponding additive approximation to the voter coverage (Chamberlin-Courant) objective, which we show requires $\Theta(k^5\log \frac{m}{\delta})$ samples. For lower bounds, we present a family of instances with $m, \frac{1}{\delta} \in \mathrm{poly}(k)$ for which $\Omega(k^3)$ sampled ballots are necessary to identify a JR committee. We also show that a dependence on $\log m$ is necessary. This lower bound is versatile and also applies to Hare proportionality for solid coalitions (PSC) for ranked ballots. Unfortunately, no number of sampled ballots suffices to satisfy the slightly stronger Droop JR and Droop PSC axioms with high probability. However, mild relaxations of JR require fewer samples, as do the beyond-worst-case domains and actual approval preferences we evaluate.
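The justified representation (JR) axiom mentioned in the Proportionality from Sampled Approvals abstract has a simple polynomial-time check, sketched below in Python under stated assumptions: ballots are sets of approved candidates, the committee has size $k$, and the function name `satisfies_jr` is hypothetical. It relies on the standard characterization that a committee $W$ violates JR exactly when some candidate is approved by at least $n/k$ voters who approve nobody in $W$. This only checks the axiom on a given profile; it does not implement the paper's sample-based rule.

```python
def satisfies_jr(ballots: list[set[str]], committee: set[str], k: int) -> bool:
    """Return True if `committee` satisfies justified representation (JR).

    A committee violates JR iff some candidate c is approved by at least n/k
    voters, none of whom approves any committee member (a cohesive,
    unrepresented group).
    """
    n = len(ballots)
    # Voters who approve no member of the committee.
    unrepresented = [b for b in ballots if not (b & committee)]
    for c in set().union(*ballots):
        supporters = sum(1 for b in unrepresented if c in b)
        if supporters * k >= n:  # integer form of supporters >= n / k
            return False
    return True


if __name__ == "__main__":
    # Six voters and committee size k = 2, so a cohesive group needs n/k = 3 voters.
    profile = [{"a"}, {"a"}, {"a"}, {"b"}, {"b"}, {"c"}]
    print(satisfies_jr(profile, {"b", "c"}, k=2))  # False: the three a-voters are unrepresented
    print(satisfies_jr(profile, {"a", "b"}, k=2))  # True
```

In the sampling setting studied in the paper, a committee is chosen from sampled ballots while JR is judged against the whole population; a checker like this one can only evaluate the latter when the full profile is available.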
Published: 2026-06-09T05:50:09Z
Comment: 44 pages, 9 figures
Authors: Gregory Kehne

http://arxiv.org/abs/2606.10399v1
Average-Case and Smoothed Near-Optimality for Color-Code Decoding
Updated: 2026-06-09T04:18:13Z

Minimum-weight decoding for two-dimensional color codes is NP-hard (Walters and Turner 2026), motivating the search for approximation guarantees beyond worst-case exact decoding. We study a block-based decoder for triangular color-code lattices. The decoder satisfies the deterministic additive guarantee \(\lvert E_{\mathrm{alg}}\rvert \leq \operatorname{OPT}(S)+O(n/\tau)\), where \(n\) is the number of vertices and \(\tau\) is the wall spacing. We show that this additive guarantee becomes a near-optimal multiplicative guarantee under natural noise models. For constant-rate i.i.d. face noise and constant local degree, choosing \(\tau=\Theta(\varepsilon^{-1})\) gives a \((1+\varepsilon)\)-approximation with probability \(1-\exp(-\Omega(n))\), in time \(n \cdot 2^{O(\varepsilon^{-1})}\). We also prove a smoothed analogue: the same near-optimality guarantee holds when an arbitrary adversarial error pattern is perturbed by independent constant-rate noise. Finally, in the low-probability regime \(p=o(1/\log^2 n)\), the syndrome decomposes into small active regions with high probability, allowing independent component-wise decoding and yielding an exact minimum-weight correction in time \(n \cdot 2^{O((\log n)^{3/2})}\). These results show that, despite worst-case hardness, color-code decoding admits strong average-case, smoothed, and sparse-regime guarantees.

Published: 2026-06-09T04:18:13Z
Authors: Daniel Gibney, Jackson Huffstutler

http://arxiv.org/abs/2405.19504v2
MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
Updated: 2026-06-08T18:58:48Z

Neural embedding models have become a fundamental component of modern information retrieval (IR) pipelines. These models produce a single embedding $x \in \mathbb{R}^d$ per data point, allowing for fast retrieval via highly optimized maximum inner product search (MIPS) algorithms.
Recently, beginning with the landmark ColBERT paper, multi-vector models, which produce a set of embeddings per data point, have achieved markedly superior performance on IR tasks. Unfortunately, using these models for IR is computationally expensive due to the increased complexity of multi-vector retrieval and scoring. In this paper, we introduce MUVERA (MUlti-VEctor Retrieval Algorithm), a retrieval mechanism that reduces multi-vector similarity search to single-vector similarity search. This enables the use of off-the-shelf MIPS solvers for multi-vector retrieval. MUVERA asymmetrically generates Fixed Dimensional Encodings (FDEs) of queries and documents, which are vectors whose inner product approximates multi-vector similarity. We prove that FDEs give high-quality $\varepsilon$-approximations, thus providing the first single-vector proxy for multi-vector similarity with theoretical guarantees. Empirically, we find that FDEs achieve the same recall as prior state-of-the-art heuristics while retrieving 2-5$\times$ fewer candidates. Compared to prior state-of-the-art implementations, MUVERA achieves consistently good end-to-end recall and latency across a diverse set of BEIR retrieval datasets, achieving on average 10$\%$ higher recall with $90\%$ lower latency.

Published: 2024-05-29T20:40:20Z
Comment: Fixed error in the proof of Theorem 2.1. The prior version claimed dimension $d_{FDE} = O(m/\delta)^{O(1/\varepsilon)}$, whereas the new version has the correct bound $d_{FDE} = O(m/(\varepsilon\delta))^{O(1/\varepsilon^2)}$
Authors: Laxman Dhulipala, Majid Hadian, Rajesh Jayaram, Jason Lee, Vahab Mirrokni

http://arxiv.org/abs/2606.09729v1
Bayesian Probing on Graphs
Updated: 2026-06-08T16:51:30Z

We introduce a stochastic probing problem with correlated items. In our model, which we call Bayesian Probing, the correlations are modeled by an underlying graph $G$. Each vertex is independently active with a known probability. Each item corresponds to an edge in the graph.
Probing an edge incurs some cost, gives some reward if both endpoints are active, and also reveals the states of its endpoints. Hence a probe induces a Bayesian update on the remaining edges. The goal is to adaptively probe items (edges) subject to a knapsack constraint so as to maximize the expected total reward obtained from the probed edges. Bayesian Probing generalizes stochastic knapsack and stochastic probing by allowing correlations between items. Moreover, it gives a tractable model for the Bayesian Active Search problem, a popular problem in the machine learning community. In Bayesian Active Search, the goal is to find items of a particular class by adaptively probing at most, say, $k$ items. Given a prior distribution over items, we want to compute a Bayesian policy that maximizes the number of such items found. For this general problem with arbitrary priors, there are strong lower bounds on efficiently computing good policies. In this paper, we design efficient approximation algorithms for Bayesian Probing. These results give the first efficient approximation algorithms for Bayesian Active Search, for a class of practically relevant prior distributions.

Published: 2026-06-08T16:51:30Z
Authors: Anupam Gupta, Benjamin Moseley, Rudy Zhou

http://arxiv.org/abs/2606.09728v1
Quantum Cut Sparsifiers
Updated: 2026-06-08T16:51:08Z

In this paper, we continue a line of research initiated by Basu, Brakensiek, and Putterman [2026] on the sparsifiability of Hamiltonians. We focus in particular on the sparsifiability of the widely studied Quantum Cut (QC) Hamiltonians. Our main result is that any $n$-qubit QC Hamiltonian can be sparsified to $\widetilde{O}(n /\varepsilon^2)$ terms while preserving the energy of every state up to a factor of $1 \pm \varepsilon$.
Our result can be interpreted as giving an importance sampling scheme for the edges of an arbitrary graph $G$ such that the \emph{Kikuchi} graph at level $\ell$ of the sampled graph is a spectral approximation to the Kikuchi graph of $G$. Importantly, the \emph{same} sampling scheme works simultaneously for all $\ell$. The natural approach of leverage score sampling, analyzed via matrix concentration inequalities, yields a polynomially worse bound in our setting because the underlying matrices have dimension $\sim 2^n$. Instead, our approach relies on decomposing the action of these matrices into invariant subspaces. Then, using an operator-valued inequality of Alon and Kozma [Ann. Henri Poincaré, 2020], itself building on an \emph{octopus inequality} of Caputo, Liggett, and Richthammer [J. AMS, 2010], we extend our sparsification technique to all expander graphs. We then invoke expander decomposition to extend our sparsifier to all graphs.

Published: 2026-06-08T16:51:08Z
Authors: Arpon Basu, Joshua Brakensiek, Pravesh K. Kothari, Aaron Putterman
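For readers unfamiliar with Kikuchi graphs, the following Python sketch builds the level-$\ell$ Kikuchi graph of a small graph under the standard definition, which is assumed here (the paper may use a weighted or normalized variant). Vertices are the $\ell$-element subsets of $\{0, \dots, n-1\}$, and $S \sim T$ whenever the symmetric difference $S \triangle T$ is an edge of $G$. The function name `kikuchi_graph` is hypothetical. The sketch shows why the relevant matrices have $\binom{n}{\ell}$ rows at level $\ell$, and $2^n$ in total across all levels, which is the obstacle to matrix-concentration arguments noted in the Quantum Cut Sparsifiers abstract.

```python
from itertools import combinations
from math import comb


def kikuchi_graph(n: int, edges: list[tuple[int, int]], level: int):
    """Level-`level` Kikuchi graph of an undirected graph on vertices 0..n-1.

    Vertices: all `level`-element subsets S of range(n).
    Edges:    {S, T} whenever the symmetric difference of S and T is an edge
              {i, j} of the base graph, i.e. T is obtained from S by swapping
              an element i in S for an element j not in S.
    """
    base = {frozenset(e) for e in edges}
    nodes = [frozenset(s) for s in combinations(range(n), level)]
    kedges = set()
    for s in nodes:
        for i in s:
            for j in range(n):
                if j not in s and frozenset((i, j)) in base:
                    t = (s - {i}) | {j}
                    kedges.add(frozenset((s, t)))  # unordered pair, deduplicated
    return nodes, kedges


if __name__ == "__main__":
    cycle6 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
    nodes, kedges = kikuchi_graph(6, cycle6, level=3)
    # Each base edge {i, j} yields comb(n - 2, level - 1) Kikuchi edges.
    print(len(nodes), len(kedges), len(cycle6) * comb(6 - 2, 3 - 1))  # 20 36 36
```

At level 1 the construction returns a copy of $G$ itself, which is a quick sanity check; the edge count $|E| \binom{n-2}{\ell-1}$ follows because $S$ must contain exactly one endpoint of the base edge and share its other $\ell - 1$ elements with $T$.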