Balancing the Spread of Two Opinions in Sparse Social Networks

2026-06-02T22:34:26Z

Inspired by the famous Target Set Selection problem, we propose a new discrete model to simultaneously spread two opinions within a social network and perform an initial study of its complexity. Here, we are given a social network, a seed-set of agents for each opinion, two thresholds for each agent, a budget, and a number of rounds. The first threshold represents the willingness of an agent to adopt an opinion if the agent has no opinion at all, while the second threshold states the willingness to acquire a second opinion if the agent already has one. The goal is to add at most budget-many agents to the initial seed-sets such that the process started with these extended seed-sets stabilizes within the given number of rounds, with each agent having either both opinions or none. That is, our goal is to ensure that the spread of opinions is balanced. We show that the problem is NP-hard, and thus we study the problem from the perspective of parameterized complexity. In particular, we show that the problem is FPT when parameterized by the number of rounds, the maximum threshold, and the treewidth combined. The same algorithm also applies when the parameter is the treedepth and the maximum threshold combined. Finally, we show that the problem is FPT when parameterized by the vertex cover number, the $3$-path vertex cover number, or the vertex integrity of the input network alone. To complement our tractability results, we show that the problem is W[1]-hard with respect to a) the sizes of the initial seed-sets and the feedback-vertex set number combined, even if all thresholds are bounded by a constant, and b) the budget, the 4-path vertex cover number, and the feedback-vertex set number combined, even if every activation process stabilizes in at most 4 rounds.

What Makes Majority Illusion Easy to Detect?

2026-06-02T22:25:11Z

Majority illusion is an undesirable phenomenon in social networks in which agents incorrectly perceive a minority opinion as dominant. This can severely distort collective behavior and decision-making. We study the fundamental question of detecting whether a social network allows for a majority illusion. Formally, in the $q$-Majority Illusion problem, we ask whether there exists a binary labeling of agents in which at least a $q$-fraction of agents have the majority of neighbors with the minority label. We investigate how various structural properties of the underlying social network influence the tractability of this question, and provide a detailed map of its computational complexity.
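To make the definition concrete, the following minimal sketch checks a single given labeling; it does not solve the detection problem itself, which asks whether such a labeling exists. The names `illusion_fraction` and `has_q_illusion` are hypothetical, and two conventions are assumptions not fixed by the abstract: "majority of neighbors" is read as a strict majority, and a global tie between the two labels is treated as having no minority.

```python
from fractions import Fraction


def illusion_fraction(adj, label):
    """Fraction of agents whose neighborhood is dominated by the minority label.

    adj:   dict mapping each agent to the set of its neighbors
    label: dict mapping each agent to 0 or 1
    """
    ones = sum(label.values())
    zeros = len(label) - ones
    if ones == zeros:
        # Assumption: with a global tie there is no minority label.
        return Fraction(0)
    minority = 1 if ones < zeros else 0
    deceived = 0
    for agent, neighbors in adj.items():
        minority_neighbors = sum(1 for u in neighbors if label[u] == minority)
        # Assumption: "majority" means strictly more than half of the neighbors.
        if 2 * minority_neighbors > len(neighbors):
            deceived += 1
    return Fraction(deceived, len(adj))


def has_q_illusion(adj, label, q):
    """True if at least a q-fraction of agents is under illusion in this labeling."""
    return illusion_fraction(adj, label) >= q


# Star with center 0 and leaves 1..4; only the center holds label 1.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
star_label = {0: 1, 1: 0, 2: 0, 3: 0, 4: 0}
```

In the star example, every leaf sees only the center, which holds the minority label, so four of the five agents are under illusion.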

Local Clustering on Complex Graphs and Complex Hypergraphs

2026-06-02T21:08:29Z

Local/seeded clustering aims to find a compact cluster near the given starting instances. While most existing studies on graph clustering assume a discrete graph setting (i.e., unweighted, undirected graphs without self-loops), real-world graphs can be more complex. In this paper, we extend the classic non-approximating Andersen-Chung-Lang (ACL) clustering algorithm beyond discrete graphs and generalize its quadratic optimality to a wider range of complex graphs, including weighted, directed, and self-looped graphs and hypergraphs with edge-dependent vertex weights. Specifically, by leveraging PageRank, we propose two algorithms: GeneralACL for graphs and HyperACL for hypergraphs. We prove that, under two mild conditions, both algorithms can identify a quadratically optimal cluster in terms of conductance. Additionally, we provide experiments to validate our theoretical findings. Our code is available at https://github.com/iDEA-iSAIL-Lab-UIUC/HyperACL.
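Since the guarantees above are stated in terms of conductance, a small sketch of the standard conductance measure may help. It covers only the weighted, undirected case without self-loops; the directed and hypergraph generalizations studied in the paper are not reproduced here. The function name `conductance` and the edge-dictionary representation are illustrative assumptions.

```python
def conductance(edges, cluster):
    """Conductance of `cluster` in a weighted undirected graph.

    edges:   dict mapping (u, v) to a positive weight, each undirected edge
             listed once (assumption: no self-loops)
    cluster: set of vertices
    """
    cut = 0.0      # total weight of edges crossing the cluster boundary
    vol_in = 0.0   # sum of weighted degrees inside the cluster
    vol_out = 0.0  # sum of weighted degrees outside the cluster
    for (u, v), w in edges.items():
        for x in (u, v):
            if x in cluster:
                vol_in += w
            else:
                vol_out += w
        if (u in cluster) != (v in cluster):
            cut += w
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Two triangles joined by a single edge (2, 3), all weights 1.
barbell = {(0, 1): 1, (1, 2): 1, (0, 2): 1,
           (3, 4): 1, (4, 5): 1, (3, 5): 1,
           (2, 3): 1}
```

For the barbell example, the cluster `{0, 1, 2}` has cut weight 1 and volume 7, giving conductance 1/7, while the single vertex `{0}` has conductance 1.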

Publishing Below-Threshold Triangle Counts under Local Weight Differential Privacy

2026-06-02T18:51:05Z

We propose an algorithm for counting below-threshold triangles in weighted graphs under local weight differential privacy. While prior work has largely focused on unweighted graphs, edge weights are intrinsic to many real-world networks. We consider the setting in which the graph topology is publicly known and privacy is required only for the contribution of an individual to incident edge weights, capturing practical scenarios such as road and telecommunication networks. Our method uses two rounds of communication. In the first round, each node releases privatized information about its incident edge weights under local weight differential privacy. In the second round, nodes locally count below-threshold triangles using this privatized information; we introduce both biased and unbiased variants of the estimator. We further develop two refinements: (i) a pre-computation step that reduces covariance and thus lowers expected error, and (ii) an efficient procedure for computing smooth sensitivity, which substantially reduces running time relative to a straightforward implementation. Finally, we present experimental results that quantify the trade-offs between the biased and unbiased variants and demonstrate the effectiveness of the proposed improvements.

Scheduling in Queueing Systems with Uncertain and Evolving Holding Costs

2026-06-02T18:46:22Z

In content moderation for social media platforms, the cost of delaying the review of a piece of content is proportional to its view trajectory, which fluctuates and is a priori unknown. Motivated by such uncertain and evolving holding costs, we consider a queueing model where job states evolve based on a Markov chain with state-dependent instantaneous holding costs. We demonstrate that in the presence of such uncertain and evolving holding costs, the two canonical algorithmic principles, instantaneous-cost ($cμ$-rule) and expected-remaining-cost ($cμ/θ$-rule), are suboptimal. By viewing each job as a Markovian ski-rental problem, we develop a new index-based algorithm, Opportunity-adjusted Remaining Cost (OaRC), that adjusts to the opportunity of serving jobs in the future when uncertainty partly resolves. We show that the suboptimality gap of OaRC scales as $\tilde{O}(\sqrt{N})$, where $N$ is the system size. This bound shows that OaRC achieves asymptotic optimality for overloaded systems when the system size $N$ scales to infinity. Moreover, the bound is independent of the state-space size, which is a desirable property when job states contain contextual information. We corroborate our results with an extensive simulation study based on two holding cost patterns (online ads and user-generated content) that arise in content moderation for social media platforms. Our simulations based on synthetic and real datasets demonstrate that OaRC consistently outperforms existing practice, which is based on the two canonical algorithmic principles.

Planar Perfect Matching Counting is as Hard as Determinants

2026-06-02T17:56:22Z

In the 1960s, Fisher, Kasteleyn and Temperley designed an ingenious algorithm for computing the partition function of the dimer model, or equivalently, for counting perfect matchings in edge-weighted planar graphs (Philos. Mag. 1961; J. Mathematical Phys. 1963). This FKT algorithm later became the foundation for Valiant's holographic algorithms (FOCS 2004; SIAM J. Comput. 2008), which motivated the study of counting problems under the Holant framework. Combined with an algorithm by Yuster (FOCS 2008), the FKT algorithm allows us to count edge-weighted perfect matchings in planar $n$-vertex graphs with $\tilde{O}(n^{ω/2})$ arithmetic operations, where $ω<2.372$ is the matrix multiplication exponent. We prove a corresponding lower bound: Over algebraic circuits and other sufficiently strong computational models, perfect matchings in edge-weighted $n$-vertex planar graphs $G$ cannot be counted in $O(n^{ω/2-ε})$ arithmetic operations. This confirms the optimality of Yuster's algorithm. Our bound holds even when $G$ is an edge-weighted square grid.

Ranked MSO-enumeration over compressed words

2026-06-02T17:37:07Z

It is shown that the ranked query enumeration problem for a fixed MSO-query on strings can be solved with linear preprocessing and constant delay in the grammar-compressed setting, where the input string is given by a so-called straight-line program, i.e., a context-free grammar that produces exactly one string. Moreover, "ranked" means that the output tuples of the MSO-query are printed in a specific order that has to be MSO-definable. This is the first result for ranked query enumeration on compressed data. A corollary of this result is that for a fixed polyregular function $f$ and a word $w$ that is given by a straight-line program of size $n$, one can list after preprocessing time $\mathcal{O}(n)$ the symbols in $f(w)$ from left to right with constant delay, which generalizes a result of Bojanczyk for the case where $w$ is uncompressed. The proofs for these results are based on factorization trees, which are made accessible to the grammar-compressed setting (a contribution of independent interest).

Revisiting $O(n \log \log n)$ chaining for anchored edit distance

2026-06-02T17:18:53Z

Colinear chaining is a classical heuristic for sequence alignment: it enables scalable genome comparison and is a main component of many state-of-the-art read mappers based on seed-chain-extend. The earliest $O(n \log \log n)$ time algorithms by Eppstein et al. (J. ACM, 1992) chained $n$ fragments between two sequences $T$ and $Q$ while minimizing a gap cost based on the diagonal distance $Δ_{\text{diag}}$ between consecutive fragments. They also forbid fragment overlaps, which are essential in current chaining formulations: in long-read mapping, overlaps improve sensitivity and avoid restrictions on the fragment class considered. Jain, Gibney, and Thankachan (J. Comput. Biol. 2022) recently combined a $Δ_{\text{diag}} = |Δ_T - Δ_Q|$ overlap cost with the classic $L_\infty = \max(Δ_T, Δ_Q)$ gap cost, which takes the maximum of the horizontal and vertical gaps between the fragments, and proved that chaining under this cost model is equivalent to the anchored edit distance. We improve the existing $O(n \log^3 n)$-time algorithm for anchored edit distance to $O(n \log \log n)$ time in $O(n)$ space, by combining the gap-cost computation of Chao and Miller (Algorithmica, 1995) with the overlap-cost computation of Baker and Giancarlo (ESA, 1998). By developing llchain, a simpler $O(n \log n)$-time implementation of our method, we show how chaining algorithms that might have been recently overlooked by the bioinformatics community scale competitively to millions of fragments and large genomes. On average, llchain is $10\times$ faster than other methods on instances with $3\,000\,000$ anchors, and over $3\times$ faster on MEMs between HiFi reads and a reference human genome.
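The two cost formulas quoted above can be written out directly. This sketch only evaluates them for given values of $Δ_T$ and $Δ_Q$; how those two quantities are measured for a pair of gapped or overlapping fragments is not specified in the abstract and is left to the caller. The function names are hypothetical.

```python
def overlap_cost(delta_t, delta_q):
    """Diagonal-distance cost |Δ_T - Δ_Q|, used above as the overlap cost."""
    return abs(delta_t - delta_q)


def gap_cost(delta_t, delta_q):
    """L_infinity cost max(Δ_T, Δ_Q), used above as the gap cost."""
    return max(delta_t, delta_q)
```

For example, with $Δ_T = 5$ and $Δ_Q = 3$, the diagonal cost is 2 and the $L_\infty$ cost is 5.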

Learning DNF through Generalized Fourier Representations

2026-06-02T16:38:41Z

The Boolean Fourier representation has been widely used in learning theory, particularly for learning Disjunctive Normal Form (DNF) under uniform and product distributions. Extending these results to non-product distributions has remained a longstanding open problem. We address this challenge by introducing a generalized Fourier representation that enables learning under a broad class of non-product distributions. Our approach represents any distribution $D$ as a Bayesian network (BN) and derives a corresponding Fourier expansion. We show that standard Fourier-based learning techniques using membership queries to identify heavy coefficients can be adapted to this generalized representation with minor modifications. We prove that the $L_1$ spectral norm of conjunctions remains bounded under this expansion for difference-bounded tree BNs, significantly generalizing the known result for uniform distributions; matching lower bounds demonstrate the necessity of these constraints. Using these results, we establish the learnability of DNF and the agnostic learnability of decision trees under such distributions. Finally, we present an algorithm for learning difference-bounded tree BN distributions, extending our results to settings where the distribution is unknown.

ETH-Tight Complexity of Optimal Morse Matching on Bounded-Treewidth Complexes

2026-06-02T14:22:20Z

The Optimal Morse Matching (OMM) problem asks for a discrete gradient vector field on a simplicial complex that minimizes the number of critical simplices. It is NP-hard and has been studied extensively in heuristic, approximation, and parameterized complexity settings. Parameterized by treewidth $k$, OMM has long been known to be solvable on triangulations of $3$-manifolds in $2^{O(k^2)} n^{O(1)}$ time and in FPT time for triangulations of arbitrary manifolds, but the exact dependence on $k$ has remained an open question. We resolve this by giving a new $2^{O(k \log k)} n$-time algorithm for any finite regular CW complex, and show that no $2^{o(k \log k)} n^{O(1)}$-time algorithm exists unless the Exponential Time Hypothesis (ETH) fails.

Stepsize Hedging: an Alternative Mechanism for Accelerating Gradient Descent

2026-06-02T14:12:25Z

Can gradient descent be accelerated by just choosing better stepsizes? Surprisingly, the answer is yes. This short expository article provides an accessible introduction to this phenomenon of stepsize hedging.

Deterministic Distance Approximation in MPC via Improved Hitting Sets

2026-06-02T13:59:26Z

In this paper, we provide the first deterministic algorithms with sublogarithmic round complexity for spanners and approximate shortest paths in various MPC models. Moreover, we significantly improve upon the state of the art in the deterministic Congested Clique. In particular, we obtain the following four results on undirected graphs: 1. In both linear MPC and Congested Clique, we obtain an $O(k)$-stretch spanner of size $O(n^{1+1/k})$ for a weighted graph in $O(1)$ rounds, for some parameter $k\ge 0$. For $k=O(\log{n})$, this leads to an $O(\log n)$ approximation of APSP in constant rounds in both models. 2. In sublinear MPC, we obtain an $O(k^{1+\varepsilon})$-stretch spanner of size $O(n^{1+1/k})$ for a weighted graph in $O(\log k)$ rounds, for any fixed constant $\varepsilon>0$. 3. In Congested Clique, we obtain $O(1)$-approximate APSP for weighted graphs in $O(\log \log \log n)$ rounds. 4. In near-linear MPC, we obtain $(1+\varepsilon)$-approximate single-source shortest paths and $O(1)$-approximate all-pairs shortest paths for unweighted graphs in $\textsf{poly}\log \log n$ rounds. Our algorithm requires only a single machine with near-linear memory; the rest can have sublinear memory. Our deterministic algorithms obtain similar guarantees to the state-of-the-art randomized algorithms without incurring additional factors in the round complexity. To obtain these results, we inspect the randomized algorithms and isolate their randomized sampling routines. Then we derandomize these sampling routines by using a deterministic hitting set. To this end, we develop a versatile deterministic hitting set algorithm, which we hope will have further derandomization applications.

Algorithmically Fair Maximization of Multiple Submodular Objective Functions and Implications to Constrained Fair Division

2026-06-02T13:33:00Z

Constrained maximization of submodular functions is a central problem in combinatorial optimization. In many realistic scenarios, multiple agents each need to maximize their own submodular objective over a common ground set, subject to individual constraints, with the requirement that their solutions be disjoint. We study this setting through the lens of algorithmic fairness and constrained fair division. Inspired by the fair division literature, we propose and analyze a simple Round-Robin protocol in which agents take turns building their solutions one item at a time; each agent is free to use any internal algorithm, and the protocol itself performs no computation. We show that agents following simple greedy policies enjoy solid guarantees for both monotone and non-monotone objectives subject to constraints as general as $p$-systems. For monotone objectives, a greedy agent $i$ with a $p_i$-system constraint achieves a $1/(n+p_i)$ fraction of the best value available when they first get to choose. On instances that are robust to competition -- where no agent's optimal value is greatly affected by losing some items to others -- these guarantees improve to a $1/Θ(p_i)$ approximation of the unconstrained optimum, which is asymptotically best-possible in polynomial time. We further establish novel fairness guarantees: greedy agents produce approximately feasible-envy-free-up-to-one-item (FEF1) and approximately feasible-envy-free-towards-unallocated-items (FEFu) allocations for monotone and non-monotone objectives. Via a simple augmented protocol and a self-contained polynomial-time proxy algorithm, we also obtain the first $Θ(1/p_i)$-approximate feasible maximin share (FMMS) guarantees for submodular agents with combinatorial constraints. Finally, although greedy policies may not be individually optimal, consistently improving upon them is NP-hard even in the simplest settings.
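The Round-Robin protocol with greedy agents can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's exact protocol or analysis: objectives are assumed monotone (so a greedy agent always takes its best feasible item, even at zero gain), ties are broken by sorted item order, and an agent drops out once no remaining item keeps its bundle feasible. The names `round_robin_greedy`, `coverage_value`, and `at_most_two` are hypothetical.

```python
def round_robin_greedy(items, agents):
    """Agents take turns; each adds the feasible item of largest marginal gain.

    items:  iterable of sortable, hashable items
    agents: list of (value, feasible) pairs, where value(bundle) returns a
            number and feasible(bundle) returns True or False
    """
    remaining = set(items)
    bundles = [set() for _ in agents]
    active = [True] * len(agents)
    while remaining and any(active):
        for i, (value, feasible) in enumerate(agents):
            if not active[i]:
                continue
            candidates = [x for x in sorted(remaining)
                          if feasible(bundles[i] | {x})]
            if not candidates:
                active[i] = False  # nothing left that this agent can add
                continue
            base = value(bundles[i])
            best = max(candidates,
                       key=lambda x: value(bundles[i] | {x}) - base)
            bundles[i].add(best)
            remaining.discard(best)
    return bundles


# Illustrative instance: coverage objectives with a cardinality constraint.
COVERS = {"a": {1, 2}, "b": {2, 3}, "c": {3}, "d": {4, 5, 6}}


def coverage_value(bundle):
    covered = set()
    for item in bundle:
        covered |= COVERS[item]
    return len(covered)


def at_most_two(bundle):
    return len(bundle) <= 2


result = round_robin_greedy(COVERS, [(coverage_value, at_most_two)] * 2)
```

In this instance the first agent takes `d` and then `b`, while the second takes `a` and then `c`. A cardinality constraint is a matroid constraint and hence a $p$-system with $p = 1$.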

The anti-lexicographic SUS-anchor: a near-optimal k=1 sampling scheme

2026-06-02T07:13:57Z

In recent years, there has been a renewed interest in the search for low density minimizer schemes. These schemes take a window of $w$ consecutive $k$-mers, and sample one of them: the smallest under some specific order. Schemes such as the mod-minimizer provide a low density (fraction of sampled $k$-mers) when $k \gg w$, while schemes such as the greedy minimizer work well for explicit small parameters roughly in the regime $k \leq 2w$, for $k$ and $w$ up to $15$ or so. When $k < \log_σ w$ is very small, minimizer schemes cannot do well, and more general sampling schemes are needed that can be richer than just comparing $k$-mers. Bidirectional-string anchors (bd-anchors) form one such scheme. Inspired by bd-anchors, we introduce the smallest unique substring or SUS-anchor: Given a window, this considers all suffixes that do not occur as a substring elsewhere in the window. It then samples the start position of the smallest suffix according to the new anti-lexicographic order that minimizes the first character and maximizes the remaining characters. We give a linear-time and $O(w)$ space streaming algorithm to compute all SUS-anchors of a string. For alphabet size $σ=4$ and $k=1$, the anti-lexicographic SUS-anchor empirically has density less than $1\%$ above the density lower bound, significantly improving over bd-anchors that are often $>15\%$ above it. For alphabet size $σ=2$, the density is at most $10\%$ above the lower bound, which again improves over the $>50\%$ overhead of bd-anchors.
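The SUS-anchor definition can be illustrated with a naive reference sketch. It follows the description above literally and is not the linear-time streaming algorithm from the paper; the function names and the choice to measure the window in characters (`window_len`) are assumptions made for illustration.

```python
def sus_anchor(window):
    """Start position of the anti-lexicographically smallest unique suffix.

    A suffix counts as unique if it does not occur as a substring anywhere
    else in the window. The whole window is always unique, so a result exists.
    """
    candidates = []
    for i in range(len(window)):
        suffix = window[i:]
        # A suffix can only reoccur at an earlier start position, so it is
        # unique exactly when its first occurrence is at position i.
        if window.find(suffix) == i:
            candidates.append(i)

    def anti_lex_key(i):
        suffix = window[i:]
        # Minimize the first character, maximize all remaining characters.
        return (suffix[0], tuple(-ord(c) for c in suffix[1:]))

    # No unique suffix is a prefix of another suffix, so keys never tie.
    return min(candidates, key=anti_lex_key)


def sampled_positions(text, window_len):
    """Set of positions sampled over all windows of `window_len` characters."""
    return {start + sus_anchor(text[start:start + window_len])
            for start in range(len(text) - window_len + 1)}
```

In the window `CACAG`, the unique suffixes starting with `A` are `ACAG` and `AG`; since the remaining characters are maximized, `AG` wins and position 3 is sampled. The next window of `CACAGT` samples the same text position, which is the kind of sharing that lowers density.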

HRNN: A Hybrid Graph Index for Approximate Reverse k-Nearest Neighbor Search on High-Dimensional Vectors

2026-06-02T06:35:54Z

Reverse k-nearest neighbor (RkNN) search returns all data points that regard a query vector as one of their k-nearest neighbors (kNNs). Existing RkNN methods typically follow a filter-and-verification framework: vectors near the query vector are first collected as candidates and then verified against their kNN-radius (i.e., the distance to their k-th nearest neighbor). However, existing methods face two key limitations in high-dimensional spaces. First, nearby vectors often do not belong to the query's true RkNN set, resulting in excessive candidate expansion overhead. Second, existing methods compute kNN-radii online during verification, incurring substantial query-processing cost. To address these limitations, we propose HRNN, a hybrid graph index for approximate RkNN search. (1) Rather than directly treating nearby vectors as RkNN candidates, HRNN uses them as proxy points based on the assumption that a query's RkNN results can often be discovered through the RkNN results of its nearby vectors. (2) To reduce verification cost, HRNN materializes high-fidelity kNN-radii offline, eliminating expensive online reconstruction while preserving accuracy. HRNN combines a navigation graph, a ranked kNN graph, and reverse-neighbor lists into a hybrid index that supports efficient proxy retrieval, candidate generation, and kNN-radius access. We also develop efficient index construction and append-only maintenance algorithms. Extensive experiments show that HRNN consistently outperforms existing methods, achieving up to one order of magnitude higher throughput. Moreover, HRNN scales to datasets containing up to 10 million high-dimensional vectors while supporting efficient dynamic index maintenance.
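The verification step described above can be shown with an exact brute-force baseline. This sketch is not HRNN and uses no graph index; it only illustrates the kNN-radius test, with radii computed once up front to mirror the idea of offline materialization. The function names are hypothetical, Euclidean distance is assumed, and a point at distance exactly equal to its kNN-radius is counted as a result (a tie-breaking assumption).

```python
import math


def build_knn_radii(points, k):
    """kNN-radius of every point: distance to its k-th nearest other point.

    Requires k <= len(points) - 1.
    """
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, other)
                       for j, other in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def reverse_knn(points, radii, query):
    """Indices of points that would count `query` among their k nearest neighbors."""
    return [i for i, p in enumerate(points)
            if math.dist(p, query) <= radii[i]]


points = [(0.0,), (1.0,), (5.0,), (6.0,)]
radii_k1 = build_knn_radii(points, 1)
radii_k2 = build_knn_radii(points, 2)
```

With `k = 1`, only the point at 1.0 has the query `(1.5,)` inside its kNN-radius; with `k = 2`, the radii grow enough that all four points qualify.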