Differentially Private Spectral Graph Clustering: Balancing Privacy, Accuracy, and Efficiency

2026-05-11T07:10:10Z

We study spectral graph clustering under edge differential privacy. We propose a matrix shuffling mechanism that combines randomized edge flipping with a random permutation of the adjacency matrix. While edge flipping alone provides only a constant $\varepsilon$ guarantee as the graph grows, shuffling amplifies privacy so that the effective $\varepsilon$ tends to zero with the number of nodes. We develop a unified error analysis framework -- based on Davis--Kahan perturbation theory and a classification-margin bound -- that gives explicit misclassification rates for all the mechanisms considered as a function of the privacy budget, eigengap, and number of communities. Applying this framework, we show that the matrix shuffling mechanism achieves an error rate scaling of $\tilde{O}(1/n)$, a clear improvement over two canonical DP baselines from the private PCA literature: the Gaussian mechanism applied directly to the adjacency matrix (Analyze Gauss) and the noisy power method, both of which scale as $\tilde{O}(1)$ in $n$. We further propose a private spectral gap detection algorithm for estimating the number of communities. Experiments on synthetic and real-world networks validate our theoretical findings.
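
As a rough illustration of the two ingredients named in the abstract, the following minimal sketch applies randomized response to each undirected edge and then relabels the nodes with a random permutation. The function name `flip_and_shuffle`, the flip probability $1/(1+e^{\varepsilon})$ (the standard randomized-response choice), and the reading of "random permutation" as a simultaneous row and column relabeling are assumptions made here, not details confirmed by the abstract.

```python
import numpy as np

def flip_and_shuffle(adj, epsilon, rng):
    """Randomized edge flipping followed by a random node relabeling.

    Assumptions (not from the paper): adj is a symmetric 0/1 integer matrix
    with zero diagonal, and each edge is flipped with probability
    1 / (1 + exp(epsilon)).
    """
    n = adj.shape[0]
    p_flip = 1.0 / (1.0 + np.exp(epsilon))
    # Draw one flip decision per unordered node pair, then symmetrize.
    flips = np.triu(rng.random((n, n)) < p_flip, k=1)
    flips = flips | flips.T
    noisy = np.where(flips, 1 - adj, adj)
    np.fill_diagonal(noisy, 0)
    # Apply the same random permutation to rows and columns.
    perm = rng.permutation(n)
    return noisy[np.ix_(perm, perm)], perm

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [1, 0, 0, 0]])
released, perm = flip_and_shuffle(adj, epsilon=1.0, rng=rng)
```

A simultaneous row and column permutation leaves the eigenvalues of the matrix unchanged, which is why such a relabeling would be compatible with spectral clustering.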

Private Structured-Subset Retrieval

2026-05-11T06:14:49Z

We introduce the \emph{Private Structured-Subset Retrieval (PSSR)} problem, where a user retrieves $D$ messages from a database of $K$ messages replicated across $N$ non-colluding servers, and the demand is restricted to a known structured family of $D$-subsets. This formulation generalizes Multi-message Private Information Retrieval (MPIR) and captures settings where the demand space is constrained by application-specific structure. Focusing on balanced $\{0,1\}$-linear schemes, a class that includes several best-known MPIR schemes, we derive converse bounds on the maximum retrieval rate and minimum subpacketization level required to achieve any given rate. We also develop an optimization-based framework to construct schemes for general structured demand families, providing flexibility in optimizing the retrieval rate or the subpacketization level. When specialized to the full demand family, this framework recovers known balanced $\{0,1\}$-linear MPIR constructions; for more restricted demand families, it can exploit the demand structure to increase the retrieval rate, reduce the subpacketization level, or both. We demonstrate this through a structured-demand example in which the proposed PSSR scheme simultaneously achieves a higher rate and requires a smaller subpacketization than the best-known MPIR scheme for the same parameters $N$, $K$, and $D$. Our parallel work on contiguous-demand families further illustrates the scope of this framework by yielding rate-optimal schemes with substantially smaller subpacketization and no field-size restrictions, improving upon MPIR-based schemes.

A Fast Hierarchical Splitting Approach for Non-Adaptive Learning of Random Hypergraphs

2026-05-11T04:20:42Z

This work focuses on the problem of learning an unknown $3$-uniform hypergraph using edge-detecting queries. Our goal is to design a querying strategy that recovers the hyperedge set using as few queries as possible. We restrict our attention to random hypergraphs under the Erdős--Rényi (ER) model, in which each potential hyperedge appears independently with probability $q = \Theta(n^{-3(1-\theta)})$ for $\theta \in (0,1)$. Prior work [Austhof-Reyzin-Tani, ISIT 2025] presents a testing-decoding scheme that uses $O(\bar{m}\log n)$ tests but requires a decoding time of $\Omega(n^3)$, where $\bar{m} = q\binom{n}{3}$ denotes the expected number of hyperedges. In this work, we extend the binary splitting framework and adapt it to the $3$-uniform hypergraph setting. We obtain a testing-decoding scheme that recovers the hyperedge set with high probability using $O(\bar{m} \log n)$ tests and achieves decoding time $O(\bar{m}^{5/3}\log n)$ for the case $\theta > \frac{2}{3}$ and $O(\bar{m}^{5/3}\log^2{\bar{m}}\log n)$ for the case $\theta \leq \frac{2}{3}$. Thus, compared with prior work, our result significantly improves the decoding complexity while maintaining optimal query complexity.
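
To make the query model concrete, here is a minimal sketch of an edge-detecting query on a $3$-uniform hypergraph, together with the expected hyperedge count $\bar{m} = q\binom{n}{3}$ from the abstract. The function names are hypothetical, and the sketch illustrates only the oracle, not the hierarchical splitting scheme itself.

```python
from itertools import combinations
from math import comb
import random

def sample_hypergraph(n, q, rng):
    """Sample a 3-uniform ER hypergraph: each triple appears independently w.p. q."""
    return {e for e in combinations(range(n), 3) if rng.random() < q}

def edge_detecting_query(hyperedges, subset):
    """Return True iff the queried vertex subset fully contains at least one hyperedge."""
    s = set(subset)
    return any(s.issuperset(e) for e in hyperedges)

def expected_hyperedges(n, q):
    """Expected number of hyperedges, m_bar = q * C(n, 3)."""
    return q * comb(n, 3)

rng = random.Random(0)
hidden = sample_hypergraph(n=12, q=0.02, rng=rng)
answer = edge_detecting_query(hidden, subset=range(6))
```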

Survey-Free Radio Map Construction via HMM-Based Coarse-to-Fine Inference

2026-05-11T03:42:52Z

Traditional radio map construction methods mandate labor-intensive data collection and precise location labeling. To address these limitations, we propose a novel survey-free approach for radio map construction that relies solely on unlabeled Received Signal Strength (RSS) measurements, thereby obviating the need for manual site surveys or auxiliary Inertial Measurement Units (IMUs). The key idea involves embedding multiple unlabeled RSS sequences into a known indoor layout, specifically targeting corridor-guided environments with a dominant unidirectional pedestrian flow. However, aligning the embedded coordinates with the RSS collection locations remains challenging due to the random fluctuations inherent in RSS data. To tackle this, we introduce a Hidden Markov Model (HMM)-based Coarse-to-Fine Inference (HCFI) framework. At the coarse level, we employ an HMM-based region label inference algorithm to partition RSS sequences and align the RSS segments with specific physical regions using graph-based inference. At the fine level, we develop an HMM-based location label inference technique to estimate RSS collection coordinates by leveraging RSS propagation principles while incorporating sequential spatio-temporal mobility probability. Empirical results from an office environment demonstrate that the proposed method achieves a radio map construction Mean Absolute Error (MAE) of 8.96 dB. Furthermore, based on the estimated radio map, k-Nearest Neighbor (KNN) localization yields an average positioning error of approximately 3.33 meters, offering a highly viable, survey-free solution for radio map construction under sequential topological assumptions.

Annotation-Free Indoor Radio Mapping via Physics-Informed Trajectory Inference

2026-05-11T03:39:28Z

Constructing indoor radio maps traditionally requires extensive site surveys with precise user-location labels, making the calibration process costly and time-consuming. Existing calibration-reduction methods either depend on partial location annotations or exploit inertial measurement units (IMUs) to provide relative motion cues; however, IMU-assisted solutions are constrained by hardware availability, device-level access restrictions, and accumulated sensor drift. In this paper, we study a location-label-free indoor radio mapping problem under known access-point deployment geometry and a known walkable spatial domain. We propose a physics-informed trajectory inference framework that uses only Channel State Information (CSI), without relying on user-location labels or IMU measurements. The key idea is to recover the latent spatial coordinates of CSI measurements by exploiting the local spatial continuity of multipath propagation. To this end, we construct a Power-Angle-Delay Profile (PADP) feature distance from MIMO-OFDM CSI and show that, within a local neighborhood and under quasi-static multipath conditions, this distance provides a physically meaningful proxy for small spatial displacements. We then incorporate the PADP-based continuity constraint into a spatially regularized Bayesian inference model for joint trajectory recovery and propagation-parameter estimation. Experiments on a real-world industrial CSI dataset demonstrate that the proposed framework achieves an average localization error of 0.88 m and a relative beam map construction error of 6.68%, improving upon representative channel-embedding and IMU-assisted baselines.

Rényi Rate-Distortion-Perception-Privacy Tradeoff under Indirect Observation

2026-05-11T03:15:43Z

We introduce a Rényi Rate-Distortion-Perception-Privacy (R-RDPP) framework for indirect source coding. A latent source~$S$ is correlated with a private attribute~$U$, and the encoder observes only a noisy view~$X$ such that $(S,U) - X - Y$ holds at the decoder output~$Y$. The communication cost is measured by Sibson's $\alpha$-mutual information $I_\alpha$ and the privacy leakage by $I_\beta$; the semantic distortion is measured between $S$ and $Y$, and the realism constraint is imposed at the semantic marginal $P_S$. We characterize the scalar Gaussian RDPP tradeoff, revealing that standard privacy metrics inherently penalize legitimate semantic recovery. To resolve this, we introduce a conditional privacy measure that quantifies only the residual leakage. In addition, we refine the achievability bounds for $\alpha > 1$ via the Poisson functional representation. By deriving the exact geometric-mixture distribution of the Poisson index, we obtain exact closed-form expressions for integer-order Rényi entropies and sharper computable bounds in regimes where the resulting expression improves the logarithmic-moment approach.

Closed-Form Gaussian Estimators for Multi-Source Partial Information Decomposition

2026-05-11T03:14:19Z

Computing multi-source partial information decomposition (PID) for continuous data is hard: existing closed-form Gaussian estimators are restricted to two source variables, while continuous arbitrary-source estimators are typically learning-based and do not provide closed-form expressions. To address this, we develop closed-form Gaussian estimators for multi-source PID. We provide two-source redundancy, multi-source unique information, the K-th order synergistic effect from source subsets of size K, and the total synergistic effect. The estimators are derived from the conditional-independence-based information measures introduced in our earlier work, under which every quantity reduces to a log-determinant expression in covariance blocks of the system. The resulting estimator is plug-in consistent, affine invariant, source-permutation symmetric, and additive over independent systems. We validate it on a controlled Gaussian benchmark, evaluate its computational efficiency against baselines, and confirm its numerical stability in finite-sample regimes. To our knowledge, this is the first covariance-based closed-form estimator that provides multi-source continuous PID measures.
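
The log-determinant form mentioned in the abstract can be illustrated with the simplest such quantity, the mutual information between two jointly Gaussian blocks, $I(X;Y) = \tfrac{1}{2}\log\frac{\det\Sigma_X \det\Sigma_Y}{\det\Sigma_{XY}}$. The sketch below is a plug-in computation of that standard identity only; the function name and index convention are assumptions, and the paper's specific redundancy, unique-information, and synergy measures are not reproduced here.

```python
import numpy as np

def gaussian_mutual_information(cov, idx_x, idx_y):
    """Mutual information (in nats) between two blocks of a jointly Gaussian vector.

    cov is the full covariance matrix; idx_x and idx_y are disjoint index lists.
    """
    idx_x, idx_y = list(idx_x), list(idx_y)
    joint = idx_x + idx_y
    # slogdet is numerically safer than log(det(...)) for ill-conditioned blocks.
    _, logdet_x = np.linalg.slogdet(cov[np.ix_(idx_x, idx_x)])
    _, logdet_y = np.linalg.slogdet(cov[np.ix_(idx_y, idx_y)])
    _, logdet_xy = np.linalg.slogdet(cov[np.ix_(joint, joint)])
    return 0.5 * (logdet_x + logdet_y - logdet_xy)

# Plug-in use: estimate the covariance from samples, then evaluate the formula.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=5000)
mi_hat = gaussian_mutual_information(np.cov(samples, rowvar=False), [0], [1])
```

For two scalar variables with correlation $\rho$ the formula reduces to $-\tfrac{1}{2}\log(1-\rho^2)$, which gives a quick sanity check.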

A Global Coding Scheme for OFDM over Finite Fields

2026-05-11T01:49:59Z

This paper proposes a highly efficient global coded-multiplexing scheme, conceptualized as Orthogonal Frequency Division Multiplexing over a finite field (FF-OFDM), for reliable multiuser communications. By utilizing a prime length cyclic code and its Hadamard equivalents as algebraic subcarriers, independent data streams are globally multiplexed via a Galois Fourier Transform (GFT) without rate loss. We show that this finite-field synthesis intrinsically generates a global Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) code over $\mathrm{GF}(2^s)$, whose parity-check matrix is governed by the structural rigor of partial geometries. At the receiver, supported by a binary decomposition theorem, the received nonbinary global codeword is jointly decoded using parallel binary iterative soft-decision algorithms prior to demultiplexing. This joint decoding enables seamless reliability information sharing across all user streams, achieving near-bound error performance, rapid convergence without error floors, and strictly linear amortized decoding complexity.

An Information-Theoretic Criterion for Efficient Data Synthesis

2026-05-11T01:27:59Z

Synthetic data has become crucial for large language model training, but its effectiveness is highly inconsistent. We provide an information-theoretic account of this inconsistency: synthetic data improves a model only when the generation-training loop is information-open, i.e., shaped by external signals (verifiers, environments, or rubrics) that inject task-relevant information beyond the model's current distribution. When the loop is information-closed (relying on the model's own outputs without such signals), the data processing inequality ensures that task-relevant information can only decrease, making collapse a predicted outcome. Among information-open pipelines, both efficiency and generalization hinge on the meta-level of supervision: a coarser signal such as binary correctness treats all acceptable outputs as equivalent, so the behavior it teaches is not tied to any particular domain or surface form and generalizes naturally across tasks and domains. These observations lead to a guiding thesis: learning preferentially converges to the most information-efficient signal component available, which accelerates learning when that component is the intended one, but causes reward hacking when a spurious pattern happens to be simpler.

Physics-Inspired Probabilistic Computing for Extremely Large-Scale MIMO Detection in Future 6G Wireless Systems

2026-05-11T01:17:06Z

Extremely large-scale multiple-input multiple-output (XL-MIMO) architectures are a key enabler of forthcoming 6G wireless communication networks by allowing high data rates through massive spatial multiplexing. Here, we approach the resulting detection problem with physics-inspired unconventional computing based on Ising machines (IMs). For binary modulation, probabilistic IMs (PIMs) and oscillator-based IMs achieve optimal maximum-likelihood (ML) detection for systems of up to 2048x2048 antennas with only 100 iterations, matching optimal sphere decoder performance for computationally tractable sizes and outperforming the minimum mean-square error (MMSE) industrial standard. For M-QAM with M up to 256, a generalized PIM-inspired framework, based on d-dimensional probabilistic variables (p-dits) that directly encode QAM symbols, achieves a low bit error rate across sizes up to 256x256 antennas, outperforming or matching MMSE with reduced algorithmic complexity. Unlike the binary mapping, the p-dit interaction matrix is independent of the QAM order, enabling adaptive MIMO modulation. These results show a promising scalable paradigm for XL-MIMO detection in future 6G networks.
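
For binary modulation, ML detection can be written as an Ising energy minimization, which is what makes Ising machines applicable. The sketch below shows the standard textbook mapping for a real-valued channel with symbols in $\{-1,+1\}$: since $\|y - Hx\|^2 = x^\top J x - 2h^\top x + \text{const}$ with $J$ the off-diagonal part of $H^\top H$ and $h = H^\top y$, minimizing the energy is equivalent to ML detection. The real-valued channel, the function names, and the brute-force minimizer (a stand-in for an Ising machine, feasible only at tiny sizes) are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from itertools import product

def ising_coefficients(H, y):
    """Couplings J and fields h such that ||y - Hx||^2 = x^T J x - 2 h^T x + const."""
    G = H.T @ H
    J = G - np.diag(np.diag(G))  # diagonal terms are constant because x_i^2 = 1
    h = H.T @ y
    return J, h

def ising_energy(x, J, h):
    """Ising energy of a spin configuration x with entries in {-1, +1}."""
    return x @ J @ x - 2.0 * (h @ x)

def brute_force_detect(H, y):
    """Exhaustive energy minimization over all spin configurations."""
    J, h = ising_coefficients(H, y)
    candidates = (np.array(c) for c in product((-1.0, 1.0), repeat=H.shape[1]))
    return min(candidates, key=lambda x: ising_energy(x, J, h))

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 4))
x_true = rng.choice([-1.0, 1.0], size=4)
y = H @ x_true + 0.05 * rng.standard_normal(6)
x_hat = brute_force_detect(H, y)
```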

Cross-Domain Lossy Compression via Constrained Minimum Entropy Coupling

2026-05-11T00:19:11Z

This paper studies cross-domain lossy compression through the lens of minimum entropy coupling (MEC) with rate and classification constraints. In this setting, an encoder observes samples from a degraded source domain, while the decoder is required to generate outputs following a prescribed target distribution and to preserve information relevant to a downstream classification task. Motivated by logarithmic-loss distortion, we adopt an information-based objective that maximizes the coupling strength between the source and reconstruction, rather than minimizing a sample-wise distortion. Under common randomness, we formulate a rate-constrained MEC problem (MEC-B) and show that the intermediate representation can be removed without loss of optimality, yielding an equivalent deterministic coupling formulation. For Bernoulli sources, closed-form expressions are derived with and without classification constraints. In addition, we implement a neural restoration framework using quantization, entropy modeling, distribution matching, and classification regularization. Experiments on MNIST super-resolution and SVHN denoising show that increasing the available rate improves classification accuracy and yields more informative reconstructions.
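
For readers unfamiliar with minimum entropy coupling, the sketch below shows a well-known greedy heuristic for the unconstrained problem: repeatedly match the largest remaining masses of the two marginals. It is included only to illustrate what a coupling is; it does not implement the rate-constrained MEC-B formulation or the classification constraint studied in the paper, and the function names are hypothetical.

```python
import numpy as np

def greedy_coupling(p, q, tol=1e-12):
    """Greedy low-entropy coupling of two discrete marginals p and q.

    Returns a joint matrix whose row sums match p and column sums match q.
    This is a heuristic, not a guaranteed minimizer of the joint entropy.
    """
    p = np.array(p, dtype=float)
    q = np.array(q, dtype=float)
    joint = np.zeros((p.size, q.size))
    while p.max() > tol and q.max() > tol:
        i, j = int(np.argmax(p)), int(np.argmax(q))
        mass = min(p[i], q[j])  # assign as much mass as both marginals allow
        joint[i, j] += mass
        p[i] -= mass
        q[j] -= mass
    return joint

def entropy(dist):
    """Shannon entropy in nats of a (possibly multi-dimensional) distribution."""
    d = np.asarray(dist, dtype=float).ravel()
    d = d[d > 0]
    return float(-np.sum(d * np.log(d)))

joint = greedy_coupling([0.5, 0.3, 0.2], [0.6, 0.4])
```

Any coupling has joint entropy at least as large as the entropy of either marginal, so `entropy(joint)` can be compared against `max(entropy(p), entropy(q))` to see how tight the heuristic is on a given pair.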

On the Rényi Rate-Distortion-Perception Function and Functional Representations

2026-05-10T22:43:27Z

We extend the Rate-Distortion-Perception (RDP) framework to the Rényi information-theoretic regime, utilizing Sibson's $\alpha$-mutual information to characterize the fundamental limits under distortion and perception constraints. For scalar Gaussian sources, we derive closed-form expressions for the Rényi RDP function, showing that the perception constraint induces a feasible interval for the reproduction variance. Furthermore, we establish a Rényi-generalized version of the Strong Functional Representation Lemma. Our analysis reveals a phase transition in the complexity of optimal functional representations: for $0.5 < \alpha < 1$, the coding cost is bounded by the $\alpha$-divergence of order $\alpha+1$, necessitating a codebook with heavy-tailed polynomial decay; conversely, for $\alpha > 1$, the representation collapses to one with finite support, offering new insights into the compression of shared randomness under generalized notions of mutual information.
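
For reference, Sibson's mutual information of order $\alpha$ for discrete alphabets is $I_\alpha(X;Y) = \frac{\alpha}{\alpha-1}\log\sum_y\big(\sum_x P_X(x)\,P_{Y|X}(y|x)^\alpha\big)^{1/\alpha}$, which recovers Shannon mutual information as $\alpha \to 1$. The sketch below evaluates this standard definition for a finite channel; the function names and the binary symmetric channel example are illustrative assumptions, and the Gaussian closed forms of the paper are not reproduced.

```python
import numpy as np

def sibson_mutual_information(p_x, p_y_given_x, alpha):
    """Sibson's alpha-mutual information in nats for a discrete channel.

    p_x has shape (|X|,); p_y_given_x has shape (|X|, |Y|) with rows summing to 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    inner = p_x @ (p_y_given_x ** alpha)  # sum over x, one entry per y
    return alpha / (alpha - 1.0) * np.log(np.sum(inner ** (1.0 / alpha)))

def shannon_mutual_information(p_x, p_y_given_x):
    """Shannon mutual information in nats, the alpha -> 1 limit."""
    joint = p_x[:, None] * p_y_given_x
    indep = p_x[:, None] * joint.sum(axis=0)[None, :]
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / indep[mask])))

# Binary symmetric channel with crossover 0.1 and uniform input.
p_x = np.array([0.5, 0.5])
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
values = {a: sibson_mutual_information(p_x, bsc, a) for a in (0.5, 2.0)}
```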

Learning from Acceptance: Cumulative Regret in the Game of Coding

2026-05-10T21:04:43Z

Classical coding-theoretic guarantees often rely on trust assumptions, such as requiring sufficiently many honest nodes compared with adversarial ones. These assumptions are difficult to enforce in open decentralized systems where participants are not centrally certified. At the same time, such environments often contain incentive mechanisms: participants may be rewarded only when their submitted data are accepted and the system remains functional. This changes the role of an adversary. Rather than acting as a pure saboteur, a strategic adversary may submit data that are consistent enough to be accepted while still degrading the quality of the final estimate. The game-of-coding framework models this strategic interaction between a data collector (DC) and an adversary. Existing works on the game of coding mostly consider the complete-information case, where the DC knows how the adversary trades off acceptance and estimation error. In this paper, we study an incomplete-information version of the game of coding in which the DC, acting as a Stackelberg leader, does not know the adversary's utility trade-off and must learn through repeated interaction. Prior work on the unknown-adversary setting considered an explore-then-commit objective, where only the final selected acceptance rule is evaluated. In contrast, we study the full learning trajectory: every acceptance rule used during the algorithm is executed and contributes to performance. We propose an algorithm that refines its search around promising acceptance rules, prove that it achieves sublinear cumulative regret, and evaluate its performance through numerical experiments.

Recovery Algorithms for Linear Batch Codes

2026-05-10T20:51:36Z

Various types of recovery algorithms for batch codes have been investigated, such as asynchronous recovery or recovery as afforded by batch codes obtained from Almost Affinely Disjoint (AAD) families. In this paper, we offer the first systematic investigation of linear batch codes equipped with particular recovery algorithms. We introduce and study various known and new types of algorithms, and we examine the order hierarchy of these types of batch codes. The simplest known recovery algorithms are those associated with graph-based batch codes. We analyze the resulting batch codes for arbitrary bipartite graphs, thereby generalizing some known results.

Entropy-informed Decoding: Adaptive Information-Driven Branching

2026-05-10T20:44:47Z

Large language models (LLMs) achieve remarkable generative performance, yet their output quality depends on the decoding strategy. While sampling-based methods (e.g., top-k, nucleus) and search-and-select based methods (e.g., beam search, best-of-n, majority voting) can improve upon greedy decoding, both approaches suffer from limitations: sampling generally commits to a single path, while search often expends excessive computation regardless of task complexity. To address these limitations, we introduce Entropy-informed decoding (EDEN), a plug-and-play, model-agnostic decoding framework that adaptively allocates computation based on the model's own uncertainty, approximating higher-width beam search with fewer expansions. At each generation step, EDEN estimates the entropy of the output token distribution and adjusts the branching factor monotonically with the entropy, expanding more candidates in high-entropy regions and following a greedier path in low-entropy regions, improving token efficiency. Experiments across complex tasks, including mathematical reasoning, code generation, and scientific questions, demonstrate that EDEN consistently improves output quality over existing decoding strategies, achieving better accuracy-expansion trade-offs than fixed-width beam search. By treating next-token selection as a noisy maximisation problem, we prove that branching factors monotone in entropy are guaranteed to find better (i.e. more probable) continuations than any fixed branching factor within the same total expansion budget, and derive explicit regret rates characterising the benefit of the adaptive allocation.
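
The idea of a branching factor that grows monotonically with token entropy can be sketched as follows. The linear mapping from normalized entropy to a width between 1 and `max_width`, and the function names, are hypothetical choices made for illustration; the abstract does not specify the rule EDEN actually uses.

```python
import numpy as np

def token_entropy(logits):
    """Shannon entropy (nats) of the softmax distribution over next-token logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # shift for numerical stability
    p = np.exp(z) / np.sum(np.exp(z))
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def branching_factor(logits, max_width):
    """Map entropy to a width in [1, max_width], non-decreasing in entropy.

    Assumes a vocabulary of at least two tokens; the linear rule is illustrative.
    """
    h_max = np.log(len(logits))  # entropy of the uniform distribution
    ratio = min(1.0, token_entropy(logits) / h_max)
    return 1 + int(round((max_width - 1) * ratio))

confident = branching_factor([100.0, 0.0, 0.0, 0.0], max_width=5)  # near-greedy step
uncertain = branching_factor([0.0, 0.0, 0.0, 0.0], max_width=5)    # widest expansion
```

Under this rule a sharply peaked distribution is expanded greedily, while a flat distribution receives the full width, matching the qualitative behavior described in the abstract.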