http://arxiv.org/abs/2602.00906v7
Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing
2026-05-29T20:54:28Z
Large language models often hallucinate with high confidence on "random facts" that lack inferable patterns. We formalize the memorization of such facts as a membership testing problem, unifying the discrete error metrics of Bloom filters with the continuous log-loss of LLMs. By analyzing this problem in the regime where facts are sparse in the universe of plausible claims, we establish a rate-distortion theorem: the optimal memory efficiency is characterized by the minimum KL divergence between score distributions on facts and non-facts. This theoretical framework provides a distinctive explanation for hallucination under an idealized setting: even with optimal training, perfect data, and a simplified ``closed world'' setting, the information-theoretically optimal strategy under limited capacity is not to abstain or forget, but to assign high confidence to some non-facts, resulting in hallucination. We validate this theory empirically on both synthetic and real-world data, showing that hallucinations persist as a natural consequence of lossy compression. The same theorem recovers and sharpens classical space lower bounds for Bloom-type filters, pinning down an additive constant left open for two-sided filters.
2026-01-31T21:18:28Z
ICML 2026
Anxin Guo, Jingwei Li

http://arxiv.org/abs/2509.04631v2
Fundamental bounds on efficiency-confidence trade-off for transductive conformal prediction
2026-05-29T18:51:57Z
Transductive conformal prediction addresses the simultaneous prediction for multiple data points. Given a desired confidence level, the objective is to construct a prediction set that includes the true outcomes with the prescribed confidence.
We demonstrate a fundamental trade-off between confidence and efficiency in transductive methods, where efficiency is measured by the size of the prediction sets. Specifically, we derive a strict finite-sample bound showing that any non-trivial confidence level leads to exponential growth in prediction set size for data with inherent uncertainty. The exponent scales linearly with the number of samples and is proportional to the conditional entropy of the data. Additionally, the bound includes a second-order term, dispersion, defined as the variance of the log conditional probability distribution. We show that the transductive methods based on the approximate conditional distribution can approach this bound. Inspired by this setup, we introduce a practical transductive prediction algorithm that surpasses Bonferroni methods.
2025-09-04T19:49:58Z
Arash Behboodi, Alvaro H. C. Correia, Fabio Valerio Massoli, Christos Louizos

http://arxiv.org/abs/2605.31579v1
Functional Multi-Target Detection via Bispectrum Inversion
2026-05-29T17:47:45Z
This paper develops a functional theory for multi-target detection, where a compactly supported signal is recovered from a single noisy observation containing many unknown translations of the signal. Our formulation allows continuous, off-grid translations and correlated stationary Gaussian process noise, extending beyond the discrete, grid-aligned, white-noise models common in prior work. We analyze two uninitialized recovery algorithms based on autocorrelation analysis; in particular, both algorithms first estimate the signal's bispectrum via a debiased third-order empirical autocorrelation. The signal is then recovered from the estimated bispectrum using either a functional frequency marching scheme or a Kotlarski-type deconvolution formula. For both algorithms, we prove non-asymptotic recovery guarantees for compactly supported signals without bandlimiting assumptions.
The resulting error bounds depend on the smoothness of the signal and the accuracy of bispectrum estimation, with the latter governed by the noise characteristics and the number of signal occurrences. Numerical experiments validate our theory and demonstrate accurate recovery in low-SNR regimes.
2026-05-29T17:47:45Z
Anna Little, Daniel Sanz-Alonso, Mikhail Sweeney, Ruiyi Yang

http://arxiv.org/abs/2605.31549v1
Microwave Linear Analog Computer (MiLAC) for Simultaneous Active and Passive Beamforming
2026-05-29T17:08:33Z
Microwave linear analog computers (MiLACs) have recently emerged to enable high-performance and efficient beamforming in the analog domain. In this paper, we introduce a dual-functionality framework for MiLAC-aided transceivers. Beyond analog-domain precoding/combining (active beamforming), a MiLAC and its antenna array can simultaneously act as a reconfigurable intelligent surface (RIS) (passive beamforming). This allows the MiLAC to execute beamforming for transmission/reception while reflecting external incident signals. We provide an optimal reconfiguration strategy for this dual-functional MiLAC, and characterize the fundamental limits on the trade-off between active and passive rate, namely the capacity region bounds and the sum-rate capacity.
2026-05-29T17:08:33Z
Submitted to IEEE for publication
Matteo Nerini, Bruno Clerckx

http://arxiv.org/abs/2605.31526v1
Distributionally Robust Physical-Layer Security for Satellite Communication via Aerial Reconfigurable Intelligent Surface
2026-05-29T16:40:02Z
Satellite communications are envisioned as a key enabler for ubiquitous coverage in future 6G networks, yet the broadcast nature renders them vulnerable to eavesdropping, especially given the long-distance transmissions and associated high uncertainties. In this paper, we propose the physical layer security enhancement for multi-beam satellite communications with the assistance of an aerial reconfigurable intelligent surface (ARIS).
Considering the high dynamics and uncertainties of channels, we characterize the channel distribution with moment-based ambiguity sets. Accordingly, a distributionally robust secrecy rate optimization is formulated through joint design of transmit and reflection beamforming. We then introduce a conditional value-at-risk-based reformulation to convert the probabilistic constraints into deterministic forms. An alternating optimization framework is subsequently employed to iteratively update the transmit and reflective beamforming vectors until convergence. Simulation results demonstrate that the proposed distributionally robust scheme significantly enhances secrecy performance, and maintains reliable performance across various channel error distributions.
2026-05-29T16:40:02Z
Accepted @ IEEE TCOM
Zhaole Wang, Xiao Tang, Naijin Liu, Jinxin Liu, Qinghe Du, Lei Chen, Tingwu Lin

http://arxiv.org/abs/2605.31465v1
The Nonparametric Kiefer-Weiss Problem
2026-05-29T15:58:40Z
A nonparametric variant of the Kiefer-Weiss problem is proposed and solved. The objective is to minimize a weighted sum of the error probabilities of a binary sequential test subject to a constraint on its maximum expected sample size. This maximum is taken over all possible probability distributions on the given sequence space. First, it is shown that the nonparametric Kiefer-Weiss problem can be reduced to an optimal stopping problem. Then, the optimal stopping policy is derived under the assumption that at most k uses of randomization are permitted during any run of the test. The solution to the original problem is then obtained by letting k go to infinity. The optimal cost function is shown to be the solution of a nonlinear Bellman equation. The corresponding optimal stopping policy is shown to be based on a two-dimensional test statistic, with one component tracking the likelihood ratio and the other one tracking the expected remaining sample size.
Critically, the stopping policy uses randomization to increase the remaining expected sample size for some runs, while stopping early for others. The optimal randomization rule is shown to be determined by a function that maps the likelihood ratio to an integer-valued sample size. Two approximations of this function are proposed that can be evaluated easily in practice. The results are illustrated with two numerical examples of nonparametric Kiefer-Weiss tests, one for a shift in the success probability of a Bernoulli distribution, and one for a shift in the mean of a normal distribution.
2026-05-29T15:58:40Z
32 pages, 6 figures, 2 tables. Submitted to the Annals of Statistics
Michael Fauss, H. Vincent Poor, Abdelhak M. Zoubir

http://arxiv.org/abs/2606.12445v1
SAT, MaxSAT, and SMT for QLDPC Distance Computation: A Large-Scale Empirical Study
2026-05-29T15:34:33Z
Exact distance computation for quantum LDPC (QLDPC) codes plays a central role in validating candidate fault-tolerant quantum-code constructions, yet the computational structure of this problem remains poorly understood. Despite substantial recent progress in QLDPC design, it remains unclear which algorithmic principles govern the practical scalability of exact distance computation and which classes of exact solvers are best suited to this task. To address these questions, we conduct a systematic study of SAT- and MaxSAT-based formulations for exact QLDPC distance computation across representative codes. We further compare these formulations against several established exact-distance approaches in order to better understand the algorithmic landscape of exact QLDPC distance computation. Our study challenges and refines several prevailing intuitions about exact QLDPC distance computation. First, despite the XOR-rich structure of QLDPC parity checks, practical scalability appears to be governed more by the handling of cardinality constraints and optimization bounds than by parity reasoning alone.
Accordingly, XOR-aware reasoning does not provide a systematic advantage across our benchmark suite. Second, Brouwer-Zimmermann-style search, long regarded as the benchmark paradigm for exact distance computation in sparse classical codes, no longer maintains its traditional scalability advantage in the QLDPC setting. This finding challenges the expectation that techniques successful for sparse classical codes remain dominant for QLDPC codes. Third, substantial qualitative differences arise even among MaxSAT solvers themselves. Branch-and-bound MaxSAT significantly outperforms unsat-core-based MaxSAT on challenging benchmarks, demonstrating that solver architecture and optimization strategy play a decisive role in practical scalability.
2026-05-29T15:34:33Z
15 pages of main text and 28 pages of appendix. 3 figures
Yu-Fang Chen, Seyed Mohammad Reza Jafari, Ching-Yi Lai

http://arxiv.org/abs/2603.29400v2
Model-Based Beam-Steered Optical Wireless Positioning with Single-LED Single-Photodiode for 3D Localization
2026-05-29T15:07:20Z
State-of-the-art optical wireless positioning (OWP) commonly reaches centimeter-level accuracy by depending on dense multi-light-emitting-diode (LED) infrastructures, photodiode (PD) arrays, or image-sensor receivers, incurring hardware complexity and deployment cost. This paper introduces a single beam-steered LED, single-PD OWP architecture that achieves three-dimensional (3D) localization without receiver rotation, cameras, or PD arrays; the core idea is to steer the transmitter through K known orientations and exploit the resulting received-signal-strength variations at the PD to estimate LED-to-PD direction and distance. We derive a composite Cramer-Rao lower bound and position-error bound (PEB) for the joint observation model, and cast the steering-pattern design as a genetic algorithm that minimizes the PEB over a 3D testbed.
We develop both a model-based constrained nonlinear estimator and closed-form direction estimators: a statistically efficient generalized least squares solution, and a lightweight weighted least squares approximation. Simulations demonstrate centimeter-level accuracy for 3D OWP with a single beam-steered LED and a single PD.
2026-03-31T08:05:04Z
Kevin Acuna-Condori, Bastien Béchadergue, Hongyu Guan, Luc Chassagne

http://arxiv.org/abs/2605.31379v1
Rényi divergences and binary state discrimination error exponents for fermionic quasi-free states
2026-05-29T14:48:00Z
The trade-off relations between the two types of error probabilities in binary i.i.d. quantum state discrimination can be expressed by single-copy formulas in terms of the Petz-type and the sandwiched Rényi divergences of the two states representing the two hypotheses. In the non-i.i.d. setting, the error exponents can usually be expressed in terms of regularized Rényi divergences, which do not admit explicit formulas in general. Here, we consider a class of states, translation-invariant and gauge-invariant quasifree states on doubly infinite fermionic chains, and give explicit formulas for a wide range of regularized Rényi divergences between such states, including $(α,z)$, log-Euclidean, maximal, measured, and the recently introduced integral Rényi divergences. We show that the case where there is a single mode at each lattice site becomes asymptotically classical, with all the different types of regularized Rényi divergences being equal, while in the case of multiple modes per site, non-commutativity persists under regularization, and for any fixed $α$, the regularized Rényi $(α,z)$-divergences give different regularized values for different $z$ parameters in general.
We also generalize a previous construction from [Bunth, Maróti, Mosonyi, Zimborás, Lett.~Math.~Phys.~113:(7), 2023] to the case of multiple modes per lattice site to obtain a large class of states exhibiting super-exponential decay of the discrimination error probabilities.
2026-05-29T14:48:00Z
38 pages
Milán Mosonyi, Gábor Maróti-Zareczky

http://arxiv.org/abs/2605.31353v1
Sensing with Random Signals: The Role of Time Sharing
2026-05-29T14:28:45Z
In monostatic, decision-aided, or known-waveform integrated sensing and communications (ISAC) formulations, the sensing receiver is often modeled as knowing the transmitted waveform. This assumption is not suitable for passive, bistatic, or distributed settings where the sensing receiver knows the signaling rule but not the transmitted symbols. We study such a symbol-unaware ISAC model, where sensing is measured by the unconditioned mutual information $I(S;V)$ rather than the symbol-aware quantity $I(S;V|X)$. For discrete-input memoryless channels, we characterize the capacity-sensing region through an auxiliary time-sharing variable, showing that the optimal upper boundary is the upper concave envelope of the single-mode frontier. Thus, explicit time sharing is unnecessary when the single-mode frontier is already concave, but strictly beneficial when its upper concave envelope strictly dominates the frontier. For Rayleigh-fading BPSK, we further show that the curvature of the single-mode boundary is determined by the stochastic ordering of the communication- and sensing-side effective SNR distributions. Communication-side dominance yields a concave single-mode frontier and no time-sharing gain, sensing-side dominance yields a convex single-mode frontier and a strict time-sharing gain, and equality yields a linear boundary. The result extends to SIMO-BPSK through the ordering of post-combining SNR distributions.
These findings explain when symbol-unaware ISAC optimally moves from data-symbol transmission to pilot-like sensing modes.
2026-05-29T14:28:45Z
Yi Geng, Wenyi Zhang

http://arxiv.org/abs/2512.08376v2
A Distribution Testing Approach to Clustering Distributions
2026-05-29T13:21:46Z
We study the following distribution clustering problem: Given a hidden partition of $k$ distributions into two groups, such that the distributions within each group are the same, and the two distributions associated with the two clusters are $\varepsilon$-far in total variation, the goal is to recover the partition. We establish upper and lower bounds on the sample complexity for two fundamental cases: (1) when one of the clusters' distributions is known, and (2) when both are unknown. Our upper and lower bounds characterize the sample complexity's dependence on the domain size $n$, number of distributions $k$, size $r$ of one of the clusters, and distance $\varepsilon$. In particular, we achieve tightness with respect to $(n,k,r,\varepsilon)$ (up to an $O(\log k)$ factor) for all regimes.
2025-12-09T09:01:41Z
Gunjan Kumar, Yash Pote, Jonathan Scarlett

http://arxiv.org/abs/2602.09405v2
Is Memorization Helpful or Harmful? Prior Information Sets the Threshold
2026-05-29T13:00:24Z
We examine the connection between training error and generalization error for arbitrary estimating procedures, working in an overparameterized linear model under general priors in a Bayesian setup. We find determining factors inherent to the prior distribution $π$, giving explicit conditions under which optimal generalization necessitates that the training error be (i) near interpolating relative to the noise size (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise reaches thresholds determined by the Fisher information and the variance parameters of the prior $π$.
2026-02-10T04:35:29Z
33 pages, 3 figures.
Accepted to the Conference on Learning Theory (COLT) 2026
Chen Cheng, Rina Foygel Barber

http://arxiv.org/abs/2605.31214v1
Geometric construction of k-optimal locally repairable codes
2026-05-29T12:18:51Z
A linear code is referred to as a locally repairable code (LRC) with locality r if any erased code symbol can be recovered by accessing at most r other code symbols. LRCs are highly desirable for distributed storage systems to enhance repair efficiency. In this paper, we investigate LRCs with disjoint repair sets via the parity-check matrix method. Firstly, we propose a novel concept of the s-Pasch configuration and present a geometric characterization for the existence of LRCs with minimum distance 5 and locality 3. Subsequently, we construct k-optimal LRCs by exploiting the point-line relationship in PG(2,q). Finally, a family of q-ary k-optimal LRCs with minimum distance 6 and general locality r is constructed using partial r-spreads.
2026-05-29T12:18:51Z
Yi Fu, Xiuling Shan

http://arxiv.org/abs/2605.31209v1
q-Exponential Random Graphs: higher-order networks from simple constraints
2026-05-29T12:14:57Z
Exponential Random Graphs (ERGs) are among the most widely used network models, derived as principled least-bias graph ensembles that maximize Shannon entropy under constraints on the expected values of given structural properties. However, it has been recently (re)discovered that, in the absence of additional information privileging Shannon entropy, the most agnostic inferential construction should maximize the broader class of Uffink entropies. The resulting entropy-maximizing distribution changes from the exponential (Boltzmann-Gibbs) to the so-called q-exponential one.
Since maximizing Shannon entropy may produce an unjustified independence between degrees of freedom, here we investigate how the most popular ERGs with independent edges (namely, the Erdos-Renyi and configuration models) generalize to higher-order q-Exponential Random Graphs with dependent edges in the non-Shannon case, while keeping their defining constraints (number of links and degree sequence, respectively) unchanged. We find features, such as a phase transition between sparse and dense regimes, that are absent in the original ERGs but typical of higher-order networks, plus novel phenomena such as richer assortativity and clustering profiles, which allow for the coexistence of link sparsity and triadic closure. These results show that higher-order networks do not necessarily require higher-order constraints, as they naturally arise from simpler ones in a framework that is even more agnostic than Shannon's.
2026-05-29T12:14:57Z
35 pages, 16 figures. Complementary code to reproduce numerical experiments is available at https://github.com/DavidDobas/q-exponential-random-graphs
David Dobáš, Diego Garlaschelli, Petr Jizba

http://arxiv.org/abs/2602.04107v2
Supervised Learning as Lossy Compression: Characterizing Generalization and Sample Complexity via Finite Blocklength Analysis
2026-05-29T10:06:26Z
This paper presents a novel information-theoretic perspective on generalization in machine learning by framing the learning problem within the context of lossy compression and applying finite blocklength analysis. In our approach, the sampling of training data formally corresponds to an encoding process, and the model construction to a decoding process. By leveraging finite blocklength analysis, we derive lower bounds on sample complexity and generalization error for a fixed randomized learning algorithm and its associated optimal sampling strategy.
Our bounds explicitly characterize the degree of overfitting of the learning algorithm and the mismatch between its inductive bias and the task as distinct terms. This separation provides a significant advantage over existing frameworks. Additionally, we decompose the overfitting term to show its theoretical connection to existing metrics found in information-theoretic bounds and stability theory, unifying these perspectives under our proposed framework.
2026-02-04T00:45:01Z
40 pages, 1 figure
Kosuke Sugiyama, Masato Uchida