http://arxiv.org/abs/2605.04915v2 Optimal Error Exponents for Composite Sequential Quantum Hypothesis Testing 2026-05-10T15:58:10Z

We study the composite sequential quantum hypothesis testing (SQHT) problem, where the objective is to distinguish a null quantum state from a set of alternative quantum states. We propose a mixture-sequential quantum probability ratio test that adaptively selects measurements based on the current mixture estimate of the alternative set, and stops upon the first threshold crossing of the mixture log-likelihood ratio. Under an expected sample size constraint, we show that our proposed strategy simultaneously achieves the Type-I and (worst-case) Type-II error exponents, characterized by the minimal measured relative entropies between the null state and the alternative set. We further establish a matching converse, thereby characterizing the optimal error exponent region. Finally, our results show that achieving vanishing error probabilities in composite SQHT requires an expected sample complexity at least as large as that of sequential testing between two fixed states.
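The stopping rule described above can be illustrated with a purely classical analogue. The sketch below assumes a fixed binary measurement, so each hypothesis reduces to a Bernoulli parameter; the function name `mixture_sprt`, the prior weights, and the two thresholds are hypothetical choices for illustration, and the adaptive measurement selection of the paper is not modelled.

```python
import math

def mixture_sprt(samples, p0, alts, weights, lower, upper):
    """Classical mixture SPRT sketch (assumed Bernoulli outcomes).

    p0      : P(x = 1) under the null
    alts    : list of P(x = 1) values, one per alternative
    weights : mixture weights over the alternatives (sum to 1)
    Stops at the first time the mixture log-likelihood ratio
    leaves the interval (lower, upper).
    """
    llrs = [0.0] * len(alts)  # per-alternative log-likelihood ratios
    n = 0
    for n, x in enumerate(samples, start=1):
        den = p0 if x else 1.0 - p0
        for k, pk in enumerate(alts):
            num = pk if x else 1.0 - pk
            llrs[k] += math.log(num / den)
        # log of the weighted mixture likelihood ratio (log-sum-exp)
        m = max(llrs)
        mix = m + math.log(sum(w * math.exp(l - m) for w, l in zip(weights, llrs)))
        if mix >= upper:
            return "alternative", n
        if mix <= lower:
            return "null", n
    return "undecided", n
```

For example, `mixture_sprt([1] * 5, 0.2, [0.7, 0.9], [0.5, 0.5], -3.0, 3.0)` accumulates evidence against the null and stops as soon as the mixture statistic crosses the upper threshold.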

2026-05-06T13:44:17Z Under Review Jacob Paul Simpson Efstratios Palias Sharu Theresa Jose http://arxiv.org/abs/2605.09617v1 Symmetric Sudoku-Type Games from Perfect Codes 2026-05-10T15:56:28Z

This paper presents a novel construction method for symmetric Sudoku-type games based on Lee distance perfect codes and diameter perfect codes. The proposed method utilizes the tiling property of these codes to define the structure of the subgrid constraints of Sudoku-type games. In this way, our games inherit the symmetric properties of Sudoku. We provide a detailed analysis of two small cases: a $5 \times 5$ Sudoku in $\mathbb{Z}_5^2$, and an $8 \times 8$ Sudoku in $\mathbb{Z}_8^2$. By defining equivalence relations via rigid motions, we provide a complete enumeration of valid grids, identifying 17 inequivalent solutions for $5\times 5$ Sudoku. For two different types of $8\times 8$ Sudoku, we characterize 232,735 and 304,014 inequivalent solutions, respectively. Furthermore, to verify practical playability, we implement a human-like solver that assesses the difficulty of the generated games. The analysis confirms that our $5\times5$ Sudoku games offer a balanced distribution of difficulty levels, ranging from Easy to Hard, making them a viable alternative to traditional $9 \times 9$ Sudoku.
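The tiling idea for the $5 \times 5$ case can be sketched as follows. The code assumes the classical perfect Lee code $\{(i, 2i)\} \subset \mathbb{Z}_5^2$ with radius-1 Lee spheres as regions, and assumes that a valid grid must be a permutation of the symbols in every row, column, and region; whether the paper uses exactly this code and rule set is not stated in the abstract, and all names below are illustrative.

```python
Q = 5
# Assumed perfect Lee code in Z_5^2: codewords (i, 2i mod 5).
CODE = [(i, (2 * i) % Q) for i in range(Q)]

def lee_sphere(center, q=Q):
    """Cells at Lee distance <= 1 from the center (a plus-shaped region)."""
    cx, cy = center
    offsets = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
    return {((cx + dx) % q, (cy + dy) % q) for dx, dy in offsets}

def regions(code=CODE, q=Q):
    """Map each cell to the index of the unique sphere containing it."""
    region_of = {}
    for idx, c in enumerate(code):
        for cell in lee_sphere(c, q):
            if cell in region_of:
                raise ValueError("spheres overlap: code is not perfect")
            region_of[cell] = idx
    if len(region_of) != q * q:
        raise ValueError("spheres do not cover the grid")
    return region_of

def is_valid_grid(grid, region_of, q=Q):
    """Check the row, column, and region constraints."""
    full = set(range(q))
    rows = all(set(grid[r]) == full for r in range(q))
    cols = all({grid[r][c] for r in range(q)} == full for c in range(q))
    regs = all(
        {grid[r][c] for (r, c), k in region_of.items() if k == idx} == full
        for idx in range(q)
    )
    return rows and cols and regs
```

Under these assumptions the linear grid `[[(r + 2 * c) % 5 for c in range(5)] for r in range(5)]` passes `is_valid_grid`, while `(r + c) % 5` is a Latin square that violates the region constraint.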

2026-05-10T15:56:28Z 24 pages, 7 figures Junmin An Jae-Hyun Baek Keon-Hwi Kim Haeun Lim Jon-Lark Kim 10.1109/TG.2026.3683305 http://arxiv.org/abs/2605.09608v1 Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training 2026-05-10T15:40:44Z

Continual post-training aims to extend large language models (LLMs) with new knowledge, skills, and behaviors, yet it remains unclear when sequential updates enable capability transfer and when they cause catastrophic forgetting. Existing methods mitigate forgetting through sequential fine-tuning, replay, regularization, or model merging, but offer limited criteria for determining when incorporating new updates is beneficial or harmful. In this work, we study LLM continual post-training through three questions: What drives forgetting? When do sequentially acquired capabilities transfer or interfere? How can compatibility be used to control update integration? We address these questions through task geometry: we represent each post-training task by its parameter update and study the covariance geometry induced by the update. Our central finding is that forgetting can be viewed as a state-relative update-integration failure: it arises when the covariance geometries induced by tasks misalign with the geometry of the evolving model state. Sequential updates transfer when they remain compatible with the model state shaped by previous updates, and interfere when state-relative geometry conflict becomes high. Motivated by this finding, we propose Geometry-Conflict Wasserstein Merging (GCWM), a data-free update-integration method that constructs a shared Wasserstein metric via Gaussian Wasserstein barycenters and uses geometry conflict to gate geometry-aware correction. Across Qwen3 0.6B--14B on domain-continual and capability-continual settings, GCWM consistently outperforms data-free baselines, improving retention and final performance without replay data. These results identify geometry conflict as both an explanatory signal for forgetting and a practical control signal for LLM continual post-training.

2026-05-10T15:40:44Z Yuanyi Wang Yifan Yang Su Lu Yanggan Gu Pengkai Wang Wenjun Wang Zhaoyi Yan Congkai Xie Jianmin Wu Jialun Cao Shing-Chi Cheung Hongxia Yang http://arxiv.org/abs/2605.09561v1 Sparse Discrete Laplace and Gaussian Mechanisms under Local Differential Privacy 2026-05-10T14:24:30Z

We study sparse locally private channels of the form $M(y\mid x)\propto w(x,y) 1\{y\in S(x)\},$ where the admissible output set $S(x)$ is allowed to depend on the private input $x$ and is assumed to be small. Here, we consider the sparse discrete-Laplace family with kernel $w(x,y)=e^{-λd(x,y)}$ and the sparse Gaussian family with kernel $w(x,y)=e^{-d(x,y)^2/(2σ^2)}$. For both families we give exact characterizations of pure and approximate local differential privacy. For pure $\varepsilon$-local differential privacy, we show that input-dependent sparse supports are obtained when all supports coincide. For $(\varepsilon,δ)$-local differential privacy, we derive exact formulas for the privacy defect in terms of support leakage and excess privacy loss on the overlap region. We then specialize the analysis to radius-truncated sparse discrete-Laplace and radius-truncated sparse Gaussian mechanisms and obtain explicit privacy-sparsity tradeoffs in terms of the support size $s$. In particular, we show that nontrivial approximate local privacy requires a minimum support size, whereas larger supports reduce support leakage but increase distortion. For the Gaussian family, the overlap term exhibits an additional quadratic dependence on the support radius, which implies a sharper tradeoff between privacy and sparsity. These results identify the support cardinality as the intrinsic complexity parameter of the mechanism and yield an optimal design principle: choose the smallest support size that satisfies the target privacy constraint.
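A small numerical sketch of the channel family $M(y\mid x)\propto w(x,y)\,1\{y\in S(x)\}$ may help make the privacy defect concrete. It uses the standard characterization of the smallest $δ$ for a finite channel, $\max_{x,x'}\sum_y [M(y\mid x)-e^{\varepsilon}M(y\mid x')]_+$; the integer alphabet, the radius-truncated supports, and the function names are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def sparse_channel(xs, ys, support, weight):
    """Build M[x][y] proportional to weight(x, y) * 1{y in support(x)}."""
    M = {}
    for x in xs:
        s = support(x)
        raw = {y: (weight(x, y) if y in s else 0.0) for y in ys}
        z = sum(raw.values())
        M[x] = {y: v / z for y, v in raw.items()}
    return M

def privacy_defect(M, eps):
    """Smallest delta for which the finite channel M is (eps, delta)-LDP."""
    scale = math.exp(eps)
    worst = 0.0
    for x in M:
        for x2 in M:
            gap = sum(max(0.0, M[x][y] - scale * M[x2][y]) for y in M[x])
            worst = max(worst, gap)
    return worst

# Illustrative setup: inputs and outputs in {0, 1, 2, 3}, discrete-Laplace kernel.
XS = YS = list(range(4))
LAM = 0.5
laplace = lambda x, y: math.exp(-LAM * abs(x - y))
radius_support = lambda r: (lambda x: {y for y in YS if abs(x - y) <= r})

narrow = sparse_channel(XS, YS, radius_support(1), laplace)  # disjoint supports exist
wide = sparse_channel(XS, YS, radius_support(3), laplace)    # all supports coincide
```

With radius 1 the supports of inputs 0 and 3 are disjoint, so `privacy_defect(narrow, eps)` stays at 1 for every `eps`, which is the support-leakage effect in its most extreme form. With radius 3 all supports coincide and the defect drops to 0 once `eps` is large enough.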

2026-05-10T14:24:30Z Amirreza Zamani Sajad Daei Parastoo Sadeghi Mikael Skoglund http://arxiv.org/abs/2501.12072v3 Fault-tolerant syndrome extraction in [[n,1,3]] non-CSS code family generated using measurements on graph states 2026-05-10T11:51:59Z

The reliability of quantum computation critically depends on the performance of quantum error-correcting codes (QECCs). Performance of QECCs can be severely degraded by hook errors, which effectively reduce the code distance. In this work, we construct a family of $[[n,1,3]]$ non-CSS QECCs, which are fault-tolerant (FT) against noisy syndrome measurements. We employ the bare-ancilla method of Muyuan Li \emph{et al.} to demonstrate fault tolerance against hook errors during syndrome extraction. We present a systematic protocol for generating these QECCs using graph codes and propose a family of $[[n,1,3]]$ codes that preserve the fault-tolerant properties of the bare ancilla codes. We use a custom lookup-table decoder and simulate the code's performance under both anisotropic and circuit-level depolarizing noise. Our results reveal a trade-off in performance with respect to the code rate and identify optimized codes under these noise models. We benchmark our results against the flag-qubit method of Chao \emph{et al}. Notably, we report a new bare ancilla code with improved code rate while maintaining the same distance compared to the bare code used in the work of Muyuan Li \emph{et al.}

2025-01-21T11:55:44Z 17 pages, 10 figures Harsh Gupta Mainak Bhattacharyya Ritik Jain Ankur Raina http://arxiv.org/abs/2604.06822v2 Non-RS type cyclic MDS codes over finite fields via cyclotomic field reduction 2026-05-10T08:28:46Z

Cyclic maximum distance separable (MDS for short) codes are a special subclass of linear codes and have received a lot of attention, as these codes have important applications in many areas including quantum codes, designs and finite geometry. However, the existing construction methods for cyclic MDS codes mainly impose strict restrictions on certain parameters or are relatively complex. In this paper, we investigate the construction of such codes via norm reduction in cyclotomic fields. By converting the verification of the MDS property over a finite field into checking non-zero minors in characteristic zero, we propose a construction method of cyclic MDS codes over finite fields via cyclotomic field reduction. Based on this method, we obtain several cyclic MDS codes over finite fields, many of which are of non-RS type. Compared with the existing construction methods, our method is simpler. Moreover, the results of this paper show that the parameters of the obtained non-RS cyclic MDS codes are flexible.

2026-04-08T08:40:25Z Can Xiang Chunming Tang http://arxiv.org/abs/2605.09396v1 Universal Feature Selection with Noisy Observations and Weak Symmetry Conditions 2026-05-10T07:49:12Z

This paper relaxes the restrictive symmetry conditions adopted in [4], [5] and extends their universal feature selection framework to accommodate noisy observations as well as attribute structures that may exhibit directional preferences. We introduce the notion of weak spherical symmetry, quantified by second-moment distances, which allows controlled deviations from rotational invariance. Under this relaxed condition, we develop a universal feature selection framework based on the singular value decomposition of the canonical dependence matrix computed from noisy data. Our main result shows that the selected features achieve asymptotically optimal error exponents up to a residual term that depends on the symmetry deviation $δ$ and the noise levels $η_1, η_2$. When $δ, η_1, η_2$ are relatively small, our result recovers that of [5], thereby demonstrating that exact spherical symmetry is unnecessary. Overall, our findings highlight the robustness of the selection framework against second-moment deviations and observation noise, thereby broadening its applicability across diverse inference tasks and providing a theoretically grounded tool for universal feature selection in practical scenarios.

2026-05-10T07:49:12Z 6 pages, 0 figures. This work has been submitted to the 2026 IEEE Information Theory Workshop (ITW) for possible publication Dier Tang Department of Mathematics, The University of Hong Kong, Hong Kong, China Guangyue Han Department of Mathematics, The University of Hong Kong, Hong Kong, China http://arxiv.org/abs/2605.09353v1 Covert Capacity of Degraded Broadcast Channels 2026-05-10T06:06:54Z

We derive the capacity region of the degraded broadcast channel (DBC) subject to the constraint that the communication is not detected by an adversary, the Warden. Our capacity result is in a computable form and numerical results show that time-sharing is suboptimal in general, and improved rates can be obtained through superposition coding.

2026-05-10T06:06:54Z Yossef Steinberg Michèle Wigger http://arxiv.org/abs/1708.05468v3 Information-Theoretic Privacy with General Distortion Constraints 2026-05-10T01:24:33Z

The privacy-utility tradeoff problem is formulated as determining the privacy mechanism (random mapping) that minimizes the mutual information (a metric for privacy leakage) between the private features of the original dataset and a released version. The minimization is studied with two types of constraints on the distortion between the public features and the released version of the dataset: (i) subject to a constraint on the expected value of a cost function $f$ applied to the distortion, and (ii) subject to bounding the complementary CDF of the distortion by a non-increasing function $g$. The first scenario captures various practical cost functions for distorted released data, while the second scenario covers large deviation constraints on utility. The asymptotic optimal leakage is derived in both scenarios. For the distortion cost constraint, it is shown that for convex cost functions there is no asymptotic loss in using stationary memoryless mechanisms. For the complementary CDF bound on distortion, the asymptotic leakage is derived for general mechanisms and shown to be the integral of the single letter leakage function with respect to the Lebesgue--Stieltjes measure defined based on the refined bound on distortion. However, it is shown that memoryless mechanisms are generally suboptimal in both cases.

2017-08-17T23:56:01Z Kousha Kalantari Oliver Kosut Lalitha Sankar http://arxiv.org/abs/2605.09214v1 Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability 2026-05-09T23:17:46Z

\emph{Kullback-Leibler} (KL) regularization is ubiquitous in reinforcement learning algorithms in the form of \emph{reverse} or \emph{forward} KL. Recent studies have demonstrated $ε^{-1}$-type fast rates for decision making under reverse KL regularization, in contrast to the standard $ε^{-2}$-type sample complexity. However, for forward-KL-regularized objectives, existing statistical analyses are either not applicable or result in $\tilde{O}(ε^{-2})$ slow rates. We take the first step towards addressing this problem via a streamlined analysis of forward-KL-regularized offline CBs. We give the first $\tilde{O}(ε^{-1})$ upper bounds in tabular and general function approximation settings, both under notions of \emph{single-policy concentrability}. In particular, our convex-analytical pipeline unifies these settings by exploiting the pessimism principle in a novel way and completely bypasses the proof routines in previous works based on the mean value theorem, which might be of independent interest. Moreover, we provide rate-optimal lower bounds, manifesting the tightness of our upper bounds in terms of statistical rates. Our lower bounds also demonstrate that the forward-KL-regularized sample complexity recovers the unregularized slow rate in the low-regularization regime, similarly to the reverse-KL regularization.

2026-05-09T23:17:46Z 31 pages, comments are welcome Qingyue Zhao Kaixuan Ji Heyang Zhao Quanquan Gu http://arxiv.org/abs/2605.03184v2 Single-Period Portfolio Selection via Information Projection 2026-05-09T20:52:19Z

We study the single-period portfolio selection problem under Constant Relative Risk-Aversion (CRRA) utility through the information-theoretic lens. Assuming only that the market payoff vector has finite support, we show that the Certainty-Equivalent (CE) growth rate under CRRA utility can be decomposed into a portfolio-induced Rényi divergence term, a Rényi entropy term of the risk-tilted market law, and a log-partition term. In this setting, the Rényi order has a clear operational meaning: it exactly coincides with the investor's coefficient of relative risk aversion. We further show that CRRA portfolio selection is equivalent to a Rényi information-projection problem. Using a variational representation of Rényi divergence, we obtain a Blahut-Arimoto-style alternating optimization with a closed-form auxiliary update and a KL-type portfolio step. In the low risk-aversion regime, this method empirically requires fewer iterations than both direct CRRA utility optimization and Cover's method.
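As a concrete reference point for the objective, the sketch below evaluates a certainty-equivalent growth rate for a finite-support market. It assumes the standard CRRA certainty equivalent, $\mathrm{CE} = (\mathbb{E}[W^{1-γ}])^{1/(1-γ)}$ with $W$ the portfolio payoff, and takes the growth rate to be $\log \mathrm{CE}$ (with the $γ = 1$ case given by $\mathbb{E}[\log W]$); the paper's exact normalization and its Rényi decomposition are not reproduced here, and the function name is hypothetical.

```python
import math

def ce_growth_rate(portfolio, payoffs, probs, gamma):
    """log of the CRRA certainty equivalent for a finite-support market.

    portfolio : fractions of wealth per asset (non-negative, sum to 1)
    payoffs   : list of payoff vectors, one per market outcome
    probs     : outcome probabilities
    gamma     : coefficient of relative risk aversion
    """
    wealth = [sum(b * x for b, x in zip(portfolio, payoff)) for payoff in payoffs]
    if gamma == 1.0:
        # Logarithmic utility: the limit of the CRRA family as gamma -> 1.
        return sum(p * math.log(w) for p, w in zip(probs, wealth))
    moment = sum(p * w ** (1.0 - gamma) for p, w in zip(probs, wealth))
    return math.log(moment) / (1.0 - gamma)
```

For a risky asset that doubles or halves with equal probability, held alongside cash, the all-risky portfolio has a positive rate at `gamma = 0`, a zero rate at `gamma = 1`, and a negative rate at `gamma = 2`, which shows how the risk-aversion coefficient changes the ranking of portfolios.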

2026-05-04T21:52:58Z Submitted to IEEE ITW 2026 Bo-Yu Yang Michael Gastpar http://arxiv.org/abs/2511.14849v3 Channel Coding for Gaussian Channels with Multifaceted Power Constraints 2026-05-09T19:37:12Z

Through refined asymptotic analysis based on the normal approximation, we study how higher-order coding performance depends on the mean power as well as on finer statistics of the input power. We introduce a multifaceted power model in which the expectation of an arbitrary (but finite) number of arbitrary functions of the normalized average power is constrained. The framework generalizes existing models, recovering the standard maximal and expected power constraints and the recent mean and variance constraint as special cases. Under certain growth and continuity assumptions on the functions, our main theorem gives an exact characterization of the minimum average error probability for Gaussian channels as a function of the first- and second-order coding rates. The converse proof reduces the code design problem to minimization over a compact (under the Prokhorov metric) set of probability distributions, characterizes the extreme points of this set, and invokes Bauer's maximum principle. Our results for the multifaceted power model serve as more precise benchmarks for practical modulation schemes with multiple amplitude levels, probabilistic shaping and nonuniform constellation geometries.
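For orientation, the classical special case that the multifaceted model recovers can be evaluated directly. The sketch below assumes the well-known normal approximation for the Gaussian channel under a maximal power constraint, $R \approx C - \sqrt{V/n}\,Q^{-1}(ε)$ with $C = \tfrac12\log(1+P)$ and $V = \frac{P(P+2)}{2(1+P)^2}$ in nats; it omits the lower-order terms and does not implement the paper's general characterization. The function name is illustrative.

```python
import math
from statistics import NormalDist

def awgn_normal_approx_rate(snr, n, eps):
    """Approximate achievable rate (nats per channel use) at blocklength n
    and average error probability eps, maximal power constraint assumed."""
    capacity = 0.5 * math.log1p(snr)
    dispersion = snr * (snr + 2.0) / (2.0 * (1.0 + snr) ** 2)
    q_inv = NormalDist().inv_cdf(1.0 - eps)  # Q^{-1}(eps)
    return capacity - math.sqrt(dispersion / n) * q_inv
```

At `snr = 1.0` and `eps = 1e-3`, increasing `n` from 1000 to 10000 moves the approximate rate closer to capacity, since the backoff term shrinks like $1/\sqrt{n}$.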

2025-11-18T19:04:06Z Adeel Mahmood Aaron B. Wagner 10.1109/TIT.2026.3690380 http://arxiv.org/abs/2605.09133v1 On Conservative Statistical Riemann Surfaces 2026-05-09T19:30:11Z

We establish a correspondence between information geometry and gauge theory. First, we define an important class of statistical manifolds, namely those that are normalized and satisfy a conservation field equation. Second, we prove that for a conservative statistical structure on an orientable surface, the Chebyshev 1-form is constrained to be harmonic, and the traceless part of the Amari--Chentsov tensor descends to a holomorphic cubic differential. Then, we demonstrate that normalized conservative statistical structures are geometrically generated by solutions to the scalar Tzitzéica equation on Higgs bundles with general linear holonomy, generalizing the Labourie-Loftin correspondence. Finally, we prove that the moduli space of normalized conservative statistical structures on a closed orientable surface of genus at least 2 is completely parameterized by a holomorphic vector bundle over the Teichmüller space, consisting of Abelian differentials and cubic differentials.

2026-05-09T19:30:11Z 10 pages, 0 figure Hanwen Liu http://arxiv.org/abs/2605.09121v1 A Communication-Theoretic Framework for LLM Agents: Cost-Aware Adaptive Reliability 2026-05-09T19:14:33Z

Agents built on large language models (LLMs) rely on a range of reliability techniques, including retry, majority voting, and self-consistency, that have been developed in parallel rather than within a common analytical framework. We observe that an LLM sampled at temperature $T$ is a discrete stochastic channel $p(y \mid x)$ in the sense of Shannon's coding theory, and use this identity as the entry point for such a framework grounded in communication theory. Each of these techniques is a special case of one of six classical reliability operators: diversity combining, hybrid retransmission, iterative generator-critic decoding, rateless sampling, structured redundant verification, and difficulty-adaptive routing. Within the framework we give two closed-form results: a noise-variance threshold above which uniform averaging beats quality-weighted averaging, and a contractivity criterion for generator-critic refinement, consistent with a contractive-to-divergent transition we observe between 3B- and 14B-parameter models. We further introduce a cost-aware semantic-nearest-neighbor router whose single Lagrangian knob traverses the quality-cost frontier without retraining. Across six channel configurations spanning local and cloud models on 69 hard tasks, no fixed model-technique-budget choice dominates, motivating per-task allocation. On a 300-item hard split of MMLU, GSM8K, and HumanEval, our router occupies the full empirical Pareto frontier: at matched quality, its normalized cost is ${\approx}56$\% lower than the strongest fixed technique; at matched normalized cost, it improves quality by ${\approx}7$\% ($26$\% over single-shot decoding). These results argue for consolidating these reliability techniques into a single tunable layer informed by channel coding.
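The channel view of majority voting can be made concrete with a repetition-code calculation. The sketch assumes a binary task in which each of $n$ samples is independently correct with probability $p$, a simplification that is not claimed by the abstract; under it, majority voting behaves like decoding a repetition code over a binary symmetric channel. The function name is hypothetical.

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n independent samples is correct.

    Assumes a binary answer space and odd n, so that ties cannot occur.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))
```

For instance, a per-sample accuracy of 0.6 rises to 0.648 with three votes, while a per-sample accuracy of 0.5 gains nothing from voting, which mirrors the fact that repetition helps only when the channel carries some information.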

2026-05-09T19:14:33Z Hamed Omidvar Vahideh Akhlaghi http://arxiv.org/abs/2603.08308v3 Weighted Chernoff information and optimal loss exponent in context-sensitive hypothesis testing 2026-05-09T17:52:18Z

We study binary hypothesis testing for i.i.d. observations under a multiplicative context weight. For the optimal weighted total loss, defined as the sum of weighted type-I and type-II losses, we prove the logarithmic asymptotic $$ L_n^* = \exp\{-n D_C^{\mathrm{w}}(\mathbb{P}, \mathbb{Q}) + o(n)\}, \quad n \to \infty, $$ where $D_C^{\mathrm{w}}$ is the weighted Chernoff information. The single-letter form of the exponent relies on a structural assumption that the weight factorises across observations, $\varphi(x_1^n) = \prod_{i=1}^n \varphi(x_i)$; this restriction is essential for the single-letter representation and should be distinguished from the weaker qualitative description "multiplicative context weight". The proof embeds the weighted geometric mixtures $\varphi p^αq^{1-α}$ into a likelihood-ratio exponential family and identifies the rate through its log-normaliser. We also derive concentration bounds for the tilted weighted log-likelihood, obtain closed forms for Gaussian, Poisson, and exponential models, and extend the exponent characterisation to finitely many hypotheses.
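A numerical sketch of the exponent may be useful. It assumes, based on the description of the log-normaliser above, that the weighted Chernoff information takes the form $D_C^{\mathrm{w}} = -\min_{α\in[0,1]} \log \sum_x \varphi(x)\,p(x)^{α} q(x)^{1-α}$ for finite alphabets with strictly positive $p$, $q$, and $\varphi$; the precise definition and the range of $α$ in the paper may differ. With $\varphi \equiv 1$ this reduces to the classical Chernoff information. The function name is illustrative.

```python
import math

def weighted_chernoff(p, q, phi, iters=200):
    """Return (D, alpha): the assumed weighted Chernoff information and its minimizer.

    The log-normaliser is a log-sum-exp of functions affine in alpha, hence
    convex, so a ternary search on [0, 1] locates its minimum.
    """
    def log_norm(alpha):
        return math.log(sum(f * a**alpha * b ** (1.0 - alpha) for f, a, b in zip(phi, p, q)))

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_norm(m1) < log_norm(m2):
            hi = m2
        else:
            lo = m1
    alpha = 0.5 * (lo + hi)
    return -log_norm(alpha), alpha
```

For the symmetric pair `p = (0.9, 0.1)`, `q = (0.1, 0.9)` with unit weights, the minimizer is `alpha = 0.5` and the value is $-\log 0.6$, matching the classical Chernoff information.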

2026-03-09T12:31:03Z 30 pages, 3 figures, 1 table (2026) Entropy, 28(5), 536 Mark Kelbert El'mira Yu. Kalimulina 10.3390/e28050536