http://arxiv.org/abs/2605.12084v1
Learning What Matters: Adaptive Information-Theoretic Objectives for Robot Exploration
Updated: 2026-05-12T13:07:27Z
Designing learnable information-theoretic objectives for robot exploration remains challenging. Such objectives aim to guide exploration toward data that reduces uncertainty in model parameters, yet it is often unclear what information the collected data can actually reveal. Although reinforcement learning (RL) can optimize a given objective, constructing objectives that reflect parametric learnability is difficult in high-dimensional robotic systems. Many parameter directions are weakly observable or unidentifiable, and even when identifiable directions are selected, omitted directions can still influence exploration and distort information measures. To address this challenge, we propose Quasi-Optimal Experimental Design (QOED), an adaptive information objective grounded in optimal experimental design. QOED (i) performs eigenspace analysis of the Fisher information matrix to identify an observable subspace and select identifiable parameter directions, and (ii) modifies the exploration objective to emphasize these directions while suppressing nuisance effects from non-critical parameters. Under bounded nuisance influence and limited coupling between critical and nuisance directions, QOED provides a constant-factor approximation to the ideal information objective that explores all parameters. We evaluate QOED on simulated and real-world navigation and manipulation tasks, where identifiable-direction selection and nuisance suppression yield performance improvements of 35.23% and 21.98%, respectively.
When integrated as an exploration objective in model-based policy optimization, QOED further improves policy performance over established RL baselines.
Published: 2026-05-12T13:07:27Z
Authors: Youwei Yu, Jionghao Wang, Zhengming Yu, Wenping Wang, Lantao Liu

http://arxiv.org/abs/2605.12080v1
On Capacity and Delay of Wireless Networks with Node Failures
Updated: 2026-05-12T13:06:00Z
One key challenge in designing resilient large-scale wireless ad hoc networks is to understand how random node failures affect fundamental network performance. In this work, we show that both network capacity and delay scale as $Θ\left(\sqrt{\frac{n(1-q)}{\log n}}\right)$, where $n$ is the total number of nodes and $q$ is the node failure probability. The network capacity degenerates to the classical result given by P. Gupta and P. R. Kumar when $q=0$. Based on these results, we find that even with the same number of non-faulty nodes, a network with $n$ nodes and node failure probability $q$ has lower network capacity than a failure-free network with $n(1-q)$ nodes. To compensate for the network capacity loss caused by random node failures, at least $ε(n,q) nq$ redundant nodes are required, where $ε(n,q)>1$. We further prove that the optimal trade-off between network capacity and delay remains $O(1)$ regardless of node failures, implying that high network capacity and low delay cannot be achieved simultaneously. These results demonstrate robustness against stochastic variations in wireless channels.
Published: 2026-05-12T13:06:00Z
Authors: Wei Li, Min Sheng, Junyu Liu, Jiandong Li

http://arxiv.org/abs/2605.12063v1
Memory Constrained Adversarial Hypothesis Testing
Updated: 2026-05-12T12:49:16Z
We study adversarial binary hypothesis testing under memory constraints. The test is a time-invariant randomized finite state machine (FSM) with S states. Associated with each hypothesis is a set of distributions.
Given the hypothesis, the distribution of each sample is chosen from the set associated with the hypothesis by an adversary who has access to past samples and the history of states of the FSM so far. We obtain upper and lower bounds on the minimax asymptotic probability of error as a function of S. The bounds have the same exponential behaviour in S and match for a class of problems.
Published: 2026-05-12T12:49:16Z
Authors: Malhar A. Managoli, Vinod M. Prabhakaran

http://arxiv.org/abs/2605.12049v1
Scaling Laws and Tradeoffs in Recurrent Networks of Expressive Neurons
Updated: 2026-05-12T12:29:33Z
Cortical neurons are complex, multi-timescale processors wired into recurrent circuits, shaped by long evolutionary pressure under stringent biological constraints. Mainstream machine learning, by contrast, predominantly builds models from extremely simple units, a default inherited from early neural-network theory. We treat this as a normative architectural question. How should one split a fixed parameter budget $P$ between the number of units $N$, per-unit effective complexity $k_e$, and per-unit connectivity $k_c$? What controls the optimal allocation? This calls for a model in which per-unit complexity can be tuned independently of width and connectivity. Accordingly, we introduce the ELM Network, whose recurrent layer is built from Expressive Leaky Memory (ELM) neurons, chosen to mirror functional components of cortical neurons. The architecture allows for individually adjusting $N$, $k_e$, and $k_c$ and trains stably across orders of magnitude in scale. We evaluate the model on two qualitatively different sequence benchmarks: the neuromorphic SHD-Adding task and Enwik8 character-level language modeling. Performance improves monotonically along each of the three axes individually. Under a fixed budget, a clear non-trivial optimum emerges in their tradeoff, and larger budgets favor both more and more complex neurons.
A closed-form information-theoretic model captures these tradeoffs and attributes the diminishing returns at the two ends to per-neuron signal-to-noise saturation and across-neuron redundancy. A hyperparameter sweep spanning three orders of magnitude in trainable parameters traces a near-Pareto-frontier scaling law consistent with the framework. This suggests that the simple-unit default in ML is not obviously optimal once this tradeoff surface is probed, and offers a normative lens on cortex's reliance on complex spatio-temporal integrators.
Published: 2026-05-12T12:29:33Z
Comment: 25 pages, 21 figures, 3 tables, including derivations. Submitted for peer review
Authors: Aaron Spieler, Georg Martius, Anna Levina

http://arxiv.org/abs/2605.12001v1
CR^2: Cost-Aware Risk-Controlled Routing for Wireless Device-Edge LLM Inference
Updated: 2026-05-12T11:50:15Z
As large language models (LLMs) move from centralized clouds to mobile edge environments, efficient serving must balance latency, energy consumption, and accuracy under constrained device-edge resources. Query-level routing between lightweight on-device models and stronger edge models provides a flexible mechanism to navigate this trade-off. However, existing routers are designed for centralized cloud settings and optimize token-level costs, failing to capture the dynamic latency and energy overheads in wireless edge deployments. In this paper, we formulate mobile edge LLM routing as a deployment-constrained, cost-aware decision problem, and propose CR^2, a two-stage device-edge routing framework. CR^2 decouples a lightweight on-device margin gate from an edge-side utility selector for deferred queries. The margin gate operates on frozen query embeddings and a user-specified cost weight to predict whether local execution is utility-optimal relative to the best edge alternative under the target operating point.
We further introduce a conformal risk control (CRC) calibration procedure that maps each operating point to an acceptance threshold, enabling explicit control of the marginal false-acceptance risk under the full-information utility reference. Experiments on the routing task show that CR^2 closely matches a full-information reference router using only device-side signals before deferral. Compared with strong query-level baselines, CR^2 consistently improves the deployable accuracy-cost Pareto frontier and reduces normalized deployment cost by up to 16.9% at matched accuracy.
Published: 2026-05-12T11:50:15Z
Comment: submitted to IEEE Journal
Authors: Nan Xue, Shengkang Chen, Zhiyong Chen, Jiangchao Yao, Yaping Sun, Zixia Hu, Meixia Tao

http://arxiv.org/abs/2605.11998v1
From Submodularity to Matrix Determinants: Strengthening Han's, Szász's, and Fischer's Inequalities
Updated: 2026-05-12T11:47:34Z
Dembo, Cover, and Thomas (1991) developed an elegant information-theoretic framework for proving determinantal inequalities for positive definite matrices, which relies on the structural inequalities of differential entropy. Submodular functions, which subsume entropy, inherently satisfy these structural inequalities because they obey generalized forms of the fundamental properties of entropy -- a chain rule and the property that conditioning reduces the function's value (under an appropriate definition of conditioning). Applying subadditivity, Han's inequality (1978), and partition subadditivity (i.e., subadditivity over a partition) yields Hadamard's, Szász's, and Fischer's inequalities, respectively. Furthermore, this framework recovers Ky Fan's inequality (1955), a strengthening of Hadamard's inequality. This improvement fundamentally arises because conditional subadditivity yields a tighter upper bound on the joint entropy than the one obtained via unconditional subadditivity.
In this paper, we establish conditional strengthenings of Han's inequality and partition subadditivity in the general setting of submodular functions. We derive equality conditions for these strengthened bounds and characterize when they strictly improve their unconditional counterparts. We specialize these results to differential entropy and apply them to establish strengthened versions of Szász's and Fischer's inequalities. The strengthening of Szász's inequality recovers Ky Fan's inequality as a special case, and is strictly stronger than the classical Szász's inequality for any non-diagonal positive definite matrix. We also derive an inequality concerning eigenvalues, which generalizes and strictly strengthens a corresponding eigenvalue inequality of Ky Fan. We provide numerical examples to explicitly illustrate the tightness of our proposed matrix determinantal bounds.
Published: 2026-05-12T11:47:34Z
Comment: 12 pages
Authors: Gunank Jakhar, Gowtham R. Kurri, Suryajith Chillara, Vinod M. Prabhakaran

http://arxiv.org/abs/2605.11912v1
Constacyclic codes of length $np^s$ over $\frac{\mathbb{F}_{p^m}[u]}{\langle u^t\rangle}$: Torsions and Cardinalities
Updated: 2026-05-12T10:26:33Z
The purpose of this article is to study constacyclic codes of length $np^s$ over $R^t:=\frac{\mathbb{F}_{p^m}[u]}{\langle u^t \rangle },$ where $t$ is a natural number and $\gcd(n,p)=1$. We give generators of all the ideals of $R^{t,n}_δ:=\frac{R^t[x]}{\langle x^{np^s}-δ\rangle},$ where $δ= δ_0+uδ_1+\dots+u^{t-1}δ_{t-1}$ is a unit in $R^t$. For $n=1,\ 2, \ 3$ and $t=3$, we provide all types of ideals (constacyclic codes) and also give the torsional degrees as well as cardinalities of these codes.
Published: 2026-05-12T10:26:33Z
Authors: Akanksha Tiwari, Pramod Kanwar, Ritumoni Sarma

http://arxiv.org/abs/2605.07624v3
Kolmogorov--Nagumo Mean Frameworks for Conditional Entropy
Updated: 2026-05-12T10:03:58Z
This study focuses on conditional entropy frameworks based on the Kolmogorov--Nagumo (KN) mean.
First, $(η, ψ)$-KN averaging (\texttt{EPKNAVG}), a KN-mean extension of the $η$-averaging (\texttt{EAVG}) framework for $(η, F)$-entropies, is introduced and proven to be equivalent to \texttt{EAVG} under suitable concavification conditions. Second, motivated by generalized $g$-vulnerability, a new framework is proposed for generalized $g$-conditional entropies. This framework captures conditional entropies beyond the scope of \texttt{EAVG}-type representations. In particular, it is shown that there exists an $α$ and a joint probability distribution $p_{X, Y}$ such that the Augustin--Csiszár conditional entropy $H_α^{\mathrm{C}}(X|Y)$ cannot be represented by any $(η,F)$-entropy satisfying \texttt{EAVG}. In contrast, it is represented within the proposed framework. Furthermore, sufficient conditions are derived under which the proposed generalized $g$-conditional entropies satisfy the conditioning reduces entropy property and the data-processing inequality.
Published: 2026-05-08T11:53:49Z
Authors: Akira Kamatsuka, Takahiro Yoshida

http://arxiv.org/abs/2605.11870v1
Information theoretic underpinning of self-supervised learning by clustering
Updated: 2026-05-12T09:50:11Z
Self-supervised learning (SSL) is recognized as an essential tool for building foundation models for Artificial Intelligence applications. The advances in SSL have been made thanks to vigorous arguments about the principles of SSL and through extensive empirical research. The aim of this paper is to contribute to the development of the underpinning theory of SSL, focusing on the deep clustering approach. By analogy to supervised learning, we formulate SSL as K-L divergence optimization.
Mode collapse is prevented by imposing an optimisation constraint on the teacher distribution. This leads to normalization using inverse cluster priors. We show that, using Jensen's inequality, this normalization simplifies to the popular batch centering procedure. Distillation and centering are common heuristics-based practices in SSL, but our work underpins them theoretically. The theoretical model developed not only supports specific existing successful SSL methods, but also suggests directions for future investigations.
Published: 2026-05-12T09:50:11Z
Authors: Josef Kittler, Sara Atito, Muhammad Awais

http://arxiv.org/abs/2605.11831v1
Maximum Entropy of Sums of Independent Ternary Random Variables
Updated: 2026-05-12T09:21:09Z
The classical problem of maximizing the Shannon entropy of a sum of independent random variables supported on a finite alphabet is considered and settled in the ternary case. Namely, the following theorem is established: if \(X_1,\ldots,X_n\) are independent random variables taking values in \(\{0,1,2\}\), then the entropy of \(S_n=X_1+\cdots+X_n\) is maximized when \(X_1,\ldots,X_{n-1}\) are uniform on \(\{0,2\}\) and the probability mass function of \(X_n\) is given by \(\Pr(X_n=0) = \Pr(X_n=2) = w/2\), \(\Pr(X_n=1) = 1-w\), where \(w = \big(1 + 2^{-H(B_n)+H(B_{n-1})}\big)^{-1}\) and \(B_m\sim \mathrm{Bin}(m,1/2)\). The statement can be seen as an extension to ternary alphabets of the Shepp--Olkin--Mateev theorem. The proof uses the Hermite--Biehler theorem, Newton's inequalities, and Yu's maximum-entropy theorem for ultra-log-concave distributions.
Published: 2026-05-12T09:21:09Z
Comment: 12 pages, 1 figure
Journal reference: Stat. Probab. Lett., vol. 238, article no. 110867, 2026
Authors: Mladen Kovačević
DOI: 10.1016/j.spl.2026.110867

http://arxiv.org/abs/2605.11810v1
Empirical coordination in the finite blocklength regime: an achievability result---Extended version
Updated: 2026-05-12T09:04:28Z
Empirical coordination offers a way to understand how agents can coordinate actions under communication constraints.
This paper investigates the finite blocklength regime of this problem, where the encoder and decoder aim to produce a sequence of action pairs that is jointly typical with respect to a target distribution. Adopting Shannon's random coding argument and leveraging the method of types, we analyze the average performance of a random codebook to establish an achievability result. The resulting bound on the optimal rate is presented both in exact form and as an asymptotic expansion, aligning with the prevailing characterizations in the finite blocklength literature. This work extends finite blocklength analysis to the empirical coordination setting, complementing existing results on strong coordination.
Published: 2026-05-12T09:04:28Z
Comment: Extended version of a submission to ITW 2026
Authors: Olivier Massicot, Giulia Cervia, Maël Le Treust

http://arxiv.org/abs/2605.03631v2
Design and Analysis of Quantum Dual-Containing CSS LDPC Codes based on Quasi-Dyadic Matrices
Updated: 2026-05-12T08:51:44Z
Building scalable quantum computers requires quantum error-correcting codes that enable reliable operations in the presence of noise. Motivated by this need, this paper introduces two constructions of high-rate, quantum dual-containing (DC) Calderbank-Shor-Steane (CSS) low-density parity-check (LDPC) codes based on quasi-dyadic matrices. Their DC structure enables the transversal implementation of the Hadamard gate and, together with the sparsity of their parity-check matrices, enables low-complexity decoding via a standard binary belief-propagation algorithm. We provide several theoretical results concerning the cycle properties of these CSS codes. We also investigate their automorphism groups as well as their minimum distance.
Furthermore, through numerical simulations, we show that the quantum CSS LDPC codes obtained through these constructions achieve better finite-length error rate performance than existing DC codes across different block lengths and code rates.
Published: 2026-05-05T11:00:37Z
Comment: 14 pages, Journal paper
Authors: Alessio Baldelli, Marco Baldi, Massimo Battaglioni, Franco Chiaraluce, Paolo Santini

http://arxiv.org/abs/2511.01202v5
Forget BIT, It is All about TOKEN: Towards Semantic Information Theory for LLMs
Updated: 2026-05-12T07:58:24Z
Despite the empirical successes of Large Language Models (LLMs), the prevailing paradigm is heuristic and experiment-driven, tethered to massive compute and data, while a first-principles theory remains absent. This treatise develops a Semantic Information Theory at the confluence of statistical physics, signal processing, and classical information theory, organized around a single paradigm shift: replacing the classical BIT - a microscopic substrate devoid of semantic content - with the macroscopic TOKEN as the atomic carrier of meaning and reasoning. Within this framework we recast attention and the Transformer as energy-based models, and interpret semantic embedding as vectorization on the semantic manifold. Modeling the LLM as a stateful channel with feedback, we adopt Massey's directed information as the native causal measure of autoregressive generation, from which we derive a directed rate-distortion function for pre-training, a directed rate-reward function for RL-based post-training, and a sub-martingale account of inference-time semantic information flow.
This machinery makes precise the identification of next-token prediction with Granger causal inference, and sharpens the limits of LLM reasoning against Pearl's Ladder of Causation - affirming that whereas the BIT defined the Information Epoch, the TOKEN will define the AI Epoch.
Published: 2025-11-03T03:56:34Z
Authors: Bo Bai

http://arxiv.org/abs/2605.12564v1
Multiport Antenna Q-factor
Updated: 2026-05-12T07:46:21Z
This article proposes an estimate of multiport antenna bandwidth based on a generalization of a single-port Q-factor. The explicit derivation is based on converting the stored energy matrix to its port equivalent and on the port parameters themselves. The work discusses the bandwidth dependencies on feeding and matching. Derived formulas are shown to utilize the total active reflection coefficient and allow for a single-frequency bandwidth evaluation. Examples comprising two different dipole arrays and electrically large patch antenna arrays validate the theory.
Published: 2026-05-12T07:46:21Z
Authors: Vojtech Neuman, Miloslav Capek, Lukas Jelinek

http://arxiv.org/abs/2605.11657v1
Stepped Frequency Division Multiplexing: A Jump-Free Continuous-Time AFDM Waveform
Updated: 2026-05-12T07:20:03Z
Affine frequency division multiplexing (AFDM) has emerged as a promising modulation scheme for doubly selective channels, but its canonical continuous-time realization, referred to herein as piecewise continuous AFDM (PC-AFDM), has been observed to exhibit high out-of-band emission (OOBE) whose mechanism has not been analytically characterized. This paper shows that the underlying cause is frequency wrapping, which introduces internal envelope jumps between AFDM sampling instants and generates a high-frequency spectral tail distinct from ordinary block truncation. To eliminate these discontinuities without altering the inverse discrete affine Fourier transform (IDAFT) output sequence, we propose stepped frequency division multiplexing (SFDM).
In SFDM, the instantaneous frequency is kept constant at the midpoint of the wrapped chirp within each sampling interval, while the phase is continuously accumulated across interval boundaries. We prove that, under continuous phase accumulation and without additional phase correction, the midpoint choice is the unique sample-preserving choice for an arbitrary chirp-rate parameter. The resulting waveform is continuous within each AFDM block, reduces OOBE, and preserves the standard AFDM modulation matrix, guard-interval structure, and receiver processing. Moreover, under fractional-delay propagation, SFDM mitigates the receiver sensitivity that arises when delayed sampling points fall near wrapping-induced discontinuities in PC-AFDM. Numerical results verify the theoretical tail coefficients, demonstrate OOBE reduction, and show improved receiver robustness in the high-percentile and worst-case regimes. These findings establish SFDM as a spectrally cleaner and more reliable physical layer for AFDM systems.
Published: 2026-05-12T07:20:03Z
Authors: Yewen Cao, Yulin Shao
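
Illustrative note on the last entry (2605.11657): the midpoint rule mentioned in the SFDM abstract can be motivated with a toy case. The sketch below is not the authors' SFDM construction. It assumes a single, unwrapped linear chirp with instantaneous frequency f0 + mu*t, and every name in it (`chirp_samples`, `stepped_samples`, `offset`, the parameter values) is hypothetical. It relies only on the fact that the integral of a linear function over an interval equals its midpoint value times the interval length, so freezing the frequency at the midpoint and accumulating phase continuously reproduces the chirp's phase at every sampling instant, whereas another freezing point (here the left endpoint) drifts.

```python
import cmath
import math


def chirp_samples(f0, mu, T, n):
    """Exact samples exp(j*phi(k*T)) of a chirp with frequency f0 + mu*t."""
    return [
        cmath.exp(2j * math.pi * (f0 * k * T + 0.5 * mu * (k * T) ** 2))
        for k in range(n + 1)
    ]


def stepped_samples(f0, mu, T, n, offset=0.5):
    """Samples of a stepped-frequency waveform with continuous phase.

    In interval k the frequency is frozen at the chirp's value at
    t = (k + offset) * T (offset=0.5 is the midpoint), and the phase
    is carried over from one interval to the next without jumps.
    """
    phase = 0.0
    samples = [cmath.exp(1j * phase)]
    for k in range(n):
        f_k = f0 + mu * (k + offset) * T   # frozen frequency (assumption)
        phase += 2 * math.pi * f_k * T     # phase gained over interval k
        samples.append(cmath.exp(1j * phase))
    return samples


def max_sample_error(a, b):
    """Largest absolute difference between two sample sequences."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    f0, mu, T, n = 1.0, 0.7, 0.1, 32       # arbitrary illustrative values
    exact = chirp_samples(f0, mu, T, n)
    print("midpoint:", max_sample_error(exact, stepped_samples(f0, mu, T, n, 0.5)))
    print("left end:", max_sample_error(exact, stepped_samples(f0, mu, T, n, 0.0)))
```

With these values the midpoint error stays at floating-point rounding level while the left-endpoint error grows with the sample index. Frequency wrapping and the multi-carrier AFDM structure are deliberately left out of this sketch.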