http://arxiv.org/abs/2604.21909v2
Directional Confusions Reveal Divergent Inductive Biases Through Rate-Distortion Geometry in Human and Machine Vision
Updated: 2026-05-14T17:52:52Z
To humans, a robin seems more like a bird than a bird seems like a robin, but does this asymmetry also hold for machine vision? Humans and modern vision models can match each other in accuracy while making systematically different kinds of errors, differing not in how often they fail, but in who gets mistaken for whom. We show these directional confusions reveal distinct inductive biases invisible to accuracy alone. Using matched human and deep neural network responses on a natural-image categorization task under 12 perturbation types, we quantify asymmetry in confusion matrices and link its organization to the geometry of the information–error trade-off: how efficiently and how gracefully a system generalizes under distortion. We find that humans exhibit broad but weak asymmetries across many class pairs, whereas deep vision models show sparser, stronger directional collapses into a few dominant categories. Robustness training reduces overall asymmetry magnitude but fails to recover this human-like distributed structure. Generative simulations further show that these two asymmetry organizations shift the trade-off geometry in opposite directions even at matched accuracy, explaining why the same scalar asymmetry score can reflect fundamentally different generalization strategies. Together, these results establish directional confusion structure as a sensitive, interpretable signature of inductive bias that accuracy-based evaluation cannot recover.
Published: 2026-04-23T17:52:16Z
Authors: Leyla Roksan Caglar, Pedro A. M. Mediano, Baihan Lin

http://arxiv.org/abs/2605.15135v1
Deep Mixture of Experts Network for Resource Optimization in Aerial-Terrestrial CF-mMIMO Systems under URLLC
Updated: 2026-05-14T17:42:17Z
As a critical component of sixth-generation (6G) wireless networks, ultra-reliable and low-latency communication (URLLC) is expected to support real-time and reliable information exchange in low-altitude environments. However, achieving URLLC often incurs significant resource overhead, including increased bandwidth consumption, higher transmit power, and denser access point (AP) deployment, which pose significant challenges to both spectral efficiency (SE) and energy efficiency (EE). Moreover, existing iterative optimization algorithms are computationally intensive and struggle to meet the latency requirements of URLLC. To address these challenges, we propose a hybrid aerial-terrestrial cell-free massive MIMO (CF-mMIMO) network to support diverse services, along with a channel prediction network and a deep mixture of experts (MoE) network for uplink optimization. First, we design a channel prediction network (CP-Net) to mitigate channel aging caused by high-mobility user equipment (UE). CP-Net employs three Transformer-based sub-networks for aged channel state information (CSI) prediction, while a channel quality-aware loss function is introduced to improve the prediction accuracy of weak links. Based on the predicted CSI, we develop a deep MoE network (MoE-Net) for power allocation comprising three expert models targeting different objectives. Then, we introduce a weighted gating network (WT-Net) to learn an efficient adaptive combination of expert outputs. The proposed framework better captures heterogeneous UE requirements and improves communication performance under URLLC constraints.
Numerical results demonstrate the effectiveness of the proposed method.
Published: 2026-05-14T17:42:17Z
Comment: 15 pages, accepted for publication in IEEE Transactions on Wireless Communications
Authors: Donggen Li, Chong Huang, Jingfu Li, Pei Xiao, Wenjiang Feng, Dusit Niyato, Zhu Han

http://arxiv.org/abs/2605.15110v1
Proposal and study of statistical features for string similarity computation and classification
Updated: 2026-05-14T17:27:04Z
Adaptations of two features commonly applied in visual computing, the co-occurrence matrix (COM) and the run-length matrix (RLM), are proposed for computing the similarity of strings in general (words, phrases, codes, and texts). The proposed features are insensitive to language-related information: they are purely statistical and can be used in any context, with any language or grammatical structure. Other statistical measures commonly employed in the field, such as the longest common subsequence, the maximal consecutive longest common subsequence, mutual information, and edit distances, are evaluated and compared. In the first, synthetic set of experiments, the COM and RLM features outperform the remaining state-of-the-art statistical features. In 3 out of 4 cases, the RLM and COM features performed significantly better than the second-best group, which was based on distances (P-value < 0.001). On a real text-plagiarism dataset, the RLM features obtained the best results.
Published: 2026-05-14T17:27:04Z
Journal ref: International Journal of Data Mining, Modelling and Management, 2020
Authors: E. O. Rodrigues, D. Casanova, M. Teixeira, V. Pegorini, F. Favarim, E. Clua, A. Conci, Panos Liatsis

http://arxiv.org/abs/2601.08929v3
A Global Characterization of $f$-Divergences Yielding PSD Mutual-Information Matrices
Updated: 2026-05-14T17:02:37Z
Given $n$ random variables, when does the matrix of pairwise $f$-mutual informations define a PSD kernel over the variables?
For convex finite generators $f:(0,\infty)\to\mathbb{R}$ with $f(1)=0$ and finite boundary value $f(0)$, we give a closed characterization up to the linear transformation $f\sim f+c(t-1)$, which leaves every $f$-divergence and every $f$-mutual-information matrix unchanged. The matrix $M^{(f)}_{ij}:=I_f(X_i;X_j)$ is PSD for every finite-alphabet family if and only if the normalized representative has a globally convergent expansion $\bar f(t)=\sum_{m\ge2}a_m(t-1)^m$, with $a_m\ge0$, on all of $(0,\infty)$. Sufficiency follows from a replica embedding for monomial generators plus closure under nonnegative mixtures. Necessity first extracts the local Taylor cone at $1$ using biased three-point kernels $H_a$ and the Belton–Guillot–Khare–Putinar (BGKP) low-rank Hankel positivity-preserver theorem, and then bootstraps analyticity to the divergence. This is a kernel characterization problem, not a metric one: PSD of the variable-indexed matrix is distinct from Hilbertian properties of divergences between distributions. The result explains why Shannon MI and Jensen–Shannon fail, why $\chi^2$ succeeds, and why non-analytic divergences such as total variation and ReLU are excluded.
Published: 2026-01-13T19:09:19Z
Comment: Revised main theorem and proof; fixes the local-to-global step and proves that the local analytic expansion extends to the positive real line
Authors: Zachary Robertson

http://arxiv.org/abs/2501.17473v2
Remote State Estimation over a Wearing Channel: Information Freshness vs. Channel Aging
Updated: 2026-05-14T15:58:14Z
We study the remote estimation of a linear Gaussian system over a channel that wears out over time and with every use. The sensor can either transmit a fresh measurement in the current time slot, restore the channel quality at the cost of downtime, or remain silent. Frequent transmissions yield accurate estimates but incur significant wear on the channel. Renewing the channel too often improves channel conditions but results in poor estimation quality.
What is the optimal timing to transmit measurements and restore the channel? We formulate this problem as a semi-Markov decision process (SMDP), establish monotonicity properties of the optimal policy, and propose structure-aware solution methods.
Published: 2025-01-29T08:33:48Z
Comment: This paper has been accepted for publication in IEEE Transactions on Automatic Control
Authors: Jiping Luo, George Stamatakis, Osvaldo Simeone, Nikolaos Pappas

http://arxiv.org/abs/2202.05568v2
Change of measure through the Legendre transform
Updated: 2026-05-14T15:08:56Z
PAC-Bayes generalisation bounds are derived via change-of-measure inequalities that transfer concentration properties from a reference measure to all posterior measures. The specific choice of change of measure determines the assumptions required on the empirical risk; in particular, the classical Donsker–Varadhan theorem leads to bounds relying on bounded exponential moments. We study change-of-measure inequalities based on $f$-divergences, obtained by combining the Legendre transform of $f$ with the Fenchel–Young inequality. Beyond their intrinsic interest in probability theory, we show how these inequalities are helpful in learning theory and yield PAC-Bayes bounds under tailored assumptions on the empirical risk, thereby extending the range of conditions under which PAC-Bayesian guarantees can be established.
Published: 2022-02-11T11:53:28Z
Comment: 27 pages
Authors: Antoine Picard-Weibel, Benjamin Guedj

http://arxiv.org/abs/2605.14917v1
A Mutual Information Lower Bound for Multimodal Regression Active Learning
Updated: 2026-05-14T14:50:47Z
Active learning for continuous regression has lacked an acquisition function that targets epistemic uncertainty when the predictive distribution is multimodal: variance misses modal disagreement, and information-theoretic targets like BALD are designed for discrete outputs.
We introduce a Two-Index framework that makes this separation explicit: one stochastic index selects among competing model hypotheses (the epistemic source), while a second governs within-hypothesis randomness (the aleatoric source). An entropy decomposition within the framework identifies the mutual information between the output and the epistemic index as a principled acquisition objective, and we prove that this quantity vanishes as the model is trained on growing datasets, confirming that it captures exactly the uncertainty that data can resolve. Because this mutual information is intractable for continuous outputs, we derive the Mutual Information Lower Bound (MI-LB) acquisition function, a closed-form approximation for Mixture Density Network ensembles. On benchmarks featuring multimodal systems, MI-LB matches or beats every baseline evaluated and is the only method to do so consistently; geometric and Fisher-based baselines compete only when the input space already encodes the multimodality, and collapse otherwise.
Published: 2026-05-14T14:50:47Z
Authors: Leonardo Ferreira Guilhoto, Akshat Kaushal, Paris Perdikaris

http://arxiv.org/abs/2605.14823v1
A class of optimal authentication codes with secrecy
Updated: 2026-05-14T13:32:35Z
In this paper, a class of linear authentication codes with secrecy is constructed; these codes have simple encoding rules and are easy to implement. Based on a special Weil sum, the maximum success probabilities of substitution and impersonation attacks are calculated, and these codes are proven to be asymptotically optimal with respect to certain bounds.
Published: 2026-05-14T13:32:35Z
Authors: Haibo Liu, Chengzhi Wei, Qunying Liao

http://arxiv.org/abs/2511.19289v2
Performance Guarantees for Quantum Neural Estimation of Entropies
Updated: 2026-05-14T10:40:37Z
Estimating quantum entropies and divergences is an important problem in quantum physics, information theory, and machine learning.
Quantum neural estimators (QNEs), which use a hybrid classical-quantum architecture, have recently emerged as an appealing computational framework for estimating these measures. Such estimators combine classical neural networks with parametrized quantum circuits, and their deployment typically entails tedious tuning of hyperparameters controlling the sample size, network architecture, and circuit topology. This work initiates the study of formal guarantees for QNEs of measured (Rényi) relative entropies in the form of non-asymptotic error risk bounds. We further establish exponential tail bounds showing that the error is sub-Gaussian and thus sharply concentrates about the ground-truth value. For an appropriate sub-class of density operator pairs on a space of dimension $d$ with bounded Thompson metric, our theory establishes a copy complexity of $O(|\Theta(\mathcal{U})|\,d/\epsilon^2)$ for QNE with a quantum circuit parameter set $\Theta(\mathcal{U})$, which has minimax-optimal dependence on the accuracy $\epsilon$. Additionally, if the density operator pairs are permutation invariant, we improve the dimension dependence above to $O(|\Theta(\mathcal{U})|\,\mathrm{polylog}(d)/\epsilon^2)$. Our theory aims to facilitate principled implementation of QNEs for measured relative entropies and to guide hyperparameter tuning in practice.
Published: 2025-11-24T16:36:06Z
Comment: 43 pages
Journal ref: Quantum 10, 2113 (2026)
Authors: Sreejith Sreekumar, Ziv Goldfeld, Mark M. Wilde
DOI: 10.22331/q-2026-05-21-2113

http://arxiv.org/abs/2605.14625v1
Digital Twin Synchronization Over Mobile Embodied AI Network With Agentic Intelligence
Updated: 2026-05-14T09:39:48Z
Efficient digital twin (DT) synchronization relies on maintaining high-fidelity virtual representations with minimal age of information (AoI). However, the synergistic potential of cooperative sensing and autonomous mobility of the sensing agent remains underexplored in existing DT synchronization frameworks.
In this paper, we propose an agentic AI-empowered mobile embodied AI network (MEAN) framework for DT synchronization. In the proposed hybrid architecture, the base station (BS) conducts global orchestration, while the agents autonomously execute a five-stage closed-loop workflow: move-to-sense, cooperative sensing, onboard semantic processing, channel-aware mobility, and uplink transmission. To optimize synchronization performance, we formulate a joint topology dispatching and multidimensional resource allocation problem that minimizes the maximum twin deviation across regions, subject to heterogeneous sensing-fidelity and energy-budget constraints. To solve it, we develop a hierarchical two-layer optimization algorithm, in which the outer layer refines the multi-agent assignment via a dynamic matching game and the inner layer iteratively optimizes the continuous resources. Extensive simulation results verify the convergence of the proposed algorithm and demonstrate its substantial superiority over multiple baseline schemes in reducing synchronization deviation. Furthermore, the results reveal that semantic compression serves as a vital substitute for channel resources in latency reduction under constrained bandwidth, while autonomous velocity adaptation provides an essential degree of freedom for navigating the fundamental energy-time trade-off.
Published: 2026-05-14T09:39:48Z
Authors: Zhouxiang Zhao, Jiaxiang Wang, Yahao Ding, Yinchao Yang, Zhaohui Yang, Mohammad Shikh-Bahaei, Julie A. McCann, Zhaoyang Zhang, Kaibin Huang

http://arxiv.org/abs/2605.14603v1
Quaternary codes with new parameters from two-generator simplicial complexes
Updated: 2026-05-14T09:18:46Z
In this article, we construct infinite families of quaternary (that is, over the ring $\mathbb{Z}_4$) $\mathcal{C}_{D}$-codes, where the defining set $D$ is derived from a two-generator simplicial complex, and determine their Lee weight distributions.
As a result, we find at least 32 new or improved quaternary linear codes relative to the database \cite{aydin2022updated} of best-known quaternary codes, including codes from a Plotkin-optimal family. We also report 6 projective quaternary linear codes with best-known parameters that might outperform the currently reported best-known codes because they are projective. Further, we establish necessary and sufficient conditions for their Gray images to be linear, which in turn yields an infinite family of Griesmer codes and several infinite families of minimal binary linear codes.
Published: 2026-05-14T09:18:46Z
Authors: Ankit Yadav, Nilay Kumar Mondal, Ritumoni Sarma

http://arxiv.org/abs/2602.19292v2
Strategic Gaussian Signaling under Linear Sensitivity Mismatch
Updated: 2026-05-14T09:01:27Z
We analyze Stackelberg Gaussian signaling games in which the encoder and decoder have a linear sensitivity mismatch. Unlike the standard additive-bias model, a sensitivity mismatch means that the encoder prefers the decoder to track a linear transformation of the state rather than a shifted one. We derive the equilibrium structure for both noiseless (cheap-talk) and noisy signaling channels. In the noiseless case, the equilibrium admits a spectral characterization: the encoder transmits information only along eigenspaces associated with the negative eigenvalues of a mismatch matrix. In the noisy regime, we derive analytical thresholds for informative signaling, showing that communication collapses if the sensitivity mismatch or transmission cost exceeds a channel-dependent threshold.
Published: 2026-02-22T17:59:27Z
Comment: Accepted to the 23rd IFAC World Congress (2026).
This is an extended version containing full proofs.
Authors: Hassan Munif, Vineeth Satheeskumar Varma, Samson Lasaulce

http://arxiv.org/abs/2605.03823v2
Realizable Bayes-Consistency for General Metric Losses
Updated: 2026-05-14T08:42:31Z
We study strong universal Bayes-consistency in the realizable setting for learning with general metric losses, extending classical characterizations beyond $0$-$1$ classification (Bousquet et al., 2020; Hanneke et al., 2021) and real-valued regression (Attias et al., 2024). Given an instance space $(X,\rho)$, a label space $(Y,\ell)$ with possibly unbounded loss, and a hypothesis class $H \subseteq Y^{X}$, we resolve the realizable case of an open problem presented in Tsir Cohen and Kontorovich (2022). Specifically, we find necessary and sufficient conditions on the hypothesis class $H$ under which there exists a distribution-free learning rule whose risk converges almost surely to the best-in-class risk (which is zero) for every realizable data-generating distribution. Our main contribution is this sharp characterization in terms of a combinatorial obstruction: similarly to Attias et al. (2024), we introduce the notion of an infinite non-decreasing $(\gamma_k)$-Littlestone tree, where $\gamma_k \to \infty$. This extends the Littlestone tree structure used in Bousquet et al. (2020) to the metric loss setting.
Published: 2026-05-05T14:50:55Z
Comment: 14 pages. To appear in Proceedings of the 43rd International Conference on Machine Learning (ICML 2026); v2: fixed abstract metadata rendering
Authors: Dan Tsir Cohen, Steve Hanneke, Aryeh Kontorovich

http://arxiv.org/abs/2507.23180v2
The Construction of Near-optimal Universal Coding of Integers
Updated: 2026-05-14T08:16:41Z
The Universal Coding of Integers (UCI) is suitable for discrete memoryless sources with unknown probability distributions and countably infinite alphabets.
A UCI is a class of prefix codes for which the ratio of the average codeword length to $\max\{1,H(P)\}$ is within a constant expansion factor $C_{\mathcal{C}}$ for any decreasing probability distribution $P$, where $H(P)$ is the entropy of $P$. For any UCI code $\mathcal{C}$, the minimum expansion factor $C_{\mathcal{C}}^{*}$ is defined as the infimum of the set of expansion factors of $\mathcal{C}$. Each $\mathcal{C}$ has a unique $C_{\mathcal{C}}^{*}$, and the smaller $C_{\mathcal{C}}^{*}$ is, the better the compression performance of $\mathcal{C}$. The class of UCIs $\mathcal{C}$ (or a family $\{\mathcal{C}_i\}_{i=1}^{\infty}$) that achieves the smallest $C_{\mathcal{C}}^{*}$ is called the optimal UCI. The best previously known result is that the minimum expansion factor of the optimal UCI satisfies $2\leq C_{\mathcal{C}}^{*}\leq 2.5$. In this paper, we prove a tighter probability inequality for decreasing distributions, which serves as a new tool for studying the properties of UCIs. Using this inequality, we prove that there exists a class of near-optimal UCIs, called the $\nu$ code, achieving $C_{\nu}=2.0386$. This narrows the range of the minimum expansion factor for the optimal UCI to $2\leq C_{\mathcal{C}}^{*}\leq 2.0386$. We show that the $\nu$ code is currently optimal in terms of the minimum expansion factor. In addition, we give a new proof that the minimum expansion factor of the optimal UCI is lower bounded by $2$.
Published: 2025-07-31T01:21:06Z
Authors: Wei Yan, Yunghsiang S. Han

http://arxiv.org/abs/2605.14451v1
CP-OFDM Achieves Lower Ranging CRB Than Frequency-Spread Waveforms in the Large-Sample Regime
Updated: 2026-05-14T06:45:31Z
The inherent randomness of communication symbols creates a fundamental tension in Integrated Sensing and Communications (ISAC).
On the one hand, these symbols enable data transmission while allowing sensing to fully reuse communication resources. On the other hand, their randomness induces waveform-dependent fluctuations that directly affect sensing accuracy. This paper investigates a foundational question arising from this tradeoff: how does the modulation waveform affect the ranging Cramér–Rao bound (CRB) when sensing reuses random data symbols? We address this question by revealing a structural factorization of the Fisher information matrix (FIM) for joint delay-amplitude estimation, which separates the deterministic Jacobian of the target geometry from the random frequency-domain signal power induced by the data symbols. This structure yields a Jensen-type universal lower bound on the CRB, which is exactly attained by CP-OFDM under PSK constellations. For QAM and broader sub-Gaussian constellations, we develop an asymptotic perturbation analysis of the inverse FIM and prove that, as the number of transmitted symbols $N$ grows large, CP-OFDM achieves a lower ranging CRB than any frequency-spread orthogonal waveform on the almost-sure event that the random FIM is invertible. This superiority further extends to amplitude estimation and to full joint delay-amplitude estimation. We also characterize the local geometry of the stochastic CRB minimization problem over the unitary group. The analysis reveals that CP-OFDM is a stationary point for finite $N$ and that its Riemannian Hessian is positive semidefinite for sufficiently large $N$, establishing its asymptotic local optimality. Numerical results confirm that OFDM outperforms representative waveforms, including SC, OTFS, and AFDM.
Published: 2026-05-14T06:45:31Z
Comment: 20 pages, 1 figure, submitted to IEEE for possible publication
Authors: Fan Liu, Yifeng Xiong, Ya-Feng Liu, Jie Yang, Christos Masouros, Shi Jin
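The Jensen-type argument in the CP-OFDM abstract can be illustrated numerically. The sketch below is not the authors' model: it assumes a simplified delay-only setting with known amplitude, unit subcarrier spacing, additive white Gaussian noise, and QPSK symbols, and it uses a unitary-DFT-spread single-carrier (SC) waveform as the frequency-spread comparison. Under these assumptions the delay Fisher information is $J = (2/\sigma^2)\sum_k (2\pi f_k)^2 |X_k|^2$. OFDM with PSK has constant $|X_k|^2 = 1$, so its CRB $1/J$ equals the Jensen bound $1/\mathbb{E}[J]$, whereas SC has random $|X_k|^2$ with the same mean, so $\mathbb{E}[1/J] \ge 1/\mathbb{E}[J]$. The function names `delay_fisher_info`, `qpsk_symbols`, and `average_delay_crb` are hypothetical.

```python
import numpy as np


def delay_fisher_info(X, freqs, noise_var):
    """Fisher information for a scalar delay tau (amplitude assumed known).

    Assumed model: Y_k = X_k * exp(-2j*pi*f_k*tau) + CN(0, noise_var) noise.
    Since d mu_k / d tau = -2j*pi*f_k * mu_k, the Fisher information is
    J = (2 / noise_var) * sum_k (2*pi*f_k)**2 * |X_k|**2, summed over the last axis.
    """
    weights = (2.0 * np.pi * freqs) ** 2
    return (2.0 / noise_var) * np.sum(weights * np.abs(X) ** 2, axis=-1)


def qpsk_symbols(rng, shape):
    """Unit-modulus QPSK symbols drawn uniformly at random."""
    idx = rng.integers(0, 4, size=shape)
    return np.exp(1j * (np.pi / 4 + np.pi / 2 * idx))


def average_delay_crb(n_sub=64, n_trials=5000, noise_var=1.0, seed=0):
    """Monte Carlo average of the conditional delay CRB 1/J over random QPSK data.

    Returns (mean CRB for OFDM, mean CRB for DFT-spread SC, Jensen bound 1/E[J]).
    """
    rng = np.random.default_rng(seed)
    freqs = np.arange(n_sub) - n_sub / 2          # centred subcarrier indices (unit spacing assumed)
    s = qpsk_symbols(rng, (n_trials, n_sub))      # one row of data symbols per trial

    X_ofdm = s                                    # OFDM: symbols sit directly on subcarriers, |X_k| = 1
    X_sc = np.fft.fft(s, axis=-1, norm="ortho")   # SC: unitary DFT spreads each symbol over all subcarriers

    crb_ofdm = np.mean(1.0 / delay_fisher_info(X_ofdm, freqs, noise_var))
    crb_sc = np.mean(1.0 / delay_fisher_info(X_sc, freqs, noise_var))

    # E|X_k|^2 = 1 for both waveforms, so E[J] is identical and 1/E[J] is the Jensen bound.
    jensen_bound = 1.0 / delay_fisher_info(np.ones(n_sub), freqs, noise_var)
    return float(crb_ofdm), float(crb_sc), float(jensen_bound)


if __name__ == "__main__":
    ofdm, sc, bound = average_delay_crb()
    print(f"Jensen bound: {bound:.4e}")
    print(f"OFDM (QPSK):  {ofdm:.4e}")
    print(f"SC (QPSK):    {sc:.4e}")
```

With a fixed seed, the OFDM value should coincide with the Jensen bound up to floating-point error, while the SC average should come out slightly above it. This mirrors, in a much simpler scalar setting, the abstract's point that constant frequency-domain power lets CP-OFDM attain the bound under PSK; it does not reproduce the paper's joint delay-amplitude analysis or its QAM results.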