https://arxiv.org/api/D6L1oyVq42oTBWGic5NOq664zPA 2026-06-10T08:33:48Z 183838 165 15 http://arxiv.org/abs/2606.10569v1 Hidden Consensus:Preference-Validity Compression in Human Feedback 2026-06-09T08:32:11Z Standard RLHF pipelines often reduce heterogeneous human judgments into a single scalar reward target. We argue that this reduction can mis-measure alignment in structurally plural societies, where disagreement may reflect culturally, historically, linguistically, regionally, or normatively grounded interpretations rather than annotation noise. We call this failure Preference-Validity Compression, the collapse of multiple plural-valid response options into a single optimization target. Using Malaysia as a diagnostic setting, we analyze RLHF-style feedback aggregation through preference events linking prompts, responses, and acceptability judgments across interpretive frames. Across 321 preference events from 20 participants and 107 trio-annotated prompts, 79% of prompts contain more than one majority-supported response that single-winner aggregation would discard, and apparent dominance gaps between top responses diminish when all majority-supported options are considered. Participants frequently select multiple acceptable responses, and discarded responses demonstrably reflect coherent local, practical, or cultural frames. These findings show that majority aggregation in this corpus measures argmax acceptability rather than plural alignment. We treat this as a measurement-validity issue and argue that future alignment methods should satisfy Validity-Preserving Consistency, remaining stable across plural-valid interpretive frames rather than collapsing them into a single reward target. 2026-06-09T08:32:11Z 28 pages. When AI learns from human feedback, it forces a single "correct" answer, but sometimes multiple answers are all genuinely valid, and that nuance gets thrown away Dorcas Chia Ern Chua Karen Myn Hui Lee Jia Yue Tan Zhen Xue Gue Norzalena Abdul Hamid Azima Binti Azmi Keat Mei Yeong Aizat Izyani binti Mujab Hafsah Noor Azam Chee Guo Khoo Han Ying Lim Chee Seng Chan http://arxiv.org/abs/2606.10554v1 Benchmarking Knowledge Editing using Logical Rules 2026-06-09T08:21:56Z Large Language Models (LLMs) are increasingly deployed in real-world applications that require access to up-to-date knowledge. However, retraining LLMs is computationally expensive. Therefore, knowledge editing techniques are crucial for maintaining current information and correcting erroneous assertions within pre-trained models. Current benchmarks for knowledge editing primarily focus on recalling edited facts, often neglecting their logical consequences. To address this limitation, we introduce a new benchmark designed to evaluate how knowledge editing methods handle the logical consequences of a single fact edit. Our benchmark extracts relevant logical rules from a knowledge graph for a given edit. Then, it generates multi-hop questions based on these rules to assess the impact on logical consequences. Our findings indicate that while existing knowledge editing approaches can accurately insert direct assertions into LLMs, they frequently fail to inject entailed knowledge. Specifically, experiments with popular methods like ROME and FT reveal a substantial performance gap, up to 24%, between evaluations on directly edited knowledge and on entailed knowledge. This highlights the critical need for semantics-aware evaluation frameworks in knowledge editing. 2026-06-09T08:21:56Z Accepted at the 24th International Semantic Web Conference 2025 The Semantic Web. ISWC 2025. ISWC 2025. Lecture Notes in Computer Science, vol 16141. Springer, Cham Tatiana Moteu Ngoli NDah Jean Kouagou Hamada M. Zahera Axel-Cyrille Ngonga Ngomo 10.1007/978-3-032-09530-5_3 http://arxiv.org/abs/2606.10543v1 Flexible Flows for Biological Sequence Design 2026-06-09T08:11:14Z Designing functional biological sequences requires navigating vast discrete spaces under strict evolutionary and biophysical constraints. Discrete Flow Matching (DFM) offers a generative framework over such spaces, but existing approaches rely on biologically uninformative couplings and offer limited flexibility for variable-length sequence generation and fine-grained control. We propose a structured coupling that encodes domain-specific preferences among sequence elements, biasing the source distribution toward plausible regions without modifying the flow objective or training procedure. Building on this, we introduce a latent edit-based rate parameterization that models variable-length generation via edit operations conditioned on a shared global latent, akin to a latent variable model, while remaining tractable. We further introduce a latent classifier-free guidance mechanism that steers generation coherently in continuous latent space, along with Dirichlet-prior temperature scaling for test-time control over edit operations. Our method achieves state-of-the-art performance across diverse biological sequence tasks, including density estimation, unconditional and conditional DNA sequence generation, and peptide sequence generation. 2026-06-09T08:11:14Z Yogesh Verma Dani Korpela Harri Lähdesmäki Vikas Garg http://arxiv.org/abs/2606.09677v2 MeCo: One-Step MeanFlow-based Corrector for Multi-Channel Speech Separation 2026-06-09T08:08:09Z While discriminative models for multi-channel speech separation excel in reference-based metrics, they often exhibit suboptimal human listening quality. To address this, we propose a novel MeanFlow-based one-step generative corrector (MeCo). MeCo learns a conditional average velocity field to map discriminative estimates directly onto the clean speech manifold in a single step. To maximize one-step generation performance, we introduce Data-Space Optimization (DSO). DSO integrates an $\mathbf{x}_r$-loss, which penalizes prediction errors on longer displacement intervals to serve as a generative objective for human listening quality, with an Endpoint SI-SDR loss that directly optimizes terminal signal fidelity. Experiments demonstrate that MeCo achieves state-of-the-art (SOTA) performance with minimal computational overhead, simultaneously achieving superior signal fidelity and human listening quality in both in-domain and out-of-domain scenarios. 2026-06-08T15:58:31Z 5 pages, accepted to Interspeech 2026 Dohwan Kim Jung-Woo Choi http://arxiv.org/abs/2606.10532v1 ActiveMem: Distributed Active Memory for Long-Horizon LLM Reasoning 2026-06-09T08:03:38Z Memory is essential for enabling large language model (LLM) agents to handle long-horizon reasoning tasks. Existing memory mechanisms are largely centralized, typically organizing retrieved information and interaction history within a single model context. This design imposes a fundamental trade-off: scaling reasoning trajectories risks context overload, whereas aggressive content pruning may result in irreversible information loss. Seeking a better trade-off, we draw inspiration from human cognitive systems, especially the functional complementarity between the prefrontal cortex (executive control) and the hippocampus (memory management), suggesting that such a trade-off need not be inherent, but may instead stem from centralized memory organization. To this end, we propose ActiveMem, a heterogeneous framework that decouples agent memory from the core reasoning process. Specifically, a high-level Planner utilizes distilled semantic gists to execute reasoning, while a lightweight, distributed memory system operates in parallel to actively accumulate and consolidate these gists throughout the task. Experiments on BrowseComp-Plus and GAIA show that ActiveMem achieves state-of-the-art accuracy with significantly reduced overhead, demonstrating the effectiveness of distributed active memory for long-horizon reasoning. 2026-06-09T08:03:38Z Yunhan Jiang Wenbin Duan Shasha Guo Liang Pang Xiaoqian Sun Huawei Shen http://arxiv.org/abs/2606.10531v1 LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization 2026-06-09T08:02:56Z Quantization-aware training (QAT) is essential for extremely low-bit large language models (LLMs). Current QAT methods are mainly based on scalar quantization (SQ), which enables efficient optimization but suffers from severe performance degradation at 2-bit precision. On the other hand, vector quantization (VQ) provides substantially higher representational capacity, but its discrete codebook lookup prevents end-to-end training. We propose LC-QAT, a 2-bit weight-only VQ-QAT framework that represents quantized weights via a learned affine mapping over discrete vectors, which yields a high-quality PTQ initialization and enables fully differentiable end-to-end optimization without explicit codebook lookup in the training forward pass. This strong post-training initialization makes LC-QAT highly data-efficient. Experiments across diverse LLMs demonstrate that LC-QAT consistently outperforms state-of-the-art QAT methods while using only 0.1%--10% of the training data. Our results establish LC-QAT as a practical and scalable solution for extreme low-bit model deployment. 2026-06-09T08:02:56Z Accepted by ICML 2026 Haoyu Wang Xingyu Yu Haiyan Zhao Fengxiang Wang Xu Han http://arxiv.org/abs/2606.10530v1 Machine Learning Methods for Studying Latent Neural Activity Dynamics 2026-06-09T08:01:11Z Recent developments in brain recording are driving a demand for machine learning tools capable of decoding the latent structure of large populations of neurons. In this paper, we provide a comprehensive survey that outlines the trajectory of Latent Variable Models (LVMs) from early state-space models to more recent deep generative models. We organize the literature into three closely related domains: (1) Single-Region Latent Dynamics, which includes models such as linear dynamical systems to more complex dynamics represented by Recurrent Neural Networks (RNNs) and Neural Ordinary Differential Equations (ODEs); (2) Multi-Region Communication, which employs probabilistic as well as subspace methods to study how information is transferred across different brain areas considering synaptic propagation delays and network connectivity; and (3) Behavior-Aligned Modeling, which seeks to disentangle neural activity related to task performance from other internal states via supervised or contrastive learning. This survey also includes large-scale neural foundation models, such as Transformers and diffusion models, that rely on large-scale pre-training for optimal performance across subjects. Finally, we conclude and discuss benchmarks, evaluation criteria, and open challenges, such as the ability to identify causal links or directionality of communication, to facilitate future research for bridging interpretable brain dynamics with reliable neural decoding. 2026-06-09T08:01:11Z Accepted by IJCAI 2026 survey track Shufeng Kong Fumei Deng Xinyi Dong Caihua Liu Weiwei Chen Yingheng Wang Daniel Cao Azahara Oliva Antonio Fernandez-Ruiz Carla Gomes http://arxiv.org/abs/2605.11458v3 Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning 2026-06-09T07:59:58Z On-policy self-distillation has become a strong recipe for LLM reasoning, where a privileged teacher supervises the student's own rollouts while conditioning on the reference solution. A design choice shared by nearly all such methods, however, has gone unquestioned: the teacher always sees the full reference reasoning. We argue that this default itself is part of the problem and identify a teacher-side exposure mismatch: when the teacher conditions on reasoning far beyond the student's current competence, the resulting token targets become too strong to absorb. A controlled fixed-exposure sweep makes this concrete on two fronts: 1) full exposure is not reliably the best choice, and 2) student-teacher mismatch grows monotonically as the teacher sees more privileged reasoning. This motivates treating teacher exposure not as a fixed hyperparameter but as a learnable training-time control variable. We therefore propose Adaptive Teacher Exposure for Self-Distillation (ATESD). ATESD models the reveal ratio with a lightweight Beta-policy controller conditioned on compact training-state statistics, and uses one sampled exposure for a short hold window of student updates. To make this exposure controller learnable, we optimize it with a discounted learning-progress reward that scores each held decision by its effect on the student's future improvement rather than its immediate loss change, addressing the delayed credit assignment induced by on-policy distillation. Experiments on AIME 24, AIME 25, and HMMT 25 across Qwen3-{1.7B, 4B, 8B} show that ATESD consistently outperforms competitive self-distillation and RL baselines, improving over OPSD by +0.95, +2.05, and +2.33 Average@12 points respectively, and establishing adaptive teacher exposure as an effective new axis for reasoning self-distillation. 2026-05-12T03:15:58Z 11 pages, 4 figures; code not released yet Zihao Han Tiangang Zhang Huaibin Wang Yilun Sun http://arxiv.org/abs/2501.00745v3 Dynamics of Adversarial Attacks on Large Language Model-Based Search Engines 2026-06-09T07:56:59Z The increasing integration of Large Language Model (LLM) based search engines has transformed the landscape of information retrieval. However, these systems are vulnerable to adversarial attacks, especially ranking manipulation attacks, where attackers craft webpage content to manipulate the LLM's ranking and promote specific content, gaining an unfair advantage over competitors. In this paper, we study the dynamics of ranking manipulation attacks. We frame this problem as an Infinitely Repeated Prisoners' Dilemma, where multiple players strategically decide whether to cooperate or attack. We analyze the conditions under which cooperation can be sustained, identifying key factors such as attack costs, discount rates, attack success rates, and trigger strategies that influence player behavior. We identify tipping points in the system dynamics, demonstrating that cooperation is more likely to be sustained when players are forward-looking. However, from a defense perspective, we find that simply reducing attack success probabilities can, paradoxically, incentivize attacks under certain conditions. Furthermore, defensive measures to cap the upper bound of attack success rates may prove futile in some scenarios. These insights highlight the complexity of securing LLM-based systems. Our work provides a theoretical foundation and practical insights for understanding and mitigating their vulnerabilities, while emphasizing the importance of adaptive security strategies and thoughtful ecosystem design. 2025-01-01T06:23:26Z New Frontiers in Game-Theoretic Learning Workshop, International Conference on Machine Learning (ICML) 2026 Xiyang Hu http://arxiv.org/abs/2606.10525v1 Assessing Automated Prompt Injection Attacks in Agentic Environments 2026-06-09T07:54:58Z Indirect prompt injection poses a critical threat to LLM agents that interact with untrusted external data, yet automated attack methods--proven effective for jailbreaking--remain underexplored in realistic agentic settings. We present a comprehensive empirical evaluation of automated prompt injection attacks against LLM agents, adapting both white-box (GCG) and black-box (TAP) methods to the agentic setting within the AgentDojo framework. We evaluate across 80 task pairs spanning four domains and multiple models, and find that black-box optimization substantially outperforms gradient-based methods, a gap we attribute to GCG's optimization instability under reasonable compute budgets. We also find that TAP's effectiveness depends on the attacker model, as both general capability and safety tuning affect attack success--stronger models produce more effective injections, while safety-tuned attackers can refuse to generate adversarial prompts. Task-universal attacks transfer effectively to unseen tasks and out-of-distribution domains, but attacks optimized on smaller open-source models do not transfer to frontier models like GPT-5. These findings highlight automated prompt injection as a credible but model-dependent threat, with significant barriers remaining for model-agnostic exploitation. 2026-06-09T07:54:58Z David Hofer Edoardo Debenedetti Florian Tramèr http://arxiv.org/abs/2603.04852v2 Non-Parametric Structural Priors for Geometry Theorem Prediction 2026-06-09T07:41:14Z Multi-step theorem prediction is a central challenge in geometry problem solving. Existing neural-symbolic approaches rely heavily on supervised parametric models, which exhibit limited generalization to evolving theorem libraries. In this work, we explore training-free theorem prediction through the lens of in-context learning (ICL). We identify a critical scalability bottleneck, termed Structural Drift: as reasoning depth increases, the performance of vanilla ICL degrades sharply, often collapsing to near zero. We attribute this failure to the LLM's inability to recover latent topological dependencies, leading to unstructured exploration. To address this issue, we propose Theorem Precedence Graphs, which encode temporal dependencies from historical solution traces as directed graphs, and impose explicit topological constraints that effectively prune the search space during inference. Coupled with retrieval-augmented graph construction and a stepwise symbolic executor, our approach enables LLMs to act as structured planners without any gradient-based optimization. Experiments on the FormalGeo7k benchmark show that our method achieves 89.29% accuracy, substantially outperforming ICL baselines and matching state-of-the-art supervised models. These results indicate that explicit structural priors offer a promising direction for scaling LLM-based symbolic reasoning. 2026-03-05T06:08:50Z Junbo Zhao Ting Zhang Can Li Wei He Jingdong Wang Hua Huang http://arxiv.org/abs/2606.10507v1 HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning 2026-06-09T07:35:14Z While Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents across a wide range of tasks, their performance often degrades in multi-turn long-horizon agentic tasks. Existing methods have made progress through fine-grained credit assignment to alleviate long-horizon sparse rewards and hierarchical reinforcement learning to decompose tasks and reduce long-term dependency. However, these methods still do not directly address long-context interference, in which continuously growing histories weaken the agent's ability to track the global task state and impair subsequent reasoning and decision-making. Inspired by the way humans handle complex tasks through subgoal decomposition and completed progress summarization, we propose Hierarchical Planning and Information Folding (HIPIF) for long-horizon LLM agent learning. HIPIF trains the agent end-to-end to organize long-horizon execution around explicit subgoals while folding completed subgoal histories to reduce long-context interference. Furthermore, to stabilize subgoal-based planning and execution, HIPIF combines hierarchical reflection and subgoal-oriented process rewards to guide subgoal generation, transition, and execution, without relying on costly auxiliary models or task-specific expert trajectories. Extensive experiments on three publicly available agentic benchmarks demonstrate the validity of our method. 2026-06-09T07:35:14Z Juncheng Diao Zhicong Lu Peiguang Li Yongwei Zhou Changyuan Tian Qingbin Li Rongxiang Weng Jingang Wang Xunliang Cai http://arxiv.org/abs/2606.06493v3 HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers 2026-06-09T07:34:32Z For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse loco-manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface and is distilled via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student from three complementary specialists: whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning. 2026-06-04T17:59:50Z 22 pages, 9 figures, Project page: https://lzyang2000.github.io/HANDOFF/ Lizhi Yang Junheng Li Nehar Poddar Yiling Hou Gio Huh Robert Griffin Georgia Gkioxari Aaron Ames http://arxiv.org/abs/2606.10504v1 Cross-Modal Knowledge Distillation without Paired Data: Theoretical Foundation and Algorithm 2026-06-09T07:29:12Z Cross-modal knowledge distillation (CMKD) studies how a (large) teacher model trained on one type of data (e.g., images) can guide a (smaller) student model building on another type of data (e.g., text/audio). Existing CMKD methods often require paired multi-modal data with aligned semantics, but obtaining such paired data are often costly and impractical. To mitigate this limitation, we develop a new CMKD framework for the more challenging setting where paired data are unavailable. In particular, we establish a cross-modal distributional relationship between teacher and student models, which reveals two fundamental quantities governing effective distillation: feature alignment and label alignment. These quantities characterize semantic discrepancy between modalities at the levels of representation and prediction distributions, respectively. Motivated by this insight, we propose a principled framework, with theoretical guarantees, that enables effective cross-modal knowledge distillation by aligning distributions rather than individual samples. Extensive experiments across a wide range of multimodal benchmarks show that our framework is highly effective in both unpaired and paired data settings, improving significantly over prior work. 2026-06-09T07:29:12Z Trong Khiem Tran Anh Duc Chu Quang Hung Pham Phi Le Nguyen Trong Nghia Hoang http://arxiv.org/abs/2509.04027v4 Why Does Reasoning Length Converge? Unveiling the Underfitting-Overfitting Trade-off in Chain-of-Thought 2026-06-09T07:28:31Z Test-time scaling, primarily manifested through multi-step Chain-of-Thought (CoT) reasoning via Reinforcement Learning (RL), has emerged as a pivotal paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). However, a significant theoretical gap persists: traditional token-level analysis fails to capture the macroscopic dynamics of reasoning-level scaling. To address this, we introduce CoT-Space, a novel theoretical framework that recasts the reasoning process from a discrete token-prediction task to an optimization process within a continuous, reasoning-level semantic space. By modeling the reasoning trajectory from both noise and risk perspectives and revitalizing foundational principles from classical learning theory, we demonstrate that the observed convergence to an optimal CoT length is a natural consequence of the fundamental trade-off between underfitting and overfitting. We further utilize RL as a tool to elicit and verify these results in our experiments. Our findings provide a mechanistic explanation for the internal test-time scaling via RL, offering a principled theoretical foundation to optimize reasoning trajectories in modern LLMs. 2025-09-04T09:02:16Z Preprint Edition Zeyu Gan Hao Yi Yong Liu