https://arxiv.org/api/we75vlbX8afeAQrm0NS5lyFc6ig 2026-06-11T08:04:43Z 184037 495 15 http://arxiv.org/abs/2601.08379v2 MMD Guidance: Training-Free Distribution Adaptation for Diffusion Models via Maximum Mean Discrepancy Guidance 2026-06-09T06:35:29Z

Pre-trained diffusion models have emerged as powerful generative priors for both unconditional and conditional sample generation, yet their outputs often deviate from the characteristics of user-specific target data. Such mismatches are especially problematic in domain adaptation tasks, where only a few reference examples are available and retraining the diffusion model is infeasible. Existing inference-time guidance methods can adjust sampling trajectories, but they typically optimize surrogate objectives such as classifier likelihoods rather than directly aligning with the target distribution. We propose \emph{MMD Guidance}, a training-free mechanism that augments the reverse diffusion process with gradients of the \textit{Maximum Mean Discrepancy (MMD)} between generated samples and a reference dataset. MMD provides reliable distributional estimates from limited data, exhibits low variance in practice, and is efficiently differentiable, which makes it particularly well-suited for the guidance task. Our framework naturally extends to prompt-aware adaptation in conditional generation models via product kernels. Also, it can be applied with computational efficiency in latent diffusion models (LDMs), since guidance is applied in the latent space of the LDM. Experiments on synthetic and real-world benchmarks demonstrate that MMD Guidance can achieve distributional alignment while preserving sample fidelity. The project code is available at github.com/matinamehdizadeh/MMD-Guidance.

2026-01-13T09:42:57Z Matina Mahdizadeh Sani Nima Jamali Mohammad Jalali Farzan Farnia http://arxiv.org/abs/2606.04718v2 CoRe-MoE: Contrastive Reweighted Mixture of Experts for Multi-Terrain Humanoid Locomotion with Gait Adaptation 2026-06-09T06:35:28Z

Humans primarily rely on walking and running to traverse complex terrains. Similarly, humanoid robots should be able to smoothly transition between walking and running while maintaining natural and stable locomotion. However, unifying gait transition and multi-terrain adaptation within a single policy remains challenging due to gradient interference between tasks and the distribution shift caused by terrain variations. Although Mixture-of-Experts (MoE) architectures can mitigate multi-skill interference, direct joint training often fails to achieve clear expert specialization. To address these challenges, we propose CoRe-MoE, a two-stage reinforcement learning framework that decouples gait generation from terrain adaptation. In the first stage, a stable locomotion policy is learned to produce natural walking and running behaviors with smooth transitions. In the second stage, a terrain-aware MoE branch is introduced, and the gating network is trained with a contrastive objective to learn structured terrain representations and promote expert specialization. The final action is obtained through weighted fusion of the base gait policy and the terrain-aware branch, enabling the policy to preserve stable locomotion while adapting to complex terrains. Extensive simulation results demonstrate that the proposed method outperforms baseline approaches in terms of success rate, locomotion stability, and multi-terrain adaptability. Furthermore, zero-shot deployment on a Unitree G1 humanoid robot validates the effectiveness of our framework, achieving robust walking and running across stairs, slopes, steps, obstacles, and unstructured outdoor terrains while maintaining accurate foothold control and dynamic stability.

2026-06-03T10:51:46Z Kailun Huang, Zikang Xie, Yanzhe Xie and Panpan Liao contributed equally to this work. Corresponding authors: Renjing Xu and Haohui Huang Kailun Huang Hong Kong University of Science and Technology Zikang Xie Hong Kong University of Science and Technology Yanzhe Xie Hong Kong University of Science and Technology Panpan Liao Guangdong University of Technology Fanghai Zhang Hong Kong University of Science and Technology Yanheng Mai Hong Kong University of Science and Technology Wenhao Xu South China Agricultural University Yunheng Wang Hong Kong University of Science and Technology Renjing Xu Hong Kong University of Science and Technology Haohui Huang Guangdong University of Technology http://arxiv.org/abs/2605.28066v2 PromptEmbedder: Efficient and Transferable Text Embedding via Dual-LLM Soft Prompting 2026-06-09T06:33:45Z

Large Language Models (LLMs) have demonstrated remarkable efficacy in text embedding, yet current adaptation methods like LoRA face significant bottlenecks in computational efficiency and cross-architecture transferability. Whenever a new backbone emerges, existing approaches require costly retraining from scratch. To address this, we propose PromptEmbedder, a novel dual-LLM framework that decouples embedding knowledge from specific backbone weights. PromptEmbedder utilizes a Prompting LLM to generate instruction-aware soft prompts for a frozen Embedding LLM via a differentiable generation process with continuous relaxation, ensuring full gradient flow during contrastive training. By localizing task-specific knowledge within the Prompting LLM, adapting to new architectures requires only retraining a lightweight linear alignment matrix. Evaluations on the MTEB benchmark show that PromptEmbedder achieves comparable performance with LoRA finetuning while reducing GPU memory by 40% and accelerating training by 3.7x. Our approach establishes a scalable, architecture-agnostic paradigm for efficient LLM-based representation learning.

2026-05-27T07:23:55Z Yu-Che Tsai Kuan-Yu Chen Yuan-Hao Chen Yu-Han Chang Ching-Yu Tsai Yu-Hsiang Chuang Shou-De Lin http://arxiv.org/abs/2411.02817v2 Conditional Vendi Score: Prompt-Aware Diversity Evaluation for Generative AI Models and LLMs 2026-06-09T06:28:35Z

Generative models guided by text prompts are widely evaluated for fidelity and prompt alignment, yet their ability to produce outputs remains underexplored. Existing diversity metrics such as Vendi and RKE, which are based on the von Neumann and Rényi entropies of kernel matrices, were developed for unconditional models and cannot distinguish prompt-induced from model-induced variability. We address this gap by introducing \textit{Conditional-Vendi} and \textit{Conditional-RKE}, diversity measures derived from the conditional entropy of positive semidefinite matrices. These scores isolate model-induced diversity in prompt-guided generation, with Conditional-RKE enjoying an $O(1/\sqrt{n})$ convergence rate. For Conditional-Vendi, we introduce a truncated-spectrum approximation that yields scalable and consistent estimates. Experiments on text-to-image, image-captioning, and LLM tasks show that the conditional scores recover ground-truth diversity orderings and can also guide diffusion models toward more diverse samples. The codebase is available at https://github.com/mjalali/conditional-vendi.

2024-11-05T05:30:39Z Mohammad Jalali Azim Ospanov Amin Gohari Farzan Farnia http://arxiv.org/abs/2606.10461v1 ERAlign: Energy-based Representation Alignment of GNNs and LLMs on Text-attributed Graphs 2026-06-09T06:16:40Z

Text-attributed Graphs (TAGs) incorporate textual node attributes with graph structures to describe rich relational semantics. Recent efforts to integrate Graph Neural Networks (GNNs) and Large Language Models (LLMs) have shown promise for learning on TAGs, yet achieving well-aligned representations remains challenging. Prior studies largely rely on heuristics that perform coarse-grained matching. They lack sufficient constraints and ignore distributional alignment, leading to representation drift and limited generalization. Building on Energy-based Models (EBMs), we propose an Energy-based Representation Alignment (ERAlign) framework that projects GNN-encoded graph structure and LLM-derived text embeddings in a shared latent space to achieve distribution consistency. Concretely, layer-wise alignment is quantified by a distance metric and optimized via an EBM objective. By decreasing energy values, our framework yields well-aligned representations for downstream tasks. During training, we introduce Energy Discrepancy (ED) to avoid high sampling costs associated with intractable normalization. ED also carries theoretical guarantees of higher training efficiency and reduced energy landscape distortion. Empirical evaluations on eight TAG datasets demonstrate that ERAlign obtains state-of-the-art performance across varying levels of supervision and cross-task transfer scenarios.

2026-06-09T06:16:40Z Accepted to ICML 2026 Xianlin Zeng Fan Xia Xiangyu Chen http://arxiv.org/abs/2606.10460v1 LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake 2026-06-09T06:15:36Z

Recent large language models (LLMs) have shown rapid progress in reading-based question answering (QA), where evidence is explicitly provided or can be trivially retrieved. In contrast, real-world questions are often not paired with accurate evidence documents. The useful evidence resides in massive data lakes, making search a prerequisite for answering. However, there is a lack of comprehensive benchmarks that require both searching and reasoning over large data lakes. To this end, we introduce LakeQA, a comprehensive benchmark for search-centric question answering over data lakes that jointly emphasizes searching and reasoning capabilities. LakeQA is built on a heterogeneous collection of approximately 9.5 TB of text resources from Wikipedia and open-source government data, spanning structured and unstructured data. To ensure task quality, each sample is annotated by at least one Ph.D.-level expert. Each task requires long-horizon multi-hop reasoning with implicit intermediate steps: agents need to discover the correct documents and then compose evidence across sources to produce the answer. Experimental results on seven frontier LLMs demonstrate that LakeQA is challenging. For instance, GPT-5.2 achieves only an exact-match score of 18.37% on LakeQA. Overall, LakeQA provides a realistic testbed for developing LLM agents that can both find and analyze data in modern data lakes.

2026-06-09T06:15:36Z Haonan Wang Jiaxiang Liu Yurong Liu Austin Senna Wijaya Tianle Zhou Eden Wu Yijia Chen Wanting You Reya Vir Daniela Pinto Grace Fan Yusen Zhang Juliana Freire Eugene Wu http://arxiv.org/abs/2606.10458v1 Minimum Distortion Quantization with Specified Output Distribution 2026-06-09T06:06:41Z

We derive the optimal quantizer of a real-valued random variable $W$ with distribution $P_W$ such that 1) the distribution of the quantization output $X$ that can take $k$ values follows any specified distribution $P_X$ over $\{1,\ldots,k\}$, and 2) the minimum mean squared error (MMSE) of estimating $W$ from $X$ is minimized. It is shown that the optimal quantizer takes the form $X=σ\big(F_{σ^{-1}(X)}^{-1}(F_W(W))\big)$, where $σ$ is the optimal permutation of $\{1,\ldots,k\}$ among all permutations to minimize the MMSE, and $F$ is the cumulative distribution function. When $P_W$ is uniform over an interval or $P_X$ is uniform over $\{1,\ldots,k\}$, the quantizer takes a simple form $X=F_{X}^{-1}(F_W(W))$. The concept of majorization plays a key role in the optimality proof. Specifying the output distribution is useful for designing quantizers with explicitly controlled output entropy, maximized mutual information between input and output, tailored output distribution to match channel input requirements for communication, and data anonymization.

2026-06-09T06:06:41Z Aolin Xu http://arxiv.org/abs/2606.10457v1 Trace2Policy: From Expert Behavior Traces to Self-Evolving Decision Agents 2026-06-09T06:05:29Z

Decision rules that enterprise experts apply tacitly -- in auditing, compliance, and contract review -- can be systematically recovered and improved through iterative error analysis. We present \textbf{Trace2Policy}, whose core mechanism -- \textbf{EISR} (\textbf{E}rror-driven \textbf{I}terative \textbf{S}kill \textbf{R}efinement) -- maintains a human-readable rule document as its optimization target: each round executes the rules on a validation set, clusters errors by root cause into MISSING, WRONG, or CONFLICT types, applies targeted patches, and commits only those that pass a regression gate. \textbf{For this class of compliance-sensitive, skewed-base-rate decision tasks, we identify rule quality -- not model capability -- as the dominant performance lever}: across five LLMs, one-shot distillation plateaus near $\sim$70\% on the deployed pool, while eight EISR rounds lift the same rules to 79.6\% when compiled into deterministic Python -- zero LLM calls at inference. \textbf{Execution form compounds the gain: in production, the same EISR-refined content runs 9.8~pp higher as compiled Python than as an LLM prompt, a form-and-engineering bundle the 22-day deployment matured together.} Deployed for 22 days at a major logistics carrier (3,349 audit cases), the compiled pipeline outperforms the pure-LLM baseline it replaced (72.7\%); on these calibrated, skewed-base-rate workloads, re-enabling LLM fallback monotonically degrades accuracy. An LLM-driven variant, \textbf{Auto-EISR}, reproduces this refinement at \$5--\$10 per cycle versus $\sim$70 expert-hours, and transfers to four public benchmarks spanning legal reasoning (LegalBench) and process-mining decisions (BPIC 2012) without re-engineering.

2026-06-09T06:05:29Z Junli Zha Jinbo Wang Chao Zhou Xiang Song http://arxiv.org/abs/2605.09370v4 From Detection to Recovery: Operational Analysis on LLM Pre-training with 504 GPUs 2026-06-09T06:04:44Z

Large-scale AI training is fundamentally a distributed systems problem, where hardware failures are routine operating conditions rather than rare exceptions, yet public operational evidence from production training clusters remains limited. This report presents an empirical analysis of a 63-node NVIDIA B200 production cluster (504 GPUs), using 55 days of Prometheus time-series data and 73 days of operational logs covering 224 multi-node training sessions. The environment is cross-organizational: five parties (SKT, Upstage, Lablup, NVIDIA Korea, VAST Data) share a unified monitoring pipeline. This enabled joint diagnosis of a 60-node-scale storage I/O bottleneck absent in 2-4-node tests, a production-scale phenomenon no single team could isolate alone. We perform three quantitative analyses yielding four findings. First, over 751 Prometheus metrics and 10 XID-identified GPU failures, no single metric is consistently dominant across failure types, motivating multi-signal detection. Second, 523 checkpoint events trace the save/load path from GPU VRAM to the NFS server: restart loading reaches 21.5% of maximum read bandwidth (700 GB/s) and save bursts 16.0% of maximum write bandwidth (250 GB/s), with NFS/RPC queueing and transport-layer backlog rising together. Third, across 224 sessions over 73 days, node exclusions concentrate so the top 3 of 63 nodes account for over 50%. Fourth, auto-retry chain analysis shows a 33.3% success rate over 12 chains (73 attempts), 2.7x the 12.5% manual rate, with a median retry interval of 11 minutes (IQR 10-11). All analyses are grounded in production infrastructure providing session-level workload management, GPU-centric scheduling, and unified observability.

2026-05-10T06:46:06Z 42 pages, 19 figures, 16 tables. Lablup Technical Report Daemyung Kang Eunjin Hwang Hanjeong Lee HyeokJin Kim Hyunhoi Koo Jeongkyu Shin Jeongseok Kang Jihyun Kang Jinho Heo Joongi Kim Junbum Lee Jungseung Yang Kyujin Cho Youngsook Song http://arxiv.org/abs/2606.10456v1 The Distributed Detectability Band Against Marginal-Preserving Attacks 2026-06-09T06:04:30Z

AI-control monitors score individual agent actions to detect misbehavior, but real harm can be distributed across many benign-looking steps, each individually below any per-step alarm. We construct a marginal-preserving, correlation-encoded distributed-sabotage attack using a Gaussian-copula AR(1) construction: the per-step monitor-score marginal is held exactly equal to benign, so mean, max, top-k tail, and threshold monitors (Monitor A) are defeated by construction, while harm is encoded in the temporal correlation structure. We sequence the paper around three reviewer-mandated gates. (1) Realizability gate: the stealthy attack achieves KS-distance to benign of 0.013 (effectively zero) at all tested harm levels up to 3.0, confirming that harm is fully decoupled from the per-step marginal and realizability is not harm-limited. (2) Monitor-A-vs-B reconciliation: we show formally that the attack, built against Monitor A's score marginal, remains marginal-preserving under a different-score Monitor B (the correlation/sequence family: CUSUM, SPRT, HMM-LR, runs test, autocorrelation, windowed logistic), and scope worst-case claims to score functions that admit a temporal signature. (3) Non-empty detectability band: Monitor A achieves AUC 0.52 (chance); Monitor B spans AUC 0.79-0.97 at the same 1% FPR target, and as harm is amortized over more steps Monitor A collapses to chance while Monitor B holds at AUC ~0.95. These results demonstrate a non-empty detectability band and characterize the sub-threshold sabotage frontier: distribution-shape monitors fail by construction; temporal-correlation monitors can detect but are not trivially optimal.

2026-06-09T06:04:30Z 10 pages, 11 figures Zhang Qinqin Gao Yuze http://arxiv.org/abs/2605.18271v2 From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG 2026-06-09T06:03:09Z

With the rapid emergence of personal AI agents based on Large Language Models (LLMs), implementing them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-world requests, such agents must ground their generation in device-resident personal context. However, under tight memory budgets, the core bottleneck is what to store so that retrieval remains aligned with the user. We propose EPIC (Efficient Preference-aligned Index Construction), which focuses on user preferences as a compact and stable form of personal context and integrates them throughout the RAG pipeline. EPIC selectively retains preference-relevant information from raw data and aligns retrieval toward preference-aligned contexts. Across four benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing memory by 2,404 times, improves preference-following accuracy by 18.79 %p, and achieves 32.17 times lower retrieval latency over the best-performing baseline. In on-device experiments, EPIC maintains under 1 MB memory and achieves 5.21 to 29.35 ms/query latency across three platforms, while supporting streaming updates under preference drift. Our code and data are available at https://github.com/UbiquitousAILab/EPIC.

2026-05-18T12:06:05Z Accepted to ICML 2026. Code and data are available at https://github.com/UbiquitousAILab/EPIC Changmin Lee Jaemin Kim Taesik Gong http://arxiv.org/abs/2606.10448v1 Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations 2026-06-09T05:55:06Z

The financial market is a typical low signal-to-noise ratio (SNR) setting, which often destabilizes off-policy maximum-entropy methods like Soft Actor-Critic (SAC). Specifically, noisy state representations may produce unreliable Q-value estimates, and bootstrapping amplifies these errors, forming a failure mode we call the "Financial Entropy Trap". In this paper, we propose FPQC-SAC, an efficient and plug-and-play SAC variant that places a compact and bounded Parameterized Quantum Circuit (PQC) before the actor and critic networks to constrain feature propagation at the representation level, rather than filtering raw inputs or regularizing Q-values after bootstrapping. Notably, FPQC-SAC reduces the impact of extreme market fluctuations on Bellman target estimation, while trainable quantum entanglement preserves flexible cross-asset interactions. Empirical evaluations on real-world portfolio management tasks demonstrate that FPQC-SAC substantially enhances out-of-sample stability and cumulative returns by achieving a 66.89% relative gain in cumulative return over standard unconstrained SAC and outperforms the best continuous-control deep reinforcement learning baseline by approximately 27%. Open-source code is available at https://github.com/ZeyuLIU-UST/FPQC-SAC-main.

2026-06-09T05:55:06Z Preprint. Code available at https://github.com/ZeyuLIU-UST/FPQC-SAC-main Zeyu Liu Xuanzhi Feng Sing Kwong Lai Yuanchen Gao Xiaoyi Pang Hualei Zhang Jingcai Guo Jie Zhang Song Guo http://arxiv.org/abs/2410.15595v4 A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications 2026-06-09T05:48:30Z

With the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO's current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community. An updated collection of relevant papers can be found on https://github.com/Mr-Loevan/DPO-Survey.

2024-10-21T02:27:24Z Accepted by TPAMI 2026. Project page: https://github.com/Mr-Loevan/DPO-Survey Wenyi Xiao Zechuan Wang Leilei Gan Shuai Zhao Zongrui Li Ruirui Lei Wanggui He Luu Anh Tuan Long Chen Hao Jiang Zhou Zhao Fei Wu http://arxiv.org/abs/2407.20242v5 BadRobot: Jailbreaking Embodied LLM Agents in the Physical World 2026-06-09T05:19:18Z

Embodied AI represents systems where AI is integrated into physical entities. Large Language Model (LLM), which exhibits powerful language understanding abilities, has been extensively employed in embodied AI by facilitating sophisticated task planning. However, a critical safety issue remains overlooked: could these embodied LLMs perpetrate harmful behaviors? In response, we introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions. Specifically, three vulnerabilities are exploited to achieve this type of attack: (i) manipulation of LLMs within robotic systems, (ii) misalignment between linguistic outputs and physical actions, and (iii) unintentional hazardous behaviors caused by world knowledge's flaws. Furthermore, we construct a benchmark of various malicious physical action queries to evaluate BadRobot's attack performance. Based on this benchmark, extensive experiments against existing prominent embodied LLM frameworks (e.g., Voxposer, Code as Policies, and ProgPrompt) demonstrate the effectiveness of our BadRobot. Our code is available at https://github.com/Rookie143/BadRobot.

2024-07-16T13:13:16Z Accepted to ICLR 2025. Please cite the conference version. Project page: https://Embodied-LLMs-Safety.github.io International Conference on Learning Representations (ICLR) 2025 Hangtao Zhang Chenyu Zhu Xianlong Wang Ziqi Zhou Changgan Yin Minghui Li Lulu Xue Yichen Wang Shengshan Hu Aishan Liu Peijin Guo Leo Yu Zhang http://arxiv.org/abs/2606.10431v1 Vision-Assisted Foundation Model for Solving Multi-Task Vehicle Routing Problems 2026-06-09T05:15:25Z

Multi-task vehicle routing problems play a critical role in enhancing efficiency across various industries and service sectors. These problems consist of multiple variants that optimize routing costs while meeting diverse customer constraints. Existing multi-task VRP solvers solely utilize a graph-based modality, limiting their ability to address variants with multiple constraints. As a format to represent complex semantics, vision modality shows great potential for encoding diverse VRP constraints. This motivates us to learn patch-level semantics from the vision images, and then integrate them into a graph-based model to solve various VRP variants simultaneously. However, directly applying this approach to multi-task VRPs presents three challenges: 1) existing VRP images lack constraint representations, which are essential for multi-task VRPs, 2) the fixed receptive field of individual patches cannot effectively accommodate varying requirements across tasks, and 3) imbalanced pixel distribution among constraints may cause the model to overlook constraints with fewer pixels. In this paper, we propose a vision-assisted foundation model (VaFM) to address these challenges. In the vision modality, input images tailored to all constraints are encoded by a convolutional neural network. The obtained patch embeddings are fused with graph-based nodes to generate solutions, with an auxiliary task designed to address the pixel-imbalanced issue. The performance of VaFM is evaluated across 16 different VRP variants. The experimental results demonstrate the superiority of VaFM over state-of-the-art methods, especially for variants with complex constraints.

2026-06-09T05:15:25Z Accepted by TNNLS Shuangchun Gui Zhiguang Cao Wen Song Yew-Soon Ong