https://arxiv.org/api/KfnJ5HUsxqRoLKWBTs/IfYOJjmw2026-06-10T10:41:19Z18383819515http://arxiv.org/abs/2606.10466v1UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation2026-06-09T06:36:06ZIn time-series generation, existing approaches typically handcraft ortrain a separate model for each dataset, which hinders their scalability and fails to leverage shared temporal structures across domains. To address this fragmentation, we propose UPLOTS, a Unified, Prompt-guided Language model framework fOr constrained Time-Series Generation across diverse domains. Instead of building task-specific models, UPLOTS leverages a single pre-trained transformer backbone guided by learned constraint prompts, enabling on-demand generation with precise pattern control. One key innovation is our dynamic multi-dataset loss re-weighting and prompt-to-pattern mapping, which allows UPLOTS to internalize diverse temporal structures during training and conditionally generate them at inference. We evaluate UPLOTS on four real-world benchmarks and multiple constraint settings, including peak-period, calendar, load-level, and volatility patterns. Additional held-out constraint-combination and downstream forecasting experiments further demonstrate that UPLOTS generalizes beyond the original peak-pattern setting and improves data augmentation under scarce real-data regimes. Our code and baselines are available at anonymous github repo: https://anonymous.4open.science/r/UPLOTS-6C36.2026-06-09T06:36:06ZDu YinHao XueJinliang DengYang YangShuang AoArian PrabowoFlora Salimhttp://arxiv.org/abs/2601.08379v2MMD Guidance: Training-Free Distribution Adaptation for Diffusion Models via Maximum Mean Discrepancy Guidance2026-06-09T06:35:29ZPre-trained diffusion models have emerged as powerful generative priors for both unconditional and conditional sample generation, yet their outputs often deviate from the characteristics of user-specific target data. Such mismatches are especially problematic in domain adaptation tasks, where only a few reference examples are available and retraining the diffusion model is infeasible. Existing inference-time guidance methods can adjust sampling trajectories, but they typically optimize surrogate objectives such as classifier likelihoods rather than directly aligning with the target distribution. We propose \emph{MMD Guidance}, a training-free mechanism that augments the reverse diffusion process with gradients of the \textit{Maximum Mean Discrepancy (MMD)} between generated samples and a reference dataset. MMD provides reliable distributional estimates from limited data, exhibits low variance in practice, and is efficiently differentiable, which makes it particularly well-suited for the guidance task. Our framework naturally extends to prompt-aware adaptation in conditional generation models via product kernels. Also, it can be applied with computational efficiency in latent diffusion models (LDMs), since guidance is applied in the latent space of the LDM. Experiments on synthetic and real-world benchmarks demonstrate that MMD Guidance can achieve distributional alignment while preserving sample fidelity. The project code is available at github.com/matinamehdizadeh/MMD-Guidance.2026-01-13T09:42:57ZMatina Mahdizadeh SaniNima JamaliMohammad JalaliFarzan Farniahttp://arxiv.org/abs/2606.04718v2CoRe-MoE: Contrastive Reweighted Mixture of Experts for Multi-Terrain Humanoid Locomotion with Gait Adaptation2026-06-09T06:35:28ZHumans primarily rely on walking and running to traverse complex terrains. Similarly, humanoid robots should be able to smoothly transition between walking and running while maintaining natural and stable locomotion. However, unifying gait transition and multi-terrain adaptation within a single policy remains challenging due to gradient interference between tasks and the distribution shift caused by terrain variations. Although Mixture-of-Experts (MoE) architectures can mitigate multi-skill interference, direct joint training often fails to achieve clear expert specialization. To address these challenges, we propose CoRe-MoE, a two-stage reinforcement learning framework that decouples gait generation from terrain adaptation. In the first stage, a stable locomotion policy is learned to produce natural walking and running behaviors with smooth transitions. In the second stage, a terrain-aware MoE branch is introduced, and the gating network is trained with a contrastive objective to learn structured terrain representations and promote expert specialization. The final action is obtained through weighted fusion of the base gait policy and the terrain-aware branch, enabling the policy to preserve stable locomotion while adapting to complex terrains. Extensive simulation results demonstrate that the proposed method outperforms baseline approaches in terms of success rate, locomotion stability, and multi-terrain adaptability. Furthermore, zero-shot deployment on a Unitree G1 humanoid robot validates the effectiveness of our framework, achieving robust walking and running across stairs, slopes, steps, obstacles, and unstructured outdoor terrains while maintaining accurate foothold control and dynamic stability.2026-06-03T10:51:46ZKailun Huang, Zikang Xie, Yanzhe Xie and Panpan Liao contributed equally to this work. Corresponding authors: Renjing Xu and Haohui HuangKailun HuangHong Kong University of Science and TechnologyZikang XieHong Kong University of Science and TechnologyYanzhe XieHong Kong University of Science and TechnologyPanpan LiaoGuangdong University of TechnologyFanghai ZhangHong Kong University of Science and TechnologyYanheng MaiHong Kong University of Science and TechnologyWenhao XuSouth China Agricultural UniversityYunheng WangHong Kong University of Science and TechnologyRenjing XuHong Kong University of Science and TechnologyHaohui HuangGuangdong University of Technologyhttp://arxiv.org/abs/2605.28066v2PromptEmbedder: Efficient and Transferable Text Embedding via Dual-LLM Soft Prompting2026-06-09T06:33:45ZLarge Language Models (LLMs) have demonstrated remarkable efficacy in text embedding, yet current adaptation methods like LoRA face significant bottlenecks in computational efficiency and cross-architecture transferability. Whenever a new backbone emerges, existing approaches require costly retraining from scratch. To address this, we propose PromptEmbedder, a novel dual-LLM framework that decouples embedding knowledge from specific backbone weights. PromptEmbedder utilizes a Prompting LLM to generate instruction-aware soft prompts for a frozen Embedding LLM via a differentiable generation process with continuous relaxation, ensuring full gradient flow during contrastive training. By localizing task-specific knowledge within the Prompting LLM, adapting to new architectures requires only retraining a lightweight linear alignment matrix. Evaluations on the MTEB benchmark show that PromptEmbedder achieves comparable performance with LoRA finetuning while reducing GPU memory by 40% and accelerating training by 3.7x. Our approach establishes a scalable, architecture-agnostic paradigm for efficient LLM-based representation learning.2026-05-27T07:23:55ZYu-Che TsaiKuan-Yu ChenYuan-Hao ChenYu-Han ChangChing-Yu TsaiYu-Hsiang ChuangShou-De Linhttp://arxiv.org/abs/2411.02817v2Conditional Vendi Score: Prompt-Aware Diversity Evaluation for Generative AI Models and LLMs2026-06-09T06:28:35ZGenerative models guided by text prompts are widely evaluated for fidelity and prompt alignment, yet their ability to produce outputs remains underexplored. Existing diversity metrics such as Vendi and RKE, which are based on the von Neumann and Rényi entropies of kernel matrices, were developed for unconditional models and cannot distinguish prompt-induced from model-induced variability. We address this gap by introducing \textit{Conditional-Vendi} and \textit{Conditional-RKE}, diversity measures derived from the conditional entropy of positive semidefinite matrices. These scores isolate model-induced diversity in prompt-guided generation, with Conditional-RKE enjoying an $O(1/\sqrt{n})$ convergence rate. For Conditional-Vendi, we introduce a truncated-spectrum approximation that yields scalable and consistent estimates. Experiments on text-to-image, image-captioning, and LLM tasks show that the conditional scores recover ground-truth diversity orderings and can also guide diffusion models toward more diverse samples. The codebase is available at https://github.com/mjalali/conditional-vendi.2024-11-05T05:30:39ZMohammad JalaliAzim OspanovAmin GohariFarzan Farniahttp://arxiv.org/abs/2606.10461v1ERAlign: Energy-based Representation Alignment of GNNs and LLMs on Text-attributed Graphs2026-06-09T06:16:40ZText-attributed Graphs (TAGs) incorporate textual node attributes with graph structures to describe rich relational semantics. Recent efforts to integrate Graph Neural Networks (GNNs) and Large Language Models (LLMs) have shown promise for learning on TAGs, yet achieving well-aligned representations remains challenging. Prior studies largely rely on heuristics that perform coarse-grained matching. They lack sufficient constraints and ignore distributional alignment, leading to representation drift and limited generalization. Building on Energy-based Models (EBMs), we propose an Energy-based Representation Alignment (ERAlign) framework that projects GNN-encoded graph structure and LLM-derived text embeddings in a shared latent space to achieve distribution consistency. Concretely, layer-wise alignment is quantified by a distance metric and optimized via an EBM objective. By decreasing energy values, our framework yields well-aligned representations for downstream tasks. During training, we introduce Energy Discrepancy (ED) to avoid high sampling costs associated with intractable normalization. ED also carries theoretical guarantees of higher training efficiency and reduced energy landscape distortion. Empirical evaluations on eight TAG datasets demonstrate that ERAlign obtains state-of-the-art performance across varying levels of supervision and cross-task transfer scenarios.2026-06-09T06:16:40ZAccepted to ICML 2026Xianlin ZengFan XiaXiangyu Chenhttp://arxiv.org/abs/2606.10460v1LakeQA: An Exploratory QA Benchmark over a Million-Scale Data Lake2026-06-09T06:15:36ZRecent large language models (LLMs) have shown rapid progress in reading-based question answering (QA), where evidence is explicitly provided or can be trivially retrieved. In contrast, real-world questions are often not paired with accurate evidence documents. The useful evidence resides in massive data lakes, making search a prerequisite for answering. However, there is a lack of comprehensive benchmarks that require both searching and reasoning over large data lakes. To this end, we introduce LakeQA, a comprehensive benchmark for search-centric question answering over data lakes that jointly emphasizes searching and reasoning capabilities. LakeQA is built on a heterogeneous collection of approximately 9.5 TB of text resources from Wikipedia and open-source government data, spanning structured and unstructured data. To ensure task quality, each sample is annotated by at least one Ph.D.-level expert. Each task requires long-horizon multi-hop reasoning with implicit intermediate steps: agents need to discover the correct documents and then compose evidence across sources to produce the answer. Experimental results on seven frontier LLMs demonstrate that LakeQA is challenging. For instance, GPT-5.2 achieves only an exact-match score of 18.37% on LakeQA. Overall, LakeQA provides a realistic testbed for developing LLM agents that can both find and analyze data in modern data lakes.2026-06-09T06:15:36ZHaonan WangJiaxiang LiuYurong LiuAustin Senna WijayaTianle ZhouEden WuYijia ChenWanting YouReya VirDaniela PintoGrace FanYusen ZhangJuliana FreireEugene Wuhttp://arxiv.org/abs/2606.10458v1Minimum Distortion Quantization with Specified Output Distribution2026-06-09T06:06:41ZWe derive the optimal quantizer of a real-valued random variable $W$ with distribution $P_W$ such that 1) the distribution of the quantization output $X$ that can take $k$ values follows any specified distribution $P_X$ over $\{1,\ldots,k\}$, and 2) the minimum mean squared error (MMSE) of estimating $W$ from $X$ is minimized. It is shown that the optimal quantizer takes the form $X=σ\big(F_{σ^{-1}(X)}^{-1}(F_W(W))\big)$, where $σ$ is the optimal permutation of $\{1,\ldots,k\}$ among all permutations to minimize the MMSE, and $F$ is the cumulative distribution function. When $P_W$ is uniform over an interval or $P_X$ is uniform over $\{1,\ldots,k\}$, the quantizer takes a simple form $X=F_{X}^{-1}(F_W(W))$. The concept of majorization plays a key role in the optimality proof. Specifying the output distribution is useful for designing quantizers with explicitly controlled output entropy, maximized mutual information between input and output, tailored output distribution to match channel input requirements for communication, and data anonymization.2026-06-09T06:06:41ZAolin Xuhttp://arxiv.org/abs/2606.10457v1Trace2Policy: From Expert Behavior Traces to Self-Evolving Decision Agents2026-06-09T06:05:29ZDecision rules that enterprise experts apply tacitly -- in auditing, compliance, and contract review -- can be systematically recovered and improved through iterative error analysis. We present \textbf{Trace2Policy}, whose core mechanism -- \textbf{EISR} (\textbf{E}rror-driven \textbf{I}terative \textbf{S}kill \textbf{R}efinement) -- maintains a human-readable rule document as its optimization target: each round executes the rules on a validation set, clusters errors by root cause into MISSING, WRONG, or CONFLICT types, applies targeted patches, and commits only those that pass a regression gate. \textbf{For this class of compliance-sensitive, skewed-base-rate decision tasks, we identify rule quality -- not model capability -- as the dominant performance lever}: across five LLMs, one-shot distillation plateaus near $\sim$70\% on the deployed pool, while eight EISR rounds lift the same rules to 79.6\% when compiled into deterministic Python -- zero LLM calls at inference. \textbf{Execution form compounds the gain: in production, the same EISR-refined content runs 9.8~pp higher as compiled Python than as an LLM prompt, a form-and-engineering bundle the 22-day deployment matured together.} Deployed for 22 days at a major logistics carrier (3,349 audit cases), the compiled pipeline outperforms the pure-LLM baseline it replaced (72.7\%); on these calibrated, skewed-base-rate workloads, re-enabling LLM fallback monotonically degrades accuracy. An LLM-driven variant, \textbf{Auto-EISR}, reproduces this refinement at \$5--\$10 per cycle versus $\sim$70 expert-hours, and transfers to four public benchmarks spanning legal reasoning (LegalBench) and process-mining decisions (BPIC 2012) without re-engineering.2026-06-09T06:05:29ZJunli ZhaJinbo WangChao ZhouXiang Songhttp://arxiv.org/abs/2605.09370v4From Detection to Recovery: Operational Analysis on LLM Pre-training with 504 GPUs2026-06-09T06:04:44ZLarge-scale AI training is fundamentally a distributed systems problem, where hardware failures are routine operating conditions rather than rare exceptions, yet public operational evidence from production training clusters remains limited. This report presents an empirical analysis of a 63-node NVIDIA B200 production cluster (504 GPUs), using 55 days of Prometheus time-series data and 73 days of operational logs covering 224 multi-node training sessions.
The environment is cross-organizational: five parties (SKT, Upstage, Lablup, NVIDIA Korea, VAST Data) share a unified monitoring pipeline. This enabled joint diagnosis of a 60-node-scale storage I/O bottleneck absent in 2-4-node tests, a production-scale phenomenon no single team could isolate alone.
We perform three quantitative analyses yielding four findings. First, over 751 Prometheus metrics and 10 XID-identified GPU failures, no single metric is consistently dominant across failure types, motivating multi-signal detection. Second, 523 checkpoint events trace the save/load path from GPU VRAM to the NFS server: restart loading reaches 21.5% of maximum read bandwidth (700 GB/s) and save bursts 16.0% of maximum write bandwidth (250 GB/s), with NFS/RPC queueing and transport-layer backlog rising together. Third, across 224 sessions over 73 days, node exclusions concentrate so the top 3 of 63 nodes account for over 50%. Fourth, auto-retry chain analysis shows a 33.3% success rate over 12 chains (73 attempts), 2.7x the 12.5% manual rate, with a median retry interval of 11 minutes (IQR 10-11).
All analyses are grounded in production infrastructure providing session-level workload management, GPU-centric scheduling, and unified observability.2026-05-10T06:46:06Z42 pages, 19 figures, 16 tables. Lablup Technical ReportDaemyung KangEunjin HwangHanjeong LeeHyeokJin KimHyunhoi KooJeongkyu ShinJeongseok KangJihyun KangJinho HeoJoongi KimJunbum LeeJungseung YangKyujin ChoYoungsook Songhttp://arxiv.org/abs/2606.10456v1The Distributed Detectability Band Against Marginal-Preserving Attacks2026-06-09T06:04:30ZAI-control monitors score individual agent actions to detect misbehavior, but real harm can be distributed across many benign-looking steps, each individually below any per-step alarm. We construct a marginal-preserving, correlation-encoded distributed-sabotage attack using a Gaussian-copula AR(1) construction: the per-step monitor-score marginal is held exactly equal to benign, so mean, max, top-k tail, and threshold monitors (Monitor A) are defeated by construction, while harm is encoded in the temporal correlation structure. We sequence the paper around three reviewer-mandated gates. (1) Realizability gate: the stealthy attack achieves KS-distance to benign of 0.013 (effectively zero) at all tested harm levels up to 3.0, confirming that harm is fully decoupled from the per-step marginal and realizability is not harm-limited. (2) Monitor-A-vs-B reconciliation: we show formally that the attack, built against Monitor A's score marginal, remains marginal-preserving under a different-score Monitor B (the correlation/sequence family: CUSUM, SPRT, HMM-LR, runs test, autocorrelation, windowed logistic), and scope worst-case claims to score functions that admit a temporal signature. (3) Non-empty detectability band: Monitor A achieves AUC 0.52 (chance); Monitor B spans AUC 0.79-0.97 at the same 1% FPR target, and as harm is amortized over more steps Monitor A collapses to chance while Monitor B holds at AUC ~0.95. These results demonstrate a non-empty detectability band and characterize the sub-threshold sabotage frontier: distribution-shape monitors fail by construction; temporal-correlation monitors can detect but are not trivially optimal.2026-06-09T06:04:30Z10 pages, 11 figuresZhang QinqinGao Yuzehttp://arxiv.org/abs/2605.18271v2From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG2026-06-09T06:03:09ZWith the rapid emergence of personal AI agents based on Large Language Models (LLMs), implementing them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-world requests, such agents must ground their generation in device-resident personal context. However, under tight memory budgets, the core bottleneck is what to store so that retrieval remains aligned with the user. We propose EPIC (Efficient Preference-aligned Index Construction), which focuses on user preferences as a compact and stable form of personal context and integrates them throughout the RAG pipeline. EPIC selectively retains preference-relevant information from raw data and aligns retrieval toward preference-aligned contexts. Across four benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing memory by 2,404 times, improves preference-following accuracy by 18.79 %p, and achieves 32.17 times lower retrieval latency over the best-performing baseline. In on-device experiments, EPIC maintains under 1 MB memory and achieves 5.21 to 29.35 ms/query latency across three platforms, while supporting streaming updates under preference drift. Our code and data are available at https://github.com/UbiquitousAILab/EPIC.2026-05-18T12:06:05ZAccepted to ICML 2026. Code and data are available at https://github.com/UbiquitousAILab/EPICChangmin LeeJaemin KimTaesik Gonghttp://arxiv.org/abs/2606.10448v1Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations2026-06-09T05:55:06ZThe financial market is a typical low signal-to-noise ratio (SNR) setting, which often destabilizes off-policy maximum-entropy methods like Soft Actor-Critic (SAC). Specifically, noisy state representations may produce unreliable Q-value estimates, and bootstrapping amplifies these errors, forming a failure mode we call the "Financial Entropy Trap". In this paper, we propose FPQC-SAC, an efficient and plug-and-play SAC variant that places a compact and bounded Parameterized Quantum Circuit (PQC) before the actor and critic networks to constrain feature propagation at the representation level, rather than filtering raw inputs or regularizing Q-values after bootstrapping. Notably, FPQC-SAC reduces the impact of extreme market fluctuations on Bellman target estimation, while trainable quantum entanglement preserves flexible cross-asset interactions. Empirical evaluations on real-world portfolio management tasks demonstrate that FPQC-SAC substantially enhances out-of-sample stability and cumulative returns by achieving a 66.89% relative gain in cumulative return over standard unconstrained SAC and outperforms the best continuous-control deep reinforcement learning baseline by approximately 27%. Open-source code is available at https://github.com/ZeyuLIU-UST/FPQC-SAC-main.2026-06-09T05:55:06ZPreprint. Code available at https://github.com/ZeyuLIU-UST/FPQC-SAC-mainZeyu LiuXuanzhi FengSing Kwong LaiYuanchen GaoXiaoyi PangHualei ZhangJingcai GuoJie ZhangSong Guohttp://arxiv.org/abs/2410.15595v4A Comprehensive Survey of Direct Preference Optimization: Datasets, Theories, Variants, and Applications2026-06-09T05:48:30ZWith the rapid advancement of large language models (LLMs), aligning policy models with human preferences has become increasingly critical. Direct Preference Optimization (DPO) has emerged as a promising approach for alignment, acting as an RL-free alternative to Reinforcement Learning from Human Feedback (RLHF). Despite DPO's various advancements and inherent limitations, an in-depth review of these aspects is currently lacking in the literature. In this work, we present a comprehensive review of the challenges and opportunities in DPO, covering theoretical analyses, variants, relevant preference datasets, and applications. Specifically, we categorize recent studies on DPO based on key research questions to provide a thorough understanding of DPO's current landscape. Additionally, we propose several future research directions to offer insights on model alignment for the research community. An updated collection of relevant papers can be found on https://github.com/Mr-Loevan/DPO-Survey.2024-10-21T02:27:24ZAccepted by TPAMI 2026. Project page: https://github.com/Mr-Loevan/DPO-SurveyWenyi XiaoZechuan WangLeilei GanShuai ZhaoZongrui LiRuirui LeiWanggui HeLuu Anh TuanLong ChenHao JiangZhou ZhaoFei Wuhttp://arxiv.org/abs/2407.20242v5BadRobot: Jailbreaking Embodied LLM Agents in the Physical World2026-06-09T05:19:18ZEmbodied AI represents systems where AI is integrated into physical entities. Large Language Model (LLM), which exhibits powerful language understanding abilities, has been extensively employed in embodied AI by facilitating sophisticated task planning. However, a critical safety issue remains overlooked: could these embodied LLMs perpetrate harmful behaviors? In response, we introduce BadRobot, a novel attack paradigm aiming to make embodied LLMs violate safety and ethical constraints through typical voice-based user-system interactions. Specifically, three vulnerabilities are exploited to achieve this type of attack: (i) manipulation of LLMs within robotic systems, (ii) misalignment between linguistic outputs and physical actions, and (iii) unintentional hazardous behaviors caused by world knowledge's flaws. Furthermore, we construct a benchmark of various malicious physical action queries to evaluate BadRobot's attack performance. Based on this benchmark, extensive experiments against existing prominent embodied LLM frameworks (e.g., Voxposer, Code as Policies, and ProgPrompt) demonstrate the effectiveness of our BadRobot. Our code is available at https://github.com/Rookie143/BadRobot.2024-07-16T13:13:16ZAccepted to ICLR 2025. Please cite the conference version. Project page: https://Embodied-LLMs-Safety.github.ioInternational Conference on Learning Representations (ICLR) 2025Hangtao ZhangChenyu ZhuXianlong WangZiqi ZhouChanggan YinMinghui LiLulu XueYichen WangShengshan HuAishan LiuPeijin GuoLeo Yu Zhang