https://arxiv.org/api/D6L1oyVq42oTBWGic5NOq664zPA 2026-04-09T12:53:43Z 171213 165 15 http://arxiv.org/abs/2604.06691v1 KD-MARL: Resource-Aware Knowledge Distillation in Multi-Agent Reinforcement Learning 2026-04-08T05:12:40Z

Real world deployment of multi agent reinforcement learning MARL systems is fundamentally constrained by limited compute memory and inference time. While expert policies achieve high performance they rely on costly decision cycles and large scale models that are impractical for edge devices or embedded platforms. Knowledge distillation KD offers a promising path toward resource aware execution but existing KD methods in MARL focus narrowly on action imitation often neglecting coordination structure and assuming uniform agent capabilities. We propose resource aware Knowledge Distillation for Multi Agent Reinforcement Learning KD MARL a two stage framework that transfers coordinated behavior from a centralized expert to lightweight decentralized student agents. The student policies are trained without a critic relying instead on distilled advantage signals and structured policy supervision to preserve coordination under heterogeneous and limited observations. Our approach transfers both action level behavior and structural coordination patterns from expert policies while supporting heterogeneous student architectures allowing each agent model capacity to match its observation complexity which is crucial for efficient execution under partial or limited observability and limited onboard resources. Extensive experiments on SMAC and MPE benchmarks demonstrate that KD MARL achieves high performance retention while substantially reducing computational cost. Across standard multi agent benchmarks KD MARL retains over 90 percent of expert performance while reducing computational cost by up to 28.6 times FLOPs. The proposed approach achieves expert level coordination and preserves it through structured distillation enabling practical MARL deployment across resource constrained onboard platforms.

2026-04-08T05:12:40Z Accepted in IJCNN 2026 Monirul Islam Pavel Siyi Hu Muhammad Anwar Masum Mahardhika Pratama Ryszard Kowalczyk Zehong Jimmy Cao http://arxiv.org/abs/2604.06685v1 ChemVLR: Prioritizing Reasoning in Perception for Chemical Vision-Language Understanding 2026-04-08T05:01:59Z

While Vision-Language Models (VLMs) have demonstrated significant potential in chemical visual understanding, current models are predominantly optimized for direct visual question-answering tasks. This paradigm often results in "black-box" systems that fail to utilize the inherent capability of Large Language Models (LLMs) to infer underlying reaction mechanisms. In this work, we introduce ChemVLR, a chemical VLM designed to prioritize reasoning within the perception process. Unlike conventional chemical VLMs, ChemVLR analyzes visual inputs in a fine-grained manner by explicitly identifying granular chemical descriptors, such as functional groups, prior to generating answers. This approach ensures the production of explicit and interpretable reasoning paths for complex visual chemical problems. To facilitate this methodology, we implement a cross-modality reverse-engineering strategy, combined with a rigorous filtering pipeline, to curate a large-scale reasoning-and-captioning dataset comprising 760k high-quality samples across molecular and reaction tasks. Furthermore, we adopt a three-stage training framework that systemically builds model perception and reasoning capacity. Experiments demonstrate that ChemVLR achieves state-of-the-art (SOTA) performance, surpassing both leading proprietary models and domain-specific open-source baselines. We also provide comprehensive ablation studies to validate our training strategy and data generation designs. Code and model weights will be available at https://github.com/xxlllz/ChemVLR.

2026-04-08T05:01:59Z Accepted by ACL 2026 Findings, Preprint Version Xuanle Zhao Xinyuan Cai Xiang Cheng Xiuyi Chen Bo Xu http://arxiv.org/abs/2510.12088v2 One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration 2026-04-08T04:53:32Z

Symbolic world modeling requires inferring and representing an environment's transitional dynamics as an executable program. Prior work has focused on largely deterministic environments with abundant interaction data, simple mechanics, and human guidance. We address a more realistic and challenging setting, learning in a complex, stochastic environment where the agent has only "one life" to explore a hostile environment without human guidance. We introduce OneLife, a framework that models world dynamics through conditionally-activated programmatic laws within a probabilistic programming framework. Each law operates through a precondition-effect structure, activating in relevant world states. This creates a dynamic computation graph that routes inference and optimization only through relevant laws, avoiding scaling challenges when all laws contribute to predictions about a complex, hierarchical state, and enabling the learning of stochastic dynamics even with sparse rule activation. To evaluate our approach under these demanding constraints, we introduce a new evaluation protocol that measures (a) state ranking, the ability to distinguish plausible future states from implausible ones, and (b) state fidelity, the ability to generate future states that closely resemble reality. We develop and evaluate our framework on Crafter-OO, our reimplementation of the Crafter environment that exposes a structured, object-oriented symbolic state and a pure transition function that operates on that state alone. OneLife can successfully learn key environment dynamics from minimal, unguided interaction, outperforming a strong baseline on 16 out of 23 scenarios tested. We also test OneLife's planning ability, with simulated rollouts successfully identifying superior strategies. Our work establishes a foundation for autonomously constructing programmatic world models of unknown, complex environments.

2025-10-14T02:49:32Z Accepted to ICLR 2026. Project page: https://onelife-worldmodel.github.io/; 44 pages Zaid Khan Archiki Prasad Elias Stengel-Eskin Jaemin Cho Mohit Bansal http://arxiv.org/abs/2604.06674v1 Between Century and Poet: Graph-Based Lexical Semantic Change in Persian Poetry 2026-04-08T04:47:53Z

Meaning in Persian poetry is both historical and relational. Words persist through literary tradition while shifting their force through changing constellations of neighbors, rhetorical frames, and poetic voices. This study examines that process using aligned Word2Vec spaces combined with graph-based neighborhood analysis across centuries and major poets. Rather than modeling semantic change as vector displacement alone, it treats lexical history as the rewiring of local semantic graphs: the gain and loss of neighbors, shifts in bridge roles, and movement across communities. The analysis centers on twenty target words, anchored by five recurrent reference terms: Earth, Night, two wine terms, and Heart. Surrounding them are affective, courtly, elemental, and Sufi concepts such as Love, Sorrow, Dervish, King, Annihilation, and Truth. These words exhibit distinct patterns of change. Night is more time-sensitive, Earth more poet-sensitive, and Heart shows continuity despite graph-role mobility. The two wine terms highlight probe sensitivity: one is broad and semantically diffuse, while the other is narrower and more stable. A lexical audit confirms that the corpus contains historically driven terms, poet-specific usages, and sparsely attested mystical vocabulary requiring caution. Overall, semantic change in Persian poetry is better captured as neighborhood rewiring than as abstract drift. For Digital Humanities, this approach restores local structure to computational analysis and supports interpretations closer to literary practice: persistence, migration, mediation, and selective transformation.

2026-04-08T04:47:53Z Kourosh Shahnazari Seyed Moein Ayyoubzadeh Mohammadali Keshtparvar http://arxiv.org/abs/2511.19693v3 TREASURE: The Visa Payment Foundation Model for High-Volume Transaction Understanding 2026-04-08T04:37:02Z

Payment networks form the backbone of modern commerce, generating high volumes of transaction records from daily activities. Properly modeling this data can enable applications such as abnormal behavior detection and consumer-level insights for hyper-personalized experiences, ultimately improving people's lives. In this paper, we present TREASURE, TRansformer Engine As Scalable Universal transaction Representation Encoder, a multipurpose transformer-based foundation model specifically designed for transaction data. The model simultaneously captures both consumer behavior and payment network signals (such as response codes and system flags), providing comprehensive information necessary for applications like accurate recommendation systems and abnormal behavior detection. Verified with industry-grade datasets, TREASURE features three key capabilities: 1) an input module with dedicated sub-modules for static and dynamic attributes, enabling more efficient training and inference; 2) an efficient and effective training paradigm for predicting high-cardinality categorical attributes; and 3) demonstrated effectiveness as both a standalone model that increases abnormal behavior detection performance by 111% over production systems and an embedding provider that enhances recommendation models by 104%. We present key insights from extensive ablation studies, benchmarks against production models, and case studies, highlighting valuable knowledge gained from developing TREASURE.

2025-11-24T20:57:31Z Chin-Chia Michael Yeh Uday Singh Saini Xin Dai Xiran Fan Shubham Jain Yujie Fan Jiarui Sun Junpeng Wang Menghai Pan Yingtong Dou Yuzhong Chen Vineeth Rakesh Liang Wang Yan Zheng Mahashweta Das http://arxiv.org/abs/2604.06666v1 A Graph-Enhanced Defense Framework for Explainable Fake News Detection with LLM 2026-04-08T04:34:19Z

Explainable fake news detection aims to assess the veracity of news claims while providing human-friendly explanations. Existing methods incorporating investigative journalism are often inefficient and struggle with breaking news. Recent advances in large language models (LLMs) enable leveraging externally retrieved reports as evidence for detection and explanation generation, but unverified reports may introduce inaccuracies. Moreover, effective explainable fake news detection should provide a comprehensible explanation for all aspects of a claim to assist the public in verifying its accuracy. To address these challenges, we propose a graph-enhanced defense framework (G-Defense) that provides fine-grained explanations based solely on unverified reports. Specifically, we construct a claim-centered graph by decomposing the news claim into several sub-claims and modeling their dependency relationships. For each sub-claim, we use the retrieval-augmented generation (RAG) technique to retrieve salient evidence and generate competing explanations. We then introduce a defense-like inference module based on the graph to assess the overall veracity. Finally, we prompt an LLM to generate an intuitive explanation graph. Experimental results demonstrate that G-Defense achieves state-of-the-art performance in both veracity detection and the quality of its explanations.

2026-04-08T04:34:19Z Accepted by TOIS Bo Wang Jing Ma Hongzhan Lin Zhiwei Yang Ruichao Yang Yuan Tian Yi Chang http://arxiv.org/abs/2604.03486v2 VisionClaw: Always-On AI Agents through Smart Glasses 2026-04-08T04:29:20Z

We present VisionClaw, an always-on wearable AI agent that integrates live egocentric perception with agentic task execution. Running on Meta Ray-Ban smart glasses, VisionClaw continuously perceives real-world context and enables in-situ, speech-driven action initiation and delegation via OpenClaw AI agents. Therefore, users can directly execute tasks through the smart glasses, such as adding real-world objects to an Amazon cart, generating notes from physical documents, receiving meeting briefings on the go, creating events from posters, or controlling IoT devices. We evaluate VisionClaw through a controlled laboratory study (N=12) and a longitudinal deployment study (N=5). Results show that integrating perception and execution enables faster task completion and reduces interaction overhead compared to non-always-on and non-agent baselines. Beyond performance gains, deployment findings reveal a shift in interaction: tasks are initiated opportunistically during ongoing activities, and execution is increasingly delegated rather than manually controlled. These results suggest a new paradigm for wearable AI agents, where perception and action are continuously coupled to support situated, hands-free interaction.

2026-04-03T22:17:10Z 17 pages, 11 figures, plus appendix Xiaoan Liu DaeHo Lee Eric J Gonzalez Mar Gonzalez-Franco Ryo Suzuki http://arxiv.org/abs/2604.06663v1 Restoring Heterogeneity in LLM-based Social Simulation: An Audience Segmentation Approach 2026-04-08T04:29:08Z

Large Language Models (LLMs) are increasingly used to simulate social attitudes and behaviors, offering scalable "silicon samples" that can approximate human data. However, current simulation practice often collapses diversity into an "average persona," masking subgroup variation that is central to social reality. This study introduces audience segmentation as a systematic approach for restoring heterogeneity in LLM-based social simulation. Using U.S. climate-opinion survey data, we compare six segmentation configurations across two open-weight LLMs (Llama 3.1-70B and Mixtral 8x22B), varying segmentation identifier granularity, parsimony, and selection logic (theory-driven, data-driven, and instrument-based). We evaluate simulation performance with a three-dimensional evaluation framework covering distributional, structural, and predictive fidelity. Results show that increasing identifier granularity does not produce consistent improvement: moderate enrichment can improve performance, but further expansion does not reliably help and can worsen structural and predictive fidelity. Across parsimony comparisons, compact configurations often match or outperform more comprehensive alternatives, especially in structural and predictive fidelity, while distributional fidelity remains metric dependent. Identifier selection logic determines which fidelity dimension benefits most: instrument-based selection best preserves distributional shape, whereas data-driven selection best recovers between-group structure and identifier-outcome associations. Overall, no single configuration dominates all dimensions, and performance gains in one dimension can coincide with losses in another. These findings position audience segmentation as a core methodological approach for valid LLM-based social simulation and highlight the need for heterogeneity-aware evaluation and variance-preserving modeling strategies.

2026-04-08T04:29:08Z Xiaoyou Qin Zhihong Li Xiaoxiao Cheng http://arxiv.org/abs/2604.05268v2 Region-R1: Reinforcing Query-Side Region Cropping for Multi-Modal Re-Ranking 2026-04-08T04:25:11Z

Multi-modal retrieval-augmented generation (MM-RAG) relies heavily on re-rankers to surface the most relevant evidence for image-question queries. However, standard re-rankers typically process the full query image as a global embedding, making them susceptible to visual distractors (e.g., background clutter) that skew similarity scores. We propose Region-R1, a query-side region cropping framework that formulates region selection as a decision-making problem during re-ranking, allowing the system to learn to retain the full image or focus only on a question-relevant region before scoring the retrieved candidates. Region-R1 learns a policy with a novel region-aware group relative policy optimization (r-GRPO) to dynamically crop a discriminative region. Across two challenging benchmarks, E-VQA and InfoSeek, Region-R1 delivers consistent gains, achieving state-of-the-art performances by increasing conditional Recall@1 by up to 20%. These results show the great promise of query-side adaptation as a simple but effective way to strengthen MM-RAG re-ranking.

2026-04-07T00:05:12Z 12 pages, 4 figures, accepted to ACL 2026 Findings, code available at https://github.com/taco-group/Region-R1 Chan-Wei Hu Zhengzhong Tu http://arxiv.org/abs/2511.19365v2 DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation 2026-04-08T04:08:23Z

Pixel diffusion aims to generate images directly in pixel space in an end-to-end fashion. This approach avoids the limitations of VAE in the two-stage latent diffusion, offering higher model capacity. Existing pixel diffusion models suffer from slow training and inference, as they usually model both high-frequency signals and low-frequency semantics within a single diffusion transformer (DiT). To pursue a more efficient pixel diffusion paradigm, we propose the frequency-DeCoupled pixel diffusion framework. With the intuition to decouple the generation of high and low frequency components, we leverage a lightweight pixel decoder to generate high-frequency details conditioned on semantic guidance from the DiT. This thus frees the DiT to specialize in modeling low-frequency semantics. In addition, we introduce a frequency-aware flow-matching loss that emphasizes visually salient frequencies while suppressing insignificant ones. Extensive experiments show that DeCo achieves superior performance among pixel diffusion models, attaining FID of 1.62 (256x256) and 2.22 (512x512) on ImageNet, closing the gap with latent diffusion methods. Furthermore, our pretrained text-to-image model achieves a leading overall score of 0.86 on GenEval in system-level comparison. Codes are publicly available at https://github.com/Zehong-Ma/DeCo.

2025-11-24T17:59:06Z Accepted to CVPR2026. Project Page: https://zehong-ma.github.io/DeCo. Code Repository: https://github.com/Zehong-Ma/DeCo Zehong Ma Longhui Wei Shuai Wang Shiliang Zhang Qi Tian http://arxiv.org/abs/2604.04614v2 A Clinical Point Cloud Paradigm for In-Hospital Mortality Prediction from Multi-Level Incomplete Multimodal EHRs 2026-04-08T03:59:16Z

Deep learning-based modeling of multimodal Electronic Health Records (EHRs) has become an important approach for clinical diagnosis and risk prediction. However, due to diverse clinical workflows and privacy constraints, raw EHRs are inherently multi-level incomplete, including irregular sampling, missing modalities, and sparse labels. These issues cause temporal misalignment, modality imbalance, and limited supervision. Most existing multimodal methods assume relatively complete data, and even methods designed for incompleteness usually address only one or two of these issues in isolation. As a result, they often rely on rigid temporal/modal alignment or discard incomplete data, which may distort raw clinical semantics. To address this problem, we propose HealthPoint (HP), a unified clinical point cloud paradigm for multi-level incomplete EHRs. HP represents heterogeneous clinical events as points in a continuous 4D space defined by content, time, modality, and case. To model interactions between arbitrary point pairs, we introduce a Low-Rank Relational Attention mechanism that efficiently captures high-order dependencies across these four dimensions. We further develop a hierarchical interaction and sampling strategy to balance fine-grained modeling and computational efficiency. Built on this framework, HP enables flexible event-level interaction and fine-grained self-supervision, supporting robust modality recovery and effective use of unlabeled data. Experiments on large-scale EHR datasets for risk prediction show that HP consistently achieves state-of-the-art performance and strong robustness under varying degrees of incompleteness.

2026-04-06T12:03:36Z 20 pages Bohao Li Tao Zou Junchen Ye Yan Gong Bowen Du http://arxiv.org/abs/2505.20662v3 AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage 2026-04-08T03:54:50Z

Efficient reproduction of research papers is pivotal to accelerating scientific progress. However, the increasing complexity of proposed methods often renders reproduction a labor-intensive endeavor, necessitating profound domain expertise. To address this, we introduce the paper lineage, which systematically mines implicit knowledge from the cited literature. This algorithm serves as the backbone of our proposed AutoReproduce, a multi-agent framework designed to autonomously reproduce experimental code in a complete, end-to-end manner. To ensure code executability, AutoReproduce incorporates a sampling-based unit testing strategy for rapid validation. To assess reproduction capabilities, we introduce ReproduceBench, a benchmark featuring verified implementations, alongside comprehensive metrics for evaluating both reproduction and execution fidelity. Extensive evaluations on PaperBench and ReproduceBench demonstrate that AutoReproduce consistently surpasses existing baselines across all metrics. Notably, it yields substantial improvements in reproduction fidelity and final execution performance.

2025-05-27T03:15:21Z Accepted by ACL 2026 Main Xuanle Zhao Zilin Sang Yuxuan Li Qi Shi Weilun Zhao Shuo Wang Duzhen Zhang Xu Han Zhiyuan Liu Maosong Sun http://arxiv.org/abs/2604.06650v1 A Parameter-Efficient Transfer Learning Approach through Multitask Prompt Distillation and Decomposition for Clinical NLP 2026-04-08T03:52:52Z

Existing prompt-based fine-tuning methods typically learn task-specific prompts independently, imposing significant computing and storage overhead at scale when deploying multiple clinical natural language processing (NLP) systems. We present a multitask prompt distillation and decomposition framework that learns a single shared metaprompt from 21 diverse clinical source tasks and adapts it to unseen target tasks with fewer than 0.05% trainable parameters. Evaluated across five clinical NLP task types (named entity recognition, relation extraction, question answering, natural language inference, and summarization) on 10 held-out target datasets using three backbone models (LLaMA 3.1 8B, Meditron3 8B, gpt-oss 20B), our framework consistently outperforms LoRA by 1.5~1.7% despite using orders of magnitude fewer parameters, and exceeds single-task prompt tuning by 6.1~6.6%. The gpt-oss 20B model achieves the highest overall performance, particularly on clinical reasoning tasks. The strong zero- and few-shot performance demonstrates better transferability of the shared prompt representation.

2026-04-08T03:52:52Z Cheng Peng Mengxian Lyu Ziyi Chen Yonghui Wu http://arxiv.org/abs/2603.28253v2 MR-ImagenTime: Multi-Resolution Time Series Generation through Dual Image Representations 2026-04-08T03:27:07Z

Time series forecasting is vital across many domains, yet existing models struggle with fixed-length inputs and inadequate multi-scale modeling. We propose MR-CDM, a framework combining hierarchical multi-resolution trend decomposition, an adaptive embedding mechanism for variable-length inputs, and a multi-scale conditional diffusion process. Evaluations on four real-world datasets demonstrate that MR-CDM significantly outperforms state-of-the-art baselines (e.g., CSDI, Informer), reducing MAE and RMSE by approximately 6-10 to a certain degree.

2026-03-30T10:25:35Z Xianyong Xu Yuanjun Zuo Zhihong Huang Yihan Qin Haoxian Xu Leilei Du Haotian Wang http://arxiv.org/abs/2604.06638v1 RPM-Net Reciprocal Point MLP Network for Unknown Network Security Threat Detection 2026-04-08T03:25:43Z

Effective detection of unknown network security threats in multi-class imbalanced environments is critical for maintaining cyberspace security. Current methods focus on learning class representations but face challenges with unknown threat detection, class imbalance, and lack of interpretability, limiting their practical use. To address this, we propose RPM-Net, a novel framework that introduces reciprocal point mechanism to learn "non-class" representations for each known attack category, coupled with adversarial margin constraints that provide geometric interpretability for unknown threat detection. RPM-Net++ further enhances performance through Fisher discriminant regularization. Experimental results show that RPM-Net achieves superior performance across multiple metrics including F1-score, AUROC, and AUPR-OUT, significantly outperforming existing methods and offering practical value for real-world network security applications. Our code is available at:https://github.com/chiachen-chang/RPM-Net

2026-04-08T03:25:43Z Compared to the ICASSP 2026 proceedings version, this version corrects a transcription error in Table 1 (ODIN's precision, recall, and f1 scores) Jiachen Zhang Yueming Lu Fan Feng Zhanfeng Wang Shengli Pan Daoqi Han