https://arxiv.org/api/fpXQxfKOD3WLB3Kdzrv+mBPdm58 2026-04-11T12:56:34Z 171450 300 15 http://arxiv.org/abs/2604.07590v1 DCD: Domain-Oriented Design for Controlled Retrieval-Augmented Generation 2026-04-08T20:47:51Z Retrieval-Augmented Generation (RAG) is widely used to ground large language models in external knowledge sources. However, when applied to heterogeneous corpora and multi-step queries, Naive RAG pipelines often degrade in quality due to flat knowledge representations and the absence of explicit workflows. In this work, we introduce DCD (Domain-Collection-Document), a domain-oriented design to structure knowledge and control query processing in RAG systems without modifying the underlying language model. The proposed approach relies on a hierarchical decomposition of the information space and multi-stage routing based on structured model outputs, enabling progressive restriction of both retrieval and generation scopes. The architecture is complemented by smart chunking, hybrid retrieval, and integrated validation and generation guardrail mechanisms. We describe the DCD architecture and workflow and discuss evaluation results on synthetic evaluation dataset, highlighting their impact on robustness, factual accuracy, and answer relevance in applied RAG scenarios. 2026-04-08T20:47:51Z 11 pages, 4 figures, 2 links, link to HF https://huggingface.co/datasets/redmadrobot-rnd/dcd, link to GIT https://github.com/redmadrobot-rnd/dcd Valeriy Kovalskiy Nikita Belov Nikita Miteyko Igor Reshetnikov Max Maximov http://arxiv.org/abs/2601.10587v2 Adversarial Evasion Attacks on Computer Vision using SHAP Values 2026-04-08T20:47:19Z The paper introduces a white-box attack on computer vision models using SHAP values. It demonstrates how adversarial evasion attacks can compromise the performance of deep learning models by reducing output confidence or inducing misclassifications. Such attacks are particularly insidious as they can deceive the perception of an algorithm while eluding human perception due to their imperceptibility to the human eye. The proposed attack leverages SHAP values to quantify the significance of individual inputs to the output at the inference stage. A comparison is drawn between the SHAP attack and the well-known Fast Gradient Sign Method. We find evidence that SHAP attacks are more robust in generating misclassifications particularly in gradient hiding scenarios. 2026-01-15T16:58:55Z 10th bwHPC Symposium - September 25th & 26th, 2024 Frank Mollard Marcus Becker Florian Roehrbein 10.13140/RG.2.2.28762.40647 http://arxiv.org/abs/2604.07585v1 Don't Measure Once: Measuring Visibility in AI Search (GEO) 2026-04-08T20:38:05Z As large language model-based chat systems become increasingly widely used, generative engine optimization (GEO) has emerged as an important problem for information access and retrieval. In classical search engines, results are comparatively transparent and stable: a single query often provides a representative snapshot of where a page or brand appears relative to competitors. The inherent probabilistic nature of AI search changes this paradigm. Answers can vary across runs, prompts, and time, making one-off observations unreliable. Drawing on empirical studies, our findings underscore the need for repeated measurements to assess a brand's GEO performance and to characterize visibility as a distribution rather than a single-point outcome. 2026-04-08T20:38:05Z 19 pages, 7 figures, 17 tables. Comments welcome! Julius Schulte Malte Bleeker Philipp Kaufmann http://arxiv.org/abs/2604.07584v1 From Papers to Property Tables: A Priority-Based LLM Workflow for Materials Data Extraction 2026-04-08T20:37:17Z Scientific data are widely dispersed across research articles and are often reported inconsistently across text, tables, and figures, making manual data extraction and aggregation slow and error-prone. We present a prompt-driven, hierarchical workflow that uses a large language model (LLM) to automatically extract and reconstruct structured, shot-level shock-physics experimental records by integrating information distributed across text, tables, figures, and physics-based derivations from full-text published research articles, using alloy spall strength as a representative case study. The pipeline targeted 37 experimentally relevant fields per shot and applied a three-level priority strategy: (T1) direct extraction from text/tables, (T2) physics-based derivation using verified governing relations, and (T3) digitization from figures when necessary. Extracted values were normalized to canonical units, tagged by priority for traceability, and validated with physics-based consistency and plausibility checks. Evaluated on a benchmark of 30 published research articles comprising 11,967 evaluated data points, the workflow achieved high overall accuracy, with priority-wise accuracies of 94.93% (T1), 92.04% (T2), and 83.49% (T3), and an overall weighted accuracy of 94.69%. Cross-model testing further indicated strong agreement for text/table and equation-derived fields, with lower agreement for figure-based extraction. Implementation through an API interface demonstrated the scalability of the approach, achieving consistent extraction performance and, in a subset of test cases, matching or exceeding chat-based accuracy. This workflow demonstrates a practical approach for converting unstructured technical literature into traceable, analysis-ready datasets without task-specific fine-tuning, enabling scalable database construction in materials science. 2026-04-08T20:37:17Z Koushik Rameshbabu Jing Luo Ali Shargh Khalid A. El-Awady Jaafar A. El-Awady http://arxiv.org/abs/2604.07569v1 Learning is Forgetting: LLM Training As Lossy Compression 2026-04-08T20:12:07Z Despite the increasing prevalence of large language models (LLMs), we still have a limited understanding of how their representational spaces are structured. This limits our ability to interpret how and what they learn or relate them to learning in humans. We argue LLMs are best seen as an instance of lossy compression, where over training they learn by retaining only information in their training data relevant to their objective(s). We show pre-training results in models that are optimally compressed for next-sequence prediction, approaching the Information Bottleneck bound on compression. Across an array of open weights models, each compresses differently, likely due to differences in the data and training recipes used. However even across different families of LLMs the optimality of a model's compression, and the information present in it, can predict downstream performance on across a wide array of benchmarks, letting us directly link representational structure to actionable insights about model performance. In the general case the work presented here offers a unified Information-Theoretic framing for how these models learn that is deployable at scale. 2026-04-08T20:12:07Z 12 page core paper, 16 page Appendix - A shorter version with fewer visuals appears at ICLR 2026 Henry C. Conklin Tom Hosking Tan Yi-Chern Julian Gold Jonathan D. Cohen Thomas L. Griffiths Max Bartolo Seraphina Goldfarb-Tarrant http://arxiv.org/abs/2604.07562v1 Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs 2026-04-08T20:02:48Z Unsupervised methods are widely used to induce latent semantic structure from large text collections, yet their outputs often contain incoherent, redundant, or poorly grounded clusters that are difficult to validate without labeled data. We propose a reasoning-based refinement framework that leverages large language models (LLMs) not as embedding generators, but as semantic judges that validate and restructure the outputs of arbitrary unsupervised clustering algorithms.Our framework introduces three reasoning stages: (i) coherence verification, where LLMs assess whether cluster summaries are supported by their member texts; (ii) redundancy adjudication, where candidate clusters are merged or rejected based on semantic overlap; and (iii) label grounding, where clusters are assigned interpretable labels in a fully unsupervised manner. This design decouples representation learning from structural validation and mitigates common failure modes of embedding-only approaches. We evaluate the framework on real-world social media corpora from two platforms with distinct interaction models, demonstrating consistent improvements in cluster coherence and human-aligned labeling quality over classical topic models and recent representation-based baselines. Human evaluation shows strong agreement with LLM-generated labels, despite the absence of gold-standard annotations. We further conduct robustness analyses under matched temporal and volume conditions to assess cross-platform stability. Beyond empirical gains, our results suggest that LLM-based reasoning can serve as a general mechanism for validating and refining unsupervised semantic structure, enabling more reliable and interpretable analyses of large text collections without supervision. 2026-04-08T20:02:48Z Accepted to the Findings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026) Tunazzina Islam http://arxiv.org/abs/2604.07559v1 Dual-Loop Control in DCVerse: Advancing Reliable Deployment of AI in Data Centers via Digital Twins 2026-04-08T20:01:34Z The growing scale and complexity of modern data centers present major challenges in balancing energy efficiency with outage risk. Although Deep Reinforcement Learning (DRL) shows strong potential for intelligent control, its deployment in mission-critical systems is limited by data scarcity and the lack of real-time pre-evaluation mechanisms. This paper introduces the Dual-Loop Control Framework (DLCF), a digital twin-based architecture designed to overcome these challenges. The framework comprises three core entities: the physical system, a digital twin, and a policy reservoir of diverse DRL agents. These components interact through a dual-loop mechanism involving real-time data acquisition, data assimilation, DRL policy training, pre-evaluation, and expert verification. Theoretical analysis shows how DLCF can improve sample efficiency, generalization, safety, and optimality. Leveraging DLCF, we implemented the DCVerse platform and validated it through case studies on a real-world data center cooling system. The evaluation shows that our approach achieves up to 4.09% energy savings over conventional control strategies without violating SLA requirements. Additionally, the framework improves policy interpretability and supports more trustworthy DRL deployment. This work provides a foundation for reliable AI-based control in data centers and points toward future extensions for holistic, system-wide optimization. 2026-04-08T20:01:34Z Qingang Zhang Yuejun Yan Guangyu Wu Siew-Chien Wong Jimin Jia Zhaoyang Wang Yonggang Wen http://arxiv.org/abs/2604.07558v1 Generative Experiences for Digital Mental Health Interventions: Evidence from a Randomized Study 2026-04-08T19:59:28Z Digital mental health (DMH) tools have extensively explored personalization of interventions to users' needs and contexts. However, this personalization often targets what support is provided, not how it is experienced. Even well-matched content can fail when the interaction format misaligns with how someone can engage. We introduce generative experience as a paradigm for DMH support, where the intervention experience is composed at runtime. We instantiate this in GUIDE, a system that generates personalized intervention content and multimodal interaction structure through rubric-guided generation of modular components. In a preregistered study with N = 237 participants, GUIDE significantly reduced stress (p = .02) and improved the user experience (p = .04) compared to an LLM-based cognitive restructuring control. GUIDE also supported diverse forms of reflection and action through varied interaction flows, while revealing tensions around personalization across the interaction sequence. This work lays the foundation for interventions that dynamically shape how support is experienced and enacted in digital settings. 2026-04-08T19:59:28Z Ananya Bhattacharjee Michael Liut Matthew Jörke Diyi Yang Emma Brunskill http://arxiv.org/abs/2604.07553v1 TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization 2026-04-08T19:53:39Z This study presents a framework for generating the gold-standard summary fully automatically and reproducibly based on multiple human summaries of Turkish educational videos. Within the scope of the study, a new dataset called TR-EduVSum was created, encompassing 82 Turkish course videos in the field of "Data Structures and Algorithms" and containing a total of 3281 independent human summaries. Inspired by existing pyramid-based evaluation approaches, the AutoMUP (Automatic Meaning Unit Pyramid) method is proposed, which extracts consensus-based content from multiple human summaries. AutoMUP clusters the meaning units extracted from human summaries using embedding, statistically models inter-participant agreement, and generates graded summaries based on consensus weight. In this framework, the gold summary corresponds to the highest-consensus AutoMUP configuration, constructed from the most frequently supported meaning units across human summaries. Experimental results show that AutoMUP summaries exhibit high semantic overlap with robust LLM (Large Language Model) summaries such as Flash 2.5 and GPT-5.1. Furthermore, ablation studies clearly demonstrate the decisive role of consensus weight and clustering in determining summary quality. The proposed approach can be generalized to other Turkic languages at low cost. 2026-04-08T19:53:39Z 8 pages, 2 figures, 3 tables. Accepted at the Second Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2026), EACL 2026, Rabat, Morocco Figen Eğin Aytuğ Onan 10.18653/v1/2026.sigturk-1.5 http://arxiv.org/abs/2604.07551v1 MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security 2026-04-08T19:53:26Z The Model Context Protocol (MCP) enables large language models (LLMs) to dynamically discover and invoke third-party tools, significantly expanding agent capabilities while introducing a distinct security landscape. Unlike prompt-only interactions, MCP exposes pre-execution artifacts, shared context, multi-turn workflows, and third-party supply chains to adversarial influence across independently operated components. While recent work has identified MCP-specific attacks and evaluated defenses, existing studies are largely attack-centric or benchmark-driven, providing limited guidance on where mitigation responsibility should reside within the MCP architecture. This is problematic given MCP's multi-party design and distributed trust boundaries. We present a defense-placement-oriented security analysis of MCP, introducing a layer-aligned taxonomy that organizes attacks by the architectural component responsible for enforcement. Threats are mapped across six MCP layers, and primary and secondary defense points are identified to support principled defense-in-depth reasoning under adversaries controlling tools, servers, or ecosystem components. A structured mapping of existing academic and industry defenses onto this framework reveals uneven and predominantly tool-centric protection, with persistent gaps at the host orchestration, transport, and supply-chain layers. These findings suggest that many MCP security weaknesses stem from architectural misalignment rather than isolated implementation flaws. 2026-04-08T19:53:26Z Mehrdad Rostamzadeh Sidhant Narula Nahom Birhan Mohammad Ghasemigol Daniel Takabi http://arxiv.org/abs/2604.07549v1 EMSDialog: Synthetic Multi-person Emergency Medical Service Dialogue Generation from Electronic Patient Care Reports via Multi-LLM Agents 2026-04-08T19:52:51Z Conversational diagnosis prediction requires models to track evolving evidence in streaming clinical conversations and decide when to commit to a diagnosis. Existing medical dialogue corpora are largely dyadic or lack the multi-party workflow and annotations needed for this setting. We introduce an ePCR-grounded, topic-flow-based multi-agent generation pipeline that iteratively plans, generates, and self-refines dialogues with rule-based factual and topic flow checks. The pipeline yields EMSDialog, a dataset of 4,414 synthetic multi-speaker EMS conversations based on a real-world ePCR dataset, annotated with 43 diagnoses, speaker roles, and turn-level topics. Human and LLM evaluations confirm high quality and realism of EMSDialog using both utterance- and conversation-level metrics. Results show that EMSDialog-augmented training improves accuracy, timeliness, and stability of EMS conversational diagnosis prediction. 2026-04-08T19:52:51Z Accepted by ACL Findings 2026 Xueren Ge Sahil Murtaza Anthony Cortez Homa Alemzadeh http://arxiv.org/abs/2604.07546v1 Agentic Copyright, Data Scraping & AI Governance: Toward a Coasean Bargain in the Era of Artificial Intelligence 2026-04-08T19:43:28Z This paper examines how the rapid deployment of multi-agentic AI systems is reshaping the foundations of copyright law and creative markets. It argues that existing copyright frameworks are ill-equipped to govern AI agent-mediated interactions that occur at scale, speed, and with limited human oversight. The paper introduces the concept of agentic copyright, a model in which AI agents act on behalf of creators and users to negotiate access, attribution, and compensation for copyrighted works. While multi-agent ecosystems promise efficiency gains and reduced transaction costs, they also generate novel market failures, including miscoordination, conflict, and collusion among autonomous agents. To address these market failures, the paper develops a supervised multi-agent governance framework that integrates legal rules and principles, technical protocols, and institutional oversight. This framework emphasizes ex ante and ex post coordination mechanisms capable of correcting agentic market failures before they crystallize into systemic harm. By embedding normative constraints and monitoring functions into multi-agent architectures, supervised governance aims to align agent behavior with the underlying values of copyright law. The paper concludes that AI should be understood not only as a source of disruption, but also as a governance tool capable of restoring market-based ordering in creative industries. Properly designed, agentic copyright offers a path toward scalable, fair, and legally meaningful copyright markets in the age of AI. 2026-04-08T19:43:28Z Paulius Jurcys Mark Fenwick http://arxiv.org/abs/2505.13126v3 Iterative Formalization and Planning in Partially Observable Environments 2026-04-08T19:36:17Z Using LLMs not to predict plans but to formalize an environment into the Planning Domain Definition Language (PDDL) has been shown to improve performance and control. While most existing methodology only applies to fully observable environments, we adapt to the more realistic and challenging partially observable environments without sufficient information to make a complete plan. We propose PDDLego, a framework to iteratively formalize, plan, grow, and refine PDDL representations by decomposing the environment and the goal into fully observable episodes. Without finetuning, in-context exemplars, or trajectories, PDDLego improves planning success and exhibits robustness against problem complexity compared to end-to-end approaches. We also show that the domain knowledge captured after a successful trial can benefit future tasks. 2025-05-19T13:58:15Z In Findings of ACL 2026 Liancheng Gong Wang Zhu Jesse Thomason Li Zhang http://arxiv.org/abs/2604.07535v1 Trust the AI, Doubt Yourself: The Effect of Urgency on Self-Confidence in Human-AI Interaction 2026-04-08T19:17:59Z Studies show that interactions with an AI system fosters trust in human users towards AI. An often overlooked element of such interaction dynamics is the (sense of) urgency when the human user is prompted by an AI agent, e.g., for advice or guidance. In this paper, we show that although the presence of urgency in human-AI interactions does not affect the trust in AI, it may be detrimental to the human user's self-confidence and self-efficacy. In the long run, the loss of confidence may lead to performance loss, suboptimal decisions, human errors, and ultimately, unsustainable AI systems. Our evidence comes from an experiment with 30 human participants. Our results indicate that users may feel more confident in their work when they are eased into the human-AI setup rather than exposed to it without preparation. We elaborate on the implications of this finding for software engineers and decision-makers. 2026-04-08T19:17:59Z Baran Shajari Xiaoran Liu Kyanna Dagenais Istvan David http://arxiv.org/abs/2604.07533v1 RL-ASL: A Dynamic Listening Optimization for TSCH Networks Using Reinforcement Learning 2026-04-08T19:17:26Z Time Slotted Channel Hopping (TSCH) is a widely adopted Media Access Control (MAC) protocol within the IEEE 802.15.4e standard, designed to provide reliable and energy-efficient communication in Industrial Internet of Things (IIoT) networks. However, state-of-the-art TSCH schedulers rely on static slot allocations, resulting in idle listening and unnecessary power consumption under dynamic traffic conditions. This paper introduces RL-ASL, a reinforcement learning-driven adaptive listening framework that dynamically decides whether to activate or skip a scheduled listening slot based on real-time network conditions. By integrating learning-based slot skipping with standard TSCH scheduling, RL-ASL reduces idle listening while preserving synchronization and delivery reliability. Experimental results on the FIT IoT-LAB testbed and Cooja network simulator show that RL-ASL achieves up to 46% lower power consumption than baseline scheduling protocols, while maintaining near-perfect reliability and reducing average latency by up to 96% compared to PRIL-M. Its link-based variant, RL-ASL-LB, further improves delay performance under high contention with similar energy efficiency. Importantly, RL-ASL performs inference on constrained motes with negligible overhead, as model training is fully performed offline. Overall, RL-ASL provides a practical, scalable, and energy-aware scheduling mechanism for next-generation low-power IIoT networks. 2026-04-08T19:17:26Z 14 pages F. Fernando Jurado-Lasso J. F. Jurado