https://arxiv.org/api/XcbDOzdXnn0aHRuY1x/i5gM3F7c 2026-04-12T14:47:33Z 171450 540 15 http://arxiv.org/abs/2409.01633v4 SleepNet and DreamNet: Enriching and Reconstructing Representations for Consolidated Visual Classification 2026-04-08T02:34:57Z An effective integration of rich feature representations with robust classification mechanisms remains a key challenge in visual understanding tasks. This study introduces two novel deep learning models, SleepNet and DreamNet, which are designed to improve representation utilization through feature enrichment and reconstruction strategies. SleepNet integrates supervised learning with representations obtained from pre-trained encoders, leading to stronger and more robust feature learning. Building on this foundation, DreamNet incorporates pre-trained encoder decoder frameworks to reconstruct hidden states, allowing deeper consolidation and refinement of visual representations. Our experiments show that our models consistently achieve superior performance compared with existing state-of-the-art methods, demonstrating the effectiveness of the proposed enrichment and reconstruction approaches. 2024-09-03T06:04:39Z Mingze Ni Wei Liu http://arxiv.org/abs/2601.21463v2 Unifying Speech Editing Detection and Content Localization via Prior-Enhanced Audio LLMs 2026-04-08T02:32:24Z Existing speech editing detection (SED) datasets are predominantly constructed using manual splicing or limited editing operations, resulting in restricted diversity and poor coverage of realistic editing scenarios. Meanwhile, current SED methods rely heavily on frame-level supervision to detect observable acoustic anomalies, which fundamentally limits their ability to handle deletion-type edits, where the manipulated content is entirely absent from the signal. To address these challenges, we present a unified framework that bridges speech editing detection and content localization through a generative formulation based on Audio Large Language Models (Audio LLMs). We first introduce AiEdit, a large-scale bilingual dataset (approximately 140 hours) that covers addition, deletion, and modification operations using state-of-the-art end-to-end speech editing systems, providing a more realistic benchmark for modern threats. Building upon this, we reformulate SED as a structured text generation task, enabling joint reasoning over edit type identification, and content localization. To enhance the grounding of generative models in acoustic evidence, we propose a prior-enhanced prompting strategy that injects word-level probabilistic cues derived from a frame-level detector. Furthermore, we introduce an acoustic consistency-aware loss that explicitly enforces the separation between normal and anomalous acoustic representations in the latent space. Experimental results demonstrate that the proposed approach consistently outperforms existing methods across both detection and localization tasks. 2026-01-29T09:39:28Z Jun Xue Yi Chai Yanzhen Ren Jinshen He Zhiqiang Tang Zhuolin Yi Yihuan Huang Yuankun Xie Yujie Chen http://arxiv.org/abs/2507.18937v3 CNN-based Surface Temperature Forecasts with Ensemble Numerical Weather Prediction 2026-04-08T02:26:53Z Due to limited computational resources, medium-range temperature forecasts typically rely on low-resolution numerical weather prediction (NWP) models, which are prone to systematic and random errors. We propose a method that integrates a convolutional neural network (CNN) with an ensemble of low-resolution NWP models (40-km horizontal resolution) to produce high-resolution (5-km) surface temperature forecasts with lead times extending up to 5.5 days (132 h). First, CNN-based post-processing (bias correction and spatial downscaling) is applied to individual ensemble members to reduce systematic errors and perform downscaling, which improves the deterministic forecast accuracy. Second, this member-wise correction is applied to all 51 ensemble members to construct a new high-resolution ensemble forecasting system with an improved probabilistic reliability and spread-skill ratio that differs from the simple error reduction mechanism of ensemble averaging. Whereas averaging reduces forecast errors by smoothing spatial fields, our member-wise CNN correction reduces error from noise while maintaining forecast information at a level comparable to that of other high-resolution forecasts. Experimental results indicate that the proposed method provides a practical and scalable solution for improving medium-range temperature forecasts, which is particularly valuable for use in operational centers with limited computational resources. 2025-07-25T04:19:05Z 48 pages, 14 figures Takuya Inoue Meteorological Research Institute, Tsukuba, Japan Takuya Kawabata Meteorological Research Institute, Tsukuba, Japan http://arxiv.org/abs/2604.07382v1 Latent Structure of Affective Representations in Large Language Models 2026-04-08T02:13:48Z The geometric structure of latent representations in large language models (LLMs) is an active area of research, driven in part by its implications for model transparency and AI safety. Existing literature has focused mainly on general geometric and topological properties of the learnt representations, but due to a lack of ground-truth latent geometry, validating the findings of such approaches is challenging. Emotion processing provides an intriguing testbed for probing representational geometry, as emotions exhibit both categorical organization and continuous affective dimensions, which are well-established in the psychology literature. Moreover, understanding such representations carries safety relevance. In this work, we investigate the latent structure of affective representations in LLMs using geometric data analysis tools. We present three main findings. First, we show that LLMs learn coherent latent representations of affective emotions that align with widely used valence--arousal models from psychology. Second, we find that these representations exhibit nonlinear geometric structure that can nonetheless be well-approximated linearly, providing empirical support for the linear representation hypothesis commonly assumed in model transparency methods. Third, we demonstrate that the learned latent representation space can be leveraged to quantify uncertainty in emotion processing tasks. Our findings suggest that LLMs acquire affective representations with geometric structure paralleling established models of human emotion, with practical implications for model interpretability and safety. 2026-04-08T02:13:48Z Benjamin J. Choi Melanie Weber http://arxiv.org/abs/2405.11619v2 Novel Interpretable and Robust Web-based AI Platform for Phishing Email Detection 2026-04-08T02:11:22Z Phishing emails continue to pose a significant threat, causing financial losses and security breaches. This study addresses limitations in existing research, such as reliance on proprietary datasets and lack of real-world application, by proposing a high-performance machine learning model for email classification. Utilizing a comprehensive and largest available public dataset, the model achieves a f1 score of 0.99 and is designed for deployment within relevant applications. Additionally, Explainable AI (XAI) is integrated to enhance user trust. This research offers a practical and highly accurate solution, contributing to the fight against phishing by empowering users with a real-time web-based application for phishing email detection. 2024-05-19T17:18:27Z 19 pages, 7 figures, dataset link: https://www.kaggle.com/datasets/naserabdullahalam/phishing-email-dataset/ Abdulla Al-Subaiey Mohammed Al-Thani Naser Abdullah Alam Kaniz Fatema Antora Amith Khandakar SM Ashfaq Uz Zaman 10.1016/j.compeleceng.2024.109625 http://arxiv.org/abs/2604.03647v2 Stabilizing Unsupervised Self-Evolution of MLLMs via Continuous Softened Retracing reSampling 2026-04-08T02:08:51Z In the unsupervised self-evolution of Multimodal Large Language Models, the quality of feedback signals during post-training is pivotal for stable and effective learning. However, existing self-evolution methods predominantly rely on majority voting to select the most frequent output as the pseudo-golden answer, which may stem from the model's intrinsic biases rather than guaranteeing the objective correctness of the reasoning paths. To counteract the degradation, we propose Continuous Softened Retracing reSampling (CSRS) in MLLM self-evolution. Specifically, we introduce a Retracing Re-inference Mechanism (RRM) that the model re-inferences from anchor points to expand the exploration of long-tail reasoning paths. Simultaneously, we propose Softened Frequency Reward (SFR), which replaces binary rewards with continuous signals, calibrating reward based on the answers' frequency across sampled reasoning sets. Furthermore, incorporated with Visual Semantic Perturbation (VSP), CSRS ensures the model prioritizes mathematical logic over visual superficiality. Experimental results demonstrate that CSRS significantly enhances the reasoning performance of Qwen2.5-VL-7B on benchmarks such as MathVision. We achieve state-of-the-art (SOTA) results in unsupervised self-evolution on geometric tasks. Our code is avaible at https://github.com/yyy195/CSRS. 2026-04-04T08:52:43Z 16 pages, 6 figures Yunyao Yu Zhengxian Wu Zhuohong Chen Hangrui Xu Zirui Liao Xiangwen Deng Zhifang Liu Senyuan Shi Haoqian Wang http://arxiv.org/abs/2511.22396v2 Asking like Socrates: Socrates helps VLMs understand remote sensing images 2026-04-08T01:53:03Z Recent multimodal reasoning models, inspired by DeepSeek-R1, have significantly advanced vision-language systems. However, in remote sensing (RS) tasks, we observe widespread pseudo reasoning: models narrate the process of reasoning rather than genuinely reason toward the correct answer based on visual evidence. We attribute this to the Glance Effect, where a single, coarse perception of large-scale RS imagery results in incomplete understanding and reasoning based on linguistic self-consistency instead of visual evidence. To address this, we propose RS-EoT (Remote Sensing Evidence-of-Thought), a language-driven, iterative visual evidence-seeking paradigm. To instill this paradigm, we propose SocraticAgent, a self-play multi-agent system that synthesizes reasoning traces via alternating cycles of reasoning and visual inspection. To enhance and generalize these patterns, we propose a two-stage progressive RL strategy: first, RL on fine-grained Grounding tasks to enhance RS-EoT capabilities, followed by RL on RS VQA to generalize to broader understanding scenarios. Experiments show RS-EoT achieves state-of-the-art performance on multiple RS VQA and grounding benchmarks. Analyses reveal clear iterative cycles of reasoning and evidence seeking, confirming RS-EoT mitigates the Glance Effect and enables genuine evidence-grounded reasoning. Our code, data, and models are available at https://geox-lab.github.io/Asking_like_Socrates 2025-11-27T12:19:37Z Accepted by CVPR 2026 Run Shao Ziyu Li Zhaoyang Zhang Linrui Xu Xinran He Hongyuan Yuan Bolei He Yongxing Dai Yiming Yan Yijun Chen Wang Guo Haifeng Li http://arxiv.org/abs/2601.22451v2 Countering the Over-Reliance Trap: Mitigating Object Hallucination for LVLMs via a Self-Validation Framework 2026-04-08T01:52:53Z Despite progress in Large Vision Language Models (LVLMs), object hallucination remains a critical issue in image captioning task, where models generate descriptions of non-existent objects, compromising their reliability. Previous work attributes this to LVLMs' over-reliance on language priors and attempts to mitigate it through logits calibration. However, they still lack a thorough analysis of the over-reliance. To gain a deeper understanding of over-reliance, we conduct a series of preliminary experiments, indicating that as the generation length increases, LVLMs' over-reliance on language priors leads to inflated probability of hallucinated object tokens, consequently exacerbating object hallucination. To circumvent this issue, we propose Language-Prior-Free Verification to enable LVLMs to faithfully verify the confidence of object existence. Based on this, we propose a novel training-free Self-Validation Framework to counter the over-reliance trap. It first validates objects' existence in sampled candidate captions and further mitigates object hallucination via caption selection or aggregation. Experiment results demonstrate that our framework mitigates object hallucination significantly in image captioning task (e.g., 65.6% improvement on CHAIRI metric with LLaVA-v1.5-7B), surpassing the previous SOTA methods. This result highlights a novel path towards mitigating hallucination by unlocking the inherent potential within LVLMs themselves. 2026-01-30T01:37:53Z Code is available at https://github.com/Liushiyu-0709/SelfVal Shiyu Liu Xinyi Wen Zhibin Lan Ante Wang Jinsong Su http://arxiv.org/abs/2604.06571v1 LLM-based Schema-Guided Extraction and Validation of Missing-Person Intelligence from Heterogeneous Data Sources 2026-04-08T01:35:56Z Missing-person and child-safety investigations rely on heterogeneous case documents, including structured forms, bulletin-style posters, and narrative web profiles. Variations in layout, terminology, and data quality impede rapid triage, large-scale analysis, and search-planning workflows. This paper introduces the Guardian Parser Pack, an AI-driven parsing and normalization pipeline that transforms multi-source investigative documents into a unified, schema-compliant representation suitable for operational review and downstream spatial modeling. The proposed system integrates (i) multi-engine PDF text extraction with Optical Character Recognition (OCR) fallback, (ii) rule-based source identification with source-specific parsers, (iii) schema-first harmonization and validation, and (iv) an optional Large Language Model (LLM)-assisted extraction pathway incorporating validator-guided repair and shared geocoding services. We present the system architecture, key implementation decisions, and output design, and evaluate performance using both gold-aligned extraction metrics and corpus-level operational indicators. On a manually aligned subset of 75 cases, the LLM-assisted pathway achieved substantially higher extraction quality than the deterministic comparator (F1 = 0.8664 vs. 0.2578), while across 517 parsed records per pathway it also improved aggregate key-field completeness (96.97\% vs. 93.23\%). The deterministic pathway remained much faster (mean runtime 0.03 s/record vs. 3.95 s/record for the LLM pathway). In the evaluated run, all LLM outputs passed initial schema validation, so validator-guided repair functioned as a built-in safeguard rather than a contributor to the observed gains. These results support controlled use of probabilistic AI within a schema-first, auditable pipeline for high-stakes investigative settings. 2026-04-08T01:35:56Z 9 pages, 6 figures. Accepted at International Conference on Intelligent Digitization of Systems and Services (IDSS 2026) Joshua Castillo Ravi Mukkamala http://arxiv.org/abs/2604.06566v1 AI-Driven Research for Databases 2026-04-08T01:34:11Z As the complexity of modern workloads and hardware increasingly outpaces human research and engineering capacity, existing methods for database performance optimization struggle to keep pace. To address this gap, a new class of techniques, termed AI-Driven Research for Systems (ADRS), uses large language models to automate solution discovery. This approach shifts optimization from manual system design to automated code generation. The key obstacle, however, in applying ADRS is the evaluation pipeline. Since these frameworks rapidly generate hundreds of candidates without human supervision, they depend on fast and accurate feedback from evaluators to converge on effective solutions. Building such evaluators is especially difficult for complex database systems. To enable the practical application of ADRS in this domain, we propose automating the design of evaluators by co-evolving them with the solutions. We demonstrate the effectiveness of this approach through three case studies optimizing buffer management, query rewriting, and index selection. Our automated evaluators enable the discovery of novel algorithms that outperform state-of-the-art baselines (e.g., a deterministic query rewrite policy that achieves up to 6.8x lower latency), demonstrating that addressing the evaluation bottleneck unlocks the potential of ADRS to generate highly optimized, deployable code for next-generation data systems. 2026-04-08T01:34:11Z Audrey Cheng Harald Ng Aaron Kabcenell Peter Bailis Matei Zaharia Lin Ma Xiao Shi Ion Stoica http://arxiv.org/abs/2509.17183v3 LifeAlign: Lifelong Alignment for Large Language Models with Memory-Augmented Focalized Preference Optimization 2026-04-08T01:32:04Z Alignment plays a crucial role in Large Language Models (LLMs) in aligning with human preferences on a specific task/domain. Traditional alignment methods suffer from catastrophic forgetting, where models lose previously acquired knowledge when adapting to new preferences or domains. We introduce LifeAlign, a novel framework for lifelong alignment that enables LLMs to maintain consistent human preference alignment across sequential learning tasks without forgetting previously learned knowledge. Our approach consists of two key innovations. First, we propose a focalized preference optimization strategy that aligns LLMs with new preferences while preventing the erosion of knowledge acquired from previous tasks. Second, we develop a short-to-long memory consolidation mechanism that merges denoised short-term preference representations into stable long-term memory using intrinsic dimensionality reduction, enabling efficient storage and retrieval of alignment patterns across diverse domains. We evaluate LifeAlign across multiple sequential alignment tasks spanning different domains and preference types. Experimental results demonstrate that our method achieves superior performance in maintaining both preference alignment quality and knowledge retention compared to existing lifelong learning approaches. The codes and datasets have been released on https://github.com/real-ljs/LifeAlign. 2025-09-21T18:06:05Z Junsong Li Jie Zhou Bihao Zhan Yutao Yang Qianjun Pan Shilian Chen Tianyu Huai Xin Li Qin Chen Liang He http://arxiv.org/abs/2604.06562v1 On Emotion-Sensitive Decision Making of Small Language Model Agents 2026-04-08T01:24:13Z Small language models (SLM) are increasingly used as interactive decision-making agents, yet most decision-oriented evaluations ignore emotion as a causal factor influencing behavior. We study emotion-sensitive decision making by combining representation-level emotion induction with a structured game-theoretic evaluation. Emotional states are induced using activation steering derived from crowd-validated, real-world emotion-eliciting texts, enabling controlled and transferable interventions beyond prompt-based methods. We introduce a benchmark built around canonical decision templates that span cooperative and competitive incentives under both complete and incomplete information. These templates are instantiated using strategic scenarios from \textsc{Diplomacy}, \textsc{StarCraft II}, and diverse real-world personas. Experiments across multiple model families in various architecture and modalities, show that emotional perturbations systematically affect strategic choices, but the resulting behaviors are often unstable and not fully aligned with human expectations. Finally, we outline an approach to improve robustness to emotion-driven perturbations. 2026-04-08T01:24:13Z Jiaju Lin Xingjian Du Qingyun Wu Ellen Wenting Zou Jindong Wang http://arxiv.org/abs/2604.04344v2 Domain-Contextualized Inference: A Computable Graph Architecture for Explicit-Domain Reasoning 2026-04-08T01:21:14Z We establish a computation-substrate-agnostic inference architecture in which domain is an explicit first-class computational parameter. This produces domain-scoped pruning that reduces per-query search space from O(N) to O(N/K), substrate-independent execution over symbolic, neural, vector, and hybrid substrates, and transparent inference chains where every step carries its evaluative context. The contribution is architectural, not logical. We formalize the computational theory across five dimensions: a five-layer architecture; three domain computation modes including chain indexing, path traversal as Kleisli composition, and vector-guided computation as a substrate transition; a substrate-agnostic interface with three operations Query, Extend, Bridge; reliability conditions C1 to C4 with three failure mode classes; and validation through a PHQ-9 clinical reasoning case study. The computational theory including operational semantics, complexity bounds, monad structure, substrate transitions, and boundary conditions is the contribution of this paper. 2026-04-06T01:35:42Z 22 pages; Corrected author name spelling and updated pseudo-code style Chao Li Yuru Wang Chunyi Zhao http://arxiv.org/abs/2604.05429v2 Bridging Natural Language and Microgrid Dynamics: A Context-Aware Simulator and Dataset 2026-04-08T01:03:11Z Addressing the critical need for intelligent, context-aware energy management in renewable systems, we introduce the OpenCEM Simulator and Dataset: the first open-source digital twin explicitly designed to integrate rich, unstructured contextual information with quantitative renewable energy dynamics. Traditional energy management relies heavily on numerical time series, thereby neglecting the significant predictive power embedded in human-generated context (e.g., event schedules, system logs, user intentions). OpenCEM bridges this gap by offering a unique platform comprising both a meticulously aligned, language-rich dataset from a real-world PV-and-battery microgrid installation and a modular simulator capable of natively processing this multi-modal context. The OpenCEM Simulator provides a high-fidelity environment for developing and validating novel control algorithms and prediction models, particularly those leveraging Large Language Models. We detail its component-based architecture, hybrid data-driven and physics-based modelling capabilities, and demonstrate its utility through practical examples, including context-aware load forecasting and the implementation of online optimal battery charging control strategies. By making this platform publicly available, OpenCEM aims to accelerate research into the next generation of intelligent, sustainable, and truly context-aware energy systems. 2026-04-07T04:52:51Z Tinko Sebastian Bartels Ruixiang Wu Xinyu Lu Yikai Lu Fanzeng Xia Haoxiang Yang Yue Chen Tongxin Li http://arxiv.org/abs/2604.06550v1 SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills 2026-04-08T00:58:48Z OpenClaw's ClawHub marketplace hosts over 13,000 community-contributed agent skills, and between 13% and 26% of them contain security vulnerabilities according to recent audits. Regex scanners miss obfuscated payloads; formal static analyzers cannot read the natural language instructions in SKILL.md files where prompt injection and social engineering attacks hide. Neither approach handles both modalities. SkillSieve is a three-layer detection framework that applies progressively deeper analysis only where needed. Layer 1 runs regex, AST, and metadata checks through an XGBoost-based feature scorer, filtering roughly 86% of benign skills in under 40ms on average at zero API cost. Layer 2 sends suspicious skills to an LLM, but instead of asking one broad question, it splits the analysis into four parallel sub-tasks (intent alignment, permission justification, covert behavior detection, cross-file consistency), each with its own prompt and structured output. Layer 3 puts high-risk skills before a jury of three different LLMs that vote independently and, if they disagree, debate before reaching a verdict. We evaluate on 49,592 real ClawHub skills and adversarial samples across five evasion techniques, running the full pipeline on a 440 ARM single-board computer. On a 400-skill labeled benchmark, SkillSieve achieves 0.800 F1, outperforming ClawVet's 0.421, at an average cost of 0.006 per skill. Code, data, and benchmark are open-sourced. 2026-04-08T00:58:48Z 7 pages, 5 tables, 1 figure Yinghan Hou Zongyou Yang