Feed: https://arxiv.org/api/jCSiGbenszwWQO4WCEIUv7SdD5c
Updated: 2026-06-10T21:02:17Z

http://arxiv.org/abs/2509.25017v2
Title: Uncertainty-Aware Deep Learning for Wildfire Danger Forecasting
Updated: 2026-06-09T14:12:38Z
Abstract: Wildfires are among the most severe natural hazards, posing a significant threat to both humans and natural ecosystems. The growing risk of wildfires increases the demand for forecasting models that are not only accurate but also reliable. Deep Learning (DL) has shown promise in predicting wildfire danger; however, its adoption is hindered by concerns over the reliability of its predictions, some of which stem from the lack of uncertainty quantification. To address this challenge, we present an uncertainty-aware DL framework that jointly captures epistemic (model) and aleatoric (data) uncertainty to enhance short-term wildfire danger forecasting. In next-day forecasting, our best-performing model improves the F1 Score by 2.3% and reduces the Expected Calibration Error by 2.1% compared to a deterministic baseline, enhancing both predictive skill and calibration. Our experiments confirm the reliability of the uncertainty estimates and illustrate their practical utility for decision support, including the identification of uncertainty thresholds for rejecting low-confidence predictions and the generation of well-calibrated wildfire danger maps with accompanying uncertainty layers. Extending the forecast horizon up to ten days, we observe that aleatoric uncertainty increases with lead time, reflecting greater variability in environmental conditions, while epistemic uncertainty remains stable. Finally, we show that although the two uncertainty types may be redundant in low-uncertainty cases, they provide complementary insights under more challenging conditions, underscoring the value of their joint modeling for robust wildfire danger prediction.
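Expected Calibration Error, the calibration metric quoted in this entry, is commonly computed by grouping predictions into confidence bins and averaging the gap between accuracy and mean confidence, weighted by bin size. A minimal sketch for a binary danger label follows. The 10 equal-width bins, the 0.5 decision threshold, and the use of predictive entropy as the rejection score are illustrative assumptions, not the paper's protocol, and `accept_mask` is a hypothetical helper.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin-size-weighted mean |accuracy - confidence| over equal-width bins.

    probs:  predicted probability of the positive (danger) class, shape (N,)
    labels: 0/1 ground truth, shape (N,)
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    preds = (probs >= 0.5).astype(int)
    conf = np.where(preds == 1, probs, 1.0 - probs)  # confidence of the predicted class
    correct = (preds == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)

def binary_entropy(probs, eps=1e-12):
    """Predictive entropy (nats) of a Bernoulli prediction; higher means less confident."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def accept_mask(probs, threshold):
    """Hypothetical rejection rule: keep predictions whose entropy is below `threshold`."""
    return binary_entropy(probs) < threshold
```

Sweeping `threshold` and recomputing ECE or F1 on the accepted subset is one simple way to look for a rejection threshold of the kind the abstract describes.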
In summary, our approach significantly improves the accuracy and reliability of wildfire danger forecasting, advancing the development of trustworthy wildfire DL systems.
Published: 2025-09-29T16:43:17Z
Authors: Spyros Kondylatos, Nikolas Papadopoulos, Gustau Camps-Valls, Ioannis Papoutsis

http://arxiv.org/abs/2511.01927v2
Title: Learning-Guided Integration Contours Construction for Fast Large-Scale Generalized Eigensolvers
Updated: 2026-06-09T14:09:40Z
Abstract: Solving large-scale Generalized Eigenvalue Problems (GEPs) is a fundamental yet computationally prohibitive task in science and engineering. As a promising direction, contour integral (CI) methods offer an efficient and parallelizable framework. However, their performance depends critically on the selection of integration contours: improper selection without reliable prior knowledge of the eigenvalue distribution can incur significant computational overhead and compromise numerical accuracy. To address this challenge, we propose Deepcontour, a novel hybrid framework that integrates a deep learning-based spectral predictor with Kernel Density Estimation (KDE) for principled contour design. Specifically, Deepcontour uses its specialized Eigen-Neural-Operator (ENO) to provide rapid spectral distribution priors, which drive a KDE module that automatically constructs optimized integration contours; these contours guide the CI solver to efficiently find the desired eigenvalues. Deepcontour achieves up to a 5.63x speedup across diverse scientific datasets while maintaining strict numerical rigor.
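The Deepcontour entry turns predicted eigenvalues into integration contours via KDE. The sketch below is a guess at the simplest form of that idea for a real spectrum, not the paper's method: evaluate a Gaussian KDE of the predicted eigenvalues on a grid, keep the runs of grid points whose density exceeds a fraction of the peak, and enclose each run with a circle in the complex plane. The bandwidth, threshold, padding, and circular contour shape are all assumptions, and the ENO predictor is not modeled.

```python
import numpy as np

def kde_contours(pred_eigs, bandwidth=0.1, rel_threshold=0.2, pad=0.05, n_grid=2000):
    """Circles (center, radius) enclosing high-density regions of predicted real eigenvalues.

    Hypothetical parameters: `bandwidth` of the Gaussian kernel, `rel_threshold` as a fraction
    of peak density, and `pad` added on both sides of each high-density interval.
    """
    x = np.asarray(pred_eigs, dtype=float)
    grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, n_grid)
    # Unnormalized Gaussian KDE on the grid; normalization does not matter for a relative threshold.
    dens = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2).sum(axis=1)
    mask = dens >= rel_threshold * dens.max()
    contours = []
    i = 0
    while i < n_grid:
        if not mask[i]:
            i += 1
            continue
        j = i
        while j + 1 < n_grid and mask[j + 1]:  # extend the run of high-density grid points
            j += 1
        a, b = grid[i] - pad, grid[j] + pad
        contours.append(((a + b) / 2.0, (b - a) / 2.0))  # circle centered on the real axis
        i = j + 1
    return contours
```

Each circle could then be passed to a contour-integral eigensolver as one integration contour; a complex spectrum would need a 2-D density estimate instead.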
By merging the predictive power of deep learning with the numerical rigor of classical solvers, this work establishes an efficient and robust paradigm for solving large-scale GEPs.
Published: 2025-11-02T12:58:58Z
Authors: Yeqiu Chen, Ziyan Liu, Hong Wang, Lei Liu

http://arxiv.org/abs/2606.10896v1
Title: Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering
Updated: 2026-06-09T14:07:18Z
Abstract: We present Flash-GMM, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single GPU pass. By eliminating the need to materialize the full responsibility matrix in GPU memory, Flash-GMM achieves a $20\times$ speedup over existing implementations and enables training on datasets more than $100\times$ larger than previously feasible on one device. To demonstrate its impact, we integrate Flash-GMM into the IVF coarse quantizer for approximate nearest-neighbor (ANN) search. We show that soft GMM clustering is now a viable drop-in replacement for $k$-means, and that GMM responsibilities can be leveraged to assign border vectors to multiple clusters. Our approach reaches fixed recall targets with up to $1.7\times$ fewer distance computations, or equivalently, yields $+2$ to $+12$ recall@10 at matched computational cost. We release the kernel as an open-source project.
Published: 2026-06-09T14:07:18Z
Authors: Gal Bloch, Ariel Gera, Matan Orbach, Ohad Eytan, Assaf Toledo

http://arxiv.org/abs/2606.10890v1
Title: Optimal Post-Training Quantization Scales and Where to Find Them
Updated: 2026-06-09T14:03:04Z
Abstract: Post-training quantization (PTQ) compresses large language models by mapping weights to low-bit representations. The scaling factor that defines the quantization grid is typically chosen using simple, data-free heuristics. In this work, we present PiSO (Piecewise Scale Optimization), an algorithm that leverages calibration data to compute the optimal channel-wise weight scales exactly and efficiently under round-to-nearest quantization.
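The PiSO entry relies on the observation that, under round-to-nearest (RTN), the integer codes stay fixed between finitely many breakpoints in the scale, so on each interval the error is a quadratic with a closed-form minimizer. The sketch below applies that principle to a single weight channel with a plain squared weight-error objective and no calibration data (PiSO itself uses calibration data), assuming symmetric quantization with codes clipped to [-qmax, qmax]. It illustrates the piecewise idea only; it is not the paper's algorithm.

```python
import numpy as np

def rtn_codes(w, s, qmax):
    """Symmetric round-to-nearest integer codes, clipped to [-qmax, qmax]."""
    return np.clip(np.round(w / s), -qmax, qmax)

def rtn_error(w, s, qmax):
    return float(np.sum((w - s * rtn_codes(w, s, qmax)) ** 2))

def optimal_rtn_scale(w, bits=4):
    """Exact argmin over s > 0 of sum_i (w_i - s * q_i(s))**2 for one channel (data-free sketch)."""
    w = np.asarray(w, dtype=float)
    qmax = 2 ** (bits - 1) - 1
    mags = np.abs(w[w != 0])
    if mags.size == 0:
        return 1.0, 0.0  # all-zero channel: any scale is exact
    # A code changes only where |w_i| / s crosses k + 0.5, i.e. at s = |w_i| / (k + 0.5).
    half_steps = np.arange(qmax) + 0.5
    breaks = np.unique((mags[:, None] / half_steps[None, :]).ravel())
    edges = np.concatenate(([0.0], breaks, [np.inf]))
    best_s, best_err = None, np.inf
    for lo, hi in zip(edges[:-1], edges[1:]):
        probe = 2.0 * lo if np.isinf(hi) else 0.5 * (lo + hi)  # any interior point
        q = rtn_codes(w, probe, qmax)  # codes are constant on (lo, hi)
        a, b = float(q @ q), float(w @ q)
        # Error is the quadratic a*s^2 - 2*b*s + sum(w^2); clamp its vertex to the interval.
        s = probe if a == 0.0 else min(max(b / a, lo), hi)
        err = float(np.sum((w - s * q) ** 2))
        if err < best_err:
            best_s, best_err = s, err
    return best_s, best_err
```

Because each weight's two candidate codes give equal error exactly at a breakpoint, the error is continuous in the scale, so the best clamped vertex over all intervals is the global minimum rather than a grid-search approximation.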
PiSO partitions the scale search space into finitely many intervals on which the objective admits a closed-form minimizer. We extend PiSO to group-wise quantization via principled heuristics and propose effective strategies for interleaving scale optimization with error correction. Experiments on Llama and Qwen models across multiple model sizes and target weight bit-widths demonstrate consistent improvements in perplexity and downstream zero-shot accuracy, both standalone and combined with error correction. In particular, the benefits grow as the target bit-width decreases and quantization becomes more challenging.
Published: 2026-06-09T14:03:04Z
Authors: Juan Amboage, Pablo Monteagudo-Lago, Ian Colbert, Giuseppe Franco, Nicholas Fraser

http://arxiv.org/abs/2606.10889v1
Title: Sleep EEG Signal Criticality as a Non-Invasive Predictor of Cognitive Decline in Dementia
Updated: 2026-06-09T14:02:40Z
Abstract: Early detection of neurodegeneration remains a critical clinical challenge. This study investigates whether sleep EEG signal criticality, quantified via Multifractal Detrended Fluctuation Analysis (MFDFA), serves as a non-invasive biomarker for future cognitive decline. We analyzed longitudinal data from the Study of Osteoporotic Fractures (SOF) cohort in the National Sleep Research Resource (NSRR), comparing baseline sleep EEG dynamics between women who remained cognitively normal and those who later progressed to dementia-related impairment ($3MS < 78$). Our results reveal significant group-level differences in Hurst exponent $H(q)$ distributions, particularly during non-REM stages N2 and N3. Cognitively healthy individuals exhibited signal dynamics significantly closer to an optimally critical state across all electrode locations ($p \leqslant 0.001$), supporting the Brain Criticality Hypothesis.
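MFDFA generalizes detrended fluctuation analysis (DFA); at q = 2 it reduces to standard DFA, whose scaling exponent is the DFA exponent this entry discusses. The minimal DFA-1 sketch below integrates the mean-removed signal, removes a linear trend in non-overlapping windows of several sizes, and fits the log-log slope of the RMS fluctuation. The window sizes are illustrative assumptions; the study's preprocessing, sleep staging, and range of q are not reproduced.

```python
import numpy as np

def dfa_exponent(x, scales=(16, 32, 64, 128, 256)):
    """DFA-1 scaling exponent (the MFDFA generalized Hurst exponent at q = 2)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())  # integrated, mean-removed signal
    flucts = []
    for n in scales:
        n_win = len(profile) // n
        segs = profile[: n_win * n].reshape(n_win, n)
        t = np.arange(n)
        # Least-squares linear fit in every window at once; coeffs has shape (2, n_win).
        coeffs = np.polyfit(t, segs.T, 1)
        trend = np.outer(coeffs[0], t) + coeffs[1][:, None]
        flucts.append(np.sqrt(np.mean((segs - trend) ** 2)))  # RMS fluctuation F(n)
    slope, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return float(slope)
```

For reference, uncorrelated noise gives an exponent near 0.5, 1/f-type noise near 1.0, and a random walk near 1.5, which is the scale on which the reported shift toward 1.0 can be read.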
Supervised UMAP projections confirmed clear spatial separation between groups throughout the overnight sleep architecture. The dementia group demonstrated a shift in DFA exponents toward $1.0$, suggesting that a reconfiguration of scale-free neural dynamics during sleep precedes clinical symptoms. These findings highlight the potential for MFDFA-derived measures to be integrated into automated, sleep-based screening tools, enabling earlier preventative interventions during the prodromal window of dementia.
Published: 2026-06-09T14:02:40Z
Comment: 4 pages, 2 figures, accepted for publication in the Proc. 48th Annu. Int. Conf. IEEE EMBS (EMBC 2026), Toronto, Canada, July 20-24, 2026
Authors: Stanisław Narębski, Tomasz Komendziński, Tomasz M. Rutkowski

http://arxiv.org/abs/2602.16898v6
Title: MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation
Updated: 2026-06-09T14:01:51Z
Abstract: Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, making them fragile in dynamic settings. MALLVI presents a Multi-Agent Large Language and Vision framework that enables closed-loop, feedback-driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVI generates executable atomic actions for a robot manipulator. After action execution, a Vision Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than using a single model, MALLVI coordinates specialized agents (Decomposer, Localizer, Thinker, and Reflector) to manage perception, localization, reasoning, and high-level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection and recovery by reactivating only the relevant agents, avoiding full replanning.
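The closed-loop control flow described for MALLVI can be sketched as a plain loop over agents. In the sketch below, every agent is a hypothetical Python callable standing in for an LLM or VLM call. The names mirror the roles in the abstract, but the interfaces, the retry budget, and the Reflector's return value (a set of agent names to rerun, empty meaning success) are assumptions rather than the released code; the optional Descriptor agent is omitted.

```python
from typing import Callable, Dict, List, Set

def run_closed_loop(instruction: str, observe: Callable[[], object], agents: Dict[str, Callable],
                    execute: Callable[[dict], None], max_retries: int = 3) -> List[dict]:
    """Hypothetical MALLVI-style loop: decompose, then per step localize, think, act, reflect."""
    steps = agents["decomposer"](instruction, observe())  # list of subtasks
    executed: List[dict] = []
    for step in steps:
        rerun: Set[str] = {"localizer", "thinker"}  # the first attempt runs both
        target = action = None
        for _ in range(max_retries):
            if "localizer" in rerun or target is None:
                target = agents["localizer"](step, observe())
            if "thinker" in rerun or action is None:
                action = agents["thinker"](step, target)
            execute(action)
            # The Reflector inspects the new observation and names only the agents to rerun.
            rerun = agents["reflector"](step, observe())
            if not rerun:
                executed.append(action)
                break
        else:
            raise RuntimeError(f"step {step!r} failed after {max_retries} attempts")
    return executed
```

Rerunning only the agents the Reflector names, instead of restarting from the Decomposer, is what keeps recovery local to the failed step.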
Experiments in simulation and real-world settings show that iterative closed-loop multi-agent coordination improves generalization and increases success rates in zero-shot manipulation tasks. Code is available at https://github.com/iman1234ahmadi/MALLVI.
Published: 2026-02-18T21:28:56Z
Comment: Some fundamental changes in text and codebase
Authors: Mehrshad Taji, Arad Mahdinezhad Kashani, Iman Ahmadi, AmirHossein Jadidi, Saina Kashani, Babak Khalaj

http://arxiv.org/abs/2606.10877v1
Title: XtrAIn: Training-Guided Occlusion for Feature Attribution
Updated: 2026-06-09T13:52:05Z
Abstract: Occlusion-based attribution methods provide an intuitive way to estimate feature importance by perturbing input features and measuring the resulting change in model output. However, their reliability is strongly affected by how feature removal is implemented: externally selected baselines can introduce bias, out-of-distribution samples, and unstable explanations, while in nonlinear models the occlusion of a set of features can also alter the contribution of non-occluded features. We refer to this effect as attribution shift, since the attribution scores of the non-occluded features drift from their initial values. To address these issues, which render explanations unstable, we introduce XtrAIn, a training-guided attribution method that transfers the occlusion operation from the input space to the parameter space. Instead of replacing input values with hand-crafted baselines, XtrAIn follows the model's training trajectory and measures how feature-associated parameter updates affect the output logits. We further introduce Xstep, a lightweight approximation for reducing computational cost, and XtrAIn+, a target-focused variant that emphasizes updates aligned with the target class. Experiments on controlled image datasets and PAM50 breast-cancer subtype classification show that the proposed methods produce cleaner and more interpretable attribution patterns than standard attribution baselines.
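The attribution shift that motivates XtrAIn can be reproduced with input-space occlusion on a toy nonlinear model. In the sketch below, the model f(x) = x1·x2 + x3 and the zero baseline are illustrative choices: occluding x1 changes how much x2 appears to contribute, while the additive feature x3 keeps the same score. It demonstrates the problem only; XtrAIn's parameter-space occlusion is not implemented here.

```python
import numpy as np

def model(x):
    # Toy nonlinear model: x[0] and x[1] interact, x[2] enters additively.
    return x[0] * x[1] + x[2]

def occlusion_attribution(f, x, baseline, keep_occluded=()):
    """Score of feature j = f(x_ref) - f(x_ref with feature j set to its baseline value),
    where x_ref is x with the features in `keep_occluded` already replaced by the baseline."""
    x_ref = np.array(x, dtype=float)
    for k in keep_occluded:
        x_ref[k] = baseline[k]
    scores = {}
    for j in range(len(x_ref)):
        if j in keep_occluded:
            continue
        x_j = x_ref.copy()
        x_j[j] = baseline[j]
        scores[j] = f(x_ref) - f(x_j)
    return scores

full = occlusion_attribution(model, [2.0, 3.0, 1.0], np.zeros(3))
shifted = occlusion_attribution(model, [2.0, 3.0, 1.0], np.zeros(3), keep_occluded=(0,))
```

Here the score of feature 1 drops from 6 to 0 once feature 0 is occluded, while feature 2 stays at 1, which is the drift the abstract calls attribution shift.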
Overall, XtrAIn provides a training-aware perspective on feature attribution and offers a useful diagnostic tool for studying how feature-level evidence is formed during training.
Published: 2026-06-09T13:52:05Z
Comment: 12 pages, 7 figures, 1 table
Authors: Thodoris Lymperopoulos, Ioannis Kakogeorgiou, Denia Kanellopoulou

http://arxiv.org/abs/2606.07586v2
Title: From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs
Updated: 2026-06-09T13:49:01Z
Abstract: Spatial neural processing units (NPUs) provide an energy-efficient platform for edge LLM inference, but efficiently deploying an LLM end-to-end on such hardware remains labor-intensive. Although AI coding agents have begun to lower this cost, existing studies have largely focused on single-kernel optimization rather than end-to-end LLM deployment on resource-constrained spatial NPUs.
We present a two-stage methodology, instantiated on the AMD XDNA 2 NPU, that progresses from human-guided development to agent autonomy. In the first stage, we develop a reference deployment of Llama-3.2-1B through human-guided agent assistance. The resulting implementation achieves a speedup of 2.2x on prefill and 4.0x on decode over the hand-optimized baseline, with the optimization trajectory and its lessons recorded as structured documentation throughout. In the second stage, we distill the documentation into an agent skill system consisting of eight phases, orchestrating the optimization and debugging skill sets, with numerical correctness strictly enforced at each phase.
Using our agent skill system, we autonomously deploy eight additional decoder-only LLMs (Llama-3.2-3B, SmolLM2-1.7B, Qwen2.5-{0.5B, 1.5B, 3B}, Qwen3-{0.6B, 1.7B, 4B}) end-to-end on the AMD XDNA 2 NPU using the open-source compiler stack. To our knowledge, these models have not previously been deployed on AMD NPUs via any open-source software stack. Each deployment completes in 0.5-4 hours of agent wall time with almost no human guidance and passes the numerical-correctness gates, demonstrating functional generalization to previously unencountered LLMs. Three of the eight match or exceed the sustained performance of our Llama-3.2-1B reference deployment, suggesting that the resulting implementations can be competitive without additional model-specific human engineering.
Published: 2026-05-27T18:16:23Z
Comment: Accepted to the Machine Learning for Architecture and Systems Workshop (MLArchSys), co-located with ISCA 2026
Authors: Jiajie Li, Erwei Wang, Zhiru Zhang, Samuel Bayliss

http://arxiv.org/abs/2606.10868v1
Title: When Do Autoregressive Sequence Models Forecast Physical Wavefields? A Controlled Study on Synthetic Seismograms
Updated: 2026-06-09T13:46:14Z
Abstract: Long-horizon autoregressive forecasting of oscillatory physical signals, such as seismograms, gravitational-wave strain, and similar wavefields, is limited by error accumulation: as a causal model is fed its own outputs over hundreds of steps, small per-step errors compound into phase drift that pointwise metrics fail to detect. We ask when such rollouts stay stable, using synthetic three-component seismograms as a physically structured testbed and the SeismoGPT autoregressive forecaster as the model under study. Through controlled, intra-architecture ablations evaluated on free-running rollout with paired significance tests, we isolate the contribution of each design choice.
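This entry's two diagnostic concerns, phase drift that accumulates over a rollout and polarity inversions that a magnitude-only spectral loss cannot see, can be illustrated on a synthetic sinusoid. The sketch below assumes zero-lag normalized cross-correlation (NCC) as the similarity score and uses an FFT magnitude spectrum as a stand-in for the paper's STFT-magnitude loss; the 1 Hz signal, 100 Hz sampling, and 5% frequency error are arbitrary choices.

```python
import numpy as np

def ncc(a, b):
    """Zero-lag normalized cross-correlation of two traces, in [-1, 1]."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

fs = 100.0                              # synthetic sampling rate (Hz)
t = np.arange(1000) / fs                # 10 s of signal
truth = np.sin(2 * np.pi * 1.0 * t)
rollout = np.sin(2 * np.pi * 1.05 * t)  # 5% frequency error: phase drift grows with time
early, late = slice(0, 100), slice(800, 1000)
ncc_early = ncc(truth[early], rollout[early])  # close to 1 during the first second
ncc_late = ncc(truth[late], rollout[late])     # strongly negative after 0.4 to 0.5 cycles of drift

flipped = -truth                        # polarity inversion
ncc_flipped = ncc(truth, flipped)       # -1 up to rounding
# The magnitude spectrum is identical for a signal and its negation.
mag_gap = np.max(np.abs(np.abs(np.fft.rfft(flipped)) - np.abs(np.fft.rfft(truth))))
```

A loss built only on spectral magnitudes assigns the flipped trace zero penalty even though its NCC is -1, which is why a phase-aware objective is needed to catch that failure.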
Multi-token prediction is the dominant stabilizer, accounting for almost the entire improvement over a single-token baseline ($+0.040$ median NCC); a horizon-embedding hybrid prediction head and a cross-horizon STFT-magnitude coherence loss each add a small but consistent further gain. Performance depends sharply on a context-ratio threshold near one, roughly the full P-S interval of observed signal, below which rollout generalization collapses. The dominant residual failure is a polarity inversion that a magnitude-based spectral loss cannot, by construction, penalize, identifying phase-aware objectives as the natural next step. We frame this as a controlled study of rollout stability on oscillatory wavefields, not a benchmark of forecasting architectures.
Published: 2026-06-09T13:46:14Z
Comment: 16 pages, 5 figures, 3 tables
Authors: Waleed Esmail, Stuart Russell, Jana Klinge, Alexander Kappes, Christine Thomas

http://arxiv.org/abs/2606.10857v1
Title: Embodiment-conditioned Generalist Control for Multirotor Aerial Robots
Updated: 2026-06-09T13:36:26Z
Abstract: We present a generalist position control policy capable of controlling arbitrary multirotor configurations with a given rotor count (e.g., hexarotors or quadrotors) using a single set of network weights. The policy is conditioned on a physics-grounded embodiment descriptor: a mass- and inertia-normalized control allocation matrix that captures how mass-normalized motor thrusts generate linear and angular accelerations in the body frame. To train the policy, we sample from a broad distribution of arbitrary multirotor configurations, including non-planar and asymmetric systems, and optimize a single, compact network using Proximal Policy Optimization. Training requires only five minutes on an RTX 3090 GPU using a custom NVIDIA Warp-based dynamics simulator. Through extensive simulation experiments, we show that embodiment conditioning enables robust generalist control across arbitrary morphologies.
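The multirotor entry conditions its policy on a mass- and inertia-normalized control allocation matrix. A minimal sketch of how such a matrix can be built for arbitrary rotor positions and thrust axes follows. It assumes rotor i sits at body-frame position r_i, pushes along unit axis u_i, and adds a reaction torque kappa * d_i * thrust about that axis, where kappa is a hypothetical drag coefficient and d_i is the sign of the reaction torque. The top three rows map thrusts to linear acceleration (divided by mass m) and the bottom three to angular acceleration (premultiplied by J^-1), ignoring gyroscopic terms; the paper's exact descriptor layout may differ.

```python
import numpy as np

def allocation_descriptor(positions, axes, torque_signs, mass, inertia, kappa=0.01):
    """6 x N matrix B with [lin_acc; ang_acc] = B @ thrusts in the body frame (hypothetical layout)."""
    r = np.asarray(positions, dtype=float)                       # (N, 3) rotor positions
    u = np.asarray(axes, dtype=float)
    u = u / np.linalg.norm(u, axis=1, keepdims=True)             # (N, 3) unit thrust directions
    d = np.asarray(torque_signs, dtype=float)                    # (N,) +1 / -1 reaction-torque sign
    force_cols = u.T                                             # force per unit thrust
    torque_cols = (np.cross(r, u) + kappa * d[:, None] * u).T    # lever-arm + reaction torque
    j_inv = np.linalg.inv(np.asarray(inertia, dtype=float))
    return np.vstack([force_cols / mass, j_inv @ torque_cols])
```

For a symmetric X-quadrotor with alternating torque signs, equal thrusts produce pure vertical acceleration and zero angular acceleration; tilted or asymmetric rotor layouts only change `positions` and `axes`, which is what lets a single descriptor format cover arbitrary morphologies.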
We demonstrate zero-shot real-world transfer of this generalist policy on three diverse hexarotor systems: a planar robot, a partially symmetric non-planar system, and a random asymmetric, non-planar configuration.
Published: 2026-06-09T13:36:26Z
Authors: Orestis Konstantaropoulos, Welf Rehberg, Mihir Kulkarni, Kostas Alexis

http://arxiv.org/abs/2411.05698v3
Title: Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification
Updated: 2026-06-09T13:29:25Z
Abstract: Convolutional Neural Networks (CNNs) have shown remarkable performance in image classification. However, interpreting their predictions is challenging due to the size and complexity of these models. State-of-the-art saliency methods generate local explanations highlighting the area in the input image where a class is identified, but they cannot explain how a concept of interest contributes to the prediction. On the other hand, concept-based methods, such as TCAV, provide insights into how sensitive the network is to a human-defined concept but cannot compute its attribution in a specific prediction or show its location within the input image. We introduce Visual-TCAV, a novel explainability framework aiming to bridge the gap between these methods by providing both local and global explanations. Visual-TCAV uses Concept Activation Vectors (CAVs) to generate class-agnostic saliency maps that show where the network recognizes a certain concept. Moreover, it can estimate the attribution of these concepts to the output of any class using a generalization of Integrated Gradients. We evaluate the method's faithfulness via a controlled experiment where the ground truth for explanations is known, showing better ground-truth alignment than TCAV.
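Visual-TCAV builds on Concept Activation Vectors: a CAV is a direction in a layer's activation space that separates activations of concept examples from those of random examples (TCAV obtains it from a linear classifier). The sketch below swaps in the simpler difference-of-means direction for that classifier and scores each spatial position of a feature map by its projection onto the CAV, giving a class-agnostic concept map. The array shapes, the ReLU, and the max-normalization are assumptions, and the paper's attribution step (a generalization of Integrated Gradients) is not shown.

```python
import numpy as np

def cav_difference_of_means(concept_acts, random_acts):
    """Unit direction from random to concept activations; inputs have shape (n_examples, C)."""
    v = np.mean(concept_acts, axis=0) - np.mean(random_acts, axis=0)
    return v / np.linalg.norm(v)

def concept_map(feature_map, cav):
    """Project each spatial activation of an (H, W, C) map onto the CAV; keep positive evidence."""
    scores = np.maximum(feature_map @ cav, 0.0)  # (H, W)
    peak = scores.max()
    return scores / peak if peak > 0 else scores  # scale to [0, 1] for display
```

Upsampling the (H, W) map to the input resolution would give an overlay showing where the concept is recognized, independent of which class is being explained.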
Our code is available at https://github.com/DataSciencePolimi/Visual-TCAV.
Published: 2024-11-08T16:52:52Z
Comment: Accepted in TMLR
Authors: Antonio De Santis, Riccardo Campi, Matteo Bianchi, Marco Brambilla

http://arxiv.org/abs/2403.00420v3
Title: Robust Deep Reinforcement Learning Through Adversarial Attacks and Training: A Survey
Updated: 2026-06-09T13:25:29Z
Abstract: Deep Reinforcement Learning (DRL) is a subfield of machine learning for training autonomous agents that take sequential actions across complex environments. Despite its strong performance in well-known environments, it remains susceptible to minor variations in conditions, raising concerns about its reliability in real-world applications. To improve usability, DRL must demonstrate trustworthiness and robustness. One way to improve the robustness of DRL to unknown changes in environmental conditions and to possible perturbations is Adversarial Training: training the agent against well-suited adversarial attacks on the observations and the dynamics of the environment. Addressing this critical issue, our work presents an in-depth analysis of contemporary adversarial attack and training methodologies, systematically categorizing them and comparing their objectives and operational mechanisms.
Published: 2024-03-01T10:16:46Z
Comment: 83 pages, 17 figures, 3 tables, 15 algorithms
Authors: Lucas Schott, Josephine Delas, Hatem Hajri, Elies Gherbi, Reda Yaich, Nora Boulahia-Cuppens, Frederic Cuppens, Sylvain Lamprier

http://arxiv.org/abs/2606.10835v1
Title: Geometrically Averaged Hard Target Updates for Linear Q-Learning
Updated: 2026-06-09T13:24:36Z
Abstract: Periodic hard target updates are among the most common stabilization devices in modern deep Q-learning. Recent studies suggest that target updates can improve stability in Q-learning with function approximation, including linear function approximation. We introduce and analyze the so-called $\lambda$-target update, obtained by averaging the $m$-periodic target update maps ($m = 1, 2, \dots$) with $\lambda$-geometric weights $(1-\lambda)\lambda^{m-1}$, $\lambda \in [0,1]$.
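The λ-target update averages the m-periodic target update maps with weights (1-λ)λ^(m-1), m = 1, 2, .... For λ in [0, 1) these weights sum to 1, put all their mass on m = 1 at λ = 0, and have mean period 1/(1-λ), which grows without bound as λ approaches 1. The sketch below checks these facts numerically and forms a truncated λ-average of an arbitrary family of maps. The function `period_map(m, theta)` is a hypothetical stand-in for the paper's m-periodic update, which is not reproduced.

```python
import numpy as np

def lambda_weights(lam, m_max):
    """Geometric weights (1 - lam) * lam**(m - 1) for m = 1..m_max."""
    m = np.arange(1, m_max + 1)
    return (1.0 - lam) * lam ** (m - 1)

def lambda_target_map(period_map, theta, lam, m_max=200):
    """Truncated lambda-average sum_m w_m * period_map(m, theta); weights renormalized after truncation."""
    w = lambda_weights(lam, m_max)
    w = w / w.sum()
    return sum(w_m * period_map(m, theta) for m, w_m in zip(range(1, m_max + 1), w))

w = lambda_weights(0.9, 2000)
mean_period = float((np.arange(1, 2001) * w).sum())  # approximately 1 / (1 - 0.9)
```

Truncating at `m_max` and renormalizing is a practical shortcut for illustration; the neglected tail mass is lam**m_max.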
The endpoint $\lambda = 0$ recovers the one-period target update, while the limit $\lambda \uparrow 1$ recovers projected Q-value iteration. We study this mechanism for Q-learning with linear function approximation, namely linear Q-learning, using a switching-system model and related tools. For clarity, the paper treats a deterministic version; the formulation extends to stochastic reinforcement-learning settings.
Published: 2026-06-09T13:24:36Z
Authors: Donghwan Lee

http://arxiv.org/abs/2605.30292v2
Title: Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series
Updated: 2026-06-09T13:11:00Z
Abstract: Conformal prediction methods enjoy strong theoretical and empirical predictive inference performance, provided the data is exchangeable and is treated symmetrically during training. However, these assumptions are impractical in many settings, such as time series, where temporal dependence violates exchangeability and it is preferable to use predictors that leverage dependence by treating data asymmetrically. Recent work shows that split conformal prediction is robust to these issues, but sample splitting can reduce accuracy, motivating the study of methods that do not rely on data splitting in the time series setting.
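The leave-one-out jackknife discussed in this entry builds a prediction interval from leave-one-out residuals; the leave-a-window-out (LWO) modification, judging by its name, presumably holds out a window of neighbouring time points instead of a single point. The sketch below is one plausible reading of that idea for an AR(1) least-squares predictor, not the paper's definition: when scoring pair i, it refits without pair i and the `half_window` pairs on each side, so correlated neighbours cannot leak into the held-out fit. The window shape, the AR(1) model, and the quantile rule are assumptions, and none of the paper's coverage guarantees are claimed for this code.

```python
import numpy as np

def fit_ar1(y_prev, y_next):
    """Least-squares slope and intercept for y_t = a * y_{t-1} + b."""
    X = np.column_stack([y_prev, np.ones_like(y_prev)])
    coef, *_ = np.linalg.lstsq(X, y_next, rcond=None)
    return coef

def window_jackknife_interval(y, alpha=0.1, half_window=0):
    """Interval for the next value after y; half_window=0 gives the vanilla jackknife."""
    y = np.asarray(y, dtype=float)
    x, target = y[:-1], y[1:]  # pairs (y_{t-1}, y_t)
    n = len(target)
    residuals = np.empty(n)
    for i in range(n):
        keep = np.abs(np.arange(n) - i) > half_window  # drop pair i and its neighbours
        a, b = fit_ar1(x[keep], target[keep])
        residuals[i] = abs(target[i] - (a * x[i] + b))
    a, b = fit_ar1(x, target)  # full-data fit for the point forecast
    k = int(np.ceil((1 - alpha) * (n + 1)))
    q = np.inf if k > n else np.sort(residuals)[k - 1]  # conformal-style empirical quantile
    center = a * y[-1] + b
    return center - q, center + q
```

Setting `half_window=0` reproduces the standard jackknife interval, so the two variants can be compared on the same series by changing a single argument.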
In this work, we show that the vanilla leave-one-out jackknife can suffer arbitrary loss of coverage even in canonical time series models with mild temporal dependence. As a remedy, we propose a modification tailored to such settings, which we term the leave-a-window-out (LWO) method, and show that it can achieve valid coverage provided that the model-fitting procedure satisfies mild stability properties. Our proofs quantify the degree to which the data departs from cyclic exchangeability, using new coefficients that we introduce for this purpose. Experiments on time series demonstrate that our method often enjoys valid coverage when the vanilla jackknife fails to cover, while producing much narrower intervals than split conformal prediction.
Published: 2026-05-28T17:41:12Z
Comment: 40 pages, 8 figures
Authors: Hanyang Jiang, Rina Foygel Barber, Ashwin Pananjady, Yao Xie

http://arxiv.org/abs/2606.10825v1
Title: MODIP: Efficient Model-Based Optimization for Diffusion Policies
Updated: 2026-06-09T13:09:21Z
Abstract: Diffusion policies (DPs) have emerged as expressive policy representations for robot learning, often used with imitation learning methods such as behavioral cloning (BC). However, their success has largely been confined to BC: direct reinforcement learning (RL) fine-tuning remains challenging because actions are generated through a multi-step denoising process. In this work, we propose MODIP, a framework for the offline-to-online fine-tuning of DPs. Rather than directly applying RL to the DPs, MODIP leverages a world model (WM) to guide policy adaptation while keeping the simplicity and stability of BC. We use model predictive control (MPC) to generate high-quality trajectories within the WM and use them as supervised targets for fine-tuning the DP. To make MPC planning efficient, MODIP uses a terminal state value instead of a policy-dependent state-action value, reducing inference time. Additionally, MODIP trains critics with policy-independent TD targets, reducing training time.
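MODIP plans inside a world model with MPC and scores each candidate plan with a terminal state value V(s_H) rather than a policy-dependent state-action value. A minimal random-shooting version of that scoring rule is sketched below. The dynamics, reward, and value callables are hypothetical stand-ins for the learned world model and critic, and the paper's actual planner and its coupling to the diffusion policy are not reproduced; in the MODIP setting, the resulting planned actions would serve as supervised targets for the policy.

```python
import numpy as np

def plan_first_action(state, dynamics, reward, value, horizon=5, n_samples=256,
                      action_low=-1.0, action_high=1.0, discount=0.99, rng=None):
    """Random-shooting MPC: score = discounted model rewards + discounted terminal V(s_H)."""
    rng = np.random.default_rng() if rng is None else rng
    plans = rng.uniform(action_low, action_high, size=(n_samples, horizon))
    scores = np.zeros(n_samples)
    for k in range(n_samples):
        s, ret = state, 0.0
        for h in range(horizon):
            ret += discount ** h * reward(s, plans[k, h])
            s = dynamics(s, plans[k, h])  # roll the (world) model forward
        scores[k] = ret + discount ** horizon * value(s)  # terminal value; no Q(s, a) needed
    return plans[np.argmax(scores), 0]
```

Because the value is evaluated only at the final imagined state, scoring a plan needs one critic call per candidate rather than a policy query at every step, which is the inference saving the abstract points to.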
Experiments on D4RL (MuJoCo, Kitchen) and RoboMimic tasks show that MODIP improves diffusion policies beyond BC and is competitive with or outperforms diffusion policy RL fine-tuning methods and strong model-based baselines such as TD-MPC2.
Published: 2026-06-09T13:09:21Z
Authors: Zakariae El Asri, Philippe Gratias-Quiquandon, Nicolas Thome, Olivier Sigaud