arXiv API query feed: https://arxiv.org/api/MJIO6FjikmU6Z3tzlJriVLqFaFg (updated 2026-06-10T00:40:13Z)

Decomposing Ensemble Spread in Lorenz '96 With Learned Stochastic Parameterizations
http://arxiv.org/abs/2605.22242v2
Updated: 2026-05-24T10:07:29Z

Weather and climate forecasts are inherently uncertain due to chaotic dynamics, imperfect initial conditions, and incomplete representation of the underlying physical processes. Operational ensemble forecasts aim to represent these uncertainties through forecast spread, yet many approaches yield underdispersive estimates, with spread that grows too slowly relative to forecast error. Using the two-scale Lorenz 1996 system as a widely used, controlled testbed, we design a systematic approach to disentangle intrinsic variability, initial-condition perturbations, and stochastic model uncertainty. We compare multiple ensemble configurations and parameterization strategies, including existing deterministic and autoregressive as well as novel Bayesian and flow-based approaches. Our results show that ensemble perturbations do not increase the system's long-term variance; rather, they regulate how rapidly trajectories decorrelate and explore the invariant measure. Stochastic parameterizations, particularly those with temporally persistent structure, enhance early spread growth and improve spread-error consistency. Overall, we bring clarity to how different sources of uncertainty interact in a chaotic system and provide guidance for the design and evaluation of stochastic parameterizations in weather and climate models.

Published: 2026-05-21T09:48:10Z
Authors: Birgit Kühbacher, Daan Crommelin, Niki Kilbertus

RealBench: Benchmarking Data-Driven Numerical Weather Forecasting Under Operational Conditions and Extreme Event Challenges
http://arxiv.org/abs/2605.24945v1
Updated: 2026-05-24T08:46:17Z

Accurate evaluation of weather forecasting models is critical for their reliable deployment in real-world applications.
However, existing benchmarks predominantly rely on reanalysis products such as ERA5, which are generated through delayed data assimilation and do not reflect the constraints of real-time operational forecasting, thereby resulting in a systematic mismatch between benchmark performance and real-world forecasting. In this work, we introduce RealBench, a next-generation benchmark for AI weather forecasting that emphasizes realistic evaluation under operational conditions. RealBench features a strictly out-of-distribution test set spanning 2025 to eliminate data leakage and capture recent atmospheric regimes. It integrates multiple data sources, including low-latency operational analysis and a large-scale global in-situ observation dataset comprising over 10,000 stations, enabling direct evaluation against real atmospheric measurements. Beyond standard global metrics, RealBench provides a comprehensive evaluation framework for high-impact extreme events, including heatwaves, cold surges, and tropical cyclones, using event-specific metrics that better reflect real-world forecasting priorities. The evaluation results reveal substantial discrepancies between reanalysis-based metrics and real-world performance, particularly concerning extreme events. By highlighting the limitations of existing benchmarks, this work establishes a more faithful and operationally relevant evaluation paradigm, providing a rigorous foundation for advancing next-generation AI weather forecasting systems. 
The benchmark implementation is available at: https://github.com/lixruize-del/NWP-Benchmark.

Published: 2026-05-24T08:46:17Z
Comment: 35 pages, 22 figures
Authors: Ruize Li, Zhibin Wen, Tao Han, Hao Chen, Fenghua Ling, Wei Zhang, Song Guo, Lei Bai

Exascale Hybrid Numerical-AI Ensembles for Operational Flood-Season Forecasting in East Asia: 15-km Decadal Hindcasts and 1-km High-Resolution Capability
http://arxiv.org/abs/2605.24896v1
Updated: 2026-05-24T06:52:51Z

Seasonal forecasting of summer rainfall in East Asia remains a grand challenge, as predictability at 3 to 6 month lead times is constrained by the spring predictability barrier, weak large-scale signals, and localized nonlinear convective extremes. We address this challenge with CAPES, which integrates a kilometer-resolution coupled regional model with atmosphere, land, and ocean components and a data-driven AI seasonal forecasting system. At 15 km resolution, the fused workflow combines 174 numerical members from varying start times, physics schemes, and parameter perturbations with 1,600 AI members generated from initial and physical perturbations. Using the full LineShine system, CAPES completes ten annual 1,774-member hindcasts for 2016 to 2025 within 14.6 hours, improving the mean prediction score from ECMWF's 71.8 to 75.9 and delivering a major gain in operational forecasting capability.
The 1-km configuration further enables fine-scale typhoon simulation and establishes the feasibility of kilometer-scale fused ensemble forecasting on a one-week timescale.

Published: 2026-05-24T06:52:51Z
Comment: 12 pages, 14 figures, 5 tables
Authors: Mengxuan Chen, Yunpu Xu, Qiuyan Sun, Han Zhang, Jiayi Lai, Zheng Zhou, Juepeng Zheng, Hongsong Meng, Nan Wei, Jinxiao Zhang, Xiongchuan Tan, Haodong Bian, Yinan Cai, Ge Yang, Fang Wang, Yunyun Liu, Conghui He, Runmin Dong, Lanning Wang, Yutong Lu, Yongjiu Dai, Haohuan Fu

Approximating the universal thermal climate index using sparse regression with orthogonal polynomials
http://arxiv.org/abs/2508.11307v3
Updated: 2026-05-23T15:47:53Z

The Universal Thermal Climate Index (UTCI) is a measure of thermal comfort that quantifies how humans experience environmental conditions. Due to its robustness and versatility as a bioclimatic indicator, it has been extensively employed across a wide range of studies in bioclimatology and is increasingly used as an operational measure of outdoor thermal comfort. Calculating the UTCI value from the relevant environmental parameters is nominally not straightforward, which is why a 6th-degree polynomial approximation has become the standard way to calculate UTCI values. Although it is computationally efficient, the error of this polynomial approximation can be substantial. The goal of this study was to develop an improved version of the polynomial approximation: one that retains comparable computational efficiency but is more numerically stable and substantially more accurate, particularly in reducing the frequency of larger errors. This goal was achieved using sparse orthogonal regression, namely sparse regression with an orthogonal polynomial basis, which not only substantially reduces the average errors (i.e., the mean error, the mean absolute error, and the root mean square error) but also drastically reduces the frequency of large errors.
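Sparse regression with an orthogonal polynomial basis can be illustrated on a one-dimensional toy problem. This is a sketch under stated assumptions, not the authors' code: it uses Legendre polynomials (orthogonal on [-1, 1]) and swaps in sequentially thresholded least squares (STLSQ) as the sparsity mechanism, because the abstract does not name the sparse solver. The real UTCI fit involves several input variables; the function names, the toy target, and the threshold are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def legendre_basis(x: float, degree: int) -> List[float]:
    """Evaluate Legendre polynomials P_0..P_degree at x (orthogonal on [-1, 1])."""
    values = [1.0, x]
    # Bonnet recurrence: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
    for n in range(1, degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: degree + 1]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sparse_legendre_fit(
    xs: List[float], ys: List[float], degree: int, threshold: float, max_iter: int = 10
) -> List[float]:
    """Sequentially thresholded least squares (STLSQ) on a Legendre basis."""
    rows = [legendre_basis(x, degree) for x in xs]
    active = list(range(degree + 1))
    coef = [0.0] * (degree + 1)
    for _ in range(max_iter):
        # Least-squares refit restricted to the active terms (normal equations).
        ata = [[sum(r[i] * r[j] for r in rows) for j in active] for i in active]
        aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in active]
        coef = [0.0] * (degree + 1)
        for idx, value in zip(active, solve_linear(ata, aty)):
            coef[idx] = value
        # Prune terms whose coefficients fall below the threshold, then refit.
        kept = [i for i in active if abs(coef[i]) >= threshold]
        if kept == active:
            break
        active = kept
    return coef


# Toy target: y = 0.5 * P_0(x) + 2 * P_3(x) = 0.5 + 5x^3 - 3x on a uniform grid.
xs = [-1.0 + 2.0 * i / 40 for i in range(41)]
ys = [0.5 + 5.0 * x**3 - 3.0 * x for x in xs]
coef = sparse_legendre_fit(xs, ys, degree=5, threshold=1e-6)
print([round(c, 6) for c in coef])  # [0.5, 0.0, 0.0, 2.0, 0.0, 0.0]
```

Because the basis is orthogonal, each retained coefficient can be read as the weight of one independent "mode", which is what makes thresholding small coefficients a natural way to trade accuracy for model size.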
By leveraging Legendre polynomial bases, approximation models could be constructed that efficiently populate a Pareto front of accuracy versus complexity and exhibit stable, hierarchical coefficient structures across varying model capacities. Training the new approximation models on only 20% of the data and testing them on the remaining 80% demonstrates successful generalization, and the results are robust under bootstrapping. The decomposition effectively approximates the UTCI as a Fourier-like expansion in an orthogonal basis, yielding results near the theoretical optimum in the L2 (least squares) sense.

Published: 2025-08-15T08:22:01Z
Comment: Final peer-reviewed version of the manuscript
Journal reference: Geoscientific Model Development 19, 4319-4330 (2026)
Authors: Sabin Roman, Ljupco Todorovski, Saso Dzeroski, Gregor Skok
DOI: 10.5194/gmd-19-4319-2026

Weak 21st-century AMOC response to Greenland meltwater in a strongly eddying ocean model
http://arxiv.org/abs/2602.17235v2
Updated: 2026-05-23T15:01:36Z

Climate models project that the Atlantic Meridional Overturning Circulation (AMOC) will weaken in the 21st century, but the magnitude is highly uncertain. Some of this uncertainty is structural, as most climate models neglect increasing meltwater from the Greenland ice sheet and do not explicitly capture mesoscale ocean eddies. Here, we quantify the impact of Greenland meltwater on the AMOC until 2100 under SSP5-8.5 forcing for the first time in a strongly eddying (1/10° horizontal resolution) ocean model. The meltwater-induced additional AMOC weakening is small (0.6 $\pm$ 0.2 Sv) compared to the weakening due to warming alone, and similar at high and low resolution. The same meltwater would cause a stronger AMOC weakening under present-day climate conditions. We link both resolution-independence and state-dependence to large-scale controls of the AMOC.
Our results demonstrate that the background ocean state is more important than resolution in determining how Greenland meltwater affects the AMOC.

Published: 2026-02-19T10:30:05Z
Authors: Oliver Mehling, Henk A. Dijkstra

JAX-SCM v1.0: a modern atmospheric single-column model for boundary layer research
http://arxiv.org/abs/2605.24544v1
Updated: 2026-05-23T12:24:53Z

We present JAX-SCM v1.0, an open-source atmospheric single-column model for boundary layer research, implemented in Python using the JAX computing library. The model solves for horizontal wind, potential temperature, and specific humidity, combined with prognostic turbulent kinetic energy and turbulent statistics parameterized by the Mellor-Yamada-Nakanishi-Niino level-2.5 (MYNN-2.5) turbulence closure. We verify the implementation against three well-established benchmark cases covering neutral (turbulent Ekman layer), stable (GABLS1), and convective (Wangara Day 33) conditions. Close agreement with reference solutions is demonstrated across all regimes. By building on JAX, the model benefits from just-in-time compilation and native GPU support. While JAX-SCM is not yet fully differentiable, basing it on JAX also lays the foundation for future integration with machine learning components. The model is designed for simplicity and modularity, lowering the barrier to entry for users and developers alike.

Published: 2026-05-23T12:24:53Z
Authors: Maximilian Pierzyna

Plume Segmentation from MethaneSAT with Cross-Sensor Transfer Learning and Physics-Informed Postprocessing
http://arxiv.org/abs/2605.24273v1
Updated: 2026-05-22T22:53:57Z

Automated detection and masking of individual methane plumes from satellite imagery is important for operational emission attribution and quantification. We present a machine learning framework for plume detection from MethaneSAT-retrieved column-averaged dry-air mole fractions of methane. We address two core challenges: the scarcity of labeled MethaneSAT data and the need for inference reliability across diverse atmospheric and surface conditions.
We first demonstrate that Mask R-CNN with a ResNet-50 backbone outperforms U-Net semantic segmentation on both MethaneAIR (an airborne version of MethaneSAT) and MethaneSAT data, with pixel-level F1 score gains of 10.49 and 5.48, respectively. To address MethaneSAT data scarcity, we evaluate three cross-sensor transfer strategies leveraging MethaneAIR flights and synthetic plumes. Mask R-CNN with ResNet-50 fine-tuned from MethaneAIR pre-trained weights is the most effective strategy, achieving instance-level precision of 0.60 and a near-perfect recall of 0.98 at the baseline operating point. A physics-informed post-processing pipeline then refines these detections in two operationally distinct modes. The first is a high-sensitivity mode that applies morphological filtering and proximity-based merging for comprehensive emission screening, achieving precision of 0.71 and recall of 0.94. The second is a high-precision mode that additionally applies a distribution-based classifier for confident source attribution, achieving precision of 0.92 and recall of 0.70. Manual review of detections classified as false positives against our wavelet-based ground truth labels reveals that a meaningful fraction of these cases correspond to real methane enhancements excluded by conservative labeling criteria, indicating that the reported precision values are lower bounds on true detection performance. Our data and code are available at: https://doi.org/10.7910/DVN/FR959H

Published: 2026-05-22T22:53:57Z
Comment: 35 pages, 20 figures, 9 tables
Authors: Manuel Pérez-Carrasco, Maya Nasr, Zhan Zhang, Apisada Chulakadabba, Javier Roger, Raia Ottenheimer, Sébastien Roche, Maryann Sargent, Chris Chan Miller, Daniel Varon, Jack Warren, Luis Guanter, Kang Sun, Jonathan Franklin, Jia Chen, Cecilia Garraffo, Xiong Liu, Ritesh Gautam, Steven Wofsy

Atmosphere as a steam engine
http://arxiv.org/abs/2605.23875v1
Updated: 2026-05-22T17:34:17Z

Earth's atmosphere operates a steam cycle in which water vapor evaporates from the surface, expands, condenses, and returns as precipitation.
The Clausius-Clapeyron law relates the incremental expansion work of saturated water vapor to latent heat converted at a Carnot efficiency corresponding to the temperature difference between evaporation and condensation. We generalize this relation to an atmospheric column with condensation occurring over a range of heights and derive the expansion work per mole of precipitated water. This includes the gravitational work associated with lifting moist air to the mean condensation height, the expansion work generated by condensation, and a correction for incomplete condensation. Using GPCP v3.3 precipitation and observational constraints on condensation height, we estimate the global steam-engine power as $W_v=4.4\pm0.9$ W/m2, close to an independent estimate of total atmospheric power, $W=W_P+W_K\simeq4.3\pm0.6$ W/m2, obtained from the gravitational power of precipitation and kinetic energy generation by horizontal pressure gradients diagnosed from MERRA-2. Kinetic energy generation is $W_K\simeq3.2\pm0.3$ W/m2, of which at least two thirds is generated in the lower atmosphere. The smaller upper-atmospheric contribution, dominated by temperature-related pressure gradients, is comparable to Lorenz available potential energy generation. The agreement between steam-engine and atmospheric power is linked to condensation and precipitation fallout. By removing water from the atmospheric gas phase and enabling column-mass redistribution, precipitation maintains surface pressure gradients that drive cross-isobaric flow in the frictional lower atmosphere. 
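As a consistency check on the figures quoted above, the central values alone (uncertainties are not propagated) give the gravitational power of precipitation implied by $W=W_P+W_K$ and bound the lower- and upper-atmospheric shares of kinetic energy generation. The superscripts "lower" and "upper" are labels introduced here for illustration:

$$
\begin{aligned}
W_P &= W - W_K \simeq 4.3 - 3.2 = 1.1\ \text{W/m}^2,\\
W_K^{\text{lower}} &\gtrsim \tfrac{2}{3}\,W_K \simeq \tfrac{2}{3}\times 3.2 \approx 2.1\ \text{W/m}^2,\\
W_K^{\text{upper}} &\lesssim \tfrac{1}{3}\,W_K \approx 1.1\ \text{W/m}^2,\\
W_v / W &\simeq 4.4 / 4.3 \approx 1.02 .
\end{aligned}
$$

The steam-engine estimate therefore exceeds the independent total by only about 2% in central value, well within the quoted uncertainties.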
The steam-engine framework thus provides a thermodynamic basis for condensation-induced atmospheric dynamics and identifies a major lower-atmospheric power pathway associated with water phase transitions.

Published: 2026-05-22T17:34:17Z
Comment: 33 pages, 8 figures, 2 tables
Authors: Anastassia Makarieva, Andrei Nefiodov

Radiosonde-constrained reconstructions reveal a weakening Northern Hadley circulation
http://arxiv.org/abs/2503.05331v3
Updated: 2026-05-22T16:08:46Z

The Northern Hadley cell (NHC) is a fundamental component of Earth's atmospheric circulation, governing precipitation patterns affecting nearly four billion people. Despite its importance, the sign of recent multidecadal trends in NHC strength remains unresolved. Climate models consistently simulate a weakening, whereas reanalyses have suggested an opposing strengthening. Here, we constrain this discrepancy using the global radiosonde record. To assess the NHC, we reconstruct the three-dimensional meridional wind from sparse radiosonde observations using a masked autoencoder graph neural network and apply an identical reconstruction to five modern reanalyses, sampled at the same locations. This paired reconstruction framework reveals a systematic underestimation of climatological NHC strength across all reanalyses, corroborated in ERA5 by systematic data assimilation increments that persistently strengthen the circulation. Most importantly, our radiosonde-based reconstructions provide vertically resolved observational evidence of a statistically significant weakening of the NHC since 1980, reconciling observations with climate model projections. The weakening is consistently reproduced by all reanalysis-based reconstructions and is robust across training datasets and analysis periods, strengthening confidence in projected changes in the Hadley circulation.
More broadly, this study establishes a temporally homogeneous reconstruction framework for evaluating large-scale circulation changes and assessing both reanalysis products and climate model projections.

Published: 2025-03-07T11:15:53Z
Comment: 44 pages (18+10+16), 31 figures (5+10+16). The study has been extended by introducing and applying a novel masked autoencoder graph neural network (MAE-GNN) to reconstruct full meridional wind fields from radiosonde observations and their reanalysis equivalents. The manuscript has been extensively revised, with results, conclusions, and figures updated. It is under revision at Nature Communications.
Authors: Matic Pikovnik (Faculty of Mathematics and Physics, University of Ljubljana, Jadranska 19, 1000 Ljubljana, Slovenia), Žiga Zaplotnik (European Centre for Medium-Range Weather Forecasts, Robert-Schuman-Platz 3, 53175 Bonn, Germany; Faculty of Mathematics and Physics, University of Ljubljana, Jadranska 19, 1000 Ljubljana, Slovenia)

The physics of AI weather models
http://arxiv.org/abs/2605.23778v1
Updated: 2026-05-22T15:43:56Z

Could it be that AI weather models are solving physical equations, although they may not be the equations used by conventional NWP models? We compute forecast-skill correlations and Centered Kernel Alignment (CKA), providing evidence that different AI weather models represent the atmosphere in similar ways, despite differences in architecture and capacity. We argue that the architecture and training of the AI models constrain the form of the physical laws that they might simulate. In particular, we propose that the models implement a particle description of the atmosphere, where the latent variables at each mesh point correspond to the position of a particle in the high-dimensional latent space. We hypothesize that the movement of the particles follows a gradient flow in the latent space towards a minimum of a learned free energy functional.
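Centered Kernel Alignment compares two representations of the same set of inputs, for example latent vectors from two models evaluated at the same mesh points. The abstract does not state which CKA variant is used; the minimal pure-Python sketch below assumes the common linear form, $\mathrm{CKA}(X,Y)=\|Y^\top X\|_F^2 / (\|X^\top X\|_F\,\|Y^\top Y\|_F)$ on column-centered matrices. The helper names and example matrices are made up for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean (rows = samples, columns = features)."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in m]


def cross_gram_frobenius_sq(a: Matrix, b: Matrix) -> float:
    """Return ||A^T B||_F^2 for A (n x p) and B (n x q) over the same n samples."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear Centered Kernel Alignment between two representations of the same samples."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_frobenius_sq(xc, yc)
    denominator = math.sqrt(cross_gram_frobenius_sq(xc, xc) * cross_gram_frobenius_sq(yc, yc))
    return numerator / denominator


# Rows are samples (e.g. mesh points); columns are latent channels of one model.
x = [[1.0, 2.0], [3.0, 1.0], [0.0, 5.0], [2.0, 2.0]]
# Same representation with channels swapped and rescaled by 3.
y = [[3.0 * r[1], 3.0 * r[0]] for r in x]
print(round(linear_cka(x, y), 6))  # 1.0
```

Linear CKA is invariant to orthogonal transformations (such as channel permutations) and to isotropic scaling, so two models can score near 1 even when their latent channels are arranged differently. That property is what makes it usable for comparing models with different architectures and widths.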
Analysis of the GraphCast and Aurora models shows that they make changes on large spatial scales in the early processor layers and move to smaller scales with increasing layer depth, consistent with the gradient flow hypothesis.

Published: 2026-05-22T15:43:56Z
Authors: George Craig, Tobias Selz, Matthias Beylich, Kirsten I. Tempest

Precipitation diffusion downscaling and application to out-of-distribution simulations with and without stratospheric aerosol injection
http://arxiv.org/abs/2605.23776v1
Updated: 2026-05-22T15:43:25Z

Stratospheric aerosol injection (SAI), a possible climate engineering strategy in which reflective particles are injected into the stratosphere, has been explored as a way to mitigate global warming and its associated risks, such as the intensification of extreme precipitation events. However, the current Earth system models (ESMs) that are often used to simulate SAI and other climate change scenarios are too coarse to properly assess such risks. Traditional statistical downscaling methods, used to project higher-resolution impacts, may be biased and unrealistic. To address this, we train a deep learning diffusion downscaler to generate 0.25° contiguous United States (CONUS) daily precipitation using historical and future climate simulations from the Mesoscale Atmosphere-Ocean Interaction in Seasonal-to-Decadal Climate Prediction (MESACLIP) project, and then apply the diffusion downscaler to out-of-distribution CESM2 simulations with and without SAI. The diffusion model generates realistic downscaled precipitation using either MESACLIP or CESM2 inputs. It also faithfully recreates the climate change projections of extreme precipitation in MESACLIP. Diffusion-downscaled projections of the future CESM2 SAI scenarios suggest that SAI could nearly halve the CONUS-average increase in annual maximum precipitation, compared to the non-SAI scenario.
However, there is considerable regional variation and internal variability: SAI is modeled to only slightly reduce increases in extreme precipitation frequency in the Mid-Atlantic and the Pacific Northwest, but to mitigate most of the intensification in other regions. Future application of diffusion downscaling to a wider variety of SAI scenarios would provide valuable insight into how proposed SAI strategies may affect precipitation variability on fine spatial scales for regional impact assessments.

Published: 2026-05-22T15:43:25Z
Authors: Cameron Dong, James W. Hurrell, Elizabeth A. Barnes

Non-stationary time series attribution for heatwaves over Europe
http://arxiv.org/abs/2601.05841v2
Updated: 2026-05-22T11:20:13Z

The increasing occurrence of extreme weather events since the beginning of the 21st century has led to the development of new methods to attribute extreme events to anthropogenic climate change. The way in which the extreme event is defined has a major influence on the attribution result. A frequently overlooked aspect concerns the temporal dependence and the clustering of extremes. This study presents an approach for attributing complete time series during extreme events to anthropogenic forcing. The approach is based on a non-stationary Markov process that uses bivariate extreme value theory to model the temporal dependence of the time series. We calculate the likelihood ratio of an observational time series from ERA5 given the distributions estimated from CMIP6 simulations under historical natural-only forcing and under combined natural and anthropogenic forcing. The spatial fields are condensed using the extremal pattern index (EPI), a compact description of spatial extremes. In addition, the study examines the extent to which attribution statements about the occurrence of extreme heat events change when the effect of the mean warming is eliminated.
The resulting attribution statement provides very strong evidence for the scenario with anthropogenic drivers over Europe, especially since the beginning of the 21st century. For central and southern Europe, the influence of anthropogenic greenhouse gas emissions on heatwaves could already have been demonstrated in the 1970s using today's methods. Apart from a general increase in temperature, there is no reliable signal, either in the temporal dependence of extreme heat days or in the shape of the extreme value distribution.

Published: 2026-01-09T15:14:25Z
Comment: 41 pages, 23 figures. v2 (revised version) that has been submitted to Advances in Statistical Climatology, Meteorology and Oceanography (ASCMO)
Authors: Pascal Meurer, Sebastian Buschow, Svenja Szemkus, Petra Friederichs

Enabling High-Accuracy Data Assimilation with Limited Ensembles via Machine Learning-Based Covariance Correction
http://arxiv.org/abs/2605.11639v2
Updated: 2026-05-22T09:51:48Z

Data assimilation (DA) integrates numerical model forecasts with observations to obtain an optimal state estimate. Ensemble-based methods, such as the ensemble Kalman filter (EnKF), are widely used for state estimation in high-dimensional and nonlinear dynamic systems. However, their performance strongly depends on the ensemble size, which creates a trade-off between analysis accuracy and computational cost. To address this problem, this study presents a machine learning-based EnKF framework that maintains high accuracy with a relatively small ensemble size. Specifically, a multilayer perceptron (MLP) is built to predict the difference between the forecast error covariances estimated from a limited ensemble and from a sufficiently large ensemble, with the latter assumed to be an accurate approximation of the underlying truth.
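For reference, the EnKF analysis step that such a covariance correction feeds into can be sketched for a single directly observed state component. This is a plain stochastic (perturbed-observation) EnKF, not the authors' implementation: the optional `scale` matrix is a hypothetical stand-in for an element-wise correction factor, chosen by hand here rather than predicted by an MLP, and all names are illustrative.

```python
from typing import List, Optional

Matrix = List[List[float]]


def sample_covariance(ensemble: Matrix) -> Matrix:
    """Unbiased sample covariance (rows = ensemble members, columns = state variables)."""
    n, d = len(ensemble), len(ensemble[0])
    mean = [sum(member[j] for member in ensemble) / n for j in range(d)]
    anomalies = [[member[j] - mean[j] for j in range(d)] for member in ensemble]
    return [[sum(a[i] * a[j] for a in anomalies) / (n - 1) for j in range(d)] for i in range(d)]


def enkf_update(
    ensemble: Matrix,
    obs_index: int,
    y_obs: float,
    obs_var: float,
    obs_perturbations: List[float],
    scale: Optional[Matrix] = None,
) -> Matrix:
    """Stochastic EnKF analysis for one directly observed state component.

    `scale` (hypothetical) multiplies the forecast covariance element-wise
    before the Kalman gain is formed; None gives the plain EnKF.
    """
    d = len(ensemble[0])
    p = sample_covariance(ensemble)
    if scale is not None:
        p = [[p[i][j] * scale[i][j] for j in range(d)] for i in range(d)]
    # With H selecting component k: P H^T is column k of P, and H P H^T = P[k][k].
    k = obs_index
    gain = [p[i][k] / (p[k][k] + obs_var) for i in range(d)]
    analysis = []
    for member, eps in zip(ensemble, obs_perturbations):
        innovation = (y_obs + eps) - member[k]  # perturbed-observation innovation
        analysis.append([member[i] + gain[i] * innovation for i in range(d)])
    return analysis


ensemble = [[1.0, 2.0], [3.0, 4.0]]
plain = enkf_update(ensemble, 0, 2.0, 2.0, [0.0, 0.0])
print(plain)  # [[1.5, 2.5], [2.5, 3.5]]
# A diagonal scale zeroes the cross-covariance, so the unobserved component is unchanged.
local = enkf_update(ensemble, 0, 2.0, 2.0, [0.0, 0.0], scale=[[1.0, 0.0], [0.0, 1.0]])
print(local)  # [[1.5, 2.0], [2.5, 4.0]]
```

The example shows why the covariance estimate matters: with only two members the sample cross-covariance fully couples the two variables, and any element-wise modification of the covariance directly changes how far each unobserved variable is pulled toward the observation.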
This predicted covariance difference term is then incorporated into the EnKF algorithm via an element-wise scaling strategy, resulting in an amended forecast covariance matrix that better approximates the true uncertainty level and yields more accurate analyses over successive assimilation cycles. To demonstrate the feasibility and robustness of the proposed algorithm, we perform a set of numerical experiments with the Lorenz-63 and Lorenz-96 systems under various configurations, and the results consistently indicate that the proposed algorithm can significantly outperform the standard EnKF with the same limited ensemble size, achieving notably higher analysis accuracy while remaining computationally efficient. This approach provides a practical and feasible pathway to accurate and computationally efficient data assimilation for high-dimensional and nonlinear dynamic systems.

Published: 2026-05-12T07:00:50Z
Authors: Zhou Yao, Zhilin Li, Li Zhao, Zeng Liu, Zhaokuan Lu, Seungnam Kim, Guangyao Wang

Hybrid Quantum-Classical Corrective Diffusion Modeling for Meteorological Downscaling
http://arxiv.org/abs/2605.23403v1
Updated: 2026-05-22T09:14:25Z

Statistical downscaling is a crucial component of weather modeling, in which high-resolution outputs must be reconstructed from coarse-resolution inputs without the full cost of dynamical refinement. In this work, we investigate a hybrid quantum-classical corrective diffusion model for probabilistic statistical downscaling of weather fields. The proposed model inserts variational quantum circuit layers into the most compressed bottleneck of the diffusion UNet while leaving the regression branch fully classical. This placement tests whether quantum circuits can act as compact nonlinear feature maps for latent-channel mixing. We evaluate intra-channel and cross-channel ansätze on 10 m wind components.
On the 2020 validation set, the hybrid models remain stable, preserve the large-scale spatial organization of the generated wind fields, and improve both MAE and CRPS relative to a classical corrective diffusion model in several configurations. Structural diagnostics further show that the hybrid variants preserve kinetic-energy spectra and wind-speed distributions similar to those of their classical counterpart while producing controlled changes in tail behavior, extreme-wind-speed localization, and the joint structure of the wind components. Backend studies on the 2020 validation set show negligible impact from simulated device noise at the tested circuit scale, whereas real-hardware deployment remains limited by qubit availability and execution fidelity. The 2021 out-of-distribution test shows that these in-distribution gains do not transfer uniformly under temporal shift, revealing a generalization gap that motivates future mitigation through stabilization and regularization. These results indicate that bottleneck-level quantum hybridization can make a nontrivial contribution to statistical downscaling of weather fields, while also highlighting that circuit scale and hardware deployment remain key limiting factors.

Published: 2026-05-22T09:14:25Z
Comment: 11 pages, 9 figures. Submitted to IEEE QCE 2026
Authors: Rui Wang, Edoardo Pasetto, Amer Delilbasic, Morris Riedel, Kristel Michielsen, Gabriele Cavallaro

Seeing Inside the Storm: Improving Nowcasting by Integrating Meteorological Drivers
http://arxiv.org/abs/2605.24067v1
Updated: 2026-05-22T08:15:35Z

Most nowcasting systems, built on radar reflectivity, focus on current precipitation and ignore the atmospheric precursors (such as low-level convergence, turbulent eddies, and latent heating) that offer a fleeting window to foresee storm birth. We introduce MeteoLogist, a physics-inspired radar intelligence framework that models the full life cycle of convection, from its precursors to organized storm evolution.
However, exploiting these precursors is non-trivial: they originate from multiple meteorological drivers (thermodynamic, kinematic, and microphysical) that evolve asynchronously (C1) and remain spatially fragmented (C2). To this end, MeteoLogist comprises three tightly integrated components. The Physics-Tailored Encoders process radar echoes according to their intrinsic physical scales and semantics, forming thermodynamic, kinematic, and microphysical streams that capture distinct dynamical regimes. The Temporal-Phase Aligner addresses C1 by leveraging causal temporal attention to capture when and how different drivers interact and activate. The Cross-Field Spatial Aggregator addresses C2 through cross-regional fusion, aligning weak and scattered precursors across neighboring cells to expose upstream triggers and enforce spatial coherence. Evaluated on 3D-NEXRAD (2020-2022, US-wide), MeteoLogist improves high-impact detection (CSI40) by 9.7% over strong baselines and achieves a 37.67% gain during the storm-developing stage, demonstrating foresight in sensing storms before they appear. The code can be found in the supplementary material.

Published: 2026-05-22T08:15:35Z
Authors: Minghui Qiu, Jun Chen, Lin Chen, Weifeng Chen, Shuxin Zhong, Zhidan Liu, Yu Zhang, Kaishun Wu
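The abstract does not define CSI40. Assuming it denotes the Critical Success Index for radar reflectivity at or above 40 dBZ, a common convention in radar nowcasting, it can be computed from paired forecast and observed pixels as hits / (hits + misses + false alarms). The sketch below is illustrative only; whether the threshold is inclusive, and how scores are aggregated over pixels and lead times, vary between studies.

```python
from typing import Sequence


def critical_success_index(
    forecast: Sequence[float], observed: Sequence[float], threshold: float = 40.0
) -> float:
    """CSI = hits / (hits + misses + false alarms) for values at or above `threshold`.

    The 40.0 default assumes reflectivity in dBZ; correct negatives do not enter CSI.
    Returns NaN when neither field contains an event.
    """
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        forecast_event, observed_event = f >= threshold, o >= threshold
        if forecast_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
    total = hits + misses + false_alarms
    return hits / total if total else float("nan")


# Five pixels: two hits, one miss, one false alarm, and one correct negative.
forecast = [45.0, 30.0, 50.0, 41.0, 10.0]
observed = [42.0, 44.0, 20.0, 55.0, 5.0]
print(critical_success_index(forecast, observed))  # 0.5
```

Because correct negatives are ignored, CSI at a high threshold focuses on the rare, intense echoes that matter for warnings, which is why it is a natural headline metric for high-impact detection.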