Generative climate downscaling enables high-resolution compound risk assessment by preserving multivariate dependencies

2026-05-12T04:56:59Z

Physics-based climate projections using general circulation models are essential for assessing future risks, but their coarse resolution limits regional decision-making. Statistical downscaling can efficiently add detail, yet many methods treat variables independently, degrading inter-variable relationships that govern compound hazards such as heat stress, drought, and wildfire. Here we show that a diffusion-based multivariate generative framework, combined with bias correction, recovers degraded inter-variable correlations even under a 50$\times$ increase in linear resolution. When applied to five meteorological variables over Japan, the framework reduces inter-variable correlation errors by more than fourfold relative to existing baselines while improving both univariate and spatial accuracy, leading to more accurate detection of severe drought. These results demonstrate that multivariate generative downscaling improves the reliability of compound risk assessment under large resolution gaps.
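
As a rough illustration of what an "inter-variable correlation error" can measure, the sketch below compares pairwise Pearson correlations between a reference dataset and a downscaled one. The metric (mean absolute difference of pairwise correlations), the function names, and the toy data are all assumptions made for illustration; the abstract does not state the exact metric the authors use.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def correlation_error(reference, downscaled):
    """Mean absolute difference of pairwise correlations (assumed metric).

    Each argument maps a variable name to a list of samples.
    """
    names = sorted(reference)
    diffs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r_ref = pearson(reference[a], reference[b])
            r_ds = pearson(downscaled[a], downscaled[b])
            diffs.append(abs(r_ref - r_ds))
    return sum(diffs) / len(diffs)


# Toy data: temperature and humidity are anti-correlated in the reference.
reference = {"temp": [1.0, 2.0, 3.0, 4.0], "humidity": [4.0, 3.0, 2.0, 1.0]}
# A hypothetical output in which the humidity field no longer tracks temperature correctly.
degraded = {"temp": [1.0, 2.0, 3.0, 4.0], "humidity": [1.0, 2.0, 3.0, 4.0]}

err_same = correlation_error(reference, reference)
err_degraded = correlation_error(reference, degraded)
print(err_same, err_degraded)
```

A method that downscales each variable on its own can score well on univariate accuracy and still produce a large value here, which is the failure mode the abstract describes.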

Automated Detection and Climatological Analysis of Ripple-Scale Gravity Wave Instabilities Using a Squeeze-and-Excitation Convolutional Neural Network

2026-05-11T20:46:01Z

All-sky OH airglow imaging provides two-dimensional observations of mesospheric gravity wave structure near ~87 km altitude. Ripple-scale instability signatures, characterized by 5-15 km horizontal wavelengths and short lifetimes, are particularly difficult to identify consistently using manual inspection. In this study, we develop a reproducible, automated detection framework based on a squeeze-and-excitation convolutional neural network (SE-CNN) trained on 41 x 41 pixel image patches to identify ripple-scale structures in 512 x 512 pixel all-sky airglow images acquired at Yucca Ridge Field Station (40.7° N, 104.9° W). The time-differenced images are normalized using a robust median-absolute-deviation (MAD) scaling procedure to mitigate star contamination and background variability. The model is trained and validated on manually annotated ripple and non-ripple patches, then evaluated using independent test subsets. The automated detection is performed using a sliding-window approach with spatial and temporal clustering criteria for event definition. At the patch level, the classifier achieves 92\% F1-score with high precision and recall. At the event level, automated detections recover approximately 90\% of manually identified ripple events while identifying additional low-amplitude occurrences. Validated against a previous manual identification study, the automated detection catalog enables objective quantification of ripple occurrence frequency, seasonal modulation, and lifetime distributions. By emphasizing methodological transparency, calibration considerations, and validation metrics, this framework establishes a scalable measurement technique for systematic detection of mesospheric instability signatures in long-term airglow image archives.
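
The MAD scaling step can be sketched as follows, assuming the image is handled as a flat list of pixel values. The function names, the optional clipping, and the zero-scale fallback are illustrative assumptions and not the authors' procedure; the factor 1.4826 is the standard constant that makes the MAD comparable to a standard deviation for Gaussian data.

```python
import statistics


def time_difference(frame_a, frame_b):
    """Pixel-wise difference between two consecutive frames (flat lists)."""
    return [b - a for a, b in zip(frame_a, frame_b)]


def mad_normalize(values, clip=None):
    """Center on the median and scale by 1.4826 * MAD.

    `clip` is a hypothetical extra step that limits the influence of
    bright outliers such as stars; it is not taken from the abstract.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad
    if scale == 0.0:
        # Assumed fallback: a flat image carries no contrast to scale.
        return [0.0 for _ in values]
    scaled = [(v - med) / scale for v in values]
    if clip is not None:
        scaled = [max(-clip, min(clip, s)) for s in scaled]
    return scaled


previous = [10.0, 10.0, 10.0, 10.0, 10.0]
current = [10.0, 11.0, 12.0, 13.0, 110.0]  # last pixel mimics a star
diff = time_difference(previous, current)
normalized = mad_normalize(diff, clip=5.0)
print(normalized)
```

Because the median and MAD ignore the magnitude of a few extreme pixels, the star-like outlier barely changes the scaling of the remaining pixels, unlike a mean and standard deviation.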

Reflecting on a Decade of Formalized Tornado Emergencies

2026-05-11T06:23:34Z

In 1999 the NWS began using the phrase "tornado emergency" to denote tornado warnings for storms with the potential to cause rare, catastrophic damage. After years of informal usage, tornado emergencies were formally introduced to 46 weather forecasting offices in 2014 as part of the impact-based warning (IBW) program, with a nationwide rollout occurring over the following years. In concert with the new tiered warning approach, the Warning Decision Training Division (WDTD) also introduced suggested criteria for when forecasters should consider upgrading a tornado warning to a tornado emergency, which include thresholds of rotational velocity (VROT) and significant tornado parameter (STP). Although significant research has studied both tornado forecasting and tornado warning dissemination in the decade since, relatively little work has examined the effectiveness of the tornado emergency specifically. Our analysis of all 89 IBW tornado emergencies issued from 2014-2023 found that forecasters do not appear to follow the suggested criteria for issuance in the majority of cases, with only a handful of tornado emergencies meeting both the VROT and STP thresholds. Regardless, 70% of tornado emergencies were issued for EF-3+ tornadoes, and tornado emergencies covered 55% of all EF-4 tornadoes as well as 41% of all tornadoes resulting in 3 or more fatalities. Based on these results, we propose several updates to the current NWS training materials for impact-based tornado warnings.

Estimating the Kinetic Energy Spectrum from the Second-Order Velocity Structure Function using a Regularized Fitting Approach

2026-05-10T18:01:05Z

Ocean turbulence plays a key role in shaping large-scale circulation, heat uptake, and biogeochemical processes. The kinetic energy (KE) wavenumber spectrum is a fundamental diagnostic, quantifying how KE is distributed across spatial scales. The second-order structure function -- computed from velocity differences between spatially separated observations -- provides a complementary measure, but unlike the KE spectrum, it reflects a non-local, weighted integral of KE over all scales. Analytic relationships link the two metrics, permitting forward and inverse transformations between them. However, recovering the KE spectrum from the structure function via the inverse relationship is highly sensitive to sampling limitations and numerical discretization errors. Here we propose a regularized approach in which the spectrum is assumed to consist of a finite number of segments with distinct slopes and amplitudes, and the inversion is formulated as an optimization problem. The approach is first validated in an idealized setting; for a number of idealized KE spectra with prescribed sets of spectral slopes and amplitudes, the corresponding structure functions are computed by numerically evaluating the forward relationship. These structure functions are then used to determine the underlying parameters using our proposed approach, which shows that we are able to perfectly recover the parameters and consequently the KE spectra. The method is further evaluated on high-resolution ocean model output, where it reconstructs the underlying spectra well even in the presence of noise. Finally, we apply the method to surface drifter observations (GLAD and LASER experiments). The results show that the framework enables estimation of the KE spectrum from sparse Lagrangian data, extending spectral diagnostics beyond gridded Eulerian measurements.
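
For readers unfamiliar with the quantity being inverted, here is a minimal sketch of an empirical second-order structure function, assuming evenly spaced one-dimensional velocity samples. This is a simplification for illustration only: drifter pairs are irregularly spaced and two-dimensional, and the function name and toy data are not from the paper.

```python
def structure_function(u, max_lag):
    """Return [D2(1), ..., D2(max_lag)] for evenly spaced samples u.

    D2(lag) is the mean of (u[i + lag] - u[i])**2 over all valid i.
    """
    d2 = []
    for lag in range(1, max_lag + 1):
        squared = [(u[i + lag] - u[i]) ** 2 for i in range(len(u) - lag)]
        d2.append(sum(squared) / len(squared))
    return d2


# A linear velocity ramp: differences grow linearly with separation,
# so D2 grows as the square of the lag.
ramp = [float(i) for i in range(10)]
d2_ramp = structure_function(ramp, 3)
print(d2_ramp)
```

Each value of D2 mixes contributions from many scales, which is why the abstract describes the structure function as a non-local, weighted integral of KE and why the inversion needs regularization.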

The Physical Limit of Neural Hypoxia Detection in the Black Sea from Satellite Observations

2026-05-10T16:57:46Z

Coastal hypoxia (O$_2$ < 63 mmol m$^{-3}$) threatens ocean health worldwide. On continental shelves, summer stratification prevents bottom oxygen consumed by respiration from being renewed, making monitoring essential to protect vulnerable ecosystems and reduce biodiversity loss. Although satellite observations are increasingly available, their potential to infer subsurface oxygen remains largely unexplored. This can be framed as a Bayesian inverse problem relating surface observations to the complete Black Sea states. Here, we solve it using a deep generative neural network trained on numerical model outputs, providing a tractable and computationally efficient approximation of the true posterior distribution of sea states. We find that accurate state estimation is limited to the mixed layer, because its homogeneity makes surface conditions representative of subsurface states. During summer, we detect 38% of all hypoxic events shelf-wide with a precision of 47%. Improving results will likely require longer assimilation windows or sub-surface observations.
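
The two reported detection figures correspond to recall (share of true hypoxic events detected) and precision (share of detections that are truly hypoxic). The sketch below shows how both follow from thresholding oxygen at 63 mmol m^-3; the function name and the toy oxygen values are assumptions for illustration, and the paper's event definition may differ.

```python
HYPOXIA_THRESHOLD = 63.0  # mmol m^-3, the threshold quoted in the abstract


def hypoxia_scores(true_o2, predicted_o2, threshold=HYPOXIA_THRESHOLD):
    """Return (precision, recall) for hypoxia detection by thresholding."""
    truth = [v < threshold for v in true_o2]
    pred = [v < threshold for v in predicted_o2]
    tp = sum(1 for t, p in zip(truth, pred) if t and p)
    fp = sum(1 for t, p in zip(truth, pred) if p and not t)
    fn = sum(1 for t, p in zip(truth, pred) if t and not p)
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Toy bottom-oxygen values at five locations (not real data).
true_o2 = [50.0, 60.0, 70.0, 80.0, 40.0]
predicted_o2 = [55.0, 70.0, 60.0, 90.0, 45.0]
precision, recall = hypoxia_scores(true_o2, predicted_o2)
print(precision, recall)
```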

METBRA25Y: Brazil Surface Meteorology Archive with Harmonized Variables and Quality Control

2026-05-09T05:19:46Z

This data paper describes METBRA25Y, a harmonized archive of hourly surface meteorological observations from Brazil derived from public historical records of the Instituto Nacional de Meteorologia (INMET). The dataset was designed to support reproducible environmental, climatological, hydrological, agricultural, urban-risk, and machine-learning studies that require station-level meteorological time series with standardized variable names and explicit quality-control metadata. The processing workflow ingests annual INMET archives, parses station metadata from raw file headers, normalizes heterogeneous Portuguese column names into a canonical schema, constructs hourly timestamps, consolidates observations by city and station, and exports compressed CSV files together with station manifests, per-station quality flags, daily precipitation aggregates, variable-level failure summaries, and missing-data audits. The quality-control protocol follows a two-stage strategy: first, physically implausible values are converted to missing values and flagged; second, temporal and cross-variable consistency checks generate diagnostic flags without necessarily overwriting the original measurements. The resulting package covers observations between 2000 and 2025, with station-specific temporal coverage, and includes key meteorological variables such as precipitation, air temperature, dew point, relative humidity, atmospheric pressure, wind speed, wind gust, wind direction, and global solar radiation. Based on the summary files included in the current release snapshot, the archive contains 616 unique station codes across variable summaries, of which 605 have coordinates within a broad Brazil plausibility envelope. This paper documents the dataset provenance, file organization, harmonized schema, quality-control rules, technical validation outputs, limitations, and recommended usage practices.
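
The two-stage quality-control strategy described above might look like the following sketch. The variable names, the plausibility limits, and the flag labels are all hypothetical placeholders; the archive's actual schema and thresholds are not given in the abstract.

```python
# Hypothetical plausibility limits (not the archive's actual rules).
LIMITS = {
    "air_temp_c": (-50.0, 60.0),
    "dew_point_c": (-60.0, 40.0),
    "rel_humidity_pct": (0.0, 100.0),
}


def quality_control(record):
    """Apply a two-stage check to one hourly record (a dict of floats).

    Returns (cleaned_record, flags). Stage 1 overwrites implausible
    values with None; stage 2 only adds flags and keeps the values.
    """
    cleaned = dict(record)
    flags = {}

    # Stage 1: physically implausible values become missing and are flagged.
    for name, (low, high) in LIMITS.items():
        value = cleaned.get(name)
        if value is not None and not (low <= value <= high):
            cleaned[name] = None
            flags[name] = "implausible"

    # Stage 2: cross-variable consistency, flagged without overwriting.
    temp = cleaned.get("air_temp_c")
    dew = cleaned.get("dew_point_c")
    if temp is not None and dew is not None and dew > temp:
        flags["dew_point_c"] = "inconsistent_with_air_temp"

    return cleaned, flags


raw = {"air_temp_c": 25.0, "dew_point_c": 27.0, "rel_humidity_pct": 130.0}
cleaned, flags = quality_control(raw)
print(cleaned, flags)
```

Keeping the stage 2 values intact lets downstream users decide how strict to be, which matches the abstract's statement that consistency checks do not necessarily overwrite the original measurements.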

Learning from Translation: Seasonal Errors and Feature Importance of the ERA5 Turbulence Predictions

2026-05-08T16:43:58Z

Turbulence is a phenomenon that is {\it locally} and statistically characterized by measurements, but it is caused by {\it nonlocal} energy cascades associated with the environment. The presence of turbulence coincides with fluctuations in the refractive index, which impact optical sensing, imaging, and signaling applications. Here, we study machine learning models that predict near-surface optical turbulence strength $C_n^2$, derived from anemometer-based surface flux measurements through Monin-Obukhov similarity theory, using ERA5 reanalysis data as model inputs. We evaluate the models' ability to perform temporal extrapolation by training on one year of co-located $C_n^2$ observations and ERA5 data, and applying the model to ERA5 data from other years at the same site to reconstruct a multi-year time series. We compare the predictions across Southern California and New York. In spite of varying weather and terrain, the ML models show consistent performance and seasonal behavior across training years. All models show greater correlation, faster convergence, and lower prediction errors in the summer. However, some ERA5 features drive predictions in New York but not California and vice versa, and this feature dependence varies with the season. Seasonal error and feature trends suggest that turbulence is affected by atmospheric composition or other seasonal environmental considerations that are not currently monitored by ERA5. We find that, regardless of terrain, the primary feature of importance to turbulence prediction is solar radiation, which underlines the central role of radiative energy transfer in driving atmospheric turbulence. We point toward physics-informed ML translation and feature selection as tools for improving the generalizability of data-driven models.

Bridging the Sensitivity Gap in Precipitation Estimates from Spaceborne Radars using Passive Microwave Observations

2026-05-08T03:58:00Z

Current global precipitation estimates from spaceborne precipitation radars are limited by their sensitivity to light and frozen precipitation, leading to systematic underestimation of precipitation at high latitudes. Because passive microwave (PMW) retrievals are commonly trained using these radar observations as reference data, this limitation is propagated into PMW retrievals. This study introduces a novel PMW oceanic precipitation retrieval, GPROF-NN eXtended Precipitation Regime (XPR), that combines reference estimates from a cloud radar and a precipitation radar to overcome the sensitivity limitations of current spaceborne precipitation radars. The retrieval is trained to estimate light precipitation from CloudSat observations and moderate-to-heavy precipitation using observations from the GPM Dual-Frequency Precipitation Radar. The two estimates are combined using a fusion scheme to obtain a consistent precipitation estimate across precipitation regimes. Validation against in situ measurements from shipborne disdrometers shows a 26% improvement in the detection skill for high-latitude precipitation in terms of the critical success index and a reduction in the underestimation of high-latitude and frozen precipitation by more than 50% compared to retrievals constrained only by precipitation radar data. However, the fused retrieval does not improve the precision of instantaneous precipitation estimates, which is likely due to significant random errors in the CloudSat-based reference estimates of liquid precipitation. These results demonstrate that PMW retrievals can leverage the complementary sensitivities of cloud and precipitation radars to provide more consistent precipitation estimates across precipitation regimes than either reference instrument alone. The proposed retrieval provides a pathway to improve the representation of oceanic precipitation in future GPM precipitation products.
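
The critical success index (CSI) used for the detection skill above is the number of hits divided by the sum of hits, misses, and false alarms. A minimal sketch, with a hypothetical function name and toy rain/no-rain flags in place of real disdrometer matches:

```python
def critical_success_index(observed, predicted):
    """CSI = hits / (hits + misses + false alarms) for boolean sequences."""
    hits = sum(1 for o, p in zip(observed, predicted) if o and p)
    misses = sum(1 for o, p in zip(observed, predicted) if o and not p)
    false_alarms = sum(1 for o, p in zip(observed, predicted) if p and not o)
    total = hits + misses + false_alarms
    # With no observed or predicted events the index is undefined.
    return hits / total if total else float("nan")


# Toy precipitation occurrence flags (True = precipitation).
observed = [True, True, True, False, False, True]
predicted = [True, False, True, True, False, False]
csi = critical_success_index(observed, predicted)
print(csi)
```

Correct negatives do not enter the CSI, which makes it a common choice for events such as high-latitude light precipitation, where dry samples would otherwise dominate the score.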

Track-Dependent Links between Tropical Cyclones and Extratropical Predictability in Physical and AI Models

2026-05-08T03:38:08Z

Global medium-range weather forecasts suffer occasional failures, often linked to tropical cyclones (TCs). We investigate TC influences on extratropical predictability by comparing forecasts from a physics-based model (ECMWF-IFS) and an AI-hybrid model (Google-NGCM) initialized near TC genesis. Analyzing 108 out-of-sample Northern Hemisphere cases reveals similar extratropical error growth patterns and comparable performance between the models. This suggests that the NGCM is capable of predicting the bulk upscale effects of tropical convection without directly representing convective processes. Leveraging the NGCM's computational efficiency, we compare forecasts initialized with and without TC genesis to isolate track-dependent forecast impacts. For Week-2 extratropical forecasts, TC impacts are highly time-, metric-, and track-dependent. The analysis confirms that some poleward-moving TCs degrade Week-2 US and European forecasts and suggests significant impacts from westward-moving TCs. The findings highlight the utility of the AI-hybrid model in predictability research and complex tropical-extratropical teleconnections that warrant future research.

GPROF-IR: An Improved Single-Channel Infrared Precipitation Retrieval for Merged Satellite Precipitation Products

2026-05-08T03:02:16Z

Current merged precipitation products such as IMERG, GSMaP, and CMORPH combine satellite estimates from passive microwave (PMW) and infrared (IR) observations. However, the different information content of these sensors makes it challenging to produce consistent precipitation estimates, even for coincident observations. The resulting inconsistencies between PMW and IR retrievals can introduce artifacts in the temporal evolution of merged precipitation fields and lead to an overreliance on time-propagated PMW estimates. We introduce GPROF-IR, a novel IR precipitation retrieval that leverages a convolutional neural network to improve precipitation estimates from single-channel IR observations. We demonstrate that the proposed model is able to leverage the temporal information in half-hourly IR observations to improve precipitation estimates. GPROF-IR is designed for integration into the upcoming release of the Integrated Multi-Satellite Retrieval for GPM (IMERG V08) and produces estimates that are climatologically consistent with the GPROF-NN PMW retrieval. We evaluate GPROF-IR using independent, global reference measurements and demonstrate substantial improvements over conventional IR retrievals. GPROF-IR provides lower mean squared error and higher correlation coefficient than IMERG V07 PMW estimates over continental land masses but remains below the accuracy of PMW precipitation estimates over sea surfaces and climate regimes with a greater influence from shallow precipitation. By exploiting both spatial and temporal information content in geostationary IR observations, GPROF-IR establishes a new state of the art for single-channel IR precipitation retrievals. GPROF-IR can be used to produce quasi-global precipitation estimates at half-hourly resolution from 1998 onward, providing a consistent and accurate foundation for improving merged precipitation products.

HealDA: Highlighting the importance of initial errors in end-to-end AI weather forecasts

2026-05-07T21:44:32Z

AI weather models now rival leading numerical weather prediction (NWP) systems in medium-range skill. However, almost all still rely on NWP data assimilation (DA) to provide initial conditions, tying them to expensive infrastructure and limiting the practical speed and accuracy gains of ML. More recently, ML-based DA systems have been proposed, which are often trained and evaluated end-to-end with a forecast model, making it difficult to assess the quality of their analysis fields. We introduce HealDA, a global ML-based DA system that maps a short window of satellite and conventional observations directly to a 1° atmospheric state on the HEALPix grid, using a smaller sensor suite than operational NWP. We treat HealDA strictly as a DA module: its analyses are used to initialize off-the-shelf ML forecast models without any fine-tuning of either. For a variety of off-the-shelf ML forecast models, including FourCastNet3 (FCN3), Aurora, and FengWu, HealDA-initialized forecasts lose less than one day of effective lead time when scored against ERA5. HealDA-initialized FCN3 ensembles similarly trail those of the ECMWF IFS ENS system by < 24 h. We find that forecast error growth in these models is unchanged by HealDA initialization, and the skill gap primarily arises from the larger initial error of the HealDA analysis. Spectral analysis reveals that this stems from overfitting to the large scales and upper-tropospheric fields. We also demonstrate that small changes in the verification setup can shift apparent skill by 12--24 h, underscoring the need for consistent scoring. Taken together, these results clarify the current performance of ML-based DA systems and show that a relatively simple, direct observation-to-state network can already provide initial conditions that are usable by state-of-the-art ML forecast models with only modest loss in medium-range skill.

AIMIP Phase 1: systematic evaluations of AI weather and climate models

2026-05-07T21:04:05Z

We present the AI weather and climate model intercomparison project (AIMIP), phase 1. Drawing from the rich tradition of intercomparisons in climate model development, we specify a common experiment, output data format, and training constraints (namely, training against historical reanalysis data) for AIMIP Phase 1 models. We aim to identify differences in modeling frameworks and AI architectural choices that influence model behavior, and build trust in AI weather and climate models through open data and evaluation. AIMIP Phase 1 models must simulate the atmosphere given specified historical sea surface temperatures over 1979-2024. We evaluate the models' performance using five major evaluation criteria: biases, trends, response to El Niño-related sea surface temperature anomalies, temporal variability, and out-of-sample generalization tests. We find that the AI models are able to simulate the historical climate and response to forcing as well as a conventional physically-based model, but some AI models underestimate historical warming trends, and their predictions diverge in the out-of-sample generalization tests. We describe the AIMIP Phase 1 dataset that is publicly available for additional evaluations.

Growth of small localized perturbations in Surface Quasi-Geostrophic turbulence

2026-05-07T09:39:23Z

The ``butterfly effect'', i.e., the growth of a localized infinitesimal perturbation, is the fundamental property of chaotic systems. While the butterfly effect is today an obvious property of low-dimensional chaotic systems, its significance is more nuanced in extended systems with many spatial and temporal scales, such as geophysical flows. In this Letter we explore the butterfly effect, i.e., the fate of infinitesimal localized perturbations, in Surface Quasi-Geostrophic turbulence, a minimal model for mesoscale geophysical turbulence in the regime of strong stratification and rotation. We find that the evolution of a spatially localized perturbation exhibits strong variability, with an initial transient regime in which the perturbation energy decreases. The duration of this transient varies widely and can reach several small-scale characteristic times, depending on the initial location of the perturbation.

Turbulent damping of fast tidal oscillations by three-dimensional Rayleigh-Bénard convection with a radiating free surface

2026-05-06T16:41:14Z

We present three-dimensional Dedalus simulations of Rayleigh-Bénard convection with a blackbody-radiating free upper surface, subject to a low-amplitude oscillatory forcing that mimics tidal perturbations in convective envelopes of stars and planets. The forcing period is 10-100 times shorter than the convective timescale, $t_{\rm conv}$. Using a Reynolds decomposition of the velocity field averaged over one oscillation period, in which the tidal oscillations naturally constitute the fluctuating field and convection the mean flow, we elucidate the kinetic energy exchange between the two. Provided the oscillatory Reynolds number exceeds a modest threshold, we find that the oscillations systematically transfer kinetic energy to the mean flow at a volume-averaged rate $D_R \sim u'^2 t_{\rm conv}^{-1}$, where $u'$ is the rms fluctuation velocity. This reflects strong, order-unity correlations between the fluctuation velocities and the mean flow. These arise because the oscillatory forcing displaces fluid elements that are then redirected by buoyancy and incompressibility in the same manner as the mean flow. The transfer is dominated by correlations involving vertical velocity fluctuations and vertical gradients of the mean flow. The resulting energy transfer rate is consistent, within the equilibrium-tide framework, with the observed tidal circularisation of solar-type binaries and with the orbital evolution of moons of Jupiter and Saturn. This validates the formalism proposed by Terquem (2021) for the dissipation of fast tides, a longstanding problem. Replacing the free surface with a rigid upper boundary significantly and artificially modifies the correlations.

Interpretable Neural Networks to Predict Momentum Fluxes of Orographic Gravity Waves

2026-05-06T15:49:01Z

State-of-the-art Earth system models (ESMs) cannot explicitly resolve many small-scale atmospheric processes such as atmospheric gravity waves, and thus must represent, or parameterise, their effects on the resolved state. Machine learning (ML) has the potential to improve these parameterisations. In our study, we train neural networks (NNs) on ERA5 reanalysis data to predict momentum fluxes of orographic gravity waves as a function of the state variables at the resolution of a coarse ESM. Employing a full year of data, we extract inertia-gravity waves using the software MODES, which applies linear theory for wave filtering, and train ML models on data coarse-grained to the ESM's target resolution. We consider four different cases: the full spectrum of inertia-gravity waves resolved in ERA5, or just the part of the spectrum that is subgrid-scale in the target ESM, both over all land or just over mountainous terrain. Our NNs successfully predict momentum fluxes, with a global coefficient of determination ($R^2$) ranging from 0.72 to 0.56, depending on the case, when evaluated offline with data from another year. An analysis of our models using SHAP values, an explainable AI technique, suggests that the networks learned physically meaningful relationships. In addition, we give a comparison with the physics-based parameterisation scheme by Lott and Miller. This work forms the basis for the development of operational ML-based parameterisations to improve the representation of gravity waves and their effects in climate models.
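
For reference, the coefficient of determination quoted above is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses a hypothetical function name and toy flux values; how the authors aggregate $R^2$ globally is not specified in the abstract.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy momentum-flux values (arbitrary units, not real data).
flux_true = [1.0, 2.0, 3.0, 4.0]
flux_pred = [1.5, 2.0, 3.0, 3.5]
r2 = r_squared(flux_true, flux_pred)
print(r2)
```

A value of 1 means perfect prediction, while 0 means the model does no better than predicting the mean of the evaluation data.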