Thunderstorms, including straight-line (non-tornadic) winds, cause an average of over 100 deaths and $10 billion of insured damage per year in the United States. In the past decade, machine learning has led to significant improvements in the prediction of other convective hazards, such as tornadoes, hail, lightning, and convectively induced aircraft turbulence. However, very few studies have used machine learning specifically to predict damaging straight-line winds. We have developed machine-learning models to predict the probability of damaging straight-line wind, defined as a gust >=50 kt (25.72 m/s), for a given storm cell. Predictions are made for three buffer distances around the storm cell (0, 5, and 10 km) and five lead-time windows ([0, 15]; [15, 30]; [30, 45]; [45, 60]; and [60, 90] minutes).
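The labeling scheme described above (one yes/no label per storm cell, buffer distance, and lead-time window) can be sketched as follows. This is only an illustration under stated assumptions: the observation tuple layout, the function name `label_storm_cell`, and the sample values are hypothetical and are not taken from the thesis code. The lead-time windows are treated as closed intervals, following the bracket notation in the text.

```python
# One knot is one nautical mile (1852 m) per hour.
KT_TO_MS = 1852.0 / 3600.0

DAMAGING_GUST_KT = 50.0  # threshold from the text (about 25.72 m/s)
BUFFER_DISTANCES_KM = (0.0, 5.0, 10.0)
LEAD_TIME_WINDOWS_MIN = ((0, 15), (15, 30), (30, 45), (45, 60), (60, 90))


def label_storm_cell(observations, buffer_km, window_min):
    """Return 1 if any gust >= 50 kt falls inside the buffer and window.

    `observations` is a hypothetical list of tuples
    (minutes_after_forecast, distance_from_cell_km, gust_kt),
    where a distance of 0 means the observation lies inside the cell.
    """
    start, end = window_min
    for minutes, distance_km, gust_kt in observations:
        in_window = start <= minutes <= end
        in_buffer = distance_km <= buffer_km
        if in_window and in_buffer and gust_kt >= DAMAGING_GUST_KT:
            return 1
    return 0


# Made-up observations for a single storm cell (illustration only).
obs = [(10, 0.0, 42.0), (20, 4.0, 55.0), (70, 12.0, 61.0)]

# One label per (buffer distance, lead-time window) pair: 3 x 5 = 15 labels.
labels = {
    (b, w): label_storm_cell(obs, b, w)
    for b in BUFFER_DISTANCES_KM
    for w in LEAD_TIME_WINDOWS_MIN
}
```

With these made-up observations, only the 55-kt gust counts, and only for the 5-km and 10-km buffers in the [15, 30]-minute window. The 61-kt gust lies 12 km from the cell, outside every buffer.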
Three types of data are used to train models: radar images from the Multi-year Reanalysis of Remotely Sensed Storms (MYRORSS); atmospheric soundings from the Rapid Update Cycle (RUC) model and North American Regional Reanalysis (NARR); and near-surface wind observations from the Meteorological Assimilation Data Ingest System (MADIS), Oklahoma Mesonet, one-minute meteorological aerodrome reports (METARs), and National Weather Service local storm reports. Radar images are used to determine the structural and hydrometeorological properties of storm cells, while soundings are used to determine properties of the near-storm environment, which are important for storm evolution. Both of these data types are used to create predictor variables. Meanwhile, near-surface wind observations are used as verification data (to determine which storm cells produced damaging straight-line winds).
For each buffer distance and lead-time window, we experiment with five machine-learning algorithms: logistic regression, logistic regression with an elastic net, feed-forward neural nets, random forests, and gradient-boosted tree (GBT) ensembles. Forecast probabilities from each model are calibrated with isotonic regression, which makes them more reliable. Forecasts are verified mainly with three scores: area under the receiver-operating-characteristic curve (AUC), maximum critical success index (CSI), and Brier skill score (BSS). AUC and maximum CSI range from 0 to 1, where 0 is the worst score and 1 is a perfect score. BSS ranges from -infinity to 1, where -infinity is the worst score; 1 is a perfect score; and >0 means that the model is better than climatology. Models are ranked by AUC. The best model (for a buffer distance of 0 km and lead time of [15, 30] minutes) has an AUC of 0.996, maximum CSI of 0.99, and BSS of 0.88. The worst model (for a buffer distance of 10 km and lead time of [60, 90] minutes) has an AUC of 0.89, maximum CSI of 0.20, and BSS of 0.12. All models outperform climatology.
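The three verification scores can be computed from forecast probabilities and binary labels roughly as follows. This is a minimal standard-library sketch, not the thesis implementation. It assumes that climatology is the event frequency in the same sample being verified and that CSI is maximized over the distinct forecast probabilities used as thresholds. The function names and the sample forecasts are hypothetical.

```python
def brier_skill_score(probs, labels):
    """BSS = 1 - BS / BS_climatology (undefined if all labels are equal)."""
    n = len(labels)
    climatology = sum(labels) / n  # assumption: sample event frequency
    brier = sum((p - y) ** 2 for p, y in zip(probs, labels)) / n
    brier_clim = sum((climatology - y) ** 2 for y in labels) / n
    return 1.0 - brier / brier_clim


def max_csi(probs, labels):
    """Largest CSI = hits / (hits + false alarms + misses) over thresholds."""
    best = 0.0
    pairs = list(zip(probs, labels))
    for threshold in sorted(set(probs)):
        hits = sum(1 for p, y in pairs if p >= threshold and y == 1)
        false_alarms = sum(1 for p, y in pairs if p >= threshold and y == 0)
        misses = sum(1 for p, y in pairs if p < threshold and y == 1)
        denom = hits + false_alarms + misses
        if denom > 0:
            best = max(best, hits / denom)
    return best


def auc(probs, labels):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties count as half. This pairwise form is O(n^2), fine for a sketch.
    """
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up forecasts for six storm cells (illustration only).
example_probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]

example_auc = auc(example_probs, example_labels)
example_csi = max_csi(example_probs, example_labels)
example_bss = brier_skill_score(example_probs, example_labels)
```

A forecast that always issues the climatological frequency gets a BSS of exactly 0 under this definition, which is why a positive BSS indicates skill relative to climatology.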
Finally, for each buffer distance and lead-time window, we use three methods to select the most important predictor variables: sequential forward selection, J-measures, and decision trees.
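Of the three methods, sequential forward selection is the simplest to illustrate. Starting from an empty set, it repeatedly adds the predictor that most improves a validation score and stops when no remaining predictor helps. The sketch below is a generic version of that greedy loop, not the thesis code. The predictor names, the `toy_score` function, and the `min_gain` stopping parameter are hypothetical. In practice the scoring function would fit a model on the chosen predictors and return a score such as AUC on validation data.

```python
def sequential_forward_selection(candidates, score_fn, min_gain=0.0):
    """Greedily add the predictor that most improves score_fn.

    `score_fn` takes a list of predictor names and returns a score where
    larger is better. Selection stops when the best remaining predictor
    improves the score by no more than `min_gain`.
    """
    selected = []
    best_score = score_fn(selected)
    remaining = list(candidates)
    while remaining:
        trials = [(score_fn(selected + [name]), name) for name in remaining]
        trial_score, trial_name = max(trials, key=lambda t: t[0])
        if trial_score - best_score <= min_gain:
            break
        selected.append(trial_name)
        remaining.remove(trial_name)
        best_score = trial_score
    return selected, best_score


# Hypothetical predictors and a toy score standing in for "fit a model and
# compute validation AUC". Each predictor adds a fixed amount of skill, and
# every added predictor costs a small complexity penalty.
TOY_VALUE = {
    "max_reflectivity": 0.20,
    "cape": 0.10,
    "shear_0_6km": 0.05,
    "noise": 0.00,
}


def toy_score(names):
    return 0.5 + sum(TOY_VALUE[n] for n in names) - 0.01 * len(names)


chosen, final_score = sequential_forward_selection(TOY_VALUE, toy_score)
```

In this toy setup the three informative predictors are chosen in order of decreasing value and the uninformative one is left out, because adding it would lower the score.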
Ryan Lagerquist (2016). Using Machine Learning To Predict Damaging Straight-Line Convective Winds. Master's Thesis, School of Meteorology, University of Oklahoma.
Related publications and presentations
The code for the thesis can be released on request.
Created by amcgovern [at] ou.edu.
Last modified August 20, 2016 2:19 PM