Probabilistic forecasts of mesoscale convective system initiation using the random forest data mining technique
Ahijevych, D., Pinto, J., Williams, J. K., & Steiner, M. (2016). Probabilistic forecasts of mesoscale convective system initiation using the random forest data mining technique. Weather and Forecasting, 31, 581-599. doi:10.1175/WAF-D-15-0113.1
A data mining and statistical learning method known as a random forest (RF) is employed to generate 2-h forecasts of the likelihood for initiation of mesoscale convective systems (MCS-I). The RF technique uses an ensemble of decision trees to relate a set of predictors [in this case radar reflectivity, satellite imagery, and numerical weather prediction (NWP) model diagnostics] to a predictand (in this case MCS-I). The RF showed a remarkable ability to detect MCS-I events. Over 99% of the 550 observed MCS-I events were detected to within 50 km. However, this high detection rate came with a tendency to issue false alarms either because of premature warning of an MCS-I event or in the continued elevation of RF forecast likelihoods well after an MCS-I event occurred. The skill of the RF forecasts was found to increase with the number of trees and the fraction of positive events used in the training set. The skill of the RF was also highly dependent on the types of predictor fields included in the training set and was notably better when a more recent training period was used. The RF offers advantages over high-resolution NWP because it can be run in a fraction of the time and can account for nonlinearly varying biases in the model data. In addition, as part of the training process, the RF ranks the importance of each predictor, which can be used to assess the utility of new datasets in the prediction of MCS-I.
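The abstract reports that events were "detected to within 50 km." The sketch below illustrates one way a neighborhood detection rate of that kind could be computed; it is not the authors' verification code. It assumes observed events and forecast grid points are given as (latitude, longitude) pairs in degrees, treats any grid point whose RF likelihood meets a hypothetical alert threshold as an alert, uses a spherical-Earth haversine distance, and ignores alert timing, which a real verification would also need to match. The names `haversine_km`, `alert_points`, and `detection_rate` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Mean Earth radius; a spherical Earth is assumed for simplicity.
EARTH_RADIUS_KM = 6371.0

LatLon = Tuple[float, float]


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def alert_points(forecast: Sequence[Tuple[float, float, float]],
                 threshold: float) -> List[LatLon]:
    """Grid points whose forecast likelihood meets the (hypothetical) alert threshold."""
    return [(lat, lon) for lat, lon, p in forecast if p >= threshold]


def detection_rate(events: Sequence[LatLon], alerts: Sequence[LatLon],
                   radius_km: float = 50.0) -> float:
    """Fraction of observed events with at least one alert within radius_km."""
    if not events:
        raise ValueError("need at least one observed event")
    hits = sum(
        1 for ev in events
        if any(haversine_km(ev, al) <= radius_km for al in alerts)
    )
    return hits / len(events)


if __name__ == "__main__":
    # Toy data, not from the study: two observed events, two forecast grid points.
    events = [(40.0, -100.0), (35.0, -90.0)]
    forecast = [(40.3, -100.0, 0.8), (36.0, -90.0, 0.2)]
    alerts = alert_points(forecast, threshold=0.5)
    print(detection_rate(events, alerts))  # 0.5: only the first event has an alert within ~33 km
```

Because each event needs only one nearby alert, this measure rewards permissive thresholds and says nothing about false alarms on its own, so in practice it would be reported alongside a false-alarm measure, in line with the trade-off the abstract describes.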