Optimizing Genetic Algorithm Parameters for Atmospheric Carbon Monoxide Modeling
Duggal, M., Daniels, W., Hammerling, D., & Buchholz, R. (2021). Optimizing Genetic Algorithm Parameters for Atmospheric Carbon Monoxide Modeling (No. NCAR/TN-566+STR). doi:10.5065/h45f-c987
A main source of atmospheric carbon monoxide (CO) variability in the Southern Hemisphere is large burn events, making CO a useful proxy for fires. Therefore, predictive CO models over fire regions can help countries prepare for unusually large fire seasons. Fires are related to the climate through fuel dryness and availability, both of which respond to variability in the climate. Climate indices are metrics that summarize climate variability through changes in sea surface temperature and wind. In previous work, we developed a multiple linear regression model that uses these climate indices to predict atmospheric CO and created the R package regClimateChem to perform variable selection. This package offers three different variable selection techniques: stepwise selection, a genetic algorithm, and an exhaustive search. The exhaustive search always finds the best possible model but is computationally expensive. Stepwise selection runs quickly and is scalable but often fails to find the best model. We implement a genetic algorithm as a potential compromise between computational expense and model accuracy. As a stochastic variable selection technique, the genetic algorithm has many parameters that affect the stopping criterion, frequency of the model modification techniques, and population size. Here we present a parameter optimization study for the genetic algorithm, seeking to balance computational expense and model quality. When considering models with four covariates, we find that the optimized genetic algorithm parameters result in a runtime reduction of 11.8% and only compromise 0.3% accuracy compared to the default settings.
We then consider models with five covariates using a high-performance computing system. For models with five covariates, we find that the optimized population size becomes close to the total number of models, meaning it behaves similarly to the exhaustive method.
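The trade-off described in the abstract can be illustrated with a minimal sketch. This is not the regClimateChem implementation, which is an R package whose internals are not given here. It is a generic genetic algorithm for choosing k covariates in a linear regression, written in Python. Every name in it (fit_score, genetic_select, exhaustive_select), every default parameter value, the use of residual sum of squares as the fitness measure, and the synthetic data are assumptions made for illustration. The three tunable quantities mirror the parameter groups the abstract names: population size (pop_size), frequency of a model modification step (mutation_rate), and a stopping criterion (patience).

```python
import math
import random
from itertools import combinations

import numpy as np


def fit_score(X, y, subset):
    """Residual sum of squares of a least-squares fit on the chosen columns (lower is better)."""
    A = np.column_stack([np.ones(len(y)), X[:, list(subset)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def exhaustive_select(X, y, k):
    """Score every k-column model; always finds the best one but costs C(p, k) fits."""
    best = min(combinations(range(X.shape[1]), k), key=lambda s: fit_score(X, y, s))
    return best, fit_score(X, y, best)


def genetic_select(X, y, k, pop_size=20, mutation_rate=0.2, patience=10,
                   max_gen=200, seed=0):
    """Hypothetical GA: returns (best subset, its score, number of distinct models fitted)."""
    rng = random.Random(seed)
    p = X.shape[1]
    cache = {}  # each distinct model is fitted only once

    def score(s):
        if s not in cache:
            cache[s] = fit_score(X, y, s)
        return cache[s]

    population = [tuple(sorted(rng.sample(range(p), k))) for _ in range(pop_size)]
    best = min(population, key=score)
    stale = 0
    for _ in range(max_gen):
        # Selection: keep the better half of the population as parents.
        population.sort(key=score)
        parents = population[: max(2, pop_size // 2)]
        children = []
        while len(children) < pop_size - len(parents):
            # Crossover: draw k covariates from the union of two parents.
            a, b = rng.sample(parents, 2)
            child = set(rng.sample(sorted(set(a) | set(b)), k))
            # Mutation: occasionally swap one covariate for an unused one.
            if rng.random() < mutation_rate:
                outside = [j for j in range(p) if j not in child]
                if outside:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(outside))
            children.append(tuple(sorted(child)))
        population = parents + children
        gen_best = min(population, key=score)
        if score(gen_best) < score(best):
            best, stale = gen_best, 0
        else:
            stale += 1
        # Stopping criterion: no improvement for `patience` generations.
        if stale >= patience:
            break
    return best, score(best), len(cache)


# Synthetic demonstration data (made up; not climate indices or CO observations).
data_rng = np.random.default_rng(42)
X_demo = data_rng.normal(size=(200, 12))
y_demo = (2.0 * X_demo[:, 1] - 3.0 * X_demo[:, 4] + 1.5 * X_demo[:, 7]
          + 2.5 * X_demo[:, 9] + data_rng.normal(scale=0.1, size=200))

total_models = math.comb(12, 4)  # size of the exhaustive search space
ga_best, ga_score, ga_fits = genetic_select(X_demo, y_demo, k=4)
ex_best, ex_score = exhaustive_select(X_demo, y_demo, k=4)
print("models in search space:", total_models)
print("GA:", ga_best, "after fitting", ga_fits, "distinct models")
print("exhaustive:", ex_best)
```

In this sketch the cost of the genetic algorithm is the number of distinct models it fits, which can never exceed the number of possible models. As pop_size approaches that total, the initial population alone covers most of the search space, which is one way to read the abstract's remark that a large optimized population size behaves similarly to the exhaustive method.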