Prediction of PM2.5 Concentration Based on Ensemble Learning

JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM, 2019, Vol. 42, Issue (6): 162-169. doi: 10.13190/j.jbupt.2019-153

PENG Yan, ZHAO Zi-ru, WU Ting-xian, WANG Jie

School of Management, Capital Normal University, Beijing 100056, China

Received: 2019-07-22; Online: 2019-12-28; Published: 2019-11-15

Abstract

An increase in PM2.5 concentration is one cause of haze. Predicting PM2.5 concentration effectively and analyzing the factors that influence it play an important role in air quality forecasting and control. To account for the nonlinearity and uncertainty of PM2.5 concentration, this article presents an ensemble trees-gradient boosting decision tree (GBDT) prediction model, which first selects features using integrated trees and then predicts the concentration with GBDT. Using the standard arithmetic mean as the aggregation method, the article calculates the degree of influence of each feature on the increment of PM2.5 concentration and ranks the features from strongest to weakest impact. Grid search is used to select the optimal parameters of the GBDT algorithm, such as the tree depth. Two datasets, the pollutant concentration data and the meteorological observation data of Beijing from 2015 to 2016, are used with the proposed prediction model. Compared with standard models such as decision tree, random forest, and support vector machine, the ensemble trees-GBDT model achieves lower mean absolute error, lower root mean square error, and better generalization ability.
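
The pipeline described above (importance aggregation by arithmetic mean, feature selection, grid search over GBDT parameters, and evaluation by mean absolute error and root mean square error) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn, uses synthetic stand-in data rather than the Beijing datasets, and takes a random forest and extremely randomized trees as the "integrated trees". The helper names, the top-3 cutoff, and the parameter grid are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import (
    ExtraTreesRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split


def mean_importance(X, y, seed=0):
    """Arithmetic mean of impurity-based importances over tree ensembles.

    Assumption: the choice of ensembles here is illustrative only.
    """
    models = [
        RandomForestRegressor(n_estimators=50, random_state=seed),
        ExtraTreesRegressor(n_estimators=50, random_state=seed),
    ]
    importances = [m.fit(X, y).feature_importances_ for m in models]
    return np.mean(importances, axis=0)


def fit_gbdt(X, y, seed=0):
    """Grid search over a small, hypothetical GBDT parameter grid."""
    grid = {"max_depth": [2, 3, 4], "n_estimators": [50, 100]}
    search = GridSearchCV(
        GradientBoostingRegressor(random_state=seed),
        grid,
        scoring="neg_mean_absolute_error",
        cv=3,
    )
    return search.fit(X, y)


# Synthetic stand-in data: only features 0 and 1 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Rank features from strongest to weakest mean importance, keep the top 3.
importance = mean_importance(X_train, y_train)
ranking = np.argsort(importance)[::-1]
selected = ranking[:3]

# Tune GBDT on the selected features, then evaluate on held-out data.
search = fit_gbdt(X_train[:, selected], y_train)
pred = search.predict(X_test[:, selected])
mae = mean_absolute_error(y_test, pred)
rmse = np.sqrt(mean_squared_error(y_test, pred))

print("feature ranking:", ranking.tolist())
print("best parameters:", search.best_params_)
print(f"MAE={mae:.3f}  RMSE={rmse:.3f}")
```

Averaging the importances from several tree ensembles is meant to make the ranking less dependent on the randomness of any single model. The same held-out split could also be used to score the baseline models the abstract names.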

Key words: PM2.5 prediction model, integrated feature selection, gradient boosting decision tree, analysis of influencing factors

CLC Number: TP391

Cite this article: PENG Yan, ZHAO Zi-ru, WU Ting-xian, WANG Jie. Prediction of PM2.5 Concentration Based on Ensemble Learning[J]. JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM, 2019, 42(6): 162-169.
