Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2019, Vol. 42 ›› Issue (6): 162-169. doi: 10.13190/j.jbupt.2019-153

• Research Report

PM2.5 Concentration Prediction and Analysis of Influencing Factors
(English title as published: Prediction of PM2.5 Concentration Based on Ensemble Learning)

PENG Yan, ZHAO Zi-ru, WU Ting-xian, WANG Jie

  1. School of Management, Capital Normal University, Beijing 100056, China
  • Received: 2019-07-22 Online: 2019-12-28 Published: 2019-11-15
  • Corresponding author: WANG Jie (1978-), female, associate professor. E-mail: wangjie@cnu.edu.cn
  • About the author: PENG Yan (1967-), female, professor.
  • Supported by:
    National Education Sciences Planning Project, Key Project of the Ministry of Education (DLA190426)

Abstract: An increase in PM2.5 is one cause of haze. Effectively predicting PM2.5 concentration and analyzing its influencing factors play an important role in air quality forecasting and control. To address the nonlinearity and uncertainty of PM2.5 concentration, a PM2.5 prediction model based on ensemble trees and the gradient boosting decision tree (EnsembleTrees-GBDT) is proposed. The model first performs feature selection within an ensemble-tree framework: it selects the main factors influencing PM2.5 concentration, uses arithmetic mean aggregation to compute the degree to which each feature contributes to the increase in PM2.5 concentration, and ranks the features from strongest to weakest influence. Grid search is then used to optimize the parameters of the GBDT algorithm, such as the depth of the trees. Finally, the complete ensemble prediction model for PM2.5 concentration is constructed. Prediction experiments were carried out on two datasets, the pollutant concentration observations and the meteorological observations for Beijing from 2015 to 2016. The comparative results show that, relative to models such as decision tree, random forest, and support vector machine, the proposed EnsembleTrees-GBDT model has a lower mean absolute error and a lower root mean square error, along with better generalization ability; it predicts PM2.5 concentration more accurately and supports effective analysis of the factors influencing it.

Key words: PM2.5 prediction model, integrated feature selection, gradient boosting decision tree, analysis of influencing factors
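
The three stages described in the abstract (importance ranking by arithmetic mean aggregation over tree ensembles, grid search over GBDT parameters, and evaluation by mean absolute error and root mean square error) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes scikit-learn is available, uses synthetic data in place of the Beijing observations, and chooses three tree ensembles, a parameter grid, hypothetical feature names, and a cutoff `top_k` purely for illustration; the abstract does not specify any of these.

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data. Feature names and the target formula are
# hypothetical; they are NOT the paper's variables or data.
rng = np.random.default_rng(0)
feature_names = ["PM10", "NO2", "CO", "wind_speed", "humidity", "noise"]
X = rng.normal(size=(400, len(feature_names)))
y = (3.0 * X[:, 0] + 2.0 * X[:, 1] + X[:, 2] ** 2
     - 1.5 * X[:, 3] + 0.5 * X[:, 4] + rng.normal(scale=0.3, size=400))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Stage 1: fit several tree ensembles (an assumed choice of three) and
# aggregate their feature importances with a plain arithmetic mean.
tree_models = [
    RandomForestRegressor(n_estimators=100, random_state=0),
    ExtraTreesRegressor(n_estimators=100, random_state=0),
    GradientBoostingRegressor(n_estimators=100, random_state=0),
]
importances = np.array(
    [m.fit(X_train, y_train).feature_importances_ for m in tree_models])
mean_importance = importances.mean(axis=0)

# Rank features from strongest to weakest mean importance.
ranking = np.argsort(mean_importance)[::-1]
ranked_features = [feature_names[i] for i in ranking]

# Keep the top_k features (the cutoff is an illustrative assumption).
top_k = 5
selected = ranking[:top_k]

# Stage 2: grid search over GBDT parameters such as tree depth.
param_grid = {"max_depth": [2, 3, 4], "n_estimators": [100, 200]}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_absolute_error",
    cv=3,
)
search.fit(X_train[:, selected], y_train)

# Stage 3: evaluate the tuned model on held-out data with MAE and RMSE.
y_pred = search.predict(X_test[:, selected])
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))

print("Ranked features:", ranked_features)
print("Best GBDT parameters:", search.best_params_)
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")
```

Because each model's importances sum to one, their arithmetic mean also sums to one, so the aggregated scores can be read as relative shares of influence. With real data, the feature matrix would combine the pollutant and meteorological observations, and the same metrics would be computed for the baseline models to reproduce the kind of comparison the abstract reports.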
