Journal of Beijing University of Posts and Telecommunications

  • EI core journal

2018, Vol. 41, Issue (4): 86-90. doi: 10.13190/j.jbupt.2017-229

• Research Reports •

FCBF Feature Selection Algorithm Based on Maximum Information Coefficient

ZHANG Li, YUAN Yu-yu, WANG Cong

  1. Key Laboratory of Trustworthy Distributed Computing and Service (Ministry of Education), Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2017-12-05 Online: 2018-08-28 Published: 2018-10-09
  • About the authors: ZHANG Li (1977-), male, Ph.D. candidate, E-mail: zhangli_3913@163.com; YUAN Yu-yu (1971-), professor, doctoral supervisor.
  • Funding:
    National Special Project for Basic Science and Technology Work (2015FY111700-6)

Abstract: The fast correlation-based filter (FCBF) feature selection algorithm is improved by introducing the maximal information coefficient. First, the correlation between each feature and the label is computed using the maximal information coefficient together with the symmetric uncertainty criterion, and the features are sorted by this value. Second, irrelevant and redundant features are filtered out using the maximal information coefficient and the approximate Markov blanket principle, yielding the optimal feature subset. Comparative experiments on 8 public datasets from the University of California, Irvine (UCI) machine learning repository show that the proposed feature selection algorithm based on the maximal information coefficient (NFCBF) outperforms FCBF overall: on average it selects 3.625 fewer features than FCBF and improves classification accuracy by 0.075%. NFCBF also shows a clear advantage over the mutual information maximization (MIM) algorithm, the least absolute shrinkage and selection operator (Lasso), and the Ridge algorithm.
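The two-stage procedure described in the abstract can be illustrated with a minimal sketch of the classic FCBF scheme. This is not the authors' NFCBF implementation: the abstract does not state how the maximal information coefficient and symmetric uncertainty are combined, so the sketch uses symmetric uncertainty alone on discrete features. The names `fcbf`, `symmetric_uncertainty`, `measure`, and `threshold` are illustrative assumptions; `measure` marks the point where a maximal information coefficient estimator could presumably be substituted.

```python
import math
from collections import Counter


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def symmetric_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(features, labels, measure=symmetric_uncertainty, threshold=0.0):
    """Classic FCBF sketch.

    features: dict mapping a feature name to a list of discrete values.
    measure:  assumed plug-in point for the correlation measure.
    Returns the names of the selected features, most relevant first.
    """
    # Stage 1: score each feature against the label, drop irrelevant
    # features, and rank the rest by relevance in descending order.
    relevance = {name: measure(col, labels) for name, col in features.items()}
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Stage 2: approximate Markov blanket. A candidate is redundant if some
    # already-kept (more relevant) feature correlates with it at least as
    # strongly as the candidate correlates with the label.
    selected = []
    for name in ranked:
        redundant = any(
            measure(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data (made up for illustration): f2 duplicates f1, f3 is unrelated
# to the label.
labels = [0, 0, 1, 1, 0, 0, 1, 1]
features = {
    "f1": [0, 0, 1, 1, 0, 0, 1, 1],
    "f2": [0, 0, 1, 1, 0, 0, 1, 1],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(fcbf(features, labels))  # prints ['f1']
```

In this toy run, `f3` is dropped in the first stage as irrelevant and `f2` is dropped in the second stage as redundant with `f1`, which mirrors the irrelevant-feature and redundant-feature filtering named in the abstract.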

Key words: maximal information coefficient, fast correlation based feature selection, feature relevance, feature redundancy, classification

CLC number: