Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2018, Vol. 41, Issue (1): 1-12. doi: 10.13190/j.jbupt.2017-150

• Review •

Research and Prospects of Feature Selection Methods in Machine Learning

CUI Hong-yan1,2,3, XU Shuai1,2,3, ZHANG Li-feng1,2,3, Roy E. Welsch4, Berthold K. P. Horn5

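    1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    2. Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    3. Beijing Laboratory of Advanced Information Networks, Beijing 100876, China;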
    4. Sloan School of Management, Massachusetts Institute of Technology, MA 02139, USA;
    5. Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, MA 02139, USA
  • Received: 2017-07-20  Online: 2018-02-28  Published: 2018-01-04
  • About the author: CUI Hong-yan (b. 1977), female, doctoral supervisor, E-mail: cuihy@bupt.edu.cn.
  • Supported by:
    Ministry of Education-China Mobile Research Fund (MCM20170306)

The Key Techniques and Future Vision of Feature Selection in Machine Learning

CUI Hong-yan1,2,3, XU Shuai1,2,3, ZHANG Li-feng1,2,3, Roy E. Welsch4, Berthold K. P. Horn5   

    1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    2. Key Laboratory of Network System Architecture and Convergence, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    3. Beijing Laboratory of Advanced Information Networks, Beijing 100876, China;
    4. Sloan School of Management, Massachusetts Institute of Technology, MA 02139, USA;
    5. Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, MA 02139, USA
  • Received: 2017-07-20  Online: 2018-02-28  Published: 2018-01-04

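Abstract (translated from the Chinese): Big data research in any field relies on machine learning methods to extract features. To identify feature selection methods that meet the demands of massive-scale data analysis, the authors analyze in depth the methods commonly used for feature selection in machine learning and group them into five categories: correlation measures, Lasso sparse selection, ensemble methods, neural network methods, and principal component analysis. By comparing the principles, implementation procedures, and application scenarios of the different methods, the paper gives the scope of applicability, strengths and weaknesses, and key points of feature selection under each algorithm, as a reference for researchers.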

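Keywords (translated from the Chinese): machine learning, feature selection, transfer learning, adversarial neural networks, artificial intelligence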

Abstract: Big data research is widespread, and feature selection in machine learning plays an important role in it. To address the problem of finding feature selection methods for data mining tasks on big data, this paper examines five classes of models related to feature selection: linear correlation coefficients, Lasso sparse selection, ensemble learning models, neural networks, and principal component analysis. The merits and drawbacks of these models are discussed in depth, which may help guide those interested in machine learning.

Key words: machine learning, feature selection, transfer learning, generative adversarial networks, artificial intelligence
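
To make the first category named in the abstract concrete, the following is a minimal sketch of a correlation-based filter: each feature is scored by the absolute value of its Pearson correlation with the target, and the top k are kept. It is an illustration under stated assumptions, not the authors' implementation; the function names `pearson` and `rank_features`, the toy data, and the choice k = 2 are all hypothetical and do not come from the paper.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target, k):
    """Return the names of the k features with the largest |correlation| to target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: the target depends on x1 and x2 only; x3 is pure noise.
rng = random.Random(0)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x3 = [rng.gauss(0, 1) for _ in range(n)]
y = [3 * a - 2 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]

selected = rank_features({"x1": x1, "x2": x2, "x3": x3}, y, k=2)
print(selected)
```

A filter of this kind scores each feature independently, so it is cheap but cannot see interactions between features. That limitation is one reason the other categories in the abstract (Lasso, ensemble methods, neural networks, principal component analysis) exist.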
