Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications, 2009, Vol. 32, Issue (1): 65-68. doi: 10.13190/jbupt.200901.65.mayl

• Paper

Machine Learning Applied to Network Traffic Identification

Ma Yongli (马永立), Qian Zongjue (钱宗珏), Shou Guochu (寿国础), Hu Yihong (胡怡红)

  1. BUPT Comtest R&D Center, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2008-05-28 Revised: 1900-01-01 Published: 2009-01-28 Released: 2009-01-28
  • Corresponding author: Ma Yongli (马永立)

Network Flow Identification Based on Machine Learning

Yongli Ma, Zongjue Qian, Guochu Shou, Yihong Hu

  • Received: 2008-05-28 Revised: 1900-01-01 Online: 2009-01-28 Published: 2009-01-28

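Abstract (Chinese version):

This paper proposes applying the C4.5 machine learning algorithm to the identification of network traffic from transport-layer features. Correlation-based feature selection and a genetic algorithm are used to form the traffic feature subset. A method that combines N-fold cross-validation with a test set is proposed and used to evaluate the classification results for traffic measured on a national carrier broadband network. Experiments show that network traffic can be successfully identified and analyzed without prior knowledge of ports or protocol labels.

Keywords (Chinese version): machine learning, decision tree, traffic identification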

Abstract:

The C4.5 machine learning algorithm is proposed for network traffic identification. A correlation-based feature selection algorithm and a genetic algorithm are adopted to select the feature subset. A method that combines N-fold cross-validation with a test set is suggested to assess the classification results for traffic from the current national broadband network. Experiments demonstrate that network traffic can be successfully identified and analyzed without knowing the port number or the application-layer protocol label of the flows in advance.

Key words: machine learning, decision tree, flow identification
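
The evaluation idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how cross-validation and the test set are combined, so the sketch assumes cross-validation on a training portion followed by one check on a held-out test portion. The classifier is a one-level tree that splits on the attribute with the highest gain ratio, which is the C4.5 split criterion; full C4.5 also handles continuous attributes and pruning. The feature names, bucket values, class labels, and the synthetic flows are all hypothetical, and the correlation-based feature selection with a genetic algorithm is not shown.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(rows, labels, attr):
    """C4.5 split criterion for a discrete attribute: information gain / split information."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy([row[attr] for row in rows])
    if split_info == 0:
        return 0.0  # the attribute takes a single value, so it cannot split the data
    return (entropy(labels) - remainder) / split_info


def train_stump(rows, labels):
    """One-level tree: split on the attribute with the highest gain ratio."""
    attr = max(range(len(rows[0])), key=lambda a: gain_ratio(rows, labels, a))
    by_value = defaultdict(list)
    for row, label in zip(rows, labels):
        by_value[row[attr]].append(label)
    leaves = {v: Counter(ls).most_common(1)[0][0] for v, ls in by_value.items()}
    default = Counter(labels).most_common(1)[0][0]  # used for unseen attribute values
    return attr, leaves, default


def predict(model, row):
    attr, leaves, default = model
    return leaves.get(row[attr], default)


def accuracy(model, rows, labels):
    return sum(predict(model, r) == y for r, y in zip(rows, labels)) / len(labels)


def cross_validate(rows, labels, n_folds):
    """Mean accuracy over n_folds; fold i holds out every sample j with j % n_folds == i."""
    scores = []
    for i in range(n_folds):
        train = [j for j in range(len(rows)) if j % n_folds != i]
        held = [j for j in range(len(rows)) if j % n_folds == i]
        model = train_stump([rows[j] for j in train], [labels[j] for j in train])
        scores.append(accuracy(model, [rows[j] for j in held], [labels[j] for j in held]))
    return sum(scores) / n_folds


# Hypothetical, already discretised flow features: (packet-size bucket, duration bucket).
# No port number or protocol label is used as an input feature.
SIZES = ["small", "medium", "large"]
DURATIONS = ["short", "long"]
rows = [(SIZES[i % 3], DURATIONS[(i // 3) % 2]) for i in range(120)]
labels = ["bulk" if size == "large" else "interactive" for size, _ in rows]

# Assumed protocol: cross-validate on the training portion, then test once on held-out flows.
train_rows, train_labels = rows[:90], labels[:90]
test_rows, test_labels = rows[90:], labels[90:]

cv_accuracy = cross_validate(train_rows, train_labels, n_folds=5)
final_model = train_stump(train_rows, train_labels)
test_accuracy = accuracy(final_model, test_rows, test_labels)
print(f"5-fold CV accuracy: {cv_accuracy:.2f}, held-out test accuracy: {test_accuracy:.2f}")
```

In this toy data the class depends only on the packet-size bucket, so the gain ratio selects that attribute and ignores the duration bucket. Reporting both the cross-validation score and the held-out score guards against an optimistic estimate from either one alone.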