Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2014, Vol. 37, Issue (5): 66-70, 79. doi: 10.13190/j.jbupt.2014.05.014

• Papers

Study of Internet Traffic Classification Method Based on Bootstrapping

LIU Zhen (1,2), WANG Ruo-yu (2), LIU Qiong (1,2)

  1. School of Software, South China University of Technology, Guangzhou 510006, China;
  2. School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China
  • Received: 2013-11-22 Online: 2014-10-28 Published: 2014-11-07

Abstract:

To address the scarcity of class labels and the class imbalance problem in Internet traffic classification, a bootstrapping-based traffic classification method is presented. An initial classifier is trained on a small number of labeled samples and then updated iteratively by predicting the class labels of unlabeled samples and extending the training set with them. A new algorithm is devised to compute the confidence used for selecting newly labeled samples into the extension set. It treats the classification of an unlabeled sample under a posterior probability distribution as a probabilistic event, which decreases the noise in the extension set. Moreover, a heuristic rule is built with the aid of probably approximately correct (PAC) theory; it is biased toward selecting minority-class samples into the extension set so as to reduce the degree of class imbalance. Experiments show that the bootstrapping-based classifier improves overall classification accuracy by 9.46% compared with the initial classifier, and that the recall of minority classes increases by about 2.22% on average compared with the existing method.
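
The iterative update described in the abstract can be illustrated with a generic self-training loop. The sketch below is not the authors' method. It assumes a hypothetical nearest-centroid classifier and a fixed posterior threshold in place of the paper's confidence algorithm, and it omits the PAC-based heuristic that favors minority classes. All function names, the threshold value, and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def centroids(samples, labels):
    """Fit a toy classifier: the mean feature vector of each class."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def posterior(model, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {
        y: -sum((a - b) ** 2 for a, b in zip(x, c)) for y, c in model.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def self_train(labeled_x, labeled_y, unlabeled_x, threshold=0.9, max_rounds=10):
    """Grow the training set with confidently predicted unlabeled samples."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    model = centroids(train_x, train_y)  # initial classifier
    for _ in range(max_rounds):
        remaining, added = [], 0
        for x in pool:
            probs = posterior(model, x)
            label = max(probs, key=probs.get)
            # Assumed selection rule: plain max-posterior threshold.
            if probs[label] >= threshold:
                train_x.append(x)
                train_y.append(label)
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:  # nothing confident enough; stop iterating
            break
        model = centroids(train_x, train_y)  # retrain on the extended set
    return model, train_x, train_y, pool


# Toy usage with two made-up traffic classes and 2-D features.
labeled_x = [(0.0, 0.0), (10.0, 10.0)]
labeled_y = ["web", "p2p"]
unlabeled_x = [(1.0, 0.0), (9.0, 10.0), (0.0, 1.0), (10.0, 9.0), (5.0, 5.0)]
model, train_x, train_y, pool = self_train(labeled_x, labeled_y, unlabeled_x)
```

In this toy run the ambiguous sample midway between the two classes stays in the unlabeled pool, which is the kind of noise that a confidence criterion is meant to keep out of the extension set.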

Key words: semi-supervised learning, class imbalance, bootstrapping, Internet traffic classification

CLC Number: