Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2020, Vol. 43, Issue (1): 68-73. doi: 10.13190/j.jbupt.2019-033

Log Template Extraction Algorithm Based on Normalized Feature Discrimination

SHUANG Kai (1, 2), LI Yi-wen (1), Lü Zhi-heng (1), HAN Jing (3), LIU Jian-wei (3)

  1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. Science and Technology of Information Transmission and Dissemination in Communication Networks Laboratory, Shijiazhuang 050081, China;
  3. ZTE Corporation, Shenzhen 518057, China
  • Received: 2019-03-22  Online: 2020-02-28  Published: 2020-03-27

Abstract: Traditional log template extraction requires the number of clusters to be supplied as a priori information. To remove this requirement, a log template extraction algorithm based on normalized feature discrimination is proposed. First, the log data is compressed in an initial pass to reduce redundancy. Then, the logs are clustered, and a normalized feature is used to judge whether the clustering result meets the requirements: if it does, the clustering process is complete; if not, the number of clusters is adjusted by binary search and clustering is repeated. Finally, log templates are extracted from the clustering results. In addition, an evaluation metric is designed to measure the effectiveness of template extraction. Experiments on real-world data show that the algorithm achieves more stable and accurate template extraction than the baseline method and generalizes well.
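
The abstract does not define the compression scheme, the clustering algorithm, or the normalized feature, so the sketch below fills each of those in with a hypothetical stand-in to show how the compress, cluster, check, and binary-search loop fits together. It assumes exact deduplication for compression, single-linkage agglomerative clustering over a positional token distance, and a "consistency" score (the share of token positions that are constant within a cluster, in [0, 1]) as the normalized feature. The names extract_templates, consistency, threshold, and the "<*>" wildcard are illustrative and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

WILDCARD = "<*>"  # hypothetical placeholder for variable tokens


def compress(lines):
    """Stand-in for the initial compression: collapse exact duplicate lines (after whitespace tokenization)."""
    counts = Counter(tuple(line.split()) for line in lines if line.strip())
    return list(counts), counts  # unique token tuples (first-seen order) and their frequencies


def distance(a, b):
    """Hypothetical token distance: share of positions that differ; extra tokens count as differences."""
    same = sum(x == y for x, y in zip(a, b))
    return 1.0 - same / max(len(a), len(b))


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def merge_order(logs):
    """Single-linkage dendrogram via Kruskal's algorithm: the successful merges, in order."""
    parent = list(range(len(logs)))
    edges = sorted(combinations(range(len(logs)), 2),
                   key=lambda e: distance(logs[e[0]], logs[e[1]]))
    merges = []
    for i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            merges.append((i, j))
    return merges


def cut(n, merges, k):
    """Replay the first n - k merges, giving exactly k nested clusters."""
    parent = list(range(n))
    for i, j in merges[: n - k]:
        parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), []).append(i)
    return list(groups.values())


def consistency(cluster):
    """Hypothetical normalized feature: share of token positions constant across the cluster (0 if lengths differ)."""
    if len({len(t) for t in cluster}) > 1:
        return 0.0
    columns = list(zip(*cluster))
    return sum(len(set(col)) == 1 for col in columns) / len(columns)


def extract_templates(lines, threshold=0.6):
    """Binary-search the smallest cluster count whose clusters all reach `threshold`, then build templates."""
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    logs, counts = compress(lines)
    n = len(logs)
    merges = merge_order(logs)

    def meets_requirement(k):
        return min(consistency([logs[i] for i in c]) for c in cut(n, merges, k)) >= threshold

    lo, hi = 1, n  # k = n (all singletons) always scores 1.0, so a solution exists
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_requirement(mid):
            hi = mid
        else:
            lo = mid + 1

    templates = []
    for c in cut(n, merges, lo):
        members = [logs[i] for i in c]
        tokens = [col[0] if len(set(col)) == 1 else WILDCARD for col in zip(*members)]
        templates.append((" ".join(tokens), sum(counts[m] for m in members)))
    return templates
```

For example, with the five lines "Connection from 10.0.0.1 closed" (twice), "Connection from 10.0.0.2 closed", "User alice logged in", and "User bob logged in", the search settles on k = 2 and returns the templates "Connection from <*> closed" (3 lines) and "User <*> logged in" (2 lines). Hierarchical clustering is used here because its clusters at k + 1 are a refinement of those at k, and refining a cluster can only keep or raise this consistency score; that makes the check monotone in k, which is what a binary search over the cluster count needs. A non-nested method such as k-means would not guarantee this.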

Key words: template extraction, log analysis, text clustering, normalized feature
