Log Template Extraction Algorithm Based on Normalized Feature Discrimination

Journal of Beijing University of Posts and Telecommunications, 2020, Vol. 43, Issue (1): 68-73. doi: 10.13190/j.jbupt.2019-033

SHUANG Kai¹,², LI Yi-wen¹, LÜ Zhi-heng¹, HAN Jing³, LIU Jian-wei³

1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. Science and Technology of Information Transmission and Dissemination in Communication Networks Laboratory, Shijiazhuang 050081, China;
3. ZTE Corporation, Shenzhen 518057, China

Received: 2019-03-22 Online: 2020-02-28 Published: 2020-03-27
About the author: SHUANG Kai (b. 1977), male, associate professor and master's supervisor. E-mail: shuangk@bupt.edu.cn
Supported by:
National Key Research and Development Program of China (2016QY01W0200); Shanghai Sailing Program (18YF1423300); Open Fund of the Science and Technology of Information Transmission and Dissemination in Communication Networks Laboratory (SXX18641X024)

Abstract

Traditional log template extraction requires the number of log clusters as prior information. To remove this requirement, a log template extraction algorithm based on normalized feature discrimination is proposed. First, the log data is compressed to reduce redundancy and improve the efficiency of subsequent processing. Next, the logs are clustered, and normalized statistical features of the logs are used to judge whether the clustering result meets the requirement: if it does, clustering is complete; if not, the number of clusters is adjusted by binary search and clustering is repeated. Finally, log templates are extracted from the clustering result. An evaluation metric that measures the quality of template extraction is also designed. Experiments on real datasets show that the algorithm achieves a higher template matching degree than the baseline method, with more stable and accurate extraction, and that it generalizes well.

Key words: template extraction, log analysis, text clustering, normalized feature
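
The pipeline summarized in the abstract (compress, cluster, check a normalized feature, binary-search the cluster count, extract templates) can be illustrated with a minimal Python sketch. The abstract does not specify the compression step, the clustering algorithm, or the exact normalized feature, so every concrete choice below is an assumption made for illustration only: numbers are masked and duplicate lines dropped as "compression", single-linkage agglomerative clustering stands in for the clustering step, and the share of constant token positions in a cluster serves as the normalized feature. The names `compress`, `cluster`, `constant_ratio`, `mine_templates`, and the `threshold` parameter are hypothetical and do not come from the paper.

```python
import re

WILDCARD = "<*>"

def compress(logs):
    """Assumed compression: mask numbers, tokenize, drop duplicates and blanks."""
    unique = {tuple(re.sub(r"\d+", WILDCARD, line).split())
              for line in logs if line.strip()}
    return sorted(unique)

def similarity(a, b):
    """Share of positions with identical tokens; 0 when lengths differ."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cluster(items, k):
    """Single-linkage agglomerative clustering down to k clusters (assumed)."""
    clusters = [[item] for item in items]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = max(similarity(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters

def constant_ratio(members):
    """Assumed normalized feature in [0, 1]: share of constant positions."""
    if len({len(m) for m in members}) > 1:
        return 0.0
    columns = list(zip(*members))
    return sum(len(set(col)) == 1 for col in columns) / len(columns)

def template(members):
    """Keep tokens shared by all members; replace the rest with a wildcard."""
    return " ".join(col[0] if len(set(col)) == 1 else WILDCARD
                    for col in zip(*members))

def mine_templates(logs, threshold=0.5):
    """Binary-search the smallest cluster count whose clusters all pass."""
    items = compress(logs)
    lo, hi = 1, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if all(constant_ratio(c) >= threshold for c in cluster(items, mid)):
            hi = mid       # requirement met: try fewer clusters
        else:
            lo = mid + 1   # requirement not met: need more clusters
    return [template(c) for c in cluster(items, lo)]
```

Binary search is valid in this sketch because the agglomerative clusters for a larger count always refine those for a smaller count, so the check can only improve as the count grows. Whether the paper's own clustering step has the same property is not stated in the abstract.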

CLC number: TP391

SHUANG Kai, LI Yi-wen, LÜ Zhi-heng, HAN Jing, LIU Jian-wei. Log Template Extraction Algorithm Based on Normalized Feature Discrimination[J]. Journal of Beijing University of Posts and Telecommunications, 2020, 43(1): 68-73.

