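Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2015, Vol. 38 ›› Issue (s1): 63-66,71. doi: 10.13190/j.jbupt.2015.s1.015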

• Article

The Application of Double Layer Clustering Model on Log Data Analysis

GU Heng1,2, CHEN Zhao3, WANG Cong2,4, ZHANG Si-yue2,4, FU Qun-chao2,4

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  3. Beijing Government Computer Emergency Response Center, Beijing 100101, China;
  4. School of Software, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2014-08-26 Online: 2015-06-28 Published: 2015-06-28
  • About the authors: GU Heng (born 1991), female, master's student, E-mail: kelly@bupt.edu.cn; WANG Cong (born 1958), female, professor, doctoral supervisor.
  • Supported by: Beijing Municipal Science and Technology Commission project (Z131100001113034)

Abstract:

A two-layer clustering method based on the self-organizing map (SOM) neural network and fuzzy c-means (FCM) is proposed for clustering the log data sets in Web logs. The first layer is an unsupervised SOM neural network; it produces far fewer classes than there are records in the original data set, and it reduces the dependence of FCM on the initial class centers. The second layer applies the FCM algorithm to the center points of the classes produced by the first layer, which greatly reduces the time complexity of clustering. Finally, parallel coordinates are used to visualize the log data set before and after clustering, which makes the log data easier to analyze.

Key words: parallel coordinates, log data, clustering, self-organizing map, fuzzy c-means
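
As a rough illustration of the two-layer idea in the abstract, the Python sketch below trains a small SOM on the records, keeps the prototypes that win at least one record, and runs FCM on those prototypes only. It is not the authors' implementation. The 1-D chain topology, the linear decay schedules, the evenly spaced initializations, the fuzzifier m = 2, all function names, and the toy two-feature records are assumptions made for the example.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bmu(units, x):
    """Index of the best-matching (closest) SOM unit for record x."""
    return min(range(len(units)), key=lambda j: dist2(units[j], x))

def train_som(data, n_units, epochs=30, lr0=0.5, seed=0):
    """Layer 1: train a 1-D SOM (a chain of n_units >= 2 prototypes)."""
    rng = random.Random(seed)
    dims = range(len(data[0]))
    lo = [min(x[k] for x in data) for k in dims]
    hi = [max(x[k] for x in data) for k in dims]
    # Assumption: prototypes start evenly spaced between per-feature min and max.
    units = [[lo[k] + (hi[k] - lo[k]) * j / (n_units - 1) for k in dims]
             for j in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                          # learning rate shrinks linearly
        sigma = max(n_units / 2.0 * decay, 0.5)   # neighbourhood radius, floored at 0.5
        order = list(data)
        rng.shuffle(order)                        # present records in random order
        for x in order:
            win = bmu(units, x)
            for j, w in enumerate(units):
                h = math.exp(-((j - win) ** 2) / (2.0 * sigma ** 2))
                for k in dims:
                    w[k] += lr * h * (x[k] - w[k])
    return units

def memberships(points, centers, m):
    """FCM membership matrix: u[n][i] is the degree of point n in cluster i."""
    u = []
    for p in points:
        d = [max(dist2(p, ctr), 1e-12) for ctr in centers]  # squared distances
        # On squared distances, exponent 1/(m-1) equals the usual
        # (distance ratio) ** (2/(m-1)) term of the FCM update.
        u.append([1.0 / sum((d[i] / d[k]) ** (1.0 / (m - 1.0))
                            for k in range(len(d)))
                  for i in range(len(d))])
    return u

def fcm(points, c, m=2.0, iters=100):
    """Layer 2: fuzzy c-means, assuming 2 <= c <= len(points)."""
    n = len(points)
    # Assumption: initial centers are evenly spaced picks from the input list.
    centers = [list(points[i * (n - 1) // (c - 1)]) for i in range(c)]
    for _ in range(iters):
        u = memberships(points, centers, m)
        for i in range(c):
            w = [row[i] ** m for row in u]        # weights u^m
            centers[i] = [sum(wn * p[k] for wn, p in zip(w, points)) / sum(w)
                          for k in range(len(points[0]))]
    return centers, memberships(points, centers, m)

def two_layer_cluster(data, n_units=6, c=2):
    """Cluster records by running FCM on SOM prototypes, not on the raw data."""
    units = train_som(data, n_units)
    hits = [bmu(units, x) for x in data]
    used = sorted(set(hits))                      # ignore units that won no record
    centers, u = fcm([units[j] for j in used], c)
    # Each record takes the highest-membership cluster of its best-matching unit.
    label_of = {j: max(range(c), key=lambda i: u[n][i])
                for n, j in enumerate(used)}
    return [label_of[j] for j in hits], centers

# Hypothetical log features, e.g. (requests per minute, mean response time).
records = [(0.1 * i, 0.1 * (i % 3)) for i in range(10)] + \
          [(10 + 0.1 * i, 10 + 0.1 * (i % 3)) for i in range(10)]
labels, centers = two_layer_cluster(records)
print(labels)
```

Because FCM here runs on the few prototypes that won records rather than on every record, each FCM iteration evaluates on the order of (number of used prototypes × c) distances instead of (number of records × c), which is the kind of saving the abstract describes.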

CLC number: