Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2016, Vol. 39 ›› Issue (3): 114-119. doi: 10.13190/j.jbupt.2016.03.021

• Research Report •

Data Clustering for Real-Time Massive Data Streams

ZHAO Jin-dong, YU Yan-wei, LIU Jing-lei

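  1. Computer and Control Engineering College, Yantai University, Yantai 264005, Shandong
  • Received: 2015-06-25 Published: 2016-06-28 Online: 2016-06-28
  • About the author: ZHAO Jin-dong (born 1974), male, associate professor, Ph.D., E-mail: zhjdong@126.com.
  • Funding:

    National Natural Science Foundation of China (61403328, 61572419); Natural Science Foundation of Shandong Province (ZR2013FM011)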

A Data Clustering Algorithm over Real Time High-Volume Data Streams

ZHAO Jin-dong, YU Yan-wei, LIU Jing-lei   

  1. Computer and Control Engineering College, Yantai University, Shandong Yantai 264005, China
  • Received:2015-06-25 Online:2016-06-28 Published:2016-06-28

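Abstract (from the Chinese):

For massive real-time data streams, a clustering algorithm that combines density with grid partitioning is proposed. The data space is first partitioned into cells, and the attribute of the data points in each cell is determined. If the density of data points in a cell is above a threshold, those points are judged to be core points; otherwise, the points are examined again according to the number of data points in the cell's neighbors, to decide whether the points in the cell are border points or noise points. The algorithm overcomes the low running efficiency of density-based algorithms and also compensates for the lower accuracy of grid-based algorithms. Experiments verify the efficiency and performance of the algorithm, and it is compared with the classical DBSCAN and CLIQUE algorithms. Finally, the advantages of the algorithm for massive real-time data streams are analyzed, and directions for further research are proposed.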
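The cell-based labeling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `classify_points`, the parameters `cell_size` and `density_threshold`, and the specific neighbor rule (count the points in a sparse cell together with its adjacent cells and compare the total against the same threshold) are all assumptions made for illustration. It also processes a fixed batch of points rather than a live stream.

```python
from collections import defaultdict
from itertools import product


def classify_points(points, cell_size, density_threshold):
    """Label each point 'core', 'border', or 'noise' using a uniform grid.

    Hypothetical sketch: the neighbor rule below is an assumption,
    not the rule defined in the paper.
    """
    # Bucket point indices by the grid cell that contains them.
    cells = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(int(coord // cell_size) for coord in point)
        cells[key].append(idx)

    labels = [None] * len(points)
    for key, members in cells.items():
        if len(members) > density_threshold:
            # Dense cell: every point in it is treated as a core point.
            label = "core"
        else:
            # Sparse cell: count points in this cell and all adjacent cells.
            offsets = product((-1, 0, 1), repeat=len(key))
            nearby = sum(
                len(cells.get(tuple(k + o for k, o in zip(key, off)), ()))
                for off in offsets
            )
            label = "border" if nearby > density_threshold else "noise"
        for idx in members:
            labels[idx] = label
    return labels
```

Because each point is touched once and each cell inspects only a fixed number of adjacent cells, the cost grows roughly linearly with the number of points. That is the kind of efficiency gain over pairwise distance checks that the abstract attributes to combining grids with density.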

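Key words (from the Chinese): anomaly detection, cluster analysis, density-based clustering, grid-based clustering, massive data streams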

Abstract:

The energy-efficient, real-time data collection problem in wireless sensor networks is studied. A mobile data collection protocol consisting of four phases (node clustering, route planning, route combining, and data collection) is proposed. Two heuristic algorithms, save and nearest neighbor, are presented to build data collection routes that incur the least mobile cost while satisfying the deadline constraint. Simulations show that the proposed heuristic route planning algorithms perform well in terms of energy saving, deadline guarantee, and travel cost reduction.

Key words: outlier detection, clustering analysis, density-based clustering, grid-based clustering, high-volume data stream

CLC number: