Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2007, Vol. 30, Issue (2): 6-10. doi: 10.13190/jbupt.200702.6.108


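A Cell Density-Enabled Schema for Initializing Cluster Centers

NIU Kun1, ZHANG Shu-bo2, CHEN Jun-liang1

  1. School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. Department of Strategy Research, China Telecom Beijing Research Institute, Beijing 100035, China
  • Received: 2006-05-08  Published: 2007-04-30  Online: 2007-04-30
  • Corresponding author: NIU Kun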


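Abstract:

A method for initializing cluster centers with density pointers, the density pointer (DP) algorithm, is proposed. DP takes the geometric center of each grid cell as its center of symmetry and connects this center to every vertex of the cell, symmetrically partitioning the conventional rectangle-like grid cell into hyper-triangular subspaces. It then sets the direction of each density pointer from the density difference between a hyper-triangular subspace and the adjacent hyper-triangular subspace of the neighboring cell, and uses the density pointers to compute an aggregation factor for every dense grid cell. Finally, the centers of gravity of the groups of grid cells with large local aggregation factors are taken as the initial cluster centers. Experiments on public and synthetic datasets show that DP quickly and efficiently finds data points close to the true cluster centers to use as initial cluster centers. Efficiency experiments further show that the running time of DP is linear in the number of instances, the number of dimensions, and the number of grid cells.

Key words: density pointer, aggregation factor, cluster center, initialization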
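The partitioning and pointer steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the grid is assumed uniform with the same hypothetical `width` in every dimension and origin `lo`; each hyper-triangular subspace is read as the pyramid spanned by the cell's geometric center and one of its 2d facets (in two dimensions, the four triangles formed by joining the center to the corners); and a pointer is assumed to run from the sparser to the denser of the two subspaces that share a facet, since the abstract only says the direction follows the density difference. Under this reading a point lies in the pyramid of facet (k, s) when its normalized offset from the cell center is largest in absolute value along dimension k, with sign s. Each such pyramid has volume V/(2d) for a cell of volume V (base area V/h_k times height h_k/2, divided by d), so raw point counts are comparable across subspaces. The names `cell_and_subspace`, `shift` and `density_pointers` are hypothetical.

```python
import math
from collections import Counter


def cell_and_subspace(point, lo, width):
    """Map a point to (cell, (k, s)) on a uniform grid (assumed layout).

    cell is the tuple of integer cell coordinates; (k, s) names the pyramid
    whose apex is the cell's geometric center and whose base is the facet on
    side s (+1 or -1) of dimension k.
    """
    cell = tuple(math.floor((x - l) / width) for x, l in zip(point, lo))
    # Offset from the cell center, scaled so each coordinate lies in [-1, 1].
    u = [2 * ((x - l) / width - c - 0.5) for x, l, c in zip(point, lo, cell)]
    # The largest |u_k| decides the pyramid; max() keeps the first k on ties.
    k = max(range(len(u)), key=lambda j: abs(u[j]))
    s = 1 if u[k] >= 0 else -1
    return cell, (k, s)


def shift(cell, k, step):
    """Return the cell moved by `step` along dimension k."""
    return cell[:k] + (cell[k] + step,) + cell[k + 1:]


def density_pointers(points, lo, width):
    """Count points per subspace and derive one pointer per shared facet.

    Assumption: a pointer runs from the sparser subspace to the denser one;
    equal counts produce no pointer.
    """
    counts = Counter(cell_and_subspace(p, lo, width) for p in points)
    # Identify each occupied facet by (lower cell, dimension).
    facets = {(cell if s == 1 else shift(cell, k, -1), k)
              for cell, (k, s) in counts}
    pointers = []
    for lower, k in sorted(facets):
        upper = shift(lower, k, 1)
        a = counts.get((lower, (k, 1)), 0)   # lower cell's side of the facet
        b = counts.get((upper, (k, -1)), 0)  # neighbor's side of the facet
        if a > b:
            pointers.append((upper, lower))  # point toward the denser side
        elif b > a:
            pointers.append((lower, upper))
    return counts, pointers
```

For example, with `lo = (0.0, 0.0)` and `width = 1.0`, the points (0.9, 0.5) and (0.8, 0.5) fall in the right-hand triangle of cell (0, 0) and (1.1, 0.5) falls in the left-hand triangle of cell (1, 0), so the sketch returns the single pointer `((1, 0), (0, 0))`. Each point is mapped once with O(d) work, which is in line with the linear running time reported above.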

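The abstract does not say how the aggregation factor is computed or how cells are grouped, so the following Python sketch fills both in with explicitly hypothetical choices rather than the paper's formulas: a cell is dense if it holds at least `min_count` points; its aggregation factor is the number of density pointers entering it minus the number leaving it; a dense cell with a positive factor that beats every face-adjacent dense neighbor (ties broken by cell index) counts as a local maximum; and its initial center is the centroid of the points in that cell and its dense neighbors. Density pointers are given as `(from_cell, to_cell)` pairs of integer grid coordinates, each pointing toward the denser side. The names `neighbors`, `initial_centers` and `min_count` are illustrative.

```python
from collections import defaultdict


def neighbors(cell):
    """Yield the face-adjacent cells (differ by +/-1 in one coordinate)."""
    for k in range(len(cell)):
        for step in (-1, 1):
            yield cell[:k] + (cell[k] + step,) + cell[k + 1:]


def initial_centers(cell_points, pointers, min_count=2):
    """Hypothetical sketch: choose initial centers from density pointers.

    cell_points: dict mapping a cell (tuple of ints) to its list of points.
    pointers: (from_cell, to_cell) pairs, each pointing to the denser side.
    """
    dense = {c for c, pts in cell_points.items() if len(pts) >= min_count}

    # Assumed aggregation factor: incoming pointers minus outgoing pointers.
    factor = defaultdict(int)
    for src, dst in pointers:
        factor[dst] += 1
        factor[src] -= 1

    def beats(c, n):
        # Higher factor wins; ties go to the smaller cell index.
        return factor[c] > factor[n] or (factor[c] == factor[n] and c < n)

    centers = []
    for c in sorted(dense):
        nbrs = [n for n in neighbors(c) if n in dense]
        if factor[c] > 0 and all(beats(c, n) for n in nbrs):
            # The group is the local maximum plus its dense neighbors.
            group_points = [p for g in [c] + nbrs for p in cell_points[g]]
            dim = len(group_points[0])
            centroid = tuple(
                sum(p[j] for p in group_points) / len(group_points)
                for j in range(dim)
            )
            centers.append(centroid)
    return centers
```

With three points in cell (0, 0), two in cell (1, 0), one pointer from (1, 0) to (0, 0), and an isolated pair of points in cell (5, 5), only cell (0, 0) qualifies, and its group of five points yields a single center near (0.8, 0.5). The tie-break keeps two adjacent cells with equal factors from both producing nearly identical centers.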
