Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2007, Vol. 30 ›› Issue (3): 1-5. doi: 10.13190/jbupt.200703.1.niuk

• Papers •

A High-Dimensional Subspace Clustering Algorithm Using Attribute Clustering

NIU Kun1, ZHANG Shu-bo2, CHEN Jun-liang1

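  1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    2. Dept. of Strategy Research, China Telecom Beijing Research Institute, Beijing 100035, China
  • Received: 2006-08-07 Revised: 1900-01-01 Online: 2007-06-30 Published: 2007-06-30
  • Corresponding author: NIU Kun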

Subspace Clustering through Attribute Clustering

NIU Kun1, ZHANG Shu-bo2, CHEN Jun-liang1   

  1. State Key Laboratory of Networking and Switching Technology, Beijing 100876, China;
    2. Dept. of Strategy Research, China Telecom Beijing Research Institute, Beijing 100035, China
  • Received: 2006-08-07 Revised: 1900-01-01 Online: 2007-06-30 Published: 2007-06-30
  • Contact: NIU Kun

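Abstract:

To address the high time complexity and input-parameter sensitivity of existing subspace clustering algorithms, an efficient subspace clustering algorithm based on attribute clustering is proposed. The algorithm first filters out redundant attributes by computing the Gini value of each attribute. It then builds a relation matrix over the non-redundant attributes, using a relation function based on two-dimensional joint Gini values, to measure the correlation between any two non-redundant attributes. An overlapping clustering algorithm is then applied to the relation matrix, and its result is the candidate set of all interesting subspaces. Finally, a clustering algorithm is run to obtain all clusters that exist in these subspaces. Experiments on synthetic and real datasets show that the new algorithm performs well in both time complexity and the ability to find subspace clusters, and that it is not very sensitive to the values of its input parameters.

Keywords: subspace clustering, high-dimensional data, attribute clustering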

Abstract:

Many recently proposed subspace clustering methods suffer from two severe problems. First, the algorithms typically scale exponentially with the data dimensionality or with the subspace dimensionality of the clusters. Second, the clustering results are often sensitive to input parameters. A fast subspace clustering algorithm based on attribute clustering is proposed to overcome these limitations. The algorithm first filters out redundant attributes by computing the Gini coefficient of each attribute. To evaluate the correlation between every pair of non-redundant attributes, a relation matrix is constructed from a relation function based on two-dimensional joint Gini coefficients. Applying an overlapping clustering algorithm to the relation matrix yields the candidate set of all interesting subspaces. Finally, all subspace clusters are obtained by clustering within the interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm not only achieves a significant gain in runtime and in the quality of the subspace clusters found, but is also insensitive to its input parameters.

Key words: subspace clustering, high-dimensional data, attribute clustering
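
The first three steps summarized in the abstract (Gini-based attribute filtering, a relation matrix built from two-dimensional joint Gini values, and overlapping grouping of attributes) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made only for illustration: the Gini value is taken to be the Gini impurity of an equal-width histogram on data scaled to [0, 1]; an attribute is treated as redundant when its impurity is close to that of a uniform distribution; the relation function `joint - indep` is a hypothetical stand-in that is near zero for independent attributes; and the thresholded neighbourhood grouping is a simple substitute for the overlapping clustering algorithm used in the paper. All function names, `max_gini`, `min_rel`, and the synthetic data are hypothetical.

```python
import numpy as np

def gini_1d(x, bins=10):
    """Gini impurity 1 - sum(p_i^2) of an equal-width histogram on [0, 1] (assumed definition)."""
    counts, _ = np.histogram(x, bins=bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_2d(x, y, bins=10):
    """Joint Gini impurity of a 2-D equal-width histogram on [0, 1] x [0, 1] (assumed definition)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def filter_attributes(data, bins=10, max_gini=0.85):
    """Keep attributes whose histogram is concentrated; near-uniform ones count as redundant."""
    return [j for j in range(data.shape[1]) if gini_1d(data[:, j], bins) < max_gini]

def relation_matrix(data, attrs, bins=10):
    """Hypothetical relation function: joint concentration minus its value under independence."""
    k = len(attrs)
    rel = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            xa, xb = data[:, attrs[a]], data[:, attrs[b]]
            indep = (1.0 - gini_1d(xa, bins)) * (1.0 - gini_1d(xb, bins))
            joint = 1.0 - gini_2d(xa, xb, bins)
            rel[a, b] = rel[b, a] = joint - indep
    return rel

def candidate_subspaces(rel, attrs, min_rel=0.05):
    """Stand-in for overlapping clustering: one group per attribute and its strong neighbours."""
    groups = set()
    for a in range(len(attrs)):
        members = [attrs[b] for b in range(len(attrs)) if b == a or rel[a, b] >= min_rel]
        if len(members) > 1:
            groups.add(tuple(sorted(members)))
    return sorted(groups)

# Synthetic data: attributes 0 and 1 share two clusters, attribute 2 is uniform noise,
# attribute 3 is concentrated but independent of the others.
rng = np.random.default_rng(0)
n = 1000
centers = np.where(rng.integers(0, 2, size=n) == 0, 0.25, 0.75)
data = np.column_stack([
    centers + rng.normal(0.0, 0.03, n),
    centers + rng.normal(0.0, 0.03, n),
    rng.uniform(0.0, 1.0, n),
    0.55 + rng.normal(0.0, 0.03, n),
])

kept = filter_attributes(data)
rel = relation_matrix(data, kept)
subspaces = candidate_subspaces(rel, kept)
print("non-redundant attributes:", kept)
print("candidate subspaces:", subspaces)
```

On this synthetic data the sketch is expected to drop the uniform attribute and to propose the pair of attributes that share the two clusters as the only candidate subspace. The final step of the abstract, finding the clusters themselves, would then run any standard clustering algorithm on `data[:, list(s)]` for each candidate subspace `s`; that step is omitted here.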

CLC number: