Journal of Beijing University of Posts and Telecommunications

  • EI核心期刊

Journal of Beijing University of Posts and Telecommunications ›› 2025, Vol. 48 ›› Issue (2): 18-27.

Previous Articles     Next Articles

Improved K-means Clustering Algorithm based on PageRank

  

  • Received:2023-05-16 Revised:2024-06-27 Online:2025-04-30 Published:2025-04-30

Abstract: To address the randomness problem in initial cluster center selection of K-means clustering algorithm, a PageRank importance measurement algorithm based on Gauss-Seidel iteration is proposed to improve K-means. By simulating the clustering data as a random walk model on a directed graph using PageRank, in other words, a first-order Markov chain. The clustering data is traversed randomly, and a linear equation system is established. Gauss-Seidel is introduced, and its continuous updating characteristic of unknown variables in the process of solving linear equation systems is used to ensure the accuracy of node PageRank values, and then the top k nodes are selected as the initial cluster centers. Then, the objective function value is optimized iteratively based on minimizing the loss function to accurately divide each node to the best cluster. Finally, a series of comparative experiments are performed on University of California Irvine (UCI) dataset and classic images, and experimental results show that the rand index, adjusted rand index, normalized mutual information and other performance evaluation indicators of improved K-means are better than the original K-means for discrete data, improving K-means can solve the edge segmentation fuzzy problem of the original K-means for classic images, effectively segmenting the target image subject and background.

Key words: K-means clustering, cluster center, Gaussian seidel iteration, segment images

CLC Number: