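Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications ›› 2020, Vol. 43 ›› Issue (5): 21-26. doi: 10.13190/j.jbupt.2020-017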

• Papers

Analysis and Improvement of Semi-Supervised K-means Clustering Based on Particle Swarm Optimization Algorithm

SUN Yi1, XIA Qi-zhao2

  1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. International School, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2020-02-16 Published: 2021-03-11
  • About the author: SUN Yi (born 1979), male, senior engineer, E-mail: sunyisse@bupt.edu.cn.
  • Funding:
    Key Research and Development Program of Hebei Province (20313701D); Key Research and Development Program of Hebei Province (19210404D); National Natural Science Foundation of China (U1536112); Key Program of the National Social Science Fund of China (17AJL014)

Abstract: Traditional particle swarm optimization has clear advantages, but as the complexity of the environment increases, several problems become increasingly serious: sensitivity to the cluster centers rises, too many empty clusters appear, and class labels have too little influence on the clustering results. To address this, an improved algorithm is proposed that targets semi-supervised K-means clustering. The initial cluster centers are computed randomly with an adaptive K value and encoded as particles as required by the K-means algorithm; the objective function is reconstructed using the concept of soft constraints; finally, the improved algorithm is used for optimization. The proposed particle swarm optimization algorithm improves the adaptive parameters, introduces two perturbation methods (immune perturbation and chaotic perturbation), and applies an annealing strategy and a dynamic clustering strategy. Experimental results show that the algorithm largely resolves the above problems.

Key words: semi-supervised, K-means clustering, information entropy, perturbation, annealing strategy
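
The pipeline summarized in the abstract (cluster centers encoded as particles, an objective rebuilt with soft constraints, and a perturbed particle swarm search) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the pairwise penalty form, the fixed K, the fixed inertia and acceleration coefficients, and the logistic-map perturbation are all assumptions made for illustration; the adaptive K value, immune perturbation, annealing strategy, and dynamic clustering strategy named in the abstract are omitted.

```python
import random

def decode(particle, k, dim):
    """Split a flat particle into k centroids of length dim."""
    return [particle[i * dim:(i + 1) * dim] for i in range(k)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def objective(particle, points, labels, k, dim, penalty=1.0):
    """Within-cluster squared error plus a soft penalty on label violations.

    labels[i] is a class label, or None for an unlabeled point.
    Assumed penalty form: each pair of labeled points whose label agreement
    differs from its cluster agreement adds `penalty` (soft, not forbidden).
    """
    centroids = decode(particle, k, dim)
    assign, sse = [], 0.0
    for p in points:
        dists = [sq_dist(p, c) for c in centroids]
        j = min(range(k), key=dists.__getitem__)
        assign.append(j)
        sse += dists[j]
    labeled = [i for i, lab in enumerate(labels) if lab is not None]
    violations = 0
    for a in range(len(labeled)):
        for b in range(a + 1, len(labeled)):
            i, j = labeled[a], labeled[b]
            if (labels[i] == labels[j]) != (assign[i] == assign[j]):
                violations += 1
    return sse + penalty * violations

def pso(points, labels, k, n_particles=20, iters=100, seed=0,
        w=0.7, c1=1.5, c2=1.5, chaos_scale=0.05):
    """Basic global-best PSO over flattened centroids (hypothetical settings)."""
    rng = random.Random(seed)
    dim = len(points[0])

    def fit(x):
        return objective(x, points, labels, k, dim)

    # Each particle starts from k randomly chosen data points.
    xs = [[float(v) for p in rng.sample(points, k) for v in p]
          for _ in range(n_particles)]
    vs = [[0.0] * (k * dim) for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    pbest_f = [fit(x) for x in xs]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    z = 0.37  # state of the logistic map used as a chaotic sequence
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(k * dim):
                z = 4.0 * z * (1.0 - z)
                vs[i][d] = (w * vs[i][d]
                            + c1 * rng.random() * (pbest[i][d] - xs[i][d])
                            + c2 * rng.random() * (gbest[d] - xs[i][d]))
                # Velocity step plus a small chaotic perturbation.
                xs[i][d] += vs[i][d] + chaos_scale * (z - 0.5)
            f = fit(xs[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = xs[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = xs[i][:], f
    return decode(gbest, k, dim), gbest_f
```

Calling `pso(points, labels, k=2)` on a list of coordinate tuples, with `labels` holding a class label for the few labeled points and `None` elsewhere, returns the best centroids found and their objective value. Because the label term is a penalty rather than a hard rule, a labeled point can still land in a conflicting cluster when the distance term outweighs `penalty`.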
