Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2011, Vol. 34 ›› Issue (6): 64-68. doi: 10.13190/jbupt.201106.64.yanh

• Papers •

Quantity-Based Closed Frequent Itemset Hierarchical Clustering Algorithm

YAN Hao1, ZHANG Bo1,2, LIU Fang3, LEI Zhenming4

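  1. Teaching and Research Center for Broadband Network Traffic Monitoring, School of Information and Communication Engineering, Beijing University of Posts and Telecommunications
  2.
  3. Beijing University of Posts and Telecommunications
  4. School of Information Engineering, Beijing University of Posts and Telecommunications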
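  • Received: 2011-01-20 Revised: 2011-05-18 Publication date: 2011-12-28 Release date: 2011-10-18
  • Corresponding author: YAN Hao E-mail: yanhao71@163.com
  • About the authors: YAN Hao (born 1983), male, PhD candidate, E-mail: yanhao71@163.com; LEI Zhenming (born 1951), male, professor and doctoral supervisor
  • Supported by:

    National Natural Science Foundation of China; Programme of Introducing Talents of Discipline to Universities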

Closed Frequent Itemsets Hierarchical Clustering based on Items’ Quantities

  • Received:2011-01-20 Revised:2011-05-18 Online:2011-12-28 Published:2011-10-18
  • Contact: Hao Yan E-mail:yanhao71@163.com

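Abstract (Chinese version):

A quantity-based closed frequent itemset hierarchical clustering algorithm, CFIHCQ, is proposed and applied to Web usage mining. The algorithm first mines closed frequent itemsets from users' Web access data. Second, it initializes the clusters from the closed frequent itemsets and assigns each user to exactly one cluster by scoring. Third, it builds a top-down cluster tree from the cluster labels and splits sub-clusters using user access vectors. Finally, it prunes the cluster tree. Experiments show that the algorithm predicts users' Web access behavior well, meets real-time mining requirements on massive user data, and presents the mining results as a tree structure.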

Abstract:

A Web usage mining algorithm named Closed Frequent Itemsets Hierarchical Clustering based on Quantities (CFIHCQ) is proposed. The algorithm first obtains closed frequent itemsets from users' Web access data. It then initializes clusters with the closed frequent itemsets and assigns each user to exactly one cluster using a scoring method. Next, it constructs a top-down cluster tree from the cluster labels and uses user access vectors to split sub-clusters within the tree. Finally, the cluster tree is pruned. Experimental results indicate that CFIHCQ predicts users' Web access behavior accurately, supports real-time mining on very large data sets, and presents its results as an easily browsed tree structure.
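
The first three steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the scoring function nor the tree construction rule, so the score used here (the sum of a user's visit counts over a cluster label that the user fully covers) and the parent rule (the largest closed proper subset of a label) are assumptions made for illustration only. The brute-force miner, the function names, and the sample data are likewise hypothetical, and the sub-cluster splitting and pruning steps are omitted.

```python
from itertools import combinations


def closed_frequent_itemsets(transactions, min_support):
    """Brute-force miner: return {itemset: support} for closed frequent itemsets."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                frequent[candidate] = support
                found = True
        if not found:
            # No frequent itemset of this size, so none of any larger size.
            break
    # An itemset is closed if no proper superset has the same support.
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == frequent[other] for other in frequent)
    }


def assign_users(visits, labels):
    """Assign each user to exactly one label using an ASSUMED score."""
    def score(label, counts):
        # Assumption: a label scores only if the user visited all its items;
        # the score is then the total visit count over those items.
        if not label <= set(counts):
            return 0
        return sum(counts[item] for item in label)

    ordered = sorted(labels, key=lambda s: (len(s), sorted(s)))
    return {
        user: max(ordered, key=lambda label: score(label, counts))
        for user, counts in visits.items()
    }


def build_tree(labels):
    """Map each label to a parent: its largest closed proper subset (ASSUMED rule)."""
    root = frozenset()
    return {
        label: max(
            (other for other in labels if other < label),
            key=lambda s: (len(s), sorted(s)),
            default=root,
        )
        for label in labels
    }


# Hypothetical data: user -> {site: number of visits (the "quantity")}.
visits = {
    "u1": {"news": 5, "mail": 2},
    "u2": {"news": 3, "mail": 1, "video": 4},
    "u3": {"news": 1, "video": 6},
    "u4": {"mail": 7},
}
transactions = [frozenset(counts) for counts in visits.values()]

closed = closed_frequent_itemsets(transactions, min_support=2)
clusters = assign_users(visits, closed)
tree = build_tree(closed)
print(clusters)
print(tree)
```

In this toy data set `{"video"}` is frequent but not closed, because its superset `{"news", "video"}` has the same support; it therefore never becomes a cluster label. Exhaustive enumeration is exponential in the number of items, so a dedicated closed-itemset miner would be needed at the data volumes the abstract describes.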

CLC number: