Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2013, Vol. 36 ›› Issue (2): 74-78. doi: 10.13190/jbupt.201302.74.231

• Papers

Distributed Semantic Resource Search Based on a Dimensionality Reduction Algorithm

ZHANG Chun-hong (张春红)1, HU Qing-yuan (胡清源)1, CHENG Shi-duan (程时端)2

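  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2012-06-09 Revised: 2012-11-27 Publication date: 2013-04-30 Release date: 2013-03-25
  • Corresponding author: HU Qing-yuan E-mail: qingyuanhaha@gmail.com
  • About the author: ZHANG Chun-hong (born 1971), female, lecturer, M.S., E-mail: zhangch.bupt.001@gmail.com
  • Supported by:

    Hangzhou Huaxing (杭州华星) - BUPT School of Information and Communication Engineering 2011 Graduate Innovation Fund; National Science and Technology Major Project (2012ZX03005008)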

A Distributed Semantic Resources Search Based on Dimensionality Reduction Algorithm

ZHANG Chun-hong1, HU Qing-yuan1, CHENG Shi-duan2   

  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  2. Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received:2012-06-09 Revised:2012-11-27 Online:2013-04-30 Published:2013-03-25
  • Contact: Qing-Yuan HU E-mail:qingyuanhaha@gmail.com

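Abstract (translated from the Chinese):

A distributed similar-resource search mechanism for high-dimensional resources is proposed. Because traditional distributed peer-to-peer (P2P) networks cannot support similarity search over high-dimensional resources, a dimensionality reduction algorithm based on principal component analysis maps the high-dimensional resource vector model into a low-dimensional space. The resource vectors in the low-dimensional space serve as indexes and are mapped onto the distributed hash table of the P2P network, so that distributed similarity search is realized in a simple and effective way that relies entirely on the P2P network and its routing mechanism, while avoiding the curse of dimensionality that an excessive number of resource dimensions would otherwise cause. How well the similarity information between resources is preserved after dimensionality reduction is analyzed, and simulations based on a content addressable network verify that the dimensionality reduction algorithm is effective for building low-dimensional resource indexes. For high-dimensional resources with a certain degree of clustering, the method achieves relatively high precision in distributed similarity search.

Key words (translated from the Chinese): vector model, coordinate space, dimensionality reduction, resource search, peer-to-peer network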

Abstract:

A distributed semantic resource search mechanism for high-dimensional resources is presented. Because similarity search over high-dimensional resources cannot be achieved effectively in traditional peer-to-peer (P2P) networks, the high-dimensional resource vector model is mapped to a low-dimensional space by a dimensionality reduction algorithm based on principal component analysis, and the resulting low-dimensional vectors are then projected onto the distributed hash table of the P2P network. This gives a simple and effective way to achieve distributed similarity search, and it also avoids the curse of dimensionality that the high dimension of the resources would otherwise bring to the search. How well similarity information is preserved after dimensionality reduction is analyzed. Simulation based on a content addressable network shows the effectiveness of the low-dimensional index built by the dimensionality reduction algorithm. For high-dimensional resources with a certain degree of clustering, the mechanism achieves a high precision ratio in distributed similarity search.

Key words: vector model, coordinate space, dimension reduction, resources search, peer-to-peer network
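
The pipeline summarized in the abstract (reduce resource vectors with principal component analysis, then use the low-dimensional coordinates as keys in a coordinate-space overlay) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the function names (`pca_fit`, `to_unit_cube`, `zone_of`, `search`) are hypothetical, and a uniform grid stands in for the zone partition of a content addressable network, which in practice is formed by nodes splitting zones as they join.

```python
import numpy as np

def pca_fit(X, d):
    """Return (mean, components) for projecting rows of X onto the top-d principal axes."""
    mean = X.mean(axis=0)
    # SVD of the centered data: rows of Vt are principal directions, by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:d]

def pca_project(X, mean, components):
    """Map high-dimensional rows of X to low-dimensional coordinates."""
    return (X - mean) @ components.T

def to_unit_cube(Y, lo, hi):
    """Rescale coordinates into [0, 1) per axis, as a CAN-style key space (assumption)."""
    Z = (Y - lo) / (hi - lo)
    return np.clip(Z, 0.0, 1.0 - 1e-9)

def zone_of(point, cells_per_axis):
    """Hypothetical zone lookup: a uniform grid stands in for the overlay's zones."""
    return tuple((point * cells_per_axis).astype(int))

rng = np.random.default_rng(0)
# Two synthetic clusters of 50-dimensional resource vectors (illustrative data only).
centers = rng.normal(size=(2, 50)) * 5.0
X = np.vstack([c + rng.normal(scale=0.3, size=(100, 50)) for c in centers])
labels = np.repeat([0, 1], 100)

CELLS = 4
mean, comps = pca_fit(X, d=2)
Y = pca_project(X, mean, comps)
lo, hi = Y.min(axis=0), Y.max(axis=0)
Z = to_unit_cube(Y, lo, hi)
zones = [zone_of(z, CELLS) for z in Z]

# Index: zone -> ids of the resources whose low-dimensional key falls in that zone.
index = {}
for i, z in enumerate(zones):
    index.setdefault(z, []).append(i)

def search(q):
    """Return ids of resources stored in the zone that the query vector maps to."""
    y = pca_project(q[None, :], mean, comps)
    z = to_unit_cube(y, lo, hi)[0]
    return index.get(zone_of(z, CELLS), [])

query = centers[0] + rng.normal(scale=0.3, size=50)
hits = search(query)
```

In this toy setting the two clusters land in disjoint zones, so a query drawn near one cluster only retrieves resources from that cluster. A real deployment would also visit neighboring zones, since similar resources can fall on either side of a zone boundary.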

CLC number: