Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications, 2014, Vol. 37, Issue (3): 32-37. doi: 10.13190/j.jbupt.2014.03.007


Hierarchical News Topic Detection Using Improved LSH

LU Mei-lian, WANG Zi, LI Jia-shan   

  1. State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2013-08-08 Online: 2014-06-28 Published: 2014-06-08

Abstract:

To improve the timeliness of retrospective topic detection, an improved locality-sensitive hashing (LSH) algorithm is proposed and applied to constructing a hierarchical topic model for web news. First, content features are extracted from the news articles, and topic features are extracted using the latent Dirichlet allocation (LDA) model. Then the non-binary content feature vectors and topic feature vectors are converted into a binary feature space. Finally, the news articles are clustered with LSH, using the binary content feature vectors and the binary topic feature vectors in turn, to generate a hierarchical topic-content model of news topics. Experiments show that extracting both content and topic features represents news articles accurately, and that converting the content and topic feature vectors into a unified binary space reduces the time complexity of clustering, which increases the efficiency of topic detection while preserving accuracy and semantic extensibility.
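
The binarize-then-hash pipeline summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the binarization rule, the hash family, or the order of the two clustering passes, so the following are assumptions made for illustration only: a fixed threshold for binarization, bit-sampling LSH over Hamming space, and topic-level bucketing followed by content-level bucketing. The names `binarize`, `lsh_buckets`, and `two_level_clusters` are hypothetical and are not taken from the paper, and the toy vectors are made up.

```python
import random
from collections import defaultdict


def binarize(vector, threshold=0.0):
    # Assumed rule: a bit is 1 when the weight exceeds the threshold.
    return tuple(1 if x > threshold else 0 for x in vector)


def lsh_buckets(binary_vectors, num_bits, seed=0):
    # Bit-sampling LSH for Hamming space: the hash key is the vector
    # restricted to num_bits randomly chosen positions.
    dim = len(binary_vectors[0])
    positions = random.Random(seed).sample(range(dim), num_bits)
    buckets = defaultdict(list)
    for idx, vec in enumerate(binary_vectors):
        buckets[tuple(vec[p] for p in positions)].append(idx)
    return dict(buckets)


def two_level_clusters(topic_bin, content_bin, topic_bits, content_bits, seed=0):
    # Assumed order: bucket by topic bits first, then split each topic
    # bucket by content bits.
    tree = {}
    for topic_key, doc_ids in lsh_buckets(topic_bin, topic_bits, seed).items():
        subset = [content_bin[i] for i in doc_ids]
        sub = lsh_buckets(subset, content_bits, seed)
        # Map positions within the subset back to original document ids.
        tree[topic_key] = {k: [doc_ids[j] for j in v] for k, v in sub.items()}
    return tree


# Toy data (made up): three articles, 3 topic weights and 4 term weights each.
topic_vecs = [[0.7, 0.2, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
content_vecs = [[0.4, 0.0, 0.3, 0.0], [0.5, 0.0, 0.2, 0.0], [0.0, 0.6, 0.0, 0.1]]
topic_bin = [binarize(v, threshold=0.5) for v in topic_vecs]
content_bin = [binarize(v) for v in content_vecs]
print(two_level_clusters(topic_bin, content_bin, topic_bits=3, content_bits=4))
```

In this sketch each article is hashed once per level instead of being compared with every other article, so the bucketing cost grows linearly with the number of articles. A realistic setup would sample fewer bits than there are dimensions and use several hash tables; the abstract does not describe that tuning.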

Key words: topic detection, hierarchical clustering, topic model, locality-sensitive hashing

CLC Number: