Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications, 2001, Vol. 24, Issue (1): 42-46.

Automatic Text Categorization Based on K-Nearest Neighbor

SUN Jian, WANG Wei, ZHONG Yi-xin   

  1. Information Engineering School, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2000-09-14 Online: 2001-01-10

Abstract: A method for automatic text categorization is put forward that integrates linguistic information with statistical information from the training corpus. The weight of each feature is computed from three parameters: word frequency, centralized degree, and decentralized degree. Training yields the vector space model used for text categorization. The category of an input text is then decided by the K-nearest-neighbor rule. The results show that the method improves categorization accuracy.
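
The pipeline outlined in the abstract (texts represented as weighted vectors, then a K-nearest-neighbor decision) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the formula that combines word frequency, centralized degree, and decentralized degree, so the sketch assumes plain term frequency as the weight. Cosine similarity, similarity-weighted voting, the toy TRAINING data, and the names to_vector, cosine, and knn_classify are all assumptions made for illustration.

```python
import math
from collections import Counter


def to_vector(tokens):
    """Map a token list to a sparse vector of term frequencies.

    Assumption: raw term frequency stands in for the paper's weight,
    whose exact formula is not given in the abstract.
    """
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_u * norm_v)


def knn_classify(query_tokens, training_set, k=3):
    """Pick the category with the largest summed similarity among
    the k training texts closest to the query (assumed voting rule)."""
    if not training_set:
        raise ValueError("training_set must not be empty")
    query = to_vector(query_tokens)
    scored = [(cosine(query, to_vector(tokens)), label)
              for tokens, label in training_set]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical, already tokenized training corpus.
TRAINING = [
    (["stock", "market", "price"], "finance"),
    (["bank", "loan", "price"], "finance"),
    (["match", "team", "goal"], "sports"),
    (["team", "coach", "season"], "sports"),
]

if __name__ == "__main__":
    print(knn_classify(["team", "goal", "season"], TRAINING, k=3))
```

Replacing to_vector with a function that applies the paper's three-parameter weight would leave the rest of the sketch unchanged, since the nearest-neighbor step only needs a vector per text.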

Key words: natural language understanding, vector space model, K-nearest-neighbor, automatic text categorization
