Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM ›› 2008, Vol. 31 ›› Issue (1): 14-17. doi: 10.13190/jbupt.200801.14.qiny

• Papers

Cascade Identification of Chinese Chunks

QIN Ying, WANG Xiao-jie, ZHONG Yi-xin   

  1. Information Engineering School, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2007-03-31 Revised: 1900-01-01 Online: 2008-02-28 Published: 2008-02-28
  • Contact: QIN Ying

Abstract:

Most statistical research on Chinese chunking has been inspired by the CoNLL-2000 English chunking task. Once chunks are represented by per-word tags (the BIO scheme), chunk identification is cast as word sequence tagging and tackled as a multi-class classification problem. To reduce classification complexity, a decomposed chunking approach is proposed: chunk boundaries are identified first, and chunk types are identified second. The critical problem in Chinese chunking is in fact boundary identification. Cascaded chunk identifiers were built on conditional random fields (CRF). The experimental dataset was extracted from the Chinese Treebank 5.1 (CTB5.1). For feature selection, several methods commonly used in Chinese word segmentation were adapted to the chunking task. Under 5-fold cross-validation on the dataset, the F1-measure of chunk boundary identification is 95.05% and the precision of chunk type recognition is 99.43%. The overall chunking F1-measure reaches 93.58%. Compared with related work, performance is improved and the training time of the learners is sharply reduced.
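The decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `split_tags` and `merge_tags`, the tag strings such as `B-NP`, and the choice of plain `B`/`I`/`O` as the boundary tag set are all assumptions made for illustration, and the sketch covers only the tag representation, not the CRF models themselves.

```python
from typing import List, Tuple


def split_tags(tags: List[str]) -> Tuple[List[str], List[str]]:
    """Split typed BIO tags (e.g. 'B-NP') into boundary tags and one type per chunk.

    Hypothetical helper: the boundary tag set used in the paper is not
    specified in the abstract, so plain B/I/O is assumed here.
    """
    boundaries: List[str] = []
    chunk_types: List[str] = []
    for tag in tags:
        if tag == "O":
            boundaries.append("O")
            continue
        prefix, chunk_type = tag.split("-", 1)
        boundaries.append(prefix)
        if prefix == "B":
            # Each chunk contributes exactly one type, recorded at its first word.
            chunk_types.append(chunk_type)
    return boundaries, chunk_types


def merge_tags(boundaries: List[str], chunk_types: List[str]) -> List[str]:
    """Recombine boundary tags and per-chunk types into typed BIO tags.

    Assumes a well-formed sequence in which every 'I' follows a 'B' or 'I'.
    """
    merged: List[str] = []
    types = iter(chunk_types)
    current = None
    for boundary in boundaries:
        if boundary == "O":
            merged.append("O")
        elif boundary == "B":
            current = next(types)  # a new chunk starts, take its type
            merged.append(f"B-{current}")
        else:
            merged.append(f"I-{current}")  # continue the open chunk
    return merged


# Illustrative tag sequence (not taken from CTB5.1).
example = ["B-NP", "I-NP", "B-VP", "O", "B-NP"]
boundary_tags, types_per_chunk = split_tags(example)
restored = merge_tags(boundary_tags, types_per_chunk)
```

Under this representation the first stage only has to choose among three boundary labels per word, and the second stage assigns one type per detected chunk, which is one way to read the reduced classification complexity the abstract mentions.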

Key words: Chinese chunking, boundary identification, type identification, conditional random fields
