Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications ›› 2014, Vol. 37 ›› Issue (6): 120-124. doi: 10.13190/j.jbupt.2014.06.025

• Research Reports •

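A Part-of-Speech Tagging Algorithm for English Essays

TAN Yong-mei, WU Kun

  1. Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876
  • Received: 2013-07-10 Published: 2014-12-28 Online: 2014-12-28
  • About the author: TAN Yong-mei (born 1975), female, associate professor, master's supervisor, E-mail: ymtan@bupt.edu.cn.
  • Funding:

    National Natural Science Foundation of China (61273365)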

A Part-of-Speech Tagging Method for English Essay

TAN Yong-mei, WU Kun   

  1. Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2013-07-10 Online: 2014-12-28 Published: 2014-12-28

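Abstract (Chinese version):

Part-of-speech tagging of English essays is the foundation for automatically grading them. Although researchers have done a great deal of useful work on English part-of-speech tagging, most of it targets users whose first language is English, and very little targets users for whom English is a second language. To address this, English essays written by second-language users were manually annotated, and on this basis a part-of-speech tagging algorithm for English essays is proposed that combines features such as word clusters, statistics from unlabeled corpora, and word pronunciation. Experimental results show that the algorithm effectively improves tagging performance, raising tagging accuracy from 94.49% to 97.07%.

Key words (Chinese version): part-of-speech tagging, student English essays, features, word clustering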

Abstract:

Part-of-speech tagging of essays written by Chinese learners of English is the basis of automated essay scoring systems. Although much fruitful research has been done on English part-of-speech tagging, most of it focuses on essays written by native speakers, and little addresses essays by Chinese learners of English. A corpus of essays by Chinese learners of English is manually annotated, and a part-of-speech tagging algorithm for this learner language is presented. The algorithm uses rich features, including unsupervised word clusters, an unsupervised tag dictionary, and phonetic normalization. With these features, the system outperforms state-of-the-art tagging on the corpus, raising tagging accuracy from 94.49% to 97.07%.

Key words: part-of-speech tagging, essays of learner, feature, word clustering
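
The abstract names three feature families (word clusters, a tag dictionary built from unlabeled text, and phonetic normalization) without giving their details. The Python sketch below shows one way such features might be combined into a per-token feature dictionary for a discriminative tagger. It is an illustration under stated assumptions, not the authors' implementation: the function names `phonetic_key` and `token_features`, the cluster prefix lengths, and the crude vowel-dropping normalization are all hypothetical choices.

```python
import re


def phonetic_key(word):
    """Crude stand-in for phonetic normalization (an assumption, not the
    paper's method): lowercase, collapse repeated letters, and drop
    vowels after the first character."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return word.lower()
    w = re.sub(r"(.)\1+", r"\1", w)  # collapse runs, e.g. "comming" -> "coming"
    return w[0] + re.sub(r"[aeiou]", "", w[1:])


def token_features(tokens, i, clusters, tag_dict):
    """Return a feature dict for tokens[i].

    clusters: word -> bit-string cluster ID (hypothetical input, e.g. the
              output of a hierarchical word-clustering run)
    tag_dict: word -> set of tags seen for that word in automatically
              tagged unlabeled text (hypothetical input)
    """
    word = tokens[i]
    lower = word.lower()
    feats = {
        "word=" + lower: 1.0,
        "suffix3=" + lower[-3:]: 1.0,
        "is_capitalized": float(word[:1].isupper()),
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"): 1.0,
        "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"): 1.0,
        # Misspellings that sound alike share this feature.
        "phon=" + phonetic_key(word): 1.0,
    }
    cluster = clusters.get(lower)
    if cluster is not None:
        # Prefixes of the bit string give coarser-to-finer cluster features.
        for n in (2, 4, 6):
            feats["cluster%d=%s" % (n, cluster[:n])] = 1.0
    for tag in sorted(tag_dict.get(lower, ())):
        feats["dict_tag=" + tag] = 1.0
    return feats
```

In this sketch a learner misspelling such as "comming" receives the same `phon=` feature as "coming", so a tagger trained on these features could share evidence between the two forms; how the original system performs its normalization is not stated in the abstract.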

CLC number: