Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2017, Vol. 40 ›› Issue (2): 16-20. doi: 10.13190/j.jbupt.2017.02.003

• Papers •

A Part-of-Speech Tagging Algorithm for Essays Written by Chinese English Learners

TAN Yong-mei, YANG Lin, HU Dan

  1. Intelligence Science and Technology Center, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2016-03-29 Online: 2017-04-28 Published: 2017-04-26
  • About the author: TAN Yong-mei (born 1975), female, associate professor, E-mail: ymtan@bupt.edu.cn.

Abstract: A two-layer part-of-speech tagging method based on word embeddings is proposed. The method needs only a small number of hand-crafted features; most features are learned automatically from the word embeddings and from the tagging vectors produced by the first layer. The tag set is divided into two categories, each serving as the tag set of one layer. First, the categories that are easy to tag are tagged. Then the verbs and nouns that are hard to tag go through a second layer, which assigns each of them a specific verb or noun tag. On essays written by Chinese learners of English, the method raises part-of-speech tagging accuracy from 95.23% to 95.63%, exceeding the accuracy of existing word-embedding-based part-of-speech taggers on the same corpus.

Key words: part-of-speech tagging, Chinese English learners, essays, word vector
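
The two-layer control flow described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' implementation: the Penn Treebank-style tag names, the coarse labels `NOUN` and `VERB`, and the functions `to_layer1_tag`, `two_layer_tag`, `toy_layer1`, and `toy_layer2` are all assumptions introduced here. The toy classifiers are simple lookup and suffix rules standing in for the embedding-based models of the paper.

```python
from typing import Callable, List, Sequence

# Assumption: fine-grained noun and verb tags follow Penn Treebank naming and
# collapse to two coarse labels in the first layer.
COARSE_OF = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
}
HARD_CLASSES = ("NOUN", "VERB")

# Hypothetical classifier signatures: layer 1 sees the words, layer 2 also
# sees the layer-1 tags (a stand-in for the "first-layer tagging vector").
Layer1 = Callable[[Sequence[str], int], str]
Layer2 = Callable[[Sequence[str], Sequence[str], int], str]


def to_layer1_tag(fine_tag: str) -> str:
    """Map a fine tag to the layer-1 tag set; non-noun/verb tags are unchanged."""
    return COARSE_OF.get(fine_tag, fine_tag)


def two_layer_tag(words: Sequence[str], layer1: Layer1, layer2: Layer2) -> List[str]:
    """Tag every word in layer 1, then refine only the NOUN/VERB positions."""
    coarse = [layer1(words, i) for i in range(len(words))]
    return [
        layer2(words, coarse, i) if coarse[i] in HARD_CLASSES else coarse[i]
        for i in range(len(words))
    ]


def toy_layer1(words: Sequence[str], i: int) -> str:
    """Toy stand-in for the first-layer model: a tiny lexicon lookup."""
    lexicon = {"the": "DT", "students": "NOUN", "write": "VERB", "essays": "NOUN"}
    return lexicon.get(words[i].lower(), "NOUN")


def toy_layer2(words: Sequence[str], coarse: Sequence[str], i: int) -> str:
    """Toy stand-in for the second-layer model: crude suffix rules."""
    word = words[i].lower()
    if coarse[i] == "NOUN":
        return "NNS" if word.endswith("s") else "NN"
    return "VBD" if word.endswith("ed") else "VBP"


print(two_layer_tag(["The", "students", "write", "essays"], toy_layer1, toy_layer2))
# prints ['DT', 'NNS', 'VBP', 'NNS']
```

In this sketch the second layer is only called on positions the first layer marked as noun or verb, so it chooses among a much smaller set of tags than a single-layer tagger would.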

CLC number: