Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2020, Vol. 43, Issue (5): 84-90. doi: 10.13190/j.jbupt.2020-032

• Paper •

Chinese Medical Named Entity Recognition Based on Data Augmentation

WANG Peng-hui, LI Ming-zheng, LI Si

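  1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2020-03-24 Published: 2021-03-11
  • Corresponding author: LI Si (born 1985), female, associate professor. E-mail: lisi@bupt.edu.cn
  • About the author: WANG Peng-hui (born 1996), male, master's student.
  • Funding:
    National Natural Science Foundation of China (61702047)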

Data Augmentation for Chinese Clinical Named Entity Recognition

WANG Peng-hui, LI Ming-zheng, LI Si   

  1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received:2020-03-24 Published:2021-03-11

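Abstract: Because large annotated datasets are lacking, Chinese medical named entity recognition mainly relies on external resources to improve recognition performance, and incorporating those resources requires considerable time and effective rules. To address the shortage of annotated data, a data augmentation algorithm based on generative adversarial networks is proposed; it automatically generates large amounts of annotated data and improves the performance of medical entity recognition. Experimental results show that the algorithm outperforms the baseline models used in the experiments, demonstrating its effectiveness for medical entity recognition.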

Keywords: named entity recognition, data augmentation, sequence generative adversarial network

Abstract: Chinese clinical named entity recognition plays an important role in recognizing the medical entities contained in Chinese electronic medical records. Limited by the lack of large annotated datasets, most existing methods concentrate on employing external resources to improve the performance of clinical named entity recognition, which requires considerable time and effective rules. To address the lack of annotated data, a sequence generative adversarial network is used for data augmentation, generating more diverse data based on the entities and non-entities in the training set. Experiments show that when the generated data are used to expand the training set, the proposed named entity recognition system achieves competitive performance compared with state-of-the-art methods, which demonstrates the effectiveness of the data augmentation method.

Keywords: named entity recognition, data augmentation, generative adversarial network

CLC number:
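
The abstract states that the generated data depend on the entities and non-entities in the training set, but it gives no implementation detail. The sketch below is only an illustration under stated assumptions. It assumes the training data are BIO-tagged character sequences, and it replaces the sequence generative adversarial network with a much simpler stand-in: swapping each entity for another training-set entity of the same type. The function names, the `DIS` entity type, and the two example sentences are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict


def split_segments(tokens, tags):
    """Group a BIO-tagged sentence into (label, tokens) segments.

    The label is the entity type for entity spans and "O" for non-entity spans.
    """
    segments = []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            label = "O"
            starts_new = not segments or segments[-1][0] != "O"
        else:
            prefix, label = tag.split("-", 1)
            starts_new = (
                prefix == "B" or not segments or segments[-1][0] != label
            )
        if starts_new:
            segments.append((label, [token]))
        else:
            segments[-1][1].append(token)
    return segments


def build_entity_pool(corpus):
    """Collect every entity span in the corpus, keyed by entity type."""
    pool = defaultdict(list)
    for tokens, tags in corpus:
        for label, segment in split_segments(tokens, tags):
            if label != "O":
                pool[label].append(segment)
    return pool


def swap_entities(tokens, tags, pool, rng):
    """Stand-in augmenter (NOT the paper's GAN): keep the non-entity
    segments and replace each entity with a random same-type entity."""
    new_tokens, new_tags = [], []
    for label, segment in split_segments(tokens, tags):
        if label == "O":
            new_tags += ["O"] * len(segment)
        else:
            candidates = pool.get(label) or [segment]
            segment = rng.choice(candidates)
            new_tags += ["B-" + label] + ["I-" + label] * (len(segment) - 1)
        new_tokens += segment
    return new_tokens, new_tags


# Hypothetical two-sentence training set with a single entity type "DIS".
corpus = [
    (list("患者有高血压病史"),
     ["O", "O", "O", "B-DIS", "I-DIS", "I-DIS", "O", "O"]),
    (list("诊断为糖尿病"),
     ["O", "O", "O", "B-DIS", "I-DIS", "I-DIS"]),
]

pool = build_entity_pool(corpus)
rng = random.Random(0)

# Expand the training set with one generated sentence per original sentence.
augmented = corpus + [swap_entities(t, g, pool, rng) for t, g in corpus]
```

Because the labels are rebuilt from the segments, every generated sentence arrives already annotated, which is the property that lets generated data enlarge a training set without extra manual labelling. A learned generator such as the one named in the abstract would take the place of `swap_entities`.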