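Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2022, Vol. 45, Issue (4): 13-18, 57. doi: 10.13190/j.jbupt.2021-191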

• Smart Healthcare •

Traditional Chinese Medicine Symptom Normalization Approach Based on Pre-Trained Language Models

XIE Yonghong1,2, TAO Hu1,2, JIA Qi1,2, YANG Shibing1,2, HAN Xinliang2

  1. School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China;
  2. Beijing Key Laboratory of Knowledge Engineering for Materials Science, Beijing 100083, China
  • Received: 2021-09-01 Online: 2022-08-28 Published: 2022-09-03
  • About the author: XIE Yonghong (born 1970), female, associate professor. E-mail: xieyh@ustb.edu.cn.
  • Funding:
    National Key Research and Development Program of China (2018YFC1707410)

Abstract: Symptom descriptions in traditional Chinese medicine suffer from two problems: the same symptom appears under different names, and one description can correspond to multiple standard symptom terms. To address them, a two-stage symptom normalization framework based on pre-trained language models is proposed. In the first stage, candidate standard symptom terms are generated: following the definition and classification of symptoms in traditional Chinese medicine, a multi-label text classification model assigns semantic labels to each original symptom description, and the standard terms under those labels form the candidate set. In the second stage, the candidates are ranked: an entity matching model scores and sorts the candidate set, and additional strategies perform a second recall on the results to improve performance. The highest-scoring candidate under each semantic label is taken as the normalization result. Experimental results show that the proposed method handles symptom normalization more effectively than traditional methods. A comparison of different pre-trained language models on the symptom normalization task further demonstrates the effectiveness of the proposed framework and strategies.

Key words: traditional Chinese medicine, symptom normalization, entity matching, semantic classification, pre-trained language model
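
The two-stage flow described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the vocabulary `STANDARD_TERMS`, the keyword cues in `LABEL_CUES`, and the function names are hypothetical, the keyword lookup merely stands in for a fine-tuned multi-label classifier, and the character-bigram overlap stands in for a pre-trained entity matching model. The second-recall strategies are not modeled.

```python
from typing import Dict, List, Set

# Hypothetical toy vocabulary: semantic label -> standard symptom terms.
STANDARD_TERMS: Dict[str, List[str]] = {
    "pain": ["headache", "abdominal pain"],
    "sleep": ["insomnia", "excessive dreaming"],
}

# Hypothetical keyword cues; a stand-in for a multi-label classification model.
LABEL_CUES: Dict[str, Set[str]] = {
    "pain": {"pain", "ache", "sore"},
    "sleep": {"sleep", "awake", "dream"},
}


def classify_labels(raw: str) -> List[str]:
    """Stage 1 stand-in: return every label whose cue occurs in the text."""
    text = raw.lower()
    return [
        label
        for label, cues in LABEL_CUES.items()
        if any(cue in text for cue in cues)
    ]


def char_bigrams(text: str) -> Set[str]:
    """Set of character bigrams, ignoring case and spaces."""
    s = text.lower().replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}


def match_score(raw: str, candidate: str) -> float:
    """Stage 2 stand-in: Jaccard overlap of character bigrams."""
    a, b = char_bigrams(raw), char_bigrams(candidate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize(raw: str) -> Dict[str, str]:
    """Pick the highest-scoring candidate under each predicted label."""
    result: Dict[str, str] = {}
    for label in classify_labels(raw):
        candidates = STANDARD_TERMS[label]
        result[label] = max(candidates, key=lambda c: match_score(raw, c))
    return result


if __name__ == "__main__":
    print(normalize("headache, many dreams at night"))
```

Because one description may receive several labels, the result is a mapping with one standard term per label, which mirrors the one-to-many case the abstract mentions. A description that triggers no label yields an empty mapping in this sketch.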

CLC number: