Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications ›› 2022, Vol. 45 ›› Issue (4): 14-20.

Traditional Chinese Medicine Symptom Normalization Approach Based on Pre-trained Language Models

  

  • Received:2021-09-01 Revised:2021-11-17 Online:2022-08-28 Published:2022-06-26

Abstract: Symptom normalization plays a vital role in mining traditional Chinese medicine (TCM) knowledge and in promoting the modernization of TCM. The task is difficult because one symptom can have several different literal descriptions, and a single description may correspond to more than one symptom. To address this problem, a two-stage framework based on pre-trained language models is proposed. First, following the definition and classification of symptoms, a multi-label text classification model semantically divides each symptom description to obtain candidate normalized symptom words. Then, a symptom word matching model scores and ranks the candidate words, and the highest-scoring candidate under each semantic label is taken as the normalization result for the symptom description. Finally, several strategies are designed to perform a second recall over the results to improve performance. Results obtained with different pre-trained models are analyzed on a purpose-built symptom normalization dataset. The experiments show that the method and strategies handle symptom normalization effectively; the ERNIE-based model performs best, with an F1 value of 0.894.
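The control flow of the two-stage framework can be illustrated with a minimal Python sketch. It is not the authors' implementation: the vocabulary, the cue words, and the functions `toy_classify`, `toy_score`, and `normalize` are hypothetical stand-ins introduced here. A keyword lookup replaces the multi-label classification model, character-bigram Jaccard similarity replaces the pre-trained matching model, and the second-recall strategies are omitted because the abstract does not specify them.

```python
from typing import Callable, Dict, List

# Hypothetical vocabulary: semantic label -> standard symptom words.
# Illustrative only; not taken from the paper's dataset.
VOCAB: Dict[str, List[str]] = {
    "head": ["headache", "dizziness"],
    "sleep": ["sleeplessness", "sleepiness"],
}

# Hypothetical cue substrings used by the toy classifier below.
CUES: Dict[str, List[str]] = {
    "head": ["head", "dizz"],
    "sleep": ["sleep"],
}


def toy_classify(description: str) -> List[str]:
    """Stage 1 stand-in: return every semantic label whose cue appears.

    The paper uses a multi-label text classifier built on a pre-trained
    language model; this keyword lookup only mimics its output shape.
    """
    return [
        label
        for label, cues in CUES.items()
        if any(cue in description for cue in cues)
    ]


def toy_score(description: str, candidate: str) -> float:
    """Stage 2 stand-in: Jaccard similarity over character bigrams.

    The paper uses a symptom word matching model; this is an assumption
    made purely so the sketch runs without model weights.
    """
    def bigrams(text: str) -> set:
        return {text[i:i + 2] for i in range(len(text) - 1)}

    a, b = bigrams(description), bigrams(candidate)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalize(
    description: str,
    classify: Callable[[str], List[str]] = toy_classify,
    score: Callable[[str, str], float] = toy_score,
    vocab: Dict[str, List[str]] = VOCAB,
) -> Dict[str, str]:
    """Keep the highest-scoring candidate word under each predicted label."""
    result: Dict[str, str] = {}
    for label in classify(description):
        candidates = vocab.get(label, [])
        if candidates:
            result[label] = max(candidates, key=lambda c: score(description, c))
    return result


if __name__ == "__main__":
    # One description that maps to two normalized symptoms (one-to-many).
    print(normalize("headache, sleepless at night"))
```

Passing the classifier and the scorer as parameters keeps the selection logic independent of any particular pre-trained model, so model-backed callables could be substituted without changing `normalize`.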

Key words: Traditional Chinese Medicine, Symptom Normalization, Entity Matching, Semantic Classification, Pre-trained Language Models
