北京邮电大学学报

  • EI core journal

北京邮电大学学报, 2021, Vol. 44, Issue (5): 107-113. doi: 10.13190/j.jbupt.2021-003

• Research report

基于ERNIE-CRF-ESL安全隐患文本结构化解析

艾新波1, 郭彦君2, 谢云昊1, 陈成1   

  1. 北京邮电大学 人工智能学院, 北京 100876;
  2. 北京邮电大学 现代邮政学院, 北京 100876
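  • Received: 2021-02-03; Publication date: 2021-10-28; Release date: 2021-09-06
  • About the author: AI Xin-bo (born 1981), male, associate professor, doctoral supervisor. E-mail: axb@bupt.edu.cn
  • Funding: National Natural Science Foundation of China (61702047)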

Structural Analysis of Hidden Danger Description Text Based on ERNIE-CRF-ESL

AI Xin-bo1, GUO Yan-jun2, XIE Yun-hao1, CHEN Cheng1   

  1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. School of Modern Post, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2021-02-03 Online: 2021-10-28 Published: 2021-09-06

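Abstract: Safety hazard description texts are recorded in natural language and are therefore subjective and arbitrary, so existing sequence labeling models cannot extract the key knowledge they contain. Based on the characteristics of these texts, a sequence labeling scheme suited to safety hazard descriptions is first designed, and the enhanced representation through knowledge integration (ERNIE) model is used to extract word-vector features. On this basis, a conditional random field (CRF) module and an information extraction (ESL) module are fused to build a structured parsing method for production-safety hazard description texts. In experiments on safety hazard description texts from a megacity, the proposed model reaches a precision of 65.1% on the text structured parsing task. It can obtain more knowledge from unstructured urban safety hazard data and thereby help standardize the recording of safety hazard inspections.

Keywords: safety hazard description text, structured parsing, sequence labeling model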

Abstract: Safety hazard description text is recorded in natural language and therefore suffers from subjectivity and arbitrariness. Existing sequence labeling models cannot extract the key knowledge from such descriptions. Based on the characteristics of safety hazard description text, a sequence labeling method is first designed for it, and the enhanced representation through knowledge integration (ERNIE) model is used for word-vector feature extraction. On top of ERNIE, a conditional random field (CRF) module and an information extraction (ESL) module are fused to construct a structured parsing method for safety hazard description text. Experiments are carried out on safety hazard description texts from a megacity. The results show that the proposed model achieves a precision of 65.1% on the text structured parsing task. The proposed method can obtain more knowledge from unstructured urban safety hazard data and thereby help standardize the inspection and recording of safety hazards.

Key words: safety hazard description text, structural analysis, sequence labeling model
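
As an illustration of what "structured parsing" of a labeled sequence can look like, the following minimal Python sketch groups a BIO-tagged character sequence into named fields. It is not the authors' ESL module: the abstract does not specify the tagging scheme, so the BIO convention, the function name `bio_to_fields`, and the labels `OBJ` and `PROB` are all assumptions made for this example.

```python
from collections import defaultdict


def bio_to_fields(tokens, tags):
    """Group BIO-tagged tokens into {label: [span_text, ...]}.

    Hypothetical helper: assumes tags look like "B-X", "I-X" or "O".
    An "I-X" tag that does not continue an open "X" span is ignored.
    """
    fields = defaultdict(list)
    label, span = None, []
    for token, tag in zip(tokens, tags):
        # Case 1: the tag continues the span that is currently open.
        if tag.startswith("I-") and tag[2:] == label:
            span.append(token)
            continue
        # Case 2: anything else closes the open span, if there is one.
        if label is not None:
            fields[label].append("".join(span))
        # A "B-" tag opens a new span; "O" or a stray "I-" opens nothing.
        if tag.startswith("B-"):
            label, span = tag[2:], [token]
        else:
            label, span = None, []
    # Close a span that runs to the end of the sequence.
    if label is not None:
        fields[label].append("".join(span))
    return dict(fields)


# Made-up example: "灭火器过期" ("fire extinguisher expired"), one tag per character.
example = bio_to_fields(
    list("灭火器过期"),
    ["B-OBJ", "I-OBJ", "I-OBJ", "B-PROB", "I-PROB"],
)
print(example)  # {'OBJ': ['灭火器'], 'PROB': ['过期']}
```

Tokens are joined without a separator because the example is tagged per Chinese character; for whitespace-delimited languages a space joiner would be the natural choice.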

CLC number: