Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2019, Vol. 42, Issue (6): 126-133, 141. doi: 10.13190/j.jbupt.2019-202

• Research Report •

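Identifying Unanswerable Questions by Detecting Semantic Discrepancy

LIU Yong-bin (刘咏彬)1, WANG Xiao-jie (王小捷)1, YUAN Cai-xia (袁彩霞)1, YI Lian (易炼)2

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. Alibaba (Beijing) Software Services Co., Ltd., Beijing 100022, China
  • Received: 2019-09-28 Online: 2019-12-28 Published: 2019-11-15
  • About the author: LIU Yong-bin (born 1977), female, lecturer, E-mail: liuyb@bupt.edu.cn.
  • Funding:
    Fundamental Research Funds for the Central Universities (500419302)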

Unanswerable Questions Recognition by Semantic Discrepancy Detection

LIU Yong-bin1, WANG Xiao-jie1, YUAN Cai-xia1, YI Lian2   

  1. School of Telecommunication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. Alibaba (Beijing) Software Services Company Limited, Beijing 100022, China
  • Received: 2019-09-28 Online: 2019-12-28 Published: 2019-11-15

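Abstract (translated from Chinese): In machine reading comprehension, some questions cannot be answered from the given document alone. To handle this case, the semantic conflict detection-based machine reading comprehension network (SCDNet) identifies such questions by detecting semantic discrepancies between the question and the document content. Analysis shows that a document fails to answer a question for two main reasons: either the document does not contain the semantic information the question requires, or the semantic components of the two disagree. It follows that whether a question can be answered from the document can be determined by checking whether the document's semantic information fully covers the information the question requires. In addition, adding an answer-length penalty term to the loss function brings the optimization objective closer to the evaluation metric and improves system performance. The model treats unanswerable-question recognition and answer extraction as two subtasks of a jointly trained model and is trained end to end. Experimental results show that its accuracy in predicting the unanswerable-question class exceeds that of the strong baseline SAN2.0, and that it achieves an F1 score of 72.43 and an unanswerable-question recognition accuracy of 76.96 on the SQuAD2.0 dataset.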

Key words (translated from Chinese): machine reading comprehension, question answering system, unanswerable questions

Abstract: Machine reading comprehension (MRC) with unanswerable questions is a challenging problem in natural language processing. Unlike previous work, which ignores what makes a question answerable or unanswerable, the semantic conflict detection-based MRC network (SCDNet) is proposed to detect no-answer (NA) questions through a semantic conflict detection network. The basic idea is that if a given question is unanswerable, the semantic information it requires is either absent from the reference passage or in conflict with it. Therefore, SCDNet predicts the NA probability by checking whether the passage covers the complete semantics of the question. In addition, to extract the exact answer from the passage, SCDNet applies an answer length penalty in the loss function, which makes the learning objective more consistent with the evaluation metric. SCDNet packs the NA question predictor and the answer extractor into a joint model and is trained end to end. Experiments show that SCDNet performs better than some strong baseline models, achieving an F1 score of 72.43 and an NA accuracy of 76.96 on the SQuAD 2.0 dataset.

Key words: machine reading comprehension, question answering, unanswerable question
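
The abstract states that an answer length penalty brings the training objective closer to the evaluation metric, but it does not give the form of the loss. The sketch below is therefore only an illustration under stated assumptions, not the authors' formulation. `token_f1` is a simplified token-overlap F1 (whitespace tokens only, without the punctuation and article normalization used in the official SQuAD evaluation), and `joint_loss` is a hypothetical joint objective that adds an NA classification term, a span cross-entropy term, and an assumed penalty on the gap between expected and gold span length. The names `token_f1`, `joint_loss`, and the weight `lam` are all invented for this example.

```python
import math
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Simplified token-overlap F1 on lowercased whitespace tokens."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    if not pred or not ref:
        # No-answer case: full credit only when both sides are empty.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def joint_loss(p_start, p_end, p_na, gold_start, gold_end, is_na, lam=0.1):
    """Hypothetical joint objective (an assumption, not the paper's loss).

    p_start, p_end: probability distributions over passage positions.
    p_na: predicted probability that the question is unanswerable.
    lam: assumed weight of the length penalty.
    """
    eps = 1e-12
    # Binary cross-entropy for the no-answer (NA) classifier.
    na_loss = -math.log((p_na if is_na else 1.0 - p_na) + eps)
    if is_na:
        return na_loss  # no span to extract for unanswerable questions
    # Cross-entropy on the gold start and end positions.
    span_loss = -math.log(p_start[gold_start] + eps) - math.log(p_end[gold_end] + eps)
    # Assumed penalty: gap between expected and gold span length.
    exp_start = sum(i * p for i, p in enumerate(p_start))
    exp_end = sum(i * p for i, p in enumerate(p_end))
    length_penalty = abs((exp_end - exp_start) - (gold_end - gold_start))
    return na_loss + span_loss + lam * length_penalty
```

For instance, `token_f1("in 1066", "1066")` is about 0.67 while `token_f1("1066", "1066")` is 1.0: one extra token lowers precision even though the answer is fully contained. This length sensitivity is the kind of effect a length-aware loss term would expose to training.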

CLC number: