Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2014, Vol. 37, Issue (5): 36-40. doi: 10.13190/j.jbupt.2014.05.008


A Named Entity Disambiguation Algorithm Combining Entity Linking and Entity Clustering

TAN Yong-mei, YANG Xue

  1. Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2013-07-11; Online: 2014-10-28; Published: 2014-11-07
  • About the author: TAN Yong-mei (born 1975), female, associate professor. E-mail: ymtan@bupt.edu.cn.
  • Funding: National Natural Science Foundation of China (61273365)

Abstract:

To resolve the ambiguity of named entities in documents, a named entity disambiguation algorithm that combines entity linking with entity clustering is proposed; combining the two methods compensates for the limitations of using either one alone. The algorithm first expands each mention to its full name using the background document, then uses the expanded name to generate a set of candidate entities from the English Wikipedia knowledge base, next extracts a variety of features to rank the candidates, and finally disambiguates by clustering the mentions that have no corresponding entity in the knowledge base. Experimental results show that the algorithm achieves an F-measure of 0.746 on the KBP2011 evaluation data and 0.670 on the KBP2012 evaluation data.
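The abstract outlines a four-step pipeline: expand the mention to a full name from the background document, generate candidates from the knowledge base, rank the candidates with features, and cluster the mentions that have no candidate. The Python sketch below only illustrates that control flow, under heavy assumptions. The toy `KB` dictionary stands in for the English Wikipedia knowledge base. The expansion heuristic (`expand_mention`), the two ranking features and their `WEIGHTS`, and the greedy NIL clustering with its `threshold` are all hypothetical choices, not the features or clustering method used in the paper. Following the abstract's wording, a mention is treated as NIL when its candidate set is empty.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base standing in for English Wikipedia:
# entity title -> (lower-cased aliases, short description text).
KB = {
    "Michael Jordan": ({"michael jordan", "jordan"},
                       "american basketball player chicago bulls nba"),
    "Michael I. Jordan": ({"michael jordan", "jordan"},
                          "machine learning professor berkeley statistics"),
    "Jordan": ({"jordan"}, "country middle east amman"),
}
# Hypothetical weights for a hand-tuned linear ranker (not the paper's features).
WEIGHTS = {"context_cosine": 1.0, "exact_title": 0.5}

def tokens(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two bag-of-words token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand_mention(mention, doc):
    """Hypothetical heuristic: the longest capitalized word sequence in the
    background document that contains the mention or whose initials spell it."""
    best = mention
    for span in re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", doc):
        initials = "".join(w[0] for w in span.split())
        contains = f" {mention} " in f" {span} "
        if (contains or initials == mention) and len(span) > len(best):
            best = span
    return best

def candidates(name):
    """KB titles whose title or alias set matches the (expanded) name."""
    key = name.lower()
    return [t for t, (aliases, _) in KB.items() if key == t.lower() or key in aliases]

def rank(name, doc, cands):
    """Sort candidates by a weighted sum of two example features."""
    doc_toks = tokens(doc)
    def score(title):
        feats = {
            "context_cosine": cosine(doc_toks, tokens(KB[title][1])),
            "exact_title": 1.0 if title.lower() == name.lower() else 0.0,
        }
        return sum(WEIGHTS[f] * v for f, v in feats.items())
    return sorted(cands, key=score, reverse=True)

def cluster_nil(nil_items, threshold=0.2):
    """Greedy NIL clustering: a mention joins the first cluster whose seed has
    the same expanded name and a context cosine >= threshold."""
    clusters = []
    for qid, name, doc in nil_items:
        toks = tokens(doc)
        for cluster in clusters:
            _, seed_name, seed_toks = cluster[0]
            if seed_name.lower() == name.lower() and cosine(toks, seed_toks) >= threshold:
                cluster.append((qid, name, toks))
                break
        else:
            clusters.append([(qid, name, toks)])
    return {qid: f"NIL{i:04d}" for i, cl in enumerate(clusters) for qid, _, _ in cl}

def disambiguate(queries):
    """queries: (query_id, mention, background_doc) triples. Returns
    query_id -> KB title, or a NIL cluster id when no candidate exists."""
    result, nil_items = {}, []
    for qid, mention, doc in queries:
        name = expand_mention(mention, doc)
        cands = candidates(name)
        if cands:
            result[qid] = rank(name, doc, cands)[0]
        else:
            nil_items.append((qid, name, doc))
    result.update(cluster_nil(nil_items))
    return result

queries = [
    ("q1", "Jordan", "Michael Jordan led the Chicago Bulls to another NBA title."),
    ("q2", "Jordan", "Michael Jordan teaches statistics and machine learning at Berkeley."),
    ("q3", "Zorblax", "Zorblax Labs released a new quantum chip."),
    ("q4", "Zorblax", "Engineers at Zorblax Labs tested the quantum chip."),
]
# q1 -> Michael Jordan, q2 -> Michael I. Jordan, q3 and q4 -> NIL0000
print(disambiguate(queries))
```

The ranker is a hand-weighted linear score only for brevity; a learned ranker over a richer feature set could replace `rank` without changing the rest of the flow.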

Key words: named entity disambiguation, entity linking, clustering
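The abstract reports F values without naming the measure. Evaluations that combine linking with NIL clustering commonly use a B-cubed F-measure; the TAC KBP entity-linking tracks used a B-cubed+ variant that additionally requires the linked KB identifier to be correct. The sketch below computes plain B-cubed precision, recall, and F over per-mention cluster labels (a KB id or a NIL cluster id). It is an illustrative assumption, not necessarily the scorer behind the reported 0.746 and 0.670, and the example labels are made up.

```python
from collections import defaultdict


def b_cubed(system, gold):
    """Plain B-cubed precision, recall and F.

    `system` and `gold` map each mention id to a cluster label (a KB id or a
    NIL cluster id). Only co-membership matters, so NIL names need not agree
    between the two. Assumes both dicts cover the same mentions.
    """
    sys_clusters, gold_clusters = defaultdict(set), defaultdict(set)
    for mention, label in system.items():
        sys_clusters[label].add(mention)
    for mention, label in gold.items():
        gold_clusters[label].add(mention)

    precision = recall = 0.0
    for mention in gold:
        s = sys_clusters[system[mention]]   # mentions sharing this system cluster
        g = gold_clusters[gold[mention]]    # mentions sharing this gold cluster
        overlap = len(s & g)
        precision += overlap / len(s)
        recall += overlap / len(g)
    precision /= len(gold)
    recall /= len(gold)
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


gold = {"q1": "E1", "q2": "E1", "q3": "NIL1"}
system = {"q1": "E1", "q2": "E2", "q3": "NIL9"}
print(b_cubed(system, gold))  # precision 1.0, recall ~0.667, F ~0.8
```

Because plain B-cubed only compares cluster membership, it would not penalize linking a whole cluster to the wrong KB entry; the B-cubed+ variant adds that check.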
