Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2009, Vol. 32 ›› Issue (3): 109-112. doi: 10.13190/jbupt.200903.109.liy

• Research Report •

Acquiring Semantically Related Chinese Words from Wikipedia and Analyzing and Computing Their Relatedness

LI Yun, HUANG Kaiyan, REN Fuji, ZHONG Yixin

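  1. Beijing University of Posts and Telecommunications; University of Tokushima, Japan; Beijing University of Posts and Telecommunications
  • Received: 2008-11-07 Revised: 2009-02-10 Publication date: 2009-06-28 Release date: 2009-06-28
  • Corresponding author: LI Yun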

Wikipedia based Semantic Related Chinese Words Exploring and Relatedness Computing

LI Yun; HUANG Kai-yan; Fuji REN; ZHONG Yi-xin

  • Received: 2008-11-07 Revised: 2009-02-10 Online: 2009-06-28 Published: 2009-06-28
  • Contact: LI Yun

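Abstract:

This paper presents a method for acquiring semantically related words from the open encyclopedia Wikipedia and for analyzing and computing their degree of semantic relatedness. We selected and organized more than 50,000 Chinese Wikipedia articles and, using features such as hyperlink relations and word co-occurrence, obtained nearly 400,000 word pairs that have some close semantic relation in terms of concepts or facts, and briefly analyzed their clustering properties. We then computed the degree of semantic relatedness by combining information such as the position and frequency of words in the documents. Together with the results of classic algorithms, we carried out comparative experiments on sets with different degrees of semantic relatedness and analyzed the effectiveness of the proposed method for acquiring semantically related words.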

Abstract:

This paper introduces a method for finding semantically related Chinese word pairs in the open encyclopedia Wikipedia and for analyzing their degree of semantic relatedness. About 50,000 structured documents were collected from Wikipedia pages. Using hyperlinks, text overlaps, and similar features, about 400,000 semantically related word pairs were extracted. We roughly measured semantic relatedness using the position and frequency of words in the documents. Through comparative experiments with several classic algorithms on data sets with different degrees of semantic relatedness, we analyze the reliability of our measures and other properties of the method.

Key words: Wikipedia, semantic relation, semantic relatedness
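
The abstract describes two steps: collecting related word pairs from hyperlinks, and scoring them from position and frequency information. The following is a minimal sketch of how such a pipeline might look, not the authors' actual method, since the abstract gives no formulas. The function names `linked_pairs` and `relatedness`, the link pattern `LINK_RE`, and the particular position and frequency weights are all assumptions made for illustration only.

```python
import re
from typing import List, Tuple

# Matches wiki-style links such as [[target]] or [[target|display text]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def linked_pairs(title: str, wikitext: str) -> List[Tuple[str, str]]:
    """Return one (title, target) pair per distinct link target, in order of appearance."""
    targets: List[str] = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        if target and target != title and target not in targets:
            targets.append(target)
    return [(title, target) for target in targets]


def relatedness(target: str, wikitext: str) -> float:
    """Hypothetical score in [0, 1]: higher for earlier and more frequent mentions.

    The weighting is an assumption for illustration, not taken from the paper.
    """
    # Replace each link with its target so positions refer to plain text.
    plain = LINK_RE.sub(lambda m: m.group(1).strip(), wikitext)
    count = plain.count(target)
    if count == 0:
        return 0.0
    first = plain.find(target)
    position_weight = 1.0 - first / len(plain)   # 1.0 when the term opens the text
    frequency_weight = count / (count + 1)       # saturates toward 1.0
    return position_weight * frequency_weight


if __name__ == "__main__":
    # Toy article text; real input would be a cleaned Wikipedia page.
    text = "[[百科全书]]是工具书。维基百科是网络[[百科全书]],采用[[Wiki]]技术。"
    for page, term in linked_pairs("维基百科", text):
        print(page, term, round(relatedness(term, text), 3))
```

In this sketch a term that is linked early and mentioned repeatedly scores higher than one mentioned once near the end of the page. A real system would also need Chinese word segmentation and redirect handling, which are left out here.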