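Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2013, Vol. 36 ›› Issue (4): 81-84. doi: 10.13190/jbupt.201304.81.wangxw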

• Research Report •

Cross-Lingual Pseudo Relevance Feedback Based on Bilingual Topics

WANG Xu-wen, WANG Xiao-jie, SUN Yue-ping

  1. Center for Intelligence Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2012-11-15 Online: 2013-08-31 Published: 2013-05-22
  • About the authors: WANG Xu-wen (b. 1982), female, Ph.D. candidate, E-mail: xw.y.wang@gmail.com; WANG Xiao-jie (b. 1969), male, professor, doctoral supervisor.
  • Supported by:

    National Natural Science Foundation of China (61273365); National High Technology Research and Development Program of China (2012AA011104)

Abstract:

A cross-lingual pseudo relevance feedback model based on bilingual topics is proposed for the cross-language information retrieval task. The latent Dirichlet allocation (LDA) model is extended to a bilingual topic model that models documents in both languages simultaneously, so that each topic can generate both source-language and target-language terms. Bilingual topics relevant to the query are selected, and the relevant terms they contain are used to refine and expand the query translation, yielding a new query for the secondary retrieval. Experiments show that cross-language retrieval based on this feedback model outperforms other feedback strategies, including those based on monolingual topic models and on the vector space model.

Key words: pseudo relevance feedback, latent Dirichlet allocation, bilingual topics, cross language information retrieval, query expansion
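
The topic-based expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `expand_translation`, the scoring rule (summing source-language term probabilities per topic), the parameters `num_topics` and `num_terms`, and the toy distributions in `TOPICS` are all assumptions made for illustration. The sketch also skips the first-pass retrieval and the training of the bilingual topic model, and simply assumes that each topic already has a source-language and a target-language term distribution.

```python
from collections import defaultdict


def expand_translation(source_query, translation, topics,
                       num_topics=1, num_terms=2):
    """Expand a query translation with terms from relevant bilingual topics.

    topics maps a topic name to {"src": {...}, "tgt": {...}}, where each
    inner dict holds assumed P(term | topic) values for one language.
    """
    # Score each topic by the probability mass it gives to the
    # source-language query terms (an assumed relevance measure).
    scores = {
        name: sum(topic["src"].get(term, 0.0) for term in source_query)
        for name, topic in topics.items()
    }
    best = sorted(scores, key=scores.get, reverse=True)[:num_topics]

    # Weight target-language candidates by topic score times P(term | topic),
    # skipping terms that the translation already contains.
    candidates = defaultdict(float)
    for name in best:
        for term, prob in topics[name]["tgt"].items():
            if term not in translation:
                candidates[term] += scores[name] * prob

    expansion = sorted(candidates, key=candidates.get, reverse=True)[:num_terms]
    return list(translation) + expansion


# Toy, made-up distributions for two bilingual (Chinese/English) topics.
TOPICS = {
    "finance": {
        "src": {"银行": 0.5, "利率": 0.3, "贷款": 0.2},
        "tgt": {"bank": 0.4, "loan": 0.3, "interest": 0.2, "credit": 0.1},
    },
    "nature": {
        "src": {"河流": 0.6, "河岸": 0.4},
        "tgt": {"river": 0.5, "bank": 0.1, "water": 0.3, "forest": 0.1},
    },
}

new_query = expand_translation(["银行", "利率"], ["bank", "interest"], TOPICS)
print(new_query)
```

In this toy setting the source-language query selects the "finance" topic, so the ambiguous translation term "bank" is expanded with finance-related target-language terms rather than river-related ones, which is the kind of refinement the abstract attributes to bilingual topics.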