Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2024, Vol. 47 ›› Issue (2): 123-129.

• Research Report •

A Retrieval Model for Engineering Consulting Reports Combining Semantic and Association Matching

张乐1,杜一凡1,吕学强1,李业龙2,夏雷2   

  1. Beijing Information Science and Technology University
  2. Beijing Engineering Consulting Co., Ltd.
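  • Received: 2023-03-03 Revised: 2023-05-05 Publication date: 2024-04-28 Release date: 2024-01-24
  • Corresponding author: 吕学强 E-mail: lvxueqiang@aliyun.com
  • Funding:
    National Natural Science Foundation of China project "Research on Automatic Value Assessment of Chinese Patents"; Key Project of the State Language Commission "Research on Multimodal Language Public Opinion Monitoring"; Scientific Research Program of the Beijing Municipal Education Commission "Research on Methods for Constructing a Knowledge Graph of Capital Scenic Spots Incorporating Public Opinion and Word of Mouth"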

A Retrieval Model of Engineering Consulting Report Based on Joint Semantic and Association Matching

  • Received: 2023-03-03 Revised: 2023-05-05 Online: 2024-04-28 Published: 2024-01-24

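Abstract (Chinese version): A text retrieval model for engineering consulting reports is proposed. By jointly applying semantic matching and association matching, it retrieves paragraphs for a given title accurately and efficiently, and can effectively assist the writing of engineering consulting reports. First, a contrastive learning model is fine-tuned on a text retrieval corpus of engineering consulting reports, and the result is used to initialize a standard bidirectional encoder representations from transformers (Vanilla BERT) model. Next, the Vanilla BERT model and a linear layer are trained on the corpus text to obtain a semantic matching score. In parallel, sememe-based word vector representations of the text and keyword information are built, and a deep text interaction model produces an association matching score. The semantic and association matching scores are normalized and then fused by weighting to give the final matching score, which completes text retrieval between titles and paragraphs. The proposed model combines contextual vector representation with text interaction matching; compared with the best baseline model, it improves the P@20 metric by 7.49%, effectively enhancing retrieval performance.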
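The fusion step described in the abstract above (normalize the two scores, then combine them with weights) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not say which normalization or weights are used, so min-max normalization, the single weight `alpha`, and the function names `min_max_normalize`, `fuse_scores`, and `rank_paragraphs` are all assumptions made for this example.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] (assumed normalization; a constant list maps to zeros)."""
    if not scores:
        raise ValueError("scores must not be empty")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(
    semantic: Sequence[float],
    association: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Weighted sum of the normalized semantic and association scores.

    alpha is a hypothetical weight on the semantic score; (1 - alpha) goes
    to the association score.
    """
    if len(semantic) != len(association):
        raise ValueError("both score lists must cover the same paragraphs")
    sem = min_max_normalize(semantic)
    assoc = min_max_normalize(association)
    return [alpha * s + (1.0 - alpha) * a for s, a in zip(sem, assoc)]


def rank_paragraphs(
    semantic: Sequence[float],
    association: Sequence[float],
    alpha: float = 0.5,
) -> List[int]:
    """Return paragraph indices ordered from best to worst fused score."""
    fused = fuse_scores(semantic, association, alpha)
    return sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)
```

For three candidate paragraphs with semantic scores `[2.0, 4.0, 6.0]` and association scores `[9.0, 1.0, 5.0]`, `rank_paragraphs` with the default `alpha=0.5` returns `[2, 0, 1]`. Normalizing first keeps one scorer from dominating simply because its raw outputs live on a larger scale.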

Keywords (Chinese version): text retrieval, joint ranking, word vector, character vector, sememe

Abstract: Writing an engineering consulting report requires the writer to collect and read a large number of government policy documents, news reports, and similar material, which leads to high labor costs and long writing cycles. Using text retrieval to match relevant paragraphs automatically and recommend them to writers has therefore become particularly important. This paper proposes a text retrieval model for engineering consulting reports, abbreviated as JSAM, which combines semantic matching and association matching to retrieve paragraphs for a given title accurately and efficiently, and can effectively assist the writing of engineering consulting reports. First, a text retrieval corpus for engineering consulting reports is constructed, and the contrastive learning model SimCSE is fine-tuned on it. The resulting parameters initialize a Vanilla BERT model, and the semantic matching score is obtained by feeding the corpus text into that model. In parallel, the text and keyword information are represented as word-level sememe vectors by the SAT model and fed into the deep text interaction model DRMM to obtain the association matching score. The two scores are normalized and then fused by weighting to obtain the final matching score, which completes text retrieval between titles and paragraphs. By combining contextual vector representation with text interaction matching, JSAM improves P@20 by 4.03 percentage points over the baseline model CEDR-DRMM and effectively enhances text retrieval performance.
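The P@20 metric cited above is precision at a cutoff of 20: the fraction of the top 20 retrieved paragraphs that are relevant. The sketch below shows one common way to compute it and to express a difference between two systems. The function names, the paragraph identifiers, and the convention of always dividing by `k` (even when fewer than `k` results are returned) are assumptions for illustration, not details taken from the paper.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked items that are relevant (divides by k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def percentage_point_gain(p_new: float, p_base: float) -> float:
    """Absolute difference between two precision values, in percentage points."""
    return (p_new - p_base) * 100.0


def relative_gain_percent(p_new: float, p_base: float) -> float:
    """Relative improvement over the baseline, in percent of the baseline value."""
    if p_base == 0:
        raise ValueError("baseline precision must be non-zero")
    return (p_new - p_base) / p_base * 100.0


# Hypothetical ranking of paragraph ids for one title query.
ranking = ["p3", "p1", "p7", "p2"]
relevant = {"p1", "p2", "p9"}
p_at_4 = precision_at_k(ranking, relevant, k=4)  # two of the top four are relevant
```

An absolute gain in percentage points and a relative gain in percent are different quantities for the same pair of scores, so it is worth stating which one a reported improvement refers to.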

Key words: text retrieval, joint ranking, word vector, character vector, sememe
