Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2006, Vol. 29, Issue (s2): 75-78. doi: 10.13190/jbupt.2006s2.75.308

Statistical Acquisition of Domain-Specific Semantic Grammar

LIU Jian-yi (1, 2), WANG Jing-hua (1), WANG Cong (1)

  1. School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China;
  2. Institute of Chinese Information Processing, Beijing Normal University, Beijing 100875, China
  • Received: 2006-09-12  Online: 2006-11-30  Published: 2006-11-30
  • Corresponding author: LIU Jian-yi

Abstract:

A statistical approach is presented for semi-automatically acquiring a semantic grammar from an unannotated, domain-specific corpus. The grammar is produced by an iterative procedure that repeatedly applies temporal clustering and spatial clustering to the words of the corpus: temporal clustering discovers the syntactic structure of language fragments, and spatial clustering discovers their semantic categories. Iterating the two steps yields a rough grammar, which is finally corrected by hand to obtain a semantic grammar for the new domain. Preliminary experimental results show that the method is effective and practical.

Key words: dialog system, semantic grammar, Kullback-Leibler divergence, mutual information
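The abstract names the two clustering steps but not their exact scoring criteria; the key words (K-L divergence, mutual information) hint at how they might be scored. The Python sketch below shows one plausible reading, under stated assumptions: spatial clustering groups the two tokens whose left and right context distributions are closest under a symmetric, additively smoothed K-L distance, and temporal clustering merges the adjacent pair with the highest pointwise mutual information into a phrase. All function names, the hypothetical class and phrase labels (`C0`, `P0`, ...), the smoothing constant `eps`, the `min_count` cutoff and the greedy one-merge-per-step schedule are illustrative assumptions, not the paper's algorithm, and the final hand-editing pass is not modeled.

```python
import math
from collections import Counter, defaultdict

def best_temporal_pair(sentences, min_count=2):
    """Temporal step (assumed criterion): adjacent pair with the highest pointwise MI."""
    uni, bi = Counter(), Counter()
    for s in sentences:
        uni.update(s)
        bi.update(zip(s, s[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    def pmi(pair):
        a, b = pair
        p_ab = bi[pair] / n_bi
        return math.log2(p_ab / ((uni[a] / n_uni) * (uni[b] / n_uni)))
    # Rare pairs are skipped because PMI overrates low-frequency events.
    candidates = [p for p, c in bi.items() if c >= min_count]
    return max(candidates, key=pmi) if candidates else None

def contexts(sentences):
    """Left and right neighbour counts for every token, with sentence boundary markers."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for s in sentences:
        padded = ["<s>"] + s + ["</s>"]
        for i in range(1, len(padded) - 1):
            left[padded[i]][padded[i - 1]] += 1
            right[padded[i]][padded[i + 1]] += 1
    return left, right

def kl(p, q, support, eps=0.01):
    """Additively smoothed K-L divergence D(p || q) over a shared support."""
    tp = sum(p.values()) + eps * len(support)
    tq = sum(q.values()) + eps * len(support)
    total = 0.0
    for w in support:
        pw, qw = (p[w] + eps) / tp, (q[w] + eps) / tq
        total += pw * math.log2(pw / qw)
    return total

def divergence(x, y, left, right):
    """Symmetric K-L distance between two tokens' left and right contexts."""
    total = 0.0
    for ctx in (left, right):
        support = set(ctx[x]) | set(ctx[y])
        total += kl(ctx[x], ctx[y], support) + kl(ctx[y], ctx[x], support)
    return total

def best_spatial_pair(sentences, min_count=2):
    """Spatial step (assumed criterion): the two frequent tokens with the most similar contexts."""
    freq = Counter(t for s in sentences for t in s)
    left, right = contexts(sentences)
    vocab = sorted(t for t, c in freq.items() if c >= min_count)
    pairs = [(x, y) for i, x in enumerate(vocab) for y in vocab[i + 1:]]
    if not pairs:
        return None
    return min(pairs, key=lambda p: divergence(p[0], p[1], left, right))

def merge_adjacent(sentences, pair, label):
    """Rewrite every occurrence of the adjacent pair as a single phrase token."""
    out = []
    for s in sentences:
        new, i = [], 0
        while i < len(s):
            if i + 1 < len(s) and (s[i], s[i + 1]) == pair:
                new.append(label)
                i += 2
            else:
                new.append(s[i])
                i += 1
        out.append(new)
    return out

def relabel(sentences, members, label):
    """Replace every member of a new class with the class label."""
    return [[label if t in members else t for t in s] for s in sentences]

def induce(sentences, rounds=3):
    """Alternate spatial and temporal steps, recording each new symbol as a rule."""
    rules = []
    for k in range(rounds):
        pair = best_spatial_pair(sentences)
        if pair:
            label = f"C{k}"
            rules.append((label, " | ".join(pair)))
            sentences = relabel(sentences, set(pair), label)
        pair = best_temporal_pair(sentences)
        if pair:
            label = f"P{k}"
            rules.append((label, " ".join(pair)))
            sentences = merge_adjacent(sentences, pair, label)
    return rules, sentences

if __name__ == "__main__":
    phrases = ["show flights from beijing to shanghai", "show flights from shanghai to beijing",
               "list flights from beijing to guangzhou", "list flights from guangzhou to beijing"]
    rules, _ = induce([p.split() for p in phrases])
    for lhs, rhs in rules:
        print(f"{lhs} -> {rhs}")
```

Each round of this sketch adds one class rule (`C0 -> beijing | shanghai` style) and one phrase rule (`P0 -> show flights` style); the accumulated rule list stands in for the rough grammar that the abstract says is then corrected by hand. A practical version would more likely merge several pairs per round and stop on a divergence or mutual-information threshold rather than after a fixed number of rounds.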
