Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2008, Vol. 31, Issue (1): 5-8. doi: 10.13190/jbupt.200801.5.tanym

• Papers

Chinese Shallow Parsing with Support Vector Machines

TAN Yong-mei, WANG Xiao-jie, ZHOU Yan-quan, ZHONG Yi-xin

  1. School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
  • Received: 2006-11-01 Revised: 1900-01-01 Online: 2008-02-28 Published: 2008-02-28
  • Contact: TAN Yong-mei

Abstract:

A method for Chinese shallow parsing based on support vector machines (SVMs) is presented, and 10 types of Chinese chunks are defined so that the whole hierarchical phrase structure can be represented. Conventional recognition techniques based on machine learning have difficulty selecting useful features and finding appropriate combinations of the selected features. SVMs, by contrast, automatically focus on the useful features and robustly handle a large feature set, yielding models that generalize well even in very high-dimensional feature spaces. Furthermore, by using kernel functions, SVMs can be trained in a high-dimensional space at a computational cost that does not depend on the dimensionality of that space.

Compared with previous studies, the experiments show how the recognition direction, feature template, kernel function, multi-class method, and their combinations affect the performance of SVM-based Chinese shallow parsing. On the open Chinese TreeBank corpus, precision, recall, and FB1 averaged 95.36%, 97.30%, and 96.32%, respectively.

Key words: support vector machines, shallow parsing, chunk
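
The abstract reports precision, recall, and FB1 together. As a minimal sketch, assuming FB1 denotes the usual F-measure with β = 1 (the harmonic mean of precision and recall), the relationship between the three figures can be checked as follows. The function name `f_beta` and the variable names are illustrative and do not come from the paper; an average of per-class F-scores need not equal the F-score of the averaged precision and recall, so the agreement here is only a consistency check.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    Both inputs must use the same scale (both fractions or both percentages).
    beta = 1 gives the balanced F-measure, assumed here to be the paper's FB1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was recognized correctly
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Average figures quoted in the abstract, in percent.
reported_precision = 95.36
reported_recall = 97.30

fb1 = f_beta(reported_precision, reported_recall)
print(f"FB1 = {fb1:.2f}%")  # prints "FB1 = 96.32%"
```

Under this assumption the computed value agrees with the reported FB1 of 96.32%.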

CLC number: