北京邮电大学学报

  • EI core journal

北京邮电大学学报 ›› 2018, Vol. 41 ›› Issue (1): 88-94. doi: 10.13190/j.jbupt.2017-117

• Paper •

维吾尔语和韩语形态分析之模型构建

徐春1,2,3,4, 蒋同海1,3, 于凯4, 姜文斌2,5   

  1. 中国科学院 新疆理化技术研究所, 乌鲁木齐 830011;
    2. 中国科学院大学, 北京 100049;
    3. 新疆民族语音语言信息处理重点实验室, 乌鲁木齐 830011;
    4. 新疆财经大学 计算机科学与工程学院, 乌鲁木齐 830012;
    5. 中国科学院 计算技术研究所 智能信息重点实验室, 北京 100190
  • 发布日期:2018-01-04
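  • About the authors: XU Chun (born 1977), female, Ph.D. candidate, E-mail: xuchun@mails.ucas.ac.cn; JIANG Tong-hai (born 1963), male, research professor, doctoral supervisor.
  • Funding:
    General Program of the Scientific Research Plan for Universities of Xinjiang Autonomous Region (XJEDU2017M027); Natural Science Foundation of Xinjiang Autonomous Region (2015211B034); Open Project of the Key Laboratory of Xinjiang Autonomous Region (2015KL031); Major Science and Technology Special Project of Xinjiang Autonomous Region (2016A03007-3); National Natural Science Foundation of China (71561025); Key Program of the Scientific Research Plan for Universities of Xinjiang Autonomous Region (XJEDU2016I038); General Program of the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2016D01A060)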

Model Construction of Uygur and Korean Morphological Analysis

XU Chun1,2,3,4, JIANG Tong-hai1,3, YU Kai4, JIANG Wen-bin2,5   

  1. Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, Urumqi 830011, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Xinjiang Key Laboratory of Minority Speech and Language Information Processing, Urumqi 830011, China;
    4. College of Computer Science and Engineering, Xinjiang University of Finance and Economics, Urumqi 830012, China;
    5. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
  • Published: 2018-01-04

摘要: 为维吾尔语和韩语形态分析建立了一种图状结构的判别式模型,该模型将语句的形态分析建模为形态成分的图状结构,通过灵活丰富的特征设计描述了词语内部形态成分之间以及分属相邻词语的形态成分之间的关联约束.相比传统的线性模型,图状模型更好地考虑了各形态成分之间的语言学关联,从而取得更高的整句分析性能.在维吾尔语和韩语上的实验结果表明,图状模型相比线性模型的性能有一定提升,形态分析词级准确率分别提升了4.4%和2.8%.

关键词: 形态分析, 黏着语, 图状模型, 线性模型

Abstract: A graph-structured discriminative model is established for the morphological analysis of Uygur and Korean. The model represents the morphological analysis of a sentence as a graph of morphological components and, through flexible and rich feature design, describes the constraints between morphological components within a word and between components that belong to adjacent words. Compared with the traditional linear model, the graph model takes better account of the linguistic associations among morphological components and thus achieves higher whole-sentence analysis performance. Experimental results on Uygur and Korean show that the graph model outperforms the linear model, with the word-level accuracy of morphological analysis increasing by 4.4 and 2.8 percentage points, respectively.

Key words: morphological analysis, agglutinative language, graph model, linear model
