Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2009, Vol. 32, Issue 5: 36-40. doi: 10.13190/jbupt.200905.36.dongy

• Paper

Application of the Conditional Random Field Model to Prosodic Structure Prediction

DONG Yuan; ZHOU Tao; DONG Cheng-yu; WANG Hai-la

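  1. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876
  2. France Telecom R&D Beijing, Beijing 100190
  • Received: 2009-03-11  Revised: 2009-08-03  Online: 2009-10-28  Published: 2009-10-28
  • Corresponding author: DONG Yuan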

Prosodic Structure Prediction based on Conditional Random Field Model

DONG Yuan¹, ZHOU Tao¹, DONG Cheng-yu², WANG Hai-la²

  • Received:2009-03-11 Revised:2009-08-03 Online:2009-10-28 Published:2009-10-28

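Abstract (translated from the Chinese):

To improve the naturalness of Chinese speech synthesis, the prosodic structure of text is studied and a prosodic structure prediction method based on the conditional random field (CRF) is proposed. Machine-generated word segmentation and tagging features, together with hierarchical prosodic boundary information, are selected from a large-scale manually annotated corpus. The CRF algorithm is then trained on them to produce CRF models for prosodic words and prosodic phrases, which are used to predict prosodic structure. Experimental results show that the F-scores for prosodic words and prosodic phrases reach 90.67% and 80.05%, respectively, which is 3.62% and 5.65% higher than the prosodic structure prediction method based on the maximum entropy (ME) model; precision and recall also improve considerably.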

Keywords (translated from the Chinese): speech synthesis, prosodic structure, conditional random field, machine learning

Abstract:

Prosodic structure prediction is an important component of a Mandarin text-to-speech (TTS) system. A prosodic structure prediction method based on the conditional random field (CRF) algorithm is proposed. The prosodic word model and the prosodic phrase model are trained with CRF on automatically segmented and tagged features and on hierarchical prosodic structure information extracted from a large-scale, manually labeled speech corpus. The approach achieves an F-score of 90.67% in prosodic word prediction and 80.05% in prosodic phrase prediction, which is 3.62% and 5.65% higher, respectively, than the maximum entropy (ME) based method. The experimental results show that the CRF-based method considerably improves prosodic structure prediction and works well in a real Mandarin TTS system.

Key words: text-to-speech, prosodic structure, conditional random field, machine learning
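
The abstract reports precision, recall, and F-score for prosodic boundary prediction but does not spell out the scoring convention. As a minimal sketch, assuming the balanced F-score (harmonic mean of precision and recall) computed over one boundary label on aligned tag sequences, the metric could be calculated as follows. The function name `boundary_prf`, the tags `"B"` (boundary) and `"N"` (no boundary), and the toy sequences are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence, Tuple


def boundary_prf(
    gold: Sequence[str], pred: Sequence[str], boundary: str = "B"
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for one boundary label.

    gold and pred are aligned tag sequences, one tag per word position.
    The tag names are assumptions made for this sketch.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # A true positive is a position where both sequences mark a boundary.
    true_pos = sum(1 for g, p in zip(gold, pred) if g == boundary and p == boundary)
    n_pred = sum(1 for p in pred if p == boundary)
    n_gold = sum(1 for g in gold if g == boundary)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy example with invented tags (not data from the paper).
gold = ["B", "N", "B", "N", "N", "B"]
pred = ["B", "N", "N", "N", "B", "B"]
precision, recall, f_score = boundary_prf(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F={f_score:.4f}")
```

In the toy example two of the three predicted boundaries are correct and two of the three reference boundaries are found, so precision, recall, and F-score all equal 2/3. Under this assumed convention, the same routine would be run separately for the prosodic word level and the prosodic phrase level.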