Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM ›› 2009, Vol. 32 ›› Issue (5): 36-40. doi: 10.13190/jbupt.200905.36.dongy

• Papers

Prosodic Structure Prediction based on Conditional Random Field Model

DONG Yuan1, ZHOU Tao1, DONG Cheng-yu2, WANG Hai-la2

  • Received: 2009-03-11 Revised: 2009-08-03 Online: 2009-10-28 Published: 2009-10-28

Abstract:

Prosodic structure prediction is an important component of a Mandarin text-to-speech (TTS) system. This paper proposes a prosodic structure prediction method based on the conditional random field (CRF) algorithm. The prosodic word model and the prosodic phrase model are trained with CRFs on automatically segmented and tagged features together with hierarchical prosodic structure information extracted from a large-scale, manually labeled speech corpus. The approach achieves an F-score of 90.67% in prosodic word prediction and 80.05% in prosodic phrase prediction, which is 3.62% and 5.65% higher, respectively, than a maximum entropy (ME) based method. Experimental results show that the CRF-based method considerably improves prosodic structure prediction and works well in a real Mandarin TTS system.
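The abstract reports F-scores but does not say how they are computed. The following is a minimal sketch, assuming the usual definition (the harmonic mean of precision and recall) applied to per-token boundary tags. The tag names "B" (a prosodic boundary follows the token) and "N" (no boundary), the function name, and the toy sequences are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def boundary_f_score(gold: Sequence[str], pred: Sequence[str],
                     boundary: str = "B") -> Dict[str, float]:
    """Precision, recall and F-score for one boundary tag.

    Assumption: each token carries exactly one tag, and `boundary`
    marks a prosodic boundary after that token (hypothetical scheme).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # A true positive is a position where both sequences mark a boundary.
    true_pos = sum(1 for g, p in zip(gold, pred)
                   if g == boundary and p == boundary)
    n_pred = sum(1 for p in pred if p == boundary)
    n_gold = sum(1 for g in gold if g == boundary)

    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_score": f_score}


# Toy example with made-up tags (not data from the paper).
gold_tags = ["N", "B", "N", "N", "B", "N", "B"]
pred_tags = ["N", "B", "N", "B", "B", "N", "N"]
scores = boundary_f_score(gold_tags, pred_tags)
print(scores)
```

In this toy case two of the three predicted boundaries are correct and two of the three reference boundaries are found, so precision, recall and F-score all equal 2/3.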

Key words: text-to-speech, prosodic structure, conditional random field, machine learning