Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2023, Vol. 46, Issue 2: 116-121. doi: 10.13190/j.jbupt.2021-322

• Pattern Recognition and Image Processing •

Language Identification Method Based on the Fusion Feature MGCC

WANG Yankai (王延凯)1, LONG Hua (龙华)2, SHAO Yubin (邵玉斌)2, DU Qingzhi (杜庆治)2, WANG Yao (王瑶)2

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology
  2. Kunming University of Science and Technology
  • Received: 2021-12-23 Revised: 2022-04-19 Online: 2023-04-28 Published: 2023-05-14
  • Corresponding author: LONG Hua E-mail: hualong89@gmail.com; longhua@kust.edu.cn
  • Supported by:
    The National Natural Science Foundation of China

Abstract: To address the difficulty of representing language information effectively with a single acoustic feature in noisy environments, a language identification method is proposed that fuses Mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC). First, the MFCC and GFCC of the speech are extracted. The two features are then combined through a matrix space transformation to obtain the fusion feature, mel-scale gammatone cepstral coefficients (MGCC). Finally, the fusion feature is fed into a deep bottleneck network, and the language identification performance of the MGCC feature is tested in 25 different noise environments. Experimental results show that, across noise types and signal-to-noise ratios, the identification accuracy of the proposed method is much higher than that of single acoustic features and of other fusion features. The accuracy reaches 99.56% in clean conditions and remains above 93% at a low signal-to-noise ratio of -5 dB, which demonstrates the effectiveness and robustness of the proposed method.

Key words: language identification, fusion feature, deep bottleneck network, low signal-to-noise ratio, robustness
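
The abstract reports accuracy at specific signal-to-noise ratios such as -5 dB. As an illustration of what such a test condition means, the following is a minimal sketch of mixing a noise recording into clean speech at a target SNR. It is not the authors' procedure, which this page does not describe; the function names `mix_at_snr` and `measured_snr_db`, the synthetic sine "speech", and the Gaussian noise are all assumptions made for the example.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise is silent")
    # SNR_dB = 10*log10(P_speech / (gain^2 * P_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture, given the clean reference signal."""
    p_speech = sum(s * s for s in speech)
    p_resid = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_speech / p_resid)


# Assumed toy data: one second of a 440 Hz tone at 16 kHz plus white noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(speech, noise, -5.0)
print(round(measured_snr_db(speech, noisy), 2))
```

At -5 dB the noise carries roughly three times the power of the speech, which is why the abstract treats that condition as a robustness test.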

CLC Number: