Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2023, Vol. 46, Issue (2): 122-128.

• Pattern Recognition and Image Processing

Language Identification Based on Gammatone-Scale Power-Normalized Coefficients Spectrograms

Hao-Ge ZHANG1, Yu-Bin SHAO1, Hua LONG1, Qing-Zhi DU1, Da-Chun ZHOU2

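  1. Kunming University of Science and Technology
  2. Kunming University of Science and Technology (Chenggong Campus)
  • Received: 2022-03-31 Revised: 2022-06-23 Online: 2023-04-28 Published: 2023-05-14
  • Corresponding author: Yu-Bin SHAO E-mail: shaoyubin999@QQ.com
  • Funding:
    National Natural Science Foundation of China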

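Abstract: To address the low identification rate of language identification in noisy environments, a language identification method based on Gammatone-scale power-normalized coefficient spectrograms is proposed. Gammatone-scale power-normalized coefficients are extracted as features by combining noise suppression in the power domain with the auditory characteristics of the Gammatone filter bank, and are converted into images to obtain feature spectrograms. The dark channel prior algorithm and the automatic color levels algorithm are then applied to enhance and denoise the images, and finally a residual neural network is used for training and identification. Experiments show that at a signal-to-noise ratio of 0 dB, with white noise, in-car (Volvo) noise, pink noise, HF channel noise, babble (restaurant) noise and factory floor noise as the noise sources, the method improves the identification rate by 39.1%, 12.3%, 19.0%, 5.5%, 28.2% and 28.5%, respectively, compared with linear gray-scale spectrograms, and it also yields some improvement at the other signal-to-noise ratios.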

Key words: language identification, auditory features, power normalization, residual neural network
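
The feature-extraction step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 8 kHz sample rate, 256-point frames, 40 bands, the Glasberg-Moore ERB formulas, the closed-form approximation of the 4th-order Gammatone magnitude response, the utterance-level mean power normalization and the 1/15 power-law exponent (borrowed from the original PNCC work of Kim and Stern) are all assumptions, and the medium-time noise suppression and temporal masking stages of full PNCC are omitted. The function names `erb_space`, `power_spectrum`, `gammatone_power_normalized` and `to_image` are hypothetical.

```python
import math


def erb_space(f_low, f_high, n_bands):
    """Centre frequencies equally spaced on the Glasberg-Moore ERB-rate scale."""
    def hz_to_erb(f):
        return 21.4 * math.log10(1 + 0.00437 * f)

    def erb_to_hz(e):
        return (10 ** (e / 21.4) - 1) / 0.00437

    lo, hi = hz_to_erb(f_low), hz_to_erb(f_high)
    return [erb_to_hz(lo + (hi - lo) * i / (n_bands - 1)) for i in range(n_bands)]


def power_spectrum(frame):
    """|DFT|^2 of one frame for bins 0..N/2 (naive O(N^2) DFT, for clarity only)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec


def gammatone_power_normalized(signal, sr, n_fft=256, hop=128, n_bands=40,
                               f_low=100.0, exponent=1 / 15):
    """Frames x bands matrix of power-law-compressed Gammatone band powers."""
    centres = erb_space(f_low, sr / 2, n_bands)
    freqs = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    # Squared magnitude of a 4th-order Gammatone filter, approximated as
    # (1 + ((f - fc) / b)^2)^-4 with bandwidth b = 1.019 * ERB(fc).
    weights = []
    for fc in centres:
        b = 1.019 * 24.7 * (4.37 * fc / 1000 + 1)
        weights.append([(1 + ((f - fc) / b) ** 2) ** -4 for f in freqs])
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n_fft - 1)) for t in range(n_fft)]

    band_power = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + t] * window[t] for t in range(n_fft)]
        spec = power_spectrum(frame)
        band_power.append([sum(w * p for w, p in zip(w_band, spec)) for w_band in weights])
    if not band_power:
        return []  # signal shorter than one frame

    # Mean power normalization over the whole utterance, then the power-law
    # nonlinearity that PNCC uses in place of the logarithm used by MFCC.
    mean_power = sum(map(sum, band_power)) / (len(band_power) * n_bands) or 1.0
    return [[(p / mean_power) ** exponent for p in row] for row in band_power]


def to_image(coeffs):
    """Min-max scale to 0..255 and transpose so rows are bands, lowest band at the bottom."""
    flat = [c for row in coeffs for c in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0
    n_bands = len(coeffs[0])
    return [[round(255 * (row[b] - lo) / span) for row in coeffs]
            for b in reversed(range(n_bands))]


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(800)]
    image = to_image(gammatone_power_normalized(tone, sr))
    print(len(image), len(image[0]))  # 40 bands x 5 frames
```

In the PNCC literature the power-law nonlinearity is motivated by the behavior near zero: a logarithm diverges as band power approaches zero, so small, noise-dominated powers can dominate the feature, while x^(1/15) stays bounded.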
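
The image-enhancement step can be sketched in the same spirit. The code below applies the dark channel prior restoration of He et al., J = (I - A) / max(t, t0) + A with t = 1 - omega * dark(I / A), to a single-channel image with values in [0, 1], and then a percentile-based contrast stretch as one common reading of "automatic color levels". The patch size, `omega = 0.95`, `t0 = 0.1`, the 0.1% brightest-dark-channel rule for the atmospheric light A, the 1% clip fraction and the order of the two operations are assumptions, not parameters reported by the paper. For an RGB spectrogram the dark channel would also take the minimum over color channels, and the stretch would be applied per channel.

```python
def dark_channel(img, patch=3):
    """Local minimum over a patch x patch window (single-channel dark channel)."""
    h, w, r = len(img), len(img[0]), patch // 2
    return [[min(img[y][x]
                 for y in range(max(0, i - r), min(h, i + r + 1))
                 for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)]
            for i in range(h)]


def dehaze(img, patch=3, omega=0.95, t0=0.1, top=0.001):
    """Dark-channel-prior restoration of a 2-D image with values in [0, 1]."""
    h, w = len(img), len(img[0])
    dark = dark_channel(img, patch)
    # Atmospheric light A: brightest input pixel among the `top` fraction of
    # pixels with the highest dark-channel values.
    n_top = max(1, int(h * w * top))
    candidates = sorted(((dark[i][j], i, j) for i in range(h) for j in range(w)),
                        reverse=True)[:n_top]
    a = max(max(img[i][j] for _, i, j in candidates), 1e-6)
    # Transmission estimate t = 1 - omega * dark_channel(I / A), floored at t0.
    norm_dark = dark_channel([[p / a for p in row] for row in img], patch)
    return [[(img[i][j] - a) / max(1 - omega * norm_dark[i][j], t0) + a
             for j in range(w)]
            for i in range(h)]


def auto_levels(img, clip=0.01):
    """Map the `clip` and `1 - clip` quantiles to 0 and 1, clipping values outside."""
    flat = sorted(p for row in img for p in row)
    n = len(flat)
    lo, hi = flat[int(clip * (n - 1))], flat[int((1 - clip) * (n - 1))]
    if hi <= lo:
        return [[0.0] * len(row) for row in img]  # flat image: nothing to stretch
    return [[min(1.0, max(0.0, (p - lo) / (hi - lo))) for p in row] for row in img]


if __name__ == "__main__":
    ramp = [[(10 * i + j) / 99 for j in range(10)] for i in range(10)]
    enhanced = auto_levels(dehaze(ramp))
    print(enhanced[0][:3])
```

Dehazing first and stretching second keeps the output of `dehaze`, which can leave [0, 1], inside a fixed range before the image is passed to the classifier.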
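
The depth and layer configuration of the residual network are not given in this abstract, so the following toy sketch only illustrates the identity shortcut that defines a residual block, y = ReLU(x + F(x)). Real ResNets use convolutions and batch normalization inside the residual branch F; the dense weight matrices `w1` and `w2` and the helper names `relu`, `matvec` and `residual_block` are hypothetical.

```python
def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def matvec(weights, v):
    """Dense matrix-vector product, weights given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = ReLU(x + F(x)) with residual branch F(x) = W2 . ReLU(W1 . x)."""
    branch = matvec(w2, relu(matvec(w1, x)))
    # Identity shortcut: the input is added back unchanged, so the block only
    # has to learn the residual F(x) rather than the full mapping.
    return relu([a + b for a, b in zip(x, branch)])


if __name__ == "__main__":
    x = [1.0, -2.0]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(residual_block(x, zeros, zeros))        # [1.0, 0.0]
    print(residual_block(x, identity, identity))  # [2.0, 0.0]
```

With all-zero weights the block reduces to ReLU(x), which is why stacking residual blocks does not, in principle, make the network harder to optimize than a shallower one.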
