Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications, 2023, Vol. 46, Issue (1): 38-43.

• Paper

Language Recognition Based on Log Gammatone-Scale Filter Bank Energies Spectrograms

ZHANG Haoge, SHAO Yubin, LONG Hua, PENG Yi, ZHOU Dachun

  1.  Kunming University of Science and Technology

  • Received: 2021-12-29 Revised: 2022-02-21 Online: 2023-02-28 Published: 2023-02-22
  • Contact: SHAO Yubin E-mail: shaoyubin999@qq.com
  • Supported by:
    National Natural Science Foundation of China

Abstract: To address the low recognition rate of language recognition in noisy environments, a language recognition method based on log Gammatone-scale filter bank energies spectrograms is proposed. First, log Gammatone-scale filter bank energies features are extracted based on the auditory characteristics of the Gammatone filter bank, and the features are converted into images to obtain feature spectrograms. Then, the dark channel prior algorithm is applied to enhance and denoise the feature spectrograms. Finally, a residual neural network model is used for training and recognition. Experimental results show that at a signal-to-noise ratio of 0 dB, with white noise, in-car (Volvo) noise, and pink noise as the noise sources, the recognition rate of the proposed method improves by 32.7%, 10.1%, and 29.1%, respectively, relative to the linear gray-scale spectrogram, and the recognition rate at other signal-to-noise ratios also improves.

Key words: language recognition, auditory features, Gammatone filters, residual neural network
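
The front end summarized in the abstract (Gammatone filter bank, log band energies, conversion to an image, dark channel computation) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the 4th-order filter with bandwidth factor 1.019, the Glasberg-Moore ERB formulas, the 64 bands, the frame sizes, and the 3x3 patch are all assumptions chosen for illustration, and the abstract does not state any of these settings. Only the dark channel itself is computed here; the remaining steps of a dark-channel-prior enhancement and the residual network are omitted.

```python
import numpy as np


def hz_to_erb_rate(f):
    """Glasberg-Moore ERB-rate scale (frequency in Hz). Assumed scale."""
    return 21.4 * np.log10(1.0 + 0.00437 * f)


def erb_space(f_lo, f_hi, n):
    """n centre frequencies (Hz) equally spaced on the ERB-rate scale."""
    e = np.linspace(hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi), n)
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def gammatone_ir(fc, sr, dur=0.032, order=4, b=1.019):
    """Truncated Gammatone impulse response, scaled to unit energy."""
    t = np.arange(int(dur * sr)) / sr
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth, Hz
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))


def log_gammatone_energies(x, sr, n_filters=64, frame_len=400, hop=160, f_lo=50.0):
    """Return an (n_filters, n_frames) array of log frame energies per band.

    Requires len(x) >= frame_len. All default values are illustrative.
    """
    x = np.asarray(x, dtype=float)
    fcs = erb_space(f_lo, 0.9 * sr / 2, n_filters)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i of idx holds the sample indices of frame i.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    feats = np.empty((n_filters, n_frames))
    for k, fc in enumerate(fcs):
        y = np.convolve(x, gammatone_ir(fc, sr))[: len(x)]  # causal band-pass output
        feats[k] = np.log(np.sum(y[idx] ** 2, axis=1) + 1e-10)
    return feats


def to_gray_image(feats):
    """Min-max scale a feature matrix to an 8-bit gray-scale image."""
    lo, hi = feats.min(), feats.max()
    return np.round(255.0 * (feats - lo) / (hi - lo + 1e-12)).astype(np.uint8)


def dark_channel(img, patch=3):
    """Minimum over a patch x patch window (patch odd), and over colour
    channels if the image has a third axis."""
    img = np.asarray(img, dtype=float)
    if img.ndim == 3:
        img = img.min(axis=2)
    r = patch // 2
    padded = np.pad(img, r, mode="edge")
    h, w = img.shape
    out = np.full((h, w), np.inf)
    for dy in range(patch):
        for dx in range(patch):
            out = np.minimum(out, padded[dy:dy + h, dx:dx + w])
    return out
```

For a mono waveform `x` sampled at `sr` Hz, `to_gray_image(log_gammatone_energies(x, sr))` yields an image with one row per Gammatone band and one column per frame, which could then be passed to an enhancement step and an image classifier.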

CLC number: