Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Phoneme Recognition Based on B-Wave-U-Net Feature Enhancement at Low Signal-to-Noise Ratio

Huang Huibo, Shao Yubin, Long Hua, Du Qingzhi

  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology
  • Received: 2023-11-07 Revised: 2024-01-13 Published: 2024-07-18
  • Corresponding author: Huang Huibo
  • Funding:
    Yunnan Provincial Key Laboratory of Media Convergence project

Abstract: To address the low accuracy of phoneme recognition at low signal-to-noise ratio (SNR), a phoneme recognition method based on B-Wave-U-Net feature enhancement is proposed. First, a bidirectional long short-term memory (BLSTM) module is integrated at the start of the Wave-U-Net encoder; a branch information flow is drawn from it and skip-connected to the end of the decoder, and a fully connected layer is added, forming the B-Wave-U-Net network. Next, B-Wave-U-Net performs image enhancement and denoising on the log spectrogram extracted from the speech signal, yielding a new feature spectrogram. Finally, the new feature spectrogram is fed into a Mel filter bank to obtain the Fbank features. Phoneme recognition was tested on the THCHS30 dataset with a ResNet-BLSTM-CTC model at an SNR of 0 dB with white noise as the noise source. The results show that, compared with the CRN, GCRN, DCCRN, and GDCRN networks, the proposed B-Wave-U-Net feature enhancement network reduces the phoneme error rate by 2.5%, 2.1%, 1.6%, and 0.9%, respectively. The phoneme error rate also decreases to some extent at other SNRs.
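The last stage of the pipeline described in the abstract, passing an enhanced spectrogram through a Mel filter bank to obtain Fbank features, can be illustrated with a minimal standard-library sketch. Everything here is an assumption for illustration rather than the authors' implementation: the function names, the HTK-style Mel formula, the use of a natural-log power spectrogram, and in particular `enhance`, which is a hypothetical identity placeholder standing in for B-Wave-U-Net.

```python
import math


def hz_to_mel(hz):
    """Convert Hz to the Mel scale (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_mels, n_fft, sample_rate):
    """Build n_mels triangular filters over the n_fft // 2 + 1 FFT bins."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge points, equally spaced on the Mel scale from 0 Hz to Nyquist.
    mel_max = hz_to_mel(sample_rate / 2.0)
    hz_points = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = hz_points[m - 1], hz_points[m], hz_points[m + 1]
        row = []
        for f in bin_hz:
            rising = (f - left) / (center - left)
            falling = (right - f) / (right - center)
            row.append(max(0.0, min(rising, falling)))
        bank.append(row)
    return bank


def enhance(log_spec):
    """Hypothetical placeholder for B-Wave-U-Net: returns its input unchanged."""
    return log_spec


def fbank_from_log_spectrogram(log_spec, n_mels, n_fft, sample_rate, eps=1e-10):
    """Map a log-power spectrogram (frames x bins) to log Mel filter-bank features."""
    bank = mel_filter_bank(n_mels, n_fft, sample_rate)
    features = []
    for frame in enhance(log_spec):
        power = [math.exp(v) for v in frame]  # undo the log (natural log assumed)
        features.append([
            math.log(sum(w * p for w, p in zip(row, power)) + eps)
            for row in bank
        ])
    return features


# Usage: three frames of a flat spectrum, 512-point FFT at 16 kHz, 40 Mel bands.
flat = [[0.0] * 257 for _ in range(3)]
fbank = fbank_from_log_spectrogram(flat, n_mels=40, n_fft=512, sample_rate=16000)
```

In a real system the placeholder would be replaced by the trained enhancement network, and the frame length, number of Mel bands, and log base would follow whatever the recognizer expects; none of those values are given in the abstract.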

Key words: phoneme recognition, Fbank, log spectrogram, Wave-U-Net, BLSTM
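The evaluation setup mentioned in the abstract (white noise at 0 dB SNR, scored by phoneme error rate) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the noise is Gaussian white noise scaled against the measured signal power, the phoneme error rate is taken to be the common definition (total edit distance divided by total reference length), and the phoneme labels in the usage lines are made up for the example.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Add Gaussian white noise scaled so the mixture has the requested SNR in dB."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for the noise gain.
    gain = math.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(signal, noise)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Total edit distance divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


# Usage: a 440 Hz tone at 16 kHz mixed with white noise at 0 dB SNR,
# and one reference/hypothesis pair with a single substituted phoneme.
clean = [math.sin(2.0 * math.pi * 440.0 * t / 16000.0) for t in range(16000)]
noisy = add_white_noise(clean, snr_db=0.0)
per = phoneme_error_rate([["n", "i3", "h", "ao3"]], [["n", "i2", "h", "ao3"]])
```

At 0 dB the scaled noise carries the same average power as the speech, which is why this condition is a demanding test for the recognizer.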

CLC number: