Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal


Phoneme Recognition Based on B-Wave-U-Net Feature Enhancement at Low Signal-to-Noise Ratio

  

  • Received: 2023-11-07 Revised: 2024-01-13 Published: 2024-07-18

Abstract: To address the problem of low phoneme recognition accuracy at low signal-to-noise ratio (SNR), a phoneme recognition method based on B-Wave-U-Net feature enhancement is proposed. First, a bidirectional long short-term memory (BLSTM) module is integrated at the start of the Wave-U-Net encoder; an information flow derived from it is skip-connected to the end of the decoder, and a fully connected layer is then appended, forming the B-Wave-U-Net network. Next, the log spectrogram extracted from the speech signal is treated as an image and enhanced and denoised by the B-Wave-U-Net, yielding a new feature spectrogram, which is finally passed through a Mel filter bank to obtain the Fbank features. With white noise as the noise source at an SNR of 0 dB, phoneme recognition was tested on the THCHS30 dataset using a ResNet-BLSTM-CTC model. The results show that the proposed B-Wave-U-Net feature enhancement network reduces the phoneme error rate by 2.5%, 2.1%, 1.6%, and 0.9% compared with the CRN, GCRN, DCCRN, and GDCRN networks, respectively. The phoneme error rate is also reduced to some extent under other SNR conditions.
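The feature pipeline described in the abstract (noisy speech, log spectrogram, enhancement, Mel filter bank, Fbank) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The 16 kHz sample rate, 512-point FFT, 160-sample hop, and 40 Mel filters are assumed values. The `enhance` function is a hypothetical identity placeholder standing in for the trained B-Wave-U-Net. Converting the enhanced log spectrogram back to power before Mel filtering is also an assumption.

```python
import numpy as np

def add_white_noise(speech, snr_db, rng):
    """Mix white Gaussian noise into `speech` at the requested SNR in dB."""
    noise = rng.standard_normal(speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = scale * noise
    return speech + scaled_noise, scaled_noise

def log_power_spectrogram(signal, n_fft=512, hop=160):
    """Hann-windowed log power spectrogram, shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10 ** (m / 2595.0) - 1.0)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising slope
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            fb[m - 1, k] = (right - k) / (right - center)
    return fb

def enhance(log_spec):
    """Hypothetical placeholder for the B-Wave-U-Net; returns its input unchanged."""
    return log_spec

def fbank_from_log_spec(log_spec, fb):
    """Apply the Mel filter bank to an (enhanced) log power spectrogram."""
    power = np.exp(log_spec)               # assumption: filter in the power domain
    return np.log(power @ fb.T + 1e-10)
```

To use the sketch, generate a noisy signal with `add_white_noise(speech, 0.0, np.random.default_rng(0))`, pass it through `log_power_spectrogram`, `enhance`, and `fbank_from_log_spec` in that order, and feed the resulting frame-by-filter matrix to a recognizer. In a real system, `enhance` would be replaced by a forward pass through the trained network.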

Key words: Phoneme Recognition, Fbank, Log Spectrogram, Wave-U-Net, BLSTM

CLC Number: