Journal of Beijing University of Posts and Telecommunications
Received:
Revised:
Published:
Abstract: To address the low phoneme recognition accuracy observed at low signal-to-noise ratio (SNR), a phoneme recognition method based on B-Wave-U-Net feature enhancement is proposed. First, a Bidirectional Long Short-Term Memory (BLSTM) module is integrated at the input end of the Wave-U-Net encoder; the information flow derived from it is skip-connected to the end of the decoder, and a fully connected layer is then added, forming the B-Wave-U-Net network. Next, the log spectrogram extracted from the speech signal is treated as an image and enhanced and denoised by B-Wave-U-Net, yielding a new feature spectrogram, which is finally fed into a Mel filter bank to obtain the Fbank feature output. Phoneme recognition was tested on the THCHS30 dataset with a ResNet-BLSTM-CTC model, at an SNR of 0 dB with white noise as the noise source. The results show that the proposed B-Wave-U-Net feature enhancement network reduces the phoneme error rate by 2.5%, 2.1%, 1.6%, and 0.9% compared with the CRN, GCRN, DCCRN, and GDCRN networks, respectively. The phoneme error rate is also reduced to some degree under other SNR conditions.
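The feature path described in the abstract (log spectrogram, enhancement, Mel filter bank, Fbank) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the 16 kHz sample rate, the 512-point FFT, the 160-sample hop, the 40 Mel bands, and the choice to convert the log spectrogram back to power before Mel filtering. The `enhance` function is a hypothetical identity placeholder standing in for the trained B-Wave-U-Net, whose layers are not specified in the abstract.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_mels, n_fft, sample_rate):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_spectrogram(signal, n_fft=512, hop=160, eps=1e-10):
    """Log power spectrogram, shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + eps)

def enhance(log_spec):
    """Hypothetical placeholder for the B-Wave-U-Net enhancer (identity here)."""
    return log_spec

def fbank_from_log_spec(log_spec, fb, eps=1e-10):
    """Assumption: undo the log, apply the Mel filters, then take the log again."""
    power = np.exp(log_spec)
    return np.log(power @ fb.T + eps)

# Synthetic test signal: a 440 Hz tone mixed with white noise at 0 dB SNR.
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2.0 * np.pi * 440.0 * t)
noise = rng.standard_normal(sample_rate)
noise *= np.sqrt(np.mean(clean ** 2) / np.mean(noise ** 2))  # equal power -> 0 dB
noisy = clean + noise
snr_db = 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

log_spec = log_spectrogram(noisy)
fb = mel_filter_bank(n_mels=40, n_fft=512, sample_rate=sample_rate)
fbank = fbank_from_log_spec(enhance(log_spec), fb)
print(fbank.shape)
```

In a real system, `enhance` would be replaced by a forward pass through the trained network, and the resulting Fbank features would be passed to the recognizer (ResNet-BLSTM-CTC in the paper).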
Key words: Phoneme Recognition, Fbank, Log Spectrogram, Wave-U-Net, BLSTM
CLC Number: TN912.3
URL: https://journal.bupt.edu.cn/EN/