Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications, 2025, Vol. 48, Issue (1): 100-106.

Phoneme Recognition Based on B-Wave-U-Net Feature Enhancement at Low Signal-to-Noise Ratio

HUANG Huibo, SHAO Yubin, LONG Hua, DU Qingzhi

  • Received: 2023-11-07 Revised: 2024-01-13 Online: 2025-02-26 Published: 2025-02-25

Abstract: To address the low accuracy of phoneme recognition at low signal-to-noise ratios (SNR), a phoneme recognition method based on B-Wave-U-Net feature enhancement is proposed. First, a bidirectional long short-term memory (BLSTM) network is integrated at the start of the Wave-U-Net encoder, and the information flow extracted there is passed to the decoder side through a skip connection. A fully connected layer is then inserted to form the B-Wave-U-Net network. Next, the speech spectrogram is enhanced and denoised using the B-Wave-U-Net. Finally, Mel filtering is applied to extract log Mel-scale filter bank energy features. Phoneme recognition tests are conducted at 0 dB SNR with a white noise source, using the THCHS30 dataset and the ResNet-BLSTM-CTC model. Experimental results show that the proposed B-Wave-U-Net outperforms the baseline network, reducing the phoneme error rate by 0.9% to 2.5%. This demonstrates the significant advantage of the B-Wave-U-Net in robust feature extraction for phoneme recognition under noisy conditions.
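
As an illustration of the test condition and the final feature step described in the abstract, the following is a minimal NumPy sketch that mixes white Gaussian noise into a signal at a target SNR of 0 dB and then computes log Mel-scale filter bank energies. It is not the authors' implementation. The B-Wave-U-Net enhancement stage is omitted entirely, and the function names, the 16 kHz sample rate, the 512-point FFT, the 160-sample hop, the 40 Mel filters, and the synthetic 440 Hz tone standing in for speech are all assumptions chosen for the example.

```python
import numpy as np


def add_white_noise(signal, snr_db, rng):
    """Return (noisy signal, scaled noise) with white Gaussian noise at snr_db."""
    signal_power = np.mean(signal ** 2)
    noise = rng.standard_normal(signal.shape)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that 10*log10(signal_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    scaled_noise = scale * noise
    return signal + scaled_noise, scaled_noise


def hz_to_mel(freq_hz):
    return 2595.0 * np.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_mel_energies(signal, sample_rate, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take the power spectrum, apply Mel filters, take the log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filter_bank(n_mels, n_fft, sample_rate).T
    # A small floor avoids log(0) for silent frames.
    return np.log(mel_energy + 1e-10)


# Usage: a synthetic tone stands in for speech (assumption for illustration only).
rng = np.random.default_rng(0)
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440.0 * t)
noisy, noise = add_white_noise(clean, 0.0, rng)
features = log_mel_energies(noisy, sample_rate)
measured_snr_db = 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))
```

In the method summarized above, the enhancement network would sit between the noisy spectrogram and the Mel filtering step. Here the noisy spectrum is passed straight to the filter bank, so `features` corresponds to unenhanced features.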

Key words: phoneme recognition, log Mel-scale filter bank energies, Wave-U-Net, bidirectional long short-term memory
