Phoneme Recognition Based on B-Wave-U-Net Feature Enhancement at Low Signal-to-Noise Ratio

Journal of Beijing University of Posts and Telecommunications, 2025, Vol. 48, Issue (1): 100-106.

HUANG Huibo, SHAO Yubin, LONG Hua, DU Qingzhi

Received: 2023-11-07  Revised: 2024-01-13  Online: 2025-02-26  Published: 2025-02-25

Abstract

To address the low accuracy of phoneme recognition at low signal-to-noise ratios (SNR), a phoneme recognition method based on B-Wave-U-Net feature enhancement is proposed. First, a bidirectional long short-term memory (BLSTM) network is integrated at the start of the Wave-U-Net encoder; the information flow extracted there is passed through a skip connection to the decoder side and fed into a fully connected layer, forming the B-Wave-U-Net. Next, the speech spectrogram is enhanced and denoised with the B-Wave-U-Net. Finally, Mel filtering is applied to extract log Mel-scale filter bank energy features. Phoneme recognition tests are conducted at 0 dB SNR with a white noise source, using the THCHS30 dataset and the ResNet-BLSTM-CTC model. Experimental results show that the proposed B-Wave-U-Net outperforms the baseline network, reducing the phoneme error rate by 0.9% to 2.5%. This demonstrates the significant advantage of the B-Wave-U-Net in robust feature extraction for phoneme recognition under noisy conditions.
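
The test condition and the final feature step described above can be illustrated with a minimal NumPy sketch. It mixes white Gaussian noise into a signal at 0 dB SNR and then computes log Mel-scale filter bank energies. It is not the authors' implementation: the B-Wave-U-Net enhancement stage is omitted, and the 16 kHz sample rate, 512-point FFT, 160-sample hop, 40 Mel filters, the synthetic sine "speech" and all function names are assumptions chosen for illustration only.

```python
import numpy as np

def add_white_noise(speech, snr_db, rng):
    """Mix white Gaussian noise into `speech` at the requested SNR in dB."""
    noise = rng.standard_normal(speech.shape)
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so 10*log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the Mel scale, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_energies(signal, sample_rate=16000, n_fft=512, hop=160, n_mels=40):
    """Frame the signal, take the power spectrum, apply Mel filters, then log."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # In the described method the spectrogram would be enhanced before this step.
    mel_energy = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    return np.log(mel_energy + 1e-10)  # small floor avoids log(0)

# Usage: a one-second 440 Hz tone stands in for clean speech (assumption).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 440 * t)
noisy = add_white_noise(clean, 0.0, rng)
features = log_mel_energies(noisy)  # one row per frame, one column per Mel filter
```

At 0 dB the scaled noise carries the same average power as the signal, which is why this condition is a demanding test for feature extraction.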

Key words: phoneme recognition, log Mel-scale filter bank energies, Wave-U-Net, bidirectional long short-term memory

CLC Number: TN912.3

HUANG Huibo, SHAO Yubin, LONG Hua, DU Qingzhi. Phoneme Recognition Based on B-Wave-U-Net Feature Enhancement at Low Signal-to-Noise Ratio[J]. Journal of Beijing University of Posts and Telecommunications, 2025, 48(1): 100-106.
