Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2025, Vol. 48, Issue (1): 100-106.

• Research Report

Phoneme Recognition Based on B-Wave-U-Net Feature Enhancement at Low Signal-to-Noise Ratio

HUANG Huibo, SHAO Yubin, LONG Hua, DU Qingzhi

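  1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology
  • Received: 2023-11-07  Revised: 2024-01-13  Online: 2025-02-26  Published: 2025-02-25
  • Corresponding author: SHAO Yubin  E-mail: 3319606561@qq.com
  • Funding:
    Project of the Yunnan Key Laboratory of Media Convergence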

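Abstract: To address the low accuracy of phoneme recognition at low signal-to-noise ratio (SNR), a phoneme recognition method based on B-Wave-U-Net feature enhancement is proposed. First, a bidirectional long short-term memory (BLSTM) network is integrated at the input end of the Wave-U-Net encoder. A branch information flow is drawn from it and passed through a skip connection to the output end of the decoder, where a fully connected layer is added, yielding B-Wave-U-Net. Next, B-Wave-U-Net is used to enhance and denoise the spectrogram. Finally, Mel filtering is applied to obtain log Mel-scale filter bank energy features. Phoneme recognition tests are conducted on the THCHS30 dataset with a ResNet-BLSTM-CTC model, at an SNR of 0 dB with white noise as the noise source. The results show that the proposed B-Wave-U-Net outperforms the comparison networks, reducing the phoneme error rate by 0.9% to 2.5%. This confirms that B-Wave-U-Net offers a clear advantage in noise-robust feature extraction for phoneme recognition.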
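
The final step of the pipeline applies Mel filtering to the enhanced spectrogram and then takes a logarithm. It can be sketched in NumPy as below. This is a minimal illustration, not the authors' implementation. The following choices are assumptions for the example: the HTK-style Mel formula, 40 filters, a 512-point FFT and a 16 kHz sampling rate. Random frames stand in for a spectrogram that B-Wave-U-Net has already denoised.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (assumption: the paper does not state which variant it uses).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular Mel filters with shape (n_mels, n_fft // 2 + 1)."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_mels + 2 points, equally spaced on the Mel scale, give each filter's
    # left edge, centre and right edge.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, centre, right = hz_points[i:i + 3]
        rising = (bin_freqs - left) / (centre - left)
        falling = (right - bin_freqs) / (right - centre)
        # Triangle: rises to 1 at the centre, zero outside [left, right].
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_energies(power_spec, fb, eps=1e-10):
    """(frames, n_fft // 2 + 1) power spectrogram -> (frames, n_mels) log energies."""
    return np.log(power_spec @ fb.T + eps)


# Usage: random windowed frames stand in for an enhanced spectrogram (hypothetical data).
rng = np.random.default_rng(0)
sample_rate, n_fft, n_mels = 16000, 512, 40
frames = rng.standard_normal((10, n_fft)) * np.hanning(n_fft)
power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
fb = mel_filterbank(n_mels, n_fft, sample_rate)
feats = log_mel_energies(power, fb)
print(feats.shape)  # (10, 40)
```

The logarithm is taken after the filters sum the spectral power, which is what makes these filter bank energies rather than a log spectrogram. The small eps only guards against log(0) in silent frames.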

Key words: phoneme recognition, log Mel-scale filter bank energies, log spectrogram, Wave-U-Net, bidirectional long short-term memory

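
The test condition (white noise at 0 dB SNR) and the reported metric (phoneme error rate, PER) can be illustrated with the short sketch below. Everything in it is hypothetical:

- White noise is modelled as i.i.d. Gaussian samples.
- A 440 Hz tone replaces real THCHS30 speech.
- The phoneme strings are invented.
- PER uses the common corpus-level definition: total edit distance divided by the number of reference phonemes.

The paper's exact protocol may differ.

```python
import numpy as np


def add_white_noise(clean, snr_db, rng):
    """Mix white noise into `clean` at the requested SNR in dB.

    White noise is modelled as i.i.d. Gaussian samples (an assumption).
    Returns (noisy_signal, scaled_noise).
    """
    noise = rng.standard_normal(clean.shape)
    p_signal = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the gain so that 10*log10(p_signal / p_scaled_noise) == snr_db.
    gain = np.sqrt(p_signal / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return clean + scaled_noise, scaled_noise


def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion
                          curr[j - 1] + 1,         # insertion
                          prev[j - 1] + (r != h))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Corpus-level PER: total edits divided by total reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


# Usage: a 1 s, 440 Hz tone at 16 kHz stands in for a THCHS30 utterance.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy, noise = add_white_noise(clean, 0.0, rng)
snr = 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))
print(f"measured SNR: {snr:.3f} dB")

# Hypothetical phoneme labels: one substitution out of four phonemes.
refs = [["n", "i3", "h", "ao3"]]
hyps = [["n", "i3", "h", "ao2"]]
print(phoneme_error_rate(refs, hyps))  # 0.25
```

The noise is scaled by the measured signal power rather than given a fixed variance. This pins the mixture to the target SNR regardless of how loud the clean utterance is.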
