Environmental Sound Classification Based on Compact Bilinear Attention Network

Journal of Beijing University of Posts and Telecommunications, 2023, Vol. 46, Issue (6): 102-0.

Dong Shaojiang, Xia Zhengfu, Cai Weiwei

1. Chongqing Jiaotong University
2. Continental Automotive Research and Development (Chongqing) Co., Ltd.

Received: 2022-08-08 Revised: 2023-02-22 Online: 2023-12-28 Published: 2023-12-29
Corresponding author: Dong Shaojiang, E-mail: dongshaojiang100@163.com

Abstract

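Summary: Differences between local regions make environmental sounds difficult to classify accurately. To address this, an environmental sound classification method based on a compact bilinear attention network is proposed. First, multi-dimensional time-frequency features are introduced to fully characterize environmental sounds. Second, random erasing, an online data augmentation method, is introduced to avoid the model overfitting caused by insufficient data and to increase sample diversity. Finally, with the compact bilinear network framework left unchanged, a densely connected network (DenseNet-169) is used as the feature extraction module, and a channel-spatial-position attention module is introduced to attend to the differences between local regions of environmental sound features. Experimental results show that the accuracy of the proposed method on both the ESC-10 and ESC-50 datasets exceeds the recognition accuracy of the human ear.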
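The random erasing step can be illustrated with a minimal sketch. It assumes the augmentation follows the common image-style formulation: with some probability, one randomly sized and positioned rectangle of the time-frequency input is overwritten with noise. The function name `random_erase`, its parameters, and their default values are hypothetical choices for illustration; the paper's actual settings are not given in this abstract.

```python
import math
import random


def random_erase(spec, rng, p=0.5, area_range=(0.02, 0.2),
                 aspect_range=(0.3, 3.3), max_tries=10):
    """Return a copy of `spec` (list of rows, frequency x time) in which,
    with probability `p`, one random rectangle is overwritten by noise.

    All default values are illustrative assumptions, not the paper's.
    """
    out = [row[:] for row in spec]          # never modify the input
    if rng.random() >= p:
        return out                          # sample left unchanged
    n_freq, n_time = len(out), len(out[0])
    lo = min(min(row) for row in spec)      # noise range matches the data range
    hi = max(max(row) for row in spec)
    for _ in range(max_tries):
        # Sample the rectangle area as a fraction of the whole input and
        # the aspect ratio (height / width) log-uniformly.
        area = rng.uniform(*area_range) * n_freq * n_time
        aspect = math.exp(rng.uniform(math.log(aspect_range[0]),
                                      math.log(aspect_range[1])))
        h = round(math.sqrt(area * aspect))
        w = round(math.sqrt(area / aspect))
        if 0 < h < n_freq and 0 < w < n_time:
            top = rng.randrange(n_freq - h + 1)
            left = rng.randrange(n_time - w + 1)
            for i in range(top, top + h):
                for j in range(left, left + w):
                    out[i][j] = rng.uniform(lo, hi)
            return out
    return out                              # no rectangle fitted; unchanged


# Usage: erase a region of a toy 64 x 128 "spectrogram".
rng = random.Random(0)
spec = [[rng.gauss(0.0, 1.0) for _ in range(128)] for _ in range(64)]
augmented = random_erase(spec, rng, p=1.0)
```

Because the rectangle is drawn afresh each time a sample is loaded, the same recording yields a different input in every epoch, which is the sense in which the augmentation is "online".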

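Keywords: compact bilinear network, attention module, environmental sound classification, random erasing data augmentation, multi-dimensional time-frequency features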

Abstract: Differences between local regions make it difficult to classify environmental sounds accurately. To address this, an environmental sound classification method based on a compact bilinear attention network is proposed. First, multi-dimensional time-frequency features are introduced to fully characterize environmental sounds. Second, online random erasing data augmentation is introduced to avoid the overfitting caused by insufficient data and to improve sample diversity. Finally, with the compact bilinear network framework left unchanged, DenseNet-169 is adopted as the feature extraction module, and a channel-spatial-position attention module is introduced to attend to the differences between local regions of environmental sound features. The experimental results show that the accuracy of the proposed method on the ESC-10 and ESC-50 datasets reaches 96.0% and 87.9%, respectively, both of which exceed human recognition accuracy.
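The "compact bilinear" idea can be sketched as follows, assuming the Tensor Sketch formulation of compact bilinear pooling; the abstract does not say which variant the authors use. Each feature vector is projected with a count sketch, and the circular convolution of the two sketches equals a count sketch of the full outer product, so the bilinear feature is never formed explicitly. All function names are hypothetical, and plain vectors stand in for the DenseNet-169 feature maps. Practical implementations usually compute the convolution with an FFT; a direct double loop is used here for clarity.

```python
import random


def make_hash(n, d, rng):
    """Draw a random bucket in [0, d) and a random sign for each of n inputs."""
    buckets = [rng.randrange(d) for _ in range(n)]
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    return buckets, signs


def count_sketch(x, buckets, signs, d):
    """Project x to d dimensions: each entry is added, with its sign, to its bucket."""
    out = [0.0] * d
    for xi, b, s in zip(x, buckets, signs):
        out[b] += s * xi
    return out


def circular_convolve(a, b):
    """Direct O(d^2) circular convolution of two length-d vectors."""
    d = len(a)
    out = [0.0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % d] += ai * bj
    return out


def compact_bilinear(x, y, hash_x, hash_y, d):
    """Tensor Sketch of the outer product of x and y, without forming it."""
    return circular_convolve(count_sketch(x, *hash_x, d),
                             count_sketch(y, *hash_y, d))


# Usage: two 6-dimensional feature vectors pooled into 16 dimensions.
rng = random.Random(0)
n, d = 6, 16
hash_x, hash_y = make_hash(n, d, rng), make_hash(n, d, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [rng.gauss(0.0, 1.0) for _ in range(n)]
pooled = compact_bilinear(x, y, hash_x, hash_y, d)
```

In a network, this pooling would typically be applied at every spatial location of the feature maps and summed, giving a fixed-length descriptor whose size `d` is far smaller than the `n * n` entries of full bilinear pooling.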

Key words: compact bilinear network, attention module, environmental sound classification, random erasing data augmentation, multi-dimensional time-frequency features

CLC number: TN912

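Dong Shaojiang, Xia Zhengfu, Cai Weiwei. Environmental Sound Classification Based on Compact Bilinear Attention Network[J]. Journal of Beijing University of Posts and Telecommunications, 2023, 46(6): 102-0.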