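Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2023, Vol. 46, Issue 4: 58-63.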


Lightweight Chinese Lipreading Model and Dataset Construction

SUN Baosheng, XIE Dongliang

  1. School of Computer Science, Beijing University of Posts and Telecommunications
  • Received: 2022-05-31 Revised: 2022-08-30 Online: 2023-08-28 Published: 2023-08-24
  • Corresponding author: XIE Dongliang, E-mail: xiedl@bupt.edu.cn

Abstract: To promote the rapid development and practical application of Chinese lipreading, this paper proposes a lightweight lipreading model that combines interleaved group convolution with dilated convolution. The interleaved group convolution learns the correlations between different features, and the dilated convolution enlarges the model's receptive field; together they substantially reduce the model's parameter count and computational complexity while improving recognition accuracy. To address the scarcity of Chinese lipreading datasets, a sentence-level Chinese lipreading dataset, the largest of its kind, was recorded in a controlled environment. Experiments on the recorded dataset and on public datasets verify the applicability and effectiveness of the lightweight model, and heatmap visualization is used to analyze how well the model learns the mapping between video frames and text.

Key words: Chinese lipreading, lightweight, interleaved group convolution, dilated convolution
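The abstract names the two building blocks but gives no layer sizes, so the following Python sketch only illustrates, under stated assumptions, why the combination is cheap. It assumes the standard interleaved group convolution (IGC) layout: a primary spatial group convolution with L groups of M channels, a channel interleave, then a secondary 1×1 group convolution with M groups of L channels. It also assumes a stack of stride-1 dilated convolutions (for example, over the frame axis). The function names and all sizes (64 channels, 3×3 kernels, dilations 1, 2, 4, 8) are hypothetical and are not taken from the paper.

```python
"""Back-of-the-envelope arithmetic for interleaved group convolution and
dilated convolution. All layer sizes are hypothetical, not from the paper."""
from __future__ import annotations


def standard_conv_params(channels: int, kernel_area: int) -> int:
    # Dense convolution: every output channel sees every input channel.
    # Biases are ignored throughout.
    return channels * channels * kernel_area


def igc_params(num_primary: int, channels_per_primary: int, kernel_area: int) -> int:
    # Assumed IGC layout: primary group conv with L groups of M channels
    # (spatial kernel), then a secondary 1x1 group conv with M groups of L channels.
    L, M = num_primary, channels_per_primary
    primary = L * (M * M * kernel_area)
    secondary = M * (L * L)
    return primary + secondary


def interleave_order(num_primary: int, channels_per_primary: int) -> list[int]:
    # Channel permutation between the two group convolutions:
    # secondary group m collects channel m from every primary group.
    L, M = num_primary, channels_per_primary
    return [l * M + m for m in range(M) for l in range(L)]


def dilated_receptive_field(kernel_size: int, dilations: list[int]) -> int:
    # Receptive field of stacked stride-1 convolutions: each layer with
    # dilation d adds (kernel_size - 1) * d input positions.
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    C, L, M, S = 64, 8, 8, 3 * 3  # hypothetical: C = L * M = 64 channels, 3x3 kernel
    print(standard_conv_params(C, S))                # 36864
    print(igc_params(L, M, S))                       # 5120
    print(interleave_order(2, 3))                    # [0, 3, 1, 4, 2, 5]
    print(dilated_receptive_field(3, [1, 1, 1, 1]))  # 9
    print(dilated_receptive_field(3, [1, 2, 4, 8]))  # 31
```

With these illustrative numbers, the IGC block needs 5,120 weights versus 36,864 for a dense 3×3 convolution, and four dilated layers cover 31 positions instead of 9 with the same number of weights as their undilated counterparts. The savings in the paper itself depend on its actual configuration.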
