Continuous Sign Language Recognition Based on CM-Transformer

Journal of Beijing University of Posts and Telecommunications ›› 2022, Vol. 45 ›› Issue (5): 49-53,78.

YE Kang, ZHANG Shujun, GUO Qi, LI Hui, CUI Xuehong

School of Information Science and Technology, Qingdao University of Science and Technology

Received: 2021-11-01 Revised: 2022-01-05 Online: 2022-10-28 Published: 2022-11-01
Corresponding author: ZHANG Shujun, E-mail: zhangsj@qust.edu.cn
Funding:
Shandong Provincial Key Research and Development Program

Abstract

Abstract: To capture both the global and local features of sign language actions while preserving the original structure and contextual relationships in the images, an improved convolution multilayer perceptron Transformer (CM-Transformer) method is proposed for continuous sign language recognition. CM-Transformer combines the structural consistency of convolutional layers with the global modeling capability of the Transformer (self-attention) encoder to capture long-term sequence dependencies. The feedforward layer of the Transformer is replaced with a multilayer perceptron to exploit its translation invariance and locality. Random frame dropping and stochastic gradient stopping are used to reduce the training computation in both time and space and to prevent overfitting, yielding a computationally efficient, lightweight network. Finally, a connectionist temporal classification (CTC) decoder aligns the input and output sequences to produce the final recognition result. Experimental results on two large benchmark datasets demonstrate the effectiveness of the proposed method.

Key words: continuous sign language recognition, convolutional neural network, self-attention model, multilayer perceptron
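
Two of the steps named in the abstract, random frame dropping before training and CTC decoding of per-frame outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the drop ratio of 0.5, and the choice of index 0 for the CTC blank label are assumptions made for illustration, and the decoder shown is plain best-path (greedy) CTC decoding rather than whatever decoding strategy the paper uses.

```python
import random

BLANK = 0  # assumed index of the CTC blank label (not stated in the abstract)


def drop_frames(frames, drop_ratio=0.5, rng=None):
    """Randomly discard a fraction of frames while keeping temporal order.

    drop_ratio is a hypothetical hyperparameter; the abstract gives no value.
    """
    if not frames:
        return []
    rng = rng or random.Random()
    # Keep at least one frame so the sequence never becomes empty.
    keep = max(1, round(len(frames) * (1.0 - drop_ratio)))
    # Sample indices without replacement, then sort to preserve frame order.
    kept_idx = sorted(rng.sample(range(len(frames)), keep))
    return [frames[i] for i in kept_idx]


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    decoded, prev = [], None
    for label in best:
        # Emit a label only when it differs from the previous frame's label
        # and is not the blank symbol.
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded


if __name__ == "__main__":
    video = list(range(20))  # stand-in for 20 video frames
    print(drop_frames(video, drop_ratio=0.5, rng=random.Random(0)))

    # Per-frame scores over three classes: blank, gloss 1, gloss 2.
    scores = [
        [0.1, 0.8, 0.1],
        [0.1, 0.8, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8],
    ]
    print(ctc_greedy_decode(scores))
```

In the decoding example, the blank in the third frame separates two occurrences of gloss 1, so both are kept; without it, the repeated label would be merged into one.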

CLC number:

TP391.41

YE Kang, ZHANG Shujun, GUO Qi, LI Hui, CUI Xuehong. Continuous Sign Language Recognition Based on CM-Transformer[J]. Journal of Beijing University of Posts and Telecommunications, 2022, 45(5): 49-53,78.