Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2024, Vol. 47 ›› Issue (2): 90-96.

• Papers

Dynamic Gesture Data Optimization and Recognition Based on Encoded Video

Xiaoyan XIE1, Panyu CAO1, Hao XIA1,2, Yuxin CHEN1,2

  1. Xi'an University of Posts and Telecommunications
  2.
  • Received: 2023-04-12 Revised: 2023-05-27 Publication date: 2024-04-28 Release date: 2024-01-24
  • Corresponding author: Panyu CAO E-mail: cpy@stu.xupt.edu.cn
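  • Funding:
    Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0119005); Key Program of the National Natural Science Foundation of China (61834005)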

Dynamic Gesture Data Optimization and Recognition Based on Encoded Video

  • Received:2023-04-12 Revised:2023-05-27 Online:2024-04-28 Published:2024-01-24
  • Contact: Panyu CAO E-mail: cpy@stu.xupt.edu.cn

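Abstract (translated from Chinese): Syntax elements in an encoded video bitstream, such as motion vectors and residuals, can replace optical flow as a motion representation, but their inherent pixel noise and feature sparsity degrade recognition accuracy for fine-grained actions. To address this, a high-accuracy, low-complexity dynamic gesture recognition framework is designed on the basis of data optimization of the encoded-video syntax elements. First, a key P-frame selection method is proposed, which resolves the feature sparsity problem by selecting encoded frames that carry more information. Second, a joint residual feature representation method is proposed, which uses the residuals to obtain fine gesture contour maps and removes pixel noise outside the hand from the motion vectors. Finally, a lightweight and efficient dynamic gesture recognition model is designed, which uses the optimized motion vectors and residuals to obtain a computational effect similar to that of optical flow. The proposed method is validated on the VIVA, Sheffield Kinect Gesture (SKIG), NvGesture, and EgoGesture datasets. The experimental results show that, using only the RGB data modality, the method reaches recognition accuracies of 82.94%, 99.72%, 81.12%, and 90.48%, respectively, reduces storage overhead by 89%, and obtains results close to those of advanced methods at 4.7 times the running speed.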

Keywords: dynamic gesture recognition, encoded video, motion vector, residual, data optimization

Abstract: Syntax elements extracted from encoded video bitstreams, such as motion vectors and residuals, can be used in place of optical flow to represent motion for action recognition. However, their inherent pixel noise and feature sparsity reduce accuracy when fine movements are recognized. To address these issues, a high-precision, low-complexity dynamic gesture recognition framework is designed on the basis of data optimization of the syntax elements in encoded video. First, a key P-frame selection strategy is introduced to cope with feature sparsity by selecting encoded frames that carry more information. Second, a joint residual feature representation method is proposed, which removes noisy motion vectors outside the hand by using finer gesture contour maps obtained from the residuals. Finally, a lightweight and efficient recognition model uses the optimized motion vectors and residuals to achieve an effect similar to that of optical flow. In experiments on the VIVA, SKIG, NvGesture, and EgoGesture benchmark datasets, the scheme achieves recognition accuracies of 82.94%, 99.72%, 81.12%, and 90.48%, respectively, using only RGB data, reduces storage overhead by 89%, and obtains results close to those of state-of-the-art (SOTA) methods at 4.7 times the running speed.

Key words: dynamic gesture recognition, encoded video, motion vector, residual, data optimization
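
The two data-optimization steps named in the abstract, key P-frame selection and residual-guided cleaning of motion vectors, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the selection criterion or how the contour map is built, so the information score (sum of absolute residual values), the fixed `threshold`, and all function names below are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]                    # H x W residual magnitudes of one P-frame
MVField = List[List[Tuple[float, float]]]   # H x W motion vectors as (dx, dy)


def residual_energy(residual: Grid) -> float:
    """Assumed information score: sum of absolute residual values."""
    return sum(abs(v) for row in residual for v in row)


def select_key_p_frames(residuals: Sequence[Grid], k: int) -> List[int]:
    """Return the indices of the k highest-scoring P-frames, in temporal order."""
    ranked = sorted(
        range(len(residuals)),
        key=lambda i: residual_energy(residuals[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


def residual_mask(residual: Grid, threshold: float) -> List[List[bool]]:
    """Assumed contour proxy: True where the residual magnitude exceeds the threshold."""
    return [[abs(v) > threshold for v in row] for row in residual]


def mask_motion_vectors(mv: MVField, mask: List[List[bool]]) -> MVField:
    """Zero out motion vectors that fall outside the residual-derived mask."""
    return [
        [vec if keep else (0.0, 0.0) for vec, keep in zip(mv_row, mask_row)]
        for mv_row, mask_row in zip(mv, mask)
    ]


# Toy data: four 2x2 residual maps and one 2x2 motion-vector field.
residuals = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 9.0], [8.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 5.0], [0.0, 0.0]],
]
mv = [[(1.0, 0.0), (2.0, -1.0)], [(0.5, 0.5), (3.0, 3.0)]]

key_frames = select_key_p_frames(residuals, k=2)
cleaned = mask_motion_vectors(mv, residual_mask(residuals[1], threshold=4.0))
```

On the toy data, frames 1 and 3 have the largest residual energy and are kept, and only the motion vectors at positions where the residual of frame 1 exceeds the threshold survive. A real pipeline would read motion vectors and residuals from the decoder rather than from hand-written lists.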

CLC number: