Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications ›› 2022, Vol. 45 ›› Issue (6): 62-69.

• Paper

Encoding and compression method of 3D skeleton data for semantic communication

ZHANG Hao¹, FENG Chunyan¹, YANG Jiahui¹, GUO Caili², ZHOU Bowen¹

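  1. Beijing University of Posts and Telecommunications
  2. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications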
  • Received: 2022-05-31 Revised: 2022-08-27 Online: 2022-12-28 Published: 2022-11-24
  • Corresponding author: GUO Caili E-mail: guocaili@bupt.edu.cn

Abstract: As the Internet of Everything becomes the prevailing trend, traditional video coding and compression methods struggle to remove the large amount of redundant information in video data, which inevitably reduces transmission efficiency. To address this challenge, a source encoding and compression method for 3D skeleton data oriented toward semantic communication (DMDCT) is proposed. To tackle the redundancy in skeleton data, a multi-scale skeleton representation is derived from the semantic concept; it adaptively describes the motion state of the skeleton points that participate in each action semantic while preserving the human skeleton structure. The discrete cosine transform (DCT) is then introduced to separate the DC and AC components of the multi-scale skeleton representation in the frequency domain, further reducing the overall data volume. Unlike traditional communication, which transmits the original video data, the method follows the semantic communication approach and transmits only the skeleton data relevant to the high-level task, which improves transmission efficiency. Experiments on the public NTU RGB+D dataset, with action recognition as the example task, show that at the same compression rate DMDCT achieves a Top-1 accuracy about 5% higher than comparable algorithms, and that retaining only 10% of the DCT coefficients still yields an accuracy of 74.2% while the data volume is only 6% of the original.
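The idea of keeping only a fraction of the DCT coefficients can be illustrated with a minimal sketch. This is not the DMDCT method itself: the multi-scale skeleton representation is omitted, and the choices below (applying the DCT along the time axis of a single joint coordinate, keeping the lowest-frequency coefficients, the synthetic trajectory, and the names `dct`, `idct`, `compress`, `decompress`) are assumptions made purely for illustration.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        s = c[0] * math.sqrt(1.0 / n)  # DC component
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (t + 0.5) * k / n)
                 for k in range(1, n))  # AC components
        out.append(s)
    return out

def compress(trajectory, keep_ratio=0.1):
    """Keep the DC term and the lowest-frequency AC terms (assumed selection rule)."""
    coeffs = dct(trajectory)
    keep = max(1, round(keep_ratio * len(coeffs)))
    return coeffs[:keep], len(coeffs)

def decompress(kept, length):
    """Zero-pad the discarded coefficients and invert the transform."""
    return idct(list(kept) + [0.0] * (length - len(kept)))

# Hypothetical data: x-coordinate of one joint over 40 frames.
frames = 40
traj = [0.5 + 0.2 * math.sin(2 * math.pi * t / frames) for t in range(frames)]

kept, n = compress(traj, keep_ratio=0.1)   # 4 of 40 coefficients are retained
recon = decompress(kept, n)
max_err = max(abs(a - b) for a, b in zip(traj, recon))
print(f"kept {len(kept)}/{n} coefficients, max reconstruction error {max_err:.4f}")
```

In this sketch only the retained coefficients and the sequence length would need to be transmitted. How the paper selects coefficients and combines them with the multi-scale representation is not stated in the abstract.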

Key words: 3D human skeleton data, video data compression, semantic communication, action recognition, 6G

CLC number: