Journal of Beijing University of Posts and Telecommunications

  • EI-indexed core journal

Journal of Beijing University of Posts and Telecommunications ›› 2020, Vol. 43 ›› Issue (2): 129-134. doi: 10.13190/j.jbupt.2019-071

• Research Reports •

A Visual Object Tracking Algorithm Based on Features Extracted by Deep Residual Network

MA Su-gang1,2, ZHAO Xiang-mo1, HOU Zhi-qiang2, WANG Zhong-min2,3, SUN Han-lin2

  1. School of Information Engineering, Chang'an University, Xi'an 710064, China;
  2. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China;
  3. Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
  • Received: 2019-04-30 Published: 2020-04-28
  • About the author: MA Su-gang (born 1982), male, senior engineer, E-mail: msg@xupt.edu.cn.
  • Funding:
    National Natural Science Foundation of China (61571458, 61473309); Key Research and Development Program of Shaanxi Province (2018ZDCXL-GY-04-02); Special Scientific Research Program of the Education Department of Shaanxi Province (17JK0696); Xi'an Science and Technology Plan Project (GXYD17.17)

Abstract: To address the problem that targets are easily lost in complex scenes, a scale-adaptive visual object tracking algorithm based on deep residual network (ResNet) features is proposed. First, ResNet is used to extract multi-layer deep features from the image region of interest. Because the rectified linear unit (ReLU) activation function suppresses target features, the convolutional layers used for feature extraction are taken before the ReLU function. Next, a position filter based on the kernelized correlation filter is built on each extracted feature layer, the resulting response maps are fused by weighting, and the point with the largest response is taken as the target center. Once the target position is determined, the target is sampled at multiple scales, Felzenszwalb histogram of oriented gradients (fHOG) features are extracted from each scaled image, and a scale correlation filter is built on them to estimate the target scale accurately. The algorithm is compared with six related algorithms on the OTB100 video dataset. The experimental results show that it achieves a high tracking success rate and precision and adapts well to complex scenes such as scale variation and background clutter.

Key words: visual object tracking, deep residual network, kernelized correlation filter, deep learning, scale estimation
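
The two selection steps described in the abstract (weighted fusion of per-layer response maps followed by a peak search, then picking the best of several sampled scales) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the fusion weights, and the defaults of 33 scales with a step of 1.02 are all hypothetical values chosen for illustration, since the abstract does not state them. The sketch also assumes that the response maps have already been resized to a common resolution and that the correlation filters themselves are computed elsewhere.

```python
from typing import List, Sequence, Tuple

ResponseMap = List[List[float]]


def fuse_responses(maps: Sequence[ResponseMap],
                   weights: Sequence[float]) -> ResponseMap:
    """Return the weighted sum of equally sized response maps."""
    if not maps or len(maps) != len(weights):
        raise ValueError("need one weight per response map")
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for response, weight in zip(maps, weights):
        if len(response) != rows or any(len(row) != cols for row in response):
            raise ValueError("response maps must share one size")
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += weight * response[r][c]
    return fused


def peak_location(response: ResponseMap) -> Tuple[int, int]:
    """Return (row, col) of the largest value; the first maximum wins ties."""
    best_r, best_c = 0, 0
    for r, row in enumerate(response):
        for c, value in enumerate(row):
            if value > response[best_r][best_c]:
                best_r, best_c = r, c
    return best_r, best_c


def scale_factors(num_scales: int = 33, step: float = 1.02) -> List[float]:
    """Return step**n for integer n centred on 0 (assumed scale pool)."""
    if num_scales < 1 or num_scales % 2 == 0:
        raise ValueError("num_scales must be a positive odd number")
    half = (num_scales - 1) // 2
    return [step ** n for n in range(-half, half + 1)]


def best_scale(factors: Sequence[float], scores: Sequence[float]) -> float:
    """Return the scale factor whose scale-filter score is largest."""
    if not factors or len(factors) != len(scores):
        raise ValueError("need one score per scale factor")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return factors[best_index]


if __name__ == "__main__":
    # Two toy 2x2 response maps standing in for two ResNet layers.
    layer_maps = [[[0.1, 0.2], [0.9, 0.0]],
                  [[0.0, 0.8], [0.1, 0.1]]]
    fused = fuse_responses(layer_maps, weights=[1.0, 0.5])
    print("target centre (row, col):", peak_location(fused))
    factors = scale_factors(num_scales=5, step=1.1)
    print("chosen scale:", best_scale(factors, [0.1, 0.2, 0.9, 0.3, 0.1]))
```

Changing the weights changes which layer dominates the fused peak, which is why the choice of fusion weights matters for the position estimate.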

CLC number: