Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

Journal of Beijing University of Posts and Telecommunications ›› 2018, Vol. 41 ›› Issue (4): 37-43. doi: 10.13190/j.jbupt.2017-227

• Papers

Dynamic Reconfigurable Implementation of SAD Algorithm in HEVC Motion Estimation

JIANG Lin1, WU Xin2, CUI Ji-xing3, XIE Xiao-yan3, SHAN Rui2

  1. Integrated Circuit Design Laboratory, Xi'an University of Science and Technology, Xi'an 710054, China;
  2. School of Electronic Engineering, Xi'an University of Posts and Telecommunications, Xi'an 710121, China;
  3. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an 710121, China
  • Received: 2017-11-16 Online: 2018-08-28 Published: 2018-10-09
  • About the author: JIANG Lin (1970-), male, professor, E-mail: jl@xupt.edu.cn.
  • Funding:
    National Natural Science Foundation of China (61772417, 61834005, 61602377, 61272120); Shaanxi Province Science and Technology Coordinated Innovation Project (2016KTZDGY02-040-2); Shaanxi Province Key Science and Technology Research Program (207KY-060)

Abstract: The asymmetric partitioning modes introduced in the High Efficiency Video Coding (HEVC) standard multiply the number of sum of absolute differences (SAD) operations required in motion estimation. To improve the execution efficiency of motion estimation while letting users choose the mode themselves, a reconfigurable array structure is designed that supports two execution modes, with asymmetric partitioning enabled or disabled, and free switching between them. To meet the user's coding-speed requirement while making maximal use of the resources of the reconfigurable array processor, the array size is reconfigured dynamically: within an array of 16×16 processing elements, instructions for 16×8, 16×4, or 16×2 processing elements are loaded, and instruction issuance delivers the different instructions to the corresponding processing elements to configure them. Experimental results show that, with comparable hardware resource usage, the proposed reconfigurable implementation reduces processing time by about 35% and improves throughput by about 0.4 times relative to a pipelined implementation. The implementation achieves high execution efficiency, can switch both execution mode and array size, and offers good flexibility.

Key words: high efficiency video coding, sum of absolute differences, reconfigurable array structure, asymmetric partitioning

CLC number:
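
To make the abstract's point about SAD workload concrete, the following is a minimal software sketch, not the paper's hardware mapping. It computes the SAD of each prediction-unit rectangle of one coding unit, with asymmetric partitioning either disabled or enabled. The partition names (2NxnU, 2NxnD, nLx2N, nRx2N) follow the usual HEVC convention; the function names `sad`, `partitions`, and `sad_per_mode` are hypothetical, the N×N mode is omitted for brevity, and a single co-located reference block stands in for a full motion search.

```python
from typing import Dict, List, Tuple

Block = List[List[int]]
Rect = Tuple[int, int, int, int]  # (top, left, height, width)


def sad(cur: Block, ref: Block, rect: Rect) -> int:
    """Sum of absolute differences over one rectangular partition."""
    top, left, h, w = rect
    return sum(
        abs(cur[y][x] - ref[y][x])
        for y in range(top, top + h)
        for x in range(left, left + w)
    )


def partitions(size: int, amp: bool) -> Dict[str, List[Rect]]:
    """Prediction-unit rectangles of a size x size coding unit.

    Assumption: only the symmetric modes 2Nx2N, 2NxN, Nx2N and, when
    amp is True, the four asymmetric modes are modelled.
    """
    half, quarter = size // 2, size // 4
    rest = size - quarter
    modes = {
        "2Nx2N": [(0, 0, size, size)],
        "2NxN": [(0, 0, half, size), (half, 0, half, size)],
        "Nx2N": [(0, 0, size, half), (0, half, size, half)],
    }
    if amp:
        modes.update({
            "2NxnU": [(0, 0, quarter, size), (quarter, 0, rest, size)],
            "2NxnD": [(0, 0, rest, size), (rest, 0, quarter, size)],
            "nLx2N": [(0, 0, size, quarter), (0, quarter, size, rest)],
            "nRx2N": [(0, 0, size, rest), (0, rest, size, quarter)],
        })
    return modes


def sad_per_mode(cur: Block, ref: Block, amp: bool) -> Dict[str, List[int]]:
    """SAD of every partition of every enabled mode for one block pair."""
    size = len(cur)
    return {
        name: [sad(cur, ref, rect) for rect in rects]
        for name, rects in partitions(size, amp).items()
    }


# Example: a flat 16x16 block that differs from its reference by 2 per pixel.
cur = [[3] * 16 for _ in range(16)]
ref = [[1] * 16 for _ in range(16)]
off = sad_per_mode(cur, ref, amp=False)
on = sad_per_mode(cur, ref, amp=True)
```

In this sketch, enabling asymmetric partitioning raises the number of partition SADs per candidate position from 5 to 13, which illustrates why the operation count grows severalfold and why a switchable execution mode is attractive.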