北京邮电大学学报

  • EI核心期刊

北京邮电大学学报 ›› 2019, Vol. 42 ›› Issue (3): 106-113.doi: 10.13190/j.jbupt.2018-252

• 研究报告 • 上一篇    下一篇

高性能行任务散列法GPU一般稀疏矩阵-矩阵乘法

汤洋1, 赵达非2,3, 黄智濒2,3, 戴志涛2,3   

  1. 1. 北京邮电大学 理学院, 北京 100876;
    2. 北京邮电大学 智能通信软件与多媒体北京市重点实验室, 北京 100876;
    3. 北京邮电大学 计算机学院, 北京 100876
  • 收稿日期:2018-10-09 出版日期:2019-06-28 发布日期:2019-06-20
  • 作者简介:汤洋(1997-),男,硕士生;黄智濒(1978-),男,讲师,硕士生导师,E-mail:huangzb@bupt.edu.cn.
  • 基金资助:
    中央高校基本科研业务费专项资金项目(2017RC42);IBMSUR项目(IA2016010);提升政府治理能力大数据应用技术国家工程实验室重点支持项目;中国博士后科学基金面上项目(2014M550662)

High Performance Row-Based Hashing GPU SpGEMM

TANG Yang1, ZHAO Da-fei2,3, HUANG Zhi-bin2,3, DAI Zhi-tao2,3   

  1. 1. School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    2. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    3. School of Computer Science, Beijing University of Posts and Telecommunication, Beijing 100876, China
  • Received:2018-10-09 Online:2019-06-28 Published:2019-06-20

摘要: 针对一般稀疏矩阵-矩阵乘法(SpGEMM)的性能问题,提出了一种基于任务分类和低延迟散列表的图形处理器上的加速SpGEMM算法RBSPARSE.该算法由一种低成本子任务复杂度预分析方法和一种低延迟共享内存上的散列表的方法组成,以达到最大效率.通过解决负载均衡和内存延迟问题,RBSPARSE可以显著减少计算的总时间.比较了RBSparse和BHSparse,前者是最快的SpGEMM算法,结果表明RBSparse的性能是BHSparse的平均3.1倍,在最佳情况下可达到14.49倍.

关键词: 稀疏矩阵-矩阵乘法, 图形处理器, 性能优化, 散列表, 共享内存

Abstract: Aiming at the performance problem of general sparse matrix-matrix multiplication (SpGEMM), a graphics processing unit (GPU)-accelerate SpGEMM algorithm based on task classification and low-latency Hashing table, RBSPARSE, was presented in the paper. RBSPARSE consists of a low-cost pre-analysis method to identify the complexity of sub-tasks, and a Hashing table-based algorithm which could utilize low-latency shared memory to achieve max efficiency. By taking the load balancing issue and the memory latency issue into consideration, RBSPARSE could significantly reduce the overall time in computation. RBSparse and BHSparse are compared. BHSparse is the previous state-of-the-art algorithm for SpGEMM. The result shows that our algorithm is 3.1 times faster than BHSparse on average, and could achieve a maximum 14.49 times faster speed in the best scenario.

Key words: general sparse matrix-matrix multiplication, graphics processing unit, performance optimization, Hash table, shared memory

中图分类号: