High Performance Row-Based Hashing GPU SpGEMM

doi:10.13190/j.jbupt.2018-252

JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM, 2019, Vol. 42, Issue (3): 106-113. doi: 10.13190/j.jbupt.2018-252

• Reports •

TANG Yang¹, ZHAO Da-fei²,³, HUANG Zhi-bin²,³, DAI Zhi-tao²,³

1. School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
2. Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China;
3. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China

Received: 2018-10-09; Online: 2019-06-28; Published: 2019-06-20

Abstract

To address the performance problem of general sparse matrix-matrix multiplication (SpGEMM), this paper presents RBSparse, a graphics processing unit (GPU)-accelerated SpGEMM algorithm based on task classification and a low-latency hash table. RBSparse consists of a low-cost pre-analysis method that identifies the complexity of sub-tasks and a hash table-based algorithm that uses low-latency shared memory to maximize efficiency. By addressing both load balancing and memory latency, RBSparse significantly reduces overall computation time. RBSparse is compared with BHSparse, the previous state-of-the-art SpGEMM algorithm. The results show that RBSparse is 3.1 times faster than BHSparse on average and up to 14.49 times faster in the best case.
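The two ingredients named in the abstract, a cheap pre-analysis that classifies rows by workload and a hash table that accumulates each output row, can be illustrated with a minimal sequential sketch. This is not the authors' implementation: the function names, the bin thresholds (32 and 512), and the linear-probing scheme are assumptions made for illustration only. On a GPU the per-row table would sit in shared memory and the rows of each bin would be processed in parallel.

```python
# Illustrative sketch only; all names and thresholds are hypothetical.
# Matrices are CSR triples: (indptr, indices, data).

def row_upper_bounds(a_indptr, a_indices, b_indptr):
    """Pre-analysis: upper bound on the nonzeros of each row of C = A * B."""
    bounds = []
    for i in range(len(a_indptr) - 1):
        total = 0
        for p in range(a_indptr[i], a_indptr[i + 1]):
            k = a_indices[p]
            total += b_indptr[k + 1] - b_indptr[k]  # length of row k of B
        bounds.append(total)
    return bounds


def bin_rows(bounds, limits=(32, 512)):
    """Task classification: group rows by estimated work (assumed thresholds)."""
    bins = [[] for _ in range(len(limits) + 1)]
    for row, bound in enumerate(bounds):
        bins[sum(bound > limit for limit in limits)].append(row)
    return bins


def hash_row(i, a, b, table_size):
    """Accumulate row i of C in an open-addressing table with linear probing."""
    a_indptr, a_indices, a_data = a
    b_indptr, b_indices, b_data = b
    keys = [-1] * table_size   # -1 marks an empty slot
    vals = [0.0] * table_size
    for p in range(a_indptr[i], a_indptr[i + 1]):
        k, a_val = a_indices[p], a_data[p]
        for q in range(b_indptr[k], b_indptr[k + 1]):
            col = b_indices[q]
            slot = col % table_size
            while keys[slot] not in (-1, col):   # probe until empty or match
                slot = (slot + 1) % table_size
            keys[slot] = col
            vals[slot] += a_val * b_data[q]
    # Return (column, value) pairs sorted by column index.
    return sorted((k, v) for k, v in zip(keys, vals) if k != -1)


def spgemm(a, b):
    """Row-wise SpGEMM: pre-analyse, bin, then hash-accumulate every row."""
    bounds = row_upper_bounds(a[0], a[1], b[0])
    rows = {}
    for group in bin_rows(bounds):
        for i in group:
            size = 1
            while size < bounds[i]:   # power of two, never below the bound
                size *= 2
            rows[i] = hash_row(i, a, b, size)
    return [rows[i] for i in range(len(bounds))]
```

Sizing each table from the row's upper bound guarantees that probing always terminates, because a row can never hold more distinct columns than its bound. Binning matters on a GPU because rows in the same bin can share one kernel configuration and table size, which is one plausible way to address the load-balancing issue the abstract mentions.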

Key words: general sparse matrix-matrix multiplication, graphics processing unit, performance optimization, hash table, shared memory

CLC Number: TP391

TANG Yang, ZHAO Da-fei, HUANG Zhi-bin, DAI Zhi-tao. High Performance Row-Based Hashing GPU SpGEMM[J]. JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM, 2019, 42(3): 106-113.

