Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2013, Vol. 36, Issue 4: 76-80. doi: 10.13190/jbupt.201304.76.songj


Load-Balanced Data Layout Approach in Data-Intensive Computing

SONG Jie, LI Tian-tian, YAN Zhen-xing, ZHU Zhi-liang   

  1. Software College, Northeastern University, Shenyang 110819, China
  • Received: 2012-10-11   Online: 2013-08-31   Published: 2013-05-22

Abstract:

Widely used in data-intensive computing, the MapReduce model moves computation to the nodes where the data are stored and executes it in parallel. In this setting, data layout affects not only storage but also computing efficiency, because the computing efficiency of a node is determined by the characteristics of the data stored on it. Research on load balancing therefore shifts from traditional server management and task scheduling to data layout, with the aim of improving parallelism. This paper analyzes the characteristics of data layout in data-intensive computing and MapReduce environments, proposes a load-balancing goal for data layout, and presents a load-balanced data layout approach for a specific environment. Experiments verify the effectiveness of the proposed goal and approach, showing that the approach can effectively improve the parallelism of MapReduce applications and thereby optimize computing efficiency.

Key words: data-intensive computing, data layout, load balancing, MapReduce, cloud computing
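The abstract does not spell out the layout algorithm, so the following is only a minimal sketch of the underlying idea, not the authors' approach. It assumes that each data block has a known, hypothetical processing cost and that a node's MapReduce computing time is roughly the sum of the costs of the blocks stored on it. Under that assumption, it places blocks with the classic longest-processing-time (LPT) greedy heuristic, a swapped-in technique, and compares the result against a cost-oblivious round-robin layout. All function names (`greedy_layout`, `round_robin_layout`, `node_loads`, `imbalance`) and the example costs are illustrative.

```python
import heapq
from typing import Dict, List, Tuple


def greedy_layout(block_costs: Dict[str, float], nodes: List[str]) -> Dict[str, List[str]]:
    """Hypothetical LPT-style placement: each block goes to the currently least-loaded node.

    Assumption: a block's cost approximates the computing time it adds to its node.
    """
    if not nodes:
        raise ValueError("need at least one node")
    # Min-heap of (accumulated cost, node name); ties are broken by node name.
    heap: List[Tuple[float, str]] = [(0.0, n) for n in nodes]
    heapq.heapify(heap)
    layout: Dict[str, List[str]] = {n: [] for n in nodes}
    # Place the most expensive blocks first so cheaper blocks can even out the loads later.
    for block, cost in sorted(block_costs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        layout[node].append(block)
        heapq.heappush(heap, (load + cost, node))
    return layout


def round_robin_layout(block_costs: Dict[str, float], nodes: List[str]) -> Dict[str, List[str]]:
    """Baseline that ignores cost: blocks (in name order) are dealt to nodes in turn."""
    layout: Dict[str, List[str]] = {n: [] for n in nodes}
    for i, block in enumerate(sorted(block_costs)):
        layout[nodes[i % len(nodes)]].append(block)
    return layout


def node_loads(layout: Dict[str, List[str]], block_costs: Dict[str, float]) -> Dict[str, float]:
    """Estimated computing cost per node under the additive cost assumption."""
    return {n: sum(block_costs[b] for b in blocks) for n, blocks in layout.items()}


def imbalance(loads: Dict[str, float]) -> float:
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    values = list(loads.values())
    mean = sum(values) / len(values)
    return max(values) / mean if mean > 0 else 1.0


if __name__ == "__main__":
    costs = {"b1": 8, "b2": 7, "b3": 6, "b4": 5, "b5": 4, "b6": 3, "b7": 2, "b8": 1}
    nodes = ["n1", "n2", "n3"]
    balanced = node_loads(greedy_layout(costs, nodes), costs)
    naive = node_loads(round_robin_layout(costs, nodes), costs)
    print(balanced, round(imbalance(balanced), 3))  # {'n1': 13, 'n2': 12, 'n3': 11} 1.083
    print(naive, round(imbalance(naive), 3))        # {'n1': 15, 'n2': 12, 'n3': 9} 1.25
```

Under this cost model the job's parallel phase is bounded by its most loaded node, so lowering the max/mean ratio from 1.25 to about 1.08 shortens that bound from 15 to 13 cost units even though the total work is unchanged. A real layout method would also have to account for replication, locality, and data characteristics that a single scalar cost does not capture.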
