Journal of Beijing University of Posts and Telecommunications

  • EI core journal

Journal of Beijing University of Posts and Telecommunications, 2015, Vol. 38, Issue (6): 34-38. doi: 10.13190/j.jbupt.2015.06.008

• Paper

Energy-Efficiency-Oriented Merging of Small Files on the HDFS Platform

于俊洋1,2, 胡志刚1, 刘秀磊3   

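  1. Software School, Central South University, Changsha 410075, China;
  2. Software School, Henan University, Kaifeng, Henan 475000, China;
  3. Computer School, Beijing Information Science and Technology University, Beijing 100101, China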
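  • Received: 2015-01-10  Online: 2015-12-28  Published: 2015-12-01
  • About the author: YU Jun-yang (born 1982), male, lecturer, E-mail: jyyu@henu.edu.cn.
  • Supported by:

    National Natural Science Foundation of China (61272148, 61301136); Specialized Research Fund for the Doctoral Program of Higher Education (20120162110061, 20120162120091)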

Smallfiles on HDFS Merging based on the Energy Efficiency

YU Jun-yang1,2, HU Zhi-gang1, LIU Xiu-lei3   

  1. Software School, Central South University, Changsha 410075, China;
  2. Software School, Henan University, Kaifeng, Henan 475000, China;
  3. Computer School, Beijing Information Science and Technology University, Beijing 100101, China
  • Received:2015-01-10 Online:2015-12-28 Published:2015-12-01

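Abstract (translated from the Chinese original):

MapReduce programs running on the Hadoop distributed file system (HDFS) incur a high energy cost when the data is stored as many small files. To address this problem, an energy consumption model of a Hadoop node cluster is established and analyzed, and it is proved that, on the Hadoop platform, there exists an optimal file size that minimizes the energy cost of running a program. On this basis, and drawing on marginal analysis theory from economics, a strategy is proposed for determining the optimal file size that takes both energy cost and access cost into account. The strategy computes the benefit of merging small files stored on HDFS and merges them into files of the cost-optimal size to obtain the best return. Experiments confirm that an energy-optimal data block size exists, and show that combining cost and benefit through marginal analysis is a reasonable and effective way to determine the data block size.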

Keywords: cloud computing, Hadoop distributed file system, Hadoop, energy efficiency, marginal analysis

Abstract:

MapReduce programs running on the Hadoop distributed file system (HDFS) suffer from a high energy cost when many small files are present. To solve this problem, a new energy model of the Hadoop node cluster was established and analyzed, proving that there exists an optimal file size on Hadoop that minimizes the energy cost of program execution. Based on this result and on marginal analysis theory, a decision strategy was put forward that finds the optimal file size in terms of both energy cost and access cost. The strategy evaluates the benefit of merging small files on HDFS and merges them to the cost-optimal file size to obtain the best benefit. Experiments proved that an energy-optimal data block size exists, and that determining the data block size by combining cost and benefit under marginal analysis theory is both reasonable and valid.

Key words: cloud computing, Hadoop distributed file system, Hadoop, energy efficiency, marginal analysis
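
The marginal-analysis idea in the abstract can be illustrated with a minimal Python sketch. The cost functions and every constant below (`energy_cost`, `access_cost`, `task_overhead`, `per_mb_penalty`, and so on) are hypothetical placeholders chosen for illustration; they are not the energy model derived in the paper. The sketch only shows the decision rule: keep enlarging the merged file while the marginal energy saving still covers the marginal access cost.

```python
# Toy illustration of marginal analysis for choosing a merge size.
# All cost functions and constants are assumptions, not the paper's model.

def energy_cost(size_mb, total_mb=10_000.0, task_overhead=50.0, per_mb=0.2):
    """Hypothetical energy cost: one task per file, each with a fixed overhead."""
    n_files = total_mb / size_mb
    return n_files * task_overhead + total_mb * per_mb

def access_cost(size_mb, per_mb_penalty=0.5):
    """Hypothetical access cost: finding a small file in a larger merged file costs more."""
    return per_mb_penalty * size_mb

def total_cost(size_mb):
    """Sum of the two hypothetical cost components."""
    return energy_cost(size_mb) + access_cost(size_mb)

def optimal_merge_size(candidates):
    """Grow the merged size while the marginal benefit covers the marginal cost."""
    best = candidates[0]
    for prev, nxt in zip(candidates, candidates[1:]):
        marginal_benefit = energy_cost(prev) - energy_cost(nxt)  # energy saved
        marginal_cost = access_cost(nxt) - access_cost(prev)     # access cost added
        if marginal_benefit < marginal_cost:
            break  # one more step would cost more than it saves
        best = nxt
    return best

candidates = [2 ** k for k in range(12)]  # candidate sizes: 1 MB .. 2048 MB
best_size = optimal_merge_size(candidates)
print(best_size)
```

With these made-up numbers the rule stops at the candidate size that also has the lowest total cost among the candidates, which is the behavior marginal analysis predicts when the energy saving shrinks and the access cost grows as files get larger.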

CLC Number: