On Datacenter-Network-Aware Load Balancing in MapReduce

Authors: Yanfang Le, Feng Wang, Jiangchuan Liu, Funda Ergun

DOI: 10.1109/CLOUD.2015.71

Abstract: MapReduce has emerged as a powerful tool for distributed and scalable processing of voluminous data. For skewed data input, load balancing among the worker nodes is necessary to minimize the overall finishing time, which however can incur massive data movement in the data center network. In this paper, we examine for the first time the problem of data-center-network-aware load balancing in the shuffle sub-phase of MapReduce. Different from earlier studies, which generally assume that the network inside a data center has negligible delay and infinite capacity, we consider the traffic bottlenecks of real data center networks by introducing constraints on the available network bandwidth, and demonstrate that the corresponding load balancing problem can be decomposed into two problems: network flow and load balancing, respectively. We show effective solutions to both of them, which together yield a complete solution towards near-optimal data-center-network-aware load balancing. A much simpler yet performance-wise comparable greedy algorithm is also developed for fast implementation in practice. The effectiveness of our solutions has been demonstrated with both synthetic and public datasets.
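
The abstract does not spell out the greedy algorithm, so the following is only a minimal sketch of the general idea of bandwidth-constrained greedy load balancing, not the authors' method. It assumes a simplified, hypothetical model: each key group has a known data volume, and each reducer has a single inbound capacity that stands in for the bandwidth constraints. The names `greedy_assign`, `group_sizes`, and `inbound_capacity` are illustrative and do not come from the paper.

```python
from typing import Dict, List


def greedy_assign(group_sizes: Dict[str, float],
                  inbound_capacity: List[float]) -> List[List[str]]:
    """Assign key groups to reducers, largest group first (hypothetical model).

    group_sizes: data volume of each key group (illustrative units).
    inbound_capacity: the most data each reducer's link may receive during
        the shuffle; an assumed stand-in for the bandwidth constraints.
    """
    loads = [0.0] * len(inbound_capacity)
    assignment: List[List[str]] = [[] for _ in inbound_capacity]

    # Process the largest groups first; ties are broken by key for determinism.
    ordered = sorted(group_sizes.items(), key=lambda kv: (-kv[1], kv[0]))
    for key, size in ordered:
        # Keep only reducers whose link can still absorb this group.
        feasible = [r for r in range(len(loads))
                    if loads[r] + size <= inbound_capacity[r]]
        if not feasible:
            raise ValueError(f"no reducer has spare bandwidth for group {key!r}")
        # Among feasible reducers, pick the least-loaded one.
        target = min(feasible, key=lambda r: loads[r])
        loads[target] += size
        assignment[target].append(key)
    return assignment


if __name__ == "__main__":
    sizes = {"a": 8, "b": 5, "c": 4, "d": 3}
    print(greedy_assign(sizes, [100, 100]))  # ample bandwidth
    print(greedy_assign(sizes, [10, 20]))    # reducer 0 has a tight link
```

With ample capacity the sketch reduces to the classic largest-first heuristic and splits the load 11 versus 9. With a tight link on the first reducer it trades balance for feasibility, which is the tension between load balancing and network bottlenecks that the abstract describes.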
