Datacenter-Network-Aware Online Load Balancing in MapReduce

作者: Yanfang Le

DOI:

关键词:

摘要: MapReduce has emerged as a powerful tool for distributed and scalable processing of voluminous data. Different from earlier heuristics in the very late reduce stage or after seeing all data, we address skew beginning data input, make no assumption about priori knowledge distribution nor require synchronized operations. We show that optimal strategy is constrained version online minimum makespan and, context where pairs with identical keys must be scheduled to same machine, propose an algorithm provable 2-competitive ratio. further suggest sample-based enhancement, which, probabilistically, achieves 3/2-competitive ratio bounded error. Examine project again, found datacenter network could bottleneck shuffle subphase MapReduce. This potentially lead poor overall performance even balanced workload thus needed addressed. Earlier studies either assume inside negligible delay infinite capacity, use hop count cost measurement. consider realistic bandwidth constraints real world networks effective solution toward near datacenter-networkaware load balancing.

参考文章(37)
Michaela Wahl, Rudolf Fleischer, Online scheduling revisited Untitled Event. pp. 202- 210 ,(2000)
Per-Åke Larson, Weipeng P. Yan, Eager Aggregation and Lazy Aggregation very large data bases. pp. 345- 357 ,(1995)
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Ion Stoica, Randy Katz, Improving MapReduce performance in heterogeneous environments operating systems design and implementation. pp. 29- 42 ,(2008) , 10.5555/1855741.1855744
John F. Rudin, R. Chandrasekaran, Improved Bounds for the Online Scheduling Problem SIAM Journal on Computing. ,vol. 32, pp. 717- 735 ,(2003) , 10.1137/S0097539702403438
Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, Geoffrey Fox, Twister: a runtime for iterative MapReduce high performance distributed computing. pp. 810- 818 ,(2010) , 10.1145/1851476.1851593
Kasahara, Narita, Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing IEEE Transactions on Computers. ,vol. 33, pp. 1023- 1029 ,(1984) , 10.1109/TC.1984.1676376
Fangfei Chen, Murali Kodialam, T. V. Lakshman, Joint scheduling of processing and Shuffle phases in MapReduce systems international conference on computer communications. pp. 1143- 1151 ,(2012) , 10.1109/INFCOM.2012.6195473
Yanfang Le, Jiangchuan Liu, Funda Ergun, Dan Wang, Online Load Balancing for MapReduce with Skewed Data Input international conference on computer communications. pp. 2004- 2012 ,(2014) , 10.1109/INFOCOM.2014.6848141
Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst, HaLoop Proceedings of the VLDB Endowment. ,vol. 3, pp. 285- 296 ,(2010) , 10.14778/1920841.1920881
Willis Lang, Jignesh M. Patel, Energy management for MapReduce clusters Proceedings of the VLDB Endowment. ,vol. 3, pp. 129- 139 ,(2010) , 10.14778/1920841.1920862