Datacenter-Network-Aware Online Load Balancing in MapReduce

DOI:

关键词:

摘要: MapReduce has emerged as a powerful tool for distributed and scalable processing of voluminous data. Different from earlier heuristics in the very late reduce stage or after seeing all data, we address skew beginning data input, make no assumption about priori knowledge distribution nor require synchronized operations. We show that optimal strategy is constrained version online minimum makespan and, context where pairs with identical keys must be scheduled to same machine, propose an algorithm provable 2-competitive ratio. further suggest sample-based enhancement, which, probabilistically, achieves 3/2-competitive ratio bounded error. Examine project again, found datacenter network could bottleneck shuffle subphase MapReduce. This potentially lead poor overall performance even balanced workload thus needed addressed. Earlier studies either assume inside negligible delay infinite capacity, use hop count cost measurement. consider realistic bandwidth constraints real world networks effective solution toward near datacenter-networkaware load balancing.

sfu.ca 本地加速

sfu.ca PDF 下载加速

参考文章(37)

Michaela Wahl, Rudolf Fleischer, Online scheduling revisited Untitled Event. pp. 202- 210 ,(2000)

Per-Åke Larson, Weipeng P. Yan, Eager Aggregation and Lazy Aggregation very large data bases. pp. 345- 357 ,(1995)

Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Ion Stoica, Randy Katz, Improving MapReduce performance in heterogeneous environments operating systems design and implementation. pp. 29- 42 ,(2008) , 10.5555/1855741.1855744

John F. Rudin, R. Chandrasekaran, Improved Bounds for the Online Scheduling Problem SIAM Journal on Computing. ,vol. 32, pp. 717- 735 ,(2003) , 10.1137/S0097539702403438

Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, Geoffrey Fox, Twister: a runtime for iterative MapReduce high performance distributed computing. pp. 810- 818 ,(2010) , 10.1145/1851476.1851593

Kasahara, Narita, Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing IEEE Transactions on Computers. ,vol. 33, pp. 1023- 1029 ,(1984) , 10.1109/TC.1984.1676376

Fangfei Chen, Murali Kodialam, T. V. Lakshman, Joint scheduling of processing and Shuffle phases in MapReduce systems international conference on computer communications. pp. 1143- 1151 ,(2012) , 10.1109/INFCOM.2012.6195473

Yanfang Le, Jiangchuan Liu, Funda Ergun, Dan Wang, Online Load Balancing for MapReduce with Skewed Data Input international conference on computer communications. pp. 2004- 2012 ,(2014) , 10.1109/INFOCOM.2014.6848141

Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst, HaLoop Proceedings of the VLDB Endowment. ,vol. 3, pp. 285- 296 ,(2010) , 10.14778/1920841.1920881

10.

Willis Lang, Jignesh M. Patel, Energy management for MapReduce clusters Proceedings of the VLDB Endowment. ,vol. 3, pp. 129- 139 ,(2010) , 10.14778/1920841.1920862

Datacenter-Network-Aware Online Load Balancing in MapReduce

来源期刊

我的账户

Datacenter-Network-Aware Online Load Balancing in MapReduce

来源期刊

相似文章 0

我的账户