Improvement of job completion time in data-intensive cloud computing applications

Authors: Ibrahim Adel Ibrahim, Mostafa Bassiouni

DOI: 10.1186/S13677-019-0139-6

Keywords:

Abstract: Task stragglers in MapReduce jobs dramatically impede job execution of data-intensive computing in cloud data centers. This impedance is due to the uneven distribution of input data, heterogeneous data nodes, resource contention situations, and network configurations. Data skew of intermediate data causes delay failures due to the violation of job completion time. Data-intensive computing frameworks, such as MapReduce or Hadoop YARN, employ HashPartitioner. This partitioner may cause intermediate data skew, which results in straggler reducers. In this paper, we strive to make Hadoop YARN more efficient in cloud environments. We present a new partitioning scheme, called the balanced data clusters partitioner (BDCP), to handle straggling Reduce tasks based on sampling of the input data and feedback information about the current processing task. Our extensive experimental results show that BDCP can outperform the default Hadoop HashPartitioner and the Range partitioner. BDCP can assist in straggler mitigation during the reduce phase and minimize the job completion time of MapReduce jobs within cloud computing.
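
The contrast the abstract draws between hash partitioning and a sampling-based partitioner can be sketched as follows. This is an illustration only, not the authors' BDCP algorithm, whose details the abstract does not give. It assumes a greedy rule (heaviest sampled key goes to the currently least-loaded reducer), uses CRC32 as a stand-in for Java's hashCode, and all function names are hypothetical.

```python
import zlib
from collections import Counter


def hash_partition(key: str, num_reducers: int) -> int:
    """Hash-modulo partitioning; CRC32 is an assumed stand-in for hashCode()."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def build_balanced_assignment(sample, num_reducers: int) -> dict:
    """Map each sampled key to a reducer, heaviest key first (greedy sketch)."""
    freq = Counter(sample)
    loads = [0] * num_reducers
    table = {}
    # Sort by descending frequency, then by key so ties are deterministic.
    for key, count in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))  # currently least-loaded reducer
        table[key] = target
        loads[target] += count
    return table


def balanced_partition(key: str, table: dict, num_reducers: int) -> int:
    """Use the sampled assignment; unseen keys fall back to hashing."""
    if key in table:
        return table[key]
    return hash_partition(key, num_reducers)


def reducer_loads(keys, assign, num_reducers: int) -> list:
    """Count how many records each reducer would receive."""
    loads = [0] * num_reducers
    for key in keys:
        loads[assign(key)] += 1
    return loads


# Hypothetical skewed intermediate keys: "a" dominates the data set.
keys = ["a"] * 50 + ["b"] * 30 + ["c"] * 10 + ["d"] * 10
num_reducers = 2

# For simplicity the "sample" here is the full key list.
table = build_balanced_assignment(keys, num_reducers)
balanced_loads = reducer_loads(
    keys, lambda k: balanced_partition(k, table, num_reducers), num_reducers
)
hash_loads = reducer_loads(
    keys, lambda k: hash_partition(k, num_reducers), num_reducers
)

print("hash loads:    ", hash_loads)
print("balanced loads:", balanced_loads)
```

In a real job the sample would be a small fraction of the input, so the assignment table is only an estimate; the hash fallback covers keys that the sample missed.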

References (22)
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Ion Stoica, Randy Katz. Improving MapReduce performance in heterogeneous environments. Operating Systems Design and Implementation, pp. 29-42 (2008). 10.5555/1855741.1855744
Scott Shenker, Ali Ghodsi, Ganesh Ananthanarayanan, Ion Stoica. Effective straggler mitigation: attack of the clones. Networked Systems Design and Implementation, pp. 185-198 (2013).
M. Al Hajj Hassan, M. Bamha, F. Loulergue. Handling Data-skew Effects in Join Operations Using MapReduce. International Conference on Conceptual Structures, vol. 29, pp. 145-158 (2014). 10.1016/J.PROCS.2014.05.014
Jiong Xie, Shu Yin, Xiaojun Ruan, Zhiyang Ding, Yun Tian, James Majors, Adam Manzanares, Xiao Qin. Improving MapReduce performance through data placement in heterogeneous Hadoop clusters. IEEE International Symposium on Parallel Distributed Processing Workshops and PhD Forum, pp. 1-9 (2010). 10.1109/IPDPSW.2010.5470880
Changhang Lin, Wenzhong Guo, Changhui Lin. Self-Learning MapReduce Scheduler in Multi-job Environment. International Conference on Cloud Computing, pp. 610-612 (2013). 10.1109/CLOUDCOM-ASIA.2013.95
Dimitrios Karapiperis, Vassilios S. Verykios. Load-Balancing the Distance Computations in Record Linkage. SIGKDD Explorations, vol. 17, pp. 1-7 (2015). 10.1145/2830544.2830546
Yujie Xu, Peng Zou, Wenyu Qu, Zhiyang Li, Keqiu Li, Xiaoli Cui. Sampling-Based Partitioning in MapReduce for Skewed Data. ChinaGrid Annual Conference, pp. 1-8 (2012). 10.1109/CHINAGRID.2012.18
Qi Chen, Jinyu Yao, Zhen Xiao. LIBRA: Lightweight Data Skew Mitigation in MapReduce. IEEE Transactions on Parallel and Distributed Systems, vol. 26, pp. 2520-2533 (2015). 10.1109/TPDS.2014.2350972
Wei Dai, Mostafa Bassiouni. An improved task assignment scheme for Hadoop running in the clouds. Journal of Cloud Computing, vol. 2, p. 23 (2013). 10.1186/2192-113X-2-23
Vedaprakash Subramanian, Liqiang Wang, En-Jui Lee, Po Chen. Rapid Processing of Synthetic Seismograms Using Windows Azure Cloud. IEEE International Conference on Cloud Computing Technology and Science, pp. 193-200 (2010). 10.1109/CLOUDCOM.2010.110