Coded computation over heterogeneous clusters

Authors: Amirhossein Reisizadeh, Saurav Prakash, Ramtin Pedarsani, Salman Avestimehr

DOI: 10.1109/ISIT.2017.8006961

Abstract: In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major performance degradation: system failures, bottlenecks due to limited communication bandwidth, latency due to straggler nodes, etc. On the other hand, these systems enjoy an abundance of redundancy: a vast number of nodes and large storage capacity. Recent results demonstrate the impact of coding for efficient utilization of computation redundancy to alleviate the effect of stragglers in homogeneous clusters. In this paper, we focus on general heterogeneous clusters consisting of a variety of machines with different capabilities. We propose a coding framework for speeding up computing in clusters with straggling servers by trading redundancy for reducing the latency of computation. In particular, we propose the Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing matrix multiplication over heterogeneous clusters, which is provably asymptotically optimal. Moreover, if the number of worker nodes in the cluster is n, we show that HCMM is Θ(log n) times faster than any uncoded scheme. We further provide numerical results demonstrating significant speedups of up to 49% and 34% for HCMM in comparison to the "uncoded" and "homogeneous coded" schemes, respectively.
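The idea of trading redundancy for latency can be illustrated with a toy coded matrix-vector multiplication. The sketch below is not HCMM itself: it assumes a hypothetical three-worker cluster with equal loads and a simple parity block, whereas the abstract's algorithm targets heterogeneous clusters. The function names (`encode`, `decode`, `matvec`) and worker labels are illustrative only. It shows that the product can be recovered from any two of the three workers, so a single straggler does not delay the result.

```python
from itertools import combinations

def matvec(rows, x):
    """Multiply a list-of-rows matrix by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]

def encode(A):
    """Split A into two halves and add a parity block A1 + A2.

    Assumes A has an even number of rows. The worker names are hypothetical.
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return {"w1": A1, "w2": A2, "w3": parity}

def decode(results):
    """Recover A @ x from the results of any two of the three workers."""
    if "w1" in results and "w2" in results:
        y1, y2 = results["w1"], results["w2"]
    elif "w1" in results:
        # w2 straggled: A2 @ x = (A1 + A2) @ x - A1 @ x
        y1 = results["w1"]
        y2 = [s - a for s, a in zip(results["w3"], y1)]
    else:
        # w1 straggled: A1 @ x = (A1 + A2) @ x - A2 @ x
        y2 = results["w2"]
        y1 = [s - b for s, b in zip(results["w3"], y2)]
    return y1 + y2

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 2]
tasks = encode(A)
expected = matvec(A, x)

# Simulate every possible single straggler: only two workers respond.
for finished in combinations(tasks, 2):
    results = {w: matvec(tasks[w], x) for w in finished}
    assert decode(results) == expected
```

Each worker here handles half of the rows of A, so the cluster performs 1.5 times the uncoded work in exchange for tolerating one slow node. In a heterogeneous setting the per-worker loads would differ by machine capability, which this sketch does not model.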
