Authors: Amirhossein Reisizadeh, Saurav Prakash, Ramtin Pedarsani, Salman Avestimehr
DOI: 10.1109/ISIT.2017.8006961
Keywords:
Abstract: In large-scale distributed computing clusters, such as Amazon EC2, there are several types of “system noise” that can result in major degradation of performance: system failures, bottlenecks due to limited communication bandwidth, latency due to straggler nodes, etc. On the other hand, these systems enjoy an abundance of redundancy: a vast number of computing nodes and large storage capacity. There have been recent results that demonstrate the impact of coding for efficient utilization of computation and storage redundancy to alleviate the effect of stragglers and communication bottlenecks in homogeneous clusters. In this paper, we focus on general heterogeneous distributed computing clusters consisting of a variety of computing machines with different capabilities. We propose a coding framework for speeding up distributed computing in heterogeneous clusters with straggling servers by trading redundancy for reducing the latency of computation. In particular, we propose the Heterogeneous Coded Matrix Multiplication (HCMM) algorithm for performing distributed matrix multiplication over heterogeneous clusters, which is provably asymptotically optimal. Moreover, if the number of worker nodes in the cluster is n, we show that HCMM is Θ(log n) times faster than any uncoded scheme. We further provide numerical results demonstrating significant speedups of up to 49% and 34% for HCMM in comparison to the “uncoded” and “homogeneous coded” schemes, respectively.