Timely-Throughput Optimal Coded Computing over Cloud Networks

作者: Chien-Sheng Yang , Ramtin Pedarsani , A. Salman Avestimehr

DOI: 10.1145/3323679.3326528

关键词: Cloud computingMarkov modelRobustness (computer science)Data encodingComputer scienceComputationMarkov chainHigh variabilityDistributed computing

摘要: In modern distributed computing systems, unpredictable and unreliable infrastructures result in high variability of resources. Meanwhile, there is significantly increasing demand for timely event-driven services with deadline constraints. Motivated by measurements over Amazon EC2 clusters, we consider a two-state Markov model speed cloud networks. this model, each worker can be either good state or bad terms the computation speed, transition between these states modeled as chain which unknown to scheduler. We then Coded Computing framework, data possibly encoded stored at nodes order provide robustness against that may state. With requests submitted system deadlines, our goal design optimal computation-load allocation scheme encoding maximize throughput (i.e, average number tasks are accomplished before their deadline). Our main development dynamic strategy called Lagrange Estimate-and-Allocate (LEA) strategy, achieves throughput. It shown compared static LEA improves 1.4x ~ 17.5x various scenarios via simulations 1.27x 6.5x experiments clusters.

参考文章(31)
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, Ion Stoica, Randy Katz, Improving MapReduce performance in heterogeneous environments operating systems design and implementation. pp. 29- 42 ,(2008) , 10.5555/1855741.1855744
Scott Shenker, Ali Ghodsi, Ganesh Ananthanarayanan, Ion Stoica, Effective straggler mitigation: attack of the clones networked systems design and implementation. pp. 185- 198 ,(2013)
François Baccelli, William A. Massey, Don Towsley, Acyclic fork-join queuing networks Journal of the ACM. ,vol. 36, pp. 615- 642 ,(1989) , 10.1145/65950.65957
Lisandro D. Dalcin, Rodrigo R. Paz, Pablo A. Kler, Alejandro Cosimo, Parallel distributed computing using Python Advances in Water Resources. ,vol. 34, pp. 1124- 1139 ,(2011) , 10.1016/J.ADVWATRES.2011.04.013
Siva Theja Maguluri, R. Srikant, Lei Ying, Stochastic models of load balancing and scheduling in cloud computing clusters international conference on computer communications. pp. 702- 710 ,(2012) , 10.1109/INFCOM.2012.6195815
Sina Lashgari, A. Salman Avestimehr, Timely Throughput of Heterogeneous Wireless Networks: Fundamental Limits and Algorithms IEEE Transactions on Information Theory. ,vol. 59, pp. 8414- 8433 ,(2013) , 10.1109/TIT.2013.2283492
E. Modiano, M.J. Neely, C.E. Rohrs, Dynamic power allocation and routing for time-varying wireless networks IEEE Journal on Selected Areas in Communications. ,vol. 23, pp. 89- 103 ,(2005) , 10.1109/JSAC.2004.837349(410)
A. Eryilmaz, R. Srikant, J.R. Perkins, Stable scheduling policies for fading wireless channels IEEE ACM Transactions on Networking. ,vol. 13, pp. 411- 424 ,(2005) , 10.1109/TNET.2004.842226
J. G. Dai, Wuqin Lin, Maximum Pressure Policies in Stochastic Processing Networks Operations Research. ,vol. 53, pp. 197- 218 ,(2005) , 10.1287/OPRE.1040.0170
L. Tassiulas, A. Ephremides, Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks IEEE Transactions on Automatic Control. ,vol. 37, pp. 1936- 1948 ,(1992) , 10.1109/9.182479