Building a Wide-Area File Transfer Performance Predictor: An Empirical Study

Authors: Zhengchun Liu, Rajkumar Kettimuthu, Prasanna Balaprakash, Nageswara S. V. Rao, Ian Foster

DOI: 10.1007/978-3-030-19945-6_5

Abstract: Wide-area data transfer is central to geographically distributed scientific workflows. Faster delivery of data is important for these workflows. Predictability is equally (or even more) important. With the goal of providing a reasonably accurate estimate of data transfer time to improve resource allocation and scheduling for workflows and enable end-to-end data transfer optimization, we apply machine learning methods to develop predictive models for data transfer times over a variety of wide area networks. To build and evaluate these models, we use 201,388 transfers, involving 759 million files totaling 9 PB transferred, over 115 heavily used source-destination pairs (“edges”) between 135 unique endpoints. We evaluate the models for different retraining frequencies and different window sizes of history data. In the best case, the resulting models have a median prediction error of ≤21% for 50% of the edges, and ≤32% for 75% of the edges. We present a detailed analysis of these results that provides insights into the cause of some of the high errors. We envision that the performance predictor will be informative for scheduling geo-distributed workflows. The insights also suggest obvious directions for both further analysis and transfer service optimization.
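The headline figures in the abstract (a median prediction error within some bound for a given share of edges) can be read as a two-level summary: one median error per edge, then the fraction of edges whose median falls under a threshold. The following is a minimal sketch of that reading, not the authors' code. It assumes the error is the absolute percentage error of predicted versus actual transfer time, and the function names, edge labels, and sample numbers are all hypothetical.

```python
from statistics import median


def edge_median_errors(records):
    """Return {edge: median absolute percentage error}.

    `records` is an iterable of (edge, actual_seconds, predicted_seconds).
    Absolute percentage error is an assumed metric, used for illustration.
    """
    per_edge = {}
    for edge, actual, predicted in records:
        ape = 100.0 * abs(predicted - actual) / actual
        per_edge.setdefault(edge, []).append(ape)
    return {edge: median(errs) for edge, errs in per_edge.items()}


def fraction_of_edges_within(edge_errors, threshold_pct):
    """Fraction of edges whose median error is at most `threshold_pct`."""
    within = sum(1 for err in edge_errors.values() if err <= threshold_pct)
    return within / len(edge_errors)


# Hypothetical transfers on two made-up edges (not data from the paper).
records = [
    ("A->B", 100.0, 110.0),
    ("A->B", 200.0, 180.0),
    ("A->B", 50.0, 65.0),
    ("C->D", 400.0, 600.0),
    ("C->D", 300.0, 330.0),
]

errors = edge_median_errors(records)
share_21 = fraction_of_edges_within(errors, 21.0)
share_32 = fraction_of_edges_within(errors, 32.0)
print(errors, share_21, share_32)
```

Under this reading, a statement such as "≤21% for 50% of the edges" corresponds to `fraction_of_edges_within(errors, 21.0)` being at least 0.5.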
