Authors: Zhengchun Liu, Rajkumar Kettimuthu, Prasanna Balaprakash, Nageswara S. V. Rao, Ian Foster
DOI: 10.1007/978-3-030-19945-6_5
Keywords:
Abstract: Wide-area data transfer is central to geographically distributed scientific workflows. Faster delivery of data is important for these workflows. Predictability is equally (or even more) important. Our goal is to provide a reasonably accurate estimate of data transfer time, so that workflows can improve resource allocation & scheduling and support end-to-end data transfer optimization. To that end, we apply machine learning methods to develop predictive models for data transfer times over a variety of wide-area networks. To build and evaluate these models, we use 201,388 transfers, involving 759 million files totaling 9 PB transferred, over 115 heavily used source-destination pairs ("edges") between 135 unique endpoints. We evaluate the models for different retraining frequencies and different window sizes of history data. In the best case, the resulting models have a median prediction error of \(\le\)21% for 50% of the edges, and \(\le\)32% for 75% of the edges. We present a detailed analysis of these results that provides insights into the cause of some of the high errors. We envision that the performance predictor will be informative for scheduling geo-distributed workflows. The results also suggest obvious directions for both further analysis and transfer service optimization.
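The headline result is a two-level statistic. First, a median error is taken over each edge's transfers. Then one asks what fraction of edges falls under a threshold. The sketch below makes that aggregation concrete, under these assumptions:

- "Prediction error" is taken to mean absolute relative error, |predicted − actual| / actual. The abstract does not define the metric.
- The function names and the four edges with their transfer times are hypothetical toy data. They are not from the study.

```python
import statistics


def relative_error(predicted: float, actual: float) -> float:
    """Absolute relative error |predicted - actual| / actual (assumed metric)."""
    return abs(predicted - actual) / actual


def median_error_per_edge(transfers):
    """Group (edge, predicted_s, actual_s) records by edge; return each edge's median error."""
    errors = {}
    for edge, predicted, actual in transfers:
        errors.setdefault(edge, []).append(relative_error(predicted, actual))
    return {edge: statistics.median(errs) for edge, errs in errors.items()}


def fraction_of_edges_within(medians, threshold):
    """Fraction of edges whose median error is <= threshold."""
    return sum(1 for m in medians.values() if m <= threshold) / len(medians)


# Hypothetical toy data: (edge, predicted seconds, actual seconds).
transfers = [
    ("A->B", 105, 100), ("A->B", 95, 100), ("A->B", 130, 100),
    ("A->C", 150, 100), ("A->C", 70, 100), ("A->C", 125, 100),
    ("B->C", 110, 100), ("B->C", 88, 100),
    ("C->D", 200, 100),
]

medians = median_error_per_edge(transfers)
print(medians)
print(fraction_of_edges_within(medians, 0.21))  # share of edges with median error <= 21%
print(fraction_of_edges_within(medians, 0.32))  # share of edges with median error <= 32%
```

Taking the median per edge keeps a few badly mispredicted transfers, such as the 30% miss on `A->B`, from dominating an edge's score. The abstract's "high errors" would then show up as edges whose median itself is large, like `C->D` here.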