Using Machine Learning Ensemble Methods to Predict Execution Time of e-Science Workflows in Heterogeneous Distributed Systems

Authors: Farrukh Nadeem, Daniyal Alghazzawi, Abdulfattah Mashat, Khalid Faqeeh, Abdullah Almalaise

DOI: 10.1109/ACCESS.2019.2899985

Keywords: Task analysis, Computer science, e-Science, Workflow, Structure (mathematical logic), Distributed computing, Ensemble learning, Grid, Cloud computing

Abstract: Effective planning and optimized execution of e-Science workflows in distributed systems, such as the Grid, need predictions of the execution times of the workflows. However, predicting execution times in heterogeneous distributed systems is a challenging job due to the complex structure of workflows, variations in input problem sizes, and the heterogeneous and dynamic nature of shared resources. To this end, we propose two novel workflow execution time-prediction methods based on machine learning ensemble models. In this paper, we showcase our approach for different real Grid environments. Our approach can effectively predict the execution time of scientific workflow applications for various problem sizes, Grid sites, and runtime environments. We characterized workflow performance using attributes that define the workflow structure as well as the execution environment. Contrary to common ensembles, we employed three strong learners, which balance the weaknesses of each other by their strengths to model workflow execution times. The proposed methods have been thoroughly evaluated on real-world e-Science workflow applications. The experimental results demonstrated that the multi-model ensemble models significantly decrease the prediction error (by 50%, on average) compared with methods based on a radial basis function neural network, local learning, and execution templates. The proposed methods can also be applied with similar effectiveness, without any major modification, to other heterogeneous distributed environments, such as the Cloud.
