Fault tolerance in cloud computing environment: A systematic survey

Authors: Moin Hasan, Major Singh Goraya

DOI: 10.1016/J.COMPIND.2018.03.027

Abstract: Fault tolerance is among the most critical issues in the cloud for delivering reliable services. It is difficult to implement because of the dynamic service infrastructure, complex configurations and various interdependencies that exist in the cloud. Extensive research efforts are consistently being made to implement fault tolerance in the cloud. Implementing a fault tolerance policy requires not only specific knowledge of its application domain but also a comprehensive analysis of the background and the prevalent techniques. Some recent surveys try to assimilate the fault tolerance architectures and approaches proposed for the cloud environment, but they seem limited in some respects. This paper gives a systematic elucidation of different fault types, their causes and the fault tolerance approaches used in the cloud. It presents a broad survey of fault tolerance frameworks in terms of their basic approaches, fault applicability and other key features, and includes a comparative analysis of the surveyed frameworks. For the first time, on the basis of an analysis of the cited frameworks as well as recently published prime surveys, a quantified view of the applicability of fault tolerance approaches is presented. It is observed that the frameworks are primarily checkpoint-restart and replication oriented, and that they mainly target crash faults.
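The abstract's main finding is that most surveyed frameworks rely on checkpoint-restart or replication and target crash faults. The following is not taken from the paper; it is a minimal Python sketch, assuming a toy summation task and hypothetical names (`CrashFault`, `run_with_checkpoints`, `run_replicated`, `crashed_replica`, `healthy_replica`), of why both techniques fit crash (fail-stop) faults: a crashed worker simply stops, so rolling back to saved state or handing the task to another copy still yields the correct result.

```python
import random
from typing import Callable, Sequence, Tuple


class CrashFault(Exception):
    """Hypothetical simulated fail-stop (crash) fault: the worker halts and loses volatile state."""


def run_with_checkpoints(n_steps: int, crash_prob: float, rng: random.Random,
                         checkpoint_every: int = 10) -> Tuple[int, int]:
    """Checkpoint-restart sketch: compute sum(range(n_steps)), saving (step, acc)
    every `checkpoint_every` steps and resuming from the last checkpoint
    after each simulated crash. Returns (result, number_of_restarts)."""
    checkpoint = {"step": 0, "acc": 0}  # stands in for stable storage
    restarts = 0
    while True:
        step, acc = checkpoint["step"], checkpoint["acc"]  # roll back to last checkpoint
        try:
            while step < n_steps:
                if rng.random() < crash_prob:
                    raise CrashFault(f"crash at step {step}")
                acc += step
                step += 1
                if step % checkpoint_every == 0:
                    checkpoint = {"step": step, "acc": acc}  # persist progress
            return acc, restarts
        except CrashFault:
            restarts += 1  # work done since the last checkpoint is lost and redone


def run_replicated(task: Callable[[], int],
                   replicas: Sequence[Callable[[Callable[[], int]], int]]) -> int:
    """Replication sketch (sequential failover): hand the same task to each
    replica in turn; under crash faults, the first replica that completes
    returns a correct result."""
    for replica in replicas:
        try:
            return replica(task)
        except CrashFault:
            continue  # this replica crashed; try the next one
    raise RuntimeError(f"all {len(replicas)} replicas crashed")


def crashed_replica(task: Callable[[], int]) -> int:
    """Hypothetical replica that always crashes."""
    raise CrashFault("replica down")


def healthy_replica(task: Callable[[], int]) -> int:
    """Hypothetical replica that runs the task normally."""
    return task()


if __name__ == "__main__":
    total, restarts = run_with_checkpoints(100, 0.05, random.Random(42))
    print(total, restarts)  # total is 4950 for any seed; restarts varies with the seed
    print(run_replicated(lambda: sum(range(100)), [crashed_replica, healthy_replica]))  # 4950
```

For simplicity the replication sketch tries replicas one after another; active replication as usually described runs the copies concurrently. Neither sketch handles value (Byzantine) faults, where a component returns a wrong answer instead of stopping; those typically call for voting across replicas.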

References (91)
R. Goel, G. M. Shroff. Transparent parallel replication of logically partitioned databases. IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 132-137, 1996. DOI: 10.1109/HIPC.1996.565811
Jing Liu, Jiantao Zhou, Rajkumar Buyya. Software Rejuvenation Based Fault Tolerance Scheme for Cloud Applications. IEEE International Conference on Cloud Computing (CLOUD), pp. 1115-1118, 2015. DOI: 10.1109/CLOUD.2015.164
Ming Zhao, Francois D'Ugard, Kevin A. Kwiat, Charles A. Kamhoua. Multi-level VM replication based survivability for mission-critical cloud computing. IFIP/IEEE International Symposium on Integrated Network Management (IM), pp. 1351-1356, 2015. DOI: 10.1109/INM.2015.7140494
Graham E. Fagg, Jack J. Dongarra. FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World. European PVM/MPI Users' Group Meeting (Recent Advances in Parallel Virtual Machine and Message Passing Interface), pp. 346-353, 2000. DOI: 10.1007/3-540-45255-9_47
Mehdi Nazari Cheraghlou, Ahmad Khadem-Zadeh, Majid Haghparast. A survey of fault tolerance architecture in cloud computing. Journal of Network and Computer Applications, vol. 61, pp. 81-92, 2016. DOI: 10.1016/J.JNCA.2015.10.004
Jiajun Cao, Matthieu Simonin, Gene Cooperman, Christine Morin. Checkpointing as a service in heterogeneous cloud environments. IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 61-70, 2015. DOI: 10.1109/CCGRID.2015.160
Liming Chen, A. Avizienis. N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation. IEEE International Symposium on Fault-Tolerant Computing, pp. 113-, 1995. DOI: 10.1109/FTCSH.1995.532621
Waseem Ahmed, Yong Wei Wu. A survey on reliability in distributed systems. Journal of Computer and System Sciences, vol. 79, pp. 1243-1255, 2013. DOI: 10.1016/J.JCSS.2013.02.006
Bogdan Nicolae, Gabriel Antoniu, Luc Bougé, Diana Moise, Alexandra Carpen-Amarie. BlobSeer: Next-generation data management for large scale infrastructures. Journal of Parallel and Distributed Computing, vol. 71, pp. 169-184, 2011. DOI: 10.1016/J.JPDC.2010.08.004
Deepak Poola, Kotagiri Ramamohanarao, Rajkumar Buyya. Fault-tolerant Workflow Scheduling using Spot Instances on Clouds. Procedia Computer Science, vol. 29, pp. 523-533, 2014. DOI: 10.1016/J.PROCS.2014.05.047