Checkpointing Algorithms for Fault-Tolerant Execution of Large-Scale Distributed Applications in Cloud

Authors: Priti Kumari, Parmeet Kaur

DOI: 10.1007/S11277-020-07949-0

Keywords:

Abstract: Cloud computing provides infinite resources and a suitable environment for the execution of large-scale applications. However, it is also susceptible to frequent failures, which can adversely affect users as well as service providers. Therefore, fault tolerance techniques are necessary for the execution of reliable applications in the cloud. This work presents checkpointing-based protocols for the fault-tolerant execution of two types of distributed applications. The first kind is Bag of Tasks (BoT) applications, where an application comprises a set of independent tasks that do not communicate with each other during execution. Hence, an uncoordinated checkpointing algorithm is proposed for BoT applications. Subsequently, we consider applications composed of multiple tasks that depend on each other due to inter-task message passing. An uncoordinated checkpointing and message logging protocol is presented for this type of application. The protocols utilize the storage at the edge switches of the data center to reduce the bandwidth consumption for saving checkpoints and logs. Simulation results have demonstrated that the protocols provide an increased rate of successful recoveries from failures and cause lower resource overhead than contemporary related schemes.
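The uncoordinated checkpointing idea for independent tasks can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the names `CheckpointStore` and `run_task`, the fixed checkpoint interval, and the in-memory dictionary standing in for edge-switch storage are all assumptions made for illustration. Each task saves its own state without coordinating with any other task and, after a failure, resumes from its own latest checkpoint. Message logging for dependent tasks is not covered here.

```python
import pickle
from dataclasses import dataclass, field


@dataclass
class CheckpointStore:
    """Hypothetical stand-in for checkpoint storage (e.g. at an edge switch).

    Here it is just an in-memory dict keyed by task id.
    """
    saved: dict = field(default_factory=dict)

    def save(self, task_id, state):
        # Serialize so the stored copy is independent of the live state.
        self.saved[task_id] = pickle.dumps(state)

    def load(self, task_id):
        blob = self.saved.get(task_id)
        return pickle.loads(blob) if blob is not None else None


def run_task(task_id, n_steps, store, interval, fail_at=None):
    """Toy task: sum the integers 0..n_steps-1, one per step.

    The task checkpoints its own state every `interval` steps, with no
    coordination with other tasks. On start it resumes from its latest
    checkpoint if one exists. `fail_at` simulates a crash at that step.
    """
    state = store.load(task_id) or {"next_step": 0, "total": 0}
    step = state["next_step"]
    while step < n_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError(f"simulated failure of task {task_id} at step {step}")
        state["total"] += step
        step += 1
        state["next_step"] = step
        if step % interval == 0:
            store.save(task_id, state)
    return state["total"]
```

In this sketch a task that fails is simply restarted with the same `task_id`; work done after its last checkpoint is repeated, while other tasks in the bag are unaffected because they share no state or messages.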

References (27)
Awadhesh Kumar Singh, Parmeet Kaur. Log Based Recovery with Low Overhead for Large Mobile Computing Systems. International Conference on Advances in Communication, Network, and Computing, pp. 637-642 (2011). DOI: 10.1007/978-3-642-19542-6_125
Ajay D. Kshemkalyani, Mukesh Singhal. Distributed Computing: Principles, Algorithms, and Systems. Cambridge University Press (2008). DOI: 10.1017/CBO9780511805318
Rajkumar Buyya, Chee Shin Yeo, Srikumar Venugopal, James Broberg, Ivona Brandic. Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems, vol. 25, pp. 599-616 (2009). DOI: 10.1016/J.FUTURE.2008.12.001
Fabricio AB da Silva, Hermes Senger. Scalability limits of Bag-of-Tasks applications running on hierarchical platforms. Journal of Parallel and Distributed Computing, vol. 71, pp. 788-801 (2011). DOI: 10.1016/J.JPDC.2011.01.002
Parmeet Kaur Jaggi, Awadhesh Kumar Singh. Movement-Based Checkpointing and Message Logging for Recovery in MANETs. Wireless Personal Communications, vol. 83, pp. 1971-1993 (2015). DOI: 10.1007/S11277-015-2498-8
Inigo Goiri, Ferran Julia, Jordi Guitart, Jordi Torres. Checkpoint-based fault-tolerant infrastructure for virtualized service providers. Network Operations and Management Symposium, pp. 455-462 (2010). DOI: 10.1109/NOMS.2010.5488493
Dong Liu. A fault-tolerant architecture for ROIA in cloud. Journal of Ambient Intelligence and Humanized Computing, vol. 6, pp. 587-595 (2015). DOI: 10.1007/S12652-014-0220-4
Nosayba El-Sayed, Bianca Schroeder. Understanding Practical Tradeoffs in HPC Checkpoint-Scheduling Policies. IEEE Transactions on Dependable and Secure Computing, vol. 15, pp. 336-350 (2018). DOI: 10.1109/TDSC.2016.2548463
Santosh Kumar, RH Goudar. Cloud Computing – Research Issues, Challenges, Architecture, Platforms and Applications: A Survey. International Journal of Future Computer and Communication, pp. 356-360 (2012). DOI: 10.7763/IJFCC.2012.V1.95