PreFail: Programmable and Efficient Failure Testing Framework

Authors: Haryadi S. Gunawi, Pallavi Joshi, Koushik Sen

Abstract: As hardware failures are no longer rare in the era of cloud computing, reliability has become a first-class design goal of today’s software systems. To ensure that software’s fault tolerance “prevails” against failures, systems have to be tested against multiple, diverse failures that are likely to occur in the real world. Such failure testing poses several challenges, including the need to explore a large number of failure combinations and, by implication, the need to debug the test runs that fail during testing. In this paper, we present PREFAIL, a programmable and efficient failure testing framework. With PREFAIL, a tester can express a variety of failure exploration policies, skip redundant fault-injection tests, run tests in parallel, and reduce the time needed to debug failed test runs.
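
The combinatorial challenge mentioned in the abstract, and the idea of a tester-written exploration policy, can be illustrated with a minimal sketch. This is not PREFAIL's actual API: the failure points, the function names (`all_failure_sets`, `distinct_nodes_policy`, `explore`), and the policy itself are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical failure points: (node, I/O operation) pairs where a fault
# could be injected. These names are illustrative, not taken from the paper.
failure_points = [
    ("node1", "write"),
    ("node1", "read"),
    ("node2", "write"),
    ("node3", "write"),
]


def all_failure_sets(points, max_failures):
    """Enumerate every combination of 1..max_failures failure points."""
    for k in range(1, max_failures + 1):
        yield from combinations(points, k)


def distinct_nodes_policy(failure_set):
    """Hypothetical policy: keep only sets whose failures hit distinct nodes."""
    nodes = [node for node, _ in failure_set]
    return len(nodes) == len(set(nodes))


def explore(points, max_failures, policy):
    """Return the failure sets that the given policy selects for testing."""
    return [fs for fs in all_failure_sets(points, max_failures) if policy(fs)]


unpruned = list(all_failure_sets(failure_points, 2))
pruned = explore(failure_points, 2, distinct_nodes_policy)
print(len(unpruned), len(pruned))
```

Even in this toy setting the number of candidate failure sets grows quickly with the number of failure points and the number of simultaneous failures, which is why a policy that filters the space before any test is run can save testing time.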
