FailViz: A Tool for Visualizing Fault Injection Experiments in Distributed Systems

Authors: Domenico Cotroneo, Luigi De Simone, Pietro Liguori, Roberto Natella, Nematollah Bidokhti

DOI: 10.1109/EDCC.2019.00036

Keywords: Context (computing), Visualization, Anomaly detection, Task (project management), Fault injection, Point (geometry), Real-time computing, Computer science

Abstract: The analysis of fault injection experiments can be a cumbersome task. These experiments generate large volumes of data (e.g., message traces), which human analysts need to inspect in order to understand the behavior of the system under failure. This paper introduces FailViz, a tool for visualizing fault injection experiments that points out the events relevant to interpreting failures. We also present a motivating example in the context of OpenStack, and point out future research directions.