Fine-Grained Characterization of Faults Causing Long Latency Crashes in Programs

Authors: Guanpeng Li, Qining Lu, Karthik Pattabiraman

DOI: 10.1109/DSN.2015.36

Keywords: Long latency, Program code, Real-time computing, Static analysis, Fault injection, Software fault tolerance, Latency (engineering), Computer science, Crash, Software

Abstract: As the rate of transient hardware faults increases, researchers have investigated software techniques to tolerate these faults. An important class of faults are those that cause long-latency crashes (LLCs), that is, faults that can persist for a long time in the program before causing it to crash. In this paper, we develop a technique to automatically find the program locations where LLC-causing faults originate, so that these locations can be protected to bound the program's crash latency. We first identify, through an empirical study, the program code patterns responsible for the majority of LLC-causing faults. We then build CRASHFINDER, a tool that finds LLC locations by statically searching the program for these patterns, and then refining the static analysis results with a dynamic analysis and a selective fault injection-based approach. CRASHFINDER achieves an average reduction of 9.29 orders of magnitude in the time needed to identify more than 90% of the LLC-causing locations in a program, compared to exhaustive fault injection techniques, and has no false positives.
