Improving the reliability of commodity operating systems

Authors: Michael M. Swift, Brian N. Bershad, Henry M. Levy

DOI: 10.1145/1165389.945466

Abstract: Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example, drivers account for 85% of recently reported failures. This paper describes Nooks, a reliability subsystem that seeks to greatly enhance OS reliability by isolating the OS from driver failures. The Nooks approach is practical: rather than guaranteeing complete fault tolerance through a new (and incompatible) OS or driver architecture, our goal is to prevent the vast majority of driver-caused crashes with little or no change to existing driver and system code. To achieve this, Nooks isolates drivers within lightweight protection domains inside the kernel address space, where hardware and software prevent them from corrupting the kernel. Nooks also tracks a driver's use of kernel resources to hasten automatic clean-up during recovery. To prove the viability of our approach, we implemented Nooks in the Linux operating system and used it to fault-isolate several device drivers. Our results show that Nooks offers a substantial increase in the reliability of operating systems, catching and quickly recovering from many faults that would otherwise crash the system. In a series of 2000 fault-injection tests, Nooks recovered automatically from 99% of the faults that caused Linux to crash. While Nooks was designed for drivers, our techniques generalize to other kernel extensions, as well. We demonstrate this by isolating a kernel-mode file system and an in-kernel Internet service. Overall, because Nooks supports existing C-language extensions, runs on a commodity operating system and hardware, and enables automated recovery, it represents a substantial step beyond the specialized architectures and type-safe languages required by previous efforts directed at safe extensibility.
