A recovery-oriented approach to dependable services: repairing past errors with system-wide undo

Author: Aaron Brown

DOI: 10.21236/ADA603322

Abstract: Motivated by the pressing need for increased dependability in corporate and Internet services, and by the perspective that effective recovery can improve dependability as much as or more than avoiding failures, we introduce a novel recovery mechanism that gives human system operators the power of system-wide undo. System-wide undo allows operators to roll back erroneous changes to a service's state without losing end-user data updates, to make retroactive repairs in the historical timeline of the service system, and thereby to quickly recover from catastrophic state corruption, operator error, failed upgrades, and external attacks, even when the root cause of the catastrophe is unknown. We explore system-wide undo via a framework based on the concept of spheres of undo, bubbles of state and time that provide scope to the state recoverable by undo and serve as a structuring tool for implementing undo on standalone services, hierarchically-composed systems, and distributed interacting services. Crucially, spheres of undo allow us to define the concept of paradoxes, inconsistencies that occur when an undo process retroactively alters state that has been exposed outside its containing sphere of undo. Managing paradoxes is the grand challenge of system-wide undo, and we tackle it with a framework that automatically detects and compensates for paradoxes; our approach exploits the relaxed consistency semantics already present in existing services that interact with end-users. We describe an implementation of this framework. We explore the applicability of system-wide undo by assembling and evaluating a prototype undoable e-mail store service, analyzing what would be necessary to construct an undoable online auction service, and developing a set of guidelines to help service designers retrofit undo to their services. We find that undo functionality imposes non-negligible but tolerable overhead in terms of both time and space. Using a methodology we develop to benchmark human-assisted recovery processes, we also find that undo-based recovery has a net positive effect on dependability, providing significant improvements in correctness while only slightly degrading availability.
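The rollback-without-data-loss idea and the notion of a paradox can be illustrated with a small sketch. It is not the implementation the abstract refers to: the `UndoableStore` class, its method names, and the rewind-then-replay strategy are all assumptions made for illustration, and the "service" is reduced to an in-memory key-value map. The sketch restores a checkpoint, replays the logged end-user writes, and flags any logged read whose originally observed value no longer matches the repaired timeline.

```python
class UndoableStore:
    """Toy key-value service with a checkpoint and an end-user operation log.

    Hypothetical illustration only; names and structure are assumptions,
    not taken from the work described in the abstract.
    """

    def __init__(self):
        self.state = {}
        self._checkpoint = {}
        self._log = []  # entries: (kind, key, value written or value observed)

    def checkpoint(self):
        # Snapshot the state and start a fresh log of end-user operations.
        self._checkpoint = dict(self.state)
        self._log.clear()

    def user_write(self, key, value):
        self._log.append(("write", key, value))
        self.state[key] = value

    def user_read(self, key):
        # A read exposes state to the outside world, so record what was seen.
        observed = self.state.get(key)
        self._log.append(("read", key, observed))
        return observed

    def operator_change(self, change):
        # Operator actions are not logged, so undo discards them.
        change(self.state)

    def undo(self, repair=None):
        # Rewind to the checkpoint, optionally repair, then replay user ops.
        self.state = dict(self._checkpoint)
        if repair is not None:
            repair(self.state)
        paradoxes = []
        for kind, key, value in self._log:
            if kind == "write":
                self.state[key] = value
            elif self.state.get(key) != value:
                # The user saw a value that the repaired timeline never had.
                paradoxes.append((key, value, self.state.get(key)))
        return paradoxes


store = UndoableStore()
store.user_write("inbox:alice", "msg1")
store.checkpoint()
# A faulty operator action corrupts Alice's mailbox.
store.operator_change(lambda s: s.update({"inbox:alice": "CORRUPT"}))
store.user_read("inbox:alice")           # Alice observes the corrupted value
store.user_write("inbox:bob", "msg2")    # an end-user update that must survive
paradoxes = store.undo()
print(store.state)
print(paradoxes)
```

In this toy run the operator's change is rolled back, Bob's later update is preserved by replay, and Alice's read is reported as a paradox because what she saw differs from the repaired state. A real service would then need to compensate for that paradox, for example by notifying the user, which this sketch leaves out.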
