Virtual lockstep for fault tolerance and architectural vulnerability analysis

Authors: Renato Figueiredo, Casey M. Jeffery

DOI:

Keywords:

Abstract: This dissertation presents a flexible technique that can be applied to commodity many-core architectures to exploit idle resources and ensure reliable system operation. The proposed technique uses a hypervisor to interpose a dynamically adaptable fault tolerance layer between the hardware and the operating system. It avoids introducing a new single point of failure by incorporating the hypervisor into the sphere of replication. This approach greatly simplifies implementation compared with specialized hardware-, system-, or application-based techniques. It also offers significant flexibility in the type and degree of protection provided. The levels of protection considered range from duplex replication to arbitrary n-modular replication, limited only by the number of processors in the system. The feasibility of the approach is considered for both near- and long-term computing platforms. A prototype is developed as a proof of concept and used to estimate the performance overhead and to gather empirical data on its fault tolerance capabilities. A fault detection latency reduction technique is also analyzed using the fault injection facilities provided by the prototype.
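The range of protection levels, from duplex replication to n-modular replication, rests on a standard property of output voting. Two replicas can only detect a disagreement, while three or more can outvote a single faulty replica. The following is a minimal Python sketch of that voting step. It assumes replica outputs can be compared as hashable values at a synchronization point. The names `vote` and `run_replicated` and the single-bit-flip fault model are hypothetical illustrations, not the dissertation's hypervisor-based implementation.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence, Tuple


def vote(outputs: Sequence[Hashable]) -> Tuple[str, Optional[Hashable]]:
    """Compare the outputs of n replicas at a synchronization point.

    Returns ("ok", value) if all replicas agree, ("corrected", value) if a
    strict majority agrees, and ("detected", None) if a divergence is seen
    but no strict majority exists (always the case for a duplex pair).
    """
    n = len(outputs)
    if n < 2:
        raise ValueError("replication needs at least two replicas")
    value, votes = Counter(outputs).most_common(1)[0]
    if votes == n:
        return "ok", value
    if votes > n // 2:
        # A strict majority masks the minority (faulty) replicas.
        return "corrected", value
    # Divergence detected, but it cannot be resolved by voting.
    return "detected", None


def run_replicated(
    step: Callable[[int], int],
    state: int,
    n: int,
    faulty: Optional[int] = None,
    flip_bit: int = 0,
) -> Tuple[str, Optional[Hashable]]:
    """Run a deterministic `step` on n replicas and vote on the results.

    If `faulty` is a replica index, that replica's result has one bit
    flipped to emulate a hypothetical injected single-bit upset.
    """
    outputs = []
    for i in range(n):
        out = step(state)
        if i == faulty:
            out ^= 1 << flip_bit  # hypothetical injected fault
        outputs.append(out)
    return vote(outputs)


if __name__ == "__main__":
    step = lambda x: x * 3 + 1
    print(run_replicated(step, 10, n=2, faulty=1))  # duplex: detect only
    print(run_replicated(step, 10, n=3, faulty=0))  # TMR: majority corrects
```

With `n=2` and one corrupted replica, the sketch can only report a detected divergence. With `n=3`, the two agreeing replicas outvote the faulty one. Raising `n` further tolerates more simultaneous faults, bounded by the number of available processors.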
