Fault-tolerant computer system with online recovery and reintegration of redundant components

Authors: Randall G. Banton, Kenneth C. Debacker, Tom Bereiter, Douglas E. Jewett, Nikhil A. Mehta

DOI:

Keywords:

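Abstract: A computer system in a fault-tolerant configuration employs multiple identical CPUs executing the same instruction stream, with multiple, identical memory modules in the address space of the CPUs storing duplicates of the same data. The system detects faults in the CPUs and memory modules, and places a faulty unit offline while continuing to operate using the good units. The faulty unit can be replaced and reintegrated into the system without shutdown. The multiple CPUs are loosely synchronized, as by detecting events such as memory references and stalling any CPU ahead of the others until all execute the function simultaneously; interrupts can be synchronized by ensuring that all CPUs implement the interrupt at the same point in their instruction stream. Memory references via the separate CPU-to-memory busses are voted at the three separate ports of each of the memory modules. I/O functions are implemented using two identical I/O busses, each of which is separately coupled to only one of the memory modules. A number of I/O processors are coupled to both I/O busses. I/O devices are accessed through a pair of identical (redundant) I/O processors, but only one is designated to actively control a given device; in case of failure of one I/O processor, however, an I/O device can be accessed by the other one without system shutdown.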
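
The voting of memory references described in the abstract can be illustrated with a minimal software sketch. It assumes three CPUs each present one request to a memory-module port and that a strict majority decides which request is performed. The patent describes voting at the memory-module ports, not in software; the function name `vote_requests` and the tuple layout of a request are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def vote_requests(
    requests: Sequence[Hashable],
) -> Tuple[Optional[Hashable], List[int]]:
    """Majority-vote a set of redundant requests (hypothetical helper).

    Returns (winning request, indices of units that disagreed).
    If no strict majority exists, returns (None, all indices).
    """
    # Most frequent request and how many units issued it.
    value, count = Counter(requests).most_common(1)[0]
    # Require a strict majority (e.g. 2 of 3).
    if count * 2 <= len(requests):
        return None, list(range(len(requests)))
    # Units whose request differs from the majority are fault suspects.
    dissenters = [i for i, r in enumerate(requests) if r != value]
    return value, dissenters


# Assumed request layout: (operation, address, data).
good = ("write", 0x10, 7)
bad = ("write", 0x10, 9)
winner, suspects = vote_requests([good, good, bad])
```

In this sketch, `suspects` lists the units that a supervising layer might place offline while the remaining units continue to operate, which mirrors the behavior the abstract describes only loosely.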