Characterization of failure handling in fault-tolerant multiprocessor systems

Author: Yann-Hang Lee

DOI:

Keywords:

Abstract: Traditional reliability-related models for fault-tolerant systems are used to predict system reliability, availability, computation capacity, or performability. They lack the capacity to treat in detail the handling and consequences of failure. Also, there is insufficient attention paid to the fact that a system crash could follow any mishandling of a failure. Failure handling consists of three major steps: error detection, reconfiguration, and recovery. These steps must be considered together as a single package, not as separate entities as in traditional analyses. Such an integration can be extended to develop design aids for fault-tolerant computers. The dissertation begins with the modeling of fault/error detection mechanisms, which are designed to identify faulty units. When fault latency and/or error latency exist, the system may suffer from the propagation of errors and the accumulation of extant faults, which will seriously reduce its fault-tolerance capability. Several models are developed so that we can study the effect of fault/error detection on subsequent failure handling and on overall system reliability. Upon detection of a faulty unit, the system should reconfigure itself into an optimal configuration so that the total reward achieved in executions is maximized. Finally, the contaminated processes have to be recovered. The recovery strategies employed depend on the redundancy available. Several recovery methods, especially retry and rollback, are analyzed. The recovery overheads are evaluated, providing an index of the capabilities of the recovery and reconfiguration mechanisms.
