Low overhead multiprocessor allocation strategies exploiting system spare capacity for fault detection and location

Authors: S. Tridandapani, A.K. Somani, U.R. Sandadi

DOI: 10.1109/12.392845


Abstract: Several schemes for detecting faults at the processor level in a multiprocessor system have been discussed in the past. One such scheme (A. Dahbura et al., 1989) works by running secondary versions of jobs on unused or spare processors and uses the comparison approach (J. Maeng and M. Malek, 1981) to detect faults. We build upon this scheme and propose three new allocation strategies that run a variable number of versions per job. These schemes permit online detection and, in many cases, location of faulty processors in the system with nominal degradation in its delay/throughput performance; these delays are limited chiefly to the delays associated with job preemptions. Two metrics, fault detection capability (FDC) and fault location capability (FLC), are introduced to evaluate the schemes. Extensive simulations are performed to obtain performance figures for the various schemes. Stochastic Petri net models are also developed to approximate the results. The results show that the schemes utilize the spare capacity more efficiently, thereby improving the fault detection and location capabilities of the system.

References (11)
Ravishankar K. Iyer, Measurement and Modeling of Computer System Failures, IFIP Congress, pp. 115-116 (1989)
Leonard Kleinrock, Queueing Systems, Volume 1: Theory, Wiley-Interscience (1975)
A.T. Dahbura, K.K. Sabnani, W.J. Hery, Spare capacity as a means of fault detection and diagnosis in multiprocessor systems, IEEE Transactions on Computers, vol. 38, pp. 881-891 (1989), DOI: 10.1109/12.24300
Breuer, Ismaeel, Roving Emulation as a Fault Detection Mechanism, IEEE Transactions on Computers, vol. 35, pp. 933-939 (1986), DOI: 10.1109/TC.1986.1676695
Miroslaw Malek, A comparison connection assignment for diagnosis of multiprocessor systems, Proceedings of the 7th Annual Symposium on Computer Architecture (ISCA '80), pp. 31-36 (1980), DOI: 10.1145/800053.801906
O.C. Ibe, K.S. Trivedi, Stochastic Petri net models of polling systems, IEEE Journal on Selected Areas in Communications, vol. 8, pp. 1649-1657 (1990), DOI: 10.1109/49.62852
G. Ciardo, J. Muppala, K. Trivedi, SPNP: stochastic Petri net package, International Workshop on Petri Nets and Performance Models, pp. 142-151 (1989), DOI: 10.1109/PNPM.1989.68548
A.K. Somani, C. Wittenbrink, R.M. Haralick, L.G. Shapiro, Jenq-Neng Hwang, Chung-Ho Chen, R. Johnson, K. Cooper, Proteus system architecture and organization, International Parallel Processing Symposium, pp. 287-294 (1991), DOI: 10.1109/IPPS.1991.153793
S. Tridandapani, A.K. Somani, Efficient utilization of spare capacity for fault detection and location in multiprocessor systems, Digest of Papers, FTCS-22: The Twenty-Second International Symposium on Fault-Tolerant Computing, pp. 440-447 (1992), DOI: 10.1109/FTCS.1992.243591
Dahbura, Sabnani, King, The Comparison Approach to Multiprocessor Fault Diagnosis, IEEE Transactions on Computers, vol. 36, pp. 373-378 (1987), DOI: 10.1109/TC.1987.1676912