Techniques to Reduce the Soft Error Rate of a High-Performance Microprocessor

Authors: Christopher Weaver, Joel Emer, Shubhendu S. Mukherjee, Steven K. Reinhardt

DOI: 10.1145/1028176.1006723

Keywords:

Abstract: Transient faults due to neutron and alpha particle strikes pose a significant obstacle to increasing processor transistor counts in future technologies. Although fault rates of individual transistors may not rise significantly, incorporating more transistors into a device makes that device more likely to encounter a fault. Hence, maintaining processor error rates at acceptable levels will require increasing design effort.

This paper proposes two simple approaches to reduce error rates and evaluates their application to a microprocessor instruction queue. The first technique reduces the time instructions sit in vulnerable storage structures by selectively squashing instructions when long delays are encountered. A fault is less likely to cause an error if the structure it affects does not contain valid instructions. We introduce a new metric, MITF (Mean Instructions To Failure), to capture the trade-off between performance and reliability introduced by this approach.

The second technique addresses false detected errors. In the absence of a fault detection mechanism, such errors would not have affected the final outcome of a program. For example, a fault affecting the result of a dynamically dead instruction would not change the final program output, but could still be flagged by the hardware as an error. To avoid signalling such false errors, we modify a pipeline's error detection logic to mark affected instructions and data as possibly incorrect rather than immediately signaling an error. Then, we signal an error only if we determine later that the possibly incorrect value could have affected the program's output.
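A minimal Python sketch of the two ideas, under simplifying assumptions. The first part computes MITF as committed instructions per error, assuming the usual formulation (IPC × frequency) / (raw error rate × AVF), where AVF is the architectural vulnerability factor of the structure; the IPC, AVF, frequency, and raw-rate values are hypothetical, not results from the paper. The second part models a program as a straight-line list of register instructions (the `Instr` class and `signals_error` function are hypothetical names) and propagates a "possibly incorrect" flag from the faulty result, signaling an error only when a flagged value reaches an output instruction. It ignores the pipeline, memory, and control flow.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


# --- Technique 1: Mean Instructions To Failure (MITF) ---

@dataclass
class Config:
    ipc: float  # committed instructions per cycle
    avf: float  # fraction of raw faults in the structure that become errors (0..1)


def mitf(cfg: Config, frequency_hz: float, raw_error_rate_per_s: float) -> float:
    """Committed instructions per error: (IPC * f) / (raw error rate * AVF)."""
    errors_per_s = raw_error_rate_per_s * cfg.avf
    if errors_per_s == 0:
        return float("inf")
    return (cfg.ipc * frequency_hz) / errors_per_s


# Hypothetical numbers: squashing on long delays costs ~2% IPC but leaves
# the instruction queue free of valid instructions more often (lower AVF).
baseline = Config(ipc=1.00, avf=0.30)
squashing = Config(ipc=0.98, avf=0.20)
ratio = mitf(squashing, 2e9, 1e-9) / mitf(baseline, 2e9, 1e-9)


# --- Technique 2: defer false detected errors ---

@dataclass
class Instr:
    dest: Optional[str]      # register written, or None
    srcs: Tuple[str, ...]    # registers read
    is_output: bool = False  # value becomes program-visible (e.g. a store)


def signals_error(program: Sequence[Instr], faulty_index: int) -> bool:
    """Flag the faulty result as possibly incorrect; signal only at output."""
    possibly_incorrect: Dict[str, bool] = {}
    for i, ins in enumerate(program):
        tainted = any(possibly_incorrect.get(s, False) for s in ins.srcs)
        if ins.is_output and tainted:
            return True  # a flagged value would affect program output
        if ins.dest is not None:
            # Overwriting a register clears its flag unless this result is itself faulty.
            possibly_incorrect[ins.dest] = tainted or i == faulty_index
    return False  # the fault never reached output: no false error raised


program = [
    Instr(dest="r1", srcs=("r2", "r3")),             # 0: dynamically dead (r1 overwritten)
    Instr(dest="r1", srcs=("r4", "r5")),             # 1
    Instr(dest=None, srcs=("r1",), is_output=True),  # 2: store r1
]
```

With these hypothetical inputs, `ratio` is about 1.47, so the squashing configuration wins on MITF despite its lower IPC. `signals_error(program, 0)` returns `False` because instruction 0 is dynamically dead, while `signals_error(program, 1)` returns `True` because its result reaches the store.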
