Authors: N. DeBardeleben, S. Gurumurthi, M. Sonza Reorda, F. Cappello, P. Rech
Keywords: Power (physics), Reliability (statistics), Fault injection, CUDA, Embedded system, Computer science, Distributed computing, General-purpose computing on graphics processing units, Fault tolerance
Abstract: GPGPUs are used increasingly in several domains, from gaming to different kinds of computationally intensive applications. In many applications GPGPU reliability is becoming a serious issue, and research activities are focusing on its evaluation. This paper offers an overview of some major results in the area. First, it shows and analyzes experiments assessing GPGPU reliability in HPC datacenters. Second, it provides recent results derived from radiation experiments about GPGPUs. Third, it describes the characteristics of an advanced fault-injection environment, allowing effective evaluation of the resiliency of applications running on GPGPUs.