Authors: Bo Fang, Jiesheng Wei, Karthik Pattabiraman, Matei Ripeanu
DOI: 10.1109/SC.COMPANION.2012.289
Keywords:
Abstract: GPUs were originally designed for error-resilient workloads. Today, they are used in error-sensitive applications, e.g. General Purpose GPU (GPGPU) applications. The goal of this project is to investigate the error resilience of GPGPU applications and understand their reliability characteristics. To this end, we employ fault injection on real hardware. We find that, compared to CPUs, GPU platforms lead to a higher rate of silent data corruption, a major concern since these errors are not flagged at runtime and often remain latent. We also find that out-of-bound memory accesses are the most critical reason for crashes.