Real-world design and evaluation of compiler-managed GPU redundant multithreading

Authors: Jack Wadden, Alexander Lyashevsky, Sudhanva Gurumurthi, Vilas Sridharan, Kevin Skadron

Keywords: Computer science, Supercomputer, Compiler, Thread (computing), Multithreading, Fault coverage, General-purpose computing on graphics processing units, Software, Parallel computing

Abstract: Reliability for general purpose processing on the GPU (GPGPU) is becoming a weak link in the construction of reliable supercomputer systems. Because hardware protection is expensive to develop, requires dedicated on-chip resources, and is not portable across different architectures, the efficiency of software solutions such as redundant multithreading (RMT) must be explored. This paper presents a real-world design and evaluation of automatic RMT on GPU hardware. We first describe a compiler pass that automatically converts GPGPU kernels into redundantly threaded versions. We then perform detailed power and performance evaluations of three RMT algorithms, each of which provides fault coverage to a set of structures in the GPU. Using real hardware, we show that compiler-managed RMT has highly variable costs. We further analyze the individual costs of work scheduling, computation, and inter-thread communication, showing that no single component is responsible for high overheads across all applications; instead, certain workload properties tend to cause RMT to perform well or poorly. Finally, we demonstrate the benefit of architectural support for RMT with a specific example of fast, register-level thread communication.


References (25)

Timothy G. Rogers, Mike O'Connor, Tor M. Aamodt. Cache-Conscious Wavefront Scheduling. International Symposium on Microarchitecture, pp. 72-83 (2012). doi:10.1109/MICRO.2012.16

Yun Zhang, Soumyadeep Ghosh, Jialu Huang, Jae W. Lee, Scott A. Mahlke, David I. August. Runtime asynchronous fault tolerance via speculation. Symposium on Code Generation and Optimization, pp. 145-154 (2012). doi:10.1145/2259016.2259035

Steven K. Reinhardt, Shubhendu S. Mukherjee. Transient fault detection via simultaneous multithreading. International Symposium on Computer Architecture, vol. 28, pp. 25-36 (2000). doi:10.1145/339647.339652

Jeremy W. Sheaffer, David P. Luebke, Kevin Skadron. The visual vulnerability spectrum: characterizing architectural vulnerability for graphics hardware. International Conference on Computer Graphics and Interactive Techniques, pp. 9-16 (2006). doi:10.1145/1283900.1283902

T. N. Vijaykumar, Irith Pomeranz, Karl Cheng. Transient-fault recovery using simultaneous multithreading. ACM SIGARCH Computer Architecture News, vol. 30, pp. 87-98 (2002). doi:10.1145/545214.545226

Shubhendu S. Mukherjee, Michael Kontz, Steven K. Reinhardt. Detailed design and evaluation of redundant multithreading alternatives. ACM SIGARCH Computer Architecture News, vol. 30, pp. 99-110 (2002). doi:10.1145/545214.545227

Cheng Wang, Ho-seop Kim, Youfeng Wu, Victor Ying. Compiler-Managed Software-based Redundant Multi-Threading for Transient Fault Detection. Symposium on Code Generation and Optimization, pp. 244-258 (2007). doi:10.1109/CGO.2007.7

Keun Soo Yim, Cuong Pham, Mushfiq Saleheen, Zbigniew Kalbarczyk, Ravishankar Iyer. Hauberk: Lightweight Silent Data Corruption Error Detector for GPGPU. International Parallel and Distributed Processing Symposium, pp. 287-300 (2011). doi:10.1109/IPDPS.2011.36

M.K. Qureshi, O. Mutlu, Y.N. Patt. Microarchitecture-based introspection: a technique for transient-fault tolerance in microprocessors. Dependable Systems and Networks, pp. 434-443 (2005). doi:10.1109/DSN.2005.62

N. Oh, P.P. Shirvani, E.J. McCluskey. Error detection by duplicated instructions in super-scalar processors. IEEE Transactions on Reliability, vol. 51, pp. 63-75 (2002). doi:10.1109/24.994913
