Authors: Hajime Fujita, Kamil Iskra, Pavan Balaji, Andrew A. Chien
Keywords:
Abstract: Future supercomputer systems will face serious reliability challenges. Among failure scenarios, latent errors are some of the most serious and concerning. Preserving multiple versions of critical data is a promising approach to deal with such errors. We are developing the Global View Resilience (GVR) library, with multi-version global arrays as one of the key features. This paper presents three array versioning architectures: flat array, flat array with change tracking, and log-structured array. We use a synthetic workload that mimics the memory access patterns of radix sort, N-body simulation, and matrix multiplication, comparing the three architectures in terms of runtime performance, memory requirements, and version restoration costs. The experiments show that the flat array with change tracking is the best architecture in terms of runtime performance for versioning frequencies of 10^-5 ops^-1 or higher, matching the second-best architecture or beating it by up to 23 times, whereas the log-structured array is preferable for low memory usage, since it saves up to 98% of memory compared with a flat array.