Authors: Devesh Tiwari, Hajime Fujita, Daniel T. Graves, Anshu Dubey, Andrew Chien
Keywords:
Abstract: Supercomputing platforms are expected to have larger failure rates in the future because of scaling and power concerns. The memory and performance impact may vary with error types and failure modes. Therefore, localized recovery schemes will be important for scientific computations, including failure modes where application intervention is suitable for recovery. We present a resiliency methodology for applications using structured adaptive mesh refinement, where failure modes map to granularities within the application for detection and correction. This approach also enables parameterization of cost for differentiated recovery. The cost model is built using tuning parameters that can be used to customize the strategy for different failure rates in different computing environments. We also show that this approach can make recovery cost proportional to the failure rate.