Reliable Linear, Sesquilinear and Bijective Operations On Integer Data Streams Via Numerical Entanglement

Authors: Yiannis Andreopoulos, Mohammad Ashraful Anam

DOI:

Keywords: Integer (computer science); Redundancy (engineering); Bijection; Integer; Parallel computing; Matrix multiplication; Overhead (computing); Permutation; Least significant bit; Throughput (business); Data stream mining; Computer science

Abstract: A new technique is proposed for fault-tolerant linear, sesquilinear and bijective (LSB) operations on M integer data streams (M ≥ 3), such as: scaling, additions/subtractions, inner or outer vector products, permutations and convolutions. In the proposed method, the M input integer data streams are linearly superimposed to form M numerically-entangled integer data streams that are stored in-place of the original inputs. A series of LSB operations can then be performed directly using these entangled data streams. The results are extracted from the M entangled output streams by additions and arithmetic shifts. Any soft errors affecting any single disentangled output stream are guaranteed to be detectable via a specific post-computation reliability check. In addition, when utilizing a separate processor core for each of the M streams, the proposed approach can recover all outputs after any single fail-stop failure. Importantly, unlike algorithm-based fault tolerance (ABFT) methods, the number of operations required for the entanglement, extraction and validation of the results is linearly related to the number of inputs and does not depend on the complexity of the performed LSB operations. We have validated our proposal in an Intel processor (Haswell architecture with AVX2 support) via several types of operations: fast Fourier transforms, circular convolutions, and matrix multiplication operations. Our analysis and experiments reveal that the proposed approach incurs between 0.03% and 7% reduction in processing throughput for a wide variety of LSB operations. This overhead is 5 to 1000 times smaller than that of the equivalent ABFT method that uses a checksum stream. Thus, our proposal can be used in fault-generating hardware or safety-critical applications, where high reliability is required without the cost of ABFT or modular redundancy.
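
The superposition idea in the abstract can be illustrated with a toy Python sketch for M = 3. This is not the paper's actual entanglement or extraction scheme: the shift width `L`, the constant `K`, the cyclic pairing of streams, and the function names are all assumptions made for illustration, and the extraction here uses an exact integer division where the abstract describes additions and arithmetic shifts only. It shows only that a linear operation (an inner product with a shared kernel `g`) applied directly to superimposed streams yields outputs from which the per-stream results can be extracted, and that a divisibility test can act as a post-computation check.

```python
# Toy illustration only; all names and parameters are hypothetical,
# not taken from the paper.
L = 8                      # assumed shift width in bits
K = 1 + (1 << (3 * L))     # 1 + 2^(3L), used by the extraction step


def entangle(c0, c1, c2):
    """Superimpose each stream with a left-shifted copy of a cyclic neighbour."""
    e0 = [(z << L) + x for x, z in zip(c0, c2)]   # 2^L * c2 + c0
    e1 = [(x << L) + y for x, y in zip(c0, c1)]   # 2^L * c0 + c1
    e2 = [(y << L) + z for y, z in zip(c1, c2)]   # 2^L * c1 + c2
    return e0, e1, e2


def dot(stream, g):
    """A linear operation: inner product of a stream with kernel g."""
    return sum(s * w for s, w in zip(stream, g))


def disentangle(a, b, c):
    """Extract the three results from three entangled outputs.

    Each combination below equals K times one result when the inputs are
    uncorrupted, so a nonzero remainder flags an error (for error
    magnitudes smaller than K).
    """
    nums = (
        a - (c << L) + (b << (2 * L)),
        b - (a << L) + (c << (2 * L)),
        c - (b << L) + (a << (2 * L)),
    )
    if any(n % K for n in nums):
        raise ValueError("reliability check failed")
    return tuple(n // K for n in nums)


c0, c1, c2 = [1, -2, 3], [4, 5, -6], [7, 8, 9]
g = [2, 0, -1]
e0, e1, e2 = entangle(c0, c1, c2)
results = disentangle(dot(e0, g), dot(e1, g), dot(e2, g))
print(results)  # (-1, 14, 5), the same as dot(c0, g), dot(c1, g), dot(c2, g)
```

In this sketch the entanglement and extraction cost grows with the number of stream elements and outputs, not with the cost of the operation performed in between, which mirrors the property the abstract contrasts with ABFT.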
