Authors: Franz Franchetti, Stefan Kral, Juergen Lorenz, Markus Püschel, Christoph W. Ueberhuber
DOI: 10.1007/11403937_3
Keywords: Very long instruction word, Vectorization (mathematics), Supercomputer, Performance tuning, Computer science, Floating point, SIMD, Code generation, Fast Fourier transform, Parallel computing
Abstract: IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536-processor system featuring a peak performance of 360 Tflop/s. This system is supposed to lead the Top 500 list when it is installed in 2005 at Lawrence Livermore National Laboratory. This paper presents one of the first numerical kernels run on a prototype BlueGene/L machine. We tuned our formal vectorization approach as well as the Vienna MAP vectorizer to support BlueGene/L's custom two-way short vector SIMD “double” floating-point unit and connected the resulting methods to the automatic tuning systems Spiral and Fftw. Our approach automatically produces high-performance FFT kernels for BlueGene/L that are up to 45% faster than the best scalar Spiral-generated code and up to 75% faster than Fftw when run on a single BlueGene/L processor.