Automatically Tuned FFTs for BlueGene/L's Double FPU

Authors: Franz Franchetti, Stefan Kral, Juergen Lorenz, Markus Püschel, Christoph W. Ueberhuber

DOI: 10.1007/11403937_3

Keywords: Very long instruction word, Vectorization (mathematics), Supercomputer, Performance tuning, Computer science, Floating point, SIMD, Code generation, Fast Fourier transform, Parallel computing

Abstract: IBM is currently developing the new line of BlueGene/L supercomputers. The top-of-the-line installation is planned to be a 65,536-processor system featuring a peak performance of 360 Tflop/s. This system is supposed to lead the Top 500 list when it is installed in 2005 at Lawrence Livermore National Laboratory. This paper presents one of the first numerical kernels run on a prototype BlueGene/L machine. We tuned our formal vectorization approach as well as the Vienna MAP vectorizer to support BlueGene/L's custom two-way short vector SIMD “double” floating-point unit and connected the resulting methods to the automatic performance tuning systems Spiral and Fftw. Our approach automatically produces high-performance FFT kernels for BlueGene/L that are up to 45% faster than the best scalar Spiral-generated code and up to 75% faster than Fftw when run on a single BlueGene/L processor.
