Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach

Authors: George Bosilca, Thomas Herault, Aurelien Bouteiller, Piotr Luszczek, Anthony Danalis, Jack J. Dongarra


Keywords: Computer hardware; Chip; Pipeline (computing); Processor design; Bandwidth (signal processing); Petascale computing; Clock rate; Multi-core processor; Computer science; Limit (music)

Abstract:
George Bosilca, Thomas Herault, Aurelien Bouteiller, Piotr Luszczek, Anthony Danalis, Jack J. Dongarra. January 24, 2012.

Introduction and Motivation

Among the various factors that drive the momentous changes occurring in the design of microprocessors and high-end systems [1], three stand out as especially notable:

1. the number of transistors per chip will continue the current trend, i.e., double roughly every 18 months, while the speed of processor clocks will cease to increase;
2. the physical limit on the number and bandwidth of the CPU pins is becoming a near-term reality;
3. a strong drift toward hybrid/heterogeneous systems for petascale (and larger) systems is taking place.

While the first two involve fundamental physical limitations that current technology trends are unlikely to overcome in the near term, the third is an obvious consequence of the first two, combined with the economic necessity of using many thousands of computational units to scale up to petascale and larger systems.

More transistors and slower clocks require multicore designs and increased parallelism. The laws of traditional processor design (increasing transistor density, speeding up the clock rate, lowering the voltage) have now been stopped by a set of physical barriers: excess heat produced, too much power consumed, too much energy leaked, and useful signal overcome by noise. Multicore designs are a natural evolutionary response to this situation. By putting multiple processor cores on a single die, architects can overcome the previous limitations and continue to increase the number of gates per chip without increasing power densities. However, since excess heat production means that frequencies cannot be increased further, deep-and-narrow pipeline models will tend to recede as shallow-and-wide pipeline designs become the norm. Moreover, despite obvious similarities, multicore processors are not equivalent to multiple CPUs or to SMPs. Multiple cores on the same chip share …
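The transistor trend in item 1 compounds quickly, which is what drives the push toward parallelism. As a short worked example, write $N_0$ for the transistor count per chip at $t = 0$ and $T_d \approx 1.5$ years for the doubling period; both symbols are introduced here for illustration and do not come from the paper:

```latex
N(t) = N_0 \, 2^{t/T_d}, \qquad
\frac{N(6\ \text{years})}{N_0} = 2^{6/1.5} = 2^{4} = 16 .
```

Under the simplifying assumption that the clock rate $f$ stays flat and each core completes a fixed amount of work per cycle, peak throughput $P \propto c \cdot f$ can only follow this sixteenfold growth if the core count $c$ grows by roughly the same factor. This is the sense in which more transistors and slower clocks require increased parallelism.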

References (36)
J. Dongarra, L. S. Blackford, J. Demmel, A. Petitet, I. Dhillon, E. D'Azevedo, R. C. Whaley, G. Henry, K. Stanley, J. Choi, S. Hammarling, A. Cleary, D. Walker, ScaLAPACK Users' Guide, SIAM, 1997.
John A. Sharp, Data Flow Computing: Theory and Practice, Ablex Publishing Corp., 1992.
Edward Grady Coffman, Peter J. Denning, Operating Systems Theory, Prentice Hall Professional Technical Reference, 1973.
Azzam Haidar, Hatem Ltaief, Asim YarKhan, Jack Dongarra, Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures, Concurrency and Computation: Practice and Experience, vol. 24, pp. 305-321, 2011, doi:10.1002/CPE.1829.
Alfredo Buttari, Jack Dongarra, Jakub Kurzak, Julien Langou, Piotr Luszczek, Stanimire Tomov, The impact of multicore on math software, Applied Parallel Computing, pp. 1-10, 2006, doi:10.1007/978-3-540-75755-9_1.
Allen D. Malony, Wolfgang E. Nagel, The Open Trace Format (OTF) and open tracing for HPC, ACM/IEEE Conference on High Performance Computing (Supercomputing), p. 24, 2006, doi:10.1145/1188455.1188480.
J. L. Hess, A. M. O. Smith, Calculation of potential flow about arbitrary bodies, Progress in Aerospace Sciences, vol. 8, pp. 1-138, 1967, doi:10.1016/0376-0421(67)90003-6.
Ernie Chan, Field G. Van Zee, Paolo Bientinesi, Enrique S. Quintana-Orti, Gregorio Quintana-Orti, Robert van de Geijn, SuperMatrix, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '08), pp. 123-132, 2008, doi:10.1145/1345206.1345227.
G. W. Stewart, The decompositional approach to matrix computation, Computing in Science & Engineering, vol. 2, pp. 50-59, 2000, doi:10.1109/5992.814658.