Authors: George Bosilca
DOI:
Keywords: Computer hardware, Chip, Pipeline (computing), Processor design, Bandwidth (signal processing), Petascale computing, Clock rate, Multi-core processor, Computer science, Limit (music)
Abstract: Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach. George Bosilca, Thomas Herault, Aurelien Bouteiller, Piotr Luszczek, Anthony Danalis, Jack J. Dongarra. January 24, 2012.

Introduction and Motivation

Among the various factors that drive the momentous changes occurring in the design of microprocessors and high end systems [1], three stand out as especially notable:

1. the number of transistors per chip will continue the current trend, i.e. double roughly every 18 months, while the speed of processor clocks will cease to increase;
2. the physical limit on the number and bandwidth of the CPUs' pins is becoming a near-term reality;
3. a strong drift toward hybrid/heterogeneous systems for petascale (and larger) systems is taking place.

While the first two involve fundamental physical limitations that current technology trends are unlikely to overcome in the near term, the third is an obvious consequence of the first two, combined with the economic necessity of using many thousands of computational units to scale up to petascale and larger systems.

More transistors and slower clocks require multicore designs and increased parallelism. The laws of traditional processor design (increasing transistor density, speeding up the clock rate, lowering the voltage) have now been stopped by a set of physical barriers: excess heat produced, too much power consumed, too much energy leaked, and useful signal overcome by noise. Multicore designs are a natural evolutionary response to this situation. By putting multiple processor cores on a single die, architects can overcome the previous limitations and continue to increase the number of gates per chip without increasing the power densities. However, since excess heat production means that frequencies cannot be further increased, deep-and-narrow pipeline models will tend to recede as shallow-and-wide pipeline designs become the norm. Moreover, despite obvious similarities, multicore processors are not equivalent to multiple-CPUs or to SMPs. Multiple cores on the same chip share