Authors: Mark Horowitz, Robert Kunz, Mary Hall, Robert Lucas, Jacqueline Chame
DOI: 10.2172/939091
Keywords: Computer architecture, Computer science, Shared memory, Transactional memory, Memory hierarchy, Programmer, Cache coherence, Programming paradigm, Overhead (computing), Compiler
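Abstract: The goal of this project was to improve the performance of large scientific and engineering applications through collaborative hardware and software mechanisms to manage the memory hierarchy of non-uniform memory access time (NUMA) shared-memory machines, as well as their component individual processors. In spite of the programming advantages of shared-memory platforms, obtaining good performance for large scientific and engineering applications on such machines can be challenging. Because communication between processors is managed implicitly by the hardware, rather than expressed by the programmer, application performance may suffer from unintended communication, that is, communication that the programmer did not consider when developing his/her application. In this project, we developed and evaluated a collection of hardware, compiler, languages and performance monitoring tools to obtain high performance for scientific and engineering applications on NUMA platforms by managing communication through alternative coherence mechanisms. Alternative coherence mechanisms have often been discussed as a means of reducing unintended communication, although architecture implementations of such mechanisms are quite rare. This report describes an actual implementation of a set of coherence protocols that support coherent, non-coherent and write-update accesses for a CC-NUMA shared-memory architecture, the Stanford FLASH machine. Such an approach has the advantage of using alternative coherence only where it is beneficial, and also provides an evolutionary migration path for improving application performance. We present data on two computations, RandomAccess from the HPC Challenge benchmarks and a forward solver derived from LS-DYNA, showing the performance advantages of the alternative coherence mechanisms. For RandomAccess, the non-coherent and write-update versions can outperform the coherent version by factors of 5 and 2.5, respectively. For the LS-DYNA-derived solver, improvements average 18% over the coherent version. On the SpecOMP benchmarks, the protocols add a modest overhead of less than 3% in applications where the alternative mechanisms are not needed. In addition to the selective coherence studies on the FLASH machine, in the last six months of the project ISI performed research on compiler technology for the transactional memory (TM) model being developed at Stanford. As part of this research, ISI developed a compiler that recognizes transactional memory "pragmas" and automatically generates parallel code for the TM model.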