Exploring Shared Memory Protocols in FLASH

Authors: Mark Horowitz, Robert Kunz, Mary Hall, Robert Lucas, Jacqueline Chame

DOI: 10.2172/939091

Keywords: Computer architecture, Computer science, Shared memory, Transactional memory, Memory hierarchy, Programmer, Cache coherence, Programming paradigm, Overhead (computing), Compiler

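Abstract: The goal of this project was to improve the performance of large scientific and engineering applications through collaborative hardware and software mechanisms to manage the memory hierarchy of non-uniform memory access time (NUMA) shared-memory machines, as well as their component individual processors. In spite of the programming advantages of shared-memory platforms, obtaining good performance for large scientific and engineering applications on such machines can be challenging. Because communication between processors is managed implicitly by the hardware, rather than expressed by the programmer, application performance may suffer from unintended communication, that is, communication that the programmer did not consider when developing his/her application. In this project, we developed and evaluated a collection of hardware, compiler, languages and performance monitoring tools to obtain high performance on scientific and engineering applications on NUMA platforms by managing communication through alternative coherence mechanisms. Alternative coherence mechanisms have often been discussed as a means of reducing unintended communication, although architecture implementations of such mechanisms are quite rare. This report describes an actual implementation of a set of coherence protocols that support coherent, non-coherent and write-update accesses for a CC-NUMA shared-memory architecture, the Stanford FLASH machine. Such an approach has the advantage of using alternative coherence only where it is beneficial, and also provides an evolutionary migration path for improving application performance. We present data on two computations, RandomAccess from the HPC Challenge benchmarks and a forward solver derived from LS-DYNA, showing the performance advantages of the alternative coherence mechanisms. For RandomAccess, the non-coherent and write-update versions outperform the coherent version by factors of 5 and 2.5, respectively. For the LS-DYNA-derived solver, we obtain improvements of 18% on average over the coherent version. We also present data on the SpecOMP benchmarks, showing that the protocols have a modest overhead of less than 3% in applications where the alternative mechanisms are not needed. In addition to the selective coherence studies on the FLASH machine, in the last six months of this project ISI performed research on compiler technology for the transactional memory (TM) model being developed at Stanford. As part of this research, ISI developed a compiler that recognizes transactional memory “pragmas” and automatically generates parallel code for the TM programming model.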
