Modeling communication in cache-coherent SMP systems

Authors: Sabela Ramos, Torsten Hoefler

DOI: 10.1145/2462902.2462916

Keywords: Parallel algorithm, Cache, Cache algorithms, Computer science, Xeon Phi, Parallel computing, Cache coherence, CPU cache, MESIF protocol, Smart Cache

Abstract: Most multi-core and some many-core processors implement cache coherency protocols that heavily complicate the design of optimal parallel algorithms. Communication is performed implicitly by cache line transfers between cores, which complicates the understanding of performance properties. We developed an intuitive model for cache-coherent architectures and demonstrate its use with the currently most scalable such architecture, Intel Xeon Phi. Using our model, we develop several optimized algorithms for complex data exchanges. All of them beat the highly tuned vendor-specific OpenMP and MPI libraries by up to a factor of 4.3. The model can be simplified to satisfy the tradeoff between the complexity of algorithm design and accuracy. We expect the model to serve as a vehicle for advanced algorithm design.
