Authors: Sabela Ramos, Torsten Hoefler
Keywords: Parallel algorithm, Cache, Cache algorithms, Computer science, Xeon Phi, Parallel computing, Cache coherence, CPU cache, MESIF protocol, Smart Cache
Abstract: Most multi-core and some many-core processors implement cache coherency protocols that heavily complicate the design of optimal parallel algorithms. Communication is performed implicitly by cache line transfers between cores, which makes performance properties hard to understand. We developed an intuitive model for cache-coherent architectures and demonstrate its use with the currently most scalable architecture, Intel Xeon Phi. Using our model, we develop several optimized algorithms for complex data exchanges. All of these algorithms beat highly tuned, vendor-specific OpenMP and MPI libraries by up to a factor of 4.3. The model can be simplified to balance the complexity of algorithm design against accuracy. We expect that our model can serve as a vehicle for advanced algorithm design.