Authors: Dafei Huang, Mei Wen, Changqing Xun, Dong Chen, Xing Cai
DOI: 10.1007/978-3-319-09873-9_18
Keywords: Many core, Parallel computing, Multi-core processor, Coprocessor, Central processing unit, Software portability, Thread (computing), Locality, Computer science
Abstract: When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus extensively used. However, locality concerns exposed in GPU-specific code are usually inherited without analysis, which may have side effects on CPU performance. When such kernels execute on CPUs, local-memory arrays no longer match well with the hardware, and the associated synchronizations are costly. To solve this dilemma, we actively analyze memory access patterns by using array-access descriptors derived from GPU-specific kernels, which can then be adapted for CPUs by removing all unwanted local-memory arrays together with the obsolete barrier statements. Experiments show that the automated transformation can satisfactorily improve kernel performance on Sandy Bridge CPUs and Intel's Many-Integrated-Core coprocessor.
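The idea in the abstract can be illustrated with a minimal sketch in plain C. It is not the authors' tool or its output; it is a hand-written example under stated assumptions. A 3-point stencil is assumed as the kernel, each work-group is already coarsened into a loop over its work-items, and the names `GROUP_SIZE`, `clamp_index`, `stencil_group_inherited`, `stencil_group_transformed`, and `run_stencil` are hypothetical. The first variant keeps the GPU-style staging buffer, with the barrier showing up as a split between two loops. The second variant drops the buffer and the split and reads the input directly.

```c
#include <assert.h>
#include <stddef.h>

#define GROUP_SIZE 8 /* hypothetical work-group size */

/* Clamp an index into [0, n-1] so halo reads stay in bounds. */
static size_t clamp_index(long i, size_t n) {
    if (i < 0) return 0;
    if ((size_t)i >= n) return n - 1;
    return (size_t)i;
}

/* Coarsened work-group that inherits the GPU structure:
 * stage a tile (the former __local array), then compute from it. */
static void stencil_group_inherited(const float *in, float *out,
                                    size_t n, size_t group) {
    float tile[GROUP_SIZE + 2]; /* stand-in for the local-memory array */
    size_t base = group * GROUP_SIZE;

    /* Phase 1: copy the tile plus one halo element on each side.
     * tile[l] holds in[base + l - 1], clamped at the array ends. */
    for (size_t l = 0; l < GROUP_SIZE + 2; ++l)
        tile[l] = in[clamp_index((long)(base + l) - 1, n)];

    /* A barrier sat here in the GPU kernel; after coarsening it
     * becomes the boundary between the two work-item loops. */

    /* Phase 2: each work-item sums its left, centre, right neighbours. */
    for (size_t l = 0; l < GROUP_SIZE; ++l) {
        size_t g = base + l;
        if (g < n) out[g] = tile[l] + tile[l + 1] + tile[l + 2];
    }
}

/* Same work-group with the tile and the loop split removed:
 * one loop, reading directly from the input array. */
static void stencil_group_transformed(const float *in, float *out,
                                      size_t n, size_t group) {
    size_t base = group * GROUP_SIZE;
    for (size_t l = 0; l < GROUP_SIZE; ++l) {
        size_t g = base + l;
        if (g < n)
            out[g] = in[clamp_index((long)g - 1, n)] + in[g]
                   + in[clamp_index((long)g + 1, n)];
    }
}

/* Run every work-group over an n-element array with either variant. */
void run_stencil(const float *in, float *out, size_t n, int use_transformed) {
    assert(n > 0); /* clamp_index requires a non-empty array */
    size_t groups = (n + GROUP_SIZE - 1) / GROUP_SIZE;
    for (size_t grp = 0; grp < groups; ++grp) {
        if (use_transformed)
            stencil_group_transformed(in, out, n, grp);
        else
            stencil_group_inherited(in, out, n, grp);
    }
}
```

Both variants produce identical output for the same input. In this sketch the transformed form avoids the extra copy and leaves a single loop per work-group; whether that pays off for a given kernel is the question the access-pattern analysis in the paper is meant to answer.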