Authors: Anastasis Keliris, Vasilis Dimitsas, Olympia Kremmyda, Dimitris Gizopoulos, Michail Maniatakos
DOI: 10.1109/PATMOS.2015.7347583
Keywords:
Abstract: As the rate of single-thread CPU performance improvement per generation has diminished due to lower transistor-speed scaling and energy-related issues, researchers and industry have shifted their interest towards multi-core and many-core architectures for improving performance. Comparisons between optimized applications for parallel architectures have been quantified many times in the literature, but contradictory results have been reported, mainly due to biased methods of evaluating and comparing these architectures. In this paper, we present memory-oblivious optimizations for the widely used Discrete Wavelet Transform (DWT), and provide detailed comparisons of the algorithm on Intel and AMD CPUs, Nvidia GPUs, as well as Intel's Xeon Phi coprocessor. Our results indicate that, compared to the respective non-optimized single-thread implementations, optimization delivers up to 17.9×–197.2× performance improvement on the various architectures examined. Furthermore, compared to the state-of-the-art, the presented CPU and GPU implementations are 2.6× and 1.3× faster, respectively, than the fastest DWT implementations currently available in the literature. No comparison to the state-of-the-art can be made for the Xeon Phi, as, to the best of our knowledge, this is the first study that optimizes DWT for this newfangled architecture.