Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

Authors: Jaegeun Oh, Seok Joong Hwang, Huong Giang Nguyen, Areum Kim, Seon Wook Kim

Keywords: Computer science, Parallel computing, Pipeline (computing), Degree of parallelism, Task parallelism, Lockstep, Compiler, SIMD, Multiprocessing, Multithreading

Abstract: In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between single-instruction multiple-data (SIMD) and symmetric multithreading/chip multiprocessor (SMT/CMP) solutions. The approach is more favorable than typical SIMD in terms of degree of parallelism, range of applicability, and code generation, and it can save power and chip area compared with SMT/CMP without significant performance degradation. For architecture verification, we extend the commercial 32-bit core AE32000C and synthesize it on a Xilinx FPGA. Compared with the original core, our architecture is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks that are automatically parallelized by the Intel compiler. Keywords: ILP, TLP, SMT, CMP, MLEP.
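The execution model described in the abstract (one shared fetch/decode front end driving several per-thread back ends in lockstep) can be illustrated with a minimal functional sketch. It assumes a toy five-opcode instruction set invented here for illustration; `Instr`, `execute`, `run_lockstep`, and the opcodes are hypothetical names, not the paper's design or the AE32000C ISA, and the sketch models behavior only, not timing, power, or area. Each instruction is fetched and decoded once and then issued to every thread, and a per-thread register preloaded with the thread id is what makes identical instructions touch different loop iterations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    op: str     # hypothetical toy opcodes: "li", "add", "mul", "ld", "st"
    d: int      # destination register (source register for "st")
    s1: int = 0 # source register, or the immediate value for "li"
    s2: int = 0 # second source register

def execute(regs, mem, ins):
    """One thread's private execute/memory/write-back stages (toy model)."""
    if ins.op == "li":
        regs[ins.d] = ins.s1
    elif ins.op == "add":
        regs[ins.d] = regs[ins.s1] + regs[ins.s2]
    elif ins.op == "mul":
        regs[ins.d] = regs[ins.s1] * regs[ins.s2]
    elif ins.op == "ld":
        regs[ins.d] = mem[regs[ins.s1]]
    elif ins.op == "st":
        mem[regs[ins.s1]] = regs[ins.d]
    else:
        raise ValueError(f"unknown opcode {ins.op}")

def run_lockstep(program, n_threads, mem, n_regs=8):
    """Fetch/decode each instruction once, then issue it to every thread."""
    # Private register file per thread; r0 holds the thread id (assumption).
    regs = [[tid] + [0] * (n_regs - 1) for tid in range(n_threads)]
    decodes = 0
    for ins in program:      # shared fetch unit and decode unit
        decodes += 1
        for r in regs:       # duplicated execution/memory/write-back units
            execute(r, mem, ins)
    return decodes

# Parallel loop body y[i] = 3 * x[i] + 1, one iteration per thread.
N = 4
mem = [10, 20, 30, 40] + [0] * N   # x at addresses 0..N-1, y at N..2N-1
program = [
    Instr("ld", 1, 0),       # r1 = x[tid]
    Instr("li", 2, 3),       # r2 = 3
    Instr("mul", 1, 1, 2),   # r1 = 3 * x[tid]
    Instr("li", 2, 1),       # r2 = 1
    Instr("add", 1, 1, 2),   # r1 = 3 * x[tid] + 1
    Instr("li", 3, N),       # r3 = base address of y
    Instr("add", 3, 3, 0),   # r3 = address of y[tid]
    Instr("st", 1, 3),       # y[tid] = r1
]
decodes = run_lockstep(program, N, mem)
print(mem[N:], decodes)      # [31, 61, 91, 121] 8
```

Here the eight instructions are decoded eight times in total but executed 32 times (once per thread), which is the front-end sharing the abstract attributes the power and area savings to. Unlike a SIMD version, each thread keeps a full scalar register file, so the same compiled loop body runs unchanged on every thread.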

Similar articles (6)

Executing subroutines in a multi-threaded processing system

Selectively activating a resume check operation in a multi-threaded processing system

Circular Buffers with Multiple Overlapping Windows for Cyclic Task Graphs

Inter-task communication via overlapping read and write windows for deadlock-free execution of cyclic task graphs

Program flow control for multiple divergent SIMD threads using a minimum resume counter

Design Space Exploration for GPU-Based Architecture
