Exploiting Thread-Level Parallelism in Lockstep Execution by Partially Duplicating a Single Pipeline

Authors: Jaegeun Oh, Seok Joong Hwang, Huong Giang Nguyen, Areum Kim, Seon Wook Kim

DOI: 10.4218/ETRIJ.08.0107.0343

Keywords: Computer science, Parallel computing, Pipeline (computing), Degree of parallelism, Task parallelism, Lockstep, Compiler, SIMD, Multiprocessing, Multithreading

Abstract: In most parallel loops of embedded applications, every iteration executes the exact same sequence of instructions while manipulating different data. This fact motivates a new compiler-hardware orchestrated execution framework in which all parallel threads share one fetch unit and one decode unit but have their own execution, memory, and write-back units. This resource sharing enables parallel threads to execute in lockstep with minimal hardware extension and compiler support. Our proposed architecture, called the multithreaded lockstep execution processor (MLEP), is a compromise between the single-instruction multiple-data (SIMD) and simultaneous multithreading/chip multiprocessor (SMT/CMP) solutions. The proposed approach is more favorable than typical SIMD execution in terms of degree of parallelism, range of applicability, and code generation, and it can save more power and chip area than the SMT/CMP approach without significant performance degradation. For architecture verification, we extend the commercial 32-bit embedded core AE32000C and synthesize it on a Xilinx FPGA. Compared to the original architecture, our approach is 13.5% faster with a 2-way MLEP and 33.7% faster with a 4-way MLEP on EEMBC benchmarks that are automatically parallelized by the Intel compiler. Keywords: ILP, TLP, SMT, CMP, MLEP.
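The lockstep idea in the abstract can be illustrated with a toy software model. The sketch below is not the MLEP design itself: the instruction names (`load`, `mul`, `store`, `addi`), the `Lane` class, and the flat `memory` list are all hypothetical, chosen only to show one instruction stream being fetched and decoded once while each thread (lane) executes it on its own registers.

```python
from dataclasses import dataclass, field


@dataclass
class Lane:
    """Per-thread state in this toy model: each lane owns its registers."""
    regs: dict = field(default_factory=dict)


def run_lockstep(program, lanes, memory):
    """Fetch/decode each instruction once, then execute it in every lane.

    Returns the number of fetches, which is independent of the lane count.
    """
    fetches = 0
    for instr in program:            # shared fetch
        op, dst, a, b = instr        # shared decode
        fetches += 1
        for lane in lanes:           # per-lane execute / memory / write-back
            if op == "load":         # dst <- memory[regs[a] + b]
                lane.regs[dst] = memory[lane.regs[a] + b]
            elif op == "addi":       # dst <- regs[a] + immediate b
                lane.regs[dst] = lane.regs[a] + b
            elif op == "mul":        # dst <- regs[a] * regs[b]
                lane.regs[dst] = lane.regs[a] * lane.regs[b]
            elif op == "store":      # memory[regs[a] + b] <- regs[dst]
                memory[lane.regs[a] + b] = lane.regs[dst]
            else:
                raise ValueError(f"unknown opcode: {op}")
    return fetches


# Loop body y[i] = x[i] * x[i]; x lives at memory[0..3], y at memory[4..7].
program = [
    ("load", "t", "i", 0),
    ("mul", "t", "t", "t"),
    ("store", "t", "i", 4),
]
# Each lane handles one iteration; only the index register "i" differs.
lanes = [Lane({"i": k}) for k in range(4)]
memory = [1, 2, 3, 4, 0, 0, 0, 0]
fetches = run_lockstep(program, lanes, memory)
print(memory[4:], fetches)
```

In this model the four iterations complete with three fetches in total, whereas four independent cores would each fetch the same three instructions. Real hardware must also handle cases this sketch ignores, such as divergent branches and memory conflicts between lanes.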

References (15)
Yoshio Tanaka, Shigehisa Satoh, Mitsuhisa Sato, Kazuhiro Kusano. Design of OpenMP Compiler for an SMP Cluster. (1999).
Henk Corporaal, Ireneusz Karkowski. Exploiting fine- and coarse-grain parallelism in embedded programs. International Conference on Parallel Architectures and Compilation Techniques, pp. 60-67 (1998). 10.5555/522344.825688
Huong Giang Nguyen, Seok Joong Hwang, Seon Wook Kim. Compiler Construction for Lockstep Execution of Multithreaded Processors. Computer and Information Technology, pp. 829-834 (2007). 10.1109/CIT.2007.14
Hillery C. Hunter, Jaime H. Moreno. A new look at exploiting data parallelism in embedded systems. Compilers, Architecture, and Synthesis for Embedded Systems, pp. 159-169 (2003). 10.1145/951710.951733
Jack L. Lo, Joel S. Emer, Henry M. Levy, Rebecca L. Stamm, Dean M. Tullsen, S. J. Eggers. Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading. ACM Transactions on Computer Systems, vol. 15, pp. 322-354 (1997). 10.1145/263326.263382
Hongtao Zhong, Steven A. Lieberman, Scott A. Mahlke. Extending Multicore Architectures to Exploit Hybrid Parallelism in Single-thread Applications. High-Performance Computer Architecture, pp. 25-36 (2007). 10.1109/HPCA.2007.346182
D. Talla, L.K. John, V. Lapinskii, B.L. Evans. Evaluating signal processing and multimedia applications on SIMD, VLIW and superscalar architectures. International Conference on Computer Design, pp. 163-172 (2000). 10.1109/ICCD.2000.878283
Dean M. Tullsen, Susan J. Eggers, Henry M. Levy. Simultaneous multithreading: maximizing on-chip parallelism. International Symposium on Computer Architecture, vol. 23, pp. 392-403 (1995). 10.1145/223982.224449
J.D. Collins, D.M. Tullsen. Clustered multithreaded architectures - pursuing both IPC and cycle time. International Parallel and Distributed Processing Symposium, vol. 2, pp. 76-85 (2004). 10.1109/IPDPS.2004.1303010
Jaegeun Oh, Seon Wook Kim, Chulwoo Kim. OpenMP and compilation issue in embedded applications. International Workshop on OpenMP, pp. 109-121 (2003). 10.1007/3-540-45009-2_9