Software integration of identical DLP threads via compilation for VLIW processors

Authors: Maolin Guan, Nan Wu, Mei Wen, Chunyuan Zhang

DOI: 10.1109/ICCIT.2010.5711096

Abstract: Based on the characteristics of data-level parallelism (DLP) multi-threaded programs found in practical applications, this paper proposes a new method that implements software integration of identical DLP threads via compilation for VLIW processors. The method translates DLP into ILP by merging the operations of corresponding basic blocks, divided from different threads, into one block, which extends the instruction window the compiler can schedule. It then optimizes the control flow of the program after thread integration to ensure the correctness of the program. The experimental results show that the technique accelerates program execution very well without placing more burden on the programmer, while the hardware overhead can be ignored. Generally speaking, integrating 2 to 4 threads yields a speedup of 1.34 to 2.07.
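The block-merging idea in the abstract can be illustrated with a minimal sketch. It is not the paper's compiler pass: the `Op` record, the `integrate` and `critical_path` helpers, the register-renaming scheme, and the unit-latency model are all assumptions made for illustration. The sketch copies one basic block once per thread, renames every register so the copies do not collide (treating every operand as thread-private, which is a simplification), and interleaves the copies into a single block. The merged block holds more operations while its longest dependence chain stays the same length, which is what gives a VLIW scheduler more independent work per cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """One operation in a basic block (hypothetical representation)."""
    dest: str
    opcode: str
    srcs: tuple


def rename(op, tid):
    """Suffix every register with the thread id so merged copies do not collide.

    Assumption: all operands are thread-private. A real pass would leave
    values shared between threads un-renamed.
    """
    suffix = f"_t{tid}"
    return Op(op.dest + suffix, op.opcode, tuple(s + suffix for s in op.srcs))


def integrate(block, num_threads):
    """Merge num_threads copies of one basic block into a single block.

    Copies are interleaved operation by operation, so independent
    operations from different threads end up adjacent.
    """
    merged = []
    for op in block:
        for tid in range(num_threads):
            merged.append(rename(op, tid))
    return merged


def critical_path(block):
    """Longest dependence chain in the block, assuming unit latency per op."""
    depth = {}
    longest = 0
    for op in block:
        d = 1 + max((depth.get(s, 0) for s in op.srcs), default=0)
        depth[op.dest] = d
        longest = max(longest, d)
    return longest


# A small illustrative block: c = (load(p) * k) + load(p)
block = [
    Op("a", "load", ("p",)),
    Op("b", "mul", ("a", "k")),
    Op("c", "add", ("b", "a")),
]

merged = integrate(block, 2)
for op in merged:
    print(op.dest, "=", op.opcode, *op.srcs)
print("ops:", len(block), "->", len(merged))
print("critical path:", critical_path(block), "->", critical_path(merged))
```

In this toy model the operation count doubles while the dependence height is unchanged, so a machine with enough issue slots could in principle finish both threads' work in about the time of one. Real speedups depend on issue width, register pressure, and control flow, none of which this sketch models.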
