Parallelization techniques with improved dependence handling

Authors: Easwaran Raman, David I. August

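Abstract: Continuing exponential growth in transistor density and diminishing returns from the increasing transistor count have forced processor manufacturers to pack multiple cores onto a single chip. These processors, known as multi-core processors, generally do not improve the performance of single-threaded applications. Automatic parallelization has a key role to play in improving the performance of legacy and newly written single-threaded applications in this new multi-threaded era. Automatic parallelization techniques transform single-threaded code into semantically equivalent multi-threaded code by preserving the dependences of the original code. This dissertation proposes two automatic parallelization techniques that differ from related existing techniques in their handling of dependences. This difference in dependence handling enables the proposed techniques to outperform related techniques. The first technique is parallel-stage decoupled software pipelining (PS-DSWP). PS-DSWP extends pipelined parallelization techniques like DSWP by allowing certain pipeline stages to be executed by multiple threads. Such parallel execution requires distinguishing the inter-iteration dependences of the loop being parallelized from the rest of the dependences. The applicability and effectiveness of PS-DSWP are further enhanced by applying speculation to remove some dependences. The second technique, speculative iteration chunk execution (Spice), uses value speculation to ignore inter-iteration dependences, enabling chunks of iterations to execute in parallel. Unlike other value-speculation based parallelization techniques, Spice speculates on only a few dynamic instances of those dependences. Both of these techniques are implemented in the VELOCITY compiler and evaluated using a multi-core Itanium 2 simulator. PS-DSWP results in a geometric mean speedup of 2.13 over single-threaded execution with five threads on a set of loops drawn from benchmarks. The use of speculation improves the performance of PS-DSWP, resulting in a geometric mean speedup of 3.67 with six threads. Spice shows a geometric mean speedup of 2.01 with four threads. Based on the above experimental results and on qualitative and quantitative comparisons with related techniques, this dissertation demonstrates the effectiveness of the proposed techniques.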
