Exploring Parallel Bitonic Sort on a Migratory Thread Architecture

Authors: Kaushik Velusamy, Thomas B. Rolinger, Janice McMahon, Tyler A. Simon

DOI: 10.1109/HPEC.2018.8547568

Abstract: Large-scale, data-intensive applications pose challenges to systems with a traditional memory hierarchy due to their unstructured data sources and irregular memory access patterns. In response, systems that employ migratory threads have been proposed to mitigate memory access bottlenecks as well as reduce energy consumption. One such system is the Emu Chick, which migrates a small program context to the data being referenced in a memory access. Sorting an unordered list of elements is a critical kernel for countless applications, such as graph processing and tensor decomposition. As such applications can be considered highly suitable for a migratory thread architecture, it is imperative to understand the performance of sorting algorithms on these systems. In this paper, we implement parallel bitonic sort and target the Emu Chick system. We investigate the performance of an explicit comparison-based approach as well as a sorting network implementation. Furthermore, we explore two different data layouts for the sorting network, namely cyclic and blocked. From the results of our performance study, we find that while thread migrations can dictate the overall performance of an application, the cost of thread creation and management can out-grow the cost of thread migration.
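
As an illustration of the terms used in the abstract, the following is a minimal sequential sketch of a bitonic sorting network, together with the usual meaning of blocked and cyclic data layouts. It is not the authors' Emu Chick implementation. The function names (`bitonic_sort_network`, `owner_blocked`, `owner_cyclic`, `count_remote_pairs`) and the notion of a "node" owning an index are assumptions made for this example only.

```python
def bitonic_sort_network(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2                      # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2             # distance between compared elements
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i:                 # handle each pair once
                    ascending = (i & k) == 0    # direction of this sub-sequence
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def owner_blocked(i, n, nodes):
    """Blocked layout (assumed): contiguous chunks of n // nodes elements per node."""
    return i // (n // nodes)


def owner_cyclic(i, nodes):
    """Cyclic layout (assumed): element i lives on node i mod nodes."""
    return i % nodes


def count_remote_pairs(n, owner):
    """Count compare-exchange pairs whose two elements sit on different nodes."""
    remote = 0
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            for i in range(n):
                partner = i ^ j
                if partner > i and owner(i) != owner(partner):
                    remote += 1
            j //= 2
        k *= 2
    return remote
```

For example, `count_remote_pairs(16, lambda i: owner_cyclic(i, 4))` and `count_remote_pairs(16, lambda i: owner_blocked(i, 16, 4))` show how the choice of layout changes which compare-exchange steps cross node boundaries. The counts depend on the problem size and node count, and they model only data placement, not thread creation or migration cost on real hardware.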
