A G-Line-Based Network for Fast and Efficient Barrier Synchronization in Many-Core CMPs

Authors: Jose L. Abellan, Juan Fernandez, Manuel E. Acacio

DOI: 10.1109/ICPP.2010.34

Keywords: Network on a chip, Shared memory, Computer science, Thread (computing), Context (language use), Scalability, Flow control (data), Parallel computing, Distributed computing, Synchronization, Interconnection

Abstract: Barrier synchronization in shared memory parallel machines has been widely implemented through busy-waiting on shared variables. However, typical implementations of barrier synchronization tend to produce hot-spots in terms of memory and network contention, thus creating performance bottlenecks that become markedly more pronounced as the number of cores or processors increases. To overcome such limitations, we present a novel hardware-based barrier mechanism in the context of many-core CMPs. Our proposal is based on global interconnection lines (G-lines) and the S-CSMA technique, which have recently been used to enhance a flow control mechanism (EVC) in the context of networks-on-chip. Based on this technology, we have designed a simple and scalable G-line-based network that operates independently of the main data network and is aimed at carrying out barrier synchronizations efficiently. In the ideal case, our design takes only 4 cycles to perform a barrier synchronization once all cores or threads have arrived at the barrier. As a proof of concept, we examine the benefits of our proposal by comparing it with one of the best software approaches (a binary combining-tree barrier). To do so, we run several kernels and scientific applications on top of the Sim-PowerCMP simulator, which models a 32-core CMP with a 2D-mesh network configuration. Our proposal entails average reductions in execution time of 68% and 21% for kernels and scientific applications, respectively. Additionally, network traffic is also lowered by 74% and 18%, respectively.
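
The busy-waiting style of software barrier that the abstract contrasts against can be sketched as follows. This is a minimal illustration only: it shows a centralized sense-reversing barrier, not the binary combining-tree barrier used as the paper's baseline and not the G-line hardware mechanism. The names `SpinBarrier` and `run_demo` are hypothetical and do not come from the paper.

```python
import threading
import time


class SpinBarrier:
    """Centralized sense-reversing barrier that busy-waits on a shared flag.

    Illustrative assumption: a lock stands in for an atomic
    fetch-and-decrement on the shared counter.
    """

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.count = n_threads          # arrivals still expected in this episode
        self.sense = False              # shared flag that every waiter spins on
        self.lock = threading.Lock()    # protects the counter update
        self.local = threading.local()  # holds each thread's private sense

    def wait(self):
        # Each thread flips its private sense once per barrier episode.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count -= 1
            last = self.count == 0
            if last:
                # The last arrival resets the counter, then releases everyone.
                self.count = self.n_threads
                self.sense = my_sense
        if not last:
            while self.sense != my_sense:  # busy-wait on the shared variable
                time.sleep(0)              # yield so other threads can run


def run_demo(n_threads=4, rounds=3):
    """Run `rounds` barrier episodes and return the order of logged rounds."""
    barrier = SpinBarrier(n_threads)
    log = []
    log_lock = threading.Lock()

    def worker():
        for r in range(rounds):
            with log_lock:
                log.append(r)   # record which round this thread is in
            barrier.wait()      # nobody starts round r + 1 before all log r

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Every waiter polls the same `sense` flag and every arrival updates the same counter, which is the kind of shared-variable hot-spot the abstract describes; a combining tree spreads those accesses over several nodes, and the proposed G-line network moves them off the data network entirely.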
