Authors: Jose L. Abellan, Juan Fernandez, Manuel E. Acacio
DOI: 10.1109/ICPP.2010.34
Keywords: Network on a chip, Shared memory, Computer science, Thread (computing), Context (language use), Scalability, Flow control (data), Parallel computing, Distributed computing, Synchronization, Interconnection
Abstract: Barrier synchronization in shared memory parallel machines has been widely implemented through busy-waiting on shared variables. However, typical implementations of barrier synchronization tend to produce hot-spots in terms of memory and network contention, thus creating performance bottlenecks that become markedly more pronounced as the number of cores or processors increases. To overcome such limitations, we present a novel hardware-based barrier mechanism in the context of many-core CMPs. Our proposal is based on global interconnection lines (G-lines) and the S-CSMA technique, which have recently been used to enhance flow control (EVC) in networks-on-chip. Based on this technology, we have designed a simple and scalable G-line-based network that operates independently of the main data network and is aimed at carrying out barrier synchronizations efficiently. In the ideal case, our design takes only 4 cycles to perform a barrier synchronization once all threads have arrived at the barrier. As a proof of concept, we examine the benefits of our proposal by comparing it with one of the best software approaches (a binary combining-tree barrier). To do so, we run several kernels and scientific applications on top of the Sim-PowerCMP simulator, which models a 32-core CMP in a 2D-mesh configuration. Our proposal entails average reductions in execution time of 68% and 21% for kernels and scientific applications, respectively. Additionally, traffic is also lowered by 74% and 18%,
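For readers unfamiliar with the software baseline the abstract criticizes, the following is a minimal sketch of a centralized, sense-reversing barrier that busy-waits on shared variables. It is an illustration only: it is neither the G-line-based hardware mechanism nor the binary combining-tree barrier evaluated in the paper, and the names `SpinBarrier` and `run_phases` are hypothetical. Every waiting thread polls the same flag, which is the kind of access pattern that tends to create hot-spots as the thread count grows.

```python
import threading
import time


class SpinBarrier:
    """Centralized sense-reversing barrier (illustrative, not the paper's design)."""

    def __init__(self, n_threads):
        self.n_threads = n_threads
        self.count = n_threads          # threads still expected in this episode
        self.sense = False              # shared flag that all waiters poll
        self.lock = threading.Lock()    # protects the shared counter
        self.local = threading.local()  # per-thread private sense

    def wait(self):
        # Flip this thread's private sense for the new barrier episode.
        my_sense = not getattr(self.local, "sense", False)
        self.local.sense = my_sense
        with self.lock:
            self.count -= 1
            last = self.count == 0
            if last:
                # Last arrival resets the counter and releases everyone.
                self.count = self.n_threads
                self.sense = my_sense
        if not last:
            # Busy-wait on the shared flag until the last arrival flips it.
            while self.sense != my_sense:
                time.sleep(0)  # yield so the demo stays quick under the GIL


def run_phases(n_threads=4, n_phases=3):
    """Run workers through several phases separated by the barrier."""
    barrier = SpinBarrier(n_threads)
    log = []
    log_lock = threading.Lock()

    def worker():
        for phase in range(n_phases):
            with log_lock:
                log.append(phase)   # "work" for this phase
            barrier.wait()          # nobody starts phase+1 early

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_phases())
```

Because no thread can pass the barrier until all have arrived, every entry for one phase is logged before any entry for the next, so the returned log is in non-decreasing phase order.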