Synchronization Coherence: A Transparent Hardware Mechanism for Cache Coherence and Fine-Grained Synchronization

Authors: Csaba Andras Moritz, Richard Weiss, Raksit Ashok, Vladimir Vlassov, Yao Guo

Keywords: CPU cache, Distributed computing, Cache coherence, Computer science, Data synchronization, Chip, Multiprocessing, Coherence (physics), Bottleneck, Performance improvement

Abstract: The quest to improve performance forces designers to explore finer-grained multiprocessor machines. Ever increasing chip densities based on CMOS improvements fuel research in highly parallel multiprocessors with 100s of processing elements. With such levels of parallelism, synchronization is set to become a major bottleneck, and efficient support for synchronization an important design criterion. Previous research has shown that integrating support for fine-grained synchronization can have significant performance benefits compared to traditional coarse-grained synchronization. Not much progress has been made in supporting fine-grained synchronization transparently to processor nodes: a key reason, perhaps, why wide adoption has not followed. In this paper, we propose a novel approach called Synchronization Coherence that can provide transparent fine-grained synchronization and caching in a multiprocessor machine and a single-chip multiprocessor. Our approach merges fine-grained synchronization mechanisms with cache coherence protocols. It reduces network utilization as well as synchronization-related processing overheads while adding minimal hardware complexity compared to cache coherence protocols or previously reported fine-grained synchronization techniques. In addition to its benefit of making synchronization transparent to processor nodes, for the applications studied, it provides up to 23% performance improvement and 24% energy efficiency improvement with no L2 caches compared to previous fine-grained synchronization techniques. The performance improvement increases to 38% when simulating an ideal system.
