Coherent Network Interfaces for Fine-Grain Communication

Authors: Shubhendu S. Mukherjee, Babak Falsafi, Mark D. Hill, David A. Wood

DOI: 10.1145/232973.232999

Abstract: Historically, processor accesses to memory-mapped device registers have been marked uncachable to ensure their visibility to the device. The ubiquity of snooping cache coherence, however, makes it possible for processors and devices to interact with cachable, coherent memory operations. Using coherence can improve performance by facilitating burst transfers of whole cache blocks and reducing control overheads (e.g., for polling). This paper begins an exploration of network interfaces (NIs) that use coherence---coherent network interfaces (CNIs)---to improve communication performance. We restrict this study to NI/CNIs that reside on coherent memory or I/O buses, to NI/CNIs that are much simpler than processors, and to the performance of fine-grain messaging from user process to user process. Our first contribution is to develop and optimize two mechanisms that CNIs use to communicate with processors. A cachable device register---derived from cachable control registers [39,40]---is a coherent, cachable block of memory used to transfer status, control, or data between a device and a processor. Cachable queues generalize cachable device registers from one cachable, coherent memory block to a contiguous region of cachable, coherent memory blocks managed as a circular queue. Our second contribution is a taxonomy and comparison of four CNIs with a more conventional NI. Microbenchmark results show that CNIs can improve the round-trip latency and achievable bandwidth of a small 64-byte message by 37% and 125% respectively on a memory bus, and by 74% and 123% respectively on a coherent I/O bus. Experiments with five macrobenchmarks show that CNIs can improve performance by 17-53% on the memory bus and 30-88% on the I/O bus.
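
To make the cachable-queue idea concrete, here is a minimal software-only sketch of a contiguous region of 64-byte blocks managed as a circular queue. It is an illustration under stated assumptions, not the paper's design: the class name `CachableQueueModel`, the head/tail index layout, and the choice to leave one slot unused are all hypothetical, and nothing here models cache coherence, bus transactions, or the NI hardware itself.

```python
BLOCK_SIZE = 64  # bytes per slot; chosen to match the 64-byte message in the abstract


class CachableQueueModel:
    """Software-only model of a circular queue of fixed-size blocks.

    Hypothetical illustration: the head/tail layout and method names are
    assumptions and are not taken from the paper.
    """

    def __init__(self, num_blocks: int) -> None:
        if num_blocks < 2:
            raise ValueError("need at least two blocks")
        self.num_blocks = num_blocks
        # One contiguous region, carved into num_blocks block-sized slots.
        self.region = bytearray(num_blocks * BLOCK_SIZE)
        self.head = 0  # index of the next slot the consumer reads
        self.tail = 0  # index of the next slot the producer writes

    def is_empty(self) -> bool:
        return self.head == self.tail

    def is_full(self) -> bool:
        # One slot stays unused so that "full" and "empty" are distinguishable.
        return (self.tail + 1) % self.num_blocks == self.head

    def enqueue(self, payload: bytes) -> bool:
        """Write one message into the tail slot; return False if the queue is full."""
        if len(payload) > BLOCK_SIZE:
            raise ValueError("payload exceeds one block")
        if self.is_full():
            return False
        start = self.tail * BLOCK_SIZE
        # Zero-pad so each message occupies exactly one whole block.
        self.region[start:start + BLOCK_SIZE] = payload.ljust(BLOCK_SIZE, b"\x00")
        self.tail = (self.tail + 1) % self.num_blocks
        return True

    def dequeue(self):
        """Return the head block as bytes, or None if the queue is empty."""
        if self.is_empty():
            return None
        start = self.head * BLOCK_SIZE
        block = bytes(self.region[start:start + BLOCK_SIZE])
        self.head = (self.head + 1) % self.num_blocks
        return block
```

In this model a sender would call `enqueue` once per message and a receiver would poll `is_empty` before calling `dequeue`. In a coherent implementation, the appeal described in the abstract is that such polling can hit in the processor's cache until the device actually writes a new block.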

References (40)
Duncan Roweth. Computing Surface 2. Supercomputer '93: Anwendungen, Architekturen, Trends, Seminar, pp. 36-45 (1993). DOI: 10.1007/978-3-642-78348-7_5
Shlomo Weiss, James E. Smith. POWER and PowerPC (1994).
Derek Chiou, Boon S. Ang, Robert Greiner, Arvind, James C. Hoe, Michael J. Beckerle, James E. Hicks, Andy Boughton. StarT-NG: Delivering Seamless Parallel Computing. European Conference on Parallel Processing, pp. 101-116 (1995). DOI: 10.1007/BFB0020458
J. Kubiatowicz, A. Agarwal, K. Mackenzie, M. F. Kaashoek. FUGU: Implementing Translation and Protection in a Multiuser, Multimodel Multiprocessor. Massachusetts Institute of Technology (1994).
C. Thompson. Special Interest Group. The Psychiatrist, vol. 19, p. 185 (1995). DOI: 10.1192/PB.19.3.185
Fredrik Dahlgren. Boosting the performance of hybrid snooping cache protocols. International Symposium on Computer Architecture, vol. 23, pp. 60-69 (1995). DOI: 10.1145/223982.223998
Vijay Karamcheti, Andrew A. Chien. A comparison of architectural support for messaging in the TMC CM-5 and the Cray T3D. International Symposium on Computer Architecture, vol. 23, pp. 298-307 (1995). DOI: 10.1145/223982.224440
Eric A. Brewer, Frederic T. Chong, Lok T. Liu, Shamik D. Sharma, John D. Kubiatowicz. Remote queues. Proceedings of the Seventh Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '95), pp. 42-53 (1995). DOI: 10.1145/215399.215416
Charles E. Leiserson, David S. Wells, Monica C. Wong, Shaw-Wen Yang, Robert Zak, Zahi S. Abuhamdeh, David C. Douglas, Carl R. Feynman, Mahesh N. Ganmukhi, Jeffrey V. Hill, Daniel Hillis, Bradley C. Kuszmaul, Margaret A. St. Pierre. The network architecture of the Connection Machine CM-5 (extended abstract). Proceedings of the Fourth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA '92), pp. 272-285 (1992). DOI: 10.1145/140901.141883
A. Krishnamurthy, D. E. Culler, A. Dusseau, S. C. Goldstein, S. Lumetta, T. von Eicken, K. Yelick. Parallel programming in Split-C. Conference on High Performance Computing (Supercomputing), pp. 262-273 (1993). DOI: 10.1145/169627.169724