Authors: Shubhendu S. Mukherjee, Babak Falsafi, Mark D. Hill, David A. Wood
Keywords:
Abstract: Historically, processor accesses to memory-mapped device registers have been marked uncachable to ensure their visibility to the device. The ubiquity of snooping cache coherence, however, makes it possible for processors and devices to interact with cachable, coherent memory operations. Using coherence can improve performance by facilitating burst transfers of whole cache blocks and reducing control overheads (e.g., for polling). This paper begins an exploration of network interfaces (NIs) that use coherence, called coherent network interfaces (CNIs), to improve communication performance. We restrict this study to NI/CNIs that reside on coherent memory or I/O buses, to NI/CNIs that are much simpler than processors, and to the performance of fine-grain messaging from user process to user process. Our first contribution is to develop and optimize two mechanisms that CNIs use to communicate with processors. A cachable device register, derived from cachable control registers [39,40], is a coherent, cachable block of memory used to transfer status, control, or data between a CNI and a processor. Cachable queues generalize cachable device registers from one cachable, coherent memory block to a contiguous region of cachable, coherent blocks managed as a circular queue. Our second contribution is a taxonomy and comparison of four CNIs with a more conventional NI. Microbenchmark results show that CNIs can improve the round-trip latency and achievable bandwidth of a small 64-byte message by 37% and 125% respectively on a memory bus, and by 74% and 123% respectively on a coherent I/O bus. Experiments with five macrobenchmarks show that CNIs can improve performance by 17-53% on the memory bus and 30-88% on the I/O bus.
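
The cachable queue described in the abstract (a contiguous region of blocks managed as a circular queue) can be pictured with a small software model. The Python sketch below is only an illustration under stated assumptions and is not the paper's design: the class name `CachableQueueSketch`, its method names, and the choice to keep one slot unused so that "full" and "empty" are distinguishable are all hypothetical, and the model does not simulate cache coherence or bus traffic. The 64-byte block size is borrowed from the message size quoted in the abstract.

```python
BLOCK_SIZE = 64  # bytes per block (assumption: equal to the 64-byte message size)


class CachableQueueSketch:
    """Toy model of a region of fixed-size blocks managed as a circular queue."""

    def __init__(self, num_blocks):
        # One slot is always left unused, so head == tail can only mean "empty".
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks
        self.head = 0  # index of the next block the consumer will read
        self.tail = 0  # index of the next block the producer will write

    def is_empty(self):
        return self.head == self.tail

    def is_full(self):
        return (self.tail + 1) % len(self.blocks) == self.head

    def enqueue(self, payload):
        """Producer side: write one whole block, then advance the tail."""
        if len(payload) > BLOCK_SIZE:
            raise ValueError("payload exceeds one block")
        if self.is_full():
            return False
        # Pad to a full block so every transfer is exactly BLOCK_SIZE bytes.
        self.blocks[self.tail] = payload.ljust(BLOCK_SIZE, b"\0")
        self.tail = (self.tail + 1) % len(self.blocks)
        return True

    def dequeue(self):
        """Consumer side: return the oldest block, or None if nothing is queued."""
        if self.is_empty():
            return None
        block = self.blocks[self.head]
        self.head = (self.head + 1) % len(self.blocks)
        return block
```

In this model a sender would call `enqueue` once per message and a receiver would poll with `dequeue`; a queue created with `num_blocks = 4` holds at most three messages because of the unused slot.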