How much does network contention affect distributed shared memory performance?

Authors: Donglai Dai, D.K. Panda

DOI: 10.1109/ICPP.1997.622680

Abstract: Most recent research on distributed shared memory (DSM) systems has focused on either the careful design of node controllers or cache coherence protocols. While evaluating these designs, simplified models of networks (constant latency or average latency based on the network size) are typically used. Such models completely ignore network contention. To help designers design better networks for DSM systems, in this paper we focus on two goals: 1) to isolate and quantify the impact of network link contention and network interface contention on the overall performance of DSM applications, and 2) to study the impact of critical architectural parameters on these two categories of contention. We achieve these goals by evaluating a set of SPLASH2 benchmarks on a DSM simulator using three network models. For an 8×8 wormhole system, our results show that network contention can degrade performance by up to 59.8%. Out of this, 7.2% is caused by [...] alone. The study indicates that network contention becomes dominant for systems with small caches, wide line sizes, low degrees of associativity, high processing speeds, and [...] widths.
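The two simplified network models named in the abstract (constant latency, and average latency based on the network size) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the paper: the 2D mesh topology, the function names, and the cycle counts are hypothetical. Neither function depends on how many messages are in flight, which is why models of this kind cannot capture contention.

```python
from itertools import product


def constant_latency(_src, _dst, cycles=20):
    """Constant-latency model: every message costs the same (hypothetical 20 cycles)."""
    return cycles


def average_hops(k):
    """Mean Manhattan distance between two uniformly chosen nodes of an assumed k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(a[0] - b[0]) + abs(a[1] - b[1]) for a in nodes for b in nodes)
    return total / len(nodes) ** 2


def average_latency(k, per_hop_cycles=2, fixed_cycles=10):
    """Average-latency model: one latency value derived only from the network size.

    per_hop_cycles and fixed_cycles are illustrative values, not measured ones.
    """
    return fixed_cycles + per_hop_cycles * average_hops(k)


if __name__ == "__main__":
    print(constant_latency((0, 0), (7, 7)))
    print(average_hops(8))
    print(average_latency(8))
```

A contention-aware model would instead have to track which links and network interfaces are busy at each point in simulated time, so that a message's latency grows when it must wait for a resource held by another message.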

References (11)
James R. Larus, Alvin R. Lebeck, Mark D. Hill, David A. Wood, Steven K. Reinhardt, James C. Lewis. The Wisconsin Wind Tunnel: virtual prototyping of parallel computers. Multiprocessor Performance Measurement and Evaluation, pp. 150-162 (1995).
John L. Hennessy, David A. Patterson. Computer Architecture: A Quantitative Approach (1989).
Donglai Dai, Dhabaleswar K. Panda. How Can We Design Better Networks for DSM Systems. PCRCW '97: Proceedings of the Second International Workshop on Parallel Computer Routing and Communication, pp. 171-184 (1997). DOI: 10.1007/3-540-69352-1_15
Chris Holt, Jaswinder Pal Singh, John Hennessy. Application and Architectural Bottlenecks in Large Scale Distributed Shared Memory Machines. International Symposium on Computer Architecture, vol. 24, pp. 134-145 (1996). DOI: 10.1145/232973.232988
Mark Heinrich, Jeffrey Kuskin, David Ofelt, John Heinlein, Joel Baxter, Jaswinder Pal Singh, Richard Simoni, Kourosh Gharachorloo, David Nakahira, Mark Horowitz, Anoop Gupta, Mendel Rosenblum, John Hennessy. The performance impact of flexibility in the Stanford FLASH multiprocessor. Architectural Support for Programming Languages and Operating Systems, vol. 29, pp. 274-285 (1994). DOI: 10.1145/195470.195569
D.C. Burger, D.A. Wood. Accuracy vs. performance in parallel simulation of interconnection networks. International Parallel Processing Symposium, pp. 22-31 (1995). DOI: 10.1109/IPPS.1995.395909
Steven Cameron Woo, Moriyoshi Ohara, Evan Torrie, Jaswinder Pal Singh, Anoop Gupta. The SPLASH-2 programs: characterization and methodological considerations. International Symposium on Computer Architecture, vol. 23, pp. 24-36 (1995). DOI: 10.1145/223982.223990
L.M. Ni, P.K. McKinley. A survey of wormhole routing techniques in direct networks. IEEE Computer, vol. 26, pp. 492-506 (1993). DOI: 10.1109/2.191995
Donglai Dai, D.K. Panda. Reducing cache invalidation overheads in wormhole routed DSMs using multidestination message passing. International Conference on Parallel Processing, vol. 1, pp. 138-145 (1996). DOI: 10.1109/ICPP.1996.537154
Jeffrey Kuskin, K. Gharachorloo, J. Hennessy, David Ofelt, Richard Simoni, D. Nakahira, J. Chapin, A. Gupta, M. Rosenblum, Mark Heinrich, M. Horowitz, John Heinlein, J. Baxter. The Stanford FLASH multiprocessor. International Symposium on Computer Architecture, vol. 22, pp. 302-313 (1994). DOI: 10.1145/191995.192056