Authors: Jingjing Wang, Ketan Bahulkar, Dmitry Ponomarev, Nael Abu-Ghazaleh
Keywords: Scalability, Distributed computing, Bottleneck, Computer science, Parallel computing, Polling, Software, Shared memory, Discrete event simulation, Latency (engineering), Chip
Abstract: The performance and scalability of Parallel Discrete Event Simulation (PDES) is often limited by communication latencies and overheads. The emergence of multi-core processors and their expected evolution into many-cores offers the promise of low-latency communication and tight memory integration between cores; these properties should significantly improve the performance of PDES in such environments. However, on clusters of multi-cores (CMs), the latency and processing overheads incurred when communicating between different machines (nodes) far outweigh those between cores on the same chip, especially when commodity networking fabrics and communication software are used. It is unclear if there is any benefit to the low latency among cores on the same node given that the communication links across nodes are significantly worse. In this study, we examine the performance of a multi-threaded implementation of PDES on CMs. We demonstrate that inter-node communication costs impose a substantial bottleneck on PDES and that, without optimizations addressing these long latencies, multi-threaded PDES does not significantly outperform the multiprocess version despite direct communication through shared memory on individual nodes. We then propose three optimizations: message consolidation and routing, infrequent polling, and latency-sensitive model partitioning. We show that with these optimizations in place, the threaded PDES implementation significantly outperforms the process-based implementation even