Authors: Misbah Mubarak, Philip Carns, Jonathan Jenkins, Jianping Kelvin Li, Nikhil Jain
Keywords: Network performance, Multipath I/O, Latency (engineering), Discrete event simulation, Computer science, Computer network, Resource allocation, Network traffic control, Distributed computing, Input/output, Workload, Network topology, Network packet, Interconnection, Scheduling (computing), Resource management
Abstract: HPC systems have shifted to burst buffer storage and high-radix interconnect topologies in order to meet the challenges of large-scale, data-intensive scientific computing. Both of these technologies have been studied in detail independently, but the interaction between them is not well understood. I/O traffic and communication traffic from concurrently scheduled applications may interfere with each other in unexpected ways, and this behavior may vary considerably depending on resource allocation, scheduling, and routing policies. In this work, we analyze I/O and network traffic interference on burst-buffer-equipped, dragonfly-based systems using the high-resolution packet-level simulations provided by the CODES simulation framework. The analysis is performed using realistic I/O workload sizes, a variety of resource allocation strategies employed in production environments, and a dragonfly network configuration modeled after current vendor options. We analyze the impact on both I/O traffic and communication traffic. We observe that although the average network packet latency is stable across a wide variety of configurations, the maximum network packet latency in the presence of concurrent I/O traffic is highly sensitive to subtle policy changes. Our simulations reveal that the worst-case latency of a single packet can vary by as much as 4,700 times for sub-optimal configurations. While a topology-aware mapping of burst buffer nodes to compute nodes can minimize the variation in packet latency, it can slow down I/O traffic by creating contention on the burst buffer nodes. Overall, balancing I/O and network performance requires careful selection of routing, data placement, and job placement policies.