Quantifying I/O and Communication Traffic Interference on Dragonfly Networks Equipped with Burst Buffers

Authors: Misbah Mubarak, Philip Carns, Jonathan Jenkins, Jianping Kelvin Li, Nikhil Jain

DOI: 10.1109/CLUSTER.2017.25

Keywords: Network performance, Multipath I/O, Latency (engineering), Discrete event simulation, Computer science, Computer network, Resource allocation, Network traffic control, Distributed computing, Input/output, Workload, Network topology, Network packet, Interconnection, Scheduling (computing), Resource management

Abstract: HPC systems have shifted to burst buffer storage and high-radix interconnect topologies in order to meet the challenges of large-scale, data-intensive scientific computing. Both of these technologies have been studied in detail independently, but the interaction between them is not well understood. I/O traffic and communication traffic from concurrently scheduled applications may interfere with each other in unexpected ways, and this behavior may vary considerably depending on resource allocation, scheduling, and routing policies. In this work, we analyze I/O and network traffic interference on burst-buffer-equipped, dragonfly-based systems using high-resolution packet-level simulations provided by the CODES simulation framework. The analysis is performed using realistic I/O workload sizes, a variety of resource allocation strategies employed in production environments, and a dragonfly network configuration modeled after current vendor options. We analyze the impact of interference on both I/O traffic and communication traffic. We observe that although the average network packet latency is stable across a wide variety of configurations, the maximum network packet latency in the presence of concurrent I/O traffic is highly sensitive to subtle policy changes. Our simulations reveal that the worst-case latency of a single packet can be as high as 4,700 times the average latency for sub-optimal configurations. While topology-aware mapping of compute nodes to burst buffer nodes can minimize the variation in packet latency, it can slow down I/O traffic by creating contention on the burst buffer nodes. Overall, balancing I/O and network performance requires careful selection of routing, data placement, and job placement policies.
