Multi-homed Fat-Tree Routing with InfiniBand

Authors: Sven-Arne Reinemo, Bartosz Bogdanski, Bjorn Dag Johnsen

DOI: 10.1109/PDP.2014.22

Abstract: For clusters where the topology consists of a fat-tree, or more than one fat-tree combined into one subnet, there are several properties that the routing algorithms should support beyond what exists today. One of the missing properties is that the current fat-tree routing algorithm does not guarantee that each port on a multi-homed node is routed through redundant spines, even if these ports are connected to redundant leaves. As a consequence, in case of a spine failure, there is a small window where the node is unreachable until the subnet manager has rerouted to another spine. In this paper, we discuss the need for independent routes for multi-homed nodes in fat-trees by providing real-life examples in which a single point of failure leads to a complete outage of a multi-port node. We present and implement methods that may be used to alleviate this problem, and we perform simulations that demonstrate improvements in performance, scalability, availability and predictability of InfiniBand fat-tree topologies. We show that our methods not only increase performance by up to 52.6%, but also, more importantly, that there is no downtime associated with switch failure.
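
The failure mode described in the abstract can be illustrated with a small sketch. This is not the authors' routing algorithm. It is a toy check, under the assumption that a routing can be summarized as a mapping from each destination port to the single spine its traffic crosses. The function names (`shared_spine_nodes`, `unreachable_after_failure`) and the port and spine labels are hypothetical.

```python
from collections import defaultdict


def shared_spine_nodes(port_owner, port_spine):
    """Return multi-port nodes whose ports are all routed through one spine.

    port_owner: dict mapping port name -> node name (hypothetical labels)
    port_spine: dict mapping port name -> spine carrying routes to that port
    """
    spines_by_node = defaultdict(set)
    port_count = defaultdict(int)
    for port, node in port_owner.items():
        spines_by_node[node].add(port_spine[port])
        port_count[node] += 1
    return sorted(
        node
        for node, spines in spines_by_node.items()
        if port_count[node] > 1 and len(spines) == 1
    )


def unreachable_after_failure(port_owner, port_spine, failed_spine):
    """Return nodes with no port routed through a surviving spine.

    Models the window before any rerouting has taken place.
    """
    reachable = defaultdict(bool)
    for port, node in port_owner.items():
        still_up = port_spine[port] != failed_spine
        reachable[node] = reachable[node] or still_up
    return sorted(node for node, ok in reachable.items() if not ok)


if __name__ == "__main__":
    # Two dual-port nodes; both spines S1 and S2 are assumed to exist.
    owner = {"A.p1": "A", "A.p2": "A", "B.p1": "B", "B.p2": "B"}
    # Routing that happens to send both of A's ports through S1.
    naive = {"A.p1": "S1", "A.p2": "S1", "B.p1": "S1", "B.p2": "S2"}
    # Routing that gives each node's ports independent spines.
    independent = {"A.p1": "S1", "A.p2": "S2", "B.p1": "S1", "B.p2": "S2"}

    print(shared_spine_nodes(owner, naive))
    print(unreachable_after_failure(owner, naive, "S1"))
    print(unreachable_after_failure(owner, independent, "S1"))
```

In the `naive` mapping, node A is lost when S1 fails even though it has two ports, whereas in the `independent` mapping every node keeps at least one reachable port.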
