Authors: Sven-Arne Reinemo, Bartosz Bogdanski, Bjorn Dag Johnsen
DOI: 10.1109/PDP.2014.22
Keywords:
Abstract: For clusters whose topology consists of a fat-tree, or of more than one fat-tree combined into a single subnet, there are several properties that the routing algorithm should support beyond what exists today. One missing property is that the current fat-tree routing algorithm does not guarantee that each port on a multi-homed node is routed through redundant spines, even if these ports are connected to redundant leaves. As a consequence, when a spine fails, there is a small window during which the node is unreachable until the subnet manager has rerouted traffic to another spine. In this paper, we discuss the need for independent routes for multi-homed nodes in fat-trees by providing real-life examples of when a single point of failure leads to a complete outage of a multi-port node. We present and implement methods that may be used to alleviate this problem, and we perform simulations that demonstrate improvements in the performance, scalability, availability, and predictability of InfiniBand fat-tree topologies. We show that our methods not only increase performance by up to 52.6%, but also, more importantly, that there is no downtime associated with a switch failure.
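To make the failure mode in the abstract concrete, here is a toy Python sketch under a deliberately simplified, hypothetical model. It is not the authors' methods and not OpenSM's fat-tree routing engine. Each leaf lists the `(node, port)` endpoints attached to it, and routing picks one spine per endpoint: the spine through which traffic addressed to that port descends. A node is cut off by a spine failure exactly when every one of its ports is routed through that spine. The function names `per_leaf_round_robin`, `multi_homed_aware`, and `unreachable_after_failure` are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy model: route[(node, port)] = index of the spine through
# which traffic destined to that port is routed down to its leaf.


def per_leaf_round_robin(leaves, num_spines):
    """Balance each leaf's endpoints over spines independently, ignoring
    which node owns each port (this can put all ports of a node on one spine)."""
    route = {}
    for endpoints in leaves.values():
        for i, ep in enumerate(endpoints):
            route[ep] = i % num_spines
    return route


def multi_homed_aware(leaves, num_spines):
    """Prefer a spine not yet used by another port of the same node,
    breaking ties by the least-loaded spine."""
    route = {}
    used = defaultdict(set)  # node -> spines already used by its ports
    load = [0] * num_spines
    for endpoints in leaves.values():
        for node, port in endpoints:
            free = [s for s in range(num_spines) if s not in used[node]]
            candidates = free or list(range(num_spines))
            spine = min(candidates, key=lambda s: load[s])
            route[(node, port)] = spine
            used[node].add(spine)
            load[spine] += 1
    return route


def unreachable_after_failure(route, failed_spine):
    """Return the nodes whose every port is routed through the failed spine,
    i.e. nodes unreachable until the routing is recomputed."""
    by_node = defaultdict(set)
    for (node, _port), spine in route.items():
        by_node[node].add(spine)
    return sorted(n for n, spines in by_node.items() if spines == {failed_spine})


if __name__ == "__main__":
    # Two dual-ported nodes, A and B, each with one port on each of two leaves.
    leaves = {"leaf0": [("A", 0), ("B", 0)], "leaf1": [("A", 1), ("B", 1)]}
    naive = per_leaf_round_robin(leaves, num_spines=2)
    aware = multi_homed_aware(leaves, num_spines=2)
    print(unreachable_after_failure(naive, 0))  # ['A']
    print(unreachable_after_failure(aware, 0))  # []
```

In the naive assignment both ports of node A land on spine 0, so a single spine failure isolates A until rerouting. The aware variant spreads each node's ports over distinct spines whenever enough spines exist, so no single spine failure isolates a node in this toy example.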