An Efficient Fault-Tolerant Routing Methodology for Fat-Tree Interconnection Networks
2007; Springer Science+Business Media; Language: English
10.1007/978-3-540-74742-0_46
ISSN 1611-3349
Authors: C. Gómez, María Engracia Gómez Requena, Pedro López, J. Duato
Topic(s): Parallel Computing and Optimization Techniques
Abstract: In large cluster-based machines, fault tolerance in the interconnection network is an issue of growing importance, since the increasing size of these machines raises the probability of failure. The topology used in these machines is usually a fat-tree. This paper proposes a new distributed fault-tolerant routing methodology for fat-trees. It does not require additional network hardware. It is scalable, since the required memory, switch hardware and routing delay do not depend on the network size. The methodology is based on enhancing the Interval Routing scheme with exclusion intervals. An exclusion interval is associated with each switch output port and represents the set of nodes that are unreachable from that port after a failure occurs. We propose a mechanism to identify the exclusion intervals that must be updated after a failure is detected, and the values to write to them. Our methodology is able to support a relatively high number of network failures with low degradation in network performance.
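The idea of pairing each output port with an exclusion interval can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `OutputPort` class, the `eligible_ports` function, the inclusive `(low, high)` encoding of intervals, the single exclusion interval per port, and the four-port switch in a 16-node network are all assumptions made for illustration. The abstract does not specify how intervals are encoded or how the values to write are computed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: an interval is an inclusive (low, high) range of node IDs.
Interval = Tuple[int, int]


@dataclass
class OutputPort:
    """Hypothetical per-port routing state for interval routing."""
    routing: Interval                     # destinations reachable through this port
    exclusion: Optional[Interval] = None  # destinations unreachable after a failure

    def allows(self, dest: int) -> bool:
        """Return True if dest may be forwarded through this port."""
        low, high = self.routing
        if not low <= dest <= high:
            return False
        if self.exclusion is not None:
            ex_low, ex_high = self.exclusion
            if ex_low <= dest <= ex_high:
                return False
        return True


def eligible_ports(ports: List[OutputPort], dest: int) -> List[int]:
    """Indices of all ports whose intervals permit forwarding to dest."""
    return [i for i, port in enumerate(ports) if port.allows(dest)]


# Hypothetical switch: two down ports and two up ports, 16 nodes in total.
ports = [
    OutputPort(routing=(0, 3)),    # down port
    OutputPort(routing=(4, 7)),    # down port
    OutputPort(routing=(0, 15)),   # up port
    OutputPort(routing=(0, 15)),   # up port
]

before = eligible_ports(ports, 9)

# Assumed failure: nodes 8..11 can no longer be reached through port 2,
# so only that port's exclusion interval is written.
ports[2].exclusion = (8, 11)

after = eligible_ports(ports, 9)
```

In this sketch, handling a failure only writes one small field on the affected port and leaves the routing intervals untouched, which is consistent with the abstract's claim that memory and routing delay do not depend on network size.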