next up previous
Next: Graph Theory Background Up: Designing Computer Networks to Previous: Designing Computer Networks to

Introduction.

A computer network consists of host computers that are nodes of the network and communication links that connect the nodes. The nodes and links of a network are generally very reliable components, yet do occasionally fail. The failure of such a component, by itself, is not usually disastrous, and methods for recovering from such failures are well known. Furthermore, the failure of a network node or link will not typically affect network functioning for the nodes that are still in service as long as other links can be used to route messages that would normally be sent through the failed node or over the failed link. (Of course, network performance could be affected.)

However, serious problems can result if so many nodes and links fail that alternative routings are not possible for some messages. More precisely, a network is said to be partitioned if there exist two nodes, both of which are in service, such that no path exists from one node to the other through nodes and links of the network, all of which are in service. In other words, the two nodes are both in service as host computers, yet messages cannot be exchanged between them.

For many applications it is important that the network administrator choose a network topology that will minimize the probability of partitioning. For example, an area of application where network partitioning arises as a crucial measure of reliability is database availability. If data is stored in a distributed database at some host nodes but not at others, then network partitioning will often make it impossible to answer certain queries when required data becomes unavailable. Another database application that is clearly affected is atomic commitment and crash recovery for distributed transactions. In 1981, Dale Skeen [7] published a message based protocol, known as three-phase commit by later authors, whereby multiple nodes of a network could reliably agree on a consistent final state, commit or abort, in the presence of certain types of network failures. In particular, any number of nodes taking part in the transaction could fail and the remaining active nodes of the transaction would recognize these node failures but would not be ``blocked,'' i.e., left unable to make progress. Instead, by conferring among themselves, they would be able to complete their transition to a final state. At a later time, when failed nodes recovered, these nodes would be able to arrive at the same final state by examining messages previously written to durable media (such as journal logs) and by conferring with other nodes. It is noteworthy that the three-phase commit protocol is reliable in the presence of any set of node and link failures that do not result in a network partitioning. (As we explain in detail a bit later, network congestion can also result in a node or link being regarded as having failed, and hence to a perceived partitioning of the network.) When a network partition does occur and messages are lost, it was shown in a later paper by Skeen and Stonebraker [8], that at least some nodes that are in service will be forced to block until they can reestablish communication with other nodes. This means that data items being updated by the blocked transactions, some of which are likely to be extremely heavily referenced, will be unavailable to other transactions at the same host node, and all database applications may grind to a halt.

The three-phase commit protocol is fundamental, and has found its way into a number of texts on the subject (for example, [3,6,2]). In all these texts, the theoretical limitation of the three-phase commit protocol (that it cannot avoid blocking in the presence of network partition) is emphasized, with no consideration given to designing a network to limit the likelihood of such a partition. One sees this despite the fact that probabilistic considerations are already a fundamental part of the protocol. For example, it is necessary to assume that a network failure can be detected by a timeout period T in awaiting a message response. In [8], for example, the following statement appears.

Notice that we are assuming a somewhat idealized environment by precluding the possibility that a timeout is caused by a slow but correctly executing site. However by adjusting T and by providing low-level protocols for verifying failures, we can build systems that differentiate between failures and slow responses with an arbitrarily high degree of confidence.

Since networks become congested under high loads and can build up response queues with long delays under naive protocols, the ability to differentiate between failures and slow responses is predicated on being able to bound these delays. A protocol that employs binary exponential backoff [9, Section 3.4,] can bound the response time during bursts of high traffic by transferring the queue wait to the message transmission event. A communication failure is not detected until some threshold of retried transmission attempts have failed. Naturally, sufficiently high traffic exceeding capacity over an extended period will eventually cause queue failure, and we assume in what follows that the workload of the network is known in advance so that sufficient hardware can be purchased to handle traffic with short queues, except possibly during occasional short bursts. We also assume that no protocol errors intervene to simulate network component failure.

It is the purpose of this paper to make the following point: a network can be built on an arbitrary set of host nodes such that the possibility of partitioning can be ruled out with an arbitrarily high degree of confidence. This result assumes a simple stochastic model of network component failure for which it is possible to choose a timeout period and retry threshold that can differentiate between failure and slow response. This guarantee yields an exponential decrease in the probability of partitioning with only a linear increase in the cost of the network.

Clearly a guarantee to rule out partitioning has important implications for database availability as well as applicability of the three-phase commit protocol. It is a folk theorem in network theory that as one increases the budget for the number of links of a network, the reliability of the network can be increased by a judicious choice of network topology. The current paper makes this intuitive statement precise and analyzes one simple class of networks to illustrate the result. Analyses of other more complex (but more efficient) classes of networks will appear separately[5].



next up previous
Next: Graph Theory Background Up: Designing Computer Networks to Previous: Designing Computer Networks to



Kenneth Baclawski
Wed Nov 1 21:08:51 EST 1995