Sharing Resources in Distributed Systems

Abstract

An important goal of a distributed system is to effectively utilize the collective resources of the system, namely, the memory and the processors of the individual nodes. This dissertation addresses certain problems pertaining to sharing memory and processors in distributed systems.

In the first part of the dissertation, we study two important issues that arise while sharing memory in a distributed system: memory contention and faults. We adopt a model of computation in which each node can send or receive at most a constant number of objects per step. Our main result is a simple protocol for providing fast access to shared objects in an environment in which memory contention is unconstrained and a constant fraction of the nodes and communication links may be faulty at any time. Our protocol combines techniques for hashing and erasure codes with a new mechanism for on-line replication. We show that if each new access request is chosen according to a fixed probability distribution over the set of objects, then our protocol rapidly reaches a steady state in which each request is satisfied in an expected constant number of steps and the throughput of the protocol is asymptotically optimal. We also prove that the protocol continues to remain in a steady state if changes in the access pattern are moderate.

The second part of the dissertation studies load balancing, which is a mechanism for sharing processors in a distributed system. We analyze the effectiveness of a local load balancing algorithm in which each node repeatedly balances its load with its neighbors. Our main results concern the static version of the problem where we assume that the total load does not change with time. We analyze the local balancing algorithm in terms of both the imbalance of the initial load distribution and several parameters of the network, including the number of nodes, the maximum degree, and the edge expansion. We show that the algorithm is worst-case optimal for all networks. We improve this result for the special case of ring networks by showing that the local balancing approach is optimal (up to an additive linear term) for every initial distribution of load on the ring. Our results are also shown to hold for an asynchronous model of computation.

This dissertation demonstrates that a number of basic resource sharing problems admit efficient solutions in the form of simple local algorithms. Our algorithms are simple in the sense that the program running at each node can be expressed as a periodic process that repeatedly executes a small number of fixed operations. Our algorithms are local in the sense that each node either communicates with only its nearest neighbors or sends messages to only a small number of other nodes.

Sharing Resources in Distributed Systems

Abstract

Postscript