Machine Problem 5: A Distributed Hash Table

Out: Tuesday, April 1, 2008

Due: Sunday, April 20, 2008

For this problem, you will implement a distributed hash table, much like the MIT Chord system.

Introduction

The Chord system uses an m-bit identifier space to assign identifiers to both nodes and keys. Both keys and nodes are hashed using a hash function like

   proc hash(x) = SHA-1(x) mod 2^m .

We call this the identifier of the key or node. For the rest of this discussion, we will assume that keys and nodes have already been hashed, and just work with the identifiers. Also, all arithmetic is mod 2^m.

Thus the nodes conceptually form a ring with 2^m members. Of course not all 2^m nodes are present at any time. A (key, value) pair will be stored in the node with the smallest identifier id &ge k, where k is the hash of key. This node is referred to as the successor of k, and is denoted succ(k).

A query (either an insert or a lookup) on a key k can come from any node, so the first hard problem in DHTs is to find an efficient way of computing succ(k). Linear search obviously works, but doesn't scale. Chord uses a "finger table" that allows any node to compute succ(k) in time linear in m. You can look in any of the many good tutorials on DHTs for a detailed description of how this is done.

A node can also leave the ring. It may stay out of the ring, or rejoin later. You will have to figure out how to move data and finger tables around so that the ring works, even though nodes are constantly joining and leaving.

Normally, a new node inserts itself in the ring at the index hash(ip_address). Since you have only a few machines, you will have to run several nodes on a single machine. So you will assign each node a name (assumed locally, but not globally, unique) to distinguish it from the other nodes running on the same machine. The node will insert itself at index hash(ip_address++name), where ++ denotes concatenation.

Mini Man page

dht-node name port [ring_ip_addr other_name] [-r] [-v] [-i script_file]
name the name of this node
port the port on which nodes should communicate in this ring
[ring_ip_addr other_name] The IP address and handle of a known element of the ring. This need not be the root of the ring. Do not include if -r is specified
-r This is the root of the ring
-v Verbose. Produce a log of the activity of this node. The log should include at least:
  • Each query received by the node, and the action taken (insert, retrieve, or forward).
  • A record of each change to the finger table.
  • If an adjacent node joins or leaves, a record of the changes to the local data table.
-i Take inputs from script_file

If -v is set, each node should log (among other things) how many steps it took for a query to reach it. Use a hop count to keep track of how many steps it takes you to resolve a key.

Any client can add a (key, datum) pair, and any client can look up a key. We assume a keys and datums are both strings. You may assume reasonable limits on the length of these strings.

A client must support the following commands. They should appear in the script in the form shown below, one per line.

join() Join the ring specified in the command line. If the node specified in the command line has left the ring, this should fail.
leave() Leave the ring.
wait(n) Wait n milliseconds. (Useful in a script between a leave() and a subsequent join() ).
put(key, value) Put into the DHT the (key,data) pair. key and data are both strings. You may assume reasonable limits on the length of these strings. The data will be stored at the node responsible for KEY. What this does if there is already a pair for KEY is up to you.
get(key) Get the data corresponding to KEY. What this does if there is no pair for KEY is up to you.

For your implementation, set m = 8 (ie, there can be up to 256 nodes in your ring). You will need to have quite a few more than m nodes running, so you can tell whether your finger table is working correctly.

Don't worry about optimizing the lookup of the key inside a node. That's not what this problem is about.

Evaluation

Root node runs and processes keys. 5 points
New nodes can join, both on the same machine and on different machines, and both joining at the root node and at other running nodes. Both root and new nodes can put and get data. 10 points
All puts and gets require only about m transmissions, even if the ring has a lot more than m members. 5 points
Nodes can leave the ring and their data is not lost. 5 points
Nodes that have left can rejoin the ring. 5 points
Nodes can leave the ring without notice (ie, fail) without the ring losing data. Do this by storing data redundantly, in two nodes (!) 10 points (extra credit)
Add a discovery protocol, so when a node wants to join, it broadcasts on a well-known port (specified on the command line) to find an existing node. If you do this, then you don't have to supply the ring_ip_addr and other_name parameters in the command line. 10 points (extra credit)

Collaboration

As usual, if you got portions of your program from external libraries (on the internet or wherever), tell us what they are and where you got them. Otherwise we will assume that you wrote every line yourself (except as noted below), and nasty things will happen if that turns out to be false.

You need not code your own implementation of SHA-1. Get one from a library.

Also, I'd rather have you working on the interesting parts of the problem than the stupid parts. So feel free to collaborate on the stupid parts, such as parsing the input files, producing log files, etc. Just tell me the parts you collaborated on, and with whom. For LOC, please report the total LOC for the project.

Turnin procedure

The turnin procedure is the same as for the previous labs. We will be building and testing your submission. You can assume that we have the following tools: build-essential, sun-java6-jdk, mono-gmcs, ocaml-nox, scheme, and ghc-6.8.2. If your submission requires any other tools, you should list them in your README or INSTALLATION files, and include information on where to download and how to install them.

Your deliverables should include a description of your program and its parts. This should include a description of the algorithms you used and documentation of the code (eg, which modules implement which portions of the algorithm). The documentation and comments in your code should make it possible for a reader who knows the algorithms to read your code easily. Note that there will be algorithms for several different portions of the problem.

Submit your package as a gzipped tar file. You can build this by doing something like

mkdir mp5
cp {all files} mp5
tar cvf mp5.tar mp5
gzip mp5.tar
mv mp5.tar.gz yourname-mp5.tar.gz
Send yourname-mp5.tar.gz to xindong@ccs.neu.edu.

Last modified: Tue Apr 15 22:09:20 EDT 2008