CSG 712 Machine Problem 4: Distributed, Multi-Party Chat

Out: Wednesday, March 12, 2008

Due: ~~Thursday, March 27, 2008~~ Sunday, March 30, 2008

Preliminaries

In this lab, you will implement a distributed, peer-to-peer, multi-party chat server. We'll provide you with a simple client program; you'll only need to build the server. Each server can handle multiple clients, but, for increased scalability, there will be multiple servers as well. The chief difficulty with multiple servers is enforcing an ordering on the messages in the absence of synchronized clocks. In order not to confuse the members of the chat, the participants' messages should appear in the same order at every client. A naive implementation would likely have clients at different servers see messages in a different order due to the variation in network latencies between servers. Your task for this assignment is to implement a protocol between the servers that ensures a uniform ordering. We'll make several assumptions about the network to make the problem more tractable, however. Doing it for real would be considerably more difficult.

Architecture

The system will consist of two kinds of modules: clients and servers.

Each client will connect to one server via a durable TCP connection. Each client will have a handle, which is assumed to be globally unique. The client will send to the server a sequence of text lines. The client does not echo messages locally. Instead, it sends messages entered at the client on to the server and displays them only when the server sends them back. This allows the server(s) to control the order in which all messages are displayed at each client.

There will be some number of servers, all connected via a mesh network: each server will be connected directly to all other servers. Each server will transmit each message, along with the handle of the client who sent it, to its live clients and to all the other servers.

Your fundamental task is to coordinate the servers so that all the clients on all the servers see all the messages in the same order.

Thus if client1 and client2 both transmit the sequence of lines

This is message one.
This is message two.
This is message three.
This is message four.
This is message five.

then all clients should see the same sequence of messages, which will be something like

client1: This is message one.
client1: This is message two.
client2: This is message one.
client2: This is message two.
client1: This is message three.
client2: This is message three.
client1: This is message four.
client2: This is message four.
client1: This is message five.
client2: This is message five.

Observe that the messages from client1 and client2 are interleaved in an arbitrary order, but the messages from each client appear in order.

Specification

Each client will take three command-line parameters: a hostname to connect to (given as a resolvable domain name or an IP address), a port to connect to, and a handle (assumed to be unique in the system). We provide a sample client program that takes three command-line parameters: a hostname and port of a server to connect to, and a name or handle to use during the chat. The client supports Emacs line editing commands through the GNU readline library, so you should find it easy to use. The client communicates with its server over a TCP connection using a very simple protocol. You don't need to modify the client for this lab, although you might find it useful to automate the client for testing purposes.

Your server will take a set of command-line arguments including its ID number (which is guaranteed to be unique in the system), a port number to use for network communication, and a list of hostnames and ports for the other servers in the network. It will then forward messages from any clients connected to it to any other directly-connected clients and all other servers. Similarly, it should forward any messages received from other servers to its own clients.

You are responsible for the following requirements:

Each server must forward any received messages to all its (live) clients.
All (currently live) servers must forward all messages from a local client to all other (currently live) servers in addition to locally attached clients.
All messages from clients of live servers must be displayed in the same order on all live clients.
Messages from any particular client must be displayed in the order in which they were submitted. (Note that the preceding item means that messages may be reordered to obtain a uniform ordering; this requirement is a limit on the acceptable reorderings.)
Every server must display a message within two seconds of its original submission at an initial server---assuming the original server is still alive. Servers who join after a message is submitted need not display the message at their clients.
Servers can join or fail at any time.
The chat should continue despite the failure of one or more servers. Any clients connected to a dead server may not be able to send or receive further messages, but clients connected to live servers must continue to be able to send and receive messages. What happens to messages previously sent (but not yet displayed) by clients of failed servers is up to you, but any clients that do display such messages must all display them in the same order. In particular, you are free to display messages from clients of dead servers more than two seconds after they were submitted.
Your server must work both with servers running on the same machine (on a different port) and different machines.
You must not depend on any client behavior other than immediate, FIFO printing of received messages.
You cannot depend on the local server clocks to be synchronized.

You are allowed to assume the following (non-realistic) things:

All servers will have a globally unique identifier, specified at startup.
Every server will be provided with the location of all other servers at startup. No servers other than those specified on the command line will join the network. Specified servers may join at arbitrary points in time, however, and, indeed, may never actually join.
A server may die, but will never return with the same identifier during the lifetime of the system. In other words, you can assume fail-stop behavior. (Extra credit: allow a server to rejoin the system).
The maximum propagation delay between any two servers is 100 ms.
The delay between any client and its server is negligible.
The network never drops any packets. The behavior of your servers in the face of packet loss is undefined.

Hints

You may wish to use a protocol like that on page 62 of Birman and Joseph (1985).
You may wish to require your servers to broadcast a heartbeat, or to respond to an "Are you there?" message, to determine whether a particular server is alive or not.

Starting off

We have provided an initial skeleton directory to get you started. It is available as chat.tgz in the machprobs directory. The tarball contains three files: chat-client.C, Makefile, and chat-server.C. chat-server.C contains some initial code to get you started (if you want to write your server in C). Currently, it accepts client connections and broadcasts any received messages to all its attached clients. It does not yet communicate with any other servers. You can run a server with the following command line:

% ./chat-server 1 2255 localhost 2256

where 1 is the globally unique ID of this server, and 2225 is the port you want it to listen on. localhost and 2256 are the hostname and port of another server in the network. You can have as many hostname:port pairs as you'd like.

For the sake of discussion, start another server in a separate window with a similar command:

% ./chat-server 2 2256 localhost 2255

You can connect to a server with the provided client from another window in the following fashion:

% ./chat-client localhost 2255 Client1

In a fourth window, you should start up another client with a slightly different command line:

% ./chat-client localhost 2255 Client2

Any messages typed at either client should appear at both. You can exit from the provided client by typing CTRL-D by itself at the prompt. If, on the other hand, you connect the second client to the other server, with the following command,

% ./chat-client localhost 2256 Client2

you'll see that the messages are not passed between the servers. For your assignment, you'll need to fix that up. An obvious first step is to simply have each server send any messages from its clients to all other servers. The hard part is making sure they're displayed in the same order at all clients.

Testing your chat server

In order to ease testing, we've made it possible for you to drive the client from a file. You should pass the -i flag to the client when redirecting standard input so that it disables line editing. Here's one possible way to test your servers:

First, create a file, say messages.txt, such as the one below:

This is message one.
This is message two.
This is message three.
This is message four.
This is message five.

You can then use this file to drive two clients simultaneously, like so:

./chat-client -i localhost 2255 bot1 < messages.txt > output1.txt &; \
./chat-client -i localhost 2255 bot2 < messages.txt > output2.txt &

This will cause two clients, called 'bot1' and 'bot2', to simultaneously connect to the server at localhost:2255 and send the set of messages contained in messages.txt. The output of each will go into files output1.txt and output2.txt, respectively. Notice that I've arranged things so that a single shell command starts both clients in the background.

If you try this, you'll see that the order in output1.txt and output2.txt is the same. Once you can issue the same command, but connect the clients to different servers, and the order still comes out the same, you're well on your way. Keep in mind that we'll be testing with a lot more than two servers, and killing some off in the middle.

Evaluation

Single server correctly echoes multiple clients: 5 points.
Single server correctly echoes multiple clients even when clients disconnect: 5 points.
Multiple servers correctly coordinate messages: 10 points.
Multiple servers correctly coordinate messages even when servers fail: 5 points.
Clients allowed to drop out and rejoin: 5 points (extra credit).
Servers allowed to drop out and rejoin: 5 points (extra credit).
Instead of giving server a list of other servers, allow other servers to join a server at will: 5 points (extra credit).

Turnin procedure

The turnin procedure is the same as for the previous labs. We will be building and testing your submission. You can assume that we have the following tools: build-essential, sun-java6-jdk, mono-gmcs, ocaml-nox, scheme, and ghc-6.8.2. If your submission requires any other tools, you should list them in your README or INSTALLATION files, and include information on where to download and how to install them.

Your deliverables should include a description of your program and its parts. This should include a description of the algorithm and documentation of the code (eg, which modules implement which portions of the algorithm). The documentation and comments in your code should make it possible for a reader who knows the algorithm to read your code easily.

Submit your package as a gzipped tar file. You can build this by doing something like

mkdir mp4
cp {all files} mp4
tar cvf mp4.tar mp4
gzip mp4.tar
mv mp4.tar.gz yourname-mp4.tar.gz

Send yourname-mp4.tar.gz to xindong@ccs.neu.edu.

Attribution

If you got portions of your program from external libraries (on the internet or wherever), tell us what they are and where you got them. Otherwise we will assume that you wrote every line yourself, and nasty things will happen if that turns out to be false.

Last modified: Mon Mar 17 14:21:09 EDT 2008