Notes on a Solution to Robust Slaves

This is the current summary. Soon, I will add to this by splitting the solution into individual tasks for the networks group.

We are looking for a way to continue the TOP-C computation even after one slave dies or is slow to respond. See the previous web page on robust slaves for how the problem was posed. This page tries to summarize the solution.

The TOP-C software is built in four layers:

  1. topc/src/topc.c: If master, keep table of slaves, and when slave is available, call GenerateTaskInput() to create task input, and to send it to a slave. If slave, wait for task input, call DoTask(), and send back task output.
  2. topc/src/comm*.c: Communication layers for sequential, mpi, and pthread; Each file presents a common interface to topc.c for sending and receiving messages, but internally, it does its work by different means. In particular, topc/src/comm-mpi.c does its work by calling the MPI layer.
  3. topc/src/mpinu/*: MPINU is a subset of MPI. It does its work by making calls to sockets in the operating system.
  4. sockets: See UNIX man pages (section 2 of man pages), and pointers to sockets on course web page.
Software in one layer can call a lower layer, but not vice versa. We decided to divide the design into two parts:

Note also that there are four interesting states in which a slave may be.

  1. Dead slave
  2. Slave not responding; Transient delay
  3. Very large slave task
  4. Very slow slave (due to heavy CPU load, or intensive disk I/O, or other)

The issue for Berkeley Sockets is that if a process (or even a thread in a process) reads or writes from a dead socket (e.g. socket connected to dead slave), then the entire process dies.

The proposed solutions fall into the categories:

  1. Active Strategy: Create additional process or thread
    1. Strongly Active: manager process communicates with master process via shared memory, via named pipes or via some other safe mechanism; Manager process can keep data structure for queue of outstanding tasks and collect replies from slaves. If the manager process dies, the master recreates the manager process, either from its own internal data structure of outstanding tasks, or else from some persistent file committed by manager process.
    2. Weakly Active: manager process acting as taste tester: try the socket. If manager process dies, then master considers the corresponding slave dead, and creates a new manager process
  2. Passive Strategy: Master waits for slave until some user-specified timeout. If the slave does not reply in time, then the master considers it dead.

Eugenio pointed out that the strongly active manager process can be viewed as a database manager. We can require that slaves initiate contact with the database manager at all times. The database manager can commit its data, in case it dies. There is still a question about how the master will inform the slave processes that an update of the shared data is pending. One possibility is for the master to send a UDP packet to the slave. Another possibility is that the slave can collect its updates when it communicates with the database manager.