[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3. Overview of `TOP-C/C++'

 
                       "Difficulty?" exclaimed Ford. "Difficulty?  What
                   do you mean difficulty?  [The wheel is] the single
                   simplest machine in the universe!"  ...
                       "All right, Mr. Wiseguy,"  she said,  "you're so
                   clever, you tell us what color it should be."
                       from "The Restaurant at the End of the Universe"
                       by Douglas Adams

`TOP-C' has been designed especially to make it easy to parallelize existing sequential applications. Toward this goal, `TOP-C' emphasizes:

  1. ease of use (high level task-oriented abstraction);
  2. latency tolerance; and
  3. architecture independence (same application code for shared and distributed memory).

A `TOP-C' application is compiled and run using topcc (similarly to gcc) or topc++ (similarly to g++). For example, assuming a `procgroup' file in the current directory to specify the remote hosts for the slave processes, one executes:
 
  topcc --mpi parfactor.c
  # or else:  topc++ --mpi parfactor.cc
  ./a.out

If a `TOP-C' application fails to link, check for a clash of symbol names. All TOP-C symbols are of the form TOPC_*, COMM_*, MEM_*, MPI_*, MPINU_*, NO_ACTION, UPDATE, REDO, CONTINUATION, or NOTASK.

For purposes of documentation, we will standardize on an explanation of topcc. Wherever topcc is mentioned, the description is equally valid for topc++.

3.1 Programmer's Model  
3.2 Three Key Concepts for TOP-C  
3.3 Distributed and Shared Memory Models  


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.1 Programmer's Model

3.1.1 Structure of a TOP-C Program  
3.1.2 Four Callback Functions  
3.1.3 Task Input and Task Output Buffers  
3.1.4 The `TOP-C' Algorithm  


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.1.1 Structure of a TOP-C Program

A typical TOP-C application has the following structure:
 
#include <topc.h>
... define four callback functions for TOPC_master_slave() ...
int main( int argc, char **argv ) {
  TOPC_init( &argc, &argv );
  ...
  TOPC_master_slave( GenerateTaskInput, DoTask, CheckTaskResult,
                     UpdateSharedData )
  ...
  TOPC_finalize();
}

The program above is run with a master process and several slave processes operating with identical command line arguments and identical initial data (SPSD, or Single Program, Single Data). Communication among processes occurs only within the routine TOPC_master_slave(), during which execution can be called SPMD (Single Program, Multiple Data). At the end of TOPC_master_slave(), execution returns to SPSD, although the master process may contain some additional process-private data.

9. `TOP-C' Raw Interface for Parallelizing Sequential Code describes an alternative interface that is often useful for parallelizing existing sequential code. However, for new applications, the standard interface will usually be cleaner.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.1.2 Four Callback Functions

In a `TOP-C' application, the programmer defines four callback functions and passes control to the `TOP-C' library through the following command.
 
  TOPC_master_slave(GenerateTaskInput, DoTask, CheckTaskResult,
                    UpdateSharedData);

Pictorially, TOP-C arranges for the flow of control among the four callback functions as follows:
 
      (ON MASTER)      task input
   GenerateTaskInput() ---------->


  task input   (ON A SLAVE)  task output
  -----------> DoTask(input) ----------->


  task output     (ON MASTER)                    action
  -----------> CheckTaskResult(input, output) ----------->


if (action == UPDATE):
  task input, task output      (ON ALL PROCESSES)
  -----------------------> UpdateSharedData(input, output)


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.1.3 Task Input and Task Output Buffers

A task input or task output is an arbitrary buffer of bytes of type (void *) in `TOP-C'. The task buffers are arbitrary application-defined data structures, which are opaque to `TOP-C'. Note that in ANSI C, a void pointer is compatible with any other pointer type (such as (struct task_input *) and (struct task_output *) in the example below).

Defining Application Task Buffers  
Marshaling (Serialization) and Heterogeneous Architectures  
Marshalgen, a Package for Marshaling  

Defining Application Task Buffers

An application callback function returning a task input or task output must encapsulate it in a function, TOPC_MSG( void *buf, int buf_size ), which is then returned by the callback functions GenerateTaskInput() and DoTask(). TOPC_MSG() internally allocates a copy of buf, and TOP-C later frees the copy automatically. So, buf may be reused by the application program.

 
TOPC_BUF DoTask( struct task_input * inp ) {
  struct task_output outp;
  ...
  outp = ...;
  return TOPC_MSG( &outp, sizeof(outp) );
}

The principle of memory allocation in `TOP-C' is that if an application allocates memory, then it is the responsibility of the application to free that memory. TOPC_MSG() has the further property of copying its buffer argument to a separate `TOP-C' space (using a shallow copy), after which the application can free any memory it allocated. This happens automatically in the above example, since outp is allocated on the stack.

Marshaling (Serialization) and Heterogeneous Architectures

If a heterogeneous architecture is used, there is an issue of converting data formats or marshaling. This is the application's responsibility. For simple data formats (integers, floats or characters), such conversion can easily be done in an ad hoc manner. Most C compilers use the IEEE binary floating point standard, and characters are almost always encoded in eight bit ASCII representation (or possibly in a standard Unicode format). Although the byte ordering of integers is not standardized, the system calls htonl() and ntohl() are available to convert integers into 32 bit network integers that are portable across heterogeneous systems.

For more complicated conversions, one can consider writing one's own marshaling routines or else using a standard package for marshaling such as the `XDR' library (RFC 1832, eXternal Data Representation), `IDL' (`Corba'), or `SOAP' (`XML').

Marshalgen, a Package for Marshaling

For complex C++ applications, we recommend tat you try out `Marshalgen' package. A pointer to it is available from the TOP-C home page. Since the C++ classes to be marshaled are already defined in `.h' files, Marshalgen asks th user simply to annotate those files with comments (for example, deep copying vs. shallow copying fo pointers). `Marshalgen' also has support for such real world issues as marshalingsubclasses and templates, handling of polymorphism, etc. The `Marshalgen' preprocessor then generates methods for a new marshaling class that know how to marshal and unmarshal the original class.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.1.4 The `TOP-C' Algorithm

When there is only one slave, The `TOP-C' algorithm can be summarized by the following C code.
 
{ void *input, *output;
  TOPC_ACTION action;
  while ( (input = GenerateTaskInput()) != NOTASK ) {
     do {
       output = DoTask(input);
       action = CheckTaskResult(input, output);
     } while (action == REDO);  /* REDO not useful for only one slave */
     if (action == UPDATE) then UpdateSharedData(input, output);
  }
}

On a first reading, it is recommended to skip the rest of this section until having read through Section 4.3 Actions Returned by CheckTaskResult().

For a better understanding of the case of multiple slaves, this simplified excerpt from the `TOP-C' source code describes the `TOP-C' algorithm.
 
TOPC_BUF input, output;
int num_idle_slaves = num_slaves;
TOPC_ACTION action;

while (TRUE) {
  wait_until_an_idle_slave();
  input = COMM_generate_task_input();
  if (input.data != NOTASK.data) {
    SUBMIT_TO_SLAVE:  output = DoTask(input.data);
    num_idle_slaves--;
  }
  else if (num_idle_slaves < num_slaves) // else insure progress condition
    receive_task_output();               // by blocking until a slave replies
  else break;
} // termination condition:  _after_ all slaves idle, next input was NOTASK

The code for wait_until_an_idle_slave() can be expanded as follows.
 
void wait_until_an_idle_slave() {
  do {
    while ( result_is_available(&input, &output) ) {
      action = CheckTaskResult(input.data, output.data);
      if (action == UPDATE)
        UpdateSharedData(input.data, output.data);
      if (action == REDO) /* Use updated shared data, when redoing */
        SUBMIT_TO_SLAVE:  output = DoTask(input.data);
      num_idle_slaves++;
    } while (num_idle_slaves == 0);
  }

Note that the term result refers to an `(input,output)' pair. The routine CheckTaskResult() returns an action, which determines the control structure for a parallel algorithm. A common definition is:

 
TOPC_ACTION CheckTaskResult( void *input, void *output ) {
  if (output == NULL) return NO_ACTION;
  else if ( TOPC_is_up_to_date() ) return UPDATE;
  else return return REDO; }

TOPC_is_up_to_date() returns true if and only if during the interval between when the task input was originally generated and when the task output was returned by the most recent slave, no other slave process had returned a task output during the interim that had caused the shared data to be modified through an UPDATE action. An UPDATE action causes UpdateSharedData() to be invoked on each process. Further discussion can be found in 4.4 TOP-C Utilities.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.2 Three Key Concepts for TOP-C

The `TOP-C' programmer's model is based on three key concepts:

  1. tasks in the context of a master/slave architecture;
  2. global shared data with lazy updates; and
  3. actions to be taken after each task.

Task descriptions (task inputs) are generated on the master, and assigned to a slave. The slave executes the task and returns the result to the master. The master may update its own private data based on the result, or it may update data on all processes. Such global updates take place on each slave after the slave completes its current task. Updates are lazy in that they occur only after a task completes, although it is possible to issue a non-binding request to `TOP-C' to abort the current tasks (8.2 Aborting Tasks). A SPMD (Single Program Multiple Data) style of programming is encouraged.

In both shared and distributed memory architectures, one must worry about the order of reads and writes as multiple slaves autonomously update data. The utilities below are meant to ease that chore, by supporting the ease of the SPMD programming style, while still maintaining good efficiency and generality for a broad range of applications. The software can easily be ported to a variety of architectures.


[ < ] [ > ]   [ << ] [ Up ] [ >> ]         [Top] [Contents] [Index] [ ? ]

3.3 Distributed and Shared Memory Models

`TOP-C' provides a single API to support three primary memory models: distributed memory, shared memory and sequential memory. (The last model, sequential memory, refers to a single sequential, non-parallel process.) On a first reading, one should think primarily of the distributed memory model (distributed nodes, each with its own private memory). Most programs written for distributed memory will compile without change for sequential memory. `TOP-C' is designed so that the same application source code may operate efficiently both under distributed and under shared memory. In order to also compile for shared memory hardware (such as SMP), additional hints to `TOP-C' may be necessary.

In shared memory architectures, all data outside of the four callback functions is shared, by default. Hence, an UPDATE action under shared memory causes only the master process to invoke UpdateSharedData(). To avoid inconsistencies in the data, by default `TOP-C' arranges that no slave process may run DoTask() while UpdateSharedData() is running. `TOP-C' also provides support for finer levels of granularity through application-defined private variables and critical sections. Further discussion can be found in 8.4 Optimizing TOP-C Code for the Shared Memory Model.


[ << ] [ >> ]           [Top] [Contents] [Index] [ ? ]

This document was generated by Gene Cooperman on October, 6 2004 using texi2html