
8. Advanced Features of `TOP-C'

It is best to postpone reading this section until the basic features discussed in the previous chapters are clear.

8.1 Testing for Task Continuations and Redos  
8.2 Aborting Tasks  
8.3 Memory Allocation for Task Buffers  
8.4 Optimizing TOP-C Code for the Shared Memory Model  
8.5 Modifying TOP-C Code for the Sequential Memory Model  
8.6 Caveats  



8.1 Testing for Task Continuations and Redos

Function: TOPC_BOOL TOPC_is_REDO ( void )
Function: TOPC_BOOL TOPC_is_CONTINUATION ( void )
These return 0 (false) or 1 (true), according to whether the current call to DoTask() was the result of a REDO or CONTINUATION() action, respectively. The result is not meaningful if called outside of DoTask().



8.2 Aborting Tasks

Function: void TOPC_abort_tasks ( void )
Function: TOPC_BOOL TOPC_is_abort_pending ( void )
TOPC_abort_tasks() should be called in CheckTaskResult(). `TOP-C' then makes a best effort (no guarantee) to notify each slave. TOP-C does not directly abort tasks. However, TOPC_is_abort_pending() returns 1 (true) when invoked in DoTask() on a slave. A typical DoTask() callback uses this to poll for an abort request from the master, upon which it returns early with a special task output. At the beginning of the next new task, REDO or CONTINUATION, `TOP-C' resets the pending abort to 0 (false). See `examples/README' of the `TOP-C' distribution for example code.
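The polling pattern described above can be sketched roughly as follows. This is only an illustration, not code from the `TOP-C' distribution: count_multiples_of_7(), POLL_INTERVAL and TASK_ABORTED are hypothetical names, and the abort test is passed in as a function pointer so that the fragment compiles without the `TOP-C' library. In a real DoTask(), one would presumably call TOPC_is_abort_pending() directly at the point where the sketch calls abort_pending().

```c
#include <stddef.h>

#define POLL_INTERVAL 1000   /* hypothetical: iterations between polls */
#define TASK_ABORTED  (-1L)  /* hypothetical "special task output" value */

/* Stand-in for TOPC_is_abort_pending(); returns nonzero on an abort request. */
typedef int (*abort_poll_fn)(void);

/* Long-running work loop that periodically polls for an abort request. */
long count_multiples_of_7(long limit, abort_poll_fn abort_pending) {
  long i, count = 0;
  for (i = 1; i <= limit; i++) {
    if (i % POLL_INTERVAL == 0 && abort_pending != NULL && abort_pending())
      return TASK_ABORTED;   /* give up early with the special output */
    if (i % 7 == 0) count++;
  }
  return count;
}

/* Two trivial poll functions, used only to exercise the sketch. */
int never_abort(void)  { return 0; }
int always_abort(void) { return 1; }
```

Polling only every POLL_INTERVAL iterations keeps the cost of the check small relative to the work of the task. The right interval depends on the application.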



8.3 Memory Allocation for Task Buffers

The principle of memory allocation in `TOP-C' is that if an application allocates memory, then it is the responsibility of the application to free that memory. This question typically arises with task buffers (see section 3.1.3 Task Input and Task Output Buffers) and calls to TOPC_MSG(buf,buf_size). An application often calls buf = malloc(...); or buf = new ...; (in C++) and copies data into that buffer before the call to TOPC_MSG. Since the last action of GenerateTaskInput() or DoTask() is typically to return TOPC_MSG(buf,buf_size), there remains the question of how to free buf.

8.3.1 Avoiding malloc and new with Task Buffers  
8.3.2 Using TOPC_MSG_PTR() to Avoid Copying Large Buffers  
8.3.3 Allocation and Freeing of Task Buffers for TOPC_MSG_PTR()  
8.3.4 Marshaling Complex Data Structures into `TOP-C' Task Buffers  



8.3.1 Avoiding malloc and new with Task Buffers

The best memory allocation solution for task buffers is to implement the buffers as local variables, and therefore on the stack. This avoids the need for malloc and new, and the question of how to later free that memory. If you use TOPC_MSG (as opposed to TOPC_MSG_PTR, see section 8.3.2 Using TOPC_MSG_PTR() to Avoid Copying Large Buffers), then recall that TOPC_MSG copies its buffer to a separate TOP-C space. For example,
 
{ int x;
   ...
  return TOPC_MSG(&x, sizeof(x));
}

If your task buffer is of fixed size, you can allocate it as a character array on the stack: char buf[BUF_SIZE];. If your buffer contains variable size data, consider using alloca in place of malloc to allocate on the stack.
 
{ ...
  buf = alloca(buf_size);
  return TOPC_MSG(buf, buf_size);
}

In all of the above cases, there is no need to free the buffer, since TOPC_MSG will make a `TOP-C'-private copy and the stack-allocated buffer will disappear when the current routine exits. Note that alloca may be unavailable on your system. Alternatively, the use of alloca may be undesirable due to very large buffers and O/S limits on stack size. In such cases, consider the following alternative.
 
{ TOPC_BUF tmp;
  ...
  buf = malloc(buf_size);
  tmp = TOPC_MSG(buf, buf_size);
  free(buf);
  return tmp;
}



8.3.2 Using TOPC_MSG_PTR() to Avoid Copying Large Buffers

If the cost of copying a large buffer is a concern, `TOP-C' provides an alternative function, which avoids copying into `TOP-C' space.

Function: TOPC_BUF TOPC_MSG_PTR ( void *buf, int buf_size )
Same as TOPC_MSG(), except that it does not copy buf into `TOP-C' space. It is the responsibility of the application not to free or modify buf as long as `TOP-C' might potentially pass it to an application callback function.

TOPC_MSG_PTR() is inherently dangerous, if the application modifies or frees a buffer and `TOP-C' later passes that buffer to a callback function. It may be useful when the cost of copying large buffers is an issue, or if one is concerned about `TOP-C' making a call to malloc(). Note that the invocation
 
  ./a.out --TOPC-safety=4
automatically converts all calls to TOPC_MSG_PTR() into calls to TOPC_MSG(). This is useful in deciding if a bug is related to the use of TOPC_MSG_PTR().

An application should not pass a buffer on the stack to TOPC_MSG_PTR(). This can be avoided either by declaring a local variable to be `static', or else using a global variable (or a class member in the case of C++). In such cases, it is the responsibility of the application to dynamically create and free buffers. An example of how this can be done follows in the next section.

Note that if the application code must also be compatible with the shared memory model, then the static local variable or global variable must also be thread-private (8.4.2 Thread-Private Global Variables).

For examples of coding with TOPC_MSG_PTR() that are compatible with all memory models, including the shared memory model, see `examples/README' and the corresponding examples in the `TOP-C' distribution.



8.3.3 Allocation and Freeing of Task Buffers for TOPC_MSG_PTR()

Recall the syntax for creating a message buffer of type TOPC_BUF using TOPC_MSG_PTR(buf, buf_size). The two callback functions GenerateTaskInput() and DoTask() both return such a message buffer. In the case of GenerateTaskInput(), `TOP-C' saves a copy of the buffer, which becomes an input argument to CheckTaskResult() and to UpdateSharedData() on the master. Hence, if buf points to a temporarily allocated buffer, the application is responsible for freeing the buffer, but only after the callback function has returned. This seeming contradiction can be easily handled by the following code.
 
    TOPC_BUF GenerateTaskInput() {
      static void *buf = NULL;
      if ( buf == NULL ) { buf = malloc(buf_size); }
      ... [ Add new message data to buf ] ...
      return TOPC_MSG_PTR(buf, buf_size);
    }
If buf_size might vary dynamically between calls, the following fragment solves the same problem.
 
    TOPC_BUF GenerateTaskInput() {
      static void *buf = NULL;
      if ( buf != NULL ) { free(buf); }
      ... [ Compute buf_size for new message ] ...
      buf = malloc( buf_size );
      ... [ Add new message data to buf ] ...
      return TOPC_MSG_PTR(buf, buf_size);
    }

Note that buf is declared as a static local variable, so the pointer persists between calls to GenerateTaskInput(). `TOP-C' requires the buf of TOPC_MSG_PTR(buf, buf_size) to point to a buffer that is not on the stack, such as the heap buffer used here. Hence, buf must not point to non-static local data.
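As a variation on the two fragments above, the following sketch keeps one static heap buffer and replaces it only when a request exceeds its current capacity. It is an illustration under stated assumptions, not part of the `TOP-C' API: get_task_buffer() and capacity are hypothetical names, and the returned pointer is meant to be filled in and then passed to TOPC_MSG_PTR() by the caller.

```c
#include <stdlib.h>

/* Return a heap buffer of at least `needed' bytes that stays valid until
 * the next call.  The static pointer persists between calls; the buffer
 * is replaced only when the request exceeds the current capacity. */
void *get_task_buffer(size_t needed) {
  static void *buf = NULL;
  static size_t capacity = 0;
  if (needed > capacity) {
    free(buf);                 /* free(NULL) is a no-op */
    buf = malloc(needed);
    capacity = (buf != NULL) ? needed : 0;
  }
  return buf;                  /* NULL if allocation failed */
}
```

As with the fragments above, the previous message is overwritten or freed on the next call, so this is appropriate only if the previous buffer is no longer needed by then. Under the shared memory model, the static variables would also have to be thread-private (8.4.2 Thread-Private Global Variables).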



8.3.4 Marshaling Complex Data Structures into `TOP-C' Task Buffers

If you use a distributed memory model and the buffer pointed to by input includes fields with their own pointers, the application must first follow all pointers and copy into a new buffer all data referenced directly or indirectly by input. The new buffer can then be passed to TOPC_MSG(). This copying process is called marshaling. See section Marshaling and Heterogeneous Architectures.
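As a rough sketch of marshaling, the following code flattens a singly linked list into one contiguous buffer and rebuilds it on the receiving side. The names struct node, marshal_list() and unmarshal_list() are hypothetical, the buffer layout is chosen only for illustration, and the sketch assumes that sender and receiver agree on the size and byte order of int.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical application structure containing a pointer field. */
struct node { int value; struct node *next; };

/* Marshal: copy the values reachable from head into one flat buffer.
 * Layout: [int count][int value_0] ... [int value_{count-1}].
 * Returns a malloc'ed buffer (caller frees) and stores its size. */
void *marshal_list(const struct node *head, size_t *buf_size) {
  const struct node *p;
  int count = 0, i = 0;
  int *buf;
  for (p = head; p != NULL; p = p->next) count++;
  *buf_size = (size_t)(count + 1) * sizeof(int);
  buf = malloc(*buf_size);
  if (buf == NULL) return NULL;
  buf[0] = count;
  for (p = head; p != NULL; p = p->next) buf[++i] = p->value;
  return buf;
}

/* Unmarshal: rebuild a fresh list from the flat buffer on the receiver.
 * Error handling is minimal: on allocation failure the partial list
 * built so far is returned. */
struct node *unmarshal_list(const void *buf) {
  const int *ints = buf;
  struct node *head = NULL;
  int i;
  for (i = ints[0]; i >= 1; i--) {   /* build from the last value back */
    struct node *n = malloc(sizeof *n);
    if (n == NULL) return head;
    n->value = ints[i];
    n->next = head;
    head = n;
  }
  return head;
}
```

The sender would then pass the flat buffer and its size to TOPC_MSG() and free the buffer afterward, as in section 8.3.1.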

If following all pointers is a burden, then one can load the application on the master and slaves at a common absolute address, and ensure that all pointer references have been initialized before the first call to TOPC_master_slave(). In `gcc', one specifies an absolute load address with code such as:
 
  gcc -Wl,-Tdata -Wl,-Thex_addr ...
These flags are for the data segment. If the pointers indirectly reference data on the stack, you may have to similarly specify stack absolute addresses. Choosing a good hex_addr for all machines may be a matter of trial and error. In a test run, print out the absolute addresses of some pointer variables near the beginning of your data memory.

Specifying an absolute load address has many risks, such as if the master and slaves use different versions of the operating system, the compiler, other software, or different hardware configurations. Hence, this technique is recommended only as a last resort.



8.4 Optimizing TOP-C Code for the Shared Memory Model

The `TOP-C' programmer's model changes slightly for shared memory. With careful design, one can use the same application source code both for distributed memory and shared memory architectures. Processes are replaced by threads. UpdateSharedData() is executed only by the master thread, and not by any slave thread. As with distributed memory, TOPC_MSG() buffers are copied to `TOP-C' space (shallow copy). As usual, the application is responsible for freeing any application buffers outside of `TOP-C' space. Furthermore, since the master and slaves share memory, `TOP-C' creates the slaves only during the first call to TOPC_master_slave(). If a slave needs to initialize any private data (see TOPC_thread_private, below), then this can be done by the slave the first time that it gains control through DoTask().

Two issues arise in porting a distributed memory `TOP-C' application to shared memory.

  1. reader-writer synchronization: DoTask() must not read shared data while UpdateSharedData() (on the master) simultaneously writes to the shared data.
  2. creating thread-private (unshared) global variables: any global or static local variable accessed by DoTask() must be made thread-private.

Most `TOP-C' applications for the distributed memory model will run unchanged in the shared memory model. In some cases, one must add additional `TOP-C' code to handle these additional issues. In all cases, one can easily retain compatibility with the distributed memory model.

8.4.1 Reader-Writer Synchronization  
8.4.2 Thread-Private Global Variables  
8.4.3 Sharing Variables between Master and Slave and Volatile Variables  
8.4.4 SMP Performance  



8.4.1 Reader-Writer Synchronization

In shared memory, `TOP-C' uses a classical single-writer, multiple-reader strategy with writer-preferred for lock requests. By default, DoTask() acts as the critical section of the readers (the slave threads) and UpdateSharedData() acts as the critical section of the writer (the master thread). `TOP-C' sets a read lock around all of DoTask() and a write lock around all of UpdateSharedData().

As always in the `TOP-C' model, it is an error if an application writes to shared data outside of UpdateSharedData(). Note that GenerateTaskInput() and CheckTaskResult() can safely read the shared data without a lock in this case, since these routines and UpdateSharedData() are all invoked only by the master thread.

The default behavior implies that DoTask() and UpdateSharedData() never run simultaneously. Optionally, one can achieve greater concurrency through a finer level of granularity by declaring to `TOP-C' which sections of code read or write shared data. If `TOP-C' detects any call to TOPC_ATOMIC_READ(0), `TOP-C' will follow the critical sections declared by the application inside of DoTask() and UpdateSharedData().

Function: void TOPC_ATOMIC_READ ( 0 ) { ... C code ... }
Function: void TOPC_ATOMIC_WRITE ( 0 ) { ... C code ... }
This sets a global read or write lock in effect during the time that C code is being executed. If a thread holds a write lock, no thread may hold a read lock. If no thread holds a write lock, arbitrarily many threads may hold a read lock. If a thread requests a write lock, no additional read locks will be granted until after the write lock has been granted. See `examples/README' of the `TOP-C' distribution for example code.

It is not useful to use TOPC_ATOMIC_READ() outside of DoTask(), nor to use TOPC_ATOMIC_WRITE() outside of UpdateSharedData().

The number 0 refers to page 0 of shared data. `TOP-C' currently supports only a single common page of shared data, but future versions will support multiple pages. In the future, two threads will be able to simultaneously hold write locks if they are for different pages.

The following alternatives to TOPC_ATOMIC_READ() and TOPC_ATOMIC_WRITE() are provided for greater flexibility.

Function: void TOPC_BEGIN_ATOMIC_READ ( 0 )
Function: void TOPC_END_ATOMIC_READ ( 0 )
Function: void TOPC_BEGIN_ATOMIC_WRITE ( 0 )
Function: void TOPC_END_ATOMIC_WRITE ( 0 )
The usage is the same as for TOPC_ATOMIC_READ and TOPC_ATOMIC_WRITE.

In the distributed memory model of `TOP-C', all of the above invocations for atomic reading and writing are ignored, thus retaining full compatibility between the shared and distributed memory models.
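The finer-grained locking described in this section might be used along the following lines. This is a sketch under stated assumptions: shared_best, improve_on_best() and record_best() are hypothetical names, and the two macros are given empty stand-in definitions only so that the fragment compiles without the `TOP-C' header (which also mirrors how the distributed memory model ignores them). With the real header, the stand-in definitions would not be used.

```c
/* Stand-ins so that the sketch compiles without the TOP-C header. */
#ifndef TOPC_ATOMIC_READ
#define TOPC_ATOMIC_READ(page)
#define TOPC_ATOMIC_WRITE(page)
#endif

static int shared_best = 0;   /* hypothetical shared data (page 0) */

/* Called from DoTask(): hold the read lock only while copying. */
int improve_on_best(int task_input) {
  int best_seen;
  TOPC_ATOMIC_READ(0) { best_seen = shared_best; }
  /* A long computation would run here on the private copy, unlocked. */
  return (task_input > best_seen) ? task_input : best_seen;
}

/* Called from UpdateSharedData(): hold the write lock only while storing. */
void record_best(int candidate) {
  TOPC_ATOMIC_WRITE(0) { if (candidate > shared_best) shared_best = candidate; }
}
```

Copying the shared value into a local variable inside a short critical section lets the bulk of DoTask() run concurrently with UpdateSharedData(), at the cost of possibly working from a slightly stale value.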



8.4.2 Thread-Private Global Variables

A thread-private variable is a variable whose data is not shared among threads: i.e., each thread has a private copy of the variable. The only variables that are thread-private by default in shared memory are those on the stack (non-static, local variables). All other variables exist as a single copy, shared by all threads. This is inherent in the POSIX standard for threads in C/C++. If DoTask() accesses any global variables or local static variables, then those variables must be made thread-private.

Ideally, if C allowed it, we would just write something like:
 
  THREAD_PRIVATE int myvar = 0;  /* NOT SUPPORTED */
Instead, `TOP-C' achieves the same effect `as if' it had declared
 
  TOPC_thread_private_t TOPC_thread_private;
This allows the application writer to include in his or her code:
 
  typedef int TOPC_thread_private_t;
  #define myvar TOPC_thread_private
  int myvar_debug() {return myvar;} /* needed to access myvar in gdb */

`TOP-C' provides primitives to declare a single thread-private global variable. `TOP-C' allows the application programmer to declare the type of that variable.

Variable: TOPC_thread_private
A pre-defined thread-private variable of type TOPC_thread_private_t. It may be used like any C variable, and each thread has its own private copy that will not be shared.
Type: TOPC_thread_private_t
Initially, undefined. User must define this type using typedef if TOPC_thread_private is used.

If more than one thread-private variable is desired, define TOPC_thread_private_t as a struct, and use each field as a separate thread-private variable.

EXAMPLE:

 
/* Ideally, if C allowed it, we would just write:
 *      THREAD_PRIVATE struct {int my_rank; int rnd;} mystruct;
 * We emulate this using TOP-C's implicitly declared thread-private var:
 *      TOPC_thread_private_t TOPC_thread_private;
 */
typedef struct {int my_rank; int rnd;} TOPC_thread_private_t;
#define mystruct TOPC_thread_private
void set_info() {
  mystruct.my_rank = TOPC_rank();
  mystruct.rnd = rand();
}
void get_info() {
  foo();
  if (mystruct.my_rank != TOPC_rank()) printf("ERROR\n");
  printf("Slave %d random num:  %d\n", mystruct.my_rank, mystruct.rnd);
}
TOPC_BUF do_Task() {
  set_info(); /* info in mystruct is NOT shared among threads */
  get_info();
  ...;
}

Additional examples can be found by reading `examples/README' in the `TOP-C' distribution.



8.4.3 Sharing Variables between Master and Slave and Volatile Variables

The shared memory model, like any `SMP' code, allows the master and slaves to communicate through global variables, which are shared by default. It is recommended not to use this feature, and instead to maintain communication through TOPC_MSG(), for ease of code maintenance, and to maintain portability with the other `TOP-C' models (distributed memory and sequential). If you do use your own global shared variables between master and slaves, be sure to declare them volatile.
 
  volatile int myvar;
This qualifier is needed if a variable may be accessed or modified by more than one thread. Without this qualifier, your program may not run correctly.

To be more precise, if a non-local variable is accessed more than once in a procedure, the compiler is allowed to keep the first access value in a thread register and reuse it at later occurrences, without consulting the shared memory. A volatile declaration tells the compiler to re-read the value from shared memory at each occurrence. Similarly, a write to a volatile variable causes the corresponding transfer of its value from a register to shared memory to occur at a time not much later than the execution of the write instruction.

If you suspect a missing volatile declaration, note that `gcc' supports the following command-line options.
 
  gcc -fvolatile -fvolatile-global ...
  # If topcc uses gcc:
  topcc --pthread -fvolatile -fvolatile-global myfile.c
The option -fvolatile tells `gcc' to compile all memory references through pointers as volatile, and the option -fvolatile-global tells `gcc' to compile all memory references to extern and global data as volatile. However, note that this implies a performance penalty since the compiler will issue a load/store instruction for each volatile access, and will not keep volatile values in registers.



8.4.4 SMP Performance

Note that `SMP' involves certain performance issues that do not arise in other modes. If you find a lack of performance, please read 7.3 Improving Performance. Also, note that the vendor-supplied compiler, cc, is often recommended over gcc for `SMP', due to specialized vendor-specific architectural issues.



8.5 Modifying TOP-C Code for the Sequential Memory Model

`TOP-C' also provides a sequential memory model. That model is useful for first debugging an application in a sequential context, and then re-compiling it with one of the parallel `TOP-C' libraries for production use. The application code for the sequential library is usually both source and object compatible with the application code for a parallel library. The sequential library emulates an application with a single `TOP-C' library.

The sequential memory model emulates an application in which `DoTask()' is executed in the context of the single slave process/thread, and all other code is executed in the context of the master process/thread. This affects the values returned by TOPC_is_master() and TOPC_rank(). In particular, conditional code for execution on the master will work correctly in the sequential memory model, but the following conditional code for execution on the slave will probably not work correctly.
 
int main( int argc, char *argv[] ) {
  TOPC_init( &argc, &argv );
  if ( TOPC_is_master() )
    ...;  /* is executed in sequential model */
  else
    ...;  /* is never executed in sequential model */
  TOPC_master_slave( ..., ..., ..., ...);
  TOPC_finalize();
}
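One possible way to keep slave-side initialization working in the sequential memory model is to defer it to the first call of DoTask(), as suggested for the shared memory model in section 8.4. The following is a minimal sketch of that idea. The names slave_table, init_slave_data() and do_task_body() are hypothetical, and the sketch assumes that the initialization does not depend on running before TOPC_master_slave().

```c
#include <stddef.h>

static int slave_table[4];          /* hypothetical slave-private data */
static int slave_initialized = 0;

/* Hypothetical one-time setup that the `else' branch would have done. */
static void init_slave_data(void) {
  size_t i;
  for (i = 0; i < 4; i++) slave_table[i] = (int)(i * i);
}

/* Body of DoTask(): it runs in the slave context in every memory model,
 * including the sequential one, so the setup is never skipped. */
int do_task_body(int index) {
  if (!slave_initialized) {
    init_slave_data();
    slave_initialized = 1;
  }
  return slave_table[index & 3];    /* index is reduced to 0..3 */
}
```

Under the shared memory model, the flag and the table would additionally have to be thread-private if each slave needs its own copy.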



8.6 Caveats

IMPORTANT: `TOP-C' sets alarm() before waiting to receive a message from the master. By default, if the master does not reply within a half hour (1800 seconds), then the slave receives SIGALRM and dies. This is to prevent runaway processes in the distributed memory version when the master dies without killing all slaves. See section 7.4 Long Jobs and Courtesy to Others, in order to change this default. If your application also uses SIGALRM, then run your application with --TOPC-slave-timeout=0 and `TOP-C' will not use SIGALRM.

GenerateTaskInput() and DoTask() return a buffer that TOPC_MSG() has copied into `TOP-C' space. This memory is managed by `TOP-C'.

The slave process attempts to set its current directory to that of the master inside TOPC_init() and produces a warning if unsuccessful.

When a task buffer is copied into `TOP-C' space, it becomes word-aligned. If the buffer was originally not word-aligned, but some field in the buffer was word-aligned, the internal field will no longer be word-aligned. On some architectures, casting a non-word-aligned field to `int' or certain other types will cause a bus error.
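A common way to sidestep this problem is to copy a field out of the buffer with memcpy() instead of casting its address. The helper names read_int_at() and write_int_at() below are hypothetical and are not part of `TOP-C'. The sketch assumes that sender and receiver use the same size and byte order for int.

```c
#include <string.h>

/* Read an int stored at an arbitrary byte offset of a task buffer.
 * memcpy makes no alignment assumption, unlike *(int *)(buf + offset). */
int read_int_at(const void *buf, size_t offset) {
  int value;
  memcpy(&value, (const char *)buf + offset, sizeof value);
  return value;
}

/* Matching writer, for packing a field at an arbitrary byte offset. */
void write_int_at(void *buf, size_t offset, int value) {
  memcpy((char *)buf + offset, &value, sizeof value);
}
```

For example, a DoTask() that receives a buffer holding one tag byte followed by an int could call read_int_at(input, 1) rather than casting the address input + 1 to int *.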



This document was generated by Gene Cooperman on October, 6 2004 using texi2html