DoTask() and UpdateSharedData(), save partial computations in global
private variables. Then, in the event of a REDO action, `TOP-C'
guarantees to invoke DoTask() again on the original slave process or
slave thread. That slave may then use previously computed partial
results in order to shorten the required computation. Note that
pointers on the slave to input and output buffers from previous UPDATE
actions and from the original task input will no longer be valid. The
slave process must copy any data it wishes to cache to global
variables. In the case of the shared memory model, those global
variables must be thread-private (see section Thread-Private Global
Variables).
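The caching idea can be sketched in plain C, without linking against `TOP-C'. Every name below (do_task_cached(), the cache variables, the shared_offset parameter standing in for shared data) is hypothetical and chosen only for illustration. The sketch assumes the expensive part of a task depends only on the task input, so a repeated invocation with the same input can reuse it.

```c
/* Hypothetical cache, kept in file-scope variables so it survives between
 * calls. In the shared memory model these would have to be thread-private. */
static int  cache_valid = 0;     /* nonzero once the cache holds a result  */
static long cached_n = 0;        /* task input the cached value belongs to */
static long cached_partial = 0;  /* expensive part of the computation      */
static int  expensive_calls = 0; /* how often the slow path has run        */

/* Stand-in for the expensive part of a task; it ignores shared data. */
static long sum_of_squares(long n)
{
    long s = 0;
    for (long i = 1; i <= n; i++)
        s += i * i;
    expensive_calls++;
    return s;
}

/* Hypothetical task body: result = sum_of_squares(n) + shared_offset.
 * The input is copied by value into the cache, never kept as a pointer,
 * because buffers from earlier calls are no longer valid. */
long do_task_cached(const long *input, long shared_offset)
{
    long n = *input;             /* copy out of the input buffer */
    if (!cache_valid || cached_n != n) {
        cached_partial = sum_of_squares(n);
        cached_n = n;
        cache_valid = 1;
    }
    return cached_partial + shared_offset;
}

/* Accessor so a caller can observe whether the cache was used. */
int expensive_call_count(void)
{
    return expensive_calls;
}
```

Calling do_task_cached() twice with the same input but a different shared_offset models a REDO after the shared data changed: the second call skips the loop and only redoes the cheap final step.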
CheckTaskResult(), the master may merge two or more task outputs in an
application-independent way. This may avoid the need for a REDO action,
or it may reduce the number of required UPDATE actions.
TOPC_abort_tasks() should be called in CheckTaskResult(). `TOP-C' then
makes a best effort (no guarantee) to notify each slave. Afterwards,
TOPC_abort_pending() returns 1 (true) when invoked in DoTask() on a
slave. Typically, the DoTask() callback uses this to poll for an abort
request from the master, upon which it returns early with a special
task output. At the beginning of the next new task, REDO or
CONTINUATION, `TOP-C' resets the pending abort to 0 (false).
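The polling pattern might look like the following sketch. It does not call `TOP-C': abort_pending_stub() is a hypothetical stand-in for TOPC_abort_pending(), and TASK_ABORTED, set_abort_flag() and do_task_with_polling() are likewise invented for illustration. In a real application the test in the loop would call TOPC_abort_pending() instead.

```c
/* Hypothetical stand-in for TOPC_abort_pending(): returns 1 once an abort
 * has been requested, 0 otherwise. */
static int fake_abort_flag = 0;

static int abort_pending_stub(void)
{
    return fake_abort_flag;
}

/* Test hook playing the role of the master's abort request. */
void set_abort_flag(int value)
{
    fake_abort_flag = value;
}

/* Hypothetical "special task output" meaning "gave up early". The normal
 * result below is a sum of non-negative numbers, so -1 cannot collide. */
#define TASK_ABORTED (-1L)

/* Long-running task body that checks for an abort every poll_interval
 * iterations, so the cost of polling stays small next to the real work. */
long do_task_with_polling(long steps, long poll_interval)
{
    long acc = 0;
    if (poll_interval < 1)
        poll_interval = 1;       /* guard against division by zero */
    for (long i = 0; i < steps; i++) {
        if (i % poll_interval == 0 && abort_pending_stub())
            return TASK_ABORTED; /* return early with the special output */
        acc += i;
    }
    return acc;
}
```

The polling interval is a trade-off: polling on every iteration reacts fastest, while a larger interval keeps the check out of the inner loop's critical path.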
If your application runs too slowly because of excessive communication time, consider running multiple slave processes on a single processor. This allows one process to continue computing while another is communicating or is idle, waiting for the master to generate a new task.
If communication overhead or idle time is still too high, consider whether you can increase the granularity of your task -- perhaps by amalgamating several consecutive tasks into a single larger task to be performed by a single process. You can do some of this automatically. For example, if the statement:
TOPC_agglom_count=5; [ EXPERIMENTAL VERSION ONLY ]
is executed before TOPC_master_slave(), then `TOP-C' will transparently
bundle five task inputs as a single network message, and similarly for
the corresponding task outputs.
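To get a rough feel for the saving, the number of messages can be estimated by ceiling division. The helper below is hypothetical and is not part of `TOP-C'; it assumes that a final, partially filled bundle is still sent as one message, which the text above does not state.

```c
/* Estimated number of network messages needed to carry `tasks` task inputs
 * when `agglom` inputs are bundled per message (ceiling division).
 * Assumption: a last, partially filled bundle still costs one message. */
long messages_needed(long tasks, long agglom)
{
    if (tasks <= 0)
        return 0;
    if (agglom < 1)
        agglom = 1;              /* treat nonsensical counts as "no bundling" */
    return (tasks + agglom - 1) / agglom;
}
```

Under that assumption, 1000 task inputs with a bundle size of 5 need 200 input messages instead of 1000, and the same estimate applies to the task outputs.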
Other useful techniques that may improve performance of certain applications are:
LIBMPI in `.../top-c/Makefile' by your vendor's `libmpi.a' or
`libmpi.so', and delete or modify the LIBMPI target in the `Makefile'.
cc, is recommended over gcc for `SMP', due to specialized
vendor-specific architectural issues. Second, if a thread completes its
work before using its full scheduling quantum, the operating system may
yield the CPU of that thread to another thread -- potentially including
a thread belonging to a different process. There are several ways to
defend against this. One defense is to ensure that the time for a
single task is significantly longer than one quantum. Another defense
is to ask the operating system to give you at least as many "run slots"
as you have threads (slaves plus master). Some operating systems use
pthread_setconcurrency() to allow an application to declare this
information, and `TOP-C' invokes pthread_setconcurrency() where it is
available. However, other operating systems may have alternative ways
of tuning the scheduling of threads, and it is worthwhile to read the
relevant manuals of your operating system.
It is easy for parallel jobs to demand excessive resources. By default, in the distributed memory model, `TOP-C' causes a slave to time out if the master does not reply in one hour (and in some versions of `TOP-C', if the slave task lasts longer than one hour).
This is implemented with the UNIX system call alarm(). It is done for
safety, in case of orphaned slaves from previous computations and
slaves in infinite loops. You can set other timeout values (in
seconds), or else set --TOPC_slave_timeout=0 for no timeout (and to
tell `TOP-C' not to use SIGALRM).
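For readers unfamiliar with alarm(), a watchdog of this general kind can be sketched as follows. This is not the `TOP-C' implementation; arm_watchdog() and on_timeout() are hypothetical names, and the sketch only shows the standard POSIX calls sigaction() and alarm() working together.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* SIGALRM handler: uses only async-signal-safe calls, then exits. */
static void on_timeout(int signo)
{
    static const char msg[] = "watchdog expired; exiting\n";
    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;
    (void)signo;
    _exit(1);
}

/* Arm a watchdog that ends the process after `seconds`, or cancel a
 * pending one when seconds == 0. Returns the number of seconds that were
 * left on any previously armed alarm (0 if none), or (unsigned)-1 if the
 * handler could not be installed. */
unsigned arm_watchdog(unsigned seconds)
{
    if (seconds > 0) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_timeout;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGALRM, &sa, NULL) != 0)
            return (unsigned)-1;
    }
    return alarm(seconds);       /* alarm(0) cancels a pending alarm */
}
```

A slave-like process would call arm_watchdog(3600) before waiting on the master and re-arm it after each reply. Because a process has only one alarm() timer, an application that uses SIGALRM for its own purposes would conflict with such a watchdog, which is presumably why the no-timeout setting also disables the use of SIGALRM.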
Some other options are:
#include <unistd.h>
#include <sys/resource.h>
setpriority(PRIO_PROCESS, getpid(), prio)
- prio = 10 still gives you some CPU time. prio = 19 means that any job of higher priority always runs before you. Place in main().

#include <sys/resource.h>
struct rlimit rlp;
rlp.rlim_max = rlp.rlim_cur = SIZE;
setrlimit(RLIMIT_RSS, &rlp)
- SIZE is the RAM limit (bytes). If your system has significant paging, the system will prefer to keep your process from growing beyond SIZE bytes of resident RAM. Even if you set nice to priority 20, this is still important. Otherwise you may cause someone to page out much of his or her job in your favor during one of your infrequent quantum slices of CPU time. Place in main(). (Not all operating systems enforce this request.)
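The two calls above can be wrapped with error checking, as in the sketch below. The function names lower_priority() and limit_resident_memory() are hypothetical. Unlike the fragment above, the sketch changes only the soft limit and clamps it to the existing hard limit, on the assumption that an unprivileged process should not try to raise rlim_max. RLIMIT_RSS is not part of POSIX, so its availability is also an assumption about your system.

```c
#include <errno.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* Make the calling process at least as "nice" as prio (for example 10).
 * Does nothing if the process already runs at that niceness or lower
 * priority. Returns 0 on success, -1 on error with errno set. */
int lower_priority(int prio)
{
    errno = 0;                   /* -1 is also a legal return value */
    int cur = getpriority(PRIO_PROCESS, getpid());
    if (cur == -1 && errno != 0)
        return -1;
    if (cur >= prio)
        return 0;
    return setpriority(PRIO_PROCESS, getpid(), prio);
}

/* Ask the system to keep resident memory below `bytes`. Only the soft
 * limit is changed, clamped to the current hard limit. Returns 0 on
 * success, -1 on error with errno set. Not all systems enforce it. */
int limit_resident_memory(rlim_t bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_RSS, &rl) != 0)
        return -1;
    if (bytes > rl.rlim_max)
        bytes = rl.rlim_max;
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_RSS, &rl);
}
```

Both functions would be called early in main(), before the parallel computation starts, so that every slave process inherits or applies the same limits.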