Student Projects in Parallel Computing
Prof. Gene Cooperman
The following are suggested projects for CS G280 (Parallel
Computing). You are welcome to suggest other projects if
you like.
Most of the projects below have the potential to result in conference
papers. Within each category, the projects are ordered roughly by order of
importance, which correlates strongly with the likelihood of
developing the project into a conference paper. Clicking on a project
listing will take you to a fuller
description. If a student has selected that project, clicking on the
heading of that fuller description will take
you to a directory where you can look at the results to date.
Throughout this web page, there are many references to TOP-C (Task
Oriented Parallel C/C++). TOP-C is freely available and easily
installed and used. It supports both shared memory and distributed
memory parallelism with a single API (Application Programming Interface).
Also, many of these projects will benefit from having a test suite
of applications. There are several possible test suites, but a particularly
high-quality suite is the
Stanford Parallel Applications for Shared Memory (SPLASH and SPLASH-2),
available as a .tar.gz file.
While the applications are written for shared memory, they can be adapted
to distributed memory in most cases. Since distributed memory
tends to be more scalable, this is an important consideration.
The kernel programs are smaller than the apps programs, and should be examined
first. The file
splash2/codes/null_macros/c.m4.null
provides expansions
of the macros that turn a program into a single-processor program.
Applied Programming Languages is a subject in which one first determines,
based on user experience, what features would be useful. One then looks
for innovative ways to retrofit those features into an existing language.
Retrofitting into an existing language often requires strong systems skills.
In a pure approach, one would design a new language with new features
(e.g.: Scheme, Smalltalk, etc.), and then hope that a new mainstream
language (such as Java or C#) will adopt some of those features. In an
applied approach, one tries to directly retrofit those features into an
existing language, gain user experience, and then use that experience
to create a new mainstream language. C++ is an example of the latter
(although one can argue that C++ was released too early, with an
overly complex programmer's model, and would have benefited from more
user experience).
This section considers parallel abstractions that would modify the
TOP-C programmer's model.
- Parallelizing Octave (Matlab look-alike; engineering/science application)
- Extend TOP-C to work interactively in Octave.
Octave is a Matlab-like language for
interactive numerical computation.
The software is free under the GPL. Something similar was previously done
with ParGAP,
the parallelization of the interactive language GAP using TOP-C.
Matlab has become
almost a standard in engineering. A parallelization
of Octave that makes it easy for end users of Matlab to write parallel
code (via the Octave work-alike) is likely to attract a lot of attention.
- C++ binding to TOP-C callback functions
- C++ was not designed for use with callback functions. If you
don't believe this, try using a non-static member function of some
class as a callback function. In C, one can write
int (*do_task)(void *input);
int myfnc(void *input) { return *(int *)input + 1; }
do_task = myfnc;
(*do_task)(input);
This does not work if myfnc is a non-static member function.
This problem was already faced by the designers of
GTK+, a GUI project written in C
that depends on callback functions. The designers had to add support for C++
callback functions. Study their design decisions, and design an
appropriate C++ binding for TOP-C that elegantly handles the callback
functions.
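For reference, the idiom such C libraries rely on is a static member function
used as a trampoline, with a void * user-data argument carrying the object
pointer. The sketch below illustrates the pattern a TOP-C binding would need
to package up; the registration call is hypothetical, not TOP-C's actual API.
// Sketch of the standard trampoline idiom (registration name hypothetical).
class MyApp {
public:
    int do_task(void *input) { return 0; }  // the real (non-static) callback
    // A static member function has an ordinary function-pointer type,
    // so C code can call it; it forwards to the member function.
    static int do_task_trampoline(void *input, void *user_data) {
        return static_cast<MyApp *>(user_data)->do_task(input);
    }
};
// Hypothetical registration: pass the trampoline plus the object pointer.
// register_do_task(&MyApp::do_task_trampoline, &myApp);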
- Scalability of data in parallel computation
- In a cluster, the size of the data may grow so large that it no
longer fits in the RAM of a single processor. Traditionally, when
an applications person was faced with this problem, he or she would look
for a shared memory computer. A 16-processor shared memory computer
is likely to have 16 times as much memory as a standalone
1-processor computer. This is an inefficient solution since shared
memory computers are expensive and he or she plans to use only one of the
16 processors. Design a cluster-based solution, in which, in
addition to master and worker nodes, one also has data nodes. A
data node may be a separate processor. Alternatively, a single processor
may support two processes, one primarily for computation and one primarily
for maintaining some portion of a distributed database (not necessarily
a relational database). The job of a data node is to handle requests
from other nodes.
For example, we may distribute a 1 GB hash array among 10
data nodes. Each data node is responsible for a contiguous sequence
of hash indices. For network efficiency, we may have to structure a
parallel computation so that a worker node saves up 1000 hash
queries per data node, and sends all 1000 hash queries to the data
node as a single message.
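A minimal sketch of the worker-side batching just described, assuming 10 data
nodes each owning a contiguous range of indices (all function names and
constants here are hypothetical):
#define NUM_DATA_NODES 10
#define BATCH_SIZE 1000
#define TOTAL_INDICES (1L << 28)                  /* roughly a 1 GB hash array */
#define INDICES_PER_NODE (TOTAL_INDICES / NUM_DATA_NODES)

typedef struct { long query[BATCH_SIZE]; int count; } Batch;
static Batch batch[NUM_DATA_NODES];

void queue_hash_query(long index) {
    int node = (int)(index / INDICES_PER_NODE);   /* contiguous range per node */
    if (node >= NUM_DATA_NODES)                   /* last node takes remainder */
        node = NUM_DATA_NODES - 1;
    Batch *b = &batch[node];
    b->query[b->count++] = index;
    if (b->count == BATCH_SIZE) {                 /* ship 1000 queries at once */
        send_to_data_node(node, b->query, b->count);  /* hypothetical call */
        b->count = 0;
    }
}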
In 2005, we are lucky enough to have an in-house application by
Eric Robinson that would directly benefit from this model. This provides
you with an excellent test case. Please talk to me and him about this
project.
- Annotated parallelization
- OpenMP is a standard that
provides for parallelism on top of POSIX threads. It does so by
adding annotations and pragmas that are recognized by a front-end
program. Following that philosophy, do the same thing for
applications on top of TOP-C. Annotate the main TOP-C task in such a
way that a preprocessor can translate the code into parallel code that can be
compiled and linked with the TOP-C library. An ANSI C
grammar suitable for yacc/bison and lex/flex is readily
available for this purpose. Perhaps a better approach is to use
the power of an existing language transformer package, such
as CIL.
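As a purely hypothetical illustration of what such annotations might look
like (designing the actual annotation language is the project; no such
preprocessor exists yet):
/* Hypothetical TOP-C annotations, in the spirit of OpenMP pragmas.
 * A source-to-source translator would expand these into the
 * corresponding TOP-C callback registrations. */
#pragma topc shared_data(global_table)

#pragma topc task
long do_task(long *input) {
    return compute(*input);      /* ordinary sequential task body */
}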
- Bottlenecks due to slow RAM
- As the CPU-memory gap continues to grow, many common programs are
now dominated by the speed of RAM, and the speed of the CPU is almost
irrelevant. For example, most integer sorting programs are in this class.
This holds to such an extent that program runtimes can be estimated to
within about 25% using models that count memory cycles.
For such a simple program as
double Z[N], Y[N]; int X[N];
for (int i = 0; i < N; i++) Z[i] = Y[X[i]];
there is a faster version that takes into account properties of RAM.
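One well-known trick in this direction (a sketch only, not necessarily the
version cited above) is to issue software prefetches for the random accesses
several iterations ahead, so that multiple DRAM requests are in flight at
once; __builtin_prefetch is a GCC extension:
#define DIST 8                       /* tuning parameter: prefetch distance */
for (int i = 0; i < N; i++) {
    if (i + DIST < N)
        __builtin_prefetch(&Y[X[i + DIST]]);  /* start the DRAM access early */
    Z[i] = Y[X[i]];
}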
How does one translate such ideas into tasks such as speeding up
an FFT (Fast Fourier Transform) or integer matrix multiplication?
See, also, the next topic on slow RAM.
- Superlinear parallel speedup and slow RAM
- The good news is that superlinear parallel speedup can be a
reality. Superlinear parallel speedup is a phenomenon in which a sequential
program is parallelized to run on 10 processors and then runs 20 times
as fast. This can happen if the working set of the sequential program is
larger than the cache of any one processor, but in the parallel
version, the working set may be split up so that the working set on
any single processor fits nicely within the available cache.
The bad news is that RAM can be up to 100 times slower when used
in a random access pattern. In some shared memory applications, one
can even observe a negative parallel speedup (speed-down?) when added
processors compete for access to slow RAM. In today's technology,
the CPU processes 4-byte words at approximately 8 GB/s
(4 bytes × 2 GHz = 8 GB/s); Gigabit Ethernet runs at
approximately 128 MB/s (1 Gb/s ÷ 8 = 128 MB/s);
and random access to DRAM runs at approximately 66.6 MB/s
(one 4-byte word per 60 ns: 4 B / 60 ns =
66.6 MB/s; a 60 ns
random access time for DRAM has been the standard for the last decade).
Can one make use of multiple data nodes (see Scalability of
Data) with their multiple caches to partition the data in a
distributed manner for faster access to RAM? What is a possible
application that might usefully take advantage of such a scheme?
If the bandwidth and latency of Ethernet is a bottleneck, can one
implement this scheme successfully on a shared memory machine?
- Recursion
- It is non-trivial to provide recursion in a parallel system. One
elegant solution is provided by Cilk. Cilk is
primarily for shared memory, although a port to distributed memory
does exist. Cilk uses the concept of work-stealing to
implement recursion and to also provide good load balancing. Look
at the MANTA
project, which has its own solutions to recursion. Although
TOP-C has a recursion example among its example files, the recursion
is less elegant and more error-prone than Cilk. Either use the
spawn/sync calls invoked in Cilk's work-stealing paradigm, or design
a more fool-proof way to do recursion in TOP-C (possibly with the
use of source code annotations in comments?). One useful trick
that makes this possible is copying stacks and
pasting them together using the longjmp/setjmp
library calls.
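For concreteness, the classic Cilk example, in which spawn exposes the
recursive calls to the work-stealing scheduler and sync waits for them:
cilk int fib(int n) {
    if (n < 2) return n;
    else {
        int x, y;
        x = spawn fib(n - 1);    /* may be stolen by an idle processor */
        y = spawn fib(n - 2);
        sync;                    /* wait for both spawned calls */
        return x + y;
    }
}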
If enough coverage is provided, this
can be turned into a publishable paper (typically by investing as much
time after the course as was invested during the course).
- Multiple pages to eliminate false sharing in
TOP-C
- A problem with the TOP-C model occurs when there are frequent
updates to the shared data. There is a well-known phenomenon in
shared memory called false sharing. When two threads (or
two processes in a DSM system) repeatedly modify the same page of
memory, the operating system must repeatedly migrate that page first
to one thread and then to the other. If the two threads are
modifying distinct addresses on that page, then the migration of the
page is unnecessary. Yet the operating system has no way of
detecting this.
TOP-C exhibits a form of false sharing. Its concept of shared
data effectively creates a single page for all of shared data.
TOPC_is_up_to_date() tests if that single page has been modified
since the current task was executed, and if so, the user program
typically invokes a REDO action. This is expensive. Yet it can
happen that the worker for the current task, and the worker that had
since modified the shared data, may have in fact worked on distinct
portions of that page. In that case, the REDO is unnecessary. The
TOP-C model can be extended to multiple `pages' of shared data by
modifying the action to UPDATE(page_num) and the utility to
TOPC_is_up_to_date(page_num). Do so.
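A sketch of the extended interface (types simplified relative to the real
TOP-C headers):
/* Shared data is divided into numbered pages.  UPDATE is restricted to
 * one page, and the staleness test asks about that page alone, so a
 * REDO is triggered only when a page actually touched has changed. */
TOPC_ACTION UPDATE(int page_num);
int TOPC_is_up_to_date(int page_num);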
- Shared memory and LISP/GAP
- GAP (Groups, Algorithms
and Programming) is a general-purpose interactive language for
discrete mathematics, and especially group theory. Modify GAP
(an interpreted language) to support multithreading. ParGAP, a
parallelization of GAP using TOP-C, already exists and is a refereed
GAP package. Replace this implementation (which is distributed
memory only) with a new implementation that allows for both
distributed and shared memory operation. A major issue in this
project is to stop all threads before any garbage collection.
(Multithreaded garbage collectors exist, but porting one to GAP
would probably be beyond the scope of a one-quarter course.)
For those who don't mind LISP syntax, the same project can be
attempted on GCL
(GNU Common LISP). GCL also has a distributed-memory
parallelization, ParGCL.
- TOP-C, the Grid, and generic communication layers
- The Computational Grid represents an important paradigm for
computation in the future. In this paradigm, 30 computer centers
may "share their
computer resources" on the Internet. Then, instead of one person
using a cluster during 30 nights over a month, one person can use 30
clusters in a single night to complete the computation. This
requires no new resources and achieves the same throughput, but the
elapsed time from the beginning of a computation to the end is
drastically reduced. The Globus
protocols are an evolving standard
to support the Grid.
TOP-C
currently supports the Grid through the
Ampic communication layer for
TOP-C, and also through the use of MPICH with the
MPI
communication layer for TOP-C. Ampic is part of the AppLeS project of the
Grid Research And Innovation Laboratory (GRAIL) at the San Diego
Supercomputer Center. The integration of TOP-C with Ampic was joint
work with Henri Casanova, Jim Hayes and Thomas Witzel. Thomas
Witzel
is at Northeastern University, and Henri Casanova and Jim Hayes are
at the University of California, San Diego.
Is there a way to automatically write a generic communication layer?
One should specify the primitives for spawning a new process, sending
a message, and receiving a message. Hence, given these primitives,
one should be able to use the current comm-mpi.c as a template, and
replace the MPI commands for spawning, sending, receiving, etc., by
the commands given in the specification. (This is similar to the
current generic
marshaling
package, joint work with Ning Ke.) A test of the success of this
facility is the ability to easily incorporate the Globus protocols.
This is important for making the software more maintainable
(e.g. aspects in programming languages) and easier to update
when protocols change.
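A sketch of how those primitives might be specified, as a table of function
pointers that either generates or parameterizes a new communication layer
(all names here are illustrative, not an existing TOP-C interface):
#include <stddef.h>

struct comm_spec {
    int (*spawn_procs)(int num_procs);                  /* start workers */
    int (*send_msg)(int dest, const void *buf, size_t size);
    int (*recv_msg)(int *src, void *buf, size_t *size); /* blocking receive */
};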
- Adapt checkpointing to interactive programs
- Checkpointing programs are especially efficient for task-based
master-slave computations, since they need only checkpoint the
master process. Many parallel interactive programs rely on
task-based master-slave architectures. Adapt an existing checkpointing
package in order to checkpoint a parallel interactive program.
There should be an emphasis on robustness, so that this works
well in the "real world".
- Marshaling
- Marshaling of complex objects is inherent to any sophisticated
application in distributed computing. For C (and for C++ when it
uses "class" as a C "struct"), there are standard solutions. These
solutions tend not to support the full feature set of C++. This was
painfully observed in parallelizing Geant4.
Rather than reinvent the wheel, one desires a package to enable
marshaling through annotations in comments. Decisions must be made
where to do a deep copy, shallow copy, index into a permanent data
array, etc. Templates, inheritance and other issues add real world
complications. Simplicity is important here. Many of these
problems have been solved by Marshalgen
(also, see paper
and earlier
paper). How good is the coverage? How easy is it to use?
- Scalability of the master-worker paradigm
- The master-worker paradigm suffers from scalability problems, as
the number of workers grows (perhaps into the thousands). This becomes
an issue as vendors move to quad processor chips and beyond. Set up a
tree of masters to pass tasks to the slaves. Do this initially for
only two levels, and then continue. Next, assume that the lowest
layer is shared memory. (As a special case, each node at the lowest
layer may have no sibling nodes, in which case the problem of shared
memory goes away.) This problem can be solved by invoking the
existing communication layers of TOP-C more than once. A new
intermediate communication layer must be written for TOP-C which
considers itself a worker process when communicating with its parent nodes,
but a master process when communicating with its
child nodes.
- Master-worker paradigm for combined
distributed/shared memory computation
- This is related to the
previous issue of scalability. Here we emulate the future
architecture of multiple quad processor (or dual processor) chips.
To support this architecture, consider a three-level tree with leaf
nodes using a shared memory model and intermediate nodes using a
distributed memory model. The topmost root node acts as the master.
To implement this, one has to write an
intermediate communication layer that links with both comm-mpi.c
(distributed memory) and comm-pthread.c (shared memory), while
avoiding name conflicts. The following pseudo code does this:
comm-hybrid.c:
#define COMM_ COMM_MPI_
#include "comm-mpi.c"
#undef COMM_
#define COMM_ COMM_PTHR_
#include "comm-pthread.c"
One then implements the standard COMM_ functions, and makes calls to
both COMM_MPI_XXX and COMM_PTHR_XXX.
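The standard COMM_ functions then become dispatchers, talking pthreads to
workers on the local node and MPI to everything else. A sketch (the helper
and the exact signatures are hypothetical):
void COMM_send_msg(int worker, void *buf, size_t size) {
    if (is_local_thread(worker))                /* hypothetical helper */
        COMM_PTHR_send_msg(worker, buf, size);  /* shared memory, same node */
    else
        COMM_MPI_send_msg(worker, buf, size);   /* distributed memory */
}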
- Large messages
- Large messages cause inordinate delays in a parallel program.
This is especially threatening if it leads to unbalanced solutions in
which the master must delay issuing tasks, or issue tasks in a context
of incomplete information, due to network delays. A lot of this can
be solved by proper use of TCP/IP socket options for setting buffer
size and other parameters (see "man getsockopt" and "man
setsockopt"). Modify TOP-C to adapt to variable size messages in an
application that creates large message buffers.
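For example, enlarging the kernel's per-socket buffers is a single real POSIX
call per direction; the 1 MB figure is only an illustrative starting point,
and sock is assumed to be an already-created socket:
#include <sys/socket.h>

int bufsize = 1 << 20;   /* 1 MB; tune to the application's message sizes */
setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));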
- Memory allocator
- The topc.c file goes part of the way toward implementing an
efficient memory allocator for TOP-C. Much of the difficulty is in
the case of threads. Implement short messages that are allocated directly in
the memory header. Right now, there are three calls into the memory
management: allocate memory, check memory, and free memory. Can the
check-memory call be avoided in a new version?
Philosophy: We know that all message headers will always be the
same size. Malloc doesn't. Can we take advantage of that for
more efficient organization?
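A minimal sketch of the idea (not TOP-C's actual code): with one fixed header
size, a free list makes allocation and deallocation one pointer operation
each, and there is nothing left for a separate check-memory pass to verify.
#include <stdlib.h>

typedef union header {
    union header *next_free;   /* valid only while the header is free */
    char bytes[64];            /* the fixed header size (illustrative) */
} Header;

static Header *free_list = NULL;

Header *alloc_header(void) {
    if (free_list == NULL)
        return malloc(sizeof(Header));   /* grow the pool on demand */
    Header *h = free_list;
    free_list = h->next_free;
    return h;
}

void free_header(Header *h) {
    h->next_free = free_list;            /* push back onto the free list */
    free_list = h;
}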
- Reorganize topc.c and memory.c (more modular)
- This is an exercise in software design. Can TOP-C be made more
object-oriented in spirit, while retaining the C language as the
greatest common denominator in the C family of languages?
(Consider, as an example, the design of GTK+.) One should create a
separate utility for linked lists, as sketched below.
Objects might be linked lists of slave states, linked lists of tasks,
etc. As the code grows, can it be documented better by referring to
standard design patterns? The goal here is to make the code as readable
as possible.
(Also see Memory Allocator.)
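A sketch of such a utility, with one generic list of void * payloads shared
by slave states, tasks, and so on (names illustrative):
#include <stdlib.h>

typedef struct node {
    void *data;
    struct node *next;
} Node;

typedef struct { Node *head, *tail; } List;

void list_append(List *l, void *data) {
    Node *n = malloc(sizeof(Node));
    n->data = data;
    n->next = NULL;
    if (l->tail) l->tail->next = n;   /* append at the tail */
    else l->head = n;                 /* first element */
    l->tail = n;
}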
- Integration of Condor with TOP-C
- Condor is widely used software with special benefits for setting up
"computing farms". It has strong support for checkpointing
(see also Poor Man's Checkpointing) and
dynamic creation of processes (see also Dead Workers and New Workers). How can
this be integrated as a lower layer under TOP-C, in order to provide
the TOP-C programming model with these benefits?
- RPC: Create communication layer using RPC for TOP-C
- This can be used to allow new workers to automatically become
available. (See Dead Workers and New
Workers.)
- XML/SOAP: Create communication layer using SOAP
for TOP-C
- SOAP is becoming more of
a standard, with the rise of XML.
Note that the Globus project
also wants to use SOAP more heavily.
The advantage is that the prevalence of the Web technology
ensures that there will be a lot of tools to support SOAP
on many platforms.
- Distributed Shared Memory (DSM)
- Distributed Shared Memory was long thought to be a simple
programming model, because it provides a shared memory abstraction
(similar to POSIX threads), while being built on top of a
distributed memory architecture, such as a cluster. Explore the
strengths and limitations of this paradigm, by implementing a toy
example. The idea is to use mmap and mprotect to change the protection
bits on pages in RAM. Access to a protected shared page then causes a
page fault, from which a user-level handler can recover. (See, for example,
TreadMarks.)
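A toy demonstration of that core mechanism (POSIX; error checks and the
network step of fetching the page from its owner are omitted): protect a
page, catch the resulting SIGSEGV in a user-level handler, and re-enable
access.
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static long pagesize;

static void fault_handler(int sig, siginfo_t *si, void *ctx) {
    char *page = (char *)((uintptr_t)si->si_addr & ~(uintptr_t)(pagesize - 1));
    /* ... a real DSM would fetch this page's contents from a remote node ... */
    mprotect(page, pagesize, PROT_READ | PROT_WRITE);
}

int main(void) {
    pagesize = sysconf(_SC_PAGESIZE);

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = fault_handler;
    sigaction(SIGSEGV, &sa, NULL);

    /* One "shared" page, initially inaccessible. */
    char *shared = mmap(NULL, pagesize, PROT_NONE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    shared[0] = 42;   /* faults; handler grants access; the write succeeds */
    return 0;
}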
It has been said about
engineers that any engineer can design a solution, but only a good
engineer can design an economical solution. In computer
science, high level languages and then still higher level languages
(e.g. perl) have been used to provide economical solutions to
difficult programming problems. It is part of the thesis of this
course that message-based parallel solutions are relatively low level,
difficult to write, and difficult to debug. The purpose of most of
these application-based projects is to demonstrate a task-based
parallel solution. Of course, task-based solutions usually use
messages at a lower layer. But the higher task-based abstraction
hides the details of message passing, thus providing a more powerful tool.
In the case of parallelizing a very large program, this means that a
task-based tool may provide the first successful parallelization of
the program (as was the case for the parallelization
of Geant4). In the case of a smaller application, one has the
luxury of writing a new program. In this case, a task-based approach
gives one the flexibility to easily experiment with many algorithms
and thereby find an efficient algorithm. Once one has found
an efficient algorithm, a direct message-based approach may still be
more efficient (just as once one has found an efficient C-based
implementation, an assembly implementation may still be more
efficient). However, just because assembly is more efficient than C,
and C is more efficient than Java, few people would want to neglect C
and Java for the sake of assembly language.
Geant4 (GEometry
ANd Tracking) is a simulation package for particle-matter
interaction developed at CERN, the
place where the Web was invented. Although initially developed to
assist in collider design, its use has spread to many areas of
physics, engineering and biomedicine. Including the associated
toolkits, Geant4 contains approximately 1,000,000 lines of C++ code.
The Geant4 software was developed by RD44,
a world-wide collaboration of over 100 scientists participating in
more
than 10 collider experiments in Europe, Russia, Japan, Canada
and the United States.
Many specialized working groups are responsible for the various
domains of the toolkit.
Geant4 has already been
parallelized with event-level
parallelism using TOP-C. The Geant4 user community has additional
requirements for parallelism, which are the basis for these projects.
- Parallelization of a Geant4 Application
for Dark Matter Detection
- Dark Matter experiments are currently being developed using a novel liquid
xenon based technique for detection of WIMPs (Weakly Interacting Massive
Particles). In order to calibrate the effectiveness of such detectors,
neutrons are used in the laboratory to mimic the interactions of candidate
Dark Matter particles. This experimental work has been carried out at
Imperial College in London, England, and has produced some interesting and
partially unexplained results.
To understand the observed signals, it is necessary to simulate the
neutron scattering within the laboratory and predict the expected
interactions within the test detector. Tracking of neutrons in a
detector-like manner is extremely processor intensive. In 50 ns, a pulse of
100 million neutrons is produced which, once scattered around the room,
produces about 10,000 events within the detector over a time-scale of about
1 ms. A sequential simulation of this experiment processes fewer than 200
neutrons per hour. Therefore, understanding and simulating this successfully
will require parallel resources.
The project will entail taking existing sequential Geant4 code of the
experimental set-up and adapting it to a parallel regime. If time permits,
the simulation can be extended to define and advise on the experimental
location of the neutron source and suitable collimation to maximise the
effectiveness of the measurement, which will be repeated in the immediate
future.
The following tar files contain
some sample macros and the source code to be run within Geant4.
There is the possibility of including this work within an experimental R&D
journal paper on the development of this new type of Dark Matter detector.
In addition, a separate publication is possible, demonstrating the use of
Geant4 in a novel regime (low energy and neutron tracking) with application
to a novel experiment: Dark Matter searches using liquid xenon. There is
also the intention to present this work at a summer conference within the
context of Dark Matter detector development and simulation.
This is joint work with Alex
Howard of the High Energy
Physics group at Imperial
College in London, England. The description is by him.
- Geant4: shared memory parallelism
- TOP-C allows the same source code to be easily converted from distributed
memory parallelism into shared memory parallelism. However, Geant4
uses a navigator to determine geometry, and the navigator is
a singleton class. In shared memory, many tasks would try to simultaneously
use this singleton class, leading to disaster. The solution is to modify
Geant4 to make the Navigator a non-singleton class. How can one
do this with little or no change to the Geant4 source code, but primarily
by adding new classes? A solution should make few assumptions about
Geant4, so as to easily adapt to new versions.
- Geant4: track-level parallelism
- In Geant4, an (external) event creates a particle. Each particle
creates a track, as part of its simulation. A track may end in decay
of the particle or interaction with matter, giving rise to one or more
secondary tracks. The secondaries may give rise to their own
secondaries. There is an event work loop, which processes each event
in turn. As a single event is processed, there is a track work loop, which
processes a track, and then processes the secondary tracks that arise
from previous tracks. Event-level parallelism for Geant4 has been
done. Track-level parallelism (parallelization of the track work
loop) remains to be done.
- Geant4: Interactivity and PostponeOneEvent()
- In the current parallelization of Geant4, user commands are read
from a script by each process (master and worker processes). A more
elegant parallelization would take user commands directly from the
terminal. This means that user commands must be passed from the
master to each worker process. Further, if the user command is to
abort the current event or session, then the user command must be
passed to all workers. The TOP-C "soft abort" facility is useful
here.
In addition, PostponeOneEvent() in Geant4 has not yet been
parallelized. In event-level parallelism, one may wish to suspend the
simulation of an event until a later time, when more information will
be available from other worker processes. However, this implies the
conversion of previously trivial parallelism into non-trivial
parallelism. Hence, the TOP-C features, such as the REDO action and
the UpdateSharedData() callback may be required to handle this more
sophisticated situation.
- Parallelize as many of the
SPLASH-2
applications as possible using TOP-C. There are important lessons to be
learned here about the limitations of the TOP-C model.
This is very important for developing the ideal future parallel language
that will be easy to use.
If enough coverage is provided, this
can be turned into a publishable paper (typically by investing as much
time after the course as was invested during the course).
- Search for Steiner 5-designs (combinatorics)
- This can be thought of as a problem in recreational mathematics, although
there are important applications for statistics and other fields if
a Steiner 5-design is discovered. All definitions are elementary
and are understandable by a high school student.
The search is reduced by the appropriate theory (see below) to the problem
of solving a Diophantine system of linear equations.
The matrix of coefficients has entries 0 or 1, and on the right-hand side
the inhomogeneous part is the all-1 vector. Any solution vector likewise
has only entries 0 or 1.
Since most entries are 0, a data structure is chosen that stores only the
non-zero entries. So, for each row there is a stack of all column numbers
where this row has an entry 1.
The solutions of the search problem are selected column numbers such that
in each row exactly one of the selected numbers occurs.
A second representation is available where, for each column, all row numbers
are listed where the column has an entry 1.
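A sketch of the two sparse representations in C (illustrative only):
/* Row-wise: for each row, the columns where the row has an entry 1.
 * Column-wise: for each column, the rows where the column has an entry 1. */
typedef struct { int *cols; int ncols; } Row;
typedef struct { int *rows; int nrows; } Col;

/* A solution is a set of column numbers such that every row occurs in
 * exactly one selected column -- an exact-cover search. */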
Options for publication of a paper include:
Designs, Codes and Cryptography;
J. Comb. Designs; Discrete Math;
and other journals.
The full details are given here.
This work will be joint with Prof.
Laue and his working group,
Lehrstuhl II, at the
U. of Bayreuth
in Germany.
- Network simulator
- Network simulators are notorious for their irregular parallelism.
How does one efficiently parallelize such an irregular application?
- Neural Net
- In the future, one can conceivably wish to employ neural networks
with millions of nodes in each layer. In such cases, the slow
speed of RAM is likely to dominate the running time. How can
one structure this application to avoid this? (See, also, the
related topics of Bottlenecks due to slow RAM
and Superlinear parallel speedup and slow RAM.)
- Parallel simulated annealing
- Design a new parallel algorithm for simulated annealing. The
simulations done so far are mostly based on message passing and data
parallelism. Can one design an efficient task-based algorithm?
There are similar, related questions for branch-and-bound, and other
search-style problems.
Simulated
annealing can be thought of as a noisy hill-climbing algorithm,
in which one follows a given gradient. This is a challenge to do
efficiently, since distinct worker processes will tend to climb
different hills, and it is difficult to share such information to provide
a general speedup. The `soft abort' feature of TOP-C may be
useful in this context. Previous implementations of parallel
simulated annealing exist; among the sites I found are a simple overview,
a general implementation in C++, a survey of various approaches, a journal
paper of one successful approach, another such journal paper, and slides
from a talk.
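As a baseline for parallelization, a minimal sequential annealing loop, with
energy(), neighbor(), and the cooling schedule as application-specific
stand-ins:
#include <math.h>
#include <stdlib.h>

double anneal(double state, double temp, double cooling, int steps) {
    double e = energy(state);                  /* hypothetical */
    for (int i = 0; i < steps; i++) {
        double next = neighbor(state);         /* hypothetical random move */
        double e_next = energy(next);
        /* Accept downhill moves always; uphill moves with probability
         * exp(-(e_next - e)/temp) -- the "noise" that escapes local hills. */
        if (e_next < e || drand48() < exp(-(e_next - e) / temp)) {
            state = next;
            e = e_next;
        }
        temp *= cooling;                       /* e.g., 0.999 per step */
    }
    return state;
}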
- Shortest solutions for Rubik's cube
- There are several approaches to finding either shortest or
simply "short" solutions for arbitrary initial states of Rubik's cube.
However, such approaches require enormous amounts of CPU time.
Hence, parallel methods are called for.
- Search in chess endgames
- Chess endgames cannot always be "solved" (to determine who wins)
when there are six or more chess pieces on the board. Part of the
problem is storing all of the positions in a large distributed database
on disk. (See
Scalability of data in parallel computation.)
How can this be managed?
- Random numbers
- Sequential simulations frequently use a single random number generator.
What must a parallel simulation do in this context? Some desirable
properties are:
1. Each worker process must draw from a distinct random number sequence.
2. Repeatability between parallel runs when using the same random seed.
3. Repeatability in comparing a parallel run with n slaves and a
parallel run with only 1 slave, when using the same random seed.
4. Repeatability between the parallelized simulation and the
original (unmodified) sequential simulation.
Solutions satisfying 1, 2 and 3 are relatively easy. However, it is
not trivial, since distinct parallel runs may re-order the individual
tasks, depending on the speed of the worker processes. If task 5
executes before task 4, can one still maintain repeatability of the
results? One sends a new seed or random state to each worker, along
with each task. However, is there a theoretical basis to guarantee
that the use of multiple random seeds satisfies the pseudo-randomness
property of a single generator? Also, is there an efficient way
to capture goal number 4?
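One concrete approach to properties 1-3 is to derive each task's seed
deterministically from the master seed and the task number, so the result
does not depend on which worker runs the task or in what order. The mixer
below is a standard 64-bit finalizer (splitmix64-style); whether such derived
streams provably retain the pseudo-randomness of a single generator is
exactly the theoretical question posed above.
#include <stdint.h>

uint64_t seed_for_task(uint64_t master_seed, uint64_t task_num) {
    uint64_t z = master_seed + 0x9e3779b97f4a7c15ULL * (task_num + 1);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
    return z ^ (z >> 31);
}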
- Small Java version of TOP-C
- This allows TOP-C to use the Java sandbox. Hence, a PC owner
can safely volunteer a new worker process in the Java sandbox,
without fear of danger to the local computer. This could also
take advantage of Java RMI for messages. The TOP-C syntax could
be extended to allow for this.
- Numerical analysis
- Consider your favorite numerical algorithm (Gaussian elimination,
differential equation solver, etc.) and design a new, task-based
algorithm for solving it in parallel.
Gene Cooperman
College of Computer Science, 336 WVH
Northeastern University
Boston, MA 02115
e-mail: gene@ccs.neu.edu
Phone: (617) 373-8686
Fax: (617) 373-5121