Research of Gene Cooperman
This page will always be obsolete at any given point in time, but
sometimes more obsolete than other times. It is very obsolete right now.
I'm happy to correspond.
Here are my refereed publications and some
examples of my software. I last updated
this page in November, 2011.
A brief description of my research
A current theme in my
High Performance Computing Laboratory at Northeastern University is
adaptation of data structures and low-level software access algorithms
to quickly changing technology. In the 90s, computers became faster.
Now, we simply have more of them, and with the growth of heterogeneous
computing, we have more types of them. A partially related thrust
is to treat checkpoints of running programs as first-class objects.
These research directions are summarized here:
- User Space Distributed, Multi-Threaded Checkpointing (DMTCP):
The DMTCP checkpoint-restart package employs a
pure user-space approach. This enables DMTCP to be bundled with other
major applications for distribution. Checkpointing to disk,
or restart, takes place in seconds or less. DMTCP requires no modification
of kernel or of application binary. It has been demonstrated on
OpenMPI, SciPy (iPython), SCIRun, Java, bash, gcl, MATLAB, and so on.
DMTCP is the most widely used user-space checkpointing package.
(Some other checkpointing packages, such as
require a kernel module.
They are more commonly used for batch queues. It is difficult to
compare usage among batch queues versus user-space settings.)
- Reversible Debugger:
A new experimental version of DMTCP can now checkpoint debugging sessions.
Using this, we built URDB in 2009, a reversible debugger
for single-threaded programs.
It can reverse execute code (going backwards in time).
While URDB is freely available (GPL),
it is now obsolete. We plan to release
a beta version of FReD (Fast Reversible Debugger) soon. It is both
more robust (capable of reversing MySQL, Firefox, and Apache)
and also capable of debugging multi-threaded programs.
Following a principle of orthogonality, FReD (and the planned
debugging tools on top of it) operates with multiple debuggers.
Using this, we have built the first reversible debuggers
for MATLAB, Python (pdb module), Perl (perl -d), and OpenMPI using gdb.
This also provides a gateway both to program-based introspection and
to speculative program execution.
- Disk-Based Parallel Computation (data-intensive computing):
Computers now have many cores, but the RAM is not growing in
proportion. Our solution is to use the disk as an extension of RAM.
The aggregate bandwidth of 50 local disks in a cluster is approximately the
same as that of a single RAM subsystem. While this may solve the
bandwidth problem of disk, the latency problem remains. Over five years,
we have developed a series of applications that overcome this
barrier. Development of such disk-parallel code is highly
labor-intensive. We are now working on a mini-language, Roomy, that reduces the
software development and debugging time from person-months to person-days.
The end user need only make minimal changes to a sequential program, and then
link in the Roomy run-time library.
Just as the Linda programming language provides coordinated access to
a common tuple space, Roomy provides
access to the disk storage resources of a computer cluster or SAN
through a sequential API.
A particular emphasis of
this research is on a broad variety of search algorithms, with an eye
to applications in formal verification
and elsewhere. The demonstration that Rubik's cube can be solved
in 26 moves or fewer using 8 terabytes of disk storage was a
byproduct of this work that attracted popular attention.
- Converting Distributed Memory Parallelism to Thread-Parallelism:
This is a newer project. As the move to many-core computing provides
less RAM per core (and for other reasons), it becomes desirable
to migrate MPI or other distributed memory code to thread-parallel code.
In particular, with the advent of many-core CPUs, large sequential
codes must be converted to thread-parallel code with data
sharing in order to avoid thrashing between the CPU cache and RAM.
Source code transformations are used to segregate thread-private
read-write data. In combination with copy-on-write (UNIX fork system call),
nearly linear speedup is achieved. This has been tested so far
on 24-core machines. An interesting byproduct of segregating
the read-write thread-private data is that even for the sequential
case (single thread), we sometimes observe a speedup. The methodology
has been developed in cooperation with the Geant4 developers.
Geant4 consists of about 750,000 lines of C++ code
developed at CERN for simulation of particle-matter interaction. One
of its applications is analyzing data from the LHC
collider (particle accelerator) at CERN, which is about
8.6 kilometers in diameter.
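As a rough illustration of how checkpoints can support the reverse execution mentioned in the Reversible Debugger item above, one common approach is to restore the most recent earlier checkpoint and then re-execute forward, stopping one step short of the current position. The toy Python sketch below shows only that idea on a small deterministic "program". It is not the URDB or FReD implementation, and the names ReplayMachine, step_fn, checkpoint_every and fib_step are hypothetical, chosen for this example.

```python
import copy


class ReplayMachine:
    """Toy deterministic 'program': each step maps a state dict to a new one.

    Hypothetical illustration only; real checkpointing captures a whole
    process image, not a Python dict.
    """

    def __init__(self, initial_state, step_fn, checkpoint_every=4):
        self.state = copy.deepcopy(initial_state)
        self.step_fn = step_fn
        self.checkpoint_every = checkpoint_every
        self.steps_done = 0
        # checkpoints[k] holds a private copy of the state after k steps.
        self.checkpoints = {0: copy.deepcopy(initial_state)}

    def step(self):
        """Execute one step forward, checkpointing periodically."""
        self.state = self.step_fn(self.state)
        self.steps_done += 1
        if self.steps_done % self.checkpoint_every == 0:
            self.checkpoints[self.steps_done] = copy.deepcopy(self.state)

    def reverse_step(self):
        """Go back one step: restore an earlier checkpoint, then replay."""
        if self.steps_done == 0:
            raise RuntimeError("already at the start of execution")
        target = self.steps_done - 1
        # Latest checkpoint taken at or before the target step.
        base = max(k for k in self.checkpoints if k <= target)
        self.state = copy.deepcopy(self.checkpoints[base])
        self.steps_done = base
        # Replay forward. Because step_fn is deterministic, this
        # reproduces the exact state the program had at `target`.
        while self.steps_done < target:
            self.state = self.step_fn(self.state)
            self.steps_done += 1


def fib_step(s):
    """One step of a Fibonacci recurrence, used as the toy program."""
    return {"a": s["b"], "b": s["a"] + s["b"]}


machine = ReplayMachine({"a": 0, "b": 1}, fib_step, checkpoint_every=4)
for _ in range(10):
    machine.step()
machine.reverse_step()  # restores the checkpoint at step 8, replays 1 step
print(machine.steps_done, machine.state)
```

The sketch relies on the toy program being deterministic. Checkpoint spacing trades storage against replay time: sparser checkpoints mean more steps to re-execute on each reverse step.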
History/Background: I have a background from the 80s and 90s in
computational algebra (especially computational group theory). This
has served me well as a testbed for parallel computing. This work
led to the TOP-C
(Task Oriented Parallel C/C++) model of parallel computing. In a
nutshell, it was always designed for commodity computing, and it
emphasizes a task-oriented model with lazy updates of globally shared
memory. This allows for good latency tolerance, while providing an
exceptionally easy model for end-users to implement a generalization
of task-oriented parallelism allowing for non-trivial parallelism.
Some outgrowths of that work are my support for parallel GAP (Groups,
Algorithms and Programming), parallel GCL (GNU Common LISP), and ParGeant4
(Geant4 is a million-line program developed at CERN and elsewhere,
which is used to design and simulate experiments on the LHC, the
largest collider in the world). My software
page describes this software further.
Khoury College of Computer Sciences, 336 WVH
Boston, MA 02115
Phone: (617) 373-8686
Fax: (617) 373-5121