High Performance Computing Lab, Northeastern University
The High Performance Computing Laboratory at Northeastern University
is led by Gene
Cooperman. The Lab is part of the College of Computer and Information
Science and is located at 370 West Village H. It currently
includes five Ph.D. students. The Laboratory is pursuing three interrelated
topics: parallelization tools, scientific/engineering applications,
and balancing architectural bottlenecks.
Professor Cooperman has over 70 refereed publications and has been
awarded 15 grants from the National Science Foundation. He is
also the director of the Institute for Complex Scientific Software, an
interdisciplinary collaboration across five departments at Northeastern.
Three current research directions, described in further detail below, are:
- Disk-Based Parallel Computation
- User-Space Distributed, Multi-Threaded Checkpointing
- Converting Distributed Memory Parallelism to Thread-Parallelism
- Disk-Based Parallel Computation:
Commodity computing now offers many cores, but RAM is not
growing in proportion. Our solution is to use the disk as an
extension of RAM. The aggregate bandwidth of 50 local disks in a
cluster is approximately the same as that of a single RAM subsystem.
While this solves the bandwidth problem of disk, the latency
problem remains. Over the past five years we have developed a series
of applications that overcome this barrier.
We are now working on general tools that others can use to
quickly design and implement disk-based computations. A demonstration
of the power of this approach was our result that Rubik's cube can
be solved in 26 moves or fewer. This was done in 2.5 days
on a 32-node cluster using 8 terabytes of distributed disk.
Development of such disk-parallel code has traditionally been highly
labor-intensive. For an example of the power of this approach, you are
welcome to read some source code written in the Roomy language
(a library-based extension of C/C++). The Roomy-based code requires
only 271 lines of code and was written in less than one day.
Even though the source code appears to the end
user as a short sequential program, the code invokes the Roomy run-time
library, which then employs multiple threads, MPI, and access to multiple
files per computation node on behalf of the user.
Currently, we are applying Roomy to more ambitious efforts,
such as formal verification, where many problems
are known to suffer from the state-explosion problem.
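Roomy itself is a C/C++ library, and its actual API is not reproduced here. The following Python sketch (all names hypothetical) only illustrates the core idea described above: disk latency is hidden behind disk bandwidth by buffering operations in RAM and spilling them to per-bucket files in large sequential writes, then deduplicating one bucket at a time, so that only a single bucket must ever fit in memory.

```python
import os
import tempfile
from collections import defaultdict

class DiskFrontier:
    """Toy stand-in (not Roomy's real API) for a disk-based set of states:
    buffer states in RAM, spill them to per-bucket files in large
    sequential appends, and deduplicate one bucket at a time."""

    def __init__(self, workdir, num_buckets=4, flush_threshold=1024):
        self.workdir = workdir
        self.num_buckets = num_buckets
        self.flush_threshold = flush_threshold
        self.buffers = defaultdict(list)   # bucket id -> pending states

    def _path(self, b):
        return os.path.join(self.workdir, "bucket_%d.txt" % b)

    def add(self, state):
        b = hash(state) % self.num_buckets
        self.buffers[b].append(state)
        if len(self.buffers[b]) >= self.flush_threshold:
            self._flush(b)

    def _flush(self, b):
        # One large sequential append instead of many random writes:
        # this is what turns a latency problem into a bandwidth problem.
        with open(self._path(b), "a") as f:
            f.write("".join("%s\n" % s for s in self.buffers[b]))
        self.buffers[b].clear()

    def unique_states(self):
        # Flush remaining buffers, then deduplicate bucket by bucket;
        # only one bucket needs to fit in RAM at any moment.
        for b in list(self.buffers):
            self._flush(b)
        seen = set()
        for b in range(self.num_buckets):
            if not os.path.exists(self._path(b)):
                continue
            with open(self._path(b)) as f:
                seen |= {line.strip() for line in f}
        return seen

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        frontier = DiskFrontier(d, flush_threshold=8)
        for i in range(100):
            frontier.add(str(i % 10))   # many duplicate states
        print(len(frontier.unique_states()))
```

This bucket-at-a-time deduplication is the same pattern ("delayed duplicate detection") used by disk-based breadth-first searches such as the Rubik's cube computation described above.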
- User-Space Distributed, Multi-Threaded Checkpointing:
The user-space approach allows us to bundle the checkpointing
capability with the application or with the computational facility.
In contrast, kernel-space solutions (at least in binary
form) are bound to particular versions of the kernel, and therefore
to a particular computational facility. As one would expect, we require no
modification of the kernel or of the application binary. We have demonstrated
that it works with OpenMPI, with MPICH-2, with SciPy (IPython),
with the Java JVM, and with a variety of other applications. Our latest
version, DMTCP, is available at SourceForge under the GPL. The chart
of the number of downloads shows DMTCP to be in active use.
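A typical session with DMTCP's standard command-line tools looks roughly like the following (a sketch only; exact flags can vary between DMTCP versions):

```shell
# Start the application under checkpoint control: no recompilation,
# no root privileges, no kernel modification.
dmtcp_launch ./my_long_computation arg1 arg2

# From another terminal, request a checkpoint of all running processes.
dmtcp_command --checkpoint

# Later -- possibly after a crash or on another machine -- restart
# from the checkpoint images written to the working directory.
dmtcp_restart ckpt_*.dmtcp
```

Because everything happens in user space, the same binary checkpoint tools can be shipped alongside the application itself rather than tied to one cluster's kernel version.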
- Converting Distributed Memory Parallelism to Thread-Parallelism:
This is a newer project. As many-core computing provides
less RAM per core (among other reasons), it becomes desirable
to migrate MPI or other distributed-memory code to thread-parallel code.
In particular, with the advent of many-core CPUs, large sequential
codes must be converted to thread-parallel code with data
sharing in order to avoid thrashing between the CPU cache and RAM.
We are investigating to what extent some of this conversion can be done
semi-automatically for properly structured code.
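The memory argument above can be made concrete with a minimal sketch (not the lab's actual tools; all names hypothetical). In an MPI-style distributed-memory code, each of N worker processes holds its own private copy of any large read-only table; in the thread-parallel version below, all workers share one copy, cutting that memory footprint roughly by a factor of N while keeping the same owner-computes partitioning of the work.

```python
import threading

# A large read-only table.  An MPI-style code would replicate this in
# every process; here all threads share this single copy.
TABLE = list(range(1_000_000))

def worker(rank, num_workers, results):
    # Each thread processes a disjoint strided slice of the shared table,
    # the same partitioning a distributed-memory code would use per rank.
    results[rank] = sum(TABLE[rank::num_workers])

def parallel_sum(num_workers=4):
    results = [0] * num_workers
    threads = [threading.Thread(target=worker, args=(r, num_workers, results))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)

if __name__ == "__main__":
    print(parallel_sum() == sum(TABLE))
```

The interesting (and hard) part of the research direction above is doing this transformation safely when the data is not read-only, which is where semi-automatic analysis of properly structured code comes in.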