CS 4650 / CS 5650
(Research in High Performance Computing)
Instructor: Gene Cooperman
Spring, 2012
Tuesdays, 321 Hayden Hall, 6:00 - 9:00
For "Debugging and Systems Tricks" (below):
Copyright © Gene Cooperman,
2017.
This may be freely copied and modified as long as this copyright
notice remains. I would appreciate any enhancements being sent
back for possible inclusion here.
NEWS: The Course Wiki is available.
OpenMP and parallel benchmarking test suites
Organization
- Eliott Wiener (FReD -- reading internals: Python, record-replay
module for DMTCP)
- Samaneh Kazemi - Cilk (software model for multi-threaded programming)
- Adapt to FReD, but this needs determinism on replay.
Understand internals: analyze Cilk races (wrapper around the
atomic increment used by Cilk, and the rdtsc Intel assembly
instruction); analyze how and why Cilk uses atomic increment and
rdtsc; analyze the Cilk run-time library (where are the Cilk
functions spawn and sync defined?). [Has already verified
that the Cilk test suite can be checkpointed using DMTCP.]
- Andrew Hannon-Rizza - Dthreads - download and try out Dthreads;
Next step: checkpoint Dthreads with DMTCP
- Jonathan Albernaz - Averting TCP bandwidth throttling by enhancing
the UDP protocol with added functionality, so that the
underlying application cannot be bandwidth-throttled (also encryption;
use UDP for testing)
- Rohan Garg - implementation of epoll wrapper (benefits for
Apache, Firefox, and some VMs like user-space Qemu !!)
- Zhengping Jin - Lguest - Reading internals (a virtual machine
in only 5,000 lines); using DMTCP for snapshots?
- Ashutosh Waikoo - ??? (maybe interested in epoll wrappers, etc.;
currently working on adding PID/TID virtualization using DMTCP
modules; an experimental module for single-process checkpointing
by Kapil Arya exists; extend to distributed checkpointing)
- Komal Sodha - emacs23 bug -- (investigating DMTCP) ; May need
to debug DMTCP with GNU screen first. :-)
- Jim Shargo - X11 graphics, checkpoint using suspend-to-disk ;
understand internals of suspend-to-disk
and adapt to DMTCP; Prof. Bart Massey (Portland State U.)
is an expert on this; We'll ask his advice.
- SAME ORDER AS ABOVE. (Ashutosh Waikoo will be earlier or later)
- Jonathan Albernaz and Andrew - New Layer 4 protocol
- hiding detection of content,
bandwidth, actual
ports by network switches in the middle: encryption, port hopping,
multi-port channelling, randomized protocol byte order structuring;
Use TCP as model (starting point)
- Samaneh Kazemi - Adapt Cilk for reversible debugging using FReD
- Rohan Garg and Ashutosh Warikoo - Checkpointing Qemu
- Zhengping Jin and Komal Sodha - Lguest
- Jim Shargo - checkpointing 3-D graphics (OpenGL) using suspend-to-disk
- Eliott Wiener - FReD and Haskell
The prerequisites for this course are a familiarity with programming
in C (including pointers) under the Linux operating system, and the
ability to consider a new system call, read the Linux man page for it,
and to then understand how to use it in your program. Ideally, you should
also be comfortable using a symbolic debugger (e.g. gdb), although this
can be assimilated during the course itself. Students may in some cases
also be migrating from a background with a different operating system.
The remaining background knowledge (including
systems concepts) will be introduced/reviewed in the course.
If you want to privately test yourself on the
prerequisites, then read man mmap, and try writing a short
C program that uses the mmap system call. Also, write a short
amount of testing code to verify that the system call produced the
result that you expected.
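As a rough indication of the level expected, here is one possible shape for such a self-test. It is only a sketch: the function name mmap_roundtrip is made up for this page, and it exercises just one of the many ways to call mmap (an anonymous, private mapping on Linux).

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: map `len` bytes of anonymous private memory,
 * verify it, write a pattern, read it back, and unmap.
 * Returns 0 if everything behaved as expected, -1 otherwise. */
int mmap_roundtrip(size_t len)
{
    unsigned char *p;
    int ok;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Anonymous mappings are zero-filled; check that first. */
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            munmap(p, len);
            return -1;
        }
    }

    /* Write a pattern and confirm that it can be read back. */
    memset(p, 0xAB, len);
    ok = (p[0] == 0xAB && p[len - 1] == 0xAB);

    if (munmap(p, len) != 0)
        return -1;
    return ok ? 0 : -1;
}
```

Calling mmap_roundtrip(4096) from a main function and checking for a return value of 0 is the "short amount of testing code". A more thorough self-test would also try a file-backed mapping and different protection flags.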
If you have taken a systems or operating systems course earlier, or if you are
taking such a course during the same semester, this would normally provide
those prerequisites. If you have questions about whether your background
fulfills the prerequisites, please see me.
Course Structure:
As the official course description states, this course "introduces
students to research in the domain of high-performance computing."
At its core, research is messier than the highly structured courses that
one more typically sees, but it can also be very exciting to see things
that no one in the world has ever seen before. For this reason, the course
requires highly motivated students who will operate semi-autonomously,
while reporting back to the class at regular intervals. The course
will have a small to moderate enrollment with the opportunity for more
personal attention.
There will be a warm-up project in January, followed by a term project.
The warm-up project should be done individually, although discussions
among students are encouraged both in class and outside of class.
The term project will typically be done in teams. The ideal team size
is two or three students.
The warm-up project gives you an opportunity to try out a research
area, and also to bring yourself up-to-date on a selection of techniques
that you may need for systems programming. (See
Debugging and Other Systems Tricks for examples
of some of the useful techniques, many of which will also be
discussed in class.)
For the term project, students may choose to work either on their
own research projects that they bring to the course, or on research
questions that have evolved from the instructor's own research lab.
Lectures will be customized to present background concepts, theory,
and practical techniques of special value for the term projects as they
develop. The instructor and his students will offer generous amounts
of time to collaborate with the teams in small meetings.
In keeping with the course goal to take students to the forefront of research,
there will be opportunities after the course is over to continue
to collaborate with the goal of a competitive conference publication. However,
to maintain a sharp line between the academic course and extracurricular
work toward a publication, any interest in a collaboration toward a
conference publication should be discussed only after the student
has received his or her final course grade. Interested students
should be forewarned that the effort to produce a competitive conference
publication after the course is at least as great as the effort in the
course itself.
(Note that the overlap of certain weeks is intentional.)
WEEKS 1 and 2: Introduction to research topics; students
choose mini-project
WEEKS 3 and 4: Continuing lectures on research topics; students
complete mini-projects
WEEK 4: Students present results of mini-project
(oral and written)
WEEKS 4 and 5: Students choose course project.
WEEKS 4 through 8: Lectures guided by needs of students for
projects.
WEEK 6: Interim project reports by students (oral and written).
WEEK 9: Further interim project reports (oral and written).
WEEKS 10 through 12: Students lead discussions of lessons from
research: results of research topics to date; potential for
new research directions; interaction with other research results
in the literature
WEEK 12: Final project presentations (oral and written)
Instructor Information:
Office: 336 WVH
(and also look in my High Performance Computing Lab, 370 WVH)
Office Hours:
After class, and 4:20 - 5:30, Tuesday and Thursday; and also by appointment.
If students are having problems with their code, they are encouraged
to stay after class or arrange an appointment, so as to develop some
code jointly with the instructor.
Text:
There is no textbook. Internal documents and pointers to resources
on the Web will be provided. Please also note the two reference books
on systems programming listed at the end of this web page.
Grades:
Grades will be determined by the sophistication of the project, along with
the quality of the reports to the class (both oral and written reports).
Both individual and joint projects are possible. Students will be
encouraged to first do a (warm-up) mini-project, followed by a full
project that need not be on the same topic.
Research consists of exploration into the unknown.
Since all research is speculative, research results consist both of
positive and negative results. In geographical terms, the discovery
of a new mountain range (a new barrier) is just as interesting
as the discovery of a new river (a new exploration route).
For 2012, the course will be project-based, and will leverage the
research of the
High Performance Computing Laboratory. It will
emphasize two research vehicles:
- DMTCP (Distributed MultiThreaded CheckPointing):
DMTCP is an open source
package freely available from Sourceforge and developed by
a team originating in the High Performance Computing Laboratory.
There is a video demo of it here. There is also a
description of its internal architecture.
It transparently checkpoints the state of a process or computation
to disk. It does so in user space (no modification to the Linux kernel).
dmtcp_checkpoint a.out # run a.out under checkpoint control
rm ckpt_a.out-*.dmtcp # remove any old checkpoint image files
dmtcp_command -c # checkpoint the current process
dmtcp_restart ckpt_a.out-*.dmtcp # restart process from disk
DMTCP transparently follows the creation of new threads,
the forking of child processes, and the spawning of remote
processes via ssh. It currently does not checkpoint certain
processes involving X-Windows, the ptrace system call (e.g. gdb),
or suspended processes (^Z). The research question is how well
DMTCP can checkpoint common processes (without modifying the kernel),
and how well it can be extended to novel applications (checkpointing
GUIs using X-Windows, creation of a reversible debugger by
checkpointing gdb, etc.). For example, an interesting novelty
would be the ability to checkpoint some open windows of your
current session, and carry them home with you on your USB key.
- VISION: Checkpointing has seen three important uses:
restarting long-running computations in the middle after a
computer crash; load balancing and process migration; and
more recently,
restoring an earlier state for purposes of programming or
debugging. DMTCP supports all three modes, but some of the most
interesting research goals lie in the third area. Wouldn't it be
nice to checkpoint an X-Windows application, and move it to another
machine, and restart it? Can one do that with 3-D graphics
(an extension to basic X-Windows)?
Can one checkpoint a virtual machine such as user-space Qemu
or Linux lguest? If one could do this, one could even think
of running malware inside Windows inside a virtual machine.
Why is this useful? We can checkpoint fast (in seconds, unlike
the time for a virtual machine snapshot).
If the malware detects that it is being
spied on, we can back up to a previous checkpoint. If we are not
sure what input to pass to the malware, we can restart from the
checkpoint several times, and play "What if" games.
Don't worry if you have never used Qemu or lguest. All concepts
will be explained in a self-contained manner in the course.
- EXAMPLE PROJECTS FOR 2012: Checkpointing single X11 apps (e.g.,
checkpointing Firefox: the ultimate bug report
for just before it crashes); checkpointing a user-space
virtual machine; Infiniband support and
porting projects from expensive Infiniband clusters
to cheaper TCP/IP clusters for leisurely debugging.
- FReD (Fast Reversible Debugger):
FReD is an open source reversible debugger. It implements such
commands as reverse-step, reverse-next, and reverse-watch
(a generalization of watchpoints).
Suppose one is using a debugger and the variable x
has the wrong value. When did it get the wrong value? Wouldn't
it be nice to revert to an earlier state and examine x?
One can with DMTCP, which immediately yields a reversible debugger.
If we had checkpointed a debugging session 100 commands ago,
and we wish to undo the last debugging command, then just restart
the checkpoint image from 100 commands ago, and re-execute
the first 99 debugging commands. Now, combine the last
two ideas: I'm sure you've all seen how easily web browsers
can crash. Wouldn't it be great to go back and find out at which
statement they did something causing the crash?
An old description of FReD can be found in these slides.
While reversible debuggers have been available
at least since 1970, they have seldom gained widespread use.
Most recently, GDB version 7.2 and later provide excellent support
for reversible debugging using its target record
command. GDB 7.3 is available in Ubuntu 11.10, and you will
find a copy of it in the instructor's directory.
Some strong points of the FReD reversible debugger are:
(i) supports multi-threaded programs at near full speed;
(ii) supports long-running programs (in contrast, GDB reversibility
is not practical for programs running even a few seconds); and
(iii) FReD supports a novel feature,
reverse expression watchpoints. (See the slides
for a description of this feature.)
FReD is in the last stages before a public release. An alpha copy
of the code, along with two documents describing it, is at:
The vision and example projects follow:
- VISION:
FReD provides a Python-based scripting language that allows one
to directly call debugging commands that can manipulate the
debugging history of a process. Using this platform, one can
automatically search for the cause of bugs. For example, if
a program dereferences a NULL pointer, FReD can bring one back
in time within the GDB debugger to the point where the corresponding
pointer variable was being set to NULL. If a buffer is allocated
via malloc, and a program calls
free twice on the same memory buffer, then FReD can bring
one back to a point in time where the first call to free
was made. This is done using reverse expression watchpoints.
FReD can be extended to other debuggers besides GDB, and to
other mechanisms for searching for the cause of a bug, beyond
the examples above.
- EXAMPLE PROJECTS FOR 2012: extend FReD to work with
multi-threaded languages such as Cilk and OpenMP;
add reversibility to the functional, lazy language Haskell;
implement a reversible memory leak detector
that will go back in time to the cause of the memory leak.
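The undo-by-replay idea described above for FReD (restart from a checkpoint, then re-execute all but the last debugging command) can be illustrated with a toy model. This is only a sketch and is not FReD's code: the "process" is a single integer, the "checkpoint" is a saved copy of it, and all names (struct session, session_exec, session_undo) are invented for illustration. It also assumes that commands are deterministic, which is exactly the hard part for multi-threaded programs.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LOG 128

/* Toy stand-in for a debugging session (hypothetical, not FReD's design). */
struct session {
    long   state;          /* current "process" state */
    long   ckpt_state;     /* state saved at the last checkpoint */
    int    log[MAX_LOG];   /* commands issued since the checkpoint */
    size_t nlog;           /* number of logged commands */
};

/* A deterministic "debugging command": the same input state and
 * command always produce the same output state. */
static void apply(struct session *s, int cmd)
{
    s->state = s->state * 3 + cmd;
}

/* Take a checkpoint: save the state and start a fresh command log. */
void session_checkpoint(struct session *s)
{
    assert(s != NULL);
    s->ckpt_state = s->state;
    s->nlog = 0;
}

/* Execute a command and remember it.  Returns -1 if the log is full. */
int session_exec(struct session *s, int cmd)
{
    if (s->nlog == MAX_LOG)
        return -1;
    s->log[s->nlog++] = cmd;
    apply(s, cmd);
    return 0;
}

/* Undo the last command: restart from the checkpoint and replay
 * every logged command except the last one.
 * Returns -1 if there is nothing to undo. */
int session_undo(struct session *s)
{
    if (s->nlog == 0)
        return -1;
    s->state = s->ckpt_state;          /* "restart" from the checkpoint */
    s->nlog--;                         /* drop the last command */
    for (size_t i = 0; i < s->nlog; i++)
        apply(s, s->log[i]);           /* re-execute the earlier ones */
    return 0;
}
```

The cost of an undo grows with the number of commands since the checkpoint, which is one reason to take checkpoints at intervals rather than only once at the start.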
The instructor will cover any missing systems knowledge either in class,
or one-on-one with individual students.
GDB and other UNIX resources:
Some help files for UNIX and its compilers,
editors, etc. are also available.
In particular, the use of gdb (the GNU debugger) is especially encouraged
as an important productivity tool.
The lecture slides on
parallel computing (from the Intel Parallel Computing Center at
the U. of Oregon) form a nice view of parallel computing, based
on the Structured Parallel Programming
book (written by authors from Intel).
Here is also one book that is very nice for learning systems
programming concepts. Choose a chapter of interest, rather than
reading it from front to back. The Rochkind book is an excellent book,
with simple example source code showing useful programs. The book home page
has the table of contents and downloadable example source code.
I also recommend the online book, "The Linux Kernel", below
for an excellent overview of the kernel.
The book by Robert Love
provides more technical details on the Linux Operating System,
but it would only be needed for more unusual aspects of certain projects.
Each of these mini-projects is described only in outline. Within the
class, further details will be provided for those mini-projects of
interest to the students. Never mind whether you produce working software.
The goal is to understand. It is only incidental if you succeed with
a "deliverable".
- DMTCP (Familiarize yourself now with the code.)
Note especially the subdirectory dmtcp/doc with descriptions
of many parts of the DMTCP internals.
Use tools such as gdb for a deeper understanding.
Read the QUICK-INSTALL file for more
tips about DMTCP and its debugging tools.
Then write an overview of the implementation of DMTCP.
This is a paper-only project. If you take this on,
it will require detailed descriptions of the functionality
of the components of DMTCP.
Below are some alternative mini-projects concerned
more closely with producing code or pseudo-code.
- Use the module facility of DMTCP to build a new module.
An example module might be wrappers for the functions malloc
and free. The wrappers should allocate additional "guard
regions" around the memory buffer. It should catch bugs
like user code that writes beyond the end of allocated memory,
or user code that frees a buffer twice. For interested students,
there is the possibility of building on this mini-project,
to provide a novel, advanced memory leak detector in the
FReD reversible debugger.
- Write a new DMTCP wrapper function for a new system call.
(One suggestion is epoll; see man epoll.) There are examples
of wrapper functions in trunk/dmtcp/src/pidwrappers.cpp
and other files with names *wrappers.cpp. If you choose
an advanced system call such as epoll, it is acceptable
to work jointly with another student on a single mini-project.
- DMTCP currently has a bug in checkpointing emacs23
(version 23 of emacs). Investigate the cause of this bug.
The primary responsibility is the diagnosis of the bug. You are
not required to produce a bug fix.
(The bug appears to occur in the context of screen. If you
are interested in this project, tell me, and I will help you
reproduce this bug.)
- Provide a paper design for a new MTCP module. This type of
module is unrelated to the DMTCP modules described above. Currently,
DMTCP has an option for using "gzip" to dynamically compress
files on the fly. It is mostly in MTCP. This is too restrictive.
The MTCP subdirectory should support arbitrary user-defined
modules that are called by the MTCP checkpoint or restart routines.
The modules may save a checkpoint image locally or on a remote
machine, using gzip or a newer fast compression
routine such as Snappy, LZO, FastLZ, QuickLZ, or another.
You provide a paper design for the framework, and third parties
build whatever module they want. As part of your paper design,
consider options for third-party module writers to write to RAM
and then fork a child process that saves on disk -- or a third-party
module that mmaps RAM to disk and lets the operating system worry
about the best optimization.
You don't yet write code
in this mini-project, but you should refer to existing
code in the MTCP subdirectory.
- FReD (Familiarize yourself now with the code.) This primarily
involves reading Python code, the C++ record-replay module
for DMTCP, and a rough "black box" understanding of DMTCP.
There is a limited introduction in
this paper from the PLOS-11 workshop.
In particular, read about the primitive reverse-xxx algorithms,
and then study the code to see how reverse-watch works.
Document your findings in a report.
- Dthreads is a novel idea for determinism: replace a multi-threaded
process by multiple processes with shared memory. There is also
a full paper on Dthreads. This provides for
efficient deterministic multithreading. A well-known problem with
reversible debuggers is this: if you go back in time, and then
execute forwards, do you arrive at the same place, or did the
operating system produce a different thread schedule that changes
the behavior? By adding determinism,
Dthreads provides a "simple" idea for allowing FReD to easily
implement determinism. Is such a combined implementation possible?
How would it work? This project is purely a paper design.
- Linux Lguest is a virtual machine written in just 5000 lines of
well-documented code! You can read and understand every line of the code.
Lguest does not provide for snapshots. But using DMTCP, we could
checkpoint the Lguest "process". This provides a fast
checkpoint of an entire virtual operating system. Can this
idea work? Try it. If you try it, some things will probably go wrong.
What goes wrong? Propose on paper some approaches to overcome
these difficulties. There are opportunities to continue
this work into the term project.
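For the malloc/free "guard region" mini-project listed above, the following is a minimal sketch of the idea in plain C. It does not use the DMTCP module facility, and the names guarded_malloc and guarded_free are invented here. As a simplifying assumption, freed buffers are never returned to the system (they are "quarantined"), so that a second free can be detected safely by inspecting the guards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HDR_BYTES   16     /* stores the user size; 16 keeps user data aligned */
#define GUARD_BYTES 16     /* size of each guard region */
#define GUARD_LIVE  0xA5   /* guard fill while the buffer is in use */
#define GUARD_FREED 0x5A   /* guard fill after the buffer has been freed */

enum guard_status { GUARD_OK = 0, GUARD_OVERRUN = -1, GUARD_DOUBLE_FREE = -2 };

/* Return 1 if all n bytes at p equal v, else 0. */
static int all_bytes(const unsigned char *p, size_t n, unsigned char v)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != v)
            return 0;
    return 1;
}

/* Layout: [header: size][front guard][user bytes][back guard] */
void *guarded_malloc(size_t n)
{
    unsigned char *raw;

    assert(sizeof n <= HDR_BYTES);
    if (n > SIZE_MAX - HDR_BYTES - 2 * GUARD_BYTES)
        return NULL;
    raw = malloc(HDR_BYTES + 2 * GUARD_BYTES + n);
    if (raw == NULL)
        return NULL;
    memcpy(raw, &n, sizeof n);
    memset(raw + HDR_BYTES, GUARD_LIVE, GUARD_BYTES);
    memset(raw + HDR_BYTES + GUARD_BYTES + n, GUARD_LIVE, GUARD_BYTES);
    return raw + HDR_BYTES + GUARD_BYTES;
}

/* Check the guards, then mark the buffer as freed.
 * In this sketch the memory is deliberately not released. */
int guarded_free(void *user)
{
    unsigned char *front = (unsigned char *)user - GUARD_BYTES;
    unsigned char *back;
    size_t n;

    memcpy(&n, front - HDR_BYTES, sizeof n);
    back = (unsigned char *)user + n;

    if (all_bytes(front, GUARD_BYTES, GUARD_FREED))
        return GUARD_DOUBLE_FREE;
    if (!all_bytes(front, GUARD_BYTES, GUARD_LIVE) ||
        !all_bytes(back, GUARD_BYTES, GUARD_LIVE))
        return GUARD_OVERRUN;

    memset(front, GUARD_FREED, GUARD_BYTES);
    memset(back, GUARD_FREED, GUARD_BYTES);
    return GUARD_OK;
}
```

A write one byte past the end of a buffer changes the back guard, so the next guarded_free on that buffer reports GUARD_OVERRUN. A real module would also need to decide when quarantined memory can be released, and would report the error at the time of the bad write rather than at free time.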
The main projects are listed below. As always, you are also welcome to
bring your own research project. The project may be a tool useful for
High Performance Computing or a large computation itself.
We will also set up a course Wiki, where you will describe the
status of your projects. The Wiki will also have a space for general
issues/comments in supporting Roomy and DMTCP.
DMTCP (list of projects still being revised)
- Using suspend-to-disk mode to enable checkpointing of graphics
programs (including OpenGL (3D graphics))
- MTT (MPI Testing Tool) for automated testing for
Open MPI and DMTCP
- Checkpointing Qemu, first step towards checkpointing malware,
and then running the malware reversibly
- Another option that may or may not work is the
Linux lguest simple hypervisor
- Fast process migration (e.g. for servers):
Step 1: Add an MTCP module capability
Step 2: Write an MTCP checkpoint module
that checkpoints to remote RAM
Step 3: Restart on the remote machine
Step 4: Tune it for speed
- DMTCP attach using new ptrace capability.
- Checkpoint the job control/suspend feature of your favorite shell (^Z)
- Have MTCP use a standard ELF linker script.
- Hijack/Attach to already running process and checkpoint
(There is a question here about how to follow socket connections,
if that process is already talking to other processes.)
- Thread race condition detector: A traditional race condition
eventually causes a crash. But since it's a race condition,
it doesn't always crash at that location.
Experiment with different checkpoints,
until you find a checkpoint location for which the process
always crashes upon restart. Then modify MTCP to only allow
a subset of the threads to resume, and keep the other threads
suspended. By trial and error, discover which two threads
have a race condition.
- Portable Linux Apps: DMTCP checkpoint images include any
libraries that have been loaded. If the environment variable
LD_BIND_NOW is set (set to anything), then the loader will preload
every library that it will need. This should enable one to copy
a checkpoint image from Debian Linux to OpenSuse Linux to RedHat Linux
to Ubuntu Linux to (etc.). Does this work? If not, what's needed
to make it work?
- Incremental Checkpoint: DMTCP may want to keep multiple
checkpoint images, so that it can return to any of several
execution points in the past. This would normally require a lot
of disk space. How does one efficiently store a diff between
checkpoints? (This may be a somewhat easier project, for those
who are looking for that. With other projects, one will often
find that at the end of the semester, one has to report that some
parts are still not working, and why. This project offers the
opportunity of finishing most of the project, if there are no
surprises.)
- Checkpoint valgrind or other binary rewriters. Examples of
programs using dynamic binary translation include:
Valgrind, Pin, and Paradyn/Dyninst.
Currently, it appears that DMTCP cannot checkpoint these
packages. Why? Can it be fixed?
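For the Incremental Checkpoint project above, one naive starting point is a page-by-page comparison of two checkpoint images, storing only the pages that changed. The sketch below is an assumption for illustration, not how DMTCP stores images: it treats an image as a flat byte array whose length is a multiple of the page size, and the function names are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare two equal-sized images page by page.  The index of each
 * differing page is stored in changed[] (at most max entries).
 * Returns the total number of differing pages, which may exceed max. */
size_t diff_pages(const unsigned char *old_img, const unsigned char *new_img,
                  size_t npages, size_t page_size,
                  size_t *changed, size_t max)
{
    size_t count = 0;

    assert(page_size > 0);
    for (size_t i = 0; i < npages; i++) {
        if (memcmp(old_img + i * page_size,
                   new_img + i * page_size, page_size) != 0) {
            if (count < max)
                changed[count] = i;
            count++;
        }
    }
    return count;
}

/* Rebuild the newer image in place from the older one: copy each saved
 * page (stored back to back in `pages`) over the page it replaces. */
void apply_pages(unsigned char *img, const unsigned char *pages,
                 const size_t *changed, size_t nchanged, size_t page_size)
{
    for (size_t k = 0; k < nchanged; k++)
        memcpy(img + changed[k] * page_size,
               pages + k * page_size, page_size);
}
```

The disk cost of the second checkpoint is then proportional to the number of changed pages rather than to the size of the process. A real design would have to handle memory regions that appear, disappear, or change size between checkpoints.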
FReD (list of projects still being revised)
- Reversibly debugging Cilk programs
- Reversibly debugging OpenMP programs
- Extend the FReD reversible debugger to Java programs via 'jdb'
- Haskell and reversible debugging (Due to the functional lazy design
of the language, Haskell has to worry about side effects in debugging)
- Combine FReD with Dthreads (highly speculative, but very high impact)
- Memory leak detector: If a memory leak occurs later in the
program, valgrind runs too slowly to easily find it.
So, use a malloc debugger or your own malloc/free interceptor.
A good one is the DUMA library (libduma)
(Detect Unintended Memory Access), which is a newer replacement for
the classic Electric Fence (libefence).
This defines regions created through malloc. Late in the program,
it will be easy to find a region of memory that is a memory
leak (that no one ever touches again).
Alternatively, using checkpoint/restart
tricks, find the last time that anyone touched that memory
segment. Report that line of code using a standard tool to
convert between a line of assembly language and the source code line.
To guarantee that no one ever uses that memory again, remove
read-write protection from that region of memory and add
a segfault handler to trap any accesses. Then automate the
many checkpoint-restart cycles to find where the memory
segment was last touched.
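The last step of the memory leak detector above (remove read-write protection from a region and trap any access with a segfault handler) might look roughly as follows on Linux. This is a sketch under assumptions: the function names are invented, a freshly mapped page stands in for the suspected leaked region, and the handler recovers with siglongjmp only so that the demonstration can continue. A real detector would instead record the faulting address and instruction.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS with glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf trap_env;          /* where to resume after a trapped access */
static void *volatile trapped_addr;  /* faulting address reported by the kernel */

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    trapped_addr = info->si_addr;
    siglongjmp(trap_env, 1);
}

/* Address of the most recently trapped access (NULL if none yet). */
void *last_trapped_address(void)
{
    return trapped_addr;
}

/* Read one byte at addr.  Returns 1 if the read faulted and was trapped,
 * 0 if it succeeded, -1 if the handler could not be installed. */
int read_is_trapped(const volatile char *addr)
{
    struct sigaction sa, old;
    int faulted;

    assert(addr != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    if (sigsetjmp(trap_env, 1) == 0) {
        char c = *addr;         /* faults if the page is protected */
        (void)c;
        faulted = 0;
    } else {
        faulted = 1;            /* arrived here via siglongjmp */
    }
    sigaction(SIGSEGV, &old, NULL);   /* restore the previous handler */
    return faulted;
}

/* Map one page standing in for the suspected leaked region, then
 * remove read-write protection from it.  Returns NULL on failure. */
char *make_protected_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    char *p;

    if (page <= 0)
        return NULL;
    p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (mprotect(p, (size_t)page, PROT_NONE) != 0) {
        munmap(p, (size_t)page);
        return NULL;
    }
    return p;
}
```

Note that mprotect works on whole pages, so a region created through malloc would first have to be placed on its own page or pages (as libduma and libefence do) before this trick applies.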
DMTCP
If you have questions about DMTCP, please send e-mail to
Kapil Arya and me. The username of Kapil Arya is his
first name (all lower case) and: @ccs.neu.edu
DMTCP is available through
the sourceforge web page.
The easiest way to start is (in Linux) to type:
svn co https://dmtcp.svn.sourceforge.net/svnroot/dmtcp/trunk dmtcp
cd dmtcp
./configure
[ OR: ./configure --enable-debug ]
make
make check [OPTIONAL]
Then read the QUICK-START file in the top-level dmtcp directory.
From there start browsing the source code.
FReD
The FReD software will be made available soon.
See: debugging and other system tricks
(separate web page)