CS 4650 / CS 5650 (Research in High Performance Computing)

Instructor: Gene Cooperman
Spring, 2012
Tuesdays, 321 Hayden Hall, 6:00 - 9:00

For "Debugging and Systems Tricks" (below):
Copyright © Gene Cooperman, 2017.
This may be freely copied and modified as long as this copyright notice remains. I would appreciate any enhancements being sent back for possible inclusion here.

NEWS: The Course Wiki is available.
OpenMP and parallel benchmarking test suites

NEWS: Debugging and other Systems Tricks has now been updated.

NEWS: There is now a file to explain more about low-level GDB debugging.

OLD: There was a series of workshops and speakers concerned with large systems projects.

Organization

Mini-Projects (chosen)
Mini-Projects (talks)
Term Projects
Prerequisites and Course Structure
Syllabus
Overview of Two Project Themes
1. DMTCP
2. FReD
Course Resources
Mini-Projects
Main Projects
Project Software
Debugging and Other Systems Tricks

Mini-Projects (chosen)

Eliott Wiener (FReD -- reading internals: Python, record-replay module for DMTCP)
Samaneh Kazemi - Cilk (software model for multi-threaded programming) - Adapt to FReD; but needs determinism on replay. Understand internals: analyze Cilk races: wrapper around atomic increment (used by Cilk), and rdtsc Intel assembly; Analyze internals (how and why does Cilk use atomic increment, rdtsc): analyze Cilk run-time library (where are the Cilk functions, spawn and sync defined?) [ has already verified that the Cilk test suite can be checkpointed using DMTCP ]
Andrew Hannon-Rizza - DThreads - download and try out Dthreads; Next step: checkpoint Dthreads with DMTCP
Jonathan Albernaz - Averting TCP bandwidth throttling by enhancing the UDP protocol to add functionality so that the underlying application cannot be bandwidth-throttled (also encryption; use UDP for testing)
Rohan Garg - implementation of epoll wrapper (benefits for Apache, Firefox, and some VMs like user-space Qemu !!)
Zhengping Jin - Lguest - Reading internals (a virtual machine in only about 5,000 lines of code); Using DMTCP for snapshots?
Ashutosh Waikoo - ??? (maybe interested in epoll wrappers, etc.) ??? (currently working on adding PID/TID virtualization using DMTCP modules; an experimental module for single-process checkpointing by Kapil Arya exists; extend to distributed checkpointing)
Komal Sodha - emacs23 bug -- (investigating DMTCP) ; May need to debug DMTCP with GNU screen first. :-)
Jim Shargo - X11 graphics, checkpoint using suspend-to-disk ; understand internals of suspend-to-disk and adapt to DMTCP; Prof. Bart Massey (Portland State U.) is an expert on this; We'll ask his advice.

Mini-Projects (talks)

SAME ORDER AS ABOVE. (Ashutosh Waikoo will be early or later)

Term Projects

Jonathan Albernaz and Andrew - New Layer 4 protocol - hiding detection of content, bandwidth, actual ports by network switches in the middle: encryption, port hopping, multi-port channelling, randomized protocol byte order structuring; Use TCP as model (starting point)
Samaneh Kazemi - Adapt Cilk for reversible debugging using FReD
Rohan Garg and Ashutosh Warikoo - Checkpointing Qemu
Zhengping Jin and Komal Sodha - Lguest
Jim Shargo - checkpointing 3-D graphics (OpenGL) using suspend-to-disk
Eliott Wiener - FReD and Haskell

PREREQUISITES:

The prerequisites for this course are a familiarity with programming in C (including pointers) under the Linux operating system, and the ability to consider a new system call, read the Linux man page for it, and to then understand how to use it in your program. Ideally, you should also be comfortable using a symbolic debugger (e.g. gdb), although this can be assimilated during the course itself. Students may in some cases also be migrating from a background with a different operating system. The remaining background knowledge (including systems concepts) will be introduced/reviewed in the course. If you want to privately test yourself on the prerequisites, then read man mmap, and try writing a short C program that uses the mmap system call. Also, write a short amount of testing code to verify that the system call produced the result that you expected.
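To give a sense of the scale of such a self-test, here is a minimal sketch of the same idea in Python, whose mmap module wraps the mmap system call. The course itself expects C; this only illustrates the "use the call, then verify the result" pattern. The function name mmap_roundtrip is hypothetical.

```python
import mmap


def mmap_roundtrip(payload: bytes) -> bytes:
    """Map one anonymous page, write payload at offset 0, and read it back."""
    length = mmap.PAGESIZE
    if len(payload) > length:
        raise ValueError("payload must fit in one page for this sketch")
    # fileno=-1 requests an anonymous mapping (no backing file).
    region = mmap.mmap(-1, length)
    try:
        region[:len(payload)] = payload
        return bytes(region[:len(payload)])
    finally:
        region.close()


if __name__ == "__main__":
    data = b"hello, mmap"
    # The "testing code": check that the mapping holds what was written.
    assert mmap_roundtrip(data) == data
    print("mmap round trip succeeded")
```

A C version would call mmap with MAP_ANONYMOUS, check the return value against MAP_FAILED, and compare the bytes read back with the bytes written.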

If you have taken a systems or operating systems course earlier, or if you are taking such a course during the same semester, this would normally provide those prerequisites. If you have questions about whether your background fulfills the prerequisites, please see me.

Course Structure:

As the official course description states, this course "introduces students to research in the domain of high-performance computing." At its core, research is messier than the highly structured courses that one more typically sees, but it can also be very exciting to see things that no one in the world has ever seen before. For this reason, the course requires highly motivated students who will operate semi-autonomously, while reporting back to the class at regular intervals. The course will have a small to moderate enrollment with the opportunity for more personal attention.

There will be a warm-up project in January, followed by a term project. The warm-up project should be done individually, although discussions among students are encouraged both in class and outside of class. The term project will typically be done in teams. The ideal team size is two or three students.

The warm-up project gives you an opportunity to try out a research area, and also to bring yourself up-to-date on a selection of techniques that you may need for systems programming. (See Debugging and Other Systems Tricks for an example of some of the useful techniques, many of which will also be discussed in class.)

For the term project, students may choose to work either on their own research projects that they bring to the course, or on research questions that have evolved from the instructor's own research lab. Lectures will be customized to present background concepts, theory, and practical techniques of special value for the term projects as they develop. The instructor and his students will offer generous amounts of time to collaborate with the teams in small meetings.

In keeping with the course goal to take students to the forefront of research, there will be opportunities after the course is over to continue to collaborate with the goal of a competitive conference publication. However, to maintain a sharp line between the academic course and extracurricular work toward a publication, any interest in a collaboration toward a conference publication should be discussed only after the student has received his or her final course grade. Interested students should be forewarned that the effort to produce a competitive conference publication after the course is at least as great as the effort in the course itself.

Syllabus:

(Note that the overlap of certain weeks is intentional.)
WEEKS 1 and 2: Introduction to research topics; students choose mini-project
WEEKS 3 and 4: Continuing lectures on research topics; students complete mini-projects
WEEK 4: Students present results of mini-project (oral and written)
WEEKS 4 and 5: Students choose course project.
WEEKS 4 through 8: Lectures guided by needs of students for projects.
WEEK 6: Interim project reports by students (oral and written).
WEEK 9: Further interim project reports (oral and written).
WEEKS 10 through 12: Students lead discussions of lessons from research: results of research topics to date; potential for new research directions; interaction with other research results in the literature
WEEK 12: Final project presentations (oral and written)

Instructor Information: Office: 336 WVH (and also look in my High Performance Computing Lab, 370 WVH)

Office Hours: After class, and 4:20 - 5:30, Tuesday and Thursday; and also by appointment. If students are having problems with their code, they are encouraged to stay after class or arrange an appointment, so as to develop some code jointly with the instructor.

Text: There is no textbook. Internal documents and pointers to resources on the Web will be provided. Please also note the two reference books on systems programming listed at the end of this web page.

Grades:

Grades will be determined by the sophistication of the project, along with the quality of the reports to the class (both oral and written reports). Both individual and joint projects are possible. Students will be encouraged to first do a (warm-up) mini-project, followed by a full project that need not be on the same topic.

Research consists of exploration into the unknown. Since all research is speculative, research results consist both of positive and negative results. In geographical terms, the discovery of a new mountain range (a new barrier) is just as interesting as the discovery of a new river (a new exploration route).

Two Project Themes Evolving from Instructor's Research Laboratory:

For 2012, the course will be project-based, and will leverage the research of the High Performance Computing Laboratory. It will emphasize two research vehicles:

DMTCP (Distributed MultiThreaded CheckPointing):
DMTCP is an open source package freely available from Sourceforge and developed by a team originating in the High Performance Computing Laboratory. There is a video demo of it here. There is also a description of its internal architecture. It transparently checkpoints the state of a process or computation to disk. It does so in user space (no modification to the Linux kernel).

  dmtcp_checkpoint a.out             # run a.out under checkpoint control
  rm ckpt_a.out-*.dmtcp              # remove any old checkpoint image files
  dmtcp_command -c                   # checkpoint the current process
  dmtcp_restart ckpt_a.out-*.dmtcp   # restart process from disk

DMTCP transparently follows the creation of new threads, the forking of child processes, and the spawning of remote processes via ssh. It currently does not checkpoint certain processes involving X-Windows, the ptrace system call (e.g. gdb), or suspended processes (^Z). The research question is how well DMTCP can checkpoint common processes (without modifying the kernel), and how well it can be extended to novel applications (checkpointing GUIs using X-Windows, creation of a reversible debugger by checkpointing gdb, etc.). For example, an interesting novelty would be the ability to checkpoint some open windows of your current session, and carry them home with you on your USB key.
- VISION: Checkpointing has seen three important uses: restarting long-running computations in the middle after a computer crash; load balancing and process migration; and more recently, restoring an earlier state for purposes of programming or debugging. DMTCP supports all three modes, but some of the most interesting research goals lie in the third area. Wouldn't it be nice to checkpoint an X-Windows application, and move it to another machine, and restart it? Can one do that with 3-D graphics (an extension to basic X-Windows)?
  
  Can one checkpoint a virtual machine such as user-space Qemu or Linux lguest? If one could do this, one could even think of running malware inside Windows inside a virtual machine. Why is this useful? We can checkpoint fast (in seconds, unlike the time for a virtual machine snapshot). If the malware detects that it is being spied on, we can back up to a previous checkpoint. If we are not sure what input to pass to the malware, we can restart from the checkpoint several times, and play "What if" games. Don't worry if you have never used Qemu or lguest. All concepts will be explained in a self-contained manner in the course.
- EXAMPLE PROJECTS FOR 2012: Checkpointing single X11 apps (e.g., checkpointing Firefox: the ultimate bug report for just before it crashes); Checkpointing a user-space virtual machine; Infiniband support and porting projects from expensive Infiniband clusters to cheaper TCP/IP clusters for leisurely debugging.
FReD (Fast Reversible Debugger):
FReD is an open source reversible debugger. It implements such commands as reverse-step, reverse-next, and reverse-watch (a generalization of watchpoints).

Suppose one is using a debugger and the variable x has the wrong value. When did it get the wrong value? Wouldn't it be nice to revert to an earlier state and examine x? One can with DMTCP, which immediately yields a reversible debugger. If we had checkpointed a debugging session 100 commands ago, and we wish to undo the last debugging command, then just restart the checkpoint image from 100 commands ago, and re-execute the first 99 debugging commands. Now, combine the last two ideas: I'm sure you've all seen how easily web browsers can crash. Wouldn't it be great to go back and find out at which statement they did something causing the crash?
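The checkpoint-and-re-execute idea can be sketched in a few lines. This is a toy model, not FReD's implementation: the "process" is a Python dict, the "checkpoint image" is a deep copy of it, and the debugging commands are assumed to be deterministic functions. The names ReplaySession, increment_x, and double_x are hypothetical.

```python
import copy


class ReplaySession:
    """Toy model: state is a dict; commands are deterministic functions on it."""

    def __init__(self, initial_state):
        self.state = copy.deepcopy(initial_state)
        # Stands in for a checkpoint image saved to disk.
        self.checkpoint = copy.deepcopy(initial_state)
        # Commands executed since the checkpoint was taken.
        self.log = []

    def run(self, command):
        command(self.state)
        self.log.append(command)

    def take_checkpoint(self):
        self.checkpoint = copy.deepcopy(self.state)
        self.log = []

    def undo(self):
        """Undo the last command: restart from the checkpoint, replay all but the last."""
        if not self.log:
            raise RuntimeError("no command to undo since the last checkpoint")
        replay = self.log[:-1]
        self.state = copy.deepcopy(self.checkpoint)
        self.log = []
        for command in replay:
            self.run(command)


def increment_x(state):
    state["x"] += 1


def double_x(state):
    state["x"] *= 2


if __name__ == "__main__":
    session = ReplaySession({"x": 1})
    session.run(increment_x)   # x == 2
    session.run(double_x)      # x == 4
    session.undo()             # restart and replay only increment_x
    print(session.state["x"])  # prints 2
```

The sketch only works because replay is deterministic. With threads or external input, re-executing the same commands may not reach the same state, which is why record-replay matters for a real reversible debugger.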

An old description of FReD can be found in these slides. While reversible debuggers have been available at least since 1970, they have seldom gained widespread use. Most recently, GDB version 7.2 and later provide excellent support for reversible debugging using the target record command. GDB 7.3 is available in Ubuntu 11.10, and you will find a copy of it in the instructor's directory.

Some strong points of the FReD reversible debugger are:
(i) supports multi-threaded programs at near full speed;
(ii) supports long-running programs (in contrast, GDB reversibility is not practical for programs running even a few seconds); and
(iii) FReD supports a novel feature, reverse expression watchpoints. (See the slides for a description of this feature.)

FReD is in the last stages before a public release. An alpha copy of the code, along with two documents describing it are at:
The vision and example projects follow:
- VISION: FReD provides a Python-based scripting language that allows one to directly call debugging commands that can manipulate the debugging history of a process. Using this platform, one can automatically search for the cause of bugs. For example, if a program dereferences a NULL pointer, FReD can bring one back in time within the GDB debugger to the point where the corresponding pointer variable was being set to NULL. If a buffer is allocated via malloc, and a program calls free twice on the same memory buffer, then FReD can bring one back to a point in time where the first call to free was made. This is done using reverse expression watchpoints. FReD can be extended to other debuggers besides GDB, and to other mechanisms for searching for the cause of a bug, beyond the examples above.
- EXAMPLE PROJECTS FOR 2012: extend FReD to work with multi-threaded languages such as Cilk and OpenMP ; add reversibility to the functional, lazy language Haskell ; implement reversible memory leak detector that will go back in time to the cause of the memory leak
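One way to picture a reverse expression watchpoint is as a binary search over a replayable history. The sketch below is an illustration under simplifying assumptions, not FReD's actual algorithm or API. The program is modeled as a list of deterministic steps, replay always starts from the initial state, and the expression is false at the start and true at the end. The names replay_to and reverse_watch are hypothetical.

```python
def replay_to(initial_state, steps, n):
    """Re-execute the first n steps from a copy of the initial state.

    Stands in for restarting a checkpoint and replaying forward.
    """
    state = dict(initial_state)
    for step in steps[:n]:
        step(state)
    return state


def reverse_watch(initial_state, steps, expr):
    """Return n such that expr is false after n-1 steps and true after n steps.

    Requires expr to be false before any step runs and true after all steps.
    Uses O(log N) replays instead of checking every step.
    """
    lo, hi = 0, len(steps)
    if expr(replay_to(initial_state, steps, lo)):
        raise ValueError("expr is already true in the initial state")
    if not expr(replay_to(initial_state, steps, hi)):
        raise ValueError("expr is not true at the end of the history")
    # Invariant: expr is false after lo steps and true after hi steps.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expr(replay_to(initial_state, steps, mid)):
            hi = mid
        else:
            lo = mid
    return hi
```

For example, with a history whose second step sets state["ptr"] to None and an expression that tests whether state["ptr"] is None, reverse_watch returns 2, meaning the second step (index 1) is where the pointer became NULL. If the expression flips more than once, the search still returns a point where it changes from false to true, though not necessarily the most recent one.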

Course Resources:

The instructor will cover any missing systems knowledge either in class, or one-on-one with individual students.

GDB and other UNIX resources: Some help files for UNIX and its compilers, editors, etc. are also available. In particular, the use of gdb (the GNU debugger) is especially encouraged as an important productivity tool.

The lecture slides on parallel computing (from the Intel Parallel Computing Center at the U. of Oregon) form a nice view of parallel computing, based on the Structured Parallel Programming book (written by authors from Intel).

Here is also one book that is very nice for learning systems programming concepts. Choose a chapter of interest, rather than reading it from front to back. The Rochkind book is an excellent book, with simple, example source code showing useful programs. The book home page has the table of contents, and downloadable example source code. I also recommend the online book, "The Linux Kernel", below for an excellent overview of the kernel. The book by Robert Love provides more technical details on the Linux Operating System, but it would only be needed for more unusual aspects of certain projects.

Advanced UNIX Programming, Second Edition, by Marc J. Rochkind, Addison Wesley, 2004 (Library copy goes on reserve Jan. 16)
Advanced Programming in the UNIX Environment, by W. Richard Stevens and Stephen A. Rago (Library copy goes on reserve Jan. 16; the book above or this one is enough for those who wish a deeper class background. You don't need both.)
The Linux kernel (online) (my current favorite book on the Linux kernel --- a gentle introduction without confusing newcomers with all the gory details)
Online: Linux System Programming by Love, 2007 (free online version accessible from Northeastern computer network via Safari Books Online)
- If using from another ISP outside of Northeastern U., then try tunneling using your CCIS account:
  ssh -L1234:0-proquest.safaribooksonline.com.ilsprod.lib.neu.edu:80 denali.ccs.neu.edu
  or:
  ssh -L1234:safari.oreilly.com:80 denali.ccs.neu.edu
  Then point one's browser at the URL http://localhost:1234/
Note also the notes on debugging below.

Mini-Projects:

Each of these mini-projects is described only in outline. Within the class, further details will be provided for those mini-projects of interest to the students. Do not worry about whether you produce working software. The goal is to understand. It is only incidental if you succeed with a "deliverable".

DMTCP (Familiarize yourself now with the code.) Note especially the subdirectory dmtcp/doc with descriptions of many parts of the DMTCP internals. Use tools such as gdb for a deeper understanding. Read the QUICK-START file for more tips about DMTCP and its debugging tools. Then write an overview of the implementation of DMTCP. This is a paper-only project. If you take this on, it will require detailed descriptions of the functionality of the components of DMTCP. Below are some alternative mini-projects concerned more closely with producing code or pseudo-code.
1. Use the module facility of DMTCP to build a new module. An example module might be wrappers for the functions malloc and free. The wrappers should allocate additional "guard regions" around the memory buffer. They should catch bugs like user code that writes beyond the end of allocated memory, or user code that frees a buffer twice. For interested students, there is the possibility of building on this mini-project, to provide a novel, advanced memory leak detector in the FReD reversible debugger.
2. Write a new DMTCP wrapper function for a new system call. (One suggestion is for man epoll.) There are examples of wrapper functions in trunk/dmtcp/src/pidwrappers.cpp and other files with names *wrappers.cpp. If you choose an advanced system call such as epoll, it is acceptable to work jointly with another student on a single mini-project.
3. DMTCP currently has a bug in checkpointing emacs23 (version 23 of emacs). Investigate the cause of this bug. The primary responsibility is the diagnosis of the bug. You are not required to produce a bug fix. (The bug appears to occur in the context of screen. If you are interested in this project, tell me, and I will help you reproduce this bug.)
4. Provide a paper design for a new MTCP module. This type of module is unrelated to the DMTCP modules described above. Currently, DMTCP has an option for using "gzip" to dynamically compress files on the fly. It is mostly in MTCP. This is too restrictive. The MTCP subdirectory should support arbitrary user-defined modules that are called by the MTCP checkpoint or restart routines. The modules may save a checkpoint image locally or on a remote machine; using gzip or a newer fast compression routine such as Snappy, LZO, FastLZ, QuickLZ, or other. You provide a paper design for the framework, and third parties build whatever module they want. As part of your paper design, consider options for third-party module writers to write to RAM and then fork a child process that saves on disk -- or a third-party module that mmaps RAM to disk and lets the operating system worry about the best optimization. You don't yet write code in this mini-project, but you should refer to existing code in the MTCP subdirectory.
FReD (Familiarize yourself now with the code.) This primarily involves reading Python code, the C++ record-replay module for DMTCP, and a rough "black box" understanding of DMTCP. There is a limited introduction in this paper from the PLOS-11 workshop. In particular, read about the primitive reverse-xxx algorithms, and then study the code to see how reverse-watch works. Document your findings in a report.
Dthreads is a novel idea for determinism: replace a multi-threaded process by multiple processes with shared memory. There is also a full paper on Dthreads. This provides for efficient deterministic multithreading. A well-known problem with reversible debuggers is that if you go back in time, and then execute forwards, do you arrive at the same place, or did the operating system produce a different thread schedule that changes the behavior? By adding determinism, Dthreads provides a "simple" idea for allowing FReD to easily implement determinism. Is such a combined implementation possible? How would it work? This project is purely a paper design.
Linux LGuest is a virtual machine written in just 5000 lines of well-documented code! You can read and understand every line of the code. Lguest does not provide for snapshots. But using DMTCP, we could checkpoint the Lguest "process". This provides a fast checkpoint of an entire virtual operating system. Can this idea work? Try it. If you try it, probably some things go wrong. What goes wrong? Propose on paper some approaches to overcome these difficulties. There are opportunities to continue this work into the term project.
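For the malloc/free wrapper mini-project (item 1 under DMTCP above), the guard-region idea can be modeled without any real allocator. The sketch below is a toy simulation in Python, not a DMTCP module. Each buffer is stored with a fixed byte pattern before and after the payload, and free checks both the pattern and whether the handle was already freed. The class name GuardedHeap, the pattern, and its width are all assumptions made for illustration.

```python
GUARD = b"\xAB" * 8  # hypothetical guard pattern and width


class GuardedHeap:
    """Toy allocator: each buffer is stored as GUARD + payload + GUARD."""

    def __init__(self):
        self.blocks = {}    # handle -> bytearray including both guards
        self.freed = set()  # handles that have already been freed
        self.next_handle = 1

    def malloc(self, size):
        handle = self.next_handle
        self.next_handle += 1
        self.blocks[handle] = bytearray(GUARD + bytes(size) + GUARD)
        return handle

    def write(self, handle, offset, data):
        """Unchecked write relative to the payload start, like a raw C store.

        Writes may spill into a guard region; that is the bug being modeled.
        Writes outside the whole block are rejected to keep the toy simple.
        """
        block = self.blocks[handle]
        start = len(GUARD) + offset
        if start < 0 or start + len(data) > len(block):
            raise IndexError("write lands outside the simulated block")
        block[start:start + len(data)] = data

    def free(self, handle):
        if handle in self.freed:
            raise RuntimeError("double free detected")
        block = self.blocks.pop(handle)
        self.freed.add(handle)
        if block[:len(GUARD)] != GUARD or block[-len(GUARD):] != GUARD:
            raise RuntimeError("guard region overwritten")
```

A real wrapper written in C would allocate size plus two guard widths from the underlying malloc, return a pointer past the first guard, and run the same checks inside its free wrapper.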

Main Projects:

The main projects are listed below. As always, you are also welcome to bring your own research project. The project may be a tool useful for High Performance Computing or a large computation itself.

We will also set up a course Wiki, where you will describe the status of your projects. The Wiki will also have a space for general issues/comments in supporting Roomy and DMTCP.

DMTCP (list of projects still being revised)

Using suspend-to-disk mode to enable checkpointing of graphics programs (including OpenGL (3D graphics))
MTT (MPI Testing Tool) for automated testing for Open MPI and DMTCP
Checkpointing Qemu, first step towards checkpointing malware, and then running the malware reversibly
Another option that may or may not work is the Linux lguest simple hypervisor
Fast process migration (e.g. for servers):
Step 1: Add an MTCP module capability
Step 2: Write an MTCP checkpoint module that checkpoints to remote RAM
Step 3: Restart on the remote machine
Step 4: Tune it for speed
DMTCP attach using new ptrace capability.
Checkpoint the job control/suspend feature of your favorite shell (^Z)
Have MTCP use a standard ELF linker script.
Hijack/Attach to already running process and checkpoint (There is a question here about how to follow socket connections, if that process is already talking to other processes.)
Thread race condition detector: A traditional race condition eventually causes a crash. But since it's a race condition, it doesn't always crash at that location. Experiment with different checkpoints, until you find a checkpoint location for which the process always crashes upon restart. Then modify MTCP to only allow a subset of the threads to resume, and keep the other threads suspended. By trial and error, discover which two threads have a race condition.
Portable Linux Apps: DMTCP checkpoint images include any libraries that have been loaded. If the environment variable LD_BIND_NOW is set (to any non-empty string), then the dynamic linker will resolve all symbols at program startup instead of lazily, so that every library the program needs is already loaded and bound. This should enable one to copy a checkpoint image from Debian Linux to OpenSuse Linux to RedHat Linux to Ubuntu Linux to (etc.). Does this work? If not, what's needed to make it work?
Incremental Checkpoint: DMTCP may want to keep multiple checkpoint images, so that it can return to any of several execution points in the past. This would normally require a lot of disk space. How does one efficiently store a diff between checkpoints? (This may be a somewhat easier project, for those who are looking for that. With other projects, one will often find that at the end of the semester, one has to report that some parts are still not working, and why. This project offers the opportunity of finishing most of the project, if there are no surprises.)
Checkpoint valgrind or other binary rewriters. Examples of programs using dynamic binary translation include: Valgrind, Pin, and Paradyn/Dyninst. Currently, it appears that DMTCP cannot checkpoint these packages. Why? Can it be fixed?
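For the Incremental Checkpoint project above, one simple starting point is a page-granularity diff: store the full first image, then only the pages that changed. The sketch below treats a checkpoint image as a flat bytes object of fixed length, which is a strong simplification of a real DMTCP image. The page size and the names page_diff and apply_diff are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def page_diff(old: bytes, new: bytes, page_size: int = PAGE_SIZE) -> dict:
    """Return {page_index: page_bytes} for the pages of `new` that differ from `old`."""
    if len(old) != len(new):
        raise ValueError("this sketch assumes images of equal length")
    changed = {}
    for offset in range(0, len(new), page_size):
        page = new[offset:offset + page_size]
        if page != old[offset:offset + page_size]:
            changed[offset // page_size] = page
    return changed


def apply_diff(old: bytes, changed: dict, page_size: int = PAGE_SIZE) -> bytes:
    """Rebuild the newer image from the older image plus the changed pages."""
    image = bytearray(old)
    for index, page in changed.items():
        start = index * page_size
        image[start:start + len(page)] = page
    return bytes(image)
```

If one byte changes in a three-page image, page_diff stores one page rather than three, and apply_diff reproduces the newer image exactly. A real design would also have to handle memory regions that grow, shrink, or appear between checkpoints.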

FReD (list of projects still being revised)

Reversibly debugging Cilk programs
Reversibly debugging OpenMP programs
Extend the FReD reversible debugger to Java programs via 'jdb'
Haskell and reversible debugging (Due to the functional lazy design of the language, Haskell has to worry about side effects in debugging)
Combine FReD with DThreads (highly speculative, but very high impact)
Memory leak detector: If a memory leak occurs later in the program, valgrind runs too slowly to easily find it. So, use a malloc debugger or your own malloc/free interceptor. A good one is the DUMA library (libduma) (Detect Unintended Memory Access), which is a newer replacement for the classic Electric Fence (libefence). This defines regions created through malloc. Late in the program, it will be easy to find a region of memory that is a memory leak (that no one ever touches again).

Alternatively, using checkpoint/restart tricks, find the last time that anyone touched that memory segment. Report that line of code using a standard tool to convert between a line of assembly language and the source code line. To guarantee that no one ever uses that memory again, remove read-write protection from that region of memory and add a segfault handler to trap any accesses. Then automate the many checkpoint-restart cycles to find where the memory segment was last touched.

Project Software:

DMTCP

If you have questions about DMTCP, please send e-mail to Kapil Arya and me. The username of Kapil Arya is his first name (all lower case) and: @ccs.neu.edu

DMTCP is available through the sourceforge web page. The easiest way to start is (in Linux) to type:

  svn co https://dmtcp.svn.sourceforge.net/svnroot/dmtcp/trunk dmtcp
  cd dmtcp
  ./configure
  [ OR:   ./configure --enable-debug ]
  make
  make check  [OPTIONAL]

Then read the QUICK-START file in the top-level dmtcp directory. From there start browsing the source code.

FReD

The FReD software will be made available soon.

Debugging and Other Systems Tricks:

See: debugging and other system tricks (separate web page)