Description of "Accelerated Computer Systems" (CS 4973; Spring, 2026)
Prof. Gene Cooperman
NOTE:
The registrar lists this course as "Advanced Computer Systems". A more
accurate name would be "Accelerated Computer Systems".
This course begins exactly where CS 3650 ("Computer Systems") begins.
It does not assume advanced background.
The course can be accelerated because it concentrates on a few
fundamental concepts of Computer Systems.
For those who have already taken CS 3650, I can also consider a small
number of students for a directed study course
(CS 4992) with me, in which I will provide additional work for those
portions of the course that duplicate CS 3650.
INSTRUCTOR EXPERIENCE:
I have taught Computer Systems at the undergraduate, M.S.,
and PhD levels for many years. This course is a synthesis of that experience.
This synthesis allows me to accelerate the learning, while being very respectful of students' time commitments.
For questions on this course, please write to me:
g.cooperman@northeastern.edu
By concentrating
on fundamental concepts, instead of large amounts of code, the student
will quickly internalize how to think about Computer Systems.
The last third of this course will then use that newfound sophistication
to take a tour of the many other computer systems courses for which
this course serves as a gateway.
The difference between CS 4973 and CS 3650 is that this course will place
a greater emphasis on fundamental concepts throughout computer systems,
instead of syntax and large projects.
By emphasizing fundamental concepts,
the course will be able to accelerate through the material of
a traditional Computer Systems course, and then offer a gateway
to the rich set of
other systems courses offered in the Khoury College.
My own definition of Computer Systems is
as follows:
Computer Systems is the subject that shows how to create software
functionality that cannot be written solely within the confines of a
single programming language.
To make this definition of Computer Systems concrete, and
also to explain how a course can painlessly accelerate the
contents of a traditional course, please read the
description of the first two homeworks in the course (covered during
the first three weeks). For a detailed description, see
"First two homeworks" (below).
See below for more details of this course.
- Rough syllabus for Accelerated Computer Systems
- Accelerated Computer Systems as a Gateway to Other Systems Courses
- First two homeworks: New Capabilities Using Computer Systems Concepts
A. Rough syllabus for Accelerated Computer Systems
- Central Concept of Operating Systems:
- The UNIX process table is an array of struct, where each struct
has attributes for that process. The open files of that process
are "file descriptors", pointing to an entry in a
global table of all open files.
- The global table of open files is an array of struct, where each
struct has attributes for that file. The struct can point to
the terminal, a device, or an entry in the inode table on disk.
- The inode table on disk is an array of struct, where each
struct has attributes for that file on disk (permissions,
file owner, file size, etc.). The superblock on disk describes
the file system as a whole.
- Homework 1 and homework 2 (with guidance in class):
Create two programs, checkpoint and restart,
with the ability to save and restore a running process.
(See: "First two homeworks")
- C pointers are almost the same as array names:
programmer's model
- fork/exec/waitpid: Writing your own shell for the terminal
in 100 lines or less.
- Assembly language: A low-level programming language with
32 variables (called registers), array indexing (called
base addressing mode), and a weird implementation of the
stack of pending functions. Learning is made easier by
an easy-to-use simulator for
RISC-V assembly language
that graphically exposes concepts, through single-step, back-step
(execute in reverse), breakpoints, register values, etc.
- A reading knowledge of the C code for an early UNIX (Linux) operating system.
This unit is just one week, leveraging the concepts learned
earlier.
- CPU cache and virtual memory: A CPU cache is just a table
of key-value pairs, with each key being a virtual address
for a variable,
generated by the assembly language, and the value being the
actual address of that variable in RAM.
Virtual memory is almost the
same thing as a "direct-mapped" CPU cache.
(And the corresponding homework will show you how to use an LLM
to modify the C compiler to generate an address trace of all
addresses used by a program. The address trace is then used
to write a simple simulator for the CPU cache.)
- Multithreaded programming: Threads are "subprocesses" that
have access to the entire memory of the parent process.
For threads to cooperate with each other, they need to
synchronize with each other: mutex (locks), semaphore
(multiple threads doing the same thing), condition variables
(simple multithreaded construct that is mostly
implemented as a user-defined program invariant).
- Model Checking: In Freshman programming, you give a program
an input, and the program deterministically executes exactly the
same steps every time. So, diagnosing bugs is easy.
In Multithreaded programming, each thread (subprocess) executes
in a separate schedule. So, maybe a bug appears only once in
a hundred executions (due to an unfortunate schedule). How can
you discover and fix such rare bugs? (ANSWER:
Model checking)
This unit will use a simple, easy-to-use model checker that
allows you to quickly diagnose very subtle multithreaded bugs.
- Gateway to other systems courses
in Khoury College.
(See: Accelerated Computer Systems as a
Gateway to Other Systems Courses)
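
As an illustration of the cache homework mentioned in the syllabus above, here is a minimal sketch in C of a direct-mapped cache simulator that replays an address trace and counts hits. It is not the course's solution: the name count_hits, the 4-slot cache, and the 16-byte line size are assumptions chosen only to keep the example small.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  4    /* assumed number of slots in the cache */
#define LINE_BYTES 16   /* assumed number of bytes per cache line */

struct cache_line {
    int valid;       /* 0 until the slot is first filled */
    uint64_t tag;    /* identifies which memory block occupies the slot */
};

/* Hypothetical helper: replay `n` addresses through an initially empty
 * direct-mapped cache and return the number of hits. */
size_t count_hits(const uint64_t *trace, size_t n) {
    struct cache_line cache[NUM_LINES] = {{0, 0}};
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t block = trace[i] / LINE_BYTES;       /* memory block number */
        size_t index = (size_t)(block % NUM_LINES);   /* the one slot it may use */
        uint64_t tag = block / NUM_LINES;             /* distinguishes blocks sharing a slot */

        if (cache[index].valid && cache[index].tag == tag) {
            hits++;
        } else {
            /* Miss: load the block, evicting whatever was in the slot. */
            cache[index].valid = 1;
            cache[index].tag = tag;
        }
    }
    return hits;
}
```

For the trace 0x00, 0x04, 0x40, 0x00, 0x10, 0x14, this sketch counts two hits (0x04 and 0x14). Addresses 0x00 and 0x40 compete for the same slot, so the second access to 0x00 is a miss. That kind of conflict is specific to a direct-mapped design.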
B. Accelerated Computer Systems as a Gateway to Other Systems Courses
The following courses all leverage topics in computer systems.
- The cybersecurity courses leverage information on process memory
layout, including stacks and call frames, dynamic libraries, LD_PRELOAD,
assembly language, and other topics in this course.
- The networking courses leverage a knowledge of the operating
system implementation and how it can be extended for network devices.
- The Distributed Systems, Cloud Computing, Parallel Data Processing
and High Performance Computing
courses can all be viewed as ways to take multithreaded programming
on a single node, and generalize it to distributed computers.
For example, multithreaded programs use locks (mutexes) and barriers
located in the shared memory of a single computer. How do you create
locks or barriers for all processes across many computers?
- The Database and Data Mining courses leverage a knowledge of
the file system on disk or SSD, along with generalizations of multithreaded
programming to distributed hosts.
- And of course, there is an obvious connection with the courses
on Operating Systems Implementation (CS 6640), and M.S.-level or PhD-level
Computer Systems (CS 5600 and CS 7600).
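
The single-node starting point in the list above, threads coordinating through a lock in shared memory, can be sketched in C with POSIX threads. This is only an illustration: count_with_threads, worker, and the 16-thread limit are hypothetical names and values, not taken from any of the courses listed here.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16   /* assumed upper bound, to keep the sketch simple */

/* State shared by all threads: one counter protected by one mutex. */
struct shared_counter {
    pthread_mutex_t lock;
    long value;
    long increments_per_thread;
};

/* Each thread adds 1 to the shared counter, holding the lock each time. */
static void *worker(void *arg) {
    struct shared_counter *c = arg;
    for (long i = 0; i < c->increments_per_thread; i++) {
        pthread_mutex_lock(&c->lock);
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Hypothetical helper: run `nthreads` workers and return the final
 * counter value, or -1 on a bad argument or a thread-creation error. */
long count_with_threads(int nthreads, long increments_per_thread) {
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    struct shared_counter c = {
        .value = 0,
        .increments_per_thread = increments_per_thread
    };
    if (pthread_mutex_init(&c.lock, NULL) != 0)
        return -1;

    pthread_t tids[MAX_THREADS];
    int started = 0;
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, worker, &c) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);   /* wait for every thread that started */

    pthread_mutex_destroy(&c.lock);
    return started == nthreads ? c.value : -1;
}
```

The mutex works here only because every thread can reach the same struct in memory. Processes on different computers have no such shared struct, which is the gap that the question above points at.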
Courses using Computer Systems:
- CY 2550 – Foundations of Cybersecurity
- CS 4700 – Network Fundamentals
- CS 4730 – Distributed Systems
- CY 5010 – Cybersecurity Principles and Practices
- CY 5065 – Cloud Security Practices
- CY 5130 – Computer System Security
- CY 5150 – Network Security Practices
- EECE 5640 – High-Performance Computing (in "Electrical and Computer Engineering")
- CS 5200 – Database Management Systems
- CS 5600 – Computer Systems
- CS 5700 – Fundamentals of Computer Networking
- CS 6240 – Large-Scale Parallel Data Processing
- CS 6640 – Operating Systems Implementation
- CS 6650 – Building Scalable Distributed Systems
- CY 6740 – Network Security
- CS 7600 – Intensive Computer Systems
- CS 7610 – Foundations of Distributed Systems
C. First two homeworks: New Capabilities Using Computer Systems Concepts:
The first two homeworks will
demonstrate (with lots of hints) how a systems approach can implement
software that would be impossible in a traditional programming language.
Consider a checkpointing
program. A checkpointing program can save a process to disk,
and later resume the process. For example:
% ./count-numbers
1 2 3 ...
% checkpoint ./count-numbers
1 2 3 4[program interrupted]
% restart ./checkpoint-file
5 6 7 ...
The first homework will be to write the program, 'checkpoint'.
The second homework will be to write the program 'restart'.
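
The design of the homework itself is not spelled out in this description. As an assumption about one plausible ingredient on Linux: before a process can be saved, something has to find out which memory regions it owns, and the kernel lists them as text in /proc/<pid>/maps. The sketch below only parses that list. The names region, parse_maps_line, and count_writable_regions are hypothetical, and nothing here saves or restores a process.

```c
#include <stdio.h>

/* One memory region of a process, as listed in /proc/<pid>/maps. */
struct region {
    unsigned long start;   /* first address of the region */
    unsigned long end;     /* one past the last address of the region */
    char perms[5];         /* for example "rw-p": read, write, execute, private */
};

/* Hypothetical helper: parse one line such as
 *   "00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/cat"
 * Returns 1 on success, 0 if the line does not have that shape. */
int parse_maps_line(const char *line, struct region *r) {
    return sscanf(line, "%lx-%lx %4s", &r->start, &r->end, r->perms) == 3;
}

/* Hypothetical helper: count the writable regions of the calling process.
 * Returns -1 if /proc/self/maps cannot be opened (for example, not Linux). */
int count_writable_regions(void) {
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[512];
    struct region r;
    int count = 0;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_maps_line(line, &r) && r.perms[1] == 'w')
            count++;
    }
    fclose(f);
    return count;
}
```

Writable regions are singled out only as an example of filtering on permissions. Which regions a real checkpoint must save is a design decision that this sketch does not settle.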
The normal deadline will be one or two weeks per assignment.
But for these two homeworks, if you have trouble, you can put aside
your work for two months, and then submit it as a "late homework"
with a penalty of 5 points out of 100.
I do this because the course is about _concepts_, not
syntax. Concepts take longer to internalize, but once internalized,
they stay with you for a lifetime. I will have generous office hours to
help you through hw1 and hw2. But students who have not yet
internalized the associated systems concepts will benefit
from doing the other (much smaller) homeworks, which again review
the concepts. After that, students are typically surprised by
how much they have internalized, and they can then _easily_ do hw1 and hw2.