The Run-Time System
===================

After a program is compiled, either to JVM bytecode or to machine code, it needs to be executed. We saw in the last class how a program compiled into JVM bytecode runs. The essential components of any run-time system are the stack (including registers, local variables, arguments, and return values), the method area (where the code resides), and the heap (where objects are dynamically allocated).

Since the sizes of both the stack and the heap change as the program runs, we do not allocate a fixed amount of memory to either of them. Instead, the stack and the heap occupy the two ends of a virtual memory "space", with free memory in between. Note that this memory is allocated by the operating system. (Talk about the notion of virtual memory.)

Why are objects allocated on the heap? Because we do not know at compile time the size of the objects we will be allocating. How do you think the allocation of the "class vector" works?

Memory corruption errors are among the most common software bugs. There are different kinds of memory corruption errors, the two most common being references to non-allocated memory locations and memory leaks. Memory corruption errors occur primarily when programmers do their own memory management. Buffer overrun is one such example. An overrun can happen either on the stack or in the heap. Stack overruns may overwrite the contents of the stack (saved program counter, local variables, arguments, return values, etc.) and may cause a variety of weird behaviors. Heap overruns overwrite other allocated objects that may be referenced later, which could cause major problems. The recent bug discovered in Microsoft MDAC is an example of a heap-overrun bug.

Automatic Memory Management
===========================

Scheme and Java are two programming languages that do not let the programmer manage heap memory directly. Instead, the heap is managed by built-in allocation and deallocation subroutines that are called automatically.
For instance, when you create a new object that is local to a procedure, memory for the object is allocated on the heap at the instant the object is created, and it can be reclaimed after the procedure returns, once nothing refers to the object any longer.

Memory management is a complex task since memory is not always allocated and deallocated in convenient chunks. So at any time the virtual memory space can contain a mix of used space and unused space, and the unused space can be reused for allocation. The process of reclaiming space that is no longer in use is referred to as *garbage collection*. Garbage collection is an important component of the run-time system; it runs in the background (or periodically) to reclaim unused space. To do so, it needs to identify "dead space".

There are two kinds of garbage collectors: (a) reference counting and (b) tracing.

Reference counting keeps a count of the number of references to each object. Whenever a reference to an object is added, the count for the object is incremented. Whenever a reference to an object goes out of scope or is reassigned, the count is decremented. An object whose count drops to zero is dead and can be collected. When an object is garbage collected, the counts of all objects it references are decremented.

Tracing collectors traverse the "pointers" starting from the root set (global variables, stack, etc.). All objects reachable from the root set form the live space. Unreachable space can be reclaimed. This is another example of a graph traversal algorithm (like depth-first search and breadth-first search, which we saw earlier in the class in connection with traversing the web graph). Tracing collectors use a mark-and-sweep strategy, and then compact/copy the live data so that it is stored at one end of the heap.
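
To make the tracing strategy concrete, here is a minimal sketch in Python of the mark, sweep, and compact steps described above. It is not how the JVM or any real collector is implemented. The heap model is an assumption made purely for illustration: the heap is a list, each object is a dict with a hypothetical `name` and a list `refs` of heap indices it points to, and the root set is a list of heap indices. The function name `collect` is likewise made up for this example.

```python
def collect(heap, roots):
    """Toy mark-sweep-compact collector over a list-based heap model.

    heap  : list of objects, each {"name": str, "refs": [heap indices]}
    roots : list of heap indices forming the root set
    Returns the compacted heap and the updated root indices.
    """
    # Mark: depth-first traversal of the pointer graph from the root set.
    marked = set()
    pending = list(roots)
    while pending:
        index = pending.pop()
        if index in marked:
            continue  # already visited
        marked.add(index)
        pending.extend(heap[index]["refs"])

    # Sweep: anything left unmarked is dead space; keep only live indices.
    live = sorted(marked)

    # Compact: slide live objects to one end of the heap and rewrite
    # every pointer so that it refers to the object's new position.
    new_index = {old: new for new, old in enumerate(live)}
    new_heap = [
        {"name": heap[old]["name"],
         "refs": [new_index[r] for r in heap[old]["refs"]]}
        for old in live
    ]
    new_roots = [new_index[r] for r in roots]
    return new_heap, new_roots


# Example: "e" is the only root; "c" and "d" are unreachable from it.
heap = [
    {"name": "a", "refs": [1]},  # index 0
    {"name": "b", "refs": []},   # index 1
    {"name": "c", "refs": [3]},  # index 2 (dead)
    {"name": "d", "refs": []},   # index 3 (dead)
    {"name": "e", "refs": [0]},  # index 4 (root)
]
new_heap, new_roots = collect(heap, [4])
print([obj["name"] for obj in new_heap])  # ['a', 'b', 'e']
print(new_roots)                          # [2]
```

After the call, the three live objects sit contiguously at the start of the list, and the pointer from "e" to "a" as well as the root index have been rewritten to match the new positions. Building a fresh list is closer in spirit to the "copy" variant; a collector that compacts in place would move objects within the same memory instead.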