Wednesday, October 5th, 2010, 3rd hour of lecture (8-9PM)
Scribed by Adam Sheehan (ats@ccs.neu.edu)


Benefits of Page Tables
=======================

What are some of the benefits of using page tables over contiguous
memory allocation with base and limit pairs? With a page table, the
operating system can track state for each individual page, because
every page table entry can carry attribute bits alongside the frame
number. Pages can be marked with various attributes such as:

* Read-only bit

Marking a page of memory as read-only can be used as a form of
protection, but it also enables multiple processes to share frames
that have common data (e.g. multiple processes running the same
instructions can share that frame in memory if it is read-only).

* Shared memory

Two or more processes can share memory as a form of inter-process
communication by having more than one page table refer to the same
frame.

* No execute bit

Marking a page as no-execute prevents the CPU from executing any
instructions found within that page. This is useful for protecting
against certain attacks that inject malicious code into one part of
memory and attempt to move the instruction pointer to that new
location (e.g. stack-smashing).

* Valid bit

Indicating which pages contain valid data allows the MMU to check
whether a memory access should be allowed or not.

* Copy-on-write (COW) bit

The copy-on-write bit allows two page tables to share a frame in
memory until one process attempts to write to it, at which point a
copy of the frame is made for that process. For example, the Unix
fork() call creates a new process that is identical to the parent.
Copying every frame of the parent's memory for the child would require
a significant amount of time and memory. Instead, just the page table
is copied, and a frame is copied only when one of the two processes
first writes to it.
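
The attribute bits and the copy-on-write behavior described above can
be sketched in a few lines of Python. This is only an illustrative
model, not how any particular kernel does it: the flag values, the
Fault exception, and the fork/store helper names are all assumptions
made for the example, and a real system would also track how many
page tables still refer to each frame.

```python
# Hypothetical flag bits; real MMUs define their own layout.
VALID, READ_ONLY, COW = 1, 2, 4


class Fault(Exception):
    """Raised when an access violates a page's attribute bits."""


def fork(parent_table):
    """Copy only the page table; mark writable pages copy-on-write in both copies."""
    for entry in parent_table.values():
        if not entry["flags"] & READ_ONLY:
            entry["flags"] |= COW
    return {vpn: dict(entry) for vpn, entry in parent_table.items()}


def store(table, frames, vpn, offset, value):
    """Write one byte, enforcing VALID/READ_ONLY and copying the frame on a COW write."""
    entry = table.get(vpn)
    if entry is None or not entry["flags"] & VALID:
        raise Fault("invalid page")
    if entry["flags"] & READ_ONLY:
        raise Fault("write to read-only page")
    if entry["flags"] & COW:
        frames.append(bytearray(frames[entry["frame"]]))  # private copy of the frame
        entry["frame"] = len(frames) - 1
        entry["flags"] &= ~COW
    frames[entry["frame"]][offset] = value


frames = [bytearray(b"code"), bytearray(b"data")]
parent = {
    0: {"frame": 0, "flags": VALID | READ_ONLY},  # shared read-only, never copied
    1: {"frame": 1, "flags": VALID},
}
child = fork(parent)
store(child, frames, 1, 0, ord("D"))  # the child's first write triggers the copy
```

After the write, the child's page 1 points at a new frame holding
"Data", while the parent still sees "data" in the original frame.
Page 0 remains shared by both page tables.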
  

Page Table Implementation
=========================

The page table itself exists as a large region of contiguous
memory. How much memory does the page table occupy?

For 32-bit hardware (or 64-bit hardware using 32-bit logical
addresses), there is a maximum of 4GB (2^32) addressable bytes in
memory. With a page size of 4KB (2^12 bytes), we can calculate the
number of entries in the page table as:

2^32 (4GB) total bytes
------------------------- = 2^20 (~1M) pages
2^12 (4KB) bytes per page


If a page table entry is at least 4 bytes, then the total page table
size is at least 4 bytes * 2^20 entries = 4MB. It would be best to
divide the page table into chunks so that we don't have to allocate
4MB of contiguous memory for every process's page table.
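
The same arithmetic, written out as a quick check (the constant names
are just labels chosen for this sketch):

```python
ADDRESS_BITS = 32
PAGE_SIZE = 4 * 1024   # 4KB pages
ENTRY_SIZE = 4         # bytes per page table entry

num_pages = 2**ADDRESS_BITS // PAGE_SIZE
table_bytes = num_pages * ENTRY_SIZE

print(num_pages)                   # 1048576 (2^20)
print(table_bytes // 2**20, "MB")  # 4 MB
```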

There are at least three ways to break up the page table. 

1. Hierarchical paging 
======================

The first method is to repeat the paging process and create another
page table (i.e. page the page table). This enables us to allocate
smaller chunks of the page table as it is needed rather than having to
allocate all of the memory up front.

How can this be accomplished?

For a single-level page table, a logical address can be broken down
into two pieces, the virtual page number and the offset:

  +---------+--------+
  |   P1    | offset |
  +---------+--------+

Adding another page table would require two page numbers to be stored
for each logical address:

  +----+----+--------+
  | P1 | P2 | offset |
  +----+----+--------+

The figure above shows the format of a logical address for a 2-level
page table, where P1 is an index into the outer page table (selecting
the page of memory that holds an inner page table), P2 is an index
into that inner page table (selecting the frame containing the desired
memory location), and the remaining bits define the offset into that
frame.

The size of the inner page table is dictated by the size of a page of
memory, since each inner page table occupies exactly one page. The
number of bits needed for P2 in a logical address can be calculated
from the size of the page and the size of a page table entry:

entries per inner table = (size of page) / (size of entry)
bits in P2              = log2(entries per inner table)

To resolve a logical address in a 2-level page table:

1. Use P1 to find the entry in the outer page table that contains the 
desired inner page table.
2. Use P2 to find the desired frame of memory from the inner page table.
3. Use the offset to find the specific address within the frame.

Although this process saves space, it comes at the expense of an
additional memory lookup. A 2-level page table requires a total of
three memory lookups for each logical address: one for the outer page
table, one for the inner page table, and one for the data itself.
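
The three resolution steps can be sketched as follows. This is a
simplified model rather than real MMU behavior: the page tables are
plain Python dicts, the 10/10/12 bit split is assumed purely for
illustration, and the names split and translate are made up for the
example.

```python
P1_BITS, P2_BITS, OFFSET_BITS = 10, 10, 12  # assumed split, for illustration


def split(addr):
    """Break a logical address into (p1, p2, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    p2 = (addr >> OFFSET_BITS) & ((1 << P2_BITS) - 1)
    p1 = addr >> (OFFSET_BITS + P2_BITS)
    return p1, p2, offset


def translate(outer_table, addr):
    """Walk a 2-level page table and return the physical address."""
    p1, p2, offset = split(addr)
    inner_table = outer_table[p1]  # memory lookup 1: outer page table entry
    frame = inner_table[p2]        # memory lookup 2: inner page table entry
    # Memory lookup 3 happens when the returned address is actually accessed.
    return (frame << OFFSET_BITS) | offset


outer_table = {1: {2: 7}}  # only one inner table has been allocated
addr = (1 << 22) | (2 << 12) | 0x34
print(hex(translate(outer_table, addr)))  # 0x7034
```

Note that the outer table here holds a single inner table. Inner
tables for unused regions of the address space never need to exist,
which is where the space savings come from.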

Example: Using a 2-level hierarchical page table with 32-bit logical addresses
------------------------------------------------------------------------------

How can we determine the format of a logical address for 32-bit hardware?

Assume that pages are 4KB (2^12 bytes). This requires the offset of
the logical address to be 12 bits long to access all of the addresses
within a single page.

Format of a 32-bit logical address
+----+----+--------+
| ?? | ?? |   12   | = 32 bits
+----+----+--------+
  P1   P2   offset

If we assume that a page table entry is 4 bytes, we can calculate the
number of bits required for the inner page table index:

     (size of page)    2^12 bytes (4KB)
P2 = --------------- = ---------------- = 2^10 entries => 10 bits required
     (size of entry)    2^2 bytes (4B)

This leaves us with 10 bits left over for P1:

Format of a 32-bit logical address
+----+----+--------+
| 10 | 10 |   12   | = 32 bits
+----+----+--------+
  P1   P2   offset
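
The calculation above generalizes to a small helper. The function name
address_format is hypothetical, and the sketch assumes that the page
size and entry size are both powers of two:

```python
def address_format(address_bits, page_size, entry_size):
    """Return (p1_bits, p2_bits, offset_bits) for a 2-level page table.

    Assumes page_size and entry_size are powers of two.
    """
    offset_bits = page_size.bit_length() - 1              # log2(page size)
    p2_bits = (page_size // entry_size).bit_length() - 1  # log2(entries per page)
    p1_bits = address_bits - p2_bits - offset_bits        # whatever is left over
    return p1_bits, p2_bits, offset_bits


print(address_format(32, 4096, 4))  # (10, 10, 12)
```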


Example: Using hierarchical paging with 64-bit logical addresses
----------------------------------------------------------------

Assume again that pages are 4KB (2^12 bytes), requiring 12 bits for
the offset. If we assume that a page table entry is 4 bytes, using the
same procedure in the last exercise, we end up with the following
logical address format:

Format of a 64-bit logical address
+----+----+--------+
| 42 | 10 |   12   | = 64 bits
+----+----+--------+
  P1   P2   offset

The outer page table then requires 2^42 entries (2^44 bytes, or 16TB,
at 4 bytes per entry), which is far too large. To reduce the size, we
can add another layer of indirection (i.e. another page table):

Format of a 64-bit logical address
+----+----+----+--------+
| 32 | 10 | 10 |   12   | = 64 bits
+----+----+----+--------+
  P1   P2   P3   offset

Each new layer of indirection reduces the space required for the
outermost page table, but it also adds another memory lookup. For
64-bit addresses, 7 layers of indirection are needed before a
reasonably sized outer page table can be created. This means that
there will be 7 page table lookups for each logical address, plus the
access to the data itself, which is very slow.
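
To put numbers on the two layouts shown above, the size of the
outermost table follows directly from the width of P1. The helper name
is made up for this sketch, and the 4-byte entry size is the same
assumption used in the examples:

```python
ENTRY_SIZE = 4  # bytes per page table entry, as assumed in the examples


def outer_table_bytes(p1_bits):
    """Size in bytes of an outermost page table indexed by p1_bits bits."""
    return 2**p1_bits * ENTRY_SIZE


print(outer_table_bytes(42) // 2**40, "TB")  # 16 TB for the 42/10/12 layout
print(outer_table_bytes(32) // 2**30, "GB")  # 16 GB for the 32/10/10/12 layout
```

Even with the third level, the outermost table is still far larger
than a page, which is why further levels are needed.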


2. Hashed paging
================

An alternative to using hierarchical paging is to use a hashed page
table.

Using this method, 

1. The virtual page number (P1) is hashed to yield an index into the hash table.
2. Since there might be multiple page table indexes that hash to the same
value, the hashed page table provides a linked list for each hashed entry.
3. The linked list is searched (should only contain a few values at most) 
until an entry that matches the original P1 value is found.
4. The frame number is retrieved from the linked list entry and combined 
with the offset to find the physical address.

  +---------+--------+
  |   P1    | offset |
  +---------+--------+
       |
       v
    HASH(P1)    +------+
       |        |hashed|    linked_list
       +------> |page  |--> of virtual -> physical
                |table |    mappings
                +------+

The benefit of this approach is that in a hash table, the average
lookup time should be independent of the number of entries in the
table, which works well for address spaces that are larger than 32
bits, where there could be many entries in the page table.
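
A minimal sketch of the lookup steps above, under some simplifying
assumptions: Python lists stand in for the linked lists, the hash
function is just the page number modulo the number of buckets, and the
class and method names are invented for the example.

```python
OFFSET_BITS = 12  # 4KB pages


class HashedPageTable:
    def __init__(self, num_buckets=16):
        # Each bucket is a chain of (virtual page number, frame number) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, vpn):
        """Return the chain that this virtual page number hashes to."""
        return self.buckets[vpn % len(self.buckets)]

    def map(self, vpn, frame):
        """Record a new virtual -> physical mapping (no remapping support)."""
        self._chain(vpn).append((vpn, frame))

    def translate(self, addr):
        """Return the physical address, or raise KeyError if no mapping exists."""
        vpn = addr >> OFFSET_BITS
        offset = addr & ((1 << OFFSET_BITS) - 1)
        for entry_vpn, frame in self._chain(vpn):  # search the chain
            if entry_vpn == vpn:
                return (frame << OFFSET_BITS) | offset
        raise KeyError("no mapping for page %d" % vpn)


table = HashedPageTable()
table.map(3, 42)
table.map(19, 7)  # 19 % 16 == 3, so this collides with page 3
print(hex(table.translate((19 << OFFSET_BITS) | 0xABC)))  # 0x7abc
```

Pages 3 and 19 land in the same bucket, so the lookup for page 19 has
to walk past the entry for page 3 before finding its match.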


3. Inverted Page Tables
=======================

Normally, logical addresses are mapped to physical addresses in a page
table for each process. This requires each process to have its own
page table, each with potentially millions of entries.

The inverted page table maps physical frames to logical pages. Why is
this useful? Since the table has one entry per physical frame, its
size depends on the amount of physical memory rather than on the
number of processes. There is only one page table for all processes,
reducing the memory requirements.

Because all processes share the same page table, each entry has to
store the virtual page number and the process ID that it belongs to,
known as the address space identifier (ASID). When a logical address
is given, the page table is searched for an entry that matches both
the ASID and the virtual page number, and the index of the matching
entry is used as the frame number.

Although this method saves space, it requires more time to search
through the page table for entries since they are ordered by frame
number and not page number. In practice, a hash table could be used to
speed up the searching process.

Another downside to this approach is that pages cannot easily be
shared by processes, since each frame has only one entry in the table,
and each entry is associated with a single process.
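
The search described in this section can be sketched as a linear scan
over a list indexed by frame number. The function name find_frame and
the sample table contents are hypothetical. A real implementation
would likely use a hash table rather than scanning every entry.

```python
def find_frame(inverted_table, asid, vpn):
    """Return the frame number whose entry matches (asid, vpn)."""
    for frame, entry in enumerate(inverted_table):
        if entry == (asid, vpn):
            return frame  # the index of the entry is the frame number
    raise LookupError("no frame holds page %d of address space %d" % (vpn, asid))


# One entry per physical frame: (ASID, virtual page number), or None if free.
inverted_table = [(1, 0), (2, 0), (1, 5), None]

print(find_frame(inverted_table, 2, 0))  # 1
print(find_frame(inverted_table, 1, 5))  # 2
```

Both address spaces use virtual page 0, which is why the ASID has to
be part of the match.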


Segmentation
============

Segmentation is another form of memory management where instead of
creating a linear virtual address space via paging, applications can
use variable size chunks of memory (i.e. segments) by specifying a
segment number and an offset.
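
As a rough sketch of how a (segment number, offset) pair might be
resolved, assume each segment is described by a base and limit pair,
like the ones mentioned at the start of these notes. The table
contents and the function name below are made up for illustration.

```python
def segment_translate(segment_table, segment, offset):
    """Return the physical address for (segment, offset), checking the limit."""
    base, limit = segment_table[segment]
    if offset >= limit:
        raise ValueError("offset %#x is outside segment %d" % (offset, segment))
    return base + offset


# Hypothetical segment table: segment number -> (base, limit in bytes).
segment_table = {
    0: (0x1000, 0x400),   # e.g. code
    1: (0x8000, 0x2000),  # e.g. stack
}

print(hex(segment_translate(segment_table, 1, 0x10)))  # 0x8010
```

Because each segment has its own limit, segments can be of different
sizes, unlike fixed-size pages.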

The benefit of this approach is that user programs are allowed to
specify separate segments for different parts of the program
(e.g. stack, heap, code, etc.) and refer to them by a segment name (or
number). This is considered a more natural view of memory as opposed
to the linear address space provided by paging.

Since segmentation uses variable sized chunks of memory, it is prone
to external fragmentation. In practice, segmentation can be combined
with paging so that the user is presented with a segmented view of
memory but the underlying implementation uses paging with fixed-size
chunks of memory. This approach reduces or eliminates external
fragmentation but still suffers from internal fragmentation (i.e. each
page may not be fully utilized).