Wednesday, October 5th, 2010, 3rd hour of lecture (8-9PM)
Scribed by Adam Sheehan (ats@ccs.neu.edu)

Benefits of Page Tables
=======================

What are some of the benefits of using page tables over contiguous memory
allocation with base and limit pairs?

With a page table, you can maintain the state of individual pages in memory.
Pages can be marked with various attributes such as:

  * Read-only bit
    Marking a page of memory as read-only can be used as a form of
    protection, but it also enables multiple processes to share frames that
    hold common data (e.g. multiple processes running the same instructions
    can share that frame in memory if it is read-only).

  * Shared memory
    Two or more processes can share memory as a form of inter-process
    communication by having more than one page table refer to the same
    frame.

  * No execute bit
    Marking a page as no-execute prevents the CPU from executing any
    instructions found within that page. This is useful for protecting
    against attacks that inject malicious code into one part of memory and
    then attempt to move the instruction pointer to that location (e.g.
    stack-smashing).

  * Valid bit
    Indicating which pages contain valid data allows the MMU to check
    whether a memory access should be allowed or not.

  * Copy-on-write (COW) bit
    The copy-on-write bit allows two page tables to share a frame in memory
    until one process attempts to write to it, at which point a copy of the
    frame is made for that process. An example of this in use is the Unix
    fork() call, which creates a new process that is identical to the
    parent. Copying all of the parent's frames for the child would require a
    significant amount of time and memory. Instead, just the page table is
    copied, and when either process writes to a frame, a copy of that frame
    is made at that moment.

Page Table Implementation
=========================

The page table itself exists as a large region of contiguous memory. How much
memory does the page table occupy?
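Before working out the size, it may help to picture what a single page table
entry holds. The following is a minimal Python sketch, assuming a 4-byte entry
whose low 12 bits carry the attribute bits listed under "Benefits of Page
Tables" and whose remaining bits carry the frame number. The bit positions and
the names PTEFlags, make_pte, pte_frame, pte_flags and check_access are
assumptions made for illustration; real hardware defines its own layout.

```python
from enum import IntFlag


class PTEFlags(IntFlag):
    """Hypothetical attribute bits; the positions are chosen for illustration."""
    VALID = 1 << 0
    READ_ONLY = 1 << 1
    NO_EXECUTE = 1 << 2
    SHARED = 1 << 3
    COPY_ON_WRITE = 1 << 4


FLAG_BITS = 12  # assumed: the low 12 bits of an entry are reserved for flags


def make_pte(frame: int, flags: PTEFlags) -> int:
    """Pack a frame number and its attribute bits into one integer entry."""
    return (frame << FLAG_BITS) | int(flags)


def pte_frame(pte: int) -> int:
    """Extract the frame number from an entry."""
    return pte >> FLAG_BITS


def pte_flags(pte: int) -> PTEFlags:
    """Extract the attribute bits from an entry."""
    return PTEFlags(pte & ((1 << FLAG_BITS) - 1))


def check_access(pte: int, write: bool = False, execute: bool = False) -> str:
    """Describe how an access would be handled, based only on the flag bits."""
    flags = pte_flags(pte)
    if PTEFlags.VALID not in flags:
        return "fault: invalid page"
    if execute and PTEFlags.NO_EXECUTE in flags:
        return "fault: no-execute"
    if write and PTEFlags.COPY_ON_WRITE in flags:
        return "copy frame, then write"
    if write and PTEFlags.READ_ONLY in flags:
        return "fault: read-only"
    return "ok"


if __name__ == "__main__":
    # A frame shared copy-on-write after a fork(): reads succeed, and the
    # first write triggers a private copy of the frame.
    pte = make_pte(0x1234, PTEFlags.VALID | PTEFlags.COPY_ON_WRITE)
    print(hex(pte_frame(pte)), check_access(pte), check_access(pte, write=True))
```

The order of the checks in check_access is one possible policy, not a
description of any particular MMU. The size question above then comes down to
how many such entries a page table needs.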

For 32-bit hardware (or 64-bit hardware using 32-bit logical addresses), there
is a maximum of 4GB (2^32) addressable bytes in memory. With a page size of
4KB (2^12 bytes), we can calculate the number of entries in the page table as:

      2^32 (4GB) total bytes
    ------------------------- = 2^20 (~1M) pages
    2^12 (4KB) bytes per page

If the minimum size of a page table entry is 4 bytes, then the total page
table size is 4 bytes * 2^20 entries = 4MB.

It would be best to divide the page table into chunks so that we don't have to
allocate 4MB of contiguous memory for each process to create its page table.
There are at least three ways to break up the page table.

1. Hierarchical paging
======================

The first method is to repeat the paging process and create another page table
(i.e. page the page table). This enables us to allocate smaller chunks of the
page table as they are needed rather than having to allocate all of the memory
up front. How can this be accomplished?

For a single-level page table, a logical address can be broken down into two
pieces, the virtual page number and the offset:

    +---------+--------+
    |   P1    | offset |
    +---------+--------+

Adding another page table requires two page numbers to be stored in each
logical address:

    +----+----+--------+
    | P1 | P2 | offset |
    +----+----+--------+

The figure above shows the format of a logical address for a 2-level page
table, where P1 refers to the page of memory containing the inner page table,
P2 refers to the page containing the desired memory location, and the
remaining bits define the offset into that page.

The size of the inner page table is dictated by the size of a page of memory.
The number of bits in P2 follows from the number of page table entries that
fit in one page:

    bits in P2 = log2( (size of page) / (size of entry) )

To resolve a logical address in a 2-level page table:

  1. Use P1 to find the entry in the outer page table that contains the
     desired inner page table.
  2.
     Use P2 to find the desired frame of memory from the inner page table.
  3. Use the offset to find the specific address within the frame.

Although this process saves space, it comes at the expense of an additional
memory lookup. A 2-level page table requires a total of three memory lookups
for each logical address: one for each level of the page table, plus the
access to the data itself.

Example: Using a 2-level hierarchical paging table with 32-bit logical addresses
--------------------------------------------------------------------------------

How can we determine the format of a logical address for 32-bit hardware?

Assume that pages are 4KB (2^12 bytes). This requires the offset of the
logical address to be 12 bits long to access all of the addresses within a
single page.

    Format of a 32-bit logical address
    +----+----+--------+
    | ?? | ?? |   12   | = 32 bits
    +----+----+--------+
      P1   P2   offset

If we assume that a page table entry is 4 bytes, we can calculate the number
of bits required for the inner page table index:

         (size of page)    2^12 bytes (4KB)
    P2:  --------------- = ---------------- = 2^10 entries => 10 bits required
         (size of entry)    2^2 bytes (4B)

This leaves us with 10 bits for P1:

    Format of a 32-bit logical address
    +----+----+--------+
    | 10 | 10 |   12   | = 32 bits
    +----+----+--------+
      P1   P2   offset

Example: Using hierarchical paging with 64-bit logical addresses
----------------------------------------------------------------

Assume again that pages are 4KB (2^12 bytes), requiring 12 bits for the
offset. If we assume that a page table entry is 4 bytes and follow the same
procedure as in the previous example, we end up with the following logical
address format:

    Format of a 64-bit logical address
    +----+----+--------+
    | 42 | 10 |   12   | = 64 bits
    +----+----+--------+
      P1   P2   offset

The outer page table then requires 2^42 entries (huge). To reduce the size, we
can add another layer of indirection (i.e.
another page table):

    Format of a 64-bit logical address
    +----+----+----+--------+
    | 32 | 10 | 10 |   12   | = 64 bits
    +----+----+----+--------+
      P1   P2   P3   offset

Each new layer of indirection reduces the space required for the outermost
page table, but it also adds another memory lookup. For 64-bit addresses, 7
layers of indirection are needed before a reasonably sized outer page table
can be created. This means that there will be 7 page table lookups for each
logical address before the data itself can be accessed, which is very slow.

2. Hashed paging
================

An alternative to using hierarchical paging is to use a hashed page table.
Using this method,

  1. The page table index (P1) is hashed to yield an index into the hash
     table.
  2. Since multiple page table indexes might hash to the same value, the
     hashed page table provides a linked list for each hashed entry.
  3. The linked list is searched (it should only contain a few values at
     most) until an entry that matches the original P1 value is found.
  4. The frame number is retrieved from the linked list entry and combined
     with the offset to find the physical address.

    +---------+--------+
    |   P1    | offset |
    +---------+--------+
         |
         v
      HASH(P1)       +------+
         |           |hashed|    linked_list
         +---------> |page  |--> of virtual -> physical
                     |table |    mappings
                     +------+

The benefit of this approach is that in a hash table, the lookup time should
be independent of the number of entries in the table, which works well for
address spaces larger than 32 bits, where there could be many entries in the
page table.

3. Inverted Page Tables
=======================

Normally, logical addresses are mapped to physical addresses in a page table
for each process. This requires each process to have its own page table, each
with potentially millions of entries.

The inverted page table maps physical frames to logical pages. Why is this
useful? Since physical frames are mapped to virtual addresses, there is only
one page table for all processes, reducing the memory requirements.

Because all processes share the same page table, each entry has to store the
virtual page index and an identifier for the process that it belongs to, known
as the address space identifier (ASID). When a logical address is given, the
page table is searched for an entry that matches both the ASID and the virtual
page index, and the position of that entry in the table is then used as the
frame number.

Although this method saves space, it requires more time to search through the
page table, since entries are ordered by frame number and not by page number.
In practice, a hash table could be used to speed up the search.

Another downside to this approach is that pages cannot be shared by processes,
since each frame has only one entry in the table, and each entry is associated
with one process.

Segmentation
============

Segmentation is another form of memory management where, instead of creating a
linear virtual address space via paging, applications can use variable-size
chunks of memory (i.e. segments) by specifying a segment number and an offset.

The benefit of this approach is that user programs are allowed to specify
separate segments for different parts of the program (e.g. stack, heap, code,
etc.) and refer to them by a segment name (or number). This is considered a
more natural view of memory than the linear address space provided by paging.

Since segmentation uses variable-size chunks of memory, it is prone to
external fragmentation. In practice, segmentation can be combined with paging
so that the user is presented with a segmented view of memory while the
underlying implementation uses paging with fixed-size chunks of memory. This
approach reduces or eliminates external fragmentation but still suffers from
internal fragmentation (i.e. each page may not be fully utilized).
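
For the pure segmentation scheme described at the start of this section,
resolving a (segment number, offset) pair can be sketched as a lookup in a
table of base and limit pairs. This is a minimal Python sketch; the Segment
type, the translate function and the example SEGMENTS table are hypothetical
and chosen only for illustration, not taken from any particular system.

```python
from typing import Dict, NamedTuple


class Segment(NamedTuple):
    """One hypothetical segment table entry: start address and length in bytes."""
    base: int
    limit: int


def translate(segment_table: Dict[int, Segment], segment: int, offset: int) -> int:
    """Resolve (segment number, offset) to a physical address.

    Raises KeyError for an unknown segment and ValueError when the offset
    falls outside the segment, standing in for a hardware fault.
    """
    if segment not in segment_table:
        raise KeyError(f"no such segment: {segment}")
    entry = segment_table[segment]
    if not 0 <= offset < entry.limit:
        raise ValueError(f"offset {offset:#x} outside segment {segment}")
    return entry.base + offset


# Assumed example layout: segments have different sizes, which is what makes
# pure segmentation prone to external fragmentation.
SEGMENTS = {
    0: Segment(base=0x1000, limit=0x400),    # e.g. code
    1: Segment(base=0x8000, limit=0x2000),   # e.g. heap
    2: Segment(base=0x20000, limit=0x100),   # e.g. stack
}

if __name__ == "__main__":
    print(hex(translate(SEGMENTS, 1, 0x10)))
```

When segmentation is combined with paging, the address produced by a lookup
like this would be treated as a linear address and passed through the page
table rather than used directly as a physical address.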