diff --git a/content/posts/operating_system.md b/content/posts/operating_system.md
index 828d693..55ab5ad 100644
--- a/content/posts/operating_system.md
+++ b/content/posts/operating_system.md
@@ -686,3 +686,332 @@
 anything with arriving and departing tasks, provided the system is stable — re
 the arrival process, number of servers, or queueing order.
+## Memory Management
+
+### Address translation
+Address translation is a simple concept, but it turns out to be incredibly powerful. What can
+an operating system do with address translation? This is only a partial list:
+- **Process isolation.** As we discussed in Chapter 2, protecting the operating system
+kernel and other applications against buggy or malicious code requires the ability to
+limit memory references by applications. Likewise, address translation can be used by
+applications to construct safe execution sandboxes for third-party extensions.
+- **Interprocess communication.** Often processes need to coordinate with each other,
+and an efficient way to do that is to have the processes share a common memory
+region.
+- **Shared code segments.** Instances of the same program can share the program’s
+instructions, reducing their memory footprint and making the processor cache more
+efficient. Likewise, different programs can share common libraries.
+- **Program initialization.** Using address translation, we can start a program running
+before all of its code is loaded into memory from disk.
+- **Efficient dynamic memory allocation.** As a process grows its heap, or as a thread
+grows its stack, we can use address translation to trap to the kernel to allocate
+memory for those purposes only as needed.
+- **Cache management.** As we will explain in the next chapter, the operating system can
+arrange how programs are positioned in physical memory to improve cache efficiency,
+through a system called page coloring.
+- **Memory-mapped files.** A convenient and efficient abstraction for many applications is
+to map files into the address space, so that the contents of the file can be directly
+referenced with program instructions.
+- **Efficient I/O.** Server operating systems are often limited by the rate at which they can
+transfer data to and from the disk and the network. Address translation enables data to
+be safely transferred directly between user-mode applications and I/O devices.
+- **Virtual memory.** The operating system can provide applications the abstraction of
+more memory than is physically present on a given computer.
+- **Persistent data structures.** The operating system can provide the abstraction of a
+persistent region of memory, where changes to the data structures in that region
+survive program and system crashes.
+
+#### Definition
+An address translator takes each instruction and data memory reference generated by
+a process, checks whether the address is legal, and converts it to a physical memory
+address that can be used to fetch or store instructions or data. The data itself — whatever
+is stored in memory — is returned as is; it is not transformed in any way. The translation is
+usually implemented in hardware, and the operating system kernel configures the
+hardware to accomplish its aims.
+
+Given that a number of different implementations are possible, how should we evaluate the
+alternatives? Here are some goals we might want out of a translation box; the design we
+end up with will depend on how we balance among these various goals.
+- Memory protection
+- Memory sharing
+- Flexible memory placement
+- Sparse addresses
+- Runtime lookup efficiency
+- Compact translation tables
+- Portability
+
+We will end up with a fairly complex address translation mechanism, and so our discussion
+will start with the simplest possible mechanisms and add functionality only as needed. It
+will be helpful during the discussion for you to keep in mind the two views of memory: the
+process sees its own memory, using its own addresses. We will call these *virtual
+addresses*, because they do not necessarily correspond to any physical reality. By
+contrast, to the memory system, there are only *physical addresses* — real locations in
+memory. From the memory system’s perspective, it is given physical addresses and it does
+lookups and stores values. The translation mechanism converts between the two views:
+from a virtual address to a physical memory address.
+
+#### Towards Flexible Address Translation
+Our discussion of hardware address translation is divided into two steps. First, we put the
+issue of lookup efficiency aside, and instead consider how best to achieve the other goals
+listed above: flexible memory assignment, space efficiency, fine-grained protection and
+sharing, and so forth. Once we have the features we want, we will then add mechanisms to
+gain back lookup efficiency.
+
+In Chapter 2, we illustrated the notion of hardware memory protection using the simplest
+hardware imaginable: base and bounds. The translation box consists of two extra registers
+per process. The base register specifies the start of the process’s region of physical
+memory; the bound register specifies the extent of that region. If the base register is added
+to every address generated by the program, then we no longer need a relocating loader —
+the virtual addresses of the program start from 0 and go to bound, and the physical
+addresses start from base and go to base + bound. Since physical memory can contain
+several processes, the kernel *resets* the contents of the base and bounds registers on
+*each process context switch* to the appropriate values for that process.
+
+Base and bounds translation is both simple and fast, but it lacks many of the features
+needed to support modern programs. It supports only coarse-grained protection, at the
+level of the entire process; it is not possible to prevent a program from overwriting its own
+code, for example. It is also difficult to share regions of memory between two processes.
+Since the memory for a process needs to be contiguous, supporting dynamic memory
+regions, such as for heaps, thread stacks, or memory-mapped files, becomes difficult to
+impossible.
+
+##### Segmented Memory
+Many of the limitations of base and bounds translation can be remedied with a small
+change: instead of keeping only a single pair of base and bounds registers per process,
+the hardware can support an array of base and bounds register pairs for each process.
+This is called *segmentation*. Each entry in the array controls a portion, or segment, of the
+virtual address space. The physical memory for each segment is stored contiguously, but
+different segments can be stored at different locations. The *high-order bits* of the virtual
+address are used to *index into* the array; the rest of the address is then treated as above —
+*added to the base* and checked against the bound stored at that index.
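+
+Concretely, the hardware’s job on each reference looks something like the following C
+sketch. This is a minimal illustration, assuming a 32-bit virtual address whose top two bits
+select one of four segments; the table layout and the trap helper are illustrative, not any
+particular architecture’s interface.
+
+```c
+#include <stdint.h>
+#include <stdbool.h>
+
+void raise_segmentation_fault(void);   // stand-in for trapping to the kernel
+
+// Illustrative layout: top 2 bits select one of 4 segments,
+// the remaining 30 bits are the offset within the segment.
+#define SEG_SHIFT 30
+#define OFF_MASK  0x3FFFFFFFu
+
+typedef struct {
+    uint32_t base;    // start of the segment in physical memory
+    uint32_t bound;   // length of the segment, in bytes
+    bool     valid;
+    uint8_t  perms;   // e.g., execute-only for code, read-write for data
+} segment_entry;
+
+segment_entry seg_table[4];   // reloaded by the kernel on each context switch
+
+uint32_t seg_translate(uint32_t vaddr) {
+    uint32_t seg    = vaddr >> SEG_SHIFT;   // index into the array
+    uint32_t offset = vaddr & OFF_MASK;     // treated as with base and bounds
+
+    if (!seg_table[seg].valid || offset >= seg_table[seg].bound)
+        raise_segmentation_fault();         // reference outside a legal segment
+
+    return seg_table[seg].base + offset;    // add the offset to the base
+}
+```
+
+With a one-entry table, this degenerates to plain base and bounds; the only work left for
+the kernel is to reload `seg_table` on each context switch.
+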
+In addition, the operating system can assign different segments different permissions, e.g.,
+to allow execute-only access to code and read-write access to data. In general, the number
+of segments is determined by the number of bits for the segment number that are set aside
+in the virtual address.
+
+With segmentation, program memory is no longer a single contiguous region; instead, it is
+a set of regions. Each segment starts at a new segment boundary. For example, code and
+data are not immediately adjacent to each other in either the virtual or physical address
+space.
+
+What happens if a program branches into or tries to load data from one of these gaps? The
+hardware will generate an exception, trapping into the operating system kernel. On UNIX
+systems, this is still called a segmentation fault, that is, a reference outside of a legal
+segment of memory.
+
+Despite being simple to implement and manage, segmented memory is both remarkably
+powerful and widely used. With segments, the operating system can allow processes to
+share some regions of memory while keeping other regions protected.
+
+Likewise, shared library routines, such as a graphics library, can be placed into a segment
+and shared between processes. This is frequently done in modern operating systems with
+dynamically linked libraries.
+
+We can also use segments for interprocess communication, if processes are given read
+and write permission to the same segment.
+
+As a final example of the power of segments, they enable the efficient management of
+dynamically allocated memory. When an operating system reuses memory or disk space
+that had previously been used, it must first zero out the contents of the memory or disk.
+Otherwise, private data from one application could inadvertently leak into another,
+potentially malicious, application. With segments, the kernel can defer this work: a heap
+segment can start small and be extended, and zeroed, only as the program actually
+touches more memory.
+
+Over time, as processes are created and finish, physical memory will be divided into
+regions that are in use and regions that are not, that is, available to be allocated to a new
+process. These free regions will be of varying sizes. When we create a new segment, we
+will need to find a free spot for it. Should we put it in the smallest open region where it will
+fit? The largest open region?
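+
+As a sketch of that choice, here is what the two classic policies look like over a
+hypothetical free list; the node type and names are illustrative.
+
+```c
+#include <stddef.h>
+#include <stdint.h>
+
+// Hypothetical free-list node: one contiguous run of free physical memory.
+typedef struct free_region {
+    uint32_t start;
+    uint32_t length;
+    struct free_region *next;
+} free_region;
+
+// Best fit: pick the smallest free region that is still large enough.
+// (First fit would instead return the first region with length >= size.)
+free_region *choose_region(free_region *free_list, uint32_t size) {
+    free_region *best = NULL;
+    for (free_region *r = free_list; r != NULL; r = r->next) {
+        if (r->length < size)
+            continue;                             // too small for this segment
+        if (best == NULL || r->length < best->length)
+            best = r;                             // tightest fit so far
+    }
+    return best;   // NULL: no single free region is big enough
+}
+```
+
+Best fit keeps large regions intact at the cost of scanning the whole list; first fit is faster
+but tends to leave small slivers of free space.
+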
+However we choose to place new segments, as more memory becomes allocated, the
+operating system may reach a point where there is enough free space for a new segment,
+but the free space is not contiguous. This is called *external fragmentation*. The operating
+system is free to compact memory to make room without affecting applications, because
+virtual addresses are unchanged when we relocate a segment in physical memory. Even
+so, compaction can be costly in terms of processor overhead: a typical server configuration
+would take roughly a second to compact its memory.
+
+All this becomes even more complex when memory segments can grow. How much
+memory should we set aside for a program’s heap? If we put the heap segment in a part of
+physical memory with lots of room, then we will have wasted memory if that program turns
+out to need only a small heap. If we do the opposite — put the heap segment in a small
+chunk of physical memory — then we will need to copy it somewhere else if it changes
+size.
+
+##### Paged Memory
+An alternative to segmented memory is *paged memory*. With paging, memory is allocated
+in fixed-sized chunks called *page frames*. Address translation is similar to how it works with
+segmentation. Instead of a segment table whose entries contain pointers to variable-sized
+segments, there is a *page table* for each process whose entries contain pointers to page
+frames. Because page frames are *fixed-sized and a power of two*, the page table entries
+only need to provide the *upper bits* of the page frame address, so they are more compact.
+There is no need for a “bound” on the offset; the entire page in physical memory is
+allocated as a unit.
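+
+The corresponding C sketch is even simpler than the segmented one, since the bound
+check disappears. As before, the names and the page-fault trap are illustrative; 4 KB
+pages and a single-level table are assumed.
+
+```c
+#include <stdint.h>
+#include <stdbool.h>
+
+void raise_page_fault(void);   // stand-in for trapping to the kernel
+
+#define PAGE_SHIFT 12          // 4 KB pages, a common choice
+#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)
+
+typedef struct {
+    uint32_t frame;   // physical page frame number (the upper address bits)
+    bool     valid;
+    uint8_t  perms;
+} page_table_entry;
+
+uint32_t page_translate(const page_table_entry *page_table, uint32_t vaddr) {
+    uint32_t vpn    = vaddr >> PAGE_SHIFT;   // virtual page number
+    uint32_t offset = vaddr & PAGE_MASK;     // no bound check needed: the
+                                             // whole frame is allocated as a unit
+    page_table_entry pte = page_table[vpn];
+    if (!pte.valid)
+        raise_page_fault();
+
+    return (pte.frame << PAGE_SHIFT) | offset;
+}
+```
+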
+What will seem odd, and perhaps cool, about paging is that while a program thinks of its
+memory as linear, in fact its memory can be, and usually is, scattered throughout physical
+memory in a kind of abstract mosaic. The processor will execute one instruction after
+another using virtual addresses; its virtual addresses are still linear. However, the
+instruction located at the end of a page will be located in a completely different region of
+physical memory from the next instruction at the start of the next page. Data structures will
+appear to be contiguous using virtual addresses, but a large matrix may be scattered
+across many physical page frames.
+
+Paging addresses the principal limitation of segmentation: *free-space allocation* is very
+straightforward. The operating system can represent physical memory as a bit map, with
+each bit representing a physical page frame that is either free or in use. Finding a free
+frame is just a matter of finding an empty bit.
+
+Sharing memory between processes is also convenient: we only need to set the page table
+entry for each process sharing a page to point to the same physical page frame. For a
+large shared region that spans multiple page frames, such as a shared library, this may
+require setting up a number of page table entries. Since we need to know when to release
+memory when a process finishes, shared memory requires some extra bookkeeping to
+keep track of whether the shared page is still in use. The data structure for this is called a
+*core map*; it records information about each physical page frame, such as which page
+table entries point to it.
+
+Page tables allow other features to be added. For example, we can start a program
+running before all of its code and data are loaded into memory. Initially, the operating
+system marks all of the page table entries for a new process as invalid; as pages are
+brought in from disk, it marks those pages as read-only (for code pages) or read-write (for
+data pages). Once the first few pages are in memory, however, the operating system can
+start execution of the program in user mode, while the kernel continues to transfer the rest
+of the program’s code in the background. As the program starts up, if it happens to jump to
+a location that has not been loaded yet, the hardware will cause an exception, and the
+kernel can stall the program until that page is available. Further, the compiler can
+reorganize the program executable for more efficient startup, by coalescing the
+initialization pages into a few pages at the start of the program, thus overlapping
+initialization with loading the rest of the program from disk.
+
+A *downside* of paging is that while the management of physical memory becomes simpler,
+the management of the virtual address space becomes more challenging. Compilers
+typically expect the execution stack to be contiguous (in virtual addresses) and of arbitrary
+size; each new procedure call assumes the memory for the stack is available. Likewise, the
+runtime library for dynamic memory allocation typically expects a contiguous heap. In a
+single-threaded process, we can place the stack and heap at opposite ends of the virtual
+address space and have them grow towards each other, as shown in Figure 8.5. However,
+with multiple threads per process, we need multiple thread stacks, each with room to grow.
+
+This becomes even more of an issue with 64-bit virtual address spaces. The size of the
+page table is proportional to the size of the virtual address space, not to the size of
+physical memory. The sparser the virtual address space, the more overhead is needed for
+the page table. Most of the entries will be invalid, representing parts of the virtual address
+space that are not in use, but physical memory is still needed for all of those page table
+entries.
+
+We can reduce the space taken up by the page table by choosing a larger page frame.
+How big should a page frame be? Fixed-size chunks are easier to allocate, but a larger
+page frame wastes space if a process does not use all of the memory inside the frame.
+This wasted space is called *internal fragmentation*.
+
+##### Multi-Level Translation
+Almost all multi-level address translation systems use paging as the lowest level of the
+tree. The main differences between systems are in how they reach the page table at the
+leaf of the tree — whether using segments plus paging, or multiple levels of paging, or
+segments plus multiple levels of paging.
+
+##### Paged Segmentation
+Let us start with a system that has only two levels of the tree. With paged segmentation,
+memory is segmented, but instead of each segment table entry pointing directly to a
+contiguous region of physical memory, each segment table entry points to a page table,
+which in turn points to the memory backing that segment. The segment table entry “bound”
+describes the page table length, that is, the length of the segment in pages. Because
+paging is used at the lowest level, all segment lengths are some multiple of the page size.
+
+Although segment tables are sometimes stored in special hardware registers, the page
+tables for each segment are quite a bit larger in aggregate, and so they are normally stored
+in physical memory. To keep the memory allocator simple, the maximum segment size is
+usually chosen to allow the page table for each segment to be a small multiple of the page
+size.
+
+##### Multi-Level Paging
+A nearly equivalent approach to paged segmentation is to use multiple levels of page
+tables. The top-level page table contains entries, each of which points to a second-level
+page table, and so on; the entries of the lowest-level page tables point to the page frames
+themselves.
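+
+Here is a sketch of a two-level walk in C, assuming the classic 32-bit split of ten index bits
+per level plus a twelve-bit offset (as in early x86 paging). The helper names are illustrative,
+and for simplicity the code pretends it can read the tables through physical addresses
+directly; on a real machine this walk is done by hardware.
+
+```c
+#include <stdint.h>
+
+void raise_page_fault(void);   // stand-in for trapping to the kernel
+
+// 32-bit virtual address split 10/10/12: two table indexes plus an offset.
+#define L1_INDEX(va) (((va) >> 22) & 0x3FFu)
+#define L2_INDEX(va) (((va) >> 12) & 0x3FFu)
+#define OFFSET(va)   ((va) & 0xFFFu)
+#define PTE_VALID    0x1u
+#define PTE_ADDR(e)  ((e) & ~0xFFFu)   // frame address stored in an entry
+
+uint32_t walk(const uint32_t *l1_table, uint32_t vaddr) {
+    uint32_t l1_entry = l1_table[L1_INDEX(vaddr)];
+    if (!(l1_entry & PTE_VALID))
+        raise_page_fault();   // no second-level table for this region at all
+
+    const uint32_t *l2_table =
+        (const uint32_t *)(uintptr_t)PTE_ADDR(l1_entry);
+    uint32_t l2_entry = l2_table[L2_INDEX(vaddr)];
+    if (!(l2_entry & PTE_VALID))
+        raise_page_fault();
+
+    return PTE_ADDR(l2_entry) | OFFSET(vaddr);
+}
+```
+
+This is what makes sparse address spaces affordable: a 4 MB run of unused virtual
+addresses costs a single invalid top-level entry, rather than a thousand invalid page table
+entries.
+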
+##### Multi-Level Paged Segmentation
+We can combine these two approaches by using a segmented memory where each
+segment is managed by a multi-level page table.
+
+#### Towards Efficient Address Translation
+At this point, you should be getting a bit antsy. After all, most of the hardware mechanisms
+we have described involve at least two, and possibly as many as four, extra memory
+references on each instruction, before we even reach the intended physical memory
+location! It should seem completely impractical for a processor to do several memory
+lookups on every instruction fetch, and even more for every instruction that loads or stores
+data.
+
+In this section, we will discuss how to improve address translation performance without
+changing its logical behavior. In other words, despite the optimization, every virtual address
+is translated to exactly the same physical memory location, and every permission exception
+causes a trap, exactly as would have occurred without the performance optimization.
+
+For this, we will use a *cache*, a copy of some data that can be accessed more quickly than
+the original.
+
+##### Translation Lookaside Buffers
+Consider what happens as the processor fetches two instructions in sequence. If the two
+instructions are on the same page in the virtual address space, then they will be on the
+same page in physical memory. The processor will just repeat the same work — the table
+walk will be exactly the same, and again for the next instruction, and the next after that.
+
+A translation lookaside buffer (TLB) is a *small hardware table* containing the results of
+recent address translations. Each entry in the TLB maps a virtual page to a physical page:
+
+```
+TLB entry = {
+    virtual page number,
+    physical page frame number,
+    access permissions
+}
+```
+
+Instead of finding the relevant entry by a multi-level lookup or by hashing, the TLB
+hardware (typically) checks all of the entries simultaneously against the virtual page. If
+there is a match, the processor uses that entry to form the physical address, skipping the
+rest of the steps of address translation. This is called a *TLB hit*. On a TLB hit, the hardware
+still needs to check permissions, in case, for example, the program attempts to write to an
+execute-only code page, or the operating system needs to trap on a store instruction to a
+copy-on-write page.
+
+A *TLB miss* occurs if none of the entries in the TLB match. In this case, the hardware does
+the full address translation in the way we described above. When the address translation
+completes, the physical page is used to form the physical address, and the translation is
+installed in an entry in the TLB, replacing one of the existing entries. Typically, the replaced
+entry will be one that has not been used recently.
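+
+In rough C, the lookup logic is as follows. A real TLB compares all entries in parallel in
+hardware, so the loop is purely expository, and `full_translation` and `tlb_install` are
+hypothetical stand-ins for the table walk and the replacement policy.
+
+```c
+#include <stdint.h>
+#include <stdbool.h>
+
+#define PAGE_SHIFT 12
+#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)
+#define TLB_SIZE   64
+
+uint32_t full_translation(uint32_t vaddr);            // full multi-level walk
+void     tlb_install(uint32_t vpn, uint32_t frame);   // evicts a stale entry
+void     raise_protection_fault(void);                // trap to the kernel
+
+typedef struct {
+    uint32_t vpn;     // virtual page number
+    uint32_t frame;   // physical page frame number
+    uint8_t  perms;   // access permissions
+    bool     valid;
+} tlb_entry;
+
+tlb_entry tlb[TLB_SIZE];
+
+uint32_t tlb_translate(uint32_t vaddr, uint8_t required) {
+    uint32_t vpn = vaddr >> PAGE_SHIFT;
+    for (int i = 0; i < TLB_SIZE; i++) {   // hardware checks all entries at once
+        if (tlb[i].valid && tlb[i].vpn == vpn) {
+            if ((tlb[i].perms & required) != required)
+                raise_protection_fault();  // a hit, but the access is not allowed
+            return (tlb[i].frame << PAGE_SHIFT) | (vaddr & PAGE_MASK);  // TLB hit
+        }
+    }
+    uint32_t paddr = full_translation(vaddr);   // TLB miss: full table walk
+    tlb_install(vpn, paddr >> PAGE_SHIFT);      // cache the result for next time
+    return paddr;
+}
+```
+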
+To be useful, the TLB lookup needs to be much more rapid than doing a full address
+translation; thus, the TLB table entries are implemented in very fast, on-chip static memory,
+situated near the processor. In fact, to keep lookups rapid, many systems now include
+multiple levels of TLB. In general, the smaller the memory, the faster the lookup. So, the
+first-level TLB is small and close to the processor. If the first-level TLB does not contain the
+translation, a larger second-level TLB is consulted, and the full translation is invoked only if
+the translation misses in both levels.
+
+##### Superpages
+One way to improve the TLB hit rate is to use a concept called *superpages*. A superpage is
+a set of contiguous pages in physical memory that map a contiguous region of virtual
+memory, where the pages are aligned so that they share the same high-order (superpage)
+address.
+
+Superpages complicate operating system memory allocation by requiring the system to
+allocate chunks of memory in different sizes. However, the upside is that a superpage can
+drastically reduce the number of TLB entries needed to map large, contiguous regions of
+memory. Each entry in the TLB has a flag signifying whether the entry is a page or a
+superpage. For superpages, the TLB matches the superpage number — that is, it ignores
+the portion of the virtual address that is the page number within the superpage.
+
+##### Virtually Addressed Caches
+Another way to improve the performance of address translation is to include a virtually
+addressed cache that is consulted before the TLB, as shown in Figure 8.16. A virtually
+addressed cache stores a copy of the contents of physical memory, indexed by the virtual
+address. When there is a match, the processor can use the data immediately, without
+waiting for a TLB lookup or page table translation to generate a physical address, and
+without waiting to retrieve the data from main memory. Almost all modern multicore chips
+include a small, virtually addressed on-chip cache near each processor core.
+
+##### Physically Addressed Caches
+Many processor architectures include a physically addressed cache that is consulted as a
+second-level cache after the virtually addressed cache and TLB, but before main memory.
+Once the physical address of the memory location is formed from the TLB lookup, the
+second-level cache is consulted. If there is a match, the value stored at that location can be
+returned directly to the processor without the need to go to main memory.
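+
+Putting the pieces of this section together, a single load flows through the hierarchy in
+roughly the following order. Every function below is a stand-in for a hardware stage, not a
+real API.
+
+```c
+#include <stdint.h>
+#include <stdbool.h>
+
+bool     vcache_lookup(uint32_t vaddr, uint32_t *data);  // virtually addressed cache
+uint32_t tlb_translate(uint32_t vaddr);                  // TLB, walking the tables on a miss
+bool     pcache_lookup(uint32_t paddr, uint32_t *data);  // physically addressed cache
+uint32_t memory_read(uint32_t paddr);                    // main memory
+
+uint32_t load(uint32_t vaddr) {
+    uint32_t data;
+
+    // 1. Virtually addressed cache: a hit needs no translation at all.
+    if (vcache_lookup(vaddr, &data))
+        return data;
+
+    // 2. TLB, falling back to the full page table walk on a miss.
+    uint32_t paddr = tlb_translate(vaddr);
+
+    // 3. Physically addressed second-level cache, before main memory.
+    if (pcache_lookup(paddr, &data))
+        return data;
+
+    // 4. Main memory.
+    return memory_read(paddr);
+}
+```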