Original author: David A. Rusling. Translation: Banyan & FIFA (2001-04-27 13:54:58)
Chapter 3: Memory Management

The memory management subsystem is one of the most important parts of the operating system. Since the early days of computing, programs have needed more memory than physically exists, and many strategies have been devised to overcome this limitation; the most successful of them is virtual memory. Virtual memory makes the system appear to have more memory than it actually has by sharing the limited physical memory among the processes competing for it. Virtual memory does more than just let us use more memory; it also provides:

Large address spaces. The operating system makes the system appear to have a much larger amount of memory than it actually has. The virtual memory can be many times larger than the physical memory in the system.

Protection. Each process runs in its own virtual address space. These virtual address spaces are completely separate from each other, so processes cannot affect one another. Also, the hardware virtual memory mechanism allows areas of memory to be protected against writing, which protects code and data from being overwritten by rogue applications.

Memory mapping. Memory mapping is used to map image files and data files directly into a process's address space. In memory mapping, the contents of a file are linked directly into the virtual address space of a process.

Fair physical memory allocation. The memory management subsystem allows each running process a fair share of the system's physical memory.

Shared virtual memory. Although virtual memory gives each process its own separate virtual address space, sometimes processes need to share memory. For example, several processes in the system may be running the bash command shell. Rather than having a copy of bash in every process's virtual address space, it is better to have a single copy in physical memory that all of the processes running bash share.
Dynamic libraries are another common way of sharing executable code between processes. Shared memory can also be used as an inter-process communication (IPC) mechanism, with two or more processes exchanging information via memory common to all of them. Linux supports the System V shared memory IPC mechanism.

3.1 An Abstract Model of Virtual Memory

Figure 3.1: Abstract model of virtual to physical address mapping

Before considering how Linux implements its support for virtual memory, it is useful to look at a simpler abstract model. As a process executes, the processor reads instructions from memory and decodes them. In decoding an instruction it may need to fetch or store the contents of a location in memory. The processor then executes the instruction and moves on to the next instruction in the program. In this way the processor is constantly accessing memory, either to fetch instructions or to fetch and store data.

In a virtual memory system, all of these addresses are virtual addresses, not physical addresses. The processor translates each virtual address into a physical address using a set of tables maintained by the operating system. To make this translation easier, virtual and physical memory are divided into pages. Pages can be the same size on all systems, or they can differ, which complicates management. Linux on Alpha AXP processors uses 8 KB pages, while on Intel x86 systems it uses 4 KB pages. Each page is given a unique number, its page frame number (PFN). In this paged model, a virtual address is composed of two parts: a virtual page frame number and an offset into that page. If the page size is 4 KB, bits 11:0 of the virtual address contain the offset, and bits 12 and above contain the virtual page frame number. Each time the processor encounters a virtual address, it must extract these two parts.
Using the page table, the processor translates the virtual page frame number into a physical page frame number and then accesses the correct offset within that physical page. Figure 3.1 shows the virtual address spaces of two processes, X and Y, each with its own page table. These page tables map each process's virtual pages onto physical pages in memory. In the figure, virtual page frame number 0 of process X is mapped to physical page frame number 1.

In theory, each page table entry contains the following information: a valid flag, indicating whether this entry is valid; the physical page frame number that this entry describes; and access control information, describing how the page may be used — can it be written to, does it contain executable code?

The page table is indexed by the virtual page frame number: virtual page frame number 5, for example, is the sixth element of the table (0 being the first). To translate a virtual address into a physical one, the processor must first work out the virtual page frame number and the offset within the page. By making the page size a power of two, this can be done simply with masks and shifts.
Taking the page size in Figure 3.1 to be 0x2000 bytes (8192 decimal), consider the address 0x2194 in process Y's virtual address space. The processor translates this into virtual page frame number 1 with an offset of 0x194 into that page.

The processor uses the virtual page frame number as an index into the process's page table to retrieve the corresponding page table entry. If the entry at that index is valid, the processor takes the physical page frame number from it. If the entry is invalid, the process has accessed a non-existent area of its virtual memory. In this case the processor cannot resolve the address and must pass control to the operating system so that it can fix things up. Exactly how the processor notifies the operating system depends on the processor; commonly, the processor raises a page fault and traps into the operating system kernel, which is given the faulting virtual address and the reason for the fault.

Returning to Figure 3.1, process Y's virtual page frame number 1 is mapped to system physical page frame number 4, which starts at physical address 0x8000 (4 * 0x2000). Adding the 0x194 byte offset gives the final physical address of 0x8194.

By mapping virtual addresses to physical addresses in this way, the virtual memory can be mapped into the system's physical pages in any order. For example, in Figure 3.1, process X's virtual page frame number 0 is mapped to physical page frame number 1, whereas virtual page frame number 7 is mapped to physical page frame number 0, even though its virtual page frame number is higher. This demonstrates an interesting by-product of virtual memory: the pages of virtual memory do not have to be present in physical memory in any particular order.
3.1.1 Demand Paging

As there is much less physical memory than virtual memory, the operating system must be careful not to use physical memory inefficiently. One way to save physical memory is to load only those virtual pages that are actually being used by the executing program. For example, a database program may be run to query a database. In this case not all of the database needs to be loaded into memory, only the records being examined. If the query is a search query, there is no point loading the code that handles adding new records. This technique of loading virtual pages into memory only as they are accessed is known as demand paging.

When a process attempts to access a virtual address that is not currently in memory, the processor cannot find an entry for the referenced page in the page table. In Figure 3.1 there is no entry in process X's page table for virtual page frame number 2, so when process X tries to read an address within that page the processor cannot translate it into a physical address. The processor then notifies the operating system that a page fault has occurred.

If the faulting virtual address is invalid, the process is trying to access a virtual address that does not exist. Perhaps the application has gone wrong in some way, for example writing to random addresses in memory. In this case the operating system will terminate it, protecting the other processes in the system from this rogue process.

If the faulting virtual address is valid but the page it refers to is not currently in memory, the operating system must bring the page into memory from the image on disk. Disk access takes a relatively long time, so the process must wait until the page has been fetched. If there are other processes that could run, the operating system will select one of them to run while the page is being read in.
The fetched page is written into a free physical page frame, and an entry for its virtual page frame number is added to the process's page table. The process is then restarted at the machine instruction where the memory fault occurred. This time the virtual memory access succeeds: the processor can complete the virtual to physical address translation, and the process continues to run.

Linux uses demand paging to load executable images into a process's virtual memory. Whenever a command is executed, the file containing it is opened and its contents are mapped into the process's virtual memory. This is done by modifying the data structures describing the process's memory map and is known as memory mapping. However, only the first part of the image is actually brought into physical memory; the rest remains on disk.
As the image executes, it generates page faults, which Linux uses to decide which parts of the image to bring into memory so that execution can continue.

3.1.2 Swapping

If a process needs to bring a virtual page into physical memory and there are no free physical pages available, the operating system must make room by discarding another page from physical memory.

If the page to be discarded came from an executable image or data file on disk and has not been modified, it does not need to be saved; it can simply be discarded. If the process needs it again, it can be brought back into memory from the image or data file. However, if the page has been modified, the operating system must preserve its contents so that they can be accessed later. Such a page is known as a dirty page; when dirty pages are removed from memory they are saved in a special file called the swap file. Access to the swap file is very slow relative to the speed of the processor and physical memory, so the operating system must balance the need to write dirty pages to disk against the need to keep them in memory for reuse.

The algorithm used to decide which pages to discard or swap out is crucial to efficiency. If it is not efficient, a condition known as thrashing occurs: pages are constantly written to disk and read back, leaving the operating system too busy to do any real work. Using Figure 3.1 as an example, if physical page frame number 1 were being regularly accessed, it would be a poor candidate for swapping out to the hard disk. The set of pages that a process is currently using is called its working set, and an efficient swap scheme should keep every process's working set in physical memory.

Linux uses a least recently used (LRU) page-ageing technique to choose fairly which pages to remove from the system. In this scheme every page in the system has an age that changes as the page is accessed.
The more a page is accessed, the younger it becomes; the less it is accessed, the older it becomes. Old pages are the best candidates for swapping out.

3.1.3 Shared Virtual Memory

Virtual memory makes it easy for processes to share memory, since all memory accesses go through each process's page table. For two processes to share a physical page, each of their page tables simply needs an entry pointing to the same physical page frame number. In Figure 3.1, two processes share physical page frame number 4. For process X the corresponding virtual page frame number is 4, whereas for process Y it is 6. This illustrates an interesting point: processes sharing a physical page may map it at different locations in their own virtual memory.

3.1.4 Physical and Virtual Addressing Modes

It would not make much sense for the operating system itself to run in virtual memory: it would be a nightmare if the operating system had to maintain page tables for itself. Most general-purpose processors therefore support both a physical addressing mode and a virtual addressing mode. In physical addressing mode no page tables are involved, and the processor performs no address translation. The Linux kernel runs directly in the physical address space.

The Alpha AXP processor has no special physical addressing mode. Instead, it divides the memory space into several areas and designates two of them as physically mapped addresses. The kernel's address space, known as the KSEG address space, occupies the addresses from 0xfffffc0000000000 upwards. To execute code in, or access data in, KSEG, the code must be running in kernel mode. The Linux kernel on Alpha executes from address 0xfffffc0000310000.

3.1.5 Access Control

The page table entries also contain access control information.
Since the processor is already using the page table entry to map a virtual address to a physical one, the access control information held there can easily be used to check that the process is not accessing memory in a way it should not.

There are many reasons to restrict access to areas of memory. Some memory, such as that containing executable code, is naturally read-only, and the operating system should never allow a process to write over its code. By contrast, pages containing data should be writable, but attempts to execute that data as instructions should fail. Most processors also have at least two modes of execution: kernel mode and user mode. You would not want kernel code to be executed by a user, nor kernel data structures to be modifiable except when the processor is running in kernel mode.

Figure 3.2: Alpha AXP page table entry

The access control information is held in the page table entry and is processor-specific; Figure 3.2 shows the PTE (Page Table Entry) of the Alpha AXP processor.
The bit fields have the following meanings:

V: valid. If set, this PTE is valid.
FOE: fault on execute. Whenever an attempt is made to execute instructions in this page, the processor reports a page fault and passes control to the operating system.
FOW: fault on write. As above, but the page fault occurs on an attempt to write to this page.
FOR: fault on read. As above, but the page fault occurs on an attempt to read from this page.
ASM: address space match. Used by the operating system when it wants to clear only some of the entries in the translation buffer.
KRE: code running in kernel mode can read this page.
URE: code running in user mode can read this page.
GH: granularity hint, used when mapping an entire block with a single translation buffer entry rather than many.
KWE: code running in kernel mode can write this page.
UWE: code running in user mode can write this page.
Page frame number: for a PTE with the V bit set, this field contains the physical page frame number for this PTE. For an invalid PTE, if this field is not zero, it contains information about where the page is in the swap file.

The following two bits are defined and used by Linux:
_PAGE_DIRTY: if set, the page needs to be written out to the swap file.
_PAGE_ACCESSED: used by Linux to mark the page as having been accessed.

3.2 Caches

If you implemented a system using the theoretical model described above, it would work, but not particularly efficiently. Both operating system and processor designers try hard to extract more performance from the system. Apart from making processors and memories faster, the best approach is to maintain caches of useful information and data that make some operations faster. Linux uses a number of memory-management-related caches:

Buffer cache. The buffer cache contains data buffers that are used by the block device drivers. These buffers are of fixed size (for example, 512 bytes) and contain blocks of information that have either been read from a block device or are being written to it.
A block device is one that can only be accessed by reading and writing fixed-size blocks; all hard disks are block devices. The buffer cache is indexed via the device identifier and the block number, which makes finding a block of data quick. Block devices are only ever accessed via the buffer cache: if data can be found in the buffer cache, it need not be read from the physical block device (such as a hard disk), so access is much faster.

Page cache. The page cache is used to speed up access to executable images and data files on disk. It caches the contents of a file one page at a time; pages are read from disk into the page cache.

Swap cache. Only modified (dirty) pages are saved in the swap file. So long as a page is not modified after it has been written to the swap file, then the next time the page is swapped out there is no need to write it again: it can simply be discarded. In a system where swapping occurs frequently, the swap cache saves many unnecessary and costly disk operations.

Hardware caches. One commonly implemented hardware cache is in the processor: a cache of page table entries. The processor does not always read the page table directly to translate a page; instead it caches translations as it needs them. These caches are called Translation Look-aside Buffers (TLBs), and they hold cached copies of page table entries from one or more processes in the system. When a reference to a virtual address is made, the processor tries to find a matching TLB entry. If it finds one, it can translate the virtual address into a physical one straight away and operate on the data. If it cannot find one, it asks the operating system for help: the processor signals a TLB miss to the operating system using a system-specific mechanism, and the operating system generates a new TLB entry for the address. When the exception has been cleared, the processor makes another attempt to translate the virtual address.
This time the translation succeeds, because there is now a valid entry in the TLB for that address.

A drawback of caches, hardware or otherwise, is that Linux must spend more time and space maintaining them, and if the caches become corrupted the system will crash.
3.3 Page Tables

Figure 3.3: Linux's three-level page tables

Linux assumes that every processor has three levels of page tables. Each page table is accessed via the page frame numbers held in the page table above it. Figure 3.3 shows how a virtual address is broken into a number of fields, each field providing an offset into a particular page table. To translate a virtual address into a physical one, the processor takes each of these fields in turn. This happens three times, until the physical page frame number corresponding to the virtual address is found. The final field in the virtual address is then used to find the data within the page.

To be platform independent, Linux provides a series of translation macros that let the kernel access the page tables of a particular process. This way, the kernel code does not need to know the format of the page table entries or how they are arranged. This strategy has been so successful that Linux uses the same page table manipulation code on the Alpha processor, which has three levels of page tables, and on Intel x86 processors, which have two.

3.4 Page Allocation and Deallocation

There are many demands on the physical pages in the system. For example, when an executable image is loaded into memory, the operating system must allocate pages for it; those pages must be freed when the image has finished executing and is unloaded. Another use for physical pages is to hold kernel data structures such as the page tables themselves. The data structures and mechanisms used for page allocation and deallocation are perhaps the most critical to the efficiency of the virtual memory subsystem.

All of the physical pages in the system are described by the mem_map list of mem_map_t structures, which is initialized at boot time. Each mem_map_t describes a single physical page. The fields important to memory management are:

count: a count of the number of users of this page. The count is greater than 1 when the page is shared between several processes.
age: the age of the page, used to decide whether the page is a good candidate for discarding or swapping out.
map_nr: the physical page frame number that this mem_map_t describes.

The free_area array is used by the page allocation code to find and free pages; the whole buffer-management scheme is supported by this mechanism, and the code is independent of the page size and the physical paging mechanisms used by the processor.

Each element of free_area contains information about blocks of pages. The first element in the array describes single pages, the next describes blocks of 2 pages, the next blocks of 4 pages, and so on upwards in powers of two. The list field is the head of a queue of page data structures in the mem_map array; all of the free pages of that block size are queued here. The map field is a pointer to a bitmap that tracks the allocation of page groups of this size: bit N of the bitmap is set if the Nth block of pages is free.

Figure 3.4 shows the free_area structure. Element 0 has one free page (page frame number 0), and element 2 has two free blocks of 4 pages, the first starting at page frame number 4 and the second at page frame number 56.

3.4.1 Page Allocation

Linux uses the buddy algorithm to allocate and deallocate blocks of pages efficiently. The page allocation code attempts to allocate a block of one or more physical pages. Pages are allocated in blocks that are powers of two in size: it can allocate 1 page, 2 pages, 4 pages, and so on. So long as there are enough free pages in the system to satisfy the request (nr_free_pages > min_free_pages), the allocation code searches free_area for a block of pages of the requested size. Each element of free_area has a bitmap describing the allocated and free blocks of pages of that size. For example, element 2 of the array has a memory map describing free and allocated blocks, each four pages long.
The allocation algorithm first searches for blocks of pages of the requested size, following the chain of free pages queued on the list element of the free_area data structure. If no blocks of the requested size are free, it searches for blocks of the next size up (twice the requested size). This process continues until either all of free_area has been searched or a suitable block of pages has been found. If the block found is larger than the requested block, it is broken down until it matches the request size. Because each block size is a power of two, this splitting is simple: the block is halved. The free halves are queued onto the appropriate lists, and the remaining block of pages is allocated to the caller.
Figure 3.4: The free_area data structure

In Figure 3.4, when a request for a block of 2 pages is made, the first 4-page block (starting at page frame number 4) is broken into two 2-page blocks. The first, starting at page frame number 4, is returned to the caller; the second, starting at page frame number 6, is queued as a free block of 2 pages onto element 1 of the free_area array.

3.4.2 Page Deallocation

Allocating blocks of pages tends to fragment memory, breaking larger blocks of free pages into smaller ones. The page deallocation code recombines pages into larger blocks of free pages whenever it can; in fact, the fixed page block sizes are what make this recombination easy.

Whenever a block of pages is freed, the code checks whether an adjacent, or buddy, block of the same size is also free. If it is, the two are combined to form a new free block of twice the size. Each time two blocks are combined, the code checks whether the result can in turn be combined into a still larger block. In the best case, the system's free page blocks would be as large as the largest allocation allowed.

In Figure 3.4, if page frame number 1 were freed, it would be combined with the already free page frame number 0 and queued onto element 1 of free_area as a free block of 2 pages.

3.5 Memory Mapping

When an image is executed, the contents of the executable image must be brought into the process's virtual address space. The same is true of any shared libraries that the image has been linked to use. The executable file is not actually brought into physical memory; instead it is merely linked into the process's virtual memory. Then, as the parts of the program are referenced by the running application, the image is brought into memory from disk. This linking of an image into a process's virtual address space is known as memory mapping.

Figure 3.5: Areas of virtual memory

Every process's virtual memory is represented by an mm_struct.
It contains information about the image the process is currently executing (for example, bash) and pointers to a number of vm_area_struct data structures. Each vm_area_struct describes the start and end of an area of virtual memory, the process's access rights to that memory, and a set of operations for that area. These operations are the routines that Linux must use when manipulating this area of virtual memory. For example, one of them handles the case where the process attempts to access memory that is not currently in physical memory (via a page fault); this operation is called nopage, and it is used when Linux demand-pages the pages of an executable image into memory.

When an executable image is mapped into a process's virtual address space, a set of vm_area_struct data structures is generated, each representing a part of the executable image: the executable code, initialized data (variables), uninitialized data, and so on. Linux supports a number of standard virtual memory operations, and as the vm_area_struct data structures are created, the correct set of virtual memory operations is associated with them.

3.6 Demand Paging

Once an executable image has been memory-mapped into a process's virtual address space, it can start to execute. As only the very start of the image is physically in memory, the process will soon access an area of virtual memory that is not yet in physical memory. When a process accesses a virtual address that does not have a valid page table entry, the processor reports a page fault to Linux, describing the virtual address at which the fault occurred and the type of memory access that caused it.

Linux must find the vm_area_struct that represents the area of memory in which the fault occurred. Since searching through the vm_area_struct data structures is critical to the efficient handling of page faults, they are linked together in an AVL (Adelson-Velskii and Landis) tree structure.
If there is no vm_area_struct corresponding to the faulting virtual address, the process has accessed an illegal virtual address. Linux then sends the process a SIGSEGV signal; if the process has no handler for that signal, it is terminated.

If a matching area is found, Linux next checks the type of access that caused the fault against the access allowed for this area of virtual memory. If the process accessed the memory in an illegal way, say by writing to an area it is only allowed to read, a memory error signal is raised.

If Linux decides that the page fault is legal, it must be handled. Linux must first distinguish between pages that are in the swap file and those that are part of an executable image on disk.
On Alpha AXP, the page table entry may have its valid bit clear but a non-zero PFN field; in this case the PFN field indicates where the page is in the swap file. How pages in the swap file are handled is discussed in the next chapter.

Not all vm_area_struct data structures have a full set of virtual memory operations, and some do not even have a nopage operation. In that case Linux fixes up the access by allocating a new physical page and creating a valid page table entry for it. If this area of memory does have a nopage operation, Linux calls it. The generic Linux nopage operation is used for memory-mapped executable images, and it uses the page cache to bring the required page into physical memory.

However the required page is brought into physical memory, the process's page tables must be updated. Updating these entries may require hardware-specific actions, particularly if the processor uses TLBs. With the page fault handled, the process is restarted at the instruction that made the faulting virtual memory access.

3.7 The Linux Page Cache

Figure 3.6: The Linux page cache

Linux uses the page cache to speed up access to files on disk. Memory-mapped files are read a page at a time, and these pages are stored in the page cache. Figure 3.6 shows that the page cache consists of page_hash_table, an array of pointers to mem_map_t data structures. Each file in Linux is identified by a VFS inode data structure (described in the chapter on file systems), and each VFS inode is unique: it describes one and only one file. The index into the page hash table is derived from the file's VFS inode and the offset into the file.

Whenever a page is read from a memory-mapped file, for example when it needs to be brought back into memory during demand paging, it is read through the page cache.
If the page is present in the cache, the page fault handler is returned a pointer to the mem_map_t data structure; otherwise the page is read into memory from the file system that holds the image, and a physical page is allocated for it.

The page cache grows as images are read and executed. Pages are removed from the cache when they are no longer needed, that is, when they are no longer in use by any process.

3.8 Swapping Out and Discarding Pages

When physical memory becomes scarce, the Linux memory management subsystem must attempt to free physical pages. This task falls to the kernel swap daemon (kswapd). The kernel swap daemon is a special kind of kernel thread: a process without any virtual memory that runs in kernel mode in the physical address space. Its name is slightly misleading, since it does rather more than merely swap pages out to the system's swap file. Its goal is to make sure that there are enough free pages in the system to keep the memory management system operating efficiently.

The kernel swap daemon is started by the kernel init process at system startup and sits waiting for the kernel swap timer to expire. Every time the timer expires, the swap daemon looks to see whether the number of free pages in the system is getting too low. It uses two variables, free_pages_high and free_pages_low, to decide whether to free some pages. So long as the number of free pages in the system is greater than free_pages_high, the kernel swap daemon does nothing; it sleeps again until its timer next expires. For this check, the swap daemon also takes into account the number of pages currently being written out to the swap file; it keeps a count of these in nr_async_pages, which is incremented each time a page is queued for writing to the swap file and decremented when the write completes.
If the number of free pages in the system has fallen below free_pages_high, or worse, below free_pages_low, the kernel swap daemon will try three ways to reduce the number of physical pages in use: reducing the size of the buffer and page caches, swapping out System V shared memory pages, and swapping out or discarding pages. If the number of free pages in the system has fallen below free_pages_low, the kernel swap daemon tries to free 6 pages before it next runs; otherwise it tries to free 3.
The three methods above are tried in turn until enough pages have been freed. The kernel swap daemon remembers which method it was using the last time it tried to free physical pages; each time it runs, it first tries the method that last succeeded. Once enough pages have been freed, the swap daemon sleeps again until its timer next expires. If the reason it freed pages was that the number of free pages in the system had fallen below free_pages_low, it sleeps for only half its usual time. Once the number of free pages rises above free_pages_low, the swap daemon's sleep interval lengthens again.

3.8.1 Reducing the Size of the Page and Buffer Caches

Pages held in the page cache and buffer cache are good candidates for being freed into the free_area vector. The page cache, which holds pages of memory-mapped files, may contain pages that are no longer needed and are simply filling up the system's memory. Likewise the buffer cache, which holds buffers read from and written to physical devices, may contain unneeded buffers. When physical pages in the system are running out, discarding pages from these caches is relatively easy: unlike swapping pages out of memory, it requires no writing to physical devices. Apart from making access to physical devices and memory-mapped files somewhat slower, discarding these pages has few harmful side effects, and if it is done fairly, all processes suffer equally.

Every time the kernel swap daemon tries to shrink these caches, it examines a block of pages in the mem_map page vector to see whether any can be discarded from physical memory. The block of pages examined is larger when the kernel swap daemon is swapping intensively, that is, when the number of free pages in the system has fallen dangerously low. The blocks are examined cyclically: a different block of pages is examined each time an attempt is made to shrink the memory map.
This is the well-known clock algorithm: rather like the hand of a clock, the whole mem_map page vector is examined a few pages at a time. Each page examined is checked to see whether it is cached in the page cache or the buffer cache. Note that shared pages are not considered for discarding at this point, and that a page cannot be in both caches at the same time. If the page is in neither cache, the next page in the mem_map page vector is examined.

Pages are cached in the buffer cache (or rather, the buffers within the pages are cached) to make buffer allocation and deallocation more efficient. The memory-shrinking code tries to free the buffers contained in the page being examined; if all of its buffers are freed, the page itself is freed too. If the examined page is in the Linux page cache, it is removed from the page cache and freed. If enough pages have been freed, the kernel swap daemon waits until it is next woken. Because none of the freed pages was part of any process's virtual memory (they were cached pages), no page tables need updating. If not enough cached pages could be discarded, the swap daemon will try to swap out some shared pages.

3.8.2 Swapping Out System V Shared Memory Pages

System V shared memory is a mechanism that lets processes communicate with each other by sharing virtual memory; how processes share memory in this way is discussed in detail in the IPC chapter. For now it is enough to say that each area of System V shared memory is described by a shmid_ds data structure. This contains a pointer to a list of vm_area_struct data structures, one for each process sharing the area of virtual memory; they are linked together by their vm_next_shared and vm_prev_shared pointers. Each shmid_ds data structure also holds a list of page table entries, each describing the mapping between a shared virtual page and its physical page.
The kernel swap daemon also uses a clock algorithm when swapping out System V shared memory pages. Each time it runs, it remembers which page of which shared virtual memory area it last swapped out. It does this by keeping two indices: one into the set of shmid_ds data structures, the other into the list of page table entries for that area of System V shared memory. This ensures that the areas of System V shared memory are victimized fairly.
Because the physical page frame number for a given virtual page of System V shared memory is held in the page tables of every process sharing the area, the kernel swap daemon must modify all of those page tables to show that the page is no longer in memory but is now held in the swap file. For each shared page it swaps out, the kernel swap daemon finds the page table entry in each sharing process's page tables (by following a pointer from each vm_area_struct data structure). If a process's page table entry for the page is valid, the daemon converts it into an invalid, swapped-out entry and decrements the shared page's count of users by one. The format of a swapped-out System V shared page table entry contains an index into the set of shmid_ds data structures and an index into the page table entries for that area of System V shared memory.

If the page's user count reaches zero once the page tables of all the sharing processes have been modified, the shared page can be written out to the swap file. The page table entry in the list pointed at by the shmid_ds data structure for this area is then also replaced by a swapped-out page table entry. A swapped-out page table entry is invalid, but it contains an index into the set of open swap files and the offset within that file where the swapped-out page can be found. This information is used when the page has to be brought back into physical memory.

3.8.3 Swapping Out and Discarding Pages

The swap daemon looks at each process in the system in turn to see whether it is a good candidate for swapping. Good candidates are processes that can be swapped (some cannot) and that have one or more pages which can be swapped or discarded from memory. Pages are swapped from physical memory into the system's swap files only if the data in them cannot be retrieved in any other way. Much of the content of an executable image comes from the image's file and can easily be re-read from it.
For example, the executable instructions of an image are never modified by the image itself, so they are never written to the swap file. Such pages can simply be discarded; when the process references them again, they are read back into memory from the executable image file.

Once the process to swap from has been chosen, the swap daemon looks through all of its virtual memory regions for areas that are neither shared nor locked. Linux does not swap out all of the swappable pages of the selected process; it removes only a small number of pages. Pages that are locked in memory can be neither swapped nor discarded.

The Linux swap algorithm uses page aging. Each page has a counter, held in its mem_map_t data structure, that tells the kernel swap daemon whether the page is worth swapping. Pages age when they are unused and are rejuvenated when accessed; the swap daemon swaps out only old pages. By default, a page is given an initial age of 3 when it is first allocated; each time it is referenced its age increases by 3, up to a maximum of 20. Each time the kernel swap daemon runs, it ages pages by decrementing their age by 1. These default values can be changed, and for that reason they (and other swap-related information) are stored in the swap_control data structure.

If a page is old (age = 0), the swap daemon processes it further. Dirty pages are pages that may have to be swapped out; Linux uses an architecture-specific bit in the PTE to describe such pages (see Figure 3.2). However, not every dirty page is necessarily written to the swap file. Each virtual memory region of a process may have its own swap operation (pointed at by the vm_ops pointer in its vm_area_struct); if it does, that method is used. Otherwise, the swap daemon allocates a page in the swap file and writes the page out to that device.
The page's page table entry is replaced by one marked as invalid but containing information about where the page now lives: which swap file is being used and an offset giving the page's location within it. Whichever swap method is used, the original physical page is marked free and put back into the free_area. Clean (that is, not dirty) pages can simply be discarded and returned to the free_area for reuse.

If enough of the swappable process's pages have been swapped out or discarded, the swap daemon sleeps again. The next time it wakes, it considers the next process in the system. In this way, the swap daemon nibbles away at each process's physical pages until the system is back in balance. This is much fairer than swapping out entire processes.