Memory (ZT)

xiaoxiao2021-03-06 53

RAM

table of Contents

RAM

Memory Manage Subsystem Guide to Initialize the Borrowing User Process Code of Core Core Directory for the Borrowing User Process Directory of the User Process Directory Memory.c

copy_page clear_page_tables oom free_page_tables new_page_tables copy_one_pte copy_pte_range copy_pmd_range copy_page_range free_pte forget_pte zap_pte_range zap_pmd_range zap_page_range zeromap_pte_range and other remap_pte_range and other put_dirty_page handle_mm_fault mmap.c partner (buddy) algorithm page directory handle macro MM author's articles

RAM

The memory management system is the most important part of the operating system because the physical memory of the system is always less than the number of memory required by the system. Virtual memory is to overcome this strategy used in this contradiction. The virtual memory of the system makes the system more than the actual memory capacity by sharing memory between the various processes.

Virtual memory can provide the following features:

* Vast address space. The virtual memory of the system can be much larger than the actual memory of the system.

* Protection of the process. Each process in the system has its own virtual address space. These virtual address spaces are completely separated, such a process run does not affect other processes. Also, the virtual memory mechanism on the hardware is protected, and the memory cannot be written, which prevents the lost application to overwrite the code data.

* Memory mapping. Memory mapping is used to map files to the address space of the process. In the memory map, the contents of the file are directly connected to the virtual address space of the process.

* Fair physical memory allocation. The memory management system allows each running process in the system to get the physical memory of the system fairly.

* Share virtual memory. Although the virtual memory allows the process to have its own separate virtual address space, it is sometimes desirable to share memory.

Linux only uses four segments

Two representatives (CODE and DATA / Stack) are kernel space from [0xc000 0000] (3 GB) to [0xffffffffffff] (4 GB) is two representatives (Code and Data / Stack) are user space from [0] (0 GB) to [0xBFFFFFF] (3 GB)

__ 4 GB ---> | | | | KERNEL | | Nuclear space (Code Data / Stack) | | __ | 3 GB ---> | ---------------- | | | | | | | | | TASKS | | User space (code data / stack) | | | 1 GB ---> | | | | | | | | ________________ | __ | 0x00000000 kernel / user linear address

Linux can use a 3-layer page mapping, such as on the advanced I64 server, but only 2 in the i386 architecture has actual significance:

-------------------------------------------------- ---------------- Linear address -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------- / ___ / / ___ / / _____ / pd offset PF offset Frame Offset [10 bits] [10 bits] [12 bits] | | | | | ----------- | | | Value | ---------- | ------ | | | | --------- | / | / | | | | | | | | | | | | | | | | | | | | | - --------- | <------ | | | | | | | | | | | | | PF Offset | _________ | ------- | | | / | / | | | PD Offset | _________ | ----- | | | | | | | | | | | | | | | | | / | / | | | - ------> | _________ | Physical Address | / | / | | ------> | | | || _____ | | ....... | | ....... | | | | |

Page Table Table Table

Linux i386 pagination

Note that the kernel (only kernel) linear space is equal to the kernel physical space, so as follows:

____________________ | Other kernel data | ___ | | | | ---------------- | | | | ____ | Truth | 3 GB --- > | ---------------- | / | Kernel Data | | | | / | | | __ | _ / _ / ____ | __R | | Tasks | / | Tasks | | __ | ___ / _ / __ | | | | / / | ---------------- | | | / | Real kernel space | | ________________ | / | ________________ | logical address physical address

[Memory real-time allocation]

· Copy_mm [kernel / fork.c] · allocate_mm [kernel / fork.c] · kmem_cache_alloc [mm / slab.c] · __kmem_cache_alloc · kmem_cache_alloc_one · alloc_new_slab · kmem_cache_grow · kmem_getpages · __get_free_pages [mm / page_alloc.c] · alloc_pages [mm /NUMA.C] ·alloc_pages_pgdat ·__alloc_pages [mm / page_alloc.c] · rm_queue · reclaim_pages [mm / vmscan.c]

[Memory Exchange Thread KSWAPD]

| Kswapd | // initialization routines | for (;;) {// Main loop | do_try_to_free_pages | recalculate_vm_stats | refill_inactive_scan | run_task_queue | interruptible_sleep_on_timeout // we sleep for a new swap request |}

· Kswapd [mm / vmscan.c] · do_try_to_free_pages · recalculate_vm_stats [mm / swap.c] · refill_inactive_scan [mm / vmswap.c] · run_task_queue [kernel / softirq.c] · interruptible_sleep_on_timeout [kernel / sched.c] [Memory Switch System: Exception of insufficient memory]

| Page Fault Exception | Cause By All these Conditions: | A-) User Page | B-) Read or Write Access | C-) Page Not Present || -----------> | DO_PAGE_FAULT | HANDLE_MM_FAULT | PTE_ALLOC | PTE_ALLOC_ONE | __GET_FREE_PAGE = __GET_FREE_PAGE = __GET_FREE_PAGE = __GET_FREE_PAGES | Alloc_pages | Alloc_pages_pgdat | __alloc_pages | wakeup_kswapd // We Wake Up Kernel Thread Kswapd

· Do_page_fault [arch / i386 / mm / fault.c] · handle_mm_fault [mm / memory.c] · pte_alloc · pte_alloc_one [include / asm / pgalloc.h] · __get_free_page [include / linux / mm.h] · __get_free_pages [mm /PAGE_ALLOC.C] ·alloc_pages [mm / Numa.c] · Alloc_pages_pgdat · __alloc_pages · wakeup_kswapd [mm / vmscan.c]

[table of Contents]

Memory Management Subsystem Guide from Aka

My goal is 'Guide', providing the overall concept of the Linux memory management subsystem, and gives a further in-depth study (including code organization, file, and main functions and some reference documents). This way, because I am in the process of reading the code, I feel that "reading a code is easy, and I am very impossible to grasp the overall thinking." Moreover, when I write some kernel code, I feel that in many cases, I don't have to understand all kernel code very specific, and it is often understood that its interface and overall work are enough. Of course, my personal ability is limited, and time is not enough. Many things are also committed to lectures in the lecture pressure :), the content is inevitable and even mistaken, welcome everyone to correct.

Storage hierarchy and x86 storage management hardware (MMU)

It is assumed here to have a certain understanding of virtual storage and segment pages. It mainly emphasizes some conceptual or easy to misunderstand the concept.

Storage level

Cache -> Main Memory ---> Disk (Disk)

Understand the root cause of the storage hierarchy: the gap between the CPU speed and the memory speed.

The reason for the hierarchical structure: topical principles.

Linux tasks:

Reduce Footprint, improve the cache hit rate, and take advantage of topics.

Realize virtual storage to meet the needs of the process, effectively manage memory allocation, and strive to use limited resources.

Reference documentation:

"Too Little, Too Small" by Rik Van Riel, Nov. 27, 2000.

And all architectural materials :)

The role of MMU

The auxiliary operating system performs memory management, providing hardware support such as virtual real address transition.

X86 address

Logical address: The address used to make an operand is used in the machine instruction. Segment: shift

Linear address: After the logical address is processed by the segment unit, it is a linear address, which is a 32-bit unsigned integer, which can be used to locate 4G storage units.

Physical address: The linear address is obtained after the page table lookup, which will be sent to the address bus to indicate the physical memory unit to be accessed.

Linux: Try to avoid using segment function to increase portability. If the base address is 0, the logical address == linear address is made.

X86 segment

Segment in protective mode: Selecton descriptor. More than just a base address is to provide more information: protection, length limit, type, etc. The descriptor is placed in a table (GDT or LDT), and the selector can be considered to be the index of the table. Segment registers are stored in the selection, while the segment register is loaded, the data in the descriptor is loaded into an invisible register for quick access. (Figure) P40

Special Register: GDTR (including the first address of the global description of the schedule), LDTR (segment of the current process describes the status address), TSR (point to the task status section of the current process)

The paragraph used by Linux:

__Kernel_cs: Nuclear code segment. Range 0-4g. Readable, execute. DPL = 0.

__Kernel_ds: Nuclear code segment. Range 0-4g. Readable, write. DPL = 0.

__User_cs: Nuclear code segment. Range 0-4g. Readable, execute. DPL = 3.

__User_ds: The kernel code segment. Range 0-4g. Readable, write. DPL = 3.

TSS (task status segment): The hardware context of the storage process is used when the process is switched. (Because x86 hardware has a certain support for TSS, all of which has this special segment and corresponding dedicated register.)

DEFAULT_LDT: Theoretically use many segments at the same time, these segments can be stored in their own LDT segments, but actual Linux rarely uses the X86's features, all processes share this segment, it only contains an empty Descriptor.

There are also some special segments in the power management and other codes.

(Before 2.2, the LDT and TSS segments of each process exist in GDT, and GDT can only have 8192 items, so the total number of processes in the entire system is limited to around 4090.2.4 no longer exist in GDT Cancel this limit.)

__User_cs and __user_ds segments are shared by all processes in user states. Note Don't confuse this sharing and process space: Although you use the same paragraph, by using different pages tables, the process space is still independent.

X86 page mechanism

X86 Hardware supports two-level page table, and the models above Pentium PRO also support Physical Address Extension Mode and Levels. The so-called hardware support includes some special registers (CR0-CR4), and the CPU recognizes some of the flags in the page entry and reacts according to access. If you read the page of the Present bit 0 or the page written to the Read / Write bit 0 will cause the CPU to issue the Page Fault exception, and automatically set the Accessed bit after accessing the page.

Linux uses a three-level page table model that is unrelated to the architecture (as shown), using a series of macros to cover details of various platforms. For example, by treating the PMD as only one table and stores in the PGD entry (the first address of the PMD table in the normal PGD table), the intermediate directory (PMD) of the page table is clever 'folded' The global directory (PGD) to the page table is adapted to the secondary page table hardware. TLB

TLB is full of TLBs is Translation Look-Aside Buffer, which is used to speed up page table lookup. The key point here is that if the operating system changes the page table content, it must refresh the TLB accordingly to make the CPU incomplete entries.

Cache

Cache is basically transparent to programmers, but different use methods can lead to large different performance. Linux has many key places to optimize the code, many of which are to reduce unnecessary pollution to cache. If you put only the code used in the event of only the error in .Fixup Section, you will centrally use the data frequently used to a Cache line (such as struct task_struct), reduce the Footprint of some functions, SLAB Coloring, etc. in the SLAB dispenser.

In addition, we must also know when cache is invalid: New MAP / Remap one page to an address, page out, page protection change, process switching, etc., that is, when Cache corresponds to the content or meaning of the address of the Cache Time. Of course, there is no need to invalidate throughout Cache in many cases, just if you need an address or address range. In fact,

Intel is doing very well in this regard, and Cache consistency is completely maintained by hardware.

For more information on x86 processors, please refer to its manual: Volume 3: Architecture and Programmingmanual

8. Linux related implementation

This part of the code and architecture are closely related, so most of the Arch subdirectory and a large number of macro definitions and inline functions exist in the header file. Take the i386 platform as an example, the main documents include:

Page.h

Page size, page mask definition. Page_size, Page_shift and Page_Mask.

Operation of the page, such as clear page clear_page, copy page COPY_PAGE, page pair Page_Align

There is also the starting point of the internal nuclear deficiency address: the famous Page_offset :) and the macro __pa and __va in the related kernel ..

Virt_to_page gets the description structure of this page from a kernel deficiency address. We know that all physical memory is described by a MEMMAP array. This macro is the location of the physical page calculating a given address in this array. In addition, this file also defines a simple macro check that a page is not legal: valid_page (page). If Page is too far from the beginning of the MEMMAP array to exceed the distance that the maximum physical page should have, it is not legal.

The more strange thing is that the definition of the page item is also placed here. PGD_T, PMD_T, PTE_T and Macro XXX_VAL for accessing them

PGTABLE.H PGTABLE-2LEVEL.H PGTABLE-3LEVEL.H

As the name suggests, these files are to process the page table, which provides a series of macro action pages. PGTABLE-2LEVEL.H and PGTABLE-2LEVEL.H respectively correspond to the needs of the X86 secondary and third-level page tables, respectively. First of all, of course, how many of the definitions of each page table is different. And in PAE mode, the address exceeds 32 bits, and the page entry PTE_T is indicated by 64 (PMD_T, PGD_T does not need to be changed), and some operations for the entire page entry are different. There are several categories:

· [PTE / PMD / PGD] _ERROR is printed when printing items, 64-bit and 32 positions are of course different. · Set_ [PTE / PMD / PGD] Settings Table Item Value • PTE_SAME Comparison PTE_page The MEMMAP position and PTE_none are empty from PTE. · __Mk_pte Constructs the macro of PtepgTable.h, no longer explained one by one. In fact, it is also intuitive, usually it can be seen from the name. The parameter of the PTE_xxx macro is PTE_T, and the parameters of PTEP_xxx are PTE_T *. 2.4 Kernel has made some efforts in Clean Up of code, and many local vague names become clear, and some functions of some functions are better.

In addition to the macro of the page table, PGTable.h is also reasonable because they are often used in the page table operation. The TLB operation here is started in __, that is, the internal use, the real external interface is in pgalloc.h (which may be because in the SMP version, the TLB refresh function and the stand-alone version are large, some It is no longer embedded function and macro).

Pgalloc.h

Including the assignment and release macro / function of the page entry, it is worth noting that the use of the key cache:

PGD / PMD / PTE_QUICKLIST

There are many localities in the kernel to use similar techniques to reduce the call to the memory allocation function, accelerate frequently used allocation. Such as Buffer_Head and Buffers in Buffer Cache, the most recently used area in the VM area.

There is also the TLB refresh interface mentioned above.

segment.h

Definition __kernel_cs [ds] __user_cs [ds]

reference:

The second chapter of "Understanding the Linux Kernel" gives a brief description of the implementation of Linux,

Physical memory management.

2.4 Memory management has a large change. Zone Based Buddy Systems is implemented on physical page management. The zone is divided according to the different types of use of memory. The memory of different regions uses a separate partner system (Buddy system) and is independently monitored.

(In fact, there is a higher layer and NUMA support. Numa (None Unified Memory Access) is an architecture in which different memory areas may have different access times for each processor in the system. It is determined by the distance from memory and processor). And in general machines are called DRAM, ie dynamic random access memory, which is the same for each unit, and the CPU is the same. Numa's access speed is the same memory area For a Node, the main task that supports this structure is to minimize communication between Node, so that the data to be used for each processor is as fast as possible in the fastest node. 2.4 kernel Node & # 0; the corresponding data structure is pg_data_t, each Node has its own MEMMAP array, divides its memory into several zone, each zone reuses the independent partner system management physical page. Numa has a lot of problems to deal with, Not much perfect, there is not much to say)

Design of the area-based partner system design & # 0; management of physical page

The two major issues of memory allocation are: allocation efficiency, fragmentation problem. A good dispenser should be able to quickly meet the allocation requirements of various sizes, and cannot produce a lot of debris waste space. Partner system is a commonly used ratio algorithm. (Explanation: Todo)

The concept of the introduction area is to distinguish between different use types (methods?) To make more effectively use them.

2.4 There are three districts: DMA, NORMAL, and HIGHMEM. The first two actually managed by independent buddy system, but there is no clear zone concept in 2.2. The DMA zone is usually less than 16 megaby physical memory in the X86 architecture because the DMA controller can only use this paragraph of memory. HIGHMEM is a high-end memory that exceeds a certain value (usually about 900m). The other is Normal area memory. Due to Linux implementation, the high address memory cannot be used directly by the kernel. If the config_highmem option is selected, the kernel will use a special approach to use them. (Explanation: Todo). HIGHMEM is only used for page cache and user processes. After this is separated, we will be more targeted to use memory, and without having to use a large amount of memory available to DMA to use a user process that causes the driver to not get enough DMA memory. In addition, each zone independently monitors the usage of memory in this area, and the system will determine which region allocation comparison is compared, considering the requirements of the user and system status. 2.4 Mills the page may interact with the high-level VM code (allocation according to the case of idle pages, the kernel may allocate the page from the partner system, or directly to the assigned page to recover & # 0; reclaim, etc.), code ratio 2.2 Complex a lot, comprehensively understand it is familiar with the mechanism of the entire VM work. The main interface of the entire dispenser is the following functions (mm.h page_alloc.c):

Struct Page * Alloc_pages (int GFP_MASK, UNSIGNED Long ORDER) Allocate 2 ^ Order pages from the appropriate area according to the requirements of GFTP_Mask, and return a descriptor of the first page.

#define alloc_page (GFP_MASK) Alloc_pages (GFP_MASK, 0)

Unsigned long __get_free_pages ((int GFP_MASK, UNSIGNED Long ORDER) works with alloc_pages, but returns the first address.

#define __get_free_page (GFP_MASK) __GET_FREE_PAGES (GFP_MASK, 0)

Get_free_page Assigns a clear page.

__free_page (s) and free_page (s) release page (one / multiple) The former is parameter as a page descriptor, and the latter is parameter with the page address.

About the Buddy algorithm, many textbooks have a detailed description, and the sixth chapter has a good introduction to Linux. For more information about Zone Base Buddy, see "Design for a zone based memory allocator written by Rik Van Riel." This person is currently Linuxmm's maintainer, authority. This article has a little longer, written in 1998, there is no hiGhmem, but thoughts are still effective. Also, the following article analyzes 2.4 implementation code:

Http://home.earthLink.net/~jknapka/linux-mm/zonealloc.html.

SLAB - Continuous physical region management

The splitter of the single allocation page must not meet the requirements. A large number of data structures in the kernel, the size from several bytes to dozens of hundred k qi, all of the power of the 2 is completely unrealistic. 2.0 The root solution is to provide a memory area having a size of 2, 4, 8, 16, ..., and 131056 bytes. When you need a new memory area, the kernel applies the page from the Partner System, divides them into one area, takes one to meet the needs; if the memory area in a page is released, the page will return to the partner system. This is not efficient to do. There are many places to improve: Different data types are allocated with different ways to increase the efficiency. For example, it is necessary to initialize the data structure. After the release, you can temporarily save, and it will not be initialized when redistribute. The function of the kernel often repeatedly uses the same type of memory area, and the most recently released object can accelerate allocation and release. The request for memory can be classified according to the request frequency, frequently used types of special cache, which is rarely used with a general purpose cache that can be used in similar 2.0. The probability of cache collisions is large when using the power of 2 power, and it is possible to reduce cache collisions by carefully arrangeing the starting address of the memory area. Cache a certain number of objects can reduce the call to the Buddy system, saving time and reduces cache pollution caused.

2.2 The implementation of the SLAB dispenser embodies these improvement ideas.

Main data structure

interface:

KMEM_CACHE_CREATE / KMEM_CACHE_DESTORY

KMEM_CACHE_GROW / KMEM_CACHE_REAP Growth / Reduced Size Cache

KMEM_CACHE_ALLOC / KMEM_CACHE_FREE Assign / release an object from a class cache

The assignment of the kmalloc / kfree generic cache, release function.

Related code (SLAB.C).

Related reference:

Http://www.lisoleg.net/lisoleg/memory/slab.pdf: SLAB inventor's papers must read the classic.

Chapter 6, detailed description of specific implementation.

Aka's 2000 context also has a strong prawn, please visit the AKA homepage: www.aka.org.cn

Vmalloc / vfree & # 0; physical address is discontinuous, the virtual address continuous memory management

Use the Kernel page table. The file Vmalloc.c is relatively simple.

2.4 The kernel's VM (perfect ...)

Process address space management

Create, destroy.

MM_STRUCT, VM_AREA_STRUCT, MMAP / MPROTECT / MUNMAP

Page Fault Processing, Demand Page, Copy ON Write

Related documents:

MM / SWAP.C KSWAPD uses various parameters and functions for the age of operation.

MM / SWAP_FILE.C swaps the operation of the partition / file.

MM / PAGE_IO.C Read or write a switched page.

MM / SWAP_STATE.C SWAP CACHE related operations, join / delete / find a swap cache, etc.

The mm / vmscan.c scanning the VM_Area of the process, trying to replace some pages (KSWAPD).

Reclaim_page: Recycled a page from INACTIVE_CLAN_LIST, put it in Free_List

KcLaimd is replicated after waking up Reclaim_Page until each area

Zone-> free_pages> = zone-> Pages_low

Page_Lauder: Calling from __alloc_pages and try_to_free_pages. Usually because the page of FreePags Inactive_clean_List is too small. Function: Transfer the page of INACTIVE_DIRTY_LIST to INACTIVE_CLEAN_LIST, first put the page (by bdflush) that has been written back or exchange zone (by bdflush), if Freepages are indeed shortage, wake up BDFLUSH, and then write back a certain number of Dirty pages over again.

For the logic of these queues (active_list, inactive_dirty_list, inactive_clean_list), please refer to: Document: RFC: Design for New VM, you can get from Lisoleg's documentation.

Page Cache, Buffer Cache and Swap Cache

Page Cache: Cache of the file content when reading and writing files, size is a page. Not necessarily continuous on the disk.

Buffer Cache: When reading and writing disk blocks, the Cache of the disk block content corresponds to a continuous area on the disk, and a buffer cache size may be from 512 (sector size) to a page.

Swap cache: It is a subset of Page Cache. The case where the page shared by multiple processes is swapped out of the switching area.

Relationship between Page Cache and Buffer Cache

Essentially different, buffer cache buffer magnetic disk block content, Page Cache buffer file 1 page content. The temporary buffer cache will be used to write the disk when writing.

BDFLUSH: Write Dirty's buffer cache back to disk. Usually only when dirty's buffer is too much or requires more buffer and runs short-term memory. Page_Lauder may also wake up it. Kupdate: Timed operation, writing the Dirty Buffer that the write-back period has reached the disk.

2.4 Improvement: Page Cache and Buffer Cache are coupled. In 2.2, the disk file reads Page Cache, and the Buffer Cache is written directly, so the problem of synchronization: You must use the update_vm_cache () update possible Page Cache. 2.4 Page Cache has made a relatively large improvement, and the file can be written directly by Page Cache, and Page Cache is prioritized using High memory. Moreover, 2.4 introduces new objects: File Address Space, which contains methods used to read and write a full page data. These methods take into account the update of Inode, the use of Page Cache processing and temporary buffer. The synchronization problem of Page Cache and Buffer Cache is eliminated. It turns out that INODE OFFSET lookup Page Cache becomes converted via FILE Address Space Offset; the Inode member in Struct Page is replaced by the address_space type Mapping member. This improvement also makes sharing of anonymous memory be possible (this is difficult to implement in 2.2, many discussions).

The virtual memory system brings a lot of experience from FreeBSD, which has made huge adjustments for 2.2.

Document: RFC: Design for New VM is not read.

Due to time rush, many of the new VMs have never been able to figure out. First roughly, I will further improve this article in the future, and strive to clear the problem. In addition, after this semester exam, I hope to provide you with some detailed source code.

[table of Contents]

User state

User space access kernel space, specific implementation methods should be considered from two aspects, first is the user process, you need to call MmApp to map your own virtual space to physical memory allocated by kernel; then the kernel space needs to reset the user process. This virtual memory page table describes its physical address points to the corresponding physical memory. Different processing needs to be made for several different memory allocations (kmalloc, vmalloc, and ioremap) for Linux kernels.

First, Linux Memory Management Overview

Here is my understanding, mainly from the data structure.

1. Physical memory is divided into one page in order, and a PAGE structure is used per page. System all physical page Page knots

According to the configuration, it constitutes an array MEM_MAP.

2, the virtual address space of the process uses Task_struct's domain mm to describe, it is a mm_struct structure, this structure package contains points?

Pointer (PGD_T * PGD) and pointer to process virtual memory area (Struct VM_Area_Structt * MMAP)

3, the process virtual memory area has a segment structure VM_Area_Struct (referred to as VMA). All VMAs all the processes?

Tree tissue.

4. Each VMA is an object that defines a set of operations, which can be processed for different types of VMA through this set of operations.

For example, the mapping of memory allocated by Vmalloc is implemented by the NOPAGE operation.

Second, MMAP processing

When the user calls MMAP, the kernel performs the following processing:

1. First look up a VMA in the virtual space of the process; 2, map this VMA to map

3. If the device driver or file system's file_Operations define the MMAP operation, call it.

4, insert this VMA into the VMA chain of the process

The MMAP method prototype defined in File_Operations is as follows: int (* mmap) (Struct File *, Struct VM_Area_struct *);

Where file is a file structure mapped to the virtual space, VM_AREA_STRUCT is the VMA found in step 1.

Third, the shortage fault processing process

When accessing an invalid virtual address (probably a protection fault, it is possible to make a fault, etc.), will generate a page failure,?

The process of the system is as follows:

1. Find the VMA where this virtual address is located;

2, if necessary, allocate the intermediate page catalog table and page table

3. If the physical page corresponding to the page entry does not exist, call this VMA's NOPAGE method, which returns the PAAGE description structure of the physical page

(Of course this is just one of them)

4. Pack the address of the physical page to the page table for the above situation

After the page troubleshooting is completed, the system will restart an instruction that causes the fault, and then you can access it.

The following is a VMA method: struct vm_operations_struct {void (* open) (struct vm_area_struct * area); void (* close) (struct vm_area_struct * area); struct page * (* nopage) (struct vm_area_struct * area, unsigned long address, innt

WRITE_ACCESS;

Where the widow function NOPAGE address is a virtual address that causes the fault failure, and Area is the VMA it is, Write_Acccess is accessible.

Attributes.

Third, specific implementation

3.1 Map of memory allocated to Kmalloc

The memory allocated to Kmalloc, because it is a continuous physical memory, so it can simply set the monk table in the MMAP routine.

The method is to use the function REMAP_PAGE_RANGE. Its prototype is as follows:

INT Remap_page_range (unsigned long from, unsigned long pHys_addr, unsigned long size,

PgProt_t protot

Where from FROM is a virtual address starting. This function is a scope of virtual address space from FROM and FROM SIZE;

Phys_addr is the physical address that the virtual address should be mapped; size is the size of the mapping area; PROT is the protection flag?

The process of remap_page_range is a page for from to Form Size, finding the page table of it is your page table (

Create a page table if necessary, clear the old content of the page table, and re-fill it in its physical address and protected domain.

REMAP_PAGE_RANGE can process multiple consecutive physical pages. << Linux device driver >> pointed out,

REMAP_PAGE_RANGE can only give access to the physical address of the reserved page and physical memory, which is made to non-reserved page pages?

When remap_page_range, the default NOPAGE process controls the zero page at which the visited virtual address is mapped. So after the internal memory is allocated?

For the allocated memory reserved bit, it is implemented by the function MEM_MAP_RESERVE, is it to the corresponding physical page?

PG_RESERVED flag. (About this, see the topic "Question about REMAP_PAGE_RANGE" stupid? Because remap_page_range has the above limit, it can be used in another way, which is the same as the Vmalloc Assignment

The method is processed for the fault of the pages.

3.2 Map of memory allocated for Vmalloc

3.2.1, Vmalloc allocates the process of memory

(1), pretreatment and legality checking, such as pages the distribution length, check whether the distribution length is too large?

(2), call the Kmalloc assignment as GFP_kernel (GFP_kernel is used in the process context, so is it in here?

The vmalloc in the interrupt handler calls VMALLOC) describes the VM_STRUCT structure of the memory allocated by Vmalloc.

(3), add SIZE to a page length, to form a 4K isolation strip in the middle, then between Vmalloc_Start and Vmallooc_end

Mark the VMList linked list, look for a free memory interval, fill the address in the VM_STRUCT structure

(4), return this address

Vmalloc allocated physical memory is not continuous

3.2.2, page directory and page table definition

TYPEDEF STRUCT {UNSIGNED Long PTE_LOW;} PTE_T; typef struct {unsigned long pmd;} pmd_t; typedef struct {unsigned long pgd;} pgd_t; #define PTE_VAL (x) ((x) .pte_low)

3.2.3, common routines:

(1), Virt_to_phys (): The kernel virtual address is transformed into physical address #define __pa (x) (unsigned long) (x) -page_offset) Extern inline unsigned long virt_to_phys (volatile void * address) {Return__pa (address); }

The above conversion process is to minimize 3G (page_offset = 0xc000000) because the kernel space is from 3G to 3G actual memory one?

0 to actual memory mapping to physical address

(2), phys_to_virt (): kernel physical address translates into virtual address #define __va (x) ((unsigned long) (x) Page_offset) EXTERN INLINE VOID * Phys_to_virt (unsigned long address) {Return __va (address);} Virt_to_phys () and phys_to_virt () are defined in include / ASM-I386 / IO.H

(3), # define Virt_to_page (kaddr) (MEM_MAP >> Page_shift) (illegal nuclear 2.4? #Define valid_page (page) (Page - MEM_MAP)

Inside the physical page. (The virtual address of these two macros must be the kernel virtual address, such as the address returned by Kmalloc #  ?

The address returned by Vmalloc is not like this, because the Vmalloc assignment is not a continuous physical memory, the middle may have empty emaps in the middle of the MMAP assigned by Vmalloc:

The memory allocated by Vmalloc needs to be implemented by setting the corresponding VMA's NOPAGE method, when generating a fault, will call VM

The NOPAGE method, our purpose is to return a pointer to a PAGE structure in the NOPAGE method. To this end, you need to pass the following steps?

(1) PGD_OFFSET_K or PGD_OFFSET: Find the page directory where the virtual address is located, the former corresponds to the kernel space

The latter's virtual address #define pgd_offset (mm, address) ((mm) -> pgd pgd_indress) # define pgd_offset_k (address) pgd_offset (& init_mm, address) PGD_Offset (Address) PGD_OffSet (& init_mm, address) PGD_OFFSET (& init_mm, address) PGD_OFFSET () (IDle process) virtual memory mm_struct structure, all proximal core forms of all processes are the same

. When VMalloc is allocated, you have to refresh the kernel page directory table, in order to save overhead in 2.4, only the process 0 gear is changed.

Record, and other processes generate pages anomalies to update their respective kernel page directories

(2) PMD_offset: Find the intermediate page directory entry where the virtual address is located. You should use PGD_none to determine if there is a phase before looking for

The page of the item, these functions are as follows: Extern inline int PGD_none (PGD_T PGD) {RETURN 0;} EXTERN INLINE PMD_T * PMD_offset (PGD_T * DIR, UNSIGNED Long Address {Return (PMD_T *) DIR;}

(3) PTE_OFFSET: Find the page table item corresponding to the virtual address. You should also use PMD_none to determine if there is a corresponding file intermediate page.

Per recorded: #define pmd_val (x) ((x) .pmd) #define pmd_none (x) (! PMD_VAL (x)) # define __pte_offset (address) / (Address >> Page_shift) & (PTRS_PER_PTE - 1)) # DEFINE PMD_PAGE (PMD) / (PMD_VAL (PMD) & Page_Mask) # define PTE_Offset (Dir, Address) ((PTE_T *) PMD_Page (* (DIR)) / __PTE_Offset (Address))

(4) PTE_PRESENT and PTE_PAGE: Whether the former determines whether the physical address corresponding to the page table is valid, the latter takes out the page table in the physical address pair?

Page Description #define PTE_Present (x) ((x) .pte_low & (_page_present | _page_protnone)) # define PTE_Page (x) (MEM_MAP ((((unsigned long) ((x) .pte_low >> Page_shift)))))))) #define page_address (Page) ((page) -> virtual)

One of the following relationships below is not big, it is to do such a thing, just keep a memory memory during startup, then make?

IREMAP maps it to kernel virtual space, and uses remap_page_range to map to user virtual space, so that you can access this memory, "ABCD" is initialized by the kernel virtual address, and then read it with the user virtual address.

/***********MMAP_IOREMAP.C6************/#include #include # Include #include #include / * for MEM_MAP_ (UN) RESERVE * / # include / * for virt_to_phys * / # include / * for kmalloc and kfree * /

Module_Parm (Mem_Start, "I"); Module_Parm (Mem_SIZE, I ");

Static int MEM_START = 101, MEM_SIZE = 10; static char * reserve_virt_addr; static int major;

int mmapdrv_open (struct inode * inode, struct file * file); int mmapdrv_release (struct inode * inode, struct file * file); int mmapdrv_mmap (struct file * file, struct vm_area_struct * vma);

STATIC STRUCT FILE_OPERATIONS MMAPDRV_FOPS = {Owner: this_module, mmap: mmapdrv_mmap, open: mmapdrv_open, release: mmapdrv_release,};

INT init_module (void) {IF ((Major = register_chrdev (0, "mmapdrv", & mmapdrv_fops) <0) {Printk ("mmapdrv: unable to register character device / n"); return (-eio);} printk "MMap Device Major =% D / N", Major);

Printk ("High Memory Physical Address 0x% LDM / N", Virt_to_Phys (High_Memory) / 1024/1024);

reserve_virt_addr = ioremap (mem_start * 1024 * 1024, mem_size * 1024 * 1024); printk ( "reserve_virt_addr = 0x% lx / n", (unsigned long) reserve_virt_addr); if (reserve_virt_addr) {int i; for (i = 0; I

/ * REMOVE THE MODULE * / VOID CLEANUP_MODULE (VOID) {if (reserve_virt_addr) IOUNMAP (RESERVE_VIRT_ADDR);

Unregister_chrdev (Major, "Mmapdrv");

Return;}

INT mmapdrv_open (struct inode * inode, struct file * file) {mod_inc_use_count; return (0);}

INT mmapdrv_release (struct inode * inode, struct file * file) {mod_dec_use_count; return (0);

INT mmapdrv_mmap (struct file * file, struct vm_area_struct * vma) {unsigned long offset = VMA-> VM_PGOFF << Page_shift; unsigned long size = VMA-> VM_END - VMA-> VM_Start;

IF (Size> MEM_SIZE * 1024 * 1024) {Printk ("Size TOO BIG / N"); Return (-ENXIO);

OFFSET = OFFSET MEM_START * 1024 * 1024;

/ * We do not want to have this area swapped out, lock it * / vma-> vm_flags | = VM_LOCKED; if (remap_page_range (vma-> vm_start, offset, size, PAGE_SHARED)) {printk ( "remap page range failed / n "); return -enxio;}

Return (0);

The tool Mapper test results that come with the LDD2 source code are as follows:

[root @ localhost modprg] # insmod mmap_ioremap.modmap device major = 254High Memory Physical Address 0x100mreserve_virt_addr = 0xc7038000

[root @ localhost modprg] # MKNOD MMAPDRV C 254 0

[root @ localhost modprg] # ./mapper mmapdrv 0 1024 | od -ax -t x1mapped "mmapdrv" from 0 to 1024000000 61 62 63 64 63 62 63 64 61 62 63 64 61 62 63 64 * 000400

[root @ localhost modprg] #

[table of Contents]

Initialization of kernel page directory

/ * swapper_pg_dir is the main page directory, address 0x00101000 * /

>>> Core directory, number 0, 1 and 768,767 are mapped to physical memory 0-8m page directory item >>> The physical address of its page table is 0x00102000 and 0x00103000, the following PG0 and PG1 location >>> (When started, move the kernel image to 0x0010000). >>> The above 0, 1 is the same as Section 768 and 767, because 3G-3G 8M after the linear address 0-8m and the opening >> (3G 8M after the paging before the page "is mapped to the same physical address 0 -8m

/ ** This is initialized to create an identity-mapping at 0-8M (for bootup * purposes) and another mapping of the 0-8M area at virtual address * PAGE_OFFSET. * /. Org 0x1000ENTRY (swapper_pg_dir) .long 0x00102007.long 0x00103007.Fill Boot_user_pgd_ptrs-2,4,0 / * default: 766 entries * /. Long 0x00102007.long 0x00103007 / * Default: 254 entries * /. Fill boot_kernel_pgd_ptrs-2,4,0

/ ** The page table 40mb here - the final page * Tables Are set up latter depending on membrate size. * / >>> below the page table item of the physical address 0-8m >>> from 0x4000 to 0x2000 A total of 2K page entries, mapping 0-8M physical memory

.org 0x2000ntry (PG0)

.org 0x3000ntry (PG1)

/ ** EMPTY_ZERO_PAGE MUST IMMEDIATELY FOLLOW The Page Tables! (The * Initialization Loop Counts Until EMPTY_ZERO_PAGE) * /

.org 0x4000ntry (EMPTY_ZERO_PAGE)

>>> Process 0 page directory points to swapper_pg_dir # define init_mm (name) / {/ mmap: & init_mmap, / mmap_avl: null, / mmap_cache: null, / pgd: swapper_pg_dir, / mm_users: atomic_init (2), / mm_count: atomic_init (1), / map_count: 1, / mmap_sem: __RWSEM_INITIALIZER (name.mmap_sem), / page_table_lock: SPIN_LOCK_UNLOCKED, / mmlist: LIST_HEAD_INIT (name.mmlist), /} / ** paging_init () sets up the page tables - note that the first 8MB are * already mapped by head.S. ** This routines also unmaps the page at virtual kernel address 0, so * that we can trap those pesky NULL-reference errors in the kernel. * / void __init paging_init (void) {pageTable_init ();

__ASM __ ("MOVL %% ECX, %% CR3 / N": "c" (__ papper_pg_dir))));

. . . . . . . . . . . }

Static void __init pagetable_init (void) {unsigned long vaddr, end; pgd_t * pgd, * pgd_base; int i, j, k; pmd_t * PMD; PTE_T * PTE, * PTE_BASE

>>> END Virtual Space Maximum (Maximum Physical Memory 3G) / * * This Can Be Zero As Well - No Problem, In That Case We Exit * The loops Anyway Due To the PTRS_PER_ * Conditions. * / End = (unsigned Long) __ va (max_low_pfn * page_size);

PGD_BASE = swapper_pg_dir; #if config_x86_paefor (i = 0; i >> Core Start Virtual Space In Core Directory Table Index i = __pgd_offset (PAGE_OFFSET); PGD = PGD_BASE i;

>>> #DEfine PTRS_PER_PGD 1024 >>> Each for the page directory starts from 768 items (; i >> Vaddr is the kernel space mapped by the I Item. starting virtual address, PGDIR_SIZE = 4M vaddr = i * PGDIR_SIZE; if (end && (vaddr> = end)) break; #if CONFIG_X86_PAE pmd = (pmd_t *) alloc_bootmem_low_pages (PAGE_SIZE); set_pgd (pgd, __pgd (__ pa ( PMD) 0x1)); # else >>> For two-stage mapping mechanism, PMD is actually PGD PMD = (PMD_T *) PGD; #ENDIF IF (PMD! = PMD_offset (PGD, 0)) BUG (); for (j = 0; j = End)) Break; >>> If the kernel does not support Page Size Extensions IF (CPU_HAS_PSE ) {. . . . . . . . . . } >>> Allocate kernel page table PTE_BASE = PTE = (PTE_T *) Alloc_bootmem_low_pages (>>> For each page entry for (k = 0; k = End)) Break; >>> Fill the page of the page * PTE = MK_PTE_PHYS (_PA (VADDR), Page_kernel; } >>> Fill the physical address of the page into the page table in set_pmd (PMD, __PMD (PTE_BASE))); if (PTE_BASE! = PTE_Offset (PMD, 0)) bug ();

}

/ * * Fixed mappings, only the page table structure - mappings will be set by set_fixmap ():

* / VADDR = __fix_to_virt (__ end_of_fixed_addresses - 1) & pmd_mask; fixRange_init (VADDR, 0, PGD_BASE);

#if config_highmem. . . . . . . . . . . . #ENDIF

#if config_x86_pae. . . . . . . . . . . . #ndif}

[table of Contents]

Bay of the kernel page directory

When creating a kernel thread, since the kernel thread does not have user space, and all the core-based directory of all processes is the same ((in some cases, there may be no synchronization, mainly to reduce the overhead of all process core "directory of synchronization However, just in each process to access the kernel space, if there is a case where you do not synchronize, then the synchronous processing is performed, the kernel page of the created kernel thread is always borrowed the process 0's kernel page directory. >>> kernel_thread Clone_vm calls Clone system call / ** Create a kernel thread * / int kernel_thread (int (* fn) (void *), void * arg, unsigned long flags) {long RetVal, D0;

__ASM____volatile __ ("MOVL %% ESP, %% ESI / N / T" "INT $ 0X80 / N / T" / * Linux / i386 system call * / "cmpl %% ESP, %% ESI / N / T" / * Child Or Parent? * / / * Load The Argument Into Eax, And Push It. That Way, IT Does * Not Matter WHether The Called Function Is Compiled with * -mRegparm or Not. * / "MOVL% 4, %% EAX / N / T "" Pushl %% EAX / N / T "" Call *% 5 / N / T "/ * CALL FN * /" MOVL% 3,% 0 / N / T "/ * EXIT * /" INT $ 0x80 / n "" 1: / t ":" = & a "," = & s "(d0):" 0 "(__nr_clone)," I "(__nr_exit)," R "(arg)," R "(fn)," B "(Flags | Clone_VM):" Memory "); Return RetVal;

>>> SYS_CLONE-> Do_FORK-> COPY_MM: Static int Copy_mm (unsigned long clone_flags, struct task_struct * tsk) {struct mm_struct * mm, * Oldmm; int RetVal

. . . . . . . .

TSK-> mm = null; TSK-> Active_mm = NULL;

/ * Arene We CLONING A KERNEL THREAD? * * WE NEED to Steal a Active VM for That .. * / >>> If the sub-thread of the kernel thread (mm = null), exit directly, the kernel thread mm And active_mm is nulloldmm = current-> mm; if (! Oldmm) returnograph;

>>> kernel thread, only increase the reference count IF (clone_flags & clone_vm) {atomic_inc (& oldmm-> mm_users); mm = oldmm; goto good_mm;

. . . . . . . . . .

Good_mm: >>> The MM and ACTIVE_MM of the kernel thread point to the mm_struct structure of the current process TSK-> mm = mm; TSK-> Active_MM = mm; return 0; . . . . . . }

The kernel thread generally calls daemonize to release references to user space: >>> daemonize-> EXIT_MM -> _EXIT_MM: / ** TURN US Into a lazy tlb process if we * aren't already .. * / static inline void __exit_mm (struct task_struct * tsk) {struct mm_struct * mm = tsk-> mm;

MM_RELEASE (); if (mm) {atomic_inc_inc (& mm-> mm_count); if (mm! = tsk-> Active_mm) bug (); / * More a Memory Barrier Than A Real lock * / task_lock (tsk); >> > Release the data structure of user virtual space TSK-> mm = NULL; TASK_UNLOCK (TSK); ENTER_LAZY_TLB (MM, Current, SMP_Processor_ID ());

>>> The reference count of the decrement MM is 0, which is the mapping MMPUT (mm) represented by MM;}}

ASMLINKAGE VOID SCHEDULE (Void) {. . . . . . . . . IF (! current-> Active_mm) bug ();

. . . . . . . . .

Prepare_to_Switch (); {struct mm_struct * mm = next-> mm; struct mm_struct * Oldmm = prev-> Active_mm; >>> mm = null, selected for kernel threads if (! mm) {>>> to kernel threads, Active_mm = null, otherwise it must be an error (next-> Active_mm) bug (); >>> selected kernel thread Active_mm borrow the old process Active_mm next-> ACTIVE_MM = OLDMM; atomic_inc_inc (& Oldmm-> mm_count); ENTER_LAZY_TLB Oldmm, next, this_cpu;} else {>>> mm! = NULL selected for the user process, Active_mm must be equal to MM, otherwise it must be an error (Next-> Active_MM! = mm) bug (); switch_mm ( Oldmm, MM, Next, this_cpu);

>>> prev = NULL, switching out is the kernel thread IF (! prev-> mm) {>>> sets its Active_MM = NULL. Prev-> Active_MM = NULL; MMDrop (OLDMM);}}

}

A virtual space for the kernel thread summarizes: 1. When you created: The parent process is a user process, then MM and Active_mm share the parent process, then the kernel thread generally calls the daemonize adapted to shop M parent process is the kernel thread, then mm and Active_mm is NULL in summary, the MM = NULL of the kernel thread; the process schedules are based on judging whether the user process or the kernel thread. 2, when the process is scheduled, if the kernel thread is switched, set Active_MM to switch the Active_mm of the process; if the kernel thread is switched, set the active_mm as NULL.

[table of Contents]

User Process Core Directory

In a process, you must establish your own kernel page directory item (kernel page directory item to be placed on the same physical address consecutive page with the page directory of the user space, so the kernel page of all processes must not be shared. Table and process 0 share?

3G user, page directory, one map 4M space (one page directory 1024 page table, each page table corresponding to 1 page 4K) # 即: #define pgdir_shift 22 # define pgdir_size (1 ul << pgdir_shift)

>>> sys_fork-> do_fork-> COPY_MM-> mm_init-> pgd_alloc-> get_pgd_slow

#if config_x86_pae

. . . . . . . . . . . . .

#ELSE

EXTERN __INLINE__ PGD_T * GET_PGD_SLOW (VOID) {>>> Assign page directory table (including 1024 page directory), the page directory assigned to a process can be mapped to 10024 * 4m = 4GPGD_T * PGD = (PGD_T *) __ get_free_page (Gfp_kernel);

IF (PGD) {>>> #define user_ptrs_per_pgd (task_size / pgdir_size) >>> task_size is 3G size, user_ptrs_per_pgd number of page directory items corresponding to user space (3G / 4M = 768? >>> Terms of user space Note MemSet (PGD, 0, User_PTRS_PER_PGD * SIZEOF (PGD_T)); >>> Section 768 of the Cable Directory Table (Swapper_PG_DIR) to 1023 items to the process of the page of the page record table, 1023 Memcpy (PGD User_PTRS_PER_PGD, SWAPPER_PG_DIR USER_PTRS_PER_PGD, (PTRS_PER__PGD - User_PTRS_PER_PGD) * SIZEOF (PGD_T));} Return PGD;}

#ENDIF

[table of Contents]

Core directory synchronization

When a process is faulty in the internal nuclear space, in its handler, you will have the kernel page directory of the synchronization of this process through the page number of the No. 0 process. It is actually the core directory of the copy 0 process. In the process (the kernel page is shared with the process 0, so it is not necessary to copy). As follows: asmlinkage void do_page_fault (struct pt_regs * regs, unsigned long error_code) {. . . . . . . . >>> Address / * get the address * / __ ASM __ ("MOVL %% CR2,% 0": "= R"); TSK = CURRENT;

/ * * We fault-in-defitary. The * 'reference' page table is init_mm.pgd. * / >> If the lattice fault is in the kernel space IF (address> = task_size) goto vmalloc_fault;

. . . . . . . . .

vmalloc_fault: {/ * * Synchronize this task's top level page-table * with the 'reference' page table * / int offset = __pgd_offset (address); pgd_t * pgd, * pgd_k; pmd_t * pmd, * pmd_k;.

PGD = TSK-> Active_mm-> PGD Offset; PGD_K = INIT_MM.PGD OFFSET;

>>> / * >>> * (PMDS Are Folded Into Pgds So this DoesNT Get Actually Called, >>> * But The Define IS NEEDED for a Generic Inline Function.) >>> * / >>> #define set_pmd ( PMDPTR, PMDVAL) (* (PMDPTR) = PMDVAL) >>> #define set_pgd (pgdptr, pgdval) (* (pgdptr) = pGDVAL)

>>> If the kernel page directory of this address does not exist if (! Pgd_present (* pgd)) {>>> If the internal core directory at the address 0 does not exist, then an error (! Pgd_present " * PGD_K)) GOTO BAD_AREA_NOSEMAPHORE; >>> Copy Process 0 Core Directory in the corresponding page directory of this process in the corresponding page directory of the process; return;} >>> EXTERN INLINE PMD_T * PMD_Offset (PGD_T * Dir, unsigned long address >>> {>>> RETURN (PMD_T *) DIR; >>>} PMD = PMD_offset (PGD, Address); PMD_K = PMD_Offset (PGD_K, Address);

>>> For the middle page directory, if it is a two-stage page table, the following steps are processed with the above repeat IF (PMD_Present (* PMD_K)) GOTO BAD_AREA_NOSEMAPHORE; set_pmd, * pmd_k) ; return;} / ** Switch to real mode and then execute the code * specified by the code and length parameters * We assume that length will aways be less that 100 * / void machine_real_restart (unsigned char * code, int length).! {

. . . . . . . . . . . . .

/ * Remap the Kernel At Virtual Address Zero, As Well as Offset Zero from The Kernel Segment. This Assumes The Kernel Segment Starts At Virtual Address Page_offset. * /

Memcpy (swapper_pg_dir, swapper_pg_dir user_pgd_ptrs, sizeof (swapper_pg_dir [0]) * kernel_pgd_ptrs;

/ * Make Sure The First Page is mapped to the start of physical memory. It is normally not mapped, to trap kernel null pointer dereferences. * /

PG0 [0] = _PAGE_RW | _PAGE_PRESENT

/ * * Use `swapper_pg_dir 'as outile (" MOVL% 0, %% CR3 "::" r "(__pa (swapper_pg_dir));

[table of Contents]

MLOCK code analysis

The function of the system call Mlock is the page required to block some user processes in memory.

The syntax called Mlock is:

INT SYS_MLOCK (unsigned long start, size_t len);

Initialized to:

LEN = (LEN (START & ~ PAGE_MASK) ~ PAGE_MASK) & Page_mask;

START & = Page_mask;

Where mlock calls DO_MLOCK (), the syntax is:

INT DO_MLOCK (unsigned long start, size_t len, int on);

Initialized to:

LEN = (LEN ~ PAGE_MASK) & Page_Mask;

It can be seen from the parameters of mlock that Mlock starts with the starting address of the start address in the START, and the length is LEN (Note: Len = (Len (start & ~ page_mask) ~ page_mask) The page of the memory area of the memory area of the memory area. SYS_MLOCK If the call successfully returns, all of which contains the page of the specific memory area must be resident memory, or if this part is locked before calling Munlock or MunlockAll, this part must be retained in memory. Of course, if the process of calling mlock terminates or calls EXEC to perform other programs, this part is released. The sub-process created via the fork () call is not able to inherit the page that Mlock-locking Mlock is called by the parent process. Memory shielding mainly has two applications: real-time algorithms and high degree of confidential data. Real-time applications require strict points, such as scheduling, dispatching page is a major factor in execution delay. Confidential security software often processes data structures such as passwords or keys such as passwords or keys. The result of page scheduling is possible to write these important bytes to the exemplary (such as hard drive). Such hackers may be able to delete these security software to access data in the hard disk after these security software. The above-mentioned challenges can be solved by locking memory. Memory lock does not use stack technology, that is, those that are locked multiple times by calling mlock or mlockall can release the return value analysis of the corresponding page MLOCK by calling Munlock or MunlockAll: If the mlock is successful, return 0; Unsuccessful, then -1, and errno is set, the address space of the process keeps the original state. Returns the error code analysis as follows: eNOMEM: Some specific address areas do not have the corresponding process address space corresponding or exceed the maximum lockable page allowed by the process. Eperm: Calling mlock has no correct priority. Only the root process allows locks the required pages. EINVAL: The input parameter LEN is not a legal positive number. Main data structure and important constants used by mlock

1. mm_structstruct mm_struct {int count; pgd_t * pgd; / * starting address of the page directory, as shown in * / unsigned long context shown in FIG. 2-3; unsigned long start_code, end_code, start_data, end_data; unsigned long start_brk, brk, start_stack , start_mmap; unsigned long arg_start, arg_end, env_start, env_end; unsigned long rss, total_vm, locked_vm; unsigned long def_flags; struct vm_area_struct * mmap; / * point vma doubly linked list pointer * / struct vm_area_struct * mmap_avl; / * point vma AVL The pointer of the tree * / struct semaphore mmap_sem;} start_code, end_code: The start address and end address of the process code segment. START_DATA, END_DATA: The start address and end address of the process data segment. Arg_Start, arg_end: Call the starting address and end address of the parameter area. ENV_START, ENV_END: The start address and end address of the process environment area. RSS: The total number of pages resides in physical memory.

2. Virtual Section (VMA) Data Structure: VM_Area_atruct

Vma virtual memory segment described by the data structure vm_area_atruct (include / linux / mm.h): struct vm_area_struct {struct mm_struct * vm_mm; / * VM area parameters * / unsigned long vm_start; unsigned long vm_end; pgprot_t vm_page_prot; unsigned short vm_flags; / * AVL tree of VM areas per task, sorted by address * / short vm_avl_height; struct vm_area_struct * vm_avl_left; struct vm_area_struct * vm_avl_right; / * linked list of VM areas per task, sorted by address * / struct vm_area_struct * vm_next; / * for areas with inode, the circular list inode-> i_mmap * // * for shm areas, the circular list of attaches * // * otherwise unused * / struct vm_area_struct * vm_next_share; struct vm_area_struct * vm_prev_share; / * more * / struct vm_operations_struct * VM_OPS; unsigned long vm_offset; struct inode * vm_inode; unsigned long VM_PTE; / * shared mem * /};

VM_Start; // The corresponding memory area start address VM_END; // The end address of the corresponding memory area VM_FLAGS; // The access author of the corresponding memory area VM_AVL_HEIGHT; / / AVL tree high VM_AVL_LEFT; // AVL tree Left son VM_AVL_RIGHT; / / AVL tree's right son VM_NEXT; // Sorted VM_Area Link Pointer VM_OPS; // A set of operations for memory is the operation of the memory is operated when the memory is operated. A set of methods that the Linux system must use. For example, when the process is ready to access a virtual area but find that this area does not exist when physical memory is not present (the shortage interruption), the correct behavior of the operation is performed correctly. This operation is an empty page (NOPAGE) operation. This empty page (NOPAGE) is used when the Linux system comes to the executable page image into memory. When an executable page image is mapped to the process's virtual address, the data structure (VMA) of a set of VM_Area_struct structures is generated. The data structure (VMA) of each VM_AREA_STRUCT represents a part of the executable page image: executable code, initialization data (variable), non-initialization data, etc. The Linux system can support a large number of standard virtual operations, which is created when the VM_AREA_STRUCT data structure (VMA) is created, it corresponds to a set of correct false operations. The VMA segment belonging to the same process is connected to the VM_NEXT pointer, which constitutes a linked list. As shown in Figure 2-3, the member structure of the Struct MM_STRUCT structure is a header of the VMA linked list of the process. In order to improve the VMA segment query, insert, delete operation, Linux also maintains an AVL (Adelson-Velskii and Landis tree. In the tree, all the VM_Area_Struct hover segments have left pointer VM_AVL_LEFT points to adjacent low address virtual segments, right pointer VM_AVL_RIGHT points to adjacent high address virtual segments, as shown in Figures 2-5. The member structure of the Struct MM_STRUCT structure Struct * mmap_AVL represents the root of the AVL tree of the process, and VM_AVL_HEIGHT represents the height of the AVL tree. Any operation of balance tree MMap_avl must meet some of the balanced trees: consistency and balancing rulesj (consistency and balance rules): Tree-> VM_AVL_HEIGHT == 1 max (Heightof (Tree-> VM_AVL_LEFT), Heightof (Tree-> vm_avl_right)) abs (heightof (tree-> vm_avl_left) - heightof (tree-> vm_avl_right)) <= 1foreach node in tree-> vm_avl_left: node-> vm_avl_key <= tree-> vm_avl_key, foreach node in tree-> vm_avl_right: Node-> VM_AVL_KEY> = Tree-> VM_AVL_KEY. Note: Node-> VM_AVL_KEY = Node-> VM_END

Packed, add protection, sharing, and dynamic expansion of VMA.

3. The important constants used in important constant mlock system calls are: Page_Mask, Page_Size, Page_shift, Rlimit_Memlock, VM_LOCKED, PF_SUPERPRIV, etc. Their values are as follows: PAGE_SHIFT 12 // PAGE_SHIFT determines the page size PAGE_SIZE 0x1000 // 1UL << PAGE_SHIFT PAGE_MASK ~ (PAGE_SIZE-1) // a very useful constant variable RLIMIT_MEMLOCK 8 // max locked-in-memory address space VM_LOCKED 0x2000 // 8 * 1024 = 8192, one of the vm_flags signs. PF_SUPERPRIV 0x00000100 // 512, Mlock System Call Code Function Analysis

The following is a detailed analysis of the functions of each function ((1) and (2) have been introduced in the previous profile Mlock, and there is a detailed program process behind: SUSER (): If the user is active (ie, current-> EUID == 0), set the process logo as root priority (current-> flags | = pf_superpriv), and return 1; otherwise returns 0. Find_vma (Struct MM_STRUCT * MM, Unsigned long addr): Enter the parameter as the mm of the current process, and needs to lock the start memory address AddR. Find_VMA's function is to find the first VMA that satisfies mm-> mmap_avl-> VM_Start <= addr mmap_AVL-> VM_END, if success is successful; otherwise return null. Mlock_fixup (STRUCT VM_AREA_STRUCT * VMA, Unsigned long start, unsigned long end, unsigned int newflags): Enter a VMA in the VM_MMAP chain, you need to lock memory area start addresses and end addresses, you need to modify the flag (0: Plock, 1: Release the lock). MERGE_SEGMENTS (STRUCT MM_STRUCT * MM, UNSIGNED Long START_ADDR, UNSIGNED Long Enddr): Enter the parameter to the mm of the current process, and you need to lock the start memory address start_addr and end the address END_ADDR. The functionality of MERGE_SEGMENTS is the most likely to merge (ie, the memory address offset continuous) and have the same attribute (including VM_INODE, VM_PTE, VM_OPS, VM_FLAGS), which is released in this process, which is released. The VM_MMAP chain is required to be sorted by address size (we do not need to traverse the entire table, but only need to traverse the neighboring VM_AREA_STRUCTS) of the cross or separately. Of course, in the default, MERGE_SEGMENTS is loop processing for the VM_MMAP_AVL tree, how much can be merged. Mlock_fixup_all (struct vm_area_struct * vma, int newflags): Enter a VMA in the VM_MMAP chain, which needs to be modified (0: lock, 1: Release lock). The function of mlock_fixup_all is to modify this VM_FLAGS of this VMA based on the input parameter newflags. Mlock_fixup_start (Struct VM_Area_Struct * VMA, Unsigned long end, int newflags): Enter a VMA in the VM_MMAP chain, you need to lock the memory area end address, you need to modify the flag (0: lock, 1: Release lock).

The function of mlock_fixup_start is based on the input parameter End, allocates a new new_vma in memory, divides the original VMA into two parts: New_VMA and VMA, where new_vma's VM_FLAGS is set to enter the parameter newflags; and press the address (new_vma-> start) And the new_vma-> end) Size sequence Insert the newly generated new-> VMA into the MMAP chain or MMAP_AVL tree of the current process MM (default is inserted into the MMAP_AVL tree). Note: VMA-> VM_OFFSET = VMA-> VM_START-NEW_VMA-> VM_Start; mlock_fixup_end (Struct VM_AREA_STRUCT * VMA, Unsigned long start, int newflags): Enter a VMA in the VM_MMAP chain, need to lock memory area start Address, the flag needs to be modified (0: Plock, 1: Release Lock). The function of mlock_fixup_end is based on the input parameter start, allocating a new new_vma in memory, dividing the original VMA into two parts: VMA and New_VMA, where new_vma's VM_FLAGS is set to enter the parameter newflags; and press the address size Sequence to put New--- > VMA is inserted into the MMAP chain or MMAP_AVL tree of the current process MM. NOTE: new_vma-> vm_offset = vma-> vm_offset (new_vma-> vm_start-vma-> vm_start); mlock_fixup_middle (struct vm_area_struct * vma, unsigned long start, unsigned long end, int newflags): input parameter is a chain vm_mmap A VMA, you need to lock memory area start addresses and end addresses, you need to modify the flag (0: lock, 1: Release lock). The function of mlock_fixup_middle is based on the input parameter start, end, allocating two new VMAs in memory, dividing the original VMA into three parts: LEFT_VMA, VMA, and Right_VMA, where VMA's VM_FLAGS is set to enter parameter newflags; and press the address size The sequence is inserted into the MMAP chain or MMAP_AVL tree of the current process MM. Note: VMA-> VM_OFFSET = VMA-> VM_START-LEFT_VMA-> VM_Start; Right_VMA-> VM_OFFSET = Right_VMA-> VM_START-LEFT_VMA-> VM_Start; kmalloc (): will be discussed in detail in later 3.3. INSERT_VM_STRUCT (STRUCT MM_STRUCT * MM, STRUCT VM_AREA_STRUCT * VMP): The input parameter is the mm of the current process, which needs to be inserted VMP. The function of INSERT_VM_STRUCT is to insert VMP in the MMAP chain or MMAP_AVL tree of the current process MM by address size sequence, and insert the VMP into the I_MMAP ring (loop sharing chain) of VMP-> Inode.

avl_insert_neighbours (struct vm_area_struct * new_node, ** ptree, ** to_the_left, ** to_the_right): Input parameters for the current need to insert a new node vma new_node, target mmap_avl ptree tree, the new node is inserted ptree node to its left and Its node (left and right sides are sorted in VMA-> VMA_END size in mmap_avl). The function of avl_insert_neighbours is inserted into the new VMA node new_node to the target mmap_avl tree ptree, and call avl_rebalance to keep the PTREE balance tree characteristics, and finally returns the node on the left side of New_Node and its node. AVL_REBALANCE (STRUCT VM_AREA_STRUCT *** NODEPLACES_PTR, INT Count): The input parameter is pointer data node data_ptr [] (each element represents a MMAP_AVL sub-tree that needs to be balanced), the number of data elements in the data element. The feature of AVL_REBALANCE is from NodePlace_Ptr [- Count] until NodePlace_PTR [0] loops balances each MMAP_AVL sub-tree, and ultimately makes the entire MMAP_AVL tree balance. Down (STRUCT SEMAPHORE * SEM): The input parameter is synchronized (enter the critical region) sem. Down's function is locked according to the setting of the current semaphore (blocking other processes from entering the critical zone) and proceeding or enters the waiting state (waiting for the other process to complete the exit critical zone and release the lock). Down Definition in /inClude/Linux/Sched.h: Extern inline void Down (Struct Semaphore * SEM) {IF (SEM-> Count <= 0) __down (sem); SEM-> Count -;} Up (Struct Semaphore * SEM) The input parameter is synchronous (enter the critical area) semapon. The function of the UP is based on the setting of the current semaphore (when the value of the signal is negative: indicates that there is a process waiting to use this critical area) release the lock. UP definition in /include/linux/sched.h: Extern inline void up (struct semaphore * sem) {SEM-> Count ; Wake_UP (& Sem-> Wait);} kfree_s (a, b): kfree_s Definition in / include include /Linux/malloc.h: #define kfree_s (a, b) kfree (a). Kfree () will be discussed in detail in later 3.3. AVL_NEIGHBOURS (STRUCT VM_AREA_STRUCT * NODE, * TREE, ** TO_THE_LEFT, ** TO_THE_RIGHT): Enter the parameter as the VMA node node, target mmap_avl tree Tree, Node left, and its node (left and right) Node is sorted in VMA-> VMA_END size in MMAP_AVL). The function of avl_ neighbours is to find the node on the left side of the Node on the target MMAP_AVL tree Ptree according to the lookup condition Node, and the nodes on the right and return.

AVL_REMOVE (STRUCT VM_AREA_STRUCT * NODE_TO_DELETE, ** PTREE): Enter the parameter to Node_to_Delete and target mmap_avl tree ptree. The function of AVL_Remove is to find node node_to_delete in the target MMAP_AVL tree PTree and remove it from the balance tree and call AVL_REBALANCE to keep Ptree's balance tree features. REMOVE_SHARED_VM_STRUCT (STRUCT VM_AREA_STRUCT * MPNT): The input parameter is the VMA node MPNT that needs to be removed from the inode-> IMMAP ring. The function of remove_shared_vm_struct is to delete the node from the inode-> IMMAP ring with the VMA node MPNT. [table of Contents]

Memory.c

Among Memory.c, Linux provides several functions for virtual memory operations, including copying, new page table, clear page table, handling method of pages, and more to the virtual page.

[table of Contents]

COPY_PAGE

1. Static Inline Void Copy_page (unsigned long from, unsigned long to)

In order to save memory, in the system, each process usually uses shared memory, that is, different processes can share the same segment or data segment. When a process occurs on the shared memory, in order not to affect the normal operation of other processes, the system will copy the memory block for use, which is the so-called Copy-ON-Write. mechanism. Copy_page is a function of providing replicating memory features. It calls the standard memory operation function in the C language, copies a virtual memory page of the first address to the first address to TO.

[table of Contents]

CLEAR_PAGE_TABLES

2, void clear_page_tables (struct task_struct * tsk)

The function of CLEAR_PAGE_TABLE is to clear all items in the PGD page in the incoming structure TSK, and the space occupied by the secondary page table is released. Incoming Clear_Page_Tables, the TSK structure of the current process, after obtaining the first-level page directory pointer PGD of the process, uses a loop mode, call free_one_pgd to clear the PGD table. Table 1024 items. In Free_ONE_PGD, the actual function is only called once_one_pmd (in 80x86, due to hardware limitations, only two levels of address maps are combined with PMD with PGD). In Free_ONE_PMD, the function calls PTE_FREE release the physical space accounted for in the second-level page table of the PMD (process code, the physical memory released by DO_MUNMAP) and assigns PMD to zero.

CLEAR_PAGE_TABLE is called when the system starts an executable image or loads a dynamic link library. DO_LOAD_AOUT_BINARY () or DO_LOAD_AOUT_BINARY () in FS / EXEC.C calls flash_old_exec, the latter calls exec_mmap, and exec_mmap calls clear_page_table. Its main function is to clean up the page table in the copyed mm_struct when starting a new application, and release all the original secondary page tablespaces.

[table of Contents]

OOM

3, Void OOM (Struct Task_struct * task)

Returns the error message.

[table of Contents]

Free_PAGE_TABLES

4, void free_page_tables (struct mm_struct * mm)

In Free_Page_Table, most of the code is consistent with the function in Clear_Page_Table. It is different that the function is finally called PGD_Free (page_dir), ie space occupied by the secondary page table, and also releases the space occupied by the first-level page directory. This is because free_page_tables are called __exit_mm call, __ exit_mm is called by Do_exit (kernel / kernel.c). When the process is aborted, the system exits or system restarts you need to end all the processes with DO_EXIT. During the end of the process, free_page_table will be called all the space of the process, and of course, the space occupied by the release process first-level page directory. [table of Contents]

NEW_PAGE_TABLES

5, int new_page_tables (struct task_struct * tsk)

The main function of this function is to create a new page directory, and its main process is as follows:

• Call PGD_alloc () to apply for a 4K space for the new page directory.

• Copy the contents of the initialization process from 768 items to 1023 items to the new page table (all processes share 3G ~ 4G memory in virtual spaces, that is, all the same storage space can be accessed when the core state ).

· Call the macro set_page_dir (include / asm / pgtable.h) Remove the value of the process control block TSK-> TS-> CR3 to the first address of the new page directory, and change the value of the CR3 register in the CPU to new The first address of the page directory table, so that the new process enters its own running space.

· Change TSK-> mm-> PGD to the first address of the new page directory.

· New_page_tables is called by copy_mm, and the copy_mm is called by COPY_MM_DO_FORK, and the two functions are in kernel / fork.c. At the same time, new_page_tables can also be called in EXEC_MMAP (FS / EXEC.C). That is, the generation of new processes can pass two ways, one is fork, dynamically generate new processes in the program, so that the new process of the new process is originally inherited from its parent process, the other is Run an executable file image, copy the image to the TSK structure through Exec.c in the file system. Both methods need to call new_page_tables to assign a page directory table for the new process.

[table of Contents]

COPY_ONE_PTE

6, static inline void copy_one_pte (PTE_T * OLD_PTE, PTE_T * New_PTE, INT COW)

Copy the original PTE page entry to New_PTE, the process is as follows:

· Detecting whether OLD_PTE is in memory, if not in physical memory, call SWAP_DUPLICATE Press OLD_PTE to copy the OLD_PTE to the memory in the SWAP file, and assign the OLD_PTE's entry address to new_pte and return. Convertent to 3.

Get the page number of the physical address corresponding to OLD_PTE.

· Determines whether the page number is reserved for the system, if the system is reserved, these pages are used under the core state, and the user's process is not written, and simply returns the OLD_PTE pointer to new_pte back to new_pte. . In vicetime, the PTE is a normal memory, then turn to 4.

· Remote the protection flag for the OLD_PTE based on the incoming C-O-W flag, if the page is from Swap_Cache, set the "Dirty" flag on the OLD_PTE page. Assign the OLD_PTE to New_PTE.

• Add 1 of the numerical count of the number of physical memory usage processes in the MEM_MAP structure. [table of Contents]

COPY_PTE_RANGE

7, static inline int Copy_PTE_RANGE (PMD_T * DST_PMD, PMD_T * SRC_PMD,

Unsigned long address, unsigned long size, int code

Copying COPY_ONE_PTE by looping to copy the length of the length of the address Address from the source SRC_PMD to the DST_PMD. If the address of the address is not assigned in DST_PMD, you will be assigned 4K space first to the third-level page table PTE table. (Copy 4K space per call. You can copy up to 4M space in a copy_pte_range).

[table of Contents]

COPY_PMD_RANGE

8, static inline int Copy_PMD_RANGE (PGD_T * DST_PGD, PGD_T * SRC_PGD,

Unsigned long address, unsigned long size, int code

Copying COPY_PTE_RANGE by looping to copy the length of the length SIZE starting from the source SRC_PGD to the DST_PGD in a space from the source SRC_PGD. If the page table item that is not assigned an address is Address in DST_PGD, the corresponding PMD assignment directory item is given in the Level 1 (also a secondary) page table.

[table of Contents]

COPY_PAGE_RANGE

9, int Copy_page_range (struct mm_struct * dst, struct mm_struct * src,

Struct VM_Area_Struct * VMA)

The main function of this function is to copy a VMA block of a task or process to another task or process. Its working mechanism is a loop call COPY_PMD_RANGE, copies all virtual spaces in the VMA block to the corresponding virtual space. Before doing replication, you must ensure that the copied virtual space corresponding to the new task must be zero. COPY_PAGE_RANGE Press DUP_MMAP () -> COPY_MM () -> DO_FORK () to be called (above three functions in kernel / fork.c). When the process is created, you need to copy all virtual space from the parent process, and the copy_page_range is the task.

[table of Contents]

free_pte

9, Static Inline Void Free_PTE (PTE_T Page)

The Volume Page Page is in memory, and does not reserve memory for the system (as in the system reserved area, it cannot be deleted).

If Page is in a swap file, call SWAP_FREE () is released.

[table of Contents]

Forget_pte

10, Static Inline void forget_pte (PTE_T Page)

If Page is not empty, call Free_PTE to release it.

[table of Contents]

ZAP_PTE_RANGE

11, static inline void zap_pte_range (pmd_t * pmd, unsigned long address,

Unsigned long size

Zap is abbreviated by Zero All Pages. The function of this function is to start from the virtual address address in the PMD, the length of the memory block is called PTE_clear by looping PTE_Clear, call free_pte to save the physical memory or swap space in the space. Page release. Before released, you must check that the memory block from Address is Size has no PMD_size. (Overflow can escape the range of 0 to 1023).

[table of Contents]

Zap_PMD_RANGE

12, static inline void zap_pmd_range (pgd_t * dir, unsigned long address, the function structure is similar to ZAP_PTE_RANGE, complete the clearance of all PTEs falling in Address to Address Size intervals by calling zap_pte_range. Zap_PMD_RANGE to clear the physical memory of 4M space.

[table of Contents]

ZAP_PAGE_RANGE

13, int zap_page_range (struct mm_struct * mm, unsigned long address, unsigned long size)

The function structure is similar to the first two functions. Clear all the corresponding PMDs from Address start to the ADDRESS SIZE length. The main function of zap_page_range is to be cleared in the process of memory shrinkage, release memory, exiting the virtual map, or moving page table, will not be cleared from the three-level page table of the process. (When discussing clear_page_tables, it is mentioned that when the process exits, the page table is released, first guaranteed to clear the page table, and ensure that the process does not occupy 0 ~ 3G when it is in an exit state.)

[table of Contents]

ZEROMAP_PTE_RANGE, etc.

14, static inline void zeromap_pte_range (PTE_T * PTE, UNSIGNED Long Address,

Unsigned long size, PTE_T ZERO_PTE)

15, static inline int zeromap_pmd_range (pmd_t * pmd, unsigned long address,

Unsigned long size, PTE_T ZERO_PTE)

16, int ZEROMAP_PAGE_RANGE (unsigned long address, pgprot_t prote)

These three functions are very similar to the three functions, and their function is to release the physical memory corresponding to the virtual space from the address address, the length of the memory block of the size of size, will point to these areas. The PTE is all about the physical page specifically 4K, all 0 in the system. ZeroMap_page_range is not referenced in the kernel code, this function is left behind Linux, which has been replaced by zap_page_range in the new version.

[table of Contents]

REMAP_PTE_RANGE, etc.

17, Static Inline Void Remap_PTE_RANGE (PTE_T * PTE, UNSIGNED Long Address,

Unsigned long size, unsigned long offset, pgprot_t protein

18, Static Inline int Remap_pmd_range (pmd_t * pmd, unsigned long address,

Unsigned long size, unsigned long offset, pgprot_t protein

19, int REMAP_PAGE_RANGE (unsigned long farm, unsigned long size,

PgProt_t protot

These three functions are the same as the previous function, and the layer is called, and now describes the role of the last function. The function of remap_page_range is to map the virtual memory block that is originally mapped to the virtual memory address from the virtual memory block to the virtual memory of the offset OFFSET as the start address, and the original PTE, PMD item is cleared. . This function is also a step-by-step call, in Remap_PTE_RANGE, mapped to the new virtual memory page entry PTE by SET_PTE. The function of the REMAP_PAGE_RANGE function is similar to the functions described in Remap.c below, so it is not used in Kernel. [table of Contents]

PUT_DIRTY_PAGE

20, unsigned long put_dirty_page (struct task_struct * TSK, UNSIGNED Long Page,

Unsigned long address

Linking the virtual memory page Page to the virtual address of the virtual address in TSI TSK, the main calling process is as follows: put_dirty_page-> setup_arg_page-> do_load_xxx_binary (xxx is AOUT or ELF, these functions are fs / exec.c ), Its function is to copy the relevant stack information, environment variables, etc. to the current process when loading the executable file.

[table of Contents]

HANDLE_MM_FAULT

21, void handle_mm_fault (struct vm_area_struct * vma, unsigned long address,

INT WRITE_ACCESS

Used to handle the shortage interrupt in the Alpha machine

[table of Contents]

mmap.c

In mmap.c, a function supporting process memory management is mainly provided, mainly including DO_MMAP, DO_MUNMAP, etc. to manage functions of the virtual block stack AVL number of the process.

For some operations avl tree: 1, static inline void avl_neighbours (struct vm_area_struct * node, struct vm_area_struct * tree, struct vm_area_struct ** to_the_left, struct vm_area_struct ** to_the_right) Looking to the preamble node tree node in the tree and avl after The order node, put the result in the pointer TO_THE_LEFT and TO_THE_RIGHT, that is, * to_the_left-> next = node, node-> next = * to_THE_RIGHT. In actual search, the process is to find the leftmost node of the left node in the Node node and the leftmost node of the right node, and the AVL tree search can improve efficiency.

2, Static Inline void avl_rebalance (struct vm_area_struct *** nodeplaces_ptr, int count) will restore the imbalanced AVL tree due to insertion operation or delete operation. NodePlace_ptr is the root node of the subtree that needs to be adjusted, and Count is the height of the subtree.

Static Inline Void AVL_INSERT (Struct VM_AREA_STRUCT * New_NODE, STRUCT VM_AREA_STRUCT ** PTREE) Inserts the new node new_node into the AVL tree PTree and regenerates the tree to the balance AVL tree. When creating an AVL tree, insert the VMA module into the AVL tree, build a large AVL tree. When the process is created, copy the VMA chain that is copied from the two-way linked list after copying the parent's process.

4, static inline void avl_insert_neighbours (struct vm_area_struct * new_node, struct vm_area_struct ** ptree, struct vm_area_struct ** to_the_left, struct vm_area_struct ** to_the_right) insert a new node new_node ptree avl tree, and the regenerated equilibrium avl tree of the tree, Also return the front node and the rear sequence node of the new node. 5, Static Inline Void avl_remove (struct vm_area_struct * node_to_dlete, struct vm_area_struct ** ptree) Remove the node node_to_delete to be deleted from the AVL tree PTree. And regenerate the tree to the balance AVL tree. This function is used in the release of the virtual space and the VMA chain list.

7, static void printk_list (struct vm_area_struct * vma) 8, static void printk_avl (struct vm_area_struct * tree) 9, static void avl_checkheights (struct vm_area_struct * tree) 10, static void avl_checkleft (struct vm_area_struct * tree, vm_avl_key_t key) 11, static void avl_checkright (struct vm_area_struct * tree, vm_avl_key_t key) 12, static void avl_checkorder (struct vm_area_struct * tree) 13, static void avl_check (struct task_struct * task, char * caller) avl tree structure for detecting when debugging systems these functions are Correctness

14, Static Inline int vm_enough_memory (long pages) By calculating the space judgment on the current system is sufficiently called. The memory that can be used includes buffer memory, page cache, idle pages, swap cache, etc. in the main memory.

15. Static inline unsigned long vm_flags (unsigned longprot, unsigned long flags) provides macro functions to merge the page protection bit and flag.

16, unsigned long get_unmapped_area (unsigned long addr, unsigned long list) starts from virtual memory Address to find unallocated virtual space blocks greater than LEN, and returns the fast address.

17, unsigned long do_mmap (Struct File * file, unsigned long addr, unsigned long line "is a very important function in Linux virtual memory management, its main function is Map the image of the executable to virtual memory or map memory information in other processes to the virtual space of the process. The VMA of the mapped virtual block is added to the VMA AVL tree of the process. The process of operation is as follows, more detailed analysis, please refer to Lintao classmates and Xu Meifeng classmates. Check the given mapping length LEN is greater than one page, less than one task 3G and plus the process of the process is not overflow. Exit if it is not satisfied. If the memory of the current task is the lock, check whether the LEN will exceed the limit of the lock length on the current process. If you exit. If you are mapping from a file, check if the file is read. If there is no exit. Call GET_UNMAPED to get the unmapped continuous virtual space from the address address Address greater than the virtual block of the LEN. For example, the file control block has a corresponding mapping operation. Apply for a VMA structure for mapping organizations. Call VM_ENOUGH_MEMORY has sufficient memory. If you do not release the VMA applied for 6, exit. If it is a filemap, call file-> f_op_mmap maps the file to the VMA. Call INSERT_VM_STRUCT Insert the VMA into the AVL tree of the process. Merge the AVL tree. 18, void merge_segments (unsigned long start_addr, unsigned long enddr) After a constant mapping of the process virtual space, there are many VMA blocks in the process that can be merged, in order to improve the efficiency of the AVL tree lookup, reduce AVL Unnecessary VMA blocks in the tree typically need to put these blocks and Merge_SEGMENTS functions from start_addr to end_addr in the consolidated virtual space. Since only increased operation is possible, all MERGE_SEGMENTs are only called in DO_MMAP and UNMAP_FIXUP. The process of this function is as follows: The VMA block MPNT that satisfies the VM_END> Start_Addr from the found first block based on the start address START_ADDR. Call avl_neighbours find VMA block prev and next after the VMA bidirectional list. If Prev and MPNT are tail, and the same flag, the same operation, the same operation, etc., the same operation, etc., and vice versa. Call avl_remove Delete MPNT from the AVL tree to adjust the end address of the Prev and the rear sequence pointer. Remove the physical space occupied by the structure MPNT. PREV, MPNT, NEXT move down, such as not exceeding End_ADDR, returns 3.

19. Static void unmap_fixup (STRUCT VM_AREA_STRUCT * Area, unsigned long addr, size_t g releases some areas in the virtual space, will have four situations: release the entire VMA release to release the first half of the VMA to pull VMA The second half releases the release of the intermediate part of the VMA to the normal maintenance VMA tree, when the first case is to release the entire VMA. Also release the space occupied by the VMA structure. Second, the second half of the release is released, and the relevant information of VMA is modified. The second, the first half of the release is released, and the relevant information of the VMA is modified. In the fourth, since a hole appears in the VMA, it is necessary to add a VMA structure to describe the new VMA block. The work executed by unmap_fixup is to correct the impact on the VMA tree when the space is released. 20. INT DO_MUNMAP (unsigned long addr, size_t len) Do_munmap will release the virtual space corresponding to the VMA within the LEN space from the address addr. Do_munmap is called by the system calls SYS_MUNMAP (there is nothing to do with Sys_Munmap). The following is the process of this function: VMA block MPNT of the first VMA-> end> addr is found according to Addr according to Addr. Call avl_neighbours to find the adjacent pointers prev and next in the linked list. Put all the VMA blocks intersecting the virtual space Addr ~ Addr LEN in the Free Link list. At the same time, if the VMA link is in the shared memory, it is released from the annular lin list. Search for the free lin list in sequence, call the unmap_fixup release space. Call ZAP_PAGE_RANGE Clear zero will point to the PTE page item in the released virtual space. Call the KFree release the space occupied by the MPNT structure.

Remap.c This file provides several functions of the virtual memory reproduction map. These functions will be described below to analyze the roles of these functions in virtual memory management. Introduce the process of main functions in detail.

Static Inline PTE_T * GET_ONE_PTE (STRUCT MM_STRUCT * MM, UNSIGNED Long Addr Returns the corresponding page table item PTE in the virtual memory based on the input false memory address.

Static Inline PTE_T * ALLOC_ONE_PTE (Struct MM_STRUCT * MM, UNSIGNED Long Addr) Refers to the PTE according to the third-level page table mapping mechanism based on the third-level page table mapping mechanism according to the input virtual address AddR, if there is no corresponding item in the PGD table, assign one The PGD (PMD) entry is allocated in this entry to allocate PTE according to the Addr, returns the PTE.

Static Inline int Copy_one_pte (PTE_T * SRC, PTE_T * DST) assumes the value in the PTE (DST) entry into the value in the PTE (SRC), and then clears the value in the source PTE, according to the function of this function. Take MOVE_ONE_PTE more appropriate.

Static int move_one_page (struct mm_struct * mm, unsigned long@addr, unsigned long new_addr) Call GET_ONE_PTE based on the input virtual address OLD_ADDR to get the PTE item in the third-level page table, call COPY_ONE_PTE to move the PTE corresponding physical page pointer to On the PTE item corresponding to the new_addr, a virtual memory page is moved within the virtual space.

Static int move_page_tables (Struct MM_STRUCT * MM, UNSIGNED long new_addr, unsigned long list_addr, unsigned long LONG LON) Moves virtual memory from OLD_ADDR from OLD_ADDR to the virtual space that starts locations with new_addr, The following is the process of this function: assigns the memory length of the required movement to the offset OFFSET if OFFSet = 0, end. Convert the steering 2. Reduce the offset OFFSET minus a page length, call Move_ONE_PAGE to move from a page starting from the OLD_ADDR OFFSET to New_ADDR Offset. Turn to 4 if an error occurs. If OFFSET is not 0, turn to 1, and it ends. Call MOVE_ONE_PAGE Return all the page shift addresses that have been moved to the new address, call ZAP_PAGE_RANGE to clear the page PTE that will start from New_ADDR and return the error information -1. static inline unsigned long move_vma (struct vm_area_struct * vma, unsigned long addr, unsigned long old_len, unsigned long new_len) the start address of the virtual storage block vma vma is addr, old_len length memory block of extended length new_len The memory block, and finds a continuous area that can accommodate the NEW_LEN block in the virtual memory, and returns the first address. The workflow is as follows: Give the new VMA structure block new_vma allocate space, if it is not successfully returned, the error message is returned. Call GET_UNMAP_AREA Start from ADDR to find the first unused dummy hollow hole, the hole length is greater than the length of the given new virtual memory block, and assigns its first address to new_addr. If not recruited, turn to 9. Call MOVE_PAGE_TABLES moves from the length of the length of the ADDR to the memory area of OLD_LEN to the virtual space of the start address with new_addr. Modify the value about the start address and end the address in the New_VMA block. Insert the new New_VMA block into the two-way linked list and the AVL tree chained in the virtual memory of the current process. Calling Merge_SEGMENT to connect the different VMA segments of the address in the virtual space into a VMA block, while deleting the redundant VMA block. Start from the ADDR from the original space, the length is released from the virtual space of OLD_LEN. Modify all the length of the virtual memory in the MM structure, return to the new start virtual address new_addr. Release the VMA block New_VMA and return an error message.

ASMLINKAGE Unsigned long sys_mremap (unsigned long addr, unsigned long new_len unsigned long flags) sys_remap is a system call, its main function is extended or contracted in existing virtual space. Its main workflow is as follows: Check if the AddR address is less than 4096, such as less than, illegally, return. The original length OLD_LEN and the length new_len page that needs to be extended or contracted. If there is an OLD_LEN> New_LEN, the description is the shrink space, call DO_MUNMAP to release the spatial space from New_LEN to OLD_LEN. Returns the first address of the shrinkage addr. Find the first VMA block according to Addr to meet the VMA-> end> addr, check if the addr falls in a void, if it is returned. Check if the memory block that needs to be extended is in this VMA block, and the property returns an error message. If the VMA is uplocked, it is detected whether the upper lock memory extension is off, if it returns an error message. Detecting the dummy space of the current process that exceeds the system to the maximum space for the process after the extension is expanded. If it is, it returns an error message. If the length of the VMA block starts from the ADDR to the end of the block is OLD_LEN and (the length of the OLD_LEN is not equal to NEW_LEN or the dummy is unmovable), then steirement 9, reverse steering 10. Detecting whether the unallocated space from following the found VMA block is greater than the extended space. If it is greater than, then the extended space is directly hung after the found VMA block, modify the information related to the VMA block, and return the first address of the virtual block after the extended virtual block. Such as smaller than the steering 10. If the current virtual block is unmovable, the error message is returned. Conversely, calling MOVE_VMA to move the virtual block that needs to be extended to accommodate the virtual space of its length New_LEN. [table of Contents]

Partner algorithm

The page distributor of the 2.4 version of the kernel introduces the "Zone" structure, a page area is a large block of continuous physical pages. Linux 2.4 divides the entire physical memory into 3 pages, the DMA page area (zone_dma), Ordinary page area (zone_normal) and high-end page areas (zone_highmem).

The page area can make the page allocate a more destination, which is conducive to reducing memory fragmentation. Page allocation of each page is still using a partner algorithm. Partner algorithm divides the entire page into two levels of power-powered Collection, adjacent same page blocks are called partners, and a partner can be merged into a higher page collection.

The page release process of the partner algorithm is analyzed below.

; MM / Page_Alloc.c:

#define bad_range (zone, x) -> zone || (((x) -MEM_MAP) Offset || (((x) -mem_map)> = (zone) > Offset (ZONE) -> Size))

#define virt_to_page (kaddr) (MEM_MAP >> Page_shift) # define put_page_testzero (p) atomic_dec_and_test

Void free_pages (unsigned long address {Order is the page block size index, the size of the page block is (2 ^ Order) page. if (addr! = 0) __free_pages (Virt_to_page (addr), order);} void __free_pages (struct page * page, unsigned long order) {if __free_pages_ok (page, order) (PageReserved (page) put_page_testzero (page)!);} static void FASTCALL (__ free_pages_ok (struct page * page, unsigned long order)); static void __free_pages_ok (struct page * page, unsigned long order) {unsigned long index, page_idx, mask, flags; free_area_t * area; struct page * base; zone_t * zone; if (page-> buffers) BUG (); if ( Page-> mapping) bug (); if (! Valid_page (page)) bug (); if (PageLocked (Page)) bug (); if (PageDecrafter (Page); IF Bug (); if (page) bug (); if (pageinactivedirty (page)) Bug (); if (pageinactiveclean (page)) bug ();

Page-> Flags ~ ((1 page-> age = Page_age_Start;

Zone = Page-> zone; take the page area where Page is located

Mask = (~ 0ul) Base = MEM_MAP ZONE-> Offset; Homepage Page Page_IDX = Page - Base; Seeking Page Number IF (Page_IDX ~ Mask) page number in Page Must on Page Align bug (); index = page_idx >> (1 order);; ORDER; take the bitmap plane of this index page

Spin_lock_irqsave (flags);

ZONE-> Free_PAGES - = MASK; the number of free pages in the page area plus the number of pages that will be released (the mask value is negative)

While (Mask (1 struct page * buddy1, * buddy2;

IF (Area> = zone-> free_area max_order) If the highest secondary bug (); if (! test_and_change_bit (index, area-> map) test and reflect the files / * * the buddy page Is Still Allocated. * / Break; If the original position is 0, then the block is not partner, the operation is complete / * * Move the buddy up one level. If the original position is 1, then the block has a partner * / Buddy1 = base (page_idx ^ -mask); Reverse the border of the page number, get the starting point buddy2 = base page_idx; if (BAD_RANGE (ZONE, BUDDY1)) Is there a cross-page range bug () ; If (BAD_RANGE (ZONE, BUDDY2)) bug ();

Memlist_del (delete partner's free link Mask Area ; seeking higher secondary map flat index >> = 1; seeking higher secondary index number Page_IDX Mask; begging at the start page number of the top page} Memlist_add_head ( Page_IDX) - > List, the free link to this index will be obtained

Spin_unlock_irqrest (Flags);

/ * * We do not want to protect this variable from race conditions * since it's nothing important, but we do want to make sure * it never gets negative * / if (memory_pressure> NR_CPUS) memory_pressure--;.}

[table of Contents]

Page directory processing macro

For the 2-level paging mechanism of I386, each page directory word is 20-bit is a page number, and the lower 12 bits are page properties.

If the lower 12-bit shielding of the page directory word is 0, the entire page directory is the physical address of the corresponding page, and below is a commonly used page directory processing macro.

typedef struct {unsigned long pgd;} pgd_t; a word page directory structure typedef struct {unsigned long pmd;} pmd_t; Intermediate word page directory structure typedef struct {unsigned long pte_low;} pte_t; LAST word-level directory structure typedef struct { Unsigned long pgprot;} pgprot_t; page attribute word structure

PGD_T * PGD = PGD_OFFSET (mm_struct, addr); Take a process virtual address ADDR's first-level page directory word pointer, extended ((mm_struct) -> PGD ((Address >> 22) 0x3FF)) PGD_T * PGD = PGD_OFFSET_K AddR) Take the first-class page directory word pointer of the kernel address AddR, expand (init_mm.pgd ((address >> 22) 0x3ff);

PMD_T * PMD = PMD_offset (PGD, ADDR); From the first-class page directory word finger pin ADDR's intermediate page recorded pointer, in the 2-level paging system, their value is the same, extended (PMD_T *) (PGD) ;

PTE_T * PTE = PTE_OFFSET (PMD, ADDR) From the Intermediate Page Directory Finger Pin Take AddR's Last Page Tablet Pointer, extended (PTE_T *) ((PMD-> PMD 0xFfff000) 0xC0000000) ((Addr >> 12 0x3FF);

Struct Page * Page = PTE_PAGE (PTE_VAL)); Take the page mapping pointer of the last page directory word PTE_val, extended (ME_MAP (PTE_VAL.PTE_LOW >> 12))))

PTE_T PTE_VAL = PTEP_GET_AND_CLEAR (PTE); Take the value of the last level page directory word pointer PTE, and clear the directory word to (PTE_T) {XCHG (...

PTE_T PTE_VAL = MK_PTE (PAGE, PGPROT); combine the page mapping pointer Page and page properties PGPROT to page directory words, extended to (PTE_T) {(PAGE - MEM_MAP) PTE_T PTE_VAL = MK_PTE_PHYS (Physpage, PgProt); Physpage The page and the page property word combination are page directory word, extended to (PTE_T) {physpage >> 12Unsigned long addr = pmd_page (pMD_VAL); Take the page directory virtual address represented by the intermediate page directory word, expand (UNSIGNED Long (PMD_VAL.PMD 0xFFFFF000 0xC0000000));

SET_PTE (PTE, PTE_VAL); Set the final page directory word, extend to * PTE = Pteval; set_pmd (pmd, pmd_val) Set the intermediate page directory word, extended to * PMD = PMD_VAL; set_pgd (pgd, pgd_val) Setting up the page Directory word, extended to * pgd = pgd_val;

[table of Contents]

MM author's article

Linux mm: design for a zone based memory allocator

Rik Van Riel, July 1998

One of the biggest problems currently facing the Linux memory management subsystem is memory fragmentation. This is the result of several developments in other parts of the Linux kernel, most importantly the growth of each process'es kernel stack to 8 kB and the dynamic allocation of DMA and networking buffers. These factors, together with a general speedup of both peripheral hardware and the device drivers has lead to a situation where the currently used buddy allocator just can not cut it anymore. This white-paper is divided in 3 pieces, The Problem, The Solution and Some Actual Code. I Need A Lot of Comments and Hints for Possible Improvement, So Feel Free To Email The To Me ...

The problemThe problem is caused by the fact that memory is allocated in chunks of different sizes. For most types of usage we just allocate memory one page (4 kB on most machines) at a time, but sometimes we give out larger pieces of memory ( 2, 4, 8, 16 or 32 Pages Atce). BECAUSE OF THE FACT THAT UNIX (AND Linux) Machines Have a Completely Full Memory (Free Memory Is Wasted Memory), IT IS next to Impossible To Free Larger Area's and the best we can do is be very careful not to hand out those large areas when we only need a small one.There have been (and there are) several workarounds for this fragmentation issue; one of them (PTE chaining) even involves a physical to logical translating, almost reverse page table-like solution. With that project, we can swap out pages based on their physical address, thus force freeing that one page that blocked an entire 128 kB area. This would solve most of our problems, except when That last page is unswappable, for example a page Table OR a program's kernel stack. In that case, we're screwed regardlessly of what deallocation scheme we're using.Because our inability to hand out larger chunks of memory has impact on system functionality and could even have impact on system stability it seems warranted to sacrifice a little bit of speed (the buddy system is fast!) in order to solve most of the above problems. The main problem with the current system is that it does not differentiate between swappable and unswappable memory, leading to a system where page .

This problem is made even worse by the fact that on some architectures we can only do DMA to addresses under 16 MB and it will undoubtedly show up again in some years when we all have 16 GB of memory and try do do DMA to those oldie 32 bit PCI cards that do not support dual cycle address mode :-) The solutionThe solution is to hand out free zones of 128 kB large, and to use each zone for one type of usage only. Then we can be sure that no page tables Interfere with the freeing of a zone of user memory, and we can always just free an area of memory.in The Current Linux Kernel, We Have The Following Uses for Memory:

reserved memory, kernel code and statically allocated kernel structures: after system boot we never much with the layout of this memory so it's a non issue wrt the allocatoruser memory:. this memory can be swapped out and / or relocated at will, it is allocated one page at a time and gives us no trouble, apart from the fact that we always need more than we have physically available; no special requirementskernel stack: we allocate 8 kB (2 pages) of unswappable kernel stack for each process; each of those stacks needs to be physically contiguous and it needs to be in fast memory (not in uncached memory) page tables: page directories are unswappable, page tables and (on some machines) page middle directories can be moved / swapped with great caution; the memory for these is given out one page at a time; we only look up the page tables every once in a while so speed is not very critical; when we have uncached memory, we'd rather use it for page tables than for user pagessmall SLAB : SLAB MEMORY is used for dynamic kernel data; it is allocated and freed at will, unfortunately this will is not ours but that of the (device) driver that requested the memory; speed is criticallarge SLAB: the same as small SLAB, but sometimes the kernel wants large chunks (> 2 pages); we make the distinction between the two because we do not want to face hopeless fragmentation inside the SLAB zones ... DMA buffers: this memory needs to be physically below a certain boundary (16 MB for ISA DMA) and is offen allocated in chunks of 32, 64 or 128 kbfor sales (<

16 MB) machines, the above scheme is overkill and we treat several types of usage as one. We can, for instance, treat large SLAB and DMA the same, and small SLAB, kernel stack and page table can be allocated in the same zones . too Small slab and kernel stack will be treated the same on every machine; the distinction is only made because I want the documentation to be complete.In addition to this, we can differentiate between 3 different kinds of memory: DMA memory: this memory is located under the 16 MB limit and is cached by the L1 and L2 caches'normal 'memory: this memory is located above the DMA limit and is cached by the L1 and L2 caches, it can not be used for DMA buffersslow memory: this memory is not cached or present on an add-on board, it can not be used for DMA buffers and using it for time critical kernel stack and SLAB would be disastrous for performance; we also do not want to use it for CPU intensive user ApplicationSsince We don't want to Waste the Slow Memory WE Might HAV e, we can use that for page tables and user memory that is not used very often. If we have user memory in slow memory and it turns out that it is used very often we can always use the swap code to relocate it to fast Memory. DMA Memory IS Scarce, So We Want To Allocate That Only We Specify Need It or WHEN We don't have any other memory Left.this Leads to the Following Zone Allocation Orders:

Page allocationFor SLAB, page table and DMA memory we always try to allocate from the fullest zone available and we grab a free zone when we're out of our own memory. In order to grab the fullest zone, we keep these zones in a ( ?. partially) sorted order For large SLAB / DMA areas we will also want to keep in mind the sizes of the memory chunks previously allocated in this zone.User pages are kept on a number of linked lists: active, inactive, clean and free . We allocate new pages in the inactive queue and perform allocations from the free queue first, moving to the clean queue when we're out of free pages. inactive pages get either promoted to the active queue (when they're in heavy use) or demoted to the clean queue (when they're dirty, we have to clean them first). Pages in the clean queue are also unmapped from the page table and thus already 'halfway swapped out'. Pages only enter the free list when a PROGRAM free () s................

In order to be able to free new zones (for when SLAB gets overly active), we need to be able to mark a relatively free zone force-freeable. Upon scanning such a page kswapd will free the page and make sure it is not allocated again.When the PTE chaining system gets integrated into the kernel, we can just force-free a user zone with relatively few active pages when the system runs out of free zones. Until then we'll need to keep two free zones and walk the page tables to find and free the pages.Actual codeThere's not much of actual code yet but all the administrative details are ready ALPHA status reached and the .h file is ready:.) / ** The struct mem_zone is used to describe a 32 Page Memory Area. * /

Struct Mem_zone {MEM_ZONE * prev, next; / * the previous and next zone on this list * / unsigned long buy; / * used pages Bitmap for slab, etc !!! count for user * / unsigned long flags;

/ ** flags for struct_mem-> flags * /

#define ZONE_DMA 0x00000001 / * DMA memory * / # define ZONE_SLOW 0x00000002 / * uncached / slow memory * / # define ZONE_USER 0x00000004 / * usermode pages, these defines are for paranoia only * / # define ZONE_SLAB 0x00000008 / * large SLAB * / # define ZONE_STK 0x00000010 / * kernel stack and order-1 SLAB (and order-0 SLAB if there is slow memory) * / # define ZONE_PTBL 0x00000020 / * page tables and one-page SLAB (except when there is slow memory) * / # define ZONE_DMA 0x00000040 / * DMAbuffers * / # define ZONE_RECL 0x00000080 / * We are reclaiming this zone * / # define ZONE_0 0x00000100 / * loose pages allocated * / # define ZONE_1 0x00000200 / * order-1 (2 ^ 1 = 2 page) chunks allocated * / # define ZONE_2 0x00000400 / * etc ... in order to help in buddy-like allocation for * / # define ZONE_3 0x00000800 / * large SLAB zones on small memory machines. * / # define ZONE_4 0x00001000 # define ZONE_5 0x00002000 / ** Memory Statistics * /

TYPEDEF STRUCT {UNSIGNED Long Free;} zone_stats_t;

struct memstats {struct zone_stats_t ptbl; struct zone_stats_t stk; struct zone_stats_t slab; struct zone_stats_t dma; / * Slightly different structs for these * / struct user {unsigned long active; unsigned long inactive; unsigned long clean; / * we do lazy reclamation * / unsigned long free;}; struct free {unsigned long dma; / * different memory types * / unsigned long normal; unsigned long slow;}; struct misc {unsigned long num_physpages; unsigned long reserved; / * reserved pages * / unsigned long Kernel; / * taken by static kernel stuff * /};}; / * this is where we find the degreent zones * /

struct memzones {struct free {struct mem_zone dma; struct mem_zone normal; struct mem_zone slow;}; struct mem_zone dma; struct mem_zone user; struct mem_zone slab; struct mem_zone stk; struct mem_zone ptbl;};

[table of Contents]

[This document

Good Friends · Collectors Automatic Generation]

转载请注明原文地址:https://www.9cbs.com/read-60202.html

9cbs

New Post(0)