An Analysis of the Windows Memory Mechanism
By Leezy_2000 03-9-3 9:38
Foreword
For a long time before writing this article, I was quite confused about the Windows memory mechanism. How are the memory spaces of different processes isolated from each other, and how are they shared? The GDT (Global Descriptor Table) is still there, so where did the segmentation mechanism go? Since we have a 4 GB virtual address space and structured exception handling, why can memory allocation still fail? When does the stack overflow?
Once I had worked these questions out, I wrote this article, partly for myself and hopefully to help others. Since so much has already been written about Windows memory, I will try my best not to repeat other people's content.
Shortly after I started writing, I found that IMQuestion had written several very good articles on Windows memory under the general title "JIURL playing Win2K memory"; they are recommended reading.
First, the big picture
At the lowest level, the core of Windows memory management is the paging mechanism. Paging gives each process its own 4 GB virtual address space, so programs run on virtual linear addresses. Each process has its own working set, and its page directory and page tables record which physical address each virtual linear address maps to. Switching processes therefore means switching these mappings along with the working set; as Matt Pietrek said, a virtual address given on its own, without its process, is meaningless. (See Figure 1)
Within the linear address space that paging provides, memory is further divided into the heap, the stack, free storage, and so on. The APIs that operate on heaps include HeapCreate, HeapAlloc, and the like. The APIs that manipulate free storage include VirtualAlloc. Memory-mapped files should also count as free storage. The stack holds function parameters and local variables; it grows and shrinks automatically as stack frames are set up and torn down.
At this point some readers will ask: on x86 CPUs segmentation is mandatory and paging is optional, so why is only paging mentioned here? The answer is that segmentation is still there. It exists first for compatibility with older 16-bit programs, and second because the Ring 0 and Ring 3 privilege levels still have to be distinguished. If you look at the GDT (Global Descriptor Table) with SoftICE, you will see something like the following:
GDTbase=80036000 Limit=03FF
0008 Code32 Base=00000000 Lim=FFFFFFFF DPL=0 P RE
// code segment for kernel-mode drivers
0010 Data32 Base=00000000 Lim=FFFFFFFF DPL=0 P RW
// data segment for kernel-mode drivers
001B Code32 Base=00000000 Lim=FFFFFFFF DPL=3 P RE
// code segment for applications
0023 Data32 Base=00000000 Lim=FFFFFFFF DPL=3 P RW
// data segment for applications
What does this mean?
Consider how a linear address is generated (see Figure 1): the segment base is added to the offset. The conclusion follows directly: if a segment's base address is 0, the segment might as well not exist, because the offset is already the final linear address.
Two other segments exist for the Kernel Processor Control Region and the user-mode Thread Environment Block. So do not be surprised when you see MOV ECX, FS:[2C] in a disassembly and wonder why a logical address is used rather than a linear one. This comes up again later, in the discussion of exception handling.
Second, starting from the stack
In my experience, of all the articles about memory, most talk about the heap and the fewest talk about the stack. I start with the stack here because it is actually more important than the heap: you can write a program that never uses a heap, but you cannot avoid using the stack. It is just that the stack is managed by the compiler, so errors are less common.
The linker switch /STACK:reserve[,commit] sets the stack size of the process's main thread. If you do not pass a dwStackSize parameter when creating other threads, they use the value given by /STACK as well. Microsoft says that a larger commit value helps performance; I have not verified this, but it sounds reasonable. Usually there is no need to set the stack size at all: by default 1 MB is reserved and two pages (8 KB on x86) are committed.
1 MB is enough for most programs, but to guard against stack overflow three points are worth noting. First, when you need a very large buffer, use a global array or allocate it with VirtualAlloc. Second, pass large structures by reference or by pointer (everyone on earth probably knows this already). Third, consider whether deep recursion could overflow the stack; if it could, simulate the recursion as I described in "Recursion and Goto", in which case a heap or free storage takes the place of the stack. Note also that structured exceptions are what control whether a new stack page gets committed. (This part is brief because many people have already written about it; I recommend Chapter 16 of Jeffrey Richter's "Windows Core Programming".)
Now let's look at how the stack is used.
Suppose we have this very simple function:
int __stdcall add_s(int x, int y)
{
    int sum;
    sum = x + y;
    return sum;
}
Before a call to this function, we usually see instructions like these:
MOV EAX, DWORD PTR [EBP-8]
PUSH EAX
MOV ECX, DWORD PTR [EBP-4]
PUSH ECX
At this point the function's arguments have been pushed onto the stack; the stack pointer ESP has been decremented and the free stack space has shrunk.
After entering the function, you will see the following instructions:
PUSH EBP
MOV EBP, ESP
SUB ESP, 44H
These three instructions set up the stack frame and lower ESP to reserve space for local variables. Once the frame is established, [EBP+*] addresses the function's parameters and [EBP-*] addresses its local variables.
In addition, in many cases you will see the following three instructions:
PUSH EBX
PUSH ESI
PUSH EDI
These three instructions save three general-purpose registers on the stack so that the function can use them to hold variables, which improves speed.
Strangely, my function does not use these three registers at all, yet the compiler still generates the three instructions above.
The contents of the stack are read relative to the base pointer EBP. So for the statement sum = x + y; you will see:
MOV EAX, DWORD PTR [EBP+8]
ADD EAX, DWORD PTR [EBP+0CH]
MOV DWORD PTR [EBP-4], EAX
Here [EBP+8] is x and [EBP+0CH] is y. Remember that arguments are pushed from right to left, so y sits above x, at the higher address.
Now look at what happens when the function exits:
POP EDI
POP ESI
POP EBX
MOV ESP, EBP
POP EBP
RET 8
At this point the stack frame has been torn down and ESP has the same value it had on entry to the function. RET 8 then pops the return address and adds 8 to ESP, so ESP ends up where it was before the arguments were pushed. With the __cdecl calling convention the callee executes a plain RET and the caller does the cleanup with ADD ESP, 8, again leaving the stack the same size as before the call. The use of the stack is handled entirely by the compiler; as long as it does not overflow, it can be regarded as a form of automatic memory management. Two final points: first, the stack does not expand automatically the way a heap does, so once you use up the reserved space it will overflow right on schedule. Second, do not assume that linking with the default parameters gives you a full 1 MB of stack: look at the startup code and you will see that before your code gets the stack, the C run-time library has already used a small part of it.