Windows 95 System Programming Secrets Learning Notes --- Chapter 5 (2)

xiaoxiao2021-03-06  41

"Copy On Write" in Windows 95 (copying when writing)

Since Windows 95 does its best to share program code, we naturally want to know how it copes with the problems that sharing creates. What problems? Well, a debugger writes breakpoint instructions (INT 3, opcode 0xCC) into your program's code. If the code page being debugged is shared by two processes, there is a potential problem. The debugger is attached to only one process, so the other process should not be affected even if it runs into the breakpoint. But when the operating system sees an INT 3 in a process that is not being debugged, it terminates that process, because the exception cannot be handled. So if the Windows 95 memory manager really worked as I have described so far, you could not debug a DLL used by multiple processes without the other processes inevitably and inexplicably dying. Nor could you debug one instance of a program while a second instance keeps running normally.
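To make the breakpoint mechanics concrete, here is a minimal sketch of how a debugger patches INT 3 into code and later restores the original byte. It works on an ordinary byte buffer rather than another process's memory, and the names `Breakpoint`, `set_breakpoint`, and `clear_breakpoint` are my own, not from any debugger API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC  /* one-byte breakpoint instruction */

/* Hypothetical bookkeeping for one breakpoint. */
typedef struct {
    size_t  offset;    /* where in the code buffer the breakpoint sits */
    uint8_t original;  /* the byte that INT 3 replaced */
} Breakpoint;

/* Patch INT 3 into the code and remember the byte it overwrote. */
Breakpoint set_breakpoint(uint8_t *code, size_t size, size_t offset)
{
    assert(offset < size);
    Breakpoint bp = { offset, code[offset] };
    code[offset] = INT3_OPCODE;
    return bp;
}

/* Put the original byte back so the instruction can run normally again. */
void clear_breakpoint(uint8_t *code, Breakpoint bp)
{
    code[bp.offset] = bp.original;
}
```

If the buffer were a code page shared by two processes, the 0xCC written by `set_breakpoint` would be visible to both, which is exactly the problem described above.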

Advanced operating systems such as UNIX deal with this problem through a mechanism called "copy on write". In a system with copy on write (such as Windows NT), the memory manager uses the CPU's paging mechanism to share memory as much as possible and makes a private copy of a RAM page only when it has to.

A concrete example makes this clearer. Suppose two instances of a program are running and share the same code pages (all marked read-only). One instance is being debugged, and the user tells the debugger to set a breakpoint somewhere. When the debugger tries to write the breakpoint instruction, a page fault is triggered (because the code page is read-only). On seeing this page fault, the operating system first determines that the process is legitimately allowed to read this memory. It then concludes that the write must not land on the shared code page. So the system copies the affected page and changes the page table of the process being debugged so that the linear address maps to the new copy. Once the page has been copied and mapped, the system lets the write go through. The write (the breakpoint) affects only the copied page, not the original.

Copy on write is not used only for shared program code. In Windows NT, writable data pages also start out read-only. When the application writes to one of those pages, the CPU generates a page fault, and the operating system then changes the page to read/write. Why go to so much trouble? Because this lets the memory manager share data pages as well, as long as nobody writes to them. If a process does write later, the copy-on-write mechanism steps in and gives each process its own RAM pages.

The greatest advantage of copy on write is that it lets as much memory as possible be shared. The system makes a new copy of shared memory only when necessary. Unfortunately, copy on write requires a sophisticated memory manager and elaborate page table management, and Windows 95 falls short here: it does not directly support copy on write. This was a real pain for early Windows 95 users. After all, Microsoft had been claiming that Win32 programs run equally well on Windows 95 and NT. With a major feature such as copy on write missing, that claim has a hole in it. Windows 95 is not so foolish as to write blindly into shared memory, though. Since something must be done to let debuggers work, Windows 95 supports what I call "pseudo copy on write". Under this scheme, when WriteProcessMemory hits a page fault on shared memory, the operating system first determines whether the target address lies in shared memory. If it does, the system makes a copy of the page, maps the new page at the same linear address, and only then performs the write. The PHYS program demonstrates the pseudo copy-on-write mechanism in action.
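The copy-then-write sequence can be sketched as a small simulation. This is not Windows 95 code: `PhysMem`, `Mapping`, and `cow_write` are hypothetical names, physical memory is just an array of frames, and each process is reduced to a single page-table entry.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096
#define MAX_FRAMES 8

/* Simulated physical memory: a few page frames with share counts. */
typedef struct {
    uint8_t  data[MAX_FRAMES][PAGE_BYTES];
    unsigned refs[MAX_FRAMES];   /* how many mappings point at each frame */
    unsigned next_free;          /* index of the next unused frame */
} PhysMem;

/* One simulated page-table entry belonging to one process. */
typedef struct {
    unsigned frame;      /* which frame the linear page maps to */
    int      writable;   /* 0 = read-only, 1 = read/write */
} Mapping;

/* Write one byte through a mapping. A write to a read-only page plays the
   role of the page fault: a shared frame is copied first and the mapping is
   repointed at the copy. Returns 0 on success, -1 if no frame is free. */
int cow_write(PhysMem *pm, Mapping *m, unsigned offset, uint8_t value)
{
    assert(offset < PAGE_BYTES);
    if (!m->writable) {
        if (pm->refs[m->frame] > 1) {
            if (pm->next_free >= MAX_FRAMES)
                return -1;
            unsigned copy = pm->next_free++;
            memcpy(pm->data[copy], pm->data[m->frame], PAGE_BYTES);
            pm->refs[m->frame]--;
            pm->refs[copy] = 1;
            m->frame = copy;             /* same linear page, new frame */
        }
        m->writable = 1;
    }
    pm->data[m->frame][offset] = value;
    return 0;
}
```

After a write through one mapping, the other mapping still points at the original frame and still sees the original byte, which is the property a debugger needs when it plants a breakpoint.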

Although WriteProcessMemory is enough to let a debugger work with most DLLs, it does not help with anything above 2GB. Since system DLLs such as KERNEL32 live above 2GB, an ordinary application debugger cannot debug them the way it can under Windows NT. Try it: start your favorite application debugger under Windows 95 and try to step into a system call. Both the Visual C++ debugger and Turbo Debugger step right past the system call, even if you are in the disassembly window and explicitly ask to enter the call. If you want to get into Windows 95 system code, you need a system-level debugger such as SoftICE/W or WDEB386.

Translator's note:

Step Into and Step Out are debugger commands. The former enters the function being called; the latter runs until the current function returns to its caller.

PHYS program

Beneath its surface, PHYS contains some low-level system code that Microsoft would probably rather you did not know about. In a well-designed operating system, applications should not be able to examine the mapping between physical memory and linear addresses, and in general they have no need to. But that is exactly the core of what PHYS does. Since Windows 95 provides no way to obtain page mapping information, PHYS has to go around the operating system. Part of PHYS walks on the edge of the blade and runs at Ring 0 (the highest privilege level of the x86 CPU). Ordinary applications run at Ring 3 and cannot enter Ring 0 except under the careful control of the operating system. Since the Ring 0 code that PHYS needs is not sanctioned by the operating system, I had to write a general mechanism that lets Ring 3 Win32 programs call Ring 0 code. You can easily adapt the PHYS Ring 0 code for use in your own applications.

To map a linear address to a physical address, the GetPhysicalAddrFromLinear function has to "party with" the page tables. "Party with" is official Microsoft terminology for doing things you are not supposed to do. Page tables are a complex topic that I describe in the "Memory contexts" section below. If you do not know what page tables are, just think of them as a data structure that describes the mapping between linear addresses and physical addresses. Page tables are maintained by the operating system and consumed by the CPU. Read the CPU manual and you will find that the address of the page directory is held in register CR3. Unfortunately, you need high privilege to read CR3. Trying to read CR3 at Ring 3 only produces a general protection fault (exception 0Dh). When Windows 95 sees this exception, it examines the faulting instruction and finds that the program lacks the necessary privilege. Windows 95 does not terminate the program that issued the instruction; it just quietly returns control to the application. The application, of course, does not get the CR3 value. What does this mean? Windows 95 does not let applications party with the page tables. I could of course write a VxD (which runs at Ring 0) to fetch the value of CR3, but I do not like cluttering my system with VxDs. Besides, even if I could get the value of CR3, there is a bigger problem. CR3 holds a physical address, and there is no good way to convert a physical address into a linear address (PHYS can only use linear addresses). There seems to be nothing I can do with the CR3 value unless I turn paging off.

Another idea is to see whether Windows 95 maps the page tables at a linear address that Ring 3 code can reach directly. We know that the entire 4MB of page tables is always mapped at linear address 0xFF800000 (8MB below the top of the linear address space). So can you just create a pointer to that address and read the page tables directly? No such luck; these tables are not as accessible as you might hope. Every entry in the page directory and in each page table has a User/Supervisor bit that indicates either "accessible at any privilege level" or "accessible only to Ring 0 code". The User/Supervisor bit of every page table is 0, which puts the whole 4MB of page tables off limits to Ring 3 code. Our code must run at Ring 0 to get at the page tables.

When a Ring 3 Windows program calls 16-bit Ring 0 code, the key is the CPU's call gates, which provide a way for lower-privilege code to call higher-privilege code, for example Ring 3 calling Ring 0. Since Windows does not hand you such a call gate, you have to get into the LDT and build one yourself. To get at the LDT, use an INT 2Fh subfunction.
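For reference, a call gate is an 8-byte descriptor whose layout is fixed by the 386 architecture. The sketch below fills one in for a 32-bit target. It only builds the bytes; it does not touch the LDT, and it is not the implementation of the gate-creation routine used by PHYS, which these notes do not show. `CallGate` and `make_call_gate` are my own names, and the selector 0x0028 in the usage note is just the Ring 0 code selector visible in the SoftICE/W listing later in this chapter.

```c
#include <assert.h>
#include <stdint.h>

/* 386 call gate descriptor as it sits in the GDT or LDT (8 bytes). */
typedef struct {
    uint16_t offset_low;   /* target offset, bits 15..0 */
    uint16_t selector;     /* code segment selector of the target */
    uint8_t  param_count;  /* DWORDs copied to the inner stack (0..31) */
    uint8_t  access;       /* present bit, DPL, descriptor type */
    uint16_t offset_high;  /* target offset, bits 31..16 */
} CallGate;

/* Build a present, DPL 3, 32-bit call gate so Ring 3 code may call it. */
CallGate make_call_gate(uint32_t target_offset, uint16_t code_selector,
                        unsigned dword_params)
{
    assert(dword_params < 32);          /* the count field is 5 bits wide */
    CallGate g;
    g.offset_low  = (uint16_t)(target_offset & 0xFFFF);
    g.selector    = code_selector;
    g.param_count = (uint8_t)dword_params;
    g.access      = 0x80      /* present */
                  | (3 << 5)  /* DPL 3: callable from Ring 3 */
                  | 0x0C;     /* type 0Ch: 386 call gate */
    g.offset_high = (uint16_t)(target_offset >> 16);
    return g;
}
```

For example, `make_call_gate(0xC0004856, 0x0028, 1)` yields an access byte of ECh with the target offset split across the first and last words.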

Calling Ring 0 code from a Win32 program is of course trickier, but not too difficult. The GetPhysicalAddrFromLinear function listed below is a good example. First you call GetRing0Callgate to create a call gate selector. This function takes two parameters: the first is the 32-bit linear address of the Ring 0 code you want to execute, and the second is the number of DWORD parameters that will be passed to the Ring 0 code. Once you have a call gate selector, the next step is to store it in a 6-byte far pointer (an FWORD). Six bytes? Yes: in 32-bit mode, a far call is made through a 16-bit selector and a 32-bit offset. The 32-bit offset implies that the selector refers to a 32-bit segment rather than a 16-bit one, somewhat like the flat addressing of Win32 programs. We want to make a far call through the call gate selector so that the CPU switches to Ring 0. In the sample code below, the call gate selector is stored in the highest of the three WORDs that make up the 6 bytes. The offset part of the pointer does not matter, because the CPU ignores it and loads the offset recorded in the call gate descriptor instead. With the pointer built, the code uses inline assembly to call through it (because the C compiler only knows how to make 32-bit near calls). I bracket the call through the call gate with CLI and STI instructions to keep interrupts from occurring while at Ring 0.

DWORD GetPhysicalAddrFromLinear(DWORD linear)
{
    // Create the call gate the first time through.
    if (!callgate1)
    {
        callgate1 = GetRing0Callgate((DWORD)_GetPhysicalAddrFromLinear, 1);
    }

    if (callgate1)
    {
        WORD myFwordPtr[3];     // 16:32 far pointer; selector goes in the high WORD

        myFwordPtr[2] = callgate1;
        __asm push [linear]
        __asm cli
        __asm call fword ptr [myFwordPtr]
        __asm sti

        // The return value is in EAX. The compiler will complain, but...
    }
    else
        return 0xFFFFFFFF;
}
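The layout of the 6-byte far pointer used in the listing above can be checked with a few lines of portable C. This is only an illustration of the byte order (offset in bytes 0 to 3, selector in bytes 4 and 5); `make_far_pointer` and `far_pointer_selector` are hypothetical helpers, and 0x00AF in the usage note is an arbitrary example selector.

```c
#include <assert.h>
#include <stdint.h>

/* Fill a 6-byte 16:32 far pointer (FWORD): the 32-bit offset comes first,
   the 16-bit selector last. x86 is little-endian, so each field is stored
   least significant byte first. */
void make_far_pointer(uint8_t out[6], uint32_t offset, uint16_t selector)
{
    assert(out != 0);
    out[0] = (uint8_t)(offset & 0xFF);
    out[1] = (uint8_t)((offset >> 8) & 0xFF);
    out[2] = (uint8_t)((offset >> 16) & 0xFF);
    out[3] = (uint8_t)((offset >> 24) & 0xFF);
    out[4] = (uint8_t)(selector & 0xFF);   /* bytes 4..5 are WORD index 2 */
    out[5] = (uint8_t)(selector >> 8);
}

/* Read the selector back out of the pointer. */
uint16_t far_pointer_selector(const uint8_t p[6])
{
    return (uint16_t)(p[4] | (p[5] << 8));
}
```

Calling `make_far_pointer(buf, 0, 0x00AF)` leaves the offset bytes zero, which is harmless here because the CPU takes the real offset from the call gate descriptor.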

Entering Ring 0 from a Win32 program brings some odd requirements, which is why I had to write PageTabl.asm in assembly. First, a 16:32 far call makes the CPU push an 8-byte return address onto the stack instead of the usual 4 bytes. So after the EBP frame is set up, the first parameter is at EBP+0Ch rather than EBP+08h. More importantly, to return to Ring 3 you need a 16:32 RETF rather than a 32-bit near return. As with the 16:32 far call, the compiler does not know how to generate a 16:32 RETF.
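The two frame layouts can be written down as C structs to see where the extra 4 bytes come from. This is a sketch of the stack as seen through EBP after the usual `push ebp` / `mov ebp, esp` prologue; the struct and function names are my own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame of a function reached by an ordinary 32-bit near call. */
typedef struct {
    uint32_t saved_ebp;    /* [EBP+00h] */
    uint32_t return_eip;   /* [EBP+04h] 4-byte return address */
    uint32_t first_param;  /* [EBP+08h] */
} NearCallFrame;

/* Frame of a function reached by a 16:32 far call. */
typedef struct {
    uint32_t saved_ebp;    /* [EBP+00h] */
    uint32_t return_eip;   /* [EBP+04h] offset half of the return address */
    uint32_t return_cs;    /* [EBP+08h] selector half, padded to 32 bits */
    uint32_t first_param;  /* [EBP+0Ch] */
} FarCallFrame;

/* Compile-time check of the offsets quoted in the text. */
static_assert(offsetof(NearCallFrame, first_param) == 0x08, "near frame");
static_assert(offsetof(FarCallFrame, first_param) == 0x0C, "far frame");

/* Offset of the first parameter from EBP for either kind of call. */
size_t first_param_offset(int far_call)
{
    return far_call ? offsetof(FarCallFrame, first_param)
                    : offsetof(NearCallFrame, first_param);
}
```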

Let me summarize. To call Ring 0 code from a Win32 program, first write the Ring 0 code, heeding the caveats above. Next, call GetRing0Callgate from your Win32 code, passing the address of your Ring 0 function and its number of parameters. Then build a 16:32 far pointer from the call gate and call through it. Finally, when you no longer need the Ring 0 function, call FreeRing0Callgate to release the call gate. The whole process is not very streamlined, but it is better than being entirely at the mercy of the operating system.

Memory contexts

An abstract description of memory contexts is fine, but sometimes it is better to see how they actually work. Windows 95 must maintain data structures that record which pages of RAM are mapped to which linear addresses in each process. To understand Windows 95 memory contexts, you must understand the CPU's paging mechanism. I will take you on a quick tour of 80386 paging and leave out the more advanced details. If you are interested in paging, consult the Intel manuals or another book on the 386 architecture.

The 80386 CPU uses a two-level table lookup to convert a linear address into a physical address, which is then sent to the address bus. The first-level table is called the page directory. It is 4KB in size and can be viewed as an array of 1024 DWORDs. Each DWORD contains the physical address of a 4KB region called a page table. A page table is also an array of 1024 DWORDs, each containing the physical address of a 4KB page of physical memory (RAM).

To use the page directory and page tables, the CPU splits a 32-bit linear address into three parts, as shown in Figure 5-5. The CPU uses the top 10 bits as an index into the page directory array to select a page table. The next 10 bits are an index into that page table and select the entry that holds the starting address of a 4KB page of RAM. The last 12 bits pick out the exact byte within that 4KB page.
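The 10/10/12 split is easy to express in code. The sketch below only does the bit arithmetic; `LinearParts` and `split_linear` are names I made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields the CPU extracts from a 32-bit linear address. */
typedef struct {
    unsigned dir_index;    /* bits 31..22: index into the page directory */
    unsigned table_index;  /* bits 21..12: index into the page table */
    unsigned offset;       /* bits 11..0:  byte within the 4KB page */
} LinearParts;

LinearParts split_linear(uint32_t linear)
{
    LinearParts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FF;
    p.offset      = linear & 0xFFF;
    assert(p.dir_index < 1024 && p.table_index < 1024);
    return p;
}
```

For example, the linear address 0x00401234 splits into directory index 1, table index 1, and offset 0x234.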

So where does the CPU find the page directory? In the CR3 register, a special register introduced with the 80386. The crudest way to implement memory contexts is to create a page directory and 1024 page tables for each process, and then change CR3 at the appropriate time so that it points to the page directory of the current process.

The problem with this approach is that mapping the entire 4GB address space takes 1024 page tables of 4KB each. Every process would consume 4MB of memory for page tables alone, which is not economical. Windows 95 instead maintains a single 4MB region for page tables and modifies the contents of the page directory as needed, so that the CPU can switch page mappings quickly.

Maybe you worry that spending 4MB just on paging is too much. Don't worry: the operating system can tell the CPU that a page table (4KB) is not present in memory, so that 4KB of RAM is saved. The page directory and page tables rarely use anywhere near 4MB of physical memory, but they do occupy 4MB of address space starting at FF800000h. The page directory is also located within this 4MB. You can observe all of this with SoftICE/W. Finding the linear address of the page directory is easy: use the SoftICE/W CR command to display the value of CR3. On my machine, CR3 is 6EE000h. This is a physical address, so it must be converted to a linear address before a program can use it. The SoftICE/W PHYS command does this easily; it searches all page tables and finds every linear address that maps to the physical address you specify. Issuing the command PHYS 6EE000h gave me two linear addresses, the second of which is FFBFE000h, inside the 4MB of address space reserved for the page tables.
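One way to make sense of the FFBFE000h value is to assume that the 1024 page tables sit in the 4MB window in index order, 4KB apart, so that the table covering the window itself doubles as the page directory. That layout is my assumption (the text above only gives the two addresses), and `page_table_linear` and `pte_linear` are hypothetical helpers, but the arithmetic does reproduce the address SoftICE/W reported.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_TABLES_BASE 0xFF800000u  /* start of the 4MB page table window */

/* Linear address of the page table that maps `linear`, assuming table
   number i lives at PAGE_TABLES_BASE + i * 4KB. */
uint32_t page_table_linear(uint32_t linear)
{
    uint32_t dir_index = linear >> 22;
    assert(dir_index < 1024);
    return PAGE_TABLES_BASE + dir_index * 0x1000u;
}

/* Linear address of the 4-byte page table entry for `linear` under the
   same assumption: one DWORD per 4KB page, in address order. */
uint32_t pte_linear(uint32_t linear)
{
    return PAGE_TABLES_BASE + (linear >> 12) * 4u;
}
```

Under this assumption `page_table_linear(0xFF800000)` is 0xFFBFE000: the table that maps the window is the page directory.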

Since we can locate the page directory with SoftICE/W, we should be able to set a hardware write breakpoint on it to confirm or refute what I said earlier about memory context switching. If the breakpoint fires, the context switch really is done by manipulating the page directory, and the address of the code doing the writing gives us a lead for learning more.

I ran a small experiment in SoftICE/W and confirmed that the page directory does change. To see what was going on, I scrolled back to the instructions just before the write, shown below:

_ContextSwitch
0028:C0004856  MOV     EAX,[C001084C]
0028:C000485B  MOV     EDX,[ESP+04]
0028:C000485F  CMP     EAX,EDX
0028:C0004861  JZ      C0004893
0028:C0004863  PUSH    ESI
0028:C0004864  PUSH    EDI
0028:C0004865  MOV     EDI,FFBFE000
0028:C000486A  MOV     ECX,[EDX+04]
0028:C000486D  MOV     ESI,[EDX]
0028:C000486F  REPZ MOVSD
0028:C0004871  MOV     ECX,[EAX+04]
0028:C0004874  SUB     ECX,[EDX+04]
0028:C0004877  JBE     C0004880
0028:C0004879  MOV     EAX,[C00107E0]
0028:C000487E  REPZ STOSD
0028:C0004880  XCHG    EDX,[C001084C]
0028:C0004886  MOV     EAX,EDX
0028:C0004888  MOV     ECX,[C0010CDC]
0028:C000488E  MOV     CR3,ECX
0028:C0004891  POP     EDI
0028:C0004892  POP     ESI
0028:C0004893  RET

The core of _ContextSwitch is the REPZ MOVSD and the REPZ STOSD. The three MOV instructions before the REPZ MOVSD set up a copy of memory from one place to another. The destination of the copy is FFBFE000h, the start of the page directory that we saw earlier. In other words, a new set of page directory entries is being copied into the page directory. Each DWORD copied corresponds to one page table (1024 at most).

Another interesting point is that the number of DWORDs copied is not hard-coded. Instead, the code loads the DWORD count into ECX. The purpose of the second string instruction, REPZ STOSD, is less obvious. The code just before it compares the number of DWORDs copied this time with the number copied on the previous call to _ContextSwitch. If this count is smaller than the previous one, some page directory entries belonged only to the previous memory context and must not be visible in the new one. When that is the case, REPZ STOSD marks those leftover page directory entries (see the translator's note) as "not present".

Translator's note:

What I call a "page directory data item" is a page directory entry, which some people abbreviate as PDE. Likewise, a page table entry is abbreviated as PTE.
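My reading of the _ContextSwitch listing can be restated as a short C simulation. The `MemoryContext` layout (a pointer at offset 0 and a count at offset 4) is inferred from the `[EDX]` and `[EDX+04]` accesses, and every name here is hypothetical. The page directory is an ordinary array, and the CR3 reload at the end of the real routine is only noted in a comment.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PDE_COUNT 1024

/* Per-context data as the listing appears to use it. */
typedef struct {
    const uint32_t *pdes;   /* [ctx+0]: entries to load into the directory */
    uint32_t        count;  /* [ctx+4]: number of entries */
} MemoryContext;

/* Switch the simulated page directory from *current to next.
   Returns the previous context, as the listing leaves it in EAX. */
const MemoryContext *context_switch(uint32_t page_dir[PDE_COUNT],
                                    const MemoryContext **current,
                                    const MemoryContext *next,
                                    uint32_t not_present)
{
    const MemoryContext *prev = *current;
    if (prev == next)                        /* CMP EAX,EDX / JZ */
        return prev;
    assert(prev->count <= PDE_COUNT && next->count <= PDE_COUNT);

    /* REPZ MOVSD: copy the new context's entries to the directory start. */
    memcpy(page_dir, next->pdes, next->count * sizeof(uint32_t));

    /* REPZ STOSD: if the old context used more entries, blank the rest. */
    for (uint32_t i = next->count; i < prev->count; i++)
        page_dir[i] = not_present;

    *current = next;                         /* XCHG EDX,[current] */
    /* The real routine reloads CR3 here so stale translations are dropped. */
    return prev;
}
```

Switching from a three-entry context to a two-entry one overwrites the first two entries and blanks the third, which matches the role of the two string instructions.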

SoftICE/W is kind enough to label the top of the listing with the name _ContextSwitch. _ContextSwitch is a VMM service, and its address appears in the VMM service table, which is pointed to by a field of the VMM's Device Descriptor Block (DDB). How does SoftICE/W know the name of this service? Take a look at VMM.INC in the Windows 95 DDK. Every line that begins with VxD_Service names a service of the VMM VxD. You will find _ContextSwitch near the bottom. Two neighboring services, _PageModify and _PageModifyPermissions, are also very interesting.

We have found that Windows 95 must keep, for each memory context, a set of page directory entries and a count of them. Once again we can verify this with the SoftICE/W ADDR command:

In this list, Freecell, WinMtask, and Heapwalk are Win16 programs. Interestingly, even though Win16 programs can see each other, Windows 95 treats them as separate processes with different memory contexts. The distinction is only theoretical, however, because the code and data of Win16 programs are always loaded into the shared regions (0 to 4MB, and above 2GB). Win16 programs can therefore always see each other even though they have different memory contexts.

The other processes in the list above are Win32 processes. The Tables column is easy to misread; it actually refers to the number of page tables needed to make up the memory context. Each page directory maps 1024 page tables, each page table entry (PTE) maps a 4KB region, and each page directory entry (PDE) maps 4MB of linear address space. Note that 16-bit programs use only two page tables, because 16-bit programs do not need the Win32 process region (0x00400000-0x7FFFFFFF). Win32 processes do need this region, although most of it is marked "not present". I will not go any further into Windows 95 memory contexts, because they differ too much from Windows 2000/XP and have little value for my current study.

Windows 95 Memory Management Functions

The memory management functions of Windows 95 are divided into four layers, each built on the one below it. At the bottom are the functions provided by the VMM for allocating large blocks of memory and manipulating pages. Applications do not call this layer; only KERNEL32.DLL uses it to implement the higher-level functions.

The second layer is the Virtualxxx functions provided by KERNEL32: VirtualAlloc, VirtualFree, VirtualProtect, and so on. These functions are built on the VMM functions and manage large blocks of memory in units of pages.

The next layer is the Heapxxx functions, including HeapAlloc, HeapFree, HeapCreate, and others. They are roughly equivalent to the memory management functions of the C library (malloc, free, and so on). In fact, in the C library of the Windows NT SDK, malloc is just a wrapper around HeapAlloc. The top layer is the Localxxx and Globalxxx functions. Unlike in Win16, the two groups are essentially the same; LocalAlloc, for example, is identical to GlobalAlloc. KERNEL32 exports both names, but they share the same function address. Localxxx and Globalxxx are really just a layer of wrapping over Heapxxx. There is little reason to use LocalAlloc and GlobalAlloc in a Win32 program. These functions no longer allocate selectors the way Win16's GlobalAlloc does, nor do they carve space out of the program's data segment the way Win16's LocalAlloc does. They survive in Win32 mainly to make existing Win16 programs easier to port. The remainder of this chapter explores the four layers in depth. Except for the bottom-level VMM functions, I will provide pseudocode for each memory management function. Some Win32 functions may not be implemented in Windows 95, or may simply be mapped onto other functions; I will note these cases.

VMM functions

This material no longer applies to the Windows NT family, so I skip it here.

Win32 Virtual functions

Translator's note:

The Virtual functions here are the Win32 APIs whose names begin with Virtual (all related to memory management). They have nothing to do with virtual functions in C++.

The lowest-level Win32 memory management API functions are the Virtual functions, such as VirtualAlloc and VirtualProtect. These functions manage and allocate large blocks of memory. In Windows 95 they work in units of 4KB, which makes them unsuitable as replacements for the malloc and new of C/C++. Most of them are thin wrappers around VMM functions, as you will see when I present the pseudocode for the Virtual functions. Their closest Win16 counterparts are the global heap functions, such as GlobalAlloc. Both the Win16 global heap functions and the Win32 Virtual functions let you allocate large blocks of memory. Unlike the global heap functions, however, the Virtual functions do not use selectors to refer to memory blocks, and their minimum unit is 4KB, whereas the Win16 global heap functions let you allocate blocks in multiples of 20h bytes.

VirtualAlloc

VirtualAlloc serves several purposes. The VMM memory manager regards every page of linear memory as free, reserved, or committed. VirtualAlloc lets you change the state of a range of pages. It can change pages from free to reserved, or from free to committed. It can also change pages that were previously reserved to committed.
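The three transitions just described can be captured in a tiny state machine. This is a model for illustration only: the enum and function names are mine, not Win32 constants, and the model says nothing about what the real VirtualAlloc returns for requests that change nothing.

```c
#include <assert.h>

typedef enum { PS_FREE, PS_RESERVED, PS_COMMITTED } PageState;
typedef enum { REQ_RESERVE, REQ_COMMIT } AllocRequest;

/* Apply an allocation request to one page. Returns 1 if it performed one
   of the three transitions named in the text (free to reserved, free to
   committed, reserved to committed), and 0 if the state was left alone. */
int alloc_transition(PageState *state, AllocRequest req)
{
    assert(state != 0);
    if (req == REQ_RESERVE && *state == PS_FREE) {
        *state = PS_RESERVED;
        return 1;
    }
    if (req == REQ_COMMIT &&
        (*state == PS_FREE || *state == PS_RESERVED)) {
        *state = PS_COMMITTED;
        return 1;
    }
    return 0;
}
```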

That last transition, from reserved to committed, is extremely valuable for implementing sparse memory and stacks. The program first uses VirtualAlloc to reserve a region of address space large enough for anything the program might need. It then installs a structured exception handler to catch page faults within the reserved region. When such a page fault occurs, the program calls VirtualAlloc again, this time to change the pages that caused the fault from reserved to committed. In this way the program can allocate a huge range of memory without tying up physical RAM. Pages need to be mapped to physical RAM only when they are actually used.
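The commit-on-first-touch idea can be simulated without any Win32 calls. In the sketch below, `touch` plays the role of the exception handler that commits a faulting page. `SparseRegion`, `touch`, and the 1MB region size are all assumptions made for the example.

```c
#include <assert.h>

#define REGION_PAGES 256u      /* a reserved region of 256 pages (1MB) */
#define PAGE_BYTES   4096u

/* Which pages of the reserved region have been committed so far. */
typedef struct {
    unsigned char committed[REGION_PAGES];
    unsigned      committed_count;
} SparseRegion;

/* Simulated page fault handler: commit the page containing `offset` the
   first time it is touched. Returns 1 if the access is inside the
   reserved region, 0 if it falls outside (a genuine fault). */
int touch(SparseRegion *r, unsigned long offset)
{
    assert(r != 0);
    unsigned long page = offset / PAGE_BYTES;
    if (page >= REGION_PAGES)
        return 0;
    if (!r->committed[page]) {
        r->committed[page] = 1;      /* reserved becomes committed */
        r->committed_count++;
    }
    return 1;
}
```

Touching three addresses that fall on two distinct pages commits only those two pages; the other 254 pages stay reserved and consume no RAM in this model.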

Normally, VirtualAlloc is used by the operating system and by applications to allocate memory in the application's private address space (below 2GB). However, it has an undocumented flag (0x80000000) that lets it obtain memory above 2GB. You can accomplish the same thing with memory-mapped files. In fact, roughly speaking, the address range used by memory-mapped files is the same range that VirtualAlloc allocates from when given the 0x80000000 flag.

When the Win32 VirtualAlloc function reserves memory, the region starts on the nearest 64KB boundary. It is not VirtualAlloc itself that does this rounding, however, but _PageReserve, which VirtualAlloc calls.

VirtualAlloc first checks whether the requested size is too large. "Too large" here means more than 2GB minus 4MB, the size of the linear address range reserved for an application. VirtualAlloc then calculates how many pages are needed, rounding the starting address down to the nearest 4KB boundary and the ending address up to the nearest 4KB boundary. So if you ask for 2 bytes, one the last byte of one page and the other the first byte of the next page, VirtualAlloc will try to reserve two pages.
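The size check and the rounding rules can be tried out in isolation. This is a sketch of the arithmetic only, not the KERNEL32 implementation; the helper names are hypothetical, and rounding the 64KB boundary downward is my assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 0x1000u                 /* 4KB */
#define MAX_REQUEST 0x7FC00000u            /* 2GB minus 4MB */

/* The "too large" test described above. */
int request_too_large(uint32_t size)
{
    return size > MAX_REQUEST;
}

/* Round an address down to a 64KB boundary (assumed direction). */
uint32_t round_down_64k(uint32_t addr)
{
    return addr & ~0xFFFFu;
}

/* Number of 4KB pages touched by the byte range [start, start + size). */
uint32_t pages_spanned(uint32_t start, uint32_t size)
{
    assert(size > 0);
    uint64_t first = start & ~(PAGE_BYTES - 1);               /* round down */
    uint64_t end = ((uint64_t)start + size + PAGE_BYTES - 1)
                   & ~(uint64_t)(PAGE_BYTES - 1);             /* round up */
    return (uint32_t)((end - first) / PAGE_BYTES);
}
```

A 2-byte request starting at the last byte of a page, such as `pages_spanned(0x00400FFF, 2)`, spans two pages, as in the example above.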

Please credit the original address when reposting: https://www.9cbs.com/read-70679.html
