Windows 95 System Programming SECRETS
(Windows 95 System Programming: The Great Secrets)
Original: Matt Pietrek
Note: Simon Wan
Memory Management
Win32 Process Address Spaces in Windows 95
In Windows 3.x, all programs run in the same address space, so any program can easily read another program's memory. Worse, a program can also modify other programs' memory, which is a highway to disaster for buggy programs. For example, a 16-bit Windows program can even get hold of the 16-bit USER's DGROUP under Windows 95 and scribble garbage into it, and then you can kiss the windowing system goodbye.
Windows 95 gives each process its own address space. By "separate address space" I mean that a program can see only its own memory; memory used by other processes is invisible to it. More precisely, the Windows 95 memory manager uses the CPU's paging mechanism to ensure that only memory owned by the current process is mapped into the CPU's 4GB address space. RAM owned by other processes doesn't appear in the current process's page tables. The biggest benefit is that a buggy program can damage only itself, not anyone else.
Before you get too excited about this "new" Windows feature, I have to tell you that it isn't new at all. Unix has had it for years, and so does Windows NT. All we can say is that Microsoft's mainstream operating system has finally adopted the most basic feature of advanced operating systems. As for Win32s, that poor relation of the Win32 family doesn't use separate address spaces at all.
Although keeping each process's memory separate is important, some memory still has to be shared by all processes. In other words, some pages in every process's linear address space should map to the same RAM. Why? The best example is the system DLLs. Every process needs KERNEL32.DLL, and loading a brand-new copy of KERNEL32.DLL into each process would be an enormous waste. Therefore KERNEL32 (and other system DLLs such as USER32) should live in a shared region. When Windows switches page tables to run another process, it leaves the mappings for shared memory in place. I'll explain why shared memory is necessary with other examples later.
Because Windows 95 keeps the memory of different processes separate, any discussion of how Windows 95 arranges the 4GB address space can't avoid the concept of a memory context. A memory context is basically a set of RAM pages, together with the linear addresses they're mapped to. In other words, a memory context is the operating system's view of a process's linear address space.
Every process has its own memory context. When the Windows 95 scheduler suspends one process and runs another, it must switch memory contexts. Because each process has one, a memory context is sometimes called a process context, and sometimes an address context. Whatever you call it, remember: an address by itself means nothing unless you also say which memory context it belongs to.
At the highest level, the memory layout of a Win32 process under Windows 95 is simple. Of the 4GB address range, the bottom 2GB (0 through 7FFFFFFFh) is reserved for the application, and everything from 2GB up (80000000h through FFFFFFFFh) belongs to the operating system. Each half is further subdivided. Figure 5-1 shows the various pieces of the 4GB address space. If you have the Windows 95 DDK (Device Development Kit), read the "Page Mapping and Address Spaces" section under the "Arenas" topic in its online documentation.
The first 4MB of address space is shared by every process in the system virtual machine. The first 1MB contains the MS-DOS memory image loaded when Windows 95 starts. Another interesting item below 1MB is the lower part of the 16-bit global heap. As I described in Chapter 2 of Windows Internals, in Windows 3.1 every 16-bit global heap segment has a linear address either above 2GB or below 1MB, and segments allocated with the GMEM_FIXED attribute frequently end up below 1MB. You'll see many 16-bit system DLLs in the first 4MB of the address space because many of them (KRNL386, for example) require fixed, page-locked memory. This is important, and I'll come back to it later.
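The top-level split can be written as a tiny lookup. The C sketch below is only an illustration. The enum and function names are hypothetical, and the boundaries are the ones given above. The system half is subdivided further, as Figure 5-1 shows, and the sketch doesn't model that.

```c
#include <assert.h>
#include <stdint.h>

/* Top-level regions of a Win32 process's 4GB address space.
   Boundaries come from the text; the names are hypothetical. */
typedef enum {
    REGION_SHARED_LOW,   /* 0 to 3FFFFFh: shared by all processes in the system VM */
    REGION_PRIVATE_APP,  /* 400000h to 7FFFFFFFh: the application's own region */
    REGION_SYSTEM        /* 80000000h to FFFFFFFFh: reserved for the operating system */
} region;

#define FOUR_MB 0x00400000u
#define TWO_GB  0x80000000u

static region classify_linear(uint32_t linear) {
    if (linear < FOUR_MB)
        return REGION_SHARED_LOW;
    if (linear < TWO_GB)
        return REGION_PRIVATE_APP;
    return REGION_SYSTEM;
}
```

For example, the default Win32 load address of 400000h falls in the private region. An address such as BFF70000h falls in the system half.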
The next region runs from 4MB to 2GB. This is the address space used by Win32 processes. Every Win32 process places its own code, data, and resources in this nearly 2GB range. When a memory context switch occurs, the change is in which set of pages is mapped into this range. RAM pages mapped into this region can't be accessed by other processes unless you explicitly arrange it. The code and data of any DLLs the application uses also go here, alongside its own code and data. So do the application's heaps and stacks (one stack per thread). By default, Win32 programs load at a very low address (4MB). This seems odd unless you really understand paging: how can more than one program load at the same address? The answer is that they share the same linear address, but not the same physical address. In general, a linear address in a process doesn't map to the physical address with the same value. Thanks to paging, each process can believe it owns the entire range from 4MB to 2GB. It can't see other processes' memory, and other processes can't see its memory, even though they use the same linear addresses. The "magic" of paging keeps them apart.
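To make "same linear address, different physical address" concrete, here is a toy model of memory contexts. It is a hypothetical sketch. A real memory context is a set of CPU page tables, not a short list of mappings, and the physical addresses used here are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12
#define PAGE_MASK    0xFFFu
#define MAX_MAPPINGS 8

/* Toy memory context: linear page number -> physical page number.
   (Hypothetical; Windows 95 keeps this information in page tables.) */
typedef struct {
    uint32_t linear_page[MAX_MAPPINGS];
    uint32_t physical_page[MAX_MAPPINGS];
    size_t count;
} memory_context;

/* Map the 4KB page containing `linear` to the page containing `physical`. */
static void map_page(memory_context *ctx, uint32_t linear, uint32_t physical) {
    assert(ctx->count < MAX_MAPPINGS);
    ctx->linear_page[ctx->count] = linear >> PAGE_SHIFT;
    ctx->physical_page[ctx->count] = physical >> PAGE_SHIFT;
    ctx->count++;
}

/* Translate a linear address within one context; returns 0 if unmapped. */
static int translate(const memory_context *ctx, uint32_t linear, uint32_t *physical) {
    uint32_t page = linear >> PAGE_SHIFT;
    for (size_t i = 0; i < ctx->count; i++) {
        if (ctx->linear_page[i] == page) {
            *physical = (ctx->physical_page[i] << PAGE_SHIFT) | (linear & PAGE_MASK);
            return 1;
        }
    }
    return 0;
}
```

In use, two programs both loaded at 400000h get different RAM behind that address. A page shared by both maps to the same RAM in each context.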
There is one exception to the rule that each process has its own 4MB to 2GB range. Windows 95 lets multiple copies (instances) of the same program share the same physical memory when it judges this to be safe. Take program code: a program normally doesn't modify its own code. So when you run several copies of one program, Windows 95 saves memory by mapping the same physical code pages into each copy's address space.
From a pure operating system point of view, it would be best if each 16-bit process had its own address space, just like 32-bit processes. Unfortunately, a great many 16-bit programs depend on being able to see other programs' memory. To stay compatible with them, Windows 95 has to give 16-bit programs weaker protection than Win32 processes get. Windows NT 3.5 lets each Win16 process run in its own address space, but at the cost of more memory and greater complexity. The Windows 95 designers evidently felt the benefit wasn't worth the cost. Ever since I first saw Windows 95, one question has intrigued me: how can 16-bit programs still share one address space when each runs in a different process? The answer is that memory used by 16-bit programs always comes from below 4MB or above 2GB, the so-called shared areas.
Now let's turn to the upper half of the 4GB. As Figure 5-1 shows, it is split in two. The region from 2GB to 3GB is shared by all processes and is intended for ring 3 operating system code. At the bottom of this region you'll find the 16-bit global heap. Above it are the memory-mapped files. This is quite interesting and worth thinking about.
If memory-mapped files live in a region shared by all processes, then obviously any process can see them, even without mapping a view (that is, without calling the Win32 MapViewOfFile function). Yes, that assumption is correct: in Windows 95, a memory-mapped file is accessible to every process. This differs from Windows NT, which uses a more sophisticated paging scheme so that a memory-mapped file is visible only to processes that have mapped a view of it.
The top of the 2GB to 3GB region holds the 32-bit system DLLs (KERNEL32, USER32, and so on). To leave as much room as possible for memory-mapped files, the ring 3 system DLLs are loaded starting just below 3GB and working downward. Here is a fragment of output from the SoftICE/W MOD command that illustrates this:
:MOD
hMod Base     PEHeader      Module Name  EXE File Name
019F BFF70000 0147:BFF70080 KERNEL32     C:\Windows\system\kernel32.dll
01A7 BFF20000 0147:81525AF4 GDI32        C:\Windows\system\gdi32.dll
186F BFEF0000 0147:81525E98 ADVAPI32     C:\Windows\system\advapi32.dll
1827 BFC00000 0147:815270F0 USER32       C:\Windows\system\User32.dll
The second column is the module's load address. KERNEL32 is the first 32-bit system DLL loaded, and it sits very close to 3GB (address BFF70000h). Next comes GDI32 at BFF20000h, packed right up against KERNEL32. You might think these addresses are calculated at load time, but they aren't. Microsoft has a utility (REBASE.EXE in the Win32 SDK) that calculates how much address space each DLL needs. It then computes optimal load addresses so that the system DLLs fit snugly together. After building the system DLLs, Microsoft modifies them to use the load addresses REBASE.EXE calculated. As a result, the system DLLs load as quickly as possible, because the Windows 95 loader doesn't have to apply relocations to them.
The last large block of the Windows 95 address space runs from 3GB to 4GB (C0000000h through FFFFFFFFh). This final 1GB is used by ring 0 system components (that is, VxDs).
Sharing Memory
In Win16, all memory owned by every program and DLL is accessible to every other program and DLL. That's because all Win16 processes use the same local descriptor table (LDT). Sharing memory between processes is therefore trivial: just have two (or more) programs use the same selector. Microsoft warns that memory must be allocated with the GMEM_SHARE attribute before others can access it, but that was never necessary. Really, you can ignore that warning.
Now compare Windows 95 memory management, which keeps each Win32 process's address space separate unless you specify which blocks are shared. Unfortunately, specifying sharing isn't as simple as using the GMEM_SHARE attribute. In fact, passing GMEM_SHARE to GlobalAlloc has no effect. In other words, GMEM_SHARE is useless either way: Win16 doesn't need it, because everything is shared anyway, and Win32 ignores it.
You may have heard so-called Win32 authorities say that the only way to share memory in Windows 95 or NT is to use a memory-mapped file. That is one way, but not the only one. If you just want to share a small amount of memory among different programs, why use a sledgehammer? This discussion focuses on sharing readable and writable data between programs. Don't forget, though, that the parts of the 4GB address space reserved for the system are always shared by every process.
At the lowest level, sharing memory simply means mapping a page of RAM into more than one process's address space. The RAM can be mapped at the same linear address in each process or at different ones. In Windows 95, memory shared through a memory-mapped file always appears at the same linear address in every process, as the PHYS program later in this chapter demonstrates. However, relying on that in your Win32 programs is very dangerous, because Windows NT doesn't guarantee that a memory-mapped file has the same linear address in each process. Many Win32 programming books cover memory-mapped files, so I won't say much more about them here.
The simplest way to share memory isn't widely known. All you have to do is mark a data section of your program as shared. Every instance of the program (or every user of the DLL) will then share that data. Once a Win32 DLL's data section is marked shared, it behaves like the data of a Win16 DLL. It's a lucky thing that Windows 95 gives us such a simple and flexible way to share data. You can create multiple data sections in an EXE or DLL, put all the data you intend to share into one of them, and mark only that one as shared. The other data sections keep their default (nonshared) attributes. The PHYS program demonstrates this.
Normally, the Microsoft compiler puts all initialized data into a section named .data and gives it attributes that don't include IMAGE_SCN_MEM_SHARED. As a result, each new instance gets its own private copy of the data. To share memory, you can tell the compiler to put data in a new section. You pick the name, but only the first 8 characters are significant. For example:
#pragma data_seg ("sharedat")
After this #pragma, declare whatever data variables you want to share. You must initialize them; otherwise the compiler places them in a different data section, the one for uninitialized data. When you've finished declaring variables, restore the default data section by adding this line:
#pragma data_seg ()
Finally, you have to tell the linker that you want the section shared. There are two ways to do this. The traditional way is to set the section attributes in the DEF file:
SECTIONS
    sharedat READ WRITE SHARED
The other way is to specify the attributes on the linker command line, where RWS stands for Read, Write, Shared:
LINK /SECTION:sharedat,RWS
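Putting the pieces together, a shared instance counter might look something like this minimal sketch. It assumes the Microsoft compiler and linker. #pragma comment(linker, ...) is an MSVC feature that embeds the same /SECTION option shown above, so this version needs neither the DEF file nor the command-line switch. The variable and function names are hypothetical. Other compilers generally ignore these pragmas, and in that case the counter is simply private to each process.

```c
#include <assert.h>

/* Shared section (the name "sharedat" uses the full 8 significant
   characters). Every instance, or every process using this DLL, sees
   the same copy. The variable must be initialized, or the compiler
   puts it in the uninitialized-data section instead. */
#pragma data_seg("sharedat")
volatile long g_instance_count = 0;
#pragma data_seg()

/* Same effect as LINK /SECTION:sharedat,RWS (MSVC-specific). */
#pragma comment(linker, "/SECTION:sharedat,RWS")

/* Ordinary (nonshared) data: each instance gets a private copy. */
long g_my_instance_number = 0;

/* Call once per instance at startup. The increment is not atomic; a real
   program would use InterlockedIncrement to avoid races between processes. */
long register_instance(void) {
    assert(g_my_instance_number == 0);   /* already registered? */
    g_my_instance_number = ++g_instance_count;
    return g_my_instance_number;
}
```

Run two copies under Windows 95 and the first gets instance number 1 and the second gets 2. Without the shared section, both would report 1.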
A word of warning is in order. If you initialize shared data with the address of a code or data symbol, things get interesting when the DLL loads at different linear addresses in different processes. Consider these declarations (in a shared data section):
int i = 0;
int *addressof_i = &i;
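A small model shows why these two declarations misbehave, and why the nonshared-pointer fix works. Everything in it is hypothetical: the preferred base of 10000000h, the offset 3000h of i within the image, and the load addresses. A real loader processes the image's base relocation records rather than calling a function like relocate().

```c
#include <assert.h>
#include <stdint.h>

#define PREFERRED_BASE 0x10000000u  /* hypothetical preferred load address */
#define RVA_OF_I       0x00003000u  /* hypothetical offset of i in the DLL */

/* The shared section exists once, system-wide. */
static uint32_t shared_addressof_i;
static int shared_section_in_use = 0;

/* Nonshared data: one copy per process. */
typedef struct {
    uint32_t load_base;
    uint32_t private_addressof_i;
} process_copy;

/* A base relocation: the linker computed `value` assuming PREFERRED_BASE. */
static uint32_t relocate(uint32_t value, uint32_t actual_base) {
    return value - PREFERRED_BASE + actual_base;
}

/* Model of loading the DLL into one process at `actual_base`. */
static process_copy load_dll(uint32_t actual_base) {
    process_copy pc;
    pc.load_base = actual_base;
    /* Nonshared pointer: patched separately for every process. */
    pc.private_addressof_i = relocate(PREFERRED_BASE + RVA_OF_I, actual_base);
    /* Shared pointer: patched only for the first process; after that the
       first process is using it, so later loads can't change it. */
    if (!shared_section_in_use) {
        shared_addressof_i = relocate(PREFERRED_BASE + RVA_OF_I, actual_base);
        shared_section_in_use = 1;
    }
    return pc;
}
```

Loading the DLL at 10000000h in one process and at 20000000h in another leaves the shared pointer at 10003000h. That is correct for the first process but wrong for the second. Each private pointer is correct for its own process.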
The problem is that the value of addressof_i can't be known until the DLL is loaded. Therefore the DLL must contain a fixup record telling the loader to patch addressof_i. When the DLL is first loaded, there's no problem. But suppose another process then loads the DLL at a different address than the first process did. The loader can't modify addressof_i, because the first process is already using it (it's shared, remember?). So for the second process, addressof_i holds the wrong value. You can solve this with an extra pointer: put a pointer to the shared data in a nonshared data variable. Because each process gets its own copy of that pointer, the loader can fix it up correctly in every process.
Windows 95 shares more than just the data you mark as shared. I've already said that everything above 2GB is shared. Windows 95 also quietly shares part of the region below 2GB. If you run multiple copies of a program, or use the same DLL in more than one process, each duplicate copy of the code would be a waste. Code sections don't have the IMAGE_SCN_MEM_SHARED attribute, yet Windows 95 still loads only one copy of the code. It then uses the CPU's page tables to map that code into the other memory contexts.
Sharing code sections this way works very well. The only exception is when a DLL can't be loaded at the same linear address in different processes. Suppose FOO.DLL is used by two processes. Process A loads FOO.DLL at linear address X. Process B uses a different set of DLLs, which includes FOO.DLL. When process B loads FOO.DLL, some other DLL already occupies address X, so FOO.DLL has to go somewhere else. If your program runs into this situation, the solution is to rebase the DLL. Change its preferred load address to a linear address that nothing else uses in those processes.
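Rebasing, whether Microsoft does it with REBASE.EXE for the system DLLs or you do it for FOO.DLL, comes down to finding an address range nobody else uses. Here is a simplified sketch of that search. It is not REBASE.EXE's actual algorithm, and the type and function names are hypothetical. The 64KB alignment reflects the Win32 allocation granularity on x86. The search scans downward from a top address, the way the system DLLs are packed downward from 3GB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULARITY 0x10000u  /* 64KB: Win32 allocation granularity on x86 */

typedef struct { uint32_t base, size; } image_range;

static uint32_t round_up(uint32_t n, uint32_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Do half-open ranges [base, base+size) overlap? 64-bit math avoids wraparound. */
static int overlaps(image_range a, image_range b) {
    return (uint64_t)a.base < (uint64_t)b.base + b.size &&
           (uint64_t)b.base < (uint64_t)a.base + a.size;
}

static int collides(image_range r, const image_range *used, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (overlaps(r, used[i]))
            return 1;
    return 0;
}

/* Highest 64KB-aligned base at or above `bottom` where an image of `size`
   bytes fits below `top` without colliding; 0 if none. Assumes `top` and
   `bottom` are 64KB aligned. */
static uint32_t pick_base(uint32_t top, uint32_t bottom, uint32_t size,
                          const image_range *used, size_t n) {
    uint32_t span = round_up(size, GRANULARITY);
    if (span == 0 || span > top - bottom)
        return 0;
    for (uint32_t c = top - span; ; c -= GRANULARITY) {
        image_range candidate = { c, span };
        if (!collides(candidate, used, n))
            return c;
        if (c - bottom < GRANULARITY)
            return 0;  /* the next step would go below `bottom` */
    }
}
```

Suppose KERNEL32 and GDI32 occupy 90000h and 50000h bytes at the bases in the MOD dump (these sizes are assumed). A 2F000h-byte image then lands at BFEF0000h, which happens to be where ADVAPI32 sits in that dump.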
"Copy on Write" in Windows 95 (copying when writing)
Now that we know Windows 95 shares code so readily, a natural question is how debuggers cope with it. What's the issue? Debuggers write breakpoint instructions (INT 3, opcode 0xCC) into your code. If the code page receiving the breakpoint instruction is shared by two processes, there's a potential problem. Only one of the processes is being debugged, and the other shouldn't be affected. Yet the other process will still hit the breakpoint. When the operating system sees an INT 3 in a process that isn't being debugged, it terminates the process, because the breakpoint is an unhandled exception. So if the Windows 95 memory manager really worked the way the previous section described, you couldn't debug a DLL used by multiple processes without other processes mysteriously terminating. Debugging one instance of a program while another instance runs normally would be equally impossible.
Advanced operating systems such as Unix deal with this problem using a mechanism called copy on write. In a system with copy on write (such as Windows NT), the memory manager uses the CPU's paging mechanism to share memory as much as possible. It makes a separate copy of a RAM page only when it has to. A concrete example makes this clearer. Suppose two instances of a program are running and sharing the same code pages, which are marked read-only. One instance is being debugged, and the user tells the debugger to set a breakpoint somewhere in the program. When the debugger tries to write the breakpoint instruction, it triggers a page fault, because the code page is read-only. The operating system sees the page fault and determines that a debugger is legitimately trying to write to the memory. However, it won't let the write go directly to the shared code page. Instead, the system copies the affected page and changes the debuggee's page table so that it maps the copy. Once the page has been copied and mapped, the system lets the write proceed. The write (the breakpoint) affects only the copy, not the original.
Copy on write isn't used only for shared code. In Windows NT, writable data pages also start out read-only. When the application writes to one, the CPU generates a page fault. The operating system then marks the page readable and writable, copying it first if it's shared. Why go to all this trouble? Because this way the memory manager can share data pages among everyone until someone writes to them. When someone does write, the copy-on-write mechanism kicks in and gives that process its own RAM pages.
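The copy-on-write sequence described above can be modeled in a few dozen lines of C. This is a hypothetical toy, not the Windows NT memory manager. "Physical memory" is a small static array of frames, each page table entry is a struct, and the fault handler is an ordinary function called when a write hits a read-only page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  4096
#define MAX_FRAMES 8

/* Toy physical memory: each frame counts how many page table entries map it. */
typedef struct {
    uint8_t bytes[PAGE_SIZE];
    int refcount;
} frame;

static frame frames[MAX_FRAMES];
static int frames_used = 0;

/* Toy page table entry for one page in one process. */
typedef struct {
    int frame_index;
    int writable;   /* 0 = read-only, as for shared code pages */
} pte;

static int alloc_frame(void) {
    assert(frames_used < MAX_FRAMES);
    frames[frames_used].refcount = 0;
    return frames_used++;
}

/* Map an existing frame read-only, so it can be shared. */
static void map_shared(pte *p, int frame_index) {
    p->frame_index = frame_index;
    p->writable = 0;
    frames[frame_index].refcount++;
}

/* Page fault handler for a write to a read-only page: if another process
   still maps the frame, give this process a private copy first. */
static void on_write_fault(pte *p) {
    frame *old = &frames[p->frame_index];
    if (old->refcount > 1) {
        int copy = alloc_frame();
        memcpy(frames[copy].bytes, old->bytes, PAGE_SIZE);
        frames[copy].refcount = 1;
        old->refcount--;
        p->frame_index = copy;
    }
    p->writable = 1;
}

static void write_byte(pte *p, size_t offset, uint8_t value) {
    if (!p->writable)
        on_write_fault(p);   /* the CPU would raise a page fault here */
    frames[p->frame_index].bytes[offset] = value;
}

static uint8_t read_byte(const pte *p, size_t offset) {
    return frames[p->frame_index].bytes[offset];
}
```

When a debugger plants 0xCC in one process's copy of a shared code page, that process gets a private page. The other process keeps reading the original byte.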
The great advantage of copy on write is that it lets memory be shared as much as possible; the system makes a new copy of shared memory only when it must. Unfortunately, copy on write requires a sophisticated memory manager and clever page table management. Windows 95 has neither, so it doesn't support copy on write directly. This was distressing news for early Windows 95 users. After all, Microsoft had been promoting the idea that Win32 programs run equally well on Windows 95 and NT. When a major feature such as copy on write is missing, "equally well" has a hole in it.
Windows 95 doesn't just blindly write data into shared memory, though. Since something has to be done to make debuggers work, Windows 95 supports what amounts to a simulated copy-on-write mechanism. It comes into play when WriteProcessMemory targets shared memory. The operating system first checks whether the address being written is in shared memory. If it is, the system makes a copy, maps the new pages at the same linear address, and then performs the write. The PHYS program demonstrates that this simulated copy on write really works.
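Here is a hedged sketch of the choice WriteProcessMemory makes under Windows 95, as described above. It also covers the above-2GB limitation discussed in the next paragraph. The enum and function names are hypothetical. The text doesn't say exactly what Windows 95 does with a write above 2GB, only that debuggers can't rely on it there, so the sketch merely reports that case. The shared region below 4MB isn't modeled.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    WRITE_IN_PLACE,    /* page belongs only to the target process */
    COPY_THEN_WRITE,   /* shared page below 2GB: copy it, map the copy at
                          the same linear address, then write */
    NO_PRIVATE_COPY    /* shared page above 2GB (e.g. KERNEL32): the
                          simulated copy on write doesn't apply */
} wpm_plan;

static wpm_plan plan_write(uint32_t target, int page_is_shared) {
    assert(target >= 0x00400000u);  /* region below 4MB not modeled */
    if (!page_is_shared)
        return WRITE_IN_PLACE;
    if (target < 0x80000000u)
        return COPY_THEN_WRITE;
    return NO_PRIVATE_COPY;
}
```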
WriteProcessMemory is enough to let debuggers handle most DLLs, but it doesn't work for the region above 2GB. Because system DLLs such as KERNEL32 sit above 2GB, ordinary application debuggers can't debug them the way they can under Windows NT. Try it: start your favorite application debugger under Windows 95 and try to step into a system call. Whether you use the Visual C++ debugger or Turbo Debugger, it silently steps over the call, even if you're in a disassembly window and explicitly ask it to step in. To step into Windows 95 system code, you need a system-level debugger such as SoftICE/W or WDEB386.
Memory Contexts
Abstract descriptions of memory contexts are fine, but some hands-on experience is better. Windows 95 has to maintain data structures that record which RAM pages are mapped into which process. To understand Windows 95 memory contexts, you need to understand the CPU's paging mechanism. I'll take you on a quick tour of 80386 paging and skip the more advanced details. If you're interested in paging, see the Intel manuals or another book on the 386 architecture.
The 80386 CPU uses a two-level table lookup to convert a linear address into a physical address, which it then puts on the address bus. The first-level table, called the page directory, is 4KB in size and is treated as an array of 1024 DWORDs. Each DWORD contains the physical address of a 4KB page table. A page table is also an array of 1024 DWORDs, and each of its DWORDs contains the physical address of a 4KB page of physical memory (RAM).
To use the page directory and page tables, the CPU divides the 32-bit linear address into three parts, as shown in Figure 5-5. The CPU uses the top 10 bits as an index into the page directory to select a page table. The next 10 bits serve as an index into that page table, selecting an entry that holds the starting address of a 4KB page of RAM. The last 12 bits are the offset within that 4KB page and pinpoint an individual byte.
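The address split and the two-level lookup translate directly into C. In this sketch the page directory holds C pointers to page tables (NULL means "not present") instead of physical addresses, which is a simplification. Page table entries use the real 386 layout: a 4KB-aligned physical address in bits 31 through 12 and the Present flag in bit 0. The physical address in the example is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRIES     1024
#define PTE_PRESENT 0x1u

typedef struct {
    uint32_t dir;     /* bits 31..22: index into the page directory */
    uint32_t table;   /* bits 21..12: index into the selected page table */
    uint32_t offset;  /* bits 11..0: byte offset within the 4KB page */
} linear_parts;

static linear_parts split_linear(uint32_t linear) {
    linear_parts p;
    p.dir = linear >> 22;
    p.table = (linear >> 12) & 0x3FFu;
    p.offset = linear & 0xFFFu;
    return p;
}

/* Two-level lookup. Simplification: dir[] holds C pointers to page tables
   (NULL = not present) rather than physical addresses. Returns 0 for a
   not-present table or page, where the CPU would raise a page fault. */
static int translate(uint32_t *dir[ENTRIES], uint32_t linear, uint32_t *physical) {
    linear_parts p = split_linear(linear);
    uint32_t *table = dir[p.dir];
    if (table == NULL)
        return 0;
    uint32_t pte = table[p.table];
    if (!(pte & PTE_PRESENT))
        return 0;
    *physical = (pte & 0xFFFFF000u) | p.offset;
    return 1;
}
```

As a worked example, BFF70080h splits into directory index 767 (2FFh), table index 880 (370h), and offset 80h.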
Where does the CPU find the page directory? In the CR3 register, a special register introduced with the 80386. The crudest way to implement memory contexts would be to create a page directory and 1024 page tables for each process. The operating system would then change CR3 at the appropriate times to point to the current process's page directory.
The problem with this approach is that mapping the entire 4GB address space takes 1024 page tables of 4KB each. That would consume 4MB of memory per process, which isn't economical. Windows 95 instead keeps a single 4MB region for page tables. When necessary, it modifies entries in the page directory, which lets the CPU quickly change which pages are mapped.
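A toy model shows the contrast. With the crude approach, a context switch just loads CR3 with the next process's page directory, but each process then needs its own directory and tables (1024 times 4KB, or 4MB). The sketch below instead models the Windows 95 idea described above. One page directory stays loaded, and a context switch rewrites only the directory entries covering the private 4MB to 2GB region. Those are entries 1 through 511, since each entry maps 4MB. This is a simplified, hypothetical model: process_context is invented for illustration and is not a Windows 95 data structure, and the entry values are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PDE_COUNT     1024
#define FIRST_PRIVATE 1     /* PDE 1 maps 4MB to 8MB */
#define LAST_PRIVATE  511   /* PDE 511 maps the 4MB just below 2GB */
#define PRIVATE_PDES  (LAST_PRIVATE - FIRST_PRIVATE + 1)

/* The single page directory that stays loaded in CR3 in this model. */
static uint32_t page_directory[PDE_COUNT];

/* Hypothetical per-process record of its private-region directory entries. */
typedef struct {
    uint32_t private_pdes[PRIVATE_PDES];
} process_context;

/* Context switch: swap in the next process's private entries. Entry 0
   (the shared first 4MB) and entries 512..1023 (above 2GB) stay untouched. */
static void switch_context(const process_context *next) {
    assert(next != NULL);
    memcpy(&page_directory[FIRST_PRIVATE], next->private_pdes,
           sizeof next->private_pdes);
}
```

After switching from one process to another, the entry for 4MB to 8MB changes, while an entry in the shared region above 2GB keeps its value.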
You may worry that 4MB just for paging is still too much. Don't worry: the operating system can tell the CPU that a page table isn't present in memory. The page directory and page tables rarely use anywhere near 4MB of physical memory, but they do occupy 4MB of address space, starting at FF800000h. The page directory is also located in this 4MB region. You can observe this with SoftICE/W.
Windows 95 Memory Management Functions
Windows 95 memory management is divided into four layers, each built on the layer below it. The bottom layer consists of services provided by the VMM (Virtual Machine Manager) for allocating large blocks of memory and manipulating pages. Applications don't call this layer directly; KERNEL32.DLL uses it to implement the higher-level functions.
The second layer is the Virtualxxx functions provided by KERNEL32: VirtualAlloc, VirtualFree, VirtualProtect, and so on. These functions are built on the VMM services and are used to manage large blocks of memory and individual pages.
The third layer is KERNEL32's Heapxxx functions, including HeapAlloc, HeapFree, HeapCreate, and so on. They're roughly equivalent to the memory functions in the C runtime library (malloc, free, and so forth). In fact, in the Windows NT SDK's C runtime library, malloc is just a wrapper around HeapAlloc.
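To illustrate the layering, here is a toy two-level allocator. Its bottom layer is page-granular and stands in for VirtualAlloc. The heap layer above it carves small blocks out of those pages, the way HeapAlloc sits on top of the Virtual functions. Everything here is hypothetical and deliberately minimal. In particular, this "heap" never frees memory, whereas HeapFree does.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE   4096u
#define ARENA_PAGES 16u

/* Bottom layer (stand-in for VirtualAlloc): hands out whole pages. */
static unsigned char arena[ARENA_PAGES * PAGE_SIZE];
static size_t pages_used = 0;

static void *page_alloc(size_t npages) {
    if (npages == 0 || npages > ARENA_PAGES - pages_used)
        return NULL;
    void *p = &arena[pages_used * PAGE_SIZE];
    pages_used += npages;
    return p;
}

/* Upper layer (stand-in for HeapAlloc): carves small blocks out of pages. */
typedef struct {
    unsigned char *next;   /* first free byte in the current run of pages */
    size_t left;           /* bytes remaining in the current run */
} toy_heap;

static void *toy_heap_alloc(toy_heap *h, size_t n) {
    assert(h != NULL);
    n = (n + 7) & ~(size_t)7;              /* keep blocks 8-byte aligned */
    if (n == 0)
        n = 8;
    if (n > h->left) {                     /* ask the lower layer for pages */
        size_t npages = (n + PAGE_SIZE - 1) / PAGE_SIZE;
        unsigned char *p = page_alloc(npages);
        if (p == NULL)
            return NULL;
        h->next = p;                       /* the old run's leftover is abandoned */
        h->left = npages * PAGE_SIZE;
    }
    void *block = h->next;
    h->next += n;
    h->left -= n;
    return block;
}
```

Two small requests (10 and 20 bytes) are both satisfied from a single page. A 5000-byte request then pulls two more pages from the bottom layer.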
The final layer is the Localxxx and Globalxxx functions. Unlike in Win16, the two groups are basically identical; LocalAlloc, for instance, is exactly the same as GlobalAlloc. KERNEL32 exports both functions, but both names have the same function address. The Localxxx and Globalxxx functions are really just thin wrappers over the Heapxxx functions, so there's not much reason to use LocalAlloc and GlobalAlloc in Win32 programs. GlobalAlloc no longer allocates a separate segment as it did in Win16, and LocalAlloc no longer carves memory out of the program's data segment as Win16's LocalAlloc did. These functions survive in Win32 mainly to make existing Win16 programs easier to port.