Chapter 3: Modules, Processes, and Threads
Author: kendiv
Last Update:
Sunday, December 26, 2004
Although this book was written for Windows 95, many of its ideas and details are still very useful for studying Windows 2000 and later systems. These reading notes do not cover the entire book, only Chapters 3, 5, 8, and 10 of the original. The four chapters are:
Chapter 3: Modules, Processes, and Threads
Chapter 5: Memory Management
Chapter 8: The PE and COFF OBJ Formats
Chapter 10: Writing a Win32 API Spy
At the end of each chapter, I port the original book's Windows 95 code to Windows 2000, and for some of the details mentioned in the original I describe what changed in Windows 2000. A good deal of new material is added as well.
Related references:
MSDN
"Windows Core Programming" Jeffrey Richter
"32-bit assembly language program design in the Windows environment" Luo Yunbin
"IBM PC assembly language program design (fifth edition)" Peter Abel
Summary:
Modules, processes, and threads form the core of ring 3 in Windows 95. Almost every API is related to them.
In this chapter we look at the core data structures for modules, processes, and threads. While examining them we will often run into further data structures, which force us to dig deeper. For example, each process contains a pointer to a handle table, and once inside the handle table we find many kernel objects (KERNEL32 objects). Similarly, when we examine threads it is hard to ignore the Thread Information Block (TIB). The TIB plays a very important role in structured exception handling.
In addition to the three key data structures, this chapter also gives pseudocode for the APIs that operate on them directly. This lets you see how the data structures are used, and how the kernel (KERNEL32) handles issues such as thread synchronization.
Before exploring the details of modules, processes, and threads, I must state that the disclosure of this material has not been approved by Microsoft. Microsoft does not want you to build knowledge of these data structures into your code. For applications that need to work with modules, processes, and threads, Microsoft's solution is the ToolHelp32 API, declared in TlHelp32.h.
The ToolHelp32 functions provide limited access to the module, process, and thread data structures, restricted to what Microsoft considers safe. I must emphasize that this access is read-only. However, what Microsoft considers enough information is often not enough for system programmers. For example, ToolHelp32 offers no way to enumerate a process's handle table. If you need to do that, you must read the information directly.
Win32 Modules
A Win32 module represents the code, data, and resources of an EXE or DLL that has been loaded by the Win32 loader. A module in memory therefore corresponds to a file on disk. An EXE or DLL file is not itself a module; a module comes into being only when the Win32 loader loads the file into memory. One advantage of the Win32 PE format is that loading such a file into memory is very simple. The operating system keeps all the high-level information about a loaded module in a structure called the module database.
Applications use an HMODULE to refer to a loaded module. In Win32, an HMODULE is actually the starting address at which the file was loaded. For example, most EXEs are loaded at 0x400000 (4 MB), so their HMODULE is 0x400000. This means that when several EXEs are running, they have the same HMODULE. That is not a problem, because Windows 95/NT maintains a separate address space for each process.
Supplement:
HMODULE and HINSTANCE
In Windows 95/98/ME/NT/2000, HINSTANCE and HMODULE are actually the same thing: if a function takes an HMODULE parameter, passing an HINSTANCE works just as well. Two types exist because in Windows 3.x HMODULE and HINSTANCE identified different things.
The module database sits very close to the start of the address at which the EXE or DLL is loaded, and it describes where things are located within the module. The code and data in a module include not only the binary code the compiler generated for your program, but also the import table, the export table, the resource directory, and so on. The import table tells the loader which functions this module needs from which dynamically linked DLLs. The export table does the opposite: it tells the operating system which functions the module makes available to other modules. The resource section contains a tree structure, similar to a disk directory, that lets the system find a specific resource quickly. The module database records how to find these sections, as well as the required operating system version, whether the program is a console-mode program, and so on.
The format of the module database is public. In Win32 it is actually the PE header of the EXE or DLL. Look in WINNT.H and you will find the IMAGE_NT_HEADERS structure, which consists of a DWORD and two substructures. The information in IMAGE_NT_HEADERS is what Windows 95 uses internally to find the code, data, and resources in a loaded EXE or DLL.
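As a rough illustration of how the PE header is found from a module's load address, the portable sketch below walks a byte buffer instead of a real loaded module. The offsets and magic values come from the published PE format (the DOS header stores the header offset at 0x3C, and the leading DWORD of IMAGE_NT_HEADERS is the signature "PE\0\0"). The helper names find_nt_headers, section_count, and make_fake_image are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DOS_MAGIC       0x5A4Du      /* "MZ" at the start of the image */
#define PE_SIGNATURE    0x00004550u  /* "PE\0\0", the leading DWORD of IMAGE_NT_HEADERS */
#define E_LFANEW_OFFSET 0x3Cu        /* DOS header field holding the PE header offset */
#define FILE_HEADER_LEN 20u          /* size of IMAGE_FILE_HEADER */

/* Read little-endian values byte by byte, so alignment and host order do not matter. */
static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return a pointer to the PE signature inside the image,
   or NULL if the buffer does not look like a PE image. */
static const uint8_t *find_nt_headers(const uint8_t *base, size_t size)
{
    uint32_t off;

    assert(base != NULL);
    if (size < E_LFANEW_OFFSET + 4u || read_u16(base) != DOS_MAGIC)
        return NULL;
    off = read_u32(base + E_LFANEW_OFFSET);
    if (off > size || size - off < 4u + FILE_HEADER_LEN)
        return NULL;                 /* signature plus file header must fit */
    if (read_u32(base + off) != PE_SIGNATURE)
        return NULL;
    return base + off;
}

/* NumberOfSections is the second WORD of the file header that follows the signature. */
static unsigned section_count(const uint8_t *nt_headers)
{
    assert(nt_headers != NULL);
    return read_u16(nt_headers + 4 + 2);
}

/* Hypothetical helper: build a minimal fake image so the sketch can run anywhere. */
static void make_fake_image(uint8_t *buf, size_t size, uint32_t pe_offset)
{
    assert(size >= 0x40u && pe_offset <= size && size - pe_offset >= 4u + FILE_HEADER_LEN);
    memset(buf, 0, size);
    buf[0] = 'M';
    buf[1] = 'Z';
    buf[E_LFANEW_OFFSET]     = (uint8_t)(pe_offset & 0xFFu);
    buf[E_LFANEW_OFFSET + 1] = (uint8_t)((pe_offset >> 8) & 0xFFu);
    buf[E_LFANEW_OFFSET + 2] = (uint8_t)((pe_offset >> 16) & 0xFFu);
    buf[E_LFANEW_OFFSET + 3] = (uint8_t)((pe_offset >> 24) & 0xFFu);
    memcpy(buf + pe_offset, "PE\0\0", 4);
    buf[pe_offset + 4] = 0x4C;       /* Machine = 0x014C (i386) */
    buf[pe_offset + 5] = 0x01;
    buf[pe_offset + 6] = 3;          /* NumberOfSections = 3 */
}
```

On a real system the same walk would start from the HMODULE value itself, since that value is the load address of the image; here a synthetic buffer stands in for it.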
Win32 requires a list of modules for every process. If a process neither implicitly links to a DLL nor loads it through LoadLibrary, the process has no way to see that DLL module in memory (even if the DLL is already loaded elsewhere in the system).
Supplement:
Implicit linking means that at link time the program is statically linked against the import libraries (.lib files) of the DLLs it uses, so the final executable contains an import table, in effect a table of fixup records for every DLL function it references. When the Windows loader loads the executable into memory, it patches each of these records with the actual in-memory address of the function in the DLL, so that dynamic linking works smoothly.
In this situation the kernel (KERNEL32) faces a difficult choice. From the application's point of view it is best for each process to have its own module list, but from the kernel's point of view a single module list makes it easier to share code and resources. Whenever a new process starts or a new DLL is loaded, the kernel can quickly check the single global module array to see whether the EXE or DLL is already loaded. If it is, the kernel increments its reference count. If not, the kernel loads it into memory and creates a new module. KERNEL32 uses two structures to maintain one global module array while making it look as though each process has its own module list. The first structure is the IMTE (Internal Module Table Entry), and the second is the MODREF.
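The check-then-share behavior described above can be sketched in portable C. This is only a model of the idea, not KERNEL32's actual code: the ModuleEntry structure, the fixed MAX_MODULES limit, and the functions load_module and free_module are all hypothetical, and "loading" is reduced to recording a name.

```c
#include <assert.h>
#include <string.h>

#define MAX_MODULES 16               /* arbitrary limit chosen for this sketch */

/* Hypothetical stand-in for a global module entry; not the real IMTE layout. */
typedef struct {
    char name[32];                   /* file name of the EXE or DLL */
    int  ref_count;                  /* 0 means the slot is unused */
} ModuleEntry;

static ModuleEntry g_modules[MAX_MODULES];   /* the single global module array */

/* Return the slot index of the named module, "loading" it if it is not
   present yet. Returns -1 if the table is full. */
static int load_module(const char *name)
{
    int free_slot = -1;
    int i;

    assert(name != NULL && strlen(name) < sizeof g_modules[0].name);

    for (i = 0; i < MAX_MODULES; i++) {
        if (g_modules[i].ref_count > 0 && strcmp(g_modules[i].name, name) == 0) {
            g_modules[i].ref_count++;        /* already loaded: share it */
            return i;
        }
        if (g_modules[i].ref_count == 0 && free_slot < 0)
            free_slot = i;                   /* remember the first unused slot */
    }
    if (free_slot < 0)
        return -1;

    strcpy(g_modules[free_slot].name, name); /* new module: first reference */
    g_modules[free_slot].ref_count = 1;
    return free_slot;
}

/* Drop one reference; the slot becomes reusable when the count reaches zero. */
static void free_module(int index)
{
    assert(index >= 0 && index < MAX_MODULES && g_modules[index].ref_count > 0);
    g_modules[index].ref_count--;
}
```

Loading the same name twice returns the same index with a reference count of 2, which is the sharing the single global array is meant to make cheap.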
Supplement: Some differences under Windows 2000
In Windows NT/2000, the concept of a module in the sense described above no longer really exists. HMODULE and HINSTANCE denote the same thing. What we usually call the handle of a process instance is the HMODULE discussed above. I think the module database corresponds to the process/DLL kernel object, such as the PCB (Process Control Block), and the global module list maintained by the kernel, mentioned above, is really a list of process/DLL kernel objects.
Under Windows 2000, HINSTANCE is in effect defined as follows:
typedef void *HINSTANCE;
HMODULE is defined as follows:
typedef HINSTANCE HMODULE;
When STRICT is defined, HINSTANCE is instead declared as a pointer to the HINSTANCE__ structure.
IMTEs (Internal Module Table Entries)
As shown in the figure above, the IMTEs make up the global module array, and the memory for the array is allocated from the KERNEL32 heap. The system allocates it as a single block with HeapAlloc. When a new module is added, KERNEL32 uses HeapReAlloc to grow the global array dynamically. When KERNEL32 creates a new IMTE, it searches pModuleTableArray for an empty element. If it finds one, it stores the IMTE pointer there. The index of this element will play an important role when we explore MODREFs. The first element of pModuleTableArray (index 0) represents the KERNEL32.DLL module.
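The slot search and growth just described can be modeled in a few lines of portable C. This is a sketch under stated assumptions: realloc stands in for HeapAlloc and HeapReAlloc, the doubling growth policy is a guess (the source does not say how much the array grows), and the Imte type plus the functions add_imte and remove_imte are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical placeholder; the real IMTE layout is undocumented. */
typedef struct { int placeholder; } Imte;

static Imte  **g_table = NULL;   /* stand-in for pModuleTableArray */
static size_t  g_count = 0;      /* number of elements currently allocated */

/* Store the entry in the first empty (NULL) element, growing the array when
   every element is in use. Returns the element's index, or -1 if the
   allocation fails. */
static long add_imte(Imte *entry)
{
    size_t i, new_count, index;
    Imte **grown;

    assert(entry != NULL);

    for (i = 0; i < g_count; i++) {
        if (g_table[i] == NULL) {            /* reuse an empty element */
            g_table[i] = entry;
            return (long)i;
        }
    }

    /* No empty element: grow the array (realloc replaces HeapReAlloc here). */
    new_count = g_count ? g_count * 2 : 4;
    grown = (Imte **)realloc(g_table, new_count * sizeof *grown);
    if (grown == NULL)
        return -1;
    for (i = g_count; i < new_count; i++)
        grown[i] = NULL;                     /* mark the new elements empty */

    index = g_count;
    g_table = grown;
    g_count = new_count;
    g_table[index] = entry;
    return (long)index;
}

/* Empty an element so that a later add_imte call can reuse its index. */
static void remove_imte(size_t index)
{
    assert(index < g_count);
    g_table[index] = NULL;
}
```

Because the first entry added lands at index 0, adding the KERNEL32.DLL entry first reproduces the convention that element 0 represents KERNEL32.DLL.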
Each non-null element of pModuleTableArray represents an EXE or DLL that is loaded into memory somewhere in the system. Each such element is a pointer to an IMTE (called pIMTE in my pseudocode). Although the format of the module database is public (it is the IMAGE_NT_HEADERS structure), the format of the IMTE is not.
Notes:
The book's discussion of the IMTE structure applies to Windows 95. I have not yet studied how this works in Windows 2000/XP/2003, so I do not describe the IMTE structure in detail here. I plan to add that section later. (2004-11-26)
MODREF structure
A process has its own module list, but it knows nothing about the modules loaded by other processes. The MODREF structure ties each process's module list to the global module array. Each process's module list (with the odd exception of KERNEL32.DLL) is actually a linked list of MODREFs: one MODREF for the process's own EXE, and one for each Win32 DLL the process uses. The memory for MODREFs comes from the KERNEL32 heap, which means it lies above 2 GB.
Notes:
As I understand the original text, each process's MODREF list lives in the system's shared global heap, so one process can read another process's MODREF list. Whether this still holds in Windows 2000 and later versions, I do not know. (2004-11-26)
The head of the MODREF list is stored in the process database, and each MODREF structure contains an index into the pModuleTableArray array. Figure 3-2 shows the relationship between MODREFs and IMTEs.
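To make the relationship concrete, the following portable sketch walks a per-process MODREF list and uses each node's index to look up the shared entry in the global array. The structure layouts are simplified assumptions, not the real Windows 95 definitions, and the names Imte, ModRef, ProcessDb, and process_has_module are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical shapes; the real structures hold far more fields. */
typedef struct {
    const char *name;            /* module file name */
} Imte;

typedef struct ModRef {
    struct ModRef *next;         /* next MODREF in this process's list */
    size_t         imte_index;   /* index into the global module array */
} ModRef;

typedef struct {
    ModRef *modref_list;         /* head of the list, kept in the process database */
} ProcessDb;

/* Return nonzero if the process's MODREF list refers to the named module.
   A module that exists in the global array but has no MODREF in this
   process is invisible to the process. */
static int process_has_module(const ProcessDb *pdb, Imte *const *table,
                              size_t table_count, const char *name)
{
    const ModRef *m;

    assert(pdb != NULL && table != NULL && name != NULL);

    for (m = pdb->modref_list; m != NULL; m = m->next) {
        if (m->imte_index < table_count) {
            const Imte *entry = table[m->imte_index];
            if (entry != NULL && strcmp(entry->name, name) == 0)
                return 1;
        }
    }
    return 0;
}
```

The design point this illustrates is that the global array holds one entry per loaded file, while each process holds only small list nodes that point into it by index.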