Process-Wide API Spying
By Anton Bassov
Abstract
API hooking and spying is not an uncommon practice in Windows programming. Development of system monitoring and analysis tools depends heavily upon it. Numerous articles have been written on this subject, and quite a few are even available on The Code Project. To be honest, I did not find these articles all that informative: they all seem to describe the techniques that were presented by Matt Pietrek and Jeffrey Richter a decade ago. Do not get me wrong: I do not want to say anything about the quality of these articles.
This article presents an absolutely universal model of a process-wide API spying solution, capable of hooking all API calls in any user-mode process of our choice, i.e. our spying model is not bound to any particular API at compile time. Our implementation is limited to logging the return values of all API functions that are called by the target module. However, our model is extensible: you can add parameter logging as well. Our spying model is particularly useful for analyzing the internal workings of third-party applications when the source code is not available. In addition to the universal process-wide spying model, we also present one more way to inject a DLL into the target process.
All the programming tricks described in this article are 100% of my own design, although they are certainly based upon the ideas that were first expressed by Matt Pietrek.
Introduction
Process-wide API hooking relies upon the technique of modifying entries in the Import Address Table (IAT) of the target executable module. First of all, you need to understand how imported functions are invoked. At the binary level, calling an imported function is different from an intra-modular call. When you make an intra-modular call, the compiler generates a direct call instruction (0xE8 on Intel CPUs), because the offset of the function within the module, relative to the place from which it is called, is always known, even at compile time. However, if the function is imported, its address is unknown at compile time, although a guess can be made. Therefore, when you call an imported function, the compiler generates an indirect (0xFF, 0x15 on Intel CPUs), rather than direct, call instruction. When you call an imported function, the compiled code looks like the following:
call dword ptr [__imp__CreateWindowExA@48]
This instruction tells the CPU to call the function whose address is stored in the __imp__CreateWindowExA@48 memory location. At load time, the loader writes the address of CreateWindowExA() to the __imp__CreateWindowExA@48 memory location, and the above instruction, when executed, invokes CreateWindowExA(). If we write the address of our user-defined function into the __imp__CreateWindowExA@48 memory location at run time, then all calls to CreateWindowExA() within the module will invoke our user-defined function instead of CreateWindowExA(). Our user-defined function can log or validate parameters, and then call CreateWindowExA() directly by its address. Process-wide API hooking is based upon this idea.
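The idea can be tried out without touching a real Import Address Table. The following is a minimal, portable sketch in which a single function pointer stands in for one IAT entry; all names in it (iat_slot, real_api, hook, install_hook, client_call) are hypothetical and do not come from the article's spying DLL.

```c
#include <assert.h>
#include <stdio.h>

/* Signature shared by the stand-in "API" and its replacement (assumed). */
typedef int (*api_fn)(int);

/* Stand-in for a real imported function such as CreateWindowExA(). */
int real_api(int x) { return x * 2; }

/* One-entry stand-in for the Import Address Table: at load time the
   loader would store the address of the imported function here. */
api_fn iat_slot = real_api;

api_fn saved_original; /* address displaced from the slot by the hook */
int hook_calls;        /* number of calls that went through the hook  */

/* Replacement function: logs the argument, then forwards the call to
   the original function directly by its address. */
int hook(int x)
{
    hook_calls++;
    printf("hooked call, argument = %d\n", x);
    return saved_original(x);
}

/* Client code always calls indirectly through the slot, which is what
   "call dword ptr [__imp__...]" does at the machine level. */
int client_call(int x) { return iat_slot(x); }

/* Install the hook by overwriting the slot at run time. */
void install_hook(void)
{
    saved_original = iat_slot;
    iat_slot = hook;
}
```

After install_hook() runs, client_call() still returns the same value as before, but every call passes through hook() first. The client code itself is never modified, only the slot it reads the address from.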
An API spying solution normally consists of a driver DLL, which actually does all the job of hooking and spying, and a controller application, which injects the driver DLL into the target process. The driver DLL normally communicates with its controller application by window messages: the WM_COPYDATA message is a convenient way to pass a small amount of data from one application to another.
The addresses of all functions imported by the module are stored in the Import Address Table (IAT), every entry of which has the internal form of __imp__xxx. Once the driver DLL has been injected into the target process, it overwrites IAT entries of the target module with the addresses of user-defined proxy functions, implemented by the driver DLL. Each IAT entry replacement normally requires a separate proxy function: a proxy function must know which particular API function it replaces so that it can invoke the original callee. However, with a certain workaround, all IAT entry replacements can be serviced by a single proxy function, and we will show you how this can be done. This is an ultimate hack, but such an approach makes our model absolutely universal: we can hook all API calls in any user-mode process of our choice.
Locating the Import Address Table
In order to start spying, we have to locate the Import Address Table (IAT) of the target executable module. Therefore, we need a brief introduction to the Portable Executable (PE) file format, which is the file format of any executable module or DLL. The MSDN CD provides a very detailed description of the PE file format, so we are not going too deeply into details here: we are mostly concerned with locating the Import Address Table of the target executable module.
A PE file starts with a 64-byte DOS file header (IMAGE_DOS_HEADER structure), followed by a tiny DOS program which, in turn, is followed by a 248-byte NT file header (IMAGE_NT_HEADERS structure). The offset to the NT file header from the beginning of the file is given by the e_lfanew field of the IMAGE_DOS_HEADER structure. The first 4 bytes of the NT file header are the file signature, followed by a 20-byte IMAGE_FILE_HEADER structure, which, in turn, is followed by a 224-byte IMAGE_OPTIONAL_HEADER structure. The code below obtains a pointer to the IMAGE_OPTIONAL_HEADER structure (hMod is a module handle):
IMAGE_DOS_HEADER *dosheader = (IMAGE_DOS_HEADER *)hMod;
// 24 = 4-byte signature + 20-byte IMAGE_FILE_HEADER
IMAGE_OPTIONAL_HEADER *opthdr =
    (IMAGE_OPTIONAL_HEADER *)((BYTE *)hMod + dosheader->e_lfanew + 24);
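The same offset arithmetic can be checked on a plain byte buffer, without windows.h. The sketch below assumes a 32-bit PE image; the helper names (read_u32le, optional_header_offset, make_fake_image) are hypothetical, and the fake image contains only the fields the calculation reads. The e_lfanew field sits at offset 0x3C of the DOS header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    E_LFANEW_OFFSET   = 0x3C, /* position of e_lfanew in IMAGE_DOS_HEADER */
    DOS_HEADER_SIZE   = 64,   /* size of IMAGE_DOS_HEADER                 */
    PE_SIGNATURE_SIZE = 4,    /* "PE\0\0"                                 */
    FILE_HEADER_SIZE  = 20    /* size of IMAGE_FILE_HEADER                */
};

/* Read a little-endian 32-bit value without alignment assumptions. */
uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the offset of the optional header inside the image,
   or 0 if the buffer does not look like a PE image. */
uint32_t optional_header_offset(const uint8_t *image, size_t size)
{
    uint32_t nt;

    if (size < DOS_HEADER_SIZE || image[0] != 'M' || image[1] != 'Z')
        return 0;
    nt = read_u32le(image + E_LFANEW_OFFSET);
    if (nt > size || size - nt < PE_SIGNATURE_SIZE + FILE_HEADER_SIZE)
        return 0;
    if (memcmp(image + nt, "PE\0\0", PE_SIGNATURE_SIZE) != 0)
        return 0;
    return nt + PE_SIGNATURE_SIZE + FILE_HEADER_SIZE; /* e_lfanew + 24 */
}

/* Build a minimal fake image: "MZ", e_lfanew, and the PE signature.
   The caller must pass a buffer of at least e_lfanew + 24 bytes. */
void make_fake_image(uint8_t *image, size_t size, uint32_t e_lfanew)
{
    memset(image, 0, size);
    image[0] = 'M';
    image[1] = 'Z';
    image[E_LFANEW_OFFSET + 0] = (uint8_t)(e_lfanew & 0xFF);
    image[E_LFANEW_OFFSET + 1] = (uint8_t)((e_lfanew >> 8) & 0xFF);
    image[E_LFANEW_OFFSET + 2] = (uint8_t)((e_lfanew >> 16) & 0xFF);
    image[E_LFANEW_OFFSET + 3] = (uint8_t)((e_lfanew >> 24) & 0xFF);
    memcpy(image + e_lfanew, "PE\0\0", PE_SIGNATURE_SIZE);
}
```

Unlike the article's snippet, this version validates the "MZ" and "PE" signatures before trusting e_lfanew, which is a cheap safeguard when the buffer comes from a file rather than from a module the loader has already mapped.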
In actuality, IMAGE_OPTIONAL_HEADER is far from being optional: the information it contains is too important to be omitted. This includes the suggested base address of the module, the size and base addresses of code and data, stack and heap configuration, the address of the entry point, and, what we are mostly interested in, a pointer to the table of directories. A PE file reserves 16 so-called data directories. The most commonly seen directories are import, export, resource and relocation. We are mostly interested in the import directory, which is just an array of IMAGE_IMPORT_DESCRIPTOR structures, with one structure corresponding to each imported module. The code below obtains a pointer to the first IMAGE_IMPORT_DESCRIPTOR structure in the import directory:
IMAGE_IMPORT_DESCRIPTOR *descriptor =
    (IMAGE_IMPORT_DESCRIPTOR *)((BYTE *)hMod +
        opthdr->DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);
The first field of the IMAGE_IMPORT_DESCRIPTOR structure holds an offset to the hint/name table, and its last field holds an offset to the import address table. These two tables are of the same length, with one entry corresponding to each imported function. The code below lists the names and the addresses of IAT entries for all functions imported by the module:
while (descriptor->FirstThunk)
{
    char *dllname = (char *)((BYTE *)hMod + descriptor->Name);
    IMAGE_THUNK_DATA *thunk = (IMAGE_THUNK_DATA *)((BYTE *)hMod +
        descriptor->OriginalFirstThunk);
    int x = 0;
    while (thunk->u1.Function)
    {
        // skip the 2-byte hint to reach the function name
        char *functionname = (char *)((BYTE *)hMod +
            (DWORD)thunk->u1.AddressOfData + 2);
        DWORD *IATentryaddress = (DWORD *)((BYTE *)hMod +
            descriptor->FirstThunk) + x;
        x++; thunk++;
    }
    descriptor++;
}
The inner loop retrieves function names and addresses of IAT entries for the imported module from the IMAGE_IMPORT_DESCRIPTOR structure that corresponds to the given module; the outer loop just proceeds to the next imported module. As you can see, the Import Address Table for the imported module is nothing more than an array of DWORDs. All we have to do in order to start spying is to fill this array with the addresses of our user-defined proxy functions. As we promised, we will show you a trick that makes it possible for all IAT entry replacements to be serviced by a single proxy function.
Implementing the spying solution
Our spying team consists of 4 members: ProxyProlog(), Prolog(), ProxyEpilog() and Epilog(). As their names suggest, ProxyProlog() and Prolog() are invoked before the actual callee takes control; ProxyEpilog() and Epilog() are invoked after the actual callee returns. ProxyProlog() and ProxyEpilog() are implemented as naked assembly routines; Prolog() and Epilog() are just regular C functions. The actual spying job is done by Prolog() and Epilog(). The only task of ProxyProlog() and ProxyEpilog() is to save and restore CPU registers and flags before and after Prolog() and Epilog() perform their tasks: if we want the target process to keep on functioning properly, the whole process of spying must leave everything intact, at least as far as the API function and its client code are concerned.
Windows uses a flat memory model, which means code and data reside in a single address space, rather than in separate segments. This implies we can fill an array with machine instructions and call it as a function. Look at the code below:
DWORD addr = (DWORD)&retbuff[6];
retbuff[0] = 0xFF; retbuff[1] = 0x15;
memmove(&retbuff[2], &addr, 4);
addr = (DWORD)&ProxyEpilog;
memmove(&retbuff[6], &addr, 4);
This is a 6-byte indirect call instruction. The first 2 bytes are occupied by the call opcode itself, and the 4 bytes that follow are occupied by the operand: they hold the address of the variable that contains the address of ProxyEpilog(). In this particular case, this variable comes immediately after the 6-byte instruction. When the instruction pointer hits retbuff, our handcrafted code is going to call ProxyEpilog(). A call instruction implicitly pushes onto the stack the address to which the invoked routine must return control; this is how the function knows its return address. In our case, the pointer to the variable that contains the address of ProxyEpilog() (the address of retbuff[6]) is going to be on top of the stack when ProxyEpilog() starts execution.
When DllMain() is called with fdwReason set to DLL_PROCESS_ATTACH, we fill the retbuff array with the machine instructions (retbuff is a global BYTE array), dynamically allocate some memory, allocate a TLS index, and store the memory we have allocated in the thread local storage. Every time DllMain() is called with fdwReason set to DLL_THREAD_ATTACH, it must dynamically allocate some memory and put it aside into thread local storage.
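To make the byte layout of the handcrafted call stub concrete, here is a portable sketch that only builds the bytes and never executes them. It treats addresses as plain 32-bit numbers, as on 32-bit x86; the names build_indirect_call, write_u32le and STUB_SIZE are hypothetical, and the addresses used with it are arbitrary sample values.

```c
#include <assert.h>
#include <stdint.h>

enum { STUB_SIZE = 10 }; /* 6-byte instruction + 4-byte target variable */

/* Store a 32-bit value in little-endian byte order, as x86 expects. */
void write_u32le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

/* Fill stub with "call dword ptr [stub_address + 6]" followed by the
   variable that the instruction reads, which holds the target address.
   stub_address is the address the stub will occupy in memory. */
void build_indirect_call(uint8_t stub[STUB_SIZE],
                         uint32_t stub_address, uint32_t target)
{
    stub[0] = 0xFF;                           /* opcode              */
    stub[1] = 0x15;                           /* ModR/M: [disp32]    */
    write_u32le(stub + 2, stub_address + 6);  /* operand: &stub[6]   */
    write_u32le(stub + 6, target);            /* the target variable */
}
```

For a stub located at 0x00401000 that should reach a routine at 0x10002000, the buffer becomes FF 15 06 10 40 00 followed by 00 20 00 10. When the CPU executes the stub, the pushed return address is 0x00401006, i.e. the address of the variable itself, which is exactly what the proxy routines rely on.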
Now let's look at how we overwrite IAT entries, after obtaining the name and the address of the IAT entry for the given imported function:
struct RelocatedFunction { DWORD proxyptr;
    DWORD funcptr; char *dllname; char *funcname; };
// on systems that enforce DEP this buffer must be executable memory
// (for example, from VirtualAlloc with PAGE_EXECUTE_READWRITE)
BYTE *ptr = (BYTE *)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 32);
RelocatedFunction *reloc = (RelocatedFunction *)&ptr[6];
DWORD addr = (DWORD)&ProxyProlog;
reloc->proxyptr = addr;
reloc->funcname = functionname;
reloc->dllname = dllname;
// remember the original API address stored in the IAT entry
memmove(&reloc->funcptr, IATentryaddress, 4);
ptr[0] = 0xFF; ptr[1] = 0x15; memmove(&ptr[2], &reloc, 4);
DWORD byteswritten;
// overwrite the IAT entry with the address of our array
WriteProcessMemory(GetCurrentProcess(), IATentryaddress, &ptr, 4, &byteswritten);
For each IAT entry replacement, we dynamically allocate an array, the first 6 bytes of which are occupied by an indirect call instruction, and the 16 bytes that follow are processed as a RelocatedFunction structure, the first member of which is set to the address of ProxyProlog() (it definitely has to be the first). The other fields are set to the address and the name of the imported function, plus the name of the DLL from which the given function is being imported. The first 2 bytes of the array are 0xFF and 0x15, and the 4 bytes that follow contain the address of the RelocatedFunction structure. We replace the IAT entry with the address of such an array. Each IAT entry replacement requires a separate array.
As a result, every call to the API function will, in actuality, call our handcrafted code that calls ProxyProlog(). As we said, a call instruction implicitly pushes on the stack the address to which the invoked routine must return. In our case, the pointer to the RelocatedFunction structure is going to be on top of the stack, and the original return address, i.e. the address to which the API function must return control, is going to be one stack entry below at the time when ProxyProlog() starts execution. Stack entries below the original return address are going to be occupied by the API function arguments. Now let's look at the ProxyProlog() implementation.
__declspec(naked) void ProxyProlog()
{
    _asm {
        push eax
        push ebx
        push ecx
        push edx
        mov ebx, esp
        pushf
        add ebx, 16
        push ebx
        call Prolog
        popf
        pop edx
        pop ecx
        pop ebx
        pop eax
        ret
    }
}
ProxyProlog() saves registers and CPU flags, pushes the value that ESP had at the time when ProxyProlog() started execution, and calls Prolog(). As we said, the pointer to the RelocatedFunction structure is on top of the stack, and the address to which the API function must return control is one stack entry below at the time when ProxyProlog() starts execution. As a result, Prolog() receives, as an argument, a pointer to the stack location where the pointer to the RelocatedFunction structure can be found. By incrementing its argument, Prolog() can find a pointer to the stack location where the original return address is stored.
struct Storage { DWORD retaddress; RelocatedFunction *ptr; };
void __stdcall Prolog(DWORD *relocptr)
{
    // get pointer to RelocatedFunction structure
    RelocatedFunction *reloc = (RelocatedFunction *)relocptr[0];
    // get pointer to return address
    DWORD *retaddessptr = relocptr + 1;
    // save pointer to RelocatedFunction structure and return address in TLS
    DWORD *nestlevelptr = (DWORD *)TlsGetValue(tlsindex);
    DWORD nestlevel = nestlevelptr[0];
    Storage *storptr = (Storage *)&nestlevelptr[1];
    storptr[nestlevel].retaddress = (*retaddessptr);
    storptr[nestlevel].ptr = reloc;
    nestlevelptr[0]++;
    // place API function pointer on top of the stack,
    // so that ProxyProlog()'s ret jumps to the API function
    relocptr[0] = reloc->funcptr;
    // replace the original return address with retbuff,
    // so that the API function returns to our handcrafted code
    retaddessptr[0] = (DWORD)&retbuff;
}
Prolog() saves the pointer to the RelocatedFunction structure and the original return address in the thread local storage, which is organized as a DWORD followed by an array of Storage structures. We treat this array as a stack: the DWORD just indicates the number of stack entries, i.e. it is just a counter. Prolog() saves the pointer to the RelocatedFunction structure and the return address in the topmost stack entry, and increments the counter. After performing the above tasks, Prolog() modifies the CPU stack: the address of the API function, obtained from the RelocatedFunction structure, replaces the pointer to the RelocatedFunction structure, and the address of the retbuff global array, which is filled with the machine instructions in DllMain(), replaces the original return address on the stack.
After Prolog() returns, ProxyProlog() restores registers and CPU flags. Prolog() has modified the CPU stack in such a way that, after ProxyProlog() returns, the program flow jumps to the original callee, i.e. to the API function, upon the return of which the program flow jumps, instead of to the original return address, to our handcrafted code that calls ProxyEpilog().
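The bookkeeping that Prolog() performs, and that Epilog() later undoes, can be modelled with ordinary arrays. The sketch below is a simulation only: addresses are plain 32-bit numbers, the two-element slot array stands in for the two CPU stack entries, and the names ShadowStack, shadow_push, shadow_pop and MAX_NESTING (including its value of 64) are assumptions made for illustration. It also adds bounds checks that the article's code does not have.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_NESTING = 64 }; /* assumed capacity of the per-thread stack */

/* One saved call: original return address plus RelocatedFunction pointer. */
typedef struct {
    uint32_t ret_address;
    uint32_t reloc;
} Storage;

/* Per-thread block kept in TLS: a counter followed by Storage entries. */
typedef struct {
    uint32_t nest_level;
    Storage  entries[MAX_NESTING];
} ShadowStack;

/* Mirrors Prolog(): slot[0] holds the RelocatedFunction pointer and
   slot[1] the caller's return address. Both are saved, then replaced by
   the API address and the address of the return stub. Returns 0 if full. */
int shadow_push(ShadowStack *s, uint32_t *slot,
                uint32_t func_ptr, uint32_t stub_address)
{
    if (s->nest_level >= MAX_NESTING)
        return 0;
    s->entries[s->nest_level].ret_address = slot[1];
    s->entries[s->nest_level].reloc = slot[0];
    s->nest_level++;
    slot[0] = func_ptr;     /* the proxy's ret now jumps to the API   */
    slot[1] = stub_address; /* the API now returns to the return stub */
    return 1;
}

/* Mirrors Epilog(): pops the topmost entry, writes the original return
   address into ret_slot and hands back the RelocatedFunction pointer.
   Returns 0 if the shadow stack is empty. */
int shadow_pop(ShadowStack *s, uint32_t *ret_slot, uint32_t *reloc_out)
{
    if (s->nest_level == 0)
        return 0;
    s->nest_level--;
    *ret_slot = s->entries[s->nest_level].ret_address;
    *reloc_out = s->entries[s->nest_level].reloc;
    return 1;
}
```

Because hooked functions can call other hooked functions before returning, entries are pushed and popped in strict last-in, first-out order. That is why a counter plus an array is sufficient, and why the storage has to be per thread.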
Let's look at ProxyEpilog().
__declspec(naked) void ProxyEpilog()
{
    _asm {
        push eax
        push ebx
        push ecx
        push edx
        mov ebx, esp
        pushf
        add ebx, 12
        push ebx
        call Epilog
        popf
        pop edx
        pop ecx
        pop ebx
        pop eax
        ret
    }
}
The implementation of ProxyEpilog() is almost identical to that of ProxyProlog(). ProxyEpilog() saves registers and CPU flags, pushes the value that ESP had at the time when the EAX register was on top of the stack, and calls Epilog(). As a result, Epilog() receives, as an argument, a pointer to the stack location where the return value of the API function can be found. By incrementing its argument, Epilog() can find a pointer to the stack location where the address to which ProxyEpilog() must return is stored. Let's look at Epilog().
void __stdcall Epilog(DWORD *retvalptr)
{
    // get pointer to ProxyEpilog()'s return address
    DWORD *retaddessptr = retvalptr + 1;
    // get return value
    DWORD retval = retvalptr[0];
    // get the original return address and pointer to
    // RelocatedFunction structure from the topmost Storage entry in TLS
    DWORD *nestlevelptr = (DWORD *)TlsGetValue(tlsindex);
    nestlevelptr[0]--;
    DWORD nestlevel = nestlevelptr[0];
    Storage *storptr = (Storage *)&nestlevelptr[1];
    RelocatedFunction *reloc = (RelocatedFunction *)storptr[nestlevel].ptr;
    // replace ProxyEpilog()'s return address with the original one
    retaddessptr[0] = storptr[nestlevel].retaddress;
    // pack all info into the buffer and
    // send it to the controller application
    DWORD id = GetCurrentThreadId();
    // 16 bytes: enough for a 32-bit decimal value, '\n' and the terminator
    char buff[256]; char smallbuff[16]; char secsmallbuff[16];
    strcpy(buff, "Thread "); wsprintf(smallbuff, "%d\n", id);
    strcat(buff, smallbuff); strcat(buff, " - ");
    strcat(buff, reloc->dllname); strcat(buff, "!");
    strcat(buff, reloc->funcname);
    strcat(buff, " - ");
    strcat(buff, "returns ");
    wsprintf(secsmallbuff, "%d\n", retval);
    strcat(buff, secsmallbuff);
    COPYDATASTRUCT data; data.cbData = 1 + strlen(buff); data.lpData = buff; data.dwData = WM_COPYDATA;
    // controllerwnd: handle of the controller window (variable name assumed)
    SendMessage(controllerwnd, WM_COPYDATA, (WPARAM)secwnd, (LPARAM)&data);
}
Epilog() gets the pointer to the RelocatedFunction structure and the original return address from the topmost Storage structure in the thread local storage, and decrements the counter. Then Epilog() modifies the CPU stack: it replaces the address to which ProxyEpilog() must return with the original return address. After performing the above tasks, Epilog() informs the controller application that the API function has returned: the name of the given function, as well as that of the DLL that exports it, are available from the RelocatedFunction structure, a pointer to which was saved in the thread local storage, and the pointer to the return value of the API function is Epilog()'s argument. Epilog() provides the controller application with all the above information by sending a WM_COPYDATA message to the controller window.
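The text that Epilog() assembles with a chain of strcpy/strcat calls can also be produced with a single bounded snprintf call. The following sketch is one possible way to do it, not the article's implementation: the function name format_log_line and the exact message layout (no embedded newlines) are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Format "Thread <id> - <dll>!<function> - returns <value>" into out.
   Returns what snprintf returns: the length the full line needs, which
   is >= out_size when the output had to be truncated. */
int format_log_line(char *out, size_t out_size, unsigned long thread_id,
                    const char *dll, const char *func, long retval)
{
    return snprintf(out, out_size, "Thread %lu - %s!%s - returns %ld",
                    thread_id, dll, func, retval);
}
```

Called with thread id 1234, "USER32.dll", "CreateWindowExA" and a return value of 0, it produces the line "Thread 1234 - USER32.dll!CreateWindowExA - returns 0". Because snprintf never writes past out_size, a long function name truncates the message instead of overrunning the buffer, and the return value tells the caller that truncation happened.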
After Epilog() returns, ProxyEpilog() restores registers and CPU flags. Epilog() has modified the CPU stack in such a way that, after ProxyEpilog() returns, the program flow jumps to the address to which the API function was supposed to return control if no "espionage" was taking place. As you can see, all our "spying activity" cannot disrupt the program execution in any possible way, because it leaves the CPU stack, registers and flags intact, at least as far as the API function and its client code are concerned. Our "spying team" does not care which API function to spy on: our model is absolutely universal, because our implementation is not bound to any particular API function at compile time. Furthermore, our model is extensible.
For the time being, our model is suitable only for listing all API calls and for logging the return values of API functions. If you want to add parameter logging or validation, it can easily be done: the API function arguments are just below the original return address on the CPU stack. However, you must provide our "spying team" with the argument lists of the target API functions, because, unfortunately, there is no way to obtain this information from the PE file. The solution to this problem lies in enhanced communication between the controller application and the spying DLL: the controller application can always get the description of arguments of the target API function from the user, and provide the DLL with this information at run time. Apparently, the RelocatedFunction structure would require one more data member, i.e. a pointer to some array that contains the description of arguments, so that Prolog() would be able to examine the arguments. We leave it for you to decide how to do it.
Warning: if your target executable module dynamically links to the C run-time library, do not try to hook the functions that are imported from MSVCRT.dll. Instead, you should hook the API calls that the C run-time library makes, i.e. overwrite the Import Address Table of MSVCRT.dll's module.
Therefore, we are able to hook all API calls that are made by the target executable module, i.e. outgoing calls. What about the opposite task, i.e. hooking all incoming calls to some particular DLL module (say, kernel32.dll), made by all modules that are loaded into the address space of the target process, including system DLLs?
Hooking all calls to DLL module, made by the target process
Once we know that process-wide API hooking can be achieved by modifying IAT entries of the target executable module, the answer to this question must be obvious. All we have to do is to walk through all modules that are currently loaded into the address space of the target process, and, in each loaded module, overwrite IAT entries of all functions that are imported from kernel32.dll. As a result, we will hook all calls that are made to kernel32.dll by all modules that are currently loaded into the address space of the target process.
Unfortunately, this is only a partial solution. The problem is that any modification of IAT entries in a module affects only the given module. Hence, even if we hook all calls to kernel32.dll in all currently loaded modules, any module that is subsequently loaded into the address space of the target process is not going to be affected: all calls to kernel32.dll made by such a module will remain unhooked.
In order to get a real solution, in addition to the above-mentioned overwriting of IAT entries in all currently loaded modules, we must also overwrite the IMAGE_EXPORT_DIRECTORY of kernel32.dll itself. If we overwrite the IMAGE_EXPORT_DIRECTORY of kernel32.dll, all DLLs that are loaded into the target process in the future will link with our proxy functions, although currently loaded modules are not going to be affected. By combining the modification of the IATs of all currently loaded modules with overwriting the IMAGE_EXPORT_DIRECTORY of kernel32.dll itself, we will hook all calls that are made to kernel32.dll by absolutely all (including yet-to-be-loaded) modules in the address space of the target process. Do not confuse it with system-wide spying: apart from the target process, all other processes in the system will stay intact.
All information about the functions exported by a DLL module can be found in the IMAGE_EXPORT_DIRECTORY structure, which is accessible via the IMAGE_OPTIONAL_HEADER structure. The code below obtains a pointer to the IMAGE_EXPORT_DIRECTORY structure (hMod is kernel32.dll module's handle):
IMAGE_DOS_HEADER *dosheader = (IMAGE_DOS_HEADER *)hMod;
IMAGE_OPTIONAL_HEADER *opthdr = (IMAGE_OPTIONAL_HEADER *)
    ((BYTE *)hMod + dosheader->e_lfanew + 24);
IMAGE_EXPORT_DIRECTORY *exp = (IMAGE_EXPORT_DIRECTORY *)((BYTE *)hMod +
    opthdr->DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress);
IMAGE_EXPORT_DIRECTORY contains the information about the addresses, names and ordinal values of all functions that are exported from the given DLL. The address table is a ULONG array that holds the addresses of all exported functions, the name table is a ULONG array that holds the addresses of function name strings, and the ordinal table is a USHORT array that holds the difference between the real ordinal and base ordinal values. Please note that the addresses of functions and names are given as Relative Virtual Addresses (RVAs). In order to get the actual memory address of the exported function or of its string name, you must add its corresponding entry in the address or name table to the address at which the given module is loaded. The code below lists the names and addresses of all functions that are exported by the DLL module:
ULONG *addressoffunctions = (ULONG *)((BYTE *)hMod + exp->AddressOfFunctions);
ULONG *addressofnames = (ULONG *)((BYTE *)hMod + exp->AddressOfNames);
// loop bound lost in the source; NumberOfFunctions is assumed here
for (DWORD x = 0; x < exp->NumberOfFunctions; x++)
{
    // name x is paired with address x; strictly, the ordinal table
    // maps each name to its index in the address table
    char *functionname = (char *)((BYTE *)hMod + addressofnames[x]);
    DWORD functionaddress = (DWORD)((BYTE *)hMod + addressoffunctions[x]);
}
As you can see, for the time being everything is more or less the same as with listing the imported functions and their names. However, things become a little bit different when it comes to patching the export address table. Its entries must be overwritten not with the actual memory addresses of proxy functions, but with RVAs, i.e. the differences between the actual memory addresses of proxy functions and the address at which the given module is loaded. This means that all proxy functions must be loaded at addresses that are higher than kernel32.dll module's base address, because an RVA cannot be negative. Let's look at how it can be done:
BYTE *writebuff = (BYTE *)VirtualAllocEx(GetCurrentProcess(), 0, 5 * 4096,
    MEM_RESERVE | MEM_TOP_DOWN, PAGE_EXECUTE_READWRITE);
writebuff = (BYTE *)VirtualAllocEx(GetCurrentProcess(), writebuff, 5 * 4096,
    MEM_COMMIT | MEM_TOP_DOWN, PAGE_EXECUTE_READWRITE);
for (int x = 1; x <= exp->NumberOfFunctions; x++)
{
    // get current position in virtual memory chunk
    DWORD a = (x - 1) / 170, pos = a * 16 + (x - 1) * 24;
    BYTE *currentchunk = &writebuff[pos];
    DWORD offset = (DWORD)writebuff - (DWORD)hMod + pos;
    // get name and address of the target function
    char *functionname = (char *)((BYTE *)hMod + addressofnames[x - 1]);
    DWORD functionaddress = (DWORD)((BYTE *)hMod + addressoffunctions[x - 1]);
    // load virtual memory with machine instructions
    // and relocation information
    DWORD addr = (DWORD)&writebuff[pos + 6];
    currentchunk[0] = 0xFF; currentchunk[1] = 0x15;
    memmove(&currentchunk[2], &addr, 4);
    RelocatedFunction *reloc = (RelocatedFunction *)&currentchunk[6];
    reloc->funcname = functionname;
    reloc->funcptr = functionaddress;
    reloc->proxyptr = (DWORD)&ProxyProlog;
    // overwrite export address table
    DWORD byteswritten;
    WriteProcessMemory(GetCurrentProcess(), &addressoffunctions[x - 1],
        &offset, 4, &byteswritten);
}
As a first step, we allocate a chunk of virtual memory at the highest possible address. The version of kernel32.dll on my machine (it runs Windows 2000) exports 823 functions. For each function replacement, we need 6 bytes for the indirect call instruction, plus 16 bytes for the RelocatedFunction structure, i.e. 22 bytes. If we round this number up to 24 bytes, we will be able to fit 170 function replacement chunks in one page of memory (4096 bytes on Intel CPUs), and 16 bytes of every page will remain unused. Therefore, we will need a total of 5 pages of virtual memory. It is a good idea to keep these function replacement chunks from straddling a page boundary. Therefore, the address of every given function replacement chunk can be calculated as follows:
DWORD a = (x - 1) / 170, pos = a * 16 + (x - 1) * 24;
BYTE *currentchunk = &writebuff[pos];
Hence, the RVA of every given chunk, relative to the target module's base address, can be calculated as follows:
DWORD offset = (DWORD)writebuff - (DWORD)hMod + pos;
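The arithmetic behind these two formulas is easy to verify in isolation. The sketch below restates it as small portable functions with a zero-based index, i.e. index corresponds to x - 1 in the loop. The function and constant names (chunk_pos, chunk_rva, pages_needed, SPY_PAGE_SIZE and so on) are hypothetical, and the base addresses used with it are arbitrary sample values.

```c
#include <assert.h>
#include <stdint.h>

enum {
    SPY_PAGE_SIZE   = 4096,                       /* page size on x86  */
    CHUNK_SIZE      = 24,                         /* 22 bytes, rounded */
    CHUNKS_PER_PAGE = SPY_PAGE_SIZE / CHUNK_SIZE, /* 170               */
    PAGE_SLACK      = SPY_PAGE_SIZE - CHUNKS_PER_PAGE * CHUNK_SIZE /* 16 */
};

/* Byte position of the chunk with the given zero-based index. The
   16 unused bytes at the end of every full page are skipped, so no
   chunk ever straddles a page boundary. */
uint32_t chunk_pos(uint32_t index)
{
    uint32_t page = index / CHUNKS_PER_PAGE;
    return page * PAGE_SLACK + index * CHUNK_SIZE;
}

/* RVA of a chunk relative to the module whose export table is patched.
   The result is meaningful only if buffer_base >= module_base. */
uint32_t chunk_rva(uint32_t buffer_base, uint32_t module_base,
                   uint32_t index)
{
    return buffer_base - module_base + chunk_pos(index);
}

/* Number of pages needed to hold count chunks. */
uint32_t pages_needed(uint32_t count)
{
    return (count + CHUNKS_PER_PAGE - 1) / CHUNKS_PER_PAGE;
}
```

With these definitions, the last chunk of the first page starts at byte 4056 and the first chunk of the second page starts at byte 4096, and 823 exported functions need 5 pages, which agrees with the figures above.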
The rest is pretty much the same as overwriting the IAT entry: we fill the first 6 bytes of the current chunk with the machine instructions, process the 16 bytes that follow as a RelocatedFunction structure, and write the RVA to the export address table entry that corresponds to the given function. As a result, every DLL that is subsequently loaded into the target process will link with our proxy "functions", i.e. with our handcrafted code that calls ProxyProlog(). Furthermore, any call to GetProcAddress() from any module within the target process will return the address of our proxy "function", rather than the address of the real callee. However, if we call any function exported by kernel32.dll by its name, it will result in calling the actual function, rather than our handcrafted code (unless the call is made by a module that was loaded after we patched the export address table of kernel32.dll): the IATs of all modules that were loaded into the target process before we patched the export address table of kernel32.dll still contain the addresses of the actual functions.
WARNING: if any module in your target process dynamically links to the C run-time library, make sure that MSVCRT.dll is loaded into your target process's address space before you overwrite kernel32.dll's export table. If you try to load MSVCRT.dll into your target process's address space after you have hooked kernel32.dll, it will fail to load properly. When it comes to hooking and spying, MSVCRT.dll turns out to be a hell of a library to work with. Remember that you should not hook the functions that are imported from MSVCRT.dll either, i.e. this library always requires special treatment.
After having modified the export address table of kernel32.dll, we must walk through all modules that are currently loaded into the address space of the target process, and, in each loaded module, overwrite IAT entries of all functions that are imported from kernel32.dll. The code below shows how it can be done (currenthandle is a module handle of the spying DLL):
void Overwrite(HMODULE hMod)
{
    IMAGE_DOS_HEADER *dosheader = (IMAGE_DOS_HEADER *)hMod;
    IMAGE_OPTIONAL_HEADER *opthdr = (IMAGE_OPTIONAL_HEADER *)
        ((BYTE *)hMod + dosheader->e_lfanew + 24);
    IMAGE_IMPORT_DESCRIPTOR *descriptor = (IMAGE_IMPORT_DESCRIPTOR *)
        ((BYTE *)dosheader + opthdr->DataDirectory[
            IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);
    // pseudo-handle; it does not need to be closed
    HANDLE hand = GetCurrentProcess();
    HMODULE ker = GetModuleHandle("kernel32.dll");
    while (descriptor->FirstThunk)
    {
        char *dllname = (char *)((BYTE *)hMod + descriptor->Name);
        // skip every import descriptor except the one for kernel32.dll;
        // the comparison is case-insensitive because the spelling of the
        // DLL name in import tables varies
        if (lstrcmpi(dllname, "kernel32.dll")) { descriptor++; continue; }
        IMAGE_THUNK_DATA *thunk =
            (IMAGE_THUNK_DATA *)((BYTE *)dosheader + descriptor->OriginalFirstThunk);
        int x = 0;
        while (thunk->u1.Function)
        {
            char *functionname = (char *)((BYTE *)dosheader +
                (unsigned)thunk->u1.AddressOfData + 2);
            DWORD *IATentryaddress = (DWORD *)
                ((BYTE *)dosheader + descriptor->FirstThunk) + x;
            // after the export table has been patched, this returns
            // the address of our function replacement chunk
            DWORD addr = (DWORD)GetProcAddress(ker, functionname);
            DWORD byteswritten;
            WriteProcessMemory(hand, IATentryaddress,
                &addr, 4, &byteswritten);
            x++; thunk++;
        }
        descriptor++;
    }
}
HANDLE snap =
    CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, GetCurrentProcessId());
MODULEENTRY32 mod; mod.dwSize = sizeof(MODULEENTRY32);
Module32First(snap, &mod);
HMODULE first = mod.hModule;
Overwrite(first);
while (Module32Next(snap, &mod))
{
    HMODULE next = mod.hModule;
    // do not patch the spying DLL itself
    if (next == currenthandle) continue;
    Overwrite(next);
}
CloseHandle(snap);
We walk through all modules that are currently loaded into the address space of the target process (the fact that, starting from Windows 2000, Toolhelp32 functions are available on the NT platform simplifies our task greatly), and, in each loaded module, overwrite IAT entries of all functions that are imported from kernel32.dll. We do not even have to fill function replacement chunks: it has already been done when we overwrote the export address table of kernel32.dll. All we have to do is to overwrite IAT entries with the addresses that are returned by GetProcAddress(). After we have overwritten the export address table of kernel32.dll, GetProcAddress() returns the addresses of our function replacement chunks, rather than the addresses of the actual exported functions. It is understandable that all the code you have seen so far resides in our spying DLL.
Injecting the spying DLL into the target process
There is one more thing to be done: we must inject the spying DLL into the target process. The technique described by Jeffrey Richter uses the CreateRemoteThread() API function in order to achieve this goal. Unfortunately, this technique is not going to work in our case. Why not? Because we save the original return address in the thread local storage. If we want the target process to keep on functioning properly, absolutely every thread in the process must dynamically allocate some memory and put it aside into thread local storage, i.e. DllMain() must be called by absolutely every thread in the process. DllMain() will be first called by the thread that loads the spying DLL into the target process, and, subsequently, by all threads that are created in the target process after the spying DLL has been loaded. However, if we use CreateRemoteThread() to inject the spying DLL, all threads that were created by the target process before we injected the spying DLL are not going to call DllMain(). Therefore, if we want the target process to keep on functioning properly, we have only 2 options:
1. We must inject the spying DLL into its primary thread, and do it before the target process creates any additional threads, i.e. at the earliest possible stage of the target process's lifetime.
2. We must make every thread that currently runs in the target process call our spying DLL's entry point.
Implementing the former option is relatively easy compared to the latter one. Therefore, we start with the former.
Injecting the spying DLL into the process that we create ourselves
First, we will inject our spying DLL into the process that we create ourselves. Let's look at how it can be done:
void createandinject(char *filename)
{
    // get the address of the target application's entry point
    DWORD bytes; BYTE buff[4096];
    HANDLE file = CreateFile(filename, GENERIC_READ | GENERIC_WRITE, 0, 0,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
    ReadFile(file, buff, 1024, &bytes, 0);
    CloseHandle(file);
    IMAGE_DOS_HEADER *dosheader = (IMAGE_DOS_HEADER*)buff;
    // the optional header follows the PE signature (4 bytes) and the file header (20 bytes)
    IMAGE_OPTIONAL_HEADER *optionalheader =
        (IMAGE_OPTIONAL_HEADER*)((BYTE*)buff + dosheader->e_lfanew + 24);
    DWORD entryptr = optionalheader->AddressOfEntryPoint + optionalheader->ImageBase;

    // create the target process with its primary thread suspended
    STARTUPINFO startup; GetStartupInfo(&startup); PROCESS_INFORMATION procinfo;
    CreateProcess(filename, 0, 0, 0, TRUE, CREATE_SUSPENDED, 0, 0, &startup, &procinfo);

    // allocate memory in the target process
    BYTE *writebuff = (BYTE*)VirtualAllocEx(procinfo.hProcess, 0, 4096,
        MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    writebuff = (BYTE*)VirtualAllocEx(procinfo.hProcess, writebuff, 4096,
        MEM_COMMIT, PAGE_EXECUTE_READWRITE);

    // get the address of LoadLibraryA()
    DWORD function = (DWORD)GetProcAddress(GetModuleHandle("kernel32.dll"),
        "LoadLibraryA");

    // fill the array with the machine instructions:
    // code at offset 0, pointer to LoadLibraryA() at 16, DLL name at 20
    DWORD stringptr = (DWORD)&writebuff[20]; strcpy((char*)&buff[20], "spydll.dll");
    DWORD funcptr = (DWORD)&writebuff[16]; memmove(&buff[16], &function, 4);
    buff[0] = 0x68;                        // push pointer_to_dllname
    memmove(&buff[1], &stringptr, 4);
    buff[5] = 0x68;                        // push address_of_entry_point
    memmove(&buff[6], &entryptr, 4);
    buff[10] = 0xff; buff[11] = 0x25;      // jmp dword ptr [funcptr]
    memmove(&buff[12], &funcptr, 4);

    // copy the above array into the memory that we have allocated in the target process
    WriteProcessMemory(procinfo.hProcess, writebuff, buff, 4096, &bytes);

    // change the execution context of the target process's primary thread
    CONTEXT context; context.ContextFlags = CONTEXT_CONTROL;
    GetThreadContext(procinfo.hThread, &context);
    context.Eip = (DWORD)writebuff;
    SetThreadContext(procinfo.hThread, &context);
    ResumeThread(procinfo.hThread);
}
As a first step, we obtain the address of the entry point of the target executable module - we can get this information even before spawning the target process. Our executable file is saved on disk in PE format, and, hence, the address of the entry point can be read from the IMAGE_OPTIONAL_HEADER in the file's header.
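The header arithmetic used above can be tried out without the Windows SDK. The following sketch assumes a 32-bit (PE32) image and works on raw offsets instead of the SDK structures; pe_entry_point and read_u32 are hypothetical helper names introduced for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a little-endian 32-bit value from an unaligned position. */
static uint32_t read_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Returns ImageBase + AddressOfEntryPoint of a PE32 image whose first
 * `len` bytes are in `buf`, or 0 if the buffer does not look like PE32.
 */
uint32_t pe_entry_point(const unsigned char *buf, size_t len)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;                               /* no DOS header */

    uint32_t e_lfanew = read_u32(buf + 0x3C);   /* offset of the PE signature */
    if (e_lfanew > len || len - e_lfanew < 24 + 32)
        return 0;                               /* headers are truncated */

    const unsigned char *nt = buf + e_lfanew;
    if (memcmp(nt, "PE\0\0", 4) != 0)
        return 0;

    /* optional header = signature (4 bytes) + file header (20 bytes) further on */
    const unsigned char *opt = nt + 24;
    if ((opt[0] | (opt[1] << 8)) != 0x10B)
        return 0;                               /* not a PE32 optional header */

    uint32_t entry_rva  = read_u32(opt + 16);   /* AddressOfEntryPoint */
    uint32_t image_base = read_u32(opt + 28);   /* ImageBase (PE32 layout) */
    return image_base + entry_rva;
}
```

Note that the result is the preferred address of the entry point; it matches the real one only if the executable is loaded at its preferred ImageBase, which is the assumption the code above makes as well.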
Then we create the target process from the .exe file with its primary thread initially suspended, dynamically allocate a memory array in the target process's address space, and fill this array with machine instructions in the following form:
push pointer_to_dllname
push address_of_entry_point
jmp dword ptr [_imp_LoadLibraryA]
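The three instructions above occupy 16 bytes. As a rough, portable sketch of how they can be encoded, the helper below fills a local buffer with the same layout the article uses (code at offset 0, pointer to LoadLibraryA() at offset 16, DLL name at offset 20). build_load_stub and put_u32 are hypothetical names, and all addresses are passed in by the caller as plain 32-bit numbers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit value in little-endian byte order, as x86 expects. */
static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)v;
    p[1] = (unsigned char)(v >> 8);
    p[2] = (unsigned char)(v >> 16);
    p[3] = (unsigned char)(v >> 24);
}

/*
 * Fills buf with the loader stub and returns the number of bytes used,
 * or 0 if buf is too small.
 *   remote_base  - address of the allocated block in the target process
 *   entry_point  - address of the target application's entry point
 *   load_library - address of LoadLibraryA()
 */
size_t build_load_stub(unsigned char *buf, size_t cap, uint32_t remote_base,
                       uint32_t entry_point, uint32_t load_library,
                       const char *dll_name)
{
    size_t name_len = strlen(dll_name) + 1;
    if (cap < 20 + name_len)
        return 0;

    buf[0] = 0x68;                          /* push pointer_to_dllname */
    put_u32(buf + 1, remote_base + 20);
    buf[5] = 0x68;                          /* push address_of_entry_point */
    put_u32(buf + 6, entry_point);
    buf[10] = 0xFF; buf[11] = 0x25;         /* jmp dword ptr [remote_base + 16] */
    put_u32(buf + 12, remote_base + 16);

    put_u32(buf + 16, load_library);        /* pointer slot read by the jmp */
    memcpy(buf + 20, dll_name, name_len);   /* DLL name, including the terminator */
    return 20 + name_len;
}
```

Because the entry point is pushed last, it sits on top of the stack when LoadLibraryA() starts executing, exactly where a call instruction would have left the return address.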
Here we simulate the call instruction by a combination of push and jmp instructions. When the instruction pointer hits the first byte of this array, the program will call LoadLibraryA() with pointer_to_dllname as an argument, and then return control to the application's entry point. Finally, we change the execution context of the target process's primary thread - we set the thread's instruction pointer to the first byte of our array with handcrafted instructions, and then let the thread run by calling ResumeThread(). As a result, the spying DLL will be loaded by the target process's primary thread even before the target application's entry point is called.
Injecting the spying DLL into the running process
Now let's do a much more complicated thing, and inject our spying DLL into a process that already runs. Let's look at how it can be done:
void inject(DWORD threadid, BYTE *remotebuff, HMODULE hmod, DWORD entrypoint,
    HANDLE processhandle, HANDLE eventhandle);

void loadandinject(DWORD procid)
{
    BYTE array[256]; char buff[1024]; DWORD byteswritten, dw, threadid;

    // allocate memory and create a thread in the target process
    HANDLE processhandle = OpenProcess(PROCESS_ALL_ACCESS, 0, procid);
    BYTE *writebuff = (BYTE*)VirtualAllocEx(processhandle, 0, 4096,
        MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    writebuff = (BYTE*)VirtualAllocEx(processhandle, writebuff, 4096,
        MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    DWORD funcptr = (DWORD)GetProcAddress(GetModuleHandle("kernel32.dll"),
        "LoadLibraryA");
    strcpy(buff, "spydll.dll");
    WriteProcessMemory(processhandle, writebuff, buff, 256, &byteswritten);
    HANDLE loader = CreateRemoteThread(processhandle, 0, 0,
        (LPTHREAD_START_ROUTINE)funcptr, writebuff, 0, &threadid);
    // wait until LoadLibraryA() has returned, so that the module snapshot
    // below is guaranteed to contain our DLL
    WaitForSingleObject(loader, INFINITE);
    CloseHandle(loader);

    // get the module handle and the entry point of our DLL
    HANDLE snap = CreateToolhelp32Snapshot(TH32CS_SNAPMODULE, procid);
    MODULEENTRY32 mod; mod.dwSize = sizeof(MODULEENTRY32);
    Module32First(snap, &mod);
    HMODULE hmod = 0;
    while (Module32Next(snap, &mod))
    {
        if (!strcmp(mod.szModule, "spydll.dll")) { hmod = mod.hModule; break; }
    }
    CloseHandle(snap);
    ReadProcessMemory(processhandle, (void*)hmod, buff, 1024, &dw);
    IMAGE_DOS_HEADER *dosheader = (IMAGE_DOS_HEADER*)buff;
    IMAGE_OPTIONAL_HEADER *opthdr =
        (IMAGE_OPTIONAL_HEADER*)((BYTE*)buff + dosheader->e_lfanew + 24);
    DWORD entry = (DWORD)hmod + opthdr->AddressOfEntryPoint;

    // create an auto-reset event in the initially unsignaled state
    HANDLE eventhandle = CreateEvent(0, 0, 0, "spyevent");

    // make every thread in the target process call the entry point of our DLL
    snap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    THREADENTRY32 th; th.dwSize = sizeof(THREADENTRY32);
    Thread32First(snap, &th);
    while (Thread32Next(snap, &th))
    {
        if (th.th32OwnerProcessID == procid)
            inject(th.th32ThreadID, writebuff, hmod, entry, processhandle, eventhandle);
    }
    CloseHandle(snap);
    CloseHandle(eventhandle);
}
As a very first step, we allocate a memory array in the address space of the target process, copy the name of our spying DLL into this array, and call the CreateRemoteThread() API function with the lpStartAddress and lpParameter parameters set to, respectively, the address of the LoadLibraryA() API function and the address of the array that we have allocated, i.e. we inject the spying DLL into the target process the way described by Jeffrey Richter. Then we walk through all modules that are currently loaded into the address space of the target process until we find the module handle of our spying DLL. Then we read the memory of the target process, starting from the address that corresponds to our spying DLL's module handle. At this point we are already able to find the address of our DLL's entry point in the address space of the target process - this information is available from IMAGE_OPTIONAL_HEADER.
Then we create an auto-reset event in the initially unsignaled state - the meaning of this step will become obvious when you see the implementation of inject(). Finally, we enumerate all threads that currently run in the target process, and make every thread in the target process call our DLL's entry point - this is implemented by inject(), to which the above-mentioned event handle is passed as one of the parameters. Let's look at inject()'s implementation:
void inject(DWORD threadid, BYTE *remotebuff, HMODULE hmod, DWORD entrypoint,
    HANDLE processhandle, HANDLE eventhandle)
{
    DWORD arg1 = (DWORD)hmod, arg2 = DLL_THREAD_ATTACH, arg3 = 0;
    typedef HANDLE (__stdcall *func)(DWORD, BOOL, DWORD);
    func OpenThread = (func)GetProcAddress(GetModuleHandle("kernel32.dll"),
        "OpenThread");
    HANDLE threadhandle = OpenThread(THREAD_SUSPEND_RESUME |
        THREAD_GET_CONTEXT | THREAD_SET_CONTEXT, 0, threadid);
    SuspendThread(threadhandle);
    CONTEXT context; context.ContextFlags = CONTEXT_CONTROL;
    GetThreadContext(threadhandle, &context);
    DWORD retaddress = context.Eip;

    // we are going to do the tough job of filling the array with the machine codes
    BYTE array[256];

    // copy all necessary data into the array; each pointer is first used to
    // fill the local copy and is then redirected to the same slot in the target process
    DWORD *openeventptr = (DWORD*)&array[100];
    openeventptr[0] = (DWORD)&OpenEvent;
    openeventptr = (DWORD*)&remotebuff[100];
    DWORD *seteventptr = (DWORD*)&array[104];
    seteventptr[0] = (DWORD)&SetEvent;
    seteventptr = (DWORD*)&remotebuff[104];
    DWORD *closehandleptr = (DWORD*)&array[108];
    closehandleptr[0] = (DWORD)&CloseHandle;
    closehandleptr = (DWORD*)&remotebuff[108];
    DWORD *entrypointptr = (DWORD*)&array[112];
    entrypointptr[0] = entrypoint;
    entrypointptr = (DWORD*)&remotebuff[112];
    DWORD *retaddressptr = (DWORD*)&array[116];
    retaddressptr[0] = retaddress;
    retaddressptr = (DWORD*)&remotebuff[116];
    strcpy((char*)&array[120], "spyevent");
    char *eventnameptr = (char*)&remotebuff[120];

    // now we are filling the array with actual machine instructions
    // push registers and flags
    array[0] = 0x50; array[1] = 0x53; array[2] = 0x51; array[3] = 0x52; array[4] = 0x9c;
    // push entry point arguments
    array[5] = 0x68; memmove(&array[6], &arg3, 4);
    array[10] = 0x68; memmove(&array[11], &arg2, 4);
    array[15] = 0x68; memmove(&array[16], &arg1, 4);
    // call entry point
    array[20] = 0xff; array[21] = 0x15; memmove(&array[22], &entrypointptr, 4);
    // push OpenEvent arguments
    array[26] = 0x68; memmove(&array[27], &eventnameptr, 4);
    array[31] = 0x68; int a = 0; memmove(&array[32], &a, 4);
    array[36] = 0x68; a = EVENT_ALL_ACCESS; memmove(&array[37], &a, 4);
    // call OpenEvent
    array[41] = 0xff; array[42] = 0x15; memmove(&array[43], &openeventptr, 4);
    // push eax (argument for CloseHandle)
    array[47] = 0x50;
    // push eax (argument for SetEvent)
    array[48] = 0x50;
    // call SetEvent
    array[49] = 0xff; array[50] = 0x15; memmove(&array[51], &seteventptr, 4);
    // call CloseHandle
    array[55] = 0xff; array[56] = 0x15; memmove(&array[57], &closehandleptr, 4);
    // restore registers and flags
    array[61] = 0x9d; array[62] = 0x5a; array[63] = 0x59; array[64] = 0x5b; array[65] = 0x58;
    // jmp dword ptr [retaddressptr]
    array[66] = 0xff; array[67] = 0x25; memmove(&array[68], &retaddressptr, 4);

    // we have finished filling the array, thank God
    DWORD byteswritten;
    WriteProcessMemory(processhandle, (void*)remotebuff, (void*)array, 256, &byteswritten);
    context.Eip = (DWORD)&remotebuff[0];
    SetThreadContext(threadhandle, &context);
    ResumeThread(threadhandle);
    WaitForSingleObject(eventhandle, INFINITE);
    CloseHandle(threadhandle);
}
The implementation of inject() does basically the same thing as our DLL-injecting code in the previous example - it fills the memory array with machine codes and changes the execution context of the target thread, i.e. makes it execute our handcrafted code that calls our DLL's entry point. However, now things become more complicated - our target thread already runs, so all our activity must leave CPU registers and flags intact, as far as the target thread is concerned. Furthermore, for safety reasons, we must synchronize our injections, i.e. proceed to the next target thread only after the current target thread's execution context has been restored. Therefore, we have to fill the array with the following instructions:
push eax
push ebx
push ecx
push edx
pushf
push 0
push value_of_DLL_THREAD_ATTACH
push hmod
call dword ptr [_imp_DllEntryPoint]
push eventnameptr
push 0
push value_of_EVENT_ALL_ACCESS
call dword ptr [_imp_OpenEvent]
push eax
push eax
call dword ptr [_imp_SetEvent]
call dword ptr [_imp_CloseHandle]
popf
pop edx
pop ecx
pop ebx
pop eax
jmp dword ptr [retaddressptr]
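The listing above can also be assembled with a handful of small emit helpers instead of hard-coded array indexes, which makes the offsets much harder to get wrong. The sketch below is portable C that only builds the bytes in a local buffer; emitter, stub_params and build_thread_stub are hypothetical names, and every address is supplied by the caller as it would appear in the target process. The data layout (pointer slots at offsets 100 to 116, event name at 120) follows inject().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    unsigned char *buf;  /* local copy of the stub */
    size_t pos;          /* next free offset */
} emitter;

static void emit_u8(emitter *e, unsigned char b) { e->buf[e->pos++] = b; }

static void emit_u32(emitter *e, uint32_t v)     /* little-endian, as x86 expects */
{
    for (int i = 0; i < 4; i++)
        emit_u8(e, (unsigned char)(v >> (8 * i)));
}

static void emit_push_imm(emitter *e, uint32_t v)    { emit_u8(e, 0x68); emit_u32(e, v); }
static void emit_call_ind(emitter *e, uint32_t slot) { emit_u8(e, 0xFF); emit_u8(e, 0x15); emit_u32(e, slot); }
static void emit_jmp_ind(emitter *e, uint32_t slot)  { emit_u8(e, 0xFF); emit_u8(e, 0x25); emit_u32(e, slot); }

/* Everything the stub needs; all addresses are valid in the target process. */
typedef struct {
    uint32_t remote_base;                        /* where the stub will be written */
    uint32_t hmod, entry_point, ret_address;
    uint32_t open_event, set_event, close_handle;
    uint32_t thread_attach;                      /* value of DLL_THREAD_ATTACH */
    uint32_t event_access;                       /* value of EVENT_ALL_ACCESS */
    const char *event_name;
} stub_params;

/* Fills a 256-byte buffer; returns the length of the code part, or 0 on error. */
size_t build_thread_stub(unsigned char buf[256], const stub_params *p)
{
    enum { SLOT_OPEN = 100, SLOT_SET = 104, SLOT_CLOSE = 108,
           SLOT_ENTRY = 112, SLOT_RET = 116, NAME = 120 };

    if (strlen(p->event_name) + 1 > (size_t)(256 - NAME))
        return 0;
    memset(buf, 0, 256);

    /* data: five pointer slots followed by the event name */
    emitter d = { buf, SLOT_OPEN };
    emit_u32(&d, p->open_event);
    emit_u32(&d, p->set_event);
    emit_u32(&d, p->close_handle);
    emit_u32(&d, p->entry_point);
    emit_u32(&d, p->ret_address);
    strcpy((char *)buf + NAME, p->event_name);

    /* code */
    emitter e = { buf, 0 };
    emit_u8(&e, 0x50); emit_u8(&e, 0x53); emit_u8(&e, 0x51);  /* push eax, ebx, ecx */
    emit_u8(&e, 0x52); emit_u8(&e, 0x9C);                     /* push edx; pushf */
    emit_push_imm(&e, 0);
    emit_push_imm(&e, p->thread_attach);
    emit_push_imm(&e, p->hmod);
    emit_call_ind(&e, p->remote_base + SLOT_ENTRY);           /* call DLL entry point */
    emit_push_imm(&e, p->remote_base + NAME);
    emit_push_imm(&e, 0);
    emit_push_imm(&e, p->event_access);
    emit_call_ind(&e, p->remote_base + SLOT_OPEN);            /* call OpenEvent */
    emit_u8(&e, 0x50);                                        /* push eax for CloseHandle */
    emit_u8(&e, 0x50);                                        /* push eax for SetEvent */
    emit_call_ind(&e, p->remote_base + SLOT_SET);
    emit_call_ind(&e, p->remote_base + SLOT_CLOSE);
    emit_u8(&e, 0x9D); emit_u8(&e, 0x5A); emit_u8(&e, 0x59);  /* popf; pop edx, ecx */
    emit_u8(&e, 0x5B); emit_u8(&e, 0x58);                     /* pop ebx, eax */
    emit_jmp_ind(&e, p->remote_base + SLOT_RET);              /* back to the saved EIP */
    return e.pos;
}
```

The code part is 72 bytes long, so it ends well before the first pointer slot at offset 100. On Windows the filled buffer would then be copied into the target process with WriteProcessMemory(), as inject() does.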
This seems to be a bit of a tough job, but, unless you are desperate to crash the target process, it has to be done. After having changed the execution context of the target thread, inject() waits until the target thread sets the synchronization event we have created, so that we cannot proceed to the next thread until the execution context of the target thread is restored. But what if the target thread is deadlocked at the time when we want it to call the entry point of our spying DLL? Then our code will get stuck - no one is going to set our synchronization event to the signaled state. This means that the above technique can be useful (with a few adjustments applied) for detecting deadlocked threads in the target process - the fact that one of the worker threads in a multithreaded application is deadlocked is not always obvious at first glance.
NOTE: If we inject our spying DLL into a target process that we create ourselves, we can overwrite the addresses of our target functions right in DllMain() when it is called with the fdwReason parameter set to DLL_PROCESS_ATTACH, because our target process has only one thread at that time. However, if we inject our spying DLL into a target process that already runs when the DLL is injected, we can overwrite the addresses of our target functions only after absolutely every thread in the target process has called our DLL's entry point. Otherwise, there is a good chance that the function replacement code will be called by a thread that has not yet allocated its storage, which means the target process will crash when Prolog() tries to save the return address in storage that has not yet been allocated.
This implies that the code which actually overwrites the addresses of our target functions must reside in a function that is exported by our spying DLL. Then, after the code in loadandinject() is executed, we would be able to create a thread in the target process by calling CreateRemoteThread() with the lpStartAddress parameter set to the address of this function - once the function is exported, we can always get its address in the target process from the spying DLL's export address table.
If all this seems too complicated to you, I suggest that you create the target process yourself, rather than spy on a process that already runs - as you can see, the fact that the target process already runs at the time when we inject our spying DLL gives us quite a few additional complications.