[Original & Translation] Three ways to inject code into other processes

xiaoxiao2021-03-05 97

Three ways to inject code into other processes

Original address: http://www.codeproject.com/threads/winspy.asp?DF=100&forumid=16291&SELECT=1025152&msg=1025152

Download the complete source package

Download WinSpy

Author: Robert Kuster

Translation: Yuan Xiaohui (Hyzs@sina.com)

Summary: How to inject code into the address space of another process and execute it in the context of that process.

Table of contents:

● Introduction

● Windows hooks

● The CreateRemoteThread & LoadLibrary technique

○ Inter-process communication

● The CreateRemoteThread & WriteProcessMemory technique

○ How to subclass a remote control with this technique

○ When to use this technique

● Some final words

● Appendices

● References

● Article history

Introduction

Code Project (www.codeproject.com) hosts many password-spy tools (Translator's note: programs that can read the contents of password boxes in other programs), and they all rely on Windows hooks. Is there any other way to achieve the same thing? Yes, there is. But first, let us briefly review the problem we want to solve, so that it is clear what I am talking about.

To read the contents of a control, whether or not it belongs to your own program, you normally send it a WM_GETTEXT message. This also works for edit controls, with one exception: if the edit control belongs to another process and has the ES_PASSWORD style, the approach fails. Only the process that owns the password control can retrieve its contents with WM_GETTEXT. So our problem becomes: how do we get the following code to run in the address space of another process?

    ::SendMessage(hPwdEdit, WM_GETTEXT, nMaxChars, psBuffer);

In general, there are three possible solutions:

1. Put your code into a DLL; then map the DLL into the remote process with a Windows hook.
2. Put your code into a DLL; then map the DLL into the remote process with CreateRemoteThread and LoadLibrary.
3. Without a DLL, copy your code directly into the remote process (WriteProcessMemory) and execute it with CreateRemoteThread.

Each technique is described in detail below.

I. Windows hooks

Sample programs: HookSpy and HookInjEx

The main purpose of a Windows hook is to monitor the message traffic of a thread. Hooks are generally divided into:

1. Local hooks, which monitor the message traffic of a thread in your own process.

2. Remote hooks, which are either:

a. thread-specific, monitoring the messages of one thread in another process; or

b. system-wide, monitoring the messages of all threads currently running in the system.

If the hooked thread belongs to another process (cases 2a and 2b), your hook procedure must reside in a dynamic-link library (DLL). The system then maps the DLL containing the hook procedure into the address space of the hooked thread. Windows maps the entire DLL, not just the hook procedure. This is why Windows hooks can be used to inject code into another process's address space. I do not want to discuss hooks in depth here (see the description of SetWindowsHookEx in MSDN), but let me give you two tips that I could not find in the documentation and that may be useful:

1. After a successful call to SetWindowsHookEx, the system maps the DLL into the hooked thread automatically, but not necessarily immediately. Because Windows hooks are message based, the DLL is not actually mapped until an appropriate event occurs. For example:

If you install a hook that monitors all non-queued messages (WH_CALLWNDPROC), the DLL is mapped only once a message is actually sent to the hooked thread (to one of its windows). That is, if UnhookWindowsHookEx is called before any message is sent to the hooked thread, the DLL is never mapped into the remote process (even though the call to SetWindowsHookEx succeeded). To force the mapping, send an appropriate message to that thread immediately after calling SetWindowsHookEx.

The same applies to unmapping: after UnhookWindowsHookEx is called, the DLL is not actually unmapped from the hooked thread until an appropriate event occurs.

2. Installing a hook affects overall system performance (especially with system-wide hooks). However, you can easily avoid this drawback if you use a thread-specific hook solely as a DLL mapping mechanism and not to intercept messages. Consider the following code snippet:

    BOOL APIENTRY DllMain(HANDLE hModule,
                          DWORD  ul_reason_for_call,
                          LPVOID lpReserved)
    {
        if (ul_reason_for_call == DLL_PROCESS_ATTACH)
        {
            // Increase the reference count via LoadLibrary
            char lib_name[MAX_PATH];
            ::GetModuleFileName((HMODULE)hModule, lib_name, MAX_PATH);
            ::LoadLibrary(lib_name);

            // Safely remove the hook
            ::UnhookWindowsHookEx(g_hHook);
        }
        return TRUE;
    }

So what happens here? First, we use the hook to map the DLL into the remote process; then, as soon as the DLL is actually mapped, we remove the hook. Normally the DLL would be unmapped when the first message reaches the hooked thread, but we prevent this by incrementing the DLL's reference count with LoadLibrary.

The remaining question is: how do we unload the DLL when we are done? UnhookWindowsHookEx will not do it, because we have already unhooked the thread. You can do it this way:

○ Install another hook just before you want to unload the DLL;

○ Send a "special" message to the remote thread;

○ Catch this message in your (new) hook procedure and, in response, call FreeLibrary and (Translator's note: for the new hook) UnhookWindowsHookEx.

Now the hook is in place only while the DLL is being mapped into and unmapped from the remote process, so it has no effect on the performance of the hooked thread. In other words, we have a DLL mapping mechanism that interferes with the target process no more than the LoadLibrary technique discussed in Section II does and, unlike that technique, works on both WinNT and Win9x.

But when should you use this trick? Typically when the DLL has to stay in the remote process for a longer time (for example, when you subclass a control that belongs to another process) and you want to interfere with the target process as little as possible. I did not use it in HookSpy, because there the DLL is mapped only briefly, just long enough to get the password. I demonstrate the method in another sample, HookInjEx. HookInjEx maps a DLL into "explorer.exe" (and, of course, unmaps it at the end) and subclasses the Start button there; more precisely, it swaps the left and right mouse clicks of the Start button.

You can find HookSpy and HookInjEx and their source code in the download package linked at the beginning of this article.

II. The CreateRemoteThread & LoadLibrary technique

Sample program: LibSpy

In general, any process can load a DLL dynamically by calling LoadLibrary. But how do we force an external process to call this function? The answer is CreateRemoteThread.

Let us first look at the declarations of LoadLibrary and FreeLibrary:

    HINSTANCE LoadLibrary(
        LPCTSTR lpLibFileName   // address of filename of library module
    );

    BOOL FreeLibrary(
        HMODULE hLibModule      // handle to loaded library module
    );

Now compare them with the declaration of a thread procedure (ThreadProc), a pointer to which is passed to CreateRemoteThread:

    DWORD WINAPI ThreadProc(
        LPVOID lpParameter      // thread data
    );

You will find that all these functions use the same calling convention, all take one 32-bit parameter, and all return a value of the same size. That is, we can pass a pointer to LoadLibrary or FreeLibrary as the thread routine to CreateRemoteThread.

However, there are two problems (see the description of CreateRemoteThread):

1. The lpStartAddress parameter of CreateRemoteThread must be the starting address of the thread routine in the remote process.

2. If lpParameter, the parameter passed to ThreadProc, is interpreted as an ordinary 32-bit value (FreeLibrary interprets it as an HMODULE), everything is fine. But if it is interpreted as a pointer (LoadLibraryA interprets it as a char*), it must point to memory in the remote process.

The first problem actually solves itself, because LoadLibrary and FreeLibrary are both functions in kernel32.dll, and kernel32 is guaranteed to be present in every "normal" process and loaded at the same address (see Appendix A). The address of LoadLibrary/FreeLibrary is therefore the same in every process, which ensures that a valid pointer is passed to the remote process. The second problem is also easy to solve: copy the DLL file name (the argument to LoadLibrary) to the remote process with WriteProcessMemory.

Therefore, the steps of the CreateRemoteThread & LoadLibrary technique are as follows:

1. Retrieve a handle to the remote process (OpenProcess).

2. Allocate memory for the DLL file name in the remote process (VirtualAllocEx).

3. Write the DLL file name, including its full path, to the allocated memory (WriteProcessMemory).

4. Map your DLL into the remote process via CreateRemoteThread and LoadLibrary.

5. Wait until the remote thread terminates (WaitForSingleObject), that is, until the call to LoadLibrary returns. Put another way, the thread terminates as soon as our DllMain (called with reason DLL_PROCESS_ATTACH) returns.

6. Retrieve the exit code of the remote thread (GetExitCodeThread). This is the value returned by LoadLibrary, that is, the base address (HMODULE) of our mapped DLL.

7. Free the memory allocated in step 2 (VirtualFreeEx).

8. Unload the DLL from the remote process via CreateRemoteThread and FreeLibrary. Pass the HMODULE retrieved in step 6 to FreeLibrary (via the lpParameter argument of CreateRemoteThread).

9. Wait until the remote thread terminates (WaitForSingleObject).

Also, do not forget to close all handles when you are done: the thread handles created in steps 4 and 8, and the remote process handle retrieved in step 1.

Now let us examine parts of LibSpy's source code to see how the steps above are implemented. For the sake of simplicity, error handling and Unicode support have been left out.

    HANDLE  hThread;
    char    szLibPath[_MAX_PATH];  // name of "LibSpy.dll"
                                   // (including the full path!)
    void*   pLibRemote;            // address (in the remote process) that
                                   // szLibPath will be copied to
    DWORD   hLibModule;            // base address of the loaded DLL (== HMODULE)
    HMODULE hKernel32 = ::GetModuleHandle("Kernel32");

    // Initialize szLibPath
    // ...

    // 1. Allocate memory in the remote process for szLibPath
    // 2. Write szLibPath to the allocated memory
    pLibRemote = ::VirtualAllocEx(hProcess, NULL, sizeof(szLibPath),
                                  MEM_COMMIT, PAGE_READWRITE);
    ::WriteProcessMemory(hProcess, pLibRemote, (void*)szLibPath,
                         sizeof(szLibPath), NULL);

    // Load "LibSpy.dll" into the remote process
    // (via CreateRemoteThread & LoadLibrary)
    hThread = ::CreateRemoteThread(hProcess, NULL, 0,
                (LPTHREAD_START_ROUTINE)::GetProcAddress(hKernel32,
                                                         "LoadLibraryA"),
                pLibRemote, 0, NULL);
    ::WaitForSingleObject(hThread, INFINITE);

    // Get the base address of the loaded DLL
    ::GetExitCodeThread(hThread, &hLibModule);

    // Clean up (with MEM_RELEASE the size argument must be 0)
    ::CloseHandle(hThread);
    ::VirtualFreeEx(hProcess, pLibRemote, 0, MEM_RELEASE);

Assume that the code we actually wanted to inject, such as the SendMessage call, was placed in DllMain (DLL_PROCESS_ATTACH); by now it has already been executed, so the DLL can be unloaded from the target process.

    // Unload "LibSpy.dll" from the target process
    // (via CreateRemoteThread & FreeLibrary)
    hThread = ::CreateRemoteThread(hProcess, NULL, 0,
                (LPTHREAD_START_ROUTINE)::GetProcAddress(hKernel32,
                                                         "FreeLibrary"),
                (void*)hLibModule, 0, NULL);
    ::WaitForSingleObject(hThread, INFINITE);

    // Clean up
    ::CloseHandle(hThread);

Inter-process communication

So far we have only discussed how to inject the DLL into the remote process. In most situations, however, the injected DLL needs to communicate with your original application in some way (remember, the DLL is mapped into the remote process, not into your local application!). Take our password spy: the DLL has to know the handle of the control that contains the password. Obviously, this value cannot be hard-coded at compile time. Likewise, once the DLL has retrieved the password, it has to send it back to our application.

Fortunately, there are many ways to deal with this: file mapping, WM_COPYDATA, the clipboard, and so on. Another very handy option is #pragma data_seg. I will not describe these techniques here, because they are all well documented in MSDN (see the Interprocess Communications section) and elsewhere. I used #pragma data_seg in LibSpy.

You can find LibSpy and its source code in the download package linked at the beginning of this article.

III. The CreateRemoteThread & WriteProcessMemory technique

Sample program: WinSpy

Another way to inject code into the address space of another process is to use the WriteProcessMemory API. This time you do not write a separate DLL; instead you copy your code directly into the remote process (WriteProcessMemory) and start it with CreateRemoteThread.

Let us first look at the declaration of CreateRemoteThread:

    HANDLE CreateRemoteThread(
        HANDLE hProcess,                           // handle to process to create thread in
        LPSECURITY_ATTRIBUTES lpThreadAttributes,  // pointer to security attributes
        DWORD dwStackSize,                         // initial thread stack size, in bytes
        LPTHREAD_START_ROUTINE lpStartAddress,     // pointer to thread function
        LPVOID lpParameter,                        // argument for new thread
        DWORD dwCreationFlags,                     // creation flags
        LPDWORD lpThreadId                         // pointer to returned thread identifier
    );

Compared with CreateThread, there are the following differences:

● The hProcess parameter is new. It is the handle to the process in which the thread is to be created.

● The lpStartAddress parameter of CreateRemoteThread must point to a function in the address space of the remote process. The function must exist in the remote process, so we cannot simply pass the address of a local ThreadFunc; we have to copy the code to the remote process first.

● Similarly, the data pointed to by lpParameter must exist in the remote process, so we have to copy it there, too.

Now we can summarize the steps of this technique:

1. Retrieve a handle to the remote process (OpenProcess).

2. Allocate memory in the remote process for the data to be injected (VirtualAllocEx).

3. Write a copy of the initialized INJDATA structure to the allocated memory (WriteProcessMemory).

4. Allocate memory in the remote process for the code to be injected (VirtualAllocEx).

5. Write a copy of ThreadFunc to the allocated memory (WriteProcessMemory).

6. Start the remote copy of ThreadFunc with CreateRemoteThread.

7. Wait until the remote thread terminates (WaitForSingleObject).

8. Retrieve the result from the remote process (ReadProcessMemory or GetExitCodeThread).

9. Free the memory allocated in steps 2 and 4 (VirtualFreeEx).

10. Close the handles retrieved in steps 6 and 1 (CloseHandle).

In addition, the following rules must be observed when writing ThreadFunc:

1. ThreadFunc must not call any functions other than those in kernel32.dll and user32.dll. Only kernel32.dll and user32.dll (if it is loaded) are guaranteed to be at the same load address in the local and the target process. (Note: user32.dll is not necessarily loaded into every Win32 process!) See Appendix A. If you need functions from other libraries, have the injected code load them itself with LoadLibrary and GetProcAddress. If, for some reason, the library you need is already mapped into the target process, you can use GetModuleHandle instead of LoadLibrary. Similarly, if you want to call your own functions from ThreadFunc, copy each of them to the remote process and pass their addresses to ThreadFunc via INJDATA.

2. Do not use static strings. Pass all strings to ThreadFunc via INJDATA. Why? The compiler puts all static strings into the ".data" section of the executable and keeps only references (that is, pointers) to them in the code. The copy of ThreadFunc in the remote process would then point to something that does not exist there (at least not at the same address).

3. Remove the /GZ compiler switch. It is set by default in debug builds (see Appendix B).

4. Either declare ThreadFunc and AfterThreadFunc as static, or disable incremental linking (see Appendix C).

5. The local variables in ThreadFunc must total less than 4 KB (see Appendix D). Note that in debug builds about 10 bytes of those 4 KB are already taken.

6. If you have a switch block with more than three case statements, either split it up as shown below or replace it with an if-else if sequence.

    switch (expression) {
        case constant1: statement1; goto END;
        case constant2: statement2; goto END;
        case constant3: statement3; goto END;
    }
    switch (expression) {
        case constant4: statement4; goto END;
        case constant5: statement5; goto END;
        case constant6: statement6; goto END;
    }
    END:

(See Appendix E.)

If you do not play by these rules, you will almost certainly crash the target process. Just remember: do not assume that anything in the target process is at the same address as it is in your own process (see Appendix F).

GetWindowTextRemote(A/W)

All the work needed to retrieve the text of a remote edit control is wrapped in this function, GetWindowTextRemote(A/W):

    int GetWindowTextRemoteA(HANDLE hProcess, HWND hWnd, LPSTR  lpString);
    int GetWindowTextRemoteW(HANDLE hProcess, HWND hWnd, LPWSTR lpString);

Parameters:

hProcess
    Handle to the process that owns the edit control.

hWnd
    Handle to the edit control.

lpString
    Buffer that receives the text.

Return value:

The number of characters copied.

Let us look at parts of its source code, especially the injected data and code. For the sake of simplicity, Unicode support has been left out.

INJDATA

    typedef LRESULT (WINAPI *SENDMESSAGE)(HWND, UINT, WPARAM, LPARAM);

    typedef struct {
        HWND        hwnd;           // handle to edit control
        SENDMESSAGE fnSendMessage;  // pointer to user32!SendMessageA
        char        psText[128];    // buffer that is to receive the password
    } INJDATA;

INJDATA is the data structure injected into the remote process. Before it is injected, we must initialize it in our own application, including its pointer to SendMessageA. Fortunately, user32.dll is always mapped to the same address in every process (if it is mapped at all), so the address of SendMessageA is always the same, which guarantees that the pointer passed to the remote process is valid.

ThreadFunc

    static DWORD WINAPI ThreadFunc(INJDATA *pData)
    {
        pData->fnSendMessage(pData->hwnd, WM_GETTEXT,   // get the password
                             sizeof(pData->psText),
                             (LPARAM)pData->psText);
        return 0;
    }

    // This function marks the memory address after ThreadFunc.
    // int cbCodeSize = (PBYTE)AfterThreadFunc - (PBYTE)ThreadFunc.
    static void AfterThreadFunc(void)
    {
    }

ThreadFunc is the code executed by the remote thread.

● Note how AfterThreadFunc is used to calculate the code size of ThreadFunc. In general this is not the best idea, because the compiler is free to change the order of your functions (for example, it could place ThreadFunc after AfterThreadFunc). However, you can at least be sure that within the same project, as in our WinSpy project, the order of the functions stays fixed. If necessary, you can use the /ORDER linker option or, perhaps better, determine the size of ThreadFunc with a disassembler.

How to subclass a remote control with this technique

Sample program: InjectEx

Let us discuss something more complicated: how do you subclass a control that belongs to another process?

First of all, to accomplish this task you have to copy two functions to the remote process:

1. ThreadFunc, which subclasses the control in the remote process by calling the SetWindowLong API.

2. NewProc, the new window procedure of that control.

However, the main problem is how to pass data to the remote NewProc. Because NewProc is a callback function, it must conform to a specific signature (Translator's note: mainly the number and types of its parameters), so we cannot simply pass it a pointer to INJDATA as an argument. Fortunately, I have found ways to solve this problem, two in fact, but both require assembly language. I have always tried hard to avoid assembly, but this time there is no escaping it.

Solution 1

Consider the following picture:

[Figure: memory layout in the remote process, with INJDATA placed immediately before NewProc]

Notice that INJDATA is placed immediately before NewProc in the remote process? This way NewProc knows the memory location of INJDATA at compile time. More precisely, it knows the address of INJDATA relative to its own address, and that is all we really need. NewProc might now look like this:

    static LRESULT CALLBACK NewProc(
        HWND   hwnd,    // handle to window
        UINT   uMsg,    // message identifier
        WPARAM wParam,  // first message parameter
        LPARAM lParam)  // second message parameter
    {
        INJDATA *pData = (INJDATA *)NewProc;  // pData points to NewProc
        pData--;              // now pData points to INJDATA;
                              // remember, INJDATA is placed immediately
                              // before NewProc in the remote process

        //-----------------------------
        // subclassing code goes here
        // ........
        //-----------------------------

        // call the original window procedure;
        // fnOldProc (returned by SetWindowLong) was initialized
        // by (the remote) ThreadFunc and stored in (the remote) INJDATA
        return pData->fnCallWindowProc(pData->fnOldProc,
                                       hwnd, uMsg, wParam, lParam);
    }

However, there is still a problem. Look at the first line:

    INJDATA *pData = (INJDATA *)NewProc;

This way pData is hard-coded to the address of NewProc in our own process, which is wrong: NewProc will be copied to the remote process, where it ends up at a different address.

There is no way to solve this with C/C++ alone, but it can be solved with inline assembly. Consider the modified NewProc:

    static LRESULT CALLBACK NewProc(
        HWND   hwnd,    // handle to window
        UINT   uMsg,    // message identifier
        WPARAM wParam,  // first message parameter
        LPARAM lParam)  // second message parameter
    {
        // Calculate the address of INJDATA;
        // in the remote process, INJDATA is placed
        // immediately before NewProc
        INJDATA *pData;
        _asm {
            call  dummy
        dummy:
            pop   ecx          // <- ECX contains the current EIP
            sub   ecx, 9       // <- ECX contains the address of NewProc
            mov   pData, ecx
        }
        pData--;

        //-----------------------------
        // subclassing code goes here
        // ........
        //-----------------------------

        // call the original window procedure
        return pData->fnCallWindowProc(pData->fnOldProc,
                                       hwnd, uMsg, wParam, lParam);
    }

What does this mean? Every processor has a special register that points to the memory location of the next instruction to be executed: the instruction pointer, called EIP on 32-bit Intel and AMD processors. Because EIP is a special-purpose register, you cannot access it the way you access general-purpose registers (EAX, EBX, and so on). In other words, there is no opcode with which you can address EIP and read or change it explicitly. However, EIP is changed implicitly (and in fact all the time) by instructions such as JMP, CALL, and RET. Let us look, for example, at how CALL and RET work on 32-bit Intel and AMD processors:

When we call a subroutine with CALL, the address of the subroutine is loaded into EIP. But before EIP is modified, its old value is automatically pushed onto the stack (to be used later as the return instruction pointer). At the end of the subroutine, the RET instruction automatically pops this value from the stack back into EIP.

Now we know how to modify EIP with CALL and RET, but how do we retrieve its current value?

Remember that CALL pushes the value of EIP onto the stack? So, to get the value of EIP, we call a "dummy function" and pop the top of the stack right afterwards. Look at the compiled NewProc:

    Address    OpCode/Params   Decoded instruction
    --------------------------------------------------
    :00401000  55              push ebp            ; entry point of NewProc
    :00401001  8BEC            mov ebp, esp
    :00401003  51              push ecx
    :00401004  E800000000      call 00401009       ; *a*  call dummy
    :00401009  59              pop ecx             ; *b*
    :0040100A  83E909          sub ecx, 00000009   ; *c*
    :0040100D  894DFC          mov [ebp-04], ecx   ; mov pData, ecx
    :00401010  8B45FC          mov eax, [ebp-04]
    :00401013  83E814          sub eax, 00000014   ; pData--;
    .....
    :0040102D  8BE5            mov esp, ebp
    :0040102F  5D              pop ebp
    :00401030  C21000          ret 0010

a. A dummy function call; it merely jumps to the next instruction and (Translator's note: more importantly) pushes EIP onto the stack.

b. Pop the top of the stack into ECX. ECX then holds the value of EIP, which is exactly the address of the "pop ecx" instruction.

c. Note that the "distance" between the entry point of NewProc and the "pop ecx" instruction is 9 bytes; hence, to get the address of NewProc, subtract 9 from ECX.

This way NewProc can always calculate its own address, no matter where it is copied! Be aware, however, that the distance between the entry point of NewProc and the "pop ecx" instruction may change with your compiler/linker options, and that it differs between release and debug builds. In any case, you can still determine the exact value at compile time:

1. First, compile your function.

2. Check the correct distance with a disassembler.

3. Finally, recompile your program with the correct distance.

This is also the solution used in InjectEx. InjectEx is similar to HookInjEx: it swaps the left and right mouse clicks of the Start button.
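The address arithmetic behind the call/pop trick can be checked with plain integers. The sketch below is only an illustration, not code from InjectEx: the helper names newProcFromPop and injDataFromNewProc are hypothetical, and the constants in the usage note are taken from the disassembly listing above (the "pop ecx" at 00401009, a distance of 9 bytes, and the 0x14 bytes subtracted by pData--).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: given the address that "pop ecx" retrieves (the
// address of the pop instruction itself) and the distance in bytes from the
// entry point of NewProc to that instruction, recover the entry point.
inline std::uint32_t newProcFromPop(std::uint32_t popAddress,
                                    std::uint32_t distance)
{
    assert(popAddress >= distance);   // the entry point precedes the pop
    return popAddress - distance;
}

// Hypothetical helper: INJDATA sits immediately before NewProc, so stepping
// back by the size of the structure (what "pData--" does) yields its address.
inline std::uint32_t injDataFromNewProc(std::uint32_t newProcAddress,
                                        std::uint32_t injDataSize)
{
    assert(newProcAddress >= injDataSize);
    return newProcAddress - injDataSize;
}
```

With the values from the listing, newProcFromPop(0x00401009, 9) yields 0x00401000, and injDataFromNewProc(0x00401000, 0x14) yields 0x00400FEC. If the compiler emits a different prologue, only the distance argument changes.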

Solution 2

Placing INJDATA immediately before NewProc in the remote process is not the only way to solve our problem. Consider the following variant of NewProc:

    static LRESULT CALLBACK NewProc(
        HWND   hwnd,    // handle to window
        UINT   uMsg,    // message identifier
        WPARAM wParam,  // first message parameter
        LPARAM lParam)  // second message parameter
    {
        INJDATA *pData = (INJDATA *)0xA0B0C0D0;  // a dummy value

        //-----------------------------
        // subclassing code goes here
        // ........
        //-----------------------------

        // call the original window procedure
        return pData->fnCallWindowProc(pData->fnOldProc,
                                       hwnd, uMsg, wParam, lParam);
    }

Here, 0xA0B0C0D0 is merely a placeholder for the real (absolute) address of INJDATA in the remote process. You cannot know this value at compile time, but you do know the address of INJDATA right after the call to VirtualAllocEx (the one that allocates INJDATA)! (Translator's note: it is the return value of VirtualAllocEx.)

Compiled, our NewProc would look something like this:

    Address    OpCode/Params     Decoded instruction
    --------------------------------------------------
    :00401000  55                push ebp
    :00401001  8BEC              mov ebp, esp
    :00401003  C745FCD0C0B0A0    mov [ebp-04], A0B0C0D0
    :0040100A  ...
    ....
    :0040102D  8BE5              mov esp, ebp
    :0040102F  5D                pop ebp
    :00401030  C21000            ret 0010

The compiled machine code would therefore be: 558BECC745FCD0C0B0A0......8BE55DC21000.

Now you proceed like this:

1. Copy INJDATA, ThreadFunc, and NewProc to the target process.

2. Modify the machine code of NewProc so that pData points to the real address of INJDATA.

For example, suppose the real address of INJDATA (the return value of VirtualAllocEx) is 0x008A0000. Then you change the machine code of NewProc as follows:

    558BECC745FCD0C0B0A0......8BE55DC21000   <- NewProc before the modification ¹
    558BECC745FC00008A00......8BE55DC21000   <- modified NewProc

In other words, you replace the dummy value A0B0C0D0 with the real address of INJDATA. ²

3. Start the remote ThreadFunc, which in turn subclasses the control in the remote process.

¹ You may wonder why A0B0C0D0 and 008A0000 appear reversed in the compiled machine code. This is because Intel and AMD processors use little-endian notation to represent their (multi-byte) data. In other words: the low-order byte of a number is stored at the lowest address in memory, and the high-order byte at the highest address.

Imagine the word "UNIX" stored in four bytes: a big-endian system stores it as "UNIX", while a little-endian system stores it as "XINU".
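The byte order described in this footnote can be reproduced without relying on the byte order of the machine that runs the code. The following sketch uses two hypothetical helpers, packChars and storeLittleEndian, which are not part of the article's sample programs; shifts are used instead of a pointer cast so that the result is the same on any host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Pack four characters into a 32-bit number, first character in the
// high-order byte (the way the word reads on paper).
inline std::uint32_t packChars(const std::string& s)
{
    assert(s.size() == 4);
    return static_cast<std::uint32_t>(static_cast<unsigned char>(s[0])) << 24
         | static_cast<std::uint32_t>(static_cast<unsigned char>(s[1])) << 16
         | static_cast<std::uint32_t>(static_cast<unsigned char>(s[2])) << 8
         | static_cast<std::uint32_t>(static_cast<unsigned char>(s[3]));
}

// Return the four bytes of value in the order a little-endian machine
// stores them: low-order byte at the lowest address.
inline std::string storeLittleEndian(std::uint32_t value)
{
    std::string bytes(4, '\0');
    for (std::size_t i = 0; i < 4; ++i)
        bytes[i] = static_cast<char>((value >> (8 * i)) & 0xFF);
    return bytes;
}
```

Here storeLittleEndian(packChars("UNIX")) returns "XINU", and storeLittleEndian(0xA0B0C0D0) returns the bytes D0 C0 B0 A0, which is the sequence that appears in the machine code.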

² Some (bad) cracks modify the machine code of an executable in a similar way. However, once a program is loaded into memory, it cannot change its own machine code (the .text section of an executable is write-protected). We can modify the remote NewProc only because it was copied into memory that we allocated ourselves with write permission.
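Step 2 of Solution 2 (replacing the placeholder with the real address) can be sketched on a local byte buffer. This is a minimal illustration under stated assumptions, not the code used in the sample programs: readLe32, writeLe32, and patchPlaceholder are hypothetical helpers, and the sketch says nothing about how the bytes reach the remote process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read the 32-bit little-endian value that starts at the given offset.
inline std::uint32_t readLe32(const std::vector<std::uint8_t>& code,
                              std::size_t offset)
{
    assert(offset + 4 <= code.size());
    return static_cast<std::uint32_t>(code[offset])
         | static_cast<std::uint32_t>(code[offset + 1]) << 8
         | static_cast<std::uint32_t>(code[offset + 2]) << 16
         | static_cast<std::uint32_t>(code[offset + 3]) << 24;
}

// Store a 32-bit value at the given offset, low-order byte first.
inline void writeLe32(std::vector<std::uint8_t>& code,
                      std::size_t offset, std::uint32_t value)
{
    assert(offset + 4 <= code.size());
    for (std::size_t b = 0; b < 4; ++b)
        code[offset + b] = static_cast<std::uint8_t>((value >> (8 * b)) & 0xFF);
}

// Hypothetical helper: replace the first occurrence of the placeholder with
// the real address. Returns the patched offset, or -1 if it was not found.
inline std::ptrdiff_t patchPlaceholder(std::vector<std::uint8_t>& code,
                                       std::uint32_t placeholder,
                                       std::uint32_t realAddress)
{
    for (std::size_t i = 0; i + 4 <= code.size(); ++i) {
        if (readLe32(code, i) == placeholder) {
            writeLe32(code, i, realAddress);
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

Applied to the bytes 55 8B EC C7 45 FC D0 C0 B0 A0 with the placeholder 0xA0B0C0D0 and the address 0x008A0000, the helper patches offset 6 and leaves 00 00 8A 00 there. A byte-wise search like this can in principle hit an unrelated byte sequence, which is one reason to pick a dummy value that is unlikely to occur elsewhere in the code.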

When to use the CreateRemoteThread & WriteProcessMemory technique

Compared with the other two methods, injecting code with CreateRemoteThread and WriteProcessMemory is more flexible, because no additional DLL is needed, but it is also more complicated and riskier. If something is wrong with your ThreadFunc, the remote process crashes immediately (see Appendix F). Debugging a remote ThreadFunc can also be a nightmare, so you should use this technique only when you have to inject a handful of instructions. To inject a larger piece of code, use one of the other two methods.

Once again, WinSpy and InjectEx and their source code can be downloaded via the links at the beginning of this article.

Some final words

Finally, let us summarize some facts that have not been mentioned so far:

    Method                                        Supported OS      Target processes
    I.   Windows hooks                            Win9x and WinNT   Only processes that link with USER32.DLL ¹
    II.  CreateRemoteThread & LoadLibrary         WinNT only ²      All processes ³, including system services ⁴
    III. CreateRemoteThread & WriteProcessMemory  WinNT only        All processes, including system services

1. Obviously, you cannot hook a thread that has no message queue. SetWindowsHookEx also does not work with system services, even if they link with USER32.DLL.

2. Win9x has neither CreateRemoteThread nor VirtualAllocEx (in fact, they can be emulated on Win9x, but that is another story).

3. All processes = all Win32 processes + csrss.exe.

Native applications such as smss.exe, os2ss.exe, and autochk.exe do not use the Win32 API and therefore do not link with kernel32.dll. The only exception is csrss.exe, the Win32 subsystem itself. It is a native application, but some of its libraries (such as winsrv.dll) require Win32 DLLs, including kernel32.dll.

4. If you want to inject code into system services or csrss.exe, enable "SeDebugPrivilege" for your process (AdjustTokenPrivileges) before opening the handle to the remote process (OpenProcess).

That is about it. One more thing to keep in mind: your injected code (especially if something is wrong with it) can easily bring down the target process. Remember: power comes with responsibility!

Many examples in this article deal with passwords. After reading it, you may also be interested in Super Password Spy, written by Zhefu Zhang (Translator's note: probably Chinese, Zhang Zhefu?). He explains how to retrieve passwords from IE password fields and also how to protect your passwords against such attacks.

One last point: feedback from readers is the only reward an author gets, so if you found this article useful, please leave a comment or vote for it. More importantly, if you find a mistake or a bug, if you think something could be done better, or if something is unclear, please let me know.

Acknowledgments

First of all, I would like to thank my readers at CodeGuru, where this article was first published: it is thanks to your encouragement and support that the article grew from its original 1,200 words to today's 6,000-word "monster". If there is one person I want to single out, it is Rado Picha; parts of this article benefited greatly from his suggestions and help. Last but not least, thanks to Susan Moore, who helped me through the minefield called "English" and made this article read more smoothly.

------------------------------------

Appendices

A) Why are kernel32.dll and user32.dll always mapped to the same address?

My presumption: because the Microsoft programmers thought it would be a useful optimization. Let me explain why.

In general, an executable is composed of several sections, one of which is the ".reloc" section.

When the linker creates an EXE or DLL file, it makes an assumption about where the file will be mapped into memory. This is the so-called assumed/preferred load (base) address. All absolute addresses in the image are based on this linker-assumed load address. If, for whatever reason, the image is not loaded at this address, the PE loader has to fix up all absolute addresses in the image. This is what the ".reloc" section is for: it lists all the places in the image where the difference between the linker-assumed load address and the actual load address has to be applied. (Note that most instructions produced by the compiler use some kind of relative addressing, so there are not as many relocations as you might think.) If, on the other hand, the loader is able to load the image at the linker's preferred base address, the ".reloc" section is ignored completely. Now, every Win32 program needs kernel32.dll, and most of them need user32.dll as well, so if both are always mapped at their preferred base addresses, the loader never has to fix up any (absolute) address inside kernel32.dll and user32.dll, and load time is shortened.
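The fix-up work that the loader saves can be illustrated with a deliberately simplified model. The sketch below is an assumption-laden illustration, not the real PE ".reloc" format: the image is modelled as a vector of 32-bit words, fixups is a hypothetical list of the word indices that hold absolute addresses, and rebase is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of base relocation (not the real PE ".reloc" format):
// "image" holds 32-bit words, and "fixups" lists the indices of the words
// that contain absolute addresses computed for preferredBase.
inline void rebase(std::vector<std::uint32_t>& image,
                   const std::vector<std::size_t>& fixups,
                   std::uint32_t preferredBase,
                   std::uint32_t actualBase)
{
    // Loaded at the preferred address: nothing to fix, the list is ignored.
    if (actualBase == preferredBase) return;

    // Unsigned arithmetic wraps, so this also works when the image is
    // loaded below its preferred base.
    const std::uint32_t delta = actualBase - preferredBase;
    for (std::size_t index : fixups) {
        assert(index < image.size());
        image[index] += delta;
    }
}
```

For an image linked for base 0x77E80000 whose first word holds the absolute address 0x77E81234, loading at 0x10000000 turns that word into 0x10001234, while loading at the preferred base leaves every word untouched and skips the loop entirely. Words that are not listed in fixups are never modified.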

Let us end this discussion with the following example:

Change the load address of some App.exe to the preferred address of kernel32 (/base:"0x77e80000") or user32 (/base:"0x77e10000"). If App.exe does not import anything from user32, force it to be loaded with LoadLibrary. Then build App.exe and run it. You will get an error box ("Illegal System DLL Relocation") and App.exe will not load.

Why? When a process is created, the loader on Win2000 and WinXP checks whether kernel32.dll and user32.dll are mapped at their preferred addresses (their names are hard-coded into the loader); if they are not, it raises an error. On WinNT4, ole32.dll is checked as well. On WinNT 3.51 and lower there is no such check, and kernel32.dll and user32.dll can be loaded anywhere. The only module that is always loaded at its preferred address is ntdll.dll. The loader does not check it, but if it is not at its preferred address, the process is simply not created.

Summary: In WinNT4 or higher operating systems:

● DLLs that are always loaded at their preferred addresses: kernel32.dll, user32.dll, and ntdll.dll.

● DLLs that are present in every Win32 program (and in csrss.exe): kernel32.dll and ntdll.dll.

● DLL that is present in every process: ntdll.dll.

B) The /GZ compiler switch

In Debug builds, the /GZ switch is on by default. It helps you catch some errors (see the documentation for details). But how does it affect our executable?

When /GZ is on, the compiler adds extra code to every function, including a check (added at the very end of each function) that verifies our function has not changed the ESP stack pointer. However, this check is implemented as a function call, so a function call is added to ThreadFunc as well. This is the road to disaster, because the copy of ThreadFunc in the remote process will call a function that does not exist there.

C) Static functions and incremental linking

Incremental linking shortens link time. With incremental linking, every function call goes through an extra JMP instruction (one exception: functions declared static!). These JMPs let the linker move functions around in memory without having to update the instructions that call them. But this JMP causes us trouble: ThreadFunc and AfterThreadFunc now point to their JMP instructions rather than to their real code. So when we compute the size of ThreadFunc:

const int cbCodeSize = ((LPBYTE) AfterThreadFunc - (LPBYTE) ThreadFunc);

The actual result will be the "distance" between the JMP instructions of ThreadFunc and AfterThreadFunc. Now suppose our ThreadFunc is at 004014C0 and its corresponding JMP instruction is at 00401020:

00401020 JMP 004014C0
...
004014C0 PUSH EBP ; ThreadFunc's true address
004014C1 MOV EBP, ESP
...

Then

WriteProcessMemory (.., &ThreadFunc, cbCodeSize, ..);

will copy the "JMP 004014C0" instruction and the code that follows it within cbCodeSize bytes to the remote process, instead of ThreadFunc itself. The first thing the remote thread executes will be "JMP 004014C0", and it will also be one of the last instructions executed in that process (Translator's note: this is certainly not what we want).

However, if a function is declared static, it is not replaced by a JMP instruction even when incremental linking is used. That is why I said in rule #4 to declare ThreadFunc and AfterThreadFunc as static or to disable incremental linking. (For more about incremental linking, see "Remove Fatty Deposits from Your Applications Using Our 32-bit Liposuction Tools".)
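To make the effect of such a thunk concrete, the following sketch in C decodes a 5-byte near JMP and computes where it really leads. It is only an illustration and not part of the original technique: the helper name resolve_jmp_thunk is hypothetical, and the sketch assumes the thunk is encoded as opcode E9 followed by a 32-bit relative displacement, which the listing above does not show.

```c
#include <stdint.h>

/* Hypothetical helper: if the 5 bytes located at address `addr` encode a
   near JMP (opcode E9 followed by a little-endian rel32), return the jump
   target; otherwise return addr unchanged. */
static uint32_t resolve_jmp_thunk(uint32_t addr, const uint8_t bytes[5])
{
    uint32_t rel;

    if (bytes[0] != 0xE9)
        return addr;                /* not a thunk: this is the real code */

    rel = (uint32_t)bytes[1]
        | ((uint32_t)bytes[2] << 8)
        | ((uint32_t)bytes[3] << 16)
        | ((uint32_t)bytes[4] << 24);

    /* rel32 is relative to the end of the 5-byte instruction; unsigned
       wrap-around also handles negative displacements. */
    return addr + 5u + rel;
}
```

With the addresses used in this appendix, a thunk at 00401020 that jumps to 004014C0 would be encoded as E9 9B 04 00 00, because 004014C0 - (00401020 + 5) = 0000049B. This shows why copying cbCodeSize bytes starting at the thunk copies the wrong code. Declaring the functions static or disabling incremental linking remains the fix the article recommends.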

D) Why can ThreadFunc have only 4K of local variables?

Local variables are always stored on the stack. Suppose a function has 256 bytes of local variables; on entry to the function (more precisely, in the function prologue), 256 is subtracted from the stack pointer. A function like the following:

void Dummy (void) {
    BYTE var[256];
    var[0] = 0;
    var[1] = 1;
    var[255] = 255;
}

will be compiled into the following instructions:

00401000 PUSH EBP
00401001 MOV EBP, ESP
00401003 SUB ESP, 00000100 ; change ESP as storage for local variables is needed
00401006 MOV BYTE PTR [ESP], 00 ; var[0] = 0;
0040100A MOV BYTE PTR [ESP+01], 01 ; var[1] = 1;
0040100F MOV BYTE PTR [ESP+FF], FF ; var[255] = 255;
00401017 MOV ESP, EBP ; restore stack pointer
00401019 POP EBP
0040101A RET

Note how ESP (the stack pointer) is changed in the example above. But what happens if a function has more than 4K of local variables? In that case the stack pointer is not changed directly; instead, a function call takes care of adjusting ESP correctly. But it is precisely this "function call" that makes ThreadFunc crash, because its copy in the remote process will call a function that does not exist there.

Let's see what the documentation says about stack probes and the /Gs compiler option:

The /Gssize option is an advanced feature that lets you control stack probes. A stack probe is a sequence of code that the compiler inserts into every function call. When activated, a stack probe reaches benignly into memory by the amount of space required to store the function's local variables.

If a function needs more stack space for its local variables than the amount given by size, its stack probe is activated. By default, size is the size of one page (4K on 80x86). This value allows a Win32 program and the Windows NT virtual-memory manager to interact so that the amount of memory committed to the program stack can grow at run time.

I am sure you find the description above strange (a stack probe "reaches benignly into memory" by the amount of space the function needs?). These compiler options (or rather their descriptions!) can be really irritating, especially when you want to know how they actually work. To put it another way, if a function needs 12 KB to store its local variables, the memory on the stack is "allocated" like this:

SUB ESP, 0x1000 ; "allocate" the first 4 KB
TEST [ESP], EAX ; touch memory in order to commit a new page (if not already committed)
SUB ESP, 0x1000 ; "allocate" the second 4 KB
TEST [ESP], EAX ; ...
SUB ESP, 0x1000
TEST [ESP], EAX

Note how the stack pointer moves in steps of 4 KB and, more importantly, how TEST touches the bottom of the stack after each step. This ensures that the page containing the bottom of the stack is already committed before the next page is "allocated".

Reading on in the documentation:

"Each new thread receives its own stack space, consisting of both committed and reserved memory. By default, each thread uses 1 MB of reserved memory and one page of committed memory. The system commits a page from the reserved stack memory as needed." (See CreateThread > dwStackSize > "Thread Stack Size" in MSDN.)

Now it should also be clear why the documentation says that this value lets a Win32 program and the Windows NT virtual-memory manager work together.
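The interplay between the probes and on-demand committing can be sketched with a toy model in C. Everything here is an assumption made for illustration: the ToyStack type, the function names and the rule that only the single page directly below the committed region (the guard page) may be touched are a simplification of what the system really does, not a Windows API.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE 0x1000u

/* Toy model of a thread stack: addresses at or above committed_low are
   committed; the page directly below it acts as the guard page. */
typedef struct { uint32_t committed_low; } ToyStack;

/* Touch an address. Returns false if the access skips past the guard page,
   which in this model stands for an access violation. */
static bool touch(ToyStack *s, uint32_t addr)
{
    if (addr >= s->committed_low)
        return true;                    /* already committed */
    if (addr >= s->committed_low - PAGE) {
        s->committed_low -= PAGE;       /* guard page hit: commit one page */
        return true;
    }
    return false;                       /* jumped over the guard page */
}

/* "Allocate" size bytes of locals one page at a time, touching the stack
   after each step, like the SUB ESP / TEST [ESP] sequence above. */
static bool alloc_with_probes(ToyStack *s, uint32_t *esp, uint32_t size)
{
    while (size >= PAGE) {
        *esp -= PAGE;
        if (!touch(s, *esp))
            return false;
        size -= PAGE;
    }
    *esp -= size;
    return touch(s, *esp);
}
```

In this model, subtracting 12 KB from the stack pointer in one step and then touching the new bottom of the stack fails, while the page-by-page sequence succeeds and leaves three more pages committed.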

E) Why should a switch with more than 3 case branches be split up?

Again, an example makes this easier to explain:

int Dummy (int arg1)
{
    int ret = 0;
    switch (arg1) {
    case 1: ret = 1; break;
    case 2: ret = 2; break;
    case 3: ret = 3; break;
    case 4: ret = 0xA0B0; break;
    }
    return ret;
}

will be compiled into the following code:

Address   Opcode/Params    Decoded instruction
--------------------------------------------------
                           ; arg1 -> ECX
00401000  8B4C2404         MOV ECX, DWORD PTR [ESP+04]
00401004  33C0             XOR EAX, EAX                    ; EAX = 0
00401006  49               DEC ECX                         ; ECX--
00401007  83F903           CMP ECX, 00000003
0040100A  771E             JA 0040102A
                           ; JMP to one of the addresses in table ***
                           ; note that ECX contains the offset
0040100C  FF248D2C104000   JMP DWORD PTR [4*ECX+0040102C]
00401013  B801000000       MOV EAX, 00000001               ; case 1: EAX = 1;
00401018  C3               RET
00401019  B802000000       MOV EAX, 00000002               ; case 2: EAX = 2;
0040101E  C3               RET
0040101F  B803000000       MOV EAX, 00000003               ; case 3: EAX = 3;
00401024  C3               RET
00401025  B8B0A00000       MOV EAX, 0000A0B0               ; case 4: EAX = 0xA0B0;
0040102A  C3               RET
0040102B  90               NOP
                           ; address table ***
0040102C  13104000         DWORD 00401013                  ; jump to case 1
00401030  19104000         DWORD 00401019                  ; jump to case 2
00401034  1F104000         DWORD 0040101F                  ; jump to case 3
00401038  25104000         DWORD 00401025                  ; jump to case 4

Do you see how the switch-case was implemented?

It does not test each case branch separately; instead, the compiler builds an address table. We simply compute an offset into the address table and jump to the right case branch. Think about it: this is a real improvement. Suppose you have a switch statement with 50 branches. Without this trick you would need 50 CMP and JMP instructions to reach the last case; with the address table, a single table lookup takes you to the right case. In terms of time complexity, we have replaced an O(2n) algorithm with an O(5) one, where:

1. O denotes the worst-case time complexity.

2. We assume that computing the offset (that is, the table lookup) and jumping to the right address takes 5 instructions.
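The same idea can be written out by hand in C. The sketch below is only an analogy to the disassembly above: the function name DummyWithTable is made up, and the table holds the result values rather than code addresses.

```c
/* Hand-written equivalent of the compiled switch (illustrative only). */
static int DummyWithTable(int arg1)
{
    /* One entry per case constant 1..4, like the address table. */
    static const int table[4] = { 1, 2, 3, 0xA0B0 };

    unsigned int index = (unsigned int)arg1 - 1u;   /* DEC ECX */

    if (index > 3u)                                 /* CMP ECX, 3 / JA */
        return 0;                                   /* no case matched */

    return table[index];                            /* one indexed lookup */
}
```

The single unsigned comparison rejects values below 1 and above 4 at once, which is why the compiled code needs only one CMP and one JA before the indexed jump.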

Now, you may think that the above works only because the case constants were chosen to be consecutive (1, 2, 3, 4). Fortunately, the same solution can be applied to most real-life examples; only the offset calculation becomes somewhat more complicated. There are two exceptions, though:

● if there are fewer than 3 case branches, or

● if the case constants are completely unrelated (such as 1, 13, 50, 1000).

In those cases the resulting code is the same as if you had used an ordinary if-else if chain.

An interesting aside: if you have ever wondered why only constants are allowed after case, now you know. The value must be known at compile time so that the address table can be built.

Back to our problem!

Notice the JMP instruction at address 0040100C? Let's see what Intel's documentation says about the hexadecimal opcode FF:

Opcode   Instruction   Description
FF /4    JMP r/m32     Jump near, absolute indirect, address given in r/m32

The JMP uses an absolute address! That is, one of its operands (here 0040102C) is an absolute address. Need I say more? The remote ThreadFunc will blindly assume that the address table of its switch is at 0040102C, jump to a wrong location, and immediately crash the remote process.
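One way to follow the advice in this section's title is to write the dispatch as an if-else if chain. The sketch below rewrites the Dummy example that way; the name DummyIfElse is made up for illustration. Whether a particular compiler emits an address table for a given piece of code depends on the compiler and its optimization settings, so it is still worth checking the disassembly of the function you intend to copy.

```c
/* Same result as Dummy above, written as an if-else if chain so that each
   constant is compared explicitly. */
static int DummyIfElse(int arg1)
{
    int ret = 0;

    if (arg1 == 1)
        ret = 1;
    else if (arg1 == 2)
        ret = 2;
    else if (arg1 == 3)
        ret = 3;
    else if (arg1 == 4)
        ret = 0xA0B0;

    return ret;
}
```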

F) Why did the remote process crash?

If your remote process crashes, the reason may be one of the following:

1. You referenced a string inside ThreadFunc that does not exist in the remote process.

2. One or more instructions in ThreadFunc use absolute addressing (see the example in Appendix E).

3. ThreadFunc calls a function that does not exist in the remote process (the call may have been added by the compiler or the linker). In that case you will find something like the following code in the disassembler:

004014C0 PUSH EBP ; entry point of ThreadFunc
004014C1 MOV EBP, ESP
...
004014C5 CALL 0041550 ; this will crash the remote process
...
00401502 RET

If the offending CALL was added by the compiler (because some compiler switch that should be off, such as /GZ, is on), it will be located either near the beginning or near the end of ThreadFunc. In any case, you must be careful when using the CreateRemoteThread & WriteProcessMemory technique, especially with your compiler/linker settings, which may well add things to your ThreadFunc that cause trouble.

References (omitted)

Article history (omitted)

When reposting, please credit the original address: https://www.9cbs.com/read-35739.html
