Introduction to anti-virus engine design
Author: NJUE
Sending time: 2004.01.14
Table of Contents
1 Introduction
1.1 Background
1.2 Development status of today's virus technology
1.2.1 Kernel-mode viruses
1.2.2 Resident viruses
1.2.3 Intercepting system operations
1.2.4 Encrypted polymorphic viruses
1.2.5 Anti-tracing / anti-virtual-execution viruses
1.2.6 Direct API calls
1.2.7 Virus hiding
1.2.8 Special infection methods
1 Introduction
As its title indicates, this paper is about designing and writing an advanced anti-virus engine. First I should explain the word "advanced". It is well known that traditional anti-virus software relies on signature-based static scanning: it searches a file for a specific hexadecimal string, and if the string is found, the file is judged to be infected with a particular virus. This method, however, no longer works well against today's virus techniques, for reasons I will describe in the following sections.
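The signature-based static scan described above can be sketched in a few lines of Python. This is only a minimal illustration: the signature names and byte strings are invented for the example and do not correspond to any real virus, and a real engine would use a far more efficient multi-pattern search.

```python
# Hypothetical signature database: name -> byte string (invented for this sketch).
SIGNATURES = {
    "Example.A": bytes.fromhex("deadbeef0102"),
    "Example.B": bytes.fromhex("cafebabe90"),
}

def scan_buffer(data):
    """Return the sorted names of all signatures whose byte string occurs in data."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern in data)

def scan_file(path):
    """Read a whole file and scan it; acceptable for a sketch, not for huge files."""
    with open(path, "rb") as f:
        return scan_buffer(f.read())
```

A virus whose body is encrypted differently in every infected file contains no such constant byte string, which is why this approach alone is no longer sufficient.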
Therefore, this article does not analyze the signature-scanning and virus-removal modules of an anti-virus engine. Instead it discusses two major anti-virus techniques developed to counter current virus technology: virtual machines and real-time monitoring. What a virtual machine is and what real-time monitoring is will be explained in detail in the corresponding chapters. What I want to stress here is that although both techniques already exist (they are used by some leading anti-virus vendors at home and abroad), they have not been fully disclosed, for commercial reasons, so you will not find their inner workings in books or online. In the relevant chapters I will analyze a large amount of program source code (mainly a complete virtual machine source in Section 2.4) and reverse-engineered code (in Sections 3.3.3 and 3.4.3, the disassembled real-time monitoring drivers and client programs of three well-known anti-virus products that I reverse engineered), and I will also publish some undocumented operating system mechanisms and data structures that I dug out myself. With that, let us turn to the subject.
1.1 Background
The two techniques covered in this article are among the most advanced used in the anti-virus field today. What are they for? Take virtual machine technology first: it is designed mainly to detect and remove encrypted polymorphic viruses. Strictly speaking, the so-called virtual machine is not a virtual machine at all; it would be more accurate to call it a virtual CPU (a CPU implemented in software), but "virtual machine" is the term used in the anti-virus world. Its main job is to emulate an Intel x86 CPU and interpret code: like a real CPU, it fetches, decodes, and executes the machine instructions it is given.
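The fetch, decode, and execute cycle of such a virtual CPU can be illustrated with a toy interpreter. The sketch below is not the virtual machine discussed in this paper; it handles only four 32-bit x86 opcodes (`mov eax, imm32`, `xor eax, imm32`, `inc eax`, `ret`), and the function name and register dictionary are assumptions made for the example.

```python
import struct

def emulate(code, max_steps=1000):
    """Interpret a tiny subset of 32-bit x86 and return the final register state."""
    regs = {"eax": 0, "eip": 0}
    for _ in range(max_steps):
        eip = regs["eip"]
        opcode = code[eip]                                    # fetch
        if opcode == 0xB8:                                    # decode: mov eax, imm32
            regs["eax"] = struct.unpack_from("<I", code, eip + 1)[0]
            regs["eip"] = eip + 5
        elif opcode == 0x35:                                  # xor eax, imm32
            regs["eax"] ^= struct.unpack_from("<I", code, eip + 1)[0]
            regs["eip"] = eip + 5
        elif opcode == 0x40:                                  # inc eax
            regs["eax"] = (regs["eax"] + 1) & 0xFFFFFFFF
            regs["eip"] = eip + 1
        elif opcode == 0xC3:                                  # ret: end of the sample
            break
        else:
            raise NotImplementedError("opcode %#04x is outside this sketch" % opcode)
    return regs
```

The `max_steps` limit reflects a practical concern for any emulator: the code being interpreted is untrusted and may loop forever.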
Of course, what an encrypted polymorphic virus is, why it calls for virtual execution, and how virtual execution works are all answered in the corresponding chapters. The other highlight is real-time monitoring technology, whose applications are much broader and not limited to catching viruses. Many kinds of objects can be monitored in real time, for example INTMON, PPMON, and disk access (DiskMon). Monitoring for anti-virus purposes mainly targets file access. Whenever a file is about to be accessed, the real-time monitor checks whether the file is infected. If it is, the user chooses whether to clean the virus or cancel the operation request.
This gives the user a relatively safe execution environment. At the same time, real-time monitoring degrades system performance, and many anti-virus users complain that their real-time monitor makes the system extremely slow and unstable. This raises the bar for us: how to intercept file operations accurately while keeping the monitor's use of system resources to a minimum. I will discuss this problem in the virus real-time monitoring chapter. Both techniques are used in products from leading anti-virus vendors at home and abroad; their source code is not public, but we can still get a glimpse of their design through reverse engineering.
In fact, if you open their executables in a hex editor, you may see debug symbols, variable names, or output messages that were not stripped, and these clues are a great help in understanding the intent of the code. Likewise, the files with a .vxd or .sys extension in the anti-virus software's installation directory are the drivers that perform real-time monitoring, and they can be reverse engineered (see my analysis of the disassembled driver code in the corresponding sections). By now you should have a general idea of these two techniques. Later we will go deep into the technical details.
1.2 Development status of today's virus technology
To discuss how to fight viruses, we must start with virus technology itself; as the saying goes, "know yourself and know your enemy". In fact, I think the current rule that treats research into virus technology as illegal has serious drawbacks. It is hard to imagine someone with no virus-writing experience becoming an anti-virus expert. As far as I know, some well-known Chinese anti-virus companies have no shortage of expert virus writers. They simply put the same techniques to proper use, fighting poison with poison. I hope this paper can serve as a modest starting point, and I look forward to more people introducing virus technology to the public.
Today's viruses differ from those of the DOS and Win3.1 era. I think the biggest shift is that boot sector viruses have declined while script viruses have begun to flood in. The reason is that rewriting the boot sector directly is difficult under current operating systems (DOS had no protection and allowed direct disk writes through INT 13h), and changes to the boot sector are easily noticed, so few people write such viruses any more. Script viruses, on the other hand, are favored by virus authors because they spread efficiently and are easy to write. Since both kinds can be detected with the signature-based static scanning described earlier, they are outside our discussion. The techniques I want to discuss come mainly from binary parasitic viruses (viruses that infect files); these techniques touch the low levels of the operating system or 386 protected mode, so they are well worth studying.
As everyone knows, parasitic viruses under DOS mainly infect 16-bit COM or EXE files. Because DOS has no protection, they can easily go resident, reduce the available memory (by modifying the MCB chain), modify system code, and intercept system services or interrupts. In the Win9x and WinNT/2000 era, writing a 32-bit Windows virus that works is not so easy. Because of page protection, you cannot modify the system's code pages. Because of the I/O permission bitmap, you cannot access ports directly. In Windows you cannot intercept all file operations by hooking INT 21h as in DOS. In short, when you run as a user-mode program your behavior is strictly controlled by the operating system, and you cannot do as you please the way you could under DOS. It is also worth mentioning that the executable file formats used under Windows differ greatly from the DOS EXE format (ordinary programs use the PE format and drivers use LE), which makes infecting files harder (PE and LE are fairly complex and divided into several sections; if the infection is done wrong, the file can no longer be executed). Today's viruses use too many new techniques for me to discuss them one by one, so I have chosen some important and representative ones to discuss in the subsections of this chapter.
1.2.1 Kernel-mode viruses
Before introducing kernel-mode viruses, we need to discuss the concepts of kernel mode and user mode. Any textbook on 386 protected-mode assembly programming explains both. CPUs from the 386 onward implement four privilege levels (Windows uses only two). Privilege level 0 (Ring 0) is reserved for operating system code and device driver code, which run in kernel mode; privilege level 3 (Ring 3) is for ordinary user programs, which run in user mode. Code running in kernel mode is unrestricted: it can freely access any valid address and access ports directly. Code running in user mode is subject to many checks by the processor: it can access only those virtual addresses in its address space whose page table entries mark the page as accessible from user mode, and it can perform direct I/O only on the ports permitted by the I/O Permission Bitmap in the Task State Segment (TSS). (At this point the IOPL field in the EFLAGS register is usually 0, meaning that the minimum privilege level currently allowed to perform direct I/O is Ring 0.) The discussion above applies only to protected-mode operating systems; a real-mode operating system such as DOS has no such concepts, and all of its code can be regarded as running in kernel mode. With so many advantages to running in kernel mode, a virus has every reason to want Ring 0. The processor switches from Ring 3 to Ring 0 in two cases: a far CALL through a call gate, or an INT instruction through an interrupt gate or trap gate. The details of the transfer involve complex protection checks and stack switching; please refer to the relevant literature.
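The relationship between the current privilege level and IOPL can be made concrete with a short sketch. It assumes the standard x86 layout (CPL in the low two bits of the CS selector, IOPL in bits 12 and 13 of EFLAGS); the function names, and the selector and flag values used to try it out, are chosen only for illustration.

```python
def current_privilege_level(cs_selector):
    """The CPL is held in the low two bits of the CS selector."""
    return cs_selector & 0x3

def iopl(eflags):
    """IOPL occupies bits 12 and 13 of EFLAGS."""
    return (eflags >> 12) & 0x3

def io_allowed_without_bitmap(cs_selector, eflags):
    """In protected mode a port access skips the TSS I/O bitmap check only if CPL <= IOPL."""
    return current_privilege_level(cs_selector) <= iopl(eflags)
```

With IOPL equal to 0, a Ring 3 selector fails this test, so every port access from user mode is checked against the I/O Permission Bitmap.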
Modern operating systems typically provide system services through an interrupt gate and switch modes by executing a trap instruction, which on Intel x86 is INT: INT 30h in Win9x, INT 80h in Linux, and INT 2Eh in WinNT/2000. A user-mode service routine (such as a system DLL) requests a system service by executing INT xx; the processor then switches to kernel mode, the corresponding system code running in kernel mode services the request, and the result is passed back to the user program. The following example explains how a virus enters kernel mode.
In Win9x, apart from the topmost 4 MB that hold the page tables, which are page protected, the rest of the kernel virtual address space (3 GB to 4 GB) can be read and written by user programs. If you inspect the page attributes of these addresses with SoftICE's PAGE command, you will be surprised to find the U and RW bits set, which means these addresses can be read and written directly from user mode. Any user program can therefore corrupt operating system pages, maliciously or by accident, while it runs. A virus can freely build gate descriptors in the GDT (Global Descriptor Table) or LDT (Local Descriptor Table) and use them to enter kernel mode. Gate descriptors are not the only route; there are many ways to obtain Ring 0. I know of more than ten, such as call gates (CallGate), interrupt gates (IntGate), trap gates (TrapGate), exception gates (Fault), interrupt requests (IRQs), ports (Ports), the Virtual Machine Manager (VMM), thunks (Thunks), device I/O control (DeviceIoControl), an API function (SetThreadContext), and the INT 2Eh service (NTKERN.VXD). For reasons of space I cannot describe them all, so I have selected one piece of code from the most representative example, version 1.5 of the CIH virus. It is often said that CIH uses VxD (virtual device driver) technology, but in fact it is not a VxD. It merely exploits the Win9x weakness described above: it builds an interrupt gate with DPL (descriptor privilege level) 3 in the IDT (Interrupt Descriptor Table), which means the INT instruction for that gate may be executed from Ring 3, and makes the descriptor point to a function in its private address space that needs to run in Ring 0. CIH can then execute an INT xx instruction to enter kernel mode and call VMM and VxD services. (CIH chooses INT 3 so that the system debugger SoftICE, which also hooks INT 3, stops working properly.) The following is a piece of CIH 1.5 source code with my comments:
; **********************************************
; * Modify the IDT to obtain kernel-mode (Ring 0) privilege *
; **********************************************
push eax
sidt [esp-02h]                        ; get the IDT base address
pop ebx
add ebx, HookExceptionNumber*08h+04h  ; ZF = 0
cli                                   ; disable interrupts while reading and modifying system data
mov ebp, [ebx]
mov bp, [ebx-04h]                     ; save the original interrupt entry address
lea esi, MyExceptionHook-@1[ecx]      ; offset of the function that needs to run in Ring 0
push esi
mov [ebx-04h], si
shr esi, 16
mov [ebx+02h], si                     ; set the new interrupt entry address
pop esi
; **********************************************
; * Generate an exception to enter Ring 0 *
; **********************************************
int HookExceptionNumber               ; generate the exception
Of course, the complete code also restores the original interrupt entry address and sets up an exception handling frame.
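The listing above stores the low word of the handler address at offset 0 of the 8-byte gate descriptor and the high word at offset 6. From the defender's side, the same layout can be decoded to see which IDT gates are reachable from Ring 3. The following is a minimal sketch assuming the standard 32-bit x86 gate descriptor format; the function names and the sample values are invented for the example.

```python
import struct

def decode_gate(descriptor):
    """Decode an 8-byte x86 protected-mode gate descriptor into its fields."""
    if len(descriptor) != 8:
        raise ValueError("a gate descriptor is exactly 8 bytes")
    offset_low, selector, _reserved, access, offset_high = struct.unpack("<HHBBH", descriptor)
    return {
        "offset": (offset_high << 16) | offset_low,  # handler entry address
        "selector": selector,                        # code segment selector
        "present": bool(access & 0x80),              # P bit
        "dpl": (access >> 5) & 0x3,                  # descriptor privilege level
        "type": access & 0x0F,                       # 0xE = 32-bit interrupt gate
    }

def reachable_from_ring3(descriptor):
    """True if a user-mode INT instruction may go through this gate."""
    gate = decode_gate(descriptor)
    return gate["present"] and gate["dpl"] == 3
```

A monitoring tool could, for instance, report any gate that is reachable from Ring 3 and whose handler address lies outside the known system modules; that policy is a suggestion, not something taken from an existing product.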
The technique just discussed is limited to Win9x; entering Ring 0 under WinNT/2000 is not so easy. The main reason is that WinNT/2000 does not have the weakness described above: the system pages (2 GB to 4 GB) have proper page protection, and virtual addresses above 0x80000000 are inaccessible to user programs. If you inspect the page attributes of these addresses with SoftICE's PAGE command, you will find the S bit set, which means they can be accessed only from kernel mode. So building a descriptor in the IDT or GDT, or patching the kernel at run time, is out of the question from user mode. The only option is to load a driver and use it to do what cannot be done in Ring 3. A virus can patch kernel code from the driver it loads, or create a call gate for itself (using the functions KeI386AllocateGdtSelectors, KeI386SetGdtSelector, and KeI386ReleaseGdtSelectors exported by NTOSKRNL.EXE). For example, the FunLove virus uses a driver to patch system files (NTOSKRNL.EXE, NTLDR) in order to bypass security checks.
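The page attributes mentioned for Win9x (U, RW) and WinNT/2000 (S) correspond to the low bits of an x86 page table entry. A minimal sketch of how a tool might decode them, assuming the standard 32-bit non-PAE entry format; the helper names and the sample entry values are illustrative only.

```python
PTE_PRESENT = 0x001   # bit 0: page is present in memory
PTE_WRITABLE = 0x002  # bit 1: R/W, set means writable
PTE_USER = 0x004      # bit 2: U/S, set means accessible from user mode

def describe_pte(pte):
    """Render the protection bits the way a debugger's page listing might."""
    if not pte & PTE_PRESENT:
        return "NP"
    who = "U" if pte & PTE_USER else "S"
    access = "RW" if pte & PTE_WRITABLE else "R"
    return who + " " + access

def user_writable(pte):
    """True if user-mode code may write to the page described by this entry."""
    wanted = PTE_PRESENT | PTE_WRITABLE | PTE_USER
    return (pte & wanted) == wanted
```

This looks at a single page table entry only; on a real processor the effective protection also depends on the corresponding page directory entry.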
There are two problems here, however. The first is where the driver comes from. Modern viruses generally use a technique called "drop": the virus body itself contains the driver's binary code (possibly compressed, or constructed dynamically). When the virus needs the driver, it generates the driver file, drops it on disk, registers it with the SCM (Service Control Manager), and finally calls StartService to run it. The second problem is that loading a driver requires administrator rights; under an ordinary account the load functions above fail (the security subsystem checks the user's access token). But most users log in as administrator, since otherwise anti-virus software could not load its own real-time monitoring driver either, so viruses still have plenty of opportunities.
1.2.2 Resident viruses
A resident virus is one that finds a suitable page in memory, copies itself there, and keeps its code in memory for as long as the system runs. Resident viruses are stealthier than direct infectors, and they usually infect by intercepting certain system operations. A virus that has entered kernel mode can use system services for this purpose; the CIH virus, for example, allocates page space above 0xC0000000 by calling the VMM service VMMCall _PageAllocate. For a virus that stays in user mode, remaining in memory after its program exits seems impossible, because memory allocated by a user program is part of the resources its process owns, and these are released as soon as the process ends. So what we need is a way to allocate memory that survives the exit of the process.
A member of the virus-writing group 29A came up with a very creative technique. He created a section object with CreateFileMappingA and MapViewOfFile, mapped a view of it into his own address space, and copied the virus body into it. Because the virtual address of a file mapping lies in the shared area (visible to all processes; that is, the page table entries that every process uses for virtual addresses in the shared area point to the same physical pages), the next step is to inject a piece of code into Explorer.exe (using WriteProcessMemory to write into another process), and that code opens the file mapping again from within Explorer.exe's address space. As a result, even after the virus's own process exits, Explorer.exe still holds a reference to the mapped pages, so a copy of the virus code stays in memory pages that can affect all processes until Explorer.exe exits.
Residency can also be achieved by modifying a system dynamic link library (DLL). Under Win9x the system DLLs (for example kernel32.dll, mapped at BFF70000) live in the system shared area (2 GB to 3 GB), so a small piece of virus code written into a gap in the code segment affects all other processes. However, the code segment of kernel32.dll is read-only from user mode, so its page protection attributes must first be changed by special means. In WinNT/2000 the pages of a system DLL are mapped into each process's private space (for example kernel32.dll is mapped at 77ED0000) and carry the copy-on-write attribute: as long as no process tries to write to a page, all processes share it; when a process does try to write, the processor raises an exception, the system's page fault handler sees that the fault is a copy-on-write fault rather than an access violation, allocates a new page for the faulting process, copies the original page's contents into it, and updates that process's page table to point to the new page.
This copy-on-write optimization of shared memory makes life harder for virus writers. A virus cannot simply patch the kernel32.dll code once, as under Win9x, and thereby affect every process; it has to use WriteProcessMemory to write the virus code into each process, so that every process gets its own copy of the virus body. Virus writers call this multi-process residency or per-process residency.
1.2.3 Intercepting system operations
Intercepting system operations is a trick viruses have always used; it was so in the DOS era, and the Windows era is no exception. Under DOS, a virus intercepts DOS system services by changing the entry address of INT 21h in the interrupt vector table (DOS provides its system calls, including a large number of file operations, through INT 21h). Most boot sector viruses hook INT 13h (the BIOS interrupt that provides disk services) to gain control of disk access. Viruses under Windows have also found ways to hook system services. A typical example is the CIH virus, which uses the system-level file hook provided by IFSMgr.vxd (the Installable File System manager) to intercept all file operations in the system. I will discuss this in detail in the relevant chapter, because real-time monitoring under Win9x relies mainly on the same service. There are other methods as well, but none is as effective as this system-level file hook, mainly because they are not low-level enough and miss some file operations.
One such method is API hooking, that is, hooking API functions. The system offers no ready-made service for this: SetWindowsHookEx can hook messages such as mouse messages, but it cannot intercept API functions. What we can do is build the hook ourselves. The method is actually very simple: to intercept, say, the CreateFile function exported by kernel32.dll, you overwrite the beginning of its code (at BFF7xxxx) with a jump instruction to your hook function, and at the end of your function you jump back. As shown below:
; Target function (the function to be intercepted)
......
TargetFunction:            ; entry of the function to be intercepted
jmp DetourFunction         ; jump to the hook function (a 5-byte jump instruction)
TargetFunction+5:
push edi
......
; Trampoline (your hook function)
......
TrampolineFunction:        ; where your hook function returns to the original function once it has run
push ebp
mov ebp, esp
push ebx
push esi                   ; the lines above are the original entry instructions, 5 bytes in total
jmp TargetFunction+5       ; jump back to the original function
......
But this method intercepts only a small part of all file operations.
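Seen from the anti-virus side, the 5-byte jump at the function entry is itself a detectable trace. The sketch below checks whether the first bytes of a function are a `JMP rel32` (opcode E9) and whether the target leaves the module that owns the function. The function names, the addresses, and the module range are hypothetical; obtaining the bytes from a live process is outside the scope of the example.

```python
import struct

def find_entry_jmp(code, address):
    """If code starts with a 5-byte JMP rel32 (opcode E9), return its target, else None."""
    if len(code) >= 5 and code[0] == 0xE9:
        (rel32,) = struct.unpack_from("<i", code, 1)
        # The displacement is relative to the address of the next instruction.
        return (address + 5 + rel32) & 0xFFFFFFFF
    return None

def looks_hooked(code, address, module_start, module_end):
    """Flag an entry jump whose target lies outside the function's own module."""
    target = find_entry_jmp(code, address)
    return target is not None and not (module_start <= target < module_end)
```

A function that begins with the ordinary prologue bytes `55 8B EC 53 56` (`push ebp / mov ebp, esp / push ebx / push esi`) is reported as not hooked. A legitimate function may also start with a jump, so a check like this is a heuristic rather than proof.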
Under Win9x there is another way to intercept file operations, which should be counted as a big back door in Win9x. It is an API function in kernel32.dll called VxDCall0. Disassembling this function gives the following code:
mov eax, dword ptr [esp+00000004h]   ; get the service code
pop dword ptr [esp]                  ; stack adjustment
call fword ptr cs:[BFFC9004]         ; far call through a call gate to code in segment 3Bh
If we keep tracing, we see:
003B:xxxxxxx int 30h                 ; a protected-mode callback that traps into VWIN32.VXD
For more details on VxDCall, see Matt Pietrek's "Windows 95 System Programming Secrets".
When the service code is 0x002A0010, the protected-mode callback traps into a service in VWIN32.VXD called VWIN32_Int21Dispatch. This shows that Win9x still depends on MS-DOS, even though Microsoft claims that it does not. The calling convention is as follows:
MY_INT21H:
push ecx
push eax                             ; like the function number in AX for INT 21h under DOS
push 002A0010h
call dword ptr [ebp+A_VXDCALL]
ret
We can take the six bytes at BFFC9004, the address in the kernel32.dll data segment that the far call at the entry of VxDCall0 reads and that is writable from user mode, and change them to point to our own hook function. In the hook we check the service number and function number passed in to decide whether the request is a file service for VWIN32_Int21Dispatch. The well-known HPS virus uses this technique to intercept file operations directly from user mode, but this method, too, catches only a small part of all file operations.
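The service code 0x002A0010 can be read as two 16-bit halves, on the assumption that a Win32 VxD service code packs the VxD's device ID in the high word (0x002A for VWIN32) and the service number in the low word. The check a hook or a monitor would make can then be sketched as follows; the function and constant names are invented for the example.

```python
VWIN32_INT21_DISPATCH = 0x002A0010  # service code quoted in the text

def split_service_code(code):
    """Split a service code into (device ID, service number)."""
    return (code >> 16) & 0xFFFF, code & 0xFFFF

def is_int21_request(code):
    """True if the call is a request to VWIN32_Int21Dispatch."""
    return code == VWIN32_INT21_DISPATCH
```

Only after this test succeeds is it meaningful to look at the DOS function number in AX to decide whether the request is a file operation.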
1.2.4 Encrypted polymorphic viruses
Encrypted polymorphic viruses are the main subject of the virtual machine chapter and will be covered there.
1.2.5 Anti-tracing / anti-virtual-execution viruses
Anti-tracing and anti-virtual-execution viruses are closely tied to the virtual machine, so they too will be introduced in the corresponding chapter.
1.2.6 Direct API calls
Direct API calling is a common technique in today's Win32 viruses: the virus locates the address of an API function in memory by itself and then calls it. When a normal program makes an API call, the compiler turns the call statement into several instructions that push the arguments, followed by an indirect call (this describes the Microsoft compiler; the Borland compiler uses JMP DWORD PTR [xxxxxxxh]), as follows:
push arg1
push arg2
......
call dword ptr [xxxxxxxh]
The address xxxxxxxh lies in the import section of the program image. When the program is loaded, the loader fills in the address of the API function there; this is the so-called dynamic linking mechanism. To avoid having to construct, in the import section of every file it infects, the link information for the APIs its own code uses, a virus chooses to locate API addresses directly at run time with its own code. These function addresses are in fact fairly fixed for a given version of the operating system, but a virus cannot rely on that. The more popular practice is first to locate the load base address of the DLL that contains the API function, and then to look up the required API address in its export section. The second step is almost trivial once you are familiar with the export section structure. The key is the first step: determining the DLL's load address. The load base of a system DLL is also fixed for a given version of the operating system, but again the virus cannot rely on this. At present most viruses use a technique called structured exception handling to catch exceptions raised by the virus body. This lets the virus search a range of memory for the desired DLL (a DLL uses the PE format and has fixed signatures in its header) without worrying about being terminated by the operating system for touching an invalid page.
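The "fixed signatures" in the header are the same ones any PE parser, including an anti-virus scanner, checks before treating a buffer as an executable image: the bytes "MZ" at offset 0 and "PE\0\0" at the offset stored in the `e_lfanew` field at 0x3C. A minimal sketch of that check (the function name is chosen for the example):

```python
import struct

def is_pe_image(buf):
    """True if buf begins with an MZ header whose e_lfanew field points at a PE signature."""
    if len(buf) < 0x40 or buf[:2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    # An out-of-range e_lfanew yields an empty slice, which simply fails the comparison.
    return buf[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"
```

A full parser would go on to validate the file header and optional header; this sketch stops at the two signatures.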
Because structured exception handling is closely related to the anti-virtual-execution techniques discussed later, a brief explanation of it follows.
There are two types of exception handling: final exception handling and per-thread exception handling.
One: final exception handling
When an exception occurs in your process, the operating system calls the exception handler you installed in the main thread. You do not need to remove the handler when the program exits; the system clears it automatically.
                                     ; program entry point
push offset Final_Handler
call SetUnhandledExceptionFilter
......                               ; code covered by final handler
call ExitProcess
; ********************************************
Final_Handler:
......                               ; code to provide a polite exit
                                     ; (eax = -1 reload context and continue)
mov eax, 1                           ; eax = 1 stops display of closure box
                                     ; eax = 0 enables display of the box
ret
Two: per-thread exception handling
The value in FS is a 16-bit selector that points to the TIB (Thread Information Block), a data structure containing important information about the thread. Its first dword points to a structure we call ERR:
1st dword   +0   pointer to the next ERR structure
2nd dword   +4   pointer to this level's exception handler
Exception handlers therefore form a chain. If your own handler catches and handles the exception, the operating system does not call its default handler when your program faults, and the annoying "illegal operation" box with the red cross does not appear. Here is the exception handling part of CIH:
MyVirusStart:
push ebp
lea eax, [esp-04h*2]
xor ebx, ebx
xchg eax, fs:[ebx]                   ; exchange the new ERR structure pointer with the address of the previous structure
                                     ; eax = address of the previous structure
                                     ; fs:[0] = pointer to the current ERR structure (on the stack)
call @0
@0:
pop ebx
lea ecx, StopToRunVirusCode-@0[ebx]  ; address of your exception handler
push ecx                             ; push the address of your exception handler
push eax                             ; push the address of the previous ERR structure
                                     ; the ERR structure is now set up; call this ESP (the ERR structure pointer) ESP0
......
StopToRunVirusCode:
@1 = StopToRunVirusCode
xor ebx, ebx                         ; when an exception occurs, the system installs its own ERR structure ahead of yours,
                                     ; so first find the original structure's address
mov eax, fs:[ebx]                    ; eax = address of the current ERR structure
mov esp, [eax]                       ; take the next structure's address, that is, ESP0, into ESP
RestoreSE:                           ; the no-exception path also arrives here, with ESP equal to ESP0
pop dword ptr fs:[ebx]               ; pop the address of the previous structure back into fs:[0]
pop eax                              ; pop your exception handler address to balance the stack
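The chain of ERR structures can be walked by following the first dword of each record until the end marker. The sketch below does this over a simulated memory; the `read_dword` callback, the addresses, and the step limit are assumptions of the example, while the end-of-chain value 0xFFFFFFFF is the marker used for the last record on Win32.

```python
END_OF_CHAIN = 0xFFFFFFFF  # "next" value stored in the last ERR record

def walk_err_chain(read_dword, head, limit=64):
    """Return the handler address of every ERR record, starting from the value at fs:[0]."""
    handlers = []
    record = head
    while record != END_OF_CHAIN and len(handlers) < limit:
        next_record = read_dword(record)           # 1st dword: pointer to the next ERR structure
        handlers.append(read_dword(record + 4))    # 2nd dword: this level's handler address
        record = next_record
    return handlers
```

A debugger or an emulator can use such a walk to find out which handler will receive control first when the code under analysis raises an exception on purpose.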
1.2.7 Virus hiding
Hiding its process or module should be a must-have feature of a successful virus. Under Win9x, kernel32.dll exports a function, RegisterServiceProcess, that makes a process disappear from the process list of the task manager, but it does not let a virus escape some process-browsing tools. Once you know how these tools enumerate processes, however, you can find ways to deal with them too. Under Win9x most process-browsing tools enumerate processes with the Process32First and Process32Next functions of the ToolHelp32 library; in WinNT/2000 the EnumProcesses function exported by psapi.dll can be used for the same purpose. A virus can therefore consider patching parts of these public functions so that information about a particular process is not returned, and thereby hide itself.
But things are far from that simple. As the saying goes, "virtue rises one foot, vice rises ten", and it holds here. Thanks to the efforts of many reverse engineers, many of the secrets Microsoft kept hidden have gradually been dug out, including, of course, the Windows kernel's internal data structures and code for managing processes and modules. For example, WinNT/2000 describes all active processes in the system with a doubly linked list of EPROCESS blocks that can be reached from the variable PsInitialSystemProcess exported by NTOSKRNL.EXE. If a process-browsing tool reads these data directly from the system kernel with the help of a driver, no virus can hide from it.
For the structure and function of EPROCESS, see "Inside Windows 2000", third edition, by David A. Solomon and Mark E. Russinovich.
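The idea of comparing what the public enumeration functions report with what the kernel's own process list contains can be sketched as a simple set difference. Both inputs are assumed to have been collected already (one through the user-mode API, one by a driver reading the kernel list); the function name and the process IDs are invented for the example.

```python
def hidden_processes(kernel_view, api_view):
    """Return the PIDs present in the kernel's process list but missing from the API enumeration."""
    return sorted(set(kernel_view) - set(api_view))
```

An empty result means the two views agree. The comparison is only as trustworthy as the kernel-side view, and processes that start or exit between the two snapshots can cause false reports.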
1.2.8 Special infection methods
Anyone with some basic knowledge of viruses knows that an ordinary virus attaches itself to the end of the host (so the host grows in size) and changes the program entry point so that the virus gets to run. Many viruses today, however, use special infection techniques that leave both the host's size and the entry point in the file header unchanged. Attaching virus code without changing the size of the infected file sounds incredible, but it simply exploits a property of the PE file format: there are gaps between the sections of a PE file (each section's data is padded to the file alignment, so the end of a section usually contains unused space), and if the virus body is small enough it can split itself into several pieces and insert them into the gap at the end of each section. It then does not have to add a section, and the file size stays the same. The famous CIH virus is a typical example of this technique (it is only about 1 KB in size).
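The size of these gaps follows directly from two fields of each section header: the gap is the part of SizeOfRawData that lies beyond VirtualSize. A minimal sketch of the calculation, with a hypothetical `Section` tuple standing in for the parsed section table and made-up section sizes:

```python
from collections import namedtuple

# Hypothetical stand-in for the fields of a parsed PE section header.
Section = namedtuple("Section", "name virtual_size raw_size")

def section_slack(sections):
    """Map each section name to the number of unused bytes at the end of its raw data."""
    return {s.name: max(s.raw_size - s.virtual_size, 0) for s in sections}
```

A scanner can use the same numbers defensively: in a file produced by an ordinary linker these padding bytes are normally zero, so non-zero data there is worth a closer look. That heuristic is a suggestion of this example, not a rule taken from the text.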
Now suppose the virus wants to gain control without changing the entry point in the file header. An unchanged entry point means that execution starts in the original program's entry code, so the virus must modify the original program code by inserting a jump instruction that leads to the virus entry. That is the principle, but there is still much to discuss, for example where in the original code to insert the jump. Some checking tools examine the entry point field of the executable file header; if it looks abnormal, that is, if it lies not in the code section but in the resource section or relocation section, there is reason to suspect that the file is infected by some virus. The technique just discussed, known in virus circles as EPO (entry point obscuring), can defeat such checks, and it is also an important means of resisting virtual execution.
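The entry point check performed by such tools can be sketched as follows. The section records are a hypothetical stand-in for the parsed section table; the characteristic flag is the standard PE value IMAGE_SCN_CNT_CODE, and the sample section layout is invented.

```python
from collections import namedtuple

# Hypothetical stand-in for the fields of a parsed PE section header.
SectionInfo = namedtuple("SectionInfo", "name virtual_address virtual_size characteristics")

IMAGE_SCN_CNT_CODE = 0x00000020  # section contains executable code

def section_of(rva, sections):
    """Return the section whose virtual range contains rva, or None."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return s
    return None

def entry_point_suspicious(entry_rva, sections):
    """Flag an entry point that lies outside every section or in a non-code section."""
    s = section_of(entry_rva, sections)
    return s is None or not (s.characteristics & IMAGE_SCN_CNT_CODE)
```

As the text points out, a virus that leaves the header entry point untouched and patches a jump into the host code passes this test, so the check catches only the simpler infection methods.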
It is also worth mentioning that many viruses now support infecting compressed archives. For example, the Win32.Crypto virus can infect ZIP, ARJ, RAR, ACE, and CAB archives. Such viruses contain code to decompress and compress the specific archive types: they first extract the contents of the archive, infect the suitable files, then compress the files back and update the checksum in the archive header. Many anti-virus products can now scan inside archives of several formats, but for some infected archives they cannot remove the virus. I think the reason may be the fear that, if for some reason the decompression or recompression goes wrong and the computed checksum no longer matches, the archive format would be damaged by the cleaning. A virus takes no responsibility for damage to the user's files, so it has no such concern.
Main references:
David A. Solomon, Mark E. Russinovich, "Inside Microsoft Windows 2000", September 2000
David A. Solomon, "Inside Windows NT", May 1998
Prasad Dabak, Sandeep Phadke, Milind Borate, "Undocumented Windows NT", October 1999
Matt Pietrek, "Windows 95 System Programming Secrets", March 1996
Walter Oney, "System Programming for Windows 95", March 1996
Walter Oney, "Programming the Windows Driver Model", 1999
Lu Lin, "Windows9x file read and write internal", 2001