ATL Under the Hood (4)

zhaozj2021-02-16  99

Author: Zeeshan Amjad

Original link:

http://www.codeproject.com/atl/atl_underthehood_4.asp

Introduction

So far we have not discussed assembly language at all. But if we really want to understand what goes on under the hood of ATL, we cannot avoid the topic, because ATL uses some low-level techniques and some inline assembly to make its code smaller and faster. I assume that the reader already knows the basics of assembly language, so I will concentrate on my topic and will not attempt to write another assembly language tutorial. If you are not yet comfortable with assembly language, I suggest reading Matt Pietrek's "Under the Hood" column in the February 1998 issue of Microsoft Systems Journal, which gives enough background on assembly language.

Let's begin our journey with this simple program as a warm-up:

Program 55.

    void fun(int, int) {}

    int main() {
        fun(5, 10);
        return 0;
    }

Now compile it with the command-line compiler cl.exe, using the -FAs switch. For example, if the program is named prog55:

    cl -FAs prog55.cpp

This generates a file with the same name and the extension .asm, which contains the assembly language code for the program. Now look at the generated output file. Let us first discuss the function call. The assembly code for the call looks similar to this:

    push 10                 ; 0000000aH
    push 5
    call ?fun@@YAXHH@Z      ; fun

First, the parameters are pushed onto the stack in right-to-left order, and then the function is called. However, the name of the function differs from the one we gave it. This is because the C++ compiler decorates (mangles) function names in order to support function overloading. Let us change the program slightly by overloading the function, and then look at how the code behaves.

Program 56.

    void fun(int, int) {}
    void fun(int, int, int) {}

    int main() {
        fun(5, 10);
        fun(5, 10, 15);
        return 0;
    }

The assembly code for the two calls looks similar to this:

    push 10                 ; 0000000aH
    push 5
    call ?fun@@YAXHH@Z      ; fun
    push 15                 ; 0000000fH
    push 10                 ; 0000000aH
    push 5
    call ?fun@@YAXHHH@Z     ; fun

Look at the function names. We wrote two functions with the same name, but the compiler decorated each name differently, and that is how function overloading is made to work.
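To make the pattern in those two decorated names concrete, here is a minimal sketch that rebuilds them. The helper `decorateVoidIntFunction` is hypothetical (it is not part of any compiler or library), and it only covers the one shape seen in the listings: a global `void` function using the C calling convention whose parameters are all `int`. Real name decoration handles far more cases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only.
// Builds the MSVC-style decorated name of a global function
//     void name(int, ..., int)
// with intParamCount >= 1 parameters and the C calling convention.
// Layout: ?name@@  Y (global function)  A (C calling convention)
//         X (void return)  one H per int parameter  @Z (end of list).
std::string decorateVoidIntFunction(const std::string& name, int intParamCount) {
    assert(intParamCount >= 1);  // an empty parameter list is encoded differently
    std::string decorated = "?" + name + "@@YAX";
    decorated.append(static_cast<std::size_t>(intParamCount), 'H');
    decorated += "@Z";
    return decorated;
}
```

With this sketch, `decorateVoidIntFunction("fun", 2)` yields `?fun@@YAXHH@Z` and `decorateVoidIntFunction("fun", 3)` yields `?fun@@YAXHHH@Z`, so the two overloads end up with different symbol names.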

If you don't want the function name to be decorated, you can declare the function with extern "C". Let us make a small change to the program.

Program 57.

    extern "C" void fun(int, int) {}

    int main() {
        fun(5, 10);
        return 0;
    }

The assembly code for the function call is:

    push 10                 ; 0000000aH
    push 5
    call _fun

This means that you cannot overload a function that has C linkage. Look at the following program.

Program 58.

    extern "C" void fun(int, int) {}
    extern "C" void fun(int, int, int) {}

    int main() {
        fun(5, 10);
        return 0;
    }

This program gives a compile error, because function overloading is not supported in the C language: you have given two functions the same name while telling the compiler not to decorate their names, that is, to use C linkage rather than C++ linkage.

Now let's look at the code the compiler generates for a function that does nothing. The following is the code the compiler generated for our function.

    push ebp
    mov ebp, esp
    pop ebp
    ret 0

Before we explain this function in detail, look at its last statement, ret 0. Why is it 0? Could it be some other, non-zero number? As we have seen, all the parameters we pass to a function are in fact pushed onto the stack. What happens to the registers when you or the compiler push data onto the stack? The following simple program shows this behavior. I use printf rather than cout to avoid the overhead of cout.

Program 59.

    #include <cstdio>

    // Translator's note: fun here is the void fun(int, int) from the earlier programs.
    void fun(int, int) {}

    int g_iTemp;

    int main() {
        fun(5, 10);

        _asm mov g_iTemp, esp
        printf("Before push %d\n", g_iTemp);

        _asm push eax

        _asm mov g_iTemp, esp
        printf("After push %d\n", g_iTemp);

        _asm pop eax

        return 0;
    }

The output is:

    Before push 1244980
    After push 1244976

This program shows the value of the ESP register before and after a push. As the output shows (the original article also illustrates this with a figure), the value of ESP decreases by 4 after data is pushed onto the stack.
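The same behavior can be modeled without inline assembly. The sketch below is a toy model, not real stack access: `ToyStack` and its members are hypothetical names, and the starting address is simply the value printed in the output above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a 32-bit stack. `esp` is a byte address that moves DOWN
// by 4 on every push and UP by 4 on every pop.
struct ToyStack {
    std::vector<std::uint32_t> slots;  // storage for the pushed values
    std::uint32_t base;                // address just above the first slot
    std::uint32_t esp;                 // current top of the stack

    explicit ToyStack(std::uint32_t topAddress, std::size_t slotCount = 64)
        : slots(slotCount, 0), base(topAddress), esp(topAddress) {}

    void push(std::uint32_t value) {
        assert(base - esp + 4 <= 4 * slots.size());  // room for one more slot
        esp -= 4;                                     // the stack grows downward
        slots[(base - esp) / 4 - 1] = value;
    }

    std::uint32_t pop() {
        assert(esp < base);                           // something must be on the stack
        const std::uint32_t value = slots[(base - esp) / 4 - 1];
        esp += 4;
        return value;
    }
};
```

Starting a `ToyStack` at 1244980 and pushing one value leaves `esp` at 1244976, matching the 4-byte drop in the program's output. Popping the value brings `esp` back to 1244980.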

Now there is a question. When we pass parameters to a function, who is responsible for restoring the stack pointer: the function itself or its caller? In fact both are possible, and this is the difference between the standard calling convention and the C calling convention. Look at the statement that follows the function call:

    push 10                 ; 0000000aH
    push 5
    call _fun
    add esp, 8

Two parameters are passed to the function here, so the stack pointer has decreased by 8 bytes once both have been pushed. In this program, restoring the stack pointer is the responsibility of the caller. This is called the C calling convention. With this convention you can pass a variable number of arguments, because the caller knows how many parameters it passed to the function and can therefore restore the stack pointer.

However, if you choose the standard calling convention, cleaning up the stack is the job of the called function. In this case a variable number of arguments cannot be passed to the function, because the function has no way of knowing how many parameters were passed to it and so cannot restore the stack pointer correctly.
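The difference between the two conventions can be sketched as simple arithmetic on the stack pointer. This is only a model under stated assumptions: `simulateCall`, `CallTrace` and `Convention` are hypothetical names, every argument is assumed to occupy 4 bytes, and no real code is called.

```cpp
#include <cassert>
#include <cstdint>

enum class Convention { Cdecl, Stdcall };

struct CallTrace {
    std::uint32_t espAfterRet;      // ESP right after the callee's ret
    std::uint32_t espAfterCleanup;  // ESP once the caller has finished too
};

// Models how ESP moves during one call with `argCount` 4-byte arguments.
CallTrace simulateCall(std::uint32_t esp, int argCount, Convention convention) {
    assert(argCount >= 0);
    const std::uint32_t argBytes = 4u * static_cast<std::uint32_t>(argCount);

    esp -= argBytes;  // caller: one 4-byte push per argument
    esp -= 4;         // call: pushes the return address

    esp += 4;         // ret: pops the return address
    if (convention == Convention::Stdcall) {
        esp += argBytes;  // "ret N": the callee also discards its arguments
    }
    const std::uint32_t espAfterRet = esp;

    if (convention == Convention::Cdecl) {
        esp += argBytes;  // "add esp, N": the caller discards the arguments
    }
    return CallTrace{espAfterRet, esp};
}
```

For two arguments and a starting value of 1000, the C convention gives 992 right after the return and 1000 only after the caller's `add esp, 8`, while the standard convention is already back at 1000 when the callee returns.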

Look at the program below to observe the behavior of the standard calling convention.

Program 60.

    extern "C" void _stdcall fun(int, int) {}

    int main() {
        fun(5, 10);
        return 0;
    }

Now let's take a look at the call of the function.

    push 10                 ; 0000000aH
    push 5
    call _fun@8

Here, the @ in the function name indicates the standard calling convention, and 8 is the number of bytes pushed onto the stack. The number of parameters can therefore be found by dividing this number by 4.
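The divide-by-4 rule can be written down as a small helper. `stdcallParamCount` is a hypothetical function for illustration, and it assumes every parameter occupies exactly 4 bytes, which holds for the int parameters used here but not for larger types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: reads the byte count after the last '@' in a name
// such as "_fun@8" and converts it to a parameter count, assuming that
// each parameter occupies 4 bytes. Returns -1 if the name has no
// "@<digits>" suffix.
int stdcallParamCount(const std::string& decorated) {
    const std::size_t at = decorated.rfind('@');
    if (at == std::string::npos || at + 1 >= decorated.size()) {
        return -1;
    }
    int bytes = 0;
    for (std::size_t i = at + 1; i < decorated.size(); ++i) {
        const char c = decorated[i];
        if (c < '0' || c > '9') {
            return -1;
        }
        bytes = bytes * 10 + (c - '0');
    }
    return bytes / 4;
}
```

For example, `stdcallParamCount("_fun@8")` evaluates to 2, while a name without the suffix, such as `_fun`, gives -1.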

The following is the code for our do-nothing function:

    push ebp
    mov ebp, esp
    pop ebp
    ret 8

This function restores the stack pointer itself before returning, by means of the "ret 8" instruction.

Now let's explore the code the compiler generates for us. The compiler inserts this code to create a stack frame, so that parameters and local variables can be accessed in a standard way. A stack frame is a region reserved for a function to hold information about its parameters, local variables, and return address. A stack frame is usually created when a function is called and destroyed when the function returns. On the x86 architecture, the EBP register is used to store the address of the stack frame, and is sometimes called the stack pointer. (Translator's note: this article calls both ESP and EBP the "stack pointer". Strictly speaking, ESP is the stack pointer register, which holds the address of the top of the stack, while EBP is the base pointer register, which serves as a base address that is combined with an offset to access information on the stack.)

Thus, the compiler first saves the address of the previous stack frame, and then creates a new stack frame using the value of ESP. The previous stack frame is restored before the function returns.

Let's look at what a stack frame contains. All parameters are stored at addresses above EBP, and all local variables are stored at addresses below EBP.

The address of the previous stack frame is saved at EBP, and the return address of the function is saved at EBP + 4, which is why the first parameter is found at EBP + 8. Now look at the example below, which has two parameters and three local variables.

Program 61.

    extern "C" void fun(int a, int b) {
        int x = a;
        int y = b;
        int z = x + y;
        return;
    }

    int main() {
        fun(5, 10);
        return 0;
    }

Now let's look at the code the compiler generates for the function.

    push ebp
    mov ebp, esp
    sub esp, 12                     ; 0000000cH

    ; int x = a;
    mov eax, DWORD PTR _a$[ebp]
    mov DWORD PTR _x$[ebp], eax

    ; int y = b;
    mov ecx, DWORD PTR _b$[ebp]
    mov DWORD PTR _y$[ebp], ecx

    ; int z = x + y;
    mov edx, DWORD PTR _x$[ebp]
    add edx, DWORD PTR _y$[ebp]
    mov DWORD PTR _z$[ebp], edx

    mov esp, ebp
    pop ebp
    ret 0

Now let's see what _x, _y and the others are. They are defined just above the function definition:

    _a$ = 8
    _b$ = 12
    _x$ = -4
    _y$ = -8
    _z$ = -12

This means you can read the code like this:

    ; int x = a;
    mov eax, DWORD PTR [ebp + 8]
    mov DWORD PTR [ebp - 4], eax

    ; int y = b;
    mov ecx, DWORD PTR [ebp + 12]
    mov DWORD PTR [ebp - 8], ecx

    ; int z = x + y;
    mov edx, DWORD PTR [ebp - 4]
    add edx, DWORD PTR [ebp - 8]
    mov DWORD PTR [ebp - 12], edx

This means that the addresses of parameters a and b are EBP + 8 and EBP + 12, respectively, and that the values of x, y, and z are stored at EBP - 4, EBP - 8, and EBP - 12, respectively.
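To check these offsets by hand, the listing can be replayed on a toy memory. This is a sketch under assumptions: `Memory`, `Frame` and `replayFun` are hypothetical names, the starting addresses are arbitrary, and the return address is just a placeholder value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy memory: byte address -> 32-bit value stored at that address.
using Memory = std::map<std::uint32_t, std::int32_t>;

struct Frame {
    Memory memory;      // everything written to the toy stack
    std::uint32_t ebp;  // frame pointer set up by the prolog
};

// Replays the call fun(a, b) and the generated code of fun step by step.
Frame replayFun(std::int32_t a, std::int32_t b) {
    Frame frame{};
    Memory& mem = frame.memory;
    std::uint32_t esp = 0x1000;              // arbitrary starting ESP (assumption)
    const std::uint32_t callerEbp = 0x2000;  // arbitrary caller EBP (assumption)

    esp -= 4; mem[esp] = b;                  // push b (rightmost parameter first)
    esp -= 4; mem[esp] = a;                  // push a
    esp -= 4; mem[esp] = 0;                  // call: placeholder return address

    esp -= 4; mem[esp] = static_cast<std::int32_t>(callerEbp);  // push ebp
    frame.ebp = esp;                         // mov ebp, esp
    esp -= 12;                               // sub esp, 12 (room for x, y, z)

    const std::uint32_t ebp = frame.ebp;
    mem[ebp - 4] = mem[ebp + 8];                  // int x = a;
    mem[ebp - 8] = mem[ebp + 12];                 // int y = b;
    mem[ebp - 12] = mem[ebp - 4] + mem[ebp - 8];  // int z = x + y;
    return frame;
}
```

After `replayFun(5, 10)`, the toy memory holds 5 at `ebp + 8`, 10 at `ebp + 12`, and 15 at `ebp - 12`. The slot at `ebp` holds the caller's saved EBP and the slot at `ebp + 4` holds the return address placeholder.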

Armed with this knowledge, let's play with function parameters. Look at the following simple program:

Program 62.

    #include <cstdio>

    extern "C" int fun(int a, int b) {
        return a + b;
    }

    int main() {
        printf("%d\n", fun(4, 5));
        return 0;
    }

Just as we expect, the output of the program is 9. Now let us make a small change to the program.

Program 63.

    #include <cstdio>

    extern "C" int fun(int a, int b) {
        _asm mov dword ptr [ebp + 12], 15
        _asm mov dword ptr [ebp + 8], 14
        return a + b;
    }

    int main() {
        printf("%d\n", fun(4, 5));
        return 0;
    }

The output of the program is 29. We know the addresses of the parameters, and we changed the values stored there inside the function. So when the two variables are added, it is the new values 15 and 14 that are summed.
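The inline assembly above only builds with a 32-bit Microsoft compiler. As a rough portable analogue, the sketch below writes through the addresses of the parameters instead of through hard-coded EBP offsets. The function name `funOverwritten` is hypothetical, and the pointers merely stand in for [ebp + 8] and [ebp + 12]. It does not depend on any particular stack layout.

```cpp
#include <cassert>

// Portable stand-in for Program 63: the parameters are overwritten through
// their addresses before they are added.
int funOverwritten(int a, int b) {
    int* addressOfA = &a;  // plays the role of [ebp + 8] in the listing
    int* addressOfB = &b;  // plays the role of [ebp + 12] in the listing
    *addressOfB = 15;
    *addressOfA = 14;
    return a + b;          // adds the overwritten values
}
```

Calling `funOverwritten(4, 5)` returns 29, and because the parameters are the callee's own copies, variables passed in by the caller keep their original values.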

Functions in VC can have the naked attribute. If you declare a function as naked, the compiler will not generate prolog and epilog code for it. So what are prolog and epilog code? "Prolog" is an English word meaning "opening". It is also the name of a programming language used for AI, but that language has nothing to do with the prolog code generated by the compiler. Prolog code is generated automatically by the compiler and inserted at the beginning of a function to set up the stack frame. Look at the assembly language code generated for program 61. At the beginning of the function, the compiler automatically inserts the following code to set up the stack frame.

    push ebp
    mov ebp, esp
    sub esp, 12                     ; 0000000cH

This code is called the prolog code. Similarly, the code inserted at the end of the function is called the epilog code. In program 61, the epilog code generated by the compiler is:

    mov esp, ebp
    pop ebp
    ret 0

Now let's look at a function with the naked attribute.

Program 64.

    extern "C" void _declspec(naked) fun() {
        _asm ret
    }

    int main() {
        fun();
        return 0;
    }

The code the compiler generates for fun looks similar to this:

    _asm ret

This means that the function contains no prolog code and no epilog code. Naked functions do come with some rules. You cannot declare an automatic variable in a naked function, because that would require the compiler to generate code, and the compiler generates no code of its own in a naked function. In fact, you even have to write the ret statement yourself, otherwise the program will crash. You cannot even write a return statement in a naked function. Why? Because when you return something from a function, the compiler puts its value in the EAX register, which means the compiler would have to generate code for your return statement. Let's understand how function return values work with the following simple program.

Program 64.

    #include <cstdio>

    extern "C" int sum(int a, int b) {
        return a + b;
    }

    int main() {
        int iRetVal;
        sum(3, 7);
        _asm mov iRetVal, eax
        printf("%d\n", iRetVal);
        return 0;
    }

The output of the program is 10. Here we did not use the return value of the function directly. Instead, we copied the value of EAX after the function call completed. Now let's write our own naked function, which has no compiler-generated prolog and epilog code and returns the sum of two variables.

Program 65.

    #include <cstdio>

    extern "C" int _declspec(naked) sum(int a, int b) {
        // prolog code
        _asm push ebp
        _asm mov ebp, esp

        // code to add the variables and return the result
        _asm mov eax, dword ptr [ebp + 8]
        _asm add eax, dword ptr [ebp + 12]

        // epilog code
        _asm pop ebp
        _asm ret
    }

    int main() {
        int iRetVal;
        sum(3, 7);
        _asm mov iRetVal, eax
        printf("%d\n", iRetVal);
        return 0;
    }

The output of the program is 10, which is the sum of the two arguments 3 and 7.

This attribute is used in ATLBASE.H to implement a member of the _QIThunk structure. That structure is used to debug the reference counts of an ATL program when _ATL_DEBUG_INTERFACES is defined.

I hope to explore some other secrets of ATL in the next article.
