Disassembly analysis of C++ virtual function calls
Author: Ruan Jianhui
How does a virtual function call achieve its "virtual" behaviour? Virtual functions are how C++ expresses polymorphism, so their mechanism probably interests many readers. Most textbooks, when they reach this powerful C++ feature, teach how and when to use it but do not explore how virtual functions are actually implemented. (Of course, each compiler vendor may implement virtual functions in its own way, which you could call the compilers' own "polymorphism" with respect to virtual functions :). C++ is a compiled language, so the end result of compilation is a sequence of machine instructions (unlike the .NET CLR). Today I will lift the veil and look at how virtual functions are implemented at the assembly level, so that you know not only that they work but also why. (Test environment for this article: PC, Windows XP Pro, Visual C++ 6.0. The program output and the compiler strategies described apply only to the VC6.0 compiler.)
First, look at a simple piece of code:
Code segment:
Line01: #include <stdio.h>
Line02:
Line03: class Base {
Line04: public:
Line05:     void __stdcall Output() {
Line06:         printf("Class Base\n");
Line07:     }
Line08: };
Line09:
Line10: class Derive : public Base {
Line11: public:
Line12:     void __stdcall Output() {
Line13:         printf("Class Derive\n");
Line14:     }
Line15: };
Line16:
Line17: void Test(Base *p) {
Line18:     p->Output();
Line19: }
Line20:
Line21: int __cdecl main(int argc, char *argv[]) {
Line22:     Derive obj;
Line23:     Test(&obj);
Line24:     return 0;
Line25: }
The result of the program will be:
Class Base
Then change the Output function declaration of the Base class to:
virtual void __stdcall Output() {
The program then prints:
Class Derive
This shows that the pointer p in the Test function is recognized as pointing to a Derive object, and that Derive's Output function is called correctly. How does the compiler do this? Let's look at the assembly code generated without and with the "virtual" keyword and see where the difference lies.
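Before going down to the assembly, the two behaviours can be reproduced side by side. This is only a minimal sketch for a C++11 compiler, not the VC6 test program: the names PlainBase, PlainDerive, VirtBase, VirtDerive, TestPlain, TestVirt and CheckBinding are made up for the illustration, and the functions return strings instead of printing so the result is easy to check.

```cpp
#include <cassert>
#include <string>

// Hypothetical names: the Plain* pair has no virtual functions,
// the Virt* pair declares Output as virtual.
struct PlainBase {
    std::string Output() const { return "Class Base"; }
};
struct PlainDerive : PlainBase {
    std::string Output() const { return "Class Derive"; }  // hides the base version
};

struct VirtBase {
    virtual ~VirtBase() = default;
    virtual std::string Output() const { return "Class Base"; }
};
struct VirtDerive : VirtBase {
    std::string Output() const override { return "Class Derive"; }
};

// Both helpers only see a pointer to the base type, like Test(Base *p).
std::string TestPlain(const PlainBase* p) { return p->Output(); }  // chosen from the static type
std::string TestVirt(const VirtBase* p)   { return p->Output(); }  // chosen from the object's real type

// Small self-check that can be called from main().
void CheckBinding() {
    PlainDerive plain;
    VirtDerive virt;
    assert(TestPlain(&plain) == "Class Base");
    assert(TestVirt(&virt) == "Class Derive");
}
```

Calling CheckBinding() from main passes silently: the same pointer-to-base call site picks a different function depending only on the virtual keyword.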
Before explaining the assembly code below, here is a quick primer on assembly. If you are already fluent in it, skip ahead to the first section. The Output function above is declared with the __stdcall calling convention: when the function is called, its arguments are pushed on the stack from right to left, and the callee restores the stack pointer ESP before returning. Other calling conventions are described where they come up in the text. The C++ this pointer is simply the starting address of an object. While a function executes, its parameters and local variables are laid out on the stack as shown below: (Figure 1)
As Figure 1 shows, parameters and local variables are addressed in assembly as EBP plus or minus an offset. You may wonder: my parameters or local variables are sometimes a large structure and sometimes just a char, so why is the offset from EBP always a multiple of 4? On a 32-bit machine, transferring 4 bytes (32 bits) at a time gives the best bus efficiency. A parameter or local variable larger than 4 bytes is split into 4-byte transfers; one smaller than 4 bytes still takes up 4 bytes. The assembly instructions used below are briefly explained here:
1. mov destination, source
Copies the value of source into destination. Note that the form "[xxx]" is used often below. "xxx" is a register, and "[xxx]" denotes the contents of the memory cell whose address is the value of "xxx". Think of "xxx" as a key: you use it to open a drawer, then either hand the contents of the drawer to someone or put something you were given into the drawer.
2. lea destination, [source]
Loads the address computed from source into destination. Note that this instruction hands over the address itself, not the contents of the memory cell at that address. It is like handing the key itself to someone else.
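For readers more at home in C++ than in assembly, the key-and-drawer difference between the two instructions can be sketched roughly as follows. This is only an analogy under my own naming (LoadThrough, AddressOf and DemoMovVsLea are hypothetical helpers), not what the compiler literally emits for these functions.

```cpp
#include <cassert>

// Rough analogue of "mov eax, dword ptr [p]":
// use the key, open the drawer, take out the contents.
int LoadThrough(const int* p) { return *p; }

// Rough analogue of "lea eax, [base + index*sizeof(int)]":
// only compute the address (hand over the key), no memory is read.
const int* AddressOf(const int* base, int index) { return base + index; }

// Small self-check that can be called from main().
void DemoMovVsLea() {
    int drawers[3] = {10, 20, 30};
    const int* key = AddressOf(drawers, 2);  // address arithmetic only
    assert(key == &drawers[2]);
    assert(LoadThrough(key) == 30);          // the actual memory read
}
```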
To view the disassembly while debugging, click the rightmost button shown in Figure 2.
(Figure 2)
The other instructions should be clear from their names. For their exact meaning, consult an assembly manual. :)
1. Without the virtual keyword:
(1) Disassembly of the main function:
Line22: Derive obj;
Line23: Test(&obj);
// If you set a breakpoint on line 22, VC tells you when debugging starts that this is an invalid line and
// moves the breakpoint automatically to the next line (line 23). No constructor is defined for Derive or its
// base class, and the compiler does not generate a default constructor, so this line of C++ code produces
// no assembly instructions that can actually be executed.
004010D8 lea eax,[ebp-4]
// Put the address of the object obj into the EAX register.
004010DB push eax
// Push the argument onto the stack.
004010DC call @ILT+5(Test) (0040100A)
// Call the Test function.
// @ILT+5 is the address of the jmp instruction that jumps to the Test function. Every function call inside the
// module takes the form @ILT+5*n, where n means the nth function in the module. ILT stands for
// Incremental Link Table: the program calls a function by jumping through this table to the function's
// code.
004010E1 add esp,4
// Adjust the stack pointer. Test was just called with the __cdecl convention, so the caller restores the stack pointer.
(2) Disassembly of the Test function:
Line18: p->Output();
00401048 mov eax,dword ptr [ebp+8]
// [ebp+8] is the leftmost parameter of the Test function, i.e. the EAX value that main pushed above.
// This puts the parameter's value (the address of the object obj in main) into the EAX register.
// Note: for member functions of a C++ class, the default calling convention is "__thiscall". It is not a
// keyword that the programmer can specify. It means that arguments are pushed from right to left and that the
// ECX register holds the this pointer. Our Output function is declared "__stdcall", so ECX is not
// used for the this pointer, and an extra instruction pushes the this pointer onto the stack, as follows:
0040104B push eax
// Push EAX, i.e. the this pointer needed to call the Output function.
0040104C call @ILT+0(Base::Output) (00401005)
// Call the member function. No surprise here: it is the Output function of the Base class that gets called.
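What the listing above amounts to can be written out in C++ by treating each member function as a free function that takes the object address as an explicit first parameter. This is a simplified sketch under my own naming (Base_Output, Derive_Output and CheckStaticCall are hypothetical), not the code the compiler generates: the point is only that the call target inside Test is fixed at compile time from the parameter's declared type.

```cpp
#include <cassert>
#include <string>

struct Base {};
struct Derive : Base {};

// Hypothetical free-function view of the two non-virtual Output functions:
// the this pointer becomes an ordinary first parameter ("push eax").
std::string Base_Output(const Base* /*self*/)     { return "Class Base"; }
std::string Derive_Output(const Derive* /*self*/) { return "Class Derive"; }

// Test only knows the static type Base*, so the target is hard-coded,
// just like "call @ILT+0(Base::Output)" in the listing.
std::string Test(const Base* p) { return Base_Output(p); }

// Small self-check that can be called from main().
void CheckStaticCall() {
    Derive obj;
    assert(Test(&obj) == "Class Base");            // real type is ignored
    assert(Derive_Output(&obj) == "Class Derive"); // reachable only when named directly
}
```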
2. With the virtual keyword:
(1) Disassembly of the main function:
Line22: Derive obj;
// With the virtual keyword present, a breakpoint on line 22 is hit when debugging. We declared no
// constructor for the Derive class or its base class, so the compiler must have generated one
// automatically. Let's see what this compiler-generated constructor does.
00401088 lea ecx,[ebp-4]
// Put the address of the object obj into the ECX register. Why? See the note on __thiscall above.
0040108B call @ILT+25(Derive::Derive) (0040101E)
// The compiler generated a constructor for us. What does it do? Hold on, more on this later; mark it first: @_@1. Loading obj's address into ECX above prepares for this call.
Line23: Test(&obj);
// This call is the same as in the version without the virtual keyword above:
00401090 lea eax,[ebp-4]
00401093 push eax
00401094 call @ILT+5(Test) (0040100A)
004010C9 add esp,4
(2) Disassembly of the Test function (very different from the version without the virtual keyword above):
Line18: p->Output();
00401048 mov eax,dword ptr [ebp+8]
// Put the value of Test's first parameter into the EAX register. As you already know, this is the address of obj.
0040104B mov ecx,dword ptr [eax]
// Take the contents of the memory at the address held in EAX. Do you know what this is? Hold on again; mark it first: @_@2
0040104D mov esi,esp
// Save ESP for the stack pointer check below.
0040104F mov edx,dword ptr [ebp+8]
// Load obj's address into the EDX register as well. As you know by now, this is the this pointer, in preparation for calling the member function.
00401052 push edx
// Push the object pointer (the this pointer) as the argument for the member function call.
00401053 call dword ptr [ecx]
// This calls the class's member function. But which function exactly? Hold on, more on this later; mark it first: @_@3
00401055 cmp esi,esp
// Compare ESP with the saved value. If they differ, the __chkesp function below breaks into the debugger.
00401057 call __chkesp (00401110)
// Check the ESP pointer to catch possible stack errors (on an error, execution breaks into the debugger).
For a C++ class to exhibit polymorphism (a compiler generally decides this by whether the virtual keyword appears in the class or its base classes), the class gets a virtual function table (virtual table), and every instance (object) gets a virtual pointer (hereafter VPTR) that points to the class's virtual function table, as shown in Figure 3:
(Strictly speaking, VFuncAddr in the figure below stands for the memory cell that stores a virtual function's address. More precisely, the stored value is the address of the jmp instruction that jumps to the corresponding function.)
(Figure 3)
First consider the object obj of class Derive in our main function and its memory layout. Since it has no data members, its size is 4 bytes: just one VPTR. The address of obj is therefore the address of the VPTR. (I chose an example without data members because compilers differ in where they place the VPTR in the object's memory layout: generally either at the head of the object, as Microsoft's compiler does, or at the tail. Either way, in this example "the address of obj is the address of the VPTR" holds.)
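The layout described here can be modelled by hand. In the sketch below, ModelObject, VFunc, kDeriveVftable, MakeDerive and RealObjectSize are names I made up; the model only imitates what the compiler does behind the scenes. The RealBase/RealDerive pair is a genuine polymorphic class for comparison, and its exact layout remains up to the compiler, so the code only reports its size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct ModelObject;                                 // hypothetical stand-in for a Derive object
using VFunc = const char* (*)(ModelObject* self);   // one slot of the table

struct ModelObject {
    const VFunc* vptr;                              // the only member: points to the class's table
};

const char* Derive_Output(ModelObject*) { return "Class Derive"; }

// One table per class, shared by every object of that class.
const VFunc kDeriveVftable[] = { &Derive_Output };

ModelObject MakeDerive() { return ModelObject{ kDeriveVftable }; }

// In the model, the object's address is the vptr's address and the
// object is exactly one pointer wide (4 bytes on 32-bit x86).
static_assert(offsetof(ModelObject, vptr) == 0, "vptr sits at the head");
static_assert(sizeof(ModelObject) == sizeof(void*), "no data members, only a vptr");

// A real polymorphic class for comparison; layout is compiler-defined.
struct RealBase { virtual ~RealBase() = default; virtual void Output() {} };
struct RealDerive : RealBase { void Output() override {} };

std::size_t RealObjectSize() { return sizeof(RealDerive); }
```

On the mainstream compilers I know of, RealObjectSize() returns the size of one pointer, matching the model, but that is an observation about those implementations rather than something the language guarantees.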
An object's VPTR is not set by the programmer; the compiler emits the code that sets it at compile time. Now let me explain the marks @_@1 to @_@3 one by one.
@_@1: This explains why the compiler generates a default constructor for us and what it is for. Again, let the disassembly give the answer:
Here is the core assembly excerpt from the compiler-generated Derive constructor:
004010D9 pop ecx
// The compiler's default calling convention for constructors is __thiscall, so the ECX register, as noted
// earlier, holds the this pointer, i.e. the address of the object obj, which here is also the address of the VPTR.
// I found that even if you declare a constructor as __stdcall, its disassembly is the same as for the default
// __thiscall. In this respect constructors differ from ordinary member functions.
004010DA mov dword ptr [ebp-4],ecx
// For a class member function called via __thiscall, the first local variable is always the this pointer; EBP-4 is
// the address of the function's first local variable.
004010DD mov ecx,dword ptr [ebp-4]
// The base class constructor is about to be called, so the this pointer must be placed in the ECX register.
004010E0 call @ILT+30(Base::Base) (00401023)
// Run the base class constructor.
004010E5 mov eax,dword ptr [ebp-4]
// Put the this pointer into the EAX register.
004010E8 mov dword ptr [eax],offset Derive::`vftable' (0042201C)
// Store the start address of the virtual function table at the address the this pointer points to, i.e. initialize the VPTR.
As you can see, the compiler-generated default constructor exists to initialize the VPTR. You can probably guess what the Base constructor does. As expected, it initializes the VPTR too:
0040D769 pop ecx
0040D76A mov dword ptr [ebp-4],ecx
0040D76D mov eax,dword ptr [ebp-4]
0040D770 mov dword ptr [eax],offset Base::`vftable' (00422020)
No further explanation is needed: like the Derive constructor, it initializes the VPTR. If you declare and define a constructor yourself, this VPTR initialization code runs first, followed by your code. (If your constructor has a member initializer list, the assignments in the initializer list run first, then the VPTR of this class is initialized, and then the body of the constructor runs.)
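One visible consequence of this sequence (the Base constructor stores Base's table first, then the Derive constructor overwrites it with Derive's) can be checked in standard C++ without looking at any assembly: a virtual call made inside the Base constructor reaches Base's version. The sketch below assumes a C++11 compiler; the member names seen_in_base_ctor and seen_in_derive_ctor and the helper CheckConstructionOrder are made up for the illustration.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen_in_base_ctor;
    // While this body runs, only the Base part exists, so the virtual
    // call resolves to Base::Output.
    Base() { seen_in_base_ctor = Output(); }
    virtual ~Base() = default;
    virtual std::string Output() const { return "Class Base"; }
};

struct Derive : Base {
    std::string seen_in_derive_ctor;
    // By now the object is a Derive, so the same call resolves to Derive::Output.
    Derive() { seen_in_derive_ctor = Output(); }
    std::string Output() const override { return "Class Derive"; }
};

// Small self-check that can be called from main().
void CheckConstructionOrder() {
    Derive obj;
    const Base* p = &obj;
    assert(obj.seen_in_base_ctor == "Class Base");
    assert(obj.seen_in_derive_ctor == "Class Derive");
    assert(p->Output() == "Class Derive");  // after construction: the final table
}
```

The language rules only specify this behaviour; rewriting the VPTR in each constructor is the way implementations such as the one disassembled above typically achieve it.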
@_@2 and @_@3:
00401048 mov eax,dword ptr [ebp+8]
0040104B mov ecx,dword ptr [eax]
The first instruction puts obj's address into EAX. As you now know, the first four bytes at obj's address hold the VPTR. The value of the VPTR is the start address of the vftable, and the memory cell at the vftable's address holds the virtual function's address. The figure below makes this clearer (see Figure 4, which shows addresses and the contents stored at them. Note that the addresses in the vftable on the right are not the real function addresses but the addresses of the jmp instructions that jump to the functions. For example, 0x0040ef12 is not the address of the real Class::XXX function but of the jmp instruction that jumps to Class::XXX). So ECX now holds the address of the memory cell that stores the address of Derive::Output, and the call follows:
0040104F mov edx,dword ptr [ebp+8]
00401052 push edx
00401053 call dword ptr [ecx]
This jumps to the corresponding function and executes it.
(If there are several virtual functions and the nth one is called, the call instruction above becomes: call dword ptr [ecx+4*(n-1)])
In other words: I take a key, open a drawer and take out what is inside, but that turns out to be another key, which I must use to open another drawer and take out the real thing. ^_^
(Figure 4)
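The two-key lookup and the slot offset for the nth virtual function can be imitated with a hand-built table. This is a sketch under my own assumptions: Object, Slot, kVftable, CallNth and SlotByteOffset are hypothetical names, and the three Derive_* functions stand in for three virtual functions of one class. The comments map each C++ step to the instruction it corresponds to in the listing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct Object;
using Slot = std::string (*)(Object* self);   // one table entry

struct Object { const Slot* vptr; };          // hand-built object: just a vptr

std::string Derive_First(Object*)  { return "first"; }
std::string Derive_Second(Object*) { return "second"; }
std::string Derive_Third(Object*)  { return "third"; }

const Slot kVftable[] = { &Derive_First, &Derive_Second, &Derive_Third };

// Calls the nth virtual function (n is 1-based, as in the text).
std::string CallNth(Object* p, std::size_t n) {
    const Slot* table = p->vptr;   // mov ecx,dword ptr [eax]   : first key, the table address
    Slot target = table[n - 1];    // [ecx+4*(n-1)] on 32-bit   : second key, the function address
    return target(p);              // push this, then call
}

// Byte offset of the nth slot; equals 4*(n-1) when a slot is 4 bytes wide.
std::size_t SlotByteOffset(std::size_t n) { return sizeof(Slot) * (n - 1); }
```

With an object set up as `Object obj{ kVftable };`, CallNth(&obj, 1) runs the first function and CallNth(&obj, 3) the third. On a 64-bit target each slot is 8 bytes wide, which is why the sketch uses sizeof(Slot) rather than the constant 4.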
Now that you know the ins and outs and have seen how the assembly calls the right virtual function, how would you do the same thing in C/C++? You probably have an idea by now. Here is how I did it (below, a member function of a C++ class is called through a function pointer. To treat a C++ member function as a C function you need to know this: the C function takes one more parameter than the C++ member function, and that extra first parameter must be the address of the class object):
Declare the Output function of the Base class as virtual, then change the main function to:
int __cdecl main(int argc, char *argv[]) {
    Derive obj;                                  // still just one object
    typedef void (__stdcall *PFUNC)(void *);     // declare the function pointer type
    void *pThis = &obj;                          // take the object's address as the this pointer;
                                                 // in Figure 4 this assigns 0x0012FF24 to pThis
    PFUNC pFunc = (PFUNC)*(unsigned int *)pThis; // read the contents at that address; in Figure 4
                                                 // this is the contents of address 0x0012ff24,
                                                 // i.e. 0x00400112
    pFunc = (PFUNC)*(unsigned int *)pFunc;       // read the contents at that address; in Figure 4
                                                 // this is the contents of address 0x00400112,
                                                 // i.e. 0x0040ef12, the function address
    pFunc(pThis);                                // call the function: runs Derive::Output
    return 0;
}
Run it and look at the result. The function was called without going through the object or a pointer to it. :)
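The program above depends on 32-bit VC6 details (a 4-byte unsigned int holding a pointer, the vptr at the head of the object, __stdcall). For readers on other compilers, the same three steps can be followed on a hand-built object without relying on any compiler's real table layout. This is a sketch under my own assumptions, not the author's method: Object, PFUNC, kDeriveVftable, Derive_Output and CallThroughRawAddress are made-up names, and std::memcpy replaces the integer casts so that the reads are well defined at any pointer width.

```cpp
#include <cassert>
#include <cstring>
#include <string>

using PFUNC = std::string (*)(void* self);   // "C function" view: this is the first parameter

struct Object { const PFUNC* vptr; };         // hand-built stand-in for a Derive object

std::string Derive_Output(void*) { return "Class Derive"; }
const PFUNC kDeriveVftable[] = { &Derive_Output };

// Starts from nothing but the raw object address, like pThis in the text.
std::string CallThroughRawAddress(void* pThis) {
    const PFUNC* table = nullptr;
    std::memcpy(&table, pThis, sizeof table);   // first word of the object: the vptr
    PFUNC pFunc = nullptr;
    std::memcpy(&pFunc, table, sizeof pFunc);   // first word of the table: the function address
    return pFunc(pThis);                        // call it, passing the object address as this
}
```

Given `Object obj{ kDeriveVftable };`, the call CallThroughRawAddress(&obj) returns "Class Derive" without ever naming the function at the call site.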
By now you should know how virtual functions work. What is described here is the virtual function implementation of the Microsoft VC 6.0 compiler. The methods and strategies a compiler uses to implement C++ can all be explored through its disassembly. Understanding these low-level details will go a long way toward improving your C/C++ code. I hope this article helps. For any questions or suggestions, please mailto: tigger_211@sina.com.