An In-Depth Study of Virtual Functions and the vtable
Department of Computer Science, National University of Defense Science and Technology
In C++, an object-oriented language, the virtual function is a very important concept. Because it embodies two major features of object-oriented thinking, inheritance and polymorphism, it is used very widely in C++ code. In Microsoft's MFC class library, for example, you will find that many functions carry the virtual keyword, which means they are virtual functions. No wonder some people go so far as to call virtual functions the essence of C++.
So what is a virtual function? Let's start with Microsoft's explanation:
A virtual function is a member function that you expect to be redefined in derived classes. When you refer to a derived class object using a pointer or a reference to the base class, you can call a virtual function for that object and execute the derived class's version of the function.
- Excerpt from MSDN
This definition is not very clear. MSDN also gives an example, but it does not illustrate the point well. Let's write an example of our own:
#include <stdio.h>
#include <conio.h>   // _getch() (Visual C++ / Borland C++)

class Parent
{
public:
    char data[20];
    void function1();
    virtual void function2();   // function2 is a virtual function
} parent;

void Parent::function1()
{
    printf("This is parent, function1\n");
}

void Parent::function2()
{
    printf("This is parent, function2\n");
}

class Child : public Parent
{
public:
    void function1();
    void function2();
} child;

void Child::function1()
{
    printf("This is child, function1\n");
}

void Child::function2()
{
    printf("This is child, function2\n");
}

int main(int argc, char* argv[])
{
    Parent* p;                  // define a base class pointer
    if (_getch() == 'c')        // if a lowercase c is entered
        p = &child;             // point to the derived class object
    else
        p = &parent;            // otherwise point to the base class object
    p->function1();             // compiled as a direct call to the entry address of Parent::function1()
    p->function2();             // note: which function2 gets executed?
    return 0;
}
Compile and run this with any version of Visual C++ or Borland C++ and enter a lowercase c. You get the following result:
This is parent, function1
This is child, function2
Why the first line? Because we call function1() through a pointer of type Parent*. Although the pointer actually points to a Child object, the compiler has no way of knowing that (only at run time can the program tell, from the user's input, which object the pointer refers to), so it can only compile the call as a call to the Parent class's function. That is why we see the first line.
So what about the second line? Notice that function2() is marked with the virtual keyword in the base class, so it is a virtual function. The most important feature of a virtual function is dynamic binding: the program determines at run time which object the pointer refers to and automatically calls the corresponding function. If we enter any character other than c when running the program above, the result is as follows:
This is parent, function1
This is parent, function2
Look at the second line: the result has changed. The program contains only one call to function2(), yet, depending on the user's input, it automatically decides whether to call function2 in the base class or function2 in the derived class. This is what virtual functions do. In MFC, many classes are meant to be inherited from, and some of their member functions must be overridden; the most common example when writing an MFC application is CView::OnDraw(CDC*). Because it is defined as a virtual function (in fact, OnDraw in MFC is not just a virtual function but a pure virtual function), the function that gets called is guaranteed to be the OnDraw written by the user. This shows how important virtual functions are.
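The OnDraw situation can be sketched without MFC. The following is only an illustration: View, MyView, on_draw, paint and render are hypothetical names invented for this sketch, not MFC classes or functions. It assumes a C++11 compiler. The base class declares a pure virtual function, the "framework" code calls it through the base type, and the user's override is the one that runs.

```cpp
#include <string>

// Hypothetical stand-in for a framework base class such as CView.
class View {
public:
    virtual ~View() = default;
    // Pure virtual: View supplies no body, so every concrete subclass must.
    virtual std::string on_draw() const = 0;
    // "Framework" code that only knows about View, yet reaches the override.
    std::string paint() const { return "[" + on_draw() + "]"; }
};

// The class written by the user of the framework.
class MyView : public View {
public:
    std::string on_draw() const override { return "my drawing"; }
};

// Takes the base type by reference; the call to on_draw is bound at run time.
std::string render(const View& v) { return v.paint(); }
```

Calling render on a MyView object yields the string produced by MyView::on_draw, even though render and paint were written without any knowledge of MyView. Trying to create a View object directly would not compile, because View has no on_draw of its own.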
With that basic understanding of virtual functions, consider this question: for a base class pointer to "automatically" choose the right version when a virtual function is called, something must know whether the object it points to is a base class object or a derived class object. How does it know? Some C++ books mention that this dynamic binding mechanism is implemented through a "vtable". So what is a vtable? Microsoft describes it in its COM documentation as follows:
A vtable is a table of function pointers, like the C++ implementation of a class. The pointers in the vtable point to the member functions of the interfaces that an object supports.
- Excerpt from MSDN
Unfortunately, Microsoft does not make things clear this time either. Of course, that document is about COM, whose focus differs from ours.
So what exactly is a vtable? Let's look at the following experiment:
Add the statement printf("%d", sizeof(child)); to the previous example program and run it. Then remove the virtual keyword in front of function2() and run it again. The result: when function2 is declared as a virtual function the output is 24; otherwise it is 20. In other words, if function2 is not a virtual function, an object of the Child class is exactly as large as its member variable, the data array; if function2 is a virtual function, the object is 4 bytes larger. We are using 32-bit Visual C++ 6.0, where 4 bytes is exactly the size of a pointer, or of an integer.
So what are these extra four bytes for?
Open the previous example program in Visual C++, press F9 to set a breakpoint on p->function1() in the main function, and press F5 to start debugging. Enter a lowercase c, and the program stops at our breakpoint. Find the Debug toolbar and click the Disassembly button, as shown in the figure:
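The same measurement can be made without editing the program twice by putting the two variants side by side. This is a minimal sketch; Plain, WithVirtual and virtual_overhead are names made up here. The exact overhead is an assumption about the compiler: it is 4 bytes on the 32-bit compiler used in this article, while a typical 64-bit compiler adds an 8-byte pointer plus alignment padding.

```cpp
#include <cstddef>

// Same data member as in the article, no virtual function.
struct Plain {
    char data[20];
    void function2() {}
};

// Identical, except that function2 is virtual.
struct WithVirtual {
    char data[20];
    virtual void function2() {}
};

// Extra bytes per object that the virtual function costs on this compiler.
std::size_t virtual_overhead() {
    return sizeof(WithVirtual) - sizeof(Plain);
}
```

Printing sizeof(Plain), sizeof(WithVirtual) and virtual_overhead() shows the hidden cost for whatever compiler and target are in use.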
Now we see the disassembled code. As the figure above shows, the code generated for the call to function1 and the code generated for the call to function2 are very different. function1 is not a virtual function, so the call compiles into a single CALL instruction; function2 is a virtual function, and its code is more complicated. Let's analyze it:
45: p->function2();
004012CA MOV EAX, DWORD PTR [EBP-4]
// EAX is our pointer p
004012CD MOV EDX, DWORD PTR [EAX]
// EDX receives the four bytes at the head of the Child object
004012CF MOV ESI, ESP
004012D1 MOV ECX, DWORD PTR [EBP-4]
// probably preparation for the stack check; we can ignore it
004012D4 CALL DWORD PTR [EDX]
// note: the call goes through a function pointer reached from the head of the Child object
004012D6 CMP ESI, ESP
004012D8 CALL __chkesp (004013B0)
The key instruction here is CALL DWORD PTR [EDX], where EDX holds the four bytes at the head of the Child object. As analyzed above, a Child object occupies 24 bytes, of which the member variable takes 20 bytes and 4 bytes were unaccounted for. Judging from this assembly code, those four bytes are most likely a pointer stored at the very start of the Child object and used to locate the function, because the compiler has no idea what our member variable data is for and could not possibly treat any part of data as a function pointer.
So where does this function pointer lead? Press F10 to step to the CALL instruction, then press F11 to step into it:
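The hidden four bytes can also be inspected from C++ rather than from the debugger. The sketch below rests on an assumption that the C++ standard does not guarantee: that the compiler stores its hidden pointer at offset 0 of the object, as Visual C++ does here and as current GCC and Clang do for simple classes like these. first_word is a hypothetical helper written for this illustration, and function2 returns a string here only so that the two versions differ.

```cpp
#include <cstdint>
#include <cstring>

class Parent {
public:
    char data[20];
    virtual const char* function2() { return "parent"; }
};

class Child : public Parent {
public:
    const char* function2() override { return "child"; }
};

// Copies the first pointer-sized bytes of an object into an integer.
// Assumption: the compiler keeps its hidden pointer at offset 0.
template <typename T>
std::uintptr_t first_word(const T& obj) {
    std::uintptr_t word = 0;
    std::memcpy(&word, &obj, sizeof(word));
    return word;
}
```

Under that assumption, two Child objects report the same value, while a Parent object reports a different one. This is consistent with the hidden bytes identifying the class of the object rather than holding per-object data.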
00401032 JMP Parent::function2 (0040BFE0)
00401037 JMP Parent::Parent (004010D0)
→ 0040103C JMP Child::function2 (00401250)
00401041 JMP Child::Child (004011C0)
The cursor stops on the third line, at 0040103C. The JMP instruction there takes us to Child::function2, which produces the result we saw above.
But that is not the whole story. Look at the lines of code around 0040103C: several consecutive lines are all JMP instructions. What kind of program structure is this? Readers with assembly language programming experience may recognize it: it is an entry table that holds jump instructions to several important functions! Now go back to Microsoft's description of the vtable: a vtable is a table of function pointers (like the C++ implementation of a class); the pointers in the vtable point to the member functions (of the interfaces that an object supports). Ignore the words in parentheses and the backbone of the sentence is: a vtable is a table of function pointers that point to member functions. All the evidence indicates that the four lines of code above are the vtable we have been looking for!
Now we should have a rough understanding of how virtual functions work. Each virtual function occupies one slot in the vtable, which holds an instruction that jumps to the function's entry address (strictly speaking, what the vtable stores is an address; the JMP seen above appears to be an extra hop of the kind that Visual C++ debug builds route calls through). When an object that contains virtual functions is created (note: the object itself, not a pointer to it), a pointer to the vtable is added at its head. When a virtual function is called, no matter what type of pointer is used for the call, the entry address is first looked up through the vtable, which achieves dynamic binding, instead of simply jumping to a fixed address as with an ordinary function.
The conclusions above apply only to the Visual C++ 6.0 compiler. Other compilers differ in the details of their implementation, but the approach is largely the same. According to the well-known "Green Corps" magazine, the GNU C++ compiler on Linux puts the pointer to the vtable at the end of the object rather than at the head, and its vtable stores only the entry addresses of the virtual functions, not instructions that jump to them. Space is limited, so we will not go into further details here; we hope interested readers will continue to explore.
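The mechanism summarized above can be imitated by hand, which may make it easier to picture. The following is a rough sketch and not what any particular compiler emits: Object, VTable, make_parent, make_child and call_function2 are all hypothetical names, and a real compiler generates the table and the hidden pointer automatically. It assumes a C++11 compiler.

```cpp
#include <string>

struct Object;  // forward declaration so the table can mention it

// The hand-written "vtable": one slot per virtual function.
struct VTable {
    std::string (*function2)(const Object* self);
};

// The hand-written object: a hidden pointer at the head, then the data.
struct Object {
    const VTable* vptr;
    char data[20];
};

std::string parent_function2(const Object*) { return "This is parent, function2"; }
std::string child_function2(const Object*)  { return "This is child, function2"; }

// One table per class, shared by every object of that class.
const VTable parent_vtable = { parent_function2 };
const VTable child_vtable  = { child_function2 };

// "Constructors": creating an object stores the class's table in its head.
Object make_parent() { Object o{}; o.vptr = &parent_vtable; return o; }
Object make_child()  { Object o{}; o.vptr = &child_vtable;  return o; }

// What p->function2() amounts to: load vptr, load the slot, call through it.
std::string call_function2(const Object* p) {
    return p->vptr->function2(p);
}
```

call_function2 contains no test of the object's type. It simply follows the pointer stored in the object, so the same call site produces the parent's text for an object from make_parent and the child's text for one from make_child.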