Stealing the Beams and Swapping the Pillars: Lifting the Veil on Polymorphism
- From C++ to .NET
Disclaimer: The first half of this article is written purely for beginners. If you already know a little about the C++ object model, you can skip it. The second half draws a simple comparison with the .NET object model. In short, if you are not a beginner, you need not read on; I don't want to waste your time :-)
Text: Liu Weipeng (PONGBA)
Polymorphism is one of the central concepts of object-oriented theory and a major feature of modern programming languages. From an application perspective, polymorphism is what makes it possible to build highly flexible, loosely coupled application architectures. From a conceptual perspective, polymorphism lets the programmer use "some part" of an object's functionality without caring about the object's concrete type. That functionality can be exposed through a base class or through an interface. The latter is the more important: interfaces are a key vehicle for extensibility, and the implementation of interfaces depends on the language's implementation of polymorphism, or simply stands for it.
This article will not spend words repeating how to use polymorphism, because its applications are everywhere and its theory is already well developed. Instead, we look from the perspective of language implementation at what a language does behind the scenes to support polymorphism. Once you know not only what happens but why, you can use the feature with ease.
Maybe when you were learning a language you were puzzled by polymorphism. It is simple to use and matches its original concept closely, but you still want to know what the language (the compiler) does behind the scenes. Why can a derived-class object be used as a base-class object? Why does calling a virtual function through a base-class pointer that points to a derived-class object reach exactly the right function? How is a class laid out internally?
Consider this: suppose the language did not support polymorphism and we had to achieve it ourselves. How could we do it?
A first prototype of polymorphism:
#include <iostream>
using namespace std;

class B
{
public:
    int flag; // for simplicity: 0 means base class, 1 means derived class
    void f() { cout << "in B::f()"; } // non-virtual function
};
class D : public B
{
public:
    void f() { cout << "in D::f()"; } // non-virtual function
};
void call_virtual(B* pb)
{
    if (pb->flag == 0)              // base-class object: call f directly
        pb->f();                    // calls B::f
    else                            // derived object: cast to a derived-class pointer first
        static_cast<D*>(pb)->f();   // calls D::f (the cast must be applied before ->)
}
In this way, the function that matches the object's concrete type gets called. However, this primitive scheme has several shortcomings. The code that dispatches the "virtual function" has to be written by hand. It is inelegant. It does not scale: as the inheritance hierarchy grows, this pile of code becomes bloated. And it is not closed to modification: whenever a new derived class is added, the code that makes the "virtual function" call must change, and if that call site cannot be modified (for example, it lives in a library function), a user-added derived class cannot work with the library function. As a result, this scheme is not general.
Still, this scheme reveals part of the nature of the problem: the flag data member identifies the concrete type of the object so that the caller can decide which function to call. But can the correct function be called without "knowing" the concrete type of the object? Yes. The improved scheme is as follows:
#include <iostream>
using namespace std;

class B
{
public:
    void (*f)(); // function pointer; a derived-class object changes its behavior by reassigning it
};
class D : public B
{};
void call_virtual(B* pb)
{
    (*(pb->f))(); // indirect call through f
}
void B_Mem()
{
    cout << "I am B";
}
void D_Mem()
{
    cout << "I am D";
}
int main()
{
    B b;
    b.f = &B_Mem;     // B_Mem plays the role of B's "virtual function"
    D d;
    d.f = &D_Mem;     // override B's "virtual function" with D_Mem
    call_virtual(&b); // outputs "I am B"
    call_virtual(&d); // outputs "I am D"
}
In this improved example, a derived-class object obtains its specific behavior by changing the function pointer f. More importantly, call_virtual no longer needs an ugly if-else statement to determine the object's concrete type; it simply calls the "virtual function" through the pointer. If a derived class needs different behavior, it points the function pointer at its own function. By adding a layer of indirection, this trick of "stealing the beams and swapping the pillars" achieves "virtual" behavior without anyone noticing.
However, this trick still has shortcomings: it must be implemented by hand, it scales poorly, it is not transparent to the user, and so on. Even so, its idea is close to how modern compilers implement polymorphism.
Extend the single function pointer in the example above into a hidden array of pointers, the virtual function table (vtbl), and you have the polymorphism that C++ offers today. Each virtual function occupies one entry in the table. If a derived class overrides a virtual function, the corresponding entry is changed to point to the derived class's version. The compiler does all of this work. As in the example above, the user can trigger an object's specific behavior (that is, call the member function that depends on the object's concrete type) without knowing the object's exact type. The virtual function table is completely transparent to the user, who only needs the virtual keyword to get powerful polymorphism.
If a C++ class has a virtual function, the class has a virtual function table (vtbl), and each object of the class contains (generally at its head) an implicit pointer to that table (vptr). The schematic diagrams below show how virtual functions are implemented:
[Figure: schematic of the objects, their vptr, and the virtual function tables]
Now consider the following code:
void f(B* pb)
{
    pb->f1();
}
The code generated by the compiler looks roughly like this (shown as pseudocode):
void f(B* pb)
{
    DWORD* __vptr = ((DWORD**)pb)[0];                     // fetch the virtual function table pointer
    void (B::*midd_pf)() = __vptr[offsetof_virtual_pf1];  // fetch the matching virtual function pointer from the table
    (pb->*midd_pf)();                                     // call the virtual function
}
In this way, if pb points to a D object, the pointer obtained is the one to D::f1 (see the second diagram above). If pb points to a B object, the virtual function table is found through the vptr inside the B object, and the pointer obtained is the one to B::f1.
By now the polymorphism mechanism of C++ should be basically clear. What remains is the layout of virtual function tables under multiple inheritance, which is not covered here. For the subtle details, see "Inside the C++ Object Model" by Lippman (Chinese edition titled "In-Depth C++ Object Model", translated by Hou Jie).
There is one more detail about the C++ virtual call mechanism: calling a virtual function in a constructor requires care. "In the constructor" means "the object is not fully constructed yet", so at that point the virtual call mechanism may not yet be in effect. For example:
class B
{
public:
    B() { this->vf(); } // calls B::vf
    virtual void vf() { cout << "in B::vf()\n"; }
};
Here, no matter which derived object B is the base-class part of, the constructor always calls B::vf. The careful reader will see that this follows from the order of construction: C++ specifies that an object is built from the bottom up, starting with the most basic base class. So when B's constructor calls this->vf, the object that this refers to may indeed be a derived-class object, but construction of the derived part has not begun, and the call cannot reach the derived class's vf. It is like a building whose second floor has not been built yet: someone on the first floor cannot walk upstairs.
Going deeper: a virtual call is dispatched indirectly through the virtual function pointer and the virtual function table. In B's constructor the compiler inserts code that sets the vptr at the head of the object to point to B's virtual function table, so this->vf dispatches through B's table and can only reach B::vf. Later, once the B part is constructed and construction of the derived-class part begins, the derived class's constructor resets the vptr to point to the derived class's virtual function table. From then on the virtual call mechanism is fully in effect, and subsequent this->vf calls dispatch through the derived class's table and reach the correct function.
.NET object model
The C++ object model differs from that of .NET (or Java) in one major way: C++ supports multiple inheritance but has no interfaces, whereas .NET (or Java) supports interfaces but not multiple inheritance.
.NET's virtual call mechanism is similar to that of C++, but interfaces and the JIT (just-in-time) compiler introduce some differences.
In .NET, each class has a corresponding table of function pointers (in fact this "table" is a data structure that holds other information as well). Unlike in C++, every function of the class, virtual or not, has an entry in it. This is required by the JIT (just-in-time) compiler: every call to a function is indirect, and the address of the function's code is fetched from the table. Note that on the first call the function's code is still intermediate code (MSIL, the intermediate language of .NET), so control jumps to the JIT compiler, which compiles the code, places the result in memory, and points the corresponding table entry at the compiled native code. Every later call jumps directly to the compiled code. The above is only meant to give a general picture of the "virtual function table" in .NET; a more detailed analysis follows.
Without interfaces, .NET's virtual call mechanism would be very simple, almost the same as in C++. With interfaces it is different: an object reference can be converted to an interface reference, and the virtual functions of the interface are then called through it. The "virtual function table" therefore needs some changes. Consider the following inheritance structure:
public interface IFirst
{
    void F1();
    void F2();
}
public interface ISecond
{
    void S1();
}
public class C : IFirst, ISecond
{
    // Interface members are implemented, not overridden, so "override" is not used here.
    public virtual void F1() {}
    public virtual void F2() {}
    public virtual void S1() {}
    public virtual void C1() {}
}
The memory layout of type C looks roughly like this (because .NET uses single inheritance and every class implicitly inherits from Object, the "virtual function table" of C also contains all member functions of Object; the figure shows only the relevant part):
As the figure above shows, objref points to an object. At the top of the object (apart from the sync block index used for synchronization) is hType, which can be seen as the counterpart of the virtual function table pointer at the head of a C++ object. The structure it points to (CORINFO_CLASS_STRUCT; for now you can regard it as a virtual function table, although it holds more than virtual function pointers) contains not only the equivalent of a C++ virtual function table but also the information used to identify the object's type at run time. The difference from C++ is that in .NET's interface-based inheritance, dispatch of an interface's virtual functions goes through an IOT (Interface Offset Table). pIOT points to such a table, each entry of which is an offset giving the position, within the CORINFO_CLASS_STRUCT, of the array of virtual function pointers for one interface.
So when a virtual function is called through an interface, the mechanism is: first obtain the address of the CORINFO_CLASS_STRUCT of the object's class; then look up, in the interface offset table that pIOT points to, the offset of the interface's virtual function pointer array; finally make an indirect call through the pointer found in that array.
In other words, calling a virtual function through an interface reference first finds the offset of the interface's virtual function pointer array in the IOT, then indexes the function pointer in that array, and finally makes the call. Calling through an object reference needs only one layer of indirection, just as in C++: index the virtual function pointer directly in the virtual function table and call it. One detail about interface-based calls: the IOT has an entry for every interface, including interfaces the class does not implement. The reason is efficiency. .NET requires each interface to have a fixed offset in the IOT (that is, one known at compile time), so that the code generated for a virtual call can locate an interface's virtual function pointers through that fixed offset.
If, on the other hand, a class's IOT contained only the interfaces the class implements, then a call through an interface reference would need the interface's offset within that particular IOT, and this could only be learned through a dynamic query at run time. Given only an interface reference, the compiler cannot know which type of object it refers to, let alone whether that class's IOT lists the interface or where. With the scheme described above, the compiler does not need to know which class of object the interface reference refers to, because every class's CORINFO_CLASS_STRUCT has a pIOT at a fixed position, pointing to an IOT in which each interface has a fixed, compiler-known entry. A dynamic query before every virtual call would clearly be intolerable, so .NET prefers to give the IOT extra entries, trading space for time.
This may seem overly complicated, but it is necessary. Interface-based inheritance in .NET corresponds to multiple inheritance in C++, and the implementation of the latter is similarly complex, perhaps more so.
Finally, this article may be of little use to purely practical-minded readers, but it is useful to those who want to truly master a language. Knowing both the what and the why lets you use a feature with ease. As for the implementation mechanisms themselves, this article is offered only as a brick thrown out in the hope of attracting jade.