The Underlying Operating Mechanism of IL Code
Functions
Liu Qiang
Cambest@sohu.com
October 31, 2003
The material in this article may count as advanced features of C# and MSIL (strictly speaking, nothing in IL can be called an advanced feature, but I will use the term for simplicity). The part on functions covers function calls, the handling of local variables inside functions, delegates, events, calls to unmanaged code and other topics, and touches on interfaces, inheritance, encapsulation, delegates and events in the C# language. Readers should be reasonably familiar with C# and have a basic understanding of IL.
1. Function declaration and definition
In IL, a function is declared much as in C#: the .method directive introduces the declaration, followed by [return type] function name (parameter list). You can try assembling the following code:
// test.il
// Command line: ilasm test.il
.assembly test
{
  .hash algorithm 0x00008004   // @1
  .ver 0:0:0:0                 // @2
}
// This is the assembly declaration. Lines @1 and @2 can be commented out, but the declaration
// itself cannot: the .NET PE file loader loads the file according to the assembly manifest it produces.
.method static void hello(string[] args)
{
  .entrypoint
  .maxstack 30
  .locals init ([0] int32 V_1,
                [1] int32 V_2,
                [2] string[] V_3)
  ldstr "Hello, World!"
  call void [mscorlib]System.Console::WriteLine(string)
  ldarg.0        // load the args parameter
  ldlen          // compute its length
  conv.i4        // convert to a 32-bit integer
  stloc.1
  br.s CPR
L1:
  ldarg.0
  ldloc.0
  ldelem.ref     // fetch the element from the array reference and the index
  call void [mscorlib]System.Console::WriteLine(string)
  ldloc.0
  ldc.i4.1
  add
  stloc.0
CPR:
  ldloc.0        // index (loop counter)
  ldloc.1        // array length
  blt.s L1       // compare index and length to decide whether the loop continues
  ret
}
This is our first IL version of the Hello, World program. The program prints its command-line arguments in order; for example, entering test Good Lucky produces:
Hello, World!
Good
Lucky
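For comparison, here is a rough C# sketch of what the IL listing does. The names Hello and Lines are illustrative and not part of the original program, and the comments map each statement to the IL instructions only approximately.

```csharp
using System;
using System.Collections.Generic;

static class Hello {
    // Returns the lines the IL program would print for the given arguments.
    public static List<string> Lines(string[] args) {
        var lines = new List<string> { "Hello, World!" };
        int length = args.Length;      // ldarg.0, ldlen, conv.i4, stloc.1
        int i = 0;                     // local 0, zeroed by .locals init
        while (i < length) {           // CPR: ldloc.0, ldloc.1, blt.s L1
            lines.Add(args[i]);        // L1: ldarg.0, ldloc.0, ldelem.ref
            i = i + 1;                 // ldloc.0, ldc.i4.1, add, stloc.0
        }
        return lines;
    }

    static void Main(string[] args) {
        foreach (string line in Lines(args)) Console.WriteLine(line);
    }
}
```

Note that the IL jumps to the loop test (CPR) before the first iteration, which is why a while loop is the closest C# shape.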
As you can see, although we tend to think of IL as an assembly language, it is fairly high level. The function declaration follows the familiar C style, and the function body is enclosed in a pair of curly braces. The .entrypoint directive marks this function as the program's main function, that is, the program entry point. .maxstack 30 specifies the size of the function's evaluation stack; it need not equal the maximum stack depth actually used at run time, but it must not be smaller, otherwise an invalid-program exception is raised. An oversized value wastes space, especially with nested or recursive calls. I set it rather large here, but that does not matter: the program is small and little space is wasted. The .locals init (...) directive defines the local variables; remember that I described it in the earlier article "IL code underlying operation mechanism"? This statement is only a directive to the compiler; the corresponding memory is allocated when the code is finally compiled to machine code. I will explain it further below.
2. Function calls
We often see function call statements like these in IL code:
callvirt instance bool Functional.A::PleaseSayIt(string)
or:
call bool Functional.C::PleaseSayIt(string)
Not only does the function declaration resemble a high-level language; the call syntax is quite high level too. What is the difference between the two instructions callvirt and call? According to the instruction reference, callvirt appears to be for calling virtual functions. We also notice the instance keyword before the full function name in the callvirt statement, which the call statement lacks. Can we guess that callvirt is used to call instance methods and call is used to call static methods? For the code a C# compiler typically emits, the guess is largely right (call can also invoke an instance method non-virtually, as the base constructor call later in this article shows). So why define two call instructions? The following explanation gives the answer. The call instruction reaches a static method directly through the class name. For a static method the call target can be fixed at compile time and will certainly not change at run time. So what about callvirt? Let us look at the example below.
class A {
    public void PleaseSayIt(string s) {
        Console.WriteLine(s + " In Class A");
    }
}
class B : A {
    public void PleaseSayIt(string s) {
        Console.WriteLine(s + " In Class B");
    }
}
Let us see what this code produces:
A a = new B();
a.PleaseSayIt("Hello,");
The output is Hello, In Class A. This is probably not the result we expected, and it is very different from Java. In Java, if B overrides the method, the call a.PleaseSayIt runs the version in B. Achieving this in C# takes a little more work than in Java. First, add the virtual keyword to the definition of PleaseSayIt in A, which makes the method virtual and overridable in every subclass of A. Second, add the override keyword to the definition of PleaseSayIt in subclass B, indicating that it overrides the base-class method. This also answers the earlier question of why there is a callvirt instruction: the target of some function calls cannot be determined at compile time and is only determined at run time.
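A minimal C# sketch of the two behaviours just described. The class names PlainBase, PlainDerived, VirtualBase, VirtualDerived and DispatchDemo are hypothetical, chosen for illustration; the methods return strings instead of printing so the results are easy to check.

```csharp
using System;

// Non-virtual pair: the derived method merely hides the base method.
class PlainBase {
    public string PleaseSayIt(string s) { return s + " In Class A"; }
}
class PlainDerived : PlainBase {
    // 'new' makes the hiding explicit; without it the compiler only warns.
    public new string PleaseSayIt(string s) { return s + " In Class B"; }
}

// Virtual pair: the call is resolved at run time from the object's actual type.
class VirtualBase {
    public virtual string PleaseSayIt(string s) { return s + " In Class A"; }
}
class VirtualDerived : VirtualBase {
    public override string PleaseSayIt(string s) { return s + " In Class B"; }
}

static class DispatchDemo {
    public static string CallPlain() {
        PlainBase a = new PlainDerived();
        return a.PleaseSayIt("Hello,");   // bound to PlainBase at compile time
    }
    public static string CallVirtual() {
        VirtualBase a = new VirtualDerived();
        return a.PleaseSayIt("Hello,");   // dispatched to VirtualDerived at run time
    }
    static void Main() {
        Console.WriteLine(CallPlain());    // Hello, In Class A
        Console.WriteLine(CallVirtual());  // Hello, In Class B
    }
}
```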
Now let us work out how callvirt is carried out. First, check whether the called function is virtual. If it is not, call it directly. If it is, look for an overriding implementation in the object's class hierarchy. If none exists, the function is again called directly; if one exists, move to the override and repeat the process until the most-derived override is found. Consider the inheritance chain A, B, C, D:
A::(virtual) DoSth : B::(override) DoSth : C::(override) DoSth : D::DoSth
Code:
A a = new D();
a.DoSth();
IL code:
.locals init ([0] class A a)
newobj instance void D::.ctor()
stloc.0
ldloc.0
callvirt instance void A::DoSth()
1. Is this void DoSth() virtual? No: invoke it. Yes: go to 2.
2. Search for the next overriding method void DoSth().
3. Is there one? No: invoke the current method. Yes: go to 4.
4. Is that method an override? No: invoke the previous method. Yes: go to 1.
Figure: method lookup along the logical inheritance chain of class D
So a.DoSth() ends up calling C::DoSth(). After this explanation you should know the difference between the callvirt and call instructions, and also how virtual and override are used.
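The chain above can be sketched in C# as follows. This assumes that D declares DoSth without override (written here with an explicit new), which is how the chain lists it; ChainDemo is a hypothetical helper class.

```csharp
using System;

class A { public virtual string DoSth() { return "A::DoSth"; } }
class B : A { public override string DoSth() { return "B::DoSth"; } }
class C : B { public override string DoSth() { return "C::DoSth"; } }
class D : C {
    // Not an override: 'new' declares a separate method that hides C's.
    public new string DoSth() { return "D::DoSth"; }
}

static class ChainDemo {
    // Call through a reference of type A: the override chain stops at C.
    public static string ThroughA() { A a = new D(); return a.DoSth(); }
    // Call through a reference of type D: the hiding method is chosen.
    public static string ThroughD() { D d = new D(); return d.DoSth(); }

    static void Main() {
        Console.WriteLine(ThroughA()); // C::DoSth
        Console.WriteLine(ThroughD()); // D::DoSth
    }
}
```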
In fact, besides callvirt and call there is another special call instruction: newobj, the constructor call instruction. Let us see how a statement like this is implemented:
Funt.A a = new Funt.A();
One possible implementation is:
.locals init ([0] class Funt.A V_0)
newobj instance void Funt.A::.ctor()
stloc.0
The newobj instruction allocates a block of memory and obtains a reference to it, then calls the class's constructor through that reference to initialize the space, and finally pushes the reference onto the stack.
Next, let us discuss the constructor. Take the default constructor of A:
public A() {
}
Its IL implementation is:
.method public hidebysig specialname rtspecialname instance void .ctor() cil managed
{
  .maxstack 1
  ldarg.0
  call instance void [mscorlib]System.Object::.ctor()
  ret
}
Two points deserve our attention. The first is the ldarg.0 instruction, which loads an argument, even though the default constructor of A has no parameters. Note that when the virtual machine encounters a newobj instruction, it adds a node on the object heap to hold the object and passes the heap lookup key to the instance method, that is, it passes this (an object reference is in effect a heap lookup key, a 32-bit unsigned integer on a 32-bit platform). In an instance method, argument 0 is the reference to the instance itself; the method does not declare it explicitly. For example, to call the PleaseSayIt method of object a, the process is:
.locals init ([0] class Funt.A a)
ldloc.0
ldstr "Hello, World!"
callvirt instance void Funt.A::PleaseSayIt(string)
Here the reference a has to be passed to the Funt.A::PleaseSayIt method. Class code and object data are stored separately, and there may be many objects of class A, so how else would the method know which object to operate on? I also mentioned earlier that the explicit parameters of an instance method are numbered from 1, because the object reference occupies the hidden argument 0. A static method involves no object instance, so its parameters are numbered from 0. As you can see, methods and object instances are separate; I will describe the storage of classes and objects in detail later.
The second point is the statement call instance void [mscorlib]System.Object::.ctor(). Clearly, it calls the base-class constructor. Whenever a new object is created, the constructor of the base class is called first. If we do not explicitly specify which base-class constructor to call, the compiler selects the default (parameterless) constructor for us.
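A small C# sketch of the constructor order described above. BaseThing, DerivedThing and CtorOrderDemo are hypothetical names; each constructor records itself in a shared log so the order can be inspected.

```csharp
using System;
using System.Collections.Generic;

class BaseThing {
    public static readonly List<string> Log = new List<string>();
    public BaseThing() { Log.Add("BaseThing..ctor"); }
}

class DerivedThing : BaseThing {
    // No explicit ': base()' here; the compiler inserts the call to BaseThing().
    public DerivedThing() { Log.Add("DerivedThing..ctor"); }
}

static class CtorOrderDemo {
    public static string Run() {
        BaseThing.Log.Clear();
        new DerivedThing();
        return string.Join(" -> ", BaseThing.Log);
    }

    static void Main() {
        Console.WriteLine(Run()); // BaseThing..ctor -> DerivedThing..ctor
    }
}
```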
There are a few other call-related instructions, such as calli; they are not discussed here.
3. Local variables and recursive calls
Any function that has local variables contains a statement of the form .locals init (variable list ...). As I said earlier, this statement only tells the compiler how to handle the local variables. So what does it actually do? Look at the example below.
We might use recursion to compute the sum of a series such as 1 + 2 + 3 + 4 + ...:
static long LinearSum(int num) {
    long result = 1;
    if (num == 1) return result;
    else result = num + LinearSum(num - 1);
    return result;
}
The local variables of the function are:
.locals init (int64 result, int64 retval)
Consider this: if the local variables defined in a function were stored in a fixed memory location, result would hold the value left by the previous execution, which would cause great confusion. Across calls with different numbers of terms and at different times, the function would keep accumulating values until overflow; except for the first call, every later evaluation would give an inexplicable result. And if result were reset to 1 on each entry, the previous result would be wiped out. (In C/C++ you can simulate this situation by adding the static keyword to the declaration of result. C# has no static local variables in methods.) So, in fact, each time the same function is entered, space for its variables is allocated anew to hold the values produced during that run. IL also has instructions for allocating and initializing memory, such as localloc and initblk. Whenever the JIT compiler encounters a .locals init statement, it emits the corresponding allocation instructions at that point and the matching release instructions at the end of the function. Thus each entry into a function first allocates memory for the local variables (if any) and reclaims that memory on exit; this is what makes recursion possible. In the final machine code (after JIT compilation), local variables are actually stored on the system stack, and operations on variables are carried out as stack operations. For example, if EDX holds the high four bytes of the result, EAX holds the low four bytes, and the result variable lies 28H bytes below the position the stack top had before the locals were allocated, the store takes the form:
mov ebp, esp
......
mov dword ptr [ebp-28h], eax
mov dword ptr [ebp-24h], edx
At the end of the function, restoring the value of ESP reclaims the space. Understanding this material helps us grasp the low-level details of the technology.
4. Delegates and events
4.1 Delegates
The C# language gives us a convenient feature: the delegate. It makes handling all kinds of events easy, especially UI events, unlike C++, where using callback functions is both troublesome and error-prone. For example, suppose some work must be done when the main form MainForm closes, and this work is handled not by MainForm but by a MyTask object. How does MyTask get MainForm's closing notification? This is where delegates show their flexibility. MainForm's response to closing is implemented through Closed, and Closed is declared in C# as a delegate. So for MyTask to receive and handle the form's close event, it only needs to define a function with the same signature as the delegate System.EventHandler, put the handling code in it, and register the function with MainForm's Closed. For example, after mainForm and myTask have been created, executing mainForm.Closed += new System.EventHandler(myTask.ProcessWhileClosed) lets myTask respond to mainForm's Closed event.
Let us look at how .NET implements delegates. First declare a delegate, such as public delegate void EHandler(object src), then disassemble it to see what it has been turned into:
.class public auto ansi sealed EHandler extends System.MulticastDelegate {
  .method public hidebysig specialname rtspecialname
          instance void .ctor(object 'object', native int 'method') runtime managed {
  }
  .method public hidebysig virtual instance void Invoke(object src) runtime managed {
  }
  .method public hidebysig newslot virtual instance class System.IAsyncResult BeginInvoke(object src, class System.AsyncCallback callback, object 'object') runtime managed {
  }
  .method public hidebysig newslot virtual instance void EndInvoke(class System.IAsyncResult result) runtime managed {
  }
}
From this we can see that our definition is in fact a sealed class derived from System.MulticastDelegate, containing three methods: BeginInvoke, Invoke and EndInvoke. Its constructor takes two parameters. The first, of type object, is a reference to the object that owns the receiving method. The second is a method reference (a 32-bit integer on a 32-bit platform, like an object reference; it is somewhat like a function pointer, though with significant differences) and corresponds to System.IntPtr (native int). In the earlier example mainForm.Closed += new System.EventHandler(myTask.ProcessWhileClosed), the first argument passed to the EventHandler constructor is myTask and the second is the reference to the ProcessWhileClosed method. If the delegated function is a static method, the first argument is null. This tells us plainly not to try to derive from System.MulticastDelegate to build our own delegate class, because we cannot obtain the method reference ourselves (C# offers no way to do so, although IL has the ldftn instruction for getting one); only the compiler can determine it. In fact, C# treats MulticastDelegate and similar classes as special classes that cannot be derived from explicitly, because they exist to support the language feature. From this we can also see how closely C# is tied to the .NET class library: its syntax is implemented with the library's support. It is then not hard to understand why C# is described as a language designed specifically for the .NET environment.
Let us look at how invoking a delegate is implemented. For example, the MainForm object invokes the Closed delegate inside a suitable method (such as the form's message handler Form.WndProc; C#):
......
case WM_CLOSED:
    Closed(sender, earg); break;
......
The IL that invokes the Closed delegate looks like this:
ldarg.0                                                   // load the object reference
ldfld class MyForm::Closed                                // get the field
ldarg.0
callvirt instance void MyForm.Closed::Invoke(object)      // indirect call through Closed's Invoke method
The rough process is: first load the form object reference (MainForm) onto the stack, then use it to load the Closed delegate field onto the stack. Then load the MainForm reference again as the argument and call Closed's Invoke method, which invokes the methods registered with Closed. This is far more useful than callback functions in C/C++. A delegate can register multiple static or instance methods, and invoking them is handled by the delegate object, so we no longer need to write the callback dispatch ourselves. If you have experience with Windows programming in C/C++, you will appreciate what this means.
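To illustrate the points above, here is a hedged C# sketch of a delegate that holds one instance method and one static method. EHandler matches the declaration used earlier; Receiver and DelegateDemo are hypothetical names. It checks that the delegate's Target is the receiving object for the instance method and null for the static one, and that a single invocation reaches both.

```csharp
using System;
using System.Collections.Generic;

delegate void EHandler(object src);

class Receiver {
    public readonly List<string> Calls = new List<string>();
    public void OnInstance(object src) { Calls.Add("instance:" + src); }
}

static class DelegateDemo {
    public static readonly List<string> StaticCalls = new List<string>();
    public static void OnStatic(object src) { StaticCalls.Add("static:" + src); }

    public static string Run() {
        StaticCalls.Clear();
        var r = new Receiver();
        EHandler h = new EHandler(r.OnInstance); // Target is r
        h += new EHandler(OnStatic);             // Target is null for a static method
        h("form");                               // one call invokes both registered methods
        Delegate[] list = h.GetInvocationList();
        return list.Length + "," + (list[0].Target == r) + "," + (list[1].Target == null)
               + "," + r.Calls[0] + "," + StaticCalls[0];
    }

    static void Main() { Console.WriteLine(Run()); }
}
```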
4.2 Events
Having introduced delegates, let me introduce their close companion: events. Delegates and events are born brothers; together they provide simple, convenient functionality in C#. Continuing the example above, look at the declaration public System.EventHandler Closed. The Closed delegate is declared public so that methods can be registered from outside with Closed += new EventHandler(your.Process). But then outside code can also invoke it directly, as in mainForm.Closed(sender, arg), which violates the principle of encapsulation in object-oriented design. If it is declared private, it cannot accept registration from outside the class; solving this with properties is too cumbersome. The C# solution is to add the event keyword before the delegate declaration: public event System.EventHandler Closed. Closed is then declared as an event that can be registered from outside but not invoked from outside. This is what the event keyword is for.
What happens when the event keyword is added? Does it merely let the delegate accept registration from outside while preventing invocation from outside? Not only that: it also instructs the compiler to generate an event property with two methods for adding and removing delegates:
.event System.EventHandler Closed {
  .addon instance void MyForm::add_Closed(class System.EventHandler)
  .removeon instance void MyForm::remove_Closed(class System.EventHandler)
}
The .addon and .removeon entries correspond to the += and -= operations. Because we want to respond to events only at the appropriate time, a handler that is no longer needed can be unregistered, for example with mainForm.Closed -= new EventHandler(myTask.ProcessSth). Below we discuss the add_Closed(EventHandler) method that corresponds to .addon.
.method public hidebysig specialname instance void add_Closed(class System.EventHandler 'value') cil managed synchronized {
  .maxstack 3
  ldarg.0
  ldarg.0
  ldfld class System.EventHandler MyForm::Closed
  ldarg.1
  call class System.Delegate System.Delegate::Combine(class System.Delegate, class System.Delegate)
  castclass System.EventHandler
  stfld class System.EventHandler MyForm::Closed
  ret
}
The rough flow of this code is: first load the current value of the Closed field; then load argument 1, the delegate to be added; then call Delegate.Combine to combine the two into a single delegate; finally cast the result to EventHandler and store it back into the field.
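Written by hand in C#, the add and remove methods might look like the sketch below. It is an approximation under stated assumptions: the backing field name closed, the RaiseClosed helper and the EventDemo class are hypothetical, and the Synchronized attribute is used to mirror the synchronized flag in the IL shown (compilers are free to generate different accessor bodies).

```csharp
using System;
using System.Runtime.CompilerServices;

class MyForm {
    private EventHandler closed; // backing delegate field (hypothetical name)

    public event EventHandler Closed {
        // Mirrors add_Closed: combine, cast, store back.
        [MethodImpl(MethodImplOptions.Synchronized)]
        add { closed = (EventHandler)Delegate.Combine(closed, value); }
        // Mirrors remove_Closed.
        [MethodImpl(MethodImplOptions.Synchronized)]
        remove { closed = (EventHandler)Delegate.Remove(closed, value); }
    }

    // Only code inside the class can invoke the delegate.
    public void RaiseClosed() {
        EventHandler h = closed;
        if (h != null) h(this, EventArgs.Empty);
    }
}

static class EventDemo {
    public static int Run() {
        int count = 0;
        EventHandler handler = delegate(object s, EventArgs e) { count++; };
        var form = new MyForm();
        form.Closed += handler;   // calls add_Closed
        form.RaiseClosed();       // handler runs once
        form.Closed -= handler;   // calls remove_Closed
        form.RaiseClosed();       // no handlers left, nothing runs
        return count;
    }

    static void Main() { Console.WriteLine(Run()); }
}
```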
First, we are interested in the System.Delegate.Combine(Delegate, Delegate) function. As its name suggests, it combines one delegate with another. Combine is a static method defined in Delegate; it appends the function references held by the second delegate to the function list of the first. That is why we can write statements such as:
EventHandler eh = null;
eh += new EventHandler(instance.SomeMethod);
Some people may wonder why such code does not cause a null reference exception. The reason is that the Combine method "merges" the delegates, so no exception occurs. That is, when we call Delegate.Combine(eh, another): if eh is not null, the method references in another are appended to the method list of eh in the resulting delegate; if eh is null, the result is simply another. This differs from other cases where we overload and use the += operator (which amounts to defining a special method and then calling it through an object reference). We can see from this that C# supports delegates at the language level: when the compiler encounters the += or -= operator on a delegate, it emits a call to the corresponding add or remove method.
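A short C# sketch of this behaviour, using hypothetical handler names First and Second. It starts from a null delegate, adds two handlers, removes them again, and reports the invocation-list length at each step.

```csharp
using System;

static class CombineDemo {
    static void First(object s, EventArgs e) { }
    static void Second(object s, EventArgs e) { }

    public static string Run() {
        EventHandler eh = null;
        eh += First;                   // combining with null raises no exception
        int afterFirst = eh.GetInvocationList().Length;
        eh += Second;                  // result holds both targets
        int afterSecond = eh.GetInvocationList().Length;
        eh -= First;
        eh -= Second;                  // removing the last target yields null again
        return afterFirst + "," + afterSecond + "," + (eh == null);
    }

    static void Main() {
        Console.WriteLine(Run()); // 1,2,True
    }
}
```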
Second, we are interested in the synchronized method modifier. It indicates that adding a handler to the event must be synchronized and must not be interrupted; otherwise confusion may result. For example:
Thread 1: eh += eventHandler1
Thread 2: eh += eventHandler2
Suppose thread 1 is interrupted while adding the function reference in eventHandler1 to eh, and thread 2 then adds the function reference in eventHandler2 to eh. The function registered first may then never be executed, which is likely to cause many problems.
[Postscript]
With regard to the articles on IL, I think this may be a good place to pause for now. Writing is also a process of self-improvement: although I already had some understanding of the JVM mechanism, researching IL over this period and completing this article has given me a deeper understanding of virtual machines and .NET. Because my knowledge is limited, the article may contain errors and shortcomings, and I hope readers will point them out; there may also be IL topics I have not covered, which I hope we can discuss. One deeper topic is attributes in C#. Some readers may still be unclear about them: what an attribute is, what it does, how it is implemented at the IL level, when to use one, and so on, just as I was confused when I first came to C#. As I am busy at the moment, I cannot write about it right away. In the meantime I hope readers will share their views; I would be glad to hear from you. My email: Cambest@sohu.com
All online media are welcome to reprint the author's series of articles, provided that the author's attribution is retained and no unauthorized modifications are made.