The Underlying Execution Mechanism of IL Code
Liu Qiang
Cambest@sohu.com
May 8, 2003
As is well known, C#, like Java, compiles to code for a stack-based virtual machine. For most people the low-level details may not matter much, but understanding them is a real help in understanding and using C#. Below I explain the underlying execution mechanism of IL code through a very simple example; I hope it is of some help to you.
On the surface, the example is a function that performs integer subtraction; in fact, it does nothing particularly useful. Real programs contain many data types, including reference types; for simplicity, the sample code uses only one data type, as shown below:
public int Sub(int i, int j)
{
    int s;
    int t = 0;
    int r = 4;
    s = i;
    r = i - j;
    r += s + t;
    return r;
}
This code is simple; anyone who has learned C# can follow it. It takes two integer parameters, i and j, performs some internal arithmetic, and returns an integer. The three local variables s, t, and r defined in the function body hold intermediate values and the result. We can put the method in a class and compile it into a .dll assembly. Disassembling it with ILDASM, the disassembler that ships with VS.NET, gives the following IL code:
.method public hidebysig instance int32 Sub(int32 i,
                                            int32 j) cil managed
{
  // Code size 22 (0x16)
  .maxstack 3
  .locals init (int32 V_0, int32 V_1, int32 V_2, int32 V_3)
  ldc.i4.0
  stloc.1
  ldc.i4.4
  stloc.2
  ldarg.1
  stloc.0
  ldarg.1
  ldarg.2
  sub
  stloc.2
  ldloc.2
  ldloc.0
  ldloc.1
  add
  add
  stloc.2
  ldloc.2
  stloc.3
  br.s IL_0014
  IL_0014: ldloc.3
  ret
}
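To reproduce a listing like the one above, the method has to live in a class. The following is a minimal sketch of that packaging step; the class name Calc and the Main driver are assumptions added for illustration and are not part of the original sample. The compound statement is read here as r += s + t, which is what the ldloc.2, ldloc.0, ldloc.1, add, add sequence in the listing computes.

```csharp
using System;

// Hypothetical wrapper class; only the body of Sub comes from the article.
public class Calc
{
    public int Sub(int i, int j)
    {
        int s;
        int t = 0;
        int r = 4;
        s = i;
        r = i - j;
        r += s + t;
        return r;
    }

    public static void Main()
    {
        // i = 7, j = 3: r = 7 - 3 = 4, then r = 4 + (7 + 0) = 11
        Console.WriteLine(new Calc().Sub(7, 3)); // prints 11
    }
}
```

Despite its name, the function returns (i - j) + i + 0, which is why the article calls it subtraction only "on the surface".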
The IL code can in turn be assembled into a .dll assembly or an .exe executable with ILASM, the IL assembler that ships with VS.NET.
First, a word about the symbols that appear in the IL. Tokens that begin with a dot '.', such as ".method" and ".locals", are directives (pseudo-instructions): they describe the code but are not themselves compiled into native executable code. Tokens without a leading dot, such as "ldarg.1", are IL instructions, which are compiled into native executable code at run time.
Let us now go through what each statement does. Note: local variable indexes start from 0, so pay attention to expressions such as "the zeroth local variable".
First, look at the first statement in the function: .maxstack 3. From the name alone we can guess that it declares the size of the stack. Let us set it aside for the moment and read on.
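The same instruction sequence can also be tried out without ILASM. The sketch below is not the ILASM route described above; it uses the System.Reflection.Emit API (DynamicMethod and ILGenerator) to emit the instructions at run time. The class name EmitDemo and the helper BuildSub are hypothetical. Because a DynamicMethod built this way is static and has no this reference, i and j are arguments 0 and 1 here rather than 1 and 2.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Builds a static method whose body mirrors the IL listing.
    public static Func<int, int, int> BuildSub()
    {
        var method = new DynamicMethod("Sub", typeof(int),
                                       new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();

        il.DeclareLocal(typeof(int)); // V_0: s
        il.DeclareLocal(typeof(int)); // V_1: t
        il.DeclareLocal(typeof(int)); // V_2: r
        il.DeclareLocal(typeof(int)); // V_3: return-value slot
        Label exit = il.DefineLabel();

        il.Emit(OpCodes.Ldc_I4_0); il.Emit(OpCodes.Stloc_1); // t = 0
        il.Emit(OpCodes.Ldc_I4_4); il.Emit(OpCodes.Stloc_2); // r = 4
        il.Emit(OpCodes.Ldarg_0);  il.Emit(OpCodes.Stloc_0); // s = i
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Sub);
        il.Emit(OpCodes.Stloc_2);                            // r = i - j
        il.Emit(OpCodes.Ldloc_2);
        il.Emit(OpCodes.Ldloc_0);
        il.Emit(OpCodes.Ldloc_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Stloc_2);                            // r = r + (s + t)
        il.Emit(OpCodes.Ldloc_2);
        il.Emit(OpCodes.Stloc_3);                            // V_3 = r
        il.Emit(OpCodes.Br_S, exit);                         // jump to the next instruction
        il.MarkLabel(exit);
        il.Emit(OpCodes.Ldloc_3);
        il.Emit(OpCodes.Ret);

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    public static void Main()
    {
        Console.WriteLine(BuildSub()(7, 3)); // prints 11
    }
}
```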
The second statement: .locals init (int32 V_0, int32 V_1, int32 V_2, int32 V_3). V_0, V_1, and V_2 correspond one-to-one to the local variables s, t, and r defined in the C# source, and we can guess that this statement declares and initializes the local variables. But why are there four of them? We clearly defined only three. So what is the fourth variable, which the C# compiler maintains automatically? Let us leave that question open for now and read on.
ldc.i4.0
This instruction loads a constant onto the stack; i4 indicates that the constant is a 4-byte (32-bit, double-word) integer, and its value is 0. "ldc" can be read as "load constant". As shown in Figure A, the operation amounts to (TOP) <= 0, TOP = TOP + 1.
stloc.1
This instruction stores the current top-of-stack element into the first local variable (index 1, that is, t). The '1' indicates that the operand is local variable 1. "stloc" can be read as "store to local". As shown in Figure B, the operation amounts to TOP = TOP - 1, t <= (TOP).
ldc.i4.4
The operation performed by this instruction amounts to (TOP) <= 4, TOP = TOP + 1, as shown in Figure C.
stloc.2
The operation performed by this instruction amounts to TOP = TOP - 1, r <= (TOP), as shown in Figure D.
ldarg.1
ldarg.2
These two instructions load the first parameter (i) and then the second parameter (j) onto the stack. Unlike local variables, parameter indexes here start from 1, because in an instance method argument 0 is the this reference. The operation amounts to (TOP) <= i, TOP = TOP + 1, (TOP) <= j, TOP = TOP + 1, as shown in Figure E. "ldarg" can be read as "load argument". (The ldarg.1 / stloc.0 pair that precedes these two instructions in the listing implements s = i in the same load-then-store fashion.)
sub
This instruction negates the current top-of-stack element and adds it to the element below it, as shown in Figure F. The operation amounts to TOP = TOP - 1, TEMP = -(TOP), TOP = TOP - 1, (TOP) = (TOP) + TEMP, TOP = TOP + 1.
[Figures (a) to (f), stack contents after each step: (a) 0; (b) empty; (c) 4; (d) empty; (e) i, j with j on top; (f) i - j]
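The operand order of sub is easy to get backwards, so here is a small model of it on an explicit stack. This is only an illustrative sketch: the class SubDemo and the sample values 7 and 3 are assumptions, and Stack<int> stands in for the evaluation stack.

```csharp
using System;
using System.Collections.Generic;

public static class SubDemo
{
    // Models the IL "sub" instruction on an explicit stack.
    public static void Sub(Stack<int> stack)
    {
        int value2 = stack.Pop();    // top of stack: pushed last (j)
        int value1 = stack.Pop();    // element below it: pushed first (i)
        stack.Push(value1 - value2); // the result replaces both operands
    }

    public static void Main()
    {
        var stack = new Stack<int>();
        stack.Push(7); // ldarg.1 (i)
        stack.Push(3); // ldarg.2 (j)
        Sub(stack);
        Console.WriteLine(stack.Peek()); // prints 4, that is, i - j
    }
}
```

Two elements are consumed and one is produced, so the stack is one element shallower after the instruction than before it.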
stloc.2
This instruction stores the current top-of-stack element into the second local variable (r). The operation amounts to TOP = TOP - 1, r <= (TOP), that is, r = i - j, as shown in Figure G.
ldloc.2
ldloc.0
ldloc.1
These three instructions load the second, zeroth, and first local variables (r, s, t) onto the stack, in that order, as shown in Figure H. "ldloc" can be read as "load local variable".
add
add
add works like sub, except that it does not negate the top-of-stack element before adding it to the element below it. It is executed twice in a row, as shown in Figures I and J.
stloc.2
Stores the current top-of-stack element into the second local variable (r), as shown in Figure K.
ldloc.2
Loads the second local variable (r) onto the stack, as shown in Figure L.
[Figures (g) to (l), stack contents after each step: (g) empty; (h) r, s, t with t on top; (i) r, s + t; (j) r + s + t; (k) empty; (l) r]
stloc.3
The current top-of-stack element is stored into the third local variable; that is, the return value is saved, as shown in Figure M.
br.s IL_0014
Jumps to the next instruction (ldloc.3), as shown in Figure N.
ldloc.3
ret
The third local variable (the one maintained automatically by the compiler) is loaded onto the stack, and then the method returns, as shown in Figure O. From this we can see that: 1. the third variable has the same type as the return value; 2. once the rest of the work is finished, the third local variable is loaded onto the stack just before returning. This lets us conclude that the third variable is used to store the return value. We still need to work out why a local variable has to be allocated for the return value; that is explained below.
[Figures (m) to (o), stack contents after each step: (m) empty; (n) empty; (o) V_3]
Looking back over Figures A to O, we find that the function never uses more than 3 stack slots at any point, which makes the first statement, .maxstack 3, easy to understand.
One puzzle remains: why introduce the variable V_3 at all? In the example above, the second-to-last instruction, ldloc.3, could just as well be replaced by ldloc.2, because the result we want is already stored in the second variable, r; is V_3 not a waste of space? The point is that not every return value is held in a local variable.
We might return a parameter directly, or return a field of the class, for example:
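The stack depth can also be checked mechanically. The sketch below replays the instruction sequence of the listing on a model stack and records the deepest point reached. It is a toy interpreter written for this one listing, not a general IL evaluator; the class DepthDemo, the method MaxDepth, and the sample arguments are all assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class DepthDemo
{
    // Replays the listing on a model stack and returns the maximum depth.
    public static int MaxDepth(int i, int j, out int result)
    {
        // br.s is left out: its target is simply the next instruction.
        string[] program = {
            "ldc.i4.0", "stloc.1", "ldc.i4.4", "stloc.2", "ldarg.1", "stloc.0",
            "ldarg.1", "ldarg.2", "sub", "stloc.2", "ldloc.2", "ldloc.0",
            "ldloc.1", "add", "add", "stloc.2", "ldloc.2", "stloc.3", "ldloc.3"
        };
        int[] args = { 0, i, j };  // slot 0 stands in for "this"
        int[] locals = new int[4]; // V_0 .. V_3
        var stack = new Stack<int>();
        int maxDepth = 0;

        foreach (string op in program)
        {
            // Split "stloc.2" into the name "stloc" and the index 2.
            int dot = op.LastIndexOf('.');
            string name = dot < 0 ? op : op.Substring(0, dot);
            int n = dot < 0 ? 0 : op[dot + 1] - '0';
            int top;
            switch (name)
            {
                case "ldc.i4": stack.Push(n); break;
                case "ldarg": stack.Push(args[n]); break;
                case "ldloc": stack.Push(locals[n]); break;
                case "stloc": locals[n] = stack.Pop(); break;
                case "add": top = stack.Pop(); stack.Push(stack.Pop() + top); break;
                case "sub": top = stack.Pop(); stack.Push(stack.Pop() - top); break;
            }
            maxDepth = Math.Max(maxDepth, stack.Count);
        }
        result = stack.Pop(); // ret hands the remaining element to the caller
        return maxDepth;
    }

    public static void Main()
    {
        int result;
        int depth = MaxDepth(7, 3, out result);
        Console.WriteLine(depth + " " + result); // prints "3 11"
    }
}
```

The maximum is reached only once, right after ldloc.1, when r, s, and t are all on the stack.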
public int Laxi(int x)
{
    return x; // return the parameter directly
}
or:
int age; ...
public int GetAge()
{
    return age; // return a field defined in the class
}
Or return an expression directly:
public int GetInteger()
{
    return age + 4 * 6 / 2;
}
This step is therefore necessary, because in the three cases above the return value is not held in any local variable. Taking return r as an example, the sequence is: ldloc.x -> stloc.y -> ldloc.y -> ret.
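One way to picture this is to write the store-then-load step out at the source level. The sketch below rewrites the three methods as if the temporary slot were an ordinary local named ret. It is only an illustration of the pattern, not actual compiler output; the class ReturnDemo, the local ret, the sample value of age, and the + in the expression are assumptions.

```csharp
using System;

public class ReturnDemo
{
    int age = 10; // sample value, chosen for illustration

    public int Laxi(int x)
    {
        int ret = x;   // ldarg.1, stloc.y
        return ret;    // ldloc.y, ret
    }

    public int GetAge()
    {
        int ret = age; // load the field, stloc.y
        return ret;    // ldloc.y, ret
    }

    public int GetInteger()
    {
        int ret = age + 4 * 6 / 2; // evaluate the expression, stloc.y
        return ret;                // ldloc.y, ret
    }

    public static void Main()
    {
        var demo = new ReturnDemo();
        Console.WriteLine(demo.Laxi(5));      // prints 5
        Console.WriteLine(demo.GetAge());     // prints 10
        Console.WriteLine(demo.GetInteger()); // prints 22
    }
}
```

In all three methods the value being returned exists only on the evaluation stack until it is stored, which is the situation the extra local is there to handle.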