IL code underlying operation mechanism
Loop processing
Liu Qiang
Cambest@sohu.com
October 22, 2003
In the previous article we discussed the basic operation of IL code. In this article we look at how IL code implements the loops of C#. The examples also touch on array handling and introduce some new instructions. Others have written about this topic and I have read several related articles, but I found their descriptions unclear, so I am taking this opportunity to organize the material into an article. I hope it helps everyone understand .NET, and that it is also useful to designers who study virtual machine mechanisms.
As before, we start with a piece of C# code and then study its compiled IL code in detail. The C# code below contains three loops: for, while, and foreach:
public int looptest()
{
    int i = 3;
    int j = 9;
    int s = 0;
    int k; // the statements above declare and initialize the variables
    for (k = 0; k <= i; k++)
    {
        s += k;
    } // for loop block
    k = 0;
    while (k < j)
    {
        s += k;
        k++;
    } // while loop block
    int[] array = {2, 3, 4, 5, 6, 7, 8, 9};
    foreach (int a in array)
    {
        s += a;
    } // foreach loop block
    return s;
}

What we want to do here is figure out how the C# compiler translates this source into loops, that is, how the loops of the C# language are realized in IL. This is very helpful for a deeper understanding of C# language features. Of course that alone is not enough, so I will also cover some related questions along the way. Let us first look at the IL code this function compiles to:

.method public hidebysig instance int32 looptest() cil managed
{
  // Code size 101 (0x65)
  .maxstack 3
  .locals init ([0] int32 i,
           [1] int32 j,
           [2] int32 s,
           [3] int32 k,
           [4] int32[] 'array',
           [5] int32 a,
           [6] int32 CS$00000003$00000000,
           // Local variable of the same type as the function's return value, maintained by the
           // compiler and used only to hold the return value. If the function returns void,
           // this variable does not exist.
           [7] int32[] CS$00000007$00000001,
           // Local variable that holds the array reference for the foreach loop. In this example
           // it refers to the 'array' array.
           [8] int32 CS$00000008$00000002
           // Local variable that holds the array index. Used only by the foreach loop and
           // maintained by the compiler.
           )
  IL_0000:  ldc.i4.3
  IL_0001:  stloc.0
  IL_0002:  ldc.i4.s   9
  IL_0004:  stloc.1
  IL_0005:  ldc.i4.0
  IL_0006:  stloc.2
  IL_0007:  ldc.i4.0
  IL_0008:  stloc.3
  IL_0009:  br.s       IL_0013
  IL_000b:  ldloc.2
  IL_000c:  ldloc.3
  IL_000d:  add
  IL_000e:  stloc.2
  IL_000f:  ldloc.3
  IL_0010:  ldc.i4.1
  IL_0011:  add
  IL_0012:  stloc.3
  IL_0013:  ldloc.3
  IL_0014:  ldloc.0
  IL_0015:  ble.s      IL_000b
  IL_0017:  ldc.i4.0
  IL_0018:  stloc.3
  IL_0019:  br.s       IL_0023
  IL_001b:  ldloc.2
  IL_001c:  ldloc.3
  IL_001d:  add
  IL_001e:  stloc.2
  IL_001f:  ldloc.3
  IL_0020:  ldc.i4.1
  IL_0021:  add
  IL_0022:  stloc.3
  IL_0023:  ldloc.3
  IL_0024:  ldloc.1
  IL_0025:  blt.s      IL_001b
  IL_0027:  ldc.i4.8
  IL_0028:  newarr     [mscorlib]System.Int32
  // Creates a System.Int32 array of length 8. The element type int maps to System.Int32.
  IL_002d:  dup
  IL_002e:  ldtoken    field valuetype ' '
  IL_0033:  call       void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
  IL_0038:  stloc.s    'array'
  IL_003a:  ldloc.s    'array'
  IL_003c:  stloc.s    CS$00000007$00000001
  IL_003e:  ldc.i4.0
  IL_003f:  stloc.s    CS$00000008$00000002
  IL_0041:  br.s       IL_0055
  IL_0043:  ldloc.s    CS$00000007$00000001
  IL_0045:  ldloc.s    CS$00000008$00000002
  IL_0047:  ldelem.i4
  IL_0048:  stloc.s    a
  IL_004a:  ldloc.2
  IL_004b:  ldloc.s    a
  IL_004d:  add
  IL_004e:  stloc.2
  IL_004f:  ldloc.s    CS$00000008$00000002
  IL_0051:  ldc.i4.1
  IL_0052:  add
  IL_0053:  stloc.s    CS$00000008$00000002
  IL_0055:  ldloc.s    CS$00000008$00000002
  IL_0057:  ldloc.s    CS$00000007$00000001
  IL_0059:  ldlen
  IL_005a:  conv.i4
  IL_005b:  blt.s      IL_0043
  IL_005d:  ldloc.2
  IL_005e:  stloc.s    CS$00000003$00000000
  IL_0060:  br.s       IL_0062
  IL_0062:  ldloc.s    CS$00000003$00000000
  IL_0064:  ret
} // end of method advanced::looptest

The new instructions used here are:

Instruction   Meaning                                      Memory aid (*)
br.s          Unconditional branch, equivalent to jmp
blt.s         Branch if less than                          Lower Than
ble.s         Branch if less than or equal                 Lower or Equals
ldlen         Get the length of an array
ldelem.i4     Load the array element at the given index

Here we can see that the .locals init directive shows the same variable names as the source program. This is because a debug information file (*.pdb) was present in the same directory when the assembly was disassembled; otherwise the variables would appear as V_x (V_1, V_2, and so on). For more about local variables, see "Function Related".

If you have experience with Win32 assembly programming, you are probably familiar with how loop control is implemented. For example, to sum the integers from 10 to 100 we might write:

          mov   ecx, 100     ; ECX holds the loop counter
          xor   eax, eax     ; clear EAX and the flags
AddLoop:  add   eax, ecx     ; accumulate into EAX
          dec   ecx          ; decrement the counter
CPR:      cmp   ecx, 9       ; test ECX >= 10, that is, ECX > 9
          jg    AddLoop      ; if true (greater), jump back to AddLoop

This differs from high-level languages (C / C++ / Java / C#), where the loop condition of a for loop is written at the head of the loop. In a sequentially executed low-level language such as MASM, it is customary to test the loop condition at the end of the loop. So how does the C# compiler reconcile the position of the loop condition in C# source with the position of the test in typical assembly code, and use IL to test the condition and implement the loop correctly?

First, I want to point out that in a sequentially executed assembly language it is entirely possible to test the loop condition at the head of the loop. Here is an IL version of the example above:

.locals init ([0] int32 eax, [1] int32 ecx, [2] int32 ret_val)
         ldc.i4    100
         stloc.1             // mov ecx, 100
         ldc.i4.0
         stloc.0             // xor eax, eax, or mov eax, 0
L_0000:  ldloc.1
         ldc.i4    10
         blt.s     L_0003    // ECX < 10?
                             // Yes -> jump out of the loop; No -> go on
L_0001:  ldloc.0
         ldloc.1
         add
         stloc.0             // these instructions implement eax = eax + ecx
         ldloc.1
         ldc.i4.1
         sub
         stloc.1             // these instructions implement ecx = ecx - 1
L_0002:  br.s      L_0000
L_0003:  ...

Second, I want to explain why it is not done this way. There are two reasons. One is that it would break normal logic, which is a matter at the compiler level. For example, for the statement if (k

Let's look at the concrete implementation of the three loops in the example. For the basic operation of IL code, see the article "IL code underlying operation mechanism". IL_0027 through IL_003f initialize the array, which is harder to understand. We will set that aside for now; I will cover it later.

1. The for statement

We can see that IL_0000 through IL_0008 initialize the variables. The loop begins at IL_0009, which is an unconditional (absolute) jump to IL_0013. Let's look at what is there:

  IL_0013:  ldloc.3
  IL_0014:  ldloc.0
  IL_0015:  ble.s      IL_000b

This loads local variable 3 (k) and then local variable 0 (i), followed by the compare-and-branch instruction ble.s. It is easy to see that these three instructions compare k with i. If the comparison is true (k is less than or equal to i), control transfers to the loop body at IL_000b; otherwise execution falls through to the instruction after the loop. The process is simple and intuitive and needs no further explanation. From this we can also see that the for statement tests the condition first and only then executes the loop body.

Instruction   Evaluation stack (top first)
(before)      ...
ldloc.3       k, ...
ldloc.0       i, k, ...
ble.s         ...

The load instructions push the variables onto the evaluation stack one by one, and ble.s compares them. Note that ble.s also pops both operands off the stack. This is true not only of ble.s but of the other conditional branch instructions as well.

2. The foreach statement

The foreach statement is handled in roughly the same way as the for statement.
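As a rough illustration of that equivalence, the sketch below writes the foreach loop from the example as an explicit index-based for loop. The names ForeachEquivalence, SumForeach, SumLowered, tempArray, and index are hypothetical; tempArray and index stand in for the compiler-generated locals CS$00000007$00000001 and CS$00000008$00000002 seen in the listing. It mirrors this one listing only and is not a statement about every foreach the compiler may emit.

```csharp
using System;

public static class ForeachEquivalence
{
    // The loop as written in the article's example.
    public static int SumForeach(int[] array)
    {
        int s = 0;
        foreach (int a in array)
        {
            s += a;
        }
        return s;
    }

    // Hand-written counterpart that follows the structure of the IL listing.
    public static int SumLowered(int[] array)
    {
        int s = 0;
        int[] tempArray = array;          // hypothetical stand-in for CS$00000007$00000001
        for (int index = 0;               // hypothetical stand-in for CS$00000008$00000002
             index < tempArray.Length;    // ldlen, conv.i4, blt.s
             index++)
        {
            int a = tempArray[index];     // ldelem.i4
            s += a;
        }
        return s;
    }

    public static void Main()
    {
        int[] array = { 2, 3, 4, 5, 6, 7, 8, 9 };
        // Both methods add up the same elements in the same order.
        Console.WriteLine(SumForeach(array) == SumLowered(array)); // prints True
    }
}
```

Copying the array reference into a separate local means the loop keeps iterating over the original array even if the variable array is reassigned inside the body.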
What interests us is how foreach handles its boundary condition. The foreach loop is entered at IL_0041. The same kind of unconditional jump takes us to IL_0055; let's see what is there:

  IL_0055:  ldloc.s    CS$00000008$00000002
  IL_0057:  ldloc.s    CS$00000007$00000001
  IL_0059:  ldlen
  IL_005a:  conv.i4
  IL_005b:  blt.s      IL_0043

As described earlier, CS$00000008$00000002 holds the array index and CS$00000007$00000001 is a reference to the array 'array'. The sequence from IL_0055 to IL_005b works like this: first the current index is pushed onto the evaluation stack, then the array reference. The ldlen instruction takes the array reference and pushes the array length as a native unsigned integer, and conv.i4 converts it to a 32-bit integer so that the index can be compared with the length. If the index is less than the length, control transfers back to the loop body; otherwise the loop ends. From this we can also see that IL supports array operations directly and provides dedicated instructions for them.

3. The while and do-while statements

As the example shows, the while loop is handled the same way as the for loop. There is no do-while example here, but you can expect it to be handled much like the for statement. Note, however, that a do-while loop does not begin with an unconditional jump to the condition test the way the for and foreach loops do. As a result, the body of a do-while loop always executes at least once.

In this article I introduced several conditional branch instructions and showed how the C# compiler handles the loops of the C# language. Strictly speaking, this article is not entirely about the underlying mechanics of IL, but this foundation is necessary for understanding IL in depth.
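Since the example contains no do-while loop, the following sketch spells out the difference in control flow using C# goto statements. It is a hand-written illustration under the assumptions described in section 3, not compiler output, and the names LoopShapes, WhileShape, and DoWhileShape are hypothetical. WhileShape starts with a jump to the condition test, as the br.s instructions in the listing do, while DoWhileShape falls straight into the body.

```csharp
using System;

public static class LoopShapes
{
    // Shape of the while loop in the listing: jump to the test first.
    public static int WhileShape(int k, int j)
    {
        int s = 0;
        goto Test;                 // plays the role of br.s IL_0023
    Body:
        s += k;
        k++;
    Test:
        if (k < j) goto Body;      // plays the role of blt.s IL_001b
        return s;
    }

    // Assumed shape of a do-while loop: no initial jump, so the body
    // runs once before the condition is ever tested.
    public static int DoWhileShape(int k, int j)
    {
        int s = 0;
    Body:
        s += k;
        k++;
        if (k < j) goto Body;
        return s;
    }

    public static void Main()
    {
        Console.WriteLine(WhileShape(0, 9));    // 36, the sum 0 + 1 + ... + 8
        Console.WriteLine(DoWhileShape(0, 9));  // 36
        Console.WriteLine(WhileShape(9, 9));    // 0, the body is skipped
        Console.WriteLine(DoWhileShape(9, 9));  // 9, the body ran once
    }
}
```

The two shapes agree whenever the condition holds on entry and differ only when it is false from the start, which is exactly the "at least once" behavior of do-while.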