Effective C #: 2. Replace the multi-dimensional array with a nested array

zhaozj2021-02-17 193

Effective C #: 2. Replace with a nested array

Multidimensional Arrays

Chen Ming Microsoft C # /. Net Asia MVP

Difficulty: 8/10 Terms 1

Some algorithms need to use more complex array structures than a one-dimensional array, and there are two different options in C #, and there are two different options: Array Of Array or multi-Dimensional Array. As the name suggests, the nested array refers to an array of arrays as a single data member. The nested array does not require each sub-array having the same number of elements, so its structure is a zigzag shape, which is also referred to as a jagged array. Correspondingly, the multidimensional array uses a plurality of index values to access elements in a continuous memory, since the upper limit of each dimension must be specified in the defined phase, and the multi-dimensional array is typically matrix on the layout.

As can be seen from the figure, the elements in the nested array cannot be directly accessed, and must first find the corresponding sub-array according to each dimension of the element, and then further indexing in the subset. For example, the access to a [2] [2] must first find the sub-array a [2], and then accessed the elements of the subscript 2 in A [2]. In the multidimensional array, since the memory space for the storage element is continuous, and the number of each dimensional element of the array is fixed, the element can be easily calculated from the index value of each dimension in an array memory in the array memory. In this way we can directly access an element in a multi-dimensional array. For example, in an array of B [6, 7], the offset of the B [2, 2] element is 2 * 7 2 = 16, then B [2, 2] is actually in the data memory in array b. The 16th elements in. At this point, intuition tells us that using multi-dimensional arrays should be able to achieve better performance than nested arrays, because simple mathematical calculations are more efficient than indirect memory access. It seems that "using a multi-dimensional array rather than the nested array" is written to our experience manual.

However, only the conclusion that the conclusion is finally appeared is a bit weak, and we need more judgments before speaking for this conclusion. Perhaps the intermediary code (MSIL) generated by the C # compiler will help - the "assembly" code of .NET looks not so mysterious.

In the following test code, the instance of the nested array and the two-dimensional array is defined, and one of the elements of it is accumulated:

// jaggged array operation

int [] [] ja = new int [3] [];

...

JA [1] [2] ;

// Multi-Dimensional Array Operation

int [,] ma = new int [3, 3];

...

MA [1, 2] ;

The MSIL instructions generated by the accumatant statement compile are as follows:

// ja [1] [2] ;

IL_0000: ldloc.0 // ja

IL_0001: ldc.i4.1

IL_0002: LDELEM.REF

IL_0003: DUP

IL_0004: STLOC.2

IL_0005: ldc.i4.2

IL_0006: LDLOC.2

IL_0007: ldc.i4.2

IL_0008: LDELEM.I4

IL_0009: ldc.i4.1

IL_000A: Add

IL_000b: STELEM.I4

// mA [1, 2] ;

IL_0000: LDLOC.1 // Ma

IL_0001: DUP

IL_0002: ldc.i4.1

IL_0003: ldc.i4.2

IL_0004: ldc.i4.1

IL_0005: ldc.i4.2

IL_0006: Call Instance INT32

INT32 [0 ..., 0 ...] :: get (int32, int32)

IL_000b: ldc.i4.1

IL_000c: Add

IL_000D: Call Instance Void

INT32 [0 ..., 0 ...] :: set (int32,

INT32,

Maybe you

MSIL

Not so familiar, but even if so, some significant differences in the implementation: the element access code of the nested array contains only some simple instructions, and the elements of the two-dimensional array actually contain two Function call! It is more careful, there will be more strange discovery:

Get / set

should be

INT32 [0 ..., 0 ...]

The member function of the class, but

INT32 [0 ..., 0 ...]

What type is it? All arrays are from

SYSTEM.ARRAY

Inherited, but

SYSTEM.ARRAY

Not included

Get / set

Definition of functions, and

.NET Framework

In the document, you can't find it about

Get / set

Any information on these two functions!

I have seen the huge question mark on your top, and now it should be introduced.

.NET

For the support of various array types.

The .NET Run System (CLR) divides the array to two categories: one is a one-dimensional array of start-down, which is often referred to as a Vector or SzArray; another is a one-dimensional number of groups starting the start-down subscript and All multi-dimensional arrays are known to be Mdarray. Since C # does not directly support the starting subscript non-zero array, and this array is rare in practical applications, this special array type will not be involved in this chapter.

The Vector is the most commonly used array type, so the CLR provides direct MSIL instruction support to the Vector to achieve better performance. The following table lists the MSIL instructions related to the CLR and array operations:

NEWARR

Create a new vector-type array

LDELEM

Read an element in the vector

LDELMA

Read the address of an element in the vector

ldlen

Read the number of elements in the vector

STELEM

Assign a value for one of the arrays

Among them, LDELEM and Stelem exist in a modified form of different types of arrays, such as the array of INT32 types, while LDELEM.REF is used to operate all arrays containing reference type objects.

Nested arrays are only in the nested form of a vector, so these MSIL instructions can also be used for various operations of nested arrays. After understanding the functions of these instructions, I believe that the MSIL code snippet of the above to access the nested array elements is not difficult.

Various operations of MDARRAY are somewhat complicated relative to Vector. To this end, the CLR implementation of the MDARRAY is similar to generic template similar to C : the system first summarizes the member function model of several operations such as GET, SET, and Address, where GET is subscribed as a parameter Read the specific element in the array; set is the value of the element that sets the specific subscript in the array; address is used to acquire the address of the specific element in the array. For example, for a three-dimensional MDARRAY containing Double data, the above three functions are as follows:

Double Get (INT D1, INT D2, INT D3);

Void Set (Int D1, INT D2, INT D3, DOUBLE V);

Double * Address (int D1, Int D2, INT D3);

However, due to the specific implementation of these functions involves the element type of the specific array (mainly for the offset calculation in the array), the CLR cannot directly provide the implementation of these functions. The CLR will really define a specific array class and generate the actual code of several functions mentioned in the previously included MDARRAY. INT32 [0 ..., 0 ...] as seen in the previous is the example of the Mdarray generated by the CLR, while GET, SET, and Address are three member functions of this class. Since these functions are defined by the running system, they can not find their implementation in the class set file, even if they are not visible in the .NET Framework documentation. After understanding the actual implementation of the .NET in the actual implementation of our performance: It is well known that the function call itself is time-consuming, because it contains the stack out of the parameter, and the transfer of the program control flow . Thus, language such as C requires a function of inline to increase the performance of the function call (regarding the support in .NET to reference to the terms X). The MSIL directive is used directly increases the opportunity to further optimize during the JIT Compile process.

It seems that this time we have shatled that it is deceived by his intuitive, and it should be a more superior performance in .NET. But some readers who have seen the terms X may still have a question: How do you know that the function like get, set does not compile in the JIT compilation process?

It is difficult to prove that a function is indeed embedded in the inline form, especially in the JIT environment. But there are some simple ways to prove that the JIT compiler will not attempt to perform inline compilation in line with some functions - such as the type reflection:

int [,] mda = new int [3, 3]; // Define MDARRAY

Type t = mda.gettype (); // get the corresponding TYPE

Memberinfo [] minfo = t.getMember ("get");

// We know that MDARRAY has a unique GET function definition

MethodInfo method = minfo [0] as methodinfo;

MethodImplattributes miattr =

GetMethodImplementationFlags ();

IF (Miattr & Methodimplattributes.noinLining) {

Console.Writeline ("OOPS, Can't Be inlined!");

}

The above program segment can get some specific flags of the GET function, including telling the JIT compiler to avoid the NOINLINININININING in the inline form. If you compile and run the above code, the result of run will tell you clearly that the JIT compiler does not attempt to handle these functions in the inline form.

Blocking the JIT compiler inline image of GET / SET is not a wise choice, but the implementation has their own difficulties - especially in the system architecture and performance. Microsoft has already expressed the support of generic programming and template in the later version of the .NET Framework, and the effort to CLR in Mdarray should be considered as an architectural attempt to support generic technical support. Only in a complete and elegant system architecture is likely to further improve the function and improve performance.

At this point, we should have sufficient theoretical basis for prove that the nested array has a more superior performance than a multi-dimensional array. But what else is more convincing than an actual example?

The following function is used to calculate the "distance" (similar degree of similarity) of two strings, which contains very dense nested array operations. Since the nested array is replaced with a two-dimensional number of two-dimensional calculations, this is no longer listed using the code of the two-dimensional array: // compute the distance of two char [] strings

Public static int stringdiff (char [] str1, char [] str2) {

INT COST_INSERT = 1, COST_DELETE = 1;

INT COST_REPLACE = 3, COST_MATCH = 0;

INT I, J;

int [] matrix = new int rt [str1.length 1] [];

For (i = 0; i

Matrix [i] = new int [str2.length 1];

}

Matrix [0] [0] = 0;

For (i = 1; i

Matrix [0] [i] = matrix [0] [i - 1] cost_insert;

For (j = 1; j

Matrix [J] = Matrix [J - 1] [0] COST_DELETE;

For (i = 1; i

{

For (j = 1; j

INT V1, V2, V3;

V1 = matrix [i] [j - 1] cost_insert;

V2 = matrix [i - 1] [j] cost_delete;

IF (str1 [i-1] == STR2 [J-1]) {

V3 = matrix [i - 1] [j - 1] cost_match;

} else {

V3 = matrix [i - 1] [j - 1] cost_replace;

}

INT VMIN = (V1

Matrix [i] [j] = (vmin

}

Return matrix [str1.length] [str2.length];

}

The test results show that over the current CLR implementation, the version of the nested array is more than 40% faster than using the two-dimensional array. Although the nested array requires more creation and initialization, it is enough to believe in the performance of the design, which should be tried to replace the multi-dimensional array.

Finally, there is also a problem about the need to pay attention to the nested array: Although the public language specification (CLS, see Terms X) clearly specifies the element type and CLS compatibility in the array, the nested array and all the zero starts The target multi-dimensional array is compatible with the CLS compatible data type, but because the C # compiler implemented issues, any classes containing nested arrays as a common member will be considered compatible with CLS by the C # compiler. The unreasonable refusal of the compiler is very unreasonable, but how do you say it, this is reality. (End) * This article is originally originalized, please do not reprint without the author's license.

转载请注明原文地址:https://www.9cbs.com/read-31358.html

9cbs

New Post(0)