Hidden features in .NET
Background knowledge

The Array class is the base class of all array types. Like String, covered in the previous article, "Hidden characteristics of the String class", an array is a variable-length object: the size of an instance is not fixed by its type. First, some related concepts:

Element: a value contained in the array.
Length: the total number of elements the array can contain.
Rank: the number of dimensions of the array.
Lower bound: the starting index of a given dimension. Each dimension of a multidimensional array can have a different lower bound.

There are two different array implementations: SZ arrays and ordinary arrays. An SZ array is a one-dimensional array with a lower bound of 0; an ordinary array is an array that is multidimensional or has a non-zero lower bound. Ordinary arrays are sometimes called MD arrays. Because SZ arrays are more common, Microsoft has heavily optimized their performance. The following table details the differences between SZ arrays and MD arrays.
Definition
  SZ array: a one-dimensional array with a lower bound of 0.
  MD array: a multidimensional array, or an array whose lower bound is not 0.

C# syntax
  SZ array: Object[], Object[][]
  MD array: Object[,]

CLS compliance
  SZ array: compliant (except jagged arrays).
  MD array: not compliant.

IL optimization
  SZ array: dedicated IL instructions operate on these arrays, such as newarr, ldlen, ldelem and stelem.
  MD array: in version 1.0 there are no dedicated IL instructions; every array operation is implemented as a method call.

Optimization for primitive types
  SZ array: primitive types have dedicated versions of the instructions, and operations on value type arrays do not require boxing, so performance is higher.
  MD array: in version 1.0, reference type and value type arrays use the same methods. Value types are boxed and unboxed repeatedly when the methods are called, which has a very large performance impact.

Base size (excluding the 8 bytes of method table pointer and object header)
  SZ array: value type array, 4 bytes; reference type array, 8 bytes.
  MD array: value type array, 4 + 8 * rank bytes; reference type array, 8 + 8 * rank bytes.

JIT optimization
  SZ array: the JIT compiler eliminates range checks.
  MD array: the JIT compiler does not optimize access. The CLR executes additional code to check each dimension.
Some entries in the table are discussed in more detail later in this article. The table makes it clear that SZ arrays perform far better than MD arrays. A jagged array can be regarded as an SZ array of SZ arrays, so it too performs better than an MD array. Remember, however, that jagged arrays are not CLS-compliant, so they should not be passed between code written in different languages.

Array IL optimization

    using System;

    namespace ABC
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                int[] a = new int[5];
                int[,] c = new int[5, 5];
                a[0] = 1;
                c[0, 0] = 1;
            }
        }
    }

The IL generated for the code above is as follows:

    .method private hidebysig static void Main(string[] args) cil managed
    {
      .entrypoint
      .custom instance void [mscorlib]System.STAThreadAttribute::.ctor() = ( 01 00 00 00 )
      // Code size 29 (0x1d)
      .maxstack 4
      .locals ([0] int32[] a, [1] int32[0...,0...] c)
      IL_0000: ldc.i4.5
      IL_0001: newarr [mscorlib]System.Int32
      IL_0006: stloc.0
      IL_0007: ldc.i4.5
      IL_0008: ldc.i4.5
      IL_0009: newobj instance void int32[0...,0...]::.ctor(int32, int32)
      IL_000e: stloc.1
      IL_000f: ldloc.0
      IL_0010: ldc.i4.0
      IL_0011: ldc.i4.1
      IL_0012: stelem.i4
      IL_0013: ldloc.1
      IL_0014: ldc.i4.0
      IL_0015: ldc.i4.0
      IL_0016: ldc.i4.1
      IL_0017: call instance void int32[0...,0...]::Set(int32, int32, int32)
      IL_001c: ret
    } // end of method Class1::Main

Compare the IL for the two assignments: the element of the SZ array a is stored with the stelem.i4 instruction, while the assignment to the multidimensional array c calls the Set method.

Array internal fields

SZ arrays and MD arrays both contain the following two internal fields.
Variable: Array Length
Type: int
Description: The actual number of elements in the array.

Variable: Element Type
Type: Type
Description: Judging from the source code, this field is used only when the array contains "pointers". Here "pointer" means a reference to an object, not a pointer in unmanaged code.
In addition to the two fields above, an MD array also contains the following two fields.

Variable: Bounds[rank]
Type: int[]
Description: The number of elements in each dimension.

Variable: LowerBounds[rank]
Type: int[]
Description: The lower bound of each dimension. A legal index must satisfy LowerBounds[i] <= index[i] < LowerBounds[i] + Bounds[i].

Figure 1

Accessing an ordinary array requires checking several internal members, which has some impact on performance. In general there are two ways to improve on the performance of ordinary arrays: use jagged arrays instead, or access the array through unsafe code.

Array types and categories

Two arrays have the same type if they have the same rank and the same element type. Unlike C/C++, the upper and lower bounds of each dimension are not considered, as the code below illustrates. Some methods (such as Array.Copy) treat a multidimensional array internally as a one-dimensional array whose length is the product of the lengths of all dimensions.

    Array a = Array.CreateInstance(typeof(int), new int[2] { 2, 2 }, new int[2] { -1, -1 });
    Array b = Array.CreateInstance(typeof(int), new int[2] { 3, 3 }, new int[2] { -10, -2 });
    if (a.GetType().Equals(b.GetType()))
        Console.WriteLine("Arrays a and b have the same type");

Jagged arrays with different nesting depths have different types. For example:

    int[][] a = new int[2][];
    int[][][] b = new int[2][][];

Here a and b have different types. The reason is fairly obvious: the elements of a jagged array are themselves arrays, and the element types of a and b differ, so a and b are different types. Interestingly, for the base class Array itself, the Type.IsArray property returns false and the Type.GetElementType() method returns null.

Besides its base size, an array also contains its data, as shown in Figure 1. A value type array contains unboxed structures laid out contiguously, and a reference type array contains pointers to the referenced objects, also laid out contiguously.
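The type-identity rules from this section can be checked with a short sketch. The class and method names (ArrayTypeIdentityDemo, SameType) are hypothetical and exist only for this illustration; the calls themselves are the ones used in the article's own snippets.

```csharp
using System;

static class ArrayTypeIdentityDemo
{
    // Two arrays share a runtime type when rank and element type match.
    public static bool SameType(Array x, Array y)
    {
        return x.GetType() == y.GetType();
    }

    static void Main()
    {
        // Two-dimensional int arrays with different lengths and lower bounds.
        Array a = Array.CreateInstance(typeof(int), new[] { 2, 2 }, new[] { -1, -1 });
        Array b = Array.CreateInstance(typeof(int), new[] { 3, 3 }, new[] { -10, -2 });
        Console.WriteLine(SameType(a, b));                          // True

        // Jagged arrays of different depth have different element types.
        Console.WriteLine(SameType(new int[2][], new int[2][][])); // False

        // The Array base class itself is not an array type.
        Console.WriteLine(typeof(Array).IsArray);                   // False
        Console.WriteLine(typeof(Array).GetElementType() == null);  // True
    }
}
```

The bounds only affect which indexes are legal at run time; they play no part in the comparison of the two Type objects.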
In addition, a reference type array has an element type field (ElementType) before the block of pointers. The reader may think that the element type could be obtained from the array's method table, which would make this field redundant. That is not so: the field gives direct access to the type information, and it is important for other array features such as array covariance (discussed later). If the data is a value type, each element has the same size as the corresponding value type; each element of a reference type array occupies IntPtr.Size bytes. IntPtr.Size is 4 bytes on Win32 systems and 8 bytes on 64-bit systems. According to Microsoft's documentation, IntPtr.Size is the same size as a void* pointer, but in non-Win32 Rotor distributions (such as Mac and UNIX), IntPtr.Size is always 8 bytes regardless of the CPU.

Element type: size in bytes
bool: 1
byte: 1
short: 2
int: 4
long: 8
float: 4
double: 8
decimal: 16
string: IntPtr.Size
object: IntPtr.Size
interface: IntPtr.Size

You cannot access the internal fields of an array through reflection. Does that mean unsafe code is required to read them? No, because the internal fields of Array are exposed through public methods and properties. For example, the GetLength() method returns the number of elements in a specified dimension of the array. See MSDN for more detail.

Two classifications of arrays were mentioned above: SZ arrays versus MD arrays, and value type arrays versus reference type arrays. How do we tell them apart in code? The following code determines whether an array is an SZ array:

    if (array.Rank == 1 && array.GetLowerBound(0) == 0) { }

The following code determines whether an array is a value type array:

    Type elementType = array.GetType().GetElementType();
    if (elementType.IsSubclassOf(typeof(ValueType))
        && elementType != typeof(Enum)
        && elementType != typeof(ValueType)) { }

Interestingly, Enum[] and ValueType[] are not value type arrays; the elements they contain are references to boxed value types.

The dynamic ArrayList class

ArrayList is a very useful class for working with dynamic arrays; it can also be used to wrap other collection classes. An ArrayList creates an internal array object and modifies that array directly. If no capacity is specified explicitly, the default capacity (16) is used, so the array created by the ArrayList has a length of 16. The following table lists the four internal members of the ArrayList class.

Variable: _items
Type: object[]
Description: The internal array.

Variable: _size
Type: int
Description: The number of elements the ArrayList instance actually contains.

Variable: _version
Type: int
Description: Incremented every time the ArrayList is modified.

Variable: _defaultCapacity
Type: int
Description: Constant field holding the default capacity.

An ArrayList instance occupies 20 bytes of memory in total (8 bytes of object overhead plus 12 bytes of instance data), not counting the space occupied by the internal array (_items). When new elements are added (for example by calling AddRange) and the current capacity would be exceeded, the ArrayList expands automatically. The capacity is either doubled or increased to the new count, whichever is larger; the internal array (_items) is reallocated to hold the new elements, and the existing elements are copied into the new array. For best performance, if the final length is known in advance, allocate enough capacity up front to avoid unnecessary copying. Once all elements have been added and the array (_items) will no longer grow, call ToArray to convert it to a type-safe array, which is better for both memory use and performance.
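The growth rule described above can be observed through the public Capacity property. The following is a minimal sketch, not a description of the internal implementation: the helper name CapacityAfterAdds is hypothetical, and the explicit starting capacity of 4 is an assumption chosen so the result does not depend on the default capacity of a particular framework version.

```csharp
using System;
using System.Collections;

static class ArrayListGrowthDemo
{
    // Adds `count` items one at a time and returns the resulting capacity.
    public static int CapacityAfterAdds(int initialCapacity, int count)
    {
        ArrayList list = new ArrayList(initialCapacity);
        for (int i = 0; i < count; i++)
            list.Add(i);
        return list.Capacity;
    }

    static void Main()
    {
        Console.WriteLine(CapacityAfterAdds(4, 4)); // 4: everything still fits
        Console.WriteLine(CapacityAfterAdds(4, 5)); // 8: the fifth Add doubles the capacity

        // A bulk add larger than double the capacity grows to at least the new count.
        ArrayList bulk = new ArrayList(4);
        bulk.AddRange(new int[20]);
        Console.WriteLine(bulk.Capacity >= 20);     // True

        // TrimToSize shrinks the capacity to the element count.
        bulk.TrimToSize();
        Console.WriteLine(bulk.Capacity);           // 20
    }
}
```

Each growth step allocates a new internal array and copies the existing elements, which is why passing a sufficient capacity to the constructor avoids repeated copying.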
We can call the TrimToSize method to cut off the unused part of an ArrayList; this actually copies the elements once. To release the memory occupied by the array, call Clear and then TrimToSize. Note that calling TrimToSize on an empty ArrayList sets the capacity to the default capacity, not to zero. Note also that when an ArrayList is created with a capacity of 0, the CLR creates it with the default value of 16.

The ArrayList class is not a complete replacement for the Array class. In my view, plain arrays perform much better. I will analyze the differences in detail in another article, especially with regard to performance.

Memory usage
  Array: the data in a value type array is not boxed, and the size of each element equals the size of the corresponding value type. Each element of a reference type array occupies IntPtr.Size bytes.
  ArrayList: the internal array is a reference type array. Storing value types adds 12 bytes of overhead per element (4 bytes for the object reference and 8 bytes for the object header introduced when the element is boxed).

Performance
  Array: dedicated IL instructions; range checks are eliminated.

Length
  Array: fixed.
  ArrayList: variable.

Access
  ArrayList: an element at a given index can be accessed only if all elements before that index have already been added.

For example, the following code throws an exception:

    ArrayList al = new ArrayList();
    al[0] = 1;

Converting between Array and ArrayList is easy. The ArrayList.Adapter method converts an Array to an ArrayList, and the ToArray method converts an ArrayList to an Array. You can use the following approach to access the internal array maintained by an ArrayList:

    (object[]) arrayList.GetType().GetField("_items", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(arrayList)

This is an alternative to ToArray.
This approach uses less memory and time than ToArray. Note, however, that the length of the array obtained this way is the capacity of the ArrayList, not the number of elements it actually contains.

I personally think Array should provide a method for changing the size of an array, as ArrayList does. Below is such a method, written to imitate the behavior of ArrayList; you can use it to resize an array.

    public static Array Resize(Array array, int newSize)
    {
        Type type = array.GetType();
        Array newArray = Array.CreateInstance(type.GetElementType(), newSize);
        // Copy only as many elements as fit in both arrays.
        Array.Copy(array, 0, newArray, 0, Math.Min(array.Length, newSize));
        return newArray;
    }

Moving array data

The Array class provides the Copy method for copying data from one array to another; in fact, Copy can also move data within a single array. In that case the copy behaves like the standard C/C++ function memmove, not memcpy. The following InsertHelper method shifts the elements starting at index to the right by count positions; elements shifted past the end of the array are discarded. Similarly, the RemoveHelper method moves all elements from index + count onward to the left, and the elements after array.Length - count are cleared.

    public static Array InsertHelper(Array array, int index, int count)
    {
        Array.Copy(array, index, array, index + count, array.Length - (index + count));
        Array.Clear(array, index, count);
        return array;
    }

    public static Array RemoveHelper(Array array, int index, int count)
    {
        Array.Copy(array, index + count, array, index, array.Length - (index + count));
        Array.Clear(array, array.Length - count, count);
        return array;
    }

For value type arrays that contain no object references, the Buffer class provides several useful methods (BlockCopy, ByteLength, GetByte, SetByte). These methods ignore the element type, because the Buffer class treats an array simply as a sequence of bytes, so arrays of different types can be copied to each other.
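A small sketch of the byte-wise behavior of Buffer follows. The helper name ReinterpretAsInts is hypothetical; Buffer.BlockCopy and Buffer.ByteLength are the methods named in the text. The expected value relies on the IEEE 754 encoding of 1.0f, which is 0x3F800000.

```csharp
using System;

static class BufferCopyDemo
{
    // Copies the raw bytes of a float[] into a new int[] of the same length.
    // No numeric conversion happens; the bit patterns are transferred as they are.
    public static int[] ReinterpretAsInts(float[] source)
    {
        int[] target = new int[source.Length];
        Buffer.BlockCopy(source, 0, target, 0, Buffer.ByteLength(source));
        return target;
    }

    static void Main()
    {
        float[] floats = { 1.0f, 0.0f };
        Console.WriteLine(Buffer.ByteLength(floats)); // 8: two 4-byte elements

        int[] bits = ReinterpretAsInts(floats);
        Console.WriteLine(bits[0].ToString("X8"));    // 3F800000
        Console.WriteLine(bits[1]);                   // 0
    }
}
```

Note that the offsets and count passed to BlockCopy are measured in bytes, not in elements.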
For example, a floating-point array can be copied into an integer array; the bytes are transferred unchanged. When data is copied from a multidimensional array, the array is treated as a one-dimensional array (whose length equals the product of the lengths of all dimensions). For example, given a two-dimensional array with 3 rows of 4 elements each (Array[3, 4]), copying 6 elements from the start of the array yields the 4 elements of the first row followed by the first 2 elements of the second row.

Another class worth our attention is BitArray, a Pascal-style bit set. The BitArray class manages a compact array of bit values represented as Booleans, where true means the bit is on (1) and false means the bit is off (0). A similar type is the BitVector32 structure, which stores Boolean values and small integers in 32 bits of memory. BitVector32 is more efficient than BitArray for Boolean values and small integers that are used internally. A BitArray can grow without limit as needed, but it carries the memory and performance overhead of a class instance. In contrast, BitVector32 uses only 32 bits.

ArrayList views

ArrayList can create views over Array and IList objects.

Adapter

The ArrayList.Adapter method creates a view over any class that implements the IList interface, so that it can be manipulated as an ArrayList. In other words, an IList can use the methods provided by the ArrayList class (BinarySearch, Sort, Reverse, GetRange, as well as conversion). For Array this seems less useful, because Array already provides these methods (except for extracting a subset).

Operation: syntax
Convert an IList to an Array: ArrayList.Adapter(list).ToArray()
Reverse an IList: ArrayList.Adapter(list).Reverse()
Get a subset: ArrayList.Adapter(list).GetRange(start, count)
Use the binary search algorithm: ArrayList.Adapter(list).BinarySearch(value)
Sort: ArrayList.Adapter(list).Sort()

Array subsets

The code below extracts a subset of an Array.

    public static Array GetRange(Array array, int start, int count)
    {
        Type type = array.GetType();
        Array newArray = Array.CreateInstance(type.GetElementType(), count);
        Array.Copy(array, start, newArray, 0, count);
        return newArray;
    }

This method actually creates a copy of the subset, which can occupy a lot of memory. ArrayList provides the GetRange method for obtaining subsets; if the subset is large, GetRange is recommended because no elements are copied, which saves a lot of memory. GetRange returns a subclass of ArrayList that acts as a view over the list. You can perform all kinds of operations through it (add, modify, or delete elements). Note that the original list should then be changed only through the view returned by GetRange. As the following code shows, modifying al invalidates the view alView, and the next use of alView throws an InvalidOperationException.

    int[] a = new int[5];
    ArrayList al = new ArrayList(a);
    ArrayList alView = al.GetRange(0, 2);
    al[0] = 1;
    int value = (int) alView[0];

Wrapper support

ArrayList provides three wrapper methods: FixedSize, ReadOnly, and Synchronized. Each has two overloads, accepting either an IList or an ArrayList. FixedSize returns a fixed-size wrapper: elements can be modified but not added or removed. ReadOnly returns a read-only wrapper. Synchronized returns a synchronized (thread-safe) wrapper. These methods can be combined; for example, ArrayList.Synchronized(ArrayList.ReadOnly(list)) returns a read-only, synchronized list. ReadOnly is useful for preventing modification of an array.
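The view and wrapper behavior described above can be exercised with a short sketch. The class and method names are hypothetical. As an assumption made for this illustration, the underlying list is changed with Add rather than through the indexer, because adding an element is a modification that is certain to invalidate the view.

```csharp
using System;
using System.Collections;

static class ArrayListViewDemo
{
    // True if using a GetRange view throws after the underlying list was changed directly.
    public static bool ViewInvalidatedByBaseChange()
    {
        ArrayList list = new ArrayList(new int[5]);
        ArrayList view = list.GetRange(0, 2);
        list.Add(42); // change made on the list itself, not through the view
        try
        {
            object first = view[0];
            return first == null; // not reached when the view is invalidated
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // True if the read-only wrapper rejects a write.
    public static bool ReadOnlyRejectsWrites()
    {
        ArrayList readOnly = ArrayList.ReadOnly(new ArrayList(new int[3]));
        try { readOnly[0] = 1; return false; }
        catch (NotSupportedException) { return true; }
    }

    // True if the fixed-size wrapper accepts an element update but rejects Add.
    public static bool FixedSizeAllowsSetButNotAdd()
    {
        ArrayList fixedList = ArrayList.FixedSize(new ArrayList(new int[3]));
        fixedList[0] = 7;
        try { fixedList.Add(8); return false; }
        catch (NotSupportedException) { return (int)fixedList[0] == 7; }
    }

    static void Main()
    {
        Console.WriteLine(ViewInvalidatedByBaseChange()); // True
        Console.WriteLine(ReadOnlyRejectsWrites());       // True
        Console.WriteLine(FixedSizeAllowsSetButNotAdd()); // True
    }
}
```

The wrappers do not copy the data; they forward to the wrapped list and refuse the operations they forbid.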
Two situations deserve attention. First, arrays are always passed by reference. Second, during marshaling, when an array has more than 10 elements the CLR does not copy the array but "pins" the original array (preventing the garbage collector from relocating it). In both cases the code that receives the array can modify it, so the program may produce unexpected results.

Array covariance

In C++, an array of pointers of one type can be converted to an array of another pointer type. In .NET, the CLR allows the element type of a reference type array to be converted, implicitly or explicitly, to another type; this is called array covariance. The CLR does not allow value type arrays to be converted to other array types. You can convert value type arrays by other means, for example by creating a target array and using the Array.Copy method to convert the elements of the source array into it.

    int[] a = new int[5];
    a[0] = 1;
    double[] b = new double[5];
    Array.Copy(a, 0, b, 0, 1);

Whether the conversion is explicit or implicit, the source array type is converted to the target type at compile time, provided that the two arrays have the same rank. The conversion merely reinterprets the array; its memory footprint does not change. If the conversion is implicit, the element type is converted to one of the interfaces it supports or to one of its base types; no cast is needed and no runtime check is performed. If the conversion is explicit (from one interface to another, from a base class to a subclass, or from a type to an interface it does not directly support), a cast is required and a runtime check is performed. As mentioned earlier, a reference type array has an internal element type field (ElementType).
This field remains unchanged by the conversion. The runtime check mainly verifies compatibility between the new type and the element type (ElementType). The following example should make array conversion clearer.

    public class Animal { }

    object[] data = new Animal[2];            // Animal[] is implicitly converted to object[]
    Animal[] animals1 = data;                 // Error: converting object[] to Animal[] requires an explicit cast
    Animal[] animals2 = (Animal[]) data;      // object[] is explicitly converted to Animal[]
    string[] strings1 = (string[]) animals2;  // Compile error: string[] and Animal[] cannot be converted to each other
    string[] strings2 = (string[]) data;      // Compiles, but throws at run time because Animal does not derive from string

    object[] data2 = new object[1];
    data2[0] = new Animal();
    Animal[] animals3 = (Animal[]) data2;     // Compiles, but throws at run time. The runtime check verifies the
                                              // compatibility of the element type (ElementType) with the target type Animal.

    Animal[] animals4 = new Animal[1];
    object[] data3 = (object[]) animals4;
    Animal[] animals5 = (Animal[]) data3;     // Compiles and runs without an exception. The runtime check compares the
                                              // element type of data3 with the target type Animal.

Being able to reinterpret an array of one type as another type greatly improves the efficiency of a program in both memory use and time. If converting from one array type to another required rebuilding the array, performance would clearly suffer.

    public void Test()
    {
        string[] data = new string[] { "a", "b", "c", "d", "e" };
        SetRange(data, 1, 3, "X");
    }

    public void SetRange(object[] array, int start, int count, object value)
    {
        for (int i = 0; i < count; i++)
            array[start + i] = value;
    }

When a reference type array is copied, the types are checked first and then a shallow copy is performed; if the types are incompatible, an ArrayTypeMismatchException is thrown.
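The runtime checks described in this section can be observed directly. The sketch below uses hypothetical names (ArrayCovarianceDemo, StoreIsChecked, DowncastIsChecked, Widen) and a private Animal class that stands in for the one in the example above.

```csharp
using System;

static class ArrayCovarianceDemo
{
    class Animal { }

    // A store through a covariant reference is checked against the real element type.
    public static bool StoreIsChecked()
    {
        object[] data = new string[1];   // implicit conversion, no runtime check
        try { data[0] = new Animal(); return false; }
        catch (ArrayTypeMismatchException) { return true; }
    }

    // An explicit downcast succeeds only if the actual array type is compatible.
    public static bool DowncastIsChecked()
    {
        object[] reallyAnimals = new Animal[1];
        Animal[] ok = (Animal[])reallyAnimals;   // succeeds: the array is an Animal[]
        object[] plain = new object[1];
        try
        {
            Animal[] bad = (Animal[])plain;      // fails: the array is an object[]
            return bad.Length < 0;               // not reached
        }
        catch (InvalidCastException)
        {
            return ok.Length == 1;
        }
    }

    // Value type arrays are converted by copying; int to double is a widening conversion.
    public static double[] Widen(int[] source)
    {
        double[] target = new double[source.Length];
        Array.Copy(source, target, source.Length);
        return target;
    }

    static void Main()
    {
        Console.WriteLine(StoreIsChecked());            // True
        Console.WriteLine(DowncastIsChecked());         // True
        Console.WriteLine(Widen(new[] { 1, 2 })[1]);    // 2
    }
}
```

The first two methods reinterpret an existing array without copying it, while Widen allocates a new array, which is the cost the text warns about.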
    public Array Convert(Array array, Type type)
    {
        Array newArray = Array.CreateInstance(type, array.Length);
        Array.Copy(array, 0, newArray, 0, array.Length);
        return newArray;
    }

Accessing internal members through reflection

You can read, call, or modify the internal members of the ArrayList class, whether they are declared private, protected, or internal. The following code obtains ArrayList's internal member _items:

    object[] abc = new object[5];
    ArrayList al = new ArrayList(5);
    al.Add("abc");
    abc = (object[]) al.GetType().GetField("_items", BindingFlags.NonPublic | BindingFlags.Instance).GetValue(al);
    Console.WriteLine(abc[0]);

You can read the Rotor package published by Microsoft to learn about the internal members of a class, or use an IL decompiler such as Reflector or Anakrino.

Array performance

Indexing into an array generally requires a range check. According to Microsoft, the compiler applies some special optimizations to improve the performance of traversing an array or a string. Let us first compare three ways of traversing an array and see which is faster. Here a is a one-dimensional int array.

    1) int hash = 0; for (int i = 0; i ...

Objects larger than 85K are called large objects; they are allocated on the large object heap. Almost all large objects are arrays, and some are strings. Obviously, few classes have enough members to occupy more than 85K. Large objects cannot be compacted and are reclaimed only during a full garbage collection (one that includes generation 2). If a large object has a finalizer, at least two full garbage collections are needed before it is reclaimed. Full garbage collections typically occur about 1/100 as often as generation 0 collections. Clearly, reclaiming the memory of discarded large objects takes longer. From the standpoint of memory allocation, a program that allocates large objects frequently is very badly designed, perhaps as badly as possible.
When constructing an ArrayList or another collection class, specify a sufficient capacity to avoid the performance cost of expanding it. Jagged arrays perform better than multidimensional arrays, so prefer jagged arrays where possible. Prefer strongly typed arrays, because they avoid the performance cost of boxing, casts, and method calls.