Original link: Part 1: an introduction to data structures
Introduction: This is the first installment of a six-part series on using data structures on the .NET platform. The series examines several data structures: some are provided by the .NET Framework Base Class Library, and others we will build ourselves. If you are unfamiliar with the term, think of a data structure as an abstract structure or class that organizes data and provides operations on that data. The most common and familiar data structure is the array, which holds a contiguous set of elements that are accessed through an index.
Before diving into the content of this article, let's take a look at the main topics of the six parts. If you have any ideas, or think that something is missing, please contact me by e-mail (mitchell@4guysfromrolla.com) and share your thoughts. If time permits, I will be happy to work your suggestions into the appropriate part or, if necessary, add a seventh part to the series.
Part 1: Introduces the importance of data structures in algorithm design. What determines whether a data structure is a good or a poor choice is its performance, so we will look at how the performance of data structures is analyzed. This part then introduces two commonly used data structures in the .NET Framework: the Array and the ArrayList. We will examine the operations they support and the efficiency of those operations.
Part 2: Continues the analysis of the ArrayList in more detail and introduces the Queue and Stack classes. Like the ArrayList, the Queue and Stack store a contiguous collection of elements and belong to the .NET Framework Class Library. Unlike the ArrayList, however, a Queue and a Stack can only be read in a prescribed order (first in, first out for the Queue; last in, first out for the Stack), whereas an ArrayList can access an item at any position. We will examine the Queue and Stack through sample programs and see how such structures can be built by extending the ArrayList class. Afterwards we will look at the Hashtable, which, like an ArrayList, provides direct access to its data, but is indexed by a key (such as a string) rather than by an integer position.
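Part 2 covers these classes in depth; purely as a minimal preview of the FIFO and LIFO behavior described above (this snippet is illustrative and not taken from the original article):

    using System;
    using System.Collections;

    public class FifoLifoDemo
    {
       public static void Main()
       {
          // Queue: items come out in the order they went in (first in, first out)
          Queue q = new Queue();
          q.Enqueue("first");
          q.Enqueue("second");
          q.Enqueue("third");
          Console.WriteLine(q.Dequeue());   // prints "first"

          // Stack: items come out in reverse order (last in, first out)
          Stack s = new Stack();
          s.Push("first");
          s.Push("second");
          s.Push("third");
          Console.WriteLine(s.Pop());       // prints "third"
       }
    }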
The ArrayList is an ideal data structure for directly reading and storing data, but it is a poor candidate when the data needs to be searched. In Part 3 we will examine the binary tree, a structure that makes searching much more efficient than an ArrayList. The .NET Framework does not include a built-in binary tree, so we will need to create it ourselves.
The efficiency of a binary tree, however, depends on the order in which data is inserted into the tree. If the data is inserted in sorted or nearly sorted order, the tree's efficiency degrades to the point where it is actually no better than an ArrayList. To combine the advantages of both, Part 4 looks at an interesting randomized data structure: the skip list. A skip list retains the search efficiency of a binary tree, yet the order in which the data is inserted has very little effect on that efficiency.
Part 5 turns our attention to data structures used to represent graphs. A graph is a collection of nodes and of edges between those nodes. For example, a map can be viewed as a graph: each city is a node, and each highway is an edge connecting two nodes. Many real-world problems can be abstracted as graphs, which makes the graph a data structure we use frequently.
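Graphs are the subject of Part 5; purely to illustrate the node-and-edge idea described above, here is a small sketch (the cities and the adjacency-list representation are our own choices, not code from the article):

    using System;
    using System.Collections;

    public class GraphSketch
    {
       public static void Main()
       {
          // Each key is a node (a city); its value lists the nodes reachable by an edge (a highway).
          Hashtable highways = new Hashtable();
          highways["San Diego"]   = new ArrayList(new string[] { "Los Angeles" });
          highways["Los Angeles"] = new ArrayList(new string[] { "San Diego", "Las Vegas" });
          highways["Las Vegas"]   = new ArrayList(new string[] { "Los Angeles" });

          // List the cities directly connected to Los Angeles
          foreach (string neighbor in (ArrayList) highways["Los Angeles"])
             Console.WriteLine("Los Angeles <-> " + neighbor);
       }
    }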
Finally, Part 6 covers sets and disjoint sets. A set is an unordered collection of data. Disjoint sets are sets that have no elements in common, that is, their intersection is empty. We often work with sets and disjoint sets when writing programs, and this part describes them in detail.
Data Structure Performance Analysis

When thinking about a particular application or programming problem, most developers (myself included) focus on the algorithm that solves the problem at hand, or on adding a cool feature to enrich the user's experience. You rarely hear anyone profess excitement over the data structure he is using. However, the data structures used by a particular algorithm can greatly affect its performance. The most common example is finding an element in a data structure. In an array, the time this takes is proportional to the number of elements in the array. With a binary tree or a skip list, the time grows only sub-linearly with the amount of data. When searching a large amount of data, the choice of data structure therefore matters greatly to the program's performance; the difference can amount to seconds or even minutes.

Since the data structure used by an algorithm affects the algorithm's efficiency, it is important to be able to compare the efficiency of different data structures. What we, as developers, care about most is how a data structure's performance changes as the amount of data stored in it grows. That is, how does each new element added to the data structure affect its running time?

Consider a situation where we use the System.IO.Directory.GetFiles method to return the list of files in a particular directory as a string array. Suppose we need to search this array to determine whether the file list contains an XML file (that is, a file whose extension is .xml). One approach is to scan (traverse) the entire array and set a flag once an XML file is found. The code might look like this:

    using System;
    using System.Collections;
    using System.IO;

    public class MyClass
    {
       public static void Main()
       {
          string[] fs = Directory.GetFiles(@"C:\Inetpub\wwwroot");
          bool foundXML = false;
          int i = 0;
          for (i = 0; i < fs.Length; i++)
             if (String.Compare(Path.GetExtension(fs[i]), ".xml", true) == 0)
             {
                foundXML = true;
                break;
             }

          if (foundXML)
             Console.WriteLine("XML file found - " + fs[i]);
          else
             Console.WriteLine("No XML files found.");
       }
    }

The notation commonly used in this kind of asymptotic analysis is big-Oh notation, and the performance of the array search above is written as O(n). The O is the big-Oh symbol, and n denotes the number of steps, which grows linearly with the length of the array. A systematic way of computing the asymptotic running time of a block of code is to follow these steps:

1. Determine which steps constitute the algorithm's running time. As mentioned above, for an array a typical step is a read from or a write to the array. For other data structures, consider the steps that are intrinsic to the data structure itself, independent of what the computer does internally. Taking the code block above as an example, the running time should count only the accesses to the array, not the creation and initialization of variables or the comparison of two strings for equality.
2. Find the lines of code that perform the steps identified in step 1 and mark each of them with a 1.
3. For each line marked with a 1, determine whether it is inside a loop. If so, change the 1 to 1 multiplied by the maximum number of times the loop can execute. If the line is inside two or more nested loops, continue the multiplication for each loop.
4. The largest value written next to any line is the running time.
Now let's mark up the code block above using these steps. First we determine which lines count toward the running time. Following step 2, we mark the two lines that access the array fs: the line that passes an array element to the String.Compare() method, and the line that uses fs[i] in the Console.WriteLine() call. We label both of these lines with a 1. Following step 3, the String.Compare() line sits inside a loop whose maximum number of iterations is n (because the array has length n), so its 1 becomes n. Finally, the running time is the largest mark, giving O(n). (Translator's note: this is what is usually called the time complexity of the operation.)

O(n), or linear time, is only one of many classes of algorithm running times. Others include O(log2 n), O(n log2 n), O(n^2), O(2^n), and so on. We do not need to worry about the finer points of big-Oh notation here; it is enough to know that the smaller the expression inside the parentheses, the better the data structure's performance. For example, an operation whose time complexity is O(log n) is more efficient than one that is O(n), because log n < n.

Array: a linear, directly accessible, homogeneous data structure

The array is the simplest and most widely used data structure in programming. In all programming languages arrays share the following properties:

1. The contents of an array are stored in contiguous memory.
2. All elements of an array must be of the same data type, which is why arrays are referred to as homogeneous data structures.
3. Array elements can be accessed directly. (In many data structures this is not the case. For example, with the skip list introduced later in this series, reaching a specific element requires stepping through other elements until the target is found. With an array, if you know you want the i-th element, you can access it directly with arrayName[i].) (Translator's note: many languages index arrays from 0, so the i-th element is arrayName[i-1].)

The commonly used operations on an array are:

1. Allocation
2. Accessing elements
3. Redimensioning (resizing)

When an array is first declared in C#, it has a null value. The following code creates an array variable named booleanArray whose value is null:

    bool[] booleanArray;

Before the array can be used, space must be allocated for a specific number of elements, as follows:

    booleanArray = new bool[10];

The general form is:

    arrayName = new arrayType[allocationSize];

This allocates a contiguous block of memory on the CLR managed heap large enough to hold allocationSize elements of arrayType. If arrayType is a value type (for example, int), allocationSize unboxed arrayType values are created. If arrayType is a reference type (for example, string), allocationSize arrayType references are created. (If you are unfamiliar with the difference between value types and reference types, or between the managed heap and the stack, see Understanding .NET's Common Type System.)

To help see how the .NET Framework stores the internals of an array, consider the following example:
    bool[] booleanArray;
    FileInfo[] files;

    booleanArray = new bool[10];
    files = new FileInfo[10];

Here, booleanArray is an array of the value type System.Boolean, while files is an array of the reference type System.IO.FileInfo. Figure 1 shows the state of the CLR managed heap after these four lines of code execute.

Figure 1: The elements of an array are stored contiguously on the managed heap

The thing to remember is that the ten elements in the files array are references to FileInfo instances. Figure 2 hammers this point home by showing the memory layout after FileInfo instances have been assigned to some of the positions in the files array.

Figure 2: The files array holds references to FileInfo instances on the managed heap

All arrays in .NET support reading and writing their elements. The syntax for accessing an array element is:

    // Read an array element
    bool b = booleanArray[7];

    // Write to an array element, that is, assign it a value
    booleanArray[0] = false;

The running time of an array access is written O(1) because it is constant. That is, no matter how many elements the array stores, looking up an element takes the same amount of time. The running time is constant because array elements are stored contiguously, so a lookup only needs to know the array's starting position in memory, the size of each element, and the index of the element being accessed.

In managed code, array lookups are slightly more involved than this, because on every array access the CLR checks that the index is within bounds. If the index is out of bounds, an IndexOutOfRangeException is thrown. This bounds checking helps ensure that we do not accidentally run past the end of the array into another region of memory. It does not affect the asymptotic running time of an array access, because the time needed to perform the check does not grow as the number of array elements grows. Note: if an array has a very large number of elements, the index bounds check can have a slight impact on the application's performance. In unmanaged code this bounds check is omitted. To learn more, see Chapter 14 of Applied Microsoft .NET Framework Programming by Jeffrey Richter.

When working with an array, you may need to change its size. To do so, you create a new array instance of the desired length and copy the contents of the old array into the new one. We call this process redimensioning the array, as shown below:

    using System;
    using System.Collections;

    public class MyClass
    {
       public static void Main()
       {
          // Create an int array with 3 elements
          int[] fib = new int[3];
          fib[0] = 1;
          fib[1] = 1;
          fib[2] = 2;

          // Redimension to an array of length 10
          int[] temp = new int[10];

          // Copy the contents of the fib array into the temporary array
          fib.CopyTo(temp, 0);

          // Point fib at the temporary array
          fib = temp;
       }
    }

After the last line of this code, fib points to an Int32 array with ten elements. The elements at positions 3 through 9 have the default Int32 value, 0.
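The copy-and-grow pattern above is usually wrapped in a small helper so that it is not repeated throughout the code. The sketch below is one way to do that; the helper name and the doubling policy are our own (chosen to foreshadow the ArrayList's behavior discussed later), not something taken from the article:

    using System;

    public class ResizeHelper
    {
       // Returns an array at least requiredLength long, doubling the current length
       // until it fits, and copying the existing elements into the new array.
       public static int[] EnsureCapacity(int[] source, int requiredLength)
       {
          if (source.Length >= requiredLength)
             return source;                    // already big enough, nothing to do

          int newLength = source.Length == 0 ? 1 : source.Length;
          while (newLength < requiredLength)
             newLength *= 2;                   // double until the requirement is met

          int[] larger = new int[newLength];
          source.CopyTo(larger, 0);            // copy the old contents over
          return larger;
       }

       public static void Main()
       {
          int[] fib = new int[] { 1, 1, 2 };
          fib = EnsureCapacity(fib, 10);       // fib now has room for at least 10 elements
          Console.WriteLine(fib.Length);       // prints 12 (3 doubled to 6, then to 12)
       }
    }

As we will see shortly, the ArrayList applies the same doubling idea to its internal array.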
Arrays are a good data structure when we want to store data of a single type (translator's note: the original text here says "heterogeneous types", which I suspect is an error) and only need direct access to that data. Searching an unsorted array takes linear time. That is acceptable when working with small arrays, or when searches are rare; but when an application stores a large amount of data that is queried frequently, there are other data structures better suited to the job. We will look at some of them in the rest of this article and in the parts to come.

(If you need to search an array on some property, and the array is sorted by that property, you can use a binary search, whose time complexity is O(log n), the same as searching a binary tree. In fact, the Array class includes a static BinarySearch() method; for more information on this method, see my earlier article on searching sorted arrays.)

Note: The .NET Framework also supports multidimensional arrays. Like one-dimensional arrays, multidimensional arrays provide constant-time access to their elements. Recall that searching a one-dimensional array of n elements takes O(n) time. For an n x n two-dimensional array, searching takes O(n^2) time, because the search may examine n^2 elements. By the same reasoning, searching a k-dimensional array takes O(n^k) time.

ArrayList: a heterogeneous, self-redimensioning array

Clearly, arrays are limited by design: a one-dimensional array can only store data of a single type, and a specific length must be defined when the array is allocated. Developers often want something more flexible: an array-like structure that can store data of different types and that does not require explicit space management. The .NET Framework Base Class Library provides a data structure that meets these requirements: System.Collections.ArrayList.

The short code snippet below shows an example of using the ArrayList. Notice that data of any type can be added, and that space never has to be allocated explicitly; all of that is handled by the class itself.

    ArrayList countDown = new ArrayList();
    countDown.Add(5);
    countDown.Add(4);
    countDown.Add(3);
    countDown.Add(2);
    countDown.Add(1);
    countDown.Add("blast off!");
    countDown.Add(new ArrayList());

Under the covers, the ArrayList uses a System.Array of type object to store its data. Since all types derive directly or indirectly from object, an object array can naturally store elements of any type. By default an ArrayList creates an array of 16 object elements, though we can specify a different size through a constructor parameter or by setting the Capacity property. New elements are added with the Add() method, which internally checks the capacity of the underlying array; if adding the new element would exceed the capacity, the capacity is automatically doubled, which is what we mean by self-redimensioning.

Like an array, the ArrayList can be accessed directly by index:

    // Read access
    int x = (int) countDown[0];
    string y = (string) countDown[5];

    // Write access
    countDown[1] = 5;

    // Throws an ArgumentOutOfRangeException
    countDown[7] = 5;

Because the ArrayList stores its elements as type object, an explicit cast is needed when reading an element from the ArrayList. Also note that accessing an index beyond the number of elements in the ArrayList causes the system to throw a System.ArgumentOutOfRangeException.
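To watch the self-redimensioning described above take place, you can read the Capacity property as elements are added. The exact capacity values printed depend on the version of the runtime, so this sketch is illustrative rather than definitive:

    using System;
    using System.Collections;

    public class CapacityDemo
    {
       public static void Main()
       {
          ArrayList list = new ArrayList();
          int lastCapacity = list.Capacity;
          Console.WriteLine("Initial capacity: " + lastCapacity);

          for (int i = 0; i < 100; i++)
          {
             list.Add(i);                      // each int is boxed when it is added
             if (list.Capacity != lastCapacity)
             {
                // The capacity jumps (doubles) only when the underlying array fills up
                Console.WriteLine("Count = " + list.Count + ", Capacity grew to " + list.Capacity);
                lastCapacity = list.Capacity;
             }
          }
       }
    }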
The ArrayList provides the self-redimensioning flexibility that a standard array lacks, but this flexibility comes at the expense of performance, especially when storing value types such as System.Int32, System.Double, or System.Boolean. In a regular array, value types are stored contiguously and unboxed on the managed heap. The ArrayList's internal mechanism, however, is an array of object references; therefore, even when only value types are stored in an ArrayList, each element is boxed into a reference type, as shown in Figure 3.

Figure 3: The ArrayList contains a contiguous block of object references

Using value types in an ArrayList adds extra boxing and unboxing operations, which can noticeably affect performance when your application works with large ArrayLists and reads and writes them frequently. As Figure 3 shows, for reference types the memory layout of an ArrayList and an array is identical.

Asymptotically, the ArrayList's self-redimensioning does not cause any performance degradation compared with an array. If you know the exact number of elements that will be stored in the ArrayList, you can essentially switch off self-redimensioning by specifying the initial capacity in the ArrayList constructor. With an array, when you do not know the exact capacity in advance, you have to manually resize the array whenever the number of inserted elements exceeds its length.

A classic computer-science problem is: when a program runs out of space, how much new space should it allocate? One approach is to grow by a single element each time: for example, an array initially allocated with five elements grows to six just before the sixth element is inserted. This scheme conserves memory to the greatest extent, but its cost is too high, because every insertion of a new element triggers another reallocation. The opposite approach is to grow by a factor of 100 each time: an array initially allocated with five elements would grow to 500 before the sixth element is inserted. This greatly reduces the number of reallocations, but if only a few more elements are ever inserted, hundreds of element slots sit unused, which is far too wasteful. The ArrayList takes a middle road and doubles its capacity each time it runs out of room.

The asymptotic running time of the ArrayList's operations is the same as that of the standard array. Although the ArrayList's operations carry higher overhead, especially when storing value types, the relationship between the number of elements and the cost of each operation is the same as for the standard array.
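As a closing sketch of the constructor tip mentioned above, passing the expected number of elements to the ArrayList constructor sizes the internal array once, so no intermediate reallocations (or the copying they imply) occur while the list is filled. The element count used here is arbitrary:

    using System;
    using System.Collections;

    public class PresizedArrayListDemo
    {
       public static void Main()
       {
          const int expectedItems = 100000;

          // Capacity is set once up front, so Add() never has to redimension
          ArrayList presized = new ArrayList(expectedItems);
          for (int i = 0; i < expectedItems; i++)
             presized.Add(i);                  // still boxes each int, but never reallocates

          Console.WriteLine("Count = " + presized.Count + ", Capacity = " + presized.Capacity);
       }
    }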