Examining Data Structures - Part 2: The Queue, Stack, and Hashtable [Translation]

xiaoxiao2021-03-06 150

Original article: Part 2: The Queue, Stack, and Hashtable

This article is the second part of the "Examining Data Structures" series. It examines three data structures: the Queue, the Stack, and the Hashtable. As we will see, the Queue and the Stack are essentially specialized ArrayLists. They can store a variable number of objects of any type, but they restrict the order in which those elements can be accessed. The Hashtable provides an array-like abstraction with more flexible indexing. An array must be indexed by an ordinal (integer) value, whereas a Hashtable lets its items be indexed by any object.

Contents:
Introduction
How "Queuing Order" Works
"Anti-Queuing Order": The Stack Data Structure
The Limitations of Ordinal Indexing
The System.Collections.Hashtable Class
Conclusion

Introduction

In Part 1 we looked at what data structures are, how their performance can be evaluated, and how those performance considerations influence which data structure to choose for a particular algorithm. We also reviewed the basics of data structures and examined the most commonly used one: the array. An array stores a set of elements of the same type, indexed by ordinal position. Its values are stored in a contiguous block of memory, so reading or writing a specific element is very fast.

Because arrays are homogeneous and of fixed length, the .NET Framework Base Class Library provides the ArrayList data structure, which can hold elements of different types and does not require its size to be specified explicitly. As discussed in Part 1, an ArrayList is essentially an array of type object. Each call to Add() checks whether the internal object array has room for the new element; if it does not, the array's size is automatically doubled.

In Part 2 we continue by examining two array-like data structures: the Queue and the Stack. Like the ArrayList, they store elements of any type in a contiguous block of memory, but they restrict how the data can be accessed.
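To make the growth behavior recapped above concrete, here is a minimal sketch of an ArrayList-like container backed by an object array that doubles when it runs out of room. The class name GrowableList, its members, and the starting capacity of 16 are illustrative assumptions; this is not the Framework's actual ArrayList source.

```csharp
using System;

// Hypothetical, simplified ArrayList-like container (illustrative only).
public class GrowableList
{
    private object[] items;
    private int count;

    public GrowableList(int initialCapacity = 16)
    {
        // Guard against a zero capacity, which could never grow by doubling.
        items = new object[Math.Max(1, initialCapacity)];
    }

    public int Count => count;
    public int Capacity => items.Length;

    public void Add(object value)
    {
        // Bounds check: if the internal array is full, double its length.
        if (count == items.Length)
        {
            object[] bigger = new object[items.Length * 2];
            Array.Copy(items, bigger, count);
            items = bigger;
        }
        items[count++] = value;
    }

    public object this[int index]
    {
        get
        {
            if (index < 0 || index >= count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return items[index];
        }
    }
}

public static class GrowableListDemo
{
    public static void Main()
    {
        var list = new GrowableList();
        for (int i = 0; i < 17; i++)
            list.Add(i);
        // The 17th Add found the 16-slot array full, so it doubled to 32.
        Console.WriteLine(list.Count + " items, capacity " + list.Capacity);
    }
}
```

Because the array is of type object, values of different types can be mixed, which is the flexibility the ArrayList trades for the array's type safety.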
After examining the Queue and the Stack, we will look at the Hashtable data structure. A Hashtable can be thought of as an associative array: it also stores a collection of elements of any type, but those elements are indexed by an arbitrary object, such as a string, rather than by an ordinal number.

How "Queuing Order" Works

If you are building any kind of service, that is, a program that handles multiple requests from multiple sources with limited resources, one of the main challenges is deciding the order in which those requests are processed. There are two common approaches:

The "queuing order" principle
Priority-based processing

When you shop at a store and are ready to pay, you wait in line to be served. The "queuing order" principle says that whoever arrived first is served first. Priority-based processing serves requests according to their priority. For example, in a hospital emergency room, patients in critical condition see a doctor before patients with less serious conditions, regardless of who arrived first.

Imagine you need to build a service that processes requests received by a computer. Because requests can arrive faster than the computer can process them, you need to buffer them in the order in which they were submitted. One solution is to use an ArrayList together with an integer variable named nextJobPos that records the position in the ArrayList of the next job to execute. When a new job request arrives, we append it to the end of the ArrayList with the ArrayList's Add() method.

When you are ready to process a job from the buffer, use nextJobPos to fetch the job at that position in the ArrayList, and then increment nextJobPos. The following program implements this algorithm:

    using System;
    using System.Collections;

    public class JobProcessing
    {
        private static ArrayList jobs = new ArrayList();
        private static int nextJobPos = 0;

        // Append a new job to the end of the buffer.
        public static void AddJob(string jobName)
        {
            jobs.Add(jobName);
        }

        // Return the next unprocessed job, or a message if none is waiting.
        public static string GetNextJob()
        {
            if (nextJobPos > jobs.Count - 1)
                return "NO JOBS IN BUFFER";
            else
            {
                string jobName = (string) jobs[nextJobPos];
                nextJobPos++;
                return jobName;
            }
        }

        public static void Main()
        {
            AddJob("1");
            AddJob("2");
            Console.WriteLine(GetNextJob());
            AddJob("3");
            Console.WriteLine(GetNextJob());
            Console.WriteLine(GetNextJob());
            Console.WriteLine(GetNextJob());
            Console.WriteLine(GetNextJob());
            AddJob("4");
            AddJob("5");
            Console.WriteLine(GetNextJob());
        }
    }

The output is:

    1
    2
    3
    NO JOBS IN BUFFER
    NO JOBS IN BUFFER
    4

This approach is easy to understand, but it is terribly inefficient: the ArrayList keeps growing with every job added to the buffer, even after the earlier jobs have been processed. Suppose one job is added to the buffer and one job is removed every second, so that Add() is called once per second alongside each call to GetNextJob(). Because Add() keeps being called, the ArrayList's internal array keeps growing on demand. After five minutes (300 jobs), the internal array has grown to 512 elements, even though there is never more than one job in the buffer at a time. As long as the program runs and jobs keep arriving, the ArrayList will keep growing. The reason for this absurd result is that the space used by old, already processed jobs is never reclaimed.
That is, once the first job has been added to the buffer and processed, the ArrayList's first element slot should be reused.

Consider how the code above works. After the first two jobs are added with AddJob("1") and AddJob("2"), the ArrayList looks like Figure 1:

Figure 1: The ArrayList after the first two lines of code

Note that the ArrayList has 16 elements here, because an ArrayList's default capacity is 16. Next, GetNextJob() is called, which removes the first job, as shown in Figure 2:

Figure 2: The ArrayList after calling GetNextJob()

When AddJob("3") executes, we need to add a new job to the buffer. The ArrayList's first slot (index 0) is clearly available for reuse, so at first it seems sensible to put the third job at index 0. But consider what would happen if AddJob("3") were followed by AddJob("4") and then two calls to GetNextJob(). If we put the third job at index 0, the fourth job would go at index 2, and we would have the problem shown in Figure 3:

Figure 3: The problem caused by placing the third job at index 0

Now, when GetNextJob() is called, the second job is removed from the buffer and nextJobPos advances to index 2. So on the next call to GetNextJob(), the fourth job would be removed and processed before the third one, violating the "queuing order" principle.

The root of the problem is that the ArrayList represents the job list in a linear order: to preserve the correct processing order, new jobs must always be added to the right of the existing ones. As a result, whenever we reach the end of the ArrayList, it doubles in size, even if calls to GetNextJob() have left unused elements at the front.

The solution is to make our ArrayList circular. A circular array has no fixed start or end; instead, we use variables to keep track of where the array begins and ends. Figure 4 illustrates a circular array:

Figure 4: Illustration of a circular array

With a circular array, the AddJob() method adds the new job at index endPos (translator's note: endPos is usually called the tail pointer) and then "increments" endPos.
The GetNextJob() method takes the job at the head pointer startPos, sets that element to null, and "increments" startPos. I put "increment" in quotation marks because incrementing here means more than simply adding 1 to the variable. Why can't we just add 1? Consider this example: when endPos is 15, adding 1 makes it 16. The next call to AddJob() would then try to access the element at index 16, resulting in an IndexOutOfRangeException. Instead, when endPos is 15, it should be reset to 0. The increment function could check whether the incremented value equals the array's length and, if so, reset it to 0. A simpler solution is to take the incremented value modulo the array's length. The code for increment() looks like this:

    int increment(int variable)
    {
        return (variable + 1) % theArray.Length;
    }

Note: The modulus operator % returns the remainder of a division: x % y is the remainder left after dividing x by y. For non-negative x, the remainder always lies between 0 and y - 1. This approach works well as long as the buffer never needs to hold more than 16 elements.
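The circular-array approach described above can be sketched as a fixed-capacity job buffer. The class name CircularJobBuffer, its separate count field, and the choice to throw an exception when the buffer is full are illustrative assumptions; growing the array is deliberately left out here.

```csharp
using System;

// Fixed-capacity circular buffer of job names (illustrative sketch).
public class CircularJobBuffer
{
    private readonly string[] theArray;
    private int startPos = 0;   // head: next job to hand out
    private int endPos = 0;     // tail: where the next job is stored
    private int count = 0;      // number of jobs currently buffered

    public CircularJobBuffer(int capacity = 16)
    {
        theArray = new string[capacity];
    }

    // "Increment" with wrap-around: 15 becomes 0 when the length is 16.
    private int Increment(int variable)
    {
        return (variable + 1) % theArray.Length;
    }

    public void AddJob(string jobName)
    {
        if (count == theArray.Length)
            throw new InvalidOperationException("Buffer is full.");
        theArray[endPos] = jobName;
        endPos = Increment(endPos);
        count++;
    }

    public string GetNextJob()
    {
        if (count == 0)
            return "NO JOBS IN BUFFER";
        string jobName = theArray[startPos];
        theArray[startPos] = null;      // release the slot for reuse
        startPos = Increment(startPos);
        count--;
        return jobName;
    }
}

public static class CircularJobBufferDemo
{
    public static void Main()
    {
        var buffer = new CircularJobBuffer(4);
        // Add and remove 10 jobs: the same 4 slots are reused, nothing grows.
        for (int i = 1; i <= 10; i++)
        {
            buffer.AddJob(i.ToString());
            Console.WriteLine(buffer.GetNextJob());
        }
    }
}
```

Keeping a separate count is one common way to tell a full buffer from an empty one, since startPos equals endPos in both cases.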

But what if we need to add a new job when the buffer already holds 16? Like the ArrayList's Add() method, our circular array needs to be able to grow itself, increasing its size by a multiple.

The System.Collections.Queue Class

What we have just described, a data structure that inserts and removes items in "queuing order" while making the best possible use of memory, is exactly the Queue data structure. This class is already built into the .NET Framework Base Class Library as the System.Collections.Queue class. Its Enqueue() and Dequeue() methods provide the same functionality as the AddJob() and GetNextJob() methods in our code.

Internally, the Queue class maintains a circular array of objects and uses the head and tail variables to mark the start and end of the array. By default, a Queue has an initial capacity of 32 elements, though this can be customized through its constructor. Because the Queue is backed by an object array, elements of any type can be put into a queue.

The Enqueue() method first checks whether the Queue has enough capacity to store the new element. If it does, the element is added directly and the tail index is incremented; tail is incremented using the modulus operation, which guarantees it never exceeds the array's length. If there is not enough room, the Queue grows its array by a growth factor. The growth factor defaults to 2.0, so the internal array doubles in size, but you can also specify a custom growth factor in the constructor.

The Dequeue() method returns the element at the head index, sets that element to null, and then "increments" head. If you only want to look at the head element without removing it from the queue, the Queue class provides the Peek() method.
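The growing circular array behind Enqueue(), Dequeue(), and Peek() can be sketched as follows. This is not the Framework's source: the class name SimpleQueue, the generic element array (instead of an object array), and the growth arithmetic (new capacity = old capacity times the growth factor, but always at least one slot larger) are assumptions chosen to mirror the behavior described above.

```csharp
using System;

// Growable circular queue (illustrative sketch, not System.Collections.Queue).
public class SimpleQueue<T>
{
    private T[] array;
    private int head;     // index of the oldest element
    private int tail;     // index where the next element will be stored
    private int size;     // number of stored elements
    private readonly double growFactor;

    public SimpleQueue(int capacity = 32, double growFactor = 2.0)
    {
        if (capacity < 1) throw new ArgumentOutOfRangeException(nameof(capacity));
        if (growFactor < 1.0) throw new ArgumentOutOfRangeException(nameof(growFactor));
        array = new T[capacity];
        this.growFactor = growFactor;
    }

    public int Count => size;
    public int Capacity => array.Length;

    public void Enqueue(T item)
    {
        if (size == array.Length)
        {
            int newCapacity = Math.Max(array.Length + 1, (int)(array.Length * growFactor));
            Grow(newCapacity);
        }
        array[tail] = item;
        tail = (tail + 1) % array.Length;   // wrap around with the modulus
        size++;
    }

    public T Dequeue()
    {
        T item = Peek();
        array[head] = default(T);           // clear the slot
        head = (head + 1) % array.Length;
        size--;
        return item;
    }

    public T Peek()
    {
        if (size == 0) throw new InvalidOperationException("Queue is empty.");
        return array[head];
    }

    // Copy the elements, in queue order, into a larger array starting at index 0.
    private void Grow(int newCapacity)
    {
        T[] bigger = new T[newCapacity];
        for (int i = 0; i < size; i++)
            bigger[i] = array[(head + i) % array.Length];
        array = bigger;
        head = 0;
        tail = size;   // size < newCapacity, so no wrap is needed
    }
}

public static class SimpleQueueDemo
{
    public static void Main()
    {
        var q = new SimpleQueue<string>(2);
        q.Enqueue("1");
        q.Enqueue("2");
        Console.WriteLine(q.Dequeue());     // 1
        q.Enqueue("3");                     // wraps around to index 0
        q.Enqueue("4");                     // full: grows to capacity 4
        Console.WriteLine(q.Peek());        // 2
        Console.WriteLine(q.Capacity);      // 4
    }
}
```

Growing copies the elements in queue order, so after a resize the head is back at index 0 and the wrap-around starts fresh. In real code you would simply use System.Collections.Queue, whose Queue(int capacity, float growFactor) constructor exposes the same two settings.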
It is important to note that the Queue, unlike the ArrayList, does not allow random access. That is, you cannot access the third element in the queue until the first two have been dequeued. (The Queue class does provide a Contains() method, which lets you determine whether a specific value exists in the queue.) If you need random access to your data, the Queue is the wrong data structure; use an ArrayList instead. The Queue is ideal when you only need to process items in exactly the order in which they were received.

Note: Queues are often called FIFO data structures. FIFO stands for First In, First Out, which is equivalent to "First Come, First Served".

Translator's note: In data structures texts, a queue is usually described as a first-in, first-out structure and a stack as a last-in, first-out structure. This article, however, does not use the phrase "First In, First Out" but "First Come, First Served", and a literal rendering of "first in, first out" did not seem quite right. Following the article's own analogy of waiting in line at a store, I translated the term as "queuing order"; anyone used to standing in line should understand what it means. Correspondingly, for the stack ("First Come, Last Served") I use "anti-queuing order". I hope readers can suggest better translations to replace my clumsy wording. Why not simply use "first in, first out" and "first in, last out"? Mainly because of the English word "served" in the original, whose meaning is broad: it can refer to any processing of the data, not merely outputting it.

So I chose to sidestep that word.

"Anti-Queuing Order": The Stack Data Structure

The Queue data structure implements "queuing order" by using an internal circular array of type object, and it provides the Enqueue() and Dequeue() methods for adding and removing data. "Queuing order" is common in real-world problems, especially in services such as web servers, print queues, and other programs that handle multiple requests.

Another processing order often used in programs is "First Come, Last Served", and the stack is the data structure that provides it. The .NET Framework Base Class Library includes the System.Collections.Stack class. Like the Queue, the Stack stores its elements in an internal array of type object. The Stack provides two methods for accessing data: Push(item), which pushes an item onto the stack, and Pop(), which removes the item at the top of the stack and returns it.

A stack can be pictured as a vertical pile of elements. When an element is pushed onto the stack, it is placed on top of all the other elements, and elements are always removed from the top. The following two figures illustrate pushing and popping: the values 1, 2, and 3 are first pushed onto the stack in that order, and then popped off.

Figure 5: Pushing three elements onto the stack

Figure 6: The stack after all elements have been popped

Note that the Stack class's default capacity is 10 elements, rather than the Queue's 32. As with the Queue and the ArrayList, the Stack's capacity can be specified through its constructor. Like the ArrayList, the Stack automatically doubles its capacity when needed. (Recall that the Queue, by contrast, lets you set the growth factor through an optional constructor argument.)

Note: A stack is often referred to as a LIFO (Last In, First Out) data structure. Queues have plenty of real-world counterparts, such as lines at the DMV (Department of Motor Vehicles) or print-job processing.
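The push and pop order shown in Figures 5 and 6 can be reproduced with the Framework's System.Collections.Stack class. The sketch below feeds the same values 1, 2, 3 to a Stack and to a System.Collections.Queue so the two orders can be compared side by side. It uses only the Push(), Pop(), Enqueue(), Dequeue(), and Count members; the class and helper names (StackVersusQueueDemo, PopAll, DequeueAll) are hypothetical.

```csharp
using System;
using System.Collections;

public static class StackVersusQueueDemo
{
    // Pushes 1, 2, 3 onto a Stack, pops them all, and returns them in pop order.
    public static string PopAll()
    {
        Stack stack = new Stack();
        for (int i = 1; i <= 3; i++)
            stack.Push(i);

        string result = "";
        while (stack.Count > 0)
            result += stack.Pop() + " ";
        return result.Trim();
    }

    // Enqueues 1, 2, 3, dequeues them all, and returns them in dequeue order.
    public static string DequeueAll()
    {
        Queue queue = new Queue();
        for (int i = 1; i <= 3; i++)
            queue.Enqueue(i);

        string result = "";
        while (queue.Count > 0)
            result += queue.Dequeue() + " ";
        return result.Trim();
    }

    public static void Main()
    {
        Console.WriteLine("Stack (anti-queuing order): " + PopAll());      // 3 2 1
        Console.WriteLine("Queue (queuing order):      " + DequeueAll());  // 1 2 3
    }
}
```

The last value pushed is the first one popped, while the queue hands values back in arrival order.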
Stacks: A Common Metaphor in Computer Science

Real-world analogs of stacks are harder to find, but stacks are prominent in a variety of applications. Consider the programming language we write in, for example C#. When a C# program runs, the CLR (Common Language Runtime) uses a stack to keep track of function calls. (Translator's note: the original says "functions"; as I understand it, this is not limited to functions: many compilers rely on the stack to keep track of return addresses.) Each time a function is called, its information is pushed onto the stack, and when the call completes, that information is popped off. The top of the stack therefore always holds the information for the function that is currently executing. (To see the call stack in action, create a project in Visual Studio .NET, set a breakpoint, and start debugging. When execution stops at the breakpoint, the call stack is shown in the Call Stack window, under Debug / Windows / Call Stack.)

The Limitations of Ordinal Indexing

Recall from Part 1 that an array stores elements of the same type indexed by ordinal position, so accessing the ith element takes a constant amount of time. (Remember that constant time is written O(1).)

We may not realize it, but we tend to favor data that carries a natural numeric key. For example, an employee database may use each employee's Social Security number as a unique identifier. A Social Security number has the format DDD-DD-DDDD, where each D is a digit from 0 to 9.

If we store all employee records in an array in no particular order, finding the employee with Social Security number 111-22-3333 may require examining every element of the array, an O(n) operation. A better approach is to sort the array by Social Security number, which reduces the search time to O(log n). Ideally, though, we would like to find an employee's record in O(1) time. One way to achieve this is to create a huge array with one entry for every possible Social Security number, from 000-00-0000 to 999-99-9999, as shown in Figure 7:

Figure 7: A huge array covering every nine-digit number

As the figure shows, each employee record (name, phone number, salary, and so on) is indexed by the employee's Social Security number. With this scheme, any employee's information can be accessed in constant time. The drawback is the enormous waste of space: there are 10^9, that is, one billion, distinct Social Security numbers. If the company has only 1,000 employees, the array uses just 0.0001% of its space. (To put it another way, to make full use of this array your company would have to employ roughly one-sixth of the world's population.)

Compressing the Ordinal Index with a Hash Function

Creating a one-billion-element array to store information about 1,000 employees is clearly unacceptable. Still, we would very much like to bring data access down to constant time. One option is to reduce the range of the index by using only the last four digits of each employee's Social Security number. That way, the array only needs to span 0000 to 9999. Figure 8 shows the compressed array.

Figure 8: The compressed array

This scheme still guarantees constant access time while using far less storage. The choice of the last four digits is arbitrary; we could just as well use the middle four digits, or the 1st, 3rd, 8th, and 9th digits. Converting a nine-digit number into a four-digit number in this way is called hashing. An array that uses hashing to compress its index space is called a hash table, and the function that performs the hashing is called a hash function.
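The "keep the last four digits" hash just described can be sketched as a small function. The name HashSsn and the choice to parse the DDD-DD-DDDD string by stripping its dashes are illustrative assumptions; the function returns a slot index between 0 and 9999.

```csharp
using System;

public static class SsnHashDemo
{
    // Hypothetical hash function: map a nine-digit SSN (DDD-DD-DDDD)
    // to a slot in a 10,000-element table by keeping its last four digits.
    public static int HashSsn(string ssn)
    {
        string digits = ssn.Replace("-", "");
        if (digits.Length != 9)
            throw new ArgumentException("Expected nine digits.", nameof(ssn));
        long value = long.Parse(digits);
        return (int)(value % 10000);   // the last four digits
    }

    public static void Main()
    {
        Console.WriteLine(HashSsn("111-22-3333"));  // 3333
        Console.WriteLine(HashSsn("000-99-0000"));  // 0
        Console.WriteLine(HashSsn("113-14-0000"));  // 0 (same slot as above)
    }
}
```

The last two calls show that different nine-digit inputs can land in the same slot, which is the behavior examined next.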
In the Social Security number example, the hash function H() can be written as:

$$H(x) = x \bmod 10000$$

Its input x can be any nine-digit Social Security number (read as an integer), and its result is that number's last four digits. In mathematical terms, this conversion of nine-digit numbers into four-digit numbers is called a hash mapping, as shown in Figure 9:

Figure 9: Illustration of the hash function

Figure 9 illustrates a behavior inherent to hash functions: collisions. When we map the elements of a relatively large set onto a relatively small set, some elements inevitably map to the same value. For example, every Social Security number ending in 0000 maps to 0000: 000-99-0000, 113-14-0000, 933-66-0000, and many others all map to 0000.

What happens if we want to add an employee with Social Security number 123-00-0191? Adding that employee causes a collision, because another employee is already stored at position 0191.

Mathematical note: In mathematical terms, a hash function is a function $f: A \to B$. Because $|A| > |B|$, f cannot be one-to-one (injective), so collisions are unavoidable.

Clearly, collisions cause problems. In the next section we look at the relationship between hash functions and collisions, and briefly survey several mechanisms for handling collisions. After that, we turn to the System.Collections.Hashtable class, which provides a hash table implementation. We will look at its hash function, its collision resolution mechanism, and some examples of using it.

Avoiding and Resolving Collisions

When we add data to a hash table, a collision throws a wrench into the whole operation.

Without a collision, the item is simply inserted. If a collision occurs, however, something must be done to resolve it. Because collisions increase the cost of operations, our goal is to keep them to a minimum.

How often a hash function causes collisions depends on the distribution of the data passed into it. In our example, using the last four digits of the Social Security number is a good choice, assuming those digits are uniformly distributed. However, if Social Security numbers were assigned based on an employee's birth year or birthplace, the last four digits would contain many duplicates and cause many collisions, because birth years and birthplaces are clearly not uniformly distributed.

Note: Analyzing hash functions requires some background in statistics, which is beyond the scope of this article. Roughly speaking, for a hash table with k slots we want a hash function that maps any random value from its domain to any particular slot with probability 1/k. (If this leaves you more confused, don't worry!)

Choosing an appropriate hash function is known as collision avoidance, and it has been studied extensively, because the choice of hash function affects the overall performance of the hash table. In the next section we look at the hash function used by the .NET Framework's Hashtable class.

There are many ways to handle a collision once it occurs. This is called collision resolution, and the most direct approach is to place the object being inserted into a different slot of the hash table, since the slot it hashed to is already taken. One of the simplest such methods is called linear probing, which works as follows:

1. When inserting a new element, use the hash function to compute its location in the hash table.
2. Check whether an element already occupies that location. If the location is empty, insert the element there and return; otherwise, go to step 3.
3. If the location is i, check whether location i + 1 is empty.
If it is also occupied, check i + 2, and so on, until an empty location is found.

For example, suppose we insert five employees into the hash table: Alice (333-33-1234), Bob (444-44-1234), Cal (555-55-1237), Danny (000-00-1235), and Edward (111-00-1235). After they are added, the hash table looks like Figure 10:

Figure 10: A hash table containing five employees with similar Social Security numbers

Alice's Social Security number hashes to 1234, so her record is stored at location 1234. Bob's number also hashes to 1234, but since location 1234 is already occupied by Alice, Bob's record goes into the next location, 1235. Next, Cal is added; his number hashes to 1237, which is empty, so Cal is placed at 1237. Then comes Danny, whose number hashes to 1235. Location 1235 is taken, so we check whether 1236 is empty; since it is, Danny is placed there. Finally, Edward is added. His number also hashes to 1235, which is taken; 1236 is taken too, and so is 1237, so we move on to 1238, which is empty, and Edward is placed there.

Collisions also come into play when we search the hash table.
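The linear probing steps and the five-employee walkthrough above can be reproduced with a small open-addressing table that also supports lookups. The class name LinearProbingTable, the fixed 10,000-slot size, and the last-four-digits hash are assumptions carried over from the running example; a practical table would also need to grow, and this sketch assumes it never becomes completely full.

```csharp
using System;

// Open-addressing table with linear probing (illustrative sketch).
public class LinearProbingTable
{
    private readonly string[] keys = new string[10000];   // SSNs
    private readonly string[] names = new string[10000];  // employee names

    // Hash: the last four digits of the SSN.
    private static int Hash(string ssn)
    {
        return int.Parse(ssn.Substring(ssn.Length - 4));
    }

    // Insert: probe i, i + 1, i + 2, ... (wrapping) until an empty slot is found.
    // Returns the slot used. Assumes the table is not full.
    public int Add(string ssn, string name)
    {
        int slot = Hash(ssn);
        while (keys[slot] != null)
            slot = (slot + 1) % keys.Length;
        keys[slot] = ssn;
        names[slot] = name;
        return slot;
    }

    // Lookup: follow the same probe sequence until the key or an empty slot is found.
    public string Find(string ssn)
    {
        int slot = Hash(ssn);
        while (keys[slot] != null)
        {
            if (keys[slot] == ssn)
                return names[slot];
            slot = (slot + 1) % keys.Length;
        }
        return null;   // reached an empty slot: the key is not in the table
    }
}

public static class LinearProbingDemo
{
    public static void Main()
    {
        var table = new LinearProbingTable();
        Console.WriteLine("Alice  -> " + table.Add("333-33-1234", "Alice"));   // 1234
        Console.WriteLine("Bob    -> " + table.Add("444-44-1234", "Bob"));     // 1235
        Console.WriteLine("Cal    -> " + table.Add("555-55-1237", "Cal"));     // 1237
        Console.WriteLine("Danny  -> " + table.Add("000-00-1235", "Danny"));   // 1236
        Console.WriteLine("Edward -> " + table.Add("111-00-1235", "Edward"));  // 1238
        Console.WriteLine(table.Find("111-00-1235"));                  // Edward (probes 1235 to 1238)
        Console.WriteLine(table.Find("999-99-1239") ?? "not found");   // not found
    }
}
```

The lookup for Edward has to walk past Bob, Danny, and Cal before reaching him, and a lookup for a missing key stops at the first empty slot.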

For example, suppose we want to access Edward's information in the hash table above. We hash Edward's Social Security number, 111-00-1235, to 1235 and start searching there. At 1235, however, we find Bob, not Edward. So we check 1236 and find Danny. Our linear search continues until we either find Edward or reach an empty location. If we reach an empty location, we can conclude that no employee with Social Security number 111-00-1235 exists.

Although linear probing is simple, it is not a good strategy for resolving collisions, because it leads to clustering. Suppose we add 10 employees whose Social Security numbers all end in 3344. Ten consecutive slots, 3344 through 3353, will then be occupied. Looking up any of these 10 employees requires searching this cluster. Moreover, adding any employee whose number hashes to a value between 3344 and 3353 makes the cluster longer. For fast lookups we want the data spread uniformly across the hash table, not clustered in a few places.

A better probing technique is quadratic probing, in which the step size grows quadratically. That is, if location s is occupied, we first check s + 1², then s - 1², s + 2², s - 2², s + 3², and so on, rather than s + 1, s + 2, and so on, as in linear probing. Quadratic probing can still lead to a similar kind of clustering, however. In the next section we look at a third collision resolution technique, rehashing, which is the one used by the .NET Framework's Hashtable class.

The System.Collections.Hashtable Class

The .NET Framework Base Class Library includes a hash table implementation in the Hashtable class. When adding an item to a Hashtable, you must provide not only the item but also a key for it. Both the key and the item can be of any type. In the employee example, the key would be the employee's Social Security number and the item would be the Employee object itself. Items are added to the Hashtable with its Add() method.
To retrieve an item from a Hashtable, you index it by its key, just as you would index an array by an ordinal value. The following short C# program demonstrates this: it adds several items to a Hashtable using string keys and then accesses a particular item by its key.

    using System;
    using System.Collections;

    public class HashtableDemo
    {
        private static Hashtable ages = new Hashtable();

        public static void Main()
        {
            // Add some values to the Hashtable, indexed by a string key
            ages.Add("Scott", 25);
            ages.Add("Sam", 6);
            ages.Add("Jisun", 25);

            // Access a particular key
            if (ages.ContainsKey("Scott"))
            {
                int scottsAge = (int) ages["Scott"];
                Console.WriteLine("Scott is " + scottsAge.ToString());
            }
            else
                Console.WriteLine("Scott is not in the hash table...");
        }
    }

The ContainsKey() method used above returns a Boolean indicating whether the Hashtable contains an item with the specified key. The Hashtable class also has a Keys property, which returns a collection of all the keys used in the hash table. You can use this property to step through the Hashtable's contents, as follows:

    // Step through all items in the Hashtable
    foreach (string key in ages.Keys)
        Console.WriteLine("Value at ages[\"" + key + "\"] = " + ages[key].ToString());

Be aware that the order in which keys appear in the Keys collection is not necessarily the order in which the items were inserted. The order depends on the slots in which the keys' items are stored. For example, on the .NET Framework version the original article used, the output of the code above is:

    Value at ages["Jisun"] = 25
    Value at ages["Scott"] = 25
    Value at ages["Sam"] = 6

even though the items were inserted in the order Scott, Sam, Jisun. (The exact order depends on the keys' hash codes, so it can differ between runtimes.)

The Hashtable Class's Hash Function

The hash function used by the Hashtable class is more complex than the Social Security number hash described earlier. First, remember that a hash function must return an ordinal value (an integer). In the Social Security number example this is easy, because the number itself is numeric: we simply take its last four digits to get a suitable hash value. The Hashtable class, however, accepts keys of any type. In the example above, the keys are strings, such as "Scott" or "Sam".
With keys like these, we naturally want to know how the hash function converts a string into a number. This conversion is made possible by the GetHashCode() method, which is defined in the System.Object class. The default implementation of GetHashCode() in Object returns an integer that identifies the object and does not change during the object's lifetime (although it is not guaranteed to be unique). Because every type derives, directly or indirectly, from Object, every object has access to this method. A string, or an object of any other type, can therefore be represented by a numeric value. Types such as String override GetHashCode() so that equal values produce equal hash codes.
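Because the Hashtable relies on GetHashCode(), together with Equals(), to place and find keys, a type used as a key should override both methods so that equal values hash alike. Here is a minimal sketch, assuming a hypothetical Ssn key type whose hash code is simply its nine digits read as an integer; that hashing choice is illustrative, not a recommendation.

```csharp
using System;
using System.Collections;

// Hypothetical key type: equal SSNs must produce equal hash codes.
public sealed class Ssn
{
    private readonly int value;   // the nine digits as an integer

    public Ssn(string text)
    {
        value = int.Parse(text.Replace("-", ""));
    }

    public override bool Equals(object obj)
    {
        Ssn other = obj as Ssn;
        return other != null && other.value == value;
    }

    // Equal objects return equal hash codes, as the Hashtable requires.
    public override int GetHashCode()
    {
        return value;
    }

    public override string ToString()
    {
        string s = value.ToString("D9");   // restore leading zeros
        return s.Substring(0, 3) + "-" + s.Substring(3, 2) + "-" + s.Substring(5);
    }
}

public static class CustomKeyDemo
{
    public static void Main()
    {
        Hashtable employees = new Hashtable();
        employees.Add(new Ssn("111-22-3333"), "Scott");

        // A different Ssn instance with the same digits finds the same entry.
        Console.WriteLine(employees[new Ssn("111-22-3333")]);              // Scott
        Console.WriteLine(employees.ContainsKey(new Ssn("999-99-9999")));  // False
    }
}
```

If Ssn overrode only Equals() and kept Object's default GetHashCode(), two equal instances would normally produce different hash codes, and the second lookup would miss the entry.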

The hash function used by the Hashtable class is defined as follows:

$$H(\mathit{key}) = \Big[\mathrm{GetHash}(\mathit{key}) + 1 + \big(((\mathrm{GetHash}(\mathit{key}) \gg 5) + 1) \bmod (\mathit{hashsize} - 1)\big)\Big] \bmod \mathit{hashsize}$$

Here GetHash(key) is, by default, the value returned by calling GetHashCode() on key (although when using the Hashtable you can supply a custom GetHash() function). GetHash(key) >> 5 shifts the hash value right by 5 bits, which for a non-negative value is equivalent to dividing it by 32. The mod operation is the modulus operator (%) discussed earlier, and hashsize is the number of slots in the hash table. Because the final step takes the result modulo hashsize, H(key) always lies between 0 and hashsize - 1, so the result is always a valid slot in the hash table.

Collision Resolution in the Hashtable Class

Collisions can occur both when adding an item to a hash table and when retrieving one. When inserting an item, an empty slot must be found; when retrieving an item, it must be found even if it is not in its expected location. Earlier we briefly looked at two collision resolution mechanisms: linear probing and quadratic probing. The Hashtable class uses a different technique called rehashing (some sources call it double hashing).

Rehashing works as follows: there is a set of hash functions H1 ... Hn. When we add an item to or retrieve an item from the hash table, we first use hash function H1. If that leads to a collision, we try H2, and so on, up to Hn. The hash functions are all very similar; they differ only in a multiplicative factor. In general, the hash function Hk is defined as:

$$H_k(\mathit{key}) = \Big[\mathrm{GetHash}(\mathit{key}) + k \cdot \big(1 + (((\mathrm{GetHash}(\mathit{key}) \gg 5) + 1) \bmod (\mathit{hashsize} - 1))\big)\Big] \bmod \mathit{hashsize}$$

For k = 1 this is exactly the function H(key) given above.

Note: With rehashing, it is important that after hashsize probes, every slot in the hash table has been visited exactly once.
That is, for a given key, no two different functions Hi and Hj (with i and j at most hashsize) map it to the same slot. The rehashing formula used by the Hashtable class guarantees this as long as $1 + (((\mathrm{GetHash}(\mathit{key}) \gg 5) + 1) \bmod (\mathit{hashsize} - 1))$ and hashsize are relatively prime. (Two numbers are relatively prime if they share no common factor other than 1.) If hashsize is a prime number, these two numbers are guaranteed to be relatively prime, because the first one always lies between 1 and hashsize - 1. Rehashing avoids collisions better than either of the two probing techniques described earlier.

Load Factors and Expanding the Hashtable

The Hashtable class has a private member variable, loadFactor, that specifies the maximum ratio of the number of items in the hash table to the total number of slots. For example, a loadFactor of 0.5 means that at most half of the hash table's slots can hold items, and the other half must remain empty.

The Hashtable's constructors are overloaded to let you specify a loadFactor value between 0.1 and 1.0. Note, however, that whatever value you provide, the effective load factor never exceeds 72%: even if you pass in 1.0, the Hashtable's actual loadFactor is 0.72. Microsoft determined that 0.72 is the optimal load factor, so although the default loadFactor is 1.0, it is automatically scaled to 0.72 internally.
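The rehashing formula and the relative-primality argument above can be checked with a small sketch. It computes the probe sequence H1 ... Hn (n = hashsize) for one key using the formula for Hk given earlier and confirms that every slot appears exactly once when hashsize is prime. It also computes the item limit implied by a load factor. Taking GetHash to be GetHashCode() with the sign bit cleared, and the names RehashDemo, ProbeSequence, and MaxItems, are assumptions; this is not the Hashtable class's source.

```csharp
using System;
using System.Collections.Generic;

public static class RehashDemo
{
    // Assumed GetHash: the key's GetHashCode() with the sign bit cleared.
    private static long GetHash(object key)
    {
        return key.GetHashCode() & 0x7FFFFFFF;
    }

    // Slots probed by H1 .. H(hashsize) for one key, using
    // Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize
    public static int[] ProbeSequence(object key, int hashsize)
    {
        long hash = GetHash(key);
        long incr = 1 + (((hash >> 5) + 1) % (hashsize - 1));   // between 1 and hashsize - 1
        int[] slots = new int[hashsize];
        for (int k = 1; k <= hashsize; k++)
            slots[k - 1] = (int)((hash + k * incr) % hashsize);
        return slots;
    }

    // Largest number of items allowed before the table would have to expand.
    public static int MaxItems(int hashsize, double loadFactor)
    {
        return (int)(hashsize * loadFactor);
    }

    public static void Main()
    {
        const int hashsize = 11;   // a prime table size
        int[] probes = ProbeSequence("Scott", hashsize);
        Console.WriteLine(string.Join(", ", probes));

        // With a prime hashsize, every slot appears exactly once.
        Console.WriteLine(new HashSet<int>(probes).Count == hashsize);   // True

        // At a load factor of 0.72, an 11-slot table holds at most 7 items.
        Console.WriteLine(MaxItems(hashsize, 0.72));   // 7
    }
}
```

The exact slot numbers printed on the first line depend on the string's hash code, which some runtimes randomize per process, but the "every slot exactly once" check holds for any key as long as hashsize is prime.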

Please credit the original URL when reposting: https://www.9cbs.com/read-97214.html
