An Extensive Examination of Data Structures - Part 2: The Queue, Stack, and Hashtable [translation]
Related articles
An Extensive Examination of Data Structures - Part 1: An Introduction to Data Structures
An Extensive Examination of Data Structures - Part 3: Binary Trees and BSTs
Original link: Part 2: The Queue, Stack, and Hashtable
This article is the second part of the "An Extensive Examination of Data Structures" series and examines three data structures: the queue, the stack, and the hash table. As we will see, the queue and the stack are in effect specialized ArrayLists: they store a variable number of objects of any type, but restrict how those elements can be accessed. The Hashtable provides an array-like abstraction with more flexible indexing. An array must be indexed by an ordinal number, whereas a Hashtable allows a data item to be indexed by any object.
Table of contents:
Introduction
"Queuing order" work process
"Anti-queuing order": the stack data structure
Limitations of ordinal indexing
The System.Collections.Hashtable class
Conclusion
Introduction
In Part 1 we looked at what data structures are, how to evaluate their performance, and how the choice of data structure affects a particular algorithm. We also reviewed the basics of data structures and examined the most common one: the array.
An array stores data of a single type and is indexed by ordinal position. The array's values are stored in a contiguous block of memory, so reading or writing a specific element is very fast.
Because arrays are homogeneous and fixed in length, the .NET Framework base class library provides the ArrayList data structure, which can store data of different types and does not require its length to be specified explicitly. As described in Part 1, an ArrayList is essentially an array of type object. Each call to the Add() method checks the bounds of the internal object array. If the capacity would be exceeded, the array's length is automatically doubled.
In Part 2 we continue by examining two array-like structures: the Queue and the Stack. Like the ArrayList, they store elements of different types in a contiguous block of memory, but they restrict how the data can be accessed.
After that we will look at the Hashtable data structure. A Hashtable is sometimes viewed as an associative array: it too is a collection of elements of different types, but it can be indexed by any object, such as a string, rather than by an ordinal number.
"Queuing order" work process
Suppose you want to create a service, that is, a program that receives multiple requests from multiple sources. A key problem in building such a service is deciding the order in which the incoming requests are handled. There are two common approaches:
The "queuing order" principle (first come, first served)
The "priority-based" processing principle
When you shop in a store or do your banking, you have to wait in line. Under the "queuing order" principle, those who arrive earlier are served before those who arrive later. Under the "priority-based" principle, the order of service is determined by priority level. In a hospital emergency room, for example, patients whose lives are in danger are seen by a doctor before patients with less serious conditions, regardless of who arrived first.
Imagine that you need to build a service that processes the requests a computer receives. Because requests arrive faster than the computer can process them, you need to place them in a buffer in the order in which they were submitted.
One solution is to use an ArrayList together with an integer variable, nextJobPos, that holds the array position of the next job to be executed. When a new job request arrives, we append it to the end of the ArrayList with the ArrayList's Add() method. When we are ready to process a job from the buffer, we use nextJobPos to fetch the job at that position in the ArrayList and then increment nextJobPos by 1. The following program implements this algorithm:
using System;
using System.Collections;
public class JobProcessing
{
    private static ArrayList jobs = new ArrayList();
    private static int nextJobPos = 0;   // index of the next job to process
    // Appends a job to the end of the buffer.
    public static void AddJob(string jobName)
    {
        jobs.Add(jobName);
    }
    // Returns the next job in arrival order, or a message if none is waiting.
    public static string GetNextJob()
    {
        if (nextJobPos > jobs.Count - 1)
            return "No jobs in buffer";
        else
        {
            string jobName = (string) jobs[nextJobPos];
            nextJobPos++;
            return jobName;
        }
    }
    public static void Main()
    {
        AddJob("1");
        AddJob("2");
        Console.WriteLine(GetNextJob());
        AddJob("3");
        Console.WriteLine(GetNextJob());
        Console.WriteLine(GetNextJob());
        Console.WriteLine(GetNextJob());
        Console.WriteLine(GetNextJob());
        AddJob("4");
        AddJob("5");
        Console.WriteLine(GetNextJob());
    }
}
The output is as follows:
1
2
3
No jobs in buffer
No jobs in buffer
4
This method is simple and easy to understand, but it is terribly inefficient, because the ArrayList keeps growing with every job added to the buffer, even after earlier jobs have been processed. Suppose that every second one job is added to the buffer and one job is removed. This means that AddJob(), and with it the ArrayList's Add() method, is called once per second. As Add() continues to be called, the ArrayList's internal array keeps growing as needed. After five minutes (300 seconds), the internal array has grown to 512 elements, even though the buffer has never held more than one job at a time. The trend continues: as long as the program keeps running and jobs keep arriving, the ArrayList keeps growing.
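The figure of 512 elements can be checked with a little arithmetic. The following is a minimal sketch that simulates the doubling policy rather than inspecting a real ArrayList; the names GrowthSimulation and CapacityAfter are hypothetical, and the initial capacity of 16 is the value this article assumes.

```csharp
using System;

public static class GrowthSimulation
{
    // Simulates an array that starts at initialCapacity (16 is the value
    // assumed by the article) and doubles whenever an Add() would not fit.
    public static int CapacityAfter(int adds, int initialCapacity)
    {
        int capacity = initialCapacity;
        for (int count = 1; count <= adds; count++)
        {
            if (count > capacity)
                capacity *= 2;   // the backing array is full, so it doubles
        }
        return capacity;
    }

    public static void Main()
    {
        // One job added per second for five minutes = 300 calls to Add()
        Console.WriteLine(CapacityAfter(300, 16));   // prints 512
    }
}
```

The capacity passes through 32, 64, 128, and 256 before reaching 512 at the 257th addition.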
The reason for this absurd result is that the buffer space occupied by jobs that have already been processed is never reclaimed. That is, once the first job has been added to the buffer and processed, the first element of the ArrayList ought to be available for reuse. Consider the workflow of the code above. After two jobs are inserted, with AddJob("1") and AddJob("2"), the ArrayList looks as shown in Figure 1.
Figure 1: The ArrayList after the first two lines of code have executed
Note that the ArrayList here has 16 elements, because the default initial capacity of an ArrayList is 16. Next, GetNextJob() is called and removes the first job, as shown in Figure 2.
Figure 2: The ArrayList after GetNextJob() is called
When AddJob("3") executes, we need to add a new job to the buffer. Clearly, the first element of the ArrayList (index 0) could be reused, so we might be tempted to place the third job at index 0. But consider what happens if AddJob("3") is followed by AddJob("4") and then by two calls to GetNextJob(). If the third job is placed at index 0, the fourth job is placed at index 2, and a problem arises, as shown in Figure 3.
Figure 3: The problem that occurs when a job is placed at index 0
The first call to GetNextJob() removes the second job from the buffer and leaves nextJobPos pointing at index 2. When GetNextJob() is called again, the fourth job is therefore removed before the third, which violates our "queuing order" principle.
The root of the problem is that the ArrayList represents the list of jobs in linear order. We must keep adding new jobs to the right of the old ones to guarantee the correct processing order. Whenever we reach the end of the ArrayList, it grows, even if calls to GetNextJob() have left unused elements at the front.
The solution is to treat our ArrayList as a circular array. A circular array has no fixed start or end point. Instead, we use variables to keep track of where the array's contents start and end. A circular array is shown in Figure 4.
Figure 4: Illustration of a circular array
In a circular array, the AddJob() method adds the new job at index endPos (translator's note: endPos is usually called the tail pointer) and then "increments" endPos. The GetNextJob() method fetches the job at the head pointer startPos, sets that element to null, and "increments" startPos. I put "increment" in quotation marks because the operation is more than simply adding 1 to the variable. Why can't we simply add 1? Consider this example: when endPos equals 15, adding 1 makes endPos equal to 16. The next call to AddJob() then tries to access the element at index 16, which results in an IndexOutOfRangeException.
In fact, when endPos equals 15, it should be reset to 0. We could write an increment function that checks whether the value passed in equals the array length and, if so, resets it to 0. A simpler solution is to take the value modulo the array length. The code for the increment() method is as follows:
int increment(int variable)
{
    return (variable + 1) % theArray.Length;
}
Note: The modulus operator %, as in x % y, yields the remainder of x divided by y. For non-negative x, the remainder is always between 0 and y - 1.
This approach works well as long as the buffer never holds more than 16 elements. But what if we need to add a new job when all 16 elements are in use? As with the ArrayList's Add() method, the circular array needs to be able to grow itself, increasing its length by a multiple.
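To make the circular array concrete, here is a minimal sketch of a fixed-size circular job buffer. The class name CircularJobBuffer, the count field, and the choice to throw when the buffer is full are assumptions made for illustration; automatic growth is deliberately left out.

```csharp
using System;

// Hypothetical fixed-size circular buffer of jobs; growth is omitted.
public class CircularJobBuffer
{
    private readonly string[] jobs;
    private int startPos = 0;   // head: position of the next job to hand out
    private int endPos = 0;     // tail: position of the next free slot
    private int count = 0;      // number of jobs currently waiting

    public CircularJobBuffer(int capacity)
    {
        jobs = new string[capacity];
    }

    // Advances an index by one, wrapping to 0 at the end of the array.
    private int Increment(int variable)
    {
        return (variable + 1) % jobs.Length;
    }

    public void AddJob(string jobName)
    {
        if (count == jobs.Length)
            throw new InvalidOperationException("Buffer is full");
        jobs[endPos] = jobName;
        endPos = Increment(endPos);
        count++;
    }

    public string GetNextJob()
    {
        if (count == 0)
            return "No jobs in buffer";
        string jobName = jobs[startPos];
        jobs[startPos] = null;          // release the slot for reuse
        startPos = Increment(startPos);
        count--;
        return jobName;
    }

    public static void Main()
    {
        CircularJobBuffer buffer = new CircularJobBuffer(2);
        buffer.AddJob("1");
        buffer.AddJob("2");
        Console.WriteLine(buffer.GetNextJob());   // 1
        buffer.AddJob("3");                       // reuses index 0
        Console.WriteLine(buffer.GetNextJob());   // 2
        Console.WriteLine(buffer.GetNextJob());   // 3
        Console.WriteLine(buffer.GetNextJob());   // No jobs in buffer
    }
}
```

Even with room for only two jobs, the third job is served in the correct order, because the slot freed by the first job is reused.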
The System.Collections.Queue class
As just described, we need a data structure that inserts and removes elements according to the "queuing order" principle while making the best use of memory. The answer is the Queue data structure. It is already built into the .NET Framework base class library as the System.Collections.Queue class. Where our code had AddJob() and GetNextJob(), the Queue class provides the same functionality through its Enqueue() and Dequeue() methods, respectively. Internally, the Queue class maintains a circular array of objects and tracks the head and tail of the array with the variables head and tail. By default, a Queue has an initial capacity of 32, and we can specify a different capacity through its constructor. Because the Queue is built on an object array, elements of any type can be placed in the queue.
The Enqueue() method first determines whether the Queue has enough capacity to store the new element. If it does, the element is added and the tail index is incremented. Here tail is advanced with the modulus operation so that it never exceeds the array length. If there is not enough space, the Queue expands the array by a growth factor. The growth factor defaults to 2.0, so the length of the internal array is doubled. You can also specify a different growth factor in the constructor.
The Dequeue() method returns the current element at the head index. It then sets the element at the head index to null and "increments" head. If you only want to look at the current head element without removing it from the queue, the Queue class provides the Peek() method.
It is important to realize that a Queue, unlike an ArrayList, does not allow random access. That is, we cannot access the third element until the first two have been dequeued. (The Queue class does, however, provide a Contains() method, which lets you determine whether a specific value exists in the queue.) If you need random access to the data, you cannot use this data structure and should use an ArrayList instead. A Queue is best suited to situations where you only need to process elements in the exact order in which they were received.
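The methods described above can be tried out in a few lines. This is only an illustrative sketch: QueueDemo and ServeAll are hypothetical names, while Enqueue(), Dequeue(), Peek(), Contains(), and Count are members of System.Collections.Queue.

```csharp
using System;
using System.Collections;

public static class QueueDemo
{
    // Dequeues everything and reports the order in which items were served.
    public static string ServeAll(Queue queue)
    {
        string order = "";
        while (queue.Count > 0)
            order += queue.Dequeue() + " ";
        return order.Trim();
    }

    public static void Main()
    {
        Queue jobs = new Queue();   // default capacity and growth factor
        jobs.Enqueue("1");
        jobs.Enqueue("2");
        jobs.Enqueue("3");

        Console.WriteLine(jobs.Peek());          // 1 (the item stays in the queue)
        Console.WriteLine(jobs.Contains("3"));   // True
        Console.WriteLine(ServeAll(jobs));       // 1 2 3
    }
}
```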
Note: Queues are often called FIFO data structures. FIFO stands for First In, First Out, which is synonymous with "First Come, First Served".
Translator's note: In data structures, the queue is usually described as a first in, first out structure and the stack as a first in, last out structure. This article, however, uses the concept of First Come, First Served rather than First In, First Out, and translating it as "first in, first out" did not seem quite appropriate. By association with lining up at the checkout in a shopping mall, I have translated the term as "queuing order". I think anyone used to standing in line will understand what it means. Correspondingly, for the stack I could only coin "anti-queuing order" to stand for First Come, Last Served. I hope readers can suggest a better translation to replace my clumsy wording. Why not translate it as "first in, first out" and "first in, last out"? Mainly because of the English word "served", which has a broad meaning: at the very least it covers the processing of data, not simply its output. So I have simply avoided translating the word itself.
"Anti-queuing order": the stack data structure
The Queue data structure implements the "queuing order" mechanism by using a circular array of objects internally, and provides the Enqueue() and Dequeue() methods for accessing the data. "Queuing order" is frequently used to model real-world problems, especially in services such as web servers, print queues, and other programs that handle multiple requests. Another ordering commonly used in programs is "First Come, Last Served". The stack is a data structure that works this way. The .NET Framework base class library contains the System.Collections.Stack class. Like the Queue, the Stack is implemented on top of an internal circular array of objects. A Stack exposes its data through two methods: Push(item), which pushes the item onto the stack, and Pop(), which removes the item from the top of the stack and returns its value.
A Stack can be pictured as a vertical pile of data elements. When an element is pushed onto the stack, it is placed on top of all the other elements, and when an element is popped, it is removed from the top of the stack. The following two figures illustrate pushing and popping. First the values 1, 2, and 3 are pushed onto the stack in that order, and then they are popped off.
Figure 5: The stack after three elements have been pushed
Figure 6: The stack after all elements have been popped
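The push and pop sequence in the two figures can be reproduced with the Stack class. This is a small sketch for illustration; StackDemo and PopAll are hypothetical names, while Push(), Pop(), Peek(), and Count are members of System.Collections.Stack.

```csharp
using System;
using System.Collections;

public static class StackDemo
{
    // Pops everything and reports the order in which items came off the stack.
    public static string PopAll(Stack stack)
    {
        string order = "";
        while (stack.Count > 0)
            order += stack.Pop() + " ";
        return order.Trim();
    }

    public static void Main()
    {
        Stack stack = new Stack();
        stack.Push(1);
        stack.Push(2);
        stack.Push(3);

        Console.WriteLine(stack.Peek());    // 3 (the top element, not removed)
        Console.WriteLine(PopAll(stack));   // 3 2 1
    }
}
```

The last element pushed is the first one popped, the reverse of the order a Queue would produce.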
Note that the default capacity of the Stack class is 10 elements, not 32 as with the Queue. As with the Queue and the ArrayList, the initial capacity of a Stack can be specified through its constructor. Like the ArrayList, a Stack automatically doubles its capacity when needed. (Recall that the Queue lets you set the growth factor through an optional constructor parameter.)
Note: A stack is often referred to as a LIFO (Last In, First Out) data structure. Real life offers many examples of queues: lines at the DMV (translator's note: I do not know what this abbreviation stands for), print job processing, and so on. Real-life examples of stacks are harder to find, yet the stack is a very important data structure in a wide variety of applications.
Consider a programming language we use every day, for example C#. When a C# program executes, the CLR (Common Language Runtime) uses a stack to keep track of function calls (translator's note: the original text says "function"; as I understand it, this is not limited to functions, and many compilers use a stack to determine return addresses). Whenever a function is called, the relevant information is pushed onto the stack. When the call ends, that information is popped off the stack. The top of the stack holds the information for the function currently being executed. (To see the function call stack in action, create a project in Visual Studio .NET, set a breakpoint, and start debugging. When execution reaches the breakpoint, the call stack is displayed in the debug window under Debug/Windows/Call Stack.)
Limitations of ordinal indexing
In Part 1, we saw that the defining features of an array are that it is a collection of data of the same type and that it is indexed by ordinal position. That is, the time to access the i-th element is constant. (Recall that constant time is written O(1).)
We rarely know the ordinal position of the data we are looking for, however. Consider an employee database in which each employee is uniquely identified by a social security number. A social security number has the form DDD-DD-DDDD, where D is a digit from 0 to 9. If all the employee records are stored in an array in arbitrary order, finding the employee with number 111-22-3333 may require traversing every element of the array, an O(n) operation. A better approach is to sort the array by social security number, which reduces the lookup time to O(log n). Ideally, though, we would like to find an employee's information in O(1) time. One solution is to create a giant array with an entry for every possible social security number. Such an array would run from 000-00-0000 to 999-99-9999, as shown in Figure 7.
Figure 7: A giant array with an entry for every 9-digit number
As the figure shows, each employee record includes the name, telephone number, salary, and so on, and is indexed by social security number. With this scheme, the time to access any employee's information is constant. The disadvantage is the wasted space: there are 10^9, or 1 billion, different social security numbers. If the company has only 1,000 employees, the array uses only 0.0001% of its space. (To put this in perspective, your company would have to employ a large share of the world's population to make full use of the array.)
Compressing ordinal indexes with a hash function
Clearly, creating a 1 billion element array to store information about 1,000 employees is unacceptable. Yet we still want constant-time data access. One option is to shrink the range of social security numbers by using only the last four digits of each employee's number. The array then only needs to span 0000 to 9999. Figure 8 shows the compressed array.
Figure 8: The array after compression
This solution provides constant-time access while using storage far more efficiently. The choice of the last four digits is arbitrary: we could just as well use the middle four digits, or the 1st, 3rd, 8th, and 9th digits.
Converting the 9-digit number into a 4-digit number in this way is called hashing. Hashing compresses an index space into a hash table.
A hash function carries out the hashing. In the social security number example, the hash function H() is expressed as:
H(x) = the last four digits of x
The input to the hash function can be any nine-digit social security number, and the result is the last four digits of that number. In mathematical terms, this conversion of nine-digit numbers to four-digit numbers is called a mapping, as shown in Figure 9.
Figure 9: Diagram of the hash function
Figure 9 illustrates a behavior exhibited by hash functions: collisions. When we map the elements of a relatively large set onto a relatively small set, some elements will map to the same value. For example, every social security number whose last four digits are 0000 maps to 0000. So 000-99-0000, 113-14-0000, 933-66-0000, and many others all hash to 0000.
What happens if we want to add an employee with the social security number 123-00-0191? Attempting to add that employee causes a collision, because there is already an employee at position 0191.
Mathematical note: In mathematical terms, a hash function is described as f: A -> B. Because |A| > |B|, the function f cannot be one-to-one, so collisions must occur.
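The hash function H(x) and the collisions it produces are easy to try out. The following is a minimal sketch; the class name SsnHash is hypothetical, and the input is assumed to be a well-formed number in the DDD-DD-DDDD format.

```csharp
using System;

public static class SsnHash
{
    // H(x) = the last four digits of x, for a number of the form DDD-DD-DDDD.
    // Assumes the input is well formed; no validation is performed.
    public static int H(string ssn)
    {
        string digits = ssn.Replace("-", "");
        return int.Parse(digits.Substring(digits.Length - 4));
    }

    public static void Main()
    {
        Console.WriteLine(H("111-22-3333"));   // 3333

        // Different social security numbers that map to the same position:
        Console.WriteLine(H("000-99-0000") == H("113-14-0000"));   // True
        Console.WriteLine(H("113-14-0000") == H("933-66-0000"));   // True
    }
}
```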
Clearly, collisions cause problems. In the next section we look at the relationship between hash functions and collisions and briefly examine several mechanisms for handling collisions. After that, we turn to the System.Collections.Hashtable class, which provides a hash table implementation. We will look at the hash function used by the Hashtable class, its collision resolution mechanism, and some examples of using a Hashtable.
Collision avoidance and resolution
When we add data to a hash table, a collision disrupts the whole operation. Without a collision, the insertion succeeds immediately. With a collision, corrective work is required. Because that work raises the cost of the operation, our goal is to keep collisions to a minimum.
How often a hash function produces collisions depends on the distribution of the data passed to it. In our example, assuming social security numbers are assigned randomly, using the last four digits is a good choice. If, however, social security numbers were assigned according to the employee's year or place of birth, which are clearly not uniformly distributed, the last four digits would repeat frequently and cause a large number of collisions.
Note: A thorough analysis of hash functions requires some knowledge of statistics and is beyond the scope of this article. In essence, to avoid collisions we want a hash function that, for a hash table with k slots, maps a random value from the hash function's domain to any particular slot with probability 1/k. (If this confuses you further, don't worry!)
Choosing an appropriate hash function is known as collision avoidance. Much research has been devoted to it, because the choice of hash function directly affects the overall performance of the hash table. In the next section we describe the hash function used by the Hashtable class in the .NET Framework.
There are many ways to handle collisions. The most direct approach, known as collision resolution, is to place the object being inserted into a different slot of the hash table, because its intended slot is already taken. One of the simplest such methods is called linear probing, and works as follows:
1. When a new element is to be inserted, use the hash function to locate its position in the hash table.
2. Check whether that position is already occupied. If it is empty, insert the element and return; otherwise go to step 3.
3. If the position is i, check whether i + 1 is empty. If it is occupied, check i + 2, and so on, until an empty position is found.
For example, suppose we insert five employees into the hash table: Alice (333-33-1234), Bob (444-44-1234), Cal (555-55-1237), Danny (000-00-1235), and Edward (111-00-1235). After they have been added, the hash table looks as shown in Figure 10.
Figure 10: Five employees with similar social security numbers
Alice's social security number is "hashed" (used here as a verb, translator's note) to 1234, so she is stored at position 1234. Bob's social security number also hashes to 1234, but position 1234 is already occupied by Alice, so Bob is placed in the next position, 1235. Next, Cal is added. His hash value is 1237, and position 1237 is empty, so Cal is placed there. Then comes Danny, whose hash value is 1235. Position 1235 is occupied, so we check whether position 1236 is empty. It is, so Danny is placed there. Finally, Edward is added. His hash value is also 1235. Position 1235 is occupied, as are 1236 and 1237, so probing continues to 1238, which is empty, and Edward is placed at position 1238.
Collisions also affect searching the hash table. Suppose, with the table as shown above, we want to look up Edward. We hash Edward's social security number, 111-00-1235, to 1235 and start searching there. At 1235, however, we find Bob, not Edward. So we check 1236 and find Danny. The linear search continues until we either find Edward or reach an empty position. If we reach an empty position, we can conclude that no employee with the social security number 111-00-1235 exists.
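The insertion and search steps of linear probing can be sketched as follows, using the five employees from the example. The class LinearProbingTable and its methods are hypothetical names, and wrapping around from position 9999 to 0000 is an assumption the article does not discuss.

```csharp
using System;

public class LinearProbingTable
{
    // Positions 0000 to 9999; a null entry marks an empty position.
    private readonly string[] ssns = new string[10000];
    private readonly string[] names = new string[10000];

    // Hash function from the example: the last four digits of the number.
    private static int Hash(string ssn)
    {
        return int.Parse(ssn.Substring(ssn.Length - 4));
    }

    // Inserts an employee and returns the position finally used.
    public int Insert(string ssn, string name)
    {
        int slot = Hash(ssn);
        for (int probes = 0; probes < ssns.Length; probes++)
        {
            if (ssns[slot] == null)
            {
                ssns[slot] = ssn;
                names[slot] = name;
                return slot;
            }
            slot = (slot + 1) % ssns.Length;   // try i + 1 (wrapping is an assumption)
        }
        throw new InvalidOperationException("Table is full");
    }

    // Returns the employee's name, or null if an empty position is reached first.
    public string Find(string ssn)
    {
        int slot = Hash(ssn);
        for (int probes = 0; probes < ssns.Length; probes++)
        {
            if (ssns[slot] == null) return null;
            if (ssns[slot] == ssn) return names[slot];
            slot = (slot + 1) % ssns.Length;
        }
        return null;
    }

    public static void Main()
    {
        LinearProbingTable table = new LinearProbingTable();
        Console.WriteLine(table.Insert("333-33-1234", "Alice"));    // 1234
        Console.WriteLine(table.Insert("444-44-1234", "Bob"));      // 1235
        Console.WriteLine(table.Insert("555-55-1237", "Cal"));      // 1237
        Console.WriteLine(table.Insert("000-00-1235", "Danny"));    // 1236
        Console.WriteLine(table.Insert("111-00-1235", "Edward"));   // 1238
        Console.WriteLine(table.Find("111-00-1235"));               // Edward
    }
}
```

Storing the full social security number next to each name is what lets Find() tell Edward apart from Bob and Danny, who sit in the positions probed before his.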
Although linear probing is simple, it is not a good collision resolution strategy, because it leads to clustering. Suppose the first 10 employees we add all have social security numbers ending in 3344. Then the 10 consecutive positions from 3344 to 3353 are occupied. Finding any of these 10 employees requires searching through this cluster. Moreover, any employee added later with a hash value between 3344 and 3353 makes the cluster longer. For fast lookups, the data should be evenly distributed rather than clustered around a few places.
A better probing technique is quadratic probing, in which the step size grows quadratically with each probe. That is, if position s is occupied, we first check s + 1^2, then s - 1^2, s + 2^2, s - 2^2, s + 3^2, and so on, instead of stepping linearly through s + 1, s + 2, ... as linear probing does. Quadratic probing can still lead to similar clustering, however.
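The order in which quadratic probing examines positions can be listed with a short sketch. QuadraticProbe and Sequence are hypothetical names, and wrapping the result into the table with a modulus is an assumption not covered in the text.

```csharp
using System;

public static class QuadraticProbe
{
    // Returns the first `count` positions examined after s itself:
    // s + 1^2, s - 1^2, s + 2^2, s - 2^2, s + 3^2, ...
    public static int[] Sequence(int s, int count, int tableSize)
    {
        int[] slots = new int[count];
        for (int n = 0; n < count; n++)
        {
            int step = n / 2 + 1;   // 1, 1, 2, 2, 3, ...
            int offset = (n % 2 == 0) ? step * step : -(step * step);
            // Wrapping into the table is an assumption; the extra "+ tableSize"
            // is needed because % can return a negative value in C#.
            slots[n] = ((s + offset) % tableSize + tableSize) % tableSize;
        }
        return slots;
    }

    public static void Main()
    {
        // prints 3345, 3343, 3348, 3340, 3353
        Console.WriteLine(string.Join(", ", Sequence(3344, 5, 10000)));
    }
}
```

Unlike linear probing, consecutive probes land progressively farther from the original position instead of filling the neighboring positions one by one.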
In the next section we introduce a third collision resolution mechanism, rehashing, which is the one used by the Hashtable class in the .NET Framework.
The System.Collections.Hashtable class
The .NET Framework base class library includes an implementation of a hash table in the Hashtable class. To add an element to the hash table, we must supply not only the element (the item) but also a key for that element. Both the key and the item can be of any type. In the employee example, the key would be the employee's social security number. Items are added to the hash table with the Add() method.
To retrieve an item from the hash table, you index by key, just as you would index an array by ordinal position. The following short C# program demonstrates this concept. It adds some elements to a hash table using string values as keys, and then accesses a specific element by its key.
using System;
using System.Collections;
public class HashtableDemo
{
    private static Hashtable ages = new Hashtable();
    public static void Main()
    {
        // Add some values to the Hashtable, indexed by a string key
        ages.Add("Scott", 25);
        ages.Add("Sam", 6);
        ages.Add("Jisun", 25);
        // Access a particular key
        if (ages.ContainsKey("Scott"))
        {
            int scottsAge = (int) ages["Scott"];
            Console.WriteLine("Scott is " + scottsAge.ToString());
        }
        else
            Console.WriteLine("Scott is not in the hash table...");
    }
}
The ContainsKey() method used in the program takes a key and returns a Boolean value indicating whether an element with that key exists. The Hashtable class also has a Keys property, which returns a collection of all the keys used in the hash table. This property can be enumerated as follows:
// Step through items in the Hashtable
foreach (string key in ages.Keys)
    Console.WriteLine("Value at ages[\"" + key + "\"] = " + ages[key].ToString());
Note that the order in which elements were inserted and the order of the keys in the Keys collection are not necessarily the same. The Keys collection is ordered according to the positions in which the keys are stored. The output of the code above is:
Value at ages["Jisun"] = 25
Value at ages["Scott"] = 25
Value at ages["Sam"] = 6
This is the case even though the elements were inserted in the order Scott, Sam, Jisun.
The hash function in the Hashtable class
The hash function in the Hashtable class is somewhat more complicated than the social security number hash described earlier. First, remember that a hash function must return an ordinal value. That was easy in the social security number example, because a social security number is already a number: we only needed to take the last four digits to obtain a suitable hash value. The Hashtable class, however, accepts a value of any type as a key. In the example above the keys are strings, such as "Scott" or "Sam". Naturally, we want to understand how the hash function converts a string into a number.
This conversion is made possible by the GetHashCode() method, which is defined in the System.Object class. The default implementation of GetHashCode() in the Object class returns a unique integer that is guaranteed not to change during the lifetime of the object. Because every type derives from Object, directly or indirectly, every object has access to this method. So a string, or a value of any other type, can be represented as a number.
The hash function in the Hashtable class is defined as follows:
H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))] % hashsize
Here GetHash(key) is, by default, the value returned by calling the key's GetHashCode() method (although when using a Hashtable you can supply a custom GetHash() function). GetHash(key) >> 5 takes the hash value and shifts it 5 bits to the right, which is equivalent to dividing the hash value by 32. The % operator is the modulus operator introduced earlier. hashsize is the total number of slots in the hash table. Because of the final modulus, the result H(key) is always between 0 and hashsize - 1. Since hashsize is the number of slots in the hash table, the result always falls within the acceptable range.
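To see the formula at work, here is a minimal sketch that evaluates H(key) for a given hash code. It illustrates the formula as stated in this article and is not the actual source of the Hashtable class; the names HashFunctionSketch and H are hypothetical, and the hash code is assumed to be non-negative.

```csharp
using System;

public static class HashFunctionSketch
{
    // H(key) = [GetHash(key) + 1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))] % hashsize
    // `hash` stands in for GetHash(key) and is assumed to be non-negative.
    public static int H(int hash, int hashsize)
    {
        long increment = 1 + (((hash >> 5) + 1) % (hashsize - 1));
        return (int)((hash + increment) % hashsize);   // long arithmetic avoids overflow
    }

    public static void Main()
    {
        // Worked example: 1000 >> 5 = 31, (31 + 1) % 10 = 2, (1000 + 1 + 2) % 11 = 2
        Console.WriteLine(H(1000, 11));   // 2

        // For a real key, clear the sign bit first (an assumption of this sketch).
        int hash = "Scott".GetHashCode() & 0x7FFFFFFF;
        int slot = H(hash, 11);
        Console.WriteLine(slot >= 0 && slot < 11);   // True
    }
}
```

The slot computed for "Scott" is not printed, because string hash codes can differ between .NET versions and even between runs.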
Collision resolution in the Hashtable class
A collision can occur whenever we add an element to a hash table or retrieve one from it. When inserting an element, we must find an empty position. When retrieving an element, we must be able to find it even if it is not in its expected position. Earlier we briefly described two collision resolution mechanisms, linear probing and quadratic probing. The Hashtable class uses a quite different technique called rehashing (some sources refer to it as double hashing).
Rehashing works as follows. There is a set of hash functions, H1 ... Hn. When we add an element to the hash table or retrieve one from it, we first use the hash function H1. If that leads to a collision, we try H2, and so on up to Hn. The hash functions are all very similar and differ only in a multiplicative factor. In general, the hash function Hk is defined as follows:
Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize
Note: With rehashing, it is important that after hashsize probes every position in the hash table has been visited exactly once. That is, for a given key, Hi and Hj must not hash to the same position when i and j differ. The rehashing formula used by the Hashtable class guarantees this provided that the value of (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1))) and hashsize are relatively prime. (Two numbers are relatively prime if they share no common factor other than 1.) The two values are guaranteed to be relatively prime if hashsize is a prime number.
Rehashing avoids collisions far better than either of the first two mechanisms.
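The claim that a prime hashsize lets the probe sequence visit every position exactly once can be checked experimentally. The sketch below evaluates the Hk formula from this article for k = 1 ... hashsize and counts the distinct positions reached; RehashSketch, Hk, and DistinctSlots are hypothetical names, and the hash code is assumed to be non-negative.

```csharp
using System;
using System.Collections.Generic;

public static class RehashSketch
{
    // Hk(key) = [GetHash(key) + k * (1 + (((GetHash(key) >> 5) + 1) % (hashsize - 1)))] % hashsize
    // `hash` stands in for GetHash(key) and is assumed to be non-negative.
    public static int Hk(int hash, int k, int hashsize)
    {
        long increment = 1 + (((hash >> 5) + 1) % (hashsize - 1));
        return (int)((hash + k * increment) % hashsize);
    }

    // Counts the distinct positions visited by H1 ... Hhashsize.
    public static int DistinctSlots(int hash, int hashsize)
    {
        HashSet<int> seen = new HashSet<int>();
        for (int k = 1; k <= hashsize; k++)
            seen.Add(Hk(hash, k, hashsize));
        return seen.Count;
    }

    public static void Main()
    {
        // hashsize 11 is prime: the increment is 3, and all 11 positions are visited.
        Console.WriteLine(DistinctSlots(1000, 11));   // 11

        // hashsize 10 is not prime: the increment is 6, which shares the factor 2
        // with 10, so only half of the positions are ever visited.
        Console.WriteLine(DistinctSlots(1000, 10));   // 5
    }
}
```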
Load factors and expanding the hash table
The Hashtable class contains a private member variable, loadFactor, that specifies the maximum ratio of the number of elements in the hash table to the total number of slots. For example, with a loadFactor of 0.5, at most half of the slots in the hash table hold elements, and the remaining half are empty.
The Hashtable constructor is overloaded to let the user specify a loadFactor value in the range 0.1 to 1.0. Note, however, that whatever value you supply is scaled down so that it never exceeds 72%. Even if you pass in 1.0, the actual loadFactor of the Hashtable is 0.72. Microsoft found the optimal load factor to be 0.72, so although the default loadFactor is 1.0, it is automatically changed to 0.72 internally. It is therefore recommended that you use the default value of 1.0 (which is really 0.72; a little confusing, isn't it?).
Note: I spent several days asking Microsoft developers why this automatic conversion is used. I could not understand why they did not simply require a value between 0.072 and 0.72. Eventually I received an answer from the development team behind the Hashtable class, who kindly explained the reasoning. Their tests showed that a load factor above 0.72 seriously degrades the performance of the hash table. They wanted developers to get good results from the hash table, but felt that developers might not remember such an odd number. If 1.0 is the optimal value, on the other hand, it is much easier to remember. The result is the present design: a small sacrifice that makes the data structure more convenient to use.
When a new element is added to the Hashtable, a check ensures that the ratio of elements to slots does not exceed the maximum ratio. If it would, the hash table is expanded, in two steps:
1. The number of slots in the hash table is approximately doubled. More precisely, the number of slots grows from its current prime value to a larger prime value. (Recall from the discussion of rehashing that the number of slots in the hash table must be a prime number.)
2. Because, with rehashing, the hash value of every element depends on the number of slots in the hash table, all the values in the table must be rehashed (since the number of slots was increased in step 1).
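The expansion check and the choice of a new size can be sketched as follows. This is an illustration only: the rule used here, taking the smallest prime that is at least twice the current size, is an assumption and is not confirmed by the article as the way the Hashtable class picks its sizes. All names are hypothetical.

```csharp
using System;

public static class ExpansionSketch
{
    private static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
            if (n % d == 0) return false;
        return true;
    }

    // Assumed growth rule: the smallest prime at least twice the current size.
    public static int NextSize(int currentSize)
    {
        int candidate = currentSize * 2;
        while (!IsPrime(candidate))
            candidate++;
        return candidate;
    }

    // True when adding one more element would exceed the maximum ratio.
    public static bool NeedsExpansion(int count, int size, double loadFactor)
    {
        return count + 1 > size * loadFactor;
    }

    public static void Main()
    {
        Console.WriteLine(NextSize(11));                  // 23
        Console.WriteLine(NeedsExpansion(6, 11, 0.72));   // False, since 7 <= 7.92
        Console.WriteLine(NeedsExpansion(7, 11, 0.72));   // True, since 8 > 7.92
    }
}
```

After the size changes, every stored element would have to be inserted again, because each position computed by the hash function depends on the number of slots.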
Fortunately, the Add() method of the Hashtable class hides these complicated steps, so you do not need to concern yourself with them.
The load factor affects both the overall size of the hash table and the number of probes required when a collision occurs. The higher the load factor, the denser the hash table: less space is wasted, but more probes are needed on a collision than in a sparser table. Without going into a rigorous analysis, the expected number of probes when a collision occurs is approximately 1 / (1 - lf), where lf is the load factor.
As mentioned earlier, Microsoft sets the default load factor of the hash table to 0.72. Therefore, for each collision, about 3.5 probes are needed on average. Because this number does not depend on the actual number of elements in the hash table, the asymptotic access time of the hash table is O(1), which is clearly far better than the O(n) of searching an array.
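The effect of the load factor on the expected number of probes follows directly from the approximation 1 / (1 - lf). The short sketch below evaluates it for a few load factors; ProbeEstimate and ExpectedProbes are hypothetical names.

```csharp
using System;

public static class ProbeEstimate
{
    // Expected number of probes when a collision occurs: 1 / (1 - lf).
    public static double ExpectedProbes(double loadFactor)
    {
        return 1.0 / (1.0 - loadFactor);
    }

    public static void Main()
    {
        foreach (double lf in new double[] { 0.25, 0.5, 0.72, 0.9 })
        {
            Console.WriteLine("lf = " + lf + ": about "
                + ExpectedProbes(lf).ToString("F2") + " probes");
        }
    }
}
```

For lf = 0.72 the estimate is 1 / 0.28, or roughly 3.57, which matches the figure of about 3.5 probes quoted above. At lf = 0.9 the estimate rises to 10, which shows why a very dense table is slow.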
Finally, keep in mind that expanding a hash table comes at a cost in performance. You should therefore estimate in advance the total number of elements your hash table will hold and pass a suitable value when constructing the hash table, to avoid unnecessary expansions.