Data Structure Learning (C++), Continued: Sorting [6], Internal Sorting Summary

zhaozj  2021-02-16

I will leave radix sort for later, since I think it is somewhat different from the sorting algorithms covered so far. The previous posts introduced four families of sorting methods (insertion, exchange, selection, and merge), each with a basic and an improved version. For internal sorting, what we care about most is of course speed, which is why quicksort is so popular. Given quicksort's weaknesses, we may sometimes use heapsort or Shell sort instead.

That is probably the most direct way to think about choosing a sorting method (our choices are not wide anyway: the basic versions, the improved versions, and hybrids that mix them). With a gambler's mindset, how many of us will actually hit the worst case? Surely I'm not that unlucky? Besides, there is median-of-three pivot selection (or random pivot selection). Still, speed is not everything; we also have to consider whether a sort is stable.

I did not discuss stability earlier, mainly because for some algorithms it hardly matters and I didn't want the question to distract readers; besides, the earlier tests could not show stability. Now let's cover it all together. A sort is stable if records with equal keys keep their original relative order after sorting; otherwise it is unstable. What this really means is that, for multi-key data, stability determines whether the results of successive sorts accumulate, that is, whether several sorting passes can finally produce the expected order. For example, first sort a list of students by student ID (in real applications the initial list is usually already in this order, so we don't need to sort it), then sort the result by score. We want students with the same score to stay in student-ID order (the expected order). If the sort by score is unstable, it will destroy the ID order established earlier, and we won't get the expected final sequence. Note the words "expected order": in the score example, if we don't require students with the same score to be ordered by ID, it doesn't matter whether the sort is stable.
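To make the student example concrete, here is a minimal C++ sketch. The Student struct and the function name are hypothetical, not from the original series. It relies on std::stable_sort, which the C++ standard guarantees keeps equal elements in their original relative order; std::sort gives no such guarantee.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record for the example: a student ID and a score.
struct Student {
    int id;
    int score;
};

// Sort by score (highest first). Because std::stable_sort preserves the
// relative order of equal elements, an input already ordered by id stays
// ordered by id within each group of equal scores.
void sortByScoreKeepingIdOrder(std::vector<Student>& v) {
    std::stable_sort(v.begin(), v.end(),
                     [](const Student& a, const Student& b) {
                         return a.score > b.score;  // strict comparison only
                     });
}
```

With input ordered by id, {1, 80}, {2, 90}, {3, 80}, {4, 90}, {5, 70}, the result lists ids 2, 4, 1, 3, 5: equal scores stay in id order.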

So which sorting algorithms are stable? Let's first look at what causes instability. Note that each of the four families above contains a stable algorithm (this is an established result; don't ask me to prove it ^_^), so the sorting ideas themselves should not be what introduces instability. Sorting necessarily moves records (or modifies pointers), and the ways of moving them are shifting (straight insertion sort), exchanging (bubble sort), and rearranging (list insertion, merging). Look carefully and you'll find that exchanging records at non-adjacent positions is what causes instability. Therefore every sorting algorithm with this hidden danger is unstable, and even an otherwise stable algorithm becomes unstable if it adopts such an exchange strategy. For example, straight selection sort is stable on a linked list but unstable on an array; however, if the array version shifts elements instead of exchanging them, it is stable on arrays too (though presumably nobody wants to replace a single exchange with shifting a whole block of elements).
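The difference between exchanging and shifting shows up in a small sketch of straight selection sort on an array, written both ways. The Rec type and the function names are made up for illustration; the shifting version uses std::rotate to move the minimum into place.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Record with a sort key and a tag that identifies the original element.
struct Rec {
    int key;
    char tag;
};

// Straight selection sort with the usual exchange: the minimum is swapped
// into place, which can jump a record over others with an equal key.
void selectionSortSwap(std::vector<Rec>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t m = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j].key < a[m].key) m = j;  // keep the first minimum
        std::swap(a[i], a[m]);               // non-adjacent exchange
    }
}

// Same selection, but a[i..m-1] is shifted one step right and the minimum
// drops into slot i (std::rotate), so equal keys keep their order.
void selectionSortShift(std::vector<Rec>& a) {
    for (std::size_t i = 0; i + 1 < a.size(); ++i) {
        std::size_t m = i;
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[j].key < a[m].key) m = j;
        std::rotate(a.begin() + i, a.begin() + m, a.begin() + m + 1);
    }
}
```

For input 2a, 2b, 1c the swapping version yields 1c, 2b, 2a (unstable), while the shifting version yields 1c, 2a, 2b. The shift costs up to O(n) moves per step instead of one swap.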

In addition, if the key comparison of an otherwise stable algorithm is changed, for example from "greater than" to "greater than or equal to", records with equal keys may be moved past each other, and the stable algorithm becomes unstable. But that kind of low-level mistake is outside the scope of this discussion: deliberately breaking stability buys no performance, so who would bother?
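Here is a small sketch of that comparison slip in straight insertion sort. The allowEqual flag is a hypothetical switch that exists only to show the effect of writing >= instead of >.

```cpp
#include <cstddef>
#include <vector>

struct Rec {
    int key;
    char tag;
};

// Straight insertion sort. With strict '>' the scan stops at an equal key,
// so equal records keep their input order (stable). allowEqual = true
// mimics the '>=' mistake: the new record is moved in front of equal keys.
void insertionSort(std::vector<Rec>& a, bool allowEqual = false) {
    for (std::size_t i = 1; i < a.size(); ++i) {
        Rec x = a[i];
        std::size_t j = i;
        while (j > 0 && (allowEqual ? a[j - 1].key >= x.key
                                    : a[j - 1].key > x.key)) {
            a[j] = a[j - 1];  // shift one position to the right
            --j;
        }
        a[j] = x;
    }
}
```

For input 1a, 1b, 0c the correct version gives 0c, 1a, 1b; the '>=' version gives 0c, 1b, 1a, with extra moves and no speed gain.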

Once you understand what stability really means, we can look at radix sort.

Radix sort

I was very surprised when I first heard of a sorting method that breaks the O(n log n) lower bound (that bound applies only to comparison-based sorting, and this method does not compare keys). When I actually read about it later, I found that we use it all the time in daily life without noticing. We have all played the "Twelve Months" card game: when you draw a 6, you put it in the 6th position (the 6th spot in the first row); if all goes well, you end up with 12 piles in order 1, 2, ..., 12, and the cards are sorted. Now consider sorting integers in the range 0 to 999, assuming there are no duplicates. The most direct approach is to create an array a[] of size 1000: put 1 in a[1], put 400 in a[400], and so on; after placing every number, scan from a[0] to a[999] and the sort is complete. Clearly this uses the best-case lookup performance of a hash table, O(1) per element, so sorting N distinct numbers in 0 to 999 this way takes O(N) time plus one fixed scan of the 1000 slots. When there are duplicates, the collision-handling method used here is chaining: all copies of the same number form a linked list hung at the corresponding slot. Obviously this process only rearranges list nodes, so it is stable (as long as each record is appended to the tail of its list, it is hard to get this wrong).
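The 0 to 999 example can be sketched in C++ as below. Using a std::vector per slot in place of a hand-written linked list is my own simplification, and the range is assumed rather than checked.

```cpp
#include <vector>

// Distribute-and-collect sort for integers assumed to lie in [0, 999].
// Each slot holds the values that landed there (the chaining idea);
// appending to the back keeps equal values in input order.
std::vector<int> bucketSort0to999(const std::vector<int>& in) {
    std::vector<std::vector<int>> slot(1000);
    for (int x : in) slot[x].push_back(x);     // distribute: O(N)
    std::vector<int> out;
    out.reserve(in.size());
    for (const auto& s : slot)                 // collect: 1000 slots
        out.insert(out.end(), s.begin(), s.end());
    return out;
}
```

For example, {400, 1, 999, 1, 0} comes back as {0, 1, 1, 400, 999}.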

Radix sort is built on this distribute-and-collect procedure. Asking whether radix sort is stable is actually a rather silly question, because radix sort only works if it is stable: its result is the accumulation of several single-key sorts, and if any step were unstable, the whole result would be wrong.

For a single key, you can either distribute on the whole key in one pass and then collect, in O(N + R) time, where R is the number of possible key values (the final collection pass alone costs O(R)); or, if that needs too much storage, you can split the key into several sub-keys (digits). Splitting a key into d digits of radix r brings the extra storage down from exponential, r^d slots, to r buckets (down, not up ^_^), at the natural cost of more distribute-and-collect passes. Because the most significant key determines the final order of the sequence, the distribution and collection on the most significant key must be done last; radix sort is therefore usually LSD (least significant digit first).
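A minimal LSD radix sort along these lines might look as follows, assuming non-negative integers, decimal digits, and a digit count supplied by the caller; the names are illustrative. Each pass costs O(N + 10), so d passes cost O(d(N + 10)).

```cpp
#include <vector>

// LSD radix sort: one distribute/collect pass per decimal digit, least
// significant first. Each pass is stable (values are appended to their
// bucket and collected in bucket order), which is what lets the later
// passes on higher digits preserve the order set by earlier ones.
void radixSortLSD(std::vector<unsigned>& a, int digits) {
    unsigned place = 1;
    for (int d = 0; d < digits; ++d, place *= 10) {
        std::vector<std::vector<unsigned>> bucket(10);
        for (unsigned x : a)
            bucket[(x / place) % 10].push_back(x);       // distribute
        a.clear();
        for (const auto& b : bucket)
            a.insert(a.end(), b.begin(), b.end());       // collect
    }
}
```

With digits = 3, {329, 457, 657, 839, 436, 720, 355} becomes {329, 355, 436, 457, 657, 720, 839}.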

Also, when you see radix sort, don't automatically think of splitting a number into hundreds, tens, and ones. Pay attention to the concept of the radix: you can decompose with whatever radix you like. For example, with radix 1000, every number below 1000 is a single digit. I won't give a routine, because the preconditions for radix sort are too restrictive.
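The post gives no routine, but to illustrate that the radix is a free choice, here is a hypothetical variant that takes the radix as a parameter (assumed to be at least 2) and derives the number of passes from the largest key.

```cpp
#include <algorithm>
#include <vector>

// Number of base-`radix` digits needed to write maxKey (at least one).
int digitsNeeded(unsigned maxKey, unsigned radix) {
    int d = 1;
    while (maxKey >= radix) {
        maxKey /= radix;
        ++d;
    }
    return d;
}

// LSD radix sort with a caller-chosen radix. Radix 1000 sorts 0..999 in
// a single pass; radix 10 needs three passes over the same keys.
void radixSortBase(std::vector<unsigned>& a, unsigned radix) {
    if (a.empty()) return;
    unsigned maxKey = *std::max_element(a.begin(), a.end());
    int passes = digitsNeeded(maxKey, radix);
    unsigned long long place = 1;
    for (int p = 0; p < passes; ++p, place *= radix) {
        std::vector<std::vector<unsigned>> bucket(radix);
        for (unsigned x : a)
            bucket[(x / place) % radix].push_back(x);    // distribute
        a.clear();
        for (const auto& b : bucket)
            a.insert(a.end(), b.begin(), b.end());       // collect
    }
}
```

A larger radix means fewer passes but more buckets per pass, which is exactly the storage-versus-passes trade-off described above.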

About external sorting

External sorting seems mysterious, but once you understand how merge sort works on segments (sort the pieces, then merge them), you'll see that such a task can be accomplished, and how to make it more efficient.
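The basic idea can be simulated in memory as a rough sketch: vectors stand in for files on disk, and memSize is an assumed memory capacity measured in records. Phase 1 produces sorted initial runs; phase 2 merges them two at a time.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Run = std::vector<int>;

// In-memory simulation of external merge sort. Real code would read and
// write runs to files instead of keeping them in vectors.
// mergePasses receives how many 2-way merge passes were performed.
Run externalSortSim(const Run& input, std::size_t memSize, int& mergePasses) {
    // Phase 1: cut the input into memory-sized chunks and sort each one.
    std::vector<Run> runs;
    for (std::size_t i = 0; i < input.size(); i += memSize) {
        std::size_t end = std::min(input.size(), i + memSize);
        Run r(input.begin() + i, input.begin() + end);
        std::sort(r.begin(), r.end());
        runs.push_back(std::move(r));
    }
    // Phase 2: merge runs pairwise until one remains. Each pass reads and
    // writes every record once, so fewer passes means less I/O.
    mergePasses = 0;
    while (runs.size() > 1) {
        std::vector<Run> next;
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
            Run m;
            std::merge(runs[i].begin(), runs[i].end(),
                       runs[i + 1].begin(), runs[i + 1].end(),
                       std::back_inserter(m));
            next.push_back(std::move(m));
        }
        if (runs.size() % 2 == 1) next.push_back(std::move(runs.back()));
        runs = std::move(next);
        ++mergePasses;
    }
    return runs.empty() ? Run{} : runs.front();
}
```

With 10 records and memSize = 3 there are 4 initial runs, so two merge passes are needed.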

When it comes to disks, we always assume that in-memory processing time is far smaller than read time. But current technology keeps shortening read times; in my own experience, reading 40 MB from a hard drive is fast (on my machine, sorting that many chars took 18 s, though perhaps my algorithm is just poorly written). Still, the key to speeding things up is to reduce unnecessary data traffic to slow devices, just as a cache sits in front of main memory and main memory in front of the hard disk. All the ways we can think of to speed up external sorting focus on reducing data traffic between memory and external storage. The techniques used include increasing the number of merge ways, increasing the length of the initial runs, using an optimal merge tree, and so on.
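For the first technique, more merge ways, a k-way merge picks the smallest head among k runs with a min-heap. With m initial runs, two-way merging needs about ceil(log2 m) passes over the data, while k-way merging needs about ceil(log_k m), so traffic to the slow device drops. The sketch below merges in-memory vectors (requires C++17); in a real external sort each run would be read from disk through a buffer.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// k-way merge of sorted runs using a min-heap keyed on each run's current
// head element; all runs are merged in a single pass.
std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
    // Heap entry: (value, run index, position within that run).
    using Entry = std::tuple<int, std::size_t, std::size_t>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);
    std::vector<int> out;
    while (!heap.empty()) {
        auto [v, r, i] = heap.top();  // smallest head among all runs
        heap.pop();
        out.push_back(v);
        if (i + 1 < runs[r].size()) heap.emplace(runs[r][i + 1], r, i + 1);
    }
    return out;
}
```

Merging {1, 4, 7}, {2, 5, 8}, and {0, 3, 6, 9} yields 0 through 9 in one pass.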

In practice, though, for anything beyond a thousand or so records we never manage the data ourselves; we all rely on a database. In other words, unless you write databases, you will probably never need this; and if you do write databases, the external sorting covered here is only a tiny fraction of what you need to know.

Compared with internal sorting, which is used everywhere, external sorting may not be a technique you must master. Its value lies in the ideas it offers for handling insufficient memory and for improving performance under that constraint. Without a realistic simulation of the disk, any routine I wrote would be unconvincing, so I won't embarrass myself by giving one.

Please cite the original address when reposting: https://www.9cbs.com/read-22957.html
