Data Structures Study Notes: Sorting

xiaoxiao · 2021-03-06

The routines below are exercised by the test program that follows. All of them sort an array; the one that uses a static linked list applies to sorting a real linked list as well. For simplicity only a single key code is sorted, and the final result is arranged in ascending order from head to tail. Here is the unified test program:

#include <iostream>
#include <iomanip>
#include <cstdlib>
#include <ctime>
#include <cmath>
using namespace std;
#include "insertsort.h"

#define Random(num)  (rand() % (num))
#define Randomize()  srand((unsigned)time(NULL))
#define N 10000              // number of elements to sort
#define Sort InsertSort      // sorting method under test

class Timer                  // unit: ms
{
public:
    void Start() { start_t = clock(); }
    clock_t Time() { return clock() - start_t; }
private:
    clock_t start_t;
};

int KCN, RMN;
Timer timer;

void Test(int a[])
{
    timer.Start();
    Sort(a, N, KCN, RMN);
    cout << "\tTimeSpared: " << timer.Time() << "ms" << endl;
    cout << "KCN=" << left << setw(11) << KCN;
    cout << "KCN/N=" << left << setw(11) << (double)KCN / N;
    cout << "KCN/N^2=" << left << setw(11) << (double)KCN / N / N;
    cout << "KCN/NlogN=" << left << setw(11)
         << (double)KCN / N / log((double)N) * log(2.0) << endl;
    cout << "RMN=" << left << setw(11) << RMN;
    cout << "RMN/N=" << left << setw(11) << (double)RMN / N;
    cout << "RMN/N^2=" << left << setw(11) << (double)RMN / N / N;
    cout << "RMN/NlogN=" << left << setw(11)
         << (double)RMN / N / log((double)N) * log(2.0) << endl;
}

int main()
{
    int i;
    // Randomize();  // left out so every algorithm is compared on the same input
    int *ascending  = new int[N];   // ascending sequence
    int *descending = new int[N];   // descending sequence
    int *randomness = new int[N];   // random sequence
    for (i = 0; i < N; i++) ascending[i]  = i;
    for (i = 0; i < N; i++) descending[i] = N - i;
    for (i = 0; i < N; i++) randomness[i] = i;
    for (i = 0; i < N; i++) swap(randomness[i], randomness[Random(N)]);
    cout << "Sort ascending  N=" << N; Test(ascending);
    cout << "Sort randomness N=" << N; Test(randomness);
    cout << "Sort descending N=" << N; Test(descending);
    return 0;
}

One point needs explaining: KCN (the number of key-code comparisons) and RMN (the number of record moves) are not part of the algorithms themselves. They are instrumentation, added so the performance of each algorithm can be judged intuitively rather than only through the textbook formulas. Sorting ten thousand or more integers should be a convincing enough test. I recommend not pushing the record count much higher: first, the worst cases would take too long to finish; worse, the move counter may overflow, wrecking the run.
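Throughout these routines the counters are threaded into the conditions themselves: since ++KCN is always nonzero, writing "++KCN && a < b" bumps the comparison counter without changing the condition's truth value. A minimal sketch of the idiom (countingMax is my own illustration, not part of the article's harness):

```cpp
#include <cassert>

// Illustration of the "++KCN &&" counting idiom used by the sorts below.
// countingMax is a hypothetical helper, not part of the test harness.
int countingMax(int a[], int n, int &KCN)
{
    KCN = 0;
    int best = a[0];
    for (int i = 1; i < n; i++)
        if (++KCN && a[i] > best)   // ++KCN is nonzero, so only a[i] > best decides
            best = a[i];
    return best;
}
```

The counter ends up equal to exactly the number of times the comparison was evaluated, with no separate statement cluttering the loop body.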

Insertion sorting. The basic idea of insertion sort is to take each record to be sorted and insert it into the proper position among the already-sorted records in front of it, comparing from the tail of the sorted part toward its head. Direct insertion sorting:

template <class T>
void InsertSort(T a[], int n, int &KCN, int &RMN)
{
    KCN = 0; RMN = 0;
    for (int i = 1; i < n; i++)
    {
        if (++KCN && a[i] < a[i - 1])      // out of order, must insert
        {
            T temp = a[i]; RMN++;
            int j;
            for (j = i - 1; j >= 0 && ++KCN && temp < a[j]; j--)
            {
                a[j + 1] = a[j]; RMN++;    // shift the record backward
            }
            a[j + 1] = temp; RMN++;
        }
    }
}

Here is the same routine after streamlining:

template <class T>
void InsertSort(T a[], int n, int &KCN, int &RMN)
{
    KCN = 0; RMN = 0;
    for (int i = 1; i < n; i++)
    {
        T temp = a[i]; RMN++;
        int j;
        for (j = i - 1; j >= 0 && ++KCN && temp < a[j]; j--)
        {
            a[j + 1] = a[j]; RMN++;
        }
        a[j + 1] = temp; RMN++;
    }
}

Test Results:

Sort ascending  N=10000  TimeSpared: 0ms
KCN=9999      KCN/N=0.9999   KCN/N^2=9.999e-005   KCN/NlogN=0.07525
RMN=19998     RMN/N=1.9998   RMN/N^2=0.00019998   RMN/NlogN=0.1505
Sort randomness N=10000  TimeSpared: 330ms
KCN=24293730  KCN/N=2429.37  KCN/N^2=0.242937     KCN/NlogN=182.829
RMN=24303739  RMN/N=2430.37  RMN/N^2=0.243037     RMN/NlogN=182.904
Sort descending N=10000  TimeSpared: 711ms
KCN=49995000  KCN/N=4999.5   KCN/N^2=0.49995      KCN/NlogN=376.25
RMN=50014998  RMN/N=5001.5   RMN/N^2=0.50015      RMN/NlogN=376.4

As can be seen, the average performance is approximately N^2/4, exactly as the books promise (no surprise: that figure was itself distilled from countless tests like this one). Binary insertion sorting. This method is obtained by replacing the sequential search in direct insertion sort with a binary search. Obviously this can only reduce KCN, not RMN, so the performance improvement is limited.

template <class T>
void BinaryInsertSort(T a[], int n, int &KCN, int &RMN)
{
    KCN = 0; RMN = 0;
    for (int i = 1; i < n; i++)
    {
        T temp = a[i]; RMN++;
        int low = 0, high = i - 1;
        while (low <= high)                 // binary search for the insert position
        {
            int mid = (low + high) / 2;
            if (++KCN && temp < a[mid]) high = mid - 1;
            else low = mid + 1;
        }
        for (int j = i - 1; j >= low; j--)
        {
            a[j + 1] = a[j]; RMN++;         // shift the records backward
        }
        a[low] = temp; RMN++;               // insert
    }
}

Test Results:

Sort ascending  N=10000  TimeSpared: 0ms
KCN=123617    KCN/N=12.3617  KCN/N^2=0.00123617  KCN/NlogN=0.930311
RMN=19998     RMN/N=1.9998   RMN/N^2=0.00019998  RMN/NlogN=0.1505
Sort randomness N=10000  TimeSpared: 320ms
KCN=118987    KCN/N=11.8987  KCN/N^2=0.00118987  KCN/NlogN=0.895466
RMN=24303739  RMN/N=2430.37  RMN/N^2=0.243037    RMN/NlogN=182.904
Sort descending N=10000  TimeSpared: 631ms
KCN=113631    KCN/N=11.3631  KCN/N^2=0.00113631  KCN/NlogN=0.855158
RMN=50014998  RMN/N=5001.5   RMN/N^2=0.50015     RMN/NlogN=376.4

As can be seen, KCN now approximates N*log2(N), a certain performance improvement. Table insertion sorting. If "pointers" (links) are used to indicate the order among the records, most record movement can be avoided; of course the records must still be physically rearranged according to the links at the end. Naturally, binary search cannot be used here.

template <class T>
void TableInsertSort(T a[], int n, int &KCN, int &RMN)
{
    KCN = 0; RMN = 0;
    int *link = new int[n];
    int head = 0, pre, cur, i;
    link[0] = -1;
    for (i = 1; i < n; i++)
    {
        if (++KCN && a[head] > a[i])       // becomes the new head of the list
        {
            link[i] = head;
            head = i;
        }
        else
        {
            for (cur = head; cur != -1 && ++KCN && a[cur] <= a[i]; cur = link[cur])
                pre = cur;                 // stop after the last record <= a[i]
            link[pre] = i;
            link[i] = cur;
        }
    }
    cur = head;                            // rearrange into physical order
    for (i = 0; i < n - 1; i++)
    {
        while (cur < i) cur = link[cur];   // follow forwarding addresses
        int next = link[cur];
        if (cur != i)
        {
            swap(a[i], a[cur]); RMN += 3;
            link[cur] = link[i];
            link[i] = cur;                 // leave a forwarding address
        }
        cur = next;
    }
    delete[] link;
}

Test Results:

Sort ascending  N=10000  TimeSpared: 751ms
KCN=49995000  KCN/N=4999.5   KCN/N^2=0.49995     KCN/NlogN=376.25
RMN=0         RMN/N=0        RMN/N^2=0           RMN/NlogN=0
Sort randomness N=10000  TimeSpared: 621ms
KCN=25721250  KCN/N=2572.13  KCN/N^2=0.257213    KCN/NlogN=193.572
RMN=29955     RMN/N=2.9955   RMN/N^2=0.00029955  RMN/NlogN=0.225434
Sort descending N=10000  TimeSpared: 0ms
KCN=9999      KCN/N=0.9999   KCN/N^2=9.999e-005  KCN/NlogN=0.07525
RMN=15000     RMN/N=1.5      RMN/N^2=0.00015     RMN/NlogN=0.112886

As can be seen, RMN is indeed reduced; theoretically RMNmax = 3(N-1). On average, however, the running time is no better, and that is because the test objects are plain integers, which are as cheap to move as the links themselves. For a linked list this method needs no final rearrangement at all. The rearrangement algorithm is described in detail in Yan Weimin's "Data Structure (C language version)".
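The linked-list remark can be made concrete. The sketch below is my own illustration (not the author's routine) over a bare singly linked node type; it shows that on a real list the sort is pure pointer surgery, with no record movement and no final rearrangement:

```cpp
#include <cstddef>

// Hypothetical illustration: insertion sort on a singly linked list.
// Records never move; only the next pointers are rewired.
struct Node { int key; Node *next; };

Node *ListInsertSort(Node *head)
{
    Node *sorted = NULL;                       // result list, ascending
    while (head)
    {
        Node *cur = head; head = head->next;   // detach the first node
        Node **p = &sorted;
        while (*p && (*p)->key <= cur->key)    // find the insert position
            p = &(*p)->next;
        cur->next = *p; *p = cur;              // relink; no data copied
    }
    return sorted;
}
```

Using a pointer-to-pointer for the scan means the head insertion and the middle insertion are the same statement, with no special case.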

Shell sorting. Direct insertion sort does not have good average performance, but note two facts: it is at its best when the keys are already nearly in order, and when the number of keys is small the gap between N and N^2 is not so pronounced. Based on these facts, D. L. Shell proposed diminishing-increment sorting in 1959 (a genuine antique). The basic idea: take an interval (gap), divide the sequence into subsequences of elements gap apart, and insertion-sort each subsequence; then gradually shrink the gap, repeating the process, until the gap is 1. At the beginning each subsequence contains very few keys, so insertion is efficient; as the gap shrinks the subsequences grow longer, but their keys are already roughly in order, so direct insertion remains efficient. The time complexity of Shell sort is hard to estimate, and there is no settled conclusion on choosing the gap sequence; the program with gap = gap/2 is the easiest to write, which is the version used here.

template <class T>
void ShellSort(T a[], int n, int &KCN, int &RMN)
{
    KCN = 0; RMN = 0;
    for (int gap = n / 2; gap; gap /= 2)
        for (int i = gap; i < n; i++)
        {
            T temp = a[i]; RMN++;
            int j;
            for (j = i; j >= gap && ++KCN && temp < a[j - gap]; j -= gap)
            {
                a[j] = a[j - gap]; RMN++;   // shift within the subsequence
            }
            a[j] = temp; RMN++;
        }
}

Test Results:

Sort ascending  N=10000  TimeSpared: 0ms
KCN=120005  KCN/N=12.0005  KCN/N^2=0.00120005  KCN/NlogN=0.903128
RMN=240010  RMN/N=24.001   RMN/N^2=0.0024001   RMN/NlogN=1.80626
Sort randomness N=10000  TimeSpared: 10ms
KCN=258935  KCN/N=25.8935  KCN/N^2=0.00258935  KCN/NlogN=1.94868
RMN=383849  RMN/N=38.3849  RMN/N^2=0.00383849  RMN/NlogN=2.88875
Sort descending N=10000  TimeSpared: 10ms
KCN=172578  KCN/N=17.2578  KCN/N^2=0.00172578  KCN/NlogN=1.29878
RMN=302570  RMN/N=30.257   RMN/N^2=0.0030257   RMN/NlogN=2.27707

Note that the timings here are no longer trustworthy: sorting 10,000 integers barely registers (a newer machine would probably show 0ms everywhere; on mine the differences are already lost in the noise). The test is therefore repeated with 100,000 integers:

Sort ascending  N=100000  TimeSpared: 140ms
KCN=1500006  KCN/N=15.0001  KCN/N^2=0.000150001  KCN/NlogN=0.903094
RMN=3000012  RMN/N=30.0001  RMN/N^2=0.000300001  RMN/NlogN=1.80619
Sort randomness N=100000  TimeSpared: 230ms
KCN=4041917  KCN/N=40.4192  KCN/N^2=0.000404192  KCN/NlogN=2.43348
RMN=5598883  RMN/N=55.9888  RMN/N^2=0.000559888  RMN/NlogN=3.37086
Sort descending N=100000  TimeSpared: 151ms
KCN=2244585  KCN/N=22.4459  KCN/N^2=0.000224459  KCN/NlogN=1.35137
RMN=3844572  RMN/N=38.4457  RMN/N^2=0.000384457  RMN/NlogN=2.31466

This result shows that Shell sort has no really bad cases: ascending, descending, or random, the times differ little, and the extra storage is O(1). It is genuinely a good choice until you learn quick sort, and it remains a good one afterwards; I use it all the time.
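Since the gap sequence has no settled answer, it is easy to experiment. The sketch below is my own variant (not from the article) using Knuth's sequence h = 1, 4, 13, 40, ... (h = 3h+1), another commonly recommended choice; the counters are omitted to keep it short:

```cpp
// Hedged sketch, my own: Shell sort with Knuth's gap sequence.
template <class T>
void ShellSortKnuth(T a[], int n)
{
    int gap = 1;
    while (gap < n / 3) gap = 3 * gap + 1;   // 1, 4, 13, 40, ...
    for (; gap >= 1; gap /= 3)               // integer division walks back down
        for (int i = gap; i < n; i++)
        {
            T temp = a[i];
            int j;
            for (j = i; j >= gap && temp < a[j - gap]; j -= gap)
                a[j] = a[j - gap];           // shift within the subsequence
            a[j] = temp;
        }
}
```

Dropping it into the harness in place of ShellSort (after restoring the KCN/RMN bookkeeping) is a quick way to compare gap sequences on the same inputs.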

Exchange sorting. The basic idea of exchange sorting: compare the keys of records pairwise, and whenever an inversion is found, exchange the pair, until no inversions remain. Bubble sorting. Bubble sort compares adjacent records and exchanges them when they are out of order, which makes the records with small keys float up layer by layer, hence the name. The CSDN forum once debated whether the two Chinese renderings of the name denote different algorithms; that is a disaster created by translators. The English name is bubble sort, and either rendering is fine. (Yan Weimin's version bubbles from back to front, Yin Renkun's version from front to back; both books are right to translate it as bubble sort, or you end up with the conclusion some people reached, that these are two different sorts, one working from the front and one from the back.)

template <class T>
void BubbleSort(T a[], int n, int &KCN, int &RMN)
{
    KCN = 0; RMN = 0;
    bool exchange = true;
    for (int i = 1; i < n && exchange; i++)
    {
        exchange = false;
        for (int j = n - 1; j >= i; j--)
            if (++KCN && a[j - 1] > a[j])
            {
                swap(a[j - 1], a[j]);
                exchange = true; RMN += 3;
            }
    }
}

Note that you should not write it as below, even though the result is correct (the exchange flag is gone, so a pass over an already-sorted sequence no longer terminates the sort early):

template <class T>
void BubbleSort2(T a[], int n, int &KCN, int &RMN)
{
    KCN = 0; RMN = 0;
    for (int i = 1; i < n; i++)
        for (int j = n - 1; j >= i; j--)
            if (++KCN && a[j - 1] > a[j])
            {
                swap(a[j - 1], a[j]); RMN += 3;
            }
}

Test Results:

Sort ascending  N=10000  TimeSpared: 0ms
KCN=9999       KCN/N=0.9999   KCN/N^2=9.999e-005  KCN/NlogN=0.07525
RMN=0          RMN/N=0        RMN/N^2=0           RMN/NlogN=0
Sort randomness N=10000  TimeSpared: 1161ms
KCN=45409094   KCN/N=4540.91  KCN/N^2=0.454091    KCN/NlogN=341.737
RMN=71526984   RMN/N=7152.7   RMN/N^2=0.71527     RMN/NlogN=538.294
Sort descending N=10000  TimeSpared: 1022ms
KCN=49995000   KCN/N=4999.5   KCN/N^2=0.49995     KCN/NlogN=376.25
RMN=149985000  RMN/N=14998.5  RMN/N^2=1.49985     RMN/NlogN=1128.75

As can be seen, the efficiency is very poor, worse even than insertion sort; I really do not know why this algorithm is taught everywhere. Is it a stepping stone to understanding quick sort? There is another interesting phenomenon: although the descending case has larger KCN and RMN than the random case, it takes less time. That is the CPU pipeline at work (the branch pattern is perfectly predictable), and it is a signal that a really good algorithm needs to exploit the hardware's characteristics. With a much larger record count (N=1000000) you can also see that on completely ordered input bubbling beats insertion, since no records need to be moved then. Quick sorting. The name really wrongs every other algorithm: it describes nothing about the method (unlike "insertion" or "table insertion"), nor is it named after its inventor like Shell sort. Is it just because it is fast? Perhaps "quick" is the highest honor a sorting algorithm can earn. The basic idea: take one record of the sequence as the pivot and, according to its key, divide the whole sequence into two subsequences: every record on the left is smaller than (or equal to) the pivot, every record on the right is larger, and the pivot sits between the two, which is obviously its final position. Repeat the procedure on the left and right subsequences until every record is in place. The routine below is not entirely easy to follow, because it has been through several rounds of improvement:

template <class T>
int Partition(T a[], int left, int right, int &KCN, int &RMN)
{
    int pivotpos = left;
    T pivot = a[left]; RMN++;              // the pivot
    for (int i = left + 1; i <= right; i++)
        if (++KCN && a[i] < pivot && ++pivotpos != i)
        {
            swap(a[pivotpos], a[i]); RMN += 3;
        }
    a[left] = a[pivotpos]; a[pivotpos] = pivot; RMN += 2;
    return pivotpos;
}

The computed pivot position is returned as the function value to avoid a useless reference parameter during recursion. When you decide to use recursion, pay attention to this: move everything that can be moved outside the recursive call. Note how this function achieves "everything left of the pivot is smaller than it, everything right of it is bigger."

template <class T>
void QSRecurve(T a[], int left, int right, int &KCN, int &RMN)
{
    if (left < right)
    {
        int pivotpos = Partition(a, left, right, KCN, RMN);
        QSRecurve(a, left, pivotpos - 1, KCN, RMN);
        QSRecurve(a, pivotpos + 1, right, KCN, RMN);
    }
}

template <class T>
void QuickSort(T a[], int n, int &KCN, int &RMN)
{
    KCN = 0; RMN = 0;
    QSRecurve(a, 0, n - 1, KCN, RMN);
}

These two are really just wrappers, especially the second. Test results:

Sort ascending  N=10000  TimeSpared: 1051ms
KCN=49995000  KCN/N=4999.5   KCN/N^2=0.49995     KCN/NlogN=376.25
RMN=29997     RMN/N=2.9997   RMN/N^2=0.00029997  RMN/NlogN=0.22575
Sort randomness N=10000  TimeSpared: 0ms
KCN=155655    KCN/N=15.5655  KCN/N^2=0.00155655  KCN/NlogN=1.17142
RMN=211851    RMN/N=21.1851  RMN/N^2=0.00211851  RMN/NlogN=1.59434
Sort descending N=10000  TimeSpared: 1082ms
KCN=49995000  KCN/N=4999.5   KCN/N^2=0.49995     KCN/NlogN=376.25
RMN=29997     RMN/N=2.9997   RMN/N^2=0.00029997  RMN/NlogN=0.22575

As can be seen, the average performance is excellent, but at the two extremes it is no better than direct insertion. The N=100000 case follows (remember to comment out the ascending and descending tests first, and do not come looking for me when it "crashes" otherwise):

Sort randomness N=100000  TimeSpared: 110ms
KCN=2123221  KCN/N=21.2322  KCN/N^2=0.000212322  KCN/NlogN=1.27831
RMN=3010848  RMN/N=30.1085  RMN/N^2=0.000301085  RMN/NlogN=1.81271

It really is very "quick", but its worst case is genuinely ugly, and because the recursion lives on the stack, that worst case does not merely crawl, it can crash outright. To curb this tendency, one improvement is "median of three": take the median of the first, last, and middle keys of the subsequence as the pivot. Only the Partition function needs to change.
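The crash risk itself can also be removed outright, independent of pivot choice. The sketch below is my own (not the article's routine): it recurses only into the smaller partition and loops on the larger one, so the recursion depth is bounded by O(logN) even on sorted input (the comparison count there is still O(N^2) with a first-element pivot; counters omitted):

```cpp
#include <algorithm>

// Hedged sketch, my own: quick sort with bounded recursion depth.
// Recurse into the smaller side, iterate on the larger side.
static int PartitionFirst(int a[], int left, int right)
{
    int pivot = a[left], pivotpos = left;
    for (int i = left + 1; i <= right; i++)
        if (a[i] < pivot && ++pivotpos != i)
            std::swap(a[pivotpos], a[i]);
    std::swap(a[left], a[pivotpos]);
    return pivotpos;
}

void QuickSortShallow(int a[], int left, int right)
{
    while (left < right)
    {
        int p = PartitionFirst(a, left, right);
        if (p - left < right - p)
        {
            QuickSortShallow(a, left, p - 1);   // smaller side: recurse
            left = p + 1;                       // larger side: loop
        }
        else
        {
            QuickSortShallow(a, p + 1, right);
            right = p - 1;
        }
    }
}
```

Since the recursed-into side is always at most half the range, at most log2(N) frames can ever be live at once, whatever the input order.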

template <class T>
int Partition(T a[], int left, int right, int &KCN, int &RMN)
{
    int mid = (left + right) / 2;
    // move the median of a[left], a[mid], a[right] into a[left]
    if (++KCN && a[left] > a[mid])
    {
        if (++KCN && a[left] > a[right])        // a[left] is the largest
        {
            if (++KCN && a[mid] > a[right]) { swap(a[mid], a[left]); RMN += 3; }
            else { swap(a[right], a[left]); RMN += 3; }
        }                                        // otherwise a[left] is already the median
    }
    else                                         // a[left] <= a[mid]
    {
        if (++KCN && a[mid] < a[right]) { swap(a[mid], a[left]); RMN += 3; }
        else if (++KCN && a[left] < a[right]) { swap(a[right], a[left]); RMN += 3; }
    }
    int pivotpos = left;
    T pivot = a[left]; RMN++;
    for (int i = left + 1; i <= right; i++)
        if (++KCN && a[i] < pivot && ++pivotpos != i)
        {
            swap(a[pivotpos], a[i]); RMN += 3;
        }
    a[left] = a[pivotpos]; a[pivotpos] = pivot; RMN += 2;
    return pivotpos;
}

Sort ascending  N=10000  TimeSpared: 0ms
KCN=131343    KCN/N=13.1343  KCN/N^2=0.00131343  KCN/NlogN=0.988455
RMN=35424     RMN/N=3.5424   RMN/N^2=0.00035424  RMN/NlogN=0.266592
Sort randomness N=10000  TimeSpared: 0ms
KCN=154680    KCN/N=15.468   KCN/N^2=0.0015468   KCN/NlogN=1.16408
RMN=204093    RMN/N=20.4093  RMN/N^2=0.00204093  RMN/NlogN=1.53595
Sort descending N=10000  TimeSpared: 280ms
KCN=12517506  KCN/N=1251.75  KCN/N^2=0.125175    KCN/NlogN=94.2036
RMN=45006     RMN/N=4.5006   RMN/N^2=0.00045006  RMN/NlogN=0.338704

Below are the N=100000 results; the descending case is still embarrassing, but at least it now gets through:

Sort ascending  N=100000  TimeSpared: 60ms
KCN=1665551  KCN/N=16.6555  KCN/N^2=0.000166555  KCN/NlogN=1.00276
RMN=393210   RMN/N=3.9321   RMN/N^2=3.9321e-005  RMN/NlogN=0.236736
Sort randomness N=100000  TimeSpared: 110ms
KCN=1888590  KCN/N=18.8859  KCN/N^2=0.000188859  KCN/NlogN=1.13704
RMN=2659857  RMN/N=26.5986  RMN/N^2=0.000265986  RMN/NlogN=1.60139
Sort descending N=100000  TimeSpared: 42120ms
KCN=1250175006  KCN/N=12501.8  KCN/N^2=0.125018      KCN/NlogN=752.68
RMN=450006      RMN/N=4.50006  RMN/N^2=4.50006e-005  RMN/NlogN=0.270931

However, all those statements are actually no better than simply "picking one at random": for example, replace the median-of-three selection above with the following statement:

swap(a[left], a[rnd(right - left + 1) + left]); RMN += 3;

Test Results:

Sort ascending  N=100000  TimeSpared: 90ms
KCN=1917756  KCN/N=19.1776  KCN/N^2=0.000191776  KCN/NlogN=1.1546
RMN=378810   RMN/N=3.7881   RMN/N^2=3.7881e-005  RMN/NlogN=0.228066
Sort randomness N=100000  TimeSpared: 120ms
KCN=1979189  KCN/N=19.7919  KCN/N^2=0.000197919  KCN/NlogN=1.19159
RMN=3175977  RMN/N=31.7598  RMN/N^2=0.000317598  RMN/NlogN=1.91213
Sort descending N=100000  TimeSpared: 110ms
KCN=2069369  KCN/N=20.6937  KCN/N^2=0.000206937  KCN/NlogN=1.24588
RMN=2574174  RMN/N=25.7417  RMN/N^2=0.000257417  RMN/NlogN=1.54981

As can be seen, the efficiency in the descending case takes a qualitative leap. The random function has to be written by hand here, because the library rand() only outputs values up to 0x7FFF; with 32-bit integers the worst outcome of naively scaling it up would be a negative index. A not-very-rigorous random function follows; its maximum output is the largest positive 32-bit integer:

int rnd(int n)
{
    static __int64 x;                        // zero-initialized; good enough here
    x = (2053 * x + 13849) % 0x7FFFFFFF;     // linear congruential step
    return (int)(x % n);
}

Selection sorting. The basic idea: select the i-th smallest record and put it at position i (i starts at 0, so the "0th smallest" is actually the smallest, a little awkward, but never mind). When i reaches N-1 the sort is done. Direct selection sorting. Direct selection sort reproduces this idea in the simplest way: the cost of finding the minimum element is O(N), and without any special treatment, searching naively each time naturally gives a sorting time complexity of O(N^2).

template <class T>
void SelectSort(T a[], int n, int &KCN, int &RMN)
{
    KCN = 0; RMN = 0;
    for (int i = 0; i < n - 1; i++)
    {
        int min = i;
        for (int j = i + 1; j < n; j++)
            if (++KCN && a[j] < a[min]) min = j;   // remember the smallest so far
        if (min != i) { swap(a[i], a[min]); RMN += 3; }
    }
}

Test Results:

Sort ascending  N=10000  TimeSpared: 721ms
KCN=49995000  KCN/N=4999.5  KCN/N^2=0.49995     KCN/NlogN=376.25
RMN=0         RMN/N=0       RMN/N^2=0           RMN/NlogN=0
Sort randomness N=10000  TimeSpared: 711ms
KCN=49995000  KCN/N=4999.5  KCN/N^2=0.49995     KCN/NlogN=376.25
RMN=29955     RMN/N=2.9955  RMN/N^2=0.00029955  RMN/NlogN=0.225434
Sort descending N=10000  TimeSpared: 711ms
KCN=49995000  KCN/N=4999.5  KCN/N^2=0.49995     KCN/NlogN=376.25
RMN=15000     RMN/N=1.5     RMN/N^2=0.00015     RMN/NlogN=0.112886

As can be seen, KCN is fixed at N(N-1)/2. Another interesting thing: the run with RMN=0 still takes slightly more time than the runs whose RMN approaches 3(N-1). Partly the timer resolution is to blame (results on my machine routinely float by 10ms), and partly KCN's N(N-1)/2 simply dwarfs RMN's 3(N-1), so the moves hardly matter.

Tournament (champion) sorting

Direct selection shows that the bottleneck of the algorithm is KCN, yet the time complexity of each subsequent minimum search can in fact be reduced to O(logN). The most direct way is the tournament method: after the champion is taken away, replay only the matches the champion played, and the "champion" of the remaining players emerges; the number of matches replayed equals the depth of the competition tree. When actually writing it, do not do it stupidly: the naive version not only wastes a lot of memory but also performs useless comparisons. I have never seen a routine I was satisfied with (the one in the Yin book disgusts me most), I cannot be bothered to write one, and I will not embarrass myself here (in truth this is laziness: once quick sort exists, most people lose interest in every other internal sort, radix sort excepted). When you are bored, you may want to write (or improve) a tournament sort to pass the time, ^_^. Heap sorting
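For the bored reader, here is one way the replay idea can look. This is my own hedged sketch, not a routine from any of the textbooks mentioned here: a winner tree over a power-of-two number of leaves, with INT_MAX standing for a retired player (so it assumes every key is strictly below INT_MAX), and it favors clarity over the memory frugality complained about above:

```cpp
#include <vector>
#include <limits>
#include <algorithm>

// Hedged sketch of tournament (champion) sort with a winner tree.
// Assumes all keys are strictly less than INT_MAX.
void TournamentSort(int a[], int n)
{
    const int INF = std::numeric_limits<int>::max();
    int leaves = 1;
    while (leaves < n) leaves *= 2;              // round up to a power of two
    std::vector<int> tree(2 * leaves, INF);
    for (int i = 0; i < n; i++) tree[leaves + i] = a[i];
    for (int i = leaves - 1; i >= 1; i--)        // play the opening round
        tree[i] = std::min(tree[2 * i], tree[2 * i + 1]);
    for (int k = 0; k < n; k++)
    {
        a[k] = tree[1];                          // the current champion
        int i = 1;
        while (i < leaves)                       // descend to the champion's leaf
            i = (tree[2 * i] == tree[i]) ? 2 * i : 2 * i + 1;
        tree[i] = INF;                           // the champion retires
        for (i /= 2; i >= 1; i /= 2)             // replay only its matches
            tree[i] = std::min(tree[2 * i], tree[2 * i + 1]);
    }
}
```

Each extraction replays log2(leaves) matches, so the comparisons come to O(NlogN); the price is the 2*leaves extra integers of storage, which is exactly the objection raised above.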

Tournament sort needs too much additional storage, while a heap produces maximums or minimums with high efficiency, and that is the plot here. A max-heap is used, with the space of the records themselves serving as the heap's space; the top record (the current maximum) is exchanged with the last record of the heap, and as the heap gradually shrinks to 1, the records end up sorted. Obviously the time complexity is O(NlogN), and there is no bad worst case.

template <class T>
void FilterDown(T a[], int i, int n, int &KCN, int &RMN)
{
    int child = 2 * i + 1;
    T temp = a[i]; RMN++;
    while (child < n)
    {
        if (child + 1 < n && ++KCN && a[child] < a[child + 1])
            child++;                            // take the larger child
        if (++KCN && temp >= a[child]) break;   // no adjustment needed, done
        a[i] = a[child]; RMN++;
        i = child; child = 2 * i + 1;
    }
    a[i] = temp; RMN++;
}

template <class T>
void HeapSort(T a[], int n, int &KCN, int &RMN)
{
    int i;
    KCN = 0; RMN = 0;
    for (i = (n - 2) / 2; i >= 0; i--)
        FilterDown(a, i, n, KCN, RMN);          // build the max-heap
    for (i = n - 1; i > 0; i--)
    {
        swap(a[0], a[i]); RMN += 3;
        FilterDown(a, 0, i, KCN, RMN);
    }
}

Test results, this time testing N=100000 directly:

Sort ascending  N=100000  TimeSpared: 110ms
KCN=1556441  KCN/N=15.5644  KCN/N^2=0.000155644  KCN/NlogN=0.937071
RMN=2000851  RMN/N=20.0085  RMN/N^2=0.000200085  RMN/NlogN=1.20463
Sort randomness N=100000  TimeSpared: 160ms
KCN=3047006  KCN/N=30.4701  KCN/N^2=0.000304701  KCN/NlogN=1.83448
RMN=3898565  RMN/N=38.9857  RMN/N^2=0.000389857  RMN/NlogN=2.34717
Sort descending N=100000  TimeSpared: 90ms
KCN=4510383  KCN/N=45.1038  KCN/N^2=0.000451038  KCN/NlogN=2.71552
RMN=5745960  RMN/N=57.4596  RMN/N^2=0.000574596  RMN/NlogN=3.45943

The overall performance is very good, the additional storage is O(1), and the worst case is not bad at all; if you really cannot afford a worst case, the heap is a genuinely good choice. Once again there is a run where larger KCN and RMN cost less time; a 70ms measurement error is impossible, so the effect of CPU-level optimization is clearly visible here (probably cache-related).

