Sort

xiaoxiao2021-03-06  39

Improve "rapid sort" with "return"

[Time: 2003-11-2 Source: plainsong] Sort and search is the most commonly used algorithm we programmed, C programmers are lucky because there are various universal sort functions in the C standard library; The Delphi programmer is only TList.Sort and TstringList.Sort, so the Delphi programmer usually adds the sort function into its own common function unit. As a "general sort function", it is natural to choose in the "comparison" sort algorithm, and "rapid sort" is often the preferred choice for its efficiency advantage. But rapid sorting also has a problem that is often worried ... It is O (N2) in the worst case. Therefore, pessimistic programmers will usually use "heap sort" (in the "C Standard Library" book, the author recommends using Partial_Sort, because sort may be implemented with "quick sort" and Partial_Sort is usually used " Piles Sort by "). In order to "eat the fish and do not lose the bear's paw", many programmers are trying to improve "rapid sorting", so that it can be obtained in worst cases without affecting its excellent average performance. Can endure performance, the most successful thing should be "Intro Sort", and I am talking about another method. We know that the main idea of ​​"rapid sorting" is to recursively divide the sequence, divide the sequence into "large sequence" and "small sequence" in a pivot. If this division is average "one divided into two" each time, it will achieve the best performance; if each time is "deformed division" - divide the sequence into length 1 and N-1 Two parts, it degraded into the "bubble sort" of recursive achievements, and performance is imagined. To improve it, you should process this situation. A idea is to improve the division, so it does not generate "deformity division"; another idea is to check the division results, and special treatment if it is deformed. The method I here is the second. The basic idea is to check the length of the two subsequences generated by the division results. If one of the other is more than one limit, it is considered to be a "deformed division", which continues to use for shorter subsequences " Quick sorting ", and divide the longer sub-sequences into two subsequences separately sorted, and then consolidated again. The combination of two ordered sequences can be achieved as a linear time complexity, so that the time complexity of O (N * logn) can still be obtained at each time a deformity division. The basic algorithm is described as follows: Procedure Sort (data, size); begin m: = partition (data, size); moredata: = @data [m 1]; more: = size - m - 1; size: = m; if Size> m1n begin swap (moredata, data); end; sort (data, size); if more theize div Max_ratio> size dam sort (Moredata, Moresize Div 2); Sort (@MOREDATA [ Moresize Div 2], MORESize - MORESize Div 2); MERGE (MOREDATA, MORESize Div 2, MoreSize); Else Sort (moredata, more "; end; where partition is well known for" rapid sort "division, Merge (DATA, FIRST, SIZE) combines two sequences in DATA and sequence into a ordered sequence and stored in DATA.

The above algorithm believes that the value of the position m at the partition is divided, that is, the sequence can be divided into [0, m-1], [M, M, M] and [M 1, Size-1] three parts. If the implementation of Partition does not guarantee this, moredata should be DATA [M], and more, more, should also be Size - M. Now simply analyze this sort, it is best to partly divide each division at Partition, the merge will not occur, it is equivalent to "rapid sort". In the worst case, each partition is divided into two parts of length 1 and N-1, and it turns into a "second-way merger sort" that recursively realizes, but before each merge About N times comparisons, the role is only to separate an element does not participate in the merge. Therefore, the time complexity of this sort is still O (N * logn), and the number of comparisons is twice as sorting. Regarding the choice of MAX_RATIO, according to my experience, it should be that the number between 4-16 is better, I choose 8. I actually achieved more complicated than the above, one is to change the insertion sort in the sequence length of no greater than 16, and the other is to remove a recursive. Then make some efficient tests, the data is a string of 500,000 lengths of 10, which is approximately as follows: random distribution, 500,000 elements: sort: Time: 3.31; Comparative number: 10843175. Quicksort: Time: 3.28; Comparative number: 10936357. Introsort: Time: 3.35; Compaction Time: 10958355. Mergesort: Time: 4.20; Compaction Time: 13502620. Positive sequence distribution, 500,000 elements: sort: Time: 1.71; Compaction number: 8401712. Quicksort: Time: 1.91; Comparative number: 9262161. Introsort: Time: 1.80; Compaction Time: 8401712. Mergesort: Time: 1.72; Comparative number: 4766525. Inverse sequence distribution, 50,000 elements: sort: Time: 2.38; comparison number: 11737937. Quicksort: Time: 2.54; Comparative number: 12619014. Introsort: Time: 2.38; Comparative number: 11293745. Mergesort: Time: 1.69; Comparative number: 4192495. The same value, 50,000 elements: sort: Time: 1.41; Compaction number: 8401712. Quicksort: Time: 1.47; Comparative number: 9262161. Introsort: Time: 1.40; Comparative number: 8401712. Mergesort: Time: 1.43; Comparative number: 4766525. Waveform distribution (wavelength 1000), 500,000 elements: sort: Time: 2.52; Compaction number: 10658948. Quicksort: Time: 2.97; Compaction Time: 12971845. Introsort: Time: 3.02; Comparative number: 12672744.

Mergesort: Time: 2.71; Comparative number: 7978745. The peak-shaped distribution (the first half is positive, the second half is reversed in the first half), 500,000 elements: sort: Time: 2.42; Comparative number: 10401407. Introsort: Time: 5.13; Comparative number: 19211813. Mergesort: Time: 1.88; Comparative number: 5176855. Valley distribution (reversal of peak-shaped distribution), 500,000 elements: sort: Time: 2.29; Comparative number: 10944792. Introsort: Time: 5.29; Comparative number: 17898801. Mergesort: Time: 1.90; Comparative number: 5282136. Since this worse distribution is not very good, I have modified the partition function, making the positive sequence into its disadvantage, and then tested on the same machine: Positive order distribution, 500,000 elements: sort: Time : 2.77; Comparative number: 12011738. Introsort: Time: 4.31; Compaction Time: 19212426. Mergesort: Time: 1.73; Comparative number: 4766525. According to my analysis, the sort can be divided into two parts of Quicksort and Mergesort, and the time of the two spots is independent of each other, so I subtract this time with MerGesort, and then add the time of the random distribution of MergeSort time (Take 4.20), The result is 5.24, it should be almost almost. From this result, the worst situation of Introsort is basically almost similar. I would like to thank the 9CBS Leemars and ZHANYV, they made a lot of discussion on the sorting improvement and gave me a great help. Leemars offers me an excellent partition function. Full implementation of the code is given below: type TPointerList = array [0..32767] of Pointer; PPointerList = ^ TPointerList; PPointer = ^ Pointer; TLessThenProc = function (Left, Right: Pointer): Boolean; const SORT_MAX = 16; MAX_RATIO = 8; {********************************************************** ************************************ Functions: Partition Features: Divide a sequence into two subsequences, and the rear subsequences are not Big than the previous sub-sequence any value. Return subsequent sequence split index. Parameters: Data: PPointerList, source sequence. Size: integer, sequence length.

LESSTHEN: TLESSTHENPROC, is used to define a comparison function description: For "fast sort" pivot policy: the intermediate value of the value of 0, 0.5, 1 three, guarantee: a Result inevitable Not Lessthen (Data [A], DATA [Result]); *********************** *********************************************************** **} Function Partition (size): Integer; var m: integer; value: pointer; begin m: = size div 2; dec (size); if lessthen (data [0] , DATA [M]) THEN SWAP (Data [M], Data [0]); if Lessthen (Data [Size], Data [0]) THEN SWAP (Data [Size], Data [0]); if Lessthen Data [0], DATA [M]) THEN SWAP (Data [M], DATA [0]); Value: = Data [0]; Result: = 0; While Result

SIZEFIRST, SIZESECOND: INTEGER, two source sequence length LESSHEN: TLESSTHENPROC, used to define the comparison function of the order ************************************* *************************************************************} Procedure Merge SrcFirst, SrcSecond, Dest: PPointerList; SizeFirst, SizeSecond: Integer; LessThen: TLessThenProc); var I: Integer; IsFirst: Boolean; begin IsFirst: = True; if (SizeFirst = 0) or (LessThen (SrcSecond [0], SrcFirst [0]) The begin swap (Pointer (srcsecond)); Srcsecond); SWAP (SIZEFIRST, SIZESECOND); isfirst: = NOT ISFIRST; END; while sizefirst> 0 do begin if sizesecond = 0 THEN i: = sizefirst Else Begin I: = 0; While (I

Parameters: ppointerlist, ordered sequence, you must accommodate Size 1 element size: integer, the original sequence length value: newly inserted value LESSTHEN: TLESSTHENPROC, used to define the comparison function ******** *********************************************************** ***************} Procedure SortInsert; size: integer; value: Pointer; LESSTHEN: TLESSTHENPROC); VAR J: Integer; Begin If Lessthen (Value, Data [ 0]) THEN J: = 0 else Begin J: = size; while (j> 0) And lessthen (value, data [j - 1]) DO DEC (j); end; move (data [j], data [ J 1], SIZEOF (SIZE - J)); DATA [J]: = VALUE; END; {********************** *********************************************************** ** Function: MergePart Features: Two adjacent neighboring sequences in the sequence are combined into a sequence of sequences and stored in place. Parameters: Data: PPointerList, source sequence. Partsize: Integer, the first order length. Size: Integer, the total length of the sequence. Lessthen: TLESSTHENPROC, a comparison function for defining order Description: If the free space is sufficient, call the Merge implementation, otherwise call SortInsert. *********************************************************** ***********************} Procedure MergePart (Data: PpointerList; First: Integer; Size: Integer; Lessthen: TlesshenProc); VAR Buffer: ppointerlist; I: integer; begin buffer: = allocmem (size * sizeof (pointer)); if buffe <> nil dam move (data ^, buffer ^, size * sizeof (Pointer)); MERGE (@Buffer [0], @ Buffer [First], Data, Firsthen; FreeMem (Buffer); Else Begin Dec (Size); for i: = Partsize To Size Do SortInsert (Data, i, Data [i], lessthen) End; End; {*************************************************** ************************************** Functions: Insertionsort Features: Simple Insert Sort Parameters: Data: PPointerList, Source Sequence. Size: integer, sequence length.

转载请注明原文地址:https://www.9cbs.com/read-59958.html

New Post(0)