RLE compression and optimization

zhaozj2021-02-08  219

Simply put RLE compression is the purpose of converting a string of continuously identical data into a specific format to achieve compression. Both the BYTE stream is compressed. Such as input data LPBTE PBYTE = {1, 1, 1, 1, 1, 1}; compressed data is 6, 1 compressed 4 characters. However, it cannot be directly replaced in the data stream, but special control characters should be used, otherwise it cannot be decompressed. For example, PBYTE = {6, 1, 0, 1, 1, 1, 1, 1, 1}; there are two 6, 1 unable to determine whether the original 6, 1 is {1, 1, 1, 1, 1 , 1} Compressed code. So there should be control characters. (1) In order to achieve the maximum compression ratio, you can scan the source data stream first, and use the least appearance character to control the character. Such as PBYTE = {6, 1, 0, 1, 1, 1, 1, 1, 1, ...}; 0 to find 0 to the least appearance after scanning. We use 0 as a compressed control, and other characters represent him itself. 0 0 in the source data is represented by 0, 0. Then the PBYTE is compressed for 6, 1, 0, 0, 0, 6, 1 ... decompression, Byte A, B, C; A = sequentially scan compressed data, if the input character is non-control character, then directly Output to the decompression stream. If it is control character, b = whether the next character is also control character, if yes, the code is output to the control character. If not c = read compressed stream, then output B C to the output stream.

Note: The location of the codes of Ctrlcode requires yourself to calculate offset.

Such as ctrl = 2. Then n = 3 should be corrected as 2. The method just described is the maximum compression rate, but because checking each input character, the speed is not fast.

(2) In order to increase the decompression speed, other coding methods can be employed. The main method is not checked for each input character, and only the same compression ratio is reached less. Let's take a look at this improvement method. Carefully observe, in fact, the non-repetitive characters can also be represented by control N data. The n is tape N uncompressed data here. Still just the data. PBYTE = {6, 1, 0, 1, 1, 1, 1, 1} No scan selection 0 is control compression to 3, {6, 1, 0,} 0, 6, 1 n ctrl n m decompression It is very convenient to scan data read a character, {n = read; if (n) {character copy N} else {n = read (); m = read; Write (n m);}} (3) Optimization pair (1) Optimization. It is observed that the data compression ratio of 1, 1, 1 is 0, so it is not compressed when n <= 3 is used. The format directly written directly as 1, 1, and 1. Also, if there are multiple control characters to continuous. It can also be compressed. Ctrl = 0; 0,0,0,0,0,0,0,0,0,0,0,0,0,10,0,0,0 compressed encoded is 0, 4, 0, so that the control character is compressed. For (2): Optimize only compression coding. Examples 1, 2, 3, 4, 1, if the dead sleeve formula is 4, 1, 2, 3, 4, 0, 2, 1, add 2 bytes. If you use 6, 1, 2, 3, 4, 1, 1 to add only one byte.

转载请注明原文地址:https://www.9cbs.com/read-1980.html

New Post(0)