JPEG compression introduction

xiaoxiao, 2021-03-05

1. Color model

JPEG pictures use the YCbCr color model instead of RGB, the model most common on computers. Color models are not discussed at length here. The point is that YCbCr is better suited to image compression, because the eye is much more sensitive to changes in the luminance Y than to changes in the chrominance C.

We can store one 8-bit luminance value for every pixel but only one Cb and one Cr value for every 2x2 pixels, and to the naked eye the image hardly changes. A 2x2 block that takes 4 x 3 = 12 bytes in RGB now needs only 4 + 2 = 6 bytes, an average of 12 bits per pixel. The JPEG format can of course also record the C values for every pixel, but MPEG always stores 12 bits per pixel, which we abbreviate as YUV12.

[R G B] -> [Y Cb Cr] conversion
-------------------------------

    | Y  |   |  0.299    0.587    0.114  |   | R |   |   0 |
    | Cb | = | -0.1687  -0.3313   0.5    | * | G | + | 128 |
    | Cr |   |  0.5     -0.4187  -0.0813 |   | B |   | 128 |

    Y  =  0.299  * R + 0.587  * G + 0.114  * B          (luminance)
    Cb = -0.1687 * R - 0.3313 * G + 0.5    * B + 128
    Cr =  0.5    * R - 0.4187 * G - 0.0813 * B + 128

[Y Cb Cr] -> [R G B] conversion
-------------------------------

    R = Y + 1.402   * (Cr - 128)
    G = Y - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128)
    B = Y + 1.772   * (Cb - 128)

In general the C values (both Cb and Cr) are signed numbers, but here they are made unsigned by adding 128, because the data in JPEG is unsigned 8-bit.

2. DCT (discrete cosine transform)

When JPEG compresses data, the first step is a DCT. The principle of the DCT involves mathematics that we do not need to go into here. It is much the same as the Fourier transform (anyone who has studied advanced mathematics will know it). After this transform the regularities between neighboring pixels become visible, which makes the data easier to compress.

JPEG processes the picture in units of 8x8 pixels. If the width or height of the original picture is not a multiple of 8, it has to be padded up to a multiple of 8 so that it can be processed block by block. Also, remember that Cr and Cb are recorded once per 2x2 pixels? So in most cases the picture is padded to a whole number of 16x16 blocks. Blocks are processed from left to right and from top to bottom (the same order in which we write). JPEG applies the DCT to Y, Cr and Cb separately.

JPEG encoding uses the forward DCT (FDCT) and decoding uses the inverse DCT (IDCT):

    FDCT:
    F(u,v) = (C(u,v) / 4) * SUM(x=0..7) SUM(y=0..7) f(x,y)
                 * cos((2*x+1) * u * pi / 16) * cos((2*y+1) * v * pi / 16)
    u, v = 0, 1, ..., 7

    C(u,v) = C(u) * C(v), with C(k) = 1/sqrt(2) when k = 0 and 1 otherwise,
    so C(0,0) = 1/2

    IDCT:
    f(x,y) = (1/4) * SUM(u=0..7) SUM(v=0..7) C(u,v) * F(u,v)
                 * cos((2*x+1) * u * pi / 16) * cos((2*y+1) * v * pi / 16)
    x, y = 0, 1, ..., 7

This step is time-consuming. There is an optimized algorithm called AA&N (Arai, Agui and Nakajima) that you can look up on the Internet yourself. On the Intel home page you can find MMX-optimized code for the AA&N IDCT.

3. Reordering the DCT result

The DCT converts an 8x8 array into another 8x8 array. But all data in memory is stored linearly. If we stored these 64 numbers row by row, the point at the end of each row would have no relationship to the point at the start of the next row, so JPEG specifies that the 64 numbers are arranged in the following order:

     0,  1,  5,  6, 14, 15, 27, 28,
     2,  4,  7, 13, 16, 26, 29, 42,
     3,  8, 12, 17, 25, 30, 41, 43,
     9, 11, 18, 24, 31, 40, 44, 53,
    10, 19, 23, 32, 39, 45, 52, 54,
    20, 22, 33, 38, 46, 51, 55, 60,
    21, 34, 37, 47, 50, 56, 59, 61,
    35, 36, 48, 49, 57, 58, 62, 63

This way, numbers that are adjacent in the sequence are also adjacent in the picture.

4. Quantization

The 64 spatial-frequency amplitudes obtained above are now quantized. The method is to divide each one by the corresponding value in the quantization table and round the result:

    for (i = 0; i <= 63; i++)
        /* divide and round to nearest; for negative quotients use floor(x + 0.5) */
        vector[i] = (int) (vector[i] / quantization_table[i] + 0.5);

Below is a standard JPEG quantization table (it is stored in the same zigzag order):

    16  11  10  16  24  40  51  61
    12  12  14  19  26  58  60  55
    14  13  16  24  40  57  69  56
    14  17  22  29  51  87  80  62
    18  22  37  56  68 109 103  77
    24  35  55  64  81 104 113  92
    49  64  78  87 103 121 120 101
    72  92  95  98 112 100 103  99

This table is based on psychovisual thresholds and gives good results for images with 8-bit luminance and 8-bit chrominance. Of course we can use any quantization table. Quantization tables are defined after the DQT marker of the JPEG file. Usually one is defined for the Y values and one for the C values.

The quantization table is the key to controlling the JPEG compression ratio. This step removes some high-frequency content, so a lot of detail is lost. But in fact the human eye is far less sensitive to high spatial frequencies than to low ones, so the visual loss after this step is small. Another important reason is that every picture contains gradual color transitions, so a large part of the image information lies in the low spatial frequencies.

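The color conversion formulas of section 1 can be sketched in C roughly as follows. This is only a minimal illustration: the function names (`rgb_to_ycbcr`, `ycbcr_to_rgb`, `clamp_u8`) are made up for this example, and clamping the result to 0..255 is an assumption about how out-of-range values should be handled.

```c
#include <stdint.h>

/* Clamp a floating-point value to 0..255 and round to nearest
   (hypothetical helper, not part of any JPEG library). */
static uint8_t clamp_u8(double v)
{
    if (v < 0.0)   return 0;
    if (v > 255.0) return 255;
    return (uint8_t)(v + 0.5);
}

/* [R G B] -> [Y Cb Cr] with the coefficients from section 1. */
void rgb_to_ycbcr(uint8_t r, uint8_t g, uint8_t b,
                  uint8_t *y, uint8_t *cb, uint8_t *cr)
{
    *y  = clamp_u8( 0.299  * r + 0.587  * g + 0.114  * b);
    *cb = clamp_u8(-0.1687 * r - 0.3313 * g + 0.5    * b + 128.0);
    *cr = clamp_u8( 0.5    * r - 0.4187 * g - 0.0813 * b + 128.0);
}

/* [Y Cb Cr] -> [R G B], again with the coefficients from section 1. */
void ycbcr_to_rgb(uint8_t y, uint8_t cb, uint8_t cr,
                  uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = clamp_u8(y + 1.402   * (cr - 128));
    *g = clamp_u8(y - 0.34414 * (cb - 128) - 0.71414 * (cr - 128));
    *b = clamp_u8(y + 1.772   * (cb - 128));
}
```

Because both directions round to 8 bits, a round trip may be off by a unit or two per channel. That is normal for this kind of conversion.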
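The FDCT and IDCT formulas of section 2 can be written down almost literally. The sketch below is the slow, direct form, not the AA&N algorithm, and it assumes the usual convention C(u,v) = C(u) * C(v) with C(0) = 1/sqrt(2). The names `fdct_8x8`, `idct_8x8`, `cos_pi16` and `c_factor` are invented for this example. The small cosine table is only there so that the code does not depend on the math library.

```c
/* cos(k*pi/16) for k = 0..8; all other angles follow by symmetry. */
static const double COS16[9] = {
    1.0,                 0.98078528040323044, 0.92387953251128676,
    0.83146961230254524, 0.70710678118654752, 0.55557023301960222,
    0.38268343236508977, 0.19509032201612827, 0.0
};

/* cos(k*pi/16) for any k >= 0. */
static double cos_pi16(int k)
{
    k %= 32;                                   /* period 2*pi = 32 steps */
    if (k > 16) k = 32 - k;                    /* cos(2*pi - a) = cos(a)  */
    return k <= 8 ? COS16[k] : -COS16[16 - k]; /* cos(pi - a) = -cos(a)   */
}

/* C(k) = 1/sqrt(2) for k = 0 and 1 otherwise. */
static double c_factor(int k)
{
    return k == 0 ? COS16[4] : 1.0;
}

/* Forward DCT of one 8x8 block: in[x][y] -> out[u][v]. */
void fdct_8x8(double in[8][8], double out[8][8])
{
    for (int u = 0; u < 8; u++)
        for (int v = 0; v < 8; v++) {
            double sum = 0.0;
            for (int x = 0; x < 8; x++)
                for (int y = 0; y < 8; y++)
                    sum += in[x][y]
                         * cos_pi16((2 * x + 1) * u)
                         * cos_pi16((2 * y + 1) * v);
            out[u][v] = 0.25 * c_factor(u) * c_factor(v) * sum;
        }
}

/* Inverse DCT of one 8x8 block: in[u][v] -> out[x][y]. */
void idct_8x8(double in[8][8], double out[8][8])
{
    for (int x = 0; x < 8; x++)
        for (int y = 0; y < 8; y++) {
            double sum = 0.0;
            for (int u = 0; u < 8; u++)
                for (int v = 0; v < 8; v++)
                    sum += c_factor(u) * c_factor(v) * in[u][v]
                         * cos_pi16((2 * x + 1) * u)
                         * cos_pi16((2 * y + 1) * v);
            out[x][y] = 0.25 * sum;
        }
}
```

For a block in which every sample is 100, `fdct_8x8` gives F(0,0) = 800 (one eighth of the sum of the 64 samples) and all other coefficients come out as zero up to rounding error.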
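Steps 3 and 4 (reordering and quantization) fit naturally into one loop. The sketch below assumes the 8x8 block is held row by row in a 64-element array and uses the two tables given above. The names `ZIGZAG`, `QUANT_LUMA`, `quantize_zigzag` and `dequantize_unzigzag` are chosen only for this example.

```c
/* ZIGZAG[i] = position in the reordered vector of element i of the
   row-major 8x8 block (the table from section 3). */
static const int ZIGZAG[64] = {
     0,  1,  5,  6, 14, 15, 27, 28,
     2,  4,  7, 13, 16, 26, 29, 42,
     3,  8, 12, 17, 25, 30, 41, 43,
     9, 11, 18, 24, 31, 40, 44, 53,
    10, 19, 23, 32, 39, 45, 52, 54,
    20, 22, 33, 38, 46, 51, 55, 60,
    21, 34, 37, 47, 50, 56, 59, 61,
    35, 36, 48, 49, 57, 58, 62, 63
};

/* The standard quantization table from section 4, row by row. */
static const int QUANT_LUMA[64] = {
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99
};

/* Encoder side: divide each DCT coefficient by its table entry, round
   to the nearest integer and store it at its zigzag position. */
void quantize_zigzag(const double block[64], const int quant[64],
                     int vector[64])
{
    for (int i = 0; i < 64; i++) {
        double q = block[i] / quant[i];
        /* round half away from zero so negative values round correctly */
        vector[ZIGZAG[i]] = (int)(q < 0.0 ? q - 0.5 : q + 0.5);
    }
}

/* Decoder side: undo the reordering and multiply by the table entry. */
void dequantize_unzigzag(const int vector[64], const int quant[64],
                         int block[64])
{
    for (int i = 0; i < 64; i++)
        block[i] = vector[ZIGZAG[i]] * quant[i];
}
```

A coefficient of -30 at row 0, column 1 is divided by 11 and stored as -3. Dequantizing gives -33, so the difference of 3 is the information lost by quantization.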

5. RLC coding of zeros

After quantization, long runs of consecutive zeros appear in the high-spatial-frequency part of the vector, and we can use RLC (run-length coding) to compress them. Here we skip the first element of the vector (the reason is explained later) because it is coded in a special way. Suppose the remaining 63 values are

    57, 45, 0, 0, 0, 0, 23, 0, -30, -16, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, ..., 0

After RLC compression they become

    (0,57); (0,45); (4,23); (1,-30); (0,-16); (2,1); EOB

In each pair the first number is the count of zeros before the value. EOB is the end marker and means that everything after it is 0. In practice (0,0) is used to represent EOB. If the sequence does not end with zeros, no EOB is needed.

Because of the Huffman coding that follows, the zero count in each pair must fit into 4 bits, that is, it can only be 0 to 15. Longer runs are split, so what we actually encode looks like

    (0,57); (15,0) (2,3); (4,2); (15,0) (15,0) ...; (0,0)

Note that (15,0) stands for 16 consecutive zeros, so (15,0) (2,3) means 18 zeros followed by a 3.

6. Huffman coding

To improve storage efficiency, JPEG does not store the values directly but divides them by bit length into 16 groups:

    Values                           Group   Bits actually stored
    0                                  0     -
    -1, 1                              1     0, 1
    -3, -2, 2, 3                       2     00, 01, 10, 11
    -7..-4, 4..7                       3     000..011, 100..111
    -15..-8, 8..15                     4     0000..0111, 1000..1111
    -31..-16, 16..31                   5     00000..01111, 10000..11111
    -63..-32, 32..63                   6     .
    -127..-64, 64..127                 7     .
    -255..-128, 128..255               8     .
    -511..-256, 256..511               9     .
    -1023..-512, 512..1023            10     .
    -2047..-1024, 1024..2047          11     .
    -4095..-2048, 2048..4095          12     .
    -8191..-4096, 4096..8191          13     .
    -16383..-8192, 8192..16383        14     .
    -32767..-16384, 16384..32767      15     .

Let us return to the earlier example (this time with -8 as the fifth value):

    (0,57); (0,45); (4,23); (1,-30); (0,-8); (2,1); (0,0)

Only the right-hand number of each pair is processed:

    57  is in group 6 and its stored bits are 111001, so it is encoded as (6, 111001)
    45  by the same operation is encoded as (6, 101101)
    23  -> (5, 10111)
    -30 -> (5, 00001)
    -8  -> (4, 0111)
    1   -> (1, 1)

The sequence above becomes:

    (0,6), 111001; (0,6), 101101; (4,5), 10111; (1,5), 00001; (0,4), 0111; (2,1), 1; (0,0)

The two values in parentheses fit into exactly one byte: the high 4 bits are the number of preceding zeros and the low 4 bits give the number of bits of the value that follows. The values that follow can represent the range -32767..32767. It is this byte that is Huffman coded.

Continuing the example, suppose the Huffman codes are

    06 = (0,6)       --- 111000
    69 = (4,5)       --- 1111111110011001
    21 = (1,5)       --- 1111110110
    4  = (0,4)       --- 1011
    33 = (2,1)       --- 11011
    0  = EOB = (0,0) --- 1010

Then the 63 coefficients of the example (remember that we skipped the first one?) are written to the JPG file as the bit stream:

    111000 111001  111000 101101  1111111110011001 10111
    1111110110 00001  1011 0111  11011 1  1010

DC coding
---------

Remember the first of each group of 64 values that we skipped? DC is that number (the other 63 are called AC). Putting u = v = 0 into the FDCT formula above gives

    DC = F(0,0) = (C(0,0) / 4) * SUM(x=0..7) SUM(y=0..7) f(x,y) * cos 0 * cos 0,   C(0,0) = 1/2
                = (1/8) * SUM(x=0..7) SUM(y=0..7) f(x,y)

That is 8 times the average of the 64 image samples. In other words, it contains a large part of the energy of the original 8x8 image block (and is usually a large value).

The authors of JPEG noticed that the DC values of consecutive blocks are very closely related, so they decided to code the difference between the DC values of consecutive 8x8 blocks. (Y, Cb and Cr each have their own DC.)

    Diff = DC(i) - DC(i-1)

So the DC of the current block is

    DC(i) = DC(i-1) + Diff

JPEG starts DC coding from 0, so DC(0) = 0. The current Diff is then added to the previous value to give the current value.

Let us look at the example again (remember that the DC we store is the difference from the previous DC). If Diff is -511, it is encoded as (9, 000000000). If the Huffman code for 9 is 1111110 (a JPG file has two kinds of Huffman table, one for DC and one for AC), it appears in the JPG file as

    1111110 000000000

It is placed in front of the 63 ACs, so the final bit stream of the example is:

    1111110 000000000  111000 111001  111000 101101  1111111110011001 10111
    1111110110 00001  1011 0111  11011 1  1010

Decoding one data unit of the picture
-------------------------------------

At the start of decoding the whole picture, the DC value must be initialized to 0.

1) First decode the DC:
   a) fetch a Huffman code (using the Huffman DC table)
   b) Huffman-decode it to find N, the number of data bits that follow
   c) fetch N bits and compute Diff
   d) DC += Diff
   e) write the DC value: "vector[0] = DC"

2) Decode the 63 ACs:
   ------ loop over the ACs until EOB or until all 64 values are processed
   a) fetch a Huffman code (using the Huffman AC table)
   b) Huffman-decode it to get (number of preceding zeros, group number)
      [remember: (0,0) is EOB]
   c) fetch N bits (N = group number) and compute the AC
   d) write the corresponding number of zeros
   e) then write the AC
   ------ next iteration

We now have the 64-element vector.
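The pairs of section 5 and the group/bits split of section 6 can be sketched as below. This is an illustration under assumptions rather than code from any real encoder: the function names (`category`, `value_bits`, `decode_value`, `rle_ac`) are invented here, and the caller is assumed to supply output arrays with room for 64 pairs.

```c
/* Group number of a value: 0 for 0, 1 for -1/1, 2 for -3..3, and so on. */
int category(int v)
{
    int a = v < 0 ? -v : v;
    int n = 0;
    while (a) {
        n++;
        a >>= 1;
    }
    return n;
}

/* The n bits stored after the Huffman code: v itself when v is positive,
   v + 2^n - 1 when v is negative (so -30 in group 5 becomes 00001). */
unsigned value_bits(int v, int n)
{
    return (unsigned)(v >= 0 ? v : v + (1 << n) - 1);
}

/* Decoder side: n bits whose top bit is 0 stand for a negative value. */
int decode_value(unsigned bits, int n)
{
    if (n == 0)
        return 0;
    return (bits >> (n - 1)) ? (int)bits : (int)bits - (1 << n) + 1;
}

/* Run-length code the 63 AC values into (run, value) pairs.
   runs[] and values[] need room for 64 entries; returns the pair count. */
int rle_ac(const int ac[63], int runs[], int values[])
{
    int last = 62, n = 0, run = 0;

    while (last >= 0 && ac[last] == 0)   /* find the last nonzero value */
        last--;

    for (int i = 0; i <= last; i++) {
        if (ac[i] == 0) {
            if (++run == 16) {           /* (15,0) = 16 consecutive zeros */
                runs[n] = 15;
                values[n++] = 0;
                run = 0;
            }
        } else {
            runs[n] = run;
            values[n++] = ac[i];
            run = 0;
        }
    }
    if (last < 62) {                     /* trailing zeros: EOB = (0,0) */
        runs[n] = 0;
        values[n++] = 0;
    }
    return n;
}
```

For the 63 values of the example in section 5, `rle_ac` produces the seven pairs listed there, and `decode_value(0, 9)` returns -511, the Diff of the DC example.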

Some decoding work still remains:

1) Dequantize the 64 values: "for (i = 0; i <= 63; i++) vector[i] *= quant[i]"
2) Rearrange the 64 values into an 8x8 block
3) Apply the IDCT to the 8x8 block

Repeat the above operations [Huffman decoding, steps 1), 2), 3)] for the 8x8 blocks of Y, Cb and Cr.

4) Add 128 to all the signed 8-bit values
5) Convert YCbCr to RGB

How a JPG file organizes the picture information (byte level)
--------------------------------------------------------------

Note that the JPEG/JFIF file format uses the Motorola byte order, not the Intel one. That is, in a word the high byte comes first and the low byte after it.

A JPG file is made up of segments. Each segment is at most 65535 bytes long. Each segment begins with a marker. A marker starts with 0xFF and ends with a byte that is neither 0 nor 0xFF, for example 'FFDA', 'FFC4', 'FFC0'. Each marker has its own meaning, which is determined by the second byte. For example, SOS (Start Of Scan = 'FFDA') indicates that decoding should start. Another marker, DQT (Define Quantization Table = 0xFFDB), says that a 64-byte quantization table follows.

While processing a JPG file, if you meet a 0xFF and the byte after it is not 0 and has no meaning as a marker, then the 0xFF byte you met must be ignored. (Some JPG files use 0xFF bytes as padding.) If Huffman coding happens to produce a 0xFF, it is written as 0xFF 0x00. In other words, when decoding JPEG data, FF00 is treated as a single FF.

In addition, when the Huffman data ends a few bits short of a full byte, the remaining bits are filled with 1s, and then the FF of the next marker follows.

Some important markers
----------------------

SOI  = Start Of Image = 'FFD8'
       This marker appears only once in the file.
EOI  = End Of Image = 'FFD9'
       A JPG file ends with FFD9.
RSTi = FFDi (i = 0..7) [RST0 = FFD0, RST7 = FFD7]
     = restart marker
       Usually inserted into the data stream. I think it is there out of concern about decoding errors (it should be used together with DRI), but many JPG files do not use it.
       (SOS --- RST0 --- RST1 --- RST2 --- ... --- RST6 --- RST7 --- RST0 --- ...)

Markers
-------

The following markers must be processed:

SOF0 = Start Of Frame 0 = FFC0
SOS  = Start Of Scan = FFDA
APP0 = it's the marker used to identify a JPG file which uses the JFIF specification = FFE0
COM  = Comment = FFFE
DNL  = Define Number of Lines = FFDC
DRI  = Define Restart Interval = FFDD
DQT  = Define Quantization Table = FFDB
DHT  = Define Huffman Table = FFC4

How Huffman tables are stored in a JPG file
-------------------------------------------

JPEG defines a table to describe the Huffman tree. It is defined after the DHT marker. Note: the length of a Huffman code is limited to 16 bits.

A JPG file generally has two kinds of Huffman table, one for DC and one for AC (in practice there are four tables: DC and AC for luminance, DC and AC for chrominance).

The table is stored as follows:

1) 16 bytes: the i-th byte gives the number of Huffman codes that are i bits long (i = 1 to 16)
2) A table whose length (in bytes) is the sum of these 16 numbers

Can you now imagine how this table is stored? Each byte in it is the value that corresponds to one Huffman code.

I will not explain this further, since it requires that you know the Huffman algorithm. Here is just one example. The table header is

    0, 2, 3, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0

which means:

    length 1: no code
    length 2: 00, 01
    length 3: 100, 101, 110
    length 4: 1110
    length 5: 11110
    length 6: 111110
    length 7: no code (if there were one, it would be 1111110)
    length 8: 11111100
    ...       no more after that

If the table data that follows is

    45 57 29 17 23 25 34 28

then

    45 = 00
    57 = 01
    29 = 100
    17 = 101
    23 = 110

and so on. If you understand Huffman coding, this is not hard to follow.

Sampling factors
----------------

What follows is about decoding true-color JPG files. Grayscale JPG files are very simple, because the picture contains only luminance information. A color picture is made up of (Y, Cr, Cb). As mentioned above, Y is usually sampled at every pixel, while Cr and Cb are typically sampled once per 2x2 pixels. Of course a JPG file may also sample them at every pixel, or once every two pixels (two pixels horizontally, one vertically). Sampling factors are defined relative to the highest sampling factor.

In the usual case (Y sampled at every pixel, Cr and Cb once per 2x2 pixels), Y has the highest sampling rate:

    horizontal sampling factor HY = 2, vertical sampling factor VY = 2
    for Cb: horizontal sampling factor HCb = 1, vertical sampling factor VCb = 1
    likewise HCr = 1, VCr = 1

In JPEG, the string of data obtained from 8x8 raw values after RLC and Huffman coding is called a Data Unit (DU). A JPG file encodes the DUs in this order:

    1) for (counter_y = 1; counter_y <= VY; counter_y++)
         for (counter_x = 1; counter_x <= HY; counter_x++)
           { encode a Data Unit of Y }

    2) for (counter_y = 1; counter_y <= VCb; counter_y++)
         for (counter_x = 1; counter_x <= HCb; counter_x++)
           { encode a Data Unit of Cb }

    3) for (counter_y = 1; counter_y <= VCr; counter_y++)
         for (counter_x = 1; counter_x <= HCr; counter_x++)
           { encode a Data Unit of Cr }

With my values above (HY = 2, VY = 2; HCb = VCb = 1; HCr = VCr = 1) the sequence is

    YDU, YDU, YDU, YDU, CbDU, CrDU

and it describes a 16x16 piece of the picture. 16x16 = (Hmax * 8 x Vmax * 8), where Hmax = HY = 2 and Vmax = VY = 2.

A block of (Hmax * 8, Vmax * 8) is called an MCU (Minimum Coded Unit). In the example above:

    MCU = YDU, YDU, YDU, YDU, CbDU, CrDU

If

    HY = 1,  VY = 1
    HCb = 1, VCb = 1
    HCr = 1, VCr = 1

then (Hmax = 1, Vmax = 1) the MCU is only 8x8 in size:

    MCU = YDU, CbDU, CrDU

For a grayscale JPG, the MCU has only one DU (MCU = YDU).

In a JPG file, the sampling factors of each component of the picture are defined after the SOF0 (FFC0) marker.

Decoding a JPG file
-------------------

The decoder first reads the sampling factors from the JPG file. This gives it the size of the MCU, from which it computes how many MCUs the picture contains. The decoder then decodes the MCUs one after another until it reaches the EOI marker. For each MCU it decodes each DU in order, then combines them and converts the result to (R, G, B), and that is all.

Appendix: JPEG file format
~~~~~~~~~~~~~~~~~~~~~~~~~~

- File header (2 bytes): $FF, $D8 (SOI)
- Any number of segments, see below
- End of file (2 bytes): $FF, $D9 (EOI)

Segment format:
~~~~~~~~~~~~~~~

- Header (4 bytes):
    $FF     marks a segment
    n       type of the segment (1 byte)
    sh, sl  length of the segment, including these two bytes but not including the $FF and n in front of them.
            Note: the length is not in Intel order but in Motorola order, high byte first, low byte after!
- Content of the segment, up to 65533 bytes

Notes:
- Some parameterless segments (marked with * below) have no length field (and no content), only the $FF and the type byte.
- Any number of $FF bytes between segments is legal and must be ignored.
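The Huffman table example above (a header of 16 counts turned into codes) can be reproduced with a few lines of C. This is a minimal sketch and the function name `build_huffman_codes` is invented for it. It assumes the usual rule that codes of one length are consecutive numbers and that the next code is shifted left by one bit each time the length grows.

```c
/* Generate Huffman codes from the 16 count bytes of a DHT table.
   counts[i] is the number of codes that are i + 1 bits long.
   codes[k] and lengths[k] receive the k-th code and its bit length,
   in the same order as the value bytes that follow the 16 counts.
   Returns the total number of codes. */
int build_huffman_codes(const unsigned char counts[16],
                        unsigned short codes[], unsigned char lengths[])
{
    unsigned code = 0;
    int n = 0;

    for (int len = 1; len <= 16; len++) {
        for (int k = 0; k < counts[len - 1]; k++) {
            codes[n] = (unsigned short)code++;
            lengths[n++] = (unsigned char)len;
        }
        code <<= 1;   /* the next length starts with one more bit */
    }
    return n;
}
```

With the header 0, 2, 3, 1, 1, 1, 0, 1, 0, ... the function yields 00, 01, 100, 101, 110, 1110, 11110, 111110 and 11111100, the same codes as in the example.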
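The segment layout just described is enough to walk through the header part of a file. The sketch below looks for the first segment of a given type in a memory buffer. The function names (`find_segment`, `has_no_length`) and the choice to stop at SOS are assumptions made for this example, and the set of parameterless types ($01 and $D0 to $D9) is taken to be the usual one.

```c
#include <stddef.h>

/* Parameterless segments: TEM ($01), RST0..RST7, SOI and EOI. */
static int has_no_length(unsigned char type)
{
    return type == 0x01 || (type >= 0xD0 && type <= 0xD9);
}

/* Return the offset of the content of the first segment of type
   `wanted`, or -1 if it is not found before SOS/EOI or the data is
   malformed. *content_len receives the content length in bytes. */
long find_segment(const unsigned char *buf, size_t size,
                  unsigned char wanted, unsigned *content_len)
{
    size_t pos = 2;

    if (size < 4 || buf[0] != 0xFF || buf[1] != 0xD8)
        return -1;                           /* file must start with SOI */

    while (pos + 1 < size) {
        if (buf[pos] != 0xFF)
            return -1;                       /* not at a segment start */
        while (pos + 1 < size && buf[pos + 1] == 0xFF)
            pos++;                           /* extra $FF bytes are ignored */
        if (pos + 1 >= size)
            return -1;

        unsigned char type = buf[pos + 1];
        pos += 2;

        if (has_no_length(type)) {
            if (type == 0xD9)
                return -1;                   /* EOI: nothing more to read */
            continue;
        }
        if (pos + 2 > size)
            return -1;

        /* Motorola order: high byte first, low byte after */
        unsigned len = ((unsigned)buf[pos] << 8) | buf[pos + 1];
        if (len < 2 || pos + len > size)
            return -1;
        if (type == wanted) {
            *content_len = len - 2;          /* length counts its own 2 bytes */
            return (long)(pos + 2);
        }
        if (type == 0xDA)
            return -1;                       /* SOS: compressed data follows */
        pos += len;
    }
    return -1;
}
```

Calling `find_segment(buf, size, 0xC0, &len)` would, under these assumptions, give the position of the SOF0 content, where the picture size and the sampling factors are stored.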

Segment types:
~~~~~~~~~~~~~~

    *TEM   = $01   can be ignored

     SOF0  = $C0   start of frame (baseline JPEG), details below
     SOF1  = $C1   ditto
     SOF2  = $C2   usually not supported
     SOF3  = $C3   usually not supported
     SOF5  = $C5   usually not supported
     SOF6  = $C6   usually not supported
     SOF7  = $C7   usually not supported
     SOF9  = $C9   arithmetic coding (an extension of Huffman coding), usually not supported
     SOF10 = $CA   usually not supported
     SOF11 = $CB   usually not supported
     SOF13 = $CD   usually not supported
     SOF14 = $CE   usually not supported
     SOF15 = $CF   usually not supported

     DHT   = $C4   define Huffman table, details below
     JPG   = $C8   undefined/reserved (causes a decoding error)
     DAC   = $CC   define arithmetic table, usually not supported

    *RST0  = $D0   RSTn are used for resynchronization, usually ignored
    *RST1  = $D1
    *RST2  = $D2
    *RST3  = $D3
    *RST4  = $D4
    *RST5  = $D5
    *RST6  = $D6
    *RST7  = $D7

     SOI   = $D8   start of picture
     EOI   = $D9   end of picture
     SOS   = $DA   start of scan, details below
     DHP   = $DE   ignore (skip)
     EXP   = $DF   ignore (skip)

     APP0  = $E0   JFIF APP0 segment marker (details below)
     APP15 = $EF   ignore

     JPG0  = $F0   ignore (skip)
     JPG13 = $FD   ignore (skip)
     COM   = $FE   comment, details below

All other segment types must be skipped.

SOF0: Start Of Frame 0:
~~~~~~~~~~~~~~~~~~~~~~~

- $FF, $C0 (SOF0)
- Length (high byte, low byte), 8 + components * 3
- Data precision (1 byte): bits per sample, usually 8 (most software does not support 12 and 16)
- Picture height (high byte, low byte), if DNL is not supported >

When reposting, please cite the original address: https://www.9cbs.com/read-39153.html
