JPEG Simple Document V2.1


------------------------------

Last revised: 2000.6.19

Author: Yun Feng

Email: Cloudwu@263.net

Homepage: http://member.netese.com/~cloudwu

Preface

------------

1. Why write this document?

Yunfeng wanted to study JPEG / MPEG systematically, but good material was hard
to find, and his English is not strong. So he kept notes while learning, to
have something convenient to check later when writing and reviewing code. The
official JPEG document is complicated and very thick; even friends with better
English seem to find it a headache. This is therefore a streamlined version
that describes only the JPEG Baseline encoding and decoding algorithm, which
should help anyone who wants to get to know JPEG. Of course, friends who need
to study JPEG in depth should still look for books and reference material. I
hope the Chinese material on the Internet keeps growing richer.

2. What reading this document should achieve.

You should gain an intuitive understanding of JPEG image compression, without
needing to work through its mathematical principles. From there you can start
writing your own encoder / decoder, or understand existing code; understand
lossy image compression more deeply; and improve on JPEG, for example by
adding transparency support or speeding up JPEG decoding.

3. Why write in plain text instead of HTML?

Personal preference. I don't like formatted electronic documents. Plain text
can be used more widely and does not need an HTML browser.

4. Do you need to pay for this document?

You may use it freely. But because it is free, the author accepts no
responsibility for any errors or omissions it may contain. You are welcome to
discuss related questions by email, but because of limited time and energy a
reply is not guaranteed. If you are dissatisfied, Yunfeng does not accept any
unreasonable criticism.

5. Can you reprint this document?

You are welcome to reprint it, but not for commercial use. When reprinting,
please keep the content complete. If you produce a version in another format
such as HTML, you must also keep a plain-text version alongside it.

Introduction to JPEG compression

-------------

1. Color model

JPEG images use the YCrCb color model rather than RGB, the model most common
on computers. Color models are not explained in detail here; it is enough to
say that YCrCb is better suited to image compression, because the human eye is
far more sensitive to changes in brightness Y than to changes in chrominance
C. We can therefore store an 8-bit brightness value for every point but only
one Cr Cb pair for every 2x2 points, and the image changes very little to the
naked eye. With the RGB model, 4 points take 4x3 = 12 bytes; now they take
only 4 + 2 = 6 bytes, an average of 12 bits per point. JPEG does of course
allow the C values of every point to be recorded; MPEG, however, always stores
12 bits per point, which we abbreviate as YUV12.
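The 2x2 sharing of chrominance described above can be sketched in C. This is only an illustration: the function name subsample_2x2 is hypothetical, the plane is assumed to have even width and height, and averaging the four samples is one common choice (the text only says that one Cr Cb pair is kept per 2x2 points).

```c
#include <assert.h>
#include <stddef.h>

/* Reduce one chrominance plane (Cb or Cr) to one sample per 2x2 points.
 * Assumptions: width and height are even; the four samples are averaged. */
void subsample_2x2(const unsigned char *plane, int width, int height,
                   unsigned char *out)
{
    assert(plane != NULL && out != NULL);
    assert(width % 2 == 0 && height % 2 == 0);

    for (int y = 0; y < height; y += 2) {
        for (int x = 0; x < width; x += 2) {
            int sum = plane[y * width + x]
                    + plane[y * width + x + 1]
                    + plane[(y + 1) * width + x]
                    + plane[(y + 1) * width + x + 1];
            /* adding 2 before dividing rounds the average to nearest */
            out[(y / 2) * (width / 2) + (x / 2)] = (unsigned char)((sum + 2) / 4);
        }
    }
}
```

After this step, 4 points cost 4 Y bytes plus 1 Cb byte and 1 Cr byte, which is the 6 bytes counted above.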

[R G B] -> [Y Cb Cr] Conversion

-------------------------

(R, G, B are 8-bit unsigned)

| Y  |   |  0.299    0.587    0.114  |   | R |   |   0 |
| Cb | = | -0.1687  -0.3313   0.5    | * | G | + | 128 |
| Cr |   |  0.5     -0.4187  -0.0813 |   | B |   | 128 |

Y  =  0.299  * R + 0.587  * G + 0.114  * B          (brightness)
Cb = -0.1687 * R - 0.3313 * G + 0.5    * B + 128
Cr =  0.5    * R - 0.4187 * G - 0.0813 * B + 128

[Y, Cb, Cr] -> [R, G, B] Conversion

-------------------------

R = Y + 1.402   * (Cr - 128)
G = Y - 0.34414 * (Cb - 128) - 0.71414 * (Cr - 128)
B = Y + 1.772   * (Cb - 128)

Normally the C values (both Cb and Cr) would be signed numbers, but here they
are made unsigned by adding 128, so all the data in JPEG is unsigned 8-bit.
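The two conversions can be tried out with a small C sketch. The function names and the three-element array layout are hypothetical choices made for this illustration; rounding to nearest and clamping to 0..255 are also assumptions, since the formulas themselves do not say how to handle results outside the 8-bit range.

```c
#include <assert.h>
#include <stddef.h>

/* Round to nearest and clamp into the unsigned 8-bit range 0..255. */
static unsigned char clamp_to_byte(double v)
{
    if (v <= 0.0)   return 0;
    if (v >= 255.0) return 255;
    return (unsigned char)(v + 0.5);
}

/* rgb[0..2] = R, G, B  ->  ycc[0..2] = Y, Cb, Cr */
void rgb_to_ycbcr(const unsigned char rgb[3], unsigned char ycc[3])
{
    assert(rgb != NULL && ycc != NULL);
    double r = rgb[0], g = rgb[1], b = rgb[2];

    ycc[0] = clamp_to_byte( 0.299  * r + 0.587  * g + 0.114  * b);
    ycc[1] = clamp_to_byte(-0.1687 * r - 0.3313 * g + 0.5    * b + 128.0);
    ycc[2] = clamp_to_byte( 0.5    * r - 0.4187 * g - 0.0813 * b + 128.0);
}

/* ycc[0..2] = Y, Cb, Cr  ->  rgb[0..2] = R, G, B */
void ycbcr_to_rgb(const unsigned char ycc[3], unsigned char rgb[3])
{
    assert(ycc != NULL && rgb != NULL);
    double y  = ycc[0];
    double cb = ycc[1] - 128.0;   /* back to a signed chrominance value */
    double cr = ycc[2] - 128.0;

    rgb[0] = clamp_to_byte(y + 1.402 * cr);
    rgb[1] = clamp_to_byte(y - 0.34414 * cb - 0.71414 * cr);
    rgb[2] = clamp_to_byte(y + 1.772 * cb);
}
```

A gray point such as (100, 100, 100) maps to Y = 100 with both C values at 128, the unsigned form of "no chrominance".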

2. DCT (discrete cosine transform)

In JPEG, data to be compressed first goes through a DCT. The principle behind
the DCT involves mathematics that we do not need to study here; it is much the
same as the Fourier transform (covered in advanced mathematics courses). The
transform brings out the regularity in the data, which makes it easier to
compress. JPEG processes the image in units of 8x8 points. So if the width or
height of the original image is not a multiple of 8, it has to be padded up to
a multiple of 8 so that it can be handled block by block. Also, remember that
Cr Cb are recorded once per 2x2 points? In most cases the image therefore has
to be padded to whole 16x16 blocks. Blocks are arranged from left to right and
from top to bottom (the same order in which we write). JPEG applies the DCT to
Y, Cr and Cb separately. The Y, Cr, Cb values fed into the DCT range over
-128 ~ 127 (128 is subtracted from Y; for Cr and Cb the subtraction undoes the
128 added during color conversion).
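The padding rule can be written as a one-line computation. The helper names below are hypothetical, and the bit-mask form assumes the block size is a power of two (8 or 16 here).

```c
#include <assert.h>

/* Round n up to the next multiple of block.
 * Assumption: block is a power of two, e.g. 8 or 16. */
int round_up_to_block(int n, int block)
{
    assert(n >= 0 && block > 0 && (block & (block - 1)) == 0);
    return (n + block - 1) & ~(block - 1);
}

/* Number of blocks needed to cover n points in one direction. */
int blocks_needed(int n, int block)
{
    return round_up_to_block(n, block) / block;
}
```

For example, an image 100 points wide is padded to 104 when only 8x8 blocks matter, and to 112 when Cr Cb are shared across 2x2 points and whole 16x16 blocks are required.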

JPEG encoding uses the Forward DCT (FDCT); decoding uses the Inverse DCT
(IDCT). The formulas are given below:

FDCT:

F(u,v) = alpha(u) * alpha(v) * SUM(x=0..7) SUM(y=0..7) f(x,y) * cos((2*x+1) * u * pi / 16) * cos((2*y+1) * v * pi / 16)

u, v = 0, 1, ..., 7

           { 1 / sqrt(8)   (u == 0)
alpha(u) = {
           { 1 / 2         (u != 0)

IDCT:

f(x,y) = SUM(u=0..7) SUM(v=0..7) alpha(u) * alpha(v) * F(u,v) * cos((2*x+1) * u * pi / 16) * cos((2*y+1) * v * pi / 16)

x, y = 0, 1, ..., 7

This step is very time-consuming. There is also a faster optimized algorithm,
AA&N; you can find it on the Internet. The Intel home page has MMX-optimized
code for the AA&N IDCT. (For the code on the Intel home page, the input data
is in 12.4 fixed-point format and the input matrix has to be turned 90
degrees.)
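For reference, the two formulas can be evaluated directly, without any of the fast algorithms mentioned above. The sketch below is a plain, slow version under a few assumptions of its own: the names fdct_8x8, idct_8x8 and cos16 are hypothetical, a block is stored as a flat array with f(x,y) at index x*8 + y, and the cosine values are taken from a small table so that the example does not depend on the math library.

```c
#include <assert.h>

/* cos(k * pi / 16) for k = 0..8; other angles follow by symmetry. */
static const double COS_PI_16[9] = {
    1.0,
    0.98078528040323043, 0.92387953251128674, 0.83146961230254524,
    0.70710678118654752, 0.55557023301960218, 0.38268343236508977,
    0.19509032201612825, 0.0
};

/* cos(k * pi / 16) for any k >= 0. */
static double cos16(int k)
{
    assert(k >= 0);
    k %= 32;                    /* a full period is 32 steps of pi/16 */
    if (k > 16) k = 32 - k;     /* cos(2*pi - t) = cos(t) */
    return (k <= 8) ? COS_PI_16[k] : -COS_PI_16[16 - k];  /* cos(pi - t) = -cos(t) */
}

static double alpha(int u)
{
    return (u == 0) ? 0.35355339059327373 : 0.5;   /* 1/sqrt(8) or 1/2 */
}

/* Forward DCT of one block; F(u,v) is written to F[u*8 + v]. */
void fdct_8x8(const double f[64], double F[64])
{
    for (int u = 0; u < 8; u++) {
        for (int v = 0; v < 8; v++) {
            double sum = 0.0;
            for (int x = 0; x < 8; x++)
                for (int y = 0; y < 8; y++)
                    sum += f[x * 8 + y]
                         * cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
            F[u * 8 + v] = alpha(u) * alpha(v) * sum;
        }
    }
}

/* Inverse DCT of one block; f(x,y) is written to f[x*8 + y]. */
void idct_8x8(const double F[64], double f[64])
{
    for (int x = 0; x < 8; x++) {
        for (int y = 0; y < 8; y++) {
            double sum = 0.0;
            for (int u = 0; u < 8; u++)
                for (int v = 0; v < 8; v++)
                    sum += alpha(u) * alpha(v) * F[u * 8 + v]
                         * cos16((2 * x + 1) * u) * cos16((2 * y + 1) * v);
            f[x * 8 + y] = sum;
        }
    }
}
```

A block in which every sample is 10 transforms to F(0,0) = 80 with all other coefficients at (numerically) zero, and idct_8x8 applied to the output of fdct_8x8 returns the original samples up to rounding error.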

3. Rearranging the DCT results

The DCT turns one 8x8 array into another 8x8 array. But all data in memory is
stored linearly; if we stored these 64 numbers row by row, the point at the
end of each row would have nothing to do with the point at the start of the
next. So JPEG specifies that the 64 numbers be arranged in the following order:

0, 1, 5, 6, 14, 15, 27, 28,

2, 4, 7, 13, 16, 26, 29, 42,

3, 8, 12, 17, 25, 30, 41, 43,

9, 11, 18, 24, 31, 40, 44, 53,

10, 19, 23, 32, 39, 45, 52, 54,

20, 22, 33, 38, 46, 51, 55, 60,

21, 34, 37, 47, 50, 56, 59, 61,

35, 36, 48, 49, 57, 58, 62, 63

In this order, numbers that are adjacent in the sequence are also adjacent in
the array.
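The table above does not have to be typed in by hand; it can be generated by walking the diagonals of the 8x8 array and reversing direction on each one. The following C sketch shows one way to do it (the function name build_zigzag_table is hypothetical).

```c
#include <assert.h>

/* Fill table[row][col] with the position of that element in the
 * rearranged sequence, by walking the 15 diagonals of the 8x8 array. */
void build_zigzag_table(int table[8][8])
{
    int index = 0;

    for (int d = 0; d <= 14; d++) {        /* d = row + col on this diagonal */
        if (d % 2 == 0) {
            /* even diagonal: start at its lower-left end, move up and right */
            int row = (d < 8) ? d : 7;
            int col = d - row;
            while (row >= 0 && col < 8) {
                table[row][col] = index++;
                row--;
                col++;
            }
        } else {
            /* odd diagonal: start at its upper-right end, move down and left */
            int col = (d < 8) ? d : 7;
            int row = d - col;
            while (col >= 0 && row < 8) {
                table[row][col] = index++;
                row++;
                col--;
            }
        }
    }
    assert(index == 64);                   /* every element was visited once */
}
```

An encoder would then store the element at (row, col) in position table[row][col] of the 64-element vector, and a decoder would do the reverse.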

4. Quantization

The 64 spatial-frequency amplitudes obtained above are now quantized: each one
is divided by the corresponding value in the quantization table and rounded to
the nearest integer.

for (i = 0; i <= 63; i++)
    /* floor(x + 0.5) rounds to nearest, for negative values too */
    vector[i] = (int) floor((double) vector[i] / quantization_table[i] + 0.5);

Below is a JPEG standard quantization table. (It is shown here as an 8x8
block, row by row; in a JPEG file the 64 entries are stored in the zig-zag
order given above.)

16 11 10 16 24 40 51 61

12 12 14 19 26 58 60 55

14 13 16 24 40 57 69 56

14 17 22 29 51 87 80 62

18 22 37 56 68 109 103 77

24 35 55 64 81 104 113 92

49 64 78 87 103 121 120 101

72 92 95 98 112 100 103 99

This table is based on psychovisual thresholds and gives good results for
images with 8-bit brightness and chrominance. Of course we can use any
quantization table we like. In a JPEG file the quantization tables are defined
after the DQT marker. Usually one is defined for the Y values and one for the
C values.

The quantization table is the key to controlling JPEG's compression ratio.
This step removes some high-frequency content and so loses some fine detail.
But the human eye is far less sensitive to high spatial frequencies than to
low ones, so the visual loss after this processing is small. Another important
reason is that in every picture the color changes gradually from point to
point, so much of the image information is contained in the low spatial
frequencies. After quantization, long runs of consecutive zeros appear in the
high spatial frequencies.

Note that the quantized data may exceed the range that a 2-byte signed integer
can handle.
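Quantization and its inverse can be sketched in C as follows. The function names are hypothetical, the coefficients are assumed to be integers already, and the coefficient array and the table are assumed to be in the same order (both row by row, or both rearranged). The decoder's multiplication cannot bring back what rounding removed, which is where the loss comes from.

```c
#include <assert.h>

/* The standard table listed above, row by row. */
const int LUMINANCE_QT[64] = {
    16, 11, 10, 16,  24,  40,  51,  61,
    12, 12, 14, 19,  26,  58,  60,  55,
    14, 13, 16, 24,  40,  57,  69,  56,
    14, 17, 22, 29,  51,  87,  80,  62,
    18, 22, 37, 56,  68, 109, 103,  77,
    24, 35, 55, 64,  81, 104, 113,  92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103,  99
};

/* Integer division rounded to nearest, also for negative values. */
static int divide_round(int value, int divisor)
{
    assert(divisor > 0);
    if (value >= 0)
        return (value + divisor / 2) / divisor;
    return -((-value + divisor / 2) / divisor);
}

/* Encoder side: divide every coefficient by its table entry. */
void quantize(const int coef[64], const int qt[64], int out[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = divide_round(coef[i], qt[i]);
}

/* Decoder side: multiply back by the same table entry. */
void dequantize(const int quantized[64], const int qt[64], int out[64])
{
    for (int i = 0; i < 64; i++)
        out[i] = quantized[i] * qt[i];
}
```

With this table a coefficient of -415 in the first position becomes -26 and comes back as -416, while a coefficient of 40 in the last position becomes 0 and is lost entirely.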

5. RLE coding of zeros

Now the vector contains many consecutive 0s, and we can use RLE to compress
them. Here we skip the first value of the vector (the reason is explained
later) because it is encoded in a special way. Suppose we have a group of
values (the last 63 of the 64):

57, 45, 0, 0, 0, 0, 23, 0, -30, -8, 0, 0, 1, 0, 0, 0, 0, 0, 0, .., 0

After RLE compression this becomes

(0, 57); (0, 45); (4, 23); (1, -30); (0, -8); (2, 1); EOB

Each pair is (number of 0s before the value, value). EOB is the end marker and
means that everything after it is 0. In practice (0, 0) is used to represent
EOB. If the group does not end in 0s, no EOB is needed.

Because of the Huffman coding that follows, the first number of each pair, the
count of 0s, must fit in 4 bits, that is, it can only be 0 ~ 15. For example,
the group 57, eighteen 0s, 3, four 0s, 2, thirty-three 0s, 895, followed only
by 0s, is actually coded as:

(0, 57); (15, 0) (2, 3); (4, 2); (15, 0) (15, 0) (1, 895); (0, 0)

Note that (15, 0) stands for 16 consecutive 0s.
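The rules of this section fit in a short C function. This is a sketch under its own assumptions: the RunValue type and the name rle_encode_ac are hypothetical, the input holds exactly the 63 values that follow the skipped first one, and the output array has room for 64 pairs (more than can ever be produced).

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int run;     /* number of 0s before the value, 0..15 */
    int value;   /* the non-zero value; 0 only in (15, 0) and in EOB (0, 0) */
} RunValue;

/* Encode ac[0..62] as (run, value) pairs; returns the number of pairs. */
int rle_encode_ac(const int ac[63], RunValue out[64])
{
    int count = 0;
    int run = 0;
    int last = -1;               /* index of the last non-zero value */

    assert(ac != NULL && out != NULL);

    for (int i = 0; i < 63; i++)
        if (ac[i] != 0)
            last = i;

    for (int i = 0; i <= last; i++) {
        if (ac[i] == 0) {
            run++;
            continue;
        }
        while (run > 15) {       /* a run count only has 4 bits */
            out[count].run = 15; /* (15, 0) stands for 16 zeros */
            out[count].value = 0;
            count++;
            run -= 16;
        }
        out[count].run = run;
        out[count].value = ac[i];
        count++;
        run = 0;
    }

    if (last < 62) {             /* trailing zeros: finish with EOB */
        out[count].run = 0;
        out[count].value = 0;
        count++;
    }
    return count;
}
```

Fed the first example of this section it produces the seven pairs shown there, and fed the second one it produces the eight pairs containing the (15, 0) entries.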

6. Huffman encoding

To improve storage efficiency, JPEG does not save the values directly. Instead
it divides them into 16 groups according to the number of bits they need:

Value                                  Group   Bits actually saved

0                                      0       -
-1, 1                                  1       0, 1
-3, -2, 2, 3                           2       00, 01, 10, 11
-7, -6, -5, -4, 4, 5, 6, 7             3       000, 001, 010, 011, 100, 101, 110, 111
-15, .., -8, 8, .., 15                 4       0000, .., 0111, 1000, .., 1111
-31, .., -16, 16, .., 31               5       00000, .., 01111, 10000, .., 11111
-63, .., -32, 32, .., 63               6       .
-127, .., -64, 64, .., 127             7       .
-255, .., -128, 128, .., 255           8       .
-511, .., -256, 256, .., 511           9       .
-1023, .., -512, 512, .., 1023         10      .
-2047, .., -1024, 1024, .., 2047       11      .
-4095, .., -2048, 2048, .., 4095       12      .
-8191, .., -4096, 4096, .., 8191       13      .
-16383, .., -8192, 8192, .., 16383     14      .
-32767, .., -16384, 16384, .., 32767   15      .

Let's return to the earlier example:

(0, 57); (0, 45); (4, 23); (1, -30); (0, -8); (2, 1); (0, 0)

Looking only at the right-hand number of each pair:

57 is in group 6 and the bits actually saved are 111001, so it is coded as (6, 111001)
45, by the same procedure, is coded as (6, 101101)
23  -> (5, 10111)
-30 -> (5, 00001)
-8  -> (4, 0111)
1   -> (1, 1)

The earlier string of pairs becomes:

(0, 6), 111001; (0, 6), 101101; (4, 5), 10111; (1, 5), 00001; (0, 4), 0111;
(2, 1), 1; (0, 0)

The two values in each pair of parentheses are combined into one byte: the
high 4 bits are the number of preceding 0s, and the low 4 bits give the number
of bits of the number that follows. The numbers encoded after the byte range
over -32767..32767.
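The group table follows a simple rule that can be read off its rows: the group is the number of bits needed for the absolute value, a positive value is saved as it is, and a negative value v in group n is saved as v + 2^n - 1. The C sketch below applies that rule and also builds the combined byte; all four function names are hypothetical.

```c
#include <assert.h>

/* Group number = number of bits needed for |value| (0 for the value 0). */
int value_group(int value)
{
    int magnitude = (value < 0) ? -value : value;
    int group = 0;

    assert(magnitude <= 32767);
    while (magnitude > 0) {
        group++;
        magnitude >>= 1;
    }
    return group;
}

/* The bits actually saved: the value itself when positive,
 * value + 2^group - 1 when negative. */
unsigned saved_bits(int value, int group)
{
    return (unsigned)((value >= 0) ? value : value + (1 << group) - 1);
}

/* Inverse of saved_bits, as needed by a decoder. */
int restore_value(unsigned bits, int group)
{
    if (group == 0)
        return 0;
    if (bits >> (group - 1))                /* leading bit 1: positive */
        return (int)bits;
    return (int)bits - (1 << group) + 1;    /* leading bit 0: negative */
}

/* Combine the 0 count (high 4 bits) and the group (low 4 bits). */
unsigned char pack_symbol(int run, int group)
{
    assert(run >= 0 && run <= 15 && group >= 0 && group <= 15);
    return (unsigned char)((run << 4) | group);
}
```

For -30 this gives group 5 and saved bits 00001, and pack_symbol(4, 5) gives the byte 0x45, which is 69 in decimal.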

Continuing the example, suppose the Huffman codes are:

6  = (0, 6) --- 111000
69 = (4, 5) --- 1111111110011001
21 = (1, 5) --- 11111110110
4  = (0, 4) --- 1011
33 = (2, 1) --- 11011
0  = EOB = (0, 0) --- 1010

Then the bit stream finally written to the JPG file for the 63 coefficients of
the example (remember that we skipped the first one?) is:

111000 111001 111000 101101 1111111110011001 10111 11111110110 00001
1011 0111 11011 1 1010
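The way the bit stream is put together can be checked with a small C sketch. It builds the stream as a string of '0' and '1' characters purely for readability (a real encoder packs bits into bytes), and it knows only the six example codes quoted in this section, not a complete Huffman table. The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Append the low `count` bits of `bits`, most significant bit first.
 * The caller must make sure the buffer is large enough. */
static void append_bits(char *stream, unsigned bits, int count)
{
    size_t len = strlen(stream);

    for (int i = count - 1; i >= 0; i--)
        stream[len++] = ((bits >> i) & 1u) ? '1' : '0';
    stream[len] = '\0';
}

/* Only the codes used in the example; NULL for everything else. */
static const char *example_ac_code(unsigned char symbol)
{
    switch (symbol) {
    case 0x06: return "111000";
    case 0x45: return "1111111110011001";
    case 0x15: return "11111110110";
    case 0x04: return "1011";
    case 0x21: return "11011";
    case 0x00: return "1010";
    default:   return NULL;
    }
}

/* Write one (0 count, group) byte as its Huffman code, followed by the
 * `group` bits that are actually saved for the value. */
void encode_pair(char *stream, int run, int group, unsigned bits)
{
    const char *code = example_ac_code((unsigned char)((run << 4) | group));

    assert(stream != NULL && code != NULL);
    strcat(stream, code);
    append_bits(stream, bits, group);
}
```

Calling encode_pair for (0, 6) 57, (0, 6) 45, (4, 5) 23, (1, 5) 1, (0, 4) 7, (2, 1) 1 and finally (0, 0) produces the 79 bits of the example, without the spaces.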

DC coding

---------

Remember that we skipped the first number of each group? DC refers to that
number (the 63 that follow are the AC values). From the FDCT formula above we
get

DC = F(0,0) = C(0,0) / 4 * SUM(x=0..7) SUM(y=0..7) f(x,y) * cos(0) * cos(0)

              where C(0,0) = 1/2, so C(0,0) / 4 = alpha(0) * alpha(0) = 1/8

            = 1/8 * SUM(x=0..7) SUM(y=0..7) f(x,y)

That is, 8 times the average of the 64 samples of the block. It therefore
holds much of the energy of the original 8x8 image block (and is usually a
large value).

The authors of JPEG pointed out that the DC values of consecutive blocks are
very closely related, so they decided to encode the difference between the DC
values of consecutive 8x8 blocks. (Y, Cb and Cr each have their own DC.)

Diff = DC(i) - DC(i-1)

So the DC of the current block is:

DC(i) = DC(i-1) + Diff

JPG starts DC coding from 0, so DC(0) = 0. After that, the current Diff is
added to the previous value to get the current value.
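The two directions of this difference coding can be sketched in C as follows, for the DC values of one component (Y, Cb or Cr). The function names are hypothetical; the starting value 0 is the one given above.

```c
#include <assert.h>
#include <stddef.h>

/* Encoder side: replace each DC value by its difference from the
 * previous block's DC; the value before the first block is 0. */
void dc_to_diff(const int *dc, int *diff, size_t count)
{
    int previous = 0;

    assert(count == 0 || (dc != NULL && diff != NULL));
    for (size_t i = 0; i < count; i++) {
        diff[i] = dc[i] - previous;
        previous = dc[i];
    }
}

/* Decoder side: add each Diff to the running DC value. */
void diff_to_dc(const int *diff, int *dc, size_t count)
{
    int previous = 0;

    assert(count == 0 || (diff != NULL && dc != NULL));
    for (size_t i = 0; i < count; i++) {
        previous += diff[i];
        dc[i] = previous;
    }
}
```

For the DC values -511, -500, -505 the differences are -511, 11, -5: after the first block the numbers to be stored are small, which is the point of the scheme.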

Let's go back to the example above (remember that the DC we save is the
difference from the previous DC). Suppose that in the example Diff is -511; it
is coded as

(9, 000000000)

If the Huffman code for 9 is 1111110 (a JPG file generally has two Huffman
tables, one used for DC and one used for AC), then in the JPG file the DC is
represented in binary as

1111110 000000000

It is placed in front of the 63 AC values, so the final bit stream for the
example is:

1111110 000000000 111000 111001 111000 101101 1111111110011001 10111
11111110110 00001 1011 0111 11011 1 1010

The following briefly describes how one 8x8 block of the image's Y component
is decoded.

-----------------------------------------------

At the start of decoding the whole picture, the DC value must be initialized
to 0.

1) First decode the DC:

a) read a Huffman code (using the Huffman DC table)
b) Huffman-decode it to find N, the number of data bits that follow
c) read N bits and compute the Diff value
d) DC += Diff
e) write the DC value: "vector[0] = DC"

2) Decode the 63 AC values:

------ loop over the AC values until EOB is met or all 64 entries of the vector are filled

a) read a Huffman code (using the Huffman AC table)
b) Huffman-decode it to get (number of preceding 0s, group number)
   [Remember: (0, 0) is EOB]
c) read N bits (N = group number) and compute the AC value
d) write the corresponding number of 0s
e) write the AC value
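The procedure above can be sketched in C if the Huffman step is left out. The sketch assumes that steps a) and b) have already been done elsewhere, so its input is a list of hypothetical Token records, each holding the decoded byte and the N data bits that followed it. The names Token and decode_block are not from any library.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char symbol; /* DC: group number; AC: (0 count << 4) | group */
    unsigned bits;        /* the N data bits read after the Huffman code */
} Token;

/* Turn N saved bits back into a signed value (N = group number). */
static int restore_value(unsigned bits, int group)
{
    if (group == 0)
        return 0;
    if (bits >> (group - 1))
        return (int)bits;
    return (int)bits - (1 << group) + 1;
}

/* Decode one block into vector[0..63] (still in the rearranged order).
 * *dc carries the DC value from block to block and must start at 0.
 * Returns the number of tokens used. */
int decode_block(const Token *tokens, int token_count, int *dc, int vector[64])
{
    int used = 0;
    int pos = 1;

    assert(tokens != NULL && dc != NULL && vector != NULL && token_count >= 1);

    for (int i = 0; i < 64; i++)
        vector[i] = 0;

    /* 1) DC: compute Diff, add it to the running DC, store it */
    *dc += restore_value(tokens[used].bits, tokens[used].symbol & 0x0F);
    used++;
    vector[0] = *dc;

    /* 2) AC: until EOB or until the vector is full */
    while (pos < 64 && used < token_count) {
        int run = tokens[used].symbol >> 4;
        int group = tokens[used].symbol & 0x0F;
        unsigned bits = tokens[used].bits;
        used++;

        if (run == 0 && group == 0)
            break;                       /* EOB: the rest stays 0 */
        pos += run;                      /* skip the preceding 0s */
        if (group == 0) {                /* (15, 0): a 16th 0, no value */
            pos++;
            continue;
        }
        if (pos > 63)
            break;                       /* malformed input: stop */
        vector[pos++] = restore_value(bits, group);
    }
    return used;
}
```

Given the tokens of the running example (DC group 9 with bits 0, then the seven AC pairs), it fills the vector with -511, 57, 45, 0, 0, 0, 0, 23, 0, -30, -8, 0, 0, 1 and zeros after that.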

-----------------

Further decoding

----------------

In the previous step we obtained a vector of 64 values. Some further decoding
work still has to be done on it.
