MPEG-4 & H.264 Learning Notes (3): The Image Model (Image Processing)

xiaoxiao2021-03-06  21

3.4 Image Model A natural video image consists of a grid of sample values. Natural images are usually difficult to compress in their original form, because neighbouring image samples are highly correlated. This can be seen in the two-dimensional autocorrelation function of an image, which measures the similarity between the image and spatially shifted copies of itself. The peak at the centre corresponds to zero displacement; as the shifted copy moves away from the original, the function falls off only gradually, showing that samples in the neighbourhood of any given sample are highly correlated.

The autocorrelation function of a motion-compensated residual image, by contrast, falls off rapidly with spatial displacement, indicating that neighbouring samples are only weakly correlated. Effective motion compensation removes much of the local correlation, making the residual easier to compress than the original frame. The role of the image model is to decorrelate the image or residual further and to convert it into a form that can be efficiently compressed by an entropy coder. A practical image model generally has three main components: transform (decorrelates and compacts the data), quantization (reduces the precision of the transformed data) and reordering (arranges the data to group significant values together).

3.4.1 Predictive Image Coding Motion compensation is an example of predictive coding: the encoder creates a prediction of a region of the current frame from one or more previously transmitted frames and subtracts this prediction from the current region to form a residual. If the prediction is successful, the energy in the residual is much lower than in the original frame, and the residual can be represented with fewer bits.

Similarly, a prediction of an image sample or region can be formed from previously transmitted samples in the same image or frame. Predictive coding was the basis of early compression algorithms and is also an important component of H.264 (intra-frame coding, applied there in the transform domain; see later). Spatial-domain prediction is sometimes described as "Differential Pulse Code Modulation" (DPCM), a term borrowed from a differential encoding method used in communication systems.

B C
A X

In the figure, X is the pixel to be encoded. If the frame is processed in raster order, then points A, B and C (neighbouring pixels in the current and previous rows) are available to both the encoder and the decoder, because they have already been coded and decoded before X. The encoder forms a prediction of X based on some combination of previously coded pixels, subtracts this prediction from X and encodes the residual (the result of the subtraction). The decoder forms the same prediction and adds the decoded residual to reconstruct the pixel value. If the encoding process is lossy (for example, if the residual is quantized), then the decoded pixel values A, B and C may differ from the original A, B and C, and a prediction based on the original values would cause a cumulative mismatch (drift) between encoder and decoder. To avoid this, the encoder should itself decode the residual R(X) and reconstruct each pixel, so that both sides form their predictions from identical reconstructed values.

For example:
Encoder prediction: P(X) = (2A + B + C) / 4
Residual R(X) = X - P(X) is encoded and transmitted
Decoder decodes R(X) and forms the same prediction: P(X) = (2A + B + C) / 4
Reconstructed pixel: X = R(X) + P(X)

The encoder uses the decoded pixel values A', B' and C' to form its prediction, i.e. P(X) = (2A' + B' + C') / 4. In this way the encoder and decoder use the same P(X), and mismatch is avoided.

The efficiency of this method depends on the accuracy of the prediction P(X). If the prediction is accurate (P(X) is close to X), the energy of the residual is very small. However, it is usually not possible to choose a single predictor that works well for all areas of a complex image, and better performance may be obtained by adapting the predictor. In that case the encoder must signal the chosen predictor to the decoder, so there is a trade-off between prediction efficiency and the extra bits needed to signal the predictor choice.
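The DPCM loop described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any standard: the predictor P(X) = (2A + B + C) / 4 follows the example above, and the fallback value of 128 (mid-grey) for border pixels is an arbitrary assumption made here for completeness.

```python
def predict(rec, r, c):
    # P(X) = (2A + B + C) / 4 with A = left, B = above-left, C = above,
    # taken from the *reconstructed* frame. Border pixels fall back to
    # 128 (mid-grey) -- an illustrative choice, not from any standard.
    a = rec[r][c - 1] if c > 0 else 128
    b = rec[r - 1][c - 1] if r > 0 and c > 0 else 128
    c_ = rec[r - 1][c] if r > 0 else 128
    return (2 * a + b + c_) // 4

def dpcm_codec(image, qp):
    # Encode and decode in one pass. The prediction is always formed from
    # reconstructed pixels, exactly as the decoder will see them, so the
    # encoder and decoder stay in step despite lossy quantization.
    h, w = len(image), len(image[0])
    rec = [[0] * w for _ in range(h)]     # shared reconstructed frame
    symbols = []                          # quantized residuals (transmitted)
    for r in range(h):
        for c in range(w):
            p = predict(rec, r, c)
            fq = round((image[r][c] - p) / qp)   # forward-quantize R(X)
            symbols.append(fq)
            rec[r][c] = p + fq * qp              # reconstruct X = P(X) + R'(X)
    return symbols, rec
```

With qp = 1 the loop is lossless; with a larger step size each reconstructed pixel stays within qp/2 of the original, and crucially the error does not accumulate from pixel to pixel.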

3.4.2 Transform Coding 3.4.2.1 The purpose of the transform stage in an image or video CODEC is to convert the image or motion-compensated residual into another domain (the transform domain). The choice of transform depends on the following criteria: 1. The data in the transform domain should be decorrelated and compact (most of the energy concentrated into a small number of values). 2. The transform should be reversible. 3. The transform should be computationally tractable.

Many transforms have been proposed for image and video compression, and the most popular fall into two categories: block-based and image-based. Examples of block-based transforms include the Karhunen-Loeve Transform (KLT), the Singular Value Decomposition (SVD) and the Discrete Cosine Transform (DCT). Each operates on N * N blocks of image or residual samples, so the image is processed in units of a block. Block transforms have low memory requirements and are well suited to compressing block-based motion-compensated residuals, but they tend to suffer from artefacts at block boundaries. Image-based transforms operate on an entire image or frame (or a large section of one image). The most popular image transform is the Discrete Wavelet Transform (DWT, or simply "wavelet"). Transforms such as the DWT have been shown to be highly effective for still image compression, but their memory consumption is large (because the entire image or section is processed as a single unit) and they do not fit well with block-based motion compensation. Both the DCT and the DWT are used in MPEG-4 Visual (and a variant of the DCT is used in H.264); they are discussed in the following sections. 3.4.2.2 Discrete Cosine Transform The DCT operates on X, an N * N block of samples (typically image samples, or residual values after prediction), to produce Y, an N * N block of coefficients. The action of the DCT (and its inverse, the IDCT) can be described by a transform matrix A. The forward DCT of an N * N sample block is given by Y = A X A(T), and the inverse DCT by X = A(T) Y A, where A(T) is the transpose of A.

X is the matrix of samples, Y is the matrix of coefficients and A is an N * N transform matrix. The elements of A are:

A(i, j) = C(i) * cos[ (2j + 1) * i * pi / 2N ]
where C(i) = (1/N)^(1/2) for i = 0, and C(i) = (2/N)^(1/2) for i > 0

The output of a two-dimensional DCT is a set of N * N coefficients representing the block in the DCT domain, which can be thought of as "weights" applied to a set of standard basis patterns. Any image block can be reconstructed by combining all N * N basis patterns, each multiplied by its corresponding weight (coefficient).

Example: the DCT coefficients of an image block. A 4 * 4 block is selected and its DCT coefficients computed. The advantage of representing the block in the DCT domain is not yet obvious, since there is no saving in storage: instead of 16 pixel values we now store 16 DCT coefficients. The practical value of the DCT becomes apparent when the block is reconstructed from a subset of the coefficients:

Setting all coefficients except a few of the largest to 0 and then applying the IDCT still yields a recognisable result, and adding more coefficients before the IDCT produces a progressively more accurate reconstruction of the original block. It is therefore possible to reconstruct an approximate copy of the block from a subset of the coefficients. Removing the small-valued coefficients (for example by quantization) allows the image data to be represented with fewer coefficients, at the expense of some loss of quality.
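The forward transform Y = A X A(T), its inverse, and reconstruction from a coefficient subset can be reproduced with a small self-contained sketch (plain Python, no libraries; a real codec would use a fast integer DCT). `keep_largest` is a hypothetical helper for the thresholding step. Because the DCT matrix is orthonormal, the squared reconstruction error equals exactly the energy of the discarded coefficients (Parseval's relation).

```python
import math

def dct_matrix(n):
    # A(i, j) = C(i) * cos((2j + 1) * i * pi / (2N)),
    # C(0) = sqrt(1/N), C(i) = sqrt(2/N) for i > 0.
    return [[math.sqrt((1.0 if i == 0 else 2.0) / n)
             * math.cos((2 * j + 1) * i * math.pi / (2 * n))
             for j in range(n)] for i in range(n)]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(x):
    a = dct_matrix(len(x))
    return matmul(matmul(a, x), transpose(a))     # Y = A X A^T

def idct2(y):
    a = dct_matrix(len(y))
    return matmul(matmul(transpose(a), y), a)     # X = A^T Y A

def keep_largest(y, k):
    # Zero all but the k largest-magnitude coefficients.
    mags = sorted((abs(v) for row in y for v in row), reverse=True)
    t = mags[k - 1]
    return [[v if abs(v) >= t else 0.0 for v in row] for row in y]
```

Keeping all 16 coefficients of a 4 * 4 block gives perfect reconstruction; keeping fewer gives an approximation whose error energy equals the energy of the dropped coefficients.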

3.4.2.3 Wavelet The popular "wavelet transform" widely used in image compression is based on a set of filters (equivalent to a set of discrete wavelet functions). The basic operation of a discrete wavelet transform applied to a discrete signal of N samples is as follows: a pair of filters decomposes the signal into a low-frequency band (L) and a high-frequency band (H). Each band is subsampled by a factor of 2, so that the two bands each contain N/2 samples. With a suitable choice of filters, this operation is reversible.

This method extends to two-dimensional signals such as grey-scale images. Each row of a 2-D image is passed through a low-pass and a high-pass filter (Lx and Hx), and the output of each filter is subsampled to produce the intermediate images L and H: L is the original image low-pass filtered and subsampled in the x direction, and H is the original image high-pass filtered and subsampled in the x direction. Next, each column of these new images is passed through low- and high-pass filters (Ly and Hy) and subsampled, producing four sub-images (LL, LH, HL, HH). Together these four sub-images contain the same number of samples as the original. LL is the original image low-pass filtered and subsampled in both horizontal and vertical directions. HL is high-pass filtered in the vertical direction and contains the residual vertical frequencies. LH is high-pass filtered in the horizontal direction and contains the residual horizontal frequencies, and HH is high-pass filtered in both the horizontal and vertical directions. Between them, the four sub-images contain all of the information of the original image, but the sparseness of LH, HL and HH makes them easy to compress. In an image compression application, the two-dimensional decomposition is applied again to the LL image to form four new sub-images, and the resulting new low-pass image is decomposed again, and so on. Many of the high-frequency samples are close to 0, and more efficient transmission can be achieved by removing these insignificant values. At the decoding end, the original image is reconstructed by repeated up-sampling, filtering and addition.
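A one-level 2-D decomposition can be illustrated with the Haar filter pair (pairwise average and half-difference), the simplest choice satisfying the reversibility requirement. This is an illustrative sketch only: practical wavelet codecs use longer filters, and the sub-image labelling (here HL = row high-pass then column low-pass) is one of several conventions in the literature.

```python
def haar_step(signal):
    # One analysis step of the (unnormalized) Haar wavelet:
    # low band = pairwise averages, high band = pairwise half-differences,
    # each downsampled by a factor of 2.
    low = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]
    high = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(len(signal) // 2)]
    return low, high

def haar_2d(image):
    # Filter the rows first (giving L and H), then the columns of each,
    # producing the four sub-images LL, HL, LH, HH.
    row_pairs = [haar_step(row) for row in image]
    l_img = [lo for lo, _ in row_pairs]
    h_img = [hi for _, hi in row_pairs]

    def split_columns(img):
        n_rows, n_cols = len(img), len(img[0])
        cols = [haar_step([img[r][c] for r in range(n_rows)]) for c in range(n_cols)]
        top = [[cols[c][0][r] for c in range(n_cols)] for r in range(n_rows // 2)]
        bottom = [[cols[c][1][r] for c in range(n_cols)] for r in range(n_rows // 2)]
        return top, bottom

    ll, lh = split_columns(l_img)
    hl, hh = split_columns(h_img)
    return ll, hl, lh, hh
```

For a pure horizontal ramp, all of the detail ends up in the sub-image built from the row high-pass output, while the other detail bands are exactly zero, which is the sparseness the text refers to.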

3.4.3 Quantization A quantizer maps a signal with a range of values X onto a quantized signal with a range of values Y. Representing the quantized signal with fewer bits is possible because its range of values is smaller than the original. A scalar quantizer maps one sample of the input signal to one quantized output value, and a vector quantizer maps a group of input samples to a group of quantized values.

3.4.3.1 Scalar Quantization A simple example of scalar quantization is rounding a fractional number to the nearest integer, i.e. a mapping from R to Z. The process is lossy (irreversible) because the original fractional value cannot be recovered from the rounded integer.

A more general example of a quantizer is:
FQ = round(X / QP)
Y = FQ * QP

Here QP is the quantization "step size". The quantized output levels are spaced at uniform intervals of QP.

In an image or video CODEC, the quantization operation is usually made up of two parts: a forward quantizer (FQ) in the encoder and an "inverse" quantizer (IQ) in the decoder. (Since quantization is in fact irreversible, a more accurate description is scaling and rescaling.) The step size QP between successive re-scaled values is a critical parameter. If the step size is large, the range of quantized values is small, so they can be represented very compactly during transmission (high compression), but the re-scaled values are only a coarse approximation of the original signal. If the step size is small, the re-scaled values match the original signal more closely, but the quantized values fall within a larger range, and the efficiency of compression is reduced.
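The forward quantizer and decoder-side rescaling described above can be sketched directly (plain Python; note this is an illustration, and Python's built-in `round` uses round-half-to-even, whereas real codecs specify their own rounding rules):

```python
def forward_quantize(x, qp):
    # Encoder side: FQ = round(X / QP). Maps the input onto a small set
    # of integer levels; small inputs (|x| < qp/2) map to zero.
    return round(x / qp)

def rescale(fq, qp):
    # Decoder side ("inverse" quantizer): Y = FQ * QP. The result is an
    # approximation of the original value, accurate to within QP / 2.
    return fq * qp
```

A larger QP compresses harder but approximates more coarsely: quantizing 37 with QP = 8 reconstructs 40, while QP = 5 reconstructs 35; either way the error never exceeds half the step size.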

Quantization can be used to remove insignificant coefficients after a transform such as the DCT or wavelet, reducing the precision of the image data. The forward quantizer in an image or video encoder is designed to map insignificant (small) coefficient values to 0 while retaining a reduced number of significant, larger values. The output of the forward quantizer is therefore usually a sparse array of quantized coefficients, mostly containing zeros.

3.4.3.2 Vector Quantization A vector quantizer maps a set of input data (such as a block of image samples) to a single value (codeword), and at the decoder each codeword is mapped back to an approximation of the original set of input data. The set of vectors is stored at both the encoder and the decoder and is known as a codebook. A typical application of vector quantization in image compression works as follows:

1. Partition the original image into regions (e.g. M × N pixel blocks). 2. For each region, choose the vector from the codebook that matches it most closely. 3. Transmit the index of the chosen vector to the decoder. 4. At the decoder, reconstruct an approximate copy of the region using the indexed vector.

Here quantization is applied in the spatial domain (groups of image samples are quantized as vectors), but it can equally be applied to motion-compensated or transformed data. Key issues in vector quantizer design include the design of the codebook and the efficient search for the optimal vector within it.
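Steps 1-4 above can be sketched with an exhaustive nearest-neighbour codebook search; the tiny codebook in the example below is invented purely for illustration, and a practical codec would use a trained codebook and a faster search.

```python
def nearest_codeword(block, codebook):
    # Step 2: return the index of the codebook vector with minimum
    # squared error against the input block (exhaustive search).
    def sse(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(codebook)), key=lambda i: sse(block, codebook[i]))

def vq_encode(blocks, codebook):
    # Step 3: only the indices are transmitted.
    return [nearest_codeword(b, codebook) for b in blocks]

def vq_decode(indices, codebook):
    # Step 4: the decoder looks each index up in its copy of the codebook.
    return [codebook[i] for i in indices]
```

Note that each reconstructed block is a codebook entry, not the original data: the mapping is many-to-one, which is exactly where the compression (and the loss) comes from.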

3.4.4 Reordering and Zero Encoding Quantized transform coefficients need to be encoded as compactly as possible prior to storage and transmission. In a transform-based image or video encoder, the output of the quantizer is a sparse array containing a few non-zero coefficients and a large number of zero-valued coefficients. Reordering (clustering the non-zero coefficients together) and efficient representation of the zero coefficients are applied before entropy coding. These processes are described here for the DCT and the wavelet transform. 3.4.4.1 DCT Coefficient Distribution The significant DCT coefficients of a block of image or residual samples are typically the "low-frequency" coefficients around DC (0, 0). The non-zero DCT coefficients cluster around the top-left (DC) coefficient, and the distribution is roughly symmetrical in the horizontal and vertical directions. For a residual field (an interlaced field picture), the distribution around the DC position is skewed: more non-zero coefficients appear down the left-hand side. This is because the field picture has strong high-frequency components in the vertical direction (due to the vertical subsampling), which produce large DCT coefficients corresponding to vertical frequencies.

Scan: after quantization, the DCT coefficients of a block are reordered to group the non-zero coefficients together, making the representation of the remaining zero values more efficient. The optimum reordering path (scan order) depends on the distribution of the non-zero DCT coefficients. For a typical frame block, a suitable scan order is a zigzag, starting from the DC coefficient in the top-left corner. Starting from DC, each quantized coefficient is copied in turn into a one-dimensional array. The non-zero coefficients are grouped at the front of the reordered array, followed by long runs of zeros.

The zigzag scan may not be ideal for a field block: because the coefficient distribution is skewed, a modified scan order can be more effective, in which, for example, the coefficients down the left-hand side are scanned before those on the right.
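The zigzag scan for a frame block can be sketched as follows. The anti-diagonal traversal here matches the classic zigzag starting at the DC coefficient; it is an illustrative implementation, not a transcription of any particular standard's scan table.

```python
def zigzag_order(n):
    # Zigzag scan positions for an n x n block: walk the anti-diagonals
    # (constant i + j), alternating direction, starting at DC (0, 0).
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def zigzag_scan(block):
    # Copy each quantized coefficient into a one-dimensional array.
    return [block[i][j] for i, j in zigzag_order(len(block))]
```

Applied to a 4 x 4 block whose coefficients are numbered in raster order, the scan visits positions 0, 1, 4, 8, 5, 2, ... so that low-frequency (top-left) coefficients come first and the zeros in the high-frequency corner collect at the end of the array.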

Run-Level Encoding: the output of the reordering process is an array that typically begins with one or more clusters of non-zero coefficients, followed by strings of zero-valued coefficients. These large numbers of zeros can be encoded more efficiently by, for example, representing each run of zeros by its length (run-level coding).

For example:
Input array: 16, 0, 0, -3, 5, 6, 0, 0, 0, 0, -7, ...
Output values: (0, 16), (2, -3), (0, 5), (0, 6), (4, -7), ...

The high-frequency DCT coefficients are very often quantized to 0, so a reordered block usually ends with a run of zeros. A special case is needed to indicate the final non-zero coefficient in a block. In so-called "two-dimensional" run-level encoding, each run-level pair is encoded as above, and a separate symbol, "last", indicates the end of the non-zero values. In "three-dimensional" run-level encoding, each symbol encodes three quantities: the run, the level, and whether this is the last non-zero coefficient. With the example above, this gives: (0, 16, 0), (2, -3, 0), (0, 5, 0), (0, 6, 0), (4, -7, 1)

The 1 in the final code indicates that this is the last non-zero coefficient in the block.
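The two- and three-dimensional run-level schemes above can be sketched directly (an illustrative implementation, before any entropy coding of the resulting symbols):

```python
def run_level_2d(coeffs):
    # (run, level) pairs: run = number of zeros preceding each non-zero
    # coefficient. Trailing zeros are dropped entirely (in a 2-D scheme
    # a separate "last" symbol would mark the end of the block).
    pairs, run = [], 0
    for v in coeffs:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs

def run_level_3d(coeffs):
    # (run, level, last) triples: 'last' = 1 on the final non-zero
    # coefficient, so no separate end-of-block symbol is needed.
    pairs = run_level_2d(coeffs)
    return [(r, v, 1 if i == len(pairs) - 1 else 0)
            for i, (r, v) in enumerate(pairs)]
```

Running this on the example array from the text reproduces the pairs and triples shown above.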

3.4.4.2 Wavelet Coefficient Distribution Many of the coefficients in the higher subbands (towards the lower right of the decomposition) are close to 0 and can be quantized to 0 without significant loss of image quality. The non-zero coefficients correspond to structures in the image; for example, in the violin image, the bow appears as a clear vertical structure. When a coefficient in a low-frequency subband is zero, there is a strong probability that the corresponding coefficients in the higher-frequency subbands are also zero. This suggests representing a tree of zero-valued quantized coefficients, rooted at a coefficient in the lowest (low-frequency) subband. A single coefficient in the layer-1 LL subband has corresponding coefficients at the same relative position in the other layer-1 subbands, and each layer-1 coefficient position maps onto four corresponding coefficient positions at the same relative location in each layer-2 subband.

Please credit the original source when reposting: https://www.9cbs.com/read-47406.html
