MPEG4 & H.264 Learning II ------ Video Coding Basic Concepts and Temporal Models


3.1 Introduction. Compression (verb): to squeeze into a smaller space; to condense. Compression (noun): the act of compressing or the state of being compressed.

Compression is a technique for storing data in a smaller space. Video compression (video coding) is a method of representing a digital video stream using fewer bits. "Raw", uncompressed digital video requires a very large number of bits (roughly 216 Mbits per second for standard-definition television), so it is desirable to compress digital video for storage and transmission.

Compression involves a complementary pair of systems: an encoder and a decoder. The encoder converts the source data into a compressed form prior to transmission or storage, while the decoder converts the compressed form back into a representation of the original video data. The encoder/decoder pair is often described as a CODEC (enCOder/DECoder).

Data compression is achieved by removing redundancy, i.e. components that are not necessary for reconstructing the data. Many kinds of data contain statistical redundancy, which can be removed by lossless compression; applied to images, lossless methods such as JPEG-LS achieve a compression ratio of around 3-4 times. Higher compression ratios require lossy compression: in a lossy compression system the decompressed data differs from the source data, and the higher compression ratio is achieved at the expense of video quality. Lossy video compression is based on the principle of removing subjective redundancy, i.e. those elements of the image or video that can be removed without significantly affecting the viewer's perception of visual quality.

Most video coding methods exploit both temporal and spatial redundancy to achieve compression. In the temporal domain, successive video frames are usually highly correlated, especially when the temporal sampling rate (frame rate) is high. In the spatial domain, neighbouring pixel samples within a frame are usually correlated with each other.

A number of features are shared by the H.264 and MPEG-4 Visual standards. Both standards assume block-based motion compensation, transform, quantization and entropy coding. We concentrate on these major methods, starting with the temporal model and then covering image transforms, quantization, predictive coding and entropy coding, and we describe the process of encoding and decoding a block of image samples.

3.2 Video CODEC. A video encoder converts a source image or video sequence into a compressed form, and a decoder reconstructs from it a copy or approximation of the source sequence. If the decoded sequence is identical to the original sequence, the coding process is lossless; if the decoded sequence differs from the source sequence, the process is lossy.

A CODEC represents the original video stream with a model, an efficiently coded representation that can be used to reconstruct an approximation of the video data. Ideally the model should represent the sequence with as few bits as possible and with as high a fidelity as possible. These two goals (compression efficiency and high quality) are usually contradictory, because a lower compressed bit rate typically produces reduced image quality at the decoder. This trade-off between bit rate and quality will be discussed later.

A video encoder consists of three main functional units: a temporal model, a spatial model and an entropy encoder. The input to the temporal model is the uncompressed video sequence. The temporal model attempts to exploit the similarity between neighbouring frames to reduce temporal redundancy, usually by constructing a prediction of the current frame. In MPEG-4 Visual and H.264, the prediction is typically formed from one or more previous or future frames, and is improved by compensating for the motion between frames. The output of the temporal model is a residual frame (created by subtracting the prediction from the actual current frame) together with a set of model parameters, typically a set of motion vectors describing how the motion was compensated.

The residual frame forms the input to the spatial model, which exploits the similarity between neighbouring samples to reduce spatial redundancy. In MPEG-4 Visual and H.264, this is achieved by applying a transform that converts the samples into another domain and quantizing the resulting coefficients: quantization removes the insignificant values, leaving only a small number of significant coefficients to represent the residual frame. The output of the spatial model is a set of quantized transform coefficients.
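To make the transform-and-quantize step concrete, here is a minimal Python sketch of a spatial-model round trip on a single 8 × 8 residual block. It uses a floating-point DCT and a single uniform quantizer step purely for illustration (the function name `spatial_model_roundtrip` and the parameter `qstep` are ours, not from any standard); MPEG-4 Visual and H.264 actually specify their own transforms and quantization rules.

```python
# A minimal sketch of the spatial model on one 8x8 residual block, assuming
# a plain DCT and a single uniform quantizer step (illustrative only).
import numpy as np
from scipy.fft import dctn, idctn

def spatial_model_roundtrip(residual_block, qstep=16):
    """Transform, quantize, then reconstruct an 8x8 residual block."""
    coeffs = dctn(residual_block, norm="ortho")             # to transform domain
    quantized = np.round(coeffs / qstep)                    # small coefficients -> 0
    reconstructed = idctn(quantized * qstep, norm="ortho")  # decoder-side inverse
    return quantized, reconstructed

rng = np.random.default_rng(0)
block = rng.normal(0, 8, (8, 8))                     # stand-in residual data
q, rec = spatial_model_roundtrip(block)
print("nonzero coefficients:", np.count_nonzero(q))  # few significant values remain
```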

The parameters of the temporal model (typically motion vectors) and of the spatial model (the coefficients) are compressed by the entropy encoder. This removes statistical redundancy (for example, by representing commonly occurring vectors and coefficients with short binary codes) and produces a compressed bit stream or file for transmission or storage. A compressed sequence consists of coded motion vector parameters, coded residual coefficients and header information. The video decoder reconstructs a video frame from the compressed bit stream. The coefficients and motion vectors are decoded by an entropy decoder, after which the spatial model reconstructs a version of the residual frame. The decoder uses the motion vector parameters, together with one or more previously decoded frames, to create a prediction of the current frame, and the frame itself is reconstructed by adding the residual frame to this prediction.

3.3 Temporal Model. The goal of the temporal model is to reduce redundancy between transmitted frames by forming a predicted frame and subtracting it from the current frame. The output of this process is a residual frame; the more accurate the prediction process, the less energy is contained in the residual frame. The residual frame is encoded and sent to the decoder, which re-creates the predicted frame and adds the decoded residual to it to reconstruct the current frame. The predicted frame is created from one or more past or future frames (known as reference frames). The accuracy of the prediction can usually be improved by compensating for the motion between the reference frame(s) and the current frame.

3.3.1 Prediction from the Previous Video Frame. The simplest method of temporal prediction is to use the previous frame as the predictor for the current frame. The difference between the two frames is used as the residual frame. The obvious problem with this method is that too much energy remains in the residual frame, which means that there is still a significant amount of data to compress after temporal prediction. Much of this residual energy is due to the movement of objects between the two frames, and a better prediction can be formed by compensating for this motion.
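This simplest predictor can be written in a couple of lines. The sketch below assumes the two frames are available as equal-sized grayscale numpy arrays (the function name is our own); the SAE value it returns is the residual-energy measure used in the comparison table later in this section.

```python
# A minimal sketch of prediction from the previous frame, assuming two
# grayscale frames held as numpy arrays of equal shape (names illustrative).
import numpy as np

def frame_difference_residual(current, previous):
    """Predict the current frame as the previous frame; return the residual."""
    residual = current.astype(np.int16) - previous.astype(np.int16)
    sae = np.abs(residual).sum()   # Sum of Absolute Errors: residual "energy"
    return residual, sae
```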

3.3.2 Changes Due to Motion. Changes between video frames may be caused by object motion (rigid object motion, such as a moving car, or deformable object motion, such as a moving arm), camera motion (panning, tilting, zooming, rotation), uncovered regions (for example, background uncovered by a moving object) and lighting changes. With the exception of uncovered regions and lighting changes, these differences correspond to the movement of pixels between frames. It is possible to estimate the trajectory of each pixel between successive frames; the resulting field of pixel trajectories is known as the optical flow. A complete optical flow field contains a flow vector for every pixel position, but the field may be subsampled, for example showing only the vector for every second pixel.

If the optical flow field were accurately known, it should be possible to form an accurate prediction of most of the pixels of the current frame by moving each pixel of the reference frame along its optical flow vector. However, for several reasons this is not a practical method of motion compensation. An accurate calculation of the optical flow is computationally very intensive, and the flow vectors for every pixel would also have to be sent to the decoder in order for it to re-create the predicted frame (resulting in a large amount of transmitted data, which contradicts our goal of a small residual).

3.3.3 Block-Based Motion Estimation and Compensation. A practical and widely used method of motion compensation is to compensate for the movement of rectangular "blocks" of the current frame. The following procedure is carried out for each M × N sample block of the current frame:

1. Search an area of the reference frame (a past or future frame that has previously been coded and transmitted) to find a matching M × N region. This is done by comparing the M × N block in the current frame with some or all of the candidate regions in the search area and finding the region that gives the closest match. A popular matching criterion is the energy of the residual formed by subtracting the candidate region from the current block, so that the candidate region that minimises the residual energy is chosen as the best match. This process is known as motion estimation.

2. The selected region becomes the predictor for the current M × N block, and is subtracted from the current block to form an M × N residual block (motion compensation).

3. The residual block is encoded and transmitted, and the offset between the current block and the position of the selected region (the motion vector) is transmitted as well.

The decoder uses the received motion vector to re-create the predictor region, decodes the residual block and adds it to the predictor to reconstruct a version of the original block, as the sketch below puts together.
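The following Python sketch combines the three steps as an exhaustive full search, assuming grayscale numpy frames; the function name and the ±7-sample search range are our own illustrative choices, and practical encoders use much faster search strategies than this brute-force scan.

```python
# A sketch of full-search block motion estimation/compensation for one M x N
# block of the current frame, assuming grayscale numpy frames (illustrative).
import numpy as np

def motion_estimate(current, reference, top, left, M=16, N=16, search=7):
    """Find the motion vector minimizing SAE within +/-`search` samples."""
    block = current[top:top + M, left:left + N].astype(np.int16)
    best_sae, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + M > reference.shape[0] or x + N > reference.shape[1]:
                continue                      # candidate falls outside the frame
            candidate = reference[y:y + M, x:x + N].astype(np.int16)
            sae = np.abs(block - candidate).sum()
            if best_sae is None or sae < best_sae:
                best_sae, best_mv = sae, (dy, dx)
    dy, dx = best_mv
    prediction = reference[top + dy:top + dy + M,
                           left + dx:left + dx + N].astype(np.int16)
    residual = block - prediction             # motion compensation
    return best_mv, residual                  # both are encoded and transmitted
```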

There are many reasons why block-based motion compensation is so popular. It is relatively straightforward to implement, it fits well with rectangular video frames and with block-based image transforms (such as the DCT), and it provides a reasonably effective temporal model for many video sequences. There are, however, some disadvantages: real objects rarely have neat edges that match rectangular block boundaries, objects often move by a fractional number of pixels between frames, and many types of motion (such as deformable objects, rotation and warping) are hard to compensate for using block-based methods. Despite these disadvantages, block-based motion compensation is the basis of the temporal model used by all current video coding standards.

3.3.4 Motion-Compensated Macroblock. The macroblock, corresponding to a 16 × 16 region of the frame, is the basic unit of motion-compensated prediction in most coding standards (including MPEG-1, MPEG-2, MPEG-4 Visual, H.261, H.263 and H.264). For source video in 4:2:0 format, a 16 × 16 region of the source frame consists of 256 luma samples (arranged as four 8 × 8 sample blocks), 64 Cb chroma samples (one 8 × 8 sample block) and 64 Cr chroma samples (one 8 × 8 sample block), giving a total of six 8 × 8 blocks, as the sketch below illustrates. An MPEG-4 Visual or H.264 CODEC processes each video frame in units of macroblocks.
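As a small illustration, this sketch slices one macroblock of a 4:2:0 frame into its six 8 × 8 blocks, assuming `Y`, `Cb` and `Cr` are numpy arrays with the chroma planes at half the luma resolution (the names are ours).

```python
# A sketch of how one 4:2:0 macroblock decomposes into six 8x8 blocks,
# assuming Y, Cb, Cr numpy planes with chroma at half resolution.
def macroblock_blocks(Y, Cb, Cr, mb_row, mb_col):
    """Return the four 8x8 luma blocks and two 8x8 chroma blocks of one MB."""
    y0, x0 = 16 * mb_row, 16 * mb_col
    luma = [Y[y0 + r:y0 + r + 8, x0 + c:x0 + c + 8]
            for r in (0, 8) for c in (0, 8)]           # four 8x8 luma blocks
    c0, d0 = 8 * mb_row, 8 * mb_col                    # chroma is subsampled 2:1
    return luma + [Cb[c0:c0 + 8, d0:d0 + 8], Cr[c0:c0 + 8, d0:d0 + 8]]
```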

Motion estimation. Motion estimation of a macroblock involves finding a 16 × 16 sample region in the reference frame that closely matches the current macroblock. The reference frame is a previously encoded frame. An area of the reference frame centred on the position of the current macroblock is searched, and the 16 × 16 region within this search area that gives the closest match is chosen as the best match.

Motion compensation. The best-matching region selected in the reference frame is subtracted from the current macroblock to produce a residual macroblock, which is encoded and transmitted together with a motion vector describing the position of the best-matching region (relative to the current macroblock position). Within the encoder, the residual is encoded and then decoded again, and the decoded residual is added to the matching region to form a reconstructed macroblock, which is stored as a reference for further motion-compensated prediction. Reconstructing the macroblock with the decoded residual ensures that the encoder and decoder use identical reference frames for motion compensation.
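The key point, that the encoder adds back the decoded residual rather than the original one, can be expressed in a few lines. In this sketch, `encode_decode` stands for the whole lossy encode-then-decode round trip of the residual (for instance, the transform/quantize sketch shown earlier); the names are illustrative.

```python
# A sketch of the encoder's local reconstruction loop. The *decoded* residual,
# not the original one, is added back, so encoder and decoder end up holding
# identical reference frames for later motion compensation.
def reconstruct_macroblock(prediction, residual, encode_decode):
    """encode_decode models the lossy encode-then-decode of the residual."""
    decoded_residual = encode_decode(residual)   # what the decoder will see
    return prediction + decoded_residual         # stored as a future reference
```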

There are many variations on this basic motion estimation and compensation process. The reference frame may be a previous frame, a future frame, or a combination of predictions from two or more previously encoded frames. If a future frame is used as a reference, that frame must be encoded before the current frame. Where there is a significant change between the reference and current frames (for example, a scene change), it may be more efficient to encode the macroblock without motion compensation, so the encoder may choose intra mode (encoding without motion compensation). Moving objects in a video scene rarely follow 16 × 16 pixel boundaries, so a variable block size can make motion estimation and compensation more efficient. Furthermore, objects may move by a fractional number of pixels between frames, so that interpolating between the samples of the reference frame can give a better prediction.

3.3.5 Motion Compensation Block Size. Consider two successive frames of a video sequence; Figure 1 shows the residual frame obtained by subtracting one from the other without motion compensation. The energy in the residual is reduced by motion compensating each 16 × 16 macroblock. Motion compensating each 8 × 8 block (instead of each 16 × 16 macroblock) reduces the residual energy further, and compensating each 4 × 4 block reduces it further still. This example shows that smaller motion compensation block sizes can produce better motion compensation results. However, a smaller block size leads to increased complexity (more search operations must be carried out) and to more motion vectors being transmitted. Sending each motion vector costs bits, and this extra overhead may offset the advantage of the reduced residual energy. A more effective compromise is to adapt the block size, for example choosing a large block size in flat, homogeneous regions of the frame and a smaller block size around areas of high detail and complex motion. H.264 uses an adaptive motion compensation block size, which will be described later.
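Reusing the illustrative `motion_estimate()` sketch from Section 3.3.3, this trade-off can be measured directly: compensate the same 16 × 16 area in 16 × 16, 8 × 8 and 4 × 4 pieces and compare the total residual SAE (the helper name is ours).

```python
# A sketch comparing residual SAE for 16x16, 8x8 and 4x4 motion compensation
# over one macroblock, reusing the illustrative motion_estimate() above.
def macroblock_sae(current, reference, top, left, block_size):
    """Motion-compensate a 16x16 area in block_size pieces; return total SAE."""
    total = 0
    for r in range(0, 16, block_size):
        for c in range(0, 16, block_size):
            _, residual = motion_estimate(current, reference,
                                          top + r, left + c,
                                          M=block_size, N=block_size)
            total += abs(residual).sum()
    # typically sae(4) <= sae(8) <= sae(16), at the cost of more motion vectors
    return total
```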

3.3.6 Sub-Pixel Motion Compensation. In some cases a better motion-compensated prediction can be formed by predicting from interpolated sample positions in the reference frame. "Sub-pixel" motion estimation and compensation involves searching sub-sample interpolated positions as well as integer-sample positions, choosing the position that gives the best match (i.e. that minimises the residual energy), and using the integer- or sub-sample values at this position for the motion-compensated prediction. For quarter-pixel motion estimation, in the first step motion estimation finds the best match on the integer-sample grid; the encoder then searches the half-sample positions immediately next to this best match to see whether the match can be improved and, if required, searches the quarter-sample positions next to the best half-sample position. The final match (at an integer, half- or quarter-sample position) is subtracted from the current block or macroblock. A residual frame produced using 4 × 4 blocks with half-sample interpolation has lower residual energy than one produced with integer samples, and this can be reduced further by extending to quarter-sample interpolation. In general, "finer" interpolation provides better motion compensation performance (a smaller residual) at the expense of higher complexity, and the performance gains diminish as more interpolation steps are added: half-sample interpolation gives a significant gain over integer-sample motion compensation, quarter-sample interpolation gives a smaller further improvement, eighth-sample interpolation a smaller improvement again, and so on.
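Here is a sketch of half-sample interpolation using simple bilinear averaging, assuming a grayscale numpy reference frame. Note that this averaging is only illustrative: H.264, for example, specifies a six-tap filter for half-sample luma positions.

```python
# A sketch of half-sample interpolation by bilinear averaging, assuming a
# grayscale numpy reference frame (real codecs use longer interpolation
# filters; this only illustrates the idea of sub-sample positions).
import numpy as np

def half_pel_upsample(reference):
    """Return a frame upsampled 2x, with averages at half-sample positions."""
    ref = reference.astype(np.float32)
    H, W = ref.shape
    up = np.zeros((2 * H - 1, 2 * W - 1), dtype=np.float32)
    up[::2, ::2] = ref                                   # integer positions
    up[::2, 1::2] = (ref[:, :-1] + ref[:, 1:]) / 2       # horizontal half-pels
    up[1::2, ::2] = (ref[:-1, :] + ref[1:, :]) / 2       # vertical half-pels
    up[1::2, 1::2] = (ref[:-1, :-1] + ref[:-1, 1:] +
                      ref[1:, :-1] + ref[1:, 1:]) / 4    # diagonal half-pels
    return up   # a motion vector (dy, dx) in half-pel units indexes up[...]
```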

When a motion-compensated reference frame is subtracted from the current frame, the energy of the residual frame (approximated here by the SAE, the Sum of Absolute Errors) is as follows:

Sequence            No motion compensation   Integer-pel   Half-pel   Quarter-pel
'Violin', QCIF      171945                   153475        128320     113744
'Grasses', QCIF     248316                   245784        228952     215585
'Carphone', QCIF    102418                   73952         56492      47780

A lower SAE indicates better motion compensation performance. In each case, sub-pixel motion compensation gives better performance than integer-pixel compensation, and quarter-pixel compensation performs better than half-pixel. The 'Grasses' sequence contains complex motion and is more difficult to motion compensate, whereas 'Violin' and 'Carphone' are compensated more easily.

Searching for matches at half- or quarter-sample interpolated positions is considerably more complex than searching a 16 × 16 block at integer positions only. In addition to the greater complexity, there is a coding penalty: because the motion vector of each block must be encoded and transmitted to the receiver in order to reconstruct the image correctly, reducing the block size increases the number of vectors that have to be transmitted. Furthermore, more bits are needed to represent half- or quarter-sample vectors, because the fractional part of each vector must be encoded as well as the integer part. There is therefore a trade-off in coding efficiency as the motion compensation scheme becomes more complex: more accurate motion compensation requires more bits to encode the vector field but fewer bits for the residual, whereas less accurate motion compensation needs fewer bits for the vectors but more for the residual.

3.3.7 Region-Based Motion Compensation. Moving objects in a natural video scene are rarely aligned neatly with block edges; they are likely to be irregular in shape, located at arbitrary positions, and may change shape between frames. This makes it difficult to find a good match in the reference frame for a block that covers part of a moving object together with part of the static background.

Better performance may be achieved by motion compensating arbitrary regions of the picture (region-based motion compensation). For example, if we attempt motion compensation on an elliptical moving object itself, we may be able to find a good match in the reference frame. However, region-based motion compensation encounters many practical difficulties, including identifying region boundaries accurately and consistently, signalling the contour of each boundary to the decoder, and encoding the residual after motion compensation. MPEG-4 Visual includes a set of tools that support region-based compensation and coding.

