Keywords: video, ITU-T, H.261, H.263, H.264
Digital video technology is widely used in communications, computing, and broadcasting. It has enabled applications such as video conferencing, video telephony, digital TV, and media storage, and has prompted the development of many video coding standards. ITU-T and ISO/IEC are the two major organizations that formulate video coding standards. The ITU-T standards include H.261, H.263, and H.264, which are mainly used in real-time video communications such as video conferencing. The MPEG series of standards is developed by ISO/IEC and is primarily applied to video storage (DVD), broadcast television, and streaming over the Internet or wireless networks. The two organizations have also developed some standards jointly: H.262 is equivalent to the MPEG-2 video coding standard, and the latest standard, H.264, is included as Part 10 of MPEG-4.
This article describes H.261, H.263, and H.264 in the order in which the ITU-T video coding standards were developed.
H.261 video coding standard
H.261 was developed by ITU-T for two-way audiovisual services (video telephony, video conferencing) over the Integrated Services Digital Network (ISDN), at rates that are integer multiples of 64 kb/s. H.261 handles only two picture formats, CIF and QCIF. Each frame is divided into a picture layer, a group of blocks (GOB) layer, a macroblock (MB) layer, and a block layer.
H.261 is the earliest moving image compression standard. It specifies in detail the various parts of video coding, including motion-compensated inter prediction, the DCT transform, quantization, entropy coding, and rate control for fixed-rate channels.
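The layer structure described above can be made concrete with a little arithmetic. The following is a minimal sketch; the picture dimensions (352 × 288 for CIF, 176 × 144 for QCIF), the 16 × 16 macroblock, and the 33-macroblock GOB are the commonly cited values for H.261 and are not stated in this article, and the function name `layer_counts` is purely illustrative.

```python
# Luminance picture sizes (width, height) for the two formats handled by H.261.
# Assumption: these are the commonly cited dimensions, not given in the article.
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}

MB_SIZE = 16        # a macroblock covers 16x16 luminance pixels
MBS_PER_GOB = 33    # an H.261 GOB holds 3 rows of 11 macroblocks
BLOCKS_PER_MB = 6   # four 8x8 luminance blocks plus two 8x8 chrominance blocks


def layer_counts(name):
    """Return how many GOBs, macroblocks and blocks one picture contains."""
    width, height = FORMATS[name]
    macroblocks = (width // MB_SIZE) * (height // MB_SIZE)
    return {
        "gobs": macroblocks // MBS_PER_GOB,
        "macroblocks": macroblocks,
        "blocks": macroblocks * BLOCKS_PER_MB,
    }


if __name__ == "__main__":
    for fmt in FORMATS:
        print(fmt, layer_counts(fmt))
```

Under these assumptions a CIF picture contains 12 GOBs and 396 macroblocks, and a QCIF picture contains 3 GOBs and 99 macroblocks.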
H.263 video coding standard
H.263 is the earliest ITU-T standard for low bit rate video coding. The later second edition (H.263+) and H.263++ added a number of options to make it more widely applicable.
H.263 video compression standard
H.263 is a video coding standard for narrowband communication channels below 64 kb/s. It is based on H.261, and its standard input picture format can be Sub-QCIF, QCIF, CIF, 4CIF, or 16CIF color images with 4:2:0 subsampling. Compared with H.261, H.263 uses half-pixel motion compensation and adds four effective compression coding modes.
Unrestricted motion vector mode allows motion vectors to point outside the picture. When a pixel referenced by a motion vector lies outside the coded picture, the nearest pixel value on the picture edge is used instead. When there is motion across the picture boundary, this mode can achieve a large coding gain, especially for small pictures. In addition, this mode extends the motion vector range, allowing larger motion vectors, which is particularly advantageous for camera motion.
Syntax-based arithmetic coding mode uses arithmetic coding instead of Huffman coding, which reduces the bit rate at the same signal-to-noise ratio and reconstructed image quality.
Advanced prediction mode allows each of the four 8 × 8 luminance blocks in a macroblock to have its own motion vector, thereby increasing prediction accuracy; the motion vector of the two chrominance blocks is the average of the four luminance block motion vectors. During compensation, the value of each pixel of an 8 × 8 luminance block is a weighted sum of three predicted values. This mode can produce a significant coding gain; in particular, overlapped block motion compensation reduces blocking artifacts and improves subjective quality.
PB-frame mode specifies that a PB-frame consists of two pictures encoded as one unit. PB-frame mode can double the frame rate with only a small increase in bit rate.
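The edge replacement used by the unrestricted motion vector mode can be illustrated with a short sketch. It assumes the reference picture is a plain list of pixel rows and that out-of-picture coordinates are clamped to the nearest edge pixel; the function name `fetch_reference_block` is hypothetical and the sketch ignores half-pixel interpolation.

```python
def fetch_reference_block(ref, x, y, size):
    """Return the size x size block whose top-left corner is (x, y) in `ref`.

    Coordinates that fall outside the picture are clamped to the nearest
    edge pixel, so a motion vector may point beyond the picture boundary.
    """
    height, width = len(ref), len(ref[0])
    block = []
    for row in range(y, y + size):
        r = min(max(row, 0), height - 1)          # clamp the row index
        block.append([
            ref[r][min(max(col, 0), width - 1)]   # clamp the column index
            for col in range(x, x + size)
        ])
    return block


if __name__ == "__main__":
    picture = [[1, 2],
               [3, 4]]
    # The block starts one pixel above and to the left of the picture.
    print(fetch_reference_block(picture, -1, -1, 2))
```

A block that lies entirely inside the picture is returned unchanged, so the same routine serves both the ordinary and the out-of-picture case.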
H.263 Video Compression Standard Version 2
After releasing H.263, ITU-T revised the standard as version 2, informally named H.263+. It adds several options, on top of the core syntax and semantics of the original H.263 standard, to increase compression efficiency or improve functionality. The original H.263 standard restricts the input picture format, allowing only 5 video source formats. H.263+ allows a wider range of input formats and custom picture sizes, which broadens the scope of the standard and enables it to handle window-based computer images, higher frame rate sequences, and widescreen pictures. To improve compression efficiency, H.263+ adopts an advanced intra coding mode; the improved PB-frame mode addresses the shortcomings of the H.263 PB-frame mode and strengthens inter prediction; the deblocking filter not only improves compression efficiency but also improves the subjective quality of reconstructed pictures.
To adapt to network transmission, H.263+ adds temporal, SNR, and spatial scalability, which is valuable when transmitting video over noisy channels and networks with heavy packet loss. In addition, the slice structured mode and the reference picture selection mode enhance the error resilience of video transmission.
H.263++ video compression standard
H.263++ adds 3 options on top of H.263+, mainly to enhance the error resilience of the bitstream over harsh channels, and also to improve coding efficiency. The three options are:
Option U, called enhanced reference picture selection, provides improved coding efficiency and channel error resilience (especially under packet loss). It requires multiple buffers to store multiple reference pictures.
Option V, called data partitioned slice, provides enhanced error resilience (especially when data is locally corrupted during transmission) by separating the header and motion vector data from the DCT coefficient data in the video bitstream, and by protecting the motion vectors with reversible coding.
Option W adds supplemental information to the H.263+ bitstream to ensure enhanced backward compatibility. The additional information includes: indication of fixed-point IDCT, picture information, message type, arbitrary binary data, text, repeated picture header, interlaced field indication, and spare reference picture identification.
H.264 video coding standard
H.264 is a new-generation video compression coding standard developed by the Joint Video Team (JVT) formed by ISO/IEC and ITU-T. In fact, the origins of the H.264 standard go back 8 years. After the H.263 standard was completed in 1996, the ITU-T Video Coding Experts Group (VCEG) began work on two fronts: a short-term research plan to add options on top of H.263 (resulting in H.263+ and H.263++), and a long-term research plan to develop a new standard supporting low bit rate video communication. The long-term plan produced the H.26L draft standard, which showed clear advantages in compression efficiency over the earlier ITU-T video compression standards. In 2001, ISO's MPEG group recognized the potential of H.26L, and ISO and ITU then set up the Joint Video Team (JVT), with members from ISO/IEC MPEG and ITU-T VCEG, whose main task was to develop the H.26L draft into an international standard. The standard is therefore named AVC (Advanced Video Coding) in ISO/IEC, as Part 10 of the MPEG-4 standard; in ITU-T, it is officially named H.264. The main advantages of H.264 are as follows:
At the same reconstructed image quality, H.264 requires about 50% less bit rate than H.263 and MPEG-4 (SP).
Strong adaptability to channel delay: it can work in low-delay mode to serve real-time applications such as video conferencing, and it can also work where delay is not a constraint, such as video storage.
Improved network adaptability: it uses a "network friendly" structure and syntax, strengthens the handling of bit errors and packet loss, and improves the decoder's error recovery capability.
A scalable-complexity design is used in the encoder and decoder, allowing a trade-off between image quality and coding processing to suit applications of different complexity.
Compared with earlier video compression standards, H.264 introduces many advanced techniques, including the 4 × 4 integer transform, intra prediction in the spatial domain, motion estimation with 1/4 pixel accuracy, multiple reference frames, and inter prediction with multiple block sizes. These new techniques bring a higher compression ratio while significantly increasing the complexity of the algorithm.
4 × 4 integer transformation
Previous standards, such as H.263 or MPEG-4, all use an 8 × 8 DCT transform. The integer transform adopted in H.26L is in fact a close approximation of the 4 × 4 DCT. Using integers reduces the complexity of the algorithm and avoids the inverse transform mismatch problem, and the 4 × 4 block size reduces blocking artifacts. The 4 × 4 integer transform of H.264 further reduces complexity compared with the integer transform in H.26L: for 9-bit input residual data, it needs 16-bit rather than 32-bit arithmetic, and the entire transform requires no multiplication, only additions and some shift operations. The new transform has almost no effect on coding performance, and in practice it is slightly better.
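The claim that the transform needs only additions and shifts can be checked with a small sketch. The matrix `CF` below is the widely published core matrix of the H.264 forward 4 × 4 transform; it is not given in this article, so treat it as an assumption. The scaling that is normally folded into quantization is omitted, and the function names are illustrative.

```python
# Core matrix of the 4x4 forward integer transform (assumed, see lead-in).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def transform_1d(x0, x1, x2, x3):
    """Multiply a 4-vector by CF using only additions, subtractions and shifts."""
    s0, s1 = x0 + x3, x1 + x2
    d0, d1 = x0 - x3, x1 - x2
    return [s0 + s1, (d0 << 1) + d1, s0 - s1, d0 - (d1 << 1)]


def forward_transform_4x4(block):
    """Compute CF * block * CF^T with the butterfly above (no multiplications)."""
    rows = [transform_1d(*row) for row in block]        # horizontal pass
    cols = [transform_1d(*col) for col in zip(*rows)]   # vertical pass
    return [list(r) for r in zip(*cols)]                # transpose back


def forward_transform_reference(block):
    """Same result by plain matrix multiplication, used only as a check."""
    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
                for i in range(4)]
    cf_t = [list(c) for c in zip(*CF)]
    return matmul(matmul(CF, block), cf_t)


if __name__ == "__main__":
    residual = [[5, 11, 8, 10],
                [9, 8, 4, 12],
                [1, 10, 11, 4],
                [19, 6, 15, 7]]
    print(forward_transform_4x4(residual))
```

For a flat block in which every residual equals 1, only the top-left (DC) coefficient is non-zero, which is the behavior expected from a DCT-like transform.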
Spatial-domain intra prediction
Video coding achieves compression by removing the spatial and temporal correlation in the picture. Spatial correlation is removed by an effective transform, such as the DCT or the H.264 integer transform; temporal correlation is removed by inter prediction. The spatial correlation removed by the transform is limited to the transformed block, such as 8 × 8 or 4 × 4, and correlation between blocks is not handled. H.263 and MPEG-4 introduce intra prediction techniques that predict certain coefficients of the current block in the transform domain. H.264 predicts in the spatial domain, using the pixels neighboring the current block to predict each of its pixel values, which removes the correlation between adjacent blocks more effectively and greatly increases the efficiency of intra coding.
Intra prediction in the baseline part of H.264 includes 9 prediction modes for 4 × 4 luminance blocks, 4 prediction modes for 16 × 16 luminance blocks, and 4 prediction modes for chrominance blocks.
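As an illustration of spatial-domain prediction, the sketch below implements three of the nine 4 × 4 luminance modes (vertical, horizontal, and DC) and picks the one with the smallest sum of absolute differences (SAD). It assumes both the row above and the column to the left are available; the mode names, function names, and the SAD-based selection are illustrative choices, not the procedure mandated by the standard.

```python
def predict_4x4(mode, top, left):
    """Predict a 4x4 block from the 4 pixels above (`top`) and to the left (`left`)."""
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column to the right
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighbouring pixels
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def sad(block, pred):
    """Sum of absolute differences between a block and its prediction."""
    return sum(abs(b - p) for brow, prow in zip(block, pred)
               for b, p in zip(brow, prow))


def best_mode(block, top, left, modes=("vertical", "horizontal", "dc")):
    """Return the mode whose prediction leaves the smallest residual (by SAD)."""
    return min(modes, key=lambda m: sad(block, predict_4x4(m, top, left)))


if __name__ == "__main__":
    top, left = [10, 20, 30, 40], [1, 2, 3, 4]
    block = [[10, 20, 30, 40]] * 4   # every row equals the row above the block
    print(best_mode(block, top, left))
```

Only the residual between the block and its prediction is then transformed and coded, which is why a good prediction reduces the bit rate.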
Motion estimation
Motion estimation in H.264 has three new features: 1/4 pixel accuracy; seven different block sizes; and multiple forward and backward reference frames.
In H.264 inter coding, a macroblock (16 × 16) can be partitioned into 16 × 8, 8 × 16, or 8 × 8 blocks. An 8 × 8 block is called a sub-macroblock and can be further partitioned into 8 × 4, 4 × 8, or 4 × 4 blocks. In total there are 7 different block sizes available for motion estimation, so that the best-matching partition can be found. Unlike the P frames and B frames of previous standards, H.264 uses multiple forward and backward reference frames. Half-pixel motion estimation improves the compression ratio over integer-pixel motion estimation, and motion estimation with 1/4 pixel accuracy gives still better compression.
Using multiple block sizes for motion estimation saves more than 15% of the bit rate (relative to using only 16 × 16 blocks). Using 1/4 pixel accuracy saves 20% of the bit rate (relative to integer-pixel prediction). For multi-reference-frame prediction, using 5 reference frames reduces the bit rate by 5% to 10% relative to using one reference frame. These percentages are statistical figures; actual results vary between videos according to their detail and motion.
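The block matching that underlies motion estimation can be sketched as a full search that minimizes the sum of absolute differences (SAD). This is a simplified illustration at integer-pixel accuracy with a single reference frame; the full-search strategy, the function names, and the `search_range` parameter are assumptions for the example, and real encoders typically use faster searches plus sub-pixel refinement.

```python
# The seven luminance block sizes (width, height) listed in the text.
PARTITIONS = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def sad(cur, ref, cx, cy, rx, ry, w, h):
    """SAD between the w x h block at (cx, cy) in `cur` and at (rx, ry) in `ref`."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(h) for i in range(w))


def full_search(cur, ref, cx, cy, w, h, search_range):
    """Return (best_sad, dx, dy) over all integer displacements in the range.

    Candidates that would read outside the reference picture are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + w > width or ry + h > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, w, h)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


if __name__ == "__main__":
    ref = [[y * 8 + x for x in range(8)] for y in range(8)]
    # Current picture: the reference content moved 2 pixels left and 1 pixel up.
    cur = [[ref[min(y + 1, 7)][min(x + 2, 7)] for x in range(8)] for y in range(8)]
    print(full_search(cur, ref, 2, 2, 4, 4, 2))
```

An encoder supporting the seven partitions would repeat such a search for each block size in `PARTITIONS` and keep the combination with the lowest overall cost.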
Entropy encoding
The H.264 standard adopts two entropy coding methods: one combines context-based adaptive variable length coding (CAVLC) with universal variable length coding (UVLC); the other is context-based adaptive binary arithmetic coding (CABAC). Both CAVLC and CABAC adapt their coding to the context of neighboring blocks to achieve better coding efficiency. CABAC compresses better than CAVLC, but it is more complex.
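The universal variable length code used in H.264 is an Exp-Golomb code: short codewords for small values and a regular structure that needs no lookup table. The sketch below encodes and decodes unsigned values as bit strings; using text strings of "0" and "1" instead of a real bitstream writer is a simplification, and the function names are illustrative.

```python
def exp_golomb_encode(code_num):
    """Encode a non-negative integer as an Exp-Golomb bit string.

    The codeword is N leading zeros followed by the (N + 1)-bit binary
    representation of code_num + 1.
    """
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    value = code_num + 1
    prefix_len = value.bit_length() - 1
    return "0" * prefix_len + format(value, "b")


def exp_golomb_decode(bits):
    """Decode the first Exp-Golomb codeword found at the start of `bits`."""
    zeros = 0
    while bits[zeros] == "0":      # count the leading zeros
        zeros += 1
    value = int(bits[zeros:2 * zeros + 1], 2)
    return value - 1


if __name__ == "__main__":
    for n in range(5):
        print(n, exp_golomb_encode(n))
```

Because the number of leading zeros tells the decoder how many bits follow, codewords can be concatenated without separators.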
Deblocking filter
The H.264 standard introduces a deblocking filter that filters the block boundaries. The filter strength depends on the coding mode, the motion vectors, and the coefficients of the block. The deblocking filter improves the subjective quality of the picture while also increasing compression efficiency.
Other video coding standards
Besides the ITU-T video compression standards above, some other standards are also popular, such as MPEG-4, AVS, and WM9. H.264 is also known as MPEG-4 AVC, but the MPEG-4 usually referred to at present means SP (Simple Profile) or ASP (Advanced Simple Profile), mainly intended for low bit rate applications such as streaming media on the Internet, wireless network video transmission, and video storage. Its core is similar to H.263.
MPEG-4 SP and H.263 have many similarities, as shown in the attached table. However, there are also significant differences between the two standards, mainly in the bitstream structure and header information, some of the entropy coding tables, and some details of the coding techniques. MPEG-4 ASP adds some techniques to SP, mainly motion estimation with 1/4 pixel accuracy, B frames, and global motion vectors (GMV), so its compression efficiency is improved.
AVS is an audio/video coding standard independently developed in China, mainly for high-definition television, high-density optical storage media, and similar applications. The AVS standard is based on the current state-of-the-art MPEG-4 AVC / H.264 framework and emphasizes independent intellectual property while fully considering implementation complexity. Relative to H.264, the main features of AVS are: (1) an 8 × 8 integer transform and 64-level quantization; (2) luminance and chrominance intra prediction both operate on 8 × 8 blocks, with 5 prediction modes for luminance blocks and 4 prediction modes for chrominance blocks; (3) motion compensation uses 4 block modes: 16 × 16, 16 × 8, 8 × 16, and 8 × 8; (4) for 1/4 pixel motion estimation, different four-tap filters are used for half-pixel interpolation and 1/4 pixel interpolation; (5) P frames can use up to 2 forward reference frames, and B frames use one forward and one backward reference frame.
Windows Media 9 (WM9) is a new generation of digital media technology developed by Microsoft. Some tests show that the video compression efficiency of WM9 is much higher than that of MPEG-2, MPEG-4 SP, and H.263, and comparable to that of H.264.
Conclusion
At present, H.261 and H.263 are widely used in video communications, and many mature products exist. Compared with H.261, H.263 adds a number of options that provide more flexible coding modes, greatly improve compression efficiency, and make it better suited to network transmission. The launch of the H.264 standard is an important advance in video coding standards. It has significant advantages over the existing MPEG-2, MPEG-4 SP, and H.263, especially in coding efficiency, which makes it suitable for many new areas. Although the complexity of H.264 is more than 4 times that of the existing compression standards, the rapid development of integrated circuit technology will make the application of H.264 a reality. (Mr. Guo Xiaqiang, author of this article, Beijing University of Posts and Telecommunications; Mr. Miginal, Professor, Multimedia Communication Center, Telecom Academy. From "World Radio and Television".)