Research on the Core Ideas and Key Technologies of MPEG-4 Video Coding

zhaozj  2021-02-16

1 Introduction

In today's era of rapidly developing information technology and the Internet, multimedia information has become the most important carrier through which people acquire information, and a focus of research and development in electronic information technology. Once digitized, multimedia information is easy to encrypt, resistant to interference, and can be regenerated when relayed. It also brings massive amounts of data, however, which places heavy demands on storage devices and communication networks and has become a major bottleneck that keeps people from acquiring and using information effectively.

It is therefore of great significance to study efficient multimedia data compression coding methods, so that digitized multimedia information can be stored and transmitted in compressed form. As the core and key of multimedia technology, multimedia data compression coding has made considerable progress in both technology and application in recent years, and that progress has deeply influenced modern society.

2 Video Coding Research and MPEG Standard Evolution

About 70% of the information that humans acquire comes from vision, so video plays an important role in multimedia information. Video data also has the greatest redundancy, and the quality of video after compression is a key factor in determining the quality of multimedia services. Digital video technology is therefore the core technology of multimedia applications, and research on video coding has become a hot topic in information technology.

Research topics in video coding mainly include the data compression ratio, compression/decompression speed, and fast implementation algorithms. Depending on whether the data recovered after compression and decompression is exactly identical to the data before compression, data compression falls into two categories: lossless compression (reversible) and lossy compression (irreversible).
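The distinction between the two categories can be illustrated with a minimal sketch. It is not taken from any MPEG standard: the lossless path uses Python's standard `zlib` module, and the lossy path is a uniform quantizer with an assumed step size of 16, chosen only for illustration.

```python
import zlib

def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib; the output equals the input exactly."""
    return zlib.decompress(zlib.compress(data))

def lossy_roundtrip(samples, step=16):
    """Uniform quantization (illustrative): map each sample to the midpoint
    of its quantization bin. The original value cannot be recovered."""
    return [(s // step) * step + step // 2 for s in samples]

samples = [12, 13, 13, 14, 200, 201, 203, 90]   # hypothetical 8-bit pixel values
restored = lossless_roundtrip(bytes(samples))
approx = lossy_roundtrip(samples)
max_error = max(abs(a - b) for a, b in zip(samples, approx))
```

Here `restored` is byte-for-byte identical to the input, while `approx` differs from `samples` by at most half a quantization step per value.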

Traditional compression coding is built on Shannon's information theory, with classical set theory as its tool. It describes the source with a probabilistic statistical model, and its compression idea is based on data statistics, so it can only remove data redundancy. It is low-level compression coding.

With the rapid development of disciplines related to video coding and of emerging disciplines, a new generation of data compression techniques has emerged and is maturing. Their coding approach has shifted from pixels and pixel blocks to content. Breaking through the constraints of Shannon's information theory, they take full account of the characteristics of human vision and of the source, and achieve compression by removing content redundancy. They can be divided into object-based and semantics-based coding: the former is mid-level compression coding, and the latter is high-level compression coding.

At the same time, video coding standards have been steadily developed and refined. These standards are mainly developed by ITU-T and ISO/IEC. The video standards released by ITU-T include H.261, H.263, H.263+, and H.263++. The MPEG series standards published by ISO/IEC include MPEG-1, MPEG-2, MPEG-4, and MPEG-7, with MPEG-21 planned.

MPEG stands for Moving Picture Experts Group, an international organization dedicated to developing compression coding standards for multimedia video and audio. The MPEG series has become the most influential set of multimedia technical standards in the world. MPEG-1 and MPEG-2 rest on first-generation compression coding technologies grounded in Shannon's information theory, such as predictive coding, transform coding, entropy coding, and motion compensation. MPEG-4 (ISO/IEC 14496) is an international standard built on second-generation compression coding technology. It uses content-based compression coding centered on audio-visual media objects to integrate digital audio and video, synthetic graphics applications, and interactive multimedia. The MPEG standards have had a huge and profound impact on audio-visual consumer electronics such as VCD and DVD, on digital and high-definition television (DTV and HDTV), and on multimedia communications.

3 MPEG-4 Video Coding Core Ideas and Key Technologies

3.1 Core Ideas

Before MPEG-4, MPEG-1, MPEG-2, H.261, and H.263 all used first-generation compression coding techniques, which design the encoder around the statistical characteristics of the image signal and belong to waveform coding. A first-generation coding scheme splits the video sequence into a series of frames and divides each frame into macroblocks for motion compensation and coding. This scheme has the following defects:

· The image is divided into fixed blocks of the same size, which produces severe blocking artifacts (the mosaic effect) at high compression ratios;

· Content-based operations on the image, such as access, editing, and playback, are not possible;

· The characteristics of the human visual system (HVS) are not fully exploited.

MPEG-4 represents second-generation, model/object-based compression coding technology. It makes full use of the characteristics of human vision, captures the essence of how image information is conveyed, and approaches coding from the standpoint of contour and texture. It supports interaction based on visual content, which matches the trend of multimedia applications moving from playback toward content-based access, retrieval, and manipulation.

The AV object (AVO, Audio-Visual Object) is an important concept that MPEG-4 introduces to support content-based coding. An object is an entity in a scene that can be accessed and manipulated. Objects may be delineated by their distinctive texture, motion, shape, model, or high-level semantics. In MPEG-4, the video and audio that are seen and heard are no longer the image frames of MPEG-1 and MPEG-2 but an audio-visual scene (AV scene) composed of different AV objects. An AV object is a representation unit of aural, visual, or audio-visual content. Its basic unit is the primitive AV object, which can be a natural or synthetic sound or image. Primitive AV objects can be coded efficiently, stored and transmitted efficiently, and manipulated interactively, and they can be combined further into compound AV objects. The basic content of the MPEG-4 standard is therefore the efficient coding, organization, storage, and transmission of AV objects. The introduction of the AV object gives multimedia communication strong interactivity and efficient coding, and AV object coding is the core coding technology of MPEG-4.

MPEG-4 not only provides a high compression ratio but also offers better interactivity with multimedia content and universal access. It uses an open coding system to which new coding algorithm modules can be added at any time, and the decoder can be configured on the spot according to application needs, so that a wide variety of multimedia applications can be supported.

MPEG-4 adopts a new generation of video coding technology. For the first time in the history of video coding, it extends the unit of coding from the image frame to arbitrarily shaped video objects that carry real meaning. This moves video coding from traditional pixel-based coding to modern object-based and content-based coding, and leads the trend toward a new generation of intelligent image coding.

3.2 Key Technologies

In addition to the core technologies of first-generation video coding, such as transform coding, motion estimation and motion compensation, quantization, and entropy coding, MPEG-4 introduces a number of new key technologies and substantially improves on the first-generation techniques. Some of these key technologies are outlined below.

1. Video object extraction technology

The primary task in implementing content-based interaction in MPEG-4 is to segment the video or image into distinct objects, or to separate moving objects from the background, and then to code each object with a suitable method so as to compress efficiently. Video object extraction, that is, video object segmentation, is therefore a key technology of MPEG-4 video coding, and both a focus and a difficulty of research into the new generation of video coding.

Video object segmentation involves analyzing and understanding video content, and is closely tied to disciplines such as artificial intelligence, image understanding, pattern recognition, and neural networks. Artificial intelligence is still immature, and computers do not yet have the ability to observe, recognize, and understand. Research in computer vision also indicates that correct image segmentation requires understanding at a higher level. So although the MPEG-4 framework has been established, there is still no generally effective method for video object segmentation. It is regarded as a challenging problem, and semantics-based segmentation is harder still. At present, video object segmentation generally follows these steps. First, the raw video or image data is simplified to make segmentation easier, which can be done with low-pass filtering, median filtering, or morphological filtering. Next, features are extracted from the data, such as color, texture, motion, frame difference, displaced frame difference, or semantic features. Then a segmentation decision is made under some homogeneity criterion, classifying the video data according to the extracted features. Finally, post-processing removes noise and extracts the boundaries accurately.

The watershed algorithm, which is based on mathematical morphology, is widely used in video segmentation. It is also called the waterline algorithm. Its basic process is the successive erosion of a binary image, and it proceeds through image simplification, marker extraction, decision, and post-processing. The watershed algorithm is computationally simple, performs well, and extracts the contours of moving objects well, so that accurate object edges are obtained. However, it needs gradient information for segmentation, is sensitive to noise, and does not use inter-frame information, so it usually over-segments the image.
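The four general steps described above (simplification, feature extraction, decision, post-processing) can be sketched in a few lines. This is a toy illustration under stated assumptions, not a method from the MPEG-4 standard: frames are small grayscale images stored as lists of lists, the feature is the plain frame difference, the decision is a fixed threshold, and the function names and parameter values are hypothetical.

```python
from statistics import median

def median3x3(img):
    """Step 1, simplification: 3x3 median filter (borders use available neighbours)."""
    h, w = len(img), len(img[0])
    return [[median(img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)] for y in range(h)]

def frame_difference(prev, curr):
    """Step 2, feature extraction: absolute difference between two frames."""
    return [[abs(c - p) for p, c in zip(rp, rc)] for rp, rc in zip(prev, curr)]

def threshold(feature, t):
    """Step 3, decision: mark a pixel as moving when its feature exceeds t."""
    return [[1 if v > t else 0 for v in row] for row in feature]

def remove_isolated(mask, min_neighbours=2):
    """Step 4, post-processing: drop mask pixels with too few set neighbours."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                count = sum(mask[j][i]
                            for j in range(max(0, y - 1), min(h, y + 2))
                            for i in range(max(0, x - 1), min(w, x + 2))) - 1
                out[y][x] = 1 if count >= min_neighbours else 0
    return out

def segment_moving_object(prev, curr, t=30):
    """Run the four steps in order and return a binary object mask."""
    feature = frame_difference(median3x3(prev), median3x3(curr))
    return remove_isolated(threshold(feature, t))
```

Calling `segment_moving_object(prev, curr)` on two consecutive frames returns a mask in which 1 marks pixels assigned to the moving object. A practical segmenter would use richer features and a more careful decision rule than this fixed threshold.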

2. VOP video encoding technology

A video object plane (VOP, Video Object Plane) is a sample of a video object (VO) at a given instant, and the VOP is the core concept of MPEG-4 video coding. During encoding, MPEG-4 applies different coding strategies to different VOs: the foreground VO is coded so as to preserve as much detail and smoothness as possible, while the background VO is coded at a high compression ratio, or is not transmitted at all and is instead stitched together at the decoder from other background. This object-based video coding overcomes the blocking artifacts that high compression ratios produce in first-generation video coding, and it also lets the user interact with the scene. It thus raises the compression ratio while enabling content-based interaction, which opens up broad room for the development of video coding.

MPEG-4 supports the encoding and decoding of images and video of arbitrary shape. For real-time applications at very low bit rates, such as videophone and videoconferencing, MPEG-4 uses the VLBV (Very Low Bit-rate Video) core.

Traditional rectangular frames are treated in MPEG-4 as a special case of the VO, which reflects the unity of conventional coding and content-based coding in MPEG-4. The VO concept better matches the way visual information is processed and moves the video signal from digitization toward intelligence. This increases the interactivity and flexibility of video signals and makes wider video applications and richer content interaction possible. VOP video encoding technology is therefore regarded as a first exploration of the move of video signal processing from digitization toward intelligence.

3. Scalable video coding technology

With the huge growth of Internet services, demand keeps rising for video transmission over IP (Internet Protocol) networks and over heterogeneous networks with different transmission characteristics. Against this background, scalable video coding is increasingly important. It has broad applications and high value for both theoretical research and practice, and so has attracted wide attention.

Scalability means that video data is compressed only once but can be decoded at multiple frame rates, spatial resolutions, or quality levels, so that the different application requirements of many types of users can be supported.

MPEG-4 implements scalable coding through the video object layer (VOL, Video Object Layer) data structure. MPEG-4 provides two basic scalability tools, temporal scalability and spatial scalability, and also supports hybrid temporal and spatial scalability. Each kind of scalable coding has at least two VOLs: the lower layer is called the base layer, and the higher layer is called the enhancement layer. The base layer provides the basic information of the video sequence, and the enhancement layer provides higher resolution and more detail.

In the streaming video application framework that was added later, MPEG-4 proposes the FGS (Fine Granularity Scalable) video coding algorithm and the PFGS (Progressive Fine Granularity Scalable) video coding algorithm.

FGS coding is simple to implement and provides flexible adaptation and scalability in bit rate, display resolution, content, decoding complexity, and so on, with strong bandwidth adaptation and error resilience. However, it still has two shortcomings: its coding efficiency is lower than that of non-scalable coding, and the video quality at the receiver is not optimal.

PFGS is a video coding algorithm proposed to improve on the coding efficiency of FGS. Its basic idea is that, when coding an enhancement-layer image, it uses as a reference an enhancement-layer image reconstructed from the previous frame, which makes motion compensation more effective and thus improves coding efficiency.
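The base-layer and enhancement-layer split can be sketched for the spatial case as follows. This is only an illustration of the layering idea, with several assumptions: the image has even dimensions, the base layer is a 2x2 average, the enhancement layer is an uncompressed residual, and the function names are hypothetical. A real MPEG-4 encoder would additionally transform, quantize, and entropy-code both layers.

```python
def downsample2(img):
    """Base layer: average each 2x2 block (assumes even width and height)."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, w, 2)] for y in range(0, h, 2)]

def upsample2(img):
    """Prediction for the enhancement layer: repeat every pixel 2x2."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def encode_layers(img):
    """Return (base, enhancement); enhancement is the residual that the
    upsampled base layer fails to predict."""
    base = downsample2(img)
    pred = upsample2(base)
    enh = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(img, pred)]
    return base, enh

def decode(base, enh=None):
    """Decode the base layer alone (low resolution) or base plus enhancement."""
    if enh is None:
        return base
    pred = upsample2(base)
    return [[p + e for p, e in zip(rp, re)] for rp, re in zip(pred, enh)]
```

A receiver with limited bandwidth can call `decode(base)` and display the low-resolution picture, while a receiver that also gets the enhancement layer calls `decode(base, enh)` and recovers the full-resolution picture.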

4. Motion estimation and motion compensation technology

MPEG-4 uses three VOP types, I-VOP, P-VOP, and B-VOP, to denote different kinds of motion compensation. It adopts the half-pixel search and overlapped motion compensation techniques of H.263, and also introduces repetitive padding and modified block (polygon) matching to support VOP regions of arbitrary shape.

In addition, to improve the accuracy of motion estimation, MPEG-4 uses MVFAST (Motion Vector Field Adaptive Search Technique) and the improved PMVFAST (Predictive MVFAST) method for motion estimation. For global motion estimation, the FFRGMET (Feature-based Fast and Robust Global Motion Estimation Technique) method is employed.

In MPEG-4 video coding, motion estimation is time-consuming and has a large impact on real-time encoding, so fast algorithms deserve particular emphasis. Motion estimation methods fall into two main categories: pixel-recursive methods and block matching methods. The former are highly complex and rarely used in practice, while the latter are widely used in H.263 and MPEG. Research on block matching focuses on the matching criterion and the search method. Three matching criteria are in common use:

(1) Sum of absolute differences (SAD) criterion;

(2) Mean square error (MSE) criterion;

(3) Normalized cross-correlation function (NCCF) criterion.
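The three criteria listed above can be written out as a short sketch. The definitions used here are the common textbook forms and are an assumption on my part, since the article does not give formulas; blocks are equal-sized lists of lists of pixel values. For SAD and MSE a smaller value means a better match, while for NCCF a value closer to 1 means a better match.

```python
import math

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

def mse(a, b):
    """Mean square error between two equal-sized blocks."""
    count = len(a) * len(a[0])
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / count

def nccf(a, b):
    """Normalized cross-correlation: sum(a*b) / sqrt(sum(a^2) * sum(b^2)).
    Returns 0.0 when either block is all zeros."""
    num = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    den = math.sqrt(sum(p * p for r in a for p in r) * sum(q * q for r in b for q in r))
    return num / den if den else 0.0

block_a = [[1, 2], [3, 4]]
block_b = [[2, 2], [1, 4]]
```

With these two sample blocks, `sad(block_a, block_b)` is 3 and `mse(block_a, block_b)` is 1.25. SAD needs only subtractions, absolute values, and additions.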

Of these three criteria, SAD is the most widely used because it is simple and convenient to implement; the choice of matching criterion, however, should not greatly affect the matching result. Once the matching criterion is chosen, the best matching point has to be searched for. The simplest and most reliable method is the full search (FS, Full Search), but its computational cost is too high for real-time implementation. Fast search methods were developed for this reason, mainly the cross search, the two-dimensional logarithmic search, and the diamond search. The diamond search was adopted by the MPEG-4 Verification Model (VM) and is described in detail below.
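The full search can be sketched as follows, to make its cost concrete. This is a minimal illustration, assuming SAD as the matching criterion, grayscale frames stored as lists of lists, and hypothetical function and parameter names. With a search range of ±r it evaluates up to (2r + 1)² candidate positions per block.

```python
def block_sad(ref, cur, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def full_search(ref, cur, bx, by, n=4, search_range=4):
    """Test every displacement in the search window and return
    (best_motion_vector, best_sad)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block leaves the frame.
            if not (0 <= bx + dx and bx + dx + n <= w
                    and 0 <= by + dy and by + dy + n <= h):
                continue
            cost = block_sad(ref, cur, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

With the default `search_range=4`, a single 4x4 block already requires up to 81 SAD evaluations, which is why faster search patterns are attractive for real-time encoding.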

The diamond search (DS, Diamond Search), named after the shape of its search template, is simple, robust, and efficient, and is one of the best-performing fast search algorithms. Its starting point is that the shape and size of the search template strongly affect the speed and accuracy of motion estimation. When searching for the best matching point, a template that is too small may become trapped in a local optimum, while a template that is too large may miss the best point. The DS algorithm therefore exploits the basic statistics of motion vectors in video images and uses two search templates:

· Large diamond search pattern (LDSP), containing 9 candidate positions;

· Small diamond search pattern (SDSP), containing 5 candidate positions.

The DS search proceeds as follows. In the initial phase, the large diamond search template is applied repeatedly until the best matching block falls at the center of the large diamond. Because the LDSP has a large step and a wide search range, it achieves coarse positioning and keeps the search from becoming trapped in a local minimum. Once coarse positioning is complete, the optimum can be assumed to lie within the diamond-shaped region enclosed by the 8 points around the LDSP center. The small diamond search template is then used to locate the best matching block precisely, so that no large fluctuation arises, which improves the accuracy of motion estimation.
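The two-stage procedure described above can be sketched as follows. This is an illustrative implementation rather than the Verification Model code: it assumes SAD as the matching criterion, frames stored as lists of lists, a block that lies inside the frame, and hypothetical names and default parameters. Ties are resolved in favor of the current center, which guarantees that the large-diamond stage terminates.

```python
# Offsets (dx, dy) of the two templates; the centre comes first in each.
LDSP = [(0, 0), (0, -2), (1, -1), (2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SDSP = [(0, 0), (0, -1), (1, 0), (0, 1), (-1, 0)]

def block_sad(ref, cur, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def diamond_search(ref, cur, bx, by, n=4, search_range=4):
    """Return the motion vector (dx, dy) found by the diamond search."""
    h, w = len(ref), len(ref[0])

    def valid(dx, dy):
        return (abs(dx) <= search_range and abs(dy) <= search_range
                and 0 <= bx + dx and bx + dx + n <= w
                and 0 <= by + dy and by + dy + n <= h)

    def best_of(center, pattern):
        cx, cy = center
        candidates = [(cx + ox, cy + oy) for ox, oy in pattern
                      if valid(cx + ox, cy + oy)]
        # min() keeps the first minimum, so a tie favours the centre.
        return min(candidates,
                   key=lambda c: block_sad(ref, cur, bx, by, c[0], c[1], n))

    center = (0, 0)
    while True:                      # coarse stage: repeat the large diamond
        best = best_of(center, LDSP)
        if best == center:
            break
        center = best
    return best_of(center, SDSP)     # fine stage: one pass of the small diamond
```

Each LDSP step evaluates at most 9 candidates and the final SDSP step at most 5, compared with up to 81 candidates for a full search over the same ±4 window. The price is that the result is not guaranteed to be the global minimum.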

In addition, Sprite video coding is widely used in MPEG-4 as one of its core technologies. A Sprite, also called a mosaic or background panorama, is the image formed by stitching together everything of a video object that appears over the video sequence. The video object can be reconstructed directly from the Sprite or coded predictively with compensation from it.

Sprite video coding can be seen as a more advanced form of motion estimation and compensation that overcomes the shortcomings of traditional techniques based on fixed blocks. MPEG-4 adopts a strategy that combines traditional block-based coding with Sprite coding.

4 Conclusion

The trend in multimedia data compression coding is toward content-based compression, which is in effect an advanced stage of information processing and is closer to the way people themselves process information. Human information processing is not based on signals but on relatively abstract concepts that can be remembered and processed directly.

MPEG-4 is a typical representative of the new generation of multimedia data compression coding. It was the first to propose content-based and object-based compression coding. It requires more analysis, and even understanding, of natural or synthetic audio-visual objects, which is an advanced stage of information processing and represents the direction of modern data compression coding technology.

MPEG-4 accomplishes the transition from rectangular frames to VOPs and from traditional pixel-based coding to modern object-based and content-based coding, reflecting the organic unity of traditional and new-generation video coding. Content-based interactivity is the core idea of MPEG-4, and it matters greatly for the direction and wide application of video coding technology.
