2.1 Introduction
Video coding is the process of encoding and decoding a digital video signal. This chapter discusses the structure and characteristics of digital images and video signals and introduces basic concepts such as sampling formats and quality metrics. Digital video is a spatially and temporally sampled representation of a natural visual scene. A scene is sampled at a point in time to produce a frame (a representation of the complete visual scene at that instant) or a field (consisting of the odd- or even-numbered lines of spatial samples). Sampling is repeated at regular intervals (usually 1/25 or 1/30 of a second) to produce a moving video signal. Three sets of samples (components) are typically required to represent a scene in color. Popular formats for representing video in digital form include the ITU-R 601 standard and the set of 'intermediate formats'. The accuracy of a reconstructed visual scene must be measured in order to determine the performance of a video communication system, and this is a notoriously difficult and inexact process. Subjective measurement is time-consuming, and the results vary from observer to observer and with how the observer reacts to the viewed material. Objective measurement is simpler, but it does not yet fully match the actual visual perception of a human observer.
2.2 Natural Video Scenes
A classic "real world" or "natural world" video scenario is made of a plurality of objects with respective characteristic shapes, depth, texture, and brightness. The color and brightness of the video scenario are set in different scenarios according to the smoothness of different programs. A classic natural video scenario associated with video processing and compression includes spatial feature (texture conversion, number of objects, color, etc.) and time characteristics (image of object motion, change in brightness, movement of viewpoints, etc.)
2.3 Capture
A natural video scene is continuous in space and time. Representing a video scene in digital form involves sampling the real scene spatially (usually on a rectangular grid in the video image plane) and temporally (as a series of still frames or fields sampled at regular time intervals). Digital video is the representation of a sampled video scene in digital form: each spatio-temporal sample (picture element or pixel) is represented as a number or set of numbers that describes the brightness (luminance) and color of the sample point.
To obtain a two-dimensional sampled image, a camera focuses a two-dimensional projection of the video scene onto a sensor, such as an array of charge coupled devices (a CCD array). In the case of color image capture, each color component is filtered separately and projected onto its own CCD array.
2.3.1 Spatial Sampling
The output of a CCD array is an analog video signal, a varying electrical signal that represents a video image. Sampling this signal at a point in time produces a sampled image or frame. The most common sampling format is a grid of sample points arranged in a square or rectangular pattern: the signal is sampled at each grid point, and during reconstruction each sample value is displayed as a picture element (pixel). The visual quality of the reconstructed image depends on the number of sample points. Choosing a coarse sampling grid produces a low-resolution sampled image, while increasing the number of sample points increases the resolution of the sampled image.
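The effect of the grid density can be illustrated with a minimal NumPy sketch; the 'scene' below is a hypothetical smooth luminance function standing in for the continuous image, and the names are illustrative only:

```python
import numpy as np

# A finely sampled 2-D luminance function standing in for the continuous scene.
fine = np.fromfunction(
    lambda y, x: 128 + 127 * np.sin(x / 20.0) * np.cos(y / 30.0), (960, 1280)
)

def sample_grid(image, step):
    # Point-sample the image at every step-th position of a square grid.
    return image[::step, ::step]

coarse = sample_grid(fine, 8)  # coarse grid: 120 x 160 samples, low resolution
dense = sample_grid(fine, 2)   # denser grid: 480 x 640 samples, higher resolution
print(coarse.shape, dense.shape)
```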
2.3.2 Temporal Sampling
A moving video image is captured by taking snapshots of the signal at periodic time intervals; playing back the series of frames produces the appearance of motion. A higher temporal sampling rate (frame rate) gives apparently smoother motion but requires more samples to be captured and stored. Frame rates below 10 frames per second are sometimes used for very low bit-rate video communications (because the amount of data transmitted is small), but motion appears jerky and unnatural at these rates. Rates of 10 to 20 frames per second are more typical for low bit-rate video communications. Sampling at 25 or 30 complete frames per second is standard for television pictures (with interlacing to improve the appearance of motion, as described below), and 50 or 60 frames per second produces very smooth motion, at the cost of a very high data rate and correspondingly heavy transmission and storage requirements.
2.3.3 Frames and Fields
A video signal may be sampled as a series of complete frames (progressive sampling) or as a sequence of interlaced fields (interlaced sampling). In an interlaced video sequence, half of the data of a frame, one field, is sampled at each temporal sampling interval. A field consists of either the odd-numbered or the even-numbered lines of a complete video frame, and an interlaced video sequence contains a series of such fields. The advantage of this method is that twice as many fields per second can be sent as the number of complete frames in an equivalent progressive sequence with the same data rate, giving the appearance of smoother motion. For example, a PAL video sequence consists of 50 fields per second and, during playback, motion appears smoother than in an equivalent progressive sequence containing 25 frames per second.
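As a minimal sketch of the frame/field relationship, assuming a progressive frame stored as a 2-D NumPy array of luma samples, the two fields are simply the even- and odd-numbered lines:

```python
import numpy as np

def split_fields(frame):
    # Top field: lines 0, 2, 4, ...; bottom field: lines 1, 3, 5, ...
    return frame[0::2, :], frame[1::2, :]

frame = np.arange(576 * 720, dtype=np.uint32).reshape(576, 720)
top, bottom = split_fields(frame)
# Each field holds half the lines; two fields together contain exactly the
# same number of samples as one complete frame.
print(top.shape, bottom.shape)  # (288, 720) (288, 720)
```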
2.4 Color Spaces
Most digital video applications rely on the display of color video and so need a mechanism to capture and represent color information. A monochrome image requires just one number to indicate the brightness or luminance of each spatial sample point. Color images, on the other hand, require at least three numbers per pixel position to represent color accurately. The method chosen to represent brightness (luminance or luma) and color is described as a color space.

2.4.1 RGB
In the RGB color space, a color image sample is represented by three numbers indicating the relative proportions of red, green and blue, the three additive primary colors of light. Any color can be created by combining red, green and blue in varying proportions. The RGB color space is well suited to the capture and display of color images. Capturing an RGB image involves filtering out the red, green and blue components of the scene and capturing each with a separate sensor array. CRTs and LCDs display an RGB image by separately illuminating the red, green and blue components of each pixel; from a normal viewing distance the separate components merge to give the appearance of 'true' color.
2.4.2 YCbCr
The human visual system (HVS) is less sensitive to color than to luminance. In the RGB color space the three colors are equally important, so all three are stored at the same resolution. However, a color image can be represented more efficiently by separating the luminance from the color information and representing the luminance with a higher resolution than the color.
The YCbCr color space and its variations (often written as YUV) are a popular and efficient way of representing color images. Y is the luminance component, which can be calculated as a weighted average of R, G and B:

Y = kr R + kg G + kb B

where the factors k are weighting factors.
The color information can be represented as color difference (chrominance) components, where each chrominance component is the difference between R, G or B and the luminance Y:

Cb = B - Y
Cr = R - Y
Cg = G - Y

A complete description of a color image is then given by Y together with the three color differences Cb, Cr and Cg.
So far, this representation has no obvious advantage, since we now have four components instead of the three in RGB. However, the sum Cb + Cr + Cg is effectively a constant, so only two of the three chroma components need to be stored or transmitted and the third can always be calculated from the other two. In the YCbCr color space, only Y and the Cb and Cr values are transmitted and stored, and the resolution of Cb and Cr can be lower than that of Y because the human visual system is less sensitive to color than to luminance. This reduces the amount of data required to represent the image: under normal viewing conditions, an image represented in RGB and the same image represented in YCbCr with reduced chroma resolution look indistinguishable. Sampling the chroma at a lower resolution than the luma in this way is a simple but effective form of compression.
An RGB image can be converted to the YCbCr format after capture in order to reduce the storage and transmission burden, and converted back to RGB before the image is displayed. Note that it is not necessary to specify a separate weighting factor kg (because kb + kr + kg = 1), and that G can be recovered from the YCbCr representation, which means there is no need to store or transmit a Cg parameter:
Y = kr R + (1 - kb - kr) G + kb B
Cb = (0.5 / (1 - kb)) (B - Y)
Cr = (0.5 / (1 - kr)) (R - Y)
R = Y + ((1 - kr) / 0.5) Cr
G = Y - (2 kb (1 - kb) / (1 - kb - kr)) Cb - (2 kr (1 - kr) / (1 - kb - kr)) Cr
B = Y + ((1 - kb) / 0.5) Cb

ITU-R Recommendation BT.601 defines kb = 0.114 and kr = 0.299; substituting these values gives the following transform equations:
Y = 0.299 R + 0.587 G + 0.114 B
Cb = 0.564 (B - Y)
Cr = 0.713 (R - Y)
R = Y + 1.402 Cr
G = Y - 0.344 Cb - 0.714 Cr
B = Y + 1.772 Cb
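The BT.601 transform pair above translates directly into code. The following minimal Python sketch uses the rounded coefficients given in the text, so the round trip is accurate only to within rounding error (function names are illustrative):

```python
def rgb_to_ycbcr(r, g, b):
    # Forward transform with the BT.601 coefficients (kb = 0.114, kr = 0.299).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 0.564 * (b - y)
    cr = 0.713 * (r - y)
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    # Inverse transform; recovers G without a transmitted Cg component.
    r = y + 1.402 * cr
    g = y - 0.344 * cb - 0.714 * cr
    b = y + 1.772 * cb
    return r, g, b

y, cb, cr = rgb_to_ycbcr(200.0, 120.0, 50.0)
print([round(v) for v in ycbcr_to_rgb(y, cb, cr)])  # [200, 120, 50]
```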
2.4.3 YCbCr Sampling Formats
4:4:4 sampling means that the three components Y, Cb and Cr have the same resolution: a sample of each component exists at every pixel position. The numbers indicate the relative sampling rate of each component in the horizontal direction, i.e. for every four luma samples there are four Cb and four Cr samples. 4:4:4 sampling preserves the full fidelity of the chroma components. In 4:2:2 sampling (sometimes referred to as YUY2), the chroma components have the same vertical resolution as the luma but half the horizontal resolution (the notation 4:2:2 means that for every four luma samples in the horizontal direction there are two Cb and two Cr samples). 4:2:2 video is used for high-quality color reproduction.
In the popular 4:2:0 sampling format (often referred to as YV12), Cb and Cr each have half the resolution of Y in both the horizontal and vertical directions. The notation 4:2:0 is somewhat confusing, because the numbers do not have a literal interpretation as sampling ratios; the label appears to have been chosen historically as a code to distinguish this scheme from 4:4:4 and 4:2:2 sampling. 4:2:0 sampling is widely used in consumer applications such as video conferencing, digital television and DVD storage. Because each chroma component contains one quarter of the number of samples in the Y component, 4:2:0 YCbCr video requires exactly half as many samples as 4:4:4 or RGB video.
4:2:0 sampling is sometimes described as '12 bits per pixel'. The reason can be seen by examining a group of four pixels. With 4:4:4 sampling, a total of 12 samples are required, four each of Y, Cb and Cr, giving 12 x 8 = 96 bits, an average of 96/4 = 24 bits per pixel. With 4:2:0 sampling, only 6 samples are required, four of Y and one each of Cb and Cr, giving 6 x 8 = 48 bits, an average of 48/4 = 12 bits per pixel.
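A minimal sketch of the chroma resolution reduction, assuming chroma planes stored as NumPy arrays with even dimensions; each 2 x 2 block of chroma samples is averaged into one sample (practical systems may use different downsampling filters):

```python
import numpy as np

def subsample_420(chroma):
    # Average each 2x2 block into a single sample: half the resolution in both
    # directions, so one quarter of the original number of samples remains.
    c = chroma.astype(np.float64)
    return (c[0::2, 0::2] + c[0::2, 1::2] + c[1::2, 0::2] + c[1::2, 1::2]) / 4.0

cb = np.random.rand(288, 352)  # a chroma plane at CIF luma resolution
cb_420 = subsample_420(cb)
print(cb.size, cb_420.size)  # 101376 25344 -- exactly one quarter
```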
In a 4:2:0 interlaced video sequence, the Y, Cb and Cr samples corresponding to a complete video frame are allocated to two fields. The total number of samples in a pair of interlaced fields is the same as the number of samples in an equivalent progressive frame.
2.5 Video Formats
The video compression standards described in this book can compress a wide variety of video frame formats. In practice, it is common to capture or convert to one of a set of 'intermediate formats' prior to compression and transmission. The Common Intermediate Format (CIF) is the basis of a popular set of formats that includes 4CIF, QCIF and Sub-QCIF (SQCIF). The choice of frame resolution depends on the application and on the available storage and transmission bandwidth: for example, 4CIF is appropriate for standard-definition television and DVD video, CIF and QCIF are popular for video conferencing, and QCIF and SQCIF are appropriate for mobile multimedia applications in which display resolution and bit rate are limited. The following table lists the number of bits required per frame in each format (using 4:2:0 sampling and 8 bits per sample):
Format      Luminance resolution    Bits per frame
Sub-QCIF    128 x 96                147456
QCIF        176 x 144               304128
CIF         352 x 288               1216512
4CIF        704 x 576               4866048
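These figures follow from a simple calculation: with 4:2:0 sampling there are 1.5 samples per pixel (one luma sample plus two chroma planes at one quarter resolution), each of 8 bits. A short Python sketch reproducing the table:

```python
def bits_per_frame_420(width, height, bits_per_sample=8):
    # 1 luma sample per pixel + 2 chroma planes at 1/4 resolution = 1.5 samples/pixel.
    return int(width * height * 1.5 * bits_per_sample)

formats = {"Sub-QCIF": (128, 96), "QCIF": (176, 144),
           "CIF": (352, 288), "4CIF": (704, 576)}
for name, (w, h) in formats.items():
    print(f"{name}: {bits_per_frame_420(w, h)} bits per frame")
```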
A widely used format for digitally encoding video signals for television production is ITU-R Recommendation BT.601-5. The luminance component of the video signal is sampled at 13.5 MHz and the chrominance components at 6.75 MHz, producing a 4:2:2 Y:Cb:Cr sampled signal. The parameters of the sampled digital signal depend on the video frame rate (30 Hz for NTSC and 25 Hz for PAL/SECAM): the higher 30 Hz frame rate of NTSC is compensated by a lower spatial resolution, so that the total bit rate is 216 Mbps in each case. The active region of each frame is smaller than the total, because the horizontal and vertical blanking intervals surrounding each frame are excluded. Each sample has a possible range of 0 to 255; levels 0 and 255 are reserved for synchronization, and the active luminance signal is restricted to a range of 16 (black) to 235 (white).
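The 216 Mbps figure follows directly from the sampling rates; a one-line check in Python:

```python
# BT.601 4:2:2 sampling at 8 bits per sample:
luma_rate = 13.5e6 * 8         # luminance sampled at 13.5 MHz
chroma_rate = 2 * 6.75e6 * 8   # Cb and Cr each sampled at 6.75 MHz
print((luma_rate + chroma_rate) / 1e6, "Mbps")  # 216.0 Mbps
```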
2.6 Quality
In order to specify, evaluate and compare video communication systems, it is necessary to determine the quality of the video images displayed to the viewer. Measuring visual quality is a difficult and often imprecise task, because there are so many factors that can affect the results. Visual quality is inherently subjective and is influenced by many factors that make it difficult to obtain a completely accurate measure. For example, the perceived quality of a video signal depends strongly on the task at hand: passively watching a DVD movie, actively participating in a video conference, communicating using sign language, or trying to identify a person in a surveillance video scene. Measuring visual quality with an objective criterion gives accurate, repeatable results, but as yet no objective measure completely reproduces the subjective experience of a human observer watching a video display.

2.6.1 Subjective Quality Measurement
2.6.1.1 Factors Affecting Subjective Quality
Our perception of a visual scene is formed by a complex interaction between the components of the human visual system, the eye and the brain. The perception of visual quality is influenced by spatial fidelity (how clearly parts of the scene can be seen, and whether there is any obvious distortion) and temporal fidelity (whether motion appears natural and 'smooth'). A viewer's opinion of 'quality' is also affected by factors such as the viewing environment, the observer's state of mind and the extent to which the observer interacts with the scene. A user carrying out a specific task that requires concentration on part of a video scene has quite a different requirement for 'good' quality from a user who is passively watching a movie. For example, it has been shown that a viewer's opinion of visual quality is measurably higher when the viewing environment is comfortable and non-distracting, regardless of the quality of the video signal itself.
Other important influences on perceived quality include visual attention (an observer perceives a scene by fixating on a sequence of points rather than by taking in everything simultaneously) and the so-called 'recency effect' (our opinion of a video sequence is more heavily influenced by recently viewed material than by older material). All of these factors make it very difficult to measure visual quality accurately and quantitatively.
2.6.1.2 ITU-R 500
Several test procedures for subjective quality evaluation are defined in ITU-R Recommendation BT.500-11. A commonly used procedure is the Double Stimulus Continuous Quality Scale (DSCQS) method, in which an assessor is presented with a pair of images or short video sequences, A and B, one after the other, and is asked to give each of A and B a quality score by marking a point on a continuous line with five intervals ranging from 'Excellent' to 'Bad'. In a typical test session the assessor is shown a series of such sequence pairs and is asked to grade each pair. Within each pair, one is an unimpaired 'reference' sequence and the other is the same sequence modified by the system or process under test.
The order of the two sequences, original and impaired, is randomized during the test session, so that the assessor does not know which is the original and which is the impaired sequence. This prevents the assessor from prejudging the impaired sequence. At the end of the session the scores are converted to a normalized range, and the end result is a mean score for each sequence that indicates its relative quality. Tests such as DSCQS are widely accepted as a realistic measure of subjective visual quality, but they suffer from practical problems. The results can vary significantly from one assessor to another; this variation can be compensated for by repeating the test with several sequences and several assessors. An 'expert' assessor, one who is familiar with the artifacts of video compression, may give a biased score compared with a 'non-expert' assessor, and a large pool of non-expert assessors is therefore required, since even assessors with no previous experience quickly learn to recognize the characteristic artifacts of impaired video. These factors make it expensive and time-consuming to carry out DSCQS tests thoroughly.
2.6.2 Objective Quality Measurement
The complexity and cost of subjective quality measurement make an algorithm that automatically measures quality very attractive. Developers of video compression and video processing systems rely heavily on so-called objective (algorithmic) measures of visual quality. The most widely used measure is peak signal-to-noise ratio (PSNR), but its limitations have led to continuing efforts to develop more sophisticated measures that approximate the response of 'real' human observers.
2.6.2.1 PSNR
Peak signal-to-noise ratio (PSNR) is measured on a logarithmic scale and depends on the mean squared error (MSE) between the original and the impaired image or video frame, relative to the square of the highest possible signal value in the image, (2^n - 1)^2, where n is the number of bits per image sample:

PSNR (dB) = 10 log10 ( (2^n - 1)^2 / MSE )
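A minimal Python sketch of this measure, assuming 8-bit images stored as NumPy arrays (function and variable names are illustrative):

```python
import numpy as np

def psnr(original, decoded, n=8):
    # Mean squared error between the original and the impaired image.
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    peak = (2 ** n - 1) ** 2  # square of the highest possible sample value
    return 10 * np.log10(peak / mse)

orig = np.random.randint(0, 256, (144, 176), dtype=np.uint8)
noisy = np.clip(orig.astype(int) + np.random.randint(-5, 6, orig.shape), 0, 255)
print(round(psnr(orig, noisy), 2), "dB")
```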
PSNR can be calculated easily and quickly and is therefore a very popular quality measure, widely used to compare the quality of compressed and decompressed video images with the original.
The PSNR measure suffers from a number of limitations, however. PSNR requires an unimpaired original image for comparison, but this may not be available in every case, and it may not be easy to verify that an 'original' image is itself free of distortion. PSNR also does not correlate precisely with subjective visual quality: for a given image or image sequence, a high PSNR usually indicates high quality and a low PSNR usually indicates low quality, but a particular value of PSNR does not necessarily equate to an 'absolute' subjective quality. An image that is subjectively pleasing does not necessarily have a high PSNR; for example, if the region that a human observer attends to, such as a face at the center of the scene, is reproduced sharply, the image may be judged to be of good quality even though the PSNR measured over the whole image is not particularly high.
2.6.2.2 Other Objective Quality Measures
Because of the limitations of PSNR, there has been a great deal of work aimed at developing more sophisticated objective test procedures that correlate more closely with subjective quality. Many different approaches have been proposed, but none of them can yet fully replace subjective testing, so there is as yet no standardized, accurate and widely available objective alternative. Recognizing this, the ITU-T Video Quality Experts Group (VQEG) has been working towards a standardized mechanism for objective video quality assessment, the first step being to test and compare candidate models. In March 2000, VQEG reported on the first round of tests, in which ten competing systems were evaluated; unfortunately, none of them was considered suitable for standardization. VQEG conducted a second round of evaluations in 2003. The problem is unlikely to be solved completely unless there is a major breakthrough in automatic quality assessment.
2.7 Conclusion
Sampling an analog signal produces a digital video signal, which has the advantages of accuracy, high quality and compatibility with digital media for storage and transmission, but which occupies a very large volume of data. Related issues discussed in this chapter include spatial and temporal resolution, color representation and the measurement of visual quality. The next chapter introduces some further basic concepts of video compression.