Optimization of multi-reference frame prediction technology in H.264
Source: Application of Electronic Technique. Authors: Dong Haiyan, Zhang Qishan
Abstract: A fast, low-complexity multi-reference frame selection algorithm is proposed. Simulation results indicate that the algorithm greatly reduces the computational complexity of multi-reference frame prediction while essentially preserving rate-distortion performance.
Keywords: H.264; motion estimation; multi-reference frame prediction; motion search
Motion estimation is an effective way to remove redundant information between adjacent frames of a video sequence and has an important impact on overall video coding performance. The latest international video coding standard, H.264/AVC, jointly developed by two international standardization bodies (ISO/IEC MPEG and ITU-T VCEG), still adopts the block-based hybrid coding framework, but introduces many new coding techniques, such as multi-reference frame prediction for motion estimation, thereby providing higher coding efficiency.
Previous video coding standards such as MPEG-4 and H.263 support only single-reference-frame prediction. H.264 differs from these standards in adopting multi-reference frame prediction, which extends the motion search from a single reference frame to multiple previously decoded reference frames. This usually yields a more accurate match and therefore higher coding efficiency. However, multi-reference frame prediction also has drawbacks: it places much higher demands on storage space and computing power.
H.264 introduces Lagrangian rate-distortion optimization for reference frame selection: among all candidate reference frames, the one with the lowest rate-distortion cost is chosen as optimal. The Lagrangian rate-distortion cost can be expressed as:

J_motion(s, c, m, REF | λ_motion) = SAD(s, c, m, REF) + λ_motion · (R(m − p) + R(REF))    (1)
where m = (m_x, m_y)^T is a motion vector, REF is a reference frame, and J_motion(s, c, m, REF | λ_motion) is the rate-distortion cost under motion vector m and reference frame REF. s is the original video signal, c(m, REF) is the reconstructed video signal under motion vector m and reference frame REF, λ_motion is the Lagrangian multiplier, R(m − p) is the number of bits needed to encode the difference between the motion vector m and its prediction p, and R(REF) is the number of bits needed to encode the reference frame index REF. SAD(s, c, m, REF) (sum of absolute differences) measures the difference between the original and reconstructed signals and is calculated by equation (2):

SAD(s, c, m, REF) = Σ_{x=1..B1} Σ_{y=1..B2} | s(x, y) − c(x − m_x, y − m_y) |    (2)
where B1 and B2 are the horizontal and vertical dimensions of the block in pixels and can take the values 16, 8 or 4.
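Equations (1) and (2) can be sketched in a few lines of Python. The exp-Golomb bit-count helper below is a simplified stand-in for the motion-vector and reference-index rate terms, not the actual H.264 entropy coder, and all function names are illustrative assumptions:

```python
import numpy as np

def sad(s, c):
    """Sum of absolute differences between an original block s and a
    reference block c of the same shape -- equation (2)."""
    return int(np.abs(s.astype(np.int32) - c.astype(np.int32)).sum())

def ue_bits(v):
    """Bit length of signed value v under exp-Golomb coding (simplified
    rate model for R(m - p) and R(REF))."""
    code_num = 2 * v - 1 if v > 0 else -2 * v   # signed -> unsigned mapping
    n = code_num + 1
    return 2 * (n.bit_length() - 1) + 1          # 2*floor(log2(n)) + 1 bits

def rd_cost(s, c, mv, pred_mv, ref_idx, lam):
    """J_motion = SAD + lambda * (R(m - p) + R(REF)) -- equation (1)."""
    mvd_bits = ue_bits(mv[0] - pred_mv[0]) + ue_bits(mv[1] - pred_mv[1])
    ref_bits = ue_bits(ref_idx)
    return sad(s, c) + lam * (mvd_bits + ref_bits)
```

The encoder would evaluate `rd_cost` for every candidate motion vector and reference frame and keep the minimum.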
Let M denote the motion vector search range. If only one reference frame is used for prediction, each macroblock requires a search over (2M + 1)^2 candidate points; if N reference frames are allowed, each macroblock has N × (2M + 1)^2 candidate search points. This means the motion-search workload is multiplied by the total number of reference frames used in prediction: the more reference frames allowed, the larger the motion-search computation and the longer the encoding time.
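The candidate-point counts above are easy to make concrete; for example, with the search range M = 16 used later in the experiments, five reference frames multiply the full-search workload fivefold (function name is illustrative):

```python
def candidate_points(search_range, num_ref_frames=1):
    """Number of full-search candidate points: N * (2M + 1)^2."""
    return num_ref_frames * (2 * search_range + 1) ** 2

# With M = 16: one reference frame needs 33^2 = 1089 points per macroblock,
# five reference frames need 5 * 1089 = 5445.
```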
To alleviate the problems of the multi-reference frame prediction technique described above, this paper exploits the high correlation between adjacent frames and proposes a fast, low-complexity multi-reference frame selection algorithm.
1 Fast multi-reference frame prediction algorithm
In H.264 motion estimation, the multi-reference frame motion search is performed over 7 different block sizes and shapes, processed in order from 16 × 16 down to 4 × 4. For each block size, the encoder must find, among all allowed reference frames, the reference frame with the lowest rate-distortion cost together with its corresponding motion vector. The multi-reference frame motion search starts with the reference frame of highest selection probability, REF(0), and proceeds until it reaches REF(N − 1), the reference frame farthest from the currently coded frame.
Because of the strong temporal correlation between adjacent frames of a video sequence, the optimal match is typically located in the reference frame adjacent to the current coded frame, denoted REF(0); the remaining reference frames (denoted REF(i), i = 1, 2, ..., N − 1, where N is the maximum number of reference frames) are selected with far lower probability than REF(0). To better understand how different reference frames are chosen in real video sequences, this paper selects several typical video test sequences, applies the Lagrangian rate-distortion-optimal search strategy, and counts the selection probability of each reference frame; the experimental data are shown in Table 1. As Table 1 shows, among all reference frames REF(0) is the most likely to be the optimal reference frame, with an average probability of 88.67%, while the remaining reference frames are selected far less often than REF(0).
Table 1 Reference frame probability statistics
Test sequence      | Total reference frames | REF(0) probability /% | REF(i) probability, i = 1, ..., N−1 /%
foreman_qcif.yuv   | 5  | 87.20 | 12.80
silent_qcif.yuv    | 5  | 94.38 | 5.64
(name illegible)   | —  | 67.64 | 32.36
akiyo_cif.yuv      | 10 | 96.12 | 3.88
coastg_cif.yuv     | 10 | 96.09 | 3.91
stefan_cif.yuv     | 10 | 90.61 | 9.39
Average            |    | 88.67 | 11.33
As the above analysis shows, the reference frame REF(0) has the highest probability of being the final prediction frame, and the motion-search result obtained in this frame has an important impact on overall coding performance. It is therefore reasonable to use a larger search range in REF(0): the better the match found there, the greater the improvement in overall coding performance. By contrast, the remaining reference frames are selected with very low probability, yet each additional reference frame adds a large amount of computation. Reducing the motion-search workload in these frames under suitable conditions therefore has no significant impact on overall coding performance. Since there is strong motion-information correlation between adjacent frames, the motion information obtained in the previous reference frame can be used to predict the search center in the next reference frame. According to the center-biased property of motion vectors, the optimal motion vector is usually located in a small range around the search center, so with a spiral search order only this small region needs to be searched.
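A center-first scan of the kind just described can be sketched as follows. This generator visits candidate offsets ring by ring in order of increasing distance from the (predicted) search center, so the most likely candidates are tested first; it illustrates the idea only and is not the precomputed spiral table used in the reference software:

```python
def spiral_order(radius):
    """Yield (dx, dy) offsets around a search center, center first, then
    rings of increasing Chebyshev distance up to the given radius."""
    yield (0, 0)
    for r in range(1, radius + 1):
        for dx in range(-r, r + 1):
            for dy in range(-r, r + 1):
                if max(abs(dx), abs(dy)) == r:   # keep only the ring at distance r
                    yield (dx, dy)
```

Scanning in this order lets an encoder stop early once a good enough match near the center is found.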
The scheme for reducing the multi-reference frame prediction workload is shown in Figure 1. In the most important reference frame, REF(0), a large search range is used to ensure high prediction accuracy; in the remaining, less important reference frames, a smaller search range can be chosen, with the motion information already obtained in the previous frame used to predict the search center in the next frame. This reduces the amount of computation without significantly affecting overall coding performance.
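The scheme above can be sketched as a simple SAD-only search: a full window on REF(0), then a small refinement window on each further REF(i) centered on the motion vector found in REF(i − 1). All names, window sizes, and the exhaustive small-window scan are illustrative assumptions, not the JM implementation:

```python
import numpy as np

def block_sad(cur, ref, x, y, mx, my, bs):
    """SAD between the bs x bs block of cur at (x, y) and the block of ref
    displaced by motion vector (mx, my); inf if outside the frame."""
    h, w = ref.shape
    if not (0 <= y + my and y + my + bs <= h and 0 <= x + mx and x + mx + bs <= w):
        return float("inf")
    a = cur[y:y + bs, x:x + bs].astype(np.int32)
    b = ref[y + my:y + my + bs, x + mx:x + mx + bs].astype(np.int32)
    return int(np.abs(a - b).sum())

def search_window(cur, ref, x, y, bs, center, rng):
    """Exhaustive search in a (2*rng+1)^2 window around the given center."""
    best, best_mv = float("inf"), center
    cx, cy = center
    for dx in range(-rng, rng + 1):
        for dy in range(-rng, rng + 1):
            cost = block_sad(cur, ref, x, y, cx + dx, cy + dy, bs)
            if cost < best:
                best, best_mv = cost, (cx + dx, cy + dy)
    return best, best_mv

def fast_multi_ref(cur, refs, x, y, bs, full_range=16, small_range=2):
    """Large window on REF(0); small re-centered windows on REF(1..N-1).
    Returns (best_cost, best_mv, best_ref_index)."""
    cost, mv = search_window(cur, refs[0], x, y, bs, (0, 0), full_range)
    best = (cost, mv, 0)
    for i in range(1, len(refs)):
        # re-center the small window on the vector found in the previous frame
        cost, mv = search_window(cur, refs[i], x, y, bs, mv, small_range)
        if cost < best[0]:
            best = (cost, mv, i)
    return best
```

Each extra reference frame then costs only (2 × 2 + 1)^2 = 25 candidate points instead of the full (2 × 16 + 1)^2 = 1089.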
This paper thus proposes a fast algorithm for reducing the multi-reference frame prediction workload. The algorithm exploits both the spatial correlation of motion vectors, predicting the search center of the first reference frame REF(0) from the median motion vector of the three spatially adjacent blocks around the current macroblock, and the temporal correlation of motion vectors, correcting the search center of the next reference frame REF(i) using the motion information obtained in the previous frame REF(i − 1). Since adjacent frames are highly correlated in the time domain, the optimal motion vector in the next frame is very likely to lie near the corrected search center, so only a few candidate points around the search center need to be matched. This eliminates a large part of the computation.
2 Simulation results and analysis
To test the performance of the algorithm, this paper uses the H.264 reference software JM7.0 as the experimental platform. The experimental parameters are set as follows: 7 typical video test sequences (Silent, Mobile, Foreman, Football, Template, Bus, and Suzi) in CIF (352 × 288) or QCIF (176 × 144) format, 150 frames per sequence at a frame rate of 30 f/s; motion vector search range 16; quantization parameter 32; Hadamard transform enabled; GOP structure IPPP. Table 2 lists the comparison between the algorithm proposed in this paper and the original Lagrangian rate-distortion-optimal full search. In Table 2, ΔPSNR is the change in average peak signal-to-noise ratio (PSNR) in dB, ΔBits is the percentage growth of the average bit rate, and ΔTime is the percentage reduction of motion-estimation encoding time. ΔTime and ΔBits are obtained from equations (3) and (4):

ΔTime = (T_original − T_proposed) / T_original × 100%    (3)
ΔBits = (B_proposed − B_original) / B_original × 100%    (4)
where T_original and B_original are the motion-estimation encoding time and the total number of bits when using the original method, and T_proposed and B_proposed are the motion-estimation encoding time and the total number of bits when using the method of this paper.
Table 2 Comparison of experimental results
Test sequence      | 2 reference frames: ΔPSNR/dB | ΔBits/% | ΔTime/% | 5 reference frames: ΔPSNR/dB | ΔBits/% | ΔTime/%
Silent_qcif.yuv    | −0.022 | 0.16  | 6.67 | −0.017 | 0.35 | 7.66
Mobile_qcif.yuv    | 0.004  | −0.22 | 5.57 | −0.133 | 2.87 | 22.97
Foreman_cif.yuv    | −0.038 | 0.43  | 8.05 | −0.099 | 1.70 | 11.82
Football_qcif.yuv  | −0.016 | 9.22  | 8.32 | −0.031 | 0.68 | 21.74
Template_cif.yuv   | 0.003  | 0.28  | 4.52 | −0.048 | 0.95 | 16.50
Bus_qcif.yuv       | −0.064 | 2.89  | 6.03 | −0.166 | 3.96 | 14.21
Suzi_qcif.yuv      | 0.017  | 2.32  | 6.94 | −0.043 | 0.38 | 11.39
Average            | −0.017 | 2.15  | 6.60 | −0.077 | 1.56 | 15.18
As the experimental data in Table 2 show, compared with the original Lagrangian rate-distortion-optimal full-search mode selection method, the proposed algorithm reduces motion-estimation time by an average of 15.18%, while the average PSNR falls by only 0.077 dB and the bit rate grows by only 1.56% (5 reference frames). In addition, comparing the results for different numbers of reference frames, it is easy to see that the more reference frames are allowed, the more encoding time is saved.
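Equations (3) and (4) translate directly into code; the sample numbers below are illustrative inputs chosen to reproduce the 15.18% and 1.56% figures, not measurements from the paper's runs:

```python
def delta_time(t_original, t_proposed):
    """Percentage reduction of motion-estimation encoding time, equation (3)."""
    return (t_original - t_proposed) / t_original * 100.0

def delta_bits(b_original, b_proposed):
    """Percentage growth of the total number of bits, equation (4)."""
    return (b_proposed - b_original) / b_original * 100.0

# e.g. an encoder that needs 84.82 s instead of 100 s saves 15.18% of the time;
# 101.56 kbit instead of 100 kbit is a 1.56% bit-rate growth.
```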
The simulation results show that the algorithm proposed in this paper can substantially reduce the computational complexity of H.264 multi-reference frame motion prediction at the cost of only a small loss in rate-distortion performance. This is very advantageous for real-time implementation of H.264 encoders.