Summary: The videophone is an important application of multimedia communication. This article describes the key technologies of videophones, traces the development of the corresponding standards, and discusses future directions.

Keywords: videophone, H.320, H.323, H.324

I. Introduction

The telephone is an indispensable communication tool in daily life and work, and it is widely used because it is convenient and fast. An ordinary telephone, however, can only provide voice communication. A videophone lets people not only hear the other party but also see them. It is suited to family life and is also widely used in business activities, remote teaching, security monitoring, hospital nursing, medical diagnosis, scientific field surveys and other areas, so it has broad market prospects.

In 1964, Bell Laboratories in the United States proposed the first videophone solution. Because of the technical limitations of the time, however, the videophone made no substantial progress for many years. From the end of the 1980s, with the continuous development of communication, computer, and speech and video coding technology, the videophone developed rapidly around the world. To achieve interoperability, the International Telecommunication Union's standardization sector (ITU-T) introduced the H.310, H.320, H.321, H.322, H.323 and H.324 series of multimedia communication standards in the 1990s. Of these, H.320, H.324 and H.323 are the most widely used. In recent years, owing to the rapid development of IP networks, H.323-based video terminals and videoconferencing systems have gradually come to dominate, and major manufacturers have launched H.323-based products. It is worth noting that videophones based on the Session Initiation Protocol (SIP) have also begun to appear.

Based on an analysis of the multimedia framework protocols above, this article presents a hardware and software solution built on the TM1300 media processor.
Most modules of this solution are common to H.320, H.323 and H.324 systems. With small modifications to the hardware interface, and with the software using the corresponding control protocol, separate solutions for H.320, H.323 and H.324 systems can be obtained.

II. Basic structure of the videophone

1. Basic structure of the videophone
The ITU-T H.32x series are framework protocols, and videophone terminals that conform to the different standards have similar structures. The basic structure of a videophone comprises video input/output units, a video codec, speech input/output units, a speech codec, a delay unit, a data processing unit (optional), a system control unit, a multimedia data multiplexer/demultiplexer and a network interface unit. The standards target different networks, so they differ in their communication control protocols, multimedia data packaging protocols and network interface units, but the video and speech input/output units, the video codec and the speech codec are similar.

Speech and video compression are the core technologies of the videophone. As a consumer product, a videophone must deliver good voice and video quality while using as little channel bandwidth as possible. Speech and video coding technology has developed around these two goals: improving compression efficiency as far as possible while preserving the quality of the compressed speech and images. Both goals should also be weighed when choosing speech and video compression standards.

2. Speech coding technology
Voice communication is the most basic function of a videophone. Constrained by network conditions, videophones usually operate at low bit rates. To suit such low-bit-rate speech applications, the ITU-T introduced the G.72x series of speech compression standards, of which G.723.1, G.728, G.729 and G.729A have been widely used in videophones.
Table 1 lists the techniques, bit rates, delay and voice quality of each speech standard. G.723.1 can generate bitstreams at two rates: the high-rate encoder (6.3 kbit/s) uses the multipulse maximum likelihood quantization (MP-MLQ) algorithm, and the low-rate encoder (5.3 kbit/s) uses the algebraic code-excited linear prediction (ACELP) algorithm.
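
To give a feel for what these two rates mean on the wire, the sketch below estimates the size of one coded speech frame. It assumes the 30 ms frame duration and the nominal 6.3 and 5.3 kbit/s rates defined in the G.723.1 recommendation, which the text above does not state. The function name `frame_payload_bytes` is hypothetical and chosen only for illustration.

```python
import math


def frame_payload_bytes(bit_rate_bps: int, frame_ms: int) -> int:
    """Bytes needed to carry one coded frame at the given nominal bit rate.

    bit_rate_bps * frame_ms / 1000 gives bits per frame; dividing by 8 and
    rounding up gives whole bytes.
    """
    return math.ceil(bit_rate_bps * frame_ms / 8000)


# Assumed G.723.1 parameters: 30 ms frames at the two nominal rates.
G7231_FRAME_MS = 30
high_rate = frame_payload_bytes(6300, G7231_FRAME_MS)  # MP-MLQ mode: 24 bytes
low_rate = frame_payload_bytes(5300, G7231_FRAME_MS)   # ACELP mode: 20 bytes

print(high_rate, low_rate)
```

The same helper can be applied to any other codec in Table 1 once its bit rate and frame duration are known.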
G.729A is a simplified version of G.729. Its algorithmic complexity is about 50% lower than that of G.729 at the cost of slightly lower voice quality, and the bitstreams of the two standards are interoperable, so each decoder can decode the other's stream. When a videophone communicates with an ordinary telephone, G.711 is used. G.711 is PCM coding: it only samples and quantizes the speech signal, producing a 64 kbit/s bitstream. G.711 gives high voice quality, but its drawback is the high bandwidth it occupies. In practice, choosing a speech compression standard means weighing factors such as bandwidth, delay and algorithmic complexity.

3. Video coding technology
Video compression is a core technology in multimedia applications. The low-bit-rate video compression standards issued by the ITU-T have played an important role in the development of videophones and in making them practical. H.261 was the first low-bit-rate video compression standard introduced by the ITU-T. Its bit rate is p × 64 kbit/s, where p = 1 to 30, and its image formats are CIF and QCIF. The basic idea of the H.261 compression algorithm is to reduce temporal redundancy with predictive coding and to reduce spatial redundancy with transform coding. The algorithm consists mainly of motion estimation, motion compensation, the DCT, quantization and Huffman coding. Each frame is organized into a picture layer, a group of blocks (GOB) layer, a macroblock (MB) layer and a block layer, and frames are coded as either I frames or P frames.

The later H.263 and H.264 standards inherited the basic idea of H.261 and improved on it. Compared with H.261, H.263 improves the following aspects: more image formats, half-pixel motion estimation, a different GOB structure, four optional modes, reduced header overhead, and different VLC tables.
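
The predictive coding described above depends on motion estimation: for each block of the current frame, the encoder looks for the best-matching block in the previous frame and codes only the displacement and the residual. The following is a minimal sketch of full-search block matching at integer-pixel precision with the sum of absolute differences (SAD) as the cost. It is not the search strategy of any particular H.261 or H.263 encoder, the function names `sad` and `full_search` are hypothetical, and the half-pixel refinement of H.263 is deliberately left out.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=8, search=2):
    """Return ((dx, dy), cost) of the best integer-pixel match within
    +/- `search` pixels. Frames are lists of rows of luminance values,
    and the block at (bx, by) is assumed to lie inside both frames."""
    height, width = len(ref), len(ref[0])
    best_mv = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or bx + dx < 0:
                continue
            if by + dy + n > height or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:  # strict '<' keeps the zero vector on ties
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

A real encoder would run this search per macroblock and then transform and quantize the residual. The exhaustive search is chosen here only because it is the easiest variant to read; practical encoders usually use faster search patterns.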
At the same image quality, thanks to its improvements in motion estimation and coding, H.263 produces a bit rate approximately 30% lower than H.261. To further improve the coding efficiency and error resilience of H.263, the ITU-T added more options to it, and the revised versions are called H.263+ and H.263++. H.263 is currently the most widely used video compression standard in videophones.

In 2003, the ITU-T approved a new video coding standard, H.264. Compared with H.263, H.264 partitions macroblocks and blocks more flexibly and further improves motion estimation accuracy, allowing motion estimation at 1/4- or 1/8-pixel precision. H.261 and H.263 use the DCT, whereas H.264 uses an integer transform similar to the DCT. At the same reconstructed image quality, the H.264 bit rate is about 50% lower than that of H.263. This gain in coding efficiency comes with much higher complexity: the computational complexity of encoding is estimated at roughly three times that of H.263, and the decoding complexity at roughly twice that of H.263. As the processing power of DSP chips continues to improve, H.264 will become more and more widely used in multimedia communications such as videophones.

4. Communication protocols
The H.32x series standards introduced by the ITU-T share the same system framework. It includes the real-time transport protocols RTP and RTCP, H.225 call signaling for setting up the call between the two ends, the H.245 control protocol used during the session, and the T.120 protocol used for data transfer. The standards differ in the networks they target, and therefore in their network interfaces, their signaling procedures and the packet structures optimized for each network. The standards for the specific parts are shown in Table 2.
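
As an illustration of the RTP packaging mentioned above, the sketch below builds and parses the 12-byte fixed RTP header. The field layout follows RFC 3550 and the static payload type 34 for H.263 follows RFC 3551; neither document is cited in this article, so treat them as outside assumptions. The function names are hypothetical, and header extensions, CSRC lists and padding are not handled.

```python
import struct

RTP_VERSION = 2
PT_H263 = 34  # static payload type for H.263 video (RFC 3551)


def build_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Pack a fixed RTP header: no padding, no extension, no CSRC entries."""
    byte0 = RTP_VERSION << 6                          # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Unpack the first 12 bytes of an RTP packet into a dict."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


header = build_rtp_header(PT_H263, seq=1, timestamp=90000, ssrc=0x1234, marker=True)
print(len(header), parse_rtp_header(header))
```

The coded video or speech frame would follow this header as the packet payload. The sequence number and timestamp let the receiver reorder packets and restore timing.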
The multiplexing protocol specifies how video data, voice data and other data are packaged. The control protocol handles negotiation between terminals, such as negotiating the coding standards (for example, the speech coding standard) and the channel bandwidth.

III. A standalone solution based on the TM1300 media processor

Currently popular videophone terminals include standalone terminals and PC-based terminals.