Implementing PC-to-PC IP Telephony
Author: Ruan help autumn
Published: 2001/04/09
Abstract:
IP telephony, also known as Internet telephony, is developing rapidly. This article designs and implements a software model of a computer-to-computer IP phone, explains it in detail, and analyzes the key technologies involved, including voice capture and playback and network transmission. It also points out the software's shortcomings and directions for further work. In call experiments on a LAN, the sound quality and delay were comparable to those of an ordinary telephone, which indicates that the software meets the basic requirements of an IP phone.
Keywords: IP phone, voice, network
Main text:
The Internet is the most widely used and fastest-growing communication network today. It is based on packet switching: user data is packed into packets, and each packet also carries additional information for routing, error correction, flow control, and so on. Packets travel through the network independently of one another. Because network conditions vary, the time a packet takes to reach its destination is not fixed, so delivery is not real-time. In general, therefore, the Internet is better suited to data transmission. However, an audio signal can also be transmitted as data after analog-to-digital conversion: if speech is sampled, quantized into a digital signal, and sent over the network in packets, two parties can hold a conversation. This is the IP phone.
An IP phone performs analog-to-digital conversion on the analog speech signal, encodes and compresses it, and packs the compressed frames into IP packets according to certain packaging rules for transmission over the network. The receiver decompresses the data and restores the speech by digital-to-analog conversion, thereby achieving voice communication. Because a data network allocates capacity by statistical time-division multiplexing, no communicating entity occupies a channel exclusively, so IP telephony can greatly improve the utilization of network resources and reduce operating costs.
IP telephony first appeared in early 1995, when VocalTec introduced its Internet Phone client software. Although the company had not yet put forward the concept of transmitting voice over IP, this was the first successful commercialization of IP telephony. Before that, transmitting voice over IP was extremely difficult; VocalTec's first product was used between two PCs on a LAN. Since then, a number of network companies have begun to use the Internet to formally provide international and domestic long-distance telephone services, and such services are especially widespread in the United States.
Similar techniques can also be used for telecommunications services such as long-distance fax (E-Fax). Because communication over the Internet is inexpensive worldwide, many users have begun to make long-distance calls through the Internet.
For experimental and research purposes, this article implements a computer-to-computer IP phone in software. The development environment is the Windows 98 platform, and the development tool is Visual C++ 5.0. The software supports text transfer among multiple users and voice conversation between two users on the network. The software is analyzed below.
I. Software design
The key points and difficulties in implementing this software are as follows:
(1) Capturing and playing voice: the sender captures sound through the microphone and sound card and converts it into voice data, and the receiver plays it through the sound card and headset.
(2) Compressing and decompressing the voice data: the captured voice data is compressed by the sender and decompressed by the receiver.
(3) Transmitting the voice data over the network.
(4) The connection mode and control among multiple users.
(5) The design of the overall software structure and the interactive interface.
Since the core function of the software is voice transmission, the whole design is built around it. First consider the voice transmission process; Figure 1 shows the process in one direction. The sound card captures the voice and converts it into data stored in memory; the CPU then compresses the voice data with a compression algorithm; finally the network card sends the compressed data. The receiver takes in the voice data through its network card, the CPU decompresses it with the decompression algorithm, and the sound card converts the data back into sound, which is played through the headset.
This flow actually contains solutions to the first three of the problems listed above. Of course, this only confirms them in theory; the specific implementation is discussed further below.
Figure 1. One-way voice transmission process
There are usually two ways to connect multiple users.
The first is to set up a server: all clients connect to the server and reach one another through it. The advantage is that software control is relatively simple and connecting is convenient, since users only need to know the server's IP address. The disadvantage is equally obvious: this approach suits commercial software run by a company, whereas ordinary individuals have neither the funds to buy and set up a server nor the time and energy to manage and maintain one.
The second is to use no server: any two users connect to each other directly. As long as you know the other party's IP address you can connect to them, and others can connect to you in the same way; that is, every computer is both a server and a client. The advantages and disadvantages of this method are exactly the opposite of the first, so they are not repeated here. Some more complete software packages support both methods and let users choose either one.
Based on this analysis, the author decided to use the second method. Although it adds to the complexity of the software, it suits our actual situation. In the program, a linked list stores the information of the users connected to the local machine; the list's unordered nature and its ability to grow and shrink dynamically match the way users connect and leave in no particular order. This solves the fourth problem in theory.
The overall structure and data flow of the software are shown in Figure 2.
The user interface module responds to the user's operations, displays important prompts and text, plays voice, and so on, implementing the interaction between the software and the user. The user enters text through the keyboard and voice through the microphone.
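Returning to the connection bookkeeping described above, the linked list of connected users might look like the following minimal sketch. The article does not show the author's data structure, so the names Peer and PeerList and the choice of fields are assumptions made only for illustration; std::list stands in for whatever list type the original program used.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>

// One entry per connected user. The fields are illustrative assumptions.
struct Peer {
    std::string name;
    std::string ip;
};

// Hypothetical wrapper around a linked list of connected users.
class PeerList {
public:
    // Record a newly connected user; a second entry with the same IP is refused.
    bool add(const std::string& name, const std::string& ip) {
        assert(!ip.empty());
        if (find(ip) != peers_.end()) return false;
        peers_.push_back(Peer{name, ip});
        return true;
    }

    // Drop a user who has disconnected; returns false if the IP is unknown.
    bool remove(const std::string& ip) {
        auto it = find(ip);
        if (it == peers_.end()) return false;
        peers_.erase(it);
        return true;
    }

    bool contains(const std::string& ip) const {
        return std::any_of(peers_.begin(), peers_.end(),
                           [&](const Peer& p) { return p.ip == ip; });
    }

    std::size_t size() const { return peers_.size(); }

private:
    std::list<Peer>::iterator find(const std::string& ip) {
        return std::find_if(peers_.begin(), peers_.end(),
                            [&](const Peer& p) { return p.ip == ip; });
    }

    std::list<Peer> peers_;
};
```

A linked list suits this job because entries are inserted and erased at arbitrary positions as users come and go, and no ordering or random access is needed.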
The voice capture module collects voice data, converting the analog speech signal into a digital signal. To reduce call delay, it allocates memory blocks directly and places the captured data in them for compression or transmission. Voice capture is done by calling the sound card's API functions. The sampling format used in this software is WAVE_FORMAT_PCM, mono, with a sample rate of 8 kHz and 16 bits per sample, that is, 16,000 bytes per second.
The data transfer module uses the CSocket class of Visual C++ to send the captured voice data in the memory blocks to the receiver over TCP/IP.
The voice playback module converts the received voice data directly into an analog signal through the sound card and plays it.
The voice compression module first applies high-pass filtering to remove the DC component and the 50 Hz component of the signal, and performs gain control. It then performs linear prediction analysis, extracts the line spectrum pair parameters, and predicts and quantizes them; the line spectrum pairs are used to construct a formant perceptual weighting filter. The high-pass filtered signal is then used for open-loop pitch estimation, which is refined by closed-loop pitch estimation. With the parameters obtained so far, the excitation codebook is searched and the excitation parameters are extracted. Finally, each parameter is quantized and packed into a frame. The compression ratio of this software reaches 1/20, so one second of speech takes 0.8 KB.
The voice decompression module extracts the parameters from each voice frame and feeds them into the speech production model to output speech.
Figure 2. Overall structure and data flow of the software
We have now set up the basic framework of the software and made its data flow clear. The next step is the specific implementation.
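The two data rates quoted above follow from simple arithmetic, which the short sketch below reproduces. The function names are hypothetical and are not taken from the author's program.

```cpp
#include <cassert>

// Bytes per second of uncompressed PCM audio.
int pcm_bytes_per_second(int sample_rate_hz, int bits_per_sample, int channels) {
    assert(bits_per_sample % 8 == 0);
    return sample_rate_hz * (bits_per_sample / 8) * channels;
}

// Bytes per second after compression by a factor of `compression_factor`
// (20 for the 1/20 ratio quoted in the text).
int compressed_bytes_per_second(int raw_bytes_per_second, int compression_factor) {
    assert(compression_factor > 0);
    return raw_bytes_per_second / compression_factor;
}
```

With the format used here, pcm_bytes_per_second(8000, 16, 1) gives 16000, and compressed_bytes_per_second(16000, 20) gives 800, matching the 16,000 bytes per second and 0.8 KB per second stated in the text.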
II. Software implementation
The main difficulties in implementing the software are analyzed below.
1. Voice capture and playback
In this software, captured voice is converted directly into data held in memory rather than being recorded to a sound file, and playback likewise plays voice data directly rather than playing a sound file. The benefit, mentioned above, is that time-consuming hard disk reads and writes are avoided, which improves the real-time performance of the call.
The easy-to-use high-level multimedia audio functions provided by the programming environment are not adequate for this; only the low-level audio functions will do. The names of these functions and structures all begin with the prefix "wave". The recording and playback processes are analyzed briefly below. Since the two are basically similar, only recording is analyzed as an example; Figure 3 shows the recording process.
Preparing to record involves three main steps: opening the recording device to obtain a recording handle, specifying the recording format, and allocating several memory blocks. The size and number of the memory blocks are analyzed further below. When recording starts, all memory blocks are first handed to the recording device. The device writes voice data into the blocks in turn, and each time a block is full it sends the Windows message MM_WIM_DATA to the corresponding window to notify the program. The program then processes the data in that block, for example by writing it to a file; here, the processing is to compress the data and send it. The block is then emptied and returned to the recording device, which forms a recording loop. When recording ends, all memory blocks are released and the recording device is closed.
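The recording loop just described can be sketched without any sound hardware. The code below is a simulation only: SimulatedRecorder is a hypothetical stand-in for the recording device, add_block plays the role of handing a block to the device, and fill_next plays the role of the device filling a block and notifying the program. It is not the Windows wave API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

using Block = std::vector<unsigned char>;

// Stand-in for the recording device: it can only fill blocks it has been given.
class SimulatedRecorder {
public:
    // Hand an empty block to the device.
    void add_block(Block* block) { queue_.push_back(block); }

    // Simulate the device filling the oldest queued block. Returns the filled
    // block, or nullptr if the device has no block left to record into.
    Block* fill_next(unsigned char sample_value) {
        if (queue_.empty()) return nullptr;
        Block* block = queue_.front();
        queue_.pop_front();
        std::fill(block->begin(), block->end(), sample_value);
        return block;
    }

    std::size_t queued() const { return queue_.size(); }

private:
    std::deque<Block*> queue_;
};

// Runs `rounds` iterations of the loop: the device fills a block, the program
// processes it (here it only counts the bytes) and returns it to the device.
std::size_t run_recording_loop(std::size_t block_count, std::size_t block_bytes,
                               std::size_t rounds) {
    assert(block_count > 0 && block_bytes > 0);
    std::vector<Block> blocks(block_count, Block(block_bytes));
    SimulatedRecorder recorder;
    for (Block& b : blocks) recorder.add_block(&b);  // start: give every block to the device

    std::size_t bytes_processed = 0;
    for (std::size_t i = 0; i < rounds; ++i) {
        Block* filled = recorder.fill_next(static_cast<unsigned char>(i));
        if (filled == nullptr) break;                // the device ran out of blocks
        bytes_processed += filled->size();           // compressing and sending would go here
        recorder.add_block(filled);                  // return the block for reuse
    }
    assert(recorder.queued() == block_count);        // every block is back with the device
    return bytes_processed;
}
```

For example, run_recording_loop(5, 1600, 10) cycles five 1600-byte blocks through ten rounds and returns 16000, the number of bytes processed. Because each block is returned as soon as it has been processed, the simulated device never runs out of blocks.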
Figure 3. Recording process
Taking recording as an example, the key functions and their calling order are as follows:

    WAVEFORMATEX waveformat;
    waveformat.wFormatTag = WAVE_FORMAT_PCM;
    waveformat.nChannels = 1;
    waveformat.nSamplesPerSec = 8000;
    waveformat.nAvgBytesPerSec = 16000;
    waveformat.nBlockAlign = 2;
    waveformat.cbSize = 0;
    waveformat.wBitsPerSample = 16;  // specify the recording format
    int res = waveInOpen(&m_hWaveIn, WAVE_MAPPER, &waveformat, (DWORD)m_hWnd, 0L, CALLBACK_WINDOW);  // open the recording device
    waveInPrepareHeader(m_hWaveIn, m_pWaveHdr[i], sizeof(WAVEHDR));  // prepare a memory block for recording
    waveInAddBuffer(m_hWaveIn, m_pWaveHdr[i], sizeof(WAVEHDR));  // hand the memory block to the device
    waveInStart(m_hWaveIn);  // start recording
    waveInStop(m_hWaveIn);   // stop recording
    waveInReset(m_hWaveIn);  // clear the memory blocks
    waveInClose(m_hWaveIn);  // close the recording device

In the program's handling of recording and playback, two points deserve special attention.
The first is the size and number of the allocated memory blocks. The block size directly affects the continuity and delay of the voice: the larger the block, the better the continuity but the longer the delay; the smaller the block, the shorter the delay but the worse the continuity. A compromise must be found by weighing the two, which requires repeated experiments. In the author's experiments, a block holding 0.1 seconds of audio worked well; the size in bytes depends on the speech format used. The number of blocks depends on the block size and on how long the program takes to process the recorded data in each block. The recording device must always have at least one block to record into, which means recorded blocks must be returned in time so that the loop runs smoothly.
It follows that if the blocks are large and the processing time for each block is short, fewer blocks need to be allocated; conversely, more blocks are needed.
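The relationship between block duration, block size, and block count can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions: block_bytes simply multiplies the byte rate by the block duration, and min_block_count uses a rough rule of thumb (enough blocks to cover the processing time, plus the one being processed) that is this example's own assumption, not a formula given by the author.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Size in bytes of one memory block holding `block_seconds` of audio.
std::size_t block_bytes(int bytes_per_second, double block_seconds) {
    assert(bytes_per_second > 0 && block_seconds > 0.0);
    return static_cast<std::size_t>(std::lround(bytes_per_second * block_seconds));
}

// Rough lower bound on the number of blocks (an assumed rule of thumb):
// while one block is being processed for `processing_seconds`, the device
// must hold enough other blocks to keep recording without a gap.
std::size_t min_block_count(double block_seconds, double processing_seconds) {
    assert(block_seconds > 0.0 && processing_seconds >= 0.0);
    return static_cast<std::size_t>(std::ceil(processing_seconds / block_seconds)) + 1;
}
```

For the 8 kHz, 16-bit mono format at 16,000 bytes per second, block_bytes(16000, 0.1) gives 1600 bytes for a 0.1 second block. Under the assumed rule, processing that takes 0.05 seconds per block needs at least 2 blocks, while processing that takes 0.25 seconds needs at least 4, which illustrates why shorter processing allows fewer blocks.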
Of course, a generous number of blocks can be allocated, but that wastes resources and reduces the program's efficiency. In the author's experiments, it was sufficient for all the blocks together to hold 0.5 seconds of recording; that is, if each block holds 0.1 seconds, 5 blocks can be allocated.
The second point is responding to the Windows messages of the audio device. The device sends a series of messages, such as MM_WIM_OPEN when the recording device is opened and MM_WIM_CLOSE when it is closed. By responding to them, the program can handle events such as the device opening or closing, recording or playback starting or stopping, a block being filled during recording, and a block finishing playback. Note that these messages do not appear in the message list of Visual C++'s ClassWizard, so the message-map macros and handler code must be written by hand.
2. Network transmission of voice data
TCP/IP is the most common protocol suite on networks, and this software uses it to transmit voice data. The CSocket class is provided by MFC in Visual C++ (version 4.0 and later); it wraps the original Windows Sockets API and is more convenient to use than the API itself. Setting up the network connection involves three main steps. The first is to add Windows Sockets support to the program. The second is to derive two new classes, CServerSocket and CMsgSocket, from CSocket: CServerSocket accepts connection requests, and CMsgSocket transfers data. The third is to establish the server side and the client side. The order of the work to be done on the server and the client is described below; for the specific function parameters, consult the online documentation.
The server first constructs a CServerSocket object to accept connection requests and calls its Listen member function to wait for a client to issue a Connect request. When the request arrives, the OnAccept response function of the CServerSocket class is invoked. The server then constructs a CMsgSocket object (to transfer data) and calls the Accept member function of the CServerSocket object to accept the connection; if this function returns TRUE, the connection has succeeded. The client only needs to construct a CMsgSocket object and call its Connect member function to request a connection. Once the connection has been established in this order, both the server and the client call the Send function of their CMsgSocket object to send data. When data arrives, the OnReceive response function of the CMsgSocket class is invoked, and the Receive function is then called to read the data. In this way the server and the client can exchange data.
Pay attention to how network events are handled in the program, such as data arriving, a client requesting a connection, a connection being established, or the other party disconnecting. Unlike ordinary message handlers, these are built into the member functions of the CSocket class, and no message-map macros are needed. Overriding the OnAccept member function in CServerSocket handles connection requests; overriding OnReceive in CMsgSocket handles incoming data; and overriding OnClose handles the other party disconnecting.
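The listen, connect, accept, send, and receive order described above is the same in any TCP socket API. The sketch below shows that order using plain POSIX sockets rather than CSocket, purely so that it can be compiled outside MFC; it is a substitute for illustration, not the author's code. The function name loopback_roundtrip is hypothetical, and both ends run in one thread on the loopback interface, which is only safe for small messages.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <string>

// Sends `msg` from a client socket to a server socket over 127.0.0.1 and
// returns what the server side received, or "" if a socket call failed.
std::string loopback_roundtrip(const std::string& msg) {
    assert(!msg.empty() && msg.size() <= 4096);  // small enough to fit in the socket buffer

    // Server: create a socket, bind it to a free loopback port, and listen.
    int listener = ::socket(AF_INET, SOCK_STREAM, 0);
    if (listener < 0) return "";
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the system choose the port
    sockaddr* sa = reinterpret_cast<sockaddr*>(&addr);
    socklen_t len = sizeof(addr);
    if (::bind(listener, sa, sizeof(addr)) != 0 || ::listen(listener, 1) != 0 ||
        ::getsockname(listener, sa, &len) != 0) {
        ::close(listener);
        return "";
    }

    // Client: create a socket and connect to the listening port.
    int client = ::socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 || ::connect(client, sa, sizeof(addr)) != 0) {
        if (client >= 0) ::close(client);
        ::close(listener);
        return "";
    }

    // Server: accept the pending connection; `peer` is the data socket.
    int peer = ::accept(listener, nullptr, nullptr);
    if (peer < 0) {
        ::close(client);
        ::close(listener);
        return "";
    }

    // Client: send the data, then close to signal the end of the stream.
    std::size_t sent = 0;
    while (sent < msg.size()) {
        ssize_t n = ::send(client, msg.data() + sent, msg.size() - sent, 0);
        if (n <= 0) break;
        sent += static_cast<std::size_t>(n);
    }
    ::close(client);

    // Server: receive until the client has closed the connection.
    std::string received;
    char buf[512];
    ssize_t n;
    while ((n = ::recv(peer, buf, sizeof(buf), 0)) > 0) {
        received.append(buf, static_cast<std::size_t>(n));
    }
    ::close(peer);
    ::close(listener);
    return received;
}
```

Calling loopback_roundtrip("voice frame") returns the same string when the loopback interface is available. In the MFC program the blocking accept and receive calls are replaced by the OnAccept and OnReceive notifications, so the user interface is not stalled while waiting.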
In addition, the CSocket member function GetPeerName returns the other party's IP address, which the program can save so that the user does not have to enter it again for later connections.
3. Support for half-duplex sound cards
Sound cards are either half-duplex or full-duplex. A full-duplex card can record and play at the same time, whereas a half-duplex card can do only one at a time, either recording or playing. For an IP phone, a full-duplex card is of course needed to achieve the effect of a telephone. However, since most users' computers currently have half-duplex cards, the software also supports them. It automatically detects the computer's sound card and uses a different working mode for each type. With a full-duplex card, the user can listen and speak at the same time. With a half-duplex card, the user switches between listening and speaking. Specifically, when one party has a half-duplex card, the software first informs the other party over the network so that they are prepared. Then, during the call, whenever that party switches mode, the other party is notified and automatically switches to the corresponding mode: when one party speaks, the other automatically listens, and vice versa, so the two sides never conflict. The programming is fairly involved but not difficult once the idea is clear, so the details are not described here.
III. Software interface and usage
The software interface is shown in Figure 4. Enter the IP address of the computer you want to connect to in the IP address bar at the top of the interface and click the icon to its right to connect. A message in the text box at the top of the interface indicates whether the connection succeeded or failed.
The user can also click the disconnect button next to the connect button to close the network connection to any connected party. The drop-down box of online members lists the names of all users currently connected to the user. Multi-line text can be entered in the text box at the bottom of the interface. Select the recipient in the drop-down box at the bottom of the interface, which lists everyone connected to the user, and click the button on the left to send the text to the selected recipient. Received text and its sender are displayed in the text box at the top of the interface.
Clicking the phone button, much like dialing a telephone, lets the user choose someone to call. The other party's computer rings, and the other party can choose whether to accept the call, much like picking up the handset. If the other party accepts, the two sides can talk using their microphones and headphones. If the sound card is full-duplex, that is, it can record and play at the same time, both sides can talk as on a telephone. If the sound card is not full-duplex, and in fact most sound cards on the market today are not, the user can only listen or speak at any one moment. With this in mind, the software supports half-duplex cards: it automatically identifies whether the card is half-duplex or full-duplex, and on a half-duplex card the microphone icon (speak) and the speaker icon (listen) are used to switch between the two functions. A full-duplex card is of course best, since it allows speaking and listening at the same time and gives a true telephone effect. When the call is over, either side can click the stop icon next to the phone icon to end the conversation, much like hanging up. The user can choose between compressed and uncompressed transmission according to the quality of the network.
On a local area network, the uncompressed mode gives good results.
Figure 4. Software interface
Practice has shown that the software runs stably and reliably, and that sound quality and delay on a local area network are basically comparable to those of a telephone.
IV. Shortcomings of the software and further work
The software meets the basic requirements of an IP phone, but because of the author's limited time and resources, it still has some shortcomings and needs further improvement.