Computer technology keeps moving toward richer functionality and greater convenience, and user needs remain the driving force behind technological progress. The emergence of ASR and TTS brings people and computers closer together and makes the human-machine interface more natural. Because of technical limitations (the recognition rate is still not high) and people's habits, ASR remains some distance from real popularity. In the communications field, however, the wide application of CTI (Computer Telephony Integration) keeps bringing computer technology into communication platforms, and there speech technology is developing and spreading rapidly. VoiceXML is a good example: its applications are built on ASR and TTS.
ASR stands for Automatic Speech Recognition, a technology that converts human speech into text. Speech recognition is an interdisciplinary field, closely linked to acoustics, phonetics, linguistics, digital signal processing, information theory, and computer science. Because speech signals are so diverse and complex, current speech recognition systems perform satisfactorily only under certain restrictions, or only in certain specific settings. The performance of a speech recognition system depends largely on four factors: 1. the size of the recognition vocabulary and the complexity of the speech; 2. the quality of the speech signal; 3. whether there is a single speaker or multiple speakers; 4. the hardware.
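The recognition rate mentioned above is commonly reported as word accuracy: one minus the number of word errors (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The following is a minimal sketch of that calculation, not part of any ASR product; the function names `word_errors` and `word_accuracy` are hypothetical and chosen only for illustration.

```python
def word_errors(reference, hypothesis):
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn the reference transcript into the recognizer output."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # reference word deleted
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (word errors / reference length)."""
    n = len(reference.split())
    if n == 0:
        raise ValueError("reference transcript must not be empty")
    return 1.0 - word_errors(reference, hypothesis) / n


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_accuracy("open the file menu", "open a file menu"))
```

Because insertions count as errors, this measure can fall below zero when the recognizer outputs many extra words, so it is best read alongside the raw error count.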
TTS stands for Text To Speech, also known as computer speech synthesis. Its process is the reverse of ASR: it converts arbitrary text held in the computer into natural, fluent speech output. A speech synthesis system is generally considered to have three main components: a text analysis module, a prosody generation module, and an acoustic module. TTS technology has now reached the point of commercial use.
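The three-module structure can be sketched as a small pipeline. The code below is a toy illustration under loud assumptions, not a real synthesizer: the text analysis only lowercases words and spells out single digits, the prosody rules (falling pitch, a longer final word, a raised pitch for questions) are invented for the example, and the acoustic stage emits one sine tone per word instead of speech. All names and constants are hypothetical.

```python
import math
import re

SAMPLE_RATE = 8000  # samples per second; an arbitrary choice for this sketch

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def analyze_text(text):
    """Text analysis: normalize the input and split it into word tokens."""
    words = [DIGITS.get(tok, tok.lower())
             for tok in re.findall(r"[A-Za-z]+|\d", text)]
    is_question = text.strip().endswith("?")
    return words, is_question


def generate_prosody(words, is_question):
    """Prosody generation: assign each word a pitch (Hz) and duration (s)."""
    plan = []
    last = len(words) - 1
    for i, word in enumerate(words):
        # Invented rule: pitch falls linearly from 180 Hz to 140 Hz.
        pitch = 180.0 - 40.0 * (i / last if last > 0 else 0.0)
        duration = 0.06 * len(word)
        if i == last:
            duration *= 1.5      # lengthen the final word
            if is_question:
                pitch = 220.0    # raise the pitch at the end of a question
        plan.append((word, pitch, duration))
    return plan


def synthesize(plan):
    """Acoustic module stand-in: one sine tone per word, samples in [-1, 1]."""
    samples = []
    for _word, pitch, duration in plan:
        count = int(round(duration * SAMPLE_RATE))
        for k in range(count):
            samples.append(math.sin(2 * math.pi * pitch * k / SAMPLE_RATE))
    return samples


def text_to_speech(text):
    """Run the three stages in order: analysis, prosody, acoustics."""
    words, is_question = analyze_text(text)
    return synthesize(generate_prosody(words, is_question))


if __name__ == "__main__":
    waveform = text_to_speech("Call 9 now")
    print(len(waveform), "samples")
```

Keeping the stages as separate functions mirrors the module boundaries described above: each stage could be replaced by a more realistic one without touching the other two.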
Introduction to SAPI
Many mature ASR and TTS products are now on the market, and most of them support secondary development, such as Microsoft's Speech Application SDK (SASDK) and IBM DUTTY. They can recognize (and synthesize) English, Japanese, and Chinese, and DUTTY can even recognize regional dialects such as Cantonese, the dialect of Guangdong. Below we take SAPI as an example and briefly introduce an ASR and TTS development engine. Microsoft's SAPI is part of Windows and is integrated into it. Compared with other engines, its recognition rate is relatively high: after adaptive training, it can exceed 90%. Its development kit is available free of charge and its documentation is very complete, which makes secondary development very convenient. Because SAPI is developed as a separate component of Windows, its versions are also updated relatively quickly.
The latest SAPI 3.1 provides a COM-based high-level programming interface, and applications communicate with the speech engine through these interfaces. SAPI integrates ASR and TTS functions in the same speech engine: TTS synthesizes text and files into speech, and ASR converts speech into readable text or files.