ASR and TTS

xiaoxiao 2021-03-06

Computer technology keeps moving toward greater functionality and convenience, and user needs remain the driving force behind technical progress. ASR and TTS bring people and computers closer together and make the human-machine interface more natural. ASR is still some way from truly widespread use, partly because of technical limits (recognition rates are still not high) and partly because of people's habits. In communications, however, the wide adoption of CTI (Computer Telephony Integration) keeps bringing computer technology into communication platforms, and this field is currently developing and spreading rapidly. VoiceXML is a good example: its applications are built on ASR and TTS.

ASR, short for Automatic Speech Recognition, is the technology that converts human speech into text. Speech recognition is a multidisciplinary field closely tied to acoustics, phonetics, linguistics, digital signal processing, information theory, and computer science. Because speech signals are diverse and complex, current speech recognition systems perform satisfactorily only under certain constraints or in specific application scenarios. The performance of a speech recognition system depends mainly on four factors: 1. the size of the recognition vocabulary and the complexity of the speech; 2. the quality of the speech signal; 3. whether the system serves a single speaker or multiple speakers; 4. the hardware.
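Recognition rate is the main limitation on ASR mentioned above. As an illustration not taken from the original article, recognition accuracy is commonly reported as word error rate (WER). WER is the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into a reference transcript, divided by the number of reference words. The function below is a minimal sketch of that calculation using a word-level Levenshtein edit distance. The name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Sketch of WER: (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein edit distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the previous DP row: edit distances from ref[:i-1] to each prefix hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion: reference word missing
                          curr[j - 1] + 1,     # insertion: extra hypothesis word
                          prev[j - 1] + cost)  # substitution (or exact match)
        prev = curr
    return prev[len(hyp)] / len(ref)
```

For example, `word_error_rate("call the front desk", "call a front desk")` involves one substitution over four reference words, giving 0.25. Because insertions are counted, WER can exceed 1.0 for very noisy output.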

TTS, short for Text to Speech, is also known as computer speech synthesis. Its process is the reverse of ASR: it converts arbitrary text in the computer into natural, fluent spoken output. A speech synthesis system is generally considered to have three main components: a text analysis module, a prosody generation module, and an acoustic module. TTS technology has now reached the point of commercialization. There are already many mature ASR and TTS products on the market, and most of them support secondary development (building custom applications on top of the product), such as Microsoft's Speech Application SDK (SASDK) and IBM DUTTY. They can recognize (or generate) English, Japanese, and Chinese, and DUTTY can even recognize certain regional dialects, such as Cantonese, spoken in Guangdong.

Basic structure of TTS

(1) Linguistic processing

Linguistic processing plays an important role in a text-to-speech system. It mainly simulates how humans understand natural language, through text normalization, word segmentation, grammatical analysis, and semantic analysis. This lets the computer fully understand the input text and supply the pronunciation cues needed by the other two parts.
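To make the text-normalization step concrete, here is a minimal sketch for English input. It assumes a tiny hypothetical abbreviation table (`ABBREVIATIONS`), expands those abbreviations, spells out integers up to 999, and splits the result into lowercase word tokens. It is not how any particular product works. Chinese input would additionally need a word segmenter, since Chinese text is written without spaces between words.

```python
import re

# Hypothetical abbreviation table; a real system would use a much larger lexicon.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "No.": "Number"}

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 in English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    hundreds, rest = divmod(n, 100)
    head = ONES[hundreds] + " hundred"
    return head if rest == 0 else head + " " + number_to_words(rest)

def normalize(text: str) -> list[str]:
    """Expand abbreviations and short digit strings, then split into word tokens."""
    for abbr, full in ABBREVIATIONS.items():
        # \b keeps us from matching inside a longer word.
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    # Spell out standalone integers of 1-3 digits; longer numbers are left as-is here.
    text = re.sub(r"\b\d{1,3}\b", lambda m: number_to_words(int(m.group())), text)
    return re.findall(r"[A-Za-z0-9'-]+", text.lower())
```

For instance, `normalize("Dr. Li lives at 42 Park St.")` yields `["doctor", "li", "lives", "at", "forty-two", "park", "street"]`, a token sequence the later stages can attach pronunciations and prosody to.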

(2) Prosody processing

Prosody processing plans the segmental features of the synthetic speech, such as pitch, duration, and intensity. This lets the synthesized speech express the meaning correctly and sound more natural.
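The prosody stage can be illustrated with a toy rule-based planner. The sketch below assumes three simple, widely described tendencies of English intonation rather than any specific system's model:

- pitch gradually falls across a statement (declination);
- a yes/no question ends with a rise;
- the last unit of a phrase is lengthened.

The class `ProsodyTarget`, the function `plan_prosody`, and all numeric constants are hypothetical choices for illustration.

```python
from dataclasses import dataclass

@dataclass
class ProsodyTarget:
    unit: str           # the word or syllable being spoken
    pitch_hz: float     # target fundamental frequency (F0)
    duration_ms: float  # how long the unit lasts
    intensity: float    # relative loudness, 0.0..1.0

def plan_prosody(units: list[str], question: bool = False,
                 base_pitch: float = 200.0, base_ms: float = 250.0) -> list[ProsodyTarget]:
    """Toy rule-based prosody: pitch declines across a statement (or rises at the
    end of a question), and the final unit is lengthened."""
    n = len(units)
    targets = []
    for i, unit in enumerate(units):
        pos = i / (n - 1) if n > 1 else 0.0          # 0.0 at the first unit, 1.0 at the last
        pitch = base_pitch * (1.0 - 0.2 * pos)       # declination: drop by up to 20%
        if question and i == n - 1:
            pitch = base_pitch * 1.3                 # final rise for a yes/no question
        duration = base_ms * (1.5 if i == n - 1 else 1.0)  # phrase-final lengthening
        intensity = 1.0 - 0.3 * pos                  # loudness tapers off as well
        targets.append(ProsodyTarget(unit, pitch, duration, intensity))
    return targets
```

For `plan_prosody(["hello", "there", "friend"])`, the pitch targets come out as roughly 200, 180, and 160 Hz, and only "friend" receives the longer 375 ms duration. A production system would predict these values from the linguistic analysis instead of from position alone.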

(3) Acoustic processing

Acoustic processing produces the synthesized speech output according to the requirements derived from the results of the first two parts.
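Real acoustic modules generate speech either by concatenating recorded units or by parametric (for example, formant) synthesis. As a deliberately simplified stand-in, the sketch below renders each (pitch, duration, intensity) target as a sine tone. It then wraps the samples in a WAV container using Python's standard `wave` module. The point is to show how prosodic targets turn into audio samples; the result is a buzz, not intelligible speech. The 16 kHz sample rate and the function names `render_tones` and `to_wav` are assumptions.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 16000  # samples per second; an assumed value, common for speech audio

def render_tones(targets: list[tuple[float, float, float]]) -> bytes:
    """Render (pitch_hz, duration_ms, intensity) targets as 16-bit mono PCM sine tones.
    A stand-in for a real acoustic model: it realizes pitch, duration, and loudness
    (intensity assumed in 0.0..1.0) but does not produce actual speech sounds."""
    frames = bytearray()
    for pitch_hz, duration_ms, intensity in targets:
        n_samples = int(SAMPLE_RATE * duration_ms / 1000)
        for k in range(n_samples):
            value = intensity * math.sin(2 * math.pi * pitch_hz * k / SAMPLE_RATE)
            # Scale to 80% of the signed 16-bit range and pack little-endian.
            frames += struct.pack("<h", int(value * 32767 * 0.8))
    return bytes(frames)

def to_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit mono PCM samples in an in-memory WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes = 16-bit samples
        w.setframerate(SAMPLE_RATE)
        w.writeframes(pcm)
    return buf.getvalue()
```

Writing `to_wav(render_tones([(200.0, 250.0, 1.0), (160.0, 375.0, 0.7)]))` to a file with a `.wav` extension should yield a playable clip of two falling tones. Each tone restarts at phase zero, so clicks at the boundaries are expected in this sketch.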

Further development directions for TTS

TTS will develop in the following directions.

Further improve the quality of synthesized speech so that it becomes more fluent and natural.

Further study voice conversion so that TTS can produce speech in a variety of voices (different genders, different ages, and so on).

Provide core TTS technologies and solutions for various industries, especially CTI and embedded systems.

Port TTS technology to other operating systems such as Linux and UNIX, or to embedded operating systems such as Palm OS and Hopen, and consider implementing TTS in hardware.

[Figure: TTS processing flow]

Please cite the original address when reposting: https://www.9cbs.com/read-72843.html
