Speech recognition is brewing a second wave

zhaozj, 2021-02-16

Proudly is a company that provides voice technology. It has modified its own telephone switchboard so that, after the usual voice prompts "please dial the extension number" and "dial 0 for an operator", callers also hear "Who would you like to reach?" and can simply say a name. Systems that let users talk directly to a machine in this way are already common in the United States. At public telephone booths on US streets that are marked as equipped with an AT&T voice recognition system, a user only needs to say "Connect operator, please"; the system's keyword-spotting technology picks out "operator" from the sentence and connects the call directly to the operator. The system's recognition rate exceeds 99%.

Such services are obviously more convenient than traditional call centers, where callers work through the menus level by level with the keypad. Voice input has even greater potential in mobile computing environments, where a keyboard and mouse are impractical. Even in the office, speech recognition can spare users who are unwilling or unable to use a keyboard and mouse a great deal of repetitive strain on wrists and fingers. For most people, however, the technology is still a novelty. Just consider how many people actually use the voice dialing function on their mobile phones.

The technology gap

Research on speech recognition can be traced back to the Audrey system built at AT&T Bell Laboratories in the 1950s, the first speech recognition system able to recognize the ten English digits. The major laboratory breakthroughs came in the late 1980s: some small-vocabulary systems reached high recognition rates, and researchers finally overcame the three major obstacles of large vocabulary, continuous speech, and speaker independence in the laboratory, integrating all three features into a single system. The main reason for these breakthroughs was progress in semiconductor, software, and storage technology.

The first wave of voice technology began in the early 1990s, when many well-known large companies such as IBM, Apple, AT&T, and NTT invested in research on practical speech recognition systems. Speech recognition has a clear evaluation measure, recognition accuracy, and this figure improved continuously in laboratory research through the mid and late 1990s. In 1997 some declared that the "speech era" was coming, and vendors were full of confidence, hoping that machine recognition could match human ability. Speech recognition was so fashionable that people came to believe voice technology would soon be everywhere.
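The article does not say how recognition accuracy is scored. One common way to score it is word error rate (WER): the edit distance between the reference transcript and the recognizer's output, divided by the length of the reference. The sketch below assumes whitespace-separated words; the function name is hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = cur[j - 1] + 1   # extra word in output
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


# One wrong word out of three in the reference.
wer = word_error_rate("connect operator please", "connect operate please")
accuracy = 1.0 - wer
```

Under this measure, a "recognition rate" is simply one minus the error rate, so the two kinds of figure can be compared directly.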

That is not what happened. In practical use, systems that had "succeeded" in the laboratory fell far short in robustness, flexibility, and adaptability, and struggled to satisfy users. Deng Yongqiang, vice president of Proud of the VIA, said: "When inflated market expectations meet the actual level of the technology and the real state of its applications, a bubble is inevitable."

With the technology immature and market acceptance limited, the speech recognition market can hardly be described as "hot", and few mature applications can be found on it. As a result, everyone has turned to the idea of "semi-finished technology", that is, "creating within the existing level of the technology". In the Chinese market, for example, manufacturers are not chasing the most complete application, the "dictation machine" that combines speaker-independent, continuous, large-vocabulary recognition. Instead they apply the parts of the technology that are already mature to actual products. One example is command-style recognition with small and medium vocabularies, used in call centers, voice dialing, and embedded command control on mobile devices, which reaches a fairly high recognition rate for Mandarin.

The "new three difficulties" of speech recognition

Speech recognition systems are classified along three dimensions: vocabulary size, the demands placed on speaking style (isolated-word versus continuous speech recognition), and the degree of dependence on the speaker (speaker-dependent versus speaker-independent recognition). The history of the technology runs from simple to complex, conquering these "old three difficulties" one by one.
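The three dimensions above can be written down as a small data structure. This is only an illustrative sketch: the class and field names are hypothetical, and the threshold of 1000 words for "large vocabulary" is an assumption made for the example, not a figure from the article.

```python
from dataclasses import dataclass
from enum import Enum


class SpeakingStyle(Enum):
    ISOLATED_WORD = "isolated word"
    CONTINUOUS = "continuous"


class SpeakerDependence(Enum):
    SPEAKER_DEPENDENT = "speaker dependent"
    SPEAKER_INDEPENDENT = "speaker independent"


@dataclass(frozen=True)
class RecognizerProfile:
    vocabulary_size: int
    style: SpeakingStyle
    dependence: SpeakerDependence

    def hard_axes(self, large_vocab_threshold: int = 1000) -> list:
        """List which of the three 'old difficulties' this system takes on.

        large_vocab_threshold is an assumed cut-off used only in this sketch.
        """
        axes = []
        if self.vocabulary_size >= large_vocab_threshold:
            axes.append("large vocabulary")
        if self.style is SpeakingStyle.CONTINUOUS:
            axes.append("continuous speech")
        if self.dependence is SpeakerDependence.SPEAKER_INDEPENDENT:
            axes.append("speaker independent")
        return axes


# A personal voice dialer versus a dictation machine (sizes are made up).
voice_dial = RecognizerProfile(50, SpeakingStyle.ISOLATED_WORD,
                               SpeakerDependence.SPEAKER_DEPENDENT)
dictation = RecognizerProfile(60000, SpeakingStyle.CONTINUOUS,
                              SpeakerDependence.SPEAKER_INDEPENDENT)
```

In these terms, the simplest systems take on none of the three axes, while the dictation machine takes on all of them at once.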

The simplest case, small-vocabulary, isolated-word, speaker-dependent recognition, was already very mature in the 1970s. Despite remaining difficulties, the highest standard of the "old three difficulties", the dictation machine, can now be met in the laboratory. Microsoft says that the speech dictation feature in its Office software reaches a first-pass recognition rate of 93% on input in a standard Beijing accent, and 96% after tuning, and that the figure keeps rising as new techniques are added. The value of a technology, however, lies in serving applications. Despite such high recognition rates in the laboratory, a set of "new three difficulties" has emerged in real use and become the focus of research.

First, dialects and accents lower the recognition rate, and for Chinese, with its eight major dialect groups, application is all the harder. Zheng Fang, an associate professor at a speech technology center of Tsinghua University and the chairman and president of a Beijing-based company, is working on this problem. This year, at the annual speech technology workshop held by Johns Hopkins University, Dr. Zheng submitted a proposal on dialect and accent issues under the title "Dialectal Chinese". On the strength of the topic's importance, it stood out among more than a dozen proposals from around the world and became one of the three or four finally selected. Dr. Zheng said that "Mandarin Influenced by Native Dialect" may well become a large project requiring four or five years.

The second of the "new three difficulties" is background noise. The loud noise of crowds in public places goes without saying, but even in the laboratory, sounds such as tapping or handling the microphone become background noise. Noise corrupts the spectrum of the original speech, or masks part or all of it, and the recognition rate falls as a result. In practical applications noise cannot be avoided. The problem to solve is how to separate the original speech from the background noise, so that the recognition system adapts well to its environment.
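The article gives no method for handling noise. A usual first step, sketched here under the assumption that speech and noise are available as separate lists of samples, is to quantify how much noise is present as a signal-to-noise ratio (SNR) in decibels, and to rescale the noise so that a recognizer can be tested at a chosen SNR. The function names are hypothetical, and this sketch measures noise rather than removing it.

```python
import math


def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def scale_noise_to_snr(speech, noise, target_db):
    """Rescale noise so that snr_db(speech, scaled_noise) equals target_db."""
    wanted_noise_power = power(speech) / (10.0 ** (target_db / 10.0))
    gain = math.sqrt(wanted_noise_power / power(noise))
    return [gain * n for n in noise]


def mix(speech, noise):
    """Add noise to speech sample by sample (equal lengths assumed)."""
    return [s + n for s, n in zip(speech, noise)]


# Toy signals: the noise has one quarter of the speech power.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, -0.5, 0.5, 0.5]
quiet_noise = scale_noise_to_snr(speech, noise, 20.0)
noisy_input = mix(speech, quiet_noise)
```

Running the same test utterances through a recognizer at several target SNR values is one way to see how quickly its recognition rate degrades outside the laboratory.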

The third is the problem of spoken language, which involves both natural language understanding and acoustics. The ultimate goal of speech recognition is to make a conversation between person and machine as natural as a conversation between people. But once users speak to the system the way they speak to each other, the loose grammar of spoken language makes semantic analysis and understanding difficult. Moreover, in everyday speech, even pronunciation that the human brain judges to be perfectly standard has, from an acoustic point of view, already shifted by the time it reaches the recognizer, so casual pronunciation is a serious problem.

The "new three difficulties" are the three factors with the greatest impact on the recognition rate in real applications. In addition, because speech reaches the system over a transmission path, the recognizer must adapt to different types of channel. Speech recognition technology itself still has a great deal of room to develop.

How far is the second wave?

Years of research have brought the core technology of domestic Chinese speech recognition close to the international level. Since last year, applications of Chinese speech recognition have begun to appear in large numbers, and the process of industrialization has started. Deng Yongqiang compares the current state of the speech recognition industry to the Internet in 1995: "The tree is growing and has put out green leaves, but it is still waiting to flower and bear fruit." He believes the Chinese speech recognition industry has passed its turning point of 1998 and 1999, the step from 0 to 1, and will reach a new turning point next year that forms a new peak of development. So can Chinese bring speech recognition its second wave?

A new surge has to be built on a market that has already taken shape. In March of this year, Harris Interactive was commissioned to survey ordinary US citizens. The survey found that voice technology is widely accepted and used, that users rate the voice technology they have used highly, and that speech has more advantages than other modes of interaction. Voice technology is evidently well accepted among ordinary Americans, and on that basis the US speech recognition market has gradually grown large and competitive. Domestic applications started late, so what feels new to users in China has often existed abroad for several years. IBM entered the Chinese speech recognition market as early as 1997 and spent heavily to cultivate it, letting people know what voice technology is. Perhaps for this reason, domestic manufacturers do not resent powerful international companies when it comes to promoting the market. Dr. Xu Bo, president of Beijing Zhongke Monoscience Technology Co., Ltd., believes that "we are not in competition with giants such as IBM and Microsoft. If they make breakthroughs in the technology, or turn it into products, or embed speech recognition in their own strong products, that is not necessarily a bad thing. It means more people will accept speech recognition technology, and the market will be larger."

Dr. Zheng Fang said: "The key issues are how to apply existing technology in practice, how to get more feedback from the market to improve the technology, and then how to work new technology into products and keep finding new points where the two meet." How to combine research with industry is a perennial topic. Speech recognition is taking shape as an industry, and whether it reaches a new peak next year depends on how domestic manufacturers apply it. Industry insiders believe domestic manufacturers must work together to "prop up" China's speech recognition market; no single company can do it alone.

After the peak

If this peak in speech recognition does form, its main feature will be breakthroughs in different applications, followed by gradual, widespread adoption. How to sustain development after the peak is a question every manufacturer must consider. The root reason voice technology stalled in the 1990s after developing to a certain point was that the technical level of the time did not match what was expected of speech recognition. Today, on the one hand, the technology is ready for application in certain areas. For example, the Mandarin continuous-speech, speaker-independent dictation system of the National Key Laboratory of Pattern Recognition at the Institute of Automation, Chinese Academy of Sciences, keeps its error rate within 10%, which represents the world's leading level. Owning core technology of this kind is the source of confidence for steady development in China.

On the other hand, domestic manufacturers focus on applying the technology at its existing level to actual products, rather than waiting for every aspect of it to mature before going to market. The automatic switchboard with dial-by-name, for example, is built on speech recognition. Although its input is continuous speech, it does not attempt to recognize and understand the whole sentence. Instead it uses "keyword detection" to match only the parts of the continuous input that are of interest, and so achieves its recognition goal. This approach, which draws on the experience of foreign companies and alternates between phases of research and development and phases of application, keeps market expectations of the technology from running too high and suppresses bubbles.
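The idea of keyword detection can be sketched at the level of text. Real keyword spotting works on the audio signal itself; the sketch below assumes a word transcript is already available and only shows the routing logic, that is, ignoring everything in the sentence except the words of interest. The function names, the directory, and the extension numbers are all hypothetical.

```python
def spot_keywords(transcript, keywords):
    """Return the keywords found in the transcript, in spoken order.

    All other words are ignored; no attempt is made to parse the sentence.
    """
    wanted = {k.lower() for k in keywords}
    words = (token.strip(".,!?").lower() for token in transcript.split())
    return [w for w in words if w in wanted]


def route_call(transcript, directory):
    """Map the first spotted keyword to an extension, or None if none found."""
    normalized = {name.lower(): ext for name, ext in directory.items()}
    hits = spot_keywords(transcript, normalized)
    return normalized[hits[0]] if hits else None


# Hypothetical switchboard directory: keyword -> extension.
directory = {"operator": "0", "sales": "8001"}
extension = route_call("Connect operator, please", directory)
```

Because only the directory entries have to be recognized, the vocabulary stays small even though callers speak in full sentences, which is what lets such a system reach a high recognition rate with current technology.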

The forward-looking heavyweight IT companies have spotted the opportunity in China's speech recognition market and are already fully prepared for what follows the peak in market development, and in their moves people can also see what may lie behind a "second wave". IBM, which began working on speech recognition in the 1950s, has continuously released new versions of ViaVoice and is applying voice technology to PDAs and smart devices. It also provides a voice development SDK, hoping to build a full range of voice platforms. Whether or not the second wave of Chinese speech recognition arrives, speech recognition products will bring benefits to IBM in their own right. At the recent Beijing seminar of the "IBM Asia Pacific E-Commerce Solution Asia Tour China Station", IBM also demonstrated how to control home appliances with speech recognition. Microsoft has likewise integrated speech recognition into several of its leading products, including Office and Windows XP, and its latest speech recognition server software, Speech Server, is scheduled for release in the first half of 2004. The software lets users operate computers with voice commands, and companies can also use it to build services similar to automated telephone systems. For speech recognition technology, Microsoft is relying on Microsoft Research Asia, established in 1998, investing heavily in research and giving full backing to voice development tools that support the SALT specification (Speech Application Language Tags, which may become a rival to the earlier voice extensible markup language, VoiceXML).

Microsoft has certainly seen the rapid development of China's speech recognition market, but its sights are set beyond this peak in applications. Dr. Zhang Yisai, director of the speech group at Microsoft Research Asia, said: "Voice technology will be everywhere, and voice platforms will be in use everywhere. This technology is one of the focuses of Microsoft Research Asia. Microsoft is preparing voice technology for the longer term, five years, fifteen years, perhaps longer. Technical maturity is the determining factor." Microsoft's aim is for speech recognition to let users operate computers in the most natural way, which is the "natural computing" advocated by Bill Gates.

Please credit the original URL when reposting: https://www.9cbs.com/read-19750.html
