Using CTI to converse with the Web
There are now billions of telephone terminals in the world, and more than 200 million mobile phones have been sold worldwide. Conversation is how people habitually communicate, so they are more willing to accept and obtain information by listening and speaking.
The combination of mobile communication and data communication gives people access to the network from anywhere. But is WAP the only platform on which we can build mobile commerce? The development of CTI (computer telephony integration) technology offers another way.
Progress in CTI technology
After years of effort, text-to-speech (TTS) technology has made great progress. It now performs automatic linguistic analysis of the text and lets TTS users add more rhythm and intonation to the speech, so that the output of a TTS system sounds closer to a human voice.
In automatic speech recognition (ASR), systems have moved from matching patterns for entire words to matching at the phoneme level. Whole-word pattern matching depends to a greater or lesser degree on the speaker and supports only a small vocabulary. In current practice, the vocabulary of an ASR system is built from an alphabet of phonemes; note that this alphabet differs from language to language. With this approach, speech can be picked out and recognized across a wide range of voices. To recognize a word, the system picks each phoneme out of the input, stitches the phonemes together, and compares the result with the stored phoneme and word templates. Such templates can be produced very quickly by TTS, that is, generated from text input, and they are very economical to store. Many systems now even support "hot plugging" of recognition templates, for example adding an employee's name to the employee database without stopping the whole system.
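The phoneme-level matching and the "hot plug" vocabulary update described above can be sketched as follows. This is only an illustration under stated assumptions, not how any particular ASR engine works: the phoneme symbols, the `PhonemeLexicon` class, and the use of edit distance as the comparison are all hypothetical choices made for the example. Real systems compare acoustic models rather than symbol strings.

```python
from typing import Dict, List, Optional, Tuple


def edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


class PhonemeLexicon:
    """Hypothetical vocabulary of word templates stored as phoneme sequences."""

    def __init__(self) -> None:
        self._templates: Dict[str, List[str]] = {}

    def add_word(self, word: str, phonemes: List[str]) -> None:
        # Templates can be added while the system is running ("hot plug").
        self._templates[word] = phonemes

    def recognize(self, heard: List[str]) -> Optional[Tuple[str, int]]:
        # Return the closest template and its distance, or None if empty.
        if not self._templates:
            return None
        scored = ((w, edit_distance(heard, p)) for w, p in self._templates.items())
        return min(scored, key=lambda item: item[1])


lexicon = PhonemeLexicon()
lexicon.add_word("smith", ["s", "m", "ih", "th"])
lexicon.add_word("jones", ["jh", "ow", "n", "z"])
# A new employee name is added without restarting anything.
lexicon.add_word("chen", ["ch", "eh", "n"])
print(lexicon.recognize(["ch", "eh", "m"]))
```

Because each template is just a short phoneme list, the vocabulary is cheap to store and can grow at run time, which is the property the paragraph attributes to phoneme-based systems.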
Through these efforts, phoneme-level recognition has greatly reduced ASR's dependence on the speaker and made it easy to build large, easily modified recognition dictionaries for different application markets. Building on this success, developers are now adding more precise, complex, and intelligent high-level linguistic processing to ASR systems and taking the linguistic context into account. By identifying the type of grammatical structure and the relationships between neighboring words, and by determining the probability that certain words (within a word window) occur at a specific position in the conversation, the system becomes more accurate.
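The idea of using context probabilities to improve accuracy can be illustrated with a very small sketch. It assumes a window of one preceding word (a bigram model); the probability table, the acoustic scores, and the `rescore` function are invented for the example and are not taken from any real recognizer.

```python
from typing import Dict, List, Tuple

# Hypothetical bigram probabilities P(word | previous word).
BIGRAMS: Dict[Tuple[str, str], float] = {
    ("book", "flight"): 0.30,
    ("book", "fright"): 0.001,
    ("a", "flight"): 0.20,
}
DEFAULT_PROB = 0.0001  # fallback for word pairs not in the table


def rescore(previous_word: str, candidates: List[Tuple[str, float]]) -> str:
    """Pick the candidate with the best acoustic score times context probability."""
    def combined(item: Tuple[str, float]) -> float:
        word, acoustic = item
        return acoustic * BIGRAMS.get((previous_word, word), DEFAULT_PROB)
    return max(candidates, key=combined)[0]


# Acoustically "fright" scores slightly higher, but the context favors "flight".
print(rescore("book", [("fright", 0.55), ("flight", 0.45)]))
```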
Achievements made by VoiceXML
On May 23, the World Wide Web Consortium (W3C) accepted version 1.0 of the Voice Extensible Markup Language specification (VoiceXML 1.0) as a submission.
VoiceXML grew out of many years of research and development at AT&T, IBM, Lucent, and Motorola. Since VoiceXML 1.0 was released in March, membership of the VoiceXML Forum has grown to more than 150 companies.
VoiceXML 1.0 is based on XML, the W3C industry standard, and gives developers, service providers, and equipment manufacturers a high-level API for voice and telephony applications. Standardizing VoiceXML will simplify the creation of personalized interfaces on the Web, let people use voice and the telephone to reach information and services on websites, call center databases, and corporate intranets, and enable new voice access devices.
Ultimately, several forms of interaction can be integrated in a micro browser. Take a travel application: the user speaks the starting point, the destination, and the preferred flight time, all of which are awkward to enter on a PDA. The combined micro browser responds with a menu of choices. To book a flight, the user only needs to say "third". Input is by voice; output is through the graphical interface.
What is VoiceXML
First, let's look at the VoiceXML model (see Figure 1).
A file server (for example, a web server) processes requests from a client application, the VoiceXML interpreter, which arrive through the VoiceXML interpreter context. In reply, the server produces VoiceXML files, which the VoiceXML interpreter processes.
The execution platform is controlled by the VoiceXML interpreter context and by the VoiceXML interpreter. The execution platform generates events in response to user actions (spoken or character input) and system events (for example, timer expiration). Some of these events are acted on by the VoiceXML interpreter, as specified in the VoiceXML file; the others are acted on by the VoiceXML interpreter context.
The VoiceXML interpreter is a computer program that interprets a VoiceXML file in order to direct and control the interaction between the user and the execution platform. The VoiceXML interpreter context is also a computer program; it uses a VoiceXML interpreter to interpret a VoiceXML file, and it may also interact with the execution platform independently of the interpreter.
The execution platform is a computer equipped with the software and hardware, such as ASR and TTS, needed to support the interactions that VoiceXML defines.
The main goal of VoiceXML is to make the large number of applications and the rich content on the Web available through interactive voice interfaces. In the process, VoiceXML aims to free application developers from low-level programming and resource management. VoiceXML uses the familiar client/server model to integrate voice services with data services.
A voice service is viewed here as a sequence of interactive voice dialogs between the user and the execution platform. The dialogs are provided by file servers, which may be external to the execution platform. The file server maintains the overall service logic, performs database and system operations, and generates the dialogs. In VoiceXML, a dialog is an interaction with the user that is specified in a VoiceXML file.
A VoiceXML file specifies each interactive dialog to be conducted by the VoiceXML interpreter. User input affects how the dialog is interpreted, and it is collected into requests that are submitted to the file server. The file server may reply with another VoiceXML file so that the user's session continues with other dialogs. Here, a session is a connection between the user and the execution platform, such as a telephone call from a user to a voice response system; one session may involve more than one VoiceXML file.
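The request/reply cycle described above can be sketched in a few lines. This is a toy model under stated assumptions, not a conforming interpreter: the `file_server` and `run_session` functions, the URIs `/start` and `/weather`, and the scripted answers are hypothetical, and only a tiny subset of VoiceXML 1.0 elements (`vxml`, `form`, `field`, `prompt`, `block`, `submit`) is handled.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional
from xml.sax.saxutils import escape


def file_server(uri: str, request: Dict[str, str]) -> str:
    """Hypothetical file server: holds the service logic, returns VoiceXML text."""
    if uri == "/start":
        return ('<vxml version="1.0"><form>'
                '<field name="city"><prompt>Which city?</prompt></field>'
                '<block><submit next="/weather"/></block>'
                '</form></vxml>')
    if uri == "/weather":
        city = escape(request.get("city", "unknown"))
        return ('<vxml version="1.0"><form><block>'
                f'<prompt>The forecast for {city} is sunny.</prompt>'
                '</block></form></vxml>')
    raise KeyError(uri)


def run_session(start_uri: str, answers: List[str]) -> List[str]:
    """Toy interpreter loop: one session that spans several VoiceXML files."""
    spoken: List[str] = []      # everything "played" to the user
    pending = iter(answers)     # scripted user input stands in for ASR/DTMF
    uri: Optional[str] = start_uri
    request: Dict[str, str] = {}
    while uri is not None:
        root = ET.fromstring(file_server(uri, request))
        uri, request = None, {}
        for item in root.find("form"):
            if item.tag == "field":
                # Prompt the user and store the answer as a request variable.
                spoken.append(item.findtext("prompt"))
                request[item.get("name")] = next(pending)
            elif item.tag == "block":
                for child in item:
                    if child.tag == "prompt":
                        spoken.append(child.text)
                    elif child.tag == "submit":
                        # The collected variables go to the next file's URI.
                        uri = child.get("next")
    return spoken


print(run_session("/start", ["Boston"]))
```

The session here involves two files: the first collects the city, and the server answers the submitted request with a second file that continues the same call.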
As a markup language, VoiceXML:
1. Minimizes client/server interactions by specifying multiple interactions in each file.
2. Shields application developers from low-level and platform-specific software and hardware details.
3. Separates the code that interacts with the user (in VoiceXML) from the service logic (CGI scripts).
4. Makes services portable across execution platforms. VoiceXML is a common language for content providers, tool providers, and platform providers.
5. Is easy to use for simple interactions, yet provides the language features needed to support complex dialogs.
Although VoiceXML strives to meet the needs of most voice response services, services with very strict requirements may be best served by dedicated application software that offers a finer level of control.
The VoiceXML language describes the human-computer interaction provided by voice response systems, including synthesized speech output (TTS), audio file output, recognition of spoken input, recognition of DTMF input, recording of spoken input, and telephony features such as call transfer.
VoiceXML provides means to collect character and spoken input, assign the input to request variables defined in the file, and make decisions based on the user's answers. A VoiceXML file may be linked to other files through Universal Resource Identifiers (URIs).
The VoiceXML language is not intended for intensive computation or database operations. These are assumed to be handled outside the file interpreter, for example by a dedicated file server. General service logic, state management, dialog generation, and dialog sequencing are likewise assumed to reside outside the file interpreter. VoiceXML provides ways to link files using URIs and to submit data to server scripts using URIs. VoiceXML does not require file authors to allocate and release dialog resources explicitly or to deal with concurrency. Resource allocation and concurrent threads of control are handled by the execution platform.
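Submitting collected data to a server script through a URI can be illustrated as below. This is a minimal sketch assuming the variables are sent as a URL query string; the helper `build_submit_uri`, the script address on `example.com`, and the variable names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_submit_uri(script_uri: str, variables: dict) -> str:
    """Append request variables to a script URI as a query string."""
    return f"{script_uri}?{urlencode(variables)}"


# Interpreter side: variables collected from the user are attached to the URI.
uri = build_submit_uri("http://example.com/cgi-bin/flights",
                       {"origin": "New York", "dest": "Boston"})
print(uri)

# Server side: the script recovers the variables and applies the service logic.
received = parse_qs(urlsplit(uri).query)
print(received["origin"][0], "->", received["dest"][0])
```

The file itself only names the script and the variables; the computation and database work happen in the script behind the URI.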
Requirements a platform must meet to support the VoiceXML interpreter:
Document acquisition: The interpreter context is expected to acquire the files on which the VoiceXML interpreter acts. In some cases the file request is generated by the interpretation of a VoiceXML file; in others it is generated in response to events outside the scope of VoiceXML, such as an incoming call.
Audio output: The execution platform provides audio output from audio files or through TTS. When both are supported, the platform must be able to sequence TTS and audio file output freely. Audio files are referred to by URI, and the language does not prescribe a required set of audio file formats.
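Free sequencing of TTS and audio file output might look like the following sketch. The `Speak` and `PlayFile` types and the `play_prompts` function are hypothetical, and the two callbacks stand in for a real TTS engine and audio player, which the sketch does not implement.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Speak:
    text: str   # to be rendered by a TTS engine


@dataclass
class PlayFile:
    uri: str    # audio file referred to by URI


PromptItem = Union[Speak, PlayFile]


def play_prompts(items: List[PromptItem],
                 tts: Callable[[str], None],
                 audio: Callable[[str], None]) -> None:
    """Play the items strictly in order, whichever output method each one uses."""
    for item in items:
        if isinstance(item, Speak):
            tts(item.text)
        else:
            audio(item.uri)


# Stand-in back ends that only record what would be played.
log = []
play_prompts(
    [Speak("Welcome"), PlayFile("http://example.com/jingle.wav"), Speak("Goodbye")],
    tts=lambda text: log.append(("tts", text)),
    audio=lambda uri: log.append(("audio", uri)),
)
print(log)
```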
Audio input: The execution platform must be able to detect and report character and spoken input, and to control the length of the input detection interval with a timer whose duration is specified by the VoiceXML file. It must report characters (for example, DTMF) entered by the user. It must be able to receive speech recognition grammar data dynamically. Some grammar data is contained in the VoiceXML file itself; other grammar data is referred to through a URI. The speech recognizer must be able to update dynamically the spoken input it is listening for. The platform must be able to record audio received from the user and to make the recording available to a request variable.
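The role of the input timer can be shown with a small sketch of DTMF collection. It assumes key presses arrive on a queue and that `#` ends the entry; the function `collect_digits`, its parameters, and the timeout value are hypothetical and only illustrate the timing behavior.

```python
import queue
from typing import List, Optional


def collect_digits(events: "queue.Queue[str]", timeout_s: float,
                   terminator: str = "#") -> Optional[str]:
    """Collect DTMF keys until the terminator or until the timer expires.

    Returns None when the timer expires before any key was entered.
    """
    digits: List[str] = []
    while True:
        try:
            key = events.get(timeout=timeout_s)
        except queue.Empty:
            # Timer expired: report "no input" or whatever was entered so far.
            return "".join(digits) if digits else None
        if key == terminator:
            return "".join(digits)
        digits.append(key)


keys: "queue.Queue[str]" = queue.Queue()
for key in "42#":
    keys.put(key)
print(collect_digits(keys, timeout_s=0.1))
```

In a real platform the timeout would come from the VoiceXML file, and the "no input" result would be raised as an event for the interpreter to handle.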
Product reviews
The following introduces products from several foreign vendors. Note that at present most of these ASR and TTS systems do not support Chinese.
IBM
IBM and Nokia have formed an alliance to develop for the new demands of the mobile Internet. As a first step, Nokia uses IBM's ViaVoice for its voice dialing phone book. IBM distributes Nokia's WAP gateway and integrates it into its pervasive computing middleware.
VoiceTIMES (Voice Technology Initiative for Mobile Enterprise Solutions) specifies digital recording and speech recognition applications in detail. The aim is to promote speech as a general interface for mobile devices, from digital recorders to mobile phones and PDAs. IBM is developing a VoiceXML web browser that provides a voice entry point: users can reach a WebSphere web application server to browse a bookstore, look for books, get prices, and buy books, or to browse a bank and make inquiries.
Lucent
Lucent's solution includes its own ASR and TTS engines and its own voice cards.
Lucent LTTS 3.0 converts input text into speech in English, French, and other languages, but it does not support Chinese. The system can be taught to pronounce particularly difficult words. LASR 3.0 handles voice input and recognition. LTTS 3.0 is part of Lucent's own wireless data server, on which mobile operators can build unified messaging, news, and weather forecast services. The information can be converted among HTTP, fax, voice, and email; for example, graphics can be printed by fax and file contents can be read aloud.
Lucent's speech processing cards support ISA/EISA, PCI, and CompactPCI. The ISA/EISA speech processing card has 48 MB of memory and a T1 interface, can be upgraded to support 5 T1 lines, and supports ASR and TTS.
The Lucent Speech Server supports VoiceXML applications. The server uses Lucent's own CompactPCI voice cards to support up to 192 channels of speech recognition, along with TTS and other applications, and is aimed at operators and OEM vendors. The server's first application will be running the VoiceXML interpreter. Other applications include an automated attendant, call screening (the caller's name is recorded and played to the called user, who is asked whether to accept the call; over time this builds a database that determines which calls the user wants to answer), and personal intelligent assistant services.
Motorola
As one of the earliest vendors to support VoiceXML, Motorola ultimately wants the Web to be accessible in three ways: through a browser on an ordinary PC, through a micro browser on a handheld device (mobile phone) via WAP, and by voice.
Motorola's hardware device is the VOX gateway, which combines ASR, TTS, and a telephone interface to render VoxML (Motorola's version of VoiceXML). It acts as an intermediary between the telephone and Internet text. The voice browser is built into the voice gateway server, and the gateway uses standard Internet protocols to access the Internet.
Motorola also provides a mobile application development kit, MADK. With this tool, a mobile application can be given multiple end-user interfaces: a VoxML voice interface and a WML data interface. VoxML includes an HTTP link so that the application simulator can access VoxML over the network; the application simulator manages the agent-based automatic speech recognition and the TTS synthesis engine. Applications developed with MADK will run on Motorola's new Mobile Internet Exchange platform.
Nuance
Nuance has its own speech recognition system, including speech recognition engines and development tools, which helps third-party developers build applications.
Nuance's voice browser and voice-activated server is called Voyager. At present it resembles a personal information assistant: users can browse from one site to another, check schedules, reserve a dinner table, and consult maps for driving directions. Although its functions do not go much beyond those of an ordinary personal information assistant, user input goes through ASR, system output goes through TTS, and everything is controlled by VoiceXML. Voyager's ASR/TTS server will be sold to ISPs and operators. V-Builder is a tool developed by Nuance to help HTML developers move to VoiceXML. V-Builder provides grammar conversion and prompt recording.