Why add voice capabilities to an application? Roughly speaking, because it is fun, and it suits any
fun-oriented application such as a game. From a more serious perspective, it is also a matter of
accessibility. Note that I am thinking here not only of users for whom a visual interface falls short, but
also of situations in which taking your eyes off what you are doing is inconvenient or even illegal.
For example, with a voice-enabled browser you could listen to your favorite websites while out for a
walk or driving to work.
For now, a mail reader may be a more practical application of voice technology, and with the help of
the JavaMail API it is entirely feasible. The mail reader could check the inbox periodically and then
get your attention by saying "You have new mail, would you like me to read it to you?" Along similar
lines, consider a voice-enabled reminder connected to a calendar application: it would remind you in
time, "Don't forget your meeting with the boss in 10 minutes!"
Maybe you are attracted by these ideas, or have better ideas of your own; either way, let us continue.
First I will describe how to use the voice engine provided with this article, so if you find the details
of the engine too complicated, you can simply use it and ignore how it is implemented.
First, try the voice engine
To use this voice engine, add the javatalk.jar file provided with this article to your CLASSPATH, then
run the com.lotontech.speech.Talker class from the command line (or call it from a Java program). If
you run it from the command line, the command is:
java com.lotontech.speech.Talker "h|e|l|oo"
If you call it from a Java program, the code is:
com.lotontech.speech.Talker talker = new com.lotontech.speech.Talker();
talker.sayPhoneWord("h|e|l|oo");
The "h|e|l|oo" string supplied on the command line (or passed to the sayPhoneWord() method) may
puzzle you. Let me explain.
The voice engine works by joining together short sound samples, each of which is the smallest unit of
(English) speech. These sound samples are called allophones. Each allophone corresponds to one, two,
or three letters. As the phonetic spelling of "hello" above shows, some letter combinations are
pronounced in an obvious way, while others are less obvious:
h - pronounced as you would expect
e - pronounced as you would expect
l - pronounced as you would expect, but note that the two "l"s are reduced to one "l"
oo - pronounced as in "hello", not as in "bot" or "too"
Here is a list of valid phonemes:
a: as in cat
b: as in cab
c: as in cat
d: as in dot
e: as in bet
f: as in frog
g: as in frog
h: as in hog
i: as in pig
j: as in jig
k: as in keg
l: as in leg
m: as in met
n: as in begin
o: as in not
p: as in pot
r: as in rot
s: as in sat
t: as in sat
u: as in put
v: as in have
w: as in wet
y: as in yet
z: as in zoo
aa: as in fake
ay: as in hay
ee: as in bee
ii: as in high
oo: as in go
bb: variant of b with different emphasis
dd: variant of d with different emphasis
ggg: variant of g with different emphasis
hh: variant of h with different emphasis
ll: variant of l with different emphasis
nn: variant of n with different emphasis
rr: variant of r with different emphasis
tt: variant of t with different emphasis
yy: variant of y with different emphasis
ar: as in car
aer: as in care
ch: as in which
ck: as in check
ear: as in beer
er: as in later
err: as in later (long sound)
ng: as in feeding
or: as in law
ou: as in zoo
ouu: as in zoo
ow: as in cow
oy: as in boy
sh: as in shut
th: as in thing
dth: as in this
uh: variant of u
wh: as in where
en: as in Asian
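Because every token in a phoneme string must name one of the symbols listed above, it can be handy to check a string before handing it to the engine. The following is only a minimal sketch, not part of javatalk.jar: the class name PhonemeCheck and the method isValid() are hypothetical, and ignoring letter case is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the article's Talker class.
final class PhonemeCheck {
    // The phoneme symbols from the list above, written in lowercase.
    static final Set<String> VALID = new HashSet<>(Arrays.asList(
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
        "o", "p", "r", "s", "t", "u", "v", "w", "y", "z",
        "aa", "ay", "ee", "ii", "oo",
        "bb", "dd", "ggg", "hh", "ll", "nn", "rr", "tt", "yy",
        "ar", "aer", "ch", "ck", "ear", "er", "err", "ng", "or", "ou", "ouu",
        "ow", "oy", "sh", "th", "dth", "uh", "wh", "en"));

    /** Returns true if every "|"-separated token is a listed phoneme (case is ignored). */
    static boolean isValid(String phoneWord) {
        StringTokenizer st = new StringTokenizer(phoneWord, "|", false);
        if (!st.hasMoreTokens()) {
            return false; // an empty string contains no phonemes at all
        }
        while (st.hasMoreTokens()) {
            String token = st.nextToken().trim().toLowerCase(Locale.ROOT);
            if (!VALID.contains(token)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValid("h|e|l|oo")); // every token is in the list
        System.out.println(isValid("h|e|l|xx")); // "xx" is not in the list
    }
}
```

A check like this reports a mistyped symbol up front, rather than leaving it to surface later as a missing sound file.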
When people speak, their intonation varies over the course of a sentence. Changes in tone make speech
sound more natural and more expressive, and allow questions and statements to be told apart. Consider
the following two sentences:
It is fake - f|aa|k
Is it fake? - f|AA|k
As you may have guessed, the way to raise the tone is to use uppercase letters.
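The uppercase convention can also be applied programmatically. The sketch below is illustrative only: the class ToneSketch and the method raiseTone() are hypothetical names, and it assumes the phoneme string uses "|" as its separator, as in the examples above.

```java
import java.util.Locale;

// Hypothetical helper, not part of the article's Talker class.
final class ToneSketch {
    /** Returns phoneWord with the phoneme at the given index written in uppercase. */
    static String raiseTone(String phoneWord, int index) {
        String[] phones = phoneWord.split("\\|");
        phones[index] = phones[index].toUpperCase(Locale.ROOT);
        return String.join("|", phones);
    }

    public static void main(String[] args) {
        // Raise the tone of the second phoneme of "fake".
        System.out.println(raiseTone("f|aa|k", 1)); // prints f|AA|k
    }
}
```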
That is all you need to know to use the software. If you are interested in how it works, read on.
Second, implement the voice engine
The implementation of the voice engine consists of just one class with four methods. It uses the Java
Sound API included in J2SE 1.3. I will not give a full introduction to that API here, but you can learn
how it is used from the examples. The Java Sound API is not particularly complex, and the comments in
the code will tell you what you need to know.
Below is the basic definition of the Talker class:
package com.lotontech.speech;
import javax.sound.sampled.*;
import java.io.*;
import java.util.*;
import java.net.*;
public class Talker
{
    // The output line, opened lazily by getSound()
    private SourceDataLine line = null;
}
If Talker is run from the command line, the following main() method serves as the entry point. The
main() method takes the first command-line argument and passes it to the sayPhoneWord() method:
/*
 * Say the phoneme string given on the command line
 */
public static void main(String args[])
{
    Talker player = new Talker();
    if (args.length > 0) player.sayPhoneWord(args[0]);
    System.exit(0);
}
The sayPhoneWord() method can be called from the main() method above, or directly from a Java program.
On the surface sayPhoneWord() looks complicated, but it is not. It simply steps through the phonemes of
the word (separated by "|" in the input string) and plays them one by one through a sound output
channel. To make the result sound more natural, I merge the end of each sound sample with the start of
the next one:
/*
 * Say the given phoneme string
 */
public void sayPhoneWord(String word)
{
    // Placeholder byte array for the previous sound
    byte[] previousSound = null;
    // Split the input string into separate phonemes
    StringTokenizer st = new StringTokenizer(word, "|", false);
    while (st.hasMoreTokens())
    {
        // Construct the file name for this phoneme
        String thisPhoneFile = st.nextToken();
        thisPhoneFile = "/allophones/" + thisPhoneFile + ".au";
        // Read the data from the sound file
        byte[] thisSound = getSound(thisPhoneFile);
        if (previousSound != null)
        {
            // If possible, merge the previous phoneme with the current one
            int mergeCount = 0;
            if (previousSound.length >= 500 && thisSound.length >= 500)
                mergeCount = 500;
            for (int i = 0; i < mergeCount; i++)
            {
                previousSound[previousSound.length - mergeCount + i]
                    = (byte) ((previousSound[previousSound.length
                        - mergeCount + i] + thisSound[i]) / 2);
            }
            // Play the previous phoneme
            playSound(previousSound);
            // Make the truncated current phoneme the previous phoneme
            byte[] newSound = new byte[thisSound.length - mergeCount];
            for (int ii = 0; ii < newSound.length; ii++)
                newSound[ii] = thisSound[ii + mergeCount];
            previousSound = newSound;
        }
        else
            previousSound = thisSound;
    }
    // Play the last phoneme (if there was one) and drain the sound channel
    if (previousSound != null) playSound(previousSound);
    drain();
}
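The merging step inside sayPhoneWord() can be isolated and tried without any audio hardware. The following is a sketch under stated assumptions, not the author's code: the class MergeSketch and the method merge() are hypothetical names, and the overlap length is a parameter here where sayPhoneWord() uses the fixed value 500.

```java
import java.util.Arrays;

// Hypothetical stand-alone version of the overlap step used in sayPhoneWord().
final class MergeSketch {
    /**
     * Averages the last mergeCount bytes of previous with the first mergeCount
     * bytes of next (modifying previous in place), then returns next without
     * those leading bytes. If either array is too short, nothing is merged.
     */
    static byte[] merge(byte[] previous, byte[] next, int mergeCount) {
        if (previous.length < mergeCount || next.length < mergeCount) {
            mergeCount = 0;
        }
        int start = previous.length - mergeCount;
        for (int i = 0; i < mergeCount; i++) {
            previous[start + i] = (byte) ((previous[start + i] + next[i]) / 2);
        }
        return Arrays.copyOfRange(next, mergeCount, next.length);
    }

    public static void main(String[] args) {
        byte[] a = {10, 20, 30, 40};
        byte[] b = {0, 0, 50, 60};
        byte[] rest = merge(a, b, 2);
        System.out.println(Arrays.toString(a));    // prints [10, 20, 15, 20]
        System.out.println(Arrays.toString(rest)); // prints [50, 60]
    }
}
```

The tail of the first sample is blended with the head of the second, and only the remainder of the second sample is carried forward, so the overlapping region is played once.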
At the end of sayPhoneWord(), you can see that it calls playSound() to output a single sound sample
(that is, one allophone), and then calls drain() to flush the sound channel. Here is the code of
playSound():
/*
 * This method plays a sound sample
 */
private void playSound(byte[] data)
{
    if (data.length > 0) line.write(data, 0, data.length);
}
The following is the code of drain():
/*
 * This method flushes the sound channel
 */
private void drain()
{
    if (line != null) line.drain();
    try { Thread.sleep(100); } catch (Exception e) {}
}
Looking back at sayPhoneWord(), there is one method we have not yet analyzed: getSound().
The getSound() method reads a prerecorded sound sample, as byte data, from an AU file. For the details
of reading the data, converting the audio format, instantiating the SourceDataLine, and building the
byte data, please refer to the comments in the code below:
/*
 * This method reads a phoneme from a file
 * and converts it into a byte array
 */
private byte[] getSound(String fileName)
{
    try
    {
        URL url = Talker.class.getResource(fileName);
        AudioInputStream stream = AudioSystem.getAudioInputStream(url);
        AudioFormat format = stream.getFormat();
        // Convert an ALAW/ULAW sound to PCM for playback
        if ((format.getEncoding() == AudioFormat.Encoding.ULAW) ||
            (format.getEncoding() == AudioFormat.Encoding.ALAW))
        {
            AudioFormat tmpFormat = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                format.getSampleRate(), format.getSampleSizeInBits() * 2,
                format.getChannels(), format.getFrameSize() * 2,
                format.getFrameRate(), true);
            stream = AudioSystem.getAudioInputStream(tmpFormat, stream);
            format = tmpFormat;
        }
        DataLine.Info info = new DataLine.Info(
            Clip.class, format,
            ((int) stream.getFrameLength() * format.getFrameSize()));
        if (line == null)
        {
            // The output line has not been instantiated yet.
            // Can we find a suitable kind of output line?
            DataLine.Info outInfo = new DataLine.Info(SourceDataLine.class,
                format);
            if (!AudioSystem.isLineSupported(outInfo))
            {
                System.out.println("Output line matching " + outInfo + " is not supported");
                throw new Exception("Output line matching " + outInfo + " is not supported");
            }
            // Open the output line
            line = (SourceDataLine) AudioSystem.getLine(outInfo);
            line.open(format, 50000);
            line.start();
        }
        // Work out the size of the read buffer
        int frameSizeInBytes = format.getFrameSize();
        int bufferLengthInFrames = line.getBufferSize() / 8;
        int bufferLengthInBytes = bufferLengthInFrames * frameSizeInBytes;
        byte[] data = new byte[bufferLengthInBytes];
        // Read the byte data and count the bytes; read() returns -1 at end of stream
        int numBytesRead = stream.read(data);
        if (numBytesRead < 0) numBytesRead = 0;
        // Trim the byte data to the number of bytes actually read
        byte[] newData = new byte[numBytesRead];
        for (int i = 0; i < numBytesRead; i++)
            newData[i] = data[i];
        return newData;
    }
    catch (Exception e)
    {
        // On any failure, return an empty sample so that playback is skipped
        return new byte[0];
    }
}
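The format conversion in getSound() doubles the sample size and the frame size when it moves from ULAW or ALAW to signed PCM. The sketch below shows just that calculation, which needs no sound card. The names PcmFormatSketch and toPcm() are hypothetical, and the 8 kHz, 8-bit, mono ULAW source format is only an assumed example, not a statement about the article's sample files.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical stand-alone version of the format calculation in getSound().
final class PcmFormatSketch {
    /** Builds the signed PCM format that getSound() requests for a ULAW/ALAW source. */
    static AudioFormat toPcm(AudioFormat source) {
        return new AudioFormat(
            AudioFormat.Encoding.PCM_SIGNED,
            source.getSampleRate(),
            source.getSampleSizeInBits() * 2, // e.g. 8-bit ULAW becomes 16-bit PCM
            source.getChannels(),
            source.getFrameSize() * 2,        // twice as many bytes per frame
            source.getFrameRate(),
            true);                            // true = big-endian byte order
    }

    public static void main(String[] args) {
        // Assumed example input: 8 kHz, 8-bit, mono ULAW
        AudioFormat ulaw = new AudioFormat(
            AudioFormat.Encoding.ULAW, 8000f, 8, 1, 1, 8000f, false);
        AudioFormat pcm = toPcm(ulaw);
        System.out.println(pcm.getSampleSizeInBits() + " bits, "
            + pcm.getFrameSize() + " bytes per frame");
    }
}
```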
That is all the code: a speech synthesizer in approximately 150 lines, including comments.
Third, text-to-speech conversion
Specifying the words to be spoken as phonemes seems too complicated. If we want to build an
application that reads text aloud (a web page or an email, for example), we would like to specify the
original text directly.
After analyzing the problem in depth, I have provided an experimental text-to-speech conversion class
in the ZIP file provided later in this article. When you run this class, it displays the results of its
analysis. The text-to-speech conversion class can be run from the command line, as shown below:
java com.lotontech.speech.Converter "hello there"
The output looks like this:
hello -> h|e|l|oo
there -> dth|aer
If you run the following command:
java com.lotontech.speech.Converter "I like to read JavaWorld"
The output is:
i -> ii
like -> l|ii|k
to -> t|ouu
read -> r|ee|a|d
java -> j|a|v|a
world -> w|err|l|d
How does this conversion class work? My approach is quite simple: the conversion process applies a
set of text replacement rules in a particular order. For example, for the words "ant", "want",
"wanted", "unwanted", and "unique", the replacement rules we want to apply might be:
Replace "*unique*" with "|y|ou|n|ee|k|"
Replace "*want*" with "|w|o|n|t|"
Replace "*a*" with "|a|"
Replace "*e*" with "|e|"
Replace "*d*" with "|d|"
Replace "*n*" with "|n|"
Replace "*u*" with "|u|"
Replace "*t*" with "|t|"
For "unwanted", the sequence of outputs is:
unwanted
un[|w|o|n|t|]ed (rule 2)
[|u|][|n|][|w|o|n|t|][|e|][|d|] (rules 4, 5, 6, 7)
u|n|w|o|n|t|e|d (after removing the surplus characters)
You can see that words containing the letters "want" are pronounced differently from words containing
the letters "ant". You can also see that, under a special-case rule, "unique" is matched first as a
complete word, so that it is read as "y|ou..." rather than "u|n...".
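The ordered-replacement idea can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the Converter class from the ZIP file: the names RuleConverterSketch, Segment, and convert() are hypothetical, the rule table contains only the eight example rules above, and letters that no rule matches are simply dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of ordered replacement rules; not the article's Converter.
final class RuleConverterSketch {
    // {letters, phonemes} pairs from the example above, highest priority first.
    static final String[][] RULES = {
        {"unique", "y|ou|n|ee|k"},
        {"want", "w|o|n|t"},
        {"a", "a"}, {"e", "e"}, {"d", "d"}, {"n", "n"}, {"u", "u"}, {"t", "t"},
    };

    /** A piece of the word: raw letters, or phonemes already produced by a rule. */
    static final class Segment {
        final String text;
        final boolean converted;
        Segment(String text, boolean converted) {
            this.text = text;
            this.converted = converted;
        }
    }

    static String convert(String word) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(word.toLowerCase(Locale.ROOT), false));
        for (String[] rule : RULES) {
            List<Segment> next = new ArrayList<>();
            for (Segment s : segments) {
                if (s.converted) {      // never re-convert phoneme output
                    next.add(s);
                    continue;
                }
                String rest = s.text;
                int at;
                while ((at = rest.indexOf(rule[0])) >= 0) {
                    if (at > 0) next.add(new Segment(rest.substring(0, at), false));
                    next.add(new Segment(rule[1], true));
                    rest = rest.substring(at + rule[0].length());
                }
                if (!rest.isEmpty()) next.add(new Segment(rest, false));
            }
            segments = next;
        }
        StringBuilder out = new StringBuilder();
        for (Segment s : segments) {
            if (!s.converted) continue; // unmatched letters are dropped in this sketch
            if (out.length() > 0) out.append('|');
            out.append(s.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("unwanted -> " + convert("unwanted")); // u|n|w|o|n|t|e|d
        System.out.println("unique -> " + convert("unique"));     // y|ou|n|ee|k
    }
}
```

Keeping converted segments separate from raw letters is what lets an earlier, more specific rule such as "want" protect its output from the later single-letter rules.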
Conclusion: This article provides a ready-to-run voice engine that you can use in your Java 1.3
applications. If you study the code carefully, it also serves as a practical tutorial on playing audio
clips with the Java Sound API. To make the engine truly useful, you should look into text-to-speech
conversion, because that is the real foundation for the text-reading applications mentioned earlier.
To improve on what this article provides, you would need to build a large library of replacement rules
and carefully tune the order in which the rules are applied. I hope you have more perseverance than I do!