Implementing a Voice Engine in Java

xiaoxiao2021-03-05 36

Why add voice capabilities to an application? Roughly speaking, for fun: speech suits any playful application, such as a game. From a more serious perspective, it is also a matter of accessibility. I am thinking not only of users for whom a visual interface falls short, but also of situations in which taking your eyes off what you are doing is inconvenient or even illegal. Suppose, for example, that you had a browser with a voice function: you could listen to your favorite websites while out for a walk or driving to work.

For now, a mail reader may be a more practical application of voice technology, and the JavaMail API makes it entirely feasible. The mail reader could check the inbox periodically and then get your attention by saying, "You have new mail, would you like me to read it to you?" Along similar lines, consider a talking reminder connected to a calendar application, which would warn you in good time: "Don't forget your meeting with the boss in 10 minutes!"

Perhaps these ideas appeal to you, or you have better ones of your own; either way, let us continue. I will first describe how to use the voice engine provided with this article, so if you find the engine's internals too complicated, you can use it directly and ignore the implementation details.

First, try the voice engine

To use this voice engine, add the javatalk.jar file provided with this article to your CLASSPATH, then run the com.lotontech.speech.Talker class from the command line (or call it from a Java program). From the command line, the command is:

java com.lotontech.speech.Talker "h|e|l|oo"

From a Java program, the code is:

com.lotontech.speech.Talker talker = new com.lotontech.speech.Talker();
talker.sayPhoneWord("h|e|l|oo");

You may be puzzled by the "h|e|l|oo" string supplied on the command line (or passed to the sayPhoneWord() method). Let me explain.

The voice engine works by concatenating short sound samples, each of which is the smallest unit of human (English) pronunciation. These sound samples are called allophones. Each allophone corresponds to one, two, or three letters. As the phonetic spelling of "hello" above shows, some letter combinations are obvious and others less so:

h - pronounced as you would expect
e - pronounced as you would expect
l - pronounced as you would expect, but note that the two "l"s have been reduced to one
oo - the sound as read in "hello", not as read in "bot" or "too"

Here is a list of valid phonemes:

a: as in cat
b: as in cab
c: as in cat
d: as in dot
e: as in bet
f: as in frog
g: as in frog
h: as in hog
i: as in pig
j: as in jig
k: as in keg
l: as in leg
m: as in met
n: as in begin
o: as in not
p: as in pot
r: as in rot
s: as in sat
t: as in sat
u: as in put
v: as in have
w: as in wet
y: as in yet
z: as in zoo
aa: as in fake
ay: as in hay
ee: as in bee
ii: as in high
oo: as in go
bb: variant of b, with different stress
dd: variant of d, with different stress
ggg: variant of g, with different stress
hh: variant of h, with different stress
ll: variant of l, with different stress
nn: variant of n, with different stress
rr: variant of r, with different stress
tt: variant of t, with different stress
yy: variant of y, with different stress
ar: as in car
aer: as in care
ch: as in which
ck: as in check
ear: as in beer
er: as in later
err: as in later (longer sound)
ng: as in feeding
or: as in law
ou: as in zoo
ouu: as in zoo
ow: as in cow
oy: as in boy
sh: as in shut
th: as in thing
dth: as in this
uh: variant of u
wh: as in where
en: as in Asian

When people talk, the pitch of their speech varies across a sentence. This intonation makes speech sound more natural and more expressive, and it lets a listener tell a question from a statement. Consider the following two sentences:

It is fake - f|aa|k
Is it fake? - f|AA|k

As you may have guessed, the way to raise the pitch is to use uppercase letters.

That is all you need to know to use the software. If you are interested in how it is implemented, read on.
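Before handing a phoneme string to the engine, it can help to check it against the list above. The following is only a sketch under stated assumptions: the class and method names are hypothetical and not part of javatalk.jar, and it assumes that an uppercase token is simply the raised-pitch form of the same lowercase phoneme.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the article's Talker class.
final class PhonemeCheckSketch {
    // The phoneme names listed in the article.
    static final Set<String> VALID = new HashSet<>(Arrays.asList(
        "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
        "o", "p", "r", "s", "t", "u", "v", "w", "y", "z",
        "aa", "ay", "ee", "ii", "oo",
        "bb", "dd", "ggg", "hh", "ll", "nn", "rr", "tt", "yy",
        "ar", "aer", "ch", "ck", "ear", "er", "err", "ng", "or", "ou", "ouu",
        "ow", "oy", "sh", "th", "dth", "uh", "wh", "en"));

    // Returns the first token that is not a listed phoneme, or null if all are valid.
    // Assumption: case only selects the pitch, so tokens are compared in lowercase.
    static String firstInvalid(String phoneWord) {
        StringTokenizer st = new StringTokenizer(phoneWord, "|", false);
        while (st.hasMoreTokens()) {
            String token = st.nextToken().trim();
            if (!VALID.contains(token.toLowerCase(Locale.ROOT))) {
                return token;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstInvalid("h|e|l|oo")); // every token is listed
        System.out.println(firstInvalid("h|x|l"));    // "x" is not a listed phoneme
    }
}
```

A check like this catches typing mistakes early, which matters because the engine itself quietly skips any phoneme whose sound file it cannot load.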

Second, implement the voice engine

The voice engine is implemented as a single class with four methods. It uses the Java Sound API included in J2SE 1.3. I will not give a full introduction to this API here, but you can learn how to use it by example. The Java Sound API is not particularly complex, and the comments in the code tell you what you need to know.

Below is the basic definition of the Talker class:

package com.lotontech.speech;

import javax.sound.sampled.*;
import java.io.*;
import java.util.*;
import java.net.*;

public class Talker
{
    private SourceDataLine line = null;
}

If Talker is executed from the command line, the following main() method serves as the entry point. main() takes the first command-line argument and passes it to the sayPhoneWord() method:

/*
 * Speaks the phoneme string given on the command line
 */
public static void main(String args[])
{
    Talker player = new Talker();
    if (args.length > 0) player.sayPhoneWord(args[0]);
    System.exit(0);
}

The sayPhoneWord() method can be called from the main() method above or directly from a Java program. It looks more complicated than it is. It simply steps through the word's phonemes (separated by "|" in the input string) and plays them one by one through a sound output channel. To make the speech sound more natural, I merge the end of each sound sample with the start of the next:

/*
 * Speaks the given phoneme string
 */
public void sayPhoneWord(String word)
{
    // Byte array holding the previous sound
    byte[] previousSound = null;

    // Split the input string into separate phonemes
    StringTokenizer st = new StringTokenizer(word, "|", false);
    while (st.hasMoreTokens())
    {
        // Build the file name for this phoneme
        String thisPhoneFile = st.nextToken();
        thisPhoneFile = "/allophones/" + thisPhoneFile + ".au";

        // Read the data from the sound file
        byte[] thisSound = getSound(thisPhoneFile);

        if (previousSound != null)
        {
            // If possible, merge the previous phoneme with the current one
            int mergeCount = 0;
            if (previousSound.length >= 500 && thisSound.length >= 500)
                mergeCount = 500;
            for (int i = 0; i < mergeCount; i++)
            {
                previousSound[previousSound.length - mergeCount + i]
                    = (byte) ((previousSound[previousSound.length
                        - mergeCount + i] + thisSound[i]) / 2);
            }

            // Play the previous phoneme
            playSound(previousSound);

            // Keep the truncated current phoneme as the previous phoneme
            byte[] newSound = new byte[thisSound.length - mergeCount];
            for (int ii = 0; ii < newSound.length; ii++)
                newSound[ii] = thisSound[ii + mergeCount];
            previousSound = newSound;
        }
        else
            previousSound = thisSound;
    }

    // Play the last phoneme (if there was one) and drain the sound channel
    if (previousSound != null) playSound(previousSound);
    drain();
}
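The merging step can be studied on its own with tiny arrays. The sketch below isolates the same byte-wise averaging as a standalone helper; the class name MergeSketch and the method mergeTail are hypothetical, and the merge length is a parameter here rather than the fixed 500 bytes used by the engine.

```java
import java.util.Arrays;

// Hypothetical standalone version of the merge step in sayPhoneWord().
final class MergeSketch {
    // Averages the last mergeCount bytes of prev with the first mergeCount
    // bytes of next (modifying prev in place) and returns what is left of
    // next after the merged region. If either array is shorter than
    // mergeCount, nothing is merged and a copy of next is returned.
    static byte[] mergeTail(byte[] prev, byte[] next, int mergeCount) {
        if (prev.length < mergeCount || next.length < mergeCount) {
            mergeCount = 0;
        }
        for (int i = 0; i < mergeCount; i++) {
            int p = prev.length - mergeCount + i;
            prev[p] = (byte) ((prev[p] + next[i]) / 2);
        }
        return Arrays.copyOfRange(next, mergeCount, next.length);
    }

    public static void main(String[] args) {
        byte[] prev = {10, 20, 30, 40};
        byte[] next = {0, 0, 50, 60};
        byte[] rest = mergeTail(prev, next, 2);
        System.out.println(Arrays.toString(prev)); // [10, 20, 15, 20]
        System.out.println(Arrays.toString(rest)); // [50, 60]
    }
}
```

Note that the averaging works on individual bytes. Once the samples have been converted to 16-bit PCM, a more careful blend would probably combine whole 16-bit sample values; the byte-wise version simply mirrors what the engine does.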

As you can see in sayPhoneWord(), it calls playSound() to output each individual sound sample (that is, one allophone) and then calls drain() to flush the sound channel. Here is the code for playSound():

/*
 * Plays a single sound sample
 */
private void playSound(byte[] data)
{
    if (data.length > 0) line.write(data, 0, data.length);
}

Here is the code for drain():

/*
 * Drains the sound channel
 */
private void drain()
{
    if (line != null) line.drain();
    try { Thread.sleep(100); } catch (Exception e) {}
}
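The write-then-drain pattern can be tried in isolation, without any allophone files. The following sketch plays a generated sine tone instead of a recorded sample. Everything in it is an assumption made for illustration: the class name, the 8 kHz 16-bit mono format, and the use of AudioSystem.getSourceDataLine(), which is a convenience method from Java 5 and later rather than the J2SE 1.3 approach used by the engine.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;

// Hypothetical demo, not part of the article's Talker class.
final class TonePlayerSketch {
    static final float SAMPLE_RATE = 8000f;

    // Generates a sine tone as 16-bit signed little-endian mono PCM.
    static byte[] sineWave(double hz, int millis) {
        int samples = (int) (SAMPLE_RATE * millis / 1000);
        byte[] data = new byte[samples * 2];
        for (int i = 0; i < samples; i++) {
            short s = (short) (Math.sin(2 * Math.PI * hz * i / SAMPLE_RATE) * 12000);
            data[2 * i] = (byte) (s & 0xff);            // low byte first
            data[2 * i + 1] = (byte) ((s >> 8) & 0xff); // then high byte
        }
        return data;
    }

    // Opens an output line, writes the samples, drains the line, and closes it.
    static void play(byte[] data) throws LineUnavailableException {
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        SourceDataLine line = AudioSystem.getSourceDataLine(format);
        line.open(format);
        line.start();
        if (data.length > 0) line.write(data, 0, data.length);
        line.drain(); // block until all queued data has been played
        line.close();
    }

    public static void main(String[] args) throws LineUnavailableException {
        play(sineWave(440, 500)); // needs a working audio output device
    }
}
```

Without the drain() call, the program could close the line or exit while audio is still sitting in the line's buffer, cutting the sound short.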

Looking back at sayPhoneWord(), there is one method we have not yet examined: getSound(). The getSound() method reads a prerecorded sound sample, as byte data, from an AU file. For the details of reading the data, converting the audio format, instantiating the SourceDataLine, and building the byte data, refer to the comments in the code below:

/*
 * Reads a phoneme from a file and
 * converts it into a byte array
 */
private byte[] getSound(String fileName)
{
    try
    {
        URL url = Talker.class.getResource(fileName);
        AudioInputStream stream = AudioSystem.getAudioInputStream(url);
        AudioFormat format = stream.getFormat();

        // Convert an ALAW/ULAW sound into PCM for playback
        if ((format.getEncoding() == AudioFormat.Encoding.ULAW) ||
            (format.getEncoding() == AudioFormat.Encoding.ALAW))
        {
            AudioFormat tmpFormat = new AudioFormat(
                AudioFormat.Encoding.PCM_SIGNED,
                format.getSampleRate(), format.getSampleSizeInBits() * 2,
                format.getChannels(), format.getFrameSize() * 2,
                format.getFrameRate(), true);
            stream = AudioSystem.getAudioInputStream(tmpFormat, stream);
            format = tmpFormat;
        }

        DataLine.Info info = new DataLine.Info(
            Clip.class, format,
            ((int) stream.getFrameLength() * format.getFrameSize()));

        if (line == null)
        {
            // The output line has not been instantiated yet.
            // Can we find a suitable kind of output line?
            DataLine.Info outInfo = new DataLine.Info(SourceDataLine.class,
                format);
            if (!AudioSystem.isLineSupported(outInfo))
            {
                System.out.println("Output line matching " + outInfo + " is not supported");
                throw new Exception("Output line matching " + outInfo + " is not supported");
            }

            // Open the output line
            line = (SourceDataLine) AudioSystem.getLine(outInfo);
            line.open(format, 50000);
            line.start();
        }

        int frameSizeInBytes = format.getFrameSize();
        int bufferLengthInFrames = line.getBufferSize() / 8;
        int bufferLengthInBytes = bufferLengthInFrames * frameSizeInBytes;
        byte[] data = new byte[bufferLengthInBytes];

        // Read the byte data and count the bytes read
        int numBytesRead = 0;
        if ((numBytesRead = stream.read(data)) != -1)
        {
            int numBytesRemaining = numBytesRead;
        }

        // Trim the byte data to the number of bytes actually read
        byte[] newData = new byte[numBytesRead];
        for (int i = 0; i < numBytesRead; i++)
            newData[i] = data[i];

        return newData;
    }
    catch (Exception e)
    {
        // Any failure (missing file, unsupported line, ...) yields an empty sample
        return new byte[0];
    }
}
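The format conversion inside getSound() is easy to check without any sound files or audio hardware, because AudioFormat is a plain value class. The sketch below applies the same doubling of sample size and frame size to a made-up 8 kHz, 8-bit, mono ULAW format; the class name FormatSketch, the method toPcm, and the example format are all assumptions for illustration.

```java
import javax.sound.sampled.AudioFormat;

// Hypothetical helper mirroring the ULAW/ALAW-to-PCM step in getSound().
final class FormatSketch {
    // Returns a signed big-endian PCM format with twice the sample size and
    // frame size if the input is ULAW or ALAW; otherwise returns the input.
    static AudioFormat toPcm(AudioFormat format) {
        AudioFormat.Encoding encoding = format.getEncoding();
        if (!AudioFormat.Encoding.ULAW.equals(encoding)
                && !AudioFormat.Encoding.ALAW.equals(encoding)) {
            return format;
        }
        return new AudioFormat(
            AudioFormat.Encoding.PCM_SIGNED,
            format.getSampleRate(), format.getSampleSizeInBits() * 2,
            format.getChannels(), format.getFrameSize() * 2,
            format.getFrameRate(), true);
    }

    public static void main(String[] args) {
        // Assumed example input: 8 kHz, 8-bit, mono ULAW, one byte per frame.
        AudioFormat ulaw = new AudioFormat(
            AudioFormat.Encoding.ULAW, 8000f, 8, 1, 1, 8000f, false);
        AudioFormat pcm = toPcm(ulaw);
        System.out.println(pcm.getSampleSizeInBits()); // 16
        System.out.println(pcm.getFrameSize());        // 2
    }
}
```

The doubling reflects that each 8-bit companded sample expands to one 16-bit linear PCM sample, so every frame takes twice as many bytes after conversion.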

That is all the code: a speech synthesizer in approximately 150 lines, comments included.

Third, text - voice conversion

Specifying the words to be spoken as phoneme strings may seem too laborious. If you want to build an application that reads text aloud (such as web pages or email), you will want to supply the original text directly.

After analyzing the problem in some depth, I have included an experimental text-to-speech conversion class in the ZIP file provided with this article. When you run this class, it displays the results of its analysis. The conversion class can be executed from the command line, as shown below:

java com.lotontech.speech.Converter "hello there"

The output looks like this:

hello -> h|e|l|oo
there -> dth|aer

If you run the following command:

java com.lotontech.speech.Converter "I like to read JavaWorld"

the output is:

i -> ii
like -> l|ii|k
to -> t|ouu
read -> r|ee|a|d
java -> j|a|v|a
world -> w|err|l|d

How does this conversion class work? My approach is quite simple: the conversion applies a set of text replacement rules in a fixed order. For example, for the words "ant", "want", "wanted", "unwanted", and "unique", the replacement rules we want to apply might be:

1. Replace "*unique*" with "|y|ou|n|ee|k|"
2. Replace "*want*" with "|w|o|n|t|"
3. Replace "*a*" with "|a|"
4. Replace "*e*" with "|e|"
5. Replace "*d*" with "|d|"
6. Replace "*n*" with "|n|"
7. Replace "*u*" with "|u|"
8. Replace "*t*" with "|t|"

For "unwanted", the sequence of outputs is:

unwanted
un[|w|o|n|t|]ed (rule 2)
[|u|][|n|][|w|o|n|t|][|e|][|d|] (rules 4, 5, 6, 7)
u|n|w|o|n|t|e|d (after removing the surplus characters)

You can see that words containing the letters "want" are pronounced differently from words containing the letters "ant". You can also see that, because the special-case rule takes precedence, "unique" is matched as a complete word and is therefore read as "y|ou..." rather than "u|n...".
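The ordered-rule idea can be sketched in a few lines. This is not the Converter class from the ZIP file, whose source is not shown here; it is a minimal illustration that assumes the eight example rules above, uses a hypothetical class name, and marks already converted pieces so that later single-letter rules cannot rewrite them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of ordered replacement rules, not the article's Converter.
final class RuleConverterSketch {
    // Insertion order is rule priority: earlier rules are applied first.
    private static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("unique", "y|ou|n|ee|k");
        RULES.put("want", "w|o|n|t");
        RULES.put("a", "a");
        RULES.put("e", "e");
        RULES.put("d", "d");
        RULES.put("n", "n");
        RULES.put("u", "u");
        RULES.put("t", "t");
    }

    // A piece of the word: raw letters, or phonemes produced by a rule.
    private static final class Segment {
        final String text;
        final boolean converted;
        Segment(String text, boolean converted) {
            this.text = text;
            this.converted = converted;
        }
    }

    static String convert(String word) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(word.toLowerCase(), false));

        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            List<Segment> next = new ArrayList<>();
            for (Segment s : segments) {
                if (s.converted) {        // never rewrite earlier output
                    next.add(s);
                    continue;
                }
                String rest = s.text;
                int at;
                while ((at = rest.indexOf(rule.getKey())) >= 0) {
                    if (at > 0) next.add(new Segment(rest.substring(0, at), false));
                    next.add(new Segment(rule.getValue(), true));
                    rest = rest.substring(at + rule.getKey().length());
                }
                if (!rest.isEmpty()) next.add(new Segment(rest, false));
            }
            segments = next;
        }

        // Join the converted pieces; letters that no rule matched are dropped.
        StringBuilder out = new StringBuilder();
        for (Segment s : segments) {
            if (!s.converted) continue;
            if (out.length() > 0) out.append('|');
            out.append(s.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("unwanted")); // u|n|w|o|n|t|e|d
        System.out.println(convert("unique"));   // y|ou|n|ee|k
    }
}
```

Keeping the rules in a LinkedHashMap makes the priority explicit: moving the "unique" rule below the single-letter rules would produce a letter-by-letter spelling instead, because its letters would already have been consumed.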

Conclusion: This article provides a ready-to-run voice engine that you can use in your Java 1.3 applications. If you study the code carefully, it also serves as a practical tutorial on playing audio clips with the Java Sound API. To make the engine truly useful, you should look into text-to-speech conversion, because that is the real foundation for the text-reading applications mentioned earlier. To improve on the converter presented here, you would need to build a huge library of replacement rules and carefully tune the order in which the rules are applied. I hope you have more perseverance than I do!
