[Repost] Application of Regular Expression in Java

xiaoxiao 2021-04-01

JDK 1.4 finally ships with its own regular expression API package. Java programmers no longer have to hunt for a third-party regular expression library, so let's get to know this long-awaited offering from Sun. (Long-awaited for me, at least.)

1. Introduction:

java.util.regex is a class library package for matching strings against a specified pattern.

It consists of two classes, Pattern and Matcher. Pattern: a Pattern is the compiled representation of a regular expression.

Matcher: a Matcher object is a state machine that performs match checks on a string, using a Pattern object as the matching pattern.

First, a Pattern instance is created from a regular expression whose syntax is similar to Perl's. Then a Matcher instance performs string matching under the control of that Pattern instance.

Let's take a look at these two classes separately:

2. Pattern class:

The Pattern methods are as follows:

static Pattern compile(String regex)

Compiles the given regular expression into a Pattern.

static Pattern compile(String regex, int flags)

Same as above, but with the specified flags. The available flags include CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE, and CANON_EQ.

int flags()

Returns the match flags of this Pattern.

Matcher matcher(CharSequence input)

Creates a Matcher object for the given input.

static boolean matches(String regex, CharSequence input)

Compiles the given regular expression and matches the input string against it. This method suits cases where the regular expression is used only once, for a single match, because no Matcher instance needs to be created.

String pattern()

Returns the regular expression from which this Pattern object was compiled.

String[] split(CharSequence input)

Splits the target string around matches of the regular expression contained in this Pattern.

String[] split(CharSequence input, int limit)

Same as above; the added limit parameter specifies the number of segments. For example, with limit set to 2, the target string is split into two segments around the regular expression.
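
As a rough illustration of the methods listed above, the following sketch exercises compile() with a flag, flags(), pattern(), and the static matches() shortcut. The class name PatternMethodsDemo, the helper isLowercaseWord, and the sample strings are made up for this illustration and do not come from the original article.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample strings are illustrative only.
class PatternMethodsDemo {
    // Compile with a flag so that "java" also matches "JAVA" or "Java".
    static final Pattern JAVA = Pattern.compile("java", Pattern.CASE_INSENSITIVE);

    // One-off whole-string check using the static shortcut; no Matcher is kept.
    static boolean isLowercaseWord(String input) {
        return Pattern.matches("[a-z]+", input);
    }

    public static void main(String[] args) {
        System.out.println(JAVA.pattern());                            // java
        System.out.println(JAVA.flags() == Pattern.CASE_INSENSITIVE);  // true
        System.out.println(JAVA.matcher("I like JAVA").find());        // true
        System.out.println(isLowercaseWord("regex"));                  // true
        System.out.println(isLowercaseWord("regex 101"));              // false
    }
}
```

Note that the static matches() succeeds only when the whole input matches, which is why "regex 101" is rejected.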

A regular expression, that is, a string containing characters with special meaning, must first be compiled into a Pattern instance. The Pattern object then uses its matcher() method to create a Matcher instance, and the Matcher instance applies the compiled regular expression to the target string. Multiple Matchers can share one Pattern object.
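
The compile-once, match-many workflow described here can be sketched as follows. This is a minimal illustration, not code from the original article: the class SharedPatternDemo, the helper firstNumber, and the digit pattern are assumptions chosen for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern and inputs are made up for illustration.
class SharedPatternDemo {
    // Compile once; the compiled Pattern can be reused for any number of inputs.
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Each call creates its own Matcher from the shared Pattern.
    static String firstNumber(CharSequence input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("JDK 14 beta 3"));   // 14
        System.out.println(firstNumber("no digits here"));  // null
    }
}
```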

Now let's look at a simple example and analyze it to see how to create a Pattern object by compiling a regular expression, and then split a target string according to that expression:

import java.util.regex.*;

public class Replacement {
    public static void main(String[] args) throws Exception {
        // Create a Pattern by compiling a regular expression
        Pattern p = Pattern.compile("[/]+");
        // Split the string on "/" with Pattern's split() method
        String[] result = p.split(
            "Kevin has seen \"Leon\" several times, because it is a good film."
            + "/Kevin has seen \"this killer is not too cold\" several times, because it is a "
            + "good movie./Noun: Kevin.");
        for (int i = 0; i < result.length; i++)
            System.out.println(result[i]);
    }
}

The output is:

Kevin has seen "Leon" several times, because it is a good film.

Kevin has seen "this killer is not too cold" several times, because it is a good movie.

Noun: Kevin.

Clearly, the program splits the string on "/". Next we use the split(CharSequence input, int limit) method to specify the number of segments. The program changes to:

String[] result = p.split("Kevin has seen \"Leon\" several times, because it is a good film./Kevin has seen \"this killer is not too cold\" several times, because it is a good movie./Noun: Kevin.", 2);

The argument 2 indicates that the target string is split into two segments.

The output is:

Kevin has seen "Leon" several times, because it is a good film.

Kevin has seen "this killer is not too cold" several times, because it is a good movie./Noun: Kevin.

From the example above, we can compare how the java.util.regex package and the Jakarta-ORO package introduced in the previous article construct a Pattern object from a regular expression. Jakarta-ORO first constructs a PatternCompiler object and then uses its compile() method to compile the regular expression into a Pattern object:

PatternCompiler orocom = new Perl5Compiler();

Pattern pattern = orocom.compile("regular expressions");

PatternMatcher matcher = new Perl5Matcher();

In the java.util.regex package, by contrast, we need only the Pattern class and its static compile() method to achieve the same effect:

Pattern p = Pattern.compile("[/]+");

So constructing a pattern with java.util.regex is more concise than with Jakarta-ORO, and easier to understand.

3. Matcher class:

The Matcher methods are as follows:

Matcher appendReplacement(StringBuffer sb, String replacement)

Replaces the current match with the specified string, and appends to a StringBuffer the text between the previous match and the current match, followed by the replacement.

StringBuffer appendTail(StringBuffer sb)

Appends the text remaining after the last match to a StringBuffer.

int end()

Returns the index in the target string just past the last character of the current match.

int end(int group)

Returns the index just past the last character of the substring matched by the specified group.

boolean find()

Attempts to find the next matching substring in the target string.

boolean find(int start)

Resets the Matcher and attempts to find the next matching substring starting from the specified index in the target string.

String group()

Returns the substring matched by the current lookup.

String group(int group)

Returns the substring matched by the specified group in the current lookup.

int groupCount()

Returns the number of capturing groups in this Matcher's pattern.

boolean lookingAt()

Checks whether the target string starts with a substring that matches the pattern.

boolean matches()

Attempts to match the entire target string; returns true only if the whole string matches.

Pattern pattern()

Returns the Pattern object that this Matcher uses.

String replaceAll(String replacement)

Replaces every substring of the target string that matches the pattern with the specified string.

String replaceFirst(String replacement)

Replaces the first substring of the target string that matches the pattern with the specified string.

Matcher reset()

Resets the Matcher object.

Matcher reset(CharSequence input)

Resets the Matcher object and specifies a new target string.

int start()

Returns the index in the target string of the first character of the current match.

int start(int group)

Returns the index in the target string of the first character of the substring matched by the specified group.

(Do the descriptions alone seem a bit dry? Don't worry, the examples will make them easier to understand.)

A Matcher instance matches a target string against an existing pattern (the regular expression compiled into a given Pattern). All input to a Matcher is supplied through the CharSequence interface, so that matching can support data from a variety of sources.
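
To make the CharSequence point concrete, here is a small sketch in which one helper accepts a String, a StringBuffer, and a java.nio.CharBuffer alike. The class CharSequenceDemo and the helper countMatches are hypothetical names introduced only for this illustration.

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class CharSequenceDemo {
    // Counts matches in any CharSequence implementation.
    static int countMatches(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern cat = Pattern.compile("cat");
        System.out.println(countMatches(cat, "one cat, two cats"));            // 2
        System.out.println(countMatches(cat, new StringBuffer("catcatcat")));  // 3
        System.out.println(countMatches(cat, CharBuffer.wrap("no match")));    // 0
    }
}
```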

Let's take a look at the use of each method:

★ matches() / lookingAt() / find():

A Matcher object is created by calling the matcher() method of a Pattern object. Once created, it can perform three different kinds of match lookup:

The matches() method attempts to match the entire target string; it returns true only if the whole string matches.

The lookingAt() method checks whether the target string starts with a substring that matches the pattern.

The find() method attempts to find the next matching substring in the target string.

All three methods return a boolean value indicating whether the match succeeded.
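
The difference between the three lookups is easiest to see side by side. The sketch below is only an illustration; the class MatchKindsDemo, its helper methods, and the sample inputs are assumptions and not part of the original article.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and inputs are illustrative only.
class MatchKindsDemo {
    static final Pattern CAT = Pattern.compile("cat");

    // True only if the whole input is "cat".
    static boolean wholeMatch(String s)  { return CAT.matcher(s).matches(); }
    // True if the input starts with "cat".
    static boolean prefixMatch(String s) { return CAT.matcher(s).lookingAt(); }
    // True if "cat" occurs anywhere in the input.
    static boolean anywhere(String s)    { return CAT.matcher(s).find(); }

    public static void main(String[] args) {
        String[] inputs = { "cat", "catalog", "fat cat" };
        for (int i = 0; i < inputs.length; i++) {
            String s = inputs[i];
            System.out.println(s + ": matches=" + wholeMatch(s)
                    + ", lookingAt=" + prefixMatch(s)
                    + ", find=" + anywhere(s));
        }
        // cat: matches=true, lookingAt=true, find=true
        // catalog: matches=false, lookingAt=true, find=true
        // fat cat: matches=false, lookingAt=false, find=true
    }
}
```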

★ replaceAll() / appendReplacement() / appendTail():

The Matcher class also provides four methods for replacing matched substrings with a specified string:

replaceAll()

replaceFirst()

appendReplacement()

appendTail()

replaceAll() and replaceFirst() are simple; see the method descriptions above. We focus mainly on appendReplacement() and appendTail().
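
For completeness, the two simpler methods can be sketched in a few lines. The class ReplaceDemo, its helpers, and the sample string are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and the sample string are illustrative only.
class ReplaceDemo {
    static final Pattern CAT = Pattern.compile("cat");

    // Replaces every match of the pattern.
    static String replaceEvery(String s) { return CAT.matcher(s).replaceAll("dog"); }
    // Replaces only the first match of the pattern.
    static String replaceOnce(String s)  { return CAT.matcher(s).replaceFirst("dog"); }

    public static void main(String[] args) {
        System.out.println(replaceEvery("fatcatfatcatfat")); // fatdogfatdogfat
        System.out.println(replaceOnce("fatcatfatcatfat"));  // fatdogfatcatfat
    }
}
```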

appendReplacement(StringBuffer sb, String replacement) replaces the current match with the specified string, and appends to a StringBuffer the text between the previous match and the current match, followed by the replacement. appendTail(StringBuffer sb) appends the text remaining after the last match to the StringBuffer.

For example, take the string fatcatfatcatfat and the regular expression pattern "cat". After the first match, calling appendReplacement(sb, "dog") makes the content of sb fatdog: the cat in fatcat is replaced with dog, and the text before the match is appended to sb along with it. After the second match, calling appendReplacement(sb, "dog") again changes the content of sb to fatdogfatdog. Finally, calling appendTail(sb) makes the final content of sb fatdogfatdogfat. Still a little unclear? Then let's look at a simple program:

// This example changes every "Kelvin" in the sentence to "Kevin"

import java.util.regex.*;

public class MatcherTest {
    public static void main(String[] args) throws Exception {
        // Create a Pattern object by compiling the simple regular expression "Kelvin"
        Pattern p = Pattern.compile("Kelvin");
        // Create a Matcher object with the matcher() method of the Pattern class
        Matcher m = p.matcher("Kelvin Li and Kelvin Chan are both working in Kelvin Chen's KelvinSoftShop company");
        StringBuffer sb = new StringBuffer();
        int i = 0;
        // Use find() to look for the first match
        boolean result = m.find();
        // Loop over every "Kelvin" in the sentence, replacing it and appending the text to sb
        while (result) {
            i++;
            m.appendReplacement(sb, "Kevin");
            System.out.println("After match " + i + ", the content of sb is: " + sb);
            // Look for the next match
            result = m.find();
        }
        // Finally, call appendTail() to append the text after the last match to sb
        m.appendTail(sb);
        System.out.println("After calling m.appendTail(sb), the final content of sb is: " + sb.toString());
    }
}

The final output is:

After match 1, the content of sb is: Kevin

After match 2, the content of sb is: Kevin Li and Kevin

After match 3, the content of sb is: Kevin Li and Kevin Chan are both working in Kevin

After match 4, the content of sb is: Kevin Li and Kevin Chan are both working in Kevin Chen's Kevin

After calling m.appendTail(sb), the final content of sb is: Kevin Li and Kevin Chan are both working in Kevin Chen's KevinSoftShop company

After this example, the use of appendReplacement() and appendTail() should be clearer. If you are still unsure, the best approach is to write a few lines of test code yourself.

★ group() / group(int group) / groupCount():

This family of methods is similar to the MatchResult.group() method in Jakarta-ORO introduced in the previous article (see the Jakarta-ORO material there): each returns information about the substrings matched by groups. The following code illustrates them:

import java.util.regex.*;

public class GroupTest {
    public static void main(String[] args) throws Exception {
        // Two capturing groups: "ca" and "t"
        Pattern p = Pattern.compile("(ca)(t)");
        Matcher m = p.matcher("one cat, two cats in the yard");
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        System.out.println("The number of matching groups found by this lookup is: " + m.groupCount());
        for (int i = 1; i <= m.groupCount(); i++) {
            System.out.println("The substring matched by group " + i + " is: " + m.group(i));
        }
    }
}

The output is:

The number of matching groups found by this lookup is: 2

The substring matched by group 1 is: ca

The substring matched by group 2 is: t

The other methods of the Matcher object are easy to understand and space is limited, so readers are encouraged to try them out in code themselves.
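
As a starting point for such experiments, the following sketch touches start(), end(), find(int start), start(int group), and reset(CharSequence input). It is only an illustration under assumed names: the class MatchPositionsDemo, the helper describeMatches, the pattern "ca(t)", and the sample text are not from the original article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names, pattern, and inputs are illustrative only.
class MatchPositionsDemo {
    // Lists each match as text[start,end); end() is one past the last matched character.
    static String describeMatches(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(m.group()).append('[').append(m.start())
               .append(',').append(m.end()).append(')');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("ca(t)");
        String text = "one cat, two cats";
        System.out.println(describeMatches(p, text)); // cat[4,7) cat[13,16)

        Matcher m = p.matcher(text);
        m.find(8);                        // resets, then searches from index 8
        System.out.println(m.start());    // 13
        System.out.println(m.start(1));   // 15, the start of group 1 (the "t")

        m.reset("a cat");                 // reuse the same Matcher on a new input
        System.out.println(m.find() + " at " + m.start()); // true at 2
    }
}
```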

4. A small program for checking an email address:

Finally, let's look at a routine that checks an email address. It verifies whether the characters in an input email address are legal. It is not a complete email address checker and cannot catch every possible problem, but you can add further checks on top of it as needed.

import java.util.regex.*;

public class Email {
    public static void main(String[] args) throws Exception {
        String input = args[0];
        // Check whether the address starts with the illegal character "." or "@"
        Pattern p = Pattern.compile("^\\.|^@");
        Matcher m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email address cannot start with '.' or '@'");
        }
        // Check whether the address starts with "www."
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email address cannot start with 'www.'");
        }
        // Check whether the address contains illegal characters
        p = Pattern.compile("[^A-Za-z0-9.@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;
        while (result) {
            // An illegal character was found, so set the flag
            deletedIllegalChars = true;
            // Remove illegal characters such as colons or double quotes and append the rest to sb
            m.appendReplacement(sb, "");
            result = m.find();
        }
        m.appendTail(sb);
        input = sb.toString();
        if (deletedIllegalChars) {
            System.out.println("The email address you entered contains illegal characters such as colons or commas; please correct it");
            System.out.println("Your current input is: " + args[0]);
            System.out.println("After correction, a legal address should look like: " + input);
        }
    }
}

For example, if we enter on the command line: java Email www.kevin@163.net

Then the output is: Email address cannot start with 'www.'

If the input email is @kevin@163.net

Then the output is: Email address cannot start with '.' or '@'

When the input is: cgjmail#$%@163.net

The output is:

The email address you entered contains illegal characters such as colons or commas; please correct it

Your current input is: cgjmail#$%@163.net

After correction, a legal address should look like: cgjmail#@163.net
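
If you want to extend the checker, it may help to pull each check into its own method so that it can be exercised without the command line. The following is one possible arrangement, not the original author's code: the class EmailChecks and its method names are hypothetical, and "^[.@]" is used as an equivalent way of writing "starts with '.' or '@'".

```java
import java.util.regex.Pattern;

// Hypothetical refactoring sketch; class and method names are illustrative only.
class EmailChecks {
    // Patterns are compiled once and shared by all calls.
    static final Pattern BAD_START = Pattern.compile("^[.@]");
    static final Pattern WWW_START = Pattern.compile("^www\\.");
    static final Pattern ILLEGAL   = Pattern.compile("[^A-Za-z0-9.@_\\-~#]+");

    static boolean startsWithDotOrAt(String s) { return BAD_START.matcher(s).find(); }
    static boolean startsWithWww(String s)     { return WWW_START.matcher(s).find(); }

    // Removes every run of characters outside the allowed set.
    static String stripIllegal(String s) { return ILLEGAL.matcher(s).replaceAll(""); }

    public static void main(String[] args) {
        System.out.println(startsWithDotOrAt("@kevin@163.net"));  // true
        System.out.println(startsWithWww("www.kevin@163.net"));   // true
        System.out.println(stripIllegal("cgjmail#$%@163.net"));   // cgjmail#@163.net
    }
}
```

Because '#' is in the allowed set, it survives the cleanup; drop it from the character class if you want it treated as illegal.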

5. Summary:

This article introduced the java.util.regex package in the JDK 1.4.0-beta3 class library, along with its classes and methods. Combined with the Jakarta-ORO API described in the previous article, readers should find this API easier to master. Of course, the capabilities of the library will continue to grow; readers who want the latest information should check Sun's website.

6. Conclusion:

Originally I intended to write more about commercial regular expression libraries, but since free and excellent regular expression libraries already exist, why bother looking for paid ones? I suspect many readers feel the same way. Readers who are interested in other third-party regular expression libraries can search online or follow the URL given in the references.

References

Help documentation for java.util.regex

Regular Expressions and the Java™ Programming Language, by Dana Nourie and Mike McCloskey

More third-party regular expression libraries and applications built on them: http://www.meurrens.org/ip-links/java/regex/index.html

When reposting, please cite the original address: https://www.9cbs.com/read-131296.html
