Java 101
Regular expressions simplify pattern-matching code
Discover the elegance of regular expressions in text-processing scenarios that involve pattern matching.
Summary
Text processing often involves pattern matching. Although Java's character and assorted string classes offer low-level pattern-matching support, that support commonly leads to complex code. To help you write simpler pattern-matching code, Java provides regular expressions. After introducing you to terminology and the java.util.regex package, Jeff Friesen explores many regular expression constructs supported by that package's Pattern class. Then he examines Pattern's methods and the additional java.util.regex classes. In conclusion, he presents a practical application of regular expressions.
To view a glossary, tips and cautions, new homework, answers to last month's homework, and this article's related materials, visit the study guide. (6,000 words; February 7, 2003)
By Jeff Friesen, Translated by Humx
Text processing frequently requires code that matches text against patterns. That capability makes possible text searches, email header validation, custom text creation from generic text (for example, "Dear Mr. Smith" instead of "Dear Customer"), and so on. Java supports pattern matching via its character and assorted string classes. Because that low-level support commonly leads to complex pattern-matching code, Java also offers regular expressions to help you write simpler code.
Regular expressions often confuse newcomers. However, this article dispels much of that confusion. After introducing regular expression terminology, the java.util.regex package's classes, and a program that demonstrates regular expression constructs, I explore many of the regular expression constructs that the Pattern class supports. I also examine the methods of Pattern and the other java.util.regex classes. A practical application of regular expressions concludes my discussion.
Note
The long history of regular expressions begins in theoretical computer science, in the fields of automata theory and formal language theory. That history continues to Unix and other operating systems, where regular expressions are often used in Unix and Unix-like utilities such as awk (a programming language for text analysis and processing, named after its creators Aho, Weinberger, and Kernighan), emacs (a developer's editor), and grep (a program that matches regular expressions in one or more text files and prints the matching lines; its name stands for global regular expression print).
What are regular expressions?
A regular expression, also known as a regex or regexp, is a string whose pattern (template) describes a set of strings. The pattern determines which strings belong to the set, and consists of literal characters and metacharacters (characters that have a special meaning instead of a literal meaning). The process of searching text to identify matches (strings that satisfy a regex's pattern) is called pattern matching.
Java's java.util.regex package supports pattern matching via its Pattern, Matcher, and PatternSyntaxException classes:
Pattern objects, also known as patterns, are compiled regexes.
Matcher objects, or matchers, are engines that interpret patterns to locate matches in character sequences (objects whose classes implement the java.lang.CharSequence interface and serve as text sources).
PatternSyntaxException objects describe illegal regex patterns.
Listing 1 introduces those classes:
Listing 1. RegexDemo.java
// RegexDemo.java

import java.util.regex.*;

class RegexDemo
{
   public static void main (String [] args)
   {
      // Expect exactly two arguments: the regex and the text to search
      if (args.length != 2)
      {
          System.err.println ("java RegexDemo regex text");
          return;
      }

      Pattern p;
      try
      {
          p = Pattern.compile (args [0]);
      }
      catch (PatternSyntaxException e)
      {
          System.err.println ("Regex syntax error: " + e.getMessage ());
          System.err.println ("Error description: " + e.getDescription ());
          System.err.println ("Error index: " + e.getIndex ());
          System.err.println ("Erroneous pattern: " + e.getPattern ());
          return;
      }

      String s = cvtLineTerminators (args [1]);
      Matcher m = p.matcher (s);

      System.out.println ("Regex = " + args [0]);
      System.out.println ("Text = " + s);
      System.out.println ();

      // Report every match along with its start and end indexes
      while (m.find ())
      {
         System.out.println ("Found " + m.group ());
         System.out.println ("  starting at index " + m.start () +
                             " and ending at index " + m.end ());
         System.out.println ();
      }
   }

   // Convert \n and \r character sequences to their single character
   // equivalents

   static String cvtLineTerminators (String s)
   {
      StringBuffer sb = new StringBuffer (80);

      int oldindex = 0, newindex;
      while ((newindex = s.indexOf ("\\n", oldindex)) != -1)
      {
         sb.append (s.substring (oldindex, newindex));
         oldindex = newindex + 2;
         sb.append ('\n');
      }
      sb.append (s.substring (oldindex));

      s = sb.toString ();

      sb = new StringBuffer (80);

      oldindex = 0;
      while ((newindex = s.indexOf ("\\r", oldindex)) != -1)
      {
         sb.append (s.substring (oldindex, newindex));
         oldindex = newindex + 2;
         sb.append ('\r');
      }
      sb.append (s.substring (oldindex));

      return sb.toString ();
   }
}

RegexDemo's public static void main (String [] args) method validates two command-line arguments: one that identifies a regex and another that identifies the text to search. After creating a pattern, this method converts all of the text argument's new-line and carriage-return line-terminator character sequences to their actual meanings. For example, a new-line character sequence (a backslash followed by n) converts to one new-line character (represented numerically as 10). After outputting the regex and the converted text argument, main (String [] args) creates a matcher from the pattern and then finds all matches. For each match, the method outputs the matched characters and information about where the match occurs in the text.
To accomplish pattern matching, RegexDemo calls various methods in java.util.regex's classes. Don't concern yourself with understanding those methods right now; we explore them later in this article. More importantly, compile Listing 1: you need RegexDemo.class to explore Pattern's regex constructs.
Explore Pattern's regex constructs
Pattern's SDK documentation presents a section on regular expression constructs. Unless you are an avid regex user, a first reading of that section might confuse you. What are quantifiers, and what are the differences among greedy, reluctant, and possessive quantifiers? What are character classes, boundary matchers, back references, and embedded flag expressions? To answer those and other questions, we explore many of the regex constructs, or regex pattern categories, that Pattern recognizes. We begin with the simplest regex construct: literal strings.
Caution
Do not assume that Pattern's and Perl 5's regular expression constructs are identical. Although they have much in common, they also differ in many respects, including the constructs and metacharacters they support. (For more information, examine the SDK documentation on the Pattern class for your platform.)
Literal strings
You specify the literal string regex construct when you type a literal string in the search text field of your word processor's search dialog box. Execute the following RegexDemo command line to see this regex construct in action:
java RegexDemo apple applet
The command line above identifies apple as a literal string regex construct that consists of the literal characters a, p, p, l, and e (in that order). The command line also identifies applet as the text for pattern matching. After executing the command line, observe the following output:
Regex = apple
Text = applet

Found apple
  starting at index 0 and ending at index 5

The output identifies the regex and text command-line arguments, indicates a successful match of apple within applet, and presents the starting and ending indexes of that match: 0 and 5, respectively. The starting index identifies the first text location where a pattern match occurs, and the ending index identifies the first text location after the match. In other words, the range of matching text is inclusive of the starting index and exclusive of the ending index.
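The inclusive start and exclusive end indexes described above can also be observed directly in code. The following is a minimal sketch; the class name LiteralMatchSketch and the helper firstMatch are illustrative names, not part of the article's RegexDemo program.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralMatchSketch {
    // Describes the first match as "text@start-end", or "none" when nothing matches.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return "none";
        }
        // start() is the index of the first matched character (inclusive);
        // end() is the index just past the last matched character (exclusive).
        return m.group() + "@" + m.start() + "-" + m.end();
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("apple", "applet"));
        // String.substring() uses the same inclusive/exclusive convention.
        System.out.println("applet".substring(0, 5));
    }
}
```

Because the end index is exclusive, end() minus start() always equals the length of the matched text.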
Metacharacters
Although literal string regex constructs are useful, more powerful regex constructs combine literal characters with metacharacters. For example, in a.b, the period metacharacter (.) represents any character that appears between a and b. To see a metacharacter in action, execute the following command line:
java RegexDemo .ox "The quick brown fox jumps over the lazy ox."
The command line above specifies .ox as the regex and The quick brown fox jumps over the lazy ox. as the text. RegexDemo searches the text for matches that begin with any character and end with ox, and produces the following output:
Regex = .ox
Text = The quick brown fox jumps over the lazy ox.

Found fox
  starting at index 16 and ending at index 19

Found  ox
  starting at index 39 and ending at index 42

The output reveals two matches: fox and ox (with a leading space). The . metacharacter matches f in the first match and the space character in the second match.
What happens if we replace .ox with the period metacharacter alone? That is, what does java RegexDemo . "The quick brown fox jumps over the lazy ox." output? Because the period metacharacter matches any character, RegexDemo outputs a match for every character in its text command-line argument, including the terminating period character.
Tip
To specify . or any other metacharacter as a literal character in a regex construct, quote the metacharacter (convert it from meta status to literal status) in one of two ways:
Precede the metacharacter with a backslash character.
Place the metacharacter between \Q and \E (for example, \Q.\E).
In either case, don't forget to double each backslash character (as in \\. or \\Q.\\E) that appears in a string literal (for example, String regex = "\\.";). Do not double the backslash character when it appears as part of a command-line argument.
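As a rough illustration of the two quoting techniques in the tip, the sketch below counts matches of an unquoted period, a backslash-quoted period, and a \Q...\E-quoted period against the same text. The class name QuotingSketch and the helper countMatches are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotingSketch {
    // Counts how many times regex matches within text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "a.b";
        // Unquoted: the period is a metacharacter and matches each of the three characters.
        System.out.println(countMatches(".", text));
        // Quoted with a backslash (doubled because it appears in a string literal).
        System.out.println(countMatches("\\.", text));
        // Quoted between \Q and \E (backslashes doubled for the same reason).
        System.out.println(countMatches("\\Q.\\E", text));
    }
}
```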
Character classes
We sometimes want to limit the characters that produce matches to a specific character set. For example, we might search text for the vowels a, e, i, o, and u, where any occurrence of any vowel indicates a match. A character class, a regex construct that identifies a set of characters between open and close square bracket metacharacters ([ ]), helps us accomplish that task. Pattern supports the following character classes:
Simple: consists of characters placed side by side and matches only those characters. Example: [abc] matches characters a, b, and c. The following command line offers a second example:
java RegexDemo [csw] cave
java RegexDemo [csw] cave matches c in [csw] with c in cave. No other matches exist.
Negation: begins with the ^ metacharacter and matches only those characters not in that class. Example: [^abc] matches all characters except a, b, and c. The following command line offers a second example:
java RegexDemo [^csw] cave
java RegexDemo [^csw] cave matches a, v, and e with their counterparts in cave. No other matches exist.
Range: consists of all characters beginning with the character on the left of a hyphen metacharacter (-) and ending with the character on the right of the hyphen metacharacter, and matches only those characters in that range. Example: [a-z] matches all lowercase alphabetic characters. The following command line offers a second example:
java RegexDemo [a-c] clown
java RegexDemo [a-c] clown matches c in [a-c] with c in clown. No other matches exist.
Tip
Combine multiple ranges within the same range character class by placing them side by side. Example: [a-zA-Z] matches all lowercase and uppercase alphabetic characters.
Union: consists of multiple nested character classes and matches all characters that belong to the resulting union. Example: [a-d[m-p]] matches characters a through d and m through p. The following command line offers a second example:
java RegexDemo [ab[c-e]] abcdef
java RegexDemo [ab[c-e]] abcdef matches a, b, c, d, and e with their counterparts in abcdef. No other matches exist.
Intersection: consists of the characters common to all nested classes and matches only those common characters. Example: [a-z&&[d-f]] matches characters d, e, and f. The following command line offers a second example:
java RegexDemo [aeiouy&&[y]] party
java RegexDemo [aeiouy&&[y]] party matches y in party. No other matches exist.
Subtraction: consists of all characters except those indicated in nested negation character classes and matches the remaining characters. Example: [a-z&&[^m-p]] matches characters a through l and q through z. The following command line offers a second example:
java RegexDemo [a-f&&[^a-c]&&[^e]] abcdefg
java RegexDemo [a-f&&[^a-c]&&[^e]] abcdefg matches d and f with their counterparts in abcdefg. No other matches exist.
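The character class examples above can be reproduced programmatically. The sketch below concatenates every match that a character class finds in a text; the class name CharClassSketch and the helper matched are illustrative names, and the regexes and texts are the ones used in the command lines above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassSketch {
    // Concatenates every match of regex found in text, in order of appearance.
    static String matched(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matched("[csw]", "cave"));                   // simple
        System.out.println(matched("[^csw]", "cave"));                  // negation
        System.out.println(matched("[a-c]", "clown"));                  // range
        System.out.println(matched("[ab[c-e]]", "abcdef"));             // union
        System.out.println(matched("[aeiouy&&[y]]", "party"));          // intersection
        System.out.println(matched("[a-f&&[^a-c]&&[^e]]", "abcdefg"));  // subtraction
    }
}
```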
Predefined character classes
Some character classes occur often enough in regexes to warrant shortcuts. Pattern provides such shortcuts via the predefined character classes in Table 1. Use predefined character classes to simplify your regexes and minimize regex syntax errors.
Table 1. Predefined character classes
Predefined character class    Description
\d                            A digit. Equivalent to [0-9].
\D                            A nondigit. Equivalent to [^0-9].
\s                            A whitespace character. Equivalent to [ \t\n\x0B\f\r].
\S                            A nonwhitespace character. Equivalent to [^\s].
\w                            A word character. Equivalent to [a-zA-Z_0-9].
\W                            A nonword character. Equivalent to [^\w].
The following command-line example uses the \w predefined character class to identify all word characters in the command line's text:
java RegexDemo \w "aZ.8 _"
The command line above produces the following output, which shows that the period and space characters are not considered word characters:
Regex = \w
Text = aZ.8 _

Found a
  starting at index 0 and ending at index 1

Found Z
  starting at index 1 and ending at index 2

Found 8
  starting at index 3 and ending at index 4

Found _
  starting at index 5 and ending at index 6

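The same text can be run against several predefined character classes from Table 1. This is a small sketch under the assumption that collecting matches into a list is a convenient way to compare classes; PredefinedClassSketch and matches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassSketch {
    // Collects every match of regex in text as a separate list entry.
    static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "aZ.8 _";
        // Backslashes are doubled because the regexes are string literals.
        System.out.println(matches("\\w", text)); // word characters
        System.out.println(matches("\\d", text)); // digits
        System.out.println(matches("\\s", text)); // whitespace characters
        System.out.println(matches("\\W", text)); // nonword characters
    }
}
```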
Note
Pattern's SDK documentation refers to the period metacharacter as a predefined character class that matches any character except a line terminator (a one- or two-character sequence that identifies the end of a text line), unless dotall mode is in effect. Pattern recognizes the following line terminators:
The carriage-return character (\r)
The new-line (line-feed) character (\n)
The carriage-return character immediately followed by the new-line character (\r\n)
The next-line character (\u0085)
The line-separator character (\u2028)
The paragraph-separator character (\u2029)
Capturing groups
Pattern supports a regex construct called a capturing group, which saves a match's characters for later recall during pattern matching; that construct is a character sequence surrounded by parentheses metacharacters. All characters within a capturing group are treated as a single unit during pattern matching. For example, the (Java) capturing group combines characters J, a, v, and a into a single unit. That capturing group matches the Java pattern against every occurrence of Java in the text. Each match replaces the previous match's saved Java characters with the next match's Java characters.
Capturing groups can nest inside other capturing groups. For example, in (Java( language)), ( language) nests inside (Java). Each nested or non-nested capturing group receives its own number; numbering starts at 1 and proceeds from left to right. In the example, (Java( language)) is capturing group 1 and ( language) is capturing group 2. In (a)(b), (a) is capturing group 1 and (b) is capturing group 2.
Each capturing group saves its match for later recall by a back reference. Specified as a backslash character followed by a digit that denotes a capturing group number, a back reference recalls a capturing group's captured text characters. The presence of a back reference causes a matcher to use the back reference's capturing group number to recall that group's saved match, and then to use that match's characters to attempt a further match. The following example demonstrates the usefulness of a back reference in searching text for a grammatical error:
java RegexDemo "(Java( language)\2)" "The Java language language"
The example uses the (Java( language)\2) regex to search the text The Java language language for a grammatical error in which Java immediately precedes two consecutive occurrences of language. The regex specifies two capturing groups: number 1 is (Java( language)\2), which matches Java language language, and number 2 is ( language), which matches a space character followed by language. The \2 back reference recalls number 2's saved match, which lets the matcher search for a second occurrence of a space character followed by language immediately after the first occurrence. The following output shows what RegexDemo's matcher finds:
Regex = (Java( language)\2)
Text = The Java language language

Found Java language language
  starting at index 4 and ending at index 26

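To see what each numbered capturing group saved, a program can ask the matcher for groups by number. The sketch below assumes the same regex and text as the command line above; BackReferenceSketch and groups are illustrative names. Group 0 always denotes the entire match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackReferenceSketch {
    // Returns group 0 (the whole match) through group n for the first match,
    // or null when the regex does not match anywhere in the text.
    static String[] groups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // \\2 is the back reference to capturing group 2, ( language).
        String[] g = groups("(Java( language)\\2)", "The Java language language");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + " = [" + g[i] + "]");
        }
        // Without the repeated word, the back reference has nothing to match.
        System.out.println(groups("(Java( language)\\2)", "The Java language") == null);
    }
}
```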
Quantifiers
Quantifiers are probably the most confusing regex constructs to understand. Part of that confusion comes from trying to grasp the logic behind 18 quantifier categories (six basic categories organized into three major categories). Another part comes from trying to fathom the notion of the zero-length match. Once you understand that concept and the 18 categories, much (if not all) of the confusion disappears.
Note
This section briefly discusses the 18 quantifier categories and the zero-length match concept. For a more detailed discussion and more examples, study the "Quantifiers" section of The Java Tutorial.
A quantifier is a regex construct that implicitly or explicitly binds a numeric value to a pattern. That numeric value determines how many times to match the pattern. Pattern's six basic quantifiers match a pattern once or not at all, zero or more times, one or more times, an exact number of times, at least x times, and at least x times but no more than y times.
The six basic quantifier categories replicate in each of three major categories: greedy, reluctant, and possessive. Greedy quantifiers attempt to find the longest match. In contrast, reluctant quantifiers attempt to find the shortest match. Possessive quantifiers also try to find the longest match; however, they differ from greedy quantifiers in how they work. Although greedy and possessive quantifiers both force a matcher to read in the entire text before attempting a match, greedy quantifiers often cause multiple attempts to find a match, whereas possessive quantifiers cause a matcher to attempt a match only once. The following examples illustrate the behavior of the six basic quantifiers in the greedy category, and of a single basic quantifier in each of the reluctant and possessive categories. These examples also introduce the zero-length match concept:
1. java RegexDemo a? abaa: uses a greedy quantifier to match a in abaa once or not at all. The following output results:
Regex = a?
Text = abaa

Found a
  starting at index 0 and ending at index 1

Found
  starting at index 1 and ending at index 1

Found a
  starting at index 2 and ending at index 3

Found a
  starting at index 3 and ending at index 4

Found
  starting at index 4 and ending at index 4

The output reveals five matches. Although the first, third, and fourth matches come as no surprise in that they reveal the positions of the three a's in abaa, the second and fifth matches probably seem strange. They appear to indicate that a? matches b and the end of the text. That is not the case: a? does not look for b or the end of the text. Instead, it looks for either the presence or the absence of a. When a? fails to find a, it reports that fact (the absence of a) as a zero-length match, in which the start and end indexes are the same. Zero-length matches occur in empty text, after the last text character, or between any two characters.
2. java RegexDemo a* abaa: uses a greedy quantifier to match a in abaa zero or more times. The following output results:
Regex = a*
Text = abaa

Found a
  starting at index 0 and ending at index 1

Found
  starting at index 1 and ending at index 1

Found aa
  starting at index 2 and ending at index 4

Found
  starting at index 4 and ending at index 4

The output reveals four matches. As with a?, a* produces zero-length matches. The third match, where a* matches aa, is interesting: unlike a?, a* matches multiple consecutive a's.
3. java RegexDemo a+ abaa: uses a greedy quantifier to match a in abaa one or more times. The following output results:
Regex = a+
Text = abaa

Found a
  starting at index 0 and ending at index 1

Found aa
  starting at index 2 and ending at index 4

The output reveals two matches. Unlike a? and a*, a+ does not match the absence of a. Thus, no zero-length matches result. Like a*, a+ matches multiple consecutive a's.
4. java RegexDemo a{2} aababbaaaab: uses a greedy quantifier to match every aa sequence in aababbaaaab. The following output results:
Regex = a{2}
Text = aababbaaaab

Found aa
  starting at index 0 and ending at index 2

Found aa
  starting at index 6 and ending at index 8

Found aa
  starting at index 8 and ending at index 10

5. java RegexDemo a{2,} aababbaaaab: uses a greedy quantifier to match two or more consecutive a's in aababbaaaab. The following output results:
Regex = a{2,}
Text = aababbaaaab

Found aa
  starting at index 0 and ending at index 2

Found aaaa
  starting at index 6 and ending at index 10

6. java RegexDemo a{1,3} aababbaaaab: uses a greedy quantifier to match every a, aa, or aaa in aababbaaaab. The following output results:
Regex = a{1,3}
Text = aababbaaaab

Found aa
  starting at index 0 and ending at index 2

Found a
  starting at index 3 and ending at index 4

Found aaa
  starting at index 6 and ending at index 9

Found a
  starting at index 9 and ending at index 10

7. java RegexDemo a+? abaa: uses a reluctant quantifier to match a in abaa one or more times. The following output results:
Regex = a+?
Text = abaa

Found a
  starting at index 0 and ending at index 1

Found a
  starting at index 2 and ending at index 3

Found a
  starting at index 3 and ending at index 4

Unlike its greedy variant in the third example, the reluctant example produces three matches of a single a because the reluctant quantifier tries to find the shortest match.
8. java RegexDemo .*+end "This is the end": uses a possessive quantifier to match any character zero or more times in This is the end, followed by end. The following output results:
Regex = .*+end
Text = This is the end

Because the possessive quantifier consumes the entire text, leaving nothing to match end, no match occurs. In contrast, the greedy quantifier in java RegexDemo .*end "This is the end" produces a match because it backs off one character at a time until the rightmost end matches. (A possessive quantifier differs from a greedy one in that characters it has consumed are never given back for subsequent matching. Therefore, the .*+ portion of the regex matches the entire string, and no characters remain to match end.)
Boundary matchers
We sometimes want to match patterns at the beginning of lines, at word boundaries, at the end of the text, and so on. Accomplish that task with a boundary matcher, a regex construct that identifies a match location. Table 2 presents Pattern's supported boundary matchers.
Table 2. Boundary matchers
Boundary matcher    Description
^                   The beginning of a line
$                   The end of a line
\b                  A word boundary
\B                  A nonword boundary
\A                  The beginning of the text
\G                  The end of the previous match
\Z                  The end of the text (but for the final line terminator, if any)
\z                  The end of the text
The following command-line example uses the ^ boundary matcher metacharacter to ensure that a line begins with The followed by zero or more word characters:
java RegexDemo ^The\w* Therefore
The ^ metacharacter indicates that the first three text characters must match the pattern's subsequent T, h, and e characters. Any number of word characters may follow. The command line above produces the following output:
Regex = ^The\w*
Text = Therefore

Found Therefore
  starting at index 0 and ending at index 9

Change the command line to java RegexDemo ^The\w* " Therefore". What happens? No match is found because a space character precedes Therefore.
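The effect of a boundary matcher can be checked in a few lines of code. The sketch below repeats the ^ example and adds a \b word-boundary example of my own (the text "cat concat cats" is an assumption for illustration, not from the article); BoundarySketch, finds, and count are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySketch {
    // Reports whether regex matches anywhere in text.
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Counts the matches of regex in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // ^ anchors the match to the beginning of the line.
        System.out.println(finds("^The\\w*", "Therefore"));
        // A leading space moves The away from the beginning, so nothing matches.
        System.out.println(finds("^The\\w*", " Therefore"));
        // \b requires a word boundary on both sides, so only the standalone word counts.
        System.out.println(count("\\bcat\\b", "cat concat cats"));
    }
}
```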
Embedded flag expressions
Matchers assume certain defaults, such as case-sensitive pattern matching. A program can override any default by using an embedded flag expression, that is, a regex construct specified as parentheses metacharacters surrounding a question mark metacharacter (?) followed by a specific lowercase letter. Pattern recognizes the following embedded flag expressions:
(?i): enables case-insensitive pattern matching. Example: java RegexDemo (?i)tree Treehouse matches tree with Tree. Case-sensitive pattern matching is the default.
(?x): permits whitespace and comments beginning with the # metacharacter to appear in a pattern. A matcher ignores both. Example: java RegexDemo ".at(?x)#match hat, cat, and so on" matter matches .at with mat. By default, whitespace and comments are not permitted; a matcher regards them as characters that contribute to a match.
(?s): enables dotall mode. In that mode, the period metacharacter matches line terminators in addition to any other character. Example: java RegexDemo (?s). \n matches . with the new-line character. Nondotall mode is the default: line-terminator characters do not match.
(?m): enables multiline mode. In multiline mode, ^ and $ match just after or just before, respectively, a line terminator or the end of the text. Example: java RegexDemo (?m)^.ake make\rlake\n\rtake matches .ake with make, lake, and take. Nonmultiline mode is the default: ^ and $ match only at the beginning and end of the entire text.
(?u): enables Unicode-aware case folding. This flag works with (?i) to perform case-insensitive matching in a manner consistent with the Unicode Standard. The default: case-insensitive matching that assumes only characters in the US-ASCII character set match.
(?d): enables Unix lines mode. In that mode, a matcher recognizes only the \n line terminator in the context of the ., ^, and $ metacharacters. Non-Unix lines mode is the default: a matcher recognizes all line terminators in the context of the aforementioned metacharacters.
Embedded flag expressions resemble capturing groups because both regex constructs surround their characters with parentheses metacharacters. Unlike a capturing group, an embedded flag expression does not capture a match's characters. Thus, an embedded flag expression is an example of a noncapturing group, that is, a regex construct that does not capture text characters; it is specified as a character sequence surrounded by parentheses metacharacters. Several noncapturing groups appear in Pattern's SDK documentation.
Tip
To specify multiple embedded flag expressions in a regex, either place them side by side (for example, (?m)(?i)) or place their lowercase letters side by side (for example, (?mi)).
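A quick way to see what an embedded flag expression changes is to count matches with and without it. The following sketch reuses the regexes from the list above; the combined (?mi) text "a\nTHE" is an assumption made for illustration, and FlagSketch and count are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagSketch {
    // Counts the matches of regex in text.
    static int count(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // (?i): case-insensitive matching lets tree match Tree.
        System.out.println(count("tree", "Treehouse") + " vs " + count("(?i)tree", "Treehouse"));
        // (?s): dotall mode lets the period match a line terminator.
        System.out.println(count(".", "\n") + " vs " + count("(?s).", "\n"));
        // (?m): multiline mode lets ^ match after each line terminator.
        String lines = "make\rlake\n\rtake";
        System.out.println(count("^.ake", lines) + " vs " + count("(?m)^.ake", lines));
        // Flags can be combined by placing their letters side by side.
        System.out.println(count("(?mi)^the", "a\nTHE"));
    }
}
```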
Explore the java.util.regex classes' methods
java.util.regex's three classes offer various methods that help you write more robust regex source code and create powerful text-processing tools. We begin exploring those methods with the Pattern class.
Note
You might also want to explore the CharSequence interface's methods, which you implement when you create a new character sequence class. Classes that implement the CharSequence interface include java.nio.CharBuffer, String, and StringBuffer.
Pattern methods
A regex is useless until code compiles that string into a Pattern object. Accomplish that task with either of the following compilation methods:
public static Pattern compile (String regex): compiles regex's contents into a tree-structured object representation stored in a new Pattern object, and returns that object's reference. Example: Pattern p = Pattern.compile ("(?m)^\\."); creates a Pattern object that stores a compiled representation of the regex for matching all lines that begin with a period character.
public static Pattern compile (String regex, int flags): accomplishes the same task as the previous method. However, it also takes into account a bitwise inclusive OR of flag constants (specified by flags). The flag constants are declared in Pattern and (except for the canonical equivalence flag, CANON_EQ) serve as an alternative to embedded flag expressions. Example: Pattern p = Pattern.compile ("^\\.", Pattern.MULTILINE); accomplishes the same task as the previous example, where the Pattern.MULTILINE constant and the (?m) embedded flag expression have the same effect. (Consult the SDK's Pattern documentation to learn about the other constants.) If bit values other than those corresponding to the constants defined in Pattern appear in flags, this method throws an IllegalArgumentException object.
If desired, you can obtain a Pattern object's flags and the original regex from which that object was compiled by calling the following methods:
public int flags (): returns the Pattern object's flags as specified when the regex was compiled. Example: System.out.println (p.flags ()); outputs the flags associated with the Pattern object that p references.
public String pattern (): returns the original regex that was compiled into the Pattern object. Example: System.out.println (p.pattern ()); outputs the regex corresponding to the Pattern object. (The Matcher class includes a similar Pattern pattern () method that returns a matcher's associated Pattern object.)
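The relationship between an embedded flag expression and a flag constant can be sketched as follows. The class name CompileFlagsSketch, its helper methods, and the sample text are illustrative assumptions; only the Pattern calls come from the API described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompileFlagsSketch {
    // Multiline mode requested inside the regex itself.
    static Pattern embedded() {
        return Pattern.compile("(?m)^\\.");
    }

    // Multiline mode requested through the flags argument.
    static Pattern viaFlag() {
        return Pattern.compile("^\\.", Pattern.MULTILINE);
    }

    // Counts the matches of an already compiled pattern in text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = ".a\nb\n.c";
        // Both patterns find the two lines that begin with a period.
        System.out.println(count(embedded(), text));
        System.out.println(count(viaFlag(), text));
        // pattern() returns the regex exactly as it was passed to compile().
        System.out.println(viaFlag().pattern());
        // flags() reports the flags supplied at compile time.
        System.out.println((viaFlag().flags() & Pattern.MULTILINE) != 0);
    }
}
```

A compiled Pattern can be reused for any number of texts, which is why the sketch passes the Pattern object, rather than the regex string, to count.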
After creating a Pattern object, you typically obtain a Matcher object by calling Pattern's public Matcher matcher (CharSequence text) method. That method requires a single text object argument whose class implements the CharSequence interface. The obtained matcher scans the text object during a pattern match operation. Example: Pattern p = Pattern.compile ("[^aeiouy]"); Matcher m = p.matcher ("This is a test."); obtains a matcher that matches every character in the text that is not a lowercase vowel (with y counted as a vowel).
Creating Pattern and Matcher objects is bothersome when you just want to check whether a pattern completely matches a text sequence. Fortunately, Pattern offers a convenience method for that task: public static boolean matches (String regex, CharSequence text). That static method returns Boolean true if and only if the entire text character sequence matches regex's pattern. Example: System.out.println (Pattern.matches ("[a-z[\\s]]*", "all lowercase letters and whitespace only")); outputs true, indicating that only whitespace characters and lowercase letters appear in all lowercase letters and whitespace only.
Writing code to break text into its component parts (such as an employee record file into a set of fields) is a task many developers find tedious. Pattern relieves that tedium by providing a pair of text-splitting methods:
public String [] split (CharSequence text, int limit): splits text around matches of the current Pattern object's pattern. This method returns an array in which each entry specifies a text sequence separated from the next text sequence by a pattern match (or by the end of the text); all array entries are stored in the same order as they appear in the text. The number of array entries depends on limit, which also controls the number of matches that occur. A positive value means that, at most, limit-1 matches are considered and the array's length is no greater than limit entries. A negative value means that all possible matches are considered and the array can have any length. A 0 value means that all possible matches are considered, the array can have any length, and trailing empty strings are discarded.
public String [] split (CharSequence text): invokes the previous method with 0 as the limit and returns that method call's result.
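The three kinds of limit values are easiest to compare side by side. The following sketch assumes a comma as the separator and a text with two trailing empty fields, chosen so that the difference between a negative and a zero limit is visible; SplitLimitSketch and splitOnComma are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitLimitSketch {
    // Splits text around commas using the given limit.
    static String[] splitOnComma(String text, int limit) {
        return Pattern.compile(",").split(text, limit);
    }

    public static void main(String[] args) {
        String text = "a,b,c,,";
        // Positive limit: at most limit-1 matches, so the remainder stays in the last entry.
        System.out.println(Arrays.toString(splitOnComma(text, 2)));
        // Negative limit: every match is considered and trailing empty strings are kept.
        System.out.println(Arrays.toString(splitOnComma(text, -1)));
        // Zero limit: every match is considered and trailing empty strings are discarded.
        System.out.println(Arrays.toString(splitOnComma(text, 0)));
    }
}
```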
Suppose you want to split an employee record, consisting of name, age, street, and salary, into its field components. The following code fragment uses split (CharSequence text) to accomplish that task:
Pattern p = Pattern.compile (",\\s");
String [] fields = p.split ("John Doe, 47, Hillsboro Road, 32000");
for (int i = 0; i < fields.length; i++)
     System.out.println (fields [i]);
The code fragment above specifies a regex that matches a comma character immediately followed by a single whitespace character and produces the following output:
John Doe
47
Hillsboro Road
32000
Note
String incorporates three convenience methods that invoke their equivalent Pattern methods: public boolean matches (String regex), public String [] split (String regex), and public String [] split (String regex, int limit).
Matcher methods
Matcher objects support different kinds of pattern match operations, such as scanning text for the next match, attempting to match the entire text against a pattern, and attempting to match a portion of the text against a pattern. Accomplish those tasks with the following methods:
public boolean find (): scans the text for the next match. This method either starts its scan at the beginning of the text or, if a previous invocation returned true and the matcher has not been reset, at the first character following the previous match. Boolean true returns if a match is found. Listing 1 presents an example.
public boolean find (int start): resets the matcher and scans the text for the next match. The scan begins at the index specified by start. Boolean true returns if a match is found. Example: m.find (1); scans the text beginning at index 1. (Index 0 is ignored.) If start contains a negative value or a value exceeding the length of the matcher's text, this method throws an IndexOutOfBoundsException object.
public boolean matches (): attempts to match the entire text against the pattern. This method returns true if the entire text matches. Example: Pattern p = Pattern.compile ("\\w*"); Matcher m = p.matcher ("abc!"); System.out.println (m.matches ()); outputs false because the entire abc! text does not consist of word characters only.
public boolean lookingAt (): attempts to match the text against the pattern, starting at the beginning of the text. Boolean true returns if a match is found. Unlike matches (), the entire text does not need to be matched.
For example:
    Pattern p = Pattern.compile ("\\w*");
    Matcher m = p.matcher ("abc!");
    System.out.println (m.lookingAt ());
prints true because the beginning of the abc! text consists only of word characters.
Unlike Pattern objects, Matcher objects record state information. Occasionally, you might want to reset a matcher to clear that information after a pattern-matching operation. The following methods reset a matcher:
public Matcher reset(): Resets the matcher's state, including the matcher's append position (which is cleared to 0). The next pattern-matching operation begins at the start of the matcher's text. Returns a reference to the current Matcher object. For example, m.reset (); resets the matcher referenced by m.
public Matcher reset(CharSequence text): Resets the matcher's state and sets the matcher's text to text. The next pattern-matching operation begins at the start of the matcher's new text. Returns a reference to the current Matcher object. For example, m.reset ("new text"); resets the matcher referenced by m and specifies new text as the matcher's new text.
A matcher's append position determines where in the matcher's text the next copy to a StringBuffer object begins. The following methods use the append position:
public Matcher appendReplacement(StringBuffer sb, String replacement): Reads characters from the matcher's text, starting at the append position, and appends them to the StringBuffer object referenced by sb. The method stops reading at the last character preceding the previous pattern match. It then appends the characters in replacement to the StringBuffer object. (The replacement string may include references to text captured during the previous match, via dollar-sign characters ($) and capturing group numbers.) Finally, the method sets the matcher's append position to the index of the last matched character plus one. Returns a reference to the current Matcher object. If the matcher has not yet attempted a match, or the previous match attempt failed, this method throws an IllegalStateException.
An IndexOutOfBoundsException is thrown if replacement refers to a capturing group that does not exist in the pattern.
public StringBuffer appendTail(StringBuffer sb): Appends the matcher's remaining text, from the append position to the end of the text, to the StringBuffer object and returns that object's reference. After the final call to appendReplacement(StringBuffer sb, String replacement), call appendTail(StringBuffer sb) to copy the rest of the text to the StringBuffer object.
The following example calls the appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods to replace every occurrence of cat with caterpillar in one cat, two cats, or three cats on a fence. A capturing group, and a reference to that capturing group in the replacement text, allow erpillar to be inserted after each cat match:
    Pattern p = Pattern.compile ("(cat)");
    Matcher m = p.matcher ("one cat, two cats, or three cats on a fence");
    StringBuffer sb = new StringBuffer ();
    while (m.find ())
        m.appendReplacement (sb, "$1erpillar");
    m.appendTail (sb);
    System.out.println (sb);
This example produces the following output:
    one caterpillar, two caterpillars, or three caterpillars on a fence
Two other replacement methods make it possible to replace either the first match or all matches with replacement text:
public String replaceFirst(String replacement): Resets the matcher, creates a new String object, copies the matcher's text characters up to the first match to that String, appends the replacement characters, copies the remaining characters, and returns the new object's reference. (The replacement string may include references to text captured during the match, via dollar-sign characters ($) and capturing group numbers.)
public String replaceAll(String replacement): Operates similarly to the previous method. However, replaceAll(String replacement) replaces all matches with the replacement characters.
The \s+ regex detects one or more whitespace characters in the text.
The following example uses that regex and calls the replaceAll(String replacement) method to remove duplicate whitespace from text:
    Pattern p = Pattern.compile ("\\s+");
    Matcher m = p.matcher ("Remove the \t\t duplicate whitespace.");
    System.out.println (m.replaceAll (" "));
This example produces the following output:
    Remove the duplicate whitespace.
Listing 1 contains System.out.println ("Found " + m.group ());. Notice the call to group(). That method is one of Matcher's capturing group-oriented methods:
public int groupCount(): Returns the number of capturing groups in the matcher's pattern. That count does not include the special capturing group number 0, which captures the previous match (whether or not the pattern contains capturing groups).
public String group(): Returns the previous match's characters as recorded by capturing group number 0. This method may return an empty string to indicate a successful match against the empty string. If a match has not yet been attempted, or the previous match operation failed, it throws an IllegalStateException.
public String group(int group): Resembles the previous method, except that it returns the previous match's characters as recorded by the capturing group number that group specifies. If no capturing group with that number exists in the pattern, this method throws an IndexOutOfBoundsException.
The following code demonstrates the capturing group methods:
    Pattern p = Pattern.compile ("(.(.(.)))");
    Matcher m = p.matcher ("abc");
    m.find ();
    System.out.println (m.groupCount ());
    for (int i = 0; i <= m.groupCount (); i++)
        System.out.println (i + ": " + m.group (i));
The example produces the following output:
    3
    0: abc
    1: abc
    2: bc
    3: c
Capturing group number 0 saves the previous match and has nothing to do with whether capturing groups appear in the pattern, which here is (.(.(.))). The other three capturing groups capture the characters of the previous match that belong to those groups. For example, capturing group number 2, (.(.)), captures bc, and capturing group number 3, (.), captures c.
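To show the capturing-group methods on a slightly more practical pattern, here is a minimal sketch. The class name GroupDemo, the method dateParts, and the year-month-day pattern are hypothetical names chosen for illustration and are not part of the article's listings; only find(), groupCount(), and group(int) come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical pattern: one capturing group each for year, month, and day.
    private static final Pattern DATE =
        Pattern.compile("(\\d{4})-(\\d{2})-(\\d{2})");

    // Returns {whole match, year, month, day}, or null if text holds no date.
    static String[] dateParts(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.find())
            return null; // group() after a failed find() would throw IllegalStateException
        String[] parts = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++)
            parts[i] = m.group(i); // group 0 is the entire match
        return parts;
    }

    public static void main(String[] args) {
        String[] parts = dateParts("Published on 2003-02-07 by JavaWorld");
        for (int i = 0; i < parts.length; i++)
            System.out.println(i + ": " + parts[i]);
    }
}
```

Checking the result of find() before calling group() avoids the IllegalStateException described above, and looping up to groupCount() inclusive picks up group 0 along with the numbered groups.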
Before we leave our discussion of Matcher's methods, we will examine four match position methods:
public int start(): Returns the start index of the previous match. If a match has not yet been attempted, or the previous match operation failed, this method throws an IllegalStateException.
public int start(int group): Resembles the previous method, except that it returns the previous match's start index for the capturing group that group specifies. If no capturing group with that number exists in the pattern, start(int group) throws an IndexOutOfBoundsException.
public int end(): Returns the index of the last matched character in the previous match, plus one. If a match has not yet been attempted, or the previous match operation failed, this method throws an IllegalStateException.
public int end(int group): Resembles the previous method, except that it returns the previous match's end index for the capturing group that group specifies. If no capturing group with that number exists in the pattern, end(int group) throws an IndexOutOfBoundsException.
The following example demonstrates two of the match position methods, reporting the start and end match positions for capturing group number 2:
    Pattern p = Pattern.compile ("(.(.(.)))");
    Matcher m = p.matcher ("abcabcabc");
    while (m.find ())
    {
        System.out.println ("Found " + m.group (2));
        System.out.println ("  starting at index " + m.start (2) +
                            " and ending at index " + m.end (2));
        System.out.println ();
    }
The example produces the following output:
    Found bc
      starting at index 1 and ending at index 3

    Found bc
      starting at index 4 and ending at index 6

    Found bc
      starting at index 7 and ending at index 9

The output shows that we are interested only in the matches associated with capturing group number 2, along with the starting and ending positions of those matches.
Note
String introduces two convenience methods that call their Matcher equivalents: public String replaceFirst(String regex, String replacement) and public String replaceAll(String regex, String replacement).
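The String convenience methods named in the note can be sketched as follows. The class name StringRegexDemo and its helper methods are hypothetical, chosen only for illustration; the calls themselves (String.replaceAll, String.replaceFirst, Pattern.compile, Matcher.replaceAll) are standard. The sketch reuses the whitespace and cat examples from earlier in this section.

```java
import java.util.regex.Pattern;

class StringRegexDemo {
    // Collapses each run of whitespace into a single space (String shortcut).
    static String squeeze(String text) {
        return text.replaceAll("\\s+", " ");
    }

    // Same operation through an explicit Pattern and Matcher.
    static String squeezeViaMatcher(String text) {
        return Pattern.compile("\\s+").matcher(text).replaceAll(" ");
    }

    // Appends "erpillar" to the first "cat" only, using the $1 group reference.
    static String firstCat(String text) {
        return text.replaceFirst("(cat)", "$1erpillar");
    }

    public static void main(String[] args) {
        System.out.println(squeeze("Remove the \t\t duplicate whitespace."));
        System.out.println(firstCat("one cat, two cats"));
    }
}
```

The String forms compile the regex on every call, so when the same pattern is applied many times (inside a loop over lines, say) it is probably worth compiling a Pattern once and reusing it.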
PatternSyntaxException methods
Pattern's compilation methods throw a PatternSyntaxException object when they detect illegal syntax in a regex's pattern. An exception handler can call the following PatternSyntaxException methods to obtain information about the syntax error from the thrown object:
public String getDescription(): Returns a description of the syntax error.
public int getIndex(): Returns the approximate index within the pattern where the syntax error occurs, or -1 if the index is unknown.
public String getMessage(): Builds a multiline string that contains the information returned by the other three methods, along with a visual indication of the error's position within the pattern.
public String getPattern(): Returns the erroneous regex pattern.
Because PatternSyntaxException inherits from java.lang.RuntimeException, code does not need to specify an exception handler. This proves appropriate when regexes are known to have correct patterns. But when the potential for bad pattern syntax exists, an exception handler is necessary. Thus, RegexDemo's source code (see Listing 1) contains try { ... } catch (PatternSyntaxException e) { ... }, which calls each of the four exception methods to obtain information about an illegal pattern.
What constitutes an illegal pattern? Omitting the closing parenthesis metacharacter in an embedded flag expression is one example. Suppose you execute java RegexDemo (?itree Treehouse. That command line's illegal (?itree pattern causes p = Pattern.compile (args [0]); to throw a PatternSyntaxException. You will see the following output:
    Regex syntax error: Unknown inline modifier near index 3
    (?itree
       ^
    Error description: Unknown inline modifier
    Error index: 3
    Erroneous pattern: (?itree
Note
The public PatternSyntaxException(String desc, String regex, int index) constructor lets you create your own PatternSyntaxException objects. That constructor comes in handy should you ever create your own preprocessing compilation method that recognizes your own pattern syntax, translates that syntax to syntax recognized by Pattern's compilation methods, and calls one of those compilation methods. If your method's caller violates your custom pattern syntax, you can throw an appropriate PatternSyntaxException object from that method.
A practical application of regexes
Regexes let you create powerful text-processing applications. One application you might find helpful extracts comments from a Java, C, or C++ source file and records those comments in another file. Listing 2 presents that application's source code:
Listing 2. ExtCmnt.java
    // ExtCmnt.java
    import java.io.*;
    import java.util.regex.*;
    class ExtCmnt
    {
       public static void main (String [] args)
       {
          if (args.length != 2)
          {
              System.err.println ("usage: java ExtCmnt infile outfile");
              return;
          }
          Pattern p;
          try
          {
              // The following pattern lets this extract multiline comments that
              // appear on a single line (e.g., /* same line */) and single-line
              // comments (e.g., // some line). Furthermore, the comment may
              // appear anywhere on the line.
              p = Pattern.compile (".*/\\*.*\\*/|.*//.*$");
          }
          catch (PatternSyntaxException e)
          {
              System.err.println ("Regex syntax error: " + e.getMessage ());
              System.err.println ("Error description: " + e.getDescription ());
              System.err.println ("Error index: " + e.getIndex ());
              System.err.println ("Erroneous pattern: " + e.getPattern ());
              return;
          }
          BufferedReader br = null;
          BufferedWriter bw = null;
          try
          {
              FileReader fr = new FileReader (args [0]);
              br = new BufferedReader (fr);
              FileWriter fw = new FileWriter (args [1]);
              bw = new BufferedWriter (fw);
              Matcher m = p.matcher ("");
              String line;
              while ((line = br.readLine ()) != null)
              {
                 m.reset (line);
                 if (m.matches ()) /* entire line must match */
                 {
                     bw.write (line);
                     bw.newLine ();
                 }
              }
          }
          catch (IOException e)
          {
              System.err.println (e.getMessage ());
              return;
          }
          finally // Close file.
          {
              try
              {
                  if (br != null)
                      br.close ();
                  if (bw != null)
                      bw.close ();
              }
              catch (IOException e)
              {
              }
          }
       }
    }
After creating Pattern and Matcher objects, ExtCmnt reads the contents of a text file line by line. For each line read, the matcher attempts to match that line against the pattern, which identifies either a single-line comment or a multiline comment that appears on a single line. If the line matches the pattern, ExtCmnt writes it to another text file. For example, java ExtCmnt ExtCmnt.java out reads each line of the ExtCmnt.java file, attempts to match that line against the pattern, and writes each matching line to a file named out. (Don't worry about understanding the file reading and writing logic. I will explore that code in a future article.) After ExtCmnt executes, the out file contains the following lines:
    // ExtCmnt.java
              // The following pattern lets this extract multiline comments that
              // appear on a single line (e.g., /* same line */) and single-line
              // comments (e.g., // some line). Furthermore, the comment may
              // appear anywhere on the line.
              p = Pattern.compile (".*/\\*.*\\*/|.*//.*$");
                 if (m.matches ()) /* entire line must match */
          finally // Close file.
This output shows that ExtCmnt is not perfect: p = Pattern.compile (".*/\\*.*\\*/|.*//.*$"); is not a comment. That line appears in out because ExtCmnt's matcher matches the // characters inside the string literal.
One interesting feature of the pattern ".*/\\*.*\\*/|.*//.*$" is the vertical bar metacharacter (|). According to the SDK documentation, the parentheses metacharacters of a capturing group and the vertical bar metacharacter are logical operators. The vertical bar tells a matcher to use the regex construct on the operator's left side to locate a match in the matcher's text.
If no match exists, the matcher uses the regex construct on the operator's right side in another match attempt.
Review
Although regular expressions simplify pattern-matching code in text-processing programs, you cannot use them effectively in your programs until you understand them. This article gave you a basic understanding of regexes by introducing you to regex terminology, the java.util.regex package, and a program that demonstrates regex constructs. Now that you have that basic understanding, build on it by reading additional articles (see Resources) and studying java.util.regex's SDK documentation, where you can learn about more regex constructs, such as POSIX (Portable Operating System Interface for UNIX) character classes.
I encourage you to send me any questions you have about the material in this or any previous article. (Please keep your questions relevant to the material discussed in this column's articles.) Your questions and my answers will appear in the relevant study guides.
After writing Java 101 articles for 28 consecutive months, I'm taking a two-month break. I'll return in May and introduce a series on data structures and algorithms.
About the author
Jeff Friesen has been involved with computers for the past 23 years. He holds a degree in computer science and has worked with many computer languages. Jeff has also taught introductory Java programming at the college level. In addition to writing for JavaWorld, he has written his own Java book for beginners, Java 2 by Example, Second Edition (Que Publishing, 2001; ISBN: 0789725932), and helped write Using Java 2 Platform, Special Edition (Que Publishing, 2001; ISBN: 0789724685). Jeff goes by the nickname Java Jeff (or JavaJeff). To see what he's working on, check out his Website at http://www.javajeff.com.
Resources
Download this article's source code and resource files: http://www.javaworld.com/javaworld/jw-02-2003/java101/jw-0207-java101.zip
For a glossary specific to this article, homework, and more, see the Java 101 study guide that accompanies this article: http://www.javaworld.com/javaworld/jw-02-2003/jw-0207-java101guide.html
"Magic with Merlin: Parse Sequences of Characters with the New Regex Library," John Zukowski (IBM developerWorks, August 2002) explores java.util.regex's support for pattern matching and presents a complete example that finds the longest word in a text file: http://www-106.ibm.com/developerworks/java/library/j-mer0827/
"Matchmaking with Regular Expressions," Benedict Chng (JavaWorld, July 2001) explores regexes in the context of Apache's Jakarta ORO library: http://www.javaworld.com/javaworld/jw-07-2001/jw-0713-Regex.html
"Regular Expressions and the Java Programming Language," Dana Nourie and Mike McCloskey (Sun Microsystems, August 2002) presents a brief overview of java.util.regex, including five illustrative regex-based applications: http://developer.java.sun.com/developer/TechnicalArticles/Releases/1.4Regex/
In "The Java Platform" (onJava.com), an excerpt from Chapter 4 of O'Reilly's Java in a Nutshell, 4th Edition, David Flanagan presents short examples of CharSequence and java.util.regex methods: http://www.onjava.com/pub/a/onjava/excerpt/javanut4_CH04
The Java Tutorial's "Regular Expressions" lesson teaches the basics of Sun's java.util.regex package: http://java.sun.com/docs/books/tutorial/extra/regex/index.html
Wikipedia defines some regex terminology, presents a brief history of regexes, and explores various regex syntaxes: http://www.wikipedia.org/wiki/Regular_expression
Read Jeff's previous Java 101 column, "Tools of the Trade, Part 3" (JavaWorld, January 2003): http://www.javaworld.com/javaworld/jw-01-2003/jw-0103-java101.html
Check out past Java 101 articles: http://www.javaworld.com/javaworld/topicalindex/jw-ti-java101.html
Browse the Core Java section of JavaWorld's Topical Index: http://www.javaworld.com/channel_content/jw-core-index.shtml
Need some Java help? Visit our Java Beginner discussion: http://forums.devworld.com/webx?50@.ee6b804
Java experts answer your toughest Java questions in JavaWorld's Java Q&A column: http://www.javaworld.com/javaworld/javaqa/javaqa-index.html
For Tips 'N Tricks, see: http://www.javaworld.com/javaworld/javatips/jw-javatips.index.html
Sign up for JavaWorld's free weekly Core Java email newsletter: http://www.javaworld.com/subscribe
You'll find a wealth of IT-related articles from our sister publications at IDG.net
Author Blog: http://blog.9cbs.net/blue2993/