Jakarta Regexp: a Java package for regular expressions

xiaoxiao, 2021-03-06

Although Apache regards Jakarta ORO as the more complete regular expression package, Regexp is also widely used, probably because of its simplicity. These are my study notes on Regexp.

1. Download and installation

Download the source code from CVS:

    cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic login
    password: anoncvs
    cvs -d :pserver:anoncvs@cvs.apache.org:/home/cvspublic checkout jakarta-regexp

Or download the compiled package with wget:

http://apache.linuxForum.Net/dist/jakarta/regexp/binaries/jakarta-regExp-1.3.tar.gz

2. Basics

1) Regexp is a 100% pure Java regular expression package that Jonathan Locke donated to the Apache Software Foundation. He originally developed it in 1996, and it has stood the test of time well :). It includes complete Javadoc documentation, as well as a simple applet for visual debugging and compatibility testing.

2) The RE class

RE is a very important class in the regexp package: an efficient, lightweight regular expression evaluator/matcher. RE is short for Regular Expression. A regular expression is a template that can describe complex string patterns; when a string matches the template, you can extract the matching parts, which is very useful for text parsing. The syntax is covered below.

To compile a regular expression, simply construct an RE matcher object with the pattern string as its argument. You can then call any of the RE.match methods to match a string; they return true if the match succeeds and false if it fails. For example:

    RE r = new RE("a*b");
    boolean matched = r.match("aaaab");

RE.getParen retrieves the matched character sequence, or a part of it (if the pattern contains the corresponding parentheses); the position, length and so on of each part are available through related accessors.

For example:

    RE r = new RE("(a*)b");                   // compile expression
    boolean matched = r.match("xaaaab");      // match against "xaaaab"
    String wholeExpr = r.getParen(0);         // wholeExpr will be 'aaaab'
    String insideParens = r.getParen(1);      // insideParens will be 'aaaa'
    int startWholeExpr = r.getParenStart(0);  // startWholeExpr will be index 1
    int endWholeExpr = r.getParenEnd(0);      // endWholeExpr will be index 6
    int lenWholeExpr = r.getParenLength(0);   // lenWholeExpr will be 5
    int startInside = r.getParenStart(1);     // startInside will be index 1
    int endInside = r.getParenEnd(1);         // endInside will be index 5
    int lenInside = r.getParenLength(1);      // lenInside will be 4

RE also supports backreferences. For example, ([0-9]+)=\1 matches strings of the form n=n (such as 0=0 or 2=2).

3) RE supports the following regular expression syntax:

Characters

    unicodeChar    Matches any identical unicode character
    \              Used to quote a meta-character (like '*')
    \\             Matches a single '\' character
    \0nnn          Matches a given octal character
    \xhh           Matches a given 8-bit hexadecimal character
    \uhhhh         Matches a given 16-bit hexadecimal character
    \t             Matches an ASCII tab character
    \n             Matches an ASCII newline character
    \r             Matches an ASCII return character
    \f             Matches an ASCII form feed character

Character classes

    [abc]          Simple character class
    [a-zA-Z]       Character class with ranges
    [^abc]         Negated character class

Standard POSIX character classes

    [:alnum:]      Alphanumeric characters.
    [:alpha:]      Alphabetic characters.
    [:blank:]      Space and tab characters.
    [:cntrl:]      Control characters.
    [:digit:]      Numeric characters.
    [:graph:]      Characters that are printable and are also visible. (A space is printable, but not visible, while an `a' is both.)
    [:lower:]      Lower-case alphabetic characters.
    [:print:]      Printable characters (characters that are not control characters).
    [:punct:]      Punctuation characters (characters that are not letters, digits, control characters, or space characters).
    [:space:]      Space characters (such as space, tab, and form feed).
    [:upper:]      Upper-case alphabetic characters.
    [:xdigit:]     Characters that are hexadecimal digits.

Non-standard POSIX-style character classes

    [:javastart:]  Start of a Java identifier
    [:javapart:]   Part of a Java identifier

Predefined classes

    .              Matches any character other than newline
    \w             Matches a "word" character (alphanumeric plus "_")
    \W             Matches a non-word character
    \s             Matches a whitespace character
    \S             Matches a non-whitespace character
    \d             Matches a digit character
    \D             Matches a non-digit character

Boundary matchers

    ^              Matches only at the beginning of a line
    $              Matches only at the end of a line
    \b             Matches only at a word boundary
    \B             Matches only at a non-word boundary

Greedy closures

    A*             Matches A 0 or more times (greedy)
    A+             Matches A 1 or more times (greedy)
    A?             Matches A 1 or 0 times (greedy)
    A{n}           Matches A exactly n times (greedy)
    A{n,}          Matches A at least n times (greedy)
    A{n,m}         Matches A at least n but not more than m times (greedy)

Reluctant closures

    A*?            Matches A 0 or more times (reluctant)
    A+?            Matches A 1 or more times (reluctant)
    A??            Matches A 0 or 1 times (reluctant)

Logical operators

    AB             Matches A followed by B
    A|B            Matches either A or B
    (A)            Used for subexpression grouping
    (?:A)          Used for subexpression clustering (just like grouping but no backrefs)

Backreferences

    \1             Backreference to 1st parenthesized subexpression
    \2             Backreference to 2nd parenthesized subexpression
    \3             Backreference to 3rd parenthesized subexpression
    \4             Backreference to 4th parenthesized subexpression
    \5             Backreference to 5th parenthesized subexpression
    \6             Backreference to 6th parenthesized subexpression
    \7             Backreference to 7th parenthesized subexpression
    \8             Backreference to 8th parenthesized subexpression
    \9             Backreference to 9th parenthesized subexpression

RE runs programs compiled by the RECompiler class. For reasons of efficiency, the RE matcher class does not include the regular expression compiler itself. In fact, if you want to precompile one or more regular expressions, you can run the 'recompile' class from the command line, for example:

    java org.apache.regexp.recompile a*b

This produces the following compiled output (the last line is not part of the output):

    // Pre-compiled regular expression "a*b"
    char[] re1Instructions =
    {
        0x007c, 0x0000, 0x001a, 0x007c, 0x0000, 0x000d, 0x0041,
        0x0001, 0x0004, 0x0061, 0x007c, 0x0000, 0x0003, 0x0047,
        0x0000, 0xfff6, 0x007c, 0x0000, 0x0003, 0x004e, 0x0000,
        0x0003, 0x0041, 0x0001, 0x0004, 0x0062, 0x0045, 0x0000,
        0x0000,
    };

    REProgram re1 = new REProgram(re1Instructions);

    RE r = new RE(re1);

Constructing the RE matcher object from the precompiled expression avoids the cost of compiling at run time. If you need to build expressions dynamically, you can create a separate RECompiler object and use it to compile each expression. Note that RE and RECompiler are not thread-safe (for efficiency), so you need to create a separate compiler and matcher for each thread.
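The backreference and greedy/reluctant rows of the syntax table are easier to judge with a few concrete inputs. The following is a minimal sketch, not Regexp code: so that it runs without the jakarta-regexp jar, it uses the JDK's java.util.regex, which accepts the same syntax for these particular constructs. The class name SyntaxSketch and the helper firstMatch are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for org.apache.regexp here.
final class SyntaxSketch {

    // Returns the first substring of input matched by regex, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Backreference: \1 must repeat exactly what group 1 captured.
        System.out.println(firstMatch("([0-9]+)=\\1", "2=2"));  // 2=2
        System.out.println(firstMatch("([0-9]+)=\\1", "2=3"));  // null

        // A greedy closure takes as much as it can, a reluctant one as little as it can.
        System.out.println(firstMatch("<.*>", "<a><b>"));   // <a><b>
        System.out.println(firstMatch("<.*?>", "<a><b>"));  // <a>
    }
}
```

With Regexp itself, the corresponding calls would presumably be new RE(pattern), r.match(input) and r.getParen(0), as in the examples earlier in these notes.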
3. Examples

1) The regexp package includes a demo applet, which can be run as follows:

    java org.apache.regexp.REDemo

2) Jeffer Hunter wrote an example that you can download.

3) Regexp's own test program is also a useful reference. It keeps all the regular expressions, their input strings and the expected results in a separate file, $REGEXPHOME/docs/RETest.txt.
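The idea behind that test file, keeping patterns, inputs and expected results as data and looping over them, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses the JDK's java.util.regex instead of Regexp so that it runs without extra jars, the cases are invented for this sketch rather than taken from RETest.txt, and the YES/NO column is a hypothetical layout, not the real file format.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex stands in for org.apache.regexp, and the
// cases below are made up for this sketch; they are not read from RETest.txt.
final class TableDrivenSketch {

    // Each row: pattern, input string, and whether a match is expected ("YES" or "NO").
    static final String[][] CASES = {
        {"a*b",          "aaaab",  "YES"},
        {"(a*)b",        "xaaaab", "YES"},
        {"([0-9]+)=\\1", "2=3",    "NO"},
        {"^abc$",        "abcd",   "NO"},
    };

    // Returns the number of rows whose actual outcome differs from the expected one.
    static int countFailures(String[][] cases) {
        int failures = 0;
        for (String[] c : cases) {
            boolean found = Pattern.compile(c[0]).matcher(c[1]).find();
            boolean expected = c[2].equals("YES");
            if (found != expected) {
                failures++;
                System.out.println("FAIL: /" + c[0] + "/ on \"" + c[1] + "\"");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures(CASES));
    }
}
```

Keeping the cases as data means a new pattern can be checked by adding one row, without touching the loop.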

When reposting, please cite the original address: https://www.9cbs.com/read-55744.html
