Regular expression
Summary
A regular expression is a tool for pattern matching and replacement. It lets the user build a matching pattern from a series of special characters, compare that pattern against a target string or file, and run the corresponding code depending on whether the target contains the pattern. Regular expressions originated on UNIX systems and now appear in many scripting languages, including PHP, Perl, and JavaScript. On the Web, they are most commonly used to check whether an e-mail address entered by a user is valid. (2002-09-02 12:29:29)
By Wing, Source:
Bird
Regular expression introduction
A regular expression is a tool for pattern matching and replacement. It lets the user build a matching pattern from a series of special characters, compare that pattern against the string or file to be checked, and run the corresponding code depending on whether the target contains the pattern. Regular expressions originated on UNIX systems and now appear in many scripting languages, including PHP, Perl, and JavaScript. On the Web, they are most commonly used to check whether an e-mail address entered by a user is valid.
Regular expression syntax
Character  Description
\  Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(".
^  Matches the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$  Matches the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*  Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+  Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?  Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or the "do" in "does". ? is equivalent to {0,1}.
{n}  n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}  n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}  m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the two numbers.
?  When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, against the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.  Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)  Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)  Matches pattern but does not capture the match; the result is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more compact expression than 'industry|industries'.
(?=pattern)  Positive lookahead. Matches the searched string at any point where a string matching pattern begins. This is a non-capturing match; the result is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)  Negative lookahead. Matches the searched string at any point where a string not matching pattern begins. This is a non-capturing match; the result is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". A lookahead does not consume characters: after a match, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y  Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]  Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]  Negative character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]  Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.
[^a-z]  Negative character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not between 'a' and 'z'.
\b  Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never", but not the 'er' in "verb".
\B  Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
\cx  Matches the control character indicated by x. For example, \cM matches Control-M, the carriage return. The value of x must be in A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d  Matches a digit. Equivalent to [0-9].
\D  Matches a non-digit character. Equivalent to [^0-9].
\f  Matches a form feed. Equivalent to \x0c and \cL.
\n  Matches a newline. Equivalent to \x0a and \cJ.
\r  Matches a carriage return. Equivalent to \x0d and \cM.
\s  Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S  Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t  Matches a tab. Equivalent to \x09 and \cI.
\v  Matches a vertical tab. Equivalent to \x0b and \cK.
\w  Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W  Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn  Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num  Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n (n a digit)  Identifies an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm  Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml  If n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape value nml.
\un  Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
Regular expression examples
Regular expressions in a forum (implemented in Java)
// Replace the first occurrence of rep1 in str, searching from intBegin, with rep2
public static String replStr(String str, int intBegin, String rep1, String rep2) {
    try {
        // Locate rep1 once so that both substring calls use the same index
        int pos = str.indexOf(rep1, intBegin);
        if (pos != -1)
            str = str.substring(0, pos) + rep2 + str.substring(pos + rep1.length());
    }
    catch (Exception e) {}
    return str;
}
// Replace [href] [/href] in the string
public static String replStrHref(String str) {
    try {
        if (str.indexOf("[href") != -1) { // test
            if (str.indexOf("[/href]", str.indexOf("[href]")) == -1)
                // No [/href] follows [href], so return the string unchanged
                return str;
            else {
                if (str.charAt(str.indexOf("[href") + 5) == ']') {
                    // Check whether the first four characters of the link address are "http"
                    // (case-insensitive); if not, automatically add "http://"
                    // The replacement strings below are empty in the source text
                    if (!str.substring(str.indexOf("[href]") + 6, str.indexOf("[href]") + 10)
                            .equalsIgnoreCase("http"))
                        str = Replace.replStr(str, 0, "[href]", "");
                    else
                        str = Replace.replStr(str, 0, "[href]", "");
                }
                if (str.charAt(str.indexOf("[href") + 5) == '=') {
                    str = Replace.replStr(str, str.indexOf("[href]"), "]", ">");
                    if (!str.substring(str.indexOf("[href=") + 6, str.indexOf("[href=") + 10)
                            .equalsIgnoreCase("http"))
                        str = Replace.replStr(str, 0, "[href=", "");
                }
            }
        }
    }
    catch (Exception e) {}
    return str;
}
// Replace all carriage returns in the string with "<br>"
public static String replStrBr(String str) {
    int length = 0;
    try {
        String beginStr = "";
        // Character code 13 is the carriage return
        while (str.indexOf(13, length) != -1) {
            beginStr = str.substring(0, str.indexOf(13, length));
            str = str.substring(0, str.indexOf(13, length)) + "<br>"
                    + str.substring(str.indexOf(13, length) + 1);
            // Resume the search just past the inserted "<br>" (4 characters)
            length = beginStr.length() + 4;
        }
    }
    catch (Exception e) {}
    return str;
}
// Replace [img] [/img] in the string
public static String replStrImg(String str) {
    try {
        if (str.indexOf("[img]") != -1) {
            if (str.indexOf("[/img]") == -1)
                return str;
            else {
                // Check whether the first four characters of the link address are "http"
                // (case-insensitive); if not, automatically add "http://"
                if (!str.substring(str.indexOf("[img]") + 5, str.indexOf("[img]") + 9)
                        .equalsIgnoreCase("http"))
                    str = Replace.replStr(str, 0, "[img]", "");
            }
        }
    }
    catch (Exception e) {}
    return str;
}
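The forum helpers above find their tags with indexOf and substring. As a rough sketch, the same conversion could be written with java.util.regex instead. The class name ForumMarkup, the method names, and the output markup (<a href>, <img src>, <br>) are assumptions made for illustration, because the listing's own replacement strings are blank in the source. Unlike the listing, this sketch converts every tag pair rather than only the first.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ForumMarkup {
    // (.+?) is non-greedy, so each match stops at the first closing tag.
    private static final Pattern HREF =
            Pattern.compile("\\[href\\](.+?)\\[/href\\]", Pattern.CASE_INSENSITIVE);
    private static final Pattern IMG =
            Pattern.compile("\\[img\\](.+?)\\[/img\\]", Pattern.CASE_INSENSITIVE);
    private static final Pattern STARTS_WITH_HTTP =
            Pattern.compile("^http", Pattern.CASE_INSENSITIVE);

    // Add "http://" unless the address already starts with "http" (any case).
    static String withScheme(String url) {
        return STARTS_WITH_HTTP.matcher(url).find() ? url : "http://" + url;
    }

    // Replace every match of tag with template, where %1$s is the address.
    static String replaceTag(String str, Pattern tag, String template) {
        Matcher m = tag.matcher(str);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String html = String.format(template, withScheme(m.group(1)));
            // quoteReplacement keeps '$' and '\' in the address literal.
            m.appendReplacement(out, Matcher.quoteReplacement(html));
        }
        m.appendTail(out);
        return out.toString();
    }

    static String toHtml(String str) {
        // The output markup is an assumption, not taken from the listing.
        str = replaceTag(str, HREF, "<a href=\"%1$s\">%1$s</a>");
        str = replaceTag(str, IMG, "<img src=\"%1$s\">");
        return str.replace("\r", "<br>"); // carriage return (char 13) to <br>
    }

    public static void main(String[] args) {
        System.out.println(
                toHtml("See [href]example.com[/href]\r[img]http://example.com/a.png[/img]"));
    }
}
```

An opening tag without its closing tag is left untouched, as in the listing. The sketch does not escape HTML inside the address, which a real forum would need to do.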
Regular expressions in Linux commands
Regular expressions originated on UNIX and are used widely on Linux systems. For example, to find the lines in the file 'file.php' that contain the string 'HTML', use the following command:
# grep 'HTML' file.php
To find the lines in 'file.php' whose first character is '<', use the following command:
# grep '^<' file.php
To find files under the current directory whose name is 'http', use:
# find -name 'http'
To list only the directories, use the following command:
# ls -l | grep '^d'
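The anchored patterns used with grep above behave the same way in Java for simple cases like these. The sketch below filters lines in a similar spirit; the class name GrepSketch, the helper grep, and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class GrepSketch {
    // Return the lines of text that contain a match for regex.
    static List<String> grep(String regex, String text) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String line : text.split("\n")) {
            // find() looks for a match anywhere in the line; ^ anchors it to the start.
            if (p.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String page = "<html>\nHTML body text\n  <p>indented</p>\n";
        System.out.println(grep("HTML", page)); // lines containing "HTML"
        System.out.println(grep("^<", page));   // lines whose first character is '<'

        String listing = "drwxr-xr-x 2 user user 4096 docs\n"
                + "-rw-r--r-- 1 user user  120 file.php\n";
        System.out.println(grep("^d", listing)); // directory entries only
    }
}
```

Because each line is matched separately, ^ needs no multiline flag here. Matching is case-sensitive, as it is for grep without the -i option.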
Regular expressions in Web applications
On the Web, regular expressions are most commonly used to check whether an e-mail address is valid, for example:
[0-9A-Za-z_]+@[0-9A-Za-z_]+\.[0-9A-Za-z_]{2,3}
To check whether an IP address is valid, for example:
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
Here "\d" matches a digit, "{1,3}" means one to three of them, and "\." matches a literal dot. Note: to use the regular expressions above in PHP or other languages, modify them accordingly.
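As one example of such a modification, here is a minimal Java sketch of the e-mail and IP address checks described above. It assumes the patterns carry the '+' quantifiers and '\' escapes they need, and doubles each backslash because the patterns sit inside Java string literals. The class name InputCheck and its method names are hypothetical.

```java
import java.util.regex.Pattern;

class InputCheck {
    // One or more word-like characters, '@', a name, a literal dot, then 2-3 characters.
    static final Pattern EMAIL =
            Pattern.compile("[0-9A-Za-z_]+@[0-9A-Za-z_]+\\.[0-9A-Za-z_]{2,3}");
    // Four groups of one to three digits separated by literal dots.
    static final Pattern IP =
            Pattern.compile("\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}");

    // matches() succeeds only if the whole input fits the pattern.
    static boolean looksLikeEmail(String s) {
        return EMAIL.matcher(s).matches();
    }

    static boolean looksLikeIp(String s) {
        return IP.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("wing@example.com"));
        System.out.println(looksLikeEmail("wing@example"));
        System.out.println(looksLikeIp("192.168.0.1"));
        System.out.println(looksLikeIp("192x168x0x1"));
    }
}
```

These patterns check shape only. The IP pattern accepts values such as 999.1.1.1 because it does not limit each group to 0-255, and the e-mail pattern rejects addresses with dots or hyphens in the name or with more than one dot after the '@'.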