Regular expression basics

xiaoxiao, 2021-04-07

A regular expression is a text pattern composed of ordinary characters (such as the letters a to z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. The regular expression acts as a template, matching a character pattern against the string being searched. For example:

JScript               VBScript            Matches
/^[ \t]*$/            "^[ \t]*$"          A blank line.
/\d{2}-\d{5}/         "\d{2}-\d{5}"       An ID number made up of 2 digits, a hyphen, and 5 digits.
/<(.*)>.*<\/\1>/      "<(.*)>.*<\/\1>"    An HTML tag.
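The three patterns in the table above can be tried out directly. The sketch below is only an illustration: it uses modern JavaScript as a stand-in for JScript, it assumes the patterns are the backslash forms shown in the table, and the sample strings are made up.

```javascript
// The three patterns from the table, written as JavaScript regex literals.
const blankLine = /^[ \t]*$/;
const idNumber = /\d{2}-\d{5}/;
const htmlTag = /<(.*)>.*<\/\1>/;

// Hypothetical sample inputs, chosen only for illustration.
const blankOk = blankLine.test(" \t ");          // true: only spaces and tabs
const blankNo = blankLine.test("  x");           // false: contains a letter
const idOk = idNumber.test("12-34567");          // true
const idNo = idNumber.test("1-234567");          // false: only one digit before the hyphen
const tag = "<b>bold</b>".match(htmlTag);        // \1 must repeat the captured tag name
const tagName = tag === null ? null : tag[1];    // "b"
const tagMismatch = htmlTag.test("<b>bold</i>"); // false: the closing tag differs

console.log(blankOk, blankNo, idOk, idNo, tagName, tagMismatch);
```

Note that the ID pattern is not anchored, so it also finds a match inside a longer string such as "123-456789". Adding ^ and $ around it would be one way to validate a whole string.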

The table below is a complete list of metacharacters and their behavior in the context of regular expressions:

Character     Description
\             Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(".
^             Matches the position at the start of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$             Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*             Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+             Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?             Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or the "do" in "does". ? is equivalent to {0,1}.
{n}           n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}          n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}         m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?             When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.             Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)     Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0 ... $9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)   Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a shorter way of writing 'industry|industries'.
(?=pattern)   Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000", but not "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)   Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1", but not "Windows" in "Windows 2000". A lookahead does not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y           Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]         A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]        A negative character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]         A character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter in the range 'a' to 'z'.
[^a-z]        A negative character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.
\b            Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never", but not the 'er' in "verb".
\B            Matches a non-word boundary. 'er\B' matches the 'er' in "verb", but not the 'er' in "never".
\cx           Matches the control character indicated by x. For example, \cM matches a Control-M, or carriage return, character. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d            Matches a digit character. Equivalent to [0-9].
\D            Matches a non-digit character. Equivalent to [^0-9].
\f            Matches a form feed. Equivalent to \x0c and \cL.
\n            Matches a newline. Equivalent to \x0a and \cJ.
\r            Matches a carriage return. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S            Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t            Matches a tab. Equivalent to \x09 and \cI.
\v            Matches a vertical tab. Equivalent to \x0b and \cK.
\w            Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W            Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn           Matches n, where n is a hexadecimal escape value. A hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num          Matches num, where num is a positive integer. A backreference to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n            Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm           Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds and n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml          Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un           Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

Let's take a few examples:

"^the": matches any string that starts with "the" ("there", "the cat", etc.);
"of despair$": matches any string that ends with "of despair";
"^abc$": matches a string that both starts and ends with "abc", which can only be "abc" itself;
"notice": matches any string that contains "notice".
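The anchors in the examples above, like the lookahead entries in the table, match positions rather than characters. The following is a minimal sketch of that behavior in JavaScript, used here as a stand-in for JScript. The sample strings are invented, and the `m` flag is assumed to correspond to the Multiline property the table mentions.

```javascript
// Anchor examples from the text; the sample strings are made up.
const startsWithThe = /^the/.test("there");                    // true
const theNotAtStart = /^the/.test("in the cat");               // false
const endsWithDespair = /of despair$/.test("full of despair"); // true
const exactlyAbc = /^abc$/.test("abc");                        // true
const abcTwice = /^abc$/.test("abcabc");                       // false
const containsNotice = /notice/.test("did you notice?");       // true

// With the `m` flag, ^ and $ also match next to line breaks.
const lineStartDefault = /^cat/.test("the\ncat");    // false
const lineStartMultiline = /^cat/m.test("the\ncat"); // true

// Lookahead checks what follows without consuming it,
// so the matched text stops after "Windows ".
const lookahead = /Windows (?=95|98|NT|2000)/;
const matched = "Windows 2000".match(lookahead)[0];                  // "Windows "
const positiveMiss = lookahead.test("Windows 3.1");                  // false
const negativeHit = /Windows (?!95|98|NT|2000)/.test("Windows 3.1"); // true

console.log(startsWithThe, exactlyAbc, lineStartMultiline, matched, negativeHit);
```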
The three symbols '*', '+', and '?' indicate how many times a character or a sequence of characters may be repeated. They mean "zero or more", "one or more", and "zero or one", respectively. Here are a few examples:

"ab*": matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", ...);
"ab+": matches a string that has an a followed by one or more b's;
"ab?": matches a string that has an a followed by zero or one b;
"a?b+$": matches zero or one a followed by one or more b's at the end of the string.
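To see what each quantifier actually captures, the sketch below returns the first match of each pattern. `firstMatch` is a hypothetical helper written for this illustration, the inputs are made up, and the last two lines show the greedy versus non-greedy behavior described in the metacharacter table.

```javascript
// Hypothetical helper: return the first match of `pattern` in `text`, or null.
function firstMatch(pattern, text) {
  const m = text.match(pattern);
  return m === null ? null : m[0];
}

const starMany = firstMatch(/ab*/, "abbbc");     // "abbb": takes every b it can
const starNone = firstMatch(/ab*/, "ac");        // "a": zero b's is allowed
const plusNone = firstMatch(/ab+/, "ac");        // null: at least one b is required
const optional = firstMatch(/ab?/, "abbb");      // "ab": at most one b
const atEnd = firstMatch(/a?b+$/, "caabb");      // "abb"

// Greedy versus non-greedy, as described in the table.
const greedy = firstMatch(/o+/, "oooo");         // "oooo"
const nonGreedy = firstMatch(/o+?/, "oooo");     // "o"

console.log(starMany, starNone, plusNone, optional, atEnd, greedy, nonGreedy);
```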

You can also use bounds, written inside braces, to specify a range of repetitions.

"ab{2}": matches a string that has an a followed by exactly two b's ("abb");
"ab{2,}": matches a string that has an a followed by at least two b's;
"ab{3,5}": matches a string that has an a followed by three to five b's.

Note that you must always specify the lower bound of a range (for example "{0,2}", not "{,2}"). Also, as you may have noticed, '*', '+', and '?' correspond to "{0,}", "{1,}", and "{0,1}". There is also the '|' symbol, which works as an "or" operator:

"hi|hello": matches a string that contains "hi" or "hello";
"(b|cd)ef": matches a string that contains "bef" or "cdef";
"(a|b)*c": matches a string that has a mixed sequence of a's and b's followed by a c.
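The bounds and alternation examples above can be checked with a few sample strings. This is a sketch under some assumptions: `matching` is a hypothetical helper, the inputs are invented, the patterns are anchored with ^ and $ so that whole strings are tested, and the `{,2}` line reflects JavaScript behavior without the `u` flag, which may differ in other engines.

```javascript
// Hypothetical helper: keep only the inputs that the pattern matches.
function matching(pattern, inputs) {
  return inputs.filter((s) => pattern.test(s));
}

const samples = ["ab", "abb", "abbb", "abbbbbb"];
const exactlyTwo = matching(/^ab{2}$/, samples);    // ["abb"]
const atLeastTwo = matching(/^ab{2,}$/, samples);   // ["abb", "abbb", "abbbbbb"]
const threeToFive = matching(/^ab{3,5}$/, samples); // ["abbb"]

// Without a lower bound, "{,2}" is read as literal text, not as a quantifier.
const missingLowerBound = /^ab{,2}$/.test("abb");   // false

const hiOrHello = /hi|hello/.test("say hello");                       // true
const befOrCdef = matching(/^(b|cd)ef$/, ["bef", "cdef", "bcdef"]);   // ["bef", "cdef"]
const mixedThenC = matching(/^(a|b)*c$/, ["c", "abbac", "abdc"]);     // ["c", "abbac"]

// A capturing group adds an entry to the match result; (?:...) does not.
const capturing = "industries".match(/industr(y|ies)/).length;        // 2
const nonCapturing = "industries".match(/industr(?:y|ies)/).length;   // 1

console.log(exactlyTwo, atLeastTwo, threeToFive, befOrCdef, mixedThenC);
```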

'.' stands for any single character:

"a.[0-9]": matches a string that has an a followed by any one character and a digit;
"^.{3}$": matches a string of exactly three characters.
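A short sketch of the two dot patterns above, again in JavaScript with made-up inputs. The last two lines illustrate the table's note that '.' does not match a newline, using '[\s\S]' as one common way to match any character at all.

```javascript
const aAnyDigit = /a.[0-9]/.test("ax7");   // true: a, any character, digit
const aDigitOnly = /a.[0-9]/.test("a7");   // false: nothing left for the digit
const threeChars = /^.{3}$/.test("abc");   // true
const fourChars = /^.{3}$/.test("abcd");   // false

// '.' skips newlines, so a three-character string containing "\n" fails.
const dotVsNewline = /^.{3}$/.test("a\nc");             // false
const anyIncludingNewline = /^[\s\S]{3}$/.test("a\nc"); // true

console.log(aAnyDigit, aDigitOnly, threeChars, fourChars, dotVsNewline, anyIncludingNewline);
```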

Square brackets specify which characters are allowed at a particular position in a string:

"[ab]": matches a string that contains an "a" or a "b" (equivalent to "a|b");
"[a-d]": matches a string that contains one of the lowercase letters 'a' to 'd' (equivalent to "a|b|c|d" or "[abcd]");
"^[a-zA-Z]": matches a string that starts with a letter;
"[0-9]%": matches a string that has a digit in front of a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a letter or a digit.

You can also list the characters that you do not want to appear by placing '^' as the first character inside the square brackets (for example, "%[^a-zA-Z]%" matches a string in which a character that is not a letter sits between two percent signs).
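The bracket expressions above, including the negated form, behave as follows in JavaScript. This is just an illustrative sketch and the sample strings are assumptions.

```javascript
const aOrB = /[ab]/.test("cab");                      // true
const aToD = /[a-d]/.test("xyz");                     // false
const startsWithLetter = /^[a-zA-Z]/.test("Hello");   // true
const startsWithDigit = /^[a-zA-Z]/.test("1abc");     // false
const digitPercent = /[0-9]%/.test("50% off");        // true
const commaAlnumEnd = /,[a-zA-Z0-9]$/.test("a,b");    // true
const commaAtEnd = /,[a-zA-Z0-9]$/.test("a,b,");      // false

// '^' as the first character inside the brackets negates the set.
const nonLetterBetween = /%[^a-zA-Z]%/.test("%5%");   // true
const letterBetween = /%[^a-zA-Z]%/.test("%a%");      // false

console.log(aOrB, aToD, startsWithLetter, digitPercent, nonLetterBetween, letterBetween);
```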

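To match any of the characters "^.[$()|*+?{\" literally, you must put the escape character '\' in front of it.

Note that inside square brackets most of these characters lose their special meaning and do not need to be escaped; only '\', ']', '^' (when it comes first), and '-' (between two characters) still do.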
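One way to apply the escaping rule above is a small helper that puts a backslash in front of every special character before the text is used as a pattern. `escapeRegExp` is a hypothetical name for this sketch, not a built-in function, and the sample strings are made up.

```javascript
// Hypothetical helper: escape every character that is special outside brackets.
function escapeRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

const literal = "1+1=2?";
const escaped = escapeRegExp(literal);                   // 1\+1=2\?

// Unescaped, '+' and '?' act as quantifiers, so the text does not match itself.
const unescapedMatch = new RegExp(literal).test(literal); // false
const escapedMatch = new RegExp(escaped).test(literal);   // true

// Inside square brackets the same characters are literal without a backslash.
const plusInBrackets = /[.*+?]/.test("a+b");              // true
const dotInBrackets = /[.*+?]/.test("abc");               // false: '.' is literal here

console.log(escaped, unescapedMatch, escapedMatch, plusInBrackets, dotInBrackets);
```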

When reposting, please cite the original address: https://www.9cbs.com/read-132467.html
