Regular expressions: general introduction and syntax

xiaoxiao2021-03-06 126

1. With regular expressions, you can:

Test for a pattern within a string. For example, you can test an input string to see whether a particular pattern, such as a credit card number pattern, occurs in it. This is called data validation.

Replace text. You can use a regular expression to identify specific text in a document and then either delete it or replace it with other text.

Extract a substring from a string based on a pattern match. You can use this to find specific text within a document or an input field.

2. Detailed regular expression syntax

A regular expression is a text pattern made up of ordinary characters (for example, the letters a through z) and special characters known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression acts as a template for matching a character pattern against the string being searched.

Here are some regular expressions you may encounter:

JScript               VBScript              Matches
/^[ \t]*$/            "^[ \t]*$"            A blank line.
/\d{2}-\d{5}/         "\d{2}-\d{5}"         An ID number consisting of 2 digits, a hyphen, and 5 digits.
/<(.*)>.*<\/\1>/      "<(.*)>.*<\/\1>"      An HTML tag.
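As a rough illustration, the three patterns in the table can be tried out with the `test` method of a regular expression object. The sketch below is written in modern JavaScript rather than JScript, and the sample strings and variable names are assumptions chosen only for the example.

```javascript
// Patterns from the table above, written as JavaScript regex literals.
const blankLine = /^[ \t]*$/;      // nothing but spaces and tabs
const idNumber = /\d{2}-\d{5}/;    // 2 digits, a hyphen, 5 digits
const htmlTag = /<(.*)>.*<\/\1>/;  // opening tag, content, matching closing tag

// Hypothetical sample inputs, for illustration only.
const results = {
  blank: blankLine.test(" \t "),               // true
  notBlank: blankLine.test(" x "),             // false
  id: idNumber.test("12-34567"),               // true
  badId: idNumber.test("1-34567"),             // false
  tag: htmlTag.test("<b>bold</b>"),            // true
  mismatchedTag: htmlTag.test("<b>bold</i>"),  // false
};
console.log(results);
```

The HTML pattern relies on the backreference \1, so the closing tag has to repeat whatever the first pair of parentheses captured.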

The table below is a complete list of metacharacters and their behavior in the context of regular expressions:

Character    Description
\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^    Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following '\n' or '\r'.
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'.
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
{n}    n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?    When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.    Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match parenthesis characters, use '\(' or '\)'.
(?:pattern)    Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)    Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not captured for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not captured for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.
x|y    Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]    A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]    A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]    A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
[^a-z]    A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
\b    Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B    Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is treated as a literal 'c' character.
\d    Matches a digit character. Equivalent to [0-9].
\D    Matches a non-digit character. Equivalent to [^0-9].
\f    Matches a form-feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any whitespace character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
\w    Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W    Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn    Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". This allows ASCII codes to be used in regular expressions.
\num    Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n    Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm    Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds, \nm matches the octal escape value nm when n and m are octal digits (0-7).
\nml    Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un    Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

3. Building regular expressions

A regular expression is constructed by placing the various components of the expression pattern between a pair of delimiters. For JScript, the delimiters are a pair of forward slash (/) characters. For example:

/expression/

For VBScript, a pair of quotation marks ("") delimits the regular expression. For example:

"expression"

In both of the examples shown above, the regular expression pattern is stored in the Pattern property of the RegExp object.
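The two ways of writing a pattern can be sketched in modern JavaScript, where a slash-delimited literal and a string passed to the `RegExp` constructor produce equivalent objects. Treating the JavaScript `source` property as the counterpart of the Pattern property is an analogy made for this example, and the sample patterns are assumptions.

```javascript
// A regex literal: the pattern sits between a pair of forward slashes.
const literal = /Chapter [1-9]/;

// The same pattern supplied as a quoted string, comparable to the VBScript form.
const fromString = new RegExp("Chapter [1-9]");

// Inside a JavaScript string, each backslash of the pattern must be doubled.
const idFromString = new RegExp("\\d{2}-\\d{5}");

// Both objects expose the pattern text through the `source` property.
console.log(literal.source);                 // Chapter [1-9]
console.log(fromString.source);              // Chapter [1-9]
console.log(fromString.test("Chapter 3"));   // true
console.log(idFromString.test("12-34567"));  // true
```

The string form is mainly useful when the pattern is assembled at run time; for fixed patterns the literal avoids the doubled backslashes.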

A regular expression can be a single character, a set of characters, a range of characters, a choice between characters, or any combination of all of these components.

4. Order of precedence

The following table lists the regular expression operators in order of precedence, from highest to lowest:

Operator(s)                     Description
\                               Escape
(), (?:), (?=), []              Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}       Quantifiers
^, $, \anymetacharacter         Anchors and sequences
|                               Alternation ("or")
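The effect of this ordering is easiest to see when parentheses are added or removed. The following modern JavaScript sketch uses made-up patterns and sample strings (none of them come from the original article) to show that alternation binds most loosely and that a quantifier applies only to the item immediately before it.

```javascript
// Alternation (|) has the lowest precedence, so each anchor belongs to one branch:
// this pattern reads as (^cat) or (dog$).
const loose = /^cat|dog$/;

// Parentheses have higher precedence and group the alternatives first:
// this pattern reads as ^ followed by (cat or dog) followed by $.
const grouped = /^(cat|dog)$/;

const looseCatalog = loose.test("catalog");      // true: "^cat" alone matches
const groupedCatalog = grouped.test("catalog");  // false: the whole string must be cat or dog
const groupedDog = grouped.test("dog");          // true

// Quantifiers bind tighter than sequence: in ab+ the + applies to "b" only.
const plusOnB = /^ab+$/.test("abab");            // false
const plusOnGroup = /^(ab)+$/.test("abab");      // true

console.log(looseCatalog, groupedCatalog, groupedDog, plusOnB, plusOnGroup);
```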

5. Special characters that must be escaped

A number of metacharacters require special treatment when you try to match them. To match these special characters, you must first escape them, that is, precede them with a backslash (\). The following table lists these special characters and their meanings:

Special character    Description
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'. To match the $ character itself, use \$.
( )    Mark the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*    Matches the preceding subexpression zero or more times. To match the * character, use \*.
+    Matches the preceding subexpression one or more times. To match the + character, use \+.
.    Matches any single character except the newline character \n. To match ., use \.
[    Marks the beginning of a bracket expression. To match [, use \[.
?    Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n'. '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^    Matches the position at the beginning of the input string, except when used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{    Marks the beginning of a quantifier expression. To match {, use \{.
|    Indicates a choice between two items. To match |, use \|.

6. Non-printing characters

There are a number of useful non-printing characters that must occasionally be used. The following table shows the escape sequences used to represent them:

Character    Meaning
\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is treated as a literal 'c' character.
\f    Matches a form-feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any whitespace character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
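The equivalences listed in the table can be checked with a short sketch. It is written in modern JavaScript, whose \s also covers some additional Unicode whitespace beyond the characters listed here; the variable names are assumptions made for the example.

```javascript
// Each row holds three spellings that should match the same single character.
const sameChar = [
  [/\t/, /\x09/, /\cI/].every((re) => re.test("\t")),  // tab
  [/\n/, /\x0a/, /\cJ/].every((re) => re.test("\n")),  // newline
  [/\r/, /\x0d/, /\cM/].every((re) => re.test("\r")),  // carriage return
  [/\f/, /\x0c/, /\cL/].every((re) => re.test("\f")),  // form feed
  [/\v/, /\x0b/, /\cK/].every((re) => re.test("\v")),  // vertical tab
];
console.log(sameChar);  // [true, true, true, true, true]

// \s matches whitespace, so it can be used to split on tabs and newlines.
const pieces = "a\tb\nc".split(/\s/);
console.log(pieces);    // ["a", "b", "c"]

// \S matches anything that is not whitespace.
const hasVisible = /\S/.test(" \t\r\n");
console.log(hasVisible);  // false
```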

7. Character matching

The period (.) matches any single printing or non-printing character in a string, except a newline character (\n). The following JScript regular expression matches 'aac', 'abc', 'acc', 'adc', and so on, as well as 'a1c', 'a2c', 'a-c', and 'a#c':

/a.c/

The equivalent VBScript regular expression is:

"a.c"

If you are trying to match a string containing a file name, where a period (.) is part of the input string, precede the period in the regular expression with a backslash (\) character. For example, the following JScript regular expression matches 'filename.ext':

/filename\.ext/

For VBScript, the equivalent expression is as follows:

"filename\.ext"

These expressions are still quite limited. They only let you match any single character. In many cases it is useful to match specific characters from a list. For example, if the input text contains chapter headings that are expressed numerically as Chapter 1, Chapter 2, and so on, you may want to find those chapter headings.
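The difference between the period as a wildcard and an escaped, literal period can be seen in a small sketch. It uses modern JavaScript, and the test strings other than those quoted in the text are assumptions.

```javascript
// The period stands for any single character except a newline.
const anyMiddle = /a.c/;
const allMatch = ["aac", "abc", "a1c", "a-c", "a#c"].every((s) => anyMiddle.test(s));
const newlineMatch = anyMiddle.test("a\nc");
console.log(allMatch);      // true
console.log(newlineMatch);  // false

// Without the backslash, the period in the file name pattern matches any character.
const unescaped = /filename.ext/;
const escaped = /filename\.ext/;
const unescapedLoose = unescaped.test("filenameXext");  // true
const escapedLoose = escaped.test("filenameXext");      // false
const escapedExact = escaped.test("filename.ext");      // true
console.log(unescapedLoose, escapedLoose, escapedExact);
```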

Bracket expressions

You can create a list of matching characters by placing one or more individual characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. Within brackets, as anywhere else, ordinary characters represent themselves, that is, they match an occurrence of themselves in the input text. Most special characters lose their meaning when they occur inside a bracket expression. There are some exceptions:

The ']' character ends a list if it is not the first item. To match the ']' character in a list, place it first, immediately following the opening '['.

The '\' character is still the escape character. To match the '\' character, use '\\'.

Characters enclosed in a bracket expression match only a single character, at the position in the regular expression where the bracket expression appears. The following JScript regular expression matches 'Chapter 1', 'Chapter 2', 'Chapter 3', 'Chapter 4', and 'Chapter 5':

/Chapter [12345]/

To match the same chapter headings in VBScript, use the following expression:

"Chapter [12345]"

Note that the word 'Chapter' and the space that follows it are fixed in position relative to the characters within the brackets. The bracket expression is therefore used only to specify the set of characters that can occupy the single character position immediately following the word 'Chapter' and a space. That is the ninth character position.

If you want to express the matching characters as a range instead of listing the characters themselves, separate the beginning and ending characters of the range with the hyphen (-) character. The character value of each character determines its relative order within a range. The following JScript regular expression contains a range expression that is equivalent to the bracketed list shown above.

/Chapter [1-5]/

The expression with the same function in VBScript is as follows:

"Chapter [1-5]"

When a range is specified in this manner, both the starting and ending values are included in the range. Note that the starting value must precede the ending value in Unicode sort order.
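A quick way to confirm that the list form and the range form behave the same, and that a range with its endpoints reversed is rejected, is the following modern JavaScript sketch. The sample titles and variable names are assumptions made for the example.

```javascript
const byList = /Chapter [12345]/;
const byRange = /Chapter [1-5]/;

// Hypothetical sample titles: the endpoints 1 and 5 are included in the range.
const samples = ["Chapter 1", "Chapter 5", "Chapter 6", "Chapter x"];
const listResults = samples.map((s) => byList.test(s));
const rangeResults = samples.map((s) => byRange.test(s));
console.log(listResults);   // [true, true, false, false]
console.log(rangeResults);  // [true, true, false, false]

// A range whose start sorts after its end is a syntax error.
let reversedRangeError = null;
try {
  new RegExp("[5-1]");
} catch (e) {
  reversedRangeError = e;
}
console.log(reversedRangeError instanceof SyntaxError);  // true
```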

If you want to include the hyphen character in a bracket expression, you must do one of the following:

Escape it with a backslash:

[\-]

Put the hyphen at the beginning or the end of the bracketed list. The following expressions match all lowercase letters and the hyphen:

[-a-z]

[a-z-]

Create a range in which the value of the starting character is lower than the hyphen and the value of the ending character is equal to or greater than the hyphen. Both of the following regular expressions satisfy this requirement:

[!--]

[!-~]
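The three approaches can be compared in a short modern JavaScript sketch. The word "well-known" and the other sample characters are assumptions picked for the example; the anchors and the + quantifier are added only so that the whole string has to consist of listed characters.

```javascript
// Approaches 1 and 2: an escaped hyphen, or a hyphen placed first or last in the list.
const escapedHyphen = /^[a-z\-]+$/;
const leadingHyphen = /^[-a-z]+$/;
const trailingHyphen = /^[a-z-]+$/;
const word = "well-known";
const listForms = [escapedHyphen, leadingHyphen, trailingHyphen].map((re) => re.test(word));
console.log(listForms);  // [true, true, true]

// Approach 3: a range that ends at, or extends past, the hyphen (0x2D).
const rangeToHyphen = /[!--]/;  // "!" (0x21) through "-" (0x2D)
const rangeToTilde = /[!-~]/;   // "!" (0x21) through "~" (0x7E)
const rangeChecks = {
  hyphenInShort: rangeToHyphen.test("-"),  // true
  letterInShort: rangeToHyphen.test("a"),  // false: "a" (0x61) lies above the range
  hyphenInLong: rangeToTilde.test("-"),    // true
  letterInLong: rangeToTilde.test("a"),    // true
  spaceInLong: rangeToTilde.test(" "),     // false: space (0x20) lies below the range
};
console.log(rangeChecks);
```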

Similarly, by placing a caret (^) at the beginning of the list, you can find all the characters that are not in the list or range. If the caret appears in any other position within the list, it matches itself and has no special meaning. The following JScript regular expression matches chapter headings with chapter numbers greater than 5:

/Chapter [^12345]/

For VBScript, use:

"Chapter [^12345]"

In the examples shown above, the expression matches any character in the ninth position other than 1, 2, 3, 4, or 5. So 'Chapter 7' is a match, and so is 'Chapter 9'. Because the negated list is not limited to digits, any other character in that position matches as well.
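The behavior of the negated list can be checked with the following modern JavaScript sketch. The inputs other than 'Chapter 7' and 'Chapter 9' are assumptions added to show the edge cases.

```javascript
const notOneToFive = /Chapter [^12345]/;

const checks = {
  seven: notOneToFive.test("Chapter 7"),   // true
  nine: notOneToFive.test("Chapter 9"),    // true
  three: notOneToFive.test("Chapter 3"),   // false: 3 is in the excluded list
  letter: notOneToFive.test("Chapter x"),  // true: any non-listed character qualifies
  missing: notOneToFive.test("Chapter "),  // false: a ninth character is still required
};
console.log(checks);

// A caret that is not in the first position is an ordinary character.
const literalCaret = /[1^]/.test("^");
console.log(literalCaret);  // true
```

If only digits above 5 should match, a positive list such as [06-9] would be a tighter choice than the negated one.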

The expressions above can also be written using the hyphen (-). For JScript:

/Chapter [^1-5]/

Or, for VBScript:

"Chapter [^1-5]"

A typical use of a bracket expression is to match any uppercase or lowercase alphabetic character or any digit. The following JScript expression specifies such a match:

/[A-Za-z0-9]/

The equivalent VBScript expression is:

"[A-Za-z0-9]"

8. Quantifiers

Sometimes you do not know how many characters there are to match. To accommodate that uncertainty, regular expressions support the concept of quantifiers. Quantifiers let you specify how many times a given component of the regular expression must occur for the match to succeed. The following table describes the quantifiers and their meanings:

Character    Description
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
{n}    n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
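The examples from the table can be run as written. The sketch below uses modern JavaScript; the anchors ^ and $ in the first two patterns are an addition made here so that the whole string must match, and the last two lines use the non-greedy 'o+?' example from the syntax list earlier in this article.

```javascript
// * allows zero or more, + requires at least one.
const star = [/^zo*$/.test("z"), /^zo*$/.test("zoo")];  // [true, true]
const plus = [/^zo+$/.test("zo"), /^zo+$/.test("z")];   // [true, false]

// ? makes the preceding group optional.
const optional = ["do".match(/do(es)?/)[0], "does".match(/do(es)?/)[0]];  // ["do", "does"]

// {n}, {n,} and {n,m} count repetitions.
const exactlyTwo = [/o{2}/.test("Bob"), /o{2}/.test("food")];  // [false, true]
const atLeastTwo = "foooood".match(/o{2,}/)[0];                // "ooooo"
const oneToThree = "foooood".match(/o{1,3}/)[0];               // "ooo"

// A trailing ? turns a greedy quantifier into a non-greedy one.
const greedy = "oooo".match(/o+/)[0];   // "oooo"
const lazy = "oooo".match(/o+?/)[0];    // "o"

console.log(star, plus, optional, exactlyTwo, atLeastTwo, oneToThree, greedy, lazy);
```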

In a large input document, chapter numbers can easily exceed nine, so you need a way to handle two-digit or three-digit chapter numbers. Quantifiers provide this capability. The following JScript regular expression matches chapter headings with any number of digits:

/Chapter [1-9][0-9]*/

The following VBScript regular expression performs the same match:

"Chapter [1-9][0-9]*"

Note that the quantifier appears after the range expression. It therefore applies to the entire range expression that precedes it, which in this case specifies only the digits 0 through 9.

The '+' quantifier is not used here because a digit is not necessarily required in the second or subsequent position. The '?' character is not used either, because it would limit chapter numbers to two digits. At least one digit is required after 'Chapter' and the space character.

If you know that chapter numbers are limited to 99, you can use the following JScript expression to specify at least one digit but no more than two:

/Chapter [0-9]{1,2}/

For VBScript, use the following regular expression:

"Chapter [0-9]{1,2}"

The disadvantage of this expression is that if a chapter number is greater than 99, it still matches only the first two digits. Another disadvantage is that someone could create a Chapter 0 and it would still match. A better JScript expression for matching only two-digit numbers is:

/Chapter [1-9][0-9]?/

or

/Chapter [1-9][0-9]{0,1}/

For VBScript, the following expressions are equivalent to the above:

"Chapter [1-9][0-9]?"

or

"Chapter [1-9][0-9]{0,1}"

The '*', '+', and '?' quantifiers are all greedy, that is, they match as much text as possible. Sometimes that is not what you want, and you need a minimal match instead.

For example, you may be searching an HTML document for a chapter heading enclosed in an H1 tag. In the document, that text may look like this:

<H1>Chapter 1 - Introduction to Regular Expressions</H1>

The following expression matches everything from the opening less-than sign (<) to the greater-than sign (>) that ends the closing H1 tag:

/<.*>/

The VBScript regular expression is:

"<.*>"

If you only want to match the opening H1 tag, the following non-greedy expressions match only <H1>:

/<.*?>/

or

"<.*?>"

Placing a '?' after a '*', '+', or '?' quantifier changes the expression from a greedy match to a non-greedy, or minimal, match.

9. Anchors

So far, the examples have only considered finding chapter headings wherever they occur. Any occurrence of the string 'Chapter' followed by a space and a number could be a real chapter heading, or it could be a cross-reference to another chapter. Because true chapter headings always appear at the beginning of a line, you need a way to find only the headings and not the cross-references.

Anchors provide that capability. An anchor fixes a regular expression to the beginning or the end of a line. Anchors also let you create regular expressions that match only inside a word, or only at the beginning or end of a word. The following table lists the anchors and their meanings:

Character    Description
^    Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following '\n' or '\r'.
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'.
\b    Matches a word boundary, that is, the position between a word and a space.
\B    Matches a non-word boundary.

You cannot use a quantifier with an anchor. Because there cannot be more than one position immediately before or after a newline or word boundary, expressions such as '^+' are not permitted.

To match text at the beginning of a line, use the '^' character at the beginning of the regular expression. Do not confuse this use of '^' with its use inside a bracket expression. The two are entirely different.

To match text at the end of a line, use the '$' character at the end of the regular expression.

To use anchors when searching for chapter headings, the following JScript regular expression matches a chapter heading with up to two digits that occurs at the beginning of a line:

/^Chapter [1-9][0-9]{0,1}/

The regular expression with the same function in VBScript is as follows:

"^Chapter [1-9][0-9]{0,1}"

A true chapter heading not only occurs at the beginning of a line, it is also the only content on that line, so it ends at the end of the line as well. The following expression ensures that the match finds only chapter headings and not cross-references. It does so by matching only at both the beginning and the end of a line.

/^Chapter [1-9][0-9]{0,1}$/

For VBScript:

"^Chapter [1-9][0-9]{0,1}$"

Matching word boundaries is a little different, but it adds a very important capability to regular expressions. A word boundary is the position between a word and a space. A non-word boundary is any other position.
The following JScript expression matches the first three characters of the word 'Chapter' because they appear after a word boundary:

/\bCha/

For VBScript:

"\bCha"

The position of the '\b' operator is critical here. If it is at the beginning of the string to be matched, the match is looked for at the beginning of a word; if it is at the end of the string, the match is looked for at the end of a word. For example, the following expressions match 'ter' in the word 'Chapter' because it appears before a word boundary:

/ter\b/

and

"ter\b"

The following expressions match 'apt' as it occurs in 'Chapter', but not as it occurs in 'aptitude':

/\Bapt/

and

"\Bapt"

This is because 'apt' occurs at a non-word boundary in the word 'Chapter' but at a word boundary in the word 'aptitude'. For the non-word boundary operator, position is not important, because the match is not relative to the beginning or end of a word.
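The anchors described in this section can be tried on a small multi-line text. The following is a sketch in modern JavaScript, where the `m` flag plays the role of the Multiline property and the `g` flag asks `match` for every occurrence; the sample text is an assumption invented for the example.

```javascript
// Hypothetical document: two real headings and one cross-reference.
const text = [
  "Chapter 1",
  "See Chapter 2 for details.",
  "Chapter 12",
].join("\n");

// Without anchors every occurrence is found, including the cross-reference.
const anywhere = text.match(/Chapter [1-9][0-9]{0,1}/g);
console.log(anywhere);    // ["Chapter 1", "Chapter 2", "Chapter 12"]

// With ^ and $ in multiline mode, only lines that consist of a heading match.
const titlesOnly = text.match(/^Chapter [1-9][0-9]{0,1}$/gm);
console.log(titlesOnly);  // ["Chapter 1", "Chapter 12"]

// Word boundaries: \b anchors to the edge of a word, \B to its inside.
const boundaryChecks = {
  startOfWord: /\bCha/.test("Chapter"),     // true
  endOfWord: /ter\b/.test("Chapter"),       // true
  insideWord: /\Bapt/.test("Chapter"),      // true
  notAtWordStart: /\Bapt/.test("aptitude"), // false
};
console.log(boundaryChecks);
```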

When reposting, please cite the original address: https://www.9cbs.com/read-73690.html
