Microsoft's Regular Expression Tutorial (4): Quantifiers and Anchors

zhaozj  2021-02-16


Sometimes you do not know how many characters there are to match. To accommodate that uncertainty, regular expressions support the concept of quantifiers. Quantifiers let you specify how many times a given component of a regular expression must occur for the match to succeed.

The following table describes the quantifiers and their meanings:

Character    Description
*            Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+            Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo" but does not match "z". + is equivalent to {1,}.
?            Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}          n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}         n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}        m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
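The table entries can be checked with a short script. The following is a minimal sketch that assumes a modern JavaScript runtime (such as Node.js) rather than the original JScript host; the helper name firstMatch is hypothetical and only serves the illustration.

```javascript
// Each entry pairs a quantifier pattern with a sample string from the table.
const samples = [
  { pattern: /zo*/, input: "z" },          // *     : zero or more 'o'
  { pattern: /zo+/, input: "zoo" },        // +     : one or more 'o'
  { pattern: /do(es)?/, input: "does" },   // ?     : optional "es"
  { pattern: /o{2}/, input: "food" },      // {n}   : exactly two 'o'
  { pattern: /o{2,}/, input: "foooood" },  // {n,}  : at least two 'o'
  { pattern: /o{1,3}/, input: "foooood" }  // {n,m} : between one and three 'o'
];

// Hypothetical helper: returns the matched text, or null when nothing matches.
function firstMatch(pattern, input) {
  const result = input.match(pattern);
  return result === null ? null : result[0];
}

for (const { pattern, input } of samples) {
  console.log(String(pattern), JSON.stringify(input), "->", firstMatch(pattern, input));
}
```

Because the quantifiers are greedy, 'o{1,3}' takes three o's from "foooood" even though one would be enough to satisfy it.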

In a large input document, the number of chapters can easily exceed nine, so you need a way to handle two-digit or three-digit chapter numbers. Quantifiers provide this capability. The following JScript regular expression matches chapter titles with any number of digits:

/Chapter [1-9][0-9]*/

The following VBScript regular expression performs the same match:

"Chapter [1-9][0-9]*"

Note that the quantifier appears after the range expression. It therefore applies to the entire range expression, which in this case specifies only the digits 0 through 9.

The '+' quantifier is not used here because a digit is not necessarily required in the second or subsequent positions. The '?' character is not used either, because it would limit chapter numbers to two digits. At least one digit must follow 'Chapter' and the space character.
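To see how this expression treats chapter numbers of different lengths, here is a minimal sketch in modern JavaScript (an assumption; the tutorial targets JScript). The helper matchChapter and the sample strings are hypothetical.

```javascript
// One digit from 1-9, followed by zero or more digits from 0-9.
const chapterTitle = /Chapter [1-9][0-9]*/;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function matchChapter(text) {
  const result = text.match(chapterTitle);
  return result ? result[0] : null;
}

console.log(matchChapter("Chapter 7"));                   // single digit
console.log(matchChapter("See Chapter 123 for details")); // three digits
console.log(matchChapter("Chapter 0"));                   // null: first digit must be 1-9
```

The last call returns null because the range [1-9] rejects a leading zero, while the '*' quantifier lets the other two calls match regardless of how many digits follow.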

If you know that chapter numbers are limited to 99, you can use the following JScript expression to specify at least one digit but no more than two:

/Chapter [0-9]{1,2}/
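A quick way to explore how this bounded expression behaves is to run it against a few inputs. This is a sketch in modern JavaScript (an assumption, not the original JScript host); the helper matchedText and the inputs are hypothetical.

```javascript
// At least one digit and at most two digits after "Chapter ".
const upToTwoDigits = /Chapter [0-9]{1,2}/;

// Hypothetical helper: returns the matched text, or null when nothing matches.
function matchedText(pattern, text) {
  const result = pattern.exec(text);
  return result ? result[0] : null;
}

console.log(matchedText(upToTwoDigits, "Chapter 42"));  // both digits are matched
console.log(matchedText(upToTwoDigits, "Chapter 100")); // only "Chapter 10" is matched
console.log(matchedText(upToTwoDigits, "Chapter 0"));   // a zero chapter is accepted
```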

The following regular expression can be used for VBScript:

"Chapter [0-9]{1,2}"

The disadvantage of the expression above is that if a chapter number is greater than 99, it still matches only the first two digits. Another disadvantage is that someone could create a Chapter 0 and it would match. A better JScript expression for matching only two-digit numbers is:

/Chapter [1-9][0-9]?/

or

/Chapter [1-9][0-9]{0,1}/

For VBScript, the following expressions are equivalent:

"Chapter [1-9][0-9]?"

or

"Chapter [1-9][0-9]{0,1}"

The '*', '+', and '?' quantifiers are all called greedy, that is, they match as much text as possible. Sometimes that is not what you want. Sometimes you want only a minimal match.

For example, you may be searching an HTML document for a chapter title enclosed in an H1 tag. That text might appear in the document as follows:

<H1>Chapter 1 - Introduction To Regular Expressions</H1>

The following expression matches everything from the opening less-than sign (<) to the greater-than sign (>) at the end of the closing H1 tag.

/<.*>/

The VBScript regular expression is:

"<.*>"

If you want to match only the opening H1 tag, the following non-greedy expression matches only <H1>.

/<.*?>/

or

"<.*?>"

By placing a '?' after a '*', '+', or '?' quantifier, you change the expression from greedy to non-greedy, or minimal, matching.
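The difference between the two modes can be sketched as follows in modern JavaScript (an assumption; the tutorial targets JScript). The sample heading assumes the chapter title is wrapped in an opening and a closing H1 tag, as the text describes; the variable names are hypothetical.

```javascript
// Sample input: a chapter title wrapped in H1 tags (assumed form).
const heading = "<H1>Chapter 1 - Introduction To Regular Expressions</H1>";

// Greedy: '.*' consumes as much as possible, so the match runs to the last '>'.
const greedyMatch = heading.match(/<.*>/)[0];

// Non-greedy: '.*?' consumes as little as possible, so the match stops at the first '>'.
const minimalMatch = heading.match(/<.*?>/)[0];

console.log(greedyMatch);  // the whole heading, including both tags
console.log(minimalMatch); // only the opening tag
```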

Anchors

So far, the examples you have seen find chapter titles wherever they occur. Any occurrence of the string 'Chapter' followed by a space and a number could be an actual chapter title, or it could be a cross-reference to another chapter. Since true chapter titles always appear at the beginning of a line, you need a way to find only the titles and not the cross-references.

Anchors provide that capability. Anchors let you fix a regular expression to the beginning or the end of a line. They also let you create regular expressions that match only within a word, or only at the beginning or end of a word. The following table lists the anchors and their meanings:

Character    Description
^            Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following '\n' or '\r'.
$            Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'.
\b           Matches a word boundary, that is, the position between a word and a space.
\B           Matches a non-word boundary.

You cannot use a quantifier with an anchor. Because there cannot be more than one position immediately before or after a line break or word boundary, expressions such as '^*' are not permitted.
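In current JavaScript engines this restriction shows up as a syntax error when the pattern is compiled. The following sketch assumes a modern runtime such as Node.js, not the original JScript host; the helper name compiles is hypothetical.

```javascript
// Hypothetical helper: reports whether a pattern source compiles as a regular expression.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (error) {
    // An invalid pattern raises SyntaxError; anything else is unexpected.
    if (error instanceof SyntaxError) {
      return false;
    }
    throw error;
  }
}

console.log(compiles("^*"));       // quantifier applied directly to an anchor
console.log(compiles("^Chapter")); // anchor followed by ordinary text
```

Building the pattern with the RegExp constructor, rather than a literal, lets the script catch the error at run time instead of failing to parse.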

To match text at the beginning of a line, use the '^' character at the beginning of the regular expression. Do not confuse this use of '^' with its use inside a bracket expression. The two are entirely different.

To match text at the end of a line, use the '$' character at the end of the regular expression.

To use anchors when searching for chapter titles, the following JScript regular expression matches a chapter title with up to two digits that occurs at the beginning of a line:

/^Chapter [1-9][0-9]{0,1}/

The regular expression with the same function in VBScript is:

"^Chapter [1-9][0-9]{0,1}"

A true chapter title not only occurs at the beginning of a line, it is also the only thing on that line, so it must also be at the end of the line. The following expression ensures that the match finds only chapter titles and not cross-references. It does so by creating a regular expression that matches only at the beginning and end of a line of text.

/^Chapter [1-9][0-9]{0,1}$/
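The effect of anchoring one end versus both ends can be sketched in modern JavaScript (an assumption; the tutorial targets JScript). The sample text is hypothetical, and the 'm' flag plays the role of the multiline property so that '^' and '$' also match at line breaks.

```javascript
// Hypothetical sample: two real titles, one cross-reference, one line with trailing text.
const page = [
  "Chapter 1",
  "As noted in Chapter 2, anchors matter.",
  "Chapter 12",
  "Chapter 3 continues the discussion"
].join("\n");

// 'g' finds every match; 'm' lets ^ and $ match at the start and end of each line.
const startAnchored = page.match(/^Chapter [1-9][0-9]{0,1}/gm);
const fullyAnchored = page.match(/^Chapter [1-9][0-9]{0,1}$/gm);

console.log(startAnchored); // includes "Chapter 3" from the line with trailing text
console.log(fullyAnchored); // only lines that consist of the title alone
```

Neither pattern picks up the cross-reference on the second line, because 'Chapter 2' does not start that line.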

For VBScript, use:

"^Chapter [1-9][0-9]{0,1}$"

Matching word boundaries is a little different, but it adds a very important capability to regular expressions. A word boundary is the position between a word and a space. A non-word boundary is any other position. The following JScript expression matches the first three characters of the word 'Chapter' because they appear after a word boundary:

/\bCha/

For VBScript:

"\bCha"

The position of the '\b' operator is critical here. If it is at the beginning of the string to be matched, it looks for the match at the beginning of a word; if it is at the end of the string, it looks for the match at the end of a word. For example, the following expressions match the 'ter' in the word 'Chapter' because it appears before a word boundary:

/ter\b/

and

"ter\b"

The following expressions match 'apt' as it occurs in 'Chapter', but not as it occurs in 'aptitude':

/\Bapt/

and

"\Bapt"

This is because 'apt' occurs at a non-word boundary in the word 'Chapter' but at a word boundary in the word 'aptitude'. The position of the non-word boundary operator is not important because the match is not relative to the beginning or end of a word.
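The boundary examples above can be tried out with a short script. This is a minimal sketch in modern JavaScript (an assumption; the tutorial targets JScript), and the helper name found is hypothetical.

```javascript
// Hypothetical helper: returns the matched text, or null when nothing matches.
function found(pattern, word) {
  const result = word.match(pattern);
  return result ? result[0] : null;
}

console.log(found(/\bCha/, "Chapter"));   // 'Cha' follows the boundary at the start of the word
console.log(found(/ter\b/, "Chapter"));   // 'ter' precedes the boundary at the end of the word
console.log(found(/\Bapt/, "Chapter"));   // 'apt' sits inside the word, at a non-word boundary
console.log(found(/\Bapt/, "aptitude"));  // null: 'apt' starts the word, at a word boundary
```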

When reposting, please cite the original address: https://www.9cbs.com/read-18839.html
