Regular expression

xiaoxiao2021-03-06 119

Understanding regular expressions

If you have not worked with regular expressions before, the term and the concept may be unfamiliar to you. However, they are probably not as unfamiliar as you think.

Think about how you search for files on a hard disk. You almost certainly use the ? and * characters to help find the files you are looking for. The ? character matches a single character in a file name, and the * character matches zero or more characters. A pattern such as 'data?.dat' finds the following files:

Data1.dat

Data2.dat

Datax.dat

Datan.dat

If you use the * character instead of the ? character, the number of files found expands. The pattern 'data*.dat' matches all of the following file names:

Data.dat

Data1.dat

Data2.dat

Data12.dat

Datax.dat

Dataxyz.dat

Although this way of searching for files is certainly useful, it is also very limited. The limited ability of the ? and * wildcard characters gives you an idea of what regular expressions can do, but regular expressions are far more powerful and flexible.
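To connect the wildcard idea to regular expressions, here is a minimal sketch in JavaScript. The helper name wildcardToRegExp is hypothetical and not part of any library, and the case-insensitive comparison is an assumption about how file names are matched:

```javascript
// Hypothetical helper: translate a file-name wildcard into a RegExp.
function wildcardToRegExp(pattern) {
  // Escape every regex metacharacter except the two wildcards.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  // '?' means exactly one character, '*' means zero or more characters.
  const body = escaped.replace(/\?/g, ".").replace(/\*/g, ".*");
  // Assumption: file names are compared without regard to case.
  return new RegExp("^" + body + "$", "i");
}

const files = ["Data.dat", "Data1.dat", "Data12.dat", "Dataxyz.dat"];
const single = files.filter((f) => wildcardToRegExp("data?.dat").test(f));
const many = files.filter((f) => wildcardToRegExp("data*.dat").test(f));
console.log(single); // only "Data1.dat"
console.log(many);   // all four names
```

The wildcard 'data?.dat' becomes the regular expression /^data.\.dat$/i, which already hints at how much more precisely a regular expression can describe a pattern.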

Early origins of regular expressions

The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Two neurophysiologists, Warren McCulloch and Walter Pitts, developed a mathematical way of describing these neural networks.

In 1956, an American mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets" that introduced the concept of regular expressions. Regular expressions were the expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".

Subsequently, this work was found to apply to some early research on computational search algorithms by Ken Thompson, the principal inventor of UNIX. The first practical application of regular expressions was in Thompson's version of the QED editor.

As they say, the rest is history. Regular expressions have been an important part of text-based editors and search tools ever since.

Uses for regular expressions

In a typical search-and-replace operation, you must provide the exact text you are looking for. That technique may be adequate for simple search-and-replace tasks in static text, but it lacks flexibility and makes searching dynamic text difficult, if not impossible.

With regular expressions, you can:

- Test for a pattern within a string. For example, you can test an input string to see whether a given pattern, such as a credit card number pattern, occurs within it. This is called data validation.

- Replace text. You can use a regular expression to identify specific text in a document and then either remove it or replace it with other text.

- Extract a substring from a string based on a pattern match. You can find specific text within a document or an input field.

For example, suppose you need to search an entire Web site to remove some outdated material and replace some HTML formatting tags. You can use a regular expression to test each file and see whether the material or the HTML formatting tags you are looking for occur in that file. That way, you narrow the affected files down to those that contain the material to be removed or changed. You can then use a regular expression to remove the outdated material, and finally use regular expressions again to search for and replace the tags that need replacing.
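The three uses listed above can be sketched in a few lines of JavaScript. The card-number shape, the tag names, and the sample strings are illustrative assumptions, not taken from the original text:

```javascript
// 1. Test for a pattern: a simplified 16-digit card-number shape (an assumption).
const cardShape = /^\d{4}-\d{4}-\d{4}-\d{4}$/;
const isValid = cardShape.test("1234-5678-9012-3456");

// 2. Replace text: turn every <b> or </b> tag into <strong> or </strong>.
const html = "<b>Sale</b> ends <b>today</b>";
const updated = html.replace(/<(\/?)b>/g, "<$1strong>");

// 3. Extract a substring: pull the year out of a date string.
const found = "Released 2021-03-06".match(/(\d{4})-\d{2}-\d{2}/);
const year = found ? found[1] : null;

console.log(isValid); // true
console.log(updated); // "<strong>Sale</strong> ends <strong>today</strong>"
console.log(year);    // "2021"
```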

Regular expression syntax

A regular expression is a pattern of text that consists of ordinary characters (for example, the letters a through z) and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template for matching a character pattern against the string being searched.

Here are some examples of regular expressions you might encounter:

JScript              VBScript              Matches
/^[ \t]*$/           "^[ \t]*$"            A blank line.
/\d{2}-\d{5}/        "\d{2}-\d{5}"         An ID number consisting of 2 digits, a hyphen, and another 5 digits.
/<(.*)>.*<\/\1>/     "<(.*)>.*<\/\1>"      An HTML tag.
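As a quick check of the three sample expressions, the following sketch runs them in JavaScript against a few made-up strings. The blank-line pattern is written here as /^[ \t]*$/, which is an assumption about the intended form, and the test inputs are illustrative only:

```javascript
const blankLine = /^[ \t]*$/;
const idNumber = /\d{2}-\d{5}/;
const htmlTag = /<(.*)>.*<\/\1>/;

console.log(blankLine.test(" \t "));        // true
console.log(blankLine.test("text"));        // false
console.log(idNumber.test("ID 12-34567"));  // true
console.log(htmlTag.test("<b>bold</b>"));   // true
console.log(htmlTag.test("<b>bold</i>"));   // false, the closing tag does not repeat "b"
```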

The following table contains the complete list of metacharacters and their behavior in the context of regular expressions:

Character    Description
\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^    Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following '\n' or '\r'.
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'.
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
?    When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.    Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match the parenthesis characters, use '\(' or '\)'.
(?:pattern)    Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)    Positive lookahead. Matches the searched string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead. Matches the searched string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.
x|y    Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]    A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]    A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]    A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
[^a-z]    A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
\b    Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B    Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If not, c is assumed to be a literal 'c' character.
\d    Matches a digit character. Equivalent to [0-9].
\D    Matches a non-digit character. Equivalent to [^0-9].
\f    Matches a form-feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any white space character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-white-space character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
\w    Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W    Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn    Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". This allows ASCII codes to be used in regular expressions.
\num    Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n (n is a digit)    Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm    Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds, \nm matches the octal escape value nm when n and m are octal digits (0-7).
\nml    Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un    Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

Order of precedence

Once you have constructed a regular expression, it is evaluated much like an arithmetic expression, that is, from left to right and following an order of precedence.
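To make the idea of precedence concrete, here is a small sketch in JavaScript with made-up test strings. It illustrates two rules: a quantifier binds more tightly than a sequence of characters, and a sequence binds more tightly than the '|' operator. The variable names are illustrative only:

```javascript
// Quantifiers bind tighter than sequence: in /ab*/ the * applies to 'b' only.
const starOnB = /^ab*$/;
const starOnGroup = /^(ab)*$/;
console.log(starOnB.test("abbb"));      // true
console.log(starOnB.test("abab"));      // false
console.log(starOnGroup.test("abab"));  // true

// Sequence binds tighter than alternation: /^ab|cd$/ means (^ab) or (cd$).
const looseAlt = /^ab|cd$/;
const groupedAlt = /^a(b|c)d$/;
console.log(looseAlt.test("abXX"));     // true, the left alternative '^ab' matches
console.log(groupedAlt.test("abXX"));   // false
console.log(groupedAlt.test("acd"));    // true
```

Parentheses override the default grouping in the same way they do in arithmetic.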

The following table lists the regular expression operators in order of precedence, from highest to lowest:

Operator(s)                    Description
\                              Escape
(), (?:), (?=), []             Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}      Quantifiers
^, $, \anymetacharacter        Anchors and sequences
|                              Alternation

Ordinary characters

Ordinary characters consist of all the printing and non-printing characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.

The simplest form of a regular expression is a single ordinary character that matches itself in the searched string. For example, the single-character pattern 'a' matches the letter 'a' wherever it appears in the searched string. Here are some examples of single-character regular expression patterns:

/a/
/7/
/M/

The equivalent VBScript single-character regular expressions are:

"a"
"7"
"M"

You can combine several single characters to form a larger expression. For example, the following JScript regular expression is nothing more than an expression created by combining the single-character expressions 'a', '7', and 'M'.

/a7M/

The equivalent VBScript expression is:

"a7M"

Note that there is no concatenation operator. All you have to do is put one character after another.

Special characters

A number of metacharacters require special treatment when you try to match them. To match one of these special characters, you must first escape it, that is, precede it with a backslash (\). The following table lists these special characters and their meanings:

Special character    Description
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'. To match the $ character itself, use \$.
( )    Marks the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*    Matches the preceding subexpression zero or more times. To match the * character, use \*.
+    Matches the preceding subexpression one or more times. To match the + character, use \+.
.    Matches any single character except the newline character \n. To match ., use \.
[    Marks the beginning of a bracket expression. To match [, use \[.
?    Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n'. '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^    Matches the position at the beginning of the input string, except when used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{    Marks the beginning of a quantifier expression. To match {, use \{.
|    Indicates a choice between two items. To match |, use \|.

Non-printing characters

There are a number of useful non-printing characters that must be used occasionally. The following table shows the escape sequences used to represent these non-printing characters:

Character    Meaning
\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If not, c is assumed to be a literal 'c' character.
\f    Matches a form-feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any white space character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-white-space character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.

Character matching

The period (.) matches any single printing or non-printing character in a string, except a newline character (\n). The following JScript regular expression matches 'aac', 'abc', 'acc', 'adc', and so on, as well as 'a1c', 'a2c', 'a-c', and 'a#c':

/a.c/

The equivalent VBScript regular expression is:

"a.c"

If you are trying to match a string containing a file name, where a period (.) is part of the input string, precede the period in the regular expression with a backslash (\) character. For example, the following JScript regular expression matches 'filename.ext':

/filename\.ext/

For VBScript, the equivalent expression is:

"filename\.ext"

These expressions are still quite limited. They only let you match any single character. In many cases it is useful to match specific characters from a list. For example, if your input text contains chapter headings expressed numerically as Chapter 1, Chapter 2, and so on, you might want to find those chapter headings.
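Before moving on, the behavior of the period and of the escaped period described in this section can be checked with a short JavaScript sketch. The input strings are made up for illustration:

```javascript
const anyMiddle = /a.c/;
const unescapedDot = /filename.ext/;
const escapedDot = /filename\.ext/;

console.log(anyMiddle.test("a#c"));              // true
console.log(anyMiddle.test("a\nc"));             // false, '.' does not match a newline
console.log(unescapedDot.test("filenameXext"));  // true, an unescaped '.' matches any character
console.log(escapedDot.test("filenameXext"));    // false
console.log(escapedDot.test("filename.ext"));    // true
```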

Bracket expressions

You can create a list of characters to match by placing one or more individual characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. Within brackets, as anywhere else, ordinary characters represent themselves, that is, they match an occurrence of themselves in the input text. Most special characters lose their meaning when they occur inside a bracket expression. Here are some exceptions:

- The ']' character ends a list if it is not the first item. To match the ']' character in a list, place it first, immediately following the opening '['. (Not every engine follows this convention. In JavaScript, for example, escape it as '\]' instead.)

- The '\' character continues to be the escape character. To match the '\' character, use '\\'.

The characters enclosed in a bracket expression match only a single character, at the position where the bracket expression appears in the regular expression. The following JScript regular expression matches 'Chapter 1', 'Chapter 2', 'Chapter 3', 'Chapter 4', and 'Chapter 5':

/Chapter [12345]/

To match the same chapter headings in VBScript, use the following expression:

"Chapter [12345]"

Note that the word 'Chapter' and the space that follows are fixed in position relative to the characters within the brackets. The bracket expression is therefore used to specify only the set of characters that satisfies the single character position immediately following the word 'Chapter' and a space. That is the ninth character position.

If you want to express the matching characters as a range instead of listing the characters themselves, separate the beginning and ending characters of the range with a hyphen (-). The character value of each character determines its relative order within a range. The following JScript regular expression contains a range expression that is equivalent to the bracketed list shown above.

/Chapter [1-5]/

The expression with the same function in VBScript is as follows:

"Chapter [1-5]"

When a range is specified in this manner, both the starting and ending values are included in the range. Note that the starting value must precede the ending value in Unicode sort order.

If you want to include the hyphen character in a bracket expression, you must do one of the following:

- Escape it with a backslash:

[\-]

- Put the hyphen at the beginning or the end of the bracketed list. The following expressions match all lowercase letters and the hyphen:

[-a-z]
[a-z-]

- Create a range in which the value of the beginning character is lower than the hyphen and the value of the ending character is equal to or greater than the hyphen. Both of the following regular expressions satisfy this requirement:

[!--]
[!-~]

Similarly, by placing a caret (^) at the beginning of the list, you can find all the characters that are not in the list or range. If the caret appears in any other position within the list, it matches itself and has no special meaning. The following JScript regular expression matches chapter headings with numbers greater than 5:

/Chapter [^12345]/

For VBScript, use:

"Chapter [^12345]"

In the examples shown above, the expression matches any character other than 1, 2, 3, 4, or 5 in the ninth position. So 'Chapter 7' is a match, and so is 'Chapter 9'.
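The list, range, negation, and hyphen rules for bracket expressions can be tried out with the following JavaScript sketch. The heading strings, including the odd 'Chapter -', are made up to show the edge cases:

```javascript
const titles = ["Chapter 1", "Chapter 5", "Chapter 7", "Chapter -"];

// A range includes both of its end points.
const inRange = titles.filter((t) => /Chapter [1-5]/.test(t));
// A leading caret negates the set; note that it also accepts non-digits.
const notInRange = titles.filter((t) => /Chapter [^1-5]/.test(t));
// A hyphen placed first in the list is an ordinary character.
const digitOrHyphen = titles.filter((t) => /Chapter [-0-9]/.test(t));

console.log(inRange);       // ["Chapter 1", "Chapter 5"]
console.log(notInRange);    // ["Chapter 7", "Chapter -"]
console.log(digitOrHyphen); // all four titles
```

If only digits above 5 should qualify, a positive range such as [6-9] is a stricter choice than the negated set.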

The same expressions can be written using a hyphen (-). For JScript:

/Chapter [^1-5]/

Or, for VBScript:

"Chapter [^1-5]"

A typical use of a bracket expression is to specify a match for any uppercase or lowercase alphabetic character or any digit. The following JScript expression specifies such a match:

/[A-Za-z0-9]/

The equivalent VBScript expression is:

"[A-Za-z0-9]"

Quantifiers

Sometimes you do not know how many characters there are to match. To accommodate that kind of uncertainty, regular expressions support the concept of quantifiers. Quantifiers let you specify how many times a given component of the regular expression must occur for the match to succeed.

The following table describes the various quantifiers and their meanings:

Character    Description
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.

In a large input document, chapter numbers can easily exceed nine, so you need a way to handle two-digit or three-digit chapter numbers. Quantifiers give you that capability. The following JScript regular expression matches chapter headings with any number of digits:

/Chapter [1-9][0-9]*/

The following VBScript regular expression performs the same match:

"Chapter [1-9][0-9]*"

Note that the quantifier appears after the range expression. It therefore applies to the entire range expression, which in this case specifies only the digits 0 through 9.

The '+' quantifier is not used here because a digit is not necessarily needed in the second or subsequent positions. The '?' character is not used either, because it would limit chapter numbers to only two digits. What you want is at least one digit following 'Chapter' and a space character.

If you know that chapter numbers are limited to 99 chapters, you can use the following JScript expression to specify at least one digit but no more than two digits.

/Chapter [0-9]{1,2}/

For VBScript, use the following regular expression:

"Chapter [0-9]{1,2}"

The disadvantage of the expression shown above is that if there is a chapter number greater than 99, it still matches only the first two digits. Another disadvantage is that somebody could create a Chapter 0 and it would still match. Better JScript expressions for matching only two digits are the following:

/Chapter [1-9][0-9]?/

/Chapter [1-9][0-9]{0,1}/

For VBScript, the following expressions are equivalent to the ones above:

"Chapter [1-9][0-9]?"

"Chapter [1-9][0-9]{0,1}"

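The trade-offs between the chapter-number expressions above can be seen in a short JavaScript sketch. The sample headings are made up, and the variable names are illustrative only:

```javascript
// Without anchors, {1,2} stops after the first two digits of a longer number.
const twoDigits = "Chapter 123".match(/Chapter [0-9]{1,2}/)[0];
const anyLength = "Chapter 123".match(/Chapter [1-9][0-9]*/)[0];
console.log(twoDigits); // "Chapter 12"
console.log(anyLength); // "Chapter 123"

// [0-9]{1,2} accepts a Chapter 0, while [1-9][0-9]? rejects it.
const zeroLoose = /Chapter [0-9]{1,2}/.test("Chapter 0");
const zeroStrict = /Chapter [1-9][0-9]?/.test("Chapter 0");
console.log(zeroLoose);  // true
console.log(zeroStrict); // false

// '?' and {0,1} are interchangeable.
const withMark = "Chapter 42".match(/Chapter [1-9][0-9]?/)[0];
const withBraces = "Chapter 42".match(/Chapter [1-9][0-9]{0,1}/)[0];
console.log(withMark === withBraces); // true
```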
The '*', '+', and '?' quantifiers are all referred to as greedy, that is, they match as much text as possible. Sometimes that is not what you want. Sometimes you want only a minimal match.

For example, you may be searching an HTML document for a chapter title enclosed in an H1 tag. That text might appear in the document as follows:

<H1>Chapter 1 - Introduction to Regular Expressions</H1>

The following expression matches everything from the opening less-than symbol (<) to the greater-than symbol (>) that closes the H1 end tag.

/<.*>/

The VBScript regular expression is:

"<.*>"

If you only want to match the opening H1 tag, the following non-greedy expressions match only <H1>.

/<.*?>/

or

"<.*?>"

By placing a '?' after a '*', '+', or '?' quantifier, you change the expression from a greedy match to a non-greedy, or minimal, match.
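The difference between greedy and non-greedy matching can be observed directly. The following JavaScript sketch assumes the heading line contains an opening and a closing H1 tag, as described above:

```javascript
const line = "<H1>Chapter 1 - Introduction to Regular Expressions</H1>";

const greedy = line.match(/<.*>/)[0];
const minimal = line.match(/<.*?>/)[0];
console.log(greedy === line); // true, the greedy form swallows the whole line
console.log(minimal);         // "<H1>"

// The same effect with '+': 'o+?' takes a single "o", 'o+' takes them all.
console.log("oooo".match(/o+?/)[0]); // "o"
console.log("oooo".match(/o+/)[0]);  // "oooo"
```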

Anchors

So far, the examples you have seen look for chapter headings wherever they occur. Any occurrence of the string 'Chapter' followed by a space and a number might be an actual chapter heading, or it might be a cross-reference to another chapter. Since true chapter headings always appear at the beginning of a line, you need a way to find only the headings and not the cross-references.

Anchors provide that capability. Anchors let you fix a regular expression to the beginning or the end of a line. They also let you create regular expressions that occur within a word, at the beginning of a word, or at the end of a word. The following table lists the anchors and their meanings:

Character    Description
^    Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following '\n' or '\r'.
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'.
\b    Matches a word boundary, that is, the position between a word and a space.
\B    Matches a non-word boundary.

You cannot use a quantifier with an anchor. Since there cannot be more than one position immediately before or after a newline or a word boundary, expressions such as '^*' are not permitted.

To match text at the beginning of a line, use the '^' character at the beginning of the regular expression. Do not confuse this use of '^' with its use inside a bracket expression. The two are not the same.

To match text at the end of a line, use the '$' character at the end of the regular expression.

To use anchors when searching for chapter headings, the following JScript regular expression matches a chapter heading with up to two following digits that occurs at the beginning of a line:

/^Chapter [1-9][0-9]{0,1}/

The regular expression with the same function in VBScript is as follows:

"^Chapter [1-9][0-9]{0,1}"

A true chapter heading not only occurs at the beginning of a line, it is also the only thing on the line, so it must also end the line. The following expression ensures that the match you specify finds only chapters and not cross-references. It does so by creating a regular expression that matches only at the beginning and end of a line of text.

/^Chapter [1-9][0-9]{0,1}$/

For VBScript, use:

"^Chapter [1-9][0-9]{0,1}$"

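As a rough illustration of the two anchored expressions, the following JavaScript sketch searches a made-up three-line text. It assumes the multiline flag ('m') is set so that ^ and $ apply to each line rather than to the whole string:

```javascript
const text = "See Chapter 3 for details\nChapter 12\nChapter 7 continues here";

// Anchored at both ends: only a line that is nothing but a heading matches.
const headings = text.match(/^Chapter [1-9][0-9]{0,1}$/gm);
console.log(headings); // ["Chapter 12"]

// Anchored at the start only: any line that begins with a heading matches.
const lineStarts = text.match(/^Chapter [1-9][0-9]{0,1}/gm);
console.log(lineStarts); // ["Chapter 12", "Chapter 7"]
```

The cross-reference on the first line is skipped in both cases because it does not start the line.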
Matching word boundaries is a little different, but it adds a very important capability to regular expressions. A word boundary is the position between a word and a space. A non-word boundary is any other position. The following JScript expression matches the first three characters of the word 'Chapter' because they appear following a word boundary:

/\bCha/

For VBScript:

"\bCha"

The position of the '\b' operator is critical here. If it is positioned at the beginning of the string to be matched, it looks for the match at the beginning of a word. If it is positioned at the end of the string, it looks for the match at the end of a word. For example, the following expressions match 'ter' in the word 'Chapter' because it appears before a word boundary:

/ter\b/

and

"ter\b"

The following expressions match 'apt' as it occurs in 'Chapter', but not as it occurs in 'aptitude':

/\Bapt/

and

"\Bapt"

This is because 'apt' occurs on a non-word boundary in the word 'Chapter' but on a word boundary in the word 'aptitude'. For the non-word boundary operator, position is not important because the match is not relative to the beginning or end of a word.
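The word-boundary examples can be verified with a few lines of JavaScript. The word 'terminal' is an added test input, not taken from the text:

```javascript
const startsWord = /\bCha/;
const endsWord = /ter\b/;
const insideWord = /\Bapt/;

console.log(startsWord.test("Chapter"));  // true
console.log(endsWord.test("Chapter"));    // true
console.log(endsWord.test("terminal"));   // false, 'ter' is followed by more letters
console.log(insideWord.test("Chapter"));  // true
console.log(insideWord.test("aptitude")); // false, 'apt' starts the word
```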

Alternation and grouping

Alternation lets you use the '|' character to choose between two or more alternatives. By expanding the chapter heading regular expression, you can make it match more than just chapter headings. However, this is not as straightforward as you might think. When alternation is used, the largest possible expression on either side of the '|' character is matched. You might think that the following JScript and VBScript expressions match either 'Chapter' or 'Section' followed by one or two digits, occurring at the beginning and end of a line:

/^Chapter|Section [1-9][0-9]{0,1}$/

"^Chapter|Section [1-9][0-9]{0,1}$"

Unfortunately, what actually happens is that the regular expressions shown above match either the word 'Chapter' at the beginning of a line, or 'Section' and whatever numbers follow it at the end of a line. If the input string is 'Chapter 22', the expression shown above matches only the word 'Chapter'. If the input string is 'Section 22', the expression matches 'Section 22'. That is not the intent here, so there must be a way to make the regular expression do what you want, and there is.

You can use parentheses to limit the scope of the alternation, that is, to make sure that it applies only to the two words 'Chapter' and 'Section'. However, parentheses are tricky as well, because they are also used to create subexpressions, which are covered later in the section on backreferences. By taking the regular expression shown above and adding parentheses in the appropriate places, you can make it match either 'Chapter 1' or 'Section 3'.

The following regular expressions use parentheses to group 'Chapter' and 'Section' so that the expression works properly. For JScript:

/^(Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(Chapter|Section) [1-9][0-9]{0,1}$"

These expressions work properly, but they produce an interesting by-product. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes whichever of the two words is matched to be captured for future use. Since there is only one set of parentheses in the expression shown above, there is only one captured submatch. This submatch can be referred to using the SubMatches collection in VBScript or the $1-$9 properties of the RegExp object in JScript.

Sometimes capturing a submatch is desirable, and sometimes it is not. In the examples shown here, all you really want is to use the parentheses to group the choice between the words 'Chapter' and 'Section'. You do not need to refer to that match later. In fact, unless you really need to capture submatches, do not use them. Your regular expressions will be more efficient because they will not have to spend the time and memory to store the submatches.

You can place '?:' before the regular expression pattern inside the parentheses to prevent the match from being saved for later use. The following modifications of the regular expressions shown above provide the same grouping without saving the submatch. For JScript:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(?:Chapter|Section) [1-9][0-9]{0,1}$"

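The three variants of the heading expression can be compared side by side in JavaScript. This is a sketch with made-up input strings, and it reads the captured submatch from the array returned by match rather than from the RegExp.$1 property:

```javascript
const ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

// Without parentheses, only the word 'Chapter' is matched.
console.log("Chapter 22".match(ungrouped)[0]); // "Chapter"
// With parentheses, the whole heading is matched and the word is captured.
console.log("Chapter 22".match(grouped)[0]);   // "Chapter 22"
console.log("Section 3".match(grouped)[1]);    // "Section"
// With (?:...), the grouping is kept but nothing is captured.
console.log("Section 3".match(nonCapturing).length); // 1, only the full match
```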
In addition to the '?:' metacharacters, there are two other non-capturing metacharacters, used for what are called lookahead matches. A positive lookahead, specified using '?=', matches the search string at any point where a string matching the regular expression pattern in parentheses begins. A negative lookahead, specified using '?!', matches the search string at any point where a string not matching the regular expression pattern begins.

For example, suppose you have a document containing references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document by finding all the references to Windows 95, Windows 98, and Windows NT and changing them to Windows 2000. You can use the following JScript regular expression, which is an example of a positive lookahead, to match Windows 95, Windows 98, and Windows NT:

/Windows (?=95|98|NT)/

To make the same match in VBScript, use the following expression:

"Windows (?=95|98|NT)"

After a match is found, the search for the next match begins immediately following the matched text, not including the characters that made up the lookahead. For example, if the expressions shown above matched 'Windows 98', the search resumes after 'Windows', not after '98'.
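The following JavaScript sketch shows that a lookahead checks the text that follows without consuming it. The sample sentence and the replacement text "Microsoft Windows " are made up for illustration:

```javascript
const doc = "Windows 3.1, Windows 95, Windows 98 and Windows NT";

const positive = /Windows (?=95|98|NT)/g;
const negative = /Windows (?!95|98|NT)/g;

console.log(doc.match(positive).length); // 3
console.log(doc.match(negative).length); // 1, only the reference to 3.1
console.log(doc.match(positive)[0]);     // "Windows ", the version is not part of the match

// Because the version is not consumed, a replacement leaves it in place.
const branded = doc.replace(positive, "Microsoft Windows ");
console.log(branded);
// "Windows 3.1, Microsoft Windows 95, Microsoft Windows 98 and Microsoft Windows NT"
```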

Backreferences

One of the most important features of regular expressions is the ability to store part of a matched pattern for later reuse. Recall that placing parentheses around a regular expression pattern, or part of a pattern, causes that part of the expression to be stored in a temporary buffer. You can override the saving of that part of the regular expression by using the non-capturing metacharacters '?:', '?=', or '?!'.

Each captured submatch is stored as it is encountered, from left to right, in the regular expression pattern. The buffer numbers where submatches are stored begin at 1 and continue up to a maximum of 99 subexpressions. Each buffer can be accessed using '\n', where n is one or two decimal digits identifying a specific buffer.

One of the simplest and most useful applications of backreferences is locating two identical words that occur together in text. Take the following sentence:

Is is the cost of of gasoline going up up?

As written, the sentence clearly has a problem with several duplicated words. It would be nice to have a way to fix the sentence without having to look for duplicates of every single word. The following JScript regular expression uses a single subexpression to do that.

/\b([a-z]+) \1\b/gi

The equivalent VBScript expression is:

"\b([a-z]+) \1\b"

In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is the reference to the previously captured submatch, that is, the second occurrence of the word just matched by the parenthesized expression. '\1' specifies the first submatch. The word boundary metacharacters ensure that only separate words are detected. Without them, phrases such as "is issued" or "this is" would be incorrectly identified by this expression.

In the JScript expression, the global flag ('g') following the regular expression indicates that the expression is applied to the input string for as many matches as it can find. Case insensitivity is specified by the ignore-case flag ('i') at the end of the expression. The multiline flag ('m') specifies that potential matches may occur on either side of a newline character. For VBScript, the flags cannot be set in the expression and must be set explicitly using properties of the RegExp object.

Using the regular expression shown above, the following JScript code uses the submatch information to replace each occurrence of two consecutive identical words in a string of text with a single occurrence of that word:

var ss = "Is is the cost of of gasoline going up up?.\n";
var re = /\b([a-z]+) \1\b/gim;   // Create the regular expression pattern.
var rv = ss.replace(re, "$1");   // Replace two occurrences with one.

The closest equivalent VBScript code is as follows:

Dim ss, re, rv
ss = "Is is the cost of of gasoline going up up?." & vbNewLine
Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
rv = re.Replace(ss, "$1")

Note that in the VBScript code, the global, case-insensitivity, and multiline flags are set using the appropriately named properties of the RegExp object.

The $1 in the replace method refers to the first saved submatch. If there is more than one submatch, you can refer to them consecutively with $2, $3, and so on.

Another use of backreferences is to break a Universal Resource Indicator (URI) into its component parts. Suppose you want to break the following URI down into the protocol (ftp, http, and so on), the domain address, and the page or path:

http://msdn.microsoft.com:80/scripting/default.htm

The following regular expressions provide that functionality. For JScript:

/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/

For VBScript:

"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"

The first parenthesized subexpression captures the protocol part of the Web address. It matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address part of the address. It matches any sequence of characters that contains neither '/' nor ':'. The third parenthesized subexpression captures the port number, if one is specified. It matches a colon followed by zero or more digits. Finally, the fourth parenthesized subexpression captures the path and/or page information specified by the Web address. It matches zero or more characters other than '#' or a space.

After the regular expression is applied to the URI shown above, the submatches contain the following:

RegExp.$1 contains "http"

RegExp.$2 contains "msdn.microsoft.com"

RegExp.$3 contains ":80"

RegExp.$4 contains "/scripting/default.htm"
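The decomposition can be reproduced with the following JavaScript sketch. It assumes the expression reads /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/ and, as a design choice, reads the submatches from the array returned by match instead of the RegExp.$1 through RegExp.$4 properties:

```javascript
const uri = "http://msdn.microsoft.com:80/scripting/default.htm";
const parts = uri.match(/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/);

console.log(parts[1]); // "http"
console.log(parts[2]); // "msdn.microsoft.com"
console.log(parts[3]); // ":80"
console.log(parts[4]); // "/scripting/default.htm"
```

Reading the submatches from the returned array keeps the result tied to this particular match rather than to shared state on the RegExp object.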
