Microsoft's Regular Expression Tutorial (5): Alternation, Grouping, and Backreferences

xiaoxiao, 2021-03-06

Alternation and grouping

Alternation uses the '|' character to allow a choice between two or more alternatives. By extending the chapter-heading regular expression, you can make it match more than just chapter headings. However, this is not as straightforward as you might expect. When alternation is used, the largest possible expression on either side of the '|' character is matched. You might think that the following JScript and VBScript expressions match either 'Chapter' or 'Section', followed by one or two digits, occupying a whole line:

/^Chapter|Section [1-9][0-9]{0,1}$/

"^Chapter|Section [1-9][0-9]{0,1}$"

Unfortunately, what actually happens is that the regular expressions shown above match either the word 'Chapter' at the beginning of a line, or 'Section' followed by one or two digits at the end of a line. If the input string is 'Chapter 22', the expression matches only the word 'Chapter'. If the input string is 'Section 22', the expression matches 'Section 22'. That is not the intent here, so there must be a way to make the regular expression do what you want, and there is.
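
The behavior described above can be checked with a short sketch in modern JavaScript (an assumption; the original targets JScript, and the variable names here are illustrative only):

```javascript
// Without parentheses, '|' splits the whole pattern into two alternatives:
//   ^Chapter      or      Section [1-9][0-9]{0,1}$
const ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;

const chapterMatch = "Chapter 22".match(ungrouped);
const sectionMatch = "Section 22".match(ungrouped);

console.log(chapterMatch[0]); // "Chapter"  (the digits are not part of the match)
console.log(sectionMatch[0]); // "Section 22"
```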

You can use parentheses to limit the scope of the alternation, that is, to make sure it applies only to the two words 'Chapter' and 'Section'. However, parentheses are tricky as well, because they are also used to create subexpressions, a topic covered later in this tutorial. By taking the regular expressions shown above and adding parentheses in the appropriate places, you can make the expression match either 'Chapter 1' or 'Section 3'.

The following regular expressions use parentheses to group 'Chapter' and 'Section' so that the expression works properly. For JScript:

/^(Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(Chapter|Section) [1-9][0-9]{0,1}$"

These expressions work properly, except that they produce an interesting by-product. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes whichever of the two words matched to be captured for later use. Since there is only one set of parentheses in the expression shown above, there is only one captured submatch. This submatch can be referenced using the Submatches collection in VBScript or the $1-$9 properties of the RegExp object in JScript.
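
As a minimal sketch of the captured submatch, the following modern JavaScript reads it from the array returned by exec (an assumption made for portability; the text above refers to the RegExp.$1-$9 properties instead). Variable names are illustrative.

```javascript
// The parentheses both group the alternation and capture the matched word.
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

const groupedMatch = grouped.exec("Chapter 22");
console.log(groupedMatch[0]); // "Chapter 22" (the whole match)
console.log(groupedMatch[1]); // "Chapter"    (the first captured submatch)

console.log(grouped.test("Section 3"));   // true
console.log(grouped.test("Chapter 100")); // false: at most two digits are allowed
```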

Sometimes capturing a submatch is desirable, and sometimes it is not. In the examples shown above, all you really want is to use the parentheses to group a choice between the words 'Chapter' and 'Section'. You do not necessarily want to refer to that match later. In fact, unless you really need to capture the submatch, don't. Your regular expressions will be more efficient, since they won't spend the time and memory needed to store those submatches.

You can place '?:' before the regular expression pattern inside the parentheses to prevent the match from being saved for later use. The following modification of the regular expressions shown above provides the same capability without saving the submatch. For JScript:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(?:Chapter|Section) [1-9][0-9]{0,1}$"

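To make the difference concrete, here is a small sketch in modern JavaScript (an assumption; names are illustrative) comparing the result arrays of the capturing and non-capturing forms:

```javascript
const capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

const withCapture = capturing.exec("Section 3");
const withoutCapture = nonCapturing.exec("Section 3");

// Both patterns accept the same strings...
console.log(withCapture[0] === withoutCapture[0]); // true

// ...but only the capturing form stores a submatch.
console.log(withCapture.length);    // 2 (whole match + one submatch)
console.log(withoutCapture.length); // 1 (whole match only)
```
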
In addition to the '?:' metacharacters, there are two other non-capturing metacharacters, used for what are called lookahead matches. A positive lookahead, specified with '?=', matches the search string at any point where a string matching the pattern in the parentheses begins. A negative lookahead, specified with '?!', matches the search string at any point where a string not matching the pattern begins. For example, suppose you have a document containing references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document by finding all references to Windows 95, Windows 98, and Windows NT and changing them to Windows 2000. You can use the following JScript regular expression, an example of a positive lookahead, to match Windows 95, Windows 98, and Windows NT:

/Windows (?=95|98|NT)/

The same match can be made in VBScript with the following expression:

"Windows (?=95|98|NT)"

After a match is found, the search for the next match begins immediately after the matched text, not after the characters examined by the lookahead. For example, if the expression shown above matches 'Windows 98', the search resumes after 'Windows', not after '98'.
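
The following sketch, in modern JavaScript (an assumption), illustrates both kinds of lookahead and shows where the search resumes. The sample text and variable names are illustrative, and the pattern assumes a single space between 'Windows' and the version.

```javascript
const text = "Windows 3.1, Windows 95, Windows 98, Windows NT";

// Positive lookahead: 'Windows ' only when 95, 98, or NT follows.
const positive = /Windows (?=95|98|NT)/g;
const found = [];
let hit;
while ((hit = positive.exec(text)) !== null) {
  // lastIndex is where the next search starts: right after 'Windows ',
  // so the version itself has not been consumed by the match.
  found.push({ matched: hit[0], followedBy: text.slice(positive.lastIndex, positive.lastIndex + 2) });
}
console.log(found.map(f => f.followedBy)); // [ '95', '98', 'NT' ]

// Negative lookahead: 'Windows ' only when none of 95, 98, NT follows.
const negative = /Windows (?!95|98|NT)/g;
const others = text.match(negative);
console.log(others.length); // 1 (only the 'Windows 3.1' reference)
```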

Backreferences

One of the most important features of regular expressions is the ability to store part of a matched pattern for later reuse. Recall that placing parentheses around a regular expression pattern, or part of a pattern, causes that part of the expression to be stored in a temporary buffer. You can override the saving of that part of the expression by using the non-capturing metacharacters '?:', '?=', or '?!'.

Each captured submatch is stored as it is encountered, from left to right, in the regular expression pattern. The buffers that hold the submatches are numbered starting at 1, up to a maximum of 99 subexpressions. Each buffer can be accessed using '\n', where n is one or two decimal digits identifying a specific buffer.
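
As a small illustration of the left-to-right numbering, the sketch below (modern JavaScript; the pattern and the test strings are made up for this example) uses '\1' and '\2' inside one pattern:

```javascript
// Group 1 is the first parenthesis from the left, group 2 the second.
// \2 must repeat the second captured character, \1 the first.
const mirrored = /^(\w)(\w)\2\1$/;

const isAbba = mirrored.test("abba");
const isAbab = mirrored.test("abab");

console.log(isAbba); // true:  a, b, then b (\2), then a (\1)
console.log(isAbab); // false: the third character is not a repeat of group 2
```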

One of the simplest and most useful applications of backreferences is locating two identical words that occur together in text. Consider the following sentence:

Is is the cost of of gasoline going up up?

As written, the sentence above clearly has a problem with several duplicated words. It would be nice to have a way to fix the sentence without looking for duplicates of every single word. The following JScript regular expression does this with a single subexpression.

/\b([a-z]+) \1\b/gi

The equivalent VBScript expression is:

"\b([a-z]+) \1\b"

In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is a reference to the previously captured submatch, that is, a second occurrence of the word just matched by the parenthesized expression. '\1' specifies the first submatch. The word-boundary metacharacters ensure that only separate words are detected. Without them, phrases such as "is issued" or "this is" would be incorrectly identified by this expression.
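
The role of the word boundaries can be seen in a short sketch (modern JavaScript, an assumption; the variable names are illustrative) that runs the pattern with and without '\b' on the two phrases mentioned above:

```javascript
const withBoundaries = /\b([a-z]+) \1\b/gi;
const withoutBoundaries = /([a-z]+) \1/gi;

// Without \b, the 'is' inside 'this' and 'issued' is treated as a repeated word.
const falseHit1 = "this is".match(withoutBoundaries);
const falseHit2 = "is issued".match(withoutBoundaries);
console.log(falseHit1); // [ 'is is' ]
console.log(falseHit2); // [ 'is is' ]

// With \b, neither phrase is reported, but a real duplicate still is.
const clean1 = "this is".match(withBoundaries);
const clean2 = "is issued".match(withBoundaries);
const realDuplicate = "it is is fine".match(withBoundaries);
console.log(clean1, clean2);  // null null
console.log(realDuplicate);   // [ 'is is' ]
```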

In the JScript expression, the global flag ('g') following the regular expression indicates that the expression is applied to as many matches as it can find in the input string. Case insensitivity is specified by the ignore-case flag ('i') at the end of the expression. The multiline flag ('m') lets '^' and '$' match at the start and end of each line rather than only at the ends of the whole string; this pattern uses neither anchor, so the flag does not change its result. In VBScript, flags cannot be set in the expression itself and must be set explicitly through properties of the RegExp object. Using the regular expression shown above, the following JScript code uses the submatch information to replace each occurrence of two consecutive identical words in a string with a single occurrence of that word:

var ss = "Is is the cost of of gasoline going up up?.\n";
var re = /\b([a-z]+) \1\b/gim;   // Create the regular expression pattern.
var rv = ss.replace(re, "$1");   // Replace two occurrences of a word with one.

The closest equivalent VBScript code is as follows:

Dim ss, re, rv
ss = "Is is the cost of of gasoline going up up?." & vbNewLine
Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
rv = re.Replace(ss, "$1")

Note that in the VBScript code, the global, ignore-case, and multiline flags are set through the correspondingly named properties of the RegExp object.

Use $1 in the Replace method to reference the first saved submatch. If there are multiple submatches, you can reference them with $2, $3, and so on.
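
A runnable sketch of both points, in modern JavaScript (an assumption): the first part removes doubled words from a sample sentence, and the second uses a hypothetical two-group heading pattern, not taken from the original, to show $1 and $2 together.

```javascript
// Part 1: collapse each doubled word into a single occurrence.
const sentence = "Is is the cost of of gasoline going up up?.\n";
const duplicateWord = /\b([a-z]+) \1\b/gim;
const cleaned = sentence.replace(duplicateWord, "$1");
console.log(JSON.stringify(cleaned)); // "Is the cost of gasoline going up?.\n"

// Part 2 (hypothetical pattern): two submatches, referenced as $1 and $2.
const heading = /^(Chapter|Section) ([1-9][0-9]{0,1})$/;
const swapped = "Chapter 22".replace(heading, "$2: $1");
console.log(swapped); // "22: Chapter"
```

Note that $1 holds the first of the two occurrences, so the capitalization of 'Is' is preserved.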

Another use of backreferences is to break a Uniform Resource Identifier (URI) into its component parts. Assume that you want to break the following URI into the protocol (ftp, http, and so on), the domain address, and the page/path:

http://msdn.microsoft.com:80/scripting/default.htm

The following regular expressions provide this capability. For JScript:

/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/

For VBScript:

"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"

The first parenthesized subexpression captures the protocol part of the web address. It matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address. It matches any sequence of one or more characters other than '/' and ':'. The third parenthesized subexpression captures the port number, if one is specified. It matches a colon followed by zero or more digits. Finally, the fourth parenthesized subexpression captures the path and/or page information specified by the web address. It matches zero or more characters other than '#' and the space character.

After the regular expression is applied to the URI shown above, the submatches contain the following:

RegExp.$1 contains "http"

RegExp.$2 contains "msdn.microsoft.com"

RegExp.$3 contains ":80"

RegExp.$4 contains "/scripting/default.htm"
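
The decomposition can be reproduced with a short sketch in modern JavaScript (an assumption; it reads the submatches from the array returned by exec rather than from RegExp.$1-$4). The second URI, without a port, is a made-up example.

```javascript
const uriPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

const parts = uriPattern.exec("http://msdn.microsoft.com:80/scripting/default.htm");
console.log(parts[1]); // "http"
console.log(parts[2]); // "msdn.microsoft.com"
console.log(parts[3]); // ":80"
console.log(parts[4]); // "/scripting/default.htm"

// When no port is given, the optional third group does not participate.
const noPort = uriPattern.exec("ftp://example.com/pub/readme.txt");
console.log(noPort[3]); // undefined
console.log(noPort[4]); // "/pub/readme.txt"
```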
