Backreferences in Regular Expressions

xiaoxiao2021-03-06

One of the most important features of regular expressions is the ability to store part of a matched pattern for later reuse. Recall that placing parentheses around a regular expression pattern, or around part of a pattern, causes the text matched by that part to be stored in a temporary buffer. You can prevent that part of the regular expression from being saved by using the non-capturing metacharacters '?:', '?=', or '?!'.
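A minimal sketch in modern JavaScript (whose regular expression syntax largely follows JScript's), using hypothetical input strings, showing which groups are saved as submatches and which are not:

```javascript
// A capturing group ( ) saves the text it matched as a submatch.
const capturing = /(\d+)-(\d+)/.exec("10-20");
console.log(capturing.slice(1)); // [ '10', '20' ]

// A non-capturing group (?: ) groups the pattern but saves nothing.
const nonCapturing = /(?:\d+)-(\d+)/.exec("10-20");
console.log(nonCapturing.slice(1)); // [ '20' ]

// A lookahead (?= ) also saves nothing; it is not even part of the match.
const lookahead = /\d+(?=px)/.exec("width: 40px");
console.log(lookahead[0], lookahead.length); // 40 1
```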

Each captured submatch is stored in the order it is encountered, reading the regular expression pattern from left to right. The buffers that hold submatches are numbered starting at 1 and continue up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number identifying a specific buffer.
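The numbering rule and the '\n' syntax can be sketched in JavaScript as follows; the patterns and strings are hypothetical examples. Groups are numbered by the position of their opening parenthesis, so nested groups are counted too.

```javascript
// Groups are numbered by their opening parentheses, left to right:
// in ((a)(b)), group 1 is the outer pair, groups 2 and 3 the inner ones.
const nested = /((a)(b))/.exec("ab");
console.log(nested.slice(1)); // [ 'ab', 'a', 'b' ]

// \1 inside the pattern must match the same text that group 1 captured,
// so the quoted text must close with the same quote character it opened with.
const quoted = /(["'])(.*?)\1/.exec(`say "it's fine" now`);
console.log(quoted[2]); // it's fine
```

Because the closing quote is a backreference rather than another `["']` class, the apostrophe inside the double-quoted text does not end the match early.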

One of the simplest and most useful applications of backreferences is locating two identical words that occur next to each other in text. Consider the following sentence:

Is is the cost of of gasoline going up up?

As written, this sentence clearly has several duplicated words. It would be useful to fix the sentence without having to search for a duplicate of every single word. The following JScript regular expression does this with a single subexpression:

/\b([a-z]+) \1\b/gi

The equivalent VBScript expression is:

"\b([a-z]+) \1\b"

In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is a backreference to the previously captured submatch, that is, a second occurrence of the word just matched by the parenthesized expression. '\1' refers to the first submatch. The word-boundary metacharacters '\b' ensure that only separate words are matched; without them, phrases such as "is issued" or "this is" would be incorrectly identified by this expression.
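The following JavaScript sketch runs the pattern against the example sentence and also shows, using the phrases "this is" and "is issued", what goes wrong when the '\b' word boundaries are omitted:

```javascript
const repeated = /\b([a-z]+) \1\b/gi;
const sentence = "Is is the cost of of gasoline going up up?";
console.log(sentence.match(repeated)); // [ 'Is is', 'of of', 'up up' ]

// Without \b, the pattern also matches fragments of longer words.
const noBoundaries = /([a-z]+) \1/gi;
console.log("this is".match(noBoundaries));   // [ 'is is' ]
console.log("is issued".match(noBoundaries)); // [ 'is is' ]

// With \b, neither phrase is reported.
console.log("this is".match(repeated), "is issued".match(repeated)); // null null
```

Note that with the 'i' flag the backreference itself is also compared case-insensitively, which is why "Is is" counts as a repeated word.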

In the JScript expression, the global flag ('g') following the regular expression indicates that the expression is applied to as many matches as it can find in the input string. Case insensitivity is specified by the ignore-case flag ('i') at the end of the expression. The multiline flag ('m') makes '^' and '$' match at the beginning and end of each line rather than only at the beginning and end of the whole input; because this pattern uses neither anchor, 'm' does not change what it matches. In VBScript, flags cannot be set in the expression itself; they must be set explicitly through properties of the RegExp object.
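A short JavaScript sketch of the three flags, using a hypothetical two-line string. The last two calls switch to a '^' anchor, because 'm' only affects the anchors:

```javascript
const text = "up up\nUp Up";

// No g: match() stops at the first match and also returns its groups.
const first = text.match(/\b([a-z]+) \1\b/);
console.log(first[0]); // up up

// g: every match is returned; without i, "Up Up" is skipped.
console.log(text.match(/\b([a-z]+) \1\b/g)); // [ 'up up' ]

// g and i: letter case is ignored.
console.log(text.match(/\b([a-z]+) \1\b/gi)); // [ 'up up', 'Up Up' ]

// m: ^ also matches right after each newline, not only at the start of the input.
console.log(text.match(/^[a-z]+/gi));  // [ 'up' ]
console.log(text.match(/^[a-z]+/gim)); // [ 'up', 'Up' ]
```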

Using the regular expression shown above, the following JScript code uses the submatch information to replace each occurrence of two consecutive identical words in a string of text with a single occurrence of that word:

var ss = "Is is the cost of of gasoline going up up?\n";
var re = /\b([a-z]+) \1\b/gim;   // Create the regular expression pattern.
var rv = ss.replace(re, "$1");   // Replace each pair of identical words with one word.
// rv is now "Is the cost of gasoline going up?\n"

The closest equivalent VBScript code is:

Dim ss, re, rv
ss = "Is is the cost of of gasoline going up up?" & vbNewLine
Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
rv = re.Replace(ss, "$1")

Note that in the VBScript code, the global, case-insensitivity, and multiline flags are set through the corresponding properties of the RegExp object (Global, IgnoreCase, and MultiLine).
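In JavaScript, numbered submatches can be inserted into a replacement string, or received as arguments when the replacement is a function. A minimal sketch; "Smith, John" is a hypothetical input:

```javascript
// "$2 $1" writes the second submatch, a space, then the first submatch.
const swapped = "Smith, John".replace(/(\w+),\s*(\w+)/, "$2 $1");
console.log(swapped); // John Smith

// A replacement function receives the whole match followed by each submatch.
const shouted = "Is is the cost of of gasoline going up up?".replace(
  /\b([a-z]+) \1\b/gi,
  (_match, word) => word.toUpperCase()
);
console.log(shouted); // IS the cost OF gasoline going UP?
```

The function form is useful when the replacement must be computed from a submatch rather than copied verbatim.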

Use $1 in the Replace method to refer to the first saved submatch. If there is more than one submatch, refer to the others in order as $2, $3, and so on.

Another use of saved submatches is breaking a Uniform Resource Identifier (URI) down into its component parts. Suppose you want to break the following URI down into the protocol (ftp, http, etc.), the domain address, and the page/path:

http://msdn.microsoft.com:80/scripting/default.htm

The following regular expression provides this functionality. For JScript:

/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/

For VBScript:

"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"

The first parenthesized subexpression captures the protocol part of the web address. It matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address part of the address. It matches a sequence of one or more characters that does not include the '/' or ':' characters. The third parenthesized subexpression captures the website port number, if one is specified. It matches a colon followed by zero or more digits. Finally, the fourth parenthesized subexpression captures the path and/or page information specified by the web address. It matches zero or more characters other than '#' or the space character.
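A JavaScript sketch of two edge cases of this pattern, using hypothetical URIs on the reserved example.com domain. In modern JavaScript engines, a group that does not participate in the match (here, the optional port) is reported as undefined, and the fourth group stops at the first '#':

```javascript
const uriPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

// No port: group 3 did not participate in the match, so it is undefined.
const noPort = uriPattern.exec("ftp://example.com/pub/file.txt");
console.log(noPort.slice(1)); // [ 'ftp', 'example.com', undefined, '/pub/file.txt' ]

// Group 4 stops before '#', so the fragment "#top" is not captured.
const withFragment = uriPattern.exec("http://example.com/index.htm#top");
console.log(withFragment[4]); // /index.htm
```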

After the regular expression is applied to the URI shown above, the submatches contain the following:

RegExp.$1 contains "http"
RegExp.$2 contains "msdn.microsoft.com"
RegExp.$3 contains ":80"
RegExp.$4 contains "/scripting/default.htm"
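The same results can be reproduced in JavaScript. This sketch reads the submatches from the array returned by exec(), and also reads the legacy static properties RegExp.$1 through RegExp.$4, which current engines still update after a successful match but which are deprecated in favor of the match array:

```javascript
const uri = "http://msdn.microsoft.com:80/scripting/default.htm";
const re = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

const match = re.exec(uri);
// Read the legacy statics right away; any later regular expression match overwrites them.
const legacy = [RegExp.$1, RegExp.$2, RegExp.$3, RegExp.$4];

// Destructuring skips element 0 (the whole match) and names the four submatches.
const [, protocol, host, port, path] = match;
console.log(protocol, host, port, path); // http msdn.microsoft.com :80 /scripting/default.htm
console.log(legacy.join(" "));           // http msdn.microsoft.com :80 /scripting/default.htm
```

Reading from the match array avoids depending on global state that any other regular expression call can change.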

Please credit the original address when reposting: https://www.9cbs.com/read-54064.html
