Building a Regular Expression

zhaozj 2021-02-17


Regular expressions are constructed in the same way as arithmetic expressions: small expressions are combined using a variety of metacharacters and operators to build larger ones.

You construct a regular expression by placing the components of the expression pattern between a pair of delimiters. For JScript, the delimiters are a pair of forward slash (/) characters. For example:

/expression/

For VBScript, a pair of quotation marks ("") delimits the regular expression. For example:

"expression"

In both examples shown above, the regular expression pattern (expression) is stored in the Pattern property of the RegExp object.

The components of a regular expression can be individual characters, sets of characters, ranges of characters, choices between characters, or any combination of these components.
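In JavaScript (the modern descendant of JScript) the same pattern can be written either as a slash-delimited literal or passed as a string to the RegExp constructor. The sketch below assumes a current engine such as Node.js. Note two assumptions that differ from the text: JavaScript exposes the pattern text through the read-only `source` property rather than a `Pattern` property, and the string form needs its backslashes doubled, which a VBScript string does not.

```javascript
// Pattern written as a literal between a pair of forward slashes.
const literal = /filename\.ext/;

// The same pattern built from a string. The string literal needs "\\" so that a
// single backslash (the regular expression escape) reaches the engine.
const constructed = new RegExp("filename\\.ext");

// JavaScript exposes the pattern text through the read-only `source` property.
console.log(literal.source);                        // filename\.ext
console.log(constructed.source === literal.source); // true
```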

Order of precedence

Once constructed, a regular expression is evaluated much like an arithmetic expression: from left to right, following an order of precedence.

The following table lists the regular expression operators in order of precedence, from highest to lowest:

Operator(s)                    Description
\                              Escape
(), (?:), (?=), []             Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}      Quantifiers
^, $, \anymetacharacter        Anchors and sequences
|                              Alternation
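The effect of precedence is easiest to see by adding or removing grouping. This JavaScript sketch assumes Node.js or a browser console; the sample strings are made up for illustration, and `(?:...)` is the non-capturing group described later in the text.

```javascript
// Quantifiers bind tighter than sequencing: in /ab*/ the * applies only to "b".
const bStar = "ababab".match(/ab*/)[0];            // "ab"

// A group raises precedence, so the * now repeats the whole "ab".
const groupedStar = "ababab".match(/(?:ab)*/)[0];  // "ababab"

// Alternation has the lowest precedence: /^ab|cd$/ means (^ab) or (cd$).
const anchoredAlternation = /^ab|cd$/.test("abzz");    // true: "ab" at the start is enough
const groupedAlternation = /^(?:ab|cd)$/.test("abzz"); // false: the whole string must be ab or cd
```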

Ordinary characters

Ordinary characters are all the printing and non-printing characters that are not explicitly designated as metacharacters. They include all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.

The simplest regular expression is a single ordinary character, which matches itself in the searched string. For example, the single-character pattern 'a' matches the letter 'a' wherever it appears in the searched string. Here are some single-character JScript regular expressions:

/a/

/7/

/M/

The equivalent VBScript single-character regular expressions are:

"a"

"7"

"M"

You can combine several single characters to form a larger expression. For example, the following JScript regular expression is nothing more than the single-character expressions 'a', '7', and 'M' placed together:

/a7M/

The equivalent VBScript expression is:

"a7M"

Notice that there is no concatenation operator: you simply place one character after another.
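Concatenation and case sensitivity can be checked quickly in JavaScript. This is a minimal sketch with invented sample strings; the `i` (ignore case) flag it uses is discussed later, in the section on backreferences.

```javascript
// A single ordinary character matches itself wherever it occurs.
const hasA = /a/.test("banana");           // true

// There is no concatenation operator: /a7M/ needs "a", "7", "M" adjacent and in order.
const adjacent = /a7M/.test("xxa7Myy");    // true
const separated = /a7M/.test("a 7 M");     // false

// Ordinary characters are case-sensitive unless the i flag is given.
const wrongCase = /a7M/.test("a7m");       // false
const ignoreCase = /a7M/i.test("a7m");     // true
```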

Special characters

A number of metacharacters need special treatment when you want to match them literally. To match one of these special characters, you must first escape it, that is, precede it with a backslash (\). The following table lists the special characters and their meanings:

Character   Description
$           Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'. To match the $ character itself, use \$.
( )         Mark the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*           Matches the preceding subexpression zero or more times. To match the * character, use \*.
+           Matches the preceding subexpression one or more times. To match the + character, use \+.
.           Matches any single character except the newline character \n. To match the . character, use \.
[           Marks the beginning of a bracket expression. To match [, use \[.
?           Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\           Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^           Matches the position at the beginning of the input string, except inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{           Marks the beginning of a quantifier expression. To match {, use \{.
|           Indicates a choice between two items. To match |, use \|.

Non-printing characters

There are a number of useful non-printing characters that you will occasionally need to match. The following table shows the escape sequences used to represent them:

Character   Meaning
\cx         Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z; otherwise, c is treated as a literal 'c' character.
\f          Matches a form-feed character. Equivalent to \x0c and \cL.
\n          Matches a newline character. Equivalent to \x0a and \cJ.
\r          Matches a carriage return character. Equivalent to \x0d and \cM.
\s          Matches any white space character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S          Matches any non-white space character. Equivalent to [^ \f\n\r\t\v].
\t          Matches a tab character. Equivalent to \x09 and \cI.
\v          Matches a vertical tab character. Equivalent to \x0b and \cK.
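Several of the equivalences in the table can be confirmed in a JavaScript engine. This is a minimal sketch assuming Node.js or a browser console; it uses only escapes listed above, on made-up input strings.

```javascript
// \cX is Control-X: \cM is the carriage return, \cJ is the newline.
const controlM = /\cM/.test("\r");            // true
const controlJ = /\cJ/.test("\n");            // true

// Hex escapes name the same characters as the letter escapes.
const hexTab = /\x09/.test("\t");             // true
const verticalTab = /\v/.test("\x0b");        // true
const formFeed = /\f/.test("\x0c");           // true

// \s matches white space characters; \S matches everything else.
const words = "a b\tc\nd".split(/\s/);        // ["a", "b", "c", "d"]
const masked = "a b\tc".replace(/\S/g, "#");  // "# #\t#"
```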

Character matching

The period (.) matches any single printing or non-printing character in a string except the newline character (\n). The following JScript regular expression matches 'aac', 'abc', 'acc', 'adc', and so on, as well as 'a1c', 'a2c', 'a-c', and 'a#c':

/a.c/

The equivalent VBScript regular expression is:

"a.c"
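A quick JavaScript check of the period's behavior, using strings from the list above plus one containing a newline (a sketch assuming Node.js or a browser console):

```javascript
const anyMiddle = /a.c/;
const samples = ["aac", "abc", "a1c", "a-c", "a#c", "a\nc"];

// The period accepts any single character in the middle except the newline.
const dotResults = samples.map((s) => anyMiddle.test(s));
// -> [true, true, true, true, true, false]
```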

If you are trying to match text such as a file name, where the period (.) is part of the text itself, precede the period in the regular expression with a backslash (\). For example, the following JScript regular expression matches 'filename.ext':

/filename\.ext/

For VBScript, the equivalent expression is:

"filename\.ext"
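The sketch below shows why the backslash matters, then generalizes it. `escapeRegExp` is a hypothetical helper, not a built-in: it prefixes every special character from the earlier table (plus `]` and `}`) with a backslash, so that arbitrary text can be passed to the JavaScript `RegExp` constructor and matched literally. The sample strings are invented.

```javascript
// Unescaped, "." matches any character, so "filenameXext" is accepted too.
const unescapedDot = /filename.ext/.test("filenameXext");     // true
// Escaped, only a literal period is accepted.
const escapedDotX = /filename\.ext/.test("filenameXext");     // false
const escapedDotReal = /filename\.ext/.test("filename.ext");  // true

// Hypothetical helper: backslash-escape every regular expression special character.
function escapeRegExp(text) {
  // "$&" in the replacement string inserts the character that was matched.
  return text.replace(/[$()*+.?[\\\]^{|}]/g, "\\$&");
}

const helperOutput = escapeRegExp("1+1=2?");                   // 1\+1=2\?
const fromHelper = new RegExp(escapeRegExp("filename.ext"));
const helperMatchesLiteral = fromHelper.test("filename.ext");  // true
const helperRejectsOther = fromHelper.test("filenameXext");    // false
```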

These expressions are still quite limited: they only let you match a single arbitrary character. Often it is useful to match one of a specified list of characters instead. For example, if the input text contains chapter headings numbered as Chapter 1, Chapter 2, and so on, you might want to find those headings.

Bracket expressions

You can create a list of characters to match by placing one or more individual characters inside square brackets ([ and ]). A list enclosed in brackets is called a bracket expression. Within brackets, as anywhere else, ordinary characters represent themselves; that is, they match an occurrence of themselves in the input text. Most special characters lose their meaning inside a bracket expression, with two exceptions. The ']' character ends the list if it is not the first item; to match ']' in a list, place it first, immediately after the opening '['. The '\' character is still the escape character; to match '\', use '\\'.
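A JavaScript sketch of literal metacharacters inside brackets. One caution, which applies to ECMAScript engines and not necessarily to the JScript or VBScript engines described here: in JavaScript `[]` is an empty class that never matches, so the put-`]`-first rule does not carry over, and escaping it as `\]` is the portable spelling.

```javascript
// Inside brackets most metacharacters are literal: [.*+] matches a period, star, or plus.
const literalInClass = "a*b".match(/[.*+]/)[0];  // "*"

// The backslash still escapes inside brackets; [\\] matches one backslash.
const backslash = /[\\]/.test("C:\\temp");      // true

// Escaping ] includes it in the list.
const escapedBracket = /[\]a]/.test("]");        // true

// In ECMAScript, "[]" is an empty class, so /[]a]/ can never match.
const emptyClass = /[]a]/.test("]a]");           // false
```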

A bracket expression matches only a single character, at the position in the regular expression where the bracket expression appears. The following JScript regular expression matches 'Chapter 1', 'Chapter 2', 'Chapter 3', 'Chapter 4', and 'Chapter 5':

/Chapter [12345]/

To match the same chapter headings in VBScript, use the following expression:

"Chapter [12345]"

Notice that the word 'Chapter' and the following space are fixed in position relative to the characters within brackets. The bracket expression therefore specifies only the set of characters that can match the single character position immediately after 'Chapter' and a space, which is the ninth character position.

To express the matching characters as a range rather than listing each one, separate the first and last characters of the range with a hyphen (-). The character value of each character determines its relative order within a range. The following JScript regular expression contains a range expression equivalent to the bracketed list shown above:

/Chapter [1-5]/

The same expression in VBScript is:

"Chapter [1-5]"

When a range is specified this way, both the starting and ending values are included in the range. Note that the starting value must come before the ending value in Unicode sort order.
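Listing the characters and giving a range are interchangeable here. This JavaScript sketch checks both patterns against every digit and shows that a range written in the wrong order is rejected (in JavaScript it raises a SyntaxError; the behavior of other engines is not covered here).

```javascript
const listed = /Chapter [12345]/;
const ranged = /Chapter [1-5]/;
const digits = "0123456789".split("");

// Both forms accept exactly the same ninth character.
const sameForDigits = digits.every(
  (d) => listed.test("Chapter " + d) === ranged.test("Chapter " + d)
);
const accepted = digits.filter((d) => ranged.test("Chapter " + d)).join(""); // "12345"

// A range whose start comes after its end is not allowed.
let reversedRangeError = false;
try {
  new RegExp("[5-1]");
} catch (e) {
  reversedRangeError = e instanceof SyntaxError;
}
```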

To include a hyphen in a bracket expression, use one of the following methods:

Escape the hyphen with a backslash:

[\-]

Place the hyphen at the beginning or end of the bracketed list. Both of the following expressions match all lowercase letters and the hyphen:

[-a-z]

[a-z-]

Create a range whose starting character value is lower than the hyphen's and whose ending character value is equal to or greater than the hyphen's. Both of the following regular expressions meet this requirement:

[!--]

[!-~]
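Each of these forms can be checked in JavaScript. The sketch also shows a side effect of the range method: a range admits every character between its endpoints, not just the hyphen.

```javascript
const hyphenForms = [/[\-]/, /[-a-z]/, /[a-z-]/, /[!--]/, /[!-~]/];

// Every form accepts a literal hyphen.
const allAcceptHyphen = hyphenForms.every((re) => re.test("-")); // true

// '#' (0x23) lies between '!' (0x21) and '-' (0x2D), so [!--] accepts it too.
const rangeIncludesHash = /[!--]/.test("#");                     // true

// [-a-z] accepts lowercase letters and the hyphen, but not uppercase letters.
const lowerAndHyphen = ["q", "-", "Q"].map((c) => /^[-a-z]$/.test(c)); // [true, true, false]
```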

You can also match every character that is not in the list or range by placing a caret (^) at the beginning of the list. If the caret appears anywhere else in the list, it matches itself and has no special meaning. The following JScript regular expression matches chapter headings whose ninth character is anything other than the digits 1 through 5:

/Chapter [^12345]/

For VBScript, use:

"Chapter [^12345]"

In the examples shown above, the expression matches any character in the ninth position except 1, 2, 3, 4, or 5. So 'Chapter 7' is a match, and so is 'Chapter 9'. Because a negated list excludes only what it names, 'Chapter 0' and 'Chapter X' match as well.

The same expressions can be written using a hyphen (-) range. For JScript:

/Chapter [^1-5]/

Or, for VBScript:

"Chapter [^1-5]"
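A short JavaScript check of the negated list, including two inputs that are easy to overlook because they are not the digits 6 through 9:

```javascript
const notOneToFive = /Chapter [^1-5]/;
const headings = ["Chapter 3", "Chapter 7", "Chapter 9", "Chapter 0", "Chapter X"];

// Anything except 1-5 in the ninth position is accepted, digits or not.
const negatedResults = headings.map((s) => notOneToFive.test(s));
// -> [false, true, true, true, true]

// A caret that is not first in the list is an ordinary character.
const caretLiteral = /[1^]/.test("^"); // true
```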

A typical use of a bracket expression is to match any uppercase or lowercase letter or any digit. The following JScript expression specifies such a match:

/[A-Za-z0-9]/

The equivalent VBScript expression is:

"[A-Za-z0-9]"

Quantifiers

Sometimes you don't know how many characters you need to match. To accommodate that uncertainty, regular expressions support quantifiers, which let you specify how many times a given component of the regular expression must occur for the match to succeed. The following table lists the quantifiers and their meanings:

Character   Description
*           Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+           Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?           Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches both "do" and "does". ? is equivalent to {0,1}.
{n}         n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}        n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}       m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.

In a large input document, chapter numbers can easily exceed nine, so you need a way to handle two- or three-digit chapter numbers. Quantifiers provide that capability. The following JScript regular expression matches chapter headings with any number of digits:

/Chapter [1-9][0-9]*/

The following VBScript regular expression performs the same match:

"Chapter [1-9][0-9]*"

Notice that the quantifier appears after the range expression, so it applies to the whole range expression, which in this case specifies only the digits 0 through 9. The '+' quantifier is not used here because a digit is not required in the second or subsequent positions. The '?' character is not used either, because it would limit chapter numbers to two digits. The pattern requires at least one digit after 'Chapter' and a space.
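Each row of the table can be exercised in a few lines of JavaScript. In this sketch the `^` and `$` anchors (described later) are used in the first two checks only to force a whole-string match; the sample strings come from the table.

```javascript
// * : zero or more, so "z", "zo", and "zoo" all match as whole strings.
const zoStar = ["z", "zo", "zoo"].every((s) => /^zo*$/.test(s));  // true
// + : one or more, so "z" alone is rejected.
const zoPlus = ["zo", "zoo", "z"].map((s) => /^zo+$/.test(s));      // [true, true, false]
// ? : zero or one, so the match is "do" in "do" and "does" in "does".
const doEs = ["do", "does"].map((s) => s.match(/do(es)?/)[0]);      // ["do", "does"]

// {n}, {n,}, {n,m}
const twoInBob = /o{2}/.test("Bob");                                // false
const twoInFood = "food".match(/o{2}/)[0];                          // "oo"
const atLeastTwo = "foooood".match(/o{2,}/)[0];                     // every o in the word
const oneToThree = "fooooood".match(/o{1,3}/)[0];                   // "ooo"

// The chapter pattern: a non-zero digit, then any number of further digits.
const chapterAnyDigits = "See Chapter 123.".match(/Chapter [1-9][0-9]*/)[0]; // "Chapter 123"
```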
If you know that chapter numbers never exceed 99, you can use the following JScript expression to require at least one, but no more than two, digits:

/Chapter [0-9]{1,2}/

For VBScript, use the following regular expression:

"Chapter [0-9]{1,2}"

The drawback of this expression is that if a chapter number is greater than 99, it still matches only the first two digits. Another drawback is that a heading such as Chapter 0 would also match. Better JScript expressions for matching one- or two-digit chapter numbers that do not start with 0 are:

/Chapter [1-9][0-9]?/

or

/Chapter [1-9][0-9]{0,1}/

For VBScript, the following expressions are equivalent:

"Chapter [1-9][0-9]?"

or

"Chapter [1-9][0-9]{0,1}"

The '*', '+', and '?' quantifiers are all greedy: they match as much text as possible. Sometimes that is not what you want; you just want a minimal match.
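The trade-offs above can be checked directly. This JavaScript sketch reuses the chapter patterns and, for the greedy versus minimal comparison, borrows the `<H1>` chapter-title markup from the HTML example discussed in the text.

```javascript
// {1,2} caps the digit count, but truncates "Chapter 123" and accepts "Chapter 0".
const upToTwo = /Chapter [0-9]{1,2}/;
const truncated = "Chapter 123".match(upToTwo)[0];            // "Chapter 12"
const zeroAccepted = upToTwo.test("Chapter 0");               // true

// [1-9][0-9]? rules out a leading zero.
const zeroRejected = /Chapter [1-9][0-9]?/.test("Chapter 0"); // false

// Greedy .* runs to the last ">", while the minimal .*? stops at the first.
const html = "<H1>Chapter 1 - Introduction to Regular Expressions</H1>";
const greedy = html.match(/<.*>/)[0];    // the whole string
const minimal = html.match(/<.*?>/)[0];  // "<H1>"
```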

For example, you may be searching an HTML document for a chapter title enclosed in an H1 tag. In the document, that text appears as:

<H1>Chapter 1 - Introduction to Regular Expressions</H1>

The following expression matches everything from the opening less-than symbol (<) to the greater-than symbol at the end of the closing H1 tag:

/<.*>/

The VBScript regular expression is:

"<.*>"

If all you want to match is the opening H1 tag, the following non-greedy expression matches only <H1>:

/<.*?>/

or

"<.*?>"

Placing '?' after a '*', '+', or '?' quantifier turns the expression from a greedy match into a non-greedy, or minimal, match.

Anchors

So far, the examples have looked for chapter headings wherever they occur. Any occurrence of the string 'Chapter' followed by a space and a number could be a real chapter heading, or it could be a cross-reference to another chapter. Since real chapter headings always appear at the beginning of a line, you need a way to find only the headings and not the cross-references.

Anchors provide that capability. An anchor fixes a regular expression to the beginning or end of a line. Anchors also let you create regular expressions that match only within a word, or only at the beginning or end of a word. The following table lists the regular expression anchors and their meanings:

Character   Description
^           Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.
$           Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.
\b          Matches a word boundary, that is, the position between a word character and a non-word character such as a space.
\B          Matches a non-word boundary.

You cannot use a quantifier with an anchor. Because there cannot be more than one position immediately before or after a newline or word boundary, expressions such as '^*' are not permitted.

To match text at the beginning of a line, use the '^' character at the beginning of the regular expression. Don't confuse this use of '^' with its use inside a bracket expression; the two are entirely different.

To match text at the end of a line, use the '$' character at the end of the regular expression.

To use anchors when searching for chapter headings, the following JScript regular expression matches a chapter heading with up to two digits that occurs at the beginning of a line:

/^Chapter [1-9][0-9]{0,1}/

The same regular expression in VBScript is:

"^Chapter [1-9][0-9]{0,1}"

A real chapter heading not only appears at the beginning of a line, it is also the only thing on that line, so it must also reach the end of the line. The following expressions ensure that only chapter headings, and not cross-references, are matched, by requiring the match to start at the beginning of a line and end at the end of it:

/^Chapter [1-9][0-9]{0,1}$/

For VBScript, use:

"^Chapter [1-9][0-9]{0,1}$"

Matching word boundaries works a little differently, but it adds a very important capability to regular expressions. A word boundary is the position between a word character and a non-word character (or the start or end of the string); a non-word boundary is any other position.
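The anchors in the table can be exercised with a short JavaScript sketch. The three-line input is invented; in JavaScript the Multiline property corresponds to the `m` flag, and the word-boundary strings are the ones the text uses.

```javascript
const lines = "Chapter 1\nSee Chapter 2 for details.\nChapter 12";

// Without the m flag, ^ and $ match only at the very start and end of the input,
// so no heading fills the whole three-line string.
const withoutMultiline = lines.match(/^Chapter [1-9][0-9]{0,1}$/g);  // null

// With m, ^ and $ also match next to each "\n": both real headings are found,
// and the cross-reference in the middle line is skipped.
const withMultiline = lines.match(/^Chapter [1-9][0-9]{0,1}$/gm);    // ["Chapter 1", "Chapter 12"]

// Word boundaries.
const chaAtStart = /\bCha/.test("Chapter");     // true
const terAtEnd = /ter\b/.test("Chapter");       // true
const aptInside = /\Bapt/.test("Chapter");      // true
const aptAtStart = /\Bapt/.test("aptitude");    // false
```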
The following JScript expression matches the first three characters of the word 'Chapter', because they follow a word boundary:

/\bCha/

For VBScript:

"\bCha"

The position of the '\b' operator is critical here. If it is placed at the beginning of the string to be matched, it looks for the match at the beginning of a word; if it is placed at the end, it looks for the match at the end of a word. For example, the following expressions match 'ter' in the word 'Chapter', because it appears before a word boundary:

/ter\b/

and

"ter\b"

The following expressions match 'apt' as it occurs in 'Chapter', but not as it occurs in 'aptitude':

/\Bapt/

and

"\Bapt"

That is because 'apt' occurs at a non-word boundary in 'Chapter' but at a word boundary in 'aptitude'. Note that '\B' constrains only the side on which it is placed: '/apt\B/' would match 'aptitude' as well, because there 'apt' is followed by another letter.

Alternation and grouping

Alternation lets you use the '|' character to choose between two or more alternatives. You can extend the chapter heading expression to cover more than just chapter headings, but it is not as straightforward as you might think: when alternation is used, the largest possible expression on each side of the '|' character is matched.

You might think that the following JScript and VBScript expressions match either 'Chapter' or 'Section', followed by one or two digits, at the beginning and end of a line:

/^Chapter|Section [1-9][0-9]{0,1}$/

"^Chapter|Section [1-9][0-9]{0,1}$"

Unfortunately, these regular expressions match either the word 'Chapter' at the beginning of a line, or 'Section' followed by a number at the end of a line. If the input string is 'Chapter 22', the expression matches only the word 'Chapter'. If the input string is 'Section 22', the expression matches 'Section 22'. That is not the intent here, so there must be a way to make the regular expression do what you want, and there is.

You can use parentheses to limit the scope of the alternation, that is, to make sure it applies only to the two words 'Chapter' and 'Section'. Parentheses are tricky as well, however, because they also create subexpressions, which are covered later. By adding parentheses in the appropriate places, you can make the regular expression match either 'Chapter 1' or 'Section 3'.

The following regular expressions use parentheses to group 'Chapter' and 'Section', so the expressions work properly. For JScript:

/^(Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(Chapter|Section) [1-9][0-9]{0,1}$"

These expressions work properly, but they produce an interesting by-product. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes whichever of the two words matched to be captured for later use. Since there is only one set of parentheses in the expressions above, there is only one captured submatch. It can be referenced through the SubMatches collection in VBScript or the $1-$9 properties of the RegExp object in JScript.

Sometimes capturing a submatch is desirable, and sometimes it is not. In the examples shown above, all you really want is to use the parentheses to group the choice between the words 'Chapter' and 'Section'.
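The difference between the ungrouped and grouped versions, and what the grouping captures, can be seen in JavaScript. The last check uses the non-capturing `(?:...)` form that the text describes next; the input strings are the ones used above.

```javascript
// Alternation has the lowest precedence, so this means (^Chapter) or (Section ...$).
const ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;
const ungroupedChapter = "Chapter 22".match(ungrouped)[0]; // "Chapter"
const ungroupedSection = "Section 22".match(ungrouped)[0]; // "Section 22"

// Parentheses limit the alternation to the two words and capture the one that matched.
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const groupedMatch = "Chapter 22".match(grouped);
// groupedMatch[0] is "Chapter 22"; groupedMatch[1] is the captured "Chapter".

// (?:...) groups the same way but stores no submatch.
const nonCapturing = "Section 3".match(/^(?:Chapter|Section) [1-9][0-9]{0,1}$/);
// nonCapturing.length is 1: only the whole match is returned.
```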
Often, though, you don't need to refer to that match later. In fact, unless you really need to capture submatches, don't: your regular expressions will be more efficient, since they don't spend time and memory storing those submatches.

You can place '?:' at the start of the pattern inside the parentheses to prevent the submatch from being saved for later use. The following modification of the regular expressions above provides the same functionality without saving the submatch. For JScript:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(?:Chapter|Section) [1-9][0-9]{0,1}$"

In addition to the '?:' metacharacters, there are two other non-capturing metacharacters, used for lookahead matches. A positive lookahead, written '?=', matches the search string at any point where a string matching the regular expression pattern in the parentheses begins. A negative lookahead, written '?!', matches the search string at any point where a string not matching that pattern begins.

For example, suppose you have a document containing references to Windows 3.1, Windows 95, Windows 98, and Windows NT.

Suppose further that you need to update the document by finding all references to Windows 95, Windows 98, and Windows NT and changing them to Windows 2000. You can use the following JScript regular expression, which is an example of a positive lookahead, to match Windows 95, Windows 98, and Windows NT:

/Windows (?=95|98|NT)/

To make the same match in VBScript, use the following expression:

"Windows (?=95|98|NT)"

Once a match is found, the search for the next match begins immediately after the matched text, not including the characters in the lookahead. For example, if the expressions shown above match 'Windows 98', the search resumes after 'Windows ', not after '98'.

Backreferences

One of the most important features of regular expressions is the ability to store part of a matched pattern for later reuse. Recall that placing parentheses around a regular expression pattern, or part of a pattern, causes that part of the expression to be stored in a temporary buffer. You can prevent this by using the non-capturing metacharacters '?:', '?=', or '?!'.

Each captured submatch is stored in the order it is encountered, from left to right, in the regular expression pattern. The buffers holding the submatches are numbered starting at 1, up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number identifying a specific buffer.

One of the simplest and most useful applications of backreferences is locating two identical words that occur together in text. Consider the following sentence:

Is is the cost of of gasoline going up up?

As written, the sentence clearly has several duplicated words. It would be nice to fix the sentence without having to look for duplicates of every single word. The following JScript regular expression uses a single subexpression to do that:
/\b([a-z]+) \1\b/gi

The equivalent VBScript expression is:

"\b([a-z]+) \1\b"

In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word just matched by the parenthesized expression. '\1' specifies the first submatch. The word boundary metacharacters ensure that only separate words are detected; without them, phrases such as "is issued" or "this is" would be incorrectly identified by this expression.

In the JScript expression, the global flag ('g') after the regular expression indicates that the expression should find as many matches as it can in the input string. Case insensitivity is specified by the case-insensitivity flag ('i') at the end of the expression. The multiline flag ('m') specifies that potential matches may occur on either side of a newline character. For VBScript, these flags cannot be set in the expression; they must be set explicitly through properties of the RegExp object.

Using the regular expression shown above, the following JScript code uses the submatch information to replace each occurrence of two consecutive identical words in a string with a single occurrence of that word:

var ss = "Is is the cost of of gasoline going up up?\n";
var re = /\b([a-z]+) \1\b/gim;     // Create the regular expression pattern.

var rv = ss.replace(re, "$1");     // Replace two occurrences with one.

The closest equivalent VBScript code is as follows:

Dim ss, re, rv
ss = "Is is the cost of of gasoline going up up?" & vbNewLine

Set re = New RegExp

re.Pattern = "\b([a-z]+) \1\b"

re.Global = True

re.IgnoreCase = True

re.Multiline = True

rv = re.Replace(ss, "$1")

Notice that in the VBScript code, the global, case-insensitivity, and multiline flags are set through the appropriately named properties of the RegExp object.

The $1 in the Replace method refers to the first saved submatch. If there is more than one submatch, you refer to the others as $2, $3, and so on.

Backreferences can also be used to break a Uniform Resource Identifier (URI) down into its component parts. Suppose you want to break the following URI down into the protocol (ftp, http, and so on), the domain address, and the page/path:

http://msdn.microsoft.com:80/scripting/default.htm

The following regular expressions provide that functionality. For JScript:

/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/

For VBScript:

"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"

The first parenthesized subexpression captures the protocol part of the web address; it matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address; it matches any sequence of characters that does not include '/' or ':'. The third parenthesized subexpression captures the port number, if one is specified; it matches a colon followed by zero or more digits. Finally, the fourth parenthesized subexpression captures the path and/or page information; it matches zero or more characters other than '#' or a space.

After the regular expression is applied to the URI shown above, the submatches contain the following:

RegExp.$1 contains "http"
RegExp.$2 contains "msdn.microsoft.com"
RegExp.$3 contains ":80"
RegExp.$4 contains "/scripting/default.htm"
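Two of the later techniques can be checked together in JavaScript. The first part shows that a positive lookahead consumes only 'Windows ' and that a negative lookahead finds the remaining reference; the document string is made up for illustration. The second part applies the URI pattern with `exec()`, whose result array holds the same submatches as `RegExp.$1` through `RegExp.$4` (those static properties still exist in current engines, but only as a legacy feature). `splitUri` is a hypothetical helper name, and the ftp URI is a hypothetical example without a port.

```javascript
// Lookahead: the text inside (?=...) or (?!...) is tested but not consumed.
const doc = "Windows 3.1, Windows 95, Windows 98, and Windows NT";
const positive = doc.match(/Windows (?=95|98|NT)/g);   // three matches, each "Windows "
const negativeAt = doc.search(/Windows (?!95|98|NT)/); // 0: the "Windows 3.1" reference

// URI decomposition: protocol, host, optional port, path.
const uriPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

// Hypothetical helper: returns the four submatches, or null if nothing matches.
function splitUri(uri) {
  const m = uriPattern.exec(uri);
  if (m === null) return null;
  // m[0] is the whole match; m[1]..m[4] are the four submatches.
  const [, protocol, host, port, path] = m;
  return { protocol, host, port, path };
}

const msdn = splitUri("http://msdn.microsoft.com:80/scripting/default.htm");
// protocol "http", host "msdn.microsoft.com", port ":80", path "/scripting/default.htm"

const noPort = splitUri("ftp://example.com/pub/file.txt");
// port is undefined because the optional third group did not take part in the match
```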

When reposting, please credit the original article: https://www.9cbs.com/read-29983.html
