Getting started with regular expressions

xiaoxiao  2021-04-07

What is a regular expression? If you have never used regular expressions, you may not be familiar with the term or the concept. However, they are not as exotic as you might imagine.

Recall how you search for files on your hard disk. You most likely use the ? and * characters to help find the files you are looking for. The ? character matches a single character in a file name, and * matches zero or more characters. A pattern such as 'data?.dat' finds the following files:

Data1.dat

Data2.dat

Datax.dat

Datan.dat

If you use the * character instead of the ? character, the number of files found expands. 'data*.dat' matches all of the following file names:

Data.dat

Data1.dat

Data2.dat

Data12.dat

Datax.dat

Dataxyz.dat

Although this way of searching for files is certainly useful, it is also very limited. The limited power of the ? and * wildcards gives you an idea of what regular expressions can do, but regular expressions are far more powerful and flexible.

Early origins

The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Two neurophysiologists, Warren McCulloch and Walter Pitts, developed a mathematical way of describing these neural networks.

In 1956, a mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets and Finite Automata" that introduced the concept of regular expressions. Regular expressions were the expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".

Subsequently, this work was found to be applicable to early efforts on computational search algorithms by Ken Thompson, the principal inventor of UNIX. The first practical application of regular expressions was in the UNIX editor qed.

And the rest, as they say, is history. Regular expressions have been an important part of text-based editors and search tools ever since.

Using regular expressions

In a typical search-and-replace operation, you must provide the exact text you are looking for. That technique may be adequate for simple search and replace tasks in static text, but it lacks flexibility and makes searching dynamic text difficult, if not impossible.

Using regular expressions, you can:

· Test for a pattern within a string. For example, you can test an input string to see whether a pattern such as a credit card number occurs within it. This is called data validation.

· Replace text. You can use a regular expression to identify specific text in a document and then either remove it completely or replace it with other text.

· Extract a substring from a string based on a pattern match. You can find specific text within a document or an input field.
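
The three uses listed above can be sketched in modern JavaScript, the descendant of the JScript discussed in this article. This is only an illustration: the sample string and the ID format (two digits, a hyphen, five digits) are assumptions chosen for the example, not part of the original text.

```javascript
// Hypothetical sample input, chosen only for illustration.
const input = "Order 12-34567 shipped; order 98-76543 pending.";

// 1. Test for a pattern within a string (data validation).
const hasId = /\d{2}-\d{5}/.test(input);

// 2. Replace text: mask every ID found. The g flag replaces all matches.
const masked = input.replace(/\d{2}-\d{5}/g, "##-#####");

// 3. Extract the substrings that match the pattern.
const ids = input.match(/\d{2}-\d{5}/g);

console.log(hasId);  // true
console.log(masked); // Order ##-##### shipped; order ##-##### pending.
console.log(ids);    // [ '12-34567', '98-76543' ]
```

The same pattern serves all three tasks; only the method called on it changes.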

For example, if you need to search an entire Web site to remove outdated material and replace some HTML formatting tags, you can use a regular expression to test each file and see whether the material or the HTML formatting tags you are looking for occur in it. That way, you can narrow the affected files down to those that contain material to be removed or changed. You can then use a regular expression to remove the outdated material, and finally use regular expressions again to find and replace the tags that need replacing.

Regular expressions are also useful in a language that is not known for its string-handling ability. VBScript, a subset of Visual Basic, has a rich set of string-handling functions. JScript, like C, does not. Regular expressions bring a significant improvement in string handling to JScript. However, regular expressions may also be more efficient in VBScript, because they let you perform multiple string manipulations in a single expression.

Regular expression syntax

A regular expression is a pattern of text that consists of ordinary characters (for example, the letters a through z) and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template for matching a character pattern to the string being searched.

Here are some examples of regular expressions you might encounter:

JScript    VBScript    Matches
/^[ \t]*$/    "^[ \t]*$"    A blank line.
/\d{2}-\d{5}/    "\d{2}-\d{5}"    An ID number consisting of 2 digits, a hyphen, and another 5 digits.
/<(.*)>.*<\/\1>/    "<(.*)>.*<\/\1>"    An HTML tag.

The table below is a complete list of metacharacters and their behavior in the context of regular expressions:

Character    Description
\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^    Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following '\n' or '\r'.
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'.
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or the "do" in "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
?    When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.    Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (a period inside brackets is a literal period, so '[.\n]' does not work).
(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match the parenthesis characters, use '\(' or '\)'.
(?:pattern)    Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)    Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not captured for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.
(?!pattern)    Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not captured for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.
x|y    Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]    A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]    A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]    A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
[^a-z]    A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
\b    Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B    Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If not, c is treated as a literal 'c' character.
\d    Matches a digit character. Equivalent to [0-9].
\D    Matches a non-digit character. Equivalent to [^0-9].
\f    Matches a form-feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any white space character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-white space character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
\w    Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W    Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn    Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". Allows ASCII codes to be used in regular expressions.
\num    Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n    Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm    Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds, \nm matches the octal escape value nm when n and m are octal digits (0-7).
\nml    Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un    Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

Build a regular expression

Regular expressions are constructed in the same way that arithmetic expressions are created. That is, small expressions are combined using a variety of metacharacters and operators to create larger expressions.

You construct a regular expression by putting the various components of the expression pattern between a pair of delimiters.
For JScript, the delimiters are a pair of forward slash (/) characters. For example:

/expression/

For VBScript, a pair of quotation marks ("") delimits the regular expression. For example:

"expression"

In both of the examples shown above, the regular expression pattern is stored in the Pattern property of the RegExp object.

The components of a regular expression can be individual characters, sets of characters, ranges of characters, choices between characters, or any combination of these components.

Order of precedence

Once you have constructed a regular expression, it is evaluated much like an arithmetic expression, that is, from left to right and following an order of precedence.

The following table lists the regular expression operators in order of precedence, from highest to lowest:

Operator(s)    Description
\    Escape
(), (?:), (?=), []    Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}    Quantifiers
^, $, \anymetacharacter    Anchors and sequences
|    Alternation
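
The precedence order above can be checked with a short JavaScript sketch. The sample strings are arbitrary assumptions picked for illustration. It shows that a quantifier binds more tightly than the sequence of characters around it, and that alternation binds most loosely of all.

```javascript
// A quantifier applies only to the item immediately before it.
const plusOnChar = "ababab".match(/ab+/)[0];    // + applies to 'b' only
const plusOnGroup = "ababab".match(/(ab)+/)[0]; // parentheses make + apply to 'ab'

// Alternation has the lowest precedence, so each anchor belongs to one branch.
const loose = /^cat|dog$/.test("catalog");   // branch '^cat' matches
const tight = /^(cat|dog)$/.test("catalog"); // whole string must be 'cat' or 'dog'

console.log(plusOnChar);  // ab
console.log(plusOnGroup); // ababab
console.log(loose);       // true
console.log(tight);       // false
```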

Ordinary characters

Ordinary characters consist of all printing and non-printing characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase alphabetic characters, all digits, all punctuation marks, and some symbols.

The simplest form of a regular expression is a single ordinary character that matches itself in the searched string. For example, the single-character pattern 'A' matches the letter 'A' wherever it appears in the searched string. Here are some examples of single-character regular expression patterns:

/a/

/7/

/M/

The equivalent VBScript single-character regular expressions are:

"a"

"7"

"M"

You can combine a number of single characters to form a larger expression. For example, the following JScript regular expression is nothing more than an expression created by combining the single-character expressions 'a', '7', and 'M'.

/a7M/

The equivalent VBScript expression is:

"a7M"

Notice that there is no concatenation operator. All you have to do is put one character after another.

Special characters

A number of metacharacters require special treatment when you try to match them. To match these special characters, you must first escape them, that is, precede them with a backslash character (\). The following table shows those special characters and their meanings:

Special character    Description
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'. To match the $ character itself, use \$.
( )    Mark the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*    Matches the preceding subexpression zero or more times. To match the * character, use \*.
+    Matches the preceding subexpression one or more times. To match the + character, use \+.
.    Matches any single character except the newline character \n. To match ., use \.
[    Marks the beginning of a bracket expression. To match [, use \[.
?    Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n'. '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^    Matches the position at the beginning of the input string, except when used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{    Marks the beginning of a quantifier expression. To match {, use \{.
|    Indicates a choice between two items. To match |, use \|.

Non-printing characters

There are a number of useful non-printing characters that must be used occasionally. The following table shows the escape sequences used to represent those non-printing characters:

Character    Meaning
\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If not, c is treated as a literal 'c' character.
\f    Matches a form-feed character. Equivalent to \x0c and \cL.
\n    Matches a newline character. Equivalent to \x0a and \cJ.
\r    Matches a carriage return character. Equivalent to \x0d and \cM.
\s    Matches any white space character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S    Matches any non-white space character. Equivalent to [^ \f\n\r\t\v].
\t    Matches a tab character. Equivalent to \x09 and \cI.
\v    Matches a vertical tab character. Equivalent to \x0b and \cK.

Character matching

The period (.) matches any single printing or non-printing character in a string, except a newline character (\n). The following JScript regular expression matches 'aac', 'abc', 'acc', 'adc', and so on, as well as 'a1c', 'a2c', 'a-c', and 'a#c':

/a.c/

The equivalent VBScript regular expression is:

"a.c"

If you are trying to match a string containing a file name, where a period (.) is part of the input string, precede the period in the regular expression with a backslash (\) character. For example, the following JScript regular expression matches 'filename.ext':

/filename\.ext/
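
The difference between a bare period and an escaped period can be checked in JavaScript. This is a small sketch; the test strings are assumptions chosen for illustration.

```javascript
// A bare period matches any single character except a newline.
const anyChar = /a.c/;
const samples = ["aac", "a1c", "a-c", "a#c", "ac", "a\nc"];
const results = samples.map((s) => anyChar.test(s));

// An escaped period matches only a literal period.
const literalDot = /filename\.ext/;
const unescaped = /filename.ext/;

console.log(results);                          // [ true, true, true, true, false, false ]
console.log(literalDot.test("filename.ext"));  // true
console.log(literalDot.test("filenameXext"));  // false
console.log(unescaped.test("filenameXext"));   // true: '.' accepted the 'X'
```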

For VBScript, the equivalent expression is:

"filename\.ext"

These expressions are still quite limited. They only let you match any single character. Often it is useful to match specified characters from a list. For example, if your input text contains chapter titles numbered Chapter 1, Chapter 2, and so on, you might want to find those chapter titles.

Bracket expressions

You can create a list of characters to match by placing one or more individual characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. Within brackets, as anywhere else, ordinary characters represent themselves, that is, they match an occurrence of themselves in the input text. Most special characters lose their meaning when they occur inside a bracket expression. Here are some exceptions:

· The ']' character ends a list if it is not the first item. To match the ']' character in a list, place it first, immediately following the opening '['.

· The '\' character is still the escape character. To match the '\' character, use '\\'.

Characters enclosed in a bracket expression match only a single character, at the position in the regular expression where the bracket expression appears. The following JScript regular expression matches 'Chapter 1', 'Chapter 2', 'Chapter 3', 'Chapter 4', and 'Chapter 5':

/Chapter [12345]/

To match the same chapter titles in VBScript, use the following expression:

"Chapter [12345]"

Notice that the word 'Chapter' and the space that follows are fixed in position relative to the characters within the brackets. The bracket expression, then, only specifies the set of characters that matches the single character position immediately following the word 'Chapter' and a space. That is the ninth character position. If you want to express the characters as a range instead of listing them, separate the beginning and ending characters of the range with a hyphen (-). The character value of the individual characters determines their relative order within a range. The following JScript regular expression contains a range expression that is equivalent to the bracketed list shown above.

/Chapter [1-5]/

The expression with the same function in VBScript is:

"Chapter [1-5]"

When a range is specified in this manner, both the starting and ending values are included in the range. Note that the starting value must precede the ending value in Unicode sort order. If you want to include the hyphen character in a bracket expression, you must do one of the following:

· Escape it with a backslash: [\-]

· Put the hyphen at the beginning or the end of the bracketed list. The following expressions match all lowercase letters and the hyphen: [-a-z] and [a-z-]

· Create a range where the beginning character value is lower than the hyphen character and the ending character value is equal to or greater than the hyphen. Both of the following regular expressions satisfy this requirement: [!--] and [!-~]

Similarly, by placing a caret (^) at the beginning of the list, you can find all the characters that are not in the list or range. If the caret appears in any other position within the list, it matches itself and has no special meaning. The following JScript regular expression matches chapter titles whose chapter number is greater than 5:

/Chapter [^12345]/

For VBScript use:

"Chapter [^12345]"

In the examples shown above, the expression matches any character in the ninth position except 1, 2, 3, 4, or 5. So, for example, 'Chapter 7' is a match, and so is 'Chapter 9'. The expressions above can also be written using the hyphen character (-). For JScript:

/Chapter [^1-5]/

or, for VBScript:

"Chapter [^1-5]"

A typical use of a bracket expression is to specify a match of any uppercase or lowercase alphabetic character or any digit. The following JScript expression specifies such a match:

/[A-Za-z0-9]/
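
The bracket expressions discussed in this section can be tried out in JavaScript. The sketch below uses a made-up list of chapter titles (an assumption for illustration) to compare a range, a negated range, a literal hyphen, and the alphanumeric set.

```javascript
// Hypothetical chapter titles used only for this demonstration.
const titles = ["Chapter 1", "Chapter 5", "Chapter 7", "Chapter 9"];

// A range and its negation select complementary titles.
const low = titles.filter((t) => /Chapter [1-5]/.test(t));
const high = titles.filter((t) => /Chapter [^1-5]/.test(t));

// A hyphen placed last in the list is a literal hyphen.
const hyphenMatches = /[a-z-]/.test("-");

// Every alphanumeric character in a string, one match per character.
const alnum = "a_b-9".match(/[A-Za-z0-9]/g);

console.log(low);           // [ 'Chapter 1', 'Chapter 5' ]
console.log(high);          // [ 'Chapter 7', 'Chapter 9' ]
console.log(hyphenMatches); // true
console.log(alnum);         // [ 'a', 'b', '9' ]
```

Note that the negated range accepts any character other than 1 through 5 in the ninth position, not only digits, so a title such as 'Chapter x' would also match.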

The equivalent VBScript expression is:

"[A-Za-z0-9]"

Quantifiers

Sometimes you do not know how many characters there are to match. To accommodate that kind of uncertainty, regular expressions support the concept of quantifiers. Quantifiers let you specify how many times a given component of the regular expression must occur for a match to be found. The following table describes the quantifiers and their meanings:

Character    Description
*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or the "do" in "does". ? is equivalent to {0,1}.
{n}    n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}    m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.

With a large input document, chapter numbers can easily exceed nine, so you need a way to handle two-digit or three-digit chapter numbers. Quantifiers give you that capability. The following JScript regular expression matches chapter titles with any number of digits:

/Chapter [1-9][0-9]*/

The following VBScript regular expression performs the same match:

"Chapter [1-9][0-9]*"

Notice that the quantifier appears after the range expression. Therefore, it applies to the entire range expression, which in this case specifies only the digits 0 through 9. The '+' quantifier is not used here because a digit is not necessarily required in the second or subsequent position. The '?' character is not used either, because it would limit chapter numbers to two digits. At least one digit is required after 'Chapter' and the space character. If you know that your chapter numbers are limited to 99, you can use the following JScript expression to specify at least one, but not more than two, digits.

/Chapter [0-9]{1,2}/

For VBScript, use the following regular expression:

"Chapter [0-9]{1,2}"

The disadvantage of the expression shown above is that if there is a chapter number greater than 99, it still matches only the first two digits. Another disadvantage is that somebody could create a Chapter 0 and it would match. A better JScript expression for matching at most two digits is the following:

/Chapter [1-9][0-9]?/

or

/Chapter [1-9][0-9]{0,1}/
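
The trade-offs between these quantifier choices are easy to see side by side. The following JavaScript sketch runs the three JScript patterns from this section against two sample titles; the titles themselves are assumptions picked for illustration.

```javascript
const anyDigits = /Chapter [1-9][0-9]*/;  // one or more digits, no leading zero
const upTo99 = /Chapter [1-9][0-9]?/;     // one or two digits, no leading zero
const looseTwo = /Chapter [0-9]{1,2}/;    // one or two digits, zero allowed

// A three-digit chapter number: only the * version consumes all digits.
const full = "Chapter 123".match(anyDigits)[0];
const clipped = "Chapter 123".match(upTo99)[0];

// 'Chapter 0' is accepted by the loose pattern and rejected by the better one.
const zeroLoose = looseTwo.test("Chapter 0");
const zeroStrict = upTo99.test("Chapter 0");

console.log(full);       // Chapter 123
console.log(clipped);    // Chapter 12
console.log(zeroLoose);  // true
console.log(zeroStrict); // false
```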

For VBScript, the following expressions are equivalent to the above:

"Chapter [1-9][0-9]?"

or

"Chapter [1-9][0-9]{0,1}"

The '*', '+', and '?' quantifiers are all referred to as greedy, that is, they match as much text as possible. Sometimes that is not what you want to happen. Sometimes you just want a minimal match. For example, you may be searching an HTML document for a chapter title enclosed in an H1 tag. That text might appear in the document as:

<H1>Chapter 1 - Introduction to Regular Expressions</H1>

The following expression matches everything from the opening less-than symbol (<) to the greater-than symbol (>) that closes the ending H1 tag.

/<.*>/

The VBScript regular expression is:

"<.*>"

If all you really wanted to match was the opening H1 tag, the following non-greedy expression matches only <H1>.

/<.*?>/
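
The greedy and non-greedy behavior described here can be compared directly in JavaScript. This sketch assumes the H1 heading from the example above as its input string.

```javascript
// The H1 heading from the example above.
const html = "<H1>Chapter 1 - Introduction to Regular Expressions</H1>";

// Greedy: '.*' runs to the last '>' it can reach.
const greedy = html.match(/<.*>/)[0];

// Non-greedy: '.*?' stops at the first '>' it can reach.
const lazy = html.match(/<.*?>/)[0];

console.log(greedy === html); // true: the whole line was matched
console.log(lazy);            // <H1>
```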

or, for VBScript:

"<.*?>"

By placing the '?' after a '*', '+', or '?' quantifier, you change the expression from a greedy to a non-greedy, or minimal, match.

Anchors

So far, the examples have been concerned only with finding chapter titles wherever they occur. Any occurrence of the string 'Chapter' followed by a space and a number could be an actual chapter title, or it could be a cross-reference to another chapter. Since true chapter titles always appear at the beginning of a line, you need a way to find only the titles and not the cross-references. Anchors provide that capability. Anchors let you fix a regular expression to either the beginning or the end of a line. They also let you create regular expressions that match only within a word, or only at the beginning or end of a word. The following table lists the regular expression anchors and their meanings:

Character    Description
^    Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position following '\n' or '\r'.
$    Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position preceding '\n' or '\r'.
\b    Matches a word boundary, that is, the position between a word and a space.
\B    Matches a non-word boundary.

You cannot use a quantifier with an anchor. Since there cannot be more than one position immediately before or after a newline or a word boundary, expressions such as '^*' are not permitted.

To match text at the beginning of a line, use the '^' character at the beginning of the regular expression. Do not confuse this use of '^' with its use within a bracket expression. The two are different.

To match text at the end of a line, use the '$' character at the end of the regular expression.

To use anchors when searching for chapter titles, the following JScript regular expression matches a chapter title with up to two digits that occurs at the beginning of a line:

/^Chapter [1-9][0-9]{0,1}/

The regular expression with the same function in VBScript is:

"^Chapter [1-9][0-9]{0,1}"

Not only does a true chapter title occur at the beginning of a line, it is also the only thing on the line, so it must also be at the end of the line. The following expression ensures that the match you specify matches only chapter titles and not cross-references. It does so by creating a regular expression that matches only at the beginning and end of a line.

/^Chapter [1-9][0-9]{0,1}$/
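
The effect of the two anchors can be seen on a small multi-line input. The sketch below is written in modern JavaScript and assumes that its m flag plays the role of the Multiline property described above; the sample text is invented for illustration.

```javascript
// Hypothetical document: two true headings, one cross-reference,
// and one line that merely starts with a chapter reference.
const text = [
  "Chapter 1",
  "See Chapter 2 for details.",
  "Chapter 12",
  "Chapter 3 continued",
].join("\n");

// Anchored at both ends: only lines that consist of the heading alone.
const headings = text.match(/^Chapter [1-9][0-9]{0,1}$/gm);

// Anchored at the start only: also picks up 'Chapter 3 continued'.
const startOnly = text.match(/^Chapter [1-9][0-9]{0,1}/gm);

console.log(headings);  // [ 'Chapter 1', 'Chapter 12' ]
console.log(startOnly); // [ 'Chapter 1', 'Chapter 12', 'Chapter 3' ]
```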

For VBScript use:

"^Chapter [1-9][0-9]{0,1}$"

Matching word boundaries is a little different, but it adds a very important capability to regular expressions. A word boundary is the position between a word and a space. A non-word boundary is any other position. The following JScript expression matches the first three characters of the word 'Chapter' because they appear following a word boundary:

/\bCha/

For VBScript:

"\bCha"

The position of the '\b' operator is critical here. If it is positioned at the beginning of the string to be matched, it looks for the match at the beginning of the word; if it is positioned at the end of the string, it looks for the match at the end of the word. For example, the following expressions match 'ter' in the word 'Chapter' because it appears before a word boundary:

/ter\b/
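
A short JavaScript check of the '\b' anchor at both ends of a word follows. The index values come from the positions of the matches inside the seven-letter word 'Chapter'; the word 'aptitude' is an assumed second sample.

```javascript
// '\b' before the pattern: the match must start at a word boundary.
const startMatch = "Chapter".match(/\bCha/);

// '\b' after the pattern: the match must end at a word boundary.
const endMatch = "Chapter".match(/ter\b/);

// 'apt' sits inside 'Chapter', so no boundary precedes it there.
const insideWord = /\bapt/.test("Chapter");
const startOfWord = /\bapt/.test("aptitude");

console.log(startMatch[0], startMatch.index); // Cha 0
console.log(endMatch[0], endMatch.index);     // ter 4
console.log(insideWord);                      // false
console.log(startOfWord);                     // true
```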

and

"ter\b"

The following expressions match 'apt' as it occurs in 'Chapter', but not as it occurs in 'aptitude':

/\Bapt/

and

"\Bapt"

This is because 'apt' in the word 'Chapter' occurs at a non-word boundary, whereas in the word 'aptitude' it occurs at a word boundary. The position of the non-word boundary operator is not important, because the match is not relative to the beginning or end of a word.

Alternation and grouping

Alternation uses the '|' character to allow a choice between two or more alternatives. By expanding the chapter title regular expression, you can make it cover more than just chapter titles. However, this is not as straightforward as you might think. When alternation is used, the largest possible expression on either side of the '|' character is matched. You might think that the following JScript and VBScript expressions match either 'Chapter' or 'Section' followed by one or two digits, occupying a whole line:

/^Chapter|Section [1-9][0-9]{0,1}$/

"^Chapter|Section [1-9][0-9]{0,1}$"

Unfortunately, the regular expressions shown above match either the word 'Chapter' at the beginning of a line, or 'Section' and whatever digits follow it at the end of a line. If the input string is 'Chapter 22', the expression matches only the word 'Chapter'. If the input string is 'Section 22', the expression matches 'Section 22'. That is not the intent here, so there must be a way to make the regular expression respond to what you are trying to do, and there is.

You can use parentheses to limit the scope of the alternation, that is, to make sure it applies only to the two words 'Chapter' and 'Section'. However, parentheses are tricky as well, because they are also used to create subexpressions, which are covered later. By taking the regular expressions shown above and adding parentheses in the appropriate places, you can make the regular expression match either 'Chapter 1' or 'Section 3'.
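
The alternation pitfall described above can be reproduced in JavaScript. This sketch compares the ungrouped pattern with a version that wraps the two words in parentheses, using the two input strings mentioned in the text.

```javascript
// Ungrouped: reads as '^Chapter' OR 'Section [1-9][0-9]{0,1}$'.
const ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;

// Grouped: the choice is limited to the two words.
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

const a = "Chapter 22".match(ungrouped)[0]; // only the first branch applies
const b = "Section 22".match(ungrouped)[0]; // the second branch matches it all
const c = "Chapter 22".match(grouped);      // full match plus one captured word

console.log(a);    // Chapter
console.log(b);    // Section 22
console.log(c[0]); // Chapter 22
console.log(c[1]); // Chapter
```

The last line shows the side effect of grouping with plain parentheses: the chosen word is captured as the first submatch.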
The following regular expressions use parentheses to group 'Chapter' and 'Section' so that the expression works properly. For JScript:

/^(Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(Chapter|Section) [1-9][0-9]{0,1}$"

These expressions work properly, except that they produce an interesting by-product. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes either of the two matching words to be captured for future use. Since there is only one set of parentheses in the expression shown above, there is only one captured submatch. This submatch can be referred to using the SubMatches collection in VBScript or the $1-$9 properties of the RegExp object in JScript.

Sometimes capturing a submatch is desirable, and sometimes it is not. In the examples shown above, all you really want is to use the parentheses to group a choice between the words 'Chapter' and 'Section'. You do not necessarily want to refer to that match later. In fact, unless you really need to capture submatches, do not use them. Your regular expressions will be more efficient because they will not spend time and memory storing those submatches.

You can place '?:' before the regular expression pattern inside the parentheses to prevent the match from being saved for later use. The following modification of the regular expressions shown above provides the same capability without saving the submatch.

For JScript: / ^ (?: chapter | section) [1-9] [0-9] {0,1} $ / pair VBScript: "^ (?: chapter | section) [1-9] [0-9 ] {0,1} $ "In addition to '?:' Metamorphic, there are two non-captured metades for matching. One is positive forecast, with? = Indicated that the search string is matched in any position where the regular expression mode in the parentheses is matched. One is a negative forecast, with '?!', Indicating that the search string is matched without matching the regular expression mode. For example, assume that there is a document containing a reference to Windows 3.1, Windows 95, Windows 98, and Windows NT. Further assume that this document needs to be updated, the method is to find all references to Windows 95, Windows 98, and Windows NT and change these references to Windows 2000. You can use the following JScript regular expression, this is a forward review to match Windows 95, Windows 98, and Windows NT: / Windows (? = 95 | 98 | NT) / under VBScript to do the same match Expression: "Windows (? = 95 | 98 | NT)" After finding a match, the text that matches immediately (without including the character used in the expedition) starts a search next time. For example, if the expression shown above matches the 'Windows 98', will continue to find from 'Windows' instead of '98'. Take a regular expression of a regular expression that is the ability to store some part of the mode of success in the future. Recall that adding parentheses on both sides of a regular expression mode or partial mode will cause this partial expression to be stored in a temporary buffer. You can use non-capture metamorphic characters '?:', '? =', Or '?!' To ignore saving for this part of the regular expression. Each sub-match captured is stored in the contents encountered from left to right in the regular expression mode. The buffer number of the storage sub-match starts from 1, continuous numbers up to the maximum 99 sub-expression. 
Each buffer can be accessed using '\n', where n is a one- or two-digit decimal number identifying a particular buffer. One of the simplest and most useful applications of backreferences is locating two identical adjacent words in text. Consider the following sentence:

Is is the cost of of gasoline going up up?

As written, the sentence above clearly contains several repeated words. It would be convenient to fix the sentence without having to search for repetitions of each individual word. The following JScript regular expression does this using a single subexpression.

/\b([a-z]+) \1\b/gi

The equivalent VBScript expression is:

"\b([a-z]+) \1\b"

In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word just matched by the parenthesized expression. '\1' specifies the first submatch. The word boundary metacharacters ensure that only separate words are detected. Without them, a phrase such as "is issued" or "this is" would be incorrectly identified by this expression.

In the JScript expression, the global flag ('g') following the regular expression indicates that the expression is applied to find as many matches as possible in the input string. Case insensitivity is specified by the ignore-case flag ('i') at the end of the expression. The multiline flag ('m') specifies that potential matches may occur on either side of a newline character. For VBScript, the flags cannot be set in the expression; they must be set explicitly using properties of the RegExp object.

Using the regular expression shown above, the following JScript code uses the submatch information to replace each occurrence of two consecutive identical words in a string of text with a single occurrence of that word:

var ss = "Is is the cost of of gasoline going up up?.\n";
var re = /\b([a-z]+) \1\b/gim; // Create the regular expression pattern.
var rv = ss.replace(re, "$1"); // Replace each pair of words with a single word.
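To see why the word boundary metacharacters matter, the pattern can be compared with a variant that omits them. The following is a minimal sketch rather than code from the original article; the helper `findRepeats`, the variable names, and the second sample string are assumptions chosen for illustration.

```javascript
// Pattern from the text, and a hypothetical variant without \b.
var withBoundaries = /\b([a-z]+) \1\b/gi;
var withoutBoundaries = /([a-z]+) \1/gi;

// Hypothetical helper: return every piece of text the pattern matches.
function findRepeats(re, text) {
  var found = [];
  var m;
  re.lastIndex = 0; // restart a global regex from the beginning
  while ((m = re.exec(text)) !== null) {
    found.push(m[0]);
  }
  return found;
}

var sentence = "Is is the cost of of gasoline going up up?";
var tricky = "This is issued."; // contains no repeated word

// With boundaries, only genuine repeated words are reported.
var repeats = findRepeats(withBoundaries, sentence);
var cleanOnTricky = findRepeats(withBoundaries, tricky);

// Without boundaries, the tail of "This" followed by "is" looks like a repeat.
var falsePositives = findRepeats(withoutBoundaries, tricky);

// Collapsing each repeated pair to a single word, as in the text.
var fixed = sentence.replace(withBoundaries, "$1");

console.log(repeats);        // the three repeated pairs
console.log(cleanOnTricky);  // no matches
console.log(falsePositives); // one spurious match
console.log(fixed);
```

Note that the backreference follows the 'i' flag, so "Is is" counts as a repeat even though the two words differ in case.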

The closest equivalent VBScript code is as follows:

Dim ss, re, rv
ss = "Is is the cost of of gasoline going up up?" & vbNewLine
Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
rv = re.Replace(ss, "$1")

Note that in the VBScript code, the global, ignore-case, and multiline flags are set using the appropriate properties of the RegExp object. The $1 in the Replace method refers to the first saved submatch. If there is more than one submatch, refer to them consecutively with $2, $3, and so on.

Another use of backreferences is to break a Uniform Resource Identifier (URI) into its component parts. Suppose you want to break the following URI down into the protocol (ftp, http, etc.), the domain address, and the page/path:

http://msdn.microsoft.com:80/scripting/default.htm

The following regular expressions can provide this feature. For JScript:

/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/

For VBScript:

"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"

The first parenthesized subexpression captures the protocol part of the web address. It matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address. It matches one or more characters that are not '/' or ':'. The third parenthesized subexpression captures the port number, if one is specified. It matches a colon followed by zero or more digits. Finally, the fourth parenthesized subexpression captures the path and/or page information specified by the web address. It matches zero or more characters other than '#' or a space.

After the regular expression is applied to the URI shown above, the submatches contain the following:

RegExp.$1 contains "http"
RegExp.$2 contains "msdn.microsoft.com"
RegExp.$3 contains ":80"
RegExp.$4 contains "/scripting/default.htm"
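The decomposition above can be reproduced with a short script. This sketch reads the submatches from the array returned by `exec` instead of the `RegExp.$1`-`RegExp.$4` properties; the helper `splitUri`, its field names, and the second sample address are assumptions introduced for illustration only.

```javascript
// Pattern from the text: protocol, domain, optional port, path.
var uriPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

// Hypothetical helper: return the four sub-matches as named fields,
// or null when the input does not look like a URI of this shape.
function splitUri(uri) {
  var m = uriPattern.exec(uri);
  if (m === null) {
    return null;
  }
  return {
    protocol: m[1],
    domain: m[2],
    port: m[3], // undefined when no port is given
    path: m[4]
  };
}

var parts = splitUri("http://msdn.microsoft.com:80/scripting/default.htm");
console.log(parts.protocol); // "http"
console.log(parts.domain);   // "msdn.microsoft.com"
console.log(parts.port);     // ":80"
console.log(parts.path);     // "/scripting/default.htm"

// Hypothetical address without a port: the optional third group is skipped.
var noPort = splitUri("ftp://example.com/pub/file.txt");
console.log(noPort.port);    // undefined
```

Because the third group is optional, code that uses the port should check for `undefined` before reading it.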

Original address:

Http://www.yesky.com/185/1932685.shtml

Source: SOULOGIC STUDIO
