Introduction to Regular Expressions - compiled by Emerald Green College - Green Institute

xiaoxiao2021-03-05  25

Table of Contents:

-------------------------------------------------- ----------------------------------

1. Regular expression

2. Early origin

3. Use regular expressions

4. Regular expression syntax

5. Build regular expressions

6. Order of precedence

7. Ordinary characters

8. Special characters

9. Non-printing characters

10. Character matching

11. Quantifiers

12. Anchors

13. Alternation and grouping

14. Backreferences

-------------------------------------------------- ----------------------------------

Compiled by: Emerald Green College - Green Institute

Regular expression

1. Regular expression

If you have never used regular expressions, the term and the concept may be unfamiliar. However, they are not as novel as you might imagine.

Recall how you find files on a hard disk. You have almost certainly used the ? and * characters to help find the files you were looking for. The ? character matches a single character in a file name, while * matches zero or more characters. A pattern such as 'data?.dat' finds the following files:

data1.dat

data2.dat

datax.dat

dataN.dat

If you use the * character instead of the ? character, the number of files found expands. 'data*.dat' matches all of the following file names:

data.dat

data1.dat

data2.dat

data12.dat

datax.dat

dataXYZ.dat

Although this way of searching for files is certainly useful, it is also very limited. The limited capabilities of the ? and * wildcards give you an idea of what regular expressions can do, but regular expressions are far more powerful and flexible.
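The wildcard behavior described above can be tried out with a short Python sketch. It uses the standard fnmatch module as a stand-in for a file search dialog; the file names are the ones listed above, and the variable names are only illustrative.

```python
import fnmatch

names = ["data.dat", "data1.dat", "data2.dat",
         "data12.dat", "datax.dat", "dataXYZ.dat"]

# '?' stands for exactly one character.
one_char = fnmatch.filter(names, "data?.dat")

# '*' stands for any run of characters, including an empty one.
any_run = fnmatch.filter(names, "data*.dat")

print(one_char)
print(any_run)
```

Only the names with a single character between 'data' and '.dat' survive the first filter, while the second keeps the whole list.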

2. Early origin

The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Warren McCulloch and Walter Pitts, two neurophysiologists, developed a mathematical way of describing these neural networks.

In 1956, an American mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets" that introduced the concept of regular expressions. Regular expressions were the expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".

Subsequently, this work was found to be applicable to some early research on computational search algorithms by Ken Thompson, the principal inventor of UNIX. The first practical application of regular expressions was the QED editor in UNIX.

As they say, the rest is history. Regular expressions have been an important part of text-based editors and search tools ever since.

3. Use regular expressions

In a typical search-and-replace operation, you must supply the exact text to be found. This technique may be adequate for simple search-and-replace tasks in static text, but because it lacks flexibility, searching dynamic text with it is difficult, if not impossible.

Using regular expressions, you can:

1. Test for a pattern within a string. For example, you can test an input string to see whether a telephone number pattern or a credit card number pattern occurs in it. This is called data validation.

2. Replace text. You can use a regular expression to identify particular text in a document, then either delete it or replace it with other text.

3. Extract a substring from a string based on a pattern match. This can be used to find specific text in a document or an input field.
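The three uses listed above can be sketched in Python with the standard re module. The phone format here (three digits, a hyphen, four digits) and the function names are assumptions made for illustration, not something the article specifies.

```python
import re

# Hypothetical phone format for this sketch: 3 digits, a hyphen, 4 digits.
PHONE = re.compile(r"\d{3}-\d{4}")

def is_phone(text):
    """1. Test: does the whole string fit the pattern?"""
    return PHONE.fullmatch(text) is not None

def mask_phones(text):
    """2. Replace: substitute every occurrence with a placeholder."""
    return PHONE.sub("XXX-XXXX", text)

def first_phone(text):
    """3. Extract: return the first matching substring, or None."""
    m = PHONE.search(text)
    return m.group() if m else None

print(is_phone("555-1234"))
print(mask_phones("call 555-1234 or 555-9876"))
print(first_phone("call 555-1234 now"))
```

One compiled pattern serves all three tasks; only the method called on it changes.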

For example, if you need to search an entire Web site to remove some outdated material and replace some HTML formatting tags, you can use a regular expression to test each file and see whether the material or the HTML formatting tags you are looking for occur in that file. That way, you can narrow the set of affected files down to those that contain the material to be removed or changed. You can then use a regular expression to remove the outdated material, and finally use regular expressions again to find and replace the tags that need replacing.

Another example of where regular expressions are useful is in a language that is not known for its string-handling ability. VBScript, a subset of Visual Basic, has a rich set of string-handling functions. JScript, like C, does not. Regular expressions significantly improve string handling in JScript. However, regular expressions may also be more efficient to use in VBScript, because they allow multiple string operations to be performed in a single expression.

4. Regular expression syntax

A regular expression is a text pattern composed of ordinary characters (such as the letters a through z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. The regular expression acts as a template, matching a character pattern against the string being searched.

Here are some examples of regular expressions you might encounter:

JScript               VBScript              Matches

/^[ \t]*$/            "^[ \t]*$"            Matches a blank line.

/\d{2}-\d{5}/         "\d{2}-\d{5}"         Validates an ID number consisting of 2 digits, a hyphen, and 5 digits.

/<(.*)>.*<\/\1>/      "<(.*)>.*<\/\1>"      Matches an HTML tag.
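The three example patterns can be exercised with a small Python sketch. Python's re module is used here as a stand-in for the JScript and VBScript engines, on the assumption that these particular constructs behave the same way; the constant and function names are illustrative only. Python needs no delimiters, so the slash in the closing tag is not escaped.

```python
import re

BLANK_LINE = re.compile(r"^[ \t]*$")     # nothing but spaces and tabs
ID_NUMBER = re.compile(r"\d{2}-\d{5}")   # 2 digits, a hyphen, 5 digits
HTML_TAG = re.compile(r"<(.*)>.*</\1>")  # \1 repeats what group 1 captured

def tag_name(text):
    """Return the name of the first paired tag in text, or None."""
    m = HTML_TAG.search(text)
    return m.group(1) if m else None

print(BLANK_LINE.fullmatch(" \t ") is not None)
print(ID_NUMBER.fullmatch("12-34567") is not None)
print(tag_name("<b>bold</b>"))
```

The backreference is what forces the closing tag to carry the same name as the opening tag, so a mismatched pair such as '<b>bold</i>' is rejected.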

The following table contains the complete list of metacharacters and their behavior in the context of regular expressions:

Character    Description

\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".

^    Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.

$    Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.

*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.

+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or the "do" in "does". ? is equivalent to {0,1}.

{n}    n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".

{n,}    n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "fooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.

{n,m}    m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.

?    When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.

.    Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.

(pattern)    Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match parentheses characters, use '\(' or '\)'.

(?:pattern)    Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.

(?=pattern)    Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not captured for possible later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.

(?!pattern)    Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not captured for possible later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but does not match "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately following the last match, not after the characters that made up the lookahead.

x|y    Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".

[xyz]    A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".

[^xyz]    A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".

[a-z]    A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.

[^a-z]    A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.

\b    Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".

\B    Matches a nonword boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".

\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal 'c' character.

\d    Matches a digit character. Equivalent to [0-9].

\D    Matches a nondigit character. Equivalent to [^0-9].

\f    Matches a form-feed character. Equivalent to \x0c and \cL.

\n    Matches a newline character. Equivalent to \x0a and \cJ.

\r    Matches a carriage return character. Equivalent to \x0d and \cM.

\s    Matches any white space character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].

\S    Matches any non-white space character. Equivalent to [^ \f\n\r\t\v].

\t    Matches a tab character. Equivalent to \x09 and \cI.

\v    Matches a vertical tab character. Equivalent to \x0b and \cK.

\w    Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.

\W    Matches any nonword character. Equivalent to '[^A-Za-z0-9_]'.

\xn    Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". This allows ASCII codes to be used in regular expressions.

\num    Matches num, where num is a positive integer. A reference back to captured matches. For example, '(.)\1' matches two consecutive identical characters.

\n    Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).

\nm    Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds, \nm matches the octal escape value nm when n and m are octal digits (0-7).

\nml    Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).

\un    Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
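A few rows of the table can be checked with a short Python sketch. It assumes that Python's re module treats these constructs the same way as the engines described here, which holds for the cases shown but not for every row (to my knowledge \cx, for instance, has no Python equivalent). The helper name first is hypothetical.

```python
import re

def first(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

lazy = first(r"o+?", "oooo")         # non-greedy: as few o's as possible
greedy = first(r"o+", "oooo")        # greedy: as many o's as possible
exact = first(r"o{2}", "food")       # exactly two o's in a row
no_match = first(r"o{2}", "Bob")     # only one o, so nothing matches
in_set = first(r"[abc]", "plain")    # first character from the set
not_in_set = first(r"[^abc]", "plain")  # first character outside the set
doubled = first(r"(.)\1", "hello")   # backreference: same character twice
hex_a = first(r"\x41", "ABC")        # hexadecimal escape for "A"
copyright_sign = first(r"\u00A9", "\u00a9 notice")  # Unicode escape

print(lazy, greedy, exact, no_match, in_set, not_in_set, doubled, hex_a)
```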

5. Build regular expressions

Regular expressions are constructed in the same way that arithmetic expressions are created. That is, small expressions are combined using a variety of metacharacters and operators to create larger expressions.

You construct a regular expression by putting the various components of the expression pattern between a pair of delimiters. For JScript, the delimiters are a pair of forward slash (/) characters. For example:

/expression/

For VBScript, a pair of quotation marks ("") delimits the regular expression. For example:

"expression"

In both of the examples shown above, the regular expression pattern is stored in the Pattern property of the RegExp object.

<< --------------------------------------------------------------------------------------------------------------------------------------- ------ >>

6. Order of precedence

Once you have constructed a regular expression, it is evaluated much like an arithmetic expression, that is, from left to right and following an order of precedence.

The following table lists the regular expression operators in order of precedence, from highest to lowest:

Operator(s)                    Description

\                              Escape

(), (?:), (?=), []             Parentheses and brackets

*, +, ?, {n}, {n,}, {n,m}      Quantifiers

^, $, \anymetacharacter        Anchors and sequences

|                              Alternation ("or")
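The effect of this ordering can be seen in a small Python sketch. The patterns and the helper name whole are made up for illustration; Python's re module is assumed to follow the same precedence for these operators.

```python
import re

def whole(pattern, text):
    """True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# A quantifier binds tighter than sequencing: 'ab*' is 'a' followed by 'b*'.
star_applies_to_b = whole(r"ab*", "abbb")
star_not_on_ab = whole(r"ab*", "abab")
grouped_star = whole(r"(ab)*", "abab")

# Alternation binds loosest: 'ab|cd' means 'ab' or 'cd', not 'a(b|c)d'.
alt_is_loosest = whole(r"ab|cd", "abd")
grouped_alt = whole(r"a(b|c)d", "abd")

print(star_applies_to_b, star_not_on_ab, grouped_star, alt_is_loosest, grouped_alt)
```

Parentheses sit near the top of the table, which is why adding them changes what the quantifier or the alternation applies to.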

<< --------------------------------------------------------------------------------------------------------------------------------------- ---------- >>

7. Ordinary characters

Ordinary characters consist of all printing and non-printing characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase alphabetic characters, all digits, all punctuation marks, and some symbols.

The simplest form of a regular expression is a single, ordinary character that matches itself in a searched string. For example, the single-character pattern 'A' matches the letter 'A' wherever it appears in the searched string. Here are some examples of single-character regular expression patterns:

/a/

/7/

/M/

The equivalent VBScript single-character regular expressions are:

"a"

"7"

"M"

You can combine a number of single characters to form a larger expression. For example, the following JScript regular expression is nothing more than an expression created by combining the single-character expressions 'a', '7', and 'M'.

/a7M/

The equivalent VBScript expression is:

"a7M"

Note that there is no concatenation operator. All you need to do is put one character after another.

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>

8. Special characters

There are a number of metacharacters that require special treatment when you try to match them. To match these special characters, you must first escape them, that is, precede them with a backslash character (\). The following table shows those special characters and their meanings:

Special character    Description

$    Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'. To match the $ character itself, use \$.

( )    Mark the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).

*    Matches the preceding subexpression zero or more times. To match the * character, use \*.

+    Matches the preceding subexpression one or more times. To match the + character, use \+.

.    Matches any single character except the newline character \n. To match ., use \.

[    Marks the beginning of a bracket expression. To match [, use \[.

?    Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.

\    Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n'. '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".

^    Matches the position at the beginning of the input string, except when used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.

{    Marks the beginning of a quantifier expression. To match {, use \{.

|    Indicates a choice between two items. To match |, use \|.
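A brief Python sketch of why escaping matters. The sample string is invented for illustration; re.escape is the standard-library helper that adds the backslashes automatically.

```python
import re

price = "$9.99 (sale)"

# Unescaped, '$', '.', '(' and ')' keep their special meanings, so the
# pattern does not describe the literal text.
unescaped = re.search("$9.99 (sale)", price)

# Escaped by hand, each special character matches itself.
escaped = re.search(r"\$9\.99 \(sale\)", price)

# re.escape inserts the backslashes for you.
auto = re.search(re.escape(price), price)

print(unescaped, escaped.group(), auto.group())
```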

9. Non-printing characters

There are a number of useful non-printing characters that must be used occasionally. The following table shows the escape sequences used to represent those non-printing characters:

Character    Meaning

\cx    Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If it is not, c is assumed to be a literal 'c' character.

\f    Matches a form-feed character. Equivalent to \x0c and \cL.

\n    Matches a newline character. Equivalent to \x0a and \cJ.

\r    Matches a carriage return character. Equivalent to \x0d and \cM.

\s    Matches any white space character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].

\S    Matches any non-white space character. Equivalent to [^ \f\n\r\t\v].

\t    Matches a tab character. Equivalent to \x09 and \cI.

\v    Matches a vertical tab character. Equivalent to \x0b and \cK.
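Some of the equivalences in this table can be confirmed with a Python sketch. This assumes Python's re module, which supports the \f, \n, \r, \t, \v, \xhh, \s and \S forms; as far as I know it does not accept the \cx control-character form, so that row is left out.

```python
import re

# Each check names the same character in two different ways.
tab_same = re.fullmatch(r"\x09", "\t") is not None
newline_same = re.fullmatch(r"\x0a", "\n") is not None
vtab_same = re.fullmatch(r"\x0b", "\v") is not None

# \s covers every character in the explicit class [ \f\n\r\t\v] ...
blank = " \f\n\r\t\v"
s_matches_all = re.fullmatch(r"\s+", blank) is not None
class_matches_all = re.fullmatch(r"[ \f\n\r\t\v]+", blank) is not None

# ... and \S rejects each of them.
S_matches_none = re.search(r"\S", blank) is None

print(tab_same, newline_same, vtab_same, s_matches_all, class_matches_all, S_matches_none)
```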

<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>

10. Character matching

The period (.) matches any single printing or non-printing character in a string, except a newline character (\n). The following JScript regular expression matches 'aac', 'abc', 'acc', 'adc', and so on, as well as 'a1c', 'a2c', 'a-c', and 'a#c':

/a.c/

The equivalent VBScript regular expression is:

"a.c"

If you are trying to match a string containing a file name, where a period (.) is part of the input string, you can do so by preceding the period in the regular expression with a backslash (\) character. For example, the following JScript regular expression matches 'filename.ext':

/filename\.ext/

For VBScript, the equivalent expression is:

"filename\.ext"

These expressions are still fairly limited. They only let you match any single character. In many cases, it is useful to match specific characters from a list. For example, if the input text contains chapter headings expressed numerically as Chapter 1, Chapter 2, and so on, you might want to find those chapter headings.

Bracket expressions

You can create a list of matching characters by placing one or more individual characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. Within brackets, as anywhere else, ordinary characters represent themselves, that is, they match an occurrence of themselves in the input text. Most special characters lose their meaning when they occur inside a bracket expression. Here are some exceptions:

1. The ']' character ends a list if it is not the first item. To match the ']' character in a list, place it first, immediately following the opening '['.

2. The '\' character continues to be the escape character. To match the '\' character, use '\\'.

Characters enclosed in a bracket expression match only a single character for the position in the regular expression where the bracket expression appears. The following JScript regular expression matches 'Chapter 1', 'Chapter 2', 'Chapter 3', 'Chapter 4', and 'Chapter 5':

/Chapter [12345]/

To match those same chapter headings in VBScript, use the following expression:

"Chapter [12345]"

Notice that the word 'Chapter' and the space that follows are fixed in position relative to the characters within brackets. The bracket expression is therefore used to specify only the set of characters that matches the single character position immediately following the word 'Chapter' and a space. That is the ninth character position.
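The period, the escaped period, and a simple bracket list can be compared in a Python sketch. Python's re module is used as a stand-in engine, and the helper name whole and the sample strings are illustrative assumptions.

```python
import re

def whole(pattern, text):
    """True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# '.' accepts any one character except a newline.
candidates = ["aac", "abc", "a1c", "a-c", "a#c", "ac", "a\nc"]
any_char = [s for s in candidates if whole(r"a.c", s)]

# An unescaped '.' is too permissive for a file name; '\.' is literal.
loose = whole(r"filename.ext", "filenameXext")
strict = whole(r"filename\.ext", "filenameXext")
literal = whole(r"filename\.ext", "filename.ext")

# A bracket list matches exactly one character from the list.
chapters = [n for n in range(1, 10) if whole(r"Chapter [12345]", f"Chapter {n}")]

print(any_char, loose, strict, literal, chapters)
```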

If you want to express the matching characters as a range instead of listing the characters themselves, you can separate the beginning and ending characters of the range with the hyphen (-) character. The character value of the individual characters determines their relative order within a range. The following JScript regular expression contains a range expression that is equivalent to the bracketed list shown above.

/Chapter [1-5]/

An expression for the same functionality in VBScript appears as follows:

"Chapter [1-5]"

When a range is specified in this manner, both the starting and ending values are included in the range. It is important to note that the starting value must precede the ending value in Unicode sort order.

If you want to include the hyphen character in a bracket expression, you must do one of the following:

1. Escape it with a backslash: [\-]

2. Put the hyphen at the beginning or the end of the bracketed list. The following expressions match all lowercase letters and the hyphen: [-a-z], [a-z-]

3. Create a range where the beginning character value is lower than the hyphen character and the ending character value is equal to or greater than the hyphen. Both of the following regular expressions satisfy this requirement: [!--], [!-~]
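A Python sketch of ranges and of the ways to include a literal hyphen. The [!--] form is left out because, as I understand it, recent Python versions warn about a doubled hyphen inside brackets; the other forms are assumed to behave as described above. The names used are hypothetical.

```python
import re

def whole(pattern, text):
    """True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

# A range includes both of its end points.
in_range = [n for n in range(0, 10) if whole(r"Chapter [1-5]", f"Chapter {n}")]

# Four ways of writing a bracket expression that accepts a hyphen.
hyphen_patterns = [r"[\-]", r"[-a-z]", r"[a-z-]", r"[!-~]"]
hyphen_ok = [whole(p, "-") for p in hyphen_patterns]

# The leading hyphen does not disturb the a-z range that follows it.
letters_still_ok = whole(r"[-a-z]+", "well-known")

print(in_range, hyphen_ok, letters_still_ok)
```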

You can also find all the characters not in the list or range by placing the caret (^) character at the beginning of the list. If the caret character appears in any other position within the list, it matches itself, that is, it has no special meaning. The following JScript regular expression matches chapter headings with numbers greater than 5:

/Chapter [^12345]/

For VBScript use:

"Chapter [^12345]"

In the examples shown above, the expression matches any character in the ninth position other than 1, 2, 3, 4, or 5. So 'Chapter 7' is a match, and so is 'Chapter 9'.

The expressions above can also be written using the hyphen (-) character. For JScript:

/Chapter [^1-5]/

Or, for VBScript:

"Chapter [^1-5]"

A typical use of a bracket expression is to specify matches of any uppercase or lowercase alphabetic characters or any digits. The following JScript expression specifies such a match:

/[A-Za-z0-9]/

The equivalent VBScript expression is:

"[A-Za-z0-9]"

11. Quantifiers

Sometimes you do not know how many characters there are to match. To accommodate that kind of uncertainty, regular expressions support the concept of quantifiers. These quantifiers let you specify how many times a given component of the regular expression must occur for the match to succeed.

The following table describes the various quantifiers and their meanings:

Character    Description

*    Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.

+    Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.

?    Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or the "do" in "does". ? is equivalent to {0,1}.

{n}    n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".

{n,}    n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "fooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.

{n,m}    m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.

In a large input document, chapter numbers could easily exceed nine, so you need a way to handle two-digit or three-digit chapter numbers. Quantifiers give you that capability. The following JScript regular expression matches chapter headings with any number of digits:

/Chapter [1-9][0-9]*/

The following VBScript regular expression performs the identical match:

"Chapter [1-9][0-9]*"

Notice that the quantifier appears after the range expression. Therefore, it applies to the entire range expression, which in this case specifies only the digits 0 through 9.

The '+' quantifier is not used here because a digit is not necessarily required in the second or subsequent positions. The '?' character is not used either, because it would limit chapter numbers to only two digits. At least one digit is required after 'Chapter' and the space character.

If you know that chapter numbers are limited to 99, you can use the following JScript expression to specify at least one, but not more than two, digits.

/Chapter [0-9]{1,2}/

For VBScript, use the following regular expression:

"Chapter [0-9]{1,2}"

The disadvantage of the expression shown above is that if there is a chapter number greater than 99, it still matches only the first two digits. Another disadvantage is that somebody could create a Chapter 0 and it would match. A better JScript expression for matching only two digits is the following:

/Chapter [1-9][0-9]?/

or

/Chapter [1-9][0-9]{0,1}/

For VBScript, the following expressions are equivalent:

"Chapter [1-9][0-9]?"

or

"Chapter [1-9][0-9]{0,1}"

The '*', '+', and '?' quantifiers are all referred to as greedy, that is, they match as much text as possible. Sometimes that is not what you want. Sometimes you just want a minimal match.

For example, you may be searching an HTML document for a chapter title enclosed in an H1 tag. That text appears in the document as:

<H1>Chapter 1 - Introduction to Regular Expressions</H1>

The following expression matches everything from the opening less-than symbol (<) to the greater-than symbol (>) that closes the H1 tag.

/<.*>/

The VBScript regular expression is:

"<.*>"

If all you want to match is the opening H1 tag, the following non-greedy expression matches only <H1>.

/<.*?>/

or

"<.*?>"

By placing the '?' after a '*', '+', or '?' quantifier, the expression is transformed from a greedy match into a non-greedy, or minimal, match.
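The chapter-number patterns and the greedy versus non-greedy behavior in this section can be tried out with a Python sketch. Python's re module is assumed to treat these quantifiers as described here; the helper name found and the sample strings are illustrative.

```python
import re

def found(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Chapter-number patterns from this section.
any_len = found(r"Chapter [1-9][0-9]*", "Chapter 123")
truncated = found(r"Chapter [0-9]{1,2}", "Chapter 123")     # stops after 2 digits
zero_ok = found(r"Chapter [0-9]{1,2}", "Chapter 0")         # accepts Chapter 0
zero_rejected = found(r"Chapter [1-9][0-9]?", "Chapter 0")  # first digit must be 1-9
same = found(r"Chapter [1-9][0-9]{0,1}", "Chapter 42")

# Greedy and non-greedy matching on the H1 example.
html = "<H1>Chapter 1 - Introduction to Regular Expressions</H1>"
greedy = found(r"<.*>", html)
lazy = found(r"<.*?>", html)

print(any_len, truncated, zero_ok, zero_rejected, same)
print(greedy)
print(lazy)
```

The greedy pattern runs to the last '>' in the string, while the non-greedy one stops at the first.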

12. Anchors

So far, the examples you have seen have been concerned only with finding chapter headings wherever they occur. Any occurrence of the string 'Chapter' followed by a space and a number could be an actual chapter heading, or it could be a cross-reference to another chapter. Since true chapter headings always appear at the beginning of a line, you need a way to find only the headings and not the cross-references.

Anchors provide that capability. Anchors allow you to fix a regular expression to either the beginning or the end of a line. They also allow you to create regular expressions that occur either within a word or at the beginning or end of a word. The following table contains the list of regular expression anchors and their meanings:

Character    Description

^    Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.

$    Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.

\b    Matches a word boundary, that is, the position between a word and a space.

\B    Matches a nonword boundary.

You cannot use a quantifier with an anchor. Because you cannot have more than one position immediately before or after a newline or word boundary, expressions such as '^*' are not permitted.

To match text at the beginning of a line of text, use the '^' character at the beginning of the regular expression. Do not confuse this use of '^' with its use within a bracket expression. They are quite different.

To match text at the end of a line of text, use the '$' character at the end of the regular expression.

To use anchors when searching for chapter headings, the following JScript regular expression matches a chapter heading with up to two following digits that occurs at the beginning of a line:

/^Chapter [1-9][0-9]{0,1}/

The regular expression for the same functionality in VBScript is:

"^Chapter [1-9][0-9]{0,1}"

Not only does a true chapter heading occur at the beginning of a line, it is also the only thing on the line, so it must also end the line. The following expression ensures that the match you specify matches only chapter headings and not cross-references. It does so by creating a regular expression that matches only at the beginning and end of a line of text.

/^Chapter [1-9][0-9]{0,1}$/

For VBScript use:

"^Chapter [1-9][0-9]{0,1}$"

Matching word boundaries is a little different, but it adds a very important capability to regular expressions. A word boundary is the position between a word and a space. A nonword boundary is any other position. The following JScript expression matches the first three characters of the word 'Chapter' because they appear following a word boundary:

/\bCha/

For VBScript:

"\bCha"

The position of the '\b' operator is critical here. If it is positioned at the beginning of the string to be matched, it looks for the match at the beginning of the word; if it is positioned at the end of the string, it looks for the match at the end of the word. For example, the following expressions match 'ter' in the word 'Chapter' because it appears before a word boundary:

/ter\b/

and

"ter\b"

The following expressions match 'apt' as it occurs in 'Chapter', but not as it occurs in 'aptitude':

/\Bapt/

and

"\Bapt"

That is because 'apt' occurs on a nonword boundary in the word 'Chapter' but on a word boundary in the word 'aptitude'. For the nonword boundary operator, position is not important, because the match is not relative to the beginning or end of a word.
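The anchors in this section can be sketched in Python. The sample text is invented, and re.MULTILINE is used on the assumption that it plays the role of the RegExp Multiline property, so that ^ and $ apply to each line.

```python
import re

text = ("Chapter 1\n"
        "See Chapter 2 for details\n"
        "Chapter 3 builds on this\n"
        "Chapter 22\n")
HEADING = r"Chapter [1-9][0-9]{0,1}"

anywhere = re.findall(HEADING, text)
at_line_start = re.findall("^" + HEADING, text, re.MULTILINE)
whole_line = re.findall("^" + HEADING + "$", text, re.MULTILINE)

def found(pattern, word):
    """Return the first substring of word matched by pattern, or None."""
    m = re.search(pattern, word)
    return m.group() if m else None

start_of_word = found(r"\bCha", "Chapter")   # right after a word boundary
end_of_word = found(r"ter\b", "Chapter")     # right before a word boundary
inside_word = found(r"\Bapt", "Chapter")     # 'apt' is inside the word
not_at_start = found(r"\Bapt", "aptitude")   # 'apt' starts the word: no match

print(anywhere, at_line_start, whole_line)
print(start_of_word, end_of_word, inside_word, not_at_start)
```

Only the pattern anchored at both ends excludes the cross-reference and the line that merely begins with a chapter number.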

13. Alternation and grouping

Alternation allows use of the '|' character to choose between two or more alternatives. By extending the chapter heading regular expression, you can make it cover more than just chapter headings. However, it is not as straightforward as you might think. When alternation is used, the largest possible expression on either side of the '|' character is matched. You might think that the following JScript and VBScript expressions match either 'Chapter' or 'Section' followed by one or two digits, occurring at the beginning and end of a line:

/^Chapter|Section [1-9][0-9]{0,1}$/

"^Chapter|Section [1-9][0-9]{0,1}$"

Unfortunately, what actually happens is that the regular expressions shown above match either the word 'Chapter' at the beginning of a line, or 'Section' and whatever numbers follow it at the end of a line. If the input string is 'Chapter 22', the expression shown above matches only the word 'Chapter'. If the input string is 'Section 22', the expression matches 'Section 22'. That is not the intent here, so there must be a way to make the regular expression more responsive to what you are trying to do, and there is.

You can use parentheses to limit the scope of the alternation, that is, to make sure that it applies only to the two words 'Chapter' and 'Section'. However, parentheses are tricky as well, because they are also used to create subexpressions, which are covered later in the section on subexpressions. By taking the regular expressions shown above and adding parentheses in the appropriate places, you can make the regular expression match either 'Chapter 1' or 'Section 3'.

The following regular expressions use parentheses to group 'Chapter' and 'Section' so that the expression works properly. For JScript:

/^(Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(Chapter|Section) [1-9][0-9]{0,1}$"

These expressions work correctly, but they produce an interesting by-product. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes whichever of the two words matched to be captured for future use. Since there is only one set of parentheses in the expression shown above, there is only one captured submatch. You can refer to this submatch using the Submatches collection in VBScript or the $1-$9 properties of the RegExp object in JScript.
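As a rough illustration of the grouping and its by-product, the following JavaScript sketch reads the captured submatch from the array returned by exec (index 1 corresponds to $1). The variable names are assumptions made for this example.

```javascript
// Parentheses restrict the alternation to the two words,
// and as a side effect capture whichever word matched.
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

const chapterResult = grouped.exec("Chapter 22");
console.log(chapterResult[0]); // "Chapter 22" (the whole match)
console.log(chapterResult[1]); // "Chapter"    (the captured submatch)

const sectionResult = grouped.exec("Section 3");
console.log(sectionResult[1]); // "Section"

// The number is now required for both words.
console.log(grouped.test("Chapter")); // false
```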

Sometimes capturing a submatch is desirable, and sometimes it is not. In the examples shown, all you really want to do is use the parentheses to group a choice between the words 'Chapter' and 'Section'. You do not necessarily want to refer to that match later. In fact, unless you really need to capture submatches, do not use them. Your regular expressions will be more efficient, since they will not have to spend time and memory storing those submatches.

You can use '?:' before the regular expression pattern inside the parentheses to prevent the match from being saved for possible later use. The following modification of the regular expressions shown above provides the same capability without saving the submatch. For JScript:

/^(?:Chapter|Section) [1-9][0-9]{0,1}$/

For VBScript:

"^(?:Chapter|Section) [1-9][0-9]{0,1}$"

In addition to the '?:' metacharacters, there are two other non-capturing metacharacters, used for what are called lookahead matches. A positive lookahead, specified using '?=', matches the search string at any point where a string matching the regular expression pattern in parentheses begins. A negative lookahead, specified using '?!', matches the search string at any point where a string not matching the regular expression pattern begins.
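The three non-capturing forms can be compared side by side in a short JavaScript sketch. The patterns and sample strings below are made up for illustration and are not taken from the original text.

```javascript
// (?:...) groups without capturing: the result holds only the whole match.
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;
const groupingOnly = nonCapturing.exec("Section 3");
console.log(groupingOnly.length); // 1

// (?=...) positive lookahead: the digit must follow,
// but it is not part of the matched text.
const positive = /Chapter(?= [1-9])/;
console.log(positive.exec("Chapter 7")[0]);  // "Chapter"
console.log(positive.test("Chapter seven")); // false

// (?!...) negative lookahead: matches only where the digit does NOT follow.
const negative = /Chapter(?! [1-9])/;
console.log(negative.test("Chapter seven")); // true
console.log(negative.test("Chapter 7"));     // false
```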

For example, suppose you have a document containing references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Suppose further that you need to update the document by finding all references to Windows 95, Windows 98, and Windows NT and changing those references to Windows 2000. You can use the following JScript regular expression, which is an example of a positive lookahead, to match Windows 95, Windows 98, and Windows NT:

/Windows (?=95|98|NT)/

To make the same match in VBScript, use the following expression:

"Windows (?=95|98|NT)"

Once a match is found, the search for the next match begins immediately after the matched text, not after the characters included in the lookahead. For example, if the expression shown above matches 'Windows 98', the search resumes after 'Windows', not after '98'.
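A small JavaScript sketch can make the resume position visible. It assumes the lookahead pattern is written with a literal space after 'Windows'; the sample text and variable names are illustrative.

```javascript
const text = "Windows 3.1, Windows 95, Windows 98, and Windows NT";
const lookahead = /Windows (?=95|98|NT)/g;

const found = [];
let hit;
while ((hit = lookahead.exec(text)) !== null) {
  found.push({
    matched: hit[0], // the consumed text, which excludes the version
    // lastIndex is where the next search starts: right at the version,
    // because the lookahead characters were never consumed.
    next: text.slice(lookahead.lastIndex, lookahead.lastIndex + 2),
  });
}
console.log(found);
```

Because the version is not part of the match, a replacement based on this pattern alone would leave '95', '98', or 'NT' in place, so those characters still have to be handled separately.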

14.

Backward reference

One of the most important features of regular expressions is the ability to store part of a matched pattern for later reuse. As you will recall, placing parentheses around a regular expression pattern or part of a pattern causes that part of the expression to be stored in a temporary buffer. You can override the saving of that part of the regular expression using the non-capturing metacharacters '?:', '?=', or '?!'.

Each captured submatch is stored as it is encountered from left to right in the regular expression pattern. The buffer numbers where the submatches are stored begin at 1 and continue up to a maximum of 99 subexpressions. Each buffer can be accessed using '\n', where n is one or two decimal digits identifying a specific buffer.

One of the simplest and most useful applications of backreferences is locating two identical words that occur together in text. Take the following sentence:

Is is the cost of of gasoline going up up?

As written, the sentence clearly has a problem with repeated words. It would be nice to have a way to fix the sentence without having to look for duplicates of every single word. The following JScript regular expression does that using a single subexpression.

/\b([a-z]+) \1\b/gi

The equivalent VBScript expression is:

"\b([a-z]+) \1\b"

In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word just matched by the parenthesized expression. '\1' is used to specify the first submatch. The word boundary metacharacters ensure that only separate words are detected. Without them, a phrase such as "is issued" or "this is" would be incorrectly identified by this expression.
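The role of the backreference and of the word boundaries can be seen in a short JavaScript sketch. The sample strings and variable names are illustrative assumptions.

```javascript
const sentence = "Is is the cost of of gasoline going up up?";

// \1 must repeat whatever ([a-z]+) captured; 'i' makes "Is" equal to "is".
const doubled = /\b([a-z]+) \1\b/gi;
const pairs = sentence.match(doubled);
console.log(pairs); // [ 'Is is', 'of of', 'up up' ]

// Dropping \b allows matches that start or end inside a word.
const unbounded = /([a-z]+) \1/i;
const bounded = /\b([a-z]+) \1\b/i;
console.log(unbounded.test("this is issued")); // true  ("is is" inside "this is")
console.log(bounded.test("this is issued"));   // false
```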

In the JScript expression, the global flag ('g') following the regular expression indicates that the expression is applied to as many matches as can be found in the input string. Case insensitivity is specified by the case insensitivity flag ('i') at the end of the expression. The multiline flag ('m') specifies that potential matches may occur on either side of a newline character. In VBScript, the flags cannot be set in the expression and must instead be set explicitly through properties of the RegExp object.

Using the regular expression shown above, the following JScript code uses the submatch information to replace each occurrence of two consecutive identical words in a text string with a single occurrence of that word:

var ss = "Is is the cost of of gasoline going up up?\n";

var re = /\b([a-z]+) \1\b/gim; // Create the regular expression pattern.

var rv = ss.replace(re, "$1"); // Replace each doubled word with a single word.

The closest equivalent VBScript code is as follows:

Dim ss, re, rv

ss = "Is is the cost of of gasoline going up up?" & vbNewLine

Set re = New RegExp

re.Pattern = "\b([a-z]+) \1\b"

re.Global = True

re.IgnoreCase = True

re.MultiLine = True

rv = re.Replace(ss, "$1")

Note that in the VBScript code, the global, case insensitivity, and multiline flags are set using the corresponding properties of the RegExp object.

The use of $1 in the Replace method refers to the first saved submatch. If there is more than one submatch, you can refer to the others consecutively with $2, $3, and so on.

Another use of backreferences is to break a Uniform Resource Identifier (URI) down into its component parts. Assume that you want to break the following URI down into the protocol (ftp, http, etc.), the domain address, and the page/path:

http://msdn.microsoft.com:80/scripting/default.htm

The following regular expressions provide this functionality. For JScript:

/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/

For VBScript:

"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"

The first parenthesized subexpression captures the protocol part of the web address. It matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address part of the address. It matches any sequence of characters that does not include '/' or ':'. The third parenthesized subexpression captures the port number, if one is specified. It matches a colon followed by zero or more digits. Finally, the fourth parenthesized subexpression captures the path and/or page information specified by the web address. It matches zero or more characters other than '#' or a space.

After applying the regular expression to the URI shown above, the submatches contain the following:

RegExp.$1 contains "http"

RegExp.$2 contains "msdn.microsoft.com"

RegExp.$3 contains ":80"

RegExp.$4 contains "/scripting/default.htm"
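The same decomposition can be sketched in JavaScript. This version reads the submatches from the array returned by exec rather than from the RegExp.$1 to $4 properties. The second URI (ftp://example.com/...) and all variable names are hypothetical and added only to show the optional port group.

```javascript
const uriPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;
const uri = "http://msdn.microsoft.com:80/scripting/default.htm";

// Index 0 is the whole match; indexes 1 to 4 are the four submatches.
const [, protocol, host, port, path] = uriPattern.exec(uri);
console.log({ protocol, host, port, path });

// When no port is present, the optional third group does not participate
// in the match and its slot is undefined.
const noPort = uriPattern.exec("ftp://example.com/pub/readme.txt");
console.log(noPort[3]); // undefined
```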

/ ************************************************** *************** /

*

* Author: Emerald

*

* HomePage: http://gi.2288.org:88/

*

* SEO-GI: http://seo.2288.org:88*

* Sitename: Green College - Green Institute

*

* TIME: 2005-01-24

*

/ ************************************************** *************** /