Table of contents:
-------------------------------------------------- ----------------------------------
1. Regular expression
2. Early origins
3. Using regular expressions
4. Regular expression syntax
5. Building regular expressions
6. Order of precedence
7. Ordinary characters
8. Special characters
9. Non-printing characters
10. Character matching
11. Quantifiers
12. Anchors
13. Alternation and grouping
14. Backreferences
-------------------------------------------------- ----------------------------------
Squiring: Emerald
Green College - Green Institute
Regular expression
1. Regular expression
If you have not worked with regular expressions before, the term and the concept may be unfamiliar to you. However, they are probably not as unfamiliar as you imagine.
Recall how you search for files on a hard disk. You almost certainly use the ? and * characters to help find the files you are looking for. The ? character matches a single character in a file name, while * matches zero or more characters. A pattern such as 'data?.dat' finds the following files:
data1.dat
data2.dat
datax.dat
dataN.dat
Using the * character instead of the ? character expands the number of files found. 'data*.dat' matches all of the following file names:
data.dat
data1.dat
data2.dat
data12.dat
datax.dat
dataXYZ.dat
Although this way of searching for files is certainly useful, it is also very limited. The limited ability of the ? and * wildcards gives you an idea of what regular expressions can do, but regular expressions are far more powerful and flexible.
2. Early origins
The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Warren McCulloch and Walter Pitts, two neurophysiologists, developed a mathematical way of describing these neural networks.
In 1956, an American mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper entitled "Representation of Events in Nerve Nets and Finite Automata" that introduced the concept of regular expressions. Regular expressions were expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".
Subsequently, this work was found to be applicable to some early research on computational search algorithms by Ken Thompson, the principal inventor of UNIX. The first practical application of regular expressions was in the QED editor on UNIX.
The rest, as they say, is history. Regular expressions have been an important part of text-based editors and search tools ever since.
3. Using regular expressions
In a typical search-and-replace operation, you must provide the exact text you are looking for. That technique may be adequate for simple search-and-replace tasks in static text, but it lacks flexibility and makes searching dynamic text difficult, if not impossible.
With regular expressions, you can:
1. Test for a pattern within a string. For example, you can test an input string to see whether a telephone number pattern or a credit card number pattern occurs within it. This is called data validation.
2. Replace text. You can use a regular expression to identify specific text in a document and then either remove it or replace it with other text.
3. Extract a substring from a string based on a pattern match. You can use this to find specific text within a document or an input field.
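The three uses listed above can be sketched in modern JavaScript, the present-day descendant of the scripting dialect this article describes. This is only an illustration: the telephone pattern and the sample strings are assumptions, not part of the original text.

```javascript
// 1. Test: does the whole input look like a phone number?
//    (the NNN-NNN-NNNN shape is an assumed format for illustration)
const phonePattern = /^\d{3}-\d{3}-\d{4}$/;
const isPhone = phonePattern.test("555-867-5309");

// 2. Replace: swap every run of digits for "#".
const masked = "Order 66 shipped on day 12".replace(/\d+/g, "#");

// 3. Extract: pull out the first substring that matches the pattern.
const found = "Contact: 555-867-5309 (office)".match(/\d{3}-\d{3}-\d{4}/);
const extracted = found ? found[0] : null;

console.log(isPhone, masked, extracted);
```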
For example, if you need to search an entire web site to remove some outdated material and replace some HTML formatting tags, you can use a regular expression to test each file and see whether the material or the HTML formatting tags you are looking for exist in that file. That way, you can narrow the affected files down to those that contain material to be removed or changed. You can then use a regular expression to remove the outdated material, and finally use regular expressions again to find and replace the tags that need replacing.
Another case where regular expressions are useful is a language that is not known for its string-handling ability. VBScript, a subset of Visual Basic, has a rich set of string-handling functions. JScript, like C, does not. Regular expressions give JScript a significant improvement in string handling. However, regular expressions may also be more efficient to use in VBScript, because they allow multiple string manipulations to be performed in a single expression.
4. Regular expression syntax
A regular expression is a pattern of text that consists of ordinary characters (for example, the letters a through z) and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template for matching a character pattern against the string being searched.
Here are some regular expressions you might encounter:
JScript              VBScript             Matches
/^[ \t]*$/           "^[ \t]*$"           Matches a blank line.
/\d{2}-\d{5}/        "\d{2}-\d{5}"        Validates an ID number consisting of 2 digits, a hyphen, and another 5 digits.
/<(.*)>.*<\/\1>/     "<(.*)>.*<\/\1>"     Matches an HTML tag.
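As a quick check of the three sample expressions, here is a small JavaScript sketch. The input strings are made up for illustration, and the blank-line pattern is written with a plain bracket expression, `[ \t]`.

```javascript
const blankLine = /^[ \t]*$/;      // only spaces and tabs, or nothing at all
const idNumber = /\d{2}-\d{5}/;    // 2 digits, a hyphen, 5 digits
const htmlTag = /<(.*)>.*<\/\1>/;  // opening tag, content, matching closing tag

const results = {
  blank: blankLine.test("  \t "),
  notBlank: blankLine.test(" x "),
  id: idNumber.test("ID 12-34567"),
  matchedTag: htmlTag.test("<b>bold</b>"),
  mismatchedTag: htmlTag.test("<b>bold</i>"),
};

console.log(results);
```

The `\1` in the last pattern refers back to whatever the first pair of parentheses captured, which is why the mismatched closing tag fails.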
The following table is a complete list of metacharacters and their behavior in the context of regular expressions:
Character     Description
\             Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^             Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.
$             Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.
*             Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+             Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?             Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}           n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}          n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "fooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}         m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
?             When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.             Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]'.
(pattern)     Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match the parenthesis characters themselves, use '\(' or '\)'.
(?:pattern)   Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)   Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not captured for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)   Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not captured for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but does not match "Windows" in "Windows 2000". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y           Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]         A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]        A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]         A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter in the range 'a' through 'z'.
[^a-z]        A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
\b            Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B            Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx           Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If not, c is assumed to be a literal 'c' character.
\d            Matches a digit character. Equivalent to [0-9].
\D            Matches a non-digit character. Equivalent to [^0-9].
\f            Matches a form-feed character. Equivalent to \x0c and \cL.
\n            Matches a newline character. Equivalent to \x0a and \cJ.
\r            Matches a carriage return character. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S            Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t            Matches a tab character. Equivalent to \x09 and \cI.
\v            Matches a vertical tab character. Equivalent to \x0b and \cK.
\w            Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W            Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn           Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num          Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n            Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm           Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither of the preceding conditions holds, \nm matches the octal escape value nm when n and m are octal digits (0-7).
\nml          Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un           Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
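A few rows of the table are easier to appreciate when run. The following JavaScript sketch exercises the non-greedy quantifier, the non-capturing group, positive lookahead, a backreference, and the dot's newline exception. The sample strings "heavy industries" and "bookkeeper" are illustrative assumptions.

```javascript
// Non-greedy vs greedy: 'o+?' stops after one "o", 'o+' takes them all.
const lazy = "oooo".match(/o+?/)[0];
const greedy = "oooo".match(/o+/)[0];

// Non-capturing group: the alternation is grouped but nothing is stored,
// so the result array holds only the full match.
const industryMatch = "heavy industries".match(/industr(?:y|ies)/);

// Positive lookahead: "Windows " matches only when a listed version follows.
const win = /Windows (?=95|98|NT|2000)/;
const win2000 = win.test("Windows 2000");
const win31 = win.test("Windows 3.1");

// Backreference: (.)\1 finds two consecutive identical characters.
const doubled = "bookkeeper".match(/(.)\1/)[0];

// The dot does not match a newline, but [\s\S] matches any character.
const dotOverNewline = /a.b/.test("a\nb");
const anyOverNewline = /a[\s\S]b/.test("a\nb");

console.log(lazy, greedy, industryMatch[0], win2000, win31, doubled);
```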
5. Building regular expressions
Regular expressions are constructed in the same way that arithmetic expressions are created. That is, smaller expressions are combined using a variety of metacharacters and operators to create larger expressions.
You construct a regular expression by placing the various components of the expression pattern between a pair of delimiters. For JScript, the delimiters are a pair of forward slash (/) characters. For example:
/expression/
For VBScript, a pair of quotation marks ("") delimits the regular expression. For example:
"expression"
In both of the examples shown above, the regular expression pattern is stored in the Pattern property of the RegExp object.
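In modern JavaScript both styles are available side by side: the slash-delimited literal, and a quoted string passed to the `RegExp` constructor. The sketch below assumes that the `source` property plays roughly the role the Pattern property plays in the text above. Note that backslashes must be doubled inside a string.

```javascript
// Slash-delimited literal.
const fromLiteral = /Chapter [1-5]/;

// The same pattern built from a quoted string.
const fromString = new RegExp("Chapter [1-5]");

// Both objects expose the pattern text through the `source` property.
const samePattern = fromLiteral.source === fromString.source;

// Inside a string, each backslash of the pattern is written twice.
const idFromString = new RegExp("\\d{2}-\\d{5}");
const idMatches = idFromString.test("12-34567");

console.log(samePattern, idFromString.source, idMatches);
```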
<< --------------------------------------------------------------------------------------------------------------------------------------- ------ >>
6. Order of precedence
Once you have constructed a regular expression, it is evaluated much like an arithmetic expression, that is, from left to right and following an order of precedence.
The following table lists the order of precedence of the various regular expression operators, from highest to lowest:
Operator(s)                    Description
\                              Escape
(), (?:), (?=), []             Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}      Quantifiers
^, $, \anymetacharacter        Anchors and sequences
|                              Alternation
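Two consequences of this ordering can be checked directly. In the JavaScript sketch below (sample strings are assumptions), a quantifier binds only to the item immediately before it, and alternation, having the lowest precedence, splits the whole pattern unless parentheses limit it.

```javascript
// Quantifiers bind tighter than sequences: 'ab+' means a(b+), not (ab)+.
const quantifierBindsToB = /^ab+$/.test("abbb");   // a followed by b's
const notRepeatedPair = /^ab+$/.test("abab");      // "ab" is not repeated
const groupedPair = /^(ab)+$/.test("abab");        // parentheses change that

// Alternation is lowest: /^cat|dog$/ means (^cat)|(dog$).
const looseAlternation = /^cat|dog$/.test("catalog");   // starts with "cat"
const tightAlternation = /^(cat|dog)$/.test("catalog"); // whole string must be cat or dog

console.log(quantifierBindsToB, notRepeatedPair, groupedPair,
            looseAlternation, tightAlternation);
```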
<< --------------------------------------------------------------------------------------------------------------------------------------- ---------- >>
7. Ordinary characters
Ordinary characters consist of all printing and non-printing characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.
The simplest form of a regular expression is a single ordinary character that matches itself in the searched string. For example, the single-character pattern 'A' matches the letter 'A' wherever it appears in the searched string. Here are some single-character regular expression patterns:
/a/
/7/
/M/
The equivalent VBScript single-character regular expressions are:
"a"
"7"
"M"
You can combine a number of single characters to form a larger expression. For example, the following JScript regular expression is nothing more than an expression created by combining the single-character expressions 'a', '7', and 'M'.
/a7M/
The equivalent VBScript expression is:
"a7M"
Notice that there is no concatenation operator. All you need to do is put one character after another.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>
8. Special characters
There are a number of metacharacters that require special treatment when you try to match them. To match these special characters, you must first escape them, that is, precede them with a backslash character (\). The following table shows these special characters and their meanings:
Special character   Description
$                   Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'. To match the $ character itself, use \$.
( )                 Mark the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*                   Matches the preceding subexpression zero or more times. To match the * character, use \*.
+                   Matches the preceding subexpression one or more times. To match the + character, use \+.
.                   Matches any single character except the newline character \n. To match ., use \.
[                   Marks the beginning of a bracket expression. To match [, use \[.
?                   Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\                   Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n'. '\n' matches a newline character. The sequence '\\' matches "\" and '\(' matches "(".
^                   Matches the position at the beginning of the input string, except when used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{                   Marks the beginning of a quantifier expression. To match {, use \{.
|                   Indicates a choice between two items. To match |, use \|.
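When the text to be matched comes from user input, escaping by hand is error-prone. The JavaScript sketch below applies the rule from the table mechanically. The helper name `escapeRegExp` is hypothetical (it is not part of the article or of any built-in API used here), and it also escapes `]` and `}`, which is harmless.

```javascript
// Hypothetical helper: put a backslash in front of every special character
// so the text can be used as a literal pattern.
function escapeRegExp(text) {
  // "$&" in the replacement stands for the character that was matched.
  return text.replace(/[$()*+.?[\\\]^{|}]/g, "\\$&");
}

const escaped = escapeRegExp("1+1=2?");

// Unescaped, the dot matches any character; escaped, only a literal dot.
const looseDot = new RegExp("^a.c$").test("abc");
const strictDot = new RegExp("^" + escapeRegExp("a.c") + "$").test("abc");
const literalDot = new RegExp("^" + escapeRegExp("a.c") + "$").test("a.c");

console.log(escaped, looseDot, strictDot, literalDot);
```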
9. Non-printing characters
There are a number of useful non-printing characters that must occasionally be used. The following table shows the escape sequences used to represent these non-printing characters:
Character     Meaning
\cx           Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range A-Z or a-z. If not, c is assumed to be a literal 'c' character.
\f            Matches a form-feed character. Equivalent to \x0c and \cL.
\n            Matches a newline character. Equivalent to \x0a and \cJ.
\r            Matches a carriage return character. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including space, tab, form-feed, and so on. Equivalent to [ \f\n\r\t\v].
\S            Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t            Matches a tab character. Equivalent to \x09 and \cI.
\v            Matches a vertical tab character. Equivalent to \x0b and \cK.
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< >>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>
10. Character matching
The period (.) matches any single printing or non-printing character in a string, except a newline character (\n). The following JScript regular expression matches 'aac', 'abc', 'acc', 'adc', and so on, as well as 'a1c', 'a2c', 'a-c', and 'a#c':
/a.c/
The equivalent VBScript regular expression is:
"a.c"
If you are trying to match a string containing a file name, where a period (.) is part of the input string, precede the period in the regular expression with a backslash (\) character. To illustrate, the following JScript regular expression matches 'filename.ext':
/filename\.ext/
For VBScript, the equivalent expression is as follows:
"filename\.ext"
These expressions are still quite limited. They only let you match any single character. In many cases it is useful to match specific characters from a list. For example, if the input text contains chapter headings expressed numerically as Chapter 1, Chapter 2, and so on, you might want to find those chapter headings.
Bracket expressions
You can create a list of matching characters by placing one or more individual characters within square brackets ([ and ]). When characters are enclosed in brackets, the list is called a bracket expression. Within brackets, as anywhere else, ordinary characters represent themselves, that is, they match an occurrence of themselves in the input text. Most special characters lose their meaning when they occur inside a bracket expression. Here are some exceptions:
1. The ']' character ends a list if it is not the first item. To match the ']' character in a list, place it first, immediately following the opening '['.
2. The '\' character continues to be the escape character. To match the '\' character, use '\\'.
Characters enclosed in a bracket expression match only a single character, at the position in the regular expression where the bracket expression appears. The following JScript regular expression matches 'Chapter 1', 'Chapter 2', 'Chapter 3', 'Chapter 4', and 'Chapter 5':
/Chapter [12345]/
To match the same chapter headings in VBScript, use the following expression:
"Chapter [12345]"
Notice that the word 'Chapter' and the space that follows it are fixed in position relative to the characters within the brackets. The bracket expression is therefore used only to specify the set of characters that matches the single character position immediately following the word 'Chapter' and a space. That is the ninth character position.
If you want to express the matching characters using a range instead of the characters themselves, you can separate the beginning and ending characters of the range with the hyphen (-) character. The character value of each character determines its relative order within a range. The following JScript regular expression contains a range expression that is equivalent to the bracketed list shown above.
/Chapter [1-5]/
The expression with the same function in VBScript is as follows:
"Chapter [1-5]"
When a range is specified in this manner, both the starting and ending values are included in the range. It is important to note that the starting value must precede the ending value in Unicode sort order.
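The equivalence of the list form and the range form can be confirmed with a short JavaScript sketch. The sample headings are assumptions chosen to fall on both sides of the range.

```javascript
const list = /Chapter [12345]/;
const range = /Chapter [1-5]/;

const samples = ["Chapter 1", "Chapter 5", "Chapter 6", "Chapter x"];

// The two patterns should accept and reject exactly the same inputs.
const agree = samples.every((s) => list.test(s) === range.test(s));

// Both endpoints of the range are included; 6 lies outside it.
const endIncluded = range.test("Chapter 5");
const outside = range.test("Chapter 6");

console.log(agree, endIncluded, outside);
```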
If you want to include the hyphen character in a bracket expression, you must do one of the following:
1. Escape it with a backslash: [\-]
2. Put the hyphen at the beginning or the end of the bracketed list. The following expressions match all lowercase letters and the hyphen: [-a-z], [a-z-]
3. Create a range in which the beginning character value is lower than the hyphen and the ending character value is equal to or greater than the hyphen. Both of the following regular expressions satisfy this requirement: [!--], [!-~]
Similarly, you can find all characters not in the list or range by placing the caret (^) character at the beginning of the list. If the caret appears in any other position within the list, it matches itself, that is, it has no special meaning. The following JScript regular expression matches chapter headings with numbers greater than 5:
/Chapter [^12345]/
For VBScript, use:
"Chapter [^12345]"
In the examples shown above, the expression matches any character in the ninth position except 1, 2, 3, 4, or 5 (it is not restricted to digits). So 'Chapter 7' is a match, and so is 'Chapter 9'.
The expressions above can also be written using the hyphen character (-). For JScript:
/Chapter [^1-5]/
Or, for VBScript:
"Chapter [^1-5]"
A typical use of a bracket expression is to specify a match of any uppercase or lowercase letter or any digit. The following JScript expression specifies such a match:
/[A-Za-z0-9]/
The equivalent VBScript expression is:
"[A-Za-z0-9]"
11. Quantifiers
Sometimes you do not know how many characters there are to match. To accommodate that kind of uncertainty, regular expressions support the concept of quantifiers. Quantifiers let you specify how many times a given component of a regular expression must occur for the match to succeed.
The following table describes the various quantifiers and their meanings:
Character     Description
*             Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+             Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?             Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches the "do" in "do" or "does". ? is equivalent to {0,1}.
{n}           n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}          n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "fooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}         m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "foooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
In a large input document, chapter numbers can easily exceed nine, so you need a way to handle two-digit or three-digit chapter numbers. Quantifiers give you that capability. The following JScript regular expression matches chapter headings with any number of digits:
/Chapter [1-9][0-9]*/
The following VBScript regular expression performs the same match:
"Chapter [1-9][0-9]*"
Notice that the quantifier appears after the range expression. It therefore applies to the entire range expression, which in this case specifies only the digits 0 through 9.
The '+' quantifier is not used here because a digit is not necessarily required in the second or subsequent position. The '?' character is not used either, because it would limit the chapter numbers to only two digits. You want to match at least one digit following 'Chapter' and a space character.
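The reasoning about '*', '+', and '?' can be seen by comparing what each variant actually matches. The following JavaScript sketch uses made-up headings as input.

```javascript
const star = /Chapter [1-9][0-9]*/;      // one digit, then any number of digits
const plus = /Chapter [1-9][0-9]+/;      // would require at least two digits
const optional = /Chapter [1-9][0-9]?/;  // would stop after two digits

const oneDigit = "Chapter 7".match(star)[0];
const threeDigits = "Chapter 123".match(star)[0];

// '+' rejects single-digit chapters; '?' truncates three-digit ones.
const plusOnOneDigit = plus.test("Chapter 7");
const optionalOnThree = "Chapter 123".match(optional)[0];

console.log(oneDigit, threeDigits, plusOnOneDigit, optionalOnThree);
```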
If you know that your chapter numbers are limited to 99, you can use the following JScript expression to specify at least one digit, but no more than two digits.
/Chapter [0-9]{1,2}/
For VBScript, use the following regular expression:
"Chapter [0-9]{1,2}"
The disadvantage of the expression shown above is that if there is a chapter number greater than 99, it still matches only the first two digits. Another disadvantage is that somebody could create a Chapter 0 and it would still match. Better JScript expressions for matching only two digits are the following:
/Chapter [1-9][0-9]?/
or
/Chapter [1-9][0-9]{0,1}/
For VBScript, the following expressions are equivalent to the above:
"Chapter [1-9][0-9]?"
or
"Chapter [1-9][0-9]{0,1}"
The '*', '+', and '?' quantifiers are all referred to as greedy, that is, they match as much text as possible. Sometimes that is not what you want to happen. Sometimes you just want a minimal match.
For example, you may be searching an HTML document for a chapter title enclosed in an H1 tag. That text might appear in the document as:
<H1>Chapter 1 - Introduction to Regular Expressions</H1>
The following expression matches everything from the opening less-than symbol (<) to the greater-than symbol (>) at the end of the closing H1 tag.
/<.*>/
The VBScript regular expression is:
"<.*>"
If all you really want to match is the opening H1 tag, the following non-greedy expression matches only <H1>:
/<.*?>/
or
"<.*?>"
By placing a '?' after a '*', '+', or '?' quantifier, the expression is transformed from a greedy match into a non-greedy, or minimal, match.
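The difference between the two expressions is easy to see when both are run against the same heading. This JavaScript sketch assumes a sample H1 line of the kind described above.

```javascript
// Assumed sample heading for illustration.
const html = "<H1>Chapter 1 - Introduction to Regular Expressions</H1>";

// Greedy: runs from the first "<" to the last ">" on the line.
const greedyMatch = html.match(/<.*>/)[0];

// Non-greedy: stops at the first ">" it can, giving just the opening tag.
const lazyMatch = html.match(/<.*?>/)[0];

console.log(greedyMatch);
console.log(lazyMatch);
```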
12. Anchors
So far, the examples you have seen have been concerned only with finding chapter headings wherever they occur. Any occurrence of the string 'Chapter' followed by a space and a number could be an actual chapter heading, or it could be a cross-reference to another chapter. Since true chapter headings always appear at the beginning of a line, you need a way to find only the headings and not the cross-references.
Anchors provide that capability. Anchors let you fix a regular expression to either the beginning or the end of a line. They also let you create regular expressions that occur within a word, or at the beginning or end of a word. The following table lists the regular expression anchors and their meanings:
Character     Description
^             Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.
$             Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.
\b            Matches a word boundary, that is, the position between a word and a space.
\B            Matches a non-word boundary.
You cannot use a quantifier with an anchor. Since there cannot be more than one position immediately before or after a newline or word boundary, expressions such as '^*' are not permitted.
To match text at the beginning of a line, use the '^' character at the beginning of the regular expression. Do not confuse this use of '^' with its use inside a bracket expression. The two are not the same.
To match text at the end of a line, use the '$' character at the end of the regular expression.
To use anchors when searching for chapter headings, the following JScript regular expression matches a chapter heading with up to two following digits that occurs at the beginning of a line:
/^Chapter [1-9][0-9]{0,1}/
The regular expression with the same function in VBScript is as follows:
"^Chapter [1-9][0-9]{0,1}"
A true chapter heading not only occurs at the beginning of a line, it is also the only thing on the line, so it must be at the end of the line as well. The following expression ensures that the match you have specified matches only chapters and not cross-references. It does so by creating a regular expression that matches only at the beginning and end of a line of text.
/^Chapter [1-9][0-9]{0,1}$/
For VBScript, use:
"^Chapter [1-9][0-9]{0,1}$"
Matching word boundaries is a little different, but it adds a very important capability to regular expressions. A word boundary is the position between a word and a space. A non-word boundary is any other position. The following JScript expression matches the first three characters of the word 'Chapter' because they appear after a word boundary:
/\bCha/
For VBScript:
"\bCha"
The position of the '\b' operator is critical here. If it is positioned at the beginning of the string to be matched, it looks for the match at the beginning of the word; if it is positioned at the end of the string, it looks for the match at the end of the word. For example, the following expressions match 'ter' in the word 'Chapter' because it appears before a word boundary:
/ter\b/
and
"ter\b"
The following expressions match 'apt' as it occurs in 'Chapter', but not as it occurs in 'aptitude':
/\Bapt/
and
"\Bapt"
That is because 'apt' occurs at a non-word boundary in the word 'Chapter' but at a word boundary in the word 'aptitude'. For the non-word boundary operator, position is not important because the match is not relative to the beginning or end of a word.
13. Alternation and grouping
Alternation uses the '|' character to allow a choice between two or more alternatives. The chapter heading regular expression can be expanded to cover more than just chapter headings. However, this is not as straightforward as you might think. When alternation is used, the largest possible expression on either side of the '|' character is matched. You might think that the following JScript and VBScript expressions match either 'Chapter' or 'Section', followed by one or two digits, at the beginning and end of a line:
/^Chapter|Section [1-9][0-9]{0,1}$/
"^Chapter|Section [1-9][0-9]{0,1}$"
Unfortunately, what actually happens is that the regular expressions shown above match either the word 'Chapter' at the beginning of a line, or 'Section' followed by any digits at the end of a line. If the input string is 'Chapter 22', the expression above matches only the word 'Chapter'. If the input string is 'Section 22', the expression matches 'Section 22'. That is not the intent here, so there must be a way to make the regular expression do what you want, and there is.
You can use parentheses to limit the scope of the alternation, that is, to make sure that it applies only to the two words 'Chapter' and 'Section'. However, parentheses are tricky as well, because they are also used to create subexpressions, which are covered later in the discussion of subexpressions. By taking the regular expressions shown above and adding parentheses in the appropriate places, you can make the regular expression match either 'Chapter 1' or 'Section 3'.
The following regular expressions use parentheses to group 'Chapter' and 'Section' so that the expression works properly. For JScript:
/^(Chapter|Section) [1-9][0-9]{0,1}$/
For VBScript:
"^(Chapter|Section) [1-9][0-9]{0,1}$"
These expressions work properly, but they produce an interesting by-product. Placing parentheses around 'Chapter|Section' establishes the proper grouping, but it also causes whichever of the two words matched to be captured for future use. Since there is only one set of parentheses in the expression shown above, there is only one captured submatch. You can refer to this submatch using the SubMatches collection in VBScript or the $1-$9 properties of the RegExp object in JScript.
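The behavior described above can be reproduced in a few lines of JavaScript. In this sketch the captured submatch is read from index 1 of the match array, which is assumed here as the modern way to reach what the text calls $1.

```javascript
const ungrouped = /^Chapter|Section [1-9][0-9]{0,1}$/;
const grouped = /^(Chapter|Section) [1-9][0-9]{0,1}$/;

// Ungrouped: "^Chapter" and "Section NN$" are the two alternatives.
const chapterOnly = "Chapter 22".match(ungrouped)[0];
const sectionFull = "Section 22".match(ungrouped)[0];

// Grouped: the choice is limited to the two words, and the word is captured.
const groupedMatch = "Chapter 22".match(grouped);
const fullMatch = groupedMatch[0];
const capturedWord = groupedMatch[1];

console.log(chapterOnly, sectionFull, fullMatch, capturedWord);
```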
Sometimes capturing a submatch is desirable, and sometimes it is not. In the example shown, all you really want is to use the parentheses to group the choice between the words 'Chapter' and 'Section'; you do not need to refer to that match later. In fact, unless you really need to capture submatches, do not use them. The regular expression will be more efficient because it does not have to spend the time and memory to store those submatches.
You can place '?:' before the pattern inside the parentheses to prevent the match from being saved for later use. The following modification of the regular expressions shown above provides the same functionality without saving the submatch. For JScript:
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/
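To make the difference concrete, here is a small JavaScript sketch that runs the capturing and non-capturing forms against the same input. The variable names are illustrative only.

```javascript
const capturing = /^(Chapter|Section) [1-9][0-9]{0,1}$/;
const nonCapturing = /^(?:Chapter|Section) [1-9][0-9]{0,1}$/;

const withCapture = "Section 12".match(capturing);
const withoutCapture = "Section 12".match(nonCapturing);

// Both forms match the same text.
console.log(withCapture[0], "|", withoutCapture[0]); // Section 12 | Section 12

// Only the capturing form stores a submatch alongside the full match.
console.log(withCapture.length, withCapture[1]); // 2 Section
console.log(withoutCapture.length);              // 1
```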
For VBScript:
"^(?:Chapter|Section) [1-9][0-9]{0,1}$"
In addition to the '?:' metacharacters, there are two other non-capturing metacharacters, used for what are called lookahead matches. One is the positive lookahead, specified with '?=', which matches the search string at any position where a string matching the regular expression pattern in the parentheses begins. The other is the negative lookahead, specified with '?!', which matches the search string at any position where a string that does not match the regular expression pattern begins.
For example, assume that there is a document containing references to Windows 3.1, Windows 95, Windows 98, and Windows NT. Further assume that this document needs to be updated by finding all references to Windows 95, Windows 98, and Windows NT and changing them to Windows 2000. You can use the following JScript regular expression, which is an example of a positive lookahead, to match Windows 95, Windows 98, and Windows NT:
/Windows (?=95|98|NT)/
To make the same match in VBScript, use the following expression:
"Windows (?=95|98|NT)"
After a match is found, the search for the next match begins immediately after the matched text, not after the characters included in the lookahead. For example, if the expressions shown above match 'Windows 98', the search resumes after 'Windows', not after '98'.
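The following JavaScript sketch illustrates that a lookahead checks the text that follows without consuming it. The sample sentence and variable names are assumptions made for the illustration; the non-capturing variant is included only to show one possible way to perform the replacement described above.

```javascript
const text = "Windows 3.1, Windows 95, Windows 98 and Windows NT";

// Positive lookahead: the version is tested but is not part of the match.
const lookahead = /Windows (?=95|98|NT)/g;
console.log(text.match(lookahead)); // [ 'Windows ', 'Windows ', 'Windows ' ]

// After the first match, the search position sits right before the version.
lookahead.lastIndex = 0;
lookahead.exec(text);
const next = text.slice(lookahead.lastIndex, lookahead.lastIndex + 2);
console.log(next); // 95

// To rewrite the references, the version has to be consumed, so this
// variant uses a non-capturing group instead of a lookahead.
const updated = text.replace(/Windows (?:95|98|NT)/g, "Windows 2000");
console.log(updated);
// Windows 3.1, Windows 2000, Windows 2000 and Windows 2000

// Negative lookahead: only references that are NOT 95, 98 or NT.
const others = text.match(/Windows (?!95|98|NT)[\w.]+/g);
console.log(others); // [ 'Windows 3.1' ]
```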
14.
Backward reference
One of the most important features of regular expressions is the ability to store part of a successfully matched pattern for later use. As you will recall, placing parentheses around a regular expression pattern or part of a pattern causes that part of the expression to be stored in a temporary buffer. You can use the non-capturing metacharacters '?:', '?=', or '?!' to prevent that part of the regular expression from being saved.
Each captured submatch is stored in the order it is encountered, from left to right, in the regular expression pattern. The buffer numbers for stored submatches start at 1 and continue up to a maximum of 99 subexpressions. Each buffer can be accessed using '\n', where n is a one- or two-digit decimal number identifying a specific buffer.
One of the simplest and most useful applications of backreferences is locating two identical words that appear next to each other in text. Consider the following sentence:
Is is the cost of of gasoline going up up?
As written, the sentence clearly has a problem with several repeated words. It would be nice to have a way to fix the sentence without having to look for repetitions of each individual word. The following JScript regular expression does this using a single subexpression:
/\b([a-z]+) \1\b/gi
The equivalent VBScript expression is:
"\b([a-z]+) \1\b"
In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by '[a-z]+'. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word matched by the parenthesized expression. '\1' specifies the first submatch. The word boundary metacharacters ensure that only separate words are detected. Without them, phrases such as "is issued" or "this is" would be incorrectly identified by this expression.
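The effect of the word boundaries can be checked with a short JavaScript sketch. The second pattern, with the boundaries removed, is a hypothetical variant written only for comparison.

```javascript
const withBoundaries = /\b([a-z]+) \1\b/gi;
// Hypothetical variant without \b, to show what the boundaries prevent.
const withoutBoundaries = /([a-z]+) \1/gi;

const sentence = "Is is the cost of of gasoline going up up?";
console.log(sentence.match(withBoundaries)); // [ 'Is is', 'of of', 'up up' ]

// "this is issued" contains no repeated word, so the bounded pattern finds
// nothing, while the unbounded one matches inside "this" and "issued".
const phrase = "this is issued";
console.log(phrase.match(withBoundaries));    // null
console.log(phrase.match(withoutBoundaries)); // [ 'is is' ]
```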
In the JScript expression, the global flag ('g') after the regular expression indicates that the expression is applied to as many matches as it can find in the input string. Case insensitivity is specified by the case-insensitivity flag ('i') at the end of the expression. The multiline flag ('m') specifies that potential matches may occur on either side of a newline character. For VBScript, the flags cannot be set in the expression; instead, they must be set explicitly using properties of the RegExp object.
Using the regular expression shown above, the following JScript code uses the submatch information to replace two consecutive occurrences of the same word in a text string with a single occurrence of that word:
var ss = "Is is the cost of of gasoline going up up?\n";
var re = /\b([a-z]+) \1\b/gim; // Create the regular expression pattern.
var rv = ss.replace(re, "$1"); // Replace two occurrences with one.
The closest equivalent VBScript code is as follows:
Dim ss, re, rv
ss = "Is is the cost of of gasoline going up up?" & vbNewLine
Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"
re.Global = True
re.IgnoreCase = True
re.MultiLine = True
rv = re.Replace(ss, "$1")
Note that in the VBScript code, the global, case-insensitivity, and multiline flags are set using the appropriate properties of the RegExp object.
Using $1 in the Replace method refers to the first saved submatch. If there is more than one submatch, you can refer to the others with $2, $3, and so on.
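As a brief JavaScript sketch of referring to more than one submatch in a replacement, the pattern below adds a second pair of parentheses around the number. This two-group pattern is a hypothetical extension of the 'Chapter|Section' expression, not one taken from the text.

```javascript
// Hypothetical pattern: group 1 captures the word, group 2 the number.
const heading = /^(Chapter|Section) ([1-9][0-9]{0,1})$/;

// $1 and $2 in the replacement string refer to the two submatches.
const reordered = "Section 3".replace(heading, "$2 ($1)");
console.log(reordered); // 3 (Section)

// A replacement function receives the same submatches as arguments.
const shouted = "Chapter 12".replace(
  heading,
  (whole, word, number) => word.toUpperCase() + " " + number
);
console.log(shouted); // CHAPTER 12
```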
Another use of backreferences is to break a Uniform Resource Identifier (URI) down into its component parts. Assume that you want to break the following URI down into the protocol (ftp, http, etc.), the domain address, and the page/path:
http://msdn.microsoft.com:80/scripting/default.htm
The following regular expressions provide this functionality. For JScript:
/(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/
For VBScript:
"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
The first parenthesized subexpression captures the protocol part of the web address. It matches any word that precedes a colon and two forward slashes. The second parenthesized subexpression captures the domain address part of the address. It matches any sequence of one or more characters other than '/' or ':'. The third parenthesized subexpression captures the port number, if one is specified. It matches a colon followed by zero or more digits. Finally, the fourth parenthesized subexpression captures the path and/or page information specified by the web address. It matches zero or more characters other than '#' or a space.
After applying the regular expression to the URI shown above, the submatches contain the following:
RegExp.$1 contains "http"
RegExp.$2 contains "msdn.microsoft.com"
RegExp.$3 contains ":80"
RegExp.$4 contains "/scripting/default.htm"
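The decomposition can be tried with the JavaScript sketch below. The function `splitUri`, the property names of the object it returns, and the second sample address are assumptions made for this illustration. The sketch reads the submatches from the array returned by `exec` instead of the RegExp.$1 to RegExp.$4 properties, which modern engines treat as a legacy feature.

```javascript
const uriPattern = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/;

// Hypothetical helper: returns the four submatches as named fields,
// or null when the input does not look like a URI.
function splitUri(uri) {
  const m = uriPattern.exec(uri);
  if (m === null) {
    return null;
  }
  return { protocol: m[1], domain: m[2], port: m[3], path: m[4] };
}

const parts = splitUri("http://msdn.microsoft.com:80/scripting/default.htm");
console.log(parts.protocol); // http
console.log(parts.domain);   // msdn.microsoft.com
console.log(parts.port);     // :80
console.log(parts.path);     // /scripting/default.htm

// When no port is given, the optional third group does not participate
// in the match and its slot is undefined.
const noPort = splitUri("ftp://example.com/pub");
console.log(noPort.port); // undefined
```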
/*****************************************************************/
*
* Author: Emerald
*
* HomePage: http://gi.2288.org:88/
*
* SEO-GI: http://seo.2288.org:88
*
* Sitename: Green College - Green Institute
*
* TIME: 2005-01-24
*
/*****************************************************************/