Regular Expressions (1) ---- What Is a Regular Expression?

xiaoxiao2021-03-06 107

Regular expressions are easy to forget, so it is safer to keep notes on them. Hence this write-up.

I hope it helps everyone.

1. What is a regular expression?
2. Origin of regular expressions
3. Regular expression usage in detail
3.1 Basic syntax
3.1.1 Ordinary characters
3.1.2 Non-print characters
3.1.3 Special characters
3.1.4 Character set
3.1.5 Using metacharacters in character sets
3.1.6 Predefined character set
3.1.7 Qualifiers
3.1.8 Locators
3.1.9 The "." metacharacter
3.1.10 Use "|" to express choice
3.1.11 Use "()" to indicate grouping
3.1.12 Supplementary notes on "?"
3.1.13 Add a comment to the regular expression
3.1.14 Operator precedence
3.2 Advanced topics
3.2.1 Backreferences
3.2.2 Specifying mode options in a regular expression
3.2.3 Lookaround assertion
4. Regular expression basic syntax index
5. Regular expression advanced syntax index
6. Reference information
7. Recommended tools

1. What is a regular expression?

Simply put, a regular expression is a powerful tool for matching and replacing text patterns. It is a text-matching pattern that describes a set of strings precisely, using a sequence of ordinary characters and special characters.

Regular expressions are not a programming language in their own right, but they can be regarded as a small language that lets users describe text patterns precisely with ordinary and special characters. Beyond describing patterns, a regular expression engine can usually iterate over matches, use a pattern as a delimiter to split a string into substrings, and replace or reformat text intelligently. Regular expressions provide an effective and simple way to solve many common text-processing tasks.

Regular expressions have two standards:

· Basic regular expressions (BRE - Basic Regular Expressions)

· Extended regular expressions (ERE - Extended Regular Expressions).

ERE includes the features of BRE plus additional concepts.

There are currently two kinds of regular expression engine:

· Text-directed engines, which are driven by the characters of the text

· Regex-directed engines, which are driven by the regular expression

Jeffrey Friedl calls them DFA and NFA engines, respectively.

Conventions:

To simplify the description, this article adopts the following conventions:

1. All example expressions assume an NFA engine.

2. A regular expression, that is, a matching pattern, is abbreviated as regex.

3. The regex's matching target, that is, the target string, is abbreviated as String.

4. Matched text is marked with a yellow background.

5. A regex is written like this: 1\+1=2.

6. Examples use the following format:

Regex    Target String     Description
test     This is a test    Would also match the test in testcase, etc.

2. Origin of regular expressions

The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Two neurophysiologists, Warren McCulloch and Walter Pitts, developed a mathematical way of describing these neural networks.

In 1956, an American mathematician named Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets", which introduced the concept of regular expressions. Regular expressions were expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".

Subsequently, this work was applied to early research on computational search algorithms by Ken Thompson, the principal inventor of UNIX. The first practical application of regular expressions was the QED editor on UNIX. From then until now, regular expressions have been an important part of text-based editors and search tools. Regular expressions with a complete syntax were first used for pattern matching on characters and were later applied widely across information technology. Since then, regular expressions have gone through several periods of development; the current standard has been approved by ISO (the International Organization for Standardization) and recognized by the Open Group.

3. Regular expression usage in detail

The simplest regular expression is one that everyone already knows and uses often: a literal text string. A particular string can be described by its own text; a regex such as test matches the input string "test" exactly, but it also matches the test in "this is a testcase", which may not be what we want.

Of course, using a regular expression only to match a string identical to itself has little value and does not show what regular expressions can really do. What if you want to find not test itself but all words that start with the letter t, or all four-letter words? That is beyond what a literal string can reasonably express, so we need to study regular expressions in more depth.

3.1 Basic Syntax

Although regular expressions are not a programming language, they do follow a set of rules, which can be called their basic syntax.

A regular expression is a text pattern composed of ordinary characters (such as the letters a to z) and special characters (called metacharacters). The pattern describes one or more strings to match when searching a body of text. The regular expression acts as a template that matches a character pattern against the string being searched.

Constructing a regular expression is similar to constructing a mathematical expression: metacharacters and operators combine small expressions into larger ones.

A regular expression is constructed by placing the components of the pattern between a pair of delimiters.

3.1.1 Ordinary characters

Ordinary characters are all the printing and non-printing characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.

3.1.2 Non-print characters

Non-print characters are also ordinary characters; they are listed separately for reference.

Symbol   Description
\cx      Matches the control character indicated by x. For example, \cM matches Control-M, a carriage return. x must be in A-Z or a-z; otherwise the c is treated as a literal 'c' character.
\f       Matches a form feed. Equivalent to \x0c and \cL.
\n       Matches a newline. Equivalent to \x0a and \cJ.
\r       Matches a carriage return. Equivalent to \x0d and \cM.
\s       Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S       Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t       Matches a tab. Equivalent to \x09 and \cI.
\v       Matches a vertical tab. Equivalent to \x0b and \cK.

Non-print characters can be used in a regex. \t matches a tab character (ASCII 0x09), \r matches a carriage return (0x0D), and \n matches a newline (0x0A). Note that Windows uses \r\n to mark the end of a line, while UNIX uses \n.

Similarly, a regex can use the hexadecimal ASCII or ANSI code of a character. In Latin-1, the code of the copyright symbol is 0xA9, so the copyright symbol can also be matched with \xA9. Another way to match a tab is \x09. Note that the leading "0" of the "0x" prefix is dropped in the regex form: write \x09, not \0x09.
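
The escapes described above can be tried out with a short sketch. It assumes Python's re module (the article itself is not tied to a language), and the sample text is made up for illustration.

```python
import re

# Hypothetical sample text containing a tab, a copyright sign and a CRLF.
text = "name\tvalue \u00a9 2007\r\n"

# \t and \x09 both match the tab character (ASCII 0x09).
tab_by_name = re.search(r"\t", text)
tab_by_hex = re.search(r"\x09", text)

# \xA9 matches the copyright sign, whose Latin-1 code is 0xA9.
copyright_sign = re.search(r"\xA9", text)

# \r\n matches a Windows-style line ending.
crlf = re.search(r"\r\n", text)

print(tab_by_name.start(), tab_by_hex.start(), copyright_sign.group(), crlf.span())
```

Raw strings (r"...") are used so that the backslashes reach the regex engine unchanged.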

3.1.3 Special characters

Special characters are also called metacharacters or reserved characters. They have a special meaning in a regex. Most of them mean different things in different contexts; only the most common meaning is listed here.

There are 11 special characters:

Symbol   Description
$        Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
()       Mark the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*        Matches the preceding subexpression zero or more times. To match the * character, use \*.
+        Matches the preceding subexpression one or more times. To match the + character, use \+.
.        Matches any single character except the newline \n. To match ., use \.
[        Marks the start of a bracket expression. To match [, use \[.
?        Matches the preceding subexpression zero or one time, or marks a non-greedy qualifier. To match the ? character, use \?.
\        Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', and '\n' matches a newline. The sequence '\\' matches "\", and '\(' matches "(".
^        Matches the start position of the input string. When used as the first character inside square brackets, it negates the character set. To match the ^ character itself, use \^.
{        Marks the start of a qualifier expression. To match {, use \{.
|        Indicates a choice between two items. To match |, use \|.

A metacharacter preceded by the escape character "\" is treated as an ordinary character.

For example, to match 1+1=2, the correct regular expression is 1\+1=2. Otherwise, + is treated as a special character.

Characters other than the special characters should not be preceded by "\", because "\" is itself a special character, and "\" followed by an ordinary character can create a special meaning. For example, \d matches any digit.

Programmers may be surprised that single and double quotes are not special characters, but this is correct. In a program, the programming language decides which characters inside a quoted string have special meaning, and the compiler processes them before the string is passed to the regex engine. For example, in C#, to use the regex 1\+1=2 we must write "1\\+1=2" in the program; the C# compiler turns "\\" into a single "\". Similarly, to match C:\Temp, the regular expression is C:\\Temp, and in the program it must be written as "C:\\\\Temp".
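
The two layers of escaping can be illustrated with a minimal sketch. The text above uses C#; Python is assumed here instead, where ordinary string literals need the same doubled backslashes and raw strings (r"...") avoid them.

```python
import re

# Unescaped, "+" is a qualifier: 1+1=2 means "one or more 1s, then 1=2".
unescaped = re.search(r"1+1=2", "1+1=2")      # no match: the literal "+" gets in the way
also_matches = re.search(r"1+1=2", "111=2")   # matches "111=2"

# Escaped, "\+" is a literal plus sign.
escaped = re.search(r"1\+1=2", "1+1=2")

# The regex C:\\Temp written as an ordinary literal and as a raw literal.
plain_literal = "C:\\\\Temp"
raw_literal = r"C:\\Temp"
path_match = re.search(plain_literal, r"C:\Temp")

print(unescaped, also_matches.group(), escaped.group(), path_match.group())
```

Both literals produce the same pattern; the raw form is usually easier to read.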

3.1.4 Character Set

A character set describes a group of characters; the regex interpreter treats a match against any one character in the set as a success.

A character set is enclosed in [].

For example, gr[ae]y matches gray or grey.

A character set matches exactly one character, so gr[ae]y does not match graey. The order of the characters inside the set is arbitrary; the result is the same.

A range can be expressed with a hyphen inside the set. [0-9] gives the same result as [0123456789]. Several ranges can be combined. For example, [0-9a-fA-F] matches any hexadecimal digit, in upper or lower case. Ranges and individual characters can also be mixed: [0-9a-fxA-FX] matches any hexadecimal digit or the character X. The order within the set does not affect the result.

Adding a "^" immediately after the opening "[" negates the set, so that it matches every character other than those listed, including non-print characters and line endings.

Note: a character set matches a character, not a position. So q[^u] does not mean "a q that is not followed by a u"; it means "a q followed by a character that is not u".

q[^u] does not match Iraq.

But it does match Iraq is a country, because the space after the q is a "not u" character.
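
A minimal sketch of these character-set rules, assuming Python's re module; the sample strings are made up for illustration.

```python
import re

# gr[ae]y: the set matches exactly one character, "a" or "e".
words = re.findall(r"gr[ae]y", "gray grey graey gry")

# [0-9a-fA-F]: ranges can be combined to describe a hexadecimal digit.
hex_digits = re.findall(r"[0-9a-fA-F]", "x1Fg")

# q[^u] needs a character after the q, so it fails at the end of "Iraq" ...
at_end = re.search(r"q[^u]", "Iraq")
# ... but it matches "q " (the q plus the following space) here.
with_space = re.search(r"q[^u]", "Iraq is a country")

print(words, hex_digits, at_end, repr(with_space.group()))
```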

3.1.5 Using metacharacters in character sets

The only metacharacters inside a character set are ']', '\', '^' and '-'.

All other metacharacters lose their special meaning inside a character set and stand for ordinary characters. They do not need a "\" in front.

For example:

To match a "*" or a "+", [*+] is enough. Even if you add "\", the regex interpreter ignores it.

Handling the four special characters:

To represent "]", "^" and "-" inside a character set, add the escape character "\" in front of them to indicate that they are the ordinary characters "]", "^" and "-".

You can also put them in a position where they cannot have a special meaning. This method is better because it does not hurt readability.

"^"

To match a "^", put it anywhere except immediately after the "[".

Regex    String                    Description
[x^]     A string with x and ^.    Matches x or "^"

"]"

"]" can be placed immediately after the "[", or immediately after the "^" of a negated character set.

Regex    String                    Description
[]x]     A string with x and ]     Matches x or "]"
[^]x]    A string with x and ]     Matches any character other than x and "]"

"\"

To match "\" as an ordinary character rather than a special one, escape it with another "\".

Regex    String                    Description
[\\x]    A string with x and \     Matches x or "\"

"-"

The hyphen can be placed immediately after the "[", immediately before the "]", or immediately after the negating "^".

Regex    String                    Description
[-x]     A string with x and -     Matches x or "-"
[x-]     A string with x and -     Matches x or "-"
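
The placement rules above can be checked with a short sketch. It assumes Python's re module, which follows these rules; some other flavors (JavaScript, for example) treat a leading "]" differently. The sample string is hypothetical.

```python
import re

# Hypothetical sample containing each of the characters discussed.
sample = r"a*b+c^d]e\f-g"

star_or_plus = re.findall(r"[*+]", sample)   # no escaping needed inside a set
caret = re.findall(r"[x^]", sample)          # "^" is literal when it is not first
bracket = re.findall(r"[]x]", sample)        # "]" is literal right after "["
backslash = re.findall(r"[\\x]", sample)     # "\" must always be escaped
hyphen = re.findall(r"[-x]", sample)         # "-" is literal at the start or end

print(star_or_plus, caret, bracket, backslash, hyphen)
```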

3.1.6 Predefined character set

Because many character sets are used frequently, the regex interpreter predefines some common ones:

Regex    Meaning          Description
\d       [0-9]            All digits
\w       [A-Za-z0-9_]     All word characters; culture-dependent
\s       [ \t\r\n]        Space, carriage return, newline and tab; culture-dependent

Predefined character sets can be used both inside and outside a character set.

Regex     String       Description
\s\d      1 + 2 = 3    Matches a whitespace character followed by a digit
[\s\d]    1 + 2 = 3    Matches a single character that is either a digit or a whitespace character

[\da-fA-F] and [0-9a-fA-F] match the same characters.

Similarly, adding a "^" in front of a predefined character set negates it. The negated sets also have predefined forms:

Regex    Meaning    Description
\D       [^\d]      Non-digits
\W       [^\w]      Non-word characters; culture-dependent
\S       [^\s]      Anything other than space, carriage return, newline and tab; culture-dependent

Take special care when using the negated forms inside "[]". [\D\S] is not the same as [^\d\s]. [^\d\s] matches any character that is neither a digit nor a whitespace character. [\D\S] matches any character that is either not a digit or not a whitespace character. Because a digit is not whitespace and whitespace is not a digit, [\D\S] matches any character.
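
The difference between [\D\S] and [^\d\s] is easy to get wrong, so here is a small sketch, assuming Python's re module and the sample string used in the tables above.

```python
import re

text = "1 + 2 = 3"

pairs = re.findall(r"\s\d", text)       # a whitespace character followed by a digit
singles = re.findall(r"[\s\d]", text)   # one character: whitespace or digit

# [^\d\s]: neither a digit nor whitespace.
neither = re.findall(r"[^\d\s]", text)
# [\D\S]: "not a digit" or "not whitespace", which every character satisfies.
anything = re.findall(r"[\D\S]", text)

print(pairs, neither, len(singles), len(anything))
```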

3.1.7 Qualifiers

Qualifiers provide a simple way to specify how many times a particular character or character set may repeat in the pattern. A qualifier always applies to the pattern immediately before it (to its left), usually a single character unless parentheses are used to create a group.

There are six qualifiers: *, +, ?, {n}, {n,} and {n,m}.

Symbol        Description
?             0 or 1 time
*             0 or more times
+             1 or more times
{min,max}     At least min times and at most max times. max must be greater than or equal to min.
{min,}        At least min times, with no upper limit
{min}         Exactly min times

Placing "?", "*" or "+" after a character set repeats the whole character set, not the character that it matched.

Regex          String      Meaning
[0-9]+         846, 111    Matches numbers
([0-9])\1+     846, 111    Matches a repetition of the same digit

[0-9]+ matches 846, and it also matches 111.

To repeat only the character that was matched, rather than the whole character set, you must use a backreference.

([0-9])\1+ matches only 111 and does not match 846.

(See section 3.2, Advanced Topics.)

If the target String is 811116, then 1111 is matched. If this is not what you want, you need lookahead and lookbehind. (See section 3.2, Advanced Topics.)
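
A minimal sketch of the qualifier behavior described above, assuming Python's re module; the last pattern (a{2,3}) is an extra illustration that does not appear in the text.

```python
import re

# [0-9]+ repeats the whole set: each digit may differ.
numbers = re.findall(r"[0-9]+", "846, 111")

# ([0-9])\1+ repeats the digit that was actually captured.
same_digit = [m.group() for m in re.finditer(r"([0-9])\1+", "846, 111")]

# Inside a longer number the repeated run is still found.
inner_run = re.search(r"([0-9])\1+", "811116").group()

# {min,max} bounds the repeat count (illustrative pattern).
bounded = re.findall(r"a{2,3}", "a aa aaa aaaa")

print(numbers, same_digit, inner_run, bounded)
```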

3.1.8 Locators

So far we have covered ordinary characters, special characters (metacharacters) and character sets. In all of these cases, the regex matches characters.

A locator (anchor) is different: it does not match a character. Instead, it matches a position.

There are several locators:

Regex    Function                                     Description
^        The position before the first character      Line-aware in multi-line mode
$        The position after the last character        Line-aware in multi-line mode
\A       Always matches the start of the String       Unaffected by line breaks
\Z       Always matches the end of the String         Unaffected by line breaks

Regex    String    Meaning
^        abc       Matches the position before a
$        abc       Matches the position after c
^a       abc       Matches a
^b       abc       Cannot match
c$       abc       Matches c
a$       abc       Cannot match

Word boundaries

There is another locator that matches a word boundary. It is written \b.

A word is a sequence of "word characters", that is, characters that can be used to form a word. They do not include non-print characters or carriage returns.

Four different positions count as word boundaries:

Before the first character, if the first character is a word character.

After the last character, if the last character is a word character.

Between a word character and the non-word character that follows it.

Between a non-word character and the word character that follows it.

All word characters can be represented by \w.

All non-word characters can be represented by \W.

\b matches a word boundary.

\B matches any position that is not a word boundary.
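
The locators can be compared side by side in a short sketch, assuming Python's re module, where multi-line mode is the re.MULTILINE flag. The sample strings are made up for illustration.

```python
import re

text = "abc\ndef"

first = re.findall(r"^.", text)                       # start of the String only
per_line = re.findall(r"^.", text, re.MULTILINE)      # start of every line
only_start = re.findall(r"\A.", text, re.MULTILINE)   # \A ignores line breaks
line_ends = re.findall(r".$", text, re.MULTILINE)     # end of every line
only_end = re.findall(r".\Z", text, re.MULTILINE)     # \Z ignores line breaks

# \b matches at word boundaries, \B everywhere else.
sentence = "This island is big"
whole_word = re.search(r"\bis\b", sentence)
inside_word = re.search(r"\Bis", sentence)

print(first, per_line, only_start, line_ends, only_end)
print(whole_word.start(), inside_word.start())
```

The standalone word "is" starts at index 12, while \Bis finds the "is" inside "This" at index 2.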

3.1.9 The "." metacharacter

In regular expressions, "." is the most frequently used metacharacter, and also the one most easily misused. So it gets its own section.

"." matches almost any character. The only exception is the newline.

This exception exists for historical reasons. The first tools that used regular expressions were line-based: they read a line from a file and then matched against it. In those tools the String never contained a line break, so "." never matched a newline.

Modern tools can match against a very long String or even an entire file. So regex interpreters now include an option that lets "." match all characters, including the newline.

"." is a very powerful metacharacter. It lets us be lazy. But it should be used carefully. Here is an example:

We want to match a date in the format MM/DD/YY, but let the user choose the date separator. A simple regex is \d\d.\d\d.\d\d, which looks fine. It matches 04/09/07 well. The problem is that 04409407 is also matched, because the third and sixth characters (both 4) are matched by ".". This is not what we want.

\d\d[-/.]\d\d[-/.]\d\d is better: the user can use "-", "." or "/" as the date separator. Because "." is not a special character inside a character set, it does not need a "\" in front.

But this is still not perfect, because it matches 99/99/99. [0-1]\d[-/.][0-3]\d[-/.]\d\d may be better, although it still matches 19/39/99. How far to go depends on the purpose: if the regex is used to validate user input, it may need further refinement; if it is only used to parse existing data, it may be good enough.

Suppose we want to match a string enclosed in double quotes. It sounds easy: any number of characters between two double quotes. The regex might be written ".*", which matches "string" in Put a "string" between double quotes. But if the String is "string one" and "string two", the result is "string one" and "string two" as a single match, which is not what we want. Here we can use a negated character set instead: "[^"\r\n]*".
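
The date and quoted-string examples can be verified with a short sketch, assuming Python's re module; the variable names are hypothetical.

```python
import re

loose = r"\d\d.\d\d.\d\d"
strict = r"\d\d[-/.]\d\d[-/.]\d\d"

loose_ok = re.fullmatch(loose, "04/09/07") is not None
loose_bad = re.fullmatch(loose, "04409407") is not None    # "." accepted the 4s
strict_bad = re.fullmatch(strict, "04409407") is not None

text = '"string one" and "string two"'
greedy = re.findall(r'".*"', text)            # one match spanning both strings
negated = re.findall(r'"[^"\r\n]*"', text)    # one match per quoted string

print(loose_ok, loose_bad, strict_bad)
print(greedy, negated)
```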

3.1.10 Use "|" to express choice

As mentioned earlier, a character set matches one of several characters. Alternation is slightly different: it chooses between whole expressions.

To match cat or dog, write cat|dog. More alternatives can be added: cat|dog|mouse|fish.

Note: "|" has the lowest precedence of all regex operators. When matching, the regex interpreter matches either everything to the left of the "|" or everything to its right.

3.1.11 Use "()" to indicate grouping

Parentheses can be used to limit the scope of the alternation.

In the example above, the alternation can be limited with the "()" symbols.

For example:

To match whole words rather than parts of words, the regex can be written \b(cat|dog)\b.

This tells the regex interpreter to look for a boundary first, then either cat or dog, then another boundary. Without the parentheses, the regex interpreter would match either a boundary followed by cat, or dog followed by a boundary.
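
The effect of the parentheses is easiest to see on text that contains cat and dog inside longer words. This is a minimal sketch assuming Python's re module and a made-up sample string.

```python
import re

text = "catalog hotdog cat dog"

# Without parentheses: "\bcat" OR "dog\b".
ungrouped = [m.group() for m in re.finditer(r"\bcat|dog\b", text)]

# With parentheses: a boundary, then cat or dog, then a boundary.
grouped = [m.group() for m in re.finditer(r"\b(cat|dog)\b", text)]

print(ungrouped)
print(grouped)
```

The ungrouped pattern also picks up the cat in catalog and the dog in hotdog; the grouped pattern finds only the two whole words.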

3.1.12 Supplementary notes on "?"

Besides expressing repetition, "?" also marks an item as optional.

For example: colou?r matches color and colour.

A group of characters can be made optional with parentheses.

For example: Nov(ember)? matches Nov and November.

Marking something with "?" tells the regex interpreter that it has two options: match it or skip it. However, the regex interpreter always tries to match the optional part first; only if that fails is the optional part skipped.

The effect is that when Feb 23(rd)? is applied to Today is Feb 23rd, 2004, the result is always Feb 23rd, not Feb 23.

For this reason "?" is called a "greedy" metacharacter: it always matches as much as possible.

3.1.13 Add a comment to the regular expression

You can add a comment to the regular expression:

(?#comment here)
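
As an illustration, the sketch below attaches comments to the optional-item patterns from section 3.1.12. It assumes Python's re module, which supports the (?#...) syntax; the comment texts and sample strings are made up.

```python
import re

# (?#...) is skipped by the engine, so it can document pieces of the pattern.
colour = re.compile(r"colou?(?#the u is optional)r")
spellings = colour.findall("color colour colouur")

month = re.compile(r"Nov(?:ember)?(?#short or long form)")
months = month.findall("Nov 5, November 6")

# The optional part is tried first, so "rd" is included when it is present.
day = re.search(r"Feb 23(?:rd)?", "Today is Feb 23rd, 2004").group()

print(spellings, months, day)
```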

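3.1.14 Operator precedence

The operators are listed from highest to lowest priority:

Symbol                          Function
\                               Escape
(), (?:), (?=), []              Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}       Qualifiers
^, $, \anymetacharacter         Locators and sequences
|                               Alternation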

3.2 Advanced Topics

Here are some slightly more complex topics, such as backreferences, lookaround, if-then-else conditionals, and more.

3.2.1 Backreferences

Besides grouping part of a regex, () also creates a backreference. Putting parentheses around a regular expression pattern or part of a pattern causes the associated match to be stored in a temporary buffer. Each captured submatch is stored in the order in which it is encountered, from left to right, in the pattern. Buffer numbers start at 1 and continue up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number that identifies a particular buffer. The non-capturing metacharacters '?:', '?=' and '?!' can be used to avoid saving the related match. For example:

Set(Value)? matches Set and SetValue. In the first case, the backreference \1 is empty because Set is not followed by Value. In the second case, \1 holds Value.

If you do not want to create a backreference, use the special syntax "?:", as in Set(?:Value)?.

Using backreferences

For example, we want to match a pair of HTML tags and the text between them.

We can write: <([A-Z][A-Z0-9]*)[^>]*>.*?</\1>.

This first creates a reference to whatever [A-Z][A-Z0-9]* matched and then uses that reference later, as \1.

Note: a group cannot reference itself.

One of the most important features of regular expressions is the ability to store part of a successfully matched pattern for later use. Recall that putting parentheses around a regular expression pattern or part of a pattern causes that part of the expression to be stored in a temporary buffer. The non-capturing metacharacters '?:', '?=' and '?!' can be used to avoid saving that part of the regular expression.

Each captured submatch is stored in the order in which it is encountered, from left to right, in the pattern. Buffer numbers start at 1 and continue up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number that identifies a particular buffer.

One of the simplest and most useful applications of backreferences is finding two identical words that occur next to each other in text. Consider the following sentence:

Is is the cost of of gasoline going up up?

As written, the sentence clearly has a problem with repeated words. It would be nice to have a way to fix the sentence without hunting for each repeated word by hand. The following regular expression does this:

\b([a-z]+) \1\b

In this example, the subexpression is everything between the parentheses. The captured expression consists of one or more alphabetic characters, as specified by [a-z]+. The second part of the regular expression is a reference to the previously captured submatch, that is, the second occurrence of the word just matched. '\1' specifies the first submatch. The word boundary metacharacters ensure that only separate words are detected. Without them, phrases such as "is issued" or "this is" would be incorrectly identified by this expression.
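
The backreference examples from this section can be run as a short sketch. It assumes Python's re module, where case-insensitive matching is requested with re.IGNORECASE so that "Is is" counts as a repeat; the HTML fragment is a made-up sample.

```python
import re

sentence = "Is is the cost of of gasoline going up up?"

# \b([a-z]+) \1\b: a word, a space, then the same word again.
doubled = re.compile(r"\b([a-z]+) \1\b", re.IGNORECASE)
repeats = doubled.findall(sentence)
fixed = doubled.sub(r"\1", sentence)

# An unmatched optional group has no value; (?:...) creates no group at all.
empty_group = re.fullmatch(r"Set(Value)?", "Set").group(1)
captured = re.fullmatch(r"Set(Value)?", "SetValue").group(1)
group_count = re.compile(r"Set(?:Value)?").groups

# The closing tag must repeat whatever name the first group captured.
tag = re.search(r"<([A-Z][A-Z0-9]*)[^>]*>.*?</\1>", "<b>bold</b> <i>x</b>", re.IGNORECASE)

print(repeats, fixed)
print(empty_group, captured, group_count, tag.group())
```

In Python an unmatched group is reported as None rather than as an empty string.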

3.2.2 Specifying mode options in a regular expression

Matching modes can be specified inside the regular expression itself:

Symbol   Function                      Memo
i        Case-insensitive matching     A "-" in front turns the option off
s        Single-line mode
m        Multi-line mode

The syntax is (?ism).

An option can apply to only part of the expression: it takes effect from the position where it appears until the next mode option is encountered.

An option can also be turned off by putting "-" in front of it.

For example, (?i-sm) turns on case-insensitive matching and turns off both single-line mode and multi-line mode.
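
Support for inline options varies between flavors. The sketch below assumes Python's re module, which accepts whole-pattern options such as (?i) only at the very start of the pattern and uses scoped groups such as (?i:...) to limit an option to part of it, so it differs from the mid-pattern form described above.

```python
import re

# (?i) at the start of the pattern makes the whole match case-insensitive.
ci = re.search(r"(?i)test", "This is a TEST").group()

# (?s) is single-line mode: "." also matches a newline.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"(?s)a.b", "a\nb") is not None

# (?m) is multi-line mode: "^" and "$" also match at line breaks.
line_starts = re.findall(r"(?m)^\w", "abc\ndef")

# A scoped group limits an option to part of the pattern.
scoped = re.findall(r"(?i:set)Value", "SETValue setvalue")

print(ci, dot_default, dot_all, line_starts, scoped)
```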

3.2.3 Lookaround assertion

Perl 5 introduced two constructs, lookahead and lookbehind. They are also called "zero-width assertions". They are zero-width because, like the locators that match the start or end of a line or a word, they do not consume characters. The difference is that lookahead and lookbehind do try to match characters, but then discard the match and return only the result: match or no match. That is why they are called "assertions": they do not keep the matched text, they only assert whether a match is possible.

Positive and negative lookahead

The syntax of positive lookahead is: (?=regex)

The syntax of negative lookahead is: (?!regex)

In our earlier example, q[^u] means "a q followed by a character that is not u". If what we want is "a q that is not followed by a u" (which also covers a q with nothing after it, something a character set cannot express because it must match a character), we must use a negative lookahead assertion: q(?!u). It matches a q that is not followed by a u.

The positive lookahead assertion q(?=u) matches a q that is followed by a u.

Important:

Any valid regular expression can be used inside lookahead, but not inside lookbehind.

Lookahead is enclosed in (), but it does not create a backreference. To save the text matched inside the assertion, add a capturing group inside it, like this: (?=(regex)).

Positive and negative lookbehind

The syntax of positive lookbehind is: (?<=regex)

The syntax of negative lookbehind is: (?<!regex)

The '<' distinguishes lookbehind from lookahead.

Lookbehind has the same effect as lookahead, but it works backwards. It tells the regex interpreter to temporarily step backwards in the String and check whether the text inside the lookbehind can be matched there.

(?<=a)b matches the b in cab, but does not match bed or debt.

Important:

Not every regular expression can be used inside lookbehind.

The expression inside lookbehind must have a fixed length, so "?", "*" and "+" are not available.
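
The lookaround examples can be checked with a minimal sketch, assuming Python's re module, which enforces the fixed-length rule for lookbehind by raising re.error. The variable names are hypothetical.

```python
import re

# q[^u] consumes a character after q; q(?!u) only checks what follows.
set_end = re.search(r"q[^u]", "Iraq")
lookahead_end = re.search(r"q(?!u)", "Iraq").group()
followed_by_u = re.findall(r"q(?=u)", "quit Iraq")

# (?<=a)b: a "b" preceded by "a"; (?<!a)b: a "b" not preceded by "a".
samples = ("cab", "bed", "debt")
positive = [w for w in samples if re.search(r"(?<=a)b", w)]
negative = [w for w in samples if re.search(r"(?<!a)b", w)]

# A variable-length lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<=a+)b")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(set_end, lookahead_end, followed_by_u)
print(positive, negative, variable_width_ok)
```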

4. Regular expression basic syntax index

Regular expression basic syntax reference

Characters

Character:   Any character except [\^$.|?*+()
Description: All characters except the listed special characters match a single instance of themselves.
Example:     a matches a

Character:   \ (backslash) followed by any of [\^$.|?*+()
Description: A backslash escapes special characters to suppress their special meaning.
Example:     \+ matches +

Character:   \xFF where FF are 2 hexadecimal digits
Description: Matches the character with the given hexadecimal code.
Example:     \xA9 matches © when using the Latin-1 code page.

Character:   \n, \r and \t
Description: Match an LF character, CR character and a tab character respectively. Can be used in character classes.
Example:     \r\n matches a DOS/Windows CRLF line break.

Character classes or character sets [abc]

Character

Description

Example

[ (opening square bracket)

Starts a character class. A character class matches a single character out of all the possibilities offered by the character class. Inside a character class, different rules apply. The rules in this section are only valid inside character classes. The rules outside this section are not valid in character classes, except \n, \r, \t and \xFF.

Any character except ^-]\

All characters except the listed special characters add that character to the possible matches for the character class.

[abc] matches a, b or c

\ (backslash) followed by any of ^-]\

A backslash escapes special characters to suppress their special meaning.

[\^\]] matches ^ or ]

- (hyphen) except immediately after the opening [

Specifies a range of characters.

[a-zA-Z0-9] matches any letter or digit

^ (caret) immediately after the opening [

Negates the character class, causing it to match a single character not listed in the character class. (Specifies a literal caret if placed anywhere except after the opening [.)

[^a-d] matches x (any character except a, b, c or d)

\d, \w and \s

Shorthand character classes matching digits 0-9, word characters (letters, digits and the underscore) and whitespace respectively. Can be used inside and outside character classes.

[\d\s] matches a character that is a digit or whitespace

\D, \W and \S

Negated versions of the above. Should be used only outside character classes. (They can be used inside, but the result is easy to misread.)

\D matches a character that is not a digit
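The character class rules can be checked one by one. This is a minimal sketch assuming Python's re module; the sample strings are made up for illustration.

```python
import re

letters_digits = re.findall(r"[a-zA-Z0-9]", "a_1-Z")   # ranges inside a class
not_a_to_d = re.findall(r"[^a-d]", "abcdex")            # negated class
caret_or_bracket = re.findall(r"[\^\]]", "x^y]z")       # escaped ^ and ]
digit_or_space = re.findall(r"[\d\s]", "a1 b2")         # shorthands inside a class
non_digits = re.findall(r"\D", "a1b2")                  # negated shorthand
```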

Dot

Character

Description

Example

. (dot)

Matches any single character except the line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too.

. matches x or (almost) any other character
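A short sketch of the dot and its "match line breaks" option, assuming Python's re module, where the option is called re.DOTALL. Note that Python's dot only excludes \n by default, so the example uses \n alone.

```python
import re

text = "ab\ncd"

# By default the dot skips the line break.
default_dot = re.findall(r".", text)

# With re.DOTALL the line break is matched like any other character.
dotall_dot = re.findall(r".", text, re.DOTALL)
```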

Anchors

Character

Description

Example

^ (caret)

Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well.

^. matches a in abc\ndef. Also matches d in "multi-line" mode.

$ (dollar)

Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break.

.$ matches f in abc\ndef. Also matches c in "multi-line" mode.

\A

Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Never matches after line breaks.

\A. matches a in abc

\Z

Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks, except for the very last line break if the string ends with a line break.

.\Z matches f in abc\ndef

\z

Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks.

.\z matches f in abc\ndef
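The anchor examples can be reproduced with a minimal sketch, assuming Python's re module. One flavor difference matters here: Python's \Z matches only at the very end of the string, so it behaves like the \z described in this table rather than like \Z.

```python
import re

text = "abc\ndef"

first_chars = re.findall(r"^.", text)                    # start of string only
first_chars_ml = re.findall(r"^.", text, re.MULTILINE)   # start of every line
last_chars = re.findall(r".$", text)                     # end of string only
last_chars_ml = re.findall(r".$", text, re.MULTILINE)    # end of every line

# \A never matches after a line break, even in multi-line mode.
start_only = re.findall(r"\A.", text, re.MULTILINE)

# With a trailing line break, $ still matches just before it,
# while Python's \Z insists on the absolute end of the string.
dollar_before_final_newline = re.search(r"f$", "abc\ndef\n") is not None
z_before_final_newline = re.search(r"f\Z", "abc\ndef\n") is not None
```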

Word boundaries

Character

Description

Example

\b

Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W), as well as at the start and/or end of the string if the first and/or last characters in the string are word characters.

.\b matches c in abc

\B

Matches at the position between two word characters (i.e. the position between \w\w) as well as at the position between two non-word characters (i.e. \W\W).

\B.\B matches b in abc
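Word boundaries are most often used to find whole words. Here is a minimal sketch assuming Python's re module; the "cat" sample text is an illustrative choice.

```python
import re

# \b on both sides: 'cat' as a whole word (the apostrophe is a non-word character).
whole_word = re.findall(r"\bcat\b", "cat concat cat's")

# \B in front: 'cat' only when it continues an existing word.
inside_word = re.findall(r"\Bcat", "cat concat")

# The two examples from the table.
last_char = re.search(r".\b", "abc").group(0)
middle_char = re.search(r"\B.\B", "abc").group(0)
```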

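Alternation

Character

Description

Example

| (pipe)

Causes the regex engine to match either the part on the left side or the part on the right side. Several options can be strung together.

abc|def|xyz matches abc, def or xyz

| (pipe)

The pipe has the lowest precedence of all operators. Use grouping to alternate only part of the regular expression.

abc(def|xyz) matches abcdef or abcxyz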
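The effect of grouping on alternation can be seen in a minimal sketch, assuming Python's re module. A non-capturing group is used so that the example does not depend on capture behavior.

```python
import re

# Three alternatives, tried at every position in the text.
options = re.findall(r"abc|def|xyz", "xyz abc def")

# Without a group the pipe splits the whole pattern: 'abcdef' OR 'xyz'.
ungrouped = re.fullmatch(r"abcdef|xyz", "abcxyz") is not None

# With a group only the grouped part alternates: 'abc' then 'def' OR 'xyz'.
grouped = re.fullmatch(r"abc(?:def|xyz)", "abcxyz") is not None
```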

Quantifiers

Character

Description

Example

? (question mark)

Makes the preceding item optional. Greedy, so the optional item is included in the match if possible.

abc? matches ab or abc

?? (lazy question mark)

Makes the preceding item optional. Lazy, so the optional item is excluded from the match if possible. This construct is often left out of documentation because of its limited use.

abc?? matches ab or abc

* (star)

Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all.

".*" matches "def" "ghi" in abc "def" "ghi" jkl

*? (lazy star)

Repeats the previous item zero or more times. Lazy, so the engine first attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item.

".*?" matches "def" in abc "def" "ghi" jkl

+ (plus)

Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once.

".+" matches "def" "ghi" in abc "def" "ghi" jkl

+? (lazy plus)

Repeats the previous item once or more. Lazy, so the engine first matches the previous item only once, before trying permutations with ever increasing matches of the preceding item.

".+?" matches "def" in abc "def" "ghi" jkl

{n} where n is an integer >= 1

Repeats the previous item exactly n times.

a{3} matches aaa

{n,m} where n >= 1 and m >= n

Repeats the previous item between n and m times. Greedy, so repeating m times is tried before reducing the repetition to n times.

a{2,4} matches aa, aaa or aaaa (the longest possible is tried first)

{n,m}? where n >= 1 and m >= n

Repeats the previous item between n and m times. Lazy, so repeating n times is tried before increasing the repetition to m times.

a{2,4}? matches aa, aaa or aaaa (the shortest possible is tried first)

{n,} where n >= 1

Repeats the previous item at least n times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only n times.

a{2,} matches aaaaa in aaaaa

{n,}? where n >= 1

Repeats the previous item at least n times. Lazy, so the engine first matches the previous item n times, before trying permutations with ever increasing matches of the preceding item.

a{2,}? matches aa in aaaaa
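The difference between greedy and lazy quantifiers shows most clearly on the quoted-string example from the table. This is a minimal sketch assuming Python's re module.

```python
import re

text = 'abc "def" "ghi" jkl'

greedy_star = re.findall(r'".*"', text)    # runs to the last quote
lazy_star = re.findall(r'".*?"', text)     # stops at the nearest quote
greedy_plus = re.search(r'".+"', text).group(0)
lazy_plus = re.search(r'".+?"', text).group(0)

# Counted repetition: greedy takes as many as allowed, lazy as few as allowed.
greedy_range = re.search(r"a{2,4}", "aaaaa").group(0)
lazy_range = re.search(r"a{2,4}?", "aaaaa").group(0)
lazy_open = re.search(r"a{2,}?", "aaaaa").group(0)

# The optional item is included whenever it is present.
optional = [m.group(0) for m in re.finditer(r"abc?", "ab abc")]
```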

5. Regular expression advanced syntax index

Regular Expression Advanced Syntax Reference

Grouping and backreferences

Syntax

Description

Example

(regex)

Round brackets group the regex between them. They capture the text matched by the regex inside them, which can be reused in a backreference, and they allow you to apply regex operators to the entire grouped regex.

(abc){3} matches abcabcabc. First group matches abc.

(?:regex)

Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything and do not create backreferences.

(?:abc){3} matches abcabcabc. No groups.

\1 through \9

Substituted with the text matched between the 1st through 9th pair of capturing parentheses. Some regex flavors allow more than 9 backreferences.

(abc|def)=\1 matches abc=abc or def=def, but not abc=def or def=abc.
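Capturing groups, non-capturing groups and backreferences can be compared side by side in a minimal sketch, assuming Python's re module.

```python
import re

# A capturing group keeps the text of its last repetition.
m = re.fullmatch(r"(abc){3}", "abcabcabc")
captured = m.group(1)

# A non-capturing group repeats the same way but creates no group.
nc = re.fullmatch(r"(?:abc){3}", "abcabcabc")
group_count = nc.re.groups

# \1 must repeat exactly what group 1 matched.
pattern = re.compile(r"(abc|def)=\1")
same = [bool(pattern.fullmatch(s)) for s in ("abc=abc", "def=def", "abc=def")]
```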

Modifiers

Syntax

Description

Example

(?i)

Turn on case insensitivity for the remainder of the regular expression. (Older regex flavors may turn it on for the entire regex.)

te(?i)st matches teST but not TEst.

(?-i)

Turn off case insensitivity for the remainder of the regular expression.

(?i)te(?-i)st matches TEst but not teST.

(?s)

Turn on "dot matches newline" for the remainder of the regular expression.

(?-s)

Turn off "dot matches newline" for the remainder of the regular expression.

(?m)

Caret and dollar match after and before newlines for the remainder of the regular expression. (Older regex flavors may apply this to the entire regex.)

(?-m)

Caret and dollar only match at the start and end of the string for the remainder of the regular expression.

(?i-sm)

Turns on the option "i", and turns off "s" and "m", for the remainder of the regular expression. (Older regex flavors may apply this to the entire regex.)

(?i-sm:regex)

Matches the regex inside the span with the option "i" turned on, and "s" and "m" turned off.

(?i:te)st matches TEst but not teST.
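Mode modifiers vary a good deal between flavors. The following is a minimal sketch assuming Python's re module, which accepts an unscoped modifier such as (?i) only at the very start of the pattern, so the scoped forms (?i:...) and (?-i:...) are used for the parts that apply to a section of the regex.

```python
import re

# Only 'te' is case-insensitive.
scoped_on = re.compile(r"(?i:te)st")
on_results = [bool(scoped_on.fullmatch(s)) for s in ("TEst", "teST")]

# The whole regex is case-insensitive, except 'st'.
scoped_off = re.compile(r"(?i)te(?-i:st)")
off_results = [bool(scoped_off.fullmatch(s)) for s in ("TEst", "teST")]

# (?s) lets the dot match a newline; (?m) lets ^ match at every line start.
dot_newline = re.fullmatch(r"(?s)a.b", "a\nb") is not None
line_starts = re.findall(r"(?m)^\w", "ab\ncd")
```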

Atomic grouping and possessive quantifiers

Syntax

Description

Example

(?>regex)

Atomic groups prevent the regex engine from backtracking back into the group (forcing the group to discard part of its match) after a match has been found for the group. Backtracking can occur inside the group before it has matched completely, and the engine can backtrack past the entire group, discarding its match entirely. Eliminating needless backtracking provides a speed increase. Atomic grouping is often indispensable when nesting quantifiers, to prevent a catastrophic amount of backtracking as the engine needlessly tries pointless permutations of the nested quantifiers.

x(?>\w+)x is more efficient than x\w+x if the second x cannot be matched.

?+, *+, ++ and {m,n}+

Possessive quantifiers are a limited yet syntactically cleaner alternative to atomic grouping. Only available in a few regex flavors. They behave as normal greedy quantifiers, except that they will not give up part of their match for backtracking.

x++ is identical to (?>x+)
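The "no backtracking into the group" behavior can be observed even in flavors without atomic groups. The sketch below assumes Python's re module and uses a well-known substitute rather than the (?>...) syntax itself: a lookahead containing a capturing group, followed by a backreference to it. The lookahead is never re-entered once it has succeeded, so \1 is locked to the first (greedy) result. Python 3.11 and later also accept (?>...) and possessive quantifiers directly.

```python
import re

text = "xabcx"

# Normal greedy \w+ first swallows 'abcx', then gives back the final 'x'
# so that the closing x in the pattern can match.
normal = re.search(r"x\w+x", text)

# Atomic-style: (?=(\w+)) captures 'abcx' once and \1 must consume exactly that,
# so nothing is left for the closing x and the overall match fails.
atomic_like = re.search(r"x(?=(\w+))\1x", text)
```

The second pattern fails where the first succeeds, which is exactly the trade-off: less backtracking, but also fewer ways for the overall regex to match.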

Lookaround

Syntax

Description

Example

(?=regex)

Zero-width positive lookahead. Matches at a position where the pattern inside the lookahead can be matched. Matches only the position. It does not consume any characters or expand the match. In a pattern like one(?=two)three, both two and three have to match at the position where the match of one ends.

t(?=s) matches the second t in streets.

(?!regex)

Zero-width negative lookahead. Identical to positive lookahead, except that the overall match only succeeds if the regex inside the lookahead fails to match.

t(?!s) matches the first t in streets.

(?<=text)

Zero-width positive lookbehind. Matches at a position to the left of which text appears. Since regular expressions cannot be applied backwards, the test inside the lookbehind can only be plain text. Some regex flavors allow alternation of plain text options in the lookbehind.

(?<=s)t matches the first t in streets.

(?<!text)

Zero-width negative lookbehind. Matches at a position if the text does not appear to the left of that position.

(?<!s)t matches the second t in streets.
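The four lookaround forms can be compared on the word "streets" used in the table. This is a minimal sketch assuming Python's re module; it records the index at which each pattern finds its 't'.

```python
import re

word = "streets"   # the two t's sit at index 1 and index 5

positions = {
    "t(?=s)": re.search(r"t(?=s)", word).start(),     # t followed by s
    "t(?!s)": re.search(r"t(?!s)", word).start(),     # t not followed by s
    "(?<=s)t": re.search(r"(?<=s)t", word).start(),   # t preceded by s
    "(?<!s)t": re.search(r"(?<!s)t", word).start(),   # t not preceded by s
}
```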

Continuing from the previous match

Syntax

Description

Example

\G

Matches at the position where the previous match ended, or the position where the current match attempt started (depending on the tool or regex flavor). Matches at the start of the string during the first match attempt.

\G[a-z] first matches a, then matches b and then fails to match in ab_cd.

Conditionals

Syntax

Description

Example

(?(?=regex)then|else)

If the lookahead succeeds, the "then" part must match for the overall regex to match. If the lookahead fails, the "else" part must match for the overall regex to match. Not just positive lookahead, but all four lookarounds can be used. Note that the lookahead is zero-width, so the "then" and "else" parts need to match and consume the part of the text matched by the lookahead as well.

(?(?<=a)b|c) matches the second b and the first c in babxcac
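Not every flavor supports lookaround conditionals; Python's re module, for instance, only accepts a group number or name as the condition. The sketch below therefore assumes Python and swaps in an equivalent alternation: "condition and then" on one side, "negated condition and else" on the other. It is a substitute for the conditional syntax, not the syntax itself.

```python
import re

# (?(?<=a)b|c) rewritten as: (preceded by a, then b) OR (not preceded by a, then c)
pattern = re.compile(r"(?<=a)b|(?<!a)c")

matches = [(m.start(), m.group(0)) for m in pattern.finditer("babxcac")]
```

The result lists the second b (index 2) and the first c (index 4), in line with the table's example.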

Comments

Syntax

Description

Example

(?#comment)

Everything between (?# and ) is ignored by the regex engine.

a(?#foobar)b matches ab
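Inline comments can be tried with a minimal sketch, assuming Python's re module. The second pattern uses Python's re.VERBOSE flag, which is a related feature not covered in the table above: it ignores whitespace in the pattern and treats # as the start of a comment.

```python
import re

# The (?#...) comment is skipped by the engine.
inline_comment = re.fullmatch(r"a(?#foobar)b", "ab") is not None

# Free-spacing mode: whitespace and '#' comments in the pattern are ignored.
verbose = re.compile(
    r"""
    a   # first letter
    b   # second letter
    """,
    re.VERBOSE,
)
verbose_match = verbose.fullmatch("ab") is not None
```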

6. Reference

Regular expression library http://www.regexlib.com/

Regular expressions blog http://blogs.regexadvice.com/

Mastering Regular Expressions (O'Reilly), by Jeffrey Friedl http://www.regex.info/

.NET regular expression reference

Http://msdn.microsoft.com/library/en-us/cpref/html/frlrfsystemtextregulaarXPRESSIONS.ASP

JScript regular expression syntax

http://www.msdn.microsoft.com/library/en-us/script56/HTML/JS56JSGRPREGEXPSYNTAX.ASP

Regular expression information http://www.regular-expressions.info/

7. Recommended tool

All examples of this article are verified under EditPad Pro.

This tool is very good: it has syntax checking and highlighting, which helps a lot in writing correct expressions. Highly recommended.

Download address: http://www.editpadpro.com/

Another tool is: The Regulator.

This tool targets the .NET platform and uses the regular expression class library implemented by .NET. If you want to verify your expressions for .NET, this tool is indispensable.
