Learning regular expressions through examples (1) - basics

xiaoxiao  2021-03-06

Basic syntax of regular expressions:

First, let's take a look at two special symbols: '^' and '$'. They mark the start and the end of a string, respectively:

"^The": matches any string that starts with "The";

"of despair$": matches any string that ends with "of despair";

"^abc$": a string that starts and ends with "abc", which can only be "abc" itself;

"notice": a string that contains "notice".

As the last example shows, if you use neither of the two symbols, the pattern may occur anywhere inside the string: it is not tied to either the start or the end.
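The anchoring rules can be tried out with a short sketch. It uses Python's `re` module, which is an assumption of this example and not the tool the article targets, and the helper name `found` is made up for illustration. `re.search` looks for the pattern anywhere in the string, so only '^' and '$' tie a match to either end.

```python
import re

def found(pattern, text):
    """Return True if the pattern occurs somewhere in text."""
    return re.search(pattern, text) is not None

print(found(r"^The", "The end"))                   # True: starts with "The"
print(found(r"^The", "At The end"))                # False: "The" is not at the start
print(found(r"of despair$", "depths of despair"))  # True: ends with "of despair"
print(found(r"^abc$", "abc"))                      # True: the whole string is "abc"
print(found(r"^abc$", "abcd"))                     # False: extra character at the end
print(found(r"notice", "take notice now"))         # True: unanchored, matches anywhere
```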

There are also the symbols '*', '+', and '?', which specify how many times the preceding character or sequence of characters may occur. They mean "zero or more times", "one or more times", and "zero or one time", respectively. Here are some examples:

"ab*": matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", etc.);

"ab+": same, but with at least one b ("ab", "abbb", etc.);

"ab?": an a followed by at most one b;

"a?b+$": an optional a followed by one or more b's at the end of a string.
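As a rough check of the three quantifiers, the sketch below tests a few candidate strings against each pattern. Python's `re` module, the `candidates` list, and the helper name `accepted` are assumptions chosen for illustration; `re.fullmatch` is used so that the whole string has to fit the pattern.

```python
import re

# Candidate strings to test against each quantified pattern.
candidates = ["a", "ab", "abbb", "b"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

print(accepted(r"ab*"))  # ['a', 'ab', 'abbb']
print(accepted(r"ab+"))  # ['ab', 'abbb']
print(accepted(r"ab?"))  # ['a', 'ab']

# "a?b+$" only constrains the end of the string, so re.search is used here.
tail = re.search(r"a?b+$", "xxabbb")
print(tail.group())      # abbb
```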

You can also use bounds, which go inside curly braces and give the allowed number of repetitions of the preceding character:

"ab{2}": matches a string that has an a followed by exactly two b's ("abb");

"ab{2,}": at least two b's ("abb", "abbbb", etc.);

"ab{3,5}": from three to five b's ("abbb", "abbbb", or "abbbbb").

Note that you must always specify the first number of a range (that is, "{0,2}", not "{,2}"). You may also have noticed that the symbols '*', '+', and '?' are equivalent to the bounds "{0,}", "{1,}", and "{0,1}", respectively.
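The bounds, and their equivalence to the shorthand symbols, can be checked with a small sketch. It assumes Python's `re` module, and `count_range` is a hypothetical helper. Python itself happens to accept an omitted lower bound, so the sketch sticks to the explicit forms recommended above.

```python
import re

def count_range(pattern, max_b=7):
    """Return every n in 0..max_b for which 'a' followed by n b's fully matches."""
    return [n for n in range(max_b + 1) if re.fullmatch(pattern, "a" + "b" * n)]

print(count_range(r"ab{2}"))    # [2]
print(count_range(r"ab{2,}"))   # [2, 3, 4, 5, 6, 7]
print(count_range(r"ab{3,5}"))  # [3, 4, 5]

# The shorthand quantifiers accept the same counts as their brace forms.
print(count_range(r"ab*") == count_range(r"ab{0,}"))   # True
print(count_range(r"ab+") == count_range(r"ab{1,}"))   # True
print(count_range(r"ab?") == count_range(r"ab{0,1}"))  # True
```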

To quantify a sequence of characters instead of a single one, put the sequence inside parentheses:

"a(bc)*": matches a string that has an a followed by zero or more copies of "bc";

"a(bc){1,5}": an a followed by one to five copies of "bc".
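To see what the parentheses change, the following sketch compares a grouped and an ungrouped pattern. Python's `re` module is an assumption of the example, and `bc_counts` is a hypothetical helper.

```python
import re

def bc_counts(pattern, max_n=7):
    """Return every n in 0..max_n for which 'a' followed by n copies of 'bc' fully matches."""
    return [n for n in range(max_n + 1) if re.fullmatch(pattern, "a" + "bc" * n)]

print(bc_counts(r"a(bc)*"))      # [0, 1, 2, 3, 4, 5, 6, 7]
print(bc_counts(r"a(bc){1,5}"))  # [1, 2, 3, 4, 5]

# Without parentheses the quantifier applies to the 'c' alone.
print(re.fullmatch(r"abc*", "abccc") is not None)  # True: "ab" plus three c's
print(re.fullmatch(r"abc*", "abcbc") is not None)  # False: "bc" is not repeated as a unit
```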

There is also the '|' symbol, which works like an OR operator and is used to choose between alternatives:

"hi|hello": matches a string that contains either "hi" or "hello";

"(b|cd)ef": a string that contains either "bef" or "cdef";

"(a|b)*c": a string that contains any mix of a's and b's (possibly none) followed by a c.
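A brief sketch of alternation, assuming Python's `re` module; the sample strings and variable names are invented for illustration.

```python
import re

# The first place where either alternative fits is reported.
first = re.search(r"hi|hello", "oh hello there").group()
print(first)   # hello

# Parentheses limit the alternation to "b" or "cd"; "ef" must follow in both cases.
second = re.search(r"(b|cd)ef", "abcdef").group()
print(second)  # cdef

# Any mix of a's and b's, including none at all, followed by a c.
for text in ["abbac", "c", "abdc"]:
    print(text, re.fullmatch(r"(a|b)*c", text) is not None)
```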

A period ('.') stands for any single character:

"a.[0-9]": matches a string that has an a followed by any one character and then a digit;

"^.{3}$": a string of exactly three characters.
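The period can be exercised with a few lines, again assuming Python's `re` module; `hit` and `miss` are illustrative names. Note that in Python the period does not match a newline by default, a detail outside the scope of this article.

```python
import re

# 'a', then any one character, then a digit.
hit = re.search(r"a.[0-9]", "xa-7y")
print(hit.group())  # a-7

# "a7" has no character between the a and the digit, so nothing matches.
miss = re.search(r"a.[0-9]", "a7")
print(miss)         # None

# Exactly three characters between the start and the end of the string.
for text in ["ab", "abc", "abcd"]:
    print(text, re.search(r"^.{3}$", text) is not None)
```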

Bracket expressions specify which characters are allowed in a single position of the string:

"[ab]": matches a string that has either an a or a b (the same as "a|b");

"[a-d]": a string that has one of the lowercase letters 'a' through 'd' (the same as "a|b|c|d", or simply "[abcd]");

"^[a-zA-Z]": a string that starts with an English letter;

"[0-9]%": a string that has a single digit before a percent sign;

",[a-zA-Z0-9]$": a string that ends with a comma followed by a letter or digit.
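The bracket expressions listed above can be tried as follows. This is a minimal sketch that assumes Python's `re` module; the sample strings and variable names are made up for the example.

```python
import re

# A bracket expression matches exactly one character from the listed set.
letters = re.findall(r"[a-d]", "bead")
print(letters)  # ['b', 'a', 'd']

print(re.search(r"[ab]", "cab").group())               # a
print(re.search(r"^[a-zA-Z]", "Zed") is not None)      # True: starts with a letter
print(re.search(r"^[a-zA-Z]", "9lives") is not None)   # False: starts with a digit

percent = re.search(r"[0-9]%", "50% off").group()
print(percent)  # 0%  (only the single digit right before the sign)

print(re.search(r",[a-zA-Z0-9]$", "x,y") is not None)  # True
print(re.search(r",[a-zA-Z0-9]$", "x,-") is not None)  # False
```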

You can also list the characters you do NOT want: just put a '^' as the first symbol inside the brackets (for example, "%[^a-zA-Z]%" matches a string with a single non-letter character between two percent signs). Also keep in mind that a special character must be preceded by a backslash to be taken literally, and that the backslash itself has to be escaped inside a double-quoted PHP string. For example, the expression "(\$|¥)[0-9]+" is written in a call as ereg("(\\$|¥)[0-9]+", $str). (What string does this match?)
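The negated list and the escaped dollar sign can be sketched in Python's `re` module (an assumption of this example; the article's own call uses PHP's ereg). The name `price` and the sample strings are invented. A Python raw string plays the role of the doubled backslash: the regex engine receives the same text either way.

```python
import re

# '^' as the first symbol inside the brackets negates the list.
print(re.search(r"%[^a-zA-Z]%", "%5%") is not None)  # True: '5' is not a letter
print(re.search(r"%[^a-zA-Z]%", "%a%") is not None)  # False: 'a' is a letter

# "\$" is a literal dollar sign; an unescaped '$' would mean "end of string".
price = re.compile(r"(\$|¥)[0-9]+")
print(price.search("cost: $25 total").group())  # $25
print(price.search("¥300").group())             # ¥300
print(price.search("25"))                       # None: no currency sign

# A doubled backslash in a normal string spells the same pattern as the raw string.
print("(\\$|¥)[0-9]+" == r"(\$|¥)[0-9]+")       # True
```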

Don't forget that inside a bracket expression all special characters, including the backslash, lose their special meaning (with the exception of '^' and '-'), so "[*+?{}.]" matches any one of the characters listed. As the regex man pages tell us, to include a literal ']' in the list, make it the first character, as in "[]abc]". Some other regex engines also accept a backslash in front of it, as in "[abc\]]".
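A short sketch of special characters inside brackets, assuming Python's `re` module. One difference worth labeling: unlike the POSIX rule described above, Python keeps the backslash as an escape character inside brackets, which is why both spellings of a literal ']' work there. The variable names are illustrative.

```python
import re

# Inside brackets these symbols are ordinary characters.
specials = re.findall(r"[*+?{}.]", "a*b+c?d{e}f.g")
print(specials)         # ['*', '+', '?', '{', '}', '.']

# A ']' placed first in the list is taken literally.
first_position = re.findall(r"[]abc]", "x]a")
print(first_position)   # [']', 'a']

# Python (not POSIX) also accepts a backslash in front of the ']'.
escaped = re.findall(r"[abc\]]", "x]a")
print(escaped)          # [']', 'a']
```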

Finally, for completeness, I should mention collating sequences, character classes, and equivalence classes. I won't go into their details, since they are not essential to the rest of this article; you can find more about them in the regex man pages.

Reposted from: http://se2k.51.net/myphp/

When reposting, please credit the original address: https://www.9cbs.com/read-119626.html
