Regular expression basics (repost)

xiaoxiao, 2021-03-06

Let's start with a simple example. Suppose you want to search for strings that contain the characters "cat"; the regular expression for the search is simply "cat". If the search is case-insensitive, the words "Catalog", "Catherine", and "Sophisticated" all match.
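As a minimal sketch, the search can be tried with Python's `re` module. The choice of Python and the word list are assumptions made for illustration; the article does not prescribe a tool at this point.

```python
import re

# Case-insensitive search for the literal characters "cat".
pattern = re.compile(r"cat", re.IGNORECASE)

# Hypothetical word list; "dog" is included as a non-matching control.
words = ["Catalog", "Catherine", "Sophisticated", "dog"]
matches = [w for w in words if pattern.search(w)]
print(matches)  # ['Catalog', 'Catherine', 'Sophisticated']
```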

1.1 period symbol

Suppose you are playing an English word game and want to find three-letter words that begin with the letter "t" and end with the letter "n". Suppose also that you have an English dictionary whose entire content you can search with regular expressions. To construct this regular expression, you can use a wildcard, the period symbol ".". The complete expression is then "t.n", which matches "tan", "ten", "tin", and "ton", but also "t#n", "tpn", and even "t n", along with many other meaningless combinations. This is because the period matches any character, including spaces and tab characters and, in engines or modes where the period is allowed to match line terminators, even line breaks.
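The behaviour of the period can be checked with a short sketch in Python's `re` module (an assumption; any engine would do), using the pattern written as `t.n`. The candidate list is made up for illustration. Note that line-break handling is engine-specific: in Python the period skips "\n" unless `re.DOTALL` is set.

```python
import re

# "t.n": a "t", then any single character, then an "n".
pattern = re.compile(r"t.n")

candidates = ["tan", "ten", "tin", "ton", "t#n", "tpn", "t n", "tn", "toon"]
matched = [s for s in candidates if pattern.fullmatch(s)]
print(matched)  # ['tan', 'ten', 'tin', 'ton', 't#n', 'tpn', 't n']

# In Python, "." does not match a line break by default...
default_newline = pattern.fullmatch("t\nn") is not None
# ...but it does once the DOTALL flag is enabled.
dotall_newline = re.fullmatch(r"t.n", "t\nn", flags=re.DOTALL) is not None
print(default_newline, dotall_newline)  # False True
```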

1.2 square bracket symbol

To solve the problem of the period matching too broadly, you can specify the meaningful characters inside square brackets ("[]"). Only the characters listed in the brackets then take part in the match. That is, the regular expression "t[aeio]n" matches only "tan", "ten", "tin", and "ton". But "toon" does not match, because a square-bracket expression matches exactly one character.
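A small sketch of the bracket expression, again assuming Python's `re` module and a made-up candidate list:

```python
import re

# "[aeio]" matches exactly one character out of a, e, i, o.
pattern = re.compile(r"t[aeio]n")

candidates = ["tan", "ten", "tin", "ton", "toon", "t#n"]
matched = [s for s in candidates if pattern.fullmatch(s)]
print(matched)  # ['tan', 'ten', 'tin', 'ton']
```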

1.3 "or" symbol

If, in addition to all the words matched above, you also want to match "toon", you can use the "|" operator. The basic meaning of "|" is "or". To match "toon", use the regular expression "t(a|e|i|o|oo)n". Square brackets cannot be used here because they only match a single character; parentheses "()" must be used instead. Parentheses can also be used for grouping, which is described later.
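The alternation can be sketched the same way (Python's `re` module and the candidate list are assumptions for illustration):

```python
import re

# The parentheses group the alternatives; "|" separates them.
pattern = re.compile(r"t(a|e|i|o|oo)n")

candidates = ["tan", "ten", "tin", "ton", "toon", "tun"]
matched = [s for s in candidates if pattern.fullmatch(s)]
print(matched)  # ['tan', 'ten', 'tin', 'ton', 'toon']
```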

1.4 Symbols that indicate the number of matches

Table 1 shows the symbols that indicate the number of matches. Each one determines how many times the item immediately to its left may occur:
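Table 1 is not reproduced in this copy. As a hedged sketch, the following shows the quantifiers as Python's `re` module implements them; whether this list coincides exactly with the original table is an assumption. The helper `full` is a hypothetical name introduced only for this example.

```python
import re

def full(pattern, text):
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# "*": zero or more of the item on its left
print(full(r"ab*", "a"), full(r"ab*", "abbb"))              # True True
# "+": one or more
print(full(r"ab+", "a"), full(r"ab+", "abbb"))              # False True
# "?": zero or one
print(full(r"ab?", "ab"), full(r"ab?", "abb"))              # True False
# "{n}": exactly n times
print(full(r"ab{2}", "abb"), full(r"ab{2}", "ab"))          # True False
# "{n,m}": between n and m times
print(full(r"ab{2,3}", "abbb"), full(r"ab{2,3}", "abbbb"))  # True False
```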

Suppose we need to search a text file for US social security numbers. The format of this number is 999-99-9999. The regular expression used to match it is shown in Figure 1. Inside square brackets the hyphen ("-") has a special meaning: it denotes a range, such as 0 to 9. For that reason, the hyphens in the social security number are preceded by the escape character "\" so that they are treated literally. (Outside square brackets most engines already treat the hyphen as a literal, so the escape is a precaution rather than a requirement.)

Figure 1: Matches all social security numbers of the form 123-12-1234

Suppose that, when searching, you want the hyphens to be optional, i.e., both 999-99-9999 and 999999999 count as valid formats. In that case, add the "?" quantifier after each hyphen, as shown in Figure 2:

Figure 2: Matches all social security numbers of the form 123-12-1234 and 123121234
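The figures are not reproduced in this copy, so the two patterns below are a reconstruction from the description, not the original images. They are written for Python's `re` module (an assumption); the names `strict` and `loose` are hypothetical.

```python
import re

# Hyphens required; "\-" is an escaped, literal hyphen.
strict = re.compile(r"[0-9]{3}\-[0-9]{2}\-[0-9]{4}")
# "?" after each hyphen makes that hyphen optional.
loose = re.compile(r"[0-9]{3}\-?[0-9]{2}\-?[0-9]{4}")

print(strict.fullmatch("123-12-1234") is not None)  # True
print(strict.fullmatch("123121234") is not None)    # False
print(loose.fullmatch("123-12-1234") is not None)   # True
print(loose.fullmatch("123121234") is not None)     # True
```

Because each hyphen is optional on its own, the second pattern also accepts mixed forms such as 123-121234.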

Let's look at another example. One format for US car license plates is four digits followed by two letters. Its regular expression begins with the digit part "[0-9]{4}", followed by the letter part "[a-z]{2}" (lowercase as written; uppercase letters match only if the search is case-insensitive). Figure 3 shows the complete regular expression.

Figure 3: Matches typical US car license plate numbers, such as 8836KV
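A sketch of the license plate pattern, joining the two parts quoted in the text. Python's `re` module and the sample strings are assumptions; the figure itself is not reproduced here.

```python
import re

# Four digits followed by two letters.
plate = re.compile(r"[0-9]{4}[a-z]{2}")
# Same pattern, but ignoring letter case.
plate_any_case = re.compile(r"[0-9]{4}[a-z]{2}", re.IGNORECASE)

print(plate.fullmatch("8836kv") is not None)           # True
print(plate.fullmatch("8836KV") is not None)           # False
print(plate_any_case.fullmatch("8836KV") is not None)  # True
print(plate.fullmatch("883kv") is not None)            # False
```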

1.5 "not" symbol

The "^" symbol is called the "not" symbol. When used inside square brackets, "^" indicates the characters you do not want to match. For example, the regular expression in Figure 4 matches all words except those that begin with the letter "X".

Figure 4: Matches all words except those that begin with "X"
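Since the figure is missing, the pattern below is only one possible way to express the idea, assuming Python's `re` module and treating both "x" and "X" as excluded. The word list is made up for illustration.

```python
import re

# "[^xX]": any single character except "x" or "X",
# followed by zero or more letters.
pattern = re.compile(r"[^xX][a-zA-Z]*")

words = ["apple", "xylophone", "Xerox", "box", "taxi"]
kept = [w for w in words if pattern.fullmatch(w)]
print(kept)  # ['apple', 'box', 'taxi']
```

Only the first character is restricted, so words that contain "x" later on still match.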

1.6 parentheses and whitespace symbols

Suppose you want to extract the month from a birthday written as "June 26, 1951". A regular expression that matches dates of this kind is shown in Figure 5:

Figure 5: Matches all dates in Month DD, YYYY format

The "\s" symbol is the whitespace symbol; it matches any whitespace character, including tab characters. If the string matches, how do you get the month out of it? Simply add a pair of parentheses around the month, then extract its value with the ORO API (described in detail later). The modified regular expression is shown in Figure 6:

Figure 6: Matches all dates in Month DD, YYYY format and defines the month value as the first group
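The grouping idea can be sketched as follows. This is a reconstruction under assumptions: the pattern is inferred from the description (the figures are missing), and Python's `re` module with `Match.group` stands in for the ORO API that the article refers to.

```python
import re

# Group 1 (the parentheses) captures the month name.
# "\s+" matches one or more whitespace characters, tabs included.
date = re.compile(r"([A-Za-z]+)\s+[0-9]{1,2},\s*[0-9]{4}")

m = date.fullmatch("June 26, 1951")
month = m.group(1)
print(month)  # June

# A tab between month and day is matched by "\s" as well.
tabbed = date.fullmatch("June\t26, 1951")
print(tabbed.group(1))  # June
```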

1.7 Other symbols

For simplicity, you can use the shortcut symbols that have been defined for common regular expressions, as shown in Table 2:

Table 2: Commonly used symbols

For example, in the earlier social security number example, we can use "\d" everywhere "[0-9]" appears. The modified regular expression is shown in Figure 7:

Figure 7: Matches all social security numbers of the form 123-12-1234
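A sketch comparing the two spellings, assuming Python's `re` module (the figure is not reproduced, so the patterns are a reconstruction). One engine-specific caveat: on Python 3 text strings "\d" also matches non-ASCII decimal digits unless `re.ASCII` is set, so it is not always an exact synonym for "[0-9]".

```python
import re

with_class = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")
with_shortcut = re.compile(r"\d{3}-\d{2}-\d{4}")

for text in ["123-12-1234", "123-12-123a"]:
    # Both patterns agree on these ASCII inputs.
    print(text,
          with_class.fullmatch(text) is not None,
          with_shortcut.fullmatch(text) is not None)

# "\u0663" is ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit.
unicode_digit = re.fullmatch(r"\d", "\u0663") is not None
ascii_only = re.fullmatch(r"\d", "\u0663", flags=re.ASCII) is not None
print(unicode_digit, ascii_only)  # True False
```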

When reposting, please credit the original address: https://www.9cbs.com/read-56590.html
