Basic syntax of regular expressions

xiaoxiao 2021-03-06

Let us first look at the two special symbols '^' and '$'. They mark the beginning and the end of a string, respectively. For example:

"^The": matches any string that starts with "The" ("There", "The cat", etc.);

"of despair$": matches any string that ends with "of despair";

"^abc$": matches a string that both starts and ends with "abc", which can only be "abc" itself;

"notice": matches any string that contains "notice".

As the last example shows, if you use neither of the two special characters, the pattern may be found anywhere in the string; it is not anchored to either end.
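
The four patterns above can be tried out directly. The following is a minimal JavaScript sketch; the variable names and sample strings are made up for illustration and are not part of the original examples.

```javascript
// Hypothetical sample strings, chosen only to exercise the four patterns.
const startsWithThe = /^The/;
const endsWithDespair = /of despair$/;
const exactlyAbc = /^abc$/;
const containsNotice = /notice/;

console.log(startsWithThe.test("There it is"));        // true
console.log(startsWithThe.test("Over There"));         // false: "The" is not at the start
console.log(endsWithDespair.test("a cry of despair")); // true
console.log(exactlyAbc.test("abc"));                   // true
console.log(exactlyAbc.test("abcd"));                  // false
console.log(containsNotice.test("take notice now"));   // true: found in the middle
```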

There are also the three symbols '*', '+' and '?', which indicate that a character or a sequence of characters is repeated. They mean "zero or more", "one or more" and "zero or one", respectively. Here are a few examples:

"ab*": matches a string that has an a followed by zero or more b's ("a", "ab", "abbb", ...);

"ab+": matches a string that has an a followed by one or more b's;

"ab?": matches a string that has an a followed by zero or one b;

"a?b+$": matches a string that ends with zero or one a followed by one or more b's.

You can also use bounds, enclosed in braces, to specify a range of repetitions:

"ab{2}": matches a string that has an a followed by exactly two b's ("abb");

"ab{2,}": matches a string that has an a followed by at least two b's;

"ab{3,5}": matches a string that has an a followed by three to five b's.

Note that you must always specify the lower bound of the range ("{0,2}", not "{,2}"). You may also have noticed that '*', '+' and '?' are equivalent to "{0,}", "{1,}" and "{0,1}", respectively.
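
The repetition symbols and the brace bounds can be compared side by side. This is a small JavaScript sketch, not part of the original text; '^' and '$' are added to each pattern (an assumption made here) so that the whole string has to fit, which makes the counts visible.

```javascript
// Anchors are added so each pattern must cover the entire string.
const zeroOrMoreB = /^ab*$/;
const oneOrMoreB = /^ab+$/;
const zeroOrOneB = /^ab?$/;
const exactlyTwoB = /^ab{2}$/;
const atLeastTwoB = /^ab{2,}$/;
const threeToFiveB = /^ab{3,5}$/;

console.log(zeroOrMoreB.test("a"));                    // true: zero b's allowed
console.log(oneOrMoreB.test("a"));                     // false: needs at least one b
console.log(zeroOrOneB.test("abb"));                   // false: at most one b
console.log(exactlyTwoB.test("abb"));                  // true
console.log(atLeastTwoB.test("a" + "b".repeat(4)));    // true
console.log(threeToFiveB.test("a" + "b".repeat(6)));   // false: six b's is too many

// '*' behaves like "{0,}" on the same input.
console.log(/^ab{0,}$/.test("abbb") === zeroOrMoreB.test("abbb")); // true
```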

There is also '|', which expresses an "or" operation:

"hi|hello": matches a string that contains "hi" or "hello";

"(b|cd)ef": matches "bef" or "cdef";

"(a|b)*c": matches a sequence of mixed a's and b's followed by one "c".

'.' stands for any single character:

"a.[0-9]": matches a string that has an "a" followed by any one character and a digit;

"^.{3}$": matches a string of exactly three characters.

Square brackets list the characters that are allowed at a particular position in a string:

"[ab]": matches a string that has an "a" or a "b" (equivalent to "a|b");

"[a-d]": matches a string that contains one of the characters 'a' to 'd' (equivalent to "a|b|c|d" or "[abcd]");

"^[a-zA-Z]": matches a string that starts with a letter;

"[0-9]%": matches a digit followed by a percent sign;

",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a letter or a digit.

You can also use '^' inside square brackets to list the characters you do not want to appear; the '^' must be the first character inside the brackets. (For example, "%[^a-zA-Z]%" matches two percent signs with a single character between them that is not a letter.)
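
The alternation, dot and bracket patterns described above can be checked in the same way. This is a minimal JavaScript sketch with invented test strings; where '^' and '$' appear, they are the ones given in the text, except in the two alternation patterns with parentheses, where they were added here as an assumption to make the check exact.

```javascript
const greeting = /hi|hello/;
const befOrCdef = /^(b|cd)ef$/;          // anchors added for an exact check
const mixedThenC = /^(a|b)*c$/;          // anchors added for an exact check
const aAnyDigit = /a.[0-9]/;
const threeChars = /^.{3}$/;
const startsWithLetter = /^[a-zA-Z]/;
const nonLetterBetweenPercents = /%[^a-zA-Z]%/;

console.log(greeting.test("oh hello"));                // true
console.log(befOrCdef.test("cdef"));                   // true
console.log(befOrCdef.test("bcdef"));                  // false
console.log(mixedThenC.test("abbac"));                 // true
console.log(aAnyDigit.test("ax7"));                    // true
console.log(aAnyDigit.test("a7"));                     // false: no character between a and the digit
console.log(threeChars.test("abcd"));                  // false
console.log(startsWithLetter.test("1x"));              // false
console.log(nonLetterBetweenPercents.test("%5%"));     // true
console.log(nonLetterBetweenPercents.test("%a%"));     // false
```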

To match one of the characters "^.[$()|*+?{\" literally, you must put the escape character '\' in front of it.

Please note that inside square brackets most of these characters lose their special meaning, so they do not need to be escaped there.
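
The effect of escaping is easiest to see by comparing an escaped and an unescaped pattern on the same input. The sketch below is illustrative JavaScript; the strings and variable names are assumptions, not taken from the original.

```javascript
const literalPlus = /^1\+1=2$/;  // "\+" matches a literal plus sign
const repeatedOne = /^1+1=2$/;   // unescaped "+" means "one or more 1s"
const dotOrPlus = /^[.+]$/;      // inside brackets, "." and "+" are literal

console.log(literalPlus.test("1+1=2")); // true
console.log(repeatedOne.test("1+1=2")); // false
console.log(repeatedOne.test("111=2")); // true
console.log(dotOrPlus.test("+"));       // true
console.log(dotOrPlus.test("a"));       // false: "." is not a wildcard here
```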

If we ask Unix enthusiasts what they like most about the system, then besides its stability and the ability to boot it remotely, eight or nine out of ten will mention regular expressions; if we ask what gives them the biggest headache, then besides complex process control and installation procedures, the answer will again be regular expressions. So what is a regular expression? How can one really master regular expressions and use them properly? This article introduces the topic, in the hope of helping readers who are eager to understand and master regular expressions.

Getting started

Simply put, a regular expression is a powerful tool for pattern matching and replacement. We can find regular expressions in almost all UNIX-based tools, such as the vi editor, the Perl and PHP scripting languages, and the awk and sed programs. In addition, client-side scripting languages such as JavaScript also support regular expressions. Regular expressions have thus outgrown the limits of any single language or system and become a widely accepted concept and facility.

Regular expressions let users build a matching pattern from a series of special characters, compare that pattern with a target object such as a data file, program input, or a web form, and then run the appropriate code depending on whether the target object contains the pattern.

For example, one of the most common uses of regular expressions is to verify that an e-mail address entered in an online form has the correct format. If the address passes the regular expression check, the form is processed normally; if the address does not match the pattern, a prompt pops up asking the user to enter a correct e-mail address again. This shows that regular expressions play a pivotal role in the logic of web applications.

Basic syntax

Now that we have a preliminary understanding of what regular expressions do, let us look at their syntax.

A regular expression generally has the following form:

/love/

The part between the "/" delimiters is the pattern to be matched in the target object. The user simply places the pattern to search for between the "/" delimiters. To let users tailor patterns more flexibly, regular expressions provide special "metacharacters". A metacharacter is a character with a special meaning in a regular expression; it specifies how its preceding character (the character in front of the metacharacter) may occur in the target object. Commonly used metacharacters include "+", "*" and "?". The "+" metacharacter specifies that its preceding character must occur one or more times in a row in the target object, "*" specifies that its preceding character must occur zero or more times in a row, and "?" specifies that its preceding character must occur zero times or once.

Let's take a look at how these metacharacters are used.

/fo+/

Because the above regular expression contains the "+" metacharacter, it matches strings such as "fool", "fo" or "football" in the target object, in which the letter f is followed by one or more letter o's.

/eg*/

Because the above regular expression contains the "*" metacharacter, it matches strings such as "easy", "ego" or "egg" in the target object, in which the letter e is followed by zero or more letter g's.

/Wil?/

Because the above regular expression contains the "?" metacharacter, it matches strings such as "Win" or "Wilson" in the target object, in which the letter i is followed by zero or one letter l.

In addition to metacharacters, the user can specify exactly how many times a character may occur in the target object. For example,

/jim{2,6}/

The above regular expression specifies that the character m may occur 2 to 6 times in a row in the target object, so it matches strings such as "jimmy" or "jimmmmmy".
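
To see exactly which part of a word each of these patterns picks up, the matched text can be printed. The following JavaScript sketch assumes the patterns are /fo+/, /eg*/, /Wil?/ and /jim{2,6}/; the helper name firstMatch is hypothetical.

```javascript
// Hypothetical helper: returns the first matched substring, or null.
function firstMatch(pattern, text) {
  const found = text.match(pattern);
  return found === null ? null : found[0];
}

console.log(firstMatch(/fo+/, "football"));   // "foo"
console.log(firstMatch(/fo+/, "fo"));         // "fo"
console.log(firstMatch(/eg*/, "easy"));       // "e": zero g's is enough
console.log(firstMatch(/eg*/, "egg"));        // "egg"
console.log(firstMatch(/Wil?/, "Win"));       // "Wi"
console.log(firstMatch(/Wil?/, "Wilson"));    // "Wil"
console.log(firstMatch(/jim{2,6}/, "jimmy")); // "jimm"
console.log(firstMatch(/jim{2,6}/, "jim"));   // null: only one m
```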

Now that we have a preliminary understanding of how to use regular expressions, let's look at several other important metacharacters.

\s: matches a single whitespace character, including tab and newline;

\S: matches any single character that is not whitespace;

\d: matches a digit from 0 to 9;

\w: matches a letter, a digit or an underscore;

\W: matches any character that \w does not match;

. : matches any single character except a newline.

(Note: we can regard \s and \S, and likewise \w and \W, as inverses of each other.)

Below, we look at how to use the above metacharacters in regular expressions.

/\s+/

The above regular expression matches one or more whitespace characters in the target object.

/\d000/

If we have a complex financial statement in hand, we can use the above regular expression to find all the amounts in the thousands.
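
These character classes are often used for splitting and searching text. The sketch below is illustrative JavaScript with invented input strings, assuming the two patterns are /\s+/ and /\d000/. Note that /\d000/ has no anchors, so it also finds "5000" inside "45000".

```javascript
// Split on any run of whitespace (spaces, tabs, newlines).
const words = "one  two\tthree\nfour".split(/\s+/);
console.log(words);       // ["one", "two", "three", "four"]

// Find every digit followed by three zeros.
const thousands = "Rent 3000, car 45000, lunch 120".match(/\d000/g);
console.log(thousands);   // ["3000", "5000"]

// \w covers letters, digits and the underscore, but not the hyphen.
const identifiers = "foo_bar-baz".match(/\w+/g);
console.log(identifiers); // ["foo_bar", "baz"]
```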

In addition to the metacharacters described above, regular expressions have another special kind of character: the locator (more commonly called an anchor). A locator specifies where the pattern must appear in the target object.

Commonly used locators include "^", "$", "\b" and "\B". The "^" locator specifies that the pattern must appear at the beginning of the target string, and the "$" locator specifies that the pattern must appear at the end of the target string. The "\b" locator specifies that the match must occur at a word boundary, that is, at the start or the end of a word in the target string, while the "\B" locator specifies that the match must not occur at a word boundary, that is, it must lie inside a word rather than at its start or end. We can therefore regard "^" and "$", and likewise "\b" and "\B", as two pairs of locators with opposite meanings. For example:

/^hell/

Because the above regular expression contains the "^" locator, it matches strings in the target object that begin with "hell", "hello" or "hellhound".

/ar$/

Because the above regular expression contains the "$" locator, it matches strings in the target object that end with "car", "bar" or "ar".

/\bbom/

Because the above regular expression begins with the "\b" locator, it matches words in the target object that begin with "bomb" or "bom".

/man\b/

Because the above regular expression ends with the "\b" locator, it matches words in the target object that end with "human", "woman" or "man".
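
The difference between the word-boundary locators shows up when the same letters occur at the edge of a word and inside one. This is a minimal JavaScript sketch with made-up sentences, assuming the patterns /\bbom/ and /man\b/; the /\Bman/ pattern is added here for contrast.

```javascript
const startsWord = /\bbom/;  // "bom" at the start of a word
const endsWord = /man\b/;    // "man" at the end of a word
const insideWord = /\Bman/;  // "man" not at the start of a word

console.log(startsWord.test("a bomb went off")); // true
console.log(startsWord.test("abomination"));     // false: "bom" is inside the word
console.log(endsWord.test("a human being"));     // true
console.log(endsWord.test("many"));              // false: "man" is followed by "y"
console.log(insideWord.test("human"));           // true
console.log(insideWord.test("man"));             // false
```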

To let users set patterns more flexibly, regular expressions allow a range of characters to be specified in the pattern instead of one specific character. For example:

/[A-Z]/

The above regular expression matches any uppercase letter from A to Z.

/[a-z]/

The above regular expression matches any lowercase letter from a to z.

/[0-9]/

The above regular expression matches any digit from 0 to 9.

/([a-z][A-Z][0-9])+/

The above regular expression matches a string made up of a lowercase letter, an uppercase letter and a digit, in that order, such as "aB0". Note the use of "()" to group characters in the regular expression: everything inside the parentheses must appear together in the target object. Therefore, the above regular expression will not match a string such as "abc", because the last character in "abc" is a letter rather than a digit.
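
The grouping rule can be checked with a few inputs. The JavaScript sketch below assumes the grouped pattern is read as a lowercase letter, an uppercase letter and a digit, repeated one or more times; the '^' and '$' are an addition here so that the whole string has to consist of such groups.

```javascript
const lowerUpperDigit = /^([a-z][A-Z][0-9])+$/;

console.log(lowerUpperDigit.test("aB0"));    // true: one complete group
console.log(lowerUpperDigit.test("aB0cD1")); // true: two complete groups
console.log(lowerUpperDigit.test("abc"));    // false: no uppercase letter or digit
console.log(lowerUpperDigit.test("aB"));     // false: the group is incomplete
console.log(/[A-Z]/.test("abc"));            // false: no uppercase letter anywhere
```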

If we want an "or" operation in a regular expression, that is, a choice among several different patterns, we can use the pipe character "|". For example:

/to|too|2/

The above regular expression matches "to", "too" or "2" in the target object.

There is one more commonly used operator in regular expressions: the negation operator "[^]". Unlike the locator "^" described earlier, the negation operator "[^]" specifies that the characters listed in the pattern must not appear at that position in the target object. For example:

/[^A-C]/

The above regular expression matches any character other than A, B and C in the target object. In general, when "^" appears as the first character inside "[]", it is treated as the negation operator; when "^" is outside "[]", or there is no "[]" at all, it is treated as a locator.

Finally, when the user needs to include a metacharacter in the pattern and search for it literally, the escape character "\" can be used. For example:

/Th\*/

The above regular expression matches "Th*" in the target object, not "The" or the like.
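
Negation and escaping can both be verified with short checks. This is an illustrative JavaScript sketch assuming the patterns /[^A-C]/ and /Th\*/; the input strings are invented.

```javascript
const notAtoC = /[^A-C]/;   // any character other than A, B or C
const literalStar = /Th\*/; // "Th" followed by a literal asterisk

console.log(notAtoC.test("ABC"));                  // false: only A, B and C present
console.log(notAtoC.test("ABCD"));                 // true: "D" qualifies
console.log("CABBAGE".replace(/[^A-C]/g, ""));     // "CABBA"
console.log(literalStar.test("Th*"));              // true
console.log(literalStar.test("The"));              // false
console.log(/Th*/.test("The"));                    // true: unescaped "*" repeats the h
```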

Usage examples

Now that we have a more complete understanding of regular expressions, let's look at how to use them in Perl, PHP and JavaScript.

Typically, a regular expression in Perl is used in the following form:

operator /regular-expression/string-to-replace/modifiers

The operator can be m or s, for a match operation and a substitution operation, respectively.

The regular expression is the pattern to be matched or replaced; it can consist of any characters, metacharacters or locators. The replacement string is the string that replaces the matched text when the s operator is used. The final modifiers control how the match or substitution is carried out. For example:

s/geed/good/

This finds the first occurrence of the string geed in the target object and replaces it with good. If we want to perform the search-and-replace over the whole target object, we can use the modifier "g", as in s/love/lust/g.

In addition, if the match does not need to be case-sensitive, we can use the modifier "i". For example, m/JewEL/i

The above regular expression matches jewel, Jewel or JEWEL in the target object.

In Perl, the special operator "=~" specifies the target object of a regular expression. For example:

$flag =~ s/abc/ABC/

The above regular expression replaces the string abc in the variable $flag with ABC.
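
For comparison, JavaScript supports the same "g" and "i" modifiers, written as flags after the closing "/". The sketch below is JavaScript rather than Perl, and is only meant to illustrate first-match versus global replacement and case-insensitive matching; the input strings are made up.

```javascript
// Without "g", only the first occurrence is replaced.
const firstOnly = "geed geed".replace(/geed/, "good");
console.log(firstOnly);              // "good geed"

// With "g", every occurrence is replaced.
const everywhere = "love love".replace(/love/g, "lust");
console.log(everywhere);             // "lust lust"

// With "i", letter case is ignored.
console.log(/jewel/i.test("JEWEL")); // true
console.log(/jewel/.test("JEWEL"));  // false
```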

Below, we use a regular expression in a Perl program to verify the format of the user's e-mail address. The code is as follows:

#!/usr/bin/perl

# Get input
print "What's your email address?\n";
$email = <STDIN>;
chomp($email);

# Match and display result
if ($email =~ /^([a-zA-Z0-9_-])+\@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/)
{
    print("Your email address is correct!\n");
}
else
{
    print("Please try again!\n");
}

If the user prefers PHP, the ereg() function can be used for pattern matching. (Note that ereg() was deprecated in PHP 5.3 and removed in PHP 7; preg_match() is its replacement.) The ereg() function is used as follows:

ereg(pattern, string)

Here, pattern is the regular expression pattern, and string is the target object to be searched. The PHP code for verifying an e-mail address is likewise as follows:

if (ereg("([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+", $email))
{ echo "Your email address is correct!"; }
else
{ echo "Please try again!"; }

Finally, let's take a look at JavaScript. JavaScript 1.2 introduced a powerful RegExp object that can be used to perform regular expression matching. Its test() method checks whether the target object contains the pattern and returns true or false accordingly.

We can write the following script in JavaScript to verify the validity of the e-mail address entered by the user.
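
A minimal sketch of such a check is given here, assuming the same pattern as in the Perl example; the function name verifyAddress and the sample addresses are hypothetical. Like the Perl version, the pattern only checks the general shape of the address and has no "$" at the end, so trailing characters are not rejected.

```javascript
// Hypothetical helper; the pattern mirrors the one used in the Perl example.
function verifyAddress(address) {
  const pattern = /^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/;
  return pattern.test(address);
}

console.log(verifyAddress("someone@example.com")); // true
console.log(verifyAddress("not-an-address"));      // false: no "@" present
console.log(verifyAddress("someone@example"));     // false: no dot after the domain name
```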