Using Perl-Compatible Regular Expressions in PHP

xiaoxiao  2021-03-06

Source: PHP Power Online

1 Introduction

PHP is widely used for server-side (CGI) web development, which usually means processing data submitted by users. If a user enters invalid data, problems follow; for example, someone might enter a birthday of "February 30"! How can we check whether such data is valid? PHP's support for regular expressions lets us match data against patterns.

2 What is a regular expression

Simply put, a regular expression is a powerful tool for pattern matching and replacement. Regular expressions show up in almost every UNIX/Linux system and in scripting languages such as Perl and PHP. JavaScript also supports them. Regular expressions have become a general concept and tool used by all kinds of engineers.

A Linux website puts it this way: "Ask a Linux enthusiast what they love most, and they may answer regular expressions; ask what they fear most, and they will certainly answer regular expressions as well."

As this suggests, regular expressions look complicated and intimidating, so most PHP beginners skip them and move on. But regular expressions in PHP are powerful: they can find strings that match a pattern, check whether a string meets given conditions, and replace matching strings with a specified string. It would be a pity to skip them.

3 Basic syntax of regular expressions

A regular expression consists of three parts: the delimiter, the expression, and the modifiers.

The delimiter can be any character that is not a letter, digit, backslash, or whitespace (for example "/" or "!"); the most common delimiter is "/". The expression is made up of special characters (described below) and ordinary characters; for example, "[a-z0-9_-]+@[a-z0-9_.-]+" can match a simple email address. Modifiers switch certain features or modes on or off. Here is an example of a complete regular expression:

/hello.+?hello/is

In the regular expression above, "/" is the delimiter, the part between the two "/" characters is the expression, and the string "is" after the second "/" consists of two modifiers, "i" and "s".

If the expression itself contains the delimiter, the delimiter must be escaped with the backslash "\", as in "/hello.+?\/hello/is". Besides escaping the delimiter, the backslash also gives special meanings to certain letters; for example, "\d" stands for any digit.
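A minimal PHP sketch of the two usual ways to handle a "/" inside a pattern: escape it, or choose another delimiter such as "#". It also uses the built-in preg_quote(), which escapes regex metacharacters (plus the delimiter you pass) when a pattern is built from literal text. The subject strings are made up for illustration.

```php
<?php
$subject = "Hello world /hello again";

// 1) Escape the delimiter "/" with a backslash inside the pattern.
$escaped = '/hello.+?\/hello/is';
// 2) Use "#" as the delimiter, so "/" needs no escaping.
$altDelim = '#hello.+?/hello#is';

var_dump(preg_match($escaped, $subject));   // int(1)
var_dump(preg_match($altDelim, $subject));  // int(1)

// preg_quote() escapes regex metacharacters plus the delimiter passed to it.
$literal = 'a.b/c';
$pattern = '/' . preg_quote($literal, '/') . '/';
echo $pattern, "\n";                        // /a\.b\/c/
var_dump(preg_match($pattern, 'xa.b/cx'));  // int(1)
var_dump(preg_match($pattern, 'xaXb/cx'));  // int(0): "." is now a literal dot
```

Choosing a different delimiter is often more readable than escaping when a pattern contains many slashes, such as URLs or paths.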

4 Special characters in regular expressions

Special characters in regular expressions fall into metacharacters, anchors, and others.

Metacharacters are special characters that describe how many times the preceding character (the character in front of the metacharacter) appears in the matched text. Each metacharacter is a single character, but metacharacters can be combined into larger constructs.

Metacharacters:

Braces: braces specify exactly how many times the preceding character may repeat. For example, "/pre{1,5}/" matches "pre", "pree", up to "preeeee": "pr" followed by 1 to 5 "e" characters. "/pre{0,5}/" allows 0 to 5 repetitions. (Many PCRE versions treat the short form "{,5}" as literal text rather than a quantifier, so write the 0 explicitly.)

Plus: "+" matches the preceding character one or more times. For example, "/ac+/" matches strings such as "act", "account", and "acccc", where "a" is followed by one or more "c". "+" is equivalent to "{1,}".

Asterisk: "*" matches the preceding character zero or more times. For example, "/ac*/" matches strings such as "app", "acp", and "accp", where "a" is followed by zero or more "c". "*" is equivalent to "{0,}".

Question mark: "?" matches the preceding character zero or one time. For example, "/ac?/" matches "a", "acp", and "acwp", where "a" is followed by zero or one "c". "?" also plays another very important role in regular expressions: it controls greedy mode.
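A quick way to see these quantifiers in action is to run each pattern against a few strings. This sketch adds the "^" and "$" anchors (described later under anchors) so that the whole string must match; the unanchored examples above would also match substrings. The test strings are illustrative.

```php
<?php
// Each pattern is anchored with ^...$ so the WHOLE string must match.
$tests = [
    '/^pre{1,5}$/' => ['pr', 'pre', 'preeeee', 'preeeeee'],
    '/^ac+$/'      => ['a', 'ac', 'acccc'],
    '/^ac*$/'      => ['a', 'ac', 'acccc'],
    '/^ac?$/'      => ['a', 'ac', 'acc'],
];

$results = [];
foreach ($tests as $pattern => $subjects) {
    foreach ($subjects as $s) {
        // preg_match() returns 1 on a match, 0 otherwise
        $results[$pattern][$s] = preg_match($pattern, $s) === 1;
        printf("%-14s %-10s %s\n", $pattern, $s, $results[$pattern][$s] ? 'match' : 'no match');
    }
}
```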

Square brackets "[]" are another very important construct. They match any single character listed inside them: "/[az]/" matches the single character "a" or "z", while "/[a-z]/" matches any single lowercase letter, such as "a" or "b".

If "^" appears as the first character inside "[]", the class matches any character that is NOT listed; for example, "/[^a-z]/" matches any character except a lowercase letter. Regular expressions also provide several predefined (POSIX) classes, which must themselves be written inside brackets, as in "[[:alpha:]]":

[:alpha:]: any letter

[:alnum:]: any letter or digit

[:digit:]: any digit

[:space:]: any whitespace character

[:upper:]: any uppercase letter

[:lower:]: any lowercase letter

[:punct:]: any punctuation character

[:xdigit:]: any hexadecimal digit
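The following sketch applies a plain class, a negated class, and two POSIX classes to one made-up subject string with preg_match_all(), which collects every match. Note the double brackets around the POSIX classes.

```php
<?php
$subject = 'Room 42b!';

preg_match_all('/[a-z]/', $subject, $lower);         // any lowercase letter
preg_match_all('/[^a-z]/', $subject, $notLower);     // anything except a lowercase letter
preg_match_all('/[[:digit:]]/', $subject, $digits);  // POSIX class, written inside [ ]
preg_match_all('/[[:punct:]]/', $subject, $punct);

echo implode(',', $lower[0]), "\n";     // o,o,m,b
echo count($notLower[0]), "\n";         // 5 ("R", " ", "4", "2", "!")
echo implode(',', $digits[0]), "\n";    // 4,2
echo implode(',', $punct[0]), "\n";     // !
```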

In addition, the following letters have special meanings when preceded by the escape character "\":

\s: matches a single whitespace character (space, tab, newline, and so on)

\S: matches any character that is not whitespace

\d: matches a digit from 0 to 9, equivalent to "/[0-9]/"

\w: matches a letter, digit, or underscore, equivalent to "/[a-zA-Z0-9_]/"

\W: matches any character that \w does not match, equivalent to "/[^a-zA-Z0-9_]/"

\D: matches any character that is not a decimal digit

.: (the dot, written without a backslash) matches any character except the newline; with the modifier "s", "." matches any character

These special characters make it easy to express otherwise cumbersome patterns. For example, "/\d+0000/" matches one or more digits followed by "0000", such as "10000" or "230000".
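A short sketch of the escape sequences on an invented subject string; the tab is written as "\t" inside a double-quoted PHP string, so it becomes a real tab character.

```php
<?php
// "\t" inside double quotes is a real tab character.
$subject = "id_7 costs 230000 yuan\tnow";

preg_match_all('/\d+/', $subject, $nums);   // runs of digits
preg_match_all('/\s/', $subject, $spaces);  // each whitespace character
preg_match_all('/\w+/', $subject, $words);  // runs of letters, digits, "_"

print_r($nums[0]);              // 7, 230000
echo count($spaces[0]), "\n";   // 4 (three spaces and one tab)
print_r($words[0]);             // id_7, costs, 230000, yuan, now

// One or more digits followed by "0000"
var_dump(preg_match('/\d+0000/', $subject));  // int(1), via "230000"
var_dump(preg_match('/\d+0000/', '9999'));    // int(0)
```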

Anchors:

Anchors are another very important kind of special character; their main role is to describe where in the subject a match must occur.

^: the pattern must appear at the beginning of the subject (inside "[]" the same character means negation instead)

$: the pattern must appear at the end of the subject

\b: a word boundary, the position between a word character (\w) and a non-word character or the start or end of the string. A plain space in a pattern is an ordinary literal character, not an anchor.

"/^he/": matches strings that begin with "he", such as "hello" and "height";

"/he$/": matches strings that end with "he", such as "she";

"/ he/": the space is literal, so this matches "he" preceded by a space (as in "the hero"); to match "he" at the start of a word, use "/\bhe/";

"/he /": likewise matches "he" followed by a space; to match "he" at the end of a word, use "/he\b/";

"/^he$/": matches only the string "he".

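To compare the anchors side by side, this sketch records which of four patterns match each sample word (1 = match, 0 = no match). The word list is illustrative.

```php
<?php
// Columns follow $patterns: ^he  he$  ^he$  \bhe
$patterns = ['/^he/', '/he$/', '/^he$/', '/\bhe/'];

$r = [];
foreach (['hello', 'height', 'she', 'he', 'the hero'] as $w) {
    foreach ($patterns as $p) {
        $r[$w][] = preg_match($p, $w);  // 1 = match, 0 = no match
    }
    echo str_pad($w, 10), implode(' ', $r[$w]), "\n";
}
// hello     1 0 0 1
// height    1 0 0 1
// she       0 1 0 0
// he        1 1 1 1
// the hero  0 0 0 1

// A literal space is not an anchor: "hello" has no space before "he".
var_dump(preg_match('/ he/', 'hello'));  // int(0)
```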
Parentheses:

Besides matching, a regular expression can use parentheses "()" to capture (record) parts of the match so they can be read later. For example:

/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/

This records the user name and the server address of an email address of the form username@server.com. To read a captured string later, write "\" followed by the capture's sequence number: "\1" refers to the text matched by the first ([a-zA-Z0-9_-]+), "\2" to the second ([a-zA-Z0-9_-]+), and "\3" to the third (\.[a-zA-Z0-9_-]+). In PHP, however, "\" is also the escape character in double-quoted strings, so inside a double-quoted PHP string "\1" must be written as "\\1".
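A minimal sketch of reading captures from PHP: preg_match() fills an array with the groups, and a back-reference such as "\1" can also be used inside the pattern itself (here written as "\\1" because the pattern is a double-quoted string). The sample strings are invented.

```php
<?php
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';

if (preg_match($pattern, 'username@server.com', $m)) {
    echo $m[1], "\n";  // username
    echo $m[2], "\n";  // server
    echo $m[3], "\n";  // .com
}

// Back-reference inside a pattern: \1 must repeat whatever group 1 matched.
// In a double-quoted PHP string the backslash itself is escaped: "\\1".
$doubled = preg_match("/(\\w+) \\1/", 'hello hello world', $d);
echo $d[0], "\n";      // hello hello
```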

Other special symbols:

"|": the alternation symbol "|" works like PHP's "or", but it is a single "|", not PHP's "||". It matches either one string or another: "/abcd|dcba/" matches "abcd" or "dcba".

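A small sketch of alternation. It also shows one assumption worth knowing: "|" has the lowest precedence, so anchors bind to only one side unless the choices are grouped; "(?:...)" is a non-capturing group.

```php
<?php
preg_match_all('/abcd|dcba/', 'xxabcdyydcbazz', $m);
print_r($m[0]);  // abcd, dcba

// /^ab|cd$/ means (^ab) OR (cd$); group the choices to anchor both.
var_dump(preg_match('/^ab|cd$/', 'abzz'));      // int(1)
var_dump(preg_match('/^(?:ab|cd)$/', 'abzz'));  // int(0)
```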
5 Greedy mode

The metacharacter "?" described above has another important role: controlling greedy mode. What is greedy mode?

Suppose we want to match a string that starts with the letter "a" and ends with the letter "b", but the subject contains many "b" characters after the "a", such as "a bbbbbbbbbbbbbbbbb". Will the regular expression stop at the first "b" or at the last one? In greedy mode it matches up to the last "b"; in non-greedy (lazy) mode it stops at the first "b".

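Expressions that turn greedy mode off (lazy, stopping at the first "b"):

/a.+?b/

/a.+b/U

Expression that uses greedy mode (the default, matching up to the last "b"):

/a.+b/

The second lazy expression uses the modifier "U", described in the next section.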
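Running the three expressions from this section against a short version of the sample string makes the difference visible; this is a minimal sketch with an invented subject.

```php
<?php
$subject = 'a bbbbb';

preg_match('/a.+b/', $subject, $greedy);   // greedy: runs to the last "b"
preg_match('/a.+?b/', $subject, $lazy);    // lazy: stops at the first "b"
preg_match('/a.+b/U', $subject, $lazyU);   // U modifier: lazy by default

echo $greedy[0], "\n";  // a bbbbb
echo $lazy[0], "\n";    // a b
echo $lazyU[0], "\n";   // a b
```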

6 Modifiers

Modifiers change many aspects of how a regular expression behaves so that it better fits your needs. (Note: modifiers are case-sensitive, so "e" is not the same as "E".) The modifiers are:

i: Makes matching case-insensitive, so "A" and "a" are treated as the same.

m: By default, "^" and "$" match only at the start and end of the whole string. With "m", they also match at the start and end of every line: "^" at the beginning of each line and "$" at the end of each line.

s: With "s", "." matches any character including the newline, instead of any character except the newline.

x: With this modifier, whitespace in the expression is ignored unless it is escaped or inside a character class; text from an unescaped "#" to the end of the line is also ignored, so it can be used for comments.

e: Applies only to the replacement string of preg_replace(), which is then evaluated as PHP code. This modifier was deprecated in PHP 5.5 and removed in PHP 7.0; use preg_replace_callback() instead.

A: The match must start at the beginning of the subject (the pattern is anchored). For example, "/a/A" matches "abcd".

D: "$" matches only at the very end of the subject, not before a trailing newline. This modifier is off by default and is ignored when "m" is set.

U: Inverts greediness: quantifiers are lazy by default, as if each were followed by "?" (and a "?" after a quantifier then makes it greedy).
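A minimal sketch contrasting several modifiers on one invented three-line string. Each pair shows the same pattern without and with the modifier.

```php
<?php
$text = "Apple pie\nbanana split\ncherry tart";

// i: case-insensitive
var_dump(preg_match('/^apple/', $text));    // int(0)
var_dump(preg_match('/^apple/i', $text));   // int(1)

// m: ^ matches at the start of every line, not just the string
preg_match_all('/^\w+/m', $text, $firstWords);
print_r($firstWords[0]);                    // Apple, banana, cherry

// s: "." may also match the newline
var_dump(preg_match('/pie.banana/', $text));   // int(0)
var_dump(preg_match('/pie.banana/s', $text));  // int(1)

// x: literal spaces in the pattern are ignored
var_dump(preg_match('/cherry \s+ tart/', $text));   // int(0)
var_dump(preg_match('/cherry \s+ tart/x', $text));  // int(1)

// D: "$" no longer matches before a trailing newline
var_dump(preg_match('/tart$/', "tart\n"));   // int(1)
var_dump(preg_match('/tart$/D', "tart\n"));  // int(0)
```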

7 PCRE regular expression functions

PHP's Perl-compatible regular expressions provide several functions, covering pattern matching, replacement, splitting, and more:

1. preg_match:

Format: int preg_match(string pattern, string subject [, array matches]);

This function searches subject for a match of the expression pattern. If matches is given, matches[0] holds the text that matched the whole pattern, matches[1] holds the text captured by the first "()", matches[2] holds the second, and so on. preg_match() returns 1 if pattern matches subject, 0 if it does not, and false if an error occurs.
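Tying back to the "February 30" problem from the introduction, this sketch uses preg_match() to check and split a date in an assumed "YYYY-MM-DD" format, then hands the captured parts to PHP's checkdate(month, day, year). A regular expression checks only the shape of the input; checkdate() checks the calendar.

```php
<?php
$date = '2004-02-30';
$ok = preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', $date, $matches);

var_dump($ok);      // int(1): the FORMAT is fine
print_r($matches);  // [0] => 2004-02-30, [1] => 2004, [2] => 02, [3] => 30

// A regex only checks the shape; checkdate(month, day, year) checks the calendar.
$valid = checkdate((int) $matches[2], (int) $matches[3], (int) $matches[1]);
var_dump($valid);   // bool(false): February has no 30th day
```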

2. preg_replace:

Format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject);

This function replaces every part of subject that matches pattern with replacement. If the replacement needs to reuse part of the matched text, capture that part with "()" in pattern and refer to it as "\1" (or "$1") in replacement.
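A minimal sketch of preg_replace() with captured groups, in both the "\1" and "$1" spellings, followed by preg_replace_callback(), the current replacement for the removed "e" modifier. The sample strings are invented.

```php
<?php
// Reorder "YYYY-MM-DD" into "DD/MM/YYYY" using captured groups \1..\3.
$converted = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '\3/\2/\1', 'Due 2004-03-06.');
echo $converted, "\n";  // Due 06/03/2004.

// "$1" is an equivalent spelling; single quotes stop PHP from treating it as a variable.
$swapped = preg_replace('/(\w+)@(\w+)/', '$2 at $1', 'user@host');
echo $swapped, "\n";    // host at user

// The "e" modifier is gone in PHP 7+; preg_replace_callback() passes each match to a function.
$doubled = preg_replace_callback('/\d+/', function (array $m) {
    return (string) ($m[0] * 2);
}, 'a1 b20');
echo $doubled, "\n";    // a2 b40
```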

3. preg_split:

Format: array preg_split(string pattern, string subject [, int limit]);

This function works like split(); the difference is that split() supports only simple (POSIX) regular expressions, while preg_split() uses full Perl-compatible regular expressions. The optional third parameter, limit, caps the number of substrings returned. (split() was removed in PHP 7.0.)

4. preg_grep:

Format: array preg_grep(string pattern, array input);

This function is similar to preg_match(), but preg_grep() tests every element of the array input and returns a new array containing the elements that match (with their original keys).
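A short sketch of preg_split(), with and without limit, and of preg_grep(). The inputs are invented; note that preg_grep() keeps the original array keys.

```php
<?php
// Split on commas or semicolons with optional surrounding whitespace.
$parts = preg_split('/\s*[,;]\s*/', 'red, green;blue ,  white');
print_r($parts);     // red, green, blue, white

// limit = 2: at most two pieces; the rest of the string stays unsplit.
$firstTwo = preg_split('/\s*[,;]\s*/', 'red, green;blue', 2);
print_r($firstTwo);  // red, green;blue

// preg_grep() keeps only the matching elements (keys are preserved).
$codes = preg_grep('/^\d{3}$/', ['200', 'OK', '404', '5000']);
print_r($codes);     // [0] => 200, [2] => 404
```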

Let's look at an example: checking whether an email address is in the correct format:

<?php
// Returns 1 if $email looks like a valid address, 0 otherwise.
function Emailism($email) {
    // preg_match() requires delimiters ("/") around the pattern.
    if (preg_match('/^[_\.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/', $email)) {
        return 1;
    }
    return 0;
}

if (Emailism('y10k@963.net')) echo "correct\n";
if (!Emailism('y10k@ffffff')) echo "incorrect\n";
?>

The program above outputs "correct" followed by "incorrect".

8 Differences between Perl-compatible regular expressions and Perl / ereg regular expressions

Although they are called "Perl-compatible regular expressions", PHP's implementation still differs from Perl's. For example, in Perl the modifier "g" means match all occurrences, but PHP does not support this modifier; instead, preg_match_all() finds every match, and preg_replace() replaces every match by default.
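A minimal sketch of preg_match_all() doing the job of Perl's "g" modifier. The phone-number subject is invented; with the default ordering, $m[0] holds all whole matches and $m[1] holds the first capture of each match.

```php
<?php
$subject = 'tel: 010-1234, 021-5678, 0755-9999';

// Perl would use the /g modifier; PHP uses preg_match_all() instead.
$n = preg_match_all('/(\d{3,4})-(\d{4})/', $subject, $m);

echo $n, "\n";   // 3 (number of matches)
print_r($m[0]);  // 010-1234, 021-5678, 0755-9999 (whole matches)
print_r($m[1]);  // 010, 021, 0755 (first capture of each match)
```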

There are also differences from the ereg family. ereg is another set of regular expression functions in PHP, but it is weaker than preg. (The ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0.)

1. ereg does not use delimiters or modifiers, so it is less capable than preg.

2. About ".": in regular expressions the dot generally matches any character except the newline, but in ereg "." matches any character, including the newline. To make "." in preg match newlines as well, add the modifier "s".

3. ereg always uses greedy mode, and this cannot be changed, which causes trouble for many replacements and matches.

4. Speed: many people worry whether this extra power comes at the cost of speed. Don't worry: preg is much faster than ereg. The author ran a test:

Timing test:

PHP code:

<?php
echo "preg_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;

echo "\nereg_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssssssss";
    ereg_replace("s", "", $str); // ereg_replace() no longer exists in PHP 7.0+
}
$ended = time() - $start;
echo $ended;

echo "\nstr_replace used time: ";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
?>

Result (in seconds, as measured with time()):

preg_replace used time: 5

ereg_replace used time: 15

str_replace used time: 2

str_replace() is fastest because it does no pattern matching at all, and preg_replace() is much faster than ereg_replace().

9 PREG support in PHP 3.0

PHP 4.0 includes PREG support by default, but PHP 3.0 does not. To use the preg functions in PHP 3.0, you must load php3_pcre.dll: add "extension=php3_pcre.dll" to the extension section of php.ini and then restart PHP.

In practice, regular expressions are often used to implement UBBCode, and many PHP forums take this approach (for example, zforum at zphp.com and vBulletin at vbulletin.com), but the code involved is fairly long.
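A heavily simplified sketch of the UBBCode idea, assuming only three tags ([b], [i], and [url=...]); real forum implementations handle many more tags and edge cases. The function name ubb_to_html() is hypothetical. The input is HTML-escaped first so users cannot inject their own tags, and preg_replace() accepts arrays of patterns and replacements, applied in order.

```php
<?php
// Hypothetical helper: converts a few UBBCode tags to HTML.
function ubb_to_html(string $text): string
{
    // Escape HTML first so user input cannot inject its own tags.
    $html = htmlspecialchars($text, ENT_QUOTES);
    $rules = [
        '/\[b\](.+?)\[\/b\]/is' => '<b>$1</b>',
        '/\[i\](.+?)\[\/i\]/is' => '<i>$1</i>',
        // Only http(s) URLs without spaces or "]" are accepted.
        '/\[url=(https?:\/\/[^\]\s]+)\](.+?)\[\/url\]/i' => '<a href="$1">$2</a>',
    ];
    return preg_replace(array_keys($rules), array_values($rules), $html);
}

echo ubb_to_html('[b]Bold[/b] and [url=http://example.com]a link[/url]'), "\n";
// <b>Bold</b> and <a href="http://example.com">a link</a>
```

The lazy "(.+?)" matters here: with a greedy "(.+)", two [b] tags on one line would be merged into a single bold span.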

10 Matching Chinese characters with regular expressions

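<?php
// Assumes the script and $str are encoded in GB2312/GBK, where each Chinese
// character is two bytes: a lead byte 0x81-0xFE and a trail byte 0x40-0xFE.
$str = "中文测试"; // "Chinese test"
preg_match_all('/([\x81-\xfe][\x40-\xfe])/', $str, $ch);
$patterns = array_unique($ch[0]); // drop repeated characters
print_r($patterns);
?>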
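The byte-range pattern above only works for GB2312/GBK text. For UTF-8 strings, which is what most PHP code handles today, a sketch could use the "u" (UTF-8) modifier together with the Unicode script property \p{Han}; this assumes PCRE was built with Unicode property support, as PHP's bundled PCRE normally is. The sample string is invented.

```php
<?php
// Assumes $str is UTF-8 encoded.
$str = "中文测试, Chinese test, 中文";

// /u treats the subject as UTF-8; \p{Han} matches one Han (Chinese) character.
preg_match_all('/\p{Han}/u', $str, $ch);

// array_unique() keeps the first occurrence; array_values() renumbers keys.
$chars = array_values(array_unique($ch[0]));
print_r($chars);  // 中, 文, 测, 试
```

Without the "u" modifier, PCRE would treat each UTF-8 byte as a separate character, so a multi-byte character could be split in the middle.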

When reposting, please cite the original address: https://www.9cbs.com/read-104526.html
