Basic syntax of regular expressions
First, let's look at two special characters, '^' and '$'. They match the beginning and the end of a string, respectively. For example:
"^The": matches any string that starts with "The";
"of despair$": matches any string that ends with "of despair";
"^abc$": matches a string that both starts and ends with "abc", which can only be "abc" itself;
"notice": matches any string that contains "notice".
As the last example shows, if you use neither of these two characters, the pattern (the regular expression) may appear anywhere in the string being tested; you have not anchored it to either end.
The characters '*', '+', and '?' specify how many times the preceding character or sequence of characters may occur. They mean "zero or more", "one or more", and "zero or one", respectively. Here are some examples:
"ab*": matches an "a" followed by zero or more "b"s ("a", "ab", "abbb", etc.);
"ab+": the same, but with at least one "b" ("ab", "abbb", etc.);
"ab?": matches an "a" followed by zero or one "b";
"a?b+$": matches a string that ends with zero or one "a" followed by one or more "b"s.
You can also use bounds, written inside braces, to limit the number of occurrences:
"ab{2}": matches an "a" followed by exactly two "b"s ("abb");
"ab{2,}": at least two "b"s ("abb", "abbbb", etc.);
"ab{3,5}": three to five "b"s ("abbb", "abbbb", or "abbbbb").
Note that you must always specify the first number of a range (i.e., "{0,2}", not "{,2}"). Note also that '*', '+', and '?' have the same effect as the bounds "{0,}", "{1,}", and "{0,1}", respectively.
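The anchors, quantifiers, and bounds described so far can be tried out with a short script. This is only a sketch: it uses PHP's preg_match() (the ereg functions were removed in PHP 7), and the helper name matches() and the "~" delimiter are assumptions made for the example, not part of the original text.

```php
<?php
// Hypothetical helper: does the pattern (written without delimiters) match $subject?
function matches(string $pattern, string $subject): bool
{
    // "~" serves as the delimiter that preg_match() requires.
    return preg_match('~' . $pattern . '~', $subject) === 1;
}

// Anchors
var_dump(matches('^The', 'The end'));        // bool(true)
var_dump(matches('^abc$', 'abcabc'));        // bool(false)
var_dump(matches('notice', 'take notice'));  // bool(true)

// Each shorthand quantifier gives the same answer as its bound form.
foreach (['a', 'ab', 'abbb', 'b'] as $s) {
    $same = matches('^ab*$', $s) === matches('^ab{0,}$', $s)
         && matches('^ab+$', $s) === matches('^ab{1,}$', $s)
         && matches('^ab?$', $s) === matches('^ab{0,1}$', $s);
    echo $s, ': ', $same ? 'same' : 'different', "\n";
}

var_dump(matches('^ab{3,5}$', 'abbbb'));  // bool(true)
var_dump(matches('a?b+$', 'xxabbb'));     // bool(true)
```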
To quantify a sequence of characters, put it inside parentheses:
"a(bc)*": matches an "a" followed by zero or more copies of "bc";
"a(bc){1,5}": one to five copies of "bc".
There is also the '|' character, which works as an OR operator:
"hi|hello": matches a string containing "hi" or "hello";
"(b|cd)ef": matches a string containing "bef" or "cdef";
"(a|b)*c": matches a string containing a sequence of zero or more "a"s and "b"s followed by a "c".
A period ('.') stands for any single character:
"a.[0-9]": matches an "a" followed by any one character and then a digit;
"^.{3}$": matches a string of exactly three characters.
A bracket expression specifies which characters are allowed at a single position in the pattern:
"[ab]": matches a single "a" or "b" (the same as "a|b");
"[a-d]": matches a single lowercase letter from "a" to "d" (the same as "a|b|c|d" and "[abcd]");
"^[a-zA-Z]": matches a string that starts with a letter;
"[0-9]%": matches a string containing a single digit followed by a percent sign;
",[a-zA-Z0-9]$": matches a string that ends with a comma followed by a letter or a digit.
You can also list the characters you do NOT want at a position: just put '^' as the first symbol inside the brackets (i.e., "%[^a-zA-Z]%" matches a string with a non-letter character between two percent signs).
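The grouping, alternation, period, and bracket examples above can be checked in one pass. The sketch below assumes PHP's preg_match() with "/" delimiters; the subjects and expected results in the table are illustrative values chosen for this example.

```php
<?php
// [pattern, subject, expected result] triples built from the examples in the text.
$cases = [
    ['/^a(bc)*$/',     'abcbc', true],
    ['/^a(bc){1,5}$/', 'a',     false], // at least one "bc" is required
    ['/(b|cd)ef/',     'xcdef', true],
    ['/^(a|b)*c$/',    'abbac', true],
    ['/^.{3}$/',       'abcd',  false], // four characters, not three
    ['/^[a-d]$/',      'c',     true],
    ['/[0-9]%/',       '50%',   true],
    ['/%[^a-zA-Z]%/',  '%a%',   false], // a letter sits between the percent signs
];

foreach ($cases as [$pattern, $subject, $expected]) {
    $actual = preg_match($pattern, $subject) === 1;
    printf("%-18s %-6s %s\n", $pattern, $subject, $actual === $expected ? 'as expected' : 'UNEXPECTED');
}
```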
To be taken literally, the characters "^.[$()|*+?{\" must be preceded by a backslash ('\'), because they have a special meaning. On top of that, in PHP3 you must also escape the backslash itself inside the pattern string, so the regular expression "(\$|¥)[0-9]+" would be called as ereg("(\\$|¥)[0-9]+", $str) (I do not know whether PHP4 behaves the same way).
Don't forget that bracket expressions are an exception to this rule: inside the brackets, all special characters, including the backslash ('\'), lose their special meaning (i.e., "[*\+?{}.]" matches a string containing any one of these characters). Also, as the regex manual tells us: "To include a literal ']' in the list, make it the first character (possibly following a '^'). To include a literal '-', make it the first or last character, or the second endpoint of a range" (i.e., in "[a-d-0-9]" the '-' in the middle is taken literally).
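A brief sketch of the escaping rules follows. It assumes PHP's preg_match(), which uses PCRE rather than the POSIX syntax described above; one difference worth keeping in mind is that PCRE still treats a backslash as an escape inside brackets, so the examples avoid putting one there. preg_quote() is a standard PHP function that escapes special characters in arbitrary text.

```php
<?php
// Single-quoted PHP strings keep the backslash as-is, which avoids double escaping.
$price = '/^\$[0-9]+$/';
var_dump(preg_match($price, '$25'));          // int(1)

// Inside brackets, "*", "+", "?", "{", "}" and "." are taken literally.
var_dump(preg_match('/^[*+?{}.]$/', '+'));    // int(1)
var_dump(preg_match('/^[*+?{}.]$/', 'a'));    // int(0)

// A "-" placed last inside the brackets is a literal hyphen.
var_dump(preg_match('/^[a-d-]+$/', 'a-d'));   // int(1)

// preg_quote() adds the backslashes for you.
$quoted = preg_quote('a.b*c');
echo $quoted, "\n";                           // a\.b\*c
```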
For completeness, I should mention collating sequences, character classes, and equivalence classes. I will not go into them here, because they are not needed for the rest of this article. You can find more information in the regex man pages.
How to build a pattern to match currency amounts
OK, now let's put what we have learned to use: building a pattern that checks whether the input is a number representing money. We will accept four ways of writing an amount: "10000.00" and "10,000.00", and, without the fraction, "10000" and "10,000". Let's start building the pattern:
^[1-9][0-9]*$
This matches any number that starts with a non-zero digit. But it also means that a single "0" does not pass the test. Here is a solution:
^(0|[1-9][0-9]*)$
This accepts "only a zero, or a number that does not start with a zero". We may also allow a minus sign in front of the number:
^(0|-?[1-9][0-9]*)$
That reads: "a zero, or a number that does not start with a zero and may be preceded by a minus sign." OK, let's not be so strict, and allow the number to start with a zero. Let's also drop the minus sign, since we do not need it for currency. Now we specify the pattern to match the decimal part:
^[0-9]+(\.[0-9]+)?$
This implies that the string must start with at least one digit. Note that "10." does not match this pattern; only "10" and "10.2" do. (Why? Because the optional group "(\.[0-9]+)" requires at least one digit after the period.)
^[0-9]+(\.[0-9]{2})?$
Here we require exactly two digits after the decimal point. If you think that is too strict, you can change it to:
^[0-9]+(\.[0-9]{1,2})?$
This allows one or two digits after the decimal point. Now let's allow commas every three digits to improve readability:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$
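As a rough sketch of how the last pattern might be used, the script below validates an amount and then converts it to a number by stripping the commas. It assumes PHP's preg_match() in place of ereg(); the names MONEY_PATTERN and parse_money() are hypothetical and exist only for this example.

```php
<?php
// Pattern from the text: 1-3 digits, comma-separated groups of three, optional 1-2 decimals.
const MONEY_PATTERN = '/^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]{1,2})?$/';

// Hypothetical helper: returns the amount as a float, or null if the input is not valid.
function parse_money(string $input): ?float
{
    if (preg_match(MONEY_PATTERN, $input) !== 1) {
        return null;
    }
    // Strip the thousands separators before converting.
    return (float) str_replace(',', '', $input);
}

var_dump(parse_money('10,000.00')); // float(10000)
var_dump(parse_money('10.'));       // NULL: a period needs at least one digit after it
var_dump(parse_money('1000'));      // NULL: without commas only up to three digits pass
```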
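The pattern above requires the commas. To make them optional, so that "10000" and "10000.00" are accepted as well, combine it with the plain digit form:
^([0-9]+|[0-9]{1,3}(,[0-9]{3})*)(\.[0-9]{1,2})?$
Don't forget that the plus sign '+' can be replaced by '*' if you also want to accept an empty string (why?). And don't forget that the backslash '\' must be escaped inside the PHP string when you call the function (a very common mistake). Now that the string is validated, we strip the commas with str_replace(",", "", $money) and cast the result to double, so that we can do arithmetic with it.
Building a regular expression to check email addresses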
OK, let's move on to validating an email address. A complete email address has three parts: the POP3 user name (everything to the left of the '@'), the '@' itself, and the server name (the rest). The user name may contain upper- and lowercase letters, digits, periods ('.'), minus signs ('-'), and underscores ('_'). The server name follows the same rules, except that underscores are not allowed.
Now, a user name cannot begin or end with a period, and neither can a server name. Two periods in a row are not allowed either: there must be at least one other character between them. So let's see how to write a pattern for the user name:
^[_a-zA-Z0-9-]+$
This does not allow a period yet. Let's add it:
^[_a-zA-Z0-9-]+(\.[_a-zA-Z0-9-]+)*$
That says: "at least one valid character (anything but a period), followed by zero or more groups that each start with a period."
To simplify things, we can use eregi() instead of ereg(). eregi() is case-insensitive, so we do not need to specify both ranges "a-z" and "A-Z"; one is enough:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*$
The server name works the same way, but without the underscore:
^[a-z0-9-]+(\.[a-z0-9-]+)*$
Now we only need to join the two parts with "@":
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$
This is the complete email validation pattern. To use it, just call:
eregi('^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$', $email)
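Since eregi() is no longer available in current PHP versions, here is a minimal sketch of the same check with preg_match(), where the "i" modifier stands in for eregi()'s case-insensitivity. The names EMAIL_PATTERN and looks_like_email() and the sample addresses are assumptions made for illustration.

```php
<?php
// Pattern from the text, wrapped in "/" delimiters with the "i" modifier.
const EMAIL_PATTERN = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';

// Hypothetical helper: true when $email fits the pattern built above.
function looks_like_email(string $email): bool
{
    return preg_match(EMAIL_PATTERN, $email) === 1;
}

var_dump(looks_like_email('John.Doe@example.com'));  // bool(true)
var_dump(looks_like_email('.john@example.com'));     // bool(false): starts with a period
var_dump(looks_like_email('john..doe@example.com')); // bool(false): two periods in a row
var_dump(looks_like_email('john@example_site.com')); // bool(false): underscore in the server name
```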
Regular expressions have other uses as well.
Extracting strings
ereg() and eregi() have a feature that lets us extract parts of a string matched by the regular expression (see the manual for the details). For example, to extract the file name from a path or URL, the following code is all you need:
ereg("([^\\/]*)$", $pathOrUrl, $regs); echo $regs[1];
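The same extraction can be sketched with preg_match(), which fills its third argument in the same way. The helper name file_name_of() and the sample paths are hypothetical; "#" is used as the delimiter so that "/" needs no escaping.

```php
<?php
// Hypothetical helper: returns the part of $pathOrUrl after the last "/" or "\".
function file_name_of(string $pathOrUrl): string
{
    // In this single-quoted string, "\\\\" reaches the regex engine as "\\",
    // i.e. one literal backslash inside the bracket expression.
    preg_match('#([^\\\\/]*)$#', $pathOrUrl, $regs);
    return $regs[1];
}

echo file_name_of('http://example.com/docs/index.html'), "\n"; // index.html
echo file_name_of('C:\\temp\\report.txt'), "\n";               // report.txt
```

A path that ends with a separator yields an empty string, because nothing follows the last "/".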
Advanced replacing
ereg_replace() and eregi_replace() are also very useful. Suppose we want to replace every run of separator characters (spaces, newlines, carriage returns, tabs) with a comma:
ereg_replace("[ \n\r\t]+", ",", trim($str));
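A minimal sketch of the same replacement with preg_replace() follows; the input string is made up for the example.

```php
<?php
$str = "  apple banana\tcherry\r\ndate  ";

// Inside double quotes PHP turns \n, \r and \t into the actual control characters,
// so the bracket expression lists a space, a newline, a carriage return and a tab.
$result = preg_replace("/[ \n\r\t]+/", ',', trim($str));

echo $result, "\n"; // apple,banana,cherry,date
```

trim() removes the leading and trailing whitespace first, so the result does not start or end with a comma.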
PHP is used heavily for back-end CGI development on the web, where a script usually produces a result from data submitted by the user. If the data entered by the user is wrong, problems follow: for example, someone's birthday entered as "February 30"! How can we check whether such input is valid? PHP's support for regular expressions lets us do this kind of pattern matching conveniently.
2 What is a regular expression: Put simply, a regular expression is a powerful tool for pattern matching and replacement. Regular expressions turn up in almost every UNIX/Linux-based tool, for example in the Perl and PHP scripting languages. JavaScript also supports regular expressions, and they have become a general concept and tool, widely used by all kinds of technical people. One Linux website puts it this way: "If you ask a Linux enthusiast what he likes most, he may answer regular expressions; if you ask him what he fears most, he will surely say regular expressions as well." As this suggests, regular expressions look complicated and intimidating, and most PHP beginners skip them and carry on with the rest. Yet regular expressions in PHP offer powerful functions: finding strings that match a pattern, testing whether a string meets a condition, and replacing the matching strings with a specified string. Skipping them is a pity ...
3 Basic syntax of regular expressions: A regular expression has three parts: the delimiters, the expression, and the modifiers.
The delimiter can be any character that is not a letter, a digit, a backslash, or whitespace (for example "/", "!", and so on); the most commonly used delimiter is "/".
The expression consists of special characters (see below) and ordinary characters. For example, "[a-z0-9_-]+@[a-z0-9_.-]+" matches a simple email string.
The modifiers switch certain features or modes on or off.
Here is an example of a complete regular expression: /hello.+?hello/is. In it, "/" is the delimiter, the text between the two "/" is the expression, and the string "is" after the second "/" holds the modifiers.
If the expression itself contains the delimiter, the delimiter must be escaped with the escape character "\", as in "/hello.+?\/hello/is". Besides escaping the delimiter and other special characters, "\" is also needed in front of the special sequences made of a letter, such as "\d", which stands for any digit.
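To make the three parts concrete, here is a small sketch using preg_match(). The subject strings and variable names are invented for the example; the first pattern is the one quoted above.

```php
<?php
$subject = "Hello world,\nhello again";

// "i" ignores case and "s" lets "." match the newline between the two lines.
$withS = preg_match('/hello.+?hello/is', $subject);
// Without "s" the "." stops at the newline, so nothing matches.
$withoutS = preg_match('/hello.+?hello/i', $subject);

var_dump($withS);    // int(1)
var_dump($withoutS); // int(0)

// Choosing another delimiter avoids escaping "/" inside the expression.
$hash    = preg_match('#^/usr/local/#', '/usr/local/bin');
$escaped = preg_match('/^\/usr\/local\//', '/usr/local/bin');

var_dump($hash);     // int(1)
var_dump($escaped);  // int(1)
```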
4 Special characters in regular expressions: The special characters in a regular expression are divided into metacharacters, anchors, and others.
Metacharacters are special characters that describe how many times the character before them occurs in the subject being matched. A metacharacter is a single character, but different or identical metacharacters can be combined into larger ones.
Braces: braces specify the number of occurrences exactly. For example, "/pre{1,5}/" matches "pre", "pree", "preeeee", and so on: "pr" followed by one to five "e"s. For zero to five occurrences write "/pre{0,5}/"; do not leave out the lower bound, because "{,5}" is not treated as a quantifier by the PCRE versions this article covers.
Plus: "+" matches one or more occurrences of the character before it. For example, "/ac+/" matches "act", "account", "acccc", and so on: strings in which "a" is followed by one or more "c"s. "+" is equivalent to "{1,}".
Asterisk: "*" matches zero or more occurrences of the character before it. For example, "/ac*/" matches "app", "acp", "accp", and so on: strings in which "a" is followed by zero or more "c"s. "*" is equivalent to "{0,}".
Question mark: "?" matches zero or one occurrence of the character before it. For example, "/ac?/" matches "a", "acp", "acwp", and so on: strings in which "a" is followed by zero or one "c". "?" also has another important role in regular expressions: controlling "greedy mode" (see section 5).
Two more very important characters are "[ ]". They match any one of the characters listed inside them. For example, "/[az]/" matches a single character "a" or "z"; if the expression is changed to "/[a-z]/", it matches any single lowercase letter, such as "a", "b", and so on.
If "^" appears as the first character inside "[ ]", the expression matches any character that is NOT listed. For example, "/[^a-z]/" matches any character that is not a lowercase letter. Regular expressions also provide several predefined classes for use inside "[ ]" (written, for example, as "/[[:digit:]]/"):
[:alpha:]: matches any letter
[:alnum:]: matches any letter or digit
[:digit:]: matches any digit
[:space:]: matches any whitespace character
[:upper:]: matches any uppercase letter
[:lower:]: matches any lowercase letter
[:punct:]: matches any punctuation character
[:xdigit:]: matches any hexadecimal digit
In addition, the following characters take on a special meaning when escaped with the escape character "\":
\s: matches a single whitespace character.
\S: matches any character except a whitespace character.
\d: matches a digit from 0 to 9, equivalent to "/[0-9]/".
\w: matches a letter, a digit, or an underscore, equivalent to "/[a-zA-Z0-9_]/".
\W: matches any character that \w does not match, equivalent to "/[^a-zA-Z0-9_]/".
\D: matches any character that is not a decimal digit.
.: matches any character except a newline; with the modifier "s", "." matches any character at all.
These special characters make it easy to write patterns that would otherwise be cumbersome. For example, "/\d0000/" matches a digit followed by four zeros, such as "10000" or "90000".
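The escaped classes and the predefined bracket classes can be tried out as below. This is a sketch with made-up subject strings; note that a predefined class such as [:xdigit:] has to sit inside an outer pair of brackets in PCRE.

```php
<?php
var_dump(preg_match('/^\d+$/', '2024'));                      // int(1)
var_dump(preg_match('/^\w+$/', 'user_name1'));                // int(1)
var_dump(preg_match('/^\w+$/', 'user name'));                 // int(0): a space is not a word character
var_dump(preg_match('/^\S+\s\S+$/', 'two words'));            // int(1)
var_dump(preg_match('/^\D+$/', 'no digits'));                 // int(1)

// Predefined classes go inside a bracket expression: [[:name:]]
var_dump(preg_match('/^[[:xdigit:]]+$/', 'ff00A9'));          // int(1)
var_dump(preg_match('/^[[:upper:]][[:lower:]]+$/', 'Paris')); // int(1)

// A digit followed by four zeros
var_dump(preg_match('/\d0000/', '90000'));                    // int(1)
```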
Anchors: Anchors are another very important kind of character in regular expressions; they describe where in the subject the match must occur.
^: the pattern must appear at the beginning of the subject (this differs from its meaning inside "[ ]").
$: the pattern must appear at the end of the subject.
Space: a literal space in the pattern matches a space in the subject, so it can mark the beginning or the end of a word inside a longer string.
"/^he/": matches strings that start with "he", such as "hello" and "height".
"/he$/": matches strings that end with "he".
"/ he/": begins with a space; it plays a role similar to "^" for words inside a string, matching a word that starts with "he" and follows a space.
"/he /": ends with a space; it plays a role similar to "$", matching a word that ends with "he" and is followed by a space.
"/^he$/": matches only the string "he".
Parentheses: Besides matching, regular expressions can use parentheses "( )" to record the information you need, store it, and let you read it later. For example, /^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/ records the user name and the server address of an email address (of the form username@server.com). To read the recorded strings later, refer to them with "escape character + recording order". For example, "\1" is what the first "([a-zA-Z0-9_-]+)" matched, "\2" is what the second "([a-zA-Z0-9_-]+)" matched, and "\3" is what the third group "(\.[a-zA-Z0-9_-]+)" matched. But in a PHP string "\" is a special character that itself needs escaping, so "\1" should be written as "\\1" in PHP.
Other special symbols: "|": the OR symbol "|" works like "or" in PHP, but it is a single "|", not PHP's "||". It means either one string or another; for example, "/abcd|dcba/" matches "abcd" or "dcba".
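The recording of groups can be illustrated with a short sketch. It assumes preg_match() and preg_replace(); the replacement text "user=... host=..." is invented for the example. In a single-quoted PHP string the backslash in "\1" does not need to be doubled.

```php
<?php
$pattern = '/^([a-zA-Z0-9_-]+)@([a-zA-Z0-9_-]+)(\.[a-zA-Z0-9_-]+)$/';

// preg_match() stores the whole match in $m[0] and the recorded groups in $m[1], $m[2], $m[3].
preg_match($pattern, 'username@server.com', $m);
echo $m[1], ' | ', $m[2], ' | ', $m[3], "\n"; // username | server | .com

// In the replacement, \1, \2 and \3 refer to the recorded groups.
$rewritten = preg_replace($pattern, 'user=\1 host=\2\3', 'username@server.com');
echo $rewritten, "\n"; // user=username host=server.com

// "|" chooses between two alternatives.
var_dump(preg_match('/abcd|dcba/', 'xxdcbaxx')); // int(1)
```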
5 Greedy mode: As mentioned in the section on metacharacters, "?" has another important role: controlling "greedy mode". What is greedy mode? Suppose we want to match a string that starts with the letter "a" and ends with the letter "b", but the subject contains many "b"s after the "a", for example "a bbbbbbbbbbbbbbbbbbbbbbbbb". Will the regular expression match up to the first "b" or up to the last "b"? In greedy mode it matches up to the last "b"; in non-greedy mode it stops at the first "b". Matching is greedy by default:
/a.+b/
To turn greedy mode off, put "?" after the quantifier or use the modifier U (see the next section):
/a.+?b/
/a.+b/U
6 Modifiers: The modifiers of a regular expression can change many of its characteristics, so that the expression fits your needs better (note: modifiers are case-sensitive, which means "e" is not the same as "E"). The modifiers are:
i: matching becomes case-insensitive, i.e., "a" and "A" are treated the same.
m: by default, the start anchor "^" and the end anchor "$" refer to the whole subject string. With "m", they refer to every line of the string: each line begins at "^" and ends at "$".
s: by default "." matches any character except a newline. With "s", "." matches any character, including a newline.
x: whitespace characters in the expression are ignored unless they are escaped.
e: only meaningful for the replacement; it means the replacement is treated as PHP code.
A: the expression must match at the beginning of the subject string. For example, "/a/A" matches "abcd".
D: the counterpart of "m": with this modifier, "$" matches only at the very end of the subject string, not before a trailing newline.
U: reverses greedy mode, with an effect similar to putting a question mark after the quantifier.
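The difference between greedy and non-greedy matching, and a few of the modifiers, can be seen in a short sketch. The subject strings are made up; preg_match_all() is used only to count the matches.

```php
<?php
$subject = 'a bbbb';

preg_match('/a.+b/', $subject, $greedy);  // default: take as much as possible
preg_match('/a.+?b/', $subject, $lazy);   // "?" after the quantifier: take as little as possible
preg_match('/a.+b/U', $subject, $flagU);  // "U" flips the default

var_dump($greedy[0]); // string(6) "a bbbb"
var_dump($lazy[0]);   // string(3) "a b"
var_dump($flagU[0]);  // string(3) "a b"

// "m": ^ and $ apply to every line, so all three lines match.
$lines = preg_match_all('/^\w+$/m', "one\ntwo\nthree");
var_dump($lines);     // int(3)

// "x": unescaped whitespace in the pattern is ignored.
var_dump(preg_match('/ a b c /x', 'abc')); // int(1)

// "i": case is ignored.
var_dump(preg_match('/abc/i', 'ABC'));     // int(1)
```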
7 PCRE-related regular expression functions: PHP's Perl-compatible regular expressions provide several functions for pattern matching, replacing, and splitting:
1, preg_match: function format: int preg_match(string pattern, string subject, array [matches]); This function matches the expression pattern against subject. If [matches] is given, the whole matched string is stored in [matches][0], [matches][1] holds the first string recorded with parentheses "( )", [matches][2] holds the second recorded string, and so on. preg_match returns 1 ("true") if it finds a match for pattern in subject, and 0 ("false") otherwise.
2, preg_replace: function format: mixed preg_replace(mixed pattern, mixed replacement, mixed subject); This function replaces every string in subject that matches the expression pattern with replacement. If replacement needs to contain some of the characters matched by pattern, record them with "( )" and read them back in replacement with "\1".
3, preg_split: function format: array preg_split(string pattern, string subject, int [limit]); This function works like the function split, except that it uses full Perl-compatible regular expressions. The third parameter, limit, gives the maximum number of pieces to return.
4, preg_grep: function format: array preg_grep(string pattern, array input); This function is similar to preg_match, but preg_grep tests every element of the given array input and returns a new array containing the elements that match.
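preg_match and preg_replace appear in examples elsewhere; the sketch below shows the other two functions. The input strings and variable names are invented for illustration.

```php
<?php
// Split on commas and runs of whitespace.
$all = preg_split('/[\s,]+/', 'red, green  blue,yellow');
print_r($all);     // red, green, blue, yellow

// With a limit of 2, the second piece keeps the rest of the string unsplit.
$limited = preg_split('/[\s,]+/', 'red, green  blue,yellow', 2);
print_r($limited); // "red" and "green  blue,yellow"

// Keep only the elements made up of digits; the original keys are preserved.
$numbers = preg_grep('/^\d+$/', ['12', 'abc', '7', '3.5']);
print_r($numbers); // [0 => '12', 2 => '7']
```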
Let's take an example: checking whether the format of an email address is correct:
<?php
// Returns 1 if $email has a valid format, 0 otherwise.
function emailism($email) {
    // preg_match() needs delimiters around the expression; "/" is used here.
    if (preg_match('/^[_.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3}$/', $email)) {
        return 1;
    }
    return 0;
}
if (emailism('y10k@963.net')) echo "correct\n";
if (!emailism('y10k@fffff')) echo "incorrect\n";
?>
The above program will output "correct" and "incorrect", each on its own line.
8. Differences between PHP's Perl-compatible regular expressions and Perl/ereg regular expressions: Although they are called "Perl-compatible regular expressions", PHP's differ from Perl's regular expressions in some ways. For example, the modifier "g" means "match all" in Perl, but PHP does not support this modifier. There are also differences from the ereg family of functions. ereg is another set of regular expression functions provided by PHP, but it is weaker than preg.
1, ereg does not use delimiters or modifiers, so it is less capable than preg.
2, About ".": in regular expressions the period generally matches any character except a newline, but in ereg "." matches any character, including a newline. If you want "." in preg to include the newline, add "s" to the modifiers.
3, ereg is greedy by default and this cannot be changed, which causes trouble for many replacements and matches.
4, Speed: this may be what many people care about most: is preg's power paid for with speed? Don't worry, preg is much faster than ereg. The author ran a timing test:
PHP code:
<?php
// Note: ereg_replace() exists only in PHP versions before 7.0.
echo "preg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss";
    preg_replace("/s/", "", $str);
}
$ended = time() - $start;
echo $ended;
echo " ereg_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssss";
    ereg_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
echo " str_replace used time:";
$start = time();
for ($i = 1; $i <= 100000; $i++) {
    $str = "ssssssssssssssssssssssssssssssssss";
    str_replace("s", "", $str);
}
$ended = time() - $start;
echo $ended;
?>
Result:
preg_replace used time: 5
ereg_replace used time: 15
str_replace used time: 2
str_replace is the fastest because it does no pattern matching, and preg_replace is much faster than ereg_replace.
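For readers who want to repeat the comparison on a current PHP version, here is a hedged sketch. It leaves out ereg_replace(), which no longer exists in PHP 7 and later, and it uses microtime(true) instead of time() for sub-second resolution. The helper time_it(), the 64-character input, and the iteration count are assumptions for this example; the timings depend entirely on the machine, so no expected output is shown.

```php
<?php
// Hypothetical helper: runs $fn $iterations times and returns the elapsed seconds.
function time_it(callable $fn, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$str = str_repeat('s', 64); // the same input for both functions
$n = 100000;

$pregTime = time_it(function () use ($str) { preg_replace('/s/', '', $str); }, $n);
$strTime  = time_it(function () use ($str) { str_replace('s', '', $str); }, $n);

printf("preg_replace used time: %.4f s\n", $pregTime);
printf("str_replace used time:  %.4f s\n", $strTime);
```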
9. About preg support in PHP 3.0: preg support is included by default in PHP 4.0, but not in 3.0. If you want to use the preg functions in 3.0, you must load the php3_pcre.dll file: add "extension = php3_pcre.dll" to the extension section of php.ini and then restart PHP. In practice, regular expressions are often used to implement UBB code, and many PHP forums use this method (such as zforum at zphp.com, or vbullent.com), but the actual code is rather long.