Regular Expressions in PHP

zhaozj2021-02-16 140

PHP continues the long-standing *NIX tradition and fully supports regular expressions. Regular expressions offer an advanced, though not intuitive, way to match and process strings. Anyone who has used Perl knows how powerful regular expressions are, and also that they are not easy to learn.

For example:

^.+@.+\..+$

This pattern works, but it is cryptic enough to give some programmers a headache (I am one of them) or to make them give up on regular expressions altogether. I believe that by the end of this tutorial you will understand what it means.
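A minimal sketch of this check, using Python's `re` module as a stand-in for PHP's `preg_match()` (the pattern syntax is the same for this example). The helper name `looks_like_email` is hypothetical, and the pattern only checks the rough shape "text@text.text"; it is not a complete email validator.

```python
import re

# ^.+@.+\..+$ : one or more characters, "@", one or more characters,
# a literal period, one or more characters, then the end of the string.
EMAIL_PATTERN = re.compile(r"^.+@.+\..+$")

def looks_like_email(text: str) -> bool:
    """Hypothetical helper: True if text has the rough shape of an email address."""
    return EMAIL_PATTERN.search(text) is not None

for sample in ["user@example.com", "user@localhost", "@.", "a@b.c"]:
    print(sample, looks_like_email(sample))
```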

Basic pattern matching

Let's start with the basics. The pattern is the most basic element of a regular expression: a set of characters that describes a feature of a string. A pattern can be very simple, consisting of an ordinary string, or very complex, using special characters to represent a range of characters, repetition, or context. For example:

^once

This pattern contains the special character ^, which means it matches only strings that begin with "once". For example, it matches the string "once upon a time" but not "There once was a man from NewYork". Just as ^ marks the beginning, the $ symbol matches strings that end with the given pattern.
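A small sketch of the ^ anchor, using Python's `re.search()` in place of PHP's `preg_match()`. Both scan the whole string for a match, so it is the ^ itself (not the function) that ties the pattern to the start.

```python
import re

# "^" anchors the pattern to the start of the string.
starts_with_once = re.compile(r"^once")

print(bool(starts_with_once.search("once upon a time")))                   # True
print(bool(starts_with_once.search("There once was a man from NewYork")))  # False
```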

bucket$

This pattern matches "Who kept all of this cash in a bucket" but not "buckets". Using ^ and $ together requires an exact match (the string must be identical to the pattern). For example:
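The $ anchor can be sketched the same way, again with Python's `re` standing in for PHP's regex functions:

```python
import re

# "$" anchors the pattern to the end of the string.
ends_with_bucket = re.compile(r"bucket$")

print(bool(ends_with_bucket.search("Who kept all of this cash in a bucket")))  # True
print(bool(ends_with_bucket.search("buckets")))                                # False
```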

^bucket$

This pattern matches only the string "bucket". If a pattern contains neither ^ nor $, it matches any string that contains the pattern anywhere. For example, the pattern

once

matches the string

There once was a man from NewYork

Who kept all of his cash in a bucket.
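To contrast a fully anchored pattern with an unanchored one, here is a short sketch using Python's `re` module in place of PHP's functions. The two-line limerick is joined with a newline into a single string, which is an assumption about how it would be stored.

```python
import re

exact_bucket = re.compile(r"^bucket$")  # both anchors: the whole string must be "bucket"
contains_once = re.compile(r"once")     # no anchors: "once" may appear anywhere

limerick = "There once was a man from NewYork\nWho kept all of his cash in a bucket."

print(bool(exact_bucket.search("bucket")))    # True
print(bool(exact_bucket.search("a bucket")))  # False
print(bool(contains_once.search(limerick)))   # True
```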

The letters in this pattern (o-n-c-e) are literal characters: each stands for itself, and digits work the same way. Some slightly more complex characters, such as punctuation and whitespace (spaces, tabs, etc.), require escape sequences. All escape sequences begin with a backslash (\). The escape sequence for a tab is \t, so to check whether a string starts with a tab, you can use this pattern:

^\t

Similarly, \n represents a newline and \r a carriage return. Other special symbols can be matched literally by putting a backslash in front of them: the backslash itself is written \\, a period is written \., and so on.
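A sketch of these escape sequences, using Python's `re` module as a stand-in for PHP. The patterns are written as Python raw strings (r"...") so the backslashes reach the regex engine unchanged; the sample strings are made up for illustration.

```python
import re

starts_with_tab = re.compile(r"^\t")             # \t is the tab escape
literal_period = re.compile(r"^3\.14$")          # \. matches a period, not "any character"
literal_backslash = re.compile(r"^C:\\temp$")    # \\ matches one backslash

print(bool(starts_with_tab.search("\tindented")))      # True
print(bool(starts_with_tab.search("not indented")))    # False
print(bool(literal_period.search("3.14")))             # True
print(bool(literal_period.search("3x14")))             # False
print(bool(literal_backslash.search("C:\\temp")))      # True
```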

Character classes

In web programs, regular expressions are typically used to validate user input. After a user submits a form, you need to check whether the phone number, address, email address, credit card number, and so on are valid, and ordinary literal characters alone are not enough for that.

So we need a freer way to describe what we want, and that is the character class. To create a character class that matches any vowel, put all the vowels inside square brackets:

[AaEeIiOoUu]

This pattern matches any one of the vowels, but it stands for only a single character. A hyphen can be used to express a range of characters, for example:

[a-z] // Matches any lowercase letter

[A-Z] // Matches any uppercase letter

[a-zA-Z] // Matches any letter

[0-9] // Matches any digit

[0-9\.\-] // Matches any digit, period, or minus sign

[ \f\r\t\n] // Matches any whitespace character
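The classes above can be tried out with a short sketch in Python's `re` module (standing in for PHP). Each pattern is wrapped in ^...$ here, an addition for illustration, so that it must match the whole string; this makes the one-character rule visible.

```python
import re

vowel = re.compile(r"^[AaEeIiOoUu]$")       # exactly one vowel
lower = re.compile(r"^[a-z]$")              # one lowercase letter
letter = re.compile(r"^[a-zA-Z]$")          # one letter of either case
number_char = re.compile(r"^[0-9\.\-]$")    # one digit, period, or minus sign
whitespace = re.compile(r"^[ \f\r\t\n]$")   # one whitespace character

print(bool(vowel.search("e")), bool(vowel.search("b")))              # True False
print(bool(lower.search("q")), bool(lower.search("Q")))              # True False
print(bool(letter.search("Q")), bool(letter.search("7")))            # True False
print(bool(number_char.search("-")), bool(number_char.search("+")))  # True False
print(bool(whitespace.search("\t")), bool(whitespace.search("x")))   # True False
print(bool(vowel.search("ae")))  # False: a class matches only one character
```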

Again, each of these matches just one character, and this is a very important point. To match a string made of one lowercase letter followed by one digit, such as "z2", "t6", or "g7", but not "ab2", "r2d3", or "b52", use this pattern:

^[a-z][0-9]$

Although [a-z] covers a range of 26 letters, here it matches only a single character: the first character of the string, which must be a lowercase letter.
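Running the example strings through the pattern, with Python's `re` module standing in for PHP's `preg_match()`:

```python
import re

letter_then_digit = re.compile(r"^[a-z][0-9]$")

# Map each sample to whether the whole string is "one lowercase letter, one digit".
results = {s: bool(letter_then_digit.search(s))
           for s in ["z2", "t6", "g7", "ab2", "r2d3", "b52"]}
print(results)
```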

We said earlier that ^ marks the beginning of a string, but it has a second meaning. Inside square brackets, ^ means "not" or "exclude" and is used to rule out characters. Continuing the previous example, to require that the first character not be a digit:

^[^0-9][0-9]$

This pattern matches "&5", "g7", and "-2", but not "12" or "66". Here are a few examples of excluding specific characters:

[^a-z] // Any character except a lowercase letter

[^\\\/\^] // Any character except backslash (\), slash (/), and caret (^)

[^\"\'] // Any character except a double quote (") or single quote (')
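A sketch of negated classes using Python's `re` module in place of PHP. The extra ^...$ anchors on the last two patterns are added for illustration so that each one tests a single character.

```python
import re

# Inside brackets, a leading ^ negates the class.
non_digit_then_digit = re.compile(r"^[^0-9][0-9]$")
not_lowercase = re.compile(r"[^a-z]")        # finds any character that is not a-z
no_slash_char = re.compile(r"^[^\\/\^]$")    # one character other than \, / or ^
quote_free_char = re.compile("^[^\"']$")     # one character that is not a quote

for s in ["&5", "g7", "-2", "12", "66"]:
    print(s, bool(non_digit_then_digit.search(s)))

print(bool(not_lowercase.search("hello")))   # False: every character is a-z
print(bool(not_lowercase.search("Hello")))   # True: "H" is not a-z
print(bool(no_slash_char.search("/")), bool(no_slash_char.search("a")))      # False True
print(bool(quote_free_char.search('"')), bool(quote_free_char.search("x")))  # False True
```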

The special character "." (dot, or period) matches any character except a newline. So the pattern "^.5$" matches any two-character string that ends in the digit 5 and whose first character is not a newline. The pattern "." on its own matches any string except the empty string and strings made up only of newlines.
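The behavior of "." can be checked with a short sketch, using Python's `re` module (without the DOTALL flag) as a stand-in for PHP's default behavior:

```python
import re

two_chars_ending_in_5 = re.compile(r"^.5$")
any_char = re.compile(r".")

print(bool(two_chars_ending_in_5.search("x5")))   # True
print(bool(two_chars_ending_in_5.search("\n5")))  # False: "." does not match a newline
print(bool(any_char.search("abc")))               # True
print(bool(any_char.search("")))                  # False: nothing to match
print(bool(any_char.search("\n")))                # False: only a newline
```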

PHP's regular expressions provide several built-in character classes, listed below:

Character class    Meaning

[[:alpha:]]        Any letter

[[:digit:]]        Any digit

[[:alnum:]]        Any letter or digit

[[:space:]]        Any whitespace character

[[:upper:]]        Any uppercase letter

[[:lower:]]        Any lowercase letter

[[:punct:]]        Any punctuation character

[[:xdigit:]]       Any hexadecimal digit, equivalent to [0-9a-fA-F]

Determining repetition

So far you know how to match a single letter or digit, but more often you will need to match a word or a group of digits. A word consists of several letters, and a group of digits consists of several single digits. Curly braces ({}) placed after a character or character class specify how many times the preceding content may repeat.

Example            Meaning

^[a-zA-Z_]$        A single letter or underscore

^[[:alpha:]]{3}$   Any three-letter word

^a$                The letter a

^a{4}$             aaaa

^a{2,4}$           aa, aaa, or aaaa

^a{1,3}$           a, aa, or aaa

^a{2,}$            A string of two or more a's

^a{2,}             Strings starting with two or more a's, such as "aardvark" and "aaab", but not "apple"

a{2,}              Strings containing two or more a's in a row, such as "baad" and "aaa", but not "Nantucket"

\t{2}              Two tabs

.{2}               Any two characters
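Several rows of this table can be verified with a small sketch in Python's `re` module, standing in for PHP. Like `preg_match()`, `re.search()` looks for the pattern anywhere in the string. The `[[:alpha:]]` row is left out because Python's `re` does not support POSIX bracket classes; the helper name `matches` is hypothetical.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

print(matches(r"^a{4}$", "aaaa"), matches(r"^a{4}$", "aaa"))         # True False
print(matches(r"^a{2,4}$", "aaa"), matches(r"^a{2,4}$", "aaaaa"))    # True False
print(matches(r"^a{2,}", "aardvark"), matches(r"^a{2,}", "apple"))   # True False
print(matches(r"a{2,}", "baad"), matches(r"a{2,}", "Nantucket"))     # True False
print(matches(r"^\t{2}$", "\t\t"))                                   # True
```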

These examples show the three different uses of curly braces. A single number, {x}, means "the preceding character or character class appears exactly x times"; a number followed by a comma, {x,}, means "the preceding content appears x or more times"; and two comma-separated numbers, {x,y}, mean "the preceding content appears at least x times but no more than y times". We can extend these patterns to match whole words or numbers:

^[a-zA-Z0-9_]{1,}$ // Any string of one or more letters, digits, or underscores

^[0-9]{1,}$ // Any positive number (strictly, any string of digits, so "0" also matches)

^\-{0,1}[0-9]{1,}$ // Any integer

^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$ // Any decimal number
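These four patterns can be tried with a sketch in Python's `re` module (standing in for PHP). One caveat the test inputs expose: because every part of the decimal pattern is optional, it also accepts the empty string.

```python
import re

WORD = re.compile(r"^[a-zA-Z0-9_]{1,}$")
DIGITS = re.compile(r"^[0-9]{1,}$")
INTEGER = re.compile(r"^\-{0,1}[0-9]{1,}$")
DECIMAL = re.compile(r"^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$")

print(bool(WORD.search("user_01")), bool(WORD.search("user-01")))   # True False
print(bool(DIGITS.search("007")), bool(DIGITS.search("-7")))        # True False
print(bool(INTEGER.search("-42")), bool(INTEGER.search("4.2")))     # True False
print(bool(DECIMAL.search("-3.14")), bool(DECIMAL.search(".5")))    # True True
print(bool(DECIMAL.search("1.2.3")))  # False
print(bool(DECIMAL.search("")))       # True: every part is optional
```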

The last example is not easy to read, is it? Here is how it breaks down: an optional minus sign (\-{0,1}), followed by zero or more digits ([0-9]{0,}), an optional decimal point (\.{0,1}), zero or more further digits ([0-9]{0,}), and nothing else ($). There is also a simpler notation. The special character "?" is equivalent to {0,1}; both mean "zero or one of the preceding content", that is, "the preceding content is optional". So the example can be simplified to:

^\-?[0-9]{0,}\.?[0-9]{0,}$
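A quick sketch of the simplified pattern in Python's `re` module (standing in for PHP), with each optional part present or absent:

```python
import re

# "?" makes the preceding item optional, exactly like {0,1}.
DECIMAL = re.compile(r"^\-?[0-9]{0,}\.?[0-9]{0,}$")

for s in ["-3.14", "3.14", "314", "-.5", "3.1.4", "+3"]:
    print(s, bool(DECIMAL.search(s)))
```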

The special character "*" is equivalent to {0,}; both mean "zero or more of the preceding content". Finally, the character "+" is equivalent to {1,}, meaning "one or more of the preceding content". So the four examples above can be written as:

^[a-zA-Z0-9_]+$ // Any string of one or more letters, digits, or underscores

^[0-9]+$ // Any positive number

^\-?[0-9]+$ // Any integer

^\-?[0-9]*\.?[0-9]*$ // Any decimal number

Of course, this does not reduce the complexity of the regular expressions, but it does make them easier to read.
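To check that the shorthand and long forms behave identically, this sketch (Python's `re` module standing in for PHP) runs each pair over the same made-up sample strings. Agreeing on a handful of samples is a sanity check, not a proof; the equivalence itself follows from "?" = {0,1}, "*" = {0,} and "+" = {1,}. The helper name `agree` is hypothetical.

```python
import re

# Pairs of (long form with braces, shorthand form using ?, * and +).
pairs = [
    (r"^[a-zA-Z0-9_]{1,}$", r"^[a-zA-Z0-9_]+$"),
    (r"^[0-9]{1,}$", r"^[0-9]+$"),
    (r"^\-{0,1}[0-9]{1,}$", r"^\-?[0-9]+$"),
    (r"^\-{0,1}[0-9]{0,}\.{0,1}[0-9]{0,}$", r"^\-?[0-9]*\.?[0-9]*$"),
]
samples = ["abc_1", "007", "-42", "-3.14", ".5", "", "1.2.3", "a b"]

def agree(long_pat: str, short_pat: str, texts: list) -> bool:
    """Hypothetical helper: True if both patterns accept exactly the same texts."""
    return all(bool(re.search(long_pat, t)) == bool(re.search(short_pat, t))
               for t in texts)

print(all(agree(long_pat, short_pat, samples) for long_pat, short_pat in pairs))  # True
```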

Please credit the original URL when reprinting: https://www.9cbs.com/read-13713.html
