The use of regular expressions in network programming

xiaoxiao2021-03-06  79

[Foreword:] When writing a web program, we often need to check whether a string is valid: whether it is a number, whether it is a valid email address, and so on. Without regular expressions the checking code becomes long and error-prone; with them, these checks become easy. This article gives a fairly comprehensive introduction to the concept and format of regular expressions, and uses examples in PHP and ASP to give readers a concrete feel for them. Regular expressions are widely used, and mastering them takes continual summing-up through study and practice.

Regular expression

Simply put, a regular expression is a powerful tool for pattern matching and replacement. It is widely used in network programming: scripting languages such as PHP, and client-side scripting languages such as JavaScript and VBScript, all provide support for regular expressions. Regular expressions have thus outgrown the limits of any single language or system and become a widely accepted concept and facility.

Regular expressions let users build a matching pattern from a series of special characters and then compare that pattern against target objects such as data files, program input, and web form input. Depending on whether the target object contains the pattern, the corresponding code is executed.

For example, a common use of regular expressions is to verify that an email address entered by a user is correctly formatted. If the address passes the check, the form the user filled in is processed normally; if the address does not match the pattern, a prompt pops up asking the user to re-enter a correct email address. Regular expressions therefore play a pivotal role in the logic of web applications. A detailed example is given later.

A regular expression generally looks like /love/, where the part between the "/" delimiters is the pattern to be matched in the target object. Users simply put the pattern they want to find between the "/" delimiters. To allow more flexible patterns, regular expressions provide special "metacharacters". A metacharacter is a character with a special meaning in a regular expression; many of them specify how the preceding character (the character immediately in front of the metacharacter) may occur in the target object. Commonly used metacharacters include "+", "*", "?" and "{}", as well as "\s", "\S", "\d", "\w" and "\W". Regular expressions also allow "[]" in a pattern to define a range of characters to match, rather than one specific character.

Besides the metacharacters above, regular expressions have another kind of special character: the locator (anchor). A locator specifies where in the target object the pattern must occur. Commonly used locators include "^", "$", "\b" and "\B".

To implement an "or" operation in a regular expression, that is, to match any one of several different patterns, use the pipe character "|".

Another commonly used operator is the negation operator "[^]". Unlike the locator "^" described above, the negation operator "[^]" specifies that the characters listed in the brackets must not appear at that position in the target object. In general, when "^" appears inside "[]" it is the negation operator; when "^" appears outside "[]", or in front of "[]", it is a locator. Finally, when you need to use a metacharacter literally in a pattern, escape it with the escape character "\". For example, /Th\*/ matches "Th*" rather than "The".

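The points above can be tried out directly. The following JavaScript sketch (the sample strings and variable names are made up for illustration) shows a plain pattern between slashes, the negation operator, the "^" locator, and an escaped metacharacter:

```javascript
// A plain pattern: the text between the slashes is searched for in the target.
const hasLove = /love/.test("I love regular expressions");

// "^" inside [] negates the set: match one character that is NOT a, b or c.
const firstNonAbc = "abcd".match(/[^abc]/)[0];

// "^" outside [] is a locator: the target must start with "Th".
const startsWithTh = /^Th/.test("The end");

// An escaped "*" is a literal asterisk, so "Th*" matches but "The" does not.
const literalStar = /Th\*/;
const matchesThStar = literalStar.test("Th*");
const matchesThe = literalStar.test("The");

console.log(hasLove, firstNonAbc, startsWithTh, matchesThStar, matchesThe);
// prints: true d true true false
```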
Regular expression syntax rules and tags

Now we get to the expressions themselves. I will explain how each one is used through examples; after reading, you will find that UBB code is really quite simple. Follow along step by step and you will be a UBB expert by the end of this article. Best of all, you will be able to write your own UBB tags and will no longer need to copy other people's code and templates. Fortunately, VBScript 5.0 provides a "regular expression" object; as long as IE 5.x is installed on your server, the code will run.

Character description:

^ matches the beginning of the string. E.g:

^abc matches "abc xyz" but not "xyz abc"

$ matches the end of the string. E.g:

abc$ matches "xyz abc" but not "abc xyz"

Note: If ^ and $ are used together, the whole string must match exactly. E.g:

^abc$ matches only "abc"
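As a quick check of the two locators, here is a minimal JavaScript sketch; the sample list is only an illustration:

```javascript
const anchorSamples = ["abc xyz", "xyz abc", "abc"];

// ^abc: the string must begin with "abc".
const startsWithAbc = anchorSamples.filter((s) => /^abc/.test(s));

// abc$: the string must end with "abc".
const endsWithAbc = anchorSamples.filter((s) => /abc$/.test(s));

// ^abc$: the string must be exactly "abc".
const exactlyAbc = anchorSamples.filter((s) => /^abc$/.test(s));

console.log(startsWithAbc); // [ 'abc xyz', 'abc' ]
console.log(endsWithAbc);   // [ 'xyz abc', 'abc' ]
console.log(exactlyAbc);    // [ 'abc' ]
```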

* matches 0 or more of the preceding character. E.g:

ab* matches "a", "ab", "abb", "abbb", etc.

+ matches 1 or more of the preceding character. E.g:

ab+ matches "ab", "abb", "abbb", etc., but not "a"

? matches 0 or 1 of the preceding character. E.g:

ab?c? matches only "a", "ab", "ac" and "abc"

. matches any character except a newline. E.g:

(.)+ matches any string that contains no newline

x|y matches "x" or "y". E.g:

abc|xyz matches "abc" or "xyz", and ab(c|x)yz matches "abcyz" and "abxyz"
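The alternation and dot rules can be verified with a short JavaScript sketch. The patterns are wrapped in ^ and $ here so that the whole string is checked; that wrapping is an addition for the demonstration only:

```javascript
// abc|xyz: either alternative is accepted.
const eitherWord = /^(abc|xyz)$/;
const acceptsAbc = eitherWord.test("abc");
const acceptsXyz = eitherWord.test("xyz");
const acceptsAbz = eitherWord.test("abz");

// ab(c|x)yz: the choice applies only to the grouped part.
const groupedChoice = /^ab(c|x)yz$/;
const acceptsAbcyz = groupedChoice.test("abcyz");
const acceptsAbxyz = groupedChoice.test("abxyz");
const acceptsAbyz = groupedChoice.test("abyz");

// The dot matches any single character except a newline.
const dotMatchesLetter = /^.$/.test("a");
const dotMatchesNewline = /^.$/.test("\n");

console.log(acceptsAbc, acceptsXyz, acceptsAbz);     // true true false
console.log(acceptsAbcyz, acceptsAbxyz, acceptsAbyz); // true true false
console.log(dotMatchesLetter, dotMatchesNewline);     // true false
```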

{n} matches the preceding character exactly n times (n is a non-negative integer). E.g:

a{2} matches "aa" but not "a"

{n,} matches the preceding character at least n times (n is a non-negative integer). E.g:

a{3,} matches "aaa", "aaaa", etc., but not "a" or "aa"

Note: a{1,} is equivalent to a+

a{0,} is equivalent to a*

{m,n} matches the preceding character at least m and at most n times. E.g:

a{1,3} matches "a", "aa" and "aaa"

Note: a{0,1} is equivalent to a?
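The counting rules and the three equivalences can be checked mechanically. In this JavaScript sketch the helper names matchesWhole and agreeOn are hypothetical, and the equivalence is only tested on a small sample list, not proven:

```javascript
// Test whether the whole text matches the pattern (anchored with ^ and $).
function matchesWhole(pattern, text) {
  return new RegExp("^(?:" + pattern + ")$").test(text);
}

// True when two patterns accept exactly the same strings from a sample list.
function agreeOn(patternA, patternB, samples) {
  return samples.every(
    (s) => matchesWhole(patternA, s) === matchesWhole(patternB, s)
  );
}

const countSamples = ["", "a", "aa", "aaa", "aaaa"];

const exactlyTwo = countSamples.filter((s) => matchesWhole("a{2}", s));
const atLeastThree = countSamples.filter((s) => matchesWhole("a{3,}", s));
const oneToThree = countSamples.filter((s) => matchesWhole("a{1,3}", s));

const plusEquivalent = agreeOn("a{1,}", "a+", countSamples);
const starEquivalent = agreeOn("a{0,}", "a*", countSamples);
const optionalEquivalent = agreeOn("a{0,1}", "a?", countSamples);

console.log(exactlyTwo);   // [ 'aa' ]
console.log(atLeastThree); // [ 'aaa', 'aaaa' ]
console.log(oneToThree);   // [ 'a', 'aa', 'aaa' ]
console.log(plusEquivalent, starEquivalent, optionalEquivalent); // true true true
```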

[xyz] is a character set; it matches any one of the characters in the brackets. E.g:

[abc] matches "a", "b" and "c"

[^xyz] is a negated character set; it matches any character not in the brackets. E.g:

[^abc] matches any character other than "a", "b" and "c"

[a-z] is a character range; it matches any character in the given range. E.g: [a-z] matches any lowercase letter from "a" to "z"

[^m-n] is a negated range; it matches any character outside the given range. E.g:

[^m-n] matches any character except those from "m" to "n"
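A small JavaScript sketch of sets, negated sets and ranges; the input strings are arbitrary examples:

```javascript
// [abc]: any one of the listed characters.
const inSet = "a1b2c3".match(/[abc]/g).join("");

// [^abc]: any character that is not listed.
const notInSet = "a1b2c3".match(/[^abc]/g).join("");

// [a-z]: any lowercase letter; uppercase letters and the space are skipped.
const lowercaseOnly = "Hello World".match(/[a-z]/g).join("");

// [m-n] versus [^m-n]: inside the range and outside it.
const insideRange = "moon".match(/[m-n]/g).join("");
const outsideRange = "moon".match(/[^m-n]/g).join("");

console.log(inSet, notInSet, lowercaseOnly, insideRange, outsideRange);
// prints: abc 123 elloorld mn oo
```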

\ is the escape operator. E.g:

\n newline

\f form feed

\r carriage return

\t tab

\v vertical tab

\\ matches "\"

\/ matches "/"

\s any whitespace character, including space, tab, form feed, and so on. Equivalent to "[ \f\n\r\t\v]"

\S any non-whitespace character. Equivalent to "[^ \f\n\r\t\v]"

\w any word character: letters, digits and the underscore. Equivalent to "[a-zA-Z0-9_]"

\W any non-word character. Equivalent to "[^a-zA-Z0-9_]"

\b matches a word boundary, such as the end of a word. E.g:

ve\b matches the "ve" in "love", but not in "very", "even", etc.

\B matches a position that is not a word boundary. E.g:

ve\B matches the "ve" in "very", but not in "love"
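The two boundary examples behave as follows in JavaScript, whose \b and \B have the usual meaning (a word boundary, and a position that is not a word boundary):

```javascript
// ve\b: "ve" must be followed by a word boundary.
const boundaryLove = /ve\b/.test("love");
const boundaryVery = /ve\b/.test("very");
const boundaryEven = /ve\b/.test("even");

// ve\B: "ve" must be followed by another word character.
const innerVery = /ve\B/.test("very");
const innerLove = /ve\B/.test("love");

console.log(boundaryLove, boundaryVery, boundaryEven); // true false false
console.log(innerVery, innerLove);                      // true false
```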

\d matches a digit, equivalent to [0-9]. E.g:

abc\dxyz matches "abc2xyz", "abc4xyz", etc.,

but not "abcaxyz", "abc-xyz", etc.

\D matches a non-digit character, equivalent to [^0-9]. E.g:

abc\Dxyz matches "abcaxyz", "abc-xyz", etc.,

but not "abc2xyz", "abc4xyz", etc.
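The shorthand classes described above can be exercised like this in JavaScript; the sample strings other than the abc...xyz ones are invented for the demonstration:

```javascript
const digitBetween = /abc\dxyz/;
const nonDigitBetween = /abc\Dxyz/;

const digitAccepted = digitBetween.test("abc2xyz");
const letterRejected = digitBetween.test("abcaxyz");
const dashAccepted = nonDigitBetween.test("abc-xyz");
const digitRejected = nonDigitBetween.test("abc4xyz");

// \s splits on whitespace; \w keeps letters, digits and the underscore.
const words = "a b\tc".split(/\s/);
const wordChars = "user_1!".match(/\w/g).join("");

console.log(digitAccepted, letterRejected, dashAccepted, digitRejected);
// prints: true false true false
console.log(words, wordChars); // [ 'a', 'b', 'c' ] user_1
```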

\num matches num (where num is a positive integer), a reference back to a remembered (captured) match. E.g:

(.)\1 matches two consecutive identical characters

\onum matches n (where n is an octal escape value less than 256). E.g:

\o011 matches a tab

\xnum matches num (where num is a hexadecimal escape value less than 256). E.g:

\x41 matches the character "A"
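Backreferences and hexadecimal escapes can be tried with the following JavaScript sketch. The octal form is left out because its spelling differs between regular expression engines; the sample words are arbitrary:

```javascript
// (.)\1: any character followed by the same character again.
const doubledLetter = /(.)\1/;
const helloHasDouble = doubledLetter.test("hello");
const worldHasDouble = doubledLetter.test("world");

// With the g flag, match() returns every doubled pair.
const allDoubles = "bookkeeper".match(/(.)\1/g);

// \x41 is the character with hexadecimal code 41, that is "A".
const hexMatchesUpperA = /\x41/.test("A");
const hexMatchesLowerA = /\x41/.test("a");

console.log(helloHasDouble, worldHasDouble);     // true false
console.log(allDoubles);                         // [ 'oo', 'kk', 'ee' ]
console.log(hexMatchesUpperA, hexMatchesLowerA); // true false
```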

Applications

With a comprehensive understanding of regular expressions, you can use them in Perl, PHP and ASP programs.

Taking PHP as an example, the following verifies whether an email address and a URL entered online by the user are correctly formatted. PHP provides the eregi() and ereg() functions for matching strings against a pattern (both belong to PHP's POSIX regex extension, which was removed in PHP 7; preg_match() is the current equivalent). The format of ereg() is as follows:

ereg(pattern, string)

Here pattern is the regular expression pattern, and string is the target object to search, such as the email address value. The function analyzes string according to the rules in pattern and returns true if a match is found. The difference between ereg() and eregi() is that the former is case-sensitive and the latter is not. The check written in PHP is as follows:

<?php
// Local part, "@", one or more dot-separated parts, then a 2-3 letter suffix.
if (ereg("^[a-z0-9_-]+@[a-z0-9_-]+(\.[a-z0-9_-]+)*\.[a-z]{2,3}$", $email))
{ echo "Your E-mail is checked!"; }
else
{ echo "Not a legal E-mail address, please re-enter!"; }
?>

This example is a simple check of the E-mail entered by the user: the string must contain the @ character; before the @ there may be only lowercase English letters, digits, "_" or "-"; after the @ there are one or more parts separated by dots; and after the last dot there may be only two or three lowercase English letters. Addresses such as webmaster@mail.sever.net and hello_2001@88new.cn pass the check, while New99@253.com (contains a capital letter) and new99@253.comn (more than three letters after the last dot) do not.
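The rules just described can also be sketched in JavaScript. The pattern and the function name isValidEmail below are assumptions written to follow that description (lowercase letters, digits, "_" or "-" around the @, and a two or three letter suffix); they are not taken from the PHP listing:

```javascript
// Lowercase local part, "@", one or more dot-separated labels,
// and a final suffix of two or three lowercase letters.
const emailPattern = /^[a-z0-9_-]+@[a-z0-9_-]+(\.[a-z0-9_-]+)*\.[a-z]{2,3}$/;

function isValidEmail(email) {
  return emailPattern.test(email);
}

console.log(isValidEmail("webmaster@mail.sever.net")); // true
console.log(isValidEmail("hello_2001@88new.cn"));      // true
console.log(isValidEmail("New99@253.com"));            // false (capital letter)
console.log(isValidEmail("new99@253.comn"));           // false (four-letter suffix)
```

Real-world addresses are more varied than this (longer suffixes, uppercase letters, "+" and "." in the local part), so a check like this is best treated as a teaching example.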

We can also wrap a regular expression in a custom function and call it to perform the check, as in the following URL check:

function VerifyWebSiteAddr($strWebSiteAddr) {
    return eregi("^([_0-9a-z-]+\.)+([0-9a-z-]+\.)+[a-z]{2,3}$", $strWebSiteAddr);
}
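For comparison, here is a JavaScript sketch of the same kind of host-name check. The pattern is an assumption modelled on the PHP function above, with the i flag standing in for the case-insensitive eregi(); the function name verifyWebsiteAddr is illustrative:

```javascript
// At least two dot-terminated labels followed by a 2-3 letter suffix.
function verifyWebsiteAddr(addr) {
  return /^([_0-9a-z-]+\.)+([0-9a-z-]+\.)+[a-z]{2,3}$/i.test(addr);
}

console.log(verifyWebsiteAddr("www.example.com")); // true
console.log(verifyWebsiteAddr("WWW.EXAMPLE.ORG")); // true  (case-insensitive)
console.log(verifyWebsiteAddr("example.com"));     // false (only one label before the suffix)
console.log(verifyWebsiteAddr("www.example.c"));   // false (suffix too short)
```

Note that a pattern of this shape requires at least three dot-separated parts, so a bare "example.com" is rejected.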

We know that PHP programs need server support. If you want to implement the features above on your own home page without it, the embedded scripting language JavaScript may be a good choice. JavaScript has a powerful RegExp object that can perform regular expression matching. Its test() method checks whether the target object contains the pattern and returns true or false accordingly. All that is needed is to add a piece of JavaScript code to the HTML document:

function verifyAddress(obj) {
    var email = obj.email.value;
    var pattern = /^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-])+/;
    var flag = pattern.test(email);
    if (flag) {
        alert("Your E-Mail is checked!");
        return true;
    } else {
        alert("Not a legal E-mail address, please re-enter!");
        return false;
    }
}

Then, in the form on the web page that collects the information, call this function from the form's submit handler. When the submit button is pressed, verifyAddress() runs first and performs the match; if the condition is met, the form information is sent to the target page, otherwise an error message is shown.

In fact, regular expressions can do far more than what this article has covered. Next time I will show how to use regular expressions to extract any type of text information from any specified web page (for example, the names of all the image files in the page).

Analysis of Image Tags in HTML Source File

In the previous article we introduced the concept of regular expressions and showed examples of using them in network programs to validate user input such as email addresses and URLs.

Today I introduce a programming technique for extracting image tags from the source of a specified web page: parsing out the image file names (including their paths), that is, the file name ".../.../abc.jpg" in the tag's SRC attribute (some may be in GIF format). Programming environment: PHP + Apache on Win98.

First, use a text editor to create a PHP file: AbstractSrcFromPage.php3. For convenience, we will enter the URL (or local file name) of the page containing the image tags into a form field in the browser and then run the analysis, so this file needs a form with a field labeled "Enter URL".

After a correct URL is entered and the form is submitted, the form information is sent to the AbstractSrcFromPage.php3 page. Because the form itself is on this page, this amounts to the page posting to itself. Next we write the PHP code that handles the submission, placing it after the form code:

<?php
if ($filename != "") {
    $fp = fopen($filename, "r"); // If the input is not empty, open the local or remote file
    $source = "";
    while ($buffer = fgets($fp, 1024)) {
        $source .= $buffer;
    }
    fclose($fp);
    // Check whether $source contains an image tag
    if (eregi("(<img)+[ ]+(src=\")+[^\/*\"<>|]+(\.)+((gif)|(jpg))+(\")+", $source)) {
        echo "Found image tags :)<br>";
    } else {
        echo "No image tags :(<br>";
    }
    // First split: cut the source at the text surrounding each image path
    $splitres = split("((\">)|(<img))+[ ]+(src=\")+", $source);
    $imagenums = sizeof($splitres);
    echo "Found " . ($imagenums - 1) . " images<br>They are:<br>";
    for ($i = 1; $i < $imagenums; $i++) {
        unset($imgname); // Clear the $imgname variable before use
        $imgname = spliti("\"", $splitres[$i]); // Split again to get the image information
        echo "$i => " . $imgname[0] . "<br>"; // Output the image information
    }
}
?>

The design of this program is as follows. The PHP program first checks whether a file name (URL or local file name) was entered. If so, it opens the file in read-only mode, then uses fgets($fp, length) to read the line pointed to by the file pointer $fp, returning at most length-1 characters of it (here 1024-1 = 1023). It then uses the string-matching function eregi() to check whether $source contains an image tag (the function was introduced in the previous article). If one is found, split() is applied in two passes to strip away the tag text around each path, such as the "=" and quote characters. The result of the first pass is stored in the array $splitres; every element from index 1 onward is a string that begins with an image path and file name. A for loop then prints all the image paths we need.

Here $imagenums is sizeof($splitres). Inside the for loop, each element of $splitres is split again and the resulting array is assigned to $imgname, whose first element (the image path and file name) is printed; on the next iteration the variable $imgname is deleted first so that it can be reused. Study it carefully to see how it works.
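The two-pass splitting idea can be sketched in JavaScript as follows. The function names, the sample HTML and the simplified patterns are all assumptions made for illustration; in particular, this sketch only handles tags in which SRC is the first attribute and its value is in double quotes:

```javascript
// Check whether the source contains an image tag that points at a GIF or JPG.
function hasImageTag(source) {
  return /<img\s+src="[^"<>|*]+\.(gif|jpg)"/i.test(source);
}

// First pass: split at the start of each image tag, up to the opening quote.
// Second pass: cut each remaining piece at the closing quote.
function extractImageSources(source) {
  const pieces = source.split(/<img\s+src="/i);
  // pieces[0] is the text before the first tag, so skip it.
  return pieces.slice(1).map((piece) => piece.split('"')[0]);
}

const sampleHtml =
  '<p>a</p><img src="pics/abc.jpg"><IMG SRC="../x/logo.gif" alt="x">';

console.log(hasImageTag(sampleHtml));         // true
console.log(extractImageSources(sampleHtml)); // [ 'pics/abc.jpg', '../x/logo.gif' ]
```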

Once it is written, save AbstractSrcFromPage.php3 to the appropriate directory on your server, start the Apache server, open the page in your browser, and enter the name of an existing web page or a remote URL to see how it works.

If you are interested, you can try extracting any other content of interest from an HTML document; with a few changes, wouldn't it make a nice text search engine for a website?

Application of Regular Expression in UBB Forum

I. Concept of UBB code

What is UBB code?

UBB code is a variant of HTML. In general, a UBB forum does not let you use HTML code; you use UBB code in its place.

UBB code consists of a fixed set of UBB tags with a uniform format. Users get the effect they want simply by following the code rules. For example:

To display how are you in bold, you enter [b]how are you[/b] instead of <b>how are you</b>.

You may ask: how does ASP convert [b]how are you[/b] into <b>how are you</b>? The answer is: use a regular expression.

II. Example analysis

1) Precisely finding link addresses in a string

((http|https|ftp):(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)

We know that link addresses generally begin with http, https or ftp. As a first summary, a link address must satisfy the following conditions:

Condition 1

It begins with http://, https:// or ftp:// (there are other forms, of course; only the main ones are listed here).

Condition 2

http:// must be followed by a run of word characters and then a "." (this combination must appear at least once). Right after the last "." comes the domain suffix (such as net, com or cn; for an address in IP form it can be a number).

Condition 3

After a complete link address, one or more levels of directory may follow (note that personal home page addresses may contain the "~" symbol).

Condition 4

The link address may end with parameters, typically of the form ?pageno=2&action=display.

Now let's use the following pieces of the expression to satisfy each condition in turn.

1. ((http|https|ftp):(\/\/|\\\\) satisfies condition 1

It matches http://, https:// and ftp:// (it also allows for users who mistakenly type "\\" instead of "//").

Note: "|" means "or" and "\" is the escape character; "\/\/" means "//" and "\\\\" means "\\".

2. ((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3}) satisfies condition 2

"((\w)+[.]){1,}" means that a run of word characters followed by a dot appears one or more times (this allows for users who like to omit www and write http://www.w3c.com as http://w3c.com).

"(net|com|cn|org|cc|tv|[0-9]{1,3})" means the host must end with net, com, cn, org, cc or tv, or with a number of at most three digits.

[0-9]{1,3} stands for a number of up to three digits, because no segment of an IP address can exceed 255.

3. (((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)* satisfies condition 3

"(\/[\~]*|\\[\~]*)" means "/" or "\", optionally followed by "~" ("[\~]*" means "~" may or may not appear), because not every link address has a next-level directory containing "~".

"((\w)+)|[.](\w)+" means a run of word characters, or a dot followed by word characters (that is, a directory, or a file with an extension).

Note: the trailing "*" means that everything in the parentheses above may be absent; without it, only link addresses that have a next-level directory would match.

4. (((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)* satisfies condition 4

"((([?](\w)+){1}[=]*))*((\w)+){1}" means that a string of the form "?pageno=2" may or may not appear, and if it does, it appears only once (because two "?" cannot appear).

"([\&](\w)+[\=](\w)+)*" means that a string of the form "&action=display" may or may not appear (because not every page has two or more parameters).

The whole of "(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*" means that a string of the form "?pageno=2&action=display" may or may not appear (that is, the link address may or may not carry parameters).

Combining the pieces above, we can match link addresses fairly comprehensively, which is better than matching them with a simple "(http:\/\/\S+)"; readers can test and compare the two for themselves. Of course, this expression still has many shortcomings, and I hope everyone will keep improving it.
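To make the comparison concrete, here is a JavaScript sketch that assembles a link pattern from four pieces, one per condition, and sets it beside the simple pattern. The pieces are a reconstruction under the four conditions listed above, and the variable names and sample sentences are invented for the demonstration:

```javascript
// One piece per condition; .source gives the pattern text of each literal.
const scheme = /(http|https|ftp):(\/\/|\\\\)/;                             // condition 1
const host = /((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})/;            // condition 2
const path = /(((\/[~]*|\\[~]*)(\w)+)|[.](\w)+)*/;                         // condition 3
const query = /(((([?](\w)+){1}[=]*))*((\w)+){1}([&](\w)+[=](\w)+)*)*/;    // condition 4

const linkPattern = new RegExp(
  "(" + scheme.source + host.source + path.source + query.source + ")"
);
const simplePattern = /http:\/\/\S+/;

// A link with directories and parameters is picked out of the sentence.
const withQuery =
  "see http://www.w3c.com/dir/page.asp?pageno=2&action=display now";
const foundWithQuery = withQuery.match(linkPattern)[0];

// At the end of a sentence, the simple pattern also swallows the full stop.
const endOfSentence = "visit http://www.w3c.com.";
const foundPrecise = endOfSentence.match(linkPattern)[0];
const foundSimple = endOfSentence.match(simplePattern)[0];

console.log(foundWithQuery);
console.log(foundPrecise); // http://www.w3c.com
console.log(foundSimple);  // http://www.w3c.com.
```

Because the pattern nests repeated groups, it can backtrack heavily on long inputs that almost match, which is one of the shortcomings worth improving.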

2) Replacing typical UBB tags

Our goal is to replace the [b][/b] pair with <b></b>. Let's look at the template that implements it:

(\[b\])(.+)(\[\/b\])

Here "(.+)" matches the entire string between [b] and [/b]. When replacing, we must write:

str = checkexp(re, str, "<b>$2</b>")

(Note: checkexp is my custom function, given later. It performs the replacement according to the template we provide.)

You may ask what "$2" is doing there. Pay attention to this $2: it stands for the entire string matched by "(.+)".

Why $2 rather than $1 or $3? Because $1 stands for the "[b]" string matched by (\[b\]) and $3 stands for the "[/b]" string matched by (\[\/b\]); clearly what we need here is $2, not $1 or $3.
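The role of $1, $2 and $3 can be seen in a few lines of JavaScript. This sketch uses the built-in String replace() method in place of the author's checkexp function, and boldToHtml is a hypothetical name:

```javascript
const ubbBold = /(\[b\])(.+)(\[\/b\])/;

// Keep only the second group and wrap it in HTML bold tags.
function boldToHtml(text) {
  return text.replace(ubbBold, "<b>$2</b>");
}

// Show what each numbered group captured.
const groups = "[b]how are you[/b]".replace(ubbBold, "$1|$2|$3");

console.log(boldToHtml("[b]how are you[/b]")); // <b>how are you</b>
console.log(groups);                           // [b]|how are you|[/b]
```

Since (.+) is greedy, a line containing several [b]...[/b] pairs would be treated as one long bold span; handling that case needs a more careful pattern.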

III. UBB regular expression template example

Below is a UBB function I wrote. It can basically turn your forum into an excellent UBB code forum; of course, with improvements you can get an even more powerful UBB forum.

Function Rethestr(face, str)
Dim re
re = "\>"
str = checkexp(re, str, "&gt;")
re = "\<"
str = checkexp(re, str, "&lt;")
re = "\n\r\n"
str = checkexp(re, str, "<P>")
re = Chr(32)
str = checkexp(re, str, "&nbsp;")
re = "\r"
str = checkexp(re, str, "<br>")

re = "\[img\]((http:(\/\/|\\\\)){1}((\w)+[.]){1,3}(net|com|cn|org|cc|tv)(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(\w)+[.]{1}(gif|jpg|png))\[\/img\]" ' Find the image address

Str = ChecKexp (Re, Str, "

re = "\[w\](http:(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv)(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)\[\/w\]" ' Find the frame address

Str = Checkexp (Re, Str, "