The use of regular expressions in network programming

xiaoxiao2021-03-06


[Foreword] When writing web programs, we often need to check whether a string is valid. Typical questions are whether a string is a number, or whether it is a well-formed e-mail address. Without regular expressions, the checking code becomes long and error-prone. With them, these checks are easy. This article introduces the format of regular expressions and gives readers a practical feel for them through examples in PHP and ASP. Regular expressions have many uses, so expect to consolidate what you learn through practice.

A brief introduction to regular expressions

A regular expression is a powerful tool for pattern matching and replacement. In web programming, server-side scripting languages such as PHP support regular expressions, and so do client-side scripts such as JavaScript and VBScript. Regular expressions have thus outgrown any single language or system and become a widely accepted concept and facility. A regular expression lets you build a pattern from a series of special characters and compare it against data files, program input or web pages. You then run different code depending on whether the target contains a match.

A typical use is verifying that an e-mail address entered online is well-formed. If the address matches the pattern, the form data is submitted normally. If it does not, a prompt pops up asking the user to re-enter a valid address. Regular expressions therefore play a pivotal role in the validation logic of web applications. A detailed example follows later.
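The foreword contrasts a hand-written check with a regular expression. Here is a minimal JavaScript sketch of that contrast for the question "is this string a number?". It assumes that "a number" means one or more decimal digits and nothing else (no sign, no decimal point). isDigitsManual and isDigitsRegex are hypothetical names used only for illustration.

```javascript
// Hand-written check: walk the string and reject any non-digit character.
function isDigitsManual(s) {
  if (s.length === 0) return false; // an empty string is not a number
  for (const ch of s) {
    if (ch < "0" || ch > "9") return false;
  }
  return true;
}

// Regex check: ^ and $ anchor the pattern, so the whole string must be digits.
function isDigitsRegex(s) {
  return /^[0-9]+$/.test(s);
}

for (const input of ["2001", "20a1", ""]) {
  console.log(JSON.stringify(input), isDigitsManual(input), isDigitsRegex(input));
}
```

Both functions give the same answers. The regex version states the rule in one line, which is the point the foreword makes.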
Regular expressions are generally written like /love/, where the part between the "/" delimiters is the pattern to be matched in the target object. To search for a pattern, simply place its content between the "/" delimiters.

For more flexible patterns, regular expressions provide special "metacharacters". These are characters with a special meaning inside a regular expression. They specify how the preceding character (the character just before the metacharacter) may occur in the target. Commonly used metacharacters include "+", "*", "?" and "{}", along with the escape sequences "\s", "\S", "\d", "\w" and "\W". A regular expression can also use [] to match any one character from a set, instead of being limited to a single specific character.

Besides metacharacters, regular expressions have another kind of special character: the locator (anchor). A locator specifies where in the target the pattern must appear. Commonly used locators include "^", "$", "\b" and "\B". To express "or", that is, to choose among several alternative patterns, use the pipe character "|".

Regular expressions also have a general-purpose negation operator, "[^]". It differs from the locator "^" described above: "[^]" specifies characters that must not appear at that position in the target. In general, "^" inside "[]" is a negation operator, while "^" outside "[]" is a locator.
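JavaScript regex literals use the same /.../ delimiters, so the metacharacters, locators, alternation and negation described above can be tried there directly. This minimal sketch assumes Node.js or a browser console. Each comment gives the result of test() for that pattern and sample string.

```javascript
// Each entry: [pattern, sample]; test() reports whether the pattern occurs in the sample.
const examples = [
  [/love/, "I love regex"],   // literal pattern between the / delimiters -> true
  [/lo+ve/, "loooove"],       // + : one or more of the preceding character -> true
  [/colou?r/, "color"],       // ? : zero or one of the preceding character -> true
  [/\d{3}/, "abc123"],        // \d with {3}: three digits in a row -> true
  [/^love/, "I love regex"],  // ^ locator: must appear at the start -> false
  [/regex$/, "I love regex"], // $ locator: must appear at the end -> true
  [/cat|dog/, "hotdog"],      // | : either alternative -> true
  [/^[^0-9]+$/, "abc"],       // ^ inside [] negates: no digit anywhere -> true
];

for (const [pattern, text] of examples) {
  console.log(String(pattern), JSON.stringify(text), pattern.test(text));
}
```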

Finally, sometimes a metacharacter itself must be part of the pattern and matched literally. In that case, use the escape character "\". For example, /Th\*/ matches "Th*" in the target rather than "The".

Syntax rules and tokens of regular expressions

Now we formally enter regular-expression syntax. I will explain each element with examples. After reading, you will find that writing UBB code is simple, and by following this article step by step you can become a UBB master. Better still, you can write your own UBB tags instead of copying other people's code and templates. Fortunately, VBScript 5.0 provides a RegExp ("Regular Expression") object, which is available as long as IE 5.x is installed on your server.

Characters and their descriptions:

^  Matches the beginning of the string. For example, ^abc matches "abc xyz" but not "xyz abc".
$  Matches the end of the string. For example, abc$ matches "xyz abc" but not "abc xyz". Note: using ^ and $ together requires an exact match; for example, ^abc$ matches only "abc".
*  Matches 0 or more of the preceding character. For example, ab* matches "a", "ab", "abb", "abbb", and so on.
+  Matches 1 or more of the preceding character. For example, ab+ matches "ab", "abb", "abbb", and so on, but not "a".
?  Matches 0 or 1 of the preceding character. For example, as a whole string, ab?c? can match only "a", "ab", "ac" and "abc".
.  Matches any character except a newline. For example, (.)+ matches an entire string up to a newline.
x|y  Matches "x" or "y". For example, abc|xyz matches "abc" or "xyz", and ab(c|x)yz matches "abcyz" and "abxyz".
{n}  Matches exactly n occurrences of the preceding character (n is a non-negative integer). For example, a{2} matches "aa" but not "a".
{n,}  Matches at least n occurrences of the preceding character (n is a non-negative integer). For example, a{3,} matches "aaa", "aaaa", and so on, but not "a" or "aa". Note: a{1,} is equivalent to a+, and a{0,} is equivalent to a*.
{m,n}  Matches at least m and at most n occurrences of the preceding character. For example, a{1,3} matches "a", "aa" and "aaa". Note: a{0,1} is equivalent to a?.
[xyz]  A character set: matches any one of the characters in the brackets. For example, [abc] matches "a", "b" or "c".
[^xyz]  A negated character set: matches any character not in the brackets. For example, [^abc] matches any character other than "a", "b" and "c".
[a-z]  A character range: matches any character within the specified range. For example, [a-z] matches any lowercase letter from "a" to "z".
[^m-n]  A negated range: matches any character outside the specified range. For example, [^m-n] matches any character except "m" through "n".
\  The escape operator. For example, \n matches a newline, \f a form feed, \r a carriage return, \t a tab and \v a vertical tab. Likewise, \\ matches "\" and \/ matches "/".
\s  Matches any whitespace character, including space, tab and form feed; equivalent to "[ \f\n\r\t\v]".
\S  Matches any non-whitespace character; equivalent to "[^ \f\n\r\t\v]".
\w  Matches any word character: letters, digits and the underscore; equivalent to "[A-Za-z0-9_]".
\W  Matches any non-word character; equivalent to "[^A-Za-z0-9_]".
\b  Matches a word boundary. For example, ve\b matches the "ve" in "love" but not in "very" or "even".
\B  Matches a position that is not a word boundary. For example, ve\B matches the "ve" in "very" but not in "love".
\d  Matches a digit; equivalent to [0-9]. For example, abc\dxyz matches "abc2xyz", "abc4xyz", and so on, but not "abcaxyz" or "abc-xyz".
\D  Matches a non-digit; equivalent to [^0-9]. For example, abc\Dxyz matches "abcaxyz", "abc-xyz", and so on, but not "abc2xyz" or "abc4xyz".
\num  Where num is a positive integer: a back-reference to a remembered (captured) match. For example, (.)\1 matches two consecutive identical characters.
\0nn  An octal escape value less than 256. For example, \011 matches a tab.
\xnn  Matches the character whose hexadecimal code is nn (less than 256). For example, \x41 matches "A".

Application examples

Once you have a fairly complete understanding of regular expressions, you can use them in Perl, PHP, ASP and other languages.
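Most entries in this table behave the same way in JavaScript's RegExp, so they can be checked quickly there. The sketch below is a minimal illustration, not part of the original article. fullMatch is a hypothetical helper that anchors a pattern with ^ and $, so that "matches" means "matches the whole string". Note that JavaScript's \s covers more whitespace characters than the [ \f\n\r\t\v] listed above.

```javascript
// Hypothetical helper: true only if the pattern matches the WHOLE string.
function fullMatch(source, text) {
  return new RegExp("^(?:" + source + ")$").test(text);
}

// ? : zero or one of the preceding character
console.log(["a", "ab", "ac", "abc", "abbc"].filter(s => fullMatch("ab?c?", s)).join(", ")); // a, ab, ac, abc
// {m,n} : between m and n repetitions of the preceding character
console.log(["a", "aa", "aaa", "aaaa"].filter(s => fullMatch("a{1,3}", s)).join(", "));     // a, aa, aaa

// \b and \B : word boundary / not a word boundary right after "ve"
console.log(/ve\b/.test("love"), /ve\b/.test("very")); // true false
console.log(/ve\B/.test("very"), /ve\B/.test("love")); // true false

// \d and \D : digit / non-digit
console.log(/abc\dxyz/.test("abc2xyz"), /abc\Dxyz/.test("abc2xyz")); // true false

// \1 refers back to whatever group 1 captured
console.log(/(.)\1/.test("hello"), /(.)\1/.test("helo")); // true false

// \x41 is "A"; \* matches a literal asterisk
console.log(/\x41/.test("A"), /Th\*/.test("Th*"), /Th\*/.test("The")); // true true false
```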
The following takes PHP as an example, using regular expressions to verify whether an e-mail address and a URL entered online are well-formed. PHP provides two functions, ereg() and eregi(), for matching a string against a pattern. ereg() is called as ereg(pattern, string), where pattern is the regular expression. string is the target object to search, such as the e-mail address value. The function analyzes string according to the rules in pattern and returns true if a match is found. The two functions differ in one way: ereg() is case-sensitive, while eregi() ignores case. (Note: ereg() and eregi() were deprecated in PHP 5.3 and removed in PHP 7.0. Current PHP code uses preg_match() instead, with the i modifier for case-insensitive matching.)
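JavaScript has no ereg()/eregi() pair. In the sketch below, the closest analogue, assumed here for illustration, is the same RegExp written with and without the i (ignore case) flag. The sample address is made up.

```javascript
// Case-sensitive, like ereg(): letters must match exactly.
const sensitive = /^[a-z]+@[a-z]+\.com$/;
// Case-insensitive, like eregi(): the i flag ignores letter case.
const insensitive = /^[a-z]+@[a-z]+\.com$/i;

const address = "Webmaster@Example.COM";
console.log(sensitive.test(address));   // false: uppercase letters are not in [a-z]
console.log(insensitive.test(address)); // true
```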

The PHP program for this example performs a simple check on the user's input:

- The e-mail string must contain the "@" character.
- Before the "@" it may contain only lowercase English letters, digits or the underscore "_".
- After the "@" come several dot-separated segments.
- After the last dot there must be only two or three lowercase English letters.

For example, webmaster@mail.sever.net and hello_2001@88new.cn pass the check. NEW99@253.com (uppercase letters) fails, and so does new99@253.comn (more than three letters after the last dot).

We can also perform such checks by calling a custom function built on a regular expression, such as the following URL check:

    function VerifyWebsiteAddr($strWebsiteAddr) {
        // preg_match() with the i modifier replaces eregi(), which PHP 7 removed.
        // Pattern: zero or more subdomain labels, one domain label, then a 2- or 3-letter suffix.
        return preg_match('/^([_0-9a-z-]+\.)*([0-9a-z-]+\.)[a-z]{2,3}$/i', $strWebsiteAddr) === 1;
    }

PHP programs require server support. To implement the check above on your own home page without it, the embedded scripting language JavaScript may be a good choice. JavaScript has a powerful RegExp object for regular-expression matching. Its test() method checks whether the target contains a match of the pattern and returns true or false accordingly. Just add the following JavaScript code to the HTML document:

    function verifyAddress(obj) {
        var email = obj.email.value;
        // Word characters or "-", then "@", then one or more ".segment" parts;
        // the trailing $ rejects extra characters after the last segment.
        var pattern = /^([a-zA-Z0-9_-])+@([a-zA-Z0-9_-])+(\.[a-zA-Z0-9_-]+)+$/;
        var flag = pattern.test(email);
        if (flag) {
            alert("Your e-mail address passed the check!");
            return true;
        } else {
            alert("Not a valid e-mail address, please re-enter it!");
            return false;
        }
    }

Then, in the web page's form tag, call verifyAddress() from the form's onsubmit handler, for example onsubmit="return verifyAddress(this);". The form needs a field named email, because the function reads obj.email.value. When the submit button is pressed, verifyAddress() runs first and checks the input. If the check passes, the form data is sent to the target page; otherwise an error message is shown.
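The e-mail rule described above can be sketched in JavaScript as a single anchored pattern. The original PHP code is not shown in this article, so EMAIL_RULE and isValidEmail are hypothetical names. Allowing "-" in the domain segments is also an assumption. The four sample addresses are the ones from the text.

```javascript
// Hypothetical pattern encoding the rule described in the text:
//   ^[a-z0-9_]+       lowercase letters, digits or "_" before the "@"
//   ([a-z0-9-]+\.)+   one or more dot-terminated domain segments ("-" allowed: an assumption)
//   [a-z]{2,3}$       two or three lowercase letters after the last dot
const EMAIL_RULE = /^[a-z0-9_]+@([a-z0-9-]+\.)+[a-z]{2,3}$/;

function isValidEmail(address) {
  return EMAIL_RULE.test(address);
}

const samples = [
  "webmaster@mail.sever.net", // passes
  "hello_2001@88new.cn",      // passes
  "NEW99@253.com",            // fails: uppercase letters
  "new99@253.comn",           // fails: four letters after the last dot
];
for (const s of samples) {
  console.log(s, isValidEmail(s));
}
```

Because the pattern has no i flag, the uppercase rejection comes directly from the [a-z] classes.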

In fact, regular expressions can do far more than this. Next time I will give an example of extracting any kind of text information from a specified web page, such as the file names of all pictures in the page.

In HTML source, pictures are defined with the image tag <img src="...">. Previously we introduced the concept of regular expressions and their use in network programming to verify whether an e-mail address or URL entered online is well-formed. Today we introduce a technique for parsing all picture file names (including paths) out of a specified web page's source file. For example, from a tag such as <img src="../../abc.jpg"> we want the file name "../../abc.jpg" (some pictures may be in GIF format). Programming environment: PHP + Apache on Windows 98. First, create a new PHP file with a text editor: AbstractSrcFromPage.php3.
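The core of this task is one pattern that finds <img ... src="..."> and captures the file name. Here is a minimal JavaScript sketch. The page source is a made-up string rather than a fetched URL, and IMG_SRC is a hypothetical name. It uses String.prototype.matchAll to collect every capture in one pass, instead of splitting the text twice.

```javascript
// Made-up page source; in the article the text comes from a URL or a local file.
const source = `
<p><img src="../../abc.jpg" alt="a"></p>
<IMG border=0 SRC="images/logo.gif">
<img src="photo.png">
`;

// <img ... src="..."> whose file name ends in .gif or .jpg:
//   [^>]*          skips other attributes inside the same tag
//   ([^"<>|*]+...) captures the path and file name up to the closing quote
//   g flag: find every tag; i flag: also accept <IMG SRC=...>
const IMG_SRC = /<img[^>]*src="([^"<>|*]+\.(?:gif|jpg))"/gi;

const names = [...source.matchAll(IMG_SRC)].map(m => m[1]);
console.log(names.length + " picture(s):");
names.forEach((name, i) => console.log(i + 1 + " => " + name));
```

The .png tag is skipped because the pattern, like the article's, only accepts .gif and .jpg file names.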

For convenience, we type the URL (or local file name) of the web page into a form field in the browser and run the analysis after the form is submitted. So we first create a form for entering the URL.
Next comes the PHP code that looks for a tag such as <img src="..."> and extracts the file names:

    // $source holds the page text read with fgets() (see the design notes below).
    // Check whether $source contains a tag like <img ... src="....gif"> or <img ... src="....jpg">;
    // preg_match()/preg_split() with the i modifier stand in for eregi()/spliti(), which PHP 7 removed.
    if (preg_match('/<img[^>]*src="[^"*<>|]+\.(gif|jpg)"/i', $source)) {
        echo "Found picture tags :)<br>";
    } else {
        echo "Did not find picture tags :(<br>";
    }
    // First split: cut the source at every <img ... src=" so each piece after the first begins with a file name.
    $splitres = preg_split('/<img[^>]*src="/i', $source);
    $imagenums = count($splitres);
    echo "Found: " . ($imagenums - 1) . " picture(s):<br>";
    for ($i = 1; $i < $imagenums; $i++) {
        // Second split: a file name cannot contain '"', so the first element is the path and file name.
        unset($imgname);                          // clear $imgname before use
        $imgname = explode('"', $splitres[$i]);   // put the wanted picture information into $imgname
        echo "$i => " . $imgname[0] . "<br>";     // output the picture information
    }

The program works as follows:

1. The PHP script first checks whether a file name (a URL or a local file name) was entered.
2. If so, it opens the file in read-only mode.
3. It reads the file with fgets(fp, length), which returns up to length - 1 characters of the line at the file pointer fp. In the example above that is 1024 - 1 = 1023.
4. A regular-expression match then checks whether $source contains an <img src="..."> tag (explained in detail in the previous article).
5. If one is found, the script splits the text twice to extract the file name from each <img src="..."> tag.

You may ask: how does the UBB code [b]how are you[/b] get converted into <b>how are you</b>? The answer is: with a regular expression.
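As a first look at that answer, here is a minimal JavaScript sketch of the [b] replacement using String.prototype.replace, where "$2" in the replacement string inserts capture group 2. ubbBold is a hypothetical name. The lazy (.+?) is a deliberate change from a greedy (.+), so that two [b]...[/b] pairs on one line are converted separately. The i flag, which also accepts [B]...[/B], is an assumption.

```javascript
// Group 1 = "[b]", group 2 = the text in between, group 3 = "[/b]".
// g: replace every pair; i: also accept [B]...[/B] (an assumption).
const BOLD = /(\[b\])(.+?)(\[\/b\])/gi;

function ubbBold(text) {
  // "$2" in the replacement string refers to capture group 2.
  return text.replace(BOLD, "<b>$2</b>");
}

console.log(ubbBold("[b]how are you[/b]"));        // <b>how are you</b>
console.log(ubbBold("[b]one[/b] and [b]two[/b]")); // <b>one</b> and <b>two</b>
```

With a greedy (.+), the second example would produce a single bold span running from the first [b] to the last [/b].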

Second, instance analysis

1) Precisely finding link addresses in a string

((http|https|ftp):(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3})(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)

Link addresses generally begin with http, https or ftp. To summarize, a link address must satisfy the following conditions:

Condition 1: it begins with http://, https://, ftp://, and so on (other forms exist, but these are the main ones).
Condition 2: "http://" is followed by word characters and a "." (this combination must occur once or more). These are immediately followed by a domain suffix such as net, com or cn, or, for an IP address, by a number.
Condition 3: a complete link address may be followed by one or more levels of directories (note also the "~" in personal home-page addresses).
Condition 4: the link address may end with parameters, such as the typical page number ?pageno=2&action=display.

1. (http|https|ftp):(\/\/|\\\\) satisfies condition 1. http://, http:\\, https://, https:\\, ftp:// and ftp:\\ all match (some users mistype "//" as "\\"). Here "|" means "or" and "\" is the escape character, so "\/\/" stands for "//" and "\\\\" stands for "\\".

2. ((\w)+[.]){1,}(net|com|cn|org|cc|tv|[0-9]{1,3}) satisfies condition 2. "((\w)+[.]){1,}" means that word characters followed by a dot may occur once or more (some users omit www and write http://w3c.com). "(net|com|cn|org|cc|tv|[0-9]{1,3})" means the host part must end with net, com, cn, org, cc or tv. Alternatively, it may end with a number of one to three digits, since no segment of an IP address exceeds 255.

3. (((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)* satisfies condition 3. "(\/[\~]*|\\[\~]*)" means "/" or "\", optionally followed by "~" ("[\~]*" means "~" may or may not appear). "(\w)+)|[.](\w)+" means that what follows must be either word characters, forming a directory name, or a dot followed by word characters, forming a file extension. The final "*" means the whole group may also be absent, because not every link address has a next-level directory. Without it, only links with a subdirectory would match.

4. (((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)* satisfies condition 4. "((([?](\w)+){1}[=]*))*((\w)+){1}" means a string such as "?pageno=2" may or may not appear. If it appears, the "?" part should appear only once, since a URL does not contain two "?" marks. "([\&](\w)+[\=](\w)+)*" means a string such as "&action=display" may or may not appear, because not every page takes more than one parameter. As a whole, the group means that a string such as "?pageno=2&action=display" may or may not appear; that is, the link address may or may not carry parameters.

Combining these parts, we can match link addresses fairly comprehensively. Readers can test this expression side by side with the simple "(http:\S+)" to compare them. Of course, this expression still has many shortcomings, and I hope everyone will keep improving it.

2) Replacing typical UBB tags: [b][/b]

Our goal is to replace [b][/b] with <b></b>. Here is the template that implements it:

(\[b\])(.+)(\[\/b\])

Here "(.+)" matches the entire string between [b] and [/b]. For the replacement we write:

str = checkexp(re, str, "<b>$2</b>")

(Note: checkexp is my custom function, given later. It replaces the matches using the template we provide.)

You may ask what "$2" is. This $2 is very important: it stands for the whole string matched by "(.+)". Why $2 rather than $1 or $3? $1 stands for the "[b]" string matched by (\[b\]), and $3 stands for the "[/b]" string matched by (\[\/b\]). Clearly we need $2.

Third, UBB regular-expression template example

The following is a UBB function I wrote. It can basically turn your forum into a good UBB-code forum, and with further improvement you can get an even more powerful one.

Function ReThestr(face, str)
    Dim re   ' str is already a parameter, so declaring it again would raise "Name redefined"
    re = "\>"
    str = checkexp(re, str, "&gt;")
    re = "\<"
    str = checkexp(re, str, "&lt;")
    re = "\n\r\n"
    str = checkexp(re, str, "<p>")
    re = Chr(32)
    str = checkexp(re, str, "&nbsp;")
    re = "\r"
    str = checkexp(re, str, "")
    re = "\[img\]((http:(\/\/|\\\\)){1}((\w)+[.]){1,3}(net|com|cn|org|cc|tv)(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(\w)+[.]{1}(gif|jpg|png))\[\/img\]"   ' find image address
    str = checkexp(re, str, "<img src=""$1"">")
    re = "\[w\](http:(\/\/|\\\\)((\w)+[.]){1,}(net|com|cn|org|cc|tv)(((\/[\~]*|\\[\~]*)(\w)+)|[.](\w)+)*(((([?](\w)+){1}[=]*))*((\w)+){1}([\&](\w)+[\=](\w)+)*)*)\[\/w\]"   ' find frame address
    str = checkexp(re, str, "