Introduction to regular expressions (repost)

zhaozj  2021-02-16  93


If you have never worked with regular expressions, the term and the concept may be unfamiliar. They are probably less exotic than you imagine, though.

Recall how you search for files on a hard disk. You almost certainly use the ? and * characters to help find the files you want. The ? character matches a single character in a file name, and * matches zero or more characters. A pattern such as 'data?.dat' finds the following files:

data1.dat
data2.dat
dataX.dat
dataN.dat

Using * instead of ? widens the set of files found. 'data*.dat' matches all of the following file names:

data.dat
data1.dat
data2.dat
data12.dat
dataX.dat
dataXYZ.dat

Although this way of searching for files is certainly useful, it is also very limited. The limited power of the ? and * wildcards gives you an idea of what regular expressions can do, but regular expressions are far more powerful and flexible.

--------------------------------------------------
2. Early origins of regular expressions

The "ancestors" of regular expressions can be traced back to early research on how the human nervous system works. Two neurophysiologists, Warren McCulloch and Walter Pitts, developed a mathematical way of describing these neural networks. In 1956 the American mathematician Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets" that introduced the concept of regular expressions. Regular expressions were the expressions used to describe what he called "the algebra of regular sets", hence the term "regular expression".

This work later found its way into early research on computational search algorithms by Ken Thompson, the principal inventor of Unix. The first practical application of regular expressions was the qed editor on Unix. The rest, as they say, is history: regular expressions have been an important part of text-based editors and search tools ever since.
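The wildcard comparison at the start of this introduction can be made concrete. The sketch below uses Python, which is an assumption of this example (the article itself targets VBScript and JScript), with the standard fnmatch module for the ? and * wildcards and the re module for hand-written regular expressions intended to be equivalent. The file list is the one from the text; the helper names are hypothetical.

```python
import fnmatch
import re

# File names taken from the wildcard example in the text.
files = ["data.dat", "data1.dat", "data2.dat",
         "data12.dat", "dataX.dat", "dataXYZ.dat"]

def glob_hits(pattern):
    """File names matched by a shell-style wildcard pattern."""
    return [f for f in files if fnmatch.fnmatchcase(f, pattern)]

def regex_hits(pattern):
    """File names matched in full by a regular expression."""
    return [f for f in files if re.fullmatch(pattern, f)]

# '?' stands for exactly one character; the regex spelling is '.'.
one_char_glob = glob_hits("data?.dat")
one_char_regex = regex_hits(r"data.\.dat")

# '*' stands for zero or more characters; the regex spelling is '.*'.
any_chars_glob = glob_hits("data*.dat")
any_chars_regex = regex_hits(r"data.*\.dat")

# Beyond ? and *: 'data' followed by one or two digits only.
digits_only = regex_hits(r"data\d{1,2}\.dat")

print(one_char_glob, any_chars_glob, digits_only)
```

The first two pairs select the same files with either notation. The last pattern restricts the middle part to one or two digits, which cannot be written with ? and * alone.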
--------------------------------------------------
3. Using regular expressions

In a typical search-and-replace operation you must supply the exact text to find. That may be adequate for simple search-and-replace tasks in static text, but it lacks flexibility, which makes searching dynamic text difficult or even impossible.

With regular expressions, you can:

1. Test for a pattern within a string. For example, you can test an input string to see whether a telephone number pattern or a credit card number pattern occurs in it. This is called data validation.
2. Replace text. You can use a regular expression to identify particular text in a document and then delete it or replace it with other text.
3. Extract a substring from a string based on a pattern match. This can be used to find specific text in a document or an input field.

For example, if you need to search an entire Web site to remove some outdated material and replace some HTML formatting tags, you can use a regular expression to test each file and see whether it contains the material or the HTML formatting tags you are looking for. That narrows the affected files down to those that actually contain material to be removed or changed. You can then use a regular expression to remove the outdated material, and finally use regular expressions again to find and replace the tags that need replacing.

Another case where regular expressions are useful is a language that is not known for its string-handling ability. VBScript, a subset of Visual Basic, has rich string-handling features. JScript, like C, does not.
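The three uses listed above (testing, replacing, extracting) can be sketched in a few lines. The example uses Python's re module rather than the VBScript or JScript RegExp object, which is an assumption of this sketch. The card-number pattern is deliberately simplistic and hypothetical, as are the function names and the choice of tags to replace.

```python
import re

# Hypothetical, deliberately simple pattern: four groups of four digits,
# optionally separated by a space or a hyphen. Real validation needs more.
CARD = re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b")

def contains_card_number(text):
    """Test: does a card-number-like pattern occur anywhere in the text?"""
    return CARD.search(text) is not None

def replace_bold_tags(html):
    """Replace: turn <b> and </b> into <strong> and </strong>."""
    # Group 1 captures the optional slash so closing tags stay closing tags.
    return re.sub(r"<(/?)b>", r"<\1strong>", html)

def extract_titles(html):
    """Extract: return the text found between <title> and </title>."""
    return re.findall(r"<title>(.*?)</title>", html)

page = "<title>Prices</title><p><b>Old</b> offer</p>"
print(contains_card_number("Card: 1234-5678-9012-3456"))
print(replace_bold_tags(page))
print(extract_titles(page))
```

Testing each file with a function like contains_card_number before editing it corresponds to the narrowing step described in the Web site example.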

Regular expressions give JScript a significant improvement in string handling. They can also be more efficient to use in VBScript, because they allow multiple string operations to be performed in a single expression.

--------------------------------------------------
4. Regular expression syntax

A regular expression is a pattern of text made up of ordinary characters (for example, the letters a through z) and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression acts as a template that matches a character pattern against the string being searched.

Here are some regular expressions you may encounter:

JScript               VBScript              Matches
/^[ \t]*$/            "^[ \t]*$"            A blank line.
/\d{2}-\d{5}/         "\d{2}-\d{5}"         An ID number consisting of 2 digits, a hyphen, and 5 more digits.
/<(.*)>.*<\/\1>/      "<(.*)>.*<\/\1>"      An HTML tag.

The following table is a complete list of metacharacters and their behavior in the context of regular expressions:

Character     Description
\             Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(".
^             Matches the position at the beginning of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$             Matches the position at the end of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*             Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+             Matches the preceding subexpression one or more times.
              For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?             Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" and "does". ? is equivalent to {0,1}.
{n}           n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}          n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}         m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood".

              'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?             When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.             Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (inside square brackets a period is a literal period).
(pattern)     Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match the parenthesis characters, use '\(' or '\)'.
(?:pattern)   Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern)   Positive lookahead. Matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
(?!pattern)   Negative lookahead. Matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but not "Windows" in "Windows 2000". Lookaheads do not consume characters: after a match occurs, the search for the next match begins immediately after the last match, not after the characters that made up the lookahead.
x|y           Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]         A character set. Matches any one of the enclosed characters.

              For example, '[abc]' matches the 'a' in "plain".
[^xyz]        A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]         A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
[^a-z]        A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
\b            Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B            Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx           Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\d            Matches a digit character. Equivalent to [0-9].
\D            Matches a non-digit character. Equivalent to [^0-9].
\f            Matches a form feed. Equivalent to \x0c and \cL.
\n            Matches a newline. Equivalent to \x0a and \cJ.
\r            Matches a carriage return. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S            Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t            Matches a tab. Equivalent to \x09 and \cI.
\v            Matches a vertical tab. Equivalent to \x0b and \cK.
\w            Matches any word character, including underscore. Equivalent to '[A-Za-z0-9_]'.
\W            Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn           Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long.
              For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". This allows ASCII codes to be used in regular expressions.
\num          Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n            Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm           Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml          Matches the octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un           Matches n, where n is a Unicode character expressed as four hexadecimal digits.

              For example, \u00A9 matches the copyright symbol (©).

--------------------------------------------------
5. Building a regular expression

Regular expressions are constructed in the same way as arithmetic expressions: smaller expressions are combined, using a variety of metacharacters and operators, to create larger ones.

You construct a regular expression by placing the components of the expression pattern between a pair of delimiters. For JScript, the delimiters are a pair of forward slash (/) characters. For example:

/expression/

For VBScript, a pair of quotation marks ("") delimits the regular expression. For example:

"expression"

In both examples above, the regular expression pattern is stored in the Pattern property of the RegExp object.

--------------------------------------------------
6. Order of precedence

Once constructed, a regular expression is evaluated much like an arithmetic expression: from left to right, following an order of precedence.

The following table lists the regular expression operators from highest to lowest precedence:

Operator(s)                   Description
\                             Escape
(), (?:), (?=), []            Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}     Quantifiers
^, $, \anymetacharacter       Anchors and sequences
|                             Alternation ("or")

--------------------------------------------------
7. Ordinary characters

Ordinary characters consist of all printable and non-printable characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase alphabetic characters, all digits, all punctuation marks, and some symbols.

The simplest regular expression is a single ordinary character that matches itself in the searched string. For example, the single-character pattern 'a' matches the letter 'a' wherever it appears in the searched string.
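Several of the behaviors described in the syntax table (section 4) and the order of precedence (section 6) can be checked with a short script. This is only a sketch: it uses Python's re module, which supports the constructs shown here, rather than the VBScript or JScript RegExp object, and the helper name found is hypothetical.

```python
import re

def found(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m else None

# Quantifiers: {n} is exact, {n,m} takes as many as allowed.
exact_two = found(r"o{2}", "food")
none_in_bob = found(r"o{2}", "Bob")
up_to_three = found(r"o{1,3}", "fooooood")

# Greedy versus non-greedy: a trailing ? makes the quantifier minimal.
greedy = found(r"o+", "oooo")
lazy = found(r"o+?", "oooo")

# Lookaheads test what follows without including it in the match.
ahead = found(r"Windows (?=95|98|NT|2000)", "Windows 2000")
ahead_miss = found(r"Windows (?=95|98|NT|2000)", "Windows 3.1")
not_ahead = found(r"Windows (?!95|98|NT|2000)", "Windows 3.1")

# Backreference: \1 repeats whatever group 1 captured.
doubled = found(r"(.)\1", "hello")

# Precedence: a quantifier binds tighter than a sequence of characters,
# so 'ab*' is 'a' followed by any number of 'b', not repeated 'ab'.
star_on_b = re.fullmatch(r"ab*", "abab") is not None
star_on_group = re.fullmatch(r"(ab)*", "abab") is not None

# Precedence: | binds loosest, so 'z|food' means 'z' or 'food'.
alt_whole = re.fullmatch(r"z|food", "zood") is not None
alt_grouped = re.fullmatch(r"(z|f)ood", "zood") is not None

print(exact_two, up_to_three, greedy, lazy, doubled)
```

Parentheses are what change the grouping in the last two pairs, which is why they sit above quantifiers and alternation in the precedence table.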
Here are some examples of single-character regular expression patterns:

/a/
/7/
/M/

The equivalent VBScript single-character regular expressions are:

"a"
"7"
"M"

You can combine several single characters to form a larger expression. For example, the following JScript regular expression is nothing more than an expression created by combining the single-character expressions 'a', '7', and 'M':

/a7M/

The equivalent VBScript expression is:

"a7M"

Note that there is no concatenation operator. All you need to do is place one character after another.

--------------------------------------------------
8. Special characters

A number of metacharacters require special treatment when you try to match them. To match these special characters you must first escape them, that is, precede them with a backslash (\). The following table lists these special characters and their meanings:

Special character   Description
$                   Matches the position at the end of the input string.

                    If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )                 Mark the beginning and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*                   Matches the preceding subexpression zero or more times. To match the * character, use \*.
+                   Matches the preceding subexpression one or more times. To match the + character, use \+.
.                   Matches any single character except the newline \n. To match ., use \.
[                   Marks the beginning of a bracket expression. To match [, use \[.
?                   Matches the preceding subexpression zero or one time, or indicates a non-greedy quantifier. To match the ? character, use \?.
\                   Marks the next character as a special character, a literal, a backreference, or an octal escape. For example, 'n' matches the character 'n'. '\n' matches a newline. The sequence '\\' matches "\", and '\(' matches "(".
^                   Matches the position at the beginning of the input string, except when used in a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{                   Marks the beginning of a quantifier expression. To match {, use \{.
|                   Indicates a choice between two items. To match |, use \|.

--------------------------------------------------
9. Non-printable characters

There are a number of useful non-printing characters that must occasionally be used. The following table shows the escape sequences used to represent them:

Character     Meaning
\cx           Matches the control character indicated by x. For example, \cM matches Control-M, that is, a carriage return. The value of x must be in the range A-Z or a-z. Otherwise, c is treated as a literal 'c' character.
\f            Matches a form feed. Equivalent to \x0c and \cL.
\n            Matches a newline. Equivalent to \x0a and \cJ.
\r            Matches a carriage return. Equivalent to \x0d and \cM.
\s            Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S            Matches any non-whitespace character.
              Equivalent to [^ \f\n\r\t\v].
\t            Matches a tab. Equivalent to \x09 and \cI.
\v            Matches a vertical tab. Equivalent to \x0b and \cK.

--------------------------------------------------
10. Character matching

The period (.) matches any single printing or non-printing character in a string, except the newline character (\n). The following JScript regular expression matches 'aac', 'abc', 'acc', 'adc', and so on, as well as 'a1c', 'a2c', 'a-c', and 'a#c':

/a.c/

The equivalent VBScript regular expression is:

"a.c"

If you are trying to match a string containing a file name, where the period (.) is part of the input string, precede the period in the regular expression with a backslash (\) character.
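The difference between an unescaped and an escaped period is easy to see in a small test. The sketch below again assumes Python's re module in place of the VBScript or JScript RegExp object; the sample strings are the ones from the text plus a few hypothetical counterexamples.

```python
import re

candidates = ["aac", "abc", "a1c", "a-c", "a#c", "ac", "a\nc"]

# An unescaped period stands for exactly one character other than newline.
dot = re.compile(r"a.c")
dot_hits = [s for s in candidates if dot.fullmatch(s)]

# In a file name the period is literal, so it has to be escaped.
unescaped = re.compile(r"data.dat")
escaped = re.compile(r"data\.dat")

loose_match = unescaped.fullmatch("dataXdat") is not None   # '.' accepts 'X'
strict_miss = escaped.fullmatch("dataXdat") is None         # '\.' rejects 'X'
strict_match = escaped.fullmatch("data.dat") is not None    # literal period

print(dot_hits, loose_match, strict_miss, strict_match)
```

"ac" is rejected because the period must consume one character, and "a\nc" is rejected because the period does not match a newline by default.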

When reposting, please cite the original address: https://www.9cbs.com/read-14985.html
