Regular expression learning notes

xiaoxiao2021-03-06  60

Date: 2004-01-30

A regular expression describes a string-matching pattern. It can be used to check whether a string contains a certain substring, to replace the matching substring, or to extract from a string the substring that meets a certain condition.

When listing a directory, the *.txt in dir *.txt or ls *.txt is not a regular expression, because the * there means something different from the * of a regular expression.

To make things easier to understand and remember, these notes start with some concepts. All the special characters are then collected in one summary table, followed by a few examples.

Regular expression

A regular expression is a text pattern composed of ordinary characters (such as the letters a to z) and special characters (called metacharacters). The pattern acts as a template that is matched against the string being searched. A regular expression is constructed by placing the components of the pattern between a pair of delimiters, i.e. /expression/
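The three uses named in the introduction (testing for a substring, replacing it, extracting it) can be sketched in Python as follows. This is only an illustration: the sample text and variable names are made up, and Python's re module takes the pattern as a raw string rather than between / delimiters.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "report.txt, notes.txt and photo.png"

# 1. Check whether the string contains a matching substring.
has_txt = re.search(r"\w+\.txt", text) is not None

# 2. Replace every matching substring.
renamed = re.sub(r"\.txt", ".md", text)

# 3. Extract the substrings that satisfy the pattern.
txt_files = re.findall(r"\w+\.txt", text)

print(has_txt)    # True
print(renamed)    # report.md, notes.md and photo.png
print(txt_files)  # ['report.txt', 'notes.txt']
```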

Ordinary characters

Ordinary characters are all the printing and non-printing characters that are not explicitly designated as metacharacters. This includes all uppercase and lowercase letters, all digits, all punctuation marks, and some symbols.

Non-printing characters

Character   Meaning
\cx         Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return. The value of x must be in the range A-Z or a-z. Otherwise, the c is treated as a literal 'c' character.
\f          Matches a form feed. Equivalent to \x0c and \cL.
\n          Matches a newline. Equivalent to \x0a and \cJ.
\r          Matches a carriage return. Equivalent to \x0d and \cM.
\s          Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S          Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t          Matches a tab. Equivalent to \x09 and \cI.
\v          Matches a vertical tab. Equivalent to \x0b and \cK.
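A short Python sketch of the escapes in the table above. The sample string is invented for illustration. The \cx form is left out because Python's re module does not provide it; the equivalent \x escape is used instead.

```python
import re

# Hypothetical sample: a tab, a CR/LF pair, a space and a form feed.
sample = "name\tvalue\r\nnext line\x0c"

# \t matches a tab.
tabs = re.findall(r"\t", sample)

# \x0a is the hexadecimal form of \n.
newline_by_hex = re.search(r"\x0a", sample) is not None

# \s matches any whitespace character, \S any non-whitespace character.
whitespace_runs = re.findall(r"\s+", sample)
words = re.findall(r"\S+", sample)

print(whitespace_runs)  # ['\t', '\r\n', ' ', '\x0c']
print(words)            # ['name', 'value', 'next', 'line']
```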

Special characters

Special characters are characters that have a special meaning. The * in "*.txt" mentioned above is one: put simply, it stands for any string. To find files whose names contain a literal *, the * has to be escaped, that is, preceded by a \: ls \*.txt. Regular expressions have the following special characters.

Special character   Description
$    Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'. To match the $ character itself, use \$.
( )  Mark the start and end of a subexpression. Subexpressions can be captured for later use. To match these characters, use \( and \).
*    Matches the preceding subexpression zero or more times. To match the * character, use \*.
+    Matches the preceding subexpression one or more times. To match the + character, use \+.
.    Matches any single character except the newline \n. To match ., use \.
[    Marks the start of a bracket expression. To match [, use \[.
?    Matches the preceding subexpression zero or one time, or marks a non-greedy qualifier. To match the ? character, use \?.
\    Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character 'n', while '\n' matches a newline. The sequence '\\' matches "\", and '\(' matches "(".
^    Matches the start position of the input string, unless it is used inside a bracket expression, where it negates the character set. To match the ^ character itself, use \^.
{    Marks the start of a qualifier expression. To match {, use \{.
|    Indicates a choice between two items. To match |, use \|.

A regular expression is constructed the same way as an arithmetic expression: a variety of metacharacters and operators combine small expressions into larger ones. The components of a regular expression can be a single character, a set of characters, a range of characters, a choice between characters, or any combination of these.
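The effect of escaping a special character can be sketched in Python as follows. The file names are invented for illustration. re.escape is a helper in Python's re module that adds the backslashes to a literal string.

```python
import re

# Hypothetical names, used only for illustration.
names = ["a.txt", "a*txt", "abtxt", "price $5"]

# An unescaped '.' matches any single character except a newline.
loose = [n for n in names if re.fullmatch(r"a.txt", n)]

# An escaped '\.' matches only a literal dot.
strict = [n for n in names if re.fullmatch(r"a\.txt", n)]

# '\*' and '\$' match the literal characters * and $.
star = [n for n in names if re.search(r"\*", n)]
dollar = [n for n in names if re.search(r"\$5", n)]

# re.escape builds a pattern that matches the given text literally.
escaped = re.escape("*.txt")

print(loose)   # ['a.txt', 'a*txt', 'abtxt']
print(strict)  # ['a.txt']
```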

Qualifiers

Qualifiers specify how many times a given component of a regular expression must occur for the match to succeed. There are six of them: *, +, ?, {n}, {n,} and {n,m}. The *, + and ? qualifiers are greedy, because they match as much text as possible. Adding a ? after them makes the match non-greedy, or minimal. The qualifiers are:

Character   Description
*       Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+       Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo" but not "z". + is equivalent to {1,}.
?       Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or the "do" in "does". ? is equivalent to {0,1}.
{n}     n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}    n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}   m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.

Locators

Locators describe the boundaries of strings or words. ^ and $ refer to the start and end of the string, \b describes the boundary before or after a word, and \B represents a non-word boundary. A qualifier cannot be applied to a locator.
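The qualifiers and locators described above can be sketched in Python as follows. The patterns are the ones from the text. The variable names are illustrative only.

```python
import re

# Greedy and non-greedy qualifiers on the string "oooo".
greedy = re.search(r"o+", "oooo").group()           # 'oooo'
lazy = re.search(r"o+?", "oooo").group()            # 'o'

# {n,m}: at least 1 and at most 3 o's.
bounded = re.search(r"o{1,3}", "fooooood").group()  # 'ooo'

# {n}: "Bob" has no run of two o's, so there is no match.
exact = re.search(r"o{2}", "Bob")                   # None

# ?: the group (es) is optional.
optional = [re.fullmatch(r"do(es)?", w) is not None
            for w in ("do", "does", "doe")]         # [True, True, False]

# ^ and $ anchor the pattern to the start and end of the string.
whole = re.search(r"^zo*$", "zoo") is not None      # True
partial = re.search(r"^zo*$", "zoom") is not None   # False

# \b is a word boundary, \B a non-word boundary.
er_at_end = re.search(r"er\b", "never") is not None      # True
er_not_at_end = re.search(r"er\b", "verb") is not None   # False
er_inside = re.search(r"er\B", "verb") is not None       # True
```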

Selection

Enclose all the options in parentheses and separate adjacent options with |. Parentheses have a side effect, however: the match inside them is captured and saved. Placing ?: before the first option removes this side effect. ?: is one of the non-capturing elements; the other two are ?= and ?!. The former is a positive lookahead, which matches the search string at any position where the pattern inside the parentheses starts to match. The latter is a negative lookahead, which matches the search string at any position where that pattern does not match.
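A minimal Python sketch of selection with capturing and non-capturing parentheses, and of the two lookaheads. The patterns are taken from the symbol table later in these notes. The variable names are illustrative.

```python
import re

# Plain parentheses capture the option that matched.
capturing = re.search(r"industr(y|ies)", "industries")
# (?: ... ) groups the options without capturing anything.
non_capturing = re.search(r"industr(?:y|ies)", "industries")

print(capturing.groups())      # ('ies',)
print(non_capturing.groups())  # ()

# Positive lookahead: match "Windows " only if a listed version follows.
text = "Windows 2000"
ahead = re.search(r"Windows (?=95|98|NT|2000)", text)
# The lookahead consumes nothing: the match ends before "2000".
rest = text[ahead.end():]
no_ahead = re.search(r"Windows (?=95|98|NT|2000)", "Windows 3.1")

# Negative lookahead: match only if none of the listed versions follows.
neg_ok = re.search(r"Windows (?!95|98|NT|2000)", "Windows 3.1")
neg_fail = re.search(r"Windows (?!95|98|NT|2000)", "Windows 2000")

print(ahead.group(), "|", rest)  # Windows  | 2000
```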

Backreferences

Putting parentheses around a regular expression pattern, or around part of one, causes the corresponding match to be saved in a temporary buffer. Each captured submatch is stored in the order in which it appears, from left to right, in the pattern. Buffer numbers start at 1 and run consecutively up to a maximum of 99 subexpressions. Each buffer can be accessed with '\n', where n is a one- or two-digit decimal number that identifies a particular buffer. The non-capturing elements '?:', '?=' and '?!' can be used to skip saving the related match.
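Backreferences can be sketched in Python as follows. The sample strings are invented for illustration. In Python the saved submatches are read with group(n) or, inside a replacement string, with \n.

```python
import re

# (.)\1 matches two consecutive identical characters.
# With one capturing group, findall returns what the group captured.
doubled = re.findall(r"(.)\1", "bookkeeper")

# \1 refers back to the word captured by the first pair of parentheses,
# so an accidentally repeated word can be collapsed into one.
cleaned = re.sub(r"\b(\w+) \1\b", r"\1", "it is is the the cost")

# A non-capturing group does not take a buffer number,
# so buffer 1 here is the 'b', not the 'a'.
m = re.search(r"(?:a)(b)\1", "abb")

print(doubled)     # ['o', 'k', 'e']
print(cleaned)     # it is the cost
print(m.group(1))  # b
```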

Operation priority of various operators

Operators of the same precedence are evaluated from left to right. Operators of different precedence are evaluated from highest to lowest. The precedence of the operators, from highest to lowest, is:

Operator                        Description
\                               Escape
(), (?:), (?=), []              Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}       Qualifiers
^, $, \anymetacharacter         Positions and sequences
|                               "Or" operation

Explanation of all symbols

Character   Description
\           Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(".
^           Matches the start position of the input string. If the Multiline property of the RegExp object is set, ^ also matches the position after '\n' or '\r'.
$           Matches the end position of the input string. If the Multiline property of the RegExp object is set, $ also matches the position before '\n' or '\r'.
*           Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" and "zoo". * is equivalent to {0,}.
+           Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo" but not "z". + is equivalent to {1,}.
?           Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or the "do" in "does". ? is equivalent to {0,1}.
{n}         n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}        n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m}       m and n are non-negative integers, where n <= m. Matches at least n times and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?           When this character immediately follows any other qualifier (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the search string as possible, whereas the default greedy pattern matches as much as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.           Matches any single character except "\n". To match any character including '\n', use a pattern such as '[\s\S]' (inside brackets a '.' is only a literal dot, so '[.\n]' does not do this).
(pattern)   Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0...$9 properties in JScript. To match the parenthesis characters, use '\(' or '\)'.
(?:pattern) Matches pattern but does not capture the match, that is, it is a non-capturing match and is not stored for later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern) Positive lookahead. Matches the search string at any position where a string matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches the "Windows" in "Windows 2000" but not the "Windows" in "Windows 3.1". A lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
(?!pattern) Negative lookahead. Matches the search string at any position where a string not matching pattern begins. This is a non-capturing match, that is, the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches the "Windows" in "Windows 3.1" but not the "Windows" in "Windows 2000". A lookahead does not consume characters: after a match occurs, the search for the next match starts immediately after the last match, not after the characters that made up the lookahead.
x|y         Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]       Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]      Negative character set. Matches any character that is not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]       Character range. Matches any character in the specified range. For example, '[a-z]' matches any lowercase letter in the range 'a' to 'z'.
[^a-z]      Negative character range. Matches any character that is not in the specified range. For example, '[^a-z]' matches any character that is not in the range 'a' to 'z'.
\b          Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B          Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx         Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return. The value of x must be in the range A-Z or a-z. Otherwise, the c is treated as a literal 'c' character.
\d          Matches a digit character. Equivalent to [0-9].
\D          Matches a non-digit character. Equivalent to [^0-9].
\f          Matches a form feed. Equivalent to \x0c and \cL.
\n          Matches a newline. Equivalent to \x0a and \cJ.
\r          Matches a carriage return. Equivalent to \x0d and \cM.
\s          Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v].
\S          Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
\t          Matches a tab. Equivalent to \x09 and \cI.
\v          Matches a vertical tab. Equivalent to \x0b and \cK.
\w          Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W          Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn         Matches n, where n is a hexadecimal escape value. The hexadecimal escape value must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". This allows ASCII codes to be used in regular expressions.
\num        Matches num, where num is a positive integer. A reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n          Identifies an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value.
\nm         Identifies an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are octal digits (0-7), \nm matches the octal escape value nm.
\nml        If n is an octal digit (0-3) and m and l are octal digits (0-7), matches the octal escape value nml.
\un         Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).

Some examples

Regular expression                               Description
/\b([a-z]+) \1\b/gi                              Finds places where a word occurs twice in a row
/(\w+):\/\/([^\/:]+)(:\d*)?([^# ]*)/             Splits a URL into protocol, domain, port, and relative path
/^(?:Chapter|Section) [1-9][0-9]{0,1}$/          Locates chapter and section headings
/[-a-z]/                                         The 26 letters a to z plus the - sign
/ter\b/                                          Matches "chapter" but not "terminal"
/\Bapt/                                          Matches "chapter" but not "aptitude"
/Windows(?=95|98|NT)/                            Matches the "Windows" in Windows95, Windows98 or WindowsNT. After a match is found, the search for the next match starts right after "Windows".
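Several of the examples above can be tried in Python as sketched below. The patterns are the ones from the table, written as Python raw strings without the / delimiters. The g and i flags become findall and re.IGNORECASE. The sample sentence and URL are invented for illustration.

```python
import re

# /\b([a-z]+) \1\b/gi : words that occur twice in a row.
sentence = "Is is the cost of of gasoline going up up"
repeated = re.findall(r"\b([a-z]+) \1\b", sentence, re.IGNORECASE)

# Split a URL into protocol, domain, port, and relative path.
url = "http://www.example.com:8080/docs/index.html"  # hypothetical URL
parts = re.search(r"(\w+)://([^/:]+)(:\d*)?([^# ]*)", url).groups()

# Chapter or section headings numbered 1 to 99.
heading = re.compile(r"^(?:Chapter|Section) [1-9][0-9]{0,1}$")
checks = [heading.search(s) is not None
          for s in ("Chapter 12", "Section 7", "Chapter 0", "Chapter 100")]

# Word-boundary examples.
ter_chapter = re.search(r"ter\b", "chapter") is not None
ter_terminal = re.search(r"ter\b", "terminal") is not None
apt_chapter = re.search(r"\Bapt", "chapter") is not None
apt_aptitude = re.search(r"\Bapt", "aptitude") is not None

print(repeated)  # ['Is', 'of', 'up']
print(parts)     # ('http', 'www.example.com', ':8080', '/docs/index.html')
print(checks)    # [True, True, False, False]
```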

Reference: Regular expression http://www.soulogic.com/code/doc/regularexpressions/

When reposting, please cite the original address: https://www.9cbs.com/read-112925.html
