Regular expression

zhaozj2021-02-17  64

If there is no tax expression, it is not familiar with this term and concept. However, they are not so nice you imagine.

Recall how to find files on the hard disk. You will definitely use the * characters to help find the files you are looking for. • Characters match a single character in the file name, and * matches one or more characters. A pattern such as 'Data ?dat' can find the following file:

Data1.dat

Data2.dat

Datax.dat

Datan.dat

If you use * characters instead? The number of files found will be expanded. 'data * .dat' can match all the following file names:

Data.dat

Data1.dat

Data2.dat

Data12.dat

Datax.dat

Dataxyz.dat

Although this search file is certainly useful, it is also very limited. • The limited capacity of wildcards can make you have a concept of regular expressions, but the regular expression is more powerful, and more flexible.

Early origin

Regular expressions "ancestors" can have been traced back to an early study on how the human nervous system works. Two neur physiologists of Warren McCulloch and Walter Pitts have studied a mathematical way to describe these neural networks.

In 1956, a US mathematician called Stephen Kleene published an early working on McCulloch and Pitts, published a papers titled "Neural Network Emergencies", introduced the concept of regular expressions. Regular expressions are used to describe expressions he called "regular set algebra", so the term "regular expression" is used.

Subsequently, it is found that this work can be applied to some early studies using Ken Thompson's computing search algorithm, Ken Thompson is the main inventors of UNIX. The first practical application of the regular expression is the QED editor in UNIX.

As they said, the rest is a well-known history. Since then, until now the regular expression is an important part of the text-based editor and search tool.

Use regular expressions

In a typical search and alternative, the exact text to be found must be provided. This technique may be sufficient for simple search and replacement tasks in static text, but because it lacks flexibility, it is difficult to search for dynamic text, or even impossible.

Using regular expressions, you can:

Test a pattern of strings. For example, an input string can be tested to see if the string exists or a credit card number mode. This is called data validity verification. Replace the text. You can use a regular expression in the document to identify a particular text, then you can delete it, or replace it with another text. Extract a sub-string from the string based on the mode match. Can be used to find a specific text in the text or input field.

For example, if you need to search the entire Web site to delete some excessive materials and replace some HTML formatted tags, you can use the regular expression to test each file, see if there is a material or HTML you want to find in this file. Formatted tag. With this method, you can narrow the affected file range to those files that contain materials to be deleted or changed. You can then use the regular expression to delete the outdated material, and finally, you can use the regular expression again to find and replace those markers that need to be replaced.

Another example explaining the regular expression is a language that is not known for its string processing capabilities. Vbscript is a subset of Visual Basic, with rich string processing features. JScript similar to C does not have this capability. Regular expression gives JScript string processing capabilities brings significant improvements. However, it is possible to still use the regular expression in VBScript, which allows multiple string operations to be executed in a single expression.

Regular expression syntax

A regular expression is a text mode composed of normal characters (such as characters a to z) and special characters (called metammatics). This mode describes one or more strings to be matched when the text body is looking for. Regular expression As a template, a character mode matches the search string. Here are some regular expressions that may encounter:

JScriptvbscript Match / ^ / [/ T] * $ / "^ / [/ t] * $" matches a blank line. // D {2} - / d {5} / "/ d {2} - / d {5}" Verify that one ID number is composed of a 2-digit, a hyphen, and a 5-digit. /< (.*)>.* (*)>. * "matches an HTML tag.

The table below is a complete list of metamorphic and its behavior in the regular expression context:

Character Description / Tags the next character as a special character, or a primary character, or a backward reference, or an octave. For example, 'n' matches characters "n". '/ n' matches a newline. Sequence '//' match "/" "/ (" matches "(". ^ Match the input string of the start position. If the multiline property of the regexp object is set, ^ also matches '/ n' or '/ r' The next location. $ Match the end position of the input string. If the multiline property of the regexp object is set, $ also matches the position before '/ n' or '/ r'. * Match the previous sub-expression zero or multiple times For example, ZO * can match "z" and "zoo". * Equivalent to {0,}. Match the previous sub-expression once or more. For example, 'ZO ' can match "ZO" and "ZOO" However, it cannot match "Z". Equivalent to {1,}. • Match the previous sub-expression zero or once. For example, "Do (es)" can match "do" or "does" "" " Do ".? Is equivalent to {0,1}. {n} n is a non-negative integer. Match the N times. For example, 'o {2}' does not match" Bob "'o', but can Match two O. {n,} n is a non-negative integer. At least n times. For example, 'o {2,}' does not match 'O' in "Bob", but can match " All O.'o {1,} 'in fooood is equivalent to' o '.' o {0,} 'is equivalent to' o * '. {n, m} M and N are non-negative integers Where n <= m. Leverage N times and matched M times. Liu, "O {1, 3}" will match the top three O.'o {0, 1} 'in "foooood" 'o?'. Please note that there is no space between commas and two numbers.? When this character is tight in any other restriction (*, ,?, {n}, {n,}, {n, M}), when the matching mode is non-greedy. Non-greedy mode matches the search string as little as possible, and the default greed mode is as much as possible to match the search string. For example, for strings "OOOO ", 'O ?' Will match a single" O ", and 'o ' will match all 'o' .. Match any individual characters other than" / n ". To match any characters including '/ n' Please use the mode of '[./n]'. (Pattern) matches Pattern and get this match. The acquired match can be obtained from the generated Matches collection, using the Submatches collection in VBScript, using $ 0 in JScript ... $ 9 properties. To match the bracket character, use '/ (' or '/)'. (12 :Pattern) match Patte Rn but does not acquire the matching result, that is, this is a non-acquired match, not for storage for storage. This is useful to use the "or" character (|) to combine a pattern.

For example, 'industr (?: Y | iES) is a smale of' Industry | Industries'. (? = pattern) Positive to check, match the lookup string at any string of Pattern. This is a non-acquisition match, that is, the match does not need to be used later. For example, 'Windows (? = 95 | 98 | NT | 2000)' Map "Windows" in Windows 2000, but does not match "Windows" in "Windows 3.1". It is not consumed by the character, that is, after a match occurs, start the next matching search immediately after the last match, not starting from the character containing the pre-check. (?! pattern) negotiation, match the lookup string at any string of any mismatch at any Point WHERE A STRING NOT MATCHING POINT WHERE A STRING NOT MATCHING PATTERN. This is a non-acquisition match, that is, the match does not need to be used later. For example, 'Windows (?! 95 | 98 | NT | 2000) "can match" Windows "in Windows 3.1, but cannot match" Windows "in" Windows 2000 ". It is not consumed by the character, that is, after a match occurs, start the next matching search immediately after the last match, not the X | Y, which matches X or Y after the character containing the queue. For example, 'Z | Food' can match "z" or "food". '(z | f) OOD' matches "Zood" or "Food". [XYZ] Character collection. Match any of the included characters. For example, '[abc]' can match 'a' in "Plain". [^ XYZ] Negative character set. Match any of the characters that are not included. For example, '[^ ABC]' can match 'P' in "Plain". [A-Z] character range. Match any of the characters within the specified range. For example, '[a-z]' can match any lowercase alphabetic characters in the 'A' to 'Z' range. [^ a-z] Negative character range. Match any of any characters that are not within the specified range. For example, '[^ a-z]' can match any of any characters that are not in the 'A' to 'Z'. / b Match a word boundary, that is, the location of the words and spaces. For example, 'er / b' can match 'ER' in "Never", but do not match 'Er' in "Verb". / B matches non-word boundary. 'ER / B' can match 'Er' in "Verb", but cannot match 'Er' in "Never". / CX matches the control character indicated by x. For example, / cm matches a Control-M or an Enterprise. The value of x must be one of A-Z or A-Z. Otherwise, the C is treated as a primary 'c' character. / d Match a numeric character. Equivalent to [0-9]. / D Match a non-digital character. Equivalent to [^ 0-9]. / f Match a change page.

转载请注明原文地址:https://www.9cbs.com/read-29984.html

New Post(0)