Regular Expressions (1)

zhaozj, 2021-02-16

Part 1: Introduction

Many people have probably heard the term "regular expression". It dates back to 1956, when the American mathematician Stephen Kleene, building on the earlier work of McCulloch and Pitts, published a paper titled "Representation of Events in Nerve Nets and Finite Automata" that introduced the concept. Regular expressions were the notation he used to describe what he called "the algebra of regular sets", hence the term "regular expression".

This work was later found to apply to early research on computational search algorithms by Ken Thompson, one of the principal inventors of UNIX. The first practical application of regular expressions was Thompson's QED editor, whose regular-expression support was later carried into the UNIX editor ed.

Q: What can regular expressions do for us?

A: They are an important part of text-based editors and search tools. A regular expression lets the user build a matching pattern from a series of special characters, compare that pattern against targets such as data files, program input, and web pages, and then run the appropriate code depending on whether the target contains the pattern.
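The workflow in the answer above (build a pattern, compare it with some input, act on the result) can be sketched as follows. This is only an illustration: the article follows the MSDN (.NET) documentation, while the sketch uses Python's standard `re` module as a stand-in, and the pattern, the sample records, and the helper `matching_lines` are all made up for the example.

```python
import re

# Hypothetical pattern: the word "error", a space, then one or more digits.
pattern = re.compile(r"error \d+")

# Hypothetical program input, one record per entry.
lines = ["ok", "error 404 on /index", "warning", "error 500 on /api"]

def matching_lines(pat, records):
    """Return the records in which the pattern occurs somewhere."""
    return [rec for rec in records if pat.search(rec)]

# Act on the comparison result: here we simply report the matching records.
hits = matching_lines(pattern, lines)
for hit in hits:
    print("matched:", hit)
```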

Let's walk through the use of regular expressions step by step.

Part 2: A first look at regular expressions

Let's start with some basic concepts. As a pattern language, regular expressions define their own notation for describing a wide variety of character classes. The table below is excerpted from MSDN. (MS-Help://ms.vscc/ms.msdnvs.2052/cpgenref/html/cpconcharacterclasses.htm)

Character classes table

| Character class | Meaning |
| --- | --- |
| `.` | Matches any character except `\n`. If modified by the Singleline option (see the regular expression options), the period matches any character. |
| `[aeiou]` | Matches any single character contained in the specified character set. |
| `[^aeiou]` | Matches any single character not in the specified character set. |
| `[0-9a-fA-F]` | Using a hyphen (-) allows a contiguous character range to be specified. |
| `\p{name}` | Matches any character in the named character class specified by {name}. The supported names are Unicode groups and block ranges, for example Ll, Nd, Z, IsGreek, IsBoxDrawing. |
| `\P{name}` | Matches text that is not included in the groups and block ranges specified in {name}. |
| `\w` | Matches any word character. Equivalent to the Unicode character categories `[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, `\w` is equivalent to `[a-zA-Z_0-9]`. |
| `\W` | Matches any non-word character. Equivalent to the Unicode categories `[^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, `\W` is equivalent to `[^a-zA-Z_0-9]`. |
| `\s` | Matches any whitespace character. Equivalent to the Unicode character categories `[\f\n\r\t\v\x85\p{Z}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, `\s` is equivalent to `[ \f\n\r\t\v]`. |
| `\S` | Matches any non-whitespace character. Equivalent to the Unicode character categories `[^\f\n\r\t\v\x85\p{Z}]`. If ECMAScript-compliant behavior is specified with the ECMAScript option, `\S` is equivalent to `[^ \f\n\r\t\v]`. |
| `\d` | Matches any decimal digit. Equivalent to `\p{Nd}` for Unicode and `[0-9]` for non-Unicode, ECMAScript behavior. |
| `\D` | Matches any non-digit. Equivalent to `\P{Nd}` for Unicode and `[^0-9]` for non-Unicode, ECMAScript behavior. |

The table above lists the most basic syntax definitions in regular expressions. Once we understand them, we can define some simple rules, for example:

1. Match any character

. (of course, nothing else needs to be written @_@)

2. Match any English character

a) \w

b) [a-zA-Z_0-9]

3. Match a decimal digit

a) \d

b) [0-9]
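The character classes above can be tried out with a short script. This is a sketch using Python's standard `re` module rather than the .NET engine the MSDN table describes, so some details differ: Python's `re` has no `\p{name}` classes, `re.DOTALL` plays the role of the Singleline option, and `re.ASCII` roughly corresponds to the ECMAScript behavior of `\w`. The sample strings are made up.

```python
import re

text = "Doc_7 is ready!\n"

# "." matches any character except "\n" by default ...
dot_default = re.findall(r".", text)
# ... and matches "\n" as well when re.DOTALL is set.
dot_all = re.findall(r".", text, flags=re.DOTALL)

vowels = re.findall(r"[aeiou]", text)          # characters in the set
non_vowels = re.findall(r"[^aeiou]", "idea")   # characters not in the set
hex_digits = re.findall(r"[0-9a-fA-F]", "0xBEEF!")  # ranges written with "-"

# For str patterns, \w is Unicode-aware by default, so it accepts "é".
word_chars = re.findall(r"\w", "a_1 é!")
# With re.ASCII, \w behaves like [a-zA-Z_0-9].
ascii_word = re.findall(r"\w", "a_1 é!", flags=re.ASCII)

digits = re.findall(r"\d", "room 42")

print(len(dot_default), len(dot_all))
print(vowels, non_vowels, hex_digits)
print(word_chars, ascii_word, digits)
```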

Looking at these examples, it is quite simple, isn't it? But the rules written so far have one big defect: they say nothing about how many characters to match.

Q: I want to match 5 English letters

A: ???

With only the knowledge covered so far, this problem cannot be solved. How do regular expressions handle it? Let's look at the following table:

(MS-Help://ms.vscc/ms.msdnvs.2052/cpgenref/html/cpconquantifiers.htm)

Quantifiers table

| Quantifier | Description |
| --- | --- |
| `*` | Specifies zero or more matches; for example `\w*` or `(abc)*`. Same as `{0,}`. |
| `+` | Specifies one or more matches; for example `\w+` or `(abc)+`. Same as `{1,}`. |
| `?` | Specifies zero or one match; for example `\w?` or `(abc)?`. Same as `{0,1}`. |
| `{n}` | Specifies exactly n matches; for example `(pizza){2}`. |
| `{n,}` | Specifies at least n matches; for example `(abc){2,}`. |
| `{n,m}` | Specifies at least n but no more than m matches. |
| `*?` | Specifies the first match that uses as few repetitions as possible (lazy `*`). |
| `+?` | Specifies as few repetitions as possible, but at least one (lazy `+`). |
| `??` | Specifies zero repetitions if possible, otherwise one (lazy `?`). |
| `{n}?` | Equivalent to `{n}` (lazy `{n}`). |
| `{n,}?` | Specifies as few repetitions as possible, but at least n (lazy `{n,}`). |
| `{n,m}?` | Specifies as few repetitions as possible between n and m (lazy `{n,m}`). |

The table above lists the quantifiers of regular expressions. With them, we can easily write more powerful regular expressions.

For example:

1. Match zero or more of any character

.*

2. Match one or more of any character

.+

3. Match zero or more English characters

\w*

4. Match one or more English characters

[a-zA-Z0-9]+

5. Match 3 decimal digits

\d{3}

6. Match at least 3 decimal digits

\d{3,}

7. Match 3 to 6 decimal digits

\d{3,6}
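The quantifier examples above can be checked with a small script. This is a sketch in Python's `re` module, used here as a stand-in for the .NET engine described by MSDN; the helper `whole` and the test strings are made up for illustration. The last two lines contrast a greedy quantifier with its lazy counterpart from the table.

```python
import re

def whole(pattern, s):
    """True if the pattern matches the entire string s."""
    return re.fullmatch(pattern, s) is not None

# (pattern, input, expected result of a whole-string match)
cases = [
    (r".*", "", True),               # zero or more of any character
    (r".+", "", False),              # one or more needs at least one character
    (r"\w*", "abc_1", True),         # zero or more word characters
    (r"[a-zA-Z0-9]+", "abc1", True), # one or more letters or digits
    (r"\d{3}", "123", True),         # exactly 3 digits
    (r"\d{3}", "1234", False),
    (r"\d{3,}", "12345", True),      # at least 3 digits
    (r"\d{3,6}", "1234567", False),  # 3 to 6 digits, 7 is too many
]
results = [whole(p, s) for p, s, _ in cases]

# Greedy "+" takes as much as it can; lazy "+?" takes as little as it can.
greedy = re.search(r"<.+>", "<a><b>").group()
lazy = re.search(r"<.+?>", "<a><b>").group()
print(results, greedy, lazy)
```

Note that the range is written without spaces inside the braces, as in `\d{3,6}`.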

Now we can answer the above question:

Q: I want to match 5 English letters

A: \w{5}
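A quick check of this answer, sketched in Python's `re` module (a stand-in for the .NET engine; the sample strings are made up). One caveat worth seeing in action: `\w` also accepts digits and the underscore, so a pattern restricted to letters, such as `[a-zA-Z]{5}`, is shown alongside it for comparison.

```python
import re

five_word_chars = re.compile(r"\w{5}")       # the answer given above
five_letters = re.compile(r"[a-zA-Z]{5}")    # letters only, for comparison

samples = ["hello", "ab_12", "hell", "hello!"]

# fullmatch requires the whole string to consist of exactly 5 characters.
word_hits = [s for s in samples if five_word_chars.fullmatch(s)]
letter_hits = [s for s in samples if five_letters.fullmatch(s)]

print(word_hits)
print(letter_hits)
```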

Happily, that problem is solved, but new problems keep coming. How do we constrain where a match may occur?

Q: I want to match strings that start with DOC

A: ???

In order to solve this problem, let's take a look at this table:

(MS-Help://ms.vscc/ms.msdnvs.2052/cpgenref/html/cpconatomiczero-widthassertions.htm)

Atomic zero-width assertions table

| Assertion | Description |
| --- | --- |
| `^` | Specifies that the match must occur at the beginning of the string or the beginning of a line. For more information, see the Multiline option in the regular expression options. |
| `$` | Specifies that the match must occur at one of the following positions: the end of the string, before `\n` at the end of the string, or the end of a line. For more information, see the Multiline option in the regular expression options. |
| `\A` | Specifies that the match must occur at the beginning of the string (ignores the Multiline option). |
| `\Z` | Specifies that the match must occur at the end of the string or before `\n` at the end of the string (ignores the Multiline option). |
| `\z` | Specifies that the match must occur at the end of the string (ignores the Multiline option). |
| `\G` | Specifies that the match must occur at the point where the current search starts (usually one character past the position where the previous search ended). For example, consider a string made up of consecutive character groups, each n characters long. When searching group by group, the regular expression succeeds only if the match occurs at a character position such as 0, n, 2n, 3n, that is, only when the match falls on a group boundary. |
| `\b` | Specifies that the match must occur on a boundary between `\w` (alphanumeric) and `\W` (non-alphanumeric) characters. The match must occur on a word boundary, that is, at the first or last character of a word delimited by non-alphanumeric characters such as spaces. |
| `\B` | Specifies that the match must not occur on a `\b` boundary. |
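The word-boundary and end-of-string assertions can be illustrated with a short sketch in Python's `re` module, used as a stand-in for the .NET engine. Python's assertions do not map one-to-one onto the table: there is no `\G`, and Python's `\Z` matches only at the very end of the string (like .NET's `\z`), so the sketch sticks to `\b`, `\B`, `\A` and `$`. The sample text is made up.

```python
import re

text = "cat concat cats"

# \bcat\b: "cat" as a standalone word only.
whole_word = re.findall(r"\bcat\b", text)
# \bcat: "cat" that begins at a word boundary.
at_word_start = [m.start() for m in re.finditer(r"\bcat", text)]
# \Bcat: "cat" that does not begin at a word boundary.
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

# \A anchors to the very beginning of the string.
starts_with_cat = re.search(r"\Acat", text) is not None
# $ also matches just before a newline that ends the string.
ends_with_end = re.search(r"end$", "the end\n") is not None

print(whole_word, at_word_start, inside_word)
print(starts_with_cat, ends_with_end)
```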

I'm sure everyone has noticed the first assertion in this table (@_@).

For example, ^ specifies that the current position is at the beginning of a line or string. The regular expression ^FTP therefore returns only those occurrences of the string "FTP" that appear at the beginning of a line.
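The effect of the Multiline option on ^ can be sketched as follows, using Python's `re` module as a stand-in for the .NET engine (`re.MULTILINE` is Python's counterpart of that option). The three-line sample text is made up.

```python
import re

text = "FTP server one\nuse FTP here\nFTP server two"

# Without MULTILINE, ^ matches only at the very start of the string.
string_start = [m.start() for m in re.finditer(r"^FTP", text)]

# With MULTILINE, ^ also matches immediately after each newline,
# so "FTP" at the start of the third line is found as well.
line_starts = [m.start() for m in re.finditer(r"^FTP", text, flags=re.MULTILINE)]

print(string_start)
print(line_starts)
```

In both cases the "FTP" in the middle of the second line is skipped, because it is not at the beginning of a line.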

It looks like the problem we ran into above can now be solved, so let's solve it:

Q: I want to match strings that start with DOC

A: ^DOC
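As a quick check of this answer, here is a sketch in Python's `re` module (a stand-in for the .NET engine). The file names are made up, and the case-insensitive variant is an extra shown only for comparison; it is not part of the answer above.

```python
import re

starts_with_doc = re.compile(r"^DOC")

# Hypothetical strings to test.
names = ["DOCUMENT.txt", "MYDOC.txt", "doc.txt", "DOC"]

# Only strings that begin with the exact characters "DOC" match.
matches = [n for n in names if starts_with_doc.search(n)]

# Optional variant: ignore case, so "doc.txt" matches too.
matches_ci = [n for n in names if re.search(r"^DOC", n, flags=re.IGNORECASE)]

print(matches)
print(matches_ci)
```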

We now have a first idea of what regular expressions are and know their most basic syntax. Consider this a warm-up (@_@). The real topic starts next: from the second article on, we will discuss the use of regular expressions in depth.

When reprinting, please cite the original address: https://www.9cbs.com/read-28174.html
