Perl regular expression explanation

xiaoxiao2021-03-06 53

9.3.1 Principle 1

Regular expressions come in three forms: matching, substitution, and translation.

The three regular expression operators are listed in Table 9-1.

Each form is explained in detail below.

Match: m/regexp/. This form indicates that the regular expression between the delimiters is matched against the scalar on the left of =~ or !~. As a shorthand, you can write /regexp/ and omit the m.

Substitution: s/regexp/replacement/. This form indicates that the text matched by the regular expression is replaced by the replacement text. Unlike the m, the s cannot be omitted.

Translation: tr/range1/range2/. This form takes a range of characters (range1) and replaces each of them with the corresponding character in range2.

CAUTION: Translation is not truly a regular expression, but it is often used to manipulate data that is difficult to process with regular expressions. For example, tr/0-9/9876543210/ turns the string 123456789 into 876543210.

These expressions are bound to a scalar with =~ (read it as "does match") and !~ (read it as "doesn't match"). As an illustration, here are six sample expressions with their meanings:

$scalarName =~ s/a/b/; # replace the first a with b, and return true if this happens

$scalarName =~ m/a/; # does the scalar $scalarName have an a in it?

$scalarName =~ tr/A-Z/a-z/; # translate all capital letters to lowercase ones, and return true if this happens

$scalarName !~ s/a/b/; # replace the first a with b, and return false if this indeed happens

$scalarName !~ m/a/; # does the scalar $scalarName match the character a? Return false if it does.

$scalarName !~ tr/0-9/a-j/; # translate the digits to the letters a through j, and return false if this happens

If we enter code such as 'horned toad' =~ m/toad/, the match proceeds as shown in Figure 9-1.

Also, if you are matching against the special variable $_ (as you might in a while loop, map, or grep), you can leave out the =~ and !~. Thus, all of the following code is legal:

my @elements = ('a1', 'a2', 'a3', 'a4', 'a5');

foreach (@elements) { s/a/b/; }

This makes @elements equal to b1, b2, b3, b4, b5. In addition:

while (<$fd>) { print if (m/error/); }

prints all lines that contain the string error, and:

if (grep(/pattern/, @lines)) { print "the variable \@lines has pattern in it!\n"; }

prints a message if any line contains the pattern pattern, which leads directly to the next principle.
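As a minimal sketch of the $_ shorthand described above, the following script runs the same kind of statements on made-up data. The contents of @lines and the names $shout and $changed are assumptions chosen for illustration only.

```perl
use strict;
use warnings;

# foreach aliases $_ to each element, so s/// edits the array in place.
my @elements = ('a1', 'a2', 'a3', 'a4', 'a5');
foreach (@elements) { s/a/b/; }

# grep also sets $_ for each element; /error/ is matched against it.
my @lines = ("all fine\n", "disk error\n", "error: timeout\n");
my @error_lines = grep { /error/ } @lines;

# tr/// works on $_ too and returns the number of characters translated.
my $shout = 'quiet';
my $changed;
for ($shout) { $changed = tr/a-z/A-Z/; }

print "@elements\n";                 # b1 b2 b3 b4 b5
print scalar(@error_lines), "\n";    # 2
print "$shout ($changed)\n";         # QUIET (5)
```

Because foreach and for alias $_ to the real element, the changes made by s/// and tr/// stick to the original variables.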

9.3.2 Principle 2

Regular expressions match only against scalars.

Note the importance of the word scalar here. If you try the following code:

@arrayName = ('variable1', 'variable2');

@arrayName =~ m/variable/; # looks for 'variable' in the array? No! Use grep instead.

then the match against @arrayName fails! Perl evaluates @arrayName in scalar context, where it yields 2 (the number of elements), so what you are really entering is:

'2' =~ m/variable/;

This does not give the expected result, to say the least. If you want to search the array, write:

grep(m/variable/, @arrayName);

This function loops through each element of @arrayName. It returns the number of matching elements in scalar context and the actual list of matching elements in list context.
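The difference between matching an array directly and using grep can be checked with a short sketch. The third element 'other' and the names $naive, $count, and @matches are assumptions added for illustration.

```perl
use strict;
use warnings;

my @arrayName = ('variable1', 'variable2', 'other');

# In scalar context an array yields its element count, so this is
# the same as matching the string '3' against the pattern.
my $naive = (scalar(@arrayName) =~ m/variable/) ? 1 : 0;

# grep tests each element in turn.
my $count   = grep { m/variable/ } @arrayName;   # scalar context: a count
my @matches = grep { m/variable/ } @arrayName;   # list context: the elements

print "naive=$naive count=$count matches=@matches\n";
# naive=0 count=2 matches=variable1 variable2
```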

9.3.3 Principle 3

For a given pattern, a regular expression matches only the earliest possible match in the string. By default, it matches or replaces only once.

This principle relies on a process called backtracking to work out how to match a given string: if the engine finds a partial match and then hits something that invalidates it, it backtracks in the string by the smallest amount possible, so that no potential match is missed.

Of all the principles, this one is probably the most helpful for understanding regular expressions, and you do not need to know Perl's internals to follow it. Suppose you have the string:

'silly people do silly things if in silly moods'

and you want to match the following pattern:

'silly moods'

The regular expression engine matches silly, then encounters the p in people and knows that the first silly does not lead to a match, so it moves on to the p and keeps looking. It then encounters the second silly and tries to match moods. However, it finds the letter t (in things) instead, so it moves on to the t in things and keeps looking. When the engine encounters the third silly and tries to match moods, the match succeeds and is complete. Figure 9-2 shows what happens.
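One way to confirm which silly ends up in the match is to ask Perl where the match began. This sketch uses the built-in @- array (offset of the match start) and rindex; the variable names $start and $third are assumptions for illustration.

```perl
use strict;
use warnings;

my $string = 'silly people do silly things if in silly moods';

my $start = -1;
if ($string =~ m/silly moods/) {
    $start = $-[0];              # offset where the match begins
}

# rindex finds the last 'silly', which is the third one in this string.
my $third = rindex($string, 'silly');

print "match starts at $start, third 'silly' at $third\n";
# match starts at 35, third 'silly' at 35
```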

This becomes more important when wildcards are involved. If there are several wildcards in the same regular expression, and they are all intertwined, you can get pathological cases in which backtracking becomes very expensive. Consider the following expression:

$line =~ m/expression.*be.*very.*expensive.*/

Here .* is a wildcard meaning "match any character (except newline) zero or more times." Evaluating this can take a long time: if a possible match shows up near the end of a string that ultimately does not match, the engine backtracks like crazy. For more on this, see the discussion of wildcards below.

If you find yourself in a situation like the one above, break the regular expression into smaller pieces. In other words, simplify your regular expressions.

9.3.4 Principle 4

Regular expressions can handle any and all of the characters that a double-quoted string can.

In the first part of the s/// operator (s/*//) and in the m// operator (m/*/), the text at the position marked * is actually treated as a double-quoted string (with some added functionality, namely the special regular expression characters). You can use it to interpolate variables:

$variable = 'TEST'; $a =~ m/${variable}aha/;

and:

$a = "${variable}aha";

Both refer to the same string: the former matches the string TESTaha in $a, and the latter sets $a to the string TESTaha. Because regular expressions process every character that a double-quoted string does, you can do the following:

$expression = 'hello';

@arrayName = ('elem1', 'elem2');

$variable =~ m/$expression/; # this equals m/hello/;

Here, $expression is simply expanded to hello, giving m/hello/. The same trick works with arrays:

$variable =~ m/@arrayName/; # this equals m/elem1 elem2/;

Here, the expression is equivalent to m/elem1 elem2/. If the special variable $" is set to |, the expression becomes equivalent to m/elem1|elem2/, which, as we will see, matches either elem1 or elem2 in the string. This also works for special characters:

$variable =~ m/\x01\27/; # match binary character x01, and

# octal character 27.

$variable =~ s/\t\t\t/   /; # replace three tabs with three spaces.
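The interpolation rules above can be tried out with a small sketch. The test strings and the names ending in _hit are assumptions for illustration; local is used so that the change to $" does not leak out of the block.

```perl
use strict;
use warnings;

my $expression = 'hello';
my @arrayName  = ('elem1', 'elem2');

my $scalar_hit = ('say hello' =~ m/$expression/) ? 1 : 0;   # same as m/hello/

# With the default list separator (a space) this is m/elem1 elem2/.
my $joined_hit = ('only elem2 here' =~ m/@arrayName/) ? 1 : 0;

# Setting $" to | turns the interpolated array into an alternation.
my $alt_hit;
{
    local $" = '|';
    $alt_hit = ('only elem2 here' =~ m/@arrayName/) ? 1 : 0;   # m/elem1|elem2/
}

# Escapes such as \t mean the same thing as in a double-quoted string.
my $tab_hit = ("a\tb" =~ m/a\tb/) ? 1 : 0;

print "$scalar_hit $joined_hit $alt_hit $tab_hit\n";   # 1 0 1 1
```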

In fact, apart from a few exceptions discussed here, Perl processes the contents of m// just as it processes a double-quoted string. But there are exceptions: some characters have a special meaning to the regular expression engine. So what happens if you want to match something like a forward slash (/) or parentheses (())? These characters have special meaning to regular expressions, so you cannot use a statement such as:

$variable =~ m//usr/local/bin/; # matches /usr/local/bin? NO! SYNTAX ERROR

because Perl interprets the / as the end marker of the regular expression. There are three ways to match special characters like these. The first is to use a backslash to "escape" any special character you want to match, including the backslash itself. The example just given then becomes:

$path =~ m/\/usr\/local\/bin/;

This tries to match /usr/local/bin in $path. The second method is to use a different regular expression delimiter. If there are many such characters to match, backslashing them gets very ugly (paths are especially bad).

Fortunately, Perl has a piece of syntactic sugar for this. Because you would otherwise need to backslash every / when writing m// or s///, Perl lets you change the regular expression delimiter to any character you like. For example, we can use double quotes (") to avoid a large number of backslashes:

$variable =~ m"/usr/local/bin"; # Note the quotation marks.

$variable =~ m"\"help\""; # If you are going to match quotation

# marks, you need to backslash them here (as per \").

$variable =~ s"$variable"$variable"; # Works in s/// too.

With good reason, we use this form in the first few chapters of this book. Using " as your regular expression delimiter serves as a good mnemonic: it reminds you that what is going on is really string interpolation. Besides, quotation marks are far less common in patterns than slashes.

Perl also allows {}, (), and [] as delimiters for regular expressions:

$variable =~ m{this works well with vi or emacs because the parens bounce};

$variable =~ m(this also works well);

$variable =~ s(substitute pattern){for this pattern}sg;

This form is very handy for multi-line regular expressions. Because the delimiters are paired, you can start treating expressions as "mini functions" (if you have a reasonably intelligent editor such as emacs or vi); in other words, you can bounce between the beginning and the end of the expression.

The third method is to use the function quotemeta() to backslash special characters automatically. If you enter the following code:

$variable =~ m"$scalar";

then $scalar is interpolated and replaced by its value. A caveat: any special characters in it are acted on by the regular expression engine and may cause syntax errors. So if $scalar is:

$scalar = "({";

then entering the following code:

$variable =~ m"$scalar";

is equivalent to saying $variable =~ m"({", which is a run-time syntax error. If instead you write:

$scalar = quotemeta('({');

then $scalar becomes \(\{, and substituting $scalar gives:

$variable =~ m"\(\{";

which matches the string ({ as you intended.
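The quotemeta() behaviour can be sketched as follows. The sample string 'f({x})' and the names $raw, $quoted, and the _hit and _ok flags are assumptions for illustration; the \Q...\E form shown at the end is Perl's inline equivalent of quotemeta().

```perl
use strict;
use warnings;

my $raw    = '({';
my $quoted = quotemeta($raw);          # backslashes every non-word character

# eval {} traps the error raised when the raw text is compiled as a pattern.
my $raw_ok = eval { my $hit = ('f({x})' =~ m"$raw"); 1 } ? 1 : 0;

my $quoted_hit = ('f({x})' =~ m"$quoted") ? 1 : 0;

# \Q...\E applies the same quoting inline during interpolation.
my $inline_hit = ('f({x})' =~ m"\Q$raw\E") ? 1 : 0;

print "$quoted raw_ok=$raw_ok quoted=$quoted_hit inline=$inline_hit\n";
# \(\{ raw_ok=0 quoted=1 inline=1
```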

9.3.5 Principle 5

Evaluating a regular expression produces two things: a result status and backreferences.

Each time a regular expression is evaluated, you get:

· A value indicating the number of times the regular expression matched the string (the result status).

· If you asked to save parts of the match, a series of variables called backreferences.

Let us look at each in turn.

1. Result status

The result status indicates the number of times a regular expression matched a string. To get it, evaluate the regular expression in scalar context. All of the following examples capture it in a result variable.

$pattern = 'simple always simple';

$result = ($pattern =~ m"simple");

Here, $result is 1 because the pattern simple occurs in simple always simple. Similarly, given simple always simple:

$result = ($pattern =~ m"complex");

leaves $result empty, because complex is not a substring of simple always simple. Then:

$result = ($pattern =~ s"simple"complex");

makes $result 1, because simple was successfully replaced with complex. Further:

$pattern = 'simple simple';

$result = ($pattern =~ s"simple"complex"g);

Things get more complicated here. $result is 2, because simple occurs twice in the string and the g modifier is used, which means "match as many times as possible." (See the section on modifiers later in this chapter.) Similarly:

$pattern = 'simple still';

if ($pattern =~ m"simple")

{

    print "MATCHED!\n";

}

uses $pattern =~ m"simple" in an if clause, which tells Perl to print MATCHED! if $pattern contains simple.
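The result statuses described above can be collected in one short script. The variable names $found, $missing, $once, and $all are assumptions for illustration.

```perl
use strict;
use warnings;

my $pattern = 'simple always simple';

my $found   = ($pattern =~ m"simple")  ? 1 : 0;   # pattern occurs
my $missing = ($pattern =~ m"complex") ? 1 : 0;   # pattern absent

# s/// returns the number of substitutions it made.
my $once = ($pattern =~ s"simple"complex");       # replaces only the first

$pattern = 'simple always simple';
my $all  = ($pattern =~ s"simple"complex"g);      # replaces every occurrence

print "$found $missing $once $all $pattern\n";
# 1 0 1 2 complex always complex
```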

2. Backreferences

Backreferences are a bit more complicated. If you want to save part of a match for later use, Perl provides an operator, parentheses (()), that you place around the group of characters you want to keep.

In a regular expression, parentheses around a pattern tell the interpreter, "Hey, I want to save that data." The Perl interpreter obliges and saves the matches it finds in a series of special variables ($1, $2, $3 ... $65536). These variables hold the first, second, third, and so on, saved matches, and you can get at them either by looking at the corresponding variable or by evaluating the regular expression in list context. For example:

$text = "this matches 'THIS' not 'THAT'";

$text =~ m"('TH..')";

print "$1\n";

Here, the string 'THIS' is printed: Perl saved it in $1, and $1 is printed afterwards. However, this example reveals more, namely:

1) Wildcards (the dot (.) matches any character). If 'THIS' had not been in the string, the pattern ('TH..') would happily have matched 'THAT'.

2) Regular expressions match the first occurrence of a pattern on a line. 'THIS' was matched because it came first. By default, a regexp always matches the first such string. (You can change this default with modifiers; details come later.)

Figure 9-3 shows how this matching process works.

Each set of parentheses in Figure 9-3 goes with its own numbered variable.

Here are some more examples:

$text = 'This is an example of backreferences';

($example, $backreferences) = ($text =~ m"(example).*(backreference)");

Here the pattern picks out two text strings for $example and $backreferences. These strings are placed in $1 and $2 and then immediately assigned to $example and $backreferences. Figure 9-4 illustrates this process.

Note, however, that the assignment to $example and $backreferences happens only when the text string matches. When it does not match, $example and $backreferences are empty. Here is a better example, wrapped in an if statement, which prints $1 and $2 only when the match succeeds:

if ($text =~ m"(example).*(back)")

{

    print $1; # prints 'example', since the first parens match the text example.

    print $2; # prints 'back', since the second parens match the text back.

}

So what happens when the regular expression does not match? Suppose you use the following:

$text = 'This is an example of backreferences';

$text =~ s"(examplar).*(back)"doesn't work";

print $1;

$1 is not assigned, because the regular expression did not match. More importantly, Perl will not tell you that it has not assigned $1. This last example illustrates two points about regular expressions:

1) Regular expressions are all-or-nothing. Just because the string back is matched by part of the pattern in:

'This is an example of backreferences'

does not mean that the entire expression matches. Because examplar is not in the string, the substitution fails.

2) If the regular expression fails, the backreferences are not assigned, so you cannot be certain what will be printed. This is a common cause of grief when tracking down logic problems, and a frequent Perl gotcha. $1 is just a regular variable and (contrary to what the syntax might suggest) is not reset to blank when a regular expression fails. Some people consider this a bug, others a feature. Either way, the point becomes painfully obvious when you analyze the following code:

1 $a = 'bedbugs bite';

2 $a =~ m"(bedbug)"; # sets $1 to be bedbug.

4 $b = 'this is nasty';

5 $b =~ m"(nasti)"; # does NOT set $1 (nasti is not in 'this is nasty').

6 # BUT $1 is still set to bedbug!

7 print $1; # prints 'bedbug'.

In this case, $1 is the string bedbug because the match on line 5 failed! If you were expecting nasti, well, that is your problem. This Perl behavior can trip people up. Consider yourself warned.
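The stale $1 can be reproduced in a few lines. The names $first, $second, $matched, and $stale are assumptions for illustration; the two matches sit in the same scope so that the earlier capture is still visible after the failed one.

```perl
use strict;
use warnings;

my $first = 'bedbugs bite';
$first =~ m"(bedbug)";                 # succeeds: $1 is 'bedbug'

my $second = 'this is nasty';
my $matched = ($second =~ m"(nasti)") ? 1 : 0;   # fails: $1 is left alone

my $stale = $1;                        # still the capture from the first match

print "matched=$matched stale=$stale\n";
# matched=0 stale=bedbug
```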

3. Common constructs that use backreferences

If you want to avoid this common pitfall (you expect to get a match, do not get one, and end up using a previous match instead), assign backreferences to variables using only one of the following three constructs:

1) Short-circuiting. Check the match and, only if it succeeds, do the assignment, using '&&'. For example:

($scalarName =~ m"(nasti)") && ($matched = $1);

2) if clause. Put the match in an if clause, and assign the backreference only if the if clause is true.

if ($scalarName =~ m"(nasti)") { $matched = $1; }

else { print "$scalarName didn't match"; }

3) Direct assignment. A regular expression evaluated in list context returns its backreferences, so you can assign them directly:

($match1, $match2) = ($scalarName =~ m"(regexp1).*(regexp2)");

All of your matching code should look like one of these three examples. Without one of these forms, you are coding without a safety net. If you never want to be bitten by this kind of bug, these forms will save you a lot of time.
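The three constructs can be put side by side in a runnable sketch. The patterns and the names $short, $guarded, $miss1, and $miss2 are assumptions for illustration.

```perl
use strict;
use warnings;

my $scalarName = 'this is nasty';

# 1) Short-circuit: the assignment runs only when the match succeeds.
my $short;
($scalarName =~ m"(nast.)") && ($short = $1);

# 2) if clause: a failed match leaves the default in place.
my $guarded = 'no match';
if ($scalarName =~ m"(nasti)") { $guarded = $1; }

# 3) Direct assignment: list context returns the captures,
#    or an empty list when the match fails.
my ($match1, $match2) = ($scalarName =~ m"(this).*(nasty)");
my ($miss1,  $miss2)  = ($scalarName =~ m"(that).*(nasty)");

print "$short | $guarded | $match1 $match2 | ",
      defined $miss1 ? 'defined' : 'undef', "\n";
# nasty | no match | this nasty | undef
```

In all three forms a failed match can never hand you a capture left over from an earlier expression.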

4. Using backreferences inside regular expressions

When you want to use the s""" or m"" operator to match somewhat complex patterns, Perl provides a useful feature you should know about: backreferences can be used inside the regular expression itself. In other words, once a group of characters has been captured with parentheses, you can refer to that capture before the regular expression has finished. If you want to use a backreference in the second part of s""" (the replacement), use the syntax $1, $2, and so on. If you want to use a backreference in the first part of s""" or in m"", use the syntax \1, \2, and so on. Here are some examples:

$string = 'far out';

$string =~ s"(far) (out)"$2 $1"; # this makes string 'out far'.

In this example, we simply turn the words far out into out far.

$string = 'sample examples';

if ($string =~ m"(amp..) ex\1") { print "MATCHES!\n"; }

This example is a bit more complicated. The first pattern, (amp..), matches the string ample. That means the whole pattern becomes the string ample example, where the second ample corresponds to \1. The pattern therefore matches within sample examples.

Below is a more complex example in the same style:

$string = 'bballball';

$string =~ s"(b)\1(a..)\1\2"$1$2";

Let us look at this example in detail. It does match, but the reason is not too obvious. There are five steps in matching this string:

1) The first b, in parentheses, matches the beginning of the string and is stored in \1 and $1.

2) \1 then matches the second b in the string, because the second character happens to be b.

3) (a..) matches the string all and stores it in \2 and $2.

4) \1 matches the next b.

5) Because \2 equals all, it matches the last three characters (all).

Put together, the regular expression matches bballball, the entire string. Because $1 equals b and $2 equals all, the whole expression:

$string = 'bballball';

$string =~ s"(b)\1(a..)\1\2"$1$2";

(in this example) translates to the following code:

$string =~ s"(b)b(all)ball"ball";

or, in plain English, replaces bballball with ball.

The regular expression looks like Figure 9-5.

s""" allows some complicated backreferencing. If you understand the last example, you are well ahead in understanding how Perl's regular expressions work. Backreferences can get even worse, though.
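The three examples from this subsection can be run together to confirm the results. The variable names $swap, $echo, $echo_hit, $captured, and $ball are assumptions for illustration.

```perl
use strict;
use warnings;

# $1 and $2 in the replacement swap the two captured words.
my $swap = 'far out';
$swap =~ s"(far) (out)"$2 $1";

# \1 inside the pattern must match the same text the first group captured.
my $echo     = 'sample examples';
my $echo_hit = ($echo =~ m"(amp..) ex\1") ? 1 : 0;
my $captured = $1;

# Both forms together: \1 and \2 in the pattern, $1 and $2 in the replacement.
my $ball = 'bballball';
$ball =~ s"(b)\1(a..)\1\2"$1$2";

print "$swap | $echo_hit $captured | $ball\n";   # out far | 1 ample | ball
```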

5. Nested backreferences

Nested backreferences are useful for complicated strings that cannot easily be described as a single sequence (one string following another). For example, the expression:

m"((aaa)*)";

uses * to match multiples of aaa: it matches '', aaa, aaaaaa, aaaaaaaaa, and so on. In other words, Perl matches any run made up of a multiple of three a's. It does not match aa as a whole (against aa it can match only the empty string). Suppose you want to match the following string:

$string = 'softly slowly surely subtly';

Then you can use the following regular expression with nested parentheses:

$string =~ m"((s...ly\s*)*)"; # note nested parens.

In this example, the outermost parentheses capture the whole string: softly slowly surely subtly. The innermost parentheses capture a combination of strings, each starting with s and ending with ly, followed by optional whitespace. Thus the regular expression first captures softly, throws it away, then captures slowly, throws it away, then captures surely, and finally captures subtly. This raises a question: in what order are the backreferences numbered? It is easy to get confused here. Do the outer parentheses come first, or the inner ones? The simplest solution is to remember the following three rules:

1) The earlier a backreference appears in the expression, the lower its number. For example:

$var =~ m"(a)(b)";

In this example, the backreference (a) becomes $1 and (b) becomes $2.

2) The more a backreference contains, the lower its number. For example:

$var =~ m"(c(a(b)*)*)";

In this example, the backreference that contains everything, (c(a(b)*)*), becomes $1. The expression nested inside it, (a(b)*), becomes $2. The innermost nested expression, (b), becomes $3.

3) When the two rules conflict, rule 1 takes precedence. In the statement $var =~ m"(a)(b(c))", (a) becomes $1, (b(c)) becomes $2, and (c) becomes $3.

Thus, in the earlier example, ((s...ly\s*)*) becomes $1 and (s...ly\s*) becomes $2.

Note that there is another issue here. Let us return to the complicated regular expression we started with:

$string = 'softly slowly surely subtly';

$string =~ m"((s...ly\s*)*)"; # note nested parens.

What does (s...ly\s*)* match? It matches multiple strings: first softly, then slowly, then surely, and finally subtly. Because (s...ly\s*)* matches multiple strings, Perl throws away all but the last match and sets $2 to subtly.

Even with these rules, nested parentheses can still be confusing. The best thing to do is practice. Try out different combinations of these constructs and hand them to the Perl interpreter. That will show you the order in which Perl assigns backreferences.
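In that spirit, here is a small sketch that hands two of the nested patterns to the interpreter. The test string 'cab' and the names $outer, $inner, and @nested are assumptions for illustration.

```perl
use strict;
use warnings;

my $string = 'softly slowly surely subtly';

# Groups are numbered by the position of their opening parenthesis.
# A repeated group keeps only what its last repetition matched.
my ($outer, $inner) = ($string =~ m"((s...ly\s*)*)");

# Same rule with three levels: (c(a(b)*)*) is $1, (a(b)*) is $2, (b) is $3.
my @nested = ('cab' =~ m"(c(a(b)*)*)");

print "outer='$outer' inner='$inner' nested=@nested\n";
# outer='softly slowly surely subtly' inner='subtly' nested=cab ab b
```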

9.3.6 Principle 6

The heart of the power of regular expressions lies in wildcards and multiple-match operators. Wildcards let you match more than one character in a string. (If you are dealing with binary data, a wildcard matches a range of characters.) Multiple-match operators let you match zero, one, or many characters. The examples so far have been instructive for explaining the basics of Perl, but not very powerful. In fact, you could probably write a C subroutine to do any of them. The real power of Perl's regular expression set comes from the ability to match multiple patterns of text (that is, to describe many different patterns of data with the logical shorthand mentioned above). Perl simply provides the best shorthand.

1. Wildcards

A wildcard represents a class of characters. Suppose you have the following strings but do not know whether they are capitalized:

· Kumquat

· Kristina

· Kentucky

· Key

· Keeping

In this case, the following Perl expression matches the first character of each word:

[Kk]

This is an example of a character class. All wildcards in Perl can be expressed by writing square brackets ([ and ]) and putting the characters you want to match inside them. The wildcard above tells the regular expression engine, "OK, I am looking for either a K or a k here. If you find either one, match it." Here is another example of using wildcards:

$scalarName = 'this has a digit (1) in it';

$scalarName =~ m"[0-9]"; # this matches any character between 0 and 9, that is, matches any digit.

$scalarName = 'this has a capital letter (A) in it';

$scalarName =~ m"[A-Z]"; # this matches any capital letter.

$scalarName = "this does not match, since the letter after the string 'an' is an a";

$scalarName =~ m"an [^a]";

The first two examples are fairly straightforward: [0-9] matches the digit 1 in this has a digit (1) in it, and [A-Z] matches the capital letter A in this has a capital letter (A) in it. The last example is a bit trickier: because an followed by a space occurs only once in the string, the only characters that could possibly match are the last four, namely an a.

However, by asking for the pattern an [^a] we have explicitly told the regular expression to match a, then n, then a space, and finally a character that is not a. Hence there is no match in this example. If the string had been match an a not an e, the match would have succeeded: the first an is skipped, and the second one matches. Consider the following example:

$scalarName = "this has a tab (\t) or a newline in it so it matches";

$scalarName =~ m"[\t\n]"; # matches either a tab or a newline.

# matches since the tab is present.

This example shows some of the fun things you can do with matching and wildcards. First, the same characters you can interpolate in a "" string can also be interpolated in a regular expression, including inside a character class in brackets ([\t\n]). Here \t matches a tab and \n matches a newline.

Second, if you put a ^ at the start of [], the wildcard matches any character not in the group. Likewise, if you put a - inside [], you can match a range (for example all digits, [0-9], or all capital letters, [A-Z]). These operators can be combined to give quite specific wildcards:

$a =~ m"[a-fh-z]"; # matches any lowercase letter *except* g.

$a =~ m"[^0-9a-zA-Z]"; # matches any nonword character (i.e., not

# a character in 0-9, a-z or A-Z).

$a =~ m"[0-9^a-zA-Z]"; # a mistake, does not

# equal the above. Instead matches 0-9, a-z, A-Z, or a literal ^.

$a =~ m"[\t\n ]"; # matches a space character: tab, newline or blank.

The important one to note is the third example: the caret in [0-9^a-zA-Z] is a literal caret rather than a negation, because it appears in the middle of the character class. So if you want a negated character class, always put the caret at the very start of the []. And do not forget the []: if you do, you get a literal text string instead of a character class.
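The character classes above can be exercised with a table of test cases. This is only a sketch: the labels, the test strings, and the use of qr"" to hold each pattern are assumptions made for illustration.

```perl
use strict;
use warnings;

# Each entry: [ label, string, pattern ]. qr"" precompiles the pattern.
my @cases = (
    [ 'digit',      'this has a digit (1) in it', qr"[0-9]"        ],
    [ 'an-not-a',   'this is an a',               qr"an [^a]"      ],
    [ 'an-not-a-2', 'match an a not an e',        qr"an [^a]"      ],
    [ 'except-g',   'g',                          qr"[a-fh-z]"     ],
    [ 'literal-^',  '^',                          qr"[0-9^a-zA-Z]" ],
    [ 'negated',    '^',                          qr"[^0-9a-zA-Z]" ],
);

my %result;
for my $case (@cases) {
    my ($label, $string, $pattern) = @$case;
    $result{$label} = ($string =~ $pattern) ? 1 : 0;
    print "$label: $result{$label}\n";
}
```

The last two cases show the caret acting first as a literal character (in the middle of the class) and then as a negation (at the start).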

(1) Common wildcards

It so happens that some wildcards are very common; you probably do not want to type something like [0-9] every time you want to match a digit. For such cases, Perl has several handy shortcuts that make life easier. Here are these shortcuts, what they stand for, and the character class each corresponds to:

· \d - matches a digit (character class [0-9]).

· \D - matches a non-digit (character class [^0-9]).

· \w - matches a word character (character class [a-zA-Z0-9_]); note that the underscore counts as a word character.

· \W - matches a non-word character (character class [^a-zA-Z0-9_]).

· \s - matches a whitespace character (character class [\t\n ]) (tab, newline, space).

· \S - matches a non-whitespace character (character class [^\t\n ]).

· . - matches any character except newline (character class [^\n]), at least in the default case; when you write m"(.*)", it matches any run of such characters. See the modifiers section later in this chapter.

· $ - although it is not a wildcard (it does not match any particular character), it is a widely used special character; placed at the end of a regular expression, it matches the end of the line. Zero-width assertion.

· ^ - although it is not a wildcard, it is a special character that matches the beginning of the line if it is placed at the start of the regular expression. Zero-width assertion.

· \b, \B - like $ and ^, these do not match characters; they match a word boundary (\b) or a non-word boundary (\B). Zero-width assertion.

The first thing to note from this list is the dot wildcard (.). It is often used with the multiple-match operators to act as a filler between items. Consider the following:

$a = 'now is the time for all good men to come to the aid of their party';

$a =~ m"(now).*(party)"; # matches, since '.' matches any

# character except newline

# and '*' means match zero or more characters.

Here .* captures all the characters between now and party, and the match succeeds. ("All" in this context means "zero or more, as many as possible." This is known as greediness; we discuss it further when we cover the multiple-match operators.)

Here are some more examples of wildcards. Note that we use a single-quoted string on the left side of =~ (a simple way to test expressions).

1 '1956.23' =~ m"(\d+)\.(\d+)"; # $1 = 1956, $2 = 23

2 '333e+12' =~ m"(\D+)"; # $1 = 'e+'

3 '$hash{$value}' =~ m"\$(\w+)\{\$(\w+)\}"; # $1 = 'hash', $2 = 'value'

4 '$hash{$value}' =~ m"(\$)(\w+)\{(\W*)(\w+)(\W*)\}"; # $1 = '$', $2 = 'hash',

# $3 = '$', $4 = 'value', $5 = ''

5 'variable = value' =~ m"(\w+)(\s*)=(\s*)(\w+)"; # $1 = 'variable', $2 = ' ',

# $3 = ' ', $4 = 'value'

6 'catch as catch can' =~ m"^(.*)can$"; # $1 = 'catch as catch '

7 'can as catch catch' =~ m"^can(.*)$"; # $1 = ' as catch catch'

8 'word_with_underlines word2' =~ m"\b(\w+)\b"; # $1 = 'word_with_underlines'

In each example we use a different wildcard, with * meaning "match zero or more of the wildcard in a row" and + meaning "match one or more of the wildcard in a row." Some of these examples are useful in themselves: Example 5 shows how to use \s* to make an expression robust against stray whitespace; Example 8 shows a general way to match a word; Example 4 shows a general way to match a hash construct with a key.

Note in particular, however, that Example 1 is not a general way to match a Perl number. Given all the number formats Perl supports, that is a rather hard problem, which we take up as an exercise later. There is one more thing to note in this list: some wildcards are marked as zero-width assertions, which we explain next.

(2) Zero-width assertions and positive-width assertions

The characters in Table 9-2 are what you might call positive-width assertions:

Table 9-2 Positive-width assertions

\D non-digit

\d digit

\w word character

\W non-word character

\s whitespace

\S non-whitespace

. any character other than newline

These assertions actually match a character in the string. Positive width means that a character is matched and "eaten" by the regular expression engine in the process. The zero-width assertions are listed in Table 9-3.

These assertions do not match a character; they match a condition. In other words, ^cat matches a string that starts with cat, but the ^ itself does not match any particular character. Consider the expressions below:

$ ziggurautstring = 'this matches the word zigguraut';

$ ziggurautstring = ~ m "/ bzigguraut / b";

$ ziggurautstring = ~ m "/ wzigguraut / w";

The first example matches success because it looks for ziggurat between two non-word characters (word boundaries). The string satisfies this condition.

The second example did not complete the match, why? Because the / w at the end / W is a positive width assertion, therefore. Must match a character. But the row is not a character, but a condition. This is an important difference.

Furthermore, even if a match is implemented, the regular expression engine will eliminate the characters involved. Therefore, if you enter the following code:

$ ziggurautstring = "this matches the word zigguraut now";

$ ziggurautstring = ~ s "/ wzigguraut / w" "g;

The final result is this matches the wordnow. The reason is that the words and inserted spaces have been replaced. thus:

• Zero width assertion, such as / b / b, can match where there is no character. They don't drop anything in the match. Here are other examples of the matching of wildcards:

$example = '111119';

$example =~ m"\d\d\d"; # matches the first three digits it can find in the string: '111'.

$example = 'this is a set of words and not of numbers';

$example =~ m"of (\w\w\w\w\w)"; # matches 'of words' and creates a backreference

Note that in the last example the first 'of' in the string is the one in front of 'words', so the pattern matches that particular 'of' and not the later one (in front of 'numbers'). The last example also shows a problem we are about to discuss: to match five word characters you must type \w five times, which is tedious. To make patterns of a given length easier to write, Perl provides multiple-match operators. We discuss them next.

2. Multiple-match operators

There are six multiple-match operators in Perl. They are mainly used to avoid writing repetitive code, such as the five consecutive \w in the previous section. Readers can regard them as shortcuts.

The six multiple-match operators of Perl are:

· * - Match zero, one, or more times.

· + - Match one or more times.

· ? - Match zero times or once.

· {X} - Match exactly X times.

· {X,} - Match X or more times.

· {X,Y} - Match between X and Y times.
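The six operators can be compared side by side with a small script. This is a sketch under assumptions: the test strings, the patterns built around the letter a, and the names @strings, @patterns and %accepted are all invented for illustration and do not come from the original text.

```perl
use strict;
use warnings;

# Hypothetical test data: 'c', then zero to three 'a' characters, then 't'.
my @strings  = ('ct', 'cat', 'caat', 'caaat');
my @patterns = ('ca*t', 'ca+t', 'ca?t', 'ca{2}t', 'ca{2,}t', 'ca{1,2}t');

my %accepted;
for my $pattern (@patterns) {
    # Anchor with ^ and $ so the whole string must fit the pattern.
    my @hits = grep { /^(?:$pattern)$/ } @strings;
    $accepted{$pattern} = join(' ', @hits);
    printf "%-9s accepts: %s\n", $pattern, $accepted{$pattern};
}
```

Each output line lists the strings that a given operator accepts, which makes the difference between, say, * and + visible at a glance: only * accepts 'ct'.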

Here are two equivalent expressions. Which one is easier to read?

$example = 'this is a set of words and not of numbers';

$example =~ m"of (\w\w\w\w\w)"; # matches 'of words'.

$example =~ m"of (\w{5})"; # usage of {x} form. matches 5 characters,

# and backreference $1 becomes the string 'words'.

The reader will probably find the second example easier to read. It uses a multiple-match operator to avoid repetitive, tedious code. Multiple-match operators can also match an indeterminate number of characters. The regular expression a* matches '' (the empty string), a, aa, aaa, or any number of a's. That is, it matches zero or more a's. For example:

$example = 'this matches a set of words and not of numbers';

$example =~ m"of (\w+)";

matches the string 'words' (of (\w+) matches 'of words' here). Likewise:

$example =~ m"of (\w{2,3})"; # usage of {x,y}. matches the string 'wor'

# (the first three letters of the first match it finds).

matches the string 'wor' (of (\w{2,3}) matches 'of wor' here).

Counterintuitively, the following m"" expression:

$example = 'this matches a set of words and not of numbers';

$example =~ m"of (\d*)";

matches the string, even though \d* looks for digits. Why? Because \d* means zero or more digits, the expression happily matches zero digits! However:

$example =~ m"of (\d+)";

does not match the same string, because the expression uses \d+ instead of \d*. It requires one or more digits after the word 'of', and this string has none.

3. Greediness

All of the examples so far share one main point about how the regular expression engine matches a string against an expression: the multiple-match operators are greedy.

What does "greedy" mean here? It means that, by default, Perl's multiple-match operators capture the maximum number of characters in a string that still allows the pattern as a whole to match. Readers should master this point. Understanding the greedy nature of Perl expressions saves a great deal of time that would otherwise be spent tracking down quirky regular expression behavior.

Here are a few simple examples of greediness; the greedy behavior in them can easily trip up the programmer. Let us start with the following statement:

$example = 'this is the best example of the greedy pattern match in perl5';

Suppose we want to match 'is' in this example. Accordingly, we write the following code:

$example =~ m#this (.*) the#;

print $1; # this does not print out the string 'is'!

The reader expects print $1 to output 'is', but what is obtained is the string below:

'is the best example of'

The process of the program is shown in Figure 9-6.

The reason for this result is the greediness of the multiple-match operator *. The * operator takes all characters up to the last occurrence of the string 'the' (the one before 'greedy'). A reader who is not careful can therefore get unexpected results from a regular expression.
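The behavior is easy to reproduce. The following sketch reuses the string and pattern from the text; the variable name $captured is an assumption added for illustration.

```perl
use strict;
use warnings;

my $example = 'this is the best example of the greedy pattern match in perl5';

# Hypothetical variable that holds whatever (.*) captured.
my $captured = '';
if ($example =~ m#this (.*) the#) {
    # .* ran as far right as it could while still leaving ' the' to match.
    $captured = $1;
}

print "'$captured'\n";   # prints 'is the best example of'
```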

Here are more examples:

$example = 'sam I am';

$example =~ m"(.*)am"; # $1 is the string 'sam I '

$example = 'record: 1 value: a value2: b';

$example =~ m"record: (.*) value"; # $1 is '1 value: a'

$example = 'record';

$example =~ m"\w{2,3}"; # matches 'rec'

The last example shows that even the numeric multiple-match operators are greedy. Although two word characters from 'record' would be enough, Perl prefers to match three because it can. If you enter 're' =~ m"\w{2,3}", only two characters are matched, because that is the maximum possible.
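A quick check of the numeric form, as a sketch: the word list (including the one-letter case 'r') and the array name @matched are assumptions added here to show all three outcomes.

```perl
use strict;
use warnings;

# Hypothetical list of inputs: long enough, exactly two, and too short.
my @matched;
for my $word ('record', 're', 'r') {
    # {2,3} takes three characters when it can, two when it must.
    push @matched, ($word =~ m"(\w{2,3})") ? $1 : '(no match)';
}

print join(', ', @matched), "\n";   # prints rec, re, (no match)
```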

4. Backtracking and multiple wildcards

We have been preparing for this for a while; now it is time for a tricky topic. As mentioned earlier, combining wildcards with backtracking can make regular expressions extremely slow. If the reader understands why, it is a good sign that the reader "gets" regular expressions.

See the following example:

$string =~ m"has (.*) multiple (.*) wildcards";

This means that the regular expression will look for (in numbered order):

1) The string 'has' (the first part of m"has (.*) multiple (.*) wildcards").

2) The maximum text it can find before the last 'multiple' (the first (.*) in the pattern).

3) The string 'multiple'.

4) The maximum text it can find before the last 'wildcards' (the second (.*) in the pattern).

5) The string 'wildcards'.

Now consider what happens with the following pattern:
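The five steps above can be traced on a concrete input. This is a sketch only: the sample sentence and the names $string, $first and $second are invented for illustration, and the pattern is the one from the text.

```perl
use strict;
use warnings;

# Hypothetical input containing 'multiple' twice, so greediness is visible.
my $string = 'this has many multiple kinds of multiple useful wildcards in it';

my ($first, $second) = ('', '');
if ($string =~ m"has (.*) multiple (.*) wildcards") {
    # Step 2: the first (.*) stretches to the LAST ' multiple '.
    # Step 4: the second (.*) takes what lies before ' wildcards'.
    ($first, $second) = ($1, $2);
}

print "first:  '$first'\n";    # prints first:  'many multiple kinds of'
print "second: '$second'\n";   # prints second: 'useful'
```

Note that the first wildcard swallows the earlier 'multiple' as ordinary text, because the engine initially grabs as much as possible and only gives characters back until the rest of the pattern can match.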