9.3.1 Principle 1
Regular expressions come in three forms: matching, substitution, and translation.
The three regular expression operators are listed in Table 9-1.
Each is explained in detail below.
· Match: m//
· Substitution: s///
· Translation: tr///
These expressions are bound to a scalar with =~ (read it as "does", as in "does match") and !~ (read it as "doesn't", as in "doesn't match"). To illustrate, here are six sample regular expressions with their meanings:
$scalarName =~ s/a/b/; # replace the character a with b, and return true if this can happen.
$scalarName =~ m/a/; # does the scalar $scalarName have an a in it?
$scalarName =~ tr/A-Z/a-z/; # translate all capital letters to lowercase ones, and return true if this happens.
$scalarName !~ s/a/b/; # replace the character a with b, and return false if this indeed happens.
$scalarName !~ m/a/; # does the scalar $scalarName match the character a? Return false if it does.
$scalarName !~ tr/0-9/a-j/; # translate the digits to the letters a through j, and return false if this happens.
If we enter code such as 'horned toad' =~ m/toad/, the result is as shown in Figure 9-1.
Also, if you are matching against the special variable $_ (as you might in a while loop, map, or grep), you can leave out !~ and =~. Thus all of the following code is legal:
my @elements = ('a1', 'a2', 'a3', 'a4', 'a5');
foreach (@elements) { s/a/b/; }
This makes @elements equal to b1, b2, b3, b4, b5. In addition:
while (<$fd>) { print if (m/error/); }
prints all lines that contain the string error, and:
if (grep(/pattern/, @lines)) { print "the variable \@lines has pattern in it!\n"; }
prints a message if any element of @lines contains the string pattern, which leads directly to the next principle.
9.3.2 Principle 2
Regular expressions match only on scalars.
Note the importance of the word scalar here. If you try the following code:
@arrayName = ('variable1', 'variable2');
@arrayName =~ m/variable/; # looks for 'variable' in the array? No! Use grep instead.
then @arrayName does not match as you intended. Perl evaluates @arrayName in scalar context, which yields 2 (its number of elements), so you have in effect written:
'2' =~ m/variable/;
This certainly does not do what you expected. If you want to search an array, write:
grep(m/variable/, @arrayName);
This function loops over each element of @arrayName. In scalar context it returns the number of elements that matched; in list context it returns the actual list of matching elements.
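The two return forms of grep described above can be sketched as follows. This is a minimal illustration; the third array element is a hypothetical value added so that one element does not match.

```perl
use strict;
use warnings;

# Sample data; 'other' is a hypothetical non-matching element.
my @arrayName = ('variable1', 'variable2', 'other');

# Scalar context: grep returns how many elements matched.
my $count = grep { m/variable/ } @arrayName;

# List context: grep returns the matching elements themselves.
my @matches = grep { m/variable/ } @arrayName;

print "count: $count\n";
print "matches: @matches\n";
```

The only thing that changes between the two calls is the context of the assignment target.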
9.3.3 Principle 3
For a given pattern, a regular expression matches only the earliest possible match in the string. By default, it matches or substitutes only once.
This principle relies on a process called "backtracking" to work out how to match a given string. If a partial match is found and something then invalidates it, the regular expression engine "backtracks" the minimum number of characters in the string, that is, just far enough to guarantee that no possible match is lost.
Of all the principles for understanding regular expressions, this is the most helpful one, and you do not need to understand it at the level of detail at which Perl implements it. Suppose you have the string:
'silly people do silly things if in silly moods'
and you want to match the following pattern:
'silly moods'
The regular expression engine matches silly, then hits the p of people and knows that the first silly does not lead to a match. It moves on to the p and keeps looking. It then reaches the second silly and tries to match moods, but finds the letter t (of things) instead, so it moves on to the t of things and keeps looking. When the engine reaches the third silly and tries to match moods, the match succeeds and is finally complete. What happens is shown in Figure 9-2.
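The walk-through above can be checked with a small sketch. It uses Perl's built-in @- array, whose first element holds the start offset of the last successful match; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $string = 'silly people do silly things if in silly moods';

# 'silly' on its own matches at the earliest possible place.
$string =~ m/silly/ or die "no match";
my $first_silly = $-[0];    # start offset of the last successful match

# 'silly moods' can only succeed at the third 'silly'.
$string =~ m/silly moods/ or die "no match";
my $silly_moods = $-[0];

print "silly found at offset $first_silly\n";
print "silly moods found at offset $silly_moods\n";
print "text from there: ", substr($string, $silly_moods), "\n";
```

The first two occurrences of silly are tried and abandoned before the engine settles on the third.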
Backtracking becomes more important once wildcards are involved. If a regular expression contains several wildcards that are all intertwined, pathological cases arise in which backtracking becomes very expensive. Consider the following expression:
$line =~ m/expression.*be.*very.*expensive.*/
Here .* is a wildcard sequence meaning "match any character (with some restrictions) zero or more times." Matching can take a long time: if a possible match lies at the end of a string that otherwise does not match, the engine backtracks again and again. For more on this, see the principle on wildcards.
If you find yourself in a situation like this one, break the regular expression into smaller pieces. In other words, simplify your regular expressions.
9.3.4 Principle 4
Regular expressions can handle any and all of the characters that a double-quoted string can.
The first part of the s/// operator (the * in s/*//) or of the m// operator (the * in m/*/) is treated as a double-quoted string, with some additional functionality: the special regular expression characters. You can use it to interpolate:
$variable = 'Test'; $a =~ m/${variable}aha/;
and:
$a = "${variable}aha";
Both refer to the same string: the former looks for Testaha in $a, and the latter sets $a to the string Testaha. Because a regular expression processes every character that a double-quoted string does, the following is possible:
$expression = 'hello';
@arrayName = ('elem1', 'elem2');
$variable =~ m/$expression/; # this equals m/hello/;
Here, $expression is simply expanded to hello, giving m/hello/. The same trick works with arrays:
$variable =~ m/@arrayName/; # this equals m/elem1 elem2/;
Here, the expression is equivalent to m/elem1 elem2/. If the special variable $" is set to |, the expression becomes equivalent to m/elem1|elem2/, which, as we will see, matches either elem1 or elem2 in the string. The same approach applies to special characters:
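The interpolation behavior described above can be sketched as follows. The test strings are hypothetical, and $" (the list separator) is changed with local so that the setting does not leak into the rest of the program.

```perl
use strict;
use warnings;

my $expression = 'hello';
my @arrayName  = ('elem1', 'elem2');

# A scalar is expanded before matching: this is m/hello/.
my $scalar_hit = ('say hello' =~ m/$expression/) ? 1 : 0;

# By default $" is a single space, so this is m/elem1 elem2/.
my $default_hit = ('only elem2 here' =~ m/@arrayName/) ? 1 : 0;

# With $" set to |, the same pattern becomes m/elem1|elem2/.
my $alt_hit;
{
    local $" = '|';
    $alt_hit = ('only elem2 here' =~ m/@arrayName/) ? 1 : 0;
}

print "scalar: $scalar_hit, default: $default_hit, alternation: $alt_hit\n";
```

Only the third match succeeds against the array, because the string contains elem2 but not the sequence elem1 elem2.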
$variable =~ m/\x01\27/; # match binary character x01, and
# octal character 27.
$variable =~ s/\t\t\t/   /; # replace three tabs with three spaces.
In fact, apart from a few exceptions discussed here, Perl processes the contents of m// just as it processes a double-quoted string. The exceptions are the characters that have a definite meaning to the regular expression engine. So what happens if you want to match something like a forward slash (/) or parentheses (())? These characters are special to regular expressions, so you cannot write the following:
$variable =~ m//usr/local/bin/; # matches /usr/local/bin? NO! SYNTAX ERROR
because Perl interprets / as the end marker of the regular expression. There are three ways to match special characters like these. The first is to use a backslash to "escape" any special character you want to match, including a backslash itself. The example just given then becomes:
$path =~ m/\/usr\/local\/bin/;
This tries to match /usr/local/bin in $path. The second way is to use a different regular expression delimiter. If there are many special characters to match, the backslashes quickly become ugly (paths are especially bad).
Fortunately, Perl provides some syntactic sugar. Rather than making you backslash every / when you write m// or s///, Perl lets you change the regular expression delimiter to any character you like. For example, we can use double quotes (") to avoid a large number of backslashes:
$variable =~ m"/usr/local/bin"; # note the quotation marks.
$variable =~ m"\"help\""; # if you are going to match quotation
# marks, you need to backslash them here (as in \").
$variable =~ s"$variable"$variable"; # works in s/// too.
For good reason, we used this convention in the first few chapters of this book. Using " as the regular expression delimiter is a good mnemonic: it reminds you that what you are writing is really string interpolation. Besides, quotation marks are far less common inside patterns than slashes.
Perl also allows {}, (), and [] as regular expression delimiters:
$variable =~ m{this works well with vi or emacs because the parens bounce};
$variable =~ m(this also works well);
$variable =~ s(substitute pattern) {for this pattern}sg;
This form is very convenient for multi-line regular expressions. Because the brackets come in pairs, you can start to treat expressions as "mini-functions" (if you have a reasonably intelligent editor such as emacs or vi); in other words, you can bounce between the beginning and the end of the expression.
The third way is to use the function quotemeta() to add the backslashes automatically. If you enter the following code:
$variable =~ m"$scalar";
then $scalar is interpolated and replaced by its value. A word of warning: any special characters in that value are acted on by the regular expression engine and may cause a syntax error. So if the scalar is:
$scalar = "({";
then entering the following code:
$variable =~ m"$scalar";
is equivalent to writing $variable =~ m"({", which is a runtime syntax error. If instead you write:
$scalar = quotemeta('({');
then $scalar becomes \(\{, and interpolating $scalar gives:
$variable =~ m"\(\{";
which matches the string ({ as you intended.
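The quotemeta() approach can be sketched as below. The sample text is hypothetical. The sketch also shows \Q...\E, Perl's inline equivalent of quotemeta(), which the text above does not mention; an eval block traps the compile error that the unquoted pattern raises.

```perl
use strict;
use warnings;

my $scalar   = '({';
my $variable = 'a literal ({ inside a string';   # hypothetical sample text

# quotemeta() puts a backslash in front of every non-word character.
my $quoted = quotemeta($scalar);

my $hit_quotemeta = ($variable =~ m"$quoted") ? 1 : 0;

# \Q...\E applies the same quoting inline.
my $hit_inline = ($variable =~ m"\Q$scalar\E") ? 1 : 0;

# Unquoted, the pattern does not compile; eval traps the runtime error.
my $compiled = eval { my $re = qr/$scalar/; 1 } ? 1 : 0;

print "quoted pattern: $quoted\n";
print "quotemeta: $hit_quotemeta, inline: $hit_inline, raw compiles: $compiled\n";
```

Quoting user-supplied or computed strings this way is generally safer than trying to backslash special characters by hand.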
9.3.5 Principle 5
Evaluating a regular expression produces two things: a result status and backreferences.
Each time a regular expression is evaluated, you get:
· An indication of how many times the regular expression matched the string (the result status).
· A series of variables called backreferences, if you asked to save parts of the match.
Let us look at each in turn.
1. Result status
The result status indicates how many times the regular expression matched the string. To get it, evaluate the regular expression in scalar context. All of the following examples use this result status.
$pattern = 'simple always simple';
$result = ($pattern =~ m"simple");
Here, $result is 1 because the pattern simple occurs in simple always simple. Similarly, given simple always simple:
$result = ($pattern =~ m"complex");
$result is empty because complex is not a substring of simple always simple. Next:
$result = ($pattern =~ s"simple"complex");
makes $result 1 because the substitution of complex for simple succeeded. Going further:
$pattern = 'simple simple';
$result = ($pattern =~ s"simple"complex"g);
Things get a little more complicated here. $result is 2, because simple occurs twice in the string and the g modifier is in use, which means "match as many times as possible" (see the modifiers later in this chapter). Similarly:
$pattern = 'simple still';
if ($pattern =~ m"simple")
{
    print "MATCHED!\n";
}
uses $pattern =~ m"simple" in an if clause, which tells Perl to print MATCHED! if $pattern contains simple.
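The result status cases above can be collected into one short sketch. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $pattern = 'simple always simple';

my $matched   = ($pattern =~ m"simple");    # true: the pattern occurs
my $unmatched = ($pattern =~ m"complex");   # false: the empty string

# With the g modifier, s/// returns the number of substitutions made.
my $count = ($pattern =~ s"simple"complex"g);

print "matched: '$matched', unmatched: '$unmatched'\n";
print "substitutions: $count, string is now: $pattern\n";
```

Because a failed match returns an empty (false) value, the expression can be used directly as the condition of an if statement.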
2. Backreferences
Backreferences are a bit more complicated. To save part of a match, Perl provides an operator, parentheses (()), which you put around the set of characters you want to capture.
Inside a regular expression, parentheses tell the interpreter, "Hey, I want to save that data." The Perl interpreter obliges and saves the matches in a series of special variables ($1, $2, $3 ... $65536). These variables hold the first, second, third, and later captures. You can reach them either by reading the corresponding variable or by evaluating the regular expression in list context. For example:
$text = "this matches 'THIS' not 'THAT'";
$text =~ m"('TH..')";
print "$1\n";
Here, the string 'THIS' is printed: Perl saved it in $1, and we printed $1 afterwards. This example reveals more than that, however:
1) The wildcard (the dot character, .) matches any character. If THIS were not in the string, the pattern ('TH..') would happily match THAT.
2) A regular expression matches the first occurrence of the pattern on a line. 'THIS' is matched because it appears first. With the default regexp behavior, the first occurrence is always the one matched. (You can change the default with modifiers; details come later.)
Figure 9-3 shows how this matching process proceeds.
Each set of parentheses in Figure 9-3 gets its own numbered variable.
Here are more examples:
$text = 'This is an example of backreferences';
($example, $backreferences) = ($text =~ m"(example).*(backreferences)");
Here, parentheses are used to pick out two pieces of text, example and backreferences. They are placed in $1 and $2 and then immediately assigned to $example and $backreferences. This process is illustrated in Figure 9-4.
Note, however, that the assignment to $example and $backreferences happens only when the text string matches. When it does not match, $example and $backreferences are empty. Here is a better example, wrapped in an if statement, which prints $1 and $2 only on a match:
if ($text =~ m"(example).*(back)")
{
    print $1; # prints 'example', since the first parens match the text example.
    print $2; # prints 'back', since the second parens match the text back.
}
So what happens if the regular expression does not match? Suppose you use the following pattern:
$text = 'This is an example of backreferences';
$text =~ s"(examplar).*(back)"doesn't work";
print $1;
$1 is not assigned, because the regular expression does not match. More importantly, Perl does not tell you that it failed to assign $1. This last example demonstrates two points about regular expressions:
1) Regular expressions are "all or nothing." Just because the string back matches part of the pattern in:
'This is an example of backreferences'
does not mean that the whole expression matches. Because examplar is not in the string, the substitution fails.
2) If the regular expression fails, the backreferences are not assigned, so there is no telling what will be printed. This drives people crazy when they track down logic problems, and it is a common Perl gotcha. $1 is just a regular variable, and (contrary to what the Perl syntax might suggest) backreferences are not reset to "blank" when a regular expression fails. Some people consider this a bug; others consider it a feature. Either way, it becomes very obvious when you analyze the following code.
1 $a = 'bedbugs bite';
2 $a =~ m"(bedbug)"; # sets $1 to be bedbug.
3
4 $b = 'this is nasty';
5 $b =~ m"(nasti)"; # does NOT set $1 (nasti is not in 'this is nasty').
6 # BUT $1 is still set to bedbug!
7 print $1; # prints 'bedbug'.
In this case, $1 is the string bedbug because the match on line 5 failed! If you expected to get nasti, well, that is your problem. This Perl-ish behavior can catch people off guard. Consider yourself warned.
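The stale backreference described above can be reproduced with a short sketch. The variable names are illustrative, chosen to avoid the special sort variables $a and $b.

```perl
use strict;
use warnings;

my $first  = 'bedbugs bite';
my $second = 'this is nasty';

$first =~ m"(bedbug)";                  # succeeds: $1 is now 'bedbug'
my $ok = ($second =~ m"(nasti)");       # fails: $1 is left untouched

my $stale = $1;                         # still holds the earlier capture
print "second match ", ($ok ? "succeeded" : "failed"), ", \$1 is '$stale'\n";
```

Checking the result status ($ok here) is the only reliable way to know whether $1 belongs to the match you just attempted.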
3. Common constructs using backreferences
To avoid this common pitfall (you expect a match, do not get one, and end up with the previous match instead), always assign backreferences to variables using one of the following three constructs:
1) Short-circuit. Test for the match, and assign only if it succeeds, using '&&'. For example:
($scalarName =~ m"(nasti)") && ($matched = $1);
2) if clause. Put the match in an if clause, and assign the backreference only if the if clause is true.
if ($scalarName =~ m"(nasti)") { $matched = $1; }
else { print "$scalarName didn't match"; }
3) Direct assignment. Because a regular expression in list context returns its captures, you can always assign them directly:
($match1, $match2) = ($scalarName =~ m"(regexp1).*(regexp2)");
All of your matching code should look like one of these three forms. Without them, you are coding with no safety net. If you never want to see this type of bug, these forms will save you a lot of time.
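The three constructs can be sketched side by side. The input string and the patterns are hypothetical, chosen so that some matches succeed and some fail.

```perl
use strict;
use warnings;

my $scalarName = 'this is nasty';   # hypothetical input

# 1) Short-circuit: the assignment runs only if the match succeeds.
my $matched1;
($scalarName =~ m"(nasti)") && ($matched1 = $1);

# 2) if clause: the same guard, written as a statement.
my $matched2;
if ($scalarName =~ m"(nast.)") { $matched2 = $1; }

# 3) Direct assignment: a failed match returns an empty list,
#    so the variables simply stay undefined.
my ($match1, $match2) = ($scalarName =~ m"(this).*(nasty)");
my ($miss1,  $miss2)  = ($scalarName =~ m"(that).*(nasty)");

print "matched1 is ", (defined $matched1 ? $matched1 : 'undefined'), "\n";
print "matched2 is $matched2\n";
print "match1 is $match1, match2 is $match2\n";
print "miss1 is ", (defined $miss1 ? $miss1 : 'undefined'), "\n";
```

In all three forms a failed match leaves the target variable undefined instead of silently picking up an earlier capture.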
4. Using backreferences within regular expressions
When you use the s"" or m"" operator to match complex patterns, Perl provides a useful feature you should know about: backreferences can be used within the regular expression itself. In other words, once a set of characters has been captured with parentheses, you can refer to that capture before the regular expression has finished. To use a backreference in the second part of s"" (the replacement), use the syntax $1, $2, and so on. To use a backreference in the first part of s"", or in m"", use the syntax \1, \2. Here are some examples:
$string = 'far out';
$string =~ s"(far) (out)"$2 $1"; # this makes string 'out far'.
In this example, we simply turn far out into out far.
$string = 'sample examples';
if ($string =~ m"(amp..) ex\1") { print "MATCHES!\n"; }
This example is a bit more complicated. The first group, (amp..), matches the string ample. The whole pattern therefore becomes the string ample example, where the second ample corresponds to \1. So the pattern matches inside sample examples.
Below is a more complex example in the same style:
$string = 'bballball';
$string =~ s"(b)\1(a..)\1\2"$1$2";
Let us look at this example in detail. It works, but the reason is not obvious. There are five steps to matching this string:
1) The first b, in parentheses, matches the beginning of the string and is stored in \1 and $1.
2) \1 then matches the second b in the string, because the second character happens to be b.
3) (a..) matches the string all, which is stored in \2 and $2.
4) \1 matches the next b.
5) Because \2 equals all, it matches the next and last three characters (all).
Put together, the regular expression matches bballball, the entire string. Since $1 equals 'b' and $2 equals all, the entire expression:
$string = 'bballball';
$string =~ s"(b)\1(a..)\1\2"$1$2";
(in this example) translates into the following code:
$string =~ s"(b)b(all)ball"ball";
or, in plain English, replaces bballball with ball.
The regular expression looks like Figure 9-5.
Backreferences in s"" can get complicated. If you understood the last example, you are well on your way to understanding Perl's regular expressions. Backreferences can get even worse, though.
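The three examples in this subsection can be run together as a quick check. This is a sketch only: it uses / delimiters instead of the quote delimiters used in the text, which does not change the meaning of the patterns, and the variable names are illustrative.

```perl
use strict;
use warnings;

# $1 and $2 in the replacement refer to the captured groups.
my $swap = 'far out';
$swap =~ s/(far) (out)/$2 $1/;

# \1 inside the pattern must repeat whatever group 1 captured.
my $echo;
if ('sample examples' =~ m/(amp..) ex\1/) { $echo = $1; }

# Both styles at once: \1 and \2 in the pattern, $1 and $2 in the replacement.
my $ball = 'bballball';
$ball =~ s/(b)\1(a..)\1\2/$1$2/;

print "swap: $swap\n";
print "echo: $echo\n";
print "ball: $ball\n";
```

Changing a single character of 'bballball' makes the last substitution fail, which is a convenient way to convince yourself that every \1 and \2 really is being checked.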
5. Nested backreferences
Nested backreferences are useful for strings that are too complicated to describe as a single sequence (one string following another). Take the following expression:
m"((aaa)*)";
This uses * to match multiples of aaa: it matches '', aaa, aaaaaa, aaaaaaaaa. In other words, Perl matches runs made up of groups of three a's. It will not match aa as a whole (against aa it can match only the empty string). Now suppose you want to match the following string:
$string = 'softly slowly surely subtly';
You can do so with nested parentheses, as in the following regular expression:
$string =~ m"((s...ly\s*)*)"; # note nested parens.
In this example, the outermost parentheses capture the whole string: softly slowly surely subtly. The innermost parentheses capture one word at a time, each beginning with s and ending with ly, followed by optional spaces. So the regular expression first captures softly and throws it away, then captures slowly and throws it away, then captures surely, and finally captures subtly. This raises a question: in what order are the backreferences numbered? It is easy to get confused here. Do the outer parentheses come first, or the inner ones? The simplest answer is to remember the following three rules:
1) The earlier a backreference appears in the expression, the lower its backreference number. For example:
$var =~ m"(a)(b)";
In this example, the backreference (a) becomes $1 and (b) becomes $2.
2) The more a backreference contains, the lower its backreference number. For example:
$var =~ m"(c(a(b)*)*)";
In this example, the backreference that contains everything, (c(a(b)*)*), becomes $1. The expression nested inside it, (a(b)*), becomes $2. Nested inside that, (b) becomes $3.
3) Where the two rules conflict, rule 1 takes priority. In the statement $var =~ m"(a)(b(c))", (a) becomes $1, (b(c)) becomes $2, and (c) becomes $3.
Thus, in the earlier example, ((s...ly\s*)*) becomes $1 and (s...ly\s*) becomes $2.
Note that there is another problem here. Let us return to the complex regular expression from the beginning of this discussion:
$string = 'softly slowly surely subtly';
$string =~ m"((s...ly\s*)*)"; # note nested parens.
What does (s...ly\s*)* match here? It matches several strings in turn: first softly, then slowly, then surely, and finally subtly. Since (s...ly\s*)* matches several strings, Perl throws away the earlier matches and sets $2 to subtly.
Even with these rules, nested parentheses can still cause confusion. The best thing to do is practice. Try different combinations of these constructs and hand them to the Perl interpreter. That lets you see the order in which the Perl interpreter assigns backreferences.
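One way to practice, as suggested above, is to evaluate each pattern in list context and look at the captures in order. A minimal sketch follows; the test strings 'abc' and 'cabab' are hypothetical inputs chosen to exercise the patterns from the rules.

```perl
use strict;
use warnings;

# In list context a match returns ($1, $2, $3, ...).
my @soft = ('softly slowly surely subtly' =~ m/((s...ly\s*)*)/);
my @abc  = ('abc'   =~ m/(a)(b(c))/);
my @cab  = ('cabab' =~ m/(c(a(b)*)*)/);

print "soft: [$soft[0]] [$soft[1]]\n";
print "abc:  [$abc[0]] [$abc[1]] [$abc[2]]\n";
print "cab:  [$cab[0]] [$cab[1]] [$cab[2]]\n";
```

Groups are numbered by the position of their opening parenthesis, and a group inside a repetition keeps only what it captured on its last pass.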
9.3.6 Principle 6
The heart of the power of regular expressions lies in wildcards and multiple-match operators. A wildcard lets you match more than one kind of character at a position in a string. (If you are dealing with binary data, a wildcard matches a range of characters.) A multiple-match operator lets you match zero, one, or many characters. The examples so far have been instructive for explaining the basics of Perl, but they are not very powerful. In fact, you could probably write any of them as a C subroutine. The power of Perl's regular expression set comes from its ability to match many patterns of text, that is, to describe many different patterns of data through the logical "shorthand" mentioned above. Perl simply provides the best shorthand around.
1. Wildcards
A wildcard represents a class of characters. Suppose you have the following strings but do not know whether they are capitalized:
· Kumquat
· Kristina
· Kentucky
· Key
· Keeping
In this case, the following Perl expression matches the first character of each word:
[Kk]
This is an example of a character class. Every wildcard in Perl can be expressed by writing brackets ([ ]) and placing the characters you want to match inside them. The wildcard above tells the regular expression engine, "OK, I am looking for either a K or a k here. If you find either one, match it." Here is another example of using wildcards:
$scalarName = 'this has a digit (1) in it';
$scalarName =~ m"[0-9]"; # this matches any character between 0 and 9, that is, matches any digit.
$scalarName = 'this has a capital letter (A) in it';
$scalarName =~ m"[A-Z]"; # this matches any capital letter.
$scalarName = "this does not match, since the letter after the string 'an' is an a";
$scalarName =~ m"an [^a]";
The first two examples are fairly straightforward: [0-9] matches the digit 1 in this has a digit (1) in it, and [A-Z] matches the capital letter A in this has a capital letter (A) in it. The last example is a little trickier: only one an in the string is followed by a space, so the only characters that could possibly match are the last four, namely an a.
However, by asking for the pattern an [^a] we have explicitly told the regular expression to match an a, then an n, then a space, and finally a character that is not a. Hence there is no match in this example. If the given string were match an a not an e, the match would succeed, because the first an would be skipped and the second one matched! Consider one more example:
$scalarName = "This has a tab (\t) or a newline in it so it matches";
$scalarName =~ m"[\t\n]"; # matches either a tab or a newline.
# matches since the tab is present.
This example shows some interesting things you can do with matching and wildcards. First, the same characters you can interpolate in a "" string can also be used in a regular expression and inside the character class denoted by brackets ([\t\n]). Here "\t" matches a tab and "\n" matches a newline.
Second, if you place a ^ at the start of [], the wildcard matches the characters that are not in the group. Likewise, if you place a - inside [], you can match a given range (in these examples, all digits, [0-9], and all capital letters, [A-Z]). These operators can be combined to give quite specific wildcards:
$a =~ m"[a-fh-z]"; # matches any lowercase letter *except* g.
$a =~ m"[^0-9a-zA-Z]"; # matches any nonword character (i.e., not
# a character in 0-9, a-z or A-Z).
$a =~ m"[0-9^A-Za-z]"; # a mistake, does not
# equal the above. Instead matches 0-9, a literal ^, A-Z, or a-z.
$a =~ m"[\t\n ]"; # matches a space character: tab, newline or blank.
The important point is the third example: the caret in [0-9^A-Za-z] is a literal caret rather than a negation, because it appears in the middle of the character class. So if you want a negated character class, always put the caret at the very start of the []. And do not forget the [] themselves: without them, what you get is a literal text string instead of a character class.
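The character class rules above can be checked with a short sketch. The one-character test strings are hypothetical, picked to show where the negated class and the mistaken class differ.

```perl
use strict;
use warnings;

# A range with a gap: every lowercase letter except g.
my $g_in_gap = ('g' =~ m/[a-fh-z]/) ? 1 : 0;

# Caret first: negation. 'x' is alphanumeric, so it is rejected.
my $x_negated = ('x' =~ m/[^0-9a-zA-Z]/) ? 1 : 0;

# Caret in the middle: a literal ^. 'x' and '^' are both accepted.
my $x_literal     = ('x' =~ m/[0-9^A-Za-z]/) ? 1 : 0;
my $caret_literal = ('^' =~ m/[0-9^A-Za-z]/) ? 1 : 0;

# The first 'an ' is followed by an a, so the second one is matched.
my ($an) = ('match an a not an e' =~ m/(an [^a])/);

print "g: $g_in_gap, x negated: $x_negated, ";
print "x literal: $x_literal, ^ literal: $caret_literal\n";
print "matched: '$an'\n";
```

Moving the caret one position changes the class from "anything but these" to "these plus a caret", which is why its placement matters.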
(1) Common wildcards
As it happens, some wildcards are very common; you probably do not want to type [0-9] every time you want to match a digit. For those cases, Perl has several convenient shortcuts that make programming easier. Here are the shortcuts, what they mean, and the character classes they correspond to:
· \d - matches a digit (character class [0-9]).
· \D - matches a nondigit (character class [^0-9]).
· \w - matches a word character (character class [a-zA-Z0-9_]) (note that the underscore counts as a word character).
· \W - matches a nonword character (character class [^a-zA-Z0-9_]).
· \s - matches a space character (character class [ \t\n]) (tab, newline, space).
· \S - matches a nonspace character (character class [^ \t\n]).
· . - matches any character except a newline (character class [^\n]). In some circumstances, such as m"(.*)"s, it matches any character at all. See the modifiers later in this chapter.
· $ - although not a wildcard (it does not match any specific character), it is a widely used special character; placed at the end of a regular expression, it matches "end of line." Zero-width assertion.
· ^ - although not a wildcard either, it is a special character that matches "beginning of line" when placed at the start of a regular expression. Zero-width assertion.
· \b, \B - like $ and ^, these match no characters; they match a word boundary (\b) or the absence of a word boundary (\B). Zero-width assertion.
The first thing to note from this list is the "dot" wildcard (.). It is often used with a multiple-match operator to act as filler between items. Consider the following:
$a = 'Now is the time for all good men to come to the aid of their party';
$a =~ m"(Now).*(party)"; # matches, since '.' matches any
# character except newline
# and '*' means match zero or more characters.
Here .* captures all the characters between Now and party, and the match succeeds. ("All" in this context means "zero or more, as many as possible." This is called greediness; we discuss it when we get to multiple-match operators.)
Here are some other examples of wildcards. Note that we use a single-quoted string on the left side of =~ (this is a simple way to test expressions).
1 '1956.23' =~ m"(\d+)\.(\d+)"; # $1 = 1956, $2 = 23
2 '333e+12' =~ m"(\D+)"; # $1 = 'e+'
3 '$hash{$value}' =~ m"\$(\w+)\{\$(\w+)\}"; # $1 = 'hash', $2 = 'value'
4 '$hash{$value}' =~ m"(\$)(\w+)\{(\W*)(\w+)(\W*)\}"; # $1 = '$', $2 = 'hash',
# $3 = '$', $4 = 'value', $5 = ''
5 'VARIABLE = VALUE' =~ m"(\w+)(\s*)=(\s*)(\w+)"; # $1 = 'VARIABLE', $2 = ' ',
# $3 = ' ', $4 = 'VALUE'
6 'catch as catch can' =~ m"^(.*)can$"; # $1 = 'catch as catch '
7 'can as catch catch' =~ m"^can(.*)$"; # $1 = ' as catch catch'
8 'word_with_underlines word2' =~ m"\b(\w+)\b"; # $1 = word_with_underlines
In each example we use a different wildcard, with * meaning "zero or more of the wildcard in a row" and + meaning "one or more of the wildcard in a row." Some of these examples are useful in their own right: example 5 shows how to use \s* to make an expression robust against stray whitespace; example 8 shows a general way to match a word; example 4 shows a general way to match a hash element together with its key.
Note in particular, however, that example 1 is not a general way to match a Perl number. Given all the number formats Perl supports, that is a very hard problem, and we return to it later as an exercise. One more point about the list above: some entries are marked "zero-width assertion," and we explain these below.
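Several of the numbered examples above can be verified by evaluating them in list context. This sketch uses / delimiters and escapes the literal braces, which newer versions of Perl require; the array names are illustrative only.

```perl
use strict;
use warnings;

# List context returns the captures, so each array holds ($1, $2, ...).
my @number = ('1956.23' =~ m/(\d+)\.(\d+)/);
my @nondig = ('333e+12' =~ m/(\D+)/);
my @hash   = ('$hash{$value}' =~ m/\$(\w+)\{\$(\w+)\}/);
my @assign = ('VARIABLE = VALUE' =~ m/(\w+)(\s*)=(\s*)(\w+)/);
my @tail   = ('catch as catch can' =~ m/^(.*)can$/);
my @word   = ('word_with_underlines word2' =~ m/\b(\w+)\b/);

print "number: @number\n";
print "nondigit: @nondig\n";
print "hash: @hash\n";
print "assign: [$assign[0]] [$assign[1]] [$assign[2]] [$assign[3]]\n";
print "tail: [$tail[0]]\n";
print "word: @word\n";
```

Printing captures inside brackets, as in the last lines, makes stray leading or trailing spaces visible.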
(2) Zero-width assertions and positive-width assertions
The characters in Table 9-2 are what you might call positive-width assertions:
Table 9-2 Positive-width assertions
\D  nondigit
\d  digit
\w  word character
\W  nonword character
\s  space
\S  nonspace
.   any character other than newline
These assertions actually match a character in the string. Positive width means that a character is matched and that the regular expression engine "eats" it in the course of matching. The zero-width assertions are listed in Table 9-3.
These assertions do not match a character; they match a condition. In other words, ^cat matches a string that starts with cat, but the ^ itself does not match any particular character. Consider the expressions below:
$ziggurautString = 'this matches the word zigguraut';
$ziggurautString =~ m"\bzigguraut\b";
$ziggurautString =~ m"\Wzigguraut\W";
The first example matches because it looks for zigguraut between two word boundaries, and the string satisfies this condition.
The second example does not match. Why? Because the \W at the end is a positive-width assertion and must therefore match a character. But the end of the line is not a character; it is a condition. This is an important difference.
Furthermore, even when a match occurs, the regular expression engine consumes the characters involved. So if you enter the following code:
$ziggurautString = "this matches the word zigguraut now";
$ziggurautString =~ s"\Wzigguraut\W""g;
the result is this matches the wordnow. The reason is that the word and the spaces around it were all replaced. Thus:
· Zero-width assertions, such as \b and \B, can match where there is no character. They never consume anything in a match.
Here are other examples of matching with wildcards:
$example = '111119';
$example =~ m"\d\d\d"; # match the first three digits it can find in the string. Matches '111'.
$example = 'This is a set of words and not of numbers';
$example =~ m"of (\w\w\w\w\w)"; # matches 'of words'. Creates a backreference.
Note the last example. Because the first of in the string comes in front of words, the pattern matches that particular of and never gets to the later one (the one in front of numbers). The final example also illustrates the problem we discuss next: if you want to match five word characters, you must type \w five times, which is tedious. To make it easier to match patterns of a given length, Perl provides multiple-match operators, which we turn to now.
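The difference between zero-width and positive-width assertions described in this subsection can be sketched as follows. The variable names are illustrative; the strings are the ones used in the text.

```perl
use strict;
use warnings;

my $text = 'this matches the word zigguraut';

# \b is zero-width: the end of the string counts as a boundary.
my $boundary = ($text =~ m/\bzigguraut\b/) ? 1 : 0;

# \W is positive-width: it needs a real character after the word.
my $nonword = ($text =~ m/\Wzigguraut\W/) ? 1 : 0;

# Positive-width assertions consume what they match...
(my $eaten = 'this matches the word zigguraut now') =~ s/\Wzigguraut\W//g;

# ...while zero-width assertions leave the surrounding characters alone.
(my $kept = 'this matches the word zigguraut now') =~ s/\bzigguraut\b//g;

print "boundary: $boundary, nonword: $nonword\n";
print "eaten: [$eaten]\n";
print "kept:  [$kept]\n";
```

With \b the two spaces around the removed word survive, whereas with \W they disappear along with it.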
2. Multiple-match operators
There are six multiple-match operators in Perl. They are mainly used to avoid writing repetitive code, such as typing \w five times in a row in the previous section. You can regard them as shortcuts.
The six multiple-match operators in Perl are:
· * - Match zero, one, or more times.
· + - Match one or more times.
· ? - Match zero times or once.
· {X} - Match exactly X times.
· {X,} - Match X or more times.
· {X,Y} - Match X to Y times.
The following two matches are equivalent, but which is easier to read?
$example = 'This is a set of words and not of numbers';
$example =~ m"of (\w\w\w\w\w)"; # matches 'of words'.
$example =~ m"of (\w{5})"; # usage of the {X} form. Matches 5 characters,
# and backreference $1 becomes the string 'words'.
You will probably find the second example easier to read. It uses a multiple-match operator to avoid repetitive, annoying code. Multiple-match operators can also match an indeterminate number of characters. The regular expression a* matches '', a, aa, or aaa, or any number of a's; that is, it matches zero or more a's. Similarly, + matches one or more. For example:
$example = 'this matches a set of words and not of numbers';
$example =~ m"of (\w+)";
matches the string words (here 'of (\w+)' equals 'of words'), and:
$example =~ m"of (\w{2,3})"; # usage of {X,Y}. Matches the string 'wor'
# (the first three letters of the first match it finds).
matches the string 'wor' (here 'of (\w{2,3})' equals 'of wor').
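The three forms can be compared side by side. This is only a sketch: the variable names are ours, and it relies on the fact that a match in list context returns its captured groups.

```perl
use strict;
use warnings;

my $example = 'this is a set of words and not of numbers';

# A match in list context returns the captured groups.
my ($five)  = $example =~ m"of (\w{5})";     # exactly five word characters
my ($plus)  = $example =~ m"of (\w+)";       # one or more word characters
my ($range) = $example =~ m"of (\w{2,3})";   # two or three, as many as it can get

print "$five / $plus / $range\n";            # words / words / wor
```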
Counterintuitively, the following m"" clause:
$example = 'this matches a set of words and not of numbers';
$example =~ m"of (\d*)";
matches the string, even though we are looking for digits with \d*. Why? Because \d* means zero or more digits, so the expression matches zero digits! However:
$example =~ m"of (\d+)";
does not match the same string, because the expression uses \d+ instead of \d*. It looks for one or more digits right after the word of and a space, and the string has none.
3. Greed
So far, all of the examples above have one point in common: the regular expression engine matches as much of the given string as the expression allows. That is, the multiple-match operators are greedy.
What does "greedy" mean here? It means that, by default, Perl's multiple-match operators grab the maximum number of characters in a string that still allows the pattern to match. You should master this: understanding the greedy nature of Perl expressions saves a lot of time otherwise spent tracking down quirky regular expression behavior.
Here are a few simple examples of greed, whose behavior can surprise programmers. Let us start with the following statement:
$example = 'This is the best example of the greedy pattern match in Perl5';
Suppose we want to match is in this example. Accordingly, we write the following code:
$example =~ m#This (.*) the#;
print $1; # This does not print out the string 'is'!
You might expect $1 to be is. But what you get is the string below:
'is the best example of'
The process is shown in Figure 9-6.
The reason for this result is the greed of the multiple-match operator *. The * operator grabs all characters up to the last occurrence of the string the (the one before greedy). So if you are not careful, a regular expression with wildcards can give unexpected results.
Here are more examples:
$example = 'sam I am';
$example =~ m"(.*)am"; # matches the string 'sam I '
$example = 'record: 1 value: a value2: b';
$example =~ m"record: (.*) value"; # matches '1 value: a'
$example = 'record';
$example =~ m"\w{2,3}"; # matches rec
The last example shows that even the numeric multiple-match operators are greedy. Although two word characters in record would be enough, Perl prefers to match three because it can. If you enter 're' =~ m"\w{2,3}", only two characters are matched because that is the maximum available.
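The greedy captures in this subsection can be checked with a short script. It is a sketch only; the variable names are ours, and the {2,3} patterns are wrapped in parentheses so that the matched text can be inspected.

```perl
use strict;
use warnings;

my $example = 'This is the best example of the greedy pattern match in Perl5';

# (.*) runs on to the LAST ' the' in the string.
my ($greedy) = $example =~ m"This (.*) the";

my ($sam)    = 'sam I am' =~ m"(.*)am";
my ($record) = 'record: 1 value: a value2: b' =~ m"record: (.*) value";
my ($three)  = 'record' =~ m"(\w{2,3})";   # takes three because it can
my ($two)    = 're' =~ m"(\w{2,3})";       # only two are available

print "'$greedy'\n'$sam'\n'$record'\n'$three'\n'$two'\n";
```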
4. Backtracking and multiple wildcards
We have been building up to this for a while; now it is time for a tricky topic. As mentioned earlier, the combination of wildcards and backtracking can make regular expressions extremely slow. If you understand why, that is a good sign that you "get" regular expressions.
See the following example:
$string =~ m"has(.*)multiple(.*)wildcards";
This means that the regular expression will look for (in numerical order):
1) The string has.
2) The maximum text that it can find before reaching the last multiple (the first (.*)).
3) The string multiple.
4) The maximum text that it can find before the last wildcards (the second (.*)).
5) The string wildcards.
Then consider what happens when this pattern is applied to the following string:
has multiple wildcards multiple WILDCARDS
Here is everything that happens:
1) Perl matches has (the first element of m"has(.*)multiple(.*)wildcards").
2) Perl runs the first (.*) and eats all the characters it can find until the last multiple is encountered. The first (.*) now holds ' multiple wildcards '.
3) Perl matches the string multiple (the second one in the string).
4) Perl tries to find the string wildcards in the rest of the string, ' WILDCARDS', and fails:
WILDCARDS does not match 'wildcards'!
5) What now? Because there are multiple wildcards (*), Perl backtracks. The place where the regular expression may have gone wrong is step 2, when it ate ' multiple wildcards '. So it goes back to just after has:
has multiple wildcards multiple WILDCARDS
   ^ goes back here
6) Now it tries to correct the error by grabbing only the characters before the next-to-last multiple. The first (.*) therefore matches just the space after has.
7) The multiple in the pattern then matches the first multiple in the string.
8) Next, the second (.*) matches the space that follows.
9) Finally, wildcards in the pattern matches the lowercase wildcards in the string.
Therefore, the entire regular expression matches has multiple wildcards. It gives the expected result, but it certainly takes a detour to get there. Admittedly, Perl implements shortcuts to improve performance, but the logic given here is basically correct. I do not hesitate to call this example the most important one in this chapter. If you do not understand how Perl gets to its answer, it will come back to annoy you, so review the example over and over until you fully understand the result. After that, try to trace how Perl matches the following code:
$pattern = "afbgchdjafbgche";
$pattern =~ m"a(.*)b(.*)c(.*)d";
We lay the steps out as follows (the note after each line says what the engine is doing at that step):
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - matches 'a'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - greedy, goes to last 'b'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - matches 'b'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - matches 'g'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - matches 'c'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - backtrack because no 'd'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - now we take up everything to the next-to-last 'b'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - matches 'b'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - now the second .* becomes greedy
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - matches 'c'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - still no 'd'. Darn. Backtracks
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - wildcard becomes less greedy, gobbles to next-to-last 'c'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - matches 'c'
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - there is only one 'd' in the string, and the third .* matches up to it
afbgchdjafbgche (m"a(.*)b(.*)c(.*)d";) - a match!
Matches 'afbgchd'.
Not very pretty. As you can see, if you have several greedy multiple-match operators, things can get worse and efficiency drops. Perhaps the simplest lesson to take from this example is that the leftmost multiple-match operator takes precedence over the ones to its right. The following pattern:
m"(.*)(.*)";
applied to a string without a newline, always makes the first backreference contain the entire string and the second contain nothing. To track down errors like this, it is best to use the -Dr command-line option, as in perl -Dr script.p. We will discuss this in Chapter 21. Also, what if you do not want greed? Well, as we will see next, Perl (unlike some other packages) can handle non-greedy forms of these operators.
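The end results of the backtracking walk-throughs in this subsection can be confirmed with a small script. This sketch shows only the final captures, not the intermediate steps; the variable names are ours, and the mixed-case test string is an assumption based on the remark that WILDCARDS does not match 'wildcards'.

```perl
use strict;
use warnings;

my $string = 'has multiple wildcards multiple WILDCARDS';
my @wild   = $string =~ m"has(.*)multiple(.*)wildcards";
my $whole  = $&;    # text matched by the last successful match

my @parts  = 'afbgchdjafbgche' =~ m"a(.*)b(.*)c(.*)d";
my $traced = $&;

# The leftmost (.*) wins: it takes everything, the second gets nothing.
my @split  = 'no newline here' =~ m"(.*)(.*)";

print "[$whole] [$traced] [$split[0]] [$split[1]]\n";
```

Both wildcards in the first pattern end up holding a single space, and the three wildcards in the second hold f, g and h.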
5. Non-greedy multiple-match operators
Greed may be a blessing, but it is often a curse! A commonly used example is matching C comments (it is a regular FAQ and is covered in the documentation). Assume the following text:
/* this is a comment */ /* another comment */
First we try to find a greedy solution. We want to match all text from /* up to and including */. If we try the following code:
m"/\*.*\*/";
then it will match:
/* this is a comment */ /* another comment */
That is, it matches the whole string, because it is a greedy form.
The following is the best greedy solution that the author can provide:
$commentMatcher =~ m"/\*([^*]*|\**[^/*])*\*/"
It is not the most readable of solutions. (We can use m""x to make it easier to read; more on that later.) We will review it later, because this particular expression is very helpful for mastering regular expressions in general! Now let's take a look at the non-greedy form. For non-greedy forms, there is a simple rule to remember: add a ? after a greedy multiple-match operator to make it non-greedy.
Thus, the previous comment matcher becomes:
$commentMatcher =~ m"/\*(.*?)\*/";
This is still not the easiest thing to read, but it is definitely better than the previous one! We can describe this statement simply as: "Take /*, then take the minimum number of characters, then take the closing */." The process is shown in Figure 9-7.
"Lazy" is another term for non-greedy. You can picture the regular expression engine ambling forward only as far as the first point at which the expression matches. In this example, as soon as it encounters */ it moves on to the next step. If you enter the following code:
$line =~ m"(.*?)(.*?)";
each (.*?) matches nothing at all. Why? The minimum number of characters that can match here is zero (because you used *, which means zero or more). So, asked to match as little as possible, each (.*?) does its job by taking zero characters and lazily hands control to the next (.*?), which also takes zero characters. Below are the general rules for the lazy operators. Put any of them after a character or character class (e.g., \d or [123]) to get lazy behavior:
· *? - Match zero, one, or more times, but as few times as possible.
· +? - Match one or more times, but as few times as possible.
· ?? - Match zero times or once, preferring zero.
· {X}? - Match exactly X times.
· {X,}? - Match X or more times, but as few times as possible.
· {X,Y}? - Match X to Y times, but as few times as possible.
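To see the greedy and lazy forms next to each other, here is a small sketch using the comment text from this subsection. The variable names are ours, and the g modifier used on the third line is covered later in the chapter.

```perl
use strict;
use warnings;

my $code = '/* this is a comment */ /* another comment */';

my ($greedy) = $code =~ m"/\*(.*)\*/";     # runs on to the last */
my ($lazy)   = $code =~ m"/\*(.*?)\*/";    # stops at the first */
my @comments = $code =~ m"/\*(.*?)\*/"g;   # one entry per comment

# Two lazy wildcards with nothing else to satisfy both settle for nothing.
my @nothing  = 'some text' =~ m"(.*?)(.*?)";

print "greedy: '$greedy'\n";
print "lazy:   '$lazy'\n";
print "count:  ", scalar(@comments), "\n";
```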
Here are some examples of minimal matching, compared with what the greedy form would match:
$example = 'This is the time for all good men to come to the aid of their party';
$example =~ m"This (.*?) the";
This example captures 'is'. If it were greedy, it would capture 'is the time for all good men to come to the aid of', because the last ' the' in the string is the beginning of ' their'.
$example = '19992113333333333333331';
if ($example =~ m"1(\d{3,}?)")
{
}
This expression means a 1 followed by three (or more) digits. However, because of the trailing '?', it takes no more than the three digits it needs. Therefore, it matches '1999'.
$example = '1f9991333333333333333331';
if ($example =~ m"1(\d{3,}?)")
{
}
Here, we have the same expression, but the first 1 that the pattern finds does not satisfy the condition, because it is followed by 'f'. So the match moves on to the next 1, and matches 1333.
$example = '1f9991333333333333333331';
if ($example =~ m"1(\d{3,}?)1")
{
}
This example differs from the previous one: we have required a 1 after the \d{3,}?. Thus, although the match is lazy, it has to run all the way to the end of the string to find a match. As you can see, you must be very careful about the logic of regular expressions. There are plenty of surprises for the uninitiated, and plenty of ways to go wrong for those who do not know what they are doing.
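The three lazy matches above can be verified as follows. This is a sketch: the whole pattern is wrapped in parentheses so that the complete matched text, not just the digits, is captured, and the variable names are ours.

```perl
use strict;
use warnings;

my $plain  = '19992113333333333333331';
my $with_f = '1f9991333333333333333331';

my ($first)  = $plain  =~ m"(1\d{3,}?)";    # lazy: stops after three digits
my ($second) = $with_f =~ m"(1\d{3,}?)";    # skips the 1 that is followed by 'f'
my ($third)  = $with_f =~ m"(1\d{3,}?1)";   # must keep going until a closing 1

print "$first\n$second\n$third\n";
```

The third match starts at the second 1 and runs to the end of the string, even though the quantifier is lazy.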
Mastering this principle of regular expressions is a big step forward. If you need to know more, go to the "Perl Debug" section, where we give more information about debugging regular expressions.
9.3.7 Principle 7
If you want to match more than one set of characters, Perl uses a technique called alternation.
Alternation
Alternation tells Perl to match one of two or more patterns. In other words, the expression:
(able|baker|charlie)
tells Perl "find the string able or baker or charlie" in the regular expression. As an example, start with the following statements:
$declaration = 'char string[80];' or $declaration = 'unsigned char string[80];'
It is convenient to match either the string char or unsigned char in one go. The following regular expression does this:
foreach $declaration ('char string[80]', 'unsigned char string[80]')
{
    if ($declaration =~ m"(unsigned char|char)")
    {
        print ":$1:"; # prints ':char:' first time around.
                      # prints ':unsigned char:' second time around.
    }
}
The | syntax means "match unsigned char or char" and saves whichever string matched in the backreference. Alternation can be subtle, because there is one important thing to know about its behavior:
· Alternation always tries to match the first item in the parentheses; if that fails, it tries the second, and so on.
This is the so-called leftmost match, and it explains many of the errors people run into when they first meet regular expressions. Suppose that in the previous example the order of the items in parentheses is changed, so that the example becomes:
$declaration = 'unsigned char string[80]';
$declaration =~ m"(char|unsigned char)";
Does this match the string unsigned char (as in unsigned char string[80])? In this particular case it still does: the engine tries every alternative at the earliest starting position before moving on, and at the start of the string char fails while unsigned char succeeds. The order matters when two alternatives can match at the same starting position. Against the string character, for instance, m"(char|character)" stops at char because char comes first in the list, and the wrong string ends up in the backreference. This mistake is quite common, so it is worth pointing out here:
· Always put the highest-priority string first, that is, the most specific string before the more general ones.
If you do not, you are in for a painful wait. For example:
$line =~ m"(.*|word)";
never matches word on its own. This is because word is just a special case of .* (four characters are still "zero or more characters"). At the same time, alternation works from the left, so it picks .* first, and thus:
$line = "wordstar";
$line =~ m"(.*|word)";
matches wordstar (the entire string) instead of word. The .* matches any run of characters, and because it comes first in the alternation, it is always preferred over word. To match the word in wordstar instead, write:
$line =~ m"(word|.*)"; # since 'word' is first.
Alternation is also very helpful if you do not know whether a word is followed by a delimiter or whether the word is plural. For example, given either of:
$line = 'words';
$line = 'word';
the following matches both:
$line =~ m"word(s|$)"sg; # word may be followed by the character 's' or by '$' (end of string).
This syntax matches the string word whether it is followed by the end of the string or by s. You can also replace $ with \b:
$line =~ m"word(s|\b)";
This gives a good way to handle plurals.
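The effect of ordering in an alternation can be checked directly. In this sketch the variable names and the extra test strings ('character', 'wor') are ours, added only for illustration.

```perl
use strict;
use warnings;

# When both alternatives can match at the same position, the first one listed wins.
my ($general_first)  = 'wordstar'  =~ m"(.*|word)";         # wordstar
my ($specific_first) = 'wordstar'  =~ m"(word|.*)";         # word
my ($same_start)     = 'character' =~ m"(char|character)";  # char

# The starting position still comes first: 'char' cannot match at the very
# start of this string, so 'unsigned char' is tried there and succeeds.
my ($decl) = 'unsigned char string[80]' =~ m"(char|unsigned char)";

# Singular or plural, using \b as the "nothing follows" alternative.
my @hits = grep { m"word(s|\b)" } ('word', 'words', 'wor');

print "$general_first $specific_first $same_start '$decl' @hits\n";
```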
9.3.8 Principle 8
Perl provides extensions to regular expressions with the (?..) syntax.
At one point in Perl's history (around the transition from Perl 4 to Perl 5), people decided that, if the regular expression set was to keep growing, Perl had reached its limit on metacharacters. Some argued that there were already too many metacharacters; others pointed out that there were not many unused characters left on the keyboard.
The good idea that came out of this was to create a single construct that could be used for many extensions. After a look at the keyboard, a fairly ordinary combination, (?, turned out not to be in use anywhere. So it was settled. The syntax looks like this:
(?
Here,
In addition, the operator (?#comment) allows you to embed a comment in a regular expression. However, since the arrival of the x modifier (discussed later), this operator is obsolete. In other respects, the extensions work like any other part of a regular expression: you simply put them in the pattern. If you enter:
$line =~ m"I love (?!oranges)";
then, since (?! forbids oranges from following the string I love, this line matches I love figs but not I love oranges. However, the expression still matches I love orange or I love ripe oranges, because it only forbids strings that start with oranges at that point. You can enter:
$line =~ m"I love (?!.*orange)";
to forbid these strings too.
In fact, (?! is probably the most easily misunderstood construct in Perl. People expect it to handle the following:
$line =~ m"(?!oranges)I love";
and have it refuse to match oranges I love. This obviously does not work. The (?!) construct only says that the text at that point must not be oranges. In this example, the only position it rules out is the very beginning of the string; at any other position it does nothing. The regular expression simply moves forward, finds that the characters immediately ahead are not oranges, and goes on to satisfy the rest of the pattern.
The other two most-used extensions are (?:...) and (?=...). The first, (?:...), makes the regular expression more efficient by getting rid of unnecessary backreferences. If you enter:
$line =~ m"(?:int|unsigned int|char)\s*(\w+)";
to get a variable name, you probably do not want to save the type of the variable, so you use (?:). The expression matches the given type, followed by whitespace, followed by the variable name (\w+). The variable name is saved in $1 rather than $2, because (?:) is ignored when backreferences are numbered. This saves time and memory, especially in large pattern matches.
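The behavior of (?!...) and (?:...) described above can be tried out with a few sample strings. This is a sketch; the sample sentences, the declaration 'unsigned int counter;' and the variable names are ours.

```perl
use strict;
use warnings;

my @lines = ('I love figs', 'I love oranges', 'I love ripe oranges');

# Forbids 'oranges' only immediately after 'I love '.
my @plain  = grep { m"I love (?!oranges)" } @lines;

# Forbids 'orange' anywhere later in the string.
my @strict = grep { m"I love (?!.*orange)" } @lines;

# The lookahead only rules out position 0, so this still matches further along.
my $surprise = ('oranges I love' =~ m"(?!oranges)I love") ? 1 : 0;

# (?:...) groups without capturing, so the variable name lands in $1.
my ($name) = 'unsigned int counter;' =~ m"(?:int|unsigned int|char)\s*(\w+)";

print scalar(@plain), " ", scalar(@strict), " $surprise $name\n";
```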
The other expression, (?=), is useful with the g modifier, which we look at more closely later. The g modifier lets you carry on matching from where the last match left off, instead of starting again from the beginning. For example, suppose you have data like the following:
Block1 block2
And readers want to match , secondly match
$line =~ m"block\d(.*?)(block\d|$)"g;
The first run will match , but after the second Block, the "matching pointer" is placed in the wrong location. If the input is:
$line =~ m"block\d(.*?)(?=block\d)"g;
The same number of text is matched due to the "minimum number of texts between blockl and block2". BLOCK2 is ignored for the next match, so that the next call regular expression can match
With the (?=) form, you can now write the following:
Block 1 Block2
while ($line =~ m"block\d(.*?)(?=block\d|$)"g)
{
    print "$1\n";
}
And printed first printed
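Here is a minimal, self-contained sketch of the (?=) loop. The sample data ('first text', 'second text') is made up for illustration, as are the variable names; only the pattern comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical sample data: two blocks, each followed by some text.
my $line = 'block1 first text block2 second text';

my @texts;
# (?=...) checks that the next block (or the end of the string) follows,
# but does not consume it, so the next iteration can start on 'block2'.
while ($line =~ m"block\d(.*?)(?=block\d|$)"g)
{
    push @texts, $1;
}

print "'$_'\n" for @texts;
```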
9.3.9 Summary of regular expression principles
The knowledge in the previous sections should be enough to let you use regular expressions. Although these are called "basic" regular expressions, we will see that, in different combinations, they make a powerful ally in the battle with data, which is often a war. The eight principles, stated once more, are:
· Principle 1: Regular expressions have three different forms: matching (m//), substitution (s///), and translation (tr///).
· Principle 2: Regular expressions match only on scalars ($scalar =~ m"a"; works; @array =~ m"a" treats @array as a scalar, and probably does not do what you want).
· Principle 3: A regular expression matches the earliest possible occurrence of a given pattern. By default, a regular expression substitutes only once ($a = 'string1 string2'; $a =~ s"string""; results in $a eq '1 string2').
· Principle 4: Regular expressions can handle any and all characters that double-quoted strings can ($a =~ m"$varb" interpolates $varb before matching; thus, with $varb = 'a' and $a = 'as', $a =~ s"$varb"" makes $a equal to s).
· Principle 5: A regular expression produces two things when it is evaluated: a result status and backreferences. $a =~ m"varb" tells you whether the substring varb occurs in $a; $a =~ s"(word1)(word2)"$2$1" "switches" the two words.
· Principle 6: The power of regular expressions lies in wildcards and multiple-match operators and in how they operate. $a =~ m"\w+" matches one or more word characters; $a =~ m"\d*" matches zero or more digits.
· Principle 7: If you want to match more than one set of characters, Perl uses a technique called alternation. If you enter m"(cat|dog)", it means "match the string cat or the string dog".
· Principle 8: Perl provides extensions to regular expressions with the (?..) syntax.
Wow! How do you learn all these principles? I suggest you start simple. If you learn that $a =~ m"error" finds the substring error in $a, you already have a capability that takes real work in a lower-level language such as C. After discussing two more major concepts, regular expression modifiers and context, we will give many practical examples.
9.3.10 Regular expression modifiers
All the regular expressions in the previous sections have the following form:
$a =~ m//; # m"" is a synonym, as is m{}
or:
$a =~ s///; # s""" is a synonym, as is s{}{}
Both are the default form of regular expressions: they start from the beginning of the string and match or substitute once.
Suppose that we do not want to match or substitute just once. Suppose we want to replace every a in the expression with b. In other words, suppose we do not want the default behavior. Fortunately, there are some helpful modifiers that we can add to regular expressions to make them do something other than their default action. Regular expressions with modifiers look like this:
$a =~ m//gismxo; $a =~ s///geimxo;
Appending one or more of these modifiers changes how Perl treats the expression. We first cover the modifiers (s, m, i, x, o) that work the same way for both operators, and then the modifiers (e, g) whose meaning differs between the two. Let's go through them.
1. Modifiers common to substitution and matching
The constructs m"" and s""" have many modifiers in common. They are listed in Table 9-6.
The five modifiers are described in detail below.
(1) x: extended, readable regular expressions
Regular expressions can get very messy, as we have seen above. But those are mild compared with some of the expressions met in real life. Consider the following code, which (roughly) matches a subroutine in Perl:
$line =~ m"sub\s(\w+)\s{(.*?)}\s*(?=sub)"s;
What does this code mean? Even if you are a Perl veteran, and even with a comment attached, this expression makes you think for a little while. The lack of whitespace is especially nasty, and the sheer number of special characters is dizzying. Perl provides the x modifier for this. The x modifier lets you put whitespace and comments in the regular expression to make it readable, which is a real blessing. The expression:
$line =~ m"sub\s(\w+)\s{(.*?)}\s*(?=sub)"s;
changes to:
$line =~ m{
    sub\s(\w+)\s    # matches the 'sub' keyword, subroutine name
                    # and the white space afterwards.
    {               # opening brace
    (.*?)           # matches the text of the sub, and saves it
                    # for further use.
    }               # closing brace
    \s*(?=sub)      # the next sub keyword
}sx;
It is still not easy. But although there are still plenty of headache-inducing special characters, you can now see the logical relationships between them quite clearly, and the expression reads more like an actual train of thought. The parentheses are in the right places, and since you can write comments, you can describe in detail what is being done.
However, note a few warnings. Since whitespace in the regular expression is filtered out, the following code does not match:
$line = "multi line string\nhere";
$line =~ m"multi line string"x; # this does not match the above because
                                # the spaces above get munged out.
Note also that the x readability feature applies only to the first pair of braces of the substitution operator (that is, the first part of s{}{}). This is because only the first part is a regular expression; the second part is interpolated like a double-quoted string, and everything in it is taken literally. For example, the following probably does not do what you want:
$line = 'aaaaaa'; # we want 'bbbbbb' after the substitution below.
$line =~ s{a
          }
          {
            b
          }gx; # we want to do a 'global' match, i.e., substitute
               # b's for all a's. Doesn't work!
print $line; # prints a newline, the indentation, 'b' and another newline,
             # six times over.
What happens here is that the whitespace in the second pair of braces is not thrown away. Instead, each instance of a is replaced with a newline, the indentation (three tabs in the original listing), a b, and another newline, resulting in a mess.
Readable regular expressions go a long way toward keeping you sane when you deal with more complicated patterns.
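As a smaller illustration of the x modifier, the sketch below matches a date. The date string and the variable names are made up for the example; the last two lines show the whitespace pitfall and the backslash-space that avoids it.

```perl
use strict;
use warnings;

my $text = 'released on 1998-07-04';    # hypothetical sample text

my ($year, $month, $day) = $text =~ m{
    (\d{4})     # year
    -
    (\d{2})     # month
    -
    (\d{2})     # day
}x;

# Under x, plain spaces in the pattern are ignored ...
my $plain_spaces   = ('multi line string' =~ m"multi line string"x)   ? 1 : 0;
# ... so a literal space has to be escaped with a backslash.
my $escaped_spaces = ('multi line string' =~ m"multi\ line\ string"x) ? 1 : 0;

print "$year/$month/$day $plain_spaces $escaped_spaces\n";
```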
(2) i: case-insensitive matching
Regular expressions are case sensitive by default. Using i makes the match case-insensitive.
$pattern = 'Exercise';
$pattern =~ s"exer"EXER"i; # matches the first four characters of 'Exercise' (Exer)
$pattern = 'edward peschko';
$ pattern ==???
Both substitutions succeed: the first turns Exercise into EXERcise, and the second turns edward peschko into edmund peschko.
The i modifier is a real time-saver compared with writing tedious regular expressions such as $pattern =~ m"[Ee][Xx][Ee][Rr]";
(3) s: treat the pattern as a single line
Without modifiers, the dot (.) matches any character except a newline. Sometimes this is helpful; sometimes it is very frustrating, especially if your data spans several lines. Consider the following example:
$line = '
BLOCK1:
END BLOCK
BLOCK2:
END BLOCK';
Now suppose you want to match the text between each BLOCK header and its END BLOCK:
$line =~ m{
    BLOCK(\d+)
    (.*?)
    END\ BLOCK    # note backslash; the space would be ignored otherwise
}x;
This does not succeed, because the wildcard (.) matches every character except a newline. The regular expression therefore gets stuck at the first newline it meets, and there is no match. In cases like this example, it is very helpful to let (.) match any character, newline included. This is what the s modifier does: it tells Perl to treat the string as a single line, so that (.) also matches newlines. The previous example matches once an s is added to the modifiers at the end of the expression.
$line =~ m{
    BLOCK(\d+)
    (.*?)
    END\ BLOCK    # note backslash; the space would be ignored otherwise
}sx;
With the s added at the end, the match now succeeds.
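The i and s modifiers can be seen at work in the following sketch. The block body ('first body') and the variable names are made up for illustration.

```perl
use strict;
use warnings;

# i: the lowercase pattern still matches the capitalized text.
my $pattern = 'Exercise';
(my $upper = $pattern) =~ s"exer"EXER"i;

# s: lets the dot match newlines as well.
my $line = "BLOCK1:\nfirst body\nEND BLOCK\n";
my $dot_stops = ($line =~ m{BLOCK(\d+)(.*?)END\ BLOCK}x) ? 1 : 0;    # no match
my ($number, $body) = $line =~ m{BLOCK(\d+)(.*?)END\ BLOCK}sx;       # matches

print "$upper $dot_stops $number\n";
```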
(4) m: treat the pattern as multiple lines
The m modifier is the opposite of the s modifier. In other words, it says: "Treat the string as several lines instead of one." Basically, this makes ^ and $ match not only at the beginning and end of the string: ^ also matches right after any newline, and $ also matches right before any newline. In the example below:
$line = 'a
b
c';
$ line = ~ m "????
the m modifier makes the backreference $1 become a rather than a\nb\nc.
(5) o: compile the regular expression only once
The o modifier is helpful when processing long expressions. Enter the following statement:
$ line = ~ m "
Here
$line =~ m"$regex"o;
you promise Perl that $regex will not change. If it does change, Perl will not notice. Thus,
$regex = 'b';
while ('bbbbb' =~ m"$regex"o) { $regex = 'c'; }
is in fact an infinite loop in Perl: $regex changes, but the change is not reflected in the regular expression. (This does not limit you to one and only one regexp, however. Each expression that carries an o is compiled once, the first time it is used.)
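Before moving on, here is a short sketch of the m modifier from item (4), using the three-line string from that item. The patterns and variable names are ours; they are one possible way to show how ^ and $ behave with and without m.

```perl
use strict;
use warnings;

my $line = "a\nb\nc";

# With m, ^ and $ match at the start and end of every line.
my ($first_line) = $line =~ m/^(.*)$/m;
my @line_starts  = $line =~ m/^(\w)/mg;

# Without m (and without s), $ only matches at the end of the string,
# which the dot cannot reach across the newlines.
my $no_m = ($line =~ m/^(.*)$/) ? 1 : 0;

print "$first_line @line_starts $no_m\n";
```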
2. Modifiers specific to substitution
The modifiers (s, m, x, i and o) work for both substitution and matching (s///, m//), but two modifiers are different. They are g and e, described below.
(1) g: replace every occurrence of the pattern
The s/// operator replaces only the first occurrence it finds. If you want to replace every occurrence with something else, use the g modifier. The following three examples are equivalent:
$pattern = 'num1 num2 num3';
$pattern =~ s"num"letter"g; # substitutes letter for num.
$pattern =~ s"NUM"letter"gi; # note - you can stack these modifiers.
                             # Does exactly the same thing as the above.
while ($pattern =~ s"num"letter") {}
All three lead to $pattern being letter1 letter2 letter3. The first does it directly, the second does it case-insensitively (the gi modifiers), and the third does it slowly, because each s"num"letter" replaces one occurrence: first letter1 num2 num3, then letter1 letter2 num3, and finally letter1 letter2 letter3.
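The equivalence of the three forms can be checked on separate copies of the string. In this sketch the copies and the $passes counter are ours, added so that each form starts from the same input.

```perl
use strict;
use warnings;

my $orig = 'num1 num2 num3';

(my $global  = $orig) =~ s"num"letter"g;     # all at once
(my $stacked = $orig) =~ s"NUM"letter"gi;    # all at once, ignoring case

# One substitution per pass until nothing is left to replace.
my $looped = $orig;
my $passes = 0;
$passes++ while $looped =~ s"num"letter";

print "$global | $stacked | $looped ($passes passes)\n";
```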
(2) e: evaluate the second part of s/// as a complete "mini Perl program" rather than as a string
The e modifier for s/// is very powerful, but it is also very easy to abuse. You can use it to do almost magical substitutions. We only mention it here and give one example. Suppose we want to replace every letter in the following string with its ASCII number:
$string = 'hello';
$string =~ s{(\w)}   # we save the character in $1.
            {ord($1) . " ";}egx;
This example turns $string into 104 101 108 108 111. Each character is taken in turn and converted to a number by the ord function. Needless to say, this can do very powerful things in very little space. It can also be hard to understand.
We recommend that you use this logic as a last resort, only after all other techniques have failed. Using it sometimes hides a clearer way of doing things. Which is clearer, the code above or the following?
$string = turnToAscii($string);
sub turnToAscii
{
    my ($string) = @_;
    my ($return, @letters);
    @letters = split(//, $string);
    foreach $letter (@letters)
    {
        $letter = ord($letter) . " " if ($letter =~ m"\w");
    }
    $return = join('', @letters);
    $return;
}
The latter example is more obvious and easier to maintain. However, it is more than ten times the length of the former and several times slower; you have to use your judgment about when to use e.
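For comparison, here is a compact sketch of the e form next to a version without e. The map/join alternative and the variable names are ours, offered as one possible middle ground rather than as the author's method.

```perl
use strict;
use warnings;

my $string = 'hello';

# With e, the replacement part is Perl code that is run for every match.
(my $codes = $string) =~ s{(\w)}{ord($1) . " "}eg;
my @numbers = split ' ', $codes;

# The same numbers without e, using map and join.
my $joined = join ' ', map { ord($_) } split(//, $string);

print "$joined\n";
```

Note that the e version leaves a trailing space after the last number, because every letter is replaced by its code followed by a space.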
9.3.11 Matching and the g modifier
The modifiers (x, i, s, m and o) work the same way with the match operator m// as they do with substitution; however, the g modifier works quite differently, and you will use it often.
As we saw earlier, the g modifier in a substitution means "replace every instance of the regular expression."
However, this is meaningless in a match, where the backreferences describe one and only one match. Thus, Perl uses the g modifier differently in m// than in s///.
Perl ties the g modifier to an iterator. When $string =~ m""g matches once, Perl remembers where the match occurred. This means that you can use the iterator to pick up matching where you left off. When Perl reaches the end of the string, it resets the iterator:
$line = "Hello STRANGER Hello Friend Hello Sam";
while ($line =~ m"Hello (\w+)"sg)
{
    print "$1\n";       # the word following each "Hello"
}
This will output:
STRANGER
Friend
Sam
The loop then exits, because the internal iterator has reached the end of the string. A word of warning: if the reader is using the g modifier, then any modification of the variable being matched, including a plain assignment to it, resets the iterator.
$line = "hello";
while ($line =~ m"hello"sg)
{
    $line = "$line";    # assigning to $line resets the iterator
}
This is an infinite loop! So discipline yourself and avoid modifying a string while matching against it. (Work on a copy instead.)
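The iterator can be observed directly with Perl's built-in pos function, which reports where the next m//g match on a scalar will start. The sketch below is only an illustration with made-up strings: it shows the position advancing after a match, becoming undefined after an assignment, and then shows the "work on a copy" approach, in which the loop matches against $copy while $text is modified freely.

```perl
use strict;
use warnings;

my $line = "hello stranger hello friend";

$line =~ m/hello (\w+)/g;            # first match in scalar context
my $first     = $1;                  # "stranger"
my $pos_after = pos($line);          # offset just past "stranger"

$line = "hello stranger hello friend";   # any assignment resets the iterator
my $pos_reset = pos($line);          # undef: matching starts over

$line =~ m/hello (\w+)/g;
my $again = $1;                      # "stranger" once more, not "friend"

# Safe pattern: iterate over a copy, modify the original.
my $text  = "hello hello hello";
my $copy  = $text;
my $count = 0;
while ($copy =~ m/hello/g) {
    $text .= "!";                    # does not disturb the iterator on $copy
    $count++;
}

print "first=$first again=$again count=$count text=$text\n";
```

Because the second match starts again from the beginning of the string, $again holds the same word as $first; this is exactly the behavior that turns the loop above into an infinite one.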
9.3.12 Modifiers and Context
If the reader was not familiar with Perl regular expressions before starting this chapter, then all the modifiers, methods, special characters, and so on may by now be swimming around in the reader's mind. Let us take some time to look at the different forms of regular expressions, and then introduce some commonly used regular expressions to end this chapter.
All of this is related to context. Remember, in Perl, context is king. If the reader pays attention to it, a great deal can be accomplished by recognizing the context in which each expression is evaluated. We will simply "crystallize" some of what we have seen so far, and then add a few new items.
1. Substitution (no modifier) in scalar context looks like the following:
if ($string =~ s"a"b") { print "Substituted correctly\n"; }
This prints "Substituted correctly": the substitution returns true to the if only when the string actually matched. At the same time it replaces the first instance of a with b (without the g modifier, only the first instance is replaced).
2. Substitution in scalar context (g modifier)
In this case the substitution returns the number of successful replacements. The reader can take advantage of this with code like the following:
($string =~ s"a"b"g) == ($string2 =~ s"a"b"g)
Perl will tell the reader whether $string contains the same number of a's as $string2, while performing the substitution at the same time. If the reader wants to do this across entire files:
use FileHandle;
undef $/;                                # slurp mode: read a whole file at once
my $fh  = new FileHandle("file1");
my $fh2 = new FileHandle("file2");
(($line = <$fh>) =~ s"a"b"g) == (($line2 = <$fh2>) =~ s"a"b"g);
This counts the a's in the two files and compares the counts while performing the substitution.
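The counting behavior is easy to try without any files. The following sketch uses made-up strings ($s1, $s2, and $s3 are illustrative names) and shows that s///g in scalar context yields the number of replacements, and a false value when nothing was replaced.

```perl
use strict;
use warnings;

my $s1 = "banana";
my $s2 = "bandana";
my $s3 = "xyz";

my $n1 = ($s1 =~ s/a/o/g);    # three a's replaced
my $n2 = ($s2 =~ s/a/o/g);    # three a's replaced
my $n3 = ($s3 =~ s/a/o/g);    # nothing to replace: false

print "same number of a's\n" if $n1 == $n2;
print "$s1 $s2\n";
print "no a's in the third string\n" unless $n3;
```

After running, $s1 is "bonono" and $s2 is "bondono"; both counts are 3, so the comparison succeeds.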
3. Substitution (no modifier) and substitution (g modifier) in array context
These two are rather dull: they do exactly the same job as they do in scalar context.
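A minimal sketch of this point, using an illustrative string: when a substitution is evaluated in array context, the result is simply a one-element list holding the same count that scalar context would give.

```perl
use strict;
use warnings;

my $s = "banana";

# The array assignment puts the substitution in array (list) context.
my @result = ($s =~ s/a/o/g);

print "elements: ", scalar(@result), ", count: $result[0]\n";
```

Here @result holds the single value 3, and $s has become "bonono".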
4. Matching in scalar context (no modifier)
This works the same way as substitution without a modifier in scalar context. If the reader enters:
if ($line =~ m"a") { print "Matched an a!\n"; }
this only checks whether there is an a in $line. If the reader enters:
if ($line =~ m"\b(\w+)\b") { print "$1\n"; }
this checks whether there is a word in $line, stores it in $1, and prints it. Likewise:
($line =~ m"\b(\w+)\b") && (print "$1\n");
does the same job, but uses short-circuit evaluation to do the printing.
5. Matching in array context (no modifier)
This matches at the first position where the regular expression can match, and then simply puts the backreferences into a list that can be assigned. For example:
($variable, $equals, $value) = ($line =~ m"(\w+)\s*(=)\s*(\w+)");
This code takes the first backreference (\w+) and assigns it to $variable, takes the second backreference (=) and assigns it to $equals, and then takes the third backreference (\w+) and assigns it to $value.
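The following sketch applies that assignment to a made-up input line (the string "colour = blue" is only an example). It also shows what happens when the match fails: the list is empty, so nothing is assigned.

```perl
use strict;
use warnings;

my $line = "colour = blue";

# Successful match: the three backreferences come back as a list.
my ($variable, $equals, $value) = ($line =~ m/(\w+)\s*(=)\s*(\w+)/);

# Failed match: array context yields an empty list.
my @none = ("no assignment here" =~ m/(\w+)\s*(=)\s*(\w+)/);

print "variable=$variable equals=$equals value=$value\n";
print "failed match returned ", scalar(@none), " elements\n";
```

Testing the size of the returned list, as with @none here, is a simple way to tell a failed match from a successful one.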
6. Matching in array context (g modifier)
This takes the regular expression and applies it as many times as possible. It then puts the results into an array consisting of all the matches.
For example:
$line = '1.2 3.4 beta 5.66';
@matches = ($line =~ m"(\d*\.\d+)"g);
@matches is now equal to ('1.2', '3.4', '5.66'). The g modifier performs the iteration: it first matches 1.2, then 3.4, and third 5.66. Similarly:
undef $/;
my $fd = new FileHandle("file");
@comments = (<$fd> =~ m"/\*(.*?)\*/"sg);
This generates an array containing the comments in the file read through $fd.
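Both ideas can be tried on in-memory strings. In the sketch below, $source is a made-up fragment standing in for the contents of a file; the comment pattern uses a non-greedy (.*?) together with the s modifier so that each comment stops at its own closing delimiter and may span several lines.

```perl
use strict;
use warnings;

# Every decimal number in the line, collected in one step.
my $line    = '1.2 3.4 beta 5.66';
my @matches = ($line =~ m/(\d*\.\d+)/g);

# Every /* ... */ comment in a (hypothetical) piece of source text.
my $source   = "int a; /* first */ int b; /* second\n   comment */";
my @comments = ($source =~ m{/\*(.*?)\*/}sg);

print join(', ', @matches), "\n";
print scalar(@comments), " comments found\n";
```

@matches receives the three numbers and @comments receives two entries, the second of which contains an embedded newline.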
7. Matching in scalar context (g modifier)
Finally, if the match operator is used with the g modifier in scalar context, the reader gets behavior that is completely different from anything else in the regular expression world, or even in the Perl world. This is the "iterator" behavior we discussed earlier. If the reader enters the following code:
$ line = "Begin begin
while ($line =~ m"begin (.*?)(?=begin|$)"sg)
{
    push(@blocks, $1);
}
This matches the following text (shown in bold), loading one piece into @blocks on each successive iteration of the while loop.
Begin (%) Begin
Begin Begin
Begin Begin
With (%) we have marked the position at which each iteration starts matching. Note the use of (?=) in this case. It is necessary for the match to work correctly: the lookahead checks for the next begin without consuming it, so the next iteration can start there. Without it, the iterator would be left in the wrong place.
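To see why the lookahead matters, the sketch below runs the loop twice over a made-up string: once with (?=begin|$) and once with a plain consuming group (begin|$). The string and the array names @blocks and @skipped are illustrative assumptions.

```perl
use strict;
use warnings;

my $line = "begin one begin two begin three";

# With lookahead: the next "begin" is checked but not consumed,
# so the following iteration starts right at it.
my @blocks;
while ($line =~ m/begin (.*?)(?=begin|$)/sg) {
    push @blocks, $1;
}

# Without lookahead: the next "begin" is swallowed by the match,
# so the block that follows it is never seen as a block of its own.
my @skipped;
while ($line =~ m/begin (.*?)(begin|$)/sg) {
    push @skipped, $1;
}

print scalar(@blocks),  " blocks with lookahead\n";
print scalar(@skipped), " blocks without lookahead\n";
```

With the lookahead all three blocks ("one ", "two ", and "three") are collected; without it the middle block is lost, because the iterator resumes after the second begin has already been consumed.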