Unveiling the mystery of the regular expression syntax
Regular expressions (REs) are often mistakenly regarded as a mysterious language that only a few people understand. On the surface they do look messy: if you don't know the grammar, the code is just a jumble of characters. In fact, regular expressions are quite simple and can be understood. After reading this article, you will know the general syntax of regular expressions.
Support for multiple platforms
Regular expressions were first proposed by the mathematician Stephen Kleene in 1956. He proposed them on the basis of a growing body of research into natural language. As a syntactically complete way of matching patterns of characters, regular expressions were later applied to information technology. Since then, they have gone through several iterations, and the current standard has been approved by ISO (the International Organization for Standardization) and is defined by The Open Group.
Regular expressions are not a language in their own right, but they can be used to find and replace text in a file or a string. There are two standards: basic regular expressions (BRE) and extended regular expressions (ERE). ERE includes the functionality of BRE plus some additional concepts.
Regular expressions are used in many programs, including xsh, egrep, sed, vi, and other programs on UNIX platforms. They have also been adopted by many languages, such as HTML and XML, which usually implement only a subset of the full standard.
More common than you think
As programs have been ported across platforms, regular expressions have moved beyond UNIX, and their functionality has become increasingly complete. Search engines on the web use them, e-mail programs use them, and even if you are not a UNIX programmer, you can use regular expressions to simplify your programs and shorten your development time.
Regular expression 101
Much of the regular expression syntax will look familiar even if you have never studied it. The wildcard, for example, is one kind of RE construct, namely a repetition operation. Let's take a look at the most common basic syntax types of the ERE standard. To give examples with a specific purpose, I will use several different programs.
Character matching
The key to a regular expression is deciding what you want to search for. Without that, REs are useless. Each expression contains instructions describing what to look for, as shown in Table A.
Table A: Character-matching regular expressions

| Operator | Explanation | Example | Result |
|---|---|---|---|
| . | Match any one character | grep ".ord" sample.txt | Will match "ford", "lord", "2ord", etc. in the file sample.txt |
| [] | Match any one character listed between the brackets | grep "[cng]ord" sample.txt | Will match only "cord", "nord", and "gord" |
| [^] | Match any one character not listed between the brackets | grep "[^cn]ord" sample.txt | Will match "lord", "2ord", etc. but not "cord" or "nord" |
|  |  | grep "[a-zA-Z]ord" sample.txt | Will match "aord", "bord", "Aord", "Bord", etc. |
|  |  | grep "[^0-9]ord" sample.txt | Will match "Aord", "aord", etc. but not "2ord", etc. |
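
As a rough illustration of Table A, the same character classes can be tried in Python's re module, which uses the same syntax for these operators. The word list is a made-up stand-in for sample.txt and the helper name matching is hypothetical; re.fullmatch is used so that each pattern has to account for the whole word.

```python
import re

# Hypothetical sample words standing in for the contents of sample.txt.
words = ["ford", "lord", "2ord", "cord", "nord", "gord", "Aord", "ord"]


def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [w for w in candidates if re.fullmatch(pattern, w)]


any_char = matching(r".ord", words)        # any single character, then "ord"
listed = matching(r"[cng]ord", words)      # one of c, n, or g
not_listed = matching(r"[^cn]ord", words)  # any character except c or n
letters = matching(r"[a-zA-Z]ord", words)  # any ASCII letter
non_digit = matching(r"[^0-9]ord", words)  # any character except a digit

print(listed)      # ['cord', 'nord', 'gord']
print(not_listed)  # ['ford', 'lord', '2ord', 'gord', 'Aord']
print(letters)     # ['ford', 'lord', 'cord', 'nord', 'gord', 'Aord']
```

The bare word "ord" never appears in the results, because every pattern here requires exactly one character before "ord".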
Repetition operators
Repetition operators, or quantifiers, describe the number of times to look for a particular character. They are often combined with the character-matching syntax to find strings of several characters; see Table B.
Table B: Regular expression repetition operators

| Operator | Explanation | Example | Result |
|---|---|---|---|
| ? | Match the declared element zero or one time | egrep ".?erd" sample.txt | Will match "berd", "herd", etc. and "erd" |
| * | Match the declared element zero or more times | egrep "n.*rd" sample.txt | Will match "nerd", "nrd", "neard", etc. |
| + | Match the declared element one or more times | egrep "[n]+erd" sample.txt | Will match "nerd", "nnerd", etc., but not "erd" |
| {n} | Match the declared element exactly n times | egrep "[a-z]{2}erd" sample.txt | Will match "cherd", "blerd", etc. but not "nerd", "erd", "buzzerd", etc. |
| {n,} | Match the declared element at least n times | egrep ".{2,}erd" sample.txt | Will match "cherd" and "buzzerd", but not "nerd" |
| {n,N} | Match the declared element at least n times, but not more than N times | egrep "n[e]{1,2}rd" sample.txt | Will match "nerd" and "neerd" |

The results above describe the whole word being matched. Because egrep prints every line that contains a match, an unanchored pattern such as "[a-z]{2}erd" will still find the "zzerd" inside "buzzerd".
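
The quantifiers in Table B can be sketched the same way in Python's re module, which shares this syntax. The word list and the helper matching are hypothetical; re.fullmatch makes each pattern cover the whole word, while the final re.search call shows what happens when the pattern is left unanchored, as it is with egrep.

```python
import re

# Hypothetical sample words standing in for the contents of sample.txt.
words = ["erd", "nerd", "nnerd", "neerd", "nrd",
         "herd", "cherd", "blerd", "buzzerd"]


def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [w for w in candidates if re.fullmatch(pattern, w)]


optional = matching(r".?erd", words)           # zero or one character, then "erd"
zero_or_more = matching(r"n.*rd", words)       # "n", any run of characters, "rd"
one_or_more = matching(r"n+erd", words)        # at least one "n", then "erd"
exactly_two = matching(r"[a-z]{2}erd", words)  # exactly two letters, then "erd"
at_least_two = matching(r".{2,}erd", words)    # two or more characters, then "erd"
one_or_two = matching(r"ne{1,2}rd", words)     # one or two "e" characters

print(optional)      # ['erd', 'nerd', 'herd']
print(one_or_more)   # ['nerd', 'nnerd']
print(exactly_two)   # ['nnerd', 'neerd', 'cherd', 'blerd']
print(at_least_two)  # ['nnerd', 'neerd', 'cherd', 'blerd', 'buzzerd']
print(one_or_two)    # ['nerd', 'neerd']

# Unanchored search finds a match inside a longer word.
inside = re.search(r"[a-z]{2}erd", "buzzerd")
print(inside.group())  # zzerd
```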
Anchors
Anchors describe where in the text a pattern should match, as shown in Table C. They make it easy to find common combinations of characters. For the examples, I use the vi line-editor command :s, which stands for substitute. The basic syntax of this command is:
s/pattern_to_match/pattern_to_substitute/
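
For readers without vi at hand, the substitute command and the anchors of Table C can be approximated with re.sub and re.search in Python. This is only a sketch: the sample line is made up, and because Python's re module has no \< or \> anchors, \b (a boundary at either end of a word) is used here as the nearest equivalent.

```python
import re

line = "soup of the day"  # hypothetical line of text

# Similar to :s/^/blah / in vi: insert text at the beginning of the line.
at_start = re.sub(r"^", "blah ", line)

# Similar to :s/$/ blah/ in vi: insert text at the end of the line.
at_end = re.sub(r"$", " blah", line)

print(at_start)  # blah soup of the day
print(at_end)    # soup of the day blah

# \b matches at a word boundary, \B anywhere that is not a boundary.
starts_word = re.search(r"\bblah", "blahcake") is not None
ends_word = re.search(r"blah\b", "countblah") is not None
mid_word = re.search(r"\Bblah\B", "sublahper") is not None
not_at_start = re.search(r"\bblah", "countblah") is None

print(starts_word, ends_word, mid_word, not_at_start)  # True True True True
```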
Table C: Regular expression anchors

| Operator | Explanation | Example | Result |
|---|---|---|---|
| ^ | Match at the beginning of a line | s/^/blah/ | Inserts "blah" at the beginning of the line |
| $ | Match at the end of a line | s/$/blah/ | Inserts "blah" at the end of the line |
| \< | Match at the beginning of a word | s/\</blah/ | Inserts "blah" at the beginning of the word |
|  |  | egrep "\<blah" sample.txt | Matches "blahfield", etc. |
| \> | Match at the end of a word | s/\>/blah/ | Inserts "blah" at the end of the word |
|  |  | egrep "blah\>" sample.txt | Matches "soupblah", etc. |
| \b | Match at the beginning or end of a word | egrep "\bblah" sample.txt | Matches "blahcake"; written as "blah\b", it matches "countblah" |
| \B | Match in the middle of a word | egrep "\Bblah" sample.txt | Matches "sublahper", etc. |

Alternation
Another possibility in REs is the alternation symbol. In effect, this symbol is equivalent to an OR statement and is written as |. The following statement returns the strings "nerd" and "merd" from the file sample.txt:
egrep "(n|m)erd" sample.txt
Alternation is very powerful, especially when you are looking for different spellings, although in this case you can get the same result with the following example:
egrep "[nm]erd" sample.txt
Alternation really shows its strength when you combine it with the more advanced features of REs.

Reserved characters
The last important feature of REs is reserved characters (also known as special characters). For example, suppose you want to find the strings "ne*rd" and "ni*rd". The pattern "n[ei]*rd" matches "neeeeerd" and "nieieierd", which is not what you are looking for. Because '*' (asterisk) is a reserved character, you must escape it with a backslash, namely: "n[ei]\*rd". The reserved characters include:
^ (caret)
. (period)
[ (left bracket)
$ (dollar sign)
( (left parenthesis)
) (right parenthesis)
| (pipe)
* (asterisk)
+ (plus symbol)
? (question mark)
{ (left curly bracket, or left brace)
\ (backslash)
Once you start escaping these characters, REs undoubtedly become hard to read. For example, the following eregi call in PHP is hard to read:
eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$", $sendto)
As you can see, the intent of the code is difficult to grasp. But if you overlook the reserved characters, you will often misread what the code means.
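
The alternation and escaping points above can be sketched in Python's re module. The word list and the e-mail address are made up for illustration, and the eregi pattern is reproduced on the assumption that each character class is followed by a + quantifier; re.IGNORECASE stands in for the case-insensitive "i" in eregi.

```python
import re

# Hypothetical words standing in for the contents of sample.txt.
words = ["nerd", "merd", "herd"]

# Alternation and a character class select the same words here.
with_alternation = [w for w in words if re.fullmatch(r"(n|m)erd", w)]
with_char_class = [w for w in words if re.fullmatch(r"[nm]erd", w)]
print(with_alternation)  # ['nerd', 'merd']

# Unescaped, * is a repetition operator applied to [ei].
star_as_operator = re.fullmatch(r"n[ei]*rd", "neeeeerd") is not None

# Escaped, \* stands for a literal asterisk.
star_as_literal = re.fullmatch(r"n[ei]\*rd", "ne*rd") is not None
run_of_e_rejected = re.fullmatch(r"n[ei]\*rd", "neeeeerd") is None
print(star_as_operator, star_as_literal, run_of_e_rejected)  # True True True

# re.escape adds the backslashes when the text is meant literally.
escaped = re.escape("ne*rd")
print(escaped)  # ne\*rd

# The eregi pattern, compiled case-insensitively.
address_pattern = re.compile(
    r"^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$",
    re.IGNORECASE,
)
looks_valid = address_pattern.search("Some.User@example.com") is not None
no_at_sign = address_pattern.search("not an address") is None
print(looks_valid, no_at_sign)  # True True
```

Letting re.escape produce the backslashes is one way to avoid escaping reserved characters by hand when the search text comes from elsewhere.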
Summary
In this article, we unveiled the mystery of regular expressions and listed the general syntax of the ERE standard. For the full description of the rules, see The Open Group's Regular Expressions specification. You are welcome to post your questions or comments in the discussion area.