Analyzing C# Files with Regular Expressions

xiaoxiao 2021-03-06

Many readers have written to ask how to write a program that colors source code according to its syntax. Until recently this was quite hard to do. You had to write a lot of code to analyze the syntax, and that is often the most difficult part.

Regular expressions free us from most of this heavy work. They provide a set of methods (standards, patterns) that let us create, compare, and modify strings, and quickly analyze large amounts of text and data to search for, remove, and replace text patterns [1]. The .NET Framework provides the System.Text.RegularExpressions namespace for this purpose.
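The following is a minimal sketch of the three operations named above (search, remove, replace). It uses only Regex.IsMatch and Regex.Replace from System.Text.RegularExpressions. The class and method names (RegexBasics, ContainsNumber, StripLineComments, NormalizeSpaces) are hypothetical and chosen only for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers illustrating search, remove, and replace.
public static class RegexBasics
{
    // Search: does the text contain at least one run of digits?
    public static bool ContainsNumber(string text) =>
        Regex.IsMatch(text, @"\d+");

    // Remove: delete a C#-style line comment ("//" to the end of the line).
    public static string StripLineComments(string line) =>
        Regex.Replace(line, @"//.*", string.Empty);

    // Replace: collapse every run of whitespace into a single space.
    public static string NormalizeSpaces(string text) =>
        Regex.Replace(text, @"\s+", " ");
}
```

StripLineComments is deliberately naive. It would also cut a line at a "//" inside a string literal such as "http://...", and that kind of problem is what the string patterns discussed later are meant to handle.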

1. Regular expressions [2]

First, here is a brief introduction to regular expressions.

The mathematician Stephen Kleene first proposed regular expressions in 1956, building on earlier work on neural network models by McCulloch and Pitts. Regular expressions with a complete syntax were used to match character patterns and were later applied throughout information technology. Since then, regular expressions have gone through several stages of development. The current standard has been approved by ISO (the International Organization for Standardization) and recognized by The Open Group.

A regular expression is not a dedicated programming language, but it can be used to find and replace text in files or strings. There are two standards: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). ERE includes all of BRE's functionality plus additional concepts.

On the Unix platform, XSH, egrep, sed, vi, and other programs implement regular expressions. Many languages, such as HTML and XML, have also adopted them, usually as a subset of the full standard. As regular expressions were ported across platforms, their features became more complete and they came into wider use.

2. Relevant expressions

That is as much as I can say about regular expressions in general. They are a small body of knowledge in their own right and cannot be covered fully here. Below I introduce only the patterns relevant to analyzing C# syntax. For details, see the Regular Expressions specification [The Open Group] collected on this blog. If you already know regular expressions well, you can skip the explanation of each pattern and finish the article quickly.

i> Strings: "(\\?.)*?"
In a regular expression, every character other than . $ ^ { [ ( | ) * + ? \ matches itself. In the pattern above, the quotation marks at either end match the quotation marks that delimit the string. "\\" matches a single "\" character, and the "?" after it means zero or one occurrence. "." matches any character except \n. "( )" marks a captured substring. Captures made with ( ) are numbered automatically from 1, in the order of their opening parentheses. Capture number 0 is the text matched by the entire pattern. The "*" after the closing parenthesis applies to "(\\?.)" and means zero or more occurrences, so the empty string "" also matches. The "?" after "*" makes the repetition lazy. As a result, the match ends at the first unescaped closing quotation mark and does not run on to the last quotation mark on the line.

ii> Verbatim strings: @"(""|.)*?"
This pattern is intended to match strings such as @"hello ""world""!". The | (vertical bar) matches any one of the terms it separates, for example cat|dog|tiger, and the leftmost alternative that succeeds is used. Note that "*?" is lazy, so this pattern stops at the first quotation mark it reaches, including the first quote of a doubled "" pair. The greedy form @"(""|[^"])*" matches the whole example.

iii> XML elements in C# documentation comments: \s*<.*>
This pattern matches the XML in C# documentation comments. "\s" matches any whitespace character. Do not change the case of these escapes: regular expressions are case-sensitive, and in escape sequences the uppercase letter often means the opposite. For example, "\S" matches any non-whitespace character. The same caution applies to "\Z" below, which differs from "\z".

iv> C# documentation comments: ///.*

v> Blank lines: ^\s*\Z
"^" requires the match to start at the beginning of the string (or of a line, in multiline mode). "\Z" requires the match to end at the end of the string or before a final \n at the end of the string.

vi> C# comments: //.*

vii> C# keywords: (abstract|where|while|yield){1}(\.|(\s)|,|\(|\[){1}
Only a few keywords are listed here (C# has at least 80 keywords ^_^).
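To check the patterns above, here is a hedged sketch that writes several of them as C# verbatim strings, in which a backslash is literal and a quotation mark is doubled. The class name CSharpPatterns is hypothetical. For verbatim strings, the sketch uses the greedy variant @"(""|[^"])*" because the lazy @"(""|.)*?" stops at the first quote of a doubled "" pair. The two in/int fields show how the order of alternatives changes the result.

```csharp
using System.Text.RegularExpressions;

// Hypothetical holder for some of the section 2 patterns.
public static class CSharpPatterns
{
    // i> Regular string literal: "(\\?.)*?"
    public static readonly Regex QuotedString = new Regex(@"""(\\?.)*?""");

    // ii> Verbatim string literal. Assumption: the greedy form
    // @"(""|[^"])*" is used so that a doubled "" does not end the match.
    public static readonly Regex VerbatimString = new Regex(@"@""(""""|[^""])*""");

    // v> Blank line: only whitespace up to the end of the string.
    public static readonly Regex BlankLine = new Regex(@"^\s*\Z");

    // vi> Line comment: "//" to the end of the line.
    public static readonly Regex LineComment = new Regex(@"//.*");

    // Keyword alternation order: the leftmost alternative that succeeds wins.
    public static readonly Regex InBeforeInt = new Regex("(in|int)");
    public static readonly Regex IntBeforeIn = new Regex("(int|in)");

    // Brackets, written as alternatives.
    public static readonly Regex Bracket = new Regex(@"(\{|\[|\(|\}|\]|\))");
}
```

For example, InBeforeInt.Match("int x").Value is "in", while IntBeforeIn returns "int".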
Note that the parser uses the leftmost alternative that succeeds. When one word contains another, order the alternatives carefully and put the containing word before the word it contains. For example, (in|int) matches only the "in" at the start of "int", so it should be written (int|in). Besides these, there are the brackets: (\{|\[|\(|\}|\]|\)).

3. Related classes and their members [3]

```csharp
[Serializable] public class Regex : ISerializable  // Represents an immutable regular expression.
```

The Regex class contains several static methods that let you use a regular expression without explicitly creating a Regex object. Calling a static method is equivalent to constructing a Regex object, using it once, and then destroying it.

The Regex class is immutable (read-only) and inherently thread-safe. You can create a Regex object on any thread and share it between threads.

The above is taken from Microsoft's documentation. We also need several of its members:

```csharp
// Searches the specified input string for an occurrence of the regular expression specified in the Regex constructor.
public Match Match(string input);
```
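The difference between the static methods and a shared instance, along with the thread-safety guarantee, can be sketched as follows. This is a hedged illustration rather than the author's code. StaticVsInstance and the identifier pattern [A-Za-z_]\w* are assumptions made for the example. Regex.Match, Regex.Matches, Parallel.ForEach, and Interlocked.Add are standard .NET APIs.

```csharp
using System.Text.RegularExpressions;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: one pattern used via a shared instance and via the static method.
public static class StaticVsInstance
{
    private const string IdentifierPattern = @"[A-Za-z_]\w*";

    // Regex is immutable, so a single instance can be shared by all threads.
    private static readonly Regex Identifier = new Regex(IdentifierPattern);

    // Instance method: Match(string input) on an existing Regex object.
    public static string FirstIdentifierInstance(string code) =>
        Identifier.Match(code).Value;

    // Static method: the caller never creates a Regex object explicitly.
    public static string FirstIdentifierStatic(string code) =>
        Regex.Match(code, IdentifierPattern).Value;

    // Uses the shared instance from several threads at once.
    public static int CountIdentifiersInParallel(string[] lines)
    {
        int total = 0;
        Parallel.ForEach(lines, line =>
            Interlocked.Add(ref total, Identifier.Matches(line).Count));
        return total;
    }
}
```

Only the Regex object is shared here. Each thread works with its own Match results.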

For the Match class:

```csharp
[Serializable] public class Match : Group  // Represents the results from a single regular expression match.
```

See the Microsoft documentation for more information on Group. We will use the following members:

```csharp
// The position in the original string where the captured substring begins.
public int Index { get; }
// The length of the captured substring.
public int Length { get; }
// The actual substring captured by the match.
public string Value { get; }
// Gets a value indicating whether the match was successful.
public bool Success { get; }
// Gets a collection of groups matched by the regular expression.
public virtual GroupCollection Groups { get; }
// Returns a new Match with the results of the next match, starting where the last match ended
// (at the character after the last matched character).
public Match NextMatch();
```

The Group class has the corresponding members too. The first four Match properties listed above are inherited from Group, so they are not listed again.
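The following is a minimal sketch of the members just listed. It walks all matches with Success and NextMatch() and reads Index, Length, Value, and Groups. The names MatchWalker, Describe, and FirstGroup are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that exercises the Match/Group members above.
public static class MatchWalker
{
    // Returns "Index:Length:Value" for every match, in order.
    public static List<string> Describe(Regex regex, string input)
    {
        var result = new List<string>();
        for (Match m = regex.Match(input); m.Success; m = m.NextMatch())
        {
            result.Add($"{m.Index}:{m.Length}:{m.Value}");
        }
        return result;
    }

    // Groups[0] is the whole match; Groups[1] is the first (...) capture.
    public static string FirstGroup(Regex regex, string input)
    {
        Match m = regex.Match(input);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For example, Describe(new Regex(@"\d+"), "a1 b22") yields "1:1:1" and "4:2:22".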

The pattern to match must be specified when a Regex instance is initialized. You can create an instance with the constructor, use it, and then discard it. Alternatively, you can call the static methods directly, which is equivalent to creating an instance. However, in my tests the static methods were slightly slower than a compiled Regex object. Please see the set of test data below:
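You can reproduce a comparison like this yourself with a simple Stopwatch loop. The sketch below is one possible setup. It assumes that "compiled Regex object" means an instance created with RegexOptions.Compiled, and RegexTiming is a hypothetical name. It claims no particular result, because the timings depend on the .NET version, the pattern, and the input.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical micro-benchmark helper; no results are implied.
public static class RegexTiming
{
    // Runs the action 'iterations' times and returns the elapsed milliseconds.
    public static long Time(Action action, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) action();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Times the static Regex.Match against a RegexOptions.Compiled instance.
    public static (long staticMs, long compiledMs) Compare(string input, string pattern, int iterations)
    {
        var compiled = new Regex(pattern, RegexOptions.Compiled);
        compiled.Match(input);        // warm up the compiled instance
        Regex.Match(input, pattern);  // warm up the static-method cache
        long s = Time(() => Regex.Match(input, pattern), iterations);
        long c = Time(() => compiled.Match(input), iterations);
        return (s, c);
    }
}
```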

4. Writing the code

We now need to analyze the C# language elements listed in section 2. I analyze the code line by line. If you want to analyze several lines at once, the relevant expressions need to be modified [4].

```csharp
using System.Text.RegularExpressions;

// ... some other code ...

// Create a Regex instance (a double-quoted string is used as the example).
// strSomeCodes holds the line of source code being analyzed.
Regex doubleQuotedString = new Regex(@"""(\\?.)*?""");

// Then match it against the code. NextMatch() returns a new Match,
// so its result must be assigned back to m.
for (Match m = doubleQuotedString.Match(strSomeCodes); m.Success; m = m.NextMatch())
{
    foreach (Group g in m.Groups)
    {
        // Do some drawing here, using g.Index, g.Length, and g.Value.
    }
}
```
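If you analyze a whole file at once instead of line by line, some patterns need different options. This hedged sketch assumes whole-file input. It uses two real .NET options: RegexOptions.Singleline, which lets "." match "\n" so that a /* ... */ comment can span lines, and RegexOptions.Multiline, which makes ^ and $ match at every line. The class name MultiLinePatterns is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical patterns for whole-file (multi-line) analysis.
public static class MultiLinePatterns
{
    // Block comment that may span lines; the lazy .*? stops at the first */.
    public static readonly Regex BlockComment =
        new Regex(@"/\*.*?\*/", RegexOptions.Singleline);

    // Blank lines anywhere in the file. [ \t]* is used instead of \s*
    // so that the match cannot run across a line break.
    public static readonly Regex BlankLines =
        new Regex(@"^[ \t]*$", RegexOptions.Multiline);
}
```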

The rest is writing the coloring code.
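As one possible approach, not the author's own code, the sketch below combines several of the section 2 patterns into a single regex with named groups and wraps each token of a line in an HTML span. The class name Highlighter, the group names cmt/str/kw, and the CSS class names are all hypothetical. WebUtility.HtmlEncode is a standard .NET method used to escape the text. Comments and strings are listed before keywords, so a keyword inside a comment or string is not colored.

```csharp
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical line-by-line highlighter that emits HTML spans.
public static class Highlighter
{
    // At each position the alternatives are tried left to right.
    private static readonly Regex Token = new Regex(
        @"(?<cmt>//.*)" +
        @"|(?<str>@""(""""|[^""])*""|""(\\?.)*?"")" +
        @"|(?<kw>\b(abstract|int|in|where|while|yield)\b)");

    public static string ToHtml(string line)
    {
        var sb = new StringBuilder();
        int pos = 0;
        foreach (Match m in Token.Matches(line))
        {
            // Plain text between the previous token and this one.
            sb.Append(WebUtility.HtmlEncode(line.Substring(pos, m.Index - pos)));
            string css = m.Groups["cmt"].Success ? "cmt"
                       : m.Groups["str"].Success ? "str"
                       : "kw";
            sb.Append("<span class=\"").Append(css).Append("\">")
              .Append(WebUtility.HtmlEncode(m.Value))
              .Append("</span>");
            pos = m.Index + m.Length;
        }
        // Plain text after the last token.
        sb.Append(WebUtility.HtmlEncode(line.Substring(pos)));
        return sb.ToString();
    }
}
```

Because the input is a single line, block comments that span lines are not handled here.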

5. Source code


Notes:
[1] "Ability to ... text patterns": quoted from "Regular Expression Language Elements" in the .NET Framework General Reference.
[2] Introduction to regular expressions: draws on content from the technology and development section of ZDNet China.
[3] The signatures and comments of the classes and functions in section 3 are taken from the Microsoft documentation.
[4] Multi-line analysis: for details, see "Regular Expression Language Elements" in the .NET Framework General Reference.

When reposting, please credit the original address: https://www.9cbs.com/read-127894.html
