Analyzing C# files with regular expressions

zhaozj 2021-02-16

Many readers have probably had to write a program that colors source code according to its syntax. Until fairly recently this was a hard thing to do: you had to write a lot of code to analyze the syntax, and that is usually the most difficult part. With regular expressions we are freed from that heavy work. Regular expressions provide a set of methods (standards, patterns) that let us create, compare, and modify strings, and quickly analyze large amounts of text and data to search for, remove, and replace text patterns [1]. The .NET Framework provides the System.Text.RegularExpressions namespace to deliver on that promise.

1. Regular expression [2]

First, a brief introduction to regular expressions.

Regular expressions were first proposed by the mathematician Stephen Kleene in 1956, on the basis of accumulating research results on natural language. Regular expressions with a complete syntax were used for pattern matching on characters and were later applied to information technology. Since then regular expressions have gone through several periods of development; the current standard has been approved by ISO (the International Organization for Standardization) and is recognized by The Open Group.

Regular expressions are not a dedicated language, but they can be used to find and replace text in a file or in a string. There are two standards: Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE). ERE includes the functionality of BRE plus additional concepts.

XSH, egrep, sed, vi, and other programs on UNIX platforms already implement regular expressions. Regular expressions can be adopted in many languages, such as HTML and XML, usually as only a subset of the full standard. As regular expressions have been ported across platforms, their functionality has become more complete and their use more widespread.

2. Related expressions

That is all I will say about regular expressions in general: they are a small body of knowledge in their own right and cannot be explained fully here. Below I introduce only the patterns needed for C# syntax analysis. For details, see the Regular Expression Specification [The Open Group] collected on this blog. If you already know regular expressions well, you can skip the explanations that follow each pattern and finish the article sooner.

I> String: "(\\?.)*?"

In a regular expression, every character other than . $ ^ { [ ( | ) * + ? \ matches itself. In the pattern above, the quotation marks at both ends match the quotation marks that delimit the string. "\\" stands for a single "\" character. A following "?" means match zero or one occurrence. "." matches any character except \n. "()" captures the matched substring; captures made with () are numbered automatically from 1 in the order of their opening parentheses, and capture number zero is the text matched by the entire pattern. The "*" after the parentheses means zero or more such substrings, that is, "*" applies to "(\\?.)". The final "?" makes the "*" lazy, so it takes as few repetitions as possible and the match ends at the first unescaped closing quotation mark; an empty string is matched as well.

II> Verbatim string: @"(""|.)*?"

Intended to match strings such as @"hello ""world""!". "|" matches any one of the terms it separates, for example cat|dog|tiger, and the leftmost alternative that succeeds is used. Note that because "*?" is lazy, the pattern as written stops at the first quotation mark after the opening one (in the example it matches only @"hello "); a greedy form such as @"(""|[^"])*" is needed to span the doubled quotation marks.

III> XML element in C# documentation comments: ///\s*<.*>

Matches the XML tags of C# automatic XML documentation. "\s" means any whitespace character. Do not change the case of these escapes at will: regular expressions are case sensitive, and the uppercase form of an escape usually means the opposite. For example, "\S" means any non-whitespace character. (The same applies to the "\z" below.)

IV> Content of C# documentation comments: ///\s?.*

V> Blank line: ^\s*\z

"^" specifies that the match must occur at the beginning of the string or of the line. "\z" specifies that the match must occur at the end of the string; the uppercase "\Z" also accepts the position just before a final \n.

VI> C# comment: //.*

VII> C# keyword: (abstract|where|while|yield){1}(\.|(\s)|;|,|\(|\[){1}

For reasons of space only a few keywords are listed here (C# has at least 80 keywords ^_^). Note that the parser uses the first alternative from the left that succeeds.
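As a rough illustration, a few of the patterns above can be written as C# verbatim strings and tried on single lines. The class and method names (PatternDemo, FirstMatch) and the sample inputs are made up for this sketch; only the patterns themselves come from the list.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // Patterns I, V, VI and VII from the list, as C# verbatim strings
    // (inside @"...", a double quote is written as "").
    public static readonly Regex QuotedString = new Regex(@"""(\\?.)*?""");
    public static readonly Regex BlankLine = new Regex(@"^\s*\z");
    public static readonly Regex LineComment = new Regex(@"//.*");
    public static readonly Regex Keyword =
        new Regex(@"(abstract|where|while|yield){1}(\.|(\s)|;|,|\(|\[){1}");

    // Returns the text of the first match in the line
    // (an empty string when nothing matches).
    public static string FirstMatch(Regex pattern, string line)
    {
        return pattern.Match(line).Value;
    }

    public static void Main()
    {
        // A string literal containing an escaped quotation mark.
        Console.WriteLine(FirstMatch(QuotedString, "s = \"a\\\"b\";"));
        // A trailing comment.
        Console.WriteLine(FirstMatch(LineComment, "int x; // counter"));
        // A line made only of whitespace.
        Console.WriteLine(BlankLine.IsMatch("   \t"));
        // The keyword pattern also consumes the delimiter that follows.
        Console.WriteLine(FirstMatch(Keyword, "while (true)"));
    }
}
```

Because the keyword pattern includes the following delimiter in the match, a colorizer built on it would probably want to color only the first capture group rather than the whole match.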
Because the leftmost alternative that succeeds is used, order matters when one keyword contains another: the longer word must be placed before the word it contains. For example, (in|int) on its own never finds int, so it should be written (int|in).

In addition, all brackets: (\{|\[|\(|\}|\]|\))

3. Related classes and their members [3]

[Serializable] public class Regex : ISerializable // Represents an immutable regular expression.

The Regex class contains several static methods that let you use a regular expression without explicitly creating a Regex object. Using a static method is equivalent to constructing a Regex object, using it once, and then destroying it.

The Regex class is immutable (read-only) and inherently thread safe. A Regex object can be created on any thread and shared between threads.

The above is taken from Microsoft's development documentation. We also need several of its members:

// Searches the specified input string for an occurrence of the regular expression specified in the Regex constructor.
public Match Match(string input)

As for the Match class:

[Serializable] public class Match : Group // Represents the results from a single regular expression match.

See the Microsoft development documentation for more information on Group.

We will use the following members of Match:

// The position in the original string where the first character of the captured substring was found.
public int Index { get; }
// The length of the captured substring.
public int Length { get; }
// The actual substring captured by the match.
public string Value { get; }
// Gets a value indicating whether the match is successful.
public bool Success { get; }
// Gets a collection of groups matched by the regular expression.
public virtual GroupCollection Groups { get; }
// Returns a new Match with the results for the next match, starting at the position where the last match ended (at the character after the last matched character).
public Match NextMatch();

We also use the corresponding members of the Group class. Of the Match members listed above, the first four properties are inherited from Group, so they are not listed again.
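A minimal sketch of how these members fit together is given here. The class name MatchWalk, the Describe helper, and the sample inputs are hypothetical; the members used (Success, Index, Length, Value, Groups, NextMatch) are the ones listed above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchWalk
{
    // Collects "index:length:value" for every match of pattern in input,
    // advancing with NextMatch() until Success becomes false.
    public static List<string> Describe(string pattern, string input)
    {
        var found = new List<string>();
        for (Match m = Regex.Match(input, pattern); m.Success; m = m.NextMatch())
        {
            found.Add(m.Index + ":" + m.Length + ":" + m.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Two string literals on one line, found with the string pattern.
        foreach (string s in Describe(@"""(\\?.)*?""", "f(\"a\", \"bc\")"))
        {
            Console.WriteLine(s);
        }

        // Groups[0] is the whole match; Groups[1] holds the last text
        // captured by the parenthesized part (\\?.).
        Match m = Regex.Match("a = \"hi\";", @"""(\\?.)*?""");
        Console.WriteLine(m.Groups[0].Value);
        Console.WriteLine(m.Groups[1].Value);
    }
}
```

Index and Length are what a colorizer needs in order to select a range of the original text, which is why the sketch records them next to Value.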

The pattern string must be specified when an instance of the Regex class is initialized. You can use the constructor to create an instance, use it, and then destroy it, or you can call a static method directly, which is equivalent to creating an instance. After testing, however, I found that the static methods are slightly slower than a compiled Regex object. Please see the set of test data below:
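The following is not the author's test, only a minimal sketch of how the call styles could be timed against each other. RegexTiming, CountStatic, CountInstance, the sample lines, and the number of rounds are all assumptions made for illustration; Regex.IsMatch, RegexOptions.Compiled, and Stopwatch are standard .NET APIs.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    const string Pattern = @"//.*";

    // Counts lines containing a comment with the static Regex.IsMatch method.
    public static int CountStatic(string[] lines, int rounds)
    {
        int hits = 0;
        for (int r = 0; r < rounds; r++)
        {
            foreach (string line in lines)
            {
                if (Regex.IsMatch(line, Pattern)) hits++;
            }
        }
        return hits;
    }

    // Counts the same lines with a single Regex instance that is reused.
    public static int CountInstance(string[] lines, int rounds, RegexOptions options)
    {
        var regex = new Regex(Pattern, options);
        int hits = 0;
        for (int r = 0; r < rounds; r++)
        {
            foreach (string line in lines)
            {
                if (regex.IsMatch(line)) hits++;
            }
        }
        return hits;
    }

    public static void Main()
    {
        string[] lines = { "int x; // counter", "int y;", "// done" };
        const int rounds = 100000;

        var sw = Stopwatch.StartNew();
        CountStatic(lines, rounds);
        Console.WriteLine("static:   " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        CountInstance(lines, rounds, RegexOptions.None);
        Console.WriteLine("instance: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        CountInstance(lines, rounds, RegexOptions.Compiled);
        Console.WriteLine("compiled: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

The numbers depend on the runtime, the pattern, and the input, so no expected output is shown; the only thing the three variants are guaranteed to share is the hit count.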

4. Writing the code

We now need to analyze the C# language elements listed in section 2. I use line-by-line analysis (if you want to analyze several lines at once, the relevant expressions need to be modified [4]).

using System.Text.RegularExpressions;

// some other code ...

// Create a Regex instance (using string parsing as the example).
Regex doubleQuotedString = new Regex("\"(\\\\?.)*?\"");

// Then match it against the string.
Match m;
for (m = doubleQuotedString.Match(strSomeCodes); m.Success; m = m.NextMatch())
{
    foreach (Group g in m.Groups)
    {
        // do some drawing
    }
}
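One possible way to extend this loop to several element kinds at once is sketched below. It is an assumption of this sketch, not the author's method, to merge the comment, string, and keyword patterns into one expression with named groups and to use \b word boundaries around the keywords; LineColorizer and Classify are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineColorizer
{
    // One combined pattern. Scanning left to right, whichever alternative
    // matches first decides the token kind, so "//" inside a string
    // literal is treated as part of the string, not as a comment.
    static readonly Regex Token = new Regex(
        @"(?<comment>//.*)|(?<string>""(\\?.)*?"")|(?<keyword>\b(abstract|where|while|yield)\b)");

    static readonly string[] Kinds = { "comment", "string", "keyword" };

    // Returns one "kind@index+length" entry per token found in the line.
    public static List<string> Classify(string line)
    {
        var spans = new List<string>();
        for (Match m = Token.Match(line); m.Success; m = m.NextMatch())
        {
            foreach (string kind in Kinds)
            {
                if (m.Groups[kind].Success)
                {
                    spans.Add(kind + "@" + m.Index + "+" + m.Length);
                    break;
                }
            }
        }
        return spans;
    }

    public static void Main()
    {
        foreach (string span in Classify("while (s == \"//x\") // done"))
        {
            Console.WriteLine(span);
        }
    }
}
```

Each entry carries the start index and length of a token, which is the information the coloring code needs to select and paint that range of the line.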

What remains is to write the coloring code.

5. Source code

Notes:

[1] The passage "let us create ... text patterns" is quoted from Regular Expression Language Elements in the .NET Framework General Reference.

[2] The introduction to regular expressions draws on material from ZDNet China's technology and development pages.

[3] The signatures and comments of the classes and functions that appear in this section are taken from the Microsoft documentation.

[4] For more information, see Regular Expression Language Elements in the .NET Framework General Reference.

When reposting, please cite the original address: https://www.9cbs.com/read-8707.html
