Regular Expressions in C#

xiaoxiao2021-03-20  220

Many programming languages and tools have supported regular expressions for years. The .NET base class library includes a namespace and a set of classes that make the full power of regular expressions available, and they are designed to be compatible with Perl 5 regular expressions.

In addition, the regexp classes provide some extra features, such as right-to-left matching and expression compilation.

In this article, I will briefly introduce the classes and methods in System.Text.RegularExpressions, show some examples of matching and replacing strings, walk through a grouping structure in detail, and finish with a set of common expressions that you may find useful.

Prerequisites

Regular expressions may be one of those topics that many programmers learn and then forget. This article assumes that you already know how to use regular expressions, in particular the Perl 5 flavor. The .NET regexp classes are a superset of what Perl 5 offers, so in theory Perl 5 is a good starting point. It also assumes basic knowledge of C# syntax and the .NET architecture. If you have no experience with regular expressions, I suggest starting with the Perl 5 syntax. The authoritative book on the subject is Mastering Regular Expressions by Jeffrey E. F. Friedl, which I strongly recommend to readers who want to understand expressions in depth.

The RegularExpressions Assembly

The regexp classes are contained in the System.Text.RegularExpressions.dll file, and you must reference this file when compiling your application. For example:

    csc /r:System.Text.RegularExpressions.dll foo.cs

This command creates foo.exe, which references the System.Text.RegularExpressions assembly.

Namespace Overview

The namespace contains only six classes and one delegate:

Capture: the result of a single match.
CaptureCollection: a sequence of Capture objects.
Group: the result of a single capturing group; derives from Capture.
Match: the result of a single expression match; derives from Group.
MatchCollection: a sequence of Match objects.
MatchEvaluator: a delegate used during replace operations.
Regex: an instance of a compiled expression.

The Regex class also provides several static methods:

Escape: escapes the regex metacharacters in a string.
IsMatch: returns a Boolean indicating whether the expression matches anywhere in the string.
Match: returns a Match instance.
Matches: returns a collection of Match objects, as if Match had been called repeatedly.
Replace: replaces the text matched by the expression with a replacement string.
Split: returns an array of strings, with the split points determined by the expression.
Unescape: converts any escaped characters in a string back to their unescaped form.

Simple Matches

We start with simple expressions that use the Regex and Match classes.

    Match m = Regex.Match("abracadabra", "(a|b|r)+");

We now have an instance of the Match class that can be tested for success, for example:

    if (m.Success) ...

To use the matched string, convert it to a string:

    Console.WriteLine("Match=" + m.ToString());

This prints the part of the input that was matched:

    Match=abra

Replacing strings is just as straightforward. The statement

    string s = Regex.Replace("abracadabra", "abra", "zzzz");

returns the string zzzzcadzzzz, in which every occurrence of the pattern has been replaced by zzzz.
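The static methods described so far can be tried together in a small console program. This is a minimal sketch rather than code from the article: the class name SimpleMatchDemo and the helper FirstMatch are made up for illustration, and the Split and Escape inputs are arbitrary samples.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; only the Regex calls come from the .NET library.
public static class SimpleMatchDemo
{
    // Returns the first substring of input matched by pattern,
    // or an empty string when nothing matches.
    public static string FirstMatch(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        // Match: the first run of a, b or r characters.
        Console.WriteLine("Match=" + FirstMatch("abracadabra", "(a|b|r)+")); // Match=abra

        // IsMatch: only reports whether the pattern occurs somewhere.
        Console.WriteLine(Regex.IsMatch("abracadabra", "cad"));

        // Replace: every occurrence of the pattern is replaced.
        Console.WriteLine(Regex.Replace("abracadabra", "abra", "zzzz")); // zzzzcadzzzz

        // Split: the pattern decides where the input is cut.
        Console.WriteLine(string.Join("|", Regex.Split("a1b22c", @"\d+")));

        // Escape and Unescape: round-trip a string that contains a metacharacter.
        string escaped = Regex.Escape("1+1=2");
        Console.WriteLine(escaped + " -> " + Regex.Unescape(escaped));
    }
}
```

Match.Value is used here in place of ToString(); both return the matched text.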

Now let's look at a slightly more complex replacement:

    string s = Regex.Replace("  abra  ", @"^\s*(.*?)\s*$", "$1");

This statement returns the string abra, with its leading and trailing spaces removed. The pattern is useful for trimming leading and trailing whitespace from any string. It also uses a C# verbatim string literal. Inside a verbatim string the compiler does not treat the character "\" as an escape character, so @"..." is very convenient when a pattern uses "\" to escape metacharacters. Also worth noting is the $1 in the replacement string. It is a substitution that refers to the first capture group of the expression; substitutions are the only special constructs recognized in a replacement string.

Engine Details

Now let's work through a slightly more complicated example that uses a grouping structure. Consider the following:

    string text = "abracadabra1abracadabra2abracadabra3";
    string pat = @"
        (        # first group start
          abra   # match the literal abra
          (      # second group start
            cad  # match the literal cad
          )?     # second group end (optional)
        )        # first group end
        +        # match one or more times
        ";
    // the x option (RegexOptions.IgnorePatternWhitespace) ignores
    // whitespace and comments in the pattern
    Regex r = new Regex(pat, RegexOptions.IgnorePatternWhitespace);
    // get the list of group numbers
    int[] gnums = r.GetGroupNumbers();
    // get the first match
    Match m = r.Match(text);
    while (m.Success)
    {
        // start at group 1
        for (int i = 1; i
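The listing above breaks off inside the for loop. What follows is a sketch of how such a loop can walk every group and capture of each match; it is an assumption about the intent, not the article's own code. The class name GroupWalkDemo, the method DescribeCaptures and the output format are invented for illustration, and the group is read through the Match.Groups indexer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class; names and output format are illustrative.
public static class GroupWalkDemo
{
    public const string Pattern = @"
        (        # first group start
          abra   # match the literal abra
          (      # second group start
            cad  # match the literal cad
          )?     # second group end (optional)
        )        # first group end
        +        # match one or more times
        ";

    // Returns one line per group and one line per capture, for every match.
    public static List<string> DescribeCaptures(string text, string pattern)
    {
        var lines = new List<string>();
        var r = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);
        int[] gnums = r.GetGroupNumbers();

        for (Match m = r.Match(text); m.Success; m = m.NextMatch())
        {
            // Start at index 1: group 0 is the whole match.
            for (int i = 1; i < gnums.Length; i++)
            {
                Group g = m.Groups[gnums[i]];
                lines.Add("Group" + gnums[i] + "=[" + g.Value + "]");

                // A group inside a repetition records one capture per iteration.
                for (int j = 0; j < g.Captures.Count; j++)
                {
                    Capture c = g.Captures[j];
                    lines.Add("  Capture" + j + "=[" + c.Value + "] Index="
                        + c.Index + " Length=" + c.Length);
                }
            }
        }
        return lines;
    }

    public static void Main()
    {
        string text = "abracadabra1abracadabra2abracadabra3";
        foreach (string line in DescribeCaptures(text, Pattern))
            Console.WriteLine(line);
    }
}
```

For the sample text, group 1 records two captures in each match (abracad, then abra), while group 2 records a single capture (cad), because the optional inner group takes part only in the first repetition.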

A MatchEvaluator offers another way to perform this kind of replacement. The new code, which capitalizes the first letter of every word, is as follows (both methods belong to a class named Test):

    static string CapText(Match m)
    {
        // get the matched string
        string x = m.ToString();
        // if the first character is lowercase
        if (char.IsLower(x[0]))
            // convert it to uppercase
            return char.ToUpper(x[0]) + x.Substring(1, x.Length - 1);
        return x;
    }

    static void Main()
    {
        string text = "the quick red fox jumped over the lazy brown dog.";
        System.Console.WriteLine("text=[" + text + "]");
        string pattern = @"\w+";
        string result = Regex.Replace(text, pattern, new MatchEvaluator(Test.CapText));
        System.Console.WriteLine("result=[" + result + "]");
    }

Note also that the pattern is very simple, because it only needs to pick out words and nothing else.

Common Expressions

To show how regular expressions can be used in C#, I have collected some expressions that may be useful to you. They have all been used in other environments, and I hope some of them help.

Roman numerals

    string p1 = "^m*(d?c{0,3}|c[dm])"
        + "(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$";
    string t1 = "vii";
    Match m1 = Regex.Match(t1, p1);

Swapping the first two words

    string t2 = "the quick brown fox";
    string p2 = @"(\S+)(\s+)(\S+)";
    Regex x2 = new Regex(p2);
    // the final argument limits the replacement to the first match
    string r2 = x2.Replace(t2, "$3$2$1", 1);

Keyword = value

    string t3 = "myval = 3";
    string p3 = @"(\w+)\s*=\s*(.*)\s*$";
    Match m3 = Regex.Match(t3, p3);

Lines of at least 80 characters

    string t4 = "********************"
        + "******************************"
        + "******************************";
    string p4 = ".{80,}";
    Match m4 = Regex.Match(t4, p4);

Date and time in month/day/year hour:minute:second format

    string t5 = "01/01/01 16:10:01";
    string p5 = @"(\d+)/(\d+)/(\d+) (\d+):(\d+):(\d+)";
    Match m5 = Regex.Match(t5, p5);

Changing directories (Windows only)

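    string t6 = @"C:\Documents and Settings\user1\Desktop\";
    // the pattern needs escaped backslashes; the replacement string is not
    // a pattern, so its backslashes are written once
    string r6 = Regex.Replace(t6, @"\\user1\\", @"\user2\");

Expanding hexadecimal escapes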
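The recipe in this section passes a method named HexConvert to Regex.Replace without showing its body. A minimal sketch of what such a MatchEvaluator might look like follows; the implementation, the class name HexEscapeDemo and the helper Expand are assumptions made for illustration, not the article's code.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical demo class; HexConvert's body is an assumed implementation.
public static class HexEscapeDemo
{
    // Group 1 holds two hex digits; return the character with that code.
    public static string HexConvert(Match m)
    {
        int code = int.Parse(m.Groups[1].Value,
            NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return ((char)code).ToString();
    }

    // Expands every %XX escape in the input.
    public static string Expand(string input)
    {
        return Regex.Replace(input, "%([0-9a-fA-F][0-9a-fA-F])", HexConvert);
    }

    public static void Main()
    {
        Console.WriteLine(Expand("%41"));        // prints A
        Console.WriteLine(Expand("a%20b%2Fc"));  // prints a b/c
    }
}
```

The sketch maps each escape to a single UTF-16 code unit, which is enough for ASCII input such as %41; multi-byte encoded sequences would need a real URL decoder.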

    string t7 = "%41"; // capital A
    string p7 = "%([0-9a-fA-F][0-9a-fA-F])";
    // HexConvert is a MatchEvaluator method that is not shown here
    string r7 = Regex.Replace(t7, p7, HexConvert);

Removing C-style comments (needs improvement)

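    string t8 = @"
    /*
     * Traditional comment
     */
    ";
    string p8 = @"
        /\*   # match the delimiter that starts the comment
        .*?   # match the comment body
        \*/   # match the delimiter that ends the comment
        ";
    // x: ignore pattern whitespace and comments; s: let . match newlines
    string r8 = Regex.Replace(t8, p8, "",
        RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

Removing leading and trailing whitespace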

    string t9a = "     leading";
    string p9a = @"^\s+";
    string r9a = Regex.Replace(t9a, p9a, "");

    string t9b = "trailing     ";
    string p9b = @"\s+$";
    string r9b = Regex.Replace(t9b, p9b, "");

Turning a backslash followed by n into a real newline

    string t10 = @"\ntest\n";
    string r10 = Regex.Replace(t10, @"\\n", "\n");

Matching an IP address

    string t11 = "55.54.53.52";
    // the second digit is optional (\d\d?) so that single-digit octets
    // such as 5 are accepted as well
    string p11 = "^"
        + @"([01]?\d\d?|2[0-4]\d|25[0-5])\."
        + @"([01]?\d\d?|2[0-4]\d|25[0-5])\."
        + @"([01]?\d\d?|2[0-4]\d|25[0-5])\."
        + @"([01]?\d\d?|2[0-4]\d|25[0-5])"
        + "$";
    Match m11 = Regex.Match(t11, p11);

Removing the path from a file name

    string t12 = @"c:\file.txt";
    string p12 = @"^.*\\";
    string r12 = Regex.Replace(t12, p12, "");

Joining the lines of a multi-line string

    string t13 = @"this is
    a split line";
    string p13 = @"\s*\r?\n\s*";
    // each line break, with the whitespace around it, becomes one space
    string r13 = Regex.Replace(t13, p13, " ");

Extracting all numbers from a string

    string t14 = @"test 1 test 2.3 test 47";
    string p14 = @"(\d+\.?\d*|\.\d+)";
    MatchCollection mc14 = Regex.Matches(t14, p14);

Finding all-uppercase words

    string t15 = "This IS a Test OF ALL Caps";
    string p15 = @"(\b[^\Wa-z0-9_]+\b)";
    MatchCollection mc15 = Regex.Matches(t15, p15);

Finding all-lowercase words

    string t16 = "This is A Test of lowercase";
    string p16 = @"(\b[^\WA-Z0-9_]+\b)";
    MatchCollection mc16 = Regex.Matches(t16, p16);

Finding words with an initial capital letter

    string t17 = "This is A Test of Initial Caps";
    string p17 = @"(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)";
    MatchCollection mc17 = Regex.Matches(t17, p17);

Finding links in simple HTML

    string t18 = @"
    <html>
    <a href=""first.htm"">first tag text</a>
    <a href=""next.htm"">next tag text</a>
    </html>
    ";
    string p18 = @"<A[^>]*?HREF\s*=\s*[""']?"
        + @"([^'"" >]+?)[ '""]?>";
    // s: let . match newlines; i: ignore case
    MatchCollection mc18 = Regex.Matches(t18, p18,
        RegexOptions.Singleline | RegexOptions.IgnoreCase);
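The recipes in this section are fragments. To try a few of them together, they can be wrapped in small helper methods and called from a console program. This is a sketch: the class CookbookChecks and its method names are hypothetical, and only the patterns are taken from the recipes (Roman numerals, swapping words, removing a path, extracting numbers).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the names are illustrative only.
public static class CookbookChecks
{
    // Roman numerals in lowercase, as in the recipe. The pattern also
    // accepts the empty string, so that case is rejected explicitly.
    public static bool IsRoman(string s)
    {
        return s.Length > 0 && Regex.IsMatch(s,
            "^m*(d?c{0,3}|c[dm])(l?x{0,3}|x[lc])(v?i{0,3}|i[vx])$");
    }

    // Swaps the first two words; the count argument of 1 stops after one match.
    public static string SwapFirstTwoWords(string s)
    {
        return new Regex(@"(\S+)(\s+)(\S+)").Replace(s, "$3$2$1", 1);
    }

    // Removes everything up to and including the last backslash.
    public static string FileName(string path)
    {
        return Regex.Replace(path, @"^.*\\", "");
    }

    // Collects every integer or decimal number found in the input.
    public static List<string> Numbers(string s)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(s, @"\d+\.?\d*|\.\d+"))
            found.Add(m.Value);
        return found;
    }

    public static void Main()
    {
        Console.WriteLine(IsRoman("vii"));
        Console.WriteLine(SwapFirstTwoWords("the quick brown fox"));
        Console.WriteLine(FileName(@"c:\file.txt"));
        Console.WriteLine(string.Join(", ", Numbers("test 1 test 2.3 test 47")));
    }
}
```

Returning plain values instead of Match objects keeps the helpers easy to check, at the cost of hiding the capture groups that some recipes expose.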

