3.0. Introduction
It would be very rare to create an entire application without using a single string. Strings help make sense of the seemingly random jumble of binary data that applications use to accomplish a task. They appear in all facets of application development, from the smallest system utility to large enterprise services. Their value is so apparent that more and more connected systems are leaning toward string data within their communication protocols by using the Extensible Markup Language (XML) rather than the more cumbersome traditional transmission of large binary data. This book uses strings extensively to examine the internal contents of variables and the results of program flow using Framework Class Library (FCL) methods such as Console.WriteLine and MessageBox.Show.
In this chapter, you will learn how to take advantage of the rich support for strings within the .NET Framework and the C# language. Coverage includes ways to manipulate string contents, programmatically inspect strings and their character attributes, and optimize performance when working with string objects. Furthermore, this chapter uncovers the power of regular expressions and how they allow you to effectively parse and manipulate string data. After reading this chapter, you will be able to use regular expressions in a variety of situations where their value is apparent.
3.1. Creating and Using String Objects
You want to create and manipulate string data.
TECHNIQUE
The C# language, recognizing the importance of string data, contains a string keyword that simulates the behavior of a value data type. To create a string, declare a variable using the string keyword. You can use the assignment operator to initialize the variable using a static string or an already initialized string variable:
    string string1 = "This is a string";
    string string2 = string1;
To gain more control over string initialization, declare a variable using the System.String data type and create a new instance using the new keyword. The System.String class contains several constructors that you can use to initialize the string value. For instance, to create a new string that is a small subset of an existing string, use the overloaded constructor, which takes a character array and two integers denoting the beginning index and the number of characters from that index to copy:
    using System;

    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            string string1 = "Field1, Field2";
            // Copy 6 characters starting at index 8 ("Field2")
            System.String string2 = new System.String(string1.ToCharArray(), 8, 6);
            Console.WriteLine(string2);
        }
    }
Finally, if you know a string will be intensively manipulated, use the System.Text.StringBuilder class. Creating a variable of this data type is similar to using the System.String class, and it contains several constructors to initialize the internal string value. The key internal difference between a regular string object and a StringBuilder lies in performance. Whenever a string is manipulated in some manner, a new object has to be created, which subsequently causes the old object to be marked for deletion by the garbage collector. For a string that undergoes several transformations, the performance hit associated with frequent object creation and deletion can be great. The StringBuilder class, on the other hand, maintains an internal buffer, which expands to make room for more string data should the need arise, thereby decreasing frequent object allocations.
Comments
There is no recommendation on whether you use the string keyword or the System.String class. The string keyword is simply an alias for this class, so it is all a matter of taste. We prefer using the string keyword, but this preference is purely a matter of style.
The string class contains many methods, both instance and static, for manipulating strings. If you want to compare strings, you can use the Compare method. If you are just testing for equality, then you might want to use the overloaded equality operator (==). However, the Compare method returns an integer instead of a Boolean value, denoting how the two strings differ. If the return value is 0, then the strings are equal. If the return value is greater than 0, as shown in Listing 3.1, then the first operand is greater alphabetically than the second operand. If the return value is less than 0, the opposite is true. When a string is said to be alphabetically greater or lower than another, the characters of both strings are compared from left to right. Note that Compare sorts according to the current culture by default; to compare by each character's numeric (ASCII or Unicode) value, use the CompareOrdinal method.
Listing 3.1 Using the Compare Method in the String Class
    using System;

    namespace _1_UsingStrings
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                string string1 = "";
                String string2 = "";

                Console.Write("Enter string 1: ");
                string1 = Console.ReadLine();
                Console.Write("Enter string 2: ");
                string2 = Console.ReadLine();

                // string and String are the same types
                Console.WriteLine("string1 is a {0}\nstring2 is a {1}",
                    string1.GetType().FullName, string2.GetType().FullName);

                CompareStrings(string1, string2);
            }

            public static void CompareStrings(string str1, string str2)
            {
                int compare = String.Compare(str1, str2);
                if (compare == 0)
                {
                    Console.WriteLine("The strings {0} and {1} are the same.\n", str1, str2);
                }
                else if (compare < 0)
                {
                    Console.WriteLine("The string {0} is less than {1}", str1, str2);
                }
                else if (compare > 0)
                {
                    Console.WriteLine("The string {0} is greater than {1}", str1, str2);
                }
            }
        }
    }
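The meaning of the integer returned by a comparison can be sketched in a few lines. This is a minimal illustration, not part of the original listing: the class name CompareSketch and the helper Describe are hypothetical, and the ordinal comparison is used for the commented results because its outcome does not depend on the machine's culture settings.

```csharp
using System;

// Hypothetical demo class; the names here are not from the original text.
static class CompareSketch
{
    // Maps the integer returned by a comparison to a readable label.
    public static string Describe(int result)
    {
        if (result == 0) return "equal";
        return result < 0 ? "less" : "greater";
    }

    static void Main()
    {
        // CompareOrdinal compares the numeric value of each character,
        // so these results are the same on every machine.
        Console.WriteLine(Describe(String.CompareOrdinal("apple", "apple")));  // equal
        Console.WriteLine(Describe(String.CompareOrdinal("apple", "banana"))); // less
        Console.WriteLine(Describe(String.CompareOrdinal("pear", "peach")));   // greater

        // Compare (static) and CompareTo (instance) are culture-sensitive,
        // so no fixed result is claimed for them here.
        Console.WriteLine(Describe(String.Compare("apple", "Apple")));
        Console.WriteLine(Describe("apple".CompareTo("Apple")));
    }
}
```

Only the sign of the result is meaningful; the magnitude should not be relied on.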
As mentioned earlier, the string class contains both instance and static methods. Sometimes you have no choice about whether to use an instance or static method. However, a few of the instance methods have a static version as well. Because calling a static method is a nonvirtual function call, you see performance gains if you use this version. An example where you might see both instance and static versions appears in Listing 3.1. The string comparison uses the static Compare method. You can also do so using the nonstatic CompareTo method on one of the string instances passed in as parameters. In most cases, the performance gain is negligible, but if an application needs to repeatedly call these methods, you might want to consider using the static method over the nonstatic one.
The string class is immutable. Once a string is created, it cannot be manipulated. Methods within the string class that appear to modify the original string instance actually copy the string and create a new string object rather than manipulate the original instance. It can be expensive to repeatedly call string methods if new objects are created and destroyed continuously. To solve this, the .NET Framework contains a StringBuilder class within the System.Text namespace, which is explained later in this chapter.
3.2. Formatting Strings
Given one or more objects, you want to create a single formatted string representation.
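A short sketch can make the immutability point concrete. The class name ImmutabilitySketch and the helper JoinNumbers are hypothetical names chosen for this illustration; the library calls (ToUpper, StringBuilder.Append, ToString) are standard.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names here are not from the original text.
static class ImmutabilitySketch
{
    // Builds "0,1,2,...,count-1" in one growing buffer instead of
    // creating a new string object on every concatenation.
    public static string JoinNumbers(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            if (i > 0) sb.Append(',');
            sb.Append(i);
        }
        return sb.ToString();
    }

    static void Main()
    {
        string original = "hello";
        string upper = original.ToUpper();   // returns a new string object

        Console.WriteLine(original);         // hello (the original is unchanged)
        Console.WriteLine(upper);            // HELLO
        Console.WriteLine(JoinNumbers(5));   // 0,1,2,3,4
    }
}
```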
TECHNIQUE
You can format strings using numeric and picture formatting within String.Format or within any method that accepts a format string, such as Console.WriteLine.
Comments
The String class, as well as a few other methods within the .NET Framework, allows you to format strings to present them in a more ordered and readable format. Up to this point in the book, we used basic formatting when calling the Console.WriteLine method. The first parameter to Console.WriteLine is the format specifier string. This string controls how the remaining parameters to the method should appear when displayed. You use placeholders within the format string to insert the value of a variable. A placeholder uses the syntax {n}, where n is the zero-based index in the parameter list following the format specifier. Take the following line of code, for instance:
    Console.WriteLine("x={0}, y={1}, {0}+{1}={2}", x, y, x + y);
This line of code has three parameters following the format specifier string. You use placeholders within the format specification, and when this method is called, the appropriate substitutions are made. Although you can do the same thing using string concatenation, the resultant line of code is slightly obfuscated:
    string s = "x=" + x + ", y=" + y + ", " + x + "+" + y + "=" + (x + y);
    Console.WriteLine(s);
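The same placeholder syntax works with String.Format, which returns the formatted string instead of printing it. The following is a minimal sketch; the class name FormatSketch and the method Describe are hypothetical.

```csharp
using System;

// Hypothetical demo class; the names here are not from the original text.
static class FormatSketch
{
    // {0}, {1}, and {2} are replaced by the arguments in order;
    // an index can be reused, as {0} and {1} are here.
    public static string Describe(int x, int y)
    {
        return String.Format("x={0}, y={1}, {0}+{1}={2}", x, y, x + y);
    }

    static void Main()
    {
        Console.WriteLine(Describe(2, 3)); // x=2, y=3, 2+3=5
    }
}
```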
You can further refine the format by applying format attributes on the placeholders themselves. These additional attributes follow the parameter index value and are separated from that index with a : character. There are two types of special formatting available. The first is numeric formatting, which lets you format a numeric parameter into one of nine different numeric formats, as shown in Table 3.1. The format of these specifiers, using the currency format as an example, is Cxx, where xx is a number from 1 to 99 specifying the number of digits to display. Listing 3.2 shows how to display an array of integers in hexadecimal format, including how to specify the number of digits to display. Notice also how you can change the case of the hexadecimal digits A through F by using an uppercase or lowercase format specifier.

Table 3.1 Numeric Formatting Specifiers
Character | Format | Description
C or c | Currency | Culturally aware currency format.
D or d | Decimal | Only supports integral numbers. Displays a string using decimal digits preceded by a minus sign if negative.
E or e | Exponential/scientific notation | Displays numbers in the form ±d.ddddddE±dd, where d is a decimal digit.
F or f | Fixed point | Displays a series of decimal digits with a decimal point and additional digits.
G or g | General format | Displays either as fixed-point or scientific notation based on the size of the number.
N or n | Number format | Similar to fixed point but uses a separator character (such as ,) for groups of digits.
P or p | Percentage | Multiplies the number by 100 and displays it with a percent symbol.
R or r | Roundtrip | Formats a floating-point number so that it can be successfully converted back to its original value.
X or x | Hexadecimal | Displays an integral number using the base-16 number system.

Listing 3.2 Specifying a Different Numeric Format by Adding Format Specifiers on a Parameter Placeholder
    using System;

    namespace _2_Formatting
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                double[] numArray = {2, 5, 4.5, 45.43, 200000};

                // format in lowercase hex
                Console.WriteLine("\n\nHex (lower)\n-----------");
                foreach (double num in numArray)
                {
                    Console.Write("0x{0:x}\t", (int) num);
                }

                // format in uppercase hex
                Console.WriteLine("\n\nHex (upper)\n-----------");
                foreach (double num in numArray)
                {
                    Console.Write("0x{0:X}\t", (int) num);
                }
            }
        }
    }
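A few of the specifiers from Table 3.1, including a digit count, can be sketched as follows. The class name NumericFormatSketch and the helper Fmt are hypothetical. The sketch passes CultureInfo.InvariantCulture (an addition not used in the listing above) so that the separators in the commented output do not vary with the machine's regional settings.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; the names here are not from the original text.
static class NumericFormatSketch
{
    // Formats with the invariant culture so output is identical everywhere.
    public static string Fmt(string format, object value)
    {
        return String.Format(CultureInfo.InvariantCulture, format, value);
    }

    static void Main()
    {
        Console.WriteLine(Fmt("{0:x}", 255));        // ff
        Console.WriteLine(Fmt("{0:X4}", 255));       // 00FF
        Console.WriteLine(Fmt("{0:D5}", 42));        // 00042
        Console.WriteLine(Fmt("{0:F2}", 3.14159));   // 3.14
        Console.WriteLine(Fmt("{0:N0}", 1234567));   // 1,234,567
    }
}
```

The digit count pads integral formats such as D and X with leading zeros, while for F and N it sets the number of decimal places.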
Another type of formatting is picture formatting. Picture formatting allows you to create a custom format specifier using various symbols within the format specifier string. Table 3.2 lists the available picture format characters. Listing 3.3 also shows how to create a custom format specifier. In that code, the digits of the input number are extracted and displayed using a combination of digit placeholders and a decimal-point specifier. Furthermore, you can see that you are free to add characters not listed in the table. This freedom allows you to add literal characters intermixed with the digits.

Table 3.2 Picture Formatting Specifiers
Character | Name | Description
0 | Zero placeholder | Copies a digit to the result string if a digit is at the position of the 0. If no digit is present, a 0 is displayed.
# | Display digit placeholder | Copies a digit to the result string if a digit appears at the position of the #. If no digit is present, nothing is displayed.
. | Decimal point | Represents the location of the decimal point in the resultant string.
, | Group separator and number scaling | Inserts thousands separators if placed between two placeholders, or scales a number down by 1,000 per , character when placed directly to the left of a decimal point.
% | Percent | Multiplies a number by 100 and inserts a % symbol.
E±0, e±0 | Exponential notation | Displays the number in exponential notation using the number of 0s as a placeholder for the exponent value.
\ | Escape character | Used to specify a special escape-character formatting instruction. Some of these include \n for newline, \t for tab, and \\ for the \ character.
; | Section separator | Separates positive, negative, and zero numbers in the format string so that you can apply different formatting rules based on the sign of the original number.

Listing 3.3 shows how custom formatting can separate a number by its decimal point. Using a foreach loop, each value is printed using three different formats. The first format outputs the value's integer portion using the following format string:
    {0:$#,#}
Next, the decimal portion is written. If the value does not explicitly define a decimal portion, zeroes are written instead. The format string to output the decimal value is
    {0:$.#0;}
Finally, the entire value is displayed up to two decimal places using the following format string:
    {0:$#,#.00}
Listing 3.3 Using Picture Format Specifiers to Create Special Formats
    using System;

    namespace _2_Formatting
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                double[] numArray = {2, 5, 4.5, 45.43, 200000};

                // format as custom
                Console.WriteLine("\n\nCustom\n------");
                foreach (double num in numArray)
                {
                    Console.WriteLine("{0:$#,#} + {0:$.#0;} = {0:$#,#.00}", num);
                }
            }
        }
    }
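The placeholder characters from Table 3.2 can also be tried one at a time. This is a minimal sketch with hypothetical names (PictureFormatSketch, Fmt); it uses double.ToString with CultureInfo.InvariantCulture, an assumption made here so the separators in the commented output are fixed.

```csharp
using System;
using System.Globalization;

// Hypothetical demo class; the names here are not from the original text.
static class PictureFormatSketch
{
    public static string Fmt(double value, string picture)
    {
        return value.ToString(picture, CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(Fmt(1234.5, "#,#.00"));   // 1,234.50
        Console.WriteLine(Fmt(0.5, "0.00"));        // 0.50  (0 forces a digit)
        Console.WriteLine(Fmt(0.5, "#.00"));        // .50   (# shows nothing)

        // Three sections separated by ; apply to positive, negative, and zero.
        Console.WriteLine(Fmt(7, "+0;-0;zero"));    // +7
        Console.WriteLine(Fmt(-7, "+0;-0;zero"));   // -7
        Console.WriteLine(Fmt(0, "+0;-0;zero"));    // zero
    }
}
```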
3.3. Accessing Individual String Characters
You want to process individual characters within a string.
TECHNIQUE
Use the index operator ([]) by specifying the zero-based index of the character within the string that you want to extract. Furthermore, you can also use the foreach enumerator on the string using a char structure as the enumeration data type.
Comments
The string class is really a collection of objects. These objects are individual characters. You can access each character using the same methods you would use to access an object in most other collections (which are covered in the next chapter).
You use an indexer to specify which object in a collection you want to retrieve. In C#, the first object begins at index 0 of the string. The objects are individual characters whose data type is System.Char, which is aliased with the char keyword. The indexer for the string class, however, can only access a character and cannot set the value of a character at that position. Because a string is immutable, you cannot change the internal array of characters unless you create and return a new string. If you need the ability to index a string to set individual characters, use a StringBuilder object.
Listing 3.4 shows how to access the characters in a string. One thing to point out is that because the string also implements the IEnumerable interface, you can use the foreach control structure to enumerate through the string.
Listing 3.4 Accessing Characters Using Indexers and Enumeration
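The three access patterns described here (reading through the indexer, enumerating with foreach, and setting a character through a StringBuilder) can be sketched briefly. The class name CharAccessSketch and the helpers Count and ReplaceAt are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names here are not from the original text.
static class CharAccessSketch
{
    // Counts occurrences of a character by enumerating the string.
    public static int Count(string text, char target)
    {
        int count = 0;
        foreach (char ch in text)
        {
            if (ch == target) count++;
        }
        return count;
    }

    // The string indexer is read-only, so copy into a StringBuilder,
    // whose indexer can also set characters, and return a new string.
    public static string ReplaceAt(string text, int index, char replacement)
    {
        StringBuilder sb = new StringBuilder(text);
        sb[index] = replacement;
        return sb.ToString();
    }

    static void Main()
    {
        string word = "banana";
        Console.WriteLine(word[0]);                  // b
        Console.WriteLine(Count(word, 'a'));         // 3
        Console.WriteLine(ReplaceAt(word, 0, 'B'));  // Banana
        // word[0] = 'B';  // would not compile: the string indexer has no setter
    }
}
```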
    using System;
    using System.Text;

    namespace _3_Characters
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                string str = "abcdefghijklmnopqrstuvwxyz";
                str = ReverseString(str);
                Console.WriteLine(str);
                str = ReverseStringEnum(str);
                Console.WriteLine(str);
            }

            // NOTE: the source text breaks off inside this method's for loop.
            // The loop body and all of ReverseStringEnum are an assumed
            // reconstruction based on the surrounding description.
            static string ReverseString(string strIn)
            {
                StringBuilder sb = new StringBuilder(strIn.Length);
                for (int i = 0; i < strIn.Length; ++i)
                {
                    sb.Append(strIn[(strIn.Length - 1) - i]);
                }
                return sb.ToString();
            }

            static string ReverseStringEnum(string strIn)
            {
                StringBuilder sb = new StringBuilder(strIn.Length);
                foreach (char ch in strIn)
                {
                    sb.Insert(0, ch);
                }
                return sb.ToString();
            }
        }
    }

3.4. Analyzing Character Attributes
You want to evaluate the individual characters in a string to determine a character's attributes.
TECHNIQUE
The System.Char structure contains several static functions that let you test individual characters. You can test whether a character is a digit, letter, or punctuation symbol or whether the character is lowercase or uppercase.
Comments
One of the hardest issues to handle when writing software is making sure users input valid data. You can use many different methods, such as restricting input to only digits, but ultimately, you always need an underlying validating test of the input data.
You can use the System.Char structure to perform a variety of text-validation procedures. Listing 3.5 demonstrates validating user input as well as inspecting the characteristics of a character. It begins by displaying a menu and then waiting for user input using the Console.ReadLine method. Once a user enters a command, you make a check using the method ValidateMainMenuInput. This method checks to make sure the first character in the input string is not a digit or punctuation symbol. If the validation passes, the string is passed to a method that inspects each character in the input string. This method simply enumerates through all the characters in the input string and prints descriptive messages based on the characteristics. Some of the System.Char inspection methods have been left out of Listing 3.5. Table 3.3 shows the remaining methods and their functionality.
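Before the full menu-driven listing, the core idea of classifying a character with the static System.Char methods can be sketched in isolation. The class name CharInspectSketch and the method Classify are hypothetical; the Char methods themselves are the ones named in this section.

```csharp
using System;

// Hypothetical demo class; the names here are not from the original text.
static class CharInspectSketch
{
    // Returns a short label for the first category the character matches.
    public static string Classify(char ch)
    {
        if (Char.IsDigit(ch)) return "digit";
        if (Char.IsUpper(ch)) return "uppercase letter";
        if (Char.IsLower(ch)) return "lowercase letter";
        if (Char.IsLetter(ch)) return "letter";
        if (Char.IsPunctuation(ch)) return "punctuation";
        if (Char.IsWhiteSpace(ch)) return "whitespace";
        return "other";
    }

    static void Main()
    {
        // Prints one line per character: uppercase letter, digit,
        // whitespace, lowercase letter, punctuation.
        foreach (char ch in "A7 b!")
        {
            Console.WriteLine("'{0}' - {1}", ch, Classify(ch));
        }
    }
}
```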
The results of running the application in Listing 3.5 appear in Figure 3.1.
Listing 3.5 Using the Static Methods in System.Char to Inspect the Details of a Single Character
    using System;

    namespace _4_CharAttributes
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                char cmd = 'x';
                string input;
                do
                {
                    DisplayMainMenu();
                    input = Console.ReadLine();
                    if ((input == "") || ValidateMainMenuInput(Char.ToUpper(input[0])) == 0)
                    {
                        Console.WriteLine("Invalid command!");
                    }
                    else
                    {
                        cmd = Char.ToUpper(input[0]);
                        switch (cmd)
                        {
                            case 'Q':
                            {
                                break;
                            }
                            case 'N':
                            {
                                Console.Write("Enter a phrase to inspect: ");
                                input = Console.ReadLine();
                                InspectPhrase(input);
                                break;
                            }
                        }
                    }
                } while (cmd != 'Q');   // cmd is uppercased, so test against 'Q'
            }

            private static void InspectPhrase(string input)
            {
                foreach (char ch in input)
                {
                    Console.Write(ch + " - ");
                    if (Char.IsDigit(ch))
                        Console.Write("IsDigit ");
                    if (Char.IsLetter(ch))
                    {
                        Console.Write("IsLetter ");
                        Console.Write("(lowercase={0}, uppercase={1}) ",
                            Char.ToLower(ch), Char.ToUpper(ch));
                    }
                    if (Char.IsPunctuation(ch))
                        Console.Write("IsPunctuation ");
                    if (Char.IsWhiteSpace(ch))
                        Console.Write("IsWhitespace ");
                    Console.Write("\n");
                }
            }

            private static int ValidateMainMenuInput(char input)
            {
                // a simple check to see if input == 'N' or 'Q' is good enough
                // the following is for illustrative purposes
                if (Char.IsDigit(input) == true)
                    return 0;
                else if (Char.IsPunctuation(input))
                    return 0;
                else if (Char.IsSymbol(input))
                    return 0;
                else if (input != 'N' && input != 'Q')
                    return 0;
                return (int) input;
            }

            private static void DisplayMainMenu()
            {
                Console.WriteLine("\nPhrase Inspector\n----------------");
                Console.WriteLine("N)ew Phrase");
                Console.WriteLine("Q)uit\n");
                Console.Write(">> ");
            }
        }
    }

Table 3.3 System.Char Inspection Methods
Name | Description
IsControl | Denotes a control character such as a tab or carriage return.
IsDigit | Indicates a single decimal digit.
IsLetter | Used for alphabetic characters.
IsLetterOrDigit | Returns true if the character is a letter or a digit.
IsLower | Used to determine whether a character is lowercase.
IsNumber | Tests whether a character is a valid number.
IsPunctuation | Denotes whether a character is a punctuation symbol.
IsSeparator | Denotes a character used to separate strings. An example is the space character.
IsSurrogate | Checks for a Unicode surrogate pair, which consists of two 16-bit values primarily used in localization contexts.
IsSymbol | Used for symbolic characters such as $ or #.
IsUpper | Used to determine whether a character is uppercase.
IsWhiteSpace | Indicates a character classified as whitespace such as a space character, tab, or carriage return.

Figure 3.1 Use the static methods in the System.Char class to inspect character attributes.

The System.Char structure is designed to work with a single Unicode character. Because a Unicode character is 2 bytes, the range of a character is from 0 to 0xFFFF. For portability reasons in future systems, you can always check the size of a char by using the MaxValue constant declared in the System.Char structure.
One thing to keep in mind when working with characters is to avoid the confusion of mixing char types with integer types. Characters have an ordinal value, which is an integer value used as a lookup into a table of symbols. One example of a table is the ASCII table, which defines 128 characters and includes the digits 0 through 9, letters, punctuation symbols, and formatting characters. The confusion lies in the fact that the number 6, for instance, has an ordinal char value of 0x36. Therefore, the line of code meant to initialize a character to the number 6
    char ch = (char) 6;
is wrong because the actual character in this instance is ^F, the ACK control character used in modem handshaking protocols. Displaying this value in the console would not provide the 6 that you were looking for. You could have chosen two different methods to initialize the variable. The first way is
    char ch = (char) 0x36;
which produces the desired result and prints the number 6 to the console if passed to the Console.Write method.
However, unless you have the ASCII table memorized, this procedure can be cumbersome. To initialize a char variable, simply place the value between single quotes:
    char ch = '6';

3.5. Case-Insensitive String Comparison
You want to perform a case-insensitive string comparison on two strings.
TECHNIQUE
Use the overloaded Compare method in the System.String class, which accepts a Boolean value, ignoreCase, as the last parameter. This parameter specifies whether the comparison should be case insensitive (true) or case sensitive (false). To compare single characters, convert them to uppercase or lowercase, using ToUpper or ToLower, and then perform the comparison.
Comments
Validating user input requires a lot of forethought into the possible values a user can enter. Making sure you cover the range of possible values can be a daunting task, and you might ultimately run into human-computer interaction issues by severely limiting what a user can enter. Case-sensitivity issues increase the possible range of values, leading to greater security with respect to such things as passwords, but this security is usually at the expense of a user's frustration when she forgets whether a character is capitalized. As with many other programming problems, you must weigh the pros and cons.
To perform a case-insensitive comparison, you can use one of the many overloaded Compare methods within the System.String class. The methods that allow you to ignore case issues use a Boolean value as the last parameter in the method. This parameter is named ignoreCase, and when you set it to true, you make a case-insensitive comparison, as demonstrated in Listing 3.6.
Listing 3.6 Performing a Case-Insensitive String Comparison
    using System;

    namespace _5_CaseComparison
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                string str1 = "This Is A String.";
                string str2 = "this is a string.";

                Console.WriteLine("Case sensitive comparison of " +
                    "str1 and str2 = {0}", String.Compare(str1, str2));
                Console.WriteLine("Case insensitive comparison of " +
                    "str1 and str2 = {0}", String.Compare(str1, str2, true));
            }
        }
    }

3.6. Working with Substrings
You need to change or extract a specific portion of a string.
TECHNIQUE
To copy a portion of a string into a new string, use the Substring method. You call this method on the string object instance of the source string:
    string source = "ABCD1234WXYZ";
    string dest = source.Substring(4, 4);
    Console.WriteLine("{0}\n", dest);
To copy a substring into an already existing character array, use the CopyTo method. To assign a character array to an existing string object, create a new instance of the string using the new keyword, passing the character array as a parameter to the string constructor, as shown in the following code, whose output appears in Figure 3.2:
    string source = "ABCD";
    char[] dest = {'1', '2', '3', '4', '5', '6', '7', '8'};
    Console.Write("Char array before = ");
    Console.WriteLine(dest);

    // copy substring into char array
    source.CopyTo(0, dest, 4, source.Length);
    Console.Write("Char array after = ");
    Console.WriteLine(dest);

    // copy back into source string
    source = new String(dest);
    Console.WriteLine("New source = {0}\n", source);
Figure 3.2 Use the CopyTo method to copy a substring into an existing character array.
If you need to remove a substring within a string and replace it with a different substring, use the Replace method. This method accepts two parameters, the substring to replace and the string to replace it with:
    string replaceStr = "1234";
    string dest = "ABCDEFGHWXYZ";
    dest = dest.Replace("EFGH", replaceStr);
    Console.WriteLine(dest);
To extract an array of substrings that are separated from each other by one or more delimiters, use the Split method.
This method uses a character array of delimiter characters and returns a string array of each substring within the original string, as shown in the following code, whose output appears in Figure 3.3. You can optionally supply an integer specifying the maximum number of substrings to split:
    char delim = '\\';
    string filePath = "C:\\Windows\\Temp";
    string[] directories = null;
    directories = filePath.Split(delim);
    foreach (string directory in directories)
    {
        Console.WriteLine("{0}", directory);
    }
Figure 3.3 You can use the Split method in the System.String class to place delimited substrings into a string array.
Comments
Parsing strings is not for the faint of heart. However, the job becomes easier if you have a rich set of methods that allow you to perform all types of operations on strings. Substrings are the goal of a majority of these operations, and the string class within the .NET Framework contains many methods that are designed to extract or change a portion of a string.
The Substring method extracts a portion of a string and places it into a new string object. You have two options with this method. If you pass a single integer, the Substring method extracts the substring that starts at that index and continues until it reaches the end of the string. One thing to keep in mind is that C# array indices are 0 based. The first character within the string has an index of 0. The second Substring method accepts an additional parameter that denotes the number of characters to extract. It lets you extract parts of a string from the middle of the string.
You can create a new character array from a string by using the ToCharArray method of the string class. Furthermore, you can extract a substring from the string and place it into a character array by using the CopyTo method. The difference between these two methods is that the character array used with the CopyTo method must be an already instantiated array.
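The two Substring overloads and the two ways of getting characters into an array can be compared side by side. This is a minimal sketch with hypothetical names (SubstringSketch, CopyPrefix); the string methods are the ones named in this section.

```csharp
using System;

// Hypothetical demo class; the names here are not from the original text.
static class SubstringSketch
{
    // Copies the first 'count' characters of a string into an
    // array that must already exist before CopyTo is called.
    public static string CopyPrefix(string source, int count)
    {
        char[] buffer = new char[count];
        source.CopyTo(0, buffer, 0, count);
        return new string(buffer);
    }

    static void Main()
    {
        string source = "ABCD1234WXYZ";

        Console.WriteLine(source.Substring(8));      // WXYZ (index 8 to the end)
        Console.WriteLine(source.Substring(4, 4));   // 1234 (4 characters from index 4)

        char[] all = source.ToCharArray();           // returns a brand-new array
        Console.WriteLine(all.Length);               // 12

        Console.WriteLine(CopyPrefix(source, 4));    // ABCD
    }
}
```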
Whereas the ToCharArray method returns a new character array, the CopyTo method expects an existing character array as a parameter. Furthermore, although methods exist to extract character arrays from a string, there is no instance method available to assign a character array to a string. To do this, you must create a new string object using the new keyword, as opposed to creating the familiar value-type string, and pass the character array as a parameter to the string constructor.
Using the Replace method is a powerful way to alter the contents of a string. This method allows you to search for all instances of a specified substring within a string and replace them with a different substring. Additionally, the substring you want to replace does not have to be the same length as the string you are replacing it with. If you recall the number of times you have performed ...
One other powerful method is Split. By passing a character array consisting of delimiter characters, you can split a string into a group of substrings and place them into a string array. By passing an additional integer parameter, you can also control how many substrings to extract from the source string. Referring to the code example earlier demonstrating the Split method, you can split a string representing a directory path into individual directory names by passing the \ character as the delimiter. You are not, however, confined to using a single delimiter. If you pass a character array consisting of several delimiters, the Split method extracts substrings based on any of the delimiters that it encounters.

3.7. Using Verbatim String Syntax
You want to represent a path to a file using a string without using escape characters.
TECHNIQUE
When assigning a literal string to a string object, preface the string with the @ symbol.
It turns off all escape-character processing, so there is no need to escape path separators:
    string nonVerbatim = "C:\\Windows\\Temp";
    string verbatim = @"C:\Windows\Temp";
Comments
A compiler error that happens frequently comes from forgetting to escape path separators. Although a common programming faux pas is to include hard-coded path strings, you can overlook that rule when testing an application. Visual C# .NET added verbatim string syntax as a feature to alleviate the frustration of having to escape every path separator, which can be especially cumbersome for long paths.

3.8. Choosing Between Constant and Mutable Strings
You want to choose the correct string data type to best fit your current application design.
TECHNIQUE
If you know a string's value will not change often, use a string object, which is a constant value. If you need a mutable string, one that can change its value without having to allocate a new object, use a StringBuilder.
Comments
Using a regular string object is best when you know the string will not change or will only change slightly. This change includes the whole gamut of string operations that change the value of the object itself, such as concatenation, insertion, replacement, or removal of characters. The Common Language Runtime (CLR) can use certain properties of strings to optimize performance. If the CLR can determine that two string objects are the same, it can share the memory that these string objects occupy. These strings are then known as interned strings. The CLR contains an intern pool, which is a lookup table of string instances. Strings are automatically interned if they are assigned to a literal string within code. However, you can also manually place a string within the intern pool by using the Intern method. To test whether a string is interned, use the IsInterned method, as shown in Listing 3.7.
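The behavior of the intern pool can be sketched with two strings that are built at run time. This is a hedged illustration, not the book's listing: the class name InternSketch and the helper Build are hypothetical, while String.Intern, String.IsInterned, and Object.ReferenceEquals are standard .NET calls.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names here are not from the original text.
static class InternSketch
{
    // Builds the string at run time so each call yields a distinct object.
    public static string Build()
    {
        return new StringBuilder("interned").Append("-demo").ToString();
    }

    static void Main()
    {
        string first = Build();
        string second = Build();

        Console.WriteLine(first == second);                       // True (same contents)
        Console.WriteLine(Object.ReferenceEquals(first, second)); // False (two objects)

        // Intern returns the single pooled instance for a given value.
        string pooledA = String.Intern(first);
        string pooledB = String.Intern(second);
        Console.WriteLine(Object.ReferenceEquals(pooledA, pooledB)); // True

        // IsInterned returns the pooled reference, or null if there is none.
        Console.WriteLine(String.IsInterned(second) != null);        // True
    }
}
```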
Listing 3.7 Interning a String by Using the Intern Method
    using System;

    namespace _7_StringBuilder
    {
        /// ...

Listing 3.8 Manipulating an Internal String Buffer Instead of Returning New String Objects
    using System;
    using System.Text;

    namespace _7_StringBuilder
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                string string1 = "";
                String string2 = "";

                Console.Write("Enter string 1: ");
                string1 = Console.ReadLine();
                Console.Write("Enter string 2: ");
                string2 = Console.ReadLine();

                BuildStrings(string1, string2);
            }

            public static void BuildStrings(string str1, string str2)
            {
                StringBuilder sb = new StringBuilder(str1 + str2);
                sb.Insert(str1.Length, " is the first string.\n");
                sb.Insert(sb.Length, " is the second string.\n");
                Console.WriteLine(sb);
            }
        }
    }

3.9. Optimizing StringBuilder Performance
TECHNIQUE
Use the EnsureCapacity method in the StringBuilder class. Set this integral value to a value that signifies the length of the longest string you may store in this buffer.
Comments
The StringBuilder class contains methods that allow you to expand the memory of the internal buffer based on the size of the string you may store. As your string continually grows, the StringBuilder will not have to repeatedly allocate new memory for the internal buffer. In other words, if you attempt to place a longer string than the internal buffer of the StringBuilder class can accept, then the class will have to allocate additional memory to accept the new data. If you continuously add strings that increase in size from the last input string, the StringBuilder class will have to allocate a new buffer size, which it does internally by calling the GetStringForStringBuilder method defined in the System.String class. This method ultimately calls the unmanaged method FastAllocateString.
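The capacity hint discussed in this section can be sketched as follows. The class name CapacitySketch and the helper Repeat are hypothetical; EnsureCapacity and the Capacity property are members of StringBuilder.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the names here are not from the original text.
static class CapacitySketch
{
    // Reserves room for the final string once, before any appends,
    // so the buffer does not need to grow inside the loop.
    public static string Repeat(string piece, int times)
    {
        StringBuilder sb = new StringBuilder();
        sb.EnsureCapacity(piece.Length * times);
        for (int i = 0; i < times; i++)
        {
            sb.Append(piece);
        }
        return sb.ToString();
    }

    static void Main()
    {
        StringBuilder sb = new StringBuilder();
        sb.EnsureCapacity(1024);
        Console.WriteLine(sb.Capacity >= 1024);   // True
        Console.WriteLine(Repeat("ab", 3));       // ababab
    }
}
```

EnsureCapacity only guarantees a minimum; the actual capacity chosen by the class may be larger.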
By giving the StringBuilder class a hint using the EnsureCapacity method, you can help alleviate some of this continual memory reallocation, thereby optimizing StringBuilder performance by reducing the number of memory allocations needed to store a string value.

3.10. Understanding Basic Regular Expression Syntax
You want to create a regular expression.
TECHNIQUE
Regular expressions consist of a series of characters and quantifiers on those characters. The characters themselves can be literal or can be denoted by using character classes, such as \d, which denotes a digit character class, or \S, which denotes any nonwhitespace character.

Table 3.4 Regular Expression Single Character Classes
Class | Description
\d | Any digit
\D | Any nondigit
\w | Any word character
\W | Any nonword character
\s | Any whitespace character
\S | Any nonwhitespace character

In addition to the single character classes, you can also specify a range or set of characters using ranged and set character classes. This ability allows you to narrow the search for a specified character by limiting characters within a specified range or within a defined set.

Table 3.5 Ranged and Set Character Classes
Format | Description
. | Any character except newline.
\p{uc} | Any character within the Unicode character category uc.
[abcxyz] | Any literal character in the set.
\P{uc} | Any character not within the Unicode character category uc.
[^abcxyz] | Any character not in the set of literal characters.

Quantifiers work on character classes to expand the number of characters the character classes should match. You can specify, for instance, a wildcard character on a character class, which means 0 or more characters within that class. Additionally, you can also specify a set number of matches of a class by using braces following the character class designation.

Table 3.6 Character Class Quantifiers
Format | Description
* | 0 or more characters
+ | 1 or more characters
? | 0 or 1 character
{n} | Exactly n characters
{n,} | At least n characters
{n,m} | At least n but no more than m characters

You can also specify where a certain regular expression should start within a string. Positional assertions allow you to, for instance, match a certain expression as long as it occurs at the beginning or ending of a string. Furthermore, you can create a regular expression ...

Table 3.7 Positional (Atomic Zero-Width) Assertions
Format | Description
^ | Beginning of a string or beginning of a new line
\z | End of the string, including the newline character
$ | End of a string before a newline character or at the end of the line
\G | Continues where the last match left off
\A | Beginning of a string
\b | Between word boundaries (between alphanumeric and nonalphanumeric characters)
\Z | End of the string before the newline character
\B | Characters not between word boundaries

Comments
Regular expressions use a variety of characters, both symbolic and literal, to designate how a particular string of text should be parsed. The act of parsing a string is known as matching, and when applied to a regular expression, the match will be either true or false.
You build regular expressions using a series of character classes and quantifiers on those character classes as well as a few miscellaneous regular-expression constructs. You use character classes to match a single character based either on what type of character it is, such as a digit or letter, or whether it belongs within a specified range or set of characters (as shown in Table 3.4). Using this information, you can create a series of character classes to match a certain string of text. For instance, if you want to specify a phone number using character classes, you can use the following regular expression:
    \(\d\d\d\)\s\d\d\d-\d\d\d\d
This expression begins by first escaping the left parenthesis. You must escape it because parentheses are used for grouping expressions.
Next you can see three digits representing a phone number's area code followed by the closing parenthesis. You use a \s to denote a whitespace character. The remainder of the regular expression contains the remaining digits of the phone number.
In addition to the single character classes, you can also use ranged and set character classes. They give you fine-grained control over exactly the type of characters the regular expression should match. For instance, if you want to match any character as long as it is a vowel, use the following expression:
    [aeiou]
This line means that a character should match one of the literal characters within that set of characters. An even more specialized form of single character classes is the Unicode character category. Unicode categories are similar to some of the character-attribute inspection methods shown earlier in this chapter. For instance, you can use Unicode categories to match on uppercase or lowercase characters. Other categories include punctuation characters, currency symbols, and math symbols, to name a few. You can find the full list of Unicode categories in MSDN under the topic "Unicode Categories Enumeration."
Although the phone-number expression is completely valid, you can optimize it by using quantifiers. Quantifiers specify additional information about the character, character class, or expression to which they apply. Some quantifiers include wildcards such as *, which means 0 or more occurrences, and ?, which means 0 or 1 occurrences of a pattern. You can also use braces containing an integer to specify how many characters within a given character class to match.
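The set, Unicode category, and brace forms described above can be tried out with a short sketch. The class CharacterClassDemo and the helper CountMatches are hypothetical names used only for illustration; the sketch relies on the static Regex.Matches method and the Lu (uppercase letter) Unicode category.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharacterClassDemo
{
    // Hypothetical helper: counts the matches of 'pattern' within 'input'.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    static void Main()
    {
        string text = "Hello World 42";

        // Set character class: any lowercase vowel
        Console.WriteLine("Vowels:          {0}", CountMatches(text, @"[aeiou]"));

        // Unicode category: any uppercase letter
        Console.WriteLine("Uppercase:       {0}", CountMatches(text, @"\p{Lu}"));

        // Brace quantifier: exactly two digits in a row
        Console.WriteLine("Two-digit runs:  {0}", CountMatches(text, @"\d{2}"));
    }
}
```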
Using the brace quantifier in the phone-number expression, you can specify that the phone number should contain three digits for the area code followed by three digits and four digits separated by a dash:
    \(\d{3}\)\s\d{3}-\d{4}
Although the regular expression itself is not that complicated, you can still see that using quantifiers can simplify regular-expression creation. In addition to character classes and quantifiers, you can also use positional information within a regular expression. For instance, you can specify that, given an input string, the regular expression should operate at the beginning of the string. You express it using the ^ character. Likewise, you can denote the end of a string using the $ symbol. Take note that this does not mean start at the end of the string and attempt to make a match, because that would be counterintuitive; no characters exist at the end of the string. Rather, placing the $ character after the rest of the regular expression means that the expression matches only when the match occurs at the end of the string. For instance, if you want to match a sentence in which a phone number is the last portion of the sentence, you could use the following:
    \(\d{3}\)\s\d{3}-\d{4}$
    My phone number is (555) 555-5555 = Match
    (555) 555-5555 is my phone number = Not a match
3.11. Validating User Input with Regular Expressions
You want to ensure valid user input by using regular expressions to test for validity.
TECHNIQUE
Create a Regex object, which exists within the System.Text.RegularExpressions namespace, passing the regular expression in as a parameter to the constructor. Next, call the member method Match using the string you want to validate as a parameter to the method. The method returns a Match object regardless of the outcome. To test whether a match is made, evaluate the Boolean Success property on that Match object as demonstrated in Listing 3.9.
It should also be noted that in many cases, the backslash (\) character is used when working with regular expressions. To avoid a compilation error caused by an invalid control character, prefix the string literal with the @ symbol to turn off escape processing.
Listing 3.9 Validating User Input of a Phone Number Using a Regular Expression
    using System;
    using System.Text.RegularExpressions;
    namespace _11_RegularExpressions
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                // ^ and $ require the phone number to be the entire input
                Regex phoneExp = new Regex(@"^\(\d{3}\)\s\d{3}-\d{4}$");
                string input;
                Console.Write("Enter a phone number: ");
                input = Console.ReadLine();
                while (phoneExp.Match(input).Success == false)
                {
                    Console.WriteLine("Invalid input. Try again.");
                    Console.Write("Enter a phone number: ");
                    input = Console.ReadLine();
                }
                Console.WriteLine("Validated!");
            }
        }
    }
Comments
Earlier in this chapter I mentioned that you could perform data validation using the static methods within the System.Char class. You can inspect each character within the input string to ensure it matches exactly what you are looking for. However, this method of input validation can be extremely cumbersome if you have different input types to validate because it requires custom code for each validation. In other words, using the methods in the System.Char class is not recommended for anything but the simplest of data-validation procedures.
Regular expressions, on the other hand, allow you to perform advanced input validation within a single expression. You are in effect passing the parsing of the input string to the regular-expression engine and offloading all the work that you would normally do. In Listing 3.9, you can see how you create and use a regular expression to test the validity of a phone number entered by a user. The regular expression is similar to the expressions used earlier for phone numbers except for the addition of positional markers. The regular expression is valid if a user enters a phone number and nothing else.
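To make the effect of the positional markers concrete, the following sketch compares the phone-number expression with and without ^ and $. The class AnchorDemo and its two helper methods are hypothetical names introduced here; the sketch uses the static Regex.IsMatch method rather than the Match call from the listing.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // The phone-number pattern from the text, without and with positional markers.
    const string Unanchored = @"\(\d{3}\)\s\d{3}-\d{4}";
    const string Anchored = @"^\(\d{3}\)\s\d{3}-\d{4}$";

    // True if a phone number appears anywhere in the input.
    public static bool ContainsPhone(string input)
    {
        return Regex.IsMatch(input, Unanchored);
    }

    // True only if the input is a phone number and nothing else.
    public static bool IsPhoneOnly(string input)
    {
        return Regex.IsMatch(input, Anchored);
    }

    static void Main()
    {
        string[] samples = { "(555) 555-5555", "Call (555) 555-5555 today" };
        foreach (string sample in samples)
        {
            Console.WriteLine("{0}: contains={1}, exact={2}",
                sample, ContainsPhone(sample), IsPhoneOnly(sample));
        }
    }
}
```

Because $ also matches just before a final newline character (see Table 3.7), input validation that must reject a trailing newline could use \z in place of $.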
A match is successful when the Success property within the Match object, which is returned from the Regex.Match method, is true. The only caveat to using regular expressions for input validation is that even though you know the validation failed, you are unable to query the Regex or Match class to see what part of the string failed.
3.12. Replacing Substrings Using Regular Expressions
You want to replace all substrings ...
TECHNIQUE
Create a Regex object, passing the regular expression used to match characters in the input string to the Regex constructor. Next, call the Regex method Replace, passing the input string to process and the string that replaces each match within the input string. You can also use the static Replace method, which takes the input string, the regular expression, and the replacement string as parameters. The last line of Listing 3.10 performs the replacement.
Listing 3.10 Using Regular Expressions to Replace Numbers in a Credit Card with xs
    using System;
    using System.Text.RegularExpressions;
    namespace _12_RegExpReplace
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                // Four numbered groups of four digits, separated by dashes
                Regex cardExp = new Regex(@"(\d{4})-(\d{4})-(\d{4})-(\d{4})");
                // $1 and $4 refer to the first and fourth groups
                string safeOutputExp = "$1-xxxx-xxxx-$4";
                string cardNum;
                Console.Write("Please enter your credit card number: ");
                cardNum = Console.ReadLine();
                while (cardExp.Match(cardNum).Success == false)
                {
                    Console.WriteLine("Invalid card number. Try again.");
                    Console.Write("Please enter your credit card number: ");
                    cardNum = Console.ReadLine();
                }
                Console.WriteLine("Secure Output Result = {0}",
                    cardExp.Replace(cardNum, safeOutputExp));
            }
        }
    }
Comments
Although input validation is an extremely useful feature of regular expressions, they also work well as text parsers. The previous recipe used regular expressions to verify that a particular string matched a regular expression exactly. However, you can also use regular expressions to match substrings within a string and return each of those substrings as a group.
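As a small illustration of returning substrings as groups, the sketch below reads each numbered group from a Match object through its Groups collection. The class GroupDemo and the method CaptureGroups are hypothetical names; the pattern is the credit-card expression from Listing 3.10.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Returns the text captured by each numbered group, or an empty array
    // when the input does not match the credit-card pattern.
    public static string[] CaptureGroups(string cardNum)
    {
        Match m = Regex.Match(cardNum, @"(\d{4})-(\d{4})-(\d{4})-(\d{4})");
        if (!m.Success)
        {
            return new string[0];
        }

        // Groups[0] is the entire match; numbered groups start at index 1.
        string[] parts = new string[m.Groups.Count - 1];
        for (int i = 1; i < m.Groups.Count; i++)
        {
            parts[i - 1] = m.Groups[i].Value;
        }
        return parts;
    }

    static void Main()
    {
        string[] parts = CaptureGroups("1234-5678-9012-3456");
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine("Group {0}: {1}", i + 1, parts[i]);
        }
    }
}
```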
Furthermore, you can use a separate regular expression that acts on the result of the regular-expression evaluation to replace substrings within the original input string. Listing 3.10 creates a regular expression that matches the format for a credit card. In that regular expression, you can see that it will match on four different groups of four digits apiece separated by a dash. However, you might also notice that each one of these groups is surrounded with parentheses. In an earlier recipe, I mentioned that to use a literal parenthesis, you must escape it using a backslash because of the conflict with regular-expression grouping symbols. In this case, you want to use the grouping feature of regular expressions. When you place a portion of a regular expression within parentheses, you are creating a numbered group. Groups are numbered starting with 1 and are incremented for each subsequent group. In this case, there are four numbered groups. These groups are used by the replacement string, which is contained in the string safeOutputExp. To reference a numbered group, use the $ symbol followed by the number of the group to reference. The group reference represents all characters within the input string that match the group expression within the regular expression. Therefore, in the replacement string, you can see that it prints the characters within the first group, replaces the characters in the second and third groups with xs, and finally prints the characters in the fourth group.
One thing to note is that you can use the Regex class to view the groups themselves. If you change the regular expression to "\d{4}", you can then use the Matches method to enumerate all the groups using the foreach keyword, as shown in Listing 3.11. In the listing, the program first checks to make sure at least four matches were made.
This number corresponds to four groups of four digits. Next, it uses a foreach enumeration on each Match object that is returned from the Matches method. If the match is in the second or third group, the values are replaced with xs; otherwise, the Match object's value, the characters within that group, is concatenated to the result string.
Listing 3.11 Enumerating Through the Match Collection to Perform Special Operations on Each Match in a Regular Expression
    static void TestManualGrouping()
    {
        Regex cardExp = new Regex(@"\d{4}");
        string cardNum;
        string safeOutputExp = "";
        Console.Write("Please enter your credit card number: ");
        cardNum = Console.ReadLine();
        if (cardExp.Matches(cardNum).Count < 4)
        {
            Console.WriteLine("Invalid card number");
            return;
        }
        foreach (Match field in cardExp.Matches(cardNum))
        {
            if (field.Success == false)
            {
                Console.WriteLine("Invalid card number");
                return;
            }
            // Indexes 5 and 10 are where the second and third groups start
            // in a dash-separated number such as 1234-5678-9012-3456
            if (field.Index == 5)
            {
                safeOutputExp += "-xxxx-";
            }
            else if (field.Index == 10)
            {
                // The previous field already supplied the leading dash
                safeOutputExp += "xxxx-";
            }
            else
            {
                safeOutputExp += field.Value;
            }
        }
        Console.WriteLine("Secure Output Result = {0}", safeOutputExp);
    }
3.13. Building a Regular Expression Library
You want to create a library of regular expressions that you can reuse in other projects.
TECHNIQUE
Use the CompileToAssembly static method within the Regex class to compile a regular expression into an assembly. This method uses an array of RegexCompilationInfo objects that contain any number of regular expressions you want to add to the assembly.
The RegexCompilationInfo class contains a constructor with five parameters that you must fill out. The parameters denote the string for the regular expression; any options for the regular expression, which appear in the RegexOptions enumerated type; a name for the class that is created to hold the regular expression; a corresponding namespace; and a Boolean value indicating whether the generated class is public.
After creating the RegexCompilationInfo object, create an AssemblyName object, making sure to reference the System.Reflection namespace, and set the Name property to the name you want the resultant assembly file to have. Because CompileToAssembly creates a DLL, exclude the DLL extension from the assembly name. Finally, place all the RegexCompilationInfo objects within an array, as shown in Listing 3.12, and call the CompileToAssembly method. Listing 3.12 demonstrates how to create a RegexCompilationInfo object and how to use that object to compile a regular expression into an assembly using the CompileToAssembly method.
Listing 3.12 Using the CompileToAssembly Regex Method to Save Regular Expressions in a New Assembly for Later Reuse
    using System;
    using System.Text.RegularExpressions;
    using System.Reflection;
    namespace _12_RegExpReplace
    {
        class Class1
        {
            [STAThread]
            static void Main(string[] args)
            {
                CompileRegex(@"(\d{4})-(\d{4})-(\d{4})-(\d{4})", @"regexlib");
            }
            static void CompileRegex(string exp, string assemblyName)
            {
                // Pattern, options (0 = none), class name, namespace, public flag
                RegexCompilationInfo compInfo =
                    new RegexCompilationInfo(exp, 0, "CreditCardExp", "", true);
                AssemblyName assembly = new AssemblyName();
                assembly.Name = assemblyName;
                RegexCompilationInfo[] rciArray = { compInfo };
                Regex.CompileToAssembly(rciArray, assembly);
            }
        }
    }
Comments
If you use regular expressions regularly, then you might find it advantageous to create a reusable library of the expressions you tend to use the most. The Regex class contains a method named CompileToAssembly that allows you to compile several regular expressions into an assembly that you can then reference within other projects. Internally, you will find a class for each regular expression you added, all contained within its corresponding namespace, as specified in the RegexCompilationInfo object when you created it. Furthermore, each of these classes inherits from the Regex class, so all the Regex methods are available for you to use.
As you can see, creating a library of commonly used regular expressions allows you to reuse and share these expressions in a multitude of different projects. A change in a regular expression simply involves changing one assembly instead of each project that hard-coded the regular expression.
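Note that CompileToAssembly is a .NET Framework facility; on .NET Core and .NET 5 or later, calling it throws a PlatformNotSupportedException. On those runtimes, a comparable way to share expressions is an ordinary class library that exposes preconstructed Regex instances. The sketch below shows that substitute approach, not the recipe's technique: the class RegexLibrary, its members, and the MaskCard helper are hypothetical names, and the patterns are the ones used earlier in this chapter.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical shared library class; place it in a class-library project
// and reference that project wherever the expressions are needed.
public static class RegexLibrary
{
    // RegexOptions.Compiled asks the runtime to compile the expression in memory.
    public static readonly Regex CreditCard =
        new Regex(@"^(\d{4})-(\d{4})-(\d{4})-(\d{4})$", RegexOptions.Compiled);

    public static readonly Regex Phone =
        new Regex(@"^\(\d{3}\)\s\d{3}-\d{4}$", RegexOptions.Compiled);

    // Masks the middle two groups of a card number; input that does not
    // match the pattern is returned unchanged.
    public static string MaskCard(string cardNum)
    {
        return CreditCard.Replace(cardNum, "$1-xxxx-xxxx-$4");
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(RegexLibrary.Phone.IsMatch("(555) 555-5555"));
        Console.WriteLine(RegexLibrary.MaskCard("1234-5678-9012-3456"));
    }
}
```

As with the compiled assembly, a change to an expression is then made in one place, the shared library, rather than in each project.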