Simple validations often are not adequate for some kinds of user input, such as credit card numbers and Social Security numbers. The fact that users like to enter data in different formats complicates matters. For example, users might enter a credit card number as 1234 5678 9012 3456, 1234567890123456, or 1234-5678-9012-3456. You can parse any of these as a valid credit card number. Even simple inputs, such as ZIP codes, have a regular format, but some users might include spaces. I'll show you how to use regular expressions to validate user input against a variety of common formats. You can find many of these formats in the ASP.NET regular expression validator wizard (see Figure 1 and Additional Resources). I'll start by showing you how to create a validator for credit card numbers in each of these formats. Then, you can create a WinForms application that lets you enter a possible credit card number or domain name and click on a button to validate it (see Figure 2, and download the source code).
Your goal is to create a regular expression you can use in the Regex.IsMatch() method. This method returns true if the input matches the expression, and false if it doesn't. To match a regular expression, you create a regular-expression string that matches the expected input string. The key is that regular expressions provide a language to describe pattern matches. As a simple example, *.doc in the DOS dir command matches all Word documents. You can create similar and far more complex patterns with regular expressions.
When you build any regular expression to validate input, you want to match the entire input string. All your regular expressions should start with ^ to match the beginning of the input string and end with $ to match the end of the input string (see Table 1 for the most common RegEx elements for validating user input).
I'll show you how to build the regular expression that matches only the forms of a credit card number that I listed previously (see Listing 1). All the forms use four groups of four digits each. You use \d to match a single digit, so the regular expression ^\d$ matches exactly one digit. However, you want four digits for each credit card number group. Regular expressions use braces ({}) to describe how many times a pattern is repeated, so \d{4} matches four digits. You use the expression ^\d{16}$ to match a credit card number with no intervening spaces.
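The anchoring idea can be tried out in a few lines. The sketch below uses Python's re module rather than .NET, purely for illustration; re.search stands in for Regex.IsMatch, and the helper name is_sixteen_digits is hypothetical.

```python
import re

# Anchored pattern: exactly 16 digits between the start (^) and end ($) of input.
# re.ASCII restricts \d to 0-9; Python's \d otherwise accepts any Unicode digit.
# Note: in Python, $ also tolerates a single trailing newline.
SIXTEEN_DIGITS = re.compile(r"^\d{16}$", re.ASCII)

# The same pattern without anchors, for comparison only.
UNANCHORED = re.compile(r"\d{16}", re.ASCII)


def is_sixteen_digits(text: str) -> bool:
    """Return True when text consists of exactly 16 digits."""
    return SIXTEEN_DIGITS.search(text) is not None


if __name__ == "__main__":
    for candidate in ("1234567890123456", "12345678901234567", "1234 5678 9012 3456"):
        print(repr(candidate), is_sixteen_digits(candidate))
```

Without ^ and $, the unanchored pattern would also report a match for a 17-digit string, because 16 of its digits satisfy the pattern. That is why validation patterns anchor both ends.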
Match Optional Delimiters
You must modify this expression now so that the user can insert optional spacing delimiters between the groups of four digits. The delimiters can be hyphens (-), spaces, or nothing. To match a single character from a set, you place the set of characters in square brackets ([]); [ab] matches either a or b. You use \s to match any whitespace character. The expression [-\s] matches the possible delimiters.
But wait, there's a bit more. You want the user to place zero delimiters or one delimiter. You could use {0,1}: [-\s]{0,1}, but matching zero or one copy of a substring is so common that there's a simpler way to specify this: the question mark (?). You use [-\s]? to match zero or one delimiter.
Your version of the regular expression now looks like this:
^[\d]{4}[-\s]?[\d]{4}[-\s]?[\d]{4}[-\s]?[\d]{4}$
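As a quick check of this version, the sketch below runs the pattern against a handful of sample inputs. It uses Python's re module as a stand-in for the .NET classes; the names LOOSE_CARD, samples, and results are made up for the example.

```python
import re

# Four groups of four digits, each gap holding zero or one hyphen/whitespace character.
LOOSE_CARD = re.compile(r"^[\d]{4}[-\s]?[\d]{4}[-\s]?[\d]{4}[-\s]?[\d]{4}$")

samples = [
    "1234 5678 9012 3456",   # spaces
    "1234567890123456",      # no delimiters
    "1234-5678-9012-3456",   # hyphens
    "1234 5678-90123456",    # mixed delimiters: accepted by this version
    "1234--5678-9012-3456",  # two delimiters in one gap: rejected
    "1234 5678 9012 345",    # too few digits: rejected
]

# Map each sample to True/False depending on whether the pattern matches.
results = {s: LOOSE_CARD.search(s) is not None for s in samples}

if __name__ == "__main__":
    for sample, ok in results.items():
        print(f"{sample!r}: {ok}")
```

Each gap is matched independently, so nothing ties one delimiter to the next. The mixed-delimiter sample passes for that reason.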
You're almost there, but the preceding regular expression has one small bug: The user could use different delimiter characters between different four-digit groups. For example, 1234 5678-90123456 would be valid input. You need to make sure the user places the same delimiter between each of the groups. You use two features of regular expressions, grouping and backreferences, to do this. A grouping is a set of characters that the regular expression processor remembers from the input string. A backreference is a copy of the remembered text. First, modify your expression to remember which delimiter the user typed first:
^[\d]{4}([-\s]?)
A group is any expression in parentheses. Simply place a substring in parentheses to create a numbered group. The entire string is number 0, and each group is numbered from left to right, starting at 1. However, I prefer to avoid numbered expressions, because they can be difficult to understand later, especially if they involve multiple or nested groups. You can use named groups instead. You name a group by adding a question mark and a name in angle brackets after the opening parenthesis:
^[\d]{4}(?<grpdel>[-\s]?)
The remembered delimiter is named grpdel now. You must match the remembered group for each delimiter in order to limit the user to using the same delimiter in each group. Use a backreference to match a remembered group:
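^[\d]{4}(?<grpdel>[-\s]?)[\d]{4}\k<grpdel>
The \k<grpdel> construct matches the text that the group named grpdel remembered. The complete expression is:
^[\d]{4}(?<grpdel>[-\s]?)[\d]{4}\k<grpdel>[\d]{4}\k<grpdel>[\d]{4}$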
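The named-group technique can be sketched as follows. This is an illustration in Python, not the article's listing, and Python spells the constructs differently from .NET: a named group is (?P<grpdel>...) and its backreference is (?P=grpdel), where .NET uses (?<grpdel>...) and \k<grpdel>. The helper name is_card_format is hypothetical.

```python
import re

# Remember the first delimiter (possibly empty) as grpdel, then require the
# same text in the two remaining gaps via the named backreference.
STRICT_CARD = re.compile(
    r"^[\d]{4}(?P<grpdel>[-\s]?)[\d]{4}(?P=grpdel)[\d]{4}(?P=grpdel)[\d]{4}$"
)


def is_card_format(text: str) -> bool:
    """Return True when text is four groups of four digits with one consistent delimiter."""
    return STRICT_CARD.search(text) is not None


if __name__ == "__main__":
    match = STRICT_CARD.search("1234-5678-9012-3456")
    # The named group exposes which delimiter the user chose.
    print("delimiter:", repr(match.group("grpdel")))
    print(is_card_format("1234 5678-90123456"))
```

This checks only the layout of the digits and delimiters. It says nothing about whether the number belongs to a real card.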
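If you're confused at this point, walk through each step. ^ matches the beginning of the input. [\d]{4} matches exactly four digits. (?<grpdel>[-\s]?) matches zero or one delimiter and remembers it under the name grpdel. Each \k<grpdel> then matches that same remembered delimiter (or nothing, if the user typed none) before the next [\d]{4} matches four more digits. Finally, $ matches the end of the input.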
Validate a Domain Address
The example I've just shown you demonstrates the most common constructs you use when you write regular expressions to validate user input. Here's another simple example that shows how you can use other input characters to validate input (see Listing 2). Suppose you want to parse a domain address. The addresses fawcette.com, microsoft.com, and srtsolutions.com are all valid. However, any address that uses a different protocol (such as ftp://) or contains invalid characters is invalid. I'll limit the list of valid suffixes to .com, .net, .org, .edu, .gov, and .mil to keep this example reasonably simple.
Once again, you build the regular expression in pieces. You need to find between 1 and 63 characters in the set of a-z, 0-9, and -. You place the set of valid characters in square brackets; the set includes the range a-z, the range 0-9, and the single hyphen character. The {1,63} construct matches from 1 to 63 repeats of the preceding set. This expression builds on a construction you saw previously:
[a-z0-9-]{1,63}
Next, you must find a single period. This might seem simple, but a period is a special character in the regular expression language, so you need to escape the character by preceding it with a backslash (\). Finally, you must find one of the approved extensions. You find one of a set of phrases by placing all the phrases between parentheses, separated by the pipe character (|): (com|net|org|edu|gov|mil). The complete expression is:
^[a-z0-9-]{1,63}\.(com|net|org|edu|gov|mil)$
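Here is a minimal sketch of the domain check, again in Python's re module rather than .NET and with a hypothetical helper name, is_valid_domain. The character class lists its ranges back to back, because a comma placed inside the square brackets would itself be accepted as a valid character.

```python
import re

# 1 to 63 characters from a-z, 0-9, and hyphen, then a literal period,
# then one of the approved suffixes. Lowercase only, as written in the text;
# passing re.IGNORECASE would be one way to accept uppercase input as well.
DOMAIN = re.compile(r"^[a-z0-9-]{1,63}\.(com|net|org|edu|gov|mil)$")


def is_valid_domain(text: str) -> bool:
    """Return True when text is a single label followed by an approved suffix."""
    return DOMAIN.search(text) is not None


if __name__ == "__main__":
    for candidate in (
        "fawcette.com",
        "microsoft.com",
        "srtsolutions.com",
        "ftp://microsoft.com",  # protocol prefix: rejected
        "bad_name.com",         # underscore is not in the set: rejected
        "example.biz",          # suffix not on the list: rejected
    ):
        print(f"{candidate!r}: {is_valid_domain(candidate)}")
```

Because the pattern allows only one label before the suffix, an address such as www.microsoft.com is rejected too. That is consistent with the simplified scope of this example.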
Note that you need only two lines of code for each expression in the examples I've shown you. Putting together regular expressions can take work, but it pays off handsomely. I've yet to see an input format that you cannot validate with regular expressions. Look at the samples provided in the ASP.NET regular expression validator to learn more about using and forming your own regular expressions. Try to understand how each one works. Then, build expressions for your own validations. Test all the small subexpressions individually and include comments in your code, because debugging a long expression can be tricky.
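One way to follow that advice is to keep each subexpression in its own variable, test it separately, and assemble the full pattern with inline comments. The sketch below does this with Python's re.VERBOSE flag (.NET offers a comparable option, RegexOptions.IgnorePatternWhitespace). The variable names are illustrative, and the named-group syntax is Python's.

```python
import re

# Small subexpressions, each simple enough to test on its own.
FOUR_DIGITS = r"\d{4}"
DELIMITER = r"[-\s]?"

# re.VERBOSE ignores whitespace in the pattern and treats '#' as a comment
# marker, so every piece of the expression can be documented in place.
CARD_PATTERN = re.compile(
    rf"""
    ^                          # start of the input
    {FOUR_DIGITS}              # first group of four digits
    (?P<grpdel>{DELIMITER})    # optional delimiter, remembered as grpdel
    {FOUR_DIGITS}              # second group
    (?P=grpdel)                # same delimiter as before
    {FOUR_DIGITS}              # third group
    (?P=grpdel)                # same delimiter again
    {FOUR_DIGITS}              # fourth group
    $                          # end of the input
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    # Check the pieces first, then the assembled expression.
    print(bool(re.fullmatch(FOUR_DIGITS, "1234")))
    print(bool(re.fullmatch(DELIMITER, "-")))
    print(bool(CARD_PATTERN.search("1234-5678-9012-3456")))
```

If the assembled pattern misbehaves, the individual checks narrow down which piece is at fault.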