What is a regular expression?
A regular expression consists of one or more character literals and/or metacharacters. In its simplest form, a regular expression consists only of character literals, such as the regular expression cat. It is read as the letter c followed by the letters a and t, and it matches strings such as cat, location, and catalog. Metacharacters provide algorithms that specify how Oracle should process the characters that make up a regular expression. Once you understand the meaning of the various metacharacters, you will see that regular expressions are a powerful way to find and replace specific text data.
Validating data, identifying duplicate word occurrences, detecting extraneous white space, and parsing strings are just some of the many uses of regular expressions. You can apply them to validate the format of phone numbers, zip codes, email addresses, Social Security numbers, IP addresses, file names, and path names. In addition, you can find patterns such as HTML tags, numbers, dates, or anything else that fits a pattern within any text data, and replace them with other patterns.
Using regular expressions with Oracle Database 10g
You can use the newly introduced Oracle SQL REGEXP_LIKE operator and the REGEXP_INSTR, REGEXP_SUBSTR, and REGEXP_REPLACE functions to harness the power of regular expressions. You will see how this new functionality supplements the existing LIKE operator and the INSTR, SUBSTR, and REPLACE functions. In fact, they are similar to the existing operator and functions but now offer powerful pattern-matching capabilities. The searched data can be simple strings or large volumes of text stored in database character columns. Regular expressions let you search, replace, and validate data in ways you may never have thought of, with a high degree of flexibility.
Basic examples of regular expressions
Before using this new functionality, you need to understand the meaning of some of the metacharacters. The period (.) matches any character in a regular expression (except newline). For example, the regular expression a.b matches a string that contains the letter a, followed by any other single character (except newline), followed by the letter b. The strings axb, xaybx, and abba all match because this pattern is buried somewhere in the string. If you want to match exactly a three-letter string that begins with a and ends with b, you must anchor the regular expression. The caret (^) metacharacter indicates the start of a line, while the dollar symbol ($) indicates the end of a line (see Table 1). Therefore, the regular expression ^a.b$ matches the strings aab, abb, or axb. Compare this approach with the familiar pattern matching of the LIKE operator, where the same pattern is written a_b and the underscore (_) is the single-character wildcard.
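The difference between an unanchored and an anchored pattern can be checked with a query along the following lines. This is a minimal sketch: the sample strings are inlined through DUAL, and the column aliases (unanchored, anchored, like_a_b) are made up for illustration.

```sql
-- Sample strings are inlined via DUAL; no table is assumed.
SELECT s,
       CASE WHEN REGEXP_LIKE(s, 'a.b')   THEN 'yes' ELSE 'no' END AS unanchored,
       CASE WHEN REGEXP_LIKE(s, '^a.b$') THEN 'yes' ELSE 'no' END AS anchored,
       CASE WHEN s LIKE 'a_b'            THEN 'yes' ELSE 'no' END AS like_a_b
  FROM (SELECT 'axb'   AS s FROM dual UNION ALL
        SELECT 'xaybx' AS s FROM dual UNION ALL
        SELECT 'abba'  AS s FROM dual);
```

All three strings should report yes for the unanchored pattern, while only axb should satisfy the anchored pattern and the LIKE condition.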
By default, an individual character or character list in a regular expression matches just once. To indicate multiple occurrences of a character in a regular expression, you apply a quantifier, also called a repetition operator. If you want a match that starts with the letter a and ends with the letter b, your regular expression looks like this: ^a.*b$. The * metacharacter repeats the preceding metacharacter (.) zero, one, or more times. The equivalent pattern for the LIKE operator is a%b, where the percent sign (%) indicates zero, one, or more occurrences of any character.
Table 2 gives the complete list of repetition operators. Note that it contains particular repetition options that allow greater flexibility than the existing LIKE wildcards. If you enclose an expression in parentheses, you effectively create a subexpression that can be repeated a certain number of times. For example, the regular expression b(an)*a matches ba, bana, banana, yourbananasplit, and so on.
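To see which part of a string a repeated subexpression actually consumes, REGEXP_SUBSTR (introduced later in this article) can return the matched text. The following is an illustrative sketch; the aliases are hypothetical.

```sql
SELECT REGEXP_SUBSTR('yourbananasplit', 'b(an)*a') AS repeated_group, -- banana
       REGEXP_SUBSTR('ba', 'b(an)*a')              AS zero_repeats,   -- ba
       REGEXP_SUBSTR('aaab', 'a{2}')               AS exactly_two     -- aa
  FROM dual;
```

The second column shows that * also accepts zero repetitions of the subexpression (an).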
Oracle's implementation of regular expressions supports the POSIX (Portable Operating System Interface) character classes listed in Table 3. This means that you can be very specific about the type of character you are looking for. Imagine writing a LIKE condition that finds only nonalphabetic characters: the resulting WHERE clause could easily become very complicated. A POSIX character class must be enclosed in a character list, indicated by square brackets ([]). For example, the regular expression [[:lower:]] matches a single lowercase letter, and [[:lower:]]{5} matches five consecutive lowercase letters.
In addition to the POSIX character classes, you can place individual characters in a character list. For example, the regular expression ^ab[cd]ef$ matches the strings abcef and abdef. Either c or d must be chosen.
Most metacharacters in a character list are understood as literals, with the exception of the caret (^) and the hyphen (-). Regular expressions can look complicated because some metacharacters have multiple meanings that depend on their context. The ^ is one such metacharacter. If you use it as the first character of a character list, it negates the character list. Therefore, [^[:digit:]] looks for a pattern consisting of any nondigit character, whereas ^[[:digit:]] looks for a match that starts with a digit. The hyphen (-) indicates a range; the regular expression [a-m] matches any letter from a through m. But if it is the first character in the character list (as in [-afg]), it represents a literal hyphen.
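The context-dependent meanings of ^ and - described above can be checked with REGEXP_INSTR (covered later in this article), which returns the position of the first match or 0 when there is none. This sketch assumes the default case-sensitive, binary NLS settings; the aliases are illustrative only.

```sql
SELECT REGEXP_INSTR('123abc', '[^[:digit:]]') AS first_non_digit,   -- 4
       REGEXP_INSTR('123abc', '^[[:digit:]]') AS starts_with_digit, -- 1
       REGEXP_INSTR('abc123', '^[[:digit:]]') AS no_leading_digit,  -- 0
       REGEXP_INSTR('xyz-k', '[a-m]')         AS first_a_to_m,      -- 5
       REGEXP_INSTR('xyz-k', '[-afg]')        AS literal_hyphen     -- 4
  FROM dual;
```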
One of the previous examples introduced the use of parentheses to create a subexpression; parentheses also let you list alternative choices, which are separated by the alternation metacharacter, the vertical bar (|).
For example, the regular expression t(a|e|i)n allows three possible character alternatives between the letters t and n. Matching patterns include words such as tan, ten, tin, and Pakistan, but not teen, mountain, or tune. Alternatively, the regular expression t(a|e|i)n can also be expressed as the character list t[aei]n. Table 4 summarizes these metacharacters. Although many more metacharacters exist, this concise overview is sufficient to understand the regular expressions used in this article.
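A quick way to confirm that the alternation and the character list behave the same is to test both against a few of the words mentioned above. The inline word list and the aliases are illustrative assumptions.

```sql
-- Both columns should agree for every word.
SELECT w,
       CASE WHEN REGEXP_LIKE(w, 't(a|e|i)n') THEN 'yes' ELSE 'no' END AS alternation,
       CASE WHEN REGEXP_LIKE(w, 't[aei]n')   THEN 'yes' ELSE 'no' END AS char_list
  FROM (SELECT 'tan'      AS w FROM dual UNION ALL
        SELECT 'Pakistan' AS w FROM dual UNION ALL
        SELECT 'teen'     AS w FROM dual UNION ALL
        SELECT 'mountain' AS w FROM dual);
```

Here tan and Pakistan should match, while teen and mountain should not, because neither contains a t followed by exactly one of a, e, or i and then n.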
The REGEXP_LIKE operator
The REGEXP_LIKE operator introduces you to regular expression functionality as used in the Oracle database. Table 5 lists the syntax of REGEXP_LIKE.
The WHERE clause of the following SQL query shows the REGEXP_LIKE operator, which searches the ZIP column for a pattern that satisfies the regular expression [^[:digit:]]. It retrieves those rows of the ZIPCODE table whose ZIP column values contain any character that is not a digit.
SELECT zip
  FROM zipcode
 WHERE REGEXP_LIKE(zip, '[^[:digit:]]');
ZIP
-----
ab123
123xy
007ab
abcxy
This example of a regular expression consists only of metacharacters, more specifically the POSIX character class digit, delimited by colons and square brackets. The second set of square brackets (as in [^[:digit:]]) encloses a character list. As mentioned earlier, this is required because you can use POSIX character classes only to construct a character list.
The REGEXP_INSTR function
This function returns the starting position of a pattern, so it works much like the familiar INSTR function. The syntax of the new REGEXP_INSTR function is given in Table 6. The main difference between the two functions is that REGEXP_INSTR lets you specify a pattern instead of a specific search string, which makes it more versatile. The next example uses REGEXP_INSTR to return the starting position of the five-digit zip code pattern within the string Joe Smith, 10045 Berry Lane, San Joseph, CA 91234. If the regular expression is written as [[:digit:]]{5}, you get the starting position of the house number instead of the zip code, because 10045 is the first occurrence of five consecutive digits. Therefore, you must anchor the expression to the end of the line, as indicated by the $ metacharacter; the function then displays the starting position of the zip code, regardless of the number of digits in the house number.
SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234',
       '[[:digit:]]{5}$') AS rx_instr
  FROM dual;
RX_INSTR
------------
45
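The effect of the anchor can be seen by running the unanchored and anchored patterns side by side. The sketch below also uses the occurrence argument described in Table 6 as an alternative way to skip the house number; the aliases and the inline addr column are illustrative.

```sql
SELECT REGEXP_INSTR(addr, '[[:digit:]]{5}')       AS first_five_digits, -- 12 (house number)
       REGEXP_INSTR(addr, '[[:digit:]]{5}', 1, 2) AS second_occurrence, -- 45 (zip code)
       REGEXP_INSTR(addr, '[[:digit:]]{5}$')      AS anchored_at_end    -- 45 (zip code)
  FROM (SELECT 'Joe Smith, 10045 Berry Lane, San Joseph, CA 91234' AS addr
          FROM dual);
```

Asking for the second occurrence works only when the address contains exactly one earlier five-digit run, so the anchored form is the more robust choice here.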
Writing more complex patterns
Let's expand the zip code pattern of the previous example to include an optional four-digit extension. Your pattern can now look like this: [[:digit:]]{5}(-[[:digit:]]{4})?$. If your source string ends with either the 5-digit zip code or the 5-digit + 4 zip code, you will be able to display the starting position of the pattern.
SELECT REGEXP_INSTR('Joe Smith, 10045 Berry Lane, San Joseph, CA 91234-1234',
       ' [[:digit:]]{5}(-[[:digit:]]{4})?$') AS starts_at
  FROM dual;
STARTS_AT
------------
44
In this example, the parenthesized subexpression (-[[:digit:]]{4}) is repeated zero or one time, as indicated by the ? repetition operator. Note that the pattern in this query begins with a blank that must also be matched (the first entry in Table 7), which is why the reported position, 44, is that of the space preceding the zip code. Trying to achieve the same result with traditional SQL functions would be a challenge even for a SQL expert. To better explain the different components of this regular expression, Table 7 describes the individual literals and metacharacters.
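The same zip code pattern can also be used to extract the value rather than locate it. This sketch applies REGEXP_SUBSTR (described in the next section) to two inline sample addresses; the leading blank is omitted from the pattern so that only the zip code itself is returned.

```sql
-- Expected values: 91234-1234 for the first address, 91234 for the second.
SELECT REGEXP_SUBSTR(addr, '[[:digit:]]{5}(-[[:digit:]]{4})?$') AS zip
  FROM (SELECT 'Joe Smith, 10045 Berry Lane, San Joseph, CA 91234-1234' AS addr
          FROM dual
        UNION ALL
        SELECT 'Joe Smith, 10045 Berry Lane, San Joseph, CA 91234' AS addr
          FROM dual);
```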
The REGEXP_SUBSTR function
The REGEXP_SUBSTR function, much like the SUBSTR function, extracts part of a string. Table 8 shows the syntax of this new function. In the following example, the string that matches the pattern , [^,]*, is returned. The regular expression searches for a comma followed by a space, then for zero or more characters that are not commas, as indicated by [^,]*, and finally for another comma. The pattern looks somewhat like a string of comma-separated values.
SELECT REGEXP_SUBSTR('first field, second field, third field', ', [^,]*,')
  FROM dual;
REGEXP_SUBSTR('FIR
------------------
, second field,
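A common variation on this idea is to pull out the n-th field of a comma-separated string with the occurrence argument listed in Table 8. The following is a sketch under the assumption that no field is empty; the alias names and the use of TRIM to drop the blank after each comma are illustrative choices.

```sql
SELECT REGEXP_SUBSTR(csv, '[^,]+', 1, 1)       AS field_1, -- first field
       TRIM(REGEXP_SUBSTR(csv, '[^,]+', 1, 2)) AS field_2, -- second field
       TRIM(REGEXP_SUBSTR(csv, '[^,]+', 1, 3)) AS field_3  -- third field
  FROM (SELECT 'first field, second field, third field' AS csv FROM dual);
```

Because [^,]+ requires at least one character, an empty field (two adjacent commas) would be skipped and shift the numbering, so this approach suits only data without empty fields.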
The REGEXP_REPLACE function
Let's first look at the traditional REPLACE SQL function, which substitutes one string for another. Suppose your data contains extraneous spaces in the text and you want to replace them with a single space. With the REPLACE function, you need to state exactly how many spaces you want to replace. However, the number of extra spaces may not be the same throughout the text. The following example has three spaces between Joe and Smith. The parameters of the REPLACE function specify that two spaces are to be replaced with one space. In this case, the result still has an extra space between Joe and Smith, because the original string contained three spaces.
SELECT REPLACE('Joe   Smith', '  ', ' ') AS replace
  FROM dual;
REPLACE
---------
Joe  Smith
The REGEXP_REPLACE function takes substitution a step further; its syntax is listed in Table 9. The following query replaces any two or more spaces with a single space. The ( ) subexpression contains a single space, which can be repeated two or more times, as indicated by {2,}.
SELECT REGEXP_REPLACE('Joe   Smith', '( ){2,}', ' ') AS rx_replace
  FROM dual;
RX_REPLACE
------------
Joe Smith
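Two small variations may be useful in practice. The first drops the parentheses, since a quantifier can apply directly to a single literal space; the second uses the [:space:] class from Table 3 to collapse any run of white space, including tabs. Both are sketches with made-up aliases, and CHR(9) is used to embed a tab character.

```sql
-- Both columns should return: Joe Smith
SELECT REGEXP_REPLACE('Joe   Smith', ' {2,}', ' ')                       AS spaces_only,
       REGEXP_REPLACE('Joe ' || CHR(9) || ' Smith', '[[:space:]]+', ' ') AS any_whitespace
  FROM dual;
```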
Backreferences
A useful feature of regular expressions is the ability to store subexpressions for later reuse; this is also referred to as backreferencing (summarized in Table 10). It allows sophisticated replacements, such as swapping patterns into new positions or finding repeated words or letters. The matched part of a subexpression is saved in a temporary buffer. The buffers are numbered from left to right and accessed with the \digit notation, where digit is a number between 1 and 9 that refers to the digit-th subexpression, as indicated by a set of parentheses.
The next example shows the name Ellen Hildi Smith transformed into Smith, Ellen Hildi by referencing the individual subexpressions by number.
SELECT REGEXP_REPLACE(
       'Ellen Hildi Smith',
       '(.*) (.*) (.*)', '\3, \1 \2')
  FROM dual;
REGEXP_REPLACE('EL
------------------
Smith, Ellen Hildi
The SQL statement shows three individual subexpressions enclosed in parentheses. Each subexpression consists of the match-any metacharacter (.) followed by the * metacharacter, indicating that any character (except newline) is matched zero or more times. A space separates each subexpression, and the space must be matched as well. The parentheses create subexpressions that capture the values and can be referenced with \digit. The first subexpression is assigned \1, the second \2, and so on. These backreferences are used in the last parameter of the function (\3, \1 \2), which effectively returns the replacement substrings and arranges them in the desired format (including the comma and spaces). Table 11 details the individual components of the regular expression.
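The same capture-and-rearrange technique applies to other formatting tasks. As an illustrative sketch (the ten-digit input value and the alias are made up), the query below splits a run of digits into three groups and reassembles them with punctuation.

```sql
SELECT REGEXP_REPLACE('5551234567',
                      '([[:digit:]]{3})([[:digit:]]{3})([[:digit:]]{4})',
                      '(\1) \2-\3') AS formatted  -- (555) 123-4567
  FROM dual;
```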
Backreferences are valuable for replacing, formatting, and substituting values, and you can also use them to find values that occur next to each other. The next example shows the use of the REGEXP_SUBSTR function to find any duplicated alphanumeric value separated by a space. The displayed result is the substring that identifies the duplicate occurrence of the word is.
SELECT REGEXP_SUBSTR(
       'The final test is is the implementation',
       '([[:alnum:]]+)([[:space:]]+)\1') AS substr
  FROM dual;
SUBSTR
--------
is is
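Once a duplicate can be detected, the same pattern can be handed to REGEXP_REPLACE to remove it by keeping only the first captured copy. This is a sketch, not a robust de-duplication routine; the alias is illustrative.

```sql
-- Expected result: The final test is the implementation
SELECT REGEXP_REPLACE('The final test is is the implementation',
                      '([[:alnum:]]+)([[:space:]]+)\1', '\1') AS deduplicated
  FROM dual;
```

Because the pattern has no notion of word boundaries, it can also match inside words (for example, the is is buried in this is), so the input should be checked before such a replacement is applied broadly.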
Match parameter options
You may have noticed that the regular expression operator and functions take an optional match parameter. This parameter controls case sensitivity, the matching of newline characters, and the treatment of multiline input.
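The article does not list the individual match parameter values; the sketch below uses the values documented for Oracle Database 10g ('i' for case-insensitive matching, 'n' to let the period match a newline, 'm' to treat the source as multiple lines). It assumes default, case-sensitive NLS settings, and CHR(10) stands for the newline character.

```sql
SELECT REGEXP_INSTR('Cat', 'cat')               AS case_sensitive,   -- 0
       REGEXP_INSTR('Cat', 'cat', 1, 1, 0, 'i') AS case_insensitive, -- 1
       REGEXP_INSTR('a' || CHR(10) || 'b', 'a.b')               AS dot_default,  -- 0
       REGEXP_INSTR('a' || CHR(10) || 'b', 'a.b', 1, 1, 0, 'n') AS dot_newline,  -- 1
       REGEXP_INSTR('one' || CHR(10) || 'two', '^two$', 1, 1, 0, 'm') AS multiline -- 5
  FROM dual;
```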
Practical uses of regular expressions
You can use regular expressions not only in queries but also anywhere you use a SQL operator or function (such as in the PL/SQL language). You can write triggers that take advantage of regular expressions to validate, generate, or extract values.
The next example demonstrates how you can apply the REGEXP_LIKE operator in a column check constraint for data validation. It checks for the correct Social Security number format on insert or update. Social Security numbers in the formats 123-45-6789 and 123456789 are acceptable values for this column constraint. Valid data must begin with three digits, followed by a hyphen, two more digits, another hyphen, and finally four digits. The alternate expression allows only nine consecutive digits. The vertical bar symbol (|) separates the alternatives.
ALTER TABLE students
  ADD CONSTRAINT stud_ssn_ck CHECK
  (REGEXP_LIKE(ssn,
  '^([[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}|[[:digit:]]{9})$'));
Leading or trailing characters are not acceptable, as indicated by the ^ and $ anchors. Make sure your regular expression is not split across multiple lines and does not contain any unnecessary spaces, unless you intend the format to include them and want them matched accordingly. Table 12 explains the individual components of this regular expression.
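Before adding such a constraint, it can help to try the pattern against a few candidate values. The query below is a minimal sketch with made-up sample values inlined through DUAL; it does not touch the students table.

```sql
-- The first two values should be reported as valid, the last two as rejected.
SELECT ssn,
       CASE WHEN REGEXP_LIKE(ssn,
              '^([[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}|[[:digit:]]{9})$')
            THEN 'valid' ELSE 'rejected' END AS verdict
  FROM (SELECT '123-45-6789' AS ssn FROM dual UNION ALL
        SELECT '123456789'   AS ssn FROM dual UNION ALL
        SELECT '123-456789'  AS ssn FROM dual UNION ALL
        SELECT ' 123456789'  AS ssn FROM dual);
```

The third value mixes the two formats and the fourth has a leading blank, so neither alternative can match between the ^ and $ anchors.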
Comparing regular expressions with existing functionality
Regular expressions have several advantages over the familiar LIKE operator and the INSTR, SUBSTR, and REPLACE functions. These traditional SQL functions offer no facility for pattern matching. Only the LIKE operator performs character matching, through the % and _ wildcards, but LIKE does not support repetition of expressions, complex alternation, character ranges, character lists, or POSIX character classes. In addition, the new regular expression functions allow the detection of duplicate words and the swapping of patterns. The examples here give you an overview of regular expressions and of how you can use them in your applications.
Truly enrich your toolkit
Because they help solve complex problems, regular expressions are very powerful. Some of the functionality of regular expressions is difficult to duplicate with traditional SQL functions. Once you understand the basic building blocks of this slightly cryptic language, regular expressions will become an indispensable part of your toolkit, not only in the context of SQL but also in other programming languages. Although trial and error is sometimes necessary to get your individual patterns right, there is little doubt about the elegance and power of regular expressions.
Alice Rischert (Ar280@yahoo.com) is chair of the Database Application Development and Design track in the Computer Technology and Applications program at Columbia University. She wrote the Oracle SQL Interactive Workbook, 2nd edition (Prentice Hall, 2002) and the upcoming Oracle SQL by Example (Prentice Hall, 2003). Rischert has more than 15 years of experience as a database designer, DBA, and project lead at Fortune 100 companies, and she has used Oracle products since Oracle Version 5.
Table 1: Anchoring metacharacters
Metacharacter  Description
-------------  -----------
^              Anchors the expression to the start of a line
$              Anchors the expression to the end of a line
Table 2: Quantifiers, or repetition operators
Quantifier  Description
----------  -----------
*           Match 0 or more times
?           Match 0 or 1 time
+           Match 1 or more times
{m}         Match exactly m times
{m,}        Match at least m times
{m, n}      Match at least m times but no more than n times
Table 3: Predefined POSIX character classes
Character class  Description
---------------  -----------
[:alpha:]        Alphabetic characters
[:lower:]        Lowercase alphabetic characters
[:upper:]        Uppercase alphabetic characters
[:digit:]        Numeric digits
[:alnum:]        Alphanumeric characters
[:space:]        Space characters (nonprinting), such as carriage return, newline, vertical tab, and form feed
[:punct:]        Punctuation characters
[:cntrl:]        Control characters (nonprinting)
[:print:]        Printable characters
Table 4: Alternate matching and grouping of expressions
Metacharacter  Description
-------------  -----------
|              Alternation: separates alternatives; often used with the grouping operator ()
( )            Group: groups a subexpression into a unit for alternation, for quantifiers, or for backreferencing (see the "Backreferences" section)
[char]         Character list: indicates a character list; most metacharacters inside a character list are understood as literals, with the exception of character classes and the ^ and - metacharacters
Table 5: The REGEXP_LIKE operator
Syntax
REGEXP_LIKE(source_string, pattern
[, match_parameter])
Description
source_string supports the character datatypes (CHAR, VARCHAR2, CLOB, NCHAR, NVARCHAR2, and NCLOB, but not LONG). The pattern parameter is another name for the regular expression. match_parameter allows optional parameters such as handling the newline character, retaining multiline formatting, and controlling case sensitivity.
Table 6: The REGEXP_INSTR function
Syntax
REGEXP_INSTR(source_string, pattern
[, start_position [, occurrence [, return_option [, match_parameter]]]])
Description
This function looks for a pattern and returns the first position of the pattern. Optionally, you can specify the start_position at which you want to begin the search. The occurrence parameter defaults to 1 unless you indicate that you are looking for a subsequent occurrence. The default value of return_option is 0, which returns the starting position of the pattern; a value of 1 returns the starting position of the next character following the match.
Table 7: Explanation of the 5-digit + 4 zip code expression
Syntax     Description
---------  -----------
(blank)    A blank that must be matched
[          Start of the character list
[:digit:]  POSIX numeric digit class
]          End of the character list
{5}        Repeat the character list exactly five times
(          Start of the subexpression
-          A literal hyphen, because it is not a range metacharacter inside a character list
[          Start of the character list
[:digit:]  POSIX [:digit:] class
]          End of the character list
{4}        Repeat the character list exactly four times
)          Closing parenthesis, ending the subexpression
?          The ? quantifier matches the grouped subexpression 0 or 1 time, making the 4-digit code optional
$          Anchoring metacharacter, indicating the end of the line
Table 8: The REGEXP_SUBSTR function
Syntax
REGEXP_SUBSTR(source_string, pattern
[, position [, occurrence [, match_parameter]]])
Description
The REGEXP_SUBSTR function returns the substring that matches the pattern.
Table 9: The REGEXP_REPLACE function
Syntax
REGEXP_REPLACE(source_string, pattern
[, replace_string [, position [, occurrence [, match_parameter]]]])
Description
This function replaces the matching pattern with the specified replace_string, allowing complex search-and-replace operations.
Table 10: Backreference metacharacter
Metacharacter  Description
-------------  -----------
\digit         A backslash followed by a digit between 1 and 9; it matches the preceding digit-th parenthesized subexpression. (Note: The backslash has another meaning in regular expressions; depending on the context, it can also be the escape character.)
Table 11: Explanation of the pattern-swap regular expression
Item     Description
-------  -----------
(        Start of the first subexpression
.        Match any single character except a newline
*        Repetition operator; matches the preceding . metacharacter 0 to n times
)        End of the first subexpression; the result of the match is captured in \1 (in this example, Ellen)
(blank)  A blank that must be present
(        Start of the second subexpression
.        Match any single character except a newline
*        Repetition operator; matches the preceding . metacharacter 0 to n times
)        End of the second subexpression; the result of the match is captured in \2 (in this example, Hildi)
(blank)  A blank that must be present
(        Start of the third subexpression
.        Match any single character except a newline
*        Repetition operator; matches the preceding . metacharacter 0 to n times
)        End of the third subexpression; the result of the match is captured in \3 (in this example, Smith)
Table 12: Explanation of the Social Security number regular expression
Item       Description
---------  -----------
^          Start-of-line character (the regular expression cannot have any leading characters before the match)
(          Start of the subexpression that lists the alternatives separated by the | metacharacter
[          Start of the character list
[:digit:]  POSIX numeric digit class
]          End of the character list
{3}        Repeat the character list exactly three times
-          A hyphen
[          Start of the character list
[:digit:]  POSIX numeric digit class
]          End of the character list
{2}        Repeat the character list exactly two times
-          Another hyphen
[          Start of the character list
[:digit:]  POSIX numeric digit class
]          End of the character list
{4}        Repeat the character list exactly four times
|          Alternation metacharacter; ends the first choice and starts the next alternate expression
[          Start of the character list
[:digit:]  POSIX numeric digit class
]          End of the character list
{9}        Repeat the character list exactly nine times
)          Closing parenthesis, ending the subexpression group used for alternation
$          Anchoring metacharacter, indicating the end of the line; no extra characters can follow the pattern