SAS 9 First Look: Using Perl Regular Expressions in the DATA Step

xiaoxiao, 2021-03-06

Starting with version 9, SAS supports Perl regular expressions (Perl 5.6.1 syntax), which makes data validation much simpler. Before regular expressions (RE) were available, strings could only be handled with functions such as INDEX, SUBSTR, and TRANWRD, and these functions are inflexible and inefficient for this kind of work. With the RE functions introduced in SAS 9, it is easy to validate, replace, and extract text. A regular expression is built from special characters called metacharacters, each of which stands for a particular matching rule; for details, see http://www.perldoc.com/perl5.6.1/pod/perlre.html
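As a first taste before the longer cases, here is a minimal sketch of the two core calls, PRXPARSE and PRXMATCH. The sample string and the variable names are my own illustration, not part of the SAS examples.

```sas
data _null_;
   /* Hypothetical sample value, not from the original article */
   text = 'Order 4521 shipped';

   /* PRXPARSE compiles the pattern and returns a numeric pattern id */
   re = prxparse('/\d+/');

   /* PRXMATCH returns the 1-based position of the first match, or 0 if none */
   pos = prxmatch(re, text);
   put pos=;
run;
```

Here the first digit is the seventh character, so the log should show pos=7. Doing the same with INDEX would require searching for each digit separately.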

Several use cases follow.

1. Validating data

data _null_;
   retain re;
   length first last home business $ 16;

   if _N_ = 1 then do;
      /* Phone pattern 1: (XXX) XXX-XXXX */
      paren = "\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d";
      /* Phone pattern 2: XXX-XXX-XXXX */
      dash = "[2-9]\d\d-[2-9]\d\d-\d\d\d\d";

      /* Combine the two patterns with the | metacharacter */
      regexp = "/(" || paren || ")|(" || dash || ")/";
      /* Check that the regular expression compiled correctly */
      re = prxparse(regexp);
      if missing(re) then do;
         putlog "ERROR: Invalid regexp " regexp;
         stop;
      end;
   end;

   input first last home business;

   /* Run the match; PRXMATCH returns 0 when nothing matches */
   if ^prxmatch(re, home) then
      putlog "NOTE: Invalid home phone number for " first last home;
   if ^prxmatch(re, business) then
      putlog "NOTE: Invalid business phone number for " first last business;
   datalines;
Jerome Johnson (919)319-1677 (919)846-2198
Romeo Montague 800-899-2164 360-973-6201
Imani Rashid (508)852-2146 (508)366-9821
Palinor Kent ...

... number for Ruby Archuleta
NOTE: Invalid business phone number for Ruby Archuleta
NOTE: Invalid home phone number for Takei Ito 7042982145
NOTE: Invalid business phone number for Takei Ito
NOTE: Invalid home phone number for Tom Joad 209/963/2764
NOTE: Invalid business phone number for Tom Joad 2099-66-84

2. Replacing text: replace < with &lt; and > with &gt;
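The s/pattern/replacement/ form used in this example can be tried on a single value first. The sketch below uses the PRXCHANGE function, which accepts the pattern text directly, rather than the CALL PRXCHANGE routine; the sample string and variable names are hypothetical.

```sas
data _null_;
   length before after $ 40;
   /* Hypothetical sample text, not from the original article */
   before = 'a < b and b > c';

   /* s/pattern/replacement/ ; a times value of -1 replaces every occurrence */
   after = prxchange('s/</&lt;/', -1, before);
   after = prxchange('s/>/&gt;/', -1, after);
   put after=;
run;
```

The single quotes matter here: inside double quotes, SAS would try to resolve &lt and &gt as macro variable references.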

data _null_;
   retain lt_re gt_re;

   if _N_ = 1 then do;
      /* A substitution pattern has the form s/regular expression/replacement text/ */
      lt_re = prxparse('s/</&lt;/');
      gt_re = prxparse('s/>/&gt;/');

      if missing(lt_re) or missing(gt_re) then do;
         putlog "ERROR: Invalid regexp.";
         stop;
      end;
   end;

   input;
   /* Apply both substitutions to the input buffer; -1 replaces every occurrence */
   call prxchange(lt_re, -1, _infile_);
   call prxchange(gt_re, -1, _infile_);

   put _infile_;
   datalines4;
The bracketing construct ( ... ) creates capture buffers. To refer to
the digit'th buffer use \<digit> within the match. Outside the match
use "$" instead of "\". (The \<digit> notation works in certain
circumstances outside the match. See the warning below about \1 vs $1
for details.) Referring back to another part of the match is called a
backreference.
;;;;

The output is as follows:

The bracketing construct ( ... ) creates capture buffers. To refer to
the digit'th buffer use \&lt;digit&gt; within the match. Outside the match
use "$" instead of "\". (The \&lt;digit&gt; notation works in certain
circumstances outside the match. See the warning below about \1 vs $1
for details.) Referring back to another part of the match is called a
backreference.

3. Extracting text: pull the customer's business phone number out of the customer information

data _null_;
   retain re areacode_re;
   length first last home business $ 16;
   length areacode $ 3;

   if _N_ = 1 then do;
      /* (XXX) XXX-XXXX, with the area code in its own capture buffer */
      paren = "\(([2-9]\d\d)\) ?[2-9]\d\d-\d\d\d\d";
      /* XXX-XXX-XXXX, with the area code in its own capture buffer */
      dash = "([2-9]\d\d)-[2-9]\d\d-\d\d\d\d";

      /* Combine the two phone patterns into one with a | */
      regexp = "/(" || paren || ")|(" || dash || ")/";
      re = prxparse(regexp);
      if missing(re) then do;
         putlog "ERROR: Invalid regexp " regexp;
         stop;
      end;

      areacode_re = prxparse("/828|336|704|910|919|252/");
      if missing(areacode_re) then do;
         putlog "ERROR: Invalid area code regexp";
         stop;
      end;
   end;

   input first last home business;

   if ^prxmatch(re, home) then
      putlog "NOTE: Invalid home phone number for " first last home;

   if prxmatch(re, business) then do;
      /* Number of the last capture buffer that took part in the match */
      which_format = prxparen(re);
      /* Position and length of that buffer, then extract the area code */
      call prxposn(re, which_format, pos, len);
      areacode = substr(business, pos, len);
      /* Print the record only if the area code is one of those listed */
      if prxmatch(areacode_re, areacode) then
         put "In North Carolina: " first last business;
   end;
   else
      putlog "NOTE: Invalid business phone number for " first last business;
   datalines;
Jerome Johnson (919)319-1677 (919)846-2198
Romeo Montague 800-979-2164 360-973-6201
Imani Rashid (508)852-2146 (508)366-9821
Palinor Kent 704-782-4673 704-782-3199
Ruby Archuleta 905-384-2839 905-328-3892
Takei Ito 704-298-2145 704-298-4738
Tom Joad 515-372-4829 515-389-2838
;

The output is as follows:

In North Carolina: Jerome Johnson (919)846-2198
In North Carolina: Palinor Kent 704-782-3199
In North Carolina: Takei Ito 704-298-4738
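The example above gets the area code in two steps, CALL PRXPOSN for position and length and then SUBSTR. As far as I know, SAS 9.1 and later also offer a function form of PRXPOSN that returns the captured text directly. A minimal sketch, assuming that function form is available and using a hypothetical sample value and a single phone pattern:

```sas
data _null_;
   length phone $ 16 areacode $ 3;
   /* Hypothetical sample value, not taken from the data above */
   phone = '704-782-3199';

   /* One capture buffer: the three-digit area code */
   re = prxparse('/([2-9]\d\d)-[2-9]\d\d-\d\d\d\d/');

   if prxmatch(re, phone) then do;
      /* Function form: returns the text of capture buffer 1 */
      areacode = prxposn(re, 1, phone);
      put areacode=;
   end;
run;
```

With only one pattern there is no need for PRXPAREN; it earns its place in the full example because the two alternatives put the area code in different capture buffers.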

The source code above comes from the SAS website; I have only added some comments to help readers who are new to the topic. See the SAS website for details.

When reposting, please cite the original address: https://www.9cbs.com/read-114275.html
