Paper 265-29: An Introduction to Perl Regular Expressions in SAS 9

SUGI 29 Tutorials

using the forward slashes as the default Perl delimiters. The letters "cat" inside the slashes specify an exact match to the characters "cat". Each time you compile a regular expression, SAS assigns sequential numbers to the resulting expression. This number is needed to perform searches by the other PRX functions such as PRXMATCH, PRXCHANGE, PRXNEXT, PRXSUBSTR, PRXPAREN, or PRXPOSN. Thus, the value of PATTERN_NUM in this program is one. In this simple example, the PRXMATCH function is used to return the position of the word "cat" in each of the strings. The two arguments in the PRXMATCH function are the return code from the PRXPARSE function and the string to be searched. The result is the first position where the word "cat" is found in each string. If there is no match, the PRXMATCH function returns a zero. Let's look at the output from Program 1:

   Perl Regular Expression Tutorial - Program 1

   PATTERN_NUM=1 STRING=There is a cat in this line POSITION=12
   PATTERN_NUM=1 STRING=Does not match CAT POSITION=0
   PATTERN_NUM=1 STRING=cat in the beginning POSITION=1
   PATTERN_NUM=1 STRING=At the end, a cat POSITION=15
   PATTERN_NUM=1 STRING=cat POSITION=1

Notice that the value of PATTERN_NUM is 1 in each observation and the value of POSITION is the location of the letter "c" in "cat" in each of the strings. In the second line of output, the value of POSITION is 0 since the word "cat" (lowercase) was not present in that string.

Be careful. Spaces count. For example, if you change the PRXPARSE line to read:

   IF _N_ = 1 THEN PATTERN_NUM = PRXPARSE("/ cat /");

then the output will be:

   PATTERN_NUM=1 STRING=There is a cat in this line POSITION=11
   PATTERN_NUM=1 STRING=Does not match CAT POSITION=0
   PATTERN_NUM=1 STRING=cat in the beginning POSITION=0
   PATTERN_NUM=1 STRING=At the end, a cat POSITION=14
   PATTERN_NUM=1 STRING=cat POSITION=0

Notice that the strings in lines 3 and 5 no longer match because the regular expression has a space before and after the word "cat". The reason there is a match in the fourth observation is that the length of STRING is 30 and there are trailing blanks after the word "cat".

Perl regular expressions use special characters (called metacharacters) to represent classes of characters. (Named in honor of Will Rogers: "I never meta character I didn't like.") Before we present a table of Perl regular expression metacharacters, it is instructive to introduce a few of the more useful ones. The expression \d refers to any digit (0-9), \D to any non-digit, and \w to any word character (A-Z, a-z, 0-9, and _). The three metacharacters *, +, and ? are particularly useful because they add quantity to a regular expression. For example, the * matches the preceding subexpression zero or more times, the + matches the previous subexpression one or more times, and the ? matches the previous expression zero or one times. So, here are a few examples using these characters:

   PRXPARSE("/\d\d\d/")         matches any three digits in a row
   PRXPARSE("/\d+/")            matches one or more digits
   PRXPARSE("/\w\w\w* /")       matches any word with two or more characters followed by a space
   PRXPARSE("/\w\w? +/")        matches one or two word characters such as x, xy, or _X followed by one or more spaces
   PRXPARSE("/(\w\w) +(\d) +/") matches two word characters, followed by one or more spaces, followed by a single digit, followed by one or more spaces

Note that in the last example the expression for the two word characters (\w\w) is placed in parentheses. Using the parentheses in this way creates what is called a capture buffer. The second set of parentheses (around the \d) represents the second capture buffer. Several of the Perl regular expression functions can make use of these capture buffers to extract and/or replace specific portions of a string. For example, the location of the two word characters or the single digit can be obtained using the PRXPOSN function.
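To make the idea of capture buffers concrete, here is a minimal sketch (not part of the original paper) that compiles the expression (\w\w) +(\d) +, tests each line with PRXMATCH, and then asks the PRXPOSN call routine where each buffer was found. The variable names RE, START1, LENGTH1, START2, and LENGTH2 and the two data lines are made up for illustration.

```sas
DATA _NULL_;
   /* Compile once. Each pair of parentheses creates a capture buffer */
   IF _N_ = 1 THEN RE = PRXPARSE("/(\w\w) +(\d) +/");
   RETAIN RE;
   INPUT STRING $CHAR20.;
   IF PRXMATCH(RE,STRING) GT 0 THEN DO;
      /* Buffer 1: the two word characters */
      CALL PRXPOSN(RE,1,START1,LENGTH1);
      /* Buffer 2: the single digit */
      CALL PRXPOSN(RE,2,START2,LENGTH2);
      PUT STRING= START1= LENGTH1= START2= LENGTH2=;
   END;
DATALINES;
ab 7 xyz
no match here
;
RUN;
```

For the first data line, the two word characters "ab" begin in position 1 (length 2) and the digit "7" is in position 4 (length 1). The second line contains no digit, so PRXMATCH returns 0 and nothing is written for it. The $CHAR20. informat keeps trailing blanks in STRING, which is what lets the final " +" in the expression match.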
Remember that the quotes are needed by the PRXPARSE function and the outer slashes are used to delimit the regular expression. Since the backslash, forward slash, parentheses, and several other characters have special meaning in a regular expression, you may wonder: how do you search a string for a \ character or a left or right parenthesis? You do this by preceding any of these special characters with a backslash character (in Perl jargon, called an escape character). So, to match a \ in a string, you code two backslashes like this: \\. To match an open parenthesis, you use \(.

The table below describes several of the wild cards and metacharacters used with regular expressions:

Metacharacter  Description                                      Examples
*              Matches the previous subexpression zero or       cat* matches "cat", "cats", "catanddog";
               more times                                       c(at)* matches "c", "cat", and "catatat"
+              Matches the previous subexpression one or        \d+ matches one or more digits
               more times
?              Matches the previous subexpression zero or       hello? matches "hell" and "hello"
               one times
. (period)     Matches exactly one character                    r.n matches "ron", "run", and "ran"
\d             Matches a digit 0 to 9                           \d\d\d matches any three-digit number
\D             Matches a non-digit                              \D\D matches "xx" and "ab"
^              Matches the beginning of the string              ^cat matches "cat" and "cats" but not "the cat"
$              Matches the end of a string                      cat$ matches "the cat" but not "cat in the hat"
[xyz]          Matches any one of the characters in the         ca[tr] matches "cat" and "car"
               square brackets
[a-e]          Matches the letters a to e                       [a-e]\D+ matches "adam", "edam", and "car"
[a-eA-E]       Matches the letters a to e or A to E             [a-eA-E]\w+ matches "Adam" and "edam"
{n}            Matches the previous subexpression n times       \d{5} matches any 5-digit number and is
                                                                equivalent to \d\d\d\d\d
{n,}           Matches the previous subexpression n or          \w{3,} matches "cat" and "_NULL_" and is
               more times                                       equivalent to \w\w\w+
{n,m}          Matches the previous subexpression n or          \w{3,5} matches "abc", "abcd", and "abcde"
               more times, but no more than m
[^abcxyz]      Matches any characters except abcxyz             [^8]\d\d matches "123" and "999" but not a
                                                                three-digit number that starts with 8
x|y            Matches x or y                                   c(a|o)t matches "cat" and "cot"
\s             Matches a white space character, including       \d+\s+\d+ matches one or more digits followed
               a space or a tab                                 by one or more spaces, followed by one or
                                                                more digits, such as "123 4" (note the space)
\w             Matches any word character (upper- and           \w\w\w matches any three word characters
               lowercase letters, digits, and underscore)
\(             Matches the character (                          \(\d\d\d\) matches three digits in
                                                                parentheses, such as "(123)"
\)             Matches the character )                          \(\d\d\d\) matches three digits in
                                                                parentheses, such as "(123)"
\\             Matches the character \                          \D\\\D matches a \ between two non-digits
\1             Matches the previous capture buffer and is       (\d\D\d)\1 matches "9a99a9" but not "9a97b7";
               called a back reference                          (.)\1 matches any two repeated characters

This is not a complete list of Perl metacharacters, but it's enough to get you started. The Version 9 Online Doc or any book on Perl programming will provide you with more details. Examples of each of the PRX functions in this tutorial will also help you understand how to write these expressions.

Function used to define a regular expression

Function: PRXPARSE
Purpose: To define a Perl regular expression to be used later by the other Perl regular expression functions.

Syntax: PRXPARSE(Perl-regular-expression)

Perl-regular-expression is a Perl regular expression. Please see examples in the tutorial and in the sample programs in this chapter. The PRXPARSE function is usually executed only once in a DATA step and the return value is retained.

The forward slash "/" is the default delimiter. However, you may use any non-alphanumeric character instead of "/". Matching brackets can also be used as delimiters. Look at the last few examples below to see how other delimiters may be used.

If you want the search to be case-insensitive, you can follow the final delimiter with an "i". For example, PRXPARSE("/cat/i") will match Cat, CAT, or cat (see example 4 below).

Function                       Matches                                Does not Match
PRXPARSE("/cat/")              "The cat is black"                     "cots"
PRXPARSE("/^cat/")             "cat on the roof"                      "The cat"
PRXPARSE("/cat$/")             "There is a cat"                       "cat in the house"
PRXPARSE("/cat/i")             "The CaT"                              "no dogs allowed"
PRXPARSE("/r[aeiou]t/")        "rat", "rot", "rut"                    "rt" and "rxt"
PRXPARSE("/\d\d\d /")          "345 " and "999 " (three digits        "1234" and "99"
                               followed by a space)
PRXPARSE("/\d\d\d?/")          "123" and "12" (any two or three       "1", "1AB", "1 9"
                               digits)
PRXPARSE("/\d\d\d+/")          "123" and "12345" (three or more       "12X"
                               digits)
PRXPARSE("/\d\d\d*/")          "123", "12", "12345" (two or more      "1" and "xyz"
                               digits)
PRXPARSE("/r.n/")              "ron", "ronny", "r9n", "r n"           "rn"
PRXPARSE("/[1-5]\d[6-9]/")     "299", "106", "337"                    "666", "919", "11"
PRXPARSE("/(\d|x)\d/")         "56" and "x9"                          "9x" and "xx"
PRXPARSE("/[^a-e]\D/")         "fX", "9 ", "AA"                       "aa", "99", "b"
PRXPARSE("/^\/\//")            "//sysin dd *"                         "the // is here"
PRXPARSE("/^\/(\/|\*)/")       a "//" or "/*" in cols 1 and 2         "123 /*"
PRXPARSE("#^/(/|\*)#")         equivalent to previous expression
PRXPARSE("[\d\d]")             any two digits                         "ab"
PRXPARSE("<cat>")              "the cat is black"                     "cots"

Functions to locate text patterns

   PRXSUBSTR (call routine)
   PRXPOSN (call routine)
   PRXNEXT (call routine)

Function: PRXMATCH

Purpose: To locate the position in a string where a regular expression match is found. This function returns the first position in a string expression of the pattern described by the regular expression. If this pattern is not found, the function returns a zero.

Syntax: PRXMATCH(pattern-id or regular-expression, string)

pattern-id is the value returned from the PRXPARSE function.
regular-expression is a Perl regular expression, placed in quotation marks (version 9.1 and higher).

string is a character variable or a string literal.

Regular Expression   String                            Returns   Does not match (returns 0)
/cat/                "The cat is black"                5         "cots"
/^cat/               "cat on the roof"                 1         "The cat"
/cat$/               "There is a cat"                  12        "cat in the house"
/cat/i               "The CaT"                         5         "no dogs allowed"
/r[aeiou]t/          "rat", "rot", "rut"               1         "rt" and "rxt"
/\d\d\d /            "345 " and "999 "                 1         "1234" and "99"
/\d\d\d?/            "123" and "12"                    1         "1", "1AB", "1 9"
/\d\d\d+/            "123" and "12345"                 1         "12"
/\d\d\d*/            "123", "12", "12345"              1         "1" and "xyz"
/r.n/                "ron", "ronny", "r9n", "r n"      1         "rn"
/[1-5]\d[6-9]/       "299", "106", "337"               1         "666", "919", "11"
/(\d|x)\d/           "56" and "x9"                     1         "9x" and "xx"
/[^a-e]\D/           "fX", "9 ", "AA"                  1         "aa", "99", "b"
/^\/\//              "//sysin dd *"                    1         "the // is here"
/^\/(\/|\*)/         a "//" or "/*" in cols 1 and 2    1         "123 /*"

Examples of PRXMATCH without using PRXPARSE (STRING = "The cat in the hat"):

Function                        Result
PRXMATCH("/cat/",STRING)        5
PRXMATCH("/\d\d+/","AB123")     3

Program 2: Using a regular expression to search for phone numbers in a string

   ***Primary functions: PRXPARSE, PRXMATCH;
   DATA PHONE;
      IF _N_ = 1 THEN PATTERN = PRXPARSE("/\(\d\d\d\) ?\d\d\d-\d{4}/");
      /* Regular expression will match any phone number in the form:
         (nnn)nnn-nnnn or (nnn) nnn-nnnn.

         \(       matches a left parenthesis
         \d\d\d   matches any three digits
         \)       matches a right parenthesis
         (blank)? matches zero or one blank
         \d\d\d   matches any three digits
         -        matches a dash
         \d{4}    matches any four digits
      */
      RETAIN PATTERN;
      INPUT STRING $CHAR40.;
      IF PRXMATCH(PATTERN,STRING) GT 0 THEN OUTPUT;
   DATALINES;
   One number (123)333-4444
   Two here (800)234-2222 and (908) 444-2344
   ;
   PROC PRINT DATA=PHONE NOOBS;
      TITLE "Listing of Data Set Phone";
   RUN;

Explanation

To search for an open parenthesis, you use a \(. The three \d's specify any three digits. The closed parenthesis is written as \). The space followed by the ? means zero or one space. This is followed by any three digits and a dash. Following the dash are any four digits. The notation \d{4} is a short way of writing \d\d\d\d. The number in the braces indicates how many times to repeat the previous subexpression. Since you only execute the PRXPARSE function once, remember to RETAIN the value returned by the function.

Since the PRXMATCH function returns the first position of a match, any line containing one or more valid phone numbers will return a value greater than zero. Output from PROC PRINT is shown next. The second number on the second line is cut off because STRING is read with a length of 40.

   Listing of Data Set Phone

   PATTERN    STRING
      1       One number (123)333-4444
      1       Two here (800)234-2222 and (908) 444-234

Program 3: Modifying Program 2 to search for toll-free phone numbers

   ***Primary functions: PRXPARSE, PRXMATCH;
   ***Other function: MISSING;
   DATA TOLL_FREE;
      IF _N_ = 1 THEN DO;
         RE = PRXPARSE("/\(8(00|77|87)\) ?\d\d\d-\d{4}\b/");
         /* Regular expression looks for phone numbers of the form:
            (nnn)nnn-nnnn or (nnn) nnn-nnnn. In addition, the first
            digit of the area code must be an 8 and the next two
            digits must be either a 00, 77, or 87. */
         IF MISSING(RE) THEN DO;
            PUT "ERROR IN COMPILING REGULAR EXPRESSION";
            STOP;
         END;
      END;
      RETAIN RE;
      INPUT STRING $CHAR80.;
      POSITION = PRXMATCH(RE,STRING);
      IF POSITION GT 0 THEN OUTPUT;
   DATALINES;
   One number on this line (877)234-8765
   No numbers here
   One toll free, one not (908)782-6354 and (800)876-3333 xxx
   Two toll free (800)282-3454 and (887) 858-1234
   No toll free here (609)848-9999 and (908) 345-2222
   ;
   PROC PRINT DATA=TOLL_FREE NOOBS;
      TITLE "Listing of Data Set TOLL_FREE";
   RUN;

Explanation

Several things have been added to this program compared to the previous one. First, the regular expression now searches for numbers that begin with either (800), (877), or (887). This is accomplished by placing an "8" in the first position and then using the or operator (the |) to select either 00, 77, or 87 as the next two digits. One other difference between this expression and the one used in the previous program is that the number is followed by a word boundary (\b), such as a space or the end of the line. Hopefully, you're starting to see the impressive power of regular expressions by now. The MISSING function tests if its argument is missing or not. If you have an invalid regular expression, the value of RE will be missing and the MISSING function will return a true value. See the listing of data set TOLL_FREE below.

   Listing of Data Set TOLL_FREE

   RE    STRING                                                        POSITION
    1    One number on this line (877)234-8765                            25
    1    One toll free, one not (908)782-6354 and (800)876-3333 xxx       42
    1    Two toll free (800)282-3454 and (887) 858-1234                   15

Program 4: Using PRXMATCH without PRXPARSE (entering the regular expression directly in the function)

   ***Primary functions: PRXMATCH;
   DATA MATCH_IT;
      INPUT @1 STRING $20.;
      POSITION = PRXMATCH("/\d\d\d/",STRING);
   DATALINES;
   LINE 345 IS HERE
   ABC1234567
   ;
   PROC PRINT DATA=MATCH_IT NOOBS;
      TITLE "Listing of Data ...

Paper 265-29: An Introduction to Perl Regular Expressions in SAS 9. Ron Cody, Robert Wood Johnson Medical School, Piscataway, NJ. Introduction: Perl regular expressions were added to SAS in Version 9
