Regular Expressions Perl

  • Date:14 May 2020
  • Pages:32
  • Size:729.41 KB

Transcription:

What Are They?

The term "Regular Expression" (now commonly abbreviated to "RegExp" or even "RE") simply refers to a pattern that follows the rules of syntax outlined in the rest of this chapter. Regular expressions are not limited to perl: Unix utilities such as sed and egrep use the same notation for finding patterns in text. So why aren't they just called "search patterns" or something less obscure?

Well, the actual phrase itself originates from the mid-fifties, when a mathematician called Stephen Kleene developed a notation for manipulating "regular sets". Perl's regular expressions have grown and grown beyond the original notation and have significantly extended the original system, but some of Kleene's notation remains, and the name has stuck.

History lessons aside, it's all about identifying patterns in text. So what constitutes a pattern? And how do you compare it against something?

The simplest pattern is a word, a simple sequence of characters, and we may, for example, want to ask perl whether a certain string contains that word. Now, we can do this with the techniques we have already seen: we want to split the string into separate words, and then test to see if each word is the one we're looking for. Here's how we might do that:

    #!/usr/bin/perl
    # match1.plx
    use warnings;
    use strict;

    my $found = 0;
    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

    my $sought = "people";

    # split with no arguments breaks $_ on whitespace
    foreach my $word (split) {
        if ($word eq $sought) {
            $found = 1;
            last;
        }
    }

    if ($found) {
        print "Hooray! Found the word 'people'\n";
    }

Sure enough, the program returns success:

    perl match1.plx
    Hooray! Found the word 'people'

But that's messy! It's complicated, and it's slow to boot. Worse still, the split function (which breaks each of our lines up into a list of "words"; we'll see more of this later on in the chapter) actually keeps all the punctuation: the string 'you' wouldn't be found in the above, whereas 'you...' would. This looks like a hard problem, but it should be easy. Perl was designed to make easy tasks easy and hard things possible, so there should be a better way to do this. This is how it looks using a regular expression:

    #!/usr/bin/perl
    # match1.plx
    use warnings;
    use strict;

    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

    if ($_ =~ /people/) {
        print "Hooray! Found the word 'people'\n";
    }

This is much, much easier, and yields the same result. We place the text we want to find between forward slashes: that's the regular expression part, that's our pattern, what we're trying to match. We also need to tell perl which particular string we want to search for that pattern. We do this with the =~ operator. This returns 1 if the pattern match was successful (in our case, whether the character sequence 'people' was found in the string) and a false value (the empty string) if it wasn't.

Before we go on to more complicated patterns, let's just have a quick look at that syntax. As we noted previously, a lot of Perl's operations take $_ as a default argument, and regular expressions are one such operation. Since we have the text we want to test in $_, we don't need to use the =~ operator to "bind" the pattern to another string. We could write the above even more simply:

    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

    if (/people/) {
        print "Hooray! Found the word 'people'\n";
    }

Alternatively, we might want to test for the pattern not matching, that is, the word not being found. Obviously, we could say unless (/people/), but if the text we're looking at isn't in $_, we may also use the negative form of that operator, which is !~. For example:

    #!/usr/bin/perl
    # nomatch.plx
    use warnings;
    use strict;

    my $gibson =
        "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

    if ($gibson !~ /fish/) {
        print "There are no fish in William Gibson.\n";
    }

True to form, for cyberpunk books that don't regularly involve fish, we get the result:

    perl nomatch.plx
    There are no fish in William Gibson.
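To make the difference between the word-by-word approach and the two binding operators concrete, here is a minimal sketch. The file name and the helper subroutines found_by_split, has_people, and lacks_fish are illustrative assumptions and do not appear in the original chapter; only the quotation and the patterns come from the text above.

```perl
#!/usr/bin/perl
# bind_sketch.plx (hypothetical file name, for illustration only)
use warnings;
use strict;

my $gibson =
    "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

# Word-by-word comparison, as in the first match1.plx: split on whitespace
# and compare each piece with eq. Punctuation stays attached to the pieces.
sub found_by_split {
    my ($text, $sought) = @_;
    foreach my $word (split ' ', $text) {
        return 1 if $word eq $sought;
    }
    return 0;
}

# Positive match bound to an explicit string with =~.
sub has_people {
    my ($text) = @_;
    return $text =~ /people/ ? 1 : 0;
}

# Negated match with !~: true when the pattern does not occur.
sub lacks_fish {
    my ($text) = @_;
    return $text !~ /fish/ ? 1 : 0;
}

print "split finds 'people': ", found_by_split($gibson, 'people'), "\n";
print "split finds 'you':    ", found_by_split($gibson, 'you'), "\n";
print "=~ /people/:          ", has_people($gibson), "\n";
print "!~ /fish/:            ", lacks_fish($gibson), "\n";

# With the text in $_, the pattern needs no binding operator at all.
for ($gibson) {
    print "/you/ matches \$_\n" if /you/;
}
```

The split-based search misses 'you' because the piece it actually sees is 'you...', while the pattern /you/ finds it without any extra work. The helpers return 1 or 0 explicitly so that the results print cleanly.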
Literal text is the simplest regular expression of all to look for, but we needn't look for just the one word: we could look for any particular phrase. However, we need to make sure that we exactly match all the characters: words (with correct capitalization), numbers, punctuation, and even whitespace:

    #!/usr/bin/perl
    # match2.plx
    use warnings;
    use strict;

    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

    if (/I do/) {
        print "'I do' is in that string.\n";
    }

    if (/sometimes Case/) {
        print "'sometimes Case' matched.\n";
    }

Let's run this program and see what happens:

    perl match2.plx
    'I do' is in that string.

The other string didn't match, even though those two words are there. This is because everything in a regular expression has to match the string, from start to finish: first "sometimes", then a space, then "Case". In $_, there was a comma before the space, so it didn't match exactly. Similarly, spaces inside the pattern are significant:

    #!/usr/bin/perl
    # match3.plx
    use warnings;
    use strict;

    my $test1 = "The dog is in the kennel";
    my $test2 = "The sheepdog is in the field";

    if ($test1 =~ / dog/) {
        print "This dog's at home.\n";
    }

    if ($test2 =~ / dog/) {
        print "This dog's at work.\n";
    }

This will only find the first dog, as perl was looking for a space followed by the three letters 'dog':

    perl match3.plx
    This dog's at home.

So, for the moment, it looks like we shall have to specify our patterns with absolute precision. As another example, look at this:

    #!/usr/bin/perl
    # match4.plx
    use warnings;
    use strict;

    $_ = "Nobody wants to hurt you... 'cept, I do hurt people sometimes, Case.";

    if (/case/) {
        print "I guess it's just the way I'm made.\n";
    } else {
        print "Case? Where are you, Case?\n";
    }

    perl match4.plx
    Case? Where are you, Case?

Hmm, no match. Why not? Because we asked for a small 'c' when we had a big 'C': regexps are (if you'll pardon the pun) case-sensitive. We can get around this by asking perl to compare insensitively, and we do this by putting an 'i' (for 'insensitive') after the closing slash. If we alter the code above as follows:

    if (/case/i) {
        print "I guess it's just the way I'm made.\n";
    } else {
        print "Case? Where are you, Case?\n";
    }

then we find him:

    perl match4.plx
    I guess it's just the way I'm made.

This 'i' is one of several modifiers that we can add to the end of the regular expression to change its behavior slightly. We'll see more of them later on.

Interpolation

Regular expressions work a little like double-quoted strings: variables and metacharacters are interpolated. This allows us to store patterns in variables and determine what we are matching when we run the program; we don't need to have them hard-coded in.

Try It Out: Pattern Tester

This program will ask the user for a pattern and then test to see if it matches our string. We can use this throughout the chapter to help us test the various different styles of pattern we'll be looking at:

    #!/usr/bin/perl
    # matchtest.plx
    use warnings;
    use strict;

    $_ = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);
    # Tolkien, Lord of the Rings

    print "Enter some text to find: ";
    my $pattern = <STDIN>;
    chomp($pattern);

    if (/$pattern/) {
        print "The text matches the pattern '$pattern'.\n";
    } else {
        print "'$pattern' was not found.\n";
    }

Now we can test out a few things:

    perl matchtest.plx
    Enter some text to find: wonder
    The text matches the pattern 'wonder'.
    perl matchtest.plx
    Enter some text to find: entish
    'entish' was not found.
    perl matchtest.plx
    Enter some text to find: hough
    The text matches the pattern 'hough'.
    perl matchtest.plx
    Enter some text to find: and 'no'
    The text matches the pattern 'and 'no''.

Pretty straightforward, and I'm sure you could all spot those not in $_ as well.

How It Works

matchtest.plx has its basis in the three lines:

    my $pattern = <STDIN>;
    chomp($pattern);
    if (/$pattern/) {

We're taking a line of text from the user. Then, since it will end in a new line, and we don't necessarily want to find a new line in our pattern, we chomp it away. Now we do our test.

Since we're not using the =~ operator, the test will be looking at the variable $_. The regular expression is /$pattern/, and just like the double-quoted string "$pattern", the variable $pattern is interpolated. Hence, the regular expression is purely and simply whatever the user typed in, once we've got rid of the new line.

Escaping Special Characters

Of course, regular expressions can be more than just words and spaces. The rest of this chapter is going to be about the various ways we can specify more advanced matches, where portions of the match are allowed to be one of a number of characters, or where the match must occur at a certain position in the string. To do this, we'll be describing the special meanings given to certain characters, called metacharacters, and look at what these meanings are and what sort of things we can express.

At this stage, we might not want to use their special meanings: we may want to literally match the characters themselves. As you've already seen with double-quoted strings, we can use a backslash to escape these characters' special meanings. Hence, if you want to match '...' in the above text, you need your pattern to say '\.\.\.'. For example:

    perl matchtest.plx
    Enter some text to find: Ent+
    The text matches the pattern 'Ent+'.
    perl matchtest.plx
    Enter some text to find: Ent\+
    'Ent\+' was not found.

We'll see later why the first one matched: it is due to the special meaning of +.

These are the characters that are given special meaning within a regular expression, which you will need to backslash if you want to use literally:

    . * ? + [ ] ( ) { } ^ $ | \

Any other characters automatically assume their literal meanings.

You can also turn off the special meanings using the escape sequence \Q. After perl sees \Q, the 14 special characters above will automatically assume their ordinary, literal meanings. This remains the case until perl sees either \E or the end of the pattern.

For instance, if we wanted to adapt our matchtest program just to look for literal strings, instead of regular expressions, we could change it to look like this:

    if (/\Q$pattern\E/) {

Now the meaning of + is turned off:

    perl matchtest.plx
    Enter some text to find: Ent+
    'Ent+' was not found.

Note that all \Q does is turn off the regular expression magic of those 14 characters above; it doesn't stop, for example, variable interpolation.

Don't forget to change this back again: we'll be using matchtest.plx throughout the chapter to demonstrate the regular expressions we look at. We'll need that magic fully functional!

So far, our patterns have all tried to find a match anywhere in the string. The first way we'll extend our regular expressions is by dictating to perl where the match must occur. We can say "these characters must match the beginning of the string" or "this text must be at the end of the string". We do this by anchoring the match to either end.

The two anchors we have are ^, which appears at the beginning of the pattern and anchors a match to the beginning of the string, and $, which appears at the end of the pattern and anchors it to the end of the string. So, to see if our quotation ends in a full stop (and remember that the full stop is a special character) we say something like this:

    perl matchtest.plx
    Enter some text to find: \.$
    The text matches the pattern '\.$'.

That's a full stop (which we've escaped to prevent it being treated as a special character) and a dollar sign at the end of our pattern, to show that this must be the end of the string.

Try, if you can, to get into the habit of reading out regular expressions in English. Break them into pieces and say what each piece does. Also remember to say that each piece must immediately follow the other in the string in order to match. For instance, the above could be read "match a full stop immediately followed by the end of the string".

If you can get into this habit, you'll find that reading and understanding regular expressions becomes a lot easier, and you'll be able to "translate" back into Perl more naturally as well.

Here's another example: do we have a capital I at the beginning of the string?

    perl matchtest.plx
    Enter some text to find: ^I
    '^I' was not found.

We use ^ to mean "beginning of the string", followed by an I. In our case, though, the character at the beginning of the string is a ", so our pattern does not match. If you know that what you're looking for can only occur at the beginning or the end of the string, it's extremely efficient to use anchors. Instead of searching through the whole string to see whether the match succeeded, perl only needs to look at a small portion and can give up immediately if even the first character does not match.

Let's see one more example of this, where we'll combine looking for matches with looking through the lines in a file.

Try It Out: Rhyming Dictionary

Imagine yourself as a poor poet. In fact, not just poor, but downright bad: so bad, you can't even think of a rhyme for 'pink'. So, what do you do? You do what every sensible poet does in this situation, and you write the following Perl program:

    #!/usr/bin/perl
    # rhyming.plx
    use warnings;
    use strict;

    my $syllable = "ink";
    while (<>) {
        print if /$syllable$/;
    }

We can now feed it a file of words and find those that end in 'ink':
    perl rhyming.plx wordlist.txt

For a really thorough result, you'll need to use a file containing every word in the dictionary (be prepared to wait, though, if you do!). For the sake of the example, however, any text-based file will do (though it'll help if it's in English). A bobolink, in case you're wondering, is a migratory American songbird, otherwise known as a ricebird or reedbird.

How It Works

With the loops and tests we learned in the last chapter, this program is really very easy:

    while (<>) {
        print if /$syllable$/;
    }

We've not looked at file access yet, so you may not be familiar with the while

"11:15, restate my assumptions: 1. Mathematics is the language of nature. 2. Everything around us can be represented and understood through numbers. 3. If you graph these numbers, patterns emerge. Therefore: There are patterns everywhere in nature." (Max Cohen in Pi, 1998)

Whether or not you agree that Max's assumptions give rise to his conclusion is your own opinion, but
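As a recap of the escaping and anchoring rules described in this chapter, the following minimal sketch runs a few of the patterns from the matchtest.plx sessions against the same quotation, once as regular expressions and once as literal text wrapped in \Q and \E. The file name and the helper subroutines matches_regex and matches_literal are illustrative assumptions, not part of the original chapter.

```perl
#!/usr/bin/perl
# anchors_sketch.plx (hypothetical file name, for illustration only)
use warnings;
use strict;

my $quote = q("I wonder what the Entish is for 'yes' and 'no'," he thought.);

# Illustrative helper: treat $pattern as a regular expression, the way
# matchtest.plx does, and report 1 for a match and 0 otherwise.
sub matches_regex {
    my ($text, $pattern) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

# Illustrative helper: treat $pattern as literal text. \Q switches off the
# special meanings after $pattern has been interpolated; \E switches them
# back on.
sub matches_literal {
    my ($text, $pattern) = @_;
    return $text =~ /\Q$pattern\E/ ? 1 : 0;
}

# Patterns exactly as a user would type them at the matchtest.plx prompt.
my @patterns = (
    '\.$',      # a full stop immediately followed by the end of the string
    '^I',       # the beginning of the string immediately followed by an I
    'Ent+',     # + keeps its special meaning when used as a regex
    'Ent\+',    # backslashed +, so a literal plus sign is required
);

foreach my $pattern (@patterns) {
    printf "%-6s regex: %d  literal: %d\n",
        $pattern,
        matches_regex($quote, $pattern),
        matches_literal($quote, $pattern);
}
```

Single-quoted strings are used for the patterns so that the backslashes and the dollar sign reach the regular expression untouched, just as they would when read from STDIN.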
