Language Processing with Perl and Prolog Chapter 2

Language Technology, Chapter 2: Corpus Processing Tools

A corpus is a collection of texts (written or spoken) or speech. Corpora are balanced from different sources: news, novels, etc.

                                        English   French         German
Most frequent words in a collection     the       de             der
of contemporary running texts           of        le (article)   die
                                        to        la (article)   und
                                        and       les            des
Most frequent words in Genesis          and       et             und
                                        the       de             die

Pierre Nugues, Language Processing with Perl and Prolog, 2/39

Characteristics of Current Corpora

  • Big: the Bank of English (Collins and U. Birmingham) has more than 500 million words
  • Available in many languages
  • Easy to collect: the web is the largest corpus ever built and within the reach of a mouse click
  • Parallel: same text in two languages: English/French (Canadian Hansards), European Parliament (23 languages)
  • Annotated with part of speech or manually parsed (treebanks):
      Characteristics/N of/PREP Current/ADJ Corpora/N
      [NP [NP Characteristics] [PP of [NP Current Corpora]]]

Lexicography

  • Writing dictionaries
  • Dictionaries for language learners should be built on real usage
      They're just trying to score brownie points with politicians.
      The boss is pleased - that's another brownie point.
  • Bank of English: brownie point (6 occs.), brownie points (76 occs.)
  • Extensive use of corpora to:
      - find concordances and cite real examples
      - extract collocations and describe frequent pairs of words

Concordances

A word and its context.

Language   Concordances
English    s beginning of miracles did Je
           n they saw the miracles which
           n can do these miracles that t
           ain the second miracle that Je
           e they saw his miracles which
French     le premier des miracles que fi
           i dirent: Quel miracle nous mo
           om, voyant les miracles qu'il
           peut faire ces miracles que tu
           s ne voyez des miracles et des

Collocations

Word preferences: words that occur together.

                English             French                German
You say         Strong tea          Thé fort              Schmales Gesicht
                Powerful computer   Ordinateur puissant   Enge Kleidung
You don't say   Strong computer     Thé puissant          Schmale Kleidung
                Powerful tea        Ordinateur fort       Enges Gesicht

Word Preferences

Strong w                             Powerful w
strong w   powerful w   w            strong w   powerful w   w
161        0            showing      1          32           than
175        2            support      1          32           figure
106        0            defense      3          31           minority

Corpora as Knowledge Sources

Short term:
  • Describe usage more accurately
  • Assess tools: part-of-speech taggers, parsers
  • Learn statistical/machine-learning models for speech recognition, taggers, parsers
  • Derive automatically symbolic rules from annotated corpora

Longer term:
  • Semantic processing
  • Texts are the main repository of human knowledge

Finite-State Automata

A flexible tool to search and process text.

An FSA accepts and generates strings, here ac, abc, abbc, abbbc, abbbbbbbbbbbbc, etc.

Mathematically defined by:
  • Q, a finite number of states
  • Σ, a finite set of symbols or characters: the input alphabet
  • q0, a start state
  • F, a set of final states, F ⊆ Q
  • δ, a transition function Q × Σ → Q, where δ(q, i) returns the state where the automaton moves when it is in state q and consumes the input symbol i

FSA in Prolog

    % The start state
    start(q0).
    % The final states
    final(q2).

    transition(q0, a, q1).
    transition(q1, b, q1).
    transition(q1, c, q2).

    accept(Symbols) :-
        start(StartState),
        accept(Symbols, StartState).

    accept([], State) :-
        final(State).
    accept([Symbol | Symbols], State) :-
        transition(State, Symbol, NextState),
        accept(Symbols, NextState).

Regular Expressions

Regexes are equivalent to FSA and generally easier to use.

Constant regular expressions:

Pattern   String
regular   A section on regular expressions
the       The book of the life

The automaton above is described by the regex ab*c.
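The equivalence between the automaton and the regex can be checked mechanically. The following is a minimal sketch in Python, used here only for illustration (the slides themselves use Prolog). It encodes the same three transitions in a dictionary and compares the automaton's verdict with the regex ab*c. The names START, FINALS, TRANSITIONS, and accepts, as well as the test strings, are assumptions made for this example and do not come from the slides.

```python
import re

# Transition table mirroring the Prolog facts transition(q0, a, q1), etc.
START = "q0"
FINALS = {"q2"}
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q1", "b"): "q1",
    ("q1", "c"): "q2",
}


def accepts(symbols):
    """Return True if the automaton is in a final state after reading symbols."""
    state = START
    for symbol in symbols:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no transition defined for this symbol: reject
            return False
    return state in FINALS


if __name__ == "__main__":
    # Print each string with the automaton's verdict and the regex's verdict.
    for string in ["ac", "abc", "abbbc", "a", "abca", "bc"]:
        regex_verdict = re.fullmatch(r"ab*c", string) is not None
        print(string, accepts(string), regex_verdict)
```

The sketch uses re.fullmatch because the automaton must consume the whole string; a search tool such as grep, by contrast, reports any line that contains a match.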
Searching for ab*c with grep:

    grep 'ab*c' myFile1 myFile2

Metacharacters

Chars   Descriptions                               Examples
*       Matches any number of occurrences of the   ac*e matches strings ae, ace, acce, accce, etc.,
        previous character: zero or more           as in "The aerial acceleration alerted the ace pilot"
?       Matches at most one occurrence of the      ac?e matches ae and ace,
        previous character: zero or one            as in "The aerial acceleration alerted the ace pilot"
+       Matches one or more occurrences of the     ac+e matches ace, acce, accce, etc.,
        previous character                         as in "The aerial acceleration alerted the ace pilot"
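The three quantifiers can be tried on the example sentence from the table. This is a minimal sketch in Python, chosen for illustration only (the book's own languages are Perl and Prolog). The names SENTENCE, PATTERNS, and find_all are assumptions made for this example. re.findall returns the non-overlapping matches from left to right.

```python
import re

SENTENCE = "The aerial acceleration alerted the ace pilot"

# One pattern per quantifier from the table above.
PATTERNS = {
    "*": r"ac*e",  # zero or more c
    "?": r"ac?e",  # zero or one c
    "+": r"ac+e",  # one or more c
}


def find_all(pattern, text=SENTENCE):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    for char, pattern in PATTERNS.items():
        print(char, pattern, find_all(pattern))
    # * ac*e ['ae', 'acce', 'ace']
    # ? ac?e ['ae', 'ace']
    # + ac+e ['acce', 'ace']
```

The string ae comes from "aerial", acce from "acceleration", and ace from the word itself. Only the * and ? patterns accept ae, since + requires at least one c.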
Metacharacters

Chars   Descriptions                               Examples
{n}     Matches exactly n occurrences of the       ac{2}e matches acce,
        previous character                         as in "The aerial acceleration alerted the ace pilot"
{n,}    Matches n or more occurrences of the       ac{2,}e matches acce, accce, etc.
        previous character
{n,m}   Matches from n to m occurrences of the     ac{2,4}e matches acce, accce, and acccce
        previous character

Literal values of metacharacters must be quoted using \.

The Dot Metacharacter

The dot . is a metacharacter that matches one occurrence of any character except a new line.

a.e matches the strings ale and ace in "The aerial acceleration alerted the ace pilot", as well as age, ape, are, ate, awe, axe, or aae, aAe, abe, aBe, a1e, etc.

.* matches any string of characters until we encounter a new line.

Pierre Nugues, Lund University, Pierre.Nugues@cs.lth.se, http://cs.lth.se/~pierre/
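The counted repetitions, the dot, and the backslash quoting described above can be checked with a short script. This is a minimal sketch in Python, used for illustration only (the book's own languages are Perl and Prolog). The names SENTENCE, SAMPLES, and find_all, and the sample strings other than the slide's sentence, are assumptions made for this example.

```python
import re

SENTENCE = "The aerial acceleration alerted the ace pilot"
SAMPLES = "ace acce accce acccce accccce"  # one to five c's


def find_all(pattern, text=SENTENCE):
    """Return every non-overlapping match of pattern in text."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    print(find_all(r"ac{2}e"))             # ['acce']
    print(find_all(r"ac{2,}e", SAMPLES))   # ['acce', 'accce', 'acccce', 'accccce']
    print(find_all(r"ac{2,4}e", SAMPLES))  # ['acce', 'accce', 'acccce']

    # The dot matches any single character except a new line.
    print(find_all(r"a.e"))                # ['ale', 'ace']

    # .* stops at the first new line.
    print(re.match(r".*", "first line\nsecond line").group())  # first line

    # A backslash quotes the dot so that it only matches a literal period.
    print(find_all(r"a\.e", "a.e ace"))    # ['a.e']
```

In the counted examples, ac{2,4}e rejects accccce because five c's exceed the upper bound, while ac{2,}e accepts it.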
