Corpora Assignment

Category: English Composition

Date Submitted: 11/09/2014 10:07 AM

I

a) Normalized frequency 

Normalized frequency expresses how often a word occurs per million words of a corpus. For instance, a search for the word car in the BNC returns 26,690 matches in 2,241 different texts. Since the BNC contains 98,313,429 words, this corresponds to a frequency of 271.48 instances per million words. While the BNC performs this calculation automatically, one has to do it manually in the OIEC. A search for car in the OIEC returns 1,193 matches (the OIEC does not specify the number of texts in the search result). The OIEC contains 7 million words, so the frequency per million words is 1,193 x 1,000,000 / 7,000,000 = 170.43 instances per million words. Normalized frequency thus allows us to compare corpora of different sizes. In our case, car has a frequency of 271.48 instances per million words in the BNC and 170.43 instances per million words in the OIEC, so we can conclude that car is relatively more frequent in the BNC than in the OIEC. To find out why, one needs to investigate further. One can, for example, look at the time span each corpus covers, what kinds of texts it consists of, whether the texts represent spoken or written language, and so on (Lecture handouts/notes).
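The manual calculation above can also be written as a short script. This is only a minimal sketch: the function name per_million and the constant names are made up for illustration, and the match counts and corpus sizes are simply the figures quoted above.

```python
def per_million(matches: int, corpus_size: int) -> float:
    """Return the number of occurrences per million words.

    Hypothetical helper: matches is the raw hit count,
    corpus_size is the total number of words in the corpus.
    """
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return matches * 1_000_000 / corpus_size


# Corpus sizes as stated in the text above.
BNC_SIZE = 98_313_429
OIEC_SIZE = 7_000_000

# Raw match counts for the word "car" as stated in the text above.
bnc_car = per_million(26_690, BNC_SIZE)
oiec_car = per_million(1_193, OIEC_SIZE)

print(f"BNC:  {bnc_car:.2f} instances per million words")
print(f"OIEC: {oiec_car:.2f} instances per million words")
```

Because both results are expressed on the same per-million scale, they can be compared directly even though the two corpora differ greatly in size.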

b) KWIC concordance

A KWIC concordance (KWIC stands for Key Word in Context) places the word we search for, the keyword, in the middle of each line of the result list, which makes it easier to see what the surrounding text looks like (Lecture handouts/notes). With a KWIC concordance the result page of the corpus, in our case the BNC, looks much clearer and more systematic, making it easier to spot how the keyword is connected to its neighbouring words and sentences. As an example, a search for the word pants returns 541 matches in 303 different texts, with a frequency of 5.5 instances per million words. The BNC...