Lecture 4 - Good-Turing probability estimation.∗
http://kochanski.org/gpk, http://www.staff.uiuc.edu/~cls/
March 6, 2006
1 Probabilities of Probabilities
If we’re not careful, we will end up talking about probabilities of probabilities,
using sentences like “…the estimator gives the value of the underlying
probability that is most probable…” That sentence makes sense and is correct,
but keeping the two different probabilities apart can put a bit of strain on
one’s cranium.
To reduce the cranial strain, we will change the terminology a bit. Instead of
trying to estimate the probability of observing some word P(W), we’ll imagine
that we have a huge (infinite) corpus and talk about the frequency at which
we observe the word, FW. We’ll call this hypothetical frequency observed in a
near-infinite corpus the “underlying frequency”, or sometimes just the frequency.
Now, FW = P(W): if we have a large enough corpus that we see each possible
English sentence many times¹, then the rate of observing each word or N-gram
or sentence is just equal to the probability. So we’re really talking about the
same thing; we’ve changed nothing but the name. Still, this lets us talk about
ideas like “The frequency is 90% likely to lie between 0.0012 and 0.0013” with
less confusion.
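To make the claim that FW converges to P(W) concrete, here is a small simulation sketch. The four-word vocabulary and its probabilities are invented purely for illustration; the point is only that as the sampled corpus grows, the observed rate of a word approaches its assumed underlying probability.

```python
import random

# A toy vocabulary with invented underlying probabilities (must sum to 1).
vocab = ["the", "of", "cat", "sat"]
probs = [0.06, 0.03, 0.005, 0.905]

random.seed(0)

def observed_frequency(word, corpus_size):
    """Draw corpus_size tokens from the toy distribution and return the
    rate at which `word` appears -- an estimate of its FW."""
    sample = random.choices(vocab, weights=probs, k=corpus_size)
    return sample.count(word) / corpus_size

# As the corpus grows, the observed frequency of "the" settles
# toward its underlying probability of 0.06.
for n in [1_000, 100_000]:
    print(n, observed_frequency("the", n))
```

With a corpus of 100,000 tokens the sampling error is already well under a percent, which is why the “near-infinite corpus” fiction lets us treat FW and P(W) as the same quantity.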
The obvious thing we can do now is to use Bayes’ Theorem and compute
P(FW|D) for all possible values of the frequency of the word under
consideration.

∗ This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike
2.5 License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-nc-sa/2.5/ or send a letter to Creative Commons,
543 Howard Street, 5th Floor, San Francisco, California, 94105, USA. This work is
available under http://kochanski.org/gpk/teaching/0401Oxford.
¹ …and that’s a rather large corpus. If we assume there are about 100,000 words in
English and that the longest sentence we care about is 30 words, then…
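The Bayes’-Theorem computation of P(FW|D) can be sketched numerically on a grid of candidate frequencies. The observed data here (seeing the word 12 times in 10,000 tokens) and the flat prior over FW are assumptions made for illustration, not part of the lecture; the likelihood is the usual binomial term.

```python
import math

# Invented data D: the word appeared k times in an n-token corpus.
k, n = 12, 10_000

# Candidate values of FW on a grid from 0.00001 to 0.00499.
grid = [i / 100_000 for i in range(1, 500)]

# Log of the binomial coefficient C(n, k), computed via log-gamma.
log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

# Unnormalized posterior P(FW|D): binomial likelihood times a flat prior.
post = [math.exp(log_binom + k * math.log(f) + (n - k) * math.log(1 - f))
        for f in grid]
total = sum(post)
post = [p / total for p in post]

# With a flat prior, the posterior peaks at the raw estimate k/n.
mode = grid[post.index(max(post))]
print(mode)
```

Summing the normalized posterior over an interval of the grid then yields statements of exactly the form used above, e.g. “the frequency is 90% likely to lie between 0.0012 and 0.0013.”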