IS1320 Sp03 Quiz 1 review, Prof. Futrelle
Quiz to be given on Thursday, 10 April. Closed book/notes.
Given the first line of the formula for sim(d_{j},q) on p. 27,
be able to write out the second line as a sum over the components
of the weight vectors. I will give you the dot-product form; you have
to come up with the summation version.
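To check your summation version, here is a small Python sketch. It assumes the usual cosine form, sim(d_j,q) = (d_j . q) / (|d_j| |q|), where the dot product expands to the sum over i of w_{i,j} * w_{i,q} and each vector length is the square root of the sum of its squared weights. The function name cosine_sim and the use of dicts for the weight vectors are illustrative choices, not something from the text.

```python
import math

def cosine_sim(doc_weights, query_weights):
    """Cosine similarity written as explicit sums over term weights.

    doc_weights and query_weights map index terms to the weights
    w_{i,j} and w_{i,q}; a term missing from a dict has weight 0.
    """
    terms = set(doc_weights) | set(query_weights)
    # Numerator: sum over i of w_{i,j} * w_{i,q}, i.e. the dot product
    dot = sum(doc_weights.get(t, 0.0) * query_weights.get(t, 0.0) for t in terms)
    # Denominator: |d_j| * |q|, each the square root of a sum of squares
    doc_len = math.sqrt(sum(w * w for w in doc_weights.values()))
    query_len = math.sqrt(sum(w * w for w in query_weights.values()))
    if doc_len == 0 or query_len == 0:
        # An all-zero vector has no direction; returning 0.0 is an assumed convention
        return 0.0
    return dot / (doc_len * query_len)
```

Each line of the function corresponds to one piece of the summation formula: the generator expression for the numerator, and one square-rooted sum of squares per vector in the denominator.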

Be able to write out an example of the similarity computation, e.g.,
if the query vector has only the word "tuna" with weight 3 and
a document has "tuna" and "boats" with weights 3 and 4 respectively,
compute the sim(doc,query) numerically.
Be able to explain why the similarity is between 0.0 and 1.0,
and never negative.
In addition, you might try
to sketch the vectors in 2-space to show the result graphically.
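The "tuna"/"boats" example can be worked through in a few lines of Python. This is a sketch assuming the cosine formula (dot product divided by the product of the vector lengths). The 2-space picture uses a tuna axis and a boats axis. The query lies along the tuna axis, so the angle between the two vectors is the one whose cosine is the similarity.

```python
import math

# Vectors from the review question: the query has only "tuna" (weight 3);
# the document has "tuna" (weight 3) and "boats" (weight 4).
query = {"tuna": 3.0}
doc = {"tuna": 3.0, "boats": 4.0}

terms = set(query) | set(doc)
dot = sum(doc.get(t, 0.0) * query.get(t, 0.0) for t in terms)  # 3*3 + 4*0 = 9
doc_len = math.sqrt(sum(w * w for w in doc.values()))           # sqrt(9 + 16) = 5
query_len = math.sqrt(sum(w * w for w in query.values()))       # sqrt(9) = 3
sim = dot / (doc_len * query_len)                               # 9 / 15 = 0.6

# All weights here are nonnegative, so every product in the dot product is
# nonnegative and sim cannot be negative; it is at most 1 because a cosine is.
angle_deg = math.degrees(math.acos(sim))

print(sim)                  # 0.6
print(round(angle_deg, 2))  # 53.13
```

An angle of about 53 degrees means the document shares some direction with the query but is pulled toward the boats axis.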

Be ready to discuss and deal with an example of the normalized frequency
in Eq. 2.1. For example, given the raw frequencies for a half-dozen words,
in one or two documents, compute their normalized frequencies.
You have to memorize the definition of normalized frequency in Eq. 2.1.
Discuss the possible range of the normalized frequency values.
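For practice, here is a Python sketch of the normalized-frequency computation. It assumes Eq. 2.1 is the max-frequency normalization, f_{i,j} = freq_{i,j} / max_l freq_{l,j}: each raw count is divided by the largest raw count of any term in the same document. Check this against the text before relying on it. The six words and their counts are made up.

```python
# Hypothetical raw counts freq_{i,j} for six words in one document d_j.
raw = {"tuna": 6, "boat": 3, "net": 2, "fish": 12, "harbor": 1, "crew": 4}

def normalized_freqs(raw_counts):
    """Assumed form of Eq. 2.1: divide each count by the largest count
    of any term in the same document."""
    max_freq = max(raw_counts.values())
    return {term: count / max_freq for term, count in raw_counts.items()}

f = normalized_freqs(raw)
for term, value in sorted(f.items(), key=lambda kv: -kv[1]):
    print(f"{term:7s} {raw[term]:3d} -> {value:.3f}")
```

Under this definition, the most frequent word in a document always gets 1.0. Every other word that occurs gets a value in (0, 1], and a word that does not occur gets 0.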

Memorize the expression for the inverse document frequency in Eq. 2.2.
Be able to describe the meaning of the terms N and n_{i}.
Be able to come up with an example of a common word and what its
document frequency might be, as well as a less common word.
Make up examples with simple log values; e.g., if you use log_{10},
then log_{10}(1) = 0, log_{10}(10) = 1, and
log_{10}(100) = 2, which are handy for examples.
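Here is a small Python sketch for trying out idf values. It assumes Eq. 2.2 has the form idf_i = log(N / n_i), with N the number of documents in the collection and n_i the number of documents containing term k_i. Base 10 is used so the results come out as the round numbers above. The collection size and the document frequencies are hypothetical.

```python
import math

def idf(N, n_i):
    """Inverse document frequency log10(N / n_i), assuming Eq. 2.2 has
    this form; base 10 is chosen only to keep the numbers simple."""
    return math.log10(N / n_i)

# Hypothetical collection of N = 1000 documents.
N = 1000
doc_freqs = {
    "the": 1000,    # common word: appears in every document
    "boat": 100,    # moderately common
    "tuna": 10,     # less common
    "albacore": 1,  # rare
}
for word, n_i in doc_freqs.items():
    print(f"{word:9s} n_i = {n_i:4d}  idf = {idf(N, n_i):.1f}")
```

A word that appears in every document gets idf 0, so it contributes nothing to distinguishing one document from another. Rarer words get larger idf values.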

Be able to draw the set diagram in Fig. 3.1 and to define Precision
and Recall in terms of the sizes of the appropriate sets.
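The set definitions translate directly into Python sets. The sketch below uses R for the relevant documents, A for the answer set the system returns, and Ra for their intersection. Check that these labels match Fig. 3.1. Precision = |Ra| / |A| and Recall = |Ra| / |R|. The document IDs are made up, and returning 0.0 for an empty set is an assumed convention, since the ratio is undefined there.

```python
def precision_recall(relevant, answer):
    """Precision = |Ra| / |A| and Recall = |Ra| / |R|, with Ra = R intersect A."""
    ra = relevant & answer
    # Empty A or R makes the ratio undefined; 0.0 is an assumed convention
    precision = len(ra) / len(answer) if answer else 0.0
    recall = len(ra) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document IDs.
R = {1, 2, 3, 4, 5, 6, 7, 8}  # relevant documents
A = {5, 6, 7, 8, 9, 10}       # documents the system retrieved
print(precision_recall(R, A))
```

Here Ra = {5, 6, 7, 8}, so Precision is 4/6 and Recall is 4/8.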

Make up four simple numerical examples to illustrate the combinations
of high and low Recall and high and low Precision.
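One possible set of four cases is sketched in Python below. All the numbers are hypothetical. Each case has |R| = 10 relevant documents in the collection, and varies how many documents are retrieved (|A|) and how many of those are relevant (|Ra|).

```python
NUM_RELEVANT = 10  # |R|, the same in all four hypothetical cases
scenarios = [
    # (label, |A| = documents retrieved, |Ra| = relevant ones among them)
    ("high P, high R", 10, 9),
    ("high P, low R", 2, 2),
    ("low P, high R", 100, 10),
    ("low P, low R", 20, 2),
]

results = {}
for label, retrieved, relevant_retrieved in scenarios:
    precision = relevant_retrieved / retrieved
    recall = relevant_retrieved / NUM_RELEVANT
    results[label] = (precision, recall)
    print(f"{label:15s} P = {precision:.2f}  R = {recall:.2f}")
```

Retrieving very few documents tends to raise Precision at the cost of Recall, and retrieving many tends to do the reverse.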

Explain how it is possible to get perfect Recall but very difficult to
ever get perfect Precision.
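One way to see the asymmetry is a sketch of the trivial strategy of returning every document in a hypothetical collection. Every relevant document is then retrieved, so Recall is 1. Precision, however, drops to |R| / N, because every irrelevant document is retrieved too. Perfect Precision would instead require that no irrelevant document ever be returned.

```python
# Hypothetical collection: N documents, of which the first 10 are relevant.
N = 1000
collection = set(range(N))
relevant = set(range(10))

# Trivial strategy: return the whole collection as the answer set.
answer = collection
ra = relevant & answer
recall = len(ra) / len(relevant)   # all relevant documents were retrieved
precision = len(ra) / len(answer)  # but so were all 990 irrelevant ones
print(recall, precision)
```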