## IS1320 Sp03 Quiz 1 review - Prof. Futrelle

### Quiz to be given on Thursday 10 April - Closed book/notes

1. Given the first line of the formula for sim(dj,q) on p. 27, be able to write out the second line as a sum over the components of the weight vectors. I will give you the dot-product form; you have to come up with the summation version.
2. Be able to write out an example of the similarity computation, e.g., if the query vector has only the word "tuna" with weight 3 and a document has "tuna" and "boats" with weights 3 and 4 respectively, compute sim(doc,query) numerically. Be able to explain why the similarity always lies between 0.0 and 1.0 and is never negative. In addition, you might try to sketch the vectors in 2-space to show the result graphically.
3. Be ready to work through an example of the normalized frequency in Eq. 2.1. For example, given the raw frequencies for a half-dozen words in one or two documents, compute their normalized frequencies. You have to memorize the definition of normalized frequency in Eq. 2.1. Discuss the possible range of the normalized frequency values.
4. Memorize the expression for the inverse document frequency in Eq. 2.2. Be able to describe the meaning of the terms N and ni. Be able to come up with an example of a common word and what its document frequency might be, as well as a less common word. Make up examples with simple log values, e.g., if you use log10 then log10(1) = 0, log10(10) = 1, log10(100) = 2 -- handy for examples.
5. Be able to draw the set diagram, Fig. 3.1, and to define Precision and Recall in terms of the sizes of the appropriate sets.
6. Make up four simple numerical examples to illustrate the combinations of high and low Recall and high and low Precision.
7. Explain how it is possible to get perfect Recall, and why it is very difficult ever to get perfect Precision.
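
To check hand computations for items 2, 3, 4, and 6, a minimal Python sketch may help. It is not taken from the course text. The function names are hypothetical, and because this sheet does not reproduce the equations, the forms used here are assumptions: the cosine form for sim(dj,q), raw frequency divided by the largest raw frequency in the document for Eq. 2.1, and log10(N/ni) for Eq. 2.2. Verify them against the book before relying on the numbers.

```python
import math


def cosine_sim(doc, query):
    """Cosine similarity of two sparse weight vectors (dict: term -> weight)."""
    # Dot product written out as a sum over shared components.
    dot = sum(w * query.get(term, 0.0) for term, w in doc.items())
    norm_doc = math.sqrt(sum(w * w for w in doc.values()))
    norm_query = math.sqrt(sum(w * w for w in query.values()))
    if norm_doc == 0 or norm_query == 0:
        return 0.0
    return dot / (norm_doc * norm_query)


def normalized_freq(raw):
    """Assumed Eq. 2.1: each raw count divided by the largest count in the document."""
    if not raw:
        return {}
    peak = max(raw.values())
    return {term: count / peak for term, count in raw.items()}


def idf(n_docs, n_i):
    """Assumed Eq. 2.2 with base-10 logs: log10(N / ni)."""
    return math.log10(n_docs / n_i)


def precision_recall(relevant, retrieved):
    """Precision and recall from set sizes; returns (precision, recall)."""
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Item 2: query has "tuna" (3); document has "tuna" (3) and "boats" (4).
print(cosine_sim({"tuna": 3, "boats": 4}, {"tuna": 3}))  # 9 / (5 * 3) = 0.6

# Item 3: made-up raw counts for one document.
print(normalized_freq({"tuna": 4, "boats": 2, "net": 1}))

# Item 4: a word in every document vs. a word in 10 of 1000 documents.
print(idf(1000, 1000), idf(1000, 10))  # 0.0 2.0

# Items 6 and 7: retrieving the whole collection gives perfect recall
# but low precision (4 relevant documents out of 100 retrieved).
relevant = {1, 2, 3, 4}
print(precision_recall(relevant, set(range(1, 101))))
```

The document and query are stored as dictionaries so that the summation form of the dot product is visible in the code: only terms that appear in both vectors contribute to the sum.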