Semantically Rich Information Retrieval

In addition to the commonly used IR mechanisms, KEYNET supports a semantically richer form of information retrieval. However, as Lakoff points out [Lak87], ``Human categorization is based on principles that extend far beyond those envisioned in the classical theory.'' As a result, simple classification methods leading to taxonomies of concepts are inadequate for expressing the rich variety of human categorization techniques.

Unfortunately, since no IR systems currently use such a mechanism, one cannot evaluate whether it would improve retrieval effectiveness. Our contribution is a ``proof-of-existence'' that there are no technological or user interface barriers to building a high-performance IR system of this kind.

KEYNET gains potential semantic leverage relative to traditional vector space methods by responding to relations between keywords, in addition to (possibly accidental) co-occurrences. Consider the query ``Can potatoes get bunions?'' of Figure 2. In this case, the entire query represents one of the fragments. Other fragments may be regarded as varying degrees of generalization of the original query. For example, the specific concept `potato' may be abstracted into the conceptual category `plant' to yield the fragment ``plants that have bunions;'' or `bunion' may be replaced by the conceptual category `acquired abnormality' to produce the corresponding fragment ``potatoes that have acquired abnormalities.''
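The generalization step described above can be illustrated with a minimal sketch. It assumes that a fragment is a (subject, relation, object) triple and that each concept has at most one broader category; the names PARENT, generalizations and fragments are hypothetical and are not taken from KEYNET itself. Under these assumptions the sketch also produces the doubly generalized fragment ``plants that have acquired abnormalities,'' which the text does not list explicitly.

```python
from itertools import product

# Hypothetical one-level taxonomy (an assumption for illustration):
# each concept maps to its broader conceptual category.
PARENT = {
    "potato": "plant",
    "bunion": "acquired abnormality",
}

def generalizations(concept):
    """Return the concept followed by its broader categories, most specific first."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def fragments(subject, relation, obj):
    """Enumerate fragments by generalizing the subject and the object independently."""
    return [
        (s, relation, o)
        for s, o in product(generalizations(subject), generalizations(obj))
    ]

# "Can potatoes get bunions?" modeled as the triple (potato, has, bunion).
query_fragments = fragments("potato", "has", "bunion")
for fragment in query_fragments:
    print(fragment)
```

The first fragment produced is the entire query; the remaining ones replace `potato', `bunion', or both by their conceptual categories.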

The best matches for the query ``Can potatoes get bunions?'' will, roughly speaking, be the content labels that include the largest number of index terms (content label fragments) that match the query fragments. Therefore, an information object that deals with plants that get bunions or potatoes with acquired abnormalities is considered a better match than an information object that discusses, say, potatoes as a treatment for bunions, or mentions potatoes and also bunions but in different contexts. This is the sense in which a KEYNET system can be said to have ``understood'' a query.
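The ranking rule stated above can be sketched as counting shared fragments. This is a rough illustration only: the triple representation, the content labels, and the names QUERY, CONTENT_LABELS, score and rank are all assumptions made for the example, and the actual KEYNET matching procedure may weight fragments differently.

```python
# Query fragments from the example in the text: the full query
# plus its two single-step generalizations.
QUERY = {
    ("potato", "has", "bunion"),
    ("plant", "has", "bunion"),
    ("potato", "has", "acquired abnormality"),
}

# Hypothetical content labels: each information object is indexed
# by a set of fragments (invented here for illustration).
CONTENT_LABELS = {
    "plants_with_bunions": {("plant", "has", "bunion")},
    "potato_abnormalities": {("potato", "has", "acquired abnormality")},
    "potato_remedy": {("potato", "treats", "bunion")},
    "unrelated_mentions": {
        ("potato", "is_a", "plant"),
        ("bunion", "affects", "foot"),
    },
}

def score(label_fragments, query_fragments):
    """Count the index terms of a content label that match query fragments."""
    return len(label_fragments & query_fragments)

def rank(labels, query_fragments):
    """Order content labels by descending score, breaking ties by name."""
    scored = [(score(frags, query_fragments), name) for name, frags in labels.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))

ranking = rank(CONTENT_LABELS, QUERY)
for matches, name in ranking:
    print(matches, name)
```

Because a fragment must agree on the relation as well as on both keywords, the object about potatoes as a treatment for bunions and the object that merely mentions both words score zero, whereas a pure keyword co-occurrence measure would credit them.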

