To illustrate content labels, consider an imaginary paper entitled ``POOQ: A parallel, object-oriented query system.'' Suppose that the paper uses known dynamic programming algorithms to optimize queries for parallel machine architectures. A keyword-based approach could classify this paper using phrases such as ``parallel algorithms,'' ``object-oriented databases,'' ``dynamic programming,'' and ``query optimization.''
Keywords can include topically relevant words and phrases that do not appear in the information objects themselves. Furthermore, some information objects, such as images, scientific data, and software, contain no text that traditional IR technology can reasonably use. Keynets extend the semantics expressible with keywords (simple subject classifications) by adding relationships between the keywords. Keynets are more expressive than sets of keywords. They also have more structure and are generally larger, although they remain much smaller than the entire information object.
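The extra expressiveness claimed for keynets can be illustrated with a minimal sketch. The sketch assumes that a keynet is a set of labeled, directed edges between typed nodes. The names `Node`, `Keynet`, `add`, and `keywords` are hypothetical and are not taken from the KEYNET system. In the example, two keynets collapse to the same keyword set once their edges are discarded, yet they describe opposite transformations.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Node:
    """A keynet node: an ontology type plus an optional value (hypothetical model)."""
    type: str
    value: Optional[str] = None

    def label(self) -> str:
        # Labels are written "type:value", or just "type" when there is no value.
        return self.type if self.value is None else f"{self.type}:{self.value}"


@dataclass
class Keynet:
    """A labeled directed graph. In this simplified sketch, nodes with equal
    type and value are treated as the same node."""
    edges: set = field(default_factory=set)

    def add(self, src: Node, attribute: str, dst: Node) -> "Keynet":
        self.edges.add((src, attribute, dst))
        return self

    def nodes(self) -> set:
        return {n for (s, _, d) in self.edges for n in (s, d)}

    def keywords(self) -> set:
        # Forgetting the edges collapses a keynet to a plain keyword set.
        return {n.label() for n in self.nodes()}


t = Node("transformation")
a = Node("language", "A")
b = Node("language", "B")

a_to_b = Keynet().add(t, "input", a).add(t, "output", b)
b_to_a = Keynet().add(t, "input", b).add(t, "output", a)

print(a_to_b.keywords() == b_to_a.keywords())  # True: identical keywords
print(a_to_b.edges == b_to_a.edges)            # False: different relationships
```

A keyword-only index cannot tell these two objects apart. The relationships carried by the edges can.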
To describe a content label for the imaginary POOQ paper, we must first describe a hypothetical ontology for Computer Science. Assume that one of the classes in this ontology represents the concept of translation or transformation. As shown in Figure 1, the content label for the POOQ paper begins with an instance of the transformation class. An attribute edge labeled ``input'' links this instance to an object of subtype ``declarative language,'' and the instance is also linked to the output language. The label of each node consists of its type followed by its value (if any), separated by a colon. Because the output language is well-known, it needs no further elaboration. The input language, POOQ, is not well-known, however, so it must be specified further with attributes such as its name, along with other attributes not shown.
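The following minimal sketch writes the POOQ content label as a list of labeled edges and prints each node using the ``type:value'' convention. Several details are assumptions made for illustration:

* the node type ``language'' for the output;
* the value ``C*'' for the output language, chosen to match the example query;
* the ``name'' edge pointing to a node of hypothetical type ``string.''

None of these details is given in the text.

```python
from typing import List, Optional, Tuple

# A node is (type, value); value is None when the node has no value.
Node = Tuple[str, Optional[str]]
Edge = Tuple[Node, str, Node]


def label(node: Node) -> str:
    """Format a node label as type:value, or just type when there is no value."""
    node_type, value = node
    return node_type if value is None else f"{node_type}:{value}"


transformation: Node = ("transformation", None)
# Input language: not well-known, so it is described further by attributes.
pooq: Node = ("declarative language", None)
# Output language: well-known, so a single node suffices.
# ASSUMPTION: the type "language" and value "C*" are illustrative only.
target: Node = ("language", "C*")

pooq_label: List[Edge] = [
    (transformation, "input", pooq),
    (transformation, "output", target),
    # ASSUMPTION: the name attribute points to a node of hypothetical type "string".
    (pooq, "name", ("string", "POOQ")),
]

for src, attribute, dst in pooq_label:
    print(f"{label(src)} --{attribute}--> {label(dst)}")
```

The well-known output language is identified by a single valued node. The unfamiliar input language is a valueless ``declarative language'' node that is pinned down by its attribute edges.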
As an example of a query, consider ``What systems use dynamic programming to generate C* code?'' This query is translated into the keynet shown in Figure 2.
After a query has been converted into a keynet, the keynet is decomposed into components of bounded size called probes. The fragmentation algorithm is given in [BS94a]. Probes and content labels share a distributed hash index. Content labels are likewise fragmented, and their fragments are inserted into the index. The search step hashes each probe and looks for matching content-label fragments. In the example, the POOQ document has fragments that match every probe of the query.
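The indexing and search steps can be sketched under simplifying assumptions. The sketch does not implement the fragmentation algorithm of [BS94a]. Instead, each single labeled edge serves as a bounded-size fragment, and matching is exact equality of fragments.

The distributed index is simulated by `NUM_SITES` in-memory dictionaries, and a stable hash chooses the site for each fragment. The edge attribute ``method'' and the two sample documents are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# A fragment is one labeled edge: (source label, attribute, target label).
# ASSUMPTION: single-edge fragments stand in for the bounded-size components
# produced by the fragmentation algorithm of [BS94a].
Fragment = Tuple[str, str, str]

NUM_SITES = 4  # hypothetical number of machines holding the distributed index


def fragment(keynet: List[Fragment]) -> Set[Fragment]:
    # Each edge is already bounded in size, so it is used directly as a fragment.
    return set(keynet)


def site_of(frag: Fragment) -> int:
    # Stable hash (Python's built-in str hash is salted per process).
    digest = hashlib.sha1("|".join(frag).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SITES


class DistributedHashIndex:
    def __init__(self) -> None:
        # One dictionary per site, mapping a fragment to the documents containing it.
        self.sites: List[Dict[Fragment, Set[str]]] = [
            defaultdict(set) for _ in range(NUM_SITES)
        ]

    def insert(self, doc_id: str, content_label: List[Fragment]) -> None:
        for frag in fragment(content_label):
            self.sites[site_of(frag)][frag].add(doc_id)

    def search(self, query: List[Fragment]) -> Set[str]:
        # Probes are the query's fragments; a document matches if every probe hits it.
        probes = fragment(query)
        hits = [self.sites[site_of(p)].get(p, set()) for p in probes]
        return set.intersection(*hits) if hits else set()


index = DistributedHashIndex()
index.insert("POOQ", [
    ("transformation", "input", "declarative language"),
    ("transformation", "output", "language:C*"),
    ("transformation", "method", "algorithm:dynamic programming"),
])
index.insert("other", [
    ("transformation", "output", "language:Fortran"),
    ("transformation", "method", "algorithm:dynamic programming"),
])

query = [
    ("transformation", "output", "language:C*"),
    ("transformation", "method", "algorithm:dynamic programming"),
]
print(index.search(query))  # {'POOQ'}
```

Each probe is routed to exactly one site, so a lookup touches only the sites that hold that probe's fragments rather than the whole index. This sketch reports only documents that match every probe. A practical system might instead rank documents by how many probes they match.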