Next: Insertion. Up: Index Structure. Previous: Data Structure.

## Searching.

To search for a keynet q, one first performs substitutions on the nodes of q, obtaining a list of variants for each node. One then fragments q into a set of probes having at most three vertices, at most one of which is a variant node. See Corollary 2 in the Appendix for a bound on how many probes can be generated. Let P(q) be the set of probes.

The next step is to compute h(p) for every p in P(q). This can be done in parallel. The hash values are then sent to their buckets. More precisely, at a bucket with address a, one collects the weights w(p) and partial hash values l(n-m,p) of the probes that satisfy f(m,p)=a. Each partial hash value is then used for searching within its bucket. This step uses well-known algorithms.

The tree searching can be done in parallel. However, in this case, one must face the problem that not all buckets will have the same number of probes. To process all the probes in parallel will require a time proportional to the maximum number of probes occurring in a bucket. This can be large, and it wastes resources because only a few (or even just one) of the parallel processors will be doing useful work when the last few probes are being processed. However, because of the statistical nature of the algorithm, it is acceptable to discard a small number of probes. When this is done, the number of parallel processing cycles needed can be reduced to reasonable values. For example, if there are 1,000 processors and 2,000 probes, and if one is willing to discard 2 probes on average, then it suffices to process at most 7 probes at each bucket. See the discussion and table in the Appendix for details.

The result of each search within a bucket is a set (possibly empty) of document identifiers. Each document identifier is associated with the weight w(p) of the probe. Sets of document identifiers at different buckets are then merged, with the weights being accumulated for each document. The document identifiers having the highest accumulated weight are then passed to the next module for post-processing.

Next: Insertion. Up: Index Structure. Previous: Data Structure.

kenb@ccs.neu.edu
Fri Jan 20 22:04:00 EST 1995