A unified approach to high-performance,
vector-based information retrieval
Kenneth Baclawski
and
J. Elliott Smith
Northeastern University
College of Computer Science
Boston, Massachusetts 02115
(617) 373-4631
FAX: (617) 373-5121
{kenb,esmith}@ccs.neu.edu
Abstract:
We propose an information retrieval model, based on the vector space model,
that unifies and extends many commonly used retrieval mechanisms. We have also
developed a distributed architecture and indexing algorithm for
high-performance retrieval using this model. A prototype system built on them
achieves a throughput of 500 queries per second with a response time of less
than one second on an 8-node network of workstations. The model and
algorithm are designed for retrieval from a corpus of information objects
in a single subject area. The objects need not be textual, but they must be
annotated with content labels. With current technology, our system can be
scaled up to support a corpus of several million information objects.
Finally, the model allows for content labels that are semantically more
complex than simple attributes, keywords, and subject classifications.
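The abstract does not spell out the unified model itself. As a point of reference, the following is a minimal sketch of the classical vector space model it builds on. It assumes that each information object is annotated with weighted content labels and that objects are ranked by cosine similarity to a query label vector, using an inverted index. The corpus, label names, weights, and function names (`build_index`, `norm`, `search`) are hypothetical illustrations. This is not the authors' extended model or their distributed indexing algorithm.

```python
import math
from collections import defaultdict

# Hypothetical corpus: each information object (not necessarily text) is
# annotated with weighted content labels, such as keywords or subject classes.
corpus = {
    "img-17": {"heart": 1.0, "x-ray": 0.8, "adult": 0.5},
    "doc-42": {"heart": 0.6, "surgery": 1.0},
    "vid-03": {"x-ray": 1.0, "pediatric": 0.9},
}


def norm(vec):
    """Euclidean length of a sparse label vector."""
    return math.sqrt(sum(w * w for w in vec.values()))


def build_index(corpus):
    """Inverted index: content label -> list of (object id, label weight)."""
    index = defaultdict(list)
    for oid, labels in corpus.items():
        for label, w in labels.items():
            index[label].append((oid, w))
    return index


def search(query, index, norms, k=10):
    """Return up to k (object id, cosine similarity) pairs, best first."""
    qn = norm(query)
    if qn == 0:
        return []
    # Accumulate dot products, touching only objects that share a label.
    scores = defaultdict(float)
    for label, qw in query.items():
        for oid, w in index.get(label, []):
            scores[oid] += qw * w
    ranked = [(oid, s / (qn * norms[oid])) for oid, s in scores.items()]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


if __name__ == "__main__":
    index = build_index(corpus)
    norms = {oid: norm(labels) for oid, labels in corpus.items()}
    for oid, score in search({"heart": 1.0, "x-ray": 1.0}, index, norms):
        print(f"{oid}: {score:.3f}")
```

The inverted index means that a query only touches objects sharing at least one label with it. That property makes it natural to partition the index across nodes. How the authors actually distribute the index is not stated in the abstract.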