Cells in the spatial index (SPAS)

Objects are installed in cells in a spatial index

A spatial index (SPAS == Spatially Associative Substrate) is used to store the regions occupied by the primitives and higher-level objects in a diagram. The basic collection of cells is typically a 64x64 square array, covering the space occupied by the diagram. Each object is installed in the array by creating an object reference in every array cell that the object occupies or passes through. The spatial array operates as an associative index mapping from 2-space to objects. In addition, two linear spatial projection arrays for the x and the y axes are also filled in the same way. The spatial array is used to efficiently compute spatial relations that are important in parsing. For example, finding out whether two objects A and B are near one another is done by inspecting the cells occupied by A for the presence of a B object (or vice-versa). To avoid near misses that can occur in such computations, objects are also installed in the 8-neighbors of each cell. That is, two objects can be very close together but happen to be in two distinct but adjacent cells. They should be considered near in such a case, and the strategy we have used is to look in the eight nearest neighbor cells. This, in effect, essentially "coarsens" the grid, but that can be overcome simply by using a finer grid (deeper pyramid). Important relations such as horizontally-aligned can be computed immediately from the projection arrays. Though it is obvious how to compute predicates, such as, are A and B near one another?, it is often more important to generate sets of objects that satisfy a given spatial relation. This is also easily accomplished using SPAS -- to find all objects near A we form the union of all other occupants of cells occupied by A. To accommodate spatial predicates that operate at various levels of resolution, SPAS is actually a pyramid of arrays of size 2ⁱx2ⁱ, i=0,6 (the last figure, 6, is a settable parameter) and the objects are installed in all levels of the pyramid. Generated sets are typically used to restrict the context that is passed to a constituent rule.

Multiple copies of higher-level objects may be constructed during a parse, but only one copy of a given constituent with identical leaf nodes is installed in the SPAS and referred to in the solution. This retains object integrity for all constituents. In the future, it may be possible to exploit this to even avoid rediscovering constituents, much as natural language chart parsers do. But diagram parsing cannot rely on the simple linear position index that natural language chart parsers do, so this will be a challenging efficiency issue. (We have experimented with memoization, but that also needs further work.)

One of the benefits of the spatial index is that once objects are installed, geometric computations can be done based on cells, ignoring the objects' geometric structure, so that it is as easy to find objects near a Bezier curve as it is to find objects near a straight line or a text item. Higher-level objects are installed in SPAS as the parsing process proceeds and each higher-level object occupies the cells that are the union of the cells occupied by the primitives descendants.

In the figure below, we show how the diagram viewer can display the cells occupied by an arbitrary object, in this case the higher-level object DATA-LINE.

When lines or Bezier curves are installed, their endpoints are also installed as distinct, but related, objects. In the figure above it is clear that a connected sequence of data lines has been identified by the parser. This is done by using the spatial index to look in the neighborhood of a line end-point to see if the (starting) end-point of another line is there. In this way, a chain of connected lines or curves can be built up.

Next: Finite-state automata diagrams -- using sharing for context

Back to Index page