This page contains discussions of some of the points of contact between this work and the work in the FRVDR98 Fall Symposium, as recorded in the Proceedings (URL or citation). See the Contents list for the symposium at the end of this page. My comments here will be reasonably self-contained, but they will be more meaningful to someone who has a copy of the proceedings in hand.
Free Rides and SPAS -- "Free rides" is a concept that is mentioned by various papers, e.g., (Gurr), and refers to Shimojima's 1996 paper. It means that when viewing a diagram, certain inferences are available directly by inspection of the diagram, and follow directly from the spatial layout of the diagram. Examples include Euler and Venn diagrams. Our SPAS spatially associative data structure can be thought of from this viewpoint. If you want to know if a certain object passes near a certain small region in space, you simply look in the corresponding SPAS cell to see if it contains a reference to the object. That is, SPAS directly represents the near relation -- it is a type of spatial cache. (An alternate representation could store all objects near to some object O in a near list attached to O -- another type of cache. The near lists would not necessarily have size N2 since, for example, a set of well-spaced points would have zero entries in all their near lists.) Similarly, if we want to know if a set of points is horizontally aligned, we can see if they all appear in some cell in the y projection array of SPAS.
Generalized equivalence relations (GERs) -- We have stressed in the past that human perception is one of the sources of free rides, (Futrelle, 1990). For example, a collection of aligned items is a "pop-out" phenomena, as evidenced by the existence of illusory contours (also called subjective contours), (Petry, 1987). These occur when for example, a set of small objects arranged in space to form a broken circumference of a circle, which lead to the illusion of an actual circle whose interior is lighter than the surrounding plane. A related phenomenon is the pop-out phenomena in which psychophysical experiments have determined that people can detect the presence of an "O" in a collection of "X" distractors in the visual field in a time that is independent of the number of distractors (Treisman, 1985, 1990). Thus, the extraction of such information is a "free-ride" in that it is done as a pre-attentive parallel operation by the visual system. In general, we refer to uniform sets of objects that are collected together as an equivalence set or an extension of the equivalence relation such as near or equally spaced, as obeying a Generalized Equivalence Relation. Another example is a collection of data points in a region of a graph all of the same size and shape, e.g., triangles. Our algorithm normally chooses the maximal collection of the possible sets of the elements so related. This choice can also be argued on entropy grounds and more generally by minimal-length encoding theory or MLE (Li and Vitanyi, 1993).
Local extent -- (Foo) discusses the importance of local extent. Since diagrams are finite objects, direct reasoning about them must have local extent. Again, we have developed SPAS to aid such reasoning.
Perception and understanding -- (Hayes and Laforte) have an interesting and provocative paper that discusses the relation between perception and understanding using variations on the diagrammatic proof of Pythagoras' theorem. One reason this is important is that parsers such as ours are faced with the problem of geometric inference. Consider the following innocent example, analogous to a class of problems met in diagram parsing -- what are the letter-number pairings in the following two sequences: b 40 k 12 p 32 versus 36 b 40 k 12 p? The sequences agree precisely in the central subsequence b 40 k 12 p, but the first sequence clearly groups as (b 40) (k 12) (p 32) and the second, (36 b) (40 k) (12 p). This is a potential, but in fact, resolvable, ambiguity, and some reasoning is required to establish the grouping. The grammar-based approaches to parsing are not well-suited for such tasks.
Scaffolding -- In (Allwein) it is pointed out that the Greek approach to geometry often involved auxillary constructions or "scaffolding". In our grammars, because of the controlled sequence of top-down expansion of productions, we can introduce "scaffolding" and then use it to support the discovery of the constituents of interest. This is done, for example, in the Diagram rule:
where Axis is a non-terminal that is required to be made up of a long horizontal line and a long vertical line which touch at the lower left (details here). Once Axis is determined in this way, the remaining constituents can refer to its components in their productions, e.g., extracting the long horizontal line from Axis for use in the X-Axis production.
Relation of Chok and Marriott's work to ours -- Perhaps the most obvious point of contact between our work and work described in the symposium, is Chok and Marriott's paper on the Penguins project. They build parsers for diagrams that can operate in batch mode or incrementally in a pen-based drawing application. They also have a solution to the layout problem, error detection, and more. This is excellent work, which in some ways makes our work seem modest by comparison. But our work has very different origins and goals, which explains at least some of the differences. Our projects have focused on understanding diagrams from the research literature, so our approach to parsing and the examples we parse have been chosen accordingly. Nevertheless, the grammatical formalism we use is very close to Chok and Marriott's (due in part to the fact that we studied Marriott's work, among others, early in the development of our system some time ago). Our system uses both synthesized and inherited attributes. The major inherited attribute (passed down the tree as parsing proceeds) is the context, which is the set of primitive objects that can be used by a rule in searching for a solution, an idea not in their work. The synthesized attributes, passed up the tree, are not restricted to being chosen from attributes of the primitives. For example, we could compute and pass up the smallest distance between any two of the points in a set of n points found in solving a particular production, or the spacing of a set of equally spaced tick marks. As they do, we implement negative constraints -- the example we mentioned earlier was (not (polylinep Line)) in which we want to ignore any lines that are a side of a polygon. The relation between their grammar for finite-state automata and our grammar is very close. But our rule for defining a final state is notably simpler than theirs because we use the contain constraint, that one circle be contained within another, rather than object-type-specific computations involving the centers and radii of the two circles.
Our parser is also notably slower than theirs, even on a fast machine. But they have focused on interactive applications in which speed is of the essence, and have used Borning's highly refined constraint solver, itself designed for on-line operation. We have been more concerned with expressiveness than speed, and undoubtedly could speed up our system by an order-of-magnitude, were that required. But the literature of science that is our focus is finite and diagrams in the hard sciences appear in the published literature at a rate of a few million per year. An upper bound would be 10 million distinct diagrams published per year. Since there are 32 million seconds per year, all diagrams in the published literature could be parsed, as fast as they are produced, by a parser that could parse a diagram in three seconds, on average. The more important questions have to do with the goals of parsing diagrams. For us, it is the building of knowledge bases that will give users access the huge collections of diagrammatic information that is contained in the research literature of science and engineering.
One of the operations Chok and Marriott's system can do is related to unrestricted productions which can have more than one left-hand-side symbol. In their Figure 2, they create four lines from two intersecting lines. Interestingly, this brings up a major paradox that they do not discuss, which we might call the occlusion paradox. Given a collection of primitive objects in a representation of an image, e.g., a file listing the primitives and their attributes, the appearance of the diagram may differ notably from the representation. In the two intersecting lines case, it is impossible for a human observer to know whether there are two intersecting lines, or four lines with some coincident endpoints, or indeed, two L-shaped broken lines with a common corner. Sometimes, when people draw diagrams using an interactive drawing application, they deliberately use occlusion to produce a particular visual effect. This is often done because of the limitation of the drawing application. The extreme example of the paradox is the blank sheet. In this example, an arbitrarily complex drawing is created. Then a large white opaque, borderless rectangle (or circle or whatever) is drawn with dimensions large enough and position chosen to totally cover all elements in the original drawing. The parser should report an empty image, but no parser that I know of, including ours, would return such an interpretation.
I will have more to say about Chok and Marriott's important paper when I act as the discussant for it at the Symposium. The reader is also referred to the recent extensive survey on visual language specification and recognition (Marriott, Meyer, and Wittenburg, 1998).
Allwein, G., Marriott, K., & Meyer, B. (1998). Formalizing Reasoning with Visual & Diagrammatic Representations. Rept. No FS-98-04. AAAI Press.
Futrelle, R. P. (1990). Strategies for Diagram Understanding: Object/Spatial Data Structures, Animate Vision, and Generalized Equivalence. In 10th ICPR, (pp. 403-408): IEEE Press.
Li, M., & Vitanyi, P. (1993). An Introduction to Kolmogorov Complexity and Its Applications. New York: Springer-Verlag.
Marriott, K., Meyer, B., & Wittenburg, K. (1998). A Survey of Visual Language Specification and Recognition. In K. Marriott & B. Meyer (Eds.), Visual Language Theory (pp. 5-85): Springer Verlag.
Petry, S., & Meyer, G. E. (Ed.). (1987). The Perception of Illusory Contours: Springer-Verlag.
Shimojima, A. (1996). Operational Constraints in Diagrammtic Reasoning. In G. Allwein & J. Barwise (Eds.), Logical Reasoning with Diagrams (pp. 27-48). New York: Oxford.
Treisman, A. (1985). Preattentive Processing in Vision. Computer Vision, Graphics, and Image Processing, 31, 156-177.
Treisman, A., & Sato, S. (1990). Conjunction Search Revisited.
Journal of Experimental Psychology: Human Perception & Performance, 16(3), 459-478.
Excerpt from the online description of the technical report of the symposium:
October 23-25, Orlando, Florida
Technical Report FS-98-04
112 pp., $25.00
Visual Language Specification and Recognition
Kim Marriott 1
Theories of Visual and Diagrammatic Reasoning: Foundational Issues
Corin A. Gurr 3
Diagrammatic Reasoning and Color
Michael Anderson and Chris Armen 13
Verification of Diagrammatic Proofs
Mateja Jamnik, Alan Bundy, and Ian Green 23
Gerard Allwein 31
Diagrammatic Reasoning: Analysis of an Example
Patrick J. Hayes and Geoffrey L. LaForte 33
Diagrammatic Reasoning about Actions Using Artificial Potential
Marcello Frixione, Gianni Vercelli, and Renato Zaccaria 39
Local Extent in Diagrams
Norman Foo 51
A Logic-based Formalism for Reasoning about Visual Representations
Volker Haarslev 57
Generating User Interfaces for Pen-based Computers
Sitt Sen Chok and Kim Marriott 67
Hypergraph Representations of Diagrams in Diagram Editors
Mark Minas 79
Gerard Allwein 87
Diamond: Diagrammatic Reasoning System Demonstration
Mateja Jamnik, Alan Bundy, and Ian Green 95
The BITPICT Computation System
George W. Furnas 97
Demonstration of the Diagram Understanding System
Robert P. Futrelle 99
GenEd: A Generic Editor for Reasoning about Visual Notations
Volker Haarslev and Michael Wessel 101
VISCO: Querying GIS with Spatial Sketches
Volker Haarslev and Michael Wessel 103
Michael Anderson 105
Interpretation of Visual Notations in the Recopla Editor Generator
Bernd Meyer and Hubert Zweckstetter 107
|Next: (no link for now)|
|Back to Index page|