Material for NU Tech Expo/Scholarship Showcase

to be held 5/11/2000

from Robert P. Futrelle


_____________________________________________________________________
Robert P. Futrelle        |  Biological Knowledge Laboratory
Associate Professor       |  College of Computer Science  161CN
Office: (617)-373-4239    |  Northeastern University
Fax:    (617)-373-5121    |  360 Huntington Ave.
futrelle@ccs.neu.edu      |  Boston, MA 02115
           http://www.ccs.neu.edu/home/futrelle
_____________________________________________________________________

Abstract: Automated Understanding of Diagrams

Diagrams of all sorts are critical in the presentation of topics in all types of writing, from the daily newspaper to complex technical articles in journals. Diagrams are created in a wide variety of forms including data graphs, block diagrams, gene diagrams, mechanical drawings, maps, and more. Our research in the Biological Knowledge Laboratory over the last ten years has looked at a variety of aspects of automated diagram analysis. The four aspects we will describe in our presentation are: 1) Parsing diagrams to build representations of their content (our primary emphasis); 2) Summarization of complex diagrams or sets of diagrams by a simpler diagram; 3) Ambiguity in diagrams, in which the visual appearance of a diagram may "mean" more than one thing and the question is which; 4) Text/Graphics Discourse: how text and diagrams work together to form a single coherent "discourse" that is greater than the sum of its parts.

Summary:

The presentation (poster) will be divided into four parts, corresponding to the four topics listed in the Abstract. The poster will include screen shots from our Diagram Understanding System as well as edited excerpts from our publications in all four areas.

1) Parsing: There are many automated techniques that exploit the text in electronic documents to automatically index, retrieve, summarize, and translate them, as well as to make the text available for editing and inclusion in still other documents. There are almost no techniques of this sort for diagrams, primarily because the internal structure of diagrams is not laid out explicitly in their digital representations; often diagrams are only available as raster images, e.g., as GIFs on the Web. This project will develop a new class of automated analysis tools for vector diagrams (not bitmap diagrams) that will analyze the graphical elements and the text associated with diagrams to create rich descriptions of diagram syntax and semantics. Such descriptions allow collections of diagrams to be intelligently indexed and queried, and they form the basis for systems that can compare, merge, summarize, and otherwise manipulate diagrammatic information in useful ways. Results and impact: Enhanced understanding of the syntax and semantics of diagrams; efficient systems for extracting syntactic and semantic information from vector diagrams; strategies for representing diagram structure and content in electronic form; demonstration collections of knowledge-based diagrams stored and indexed in a database; prototypes of systems that allow users to do content-based, Web-based searching of diagrams in our databases and view the results.
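
As a rough illustration of what indexing and querying parsed diagrams could look like, the following minimal Python sketch builds an inverted index over diagram descriptions. The description format (a "kind" plus a list of text "labels"), the function names, and the sample figures are all assumptions made for this example; they are not the representation used by our Diagram Understanding System.

```python
from collections import defaultdict


def build_index(diagrams):
    """Map each lowercase term (diagram kind or label word) to diagram ids.

    `diagrams` is a hypothetical structure: id -> {"kind": str, "labels": [str]}.
    """
    index = defaultdict(set)
    for diagram_id, description in diagrams.items():
        index[description["kind"].lower()].add(diagram_id)
        for label in description["labels"]:
            for word in label.lower().split():
                index[word].add(diagram_id)
    return index


def query(index, *terms):
    """Return the sorted ids of diagrams that match every query term."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


# Invented sample data, for illustration only.
diagrams = {
    "fig1": {"kind": "data-graph", "labels": ["time (s)", "growth rate"]},
    "fig2": {"kind": "block-diagram", "labels": ["growth sensor"]},
}
index = build_index(diagrams)
print(query(index, "growth"))                # both figures mention "growth"
print(query(index, "growth", "data-graph"))  # only the data graph
```

A real system would index structural relations (which label belongs to which axis, which block feeds which) as well as words; this sketch covers only the simplest term-level case.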

2) Summarization: There is substantial work on automated text summarization but almost none on the automated summarization of graphics. Four examples of diagrams from the scientific literature can be used to indicate the problems and possible solutions: a table of images, a flow chart, a set of x,y data plots, and a block diagram. Manual summaries are constructed for each. Two sources of information are used to guide summarization. The first is the internal structure of the diagram itself, its topology and geometry. The second is the text in captions, in the running text, and within the diagrams. The diagram structure can be generated using the author's constraint-based diagram understanding system. Once the structure is obtained, procedures such as table element elision or subgraph deletion are used to produce a simpler summary form. Then automated layout algorithms can be used to generate the summary diagram.
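
The subgraph deletion step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not our implementation: the diagram is assumed to be already reduced to a list of directed edges between named blocks, the choice of which blocks to drop is given by hand, and edges are "bridged" around deleted blocks so the summary stays connected (the bridging rule is an assumption of this sketch).

```python
def summarize_by_subgraph_deletion(edges, drop):
    """Remove the nodes in `drop` from a directed graph, bridging around them.

    `edges` is a list of (source, target) pairs; `drop` is a collection of
    node names to delete. Both formats are hypothetical.
    """
    drop = set(drop)
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    def kept_targets(node, seen):
        # Follow edges through dropped nodes until kept nodes are reached.
        targets = []
        for nxt in successors.get(node, []):
            if nxt in seen:
                continue  # avoid looping forever on cycles
            if nxt in drop:
                targets.extend(kept_targets(nxt, seen | {nxt}))
            else:
                targets.append(nxt)
        return targets

    summary = []
    for src in successors:
        if src in drop:
            continue
        for dst in kept_targets(src, {src}):
            if (src, dst) not in summary:
                summary.append((src, dst))
    return summary


# Invented block diagram, for illustration only.
block_diagram = [
    ("sample", "filter"),
    ("filter", "amplify"),
    ("amplify", "detect"),
    ("detect", "display"),
]
print(summarize_by_subgraph_deletion(block_diagram, {"filter", "amplify"}))
```

The resulting edge list would then be handed to a layout algorithm to draw the summary diagram; layout is outside the scope of this sketch.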

3) Ambiguity: As broad-coverage visual language grammars are developed, they will generate multiple interpretations, that is, ambiguities. Four classes of ambiguities in diagram parsing are discussed. Two of them are familiar from natural language: lexical ambiguity and structural ambiguity. In a diagram, the role played by an object such as an arrow is an example of lexical ambiguity. Structural ambiguity includes attachment ambiguities, e.g., to what object is a text label attached, and analytical ambiguities, e.g., does a line in a data graph represent data or is it a fiducial? Two types of ambiguities are unique to graphics: occlusion ambiguities and segmentation ambiguities. Occlusion ambiguities may be simple, such as the overlay of a data point on a data line, or synthetic, where objects are deliberately arranged so that occlusion produces a novel object. Segmentation ambiguities arise when a line is intersected by another and the resulting segments act as distinct elements. Ambiguity resolution often uses nearness and alignment, but more complex ambiguities require the integration of substantial amounts of information. For synthetic occlusion, the diagram is re-rendered to create new objects for analysis.
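
To make the nearness criterion for label attachment concrete, here is a minimal Python sketch. It is an assumption-laden illustration rather than our method: each candidate object is reduced to a single reference point, distance is plain Euclidean distance, and the `margin` parameter (a hypothetical threshold) flags the attachment as ambiguous when the runner-up is nearly as close as the best candidate.

```python
import math


def attach_label(label_xy, objects, margin=1.5):
    """Attach a label to the nearest object and flag near-ties.

    `objects` maps a (hypothetical) object name to an (x, y) reference point.
    Returns (best_name, ambiguous), where `ambiguous` is True when the
    second-nearest object lies within `margin` times the best distance.
    """
    ranked = sorted(objects.items(),
                    key=lambda item: math.dist(label_xy, item[1]))
    best_name, best_xy = ranked[0]
    if len(ranked) == 1:
        return best_name, False
    d_best = math.dist(label_xy, best_xy)
    d_next = math.dist(label_xy, ranked[1][1])
    return best_name, d_next < margin * d_best


# Invented coordinates, for illustration only.
curves = {"curve_a": (10, 10), "curve_b": (40, 10)}
print(attach_label((12, 11), curves))  # label sits close to curve_a
print(attach_label((25, 10), curves))  # label is midway between the curves
```

When the flag is set, nearness alone is not enough, and further evidence such as alignment or the accompanying text would be needed to settle the attachment.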

4) Text/Graphics Discourse: This area of our research is currently quite active, and the work is being done with a graduate assistant, Anna Rumshisky. There has been a good deal of research on the overall discourse structure of human conversations (dialogue) and text. This work looks at overall structure beyond the structure of single sentences, which is often the focus of linguistic studies. But very little discourse research has looked at the integrated structure of text and graphics in documents. This is in spite of the fact that a large percentage of the millions of publications produced in the world every year include graphics in some form! Our current work is focused on broadly understandable examples such as those found in "The New Way Things Work" by Macaulay and "What Do People Do All Day?" by Scarry. We will parse some of the diagrams, parse the accompanying text, and work to automatically integrate the two parsed representations into a greater whole, all the time developing a theoretical approach to characterize such discourse.

Other information: Website: http://www.ccs.neu.edu/home/futrelle where the primary links to follow are the ones to Research and the one to my long CV.