Efficient Navigation of Objects using Regular Expressions

Speaker: Karl Lieberherr, Northeastern University, CCIS, Boston

Abstract:

Navigation of rooted object graphs and rooted XML documents are two ubiquitous operations in object-oriented programs and XML processing applications. In a control-driven setting, navigation is used for information retrieval as epitomized by DOM-based XML applications and field access in OOPs. In a data-driven setting, navigation can be used to guide a traversal, that would normally visit the entire object graph or the XML document, away from certain fruitless paths. Navigation in a data-driven setting is epitomized by the set of ``accept'' methods in an instance of the visitor design pattern.

Abbreviated navigation specification enables developers to avoid unnecessarily hard coded dependencies between the methods and the structure of the object graph and thus lead to increased reusability and maintainability of programs.

Formalisms for expressing abbreviated navigation specification fall into two broad categories: explicit and implicit. Explicit formalisms provide developers with wild cards to replace the abbreviated path components. Wild cards can also be iterated to replace multiple consecutive abbreviated path components. Examples of explicit formalisms are XPath where developers are provided with ``//'' as form of iterated wild card and regular expressions where the Kleene star is the wild card.

Implicit formalisms do not provide developers with iterated wild cards. Instead, the navigation paths are translated into an explicit form by inserting wild cards into certain places defined by what we call an expansion semantics. Implicit formalisms aim at saving the developers from writing too many wild cards to make their methods flexible. An example of an implicit formalisms is the regular-expression-like strategy language in adaptive programming.

In both formalisms, iterated wild cards pose two problems: the semantic problem of ambiguity and an efficiency problem when traversals are naively guided by a navigation specification that contains iterated wild cards.

This talk contributes to the ambiguity problem by introducing two kinds of restricted wild cards that are guaranteed not to introduce ambiguity to any non-ambiguous abbreviated navigation specification written in an implict formalism. This paper also contributes to the efficiency problem by introducing a new construction for efficient execution of traversals guided by a navigation specification containing iterated wild cards. The new construction subsumes the traversal graph construction in [TOPLAS04]: it is simpler, more efficient and derived systematically from a characterization of the traversal using regular language closure properties.

Joint work with Ahmed Abdelmeged, Mitchell Wand and Pete Manolios.

TOPLAS04: Karl Lieberherr and Boaz Patt-Shamir and Doug Orleans, Traversals of object structures: Specification and Efficient Implementation, ACM Trans. Program. Lang. Syst., 26(2), 2004.