vldb2005

Abstract

XPath is the defacto navigation language for XML documents conforming to DTD and it is important to have efficient evaluation and checking techniques available for XPath. Traversal strategies, a component of Adaptive Programming, are a well studied navigation language for objects conforming to class graphs and an efficient evaluation technique has been developed in the previous millenium. Traversal strategies don't deal with the full generality of XPath but instead they focus on ancestor-descendent (ad) edges. Indeed, a traversal strategy is a graph consisting only of ad edges with optional negative constraints on the edges. While it has been widely acknowledged in the data base community that DTD may speed up XPath evaluation, no paper has shown an exponential improvement by using the DTDs. We show an infinite sequence of XPath expression/DTD/document triples so that evaluation using the meta information in the DTDs is exponentially faster. We draw a strong connection between traversal strategy evaluation and XPath evaluation and show how this leads to comprehensive logical optimization techniques for XPath. Our evaluation is optimal in the sense that by visiting one node less than our approach visits would lead to wrong results. Finally, we present empirical results of our speed up and explore various classes of XML documents that can benefit from this type of evaluation. We believe that the automata-theoretic algorithmic ideas presented in this paper should be an important building block of any efficient XPath evaluator.

The paper: [pdf] [ps.gz] [src]

Last built: Sat May 13 13:06:49 EDT 2006 by jpalm on denali