%
% this does not make sense: where are the non-terminals? Actually, it does.

We adopt Wood's model of DTDs \cite{wood03containment}.  For a DTD $D$
this model consists of a finite alphabet, $\Sigma$, a root type,
denoted \froot{D}, and a mapping that associates with each $a \in \Sigma$ a
regular expression of $\Sigma$.  This is most easily written as a EBNF
grammar where the production right-hand sides are regular
expressions and the alphabet elements are non-terminals.

We use the model used by Miklau \etal for XML documents as trees over
infinite alphabets \cite{543623}.  We also use the fragment of XPath
known as \XPfrag, which consists of node tests, the child axis (/),
the descendant axis (//), and wildcards (*).  This model does not
capture the full XPath, but is sufficient for analyzing the speed we
can gain over structural queries.  A full semantics of an XPath query
is given in \cite{wadler99formal}.

The following is adapted from \cite{mitch:karl-2001}.
A \term{class graph} consists of a set $C$ of classes; a set $E$ of
field names; and for each $e \in E$ a relation denoted $e$ (``has part
named $e$'') on classes, and a reflexive transitive relation $\le$ on
classes (``is a subclass of'').

If $C$ is a class graph, then an \term{object graph} for $C$ consists of:
\begin{enumerate}
\item a set $O$ (``of objects''),
\item a map \class : $O \rightarrow C$, and
\item for reach $e \in E$, a relation denoted $e$ on $O$ such that if \fe{\o1}{\o2}, then
$$
\inbetween{\fclass{\o1}}{e}{\fclass{\o2}}
$$
\end{enumerate}

We say that $o$ is of type $c$ when $\fclass{o} \le c$.

The traversal of an edge labeled $e$ corresponds to retrieving the
value of the \e{} field.  Condition 3 captures the notion that every
edge in the object graph is an image of a has-as-part edge in the
class graph: There is an edge \fe{\o1}{\o2} in $O$ only when there
exist classes \c1 and \c2 such that \o1 is of type \c1, \c1 has an
\e{}-part of type \c2, and \o2 is of type \c2, that is,
$$
\inbetween{\fclass{\o1}}{\c1 \; e \; \c2}{\fclass{\o2}}
$$



The valid traversals of a class graph is given in terms of \FIRSTf{}
sets.  For each pair of classes \c{} and $c^\prime$ we have an edge $e
\in \fFIRST{c}{c^\prime}$ iff it is possible for an object of class \c{}
to reach an object of type $c^\prime$ by a path beginning with an edge
\e{}.  Precisely, \fFIRST{c}{c^\prime} = \{ $e \in E \; | \;$  there exists
an object graph $O$ of $C$ and object $o$ and $o^\prime$ such that: 
\begin{enumerate}
\item $\fclass{c} = c$,
\item $\fclass{o^\prime} \le c^\prime$,
\item $o \; eO* \; e^\prime$ \}
\end{enumerate}
The last condition says that there is a path from $o$ to $o^\prime$ in
the object graph, consisting of an edge labeled $e$, followed by any
sequence of edges in the graph.  Furthore more, Wand and Lieberherr
show that
$$
\fFIRST{c}{c^\prime} = \{ e | 
\inbetween{c}{e}{( \le C \ge)^* \le c^\prime}
 \}.
$$
That is, an object of class $c^\prime$ is reachable from an object of
class $c$ starting with edge \e{} if we can follow \e{} so some object
$o^\prime$; then we can follow \hasaedge s to an object of type
$c^\prime$.  So, a traversal of an object graph in then specified
transitively as \FIRSTf sets.

Traversals may be implemented efficiently using strategy graphs.
These graphs lay out a road map for traversing only those edges of an
object graph that can lead to sucessful endpoints.  Formally, a
strategy graph is given by a set of states $Q$, a transition relation $S$ on
states, a map $\class{} : Q \rightarrow C$, a set $QI \subset Q$ of
initial states, and a set $QF \subset Q$ of final states.  A complete
discussion of strategy graphs is found in \cite{strategies:toplas},
but the important point of these graphs is that we can use them to
efficiently traverse object graphs.

A \term{path} in a graph is a sequence $\v1 \cdots \v{n}$ where $\v1,
\dots, \v{n}$ are nodes of the graph; and $\v{i} \rightarrow \v{i+1}$
is an edge of the graphs for all $i \in 1 \dots n-1$.  We call \v1 and
\v{n} the \term{source} and \term{target} nodes of the path,
respectively.

A strategy, then, selects valid paths in an object graph, and uses
information from that object's class graph to prune out paths that
could never lead to a successful result.

To execute an XPath query efficiently we can compute the corresponding
strategy graph and let this graph prune away paths that could never
lead us to a successful outcome.  We show this translation in Section
\ref{sec:translation}, but first we give a motivating example showing
that query evaluation without meta information may visit exponentially
many nodes.

We use the term \term{meta information} to refer to any information
that describes the structure of other information.  The meta
information of an XML document is its DTD; the meta information of an
object graph is its class graph.

