\label{sec:example}

The goal of this paper is to show that strategies can be used to speed
up query evaluation.  As Gottlob \etal~point out, the way XPath is
defined motivates an implementation approach that leads to highly
inefficient (exponential-time) XPath processing \cite{gottlob02}. So,
in this section we show that for some queries and documents, an
evaluation without meta information may visit exponentially many
nodes.  To do so, we first define a
\term{naive evaluation} of a query to be a complete traversal of a document.
%% This is wrong
%% At every node $n$, this evaluation proceeds as follows:
%% \begin{enumerate}
%% \item If we are at a node that satisfies this query, we stop.
%% \item Otherwise, we visit the children of $n$ left-to-right.
%% \end{enumerate}
The key here is that we choose an arbitrary but consistent order to
visit the children of a given node.  So, consider the DTD $D_n$
\dtd{
\dtdl{root}{\a0}
\dtdl{\a0}{\a1 \; \b0}
\dtdl{\b0}{\a1}
\dtdl{\a1}{\a2 \; \b1}
\dtdl{\b1}{\a2}
&\vdots&\\
\dtdl{\b{n-1}}{\a{n}}
\dtdl{\a{n}}{\PCDATA} 
}
and a query
\begin{equation}
Q_n = \slash root \slashslash \a0 \slashslash \b0 \slashslash \cdots
\slashslash \a{n} .
% \slashslash \a{n} \slashslash \b{n} .
\label{eq:ex}
\end{equation}
 A document satisfying this DTD is denoted $P_n$, and one such
 document for $n=3$ has the structure shown in Figure
\ref{fig:ex}.  Intuitively a ``smart'' traversal of this document
simply proceeds down the left-hand branch of the tree, and this is
shown as the 'double-circles' in Figure \ref{fig:ex}. But, if we choose
a right-to-left ordering to visit the children of a node we will visit
all nodes before arriving at our target node, \a{n}.  Without loss of
generality, we could simply swap the $a$ and $b$ in every $\a{i}
\rightarrow \a{i+1} \; \b{i}$ production to $\a{i} \rightarrow \b{i}
\; \a{i+1}$ if we had chosen a left-to-right ordering.  A naive
nagivation would, then, visit all the nodes of this tree.

\begin{figure}
\centering
\epsfig{file=ex2.ps,height=3.5in}
\caption{Satisfying document for DTD (\ref{eq:ex}) with $n=3$.}
\label{fig:ex}
\end{figure}

We state this lower bound formally through Theorem
\ref{thm:lowerbound}.
\begin{theorem}
There is a sequence of query/DTD/document triples $(Q_n,D_n,P_n)$ such
that $P_n$ conforms to $D_n$, $|Q_n| = O(n)$, $|D_n| = O(n)$, and
$|P_n| = o(2^n)$, and so that the naive evaluation will visit $o(2^n)$
nodes in $P_n$ while the meta-information-based evaluation will visit
$O(n)$ nodes in $P_n$.
\label{thm:lowerbound}
\end{theorem}

\begin{proof}
Consider the DTD and query above.  For an $n>0$, the DTD has $5n+4$
nodes, and any satisfying document will have $2^{n+1}+3$ nodes.
Additionally if we choose a left-to-right traversal, the naive
evaluation will visit all $2^{n+1}+3$ nodes before arriving at the one
node satisfying the query, \a{n}.
\end{proof}

The strategy graph construction algorithm in \cite{strategies:toplas}
will construct the graph shown in Figure \ref{fig:sg} for DTD
(\ref{eq:ex}).

\begin{figure}
\centering
\epsfig{file=ex3.ps,height=3.5in}
\caption{Strategy graph constructed for DTD (\ref{eq:ex}) with $n=3$.}
\label{fig:sg}
\end{figure}


We now give an inductive proof showing that the traversal with this
strategy visits exactly $2n+1$ nodes before arriving at the target
\a{n}.

\begin{proof}
The root must have one child, \a0, since $n>0$ so we visit the root.
We then proceed by induction on the index of the current node, $i$.
If $i=n$ then we visit 2 nodes -- \a{i} and its one child \b{i}.  For
$i \ne n$, \a{i} has two children, \a{i+1} and \b{i+1}.  The strategy
graph does not permit a move to \a{i+1}, so our only move is to visit
2 more nodes --
\b{i+1} and its only child \a{i+1}; both allowed by the strategy.
\end{proof}

So, we've shown the lower bound of naive evaluation can visit
exponentially many nodes, where as evaluation with a strategy may
visit just a linear number of nodes.



