Uses of Northeastern Patent 5,946,490 by the XML Community

The patent by Lieberherr(Northeastern), Palsberg(UCLA) and Patt-Shamir (Tel Aviv University) was issued before the XML, XPath and XQuery technology was introduced. The patent covers an important aspect of Semantic Query Optimization (SQO). SQO uses schema information to optimize query implementation.

Terminology Mapping

We first map the patent terminology to XQuery terminology
Patent Terminology XQuery Terminology
class graph DTD or XML schema
object graph XML document
traversal specification XPath expression using wildcards
traversal automaton implementation of query
traversal method algorithm implementation of query

The patent takes the schema and the query and turns them into an automaton for efficient query processing. This approach is now widely used. In [Rundensteiner et al.] we read:

Automata are widely used [6 referenced papers] for pattern retrieval over XML ...
The automaton basically replaces the non-deterministic "*" and "//" wildcard navigation steps in the query so that more SQO can be applied. This approach is sometimes also called type inference in the XML community.

Interestingly, the dates of the six referenced papers are 2002 and 2003. Our patent predates all these papers.

The patent is also applicable outside XML, namely for Object Databases, but here we focus on XML.

The patent applies to both streaming and persistent XML. In persistent XML, the patent provides for jumping over irrelevant parts of a document greatly reducing the processing effort. In streaming XML, the patent provides for skipping computations that do not contribute to the final result [Rundensteiner et al.].

Instead of SQO for XQuery, some companies use the term: schema-aware XQuery.

One issue to keep in mind is that XQuery is much more powerful than the queries that are covered by the patent. But XQuery uses what is covered by the patent as a subset. Implementing XQuery requires more technology than what the patent provides but for efficient XQuery processing you need the kind of automata proposed in the patent.

On the other hand, the patent covers more than is needed by XQuery. In other words, the patent can deal with queries that XQuery cannot express directly with iterated wildcards.

Companies who use XML with SQO

There are too many to list: Microsoft, IBM, Google, Oracle, Saxonica, Stylus/Studio, Progress Software, ...

The IBM Case

See: http://www.ibm.com/developerworks/xml/library/x-schemaxslt.html

Quote You can put XML schemas to use during the compilation of the stylesheet or during the runtime in other words, when the input XML document is transformed. The XSLT 2.0 specification says nothing about the compile-time usage of schemas, but computer language theory makes it well known that having extra type information during compile time allows the compiler to make compile-time optimizations to generate efficient code, for example. End of quote.

The Northeastern Patent covers the compile-time optimizations for this special case.

References

[Rundensteiner et al.]
@inproceedings{1083627,
author = {Su, Hong and Rundensteiner, Elke A. and Mani, Murali},
title = {Semantic query optimization for XQuery over XML streams},
booktitle = {VLDB '05: Proceedings of the 31st international conference on Very large data bases},
year = {2005},
isbn = {1-59593-154-6},
pages = {277--288},
location = {Trondheim, Norway},
publisher = {VLDB Endowment},
}
More information: http://www.ccs.neu.edu/home/lieber/tech/available.html