Demeter ideas in data description languages

This page is a response to the paper: The Next 700 Data Description Languages by Fisher, Mandelbaum, and Walker (POPL 2006).

Since 1988, Demeter class dictionaries and the types defined in them specify both the external data format (a sequence of characters) and a mapping into a data structure in the host programming language, e.g. Flavors, C++, or Java. Each type defines both a parser and, more conventionally, a classifier for parsed data.

A class dictionary consists of a class graph and a syntax and pretty printing aspect. A class graph is a labeled graph with is-a and has-a edges. Basically, the syntax aspect and the pretty printing aspect add tokens and pretty printing information before and after each node or edge. Class dictionaries are defined in my book at: http://www.ccs.neu.edu/research/demeter/technology-transfer/dd/index.html. See chapters 11, 12 and 16. The first journal paper on class dictionaries is from 1988: http://www.ccs.neu.edu/research/demeter/biblio/class-dictionaries.html.

A class dictionary G defines a set of objects Objects(G) and a print function print: Objects(G) -> Sentences(G). We require that the print function is a bijection in which case there is a parse function parse: Sentences(G) -> Objects(G).

A class dictionary G defines a lot of other useful software, see

http://www.ccs.neu.edu/research/demeter/biblio/class-dictionaries.html

Our current tool, DemeterJ, generates: PrintVisitor, DisplayVisitor, TraceVisitor, CopyVisitor and EqualVisitor. For an example, see:

http://www.ccs.neu.edu/home/lieber/courses/csu670/f05/hw/4/visitor-usage/program.beh

The pretty printer is controlled by 4 pretty printing commands.

A Demeter type is either 
T1|T2	(union type; parse one or the other)
T1.T2   (sum type; parse T1, then T2)
"start" T {"separator" T} "end"  (sequence type; parse T one or more times)
Class dictionaries have been used in numerous courses at Northeastern on a regular basis since about 1986. Class dictionaries make it easy for an undergraduate student to learn to design ad-hoc data languages. Class dictionaries have been mentioned in several publications from the Demeter group at Northeastern and other groups, starting with the original Law of Demeter paper and recently in our XAspects work that builds on DemeterJ and in the ICSE 2004 keynote paper.

The idea of using types to define parsers has been reinvented by the PADS project. However, they cover a larger set of languages that cannot be expressed by class dictionaries.

http://www.padsproj.org/

The POPL 2006 paper calls the following a remarkable insight: Types can describe data in both its external and internal (programmatic) forms. But this is also the insight behind class dictionaries.

The PADS project goes beyond the Demeter technology: 1. They have dependent sums (Parsing of one component may control the parsing of a following component). 2. They use dynamic predicates that are checked during parsing. 3. They find all errors, not just the first one. (But those are basically semantic errors; they don't have a good way to recover from parsing errors.) 4. They evaluate expressions during parsing to fill in fields. 5. They can deal with a larger set of languages. Here is an example from the POPL 2006 paper expressed in Demeter types and PADS types to show that the two projects overlap. Both sets of types define basically the same language (modulo slight differences in the meaning of the base types). This is the Newick Format Description for representing trees.

Newick in Demeter:

node_t = [node_t1].
node_t1 =  Pstring ":"  Puint32.
tree_t : Internal | Leaf.
Internal =  CList(tree_t) ":"  Puint32.
CList(S) ~ "(" S {"," S} ")".
Leaf = node_t.
Start =  tree_t ";".
Puint32 = Integer.
Pstring = Ident.


Newick in PADS:

node_t = Popt Pstruct {
  name : Pstring(":"); ":";
  dist : Puint32;
};
Prec tree_t = Punion {
  internal : Pstruct {
    "("; branches : tree_t Parray(",",")");
    "):"; dist : Puint32;
  };
  leaf : node_t;
};
Pstruct { body : tree_t; ";"; }

Example input for both cases:
(B:3,(A:5,C:10,E:2):12,D:0):32;

The PADS project continues to explore Demeter space: They use XQuery to explore the data which brings Adaptive Programming ideas into the mix.

An interesting project would be to add PADS dependent types to Demeter (e.g., the DAJ tool). How does this affect the readability of the languages? How does this extend the expressiveness? How does it complicate the tools? Can we still use standard tools like ANTLR? How can we create bijections between Object(G) and Sentences(G) in the presence of dependent types?

Why is the PADS work not proving round-tripping theorems in the sense of Wadler's POPL 03 paper? For class dictionaries we can prove those round-tripping theorems. See: http://www.ccs.neu.edu/home/lieber/courses/csg711/f05/lectures/karl-2/XML-sem-gen.ppt

PADS seems to promote mixing the concern of parsing with the concern of s checking. In principle, these two concerns should be separated. But it might be justified to violate this separation for the benefit of more succinct data description languages. A better analysis of this tradeoff is necessary.

Notes about simulating PADS in Demeter.

IN PADS we can parse a sentence:

3ABC4DEFG 
using dependent types. In Demeter we would simulate this with:
"ABC""DEFG" 
with the added semantic check that the first string must have length 3 and the second length 4. Certainly, the Demeter sentence is two characters longer but it is also more robust to data errors.

Dependent types are also useful for self-describing data. Consider

string "This is a String" number 33 string "x".
But this type of use of dependent types can be simulated in Demeter with:
SelfDescribing = List(Term) ".".
Term = Str | Num.
Str = "string" String.
Num = "number" Integer.
List(S) ~ {S}.

Acknowledgements: Contributions to this page by Bryan Chadwick.

Demeter Home Page

Professor Karl J. Lieberherr
College of Computer and Information Science, Northeastern University
360 Avenue of the Arts
Boston, MA 02115-9959
Phone: (617) 373 2077 / Fax: (617) 373 5121

Updated Nov. 26, 2005.