From lieber@ccs.neu.edu Wed Dec 17 22:10:05 1997 To: lieber@ccs.neu.edu, wand@ccs.neu.edu, will@ccs.neu.edu Subject: Re: Not jumping to conclusions [was: Eliminating the professor's code] Cc: dem@ccs.neu.edu, jcurry@ccs.neu.edu, lblando@ccs.neu.edu, palsberg@cs.purdue.edu Hi Will: Showing that programming approach A is better than approach B is indeed a challenging task. Only if A is "obviously" better than B has it a chance of success. Adaptive programs have this quality of being obviously better: they are shorter and more generic and in the worst-case never worse than an OO program. We build on Java technology, including the Java compiler compiler. We don't have our own parser generator. More below. >From will@ccs.neu.edu Wed Dec 17 14:03:20 1997 >From: William D Clinger >To: lieber@ccs.neu.edu, wand@ccs.neu.edu >Subject: Re: Not jumping to conclusions [was: Eliminating the professor's code] >Cc: dem@ccs.neu.edu, jcurry@ccs.neu.edu, lblando@ccs.neu.edu, > palsberg@cs.purdue.edu, will@ccs.neu.edu >Status: R > >>>KL> 1. Professor does not have to prepare scanner and parser. >>>KL> Students master entire project from scratch. >>> >>I think that the issue is that with Demeter/Java the objects (syntax trees) >>are available so that they are "strategically traversable". >>Lex/Yacc does not generate class definitions with traversal >>code. > >I think this is mostly, but perhaps not entirely, a red herring. >The scanner and parser that I supplied for COM 3355 generated >abstract syntax trees, so they were "strategically traversable" >as well. If there is any advantage here for the Demeter/Java >approach, it would have to be related to the automatic coupling >between the representation of the abstract syntax trees and the >traversal strategies. I'm not sure how much automatic coupling >there was in the Demeter/Java version, or how much there could be, >because the structure of the abstract syntax trees cannot be >inferred automatically from the concrete grammar. That's why >parser generators allow arbitrary code to be associated with >productions. > >For a compiler or interpreter, I think it's probably almost as >easy to express the traversal strategies as straight code as to >express them as Demeter/Java traversal patterns, partly because >most of the traversal strategies are in-order. > Consider a very simple interpreter: a metrics program for programming language P. Let P=Java. You would work many yours to write out the traversal code to navigate the Java programs while we write a few sexy strategies to get the job done. See: http://www.ccs.neu.edu/research/demeter/course/f97/projects/JavaMetrics or ask Larry Dodds. I know what you mean by pre-order and post-order but what does in-order mean an object? > >>Luis Blando's project >>http://www.ccs.neu.edu/research/demeter/evaluation/gte-labs/ >>seems to indicate that to develop a parser with Demeter/Java is >>easier than with Lex/Yacc. > >For almost all scanner generators X and parser generators Y, >developing a parser with X/Y is easier than with Lex/Yacc. (Lex >isn't so bad, but yacc is, especially if you would like to report >syntax errors in an understandable way and recover from them as >well.) For COM 3355 I used LexGen and ParseGen, both of which I >wrote myself. LexGen isn't finished (for example, it uses Cambridge >Polish notation to express the annotated regular expressions), so >it doesn't make sense to make a detailed comparison between it and >Lex or to whatever scanner generation tool is provided by Demeter/Java. > >ParseGen is an LL(1) parser generator (for pedagogical reasons), so >you have to convert the grammar to LL(1) form, which you don't have >to do with yacc. In every other respect ParseGen is easier to use than >yacc. (So is Fischer and LeBlanc's LLGen, whose inputs are usually >compatible with ParseGen's.) From what I recall of John Curry's >original message, the Demeter/Java parser generator also requires that >the grammar be converted to LL(1) form, which suggests that the effort >required to generate the scanner and parser would be essentially the >same whether Curry had used LexGen/ParseGen or Demeter/Java. > >Even if the Demeter/Java tools for scanner and parser generation were >superior to those that are commonly used with other styles of programming >(as I can well believe to be true in the case of lex/yacc), this does >not imply anything about the relative merits of Demeter/Java versus >those other styles of programming, unless you can establish some causal >connection between the quality of the tools and the style of programming. > >JC>The code generation of the C compiler required approximately 880 lines of >JC>C code. The entire implementation of the interpreter required >JC>apporximately 570 lines of Demeter / Java code. > >I believe John was using the output module I supplied, which featured >programmable peephole optimization. If so, then he would have written >one line of C code for each MIPS instruction that he wanted to generate. >The corresponding portion of an interpreter would not require so many >lines of code. It should also be remembered that his interpreter does >not implement the full language that was implemented by his compiler. > >JC>Although I did not keep any data upon the original implementation of the >JC>compiler, I recall the time to add new features was longer (I would guess >JC>by a factor of 2X - 4X). Therefor I believe programming in traversal >JC>visitor style does promote development of flexible systems. > >I suspect that this factor of 2X - 4X is attributable primarily to the >fact that the COM 3355 project was a compiler, not an interpreter. To >add a new feature to the compiler, John had to think about the runtime >structures and invariants of the compiler, which were probably more >complex than those of his interpreter, and also had to figure out how >to synthesize the semantics using the instruction set of the MIPS. John >was probably not an expert on that instruction set. > >I wouldn't be surprised if traversal visitor style were more flexible, >but I think that effect would be small compared to the noise generated >by comparing a compiler with an interpreter. Hence this particular data >point neither refutes nor justifies the claim that Demeter/Java style is >more flexible. > Ok, but the point remains that John wrote a huge family of interpreters in Demeter/Java while only a small family of compilers in C. Maybe we should talk about the size of the families? In my view: larger family size -> more flexibility. > >JC>3. Programs are more understandable provided the reader knows >JC> the concepts of AP. > >This could be tested by implementing the interpreter as straight C code, >asking some neutral committee to devise some questions about the interpreter >that are independent of its source language (such as asking whether >environments are implemented as mutable or as immutable data structures), >and dividing a group of students who know the concepts of AP into two groups: >One group must answer the questions by reading the C code, and the other >group must answer the questions by reading the Demeter/Java code. The >null hypothesis would be that there is no difference between the performance >of the two groups on those questions. You can perform the statistical >analysis to see how likely the null hypothesis is, given the experimental >results. > >Suppose the Demeter/Java style turns out to be more understandable than >the straight C code. Then you still have to address the question of >whether the time spent learning the concepts of AP might have been better >spent learning about recursive traversals in C, fundamentals of interpreters, >and so forth. In other words, which group is better able to understand >the interpreter: > > 1. students who have spent N days learning the concepts of AP, > and must then answer questions based on the Demeter/Java version > of the interpreter; or > 2. students who have spent N days learning the concepts of recursion, > recursive data types, and interpreters, and must then answer > questions based on the C version of the interpreter. > >Even then you've got the problem of controlling for differences in the >quality of the code, for differences between Java and C, and so forth. > Your experimental analysis is great. It shows some pitfalls one can get into. I try to stay away from this if I can. > >This kind of experimentation is very hard to do well, and the results >are usually very difficult to interpret. That's why, as a graduate >student, I gave up on doing my PhD thesis in cognitive psychology: >hard-core computer science is easier. > >Will > -- Karl