Contents:

Project Description

Project Chronology

Elaboration

Implementation

Results

Bugs and Improvements

Observations on using Demeter/Java

Size Calculation

Credits


Project Description

This project re-implemented Stubgen, a code generation tool that I originally developed as a Perl script. At 1500 lines, the Perl script was becoming unmaintainable, mostly because its data structures were poorly encapsulated. The goals of the project were:

Documentation for the original version can be found here (Postscript, HTML). Although somewhat outdated (it doesn't describe the configuration file syntax and semantics), it gives a good view of how the tool operates.

Stubgen was originally developed to generate the C++ wrapper code associated with a design pattern, dubbed ‘handle-bridge-body’, used to insulate different clients of an object oriented database from each other and from the underlying database technology itself. The goals of the pattern are:

In this pattern, each database entity is implemented as a group of collaborating classes shown in Figure 1 below:

Figure 1: Handle-Body-Bridge Pattern

The details of this pattern are discussed in the Stubgen documentation. The classes Handle and HandleBody form a framework implementing the Handle-Body pattern. They take care of reference counting and support copy-on-write semantics. The class ooObj is the base class of all persistent classes. The other classes (whose names contain the meta-variable <Entity>) are the collaborators in the pattern. A separate group of these classes must be created for each database entity (denoted by <Entity>). The roles of these classes are summarized here:

Each of these classes is called a code category. Except for oby<Entity>, the code for these categories is mostly boilerplate and can be mechanically generated. Stubgen creates the header and implementation files for each code category by expanding a corresponding template file. The template file contains C++ source code interspersed with directives that delimit semantically significant regions of the template. These regions are usually expanded repeatedly, as required. For instance, a section can be marked as the boilerplate for a method body. This section will be expanded once for each method in the interface. The template text also contains variables, whose values are substituted when the text is expanded. Thus, the boilerplate section for a method body would use variables denoting the method’s name, return value, argument list, etc. Details of the directives and variables are given in the Stubgen documentation.
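A minimal Java sketch of region expansion may help make this concrete. It assumes a `${NAME}`-style variable syntax; the variable names (`RETURN`, `NAME`, `ARGS`), the region text, and the class and method names are all hypothetical, since Stubgen's actual directive and variable syntax is defined in its documentation. Each method in the interface is represented as a map of variable values, and the method-body region is expanded once per method. The sketch uses `java.util.regex` rather than the PerlTools library credited below.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateExpansionSketch {
    // Hypothetical variable syntax: ${NAME}
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replace each ${VAR} with its value; unknown variables are left untouched.
    static String substitute(String text, Map<String, String> vars) {
        Matcher m = VAR.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement stops '$' or '\' in a value being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Expand a repeated region once per method and concatenate the results.
    static String expandRegion(String region, List<Map<String, String>> methods) {
        StringBuilder sb = new StringBuilder();
        for (Map<String, String> method : methods) {
            sb.append(substitute(region, method));
        }
        return sb.toString();
    }

    // Build the (hypothetical) variable bindings for one method.
    static Map<String, String> method(String ret, String name, String args) {
        Map<String, String> m = new HashMap<>();
        m.put("RETURN", ret);
        m.put("NAME", name);
        m.put("ARGS", args);
        return m;
    }

    public static void main(String[] args) {
        String region = "${RETURN} ${NAME}(${ARGS});\n";
        List<Map<String, String>> methods = new ArrayList<>();
        methods.add(method("int", "count", "void"));
        methods.add(method("void", "rename", "const char* newName"));
        System.out.print(expandRegion(region, methods));
    }
}
```

Running `main` prints one declaration per method: `int count(void);` followed by `void rename(const char* newName);`.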

Figure 2 summarizes the basic data flow of Stubgen. A single execution of the program generates all of the header and implementation files for a given code category. The Definitions file contains information describing the database entities and their interfaces.

Figure 2: Stubgen Data Flow

Stubgen was subsequently extended to generate code for the ‘handle-body’ pattern as well. This extension required encoding the pattern in data structures that could be loaded from a Config file. (The documentation for Stubgen was not updated to discuss the configuration file.)

When generating header files, Stubgen attempts to minimize the number of header files that must be #included. It does this by analyzing how classes are used in the interface. Classes that are only passed by reference or used as method return values do not need their headers #included; a simple forward declaration suffices. Headers for these forward-declared classes are #included in the implementation file instead.
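The header-minimization rule can be sketched as a classification over each class's uses in an interface. This is a hypothetical illustration, not Stubgen's code: the `Usage` categories and all class names are assumptions, and any use other than by-reference passing or a method return value is conservatively treated as requiring the full #include in the header.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class HeaderDependencySketch {
    // Hypothetical classification of how a class appears in an interface.
    enum Usage { BY_REFERENCE, RETURN_VALUE, BY_VALUE }

    // Usages that, per the rule above, need only a forward declaration.
    static final Set<Usage> FORWARD_OK = EnumSet.of(Usage.BY_REFERENCE, Usage.RETURN_VALUE);

    final Set<String> headerIncludes = new TreeSet<>(); // #include in the generated header
    final Set<String> forwardDecls = new TreeSet<>();   // "class Foo;" in the generated header
    final Set<String> implIncludes = new TreeSet<>();   // #include in the implementation file

    HeaderDependencySketch(Map<String, Set<Usage>> usagesByClass) {
        for (Map.Entry<String, Set<Usage>> e : usagesByClass.entrySet()) {
            String cls = e.getKey();
            if (FORWARD_OK.containsAll(e.getValue())) {
                forwardDecls.add(cls);   // forward declaration is enough in the header
                implIncludes.add(cls);   // full header is pulled in by the .cpp instead
            } else {
                headerIncludes.add(cls); // some use needs the complete class definition
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<Usage>> uses = new LinkedHashMap<>();
        uses.put("Account", EnumSet.of(Usage.BY_REFERENCE));
        uses.put("Money", EnumSet.of(Usage.BY_VALUE, Usage.RETURN_VALUE));
        uses.put("Customer", EnumSet.of(Usage.RETURN_VALUE));
        HeaderDependencySketch deps = new HeaderDependencySketch(uses);
        System.out.println("header #includes: " + deps.headerIncludes);
        System.out.println("forward declarations: " + deps.forwardDecls);
        System.out.println(".cpp #includes: " + deps.implIncludes);
    }
}
```

With the sample data, `Money` has a by-value use, so its header stays in the generated header, while `Account` and `Customer` are forward-declared and their headers move to the implementation file.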

 


Project Chronology

Elaboration:

The elaboration phase consisted of studying the Perl implementation to extract design information out of it. The results of this reverse engineering effort were summarized in a document (Postscript, HTML).

Implementation:

After the elaboration phase, I developed a growth plan (GrowthPlan1) which focused on implementing the core capabilities first, using Demeter object syntax for the 'configuration' and 'definition' files. After the core capability was implemented, I would create custom parsers in JavaCC to implement a more 'natural' syntax for these files.

As I began the second cycle of that plan, I realized that it would take as much effort to work around the limitations of the Demeter object syntax as it would to simply use JavaCC from the start. I wrote a new growth plan (GrowthPlan2), re-implemented the first cycle of the former plan (to use a custom JavaCC grammar for the TypeRegistry), and proceeded with the second cycle of the new plan.

Halfway through the second cycle of the second growth plan, I discovered that I needed CodeCategories in order to implement 'Update Type Registry with entries for Entities'. The original version of Stubgen creates an entry for each code category by expanding that category's type name and header file patterns to create a 'SimpleClass' type. This was a detail that I missed in my initial reading of the original code. So I wrote my third (and final!) growth plan (GrowthPlan3), which I followed for the rest of the development.
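The registration step described above can be sketched as a loop over entities and code categories that substitutes each entity name for the <Entity> meta-variable. The `CodeCategory` and `SimpleClass` classes here are simplified hypothetical stand-ins for the real class dictionary entries, and every pattern except the `oby<Entity>` class name is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityTypeRegistrationSketch {
    // Hypothetical code category: a type-name pattern and a header-file pattern.
    static final class CodeCategory {
        final String typeNamePattern; // e.g. "oby<Entity>"
        final String headerPattern;   // e.g. "oby<Entity>.h" (assumed naming)
        CodeCategory(String typeNamePattern, String headerPattern) {
            this.typeNamePattern = typeNamePattern;
            this.headerPattern = headerPattern;
        }
    }

    // Hypothetical 'SimpleClass' type registry entry.
    static final class SimpleClass {
        final String name;
        final String header;
        SimpleClass(String name, String header) { this.name = name; this.header = header; }
        @Override public String toString() { return name + " -> " + header; }
    }

    // Substitute the entity name for the <Entity> meta-variable.
    static String expand(String pattern, String entity) {
        return pattern.replace("<Entity>", entity);
    }

    // One SimpleClass entry per (entity, code category) pair, keyed by type name.
    static Map<String, SimpleClass> registerEntities(List<String> entities, List<CodeCategory> categories) {
        Map<String, SimpleClass> registry = new LinkedHashMap<>();
        for (String entity : entities) {
            for (CodeCategory cat : categories) {
                String name = expand(cat.typeNamePattern, entity);
                registry.put(name, new SimpleClass(name, expand(cat.headerPattern, entity)));
            }
        }
        return registry;
    }

    public static void main(String[] args) {
        List<CodeCategory> categories = new ArrayList<>();
        categories.add(new CodeCategory("<Entity>", "<Entity>.h"));
        categories.add(new CodeCategory("oby<Entity>", "oby<Entity>.h"));
        List<String> entities = new ArrayList<>();
        entities.add("Account");
        registerEntities(entities, categories).values().forEach(System.out::println);
    }
}
```

For the entity `Account`, this prints `Account -> Account.h` followed by `obyAccount -> obyAccount.h`.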

Cycle Directories

Cycle 1 (Growth Plan 1)

Cycle 1b (Growth Plan 2)

Cycle 2

Cycle 3

Cycle 4

Cycle 5

Cycle 6


Project Results

I did all of my development in terms of Demeter/Java and JavaCC input files. I did not directly write or modify any Java source. The class dictionary contains 47 classes. There are about twice that many classes in the generated code.

At this point, the Demeter/Java implementation provides the basic functionality of the tool, but it is missing some significant capabilities that are used in practice:

-v : verbose output

-o : create list of generated files

-p : specify a header file path prefix

Curiously, the Demeter/Java implementation is about the same size (2400 lines) as the Perl implementation and still does not provide the same level of functionality. (See below for a detailed breakdown of the Demeter/Java implementation.) This observation still holds when you discount the hand-written JavaCC grammar files used to implement the improved input format. These comparisons are approximate because I simply counted raw lines of text. Nonetheless, there is not the order-of-magnitude difference in size that one typically sees between Java and Demeter/Java.

Perhaps one reason for the similar sizes is that there is a substantial amount of non-traversal logic in this program. For instance, template expansion (Expand.beh) is the largest single file. This code deals mostly with file I/O and text transformation, and doesn't entail much object traversal. The logic in this area follows the Perl code fairly closely. Likewise, the second largest file, Load.beh, contains the database loading logic and is concerned mostly with parsing and object building rather than traversal.

I still consider the Demeter/Java version more maintainable for several reasons:

  1. The underlying data model is more naturally represented by classes than by associative arrays of strings. Furthermore, this data model is explicitly documented by the class dictionary.
  2. Important collaborations can be captured in a single behavior file rather than being scattered across multiple files.

The Perl version represents about 3 person-months of development effort. The Demeter/Java version represents about a single person-month of development effort. I would attribute this difference more to the following factors than to any superior expressiveness in Demeter/Java.

Bugs and Improvements

Most of my testing was done against templates, configuration and definition data used in our production environment. Since this information is proprietary, I have included contrived data for this delivery.

Bugs:

The following method definition will not parse:

method [bar] int foo(void) const;

Unfinished:

Observations on using Demeter/Java:

  1. There were several cases where I had to maintain dual data structures (e.g. a Hashtable or Vector and a Demeter/Java repetition class) to hold the same information. I needed the repetition class to perform traversals, but for performance reasons I needed an alternate way to look up information in the collection. Current work on extending Demeter/Java to support various data structures should eliminate the need for this kind of trickery.
  2. It takes time to design traversals that are robust against change. I have a number of traversals from ExpansionContext that would probably break if I added another collection to that class.
  3. Demeter/Java allows the methods of a class to be scattered across many behavior files. While this is handy for grouping all of the methods related to a particular collaboration, it makes commonly used methods more difficult to locate, especially under Windows 95, where I found myself 'grep'-less :-). I started organizing my behavior files into two categories: (a) files containing methods of a single class, which are shared by several collaborations, and (b) files containing methods from several classes but specific to a single collaboration.
  4. The Demeter/Java translator has an annoying habit of regenerating code when it doesn't have to. I ran it twice in a row and discovered that the second run had overwritten most of the generated files even though none of the input files had changed. Consequently, I wound up nearly always building from scratch whenever I made even a small change to the code. Since a complete build takes about 9 minutes on my PC (a 75MHz Pentium with 40MB of RAM), this represents a big productivity hit (only 2 to 3 rebuilds per hour). It's not only the delay that is painful, but also the loss of continuity in your thinking, which is so damaging to productivity when build times start to exceed a minute. The mind wanders, and time is lost refocusing on the problem at hand when the build does complete.
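The dual-structure workaround from the first observation can be kept manageable by hiding both structures behind a single class whose mutators update them together. The sketch below is hypothetical (the class and element names are invented) and uses the `Vector` and `Hashtable` collections mentioned above, with generics added for readability, so it needs a modern JDK rather than JDK 1.1 to compile. The `Vector` stands in for the repetition class that traversals walk; the `Hashtable` is the lookup index.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Vector;

public class IndexedTypeListSketch {
    // Hypothetical element type standing in for a repetition element.
    static final class TypeEntry {
        final String name;
        TypeEntry(String name) { this.name = name; }
    }

    // Ordered storage: what a traversal would walk.
    private final Vector<TypeEntry> elements = new Vector<>();
    // Secondary index: fast lookup by name.
    private final Hashtable<String, TypeEntry> byName = new Hashtable<>();

    // All insertions go through here so the two structures cannot drift apart.
    public boolean add(TypeEntry e) {
        if (byName.containsKey(e.name)) {
            return false; // reject duplicates so list and index stay one-to-one
        }
        elements.addElement(e);
        byName.put(e.name, e);
        return true;
    }

    // Returns null if no entry has this name.
    public TypeEntry lookup(String name) {
        return byName.get(name);
    }

    // Elements in insertion order, for traversal-style processing.
    public Enumeration<TypeEntry> traversalOrder() {
        return elements.elements();
    }

    public int size() {
        return elements.size();
    }

    public static void main(String[] args) {
        IndexedTypeListSketch list = new IndexedTypeListSketch();
        list.add(new TypeEntry("Account"));
        list.add(new TypeEntry("Customer"));
        System.out.println(list.lookup("Customer").name);
        System.out.println(list.size());
    }
}
```

Funneling every insertion through `add` is the design choice that matters here: it keeps the traversal order and the lookup index consistent without relying on each caller to update both.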

Implementation Size Analysis:

Lines  File              Contents
-----  ----------------  ------------------------------------------------------------------------------
  128  CmdArgs.beh       Methods of class 'CmdArgs'
   73  Display.beh       Collaborations to display database
   45  EntityAsType.beh  Collaboration to register entities as types in the type registry
  465  Expand.beh        Collaborations for template expansion
  166  GenerateCode.beh  Top level collaborations for code generation
  309  Load.beh          Loads database from files (Config, Definitions)
   84  MacroTable.beh    Macro Table data structure
   67  Main.beh          Top level of the program
   50  TypeUseFixup.beh  Collaboration to bind each TypeUse object with its corresponding Defined Type object
   74  Util.beh          Miscellaneous methods from sundry classes
 1461  Subtotal
   98  Cmdargs.jj        Command line parsing
  629  Stubgen.jj        Database file parsing (Config and Definitions)
  727  Subtotal
  195  Stubgen.cd        Class dictionary
 2383  Grand Total


Credits:

The following tools were used:

Tool          Version  Vendor
------------  -------  -------------------------------
JDK           1.1.5    Sun
JavaCC        0.7      Sun
Demeter/Java  0.6.4    Northeastern University
PerlTools     1.0.2    Original Reusable Objects, Inc.

PerlTools is a set of Java classes implementing Perl-style regular expression pattern matching and substitution facilities. This toolkit allowed me to easily translate my template expansion logic from Perl into Java. My use of PerlTools complies with its License Agreement. More information may be found at: www.oroinc.com.