This project re-implemented Stubgen, a code generation tool that I originally developed as a Perl script. At 1500 lines, the Perl script was becoming unmaintainable, mostly because of the poor encapsulation of its data structures. The goals of the project were:
Documentation for the original version can be found here (Postscript, HTML). Although somewhat outdated (it doesn't describe the configuration file syntax and semantics), it gives a good view of how the tool operates.
Stubgen was originally developed to generate the C++ wrapper code associated with a design pattern, dubbed ‘handle-bridge-body’, used to insulate different clients of an object oriented database from each other and from the underlying database technology itself. The goals of the pattern are:
In this pattern, each database entity is implemented as a group of collaborating classes shown in Figure 1 below:
Figure 1: Handle-Body-Bridge Pattern

The details of this pattern are discussed in the Stubgen documentation. The classes Handle and HandleBody form a framework implementing the Handle-Body pattern. They take care of reference counting and support copy-on-write semantics. The class ooObj is the base class of all persistent classes. The other classes (whose names contain the meta-variable <Entity>) are the collaborators in the pattern. A separate group of these classes must be created for each database entity (denoted by <Entity>). The roles of these classes are summarized here:
Each of these classes is called a code category. Except for oby<Entity>, the code for these categories is mostly boilerplate and can be mechanically generated. Stubgen creates the header and implementation files for each code category by expanding a corresponding template file. The template file contains C++ source code interspersed with directives that delimit semantically significant regions of the template. These regions are usually expanded repeatedly, as required. For instance, a section can be marked as the boilerplate for a method body; that section is then expanded once for each method in the interface. The template text also contains variables, whose values are substituted when the text is expanded. Thus the boilerplate section for a method body would use variables denoting the method’s name, return value, argument list, etc. Details of the directives and variables are given in the Stubgen documentation.
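The expansion scheme described above can be sketched in a few lines of Java. This is only an illustration, not the code in Expand.beh. The `$[Name]` variable spelling is borrowed from the bug list later in this report; the `@foreach-method` and `@end` directives, the variable names, and the class name `TemplateExpansionSketch` are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateExpansionSketch {
    // Matches variables written as $[Name].
    private static final Pattern VARIABLE = Pattern.compile("\\$\\[(\\w+)\\]");

    /** Replaces every $[Name] with its value; unknown variables are left untouched. */
    static String substitute(String text, Map<String, String> vars) {
        Matcher m = VARIABLE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /**
     * Expands a template line by line. Lines between the hypothetical
     * directives "@foreach-method" and "@end" are emitted once per method.
     */
    static String expand(String template, Map<String, String> classVars,
                         List<Map<String, String>> methods) {
        StringBuilder out = new StringBuilder();
        List<String> section = null; // non-null while inside a repeated section
        for (String line : template.split("\n")) {
            if (line.trim().equals("@foreach-method")) {
                section = new ArrayList<>();
            } else if (line.trim().equals("@end") && section != null) {
                for (Map<String, String> method : methods) {
                    Map<String, String> vars = new LinkedHashMap<>(classVars);
                    vars.putAll(method); // method variables shadow class variables
                    for (String body : section) {
                        out.append(substitute(body, vars)).append('\n');
                    }
                }
                section = null;
            } else if (section != null) {
                section.add(line); // collect the section body for later expansion
            } else {
                out.append(substitute(line, classVars)).append('\n');
            }
        }
        return out.toString();
    }
}
```

Calling `expand` with one map of class-level variables and one map per method emits the text outside the section once and the section body once per method, with the variables substituted in each copy.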
Figure 2 summarizes the basic data flow of Stubgen. A single execution of the program generates all of the header and implementation files for a given code category. The Definitions file contains information describing the database entities and their interfaces.
Figure 2: Stubgen Data Flow

Stubgen was subsequently extended to generate code for the ‘handle-body’ pattern as well. This extension required encoding the pattern in data structures that could be loaded from a Config file. (The documentation for Stubgen was not updated to discuss the configuration file.)
When generating header files, Stubgen attempts to minimize the number of header files that must be #included. It does this by analyzing how classes are used in the interface. Classes that are used only as by-reference parameters or as method return values do not need to have a header #included; a simple forward declaration will suffice. Headers for these forward-declared classes are #included in the implementation file instead.
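A minimal sketch of that decision in Java follows. It is not the actual Stubgen logic: the `Use` enum, the `Plan` class, and the `<ClassName>.h` header naming are assumptions introduced for illustration. The only rule taken from the text is that a class used solely by reference or as a return value gets a forward declaration in the header and its #include in the implementation file.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class HeaderMinimizationSketch {
    /** Hypothetical classification of how an interface uses a class. */
    enum Use { BY_REFERENCE, RETURN_VALUE, BY_VALUE, BASE_CLASS }

    /** Uses for which, per the text, a forward declaration is enough. */
    private static final Set<Use> FORWARD_DECLARABLE =
            EnumSet.of(Use.BY_REFERENCE, Use.RETURN_VALUE);

    /** Lines destined for the generated header and implementation files. */
    static final class Plan {
        final List<String> headerLines = new ArrayList<>();
        final List<String> implementationLines = new ArrayList<>();
    }

    static Plan plan(Map<String, Set<Use>> usesByClass) {
        Plan plan = new Plan();
        // Copy into a TreeMap so the output order is stable (alphabetical).
        for (Map.Entry<String, Set<Use>> e : new TreeMap<>(usesByClass).entrySet()) {
            String name = e.getKey();
            String include = "#include \"" + name + ".h\""; // header naming is assumed
            if (FORWARD_DECLARABLE.containsAll(e.getValue())) {
                // Every use is by reference or as a return value:
                // declare in the header, include in the implementation.
                plan.headerLines.add("class " + name + ";");
                plan.implementationLines.add(include);
            } else {
                // At least one use needs the complete type in the header.
                plan.headerLines.add(include);
            }
        }
        return plan;
    }
}
```

A class with even one use outside the forward-declarable set keeps its #include in the header, which is the conservative choice.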
The elaboration phase consisted of studying the Perl implementation to extract design information out of it. The results of this reverse engineering effort were summarized in a document (Postscript, HTML).
After the elaboration phase, I developed a growth plan (GrowthPlan1) which focused on implementing the core capabilities first, using Demeter object syntax for the 'configuration' and 'definition' files. After the core capability was implemented, I would create custom parsers in JavaCC to implement a more 'natural' syntax for these files.
As I began the second cycle of that plan, I realized that it would take as much effort to work around the limitations of the Demeter object syntax as it would to simply use JavaCC from the start. I wrote a new growth plan (GrowthPlan2), re-implemented the first cycle of the former plan (to use a custom JavaCC grammar for the TypeRegistry), and proceeded with the second cycle of the new plan.
Halfway through the second cycle of the second growth plan, I discovered that I needed CodeCategories in order to implement 'Update Type Registry with entries for Entities'. The original version of Stubgen creates a type registry entry for each code category by expanding the type name pattern and header file pattern of that category to create a 'SimpleClass' type. This was a detail that I missed in my initial reading of the original code. So I wrote my third (and final!) growth plan (GrowthPlan3), which I followed for the rest of the development.
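The registry update can be pictured roughly as follows. This Java sketch is an assumption-laden illustration, not the code in EntityAsType.beh: the classes `CodeCategory` and `SimpleClass` are simplified stand-ins, the `<Entity>Handle` pattern and the `.h` header patterns are invented, and only `oby<Entity>` and the `<Entity>` meta-variable come from this report.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EntityAsTypeSketch {
    /** Hypothetical stand-in for a code category loaded from the Config file. */
    static final class CodeCategory {
        final String typeNamePattern;
        final String headerFilePattern;

        CodeCategory(String typeNamePattern, String headerFilePattern) {
            this.typeNamePattern = typeNamePattern;
            this.headerFilePattern = headerFilePattern;
        }
    }

    /** Hypothetical stand-in for a 'SimpleClass' entry in the type registry. */
    static final class SimpleClass {
        final String name;
        final String header;

        SimpleClass(String name, String header) {
            this.name = name;
            this.header = header;
        }
    }

    /** Substitutes the entity name for the <Entity> meta-variable. */
    static String expandPattern(String pattern, String entity) {
        return pattern.replace("<Entity>", entity);
    }

    /** Creates one registry entry per (entity, code category) pair. */
    static Map<String, SimpleClass> register(List<String> entities,
                                             List<CodeCategory> categories) {
        Map<String, SimpleClass> registry = new LinkedHashMap<>();
        for (String entity : entities) {
            for (CodeCategory category : categories) {
                String typeName = expandPattern(category.typeNamePattern, entity);
                String header = expandPattern(category.headerFilePattern, entity);
                registry.put(typeName, new SimpleClass(typeName, header));
            }
        }
        return registry;
    }
}
```

The sketch shows why the registry update depends on the code categories: without their patterns there is nothing to expand for each entity.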
Cycle 1 (Growth Plan 1)
Cycle 1b (Growth Plan 2)
I did all of my development in terms of Demeter/Java and JavaCC input files. I did not directly write or modify any Java source. The class dictionary contains 47 classes. There are about twice that many classes in the generated code.
At this point the Demeter/Java implementation provides the basic functionality of the tool, but is missing some significant capabilities that are exploited in practice:
-v : verbose output
-o : create list of generated files
-p : specify a header file path prefix
Curiously, the Demeter/Java implementation is about the same size (2400 lines) as the Perl implementation and still does not provide the same level of functionality. (See below for a detailed breakdown of the Demeter/Java implementation.) This observation still holds when you discount the manually written JavaCC grammar files used to implement the improved input format. These comparisons are approximate because I simply counted raw lines of text. Nonetheless, there is not the order-of-magnitude difference that one typically sees between Java and Demeter/Java.
Perhaps one reason for the similar sizes is that there is a substantial amount of non-traversal logic in this program. For instance, template expansion (Expand.beh) is the largest single file. This code deals mostly with file I/O and text transformation, and doesn't entail much object traversal. The logic in this area follows the Perl code fairly closely. Likewise, the second largest file, Load.beh, contains the database loading logic and is concerned mostly with parsing and object building rather than traversal.
I still consider the Demeter/Java version more maintainable for several reasons:
The Perl version represents about 3 person-months of development effort. The Demeter/Java version represents about one person-month. I would attribute this difference more to the following factors than to any superior expressiveness in Demeter/Java.
Most of my testing was done against templates, configuration and definition data used in our production environment. Since this information is proprietary, I have included contrived data for this delivery.
Bugs:
- method [bar] int foo(void) const; will not parse.
- $[Header] can expand to the same value more than once in a template.
- $[ExternalClass] can expand to the same value more than once in a template.
- $[ExternalClass] should not expand to the class being generated.
- $[Header] should not expand to the header of the class being generated.
Unfinished:
- 'virtual' and 'static' method qualifiers are treated in two different ways. Make the handling of 'virtual' the same as 'static'.
| Lines | File | Contents |
|---|---|---|
| 128 | CmdArgs.beh | Methods of class, 'CmdArgs' |
| 73 | Display.beh | Collaborations to display database |
| 45 | EntityAsType.beh | Collaboration to register entities as types in the type registry |
| 465 | Expand.beh | Collaborations for template expansion |
| 166 | GenerateCode.beh | Top level collaborations for code generation |
| 309 | Load.beh | Loads database from files (Config, Definitions) |
| 84 | MacroTable.beh | Macro Table data structure |
| 67 | Main.beh | Top level of the program |
| 50 | TypeUseFixup.beh | Collaboration to bind each TypeUse object with its corresponding Defined Type object |
| 74 | Util.beh | Miscellaneous methods from sundry classes |
| 1461 | Subtotal | |
| 98 | Cmdargs.jj | Command line parsing |
| 629 | Stubgen.jj | Database file parsing (Config and Definitions) |
| 727 | Subtotal | |
| 195 | Stubgen.cd | Class dictionary |
| 2383 | Grand Total | |

The following tools were used:
| Tool | Version | Vendor |
|---|---|---|
| JDK | 1.1.5 | Sun |
| JavaCC | 0.7 | Sun |
| Demeter/Java | 0.6.4 | Northwestern University |
| PerlTools | 1.0.2 | Original Reusable Objects, Inc. |

PerlTools is a set of Java classes implementing Perl-style regular expression pattern matching and substitution facilities. This toolkit allowed me to easily translate my template expansion logic from Perl into Java. My use of PerlTools complies with its License Agreement. More information may be found at: www.oroinc.com.