1) Introduction
This project will re-implement a code generation tool called ‘stubgen’. Stubgen is currently implemented as a 1500-line Perl script, and is rapidly becoming unmaintainable. In addition to providing a more maintainable implementation, this project will redefine the syntax of some of the driver files to be more ‘natural’ for C++ programmers. The original syntax for describing classes and methods was designed to be easy to parse in Perl and was rather cumbersome.
Stubgen was originally developed to generate the C++ wrapper code associated with a design pattern, dubbed ‘handle-bridge-body’, used to insulate different clients of an object-oriented database from each other and from the underlying database technology itself. The goals of the pattern are:
In this pattern, each database entity is implemented as a group of collaborating classes shown in Figure 1 below:
Figure 1: Handle-Bridge-Body Pattern

The details of this pattern are discussed in the Stubgen documentation. The classes Handle and HandleBody form a framework implementing the Handle-Body pattern. They take care of reference counting and support copy-on-write semantics. The class ooObj is the base class of all persistent classes. The other classes (whose names contain the meta-variable <Entity>) are the collaborators in the pattern. A separate group of these classes must be created for each database entity (denoted by <Entity>). The roles of these classes are summarized here:
Each of these classes is called a code category. Except for oby<Entity>, the code for these categories is mostly boiler plate and can be mechanically generated. Stubgen creates the header and implementation files for each code category by expanding a corresponding template file. The template file contains C++ source code interspersed with directives delimiting semantically significant areas of the template. These regions are usually expanded repeatedly as required. For instance a section can be marked as the boiler plate for a method body. This section will be expanded once for each method in the interface. The template text also contains variables, whose values are substituted when the text is expanded. Thus the boiler plate section for a method body would use variables denoting the method’s name, return value, argument list, etc. Details of the directives and variables are given in the Stubgen documentation.
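The variable substitution described above can be sketched as follows. This is only an illustration: the ‘$(Name)’ variable syntax, the MacroTable alias and the expandLine function are assumptions made for this sketch (the real directive and variable syntax is given in the Stubgen documentation), and the variable values in the usage note are made up.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical macro table: variable name -> replacement text.
using MacroTable = std::map<std::string, std::string>;

// Replace every "$(Name)" in `line` with the value of Name from `macros`.
// Unknown variables and unterminated "$(" sequences are copied through
// unchanged. The "$(...)" syntax is an assumption for illustration only.
std::string expandLine(const std::string& line, const MacroTable& macros) {
    std::string out;
    std::size_t pos = 0;
    while (pos < line.size()) {
        const std::size_t open = line.find("$(", pos);
        if (open == std::string::npos) {
            out.append(line, pos, std::string::npos);  // no more variables
            break;
        }
        const std::size_t close = line.find(')', open + 2);
        if (close == std::string::npos) {
            out.append(line, pos, std::string::npos);  // unterminated: copy the rest
            break;
        }
        out.append(line, pos, open - pos);             // literal text before the variable
        const std::string name = line.substr(open + 2, close - open - 2);
        const auto it = macros.find(name);
        if (it != macros.end()) {
            out += it->second;                         // substitute the value
        } else {
            out.append(line, open, close - open + 1);  // leave unknown variable as-is
        }
        pos = close + 1;
    }
    return out;
}
```

With a table that maps MethodType to "int" and MethodName to "size", the line "$(MethodType) $(MethodName)();" would expand to "int size();".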
Figure 2 summarizes the basic data flow of Stubgen. A single execution of the program generates all of the header and implementation files for a given code category. The Definitions file contains information describing the database entities and their interfaces.
Figure 2: Stubgen Data Flow

Stubgen was subsequently extended to generate code for the ‘handle-body’ pattern as well. This extension required encoding the pattern in data structures that could be loaded from a Config file. (The documentation for Stubgen was not updated to discuss the configuration file.)
When generating header files, Stubgen attempts to minimize the number of header files that must be #included. It does this by analyzing how classes are used in the interface. Classes that are used only by reference or as method return values do not need to have a header #included; a simple forward declaration will suffice. Headers for these forward-declared classes are #included in the implementation file instead.
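The effect on the generated C++ might look roughly like the sketch below. AccountHandle, Owner and "Owner.h" are hypothetical names invented for this example, and the header and implementation parts are shown in one listing so that it compiles on its own.

```cpp
#include <cassert>
#include <string>
#include <utility>

// ---- Sketch of a generated header (all names are hypothetical) ----
class Owner;  // forward declaration: Owner is used only by reference or as a return value

class AccountHandle {
public:
    void setOwner(const Owner& owner);   // Owner passed by reference
    Owner owner() const;                 // Owner as a return value
    void rename(std::string newName);    // std::string passed by value, so <string> is included
    const std::string& name() const { return name_; }
private:
    const Owner* owner_ = nullptr;
    std::string name_;
};

// ---- Sketch of the matching implementation file ----
// The full definition is needed here; this is where "Owner.h" would be #included.
class Owner {
public:
    int id = 0;
};

void AccountHandle::setOwner(const Owner& owner) { owner_ = &owner; }

Owner AccountHandle::owner() const {
    assert(owner_ != nullptr);  // precondition: setOwner() was called first
    return *owner_;
}

void AccountHandle::rename(std::string newName) { name_ = std::move(newName); }
```

Clients that include only the header part never see the definition of Owner, so a change to Owner does not force them to recompile.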
2) Data Models
This section discusses the data models extracted from a reading of the Perl script. The data maintained by Stubgen falls into three major categories: information about types, information about entities, and information about code categories. The data model for types and entities is shown in Figure 3. The data model for code categories is shown in Figure 4.
Figure 3: Types and Entities
2.1) Types
Stubgen needs to know some basic information about every type (including the generated classes) used in an entity’s interface:
The name and header file name are stored as attributes of the base class, BasicType. The distinction between class and non-class types is made by subtyping BasicType into NonClassType and ClassType. ClassType is further subdivided into SimpleClassType and TemplatedClassType because the format of a forward reference for these two kinds of class is different. A TemplatedClassType must also store the template arguments used in its forward reference.
Each generated class is represented by an instance of SimpleClassType. The class name and header file name for generated classes are derived using rules given in the corresponding code category.
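One possible C++ rendering of this type model is sketched below. The class names come from Figure 3 as described above; the member names (name, headerFile, isClass, forwardDeclaration) and the exact text of the forward references are assumptions made for illustration.

```cpp
#include <string>
#include <utility>

// Base class: every type has a name and the header file that defines it.
class BasicType {
public:
    BasicType(std::string name, std::string headerFile)
        : name_(std::move(name)), headerFile_(std::move(headerFile)) {}
    virtual ~BasicType() = default;
    const std::string& name() const { return name_; }
    const std::string& headerFile() const { return headerFile_; }
    virtual bool isClass() const = 0;
private:
    std::string name_;
    std::string headerFile_;
};

// Non-class types (typedefs, enums, ...) cannot be forward declared here.
class NonClassType : public BasicType {
public:
    NonClassType(std::string name, std::string headerFile)
        : BasicType(std::move(name), std::move(headerFile)) {}
    bool isClass() const override { return false; }
};

// Class types know how to write their own forward reference.
class ClassType : public BasicType {
public:
    ClassType(std::string name, std::string headerFile)
        : BasicType(std::move(name), std::move(headerFile)) {}
    bool isClass() const override { return true; }
    virtual std::string forwardDeclaration() const = 0;
};

class SimpleClassType : public ClassType {
public:
    SimpleClassType(std::string name, std::string headerFile)
        : ClassType(std::move(name), std::move(headerFile)) {}
    std::string forwardDeclaration() const override {
        return "class " + name() + ";";
    }
};

// Stores the template arguments needed by its forward reference.
class TemplatedClassType : public ClassType {
public:
    TemplatedClassType(std::string name, std::string headerFile,
                       std::string templateArgs)
        : ClassType(std::move(name), std::move(headerFile)),
          templateArgs_(std::move(templateArgs)) {}
    std::string forwardDeclaration() const override {
        return "template <" + templateArgs_ + "> class " + name() + ";";
    }
private:
    std::string templateArgs_;
};
```

Putting forwardDeclaration on ClassType lets the code that emits forward references treat both kinds of class uniformly.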
2.2) Entities
Entities represent database entities. An entity is represented by one or more generated classes. Associated with each Entity is a set of AccessKeys defining which code categories will be allowed to generate a class for that Entity. AccessKeys are also associated with code categories, and a code category is allowed to generate a class for each entity having an access key in common with the code category.
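The access key rule amounts to a set intersection test, which might be written as in this sketch. The AccessKeys alias and the function name haveCommonKey are hypothetical, and representing keys as strings is an assumption.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical representation: an access key is just a string.
using AccessKeys = std::set<std::string>;

// True if the two key sets share at least one key, i.e. the code category
// owning `categoryKeys` may generate a class for the Entity owning `entityKeys`.
bool haveCommonKey(const AccessKeys& entityKeys, const AccessKeys& categoryKeys) {
    return std::any_of(entityKeys.begin(), entityKeys.end(),
                       [&categoryKeys](const std::string& key) {
                           return categoryKeys.count(key) != 0;
                       });
}
```

The same test could serve for Methods, which carry their own access keys.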
Each Entity has a list of parent Entities: those Entities from which it inherits directly. During code generation, Stubgen flattens the Entity’s interface, so that inherited methods are included in the generated class. Flattening is done because generated classes (being handles) are not organized into an inheritance hierarchy mirroring that of the Entities. Mirroring the hierarchy would allow the type safety of C++ to be violated. [Reference to ‘Smart Pointers, They are neither Smart nor Pointers’ goes here.] (The implicit upcasts performed by C++ are emulated by providing conversion operators that allow a handle for a derived entity to be converted to a handle for its parent entity. Currently these upcasts are explicitly specified in the definitions file. Stubgen ought to generate them automatically.)
[Note: It might be useful to allow flattening to be enabled or disabled on a per code category basis.]
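Flattening could be implemented as a depth-first walk over the parent lists, as in the sketch below. The Entity struct is a simplified stand-in. Identifying methods by name alone, letting an Entity’s own methods take precedence over inherited ones, and the resulting order are all assumptions of this sketch, not documented behaviour of the Perl script.

```cpp
#include <set>
#include <string>
#include <vector>

// Simplified stand-in for an Entity: methods are reduced to their names.
struct Entity {
    std::string name;
    std::vector<const Entity*> parents;  // direct parents only
    std::vector<std::string> methods;    // methods declared by this Entity itself
};

namespace detail {
// Depth-first walk: an Entity's own methods first, then those of its parents.
// A method name already seen (declared closer to the starting Entity) is skipped.
void collectMethods(const Entity& entity, std::set<std::string>& seen,
                    std::vector<std::string>& out) {
    for (const std::string& method : entity.methods) {
        if (seen.insert(method).second) {
            out.push_back(method);
        }
    }
    for (const Entity* parent : entity.parents) {
        collectMethods(*parent, seen, out);
    }
}
}  // namespace detail

// Returns the flattened interface: own methods plus all inherited ones.
std::vector<std::string> flattenInterface(const Entity& entity) {
    std::set<std::string> seen;
    std::vector<std::string> out;
    detail::collectMethods(entity, seen, out);
    return out;
}
```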
An Entity also possesses a set of interface methods represented by instances of Method. Each method has a set of access keys defining what code categories are allowed to include this method in their generated class. When generating a class, a code category may only use those methods (inherited or not) having an access key in common with the code category.
Methods provide functionality used during code generation. In particular, they can calculate the values of several variables that can be used in the boiler plate text for methods.
A Method also has a ‘bodyVersion’ attribute, specifying which of several alternative boiler plate segments should be expanded for this method. (For instance static methods have a different boiler plate segment than instance methods).
Each Method also maintains descriptions of its return value and argument declarations. Associated with each of these declarations are one or more types referenced in that declaration. (A declaration using a templated class will have more than one associated type. Other declarations have only one associated type.) The association is represented by an instance of TypeUse, which also indicates whether the type is being used by value or by reference in the declaration. This type use information is used by Stubgen to determine whether it needs to #include a header for a referenced class or simply forward declare it when expanding the template for an entity’s header file.
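The decision between #include and forward declaration for a header file can be sketched as follows. TypeInfo, HeaderNeeds and analyzeUses are hypothetical names, and TypeInfo is a flattened stand-in for the type model of section 2.1. The rule applied here is an interpretation of the text: a type needs its header if it is a non-class type or is used by value at least once; otherwise a forward declaration is enough.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Flattened stand-in for the type model; just what this decision needs.
struct TypeInfo {
    std::string name;
    std::string headerFile;
    bool isClass;
};

// One use of a type in a return value or argument declaration.
struct TypeUse {
    const TypeInfo* type;
    bool byValue;  // false means the type is used by reference
};

struct HeaderNeeds {
    std::set<std::string> includes;  // header files to #include
    std::set<std::string> forwards;  // class names to forward declare
};

HeaderNeeds analyzeUses(const std::vector<TypeUse>& uses) {
    // For each distinct type, record whether any use requires the full definition.
    std::map<const TypeInfo*, bool> needsHeader;
    for (const TypeUse& use : uses) {
        bool& flag = needsHeader[use.type];  // false on first insertion
        flag = flag || !use.type->isClass || use.byValue;
    }
    HeaderNeeds result;
    for (const auto& entry : needsHeader) {
        if (entry.second) {
            result.includes.insert(entry.first->headerFile);
        } else {
            result.forwards.insert(entry.first->name);
        }
    }
    return result;
}
```

A class used both by reference and by value ends up in the include set only, so the two sets of types never overlap.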
2.3) Code Categories
Code categories represent a set of classes having a common implementation. (See for this discussion). Each category has the following attributes:
Not all code categories actually govern the generation of code. For instance in the Handle-Bridge-Body pattern the oby<Entity> category has no generated code, but it must be referenced in various template files for other generated categories. Thus CodeCategory is subdivided into NonGeneratedCodeCategory and GeneratedCodeCategory.
GeneratedCodeCategory represents a code category that can generate code. It has two additional attributes:
Figure 4: Code Categories
Several code categories can share the same CodeCategoryType. For example in the Handle-Bridge-Body idiom, the ClientA<Entity>, ClientB<Entity>, etc. can share the same CodeCategoryType. Aside from the client name, the code is identical. To support this reuse, Stubgen provides a template variable that expands to the client name.
The subclasses of Template implement specific code generation algorithms (see section 3) for each type of template. Although the basic process is the same for all templates, there are differences between header and implementation templates. The ImakeTemplate is used to generate a fragment of an Imake file containing the compilation rules for the generated classes.
3) Code Generation
This section discusses the basic process for generating code by expanding a template file.
3.1) Code Generation Overview
This section describes how Stubgen goes about generating all of the code for a given GeneratedCodeCategory.
Initialize the macro table with definitions of: ‘Date’, ‘<Category>’, ‘<Category>Header’. (This entails visiting all of the CodeCategory objects)
Visit all qualified Entity objects. (An Entity is qualified w.r.t. a GeneratedCodeCategory iff it has some AccessKey in common with the GeneratedCodeCategory).
For each visited Entity:
Template expansion is always in one of several modes:
$define directive
$forwards directive
$includes directive
$methods directive
$sources directive
Normal Mode:
Each line is read and the expanded line is written.
Define Mode:
The new variable name and definition are extracted from the input line. The definition is expanded and added to the macro table.
Forwards Mode:
All of the text between the $forwards and $end-forwards directives is collected into a segment buffer.
The set of classes needing forward references is determined. This determination depends upon the template being expanded. For a HeaderTemplate, it is the set of all classes that are used by reference only. For all other templates it is an empty set. (For a HeaderTemplate, the set can be determined by visiting members of the TypeUseage computed earlier).
For each class needing a forward reference:
Includes Mode:
All of the text between the $includes and $end-includes directives is collected into a segment buffer.
The set of header files that need inclusion is determined. This determination depends upon the template being expanded.
For a HeaderTemplate it is the union of all header files defining (a) non-class types occurring in the TypeUseage or (b) class types that are passed by value in the TypeUseage.
For an ImplTemplate it is the union of all header files defining classes that are passed by reference in TypeUseage and are not explicitly forbidden by a $no-header directive.
For an ImakeTemplate it is an empty set.
[ The sets of types defined for HeaderTemplate and ImplTemplate are disjoint and usually partition the set of types in TypeUseage (unless some $no-header directive is in effect). However the sets of include files may not be disjoint because a given include file could define types in both sets. ]
For each required header file:
Extend the macro table with a definition of ‘HeaderFile’
Expand and output each line of the segment buffer
Retract the definition of ‘HeaderFile’
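The extend, expand, retract loop above might be sketched as follows. The ‘$(Name)’ variable syntax, the MacroTable alias and both function names are assumptions for illustration. The line expander is deliberately minimal and stands in for whatever expansion Normal Mode performs.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical macro table: variable name -> replacement text.
using MacroTable = std::map<std::string, std::string>;

// Minimal stand-in for line expansion: replaces every "$(name)" with `value`.
std::string expandVariable(std::string line, const std::string& name,
                           const std::string& value) {
    const std::string token = "$(" + name + ")";
    std::size_t pos = line.find(token);
    while (pos != std::string::npos) {
        line.replace(pos, token.size(), value);
        pos = line.find(token, pos + value.size());  // continue after the inserted value
    }
    return line;
}

// Expands the collected segment buffer once per required header file.
std::vector<std::string> expandIncludes(const std::vector<std::string>& segment,
                                        const std::vector<std::string>& headerFiles,
                                        MacroTable& macros) {
    std::vector<std::string> output;
    for (const std::string& header : headerFiles) {
        macros["HeaderFile"] = header;   // extend the macro table
        for (const std::string& line : segment) {
            std::string expanded = line;
            for (const auto& macro : macros) {
                expanded = expandVariable(expanded, macro.first, macro.second);
            }
            output.push_back(expanded);  // "output" the expanded line
        }
        macros.erase("HeaderFile");      // retract the definition
    }
    return output;
}
```

Sources Mode has the same shape, with ‘SourceFile’ in place of ‘HeaderFile’ and the list of source files in place of the header files.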
Methods Mode:
The scope of the $methods directive is partitioned into a default segment and zero or more alternative segments each introduced by an $alt-method directive. Collect these segments into separate buffers, indexed by their name.
For each qualified Method:
Extend the macro table with definitions of: ‘MethodName’, ‘MethodType’, ‘MethodSignature’, ‘MethodCall’, ‘Return’, ‘ConversionType’.
Determine the appropriate segment buffer to use by finding the buffer whose name matches the Method’s ‘bodyVersion’ attribute. (Default body text is used if no match is found).
Expand and output the contents of the selected buffer.
Retract the macro table definitions.
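Selecting the segment buffer for a Method could look like the sketch below. The Segment alias, the MethodSegments struct and the function selectSegment are hypothetical names; only the lookup by ‘bodyVersion’ with a fall-back to the default segment is taken from the steps above.

```cpp
#include <map>
#include <string>
#include <vector>

// A segment buffer: the collected lines of boiler plate text.
using Segment = std::vector<std::string>;

// Buffers collected from the scope of one $methods directive.
struct MethodSegments {
    Segment defaultSegment;                       // text before the first $alt-method
    std::map<std::string, Segment> alternatives;  // indexed by alternative name
};

// Returns the alternative whose name matches `bodyVersion`,
// or the default segment if there is no such alternative.
const Segment& selectSegment(const MethodSegments& segments,
                             const std::string& bodyVersion) {
    const auto it = segments.alternatives.find(bodyVersion);
    return it != segments.alternatives.end() ? it->second
                                             : segments.defaultSegment;
}
```

Returning a reference avoids copying the buffer for every Method; the caller must keep the MethodSegments object alive while it expands the selected lines.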
Sources Mode:
The scope of the $sources directive is collected into a buffer.
For each file name in Sources:
Extend the macro table with a definition of ‘SourceFile’.
Expand and output buffer contents.
Retract definition of ‘SourceFile’.