Macros for C-like Languages

Presented by Dave Herman
Transcribed by Dale Vaillancourt

Weise and Crew '93

Early macro systems for C-like languages were based on character substitution. A typical example is:
  #define FOO "abc
Such systems are unsafe in the sense that the macro can produce ill-formed syntax and non-tokens. The C preprocessor was a small step forward; it is essentially a filter on the token stream produced by the lexer. The most prevalent problem with CPP is illustrated by the following example:
  #define SCALE(a) (a * 0.75)
The result of SCALE(3 + 2) most likely does not capture the intent of the programmer. Even worse, a programmer may not notice the presence of the error until long after the (3 + 2 * 0.75) computation takes place.

Weise and Crew argue that macros should work on the level of ASTs rather than tokens, and they provide three criteria that a macro system must satisfy.

  1. noninterference: SCALE(3+2) should work as intended.
  2. syntactic abstraction: We want to define DSLs.
  3. syntactic safety: Macros should not be able to produce non-well-formed syntax.
They introduce quasiquote, unquote, and a syntactic type system.

Say you want a macro to create enumeration types; you might use it like this:

new_enum color {red, green, blue}
One can define this in Weise et. al's system like this:
syntax decl[] new_enum {| $$id::name {$$+\, id::ids}; |}
{ ... body ... }
This demonstrates that they are capable of destructuring input syntax. The system does not solve hygiene, but it does provide a gensym facility. The paper claims that the syntax type system guarantees that macros do not produce ill-formed syntax. The authors provide neither an algorithm nor a theorem. The most important contribution of this paper is that the authors demonstrate that macro systems are not tied to s-expressions.

Batory et al. '98

JTS has quasiquote and a syntactic type system. They are aware of capture problems, but their solution is minimal. They rely on the programmer to mark the variables that must be introduced hygienically (same idea as gensym in LISP). Programmers write macros by specifying a tree traversal that mutates ASTs.

Arguably the most interesting piece of the system is called Bali. It is a parser generator that takes a grammar as input and creates a grammar component as output. This grammar component can be combined with other grammar components to produce bigger grammars. Programmers use Bali to specify arbitrary grammar extensions to a host language, and macros provide a semantics for the extended syntax.

Bachrach and Playford '01

JSE does not allow for arbitrary grammar extension, but it offers a fixed set of places where macros are allowed to extend the syntax. The system offers quasiquote, unquote, syntax types, and pattern matching. Hygiene has been promised, but it has not been delivered. The authors claim to perform syntactic type-checking, however the process is actually closer to tag-checking as one typically finds in Scheme. The following example illustrates a problem with the system:

public class Foo {
  private int x;
  public syntax myMacro() { ... #{x} }
If myMacro is invoked outside of Foo, then the generated code should not be allowed to refer to x. The generated code, however, will be permitted to access x.

Baker and Hsieh '02

Maya offers syntactic type-checking, and it has the novel feature of putting the programmer somewhat more in charge of expansion. A typical example is the following:

h.keys().foreach(String st) {
A programmer may provide a foreach macro implementation for various underlying types (hashtable, array, etc.). The macro expander will determine the type of the data upon which the macro is applied and dispatch to the correct implementation. To facilitate this, the programmer is given control over an outside-in expansion process. The programmer must specify how much expansion is necessary for the system to dispatch to a macro implementation, and one accomplishes this by marking certain pieces of the output pattern as "lazy".

They claim to solve hygiene, but no algorithm is given. With regard to referential transparency, they mention trying to resolve class names, but not field names. It is unclear to what extent referential transparency is accommodated.