Assignment #9: (Not for submission) Toy Compiler

Out: Wednesday, April 23rd   Due: Wednesday, April 23rd, 0:00midnight



Administrative

Same pairs, unless you let me know.

Update the course plugin for the language level for this assignment.

In this assignment you will build on the interpreter that was extended in the previous homework, so begin by retrieving the posted solution to be used as a basis for this one.



What to do

In this homework, you will turn the extended TOY evaluator into a compiler. Do this in small steps, following the same process that was demonstrated in class, starting with the posted solution:



Dealing with multiple expressions

One problem that you will need to deal with is the fact that the extended language has several places were multiple expressions are evaluated one by one, using eval-body. There are several possible ways to deal with compiling such multiple-expression bodies:

In any case, remember to specify the correct types, write clear purpose statements, and document your code if there is anything non-trivial it does. If the the second or third options are too difficult for you, you can use the first one (and write a comment saying that), but this will imply a small penalty. If you're using the second or third approaches, then you're writing a piece of the compiler — so the new function should also check whether the compiler is enabled before it does anything (use the same unless check and error).



Dealing with rfun calls

When you pull out all dependencies on the input syntax, pay attention to the case of applying an rfun. In this case, there are two things that can be done at compile time:

  1. Checking that the arguments are all identifier expressions,
  2. Grabbing the list of symbols that are the given names.
Note that you cannot grab the boxes at compile-time, because the boxes are taken from the environment, and that is a run-time value.

Note that the check happens at compile-time, so you don't know whether the function that is applied is going to be a FunV or an RFunV. (There is no way to know that at compile time, since some function calls can be used with a variable function value that can be either one.) You should therefore only perform the check, but raise the error only if you actually get an RFunV function (at run-time).

Also, the list of symbols that you grab is used only in case of an RFunV application. You can use the fact that Scheme is dynamically typed and combine both of these by computing this:

(and (andmap Id? arg-exprs) (map Id-name arg-exprs))
at compile-time. Then, at run-time, you need to check that this value is not false and either use it or throw an error.



Optional extension

This question is not required, do not submit a solution. Do it only if you think it will be fun. (You can talk to me about it if you did it and want to get some karma points.)

Going back in time, remember that we talked about de-Bruijn indices as an alternative for variable names. Now that you have an actual compiler, you combine the de-Bruijn preprocessor with your compiler. This is easy since they both operate on expressions: you will need to add an argument to the compiler that describes the names that are known in the current lexical scope (a list of lists, since each environment frame has a list of names (no need for the functional representation that was used in the BRANG* homework)), and use this value to precompute the position of names when they are looked up. As a result, you will not need to do any lookups at run-time, since you will know where to find the required value (the frame number and the index number in the frame). (Dealing with RFunVs will be a little tricky here.)
If you do this properly, you will notice that you no longer need the names in the environment, since you pull out the values directly. This makes it possible to change the environment implementation (as in the BRANG* homework) so it no longer contains names. Note that global names will require some special treatment — you can either make up a description of the global environment to start the compilation with, or you can ‘inline’ global values when the compiler sees them. In either of these, global lookups are also eliminated in the run-time (inlining requires forbidding mutation of global bindings). Also note that you can make closures keep only their arity, instead of a list of names that are not needed.
For an even better performance, you can use vectors for the implementation of frames. They are a little better because they have constant-time access to all elements, unlike lists. See make-vector, vector-ref, etc.
Throughout all this, you can use some code, like

(time (run "{bindrec {{fib {fun {n} {if {< n 2} n {+ {fib {- n 1}} {fib {- n 2}}}}}}} {fib 27}}"))
to measure speed improvements. (You can switch off debugging information in DrScheme's Language dialog to make things run faster.) In my experiments, the original TOY extension ran for 5.7 seconds, the basic solution for this homework runs for 4.0 seconds, and doing all of the above extra optimizations got it further down to 2.7 seconds (more than twice faster).