Lecture Notes


CS4410/6410: Compilers
Spring 2013

Problem Set 1: Building a Front-End for Fish

Due Saturday, 2 Feb 2013, 11:59pm


The goal of this problem set is to get you used to building lexers and parsers for semi-realistic languages using ML-Lex and ML-Yacc. In particular, you (and your partner if any) will build a lexer and parser (using Lex and Yacc) for a small language called Fish, which stands for Fortran-ish. Fish is intended to have syntax similar to that of C and Java and in fact, we'll soon scale Fish up to have more features of these languages.

A tutorial for Lex and Yacc can be found here and here.


Download the ps1.zip zip file and unzip it. You should find a directory fish that contains all of the files that you will need for this assignment.

The files are incomplete (you'll be modifying lex.mll, and parse.mly) but you should be able to compile them with the provided Makefile. The file ast.ml contains the abstract syntax for Fish programs. The file eval.ml contains an interpreter for Fish programs.

The file fishyacc.ml uses the parser/lexer to parse an input fish program, and evaluate it so you can see if the resulting AST is what you expected. Running make will produce an executable for your lex/yacc solution.

Fish Syntax

The Fish language is very simple and only includes integer expressions and statements. Here is an example Fish program:

  /* this is a comment */
  x = 42;
  y = 31;
  for (s = 0; x > 0; x = x - 1) {
    y = y + 1;
  return y;
More example Fish programs can be found in the test directory in ps1.zip. As you can see, the syntax and semantics of Fish programs is intended to follow that of C (and Java). Your job is to write a lexer and parser for Fish using ML-Lex and ML-Yacc. In the interest of not giving too much away, I've intentionally not specified the language's syntax carefully. To a large degree, this is up to you. However, here are some guidelines that I expect you to follow:
  • Comments, as in C, should start with "/*" and end with "*/". Comments in C do not nest.

  • Expressions are grouped with parentheses, whereas statements are grouped with braces "{" and "}".

  • You should follow the usual precedence and associativity for arithmetic and logical operations. For instance, * and / bind tighter than + and -, and arithmetic binds tighter than comparisons, and comparisons bind tighter than && and ||, and finally, assignment comes last in the precedence.

  • Identifiers must start with a character, and can include characters, digits, or underscores.

  • Numbers are restricted to decimal integers. Note that your parser should support things like "-42". How you handle this is up to you...

  • C lets you have an "empty" statement -- for instance:
        for (x = 0; x < 10; x = x + 1) /* empty statement */ ;
    You can use "skip", defined in ast.ml to represent an empty statement.

  • Your grammar should have no conflicts. So you'll have to resolve the "dangling else" discussed in the text.

  • You should make sure to test your parser extensively to make sure it is behaving correctly. (See the tests directory in the zip file for some tests.)

Using Lex and Yacc

The files lex.mll, parse.mly, and fishyacc.ml contain the boilerplate code that you'll need for your Lex and Yacc-based front-end. You really only need to write the regular expression definitions in lex.mll to define tokens, and the productions and semantic actions in parse.mly to define your parser.

Submitting your work

First, make sure that you and your partner's name (if any) are on the modified files in a comment. Second, only one of you needs to submit something. However, if you forget to put your names on the submission, we won't know who to assign credit (or more properly, we won't know who you partnered with.)

To submit, zip up all of your ps1 files as ps1-lastname1-lastname2 and email to 4410staff at ccs.neu.edu. The last one before the deadline will be accepted.