Assignment 1: Lexer
Due: class time (6 PM), Thursday, February 2.
Assignment 1 is to build a lexer for Appel's Tiger language using
ML-Lex. The Tiger language is defined in Appendix A of the course
textbook. Documentation on ML-Lex is available on the class web page,
and also (in less detail) in chapter two of the textbook.
Skeleton files to get you started are available in the $TIGER/chap2/
directory, which you can find at Appel's textbook homepage.
You should submit:
- A tiger.lex file, with the source for your lexer.
- Any other source files you wrote to support your lexer.
- A text file describing:
  - The members of your team
  - How you handled comments
  - How you handled errors
  - How you handled end-of-file
  - Anything else you think is of interest about your lexer
You're expected to write clean code; just getting it to work is not enough.
Your lexer should use the error-reporting machinery in Appel's
ErrorMsg module (see file $TIGER/chap2/errormsg.sml), or something
equivalent that you write yourself. In particular, error messages
should be reported using line-number/column offsets, not by simply
specifying the character offset from the beginning of the file.
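To make the line/column requirement concrete, here is a minimal sketch of converting a character offset into a line and column. It is written in Python purely for illustration, not in SML, and it is not how Appel's ErrorMsg module is implemented; the function names line_starts and line_col are hypothetical. The idea is to record the offset at which each line begins, then look up the last line start at or before the offset being reported.

```python
import bisect

def line_starts(text):
    """Return the character offset at which each line of text begins."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)  # the next line begins just past the newline
    return starts

def line_col(starts, offset):
    """Convert a 0-based character offset to a 1-based (line, column) pair."""
    # bisect_right counts the line starts that are <= offset,
    # which is exactly the 1-based line number.
    line = bisect.bisect_right(starts, offset)
    col = offset - starts[line - 1] + 1
    return line, col

source = "let\n  var x := 1\nin"
starts = line_starts(source)
print(line_col(starts, 6))  # position of the 'v' in "var"
```

A lexer would normally build the table of line starts incrementally, as it sees each newline, rather than in a separate pass over the whole file.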
Words to the wise:
- Relative to, say, a parser, a lexer is not hard to build. Even so:
  - Tiger's lexical grammar has some complexities that may take more
    work to handle than you might initially suspect:
    - Comments nest in Tiger.
    - The string-literal syntax is especially complex, involving many
      subcases, some of which are fairly complex in their own right.
    - Making all of the above interact correctly with your line-number
      tracking has its own complications.
  - If this is your first experience programming in SML, allocate
    time generously for coming up to speed on the language.
- Does your lexer do the right thing if end-of-file occurs inside a comment
  or string literal? How about illegal escape codes in string literals?
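As an illustration of the nested-comment and end-of-file concerns above, the following is a minimal sketch of the usual depth-counter technique. It is in Python for illustration only, the name skip_comment is hypothetical, and it assumes comments are delimited by /* and */. In an ML-Lex specification the same idea would more naturally be expressed with a start state plus a counter; this sketch shows only the logic, not the ML-Lex form.

```python
def skip_comment(text, start):
    """Skip a nested /* ... */ comment that begins at text[start].

    Returns the offset just past the matching close, or None if the
    input ends while a comment is still open (an error to report).
    """
    assert text.startswith("/*", start)
    depth = 0
    i = start
    while i < len(text):
        if text.startswith("/*", i):
            depth += 1          # entering one more level of comment
            i += 2
        elif text.startswith("*/", i):
            depth -= 1          # leaving one level
            i += 2
            if depth == 0:
                return i        # outermost comment has closed
        else:
            i += 1
    return None                 # end-of-file inside a comment

print(skip_comment("/* a /* b */ c */x", 0))
print(skip_comment("/* /* */", 0))
```

A real lexer would also update its line-number bookkeeping for every newline it passes inside the comment, which this sketch omits.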
See the course text for more information, in particular chapter two
and Appendix A.