Hand out Quiz 3.

Interpreters and Compilers
==========================

-- Abstract syntax tree (or parse tree)

An abstract syntax is a representation that identifies the syntactic rules used in obtaining the given expression. The representation is in the form of a tree, referred to as a parse tree or abstract syntax tree, and provides easy access to the subcomponents of the expression.

How does one derive an abstract syntax tree for a given expression in a given language? Through scanning and parsing.

Scanning is the process of grouping a sequence of characters into larger units, called tokens. Typical tokens are variables, keywords, numbers, punctuation, whitespace, and comments. The output is a sequence of tokens, which is then sent to the parser. The most suitable model for defining scanners is a finite state automaton. (Refer to the handout from the book "Essentials of Programming Languages" by Friedman, Wand, and Haynes.)

The parser organizes a sequence of tokens into a parse tree. This is done on the basis of the grammar defined for the language. The parser identifies the production rule that is associated with each subcomponent of the expression.

Parse tree for the expression

Example: * 5 + - * 6 7 5 + 8 9

Note that the parse tree immediately provides a guideline for evaluating the expression: each operator is applied once the subtrees below it have been evaluated. For the above example, we assume that the tokens are already given. That is, the expression is already scanned.

-- Environment

An environment is a mapping from variables to constants (their current values).

-- Interpreter

An interpreter consists of a scanner, a parser, and an evaluator. The evaluator takes an abstract syntax tree (or parse tree) and an environment, and executes the parse tree in the environment. This results in some output and, possibly, some changes to the environment.

-- Compiler

A compiler takes a parse tree or a sequence of parse trees and produces machine code (rather than evaluating the parse trees).
The machine code can then be executed by the machine for which the code is generated. One can thus transform an interpreter into a compiler by making changes to the evaluator: when traversing the parse tree, replace each "evaluation" by "outputting the appropriate machine code".

-- Instructions and machine code

The machine code typically consists of primitive arithmetic and store/load instructions. Primitive arithmetic involves simple binary and unary operations on values stored in specified registers. Store/load instructions involve loading from and storing into memory locations.

-- Register allocation

Since the number of registers in the CPU is limited (a few tens, say), the compiler needs to take up the task of allocating registers. Thus, a value that resides in a register and is still needed (live) should be stored into the appropriate memory location before the register may be used for another computation. This involves liveness analysis and register allocation.

Java Virtual Machine
====================

(Notes based on an article by Bill Venners.)

The Java Virtual Machine (JVM) is a virtual machine. The Java compiler compiles a given Java program into JVM code (bytecodes).

The JVM has five components:

-- Bytecodes

-- Registers

(i) program counter; (ii) optop; (iii) frame; and (iv) vars. Most of the bytecode operations operate on the stack.

-- The method area and program counter

The method area contains the bytecodes for the program. The program counter points to the instruction that will be executed next.

-- Stack and registers

The Java stack is used to store parameters for and results of bytecode instructions, and to pass parameters and return values between method invocations. A stack frame consists of: local variables, the execution environment (the bookkeeping for the frame), and the operand stack used by bytecode instructions. The first is pointed to by vars, the second by frame, and the top of the third by optop.

-- Heap

This is where objects reside. When new objects are created, memory is allocated on the heap. When objects die, their locations can be reclaimed.
This is done by a process called garbage collection, which is part of the JVM. Memory management is therefore not handled by the programmer; instead, it is handled by the JVM.
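
The two halves of these notes can be tied together with a small sketch. It assumes that the example token sequence "* 5 + - * 6 7 5 + 8 9" is in prefix (operator-first) order; that reading is an assumption, not something the notes state. The sketch parses the tokens into a parse tree, evaluates the tree in an environment (the interpreter), and alternatively emits code for a toy stack machine (the compiler), replacing each "evaluation" by "outputting an instruction". The instruction names PUSH, LOAD, ADD, SUB, MUL and all function names are hypothetical and are not real JVM bytecodes.

```python
import operator
from dataclasses import dataclass
from typing import Union

# Binary operators supported by this sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
# Hypothetical opcode names for the toy stack machine (not JVM bytecodes).
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}
OPS_BY_OPCODE = {OPCODES[sym]: fn for sym, fn in OPS.items()}


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, Var, BinOp]


def parse(tokens):
    """Build a parse tree from tokens in prefix order (assumed notation)."""
    pos = 0

    def expr():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok in OPS:            # production: expr -> op expr expr
            left = expr()
            right = expr()
            return BinOp(tok, left, right)
        if tok.isdigit():         # production: expr -> number
            return Num(int(tok))
        return Var(tok)           # production: expr -> variable

    tree = expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return tree


def evaluate(node, env):
    """Interpreter: execute the parse tree in the environment env."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    return OPS[node.op](evaluate(node.left, env), evaluate(node.right, env))


def compile_tree(node):
    """Compiler: same traversal as evaluate, but emit instructions instead."""
    if isinstance(node, Num):
        return [("PUSH", node.value)]
    if isinstance(node, Var):
        return [("LOAD", node.name)]
    return compile_tree(node.left) + compile_tree(node.right) + [(OPCODES[node.op], None)]


def run(code, env):
    """Toy stack machine: a program counter plus an operand stack."""
    stack = []
    pc = 0
    while pc < len(code):
        instr, arg = code[pc]
        pc += 1
        if instr == "PUSH":
            stack.append(arg)
        elif instr == "LOAD":
            stack.append(env[arg])
        else:                     # arithmetic: pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS_BY_OPCODE[instr](left, right))
    return stack.pop()


tree = parse("* 5 + - * 6 7 5 + 8 9".split())
print(evaluate(tree, {}))           # 270
print(run(compile_tree(tree), {}))  # 270
```

Under the prefix reading the expression is 5 * ((6 * 7 - 5) + (8 + 9)), so both paths give 270. Note that evaluate and compile_tree walk the tree in the same order; only the action taken at each node differs. The toy machine keeps all intermediate values on the operand stack, so it sidesteps register allocation entirely, which is one reason a stack-based design suits a virtual machine.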