astci: Tree Compilation Interpreters
The "Abstract Syntax Tree Compilation" interpreter, implemented in support\base\astci.cpp creates an abstract
syntax tree using the Abstract Syntax Tree Parser and
then compiles the statements in a script into bytecode by traversing
the tree. Two versions
of the astci interpreter can be built by using either CLL syntax or PLL syntax. This is a somewhat more complicated method, in terms of
support code, than used by the Direct Compilation Interpreter,
but it has the advantage of allowing multiple traversals of the tree and
therefore more sophisticated code generation.
In this example, no attempt has been made to generate code for a more
sophisticated virtual machine. Instead, to illustrate the facility
obtained by using an abstract syntax tree, a small amount of code optimization has been performed by adding the
capability of evaluating constant expressions at compile time.
The astci interpreter is implemented in terms of two classes: the
Compiler class which creates
bytecode from an AbstractSyntaxTree object,
and the ScriptMethod class which interprets
the bytecode.
Although other methods of the ScriptMethod class
are defined in the bcidefs module, the constructor,
which actually compiles the input script, is defined as part of the astci
module.
The constructor is implemented in two stages. In the first stage, a Compiler object is created. The constructor for
the Compiler class invokes the ast parser to create
an abstract syntax tree which summarizes the
script. The compile() method of the
Compiler class is then invoked to traverse the tree and generate bytecode.
The external interface for the interpreter
begins by constructing a ScriptMethod object.
The constructor actually parses the
script and compiles the byte code.
interpret() then invokes the
ScriptMethod::apply() function to execute
the bytecode.
A call to the ScriptMethod::list() function, commented out for regression
testing, may be used to create a listing of the generated bytecode.
Compiling from an abstract syntax tree provides substantially greater flexibility
than compiling directly at parse time, since it is possible to traverse the
tree as many times as is desired. To illustrate this capability, the compiler
has optional optimization code which evaluates constant expressions at compile
time and simplifies the generated code accordingly. Optimization is controlled
by the definition of the macro OPTIMIZE. For convenience, this macro has
been predefined in astci.h. To turn off
optimization, comment out the macro definition.
When the OPTIMIZE macro is defined, the doConstantArithmetic()
function is defined and called as the first step in the
compile() function. The function works
by replacing any portion of the abstract syntax tree that represents a
constant expression with a single ConstantNode.
In addition, the typeIfStatement,
typeIfElseStatement, and
typeWhileStatement
cases of the compile() function check for a constant condition value and take
appropriate action.
The Compiler class encapsulates the functions and the data necessary
to traverse an abstract syntax tree and compile the statements it describes
into bytecode.
Strictly speaking, this class is unnecessary. The functions in the class
could have been written as external functions. They would, however, have
had to have two extra arguments in order to pass the references to the dictionary and the
constants. Should any additional parameters
become necessary, it would be
necessary to change all the function calls to accommodate them. Making
the functions members of a class and defining these parameters as class
members allows the calling sequences to be simplified and thereby
simplifies program maintenance.
AbstractSyntaxTree
tree;
-
The tree is created by a call to buildTree(),
the external interface of the Abstract Syntax Tree parser,
at construction time.
AgStringDictionary<AgString> &dictionary;
-
A reference to the dictionary used to identify variable names is initialized
by the constructor. Notice that the dictionary must be externally defined.
The bytecode created by this compiler can be used on any
Dataset object that contains a
reference to the same dictionary.
AgDictionary<Constant> constants; -
The constants dictionary is used to identify and accumulate constants that are encountered
in the script. Rather than complicate the bytecode interpreter by putting
such constants inline as immediate constants, they are referenced in the
bytecode by their index in the constants dictionary. A dictionary
is used rather than a stack in order to guarantee that there is only one
instance of any given constant, no matter how many times it appears in
a script.
CodeFragment
compile(const AstNode *node);
-
The compile function is called to generate bytecode for the specified node.
It returns a CodeFragment
object. If the node is not a leaf node, the compile() function
calls itself recursively to compile its subnodes. It then combines these
code fragments with specific glue code to implement the indicated functionality.
The function is implemented as a switch statement controlled by the type
of the node. The treatment of specific nodes is as follows:
default
-
The default case handles all nodes not explicitly listed below.
The compile() function is called recursively for each subnode.
The resulting CodeFragment
objects are simply concatenated and the concatenation is returned.
typeStatementBlock
-
The compile() function is called recursively for each node
in the list. The resulting CodeFragment
objects are simply concatenated and the concatenation is returned.
typeVariable
-
The name of the variable is entered into the dictionary if necessary
and the dictionary index obtained. A CodeFragment is then returned which
consists of a locate instruction and the dictionary index of the
variable name.
typeDump
-
For each variable name in the list, the variable is identified
in the dictionary and a
CodeFragment containing a
DUMP instruction is generated.
If the variable is not in the dictionary, it is added to the dictionary
and its value is set to uninitType.
The code fragments are concatenated together and the final result
is returned.
typePrint
-
Each expression in the list is compiled and a PRINT instruction appended.
The code fragments are concatenated together and the final result is
returned.
typeReturn
-
If an expression provided in the return statement, it is compiled and a
RETURN instruction is appended.
Otherwise, a simple HLT instruction is
coded.
typeConstant
-
The constant value is identified in the constants dictionary. The dictionary index of the
constant is the argument to the PUSHC
instruction.
typeUnary
-
The code to compute the operand is compiled by a recursive call to
compile(). The unary operator is then appended to the code
thus created.
typeBinary
-
The code to compute the left operand is compiled by a recursive call
to compile(). The code to compute the right operand is
then created by a second recursive call and concatenated with the
previously generated code. Finally, the binary operator itself is
appended.
typeCommaExpression
-
The code to compute the left operand is compiled by a recursive call
to compile(). A POP instruction is appended to discard the
result of the left operand. The code to compute the right operand is
then created by a second recursive call and concatenated with the
previously generated code.
typeExpressionStatement
-
The code to compute the expression is compiled by a recursive call
to compile(). A POP instruction is appended to discard the
result.
typeAssignment
-
First, the code to evaluate the lvalue expression on the left side
of the assignment is compiled by a recursive call to compile.
Then the code to compute the expression on the
right side of the assignment is compiled by another recursive call
to compile() and concatenated with the lvalue code. Finally,
the assignment opcode is appended.
typeConditionalExpression
-
Since there is no difference in the code generation between a conditional
expression and an if-else statement, the condition and the two expression
nodes are separately compiled and the ifElse() method of the CodeFragment class is invoked to
finish the job.
typeLogicalOrExpression
-
Code for the right expression is compiled first in order to determine its
size. Then code for the left expression is compiled.
A LOR instruction is appended. The offset
for the branch is the size of the code for the right expression.
Finally, the code for the right expression is concatenated.
typeLogicalAndExpression
-
Code for the right expression is compiled first in order to determine its
size. Then code for the left expression is compiled.
A LAND instruction is appended. The offset
for the branch is the size of the code for the right expression.
Finally, the code for the right expression is concatenated.
typeIfElseStatement
-
If the OPTIMIZE flag is set and the condition expression has
typeConstant, the value of the
constant is interrogated and used to select one of the two subnodes. Thus,
if the OPTIMIZE flag is set, the normal if-else statement can be used
to control conditional compilation.
If the OPTIMIZE flag is not set or the loop condition is not constant,
code for the three subnodes, the
condition, the onTrue statement, and the onFalse statement, is
compiled independently and passed to the
CodeFragment::ifElse() function which
combines these code fragments with the appropriate branch instructions to
create a single code fragment.
typeIfStatement
-
If the OPTIMIZE flag is set and the condition expression has
typeConstant, the value of the
constant is interrogated. If it is false, an empty
CodeFragment is
returned. Otherwise the CodeFragment for the statement controlled by the
if is returned. Thus, if the OPTIMIZE flag is set the normal if statement
can be used to control conditional compilation.
If the OPTIMIZE flag is not set or the loop condition is not constant,
code for the two subnodes, the
condition and the statement controlled by the if, is
compiled independently and passed to the
CodeFragment::ifStatement() function which
combines these code fragments with the appropriate branch instructions to
create a single code fragment.
typeWhileStatement
-
If the OPTIMIZE flag is set and the condition expression has
typeConstant, the value of the
constant is interrogated. If it is false, an empty CodeFragment is
returned. Otherwise the CodeFragment for the statement is checked for
the presence of a break statement. If none is found, an error is
reported. Otherwise, a continue statement is appended to the code,
patchBranches() is invoked
to substitute correct exit and continue branch offsets, and the
code fragment is returned.
If the OPTIMIZE flag is not set, code for the two subnodes, the
condition and the statement controlled by the while, is
compiled independently and passed to the
CodeFragment::whileLoop() function which
combines these code fragments with the appropriate branch instructions to
create a single code fragment.
typeDoStatement
-
The statement and the condition are separately compiled and then handed
over to CodeFragment::doLoop to add
the necessary branch instructions.
typeForStatement
-
The four different components of the for statement are separately compiled
by recursive calls to compile(). The results are then handed
over to CodeFragment::forLoop().
typeBreak
-
The function appendBreak() is
called to append a branch statement to an empty CodeFragment object.
Eventually, after this bit of code is concatenated with others, the
patchBranches() function will
fill in the operand field of the instruction with the correct offset to
exit the loop.
Note that the parser will throw an exception if a break statement occurs
outside a loop.
typeContinue
-
The function appendContinue() is
called to append a branch statement to an empty CodeFragment object.
Eventually, after this bit of code is concatenated with others, the
patchBranches() function will
fill in the operand field of the instruction with the correct offset to
continue the loop.
Note that the parser will throw an exception if a continue statement occurs
outside a loop.
typeFunctionCall
-
The argument expressions are compiled in turn and the code concatenated.
Then the function table
index of the function to be called is determined by calling
idFunction(). Finally a
CALL instruction is appended with the
function table index as the parameter.
void doConstantArithmetic(const AstNode *node);
-
The doConstantArithmetic() function is incorporated in the
compiler only if the OPTIMIZE switch is set. It is
then called as the first statement of the compile() function.
On entry, doConstantArithmetic() calls itself recursively to
handle constant arithmetic on all the subnodes of the current node.
Then, the function uses a switch statement to support special handling
for specific node types:
- ListNode Nodes
-
A recursive call is made to handle each node in the list.
typeUnary
-
If the operand has a constant value, the value is fetched and the indicated
operation is performed. The value of the constant node is set to the resultant
value and the context is set to the context of the
unary node. Finally, the unary node is replaced with the constant node.
There is no need to delete the now abandoned unary node since it was
registered in the nodeStack belonging to
the abstract syntax tree when it was created and will be properly
deleted when the tree itself is deleted.
typeBinary
-
If both operands are constant, the indicated operation is performed
leaving the result in the left operand node. The binary node is then
replaced with the left operand node.
There is no need to delete the now abandoned binary node, since it was
registered in the nodeStack belonging to
the abstract syntax tree when it was created and will be properly
deleted when the tree itself is deleted.
Compiler(char *text,
AgDictionary<AgString> &d);
-
The constructor invokes the buildTree() function to create
an abstract syntax tree from the text string. It also records a reference
to the dictionary.
~Compiler();
-
There is nothing for the destructor to do.
ScriptMethod compile();
-
If the OPTIMIZE switch is set, compile() begins by calling
doConstantArithmetic() on the
root node of the tree. Then it calls the private
compile()
function on the root node of the tree, and appends a HLT
instruction to the code. Finally it returns a
ScriptMethod object, constructed from the
generated code, the dictionary and the constantList.
|