Extending Interpreters:
Adding Function Definitions
As one illustration of the methods involved in extending interpreters, a
set of interpreters has been created which adds function definitions and
calls to the features provided by the baseline interpreters. It also adds
a local statement which provides for variables whose scope is
effectively limited to a single script or function. Source code
for these extended interpreters can be found in cll\function, pll\function and support\function.
Although adding support for arrays and structures required no changes to
existing code, the same is not true for function definitions. The first
reason is that the baseline interpreters already had support for calls to
library functions. In this extension, the function call logic must be
modified to distinguish between library functions and locally defined
functions. Also, the implementation of the local statement required
modifying the rule defining the "start" token of the grammar.
Otherwise, adding support for function definitions required only that code be
added to the interpreters. New code to implement the new features
has been added to the support code. New rules have been added to the
syntax. Almost identical rules were added to both the CLL and the PLL languages. The
only difference is the use of begin/end to mark the body of a function in
the PLL syntax and braces in the CLL syntax.
There are, of course, many ways to implement function definitions. The
method chosen here was to use a syntax similar to that used for arrays
and structs:
function <function name> ([<parameter name>]...) {
< function body>
}
in the CLL syntax, or, in the PLL syntax:
function <function name> ([<parameter name>]...)
begin
< function body>
end
(Note that line endings are simply white space and may be used or
omitted freely.)
The function definition is encapsulated in a LocalFunction
object which is stored in the variable named by <function name>. Since functions
can thus be values of variables, they may be assigned to other variables and
passed as arguments to functions.
The Euclidean algorithm for greatest common divisor can
be written thus in the CLL notation:
function gcd(n, m) {
local remainder;
if (n < 0) return gcd(-n, m);
if (m < 0) return gcd(n, -m);
if (n < m) return gcd(m,n);
remainder = n%m;
if (remainder != 0) return gcd(m, remainder);
return m;
}
or, in the PLL notation:
function gcd(n, m) begin
local remainder;
if n < 0 then return gcd(-n, m);
if m < 0 then return gcd(n, -m);
if n < m then return gcd(m,n);
remainder := n mod m;
if remainder <> 0 then return gcd(m, remainder);
return m;
end
When the interpreter encounters a function definition, it simply scans over
the text of the function and saves it as a string in a LocalFunction object. When the function is called,
the string containing the text of the function is extracted and the
interpret() function is called recursively.
The local statement consists of a list of variable names. If it is
used, it must be either the first statement of the script or the first
statement of a function definition.
The modifications to support local function definitions are substantially
more extensive than those required to support arrays and subscripts, on
the one hand, or structures and objects, on the other.
The most obvious changes are the changes to the
syntax. Not only added syntax was required, but a small
modification to existing syntax was required.
The changes to the supporting code are somewhat more extensive.
The first and largest set of changes to supporting code is to the common definitions module. These changes include the
definition of an LocalFunction struct, to
encapsulate function definitions. These modifications are sufficient to
provide support for the direct execution
interpreters. They are also necessary for all the other interpreter
architectures.
The second set of changes, to the bytecode
interpreter module, added to the changes to the common definitions,
suffices to support the direct compilation interpreters.
These changes are also necessary for the astci interpreter.
A set of changes to the definitions of the
AstNode class is also required for
both the astxi and astci
interpreters.
Finally, a set of changes to the
treewalk functions of the astxi and astci is required to add structures and objects to the
abstract syntax tree interpreters.
To add function definition support to the syntax, it was necessary to add
three new tokens to the grammar:
name list -
A name list is a comma delimited list of names. The name list token is
used both to specify the parameter names and the local variables in a
function definition. The name list token is identical to the name
list token defined in the Structures and Objects
extension.
Note that the name list token could be used to implement the dump statement.
(AgStack<AgString>) name list
-> name:n =AgStack<AgString>().push(n);
-> name list:stack, ',', name:n =stack.push(n);
The same implementation of the name list is used in dxi, dci and ast parsers. To make sure that
the AgStack
object is properly constructed on the parser stack, a wrapper statement has been added to the
configuration section.
local statement
-
A local statement has been added to the language in order to
provide the capability of using local variables inside a function without
disturbing the value of like named variables used outside the function.
As with all extensions, there are a number of ways that the local
statement could have been implemented. In many ways, the most convenient
method of implementing scopes is to use a push down stack of symbol
tables. On entering a scope, the current symbol table is stacked and a
new symbol table is created. On exiting the scope, the previously active
symbol table is restored. In order to handle variable access correctly,
symbol table lookups must scan the current symbol table and all stacked
tables until the symbol is found or there are no more symbol tables.
Such an implementation, however, must usually be designed in from the
beginning. In the present circumstance, it would require retrofitting all
symbol table lookups in the interpreters. Accordingly, another approach was
taken.
The approach taken to implementation of the local statement is
to use a single symbol table together with a stack. On entry to a function,
the values of the local symbols are stacked and then set to uninitType. On exit from the
function, the values are restored from the stack.
The local statement, if used, must be the first statement of a
script or function. It has the following syntax, in both the CLL and PLL languages:
local statement
->
-> "local", name list, ';'
The reduction procedures for these rules depend, of course, on the
interpreter architecture. They do not, however, depend on the choice of
language, CLL or PLL.
Here is how they are implemented:
- dxi
-
The local localVariables()
function is called to save the values of the listed variables
on a stack. The original values will be restored on exit from the
interpret() function.
- dci
-
The list of local variable names is saved and passed on to the
ScriptMethod constructor, where it is
used by the apply() function to save
values on entry and restore them on exit.
- ast
-
A NameListNode is created and returned.
function body
-
Using the context variable and the
input pointer, the function body production identifies the
beginning and end of the function text and copies it to an AgString object:
(AgString) function body
-> local statement, statement list ={
return AgString().append(
(const char *) CONTEXT.pointer,
PCB.pointer - CONTEXT.pointer);
}
The same reduction procedure is used for all interpreters.
function definition
-
The function definition differs from the C function definition in the use
of the key word "function" and the lack of any return type. The return
type of any function is automatically Value. If a function does not specify a
return value, a value of uninitType is returned. The
CLL syntax is:
function definition
-> "function", name,'(', name list, ')',
'{', function body, '}'
The PLL syntax is:
function definition
-> "function", name,'(', name list, ')',
"begin", function body, "end"
The reduction procedures for these rules depend, of course, on the
interpreter architecture. They do not, however, depend on the choice of
language, CLL or PLL.
Here is how they are implemented:
- dxi
-
The reduction procedure calls the local, dxi version of the defineFunction() function to
create a LocalFunction object and save it
as the value of the variable given as the function name.
- dci
-
The reduction procedure calls the local, dci version of the defineFunction() function to
generate code to define the function and save it as the value of the
variable given as the function name.
- ast
-
The reduction procedure creates and returns a typeFunction
node.
In order to integrate the function definition syntax with the rest of the
grammar, one rule has been added to the grammar:
simple statement
-> function definition
This rule simply ties the function definition into the grammar. There is
no reduction procedure for this rule in any of the parsers.
One rule has been changed, to provide for local variables in a script or
function:
script $
-> local statement, statement list, eof
Because of this rule, the local statement, if it exists at all,
must be the first rule in the script. Note that the function body token imposes the
same condition.
The "local" statement works by saving the values, if any, of the
listed variables, and then restoring them on exit from the function.
Thus, if two temporary variables are needed, one can specify
local foo, bar;
to define them.
The parameters of the function are also treated as local variables, just
as is the norm in C. Note that any variable may be used within a function,
not just local variables, so that function calls can have side effects.
Because the direct execution interpreters require
special syntax selection
to handle loops, the changed rule for the dxi case is somewhat different:
(Value) syntax selection $
-> SCRIPT, local statement, statement list, eof =Value();
Note that although the reduction procedure is unchanged in dxi, the
dci and ast parsers need
slight modifications in their reduction procedures in order to store away
the list of local variables.
One configuration statement has been added
to all the parsers in order to support the name list
token:
wrapper {AgStack<AgString>}
This provides proper C++ handling of the AgStack object on the parser
value stack.
New data fields and new methods have been added to the parser
control block as follows:
- dxi
-
AgStack<AgString> localList;
-
This AgStack object is used
to record the names of the local variables that must be saved on entry and
restored on exit from a script or function.
AgStack<Value> saveStack;
-
This stack provides storage to save the values of the variables on the
localList.
Value callFunction(const AgString &name, const AgStack<Value> &);
-
This function, local to the parser control block, checks for a variable
with the given name. If the value of this variable is a local function
definition and the number of arguments is correct, the local function is
called. Otherwise, the global version of callFunction()
is invoked to try calling an external function with the given name.
void defineFunction(const AgString &variableName,
const AgStack<AgString> ¶meterNames, const AgString &text);
-
This function is used to create a
LocalFunction object with the specified parameter
names and body text. The object is then saved as the value of the named
variable.
void localVariables(const AgStack<AgString> &list);
-
This function protects the current values of the listed variables, by
pushing their values onto the saveStack.
- dci
-
AgDictionary<AgString> functionNames;
-
This AgDictionary is used to
translate the names of functions that are called into a compact set of
integers for use by the CALL instruction.
As the parser identifies function calls, it records the name of each
function called in this dictionary. The contents of the dictionary are
eventually handed off to the ScriptMethod constructor for use by the
bytecode interpreter.
AgStack<AgString> localList;
-
This AgStack object is used
to record the names of the local variables that must be saved on entry and
restored on exit from a script or function.
CodeFragment defineFunction(
const AgString &n,
const AgStack<AgString> &nameList,
const AgString &text);
-
This function generates a Constant
object which, in turn, contains a LocalFuntion
object which describes the function. Code is then generated to fetch the
constant and store its value in the named variable. Note that the
CALL opcode has been modified to take the
number of arguments to the function on the stack.
- ast
-
No new extensions to the parser control block are required.
The LocalFunction struct serves to encapsulate the definition of a function
that has been defined using the function statement. It identifies
the names of the parameters and the text which comprises the body of the
function. Note that the data members of the struct are defined to be
constant, since once the function is defined, there is no reason to change
these values. By defining them as const, unnecessary copying of data is
reduced.
const AgArray<AgString> parameterList;
-
Contains a list specifying the names of the parameters used in the body of
the function.
const AgString text;
-
This field contains the body of the function as an ascii string. An
implementation of the interpreter that used a bytecode interpreter
might well replace this field with an array of bytecode.
-
LocalFunction(const AgStack<AgString>
&p, const AgString
&t);
-
The primary constructor takes references to a stack of parameter names
and a string of body text. The constructor converts the stack of parameter
names to an array of parameter names.
LocalFunction(const LocalFunction &);
-
The copy constructor exists simply for completeness.
~LocalFunction();
-
There is nothing for the destructor to do. The AgArray and AgString objects both have
destructors which will take care of freeing storage when it becomes
necessary.
-
AgString asString()
const;
-
Returns a string representation of the function. This implementation
simply creates a string of the form "function(a,b,..)" where a, b, ...
are the parameters of the function.
int hash(int startValue = 0);
-
This function calculates a hash code for the instance using the
hash functions for the AgArray and AgString members. Creating a hash
code allows the use of LocalFunction objects in
AgSet,
AgMap and
AgDictionary
containers.
int operator == (const Instance &);
-
Two instances of a LocalFunction object are equal if they have the same
parameters, in the same order, and the function bodies are identical.
The equality operator supplements the hash function to allow use
of Instance objects in AgSet, AgMap and AgDictionary
containers.
The changes to the Value class required to support structures
and objects are as follows:
-
Add a new type, functionType to the
Type enumeration.
-
Since the LocalFunction struct has a constructor,
and thus requires the use of the ValueWrapper class,
add an entry to the anonymous
union to guarantee adequate space for it.
char functionSpace[sizeof(ValueWrapper<LocalFunction>)];
It is good practice to include such a sizing entry for each class that
uses the ValueWrapper class, since
not all classes can be counted upon to fit within a double.
-
Add a constructor to the Value class
that will create a functionType
Value object containing a given LocalFunction object.
-
Add functionType cases to the
switch statements in the following functions (see Added Cases):
-
In addition to the LocalFunction constructor,
add the following new functions to the Value class to support the
above modifications (see Added Functions):
-
Add string constants for error messages corresponding to new error
modes:
- Trying to call something that is not a function.
Value &setValue(const Value &v);
-
In the setValue() function, used by constructors and the assignment
operator, a case corresponding to the
functionType
has been added. It uses the placement new operator of the ValueWrapper class to store the
LocalFunction instance in the space provided by the anonymous union.
~Value();
-
In the destructor, the wrapper for the LocalFunction object must be
deleted. Since the fields in the LocalFunction struct are reference
counted, the actual data will only be deleted if this is the last
reference to it.
int hash(int) const;
-
The functionType case computes a
hash code based on both the parameter list and the body text of the
function.
AgString asString() const;
-
The functionType
passes the call through to the asString()
implementation.
- Value &operator = (const Value &v);
-
A case has been added to delete the ValueWrapper
containing the old value of the LocalFunction object.
Deleting the old value is exactly the same as in the destructor.
int operator == (const Value &v) const;
-
The function case passes
the operation back to the equality operator
defined for the LocalFunction class.
Value(const LocalFunction &);
-
This constructor creates a functionType Value object.
void assertFunction() const ;
-
Throws an ErrorMessage exception
if the Value object is not functionType.
LocalFunction &getFunction() const;
-
If the Value object contains a function definition, this function returns
a reference to it. Otherwise, the function throws an ErrorMessage exception.
int isFunction() const;
-
Returns non-zero if the Value object contains a function definition.
Otherwise, the function returns zero.
Extending the bytecode interpreter, whether for
the CLL or the PLL language, to
handle structures and objects requires the following changes:
-
Define localList as an array of strings.
This contains the names of the local variables the interpreter should
save on entry and restore on exit.
-
Define functionNames as an array
of strings. This list contains the names of variables that are used as
function names in function calls. At compile time, a dictionary is used
to identify the names of functions that are called, so that the name
of the function can be identified by a small integer. When the ScriptMethod
object is initialized, the contents of the dictionary are copied to this
array so that at run time the name of the function to call can be retrieved.
Note that if the dataset dictionary were used for this functionality, and
a script called a library function, sqrt() for instance, the
name of the library function would be incorrectly added to the dataset.
-
An initializer for functionNames
has been added to the ScriptMethod
constructors.
-
The CALL case in ScriptMethod::list() has been changed
to take the name of the function from the functionNames array rather than from the
function table. It also takes the
number of arguments from the top of the stack. Note that in the baseline
interpreters, the function is actually identified at compile time. In the
this extension, that cannot be done, since it may be the user's desire to
override a library function definition. Thus the number of arguments is
simply noted at compile time, and furnished as an argument on the stack
at run time.
-
Define a local stack variable, saveStack, in ScriptMethod::apply(). On
entry to the function, scan over the variables listed in localList and push their values onto
saveStack. On exit from the function, scan
localList in reverse order and restore the named variables.
-
Modify the CALL case in the ScriptMethod::apply() function
so that:
-
it fetches the function name from the functionNames array,
-
finds the value of the named variable
-
If the named variable identifies a function and the number of operands
is correct, the named function is called. Otherwise, an attempt is made
to call a library function with the given name.
The function call process begins by saving the variables named in the parameter
list. It then makes a recursive call to the interpret() function. On return,
the parameter variables are restored to the values they had before the
function was executed.
Extending the Abstract Syntax Tree definitions, whether for
the CLL or the PLL language, to
handle structures and objects requires the following changes:
-
Add new node types to the NodeType
enumeration to support the following types of nodes:
- typeFunction
-
This node type represents a function declaration.
- typeNameList
-
Identifies a node that contains a list of names. It is implemented
as a class, NameListNode, derived from AstNode.
-
Add a struct, subnodesFunction to the anonymous union of the AstNode class to
allow access to the subnodes by name.
-
Define a specialized node class, NameListNode, derived from
AstNode to encapsulate the
typeNameList node. This class is
required since this is a leaf node of the syntax tree.
-
Add subnode counts for the newly defined node types, typeFunction and typeNameList to the nSubnodes[] array.
-
Increment the subnode count for typeScript node types in the nSubnodes[] array to reflect
the fact that an additional subnode has been added to this node.
-
Implement constructors for the newly defined node class, NameListNode.
-
Define a wrapper function,
nameListNode(), to simplify the creation of typeNameList nodes.
The changes to the astxi interprereter required to support local function
definitions are:
-
In the Interpreter::execute() function,
modify the typeScript case to
fetch a reference to the list of local variables and protect them
on a local stack before making the recursive call to the interpret() function. On return, restore
the variables from the stack.
-
In the Interpreter::execute() function,
add a typeFunction case to handle
function definitions. In this case, a variable name, parameter list, and
function text are extracted from the subnodes. A LocalFunction object is created from the
parameter list and the function text, and then assigned to the named
variable.
-
In the Interpreter::evaluate() function,
modify the typeFunctionCall case
to check first for a locally defined function. If one is not found, try
calling the like named library function.
If a local function is found, save the variables named in the parameter list on
a local stack. Then
make a recursive call to the interpret() function. On
return, restore the parameter variables to the values they had before the
function was executed.
The changes to the astci interpreter required to
support local function definitions are:
-
Add a functionNames dictionary to
the Compiler class:
AgDictionary<AgString> functionNames;
This dictionary provides a means of identifying function names without
inadvertently adding the names of library functions to the global symbol
table. The dictionary is created by the compile()
function as it walks the abstract syntax tree.
-
Add a localList stack to
the Compiler class:
AgStack<AgString> localList;
The list is created by the compile() function as it walks
the abstract syntax tree. It is then passed to
the ScriptMethod constructor, where it is used by the ScriptMethod::apply()
function to save and restore the values of local variables.
-
Add a typeScript case to
the Compiler::compile()
function. Note that in the baseline interpreter, the typeScript node
contains a single subnode, typeStatementBlock,
which is handled by the default case in the switch
statement.
In this case, the list of local variables is extracted from the
typeNameList subnode and assigned to
localList. Then the compile()
function is called recursively to compile the body of the script.
-
Add a typeFunction case to
the Compiler::compile()
function. This case generates a Constant
object which, in turn, contains a LocalFuntion
object which describes the function. Code is then generated to fetch the
constant and store its value in the named variable. Note that the
CALL opcode has been modified to take the
number of arguments to the function on the stack.
-
Add initializers for functionNames
and localList to the constructor
for ScriptMethod. After compiling the bytecode, copy the values of these
lists from the Compiler object.
|