Extending Interpreters:
Adding Arrays and Subscripts
To illustrate the methods involved in extending interpreters, a set of
interpreters has been created which adds arrays and subscripts to the
features provided by the baseline interpreters. Source code for these
extended interpreters can be found in cll\array,
pll\array and support\array.
It is perhaps misleading to speak of changes to the interpreters, since
adding support for arrays and subscripts required only that code be
added to the interpreters. In no case was it necessary to modify
existing code. New rules have been added to the syntax. The same rules were
added to both the CLL and the PLL languages. New code to implement the new features
has been added to the support code.
Although it may seem that arrays and subscripts should be part of the basic
syntax supported by the kit, they have been left as an extension since there are
a large number of ways to implement arrays and subscripts. It was felt that
the baseline interpreters in the kit should be limited to those features
which enjoy general agreement as to their implementation.
There are two basic ways that arrays can be implemented. One way is to
simply set up an AgMap<Value, Value> object which would map any
value to any other value. This approach would not require any sort of
declaration and would allow statements of the form:
x["foo"] = bar;
as well as conventional subscripts. Although this is an interesting
approach, it is not what most people think of when they think of
arrays and subscripts.
The approach actually chosen is to implement a simple array declaration:
array foo[size];
where size can be an arbitrary integer expression. The expression
is evaluated and an array of the specified size is assigned to the variable
foo. The values in the array are initially set to
uninitType. Elements of the
array can now be set to arbitrary values.
To provide unity in the implementation, the Value class has been extended so
that the value of any variable could be an array. Since each array is an
array of Value class instances, it is possible to have array elements which
are themselves arrays, thus providing the capability of multidimensional
arrays. For example, here is a way to set up a three by three matrix.
array x[3];
for (i = 0; i<3; i++) {
array temp[3];
x[i] = temp;
}
for (i = 0; i<3; i++) {
for (j = 0; j<3; j++) {
x[i][j] = 1.0/(i+j+1);
}
}
Since character strings are already arrays of characters, it seemed
reasonable to extend subscripting to string valued variables as well.
For example,
x = "castle";
y = x[2];
x[2] = 't';
The value of y is 115, the ascii code for the letter 's'.
Setting x[2] to 't' makes the value of x equal to "cattle".
The most obvious changes to support arrays and subscripts are the changes
to the syntax. The changes to the supporting code are somewhat more
extensive.
The first and largest set of changes to supporting code is to the common definitions module. These changes include the
definition of a CharPointer class, to
represent pointers into character strings and additions to the Value class. These modifications are
sufficient to provide array and subscript support for the direct execution interpreters. They are also necessary
for all the other interpreter architectures.
The second set of changes, to the bytecode
interpreter module, added to the changes to the common definitions,
suffices to support the direct compilation interpreters.
These changes are also necessary for the astci interpreter.
The change to the opcode enumeration is
also necessary for the astxi interpreter.
A small set of changes to the definitions of the
AstNode class is also required for
both the astxi and astci
interpreters.
Finally, a small set of changes to the
treewalk functions of the astxi and astci is required to add arrays and subscripts to the
abstract syntax tree interpreters.
To add array and subscript support to the syntax, it was necessary to add
only two new rules to the grammar: an additional rule for the token simple
statement:
simple statement
-> "array", name, '[', expression, ']', ';'
and an additional rule for the token lvalue:
lvalue
-> lvalue, '[', expression:x, ']'
The first rule creates an array of specified size. The second implements
subscripts. Note that since lvalue is recursively defined, it is possible
to have multiple subscripts, as in C and C++.
The reduction procedures for these rules depend, of course, on the
interpreter architecture. They do not, however, depend on the choice of
language, CLL or PLL.
The CharPointer class has been added to represent lvalues which
refer to character positions within a string. It enables
scripts to write as well as read single characters within
a string.
AgString &string;
-
A reference to the string that is the base of the subscript
operation. Note that a reference can be used since the lvalues
that give rise to a CharPointer value are transitory. They are created
and deleted entirely within the scope of a single expression.
int index;
-
The index into the string specified by the subscript operator.
CharPointer(AgString &, int);
-
Creates a CharPointer object from a string and an index. If the index is
out of bounds, an ErrorMessage
exception is thrown.
CharPointer(const CharPointer &);
-
The copy constructor exists simply for completeness.
~CharPointer();
-
There is nothing for the destructor to do.
char &value() ;
-
This function returns a reference to the indexed character within
the string. The reference may be used either to read or to write
the character value.
One facet of supporting arrays and subscripts is to modify the Value class so that a Value object may
contain an AgArray<Value> object.
Note that the AgArray template class uses value semantics.
Because the AgArray template class has a constructor, an AgArray object
cannot be added to the anonymous
union. Instead, the AgArray object has to be constructed by using the
ValueWrapper class.
The changes to the Value class required to support arrays and subscripts are as follows:
-
Add two new types, arrayType and charPointerType, to the Type enumeration.
-
Add entries to the anonymous
union to guarantee adequate space for the
AgArray and
CharPointer objects:
char arraySpace[sizeof(ValueWrapper< AgArray<Value> >)];
char charPointerSpace[sizeof(ValueWrapper<CharPointer>)];
It is good practice to include such sizing entries for each class that
uses the ValueWrapper class, since
not all classes can be counted upon to fit within a double.
-
Add constructors to the Value class
that will create a Value object containing an array of a specified size
or a CharPointer object
-
Overload the subscript operator in the
Value class so it can access an element of an array or a character in a
string.
-
Add arrayType and
charPointerType cases to the
switch statements in the following functions:
Note that every switch statement that has a
pointerType case also requires
a corresponding charPointerType
case.
-
In addition to the new constructors
and the subscript operator override,
add the following new functions to the Value class to support the
above modifications:
-
Add string constants for error messages corresponding to new error
modes:
- Subscripting a variable whose value is not an array or string.
- Subscript out of bounds.
Generally speaking, new cases for the arrayType
and the charPointerType must be
added to each switch statement in Value class functions. No charPointerType
case is needed for those functions which are not applied to lvalues.
Two major classes of functions are the assignment operators and the
increment/decrement operators.
Each assignment operator requires a new case to handle the
charPointerType. These
represent the situations where a subscripted string appears on
the left side of an assignment operator. For example:
x = "castle";
x[2] = 't';
x[1] |= 0x14;
The value of x is now "cuttle".
Each of the autoincrement and autodecrement operators also requires
handling the charPointerType.
These cases take care of situations like the following:
x = "castle";
y = x[2]++;
The value of x is now "cattle" and the value of y is 115, the ascii
value of 's'.
In the following list of functions and operator overrides, only
representative cases of the assignment and increment/decrement operators
are provided.
Value &setValue(const Value &v);
-
In the setValue() function, used by constructors and the assignment
operator, cases corresponding to the arrayType
and the charPointerType
have been added. Both use the placement new operator of the ValueWrapper class to store the
appropriate object in the space provided by the anonymous union.
~Value();
-
In the destructor, the wrapper for the array object must be deleted.
Since the array itself is reference counted, the actual array will only
be deleted if this is the last reference to the array.
int hash(int) const;
-
The arrayType case calculates a hash
value for the array, No hash value is required for the
charPointerType
case, since the lvalues cannot exist except in the course of evaluating
an expression. Thus they cannot be entered into containers. Note that
this would not be the case if the interpreter were to implement pointers.
const Value &deref();
-
The charPointerType case is added
to fetch the character the pointer points to, turning the object into
an integerType object.
AgString asString() const;
-
The arrayType case is added to make
an appropriate AgString representation of the array.
No case is required for the charPointerType
since lvalues are never output.
Value &operator = (const Value &v);
-
A case has been added for the charPointerType
which dereferences the pointer and sets the value of the character pointed
to, as well as the this object, to the value of the argument.
A case has also been added to delete the ValueWrapper containing the
AgArray object.
Deleting the old value is exactly the same as in the destructor.
Value &operator += (const Value &v);
-
A case has been added for the charPointerType
which dereferences the pointer and adds the value of the argument to
the character pointed to. The value of this is set to the resulting
sum and the type to integerType.
All the remaining assignment operators are handled in a similar manner.
int operator == (const Value &v) const;
-
The arrayType case determines that
two arrays are equal if and only if the arrays have the same size and
all elements are equal.
Nothing is required for the charPointerType
case, since comparisons are not done on lvalues.
int operator ++ (int);
-
The charPointerType
case of the postincrement operator sets the value of the object to
the unincremented value of the character pointed to, sets the type
to integerType and
increments the character pointed to.
The other autoincrement and autodecrement operators are handled in
a similar fashion.
Value(const Value &, int);
-
When defining the constructor for an array object, one's first instinct is
to define it to take a single integer argument. This, however, conflicts
with the constructor for an integerType.
The next thing that comes to mind is to define the constructor to take
a single Value object. This approach has the unpleasant property of
conflicting with the copy constructor for the class.
The approach finally taken was to put a dummy argument in the calling
sequence which is ignored. This is similar to the convention in C++ for
distinguishing overrides for the pre-increment and post-increment
operators. Thus the arguments to the constructor are a Value object,
whose value determines the size of the array, and an integer whose
value is utterly ignored.
Note that as implemented any scalar value, integerType or
realType, may be used to specify the size of the array.
If the scalar value is not integral, it is truncated, thus
array x[7.3];
creates an array of seven elements.
Value(const CharPointer &);
-
This constructor uses the placement new operator of the ValueWrapper to construct a
CharPointer object in the space allocated by the anonymous union.
Value &operator [] (Value &);
-
Prior to actually doing the subscript operation, the function
must first verify:
- The value being subscripted is an array.
- The subscript is a scalar, i.e.
integerType or
realType.
- The subscript is not out of bounds.
void assertArray() const;
-
Throws an exception if the Value object is not an array.
AgArray<Value> &getArray() const;
-
getArray() returns a reference to the AgArray object contained
in a Value object.
CharPointer<Value> &getCharPointer() const;
-
getCharPointer() returns a reference to the CharPointer object contained
in a Value object. If the type of the object is not
charPointerType an
ErrorMessage exception is thrown.
To extend the bytecode interpreter to handle
arrays and subscripts requires only the addition and implementation of two
opcodes:
- ARRAY
-
The ARRAY instruction takes the size of the array from top element of the
stack and replaces the top of the stack with a Value object initialized to contain an
array of the specified size.
- SUBSCRIPT
-
The SUBSCRIPT instruction pops the index from the top of the stack and
subscripts what is now the top of the stack. The subscript operator in
the Value class replaces the lvalue
which identifies the array with the lvalue of the array element. This
is true for both explicit arrays and for character strings.
Implementing these opcodes requires adding them to the opcode enumeration in opcodes.h, adding
ascii representations for them to the opcodeText[] array in bcidefs.cpp,
and, finally, adding two cases to the switch statement in the ScriptMethod::apply()
function. No other changes to the bytecode interpreter definitions are
required.
To extend the abstract syntax tree definitions
to handle arrays and subscripts requires only addition of two
new nodeTypes, typeArray and typeSubscript, to the
node type enumeration.
Corresponding structs, subnodesArray and subnodesSubscript,
have been added to the anonymous union of the
AstNode class to allow
access to the subnodes by name. The final change was to add appropriate
entries to the nSubnodes[]
array. It should be noted that no changes at all were required to the
AbstractSyntaxTree class.
The changes to the two abstract syntax tree interpreters, astxi and astci, are essentially
trivial.
The only changes to astxi are the addition of a typeArray
case to the switch statement in Interpreter::execute() and
the addition of a typeSubscript case to the switch statement in
Interpreter::evaluate(). Of course,
this interpreter also depends on the changes made to the common definitions,
the changes to the opcodes.h enumeration, and the changes to the abstract
syntax tree parser.
The only changes to astci are the addition of typeArray and
typeSubscript cases to the switch statement in Compiler::compile().Of course,
this interpreter also depends on the changes made to the common definitions,
the changes to bytecode interpreter, and the changes to the abstract
syntax tree parser.
|