Parsifal Software


XIDEK
Interpreter Development Kit
Reference documentation


dci: Direct Compilation Interpreters

Introduction

The baseline "Direct Compilation" interpreters, found in cll\base\dci.syn and pll\base\dci.syn, compile the statements in a script into bytecode as they parse the script, without creating an intermediate representation of the script. This approach is facilitated by the CodeFragment class, which provides methods for creating small fragments of code and then combining them into larger fragments, and finally into one fragment that corresponds to the entire script.

The interpreter interface function, interpret(), is implemented by creating an instance of the ScriptMethod class. The class constructor invokes the parser to compile the script into bytecode. The apply() function of ScriptMethod is then used to execute the bytecode. ScriptMethod also has a list() function to create a listing of the bytecode. A call to list() is provided in interpret(), but it has been commented out for testing purposes.

In what follows, we first describe the features of the CLL version of the dci.syn syntax file and how it differs from the cll.syn file. Then we describe the differences required for the PLL version.


C Prologue

Introduction

The C Prologue is the initial piece of embedded C/C++ code in a syntax file. AnaGram copies it to the parser file before writing any generated code, so it is the appropriate place to include definitions and declarations needed by the parser. In particular, any data type that is used as a token type must be defined in the C prologue, either explicitly or by means of an include file.

Include File

The bcidefs.h header file contains definitions of the ScriptMethod and CodeFragment classes.

Configuration Section

Configuration parameters that are common to all of the parsers in the kit are discussed in the descriptions of the CLL and PLL syntaxes. The following additional configuration parameters are set in the CLL version of dci.syn:
wrapper {CodeFragment, AgStack<CodeFragment>, Constant}
A wrapper declaration tells AnaGram to ensure that constructors and destructors are properly called when instances of the named classes are stored on the parser value stack. If the wrapper declaration were not used, the objects would be stored on the stack by coercing a pointer. This would cause the classes to malfunction. The rule of thumb is that if a class or any of its member fields overrides the assignment operator, it should have a wrapper declared.

parser name = dci
This statement causes the generated parser function to be named dci. The struct defining the parser control block will be named dci_pcb_struct and a typedef dci_pcb_type is also defined to be equivalent to struct dci_pcb_struct.
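
The reason a wrapper matters can be illustrated independently of AnaGram. The sketch below is not the code AnaGram generates; Slot and Tracked are hypothetical names invented for this example. It shows a stack slot that explicitly constructs and destroys the object it holds, so that a class with its own copy semantics keeps working when stored in raw storage.

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical value-stack slot: raw storage in which an object is
// explicitly constructed and later explicitly destroyed.
template <class T>
class Slot {
public:
  Slot() = default;
  Slot(const Slot &) = delete;
  Slot &operator=(const Slot &) = delete;
  ~Slot() { clear(); }

  // Copy-construct value into the slot, destroying any previous occupant.
  void store(const T &value) {
    clear();
    object = new (storage) T(value);
  }

  // Run the destructor of the stored object, if there is one.
  void clear() {
    if (object) {
      object->~T();
      object = nullptr;
    }
  }

  // Access the stored object; the slot must not be empty.
  T &get() {
    assert(object);
    return *object;
  }

private:
  alignas(T) unsigned char storage[sizeof(T)];
  T *object = nullptr;
};

// Hypothetical test class that counts how many instances are alive.
struct Tracked {
  static int alive;
  std::string text;
  explicit Tracked(std::string t) : text(std::move(t)) { ++alive; }
  Tracked(const Tracked &other) : text(other.text) { ++alive; }
  ~Tracked() { --alive; }
};
int Tracked::alive = 0;
```

Because store() and clear() go through the copy constructor and the destructor, the instance count in Tracked stays balanced. A slot that merely copied bytes would bypass both.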


Parser Control Block Extensions

In order to support reentrancy, it is convenient to declare local data used by the parser in the parser control block. It is also convenient to declare functions used by the parser as members of the parser control block, though this is not, strictly speaking, necessary.


Added Fields

AgDictionary<AgString> &dictionary;
This is a reference to the dictionary the parser is to use. Note that the dictionary must exist independently of the parser. It is initialized by the constructor.

AgDictionary<Constant> constants;
This dictionary is used to identify the constant values encountered in the course of parsing and compiling a script into bytecode. Constants are referenced in the bytecode by their dictionary index. Use of a dictionary means that no matter how many times a particular constant appears in the script, there is only one instance of it in the dictionary. This can be quite significant when using only 8-bit bytecodes.

int loopDepth;
This counter is used to track the depth of nesting of loops. It is initialized to zero by the constructor and incremented when the beginning of a loop is encountered. It is decremented when the end of the loop is reached. The loopDepth counter is used when parsing break and continue statements. If it is zero, there is no active loop, so a break or continue is an error and an exception is thrown.
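
The behavior of the constants dictionary and the loopDepth counter can be sketched as follows. This is a minimal illustration, not the kit's code: ParserState, constantId(), beginLoop() and endLoop() are hypothetical names, double stands in for the Constant class, and std::runtime_error stands in for whatever exception type the kit actually throws.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the parser-local state described above.
struct ParserState {
  std::vector<double> constants;                // one slot per distinct constant
  std::map<double, std::size_t> constantIndex;  // value -> slot index
  int loopDepth = 0;                            // current loop nesting depth

  // Return the index of value, entering it only if it is new.
  std::size_t constantId(double value) {
    auto found = constantIndex.find(value);
    if (found != constantIndex.end()) return found->second;
    std::size_t id = constants.size();
    constants.push_back(value);
    constantIndex[value] = id;
    return id;
  }

  void beginLoop() { ++loopDepth; }

  void endLoop() {
    assert(loopDepth > 0);
    --loopDepth;
  }

  // Reject break and continue when no loop is active.
  void checkLoop() const {
    if (loopDepth == 0)
      throw std::runtime_error("break or continue outside of a loop");
  }
};
```

If an operand is a single byte, it can address at most 256 entries, which is one reason why storing each distinct constant only once can matter.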


Local Functions

The following functions are declared to be member functions of the parser control block. The actual implementations of these functions are found in the embedded C portion of the syntax file.


Reduction Procedures

The useful work of any parser is carried out by the reduction procedures which indicate what is to be done when a grammar rule is matched. In the case of dci(), there are two types of reduction procedures: The first type creates a fragment of code, encapsulated in a CodeFragment object, from scratch. These procedures invoke one of the three code() functions to create a single bytecode instruction, consisting of an opcode and possibly an operand. The second type of reduction procedure, presented with one or more CodeFragment objects and a desired operation, stitches them together by inserting or appending the appropriate instructions. In all these cases, the reduction procedure should be reasonably self-explanatory.
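
The two kinds of reduction procedure can be sketched with a stand-in fragment class. Fragment, its cat() method, the opcode names and compileAssignSum() are all hypothetical and chosen only for illustration; the real CodeFragment interface is defined in bcidefs.h and may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcodes, chosen only for illustration.
enum Opcode : std::uint8_t { PUSH_VAR, PUSH_CONST, ADD, STORE };

// Hypothetical stand-in for a code fragment: a growable run of bytecode.
class Fragment {
public:
  Fragment() = default;

  // First kind: build a one-instruction fragment from scratch.
  Fragment(Opcode op) { bytes.push_back(op); }
  Fragment(Opcode op, std::uint8_t operand) {
    bytes.push_back(op);
    bytes.push_back(operand);
  }

  // Second kind: stitch another fragment onto the end of this one.
  Fragment &cat(const Fragment &other) {
    assert(this != &other);  // appending a fragment to itself is not supported
    bytes.insert(bytes.end(), other.bytes.begin(), other.bytes.end());
    return *this;
  }

  const std::vector<std::uint8_t> &code() const { return bytes; }

private:
  std::vector<std::uint8_t> bytes;
};

// Compile "x = y + c", given the dictionary indices of x and y and the
// constants-dictionary index of c.
Fragment compileAssignSum(std::uint8_t x, std::uint8_t y, std::uint8_t c) {
  Fragment left(PUSH_VAR, y);
  Fragment right(PUSH_CONST, c);
  Fragment sum = left.cat(right).cat(Fragment(ADD));
  return sum.cat(Fragment(STORE, x));
}
```

Each grammar rule only has to combine the fragments of its components, so no tree representation of the script is ever built.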


Embedded C

Macro Definitions

SYNTAX_ERROR
This definition overrides the default definition of SYNTAX_ERROR that AnaGram provides. The default definition simply writes the diagnostic to stderr and causes the parser to return with the exit_flag field of the parser control block set to AG_SYNTAX_ERROR_CODE. The overriding definition calls reportError() which formats an error message and throws an exception.

GET_CONTEXT
The GET_CONTEXT macro implements the context tracking feature of the AnaGram parser. In particular, it creates a ParserContext object that describes the current location in the input file and stores it on the context stack.
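
The idea behind overriding SYNTAX_ERROR can be sketched in isolation. The DemoPcb structure, its field names and the DEMO_SYNTAX_ERROR macro below are assumptions made for this example; they are not the definitions found in dci.syn or generated by AnaGram.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a parser control block; the field names are
// assumptions made for this sketch.
struct DemoPcb {
  const char *error_message;
  int line;
  int column;

  // Format the diagnostic and throw instead of printing to stderr.
  void reportError() const {
    assert(error_message != nullptr);
    throw std::runtime_error(std::string(error_message) + ", line " +
                             std::to_string(line) + ", column " +
                             std::to_string(column));
  }
};

// Route the error hook to the throwing member function. A macro of this
// kind has to be defined before the code that expands it.
#define DEMO_SYNTAX_ERROR(pcb) (pcb).reportError()
```

Throwing an exception lets the caller decide how to report the error, rather than having the parser write to stderr itself.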


Parser Control Block Member Functions

The following member functions of the parser control block, dci_pcb_struct, are declared in the extensions to the parser control block.
dci_pcb_struct( AgDictionary<AgString> &d, char *text);
This constructor for the parser control block sets the dictionary reference, initializes the pointer to the text of the script, and initializes loopDepth and argCount to zero.

void reportError();
The parsing engine calls this function (as directed by the SYNTAX_ERROR macro) when it encounters a syntax error. The function formats the error information and throws an exception.

void reportError(const char *msg);
This function is available to add context information to an error message and then throw an exception. It is used by the interpret() function to handle exceptions thrown during compilation by methods belonging to the CodeFragment class.

void checkLoop();
The checkLoop() function is used to verify that break and continue statements are inside loops. If no loop is active an exception is thrown.

int idName(const AgString &name);
The parameter is the name of a variable. The function ensures that the name is in the dictionary and returns its index.

CodeFragment code(Opcode, const AgString &name);
This function generates a code fragment consisting of the specified opcode followed by the index of the specified variable name in the dictionary. This function is used to generate code for primary expressions.

CodeFragment code(Opcode, const Constant &);
The argument is entered into the constants dictionary. The function then generates a code fragment consisting of the specified opcode followed by the index of the argument in the dictionary.

CodeFragment codeCall( const AgString &name, AgStack<CodeFragment> &args);
The AgString object is the name of the function to be called. The code fragments on the stack consist of the code necessary to evaluate and stack each argument to the function. codeCall begins by concatenating the code fragments that evaluate the arguments. It then looks up the function name in the function table, relying on the size of the stack for the number of arguments, and appends a CALL instruction to the concatenated argument code, which it returns.
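
The steps performed by codeCall can be sketched as follows. This is an illustration under stated assumptions, not the kit's implementation: the function table layout, the CALL opcode value, the one-byte operand, and the check of the argument count against the table entry are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

const std::uint8_t CALL = 0x20;  // hypothetical opcode value

// Hypothetical function table entry: the operand for CALL and the
// number of arguments the function expects.
struct FunctionEntry {
  std::uint8_t index;
  std::size_t argCount;
};

Bytes codeCallSketch(const std::map<std::string, FunctionEntry> &table,
                     const std::string &name,
                     const std::vector<Bytes> &args) {
  assert(!name.empty());

  // Concatenate the code that evaluates and stacks each argument.
  Bytes result;
  for (const Bytes &arg : args)
    result.insert(result.end(), arg.begin(), arg.end());

  // Look up the function, using the number of fragments as the
  // number of arguments supplied.
  auto entry = table.find(name);
  if (entry == table.end())
    throw std::runtime_error("unknown function: " + name);
  if (entry->second.argCount != args.size())
    throw std::runtime_error("wrong number of arguments for " + name);

  // Append the CALL instruction and return the combined fragment.
  result.push_back(CALL);
  result.push_back(entry->second.index);
  return result;
}
```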


ScriptMethod::ScriptMethod(const char *text, AgDictionary<AgString> &d);

The constructor for the ScriptMethod is implemented in two stages. In the first stage, a parser control block is created, and the dci parser is invoked to compile bytecode. A try-catch block is used to handle errors that occur during compilation. Then the return value of the parse, a CodeFragment object, is retrieved and the bytecode extracted and stored in the ScriptMethod object. Finally, the constants are extracted from the parser control block and stored in the constantList array.


The External Interface: interpret()

The external interface for the interpreter begins by constructing a ScriptMethod object. The constructor actually parses the script and compiles the bytecode. interpret() then invokes the ScriptMethod::apply() function to execute the bytecode. A call to the ScriptMethod::list() function, commented out for regression testing, may be used to create a listing of the generated bytecode.
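
The compile-in-the-constructor, then apply pattern can be shown in miniature. Everything in this sketch is hypothetical: MiniMethod is not ScriptMethod, its "script" language is only decimal integers joined by '+', and the hand-written scanner stands in for the generated dci parser. Unlike the constants dictionary described earlier, the constant list here does not remove duplicates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical miniature of the compile-then-apply pattern.
class MiniMethod {
public:
  enum Op : std::uint8_t { PUSH_CONST, ADD };

  // Compile the text into bytecode and a constant list.
  explicit MiniMethod(const std::string &text) {
    std::size_t pos = 0;
    parseNumber(text, pos);
    while (pos < text.size()) {
      if (text[pos] != '+') fail(pos);
      ++pos;
      parseNumber(text, pos);
      bytecode.push_back(ADD);
    }
  }

  // Execute the bytecode on a small operand stack.
  long apply() const {
    std::vector<long> stack;
    for (std::size_t pc = 0; pc < bytecode.size(); ++pc) {
      if (bytecode[pc] == PUSH_CONST) {
        stack.push_back(constantList[bytecode[++pc]]);
      } else {  // ADD
        long right = stack.back();
        stack.pop_back();
        stack.back() += right;
      }
    }
    assert(!stack.empty());
    return stack.back();
  }

  // Produce a readable listing of the bytecode.
  std::string list() const {
    std::ostringstream out;
    for (std::size_t pc = 0; pc < bytecode.size(); ++pc) {
      if (bytecode[pc] == PUSH_CONST)
        out << "PUSH_CONST " << constantList[bytecode[++pc]] << '\n';
      else
        out << "ADD\n";
    }
    return out.str();
  }

private:
  static void fail(std::size_t pos) {
    throw std::runtime_error("syntax error at column " +
                             std::to_string(pos + 1));
  }

  // Scan one integer and emit PUSH_CONST with its constant index.
  void parseNumber(const std::string &text, std::size_t &pos) {
    std::size_t start = pos;
    long value = 0;
    while (pos < text.size() && text[pos] >= '0' && text[pos] <= '9')
      value = value * 10 + (text[pos++] - '0');
    if (pos == start) fail(pos);
    if (constantList.size() >= 256)
      throw std::runtime_error("too many constants for a one-byte operand");
    bytecode.push_back(PUSH_CONST);
    bytecode.push_back(static_cast<std::uint8_t>(constantList.size()));
    constantList.push_back(value);
  }

  std::vector<std::uint8_t> bytecode;
  std::vector<long> constantList;
};

// Hypothetical interface function in the spirit of interpret().
long interpretSketch(const std::string &text) {
  MiniMethod method(text);  // parse and compile
  // std::string listing = method.list();  // optional bytecode listing
  return method.apply();    // execute
}
```

Keeping compilation in the constructor means that a successfully constructed object always holds complete bytecode, so apply() and list() need no error handling for malformed scripts.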


PLL Differences

The differences between the PLL syntax and the PLL version of the dci syntax are essentially identical to those described above for CLL, with one exception: because the PLL language does not implement break and continue statements, the loopDepth counter and the checkLoop() function are not needed.




Interpreter Development Kit
Copyright © 1997-2002, Parsifal Software.
All Rights Reserved.