CLL Syntax
cll.syn contains the syntax that describes the C-like Language used by one set of baseline interpreters in the Extensible Interpreter Development Kit. This syntax supports the basic elements of a script language: recognition of variable names and constants, evaluation of expressions, and execution of conditional and loop statements, all based on the familiar C syntax. These may fairly be regarded as the portions of C that have become more or less standard and are not the subject of any significant controversy. Omitted are declarations, pointers, subscripts, and switch statements.
The syntax also contains three additional elements. To illustrate changes in expression syntax, Fortran style exponentiation has been added. To aid in debugging scripts and to illustrate how to add new statements to the grammar, dump and print statements have been added.
The files dci.syn and ast.syn use this syntax unchanged except for the addition of reduction procedures. The file dxi.syn introduces a number of syntactic changes in order to facilitate computation. The changes do not, however, affect the language recognized by the syntax.
cll.syn is a convenient peg on which to hang a discussion of those features of the script language and the parsers that are common to all the interpreters in the development kit. Apart from the lexical section of the grammar, it lacks reduction procedures, since they differ from interpreter to interpreter. cll.syn is not itself used in any of the interpreters in the kit. It can be used by developers as a clean starting point for their own development.
This document describes the elements of cll.syn in the order in which they appear.
There are two header files that are required for all the parsers:
    if statement
     -> "if", '(', expression, ')', statement
     -> "if", '(', expression, ')', statement, "else", statement
This has the advantage of succinctness and clarity. It has the disadvantage of ambiguity. In the statement
    if (condition_one) if (condition_two) do_something(); else do_something_different();
does the else belong to the first or the second if? Although by convention the else belongs to the second if, that is, to the if physically closest to it, both interpretations are consistent with the above syntax. This problem is often referred to as the "dangling else problem", or the "if-else ambiguity".
There are two ways to deal with the problem: resort to some sort of trick to force the parser to select the correct interpretation, or rewrite the grammar to avoid the difficulty. Using AnaGram, it is possible to use a sticky directive to handle the problem. Generally, however, it is best not to resort to such tricks. The parsers in the development kit, therefore, have been written in such a manner as to avoid this ambiguity.
In order to support this technique, statements are classified as "open statements" or "closed statements", depending on whether or not they could be followed by an else clause.
There are, therefore, both open and closed if statements:
    open statement
     -> if condition, statement
     -> if condition, closed statement, "else", open statement
    closed statement
     -> if condition, closed statement, "else", closed statement
Notice that a simple if statement is always an open statement. If-else statements are open or closed depending on whether the statement following the else is open or closed.
The script language designer should note that the dangling else problem is a direct result of what is fundamentally unnecessary generality in the definition of C. If the definitions were
    if statement
     -> "if", '(', expression, ')', compound statement
     -> "if", '(', expression, ')', compound statement, "else", compound statement
there would be no problem. In other words, by allowing the user to avoid writing a pair of curly braces, substantial complication has been introduced into the language, as well as substantial opportunities for introducing bugs into programs. Certainly every programmer has seen the following problem:
    if (condition)
      do_something();
      do_something_else();
where the call to do_something_else() is not controlled by the if, in spite of the suggestive indentation. If the language had been defined to allow only compound statements to be controlled by if statements, this opportunity for error would not exist.
    open statement
     -> WHILE, '(', expression, ')', open statement
    closed statement
     -> WHILE, '(', expression, ')', closed statement
Note that a while statement is open or closed depending on whether the statement controlled by the while is open or closed.
    open statement
     -> FOR, '(', optional expression, ';', optional expression, ';', optional expression, ')', open statement
    closed statement
     -> FOR, '(', optional expression, ';', optional expression, ';', optional expression, ')', closed statement
Note that a for statement is open or closed depending on whether the statement controlled by the for is open or closed.
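The open/closed classification described above can be sketched as a small C function over a statement tree. This is only an illustration: the Stmt structure, the StmtKind names, and is_open() are hypothetical and do not come from the development kit, where the classification is expressed in the grammar itself rather than computed at run time.

```c
#include <stddef.h>

/* Hypothetical statement kinds, for illustration only. */
typedef enum { STMT_SIMPLE, STMT_IF, STMT_IF_ELSE, STMT_WHILE, STMT_FOR } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    const struct Stmt *body;       /* controlled statement (then-branch for ifs) */
    const struct Stmt *else_body;  /* used only by STMT_IF_ELSE */
} Stmt;

/* Returns 1 if the statement is "open", that is, if it could be
   followed by an else clause; returns 0 if it is "closed". */
int is_open(const Stmt *s) {
    switch (s->kind) {
    case STMT_IF:       /* a simple if is always open */
        return 1;
    case STMT_IF_ELSE:  /* open or closed according to the else branch */
        return is_open(s->else_body);
    case STMT_WHILE:    /* loops inherit the status of the controlled statement */
    case STMT_FOR:
        return is_open(s->body);
    default:            /* everything else is treated as closed */
        return 0;
    }
}
```

Under these assumptions, a while loop that controls a simple if is open, while an if-else whose else branch is an ordinary statement is closed. The grammar additionally requires the statement between the if condition and the else to be closed, a constraint this sketch does not check.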
The dump statement consists of the keyword "dump", followed by a comma-delimited list of variable names. All the interpreters implement this statement by printing the name of each listed variable and its value, each variable on a line by itself.
print statement -> "print", arg list
The expression syntax, following conventional usage, implements operator precedence using normal syntactic conventions. It does not use shortcuts such as operator precedence declarations.
The conventional method for writing expression syntax is to define a token for each level of precedence, beginning with the lowest level. At least two rules are written for each of these tokens. The first rule simply consists of the token that defines the next higher level of precedence. The other rules define the actual operators at the given precedence level. These latter rules are written as left or right recursive rules, depending on whether the relevant operator groups to the left or to the right.
Operators are said to "group to the left" or "group to the right" depending on the implicit parenthesization that is understood. Subtraction, for example, groups to the left. We interpret
    a - b - c
as
    (a - b) - c
It would contravene mathematical conventions to compute it as
    a - (b - c)
On the other hand, the logical or operator groups to the right:
    x || y || z
is fairly interpreted as
    x || (y || z)
From a mathematical point of view, associative operators may be grouped either to the left or the right. Nonassociative operators, to be consistent with conventional usage, must be grouped to the left.
Since grouping operators to the right requires right recursion and right recursion runs a remote risk of parser stack overflow, it is preferable to group operators to the left wherever it makes sense to do so.
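As a rough illustration of one token per precedence level, here is a hand-written C evaluator with two levels, additive and multiplicative, over integer constants. It is a sketch only: parsers built with AnaGram are generated from the grammar, not written this way, and the names eval_additive(), parse_additive(), and parse_multiplicative() are invented for the example. Each loop plays the part of a left recursive rule, so operators at the same level group to the left without deepening the call stack. Division by zero is not checked.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *pos;   /* current position in the input string */

static void skip_blanks(void) {
    while (isspace((unsigned char)*pos)) pos++;
}

/* Highest level in this sketch: an integer constant. */
static long parse_number(void) {
    char *end;
    long value = strtol(pos, &end, 10);
    pos = end;
    return value;
}

/* multiplicative level: operands are numbers; the loop groups to the left */
static long parse_multiplicative(void) {
    long value = parse_number();
    for (;;) {
        skip_blanks();
        if (*pos == '*') { pos++; value *= parse_number(); }
        else if (*pos == '/') { pos++; value /= parse_number(); }
        else return value;
    }
}

/* additive level: operands come from the next higher level */
static long parse_additive(void) {
    long value = parse_multiplicative();
    for (;;) {
        skip_blanks();
        if (*pos == '+') { pos++; value += parse_multiplicative(); }
        else if (*pos == '-') { pos++; value -= parse_multiplicative(); }
        else return value;
    }
}

long eval_additive(const char *text) {
    pos = text;
    return parse_additive();
}
```

With this sketch, eval_additive("10 - 4 - 3") computes (10 - 4) - 3, and eval_additive("2 + 3 * 4") multiplies before adding because the multiplicative level is reached from within the additive level.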
In the following, the tokens that define expression syntax are described in order of increasing precedence.
    expression
     -> assignment expression
     -> expression, ',', assignment expression
These rules define the comma operator as the lowest level of precedence in the expression syntax. Since expression is left recursive, items group to the left. The next higher level of precedence is specified by assignment expression.
Note that the expression to the left of the comma must be evaluated and its value discarded in order to allow for side effects. The proper functioning of the initialization, condition, and increment fields of the for statement depends on this property of the comma operator.
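The following C fragment shows the comma operator at work in the fields of a for statement, where each left operand is evaluated only for its side effect. The function names reverse_in_place() and comma_result() are made up for this illustration; the behavior shown is that of standard C, which the script language follows here.

```c
/* Reverses an array in place. Both the initialization field and the
   increment field rely on the comma operator to do two things at once. */
void reverse_in_place(int *items, int count) {
    int i, j;
    for (i = 0, j = count - 1; i < j; i++, j--) {
        int tmp = items[i];
        items[i] = items[j];
        items[j] = tmp;
    }
}

/* The left operand is evaluated and its value discarded;
   the value of the whole expression is that of the right operand. */
int comma_result(void) {
    int x;
    return (x = 5, x + 1);
}
```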
Note that if you need a comma delimited list of expressions, as in arg list and the print statement, you should use assignment expression rather than expression to avoid confusion with the comma operator.
    assignment expression
     -> conditional expression
     -> name, assignment op, assignment expression
    assignment op
     -> '='
     -> "+="
     -> "-="
     -> "*="
     -> "/="
     -> "%="
     -> "|="
     -> "&="
     -> "^="
     -> "<<="
     -> ">>="
Assignment operators group to the right, since it would not make sense for them to group to the left. They take precedence over the comma operator.
    conditional expression
     -> logical or expression
     -> logical or expression, '?', expression, ':', conditional expression
The conditional expression is a ternary operator that groups to the right. The conditional expression, as defined in C, presents an implementation challenge, since of the two expressions selected by the logical expression, only one is to be executed. If both were to be executed and the results of one simply discarded, there would be a problem with possible side effects created by the expression that was discarded.
The ? operator groups to the right and takes precedence over the comma operator.
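A short C demonstration may make both properties concrete: only the selected operand is evaluated, and a chain of conditionals groups to the right. The counters and function names are hypothetical, introduced only for this example.

```c
int then_evaluations = 0;   /* how often the "true" operand was evaluated */
int else_evaluations = 0;   /* how often the "false" operand was evaluated */

static int then_value(void) { then_evaluations++; return 10; }
static int else_value(void) { else_evaluations++; return 20; }

/* Only one of the two operands is evaluated on each call. */
int choose(int condition) {
    return condition ? then_value() : else_value();
}

/* Parsed as  is_negative ? -1 : (is_zero ? 0 : 1)
   because the conditional expression groups to the right. */
int classify(int is_negative, int is_zero) {
    return is_negative ? -1 : is_zero ? 0 : 1;
}
```

Had the operator grouped to the left, classify(1, 0) would test the value -1 as a condition and return 0 instead of -1.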
    logical or expression
     -> logical and expression
     -> logical and expression, "||", logical or expression
As with the conditional expression and the logical and expression, there is a problem with side effects, since the expression to the right of the operator is to be evaluated only if the expression to the left returns false.
The || operator groups to the right and takes precedence over ?.
    logical and expression
     -> inclusive or expression
     -> inclusive or expression, "&&", logical and expression
As with the conditional expression and the logical or expression, there is a problem with side effects, since the expression to the right of the operator is to be evaluated only if the expression to the left returns true.
The && operator groups to the right and takes precedence over ||.
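The conditional evaluation of the right operand can be observed in C with a counter. The names below are invented for the illustration; an interpreter for the script language would need to reproduce the same behavior.

```c
int right_evaluations = 0;   /* counts evaluations of the right operand */

static int right_operand(int value) {
    right_evaluations++;
    return value;
}

/* The right operand is evaluated only if the left operand is false. */
int or_demo(int left, int right) {
    return left || right_operand(right);
}

/* The right operand is evaluated only if the left operand is true. */
int and_demo(int left, int right) {
    return left && right_operand(right);
}
```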
    inclusive or expression
     -> exclusive or expression
     -> inclusive or expression, '|', exclusive or expression
The | operator groups to the left and takes precedence over &&.
    exclusive or expression
     -> and expression
     -> exclusive or expression, '^', and expression
The ^ operator groups to the left and takes precedence over |.
    and expression
     -> equality expression
     -> and expression, '&', equality expression
The & operator groups to the left and takes precedence over ^.
    equality expression
     -> relational expression
     -> equality expression, equality op, relational expression
    equality op
     -> "=="
     -> "!="
Equality operators group to the left and take precedence over &.
    relational expression
     -> shift expression
     -> relational expression, relational op, shift expression
    relational op
     -> '<'
     -> "<="
     -> '>'
     -> ">="
Relational operators group to the left and take precedence over equality operators.
    shift expression
     -> additive expression
     -> shift expression, shift op, additive expression
    shift op
     -> "<<"
     -> ">>"
Shift operators group to the left and take precedence over relational operators.
    additive expression
     -> multiplicative expression
     -> additive expression, additive op, multiplicative expression
    additive op
     -> '+'
     -> '-'
Note that additive operators group to the left and take precedence over shift operators.
    multiplicative expression
     -> unary expression
     -> multiplicative expression, multiplicative op, unary expression
    multiplicative op
     -> '*'
     -> '/'
     -> '%'
Multiplicative operators group to the left and take precedence over additive operators.
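Two consequences of these precedence levels regularly surprise programmers, and they can be checked in C, whose precedence levels the grammar mirrors. The function names are hypothetical. Many compilers warn about the unparenthesized forms, which is rather the point.

```c
/* Equality takes precedence over &, so this is  flags & (mask == expected). */
int mask_test_implicit(int flags, int mask, int expected) {
    return flags & mask == expected;
}

/* The same expression with the implicit grouping written out. */
int mask_test_grouped(int flags, int mask, int expected) {
    return flags & (mask == expected);
}

/* What the programmer usually intends. */
int mask_test_intended(int flags, int mask, int expected) {
    return (flags & mask) == expected;
}

/* Additive operators take precedence over shifts: 1 << (bits + 1). */
int shift_implicit(int bits) {
    return 1 << bits + 1;
}
```

For example, with flags 6, mask 2, and expected 2, the implicit form yields 6 & 1, which is 0, while the intended form yields 1.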
    unary expression
     -> factor
     -> '+', unary expression
     -> unary op, unary expression
    unary op
     -> '-'
     -> '!'
     -> '~'
The unary + is treated separately, since it does not normally require any computation.
Unary operators group to the right and take precedence over multiplicative operators.
    factor
     -> primary
     -> primary, "**", unary expression
The exponentiation operator has a distinctive operator precedence, in that its precedence is different to the right and to the left. It takes precedence over leading minus signs. For example,
    -x ** y
is construed as
    -(x ** y)
but at the same time, it is possible to write:
    x ** -y
without the need for parentheses around the right hand operand. These rules reflect normal mathematical conventions, though they are difficult to achieve with simple precedence grammars.
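The asymmetric precedence of ** falls directly out of the shape of the unary expression and factor rules, as a hand-written C sketch of those rules shows. This is an illustration under several assumptions, not the kit's implementation: the names eval_power_expr(), parse_unary(), parse_factor(), and parse_primary() are invented, primary is reduced to numbers and parenthesized unary expressions, and exponents are truncated to integers so that the example needs no math library.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *cursor;   /* current position in the input string */

static double parse_unary(void);

static void skip_spaces(void) {
    while (isspace((unsigned char)*cursor)) cursor++;
}

/* Integer power by repeated multiplication (assumption: integer exponents). */
static double power(double base, long exponent) {
    double result = 1.0;
    long n = exponent < 0 ? -exponent : exponent;
    while (n-- > 0) result *= base;
    return exponent < 0 ? 1.0 / result : result;
}

/* primary -> number | '(' unary expression ')'   (simplified) */
static double parse_primary(void) {
    skip_spaces();
    if (*cursor == '(') {
        cursor++;
        double value = parse_unary();
        skip_spaces();
        if (*cursor == ')') cursor++;
        return value;
    }
    char *end;
    double value = strtod(cursor, &end);
    cursor = end;
    return value;
}

/* factor -> primary | primary "**" unary expression */
static double parse_factor(void) {
    double base = parse_primary();
    skip_spaces();
    if (cursor[0] == '*' && cursor[1] == '*') {
        cursor += 2;
        return power(base, (long)parse_unary());
    }
    return base;
}

/* unary expression -> factor | '+' unary expression | '-' unary expression */
static double parse_unary(void) {
    skip_spaces();
    if (*cursor == '+') { cursor++; return parse_unary(); }
    if (*cursor == '-') { cursor++; return -parse_unary(); }
    return parse_factor();
}

double eval_power_expr(const char *text) {
    cursor = text;
    return parse_unary();
}
```

Because the left operand of ** is a primary, a leading minus sign cannot be part of it, so "-2 ** 2" negates the result of the exponentiation. Because the right operand is a unary expression, "2 ** -2" needs no parentheses, and "2 ** 3 ** 2" groups to the right.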
    unary expression
     -> primary
     -> '+', unary expression
     -> unary op, unary expression
    primary
     -> '(', expression, ')'
     -> constant
     -> lvalue
     -> "++", lvalue
     -> "--", lvalue
     -> lvalue, "++"
     -> lvalue, "--"
     -> function call
     -> '(', "long", ')', primary
     -> '(', "double", ')', primary
Because the development kit language does not support arbitrary casts, subscripts, or pointers, primary is somewhat different from the corresponding token definition in a standard C syntax.
Note that primary has simple right recursion for the two cast operators that are defined and the much more complex center recursion for parentheses. Otherwise, primary is the leaf node of the expression tree. Unlike C and C++, the auto-increment and auto-decrement operators are defined simply as operating directly on variables. This is possible because the development kit does not support pointers.
    function call
     -> name, '(', optional arg list, ')'
The function call definition is designed to allow function calls with a comma delimited argument list. The rules for the arg list token use the assignment expression token instead of expression since, in a calling sequence, comma is used to delimit arguments. If the grammar had been written only for use with parsers that generate an abstract syntax tree, it would have been possible to write the function call rule as:
    function call
     -> name, '(', optional expression, ')'
since tree-walk functions could separate out the individual arguments.
Character set expressions may be used directly as tokens in any rule. As an alternative, they may be named for convenience. In this grammar, four character sets are given explicit names.
Since the parser is configured to take its input from a string in memory, eof is defined simply as a null byte. If, on the other hand, the parser were intended to take stream input, one might write:
    eof = -1 + 0 + ^D + ^Z
Such a definition would allow for all of the conventional representations of end of file.
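For comparison, the same character set can be written as a C predicate. The function name is_end_of_file() is hypothetical; the numeric codes 4 and 26 are the ASCII values of control-D and control-Z.

```c
/* Returns nonzero if c is one of the conventional end of file markers:
   -1 (the usual value of EOF from stream input), a null byte,
   control-D (4), or control-Z (26). */
int is_end_of_file(int c) {
    return c == -1 || c == 0 || c == 4 || c == 26;
}
```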
The definitions of the name token and the white space token are self-explanatory.
The upshot of this is that what starts out looking as though it is an octal integer can suddenly turn into a floating point number. If this should happen, the octal conversion is wrong and needs to be redone. The makeDecimal() function recovers the original digits and reconverts them as decimal. The token hybrid integer is used to accumulate these constants.
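One way such a reconversion could work is sketched below in C. This is not the kit's makeDecimal() function, whose implementation is not shown here. The helper reconvert_as_decimal() is hypothetical, and it assumes that the value was accumulated in base 8 and that every original digit was in the range 0 to 7.

```c
/* Recovers the digits of a value that was accumulated as octal and
   reassembles them as a decimal number. For example, the text "017"
   accumulates to 15 as octal; its digits 1 and 7 reconvert to 17. */
long reconvert_as_decimal(long octal_value) {
    long decimal = 0;
    long place = 1;                              /* decimal weight of the current digit */
    while (octal_value > 0) {
        decimal += (octal_value % 8) * place;    /* lowest remaining octal digit */
        place *= 10;
        octal_value /= 8;
    }
    return decimal;
}
```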
Interpreter Development Kit
Copyright © 1997-2002, Parsifal Software.
All Rights Reserved.