AnaGram LALR Parser Generator:
New Features in AnaGram 2.01
-
AnaGram 2.01 incorporates several changes designed to make it
easier to write thread safe parsers.
First, the new reentrant
parser switch makes the AnaGram parse
engine reentrant by passing the parser control block as an argument
to all function calls. Without it, the parser control block becomes a
global resource, so that only one parse context can be in use at one
time.
Second, the extend pcb statement allows you to add your own
declarations to the parser control block, so that you can avoid
references to global or static variables in your reduction procedures.
Finally, the parsers generated by AnaGram 2.01 no longer use any
static or global variables to store temporary data. All working storage
is now kept on the stack or in the parser control block.
These are the steps to make a parser thread safe:
- Set the reentrant parser switch in your syntax file.
- Add one or more extend pcb statements to your syntax file
and include declarations for all the variables needed by your
reduction procedures. Update your reduction procedures
accordingly.
- If your parser will modify any variable which is not in the
parser control block, make sure that variable is protected by
a mutex, or otherwise synchronized properly.
- To run the parser, declare an instance of the parser control
block on the stack, initialize your fields in the parser control
block as appropriate, lock any relevant mutexes, and then call
the parser function with a pointer to the parser control block
as the argument.
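The steps above can be sketched roughly as follows. This is a minimal illustration, not generated AnaGram code: demo_pcb_type, demo_parse, count_numbers, total_numbers and total_mutex are all hypothetical names invented for the sketch, and the hand-written demo_parse merely stands in for a generated reentrant parse function. The number_count field plays the role of a declaration added with an extend pcb statement.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for a generated parser control block. In a real
// project this type comes from the header file that AnaGram generates.
struct demo_pcb_type {
    const char *pointer;  // stand-in for the parser's input pointer
    int exit_flag;        // stand-in for the parser's status field
    // A field of the kind an "extend pcb" statement could append:
    int number_count;     // per-parse result kept in the control block
};

// State outside the control block is shared, so it is guarded by a mutex.
static std::mutex total_mutex;
static long total_numbers = 0;

// Stand-in for a reentrant parse function: every piece of state is reached
// through the pcb argument. It simply counts runs of decimal digits.
void demo_parse(demo_pcb_type *pcb) {
    bool in_number = false;
    for (; *pcb->pointer != '\0'; ++pcb->pointer) {
        const bool digit = (*pcb->pointer >= '0' && *pcb->pointer <= '9');
        if (digit && !in_number) ++pcb->number_count;
        in_number = digit;
    }
    pcb->exit_flag = 0;
}

// One complete parse: control block on the stack, own fields initialized,
// parser called with a pointer to the block, shared state updated under lock.
int count_numbers(const std::string &text) {
    demo_pcb_type pcb;
    pcb.pointer = text.c_str();
    pcb.exit_flag = -1;
    pcb.number_count = 0;

    demo_parse(&pcb);
    assert(pcb.exit_flag == 0);

    {
        std::lock_guard<std::mutex> lock(total_mutex);
        total_numbers += pcb.number_count;
    }
    return pcb.number_count;
}
```

Because count_numbers touches nothing outside its own control block except the mutex-protected total, it could be called from several threads at once. The sketch holds the lock only while updating the shared total; holding it for the whole parse, as the last step describes, is the simpler choice when reduction procedures themselves modify shared variables.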
-
In previous versions of AnaGram it was not possible to return class
instances (rather than pointers to them) from reduction procedures except
under limited circumstances. This is because AnaGram generates code that
stores objects on the parser value stack simply by casting the stack pointer
and assigning the value. This approach is correct for all traditional data
types, but leads to unpredictable behavior for a class that has supplied its
own assignment operator. Overloaded assignment operators depend on the
destination being a valid instance of the class. With the traditional AnaGram
parser value stack, however, this is not normally the case.
Since there are many classes, such as string classes, which require
their own implementation of the assignment operator, the restriction
on returning class instances has often made reduction procedures
unnecessarily complex.
AnaGram 2.01 now has a wrapper statement which can be used to
overcome this problem. For each class specified in a wrapper
statement, AnaGram generates a wrapper class that transparently
solves the problem. The stacked object is created using the copy
constructor. The reduction procedure is called with a reference to the
stacked object rather than a copy. Wrapped objects are removed after
the reduction procedure that uses them returns.
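The idea behind a wrapper can be illustrated with a small sketch. It is not the code AnaGram generates: Tracked, Slot, push_wrapped, pop_wrapped and demo_wrapped_slot are hypothetical names chosen for the example. The sketch shows an object being copy constructed into raw stack storage, used through a pointer, and then destroyed explicitly, which is the life cycle described above.

```cpp
#include <cassert>
#include <new>

// Illustrative class that counts live instances, much as a reference-counted
// string class has to keep track of its buffers.
class Tracked {
public:
    static int live;
    explicit Tracked(int v) : value(v) { ++live; }
    Tracked(const Tracked &other) : value(other.value) { ++live; }
    // Assignment is only meaningful when *this is already a valid object.
    Tracked &operator=(const Tracked &other) { value = other.value; return *this; }
    ~Tracked() { --live; }
    int value;
};
int Tracked::live = 0;

// Raw, suitably aligned bytes standing in for one slot of a value stack.
struct Slot {
    alignas(Tracked) unsigned char bytes[sizeof(Tracked)];
};

// Store: copy construct into the raw storage instead of assigning to it.
Tracked *push_wrapped(Slot &slot, const Tracked &v) {
    return new (slot.bytes) Tracked(v);
}

// Remove: run the destructor explicitly; the storage itself is not freed.
void pop_wrapped(Tracked *stacked) {
    stacked->~Tracked();
}

int demo_wrapped_slot() {
    Slot slot;                       // uninitialized storage, not yet an object
    int seen = 0;
    {
        Tracked result(42);          // value produced by a "reduction procedure"
        Tracked *stacked = push_wrapped(slot, result);
        assert(Tracked::live == 2);  // the original plus the stacked copy
        seen = stacked->value;       // a later procedure sees the stacked object
        pop_wrapped(stacked);
        assert(Tracked::live == 1);  // the stacked copy has been destroyed
    }
    return seen;
}
```

Assigning through a cast pointer to slot.bytes would instead invoke operator= on memory that has never been constructed, which is the unpredictable behavior the text describes.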
-
Error Diagnostic Support
The error diagnostics created by the diagnose errors switch have
been revised so that their text is defined by macros which the user
can replace. There are three macros involved:
- MISSING_FORMAT. The default definition of this macro is
"Missing %s". It is used when the parser expects a unique
input token, the name of the token exists in the token names
table, and the token is not found in the input.
- UNEXPECTED_FORMAT. The default definition of this macro is
"Unexpected %s". It is used when there is more than one
possible input token, but the token found is not one of
those expected.
- UNNAMED_TOKEN. The default definition is "input". It is
used in place of a token name in UNEXPECTED_FORMAT when the
actual input encountered cannot be identified as a token.
Note that if diagnose errors is ON, AnaGram automatically
includes in your generated parser the array of ASCII strings specified by the
TOKEN_NAMES macro, which is useful in creating diagnostics. The default
name of this array is <parser name>_token_names.
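As a rough illustration of how replacement definitions for these macros might be put to use, consider the sketch below. Only the three macro names come from AnaGram; the replacement strings, the hand-written demo_token_names table, and the helper functions are hypothetical, and the way a generated parser actually selects and formats its messages may differ from this guess.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Replacement text for the three macros (example wording only).
#define MISSING_FORMAT "Expected %s"
#define UNEXPECTED_FORMAT "Did not expect %s"
#define UNNAMED_TOKEN "this input"

// Hypothetical token names table, indexed by token number. An empty string
// means the token has no name.
static const char *const demo_token_names[] = {"", "identifier", "';'", ""};
static const std::size_t demo_token_count =
    sizeof demo_token_names / sizeof demo_token_names[0];

// True when the token number is in range and has a non-empty name.
bool has_name(std::size_t token) {
    return token < demo_token_count && demo_token_names[token][0] != '\0';
}

// Message for a unique expected token that was not found. Returns an empty
// string when the token has no name, since no such message applies then.
std::string missing_message(std::size_t expected_token) {
    if (!has_name(expected_token)) return std::string();
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, MISSING_FORMAT,
                  demo_token_names[expected_token]);
    return buffer;
}

// Message for a token that was found but not expected. Falls back to
// UNNAMED_TOKEN when the input cannot be named.
std::string unexpected_message(std::size_t found_token) {
    const char *name =
        has_name(found_token) ? demo_token_names[found_token] : UNNAMED_TOKEN;
    char buffer[128];
    std::snprintf(buffer, sizeof buffer, UNEXPECTED_FORMAT, name);
    return buffer;
}

void usage_example() {
    assert(missing_message(2) == "Expected ';'");
    assert(unexpected_message(3) == "Did not expect this input");
}
```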
-
New Attribute Statements
- extend pcb
-
The extend pcb statement is an attribute statement that allows you to
add declarations of your own to the parser control block. With this
feature, data needed by reduction procedures can be stored in the
parser control block rather than in global or static storage. This
capability greatly facilitates the construction of thread safe parsers.
The extend pcb statement may be used in any configuration section.
The format is as follows:
extend pcb { <C or C++ declaration>... }
It may, of course, extend over multiple lines and may contain any number
of C or C++ declarations of any kind. AnaGram will append it to the end of
the parser control block definition in the generated parser header file.
There may be any number of extend pcb statements. The extensions are
appended to the parser control block definition in the order in which they
occur in the syntax file.
The extend pcb statement is compatible with both C and C++ parsers.
Note that even if you are deriving your own class from the parser
control block, you might want to use extend pcb to provide virtual
function definitions or other declarations appropriate to a base class.
- wrapper
-
The wrapper attribute statement provides correct handling of C++
objects returned by value from reduction procedures.
If you specify a wrapper for a C++ object, when a reduction
procedure returns an instance of the object, a copy of the object will
be constructed on the parser value stack and the destructor will be
called when that object is removed from the stack.
Without a wrapper, objects are stored on the value stack simply by
coercing the stack pointer to the appropriate type. There is no
constructor call when the object is stored nor a destructor call when
it is removed from the stack.
Classes which use reference counts or otherwise overload the
assignment operator should always have wrappers in order to
function correctly.
Wrapper statements, like other attribute statements, must appear in
configuration sections. The syntax is:
wrapper {<comma delimited list of data types>}
For example:
[
wrapper {CString, CFont}
]
You cannot specify a wrapper for the default token type.
If your parser uses AnaGram wrappers and exits with an error condition, there
may be objects remaining on the parser value stack. If you have no further use for
these objects, you should call the DELETE_WRAPPERS macro on error exit
so that they will be properly deleted, thus avoiding a memory leak. If you
have enabled auto resynch, DELETE_WRAPPERS will be invoked automatically.
-
Changed Configuration Parameters
- Parser stack alignment
-
Parser stack alignment now defaults to long instead of int. With
this default, AnaGram parsers will compile and run on 64-bit
processors with no further attention. Users who are building parsers
for embedded systems or other uses where memory is limited may
want to override this default value with their own specification.
- Parser stack size
-
Parser stack size now defaults to 128 instead of 32. AnaGram
adjusts the parser stack size upwards, if necessary, depending on the
grammar. If your grammar uses only left recursive constructs, you
will never have a problem with parser stack overflow. If there is
center recursion or right recursion in your grammar, however, there
always exists syntactically correct input which can cause stack
overflow no matter how large the stack. Be sure that the parser stack
size is large enough to handle all reasonable cases.
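The difference between left and right recursion can be seen in a small simulation. The sketch below is a deliberately simplified model, not AnaGram's parse engine: it tracks only a stack of grammar symbols for a list of n items, and the function names are hypothetical. With the left recursive rule each item is reduced as soon as it is shifted; with the right recursive rule nothing can be reduced until the last item has been seen.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum Symbol { ITEM, LIST };

// Grammar:  list -> item | list item   (left recursive)
std::size_t max_depth_left_recursive(std::size_t n_items) {
    std::vector<Symbol> stack;
    std::size_t deepest = 0;
    for (std::size_t i = 0; i < n_items; ++i) {
        stack.push_back(ITEM);                  // shift item
        deepest = std::max(deepest, stack.size());
        if (stack.size() == 1) {
            stack.back() = LIST;                // reduce: list -> item
        } else {
            stack.pop_back();                   // reduce: list -> list item
            stack.back() = LIST;
        }
    }
    return deepest;
}

// Grammar:  list -> item | item list   (right recursive)
std::size_t max_depth_right_recursive(std::size_t n_items) {
    std::vector<Symbol> stack;
    std::size_t deepest = 0;
    for (std::size_t i = 0; i < n_items; ++i) {
        stack.push_back(ITEM);                  // shift; no reduction possible yet
        deepest = std::max(deepest, stack.size());
    }
    if (!stack.empty()) stack.back() = LIST;    // reduce: list -> item
    while (stack.size() > 1) {
        stack.pop_back();                       // reduce: list -> item list
        stack.back() = LIST;
    }
    return deepest;
}

void usage_example() {
    assert(max_depth_left_recursive(1000) == 2);      // bounded
    assert(max_depth_right_recursive(1000) == 1000);  // grows with the input
}
```

In this model the left recursive list never needs more than two stack entries, while the right recursive list needs one entry per item, so any fixed stack size can be exceeded by a long enough input.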
- Token names
-
Token names defaults to OFF. If it is set, AnaGram generates a
static array of character strings, indexed by token number, to provide
ASCII representations of token names for use in error diagnostics.
The array contains strings for all grammar tokens which have been
explicitly named in the syntax file as well as tokens which represent
keywords or single character constants.
Prior to version 2.01 of AnaGram, the array contained strings
for explicitly named tokens only. If this restriction is required, set the
token names only switch.
-
New Configuration Parameters
- iso latin 1
-
The iso latin 1 configuration switch defaults to ON. It controls case
conversion on input characters when the case sensitive switch is set
to OFF. When iso latin 1 is set, the default CONVERT_CASE macro
is defined to convert correctly all characters in the latin 1 character
set.
When the iso latin 1 switch is off, only characters in the ASCII range
(0-127) are converted.
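The effect of the switch can be pictured with two conversion functions. This is only a sketch: the function names are hypothetical, the choice of converting to upper case is an assumption made for the example, and the actual definition of CONVERT_CASE in a generated parser is not reproduced here. The character codes are those of ISO 8859-1.

```cpp
#include <cassert>

// ASCII only: letters a-z are converted, everything else passes through.
unsigned char to_upper_ascii(unsigned char c) {
    if (c >= 'a' && c <= 'z') return static_cast<unsigned char>(c - 0x20);
    return c;
}

// Latin 1: also converts the accented lower case letters 0xE0-0xFE.
// 0xF7 is the division sign, not a letter. 0xDF (sharp s) and 0xFF
// (y with diaeresis) have no upper case form within Latin 1.
unsigned char to_upper_latin1(unsigned char c) {
    if (c >= 'a' && c <= 'z') return static_cast<unsigned char>(c - 0x20);
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
        return static_cast<unsigned char>(c - 0x20);
    return c;
}

void usage_example() {
    assert(to_upper_latin1(0xE9) == 0xC9);  // e acute -> E acute
    assert(to_upper_ascii(0xE9) == 0xE9);   // left alone outside ASCII
}
```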
- reentrant parser
-
The reentrant parser configuration switch defaults to OFF. If you
turn it on, AnaGram will generate code that passes a pointer to the
parser control block as an argument to each function, so that no
function has to use a static reference to find the control block.
AnaGram declares the parser control block argument using the macro PCB_TYPE. For example:
static void ag_ra(PCB_TYPE *pcb_pointer)
AnaGram will define PCB_TYPE as the type of the parser control block if you
do not define it otherwise. If you are using C++, and derive a class from the
parser control block, you can override the definition of PCB_TYPE in order to
make your derived class accessible from your reduction procedures.
The reentrant parser switch cannot be used in conjunction with the
old style switch.
When you have enabled the reentrant parser switch, the parse
function, the initializer function, and the parser value function are all
defined to take a pointer to the parser control block as their sole
argument.
- token names only
-
token names only defaults to OFF. This configuration
switch was added to AnaGram 2.01 to provide the functionality previously
provided by the token names switch. When token names
only is ON, only tokens which have been given explicit names in the
syntax file have non-empty strings in the generated list of character strings.
Token names only takes precedence over the token
names switch.
- no cr
-
The no cr configuration switch is provided for developers
who intend to use the generated parser on a Unix system. When
no cr is set, AnaGram writes the generated parser and header
files without carriage returns. The switch defaults to OFF, to
maintain compatibility with Windows systems.