AnaGram Parser Generator: New features in version 2.01

AnaGram LALR Parser Generator:
New Features in AnaGram 2.01

Thread Safe Parsers

AnaGram 2.01 incorporates several changes designed to make it easier to write thread safe parsers.

First, the new reentrant parser switch makes the AnaGram parse engine reentrant by passing the parser control block as an argument to all function calls. Without the switch, the parser control block is a global resource, so only one parse context can be in use at a time.

Second, the extend pcb statement allows you to add your own declarations to the parser control block, so that you can avoid references to global or static variables in your reduction procedures.

Finally, the parsers generated by AnaGram 2.01 no longer use any static or global variables to store temporary data. All working storage is now kept on the stack or in the parser control block.

These are the steps to make a parser thread safe:
- Set the reentrant parser switch in your syntax file.
- Add one or more extend pcb statements to your syntax file and include declarations for all the variables needed by your reduction procedures. Update your reduction procedures accordingly.
- If your parser will modify any variable which is not in the parser control block, make sure that variable is protected by a mutex, or otherwise synchronized properly.
- To run the parser, declare an instance of the parser control block on the stack, initialize your fields in the parser control block as appropriate, lock any relevant mutexes, and then call the parser function with a pointer to the parser control block as the argument.
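
The steps above can be sketched as follows. This is a minimal illustration, not generated AnaGram code: `worker_pcb_type`, `worker_parse`, and the `input` and `total` fields are hypothetical stand-ins for a generated parser control block, its reentrant parse function, and fields added with extend pcb. The sketch takes the lock inside the parse function rather than before the call, which is one possible arrangement.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a generated parser control block.
// The last two fields model what an extend pcb statement would append.
struct worker_pcb_type {
    int exit_flag = 0;            // stand-in for engine state
    const char *input = nullptr;  // user field: text to parse
    long total = 0;               // user field: per-parse result
};

// Shared by all threads, so it is guarded by a mutex (third step).
static std::mutex g_stats_mutex;
static long g_parses_completed = 0;

// Stand-in for a reentrant parse function: its sole argument is a
// pointer to the control block. Here it only sums decimal digits.
void worker_parse(worker_pcb_type *pcb) {
    assert(pcb != nullptr && pcb->input != nullptr);
    for (const char *p = pcb->input; *p != '\0'; ++p) {
        if (*p >= '0' && *p <= '9') pcb->total += *p - '0';
    }
    pcb->exit_flag = 1;
    std::lock_guard<std::mutex> lock(g_stats_mutex);
    ++g_parses_completed;
}

// Fourth step: each call builds its own control block on the stack.
long parse_text(const std::string &text) {
    worker_pcb_type pcb;
    pcb.input = text.c_str();
    worker_parse(&pcb);
    return pcb.total;
}

// Runs two parses concurrently; neither touches the other's state.
long parse_two_in_parallel(const std::string &a, const std::string &b) {
    long ra = 0, rb = 0;
    std::thread ta([&] { ra = parse_text(a); });
    std::thread tb([&] { rb = parse_text(b); });
    ta.join();
    tb.join();
    return ra + rb;
}
```

Because every parse keeps its working data in its own control block, the only synchronization needed is around the shared counter.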

Added C++ Support

In previous versions of AnaGram, reduction procedures could return class instances (rather than pointers to them) only under limited circumstances. This is because AnaGram generates code that stores objects on the parser value stack simply by casting the stack pointer and assigning the value. This approach is correct for all traditional data types, but leads to unpredictable behavior for a class that supplies its own assignment operator. Overloaded assignment operators depend on the destination being a valid instance of the class, and a slot on the traditional AnaGram parser value stack normally is not one.

Since there are many classes, such as string classes, which require their own implementation of the assignment operator, the restriction on returning class instances has often made reduction procedures unnecessarily complex.

AnaGram 2.01 now has a wrapper statement which can be used to overcome this problem. For each class specified in a wrapper statement, AnaGram generates a wrapper class that transparently solves the problem. The stacked object is created using the copy constructor. The reduction procedure is called with a reference to the stacked object rather than a copy. Wrapped objects are removed after the reduction procedure that uses them returns.
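
The difference between assigning into a raw stack slot and constructing into it can be sketched in plain C++. This is only an illustration of the general technique (placement new plus an explicit destructor call), not the wrapper class AnaGram generates; `demo_store`, `demo_remove`, `demo_reduce`, and `demo_round_trip` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Unsafe pattern described in the text (shown only as a comment):
//   *reinterpret_cast<std::string *>(raw) = value;
// operator= would run on bytes that were never constructed.

// Wrapper-style store: copy-construct the value into raw storage.
template <class T>
T &demo_store(void *raw, const T &value) {
    return *new (raw) T(value);
}

// Wrapper-style removal: run the destructor explicitly.
template <class T>
void demo_remove(T &stacked) {
    stacked.~T();
}

// Stand-in reduction procedure: it receives a reference to the
// stacked object rather than a copy.
std::size_t demo_reduce(const std::string &stacked) {
    return stacked.size();
}

// Store, use, and remove one object in a hypothetical value-stack slot.
std::size_t demo_round_trip(const std::string &value) {
    alignas(std::string) unsigned char slot[sizeof(std::string)];
    std::string &stacked = demo_store(slot, value);
    std::size_t length = demo_reduce(stacked);
    demo_remove(stacked);  // after the reduction procedure returns
    assert(length == value.size());
    return length;
}
```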

Error Diagnostic Support

The error diagnostics created by the diagnose errors switch have been revised so that their text is defined by macros which the user can replace. There are three macros involved:
- MISSING_FORMAT. The default definition of this macro is "Missing %s". It is used when the parser expects a unique input token, the name of the token exists in the token names table, and the token is not found in the input.
- UNEXPECTED_FORMAT. The default definition of this macro is "Unexpected %s". It is used when there is more than one possible input token, but the token found is not one of those expected.
- UNNAMED_TOKEN. The default definition is "input". It is used in place of a token name in UNEXPECTED_FORMAT when the actual input encountered cannot be identified as a token.

Note that if diagnose errors is ON, AnaGram automatically includes in your generated parser the array of ASCII strings specified by the TOKEN_NAMES macro, which is useful in creating diagnostics. The default name of this array is
```
       <parser name>_token_names
```
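
As a rough sketch of how replaceable format macros of this kind can be used, the following defines the three defaults quoted above only when the user has not already defined them, then formats a message. The helper `demo_diagnostic` and its arguments are hypothetical; the code AnaGram actually generates may select and apply the macros differently.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Defaults quoted in the text. Defining a macro before this point
// replaces the default, which is the override mechanism assumed here.
#ifndef MISSING_FORMAT
#define MISSING_FORMAT "Missing %s"
#endif
#ifndef UNEXPECTED_FORMAT
#define UNEXPECTED_FORMAT "Unexpected %s"
#endif
#ifndef UNNAMED_TOKEN
#define UNNAMED_TOKEN "input"
#endif

// Hypothetical helper. expected_name is non-null when exactly one
// named token was expected; found_name is null when the input could
// not be identified as a token.
std::string demo_diagnostic(const char *expected_name,
                            const char *found_name) {
    char buffer[128];
    if (expected_name != nullptr) {
        std::snprintf(buffer, sizeof buffer, MISSING_FORMAT, expected_name);
    } else {
        std::snprintf(buffer, sizeof buffer, UNEXPECTED_FORMAT,
                      found_name != nullptr ? found_name : UNNAMED_TOKEN);
    }
    return buffer;
}
```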

New Attribute Statements

extend pcb

The extend pcb statement is an attribute statement that allows you to add declarations of your own to the parser control block. With this feature, data needed by reduction procedures can be stored in the parser control block rather than in global or static storage. This capability greatly facilitates the construction of thread safe parsers.

The extend pcb statement may be used in any configuration section. The format is as follows:
```
  extend pcb { <C or C++ declaration>... }
```
It may, of course, extend over multiple lines and may contain any number of C or C++ declarations of any kind. AnaGram will append it to the end of the parser control block definition in the generated parser header file. There may be any number of extend pcb statements. The extensions are appended to the parser control block definition in the order in which they occur in the syntax file.

The extend pcb statement is compatible with both C and C++ parsers. Note that even if you are deriving your own class from the parser control block, you might want to use extend pcb to provide virtual function definitions or other declarations appropriate to a base class.
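
To make the effect concrete, here is a hypothetical picture of a control block after two extend pcb statements, together with a stand-in reduction procedure that uses the added state instead of a global variable. The type `calc_pcb_type`, its fields, and `demo_add_term` are invented for illustration, and passing the pointer explicitly to the procedure is an assumption of this sketch; the real definition is whatever AnaGram writes to the generated header.

```cpp
#include <cassert>

// Hypothetical shape of a control block after two extend pcb
// statements. Extensions are appended in syntax-file order.
struct calc_pcb_type {
    int exit_flag;       // stand-in for generated engine fields
    int line_number;     // appended by a first extend pcb statement
    double accumulator;  // appended by a second extend pcb statement
};

// Stand-in reduction procedure: all state lives in the control
// block, so two parses never share the accumulator.
double demo_add_term(calc_pcb_type *pcb, double term) {
    assert(pcb != nullptr);
    pcb->accumulator += term;
    return pcb->accumulator;
}
```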

wrapper

The wrapper attribute statement provides correct handling of C++ objects returned inline by reduction procedures.

If you specify a wrapper for a C++ object, when a reduction procedure returns an instance of the object, a copy of the object will be constructed on the parser value stack and the destructor will be called when that object is removed from the stack.

Without a wrapper, objects are stored on the value stack simply by coercing the stack pointer to the appropriate type. There is no constructor call when the object is stored nor a destructor call when it is removed from the stack.

Classes which use reference counts or otherwise overload the assignment operator should always have wrappers in order to function correctly.

Wrapper statements, like other attribute statements, must appear in configuration sections. The syntax is:
```
  wrapper {<comma delimited list of data types>}
```
For example:
```
   [
      wrapper {CString, CFont}
   ]
```
You cannot specify a wrapper for the default token type.

If your parser uses AnaGram wrappers and exits with an error condition, there may be objects remaining on the parser value stack. If you have no further use for these objects, you should call the DELETE_WRAPPERS macro on error exit so that they will be properly deleted, thus avoiding a memory leak. If you have enabled auto resynch, DELETE_WRAPPERS will be invoked automatically.

Changed Configuration Parameters

Parser stack alignment

Parser stack alignment now defaults to long instead of int. With this default, AnaGram parsers will compile and run on 64-bit processors with no further attention. Users who are building parsers for embedded systems or other uses where memory is limited may want to override this default value with their own specification.

Parser stack size

Parser stack size now defaults to 128 instead of 32. AnaGram adjusts the parser stack size upwards, if necessary, depending on the grammar. If your grammar uses only left recursive constructs, you will never have a problem with parser stack overflow. If your grammar contains center recursion or right recursion, however, there is always syntactically correct input that can overflow the stack, no matter how large the stack is. Make sure the parser stack size is large enough to handle all reasonable cases.
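
The difference between the two kinds of recursion can be seen with a simple counting model of a shift-reduce parser. This is a sketch under simplifying assumptions: it counts only grammar symbols on the stack for two hypothetical list grammars and does not model AnaGram's actual parse engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Left recursion:  list -> item | list item
// Each item is reduced as soon as it has been shifted.
std::size_t demo_depth_left(std::size_t items) {
    std::size_t depth = 0, max_depth = 0;
    for (std::size_t i = 0; i < items; ++i) {
        ++depth;                             // shift item
        max_depth = std::max(max_depth, depth);
        if (depth == 2) depth = 1;           // reduce "list item" to list
        // otherwise: reduce "item" to list, depth stays at 1
    }
    return max_depth;
}

// Right recursion: list -> item | item list
// Nothing can be reduced until the last item has been shifted.
std::size_t demo_depth_right(std::size_t items) {
    std::size_t depth = 0, max_depth = 0;
    for (std::size_t i = 0; i < items; ++i) {
        ++depth;                             // shift item
        max_depth = std::max(max_depth, depth);
    }
    while (depth > 1) --depth;               // reduce "item list" to list
    return max_depth;
}
```

In this model the left recursive grammar never needs more than two stack entries, while the right recursive grammar needs one entry per item, so sufficiently long input exceeds any fixed stack size.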

Token names

Token names defaults to OFF. If it is set, AnaGram generates a static array of character strings, indexed by token number, to provide ASCII representations of token names for use in error diagnostics.

The array contains strings for all grammar tokens which have been explicitly named in the syntax file as well as tokens which represent keywords or single character constants.

Prior to version 2.01 of AnaGram, the array contained strings for explicitly named tokens only. To restore that behavior, set the token names only switch.

New Configuration Parameters

iso latin 1

The iso latin 1 configuration switch defaults to ON. It controls case conversion on input characters when the case sensitive switch is set to OFF. When iso latin 1 is set, the default CONVERT_CASE macro is defined to correctly convert all characters in the ISO Latin-1 character set.

When the iso latin 1 switch is off, only characters in the ASCII range (0-127) are converted.
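
The distinction can be illustrated with a small conversion function. This is a sketch only: `demo_convert_case` is a hypothetical name, the choice of converting to upper case is an assumption, and the actual default CONVERT_CASE macro may be implemented differently (for example, with a lookup table).

```cpp
#include <cassert>

// Converts one character code to upper case.
// With iso_latin_1 false, only ASCII letters (codes 0-127) change.
// With iso_latin_1 true, the accented Latin-1 lower case letters
// 0xE0-0xFE are converted as well, except 0xF7 (the division sign).
// 0xDF and 0xFF have no upper case form in Latin-1 and are unchanged.
unsigned char demo_convert_case(unsigned char c, bool iso_latin_1) {
    if (c >= 'a' && c <= 'z') {
        return static_cast<unsigned char>(c - 0x20);
    }
    if (iso_latin_1 && c >= 0xE0 && c <= 0xFE && c != 0xF7) {
        return static_cast<unsigned char>(c - 0x20);
    }
    return c;
}
```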

reentrant parser

The reentrant parser configuration switch defaults to OFF. If you turn it on, AnaGram generates code that passes the parser control block to functions as an argument, so they do not need a static reference to find the control block.

AnaGram passes the parser control block using the macro PCB_TYPE. For example,
```
  static void ag_ra(PCB_TYPE *pcb_pointer)
```
AnaGram will define PCB_TYPE as the type of the parser control block if you do not define it otherwise. If you are using C++, and derive a class from the parser control block, you can override the definition of PCB_TYPE in order to make your derived class accessible from your reduction procedures.

The reentrant parser switch cannot be used in conjunction with the old style switch.

When you have enabled the reentrant parser switch, the parse function, the initializer function, and the parser value function are all defined to take a pointer to the parser control block as their sole argument.
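
A sketch of the PCB_TYPE override in C++ might look like this. The names `session_pcb_type`, `demo_session`, `demo_ra`, and `demo_run` are hypothetical stand-ins; only the PCB_TYPE macro and the shape of the function signature come from the text above, and the function body is invented.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the generated control block type.
struct session_pcb_type {
    int exit_flag = 0;
};

// User class derived from the control block.
struct demo_session : session_pcb_type {
    std::string log;
};

// Defined by the user ahead of the default definition, so engine
// functions are declared in terms of the derived class.
#ifndef PCB_TYPE
#define PCB_TYPE demo_session
#endif

// Same shape as the signature quoted in the text; members of the
// derived class are reachable through pcb_pointer.
static void demo_ra(PCB_TYPE *pcb_pointer) {
    assert(pcb_pointer != nullptr);
    pcb_pointer->log += "reduce;";
    pcb_pointer->exit_flag = 1;
}

// Creates a derived control block on the stack and passes it in.
std::string demo_run() {
    demo_session session;
    demo_ra(&session);
    return session.log;
}
```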

token names only

token names only defaults to OFF. This configuration switch was added to AnaGram 2.01 to provide the functionality previously provided by the token names switch. When token names only is ON, only tokens which have been given explicit names in the syntax file have non-empty strings in the generated list of character strings. Token names only takes precedence over the token names switch.

no cr

The no cr configuration switch is provided for developers who intend to use the generated parser on a Unix system. When no cr is set, it causes AnaGram's output parser and header files to be written without carriage returns. The switch defaults to OFF, to maintain compatibility with Windows systems.
