CSP LR(1) Parser Generator

CSP Grammar Format

The first thing to note is that the grammar format itself can be changed to suit your needs. However, doing so requires knowledge of the internal workings of the parser generator. The current format is as follows.

There are three kinds of lines:

Blank lines with or without comments.
Production names.
Production expansions.

Comments

A comment starts with a ';' character (without the quotes) and runs to the end of the line. The parser reads the comment in but does not use it. Comments can appear at the end of any line.

   ; This is an example of a comment.

Production Names

A production name is simply a name followed immediately by a ':' without the quotes on a line by itself. The production name 'S' is reserved as the start symbol. The first production name used in the grammar is the production that 'S' will reference.

Multiple production expansions can follow the production name.

myprod: ; This is an example of a production name.
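
The three kinds of lines can be told apart mechanically. The following is a minimal sketch of how that might look, not the parser generator's own code. The names LineKind, stripComment, trim and classifyLine are hypothetical, and the sketch assumes that a ';' inside a quoted string is ordinary text rather than the start of a comment. It does not handle the '#...#' regular expression items described later.

```cpp
#include <string>

// Hypothetical helper type, not part of CSP: the three kinds of grammar lines.
enum class LineKind { Blank, ProductionName, Expansion };

// Remove a trailing ';' comment. Assumption: a ';' inside a quoted string
// is ordinary text, not the start of a comment.
std::string stripComment(const std::string& line) {
    char quote = 0;  // the quote character we are inside, or 0 if none
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (quote != 0) {
            if (c == '\\') ++i;              // skip the escaped character
            else if (c == quote) quote = 0;  // closing quote
        } else if (c == '\'' || c == '"') {
            quote = c;                       // opening quote
        } else if (c == ';') {
            return line.substr(0, i);        // comment starts here
        }
    }
    return line;
}

// Trim leading and trailing whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

LineKind classifyLine(const std::string& line) {
    std::string text = trim(stripComment(line));
    if (text.empty()) return LineKind::Blank;
    // A production name is a single word followed immediately by ':'.
    if (text.size() > 1 && text.back() == ':' &&
        text.find_first_of(" \t'\"") == std::string::npos)
        return LineKind::ProductionName;
    return LineKind::Expansion;
}
```

With this sketch, the comment example above classifies as Blank and the 'myprod:' line classifies as ProductionName, since the comment is stripped before the line is examined.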

Production Expansions

Production expansions are all the productions that can reduce to the production name above them. A production expansion can contain many different kinds of items:

Dots as spaces.
Strings.
Production names.
Epsilon.
Action routines.
Conditional sections.
Repeating conditional sections.
Regular expressions.

Dots as Spaces

A dot ('.', without the quotes) is used wherever spaces can appear in the input stream. It is replaced with the 'SPACES' production name, so you must define this production if you use dots. You may define the SPACES production(s) to be anything you wish.

Strings

Strings and characters that are allowed in the input stream are simply surrounded by single or double quotes. If you need a quote in the string, put a backslash before the quote. This is called escaping the character. You can escape all the common special characters, such as \n, \r, \b and \v.
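
As an illustration of the escaping rule, here is a minimal sketch of how the text between the quotes might be turned into the characters it stands for. The function name unescape is hypothetical and this is not the parser generator's own code. It handles only the escapes named above (n, r, b, v) and, as an assumption, treats any other escaped character, including quotes and the backslash itself, as that literal character.

```cpp
#include <string>

// Hypothetical helper: decode the body of a quoted grammar string.
// Assumption: escapes other than n, r, b, v stand for the character itself.
std::string unescape(const std::string& body) {
    std::string out;
    for (std::size_t i = 0; i < body.size(); ++i) {
        char c = body[i];
        if (c != '\\' || i + 1 == body.size()) {
            out += c;  // ordinary character, or a lone trailing backslash
            continue;
        }
        char next = body[++i];  // the character after the backslash
        switch (next) {
            case 'n': out += '\n'; break;
            case 'r': out += '\r'; break;
            case 'b': out += '\b'; break;
            case 'v': out += '\v'; break;
            default:  out += next; break;  // \' \" \\ and anything else
        }
    }
    return out;
}
```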

Production Names

Production names can be inserted to reference other productions. This should be self-explanatory.

Epsilon

If you want an empty production, simply put a '@' character without the quotes by itself on a line.

Action Routines

An action routine item starts with '_action_' followed by the name of the routine you want in CSAction, the C++ class that defines action routines. For more information on how to implement action routines, see the class CSAction. An example of an action routine is '_action_AddNumbers' (without the quotes). This creates a function called AddNumbers in the CSAction class, which you then subclass to implement it.
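
The subclassing step might look roughly like the sketch below. The CSAction shown here is only a stand-in so that the example compiles on its own. The real class comes from the parser generator, and the parameterless void signature of AddNumbers is an assumption; check the CSAction class documentation for the actual interface. CalculatorActions and its operands member are hypothetical.

```cpp
#include <vector>

// Stand-in for the real CSAction base class. The signature of AddNumbers
// is an assumption made for this sketch.
class CSAction {
public:
    virtual ~CSAction() = default;
    virtual void AddNumbers() = 0;  // from '_action_AddNumbers' in the grammar
};

// Hypothetical subclass that implements the action routine.
class CalculatorActions : public CSAction {
public:
    std::vector<int> operands;  // values collected while parsing

    // Replace the top two operands with their sum.
    void AddNumbers() override {
        if (operands.size() < 2) return;  // nothing to add
        int rhs = operands.back(); operands.pop_back();
        int lhs = operands.back(); operands.pop_back();
        operands.push_back(lhs + rhs);
    }
};
```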

Conditional Sections

Instead of writing a separate production with epsilon every time a conditional section appears, you can simply enclose that section in '[' and ']' (without the quotes). Make sure to leave spaces between the brackets and other items. The parser generator will create the extra productions with epsilon on its own. This feature makes the grammar more readable.

Repeating Conditional Sections

A repeating conditional section is like a conditional section, except that it can appear multiple times in the input stream. Again, the parser generator will create the extra productions on its own. Use '{' and '}' instead of square brackets.
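
To make the idea of "extra productions" concrete, the sketch below rewrites one expansion so that each '[ ... ]' or '{ ... }' section moves into a helper production with an epsilon ('@') alternative. It only illustrates the kind of rewrite described above; the productions the parser generator actually creates, and their names, may well differ. The Production struct, the desugar function and the helper naming scheme are hypothetical. The sketch assumes items are separated by whitespace, sections are not nested, and quoted strings contain no spaces.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One production: a name plus its expansions, one string per expansion line.
struct Production {
    std::string name;
    std::vector<std::string> expansions;
};

// Move each '[ ... ]' or '{ ... }' section of one expansion into a helper
// production. Assumes whitespace-separated items and no nested sections.
std::vector<Production> desugar(const std::string& name,
                                const std::string& expansion) {
    std::vector<Production> result(1);
    result[0].name = name;
    std::istringstream in(expansion);
    std::string item, outer, inner;
    std::string close;  // "]" or "}" while inside a section, empty otherwise
    int counter = 0;
    while (in >> item) {
        if (close.empty() && (item == "[" || item == "{")) {
            close = (item == "[") ? "]" : "}";
            inner.clear();
        } else if (!close.empty() && item == close) {
            std::string helper = name + "_" + (close == "]" ? "opt" : "rep")
                                 + std::to_string(++counter);
            Production p;
            p.name = helper;
            // Optional: the section once. Repeating: the helper, then the section.
            p.expansions.push_back(close == "]" ? inner : helper + " " + inner);
            p.expansions.push_back("@");  // epsilon alternative
            result.push_back(p);
            outer += (outer.empty() ? "" : " ") + helper;
            close.clear();
        } else {
            std::string& target = close.empty() ? outer : inner;
            target += (target.empty() ? "" : " ") + item;
        }
    }
    result[0].expansions.push_back(outer);
    return result;
}
```

For instance, calling desugar with the name "decl" and the expansion `'let' name [ '=' expr ] { ',' name }` returns three productions: 'decl' with the expansion `'let' name decl_opt1 decl_rep2`, 'decl_opt1' with the expansions `'=' expr` and `@`, and 'decl_rep2' with the expansions `decl_rep2 ',' name` and `@`.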

Regular Expressions

This section also applies to strings.

If you wish to use a regular expression, you must use a special format.

#{ID}#{NUMBER}#{regex or string}#{expect}#

The only item that is necessary is the regex or string. All the other sections can be left empty.

All tokens are assigned a number. If you want a token to have a specific NUMBER, you can specify it in the location where it says {NUMBER}.

Remembering numbers in your code is error prone, and sometimes you don't care what the number is; you just want a way to reference the token. You can do this by assigning an ID to it. The CSTokens.h file will then define the following:

#define CSPTOK_{ID} {NUMBER}

For example, say you have the following in your grammar:

#VARIABLE#5#![_a-zA-Z][_a-zA-Z0-9]*##

This would create a token which is identifiable with the number 5, but you can also reference it with CSPTOK_VARIABLE so that you don't need to hard code anything. You can also leave out the 5 in the grammar and CSPTOK_VARIABLE will be assigned a number automatically for your token.
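
In C++ code, using the ID might look like the sketch below. The #define here is a stand-in for the line that CSTokens.h would contain for the VARIABLE example, so that the snippet compiles on its own; in a real project you would include CSTokens.h instead. The function tokenName is hypothetical.

```cpp
#include <string>

// Stand-in for the definition CSTokens.h would provide for the VARIABLE
// example. In real code, include CSTokens.h instead of defining it here.
#ifndef CSPTOK_VARIABLE
#define CSPTOK_VARIABLE 5
#endif

// Hypothetical helper: name a token without hard-coding its number.
std::string tokenName(int token) {
    switch (token) {
        case CSPTOK_VARIABLE: return "variable";
        default:              return "unknown";
    }
}
```

Because the code refers only to CSPTOK_VARIABLE, it keeps working if the number is later removed from the grammar and assigned automatically.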

Now for the regular expression and string. If you want a regular expression, it must start with the '!' character. This character will be discarded. It is only used to tell the parser that a regular expression follows. If no '!' is found, then it's a string (don't use quotes in this case).

The last item is only used during error detection. At every token read in, the parser expects one of a number of valid tokens: say, a variable or a number in an equation. But printing that the parser expected "[_a-zA-Z][_a-zA-Z0-9]*" isn't very useful. What does that mean? Not every user of your eventual parser will necessarily know regular expressions.

Instead, you insert here a human-readable explanation of what the parser was expecting.

#VARIABLE#5#![_a-zA-Z][_a-zA-Z0-9]*#variable name#

It should be noted that leaving out the number 5 is the preferred way to do things as this lets the parser generator select the numbers assigned to each token.
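
The four-field format described in this section can be split apart as in the following minimal sketch. It is not the parser generator's own code. TokenSpec and parseTokenSpec are hypothetical names, and the sketch assumes that none of the fields contains a '#' character; how the real format treats a '#' inside a regular expression is not covered here.

```cpp
#include <string>
#include <vector>

// Fields of one '#{ID}#{NUMBER}#{regex or string}#{expect}#' item.
struct TokenSpec {
    std::string id;          // empty if omitted
    bool hasNumber = false;  // false if the generator should pick the number
    int number = 0;
    bool isRegex = false;    // true if the third field started with '!'
    std::string pattern;     // regex or literal text, without the leading '!'
    std::string expect;      // human-readable text for error messages
};

// Hypothetical parser. Returns false if the text does not fit the format.
// Assumes no field contains '#'.
bool parseTokenSpec(const std::string& text, TokenSpec& out) {
    if (text.size() < 2 || text.front() != '#' || text.back() != '#')
        return false;
    // Split everything between the outer '#' characters on '#'.
    std::vector<std::string> fields(1);
    for (std::size_t i = 1; i + 1 < text.size(); ++i) {
        if (text[i] == '#') fields.emplace_back();
        else fields.back() += text[i];
    }
    // Exactly four fields, and the regex-or-string field is mandatory.
    if (fields.size() != 4 || fields[2].empty()) return false;

    TokenSpec spec;
    spec.id = fields[0];
    if (!fields[1].empty()) {
        if (fields[1].find_first_not_of("0123456789") != std::string::npos)
            return false;  // NUMBER must be digits only
        spec.hasNumber = true;
        spec.number = std::stoi(fields[1]);
    }
    spec.isRegex = (fields[2][0] == '!');
    spec.pattern = spec.isRegex ? fields[2].substr(1) : fields[2];
    spec.expect = fields[3];
    out = spec;
    return true;
}
```

Applied to the last example above, this sketch yields the ID VARIABLE, the number 5, the regular expression [_a-zA-Z][_a-zA-Z0-9]* and the expect text "variable name".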


Webmaster: Cléo Saulnier