CSP Grammar Format
The first thing to note is that the grammar format can be changed to suit your needs. However,
this requires knowledge of the internal workings of the parser generator. The current
format is as follows.
There are 3 kinds of lines:
Comments
Comments start with a ';' character (without the quotes). Everything from there to
the end of the line is a comment; the parser reads it in but does not use it.
Comments can appear at the end of any line.
; This is an example of a comment.
Production Names
A production name is simply a name followed immediately by a ':' without the quotes
on a line by itself. The production name 'S' is reserved as the start symbol. The first
production name used in the grammar is the production that 'S' will reference.
Multiple production expansions can follow the production name.
myprod: ; This is an example of a production name.
Production Expansions
Production expansions are all the productions that can reduce to the production name above them.
Production expansions can have many different kinds of items:
Dots as Spaces
A dot ('.', without the quotes) is used wherever spaces can appear in the input stream.
Each dot is replaced with the 'SPACES' production name, so you must define this production
if you use dots. You may define the SPACES production(s) to be anything you wish.
Strings
Strings and characters that are allowed in the input stream are simply surrounded
by single or double quotes. If you need a quote inside the string, put a backslash
before the quote. This is called escaping the character. The common escape sequences
such as \n, \r, \b and \v are also available.
Production Names
Production names can be inserted to reference other productions. This should
be self-explanatory.
Epsilon
If you want an empty production, simply put a '@' character without the quotes
by itself on a line.
Action Routines
Action routines start with '_action_' followed by the name of the routine you want
in CSAction, the C++ class that defines action routines. For more information on how
to implement action routines, see the class CSAction. An example of an action routine
is '_action_AddNumbers' (without the quotes). This creates a function called
AddNumbers in the CSAction class, which you then subclass to implement the routine.
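The subclassing step might look roughly like the sketch below. The CSAction declared here is only a local stand-in so that the example compiles on its own; the real class comes from the parser generator, and its actual method signature is not described on this page. The `void AddNumbers()` signature, the `MyActions` name, and the operand stack are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Stand-in for the generated class. The real CSAction is produced by the
// parser generator; this declaration and its signature are assumptions.
class CSAction {
public:
    virtual ~CSAction() = default;
    // Assumed to be generated from '_action_AddNumbers' in the grammar.
    virtual void AddNumbers() = 0;
};

// Hypothetical subclass that supplies the behaviour of the routine.
class MyActions : public CSAction {
public:
    std::vector<int> operands;  // illustrative operand stack, not part of CSP

    void AddNumbers() override {
        assert(operands.size() >= 2 && "AddNumbers needs two operands");
        int rhs = operands.back();
        operands.pop_back();
        int lhs = operands.back();
        operands.pop_back();
        operands.push_back(lhs + rhs);  // replace both operands with their sum
    }
};
```

How values actually reach an action routine depends on CSAction itself, so treat the operand stack as a placeholder for whatever state your own subclass keeps.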
Conditional Sections
Instead of writing down a separate production with epsilon every time a conditional section
appears, you can just enclose that section in '[' and ']' (without the quotes). Make sure
to leave spaces between the brackets and the other items. The parser generator
will create the extra productions with epsilon on its own. This feature makes the
grammar more readable.
Repeating Conditional Sections
A repeating conditional section is like a conditional section, except that it can appear
multiple times in the input stream. Again, the parser generator will create the extra
productions on its own. Use '{' and '}' instead of square brackets.
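To make the "extra productions" concrete, here is a minimal sketch of the kind of helper production that is equivalent to each bracket form, written out in the grammar format described above. It is not the parser generator's actual code: the function names, the helper production names, and the choice of right recursion for the repeating form are assumptions.

```cpp
#include <cassert>
#include <string>

// Grammar text for a helper production equivalent to "[ body ]":
// either the body or epsilon ('@' by itself on a line).
std::string expandOptional(const std::string& helper, const std::string& body) {
    assert(!helper.empty() && !body.empty());
    return helper + ":\n" + body + "\n@\n";
}

// Grammar text for a helper production equivalent to "{ body }":
// the body followed by the helper again, or epsilon.
// Right recursion is an assumption; the real generator may differ.
std::string expandRepeating(const std::string& helper, const std::string& body) {
    assert(!helper.empty() && !body.empty());
    return helper + ":\n" + body + " " + helper + "\n@\n";
}
```

With these helpers, an expansion such as "a [ b ] c" would be read as "a opt_b c", where opt_b is the production returned by expandOptional("opt_b", "b").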
Regular Expressions
This section also applies to strings.
If you wish to use a regular expression, you must use a special format.
#{ID}#{NUMBER}#{regex or string}#{expect}#
The only item that is necessary is the regex or string. All the other sections
can be left empty.
All tokens are assigned a number. If you want a token to have a specific NUMBER,
you can specify it in the location where it says {NUMBER}.
In your code, remembering numbers is error-prone, and sometimes you don't care what
the number is; you just want a way to reference the token. You can do this by assigning
an ID to it. The CSTokens.h file will then define the following:
#define CSPTOK_{ID} {NUMBER}
For example, suppose you have the following in your grammar:
#VARIABLE#5#![_a-zA-Z][_a-zA-Z0-9]*##
This creates a token that is identified by the number 5, but you can also
reference it as CSPTOK_VARIABLE so that you don't need to hard code anything. You
can also leave out the 5 in the grammar, and CSPTOK_VARIABLE will be assigned a number
automatically for your token.
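In your own code, referring to the token by its define rather than by the literal number could look like the sketch below. The defines are written out locally only so that the example compiles on its own; in a real project they would come from CSTokens.h. CSPTOK_NUMBER, its value, and the describeToken helper are hypothetical.

```cpp
#include <cassert>
#include <string>

// Normally provided by CSTokens.h. Repeated here for illustration only.
#define CSPTOK_VARIABLE 5
#define CSPTOK_NUMBER 6  // hypothetical second token, value assumed

// Hypothetical helper: maps a token number to a short label.
std::string describeToken(int token) {
    switch (token) {
        case CSPTOK_VARIABLE: return "variable";
        case CSPTOK_NUMBER:   return "number";
        default:              return "unknown";
    }
}

// Usage sketch: the define and the literal number name the same token.
void demoDescribeToken() {
    assert(describeToken(CSPTOK_VARIABLE) == "variable");
    assert(describeToken(5) == "variable");
}
```

Because the switch only mentions the CSPTOK_ names, it keeps working if you later remove the 5 from the grammar and let the number be assigned automatically.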
Now for the regular expression and string. If you want a regular expression, it must start
with the '!' character. This character will be discarded. It is only used to tell the parser
that a regular expression follows. If no '!' is found, then it's a string (don't use quotes in
this case).
The last item is only used during error detection. At every token read in, the parser
expects one of a certain set of valid tokens, for example a variable or a number in an equation.
But printing that the parser expected "[_a-zA-Z][_a-zA-Z0-9]*" isn't very useful. What does
that mean? Not every user of your eventual parser will necessarily know regular expressions.
Instead, you insert here a human-readable explanation of what the parser was expecting.
#VARIABLE#5#![_a-zA-Z][_a-zA-Z0-9]*#variable name#
It should be noted that leaving out the number 5 is the preferred way to do things as this lets
the parser generator select the numbers assigned to each token.
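To summarise how the four fields fit together, the sketch below splits a token specification into its parts. It is an illustration of the format only, not the parser generator's own code: the TokenSpec struct and parseTokenSpec function are hypothetical names, and the sketch assumes that no field contains a '#' character.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <vector>

// Illustrative container for the four fields; not a CSP type.
struct TokenSpec {
    std::string id;          // may be empty
    bool hasNumber = false;  // false: the generator picks the number
    int number = 0;
    bool isRegex = false;    // true when the pattern started with '!'
    std::string pattern;     // regex or plain string, leading '!' removed
    std::string expect;      // human-readable text for error messages
};

// Parses "#ID#NUMBER#pattern#expect#". Returns false on malformed input.
bool parseTokenSpec(const std::string& text, TokenSpec& out) {
    if (text.size() < 2 || text.front() != '#' || text.back() != '#') return false;

    // Split everything after the leading '#' on each following '#'.
    std::vector<std::string> fields;
    std::string current;
    for (std::size_t i = 1; i < text.size(); ++i) {
        if (text[i] == '#') {
            fields.push_back(current);
            current.clear();
        } else {
            current += text[i];
        }
    }
    if (fields.size() != 4) return false;

    TokenSpec spec;
    spec.id = fields[0];
    if (!fields[1].empty()) {
        if (fields[1].size() > 9) return false;  // keep the value inside int range
        for (char c : fields[1]) {
            if (!std::isdigit(static_cast<unsigned char>(c))) return false;
        }
        spec.hasNumber = true;
        spec.number = std::atoi(fields[1].c_str());
    }
    spec.pattern = fields[2];
    if (!spec.pattern.empty() && spec.pattern[0] == '!') {
        spec.isRegex = true;       // '!' only marks a regex and is discarded
        spec.pattern.erase(0, 1);
    }
    if (spec.pattern.empty()) return false;  // the only required field
    spec.expect = fields[3];
    out = spec;
    return true;
}

// Usage sketch with the example from the text.
void demoTokenSpec() {
    TokenSpec spec;
    bool ok = parseTokenSpec("#VARIABLE#5#![_a-zA-Z][_a-zA-Z0-9]*#variable name#", spec);
    assert(ok && spec.id == "VARIABLE" && spec.hasNumber && spec.number == 5);
    assert(spec.isRegex && spec.pattern == "[_a-zA-Z][_a-zA-Z0-9]*");
    assert(spec.expect == "variable name");
}
```

Leaving the ID, NUMBER, and expect fields empty, as in "###abc##", is accepted by this sketch and yields a plain string pattern with no fixed number.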
For an example of a grammar, click here.
Webmaster: Cléo Saulnier