Why does this simple grammar have a shift/reduce conflict?

%token <token> PLUS MINUS INT
%left PLUS MINUS
THIS WORKS:
exp : exp PLUS exp;
exp : exp MINUS exp;
exp : INT;
THIS HAS 2 SHIFT/REDUCE CONFLICTS:
exp : exp binaryop exp;
exp : INT;
binaryop: PLUS | MINUS ;
WHY?

This is because the second is in fact ambiguous. So is the first grammar, but you resolved the ambiguity by adding %left.
This %left does not work in the second grammar, because associativity and precedence are not inherited from rule to rule. That is, the binaryop nonterminal does not inherit any such thing even though it produces PLUS and MINUS. Associativity and precedence are localized to a rule, and revolve around terminal symbols.
We cannot do %left binaryop, but we can slightly refactor the grammar:
exp : exp binaryop term;
exp : term;
term : INT;
binaryop: PLUS | MINUS ;
That has no conflicts now because it is implicitly left-associative. I.e. the production of a longer and longer expression can only happen on the left side of the binaryop, because the right side is a term which produces only an INT.
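To make the "implicitly left-associative" point concrete, here is a minimal hand-written sketch in C (not yacc output) that folds a token array the same way the refactored rule groups it. The `Tok` type and the `eval_left` function are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds mirroring the grammar's terminals. */
typedef enum { T_INT, T_PLUS, T_MINUS } TokKind;

typedef struct {
    TokKind kind;
    int value; /* only meaningful for T_INT */
} Tok;

/*
 * Evaluate INT (binaryop INT)* the way
 *     exp : exp binaryop term | term ;
 * groups it: the accumulated left side keeps growing, while the right
 * side of each operator is always a single term.
 * The caller must pass a well-formed sequence (odd length, INT first).
 */
int eval_left(const Tok *toks, size_t n)
{
    assert(n % 2 == 1 && toks[0].kind == T_INT);
    int acc = toks[0].value;                 /* exp : term */
    for (size_t i = 1; i + 1 < n; i += 2) {
        assert(toks[i].kind != T_INT && toks[i + 1].kind == T_INT);
        int rhs = toks[i + 1].value;         /* term : INT */
        if (toks[i].kind == T_PLUS)          /* binaryop : PLUS */
            acc += rhs;
        else                                 /* binaryop : MINUS */
            acc -= rhs;
    }
    return acc;
}
```

For the input 10 - 3 - 2 this grouping computes (10 - 3) - 2 = 5; a right-associative grouping would compute 10 - (3 - 2) = 9.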

You need to specify a precedence for the exp binaryop exp rule if you want the precedence rules to resolve the ambiguity:
exp : exp binaryop exp %prec PLUS;
With that change, all the conflicts are resolved.
Edit
The comments seem to indicate some confusion as to what the precedence rules in yacc/bison do.
The precedence rules are a way of semi-automatically resolving shift/reduce conflicts in the grammar. They're only semi-automatic in that you have to know what you are doing when you specify the precedences.
Basically, whenever there is a shift/reduce conflict between a token to be shifted and a rule to be reduced, yacc compares the precedence of the token to be shifted and the rule to be reduced, and -- as long as both have assigned precedences -- does whichever is higher precedence. If either the token or the rule has no precedence assigned, then the conflict is reported to the user.
%left/%right/%nonassoc come into the picture when the token and rule have the SAME precedence. In that case %left means do the reduce, %right means do the shift, and %nonassoc means do neither, causing a syntax error at runtime if the parser runs into this case.
The precedence levels themselves are assigned to tokens with %left/%right/%nonassoc and to rules with %prec. The only oddness is that rules with no %prec and at least one terminal on the RHS get the precedence of the last terminal on the RHS. This can end up assigning precedences to rules that you really don't want to have precedence, which can hide conflicts by resolving them incorrectly. You can avoid these problems by adding an extra level of indirection in the rule in question -- change the problematic terminal on the RHS to a new non-terminal that expands to just that terminal.
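The resolution procedure described above can be modeled in a few lines of C. This is only a sketch of the decision logic as stated in this answer, not code taken from yacc or bison; the `Sym`, `Assoc`, and `Action` types and both function names are hypothetical, and precedence level 0 is assumed here to mean "none assigned".

```c
#include <assert.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC } Assoc;
typedef enum { ACT_SHIFT, ACT_REDUCE, ACT_ERROR, ACT_REPORT_CONFLICT } Action;

#define NO_PREC 0 /* assumption: level 0 means "no precedence assigned" */

typedef struct {
    int is_terminal; /* 1 for a token, 0 for a nonterminal */
    int prec;        /* declared level, or NO_PREC */
} Sym;

/*
 * Precedence of a rule: an explicit %prec level wins; otherwise the rule
 * takes the level of the last terminal on its right-hand side; a rule
 * with no terminals on the RHS has no precedence.
 */
int rule_precedence(const Sym *rhs, int n, int explicit_prec)
{
    assert(n >= 0);
    if (explicit_prec != NO_PREC)
        return explicit_prec;
    for (int i = n - 1; i >= 0; i--)
        if (rhs[i].is_terminal)
            return rhs[i].prec;
    return NO_PREC;
}

/* Decide a shift/reduce conflict between a lookahead token and a rule. */
Action resolve(int rule_prec, int token_prec, Assoc token_assoc)
{
    if (rule_prec == NO_PREC || token_prec == NO_PREC)
        return ACT_REPORT_CONFLICT;   /* left for the user to sort out */
    if (token_prec > rule_prec)
        return ACT_SHIFT;
    if (token_prec < rule_prec)
        return ACT_REDUCE;
    switch (token_assoc) {            /* same level: associativity decides */
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR; /* %nonassoc: runtime syntax error */
    }
}
```

Applied to the question: `exp : exp PLUS exp` inherits the level of PLUS, so a PLUS lookahead resolves to a reduce under %left, whereas `exp : exp binaryop exp` has no terminal on its RHS, gets no precedence, and the conflict is reported unless %prec PLUS is added.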

I assume that this falls under what the Bison manual calls "Mysterious Conflicts". You can replicate that with:
exp: exp plus exp;
exp: exp minus exp;
exp: INT;
plus: PLUS;
minus: MINUS;
which gives four S/R conflicts for me.

The output file describing the conflicted grammar produced by Bison (version 2.3) on Linux is as follows. The key information at the top is that State 7 has conflicts.
State 7 conflicts: 2 shift/reduce
Grammar
0 $accept: exp $end
1 exp: exp binaryop exp
2 | INT
3 binaryop: PLUS
4 | MINUS
Terminals, with rules where they appear
$end (0) 0
error (256)
PLUS (258) 3
MINUS (259) 4
INT (260) 2
Nonterminals, with rules where they appear
$accept (6)
on left: 0
exp (7)
on left: 1 2, on right: 0 1
binaryop (8)
on left: 3 4, on right: 1
state 0
0 $accept: . exp $end
INT shift, and go to state 1
exp go to state 2
state 1
2 exp: INT .
$default reduce using rule 2 (exp)
state 2
0 $accept: exp . $end
1 exp: exp . binaryop exp
$end shift, and go to state 3
PLUS shift, and go to state 4
MINUS shift, and go to state 5
binaryop go to state 6
state 3
0 $accept: exp $end .
$default accept
state 4
3 binaryop: PLUS .
$default reduce using rule 3 (binaryop)
state 5
4 binaryop: MINUS .
$default reduce using rule 4 (binaryop)
state 6
1 exp: exp binaryop . exp
INT shift, and go to state 1
exp go to state 7
And here is the information about 'State 7':
state 7
1 exp: exp . binaryop exp
1 | exp binaryop exp .
PLUS shift, and go to state 4
MINUS shift, and go to state 5
PLUS [reduce using rule 1 (exp)]
MINUS [reduce using rule 1 (exp)]
$default reduce using rule 1 (exp)
binaryop go to state 6
The trouble is described by the . markers in the lines marked 1. For some reason, the %left is not 'taking effect' as you'd expect, so Bison identifies a conflict when it has read exp PLUS exp and finds a PLUS or MINUS after it. In such cases, Bison (and Yacc) do the shift rather than the reduce. In this context, that seems to me to be tantamount to making the operators right-associative.
Changing the %left to %right and omitting it do not change the result (in terms of the conflict warnings). I also tried Yacc on Solaris and it produced essentially the same conflict.
So, why does the first grammar work? Here's the output:
Grammar
0 $accept: exp $end
1 exp: exp PLUS exp
2 | exp MINUS exp
3 | INT
Terminals, with rules where they appear
$end (0) 0
error (256)
PLUS (258) 1
MINUS (259) 2
INT (260) 3
Nonterminals, with rules where they appear
$accept (6)
on left: 0
exp (7)
on left: 1 2 3, on right: 0 1 2
state 0
0 $accept: . exp $end
INT shift, and go to state 1
exp go to state 2
state 1
3 exp: INT .
$default reduce using rule 3 (exp)
state 2
0 $accept: exp . $end
1 exp: exp . PLUS exp
2 | exp . MINUS exp
$end shift, and go to state 3
PLUS shift, and go to state 4
MINUS shift, and go to state 5
state 3
0 $accept: exp $end .
$default accept
state 4
1 exp: exp PLUS . exp
INT shift, and go to state 1
exp go to state 6
state 5
2 exp: exp MINUS . exp
INT shift, and go to state 1
exp go to state 7
state 6
1 exp: exp . PLUS exp
1 | exp PLUS exp .
2 | exp . MINUS exp
$default reduce using rule 1 (exp)
state 7
1 exp: exp . PLUS exp
2 | exp . MINUS exp
2 | exp MINUS exp .
$default reduce using rule 2 (exp)
The difference seems to be that in states 6 and 7, it is able to distinguish what to do based on what comes next.
One way of fixing the problem is:
%token <token> PLUS MINUS INT
%left PLUS MINUS
%%
exp : exp binaryop term;
exp : term;
term : INT;
binaryop: PLUS | MINUS;

Related

Yacc conflict i cant fix

I've been trying to fix a shift/reduce conflict in my yacc specification and I can't seem to find where it is.
%union{
char* valueBase;
char* correspondencia;
}
%token pal palT palC
%type <valueBase> pal
%type <correspondencia> palT palC Smth
%%
Dicionario : Traducao
| Dicionario Traducao
;
Traducao : Palavra Correspondencia
;
Palavra : Base Delim
| Exp
;
Delim :
| ':'
;
Correspondencia :
| palC {printf("PT Tradução: %s\n",$1);}
;
Exp : Smth '-' Smth {aux = yylval.valueBase; printf("PT Tradução: %s %s %s\n", $1, aux, $3);}
;
Smth : palT {$$ = strdup($1);}
| {$$ = "";}
;
Base : pal {printf("EN Palavra base: %s\n",$1);}
;
Any help to find and fix this conflict would be extremely appreciated.
So looking at the y.output file from your grammar, you have a shift/reduce conflict in state 13:
State 13
10 Exp: Smth '-' . Smth
palT shift, and go to state 2
palT [reduce using rule 12 (Smth)]
$default reduce using rule 12 (Smth)
Smth go to state 16
Basically, what this is saying is that when parsing an Exp after having seen a Smth '-' and looking at a lookahead of palT, it doesn't know whether it should reduce an empty Smth to finish the Exp (leaving the palT as part of some later construct) OR shift the palT so it can then be reduced (recognized) as a Smth that completes this Exp.
The language you are recognizing is a sequence of one or more Traducao, each of which consists of a Palavra followed by an optional palC (Correspondencia that may be a palC or empty). That means that you might have a Palavra directly following another Palavra (the Correspondencia for the first one is empty). So the parser needs to find the boundary between one Palavra and the next just by looking at its current state and one token of lookahead, which is a problem.
In particular, when you have an input like palT '-' palT '-' palT, that is two consecutive Palavra, but it is not clear whether the middle palT belongs to the first one or the second. It is ambiguous, because it could be parsed successfully either way.
If you want the parser to just accept as much as possible into the first Palavra, then you can just accept the default resolution (of shift). If that is wrong and you would want the other interpretation, then you are going to need more lookahead to recognize this case, as it depends on whether or not there is a second '-' after the second palT or something else.

How do I find reduce/shift conflicts in my yacc code?

I'm writing a flex/yacc program, using Cygwin, that should read some tokens and a simple grammar.
I'm guessing something is wrong with my BNF grammar, but I can't seem to locate the problem. Below is some code:
%start statement_list
%%
statement_list: statement
|statement_list statement
;
statement: condition|simple|loop|call_func|decl_array|decl_constant|decl_var;
call_func: IDENTIFIER'('ID_LIST')' {printf("callfunc\n");} ;
ID_LIST: IDENTIFIER
|ID_LIST','IDENTIFIER
;
simple: IDENTIFIER'['NUMBER']' ASSIGN expr
|PRINT STRING
|PRINTLN STRING
|RETURN expr
;
bool_expr: expr E_EQUAL expr
|expr NOT_EQUAL expr
|expr SMALLER expr
|expr E_SMALLER expr
|expr E_LARGER expr
|expr LARGER expr
|expr E_EQUAL bool
|expr NOT_EQUAL bool
;
expr: expr ADD expr {$$ = $1+$3;}
| expr MULT expr {$$ = $1*$3;}
| expr MINUS expr {$$ = $1-$3;}
| expr DIVIDE expr {if($3 == 0) yyerror("divide by zero");
else $$ = $1 / $3;}
|expr ASSIGN expr
| NUMBER
| IDENTIFIER
;
bool: TRUE
|FALSE
;
decl_constant: LET IDENTIFIER ASSIGN expr
|LET IDENTIFIER COLON "bool" ASSIGN bool
|LET IDENTIFIER COLON "Int" ASSIGN NUMBER
|LET IDENTIFIER COLON "String" ASSIGN STRING
;
decl_var: VAR IDENTIFIER
|VAR IDENTIFIER ASSIGN NUMBER
|VAR IDENTIFIER ASSIGN STRING
|VAR IDENTIFIER ASSIGN bool
|VAR IDENTIFIER COLON "Bool" ASSIGN bool
|VAR IDENTIFIER COLON "Int" ASSIGN NUMBER
|VAR IDENTIFIER COLON "String" ASSIGN STRING
;
decl_array: VAR IDENTIFIER COLON "Int" '[' NUMBER ']'
|VAR IDENTIFIER COLON "Bool" '[' NUMBER ']'
|VAR IDENTIFIER COLON "String" '[' NUMBER ']'
;
condition: IF '(' bool_expr ')' statement ELSE statement;
loop: WHILE '(' bool_expr ')' statement;
I've tried changing statement into
statement:';';
, reading a simple token to test if it works, but it seems like my code refuses to enter that part of the grammar.
Also, when I compile it, it tells me there are 18 shift/reduce conflicts. Should I try to locate and solve all of them?
EDIT: I have edited my code using Chris Dodd's answer, trying to solve each conflict by looking at the output file. The last few conflicts seem to be located in the below code.
expr: expr ADD expr {$$ = $1+$3;}
| expr MULT expr {$$ = $1*$3;}
| expr MINUS expr {$$ = $1-$3;}
| expr DIVIDE expr {if($3 == 0) yyerror("divide by zero");
else $$ = $1 / $3;}
|expr ASSIGN expr
| NUMBER
| IDENTIFIER
;
And here is part of the output file telling me what's wrong.
state 60
28 expr: expr . ADD expr
29 | expr . MULT expr
30 | expr . MINUS expr
31 | expr . DIVIDE expr
32 | expr . ASSIGN expr
32 | expr ASSIGN expr .
ASSIGN shift, and go to state 36
ADD shift, and go to state 37
MINUS shift, and go to state 38
MULT shift, and go to state 39
DIVIDE shift, and go to state 40
ASSIGN [reduce using rule 32 (expr)]
ADD [reduce using rule 32 (expr)]
MINUS [reduce using rule 32 (expr)]
MULT [reduce using rule 32 (expr)]
DIVIDE [reduce using rule 32 (expr)]
$default reduce using rule 32 (expr)
I don't understand: why would it choose rule 32 when it reads ADD, MULT, DIVIDE or other tokens? What's wrong with this part of my grammar?
Also, even though the above part of the grammar is wrong, shouldn't my compiler be able to read the rest of the grammar correctly? For instance,
let a = 5
should be readable, yet the program returns a syntax error.
Your grammar looks reasonable, though it does have ambiguities in expressions, most of which could be solved by precedence. You should definitely look at ALL the conflicts reported and understand why they occur, and preferably change the grammar to get rid of them.
As for your specific issue, if you change it to have statement: ';' ;, it should accept that. You don't show any of your lexing code, so the problem may be there. It may be helpful to compile your parser with -DYYDEBUG=1 to enable the debugging code generated by yacc/bison, and set the global variable yydebug to 1 before calling yyparse, which will dump a trace of everything the parser is doing to stderr.

A yacc shift/reduce conflict on an unambiguous grammar

A piece of my grammar is driving me crazy.
I have to write a grammar that allows writing functions with multiple entry points,
e.g.
function
begin
a:
<statements>
b:
<statements>
end
The problem is with statements that are assignments like this:
ID = Expresion.
In the following quote you can see the output produced by yacc.
0 $accept : InstanciasFuncion $end
1 InstanciasFuncion : InstanciasFuncion InstanciaFuncion
2 | InstanciaFuncion
3 InstanciaFuncion : PuntoEntrada Sentencias
4 PuntoEntrada : ID ':'
5 Sentencias : Sentencias Sentencia
6 | Sentencia
7 Sentencia : ID '=' ID
State 0
0 $accept: . InstanciasFuncion $end
ID shift, and go to state 1
InstanciasFuncion go to state 2
InstanciaFuncion go to state 3
PuntoEntrada go to state 4
State 1
4 PuntoEntrada: ID . ':'
':' shift, and go to state 5
State 2
0 $accept: InstanciasFuncion . $end
1 InstanciasFuncion: InstanciasFuncion . InstanciaFuncion
$end shift, and go to state 6
ID shift, and go to state 1
InstanciaFuncion go to state 7
PuntoEntrada go to state 4
State 3
2 InstanciasFuncion: InstanciaFuncion .
$default reduce using rule 2 (InstanciasFuncion)
State 4
3 InstanciaFuncion: PuntoEntrada . Sentencias
ID shift, and go to state 8
Sentencias go to state 9
Sentencia go to state 10
State 5
4 PuntoEntrada: ID ':' .
$default reduce using rule 4 (PuntoEntrada)
State 6
0 $accept: InstanciasFuncion $end .
$default accept
State 7
1 InstanciasFuncion: InstanciasFuncion InstanciaFuncion .
$default reduce using rule 1 (InstanciasFuncion)
State 8
7 Sentencia: ID . '=' ID
'=' shift, and go to state 11
State 9
3 InstanciaFuncion: PuntoEntrada Sentencias .
5 Sentencias: Sentencias . Sentencia
ID shift, and go to state 8
ID [reduce using rule 3 (InstanciaFuncion)]
$default reduce using rule 3 (InstanciaFuncion)
Sentencia go to state 12
State 10
6 Sentencias: Sentencia .
$default reduce using rule 6 (Sentencias)
State 11
7 Sentencia: ID '=' . ID
ID shift, and go to state 13
State 12
5 Sentencias: Sentencias Sentencia .
$default reduce using rule 5 (Sentencias)
State 13
7 Sentencia: ID '=' ID .
$default reduce using rule 7 (Sentencia)
Maybe somebody can help me to disambiguate this grammar
Bison provides you with at least a hint. In State 9, which is really the only relevant part of the output other than the grammar itself, we see:
State 9
3 InstanciaFuncion: PuntoEntrada Sentencias .
5 Sentencias: Sentencias . Sentencia
ID shift, and go to state 8
ID [reduce using rule 3 (InstanciaFuncion)]
$default reduce using rule 3 (InstanciaFuncion)
Sentencia go to state 12
There's a shift/reduce conflict with ID, in the context in which the possibilities are:
Complete the parse of an InstanciaFuncion (reduce)
Continue the parse of a Sentencias (shift)
In both of those contexts, an ID is possible. It's easy to construct an example. Consider these two instancias:
f : a = b c = d ...
f : a = b c : d = ...
We've finished with the b and c is the lookahead, so we can't see the symbol which follows the c. Now, have we finished parsing the funcion f? Or should we try for a longer list of sentencias? No se sabe. (Nobody knows.)
Yes, your grammar is unambiguous, so it doesn't need to be disambiguated. It's not LR(1), though: you cannot tell what to do by only looking at the next one symbol. However, it is LR(2), and there is a proof that any LR(2) grammar has a corresponding LR(1) grammar. (For any value of 2 :) ). But, unfortunately, actually doing the transformation is not always very pretty. It can be done mechanically, but the resulting grammar can be hard to read. (See Notes below for references.)
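To illustrate why a second token of lookahead settles the question in State 9, here is a minimal C sketch of that single decision. It is not generated parser code; the `Token` and `Decision` enums and the `decide_state9` function are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOK_ID, TOK_COLON, TOK_EQUALS, TOK_END } Token;
typedef enum { DO_SHIFT, DO_REDUCE, DO_ERROR } Decision;

/* Return the token at pos, or TOK_END once the input is exhausted. */
static Token peek(const Token *toks, size_t n, size_t pos)
{
    return pos < n ? toks[pos] : TOK_END;
}

/*
 * Decision in State 9 (just after "PuntoEntrada Sentencias"), using two
 * tokens of lookahead starting at toks[pos] instead of one.
 */
Decision decide_state9(const Token *toks, size_t n, size_t pos)
{
    assert(toks != NULL || n == 0);
    Token next = peek(toks, n, pos);
    Token after_next = peek(toks, n, pos + 1);

    if (next == TOK_END)
        return DO_REDUCE;   /* end of input: the InstanciaFuncion is complete */
    if (next != TOK_ID)
        return DO_ERROR;
    if (after_next == TOK_EQUALS)
        return DO_SHIFT;    /* ID '=' ...: one more Sentencia follows */
    if (after_next == TOK_COLON)
        return DO_REDUCE;   /* ID ':' ...: a new PuntoEntrada begins */
    return DO_ERROR;
}
```

With only `next` available, both `c = d` and `c : d = ...` look identical (an ID), which is exactly the reported conflict; the second token tells them apart.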
In your case, it's pretty easy to find an equivalent grammar, but the parse tree will need to be adjusted. Here's one example:
InstanciasFuncion : PuntoEntrada
| InstanciasFuncion PuntoEntrada
| InstanciasFuncion Sentencia
PuntoEntrada: ID ':' Sentencia
Sentencia : ID '=' ID
It's a curious fact that this precise shift/reduce conflict is a feature of the grammar of bison itself, since bison accepts grammars as written above (i.e. without semi-colons). POSIX insists that yacc do so, and bison tries to emulate yacc. Bison itself solves this problem in the scanner, not in the grammar: its scanner recognizes "ID :" as a single token (even if separated with arbitrary whitespace). That might also be your best bet.
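A rough sketch of that scanner trick, written by hand in C rather than taken from bison's sources: an identifier followed by optional whitespace and a colon comes back as one token, so the grammar can use `PuntoEntrada : ID_COLON` and never has to choose between the two readings of a bare ID. The `TokenKind` enum and `next_token` function are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum { TK_END, TK_ID, TK_ID_COLON, TK_EQUALS, TK_ERROR } TokenKind;

/*
 * Scan one token starting at *p and advance *p past it.
 * "name" followed by optional whitespace and ':' is folded into the
 * single token TK_ID_COLON; any other identifier is a plain TK_ID.
 */
TokenKind next_token(const char **p)
{
    assert(p != NULL && *p != NULL);
    const char *s = *p;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') { *p = s; return TK_END; }
    if (*s == '=')  { *p = s + 1; return TK_EQUALS; }

    if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        /* Extra lookahead done by the scanner, not the parser. */
        const char *look = s;
        while (isspace((unsigned char)*look))
            look++;
        if (*look == ':') { *p = look + 1; return TK_ID_COLON; }
        *p = s;
        return TK_ID;
    }

    *p = s + 1;     /* skip the offending character */
    return TK_ERROR;
}
```

Scanning `f : a = b c = d g: x = y` this way yields ID_COLON, ID, '=', ID, ID, '=', ID, ID_COLON, ID, '=', ID, so the start of the second instancia is visible with a single token of lookahead.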
There is an excellent description of the proof that any LR(k) grammar can be covered by an LR(1) grammar, including the construction technique and a brief description of how to recover the original parse tree, in Sippu & Soisalon-Soininen, Parsing Theory, Vol. II (Springer Verlag, 1990). This two-volume set is a great reference for theoreticians, and has a lot of valuable practical information, but it's heavy reading and it's also a serious investment. If you have a university library handy, there should be a copy of it available. The algorithm presented is due to MD Mickunas, and was published in 1976 in JACM 23:17-30 (paywalled), which you should also be able to find in a good university library. Failing that, I found a very abbreviated description in Richard Marion Schell's thesis.
Personally, I wouldn't bother with all that, though. Either use a GLR parser, or use the same trick bison uses for the same purpose. Or use the simple grammar in the answer above and fiddle with the AST afterwards; it's not really difficult.

byacc shift/reduce

I'm having trouble figuring this one out as well as the shift reduce problem.
Adding ';' to the end doesn't solve the problem since I can't change the language, it needs to go just as the following example. Does any prec operand work?
The example is the following:
A variable can be declared as a pointer or as an integer, so both of these are valid:
<int> a = 0
int a = 1
the code goes:
%left '<'
declaration: variable
| declaration variable
variable : type tNAME '=' expr
| type tNAME
type : '<' type '>'
| tINT
expr : tINTEGER
| expr '<' expr
It obviously gives a shift/reduce problem after expr, since it can shift for expr of the "less" operator or reduce for another variable declaration.
I want precedence to be given on variable declaration, and have tried to create a %nonassoc prec_aux and put after '<' type '>' %prec prec_aux and after type tNAME but it doesn't solve my problem :S
How can I solve this?
Output was:
Well, I can't figure out how to post line breaks and code in a reply, so here is the output:
35: shift/reduce conflict (shift 47, reduce 7) on '<'
state 35
variable : type tNAME '=' expr . (7)
expr : expr . '+' expr (26)
expr : expr . '-' expr (27)
expr : expr . '*' expr (28)
expr : expr . '/' expr (29)
expr : expr . '%' expr (30)
expr : expr . '<' expr (31)
expr : expr . '>' expr (32)
'>' shift 46
'<' shift 47
'+' shift 48
'-' shift 49
'*' shift 50
'/' shift 51
'%' shift 52
$end reduce 7
tINT reduce 7
That's the output, and the error seems to be the one I mentioned.
Does anyone know a different solution, other than adding a new terminal to the language, which isn't really an option?
I think the resolution is to rewrite the grammar so it can look ahead somehow and see if it's a type or expr after the '<', but I'm not seeing how to.
Precedence is unlikely to work since it's the same character. Is there a way to give precedence to types that we define, such as declaration?
Thanks in advance
Your grammar gets confused in text like this:
int a = b
<int> c
That '<' on the second line could be part of an expression in the first declaration. It would have to look ahead further to find out.
This is the reason most languages have a statement terminator. This produces no conflicts:
%token tNAME
%token tINT
%token tINTEGER
%token tTERM
%left '<'
%%
declaration: variable
| declaration variable
variable : type tNAME '=' expr tTERM
| type tNAME tTERM
type : '<' type '>'
| tINT
expr : tINTEGER
| expr '<' expr
It helps when creating a parser to know how to design a grammar to eliminate possible conflicts. For that you would need an understanding of how parsers work, which is outside the scope of this answer :)
The basic problem here is that you need more lookahead than the 1 token you get with yacc/bison. When the parser sees a < it has no way of telling whether it's done with the previous declaration and it's looking at the beginning of a bracketed type, or if this is a less-than operator. There are two basic things you can do here:
Use a parsing method such as bison's %glr-parser option or btyacc, which can deal with non-LR(1) grammars
Use the lexer to do extra lookahead and return disambiguating tokens
For the latter, you would have the lexer do extra lookahead after a '<' and return a different token if it's followed by something that looks like a type. The easiest is to use flex's / lookahead operator. For example:
"<"/[ \t\n\r]*"<" return OPEN_ANGLE;
"<"/[ \t\n\r]*"int" return OPEN_ANGLE;
"<" return '<';
Then you change your bison rules to expect OPEN_ANGLE in types instead of <:
type : OPEN_ANGLE type '>'
| tINT
expr : tINTEGER
| expr '<' expr
For more complex problems, you can use flex start states, or even insert an entire token filter/transform pass between the lexer and the parser.
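If the lexer is hand-written rather than generated by flex, the same trailing-context check can be sketched directly in C. This is an illustrative stand-in for the three flex rules above, not flex output; `classify_less` is a hypothetical helper and the numeric value given to OPEN_ANGLE is an arbitrary assumption (a real parser would take it from the generated token header).

```c
#include <assert.h>
#include <string.h>

/* Assumption: 258 is a placeholder; use the value from your y.tab.h. */
enum { OPEN_ANGLE = 258 };

/*
 * Classify the '<' at s[0] by peeking past whitespace, as the flex
 * patterns "<"/[ \t\n\r]*"<" and "<"/[ \t\n\r]*"int" do.
 * Returns OPEN_ANGLE when a bracketed type starts here, otherwise the
 * plain character '<' for the less-than operator.
 * Like the flex rules, this only checks the prefix "int", so a name such
 * as "integer" after '<' would also be taken for a type.
 */
int classify_less(const char *s)
{
    assert(s != NULL && s[0] == '<');
    const char *look = s + 1;

    while (*look == ' ' || *look == '\t' || *look == '\n' || *look == '\r')
        look++;

    if (*look == '<')
        return OPEN_ANGLE;          /* nested type: < <int> > */
    if (strncmp(look, "int", 3) == 0)
        return OPEN_ANGLE;          /* <int> */
    return '<';                     /* anything else: comparison */
}
```

Only the classification is shown; consuming the '<' and returning the token to yyparse is left to the surrounding lexer.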
Here is the fix, but not entirely satisfactory:
%{
%}
%token tNAME tINT tINTEGER
%left '<'
%left '+'
%nonassoc '=' /* <-- LOOK */
%%
declaration: variable
| declaration variable
variable : type tNAME '=' expr
| type tNAME
type : '<' type '>'
| tINT
expr : tINTEGER
| expr '<' expr
| expr '+' expr
;
This issue is a conflict between these two LR items: the dot-final:
variable : type tNAME '=' expr .
and this one:
expr : expr . '<' expr
Notice that these two have different operators. It is not, as you seem to think, a conflict between different productions involving the '<' operator.
By adding = to the precedence ranking, we fix the issue in the sense that the conflict diagnostic goes away.
Note that I gave = a high precedence. This will resolve the conflict by favoring the reduce. This means that you cannot use a '<' expression as an initializer:
int name = 4 < 3 // syntax error
When the < is seen, the int name = 4 wants to be reduced, and the idea is that < must be the start of the next declaration, as part of a type production.
To allow < relational expressions to be used as initializers, add the support for parentheses into the expression grammar. Then users can parenthesize:
int foo = (4 < 3) <int> bar = (2 < 1)
There is no way to fix that without a more powerful parsing method or hacks.
What if you move the %nonassoc before %left '<', giving it low precedence? Then the shift will be favored. Unfortunately, that has the consequence that you cannot write another <int> declaration after a declaration.
int foo = 3 <int> bar = 4
^ // error: the machine shifted and is now doing: expr '<' . expr.
So that is the wrong way to resolve the conflict; you want to be able to write multiple such declarations.
Another Note:
My TXR language, which implements something equivalent to Parsing Expression Grammars, handles this grammar fine. This is essentially LL(infinite), which trumps LALR(1).
We don't even have to have a separate lexical analyzer and parser! That's just something made necessary by the limitations of one-symbol-lookahead, and the need for utmost efficiency on 1970's hardware.
Example output from shell command line, demonstrating the parse by translation to a Lisp-like abstract syntax tree, which is bound to the variable dl (declaration list). So this is complete with semantic actions, yielding an output that can be further processed in TXR Lisp. Identifiers are translated to Lisp symbols via calls to intern and numbers are translated to number objects also.
$ txr -l type.txr -
int x = 3 < 4 int y
(dl (decl x int (< 3 4)) (decl y int nil))
$ txr -l type.txr -
< int > x = 3 < 4 < int > y
(dl (decl x (pointer int) (< 3 4)) (decl y (pointer int) nil))
$ txr -l type.txr -
int x = 3 + 4 < 9 < int > y < int > z = 4 + 3 int w
(dl (decl x int (+ 3 (< 4 9))) (decl y (pointer int) nil)
(decl z (pointer int) (+ 4 3)) (decl w int nil))
$ txr -l type.txr -
<<<int>>>x=42
(dl (decl x (pointer (pointer (pointer int))) 42))
The source code of (type.txr):
@(define ws)@/[ \t]*/@(end)
@(define int)@(ws)int@(ws)@(end)
@(define num (n))@(ws)@{n /[0-9]+/}@(ws)@(filter :tonumber n)@(end)
@(define id (id))@\
@(ws)@{id /[A-Za-z_][A-Za-z_0-9]*/}@(ws)@\
@(set id @(intern id))@\
@(end)
@(define type (ty))@\
@(local l)@\
@(cases)@\
@(int)@\
@(bind ty @(progn 'int))@\
@(or)@\
<@(type l)>@\
@(bind ty @(progn '(pointer ,l)))@\
@(end)@\
@(end)
@(define expr (e))@\
@(local e1 op e2)@\
@(cases)@\
@(additive e1)@{op /[<>]/}@(expr e2)@\
@(bind e @(progn '(,(intern op) ,e1 ,e2)))@\
@(or)@\
@(additive e)@\
@(end)@\
@(end)
@(define additive (e))@\
@(local e1 op e2)@\
@(cases)@\
@(num e1)@{op /[+\-]/}@(expr e2)@\
@(bind e @(progn '(,(intern op) ,e1 ,e2)))@\
@(or)@\
@(num e)@\
@(end)@\
@(end)
@(define decl (d))@\
@(local type id expr)@\
@(type type)@(id id)@\
@(maybe)=@(expr expr)@(or)@(bind expr nil)@(end)@\
@(bind d @(progn '(decl ,id ,type ,expr)))@\
@(end)
@(define decls (dl))@\
@(coll :gap 0)@(decl dl)@(end)@\
@(end)
@(freeform)
@(decls dl)

ANTLR grammar problem with parenthetical expressions

I'm using ANTLRWorks 1.4.2 to create a simple grammar for the purpose of evaluating a user-provided expression as a boolean result. This ultimately will be part of a larger grammar, but I have some questions about this current fragment. I want users to be able to use expressions such as:
2 > 1
2 > 1 and 3 < 1
(2 > 1 or 1 < 3) and 4 > 1
(2 > 1 or 1 < 3) and (4 > 1 or (2 < 1 and 3 > 1))
The first two expressions are legal in my grammar, but the last two are not, and I am not sure why. Also, ANTLRworks seems to suggest that input such as ((((1 > 2) with mismatched parentheses is legal, and I am not sure why. So, I seem to be missing out on some insight into the right way to handle parenthetical grouping in a grammar.
How can I change my grammar to properly handle parentheses?
My grammar is below:
grammar conditional_test;
boolean
: boolean_value_expression
EOF
;
boolean_value_expression
: boolean_term (OR boolean_term)*
EOF
;
boolean_term
: boolean_factor (AND boolean_factor)*
;
boolean_factor
: (NOT)? boolean_test
;
boolean_test
: predicate
;
predicate
: expression relational_operator expression
| LPAREN boolean_value_expression RPAREN
;
relational_operator
: EQ
| LT
| GT
;
expression
: NUMBER
;
LPAREN : '(';
RPAREN : ')';
NUMBER : '0'..'9'+;
EQ : '=';
GT : '>';
LT : '<';
AND : 'and';
OR : 'or' ;
NOT : 'not';
Chris Farmer wrote:
The first two expressions are legal in my grammar, but the last two are not, and I am not sure why. ...
You should remove the EOF token from:
boolean_value_expression
: boolean_term (OR boolean_term)*
EOF
;
You normally only use the EOF after the entry point of your grammar (boolean in your case). Be careful: boolean is a reserved word in Java and therefore cannot be used as a parser rule!
So the first two rules should look like:
bool
: boolean_value_expression
EOF
;
boolean_value_expression
: boolean_term (OR boolean_term)*
;
And you may also want to ignore literal spaces by adding the following lexer rule:
SPACE : ' ' {$channel=HIDDEN;};
(you can include tabs and line breaks, of course)
Now all of your example input matches properly (tested with ANTLRWorks 1.4.2 as well).
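To see why the end-of-input check belongs only in the entry rule, here is a small hand-written recursive-descent sketch in C that mirrors the corrected rules. It is not what ANTLR generates; the `Parser` struct and every function name are hypothetical, and it evaluates the expression as it parses purely for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

typedef struct {
    const char *s; /* remaining input */
    int ok;        /* cleared on the first syntax error */
} Parser;

static void skip_ws(Parser *p) { while (isspace((unsigned char)*p->s)) p->s++; }

/* Consume the literal lit if it comes next; return 1 on success. */
static int match_lit(Parser *p, const char *lit)
{
    size_t n = strlen(lit);
    skip_ws(p);
    if (strncmp(p->s, lit, n) != 0) return 0;
    /* a keyword must not run into a following letter or digit */
    if (isalpha((unsigned char)lit[0]) && isalnum((unsigned char)p->s[n])) return 0;
    p->s += n;
    return 1;
}

static int parse_value_expression(Parser *p);

static long parse_number(Parser *p)          /* expression : NUMBER */
{
    long v = 0;
    skip_ws(p);
    if (!isdigit((unsigned char)*p->s)) { p->ok = 0; return 0; }
    while (isdigit((unsigned char)*p->s)) v = v * 10 + (*p->s++ - '0');
    return v;
}

static int parse_predicate(Parser *p)
{
    if (match_lit(p, "(")) {                 /* LPAREN boolean_value_expression RPAREN */
        int v = parse_value_expression(p);
        if (!match_lit(p, ")")) p->ok = 0;
        return v;
    }
    long lhs = parse_number(p);
    if (!p->ok) return 0;
    char op;
    if (match_lit(p, "=")) op = '=';
    else if (match_lit(p, "<")) op = '<';
    else if (match_lit(p, ">")) op = '>';
    else { p->ok = 0; return 0; }
    long rhs = parse_number(p);
    return op == '=' ? lhs == rhs : op == '<' ? lhs < rhs : lhs > rhs;
}

static int parse_factor(Parser *p)           /* (NOT)? boolean_test */
{
    int negate = match_lit(p, "not");
    int v = parse_predicate(p);
    return negate ? !v : v;
}

static int parse_term(Parser *p)             /* boolean_factor (AND boolean_factor)* */
{
    int v = parse_factor(p);
    while (p->ok && match_lit(p, "and")) { int r = parse_factor(p); v = v && r; }
    return v;
}

static int parse_value_expression(Parser *p) /* boolean_term (OR boolean_term)*, no EOF here */
{
    int v = parse_term(p);
    while (p->ok && match_lit(p, "or")) { int r = parse_term(p); v = v || r; }
    return v;
}

/* Entry rule: boolean_value_expression EOF. Returns 1 or 0, or -1 on a syntax error. */
int eval_bool(const char *text)
{
    assert(text != NULL);
    Parser p = { text, 1 };
    int v = parse_value_expression(&p);
    skip_ws(&p);
    if (!p.ok || *p.s != '\0') return -1;    /* EOF is checked only at the top */
    return v;
}
```

Because `parse_value_expression` is also called from inside the parentheses, requiring end of input there would reject every parenthesized expression, which is the failure described in the question.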
Chris Farmer wrote:
Also, ANTLRworks seems to suggest that input such as ((((1 > 2) with mismatched parentheses is legal, ...
No, ANTLRWorks does produce errors, perhaps not very noticeable ones. The parse tree ANTLRWorks produces has a NoViableAltException as a leaf, and there are some errors on the "Console" tab.