Bison: Production ignoring required syntax - grammar

I have the following rule in my .y file:
statement:
expression |
REDUCE operator reductions ENDREDUCE |
IF expression THEN statement_ ELSE statement_ ENDIF |
CASE expression IS cases OTHERS ARROW statement_ ENDCASE
;
cases:
case cases |
;
case:
WHEN INT_LITERAL ARROW statement_
;
The cases rule is a list of case statements. After the cases, the OTHERS ARROW statement_ portion is required as a default (like the default branch of a switch/case) before the ENDCASE token. However, when I test it, the parser does not report a syntax error when that portion is missing:
./compile < tests/syntax5.txt
1 // Multiple errors
2
3 function main a integer returns real;
syntax error, unexpected INTEGER, expecting ':'
4 b: integer is * 2;
syntax error, unexpected MULOP
5 c: real is 6.0;
6 begin
7 if a > c then
8 b + / 4.;
syntax error, unexpected MULOP
9 else
10 case b is
11 when => 2;
syntax error, unexpected ARROW, expecting INT_LITERAL
12 when 2 => c;
13 endcase;
14 endif;
15 end;
Lexical Errors: 0
Syntax Errors: 4
Semantic Errors: 0
Duplicate Identifier Errors: 0
Undeclared Errors: 0
Total Errors: 4
Did I set something up wrong?

Sorry, the cause was error recovery. I added an error ';' alternative to the case rule:
case:
WHEN INT_LITERAL ARROW statement_ |
error ';'
;
And now I see the error:
1 // Multiple errors
2
3 function main a integer returns real;
syntax error, unexpected INTEGER, expecting ':'
4 b: integer is * 2;
syntax error, unexpected MULOP
5 c: real is 6.0;
6 begin
7 if a > c then
8 b + / 4.;
syntax error, unexpected MULOP
9 else
10 case b is
11 when => 2;
syntax error, unexpected ARROW, expecting INT_LITERAL
12 when 2 => c;
13 endcase;
syntax error, unexpected ENDCASE, expecting WHEN or OTHERS
14 endif;
15 end;
Lexical Errors: 0
Syntax Errors: 5
Semantic Errors: 0
Duplicate Identifier Errors: 0
Undeclared Errors: 0
Total Errors: 5

Related

Antlr "CASE" error path

I am using ANTLR 2.7.6.
I am writing a parser for the PLC 61131-3 ST language and I can't resolve an issue with my grammar.
The grammar is:
case_Stmt : 'CASE' expression 'OF' case_Selection + ( 'ELSE' stmt_List )? 'END_CASE';
case_Selection : case_List ':' stmt_List;
case_List : case_List_Elem ( ',' case_List_Elem )*;
case_List_Elem : subrange | constant_Expr;
constant_Expr : constant | enum_Value;
stmt_List : ( Stmt ? ';' )*;
stmt : assign_Stmt | subprog_Ctrl_Stmt | selection_Stmt | Iteration_Stmt;
assign_Stmt : ( variable ':=' expression );
enum_Value : ( identifier '#' )? identifier;
variable : identifier | ...
The problem occurs when an "enum_Value" starts a "case_Selection": the parser interprets it as a new "stmt" instead of the new "case_Selection" it is supposed to be.
Example:
CASE (enumVariable) OF
enum#literal1: Variable1 := 1;
enum#liteal2: Variable1 := 2;
enum#liteal3: Variable1 := 3;
ELSE
Variable1 := 4;
END_CASE;
In the above example, instead of taking "enum#liteal2" as the start of a new "case_Selection", the parser interprets it as an "assign_Stmt" and reports an error because it does not find the ':='.
Is there a way to read ahead until we find the ':' or the ':=' so that we can tell whether we really have a new "stmt" or not?
Thank you!
Edit1: better syntax;

ANTLR4: no viable alternative at input error

I'm using a cut-down version of a Pascal grammar to create a compiler that converts Pascal to JavaScript. However, I keep running into these errors:
line 3:4 no viable alternative at input 'PROCEDURE'
line 3:38 extraneous input ':' expecting {'END', ';'}
line 5:4 no viable alternative at input 'VAR'
The following are the relevant parts of my grammar:
grammar pascal;
program
: programHeading ('INTERFACE')?
block
DOT
;
programHeading
: 'PROGRAM' identifier (LPAREN identifierList RPAREN)? SEMI
| 'UNIT' identifier SEMI
;
identifier
: IDENT
;
block
: ( labelDeclarationPart
| constantDefinitionPart
| typeDefinitionPart
| variableDeclarationPart
| procedureAndFunctionDeclarationPart
| usesUnitsPart
| 'IMPLEMENTATION'
)*
| compoundStatement
;
procedureAndFunctionDeclarationPart
: procedureOrFunctionDeclaration SEMI
;
procedureOrFunctionDeclaration
: procedureDeclaration
| functionDeclaration
;
procedureDeclaration
: 'PROCEDURE' identifier (formalParameterList)? SEMI
( block | directive )
;
functionDeclaration
: 'FUNCTION' identifier (formalParameterList)? COLON resultType SEMI
( block | directive )
;
compoundStatement
: 'BEGIN'
statements
'END'
;
statements
: statement ( SEMI statement )*
;
statement
: label COLON unlabelledStatement
| unlabelledStatement
;
I'm using antlr-4.5-complete and was hoping someone could shed some light on this.
This is the program I'm trying to compile:
PROGRAM Lesson1_PROGRAM3;
BEGIN
PROCEDURE DrawLine(X : Integer; Y : Integer);
VAR
Num1, Num2, Sum : Integer;
BEGIN
Write('Input number 1:');
Readln(Num1);
Writeln('Input number 2:');
Readln(Num2);
Sum := Num1 + Num2;
Writeln(Sum);
Readln;
IF Sel = '1' THEN
BEGIN
Total := N1 + N2;
Write('Press any key TO continue...');
Readkey;
GOTO 1;
END;
FOR Counter := 1 TO 7 DO
writeln('for loop');
Readln;
END;
END.

“list label” operator not working for a set of alternatives

It seems the following rule does not work in ANTLR 4:
testSetLabel
: (flags+=( 'A' | 'B' | 'C' | 'D' ))* ;
It will give this error:
TestSetLabelParser.java:69: error: incompatible types
((TestSetLabelContext)_localctx).flags = _input.LT(1);
^
required: List<Token>
found: Token
If I change the rule to this:
testSetLabel2
: ( flags+= 'A' | flags+='B' | flags+='C' | flags+='D' )* ;
I get the warning 'Factor label out of set'.
Is this a bug or expected behavior?
It sounds like a bug. The = operator works, as in the following case.
flags=('A' | 'B' | 'C' | 'D')
The message you are seeing is only a performance suggestion, so I would use the working method for now and factor the label out of the set when ANTLR 4.1 comes out at the end of June.
Here is the issue report

Why does this simple grammar have a shift/reduce conflict?

%token <token> PLUS MINUS INT
%left PLUS MINUS
THIS WORKS:
exp : exp PLUS exp;
exp : exp MINUS exp;
exp : INT;
THIS HAS 2 SHIFT/REDUCE CONFLICTS:
exp : exp binaryop exp;
exp : INT;
binaryop: PLUS | MINUS ;
WHY?
This is because the second is in fact ambiguous. So is the first grammar, but you resolved the ambiguity by adding %left.
This %left does not work in the second grammar, because associativity and precedence are not inherited from rule to rule. I.e. the binaryop nonterminal does not inherit any such thing even though it produces PLUS and MINUS. Associativity and precedence are localized to a rule, and revolve around terminal symbols.
We cannot do %left binaryop, but we can slightly refactor the grammar:
exp : exp binaryop term;
exp : term;
term : INT;
binaryop: PLUS | MINUS ;
That has no conflicts now because it is implicitly left-associative. I.e. the production of a longer and longer expression can only happen on the left side of the binaryop, because the right side is a term which produces only an INT.
You need to specify a precedence for the exp binaryop exp rule if you want the precedence rules to resolve the ambiguity:
exp : exp binaryop exp %prec PLUS;
With that change, all the conflicts are resolved.
Edit
The comments seem to indicate some confusion as to what the precedence rules in yacc/bison do.
The precedence rules are a way of semi-automatically resolving shift/reduce conflicts in the grammar. They're only semi-automatic in that you have to know what you are doing when you specify the precedences.
Basically, whenever there is a shift/reduce conflict between a token to be shifted and a rule to be reduced, yacc compares the precedence of the token to be shifted and the rule to be reduced, and -- as long as both have assigned precedences -- does whichever is higher precedence. If either the token or the rule has no precedence assigned, then the conflict is reported to the user.
%left/%right/%nonassoc come into the picture when the token and rule have the SAME precedence. In that case %left means do the reduce, %right means do the shift, and %nonassoc means do neither, causing a syntax error at runtime if the parser runs into this case.
The precedence levels themselves are assigned to tokens with %left/%right/%nonassoc and to rules with %prec. The only oddness is that rules with no %prec and at least one terminal on the RHS get the precedence of the last terminal on the RHS. This can sometimes end up assigning precedences to rules that you really don't want to have precedence, which can sometimes hide conflicts by resolving them incorrectly. You can avoid these problems by adding an extra level of indirection in the rule in question -- change the problematic terminal on the RHS to a new non-terminal that expands to just that terminal.
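The resolution procedure described above can be modelled in a few lines of Python. This is only a sketch of the decision logic, not of how yacc or Bison are implemented; the names `resolve` and `rule_precedence` and the `(level, associativity)` tuples are assumptions made for illustration.

```python
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical model of a precedence declaration: (level, associativity).
# A higher level binds tighter; None means "no precedence assigned".
Prec = Optional[Tuple[int, str]]


def rule_precedence(rhs: Sequence[str], terminals: Set[str],
                    token_precs: Dict[str, Tuple[int, str]],
                    explicit: Optional[str] = None) -> Prec:
    """Precedence of a rule: %prec if given, else that of its last terminal."""
    if explicit is not None:
        return token_precs.get(explicit)
    for symbol in reversed(rhs):
        if symbol in terminals:
            return token_precs.get(symbol)
    return None  # no terminal on the RHS, so no precedence


def resolve(rule_prec: Prec, token_prec: Prec) -> str:
    """Decide one shift/reduce conflict between a rule and a lookahead token."""
    if rule_prec is None or token_prec is None:
        return "conflict"  # reported to the user
    rule_level, _ = rule_prec
    token_level, assoc = token_prec
    if rule_level > token_level:
        return "reduce"
    if token_level > rule_level:
        return "shift"
    # Same level: the associativity of the token decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]


terminals = {"PLUS", "MINUS", "INT"}
precs = {"PLUS": (1, "left"), "MINUS": (1, "left")}  # %left PLUS MINUS

direct = rule_precedence(["exp", "PLUS", "exp"], terminals, precs)
indirect = rule_precedence(["exp", "binaryop", "exp"], terminals, precs)
forced = rule_precedence(["exp", "binaryop", "exp"], terminals, precs,
                         explicit="PLUS")  # exp binaryop exp %prec PLUS

print(resolve(direct, precs["PLUS"]))    # reduce
print(resolve(indirect, precs["PLUS"]))  # conflict
print(resolve(forced, precs["MINUS"]))   # reduce
```

In this model the rule `exp : exp binaryop exp` has no terminal on its right-hand side, so it carries no precedence and the conflict is reported, whereas adding `%prec PLUS` gives it the level of PLUS and `%left` then selects the reduce.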
I assume that this falls under what the Bison manual calls "Mysterious Conflicts". You can replicate that with:
exp: exp plus exp;
exp: exp minus exp;
exp: INT;
plus: PLUS;
minus: MINUS;
which gives four S/R conflicts for me.
The output file describing the conflicted grammar produced by Bison (version 2.3) on Linux is as follows. The key information at the top is 'State 7 conflicts: 2 shift/reduce'.
State 7 conflicts: 2 shift/reduce
Grammar
0 $accept: exp $end
1 exp: exp binaryop exp
2 | INT
3 binaryop: PLUS
4 | MINUS
Terminals, with rules where they appear
$end (0) 0
error (256)
PLUS (258) 3
MINUS (259) 4
INT (260) 2
Nonterminals, with rules where they appear
$accept (6)
on left: 0
exp (7)
on left: 1 2, on right: 0 1
binaryop (8)
on left: 3 4, on right: 1
state 0
0 $accept: . exp $end
INT shift, and go to state 1
exp go to state 2
state 1
2 exp: INT .
$default reduce using rule 2 (exp)
state 2
0 $accept: exp . $end
1 exp: exp . binaryop exp
$end shift, and go to state 3
PLUS shift, and go to state 4
MINUS shift, and go to state 5
binaryop go to state 6
state 3
0 $accept: exp $end .
$default accept
state 4
3 binaryop: PLUS .
$default reduce using rule 3 (binaryop)
state 5
4 binaryop: MINUS .
$default reduce using rule 4 (binaryop)
state 6
1 exp: exp binaryop . exp
INT shift, and go to state 1
exp go to state 7
And here is the information about 'State 7':
state 7
1 exp: exp . binaryop exp
1 | exp binaryop exp .
PLUS shift, and go to state 4
MINUS shift, and go to state 5
PLUS [reduce using rule 1 (exp)]
MINUS [reduce using rule 1 (exp)]
$default reduce using rule 1 (exp)
binaryop go to state 6
The trouble is described by the . markers in the lines marked 1. For some reason, the %left is not 'taking effect' as you'd expect, so Bison identifies a conflict when it has read exp PLUS exp and finds a PLUS or MINUS after it. In such cases, Bison (and Yacc) do the shift rather than the reduce. In this context, that seems to me to be tantamount to making the operators right-associative.
Changing the %left to %right, or omitting it, does not change the result (in terms of the conflict warnings). I also tried Yacc on Solaris and it produced essentially the same conflicts.
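To see what "shift rather than reduce" does to the parse, here is a small hand-written shift-reduce loop in Python for the grammar exp : exp op exp | INT. It is a sketch under simplifying assumptions, not Bison's generated automaton: tokens are plain Python ints and the strings "+" and "-", and the hypothetical `on_conflict` parameter stands in for the decision made in state 7.

```python
def parse(tokens, on_conflict):
    """Shift-reduce parse of INT (op INT)*; returns a nested (left, op, right) tree.

    on_conflict is "shift" or "reduce" and is consulted only when the stack
    ends in `exp op exp` and another operator is the lookahead.
    """
    stack = []
    pos = 0
    while True:
        lookahead = tokens[pos] if pos < len(tokens) else None
        # The stack alternates exp, op, exp, ...; an odd length of at least 3
        # means it currently ends in `exp op exp`.
        can_reduce = len(stack) >= 3 and len(stack) % 2 == 1
        if can_reduce and (lookahead is None or on_conflict == "reduce"):
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append((left, op, right))
        elif lookahead is not None:
            stack.append(lookahead)  # shift
            pos += 1
        else:
            break
    return stack[0]


def evaluate(tree):
    """Evaluate a tree produced by parse()."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == "+" else lhs - rhs


tokens = [10, "-", 3, "-", 2]
print(parse(tokens, "shift"))               # (10, '-', (3, '-', 2))
print(parse(tokens, "reduce"))              # ((10, '-', 3), '-', 2)
print(evaluate(parse(tokens, "shift")))     # 9
print(evaluate(parse(tokens, "reduce")))    # 5
```

Preferring the shift groups the operators to the right, which is why the default resolution behaves like right associativity, while preferring the reduce gives the left-associative result that %left is meant to produce.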
So, why does the first grammar work? Here's the output:
Grammar
0 $accept: exp $end
1 exp: exp PLUS exp
2 | exp MINUS exp
3 | INT
Terminals, with rules where they appear
$end (0) 0
error (256)
PLUS (258) 1
MINUS (259) 2
INT (260) 3
Nonterminals, with rules where they appear
$accept (6)
on left: 0
exp (7)
on left: 1 2 3, on right: 0 1 2
state 0
0 $accept: . exp $end
INT shift, and go to state 1
exp go to state 2
state 1
3 exp: INT .
$default reduce using rule 3 (exp)
state 2
0 $accept: exp . $end
1 exp: exp . PLUS exp
2 | exp . MINUS exp
$end shift, and go to state 3
PLUS shift, and go to state 4
MINUS shift, and go to state 5
state 3
0 $accept: exp $end .
$default accept
state 4
1 exp: exp PLUS . exp
INT shift, and go to state 1
exp go to state 6
state 5
2 exp: exp MINUS . exp
INT shift, and go to state 1
exp go to state 7
state 6
1 exp: exp . PLUS exp
1 | exp PLUS exp .
2 | exp . MINUS exp
$default reduce using rule 1 (exp)
state 7
1 exp: exp . PLUS exp
2 | exp . MINUS exp
2 | exp MINUS exp .
$default reduce using rule 2 (exp)
The difference seems to be that in states 6 and 7, it is able to distinguish what to do based on what comes next.
One way of fixing the problem is:
%token <token> PLUS MINUS INT
%left PLUS MINUS
%%
exp : exp binaryop term;
exp : term;
term : INT;
binaryop: PLUS | MINUS;

Shift reduce conflict

I'm having a problem understanding the shift/reduce conflict for a grammar that I know has no ambiguity. The case is one of the if-else type, but it's not the 'dangling else' problem since I have mandatory END clauses delimiting code blocks.
Here is the grammar for gppg (it's a Bison-like compiler compiler ... and that was not an echo):
%output=program.cs
%start program
%token FOR
%token END
%token THINGS
%token WHILE
%token SET
%token IF
%token ELSEIF
%token ELSE
%%
program : statements
;
statements : /*empty */
| statements stmt
;
stmt : flow
| THINGS
;
flow : '#' IF '(' ')' statements else
;
else : '#' END
| '#' ELSE statements '#' END
| elseifs
;
elseifs : elseifs '#' ELSEIF statements else
| '#' ELSEIF statements else
;
Here is the conflict output:
// Parser Conflict Information for grammar file "program.y"
Shift/Reduce conflict on symbol "'#'", parser will shift
Reduce 10: else -> elseifs
Shift "'#'": State-22 -> State-23
Items for From-state State 22
10 else: elseifs .
-lookahead: '#', THINGS, EOF
11 elseifs: elseifs . '#' ELSEIF statements else
Items for Next-state State 23
11 elseifs: elseifs '#' . ELSEIF statements else
// End conflict information for parser
I have already switched everything around, and I do know how to resolve it, but that solution involves giving up the left recursion on 'elseif' for a right recursion.
I've been through all the scarce documentation I have found on the internet regarding this issue (I post some links at the end) and still have not found an elegant solution. I know about ANTLR and I don't want to consider it right now. Please limit your solution to Yacc/Bison parsers.
I would appreciate elegant solutions. I managed to do it by eliminating the /* empty */ rules and duplicating everything that needed an empty list, but in the larger grammar I'm working on it just ends up as 'spaghetti grammar syndrome'.
Here are some links:
http://nitsan.org/~maratb/cs164/bison.html
http://compilers.iecc.com/comparch/article/98-01-079
GPPG, the parser I'm using
Bison manual
Your revised ELSEIF rule has no markers for a condition -- it should nominally have '(' and ')' added.
More seriously, you now have a rule for
elsebody : else
| elseifs else
;
and
elseifs : /* Nothing */
| elseifs ...something...
;
The 'nothing' is not needed; it is implicitly taken care of by the 'elsebody' without the 'elseifs'.
I would be very inclined to use rules 'opt_elseifs', 'opt_else', and 'end':
flow : '#' IF '(' ')' statements opt_elseifs opt_else end
;
opt_elseifs : /* Nothing */
| opt_elseifs '#' ELSEIF '(' ')' statements
;
opt_else : /* Nothing */
| '#' ELSE statements
;
end : '#' END
;
I've not run this through a parser generator, but I find this relatively easy to understand.
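As a way of checking what language the opt_elseifs / opt_else / end structure is meant to accept, here is a hand-written recursive-descent recognizer in Python. It is a sketch, not generated parser output, and it says nothing about whether gppg accepts the grammar without conflicts. The helper names are hypothetical, the token list is assumed to be already split into strings, and ELSEIF is used as in the question's %token list. Note that the recognizer looks at two tokens ('#' plus the keyword) to decide what comes next.

```python
def expect(toks, i, *wanted):
    """Consume the given tokens in order; return the index after them."""
    for want in wanted:
        if i >= len(toks) or toks[i] != want:
            found = toks[i] if i < len(toks) else "EOF"
            raise SyntaxError(f"expected {want!r} at token {i}, found {found!r}")
        i += 1
    return i


def parse_statements(toks, i):
    """statements : /* empty */ | statements stmt ;  stmt : flow | THINGS ;"""
    while True:
        if toks[i:i + 1] == ["THINGS"]:
            i += 1
        elif toks[i:i + 2] == ["#", "IF"]:
            i = parse_flow(toks, i)
        else:
            return i


def parse_flow(toks, i):
    """flow : '#' IF '(' ')' statements opt_elseifs opt_else end ;"""
    i = expect(toks, i, "#", "IF", "(", ")")
    i = parse_statements(toks, i)
    while toks[i:i + 2] == ["#", "ELSEIF"]:      # opt_elseifs
        i = expect(toks, i, "#", "ELSEIF", "(", ")")
        i = parse_statements(toks, i)
    if toks[i:i + 2] == ["#", "ELSE"]:           # opt_else
        i = expect(toks, i, "#", "ELSE")
        i = parse_statements(toks, i)
    return expect(toks, i, "#", "END")           # end


def accepts(toks):
    """True if the whole token list is a valid program."""
    try:
        return parse_statements(toks, 0) == len(toks)
    except SyntaxError:
        return False


print(accepts("# IF ( ) THINGS # ELSEIF ( ) THINGS # ELSE THINGS # END".split()))
print(accepts("# IF ( ) # ELSE # ELSEIF ( ) # END".split()))
```

The first call prints True and the second prints False, because an ELSEIF after the ELSE branch is not allowed by the rule order opt_elseifs, opt_else, end.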
I think the problem is in the elseifs clause.
elseifs : elseifs '#' ELSEIF statements else
| '#' ELSEIF statements else
;
I think the first alternative is not required, since the else clause refers back to elseifs anyway:
else : '#' END
| '#' ELSE statements '#' END
| elseifs
;
What happens if you change elseifs?:
elseifs : '#' ELSEIF statements else
;
The answer from Jonathan above seems like it would be the best, but since it's not working for you, I have a few suggestions you could try that will help you in debugging the error.
Firstly have you considered making the hash/sharp symbol a part of the tokens themselves (i.e. #END, #IF, etc)? So that they get taken out by the lexer, meaning they don't have to be included in the parser.
Secondly I would urge you to rewrite the rules without duplicating any token streams. (Part of the Don't Repeat Yourself principle.) So the rule " '#' ELSEIF statements else " should only exist in one place in that file (not two as you have above).
Lastly I suggest that you look into precedence and associativity of the IF/ELSEIF/ELSE tokens. I know that you should be able to write a parser that doesn't require this but it might be the thing that you need in this case.
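The first suggestion, folding the '#' into the keyword tokens, might look like this on the lexer side. This is a minimal Python sketch rather than gppg/gplex code: the spellings `#if`, `#elseif`, `#else`, and `#end` are assumptions (the question names the tokens but not their text), and every other word is treated as THINGS purely for illustration.

```python
import re

# Assumed spellings for the directive tokens; hypothetical, not from the question.
KEYWORDS = {"#if": "IF", "#elseif": "ELSEIF", "#else": "ELSE", "#end": "END"}

# A directive ('#' plus letters), a parenthesis, or any other run of characters.
LEXEME_RE = re.compile(r"#[a-z]*|[()]|[^\s#()]+")


def tokenize(text):
    """Return token names, with '#' already merged into the keyword tokens."""
    tokens = []
    for lexeme in LEXEME_RE.findall(text):
        if lexeme.startswith("#"):
            if lexeme not in KEYWORDS:
                raise ValueError(f"unknown directive {lexeme!r}")
            tokens.append(KEYWORDS[lexeme])
        elif lexeme in ("(", ")"):
            tokens.append(lexeme)
        else:
            tokens.append("THINGS")
    return tokens


print(tokenize("#if () x #elseif y #else z #end"))
# ['IF', '(', ')', 'THINGS', 'ELSEIF', 'THINGS', 'ELSE', 'THINGS', 'END']
```

With tokens like these the parser sees IF, ELSEIF, ELSE, or END directly as its lookahead, instead of a '#' that could begin any of them, which is the symbol every conflict report in this thread is on.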
I'm still switching things around, and my original question had some errors, since the elseifs sequence always had an else at the end, which was wrong. Here is another take on the question; this time I get two shift/reduce conflicts:
flow : '#' IF '(' ')' statements elsebody
;
elsebody : else
| elseifs else
;
else : '#' ELSE statements '#' END
| '#' END
;
elseifs : /* empty */
| elseifs '#' ELSEIF statements
;
The conflicts now are:
// Parser Conflict Information for grammar file "program.y"
Shift/Reduce conflict on symbol "'#'", parser will shift
Reduce 12: elseifs -> /* empty */
Shift "'#'": State-10 -> State-13
Items for From-state State 10
7 flow: '#' IF '(' ')' statements . elsebody
4 statements: statements . stmt
Items for Next-state State 13
10 else: '#' . ELSE statements '#' END
11 else: '#' . END
7 flow: '#' . IF '(' ')' statements elsebody
Shift/Reduce conflict on symbol "'#'", parser will shift
Reduce 13: elseifs -> elseifs, '#', ELSEIF, statements
Shift "'#'": State-24 -> State-6
Items for From-state State 24
13 elseifs: elseifs '#' ELSEIF statements .
-lookahead: '#'
4 statements: statements . stmt
Items for Next-state State 6
7 flow: '#' . IF '(' ')' statements elsebody
// End conflict information for parser
Empty rules just aggravate gppg, I'm afraid. But they seem so natural to use that I keep trying them.
I already know right recursion solves the problem as 1800 INFORMATION has said. But I'm looking for a solution with left recursion on the elseifs clause.
elsebody : elseifs else
| elseifs
;
elseifs : /* empty */
| elseifs '#' ELSEIF statements
;
else : '#' ELSE statements '#' END
;
I think this should left recurse and always terminate.
OK - here is a grammar (not minimal) for if blocks. I dug it out of some code I have (called adhoc, based on hoc from Kernighan & Pike's "The UNIX Programming Environment"). This outline grammar compiles with Yacc with no conflicts.
%token NUMBER IF ELSE
%token ELIF END
%token THEN
%start program
%%
program
: stmtlist
;
stmtlist
: /* Nothing */
| stmtlist stmt
;
stmt
: ifstmt
;
ifstmt
: ifcond endif
| ifcond else begin
| ifcond eliflist begin
;
ifcond
: ifstart cond then stmtlist
;
ifstart
: IF
;
cond
: '(' expr ')'
;
then
: /* Nothing */
| THEN
;
endif
: END IF begin
;
else
: ELSE stmtlist END IF
;
eliflist
: elifblock
| elifcond eliflist begin /* RIGHT RECURSION? */
;
elifblock
: elifcond else begin
| elifcond endif
;
elifcond
: elif cond then stmtlist end
;
elif
: ELIF
;
begin
: /* Nothing */
;
end
: /* Nothing */
;
expr
: NUMBER
;
%%
I used 'NUMBER' as the dummy element, instead of THINGS, and I used ELIF instead of ELSEIF. It includes a THEN, but that is optional. The 'begin' and 'end' operations were used to grab the program counter in the generated program - and therefore should be removable from this without affecting it.
There was a reason I thought I needed to use right recursion instead of the normal left recursion - but I think it was to do with the code generation strategy I was using, rather than anything else. The question mark in the comment was in the original; I remember not being happy with it. The program as a whole does work - it is a project that's been on the back burner for the last decade or so (hmmm...I did some work at the end of 2004 and beginning of 2005; prior to that, it was 1992 and 1993).
I've not spent the time working out why this compiles conflict-free and what I outlined earlier does not. I hope it helps.