Is there a shift/reduce error in this yacc code?

I'm getting a message from yacc saying that there is a shift/reduce conflict. I think it's coming from this part of the yacc file.
statement : expression_stmt
| compound_stmt
| selection_stmt
| iteration_stmt
| return_stmt ;
selection_stmt : IF '(' expression ')' statement
| IF '(' expression ')' statement ELSE statement ;
expression : var '=' expression | simple_expression ;
Can you see a conflict? How can it be fixed?

Yes, I'm seeing a conflict. The selection_stmt rule matches statements like
IF(<expression 1>)
THEN
    IF(<expression 2>)
    THEN <expression statement 1>
    ELSE <expression statement 2>
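But that's ambiguous. It could also be
IF(<expression 1>)
THEN
    IF(<expression 2>)
    THEN <expression statement 1>
ELSE <expression statement 2>
This is the classic "dangling else" ambiguity. yacc resolves the shift/reduce conflict by shifting, which attaches the ELSE to the nearest IF (the reading in which ELSE <expression statement 2> belongs to IF(<expression 2>)), and that is usually what you want. To make the choice explicit and silence the warning, give the ELSE-less production a lower precedence than ELSE: for example, declare %nonassoc LOWER_THAN_ELSE followed by %nonassoc ELSE, and end the first selection_stmt alternative with %prec LOWER_THAN_ELSE (LOWER_THAN_ELSE is just a made-up token name). Alternatively, rewrite the grammar with separate "matched" and "unmatched" statement nonterminals.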
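To see which reading yacc's default shift produces, here is a toy recursive-descent parser in Python. It is a hypothetical sketch, not yacc output: parentheses and THEN are dropped, any token other than IF/ELSE counts as a simple statement, and parse_statement is a made-up name. Whenever an ELSE follows, it is consumed by the innermost IF that is still open, which is the same outcome as shifting ELSE instead of reducing the shorter rule.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement from tokens[pos:] and return (tree, next_pos).

    Toy grammar (parentheses and THEN omitted):
        statement := 'IF' condition statement ('ELSE' statement)?
                   | <any other single token>
    """
    if tokens[pos] == "IF":
        condition = tokens[pos + 1]
        then_branch, pos = parse_statement(tokens, pos + 2)
        # Greedy choice: an ELSE that follows is taken by the innermost IF
        # that is still open, the same result as yacc shifting ELSE.
        if pos < len(tokens) and tokens[pos] == "ELSE":
            else_branch, pos = parse_statement(tokens, pos + 1)
            return ("if", condition, then_branch, else_branch), pos
        return ("if", condition, then_branch, None), pos
    return tokens[pos], pos + 1


tree, _ = parse_statement(["IF", "e1", "IF", "e2", "s1", "ELSE", "s2"])
print(tree)  # ('if', 'e1', ('if', 'e2', 's1', 's2'), None)
```

The outer IF ends up with no ELSE branch, matching the first indented layout.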

Related

I'm just starting with ANTLR and I can't decipher where I'm messing up with a mismatched input error

I've just started using ANTLR, so I'd really appreciate the help! I'm just trying to make a variable declaration rule, but it's not working! I've put the files I'm working with below; please let me know if you need anything else!
INPUT CODE:
var test;
GRAMMAR G4 FILE:
grammar treetwo;
program : (declaration | statement)+ EOF;
declaration :
variable_declaration
| variable_assignment
;
statement:
expression
| ifstmnt
;
variable_declaration:
VAR NAME SEMICOLON
;
variable_assignment:
NAME '=' NUM SEMICOLON
| NAME '=' STRING SEMICOLON
| NAME '=' BOOLEAN SEMICOLON
;
expression:
operand operation operand SEMICOLON
| expression operation expression SEMICOLON
| operand operation expression SEMICOLON
| expression operation operand SEMICOLON
;
ifstmnt:
IF LPAREN term RPAREN LCURLY
(declaration | statement)+
RCURLY
;
term:
| NUM EQUALITY NUM
| NAME EQUALITY NUM
| NUM EQUALITY NAME
| NAME EQUALITY NAME
;
/*Tokens*/
NUM : '0' | '-'?[1-9][0-9]*;
STRING: [a-zA-Z]+;
BOOLEAN: 'true' | 'false';
VAR : 'var';
NAME : [a-zA-Z]+;
SEMICOLON : ';';
LPAREN: '(';
RPAREN: ')';
LCURLY: '{';
RCURLY: '}';
EQUALITY: '==' | '<' | '>' | '<=' | '>=' | '!=' ;
operation: '+' | '-' | '*' | '/';
operand: NUM;
IF: 'if';
WS : [ \t\r\n]+ -> skip;
Error I'm getting:
(line 1,char 0): mismatched input 'var' expecting {NUM, 'var', NAME, 'if'}
Your STRING rule is the same as your NAME rule.
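With the ANTLR lexer, if two lexer rules match the same input (and no rule matches a longer one), the first one declared will be used. As a result, you'll never see a NAME token. The same thing happens to your keywords: STRING is also declared before BOOLEAN, VAR and IF, so var is tokenized as a STRING, not as VAR. That is what the error message is telling you: the parser expected 'var' (among others) but received a STRING token whose text is var. Declare the keyword rules before NAME, and give STRING a definition that does not overlap with NAME (for example, a quoted literal).
Most tutorials will show you how to dump out the token stream. It's usually a good idea to view the token stream and verify your lexer rules before getting too far into your parser rules.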
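A small Python simulation makes this concrete. It assumes the tie-breaking described above (longest match first, then the earliest-declared rule) and uses hand-translated regular expressions for a subset of the treetwo lexer rules, kept in their declaration order. tokenize, ORIGINAL_RULES and FIXED_RULES are hypothetical names, not part of the ANTLR runtime; printing the result is the same kind of token dump the tutorials recommend.

```python
import re

# Subset of the treetwo lexer rules, in the order the grammar declares them.
# Python's '|' is ordered rather than longest-match; that doesn't matter here.
ORIGINAL_RULES = [
    ("NUM", r"0|-?[1-9][0-9]*"),
    ("STRING", r"[a-zA-Z]+"),
    ("BOOLEAN", r"true|false"),
    ("VAR", r"var"),
    ("NAME", r"[a-zA-Z]+"),
    ("SEMICOLON", r";"),
    ("IF", r"if"),
    ("WS", r"[ \t\r\n]+"),
]

# Keywords moved ahead of NAME, and the overlapping STRING rule dropped.
FIXED_RULES = [
    ("NUM", r"0|-?[1-9][0-9]*"),
    ("BOOLEAN", r"true|false"),
    ("VAR", r"var"),
    ("IF", r"if"),
    ("NAME", r"[a-zA-Z]+"),
    ("SEMICOLON", r";"),
    ("WS", r"[ \t\r\n]+"),
]


def tokenize(text, rules):
    """Return (type, text) pairs: longest match wins, earliest rule on ties."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strict '>' means a later rule never displaces an earlier rule
            # that matched the same number of characters.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no lexer rule matches at offset {pos}")
        if best_name != "WS":  # WS is dropped, like '-> skip'
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("var test;", ORIGINAL_RULES))
# [('STRING', 'var'), ('STRING', 'test'), ('SEMICOLON', ';')]
print(tokenize("var test;", FIXED_RULES))
# [('VAR', 'var'), ('NAME', 'test'), ('SEMICOLON', ';')]
```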

Antlr - mismatched input '1' expecting number

I'm new to Antlr and I have the following simplified language:
grammar Hello;
sentence : targetAttributeName EQUALS expression+ (IF relationedExpression (logicalRelation relationedExpression)*)?;
expression :
'(' expression ')' |
expression ('*'|'/') expression |
expression ('+'|'-') expression |
function |
targetAttributeName |
NUMBER;
filterExpression :
'(' filterExpression ')' |
filterExpression ('*'|'/') filterExpression |
filterExpression ('+'|'-') filterExpression |
function |
filterAttributeName |
NUMBER |
DATE;
relationedExpression :
filterExpression ('<'|'<='|'>'|'>='|'=') filterExpression |
filterAttributeName '=' STRING |
STRING '=' filterAttributeName
;
logicalRelation :
'AND' |
'OR'
;
targetAttributeName :
'x'|
'y'
;
filterAttributeName :
'a' |
'a' '1' |
targetAttributeName;
function:
simpleFunction |
complexFunction ;
simpleFunction :
'simpleFunction' '(' expression ')' |
'simpleFunction2' '(' expression ')'
;
complexFunction :
'complexFunction' '(' expression ')' |
'complexFunction2' '(' expression ')'
;
EQUALS : '=';
IF : 'IF';
STRING : '"' [a-zA-z0-9]* '"';
NUMBER : [-]?[0-9]+('.'[0-9]+)?;
DATE: NUMBER NUMBER NUMBER NUMBER '.' NUMBER NUMBER? '.' NUMBER NUMBER? '.';
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
It works with x = y * 2, but it doesn't work with x =y * 1.
The error message is the following:
Hello::sentence:1:7: mismatched input '1' expecting {'simpleFunction', 'complexFunction', 'x', 'y', 'complexFunction2', '(', 'simpleFunction2', NUMBER}
This seems very strange to me, because 1 is a NUMBER...
If I change the filterAttributeName alternative from 'a' '1' to 'a1', then it works with x=y*1, but I don't understand the difference between the two cases. Could somebody explain it to me?
Thanks.
By doing this:
filterAttributeName :
'a' |
'a' '1' |
targetAttributeName;
ANTLR creates lexer rules from these inline tokens. So you really have a lexer grammar that looks like this:
T_1 : '1'; // the rule name will probably be different though
T_a : 'a';
...
NUMBER : [-]?[0-9]+('.'[0-9]+)?;
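In other words, the input 1 will be tokenized as T_1, not as a NUMBER. Writing the alternative as the single literal 'a1' removes the implicit '1' token, which is why x = y * 1 parses after that change.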
EDIT
Whenever certain input can be matched by two or more lexer rules, ANTLR prefers the rule that matches the most characters; if several rules match the same number of characters, it chooses the one defined first. The lexer does not "listen" to the parser to see what it needs at a particular time: lexing and parsing are two distinct phases. This is simply how ANTLR and many other parser generators work. If this is not acceptable for you, search for "scannerless parsing" or "packrat parsers".
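The same kind of check can be run on the Hello grammar. This is a hypothetical Python sketch, not ANTLR code: the implicit literal tokens get made-up names such as T_1 (as in the answer's sketch) and are placed ahead of NUMBER, and only the tokens needed for x = y * 1 are included. It shows why 1 and 2 behave differently, and why 12 would still be a NUMBER (the longer match wins).

```python
import re

NUMBER = r"-?[0-9]+(\.[0-9]+)?"

# Implicit literal tokens first, then the explicit rules; names are made up.
WITH_A_1 = [("T_x", r"x"), ("T_y", r"y"), ("T_star", r"\*"), ("T_a", r"a"),
            ("T_1", r"1"), ("EQUALS", r"="), ("NUMBER", NUMBER), ("WS", r"\s+")]
# Same rules, but filterAttributeName uses the single literal 'a1'.
WITH_A1 = [("T_x", r"x"), ("T_y", r"y"), ("T_star", r"\*"), ("T_a", r"a"),
           ("T_a1", r"a1"), ("EQUALS", r"="), ("NUMBER", NUMBER), ("WS", r"\s+")]


def token_types(text, rules):
    """Token type names, choosing the longest match and then the earliest rule."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    types, pos = [], 0
    while pos < len(text):
        candidates = []
        for index, (name, regex) in enumerate(compiled):
            m = regex.match(text, pos)
            if m and m.end() > pos:
                # Sort key: longer match first, then lower rule index.
                candidates.append((m.end() - pos, -index, name))
        if not candidates:
            raise ValueError(f"no lexer rule matches at offset {pos}")
        length, _, name = max(candidates)
        if name != "WS":
            types.append(name)
        pos += length
    return types


print(token_types("x = y * 1", WITH_A_1))  # ['T_x', 'EQUALS', 'T_y', 'T_star', 'T_1']
print(token_types("x = y * 1", WITH_A1))   # ['T_x', 'EQUALS', 'T_y', 'T_star', 'NUMBER']
```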

error(211): [fatal] rule conditions has non-LL(*)

I'm using ANTLR to create a grammar, but I get this error:
error(211): [fatal] rule conditions has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
My grammar rules:
conditions
: '(' conditions ')'
| condition (C_BINARY_OPERATOR conditions)?
;
condition
: expression C_CONDITIONAL_OPERATOR expression
;
expression
: (term) (('+'|'-') term)*
;
term
: (factor) (('*' | '/') factor)*
;
factor
: C_ID
| C_NUMBERS
| '(' expression ')'
;
// Binary Operators for Logical Calculation
C_BINARY_OPERATOR
: '&&'
| '||'
;
// Conditional Operators
C_CONDITIONAL_OPERATOR
: '>'
| '<'
| '=='
| '!='
| '=<'
| '=>'
;
How can I fix this error?
See this page on the ANTLR website. It has information on how to fix your error.
Well, the error does say "Resolve by left-factoring or using syntactic predicates or using backtrack=true option". Is that confusing?
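The decision the error complains about is at the start of conditions: a leading '(' may open alternative 1 ('(' conditions ')') or, via condition, expression, term and factor, a parenthesized arithmetic expression. The token that tells them apart (a conditional operator inside the parentheses, or one after the closing ')') can be arbitrarily far away behind nested parentheses. The Python sketch below is a hand-written backtracking parser for the same rules, meant only to illustrate what backtrack=true amounts to: try alternative 1 speculatively and fall back to alternative 2 if it fails. It is not ANTLR's generated code; the tokenizer and the function names are hypothetical.

```python
import re

# Hypothetical tokenizer for the C_ID, C_NUMBERS and operator tokens.
TOKEN = re.compile(r"\s*(&&|\|\||==|!=|=<|=>|[<>()+\-*/]|[A-Za-z_]\w*|[0-9]+)")
BINARY = {"&&", "||"}                              # C_BINARY_OPERATOR
CONDITIONAL = {">", "<", "==", "!=", "=<", "=>"}   # C_CONDITIONAL_OPERATOR


def tokenize(text):
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


# Each rule takes (tokens, index) and returns the index after a match, or None.
def factor(t, i):
    if i < len(t) and (t[i][0].isalnum() or t[i][0] == "_"):  # C_ID | C_NUMBERS
        return i + 1
    if i < len(t) and t[i] == "(":                            # '(' expression ')'
        j = expression(t, i + 1)
        if j is not None and j < len(t) and t[j] == ")":
            return j + 1
    return None


def term(t, i):
    i = factor(t, i)
    while i is not None and i < len(t) and t[i] in ("*", "/"):
        i = factor(t, i + 1)
    return i


def expression(t, i):
    i = term(t, i)
    while i is not None and i < len(t) and t[i] in ("+", "-"):
        i = term(t, i + 1)
    return i


def condition(t, i):
    i = expression(t, i)
    if i is not None and i < len(t) and t[i] in CONDITIONAL:
        return expression(t, i + 1)
    return None


def conditions(t, i):
    # Alternative 1, tried speculatively: on failure we fall through with the
    # original index i, i.e. we backtrack.
    if i < len(t) and t[i] == "(":
        j = conditions(t, i + 1)
        if j is not None and j < len(t) and t[j] == ")":
            return j + 1
    # Alternative 2: condition (C_BINARY_OPERATOR conditions)?
    j = condition(t, i)
    if j is not None and j < len(t) and t[j] in BINARY:
        k = conditions(t, j + 1)
        if k is not None:
            return k
    return j


def accepts(text):
    tokens = tokenize(text)
    return conditions(tokens, 0) == len(tokens)


print(accepts("(a) > b"))  # True: alternative 1 fails, alternative 2 succeeds
```

Left-factoring or a syntactic predicate would let ANTLR make the same choice without trying every alternative blindly.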

Ambiguous grammar (using Bison)

I've got a problem with an ambiguous grammar. I've got this:
%token identifier
%token lolcakes
%start program
%%
program
: call_or_definitions;
expression
: identifier
| lolcakes;
expressions
: expression
| expressions ',' expression;
call_or_definition
: function_call
| function_definition;
call_or_definitions
: call_or_definition
| call_or_definitions call_or_definition;
function_argument_core
: identifier
| identifier '=' expression
| identifier '=' '{' expressions '}';
function_call
: expression '(' function_arguments ')' ';';
function_definition
: identifier '(' function_definition_arguments ')' '{' '}';
function_argument
: lolcakes
| function_argument_core;
function_arguments
: function_argument
| function_arguments ',' function_argument
function_definition_argument
: expression function_argument_core
| function_argument_core;
function_definition_arguments
: function_definition_argument
| function_definition_arguments ',' function_definition_argument;
It's a subset of my genuine grammar which is separately compilable. At the moment, it generates an S/R conflict between function_call and function_definition when encountering the stream identifier (. I'm trying to convince Bison that it doesn't need to make the decision until later in the token stream by unifying the grammar for function calls and function definitions. In other words, if it encounters something that's common to both calls and definitions, it can reduce that without needing to know which is which, and if it encounters something else, that something else would clearly label which is which. Is that even possible, and if so, how can I do it? I'd really rather avoid having to alter the structure of the input stream if possible.
The problem should not arise until you see identifier ( identifier with a lookahead of , or ). At that point the parser has to decide whether to reduce the second identifier as a function_definition_argument or an expression (to become function_argument).
You can solve this purely in the grammar by brute force, but it will lead you into a maze of nonterminals like expression_not_naked_identifier and ambiguous_begining_of_function_defn_or_call, with resulting rampant duplication of semantic actions.
It would probably be more maintainable (and lead to more intelligible syntax error messages) to write something like
definition_or_call_start: identifier '(' generic_argument_list ')'
generic_argument_list: generic_argument
| generic_argument_list ',' generic_argument
generic_argument: expression
| function_argument_core
| ...
function_call: definition_or_call_start ';';
function_definition : definition_or_call_start '{' '}';
and then check as a semantic constraint in the action for the last two productions that the actual generic_arguments you have parsed match the use they're being put to.
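A minimal sketch of that "parse a superset, then check" idea, written in Python rather than Bison: it takes the tokens of identifier '(' ... ')', splits the arguments on top-level commas, classifies each one, and only after seeing ';' or '{' '}' decides whether it was a call or a definition, rejecting arguments that don't fit. The token representation, the function names and the argument kinds ("core", "typed", "lolcakes") are hypothetical; it handles only the identifier-headed form and does not validate right-hand-side expressions.

```python
PUNCTUATION = {"(", ")", ",", "=", "{", "}", ";"}


def is_identifier(tok):
    # In this toy token stream, anything that is not punctuation or the
    # lolcakes keyword counts as an identifier.
    return tok not in PUNCTUATION and tok != "lolcakes"


def split_arguments(tokens):
    """Split argument tokens on commas that are not inside '{' ... '}'."""
    args, current, depth = [], [], 0
    for tok in tokens:
        if tok == "," and depth == 0:
            args.append(current)
            current = []
            continue
        if tok == "{":
            depth += 1
        elif tok == "}":
            depth -= 1
        current.append(tok)
    args.append(current)
    return args


def is_core(arg):
    """function_argument_core: identifier, optionally followed by '=' ..."""
    return bool(arg) and is_identifier(arg[0]) and (len(arg) == 1 or arg[1] == "=")


def classify(arg):
    if arg == ["lolcakes"]:
        return "lolcakes"  # function_argument: only valid in a call
    if is_core(arg):
        return "core"      # valid in both calls and definitions
    if arg and (arg[0] == "lolcakes" or is_identifier(arg[0])) and is_core(arg[1:]):
        return "typed"     # expression function_argument_core: definitions only
    raise SyntaxError(f"unrecognised argument {arg}")


def definition_or_call(tokens):
    """Parse identifier '(' generic arguments ')' and decide afterwards."""
    if len(tokens) < 3 or not is_identifier(tokens[0]) or tokens[1] != "(":
        raise SyntaxError("expected identifier '('")
    if ")" not in tokens:
        raise SyntaxError("missing ')'")
    close = tokens.index(")")  # this subset grammar has no parentheses inside arguments
    kinds = [classify(arg) for arg in split_arguments(tokens[2:close])]
    rest = tokens[close + 1:]
    if rest == [";"]:
        if "typed" in kinds:
            raise SyntaxError("typed argument in a function call")
        return ("call", tokens[0], kinds)
    if rest == ["{", "}"]:
        if "lolcakes" in kinds:
            raise SyntaxError("lolcakes argument in a function definition")
        return ("definition", tokens[0], kinds)
    raise SyntaxError("expected ';' or '{' '}' after ')'")


print(definition_or_call(["f", "(", "x", ",", "lolcakes", ")", ";"]))
# ('call', 'f', ['core', 'lolcakes'])
```

Because the grammar part never has to choose between call and definition, the conflict disappears, and the error message for a misplaced argument can say exactly what was wrong.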
The problem is that an expression can consist of a single identifier. At that point the parser needs to decide whether to keep it as a plain identifier or reduce it to expression, and that decision determines which path it takes afterwards.

Separate return statement from other statements in ANTLR

Here's part of my grammar:
statement
: assignmentStatement
| doLoopStatement
| whileStatement
| ifStatement
| procedureCallStatement
;
function
: 'FUNCTION' IDENT '(' parameters? ')' ':' type ':='
(variable (';' variable)*)?
'BEGIN'
main_body //body can be empty
return_Statement
'END' IDENT
;
where main_body is:
main_body
: (statement (';' statement)*)?
;
Now, before creating my AST, I need to fix the return statement. The problem is that assignmentStatement and return_Statement are identical, so I'm getting an LL(*) error from the parser, as it does not know which one to choose.
assignmentStatement
: IDENT ':=' expression
;
return_Statement
: IDENT ':=' expression
;
any ideas?
If assignmentStatement is really supposed to be identical to return_Statement then there's no reason to have both. Eliminate the return_Statement rule and in your function rule replace it with assignmentStatement.
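If you still want a distinct return node in the AST, one option is to let the grammar accept an ordinary statement list and split off the final assignment while building the AST. The Python sketch below uses hypothetical node classes (Assignment, Call, Return) and a hypothetical helper split_return. The optional name check assumes a Pascal-style convention in which a function returns by assigning to its own name; the question does not confirm that, so drop the check if your language works differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Assignment:  # hypothetical AST node for IDENT ':=' expression
    target: str
    expr: str


@dataclass
class Call:        # hypothetical stand-in for any other statement kind
    name: str


@dataclass
class Return:      # hypothetical AST node for the function's result
    expr: str


Statement = Union[Assignment, Call]


def split_return(body: List[Statement],
                 function_name: Optional[str] = None) -> Tuple[List[Statement], Return]:
    """Treat the last statement of a function body as its return statement."""
    if not body or not isinstance(body[-1], Assignment):
        raise SyntaxError("function body must end with an assignment")
    *statements, last = body
    # Assumed Pascal-style rule: the final assignment must target the function name.
    if function_name is not None and last.target != function_name:
        raise SyntaxError(f"last assignment must be to {function_name}")
    return statements, Return(last.expr)


stmts, ret = split_return([Assignment("x", "1"), Call("p"), Assignment("f", "x + 1")], "f")
print(ret)  # Return(expr='x + 1')
```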