ANTLR no viable alternative at input '/' - eclipse-plugin

Okay, I'm really confused about this error. I know in the past having a '/' as a token in a rule hasn't produced any errors. However, this is simply baffling. Here is my grammar:
grammar LilWildC;
options {
language = Java;
}
@header
{
package com.matthewkimber.lilwildc;
}
@lexer::header
{
package com.matthewkimber.lilwildc;
}
program
: global_variables procedure+
;
global_variables
: variable_definition*
;
variable_definition
: 'number' IDENT ';'
| 'number' '[' A_NUMBER ']' IDENT ';'
;
procedure
: 'procedure' IDENT '{' block '}'
;
block
: local_variables statement+
;
local_variables
: variable_definition*
;
statement
: variable_reference '=' numeric_expression ';'
;
variable_reference
: IDENT
| IDENT '[' numeric_expression ']'
;
numeric_expression
: multiply_expression
( '+' multiply_expression
| '-' multiply_expression
)*
;
multiply_expression
: negative_factor
( '*' negative_factor
| '/' negative_factor
| '%' negative_factor
)*
;
negative_factor
: '-'? factor
;
factor
: A_NUMBER
| variable_reference
| '(' numeric_expression ')'
;
A_NUMBER: (('0'..'9')+'.'?) | (('0'..'9')*'.'('0'..'9')+) ;
IDENT: ('a'..'z' | 'A'..'Z')('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS: (' ' | '\t' | ('\r'?'\n'))+ { $channel = HIDDEN; } ;
When I run a test on the grammar with the following input:
procedure main
{
var = 10 / 1;
}
I get the following parse tree in the ANTLR eclipse plug-in:
What I don't get is that multiplication and modulo work fine, only divide throws this error. Is ANTLR skipping right over the '/' and not seeing it as a token or have I missed something? Any help is greatly appreciated.

There's nothing wrong with your grammar; the problem must be in the Eclipse plugin. ANTLRWorks' debugger produces the tree:
And creating a little test myself (after fixing the typo grammary LilWildC; to grammar LilWildC;, and removing the packages) with a main class and ANTLR 3.3:
LilWildC.g
grammar LilWildC;
options {
language = Java;
}
program
: global_variables procedure+
;
global_variables
: variable_definition*
;
variable_definition
: 'number' IDENT ';'
| 'number' '[' A_NUMBER ']' IDENT ';'
;
procedure
: 'procedure' IDENT '{' block '}'
;
block
: local_variables statement+
;
local_variables
: variable_definition*
;
statement
: variable_reference '=' numeric_expression ';'
;
variable_reference
: IDENT
| IDENT '[' numeric_expression ']'
;
numeric_expression
: multiply_expression
( '+' multiply_expression
| '-' multiply_expression
)*
;
multiply_expression
: negative_factor
( '*' negative_factor
| '/' negative_factor
| '%' negative_factor
)*
;
negative_factor
: '-'? factor
;
factor
: A_NUMBER
| variable_reference
| '(' numeric_expression ')'
;
A_NUMBER: (('0'..'9')+'.'?) | (('0'..'9')*'.'('0'..'9')+) ;
IDENT: ('a'..'z' | 'A'..'Z')('a'..'z' | 'A'..'Z' | '0'..'9' | '_')* ;
WS: (' ' | '\t' | ('\r'?'\n'))+ { $channel = HIDDEN; } ;
Main.java
import org.antlr.runtime.*;
public class Main {
public static void main(String[] args) throws Exception {
String src =
"procedure main \n" +
"{ \n" +
" var = 10 / 1; \n" +
"} \n";
LilWildCLexer lexer = new LilWildCLexer(new ANTLRStringStream(src));
LilWildCParser parser = new LilWildCParser(new CommonTokenStream(lexer));
parser.program();
}
}
bart@hades:~/Programming/ANTLR/Demos/LilWildC$ java -cp antlr-3.3.jar org.antlr.Tool LilWildC.g
bart@hades:~/Programming/ANTLR/Demos/LilWildC$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/LilWildC$ java -cp .:antlr-3.3.jar Main
produces no errors or warnings.

Related

How has a Language like Apache Velocity to be parsed with Antlr4?

I am writing my own grammar to parse Apache Velocity, and I have run into the problem that I can detect neither plain text nor the markup.
I get this message on the first line of the source:
line 1:0 extraneous input '// ${Name}.java' expecting {BREAK, FOREACH, IF, INCLUDE, PARSE, SET, STOP, '#[[', RAW_TEXT, '$'}
The input '// ${Name}.java' should be tokenized as RAW_TEXT '$' '{' IDENTIFIER '}' RAW_TEXT, and the parser should match it as rawText reference rawText. These parser rules are statements.
This is my source file. In this case it is a Java template, but it could just as well be an HTML template like the ones in the Apache Velocity user guide.
// ${Name}.java
#foreach ( $vertice in $Vertices )
#if ( $vertice.Type == "Class" )
public class $vertice.Name {
#foreach ( $edge in $Edges )
#if ( $edge.from == $vertice.Name)
// From $edge.from to $edge.to
private $edge.to $edge.to.toLowerCase();
public $edge.to get{$edge.to}() {
return this.${edge.to.toLowerCase()};
}
public void set${edge.to}(${edge.to} new${edge.to}) {
$edge.to old${edge.to} = this.${edge.to.toLowerCase()};
if (old${edge.to} != new${edge.to}) {
if (old${edge.to} != null) {
this.${edge.to.toLowerCase()} = null;
old${edge.to}.set${edge.from}(null);
}
this.${edge.to.toLowerCase()} = new${edge.to};
if (new${edge.to} != null) {
new${edge.to}.set${edge.from}(this);
}
}
}
public $edge.from with${edge.to}(${edge.to} new${edge.to}) {
this.set${edge.to}(new${edge.to});
return this;
}
#end
#end
}
#end
#end
This is my grammar.
grammar Velocity;
/* -- Parser Rules --- */
/*
* Start Rule
*/
template
: statementSet EOF?
;
/*
* Statements
*/
statementSet
: statement+
;
statement
: rawText # RawTextStatement
| unparsed # UnparsedStatement
| reference # ReferenceStatement
| setDirective # SetStatement
| ifDirective # IfStatement
| foreachDirective # ForeachStatement
| includeDirective # IncludeStatement
| parseDirective # ParseStatement
| breakDirective # BreakStatement
| stopDirective # StopStatement
;
rawText
: RAW_TEXT
;
unparsed
: UNPARSED UnparsedText=(TEXT | NL)* UNPARSED_END
;
setDirective
: SET '(' assignment ')'
;
ifDirective
: ifPart (elseifPart)* (elsePart)? END
;
foreachDirective
: FOREACH '(' variableReference 'in' enumerable ')' statementSet END
;
includeDirective
: INCLUDE '(' stringValue (',' stringValue)* ')'
;
parseDirective
: PARSE '(' stringValue ')'
;
breakDirective
: BREAK
;
stopDirective
: STOP
;
/*
* Expressions
*/
assignment
: assignableReference '=' expression
;
expression
: reference # ReferenceExpression
| string # StringLiteralExpression
| NUMBER # NumberLiteralExpression
| array # ArrayExpression
| map # MapExpression
| range # RangeExpression
| arithmeticOperation # ArithmeticOperationExpression
| booleanOperation # BooleanOperationExpression
;
enumerable
: array
| map
| range
| reference
;
stringValue
: string # StringValue_String
| reference # StringValue_Reference
;
/*
* References
*/
reference
: DOLLAR Quiet='!'? (referenceType | '{' referenceType '}')
;
assignableReference
: DOLLAR Quiet='!'? (assignableReferenceType | '{' assignableReferenceType '}')
;
referenceType
: assignableReferenceType # ReferenceType_AssignableReferenceType
| methodReference # ReferenceType_MethodReference
;
assignableReferenceType
: variableReference # AssignableReferenceType_VariableReference
| propertyReference # AssignableReferenceType_PropertyReference
;
variableReference
: IDENTIFIER indexNotation?
;
propertyReference
: IDENTIFIER ('.' IDENTIFIER)+ indexNotation?
;
methodReference
: IDENTIFIER ('.' IDENTIFIER)* '.' IDENTIFIER '(' (expression (',' expression)*)? ')' indexNotation?
;
indexNotation
: '[' NUMBER ']' # IndexNotation_Number
| '[' reference ']' # IndexNotation_Reference
| '[' string ']' # IndexNotation_String
;
/*
* Parsed Types
*/
string
: '"' stringText* '"' # DoubleQuotedString
| '\'' TEXT? '\'' # SingleQuotedString
;
stringText
: TEXT # StringText_Text
| reference # StringText_Reference
;
/*
* Container Types
*/
array
: '[' (expression (',' expression)*)? ']'
;
map
: '{' (expression ':' expression (',' expression ':' expression))? '}'
;
range
: '[' n=NUMBER '..' m=NUMBER ']'
;
/*
* Arithmetic Operators
*/
arithmeticOperation
: sum
;
sum
: term (followingTerm)*
;
followingTerm
: Operator=('+' | '-') term
;
term
: factor (followingFactor)*
;
followingFactor
: Operator=('*' | '/' | '%') factor
;
factor
: NUMBER # Factor_Number
| reference # Factor_Reference
| '(' arithmeticOperation ')' # Factor_InnerArithmeticOperation
;
/*
* Boolean Operators
*/
booleanOperation
: disjunction
;
disjunction
: conjunction (followingConjunction)*
;
followingConjunction
: Operator=OR conjunction
;
conjunction
: booleanComparison (followingBooleanComparison)*
;
followingBooleanComparison
: Operator=AND booleanComparison
;
booleanComparison
: booleanFactor (followingBooleanFactor)*
;
followingBooleanFactor
: Operator=(EQUALS | NOT_EQUALS) booleanFactor
;
booleanFactor
: BOOLEAN # BooleanFactor_Boolean
| reference # BooleanFactor_Reference
| negation # BooleanFactor_Negation
| arithmeticComparison # BooleanFactor_ArithmeticComparison
| '(' booleanOperation ')' # BooleanFactor_InnerBooleanOperation
;
arithmeticComparison
: LeftHandSide=arithmeticOperation Operator=(EQUALS | NOT_EQUALS | GREATER_THAN | GREATER_THAN_OR_EQUAL_TO | LESS_THAN | LESS_THAN_OR_EQUAL_TO) RightHandSide=arithmeticOperation
;
negation
: NOT booleanFactor
;
/*
* Conditionals
*/
ifPart
: IF '(' booleanOperation ')' statementSet
;
elseifPart
: ELSEIF '(' booleanOperation ')' statementSet
;
elsePart
: ELSE statementSet
;
/* --- Lexer Rules --- */
/*
* Comments
*/
SINGLE_LINE_COMMENT
: '##' TEXT? NL -> skip
;
MULTI_LINE_COMMENT
: '#*' (TEXT | NL)* '*#' -> skip
;
COMMENT_BLOCK
: '#**' (TEXT | NL)* '*#' -> skip
;
/*
* Directives
*/
BREAK
: '#break'
| '#{break}'
;
DEFINE
: '#define'
| '#{define}'
;
ELSE
: '#else'
| '#{else}'
;
ELSEIF
: '#elseif'
| '#{elseif}'
;
END
: '#end'
| '#{end}'
;
EVALUATE
: '#evaluate'
| '#{evaluate}'
;
FOREACH
: '#foreach'
| '#{foreach}'
;
IF
: '#if'
| '#{if}'
;
INCLUDE
: '#include'
| '#{include}'
;
MACRO
: '#macro'
| '#{macro}'
;
PARSE
: '#parse'
| '#{parse}'
;
SET
: '#set'
| '#{set}'
;
STOP
: '#stop'
| '#{stop}'
;
UNPARSED
: '#[['
;
UNPARSED_END
: ']]#'
;
/*
* Identifier
*/
DOLLAR
: '$' -> more
;
IDENTIFIER
: CHARACTER+ (CHARACTER | INTEGER | HYPHEN | UNDERSCORE)*
;
/*
* Boolean Values
*/
TRUE
: 'true'
;
FALSE
: 'false'
;
/*
* Boolean Operators
*/
EQUALS
: '=='
| 'eq'
;
NOT_EQUALS
: '!='
| 'ne'
;
GREATER_THAN
: '>'
| 'gt'
;
GREATER_THAN_OR_EQUAL_TO
: '>='
| 'ge'
;
LESS_THAN
: '<'
| 'lt'
;
LESS_THAN_OR_EQUAL_TO
: '<='
| 'le'
;
OR
: '||'
;
AND
: '&&'
;
NOT
: '!'
| 'not'
;
/*
* Literals
*/
BOOLEAN
: TRUE
| FALSE
;
NUMBER
: '-'? INTEGER
| '-'? INTEGER '.' INTEGER
;
/*
* Content
*/
RAW_TEXT
: ~[*#$]+
;
TEXT
: (ESC | SAFE_CODE_POINT)+
;
fragment ESC
: '\\' (["\\/#$!bftrn] | UNICODE)
;
fragment UNICODE
: 'u' HEX HEX HEX HEX
;
fragment HEX
: [0-9a-fA-F]
;
fragment SAFE_CODE_POINT
: ~["\\\u0000-\u001F]
;
/*
* Atomic elements
*/
CHARACTER
: [a-zA-Z]+
;
INTEGER
: [0-9]+
;
HYPHEN
: '-'
;
UNDERSCORE
: '_'
;
NL
: '\r'
| '\n'
| '\r\n'
;
WS
: ('\t' | ' ' | '\r' | '\n' | '\r\n')+ -> skip
;
What details am I missing here? What has to be done to actually parse Velocity code?
Best Regards
Update:
I have changed these lexer rules.
DOLLAR
: '$'
;
RAW_TEXT
: ~[*#$]*
;
TEXT
: (ESC | SAFE_CODE_POINT)*?
;
fragment SAFE_CODE_POINT
: ~[$"\\\u0000-\u001F]
;
Now I'm getting these messages:
[0] line 1:4 mismatched input '{Name}.java\r\n' expecting {'!', '{', IDENTIFIER}
[0] line 2:8 mismatched input ' ( ' expecting '('
[0] line 2:12 mismatched input 'vertice in ' expecting {'!', '{', IDENTIFIER}
[0] line 2:24 mismatched input 'Vertices )\r\n' expecting {'!', '{', IDENTIFIER}
[0] line 3:3 mismatched input ' ( ' expecting '('
[0] line 3:7 mismatched input 'vertice.Type == "Class" )\r\npublic class ' expecting {'!', '{', IDENTIFIER}
[0] line 4:14 mismatched input 'vertice.Name {\r\n\t' expecting {'!', '{', IDENTIFIER}
[0] line 5:9 mismatched input ' ( ' expecting '('
[0] line 5:13 mismatched input 'edge in ' expecting {'!', '{', IDENTIFIER}
[0] line 5:22 mismatched input 'Edges )\r\n\t' expecting {'!', '{', IDENTIFIER}
[0] line 6:4 mismatched input ' ( ' expecting '('
[0] line 6:8 mismatched input 'edge.from == ' expecting {'!', '{', IDENTIFIER}
[0] line 6:22 mismatched input 'vertice.Name)\r\n\t' expecting {'!', '{', IDENTIFIER}
That helped, but the lexer is still swallowing the $ symbol. And why is it expecting a '{' character when the input starts with a '{' character? I will look into this further.
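For reference, the token sequence the question asks for (RAW_TEXT '$' '{' IDENTIFIER '}' RAW_TEXT) can be illustrated with a small hand-written splitter that treats everything as raw text until a reference starts. This is only a sketch in plain Java, not an ANTLR grammar and not how Velocity itself works: the class name TemplateSplitter, the token labels DOLLAR, LBRACE and RBRACE, and the restriction to simple $name and ${name} references (no property paths, methods or directives) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: splits template text into raw text and simple $name / ${name} references. */
final class TemplateSplitter {

    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder raw = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int end = referenceEnd(s, i);
            if (end < 0) {                 // not a reference: keep collecting raw text
                raw.append(s.charAt(i));
                i++;
                continue;
            }
            if (raw.length() > 0) {        // flush the raw text seen so far
                out.add("RAW_TEXT(" + raw + ")");
                raw.setLength(0);
            }
            boolean braced = s.charAt(i + 1) == '{';
            out.add("DOLLAR");
            if (braced) out.add("LBRACE");
            out.add("IDENTIFIER(" + s.substring(i + (braced ? 2 : 1), braced ? end - 1 : end) + ")");
            if (braced) out.add("RBRACE");
            i = end;
        }
        if (raw.length() > 0) out.add("RAW_TEXT(" + raw + ")");
        return out;
    }

    // Returns the index just past a reference starting at 'from', or -1 if none starts there.
    private static int referenceEnd(String s, int from) {
        if (s.charAt(from) != '$') return -1;
        boolean braced = from + 1 < s.length() && s.charAt(from + 1) == '{';
        int start = from + (braced ? 2 : 1);
        int end = start;
        while (end < s.length() && Character.isLetterOrDigit(s.charAt(end))) end++;
        if (end == start || !Character.isLetter(s.charAt(start))) return -1;
        if (!braced) return end;
        return end < s.length() && s.charAt(end) == '}' ? end + 1 : -1;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("// ${Name}.java"));
    }
}
```

The point of the sketch is that raw text cannot be recognized by a single greedy character class alone; the splitter has to know whether it is currently inside a reference or outside one.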

Precedence in Antlr using parentheses

We are developing a DSL, and we're facing some problems:
Problem 1:
In our DSL, it's allowed to do this:
A + B + C
but not this:
A + B - C
If the user needs to use two or more different operators, he'll need to insert parentheses:
A + (B - C) or (A + B) - C.
Problem 2:
In our DSL, the operator with higher precedence must be surrounded by parentheses.
For example, instead of using this way:
A + B * C
The user needs to use this:
A + (B * C)
To solve Problem 1 I came up with an ANTLR snippet that works, but I'm not sure it's the best way to do it:
sumExpr
@init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr;
For Problem 2, I have no idea where to start.
I would appreciate help finding a better solution to the first problem and a pointer toward solving the second one. (Sorry for my bad English.)
Below is the grammar that we have developed:
grammar TclGrammar;
options {
output=AST;
ASTLabelType=CommonTree;
}
@members {
public boolean isSum(String type) {
System.out.println("Tipo: " + type);
return "+".equals(type);
}
public boolean isSub(String type) {
System.out.println("Tipo: " + type);
return "-".equals(type);
}
}
prog
: exprMain ';' {System.out.println(
$exprMain.tree == null ? "null" : $exprMain.tree.toStringTree());}
;
exprMain
: exprQuando? (exprDeveSatis | exprDeveFalharCaso)
;
exprDeveSatis
: 'DEVE SATISFAZER' '{'! expr '}'!
;
exprDeveFalharCaso
: 'DEVE FALHAR CASO' '{'! expr '}'!
;
exprQuando
: 'QUANDO' '{'! expr '}'!
;
expr
: logicExpr
;
logicExpr
: boolExpr (('E'|'OU')^ boolExpr)*
;
boolExpr
: comparatorExpr
| emExpr
| 'VERDADE'
| 'FALSO'
;
emExpr
: FIELD 'EM' '[' (variable_lista | field_lista) comCruzamentoExpr? ']'
-> ^('EM' FIELD (variable_lista+)? (field_lista+)? comCruzamentoExpr?)
;
comCruzamentoExpr
: 'COM CRUZAMENTO' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COM CRUZAMENTO' FIELD+)
;
comparatorExpr
: sumExpr (('<'^|'<='^|'>'^|'>='^|'='^|'<>'^) sumExpr)?
| naoPreenchidoExpr
| preenchidoExpr
;
naoPreenchidoExpr
: FIELD 'NAO PREENCHIDO' -> ^('NAO PREENCHIDO' FIELD)
;
preenchidoExpr
: FIELD 'PREENCHIDO' -> ^('PREENCHIDO' FIELD)
;
sumExpr
@init {boolean isSum=false;boolean isSub=false;}
: {isSum(input.LT(2).getText()) && !isSub}? multExpr('+'^{isSum=true;} sumExpr)+
| {isSub(input.LT(2).getText()) && !isSum}? multExpr('-'^{isSub=true;} sumExpr)+
| multExpr
;
multExpr
: funcExpr(('*'^|'/'^) funcExpr)?
;
funcExpr
: 'QUANTIDADE'^ '('! FIELD ')'!
| 'EXTRAI_TEXTO'^ '('! FIELD ';' INTEGER ';' INTEGER ')'!
| cruzaExpr
| 'COMBINACAO_UNICA' '(' FIELD ';' FIELD (';' FIELD)* ')' -> ^('COMBINACAO_UNICA' FIELD+)
| 'EXISTE'^ '('! FIELD ')'!
| 'UNICO'^ '('! FIELD ')'!
| atom
;
cruzaExpr
: operadorCruzaExpr ('CRUZA COM'^|'CRUZA AMBOS'^) operadorCruzaExpr ondeExpr?
;
operadorCruzaExpr
: FIELD('('!field_lista')'!)?
;
ondeExpr
: ('ONDE'^ '('!expr')'!)
;
atom
: FIELD
| VARIABLE
| '('! expr ')'!
;
field_lista
: FIELD(';' field_lista)?
;
variable_lista
: VARIABLE(';' variable_lista)?
;
FIELD
: NONCONTROL_CHAR+
;
NUMBER
: INTEGER | FLOAT
;
VARIABLE
: '\'' NONCONTROL_CHAR+ '\''
;
fragment SIGN: '+' | '-';
fragment NONCONTROL_CHAR: LETTER | DIGIT | SYMBOL;
fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';
fragment DIGIT: '0'..'9';
fragment SYMBOL: '_' | '.' | ',';
fragment FLOAT: INTEGER '.' '0'..'9'*;
fragment INTEGER: '0' | SIGN? '1'..'9' '0'..'9'*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {skip();}
;
This is similar to not having operator precedence at all.
expr
: funcExpr
( ('+' funcExpr)*
| ('-' funcExpr)*
| ('*' funcExpr)*
| ('/' funcExpr)*
)
;
I think the following should work. I'm assuming some lexer tokens with obvious names.
expr: sumExpr;
sumExpr: onlySum | subExpr;
onlySum: atom ( PLUS onlySum )?;
subExpr: onlySub | multExpr;
onlySub: atom ( MINUS onlySub )? ;
multExpr: atom ( STAR atom )? ;
parenExpr: OPEN_PAREN expr CLOSE_PAREN;
atom: FIELD | VARIABLE | parenExpr;
The only* rules match an expression if it only has one type of operator outside of parentheses. The *Expr rules match either a line with the appropriate type of operators or go to the next operator.
If you have multiple types of operators, then they are forced to be inside parentheses because the match will go through atom.
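As a rough illustration of the rule described above (only one operator type per parenthesis level), here is a small hand-written checker in plain Java. It is not ANTLR-generated code and not a drop-in replacement for the grammar; the class name SingleOperatorChecker and the simplified token set (alphanumeric names, + - * /, parentheses) are assumptions made for the sketch, and it also treats * and / as different operators.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: accepts an expression only if each parenthesis level uses a single operator type. */
final class SingleOperatorChecker {
    private final List<String> tokens;
    private int pos = 0;

    private SingleOperatorChecker(List<String> tokens) { this.tokens = tokens; }

    /** Returns true if the input is well formed and never mixes operators at one level. */
    static boolean isValid(String input) {
        SingleOperatorChecker c = new SingleOperatorChecker(tokenize(input));
        try {
            c.expr();
            return c.pos == c.tokens.size();   // all input must be consumed
        } catch (IllegalStateException e) {
            return false;
        }
    }

    // expr : atom (op atom)*   where every op in the loop must equal the first one
    private void expr() {
        atom();
        String first = null;
        while (pos < tokens.size() && isOperator(tokens.get(pos))) {
            String op = tokens.get(pos++);
            if (first == null) first = op;
            else if (!first.equals(op)) throw new IllegalStateException("mixed operators");
            atom();
        }
    }

    // atom : NAME | '(' expr ')'
    private void atom() {
        if (pos >= tokens.size()) throw new IllegalStateException("unexpected end");
        String t = tokens.get(pos++);
        if (t.equals("(")) {
            expr();   // a nested level starts with a fresh operator choice
            if (pos >= tokens.size() || !tokens.get(pos++).equals(")")) {
                throw new IllegalStateException("missing ')'");
            }
        } else if (!Character.isLetter(t.charAt(0))) {
            throw new IllegalStateException("unexpected token: " + t);
        }
    }

    private static boolean isOperator(String t) {
        return t.length() == 1 && "+-*/".indexOf(t.charAt(0)) >= 0;
    }

    private static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char ch = s.charAt(i);
            if (Character.isWhitespace(ch)) {
                i++;
            } else if (Character.isLetter(ch)) {
                int j = i;
                while (j < s.length() && Character.isLetterOrDigit(s.charAt(j))) j++;
                out.add(s.substring(i, j));
                i = j;
            } else {
                out.add(String.valueOf(ch));
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String e : new String[] {"A + B + C", "A + B - C", "A + (B - C)", "A + (B * C)"}) {
            System.out.println(e + " -> " + isValid(e));
        }
    }
}
```

The same idea carries over to a grammar: each level commits to the first operator it sees, and any other operator can only appear after going through a parenthesized atom.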

ANTLR: Why the invalid input could match the grammar definition

I've written a very simple grammar definition for a calculation expression:
grammar SimpleCalc;
options {
output=AST;
}
tokens {
PLUS = '+' ;
MINUS = '-' ;
MULT = '*' ;
DIV = '/' ;
}
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
ID : ('a'..'z' | 'A' .. 'Z' | '0' .. '9')+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { Skip(); } ;
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
start: expr EOF;
expr : multExpr ((PLUS | MINUS)^ multExpr)*;
multExpr : atom ((MULT | DIV)^ atom )*;
atom : ID
| '(' expr ')' -> expr;
I tried the invalid expression ABC &* DEF with the start rule, but it passed. It looks like the & character is ignored. What's the problem here?
Actually, your invalid expression ABC &= DEF is not accepted; it causes a NoViableAltException.

define a grammar in Antlr

I have defined the following grammar.
grammar Sample_1;
@header {
package a;
}
@lexer::header {
package a;
}
program
:
define*
implement*
;
define
: IDENT '=(' INTEGER',' INTEGER ')'
;
implement
:IDENT '=(' (IDENT ','?)* ')'
;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
IDENT : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};
How can I add a check to this grammar so that when I have the example
A=(1,1)
B=(1,2)
G=(A,B)
the result is successful but if I write
A=(1,1)
B=(1,2)
G=(A,E)
it gives an error that E is not defined?
Thanks.
The result:
I got it working, thanks a lot:
grammar Sample_1;
@members{
int level=0;
}
@header {
package a;
}
@lexer::header {
package a;
}
program
:
block
;
block
scope {
List symbols;
}
@init {
$block::symbols=new ArrayList();
level++;
}
@after {
System.err.println("Hello");
level--;
}
: (define* implement+)
;
define
: IDENT {$block::symbols.add($IDENT.text);} '=(' INTEGER',' INTEGER ')'
;
implement
:IDENT '=(' (a=IDENT
{if (!$block::symbols.contains($a.text)){
System.err.println("undefined");
}}','?)* ')'
;
fragment LETTER : ('a'..'z' | 'A'..'Z') ;
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+ ;
IDENT : LETTER (LETTER | DIGIT)*;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
COMMENT : '//' .* ('\n'|'\r') {$channel = HIDDEN;};
ANTLR supports actions, little snippets of code embedded in the grammar file.
An action for an assignment could store into a map. An action for a right-hand-side IDENT could try to pull a value from the map, and throw an exception if it fails.
Chapter 6 in Terence Parr's "The Definitive ANTLR Reference" covers actions.
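The map idea can be sketched in plain Java without ANTLR, just to show the bookkeeping the actions would do. The class name SymbolCheck, the regular expression used to split each line, and the choice to collect undefined names in a list instead of throwing are all assumptions for this example; inside a grammar, the put and the lookup would sit in the actions of the defining rule and the referencing rule.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: remembers every defined name and reports right-hand-side names that are unknown. */
final class SymbolCheck {
    // NAME = ( arguments )   e.g.  A=(1,1)  or  G=(A,B)
    private static final Pattern LINE =
            Pattern.compile("\\s*([A-Za-z][A-Za-z0-9]*)\\s*=\\s*\\((.*)\\)\\s*");

    /** Returns the names used on a right-hand side that were not defined on an earlier line. */
    static List<String> undefinedNames(String source) {
        Map<String, String> defined = new HashMap<>();   // name -> raw argument text
        List<String> undefined = new ArrayList<>();
        for (String line : source.split("\\R")) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) continue;   // blank or malformed lines are ignored in this sketch
            for (String arg : m.group(2).split(",")) {
                String name = arg.trim();
                // Only identifiers need a lookup; integer arguments are always fine.
                if (!name.isEmpty() && Character.isLetter(name.charAt(0))
                        && !defined.containsKey(name)) {
                    undefined.add(name);
                }
            }
            defined.put(m.group(1), m.group(2));   // what the action for an assignment would do
        }
        return undefined;
    }

    public static void main(String[] args) {
        System.out.println(undefinedNames("A=(1,1)\nB=(1,2)\nG=(A,B)"));
        System.out.println(undefinedNames("A=(1,1)\nB=(1,2)\nG=(A,E)"));
    }
}
```

With the two inputs from the question, the first call finds nothing to report and the second reports E.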

How to find shift/reduce conflict in this yacc file?

When I try to use yacc on the following file I get the error conflicts: 1 shift/reduce
How can I find and fix the conflict?
/* C-Minus BNF Grammar */
%token ELSE
%token IF
%token INT
%token RETURN
%token VOID
%token WHILE
%token ID
%token NUM
%token LTE
%token GTE
%token EQUAL
%token NOTEQUAL
%%
program : declaration_list ;
declaration_list : declaration_list declaration | declaration ;
declaration : var_declaration | fun_declaration ;
var_declaration : type_specifier ID ';'
| type_specifier ID '[' NUM ']' ';' ;
type_specifier : INT | VOID ;
fun_declaration : type_specifier ID '(' params ')' compound_stmt ;
params : param_list | VOID ;
param_list : param_list ',' param
| param ;
param : type_specifier ID | type_specifier ID '[' ']' ;
compound_stmt : '{' local_declarations statement_list '}' ;
local_declarations : local_declarations var_declaration
| /* empty */ ;
statement_list : statement_list statement
| /* empty */ ;
statement : expression_stmt
| compound_stmt
| selection_stmt
| iteration_stmt
| return_stmt ;
expression_stmt : expression ';'
| ';' ;
selection_stmt : IF '(' expression ')' statement
| IF '(' expression ')' statement ELSE statement ;
iteration_stmt : WHILE '(' expression ')' statement ;
return_stmt : RETURN ';' | RETURN expression ';' ;
expression : var '=' expression | simple_expression ;
var : ID | ID '[' expression ']' ;
simple_expression : additive_expression relop additive_expression
| additive_expression ;
relop : LTE | '<' | '>' | GTE | EQUAL | NOTEQUAL ;
additive_expression : additive_expression addop term | term ;
addop : '+' | '-' ;
term : term mulop factor | factor ;
mulop : '*' | '/' ;
factor : '(' expression ')' | var | call | NUM ;
call : ID '(' args ')' ;
args : arg_list | /* empty */ ;
arg_list : arg_list ',' expression | expression ;
As mientefuego pointed out, your grammar has the classic "dangling else" problem.
You can get around the problem by assigning precedence to the rules that cause the conflict.
The rule causing conflict is:
selection_stmt : IF '(' expression ')' statement
| IF '(' expression ')' statement ELSE statement ;
Start by declaring ELSE and LOWER_THAN_ELSE (a pseudo-token) as non-associative:
%nonassoc LOWER_THAN_ELSE
%nonassoc ELSE
This gives ELSE higher precedence than LOWER_THAN_ELSE, simply because LOWER_THAN_ELSE is declared first.
Then in the conflicting rule you have to assign a precedence to either the shift or reduce action:
selection_stmt : IF '(' expression ')' statement %prec LOWER_THAN_ELSE
| IF '(' expression ')' statement ELSE statement ;
Here, higher precedence is given to shifting. I have incorporated the above mentioned corrections and listed the complete grammar below:
/* C-Minus BNF Grammar */
%token ELSE
%token IF
%token INT
%token RETURN
%token VOID
%token WHILE
%token ID
%token NUM
%token LTE
%token GTE
%token EQUAL
%token NOTEQUAL
%nonassoc LOWER_THAN_ELSE
%nonassoc ELSE
%%
program : declaration_list ;
declaration_list : declaration_list declaration | declaration ;
declaration : var_declaration | fun_declaration ;
var_declaration : type_specifier ID ';'
| type_specifier ID '[' NUM ']' ';' ;
type_specifier : INT | VOID ;
fun_declaration : type_specifier ID '(' params ')' compound_stmt ;
params : param_list | VOID ;
param_list : param_list ',' param
| param ;
param : type_specifier ID | type_specifier ID '[' ']' ;
compound_stmt : '{' local_declarations statement_list '}' ;
local_declarations : local_declarations var_declaration
| /* empty */ ;
statement_list : statement_list statement
| /* empty */ ;
statement : expression_stmt
| compound_stmt
| selection_stmt
| iteration_stmt
| return_stmt ;
expression_stmt : expression ';'
| ';' ;
selection_stmt : IF '(' expression ')' statement %prec LOWER_THAN_ELSE
| IF '(' expression ')' statement ELSE statement ;
iteration_stmt : WHILE '(' expression ')' statement ;
return_stmt : RETURN ';' | RETURN expression ';' ;
expression : var '=' expression | simple_expression ;
var : ID | ID '[' expression ']' ;
simple_expression : additive_expression relop additive_expression
| additive_expression ;
relop : LTE | '<' | '>' | GTE | EQUAL | NOTEQUAL ;
additive_expression : additive_expression addop term | term ;
addop : '+' | '-' ;
term : term mulop factor | factor ;
mulop : '*' | '/' ;
factor : '(' expression ')' | var | call | NUM ;
call : ID '(' args ')' ;
args : arg_list | /* empty */ ;
arg_list : arg_list ',' expression | expression ;
Maybe you should try yacc -v <filename>; it generates a detailed report of the parser states and conflicts.
I tested it here, and your grammar runs into the classic "dangling else" problem.
Take a look at this Wikipedia article.
Ahem, the correct answer to this problem is usually: do nothing.
Shift/reduce conflicts are expected with ambiguous grammars. They are not errors, they are conflicts.
The conflict will be resolved by preferring shift over reduce, which just happens to solve the canonical dangling else problem.
And bison even has an %expect n statement so that you don't get a S/R conflict warning when there are exactly n conflicts.
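To see what "preferring shift" means for the dangling else, here is a tiny hand-written recursive-descent sketch in Java. It is not what yacc generates; it only mimics the same choice by letting the innermost open if grab an else whenever one comes next. The class name DanglingElseDemo and the simplified input format (whitespace-separated words, one word as the condition, any other word as a simple statement) are assumptions for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch: greedy 'else' attachment, the same outcome as resolving the conflict by shifting. */
final class DanglingElseDemo {
    private final List<String> tokens;
    private int pos = 0;

    private DanglingElseDemo(List<String> tokens) { this.tokens = tokens; }

    /** Parses a whitespace-separated token string and returns a bracketed tree. */
    static String parse(String input) {
        DanglingElseDemo p = new DanglingElseDemo(Arrays.asList(input.trim().split("\\s+")));
        String tree = p.statement();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("trailing input at token " + p.pos);
        }
        return tree;
    }

    // statement : 'if' COND statement ('else' statement)? | NAME
    private String statement() {
        String t = next();
        if (!t.equals("if")) return t;   // any other word is a simple statement
        String cond = next();
        String thenPart = statement();
        // "Shift" choice: if an 'else' is next, the innermost open 'if' takes it.
        if (pos < tokens.size() && tokens.get(pos).equals("else")) {
            pos++;
            return "(if " + cond + " " + thenPart + " else " + statement() + ")";
        }
        return "(if " + cond + " " + thenPart + ")";
    }

    private String next() {
        if (pos >= tokens.size()) throw new IllegalArgumentException("unexpected end of input");
        return tokens.get(pos++);
    }

    public static void main(String[] args) {
        // prints (if a (if b s1 else s2))
        System.out.println(parse("if a if b s1 else s2"));
    }
}
```

The else ends up inside the inner if, which is the reading C programmers expect and the one the default shift resolution produces.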
First, get a state machine output from yacc. A state which can be either shifted or reduced represents a shift/reduce conflict. Find one, and then solve the conflict by rewriting the grammar.
This article gives an alternative solution to the one posted by ardsrk.