yacc shift/reduce conflict

I ran into a conflict while compiling my grammar with yacc.
The error message is below:
24: shift/reduce conflict (shift 66, reduce 99) on '/'
state 24
arithmetic_leaf : absolute_path . (99)
absolute_path : absolute_path . '/' relative_path (102)
Code below:
arithmetic_leaf: '(' arithmetic_expression ')'
{
}
| integer_value
{
}
| real_value
{
}
| absolute_path
{
}
;
absolute_path: '/'
{
}
| '/' relative_path
{
}
| absolute_path '/' relative_path
{
}
;
relative_path: path_segment
{
}
| relative_path '/' path_segment
{
}
;
path_segment: V_ATTRIBUTE_IDENTIFIER V_LOCAL_TERM_CODE_REF
{
}
| V_ATTRIBUTE_IDENTIFIER '[' V_ARCHETYPE_ID ']'
{
}
| V_ATTRIBUTE_IDENTIFIER
{
}
;
A shift/reduce conflict occurs at this point.
I don't know what the problem is. How can I resolve this conflict?
Thanks.

The conflict appears (to me) to be between the alternatives for absolute_path.
It appears that a string like '/a/b' will match either the absolute_path '/' relative_path rule, or the '/' relative_path rule.
At least to me, it looks like you just want to eliminate one of the two. I'd probably write it as:
absolute_path: '/'
| '/' relative_path
;
Alternatively, it might make more sense to allow a relative_path to just be an empty string, in which case, you could end up with something like:
absolute_path: '/' relative_path
;
relative_path:
| path_segment
| relative_path '/' path_segment
;
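
As a rough illustration of the overlap described above, the following Python sketch counts how many distinct derivations a token sequence has under the original three-alternative absolute_path rule and under the two-alternative version. It works on the grammar rules only and does not reproduce yacc's state machine. The function names and the use of single letters as stand-ins for path_segment are assumptions made for this example.

```python
from functools import lru_cache

def is_segment(tok):
    # Stand-in for path_segment (assumption): any token other than '/'.
    return tok != "/"

@lru_cache(maxsize=None)
def count_relative(tokens):
    # relative_path: path_segment | relative_path '/' path_segment
    if len(tokens) == 1:
        return 1 if is_segment(tokens[0]) else 0
    if len(tokens) >= 3 and tokens[-2] == "/" and is_segment(tokens[-1]):
        return count_relative(tokens[:-2])
    return 0

@lru_cache(maxsize=None)
def count_absolute_original(tokens):
    # absolute_path: '/' | '/' relative_path | absolute_path '/' relative_path
    if not tokens:
        return 0
    total = 0
    if tokens == ("/",):
        total += 1
    if tokens[0] == "/" and len(tokens) > 1:
        total += count_relative(tokens[1:])
    # Third alternative: try every '/' as the split point.
    for i in range(1, len(tokens) - 1):
        if tokens[i] == "/":
            total += count_absolute_original(tokens[:i]) * count_relative(tokens[i + 1:])
    return total

def count_absolute_simplified(tokens):
    # absolute_path: '/' | '/' relative_path
    if tokens == ("/",):
        return 1
    if tokens and tokens[0] == "/":
        return count_relative(tokens[1:])
    return 0

path = ("/", "a", "/", "b")
print(count_absolute_original(path))    # 2
print(count_absolute_simplified(path))  # 1
```

Two derivations for the same input mean the grammar is ambiguous; with the third alternative removed, only one remains.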

How to fix extraneous input ' ' expecting, in antlr4

Hello, when running ANTLR4 with the following input I get the following error:
[image showing the problem]
I have been trying to fix it by making changes here and there, but it seems to work only if I write every component of whileLoop on a new line.
Could you please tell me what I am missing here and why the problem persists?
grammar AM;
COMMENTS :
'{'~[\n|\r]*'}' -> skip
;
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
anything : whileLoop | write ;
write : 'WRITE' '(' '"' sentance '"' ')' ;
read : 'READ' '(' '"' sentance '"' ')' ;
whileLoop : 'WHILE' expression 'DO' ;
block : 'BODY' anything 'END';
expression : 'TRUE'|'FALSE' ;
test : ID? {System.out.println("Done");};
logicalOperators : '<' | '>' | '<>' | '<=' | '>=' | '=' ;
numberExpressionS : (NUMBER numberExpression)* ;
numberExpression : ('-' | '/' | '*' | '+' | '%') NUMBER ;
sentance : (ID)* {System.out.println("Sentance");};
WS : [ \t\r\n]+ -> skip ;
NUMBER : [0-9]+ ;
ID : [a-zA-Z0-9]* ;
Your lexer rules produce conflicts:
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
vs
WS : [ \t\r\n]+ -> skip ;
The critical part is the ' '*. It defines an implicit lexer token that matches a single space, and implicit tokens are defined ahead of WS. So a single space is not handled as WS but as the implicit token.
If I am right, putting tabs between the components of whileLoop will work, and putting more than one space between them should work too, because WS then matches more characters. You should simply remove ' '*, since whitespace is skipped anyway.
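
The tie-breaking described above can be sketched in Python. This is only a simulation under the assumption that the lexer prefers the longest match and, on a tie, the rule defined first; the rule name IMPLICIT_SPACE and the function first_token are made up for the example and are not ANTLR names.

```python
import re

# Rules in priority order: the implicit token for ' ' comes first,
# as described above; WS is the explicit skip rule from the grammar.
RULES = [
    ("IMPLICIT_SPACE", re.compile(r" ")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]

def first_token(text):
    """Return the name of the rule that wins at the start of `text`."""
    best_name, best_len = None, 0
    for name, pattern in RULES:
        m = pattern.match(text)
        # A strictly longer match wins; on a tie the earlier rule is kept.
        if m and len(m.group()) > best_len:
            best_name, best_len = name, len(m.group())
    return best_name

print(first_token(" DO"))   # IMPLICIT_SPACE
print(first_token("  DO"))  # WS
print(first_token("\tDO"))  # WS
```

A single space is claimed by the implicit token and reaches the parser, while two spaces or a tab are matched by WS and skipped.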

Parsing DECAF grammar in ANTLR

I am creating a parser for DECAF with ANTLR:
grammar DECAF ;
//********* LEXER ******************
LETTER: ('a'..'z'|'A'..'Z') ;
DIGIT : '0'..'9' ;
ID : LETTER( LETTER | DIGIT)* ;
NUM: DIGIT(DIGIT)* ;
COMMENTS: '//' ~('\r' | '\n' )* -> channel(HIDDEN);
WS : [ \t\r\n\f | ' '| '\r' | '\n' | '\t']+ ->channel(HIDDEN);
CHAR: (LETTER|DIGIT|' '| '!' | '"' | '#' | '$' | '%' | '&' | '\'' | '(' | ')' | '*' | '+'
| ',' | '-' | '.' | '/' | ':' | ';' | '<' | '=' | '>' | '?' | '#' | '[' | '\\' | ']' | '^' | '_' | '`'| '{' | '|' | '}' | '~'
'\t'| '\n' | '\"' | '\'');
// ********** PARSER *****************
program : 'class' 'Program' '{' (declaration)* '}' ;
declaration: structDeclaration| varDeclaration | methodDeclaration ;
varDeclaration: varType ID ';' | varType ID '[' NUM ']' ';' ;
structDeclaration : 'struct' ID '{' (varDeclaration)* '}' ;
varType: 'int' | 'char' | 'boolean' | 'struct' ID | structDeclaration | 'void' ;
methodDeclaration : methodType ID '(' (parameter (',' parameter)*)* ')' block ;
methodType : 'int' | 'char' | 'boolean' | 'void' ;
parameter : parameterType ID | parameterType ID '[' ']' ;
parameterType: 'int' | 'char' | 'boolean' ;
block : '{' (varDeclaration)* (statement)* '}' ;
statement : 'if' '(' expression ')' block ( 'else' block )?
| 'while' '(' expression ')' block
|'return' expressionA ';'
| methodCall ';'
| block
| location '=' expression
| (expression)? ';' ;
expressionA: expression | ;
location : (ID|ID '[' expression ']') ('.' location)? ;
expression : location | methodCall | literal | expression op expression | '-' expression | '!' expression | '('expression')' ;
methodCall : ID '(' arg1 ')' ;
arg1 : arg2 | ;
arg2 : (arg) (',' arg)* ;
arg : expression;
op: arith_op | rel_op | eq_op | cond_op ;
arith_op : '+' | '-' | '*' | '/' | '%' ;
rel_op : '<' | '>' | '<=' | '>=' ;
eq_op : '==' | '!=' ;
cond_op : '&&' | '||' ;
literal : int_literal | char_literal | bool_literal ;
int_literal : NUM ;
char_literal : '\'' CHAR '\'' ;
bool_literal : 'true' | 'false' ;
When I give it the input:
class Program {
void main(){
return 3+5 ;
}
}
The parse tree is not building correctly since it is not recognizing the 3+5 as an expression. Is there anything wrong with my grammar that is causing the problem?
Lexer rules are matched from top to bottom. When 2 or more lexer rules match the same amount of characters, the one defined first will win. Because of that, a single digit integer will get matched as a DIGIT instead of a NUM.
Try parsing the following instead:
class Program {
void main(){
return 33 + 55 ;
}
}
which will be parsed just fine. This is because 33 and 55 are matched as NUMs, because NUM can now match 2 characters (DIGIT only 1, so NUM wins).
To fix it, make DIGIT a fragment (and LETTER as well):
fragment LETTER: ('a'..'z'|'A'..'Z') ;
fragment DIGIT : '0'..'9' ;
ID : LETTER( LETTER | DIGIT)* ;
NUM: DIGIT(DIGIT)* ;
Lexer fragments are only used internally by other lexer rules, and will never become tokens of their own.
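
To make the effect of the fragment change concrete, here is a small Python sketch of a tokenizer that follows the two rules stated above (longest match first, then the earliest rule on a tie). It is a simplified model, not ANTLR's implementation; the function tokenize and the two rule tables are assumptions for illustration, and only the DIGIT, NUM and '+' rules are modelled.

```python
import re

def tokenize(text, rules):
    """Split `text` into (rule_name, lexeme) pairs using longest match,
    keeping the earliest rule when two matches have the same length."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (rule_name, end_position)
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best[0], text[pos:best[1]]))
        pos = best[1]
    return tokens

# DIGIT is a real token and is defined before NUM, as in the question.
WITH_DIGIT_TOKEN = [("DIGIT", r"[0-9]"), ("NUM", r"[0-9]+"), ("PLUS", r"\+")]
# DIGIT is a fragment, so it never appears as a token of its own.
WITH_FRAGMENT = [("NUM", r"[0-9]+"), ("PLUS", r"\+")]

print(tokenize("3+5", WITH_DIGIT_TOKEN))
# [('DIGIT', '3'), ('PLUS', '+'), ('DIGIT', '5')]
print(tokenize("33+55", WITH_DIGIT_TOKEN))
# [('NUM', '33'), ('PLUS', '+'), ('NUM', '55')]
print(tokenize("3+5", WITH_FRAGMENT))
# [('NUM', '3'), ('PLUS', '+'), ('NUM', '5')]
```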
A couple of other things: your WS rule matches way too much (it now also matches a | and a '). It should be:
WS : [ \t\r\n\f]+ ->channel(HIDDEN);
and you shouldn't match a char literal in your parser: do it in the lexer:
CHAR : '\'' ( ~['\r\n\\] | '\\' ['\\] ) '\'';
If you don't, the following will not get parsed properly:
class Program {
void main(){
return '1';
}
}
because the 1 will be tokenized as a NUM and not as a CHAR.

ANTLR - Field that accept attributes with more than one word

My Grammar file (see below) parses queries of the type:
(name = Jon AND age != 16 OR city = NY);
However, it doesn't allow something like:
(name = 'Jon Smith' AND age != 16);
i.e., it doesn't allow assigning to a field a value made of more than one word separated by white space. How can I modify my grammar file to accept that?
options
{
language = Java;
output = AST;
}
tokens {
BLOCK;
RETURN;
QUERY;
ASSIGNMENT;
INDEXES;
}
@parser::header {
package pt.ptinovacao.agorang.antlr;
}
@lexer::header {
package pt.ptinovacao.agorang.antlr;
}
query
: expr ('ORDER BY' NAME AD)? ';' EOF
-> ^(QUERY expr ^('ORDER BY' NAME AD)?)
;
expr
: logical_expr
;
logical_expr
: equality_expr (logical_op^ equality_expr)*
;
equality_expr
: NAME equality_op atom -> ^(equality_op NAME atom)
| '(' expr ')' -> ^('(' expr)
;
atom
: ID
| id_list
| Int
| Number
;
id_list
: '(' ID (',' ID)* ')'
-> ID+
;
NAME
: 'equipType'
| 'equipment'
| 'IP'
| 'site'
| 'managedDomain'
| 'adminState'
| 'dataType'
;
AD : 'ASC' | 'DESC' ;
equality_op
: '='
| '!='
| 'IN'
| 'NOT IN'
;
logical_op
: 'AND'
| 'OR'
;
Number
: Int ('.' Digit*)?
;
ID
: ('a'..'z' | 'A'..'Z' | '_' | '.' | '-' | Digit)*
;
String
@after {
setText(getText().substring(1, getText().length()-1).replaceAll("\\\\(.)", "$1"));
}
: '"' (~('"' | '\\') | '\\' ('\\' | '"'))* '"'
| '\'' (~('\'' | '\\') | '\\' ('\\' | '\''))* '\''
;
Comment
: '//' ~('\r' | '\n')* {skip();}
| '/*' .* '*/' {skip();}
;
Space
: (' ' | '\t' | '\r' | '\n' | '\u000C') {skip();}
;
fragment Int
: '1'..'9' Digit*
| '0'
;
fragment Digit
: '0'..'9'
;
indexes
: ('[' expr ']')+ -> ^(INDEXES expr+)
;
Include the String token as an alternative in your atom rule:
atom
: ID
| id_list
| Int
| Number
| String
;
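
For reference, here is a hedged Python rendering of what the String lexer rule in the question matches and what its setText action does to the matched text. The regular expression is a hand translation of the rule, and the names STRING and string_value are assumptions for this sketch rather than anything generated by ANTLR.

```python
import re

# A double- or single-quoted literal in which a backslash may escape
# a backslash or the enclosing quote character.
STRING = re.compile(r'"(?:[^"\\]|\\[\\"])*"' + r"|'(?:[^'\\]|\\[\\'])*'")

def string_value(token_text):
    # Mirrors the rule's action: drop the surrounding quotes,
    # then replace every backslash-escaped character by the character itself.
    return re.sub(r"\\(.)", r"\1", token_text[1:-1])

m = STRING.match("'Jon Smith' AND age != 16")
print(string_value(m.group()))  # Jon Smith
```

Because the quoted text is matched as one token, the space inside it never reaches the Space rule, which is why adding String to atom is enough to accept multi-word values.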

ANTLR error : java.lang.NoSuchFieldError: offendingToken

I have written the following grammar file and am getting the error below. I have done many Google searches, and some answers say that there is something wrong in the grammar. But the error message does not indicate where in the grammar the problem might be. Can you please advise why I am getting the error described below?
grammar ArchSpec;
options {
language = Java;
}
@lexer::header {
package iotsuite.parser;
}
@parser::header {
package iotsuite.parser;
import iotsuite.compiler.*;
import iotsuite.semanticmodel.*;
}
@members {
private SymbolTable context;
}
archSpec :
('structs' ':' struct_def)*
'softwarecomponents' ':' (component_def)+
;
component_def :
'computationalService' ':' (cs_def)+
;
struct_def:
CAPITALIZED_ID
(structField_def ';')+
;
structField_def:
lc_id ':' dataType
;
cs_def:
CAPITALIZED_ID
(csGeneratedInfo_def ';')+
(csConsumeInfo_def ';')*
(csRequest_def ';')*
(cntrlCommand_def ';')*
(partition_def ';')+
;
csGeneratedInfo_def:
'generate' lc_id ':' CAPITALIZED_ID
;
csConsumeInfo_def:
'consume' lc_id ('from' 'region-hops' ':' INT ':' CAPITALIZED_ID )?
;
csRequest_def :
'request' lc_id
;
cntrlCommand_def :
'command' name = CAPITALIZED_ID '(' (cntrlParameter_def)? ')' 'to' 'region-hops' ':' INT ':' CAPITALIZED_ID
;
cntrlParameter_def :
lc_id (',' parameter_def )?
;
partition_def:
csDeploymentConstraint='partition-per' ':' CAPITALIZED_ID
;
lc_id: ID
;
dataType:
primitiveType
;
primitiveType:
(id='Integer' | id='Boolean' | id='String' | id = 'double' | id = 'long' | id='boolean' )
;
ID : 'a'..'z' ('a'..'z' | 'A'..'Z' )*
;
INT : '0'..'9'('0'..'9')* ;
CAPITALIZED_ID: 'A'..'Z' ('a'..'z' | 'A'..'Z' )*;
WS: ('\t' | ' ' | '\r' | '\n' | '\u000C')+ {$channel = HIDDEN;};
The error message I get on the console is the following:
java.lang.NoSuchFieldError: offendingToken
at org.deved.antlride.runtime.AntlrErrorListener.extractToken(AntlrErrorListener.java:111)
at org.deved.antlride.runtime.AntlrErrorListener.report(AntlrErrorListener.java:79)
at org.deved.antlride.runtime.AntlrErrorListener.message(AntlrErrorListener.java:63)
at org.deved.antlride.runtime.AntlrErrorListener.error(AntlrErrorListener.java:53)
at org.antlr.tool.ErrorManager.grammarError(ErrorManager.java:742)
at org.antlr.tool.ErrorManager.grammarError(ErrorManager.java:750)
at org.antlr.tool.NameSpaceChecker.lookForReferencesToUndefinedSymbols(NameSpaceChecker.java:133)
at org.antlr.tool.NameSpaceChecker.checkConflicts(NameSpaceChecker.java:72)
at org.antlr.tool.Grammar.checkNameSpaceAndActions(Grammar.java:804)
at org.antlr.tool.CompositeGrammar.defineGrammarSymbols(CompositeGrammar.java:374)
at org.antlr.Tool.process(Tool.java:484)
at org.deved.antlride.runtime.Tool2.main(Tool2.java:24)
An instance of input I parse is the following:
softwarecomponents:
computationalService:
RoomAvgTemp
generate roomAvgTempMeasurement:TempStruct;
consume tempMeasurement from region-hops:0:Room;
partition-per : Room;
RoomController
consume roomAvgTempMeasurement from region-hops:0:Room;
command SetTemp(setTemp) to region-hops:0:Room;
partition-per : Room;

antlr gated predicate

This is a follow-up question from Antlr superfluous Predicate required? where I stated my problem in a simplified way, however it could not be solved there.
I have the following grammar and when I delete the {true}?=> predicates, the text is not recognized anymore. The input string is MODULE main LTLSPEC H {} {} {o} FALSE;. Note that the trailing ; is not tokenized as EOC, but as IGNORE. When I add {true}?=> to the EOC rule ; is tokenized as EOC.
I tried this from command-line with antlr-v3.3 and v3.4 without difference. Thanks in advance, I appreciate your help.
grammar NusmvInput;
options {
language = Java;
}
@parser::members{
public static void main(String[] args) throws Exception {
NusmvInputLexer lexer = new NusmvInputLexer(new ANTLRStringStream("MODULE main LTLSPEC H {} {} {o} FALSE;"));
NusmvInputParser parser = new NusmvInputParser(new CommonTokenStream(lexer));
parser.specification();
}
}
@lexer::members{
private boolean inLTL = false;
}
specification :
module+ EOF
;
module :
MODULE module_decl
;
module_decl :
NAME parameter_list ;
parameter_list
: ( LP (parameter ( COMMA parameter )*)? RP )?
;
parameter
: (NAME | INTEGER )
;
/**************
*** LEXER
**************/
COMMA
:{!inLTL}?=> ','
;
OTHER
: {!inLTL}?=>( '&' | '|' | 'xor' | 'xnor' | '=' | '!' |
'<' | '>' | '-' | '+' | '*' | '/' |
'mod' | '[' | ']' | '?')
;
RCP
: {!inLTL}?=>'}'
;
LCP
: {!inLTL}?=>'{'
;
LP
: {!inLTL}?=>'('
;
RP
: {!inLTL}?=>')'
;
MODULE
: {true}?=> 'MODULE' {inLTL = false;}
;
LTLSPEC
: {true}?=> 'LTLSPEC'
{inLTL = true; skip(); }
;
EOC
: ';'
{
if (inLTL){
inLTL = false;
skip();
}
}
;
WS
: (' ' | '\t' | '\n' | '\r')+ {$channel = HIDDEN;}
;
COMMENT
: '--' .* ('\n' | '\r') {$channel = HIDDEN;}
;
INTEGER
: {!inLTL}?=> ('0'..'9')+
;
NAME
:{!inLTL}?=> ('A'..'Z' | 'a'..'z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$' | '#' | '-')*
;
IGNORE
: {inLTL}?=> . {skip();}
;
It seems that without a predicate before MODULE and LTLSPEC, the NAME gets precedence over them even if these tokens are defined before the NAME token. Whether this is by design or a bug, I don't know.
However, the way you're trying to solve it seems rather complicated. As far as I can see, you seem to want to ignore (or skip) input starting with LTLSPEC and ending with a semicolon. Why not do something like this instead:
specification : module+ EOF;
module : MODULE module_decl;
module_decl : NAME parameter_list;
parameter_list : (LP (parameter ( COMMA parameter )*)? RP)?;
parameter : (NAME | INTEGER);
MODULE : 'MODULE';
LTLSPEC : 'LTLSPEC' ~';'* ';' {skip();};
COMMA : ',';
OTHER : ( '&' | '|' | 'xor' | 'xnor' | '=' | '!' |
'<' | '>' | '-' | '+' | '*' | '/' |
'mod' | '[' | ']' | '?')
;
RCP : '}';
LCP : '{';
LP : '(';
RP : ')';
EOC : ';';
WS : (' ' | '\t' | '\n' | '\r')+ {$channel = HIDDEN;};
COMMENT : '--' .* ('\n' | '\r') {$channel = HIDDEN;};
INTEGER : ('0'..'9')+;
NAME : ('A'..'Z' | 'a'..'z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$' | '#' | '-')*;
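
The effect of the single LTLSPEC rule above can be approximated in Python with a regular expression. This is only a sketch of the idea: the name strip_ltlspec is made up here, and the regex ignores lexer details such as token boundaries and longest-match competition with NAME.

```python
import re

# 'LTLSPEC' ~';'* ';'  ->  the keyword, any run of non-semicolons, one semicolon.
LTLSPEC = re.compile(r"LTLSPEC[^;]*;")

def strip_ltlspec(text):
    # Drop every LTLSPEC ... ; span, as the skip() action would.
    return LTLSPEC.sub("", text)

print(repr(strip_ltlspec("MODULE main LTLSPEC H {} {} {o} FALSE;")))
# 'MODULE main '
```

What remains is exactly the part the parser rules need to see, so no lexer state such as inLTL is required.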