How to fix this yacc shift/reduce conflict - yacc

I have this grammar
value
: INTEGER
| REAL
| LEFTBRACKET value RIGHTBRACKET
| op expression
| expression binaryop expression
;
and I am getting this shift/reduce conflict:
47 expression: value .
53 value: LEFTBRACKET value . RIGHTBRACKET
RIGHTBRACKET shift, and go to state 123
RIGHTBRACKET [reduce using rule 47 (expression)]
$default reduce using rule 47 (expression)
So far I have tried setting %left and %right precedences, with no luck. I have also tried a new grammar for value that does not refer to itself recursively, but I still get conflicts. I tried this solution too.
Any thoughts?
Thank you in advance.
EDIT
expression
: lvalue
| value
;
lvalue
: IDENTIFIER
| lvalue LEFTSQBRACKET expression RIGHTSQBRACKET
| LEFTBRACKET lvalue RIGHTBRACKET
;
binaryop
: PLUS
| MINUS
| MUL
| DIVISION
| DIV
| MOD
;
I managed to overcome most of the conflicts using this grammar, but I still get the conflict I mentioned above:
binaryop
: expression PLUS expression
| expression MINUS expression
| expression MUL expression
| expression DIVISION expression
| expression DIV expression
| expression MOD expression
;

Why do you have both value and expression? Without seeing the rest of the grammar, I hesitate to guess the use of expression which leads to that conflict, but my guess is that it has to do with the unnecessary unit production.
On the other hand, you will not be able to resolve precedences if you lump all operator terminals into binaryop (unless all binary operators have the same precedence). So I'd suggest you find a standard expression grammar (such as the one in the bison manual or on Wikipedia) and use it as a base.

Related

How to write yacc rules

I'm a newbie to yacc and don't really understand how to write the rules, especially how to handle recursive definitions.
%token NUMBER
%token VARIABLE
%left '+' '-'
%left '*' '/' '%'
%left '(' ')'
%%
S: VARIABLE'='E {
printf("\nEntered arithmetic expression is Valid\n\n");
return 0;
}
E : E'+'E
| E'-'E
| E'*'E
| E'/'E
| E'%'E
| '('E')'
| NUMBER
| VARIABLE
;
%%
The above example works well, but when I changed it as below, I got "5 shift/reduce conflicts".
%token NUMBER
%token VARIABLE
%token MINS
%token PULS
%token MUL
%token DIV
%token MOD
%token LP
%token RP
%left MINS PULS
%left MUL DIV MOD
%left LP RP
%%
S: VARIABLE'='E {
printf("\nEntered arithmetic expression is Valid\n\n");
return 0;
}
E : E operator E
| LP E RP
| NUMBER
| VARIABLE
;
operator: MINS
| PULS
| MUL
| DIV
| MOD
;
%%
Can anyone tell me what the difference is between these examples? Thanks a lot.
The difference is the additional indirection with the non-terminal operator. That serves to defeat your precedence declarations.
Precedence is immediate, not transparent. That is, it only functions in the production directly including the terminal. In your second grammar, that production is:
operator: MINS
| PULS
| MUL
| DIV
| MOD
;
But there is no ambiguity to resolve in that production. All of those terminals are unambiguously reduced to operator. The ambiguity is in the production
E : E operator E
And that production has no terminals in it.
By contrast, in your first grammar, the productions
E : E'+'E
| E'-'E
| E'*'E
| E'/'E
| E'%'E
(which would be easier to read with a bit more whitespace) do include terminals whose precedences can be compared with each other.
The precise working of precedence declarations is explained in the Bison manual. In case it's useful, here's a description of the algorithm I wrote a few years ago in a different answer on this site.
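As a rough illustration of the point above, here is a small Python model of the comparison yacc makes when it meets a shift/reduce conflict. It is a sketch, not bison's implementation: the function names (declare, rule_precedence, resolve) are made up for this example, and it assumes the usual rule that a production takes its precedence from the last terminal on its right-hand side.

```python
def declare(levels):
    """Build a precedence table from declarations in source order.

    levels: list of (associativity, [terminals]); later entries bind tighter,
    just like successive %left/%right lines in a yacc file.
    """
    table = {}
    for level, (assoc, terminals) in enumerate(levels, start=1):
        for terminal in terminals:
            table[terminal] = (level, assoc)
    return table


def rule_precedence(rhs, terminals, table):
    """Precedence of a production: that of the last terminal in its RHS, if any."""
    last_terminal = next((sym for sym in reversed(rhs) if sym in terminals), None)
    return table.get(last_terminal)


def resolve(rhs, lookahead, terminals, table):
    """Decide between reducing by `rhs` and shifting `lookahead`."""
    rule = rule_precedence(rhs, terminals, table)
    token = table.get(lookahead)
    if rule is None or token is None:
        # Nothing to compare: yacc reports the conflict (and shifts by default).
        return "unresolved"
    if rule[0] != token[0]:
        return "reduce" if rule[0] > token[0] else "shift"
    # Same level: associativity of the lookahead token decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[token[1]]


TERMINALS = {"MINS", "PULS", "MUL", "DIV", "MOD"}
TABLE = declare([("left", ["MINS", "PULS"]), ("left", ["MUL", "DIV", "MOD"])])

# Productions that mention the operator directly can be resolved...
direct = resolve(["E", "PULS", "E"], "MUL", TERMINALS, TABLE)
# ...but "E operator E" contains no terminal, so it has no precedence at all.
indirect = resolve(["E", "operator", "E"], "MUL", TERMINALS, TABLE)
```

In this model `direct` is "shift" (MUL binds tighter than PULS), while `indirect` is "unresolved" for every operator lookahead, which is consistent with the five conflicts reported for the second grammar.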

Multiplication by juxtaposition in yacc

I'm trying to implement a grammar that allows multiplication by juxtaposition.
This is for parsing polynomial inputs for a CAS.
It works quite well, except for a few edge cases, as far as I'm aware.
There are two problems I have identified:
Conflict with other rules, e.g., a^2 b is (erroneously) parsed as (^ a (* 2 b)), not as (* (^ a 2) b).
yacc(bison) reports 28 shift/reduce conflicts and 8 reduce/reduce conflicts.
I'm pretty sure properly resolving the first issue will resolve the second as well, but so far I haven't been successful.
The following is the gist of the grammar that I'm working with:
%start prgm
%union {
double num;
char *var;
ASTNode *node;
}
%token <num> NUM
%token <var> VAR
%type <node> expr
%left '+' '-'
%left '*' '/'
%right '^'
%%
prgm: // nothing
| prgm '\n'
| prgm expr '\n'
;
expr: NUM
| VAR
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| expr '^' expr
| expr expr %prec '*'
| '-' expr
| '(' expr ')'
;
%%
Removing the rule for juxtaposition (expr expr %prec '*') resolves the shift/reduce & reduce/reduce warnings.
Note that ab in my grammar should mean (* a b).
Multi-character variables should be preceded by a quote('); this is already handled fine in the lex file.
The lexer ignores spaces( ) and tabs(\t) entirely.
I'm aware of this question, but the use of juxtaposition here does not seem to indicate multiplication.
Any comments or help would be greatly appreciated!
P.S. If it helps, this is the link to the entire project.
As indicated in the answer to the question you linked, it is hard to specify the operator precedence of juxtaposition because there is no operator to shift. (As in your code, you can specify the precedence of the production expr: expr expr. But what lookahead token will this reduction be compared with? Adding every token in FIRST(expr) to your precedence declarations is not very scalable, and might lead to unwanted precedence resolutions.)
An additional problem with the precedence solution is the behaviour of the unary minus operator (an issue not addressed in the linked question), because as written your grammar allows a - b to be parsed either as a subtraction or as the juxtaposed multiplication of a and -b. (And note that - is in FIRST(expr), leading to one of the possibly unwanted resolutions I referred to above.)
So the best solution, as recommended in the linked question, is to use a grammar with explicit precedence, such as the following. (Here, I used juxt as the name of the non-terminal, rather than expr_sequence.)
%start prgm
%token NUM
%token VAR
%left '+' '-'
%left '*' '/'
%right '^'
%%
prgm: // nothing
| prgm '\n'
| prgm expr '\n'
expr: juxt
| '-' juxt
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| expr '^' expr
juxt: atom
| juxt atom
atom: NUM
| VAR
| '(' expr ')'
This grammar may not be what you want. Its rather simple-minded handling of unary minus has a couple of issues. I don't think it's problematic that it parses -xy into -(xy) instead of (-x)y, but it's not ideal. Also, it doesn't allow --x (also probably not a problem, but not ideal). Finally, it does not parse -x^y as -(x^y), but as (-x)^y, which is contrary to frequent practice.
In addition, it incorrectly binds juxtaposition too tightly. You might or might not consider it a problem that a/xy parses as a/(xy), but you would probably object to 2x^7 being parsed as (2x)^7.
The simplest way to avoid those issues is to use a grammar in which operator precedence is uniformly implemented with unambiguous grammar rules.
Here's an example which implements standard precedence rules (exponentiation takes precedence over unary minus; juxtaposing multiply has the same precedence as explicit multiply). It's worth taking a few minutes to look closely at which non-terminal appears in which production, and think about how that correlates with the desired precedence rules.
%union {
double num;
char *var;
ASTNode *node;
}
%token <num> NUM
%token <var> VAR
%type <node> expr mult neg expt atom
%%
prgm: // nothing
| prgm '\n'
| prgm error '\n'
| prgm expr '\n'
expr: mult
| expr '+' mult
| expr '-' mult
mult: neg
| mult '*' neg
| mult '/' neg
| mult expt
neg : expt
| '-' neg
expt: atom
| atom '^' neg
atom: NUM
| VAR
| '(' expr ')'
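
To see how the layering of non-terminals produces the precedence, here is a hand-written recursive-descent sketch in Python that mirrors the grammar above one function per non-terminal. It is only an illustration: the tokenizer, the tuple-based tree and the "neg" node name are assumptions of this sketch, and single letters stand in for VAR.

```python
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z]|[-+*/^()])")


def tokenize(text):
    """Split input into numbers, single-letter variables and operator characters."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def starts_atom(self):
        tok = self.peek()
        return tok is not None and (tok == "(" or tok[0].isalnum())

    def expr(self):  # expr: mult | expr '+' mult | expr '-' mult
        node = self.mult()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.mult())
        return node

    def mult(self):  # mult: neg | mult '*' neg | mult '/' neg | mult expt
        node = self.neg()
        while True:
            if self.peek() in ("*", "/"):
                op = self.take()
                node = (op, node, self.neg())
            elif self.starts_atom():  # juxtaposition: right operand is an expt
                node = ("*", node, self.expt())
            else:
                return node

    def neg(self):  # neg: expt | '-' neg
        if self.peek() == "-":
            self.take()
            return ("neg", self.neg())
        return self.expt()

    def expt(self):  # expt: atom | atom '^' neg
        base = self.atom()
        if self.peek() == "^":
            self.take()
            return ("^", base, self.neg())
        return base

    def atom(self):  # atom: NUM | VAR | '(' expr ')'
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok[0].isalnum():
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


def parse(text):
    parser = Parser(tokenize(text))
    node = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return node
```

With this sketch, parse("a^2 b") gives ("*", ("^", "a", "2"), "b") and parse("a - b") gives a subtraction, because the juxtaposed operand is an expt and therefore cannot start with a minus sign.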

Antlr4 parser not parsing reassignment statement correctly

I've been creating a grammar parser using Antlr4 and wanted to add variable reassignment (without having to declare a new variable).
I've tried changing the reassignment statement to be an expression, but that didn't change anything.
Here's a shortened version of my grammar:
grammar MyLanguage;
program: statement* EOF;
statement
: expression EOC
| variable EOC
| IDENTIFIER ASSIGNMENT expression EOC
;
variable: type IDENTIFIER (ASSIGNMENT expression)?;
expression
: STRING
| INTEGER
| IDENTIFIER
| expression MATH expression
| ('+' | '-') expression
;
MATH: '+' | '-' | '*' | '/' | '%' | '//' | '**';
ASSIGNMENT: MATH? '=';
EOC: ';';
WHITESPACE: [ \t\r\n]+ -> skip;
STRING: '"' (~[\u0000-\u0008\u0010-\u001F"] | [\t])* '"' | '\'' (~[\u0000-\u0008\u0010-\u001F'] | [\t])* '\'';
INTEGER: '0' | ('+' | '-')? [1-9][0-9]*;
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9_]*;
type: 'str';
If anything else might be relevant, please ask.
So I tried to parse
str test = "empty";
test = "not empty";
which worked, but when I tried (part of the Fibonacci function)
temp = n1;
n1 = n1 + n2;
n2 = temp;
I got an error, and it was parsed as
temp = n1; //statement
n1 = n1 //statement - <missing ';'>
+n2; //statement
n2 = temp; //statement
Your problem has nothing to do with assignment statements. Additions simply don't work at all - whether they're part of an assignment or not. So the simplest input to get the error would be x+y;. If you print the token stream for that input (using grun with the -tokens option for example), you'll get the following output:
[#0,0:0='x',<IDENTIFIER>,1:0]
[#1,1:1='+',<'+'>,1:1]
[#2,2:2='y',<IDENTIFIER>,1:2]
[#3,3:3=';',<';'>,1:3]
[#4,4:3='<EOF>',<EOF>,1:4]
line 1:1 no viable alternative at input 'x+'
Now compare this to x*y;, which works fine:
[#0,0:0='x',<IDENTIFIER>,1:0]
[#1,1:1='*',<MATH>,1:1]
[#2,2:2='y',<IDENTIFIER>,1:2]
[#3,3:3=';',<';'>,1:3]
[#4,4:3='<EOF>',<EOF>,1:4]
The important difference here is that * is recognized as a MATH token, but + isn't. It's recognized as a '+' token instead.
This happens because you introduced a separate '+' (and '-') token type in the alternative | ('+' | '-') expression. So whenever the lexer sees a + it produces a '+' token, not a MATH token, because string literals in parser rules take precedence over named lexer rules.
If you turn MATH into a parser rule math (or maybe mathOperator) instead, all of the operators will be literals and the problem will go away. That said, you probably don't want a single rule for all math operators because that doesn't give you the precedence you want, but that's a different issue.
PS: Something like x+1 still won't work because it will see +1 as a single INTEGER token. You can fix that by removing the leading + and - from the INTEGER rule (that way x = -2 would be parsed as a unary minus applied to the integer 2 instead of just the integer -2, but that's not a problem).
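The token choices described above can be reproduced with a small Python model of the lexer's selection rule: the longest match wins, and ties go to the rule defined first, with the implicit literal tokens '+' and '-' placed ahead of the named rules. This is a simplified stand-in for illustration, not ANTLR itself; the RULES table is an assumption of this sketch, and STRING and the 'str' keyword are left out.

```python
import re

# Implicit literal tokens come first, then the named lexer rules in grammar order.
RULES = [
    ("'+'", r"\+"),
    ("'-'", r"-"),
    ("MATH", r"\*\*|//|[-+*/%]"),
    ("ASSIGNMENT", r"(?:\*\*|//|[-+*/%])?="),
    ("EOC", r";"),
    ("INTEGER", r"0|[+-]?[1-9][0-9]*"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
]


def tokenize(text):
    """Return (token_type, text) pairs using longest match, first rule on ties."""
    pos, out = 0, []
    while pos < len(text):
        if text[pos].isspace():  # WHITESPACE -> skip
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in RULES:
            match = re.compile(pattern).match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        out.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return out


plus_tokens = tokenize("x+y;")
star_tokens = tokenize("x*y;")
signed_tokens = tokenize("x+1")
```

In this model the + in x+y; comes out as the literal '+' type rather than MATH, the * in x*y; comes out as MATH, and x+1 yields only two tokens because +1 is the longest match for INTEGER.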

ANTLR4 - How do I get the token TYPE as the token text in ANTLR?

Say I have a grammar that has tokens like this:
AND : 'AND' | 'and' | '&&' | '&';
OR : 'OR' | 'or' | '||' | '|' ;
NOT : 'NOT' | 'not' | '~' | '!';
When I visualize the ParseTree using TreeViewer or print the tree using tree.toStringTree(), each node's text is the same as what was matched.
So if I parse "A and B or C", the two binary operators will be "and" / "or".
If I parse "A && B || C", they'll be "&&" / "||".
What I would LIKE is for them to always be "AND" / "OR" / "NOT", regardless of what literal symbol was matched. Is this possible?
This is what the vocabulary is for. Use yourLexer.getVocabulary() or yourParser.getVocabulary() and then vocabulary.getSymbolicName(tokenType) to get the text representation of the token type. If that returns null or an empty string (depending on the target runtime), try vocabulary.getLiteralName(tokenType) as a second step; it returns the literal text used to define the token.
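The two-step lookup can be sketched in Python as follows. This does not call the ANTLR runtime: SYMBOLIC_NAMES and LITERAL_NAMES are hypothetical stand-ins for what the vocabulary would hold, the token type numbers are invented, and display_name is a helper made up for this example.

```python
# Hypothetical vocabulary contents, indexed by token type.
# AND/OR/NOT have several spellings, so they have a symbolic name but no single literal.
SYMBOLIC_NAMES = [None, "AND", "OR", "NOT", None]
LITERAL_NAMES = [None, None, None, None, "'('"]


def lookup(names, token_type):
    """Return the entry for token_type, or None when it is absent or out of range."""
    if 0 <= token_type < len(names):
        return names[token_type] or None  # treat "" like a missing name
    return None


def display_name(token_type):
    """Prefer the symbolic name, fall back to the literal, then to the number."""
    return (
        lookup(SYMBOLIC_NAMES, token_type)
        or lookup(LITERAL_NAMES, token_type)
        or str(token_type)
    )
```

With these stand-in tables, token type 1 is shown as AND whether the input spelled it "and" or "&&", because the lookup depends only on the token type, never on the matched text.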

Antlr evaluation order

I defined the following expression rule using Antlr 4 for a script language,
basically I am trying to evaluate
x = y.z.aa * 6
The correct evaluation order should be y.z, then y.z.aa, then the multiplication by 6:
((y.z).aa) * 6
However, after parsing, aa*6 is evaluated first, then z.(aa*6), then y.(z.(aa*6)), so it becomes
y.(z.(aa * 6))
The square-bracket form is evaluated correctly:
x = y[z][aa] * 6
Can anyone help point out what I did wrong in the dot access rule?
expression
: primary #PrimaryExpression
| expression ('.' expression ) + #DotAccessExpression
| expression ('[' expression ']')+ #ArrayAccessExpression
| expression ('*'|'/') expression #MulExpression
| expression ('+'|'-') expression #AddExpression
;
primary
: '(' expression ')'
| literal
| ident
;
literal
: NUMBER
| STRING
| NULL
| TRUE
| FALSE
;
You used the following rule:
expression ('.' expression)+
This rule does not fit the syntax pattern for a binary expression, so it's actually getting treated as a suffix expression. In particular, the expression following a . character is no longer restricted within the precedence hierarchy. You may be additionally affected by issue #679, but the real resolution is the same either way. You need to replace this alternative with the following:
expression '.' expression #DotAccessExpression
The same goes for the ArrayAccessExpression, which should be written as follows:
expression '[' expression ']' #ArrayAccessExpression
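
The difference between the two forms can be imitated with a small precedence-climbing parser in Python. This is a rough model of the symptom, not ANTLR's actual algorithm: the BINDING table and the restrict_rhs flag are assumptions of this sketch. With restrict_rhs=True the right operand must bind tighter than the current operator (the binary-expression behaviour); with restrict_rhs=False the right operand may be any expression, which crudely mimics an operand that is no longer restricted within the precedence hierarchy.

```python
import re

# Higher number = binds tighter; '.' is listed first in the grammar, so it is tightest.
BINDING = {".": 3, "*": 2, "/": 2, "+": 1, "-": 1}


def tokenize(text):
    """Identifiers, integers and single-character operators; whitespace is dropped."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[.*/+\-()]", text)


def parse(tokens, min_prec=1, restrict_rhs=True):
    """Parse a binary expression from a mutable token list into nested tuples."""
    left = parse_primary(tokens, restrict_rhs)
    while tokens and tokens[0] in BINDING and BINDING[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Requiring a tighter right operand makes the operator left-associative.
        next_min = BINDING[op] + 1 if restrict_rhs else 1
        right = parse(tokens, next_min, restrict_rhs)
        left = (op, left, right)
    return left


def parse_primary(tokens, restrict_rhs):
    tok = tokens.pop(0)
    if tok == "(":
        node = parse(tokens, 1, restrict_rhs)
        tokens.pop(0)  # consume ')'
        return node
    return tok


binary_form = parse(tokenize("y.z.aa * 6"))
unrestricted_form = parse(tokenize("y.z.aa * 6"), restrict_rhs=False)
```

Here binary_form nests as ((y.z).aa) * 6, while unrestricted_form nests as y.(z.(aa * 6)), the same shape as the unwanted tree in the question.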