Shift Reduce Error In YACC - yacc

I am working with lex and yacc. The following is a lex and yacc calculator program. When I run yacc on the file, I get the error below. Please help me solve this problem.
This is the error:
conflicts: 20 shift/reduce
//YACC program
%{
#include<stdio.h>
#include<math.h>
extern void printsymbol();
struct symboltable
{
char name[20];
double value;
}ST[20];
%}
%union
{
double p;
}
%token <p> NUM
%token <p> IDENTIFIER
%token SIN COS TAN ROOT
%left '+' '-'
%left '*' '/'
%type <p> E
%%
Edash:E';'{printf("\n=%f",$1);printsymbol();}
|Edash E';'{printf("\n=%f",$2);printsymbol();}
E: E'+'E {$$=$1+$3;}
|E'-'E {$$=$1-$3;}
|E'*'E {$$=$1*$3;}
|E'/'E {$$=$1/$3;}
|NUM {$$=$1;}
|IDENTIFIER {$$=ST[(int)$1].value;}
|'('E')' {$$=$2;}
|IDENTIFIER'='E {$$=ST[(int)$1].value=$3;}
|SIN E {$$=sin($2*3.141/180);}
|COS E {$$=cos($2*3.141/180);}
|TAN E {$$=tan($2*3.141/180);}
|ROOT E {$$=sqrt($2);}
%%
int main()
{
yyparse();
}
yyerror()
{
printf("Error Found..!");
}

Your conflicts come from the rules:
|IDENTIFIER '=' E
|SIN E
|COS E
|TAN E
|ROOT E
Because none of these rules has a precedence set (no precedence is set on any of the tokens in them), when you get an input like SIN X + Y, the parser doesn't know whether to parse it as (SIN X) + Y or SIN (X + Y).
You can fix it by setting precedences for all these rules, which is most easily done by adding a line
%nonassoc '=' SIN COS TAN ROOT
setting precedence values for all of those tokens, which will be inherited by the rules. It's up to you whether they should have higher or lower precedence than the binary operators. By normal conventions, you probably want = to have lower precedence and the functions to have higher precedence (which means you actually need two new lines rather than having them all in one line).

You must run yacc with the -v option to generate a file called y.output. There you can find clues about the conflicts. The file shows the detailed states of the generated parser, and which states have conflicts in them between shifting a particular token or reducing via some rule.
A possible issue with your grammar is this:
E : IDENTIFIER '=' E
Suppose you have
X = 3 + 5
what is the precedence of = versus +? If the parser has just seen X = 3 and the next lookahead token is +, what should it do? Should it reduce X = 3 to E by the rule E : IDENTIFIER '=' E or should it shift the + and continue scanning a longer right-hand-side E?
Look at y.output and see if it confirms this hypothesis.
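To see why the shift/reduce choice for X = 3 + 5 matters, here is a minimal C sketch of the two readings. The function names reduce_first and shift_first are hypothetical and only model what the parser's actions would compute; they are not part of yacc.

```c
#include <assert.h>
#include <stddef.h>

/* Reduce first: the input "X = a + b" is read as (X = a) + b.
 * X receives a; the whole expression evaluates to a + b. */
double reduce_first(double *x, double a, double b) {
    assert(x != NULL);
    *x = a;
    return *x + b;
}

/* Shift first: the input "X = a + b" is read as X = (a + b).
 * X receives the sum, which is also the value of the expression. */
double shift_first(double *x, double a, double b) {
    assert(x != NULL);
    *x = a + b;
    return *x;
}
```

With a = 3 and b = 5 both readings print the same result, but X ends up holding 3 in the first case and 8 in the second, so the choice of precedence for = changes what is stored in the symbol table.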

Related

3 Address Code Generation using lex and yacc

I'm trying to generate three-address code for basic arithmetic expressions. I'm new to lex and yacc and I'm having trouble understanding the flow of control between the two, i.e. how the two programs interact.
lex.l
%{
#include<stdio.h>
#include"y.tab.h"
int k=1;
%}
%%
[0-9]+ {
yylval.dval=yytext[0];
return NUM;
}
\n {return 0;}
. {return yytext[0];}
%%
void yyerror(char* str)
{
printf("\n%s",str);
}
char *gencode(char word[],char first,char op,char second)
{
char temp[10];
sprintf(temp,"%d",k);
strcat(word,temp);
k++;
printf("%s = %c %c %c\n",word,first,op,second);
return word; //Returns variable name like t1,t2,t3... properly
}
int yywrap()
{
return 1;
}
main()
{
yyparse();
return 0;
}
yacc.y
%{
#include<stdio.h>
int aaa;
%}
%union{
char dval;
}
%token <dval> NUM
%type <dval> E
%left '+' '-'
%left '*' '/' '%'
%%
statement : E {printf("\nt = %c \n",$1);}
;
E : E '+' E
{
char word[]="t";
char *test=gencode(word,$1,'+',$3);
$$=test;
}
| E '-' E
{
char word[]="t";
char *test=gencode(word,$1,'-',$3);
$$=test;
}
| E '%' E
{
char word[]="t";
char *test=gencode(word,$1,'%',$3);
$$=test;
}
| E '*' E
{
char word[]="t";
char *test=gencode(word,$1,'*',$3);
$$=test;
}
| E '/' E
{
char word[]="t";
char *test=gencode(word,$1,'/',$3);
$$=test;
}
| '(' E ')'
{
$$=$2;
}
| NUM
{
$$=$1;
}
;
%%
Problem:
getting garbage value in output
Expected output for expression (2+3)*5 should be like:
t1= 2 + 3
t2= t1 * 5
Obtained output:
t1= 2 + 3
t2= garbage value * 5
I'm unable to figure out how to correct this. The variable names (eg t1,t2,t3 ) are being properly returned from gencode() method in lex.l
char *test=gencode(word,$1,'%',$3);
But I'm completely clueless about what is going wrong after that. I believe I'm not handling the $$,$1,$3 terms correctly.
Please help me understand what is going wrong, what needs to be done and how to do it.
A little help and some explanation would be very helpful. Thank you.
The problem here is not in the use of flex or bison; rather, it is undefined behaviour in your C code.
Your gencode function returns its first argument. Then you call it like this, roughly:
{
char word[] = ...
... = gencode(word, ...);
}
The lifetime of word ends when the block finishes, which is right after the call to gencode. In effect, that is no different from the classic dangling pointer generator:
char* dangle(void) {
char temporary[] = "some string";
return temporary;
}
which is obviously incorrect, since the local variable ceases to exist before its address is returned.
In addition, you actually create word as a two-character array:
char word[] = "t";
since leaving out the size tells C to leave exactly enough space for the initial string (one character plus null terminator). That's fine, but you cannot then append more characters to the string (with strcat) because there is no space left and you will end up overwriting some other variable (or worse).
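One way to avoid both problems is to give each temporary name its own heap storage that is large enough from the start. The following is only a sketch under assumptions: new_temp_name and gencode_heap are hypothetical names, operands are passed as strings rather than single characters, and the %union member would have to become a char * for the parser to carry the result.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int temp_counter = 1;  /* plays the role of k in the question */

/* Returns a freshly allocated name such as "t1", "t2", ...
 * The caller owns the string and must free() it. Returns NULL on failure. */
char *new_temp_name(void) {
    size_t size = 16;  /* 't' + at most 11 characters for an int + NUL */
    char *name = malloc(size);
    if (name == NULL)
        return NULL;
    snprintf(name, size, "t%d", temp_counter++);
    return name;
}

/* Prints "tN = first op second" and returns the heap-allocated name tN,
 * which stays valid after the calling block ends. */
char *gencode_heap(const char *first, char op, const char *second) {
    char *name;
    assert(first != NULL && second != NULL);
    name = new_temp_name();
    if (name != NULL)
        printf("%s = %s %c %s\n", name, first, op, second);
    return name;
}
```

Because the storage comes from malloc, the returned pointer does not dangle when the action block finishes; the price is that something must eventually free each name.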
...n't end even after the function returns. That's why I declared char word[] before calling the function. This ideone.com/RBz0y2 is code I wrote separately and used in here too. Is it not right? – Swagnik Dutta Mar 9 '16 at 16:38
@novice: If the caller allocates on the stack and passes to the called function, that is fine. The caller still has the memory. But as soon as the caller returns to its caller, the memory is gone. You cannot set a persistent variable to the address. So, no, it is not right. If you need to keep the value around for the future, you need to allocate with malloc; stack-allocated storage ...

failed to parse number by yacc and lex

I have finished my lex file and have started to learn yacc,
but I have a question about part of my lex code:
%{
#include "y.tab.h"
int num_lines = 1;
int comment_mode=0;
int stack =0;
%}
digit ([0-9])
integer ({digit}+)
float_num ({digit}+\.{digit}+)
%%
{integer} { //deal with integer
printf("#%d: NUM:",num_lines); ECHO;printf("\n");
yylval.Integer = atoi(yytext);
return INT;
}
{float_num} {// deal with float
printf("#%d: NUM:",num_lines);ECHO;printf("\n");
yylval.Float = atof(yytext);
return FLOAT;
}
\n { ++num_lines; }
. if(strcmp(yytext," "))ECHO;
%%
int yywrap() {
return 1;
}
Every time I get an integer or a float, I save its value into yylval and return the token.
Here is my code in parser.y:
%{
#include <stdio.h>
#define YYDEBUG 1
void yyerror (char const *s) {
fprintf (stderr, "%s\n", s);
}
%}
%union{
int Integer;
float Float;
}
%token <int>INT;
%token <float>FLOAT;
%%
statement :
INT {printf("int yacc\n");}
| FLOAT {printf("float yacc\n");}
|
;
%%
int main(int argc, char** argv)
{
yyparse();
return 0;
}
which is compiled with:
byacc -d parser.y
lex lex.l
gcc lex.yy.c y.tab.c -ll
Since I just want to try something easy to get started, I want to see whether I can parse
only int and float numbers first. I print them in both the .l and .y files after I input an
integer or a float. At the beginning I input a first random number, for example 123,
and my program prints:
1: NUM: 123
in yylex() and
"int yacc\n"
in parser.y.
But if I input a second number, it shows a syntax error and the program shuts down.
I don't know where the problem is.
Is there any solution?
Your grammar only accepts a single token, either an INT or a FLOAT. So it will only accept a single number, which is why it produces a syntax error when it reads the second number; it is expecting an end-of-file.
The solution is to change the grammar so that it accepts any number of "statements":
program: /* EMPTY */
| program statement
;
Two notes:
1) You don't need an (expensive) strcmp in your lexer. Just do this:
" " /* Do nothing */;
. { return yytext[0]; }
It's better to return the unknown character to the parser, which will produce a syntax error if the character doesn't correspond to any token type (as in your simple grammar) than to just echo the character to stdout, which will prove confusing. Some people would prefer to produce an error message in the lexer for invalid input, but while you are developing a grammar I think it is easier to just pass through the characters, because that lets you add operators to your parser without regenerating the lexer.
2) When you declare a type for a token or non-terminal in bison (with %token or %type), you use the tag name from the union, not the C type. Some (but not all) versions of bison let you get away with using the C type if it is a simple type, but you can't count on it; it's not POSIX standard and it may well break if you use an older or newer version of bison. (For example, it won't work with bison 3.0.) So you should write, for example:
%union{
int Integer;
float Float;
}
%token <Integer>INT;
%token <Float>FLOAT;
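The reason the tag name matters becomes clearer when you look at the C side. As far as I understand the generated code, the %union becomes an ordinary C union and a typed $1 becomes a member access using the tag, so the tag has to be a member name. The sketch below models that; SemanticValue, int_value and float_value are hypothetical names (generated parsers call the union type YYSTYPE).

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the %union above as a plain C union. */
typedef union {
    int Integer;
    float Float;
} SemanticValue;

/* Roughly what $1 amounts to for a token declared <Integer>:
 * a member access by tag name. A tag such as <int> would ask for
 * a member called "int", which is not a valid member name in C. */
int int_value(const SemanticValue *v) {
    assert(v != NULL);
    return v->Integer;
}

/* The same for a token declared <Float>. */
float float_value(const SemanticValue *v) {
    assert(v != NULL);
    return v->Float;
}
```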

ANTLR: removing clutter

I'm learning ANTLR right now. Let's say I have some VHDL code and would like to do some processing on the PROCESS blocks. The rest should be completely ignored. I don't want to describe the whole VHDL language, since I'm interested only in the process blocks. So I could write a rule that matches process blocks. But how do I tell ANTLR to match only the process block rule and ignore anything else?
I know next to no VHDL, so let's say you want to replace all single line comments in a (Java) source file with multi-line comments:
//foo
should become:
/* foo */
You need to let the lexer match single line comments, of course. But you should also make sure it recognizes multi-line comments because you don't want //bar to be recognized as a single line comment in:
/*
//bar
*/
The same goes for string literals:
String s = "no // comment";
Finally, you should create some sort of catch-all rule in the lexer that will match any character.
A quick demo:
grammar T;
parse
: (t=. {System.out.print($t.text);})* EOF
;
Str
: '"' ('\\' . | ~('\\' | '"'))* '"'
;
MLComment
: '/*' .* '*/'
;
SLComment
: '//' ~('\r' | '\n')*
{
setText("/* " + getText().substring(2) + " */");
}
;
Any
: . // fall through rule, matches any character
;
If you now parse input like this:
//comment 1
class Foo {
//comment 2
/*
* not // a comment
*/
String s = "not // a // comment"; //comment 3
}
the following will be printed to your console:
/* comment 1 */
class Foo {
/* comment 2 */
/*
* not // a comment
*/
String s = "not // a // comment"; /* comment 3 */
}
Note that this is just a quick demo: a string literal in Java could contain Unicode escapes, which my demo doesn't support, and my demo also does not handle char-literals (the char literal char c = '"'; would break it). All of these things are quite easy to fix, of course.
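The same idea (recognise the few constructs you care about, and let a catch-all pass everything else through) can be sketched without ANTLR. The C function below is a hand-written scanner, not generated code; rewrite_comments and the Out helper type are hypothetical names, and like the demo it ignores char literals and Unicode escapes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Output buffer that remembers whether anything had to be dropped. */
typedef struct {
    char *data;
    size_t cap;
    size_t len;
    int overflow;
} Out;

/* Appends one character, keeping room for the terminating NUL. */
static void put(Out *out, char c) {
    if (out->len + 1 < out->cap)
        out->data[out->len++] = c;
    else
        out->overflow = 1;
}

/* Appends every character of a NUL-terminated string. */
static void put_str(Out *out, const char *s) {
    while (*s != '\0')
        put(out, *s++);
}

/* Copies src into dst, turning each single-line comment into a
 * multi-line one. Returns 0 on success, -1 if dst was too small. */
int rewrite_comments(const char *src, char *dst, size_t cap) {
    Out out;
    const char *p = src;

    assert(src != NULL && dst != NULL && cap > 0);
    out.data = dst;
    out.cap = cap;
    out.len = 0;
    out.overflow = 0;

    while (*p != '\0') {
        if (*p == '"') {
            /* String literal: copy verbatim, honouring backslash escapes. */
            put(&out, *p++);
            while (*p != '\0' && *p != '"') {
                if (*p == '\\' && p[1] != '\0')
                    put(&out, *p++);
                put(&out, *p++);
            }
            if (*p == '"')
                put(&out, *p++);
        } else if (p[0] == '/' && p[1] == '*') {
            /* Multi-line comment: copy verbatim through its terminator. */
            put_str(&out, "/*");
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                put(&out, *p++);
            if (*p != '\0') {
                put_str(&out, "*/");
                p += 2;
            }
        } else if (p[0] == '/' && p[1] == '/') {
            /* Single-line comment: rewrite the delimiters. */
            p += 2;
            put_str(&out, "/* ");
            while (*p != '\0' && *p != '\r' && *p != '\n')
                put(&out, *p++);
            put_str(&out, " */");
        } else {
            /* Catch-all: any other character passes through unchanged. */
            put(&out, *p++);
        }
    }
    dst[out.len] = '\0';
    return out.overflow ? -1 : 0;
}
```

The order of the branches does the work that rule order and longest match do in the lexer grammar: strings and multi-line comments are consumed whole, so a // inside them never reaches the single-line comment branch.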
In the upcoming ANTLR v4, you can do fuzzy parsing. Take a look at
http://www.antlr.org/wiki/display/ANTLR4/Wildcard+Operator+and+Nongreedy+Subrules
You can get the beta software here:
http://antlr.org/download/antlr-4.0b3-complete.jar
Terence

What's the correct way to add new tokens (rewrite) to create AST nodes that are not on the input stream

I've a pretty basic math expression grammar for ANTLR here and what's of interest is handling the implied * operator between parentheses e.g. (2-3)(4+5)(6*7) should actually be (2-3)*(4+5)*(6*7).
Given the input (2-3)(4+5)(6*7), I'm trying to add the missing * operator to the AST while parsing. I think I've managed to achieve that in the following grammar, but I'm wondering whether this is the correct, most elegant way.
grammar G;
options {
language = Java;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ADD = '+' ;
SUB = '-' ;
MUL = '*' ;
DIV = '/' ;
OPARN = '(' ;
CPARN = ')' ;
}
start
: expression EOF!
;
expression
: mult (( ADD^ | SUB^ ) mult)*
;
mult
: atom (( MUL^ | DIV^) atom)*
;
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN expression CPARN -> ^(MUL expression)+
)*
;
INTEGER : ('0'..'9')+ ;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
This grammar appears to output the correct AST Tree in ANTLRworks:
I'm only just starting to get to grips with parsing and ANTLR and don't have much experience, so feedback would be really appreciated!
Thanks in advance! Carl
First of all, you did a great job given the fact that you've never used ANTLR before.
You can omit the language=Java and ASTLabelType=CommonTree, which are the default values. So you can just do:
options {
output=AST;
}
Also, you don't have to specify the root node for each operator separately. So you don't have to do:
(ADD^ | SUB^)
but the following:
(ADD | SUB)^
will suffice. With only two operators, there's not much difference, but when implementing relational operators (>=, <=, > and <), the latter is a bit easier.
Now, for your AST: you'll probably want to create a binary tree. That way, all internal nodes are operators and the leaves are operands, which makes the actual evaluation of your expressions much easier. To get a binary tree, you'll have to change your atom rule slightly:
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN e=expression CPARN -> ^(MUL $atom $e)
)*
;
which produces the following AST given the input "(2-3)(4+5)(6*7)":
(image produced by: graphviz-dev.appspot.com)
The DOT file was generated with the following test-class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
GLexer lexer = new GLexer(new ANTLRStringStream("(2-3)(4+5)(6*7)"));
GParser parser = new GParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
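To illustrate why a binary tree makes evaluation easy, here is a small C sketch that builds the left-nested MUL tree for (2-3)(4+5)(6*7) by hand and evaluates it recursively. It is independent of ANTLR; Node, leaf, binary and eval are hypothetical names, and the tree is assembled manually rather than by a parser.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    char op;                    /* '+', '-', '*', '/', or 0 for a leaf */
    int value;                  /* meaningful only for leaves */
    const struct Node *left;
    const struct Node *right;
} Node;

/* Creates an operand node. */
Node leaf(int value) {
    Node n;
    n.op = 0;
    n.value = value;
    n.left = NULL;
    n.right = NULL;
    return n;
}

/* Creates an operator node with exactly two children. For the implied
 * multiplication, the tree built so far becomes the left child. */
Node binary(char op, const Node *left, const Node *right) {
    Node n;
    assert(left != NULL && right != NULL);
    n.op = op;
    n.value = 0;
    n.left = left;
    n.right = right;
    return n;
}

/* Evaluates the tree: leaves yield their value, operators combine
 * the values of their two children. Division is integer division
 * and the caller must avoid zero divisors. */
int eval(const Node *n) {
    assert(n != NULL);
    switch (n->op) {
    case 0:   return n->value;
    case '+': return eval(n->left) + eval(n->right);
    case '-': return eval(n->left) - eval(n->right);
    case '*': return eval(n->left) * eval(n->right);
    case '/': return eval(n->left) / eval(n->right);
    default:  assert(!"unknown operator"); return 0;
    }
}
```

Because every operator node has exactly two children, eval needs only one case per operator and no loops over child lists.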

How to match a string, but case-insensitively?

Let's say that I want to match "beer", but don't care about case sensitivity.
Currently I am defining a token to be ('b'|'B' 'e'|'E' 'e'|'E' 'r'|'R') but I have a lot of such and don't really want to handle 'verilythisisaverylongtokenindeedomyyesitis'.
The antlr wiki seems to suggest that it can't be done (in antlr) ... but I just wondered if anyone had some clever tricks ...
I would like to add to the accepted answer: a ready-made set can be found at case insensitive antlr building blocks, and the relevant portion is included below for convenience:
fragment A:[aA];
fragment B:[bB];
fragment C:[cC];
fragment D:[dD];
fragment E:[eE];
fragment F:[fF];
fragment G:[gG];
fragment H:[hH];
fragment I:[iI];
fragment J:[jJ];
fragment K:[kK];
fragment L:[lL];
fragment M:[mM];
fragment N:[nN];
fragment O:[oO];
fragment P:[pP];
fragment Q:[qQ];
fragment R:[rR];
fragment S:[sS];
fragment T:[tT];
fragment U:[uU];
fragment V:[vV];
fragment W:[wW];
fragment X:[xX];
fragment Y:[yY];
fragment Z:[zZ];
So an example is
HELLOWORLD : H E L L O W O R L D;
How about defining a lexer token for each permissible identifier character, then constructing the parser token as a series of those?
beer: B E E R;
A : 'A'|'a';
B: 'B'|'b';
etc.
A case-insensitive option was just added to ANTLR
options { caseInsensitive = true; }
https://github.com/antlr/antlr4/commit/7bc825776357a0e6e7fc399bb0841d570a7e824b
The old links are now broken; these should continue to work:
Case-Insensitive Lexing
CaseChangingCharStream
CaseChangingCharStream.java
Define case-insensitive tokens with
BEER: [Bb] [Ee] [Ee] [Rr];
A new documentation page has appeared in the ANTLR GitHub repo: Case-Insensitive Lexing. You can use two approaches:
The one described in @javadba's answer
Or add a character stream to your code that transforms the input stream to lower or upper case. Examples for the main languages can be found on the same doc page.
In my opinion, it's better to use the first approach and have a grammar that describes all the rules. But if you use a well-known grammar, for example one from Grammars written for ANTLR v4, then the second approach may be more appropriate.
A solution I used in C#: use the ASCII code to shift each character to lower case.
class CaseInsensitiveStream : Antlr4.Runtime.AntlrInputStream {
public CaseInsensitiveStream(string sExpr)
: base(sExpr) {
}
public override int La(int index) {
if(index == 0) return 0;
if(index < 0) index++;
int pdx = p + index - 1;
if(pdx < 0 || pdx >= n) return TokenConstants.Eof;
var x1 = data[pdx];
return (x1 >= 65 && x1 <= 90) ? (97 + x1 - 65) : x1;
}
}
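The folding step in La above is plain ASCII arithmetic, so it can be sketched in isolation. The C version below is only an illustration with hypothetical names (fold_ascii, starts_with_keyword); it assumes the keyword is given in lower case and, like the C# code, handles only the letters A to Z.

```c
#include <assert.h>
#include <stddef.h>

/* Folds an ASCII upper-case letter (65..90) to lower case (97..122);
 * every other code is returned unchanged. */
int fold_ascii(int c) {
    return (c >= 65 && c <= 90) ? (97 + c - 65) : c;
}

/* Returns 1 if text starts with keyword, ignoring ASCII case, else 0.
 * keyword is expected to be written in lower case. */
int starts_with_keyword(const char *text, const char *keyword) {
    size_t i;
    assert(text != NULL && keyword != NULL);
    for (i = 0; keyword[i] != '\0'; i++) {
        /* A shorter text stops here: its NUL never equals a keyword char. */
        if (fold_ascii((unsigned char)text[i]) != (unsigned char)keyword[i])
            return 0;
    }
    return 1;
}
```

Folding at the character-stream level keeps the grammar free of [Bb] [Ee] style rules, at the cost of the lexer seeing only lower-case input.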