failed to parse number by yacc and lex - yacc

I have finished my lex file and started to learn yacc,
but I have a question about part of my lex code:
%{
#include "y.tab.h"
int num_lines = 1;
int comment_mode=0;
int stack =0;
%}
digit ([0-9])
integer ({digit}+)
float_num ({digit}+\.{digit}+)
%%
{integer} { //deal with integer
printf("#%d: NUM:",num_lines); ECHO;printf("\n");
yylval.Integer = atoi(yytext);
return INT;
}
{float_num} {// deal with float
printf("#%d: NUM:",num_lines);ECHO;printf("\n");
yylval.Float = atof(yytext);
return FLOAT;
}
\n { ++num_lines; }
. if(strcmp(yytext," "))ECHO;
%%
int yywrap() {
return 1;
}
Every time I match an integer or a float, I save its value in yylval and return the token.
Here is my code in parser.y:
%{
#include <stdio.h>
#define YYDEBUG 1
void yyerror (char const *s) {
fprintf (stderr, "%s\n", s);
}
%}
%union{
int Integer;
float Float;
}
%token <int>INT;
%token <float>FLOAT;
%%
statement :
INT {printf("int yacc\n");}
| FLOAT {printf("float yacc\n");}
|
;
%%
int main(int argc, char** argv)
{
yyparse();
return 0;
}
which I compiled with:
byacc -d parser.y
lex lex.l
gcc lex.yy.c y.tab.c -ll
Since I just want to try something easy to get started, I want to see whether I can parse
only integers and floats first. I print them in both the .l and .y files after I input an
integer or a float. At the beginning I input a first number, for example 123,
and my program prints:
1: NUM: 123
in yylex() and
"int yacc\n"
in parser.y.
But if I input a second number, it reports a syntax error and the program exits.
I don't know where the problem is.
Is there any solution?

Your grammar only accepts a single token, either an INT or a FLOAT. So it will only accept a single number, which is why it produces a syntax error when it reads the second number; it is expecting an end-of-file.
The solution is to change the grammar so that it accepts any number of "statements":
program: /* EMPTY */
| program statement
;
Two notes:
1) You don't need an (expensive) strcmp in your lexer. Just do this:
" " /* Do nothing */;
. { return yytext[0]; }
It's better to return the unknown character to the parser, which will produce a syntax error if the character doesn't correspond to any token type (as in your simple grammar) than to just echo the character to stdout, which will prove confusing. Some people would prefer to produce an error message in the lexer for invalid input, but while you are developing a grammar I think it is easier to just pass through the characters, because that lets you add operators to your parser without regenerating the lexer.
2) When you specify types in %token or %type declarations in bison, you use the tag name from the union, not the C type. Some (but not all) versions of bison let you get away with using the C type if it is a simple type, but you can't count on it; it's not part of the POSIX standard and it may well break if you use an older or newer version of bison. (For example, it won't work with bison 3.0.) So you should write, for example:
%union{
int Integer;
float Float;
}
%token <Integer>INT;
%token <Float>FLOAT;

Related

3 Address Code Generation using lex and yacc

I'm trying to generate three-address code for basic arithmetic expressions. I'm new to lex and yacc, and I'm having trouble understanding the flow of control between the two, i.e. how the two programs interact.
lex.l
%{
#include<stdio.h>
#include"y.tab.h"
int k=1;
%}
%%
[0-9]+ {
yylval.dval=yytext[0];
return NUM;
}
\n {return 0;}
. {return yytext[0];}
%%
void yyerror(char* str)
{
printf("\n%s",str);
}
char *gencode(char word[],char first,char op,char second)
{
char temp[10];
sprintf(temp,"%d",k);
strcat(word,temp);
k++;
printf("%s = %c %c %c\n",word,first,op,second);
return word; //Returns variable name like t1,t2,t3... properly
}
int yywrap()
{
return 1;
}
main()
{
yyparse();
return 0;
}
yacc.y
%{
#include<stdio.h>
int aaa;
%}
%union{
char dval;
}
%token <dval> NUM
%type <dval> E
%left '+' '-'
%left '*' '/' '%'
%%
statement : E {printf("\nt = %c \n",$1);}
;
E : E '+' E
{
char word[]="t";
char *test=gencode(word,$1,'+',$3);
$$=test;
}
| E '-' E
{
char word[]="t";
char *test=gencode(word,$1,'-',$3);
$$=test;
}
| E '%' E
{
char word[]="t";
char *test=gencode(word,$1,'%',$3);
$$=test;
}
| E '*' E
{
char word[]="t";
char *test=gencode(word,$1,'*',$3);
$$=test;
}
| E '/' E
{
char word[]="t";
char *test=gencode(word,$1,'/',$3);
$$=test;
}
| '(' E ')'
{
$$=$2;
}
| NUM
{
$$=$1;
}
;
%%
Problem:
getting garbage value in output
Expected output for expression (2+3)*5 should be like:
t1= 2 + 3
t2= t1 * 5
Obtained output:
t1= 2 + 3
t2= garbage value * 5
I'm unable to figure out how to correct this. The variable names (e.g. t1, t2, t3) are being returned properly from the gencode() function in lex.l:
char *test=gencode(word,$1,'%',$3);
But I'm completely clueless about what is going wrong after that. I believe I'm not handling the $$,$1,$3 terms correctly.
Please help me understand what is going wrong, what needs to be done and how to do it.
A little help and some explanation would be very helpful. Thank you.
The problem here is not in the use of flex or bison; rather, it is undefined behaviour in your C code.
Your gencode function returns its first argument. Then you call it like this, roughly:
{
char word[] = ...
... = gencode(word, ...);
}
The lifetime of word ends when the block finishes, which is right after the call to gencode. In effect, that is no different from the classic dangling pointer generator:
char* dangle(void) {
char temporary[] = "some string";
return temporary;
}
which is obviously incorrect, since the local variable ceases to exist before its address is returned.
In addition, you actually create word as a two-character array:
char word[] = "t";
since leaving out the size tells C to leave exactly enough space for the initial string (one character plus null terminator). That's fine, but you cannot then append more characters to the string (with strcat) because there is no space left and you will end up overwriting some other variable (or worse).
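One way around both problems (the dangling pointer and the too-small array) is to give each temporary name its own heap allocation. The following is only a minimal sketch: new_temp and temp_counter are hypothetical names, not part of the original code, and the grammar's semantic value would also have to become a pointer type such as char * instead of char for this to fit.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter for generated names (plays the role of k in the question). */
static int temp_counter = 1;

/* Hypothetical helper: returns a freshly allocated name such as "t1", "t2", ...
   The storage comes from malloc, so it stays valid after the calling block ends.
   The caller owns the result and must free() it. */
char *new_temp(void)
{
    /* 't' plus the digits of an int plus the terminator fits easily in 16 bytes. */
    size_t size = 16;
    char *name = malloc(size);
    if (name == NULL)
        return NULL;
    snprintf(name, size, "t%d", temp_counter++);
    return name;
}
```

Because the name lives on the heap, it can safely be stored in $$ and read again in a later reduction; the price is that something eventually has to free it.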
mentall n't end even after the function returns. That's why I declared char word[] before calling the function. This ideone.com/RBz0y2 is code I wrote separately and used here too. Is it not right? – Swagnik Dutta Mar 9 '16 at 16:38
#novice: If the caller allocates on the stack and passes to the called function, that is fine. The caller still has the memory. But as soon as the caller returns to its caller, the memory is gone. You cannot set a persistent variable to the address. So, no, it is not right. If you need to keep the value around for the future, you need to allocate with malloc; stack-allocated storage

Simple yacc grammars give an error

I have a question about the yacc compiler. I cannot compile a simple yacc grammar. Here is the code:
/*anbn_0.y */
%token A B
%%
start: anbn '\n' {printf(" is in anbn_0\n");
return 0;}
anbn: empty
| A anbn B
;
empty: ;
%%
#include "lex.yy.c"
yyerror(s)
char *s;
{ printf("%s, it is not in anbn_0\n", s);
}
I use Mac OS X, and I tried these commands:
$ yacc anbn_0.y
$ gcc -o anbn_0 y.tab.c -ll
The second one gives me an error. Here it is:
warning: implicit declaration of function 'yylex' is invalid in C99 [-Wimplicit-function-declaration]
yychar = YYLEX;
Why do I get an error ?
It's a warning, not an error, so you should be fine if you ignore it. But if you really want to get rid of the warning, you could add
%{
int yylex();
%}
to the top of your .y file
Here is an answer to a more sophisticated version of this problem which isn't easily solved just by adding a declaration.
GNU Bison supports the generation of re-entrant parsers which work together with Flex (using Flex's %option bison-bridge re-entrant). Berkeley Yacc provides a compatible implementation.
Here is a guide on how to solve this undeclared yylex for both parser generators.
With a re-entrant, "Bison bridged" lexer, the declaration of yylex turns into this:
int yylex(YYSTYPE *yylval, void *scanner);
If you place this prototype in the %{ ... %} initial header section of your Yacc parser, and generate the parser with either Bison or Berkeley Yacc, the compiler will complain that YYSTYPE is not declared.
You cannot simply create a forward declaration for YYSTYPE, because in Berkeley Yacc, it does not have a union tag. In Bison, it is typedef union YYSTYPE { ... } YYSTYPE, but in Berkeley Yacc it is typedef union { ... } YYSTYPE: no tag.
But, in Berkeley Yacc, if you put a declaration in the third section of the parser, it is in scope of the yylex call! So the following works for Berkeley yacc:
%{
/* includes, C defs */
%}
/* Yacc defs */
%%
/* Yacc grammar */
%%
int yylex(YYSTYPE *, void *);
/* code */
If this is generated with Bison, the problem persists: there is no prototype in scope of the yylex call.
This little fix makes it work for GNU Bison:
%{
/* includes, C defs */
#if YYBISON
union YYSTYPE;
int yylex(union YYSTYPE *, void *);
#endif
%}
/* Yacc defs */
%%
/* Yacc grammar */
%%
int yylex(YYSTYPE *, void *);
/* code */
There you go.

ignoring return value of ‘int scanf(const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result]?

When I compiled the following program with
g++ -O2 -s -static 2.cpp
it gave me the warning ignoring return value of ‘int scanf(const char*, ...)’, declared with attribute warn_unused_result [-Wunused-result].
But when I remove -O2 from the compile command, no warning is shown.
My 2.cpp program:
#include<stdio.h>
int main()
{
int a,b;
scanf("%d%d",&a,&b);
printf("%d\n",a+b);
return 0;
}
What is the meaning of this warning, and what is the meaning of -O2?
It means that you do not check the return value of scanf.
It might very well return 1 (only a is set) or 0 (neither a nor b is set).
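The reason it is not shown when compiling without optimization is, on typical glibc systems, that the warn_unused_result attribute is only attached to scanf when _FORTIFY_SOURCE is active, and _FORTIFY_SOURCE only takes effect when optimization is enabled (some distributions turn it on by default in that case). -O2 enables a set of optimizations; see http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.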
Simply checking the return value will remove the warning and make the program behave in a predictable way if it does not receive two numbers:
if( scanf( "%d%d", &a, &b ) != 2 )
{
// do something, like..
fprintf( stderr, "Expected at least two numbers as input\n");
exit(1);
}
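To see the different return values for yourself, it can help to experiment with sscanf, which follows the same return-value rules as scanf but reads from a string. This is just a small sketch; count_converted is a hypothetical helper name, not anything from the question.

```c
#include <stdio.h>

/* Hypothetical helper: reports how many of the two requested integers
   sscanf managed to convert from `input`. The result is 2, 1, 0, or EOF
   (when the input ends before the first conversion). */
int count_converted(const char *input)
{
    int a = 0, b = 0;
    return sscanf(input, "%d%d", &a, &b);
}
```

With this helper, "3 4" gives 2, "3 x" gives 1 (only a is set), "x" gives 0, and an empty string gives EOF, which is why comparing against the exact number of expected conversions is the safe check.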
I took care of the warning by making an if statement that matches the number of arguments:
#include <iostream>
#include <cstdio>
using namespace std;
int main() {
int i;
long l;
long long ll;
char ch;
float f;
double d;
//6 arguments expected
if(scanf("%d %ld %lld %c %f %lf", &i, &l, &ll, &ch, &f, &d) == 6)
{
printf("%d\n", i);
printf("%ld\n", l);
printf("%lld\n", ll);
printf("%c\n", ch);
printf("%f\n", f);
printf("%lf\n", d);
}
return 0;
}

How to fetch the row and column number of error

How can I get the row and column number of an error (i.e. which part of the string does not follow the grammar rules)?
I am using a yacc parser to check the grammar.
Thank you.
You'd better read the dragon book and the Aho book, which explain and show examples of how to write a lex/yacc-based compiler.
In order to get the line/column of the error, you have to make your lexer keep track of the column and line. So in your lexer, you have to declare two globals, SourceLine and SourceCol (of course you can use better, non-camel-cased names).
In each token rule, you have to calculate the column of the produced token; for that purpose I use a macro as follows:
#define Return(a, b, c) \
{\
SourceCol = (SourceCol + yyleng) * c; \
DPRINT ("## Source line: %d, returned token: "a".\n", SourceLine); \
return b; \
}
and the token rule, with that macro, is:
"for" { Return("FOR", FOR, 1); }
Then, to keep track of lines, for each token that starts a new line, I use:
{NEWLINES} {
BEGIN(INITIAL);
SourceLine += yyleng;
Return("LINE", LINE, 0);
}
Then in your parser, you can get SourceCol and SourceLine if you declare those as extern globals:
extern unsigned int SourceCol;
extern unsigned int SourceLine;
and now in your parse_error grammar production, you can do:
parse_error : LEXERROR
{
printf("OMG! Your code sucks at line %u and col %u!", SourceLine, SourceCol);
}
Of course, you may want to add yytext, produce a more verbose error message, etc. But all that's up to you!
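The same bookkeeping can also be written as a plain C function that a lexer action calls with yytext and yyleng. The sketch below is a slightly different formulation from the macro above, not the answerer's code: struct source_pos and advance_pos are hypothetical names, and positions are assumed to be 1-based.

```c
#include <stddef.h>

/* Hypothetical position tracker; not part of lex or yacc. */
struct source_pos {
    unsigned int line;  /* 1-based line of the next unread character */
    unsigned int col;   /* 1-based column of the next unread character */
};

/* Advance `pos` past `len` characters of matched text.
   A newline moves to the start of the next line; any other
   character moves one column to the right. */
void advance_pos(struct source_pos *pos, const char *text, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (text[i] == '\n') {
            pos->line++;
            pos->col = 1;
        } else {
            pos->col++;
        }
    }
}
```

Scanning the matched text character by character means a token that happens to contain newlines (a multi-line comment, say) still leaves the position correct, without a separate rule for each case.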

Programmatic way to find ELF aux header (or envp) in shared library code?

I'm looking for a programmatic way to find the PowerPC CPU type on Linux. Some Google searches turned up an answer suggesting the mfpvr instruction, and I also found that this information is available in the ELF AUX header. Sure enough, I can obtain the POWER5 string for the machine I'm running on with the following:
#include <stdio.h>
#include <elf.h>
int main( int argc, char **argv, char **envp )
{
/* walk past all env pointers */
while ( *envp++ != NULL )
;
/* and find ELF auxiliary vectors (if this was an ELF binary) */
#if 0
Elf32_auxv_t * auxv = (Elf32_auxv_t *) envp ;
#else
Elf64_auxv_t * auxv = (Elf64_auxv_t *) envp ;
#endif
char * platform = NULL ;
for ( ; auxv->a_type != AT_NULL ; auxv++ )
{
if ( auxv->a_type == AT_PLATFORM )
{
platform = (char *)auxv->a_un.a_val ;
break;
}
}
if ( platform )
{
printf( "%s\n", platform ) ;
}
return 0 ;
}
In the shared library context where I want to use this info, I have no access to envp. Is there an alternative programmatic method to find the beginning of the ELF AUX header?
You can get it from the /proc/self/auxv file.
According to man proc, /proc/self/auxv is available since kernel 2.6.0-test7.
Another option: get some (existing) environment variable, say HOME,
or PATH, or whatever. Please note that you'll get its ADDRESS. From here you can go back and find the previous env variable, then the one before it, etc. After that you can likewise skip all the argv arguments. And then you get to the last AUXV entry. Some steps back, and you should be able to find your AT_PLATFORM.
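The /proc/self/auxv route mentioned above could look roughly like this. It is a sketch under assumptions: auxv_lookup is a hypothetical helper name, and the code assumes each entry in the file is a pair of native unsigned long words (type, value), which matches Linux for a process reading its own auxv.

```c
#include <elf.h>    /* AT_NULL, AT_PLATFORM, AT_PAGESZ */
#include <stdio.h>

/* Hypothetical helper: look up one entry in /proc/self/auxv.
   Returns 0 and stores the entry's value in *value on success,
   -1 if the file cannot be opened or the type is not present. */
int auxv_lookup(unsigned long type, unsigned long *value)
{
    FILE *fp = fopen("/proc/self/auxv", "rb");
    if (fp == NULL)
        return -1;

    unsigned long entry[2];  /* entry[0] = type, entry[1] = value */
    int found = -1;
    while (fread(entry, sizeof entry, 1, fp) == 1) {
        if (entry[0] == AT_NULL)
            break;           /* end-of-vector marker */
        if (entry[0] == type) {
            *value = entry[1];
            found = 0;
            break;
        }
    }
    fclose(fp);
    return found;
}
```

For AT_PLATFORM the stored value is an address inside the calling process, so after a successful lookup it can be cast to const char * and printed, just as in the question's code. This works from shared library code because it needs neither envp nor argv.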
EDIT: It looks like glibc now provides a programmatic method to get at this info:
glibc-headers-2.17-106: /usr/include/sys/auxv.h : getauxval()
Example:
#include <sys/auxv.h>
#include <stdio.h>
int main()
{
unsigned long v = getauxval( AT_PLATFORM ) ;
printf( "%s\n", (char *)v ) ;
return 0 ;
}