Print tokens properly using Lex and Yacc

I'm having difficulty printing a sequence of tokens that is matched by recursive rules. To explain, here are the relevant parts of the code. First, the Lex file:
%{
#include <stdio.h>
#include "y.tab.h"
/* Stub: the body was left empty in the post. */
void installID(void) {
}
%}
abreparentese "("
fechaparentese ")"
pontoevirgula ";"
virgula ","
id {letra}(({letra}|{digito})|({letra}|{digito}|{underline}))*
digito [0-9]
letra [a-zA-Z]
porreal "%real"
portexto "%texto"
porinteiro "%inteiro"
leia "leia"
%%
{abreparentese} { return ABREPARENTESE; }
{fechaparentese} { return FECHAPARENTESE; }
{pontoevirgula} { return PONTOEVIRGULA; }
{virgula} { return VIRGULA; }
{id} { installID();
return ID; }
{porinteiro} { return PORINTEIRO; }
{porreal} { return PORREAL; }
{portexto} { return PORTEXTO; }
{leia} { return LEIA;}
%%
int yywrap() {
return 1;
}
Now the Yacc file:
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#define YYSTYPE char*
int yylex(void);
void yyerror(char *);
extern FILE *yyin, *yyout;
extern char* yytext;
%}
%token ABREPARENTESE FECHAPARENTESE PONTOEVIRGULA VIRGULA ID PORREAL PORTEXTO PORINTEIRO LEIA
%%
programs : programs program
| program
| ABREPARENTESE {fprintf(yyout,"%s",yytext);}
| FECHAPARENTESE {fprintf(yyout,"%s",yytext);}
;
program:
leia
;
leia:
LEIA ABREPARENTESE entradas ids FECHAPARENTESE PONTOEVIRGULA
{
fprintf(yyout,"scanf(\"%s\",%s);",$3,$4);
}
;
entradas:
tipo_entrada VIRGULA entradas {fprintf(yyout,"%s,",$1);}
| tipo_entrada VIRGULA {fprintf(yyout,"%s", $1); }
;
tipo_entrada:
| PORREAL {$$ = "%f";}
| PORTEXTO {$$ = "%c";}
| PORINTEIRO {$$ = "%d";}
;
ids:
id VIRGULA ids {fprintf(yyout,"&%s,",$1);}
| id {fprintf(yyout,"&%s",$1);}
;
id:
ID {$$ = strdup(yytext);}
;
%%
void yyerror(char *s) {
fprintf(stderr, "%s\n", s);
}
int main(int argc, char *argv[]){
yyout = fopen(argv[2],"w");
yyin = fopen(argv[1], "r");
yyparse();
return 0;
}
I believe I have copied all the relevant parts of the code (I may have missed something when pasting). My problem is with this rule:
leia: LEIA ABREPARENTESE entradas ids FECHAPARENTESE PONTOEVIRGULA
{
fprintf(yyout,"scanf(\"%s\",%s);",$3,$4);
}
;
In the input file, I have the following line:
leia (%real, %inteiro, id1, id2);
I expected this in the output file:
scanf("%f,%d",&id1,&id2);
But this is what the output file actually contains:
%d%f,&id2&id1,scanf("%f",id1);
Can you help me solve this problem? How do I print the tokens in the right place?

Normally, with bottom-up parsing, we use left-recursive productions, so that the items of a list are reduced from left to right.
With right recursion, the items are pushed onto the parser stack until the end of the list is reached and only then popped off, so the reductions (and their actions) are executed right to left.
So for example, it would be more usual to write:
ids: id
| ids ',' id
and then the semantic rules will execute in the expected order.
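One further point, offered as a sketch rather than as part of the answer above: even with left recursion, an fprintf inside the entradas or ids actions runs when that rule is reduced, which happens before the leia action prints scanf(. A common way around this is to build each list as a string in $$ and print everything once in the leia action. The helpers below, join_list and copy_text, are hypothetical names introduced for illustration; they assume YYSTYPE is char* as in the question and that every semantic value is heap-allocated.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: heap copy of s, so every semantic value
   can later be released with free(). */
char *copy_text(const char *s)
{
    assert(s != NULL);
    char *out = malloc(strlen(s) + 1);
    if (out == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    strcpy(out, s);
    return out;
}

/* Hypothetical helper: returns a newly allocated "list<sep>item"
   and frees both inputs, so the caller owns exactly one string. */
char *join_list(char *list, const char *sep, char *item)
{
    assert(list != NULL && sep != NULL && item != NULL);
    size_t len = strlen(list) + strlen(sep) + strlen(item) + 1;
    char *out = malloc(len);
    if (out == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    snprintf(out, len, "%s%s%s", list, sep, item);
    free(list);
    free(item);
    return out;
}
```

In the grammar, the recursive alternative of ids could then carry the action { $$ = join_list($1, ",", $3); } instead of an fprintf, with the single-item alternative keeping the default $$ = $1. The leia action would then be the only place that writes to yyout.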

Related

Simple Lex/Yacc Calculator not printing output

I'm trying to understand how compilers and programming languages are made. And to do so I thought about creating a simple calculator which does just addition and subtraction. Below are the Lex and Yacc files which I wrote.
calc.yacc file:
%{
#include <stdio.h>
#include <stdlib.h>
extern int yylex();
void yyerror(char *);
%}
%union { int number; }
%start line
%token <number> NUM
%type <number> expression
%%
line: expression { printf("%d\n", $1); };
expression: expression '+' NUM { $$ = $1 + $3; };
expression: expression '-' NUM { $$ = $1 - $3; };
expression: NUM { $$ = $1; };
%%
void yyerror(char *s) {
fprintf(stderr, "%s", s);
exit(1);
}
int main() {
yyparse();
return 0;
}
calc.lex file:
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}
%%
[0-9]+ {
yylval.number = atoi(yytext);
return NUM;
}
[-+] { return yytext[0]; }
[ \t\f\v\n] { ; }
%%
int yywrap() {
return 1;
}
It compiles nicely but when I run it and type something like 2 + 4 then it gets stuck and doesn't print the answer. Can somebody explain why? My guess is that my grammar is not correct (but I don't know how).
I came to the same idea as rici and changed your samples accordingly:
file calc.l:
%{
#include <stdio.h>
#include <stdlib.h>
#include "calc.y.h"
%}
%%
[0-9]+ {
yylval.number = atoi(yytext);
return NUM;
}
[-+] { return yytext[0]; }
"\n" { return EOL; }
[ \t\f\v\n] { ; }
%%
int yywrap() {
return 1;
}
file calc.y:
%{
#include <stdio.h>
#include <stdlib.h>
extern int yylex();
void yyerror(char *);
%}
%union { int number; }
%start input
%token EOL
%token <number> NUM
%type <number> expression
%%
input: line | input line;   /* left-recursive: one or more lines */
line: expression EOL { printf("%d\n", $1); };
expression: expression '+' NUM { $$ = $1 + $3; };
expression: expression '-' NUM { $$ = $1 - $3; };
expression: NUM { $$ = $1; };
%%
void yyerror(char *s) {
fprintf(stderr, "%s", s);
exit(1);
}
int main() {
yyparse();
return 0;
}
Compiled & tested in cygwin on Windows 10 (64 bit):
$ flex -o calc.l.c calc.l
$ bison -o calc.y.c -d calc.y
$ gcc -o calc calc.l.c calc.y.c
$ ./calc
2 + 4
6
2 - 4
-2
234 + 432
666
Notes:
Minor point: because of the file names used in the build commands, I had to change the #include of the generated token header to calc.y.h. (A matter of taste.)
I introduced the EOL token in the lex source as well as in the line rule of the parser. Without it, the parser cannot tell that an expression is complete until it reaches end of input, so nothing is printed while you are still typing.
While testing, I noticed that the second input always ended in a syntax error. It took me a while to realize that the grammar was now limited to accepting exactly one line, so I added the recursive input rule to the parser source.

lex and yacc to parse trigonometric expression

I have the following code for lex and yacc. I am getting extra values in the printed output. Can anyone tell me what is wrong with the code?
Lex code:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[ \t] ;
[+-] { yylval=yytext; return Sym;}
(s|c|t)..x { yylval=yytext; return Str;}
[a-zA-Z]+ { printf("Invalid");}
%%
int yywrap()
{
return 1;
}
yacc code:
%{
#include<stdio.h>
%}
%start exps
%token Sym Str
%%
exps: exps exp
| exp
;
exp : Str Sym Str {printf("%s",$1); printf("%s",$2); printf("%s",$3);}
;
%%
int main (void)
{
while(1){
return yyparse();
}
}
yyerror(char *err) {
fprintf(stderr, "%s\n",err);
}
Input:
sinx+cosx
output:
sinx+cosx+cosxcosx
Look at the output of the code!
yytext is a pointer into flex's internal scanning buffer, so its contents will be modified when the next token is read. If you want to return it to the parser, you need to make a copy:
[+-] { yylval=strdup(yytext); return Sym;}
(s|c|t)..x { yylval=strdup(yytext); return Str;}
Where symbols are a single character, it might make more sense to return that character directly in the scanner:
[-+] { return *yytext; }
in which case your yacc rules should use the character directly, in single quotes:
exp : Str '+' Str {printf("%s + %s",$1, $3); free($1); free($3); }
| Str '-' Str {printf("%s - %s",$1, $3); free($1); free($3); }
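The aliasing effect described above can be reproduced without flex. The sketch below is only an illustration: scan_buffer, fake_scan and copy_token are hypothetical names that stand in for the scanner's internal buffer, the act of scanning a token, and strdup(yytext) respectively. They are not part of flex.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's internal buffer (hypothetical, not flex's). */
static char scan_buffer[64];

/* Pretend to scan a token: overwrite the shared buffer and return a
   pointer into it, much as yytext points into flex's buffer. */
char *fake_scan(const char *token)
{
    assert(token != NULL && strlen(token) < sizeof scan_buffer);
    strcpy(scan_buffer, token);
    return scan_buffer;
}

/* Heap copy of the current token text, playing the role of
   strdup(yytext). The caller is responsible for free(). */
char *copy_token(const char *text)
{
    assert(text != NULL);
    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
        abort();
    strcpy(copy, text);
    return copy;
}
```

If the pointer returned by fake_scan("sinx") is saved and fake_scan("cosx") is called afterwards, the saved pointer now reads "cosx", which is the same effect that produces the repeated cosx in the question's output. A copy made with copy_token before the second call still reads "sinx".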

Syntax error in Bison after one token is processed

I am trying to come up to speed on Flex and Bison. I can parse one token with a very simple "language" but it fails on the second, even though the token is legitimate.
test.l:
%{
#include <stdio.h>
#include "test.hpp"
%}
%%
[0-9]+ {printf("Number entered\n"); return INTEGER_NUMBER;}
[a-zA-Z]+ {printf("plain text entered: '%s'\n",yytext); return PLAIN_TEXT;}
[ \t] ;
. ;
%%
test.y
%{
#include <stdio.h>
extern "C" {
int yyparse(void);
int yylex(void);
int yywrap() { return 1; }
extern int yylineno;
extern char* yytext;
extern int yylval;
}
/* #define YYSTYPE char * */
void yyerror(const char *message)
{
fprintf(stderr, "%d: error: '%s' at '%s', yylval=%u\n", yylineno, message, yytext, yylval);
}
main()
{
yyparse();
}
%}
%token PLAIN_TEXT INTEGER_NUMBER
%%
test : text | number;
text : PLAIN_TEXT
{
/*printf("plain text\n");*/
};
number : INTEGER_NUMBER
{
/*printf("number\n");*/
};
%%
Results:
$ ./test
cat
plain text entered: 'cat'
dog
plain text entered: 'dog'
1: error: 'syntax error' at 'dog', yylval=0
$ ./test
34
Number entered
34
Number entered
1: error: 'syntax error' at '34', yylval=0
Why am I getting this syntax error?
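Your test.y has no rule that allows more than one test in the input. The start symbol test matches exactly one token, so the parser expects end of input after it and reports a syntax error on the second token.
How about adding a rule like the following?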
%%
tests : test | tests test; /* added */
test : text | number;
...

Correcting some simple logic errors in lex and yacc

I need help solving two simple logic errors in my example.
Here are the details:
The Input File: (input.txt)
FirstName:James
LastName:Smith
normal text
The output File: (output.txt) - [with two logic errors]
The Name is: James
The Name is: LastName:Smith
The Name is: normal text
What I am expecting as output (instead of the above lines) - [without logical errors]
The Name is: James
The Name is: Smith
normal text
In other words, I don't want "LastName:" itself to be sent to the output, and I also want normal text to be matched when it comes after "FirstName:" or "LastName:".
Here is my lex File (example.l):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "y.tab.h"
/* prototypes */
void yyerror(const char*);
/* Variables: */
char *tempString;
%}
%START sBody
%%
"FirstName:" { BEGIN sBody; }
"LastName:" { BEGIN sBody; }
.? { return sNormalText; }
\n /* Ignore end of line */;
[ \t]+ /* Ignore whitespace */;
<sBody>.+ {
tempString = (char *)calloc(strlen(yytext)+1, sizeof(char));
strcpy(tempString, yytext);
yylval.sValue = tempString;
return sText;
}
%%
int main(int argc, char *argv[])
{
if ( argc < 3 )
{
printf("Please you need two args: inputFileName and outputFileName");
}
else
{
yyin = fopen(argv[1], "r");
yyout = fopen(argv[2], "w");
yyparse();
fclose(yyin);
fclose(yyout);
}
return 0;
}
Here is my yacc file: (example.y):
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "y.tab.h"
void yyerror(const char*);
int yywrap();
extern FILE *yyout;
%}
%union
{
int iValue;
char* sValue;
};
%token <sValue> sText
%token <sValue> sNormalText
%%
StartName: /* for empty */
| sName StartName
;
sName:
sText
{
fprintf(yyout, "The Name is: %s\n", $1);
}
|
sNormalText
{
fprintf(yyout, "%s\n", $1);
}
;
%%
void yyerror(const char *str)
{
fprintf(stderr,"error: %s\n",str);
}
int yywrap()
{
return 1;
}
I would be grateful for any help correcting these logic errors.
Thanks in advance for your help and for reading my post.
Part of the trouble is that you move into state 'sBody' but you never move back to the initial state 0.
Another problem - not yet a major one - is that you use a right-recursive grammar rule instead of the (natural for Yacc) left-recursive rule:
StartName: /* empty */
| sName StartName
;
vs
StartName: /* empty */
| StartName sName
;
Adding BEGIN 0; to the <sBody> Lex rule improves things a lot; the remaining trouble is that you get one more line 'Smith' in the output file for each single letter in the normal text. You need to review how the value is returned to your grammar.
By adding yylval.sValue = yytext; before the return in the rule that returns sNormalText, I got the 'expected' output.
example.l
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "y.tab.h"
/* prototypes */
void yyerror(const char*);
/* Variables: */
char *tempString;
%}
%START sBody
%%
"FirstName:" { puts("FN"); BEGIN sBody; }
"LastName:" { puts("LN"); BEGIN sBody; }
.? { printf("NT: %s\n", yytext); yylval.sValue = yytext; return sNormalText; }
\n /* Ignore end of line */;
[ \t]+ /* Ignore whitespace */;
<sBody>.+ {
tempString = (char *)calloc(strlen(yytext)+1, sizeof(char));
strcpy(tempString, yytext);
yylval.sValue = tempString;
puts("SB");
BEGIN 0;
return sText;
}
%%
int main(int argc, char *argv[])
{
if ( argc < 3 )
{
printf("Please you need two args: inputFileName and outputFileName");
}
else
{
yyin = fopen(argv[1], "r");
if (yyin == 0)
{
fprintf(stderr, "failed to open %s for reading\n", argv[1]);
exit(1);
}
yyout = fopen(argv[2], "w");
if (yyout == 0)
{
fprintf(stderr, "failed to open %s for writing\n", argv[2]);
exit(1);
}
yyparse();
fclose(yyin);
fclose(yyout);
}
return 0;
}
example.y
%{
#include <stdio.h>
#include "y.tab.h"
void yyerror(const char*);
int yywrap();
extern FILE *yyout;
%}
%union
{
char* sValue;
};
%token <sValue> sText
%token <sValue> sNormalText
%%
StartName: /* for empty */
| StartName sName
;
sName:
sText
{
fprintf(yyout, "The Name is: %s\n", $1);
}
|
sNormalText
{
fprintf(yyout, "The Text is: %s\n", $1);
}
;
%%
void yyerror(const char *str)
{
fprintf(stderr,"error: %s\n",str);
}
int yywrap()
{
return 1;
}
output.txt
The Name is: James
The Name is: Smith
The Text is: n
The Text is: o
The Text is: r
The Text is: m
The Text is: a
The Text is: l
The Text is:
The Text is: t
The Text is: e
The Text is: x
The Text is: t
It might make more sense to put yywrap() in with the lexical analyzer rather than with the grammar. I've left the terse debugging prints in the code - they helped me see what was going wrong.
FN
SB
LN
SB
NT: n
NT: o
NT: r
NT: m
NT: a
NT: l
NT:
NT: t
NT: e
NT: x
NT: t
You'll need to play with the '.?' rule to get normal text returned in its entirety. You may also have to move it around the file - start states are slightly peculiar critters. When I changed the rule to '.+', Flex gave me the warning:
example.l:25: warning, rule cannot be matched
example.l:27: warning, rule cannot be matched
These lines referred to the blank/tab and sBody rules. Moving the unqualified '.+' after the sBody rule removed the warnings, but didn't seem to do what was wanted. Have fun...

How to fix a warning message associated with strlen() used in Yacc?

I need your help. I get this warning when compiling with gcc and cannot work out what causes it.
Here are the details:
The warning message I receive is, literally:
y.tab.c: In function ‘yyparse’: y.tab.c:1317
warning: incompatible implicit declaration of built-in function ‘strlen’
My Lex File looks like:
%{
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include "y.tab.h"
void yyerror(const char*);
char *ptrStr;
%}
%START nameState
%%
"Name:" { BEGIN nameState; }
<nameState>.+ {
ptrStr = (char *)calloc(strlen(yytext)+1, sizeof(char));
strcpy(ptrStr, yytext);
yylval.sValue = ptrStr;
return sText;
}
%%
int main(int argc, char *argv[])
{
if ( argc < 3 )
{
printf("Two args are needed: input and output");
}
else
{
yyin = fopen(argv[1], "r");
yyout = fopen(argv[2], "w");
yyparse();
fclose(yyin);
fclose(yyout);
}
return 0;
}
My Yacc file is as follows:
%{
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include "y.tab.h"
void yyerror(const char*);
int yywrap();
extern FILE *yyout;
%}
%union
{
int iValue;
char* sValue;
};
%token <sValue> sText
%token nameToken
%%
StartName: /* for empty */
| sName
;
sName:
sText
{
fprintf(yyout, "The Name is: %s", $1);
fprintf(yyout, "The Length of the Name is: %d", strlen($1));
}
;
%%
void yyerror(const char *str)
{
fprintf(stderr,"error: %s\n",str);
}
int yywrap()
{
return 1;
}
I was wondering how to remove this warning message. Any suggestions are highly appreciated!
Thanks in advance.
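Include <string.h>; that's where strlen and friends are declared. The warning comes from y.tab.c, so the #include belongs in the prologue of the Yacc file. The Lex file also calls strlen and strcpy, so it needs the same include.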
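A minimal sketch of the corrected formatting, under one assumption: format_name_report is a hypothetical helper that stands in for the body of the sName action and writes into a buffer instead of yyout. Besides the missing include, note that strlen returns size_t, so %zu (C99) is the matching conversion rather than %d.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* declares strlen, strcpy and friends */

/* Hypothetical helper mirroring the sName action: formats the name and
   its length into buf. strlen returns size_t, hence %zu, not %d.
   Returns the value of snprintf (characters needed, excluding the
   terminating NUL). */
int format_name_report(char *buf, size_t size, const char *name)
{
    assert(buf != NULL && name != NULL);
    return snprintf(buf, size,
                    "The Name is: %s\nThe Length of the Name is: %zu\n",
                    name, strlen(name));
}
```

Inside the grammar action the same format string can be passed to fprintf(yyout, ...) with $1 in place of name.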