In my language I can write:
a = 1
b = 2
if true { } else { }
if true { } **Here is the problem**
else {}
My grammar doesn't support newlines between statements. An else can only be used with an if. When I add optionalNL to my rule:
IfExpr:
IF rval optionalNL codeBlock optionalNL ELSE codeBlock
| IF rval optionalNL codeBlock
The optionalNL before the ELSE causes 3 reduce/reduce conflicts. The reason is that the parser can either reduce using the 2nd rule in IfExpr or reduce to exprLoop, which allows many newlines between expressions.
No matter what I do (I tried writing %prec before optionalNL and ELSE), it always reduces to exprLoop, which causes bison to give me a syntax error on else. How do I tell bison to shift at this point (to optionalNL else) instead of reducing (to exprLoop, which makes the else an error)?
example file to test with
%%
program:
exprLoop;
exprLoop:
exprLoop2 expr
| exprLoop2
exprLoop2:
| exprLoop2 expr EOS
| exprLoop2 EOS
;
expr:
'i' Var optEOS '{' '}'
| 'i' Var optEOS '{' '}' optEOS 'e' '{' '}'
EOS: '\n' ;
Var: 'v';
optEOS: | optEOS EOS
%%
//this can be added to the lex file
[iev] { return *yytext; }
y.output http://www.pastie.org/707448
Alternative .y and output. You can see it looking ahead, seeing a \n, and not knowing whether to reduce the rule or keep going. I can change the order of the rules to get different results, but it either always expects a \n or always expects an else, so one rule always ends up being ignored.
state 15
9 expr: 'i' Var optEOS '{' '}' . [$end, '\n']
10 | 'i' Var optEOS '{' '}' . 'e' '{' '}'
11 | 'i' Var optEOS '{' '}' . '\n' 'e' '{' '}'
'e' shift, and go to state 16
'\n' shift, and go to state 17
'\n' [reduce using rule 9 (expr)]
$default reduce using rule 9 (expr)
Thanks to Kinopiko for his answer
I changed his code to have no conflicts, then worked on making it more flexible. Here are my files:
test.y
%{
#include <stdio.h>
%}
%%
program: expr { printf ("First expr\n"); }
| program expr { printf ("Another expr\n"); }
expr:
if optEOS { printf ("IF only\n"); }
| if optEOS else optEOS { printf ("IF/ELSE\n"); }
if: 'i' Var optEOS '{' optEOS '}'
else: 'e' optEOS '{' optEOS '}'
EOS: '\n'
Var: 'v'
optEOS:
| EOS optEOS { ;}//printf ("many EOS\n"); }
%%
int main(int argc, char **argv)
{
int i;
printf("starting\n");
if(argc < 2) {
printf("Reading from stdin\n");
yyparse();
return 0;
}
for(i = 1; i < argc; i++) {
FILE *f;
char fn[260];
sprintf(fn, "./%s", argv[i]);
f = fopen(fn, "r");
if(!f) {
perror(argv[i]);
return (1);
}
printf("Running '%s'\n", argv[i]);
yyrestart(f);
yyparse();
fclose(f);
printf("done\n");
}
return 0;
}
test.l
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
%}
%option noyywrap
%%
[ \t] { }
\n { return *yytext; }
. { return *yytext; }
%%
int yyerror ()
{
printf ("syntax error\n");
exit (1);
}
A test file that is run automatically after compiling:
i v { }
i v { }
e { }
i v { }
e { }
i v {
} e {
}
i v { }
i v { } i v { } e { }
i v
{ } i v { } e { } i v { } e {
} i v {
} e
{ }
I don't understand your problem very well, so I started from scratch:
This is my grammar:
%{
#include <stdio.h>
%}
%%
program: expr { printf ("First expr\n"); }
| program EOS { printf ("Ate an EOS\n"); }
| program expr { printf ("Another expr\n"); }
expr:
ifeos { printf ("IF only\n"); }
| ifelse { printf ("IF/ELSE\n"); }
ifelse: ifeos else
| if else
ifeos: if EOS
| ifeos EOS
if: 'i' Var optEOS '{' '}'
else: 'e' '{' '}'
EOS: '\n'
Var: 'v'
optEOS:
| EOS optEOS { printf ("many EOS\n"); }
%%
Here is the lexer:
%{
#include <stdio.h>
#include <stdlib.h>
#include "1763243.tab.h"
%}
%option noyywrap
%%
[iev\{\}\n] { return *yytext; }
\x20 { }
%%
int yyerror ()
{
printf ("syntax error\n");
exit (1);
}
int main () {
yyparse ();
}
Here is some test input:
i v { }
i v { }
e { }
i v { }
e { }
i v { } e { }
i v { }
Here is the output:
IF only
First expr
IF/ELSE
Another expr
Ate an EOS
IF/ELSE
Another expr
Ate an EOS
IF/ELSE
Another expr
Ate an EOS
IF only
Another expr
There is a shift/reduce conflict remaining.
According to 'Lex & Yacc', a reduce/reduce conflict is resolved by default in favour of the rule defined first, so, as you say, exprLoop wins; I'll assume it is defined first.
But switching the order may not solve the problem the way you expect.
Reading further (page 237), it appears that you need more lookahead, which is not an option for standard yacc/bison. Bison does have a GLR mode, though, which may be of use.
One thing you can do is parse out newlines completely using a lex rule for them. This way, it doesn't matter where the newlines are. This is what C/C++ do... newlines are largely ignored.
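As a rough illustration of that idea, here is a minimal hand-written scanner sketch in C rather than a real lex specification. The names next_token and count_tokens are made up for this example, and the token set is assumed to be the one-character tokens from the test grammar. Newlines are discarded together with blanks and tabs, so the parser never receives a '\n' token.

```c
#include <stddef.h>

/* Hypothetical scanner for the one-character tokens of the test grammar
 * ('i', 'v', 'e', '{', '}'). Blanks, tabs and newlines are all treated as
 * insignificant whitespace. Returns the token character, or 0 at end of
 * input. */
int next_token(const char **cursor)
{
    const char *p = *cursor;

    while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r')
        p++;                      /* discard whitespace, newlines included */

    if (*p == '\0') {
        *cursor = p;
        return 0;                 /* end of input */
    }

    *cursor = p + 1;
    return (unsigned char)*p;
}

/* Counts the tokens the parser would receive for a given source text. */
size_t count_tokens(const char *src)
{
    size_t n = 0;

    while (next_token(&src) != 0)
        n++;
    return n;
}
```

With this scanner, "i v { } e { }" and the same text spread over several lines produce the identical token stream. Note that this only suits a language whose statements can still be told apart once newlines are gone.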
The problem is that:
IfExpr:
IF rval optionalNL codeBlock optionalNL ELSE codeBlock
| IF rval optionalNL codeBlock
requires two-token lookahead after the codeblock to see the 'else' after the newline if that's what there is. You can avoid this by duplicating the optionalNL in both if rules:
IfExpr:
IF rval optionalNL codeBlock optionalNL ELSE codeBlock
| IF rval optionalNL codeBlock optionalNL
Now the parser doesn't have to decide between the two rules until after the optionalNL is parsed, letting it see the ELSE (or its lack) in the one-token lookahead.
A potential drawback here is that the second if rule (but not the first) will now absorb any trailing newlines, so if your grammar for programs requires a newline between each statement, it won't find one after an if without an else, because it has already been consumed.
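The same "consume the newlines first, then decide" idea can be sketched as a small hand-written parser in C. This is not what bison generates; it is a recursive-descent style illustration, and the names if_counts, skip_newlines, expect and parse_program are invented for the example. The input is assumed to be a compact token string such as "iv{}\ne{}", where each character is one token.

```c
#include <stddef.h>

/* Result counters for the sketch (illustrative names). */
struct if_counts {
    int if_only;   /* "i v { }" with no else */
    int if_else;   /* "i v { } e { }"        */
};

/* Skip any run of newline tokens; this plays the role of optionalNL. */
static const char *skip_newlines(const char *p)
{
    while (*p == '\n')
        p++;
    return p;
}

/* Match the literal token sequence `seq` at p; NULL on mismatch. */
static const char *expect(const char *p, const char *seq)
{
    for (; *seq != '\0'; seq++, p++)
        if (*p != *seq)
            return NULL;
    return p;
}

/* Returns 0 on success, -1 on a syntax error. */
int parse_program(const char *p, struct if_counts *out)
{
    out->if_only = 0;
    out->if_else = 0;

    p = skip_newlines(p);
    while (*p != '\0') {
        /* IF rval optionalNL codeBlock */
        if ((p = expect(p, "iv")) == NULL)
            return -1;
        p = skip_newlines(p);
        if ((p = expect(p, "{}")) == NULL)
            return -1;

        /* optionalNL is consumed for BOTH alternatives ... */
        p = skip_newlines(p);

        /* ... so one token of lookahead is enough to choose. */
        if (*p == 'e') {
            if ((p = expect(p, "e{}")) == NULL)
                return -1;
            out->if_else++;
            p = skip_newlines(p);
        } else {
            out->if_only++;   /* trailing newlines were already absorbed */
        }
    }
    return 0;
}
```

The else branch also shows the drawback mentioned above: by the time an if without an else is recognised, the newlines that followed it are gone, so an outer rule cannot rely on seeing them.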
Related
I am trying to learn this language for a college class and our teacher gave us a prompt to try. Basically we are to take a boolean expression and output if that expression is true or false. The input will be in the format of:
true and (false or true) or false.
I have talked with my professor about many solutions and he is wanting the class to make tokens for AND OR NOT TRUE FALSE. He also wants us to use the logical operators in the yacc file instead of the tokens, IE ||, &&, !.
test.l
%{
#include "y.tab.h"
%}
AND [Aa][Nn][Dd]
OR [Oo][Rr]
NOT [Nn][Oo][Tt]
op '&' | '|' | "!"
%%
[a-zA-Z] {return ALPHA;}
[\t]+ ;
[\n] {return '\n';}
{AND} { return (AND); }
{OR} { return (OR); }
{NOT} { return (NOT); }
[Tt][Rr][Uu][Ee] { yylval = 1;
return (boolean); }
[Ff][Aa][Ll][Ss][Ee] { yylval = 0;
return (boolean); }
. {();}
%%
test.y
%{
#include<stdio.h>
#include<stdlib.h>
int yylex();
%}
%token ALPHA AND OR NOT TRUE FALSE boolean
%left "&" "|"
%right '!'
%%
program: bexpr '\n' {if ($1 >= 1)
{
printf("TRUE\n");
exit(0);
}
else{
printf("FALSE\n");
exit(0);
}
|
;
bexpr: bexpr "|""|" bterm { $$ = $1 || $3; }
| bterm { $$ = $1; }
;
bterm: bterm "&""&" bfactor { $$ = $1 && $3; }
| bfactor { $$ = $1; }
;
bfactor: '!' bfactor { $$ = ! $2; }
| '(' bexpr ')' { $$ = $2; }
| TRUE { $$ = $1; }
| FALSE {$$ = $1; }
| boolean { $$ = $1; }
;
%%
int main()
{
printf("Enter your truth statement\n");
yyparse();
return 0;
}
If I were to put in true and false, I would expect false. However, I get a syntax error. If I only put in true, the output is correct; the same goes for false. Basically, if I put in anything other than a single term, the program throws an error.
I have the following code for lex and yacc. I am getting extra values in the printed output. Can anyone tell me what's wrong with the code?
Lex code:
%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
[ \t] ;
[+-] { yylval=yytext; return Sym;}
(s|c|t)..x { yylval=yytext; return Str;}
[a-zA-Z]+ { printf("Invalid");}
%%
int yywrap()
{
return 1;
}
yacc code:
%{
#include<stdio.h>
%}
%start exps
%token Sym Str
%%
exps: exps exp
| exp
;
exp : Str Sym Str {printf("%s",$1); printf("%s",$2); printf("%s",$3);}
;
%%
int main (void)
{
while(1){
return yyparse();
}
}
yyerror(char *err) {
fprintf(stderr, "%s\n",err);
}
Input:
sinx+cosx
output:
sinx+cosx+cosxcosx
look at the output of the code!!!
yytext is a pointer into flex's internal scanning buffer, so its contents will be modified when the next token is read. If you want to return it to the parser, you need to make a copy:
[+-] { yylval=strdup(yytext); return Sym;}
(s|c|t)..x { yylval=strdup(yytext); return Str;}
Where symbols are a single character, it might make more sense to return that character directly in the scanner:
[-+] { return *yytext; }
in which case your yacc rules should use the character directly, in single quotes:
exp : Str '+' Str {printf("%s + %s",$1, $3); free($1); free($3); }
| Str '-' Str {printf("%s - %s",$1, $3); free($1); free($3); }
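The aliasing problem can be reproduced without flex at all. The following is a simplified model in C, not flex's actual buffer handling: token_buffer stands in for the scanner's internal buffer, and scan and copy_lexeme are hypothetical helpers written for this example. copy_lexeme does what strdup does, using only standard C.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's internal buffer: every call to scan()
 * overwrites it, much as yytext changes when the next token is read. */
static char token_buffer[32];

/* Hypothetical scanner step: copies `lexeme` into the shared buffer and
 * returns a pointer to that buffer (what `yylval = yytext` would store). */
char *scan(const char *lexeme)
{
    strncpy(token_buffer, lexeme, sizeof token_buffer - 1);
    token_buffer[sizeof token_buffer - 1] = '\0';
    return token_buffer;
}

/* Private copy of a lexeme; the caller owns it and must free() it. */
char *copy_lexeme(const char *lexeme)
{
    size_t len = strlen(lexeme) + 1;
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, lexeme, len);
    return copy;
}
```

A pointer saved from scan("sinx") reads as "cosx" once scan("cosx") has run, while a copy made with copy_lexeme keeps its original text until it is freed. That is why the grammar actions above free $1 and $3 after printing them.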
A simple calculator that supports only +, -, *, / and integers. I use GNU/Linux.
hoc1.l:
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[ \t] { ; }
[0-9]+ { sscanf(yytext, "%d", &yylval); printf("\nget %d\n", yylval); return NUMBER; }
\n {return 0;}
%%
int yywrap(void) {
return 1;
}
hoc1.y
%{
#include<stdio.h>
#define YYSTYPE int
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%%
list:
| list '\n'
| list expr '\n' {printf("\t%d\n",$2);}
;
expr: NUMBER { $$ = $1; }
| expr '+' expr {$$ = $1+$3;}
| expr '-' expr {$$ = $1-$3;}
| expr '*' expr {$$ = $1*$3;}
| expr '/' expr {$$ = $1/$3;}
;
%%
int main(void)
{
yyparse();
return 0;
}
int yyerror(char *s) {
fprintf(stderr, "*%s*\n", s);
return 0;
}
runtime-error:
% ./hoc
8+9
get 8
+
get 9
*syntax error*
Why does this happen, and how do I solve it? Thanks!
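You forgot to handle the operators in your lex file, so '+' never reaches the parser as a token. In addition, a scanner should return a nonzero value for every token it recognizes: returning 0 tells the parser that the input has ended. Your newline rule returns 0, so the parser sees end of input where the grammar expects a '\n' token. Remove the line in your lex file that handles the newline character and replace it with the following: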
[-+*/\n] { return *yytext; }
. { yyerror("unrecognized character"); return 0; }
Now it should work. Returning *yytext passes each operator to the parser as its own character code, which is what the '+', '-', '*', '/' and '\n' literals in your grammar match against.
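To make the return-value convention concrete, here is a hand-written stand-in for the scanner in C. It is only a sketch: next_token is a made-up name, and the value given to NUMBER is illustrative, since the real one comes from y.tab.h. Operators and the newline are returned as their own character codes, and 0 is returned only when the input is exhausted.

```c
#include <ctype.h>

/* Illustrative token code; in a real build it is defined in y.tab.h. */
enum { NUMBER = 258 };

/* Returns NUMBER (storing the value in *value), a single character such
 * as '+' or '\n' as its own code, or 0 at end of input. */
int next_token(const char **cursor, int *value)
{
    const char *p = *cursor;

    while (*p == ' ' || *p == '\t')
        p++;                              /* skip blanks and tabs only */

    if (*p == '\0') {
        *cursor = p;
        return 0;                         /* 0 is reserved for end of input */
    }

    if (isdigit((unsigned char)*p)) {
        int n = 0;

        while (isdigit((unsigned char)*p))
            n = n * 10 + (*p++ - '0');
        *value = n;
        *cursor = p;
        return NUMBER;
    }

    *cursor = p + 1;
    return (unsigned char)*p;             /* '+', '-', '*', '/', '\n' */
}
```

For the input "8+9\n" this yields NUMBER, '+', NUMBER, '\n' and then 0, which is the sequence the rule list expr '\n' needs in order to reduce.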
I am using Bison and Flex.
What does return 0 do in the kcalc.l file that I have posted?
Also, I don't understand the use of yywrap with an empty body. The code is a calculator without any variable handling; it supports basic operations such as addition, subtraction, multiplication and division, plus the unary minus operator. I have been studying the lex and yacc specifications but did not find an answer to my question.
Kcal.y
%{
#include <stdio.h>
%}
%token Number
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS
%%
statement: expression
{ printf(" result = %d\n", $1);} ;
expression: expression '+' expression
{ $$ = $1 + $3;
printf("Recognised'+'expression\n");
}
| expression '-' expression
{ $$ = $1 - $3;
printf("Recognised '-' expression\n");
}
| expression '*' expression
{ $$ = $1 * $3;
printf("Recognised '*' expression\n");
}
| expression '/' expression
{ if ($3 == 0)
printf ("divide by zero\n");
else
$$ = $1 / $3;
printf("Recognised '/' expression\n");
}
| '-' expression %prec UMINUS
{
$$ = - $2;
printf("Recognised paranthesized expression\n");
}
| '(' expression ')'
{
$$ = $2;
printf("Recognised paranthesized expression");
}
| Number { $$ = $1;
printf("Recognised a no.\n");
}
;
%%
int main(void)
{
return yyparse();
}
int yyerror (char *msg)
{
return fprintf(stderr,"Yacc :%s", msg);
}
yywrap()
{
}
kcalc.l
%{
#include "y.tab.h"
extern int yylval;
%}
%%
[0-9]+ { yylval = atoi(yytext);
printf("accepted the number : %d\n", yylval);
return Number; }
[ \t] { printf("skipped whitespace \n");}
\n { printf("reached end of line\n");
return 0;
}
. { printf("found other data \" %s\n", yytext);
return yytext[0];
}
%%
The return 0 signals end-of-input to the parser, so the expression evidently has to fit on a single line. The empty body of yywrap is simply wrong. If you compile with gcc -Wall, you get two warnings for yywrap:
kcal.y:54: warning: return type defaults to ‘int’
kcal.y:55: warning: control reaches end of non-void function
The first appears because no return type is specified for the function (K&R-style C), so it is assumed to return int. The second appears because the function lacks a return statement for that int.
Since a newline terminates the input, the chances of yywrap ever being called are slim. But it will be called if the input does not contain a newline. If by sheer accident the (more or less random) return value of yywrap were to be interpreted as 0 the tokenizer would end up in an infinite loop of repeatedly calling yywrap.
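A well-formed replacement for the empty function could look like the sketch below. It assumes the program reads a single input and never switches to another file, so it always reports that no more input is available.

```c
/* Called by the scanner when it reaches end of file.
 * Returning 1 means "no more input, stop scanning";
 * returning 0 would mean "more input has been set up, keep going". */
int yywrap(void)
{
    return 1;
}
```

With an explicit int return type and a return statement, both gcc warnings go away and the return value is no longer left to chance.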
When I run yacc -d parser.y on the following file I get the following errors:
parser.y:23.3-24.4: warning: unused value: $4
15 rules never reduced
parser.y: warning: 7 useless nonterminals and 15 useless rules
parser.y:16.1-14: fatal error: start symbol statement_list does not derive any sentence
make: *** [y.tab.c] Error 1
I'm particularly concerned about how to get rid of the fatal error.
%{
#include "parser.h"
#include <string.h>
%}
%union {
double dval;
struct symtab *symp;
}
%token <symp> NAME
%token <dval> NUMBER
%type <dval> expression
%type <dval> term
%type <dval> factor
%%
statement_list: statement '\n'
| statement_list statement '\n'
;
statement: NAME '=' expression { $1->value = $3; }
| expression { printf("= %g\n", $1); }
;
expression: expression '+' term { $$ = $1 + $3; }
| expression '-' term { $$ = $1 - $3; }
term
;
term: term '*' factor { $$ = $1 * $3; }
| term '/' factor { if($3 == 0.0)
yyerror("divide by zero");
else
$$ = $1 / $3;
}
| factor
;
factor: '(' expression ')' { $$ = $2; }
| '-' factor { $$ = -$2; }
| NUMBER
| NAME { $$ = $1->value; }
;
%%
/* look up a symbol table entry, add if not present */
struct symtab *symlook(char *s) {
char *p;
struct symtab *sp;
for(sp = symtab; sp < &symtab[NSYMS]; sp++) {
/* is it already here? */
if(sp->name && !strcmp(sp->name, s))
return sp;
if(!sp->name) { /* is it free */
sp->name = strdup(s);
return sp;
}
/* otherwise continue to next */
}
yyerror("Too many symbols");
exit(1); /* cannot continue */
} /* symlook */
yyerror(char *s)
{
printf( "yyerror: %s\n", s);
}
All those warnings and errors are caused by the missing | before term in your expression rule. The hint is the unused $4 in a snippet that plainly should have only 3 symbols. That problem cascades into all the others.
Change:
expression: expression '+' term { $$ = $1 + $3; }
| expression '-' term { $$ = $1 - $3; }
term
;
into:
expression: expression '+' term { $$ = $1 + $3; }
| expression '-' term { $$ = $1 - $3; }
| term
;
and try again.
You forgot the alternation bar | here:
expression: expression '+' term { $$ = $1 + $3; }
| expression '-' term { $$ = $1 - $3; }
term
;
The last alternative should be | term ; (with no action, so that the default $$ = $1 applies).