Decode steps of parsing from y.output file - syntax-error

1. If I input int a; or float b;, the grammar uses LR parsing to parse it. I have read the rules of how LR parsing works, but I would like a step-by-step walkthrough of it for this input.
2. Why am I getting a syntax error? And how do I know which line it occurs on?
EDIT to question 2: %error-verbose gives a more detailed reason for the error. I replaced the NAME token in the flex file with id, and it works.
y.output
Terminals unused in grammar
LCURL
RCURL
LPAREN
RPAREN
NAME
DOUBLE
THEN
ELSE
"<"
">"
"<="
">="
"="
"!="
"+"
"-"
"*"
"/"
UMINUS
Grammar
0 $accept: start $end
1 start: program
2 program: program unit
3 | unit
4 unit: var_dec
5 var_dec: type_specifier declaration_list SEMICOLON
6 type_specifier: "int"
7 | FLOAT
8 | VOID
9 declaration_list: declaration_list COMMA ID
10 | declaration_list COMMA ID LTHIRD CONST_INT RTHIRD
11 | ID
12 | ID LTHIRD CONST_INT RTHIRD
Terminals, with rules where they appear
$end (0) 0
error (256)
"int" (258) 6
COMMA (259) 9 10
SEMICOLON (260) 5
ID (261) 9 10 11 12
FLOAT (262) 7
VOID (263) 8
LCURL (264)
RCURL (265)
LPAREN (266)
RPAREN (267)
CONST_INT (268) 10 12
LTHIRD (269) 10 12
RTHIRD (270) 10 12
NAME (271)
DOUBLE (272)
THEN (273)
ELSE (274)
"<" (275)
">" (276)
"<=" (277)
">=" (278)
"=" (279)
"!=" (280)
"+" (281)
"-" (282)
"*" (283)
"/" (284)
UMINUS (285)
Nonterminals, with rules where they appear
$accept (31)
on left: 0
start (32)
on left: 1, on right: 0
program (33)
on left: 2 3, on right: 1 2
unit (34)
on left: 4, on right: 2 3
var_dec (35)
on left: 5, on right: 4
type_specifier (36)
on left: 6 7 8, on right: 5
declaration_list (37)
on left: 9 10 11 12, on right: 5 9 10
State 0
0 $accept: . start $end
"int" shift, and go to state 1
FLOAT shift, and go to state 2
VOID shift, and go to state 3
start go to state 4
program go to state 5
unit go to state 6
var_dec go to state 7
type_specifier go to state 8
State 1
6 type_specifier: "int" .
$default reduce using rule 6 (type_specifier)
State 2
7 type_specifier: FLOAT .
$default reduce using rule 7 (type_specifier)
State 3
8 type_specifier: VOID .
$default reduce using rule 8 (type_specifier)
State 4
0 $accept: start . $end
$end shift, and go to state 9
State 5
1 start: program .
2 program: program . unit
"int" shift, and go to state 1
FLOAT shift, and go to state 2
VOID shift, and go to state 3
$default reduce using rule 1 (start)
unit go to state 10
var_dec go to state 7
type_specifier go to state 8
State 6
3 program: unit .
$default reduce using rule 3 (program)
State 7
4 unit: var_dec .
$default reduce using rule 4 (unit)
State 8
5 var_dec: type_specifier . declaration_list SEMICOLON
ID shift, and go to state 11
declaration_list go to state 12
State 9
0 $accept: start $end .
$default accept
State 10
2 program: program unit .
$default reduce using rule 2 (program)
State 11
11 declaration_list: ID .
12 | ID . LTHIRD CONST_INT RTHIRD
LTHIRD shift, and go to state 13
$default reduce using rule 11 (declaration_list)
State 12
5 var_dec: type_specifier declaration_list . SEMICOLON
9 declaration_list: declaration_list . COMMA ID
10 | declaration_list . COMMA ID LTHIRD CONST_INT RTHIRD
COMMA shift, and go to state 14
SEMICOLON shift, and go to state 15
State 13
12 declaration_list: ID LTHIRD . CONST_INT RTHIRD
CONST_INT shift, and go to state 16
State 14
9 declaration_list: declaration_list COMMA . ID
10 | declaration_list COMMA . ID LTHIRD CONST_INT RTHIRD
ID shift, and go to state 17
State 15
5 var_dec: type_specifier declaration_list SEMICOLON .
$default reduce using rule 5 (var_dec)
State 16
12 declaration_list: ID LTHIRD CONST_INT . RTHIRD
RTHIRD shift, and go to state 18
State 17
9 declaration_list: declaration_list COMMA ID .
10 | declaration_list COMMA ID . LTHIRD CONST_INT RTHIRD
LTHIRD shift, and go to state 19
$default reduce using rule 9 (declaration_list)
State 18
12 declaration_list: ID LTHIRD CONST_INT RTHIRD .
$default reduce using rule 12 (declaration_list)
State 19
10 declaration_list: declaration_list COMMA ID LTHIRD . CONST_INT RTHIRD
CONST_INT shift, and go to state 20
State 20
10 declaration_list: declaration_list COMMA ID LTHIRD CONST_INT . RTHIRD
RTHIRD shift, and go to state 21
State 21
10 declaration_list: declaration_list COMMA ID LTHIRD CONST_INT RTHIRD .
$default reduce using rule 10 (declaration_list)
My output is this
type_specifier -> INT
syntax error
I'm adding the flex and bison file just in case.
Flex file: scanner.l
%option noyywrap
%{
#include<stdlib.h>
#include<stdio.h>
#include "y.tab.h"
#include "SymbolTable.h"
#include "SymbolInfo.h"
#include "ScopeTable.h"
void yyerror (char *);
extern YYSTYPE yylval;
extern SymbolTable *table;
int line_count = 1;
%}
NAME [a-z]*
DOUBLE (([0-9]+(\.[0-9]*)?)|([0-9]*\.[0-9]+))
id NAME
newline \n
%%
{newline} {line_count++;}
[ \t]+ {}
(([0-9]+(\.[0-9]*)?)|([0-9]*\.[0-9]+)) {
yylval.f = atof(yytext);
return DOUBLE;
}
"int" {return INT;}
"float" {return FLOAT;}
"void" {return VOID;}
[a-z]+ {
yylval.s = *yytext;
return NAME;
}
";" {return SEMICOLON;}
"," {return COMMA;}
"(" {return LPAREN;}
")" {return RPAREN;}
"{" {return LCURL;}
"}" {return RCURL;}
{id} {
SymbolInfo *s;
std::string str;
for(int i = 0;yytext[i] != '\0';i++)
{
str = yytext[i];
}
s = table -> scope_lookup(str);
if(s == NULL)
{
s->setName(str);
}
yylval.sym = s;
return ID;
}
%%
and yacc file: parser.y
%{
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include "SymbolTable.h"
#include "SymbolInfo.h"
#include "ScopeTable.h"
int yyparse(void);
int yylex(void);
extern char* yytext;
extern FILE * yyin;
extern int tableSize;
FILE *logout;
extern int line_count;
SymbolTable *table;
void yyerror (char *s)
{
fprintf(stderr,"%s\n",s);
return;
}
%}
%union {
class SymbolInfo* sym;
char *s;
float f;
}
%verbose
%token INT "int"
%token COMMA SEMICOLON ID FLOAT VOID LCURL RCURL LPAREN RPAREN
%token CONST_INT LTHIRD RTHIRD
%token <s> NAME
%token <f> DOUBLE
%type <s> INT
//%expect 1
%precedence THEN
%precedence ELSE
%left "<" ">" "<=" ">=" "=" "!="
%left "+" "-"
%left "*" "/"
%left UMINUS
%%
start : program { printf("start -> program\n");
fprintf(logout,"%d : start -> program\n",line_count);
}
;
program : program unit {
printf("program -> program unit\n");
fprintf(logout,"%d : program -> program unit\n",line_count);
}
| unit {
printf("program -> unit\n");
fprintf(logout,"%d : program -> unit\n",line_count);
}
;
unit : var_dec {
printf("unit -> var_dec\n");
fprintf(logout,"%d : unit -> var_dec\n",line_count);
}
;
var_dec: type_specifier declaration_list SEMICOLON {
printf("var_dec -> type_specifier declaration_list SEMICOLON \n");
fprintf(logout,"%d -> var_dec: type_specifier declaration_list SEMICOLON \n", line_count);
}
;
type_specifier : INT {printf("type_specifier -> INT\n");
fprintf(logout,"%d : type_specifier-> INT\n", line_count);
}
| FLOAT {printf("type_specifier ->FLOAT\n");
fprintf(logout,"%d : type_specifier-> FLOAT\n",line_count);
}
| VOID {printf("type_specifier -> VOID\n");
fprintf(logout,"%d : type_specifier-> VOID\n",line_count);
}
;
declaration_list : declaration_list COMMA ID {
printf("declaration_list -> declaration_list COMMA ID\n");
fprintf(logout,"%d : declaration_list -> declaration_list COMMA ID\n",line_count);
}
| declaration_list COMMA ID LTHIRD CONST_INT RTHIRD {
printf("declaration_list -> declaration_list COMMA ID LTHIRD CONST_INT RTHIRD\n");
fprintf(logout,"%d : declaration_list -> declaration_list COMMA ID LTHIRD CONST_INT RTHIRD\n",line_count);
}
|ID {
printf("declaration_list -> ID\n");
fprintf(logout,"%d : declaration_list -> ID\n",line_count);
}
|ID LTHIRD CONST_INT RTHIRD {
printf("declaration_list -> ID LTHIRD CONST_INT RTHIRD\n");
fprintf(logout,"%d : declaration_list -> ID LTHIRD CONST_INT RTHIRD\n",line_count);
}
;
%%
int main(int argc, char *argv[])
{
FILE *fp ;
int token = 0;
if((fp = fopen(argv[1],"r")) == NULL)
{
fprintf(logout,"cannot open file");
exit(1);
}
logout = fopen("log.txt","w");
yyin = fp;
yyparse();
fclose(fp);
fclose(logout);
return 0;
}

If you want to see how your parser works in detail, you are better off using bison's own trace facility. Note that you must do two things:
1. Tell bison to generate the tracing code, using the -t option when you generate the parser;
2. Set the global variable yydebug to 1 in order to start tracing.
There are other useful options, too, all described in "Debugging Your Parser" in the bison manual.
Flex will automatically track the input line number if you add
%option yylineno
to your flex prolog. Once you do that, you can use the global variable yylineno in your yyerror function. Note that since yylineno is defined in the scanner, you must explicitly declare it in your parser:
void yyerror (const char *msg)
{
    extern int yylineno;
    fprintf(stderr, "At %d: %s\n", yylineno, msg);
}
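For the step-by-step part of the question, the state table in y.output can also be followed by hand. The sketch below is a hand-written C rendering of the 22 states listed in the question's y.output; it is not bison's generated code. The token and nonterminal codes, action, go_to, and lr_parse are names made up for illustration, and T_NAME stands for the case where the scanner hands back NAME instead of ID.

```c
#include <stdio.h>

/* Token and nonterminal codes are local to this sketch, not bison's numbers. */
enum { T_INT, T_FLOAT, T_VOID, T_ID, T_NAME, T_COMMA, T_SEMICOLON,
       T_LTHIRD, T_CONST_INT, T_RTHIRD, T_END };
enum { N_START, N_PROGRAM, N_UNIT, N_VAR_DEC, N_TYPE, N_DECL_LIST };

/* Left-hand side and right-hand-side length of rules 1..12 (index 0 unused). */
static const int rule_lhs[] = { -1, N_START, N_PROGRAM, N_PROGRAM, N_UNIT,
    N_VAR_DEC, N_TYPE, N_TYPE, N_TYPE,
    N_DECL_LIST, N_DECL_LIST, N_DECL_LIST, N_DECL_LIST };
static const int rule_len[] = { 0, 1, 2, 1, 1, 3, 1, 1, 1, 3, 6, 1, 4 };

#define ACCEPT 1000
#define STACK_MAX 64

/* Returns s > 0 to shift to state s, -r to reduce by rule r, 0 on error. */
static int action(int state, int tok) {
    switch (state) {
    case 0:  return tok == T_INT ? 1 : tok == T_FLOAT ? 2 : tok == T_VOID ? 3 : 0;
    case 1:  return -6;
    case 2:  return -7;
    case 3:  return -8;
    case 4:  return tok == T_END ? 9 : 0;
    case 5:  return tok == T_INT ? 1 : tok == T_FLOAT ? 2 : tok == T_VOID ? 3 : -1;
    case 6:  return -3;
    case 7:  return -4;
    case 8:  return tok == T_ID ? 11 : 0;
    case 9:  return ACCEPT;
    case 10: return -2;
    case 11: return tok == T_LTHIRD ? 13 : -11;
    case 12: return tok == T_COMMA ? 14 : tok == T_SEMICOLON ? 15 : 0;
    case 13: return tok == T_CONST_INT ? 16 : 0;
    case 14: return tok == T_ID ? 17 : 0;
    case 15: return -5;
    case 16: return tok == T_RTHIRD ? 18 : 0;
    case 17: return tok == T_LTHIRD ? 19 : -9;
    case 18: return -12;
    case 19: return tok == T_CONST_INT ? 20 : 0;
    case 20: return tok == T_RTHIRD ? 21 : 0;
    case 21: return -10;
    }
    return 0;
}

/* The "go to state N" lines that y.output lists for nonterminals. */
static int go_to(int state, int nt) {
    switch (nt) {
    case N_START:   return 4;
    case N_PROGRAM: return 5;
    case N_UNIT:    return state == 0 ? 6 : 10;
    case N_VAR_DEC: return 7;
    case N_TYPE:    return 8;
    default:        return 12;  /* N_DECL_LIST, only reachable from state 8 */
    }
}

/* Parses a T_END-terminated token array, printing each step. Returns the
 * number of reductions performed, or -1 on a syntax error. */
static int lr_parse(const int *toks) {
    int stack[STACK_MAX], top = 0, reductions = 0;
    stack[0] = 0;
    for (;;) {
        int act = action(stack[top], *toks);
        if (act == ACCEPT) return reductions;
        if (act == 0) {
            printf("syntax error in state %d\n", stack[top]);
            return -1;
        }
        if (act > 0) {                        /* shift */
            if (top + 1 >= STACK_MAX) return -1;
            printf("shift, go to state %d\n", act);
            stack[++top] = act;
            if (*toks != T_END) toks++;       /* never read past the end marker */
        } else {                              /* reduce: pop the rule's symbols */
            top -= rule_len[-act];
            stack[top + 1] = go_to(stack[top], rule_lhs[-act]);
            top++;
            printf("reduce by rule %d, go to state %d\n", -act, stack[top]);
            reductions++;
        }
    }
}
```

Calling lr_parse on INT ID SEMICOLON shifts to state 1, reduces by rule 6, shifts to state 11, reduces by rule 11, shifts to state 15, and then reduces by rules 5, 4, 3 and 1 before accepting. With NAME in place of ID it stops in state 8 right after the rule 6 reduction, which matches the "type_specifier -> INT" followed by "syntax error" in the question.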

Related

Yacc parser not detecting my language well

I am new to yacc and I am trying to define some rules for my language.
I have written a grammar that builds and runs without errors, but for some reason it doesn't do what it is supposed to do.
mylex.l
%{
#include <stdio.h>
#include "myyacc.tab.h"
extern int yyval;
%}
/* KEEP TRACK OF LINE NUMBER*/
%option yylineno
uppercase [A-Z]
lowercase [a-z]
alpha [{uppercase}{lowercase}]
digit [0-9]
alphanum [{alpha}{digit}]
id uppercase({alphanum}|_)*
int_literal [0-9]+
float_literal [0-9]+\.[0-9]+
string_literal \"[^\"]*\"
comment (##)(.)*(##)
%%
"int" {return INT;}
"float" {return FLOAT;}
"boolean" {return BOOLEAN;}
"if" {return IF;}
"else" {return ELSE;}
"end" {return END;}
"true" {return TRUE;}
"false" {return FALSE;}
"read" {return READ;}
"print" {return PRINT;}
"while" {return WHILE;}
"START" {return START;}
"END" {return END;}
"+" {return ADD;}
"-" {return SUB;}
"*" {return MUL;}
"/" {return DIV;}
"&&" {return LOG_AND;}
"||" {return LOG_OR;}
"!" {return LOG_NOT;}
"==" {return EQ;}
"<>" {return NEQ;}
"<" {return LT;}
"<=" {return LEQ;}
">" {return GT;}
">=" {return GEQ;}
"=" {return ASSIGN;}
"(" {return LPAREN;}
")" {return RPAREN;}
"{" {return LBRACE;}
"}" {return RBRACE;}
{int_literal} {return INT_LITERAL;}
{float_literal} {return FLOAT_LITERAL;}
{string_literal} {return STRING_LITERAL;}
{id} {return ID;}
{comment} { ; }
%%
int yywrap() {
return 1;
}
myyacc.y
%{
#include <stdio.h>
#include <stdlib.h>
extern int yylineno;
extern FILE* yyin;
extern int yyerror (char* msg);
extern char * yytext;
%}
/* definitions section start */
%token INT FLOAT BOOLEAN IF ELSE END TRUE FALSE READ PRINT WHILE START
%token INT_LITERAL FLOAT_LITERAL STRING_LITERAL ID ERROR
%right ASSIGN
%right LOG_NOT
%left MUL DIV
%left ADD SUB
%left LPAREN RPAREN
%left LBRACE RBRACE
%left LT LEQ GT GEQ
%left EQ NEQ
%left LOG_AND
%left LOG_OR
%start program
/* definitions section end */
%%
/* rules section start */
program : START statements END {printf("No syntax errors detected")};
statements : statements statement
| statement
;
statement : dec_stmt
| assignment_stmt
| print_stmt
| read_stmt
| condition_stmt
| while_stmt
;
dec_stmt : type ID
;
type : INT
| FLOAT
| BOOLEAN
;
assignment_stmt : ID ASSIGN expression
;
expression : exp EQ exp
| exp NEQ exp
| exp LT exp
| exp LEQ exp
| exp GT exp
| exp GEQ exp
| exp
;
exp : exp MUL exp
| exp DIV exp
| exp ADD exp
| exp SUB exp
| exp LOG_AND exp
| exp LOG_OR exp
| LOG_NOT exp
| LPAREN exp RPAREN
| INT_LITERAL
| FLOAT_LITERAL
| ID
| TRUE
| FALSE
;
print_stmt : PRINT LPAREN ID RPAREN
| PRINT LPAREN STRING_LITERAL RPAREN
;
read_stmt : ID ASSIGN READ LPAREN RPAREN
;
condition_stmt : IF LPAREN expression RPAREN LBRACE statement RBRACE END
| IF LPAREN expression RPAREN LBRACE statement RBRACE ELSE LBRACE statement RBRACE END
;
while_stmt : WHILE LPAREN expression RPAREN LBRACE statement RBRACE
;
/* rules section end */
%%
/* auxiliary routines start */
int main(int argc, char *argv[])
{
// don't change this part
yyin = fopen(argv[1], "r" );
if(!yyparse())
printf("\nParsing complete\n");
else
printf("\nParsing failed\n");
fclose(yyin);
return 0;
}
int yyerror (char* msg)
{
printf("Line %d: %s near %s\n", yylineno, msg, yytext);
exit(1);
}
/* auxiliary routines end */
Test case
START
int X12
float ABC1
DDe = 7
while(QNn >0) ## this a Comment ##
{ RLk9999 = ACc - 2
CCC = true
}
if ( ACc ==5){ print ( " Inside IF inside Loop " ) } end }
print ( " Hello .. " )
END
Output
Line 3: syntax error near 12
It also gets the line number wrong.
I've been trying to see what I'm doing wrong for some time now and I'd really appreciate a second set of eyes.
You cannot use macros inside character classes. Inside a character class, pattern operators lose their special meaning, so when you write
alphanum [{alpha}{digit}]
you are defining a character class containing {, }, and the letters adghilpt. That doesn't match the 12 in X12.
Anyway, flex already has predefined sets of characters which you can include in your character classes:
* [:lower:] a-z
* [:upper:] A-Z
* [:alpha:] [:lower:][:upper:]
* [:digit:] 0-9
* [:alnum:] [:alpha:][:digit:]
Note that these can only be used inside a character class. So you could write your id pattern as
id [[:upper:]][[:alnum:]_]*
without the need for any other macros.
Please see the flex pattern documentation for more details.
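To make the difference concrete, here is a small C sketch (plain C, not flex) of what the two patterns accept. matches_id mirrors [[:upper:]][[:alnum:]_]* using <ctype.h> in the default C locale, and in_broken_alphanum tests membership in the set of characters that [{alpha}{digit}] actually denotes. Both function names are made up for this illustration.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if the whole string matches [[:upper:]][[:alnum:]_]*, else 0. */
static int matches_id(const char *s) {
    if (!isupper((unsigned char)s[0])) return 0;
    for (size_t i = 1; s[i] != '\0'; i++) {
        if (!isalnum((unsigned char)s[i]) && s[i] != '_') return 0;
    }
    return 1;
}

/* Inside brackets the macro braces are literal, so [{alpha}{digit}] is
 * just the set of characters spelled out in this string. */
static int in_broken_alphanum(char c) {
    return c != '\0' && strchr("{}alphadigit", c) != NULL;
}
```

With these, matches_id accepts X12 and RLk9999 from the test case, while in_broken_alphanum rejects the digits 1 and 2, which is why the original id pattern could not cover the 12 in X12.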
In addition to @rici's answer, I also noticed that the while_stmt rule in my yacc file accepts only one statement in its body.

flex&bison shift/reduce conflict

Here are part of my grammar:
expr_address
: expr_address_category expr_opt { $$ = new ExprAddress($1,*$2);}
| axis axis_data { $$ = new ExprAddress($1,*$2);}
;
axis_data
: expr_opt { $$ = $1;}
| sign { if($1 == MINUS)
$$ = new IntergerExpr(-1000000000);
else if($1 == PLUS)
$$ = new IntergerExpr(+1000000000);}
;
expr_opt
: { $$ = new IntergerExpr(0);}
| expr { $$ = $1;}
;
expr_address_category
: I { $$ = NCAddress_I;}
| J { $$ = NCAddress_J;}
| K { $$ = NCAddress_K;}
;
axis
: X { $$ = NCAddress_X;}
| Y { $$ = NCAddress_Y;}
| Z { $$ = NCAddress_Z;}
| U { $$ = NCAddress_U;}
| V { $$ = NCAddress_V;}
| W { $$ = NCAddress_W;}
;
expr
: '[' expr ']' {$$ = $2;}
| COS parenthesized_expr {$$ = new BuiltinMethodCallExpr(COS,*$2);}
| SIN parenthesized_expr {$$ = new BuiltinMethodCallExpr(SIN,*$2);}
| ATAN parenthesized_expr {$$ = new BuiltinMethodCallExpr(ATAN,*$2);}
| SQRT parenthesized_expr {$$ = new BuiltinMethodCallExpr(SQRT,*$2);}
| ROUND parenthesized_expr {$$ = new BuiltinMethodCallExpr(ROUND,*$2);}
| variable {$$ = $1;}
| literal
| expr '+' expr {$$ = new BinaryOperatorExpr(*$1,PLUS,*$3);}
| expr '-' expr {$$ = new BinaryOperatorExpr(*$1,MINUS,*$3);}
| expr '*' expr {$$ = new BinaryOperatorExpr(*$1,MUL,*$3);}
| expr '/' expr {$$ = new BinaryOperatorExpr(*$1,DIV,*$3);}
| sign expr %prec UMINUS {$$ = new UnaryOperatorExpr($1,*$2);}
| expr EQ expr {$$ = new BinaryOperatorExpr(*$1,EQ,*$3);}
| expr NE expr {$$ = new BinaryOperatorExpr(*$1,NE,*$3);}
| expr GT expr {$$ = new BinaryOperatorExpr(*$1,GT,*$3);}
| expr GE expr {$$ = new BinaryOperatorExpr(*$1,GE,*$3);}
| expr LT expr {$$ = new BinaryOperatorExpr(*$1,LT,*$3);}
| expr LE expr {$$ = new BinaryOperatorExpr(*$1,LE,*$3);}
;
variable
: d_h_address {$$ = new AddressExpr(*$1);}
;
d_h_address
: D INTEGER_LITERAL { $$ = new IntAddress(NCAddress_D,$2);}
| H INTEGER_LITERAL { $$ = new IntAddress(NCAddress_H,$2);}
;
I would like my grammar to support input like this:
H100=20;
X;
X+0;
X+;
X+H100; //means H100 variable ref
X; and X+0; are the same as X0;. By the way, sign -> +/-.
But bison reports conflicts. The key part of bison.output:
State 108
11 expr: sign . expr
64 axis_data: sign .
INTEGER_LITERAL shift, and go to state 93
REAL_LITERAL shift, and go to state 94
'+' shift, and go to state 74
'-' shift, and go to state 75
COS shift, and go to state 95
SIN shift, and go to state 96
ATAN shift, and go to state 97
SQRT shift, and go to state 98
ROUND shift, and go to state 99
D shift, and go to state 35
H shift, and go to state 36
'[' shift, and go to state 100
D [reduce using rule 64 (axis_data)]
H [reduce using rule 64 (axis_data)]
$default reduce using rule 64 (axis_data)
State 69
62 expr_address: axis . axis_data
INTEGER_LITERAL shift, and go to state 93
REAL_LITERAL shift, and go to state 94
'+' shift, and go to state 74
'-' shift, and go to state 75
COS shift, and go to state 95
SIN shift, and go to state 96
ATAN shift, and go to state 97
SQRT shift, and go to state 98
ROUND shift, and go to state 99
D shift, and go to state 35
H shift, and go to state 36
'[' shift, and go to state 100
D [reduce using rule 65 (expr_opt)]
H [reduce using rule 65 (expr_opt)]
$default reduce using rule 65 (expr_opt)
State 68
61 expr_address: expr_address_category . expr_opt
INTEGER_LITERAL shift, and go to state 93
REAL_LITERAL shift, and go to state 94
'+' shift, and go to state 74
'-' shift, and go to state 75
COS shift, and go to state 95
SIN shift, and go to state 96
ATAN shift, and go to state 97
SQRT shift, and go to state 98
ROUND shift, and go to state 99
D shift, and go to state 35
H shift, and go to state 36
'[' shift, and go to state 100
D [reduce using rule 65 (expr_opt)]
H [reduce using rule 65 (expr_opt)]
$default reduce using rule 65 (expr_opt)
I don't know how to deal with this. Thanks in advance.
EDIT:
I made a minimal grammar:
%{
#include <stdio.h>
extern "C" int yylex();
void yyerror(const char *s) { printf("ERROR: %s\n", s); }
%}
%token PLUS '+' MINUS '-'
%token D H I J K X Y Z INT
/*%type sign expr var expr_address_category expr_opt
%type axis */
%start word_list
%%
/* The grammar above omitted this rule; it is what makes the grammar ambiguous */
word_list
: word
| word_list word
;
sign
: PLUS
| MINUS
;
expr
: var
| sign expr
| '[' expr ']'
;
var
: D INT
| H INT
;
word
: expr_address
| var '=' expr
;
expr_address
: expr_address_category expr_opt
/*| '(' axis sign ')'*/
| axis sign
;
expr_opt
: /* empty */
| expr
;
expr_address_category
: I
| J
| K
| axis
;
axis
: X
| Y
| Z
;
%%
and I would like it to support:
X;
X0;
X+0; // the top three are the same as X0
X+;
X+H100; // this means X's data refers to +H100
X+H100=10; // two words in one block: X+ and H100=10
XH100=10; // two words in one block: X and H100=10
EDIT2:
The EDIT above omitted this rule:
block
: word_list ';'
| ';'
;
because I have to allow input such as:
H000 = 100 H001 = 200 H002 = 300;
This is essentially the classic LR(2) grammar, except that in your case it is LR(3) because your variables consist of two tokens [Note 1]:
var : D INT | H INT
The basic problem is the concatenation of words without separators:
word_list : word | word_list word
combined with the fact that one of the options for word ends with an optional var:
word: expr_address
expr_address: expr_address_category expr_opt
while the other one starts with a var:
word: var '=' expr
The = makes this unambiguous, since nothing in an expr can contain that symbol. But at the point where a decision needs to be made, the = is not visible, because the lookahead is the first token of a var -- either an H or a D -- and the equals sign is still two tokens away.
This LR(2) grammar is very similar to the grammar used by yacc/bison itself, a fact which I always find to be ironic, because the grammar for yacc does not require ; between productions:
production: SYMBOL ':' | production SYMBOL /* Lots of detail omitted */
As with your grammar, this makes it impossible to know whether a SYMBOL should be shifted or trigger a reduce because the disambiguating : is still not visible.
Since the grammar is (I assume) unambiguous, and bison can now generate GLR parsers, that will be the simplest solution: just add
%glr-parser
to your bison prologue (but read the section of the bison manual on GLR parsers to understand the trade-off).
Note that the shift-reduce conflicts will still be reported as warnings; since it is impossible to reliably decide whether a grammar is ambiguous, bison doesn't attempt to do so and ambiguities will be reported at run-time if they exist.
You should also fix the issue mentioned in @ChrisDodd's answer regarding the refactoring of expr_address (although with a GLR parser it is not strictly necessary).
If, for whatever reason, you feel that a GLR parser will not meet your needs, you could use the solution in most implementations of yacc (including bison), which is a hack in the lexical scanner. The basic idea is to mark whether a symbol is followed by a colon or not in the lexer, so that the above production could be rewritten as:
production: SYMBOL_COLON | production SYMBOL
This solution would work for you if you were willing to combine the letter and the number into a single token:
word: expr_address expr_opt
| VARIABLE_EQUALS expr
// ...
expr: VARIABLE
My preference is to do this transformation in a wrapper around the lexer, which keeps a (one-element) queue of pending tokens:
/* The use of static variables makes this yylex wrapper unreliable
 * if it is reused after a syntax error.
 */
int yylex_wrapper() {
    static int saved_token = -1;
    static YYSTYPE saved_yylval = {0};
    int token = saved_token;
    saved_token = -1;
    yylval = saved_yylval;
    // Read a new token only if we don't have one in the queue.
    if (token < 0) token = yylex();
    // If the current token is IDENTIFIER, check the next token
    if (token == IDENTIFIER) {
        // Read the next token into the queue (saved_token / saved_yylval)
        YYSTYPE temp_val = yylval;
        saved_token = yylex();
        saved_yylval = yylval;
        yylval = temp_val;
        // If the second token is '=', then modify the current token
        // and delete the '=' from the queue
        if (saved_token == '=') {
            saved_token = -1;
            token = IDENTIFIER_EQUALS;
        }
    }
    return token;
}
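A self-contained way to try out the wrapper idea without flex or bison is to feed it tokens from an array. In this sketch mock_yylex, wrapper_reset, and the token codes are stand-ins invented for the demo (a real build would use yylex and the codes from the generated header), and semantic values are left out to keep it short. The explicit reset is one possible way around the static-variable caveat mentioned in the comment above.

```c
/* Hypothetical token codes; a real parser would take these from y.tab.h. */
enum { TOK_EOF = 0, IDENTIFIER = 258, IDENTIFIER_EQUALS = 259 };

/* Stand-in for the flex scanner: replays a fixed, TOK_EOF-terminated array. */
static const int *mock_input;

static int mock_yylex(void) {
    int t = *mock_input;
    if (t != TOK_EOF) mock_input++;   /* keep returning TOK_EOF at the end */
    return t;
}

/* One-element queue of pending tokens; -1 means the queue is empty. */
static int saved_token = -1;

/* Points the mock scanner at a new input and empties the queue, so the
 * wrapper can be reused, for example after a syntax error. */
static void wrapper_reset(const int *tokens) {
    mock_input = tokens;
    saved_token = -1;
}

static int yylex_wrapper(void) {
    int token = saved_token;
    saved_token = -1;

    /* Read a new token only if the queue was empty. */
    if (token < 0) token = mock_yylex();

    if (token == IDENTIFIER) {
        /* Peek at the following token and keep it in the queue... */
        saved_token = mock_yylex();
        /* ...unless it is '=', which is folded into the current token. */
        if (saved_token == '=') {
            saved_token = -1;
            token = IDENTIFIER_EQUALS;
        }
    }
    return token;
}
```

For the token sequence IDENTIFIER '+' IDENTIFIER '=' IDENTIFIER, the wrapper hands the parser IDENTIFIER, '+', IDENTIFIER_EQUALS, IDENTIFIER and then end of input, so the parser can tell with a single token of lookahead where the next assignment starts.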
Notes
1. Personally, I would start by making a var a single token. Do you really want to allow people to write the following?
H /* Some comment in the middle of the variable name */ 100
That's not going to solve the problem, though; it merely reduces the grammar's lookahead requirement from LR(3) to LR(2).
The main problem is that it can't figure out where one word in a word_list ends and the next one begins, because there is no separator token between words. This is in contrast to your examples, which all have ; terminators. So that suggests one obvious fix -- put in the ; separators:
word: expr_address ';'
| var '=' expr ';'
That fixes most of the problems, but leaves a lookahead conflict where it can't decide whether an axis is an expr_address_category or not when the lookahead is a sign, because it depends on whether there's an expr after the sign or not. You can fix that by refactoring to defer deciding:
expr_address
    : expr_address_category expr_opt
    | axis expr_opt
    | axis sign
    ;
and remove axis from expr_address_category.

Yacc only reads the first match of the rule

Hi, I am writing a simple yacc program that takes program code and counts how many assignment statements there are.
For example, for the following code snippet:
void main() {
int a = 3;
int bb = 10;
}
I'd like my yacc program to print that there are 2 assignment statements. Since I am a beginner, I found some sample code from O'Reilly's book online and modified it.
yacc.y
1 %{
2 #include <stdio.h>
3 int assign = 0;
4 %}
5
6 %token NAME NUMBER
7 %start statement
8 %%
9 statement: NAME '=' expression ';' {assign++;}
11 ;
12 | expression { printf("= %d\n", $1); }
13 ;
14 expression: expression '+' NUMBER { $$ = $1 + $3;
15 printf ("Recognized '+' expression.\n");
16 }
17 | expression '-' NUMBER { $$ = $1 - $3;
18 printf ("Recognized '-' expression.\n");
19 }
20 | NUMBER { $$ = $1;
21 printf ("Recognized a number.\n");
22 }
23 ;
24 %%
25 int main (void) {
26 yyparse();
27 printf("assign =%d", assign);
28 }
29
30 /* Added because panther doesn't have liby.a installed. */
31 int yyerror (char *msg) {
32 return fprintf (stderr, "YACC: %s\n", msg);
33 }
lex.l
1 %{
2 #include "y.tab.h"
3 extern int yylval;
4 %}
5
6 %%
7 [0-9]+ { yylval = atoi (yytext);
8 printf ("scanned the number %d\n", yylval);
9 return NUMBER; }
10 [ \t] { printf ("skipped whitespace\n"); }
11 \n { printf ("reached end of line\n");
12 return 0;
13 }
14 [a-zA-Z]+ {printf("found name"); return NAME;}
15 . { printf ("found other data \"%s\"\n", yytext);
16 return yytext[0];
17 /* so yacc can see things like '+', '-', and '=' */
18 }
19 %%
20 int yywrap(){
21 return 1;
22 }
test.txt
a = 3;
3+2;
b = 3;
When I build the code, I get a.out. When I run ./a.out < test.txt, the output shows that there is one assignment. It seems that it only recognized the first statement.
How do I make the program keep looking for matches after the first one?
Also, why are there semicolons on lines 11 and 13 of yacc.y? Since the alternatives are all connected by '|', I don't understand why a ; is placed there.
Your grammar only parses one statement. Make the following changes:
%start statements
statements
: statement
| statements statement
;
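etc. as before.
Note also that the \n rule in lex.l returns 0, which yacc treats as end of input, so the parser stops at the first newline no matter what the grammar says. For the later lines of test.txt to be read, that rule must not return 0 (for example, it can simply do nothing).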
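The left-recursive statements rule is just repetition: the parser keeps reducing statement after statement until the input ends. As a rough illustration only, the C sketch below does the same counting by hand instead of with yacc. It assumes every statement ends with ';' (as in test.txt), treats newlines as ordinary whitespace, and unlike the grammar it also accepts empty input; count_assignments and its helpers are hypothetical names.

```c
#include <ctype.h>
#include <stddef.h>

static const char *skip_ws(const char *p) {
    while (isspace((unsigned char)*p)) p++;
    return p;
}

/* expression: NUMBER | expression '+' NUMBER | expression '-' NUMBER.
 * Returns the position just after the expression, or NULL if there is none. */
static const char *parse_expression(const char *p) {
    for (;;) {
        p = skip_ws(p);
        if (!isdigit((unsigned char)*p)) return NULL;
        while (isdigit((unsigned char)*p)) p++;
        p = skip_ws(p);
        if (*p != '+' && *p != '-') return p;
        p++;                                   /* consume the operator */
    }
}

/* statements: statement | statements statement, written as a loop.
 * Returns the number of assignment statements, or -1 on a syntax error. */
static int count_assignments(const char *p) {
    int assign = 0;
    for (p = skip_ws(p); *p != '\0'; p = skip_ws(p)) {
        int is_assignment = 0;
        if (isalpha((unsigned char)*p)) {      /* NAME '=' expression ';' */
            while (isalpha((unsigned char)*p)) p++;
            p = skip_ws(p);
            if (*p != '=') return -1;
            p++;
            is_assignment = 1;
        }
        p = parse_expression(p);
        if (p == NULL || *p != ';') return -1; /* assumed ';' terminator */
        p++;
        assign += is_assignment;
    }
    return assign;
}
```

Run over the three lines of test.txt, this counts two assignments, which is the result the question was after.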
It is also very important to know how to debug your parser. In the first part of the file you need to add #define YYDEBUG 1, and in the main function set yydebug = 1. This lets you see the exact steps the parser takes when you run it, so you can tell where the error is. That matters because mistakes in yacc grammars are usually very hard to find.
%{
#define YYDEBUG 1 // This is new
%}
int main(){
yydebug = 1; // This is new
yyparse();
}
The semicolon on line 11 is wrong: a semicolon ends a rule, so it belongs only after the last alternative. Yacc rules look like this:
Nonterminal : something here
| something else here
| ... etc.
...
;

Yacc - a line returns a syntax error when there is a matching rule

I am writing a simple Yacc program that takes program code and returns the counts of int and double variables and of functions.
I ran into a bizarre problem: the parser reports a syntax error on a line for which there is a matching rule, because it picked a different rule for that line. Here are the parts of the code that reproduce the error. (If you see unused variables, that's because I deleted other parts that are irrelevant to it.)
yacc code
%{
#define YYDEBUG 1
#include <stdio.h>
#include <stdlib.h>
int func_count=0;
int int_count=0;
int char_count=0;
int double_count=0;
int float_count=0;
int pointer_count=0;
int array_count=0;
int condition_count=0;
int for_count=0;
int return_count=0;
int numeric_count=0;
%}
%token INT_KEYWORD DOUBLE_KEYWORD CHAR_KEYWORD RETURN_KEYWORD FLOAT_KEYWORD IF_KEYWORD VARIABLE OPERATOR COMPARE DIGIT FOR_KEYWORD POINTER_VARIABLE
%start program
%%
program:
program statement '\n'
|
;
statement:
declaration_statement |
function_declaration_statement {func_count++;}
;
function_declaration_statement:
datatype VARIABLE '(' datatype VARIABLE ')' '{'
;
declaration_statement:
int_declaration_statement |
double_declaration_statement
;
int_declaration_statement:
INT_KEYWORD VARIABLE '[' DIGIT ']' ';'{array_count++;}
|
INT_KEYWORD VARIABLE ';' {int_count++;}
|
INT_KEYWORD VARIABLE '=' DIGIT ';' {int_count++;}
double_declaration_statement:
DOUBLE_KEYWORD VARIABLE '[' DIGIT ']' ';' {array_count++;}
|
DOUBLE_KEYWORD VARIABLE ';' {double_count++;}
|
DOUBLE_KEYWORD VARIABLE '=' DIGIT ';' {double_count++;}
datatype:
INT_KEYWORD
|
DOUBLE_KEYWORD
|
CHAR_KEYWORD
|
FLOAT_KEYWORD
;
%%
int yyerror(char *s){
fprintf(stderr,"%s\n",s);
return 0;
}
int main (void){
yydebug=1;
yyparse();
printf("#int variable=%d, #double variable=%d",int_count,double_count);
printf("#array=%d\n",array_count);
printf("#function=%d\n",func_count);
}
lex
%{
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
void yyerror(char *);
%}
%%
"int" {return INT_KEYWORD;}
"double" {return DOUBLE_KEYWORD;}
"char" {return CHAR_KEYWORD;}
"float" {return FLOAT_KEYWORD;}
"if" {return IF_KEYWORD;}
"for" {return FOR_KEYWORD;}
"return" {return RETURN_KEYWORD;}
"==" {return COMPARE;}
">" {return COMPARE;}
"<" {return COMPARE;}
">=" {return COMPARE;}
"<=" {return COMPARE;}
"+" {return OPERATOR;}
"-" {return OPERATOR;}
"/" {return OPERATOR;}
"*" {return OPERATOR;}
"%" {return OPERATOR;}
[0-9]+ {return DIGIT;}
[a-z]+ {return VARIABLE;}
"*"" "?[a-zA-Z]+ {return POINTER_VARIABLE;}
"[" {return *yytext;}
"=" {return *yytext;}
"]" {return *yytext;}
[;\n(){}] {return *yytext;}
[ \t] ;
. {printf("%s\n",yytext); yyerror("invalid charactor");}
%%
int yywrap(void){
return 1;
}
test file:
int a;
int a[3];
int a(int a) {
Expected output
#int variable=1, #double variable=0 #array=1
#function=1
But instead it fails at the third line, int a(int a), because the program seemed to choose int variable declaration rule, and it fails when it sees '(' token, generating a syntax error.
The debug error message says...
....
Reading a token: Next token is token INT_KEYWORD ()
Shifting token INT_KEYWORD ()
Entering state 3
Reading a token: Next token is token VARIABLE ()
Shifting token VARIABLE ()
Entering state 13
Reading a token: Next token is token '(' ()
syntax error
....
Could anyone please point out what I did wrong? Thanks.
You have two shift/reduce conflicts in your grammar. You can see where in the output file generated by yacc :
State 3
8 int_declaration_statement: INT_KEYWORD . VARIABLE '[' DIGIT ']' ';'
9 | INT_KEYWORD . VARIABLE ';'
10 | INT_KEYWORD . VARIABLE '=' DIGIT ';'
14 datatype: INT_KEYWORD .
VARIABLE shift, and go to state 13
VARIABLE [reduce using rule 14 (datatype)]
State 4
11 double_declaration_statement: DOUBLE_KEYWORD . VARIABLE '[' DIGIT ']' ';'
12 | DOUBLE_KEYWORD . VARIABLE ';'
13 | DOUBLE_KEYWORD . VARIABLE '=' DIGIT ';'
15 datatype: DOUBLE_KEYWORD .
VARIABLE shift, and go to state 14
VARIABLE [reduce using rule 15 (datatype)]
Here, once yacc has read an INT_KEYWORD or a DOUBLE_KEYWORD and the next token is a VARIABLE, it does not know whether to shift the VARIABLE (continuing a declaration) or to reduce the keyword to datatype (starting a function declaration). By default, yacc resolves the conflict by shifting.
A function_declaration_statement starts with a datatype, so parsing it requires that reduction. Because yacc shifts instead, it commits to int_declaration_statement (or double_declaration_statement), which is state 13 in your trace, and the syntax error happens when it then encounters a '('.
To solve this, you can remove the function_declaration_statement and add an alternative to your int_declaration_statement (and likewise to double_declaration_statement). Something like:
statement: int_declaration_statement
| double_declaration_statement
;
int_declaration_statement: INT_KEYWORD VARIABLE '[' DIGIT ']' ';'{array_count++;}
| INT_KEYWORD VARIABLE ';' {int_count++;}
| INT_KEYWORD VARIABLE '=' DIGIT ';' {int_count++;}
| INT_KEYWORD VARIABLE '(' datatype VARIABLE ')' '{' {func_count++;}
;
That will remove your shift/reduce conflicts and give you the result you want, for instance:
--- ~ » ./a.out
int a;
int a[3];
int a(int a) {
#int variable=1, #double variable=0#array=1
#function=1
Hope it helps.
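The refactoring above works because the parser no longer has to choose between datatype and a declaration before it has seen the token that follows VARIABLE. The same idea, written as a hand-rolled C check rather than a yacc grammar, is sketched below. The enum values, classify, and the KIND_* names are made up for illustration; in the real program the token codes come from y.tab.h.

```c
/* Hypothetical token codes; single characters stand for themselves. */
enum token { INT_KEYWORD = 256, DOUBLE_KEYWORD, VARIABLE, DIGIT };
enum kind { KIND_ERROR, KIND_ARRAY, KIND_SCALAR, KIND_FUNCTION };

static int is_datatype(int t) {
    return t == INT_KEYWORD || t == DOUBLE_KEYWORD;
}

/* Classifies one statement given as a 0-terminated token array. The
 * keyword and the name are consumed first; the decision is deferred to
 * the third token, just as in the refactored grammar. Each && chain
 * stops at the first mismatch, so it never reads past the terminator. */
static enum kind classify(const int *t) {
    if (!is_datatype(t[0])) return KIND_ERROR;
    if (t[1] != VARIABLE) return KIND_ERROR;
    switch (t[2]) {
    case '[':
        return (t[3] == DIGIT && t[4] == ']' && t[5] == ';')
                   ? KIND_ARRAY : KIND_ERROR;
    case ';':
        return KIND_SCALAR;
    case '=':
        return (t[3] == DIGIT && t[4] == ';') ? KIND_SCALAR : KIND_ERROR;
    case '(':
        return (is_datatype(t[3]) && t[4] == VARIABLE &&
                t[5] == ')' && t[6] == '{')
                   ? KIND_FUNCTION : KIND_ERROR;
    default:
        return KIND_ERROR;
    }
}
```

The three lines of the test file classify as a scalar, an array and a function respectively, which corresponds to the counts shown in the output above.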

Bison:syntax error at the end of parsing

Hello this is my bison grammar file for a mini-programming language:
%{
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include "projectbison.tab.h"
void yyerror(char const *);
extern FILE *yyin;
extern FILE *yyout;
extern int yylval;
extern int yyparse(void);
extern int n;
int errNum = 0;
int forNum = 0;
%}
%left PLUS MINUS
%left MULT DIV MOD
%nonassoc EQUAL NEQUAL LESS GREATER LEQUAL GEQUAL
%token INTEGER BOOLEAN STRING VOID
%token ID
%token AND
%token BEGINP
%token ENDP
%token EXTERN
%token COMMA
%token EQ
%token RETURN1
%token IF1 ELSE1 WHILE1 FOR1 DO1
%token LOR LAND LNOT
%token TRUE FALSE
%token EQUAL NEQUAL LESS GREATER LEQUAL GEQUAL
%token LB1 RB1
%token LCB1 RCB1
%token SEMIC
%token NEWLINE
%token PLUS MINUS
%token MULT DIV MOD
%token DIGIT STRING1
%start program
%%
/*50*/
program : external-decl program-header defin-field command-field
;
external-decl : external-decl external-prototype
|
;
external-prototype : EXTERN prototype-func NEWLINE
;
program-header : VOID ID LB1 RB1 NEWLINE
;
defin-field : defin-field definition
|
;
definition : variable-defin
| func-defin
| prototype-func
;
variable-defin : data-type var-list SEMIC newline
;
data-type : INTEGER
| BOOLEAN
| STRING
;
var-list : ID extra-ids
;
extra-ids : COMMA var-list
|
;
func-defin : func-header defin-field command-field
;
prototype-func : func-header SEMIC
;
func-header : data-type ID LB1 lists RB1 newline
;
lists: list-typ-param
|
;
list-typ-param : typical-param typical-params
;
typical-params : COMMA list-typ-param
|
;
typical-param : data-type AND ID
;
command-field : BEGINP commands newline ENDP newline
;
commands : commands newline command
|
;
command : simple-command SEMIC
| struct-command
| complex-command
;
complex-command : LCB1 newline command newline RCB1
;
struct-command : if-command
| while-command
| for-command
;
simple-command : assign
| func-call
| return-command
| null-command
;
if-command : IF1 LB1 gen-expr RB1 newline command else-clause
;
else-clause: ELSE1 newline command
;
while-command : WHILE1 LB1 gen-expr RB1 DO1 newline RCB1 command LCB1
;
for-command : FOR1 LB1 conditions RB1 newline RCB1 command LCB1
;
conditions : condition SEMIC condition SEMIC condition SEMIC
;
condition : gen-expr
|
;
assign : ID EQ gen-expr
;
func-call : ID LB1 real-params-list RB1
| ID LB1 RB1
;
real-params-list : real-param real-params
;
real-params : COMMA real-param real-params
|
;
real-param : gen-expr
;
return-command : RETURN1 gen-expr
;
null-command :
;
gen-expr : gen-terms gen-term
;
gen-terms : gen-expr LOR
|
;
gen-term : gen-factors gen-factor
;
gen-factors : gen-term LAND
|
;
gen-factor : LNOT first-gen-factor
| first-gen-factor
;
first-gen-factor : simple-expr comparison
| simple-expr
;
comparison : compare-operator simple-expr
;
compare-operator : EQUAL
| NEQUAL
| LESS
| GREATER
| LEQUAL
| GEQUAL
;
simple-expr : expresion simple-term
;
expresion : simple-expr PLUS
|simple-expr MINUS
|
;
simple-term : mul-expr simple-parag
;
mul-expr: simple-term MULT
| simple-term DIV
| simple-term MOD
|
;
simple-parag : simple-prot-oros
| MINUS simple-prot-oros
;
simple-prot-oros : ID
| constant
| func-call
| LB1 gen-expr RB1
;
constant : DIGIT
| STRING1
| TRUE
| FALSE
;
newline:NEWLINE
|
;
%%
void yyerror(char const *msg)
{
errNum++;
fprintf(stderr, "%s\n", msg);
}
int main(int argc, char **argv)
{
++argv;
--argc;
if ( argc > 0 )
{yyin= fopen( argv[0], "r" ); }
else
{yyin = stdin;
yyout = fopen ( "output", "w" );}
int a = yyparse();
if(a==0)
{printf("Done parsing\n");}
else
{printf("Yparxei lathos sti grammi: %d\n", n);}
printf("Estimated number of errors: %d\n", errNum);
return 0;
}
For a simple input like this:
void main()
integer k;
boolean l;
begin
aek=32;
end
I get the following:
$ ./MyParser.exe file2.txt
void , id ,left bracket , right bracket
integer , id ,semicolon
boolean , id ,semicolon
BEGIN PROGRAM
id ,equals , digit ,semicolon
END PROGRAM
syntax error
Yparxei lathos sti grammi: 8
Estimated number of errors: 1
Whatever change I make to the input file, I get a syntax error at the end. Why do I get this, and what can I do? Thanks a lot in advance! Here is the flex file, in case someone needs it:
%{
#include "projectbison.tab.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int n=1;
%}
%option noyywrap
digit [0-9]+
id [a-zA-Z][a-zA-Z0-9]*
%%
"(" {printf("left bracket , "); return LB1;}
")" {printf("right bracket\n"); return RB1;}
"{" {printf("left curly bracket , "); return LCB1;}
"}" {printf("right curly bracket\n"); return RCB1;}
"==" {printf("isotita ,"); return EQUAL;}
"!=" {printf("diafora ,"); return NEQUAL;}
"<" {printf("less_than ,"); return LESS;}
">" {printf("greater_than ,"); return GREATER;}
"<=" {printf("less_eq ,"); return LEQUAL;}
">=" {printf("greater_eq ,"); return GEQUAL;}
"||" {printf("lor\n"); return LOR;}
"&&" {printf("land\n"); return LAND;}
"&" {printf("and ,"); return AND;}
"!" {printf("lnot ,"); return LNOT;}
"+" {printf("plus ,"); return PLUS; }
"-" {printf("minus ,"); return MINUS;}
"*" {printf("multiply ,"); return MULT;}
"/" {printf("division ,"); return DIV;}
"%" {printf("mod ,"); return MOD;}
";" {printf("semicolon \n"); return SEMIC;}
"=" {printf("equals , "); return EQ;}
"," {printf("comma ,"); return COMMA;}
"\n" {n++; return NEWLINE;}
void {printf("void ,"); return VOID;}
return {printf("return ,"); return RETURN1;}
extern {printf("extern\n"); return EXTERN;}
integer {printf("integer ,"); return INTEGER;}
boolean {printf("boolean ,"); return BOOLEAN;}
string {printf("string ,"); return STRING;}
begin {printf("BEGIN PROGRAM\n"); return BEGINP;}
end {printf("END PROGRAM\n"); return ENDP;}
for {printf("for\n"); return FOR1;}
true {printf("true ,"); return TRUE;}
false {printf("false ,"); return FALSE;}
if {printf("if\n"); return IF1; }
else {printf("else\n"); return ELSE1; }
while {printf("while\n"); return WHILE1;}
{id} {printf("id ,"); return ID;}
{digit} {printf("digit ,"); return DIGIT;}
[a-zA-Z0-9]+ {return STRING1;}
` {/*catchcall*/ printf("Mystery character %s\n", yytext); }
<<EOF>> { static int once = 0; return once++ ? 0 : '\n'; }
%%
Your scanner pretty well guarantees that two newline characters will be sent at the end of the input: one from the newline present in the input, and another one as a result of your trapping <<EOF>>. However, your grammar doesn't appear to accept unexpected newlines, so the second newline will trigger a syntax error.
The simplest solution would be to remove the <<EOF>> rule, since text files without a terminating newline are very rare, and it is entirely legitimate to consider them syntax errors. A more general solution would be to allow any number of newline characters to appear where a newline is expected, by defining something like:
newlines: '\n' | newlines '\n';
(Using actual characters for single-character tokens makes your grammar much more readable, and simplifies your scanner. But that's a side issue.)
You might also ask yourself whether you really need to enforce newline terminators, since your grammar seems to use ; as a statement terminator, making the newline redundant (aside from stylistic considerations). Removing newlines from the grammar (and ignoring them, as with other whitespace, in the scanner) will also simplify your code.