Embedding Code in Yacc

I'm writing a yacc file as part of a compiler.
I have the following error:
lang_grammar.y:143.54-55: $2 of `ClassDeclaration' has no declared type
lang_grammar.y:143.69-70: $4 of `ClassDeclaration' has no declared type
lang_grammar.y:143.84-85: $6 of `ClassDeclaration' has no declared type
occurring on this line in my .y file:
CLASS { /* code will be embedded here */ } ID EXTENDS ID '{' ClassBody '}'
{ $$.classDeclaration = new ClassDeclaration($2.identifier, $4.identifier, $6.classBody); }
When I remove the inner embedded code:
CLASS ID EXTENDS ID '{' ClassBody '}'
{ $$.classDeclaration = new ClassDeclaration($2.identifier, $4.identifier, $6.classBody); }
It works just fine.
Are there limitations to embedding code within yacc? I was under the impression that this was possible.
Thanks.

I think you are using the wrong indexes. In the first version, the embedded action is itself counted as a component of the rule:
CLASS { /* code will be embedded here */ } ID EXTENDS ID '{' ClassBody '}'
$1    $2                                   $3 $4      $5 $6  $7        $8
So the final action should be
{ $$.classDeclaration = new ClassDeclaration($3.identifier, $5.identifier, $7.classBody); }
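If it helps, the counting can be sketched in a few lines of C. This is only an illustration: dollar_index and the "{action}" placeholder are names made up here, not anything yacc provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the $n position (1-based) of the `occurrence`-th component named
 * `name` in a rule's right-hand side, or 0 if there is no such component.
 * Mid-rule actions are listed as ordinary components (here as "{action}"),
 * which is why they shift the number of everything that follows them. */
int dollar_index(const char *const rhs[], size_t count,
                 const char *name, int occurrence)
{
    assert(rhs != NULL && name != NULL && occurrence > 0);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(rhs[i], name) == 0 && --occurrence == 0)
            return (int)(i + 1);
    }
    return 0;
}
```

Listing the right-hand side as CLASS, {action}, ID, EXTENDS, ID, '{', ClassBody, '}' gives 3 and 5 for the two IDs and 7 for ClassBody, which matches the corrected action.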

Related

Bison Grammar %type and %token

Why is it that I have to use $<nVal>4 explicitly in the below grammar snippet?
I thought the %type <nVal> expr line would remove the need so that I can simply put $4?
Is it not possible to use a different definition for expr so that I can?
%union
{
int nVal;
char *pszVal;
}
%token <nVal> tkNUMBER
%token <pszVal> tkIDENT
%type <nVal> expr
%%
for_statement : tkFOR
tkIDENT { printf( "I:%s\n", $2 ); }
tkEQUALS
expr { printf( "A:%d\n", $<nVal>4 ); } // Why not just $4?
tkTO
expr { printf( "B:%d\n", $<nVal>6 ); } // Why not just $6?
step-statement
list
next-statement;
expr : tkNUMBER { $$ = $1; }
;
Update following rici's answer. This now works a treat:
for_statement : tkFOR
tkIDENT { printf( "I:%s\n", $2 ); }
tkEQUALS
expr { printf( "A:%d\n", $5 /* $<nVal>5 */ ); }
tkTO
expr { printf( "B:%d\n", $8 /* $<nVal>8 */ ); }
step-statement
list
next-statement;
Why is it that I have to use $<nVal>4 explicitly in the below grammar snippet?
Actually, you should use $5 if you want to refer to the expr. $4 is the tkEQUALS, which has no declared type, so any use must be explicitly typed. $3 is the previous midrule action, which has no value since $$ is not assigned in that action.
By the same logic, the second expr is $8; $6 is the second midrule action, which also has no value (and no type).
See the Bison manual:
The mid-rule action itself counts as one of the components of the rule. This makes a difference when there is another action later in the same rule (and usually there is another at the end): you have to count the actions along with the symbols when working out which number n to use in $n.

How to Read Multiple Lines of input file for arithmetic yacc program?

I am new to compilers and am learning to write a calculator that reads multiple equations (one per line) from a .txt file. I am running into a segmentation fault.
YACC Code :
%{
#include <stdio.h>
#include <string.h>
#define YYSTYPE int /* the attribute type for Yacc's stack */
extern int yylval; /* defined by lex, holds attrib of cur token */
extern char yytext[]; /* defined by lex and holds most recent token */
extern FILE * yyin; /* defined by lex; lex reads from this file */
%}
%token NUM
%%
Begin : Line
| Begin Line
;
Line : Calc {printf("%s",$$); }
;
Calc : Expr {printf("Result = %d\n",$1);}
Expr : Fact '+' Expr { $$ = $1 + $3; }
| Fact '-' Expr { $$ = $1 - $3; }
| Fact '*' Expr { $$ = $1 * $3; }
| Fact '/' Expr { $$ = $1 / $3; }
| Fact { $$ = $1; }
| '-' Expr { $$ = -$2; }
;
Fact : '(' Expr ')' { $$ = $2; }
| Id { $$ = $1; }
;
Id : NUM { $$ = yylval; }
;
%%
void yyerror(char *mesg); /* this one is required by YACC */
main(int argc, char* *argv){
char ch;
if(argc != 2) {printf("usage: calc filename \n"); exit(1);}
if( !(yyin = fopen(argv[1],"r")) ){
printf("cannot open file\n");exit(1);
}
yyparse();
}
void yyerror(char *mesg){
printf("Bad Expression : %s\n", mesg);
exit(1); /* stop after the first error */
}
LEX Code :
%{
#include <stdio.h>
#include "y.tab.h"
int yylval; /*declared extern by yacc code. used to pass info to yacc*/
%}
letter [A-Za-z]
digit [0-9]
num ({digit})*
op "+"|"*"|"("|")"|"/"|"-"
ws [ \t\n]
other .
%%
{ws} { /* note, no return */ }
{num} { yylval = atoi(yytext); return NUM;}
{op} { return yytext[0];}
{other} { printf("bad%cbad%d\n",*yytext,*yytext); return '?'; }
%%
/* c functions called in the matching section could go here */
I am trying to print the expression along with the result.
Thanks In Advance.
In your parser, you have:
Line : Calc {printf("%s",$$); }
Now $$ is the semantic value which the rule is computing, and you haven't assigned anything to it. So it would not be unreasonable to assume that it is undefined, which would be bad, but in fact it does have a value because of the default rule $$ = $1;. All the same, it would be much more readable to write
printf("%s", $1);
But that's not correct, is it? After all, you have
#define YYSTYPE int
so all semantic types are integers. But you're telling printf that $1 is a string (%s). printf will believe you, so it will go ahead and try to dereference the int as though it were a char*, with predictable results (i.e., a segfault).
You are probably using a compiler which is clever enough to notice the fact that you are trying to print an int with a %s format code. But either you haven't asked the compiler to help you or you are ignoring its advice.
Always compile with warnings enabled. If you are using gcc or clang, that means putting -Wall in the command line. (If you are using some other compiler, find out how to produce warnings. It will be documented.) And then read the warnings and fix them before trying to run the program.
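As a minimal sketch of the mismatch described above (format_result is a made-up helper for illustration, not part of your program), an int semantic value has to go through %d, not %s:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats an integer result the way the Calc action
 * prints it. The value is an int (YYSTYPE is int), so the conversion must
 * be %d; handing it to %s would make printf treat the integer as a char*
 * and dereference it. */
int format_result(char *buf, size_t size, int value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "Result = %d\n", value);
}
```

If you change %d to %s here and compile with gcc or clang using -Wall, the format check reports that the argument has type int where a char * is expected.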
There are several other errors and questionable practices in your code. Your grammar is inaccurate (why do you use Fact as the left-hand operand of every operator?). And despite your comment, your lexical scanner ignores newline characters, so there is no way for the parser to know whether expressions are one per line, two per line, or spread over multiple lines; that will make it hard to use the calculator as a command-line tool.
There is no need to define the lex macro digit; (f)lex recognizes the Posix character class [[:digit:]] (and others, documented here) automatically. Nor is it particularly useful to define the macro num. Overuse of lex macros makes your program harder to read; it is usually better to just write the patterns out in place:
[[:digit:]]+ { yylval = atoi(yytext); return NUM; }
which would be more readable and less work both for you and for anyone reading your code. (If your professor or tutor disagrees, I'd be happy to discuss the matter with them directly.)

MAWK: Store match() in variable

I am using MAWK, whose built-in match() function does not accept a third argument (an array to store the match in):
match($1, /9f7fde/) {
substr($1, RSTART, RLENGTH);
}
See doc.
How can I store this output into a variable named var when later I want to construct my output like this?
EDIT2 - Complete example:
Input file structure:
<iframe src="https://vimeo.com/191081157" frameborder="0" height="481" width="608" scrolling="no"></iframe>|Random title|Uploader|fun|tag1,tag2,tag3
<iframe src="https://vimeo.com/212192268" frameborder="0" height="481" width="608" scrolling="no"></iframe>|Random title|Uploader|fun|tag1,tag2,tag3
parser.awk:
{
Embed = $1;
Title = $2;
User = $3;
Categories = $4;
Tags = $5;
}
BEGIN {
FS="|";
}
# Regexp without pattern matching for testing purposes
match(Embed, /191081157/) {
Id = substr(Embed, RSTART, RLENGTH);
}
{
print Id"\t"Title"\t"User"\t"Categories"\t"Tags;
}
Expected output:
191081157|Random title|Uploader|fun|tag1,tag2,tag3
I want to call the Id variable outside the match() function.
MAWK version:
mawk 1.3.4 20160930
Copyright 2008-2015,2016, Thomas E. Dickey
Copyright 1991-1996,2014, Michael D. Brennan
random-funcs: srandom/random
regex-funcs: internal
compiled limits:
sprintf buffer 8192
maximum-integer 2147483647
The obvious answer would seem to be
match($1, /9f7fde/) { var = "9f7fde"; }
But more general would be:
match($1, /9f7fde/) { var = substr($1, RSTART, RLENGTH); }
UPDATE: The solution above mine could be simplified from
match($1, /9f7fde/) { var = substr($1, RSTART, RLENGTH) }
to
{ __=substr($!_,match($!_,"9f7fde"),RLENGTH) }
A failed match sets RLENGTH to -1 (and match() returns 0), so substr() yields an empty string.
But even that is too verbose: since the matching criterion is a constant string, you can simply write
mawk '$(_~_)~_{__=_}' \_='9f7fde'
============================================
let's say this line
.....vimeo.com/191081157" frameborder="0" height="481" width="608" scrolling="no">Random title|Uploader|fun|tag1,tag2,tag3
{mawk/mawk2/gawk} 'BEGIN { OFS = "";
FS = "(^.+vimeo[\056]com[\057]|[\042] frameborder.+[\057]iframe[>])" ;
} (NF < 4) || ($2 !~ /191081157/) { next } ( $1 = $1 )'
\056 is the dot ( . ), \057 is the forward slash ( / ), and \042 is the straight double quote ( " ).
If the line cannot match at all, move on to the next row. Otherwise, use the power of the field separator to gobble up all the unneeded parts of the line. The $1 = $1 assignment rebuilds the record with the empty OFS, which drops the prefix and the remaining HTML that the field separator matched.
The assignment operation of $1 = $1 will also return true, providing the input for boolean evaluation for it to print. This way, you don't need either match( ) or substr( ) at all.

Is the below expression valid in Yacc

In Yacc (or Bison), is the expression below syntactically valid?
sentence : noun verb {
/* some action here which uses only $1 , $2 */
}
predicate {
/*some action which uses $1,$2,$3,$4 */
}
Yes, it is valid.
The first action is a mid-rule action. It has a semantic value of its own, which is $3 in the final action, so the value of predicate is $4 there.

Yacc/Bison: The pseudo-variables ($$, $1, $2,..) and how to print them using printf

I have a lexical analyser written in flex that passes tokens to my parser written in bison.
The following is a small part of my lexer:
ID [a-z][a-z0-9]*
%%
rule {
printf("A rule: %s\n", yytext);
return RULE;
}
{ID} {
printf( "An identifier: %s\n", yytext );
return ID;
}
"(" return LEFT;
")" return RIGHT;
There are other bits for parsing whitespace etc too.
Then part of the parser looks like this:
%{
#include <stdio.h>
#include <stdlib.h>
#define YYSTYPE char*
%}
%token ID RULE
%token LEFT RIGHT
%%
rule_decl :
RULE LEFT ID RIGHT { printf("Parsing a rule, its identifier is: %s\n", $2); }
;
%%
It's all working fine but I just want to print out the ID token using printf - that's all :). I'm not writing a compiler.. it's just that flex/bison are good tools for my software. How are you meant to print tokens? I just get (null) when I print.
Thank you.
I'm not an expert at yacc, but the way I've been handling the transition from the lexer to the parser is as follows: for each lexer token, you should have a separate rule to "translate" the yytext into a suitable form for your parser. In your case, you are probably just interested in yytext itself (while if you were writing a compiler, you'd wrap it in a SyntaxNode object or something like that). Try
%token ID RULE
%token LEFT RIGHT
%%
rule_decl:
RULE LEFT id RIGHT { printf("%s\n", $3); }
id:
ID { $$ = strdup(yytext); }
The point is that the last rule makes yytext available as a $ variable that can be referenced by rules involving id.
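The strdup matters because yytext points into the scanner's own buffer, which is overwritten when the next token is read. Here is a rough sketch of the difference, with a plain array standing in for that buffer (scan_buffer, scan_token and copy_token are invented names for illustration, not flex APIs):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the scanner's internal buffer; in flex, yytext points into
 * storage like this and is only valid until the next token is scanned. */
static char scan_buffer[64];

/* Pretend to scan a token: overwrite the shared buffer and return it. */
const char *scan_token(const char *text)
{
    assert(strlen(text) < sizeof scan_buffer);
    strcpy(scan_buffer, text);
    return scan_buffer;
}

/* Same effect as strdup(): a private heap copy that survives later scans.
 * The caller owns the result and must free() it. */
char *copy_token(const char *text)
{
    size_t len = strlen(text) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, text, len);
    return copy;
}
```

If you keep only the pointer returned by scan_token("foo") and then scan ")", that pointer now reads ")", while a copy made with copy_token still reads "foo". A common alternative is to make the copy in the lexer action and pass it through yylval, so the parser never has to look at yytext.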