3 Address Code Generation using lex and yacc

I'm trying to generate 3 address code corresponding to basic arithmetic expressions. I'm new to lex and yacc and haven't worked with them much, and I'm having trouble understanding the flow of control between the two, i.e. how the two programs interact.
lex.l
%{
#include<stdio.h>
#include"y.tab.h"
int k=1;
%}
%%
[0-9]+ {
yylval.dval=yytext[0];
return NUM;
}
\n {return 0;}
. {return yytext[0];}
%%
void yyerror(char* str)
{
printf("\n%s",str);
}
char *gencode(char word[],char first,char op,char second)
{
char temp[10];
sprintf(temp,"%d",k);
strcat(word,temp);
k++;
printf("%s = %c %c %c\n",word,first,op,second);
return word; //Returns variable name like t1,t2,t3... properly
}
int yywrap()
{
return 1;
}
main()
{
yyparse();
return 0;
}
yacc.y
%{
#include<stdio.h>
int aaa;
%}
%union{
char dval;
}
%token <dval> NUM
%type <dval> E
%left '+' '-'
%left '*' '/' '%'
%%
statement : E {printf("\nt = %c \n",$1);}
;
E : E '+' E
{
char word[]="t";
char *test=gencode(word,$1,'+',$3);
$$=test;
}
| E '-' E
{
char word[]="t";
char *test=gencode(word,$1,'-',$3);
$$=test;
}
| E '%' E
{
char word[]="t";
char *test=gencode(word,$1,'%',$3);
$$=test;
}
| E '*' E
{
char word[]="t";
char *test=gencode(word,$1,'*',$3);
$$=test;
}
| E '/' E
{
char word[]="t";
char *test=gencode(word,$1,'/',$3);
$$=test;
}
| '(' E ')'
{
$$=$2;
}
| NUM
{
$$=$1;
}
;
%%
Problem:
getting garbage value in output
Expected output for expression (2+3)*5 should be like:
t1= 2 + 3
t2= t1 * 5
Obtained output:
t1= 2 + 3
t2= garbage value * 5
I'm unable to figure out how to correct this. The variable names (eg t1,t2,t3 ) are being properly returned from gencode() method in lex.l
char *test=gencode(word,$1,'%',$3);
But I'm completely clueless about what is going wrong after that. I believe I'm not handling the $$,$1,$3 terms correctly.
Please help me understand what is going wrong, what needs to be done and how to do it.
A little help and some explanation would be very helpful. Thank you.

The problem here is not in the use of flex or bison; rather, it is undefined behaviour in your C code.
Your gencode function returns its first argument. Then you call it like this, roughly:
{
char word[] = ...
... = gencode(word, ...);
}
The lifetime of word ends when the block finishes, which is right after the call to gencode. In effect, that is no different from the classic dangling pointer generator:
char* dangle(void) {
char temporary[] = "some string";
return temporary;
}
which is obviously incorrect, since the local variable ceases to exist before its address is returned.
In addition, you actually create word as a two-character array:
char word[] = "t";
since leaving out the size tells C to leave exactly enough space for the initial string (one character plus null terminator). That's fine, but you cannot then append more characters to the string (with strcat) because there is no space left and you will end up overwriting some other variable (or worse).
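One way around both problems is to give each temporary name its own heap storage, so it outlives the block that created it and has room for the digits. The following is only a minimal sketch: new_temp and gencode_heap are hypothetical names, not part of the original program, and it assumes the semantic value in the %union is changed from char to char * so that operand names can be passed as strings.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter and helpers, not from the original program. */
static int temp_counter = 1;

/* Returns a freshly malloc'ed name "t1", "t2", ... or NULL on failure.
   The storage stays valid until the caller frees it. */
char *new_temp(void)
{
    char *name = malloc(16);   /* room for 't', the digits of an int, and '\0' */
    if (name == NULL)
        return NULL;
    snprintf(name, 16, "t%d", temp_counter++);
    return name;
}

/* Prints "result = first op second" and returns the heap-allocated
   result name, which can safely be stored in $$. */
char *gencode_heap(const char *first, char op, const char *second)
{
    char *result = new_temp();
    if (result != NULL)
        printf("%s = %s %c %s\n", result, first, op, second);
    return result;
}
```

In a grammar action this would be used roughly as $$ = gencode_heap($1, '+', $3); with the names freed once they are no longer needed.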

mentall n't end even after the function returns. That's why I declared char word[] before calling the function. This ideone.com/RBz0y2 is code I wrote separately and used in here too. Is it not right? – Swagnik Dutta Mar 9 '16 at 16:38
@novice: If the caller allocates on the stack and passes to the called function, that is fine. The caller still has the memory. But as soon as the caller returns to its caller, the memory is gone. You cannot set a persistent variable to the address. So, no, it is not right. If you need to keep the value around for the future, you need to allocate with malloc; stack-allocated storage

Related

Determine types from a variadic function's arguments in C

I'd like a step by step explanation on how to parse the arguments of a variadic function
so that when calling va_arg(ap, TYPE); I pass the correct data TYPE of the argument being passed.
Currently I'm trying to code printf.
I am only looking for an explanation preferably with simple examples but not the solution to printf since I want to solve it myself.
Here are three examples which look like what I am looking for:
https://stackoverflow.com/a/1689228/3206885
https://stackoverflow.com/a/5551632/3206885
https://stackoverflow.com/a/1722238/3206885
I know the basics of what typedef, struct, enum and union do but can't figure out some practical application cases like the examples in the links.
What do they really mean? I can't wrap my brain around how they work.
How can I pass the data type from a union to va_arg like in the links examples? How does it match?
with a modifier like %d, %i ... or the data type of a parameter?
Here's what I've got so far:
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include "my.h"
typedef struct s_flist
{
char c;
(*f)();
} t_flist;
int my_printf(char *format, ...)
{
va_list ap;
int i;
int j;
int result;
int arg_count;
char *cur_arg = format;
char *types;
t_flist flist[] =
{
{ 's', &my_putstr },
{ 'i', &my_put_nbr },
{ 'd', &my_put_nbr }
};
i = 0;
result = 0;
types = (char*)malloc( sizeof(*format) * (my_strlen(format) / 2 + 1) );
fparser(types, format);
arg_count = my_strlen(types);
while (format[i])
{
if (format[i] == '%' && format[i + 1])
{
i++;
if (format[i] == '%')
result += my_putchar(format[i]);
else
{
j = 0;
va_start(ap, format);
while (flist[j].c)
{
if (format[i] == flist[j].c)
result += flist[i].f(va_arg(ap, flist[i].DATA_TYPE??));
j++;
}
}
}
result += my_putchar(format[i]);
i++;
}
va_end(ap);
return (result);
}
char *fparser(char *types, char *str)
{
int i;
int j;
i = 0;
j = 0;
while (str[i])
{
if (str[i] == '%' && str[i + 1] &&
str[i + 1] != '%' && str[i + 1] != ' ')
{
i++;
types[j] = str[i];
j++;
}
i++;
}
types[j] = '\0';
return (types);
}
You can't get actual type information from va_list. You can get what you're looking for from format. What it seems you're not expecting is: none of the arguments know what the actual types are, but format represents the caller's idea of what the types should be. (Perhaps a further hint: what would the actual printf do if a caller gave it format specifiers that didn't match the varargs passed in? Would it notice?)
Your code would have to parse the format string for "%" format specifiers, and use those specifiers to branch into reading the va_list with specific hardcoded types. For example, (pseudocode) if (fspec was "%s") { char* str = va_arg(ap, char*); print out str; }. Not giving more detail because you explicitly said you didn't want a complete solution.
You will never have a type as a piece of runtime data that you can pass to va_arg as a value. The second argument to va_arg must be a literal, hardcoded specification referring to a known type at compile time. (Note that va_arg is a macro that gets expanded at compile time, not a function that gets executed at runtime - you couldn't have a function taking a type as an argument.)
A couple of your links suggest keeping track of types via an enum, but this is only for the benefit of your own code being able to branch based on that information; it is still not something that can be passed to va_arg. You have to have separate pieces of code saying literally va_arg(ap, int) and va_arg(ap, char*) so there's no way to avoid a switch or a chain of ifs.
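To illustrate the branching without touching printf itself, here is a small sketch of a different variadic function. sum_args and its one-letter type codes ('i' for int, 'd' for double) are made up for this example; the point is only that each va_arg call names its type literally and the code string decides which branch runs.

```c
#include <stdarg.h>

/* Hypothetical example, not printf: `types` describes the arguments that
   follow, 'i' for int and 'd' for double. Returns the sum of all of them. */
double sum_args(const char *types, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, types);                 /* start once, before the loop */
    for (; *types != '\0'; types++) {
        switch (*types) {
        case 'i':
            total += va_arg(ap, int);    /* the type is written out literally */
            break;
        case 'd':
            total += va_arg(ap, double);
            break;
        default:
            /* Unknown code: there is no way to know what to read, so stop. */
            va_end(ap);
            return total;
        }
    }
    va_end(ap);
    return total;
}
```

A call such as sum_args("idi", 1, 2.5, 3) reads an int, a double and an int in that order. If the code string does not match what was actually passed, the function cannot notice; the behaviour is simply undefined.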
The solution you want to make, using the unions and structs, would start from something like this:
typedef union {
int i;
char *s;
} PRINTABLE_THING;
int print_integer(PRINTABLE_THING pt) {
// format and print pt.i
}
int print_string(PRINTABLE_THING pt) {
// format and print pt.s
}
The two specialized functions would work fine on their own by taking explicit int or char* params; the reason we make the union is to enable the functions to formally take the same type of parameter, so that they have the same signature, so that we can define a single type that means pointer to that kind of function:
typedef int (*print_printable_thing)(PRINTABLE_THING);
Now your code can have an array of function pointers of type print_printable_thing, or an array of structs that have print_printable_thing as one of the structs' fields:
typedef struct {
char format_char;
print_printable_thing printing_function;
} FORMAT_CHAR_AND_PRINTING_FUNCTION_PAIRING;
FORMAT_CHAR_AND_PRINTING_FUNCTION_PAIRING formatters[] = {
{ 'd', print_integer },
{ 's', print_string }
};
int formatter_count = sizeof(formatters) / sizeof(FORMAT_CHAR_AND_PRINTING_FUNCTION_PAIRING);
(Yes, the names are all intentionally super verbose. You'd probably want shorter ones in the real program, or even anonymous types where appropriate.)
Now you can use that array to select the correct formatter at runtime:
for (int i = 0; i < formatter_count; i++)
if (current_format_char == formatters[i].format_char)
result += formatters[i].printing_function(current_printable_thing);
But the process of getting the correct thing into current_printable_thing is still going to involve branching to get to a va_arg(ap, ...) with the correct hardcoded type. Once you've written it, you may find yourself deciding that you didn't actually need the union nor the array of structs.

Flex/Lex - How to know if a variable was declared

My grammar allows:
C → id := E // assign a value/expression to a variable (VAR)
C → print(id) // print variables(VAR) values
To get it done, my lex file is:
[a-z]{
yylval.var_index=get_var_index(yytext);
return VAR;
}
get_var_index returns the index of the variable in the list, if it does not exist then it creates one.
It is working!
The problem is:
Every time a variable is matched in the lex file, it creates an index for that variable.
I have to report an error if 'print(a)' is called and 'a' was not declared, but that will never happen, since print(a) always creates an index for 'a'.
How can I solve it?
Piece of yacc file:
%union {
int int_val;
int var_index;
}
%token <int_val> INTEGER
%token <var_index> VAR
...
| PRINT '(' VAR ')'{
n_lines++;
printf("%d\n",values[$3]);
}
...
| VAR {$$ =values[$1];}
This does seem a bit like a Computer Science class homework question for us to do.
Normally one would not use bison/yacc in this way. One would do the parse with bison/yacc and make a parse tree, which then gets walked to perform semantic checks, such as checking for declaration before use and so on. The identifiers would normally be managed in a symbol table, rather than just a table of values, so that other attributes, such as whether a variable has been declared, can be tracked. It's for these reasons that it looks like an exercise rather than a realistic application of the tools. OK; those disclaimers disposed of, let's get to an answer.
The problem would be solved by remembering what has been declared and what not. If one does not plan to use a full symbol table then a simple array of booleans indicating which are the valid values could be used. The array can be initialised to false and set to true on declaration. This value can be checked when a variable is used. As C uses ints for boolean we can use that. The only changes needed are in the bison/yacc. You omitted any syntax for the declarations, but as you indicated they are declared there must be some. I guessed.
%union {
int int_val;
int var_index;
}
%{
int declared[MAX_TABLE_SIZE]; /* file-scope array, so it starts out all zero: nothing declared yet */
%}
%token <int_val> INTEGER
%token <var_index> VAR
...
| DECLARE '(' VAR ')' { n_lines++; declared[$3] = 1; }
...
| PRINT '(' VAR ')'{
n_lines++;
if (declared[$3]) printf("%d\n",values[$3]);
else printf("Variable undeclared\n");
}
...
| VAR {$$ =values[$1]; /* perhaps need to show more syntax to show how VAR used */}
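The same bookkeeping can be tried out in plain C, outside the grammar. This is just a sketch under a few assumptions: MAX_TABLE_SIZE is taken to be 26 (one slot per letter matched by [a-z]), set_var and read_var are hypothetical helpers standing in for the parser actions, and an assignment is treated as the declaration.

```c
#define MAX_TABLE_SIZE 26   /* assumption: one slot per letter a-z */

static int values[MAX_TABLE_SIZE];
static int declared[MAX_TABLE_SIZE];   /* static storage starts zeroed: nothing declared */

/* Hypothetical stand-in for the assignment action: stores the value and
   remembers that the variable now exists. */
void set_var(int index, int value)
{
    values[index] = value;
    declared[index] = 1;
}

/* Hypothetical stand-in for the print action. Returns 0 and stores the
   value in *out, or returns -1 if the variable was never assigned. */
int read_var(int index, int *out)
{
    if (!declared[index])
        return -1;
    *out = values[index];
    return 0;
}
```

The lexer can keep creating an index for every identifier it sees; only the flag decides whether a use is an error.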

Strange behavior with my C string reverse function

I'm just an amateur programmer...
And when reading, for the second time, and more than two years apart, kochan's "Programming in Objective-C", now the 6th ed., reaching the pointer chapter i tried to revive the old days when i started programming with C...
So, i tried to program a reverse C string function, using char pointers...
At the end i got the desired result, but... got also a very strange behavior, i cannot explain with my little programming experience...
First the code:
This is a .m file,
#import <Foundation/Foundation.h>
#import "*pathToFolder*/NSPrint.m"
int main(int argc, char const *argv[])
{
@autoreleasepool
{
char * reverseString(char * str);
char *ch;
if (argc < 2)
{
NSPrint(@"No word typed in the command line!");
return 1;
}
NSPrint(@"Reversing arguments:");
for (int i = 1; argv[i]; i++)
{
ch = reverseString(argv[i]);
printf("%s\n", ch);
//NSPrint(@"%s - %s", argv[i], ch);
}
}
return 0;
}
char * reverseString(char * str)
{
int size = 0;
for ( ; *(str + size) != '\0'; size++) ;
//printf("Size: %i\n", size);
char result[size + 1];
int i = 0;
for (size-- ; size >= 0; size--, i++)
{
result[i] = *(str + size);
//printf("%c, %c\n", result[i], *(str + size));
}
result[i] = '\0';
//printf("result location: %lu\n", result);
//printf("%s\n", result);
return result;
}
Second some notes:
This code is compiled in a MacBook Pro, with MAC OS X Maverick, with CLANG (clang -fobjc-arc $file_name -o $file_name_base)
That NSPrint is just a wrapper for printf to print a NSString constructed with stringWithFormat:arguments:
And third the strange behavior:
If I uncomment all those commented printf calls, everything works just fine, i.e., all printf calls print what they have to print, including the last printf inside the main function.
If I uncomment one, and just one, randomly chosen, of those commented printf calls, again everything works just fine, and I get the correct printf results, including the last printf inside the main function.
If I leave all those commented printf calls as they are, I GET ONLY BLANK LINES from the last printf inside the main block, one blank line for each argument passed...
Worst, if I use that NSPrint function inside main, instead of the printf one, I get the desired result :!
Can anyone bring some light here please :)
You're returning a local array, that goes out of scope as the function exits. Dereferencing that memory causes undefined behavior.
You are returning a pointer to a local variable of the function that was called. When that function returns, the memory for the local variable becomes invalid, and the pointer returned is rubbish.
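One common fix is to let the caller own the storage. The sketch below is not the original function: reverse_into is a hypothetical name, and it assumes the caller can supply a buffer at least one byte longer than the input.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for reverseString: writes the reversed copy of
   src into dst, which belongs to the caller and therefore stays valid
   after the function returns. Returns dst, or NULL if dst is too small. */
char *reverse_into(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (len + 1 > dst_size)          /* need room for the '\0' as well */
        return NULL;
    for (size_t i = 0; i < len; i++)
        dst[i] = src[len - 1 - i];
    dst[len] = '\0';
    return dst;
}
```

Allocating the result with malloc inside the function and having the caller free it would work as well; what cannot work is returning the address of an array declared inside the function.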

failed to parse number by yacc and lex

i have finished my lex file and start to learn about yacc
but i have some question about part of my code of lex:
%{
#include "y.tab.h"
int num_lines = 1;
int comment_mode=0;
int stack =0;
%}
digit ([0-9])
integer ({digit}+)
float_num ({digit}+\.{digit}+)
%%
{integer} { //deal with integer
printf("#%d: NUM:",num_lines); ECHO;printf("\n");
yylval.Integer = atoi(yytext);
return INT;
}
{float_num} {// deal with float
printf("#%d: NUM:",num_lines);ECHO;printf("\n");
yylval.Float = atof(yytext);
return FLOAT;
}
\n { ++num_lines; }
. if(strcmp(yytext," "))ECHO;
%%
int yywrap() {
return 1;
}
every time i got an integer or a float i return the token and save it into yylval
and here is my code in parser.y:
%{
#include <stdio.h>
#define YYDEBUG 1
void yyerror (char const *s) {
fprintf (stderr, "%s\n", s);
}
%}
%union{
int Integer;
float Float;
}
%token <int>INT;
%token <float>FLOAT;
%%
statement :
INT {printf("int yacc\n");}
| FLOAT {printf("float yacc\n");}
|
;
%%
int main(int argc, char** argv)
{
yyparse();
return 0;
}
which compiled by
byacc -d parser.y
lex lex.l
gcc lex.yy.c y.tab.c -ll
Since I just want to try something easy to get started, I want to see if I can parse
only int and float numbers first. I print them in both the .l and .y files after I input an
integer or a float. In the beginning I input a first random number, for example 123,
then my program prints:
1: NUM: 123
in yylex() and
"int yacc\n"
in parser.y
But if I input a second number, it shows a syntax error and the program shuts down.
I don't know where the problem is.
Is there any solution?
Your grammar only accepts a single token, either an INT or a FLOAT. So it will only accept a single number, which is why it produces a syntax error when it reads the second number; it is expecting an end-of-file.
The solution is to change the grammar so that it accepts any number of "statements":
program: /* EMPTY */
| program statement
;
Two notes:
1) You don't need an (expensive) strcmp in your lexer. Just do this:
" " /* Do nothing */;
. { return yytext[0]; }
It's better to return the unknown character to the parser, which will produce a syntax error if the character doesn't correspond to any token type (as in your simple grammar) than to just echo the character to stdout, which will prove confusing. Some people would prefer to produce an error message in the lexer for invalid input, but while you are developing a grammar I think it is easier to just pass through the characters, because that lets you add operators to your parser without regenerating the lexer.
2) When you specify %types in bison, you use the tag name from the union, not the C type. Some (but not all) versions of bison let you get away with using the C type if it is a simple type, but you can't count on it; it's not POSIX standard and it may well break if you use an older or newer version of bison. (For example, it won't work with bison 3.0.) So you should write, for example:
%union{
int Integer;
float Float;
}
%token <Integer>INT;
%token <Float>FLOAT;

Objective c, Scanf() string taking in the same value twice

Hi all, I am having a strange issue: when I use scanf to input data, it repeats strings and saves them as one. I am not sure why.
Please Help
/* Assment Label loop - Loops through the assment labels and inputs the percentage and the name for it. */
i = 0;
j = 0;
while (i < totalGradedItems)
{
scanf("%s%d", assLabel[i], &assPercent[i]);
i++;
}
/* Print Statement */
i = 0;
while (i < totalGradedItems)
{
printf("%s", assLabel[i]);
i++;
}
Input Data
Prog1 20
Quiz 20
Prog2 20
Mdtm 15
Final 25
Output Via Console
Prog1QuizQuizProg2MdtmMdtmFinal
Final diagnosis
You don't show your declarations...but you must be allocating just 5 characters for the strings:
When I adjust the enum MAX_ASSESSMENTLEN from 10 to 5 (see the code below) I get the output:
Prog1Quiz 20
Quiz 20
Prog2Mdtm 20
Mdtm 15
Final 25
You did not allow for the terminal null. And you didn't show us what was causing the bug! And the fact that you omitted newlines from the printout obscured the problem.
What's happening is that 'Prog1' is occupying all 5 bytes of the string you read in, and is writing a null at the 6th byte; then Quiz is being read in, starting at the sixth byte.
When printf() goes to read the string for 'Prog1', it stops at the first null, which is the one after the 'z' of 'Quiz', producing the output shown. Repeat for 'Prog2' and 'Mdtm'. If there were an entry after 'Final', it too would suffer. You are lucky that there are enough zero bytes around to prevent any monstrous overruns.
This is a basic buffer overflow (indeed, since the array is on the stack, it is a basic Stack Overflow); you are trying to squeeze 6 characters (Prog1 plus '\0') into a 5 byte space, and it simply does not work well.
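The effect can be reproduced without actually overrunning anything by laying the two 5-byte rows out in one flat buffer. This is a sketch for illustration only; fill_rows, ROWS and ROW_SIZE are made-up names, and the flat buffer stands in for a declaration like char assLabel[2][5].

```c
#include <string.h>

enum { ROWS = 2, ROW_SIZE = 5 };

/* Simulates char label[ROWS][ROW_SIZE] as one flat buffer, so that writing
   past the end of row 0 stays inside `flat` and is well-defined here.
   `flat` must have at least ROWS * ROW_SIZE bytes. */
void fill_rows(char *flat)
{
    /* "Prog1" is 5 letters plus '\0': the '\0' lands in the first byte of row 1. */
    memcpy(flat, "Prog1", 6);
    /* Reading row 1 then overwrites that '\0' with the 'Q' of "Quiz". */
    memcpy(flat + ROW_SIZE, "Quiz", 5);
}
```

After the call, the string starting at row 0 is "Prog1Quiz" and the string starting at row 1 is "Quiz", which is the doubling seen in the output.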
Preliminary diagnosis
First, print newlines after your data.
Second, check that scanf() is not returning errors - it probably isn't, but neither you nor we can tell for sure.
Third, are you sure that the data file contains what you say? Plausibly, it contains a pair of 'Quiz' and a pair of 'Mdtm' lines.
Your variable j is unused, incidentally.
You would probably be better off having the input loop run until you are either out of space in the receiving arrays or you get a read failure. However, the code worked for me when dressed up slightly:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char assLabel[10][10];
int assPercent[10];
int i = 0;
int totalGradedItems = 5;
while (i < totalGradedItems)
{
if (scanf("%9s%d", assLabel[i], &assPercent[i]) != 2)
{
fprintf(stderr, "Error reading\n");
exit(1);
}
i++;
}
/* Print Statement */
i = 0;
while (i < totalGradedItems)
{
printf("%-9s %3d\n", assLabel[i], assPercent[i]);
i++;
}
return 0;
}
For the quoted input data, the output results are:
Prog1 20
Quiz 20
Prog2 20
Mdtm 15
Final 25
I prefer this version, though:
#include <stdio.h>
enum { MAX_GRADES = 10 };
enum { MAX_ASSESSMENTLEN = 10 };
int main(void)
{
char assLabel[MAX_GRADES][MAX_ASSESSMENTLEN];
int assPercent[MAX_GRADES];
int i = 0;
int totalGradedItems;
for (i = 0; i < MAX_GRADES; i++)
{
if (scanf("%9s%d", assLabel[i], &assPercent[i]) != 2)
break;
}
totalGradedItems = i;
for (i = 0; i < totalGradedItems; i++)
printf("%-9s %3d\n", assLabel[i], assPercent[i]);
return 0;
}
Of course, if I'd set up the scanf() format string 'properly' (meaning safely) so as to limit the length of the assessment names to fit into the space allocated, then the loop would stop reading on the second attempt:
...
char format[10];
...
snprintf(format, sizeof(format), "%%%ds%%d", MAX_ASSESSMENTLEN-1);
...
if (scanf(format, assLabel[i], &assPercent[i]) != 2)
With MAX_ASSESSMENTLEN at 5, the snprintf() generates the format string "%4s%d". The code compiled reads:
Prog 1
and stops. The '1' comes from the 5th character of 'Prog1'; the next assessment name is '20', and then the conversion of 'Quiz' into a number fails, causing the input loop to stop (because only one of two expected items was converted).
Despite the nuisance value, if you want to make your scanf() strings adjust to the size of the data variables it is reading into, you have to do something akin to what I did here - format the string using the correct size values.
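Put together as a small function, the idea looks roughly like this. It is a sketch only: parse_item and MAX_LABEL_LEN are hypothetical names, and it uses sscanf on a line of text instead of scanf on standard input so it can be tried in isolation.

```c
#include <stdio.h>

enum { MAX_LABEL_LEN = 10 };   /* assumed buffer size, including the '\0' */

/* Hypothetical helper: parses "label percent" from line. The field width
   in the format is derived from the buffer size, so the label can never
   overflow. Returns the number of items converted (2 on success). */
int parse_item(const char *line, char label[MAX_LABEL_LEN], int *percent)
{
    char format[16];

    /* With MAX_LABEL_LEN at 10 this builds "%9s%d". */
    snprintf(format, sizeof format, "%%%ds%%d", MAX_LABEL_LEN - 1);
    return sscanf(line, format, label, percent);
}
```

A label longer than 9 characters is cut off at the width limit; the leftover characters then fail to convert as a number, so the function returns 1 and the caller can tell the line was bad.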
I guess you need to put a space between %s and %d here:
scanf("%s%d", assLabel[i], &assPercent[i]);
And it is not saving them as one. You need to print a newline, or at least a space, after %s to see the difference.
add:
when i tried
#include <stdio.h>
int main (int argc, const char * argv[])
{
char a[1][2];
for(int i =0;i<3;i++)
scanf("%s",a[i]);
for(int i =0;i<3;i++)
printf("%s",a[i]);
return 0;
}
with inputs
123456
qwerty
sdfgh
output is:
12qwsdfghqwsdfghsdfgh
That shows that the string array needs to be bigger than the size declared there.