I'm writing my own language and my own OS. Now that I've started on the language, I'm getting errors when using Bison and I don't know how to solve them. This is the code in my *.y file:
input:
| input line
;
line: '\n'
| exp '\n' { printf ("\t%.10g\n", $1); }
;
exp: NUM { $$ = $1; }
| exp exp '+' { $$ = $1 + $2; }
| exp exp '-' { $$ = $1 - $2; }
| exp exp '*' { $$ = $1 * $2; }
| exp exp '/' { $$ = $1 / $2; }
/* Exponentiation */
| exp exp '^' { $$ = pow ($1, $2); }
/* Unary minus */
| exp 'n' { $$ = -$1; }
;
%%
When I run Bison on this source, I get this error:
calc.y:1.1-5: syntax error, unexpected identifier:
You need a '%%' before the rules as well as after them (or, strictly, instead; if there is no code after the second '%%', you can omit that line).
You will also need a '%token NUM' before the first '%%'; the grammar then passes Bison.
An alternative is to upgrade to Bison 3.0.4. I guess the file syntax changed between versions 2.x and 3.x.
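For readers who want to check what the grammar computes once Bison accepts it, here is a minimal Python sketch of the same reverse Polish semantics. The name eval_rpn and the whitespace-based tokenization are assumptions made for illustration; neither is part of the Bison file.

```python
def eval_rpn(line):
    """Evaluate one line of reverse Polish input, mirroring the grammar's actions."""
    binary = {
        "+": lambda a, b: a + b,   # exp exp '+'
        "-": lambda a, b: a - b,   # exp exp '-'
        "*": lambda a, b: a * b,   # exp exp '*'
        "/": lambda a, b: a / b,   # exp exp '/'
        "^": lambda a, b: a ** b,  # exp exp '^'
    }
    stack = []
    for tok in line.split():
        if tok in binary:
            if len(stack) < 2:
                raise ValueError("operator needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(binary[tok](left, right))
        elif tok == "n":
            # exp 'n' is the unary minus rule
            if not stack:
                raise ValueError("unary minus needs an operand")
            stack.append(-stack.pop())
        else:
            stack.append(float(tok))  # NUM
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]
```

For example, eval_rpn("4 9 +") gives 13.0, which is the value the `exp '\n'` action would print.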
I'm trying to create a grammar for a programming language in Jison, and have run into a problem with calls. Functions in my language are invoked with the following syntax:
functionName arg1 arg2 arg3
Arguments that aren't simple expressions need to be wrapped in parentheses, like this:
functionName (1 + 2) (3 + 3) (otherFunction 5)
However, there is a bug in my grammar that causes my parser to interpret functionName arg1 arg2 arg3 as functionName(arg1(arg2(arg3))) instead of functionName(arg1, arg2, arg3).
The relevant part of my jison grammar file looks like this:
expr:
| constantExpr { $$ = $1; }
| binaryExpr { $$ = $1; }
| callExpr { $$ = $1; }
| tupleExpr { $$ = $1; }
| parenExpr { $$ = $1; }
| identExpr { $$ = $1; }
| blockExpr { $$ = $1; }
;
callArgs:
| callArgs expr { $$ = $1.concat($2); }
| expr { $$ = [$1]; }
;
callExpr:
| path callArgs { $$ = ast.Expr.Call($1, $2); }
;
identExpr:
| path { $$ = ast.Expr.Ident($1); }
;
How can I make Jison prefer the callArgs rather than the expr?
You might be able to do this by playing games with precedence relations, but I think the most straightforward solution is to be clear.
What you want to say is that callArgs cannot directly contain a callExpr. As in your example, if you want to pass a callExpr as an argument, you need to enclose it in parentheses, in which case it will match some other production (presumably parenExpr).
So you can write that directly:
callArgExpr
: constantExpr
| binaryExpr
| tupleExpr
| parenExpr
| identExpr
| blockExpr
;
expr
: callArgExpr
| callExpr
;
callArgs
: callArgs callArgExpr { $$ = $1.concat($2); }
| callArgExpr { $$ = [$1]; }
;
callExpr
: path callArgs { $$ = ast.Expr.Call($1, $2); }
;
In fact, it's likely that you want to restrict callArgs even further, since (if I understand correctly) func a + b does not mean "apply func to a+b", which would have been written func (a + b). So you might want to also remove binaryExpr from callArgExpr, and possibly some others. I hope the model above shows how to do that.
By the way, I removed all the empty productions, assuming that they were unintentional (unless jison has some exception for that syntax; I'm not really a jison expert). And I removed { $$ = $1; }, which I believe is as unnecessary in jison as in the classic yacc/bison/etc., since it is the default action.
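The effect of keeping callExpr out of the argument list can be tried out with a small hand-written parser. The following Python sketch is only an illustration under simplifying assumptions: it knows just identifiers, integer constants and parentheses, and the names tokenize, parse and the tuple-shaped nodes are hypothetical, not part of the Jison grammar.

```python
import re

def tokenize(src):
    # identifiers, integer constants, and parentheses only (an assumption)
    return re.findall(r"[A-Za-z_]\w*|\d+|[()]", src)

def parse(src):
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def starts_arg(tok):
        return tok is not None and tok != ")"

    def parse_arg():
        # callArgExpr: a constant, a bare identifier, or a parenthesized expr.
        # A call can only appear here when wrapped in parentheses.
        tok = advance()
        if tok == "(":
            inner = parse_expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        if tok.isdigit():
            return ("const", int(tok))
        return ("ident", tok)

    def parse_expr():
        # expr: callExpr | callArgExpr
        tok = peek()
        is_name = tok is not None and (tok[0].isalpha() or tok[0] == "_")
        first = parse_arg()
        if is_name and starts_arg(peek()):
            args = []
            while starts_arg(peek()):
                args.append(parse_arg())  # never parse_expr: no bare nested call
            return ("call", first[1], args)
        return first

    tree = parse_expr()
    if pos != len(tokens):
        raise SyntaxError("unexpected token")
    return tree
```

With this sketch, parse("functionName arg1 arg2 arg3") yields a single call node with three arguments, while parse("f (g 5) 2") nests the inner call only because it is parenthesized.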
I would need to see the other parts of your grammar to give a precise answer, and I am not sure my idea is right. From what I saw in your code, though, you could create a rule explicitly for the arguments, in the order that you want, without nesting one inside the other:
args:
| "(" simple_expression ")" args { /*Do something with $2*/ }
| "\n"
;
I hope this has helped you a little. Greetings.
I am pretty sure I have conflicting YACC rules (specifically the exp exp and group_open exp group_close rules). I am trying to build a simple boolean query syntax that lets people write things like a "b c" -(d or e), which would roughly be equivalent to a AND "b c" AND NOT (d OR e).
However, I am having trouble implementing both the group rule () and the AND rule (basically just spaces).
%lex
%%
\s+ ;
or|OR return 'or';
and|AND return 'and';
\"[^\"]+\" return 'phrase';
"-"\b return 'not';
"(" return 'group_open';
")" return 'group_close';
[^\s,]+ return 'word';
/lex
%token space
%token phrase
%token group_open
%token group_close
%token word
%left or
%left and
%left not
%%
query : exp { return $1; }
;
exp : term
| exp or exp { $$ = $1+" OR "+$3; }
| exp and exp { $$ = $1+" AND "+$3; }
/* this is the one that is causing me issues */
| exp exp { $$ = $1+" AND "+$3; }
| not exp { $$ = "NOT "+$2; }
| group_open exp group_close { $$ = "("+$2+")"; }
;
term : phrase { $$ = "PHRASE"; }
| word { $$ = "WORD"; }
;
Any help would be great.
I am testing my grammar by using jison.org
Below are the errors I am getting
Conflicts encountered:
Resolve S/R conflict (shift by default.)
(1,8, 2,5) -> 1,8
Resolve S/R conflict (shift by default.)
(1,9, 2,5) -> 1,9
Resolve S/R conflict (shift by default.)
(1,6, 2,5) -> 1,6
Resolve S/R conflict (shift by default.)
(1,7, 2,5) -> 1,7
Resolve S/R conflict (shift by default.)
(1,4, 2,5) -> 1,4
Resolve S/R conflict (shift by default.)
(1,5, 2,5) -> 1,5
Resolve S/R conflict (shift by default.)
(1,6, 2,6) -> 1,6
Resolve S/R conflict (shift by default.)
(1,7, 2,6) -> 1,7
Resolve S/R conflict (shift by default.)
(1,5, 2,6) -> 1,5
Resolve S/R conflict (shift by default.)
(1,6, 2,3) -> 1,6
Resolve S/R conflict (shift by default.)
(1,7, 2,3) -> 1,7
Resolve S/R conflict (shift by default.)
(1,5, 2,3) -> 1,5
Resolve S/R conflict (shift by default.)
(1,6, 2,4) -> 1,6
Resolve S/R conflict (shift by default.)
(1,7, 2,4) -> 1,7
Resolve S/R conflict (shift by default.)
(1,5, 2,4) -> 1,5
Using precedence rules to resolve this conflict really requires understanding the details of how LR parsing works, and how yacc precedence levels are used to resolve shift/reduce conflicts.
Ambiguities in an expression grammar like yours manifest as shift/reduce conflicts where the parser does not know whether to reduce a rule for an operator that it has parsed or shift a token that might lead to some higher precedence operation. If the rule that the shift leads to is higher precedence, then it should be shifted, but sometimes it is tough to know what rule the token will lead to.
In your example, having recognized the RHS of some rule that ends with exp, and looking at a lookahead token that can be the start of another exp, it needs to reduce if the rule seen is higher precedence than an exp exp expression, and shift otherwise. So you need to set the precedence of every token that might start an expression as just lower than the precedence of the exp exp rule (assuming you want left associativity), and higher than other lower precedence things:
%left or
%nonassoc phrase word group_open not
%left and
%left UNARY
%%
query : exp { return $1; }
;
exp : term
| exp or exp { $$ = $1+" OR "+$3; }
| exp and exp { $$ = $1+" AND "+$3; }
| exp exp %prec and { $$ = $1+" AND "+$2; }
| not exp %prec UNARY { $$ = "NOT "+$2; }
| group_open exp group_close { $$ = "("+$2+")"; }
;
term : phrase { $$ = "PHRASE"; }
| word { $$ = "WORD"; }
;
Note that not may start an expression, so it needs to have lower precedence than exp exp. We therefore introduce a new fake token UNARY that will never be returned by the lexer; it exists solely to give higher precedence to the not exp rule via the %prec UNARY directive. Also, the exp exp rule needs an explicit %prec directive to give it a precedence level (by default rules get the precedence of the first token on the RHS, but exp exp has no tokens on the RHS).
The above rules make the precedence of exp exp and exp and exp the same and left associative. That means 'a b and c' will be parsed as '(a b) and c', and 'a and b c' will be parsed as '(a and b) c'. If you instead want exp exp to be higher precedence than exp and exp, you need to create another fake token with higher precedence than and and use that for the precedence of exp exp, moving the %nonassoc up to be just below that as well.
Alternately, you can avoid yacc precedence rules altogether, and instead rewrite your grammar with multiple exp rules, one for each precedence level:
query : exp1 { return $1; } ;
exp1 : exp1 or exp2 { $$ = $1+" OR "+$3; }
| exp2 ;
exp2 : exp2 and exp3 { $$ = $1+" AND "+$3; }
| exp2 exp3 { $$ = $1+" AND "+$2; }
| exp3 ;
exp3 : not exp3 { $$ = "NOT "+$2; }
| group_open exp1 group_close { $$ = "("+$2+")"; }
| phrase { $$ = "PHRASE"; }
| word { $$ = "WORD"; } ;
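The layered exp1/exp2/exp3 structure maps directly onto a recursive-descent parser, which makes the precedence levels easy to trace by hand. Below is a rough Python sketch of that idea, not a Jison translation. The names TOKEN_SPEC, tokenize and to_boolean are hypothetical, and the word pattern is an assumption: it additionally excludes parentheses and quotes so that a word cannot swallow a closing parenthesis.

```python
import re

TOKEN_SPEC = [
    ("skip", r"\s+"),
    ("or", r"(?:or|OR)\b"),
    ("and", r"(?:and|AND)\b"),
    ("phrase", r'"[^"]+"'),
    ("not", r"-"),
    ("open", r"\("),
    ("close", r"\)"),
    ("word", r'[^\s,()"]+'),   # assumption: also stops at parentheses and quotes
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(query):
    # Only the token kinds are kept, like the grammar actions above.
    return [m.lastgroup for m in TOKEN_RE.finditer(query) if m.lastgroup != "skip"]

def to_boolean(query):
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, got {peek()}")
        pos += 1

    def exp1():   # exp1 : exp1 or exp2 | exp2
        left = exp2()
        while peek() == "or":
            take("or")
            left = left + " OR " + exp2()
        return left

    def exp2():   # exp2 : exp2 and exp3 | exp2 exp3 | exp3
        left = exp3()
        while peek() in ("and", "not", "open", "phrase", "word"):
            if peek() == "and":
                take("and")            # explicit AND; otherwise juxtaposition
            left = left + " AND " + exp3()
        return left

    def exp3():   # exp3 : not exp3 | group_open exp1 group_close | phrase | word
        kind = peek()
        if kind == "not":
            take("not")
            return "NOT " + exp3()
        if kind == "open":
            take("open")
            inner = exp1()
            take("close")
            return "(" + inner + ")"
        if kind == "phrase":
            take("phrase")
            return "PHRASE"
        take("word")
        return "WORD"

    result = exp1()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()}")
    return result
```

Calling to_boolean('a "b c" -(d or e)') returns "WORD AND PHRASE AND NOT (WORD OR WORD)", which matches the reading a AND "b c" AND NOT (d OR e) from the question.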
I'm trying to get: (20 + (-3)) * 3 / (20 / 3) / 2 to equal 4. Right now it equals 17.
So basically it's doing (20/3) then dividing that by 2, then dividing 3 by [(20/3)/2], then multiplying that by 17. Not sure how to alter my grammar/rules/precedences to get it to read correctly. Any guidance would be appreciated, thanks.
%%
start: PROGRAMtoken IDtoken IStoken compoundstatement
compoundstatement: BEGINtoken {print_header();} statement semi ENDtoken {print_end();}
semi: SEMItoken statement semi
|
statement: IDtoken EQtoken exp
{ regs[$1] = $3; }
| PRINTtoken exp
{ cout << $2 << endl; }
| declaration
declaration: VARtoken IDtoken comma
comma: COMMAtoken IDtoken comma
|
exp: exp PLUStoken term
{ $$ = $1 + $3; }
| exp MINUStoken term
{ $$ = $1 - $3; }
| term
{ $$ = $1; }
| MINUStoken term
{ $$ = -$2;}
term: factor
{ $$ = $1;
}
| factor TIMEStoken term
{$$ = $1 * $3;
}
| factor DIVIDEtoken term
{ $$ = $1 / $3;
}
factor: ICONSTtoken
{ $$ = $1;}
| IDtoken
{ $$ = regs[$1]; }
| LPARENtoken exp RPARENtoken
{ $$ = $2;}
%%
My tokens and types look like:
%token BEGINtoken
%token COMMAtoken
%left DIVIDEtoken
%left TIMEStoken
%token ENDtoken
%token EOFtoken
%token EQtoken
%token <value> ICONSTtoken
%token <value> IDtoken
%token IStoken
%token LPARENtoken
%left PLUStoken MINUStoken
%token PRINTtoken
%token PROGRAMtoken
%token RPARENtoken
%token SEMItoken
%token VARtoken
%type <value> exp
%type <value> term
%type <value> factor
You really wanted someone to work hard to give you an answer, which is why the question has hung around for a year. Have a read of this help page: https://stackoverflow.com/help/how-to-ask, in particular the part about simplifying the problem. There are lots of rules in your grammar file that were not needed to reproduce the problem. We did not need:
%token BEGINtoken
%token COMMAtoken
%token ENDtoken
%token EOFtoken
%token EQtoken
%token <value> IDtoken
%token IStoken
%token PROGRAMtoken
%token VARtoken
%%
start: PROGRAMtoken IDtoken IStoken compoundstatement
compoundstatement: BEGINtoken {print_header();} statement semi ENDtoken {print_end();}
semi: SEMItoken statement semi
|
| declaration
declaration: VARtoken IDtoken comma
comma: COMMAtoken IDtoken comma
|
You could have just removed these tokens and rules to get to the heart of the operator precedence issue. We did not need any variables, declarations, assignments or program structure to illustrate the failure. Learning to simplify is the heart of competent debugging and thus programming. If you'd done this simplification more people would have had a go at answering. I'm saying this not for the OP, but for those that will follow with similar problems!
I'm wondering what school is setting this assignment, as I've seen a fair number of yacc questions on SO around the same dumb problem. I suspect more will come here every year, so answering this will help them. I knew what the issue was on inspection of the grammar, but to test my solution I had to code up a working lexer, some symbol table routines, a main program and other ancillary code. Again, another deterrent for problem solvers.
Let's get to the heart of the problem. You have these token declarations:
%left DIVIDEtoken
%left TIMEStoken
%left PLUStoken MINUStoken
These tell yacc that, if any rules are ambiguous, the operators associate to the left. Your rules for these operators are:
exp: exp PLUStoken term
{ $$ = $1 + $3; }
| exp MINUStoken term
{ $$ = $1 - $3; }
| term
{ $$ = $1; }
| MINUStoken term
{ $$ = -$2;}
term: factor
{ $$ = $1;
}
| factor TIMEStoken term
{$$ = $1 * $3;
}
| factor DIVIDEtoken term
{ $$ = $1 / $3;
}
However, these rules are not ambiguous, so the operator precedence declarations are never consulted. Yacc will follow the non-ambiguous grammar you have written. The term rules recurse on the right (factor TIMEStoken term), which tells yacc that multiplication and division are right-associative, the opposite of what you want. It could clearly be seen from the simple arithmetic in your example that these operators were being calculated in a right-associative way, and you wanted the opposite. There were really big clues there, weren't there?
OK. How to change the associativity? One way would be to make the grammar ambiguous again so that the %left declaration is used, or just flip the rules around to invert the associativity. That's what I did:
exp: exp PLUStoken term
{ $$ = $1 + $3; }
| exp MINUStoken term
{ $$ = $1 - $3; }
| term
{ $$ = $1; }
| MINUStoken term
{ $$ = -$2;}
term: factor
{ $$ = $1;
}
| term TIMEStoken factor
{$$ = $1 * $3;
}
| term DIVIDEtoken factor
{ $$ = $1 / $3;
}
Do you see what I did there? I rotated the grammar rule around the operator.
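The difference between the two rule shapes can be checked on the numbers from the question. This is a small Python sketch under two assumptions: the values are integers (which is what the results 4 and 17 suggest), and the parenthesized parts have already been reduced, so the term is the chain 17 * 3 / 6 / 2. The function names are made up for the illustration.

```python
def apply_op(op, a, b):
    # '//' matches C integer division here because all operands are positive.
    return a * b if op == "*" else a // b

def eval_right(factors, ops):
    """term: factor | factor OP term   (right recursion, as in the question)."""
    if not ops:
        return factors[0]
    rest = eval_right(factors[1:], ops[1:])
    return apply_op(ops[0], factors[0], rest)

def eval_left(factors, ops):
    """term: factor | term OP factor   (left recursion, as in the fix)."""
    value = factors[0]
    for op, factor in zip(ops, factors[1:]):
        value = apply_op(op, value, factor)
    return value

# (20 + (-3)) * 3 / (20 / 3) / 2  with  (20 + (-3)) = 17  and  (20 / 3) = 6
factors = [17, 3, 6, 2]
ops = ["*", "/", "/"]
```

Here eval_right(factors, ops) computes 17 * (3 / (6 / 2)) and gives 17, while eval_left(factors, ops) computes ((17 * 3) / 6) / 2 and gives 4.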
Now for some more disclaimers. I said this is a dumb exercise. Interpreting expressions on the fly is a poor use of the yacc tool, and not what happens in real compilers or interpreters. In a realistic implementation, a parse tree would be built and the value calculations would be performed during the tree walk. This would then enable the issues of undeclared variables to be resolved (which also occurs in this exercise). The use of the regs array to store values is also dumb, because there is clearly an ancillary symbol table in use to return a unique integer ID for the symbols. In a real compiler/interpreter those values would also be stored in that symbol table.
I hope this tutorial has helped further students understand these parsing issues in their classwork.
I wrote a PHP5 parser in ANTLR 3.4, which is almost ready, but I cannot handle one of the tricky features of PHP. My problem is with the precedence of the assignment operator. As the PHP manual says, the precedence of assignment is almost at the end of the list. Only and, xor, or and , come after it in the list.
But there is a note on this manual page which says:
Although = has a lower precedence than most other operators, PHP will
still allow expressions similar to the following: if (!$a = foo()), in
which case the return value of foo() is put into $a.
The small example in the note isn't a problem for my parser; I can handle it as a special case in the assignment rule.
But there is more complex code, e.g.:
if ($a && $b = func()) {}
My parser fails here, because it first recognizes $a && $b and cannot deal with the rest of the condition. This is because && has higher precedence than =.
If I put brackets around the right side of &&:
if ($a && ($b = func())) {}
Then the parser recognizes the structure correctly.
The operators are built the way the ANTLR book recommends: the base expressions come first, and each level of operators follows the previous one.
Is there any way to handle this precedence jumping?
Don't look at it as an assignment, but let's name it an assignment expression. Put this assignment expression "below" the unary expressions (so they have a higher precedence than the unary ones):
grammar T;
options {
output=AST;
}
tokens {
BLOCK;
FUNC_CALL;
EXPR_LIST;
}
parse
: stat* EOF!
;
stat
: assignment ';'!
| if_stat
;
assignment
: Var '='^ expr
;
if_stat
: If '(' expr ')' block -> ^(If expr block)
;
block
: '{' stat* '}' -> ^(BLOCK stat*)
;
expr
: or_expr
;
or_expr
: and_expr ('||'^ and_expr)*
;
and_expr
: unary_expr ('&&'^ unary_expr)*
;
unary_expr
: '!'^ assign_expr
| '-'^ assign_expr
| assign_expr
;
assign_expr
: Var ('='^ atom)*
| atom
;
atom
: Num
| func_call
;
func_call
: Id '(' expr_list ')' -> ^(FUNC_CALL Id expr_list)
;
expr_list
: (expr (',' expr)*)? -> ^(EXPR_LIST expr*)
;
If : 'if';
Num : '0'..'9'+;
Var : '$' Id;
Id : ('a'..'z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
If you'd now parse the source:
if (!$a = foo()) { $a = 1 && 2; }
if ($a && $b = func()) { $b = 2 && 3; }
if ($a = baz() && $b) { $c = 3 && 4; }
the following AST would get constructed (the image is not preserved here):
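A rough Python sketch of the same layering (expr, and_expr, unary_expr, assign_expr, atom) shows how the conditions group. It is not ANTLR output: the nested tuples, the name parse_condition and the token pattern are assumptions for illustration, and only the expression inside the if is parsed.

```python
import re

TOKEN_RE = re.compile(r"\$[a-z]+|[a-z]+|\d+|&&|\|\||[!=(),-]")

def parse_condition(src):
    tokens = TOKEN_RE.findall(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def expect(tok):
        if advance() != tok:
            raise SyntaxError(f"expected {tok!r}")

    def or_expr():       # or_expr : and_expr ('||' and_expr)*
        node = and_expr()
        while peek() == "||":
            advance()
            node = ("||", node, and_expr())
        return node

    def and_expr():      # and_expr : unary_expr ('&&' unary_expr)*
        node = unary_expr()
        while peek() == "&&":
            advance()
            node = ("&&", node, unary_expr())
        return node

    def unary_expr():    # unary_expr : '!' assign_expr | '-' assign_expr | assign_expr
        if peek() in ("!", "-"):
            return (advance(), assign_expr())
        return assign_expr()

    def assign_expr():   # assign_expr : Var ('=' atom)* | atom
        tok = peek()
        if tok is not None and tok.startswith("$"):
            node = advance()
            while peek() == "=":
                advance()
                node = ("=", node, atom())
            return node
        return atom()

    def atom():          # atom : Num | func_call
        tok = advance()
        if tok.isdigit():
            return int(tok)
        if not tok.isalpha():
            raise SyntaxError(f"unexpected {tok!r}")
        expect("(")
        args = []
        if peek() != ")":
            args.append(or_expr())
            while peek() == ",":
                advance()
                args.append(or_expr())
        expect(")")
        return ("call", tok, args)

    tree = or_expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

In this sketch, parse_condition("$a && $b = func()") puts the assignment under the right-hand side of &&, because assign_expr sits below and_expr in the rule chain.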
Say I have a grammar like this:
expr : expr '+' expr { $$ = operation('+', $1, $3); }
| expr '-' expr { $$ = operation('-', $1, $3); }
| expr '*' expr { $$ = operation('*', $1, $3); }
| expr '/' expr { $$ = operation('/', $1, $3); }
| num
;
Where each of those operators has a precedence attached and is marked as left associative.
Then I want to refactor my grammar such that:
op : '+' | '-' | '*' | '/' ;
expr : expr op expr { $$ = operation($2, $1, $3); }
| num
;
How does yacc (if even at all) determine the associativity and precedence of op in this case? Will it trace its way through all the possible precedences/associativities of +, -, * and / when evaluating op, or does defining an associativity for nonterminal symbols make no sense?
AFAIK, with precedence order for nonterminals, it uses the precedence of the rightmost terminal symbol, but I can't find any documentation on the associativity rules themselves for nonterminals.
The "normal" way to do this (as far as I'm aware) is to define a different expr type for each operator, that way you get very explicit control over what's happening.
Python's grammar is a good example of this: http://docs.python.org/reference/grammar.html.
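As a sketch of that approach, here is one rule per precedence level written as a small Python recursive-descent parser. It assumes only integer literals and the four operators from the question, and the names parse, expr, term and num are chosen for the illustration. Associativity comes from the shape of each loop rather than from any declaration.

```python
import re

def parse(src):
    tokens = re.findall(r"\d+|[-+*/]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def num():
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return int(advance())

    def term():   # term : num (('*' | '/') num)*   binds tighter
        node = num()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, num())    # folding to the left: left-associative
        return node

    def expr():   # expr : term (('+' | '-') term)*   binds looser
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

For instance, parse("2 + 3 * 4") gives ("+", 2, ("*", 3, 4)), and parse("10 - 3 - 2") gives ("-", ("-", 10, 3), 2). Operators that share a level can be grouped in one rule, much like the op nonterminal in the question, as long as each level keeps its own rule.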