Consider the following BNF grammar (where non-terminals are enclosed in angle brackets and <identifier> matches any legal Java variable identifier).
<exp> ::= <exp> + <term>
| <exp> - <term>
| <term>
<term> ::= <term> * <factor>
| <term> / <factor>
| <factor>
<factor> ::= ( <exp> )
| <identifier>
Produce a derivation tree for the following expression:
(x - a) * (y + b)
Starting with exp:
<exp>
replace exp with term:
<term>
replace term with:
<term> * <factor>
replace term with factor:
<factor> * <factor>
replace both factors with (exp):
( <exp> ) * ( <exp> )
replace the first exp with exp - term and the second with exp + term
( <exp> - <term> ) * ( <exp> + <term> )
replace both exp's with term, and then replace all 4 terms with factors.
( <factor> - <factor> ) * ( <factor> + <factor> )
replace all factors with identifiers
( <identifier> - <identifier> ) * ( <identifier> + <identifier> )
Does this suffice?
You need to go one step further - <factor> is a nonterminal, and you should reduce it down to <identifier>.
Additionally, you should start from <exp> (and then reduce it to <term>) rather than starting from <term> directly.
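The full derivation can be replayed mechanically. The following is a minimal Python sketch, not part of the original answer: the names GRAMMAR, STEPS, is_production and derive are made up for illustration, and Python's str.isidentifier() is assumed to be a close enough stand-in for "legal Java identifier". Each step rewrites the leftmost occurrence of a non-terminal and is checked against the grammar.

```python
# Productions of the grammar from the question, keyed by non-terminal.
GRAMMAR = {
    "<exp>": ["<exp> + <term>", "<exp> - <term>", "<term>"],
    "<term>": ["<term> * <factor>", "<term> / <factor>", "<factor>"],
    "<factor>": ["( <exp> )", "<identifier>"],
}

# Leftmost derivation of ( x - a ) * ( y + b ), one production per step.
STEPS = [
    ("<exp>", "<term>"),
    ("<term>", "<term> * <factor>"),
    ("<term>", "<factor>"),
    ("<factor>", "( <exp> )"),
    ("<exp>", "<exp> - <term>"),
    ("<exp>", "<term>"),
    ("<term>", "<factor>"),
    ("<factor>", "<identifier>"),
    ("<identifier>", "x"),
    ("<term>", "<factor>"),
    ("<factor>", "<identifier>"),
    ("<identifier>", "a"),
    ("<factor>", "( <exp> )"),
    ("<exp>", "<exp> + <term>"),
    ("<exp>", "<term>"),
    ("<term>", "<factor>"),
    ("<factor>", "<identifier>"),
    ("<identifier>", "y"),
    ("<term>", "<factor>"),
    ("<factor>", "<identifier>"),
    ("<identifier>", "b"),
]


def is_production(lhs, rhs):
    """Return True if lhs -> rhs is allowed by the grammar."""
    if lhs == "<identifier>":
        # Assumption: Python identifiers approximate Java identifiers.
        return rhs.isidentifier()
    return rhs in GRAMMAR.get(lhs, [])


def derive(start, steps):
    """Apply each step to the leftmost occurrence of its non-terminal."""
    forms = [start]
    for lhs, rhs in steps:
        if not is_production(lhs, rhs):
            raise ValueError(f"{lhs} -> {rhs} is not a production")
        current = forms[-1]
        if lhs not in current:
            raise ValueError(f"{lhs} does not occur in {current!r}")
        forms.append(current.replace(lhs, rhs, 1))
    return forms


forms = derive("<exp>", STEPS)
for form in forms:
    print(form)
```

Every printed line is one sentential form, starting at <exp> and ending with only terminals.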
I'm trying to parse a Haskell-like language using ANTLR4 and I'm stuck with lambdas. In Haskell, lambdas can be mixed with operators. So, given the operators >>= and +, and the lambda syntax '\\' args* '->' expr, the following expression is valid:
a >>= \a -> b >>= \b -> Just(a + b)
and it should be parsed into the following AST:
  >>=
 /   \
a     ->
     /  \
    a    >>=
        /   \
       b     ->
            /  \
           b    Just
                 |
                 +
                / \
               a   b
So I can think of two ways of structuring grammar for this kind of syntax.
The first is to put the lambda expression into the top expression rule, along with ifs and other syntax constructs:
grammar Test;
root
: expr0 EOF
;
expr0
: '\\' ID '->' expr0
| expr1
;
expr1
: expr2 ('>>=' expr2)*
;
expr2
: expr3 ('+' expr3)*
;
expr3
: '(' expr0 ')'
| ID ('(' expr0 ')')?
;
This grammar cannot parse the above expression. It is required to add parens around lambda: a >>= (\a -> b >>= (\b -> Just(a + b))). While I understand why parens are required, this behaviour is pretty inconvenient.
The second approach would be to put the lambda into the last expression rule, along with literals and nested expressions:
grammar Test;
root
: expr0 EOF
;
expr0
: expr1 ('>>=' expr1)*
;
expr1
: expr2 ('+' expr2)*
;
expr2
: '(' expr0 ')'
| ID ('(' expr0 ')')?
| '\\' ID '->' expr0
;
This grammar accepts my expression, however, it contains ambiguity because a >>= \a -> b >>= \b -> Just(a + b) can be parsed either as a >>= \a -> (b >>= \b) -> Just(a + b) or as a >>= \a -> (b >>= \b -> Just(a + b)).
So my question is, how to implement this kind of grammar properly?
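One common way to settle this kind of ambiguity is to keep the lambda in the innermost rule and let its body extend as far to the right as possible. The following is a hand-written recursive-descent sketch in Python that mirrors the second grammar under that assumption; it is an illustration of the idea, not ANTLR output, and the tuple-based AST shape and the name parse are made up here.

```python
import re

# Tokens: >>=, ->, identifiers, backslash, +, parentheses.
# Whitespace and any other characters are silently skipped.
TOKEN = re.compile(r">>=|->|[A-Za-z_]\w*|[\\+()]")


def parse(text):
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def expect(expected):
        tok = advance()
        if tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")

    def is_id(tok):
        return tok is not None and re.fullmatch(r"[A-Za-z_]\w*", tok) is not None

    def expr0():  # expr1 ('>>=' expr1)*
        node = expr1()
        while peek() == ">>=":
            advance()
            node = (">>=", node, expr1())
        return node

    def expr1():  # expr2 ('+' expr2)*
        node = expr2()
        while peek() == "+":
            advance()
            node = ("+", node, expr2())
        return node

    def expr2():  # '(' expr0 ')' | ID ('(' expr0 ')')? | '\\' ID '->' expr0
        tok = advance()
        if tok == "(":
            node = expr0()
            expect(")")
            return node
        if tok == "\\":
            param = advance()
            if not is_id(param):
                raise SyntaxError(f"expected parameter name, got {param!r}")
            expect("->")
            # The body is a full expr0, so it consumes everything to its right.
            return ("lambda", param, expr0())
        if is_id(tok):
            if peek() == "(":
                advance()
                arg = expr0()
                expect(")")
                return ("call", tok, arg)
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr0()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse(r"a >>= \a -> b >>= \b -> Just(a + b)"))
```

Because the lambda body is parsed greedily, the enclosing loops never see the operators that follow the arrow, which yields the nesting shown in the AST above.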
I have the following EBNF expression grammar:
<expr> -> <term> { (+|-) <term> }
<term> -> <factor> { (*|/|%) <factor> }
<factor> -> <pow> { ** <pow> }
<pow> -> ( <expr> ) | <id>
<id> -> A | B | C
I need to determine whether the grammar enforces any particular associativity for its operators, or whether that would have to be implemented in the parser code. From what I have read so far, it doesn't look like it does, but I am having a hard time understanding what causes associativity. Any help would be greatly appreciated!
The standard transformation which converts an expression grammar into a form that can be parsed with a top-down (LL) parser has already removed associativity information, because an LL grammar cannot cope with left-associative operators. In effect, the parse tree induced by an LL grammar makes all binary operators right-associative. However, you can generally re-associate the operators without too much trouble in a semantic action.
That's why the multiplication and exponentiation operators seem to have analogous grammar productions, although normally exponentiation would be right-associative while the other binary operators are left-associative.
In an LR grammar, this would be evident:
<expr> -> <term> | <expr> + <term> | <expr> - <term>
<term> -> <factor> | <term> * <factor> | <term> / <factor> | <term> % <factor>
<factor> -> <pow> | <pow> ** <factor>
<pow> -> ( <expr> ) | <id>
<id> -> A | B | C
In the above grammar, an operator is left-associative if the production is left-recursive (because the operator can only occur as part of the non-terminal on the left of the operator). Similarly, the right associative operator has a right-recursive rule, for the same reason.
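To make the difference concrete, here is a small recursive-descent sketch in Python for the grammar above. It is only an illustration: the class and function names are made up, trees are plain tuples, and characters the tokenizer does not recognise are silently skipped. The loops for + - * / % fold their operands to the left (the "re-association in a semantic action" mentioned earlier), while ** recurses on its right operand.

```python
import re

# Tokens: **, single-character operators, parentheses, and the ids A, B, C.
TOKEN = re.compile(r"\*\*|[-+*/%()]|[ABC]")


class Parser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # <expr> -> <term> { (+|-) <term> }, folded left-associatively
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # <term> -> <factor> { (*|/|%) <factor> }, folded left-associatively
        node = self.factor()
        while self.peek() in ("*", "/", "%"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # <factor> -> <pow> | <pow> ** <factor>, right-recursive
        left = self.pow()
        if self.peek() == "**":
            self.advance()
            return ("**", left, self.factor())
        return left

    def pow(self):
        # <pow> -> ( <expr> ) | <id>
        tok = self.advance()
        if tok == "(":
            node = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("A", "B", "C"):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("A - B - C"))    # ('-', ('-', 'A', 'B'), 'C')
print(parse("A ** B ** C"))  # ('**', 'A', ('**', 'B', 'C'))
```

The EBNF repetition itself says nothing about nesting; it is the way the loop or the recursion builds the tree that decides the associativity.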
I'm trying to write a grammar that supports function calls without using parentheses:
f x, y
As in Haskell, I'd like function calls to minimally slurp up their parameters. That is, I want
g 5 + 3
to mean
(g 5) + 3
instead of
g (5 + 3)
Unfortunately, I'm getting the second parse with this grammar:
grammar Parameters;
expr
: '(' expr ')'
| expr MULTIPLICATIVE_OPERATOR expr
| expr ADDITIVE_OPERATOR expr
| ID (expr (',' expr)*?)??
| INT
;
MULTIPLICATIVE_OPERATOR: [*/%];
ADDITIVE_OPERATOR: '+';
ID: [a-z]+;
INT: '-'? [0-9]+;
WHITESPACE: [ \t\n\r]+ -> skip;
The parse tree I'm getting is this:
I had thought that the subrule listed first would get attempted first. In this case, expr ADDITIVE_OPERATOR expr appears before the ID subrule, so why is the ID subrule taking higher precedence?
In this case ANTLR does not perform the correct rule transformation (to eliminate left recursion and to handle precedences). It generates:
expr
: expr_1[0]
;
expr_1[int p]
: ('(' expr_1[0] ')' | INT | ID (expr_1[0] (',' expr_1[0])*?)??)
( {4 >= $p}? MULTIPLICATIVE_OPERATOR expr_1[5]
| {3 >= $p}? ADDITIVE_OPERATOR expr_1[4]
)*
;
leading to (expr (expr_1 a (expr_1 5 + (expr_1 3))))
The correct transformation would be:
expr
: expr_1[0]
;
expr_1[int p]
: ('(' expr_1[0] ')' | INT | ID (expr_1[5] (',' expr_1[5])*?)??)
( {4 >= $p}? MULTIPLICATIVE_OPERATOR expr_1[5]
| {3 >= $p}? ADDITIVE_OPERATOR expr_1[4]
)*
;
leading to (expr (expr_1 a (expr_1 5) + (expr_1 3)))
I am not certain if this is a bug in ANTLR4 or a trade-off of the transformation algorithm. Perhaps one should write an issue to the ANTLR4 jira.
To solve your problem you can simply put the correctly transformed grammar into your code and it should work. The explanation of rule transformation is found in "The Definitive ANTLR4 Reference" on pages 249ff (and perhaps somewhere on the web).
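The effect of the argument precedence can be reproduced outside ANTLR. The following is a hedged precedence-climbing sketch in Python that imitates the shape of expr_1[p]; it is not generated ANTLR code. The names parse and arg_prec are made up, negative integer literals are left out, and the optional argument list is matched greedily rather than with the non-greedy ?? and *? of the grammar.

```python
import re

# Binding strength of each binary operator, as in the transformed rule:
# multiplicative operators use 4, '+' uses 3.
BINARY = {"*": 4, "/": 4, "%": 4, "+": 3}


def parse(text, arg_prec):
    """Parse text; arg_prec is the minimum precedence used for call arguments."""
    tokens = re.findall(r"\d+|[a-z]+|[+*/%(),]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def starts_primary():
        tok = peek()
        return tok is not None and (tok == "(" or tok.isdigit() or tok.isalpha())

    def primary():
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            node = expr(0)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return int(tok)
        if tok.isalpha():
            args = []
            if starts_primary():
                # Arguments are parsed with arg_prec instead of 0.
                args.append(expr(arg_prec))
                while peek() == ",":
                    advance()
                    args.append(expr(arg_prec))
            return ("call", tok, args) if args else tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def expr(p):
        left = primary()
        while True:
            op = peek()
            if op in BINARY and BINARY[op] >= p:
                advance()
                # The right operand must bind tighter than the operator itself.
                left = (op, left, expr(BINARY[op] + 1))
            else:
                return left

    tree = expr(0)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("g 5 + 3", 0))  # argument swallows '+ 3'
print(parse("g 5 + 3", 5))  # argument stops before '+'
```

With arg_prec set to 0 the argument absorbs the addition, which corresponds to the first transformation; with 5 the call takes only the 5 and the addition applies to the call result.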
I am translating a grammar from LALR to ANTLR and I am having trouble with translating this one rule, piecewise expression.
Attached is the sample grammar:
grammar Test;
options {
language = Java;
output = AST;
}
parse : expression ';'
;
expression : binaryExpression
| piecesExpression
;
binaryExpression : addingExpression (('=='|'!='|'<='|'>='|'>'|'<') addingExpression)*
;
addingExpression : multiplyingExpression (('+'|'-') multiplyingExpression)*
;
multiplyingExpression : unaryExpression
(('*'|'/') unaryExpression)*
;
unaryExpression: ('!'|'-')* primitiveElement;
primitiveElement : literalExpression
| id
| '(' expression ')'
;
literalExpression : INT
;
id : IDENTIFIER
;
piecesExpression : 'piecewise' '{' piece expression '}' ('(' expression ',' expression ')')? expression?
;
piece : expression '->' expression ';' (expression '->' expression ';')*
;
// L E X I C A L R U L E S
INT : DIGITS ;
IDENTIFIER : LETTER (LETTER | DIGIT)*;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment LETTER : ('a'..'z' | 'A'..'Z' | '_') ;
fragment DIGITS: DIGIT+;
fragment DIGIT : '0'..'9';
ANTLR v3.5 is complaining about the piecesExpression rule. It reports 2 fatal errors, and I would rather not use the backtrack option.
Expected results:
piecewise {t -> s; t -> x; 100}
piecewise {t -> s; t -> x; 100} (0, x+1)
piecewise {t -> s; t -> x; 100} (0, x+1) y+5
How can piecesExpression be able to capture the above results?
Thanks in advance!
ANTLR has problems determining which alternatives to take in (at least) 2 cases:
piece starts with an expression, but inside the piecewise{...} it should also end with an expression
piecesExpression ends with '(' expression ... but also has an optional trailing expression (and a primitiveElement in turn also matches '(' expression ...)
There's no need to use global backtracking, but without rewriting many rules, you do need to add some predicates (the (...)=> in the example below) to fix the two issues outlined above.
Try this:
piecesExpression
: 'piecewise' '{' ((expression '->')=> piece)+ expression '}'
( ('(' expression ',')=> '(' expression ',' expression ')' expression?
| expression
)?
;
piece
: expression '->' expression ';'
;
Given that I have the following grammar, how would I add a rule to match something like 2^3 to create a power operator?
negation : '!'* term ;
unary : ('+'!|'-'^)* negation ;
mult : unary (('*' | '/' | ('%'|'mod') ) unary)* ;
add : mult (('+' | '-') mult)* ;
relation : add (('=' | '!=' | '<' | '<=' | '>=' | '>') add)* ;
expression : relation (('&&' | '||') relation)* ;
// LEXER ================================================================
HEX_NUMBER : '0x' HEX_DIGIT+;
fragment
FLOAT: ;
INTEGER : DIGIT+ ({input.LA(1)=='.' && input.LA(2)>='0' && input.LA(2)<='9'}?=> '.' DIGIT+ {$type=FLOAT;})? ;
fragment
HEX_DIGIT : (DIGIT|'a'..'f'|'A'..'F') ;
fragment
DIGIT : ('0'..'9') ;
What I have tried:
I tried something like power : ('+' | '-') unary '^' unary but that doesn't seem to work.
I also tried mult : unary (('*' | '/' | ('%'|'mod') | '^' ) unary)* ; but that doesn't work either.
To give ^ higher precedence than negation, do this:
pow : term ('^' term)* ;
negation : '!' negation | pow ;
unary : ('+'! | '-'^)* negation ;
If you want the grammar itself to express the right-associativity of ^, you can also use recursion:
pow : term ('^'^ pow)?
;
negation : '!'* pow;
...
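As a rough cross-check of that layering, here is a small Python evaluator sketch that follows the same rule order (unary, then negation, then pow). It is not derived from the ANTLR grammar: the term rule is not shown in the question, so term is assumed here to be an integer literal or a parenthesised expression, and '!' is assumed to act as a logical not yielding 0 or 1.

```python
import re


def evaluate(text):
    tokens = re.findall(r"\d+|[-+!^()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def unary():  # ('+' | '-')* negation
        if peek() == "-":
            advance()
            return -unary()
        if peek() == "+":
            advance()
            return unary()
        return negation()

    def negation():  # '!' negation | pow
        if peek() == "!":
            advance()
            return int(not negation())
        return power()

    def power():  # term ('^' pow)?  (right-recursive, so right-associative)
        base = term()
        if peek() == "^":
            advance()
            return base ** power()
        return base

    def term():  # assumed: INTEGER | '(' unary ')'
        tok = advance()
        if tok == "(":
            value = unary()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    result = unary()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(evaluate("-2^2"))     # -4: ^ binds tighter than unary minus
print(evaluate("2^3^2"))    # 512: parsed as 2^(3^2)
print(evaluate("(2^3)^2"))  # 64
```

Since pow sits below negation and unary in the rule chain, the exponent is applied before the sign, and the recursion on the right operand gives 2^3^2 the grouping 2^(3^2).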