Converting a BNF grammar to JavaCC

Converting a BNF grammar to JavaCC - grammar

I wrote the following JavaCC code from a BNF grammar:
TOKEN: {
<NEXPR: <NTERM> ((<ADD> | <SUB>) <NTERM>)*> |
<NTERM: <NFACTOR> ((<MUL> | <DIV>) <NFACTOR>)*> |
<NFACTOR: (<SUB>)? (<DIGIT> | <REPCOUNT> | <PARAMETER> | <LPAR> <NEXPR> <RPAR>)>
}
I recive an error from javacc:
Loop in regular expression detected: "...NEXPR... --> ...NTERM... -->
...NFACTOR... --> ...NEXPR..."
I think my grammar is not left-recursion, anyway, I read that JavaCC doesn't support a left-recursion grammar, so I need to convert that code in a right-recursion one. Now I know the following principle:
This grammar (left recursion)
X -> Xa | b
Becomes (right recursion):
X -> bX'
X' -> aX' | {}
or (adapted to JavaCC situation)
X -> b(a)*
But I don't know how can I convert my example since it's formed by three rules. Can you help me, please?
Initial grammar (BNF form):
NExpr = NTerm { ( "+" | "-" ) NTerm }
NTerm = NFactor { ( "*" | "/" ) NFactor }
NFactor = "-" ( Number | REPCOUNT | Parameter | "(" NExpr ")" ) |
Number | REPCOUNT | Parameter | "(" NExpr ")"

Related

Simple arithmetic parser created with Kotlin and better-parse fails

Please have a look at the following parser I implemented in Kotlin using the parser combinator library better-parse. It parses—should parse, rather—simple arithmetic expressions:
object CalcGrammar: Grammar<Int>() {
val num by regexToken("""\d+""")
val pls by literalToken("+")
val min by literalToken("-")
val mul by literalToken("*")
val div by literalToken("/")
val lpr by literalToken("(")
val rpr by literalToken(")")
val wsp by regexToken("\\s+", ignore = true)
// expr ::= term + expr | term - expr | term
// term ::= fact * term | fact / term | fact
// fact ::= (expr) | -fact | int
val fact: Parser<Int> by
skip(lpr) and parser(::expr) and skip(rpr) or
(skip(min) and parser(::fact) map { -it }) or
(num map { it.text.toInt() })
val term by
leftAssociative(fact, mul) { a, _, b -> a * b } or
leftAssociative(fact, div) { a, _, b -> a / b } or
fact
val expr by
leftAssociative(term, pls) { a, _, b -> a + b } or
leftAssociative(term, min) { a, _, b -> a - b } or
term
override val rootParser by expr
}
However, when I parse -2 + 4 - 5 + 6, I get this ParseException:
Could not parse input: UnparsedRemainder(startsWith=min#8 for "-" at 7 (1:8))
At first, I thought the issue was that I have not codified the self-recursion in the expr productions, i.e., the code does not accurately represent the grammar:
// Reference to `expr` is missing on the right-hand side...
val expr = ... leftAssociative(term, min) ...
// ...although the corresponding production defines it.
// expr ::= ... term - expr ...
But then I noticed that the official example for an arithmetic parser provided as part of the library's documentation—which turns out to be almost identical afaict—also omits this.
What did I do wrong if not this? And how can I make it work?

I am not entirely familiar with this library, but it seems your translation from grammar to code is too literal for the library's syntax and the library actually implicitly handles much of what you are explicitly writing, and it seems that as a part of this "ease of use" it breaks what is seemingly correct code.
For starters, you will find that your code behaves the exact same way whether or not you chain or term on to the end of expr, and or fact on to the end of term.
Based on this I was able to come to the conclusion that these ors are not working as expected when chaining together the leftAssociatives, in fact you would run in to the same problem had you attempted to parse a division. I believe it is for this reason that the example you provided a link to combines addition and subtraction (as with multiplication and division) into a single, more dynamic, leftAssociative call. And if you copy that same work to your own code, it runs flawlessly:
object CalcGrammar: Grammar<Int>() {
val num by regexToken("""\d+""")
val pls by literalToken("+")
val min by literalToken("-")
val mul by literalToken("*")
val div by literalToken("/")
val lpr by literalToken("(")
val rpr by literalToken(")")
val wsp by regexToken("\\s+", ignore = true)
// expr ::= term + expr | term - expr | term
// term ::= fact * term | fact / term | fact
// fact ::= (expr) | -fact | int
val fact: Parser<Int> by
skip(lpr) and parser(::expr) and skip(rpr) or
(skip(min) and parser(::fact) map { -it }) or
(num map { it.text.toInt() })
private val term by
leftAssociative(fact, div or mul use { type }) { a, op, b ->
if (op == div) a / b else a * b
}
private val expr by
leftAssociative(term, pls or min use { type }) { a, op, b ->
if (op == pls) a + b else a - b
}
override val rootParser by expr
}

Antlr3 Non-recursive / leftfactorized grammar for expressions with a nice AST

I have defined the following production rules for expressions. The grammar is not allowed to have backtracking and k better or equal to 3. The current version seams to have some ambiguity, but I can't figure out where. I've removed the AST rules here, but the grammar is supposed to create a nice AST where operations are presented according to their priority as well as showing the left associativity of operations.
Antlr 3.2.1 with Antlerworks 1.5.1
disjunctionExpresion
: (conjunctionExpresion Disjunction conjunctionExpresion disjunctionExpresionDash) | conjunctionExpresion;
disjunctionExpresionDash
: (Disjunction conjunctionExpresion disjunctionExpresionDash) |;
conjunctionExpresion
: (relationalExpresion Conjunction relationalExpresion conjunctionExpresionDash) | relationalExpresion;
conjunctionExpresionDash
: (Conjunction relationalExpresion conjunctionExpresionDash)|;
relationalExpresion
: (addExpresion RelationalOperator addExpresion relationalExpresionDash) | addExpresion;
relationalExpresionDash
: (RelationalOperator addExpresion relationalExpresionDash)|;
addExpresion
: (multiExpresion addOperator multiExpresion addExpresionDash)| multiExpresion;
addExpresionDash
: (addOperator multiExpresion addExpresionDash)|;
multiExpresion
: (unaryExpresion MultiOperator unaryExpresion multiExpresionDash) | unaryExpresion;
multiExpresionDash
: (MultiOperator unaryExpresion multiExpresionDash) | ;
unaryExpresion
: (unaryOperator basicExpr)->^(unaryOperator basicExpr) | basicExpr -> basicExpr;
basicExpr
: Number | var basicExprDash | ('(' expr ')')->expr;
basicExprDash
: 'touches' var | ;

With #kaby76's hint of using EBNF and looking up example grammars, I have ended up with something similar to this C++ Antlr3 example. The Antrl3 Documentations for Tree construction was also very helpful.
The "^" quick operator allowed me to create the desired AST.
expr : disjunctionExpression;
disjunctionExpression
: conjunctionExpression (Disjunction^ conjunctionExpression)*;
/*equal to:
disjunctionExpression
: (a=conjunctionExpression->$a) (o=Disjunction b=conjunctionExpression -> ^($o $disjunctionExpression $b) )*;
*/
conjunctionExpression
: relationalExpression (Conjunction^ relationalExpression)*;
relationalExpression
: additiveExpression (relationalOperator^ additiveExpression)*;
additiveExpression
: multiExpression (addOperator^ multiExpression)*;
multiExpression
: unaryExpression (multiOperator^ unaryExpression)*;
unaryExpression
: (unaryOperator^)? basicExpr;
basicExpr
: Number | var ('touches'^ var)? | '(' expr ')' -> expr;
Which is actually the compact version of what I had before:
expr : disjunctionExpresion;
disjunctionExpresion
: conjunctionExpresion disjunctionExpresionDash;
disjunctionExpresionDash
: (Disjunction conjunctionExpresion disjunctionExpresionDash) |;

Returning a boxed iterator over a value in a structure [duplicate]

This question already has answers here:
Why is adding a lifetime to a trait with the plus operator (Iterator<Item = &Foo> + 'a) needed?
(1 answer)
What is the correct way to return an Iterator (or any other trait)?
(2 answers)
Closed 5 years ago.
I want a function on a struct which returns an iterator to one of the struct's members. I tried something like the following, which doesn't work.
struct Foo {
bar: Vec<String>,
}
impl Foo {
fn baz(&self) -> Box<Iterator<Item = &String>> {
Box::new(self.bar.iter())
}
}
fn main() {
let x = Foo { bar: vec!["quux".to_string()] };
x.baz();
}
When compiling, I get the error below:
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements
--> src/main.rs:7:27
|
7 | Box::new(self.bar.iter())
| ^^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the method body at 6:5...
--> src/main.rs:6:5
|
6 | / fn baz(&self) -> Box<Iterator<Item = &String>> {
7 | | Box::new(self.bar.iter())
8 | | }
| |_____^
note: ...so that reference does not outlive borrowed content
--> src/main.rs:7:18
|
7 | Box::new(self.bar.iter())
| ^^^^^^^^
= note: but, the lifetime must be valid for the static lifetime...
note: ...so that expression is assignable (expected std::boxed::Box<std::iter::Iterator<Item=&std::string::String> + 'static>, found std::boxed::Box<std::iter::Iterator<Item=&std::string::String>>)
--> src/main.rs:7:9
|
7 | Box::new(self.bar.iter())
| ^^^^^^^^^^^^^^^^^^^^^^^^^
I also tried adding lifetime information as suggested elsewhere
struct Foo {
bar: Vec<String>,
}
impl Foo {
fn baz<'a>(&self) -> Box<Iterator<Item = &String> + 'a> {
Box::new(self.bar.iter())
}
}
fn main() {
let x = Foo { bar: vec!["quux".to_string()] };
x.baz();
}
Now my error is the following:
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in function call due to conflicting requirements
--> src/main.rs:7:27
|
7 | Box::new(self.bar.iter())
| ^^^^
|
note: first, the lifetime cannot outlive the anonymous lifetime #1 defined on the method body at 6:5...
--> src/main.rs:6:5
|
6 | / fn baz<'a>(&self) -> Box<Iterator<Item = &String> + 'a> {
7 | | Box::new(self.bar.iter())
8 | | }
| |_____^
note: ...so that reference does not outlive borrowed content
--> src/main.rs:7:18
|
7 | Box::new(self.bar.iter())
| ^^^^^^^^
note: but, the lifetime must be valid for the lifetime 'a as defined on the method body at 6:5...
--> src/main.rs:6:5
|
6 | / fn baz<'a>(&self) -> Box<Iterator<Item = &String> + 'a> {
7 | | Box::new(self.bar.iter())
8 | | }
| |_____^
note: ...so that expression is assignable (expected std::boxed::Box<std::iter::Iterator<Item=&std::string::String> + 'a>, found std::boxed::Box<std::iter::Iterator<Item=&std::string::String>>)
--> src/main.rs:7:9
|
7 | Box::new(self.bar.iter())
| ^^^^^^^^^^^^^^^^^^^^^^^^^

ANTLR3 - Decision can match input using multiple alternatives

When running ANTLR3 on the following code, I get the message - warning(200): MYGRAMMAR.g:40:36: Decision can match input such as "QMARK" using multiple alternatives: 3, 4
As a result, alternative(s) 4 were disabled for that input.
The warning message is pointing me to postfixExpr. Is there a way to fix this?
grammar MYGRAMMAR;
options {language = C;}
tokens {
BANG = '!';
COLON = ':';
FALSE_LITERAL = 'false';
GREATER = '>';
LSHIFT = '<<';
MINUS = '-';
MINUS_MINUS = '--';
PLUS = '+';
PLUS_PLUS = '++';
QMARK = '?';
QMARK_COLON = '?:';
TILDE = '~';
TRUE_LITERAL = 'true';
}
condExpr
: shiftExpr (QMARK condExpr COLON condExpr)? ;
shiftExpr
: addExpr ( shiftOp addExpr)* ;
addExpr
: qmarkColonExpr ( addOp qmarkColonExpr)* ;
qmarkColonExpr
: prefixExpr ( QMARK_COLON prefixExpr )? ;
prefixExpr
: ( prefixOrUnaryMinus | postfixExpr) ;
prefixOrUnaryMinus
: prefixOp prefixExpr ;
postfixExpr
: primaryExpr ( postfixOp | BANG | QMARK )*;
primaryExpr
: literal ;
shiftOp
: ( LSHIFT | rShift);
addOp
: (PLUS | MINUS);
prefixOp
: ( BANG | MINUS | TILDE | PLUS_PLUS | MINUS_MINUS );
postfixOp
: (PLUS_PLUS | MINUS_MINUS);
rShift
: (GREATER GREATER)=> a=GREATER b=GREATER {assertNoSpace($a,$b)}? ;
literal
: ( TRUE_LITERAL | FALSE_LITERAL );
assertNoSpace [pANTLR3_COMMON_TOKEN t1, pANTLR3_COMMON_TOKEN t2]
: { $t1->line == $t2->line && $t1->getCharPositionInLine($t1) + 1 == $t2->getCharPositionInLine($t2) }? ;

I think one problem is that PLUS_PLUS as well as MINUS_MINUS will never be matched as they are defined after the respective PLUS or MINUS token. therefore the lexer will always output two PLUS tokens instead of one PLUS_PLUS token.
In order to avaoid something like this you have to define your PLUS_PLUS or MINUS_MINUS token before the PLUS or MINUS token as the lexer processes them in the order they are defined and won't look any further once it found a way to match the current input.
The same problem applies to QMARK_COLON as it is defined after QMARK (this only is a problem because there is another token type COLON to match the following colon).
See if fixing the ambiguities resolves the error message.

Whats the correct way to add new tokens (rewrite) to create AST nodes that are not on the input steam

I've a pretty basic math expression grammar for ANTLR here and what's of interest is handling the implied * operator between parentheses e.g. (2-3)(4+5)(6*7) should actually be (2-3)*(4+5)*(6*7).
Given the input (2-3)(4+5)(6*7) I'm trying to add the missing * operator to the AST tree while parsing, in the following grammar I think I've managed to achieve that but I'm wondering if this is the correct, most elegant way?
grammar G;
options {
language = Java;
output=AST;
ASTLabelType=CommonTree;
}
tokens {
ADD = '+' ;
SUB = '-' ;
MUL = '*' ;
DIV = '/' ;
OPARN = '(' ;
CPARN = ')' ;
}
start
: expression EOF!
;
expression
: mult (( ADD^ | SUB^ ) mult)*
;
mult
: atom (( MUL^ | DIV^) atom)*
;
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN expression CPARN -> ^(MUL expression)+
)*
;
INTEGER : ('0'..'9')+ ;
WS : (' ' | '\t' | '\n' | '\r' | '\f')+ {$channel = HIDDEN;};
This grammar appears to output the correct AST Tree in ANTLRworks:
I'm only just starting to get to grips with parsing and ANTLR, don't have much experience so feedback with really appreciated!
Thanks in advance! Carl

First of all, you did a great job given the fact that you've never used ANTLR before.
You can omit the language=Java and ASTLabelType=CommonTree, which are the default values. So you can just do:
options {
output=AST;
}
Also, you don't have to specify the root node for each operator separately. So you don't have to do:
(ADD^ | SUB^)
but the following:
(ADD | SUB)^
will suffice. With only two operators, there's not much difference, but when implementing relational operators (>=, <=, > and <), the latter is a bit easier.
Now, for you AST: you'll probably want to create a binary tree: that way, all internal nodes are operators, and the leafs will be operands which makes the actual evaluating of your expressions much easier. To get a binary tree, you'll have to change your atom rule slightly:
atom
: INTEGER
| (
OPARN expression CPARN -> expression
)
(
OPARN e=expression CPARN -> ^(MUL $atom $e)
)*
;
which produces the following AST given the input "(2-3)(4+5)(6*7)":
(image produced by: graphviz-dev.appspot.com)
The DOT file was generated with the following test-class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
GLexer lexer = new GLexer(new ANTLRStringStream("(2-3)(4+5)(6*7)"));
GParser parser = new GParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.start().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Converting a BNF grammar to JavaCC - grammar

Related

Simple arithmetic parser created with Kotlin and better-parse fails

Antlr3 Non-recursive / leftfactorized grammar for expressions with a nice AST

Returning a boxed iterator over a value in a structure [duplicate]

ANTLR3 - Decision can match input using multiple alternatives

Whats the correct way to add new tokens (rewrite) to create AST nodes that are not on the input steam

Categories

Resources