I want to write a concrete grammar to parse BNF-like syntax definitions.
Looking at the EXP Concrete Syntax recipe, I created this very simple first version:
module BNFParser
lexical Identifier = [a-z]+ ;
syntax GrammarRule = left RuleHead ":" RuleCase* ";" ;
syntax RuleHead = Identifier ;
syntax RuleCase = Identifier ;
and invoked it in the REPL like this:
import BNFParser;
import ParseTree;
parse(#GrammarRule, "foo : bar baz ;");
But this results in a rather arcane error message:
|std:///ParseTree.rsc|(13035,1963,<393,0>,<439,114>): ParseError(|unknown:///|(3,1,<1,3>,<1,4>))
at *** somewhere ***(|std:///ParseTree.rsc|(13035,1963,<393,0>,<439,114>))
at parse(|std:///ParseTree.rsc|(14991,5,<439,107>,<439,112>))
ok
I also tried using the start keyword ahead of GrammarRule, but that didn't help. What am I doing wrong?
lexical Identifier = [a-z]+ !>> [a-z];
That helps with ambiguous lists of identifiers. The additional !>> follow restriction declares that an identifier is only acceptable if it is not followed by another [a-z] character, so the parser always takes the longest possible identifier.
In addition, this declaration is required to fix the parse error:
layout Whitespace = [\ \n\r]*;
Rascal intermixes this layout nonterminal between all symbols of every syntax rule in scope; it leaves lexical rules alone.
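To see why the layout declaration is what fixes the error, here is a minimal Python sketch of a hand-written parser for the same GrammarRule shape, Identifier ":" Identifier* ";". It is an illustration only, not Rascal and not how Rascal's generalized parser works internally; parse_rule, use_layout and ParseError are hypothetical names. With layout skipping turned off it fails at offset 3, the space before the colon, which is the location reported in the error message in the question. The greedy [a-z]+ regex always consumes the whole run of letters, which is the behavior the !>> [a-z] restriction enforces.

```python
import re

IDENTIFIER = re.compile(r"[a-z]+")   # greedy: always takes the whole run of letters
LAYOUT = re.compile(r"[ \n\r]*")     # mirrors: layout Whitespace = [\ \n\r]*;


class ParseError(Exception):
    def __init__(self, offset):
        super().__init__(f"parse error at offset {offset}")
        self.offset = offset


def parse_rule(text, use_layout=True):
    """Parse Identifier ":" Identifier* ";" and return (head, cases)."""
    pos = 0

    def skip_layout():
        nonlocal pos
        if use_layout:  # without layout, no whitespace is allowed between symbols
            pos = LAYOUT.match(text, pos).end()

    def identifier():
        nonlocal pos
        skip_layout()
        m = IDENTIFIER.match(text, pos)
        if m is None:
            raise ParseError(pos)
        pos = m.end()
        return m.group()

    def literal(s):
        nonlocal pos
        skip_layout()
        if not text.startswith(s, pos):
            raise ParseError(pos)
        pos += len(s)

    head = identifier()
    literal(":")
    cases = []
    while True:  # RuleCase*
        skip_layout()
        if IDENTIFIER.match(text, pos) is None:
            break
        cases.append(identifier())
    literal(";")
    skip_layout()
    if pos != len(text):
        raise ParseError(pos)
    return head, cases
```

For example, parse_rule("foo : bar baz ;") returns ("foo", ["bar", "baz"]), while parse_rule("foo : bar baz ;", use_layout=False) raises ParseError with offset 3.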
Related
I am facing a problem parsing some commands with a parser I implemented using ANTLR3. The parser fails to parse commands that contain any word that is declared as a lexer rule in the grammar.
For example, take a look at the following grammar:
show :
SHOW TABLES '[' projectName? tableName']' -> ^(SHOW TABLES_ ^(PROJECT_NAME projectName)? ^(DATASET_TABLE tableName));
SHOW : S H O W;
If I try to parse the command 'SHOW TABLES [sample-project:SHOW]', parsing fails. But if I change the word SHOW (the table name), it works.
SHOW TABLES [sample-project:SHOW] - this works.
I don't want to have to write the name as a string surrounded by quotes (").
Can anyone suggest a solution? I am using ANTLR3.
Thanks in advance.
This is a typical effect of using a reserved word as an identifier. In ANTLR, when you define a reserved word like your SHOW rule, it is implicitly excluded from any identifier rule you define after that keyword rule: when both rules match the same input, the rule defined first wins, so SHOW is always lexed as the keyword token.
The solution that allows such keywords to also be used as identifiers in rules like your tableName is to make that rule accept certain (or all) keywords that can appear in that position (they will not act as keywords there). Example:
tableName:
IDENTIFIER
| SHOW
| <others go here>
;
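Both halves of this answer can be sketched in Python, assuming an ANTLR-like lexer that takes the longest match and breaks ties by definition order, and assuming (hypothetically) that the project and table names are separated by ':' as in the example command. The rule names mirror the grammar above; tokenize, parse_show and name_tokens are illustrative names, not ANTLR API. The lexer still turns the table name SHOW into the keyword token, and the parser succeeds only because the name rule also accepts SHOW.

```python
import re


class ParseError(Exception):
    pass


# Hypothetical lexer rules in definition order, as in an ANTLR3 grammar file.
RULES = [
    ("SHOW", re.compile(r"SHOW")),
    ("TABLES", re.compile(r"TABLES")),
    ("IDENTIFIER", re.compile(r"[A-Za-z][A-Za-z0-9-]*")),
    ("LBRACK", re.compile(r"\[")),
    ("RBRACK", re.compile(r"\]")),
    ("COLON", re.compile(r":")),
    ("WS", re.compile(r"[ \t]+")),
]


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        kind, end = None, pos
        for rule, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > end:  # longest match; a tie keeps the earlier rule
                kind, end = rule, m.end()
        if kind is None:
            raise ParseError(f"no token at offset {pos}")
        if kind != "WS":
            tokens.append((kind, text[pos:end]))
        pos = end
    return tokens


def parse_show(text, name_tokens=("IDENTIFIER", "SHOW")):
    """show : SHOW TABLES '[' (projectName ':')? tableName ']' (the ':' is assumed)."""
    tokens = tokenize(text)
    kinds = [k for k, _ in tokens]

    def name(i):
        # tableName : IDENTIFIER | SHOW | ... ; keywords listed here act as plain names
        if i >= len(kinds) or kinds[i] not in name_tokens:
            raise ParseError(f"expected a name at token {i}")
        return tokens[i][1]

    if kinds[:3] != ["SHOW", "TABLES", "LBRACK"]:
        raise ParseError("expected SHOW TABLES [")
    i, project = 3, None
    if kinds[i + 1:i + 2] == ["COLON"]:
        project, i = name(i), i + 2
    table = name(i)
    if kinds[i + 1:] != ["RBRACK"]:
        raise ParseError("expected ]")
    return project, table
```

With the default name_tokens, parse_show("SHOW TABLES [sample-project:SHOW]") returns ("sample-project", "SHOW"); passing name_tokens=("IDENTIFIER",) reproduces the original failure.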
Surprise: I am building a parser for an SQL-like language for a project.
I had it mostly working, but when I started testing it against the real requests it will have to handle, I realized it was behaving differently internally than I thought.
The main issue in the following grammar is that I define a lexer rule PCT_CONTAINS for the language keyword 'pct_contains'. This works fine, but if I try to match a field like 'attributes.pct_vac', I get a field with the text 'attributes.ac' and the following pretty ANTLR error:
line 1:15 mismatched character u'v' expecting 'c'
GRAMMAR
grammar Select;
options {
language=Python;
}
eval returns [value]
: field EOF
;
field returns [value]
: fieldsegments {print $field.text}
;
fieldsegments
: fieldsegment (DOT (fieldsegment))*
;
fieldsegment
: ICHAR+ (USCORE ICHAR+)*
;
WS : ('\t' | ' ' | '\r' | '\n')+ {self.skip();};
ICHAR : ('a'..'z'|'A'..'Z');
PCT_CONTAINS : 'pct_contains';
USCORE : '_';
DOT : '.';
I have been reading everything I can find on the topic: how the lexer consumes input as it finds it, even when that turns out to be wrong; how you can use semantic predicates to remove ambiguity; how to use lookahead. But nothing I have read has helped me fix this issue.
Honestly, I don't see how it even CAN be an issue. I must be missing something super obvious, because other grammars I see have lexer rules like EXISTS, and that doesn't cause the parser to take a string like 'existsOrNot' and spit out an IDENTIFIER with the text 'rNot'.
What am I missing or doing completely wrong?
Convert your fieldsegment parser rule into a lexer rule. As it stands now it will accept input like
"abc
_ abc"
which is probably not what you want. The keyword "pct_contains" won't be matched by this rule since it is defined separately. If you want to accept the keyword as a regular identifier in certain sequences, you will have to include it in the accepted identifier rule.
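A Python sketch of the suggested fix, assuming the same longest-match, first-rule-wins lexer behavior: fieldsegment becomes a lexer rule (called FIELDSEGMENT here, a hypothetical name) that matches letters joined by underscores as one token, so 'pct_vac' is a single token instead of a failed attempt at 'pct_contains'. When a segment is exactly 'pct_contains', both rules match the same length and the keyword, defined first, still wins; that is why the field parser also accepts PCT_CONTAINS as a segment, as the last sentence above suggests. tokenize and parse_field are illustrative names.

```python
import re


class ParseError(Exception):
    pass


# Hypothetical lexer rules in definition order. FIELDSEGMENT replaces the old
# parser rule fieldsegment : ICHAR+ (USCORE ICHAR+)* ; so one segment is one token.
RULES = [
    ("WS", re.compile(r"[ \t\r\n]+")),
    ("PCT_CONTAINS", re.compile(r"pct_contains")),
    ("FIELDSEGMENT", re.compile(r"[A-Za-z]+(?:_[A-Za-z]+)*")),
    ("DOT", re.compile(r"\.")),
]


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        kind, end = None, pos
        for rule, pattern in RULES:
            m = pattern.match(text, pos)
            if m and m.end() > end:  # longest match; a tie keeps the earlier rule
                kind, end = rule, m.end()
        if kind is None:
            raise ParseError(f"no token at offset {pos}")
        if kind != "WS":
            tokens.append((kind, text[pos:end]))
        pos = end
    return tokens


def parse_field(text, segment_tokens=("FIELDSEGMENT", "PCT_CONTAINS")):
    """fieldsegments : segment (DOT segment)* ; the keyword is accepted as a segment."""
    segments, want_segment = [], True
    for kind, value in tokenize(text):
        if want_segment and kind in segment_tokens:
            segments.append(value)
        elif not want_segment and kind == "DOT":
            pass
        else:
            raise ParseError(f"unexpected {kind} {value!r}")
        want_segment = not want_segment
    if want_segment:
        raise ParseError("field is empty or ends with '.'")
    return segments
```

Because whitespace can no longer appear inside a segment, input like "abc _ abc" is now rejected instead of being accepted as one field segment.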
I'm using ANTLR to generate a recognizer for a Java-like language, and the following rules are used to recognize generic types:
referenceType
: singleType ('.' singleType)*
;
singleType
: Identifier typeArguments?
;
typeArguments
: '<' typeArgument (',' typeArgument)* '>'
;
typeArgument
: referenceType
;
Now, for the following input statement, ANTLR produces the 'no viable alternative' error.
Iterator<Entry<K,V>> i = entrySet().iterator();
However, if I put a space between the two consecutive '>' characters, no error is produced. It seems that ANTLR cannot distinguish between the above rule and the rule used to recognize shift expressions, but I don't know how to modify the grammar to resolve this ambiguity. Any help would be appreciated.
You probably have a rule like the following in the lexer:
RightShift : '>>';
For ANTLR to recognize >> as either two > characters or one >> operator, depending on context, you'll need to instead place your shift operator in the parser:
rightShift : '>' '>';
If your language includes the >>> or >>= operators, those would need to be moved to the parser as well.
To reject x > > y as a shift expression, you'll want to make a pass over the resulting parse tree (ANTLR 4) or AST (ANTLR 3) to verify that the two > tokens matched by the rightShift parser rule are adjacent in the input, with nothing between them.
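A Python sketch of this approach, under stated assumptions: the lexer never emits a '>>' token, each token records its character offset, each '>' closes exactly one type-argument list, and a hypothetical shift expression Identifier rightShift Identifier checks that its two '>' tokens are adjacent. This is a hand-written recursive-descent illustration of the idea, not ANTLR-generated code; all names are hypothetical.

```python
import re
from dataclasses import dataclass


class ParseError(Exception):
    pass


@dataclass
class Token:
    kind: str
    text: str
    start: int  # character offset, used to check that two '>' are adjacent


# There is deliberately no '>>' token: the lexer only ever emits single '>'.
TOKEN_RE = re.compile(
    r"(?P<WS>\s+)|(?P<ID>[A-Za-z_]\w*)|(?P<LT><)|(?P<GT>>)|(?P<COMMA>,)|(?P<DOT>\.)")


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ParseError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append(Token(m.lastgroup, m.group(), pos))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.toks, self.i = tokenize(text), 0

    def at(self, kind):
        return self.i < len(self.toks) and self.toks[self.i].kind == kind

    def expect(self, kind):
        if not self.at(kind):
            raise ParseError(f"expected {kind} at token {self.i}")
        self.i += 1
        return self.toks[self.i - 1]

    def reference_type(self):
        # referenceType : singleType ('.' singleType)*
        parts = [self.single_type()]
        while self.at("DOT"):
            self.i += 1
            parts.append(self.single_type())
        return ".".join(parts)

    def single_type(self):
        # singleType : Identifier typeArguments?
        name = self.expect("ID").text
        if self.at("LT"):
            self.i += 1
            args = [self.reference_type()]  # typeArgument : referenceType
            while self.at("COMMA"):
                self.i += 1
                args.append(self.reference_type())
            self.expect("GT")  # each '>' closes exactly one argument list
            name += "<" + ",".join(args) + ">"
        return name

    def right_shift(self):
        # rightShift : '>' '>' ; reject "> >" by checking the offsets are adjacent
        first, second = self.expect("GT"), self.expect("GT")
        if second.start != first.start + 1:
            raise ParseError("'> >' is not a right-shift operator")
        return ">>"

    def done(self):
        if self.i != len(self.toks):
            raise ParseError(f"unexpected trailing token at {self.i}")


def parse_type(text):
    p = Parser(text)
    result = p.reference_type()
    p.done()
    return result


def parse_shift(text):
    # hypothetical shiftExpression : Identifier rightShift Identifier
    p = Parser(text)
    lhs = p.expect("ID").text
    op = p.right_shift()
    rhs = p.expect("ID").text
    p.done()
    return lhs, op, rhs
```

Here parse_type("Iterator<Entry<K,V>>") succeeds with or without a space between the two '>' characters, parse_shift("x >> y") returns ("x", ">>", "y"), and parse_shift("x > > y") raises ParseError.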
280Z28 is probably right in his diagnosis that you have a rule like
RightShift : '>>';
An alternative solution is to explicitly include the possibility of a trailing >> in your parser. (I have seen this in other grammars, but only in LALR.)
typeArguments
: '<' typeArgument (',' typeArgument)* '>'
| '<' typeArgument ',' referenceType '<' typeArgument RightShift
;
In ANTLR3, this rule will need to be left-factored, because both alternatives begin with '<' typeArgument.
Whether this is clearer than a second pass that validates your right-shift operator depends on how often you need it.
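For comparison, here is a hand-written Python sketch that keeps a single RightShift token in the lexer and lets an inner argument list consume it, recording that its second '>' also closes the enclosing list. This is a token-splitting variant that achieves the same effect as the second alternative above, not a translation of that grammar; qualified names (the '.' in referenceType) are omitted for brevity, and all names are hypothetical.

```python
import re


class ParseError(Exception):
    pass


# This lexer keeps a single RightShift token for '>>', as in the question.
# Regex alternation is ordered, so RSHIFT is listed before GT on purpose.
TOKEN_RE = re.compile(
    r"(?P<WS>\s+)|(?P<ID>[A-Za-z_]\w*)|(?P<RSHIFT>>>)|(?P<GT>>)|(?P<LT><)|(?P<COMMA>,)")


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ParseError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


class TypeParser:
    def __init__(self, text):
        self.toks, self.i = tokenize(text), 0
        self.pending_gt = 0  # '>' already consumed as the tail of a RightShift

    def kind(self):
        return self.toks[self.i][0] if self.i < len(self.toks) else "EOF"

    def single_type(self):
        # singleType : Identifier typeArguments?   (qualified names omitted)
        if self.kind() != "ID":
            raise ParseError(f"expected identifier at token {self.i}")
        name = self.toks[self.i][1]
        self.i += 1
        if self.kind() == "LT":
            self.i += 1
            args = [self.single_type()]
            # stop collecting once an inner RightShift has already closed this list
            while self.pending_gt == 0 and self.kind() == "COMMA":
                self.i += 1
                args.append(self.single_type())
            self.close_arguments()
            name += "<" + ",".join(args) + ">"
        return name

    def close_arguments(self):
        if self.pending_gt:
            self.pending_gt -= 1  # closed by the second half of an inner '>>'
        elif self.kind() == "GT":
            self.i += 1
        elif self.kind() == "RSHIFT":
            self.i += 1
            self.pending_gt += 1  # the second '>' closes the enclosing list
        else:
            raise ParseError(f"expected '>' at token {self.i}")


def parse_type(text):
    p = TypeParser(text)
    result = p.single_type()
    if p.pending_gt or p.kind() != "EOF":
        raise ParseError("unbalanced '>' or trailing input")
    return result
```

The lexer is left untouched here, so x >> y still produces one RightShift token and needs no second validation pass; the cost is the extra bookkeeping in the type rules.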
In ANTLR, I have a MismatchedTokenException with the following definition:
type : IDENTIFIER ('<' (type (',' type)*) '>')?;
And the following test:
A<B,C<D>>
The exception occurs when parsing the first >: ANTLR tries to parse '>>' as a single token, and fails.
With a silent whitespace channel, the following test does work:
A<B,C<D> >
Here ANTLR is clearly instructed to treat each > as a separate token.
How can I fix that?
I could not reproduce that. The parser generated by:
grammar T;
type : IDENTIFIER ('<' (type (',' type)*) '>')?;
IDENTIFIER : 'A'..'Z';
parses the input A<B,C<D>> (without spaces) into the following parse tree:
You'll need to provide the grammar that causes this input to produce a MismatchedTokenException.
Perhaps you're using ANTLRWorks' interpreter (or Eclipse's ANTLR-IDE, which uses the same interpreter)? In that case, that is probably the problem: it's notoriously buggy. Don't use it; use ANTLRWorks' debugger instead: it's great (the image posted above comes from the debugger).
Lazlo Bonin wrote:
Got it. I had a << token defined. Quickly: is there a way to prioritize the recognition of one token over another?
No, the lexer simply tries to match as much as possible. So if it can create a token matching << (or >>), it will do so in favor of two single < (or >) tokens. Only when two (or more) lexer rules match the same number of characters is a priority applied: the rule defined first then "wins" over the one(s) defined later in the grammar.
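A simplified Python model of the two rules described here: the longest match wins, and among equally long matches the rule defined first wins. The rule set is hypothetical and loosely follows the A<B,C<D>> example; real ANTLR lexers use generated prediction code rather than trying every rule in a loop, so treat this only as a model of the observable behavior.

```python
import re

# Hypothetical lexer rules, listed in definition order.
RULES = [
    ("SHOW", re.compile(r"SHOW")),          # a keyword defined before IDENTIFIER
    ("IDENTIFIER", re.compile(r"[A-Za-z]+")),
    ("LT", re.compile(r"<")),
    ("GT", re.compile(r">")),
    ("SHIFT_LEFT", re.compile(r"<<")),      # defined after LT, still wins on "<<"
    ("SHIFT_RIGHT", re.compile(r">>")),     # defined after GT, still wins on ">>"
    ("COMMA", re.compile(r",")),
]


def tokenize(text, rules=RULES):
    """Longest match wins; among equally long matches the earlier rule wins."""
    tokens, pos = [], 0
    while pos < len(text):
        kind, end = None, pos
        for rule, pattern in rules:
            m = pattern.match(text, pos)
            if m and m.end() > end:  # strict '>' keeps the earlier rule on a tie
                kind, end = rule, m.end()
        if kind is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens
```

With these rules, A<B,C<D>> ends in a single SHIFT_RIGHT token regardless of where GT is defined, "SHOW" becomes the keyword token, and "SHOWS" becomes an IDENTIFIER because it is the longer match. Moving SHOW after IDENTIFIER in the list makes "SHOW" an IDENTIFIER, since only ties are decided by order.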
I'm using CUP to create a parser that I need for my thesis. I have a shift/reduce conflict in my grammar. I have this production rule:
command ::= IDENTIFIER | IDENTIFIER LPAREN parlist RPAREN;
and I have this warning:
Warning : *** Shift/Reduce conflict found in state #3
between command ::= IDENTIFIER (*)
and command ::= IDENTIFIER (*) LPAREN parlist RPAREN
under symbol LPAREN
Now, I actually wanted it to shift, so I'm fine with it, but my professor told me to find a way to resolve the conflict. I can't see it. I've always read about the if/else conflict, but this doesn't seem to be that case.
Can you help me?
P.S.: IDENTIFIER, LPAREN "(" and RPAREN ")" are terminals; parlist and command are nonterminals.
Your problem is not in those rules at all. Although Michael Mrozek's answer is the correct approach to resolving the "dangling else" problem, it does not address the problem at hand.
If you look at the error message, you see that the shift/reduce conflict occurs when the lookahead symbol is LPAREN. I am pretty sure that these rules alone will not create a conflict.
I can't see your whole grammar, so I can't be sure, but your conflict probably arises when a command is followed by a different rule that starts with LPAREN.
Look at any other rules that can potentially be after command and start with LPAREN. You will then have to consolidate the rules. There is a very good chance that your grammar is erroneous for a specific input.
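To make "look at any other rules that can follow command" concrete, here is a Python sketch that computes FIRST and FOLLOW sets for a small hypothetical grammar and flags the pattern from the warning: command ::= IDENTIFIER can be reduced while command ::= IDENTIFIER LPAREN ... wants to shift LPAREN, which conflicts when LPAREN can follow a command. The program rule is invented for illustration. This is an SLR-style check using full FOLLOW sets and assuming no empty productions; CUP builds LALR(1) tables, whose lookahead sets can be smaller, so this check may report conflicts that CUP would not.

```python
# Hypothetical grammar: the second `program` alternative lets a command be
# followed by a parenthesised expression, i.e. another rule starting with LPAREN.
GRAMMAR = {
    "program": [["command"], ["command", "LPAREN", "expr", "RPAREN"]],
    "command": [["IDENTIFIER"], ["IDENTIFIER", "LPAREN", "parlist", "RPAREN"]],
    "parlist": [["IDENTIFIER"]],
    "expr": [["IDENTIFIER"]],
}


def first_sets(grammar):
    # Assumes no empty productions, so FIRST of a sequence is FIRST of its head.
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                head = prod[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("EOF")
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue  # terminals have no FOLLOW set
                    if i + 1 < len(prod):
                        nxt = prod[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def prefix_conflicts(grammar, start):
    """Find A ::= alpha vs A ::= alpha X ... where X can start with a token in FOLLOW(A)."""
    first = first_sets(grammar)
    follow = follow_sets(grammar, start, first)
    found = []
    for nt, prods in grammar.items():
        for short in prods:
            for longer in prods:
                if len(longer) > len(short) and longer[:len(short)] == short:
                    nxt = longer[len(short)]
                    lookahead = first[nxt] if nxt in grammar else {nxt}
                    clash = sorted(lookahead & follow[nt])
                    if clash:
                        found.append((nt, " ".join(short), clash))
    return found
```

For this grammar the check reports ("command", "IDENTIFIER", ["LPAREN"]). Dropping the second program alternative removes LPAREN from FOLLOW(command), and the check reports nothing, which matches the point above that the two command rules alone do not conflict.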
You have two productions:
command ::= IDENTIFIER
command ::= IDENTIFIER LPAREN parlist RPAREN;
It's a shift/reduce conflict when the input tokens are IDENTIFIER LPAREN, because:
LPAREN could be the start of a new production you haven't listed, in which case the parser should reduce the IDENTIFIER already on the stack into command, and have command LPAREN remaining
They could both be the start of the second production, so it should shift the LPAREN onto the stack next to IDENTIFIER and keep reading, trying to find a parlist.
You can fix it by doing something like this:
command ::= IDENTIFIER command2;
command2 ::= LPAREN parlist RPAREN | ;
Try to set a precedence:
precedence left LPAREN, RPAREN;
It forces CUP to resolve the conflict, taking the left match.