I am trying to use the Java.g4 grammar written by Terence (https://github.com/antlr/grammars-v4/blob/master/java/Java.g4) in ANTLRWorks (http://tunnelvisionlabs.com/products/demo/antlrworks). In the following code, I get the error
"Syntax Error, '<' came as a complete surprise"
| <assoc=right> expression
( '='
| '+='
| '-='
| '*='
| '/='
| '&='
| '|='
| '^='
| '>>='
| '>>>='
| '<<='
| '%='
)
expression
That means that ANTLRWorks2 is slightly out of date and uses an earlier version of ANTLR. I think Sam will be updating soon.
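For readers unsure what the <assoc=right> option in the snippet asks for, here is a minimal Python sketch of the two ways an operator chain can be grouped. It is not ANTLR code, and the function names parse_right_assoc and parse_left_assoc are made up for this illustration.

```python
def parse_right_assoc(operands, op="="):
    """Group a chain such as a = b = c as (a = (b = c))."""
    if len(operands) == 1:
        return operands[0]
    # The rest of the chain becomes the right-hand operand.
    return (operands[0], op, parse_right_assoc(operands[1:], op))


def parse_left_assoc(operands, op="-"):
    """Group a chain such as a - b - c as ((a - b) - c)."""
    tree = operands[0]
    for operand in operands[1:]:
        # Everything parsed so far becomes the left-hand operand.
        tree = (tree, op, operand)
    return tree


print(parse_right_assoc(["a", "b", "c"]))  # ('a', '=', ('b', '=', 'c'))
print(parse_left_assoc(["a", "b", "c"]))   # (('a', '-', 'b'), '-', 'c')
```

Assignment operators are the usual case for right-to-left grouping, which is why the alternative in the grammar carries that option.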
I'm actually just extracting a portion of a T-SQL grammar (https://github.com/antlr/grammars-v4). Specifically, the portion that deals with the WHERE-clause, minus any sub-query logic. Unfortunately, it appears that
expression
: primitive_expression
| function_call
| expression COLLATE id
| case_expression
| full_column_name
| bracket_expression
| unary_operator_expression
| expression op=(ASTERSIK | SLASH_F | P_SIGN) expression
| expression op=(PLUS | HYPHEN | AMPERSAND | CARET | PIPE | (PIPE PIPE)) expression
| expression comparison_operator expression
| expression assignment_operator expression
| over_clause
;
is left-recursive with itself. I'm not sure why I get this error when the complete grammar in that project does not.
Particularly when I use more than 3 OR symbols.
datatype:
Integer | Float | Char | Blah | Blah
entity:
Class | Struct | Enumeration | Union
The complete grammar can be found here: https://gist.github.com/Mrprofessor/7b8df3f00c75ef2ac67bffd0a20e983c
The problem is that your grammar is ambiguous.
Consider this model:
Bla;
Blubb;
Pling;
Are these Bits | Pointers | Labels | Entrys | Logicals | HwordLogicals | Bytes?
I am getting the error: The following sets of rules are mutually left-recursive [symbolExpression]. In my grammar, symbolExpression is directly left-recursive so shouldn't ANTLR4 be handling this?
Here are the relevant parts of my parser:
operation:
OPERATOR '(' (operation | values | value | symbolExpression) ')' #OperatorExpression
| bracketedSymbolExpression #BracketedOperatorExpression
;
symbolExpression:
(operation | values | value | symbolExpression) SYMBOL (operation | values | value | symbolExpression);
bracketedSymbolExpression:
'(' (operation | values | value | symbolExpression) SYMBOL (operation | values | value | symbolExpression) ')';
list: '[' (operation | value) (',' (operation | value))* ']';
values: (operation | value) (',' (operation | value))+;
value:
NUMBER
| IDENTIFIER
| list
| object;
The elements 'symbolExpression' and 'operation' in the rule 'symbolExpression' are interdependently left recursive.
Without knowing the language specification, it is impossible to say whether the language itself is irrecoverably ambiguous.
Nonetheless, one avenue to try is to refactor the grammar to move repeated clauses, like
( operation | value )
and
(operation | values | value | symbolExpression)
to their own rules with the goal of unifying the 'operation' and 'symbolExpression' (and perhaps 'bracketedSymbolExpression') rules into a single rule -- a rule that is at most self left-recursive. Something like
a : value
| LPAREN a* RPAREN
| LBRACK a* RBRACK
| a SYMBOL a
| a ( COMMA a )+
;
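To make the idea of a single, self left-recursive rule more concrete, here is a minimal hand-written Python sketch of a parser for a reduced form of that rule (value, a parenthesized a, and a SYMBOL a). The left recursion is replaced by a loop. This is only an illustration, not ANTLR's own algorithm, and it makes several assumptions: SYMBOL is taken to be one of + - * /, every operator has the same precedence, and the names tokenize, parse_a, parse_primary and parse are hypothetical.

```python
import re

# Assumption for this sketch: SYMBOL is one of these four characters.
SYMBOLS = set("+-*/")
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[()+\-*/])")


def tokenize(text):
    """Split the input into numbers, identifiers, parentheses and symbols."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_a(tokens, pos=0):
    """a : primary (SYMBOL primary)* ; the loop stands in for 'a SYMBOL a'."""
    left, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] in SYMBOLS:
        op = tokens[pos]
        right, pos = parse_primary(tokens, pos + 1)
        left = (left, op, right)  # fold to the left: no recursion on the left edge
    return left, pos


def parse_primary(tokens, pos):
    """primary : value | '(' a ')' ;"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        inner, pos = parse_a(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return inner, pos + 1
    if tok in SYMBOLS or tok == ")":
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok, pos + 1


def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_a(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("(1 + 2) * y - 3"))
```

Because every alternative now lives in one rule, the only recursion on the left edge is the rule calling itself, which is the form that can be turned into a loop like the one in parse_a.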
I downloaded the TL grammar from https://raw.githubusercontent.com/bkiers/tiny-language-antlr4/master/src/main/antlr4/tl/antlr4/TL.g4
After trying it, I realized the grammar is unable to handle user-defined function calls at the top level.
For example, if your file contents are:
def s(n)
return n+n;
end
s("5", "6");
And you listen for a FunctionCallExpression, you don't get a callback. However, if your file contents are:
def s(n)
return n+n;
end
s(s("5"))
you do get the callback.
Your input:
s("5", "6");
is matched by the statement (not an expression!):
functionCall
: Identifier '(' exprList? ')' #identifierFunctionCall
| ...
;
and "5", "6" are two expressions matched by exprList.
The first s in your input s(s("5")) will again match identifierFunctionCall, and the inner s will be matched as an expression (a functionCallExpression to be precise).
Here are the different parse trees:
s("5", "6");
'- parse
|- block
| '- statement
| |- identifierFunctionCall
| | |- s
| | |- (
| | |- exprList
| | | |- stringExpression
| | | | '- "5"
| | | |- ,
| | | '- stringExpression
| | | '- "6"
| | '- )
| '- ;
'- <EOF>
s(s("5"));
'- parse
|- block
| '- statement
| |- identifierFunctionCall
| | |- s
| | |- (
| | |- exprList
| | | '- functionCallExpression
| | | '- identifierFunctionCall
| | | |- s
| | | |- (
| | | |- exprList
| | | | '- stringExpression
| | | | '- "5"
| | | '- )
| | '- )
| '- ;
'- <EOF>
In short: the grammar works as it is supposed to.
EDIT
A valid TL script is a code block where each code block consists of statements. To simplify the grammar and eliminate some ambiguous rules (which was needed for the older ANTLRv3), it was easiest to not allow a statement to be a simple expression. For example, the following code is not a valid TL script:
1 + 2;
I.e. 1 + 2 is not a statement, but an expression.
However, a function call can be a statement, and when it is placed on the right-hand side of an assignment statement, it can also be an expression:
foo(); // this is a statement
i = foo(); // now foo is an expression
That is why you observed one s(...) to trigger a certain enter...() method, while the other did not.
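The difference can be checked with a small Python sketch that transcribes the two parse trees shown earlier into nested tuples and lists the rule names a walker would enter. This is a stand-in for a real ANTLR listener, not generated code, and the helper names node, call and entered_rules are hypothetical.

```python
def node(name, *children):
    """A parse-tree node as a (name, children) pair."""
    return (name, list(children))


def call(*args):
    """identifierFunctionCall for a call to s with the given arguments."""
    return node("identifierFunctionCall",
                node("s"), node("("), node("exprList", *args), node(")"))


def script(statement_call):
    """Wrap one top-level call in parse -> block -> statement."""
    return node("parse",
                node("block", node("statement", statement_call, node(";"))),
                node("<EOF>"))


five = node("stringExpression", node('"5"'))
six = node("stringExpression", node('"6"'))

# s("5", "6");
tree_two_args = script(call(five, node(","), six))
# s(s("5"));
tree_nested = script(call(node("functionCallExpression", call(five))))


def entered_rules(tree):
    """Depth-first list of node names, in the order a walker would enter them."""
    name, children = tree
    names = [name]
    for child in children:
        names.extend(entered_rules(child))
    return names


print(entered_rules(tree_two_args).count("functionCallExpression"))  # 0
print(entered_rules(tree_nested).count("functionCallExpression"))    # 1
```

Only the nested call is wrapped in a functionCallExpression node, so a listener watching for that rule fires once for the second script and never for the first.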
I'm trying to write a grammar for ParseKit to be used in my iPhone app. Am I doing this correctly?
#start = wff;
wff = disjunction ('IMPLIES' | disjunction);
disjunction = conjunction ('OR' | conjunction)*;
conjunction = notexpression ('AND' | notexpression)*'
notexpression = ('NOT')+ primaryexpression;
primaryexpression = (literal | '(' wff ')');
literal = (A | B | C | D | E | F | G | H | I | J | K | L | M | N |O | P | Q | R | S | T | U | V | W | X | Y | Z);
I am getting the error:
2012-11-26 10:41:06.348 SemanticTab[4092:c07] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Could not build ClassName from token array for parserName: conjunction'
*** First throw call stack:
This happens when trying to parse P OR Q.
Developer of ParseKit here.
I see two obvious problems:
The line with the conjunction production definition is terminated with a ' (single quote). That should instead be a ; (semicolon).
The definition for the literal production is not valid. There are no productions called A, B, C, etc. defined. However, if I understand your intention, the easier way to define literal is to use the built-in Word production:
literal = Word;
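As a way to experiment with the intended language outside ParseKit, here is a minimal recursive-descent sketch in Python. It is not ParseKit code and does not use its API. It rests on assumptions that go beyond the two fixes above and are only my reading of the intent: each operator is followed by its operand ('OR' conjunction rather than 'OR' | conjunction), IMPLIES and NOT are optional, and a literal is any word that is not a keyword. The Parser class and the tokenize and parse helpers are hypothetical names.

```python
import re

KEYWORDS = {"IMPLIES", "OR", "AND", "NOT"}


def tokenize(text):
    # Words and parentheses only; any other character is silently skipped here.
    return re.findall(r"[A-Za-z]+|[()]", text)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def wff(self):
        # wff = disjunction ('IMPLIES' disjunction)?   (assumed form)
        left = self.disjunction()
        if self.peek() == "IMPLIES":
            self.take()
            return ("IMPLIES", left, self.disjunction())
        return left

    def disjunction(self):
        # disjunction = conjunction ('OR' conjunction)*   (assumed form)
        left = self.conjunction()
        while self.peek() == "OR":
            self.take()
            left = ("OR", left, self.conjunction())
        return left

    def conjunction(self):
        # conjunction = notexpression ('AND' notexpression)*   (assumed form)
        left = self.notexpression()
        while self.peek() == "AND":
            self.take()
            left = ("AND", left, self.notexpression())
        return left

    def notexpression(self):
        # notexpression = 'NOT'* primaryexpression   (assumed form)
        if self.peek() == "NOT":
            self.take()
            return ("NOT", self.notexpression())
        return self.primary()

    def primary(self):
        # primaryexpression = Word | '(' wff ')'
        if self.peek() == "(":
            self.take("(")
            inner = self.wff()
            self.take(")")
            return inner
        tok = self.take()
        if tok in KEYWORDS or tok == ")":
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.wff()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("P OR Q"))  # ('OR', 'P', 'Q')
```

If the sketch accepts an input that the ParseKit grammar rejects, the difference is likely to come from one of the assumed forms noted in the comments.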