Antlr evaluation order - sequence

I defined the following expression rule using ANTLR 4 for a scripting language.
Basically, I am trying to evaluate
x = y.z.aa * 6
The correct evaluation order should be y.z first, then y.z.aa, and then the multiplication by 6:
((y.z).aa) * 6
However, after parsing, aa * 6 is evaluated first, then z.(aa * 6), then y.(z.(aa * 6)), so it becomes
y.(z.(aa * 6))
The square-bracket form is evaluated correctly:
x = y[z][aa] * 6
Can anyone help point out what I did wrong in the dot access rule?
expression
: primary #PrimaryExpression
| expression ('.' expression ) + #DotAccessExpression
| expression ('[' expression ']')+ #ArrayAccessExpression
| expression ('*'|'/') expression #MulExpression
| expression ('+'|'-') expression #AddExpression
;
primary
: '(' expression ')'
| literal
| ident
;
literal
: NUMBER
| STRING
| NULL
| TRUE
| FALSE
;

You used the following rule:
expression ('.' expression)+
This rule does not fit the syntax pattern for a binary expression, so it's actually getting treated as a suffix expression. In particular, the expression following a . character is no longer restricted within the precedence hierarchy, which is why aa * 6 ends up as the operand to the right of the dot. You may be additionally affected by issue #679, but the real resolution is the same either way. You need to replace this alternative with the following:
expression '.' expression
The same goes for the ArrayAccessExpression, which should be written as follows:
expression '[' expression ']' #ArrayAccessExpression
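To make the intended grouping concrete, here is a minimal Python sketch of precedence climbing. It is not how ANTLR is implemented; the PRECEDENCE table and the helper names are assumptions made for illustration. It only shows what "binary, left-associative, with . binding tighter than *" means for the input from the question.

```python
import re

# Binding powers are an assumption of this sketch: higher binds tighter.
PRECEDENCE = {".": 3, "*": 2, "/": 2, "+": 1, "-": 1}

def tokenize(text):
    # Identifiers, integers, and single-character operators/parentheses.
    return re.findall(r"[A-Za-z_]\w*|\d+|[.*/+\-()]", text)

def parse_primary(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        inner = parse(tokens)
        tokens.pop(0)  # consume ')'
        return inner
    return tok

def parse(tokens, min_prec=1):
    """Precedence climbing; every operator here is left-associative."""
    left = parse_primary(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # The right operand may only contain operators that bind tighter
        # than op, so "aa * 6" can never become the operand of '.'.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = f"({left}{op}{right})"
    return left

print(parse(tokenize("y.z.aa * 6")))  # prints (((y.z).aa)*6)
```

The restriction on the right operand is the part that the ('.' expression)+ form loses: there, the expression after the dot is parsed without any precedence limit.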

Related

Antlr4 parser not parsing reassignment statement correctly

I've been creating a grammar parser using ANTLR 4 and wanted to add variable reassignment (without having to declare a new variable).
I've tried changing the reassignment statement to be an expression, but that didn't change anything.
Here's a shortened version of my grammar:
grammar MyLanguage;
program: statement* EOF;
statement
: expression EOC
| variable EOC
| IDENTIFIER ASSIGNMENT expression EOC
;
variable: type IDENTIFIER (ASSIGNMENT expression)?;
expression
: STRING
| INTEGER
| IDENTIFIER
| expression MATH expression
| ('+' | '-') expression
;
MATH: '+' | '-' | '*' | '/' | '%' | '//' | '**';
ASSIGNMENT: MATH? '=';
EOC: ';';
WHITESPACE: [ \t\r\n]+ -> skip;
STRING: '"' (~[\u0000-\u0008\u0010-\u001F"] | [\t])* '"' | '\'' (~[\u0000-\u0008\u0010-\u001F'] | [\t])* '\'';
INTEGER: '0' | ('+' | '-')? [1-9][0-9]*;
IDENTIFIER: [a-zA-Z_][a-zA-Z0-9_]*;
type: 'str';
If anything else might be of relevance, please ask.
So I tried to parse
str test = "empty";
test = "not empty";
which worked, but when I tried (part of the Fibonacci function)
temp = n1;
n1 = n1 + n2;
n2 = temp;
I got an error, and it was parsed as
temp = n1; //statement
n1 = n1 //statement - <missing ';'>
+n2; //statement
n2 = temp; //statement
Your problem has nothing to do with assignment statements. Additions simply don't work at all - whether they're part of an assignment or not. So the simplest input to get the error would be x+y;. If you print the token stream for that input (using grun with the -tokens option for example), you'll get the following output:
[#0,0:0='x',<IDENTIFIER>,1:0]
[#1,1:1='+',<'+'>,1:1]
[#2,2:2='y',<IDENTIFIER>,1:2]
[#3,3:3=';',<';'>,1:3]
[#4,4:3='<EOF>',<EOF>,1:4]
line 1:1 no viable alternative at input 'x+'
Now compare this to x*y;, which works fine:
[#0,0:0='x',<IDENTIFIER>,1:0]
[#1,1:1='*',<MATH>,1:1]
[#2,2:2='y',<IDENTIFIER>,1:2]
[#3,3:3=';',<';'>,1:3]
[#4,4:3='<EOF>',<EOF>,1:4]
The important difference here is that * is recognized as a MATH token, but + isn't. It's recognized as a '+' token instead.
This happens because you introduced a separate '+' (and '-') token type in the alternative | ('+' | '-') expression. So whenever the lexer sees a + it produces a '+' token, not a MATH token, because string literals in parser rules take precedence over named lexer rules.
If you turn MATH into a parser rule math (or maybe mathOperator) instead, all of the operators will be literals and the problem will go away. That said, you probably don't want a single rule for all math operators because that doesn't give you the precedence you want, but that's a different issue.
PS: Something like x+1 still won't work because it will see +1 as a single INTEGER token. You can fix that by removing the leading + and - from the INTEGER rule (that way x = -2 would be parsed as a unary minus applied to the integer 2 instead of just the integer -2, but that's not a problem).
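The token selection described above can be imitated with a small Python sketch. This is only a model, not ANTLR itself: it assumes the longest match wins and that ties go to the rule listed first, with the implicit literal tokens ('+', '-', 'str') placed ahead of the named lexer rules. STRING is left out for brevity, and the RULES and UNSIGNED names are made up for this example.

```python
import re

# Rules in priority order. Assumption of this sketch: the implicit tokens
# created for literals in parser rules come first, followed by the named
# lexer rules in grammar order.
RULES = [
    ("'+'", r"\+"),
    ("'-'", r"-"),
    ("'str'", r"str"),
    ("MATH", r"\*\*|//|[+\-*/%]"),
    ("ASSIGNMENT", r"(?:\*\*|//|[+\-*/%])?="),
    ("EOC", r";"),
    ("INTEGER", r"0|[+-]?[1-9][0-9]*"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
]

def tokenize(text, rules=RULES):
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():  # WHITESPACE -> skip
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule stays.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Same rules, but INTEGER without the optional sign (the fix from the PS).
UNSIGNED = [(n, r"0|[1-9][0-9]*") if n == "INTEGER" else (n, p) for n, p in RULES]

print(tokenize("x+y;"))
print(tokenize("x*y;"))
print(tokenize("x+1;"))
print(tokenize("x+1;", UNSIGNED))
```

In this model the + in x+y; comes out with the implicit '+' type while the * in x*y; is a MATH token, and x+1; yields a single INTEGER token +1 until the sign is removed from the INTEGER rule.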

Extend Antlr grammar file with lists

I have the following assignment about extending an Antlr grammar.
What I've tried is:
I am not sure whether this is the correct solution or not. Can anyone guide me in the right direction?
There are two problems here: (1) you have the same alt label (# Lists) twice, and (2) you only allow zero or one expression in your list. It should be this:
expr
: ...
| '(' expr ')' # Parenthesis
| '[' ( expr ( ',' expr )* )? ']' # Lists
;
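As a rough illustration of what the '[' ( expr ( ',' expr )* )? ']' alternative accepts, here is a hand-written Python sketch of the same shape. Since the other alternatives are elided above, it assumes integer literals as the only remaining kind of expr; the function names are hypothetical.

```python
import re

def tokenize(text):
    # Integers plus the punctuation used by the two alternatives.
    return re.findall(r"\d+|[\[\],()]", text)

def parse_expr(tokens):
    tok = tokens.pop(0)
    if tok == "[":
        items = []
        if tokens[0] != "]":            # the optional ( ... )? part
            items.append(parse_expr(tokens))
            while tokens[0] == ",":     # the ( ',' expr )* part
                tokens.pop(0)
                items.append(parse_expr(tokens))
        if tokens.pop(0) != "]":
            raise SyntaxError("expected ']'")
        return items
    if tok == "(":
        inner = parse_expr(tokens)
        if tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return inner
    return int(tok)  # assumption: integer literals stand in for the elided alternatives

print(parse_expr(tokenize("[1, [2, 3], (4)]")))  # prints [1, [2, 3], 4]
print(parse_expr(tokenize("[]")))                # prints []
```

The empty list, a single element, and any number of comma-separated elements all go through the same branch, which is what the optional group around expr ( ',' expr )* expresses.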

How to fix this yacc shift/reduce conflict

I have this grammar
value
: INTEGER
| REAL
| LEFTBRACKET value RIGHTBRACKET
| op expression
| expression binaryop expression
;
and I am getting this shift/reduce conflict:
47 expression: value .
53 value: LEFTBRACKET value . RIGHTBRACKET
RIGHTBRACKET shift, and go to state 123
RIGHTBRACKET [reduce using rule 47 (expression)]
$default reduce using rule 47 (expression)
So far I have tried setting %left and %right priorities, with no luck. I have also tried to use a new grammar for value that does not refer to itself, but I still get conflicts. I tried this solution too.
Any thoughts?
Thank you in advance.
EDIT
expression
: lvalue
| value
;
lvalue
: IDENTIFIER
| lvalue LEFTSQBRACKET expression RIGHTSQBRACKET
| LEFTBRACKET lvalue RIGHTBRACKET
;
binaryop
: PLUS
| MINUS
| MUL
| DIVISION
| DIV
| MOD
;
I managed to overcome most of the conflicts using this grammar, but I still get the conflict I mentioned above:
binaryop
: expression PLUS expression
| expression MINUS expression
| expression MUL expression
| expression DIVISION expression
| expression DIV expression
| expression MOD expression
;
Why do you have both value and expression? Without seeing the rest of the grammar, I hesitate to guess the use of expression which leads to that conflict, but my guess is that it has to do with the unnecessary unit production.
On the other hand, you will not be able to resolve precedences if you lump all operator terminals into binaryop (unless all binary operators have the same precedence). So I'd suggest you find a standard expression grammar (such as the one in the bison manual or on Wikipedia) and use it as a base.

ANTLR4 - How do I get the token TYPE as the token text in ANTLR?

Say I have a grammar that has tokens like this:
AND : 'AND' | 'and' | '&&' | '&';
OR : 'OR' | 'or' | '||' | '|' ;
NOT : 'NOT' | 'not' | '~' | '!';
When I visualize the ParseTree using TreeViewer or print the tree using tree.toStringTree(), each node's text is the same as what was matched.
So if I parse "A and B or C", the two binary operators will be "and" / "or".
If I parse "A && B || C", they'll be "&&" / "||".
What I would LIKE is for them to always be "AND" / "OR" / "NOT", regardless of what literal symbol was matched. Is this possible?
This is what the vocabulary is for. Use yourLexer.getVocabulary() or yourParser.getVocabulary() and then vocabulary.getSymbolicName(tokenType) for the text representation of the token type. If that returns an empty string, try vocabulary.getLiteralName(tokenType) as a second step, which returns the text used to define the token.

Objective-C operator precedence of square brackets used as message expression/notation?

Does the Objective-C message expression (message notation) operator, which uses square brackets [], have the same precedence as the C operator for array subscripting, which also uses square brackets []?
I refer to this table of C operators.
Also, an analogous question applies to the Objective-C "dot syntax" operator for accessor method invocation compared to the C operator for "element selection by reference". Do they have the same precedence?
I searched for an hour for a straightforward, definitive answer to this basic question. Surprisingly, I did not find one. Hence, this question. Links welcome.
You have become confused because many explanations of grammars and precedence take the shortcut of saying that operators have precedence. They don't. It is productions in the grammar that have precedence, and they have precedence relative to other productions. It is only meaningful for two productions to have precedence relative to each other if the grammar is ambiguous (meaning it can produce two different parse trees for the same input), and if the ambiguity is resolved by specifying the precedence of one production over the other.
Let me explain with an example.
Here's a toy grammar:
expression =
| IDENTIFIER
| NUMBER
| expression '+' expression
| expression '*' expression
| expression '(' expression ')' // function call
| '(' expression ')' // grouping
| expression '[' expression ']' // array subscript
| '[' expression IDENTIFIER ':' expression ']' // message send
;
Now, consider parsing 1 + 2 * 3 with this grammar. There are two valid parse trees:
  +            *
 / \          / \
1   *        +   3
   / \      / \
  2   3    1   2
By specifying that the * production has a higher precedence than the + production, we require the parser to produce the left tree instead of the right tree. Thus the idea of a precedence relationship between the + production and the * production makes sense: it has an effect on the parser's output.
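The ambiguity can be checked mechanically. The following Python sketch enumerates every tree that the ambiguous binary-operator productions allow for a flat token list; it is a toy enumerator covering only operands and binary operators, and parse_trees is a name invented for this example.

```python
def parse_trees(tokens):
    """Return every tree that the ambiguous rule
    expression: expression op expression | NUMBER
    can build for a flat token list such as ['1', '+', '2', '*', '3']."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Operators sit at the odd positions; try each one as the root.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append(f"({left} {tokens[i]} {right})")
    return trees

for tree in parse_trees(["1", "+", "2", "*", "3"]):
    print(tree)
# prints:
# (1 + (2 * 3))
# ((1 + 2) * 3)
```

Giving the * production higher precedence amounts to discarding the second tree and keeping only the first.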
Similarly, 1 + foo(3) has two parse trees:
  +              ()
 / \            /  \
1   ()         +    3
   /  \       / \
 foo   3     1   foo
So again the idea of a precedence relationship between the '+' production and the function call production makes sense. The case of 1 + foo[3] (which uses the subscript production in place of the function call production) is analogous, so it makes sense to specify a precedence relationship between the '+' production and the subscript production.
Now consider 1 + (2 * 3). The grammar can only produce one possible parse tree:
  +
 / \
1  ( )
    |
    *
   / \
  2   3
There is no need for a precedence relationship between the + production and the grouping production, because there is only one way to parse this input. It would be meaningless to specify that the grouping production has higher precedence than the + production, because there is no other parse tree that you could produce by doing so.
Finally, consider 1 + [2 add:3]. This is analogous to the grouping example. There is only one possible parse tree:
    +
   / \
  /   \
 1    [ ]
     / | \
    /  |  \
   2  add  3
No other parse tree is possible. There is no need to specify a precedence relationship between the + production and the message send production. Specifying a precedence relationship between them would have no effect, because the grammar simply doesn't allow this input to be parsed any other way.
They are in the same precedence group. I believe the message send [] is equivalent to () because the runtime treats the brackets as parentheses in the case of messages.
http://www.techotopia.com/index.php/Objective-C_2.0_Operator_Precedence