“list label” operator not working for a set of alternatives - antlr

It seems the following rule does not work in ANTLR 4:
testSetLabel
: (flags+=( 'A' | 'B' | 'C' | 'D' ))* ;
It will give this error:
TestSetLabelParser.java:69: error: incompatible types
((TestSetLabelContext)_localctx).flags = _input.LT(1);
^
required: List<Token>
found: Token
If I change the rule to this:
testSetLabel2
: ( flags+= 'A' | flags+='B' | flags+='C' | flags+='D' )* ;
I get the warning: 'Factor label out of set'
Is this a bug or expected behavior?

It sounds like a bug. The = operator works, as in the following case.
flags=('A' | 'B' | 'C' | 'D')
The message you are seeing is only a performance suggestion, so I would use the working method for now and factor the label out of the set when ANTLR 4.1 comes out at the end of June.
Here is the issue report

Related

Antlr4 instruction keywords and longest-statement matching

I am attempting to write a grammar, but I've found a problem occurring that I'm not quite sure how to solve 'elegantly'.
The issue is that I have 'bro' as a reserved instruction keyword, and it can optionally be followed by a predication statement, i.e. 'bro_t' or 'bro'.
Now, the issue is that currently 'bro_t' matches the definition for ID, while 'bro' is a token by itself, and clearly 'bro_t' is longer than 'bro', so the parser matches that statement to an ID and the parse fails. The solutions that I have come up with are to make 'bro_t' and 'bro_f' reserved as well, but that would be relatively time consuming for the entire instruction set. The other solution that I was looking at was wildcard operators, but I don't really understand if they are applicable here and if so how to apply them.
Grammar:
predicate
: '_t' '<' register '>' | '_f' '<' register '>' | ;
operation
: 'bro' predicate ;
ID: ('a' .. 'z' | 'A' .. 'Z' | '_') ( 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '$' | '.')* ;
Why not do:
operation
: BRO '<' register '>'
;
BRO : 'bro' ( '_' [a-z]+ )? ;
ID : [a-zA-Z_] [a-zA-Z0-9_$.]*;
?
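The longest-match behaviour described in the question, and the effect of the suggested BRO rule, can be illustrated with a small Python sketch. This is not ANTLR itself: the rule tables and the longest_match helper are hypothetical stand-ins that only mimic the "longest match wins, ties go to the earlier rule" policy of an ANTLR lexer.

```python
import re

ID_PATTERN = r"[A-Za-z_][A-Za-z0-9_$.]*"

# Hypothetical rule tables as (name, regex) pairs.
# Earlier entries win ties between equal-length matches.
ORIGINAL_RULES = [("BRO", r"bro"), ("ID", ID_PATTERN)]
SUGGESTED_RULES = [("BRO", r"bro(_[a-z]+)?"), ("ID", ID_PATTERN)]

def longest_match(text, rules, pos=0):
    """Return (rule_name, lexeme) for the longest match starting at pos."""
    best = None
    for name, pattern in rules:
        m = re.compile(pattern).match(text, pos)
        # Strict '>' keeps the earlier rule when two matches have equal length.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

print(longest_match("bro", ORIGINAL_RULES))     # ('BRO', 'bro')
print(longest_match("bro_t", ORIGINAL_RULES))   # ('ID', 'bro_t')
print(longest_match("bro_t", SUGGESTED_RULES))  # ('BRO', 'bro_t')
```

One consequence of the suggested rule is that any identifier of the form bro_ followed by lowercase letters would also be lexed as BRO, so it only fits if such identifiers are not needed elsewhere.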

XText datatype definition and usage

I want to build an Editor for a language with different groups of variable types, but have problems with the generated content assistant.
Type:
'TYPE' ':' name=ID '(' type=[ANY] ')' ';'
;
ANY:
ANY_NUM | Type
;
ANY_NUM:
ANY_REAL | ANY_INT ...
;
ANY_REAL:
'real' | 'float'
;
ANY_INT:
'int' | 'sint' | 'lint'
;
The idea is that specific types are not allowed everywhere, so in some cases I want to use, for example, type=(ANY_REAL). The generated content assistant does not show anything here, so I want to know whether this is the correct approach to specifying variable types and groups.
OK. The answer is quite simple. Each Variable type has to be defined within an enum (EnumRule), the structure itself is a simple type reference (ParserRule):
TR_Any:
TR_AnyDerived | TR_AnyElementary
;
TR_AnyDerived:
...
;
TR_AnyElementary:
TR_AnyReal | TR_AnyInt |...
;
TR_AnyReal:
type = E_AnyReal
;
TR_AnyInt:
type = E_AnyInt
;
enum E_AnyReal:
FLOAT = "float" |
DOUBLE = "double" |
...
;
enum E_AnyInt:
INT = "int"
;
The types can be referenced as described in the xtext documentation:
MyRule:
anyvar = [TR_Any]
intvar = [TR_AnyInt]
;

Lvalue awareness in ANTLR grammar and syntax predicates

I am implementing a parser for D with ANTLR. The language is based on C, so there is some ambiguity between declarations and expressions. Consider this:
a* b = c; // This is a declaration of the variable b with a pointer-to-a type.
c = a * b; // As an expression, this is a multiplication.
As the second example could only appear on the right of an assignment expression I tried to resolve this problem with the following snippet:
expression
: left = assignOrConditional
(',' right = assignOrConditional)*
;
assignOrConditional
: ( postfixExpression ('=' | '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '~=' | '<<=' | '>>=' | '>>>=' | '^^=') )=> assignExpression
| conditionalExpression
;
assignExpression
: left = postfixExpression
( op = ('=' | '+=' | '-=' | '*=' | '/=' | '%=' | '&=' | '|=' | '^=' | '~=' | '<<=' | '>>=' | '>>>=' | '^^=')
right = assignOrConditional
)?
;
conditionalExpression
: left = logicalOrExpression
('?' e1 = conditionalExpression ':' e2 = conditionalExpression)?
;
As far as my understanding goes, this should do the trick to avoid the ambiguity but the tests are failing. If I feed the interpreter with any input, starting with the rule assignOrConditional, it will fail with NoViableAltException.
The inputs were:
a = b
b-=c
d
Maybe I'm misunderstanding how predicates work, so it would be great if someone could correct my explanation of the code: if the input can be read as a postfixExpression, it will check whether the next token after the postfixExpression is one of the assignment operators, and if it is, it will parse the rule as an assignmentExpression. (Note that the assignmentExpression and the conditionalExpression work well.) If the next token isn't one of them, it tries to parse it as a conditionalExpression.
EDIT
[solved] Now there's another problem with this solution that I have realized: if assignments are chained, the assignmentExpression has to decide whether its right-hand expression is an assignment again (that is, whether a postfix expression and an assignment operator follow).
Any idea what's wrong with my understanding?
If I feed the interpreter with any input, ...
Don't use ANTLRWorks' interpreter: it is buggy, and disregards any type of predicate. Use its debugger: it works flawlessly.
If the input can be read as a postfixExpression it will check if the next token after the postfixExpression is one of the assignment operators and if it is, it will parse the rule as an assignmentExpression.
You are correct.
EDIT [solved] Now there's another problem with this solution that I have realized: if assignments are chained, the assignmentExpression has to decide whether its right-hand expression is an assignment again (that is, whether a postfix expression and an assignment operator follow).
What's wrong with that?
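The predicate behaviour confirmed above (look ahead for a postfix expression followed by an assignment operator, then commit to one alternative) can be sketched in Python as a speculative parse that rewinds. This is only an illustration under simplifying assumptions: the MiniParser class is hypothetical, the input is already split into tokens, a postfix expression is reduced to a single identifier, and only a few assignment operators are listed.

```python
ASSIGN_OPS = {"=", "+=", "-=", "*=", "/="}  # shortened operator list (assumption)

class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def postfix(self):
        """Simplified postfixExpression: a single identifier."""
        tok = self.peek()
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def looks_like_assignment(self):
        """Emulates ( postfixExpression assignOp )=> : try it, then rewind."""
        start = self.pos
        try:
            self.postfix()
            return self.peek() in ASSIGN_OPS
        except SyntaxError:
            return False
        finally:
            self.pos = start  # a predicate never consumes input

    def assign_or_conditional(self):
        if self.looks_like_assignment():
            left = self.postfix()
            op = self.tokens[self.pos]
            self.pos += 1
            # Recursing here re-runs the predicate, so chained assignments work.
            right = self.assign_or_conditional()
            return ("assign", op, left, right)
        return ("cond", self.postfix())  # stand-in for conditionalExpression

def parse(text):
    return MiniParser(text.split()).assign_or_conditional()

print(parse("a = b"))      # ('assign', '=', 'a', ('cond', 'b'))
print(parse("d"))          # ('cond', 'd')
print(parse("a = b = c"))  # ('assign', '=', 'a', ('assign', '=', 'b', ('cond', 'c')))
```

The last call shows why the chained case is not a problem in this sketch: the right-hand side goes back through the same predicated rule, so each level decides again between assignment and conditional.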

Unambiguous grammar for arithmetic expression with Unary + and -

I have just started self-studying the Dragon book on compiler design. I am working on a problem that asks for a grammar for expressions containing binary +, -, *, / and unary +, -.
I came up with the following:
E -> E+T | E-T | T
T -> T*P | T/P | P
P -> +S | -S | S
S -> id | constant | (E)
However, there is an obvious flaw in it. According to this grammar, expressions like
1--3
are valid, which is an error in all programming languages I know. Though, expressions like
1+-+3
and
1- -3
must be valid. How can such a grammar be designed?
I believe your problem is with tokenization. You are identifying 1--3 as an error because you think it should be resolved as 1 --3 rather than 1 - -3, the latter being perfectly valid. So I think your problem arises because when you tokenize the string you are getting:
['1', '-', '-' , '3']
rather than:
['1', '--', '3']
I think you have one extra production rule
P -> +S | -S | S
S -> id | constant | (E)
can be reduced to
P -> +P | -P | id | constant | (E)
With this grammar you will successfully match the expression "1+-+3" as valid.
You have a tokenizer (scanner) problem. Before passing tokens to the parser, you must differentiate between "-" and "--". Define a token struct that contains a token type and a value, then parse the token list.
Also, the rule P -> --S must be added to the production rules.
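The two points raised in the answers (tokenize "--" as a single token, and use the reduced rule P -> +P | -P | id | constant | (E)) can be tried out with a small recursive-descent recognizer in Python. This is a sketch under assumptions: the names tokenize, ExprParser and is_valid are made up for illustration, left recursion in E and T is replaced by loops, and "--" (and "++") are lexed as single tokens for which this grammar deliberately has no production.

```python
import re

# '--' and '++' are listed before the single-character operators so the
# scanner prefers them, mirroring a longest-match tokenizer (assumption).
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|--|\+\+|[-+*/()])")
OPERAND_RE = re.compile(r"\d+|[A-Za-z_]\w*")

def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class ExprParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_e(self):  # E -> T (('+' | '-') T)*
        self.parse_t()
        while self.peek() in ("+", "-"):
            self.advance()
            self.parse_t()

    def parse_t(self):  # T -> P (('*' | '/') P)*
        self.parse_p()
        while self.peek() in ("*", "/"):
            self.advance()
            self.parse_p()

    def parse_p(self):  # P -> +P | -P | id | constant | (E)
        tok = self.advance()
        if tok in ("+", "-"):
            self.parse_p()
        elif tok == "(":
            self.parse_e()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok is None or not OPERAND_RE.fullmatch(tok):
            raise SyntaxError(f"unexpected token {tok!r}")

def is_valid(text):
    try:
        parser = ExprParser(tokenize(text))
        parser.parse_e()
        return parser.peek() is None  # all input must be consumed
    except SyntaxError:
        return False

print(tokenize("1--3"))   # ['1', '--', '3']
print(is_valid("1--3"))   # False
print(is_valid("1- -3"))  # True
print(is_valid("1+-+3"))  # True
```

In this sketch 1--3 is rejected by the grammar, not by the scanner: the scanner produces a single "--" token, and no production accepts it. Adding a rule such as P -> --S would make it valid again, which is the design choice the last answer points at.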

ANTLR 3.x - How to format rewrite rules

I'm finding myself challenged on how to properly format rewrite rules when certain conditions occur in the original rule.
What is the appropriate way to rewrite this:
unaryExpression: op=('!' | '-') t=term
-> ^(UNARY_EXPR $op $t)
ANTLR doesn't seem to like me labeling anything in parentheses, and "op=" fails. Also, I've tried:
unaryExpression: ('!' | '-') t=term
-> ^(UNARY_EXPR ('!' | '-') $t)
Antlr doesn't like the or '|' and throws a grammar error.
Replacing the character class with a token name does solve this problem, however it creates a quagmire of other issues with my grammar.
--- edit ----
A second problem has been added. Please help me format this rule with tree grammar:
multExpression
: unaryExpression (MULT_OP unaryExpression)*
;
Pretty simple: My expectation is to enclose every matched token in a parent (imaginary) token MULT so that I end up with something like:
MULT
o
|
o---o---o---o---o
| | | | |
'3' '*' '6' '%' 2
unaryExpression
: (op='!' | op='-') term
-> ^(UNARY_EXPR[$op] $op term)
;
I used the UNARY_EXPR[$op] so the root node gets some useful line/column information instead of defaulting to -1.