Antlr tokens with multiple assignments

I have a token for which I want two assignments to be valid, and I'm trying to figure out the best way to do it.
For example, I have:
TOSTRING = 'tostring'
But I also want 'toString' to be valid like so:
TOSTRING = 'toString'
What is the best way to achieve this?
EDIT:
I want it to be written to the *.tokens file as
TOSTRING=9
'toString'=9
'tostring'=9
My code that uses the language relies on this structure, and putting TOSTRING='tostring' in the tokens { } section generates it. Even a lexer rule with a single assignment does this. But when I have multiple assignments, tokens are not made for 'toString' or 'tostring'.

In general, don't use the tokens section, as you lose some control of the lexer. Always use real lexer rules. The tokens section just adds lexer rules automatically anyway. There is no difference, except that you start to run into its limitations when you want more than just a simple string.
If you want case independence, then see the article here:
How do I get Case independence?
But implement it via the override of LA() (described there) and not the 'A'|'a' methods, which will generate lots of code you don't need. If it is JUST this camel case then:
TOSTRING
: 'to' ('s' | 'S') 'tring'
;
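
The LA() override mentioned above can be sketched without the ANTLR runtime. The class below is a hypothetical stand-in: the name CaseFoldingChars and its methods are assumptions for illustration, not ANTLR's API. In ANTLR 3 the same idea is usually applied by subclassing the runtime's string stream and overriding its LA() method. What it tries to show is that only the lookahead is folded to one case, so the lexer needs just the lower-case literal, while the original text is kept for the token.

```java
// Hypothetical stand-in for a character stream; not part of the ANTLR runtime.
final class CaseFoldingChars {
    static final int EOF = -1;

    private final String data; // original input, case preserved
    private int p = 0;         // index of the next character to consume

    CaseFoldingChars(String input) {
        this.data = input;
    }

    // Lookahead used for matching: 1-based, folded to lower case.
    int LA(int i) {
        int idx = p + i - 1;
        if (i < 1 || idx >= data.length()) {
            return EOF;
        }
        return Character.toLowerCase(data.charAt(idx));
    }

    void consume() {
        if (p < data.length()) {
            p++;
        }
    }

    // Try to match a lower-case literal at the current position.
    // Returns the original (unfolded) text on success, or null on failure.
    String match(String lowerCaseLiteral) {
        for (int i = 0; i < lowerCaseLiteral.length(); i++) {
            if (LA(i + 1) != lowerCaseLiteral.charAt(i)) {
                return null;
            }
        }
        int start = p;
        for (int i = 0; i < lowerCaseLiteral.length(); i++) {
            consume();
        }
        return data.substring(start, p);
    }
}
```

With this sketch, new CaseFoldingChars("toString").match("tostring") returns the original text toString. Note that folding accepts every casing (TOSTRING, ToStRiNg, and so on), which is broader than the two spellings asked for in the question.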

The fastest way is to define the lexer rule TOSTRING to accept both:
TOSTRING
: 'tostring' //alternative #1, lower-case 's'
| 'toString' //alternative #2, upper-case 'S'
;
or the equivalent:
TOSTRING
: 'to' ('s' | 'S') 'tring'
;

Related

How do I replace a missing optional token with a default value?

Given the rules:
list: 'LIST' code?;
code: 'CODE' CODE_VALUE;
CODE_VALUE: [0-9]+;
How can I set a default value for optional token code if not present?
The only similar question I could find is How do I replace a missing optional token with a default?, but it is 10 years old and tagged Antlr3.
Thanks.

In ANTLR v3 there was a possibility to rewrite the parse tree into an abstract (syntax) tree. While doing so, one had the ability to "inject" tokens during this process. Since ANTLR v4 does not have this tree rewriting ability*, the answer to your question is: you cannot set a default value/token. My question to you: why do you need this? What use-case is there to inject a token?
* theantlrguy.atlassian.net/wiki

This is a semantic task, not a syntactic one, and parsers are made to handle syntax. For any additional handling that goes beyond the language syntax, use the semantic phase after the parse phase, where you check for things like duplicate names, wrong types, etc.
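
A minimal sketch of that separation: the syntax step only records whether a code was written, and the default is applied afterwards by the code that consumes the result. The class ListCommand, the DEFAULT_CODE value and the hand-written token check are all assumptions for illustration. With an ANTLR 4 generated parser, the same decision would typically be made in a listener or visitor by checking whether the optional code child is present.

```java
import java.util.Optional;

// Hypothetical result of parsing "LIST" or "LIST CODE <digits>".
final class ListCommand {
    static final String DEFAULT_CODE = "0"; // assumed default, not from the question

    private final Optional<String> parsedCode; // empty when 'code' was omitted

    private ListCommand(Optional<String> parsedCode) {
        this.parsedCode = parsedCode;
    }

    // Syntax only: records whether a code was written, never invents one.
    static ListCommand parse(String input) {
        String[] tokens = input.trim().split("\\s+");
        if (tokens.length == 1 && tokens[0].equals("LIST")) {
            return new ListCommand(Optional.empty());
        }
        if (tokens.length == 3 && tokens[0].equals("LIST")
                && tokens[1].equals("CODE") && tokens[2].matches("[0-9]+")) {
            return new ListCommand(Optional.of(tokens[2]));
        }
        throw new IllegalArgumentException("syntax error: " + input);
    }

    // Semantics: the default is applied here, after parsing.
    String effectiveCode() {
        return parsedCode.orElse(DEFAULT_CODE);
    }
}
```

Keeping the default out of the parse result means the tree still reflects exactly what the user wrote, which can matter for error messages or pretty-printing.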

Xtext typesafe variable qualifier

I have an xtext grammar for a modelling language that has multiple types of variables. In some cases I want to delimit the type a variable can have.
The current workflow is to just use a VariableQualifier (like in the grammar below) and use a validator to only permit the type I want. Then every time I access the reference I have to explicitly cast it.
Is there a better solution?
VariableReference:
ref=[Variable]
;
VariableQualifier:
(namespace+=NamespaceReference '.')* element=VariableReference
;
EnumerationReference:
ref=[Enumeration]
;
EnumerationQualifier:
(namespace+=NamespaceReference '.')* element=EnumerationReference
;
NamespaceReference:
ref=[Namespace]
;

One general pattern for this sort of problem is to have one generic reference syntactically that points to the abstract super type for all possible targets (common super type of Variable|Enumeration|Namespace).
E.g.:
VariableReference:
ref=[AbstractElement] ({VariableReference.parent=current} '.' ref=[AbstractElement])*;
Also note that modeling and referencing namespaces is often not really needed. You can instead use fully qualified names of things.
E.g.
VariableReference:
ref=[AbstractElement|QualifiedName];

Is "Implicit token definition in parser rule" something to worry about?

I'm creating my first grammar with ANTLR and ANTLRWorks 2. I have mostly finished the grammar itself (it recognizes the code written in the described language and builds correct parse trees), but I haven't started anything beyond that.
What worries me is that every first occurrence of a token in a parser rule is underlined with a yellow squiggle saying "Implicit token definition in parser rule".
For example, in this rule, the 'var' has that squiggle:
variableDeclaration: 'var' IDENTIFIER ('=' expression)?;
How it looks exactly:
The odd thing is that ANTLR itself doesn't seem to mind these rules (when running the test rig, I can't see any of these warnings in the parser generator output, just something about an incorrect Java version being installed on my machine), so it's just ANTLRWorks complaining.
Is it something to worry about, or should I ignore these warnings? Should I declare all the tokens explicitly in lexer rules? Most examples in the official bible, The Definitive ANTLR Reference, seem to be done exactly the way I write the code.

I highly recommend correcting all instances of this warning in code of any importance.
This warning was created (by me actually) to alert you to situations like the following:
shiftExpr : ID (('<<' | '>>') ID)?;
Since ANTLR 4 encourages action code to be written in separate files in the target language instead of being embedded directly in the grammar, it's important to be able to distinguish between << and >>. If tokens are not explicitly created for these operators, they will be assigned arbitrary types and no named constants will be available for referencing them.
This warning also helps avoid the following problematic situations:
A parser rule contains a misspelled token reference. Without the warning, this could lead to silent creation of an additional token that may never be matched.
A parser rule contains an unintentional token reference, such as the following:
number : zero | INTEGER;
zero : '0'; // <-- this implicit definition causes 0 to get its own token
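
To illustrate why named token types matter for code kept outside the grammar, here is a minimal sketch based on the shift expression above. The constants LSHIFT and RSHIFT are hypothetical stand-ins for what a generated parser would expose if the grammar declared explicit lexer rules for << and >>; their numeric values are arbitrary assumptions.

```java
// Hypothetical stand-in for token type constants exposed by a generated parser.
final class ShiftTokens {
    static final int LSHIFT = 1; // assumed to come from a lexer rule for '<<'
    static final int RSHIFT = 2; // assumed to come from a lexer rule for '>>'

    private ShiftTokens() {}

    // Code outside the grammar can branch on the operator's token type by name.
    static int apply(int operatorType, int left, int right) {
        switch (operatorType) {
            case LSHIFT:
                return left << right;
            case RSHIFT:
                return left >> right;
            default:
                throw new IllegalArgumentException("not a shift operator: " + operatorType);
        }
    }
}
```

With implicit tokens there is no such name to switch on, so the code would have to compare the operator's text or rely on a generated name such as T__0, which can change when the grammar is edited.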

If you're writing a lexer grammar that won't be used across multiple parser grammars, then you can ignore this warning shown by ANTLRWorks 2.

NullPointerException with ANTLR text attribute

I have a problem that I've been stuck on for a while and I would appreciate some help if possible.
I have a few rules in an ANTLR tree grammar:
block
: compoundstatement
| ^(VAR declarations) compoundstatement
;
declarations
: (^(t=type idlist))+
;
idlist
: IDENTIFIER+
;
type
: REAL
| i=INTEGER
;
I have written a Java class VarTable that I will insert all of my variables into as they are declared at the beginning of my source file. The table will also hold their variable types (ie real or integer). I'll also be able to use this variable table to check for undeclared variables or duplicate declarations etc.
So basically I want to be able to send the variable type down from the 'declarations' rule to the 'idlist' rule and then loop through every identifier in the idlist rule, adding them to my variable table one by one.
The major problem is that I get a NullPointerException when I try to access the 'text' attribute of the $t variable in the 'declarations' rule (this is the one which refers to the type).
And yet if I try and access the 'text' attribute of the $i variable in the 'type' rule, there's no problem.
I have looked at the place in the Java file where the NullPointerException is being generated and it still makes no sense to me.
Is it a problem with the fact that there could be multiple types because the rule is
(^(type idlist))+
??
I have the same issue when I get down to the idlist rule, because I'm unsure how I can write an action that will allow me to loop through all of the IDENTIFIER tokens found.
Grateful for any help or comments.
Cheers

You can't reference the attributes from production rules like you tried inside tree grammars, only in parser (or combined) grammars (they're different objects!). Note that INTEGER is not a production rule, just a "simple" token (terminal). That's why you can invoke its .text attribute.
So, if you want to get hold of the text of the type rule in your tree grammar and print it in your declarations rule, you could do something like this:
tree grammar T;
...
declarations
: (^(t=type idlist {System.out.println($t.returnValue);}))+
;
...
type returns [String returnValue]
: i=INTEGER {$returnValue = "[" + $i.text + "]";}
;
...
But if you really want to do it without specifying a return object, you could do something like this:
declarations
: (^(t=type idlist {System.out.println($t.start.getText());}))+
;
Note that type returns an instance of TreeRuleReturnScope, which has an attribute called start, which in turn is a CommonTree instance. You can then call getText() on that CommonTree instance.
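
For the other half of the question, recording each identifier together with its type, a variable table along the following lines might be enough. This is a sketch under assumptions: the question does not show VarTable, so the method names declare, isDeclared and typeOf are hypothetical, and the type is passed as plain text (for instance the value obtained from $t.start.getText()). An action placed inside the ( ... )+ loop of a rule runs once per match, so an action attached to IDENTIFIER in idlist could call declare for each identifier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical variable table; the asker's real VarTable is not shown.
final class VarTable {
    private final Map<String, String> typesByName = new LinkedHashMap<>();

    // Records a variable; returns false if the name was already declared.
    boolean declare(String name, String type) {
        if (typesByName.containsKey(name)) {
            return false; // duplicate declaration, first type is kept
        }
        typesByName.put(name, type);
        return true;
    }

    boolean isDeclared(String name) {
        return typesByName.containsKey(name);
    }

    // Returns the declared type, or null for an undeclared variable.
    String typeOf(String name) {
        return typesByName.get(name);
    }
}
```

Returning a boolean from declare leaves it to the caller to decide how a duplicate declaration is reported.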

Am I forced to use %glr-parser?

I have been keeping the shift/reduce conflicts at bay. Now I think I have finally met my match.
Int[] a
a[0] = 1
The problem is int[] is defined as
Type OptSquareBrackets
while a[0] is defined as
Var | Var '[' expr ']'
Var and Type are both defined as VAR, which is any valid variable name: [a-zA-Z][a-zA-Z0-9_]*. Apart from adding a dummy token (such as Decl Type OptSquareBrackets instead), is there a way to write this so that it does not conflict? From this one rule I get 1 shift/reduce and 1 reduce/reduce warning.

Could you define a new token
VarLBracket [a-zA-Z][a-zA-Z0-9_]*\[
and then define a declaration as
Type | VarLBracket ']';
and define assignment target as
Var | VarLBracket expr ']';

Create a lex rule that matches [] as a single token, since [] is only used in declarations and everywhere else would use [var].

Technically, this problem stems from trying to tie the grammar to a semantic meaning that doesn't actually differ in syntax.
It seems to me that you just need a single grammar construct that describes both types and expressions. Make the distinction in code and not in the grammar, especially if there is not actually a syntactic difference. Yacc is called a compiler generator, but that is not true in the least. It just makes parsers.
Having said that, recognizing [] as a terminal symbol might be an easier way to fix the problem and get on with things. Yacc isn't very good at ambiguous grammars and it needs to make early decisions on which path to follow.
