NullPointerException with ANTLR text attribute - antlr

I have a problem that I've been stuck on for a while and I would appreciate some help if possible.
I have a few rules in an ANTLR tree grammar:
block
: compoundstatement
| ^(VAR declarations) compoundstatement
;
declarations
: (^(t=type idlist))+
;
idlist
: IDENTIFIER+
;
type
: REAL
| i=INTEGER
;
I have written a Java class VarTable that I will insert all of my variables into as they are declared at the beginning of my source file. The table will also hold their variable types (ie real or integer). I'll also be able to use this variable table to check for undeclared variables or duplicate declarations etc.
So basically I want to be able to send the variable type down from the 'declarations' rule to the 'idlist' rule and then loop through every identifier in the idlist rule, adding them to my variable table one by one.
The major problem I'm getting is that I get a NullPointerException when I try and access the 'text' attribute if the $t variable in the 'declarations' rule (This is one one which refers to the type).
And yet if I try and access the 'text' attribute of the $i variable in the 'type' rule, there's no problem.
I have looked at the place in the Java file where the NullPointerException is being generated and it still makes no sense to me.
Is it a problem with the fact that there could be multiple types because the rule is
(^(typeidlist))+
??
I have the same issue when I get down to the idlist rule, becasue I'm unsure how I can write an action that will allow me to loop through all of the IDENTIFIER Tokens found.
Grateful for any help or comments.
Cheers

You can't reference the attributes from production rules like you tried inside tree grammars, only in parser (or combined) grammars (they're different objects!). Note that INTEGER is not a production rule, just a "simple" token (terminal). That's why you can invoke its .text attribute.
So, if you want to get a hold the text of the type rule in your tree grammar and print it in your declarations rule, your could do something like this:
tree grammar T;
...
declarations
: (^(t=type idlist {System.out.println($t.returnValue);}))+
;
...
type returns [String returnValue]
: i=INTEGER {returnValue = "[" + $i.text + "]";}
;
...
But if you really want to do it without specifying a return object, you could do something like this:
declarations
: (^(t=type idlist {System.out.println($t.start.getText());}))+
;
Note that type returns an instance of a TreeRuleReturnScope which has an attribute called start which in its turn is a CommonTree instance. You could then call getText() on that CommonTree instance.

Related

Substitution in grammar action code throwing bizarre "P6opaque" error

I have this action which overrides an action in another action class:
method taskwiki-prefix($/ is copy) {
my $prefix = $/.Str;
$prefix ~~ s:g!'|'!!;
make $prefix;
}
The substitution throws this error:
P6opaque: no such attribute '$!made' on type Match in a List when trying to bind a value
If I comment out the substitution, the error goes away. dd $prefix shows:
Str $prefix = " Tasks ||"
So it's just a plain string.
If I remove the :g adverb, no more error, but doing that makes the made value Nil and nothing shows up in the output for $<taskwiki-prefix>.made.
Looks to me like there is some bad interaction going on with the matches in the substitution and the action, if I were to guess.
Any fix?
This is another case of your previous question, Raku grammar action throwing "Cannot bind attributes in a Nil type object. Did you forget a '.new'?" error when using "make". As there, the make function wants to update the $/ currently in scope.
Substitutions update $/, and:
In the case of the :g adverb, a List ends up in $/, and make gets confused. I've proposed an improved error.
In the case of no :g, there is a Match in $/ and it is attached to that - however, it's no longer the Match object that was passed into the method
I recommend to:
Always have the signature of your action methods be ($/), so there's no confusion about the target of make.
When possible, avoid reparsing (which was the achieved solution mentioned in your own answer).
If you can't avoid doing other matches or substitutions in your action method, put them in a sub or private method instead and then call it.
Problem was solved by changing the the grammar to give me a cleaner output so I did not have to manipulate the $/ variable.

Xtext typesafe variable qualifier

I have an xtext grammar for a modelling language that has multiple types of variables. In some cases I want to delimit the type a variable can have.
The current workflow is to just use a VariableQualifier (like in the grammar below) and use a validator to only permit the type I want. Then every time I access the reference I have to explicitly cast it.
Is there a better solution?
VariableReference:
ref=[Variable]
;
VariableQualifier:
(namespace+=NamespaceReference '.')* element=VariableReference
;
EnumerationReference:
ref=[Enumeration]
;
EnumerationQualifier:
(namespace+=NamespaceReference '.')* element=EnumerationReference
;
NamespaceReference:
ref=[Namespace]
;
One general pattern for this sort of problem is to have one generic reference syntactically that points to the abstract super type for all possible targets (common super type of Variable|Enumeration|Namespace).
E.g.:
VariableReference:
ref=[AbstractElement] ({VariableReference.parent=current} '.' ref=[AbstractElement])*;
Also note that modeling and referencing namespaces is often not really needed. You can instead use fully qualified names of things.
E.g.
VariableReference:
ref=[AbstractElement|QualifiedName]

What scope does ":my $foo" have and what is it used for?

With a regex, token or rule, its possible to define a variable like so;
token directive {
:my $foo = "in command";
<command> <subject> <value>?
}
There is nothing about it in the language documentation here, and very little in S05 - Regexes and Rules, to quote;
Any grammar regex is really just a kind of method, and you may declare variables in such a routine using a colon followed by any scope declarator parsed by the Perl 6 grammar, including my, our, state, and constant. (As quasi declarators, temp and let are also recognized.) A single statement (up through a terminating semicolon or line-final closing brace) is parsed as normal Perl 6 code:
token prove-nondeterministic-parsing {
:my $threshold = rand;
'maybe' \s+ <it($threshold)>
}
I get that regexen within grammars are very similar to methods in classes; I get that you can start a block anywhere within a rule and if parsing successfully gets to that point, the block will be executed - but I don't understand what on earth this thing is for.
Can someone clearly define what it's scope is; explain what need it fulfills and give the typical use case?
What scope does :my $foo; have?
:my $foo ...; has the lexical scope of the rule/token/regex in which it appears.
(And :my $*foo ...; -- note the extra * signifying a dynamic variable -- has both the lexical and dynamic scope of the rule/token/regex in which it appears.)
What this is used for
Here's what happens without this construct:
regex scope-too-small { # Opening `{` opens a regex lexical scope.
{ my $foo = / bar / } # Block with its own inner lexical scope.
$foo # ERROR: Variable '$foo' is not declared
}
grammar scope-too-large { # Opening `{` opens lexical scope for gramamr.
my $foo = / bar / ;
regex r1 { ... } # `$foo` is recognized inside `r1`...
...
regex r999 { ... } # ...but also inside r999
}
So the : ... ; syntax is used to get exactly the desired scope -- neither too broad nor too narrow.
Typical use cases
This feature is typically used in large or complex grammars to avoid lax scoping (which breeds bugs).
For a suitable example of precise lexical only scoping see the declaration and use of #extra_tweaks in token babble as defined in a current snapshot of Rakudo's Grammar.nqp source code.
P6 supports action objects. These are classes with methods corresponding one-to-one with the rules in a grammar. Whenever a rule matches, it calls its corresponding action method. Dynamic variables provide precisely the right scoping for declaring variables that are scoped to the block (method, rule, etc.) they're declared in both lexically and dynamically -- which latter means they're available in the corresponding action method too. For an example of this, see the declaration of #*nibbles in Rakudo's Grammar module and its use in Rakudo's Actions module.

Interpreting a variable number of tree nodes in ANTLR Tree Grammar

Whilst creating an inline ANTLR Tree Grammar interpreter I have come across an issue regarding the multiplicity of procedure call arguments.
Consider the following (faulty) tree grammar definition.
procedureCallStatement
: ^(PROCEDURECALL procedureName=NAME arguments=expression*)
{
if(procedureName.equals("foo")) {
callFooMethod(arguments[0], arguments[1]);
}elseif(procedureName.equals("bar")) {
callBarMethod(arguments[0], arguments[1], arguments[2]);
}
}
;
My problem lies with the retrieval of the given arguments. If there would be a known quantity of expressions I would just assign the values coming out of these expressions to their own variable, e.g.:
procedureCallStatement
: ^(PROCEDURECALL procedureName=NAME argument1=expression argument2=expression)
{
...
}
;
This however is not the case.
Given a case like this, what is the recommendation on interpreting a variable number of tree nodes inline within the ANTLR Tree Grammar?
Use the += operator. To handle any number of arguments, including zero:
procedureCallStatement
: ^(PROCEDURECALL procedureName=NAME argument+=expression*)
{
...
}
;
See the tree construction documentation on the antlr website.
The above will change the type of the variable argument from typeof(expression) to a List (well, at least when you're generating Java code). Note that the list types are untyped, so it's just a plain list.
If you use multiple parameters with the same variable name, they will also create a list, for example:
twoParameterCall
: ^(PROCEDURECALL procedureName=NAME argument=expression argument=expression)
{
...
}
;

Am i forced to use %glr-parser?

I have been keeping the shift/reduce errors away. Now finally i think i met my match.
Int[] a
a[0] = 1
The problem is int[] is defined as
Type OptSquareBrackets
while a[0] is defined as
Var | Var '[' expr ']'
Var and Type both are defined as VAR which is any valid variable [a-zA-Z][a-zA-Z0-9_]. Apart from adding a dummy token (such as **Decl** Type OptSquareBrackets instead) is there a way to write this to not have a conflict? From this one rule i get 1 shift/reduce and 1 reduce/reduce warning.
Could you define a new Token
VarLBracket [a-zA-Z][a-zA-Z0-9_]*\[
And therefore define declaration
Type | VarLBracket ']';
and define assignment target as
Var | VarLBracket expr ']';
Create a Lex rule with [] since [] is only used in declaration and everywhere else would use [var]
Technically, this problem stems from trying to tie the grammar to a semantic meaning that doesn't actually differ in syntax.
ISTM that you just need a single grammar construct that describes both types and expressions. Make the distinction in code and not in the grammar, especially if there is not actually a syntactic difference. Yacc is called a compiler generator but it is not the least bit true. It just makes parsers.
Having said that, recognizing [] as a terminal symbol might be an easier way to fix the problem and get on with things. Yacc isn't very good at ambiguous grammars and it needs to make early decisions on which path to follow.