ANTLR4: how to know which alternative was chosen, given a context

Assume there is a rule called 'type'. It is either a predefined type (referred to by IDENTIFIER) or a typeDescriptor.
type
: IDENTIFIER
| typeDescriptor
;
In my program, I have an instance of TypeContext, 'ctx'. How do I know whether the IDENTIFIER alternative or the typeDescriptor alternative was chosen?
I recognise one way, which is to test ctx.IDENTIFIER() == null and ctx.typeDescriptor() == null, but it doesn't seem to work well when there are many more alternatives. Is there a way to get an index that indicates which alternative was chosen? Thanks.

No. You can either use the method you described (checking whether an item is non-null), or you can label the outer alternatives of the rule using the # operator.
type
: IDENTIFIER # someType
| typeDescriptor # someOtherType
;
When you label the outer alternatives, ANTLR generates a separate ParserRuleContext subclass for each label. In the example above, ctx will be either a SomeTypeContext or a SomeOtherTypeContext, and the generated listener and visitor interfaces get separate methods for each label (e.g. enterSomeType/exitSomeType and visitSomeType).
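To make the difference from null checks concrete, here is a small Python sketch. It does not use the ANTLR runtime: SomeTypeContext, SomeOtherTypeContext, TypeVisitor and describe are hand-written, hypothetical stand-ins that only mimic how labeled contexts dispatch to one visit method per label. The real classes are generated into your parser and visitor files, and their exact shape depends on the target language.

```python
# Hypothetical stand-ins for the context classes ANTLR would generate from
# the labeled rule; they are NOT the generated code, only an imitation of it.
class TypeContext:
    label = ""

    def accept(self, visitor):
        # Labeled contexts dispatch to visit<Label>(); mimic that by name.
        method = getattr(visitor, "visit" + self.label)
        return method(self)


class SomeTypeContext(TypeContext):
    label = "SomeType"

    def __init__(self, identifier):
        self._identifier = identifier

    def IDENTIFIER(self):
        return self._identifier


class SomeOtherTypeContext(TypeContext):
    label = "SomeOtherType"

    def __init__(self, descriptor):
        self._descriptor = descriptor

    def typeDescriptor(self):
        return self._descriptor


class TypeVisitor:
    # One method per label: the alternative is known without any null checks.
    def visitSomeType(self, ctx):
        return ("predefined", ctx.IDENTIFIER())

    def visitSomeOtherType(self, ctx):
        return ("descriptor", ctx.typeDescriptor())


def describe(ctx):
    # Outside a visitor, checking the labeled class works as well.
    if isinstance(ctx, SomeTypeContext):
        return "IDENTIFIER alternative"
    if isinstance(ctx, SomeOtherTypeContext):
        return "typeDescriptor alternative"
    raise TypeError("unlabeled context")
```

With real generated code you would normally call visitor.visit(ctx) and only implement the visitSomeType / visitSomeOtherType methods, rather than writing the dispatch yourself.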

Related

How to implement type checking in a Listener

My script grammar contains the following:
if_statement
: IF condition_block (ELSE IF condition_block)* (ELSE statement_block)?
;
condition_block
: expression statement_block
;
expression
: expression op=(LTEQ | GTEQ | LT | GT) expression #relationalExpression
| expression op=(EQ | NEQ) expression #equalityExpression
| expression AND expression #andExpression
| expression OR expression #orExpression
| atom #atomExpression
;
atom
: OPAR expression CPAR #parenExpression
| INT #numberAtom
| (TRUE | FALSE) #booleanAtom
| STRING #stringAtom
;
What I would like to do is make sure that the user doesn't compare, e.g., an INT to a STRING.
I use a Listener to report errors to the user when they create a script. So what I want to do is something like:
public override void EnterRelationalExpression([NotNull] ScriptEvaluatorParser.RelationalExpressionContext context)
{
    // Compare context.expression(0) to context.expression(1) here
    // and add an error if they are not the same base type...
    base.EnterRelationalExpression(context);
}
Doing this in a Visitor is easy:
object left = Visit(context.expression(0));
object right = Visit(context.expression(1));
// ...compare types...
But how do I do the same in the Listener? I can new up a Visitor and do it that way, but I was wondering if there is a better way to do the check without having to new up a Visitor.
I’ve done this before by adding a type stack to my listener.
I use the exit*() listener hooks (you can’t really have any useful information about children in the enter*() methods, because the children have not been visited yet).
When an expression is exited, I can determine its type directly if it’s a simple type (or by looking its type up in a symbol table if it’s an identifier), and then push that type onto the type stack. For expressions like your equalityExpression, I pop the top two items from the type stack and check their compatibility (and then, of course, push a boolean type onto the type stack).
For AND and OR expressions, just pop the top two items, ensure they’re boolean, and then push boolean.
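A minimal Python sketch of the type-stack idea. It assumes a hypothetical Node class and walk() function in place of the ANTLR parse tree and ParseTreeWalker; the kind strings loosely correspond to the labels in the grammar above (numberAtom, stringAtom, relationalExpression, and so on). Symbol-table lookup for identifiers is left out because the grammar shown has no identifier atom.

```python
from dataclasses import dataclass, field


# Hypothetical minimal tree node standing in for the ANTLR contexts.
@dataclass
class Node:
    kind: str  # "number", "string", "boolean", "relational", "equality", "and", "or"
    children: list = field(default_factory=list)


class TypeCheckListener:
    """Type checking with a type stack, driven only by exit callbacks."""

    def __init__(self):
        self.types = []   # the type stack
        self.errors = []

    def exit(self, node):
        if node.kind in ("number", "string", "boolean"):
            # Atoms: the type is known directly, so push it.
            self.types.append(node.kind)
        elif node.kind in ("relational", "equality"):
            right, left = self.types.pop(), self.types.pop()
            if left != right:
                self.errors.append(f"cannot compare {left} with {right}")
            self.types.append("boolean")  # a comparison yields a boolean
        elif node.kind in ("and", "or"):
            right, left = self.types.pop(), self.types.pop()
            if left != "boolean" or right != "boolean":
                self.errors.append(f"'{node.kind}' needs boolean operands")
            self.types.append("boolean")


def walk(node, listener):
    # Post-order walk: children are exited before their parent, like exit*().
    for child in node.children:
        walk(child, listener)
    listener.exit(node)
```

For example, walking Node("relational", [Node("number"), Node("string")]) leaves one error ("cannot compare number with string") and a single "boolean" on the stack, which is what the enclosing expression would then consume.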
This does depend on having a symbol table available to resolve identifier types, and it is a bit of a workaround for listeners not returning values, but it has worked well for me. I like having the tree walker handle the navigation and ensure that all nodes are visited. But, as Bart mentions, if you’re comfortable using visitors to accomplish this, neither approach is really “better” than the other.
You can also look into adding locals to your rules to hold the resulting type. This avoids the need for a type stack and the management of that stack, but makes your grammar target-language specific (which I like to avoid). You’d still need to use the exit*() methods, since the children have to be visited before the locals are populated. (BTW, locals are just a way of adding extra fields to the generated context classes for the nodes.)

Just check for existence in ALFA target clause

I want to write a target clause that says "If a certain attribute is set (oneAndOnly), then the policy applies". I have seen mustbepresent; however, it always requires a comparator (like ==).
This was my approach, but the syntax checker complains...
policy reportPolicies {
target clause stringBagSize(my.company.person.doctor.id)==1
}
I've seen you define a string attribute, "resourceType", but I'd rather not define such a meta-attribute. I'd prefer to check for the existence of certain attributes.
Again, great questions. Yes, I often use an artificial attribute, e.g. resourceType, and compare it to values such as medical record or transaction. You do not have to do that, because the attribute identifiers themselves convey which kind of resource you are dealing with. However, I do think it makes the policy more readable.
On to the other issue: how to make sure an attribute has at least one value. In a Target element you can use the mustBePresent tag, but I do not like it: if the attribute has no value, the PDP returns Indeterminate, which short-circuits evaluation.
An alternative is to compare an attribute using > (greater than). For instance:
clause user.role > ""
clause user.age > 0
That will force the value to be defined.
The cleaner way to do this, though, is to use bag functions inside a condition. For instance:
condition stringBagSize(user.role)>0 // True if the user has at least one role

A middle approach between dynamic typing and static typing

I wanted to ask if anyone knows of a programming language with dynamic typing where the binding between a name and a type is permanent. Static typing guards your code against assigning a wrong value to a variable, but forces you to declare (and know) the type before compilation. Dynamic typing allows you to assign values of different types to the same variable, one after the other. What I have in mind is dynamic typing where, once a variable is bound, the first binding also determines the type of the variable.
For example, using python-like syntax, if I write by mistake:
persons = []
....
adam = Person("adam")
persons = adam #(instead of persons += [adam])
Then I want to get an error (either at runtime, or during compilation if possible) because persons was defined as a list and cannot accept values of type Person.
Same thing if the type cannot be resolved statically:
result = getData()
...
result = 10
This should generate a runtime error if and only if getData() did not return an integer.
I know you can hack similar behavior with a wrapper class, but it would be nice to have this as the default in the language, as I don't see a good legitimate use for this flexibility in dynamic languages (except for inheritance, or overwriting a common default value such as null/None, which could be permitted as special cases).
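For reference, the wrapper-class hack mentioned in the question might look like this in Python. It is only a sketch: FixedTypes is a hypothetical name, the check runs at runtime on attribute assignment (not on plain local variables), subclasses are accepted through isinstance, and None is allowed as the special-cased default.

```python
class FixedTypes:
    """Namespace whose attributes keep the type of their first real binding."""

    def __init__(self):
        # Bypass our own __setattr__ for the bookkeeping dictionary.
        object.__setattr__(self, "_types", {})

    def __setattr__(self, name, value):
        if value is None:
            # None neither fixes nor violates a type (the special case).
            object.__setattr__(self, name, value)
            return
        bound = self._types.get(name)
        if bound is None:
            self._types[name] = type(value)  # first real binding fixes the type
        elif not isinstance(value, bound):   # subclasses are still accepted
            raise TypeError(
                f"{name!r} is bound to {bound.__name__}, "
                f"got {type(value).__name__}"
            )
        object.__setattr__(self, name, value)
```

With ns = FixedTypes(), the question's example becomes ns.persons = [] followed by ns.persons += [adam], which is fine, while ns.persons = adam raises TypeError. Unlike a language feature, this only works for names stored on such an object.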

How to recognize non-reserved keywords in yacc?

I am using Flex and Bison on Linux. I have the following setup:
// tokens
CREATE { return token::CREATE;}
SCHEMA { return token::SCHEMA; }
RECORD { return token::RECORD;}
[_a-zA-Z0-9][_a-zA-Z0-9]* { yylval->strval = strdup(yytext); return token::NAME; }
...
// rules
CREATE SCHEMA NAME ...
CREATE RECORD NAME ...
...
Everything worked just fine. But if users enter "create schema record ..." (where 'record' is the name of the schema to be created), a syntax error is reported, because Flex scans 'record' as the RECORD keyword token and the parser then sees "CREATE SCHEMA RECORD", which matches no rule. I understand that keywords can be escaped, but that makes the user experience awkward. My question is:
"How can I design the above rules so that it accepts 'create schema record ...' and matches this input to 'CREATE SCHEMA NAME ...'?"
Thanks!
"Semi-reserved" words are common in languages which have a lot of reserved words. (Even modern C++ has a couple of these: override and final.) But they create some difficulties for traditional scanners, which generally assume that a keyword is a keyword.
The lemon parser generator, which not coincidentally was designed for parsing SQL, has a useful "fallback" feature, where a token which is not valid in context can be substituted by another token (without changing the semantic value). Unfortunately, bison does not implement this feature, and nor does any other parser generator I know of. However, in many cases it is possible to implement the feature in Bison grammars. For example, in the simple case presented here, we can substitute:
create_statement: CREATE RECORD NAME ...
| CREATE SCHEMA NAME ...
with:
create_statement: CREATE RECORD name
| CREATE SCHEMA name
name: NAME
| CREATE
| RECORD
| SCHEMA
| ...
Obviously, care needs to be taken that the (semi-)keywords in the list of alternatives for name are not valid in the context in which name is used. This may require the definition of a variety of name productions, valid for different contexts. (This is where lemon-style fallbacks are more convenient.)
If you do this, it is important that the semantic values of the keywords be set up correctly, either by the scanner or by the reduction rules of the name non-terminal. If there is only one name non-terminal, it is probably more efficient to do it in the reduction actions: that avoids unnecessary allocation and deallocation of strings, and the deallocation would otherwise complicate the other grammar rules in which the keywords appear. The name rule would then actually look like this:
name: NAME
| CREATE { $$ = strdup("CREATE"); }
| RECORD { $$ = strdup("RECORD"); }
| SCHEMA { $$ = strdup("SCHEMA"); }
| ...
There are, of course, many other possible ways to deal with the semantic value issue.
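The effect of the name non-terminal can be checked outside Bison with a hand-written Python sketch. It is not generated parser code: tokenize, parse_name and parse_create are hypothetical helpers that mirror the create_statement and name productions above. The scanner still reports keywords as keyword tokens (matched case-insensitively here for convenience); only the parser decides that a keyword in name position is acceptable.

```python
KEYWORDS = {"create", "schema", "record"}


def tokenize(text):
    # The scanner still returns keywords as keyword tokens, as Flex would.
    tokens = []
    for word in text.split():
        lowered = word.lower()
        kind = lowered.upper() if lowered in KEYWORDS else "NAME"
        tokens.append((kind, word))
    return tokens


def parse_name(tokens, pos):
    # Mirrors  name: NAME | CREATE | RECORD | SCHEMA
    kind, text = tokens[pos]
    if kind in ("NAME", "CREATE", "RECORD", "SCHEMA"):
        return text, pos + 1  # the token text is kept as the semantic value
    raise SyntaxError(f"expected a name, got {kind}")


def parse_create(text):
    # Mirrors  create_statement: CREATE RECORD name | CREATE SCHEMA name
    tokens = tokenize(text)
    if len(tokens) < 3 or tokens[0][0] != "CREATE":
        raise SyntaxError("expected CREATE")
    what = tokens[1][0]
    if what not in ("RECORD", "SCHEMA"):
        raise SyntaxError("expected RECORD or SCHEMA")
    name, _ = parse_name(tokens, 2)
    return what, name
```

Here parse_create("create schema record") returns ("SCHEMA", "record"): the third token is still scanned as RECORD, but the name production accepts it.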
You shouldn't do this, for the same reason you can't have a variable in C++ named for, while, or class. But if you really want to, look into Start Conditions (it'll be messy).

NullPointerException with ANTLR text attribute

I have a problem that I've been stuck on for a while and I would appreciate some help if possible.
I have a few rules in an ANTLR tree grammar:
block
: compoundstatement
| ^(VAR declarations) compoundstatement
;
declarations
: (^(t=type idlist))+
;
idlist
: IDENTIFIER+
;
type
: REAL
| i=INTEGER
;
I have written a Java class VarTable that I will insert all of my variables into as they are declared at the beginning of my source file. The table will also hold their variable types (i.e. real or integer). I'll also be able to use this variable table to check for undeclared variables, duplicate declarations, etc.
So basically I want to be able to send the variable type down from the 'declarations' rule to the 'idlist' rule and then loop through every identifier in the idlist rule, adding them to my variable table one by one.
The major problem is that I get a NullPointerException when I try to access the 'text' attribute of the $t variable in the 'declarations' rule (this is the one that refers to the type).
And yet if I try to access the 'text' attribute of the $i variable in the 'type' rule, there's no problem.
I have looked at the place in the Java file where the NullPointerException is being generated and it still makes no sense to me.
Is it a problem that there could be multiple types, because the rule is
(^(t=type idlist))+
?
I have the same issue when I get down to the idlist rule, because I'm unsure how to write an action that will let me loop through all of the IDENTIFIER tokens found.
Grateful for any help or comments.
Cheers
You can't reference the attributes of production rules the way you tried inside a tree grammar; that only works in parser (or combined) grammars (they're different objects!). Note that INTEGER is not a production rule, just a "simple" token (terminal). That's why you can access its .text attribute.
So, if you want to get hold of the text of the type rule in your tree grammar and print it in your declarations rule, you could do something like this:
tree grammar T;
...
declarations
: (^(t=type idlist {System.out.println($t.returnValue);}))+
;
...
type returns [String returnValue]
: i=INTEGER {$returnValue = "[" + $i.text + "]";}
;
...
But if you really want to do it without specifying a return object, you could do something like this:
declarations
: (^(t=type idlist {System.out.println($t.start.getText());}))+
;
Note that type returns an instance of TreeRuleReturnScope, which has an attribute called start, which in turn is a CommonTree instance. You can then call getText() on that CommonTree instance.