How to implement type checking in a Listener - antlr

My script grammar contains the following:
if_statement
: IF condition_block (ELSE IF condition_block)* (ELSE statement_block)?
;
condition_block
: expression statement_block
;
expression
: expression op=(LTEQ | GTEQ | LT | GT) expression #relationalExpression
| expression op=(EQ | NEQ) expression #equalityExpression
| expression AND expression #andExpression
| expression OR expression #orExpression
| atom #atomExpression
;
atom
: OPAR expression CPAR #parenExpression
| INT #numberAtom
| (TRUE | FALSE) #booleanAtom
| STRING #stringAtom
;
What I would like to do is make sure that the user doesn't compare, e.g., an INT to a STRING.
I use a Listener to provide errors to the user when they create a script. So what I want to do is something like
public override void EnterRelationalExpression([NotNull] ScriptEvaluatorParser.RelationalExpressionContext context)
{
<..compare context.expression(0) to context.expression(1) here
and add an error if not the same base type...>
base.EnterRelationalExpression(context);
}
Doing this in a Visitor is easy
object left = Visit(context.expression(0));
object right = Visit(context.expression(1));
<...compare types...>
But how do I do the same in the Listener? I can new up a Visitor and do it that way, but I was wondering if there is a better way to do the check without having to new up a Visitor.

I’ve done this before by adding a type stack to my listener.
I use the exit*() listener hooks (you can’t really have any useful information about children in the enter*() methods, as the children have not been visited yet).
As an expression is exited, I can determine the type directly if it’s a simple type (or by looking its type up in a symbol table if it’s an identifier), then push the type on the type stack. For expressions like your equalityExpression, I pop the top two items from the type stack and check their compatibility (and then, of course, push a boolean type on the type stack).
For and and or expressions, just pop the top two items, ensure they’re boolean and then push boolean.
This does depend on having a symbol table available to resolve identifier types, and is a bit of a work-around for listeners not returning values, but it has worked well for me. I like the visitor handling the navigation and ensuring all nodes are visited. But, as Bart mentions, if you’re comfortable with using visitors to accomplish this, there’s not really one way that’s “better” than another.
You can also look into adding locals to your rules to hold that resulting type. This avoids the need for a type stack, and the management of that stack, but makes your grammar target language specific (which I like to avoid). You’d still need to leverage the exit*() methods since children would have to be visited before the locals were populated (BTW, locals are just a way of adding additional fields to the ParserRuleContext for nodes.)
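The type-stack idea described above can be sketched roughly as follows. This is a minimal illustration in Java (the C# version would look much the same), not the answerer's actual code. The class TypeStackChecker and its methods are hypothetical stand-ins for the exit*() overrides of the generated listener, called by hand here so the sketch runs without ANTLR, and identifiers with a symbol table are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a listener: in a real project these methods would be
// exit*() overrides of the ANTLR-generated base listener, invoked by the tree walker.
final class TypeStackChecker {
    enum Type { INT, BOOLEAN, STRING }

    private final Deque<Type> types = new ArrayDeque<>();
    private final List<String> errors = new ArrayList<>();

    // Atoms know their own type, so they simply push it.
    void exitNumberAtom()  { types.push(Type.INT); }
    void exitBooleanAtom() { types.push(Type.BOOLEAN); }
    void exitStringAtom()  { types.push(Type.STRING); }

    // Relational and equality expressions: both operands must have the same type.
    void exitComparison(String op) {
        Type right = types.pop();
        Type left = types.pop();
        if (left != right) {
            errors.add("cannot apply " + op + " to " + left + " and " + right);
        }
        types.push(Type.BOOLEAN); // a comparison always yields a boolean
    }

    // AND / OR: both operands must be boolean.
    void exitLogical(String op) {
        Type right = types.pop();
        Type left = types.pop();
        if (left != Type.BOOLEAN || right != Type.BOOLEAN) {
            errors.add(op + " needs boolean operands, got " + left + " and " + right);
        }
        types.push(Type.BOOLEAN);
    }

    List<String> errors() { return errors; }
    int depth() { return types.size(); }

    public static void main(String[] args) {
        // Exit order a walker would produce for: 1 < "a" AND true
        TypeStackChecker checker = new TypeStackChecker();
        checker.exitNumberAtom();
        checker.exitStringAtom();
        checker.exitComparison("<");
        checker.exitBooleanAtom();
        checker.exitLogical("AND");
        System.out.println(checker.errors());
    }
}
```

Wrapper alternatives such as parenExpression and atomExpression need no hook in this scheme, because the child's type is already on top of the stack when they exit.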


Elm Language What do the Multiple Types in a row (without the arrow) in a signature mean?

In the Elm language, I'm having a hard time explaining my question...
In these snippets in elm:
I understand the signature in something like
update : Msg -> Model -> Model
where the parameters / output is separated by arrows, but how do I read / grok things like:
Sub Msg
Program Never Model Msg
In:
main : Program Never Model Msg
main =
    program
        { init = init
        , view = view
        , update = update
        , subscriptions = subscriptions
        }
subscriptions : Model -> Sub Msg
subscriptions model =
    Sub.none
In a type signature, parameter types are separated by ->, with the last type being the return value.
If there are no -> symbols, then it means it is a value of that type. In your main example, the type of main is Program Never Model Msg. It has no arrows, so it takes no parameters.
Now, each parameter and the return value in the type annotation may have several things separated by spaces, as in your main example. The leftmost is the type, followed by type parameters separated by spaces.
Program Never Model Msg
   |      |     |    |
   |      +-----+----+
 type    type parameters
A type parameter is similar to Generics in a language like C#. The equivalent syntax in C# would be:
void Program<Never, Model, Msg>()
C# doesn't directly correlate because it has a different way of constraining generic type parameters, but the general idea holds.
The Elm guide doesn't currently have a great deal of info, but here is the section talking about types.
Sub Msg, List Int, Program Never Model Msg
Sub, List and Program are type constructors. You can think about them as functions that take a type and return another type.
By themselves, Sub, List and Program are not complete types. They are like a puzzle missing a piece. When one adds the missing piece, the puzzle is complete.
I usually read them in my head using the word "of", as in a List of Ints, or a Program of Never, Model and Msg.

How to recognize non reserved keywords in yacc? [duplicate]

I am using Flex & bison on Linux. I have the following set up:
// tokens
CREATE { return token::CREATE;}
SCHEMA { return token::SCHEMA; }
RECORD { return token::RECORD;}
[_a-zA-Z0-9][_a-zA-Z0-9]* { yylval->strval = strdup(yytext); return token::NAME;}
...
// rules
CREATE SCHEMA NAME ...
CREATE RECORD NAME ...
...
Everything worked just fine. But if users enter "create schema record ..." (where 'record' is the name of the schema to be created), the parse fails: Flex matches 'record' as the RECORD token, while the grammar expects NAME after CREATE SCHEMA. I understand that keywords can be escaped, but that makes for an awkward user experience. My question is:
"How can I design the above rules so that it accepts 'create schema record ...' and matches this input to 'CREATE SCHEMA NAME ...'?"
Thanks!
"Semi-reserved" words are common in languages which have a lot of reserved words. (Even modern C++ has a couple of these: override and final.) But they create some difficulties for traditional scanners, which generally assume that a keyword is a keyword.
The lemon parser generator, which not coincidentally was designed for parsing SQL, has a useful "fallback" feature, where a token which is not valid in context can be substituted by another token (without changing the semantic value). Unfortunately, bison does not implement this feature, and nor does any other parser generator I know of. However, in many cases it is possible to implement the feature in Bison grammars. For example, in the simple case presented here, we can substitute:
create_statement: CREATE RECORD NAME ...
| CREATE SCHEMA NAME ...
with:
create_statement: CREATE RECORD name
| CREATE SCHEMA name
name: NAME
| CREATE
| RECORD
| SCHEMA
| ...
Obviously, care needs to be taken that the (semi-)keywords in the list of alternatives for name are not valid in the context in which name is used. This may require the definition of a variety of name productions, valid for different contexts. (This is where lemon-style fallbacks are more convenient.)
If you do this, it is important that the semantic values of the keywords be correctly set up, either by the scanner or by the reduction rule of the name non-terminal. If there is only one name non-terminal, it is probably more efficient to do it in the reduction actions (because it avoids unnecessary allocation and deallocation of strings, where the deallocation will complicate the other grammar rules in which the keywords appear), so that the name rule would actually look like this:
name: NAME
| CREATE { $$ = strdup("CREATE"); }
| RECORD { $$ = strdup("RECORD"); }
| SCHEMA { $$ = strdup("SCHEMA"); }
| ...
There are, of course, many other possible ways to deal with the semantic value issue.
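To make the idea concrete outside of bison, here is a small hand-written sketch in Java of the same technique: the scanner always returns keyword tokens, and a name production accepts those keyword tokens wherever a name is expected. Everything in it (SemiReservedDemo, scan, name, parseCreate) is hypothetical. It handles only the three-word prefix of the statement, and it assumes case-insensitive keyword matching, which the Flex rules in the question do not show.

```java
import java.util.Locale;

final class SemiReservedDemo {
    enum Kind { CREATE, SCHEMA, RECORD, NAME }

    static final class Token {
        final Kind kind;
        final String text; // only set for NAME, like yylval->strval in the question
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Scanner: a keyword is always returned as a keyword token.
    // Assumption: keywords are matched case-insensitively.
    static Token scan(String word) {
        switch (word.toUpperCase(Locale.ROOT)) {
            case "CREATE": return new Token(Kind.CREATE, null);
            case "SCHEMA": return new Token(Kind.SCHEMA, null);
            case "RECORD": return new Token(Kind.RECORD, null);
            default:       return new Token(Kind.NAME, word);
        }
    }

    // name: NAME | CREATE | RECORD | SCHEMA
    // For keyword tokens the semantic value is built here, in the "reduction".
    static String name(Token t) {
        return t.kind == Kind.NAME ? t.text : t.kind.name();
    }

    // create_statement: CREATE SCHEMA name | CREATE RECORD name
    static String parseCreate(String input) {
        String[] words = input.trim().split("\\s+");
        if (words.length != 3) {
            throw new IllegalArgumentException("expected exactly three words");
        }
        Token create = scan(words[0]);
        Token what = scan(words[1]);
        Token target = scan(words[2]);
        if (create.kind != Kind.CREATE
                || (what.kind != Kind.SCHEMA && what.kind != Kind.RECORD)) {
            throw new IllegalArgumentException("syntax error");
        }
        return what.kind + " " + name(target);
    }

    public static void main(String[] args) {
        System.out.println(parseCreate("create schema record"));
        System.out.println(parseCreate("create record Orders"));
    }
}
```

The keyword RECORD is still a keyword in second position, but in third position it is accepted as an ordinary name, which is what the name non-terminal achieves in the grammar.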
You shouldn't do this, for the same reason you can't have a variable in C++ named for, while, or class. But if you really want to, look into Start Conditions (it'll be messy).

antlr4: how to know which alternative is chosen given a context

Assume there is a rule about 'type'. It is either a predefined type (referred by IDENTIFIER) or a typeDescriptor.
type
: IDENTIFIER
| typeDescriptor
;
In my program, I have an instance of TypeContext, 'ctx'. How do I know whether the IDENTIFIER path or the typeDescriptor path was chosen?
One way I know of is to test ctx.IDENTIFIER() == null and ctx.typeDescriptor() == null, but that doesn't scale well when there are many more alternatives. Is there a way to get an index indicating which alternative was chosen? Thanks.
No, you can either use the method you described (checking if an item is non-null), or you can label the outer alternatives of the rule using the # operator.
type
: IDENTIFIER # someType
| typeDescriptor # someOtherType
;
When you label the outer alternatives, it will produce ParserRuleContext classes for each of the labels. In the example above, you'll either get a SomeTypeContext or a SomeOtherTypeContext, which applies equally to the generated listener and visitor interfaces.
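Once the alternatives are labeled, code that receives a plain TypeContext can tell them apart by checking the runtime class. The sketch below uses hand-written stand-in classes so it runs without ANTLR. The real SomeTypeContext and SomeOtherTypeContext would be generated inside your parser class, and the identifier field and describe method are assumptions made for the illustration.

```java
final class LabeledAlternativesDemo {
    // Hypothetical stand-ins for the classes ANTLR would generate for
    //   type : IDENTIFIER # someType | typeDescriptor # someOtherType ;
    static class TypeContext { }

    static final class SomeTypeContext extends TypeContext {
        final String identifier; // stands in for IDENTIFIER().getText()
        SomeTypeContext(String identifier) { this.identifier = identifier; }
    }

    static final class SomeOtherTypeContext extends TypeContext { }

    // Dispatch on the concrete context class instead of null-checking each child.
    static String describe(TypeContext ctx) {
        if (ctx instanceof SomeTypeContext) {
            return "IDENTIFIER alternative: " + ((SomeTypeContext) ctx).identifier;
        }
        if (ctx instanceof SomeOtherTypeContext) {
            return "typeDescriptor alternative";
        }
        throw new IllegalArgumentException("unknown alternative: " + ctx.getClass());
    }

    public static void main(String[] args) {
        System.out.println(describe(new SomeTypeContext("int")));
        System.out.println(describe(new SomeOtherTypeContext()));
    }
}
```

In a listener or visitor you usually don't need the instanceof checks at all, since each label gets its own enter/exit or visit method.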

Get a name of a method parameter using Javassist

I have a CtMethod instance, but I don't know how to get names of parameters (not types) from it. I tried getParameterTypes, but it seems it returns only types.
I'm assuming it's possible, because libraries I'm using don't have sources, just class files and I can see names of method parameters in IDE.
It is indeed possible to retrieve the arguments' names, but only if the code has been compiled with debug symbols; otherwise you won't be able to do it.
To retrieve this information you have to access the method's local variable table. For further information about this data structure, see section 4.7.13, The LocalVariableTable Attribute, of the JVM spec. As I usually say, the JVM spec may look bulky, but it's an invaluable friend when you're working at this level!
Accessing the local variable table attribute of your CtMethod
CtMethod method = .....;
MethodInfo methodInfo = method.getMethodInfo();
LocalVariableAttribute table = (LocalVariableAttribute) methodInfo.getCodeAttribute().getAttribute(javassist.bytecode.LocalVariableAttribute.tag);
You now have the local variable attribute selected in the table variable.
Detecting the number of localVariables
int numberOfLocalVariables = table.tableLength();
Now keep in mind two things regarding the number in numberOfLocalVariables:
1st: local variables defined inside your method's body will also be accounted in tableLength();
2nd: in a non-static method, the implicit this variable is counted as well.
The order of your local variable table will be something like:
|this (if non static) | arg1 | arg2 | ... | argN | var1 | ... | varN|
Retrieving the argument name
Now if you want to retrieve, for example, the arg2's name from the previous example, it's the 3rd position in the array. Hence you do the following:
// the table is zero-based, so the 3rd position is index 2
int frameWithNameAtConstantPool = table.nameIndex(2);
String variableName = methodInfo.getConstPool().getUtf8Info(frameWithNameAtConstantPool);
You now have your variable's name in variableName.
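One caveat worth adding, as a hedged aside: the layout shown earlier is really a layout of local-variable slots, and long and double parameters occupy two slots each, so an argument's slot is not always its position plus one. The sketch below is a hypothetical helper (ParameterSlots is not part of Javassist) that computes each parameter's slot from its JVM descriptor. Matching that slot against each table entry's slot is likely more robust than a fixed table position. I believe Javassist exposes the entry's slot as LocalVariableAttribute.index(i), but treat that as an assumption to check against your version.

```java
import java.util.Arrays;

final class ParameterSlots {
    /**
     * Returns the local-variable slot of each parameter.
     * paramDescriptors uses JVM descriptors, e.g. "I", "J", "Ljava/lang/String;".
     */
    static int[] slotsFor(boolean isStatic, String... paramDescriptors) {
        int[] slots = new int[paramDescriptors.length];
        int next = isStatic ? 0 : 1; // slot 0 holds `this` in instance methods
        for (int i = 0; i < paramDescriptors.length; i++) {
            slots[i] = next;
            String descriptor = paramDescriptors[i];
            // long (J) and double (D) take two slots; every other type takes one
            next += (descriptor.equals("J") || descriptor.equals("D")) ? 2 : 1;
        }
        return slots;
    }

    public static void main(String[] args) {
        // Instance method: void m(int a, long b, String c)
        System.out.println(Arrays.toString(slotsFor(false, "I", "J", "Ljava/lang/String;")));
        // Static method: static void s(double d, int i)
        System.out.println(Arrays.toString(slotsFor(true, "D", "I")));
    }
}
```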
Side note: I've taken you through the scenic route so you could learn a bit more about Java (and Javassist) internals. But there are already tools that do this kind of operation for you; the one I can remember by name is paranamer. You might want to take a look at that too.
Hope it helped!
If you don't actually want the names of the parameters, but just want to be able to access them, you can use "$1, $2, ..." as seen in this tutorial.
It works with Javassist 3.18.2 (and later, at least up to 3.19 anyway) if you cast, like so:
LocalVariableAttribute nameTable = (LocalVariableAttribute)methodInfo.getCodeAttribute().getAttribute(LocalVariableAttribute.tag);

NullPointerException with ANTLR text attribute

I have a problem that I've been stuck on for a while and I would appreciate some help if possible.
I have a few rules in an ANTLR tree grammar:
block
: compoundstatement
| ^(VAR declarations) compoundstatement
;
declarations
: (^(t=type idlist))+
;
idlist
: IDENTIFIER+
;
type
: REAL
| i=INTEGER
;
I have written a Java class VarTable that I will insert all of my variables into as they are declared at the beginning of my source file. The table will also hold their variable types (ie real or integer). I'll also be able to use this variable table to check for undeclared variables or duplicate declarations etc.
So basically I want to be able to send the variable type down from the 'declarations' rule to the 'idlist' rule and then loop through every identifier in the idlist rule, adding them to my variable table one by one.
The major problem is that I get a NullPointerException when I try to access the 'text' attribute of the $t variable in the 'declarations' rule (this is the one which refers to the type).
And yet if I try and access the 'text' attribute of the $i variable in the 'type' rule, there's no problem.
I have looked at the place in the Java file where the NullPointerException is being generated and it still makes no sense to me.
Is it a problem with the fact that there could be multiple types because the rule is
(^(t=type idlist))+
??
I have the same issue when I get down to the idlist rule, because I'm unsure how I can write an action that will allow me to loop through all of the IDENTIFIER tokens found.
Grateful for any help or comments.
Cheers
You can't reference the attributes from production rules like you tried inside tree grammars, only in parser (or combined) grammars (they're different objects!). Note that INTEGER is not a production rule, just a "simple" token (terminal). That's why you can invoke its .text attribute.
So, if you want to get hold of the text of the type rule in your tree grammar and print it in your declarations rule, you could do something like this:
tree grammar T;
...
declarations
: (^(t=type idlist {System.out.println($t.returnValue);}))+
;
...
type returns [String returnValue]
: i=INTEGER {$returnValue = "[" + $i.text + "]";}
;
...
But if you really want to do it without specifying a return object, you could do something like this:
declarations
: (^(t=type idlist {System.out.println($t.start.getText());}))+
;
Note that type returns an instance of a TreeRuleReturnScope, which has an attribute called start, which in turn is a CommonTree instance. You could then call getText() on that CommonTree instance.