I am attempting to build a language that can declare methods and fields, with intrinsic support for generics. I would like to be able to use primitive types like String, as well as declare my own classes.
This should be valid syntax:
String somePrimitive
class MyClass { }
MyClass someObject;
class List { }
List<String> stringList;
List<MyClass> objectList;
List<String> getNames() { }
I have a grammar that supports these operations:
Model:
(members+=ModelMembers)*;
ModelMembers:
Class | Field | MethodDeclaration
;
Class:
'class' name=ID '{' '}'
;
Field:
type=Type
name=ID
;
enum PrimitiveType: STRING="String" | NUMBER="Number";
Type:
(
{TypeObject} clazz=[Class] ("<" a+=Type ("," a+=Type)* ">")?
|
{TypePrimitive} type=PrimitiveType
)
;
MethodDeclaration:
returnType=Type name=ID "(" ")" "{"
"}"
;
But it contains an error:
[fatal] rule rule__ModelMembers__Alternatives has non-LL(*) decision due to recursive rule invocations reachable from alts 2,3. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
The problem seems to stem from the fact that the Type rule is recursive, and can be matched as the beginning of either a MethodDeclaration or a Field.
However, it is possible to figure out what rule one is building, as the method will have () { } after the name.
What really confuses me is that if I replace the recursive rule with simply [Class], e.g. Field: type=[Class] name=ID (and the same for the MethodDeclaration), the grammar is valid.
I get that there is some ambiguity when one sees an instance of Type, as it could lead onto a method or field.... but that's exactly the same when I replace with [Class]. Instances of class can lead onto a method or field.
How can it be ambiguous using Type, but not ambiguous using [Class]?
This is not a direct answer to your question, but have you considered using Xbase instead of plain Xtext? With Xbase you can simply use predefined rules that match everything you need:
Java types with generics
Expressions
Annotations
and many more.
Here are a couple of useful links:
Xbase: https://wiki.eclipse.org/Xbase
Extending Xbase blog: http://koehnlein.blogspot.de/2011/07/extending-xbase.html
7 Languages For The JVM (7 examples): http://www.eclipse.org/Xtext/7languages.html
Screencasts: http://xtextcasts.org/?tag_id=10
If Xbase doesn't suit you, then you can learn from its Xbase-Xtext-Grammar.
This grammar parses the example code without throwing errors (the layout is the one favored by Terence Parr, the ANTLR man. I find it helps greatly):
Model
: (members+=ModelMembers)*
;
ModelMembers
: Class
| MethodDeclaration
| Field
;
Class
: 'class' name=ID '{' '}'
;
Field
: type=Type name=ID ';'
;
PrimitiveType
: ("String" |"Number")
;
TypeReferenceOrPrimitive
: {TypeClass} type=[Class]
| {TypePrimitive} PrimitiveType
;
Type
: {TypeObject} clazz=[Class] ("<" a+=TypeReferenceOrPrimitive ("," a+=TypeReferenceOrPrimitive)* ">")?
| {TypePrimitive} type=PrimitiveType
;
MethodDeclaration
: returnType=Type name=ID "(" ")" "{" "}"
;
I'm no Xtext expert, so there may be better ways. My 'trick' is to define TypeReferenceOrPrimitive. You will probably need to play around with the grammar a bit more in order to get an AST that is easier to process.
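As far as I understand it, the reason the recursive Type rule trips the LL(*) analysis while [Class] does not is this: to choose between Field and MethodDeclaration, the parser has to look past the whole type to the token after the name. With [Class] that is a fixed number of tokens. With arbitrarily nested `<...>` it has to count brackets, which a finite lookahead automaton cannot do. Once type arguments can no longer nest, as in the grammar above, the counting is not needed. The Java sketch below is a hand-written illustration of that bracket-counting scan, not something Xtext generates. The class and method names are hypothetical, and the tokens are assumed to be pre-split strings.

```java
import java.util.Arrays;
import java.util.List;

final class MemberLookahead {
    enum Kind { FIELD, METHOD }

    // Skip a type such as List<List<String>> and return the index just past it.
    static int skipType(List<String> tokens, int pos) {
        pos++; // the type name itself
        if (pos < tokens.size() && tokens.get(pos).equals("<")) {
            int depth = 0; // unbounded counter: this is what a lookahead DFA cannot track
            do {
                String t = tokens.get(pos++);
                if (t.equals("<")) depth++;
                else if (t.equals(">")) depth--;
            } while (depth > 0 && pos < tokens.size());
        }
        return pos;
    }

    // A member is a method if the token after "type name" is an opening parenthesis.
    static Kind classify(List<String> tokens) {
        int afterName = skipType(tokens, 0) + 1; // skip the type, then the member name
        boolean method = afterName < tokens.size() && tokens.get(afterName).equals("(");
        return method ? Kind.METHOD : Kind.FIELD;
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("List", "<", "String", ">", "getNames", "(", ")", "{", "}")));
        System.out.println(classify(Arrays.asList("List", "<", "MyClass", ">", "objectList", ";")));
    }
}
```

In Xtext itself the equivalent tools are the ones the error message names: left-factoring, syntactic predicates, or backtracking.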
I'm struggling with what seems to be a bug in Visual Basic. I'm probably missing something. Hopefully someone can point out what it is.
Section 7.5 of the Visual Basic Specification for version 10.0 says that this is the grammar for class declaration. Forgive the lack of italics that indicates the difference between literals and grammar nodes.
ClassDeclaration ::=
[ Attributes ] [ ClassModifier+ ] Class Identifier [ TypeParameterList ] StatementTerminator
[ ClassBase ]
[ TypeImplementsClause+ ]
[ ClassMemberDeclaration+ ]
End Class StatementTerminator
ClassModifier ::= TypeModifier | MustInherit | NotInheritable | Partial
So the minimal class declaration would be
Class Identifier StatementTerminator
End Class StatementTerminator
The grammar for Identifier and some other supporting nodes are specified in 13.1.2,
Identifier ::=
NonEscapedIdentifier [ TypeCharacter ] |
Keyword TypeCharacter |
EscapedIdentifier
NonEscapedIdentifier ::= < IdentifierName but not Keyword >
TypeCharacter ::=
IntegerTypeCharacter |
LongTypeCharacter |
DecimalTypeCharacter |
SingleTypeCharacter |
DoubleTypeCharacter |
StringTypeCharacter
IntegerTypeCharacter ::= %
LongTypeCharacter ::= &
DecimalTypeCharacter ::= @
SingleTypeCharacter ::= !
DoubleTypeCharacter ::= #
StringTypeCharacter ::= $
Based on my reading of this, foo! should be a legal identifier because ! is a TypeCharacter.
So, based on the minimal legal class declaration above, this should be legal.
Class foo!
End Class
But Visual Studio 2010 gives this:
Type declaration characters are not valid in this context.
Am I missing something in the spec, or does the compiler disagree with the spec?
The grammar doesn’t stand on its own, the describing text is part of the specification. This is important here, because otherwise you’d be right. But as competent_tech has noted in his answer, the TypeCharacter production, while forming part of the Identifier production, is not a valid character in an identifier – except to denote a variable’s (or function’s) type.
This is specified in 2.2.1:
A type character denotes the type of the preceding identifier. The type character is not considered part of the identifier. If a declaration includes a type character, the type character must agree with the type specified in the declaration itself; otherwise, a compile-time error occurs.
Appending a type character to an identifier that conceptually does not have a type (for example, a namespace name) or to an identifier whose type disagrees with the type of the type character causes a compile-time error.
(Emphasis mine.)
The section even gives an explicit example of an invalid use that is equivalent to your use. So the specification explicitly forbids this.
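To make the two-stage idea concrete, here is a minimal Java sketch: the lexical grammar accepts foo! as an Identifier, and a separate semantic check then rejects the type character wherever the declared entity has no type. The class and method names are hypothetical and this is not how the VB compiler is actually structured; it only mirrors the rule quoted from 2.2.1.

```java
final class TypeCharacterCheck {
    // The six type characters: % Integer, & Long, @ Decimal, ! Single, # Double, $ String.
    private static final String TYPE_CHARACTERS = "%&@!#$";

    // True if the identifier carries a trailing type character.
    static boolean endsWithTypeCharacter(String identifier) {
        return !identifier.isEmpty()
            && TYPE_CHARACTERS.indexOf(identifier.charAt(identifier.length() - 1)) >= 0;
    }

    // Hypothetical semantic rule: a class has no type, so a type character is an error.
    static boolean isValidClassName(String identifier) {
        return !endsWithTypeCharacter(identifier);
    }

    // Variables do have a type, so the grammar-level identifier is accepted as-is.
    static boolean isValidVariableName(String identifier) {
        return !identifier.isEmpty();
    }
}
```

With this split, `isValidClassName("foo!")` is false while `isValidVariableName("x!")` is true, which matches the behaviour Visual Studio shows.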
The type character (TypeCharacter in the grammar) is used to declare the data type of a property, variable, function, etc.
Public Class Foo
Dim x!
' The above declaration is the same as
'Dim x As Single
End Class
Since there is no data type for a class, adding a type character does not make any sense. Based on this, it seems like the spec may not be entirely correct.
For the question and the grammar suggested by @BartKiers (Thank you!), I added the options block to specify the output to be
options{
language=Java;
output=AST;
ASTLabelType=CommonTree;
}
However, I am not able to figure out how to access the output, i.e. the AST. I need to traverse the tree and process each operation that was specified in the input.
Using your example here, I am trying to implement rules that return values. However, I am running into the following errors:
relational returns [String val]
: STRINGVALUE ((operator)^ term)?
{val = $STRINGVALUE.text + $operator.text + $term.text; }
;
term returns [String rhsOperand]
: QUOTEDSTRINGVALUE {rhsOperand = $QUOTEDSTRINGVALUE.text;}
| NUMBERVALUE {rhsOperand = $NUMBERVALUE.text; }
| '(' condition ')'
;
Compilation Error:
Checking Grammar RuleGrammarParser.g...
\output\RuleGrammarParser.java:495: cannot find symbol
symbol : variable val
location: class RuleGrammarParser
val = (STRINGVALUE7!=null?STRINGVALUE7.getText():null) + (operator8!=null?input.toString(operator8.start,operator8.stop):null) + (term9!=null?input.toString(term9.start,term9.stop):null);
^
\output\RuleGrammarParser.java:612: cannot find symbol
symbol : variable rhsOperand
location: class RuleGrammarParser
rhsOperand = (QUOTEDSTRINGVALUE10!=null?QUOTEDSTRINGVALUE10.getText():null);
^
\output\RuleGrammarParser.java:632: cannot find symbol
symbol : variable rhsOperand
location: class RuleGrammarParser
rhsOperand = (NUMBERVALUE11!=null?NUMBERVALUE11.getText():null);
^
3 errors
Can you please help me understand why this fails to compile?
Added the pastebin: http://pastebin.com/u1Bv3L0A
By simply adding output=AST to the options section you don't create an AST, but a flat, one-dimensional list of tokens. To mark certain tokens as root (or children), you need to do a bit of work.
Check out this answer, which explains how to create a proper AST and get access to the tree the parser then produces (the CommonTree tree in the main method of the answer I mentioned).
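Once you have the tree, traversing it is an ordinary recursive walk. Here is a minimal sketch in Java. The Node class is a hypothetical stand-in so that the example runs without the ANTLR jar; it only mimics the three CommonTree methods the walk needs (getText(), getChildCount(), getChild(int)). With the real runtime you would cast the result of getTree() on the rule's return value to CommonTree and walk that instead.

```java
import java.util.ArrayList;
import java.util.List;

final class TreeWalkSketch {
    // Hypothetical stand-in for org.antlr.runtime.tree.CommonTree.
    static final class Node {
        private final String text;
        private final List<Node> children = new ArrayList<>();

        Node(String text, Node... kids) {
            this.text = text;
            for (Node k : kids) children.add(k);
        }

        String getText() { return text; }
        int getChildCount() { return children.size(); }
        Node getChild(int i) { return children.get(i); }
    }

    // Depth-first walk: leaves print as-is, operator nodes as (op child child ...).
    static String render(Node node) {
        if (node.getChildCount() == 0) return node.getText();
        StringBuilder sb = new StringBuilder("(").append(node.getText());
        for (int i = 0; i < node.getChildCount(); i++) {
            sb.append(' ').append(render(node.getChild(i)));
        }
        return sb.append(')').toString();
    }
}
```

Replacing the string building in render with your own per-operator logic gives you a place to process each operation found in the input.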
Note that you can safely remove language=Java;: by default the target language is Java (no harm in leaving it there though).
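On the follow-up "cannot find symbol" error: as far as I can tell, with output=AST every rule returns a generated return-scope object rather than a plain value, so val and rhsOperand are fields of that object and not local variables. Inside an action they have to be written as $val and $rhsOperand, which ANTLR rewrites to a field access. The Java sketch below imitates the shape of the generated code; the class is a simplified, hypothetical stand-in and not the real generated parser.

```java
final class ReturnScopeSketch {
    // Simplified stand-in for the relational_return class ANTLR 3 generates
    // when output=AST is set (the real one extends ParserRuleReturnScope).
    static final class RelationalReturn {
        String val;
    }

    static RelationalReturn relational(String lhs, String operator, String rhs) {
        RelationalReturn retval = new RelationalReturn();
        // val = lhs + operator + rhs;     // "cannot find symbol": there is no local named val
        retval.val = lhs + operator + rhs; // roughly what $val = ... expands to
        return retval;
    }
}
```

So in the grammar the actions would read {$val = ...;} and {$rhsOperand = ...;}.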
I have a problem that I've been stuck on for a while and I would appreciate some help if possible.
I have a few rules in an ANTLR tree grammar:
block
: compoundstatement
| ^(VAR declarations) compoundstatement
;
declarations
: (^(t=type idlist))+
;
idlist
: IDENTIFIER+
;
type
: REAL
| i=INTEGER
;
I have written a Java class VarTable that I will insert all of my variables into as they are declared at the beginning of my source file. The table will also hold their variable types (ie real or integer). I'll also be able to use this variable table to check for undeclared variables or duplicate declarations etc.
So basically I want to be able to send the variable type down from the 'declarations' rule to the 'idlist' rule and then loop through every identifier in the idlist rule, adding them to my variable table one by one.
The major problem is that I get a NullPointerException when I try to access the 'text' attribute of the $t variable in the 'declarations' rule (this is the one which refers to the type).
And yet if I try and access the 'text' attribute of the $i variable in the 'type' rule, there's no problem.
I have looked at the place in the Java file where the NullPointerException is being generated and it still makes no sense to me.
Is the problem that there could be multiple types, because the rule is
(^(t=type idlist))+
?
I have the same issue when I get down to the idlist rule, because I'm unsure how to write an action that will let me loop through all of the IDENTIFIER tokens found.
Grateful for any help or comments.
Cheers
You can't reference the attributes from production rules like you tried inside tree grammars, only in parser (or combined) grammars (they're different objects!). Note that INTEGER is not a production rule, just a "simple" token (terminal). That's why you can invoke its .text attribute.
So, if you want to get hold of the text of the type rule in your tree grammar and print it in your declarations rule, you could do something like this:
tree grammar T;
...
declarations
: (^(t=type idlist {System.out.println($t.returnValue);}))+
;
...
type returns [String returnValue]
: i=INTEGER {$returnValue = "[" + $i.text + "]";}
;
...
But if you really want to do it without specifying a return object, you could do something like this:
declarations
: (^(t=type idlist {System.out.println($t.start.getText());}))+
;
Note that type returns an instance of a TreeRuleReturnScope which has an attribute called start which in its turn is a CommonTree instance. You could then call getText() on that CommonTree instance.
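For the variable table itself, here is a minimal sketch in Java, assuming the type text is handed down to idlist (for example as a rule argument) and the identifiers are collected into a list. The method names are hypothetical, since the question does not show the real VarTable class.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class VarTable {
    // Variable name -> declared type (e.g. "real" or "integer"), in declaration order.
    private final Map<String, String> types = new LinkedHashMap<>();

    // Declare every identifier of one idlist with the same type; reject duplicates.
    void declareAll(String type, List<String> names) {
        for (String name : names) {
            if (types.containsKey(name)) {
                throw new IllegalStateException("duplicate declaration: " + name);
            }
            types.put(name, type);
        }
    }

    // Used later to report undeclared variables.
    boolean isDeclared(String name) {
        return types.containsKey(name);
    }

    // Returns null if the variable was never declared.
    String typeOf(String name) {
        return types.get(name);
    }
}
```

The action in the declarations rule would then make one declareAll call per ^(type idlist) subtree.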
Whilst creating an inline ANTLR Tree Grammar interpreter I have come across an issue regarding the multiplicity of procedure call arguments.
Consider the following (faulty) tree grammar definition.
procedureCallStatement
: ^(PROCEDURECALL procedureName=NAME arguments=expression*)
{
if(procedureName.equals("foo")) {
callFooMethod(arguments[0], arguments[1]);
} else if(procedureName.equals("bar")) {
callBarMethod(arguments[0], arguments[1], arguments[2]);
}
}
;
My problem lies with the retrieval of the given arguments. If there would be a known quantity of expressions I would just assign the values coming out of these expressions to their own variable, e.g.:
procedureCallStatement
: ^(PROCEDURECALL procedureName=NAME argument1=expression argument2=expression)
{
...
}
;
This however is not the case.
Given a case like this, what is the recommendation on interpreting a variable number of tree nodes inline within the ANTLR Tree Grammar?
Use the += operator. To handle any number of arguments, including zero:
procedureCallStatement
: ^(PROCEDURECALL procedureName=NAME argument+=expression*)
{
...
}
;
See the tree construction documentation on the antlr website.
The above will change the type of the variable argument from typeof(expression) to a List (well, at least when you're generating Java code). Note that the list is untyped, so it's just a raw List and you have to cast its elements yourself.
If you use multiple parameters with the same variable name, they will also create a list, for example:
twoParameterCall
: ^(PROCEDURECALL procedureName=NAME argument=expression argument=expression)
{
...
}
;
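Once the arguments arrive as a list, the action can dispatch on the procedure name and check the argument count at run time. Here is a minimal sketch in Java, assuming the arguments have already been evaluated. The class, the call method and the string results are hypothetical placeholders for the real callFooMethod and callBarMethod.

```java
import java.util.Arrays;
import java.util.List;

final class ProcedureDispatch {
    // Hypothetical interpreter hook: procedure name plus already-evaluated arguments.
    static String call(String name, List<?> args) {
        switch (name) {
            case "foo":
                requireArity(name, args, 2);
                return "foo(" + args.get(0) + ", " + args.get(1) + ")";
            case "bar":
                requireArity(name, args, 3);
                return "bar(" + args.get(0) + ", " + args.get(1) + ", " + args.get(2) + ")";
            default:
                throw new IllegalArgumentException("unknown procedure: " + name);
        }
    }

    // The grammar accepts any number of expressions, so the count is checked here.
    private static void requireArity(String name, List<?> args, int expected) {
        if (args.size() != expected) {
            throw new IllegalArgumentException(
                name + " expects " + expected + " arguments, got " + args.size());
        }
    }

    public static void main(String[] unused) {
        System.out.println(call("foo", Arrays.asList(1, 2)));
    }
}
```

Keeping the arity check in code rather than in the grammar means a single procedureCallStatement rule can serve every procedure.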
I have been keeping the shift/reduce errors away. Now I think I have finally met my match.
Int[] a
a[0] = 1
The problem is that Int[] is defined as
Type OptSquareBrackets
while a[0] is defined as
Var | Var '[' expr ']'
Var and Type are both defined as VAR, which is any valid variable name: [a-zA-Z][a-zA-Z0-9_]*. Apart from adding a dummy token (such as Decl Type OptSquareBrackets instead), is there a way to write this so that it does not conflict? From this one rule I get 1 shift/reduce and 1 reduce/reduce warning.
Could you define a new Token
VarLBracket [a-zA-Z][a-zA-Z0-9_]*\[
And therefore define declaration
Type | VarLBracket ']';
and define assignment target as
Var | VarLBracket expr ']';
Create a Lex rule that matches [] as a single token, since [] is only used in declarations; everywhere else the brackets enclose an expression, as in [var].
Technically, this problem stems from trying to tie the grammar to a semantic meaning that doesn't actually differ in syntax.
ISTM that you just need a single grammar construct that describes both types and expressions. Make the distinction in code and not in the grammar, especially if there is not actually a syntactic difference. Yacc is called a compiler generator, but that is not the least bit true. It just makes parsers.
Having said that, recognizing [] as a terminal symbol might be an easier way to fix the problem and get on with things. Yacc isn't very good at ambiguous grammars and it needs to make early decisions on which path to follow.
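To illustrate the single-token idea, here is a small hand-written tokenizer in Java. It is only a sketch of what the Lex rule would do; the token names (EMPTY_BRACKETS, VAR, NUM) are made up for the example, and identifiers are matched only roughly like [a-zA-Z][a-zA-Z0-9_]*.

```java
import java.util.ArrayList;
import java.util.List;

final class BracketLexer {
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (input.startsWith("[]", i)) {
                // The declaration case: "[]" becomes one terminal, tried before a lone '['.
                out.add("EMPTY_BRACKETS");
                i += 2;
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                out.add("VAR(" + input.substring(start, i) + ")");
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                out.add("NUM(" + input.substring(start, i) + ")");
            } else {
                // Any other character, including a lone '[' or ']', is its own token.
                out.add(String.valueOf(c));
                i++;
            }
        }
        return out;
    }
}
```

With this, Int[] a lexes as VAR, EMPTY_BRACKETS, VAR, while a[0] = 1 starts with VAR followed by a plain '[', so the parser can tell the two apart on the second token without a conflict.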