I'm struggling with what seems to be a bug in Visual Basic. I'm probably missing something. Hopefully someone can point out what it is.
Section 7.5 of the Visual Basic Specification for version 10.0 says that this is the grammar for class declaration. Forgive the lack of italics that indicates the difference between literals and grammar nodes.
ClassDeclaration ::=
[ Attributes ] [ ClassModifier+ ] Class Identifier [ TypeParameterList ] StatementTerminator
[ ClassBase ]
[ TypeImplementsClause+ ]
[ ClassMemberDeclaration+ ]
End Class StatementTerminator
ClassModifier ::= TypeModifier | MustInherit | NotInheritable | Partial
So the minimal class declaration would be
Class Identifier StatementTerminator
End Class StatementTerminator
The grammar for Identifier and some other supporting nodes is specified in 13.1.2:
Identifier ::=
NonEscapedIdentifier [ TypeCharacter ] |
Keyword TypeCharacter |
EscapedIdentifier
NonEscapedIdentifier ::= < IdentifierName but not Keyword >
TypeCharacter ::=
IntegerTypeCharacter |
LongTypeCharacter |
DecimalTypeCharacter |
SingleTypeCharacter |
DoubleTypeCharacter |
StringTypeCharacter
IntegerTypeCharacter ::= %
LongTypeCharacter ::= &
DecimalTypeCharacter ::= @
SingleTypeCharacter ::= !
DoubleTypeCharacter ::= #
StringTypeCharacter ::= $
Based on my reading of this foo! should be a legal identifier because ! is a TypeCharacter.
So, based on the minimal legal class declaration above, this should be legal.
Class foo!
End Class
But Visual Studio 2010 gives this:
Type declaration characters are not valid in this context.
Am I missing something in the spec, or does the compiler disagree with the spec?
The grammar doesn’t stand on its own; the accompanying text is part of the specification. This is important here, because otherwise you’d be right. But as competent_tech has noted in his answer, the TypeCharacter production, while forming part of the Identifier production, is not a valid character in an identifier, except to denote a variable’s (or function’s) type.
This is specified in 2.2.1:
A type character denotes the type of the preceding identifier. The type character is not considered part of the identifier. If a declaration includes a type character, the type character must agree with the type specified in the declaration itself; otherwise, a compile-time error occurs.
Appending a type character to an identifier that conceptually does not have a type (for example, a namespace name) or to an identifier whose type disagrees with the type of the type character causes a compile-time error.
(Emphasis mine.)
The section even gives an explicit example of an invalid use that is equivalent to your use. So the specification explicitly forbids this.
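The rule from 2.2.1 can be sketched as a check that runs after the grammar has accepted the identifier. This is only an illustration in Python, not how the Visual Basic compiler is implemented; the names split_identifier, check_declaration and UNTYPED_KINDS are made up for the example.

```python
# Visual Basic type characters and the types they denote.
TYPE_CHARACTERS = {
    "%": "Integer",
    "&": "Long",
    "@": "Decimal",
    "!": "Single",
    "#": "Double",
    "$": "String",
}

# Hypothetical set of declaration kinds that conceptually have no type.
UNTYPED_KINDS = {"class", "namespace"}


def split_identifier(text):
    """Split 'x!' into ('x', 'Single'); the type character is not part of the name."""
    if text and text[-1] in TYPE_CHARACTERS:
        return text[:-1], TYPE_CHARACTERS[text[-1]]
    return text, None


def check_declaration(kind, text):
    """Accept the identifier syntactically, then apply the rule from the prose."""
    name, type_name = split_identifier(text)
    if type_name is not None and kind in UNTYPED_KINDS:
        raise ValueError("Type declaration characters are not valid in this context.")
    return name, type_name
```

With this sketch, check_declaration("variable", "x!") yields the name x with type Single, while check_declaration("class", "foo!") is rejected even though foo! matches the Identifier production.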
The TypeCharacter is used to declare the data type of a property, variable, function, etc.
Public Class Foo
Dim x!
' The above declaration is the same as
'Dim x As Single
End Class
Since there is no data type for a class, adding a type character does not make any sense. Based on this, it seems like the spec may not be entirely correct.
A context entry & key are defined by the following grammar (cf DMN v1.2, page 111, Section 10.3.1.2)
60. context entry = key , ":", expression;
61. key = name | string literal;
Consider the following instance of a context object
{ "12" : "hello" }
How do I access "hello" from such an object?
Could this be an issue in the grammar? Not sure if this kind of access is valid.
According to the DMN specification, "12" cannot be transformed into a legal name, so I agree with you that it cannot be accessed with the dot operator.
But you can use the built-in function get value(), as per the spec:
If key1 is not a legal name or for whatever reason one wishes to treat
the key as a string, the following syntax is allowed: get value(m,
"key1").
For example:
get value({ "12" : "hello" }, "12")
This is valid FEEL and would result in "hello".
I see no issue in the grammar.
I believe the only way to access this entry value is by using the built-in function.
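As a rough model of the difference, here is a Python sketch. It is not a FEEL implementation: the regular expression is a deliberately simplified stand-in for FEEL's name production, and dot_access and get_value are hypothetical helper names.

```python
import re

# Simplified stand-in for a legal name: a letter or underscore followed by
# letters, digits or underscores. The real FEEL rule is broader than this.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def dot_access(context, key):
    """Model of context.key: only possible when the key is a legal name."""
    if not _NAME.match(key):
        raise SyntaxError(f"{key!r} is not a legal name, so it cannot follow a dot")
    return context.get(key)


def get_value(context, key):
    """Model of get value(m, "key"): the key is treated as a plain string."""
    return context.get(key)
```

In this model get_value({"12": "hello"}, "12") returns "hello", whereas dot_access with the key "12" fails before any lookup happens.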
In the spec for generic signatures, ClassSignature has the form
ClassSignature:
[TypeParameters] SuperclassSignature {SuperinterfaceSignature}
TypeParameters:
< TypeParameter {TypeParameter} >
TypeParameter:
Identifier ClassBound {InterfaceBound}
ClassBound:
: [ReferenceTypeSignature]
InterfaceBound:
: ReferenceTypeSignature
So the superclass bound of a type parameter can be omitted (some examples here).
If I have a class declaration public class A<T, LFooBar>, the Java compiler generates the signature <T:Ljava/lang/Object;LFooBar:Ljava/lang/Object;>Ljava/lang/Object;.
IIUC, the class bound could be omitted, in which case the signature would be <T:LFooBar:>Ljava/lang/Object;.
Parsing that short signature requires looking ahead to the second : in order to know that T:LFooBar: are two type parameters, and not one type parameter T with class bound FooBar.
Maybe in practice, the class bound is only left out if there's an interface bound? For public class A<T extends Comparable<? super T>>, javac produces the signature <T::Ljava/lang/Comparable<-TT;>;>Ljava/lang/Object;. But I guess I cannot rely on this assumption.
Did I misunderstand something?
If you look closely, the only thing that can follow an omitted ReferenceTypeSignature is an Identifier or a >. Since a ReferenceTypeSignature must either begin with a [ or end with a ;, and identifiers can't contain these characters, while identifiers must be followed by a :, which can't appear in type signatures, there is no ambiguity between those options.
Note that identifiers can start with >, so you need to look ahead for a colon to determine whether you are at the end of TypeParameters or not. But that's a separate issue.
I'm not sure how the JVM implements it, but one possible approach is this:
Examine the first character. If it is [, you have a type signature. If it is >, scan ahead for the first [, ;, or :. If the first one you see is :, you have an identifier; otherwise you have reached the end of the type parameters.
Otherwise, scan ahead for the first ; or :. If it is :, you have an identifier; otherwise you have a class bound.
Edit: Identifiers in signatures cannot contain >, so ignore that bit. (They also can't contain :, another potential source of ambiguity)
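The scan-ahead idea can be sketched in Python as follows, with the Edit applied (identifiers contain neither > nor :). This is an illustration of the approach, not how the JVM or javac does it, and the function names are made up for the example. It returns each bound as the raw signature substring.

```python
def _skip_reference_type(sig, i):
    """Return the index just past the ReferenceTypeSignature starting at i."""
    c = sig[i]
    if c == "[":                       # array: skip dimensions, then the element type
        while sig[i] == "[":
            i += 1
        if sig[i] in "BCDFIJSZ":       # primitive element type
            return i + 1
        return _skip_reference_type(sig, i)
    if c == "T":                       # type variable: T Identifier ;
        return sig.index(";", i) + 1
    if c == "L":                       # class type: ends at the first ; outside <...>
        depth = 0
        while True:
            i += 1
            if sig[i] == "<":
                depth += 1
            elif sig[i] == ">":
                depth -= 1
            elif sig[i] == ";" and depth == 0:
                return i + 1
    raise ValueError(f"not a reference type signature at index {i}")


def _class_bound_present(sig, i):
    """Decide whether a class bound starts at i (just after an identifier's colon)."""
    if sig[i] == "[":
        return True
    if sig[i] in ">:":                 # end of parameters, or an interface bound
        return False
    for ch in sig[i:]:                 # scan ahead for the first ; or :
        if ch == ";":
            return True                # a type signature ends with ;
        if ch == ":":
            return False               # an identifier is followed by :
    return False


def parse_type_parameters(sig):
    """Return a list of (name, class_bound_or_None, [interface_bounds])."""
    if not sig.startswith("<"):
        return []
    params, i = [], 1
    while sig[i] != ">":
        colon = sig.index(":", i)
        name, i = sig[i:colon], colon + 1
        class_bound = None
        if _class_bound_present(sig, i):
            end = _skip_reference_type(sig, i)
            class_bound, i = sig[i:end], end
        interface_bounds = []
        while sig[i] == ":":
            end = _skip_reference_type(sig, i + 1)
            interface_bounds.append(sig[i + 1:end])
            i = end
        params.append((name, class_bound, interface_bounds))
    return params
```

On the short form <T:LFooBar:>Ljava/lang/Object; this yields two parameters, T and LFooBar, both without a class bound, because the scan from L reaches a : before any ;.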
I am attempting to build a language that can declare methods and fields, with intrinsic support for generics. I would like to be able to use primitive types like String, as well as declare my own classes.
This should be valid syntax:
String somePrimitive
class MyClass { }
MyClass someObject;
class List { }
List<String> stringList;
List<MyClass> objectList;
List<String> getNames() { }
I have a grammar that supports these operations:
Model:
(members+=ModelMembers)*;
ModelMembers:
Class | Field | MethodDeclaration
;
Class:
'class' name=ID '{' '}'
;
Field:
type=Type
name=ID
;
enum PrimitiveType: STRING="String" | NUMBER="Number";
Type:
(
{TypeObject} clazz=[Class] ("<" a+=Type ("," a+=Type)* ">")?
|
{TypePrimitive} type=PrimitiveType
)
;
MethodDeclaration:
returnType=Type name=ID "(" ")" "{"
"}"
;
But it contains an error:
[fatal] rule rule__ModelMembers__Alternatives has non-LL(*) decision due to recursive rule invocations reachable from alts 2,3. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
The problem seems to stem from the fact that the Type rule is recursive, and can be matched as either the beginning of a MethodDeclaration or a Field.
However, it is possible to figure out what rule one is building, as the method will have () { } after the name.
What really confuses me, is if I replace the recursive rule with simply [Class], e.g. Field: type=[Class] name=ID (and the same for the MethodDeclaration) the grammar is valid.
I get that there is some ambiguity when one sees an instance of Type, as it could lead onto a method or field.... but that's exactly the same when I replace with [Class]. Instances of class can lead onto a method or field.
How can it be ambiguous using Type, but not ambiguous using [Class]?
This is not a direct answer to your question, but did you consider using Xbase instead of plain Xtext? With Xbase you can simply use predefined rules that match everything you need:
Java types with generics
Expressions
Annotations
and many more.
Here are a couple of useful links:
Xbase: https://wiki.eclipse.org/Xbase
Extending Xbase blog: http://koehnlein.blogspot.de/2011/07/extending-xbase.html
7 Languages For The JVM (7 examples): http://www.eclipse.org/Xtext/7languages.html
Screencasts: http://xtextcasts.org/?tag_id=10
If Xbase doesn't suit you, then you can learn from its Xbase-Xtext-Grammar.
This grammar parses the example code without throwing errors (the layout is the one favored by Terence Parr, the ANTLR man. I find it helps greatly):
Model
: (members+=ModelMembers)*
;
ModelMembers
: Class
| MethodDeclaration
| Field
;
Class
: 'class' name=ID '{' '}'
;
Field
: type=Type name=ID ';'
;
PrimitiveType
: ("String" |"Number")
;
TypeReferenceOrPrimitive
: {TypeClass} type=[Class]
| {TypePrimitive} PrimitiveType
;
Type
: {TypeObject} clazz=[Class] ("<" a+=TypeReferenceOrPrimitive ("," a+=TypeReferenceOrPrimitive)* ">")?
| {TypePrimitive} type=PrimitiveType
;
MethodDeclaration
: returnType=Type name=ID "(" ")" "{" "}"
;
I'm no Xtext expert so there may be better ways. My 'trick' is to define TypeReferenceOrPrimitive. You will probably need to play around with the grammar a bit more in order to get an AST that is easier to process.
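The error message in the question also mentions left-factoring. As an illustration of what that means for this language, here is a small hand-written recursive-descent parser in Python. It is not Xtext or ANTLR output, and the class and method names are invented for the example. It parses the shared Type prefix once and only then looks at the next token to choose between a field and a method.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[<>,(){};])")
IDENT = re.compile(r"[A-Za-z_]\w*\Z")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens, self.pos = tokenize(text), 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def ident(self):
        tok = self.peek()
        if tok is None or not IDENT.match(tok):
            raise SyntaxError(f"expected identifier, got {tok!r}")
        self.pos += 1
        return tok

    def parse_type(self):
        # Recursive: a type may carry type arguments that are types themselves.
        name, args = self.ident(), []
        if self.peek() == "<":
            self.expect("<")
            args.append(self.parse_type())
            while self.peek() == ",":
                self.expect(",")
                args.append(self.parse_type())
            self.expect(">")
        return (name, args)

    def parse_member(self):
        if self.peek() == "class":
            self.expect("class")
            name = self.ident()
            self.expect("{")
            self.expect("}")
            return ("class", name)
        # Left-factored part: Type and name are common to both alternatives.
        type_, name = self.parse_type(), self.ident()
        if self.peek() == "(":
            for tok in ("(", ")", "{", "}"):
                self.expect(tok)
            return ("method", type_, name)
        if self.peek() == ";":         # the sample input uses ';' inconsistently
            self.expect(";")
        return ("field", type_, name)

    def parse_model(self):
        members = []
        while self.peek() is not None:
            members.append(self.parse_member())
        return members
```

Because the decision is postponed until after the name, the parser never has to predict, from a fixed position, which alternative a nested type belongs to.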
I have a problem that I've been stuck on for a while and I would appreciate some help if possible.
I have a few rules in an ANTLR tree grammar:
block
: compoundstatement
| ^(VAR declarations) compoundstatement
;
declarations
: (^(t=type idlist))+
;
idlist
: IDENTIFIER+
;
type
: REAL
| i=INTEGER
;
I have written a Java class VarTable that I will insert all of my variables into as they are declared at the beginning of my source file. The table will also hold their variable types (ie real or integer). I'll also be able to use this variable table to check for undeclared variables or duplicate declarations etc.
So basically I want to be able to send the variable type down from the 'declarations' rule to the 'idlist' rule and then loop through every identifier in the idlist rule, adding them to my variable table one by one.
The major problem is that I get a NullPointerException when I try to access the 'text' attribute of the $t variable in the 'declarations' rule (this is the one that refers to the type).
And yet if I try and access the 'text' attribute of the $i variable in the 'type' rule, there's no problem.
I have looked at the place in the Java file where the NullPointerException is being generated and it still makes no sense to me.
Is it a problem with the fact that there could be multiple types because the rule is
(^(t=type idlist))+
?
I have the same issue when I get down to the idlist rule, because I'm unsure how I can write an action that will allow me to loop through all of the IDENTIFIER tokens found.
Grateful for any help or comments.
Cheers
You can't reference the attributes from production rules like you tried inside tree grammars, only in parser (or combined) grammars (they're different objects!). Note that INTEGER is not a production rule, just a "simple" token (terminal). That's why you can invoke its .text attribute.
So, if you want to get hold of the text of the type rule in your tree grammar and print it in your declarations rule, you could do something like this:
tree grammar T;
...
declarations
: (^(t=type idlist {System.out.println($t.returnValue);}))+
;
...
type returns [String returnValue]
: i=INTEGER {returnValue = "[" + $i.text + "]";}
;
...
But if you really want to do it without specifying a return object, you could do something like this:
declarations
: (^(t=type idlist {System.out.println($t.start.getText());}))+
;
Note that type returns an instance of a TreeRuleReturnScope which has an attribute called start which in its turn is a CommonTree instance. You could then call getText() on that CommonTree instance.
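Once the type text is available in the declarations rule, the remaining step the question asks about is looping over the identifiers and recording each one with that type. The logic can be sketched independently of ANTLR. The Python below is only a model of it: VarTable here is a hypothetical stand-in for the asker's Java class, and the (type, identifiers) pairs stand in for the ^(type idlist) subtrees.

```python
class VarTable:
    """Hypothetical stand-in for the asker's variable table."""

    def __init__(self):
        self._types = {}

    def declare(self, name, type_name):
        if name in self._types:
            raise ValueError(f"duplicate declaration of {name!r}")
        self._types[name] = type_name

    def type_of(self, name):
        if name not in self._types:
            raise KeyError(f"undeclared variable {name!r}")
        return self._types[name]


def walk_declarations(declarations, table):
    """Mirror (^(type idlist))+ : one (type, identifiers) pair per subtree."""
    for type_name, identifiers in declarations:
        # The type is passed down to every identifier of the same subtree.
        for name in identifiers:
            table.declare(name, type_name)
```

In an ANTLR tree grammar the inner loop would correspond to an action attached to each IDENTIFIER inside idlist, with the type handed to idlist as a rule parameter.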
I have been keeping the shift/reduce errors away. Now I think I have finally met my match.
Int[] a
a[0] = 1
The problem is that Int[] is defined as
Type OptSquareBrackets
while a[0] is defined as
Var | Var '[' expr ']'
Var and Type are both defined as VAR, which is any valid variable name, [a-zA-Z][a-zA-Z0-9_]*. Apart from adding a dummy token (such as Decl Type OptSquareBrackets instead), is there a way to write this so that it does not produce a conflict? From this one rule I get 1 shift/reduce and 1 reduce/reduce warning.
Could you define a new Token
VarLBracket [a-zA-Z][a-zA-Z0-9_]*\[
And therefore define declaration
Type | VarLBracket ']';
and define assignment target as
Var | VarLBracket expr ']';
Create a Lex rule for [], since [] is only used in declarations and everywhere else would use [var].
Technically, this problem stems from trying to tie the grammar to a semantic meaning that doesn't actually differ in syntax.
It seems to me that you just need a single grammar construct that describes both types and expressions. Make the distinction in code and not in the grammar, especially if there is not actually a syntactic difference. Yacc is called a compiler generator, but that is not the least bit true. It just makes parsers.
Having said that, recognizing [] as a terminal symbol might be an easier way to fix the problem and get on with things. Yacc isn't very good at ambiguous grammars and it needs to make early decisions on which path to follow.
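To show what recognizing [] as a terminal looks like at the lexer level, here is a small Python tokenizer. It is a sketch, not Lex or Yacc input, and the token names are invented for the example. The only point is that the rule for [] is tried before the rule for a lone [.

```python
import re

# Order matters: the two-character [] rule is listed before the lone [ rule.
TOKEN_SPEC = [
    ("EMPTY_BRACKETS", r"\[\]"),
    ("LBRACKET", r"\["),
    ("RBRACKET", r"\]"),
    ("VAR", r"[a-zA-Z][a-zA-Z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("ASSIGN", r"="),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With these tokens a declaration starts with VAR EMPTY_BRACKETS and an indexed assignment with VAR LBRACKET, so the parser can tell the two apart with a single token of lookahead.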