Here is a QML grammar (extracted from https://github.com/kropp/intellij-qml/blob/master/grammars/qml.bnf):
/* identifier, value, integer and float are terminals */
qml ::= object /* Simplified */
object ::= type body
body ::= '{' (property_definition|signal_definition|attribute_assignment|method_attribute)* '}'
type ::= 'double'|'real'|identifier
attribute_assignment ::= (attribute ':')? attribute_value ';'?
item ::= list|object|string|boolean|number|identifier|value
attribute_value ::= method_call|method_body|item|value+
property_definition ::= 'default'? 'readonly'? 'property' ('alias'|'var'|type) property (':' attribute_value)?
signal_definition ::= 'signal' signal ('(' (signal_parameter ',')* signal_parameter? ')')?
signal_parameter ::= ('var'|type) parameter
method_attribute ::= 'function' method '(' (parameter ',')* parameter? ')' method_body
method_call ::= method '(' (argument ',')* argument? ')'
method_body ::= '{' javascript '}'
javascript ::= ('{' javascript '}'|'var'|'['|']'|'('|')'|','|':'|';'|string|identifier|number|value)*
list ::= '[' item? (',' item)* ']'
property ::= identifier
attribute ::= identifier
signal ::= identifier
parameter ::= identifier
method ::= identifier
argument ::= string|boolean|number|identifier|value
number ::= integer|float
boolean ::= 'true'|'false'
Is it LALR(1)? My program raises a reduce/reduce conflict for the closure I[n], which contains the conflicting items:
// other items here...
[item ::= identifier . , {] // -> ACTION[n, {] = reduce to item
[type ::= identifier . , {] // -> ACTION[n, {] = reduce to type
// other items here...
Note:
The following answer was written on the basis of the information provided in the question. As it happens, the actual implementation of QML only accepts user declarations for types whose names start with an upper case letter, while names of properties must start with a lower case letter. (Many built-in types have names which start with lower case letters, too. So it's not quite as simple as just dividing identifiers into two categories in the lexical scan based on their first letter. Built-in types and keywords still need to be recognised as such.)
Unfortunately, I haven't been able to find a definitive QML grammar, or even a formal description of the syntax. The comments above were based on Qt's QML Reference.
Thanks to @mishmashru for bringing the above to my attention.
The grammar is ambiguous, so the parser generator correctly reports a reduce/reduce conflict.
In particular, consider the following simplified productions extracted from the grammar, where most alternatives have been removed to focus on the conflict:
body ::= '{' attribute_assignment* '}'
attribute_assignment ::= attribute_value
attribute_value ::= method_body | item
method_body ::= '{' javascript '}'
item ::= object | identifier
object ::= type body
type ::= identifier
Now consider a body that starts with
{ x {
We'll suppose that the parser has just seen x and is now looking at the second {, to figure out what action(s) to take.
If x is an ordinary identifier (whatever "ordinary" might mean), then it can be reduced to item, which is an alternative for attribute_value. In that case the second { presumably starts a method_body, which is also an alternative for attribute_value.
If, on the other hand, x is a type, then we're looking at an object, which has the form type body. In that case the second { is the start of the inner body.
So the parser needs to decide whether to reduce x to item (and from there to attribute_value) or to type. The decision cannot be made at this point, because the { lookahead token doesn't provide enough information.
So the grammar is not LR(1), and therefore not LALR(1) either.
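To make the ambiguity concrete, the following Python sketch counts parse trees for a token sequence under the simplified productions above. It assumes, purely for illustration, that javascript is reduced to a sequence of identifiers and that every token other than { and } is an identifier; count_parses is a hypothetical helper, not part of any parser generator.

```python
from collections import Counter
from functools import lru_cache


def count_parses(tokens):
    """Count the parse trees of `tokens` as a `body` in the simplified grammar.

    Each helper maps a start position to a Counter {end position: number of parses}.
    """
    n = len(tokens)

    def is_ident(i):
        # Assumption: any token other than a brace is an identifier.
        return i < n and tokens[i] not in ("{", "}")

    @lru_cache(maxsize=None)
    def body(i):
        # body ::= '{' attribute_assignment* '}'
        out = Counter()
        if i < n and tokens[i] == "{":
            for j, c in assignments(i + 1).items():
                if j < n and tokens[j] == "}":
                    out[j + 1] += c
        return out

    @lru_cache(maxsize=None)
    def assignments(i):
        # attribute_assignment*, where attribute_assignment ::= attribute_value
        out = Counter({i: 1})  # the "zero assignments" case
        for j, c in attribute_value(i).items():
            # every attribute_value consumes at least one token, so j > i
            for k, d in assignments(j).items():
                out[k] += c * d
        return out

    @lru_cache(maxsize=None)
    def attribute_value(i):
        # attribute_value ::= method_body | item
        return method_body(i) + item(i)

    @lru_cache(maxsize=None)
    def method_body(i):
        # method_body ::= '{' javascript '}', with javascript simplified to identifier*
        out = Counter()
        if i < n and tokens[i] == "{":
            j = i + 1
            while is_ident(j):
                j += 1
            if j < n and tokens[j] == "}":
                out[j + 1] += 1
        return out

    @lru_cache(maxsize=None)
    def item(i):
        # item ::= object | identifier
        out = Counter(obj(i))
        if is_ident(i):
            out[i + 1] += 1
        return out

    @lru_cache(maxsize=None)
    def obj(i):
        # object ::= type body, with type ::= identifier
        return Counter(body(i + 1)) if is_ident(i) else Counter()

    return body(0)[n]


print(count_parses("{ x { } }".split()))  # 2: ambiguous
print(count_parses("{ x }".split()))      # 1
```

Two trees for { x { } } means x is either an item followed by an empty method_body, or the type of the nested object x { }, which is exactly the choice the parser faces on the second {.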
Without knowing anything more about the problem domain, it's hard to give good advice. If it is possible to distinguish identifier and type, perhaps by consulting a symbol table, then you could solve this problem by using some kind of lexical feedback.
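As a minimal sketch of that lexical feedback, assuming a set of known type names is available (the names lex, TOKEN_RE and known_types are hypothetical), the lexer can emit a distinct TYPE token for names found in the symbol table. With a separate TYPE terminal, a production such as type ::= 'double'|'real'|TYPE no longer shares its right-hand side with item ::= identifier, which should remove this particular conflict.

```python
import re

# Hypothetical token pattern: identifiers, integers, or any other single non-space char.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_][A-Za-z0-9_]*)|([0-9]+)|(\S))")


def lex(source, known_types):
    """Tokenize `source`, emitting TYPE instead of IDENTIFIER for names in known_types.

    known_types plays the role of the symbol table; a parser doing lexical
    feedback would add names to it as it processes type declarations.
    """
    tokens = []
    for m in TOKEN_RE.finditer(source):
        word, number, punct = m.groups()
        if word is not None:
            tokens.append(("TYPE" if word in known_types else "IDENTIFIER", word))
        elif number is not None:
            tokens.append(("NUMBER", number))
        else:
            tokens.append((punct, punct))
    return tokens


print(lex("Rectangle { width: 10 }", {"Rectangle"}))
```

Per the note at the top, the real QML rules (upper-case user types, many lower-case built-ins) mean the set of known types still has to be maintained; the first letter alone is not enough.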
Related
I've been trying to write a GraphQL grammar for Grammar-Kit, and I've been stuck on an ambiguity issue for quite some time now. Keywords in GraphQL (such as type, implements, scalar) can also be names of types or fields. For example:
type type implements type {}
At first I defined these keywords as tokens in the BNF, but that would make the case above invalid. But if I write these keywords directly in the rules that use them, the grammar becomes ambiguous.
Here is an example of the problem, based on the grammar below. If you define something like this:
directive @foo on Baz | Bar
scalar Foobar @cool
The PSI viewer tells me that at the position of @cool it expects a DirectiveAddtlLocation, a rule I don't even reference in the scalar rule. Is anyone familiar with Grammar-Kit who has encountered something like this? I'd really appreciate some insight. Thank you.
Here's an excerpt of the grammar for the error example mentioned above:
{
tokens=[
LEFT_PAREN='('
RIGHT_PAREN=')'
PIPE='|'
AT='@'
IDENTIFIER="regexp:[_A-Za-z][_0-9A-Za-z]*"
WHITE_SPACE = 'regexp:\s+'
]
}
Document ::= Definition*
Definition ::= DirectiveTypeDef | ScalarTypeDef
NamedTypeDef ::= IDENTIFIER
// e.g. @foo @bar(a: 10) @baz
DirectivesDeclSet ::= DirectiveDecl+
DirectiveDecl ::= AT TypeName
// e.g. directive @example on FIELD_DEFINITION | ARGUMENT_DEFINITION
DirectiveTypeDef ::= 'directive' AT NamedTypeDef DirectiveLocationsConditionDef
DirectiveLocationsConditionDef ::= 'on' DirectiveLocation DirectiveAddtlLocation*
DirectiveLocation ::= IDENTIFIER
DirectiveAddtlLocation ::= PIPE? DirectiveLocation
TypeName ::= IDENTIFIER
// e.g. scalar DateTime @foo
ScalarTypeDef ::= 'scalar' NamedTypeDef DirectivesDeclSet?
Once your grammar sees directive @TOKEN on IDENTIFIER, it consumes a sequence of DirectiveAddtlLocation. Each of those consists of an optional PIPE followed by an IDENTIFIER. As you note in your question, the GraphQL "keywords" are really just special cases of identifiers. So what's probably happening here is that, since any identifier is accepted as a directive location, scalar and Foobar are both consumed as DirectiveAddtlLocation, and the parser never actually gets to see a ScalarTypeDef.
# Parses the same as:
directive @foo on Bar | Baz | scalar | Foobar
@cool # <-- ?????
You can get around this by listing out the explicit set of allowed directive locations in your grammar. (You might even be able to get pretty far by just copying the grammar in Appendix B of the GraphQL spec and changing its syntax.)
DirectiveLocation ::= ExecutableDirectiveLocation | TypeSystemDirectiveLocation
ExecutableDirectiveLocation ::= 'QUERY' | 'MUTATION' | ...
TypeSystemDirectiveLocation ::= 'SCHEMA' | 'SCALAR' | ...
Now when you go to parse:
directive @foo on QUERY | MUTATION
# "scalar" is not a directive location, so the DirectiveTypeDef must end
scalar Foobar @cool
(For all that the "identifier" vs. "keyword" distinction is a little weird, I'm pretty sure the GraphQL grammar isn't actually ambiguous: in every context where a free-form identifier is allowed, there's punctuation before a "keyword" could appear again, and in cases like this one there are unambiguous lists of not-quite-keywords that don't overlap.)
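The difference between the two grammars can be sketched in Python. This is a hand-written illustration, not Grammar-Kit's generated parser; parse_locations and tokenize are hypothetical names, and DIRECTIVE_LOCATIONS is only a subset of the locations in the GraphQL spec. With allowed=None every identifier counts as a location, mimicking DirectiveLocation ::= IDENTIFIER; with an explicit set, the list stops at the first name that is not a location.

```python
import re

# A subset of the directive locations from the GraphQL spec, for illustration only.
DIRECTIVE_LOCATIONS = {"QUERY", "MUTATION", "SCHEMA", "SCALAR",
                       "FIELD_DEFINITION", "ARGUMENT_DEFINITION"}


def tokenize(src):
    # IDENTIFIER regex from the question's token list, plus '@' and '|';
    # anything else (e.g. whitespace) is silently skipped in this sketch.
    return re.findall(r"[_A-Za-z][_0-9A-Za-z]*|[@|]", src)


def parse_locations(tokens, pos, allowed=None):
    """Parse "'on' Location ('|'? Location)*" starting at tokens[pos].

    allowed=None mimics DirectiveLocation ::= IDENTIFIER (any name matches);
    otherwise only names in `allowed` count. Returns (locations, next_pos).
    """
    def is_location(i):
        if i >= len(tokens):
            return False
        return tokens[i].isidentifier() if allowed is None else tokens[i] in allowed

    if tokens[pos] != "on":
        raise SyntaxError(f"expected 'on' at token {pos}")
    pos += 1
    if not is_location(pos):
        raise SyntaxError(f"expected a directive location at token {pos}")
    locations = [tokens[pos]]
    pos += 1
    while True:
        # The '|' separator is optional, mirroring DirectiveAddtlLocation ::= PIPE? DirectiveLocation
        nxt = pos + 1 if pos < len(tokens) and tokens[pos] == "|" else pos
        if not is_location(nxt):
            break
        locations.append(tokens[nxt])
        pos = nxt + 1
    return locations, pos


tokens = tokenize("directive @foo on QUERY | MUTATION scalar Foobar @cool")
print(parse_locations(tokens, 3))                       # (['QUERY', 'MUTATION', 'scalar', 'Foobar'], 9)
print(parse_locations(tokens, 3, DIRECTIVE_LOCATIONS))  # (['QUERY', 'MUTATION'], 7)
```

In the greedy version the parser only stops at the @ before cool, having swallowed scalar Foobar; with the explicit set it stops at scalar, so a ScalarTypeDef can follow.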
So I have a list that is iterated over like so:
body(foo) ::= "<foo:{it|<\n><\n><bar(it)>}>"
bar(x) ::= "[<x.key>:<x.value>]"
I'd like to use the index.
bar(x) ::= "[<i0>:<x.key>:<x.value>]"
I saw that there is an <i> and <i0> index token, but I don't understand how it is used, or if it could be used to do what I want to do.
OK, so the trick is that i is implicitly available inside the iterator's anonymous template, but if you call another template from there (like a function), you have to pass <i> or <i0> in explicitly:
body(foo) ::= "<foo:{it|<\n><\n><bar(i0,it)>}>"
bar(i,x) ::= "[<i>:<x.key>:<x.value>]"
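For comparison, here is a plain-Python analogue of what the fixed templates render. It is not StringTemplate's API: the dicts stand in for whatever objects foo actually holds (an assumption), and enumerate() supplies the zero-based index that <i0> provides (<i> is the one-based counterpart).

```python
def render_bar(i, x):
    # bar(i,x) ::= "[<i>:<x.key>:<x.value>]"
    return f"[{i}:{x['key']}:{x['value']}]"


def render_body(foo):
    # body(foo) ::= "<foo:{it|<\n><\n><bar(i0,it)>}>"
    # enumerate() yields the zero-based index, like StringTemplate's <i0>.
    return "".join("\n\n" + render_bar(i0, it) for i0, it in enumerate(foo))


print(repr(render_body([{"key": "a", "value": 1}, {"key": "b", "value": 2}])))
```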
Is it possible in ANTLR 4 to create a parser rule with arguments of type 'token', i.e. something like the rule
list[elem Token] : '[' elem (',' elem)* ']';
which should match a list of tokens of the type 'elem'. For example, list[ID] should match a list of identifiers while list[String] should match a list of strings both following the syntax given in the above rule.
No, such semantic checks are generally done after parsing, in a listener or visitor (which ANTLR generates as well).
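A minimal sketch of such a post-parse check, assuming a single generic rule like list : '[' elem (',' elem)* ']' ; with elem : ID | STRING ; (hypothetical rule names) and a listener or visitor that has already collected the elements as (token_type, text) pairs. The check itself is plain Python and does not depend on the ANTLR runtime.

```python
def check_list(elements, expected_type):
    """Semantic check run after parsing: every element must have expected_type.

    elements: (token_type, text) pairs collected from the generic list rule.
    Raises ValueError on the first mismatch; returns the texts otherwise.
    """
    for index, (token_type, text) in enumerate(elements):
        if token_type != expected_type:
            raise ValueError(
                f"element {index} ({text!r}) is {token_type}, expected {expected_type}"
            )
    return [text for _, text in elements]


print(check_list([("ID", "a"), ("ID", "b")], "ID"))
```

The grammar stays generic; whether a given context requires a list of ID or a list of STRING is decided by the caller of check_list.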
In SPARQL a QuadPattern is defined as
QuadPattern ::= '{' Quads '}'
Quads ::= TriplesTemplate? ( QuadsNotTriples '.'? TriplesTemplate? )*
From this I understand that a QuadPattern can be empty, but I cannot understand the reason. What's the purpose of an empty QuadPattern?
As @Antoine Zimmermann points out, just because the syntax allows it doesn't mean it is meaningful.
In this case I believe it was done to keep the grammar within a certain constraint and to simplify it. If you don't allow Quads to be empty, then you'd have to redefine the QuadPattern rule like so:
QuadPattern ::= '{' '}' | '{' Quads '}'
That just adds unnecessary complication, particularly when you are using a parser generator.
With an empty quad pattern, you can, for instance, delete the default graph completely:
DELETE WHERE { }
But the fact that something is allowed by the syntax does not necessarily mean that there was a deliberate choice to allow a specific pattern. It may be, in some cases, that it is more convenient to define things in a more generic way.
I'm using ANTLR to analyse and rewrite SQL queries.
I have:
select : SELECT ;
fragment S : 's' | 'S' ;
....
fragment LETTER : 'a'..'z' | 'A'..'Z' ;
SELECT : S E L E C T ;
IDENTIFIER : LETTER+ ;
to define reserved keywords and make them case-insensitive.
My question is: how can I define non-reserved keywords?
Your problem seems similar to the problem we had when building the parser for the Drools (www.jboss.org/drools) language (DRL). In DRL, for instance, "rule" is a keyword, but it could also be used by a Java programmer as a property name in a POJO. So we can't have that as a reserved keyword.
rule /*keyword*/ "my rule"
when
SomeClass( rule /*property name*/ == "foo" )
...
We called these keywords "soft keywords".
To do that in ANTLR, we defined only "true"/"false"/"null" as hard keywords in the LEXER:
https://github.com/droolsjbpm/drools/blob/master/drools-compiler/src/main/resources/org/drools/lang/DRLLexer.g#L132
Everything else is an ID. Then in the PARSER, we used semantic predicates for each soft keyword:
https://github.com/droolsjbpm/drools/blob/master/drools-compiler/src/main/resources/org/drools/lang/DRLExpressions.g#L597
This makes it possible to integrate seamlessly with Java POJOs without their property names and other identifiers clashing with Drools-defined keywords.
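The same soft-keyword idea can be sketched without ANTLR. In this hand-written Python parser (an illustration, not Drools' actual code; the tiny rule syntax and all names are hypothetical), the lexer emits every word as ID, and the parser accepts rule or when as keywords only where it expects them, by checking the token text. That text check is what the semantic predicates do inside the generated ANTLR rules.

```python
import re

# Hypothetical token classes: quoted strings, identifiers, and a few operators.
TOKEN_RE = re.compile(r'\s*(?:(?P<STRING>"[^"]*")|(?P<ID>[A-Za-z_]\w*)|(?P<OP>==|[()]))')


def tokenize(src):
    # The lexer knows no soft keywords: 'rule' and 'when' come out as plain IDs.
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_rule(tokens):
    """Parse: rule STRING when ID '(' ID '==' STRING ')' (a tiny, hypothetical DRL subset)."""
    pos = 0

    def expect(kind, text=None):
        # Checking an ID's text here plays the role of the semantic predicate.
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok_kind, tok_text = tokens[pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind} at token {pos}, got {tok_text!r}")
        pos += 1
        return tok_text

    expect("ID", "rule")   # soft keyword
    name = expect("STRING")
    expect("ID", "when")   # soft keyword
    cls = expect("ID")
    expect("OP", "(")
    prop = expect("ID")    # any ID is a valid property name, including 'rule'
    expect("OP", "==")
    value = expect("STRING")
    expect("OP", ")")
    return {"name": name, "class": cls, "property": prop, "value": value}


print(parse_rule(tokenize('rule "my rule" when SomeClass( rule == "foo" )')))
```

The same approach would cover non-reserved SQL keywords: the lexer produces IDENTIFIER, and only the parser positions that expect a keyword compare the text (case-insensitively, if needed).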
Hope it helps.