SPARQL QuadPattern - sparql

In SPARQL a QuadPattern is defined as
QuadPattern ::= '{' Quads '}'
Quads ::= TriplesTemplate? ( QuadsNotTriples '.'? TriplesTemplate? )*
From this I understand that a QuadPattern can be empty. But I can not understand the reason. Whats the purpose of an empty QuadPattern?

As #Antoine Zimmermann points out just because the syntax allows it doesn't mean it is meaningful.
In this case I believe it was done to keep the grammar within a certain constraint and to simplify it. If you don't allows Quads to be empty then you'd have to redefine the QuadPattern rule as so:
QuadPattern ::= '{' '}' | '{' Quads '}'
Which just adds unnecessary complication particularly when you are using a parser generator

With an empty quad pattern, you can, for instance, delete the default graph completely:
DELETE WHERE { }
But the fact that something is allowed by the syntax does not necessarily mean that there was a deliberate choice to allow a specific pattern. It may be, in some cases, that it is more convenient to define things in a more generic way.

Related

Is QML Grammar LALR(1)?

Here is a QML grammar (extracted from https://github.com/kropp/intellij-qml/blob/master/grammars/qml.bnf):
/* identifier, value, integer and float are terminals */
qml ::= object /* Simplified */
object ::= type body
body ::= '{' (property_definition|signal_definition|attribute_assignment|method_attribute)* '}'
type ::= 'double'|'real'|identifier
attribute_assignment ::= (attribute ':')? attribute_value ';'?
item ::= list|object|string|boolean|number|identifier|value
attribute_value ::= method_call|method_body|item|value+
property_definition ::= 'default'? 'readonly'? 'property' ('alias'|'var'|type) property (':' attribute_value)?
signal_definition ::= 'signal' signal ('(' (signal_parameter ',')* signal_parameter? ')')?
signal_parameter ::= ('var'|type) parameter
method_attribute ::= 'function' method '(' (parameter ',')* parameter? ')' method_body
method_call ::= method '(' (argument ',')* argument? ')'
method_body ::= '{' javascript '}'
javascript ::= ('{' javascript '}'|'var'|'['|']'|'('|')'|','|':'|';'|string|identifier|number|value)*
list ::= '[' item? (',' item)* ']'
property ::= identifier
attribute ::= identifier
signal ::= identifier
parameter ::= identifier
method ::= identifier
argument ::= string|boolean|number|identifier|value
number ::= integer|float
boolean ::= 'true'|'false'
Is it LALR(1)? My program raises a reduce/reduce conflict for the closure I[n] which contains the conflicting items:
// other items here...
[item ::= identifier . , {] // -> ACTION[n, {] = reduce to item
[type ::= identifier . , {] // -> ACTION[n, {] = reduce to type
// other items here...
Note:
The following answer was written on the basis of the information provided in the question. As it happens, the actual implementation of QML only accepts user declarations for types whose names start with an upper case letter, while names of properties must start with a lower case letter. (Many built-in types have names which start with lower case letters, too. So it's not quite as simple as just dividing identifiers into two categories in the lexical scan based on their first letter. Built-in types and keywords still need to be recognised as such.)
Unfortunately, I haven't been able to find a definitive QML grammar, or even a formal description of the syntax. The comments above were based on Qt's QML Reference.
Thanks to #mishmashru for bringing the above to my attention.
The grammar is ambiguous so the parser generator correctly identifies a reduce/reduce conflict.
In particular, consider the following simplified productions extracted from the grammar, where most alternatives have been removed to focus on the conflict:
body ::= '{' attribute_assignment* '}'
attribute_assignment ::= attribute_value
attribute_value ::= method_body | item
method_body ::= '{' javascript '}'
item ::= object | identifier
object ::= type body
type ::= identifier
Now, consider the body which starts
{ x {
We'll suppose that the parser has just seen x and is now looking at the second {, to figure out what action(s) to take.
If x is an ordinary identifier (whatever "ordinary" might mean, then it can resolve to item, which is an alternative for attribute_value. Then the second { presumably starts a method_body, which is also an alternative for attribute_value.
If, on the other hand, x is a type, then we're looking at an object, which starts type body. And in that case the second { is the start of the interior body.
So the parser needs to decide whether to make x into an attribute_value directly, or to make it into a type. The decision cannot be made at this point, because the { lookahead token doesn't provide enough information.
So it's clear that the grammar is not LR(1).
Without knowing anything more about the problem domain, it's hard to give good advice. If it is possible to distinguish identifier and type, perhaps by consulting a symbol table, then you could solve this problem by using some kind of lexical feedback.

ANTLR4 : clean grammar and tree with keywords (aliases ?)

I am looking for a solution to a simple problem.
The example :
SELECT date, date(date)
FROM date;
This is a rather stupid example where a table, its column, and a function all have the name "date".
The snippet of my grammar (very simplified) :
simple_select
: SELECT selected_element (',' selected_element) FROM from_element ';'
;
selected_element
: function
| REGULAR_WORD
;
function
: REGULAR_WORD '(' function_argument ')'
;
function_argument
: REGULAR_WORD
;
from_element
: REGULAR_WORD
;
DATE: D A T E;
FROM: F R O M;
SELECT: S E L E C T;
REGULAR_WORD
: (SIMPLE_LETTER) (SIMPLE_LETTER | '0'..'9')*
;
fragment SIMPLE_LETTER
: 'a'..'z'
| 'A'..'Z'
;
DATE is a keyword (it is used somewhere else in the grammar).
If I want it to be recognised by my grammar as a normal word, here are my solutions :
1) I add it everywhere I used REGULAR_WORD, next to it.
Example :
selected_element
: function
| REGULAR_WORD
| DATE
;
=> I don't want this solution. I don't have only "DATE" as a keyword, and I have many rules using REGULAR_WORD, so I would need to add a list of many (50+) keywords like DATE to many (20+) parser rules : it would be absolutely ugly.
PROS: make a clean tree
CONS: make a dirty grammar
2) I use a parser rule in between to get all those keywords, and then, I replace every occurrence of REGULAR_WORD by that parser rule.
Example :
word
: REGULAR_WORD
| DATE
;
selected_element
: function
| word
;
=> I do not want this solution either, as it adds one more parser rule in the tree and polluting the informations (I do not want to know that "date" is a word, I want to know that it's a selected_element, a function, a function_argument or a from_element ...
PROS: make a clean grammar
CONS: make a dirty tree
Either way, I have a dirty tree or a dirty grammar. Isn't there a way to have both clean ?
I looked for aliases, parser fragment equivalent, but it doesn't seem like ANTLR4 has any ?
Thank you, have a nice day !
There are four different grammars for SQL dialects in the Antlr4 grammar repository and all four of them use your second strategy. So it seems like there is a consensus among Antlr4 sql grammar writers. I don't believe there is a better solution given the design of the Antlr4 lexer.
As you say, that leads to a bit of noise in the full parse tree, but the relevant non-terminal (function, selected_element, etc.) is certainly present and it does not seem to me to be very difficult to collapse the unit productions out of the parse tree.
As I understand it, when Antlr4 was being designed, a decision was made to only automatically produce full parse trees, because the design of condensed ("abstract") syntax trees is too idiosyncratic to fit into a grammar DSL. So if you find an AST more convenient, you have the responsibility to generate one yourself. That's generally straight-forward although it involves a lot of boilerplate.
Other parser generators do have mechanisms which can handle "semireserved keywords". In particular, the Lemon parser generator, which is part of the Sqlite project, includes a %fallback declaration which allows you to specify that one or more tokens should be automatically reclassified in a context in which no grammar rule allows them to be used. Unfortunately, Lemon does not generate Java parsers.
Another similar option would be to use a parser generator which supports "scannerless" parsing. Such parsers typically use algorithms like Earley/GLL/GLR, capable of parsing arbitrary CFGs, to get around the need for more lookahead than can conveniently be supported in fixed-lookahead algorithms such as LALR(1).
This is the socalled keywords-as-identifiers problem and has been discussed many times before. For instance I asked a similar question already 6 years ago in the ANTLR mailing list. But also here at Stackoverflow there are questions touching this area, for instance Trying to use keywords as identifiers in ANTLR4; not working.
Terence Parr wrote a wiki article for ANTLR3 in 2008 that shortly describes 2 possible solutions:
This grammar allows "if if call call;" and "call if;".
grammar Pred;
prog: stat+ ;
stat: keyIF expr stat
| keyCALL ID ';'
| ';'
;
expr: ID
;
keyIF : {input.LT(1).getText().equals("if")}? ID ;
keyCALL : {input.LT(1).getText().equals("call")}? ID ;
ID : 'a'..'z'+ ;
WS : (' '|'\n')+ {$channel=HIDDEN;} ;
You can make those semantic predicates more efficient by intern'ing those strings so that you can do integer comparisons instead of string compares.
The other alternative is to do something like this
identifier : KEY1 | KEY2 | ... | ID ;
which is a set comparison and should be faster.
Normally, as #rici already mentioned, people prefer the solution where you keep all keywords in an own rule and add that to your normal identifier rule (where such a keyword is allowed).
The other solution in the wiki can be generalized for any keyword, by using a lookup table/list in an action in the ID lexer rule, which is used to check if a given string is a keyword. This solution is not only slower, but also sacrifies clarity in your parser grammar, since you can no longer use keyword tokens in your parser rules.

How to solve ambiguity in with keywords as identifiers in grammar kit

I've been trying to write the graphql language grammar for grammarkit and I've found myself really stuck on an ambiguity issue for quite some time now. Keywords in graphql (such as: type, implements, scalar ) can also be names of types or fields. I.E.
type type implements type {}
At first I defined these keywords as tokens in the bnf but that'd mean the case above is invalid. But if I write these keywords directly as I'm describing the rule, It results in an ambiguity in the grammar.
An example of an issue I'm seeing based on this grammar below is if you define something like this
directive #foo on Baz | Bar
scalar Foobar #cool
the PSI viewer is telling me that in the position of #cool it's expecting a DirectiveAddtlLocation, which is a rule I don't even reference in the scalar rule. Is anyone familiar with grammarkit and have encountered something like this? I'd really appreciate some insight. Thank You.
Here's an excerpt of grammar for the error example I mentioned above.
{
tokens=[
LEFT_PAREN='('
RIGHT_PAREN=')'
PIPE='|'
AT='#'
IDENTIFIER="regexp:[_A-Za-z][_0-9A-Za-z]*"
WHITE_SPACE = 'regexp:\s+'
]
}
Document ::= Definition*
Definition ::= DirectiveTypeDef | ScalarTypeDef
NamedTypeDef ::= IDENTIFIER
// I.E. #foo #bar(a: 10) #baz
DirectivesDeclSet ::= DirectiveDecl+
DirectiveDecl ::= AT TypeName
// I.E. directive #example on FIELD_DEFINITION | ARGUMENT_DEFINITION
DirectiveTypeDef ::= 'directive' AT NamedTypeDef DirectiveLocationsConditionDef
DirectiveLocationsConditionDef ::= 'on' DirectiveLocation DirectiveAddtlLocation*
DirectiveLocation ::= IDENTIFIER
DirectiveAddtlLocation ::= PIPE? DirectiveLocation
TypeName ::= IDENTIFIER
// I.E. scalar DateTime #foo
ScalarTypeDef ::= 'scalar' NamedTypeDef DirectivesDeclSet?
Once your grammar sees directive #TOKEN on IDENTIFIER, it consumes a sequence of DirectiveAddtlLocation. Each of those consists of an optional PIPE followed by an IDENTIFIER. As you note in your question, the GraphQL "keywords" are really just special cases of identifiers. So what's probably happening here is that, since you allow any token as an identifier, scalar and Foobar are both being consumed as DirectiveAddtlLocation and it's never actually getting to see a ScalarTypeDef.
# Parses the same as:
directive #foo on Bar | Baz | scalar | Foobar
#cool # <-- ?????
You can get around this by listing out the explicit set of allowed directive locations in your grammar. (You might even be able to get pretty far by just copying the grammar in Appendix B of the GraphQL spec and changing its syntax.)
DirectiveLocation ::= ExecutableDirectiveLocation | TypeSystemDirectiveLocation
ExecutableDirectiveLocation ::= 'QUERY' | 'MUTATION' | ...
TypeSystemDirectiveLocation ::= 'SCHEMA' | 'SCALAR' | ...
Now when you go to parse:
directive #foo on QUERY | MUTATION
# "scalar" is not a directive location, so the DirectiveTypeDef must end
scalar Foobar #cool
(For all that the "identifier" vs. "keyword" distinction is a little weird, I'm pretty sure the GraphQL grammar isn't actually ambiguous; in every context where a free-form identifier is allowed, there's punctuation before a "keyword" could appear again, and in cases like this one there's unambiguous lists of not-quite-keywords that don't overlap.)

Antlr: ignore keywords in specific context

I'm constructing an English-like domain specific language with ANTLR. Its keywords are context-sensitive. (I know it sounds dirty, but it makes a lot of sense for the non-programmer target users.) For example, the usual logical operators such as or and not are to be treated as identifiers when surrounded in brackets, [like this and this]. My current approach looks like this:
bracketedStatement
: '[' bracketedWord+ ']'
;
bracketedWord
: (~(']')+
;
This, when combined with lexical definitions such as the following:
AND: 'and' ;
OR: 'or' ;
Produces the warning"Decision can match input such as "{AND..PROCESS, RPAREN..'with'}" using multiple alternatives: 1, 2". I'm clearly creating ambiguity for ANTLR, but I don't know how to resolve it. How do I fix this?
For anyone who finds this, check out this stack overflow question. It clarifies how to use the negation symbol correctly.

Stripping actions from ANTLR grammar changes its parsing algorithm

I have a grammar Foo.xtext (too complex to include it here). Xtext generates InternalFoo.g from it. After some tweaking it also generates DebugInternalFoo.g which claims to be the same thing without actions. Now, I strip off actions with ANTLR directly
java -cp antlr-3.4.jar org.antlr.tool.Strip Internal.g > Stripped.g
I'd expect the three grammars to behave the same way when I check them. But here is what I experienced
InternalFoo.g - error, rule assignment has non-LL(*) decision
DebugInternalFoo.g - no problem, parses fine
Stripped.g - warnings at rule assignment, decision can match using multiple alternatives. It fails to parse properly.
Is it possible that a grammar parses a text differently with or without actions? Or is it a bug in any of the action-remover tools? (The rule in question has syntactic predicates, and without them, it would really have a non-LL(*) decision.)
UPDATE:
I partly found what caused the problem. The rule in question was like this
trickyRule:
({ some complex action})
(expression '=')=>...
Stripping with Antlr removed the action, but left an empty group there:
// Stripped.g
trickyRule:
() (expression '=')=>...
The generation of the debug grammar removes both the action, and the now empty group around it:
// DebugInternalFoo.g
trickyRule:
(expression '=')=>...
So the lesson learned is: an empty group before a syntactic predicate is not the same as nothing at all.
Is it possible that a grammar parses a text differently with or without actions?
Yes, that is possible. org.antlr.tool.Strip leaves syntactic predicates1, but removes validating2- and gated3 semantic predicates (and member sections that these semantic predicates might use).
For example, the following rules would only match an A_TOKEN:
parser_rule1
: (parser_rule2)=> parser_rule2
;
parser_rule2
: {input.LT(1).getType() == A_TOKEN}? .
;
but if you use the Strip tool on it, it leaves the following:
parser_rule1
: (parser_rule2)=> parser_rule2
;
parser_rule2
: /*{input.LT(1).getType() == A_TOKEN}?*/ .
;
making it match any token.
In other words, Strip could change the behavior of the generated lexer or parser.
1 syntactic predicate: ( ... )=>
2 validating semantic predicate { ... }?
3 gated semantic predicate { ... }?=>