What kind of SPARQL patterns result in multisets (or sequences) of solution mappings - sparql

I am doing some SPARQL proofs where I use multisets to establish the proofs (instead of patterns). When applying the results in actual SPARQL queries, I need to make sure that every pattern used produces a multiset so that my proofs still hold.
EXAMPLE
Let JOIN be the SPARQL JOIN (aka AND) operator, and eval be the evaluation function.
Lemma A: Assume two multisets of solution mappings; M1 and M2: I made the proof that JOIN(M1, M2) = JOIN(M2, M1).
In order to apply this in an actual query (in algebraic form): JOIN(P1, P2) ≡ JOIN(P2, P1) which is correct if eval((JOIN(P1, P2)) = eval(JOIN(P2, P1)) which can be written as JOIN(eval(P1), eval(P2() ≡ JOIN(eval(P2), eval(P1)). In order to use Lemma A, it needs to hold that eval(P1) and eval(P2) yield multisets of solution mappings. However, not all SPARQL patterns yield multisets of solution mappings (I am not talking about sequences here as a sequence can be converted into a multiset). For example GROUP BY yields a set of mappings from solution mappings to multisets of solution mappings.
QUESTION
Which constraints on the structures of P1 and P2 I need to impose in order to ensure that, e.g., my Lemma A, is applicable?

Related

Arity of BETWEEN expression

What is the arity of the sql BETWEEN expression? I thought it was three (ternary) since the expression usually looks like:
WHERE...
1 BETWEEN 2 AND 3
But it's listed as binary on BigQuery's documentation, and I assume other places as well.
Source: Operators.
What is the arity of the BETWEEN expression and why? I think the answer is 3 from the following example:
select
~ (SELECT -1 AS expr_1) AS 'bitwise_arity_1',
(SELECT 1 AS expr_1) * (SELECT 2 AS expr_2) AS 'times_arity_2',
(SELECT 1 AS expr_1) BETWEEN
(SELECT 2 AS expr_2) AND (SELECT 3 AS expr_3) AS 'bitwise_arity_3?'
I suppose one way to interpret it might just be that the grammar is:
expr 'BETWEEN' logicalAndExpr
And so the two expressions in the logicalAnd are just grouped into one. Is that a correct understanding?
SQLFiddle: http://www.sqlfiddle.com/#!9/b28da2/2156
It's binary, in syntactic terms. See below for a discussion of syntax vs. semantics, where I note that a better syntactic term is "infix".
Similarly, function calls and array subscripting are postfix unary operators and the C family's conditional operator (often misnamed "the ternary operator" as though it were the only such thing) is also infix. The reason is that the interior operands (the operands between BETWEEN...AND, (...), [...], and ?...:, respectively) are fenced off from the rest of the syntax by the pair of surrounding terminal tokens which function as a syntactic barrier, like parentheses. Precedence does not penetrate to the enclosed operands; only the outer operand(s) remain floating in the syntax.
The semantic view is quite different, of course. BETWEEN...AND and ?...: are certainly three-argument functions, although since the latter is short-circuiting, only two of the three arguments are ever evaluated, which makes it hard to discuss in strict mathematical terms [Note 1]. Moreover, the semantic view is complicated by the fact that there is not just a single way to look at what an argument is. As noted in a comment, you can always curry functions into a series of unary applications of higher-order functions. Although you might be tempted to try to redefine "arity" as the length of that sequence, you will soon find higher-order functions which have different sequence lengths depending on the values of their arguments. Also, in most programming languages (unlike SQL) the function being called is a full expression which does not need to be evaluated at compile-time, and since different functions have different argument counts, there is no good way to describe the arity of a function call unless you respecify the call to be the application of a list-of-arguments object to a callable object. That's often done, but it's a bit unsatisfying because (in most languages), the list object does not really exist and cannot be observed as an object.
I'd suggest taking the Wikipedia article on arity with a good-sized saline dosage, because it completely misses the distinction between semantics and syntactic structure, giving rise to the confusing ambiguity between the semantic and syntactic view of SQL's range operator or C's conditional operator. Personally, I prefer to reserve "arity" for the semantic meaning, using "fixity" or "valence" for the syntactic feature. (The advantage of "fixity" is that it encourages the distinction between prefix and postfix, which is a real distinction hidden by calling both cases "unary operators".)
Notes
BETWEEN...AND could short-circuit, too, but standard SQL doesn't guarantee short-circuiting, as far as I know (although some SQL implementations do.)

How to design a context free grammar that avoids repetition?

I am learning about context-free grammar and I would like to know how (if at all) it is possible to design a language that avoids repetition.
Let's take the select statement from SQL as an example:
possible:
SELECT * FROM table
SELECT * FROM table WHERE x > 5
SELECT * FROM table WHERE x > 5 ORDER desc
SELECT * FROM table WHERE x > 5 ORDER desc LIMIT 5
impossible (multiple conflicting statements):
SELECT * FROM table WHERE X > 5 WHERE X > 5
Grammar could look something like this:
S -> SW | SO | SL | "SELECT statement"
W -> "WHERE statement"
O -> "ORDER statement"
L -> "Limit statement"
This grammar would allow for an impossible statement like the one mentioned above. How could I design a context-free grammar that avoids an impossible statement, while still being flexible?
Flexible:
The order of W, O, L does not matter. It also does not matter how many of these sub-statements are present. I would like to avoid a grammar that just lists all possible combinations since this would get quite messy if there are many possibilities.
In a context-free grammar, the set of sentences generated by a non-terminal is the same for every use of the non-terminal. That's what context-free means. A particular non-terminal, S, cannot sometimes allow a match and other times disallow it. So every set of possible matches must have its own non-terminal, and in the case of restricting a list of k cases to sentences without repeated cases, a minimum of 2k different non-terminals would be required, one for every subset of the k cases.
Worse, if the repetition you're trying to restrict has an unlimited number of possibilities (for example, you want to allow more than one W clause but not allow two identical Ws), then it cannot be done with a context-free grammar at all. The same is true if you want to insist on such repetition, which is basically what you would need to do to make a context-free grammar insist that variables be declared before use.
However, it is easy to do the check in a semantic action, for example by keeping a bit vector of clauses you have encountered (or a hash-set if it is not easy to enumerate the possible clauses). Then the semantic action for adding a clause to the statement only needs to check whether that particular clause has already been added, and flag an error if it has. That will also allow for better error messages since you can easily describe the problem when you detect it, as opposed to just st reporting a "syntax" error and leaving the user to guess what the problem was.
I am not sure I am understanding your problem based on the grammar. Perhaps you mean for statement and S to be the same symbol. If that's the case, I would argue that your grammar is simply not right for the language you intend to describe. If we ignore ORDER and LIMIT then your grammar is
S -> SW | "SELECT S" | foo
W -> "WHERE S"
Then yes, you can derive nonsense like
S -> SW -> SWW -> SWWW -> "SELECT foo WHERE foo WHERE foo WHERE foo"
But this is just your first attempt at a grammar, this does not prove there is no grammar that works. Consider this:
<S> -> <A><B>
<A> -> SELECT <C>
<B> -> epsilon | WHERE <D>
<C> -> (rules for select lists)
<D> -> (rules for WHERE condition)
The rules for <C> and <D> can refer back to S and A and B, properly, perhaps using parentheses, as required to produce strings that work for you. No longer can you get the bad strings.
This is not really a problem that CFGs cannot overcome by themselves. To do things like enforce that only declared variables can be used, yes, context-sensitive or better machinery is needed, but we are just talking about repeating keywords and phrases. This is well within the bounds of what CFGs can do. Now, if you want to support aliases and enforce correct alias referencing in the query, that is impossible in context-free languages. But that's not what we're discussing here. The reason it's impossible is that the language L = {ww | w in E*} is not a context-free language, and that's essentially what is involved in enforcing variable names or table aliases.

What is the significance of type signatures that are similar to definition?

I realized that if I were to create a swap function in Idris, its type signature is almost exactly the same as its definition
swap : (a, b) -> (b, a)
swap (x, y) = (y, x)
Are there any implications or significance to this with regards to dependent typing?
For example, can I define some type/value agnostic swap and use it for both definitions here? Is there a special name for this parallelism?
I don't think this has any significance to dependent typing. Your function is already fully polymorphic (i.e. agnostic). I am unaware of a term for type-level between symmetric with term level syntax.

move subtree from one part of AST to another

I am working on a tool to convert Oracle SQL to ANSI SQL. I have a grammar that will parse both Oracle SQL and ANSI SQL.
I want to extract the Oracle outer join expressions from the where clause part of the AST and insert new join clauses at the end of the from clause part of the AST for the matching select or subquery.
Can a tree parser with rewrite rules do this type of tree transformation?
i.e. take an AST generated from Oracle SQL
SELECT
a.columna, b.columnb
FROM
tablea a,
tableb b
WHERE
a.columna2 (+) = b.columnb2 (+)
AND
a.columna3 = 'foo'
AND
b.columnb3 = 'bar'
and transform it to an AST for ANSI SQL
SELECT
a.columna, b.columnb
FROM
tablea a FULL OUTER JOIN tableb b ON (a.columna2 = b.columnb2 )
WHERE
a.columna3 = 'foo'
and
b.columnb3 = 'bar'
NOTE1: the table references for tablea and tableb are deleted from the FROM clause and replaced with a JOIN clause referring to the same tables and table aliases.
NOTE2: the Oracle join condition is identified as a FULL OUTER JOIN by the presence of the OuterJoinIndicator (+) on both sides of the sql_condition comparison.
NOTE3: the join condition comparison is deleted from the WHERE clause and used to construct the join clause ON condition [with the OuterJoinIndicator(s) removed].
Yes, this is quite possible, especially since you have a grammar that recognizes both Oracle and ANSI SQL. I once wrote a translator from AREV BASIC to Visual BASIC and did many similar transformations.
In my project I used ANTLR 2 and wrote a master tree grammar which did nothing but completely walk the tree according to all rules in my grammars. I then used ANTLR 2's subclassing to override specific rules to do the transformations. I liked this as it let me build up the translation in passes and keep all my expression handling in one pass, control structures in another pass, etc.
ANTLR 3 does not provide grammar subclassing, so you won't be able to use that approach. You will need a complete tree grammar to print out your resulting tree. Personally, I would write that tree grammar first and get it working properly. Then I would copy that grammar and strip all the actions out but put in the option to rewrite the AST. Then modify the rules you need for your transformation. If you do many transformations you may want to use multiple passes, one tree grammar for each pass. You may have a pass or two that does analysis to help drive the later passes. On my BASIC translation project I did control flow analysis, data flow analysis and dead code removal as analysis passes.
If you want help writing the specific transformation you'll need to share your tree grammar. There are quite a few tree grammar idioms to wrap your head around. Terence's ANTLR 3 book would be a valuable purchase if you need help there. If you haven't written the tree grammar yet then post questions when you get stuck. Choosing the correct root nodes is important. If you want to get an idea of how to build trees and tree parsers, you can look at my C grammar. It is ANTLR 2, but the tree building concepts are the same. http://www.antlr3.org/grammar/cgram/grammars/
Do you need to retain comments and formatting? That adds another layer of complexity, for which I would recommend creating another question.
If you have two different grammars, you are likely to discover that the "small differences" in the grammars lead to quite considerably different ASTs for these clauses, and so your real problem is to convert the tree structure for one into the tree structure for another. And you'll have to do this piecewise for the whole tree because such differences are spread all across the grammars. YMMV.
ANTLR's tree parser will pretty likely let you recognize arbitrary fragments; these are certainly cues for generating equivalents in the other grammar's AST. But you'll have to write lots of such fragments, and the code corresponding routines to assemble the equivalent tree node-by-node. As a general rule for a large grammar (such as Oracle SQL), this can be quite a lot of work. You can do it this way.
An alternative is program transformation systems. These are tools that allow you to write surface syntax patterns (e.g., phrases in Oracle SQL and ANSI SQL) to code and apply your transformations directly. Writing transformations this way is considerably easier IMHO. You'd end up writing something like this:
source domain Oracle.
target domain ANSISQL.
rule xlate_Oracle_SELECT(c: columns, t1: table, t2: table,
c1: column, c2: column,
more_conditions: conditions):SQL_phrase
"SELECT \c FROM \t1, \t2 WHERE c1 (+) = c2 (+) and \more_conditions";
=>
"SELECT \c FROM \t1 FULL OUTER JOIN \t2 on ( c1 = c2 ) WHERE \more_conditions";
(The backslash-IDs are pattern variables which can match an arbitrary subtree of the the declared syntax type legal in that location.)
The reason this works is that the transformation tool parses the first pattern with the first grammar, and so gets a tree it can match on trees of the first grammar, and similarly parses the second pattern using the second grammar, getting a replacement tree that follows the rules of the second grammar. The transformation engine matches the tree for first pattern, and substitutes the tree for the second. So such a rule transforms a small set of blue tree nodes from the blue tree, to a small set of green nodes of the desired tree type. The color analogy should make it clear that you have to translate all the blue nodes into green ones if you want an accurate translation.
You'd need additional rules to translate the various subclauses just to paper over the differences in the grammar, e.g.,:
rule translate column(t: IDENTIFIER, c: IDENTIFIER, ):table->table
"\t.\c" -> " \toSQLidentifier\(\t\).\toSQLidentifier\(\c\)";
This would handles differences in how the two languages spell identifiers, by calling a custom function toSQLidentifier that does string hacking.
I don't think ANTLR supports these kind of transformation rules. You can simulate it all by lots of code.
You might avoid some of this if you have one "union" grammar for both languages (which is what you imply), but this usually gets you a highly ambiguous grammar and that's a huge amount of trouble. If you have succeeded in this, than you only have to apply translation rules where the languages differ (e.g., everything is a blue node).
You can also hack it: scan the tree left to right; prettyprint the parts that are equivalent (figuring this out is harder than it looks), where they differ, prettyprint a substitution. This is a very fragile way to do this.

Checking logical implication relationships between OWL expressions?

I have a simple question which I suspect has no simple answer. Essentially, I want to check whether it is true that one OWL expression (#B) follows on logically from another (#A) - in other words I want to ask: is it true that #A -> #B?
The reason for this is that I'm writing a matching algorithm for an application which matches structures in a knowledge based (represented by the #KnowledgeStructure class) to a structure which describes the needs of the current application state (#StateRequirement). Both structures have properties which have string values representing OWL expressions over the state of a third kind of structure (#Model). These are: #KnowledgeStructure.PostCondition which expresses how the knowledge structure being applied to #Model will transform #Model; and #StateRequirement.GoalCondition, which expresses the #Model state that the application aims to achieve. I want to see, therefore, if the #KnowledgeStructure will satisfy the #StateRequirement by checking that the #KnowledgeStructure.PostCondition produces the desired #StateRequiremment.GoalCondition. I could express this abstractly as: (#KnowledgeStructure.Postcondition => #StateRequirement.GoalCondition) => Match(#KnowledgeStructure, #StateRequirement). Less confusingly I could express this as: ((#A -> #B) -> Match(#A, #B)) where both #A and #B are valid OWL expressions.
In the general case I would like to be able to express the following rule: "If it is true that the expression #B follows from #A, then the expression Match(#A, #B) is also true".
Essentially, my question is this: how do I pose or realise such a rule in OWL? How do I test whether one expression follows from another expression? Also, are existing reasoners sufficiently powerful to determine the relation #A -> #B between two expressions if this relation is not explicitly stated?
I'm not 100% sure that I fully understood the question, but from what I grasped I would face the situation in this way.
First of all I refer to Java because all the libraries I know are meant for this language. Secondly, I don't think that OWL on its own is able to satisfy your goal, given that it can represent rules and axioms, but it does not provide reasoning, that is, you need a reasoner, so you need to build a program that uses it, plus doing additional processing that I will sketch below:
1) You didn't mention it, but I guess you have an underlying ontology w.r.t. you need to prove your consequence relation (what you denote with symbol "->"). If the ontology is not explicit, maybe it can be extracted/composed from the textual expressions you mentioned in the question.
2) you need to use a library for ontology manipulation, I suggest OWL API from Manchester University, it is very powerful and simple, in the tutorial under section "documentation" you have an overview of the main functionalities, including the use of reasoners (the example shows Hermit, but the principle holds for any other reasoner).
3) At this point you need to check if the ontology is consistent (otherwise anything can be derived, as it often happens with false premises)
4) You add the following axiom to the ontology (you build it directly in Java, no need to serialize back, you can let the reasoner work on the in-memory representation) and check for consistency: A \sqsubseteq B, that is, using the associated interpretation: A^I \subseteq B^I, so it is equivalent to A => B (they have the same truth table).
5) At this point you can add the axiom Match(A,B), where A and B are your class expressions and Match is a Role/Relation that relates all the class expressions for which the second is a consequence of the first.
6) After a number of repetitions of these steps you may want to serialize the result and store it, and this again can be achieved quite simply using OWL API from the in-memory representation.
For some basics about Description Logics (the logic underpinning OWL ontologies) you can refer to A description logic Primer (2012), Horrocks et al. and to Foundations of Description Logics (2011), Rudolph.
I'm not a logician or a DL expert, so please verify all the information I provided and feel free to correct me :)