SPARQL query works in Fuseki but not in Jena TDB

I have my data organised in multiple graphs. The graph in which a triple is saved matters. The data structure is complicated but it can be simplified like this:
My store contains cakes. There is a hierarchy of cake types, all of which are subclasses of <cake>:
<http://example.com/a1> a <http://example.com/applecake>
<http://example.com/a2> a <http://example.com/rainbowcake>
...
Depending on how a user creates them in the UI, cakes end up in different graphs. For instance, if the user "bakes" a cake, it goes into the <http://example.com/homemade> graph; if they "buy" one, it goes into the <http://example.com/shopbought> graph.
When I retrieve my cakes from the store, I want to know for each cake whether it is homemade or shopbought. There is no property for this; I want to derive the information purely from the graph in which the triple is stored.
I have tried various ways of achieving this, but none of them works in Jena TDB: all cakes come back as "shopbought". However, all of the queries work in Fuseki (on the exact same dataset), so I was wondering whether this is a TDB bug or whether there is another way. Here are the simplified queries (without variations):
Version 1:
SELECT DISTINCT *
FROM <http://example.com/homemade>
FROM <http://example.com/shopbought>
FROM NAMED <http://example.com/homemade>
FROM NAMED <http://example.com/shopbought>
WHERE {
    ?cake rdf:type ?caketype .
    ?caketype rdfs:subClassOf* <cake>
    {
        GRAPH <http://example.com/homemade> { ?cake rdf:type ?typeHomemade }
    } UNION {
        GRAPH <http://example.com/shopbought> { ?cake rdf:type ?typeShopbought }
    }
    BIND(str(if(bound(?typeHomemade), true, false)) AS ?homemade)
}
Version 2:
SELECT DISTINCT *
FROM <http://example.com/homemade>
FROM <http://example.com/shopbought>
FROM NAMED <http://example.com/homemade>
FROM NAMED <http://example.com/shopbought>
WHERE {
    ?cake rdf:type ?caketype .
    ?caketype rdfs:subClassOf* <cake>
    GRAPH ?g {
        ?cake rdf:type ?caketype .
    }
    BIND(STR(IF(?g=<http://example.com/homemade>, true, false)) AS ?homemade)
}
Any ideas why this works in Fuseki but not in TDB?
Edit:
I'm beginning to think it has something to do with the GRAPH keyword. Here are some much simpler queries (which work in Fuseki and tdbquery) and the results I get using the Jena API:
SELECT * WHERE { GRAPH <http://example.com/homemade> { ?s ?p ?o }}
0 results
SELECT * WHERE { GRAPH ?g { ?s ?p ?o }}
0 results
SELECT * FROM <http://example.com/homemade> WHERE { ?s ?p ?o }
x results
SELECT * FROM <http://example.com/homemade> WHERE { GRAPH <http://example.com/homemade> { ?s ?p ?o }}
0 results
SELECT * FROM NAMED <http://example.com/homemade> WHERE { GRAPH <http://example.com/homemade> { ?s ?p ?o }}
0 results

OK, so the problem actually had to do with the way I executed the query. My initial idea was to pre-filter the dataset so that a query is only executed on the relevant graphs (the dataset contains many graphs, and they can be quite large, which would make querying "everything" slow). This can be done either by naming the graphs in the SPARQL query or by selecting them directly in Jena (although the latter would not work for other triple stores). However, combining both approaches "to be on the safe side" does not work.
This query runs on the entire dataset and works as expected:
Query query = QueryFactory.create("SELECT * WHERE { GRAPH ?g { ?s ?p ?o } }", Syntax.syntaxARQ);
QueryExecution qexec = QueryExecutionFactory.create(query, dataset);
ResultSet result = qexec.execSelect();
When the same query is executed against a specific Model instead of the dataset, it returns no results, no matter which graph the Model represents:
//run only on one graph
Model target = dataset.getNamedModel("http://example.com/homemade");
//OR run on the union of all graphs
Model target = dataset.getNamedModel("urn:x-arq:UnionGraph");
//OR run on a union of specific graphs
Model target = ModelFactory.createUnion(dataset.getNamedModel("http://example.com/shopbought"), dataset.getNamedModel("http://example.com/homemade"), ...);
[...]
QueryExecution qexec = QueryExecutionFactory.create(query, target);
[...]
My workaround is to always execute queries against the entire dataset (which supports the SPARQL GRAPH keyword fine) and to specify in each query the graphs on which it should run, so that it does not have to touch everything.
I am not sure whether this is expected behaviour for the Jena API, although it would be consistent with a Model being a single graph that has no named graphs for GRAPH to match.
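A minimal sketch of that workaround as a single query, meant to be run against the whole dataset: the graphs are restricted inside the query, and the homemade flag is derived from the graph name. The variable names are illustrative, and the subclass check from the question is left out on the assumption that it can be added back as needed.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?cake ?caketype ?homemade
WHERE {
    # Only look at the two graphs of interest (assumed graph names from the question).
    VALUES ?g { <http://example.com/homemade> <http://example.com/shopbought> }
    GRAPH ?g { ?cake rdf:type ?caketype }
    # true when the triple was found in the homemade graph, false otherwise
    BIND(?g = <http://example.com/homemade> AS ?homemade)
}
```

Because the query carries its own graph restriction, it does not depend on how the store or API pre-selects graphs.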

Related

Can't access named graph via a Dataset-bound QueryExecution

This is how I create a dataset and add three named models to it:
Dataset dataset = DatasetFactory.create();
dataset.addNamedModel("NG1", model1);
dataset.addNamedModel("NG2", model2);
dataset.addNamedModel("NG3", model3);
Query q = QueryFactory.create(select_sparql);
QueryExecution qq = QueryExecutionFactory.create(q,dataset);
qq.execSelect();
SPARQL query (#1):
SELECT DISTINCT ?p WHERE { GRAPH <NG1> { [] ?p [] . } }
This returns an empty ResultSet with no QuerySolutions. Why?
However, when I run the following SPARQL query (#2):
SELECT DISTINCT ?g ?p WHERE { GRAPH ?g { [] ?p [] . } }
It works just fine, I get pairs like
<NG1> | foaf:name
<NG2> | example:example
as expected. The named graphs are checked and found here, but not in (#1) when I specify it with an IRI.

Difference in performance between using VALUES keyword and using directly the URI in the query?

I have a fairly complex SPARQL query with the structure outlined below, involving multiple graph patterns, UNION and nested FILTER NOT EXISTS.
I want the query to remain generic, and I want to be able to inject values for certain variables at execution time. My idea is to append a VALUES clause at the end of the query to specify the values of those variables. In the structure below, I set the value of ?x, and I show all the places in the query where ?x appears.
However, in Fuseki, executing the query like that takes around 4 to 5 seconds, whereas manually replacing the ?x variable in the query with a URI, instead of specifying a VALUES clause, makes it run very fast.
I always thought that using the VALUES keyword at the end of the WHERE clause was like setting values inline for some variables, so I would expect a VALUES clause and replacing the variables with their corresponding URIs to be equivalent in terms of query execution. Can someone confirm the expected behavior of the VALUES keyword, and also explain the difference between using it outside and inside the WHERE clause?
Does the fact that the variable set using VALUES appears in FILTER NOT EXISTS clause change something?
Can you confirm this is the correct approach for the requirement above (I want the query to remain generic and I want to be able to inject values for certain variables at execution time)?
Is it possible that this behavior is specific to how Fuseki handles VALUES?
Thanks !
SELECT DISTINCT ...
WHERE {
    # ?x ...
    # ... basic graph pattern here
    {
        {
            # ... basic graph pattern here
            FILTER NOT EXISTS {
                # ?x ...
                # ... basic graph pattern here
            }
            FILTER NOT EXISTS {
                # ... basic graph pattern here
                FILTER NOT EXISTS {
                    # ?x ...
                    # ... basic graph pattern here
                }
            }
        }
        UNION
        {
            ?x ...
            # ... basic graph pattern here
        }
        UNION
        {
            # ... basic graph pattern here
            FILTER NOT EXISTS {
                ?x ...
                # ... basic graph pattern here
            }
            FILTER NOT EXISTS {
                # ... basic graph pattern here
                FILTER NOT EXISTS {
                    ?x ...
                    # ... basic graph pattern here
                }
            }
        }
        UNION
        {
            ?x ...
        }
    }
}
VALUES ?x { <http://example.com/Foo> }
This is not meant to be an answer, but formatting in comments is impossible...
There is at least one obvious difference in the algebra tree. How this is handled is probably implementation-specific. Andy knows better and will hopefully give a more useful answer than mine.
without VALUES:
Query
SELECT ?s ?o
WHERE
{ { <test_val> <p> ?o }
UNION
{ <test_val> <p> ?o
FILTER NOT EXISTS { <test_val> a ?type }
}
}
Algebra tree (optimized)
(base <http://example/base/>
  (project (?s ?o)
    (union
      (bgp (triple <test_val> <p> ?o))
      (filter (notexists (bgp (triple <test_val> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
        (bgp (triple <test_val> <p> ?o))))))
with VALUES
Query
SELECT ?s ?o
WHERE
{ { ?s <p> ?o }
UNION
{ ?s <p> ?o
FILTER NOT EXISTS { ?s a ?type }
}
}
VALUES ?s { <test_val> }
Algebra tree
(base <http://example/base/>
  (project (?s ?o)
    (join
      (union
        (bgp (triple ?s <p> ?o))
        (filter (notexists (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
          (bgp (triple ?s <p> ?o))))
      (table (vars ?s)
        (row [?s <test_val>])
      ))))
Algebra tree (optimized)
(base <http://example/base/>
  (project (?s ?o)
    (sequence
      (table (vars ?s)
        (row [?s <test_val>])
      )
      (union
        (bgp (triple ?s <p> ?o))
        (filter (notexists (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)))
          (bgp (triple ?s <p> ?o)))))))
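For comparison, the same example can be written with VALUES inside the WHERE clause, which is the other placement the question asks about. This is only a sketch: in this position the VALUES block is joined with the other elements of the group rather than with the result of the whole query, and whether that changes the execution plan or the timing is likely to depend on the implementation.

```sparql
SELECT ?s ?o
WHERE {
    # Inline data placed inside the group instead of after the WHERE clause
    VALUES ?s { <test_val> }
    { ?s <p> ?o }
    UNION
    { ?s <p> ?o
      FILTER NOT EXISTS { ?s a ?type }
    }
}
```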

insert from two named graphs?

I'm looking for an easy way to insert triples from two or more named graphs (but not the entire unnamed default graph) into another named graph. I'm using GraphDB.
I guess this could be done by writing out the same query multiple times in the WHERE section, wrapped in multiple GRAPH specifications, and then unioning them together, but my WHEREs are long and I'd prefer not to write them out multiple times.
Let's say I have loaded some data like this:
INSERT DATA {
GRAPH <http://example.com/ngA> {
<http://example.com/person1> <http://example.com/name> "Arthur" .
}
GRAPH <http://example.com/ngB> {
<http://example.com/person1> <http://example.com/name> "Brian" .
}
GRAPH <http://example.com/ngC> {
<http://example.com/person1> <http://example.com/name> "Charlie" .
}
}
I can copy all of the triples of a certain pattern from the default unnamed graph into a new named graph with something like this:
INSERT {
GRAPH <http://example.com/ngZ> {
?s <http://example.com/moniker> ?o .
}
}
WHERE
{ ?s <http://example.com/name> ?o }
An easy way to SELECT for triples of a given pattern from two or more (but not all) named graphs is
SELECT *
FROM <http://example.com/ngA>
FROM <http://example.com/ngB>
WHERE
{ ?s <http://example.com/name> ?o }
What if I want to copy those triples, from those specified graphs, into another graph?
I'm getting an error from GraphDB 8.3 (and from the sparql.org validator) when I try to
INSERT {
GRAPH <http://example.com/ngZ> {
?s <http://example.com/moniker> ?o .
}
}
WHERE
{ SELECT *
FROM <http://example.com/ngA>
FROM <http://example.com/ngB>
WHERE
{ ?s <http://example.com/name> ?o } }
Try this query:
PREFIX ex: <http://example.com/>
INSERT {
GRAPH ex:ngZ { ?s ex:moniker ?o }
}
WHERE {
GRAPH ?g { ?s ex:name ?o }
FILTER (?g IN ( ex:ngA, ex:ngB ) )
}
And then:
PREFIX ex: <http://example.com/>
SELECT *
FROM NAMED ex:ngZ
WHERE {
GRAPH ?g { ?s ?p ?o }
} LIMIT 100
Is this what you need?
By the way, there are also the COPY (use with caution!) and ADD operations.
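A short sketch of the ADD variant, using the graph names from the question. ADD inserts every triple of the source graph into the target unchanged, so unlike the INSERT above it does not select a pattern or rename ex:name to ex:moniker. COPY has the same form, but it first removes whatever the target graph already contains, hence the caution.

```sparql
# Add all triples from ngA and ngB to ngZ; the source graphs are left untouched.
ADD <http://example.com/ngA> TO <http://example.com/ngZ> ;
ADD <http://example.com/ngB> TO <http://example.com/ngZ>
```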
SPARQL Update provides USING and USING NAMED analogous to FROM and FROM NAMED in queries:
The USING and USING NAMED clauses affect the RDF Dataset used while evaluating the WHERE clause. This describes a dataset in the same way as FROM and FROM NAMED clauses
You can express the requirement as an UPDATE like so:
INSERT {
GRAPH <http://example.com/ngZ> {
?s <http://example.com/moniker> ?o .
}
}
USING <http://example.com/ngA>
USING <http://example.com/ngB>
WHERE
{ ?s <http://example.com/name> ?o }
Also note that, according to the SPARQL query grammar, a subquery does not admit a dataset clause. This is why the SPARQL parsers are rejecting your query.
Thanks, @Stanislav Kralin
Come to think of it, this also works:
PREFIX ex: <http://example.com/>
INSERT {
    GRAPH ex:ngZ {
        ?s ex:moniker ?o
    }
}
WHERE {
    VALUES ?g {
        ex:ngA ex:ngB
    }
    GRAPH ?g {
        ?s ex:name ?o
    }
}

Is it possible to Filter Graphs in a way that they at most contain requested Data?

Let me start with an example query to explain my problem:
SELECT ?g ?s ?p ?o WHERE
{
    {
        GRAPH ?g {
            ?s ?p ?o .
            OPTIONAL { ?s ab:temperature ?temperature . }
            FILTER (?temperature = 20)
            FILTER NOT EXISTS { ?s ab:person ?person }
        }
    }
}
This query gives me all graphs (in this case representing context data) that have a temperature of 20 but no associated person. My problem is that I want to query the graphs for certain optional properties, but they should not have any other properties. At the time of the query I only know the OPTIONAL part; I do not know which additional properties might be there. Is there an easy way to do this with SPARQL, or would it be easier to check after I have received the graph and converted it into an object that I can handle in my program?
If I understand your question correctly, you are searching for graphs whose subjects have only certain properties and no others. In that case I'd run something like this:
SELECT ?g ?s ?p ?o WHERE {
    GRAPH ?g {
        ?s ?p ?o .
        FILTER NOT EXISTS {
            ?s ?bad [] .
            FILTER (?bad NOT IN ( ab:temperature, ... ) )
        }
    }
}

SPARQL query on Protege for Number of Classes

In order to determine the number of classes in a .owl file,
I was advised to use the following SPARQL query:
SELECT ( count(?class) as ?count )
WHERE { graph <put_your_model_graph_name_here> { ?class a owl:Class . } }
However, when I replace put_your_model_graph_name_here with my ontology IRI, I get 0.
I also tried http://blahblahblah followed immediately by #, to no avail.
What am I doing wrong?
Difficult to tell without seeing how you are loading and querying the data. Try using:
SELECT ( count(?class) as ?count ) { ?class a owl:Class }
which will query the default graph, or
SELECT ?g ( count(?class) as ?count )
{ graph ?g { ?class a owl:Class } }
group by ?g
which will give counts for all the named graphs.