SPARQL Update (DELETE/INSERT) with WHERE Condition Referencing Multiple Graphs [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 months ago.
I'm trying to do a SPARQL update in Jena Fuseki, but I can't seem to get it to work.
The following is the issue:
I have triples in a graph (named freetest), and those triples need to be changed.
And in a separate graph (named knora-base), I have an "ontology" describing the data in the first graph.
In order to identify the triples that need updating, I need information from the ontology.
In a SELECT query, this works perfectly fine:
SELECT *
FROM <http://www.knora.org/data/0001/freetest>
FROM <http://www.knora.org/ontology/knora-base>
WHERE {
?s ?p ?o.
?s <http://www.knora.org/ontology/knora-base#hasPermissions> 'b' .
?def <http://www.w3.org/2000/01/rdf-schema#subClassOf>* <http://www.knora.org/ontology/knora-base#Value> .
?s a ?def
}
LIMIT 25
The two FROM clauses make the WHERE block act on a union of the two graphs, so that I can check if hasPermission equals 'b', which is a triple in the data graph (freetest), as well as that the resource at hand is of type ?def (again in the data graph), which must be a <subClassOf>* of Value (which is defined in the ontology graph).
As mentioned, this works as expected.
However, I can't get the corresponding update to work. Among many variations, I tried:
DELETE { GRAPH <http://www.knora.org/data/0001/freetest>
{?s <http://www.knora.org/ontology/knora-base#hasPermissions> ?perm}
}
INSERT { GRAPH <http://www.knora.org/data/0001/freetest>
{?s <http://www.knora.org/ontology/knora-base#hasPermissions> ?new}
}
USING <http://www.knora.org/ontology/knora-base>
USING <http://www.knora.org/data/0001/freetest>
WHERE {
VALUES ?perm {'b'}
VALUES ?new {'c'}
?s <http://www.knora.org/ontology/knora-base#hasPermissions> ?perm .
?def <http://www.w3.org/2000/01/rdf-schema#subClassOf>* <http://www.knora.org/ontology/knora-base#Resource> .
?s a ?def
}
I expected this to have the same behaviour as the SELECT query above, with multiple USING statements giving me access to multiple graphs in the WHERE clause.
However this doesn't update any triples in the database.
Note though, that when I remove the line ?def <http://www.w3.org/2000/01/rdf-schema#subClassOf>* <http://www.knora.org/ontology/knora-base#Resource> . it does update the triples. But obviously without the restriction I wanted, so too many triples get updated.
I'm sure I'm just not fully understanding named graphs, sparql update, USING, or all of them. But I also can't find much information on the topic. (While named graphs in SELECT seem to be covered by quite a number of very informative articles.)
Update 1:
According to the comments, I also tried to explicitly define the graph in the WHERE clause:
WITH <http://www.knora.org/data/0001/freetest>
DELETE { GRAPH <http://www.knora.org/data/0001/freetest>
{?s <http://www.knora.org/ontology/knora-base#hasPermissions> ?perm}
}
INSERT { GRAPH <http://www.knora.org/data/0001/freetest>
{?s <http://www.knora.org/ontology/knora-base#hasPermissions> ?new}
}
WHERE {
VALUES ?perm {'c'}
VALUES ?new {'d'}
?s <http://www.knora.org/ontology/knora-base#hasPermissions> ?perm .
GRAPH <http://www.knora.org/ontology/knora-base> {
?def <http://www.w3.org/2000/01/rdf-schema#subClassOf>* <http://www.knora.org/ontology/knora-base#Resource> .
}
?s a ?def
}
This did not change the observed behaviour.
Update 2:
I tried to reproduce it with a simple sample, so I did the following:
Use my regular Fuseki Setup
Send a DROP ALL to have the same settings but no triples
Upload a simple dataset:
@prefix a: <http://test.org/a#> .
@prefix b: <http://test.org/b#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
a:A {
a:Resource
a owl:Class ;
rdfs:label "Resource"@en .
a:Value
a owl:Class ;
rdfs:label "Value"@en .
a:SubResource
a owl:Class ;
rdfs:subClassOf a:Resource ;
rdfs:label "Sub-Resource"@en .
}
b:B {
b:ResourceInstance
a a:Resource ;
b:hasValue "a".
b:ValueInstance
a a:Value ;
b:hasValue "a".
b:SubResourceInstance
a a:SubResource ;
b:hasValue "a".
}
In this, a: is the ontology and b: is the data.
Then I sent the following update:
WITH <http://test.org/b#B>
DELETE {
?s <http://test.org/b#hasValue> ?oldVal
}
INSERT {
?s <http://test.org/b#hasValue> ?newVal
}
USING <http://test.org/a#A>
USING <http://test.org/b#B>
WHERE {
VALUES ?oldVal {'a'}
VALUES ?newVal {'b'}
?s <http://test.org/b#hasValue> ?oldVal .
?def <http://www.w3.org/2000/01/rdf-schema#subClassOf>* <http://test.org/a#Resource> .
?s a ?def
}
This produced the expected result: The values in b:ResourceInstance and b:SubResourceInstance get updated, but not in b:ValueInstance.
So, to conclude: SPARQL and Fuseki do exactly what they should... so the initial problem seems to have to do with my real data (or my lack of understanding of my data).
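One way to narrow down a case like this is to run the update's WHERE pattern as a SELECT over the same dataset, since USING in an update plays the role that FROM plays in a query. A minimal sketch, reusing the graph and property IRIs from the first update attempt above (nothing here is specific to Fuseki):

```sparql
# Diagnostic query: same dataset and pattern as the first update attempt.
# If this returns no rows, the DELETE/INSERT has nothing to act on.
SELECT ?s ?perm ?def
FROM <http://www.knora.org/ontology/knora-base>
FROM <http://www.knora.org/data/0001/freetest>
WHERE {
  VALUES ?perm { 'b' }
  ?s <http://www.knora.org/ontology/knora-base#hasPermissions> ?perm .
  ?def <http://www.w3.org/2000/01/rdf-schema#subClassOf>* <http://www.knora.org/ontology/knora-base#Resource> .
  ?s a ?def
}
LIMIT 25
```

Note that the working SELECT at the top of the question tests subclasses of knora-base#Value, while the updates test subclasses of knora-base#Resource, so the two do not select the same subjects.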

Related

Conditional insertion of a new concept class to an ontology model

I am learning SPARQL and ontology building. I have a model, and I would like to add a new concept class to multiple concepts in the model using regex/filter.
I have the following concepts:
A647674
A878678
RR36868
DD36868
The expected output is :
A647674
A878678
RR36868 rdf:type http://schemas.aaaaaaa.com/ontologies/drug#SmallMolecule
DD36868 rdf:type http://schemas.aaaaaaa.com/ontologies/drug#SmallMolecule
I am using the SPARQL query below to do this.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
INSERT {
?s rdf:type 'http://schemas.aaaaaaa.com/ontologies/drug#SmallMolecule' .
}
WHERE
{
{?s ?p ?o .
filter regex(str(?s), "http://ontology.aaaaaaa.com/drugs/aaaaaaa#RR-").
}
union
{?s ?p ?o .
filter regex(str(?s), "http://ontology.aaaaaaa.com/drugs/aaaaaaa#DD").
}
};
#LIMIT 100
I am getting the error below with the above query.
OmServerGenericException[message="http://schemas.aaaaaaa.com/ontologies/drug#SmallMolecule",responseCode=500]
Caused by: org.apache.jena.rdf.model.ResourceRequiredException: "http://schemas.aaaaaaa.com/ontologies/drug#SmallMolecule"
Any help is highly appreciated
You are providing a string value:
?s rdf:type 'http://schemas.aaaaaaa.com/ontologies/drug#SmallMolecule' .
You need to provide a URI value:
?s rdf:type <http://schemas.aaaaaaa.com/ontologies/drug#SmallMolecule> .
Or you could define a prefix (PREFIX drug: <http://schemas.aaaaaaa.com/ontologies/drug#>) to use it like this:
?s rdf:type drug:SmallMolecule .
(The suggestions given in my answer to your previous question apply here, too: you could use STRSTARTS instead of REGEX, and one FILTER with || instead of UNION.)
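Put together, those suggestions might look like the sketch below. It assumes the drug: prefix proposed above and reuses the two IRI prefixes from the question's regex filters unchanged; STRSTARTS matches only at the start of the string, whereas the original unanchored regex matches anywhere in it.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX drug: <http://schemas.aaaaaaa.com/ontologies/drug#>

INSERT {
  # The object is an IRI (via the prefix), not a string literal.
  ?s rdf:type drug:SmallMolecule .
}
WHERE {
  ?s ?p ?o .
  # One FILTER with || replaces the two UNION branches.
  FILTER (
    STRSTARTS(STR(?s), "http://ontology.aaaaaaa.com/drugs/aaaaaaa#RR-") ||
    STRSTARTS(STR(?s), "http://ontology.aaaaaaa.com/drugs/aaaaaaa#DD")
  )
}
```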

Why does this federated SPARQL query work in TopBraid but not in Apache Fuseki?

I have the following federated SPARQL query that works as I expect in TopBraid Composer Free Edition (version 5.1.4) but does not work in Apache Fuseki (version 2.3.1):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?s WHERE {
SERVICE <http://data.linkedmdb.org/sparql> {
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
SERVICE <http://dbpedia.org/sparql?timeout=30000> {
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
}
I monitor the sub SPARQL queries that are being executed under the hood and notice that TopBraid correctly executes the following query to the http://dbpedia.org/sparql endpoint:
SELECT *
WHERE
{ ?s ?p ?o
FILTER regex(str(?s), replace("Paul Reubens", " ", "_"))
}
while Apache Fuseki executes the following sub query:
SELECT *
WHERE
{ ?s ?p ?o
FILTER regex(str(?s), replace(?actorName, " ", "_"))
}
Notice the difference: TopBraid replaces the variable ?actorName with a particular value ("Paul Reubens"), while Apache Fuseki does not. This results in an error from the http://dbpedia.org/sparql endpoint because ?actorName is used in the result set but not assigned.
Is this a bug in Apache Fuseki or a feature in TopBraid? How can I make Apache Fuseki correctly execute this federated query?
update 1: to clarify the difference in behaviour between TopBraid and Apache Fuseki a bit more: TopBraid executes the linkedmdb.org subquery first and then executes the dbpedia.org subquery for each result of the linkedmdb.org query (substituting ?actorName with the results from the linkedmdb.org query). I assumed Apache Fuseki behaves similarly, but the first subquery to dbpedia.org fails (because ?actorName is used in the result set but not assigned), so it does not continue. Now I am not sure whether it actually intends to execute the dbpedia.org subquery multiple times, because it never gets that far.
update 2: I think both TopBraid and Apache Fuseki use Jena/ARQ, but I noticed that in stack traces from TopBraid the package name is something like com.topbraid.jena.* which might indicate they use a modified version of Jena/ARQ?
update 3: Joshua Taylor says below: "Surely you wouldn't expect the second service block to be executed for each one of them?". Both TopBraid and Apache Fuseki use exactly this method for the following query:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?film ?label ?subject WHERE {
SERVICE <http://data.linkedmdb.org/sparql> {
?film a movie:film .
?film rdfs:label ?label .
?film owl:sameAs ?dbpediaLink
FILTER(regex(str(?dbpediaLink), "dbpedia", "i"))
}
SERVICE <http://dbpedia.org/sparql> {
?dbpediaLink dcterms:subject ?subject
}
}
LIMIT 50
but I agree that in principle they should execute both parts once and join them, but maybe for performance reasons they chose a different strategy?
Additionally, notice how the above query works on Apache Fuseki, while the first query of this post does not. So, Apache Fuseki is actually behaving similarly to TopBraid in this particular case. It seems to be related to using a URI variable (?dbpediaLink) in two triple patterns (which works in Fuseki) compared to using a string variable (?actorName) from a triple pattern in a FILTER regex function (which does not work in Fuseki).
Updated (Simpler) Response
In the original answer I wrote (below), I said that the issue was that SPARQL queries are executed innermost first. I think that that still applies here, but I think the problem can be isolated even more easily. If you have
service <ex1> { ... }
service <ex2> { ... }
then the results have to be what you'd get from executing each query separately on the endpoints and then joining the results. The join will merge any results where the common variables have the same values. E.g.,
service <ex1> { values ?a { 1 2 3 } }
service <ex2> { values ?a { 2 3 4 } }
would execute, and you'd have two possible values for ?a in the outer query (2 and 3). In your query, the second service can't produce any results. If you take:
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
and execute it at DBpedia, you shouldn't get any results, because ?actorName isn't bound, so the filter will never succeed. It appears that TopBraid is performing the first service first and then injecting the resulting values into your second service. That's convenient, but I don't think it's correct, because it returns different results than what you'd get if the DBpedia query had been executed first and the other query executed second.
Original Answer
Subqueries in SPARQL are executed inner-most first. That means that a query like
select * {
{ select ?x { ?x a :Cat } }
?x foaf:name ?name
}
would first find all the cats, and would then find their names. "Candidate" values for ?x are determined first by the subquery, and then those values for ?x are made available to the outer query. Now, when there are two subqueries, e.g.,
select * {
{ select ?x { ?x a :Cat } }
{ select ?x ?name { ?x foaf:name ?name } }
}
the first subquery is going to find all the cats. The second subquery finds all the names of everything that has a name, and then in the outer query, the results are joined to get just the names of the cats. The values of ?x from the first subquery aren't available during the execution of the second subquery. (At least in principle, a query optimizer might be able to figure out that some things should be restricted.)
My understanding is that service blocks have the same kind of semantics. In your query, you have:
SERVICE <http://data.linkedmdb.org/sparql> {
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
SERVICE <http://dbpedia.org/sparql?timeout=30000> {
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
You say that tracing shows that TopBraid is executing
SELECT *
WHERE
{ ?s ?p ?o
FILTER regex(str(?s), replace("Paul Reubens", " ", "_"))
}
If TopBraid already executed the first service block and got a unique solution, then that might be an acceptable optimization, but what if, for instance, the first query had returned multiple bindings for ?actorName? Surely you wouldn't expect the second service block to be executed for each one of them? Instead, the second service block is executed as written, and will return a result set that will be joined with the result set from the first.
The reason that it probably "doesn't work" in Jena is because the second query doesn't actually bind any variables, so it's pretty much got to look at every triple in the data, which is obviously going to take a long time.
I think that you can get around this by nesting the service calls. If nested services are all launched by the "local" endpoint (i.e., nesting a service call doesn't ask a remote endpoint to make another remote query), then you might be able to do:
SERVICE <http://dbpedia.org/sparql?timeout=30000> {
SERVICE <http://data.linkedmdb.org/sparql> {
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
That might get you the kind of optimization that you want, but that still seems like it might not work unless DBpedia has some efficient ways of figuring out which triples to retrieve based on computing the replace. You're asking DBpedia to look at all its triples, and then to keep the ones where the string form of the subject matches a particular regular expression. It'd probably be better to construct that IRI manually in a subquery and then search for it. I.e.,
SERVICE <http://dbpedia.org/sparql?timeout=30000> {
{ select ?dbpediaActor {
SERVICE <http://data.linkedmdb.org/sparql> {
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
# BIND needs a fresh variable (?actor is already in use), and the
# DBpedia resource namespace ends with a slash.
bind(iri(concat("http://dbpedia.org/resource/",
replace(?actorName," ","_")))
as ?dbpediaActor)
} }
?dbpediaActor ?p ?o
}
(long comment)
Consider:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?s WHERE {
{
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
{
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
}
That is the same query but with no SERVICE calls.
?actorName does not appear in any triple pattern of the second inner {}; it only appears in the FILTER.
As join is a commutative operation, the following has the same answers as the first query:
SELECT ?s WHERE {
{
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
{
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
}
The SERVICE version highlights this because the parts are executed separately on different machines.
The join of the two parts happens on the results of each part.

Custom SPARQL Construct with enumeration

Is it possible to execute SPARQL construct while adding information outside the scope of query? e.g., I want to execute SPARQL construct while defining enumeration information like this:
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
construct {
?s a skos:Concept
?s ex:index <enumeration starting from 1 -- this is just a sample>
}
where {
?s a skos:Concept
}
Is it possible to do something like that with pure SPARQL? What are the alternatives?
* Additional Information *
I probably did not explain my problem clearly, so basically I want to achieve the following (assuming that ex:index is a valid datatype property):
== Initial RDF triples ==
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/> .
ex:abc rdf:type skos:Concept .
ex:def rdf:type skos:Concept .
...
ex:endOfSample rdf:type skos:Concept .
== RDF triples after SPARQL Update execution ==
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.org/> .
ex:abc rdf:type skos:Concept ;
ex:index 1 .
ex:def rdf:type skos:Concept ;
ex:index 2 .
...
ex:endOfSample rdf:type skos:Concept ;
ex:index <endOfSampleNumber> .
You can construct any valid RDF value in a CONSTRUCT. However, a triple template only produces a triple when all of its variables are bound after executing the WHERE pattern; a template that contains an unbound variable is left out of the result. I.e. if a variable such as ?p has no binding in your query, the templates that use it will never produce anything.
This is an example that should get you started:
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
PREFIX ex:<http://example.org/construct#>
construct {
ex:someProp a owl:ObjectProperty .
?s ex:someProp (1 2 3)
}
where {
?s a skos:Concept
}
This will result in the construction of seven triples for the property value and the list structure.
The ex:someProp is added because there isn't a good object property in SKOS for ad-hoc lists. It would be best to define the property with some semantic meaning. Also note that while the {ex:someProp a owl:ObjectProperty} triple will be asserted for each match of {?s a skos:Concept}, it is the same triple, hence there will be only one in the end. The price is efficiency, so asserting the property outside of this query would be a better choice - it is included in the above query for the sake of example completeness.
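For the sequential numbering asked about in the question, one commonly used pure SPARQL 1.1 workaround is to rank each concept by counting how many concepts sort at or before it. A rough sketch, assuming every concept is an IRI, that ordering by IRI string is acceptable, and that ex:index (in the question's http://example.org/ namespace) is the property to fill; the cost grows quadratically with the number of concepts:

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex: <http://example.org/>

CONSTRUCT {
  ?s a skos:Concept ;
     ex:index ?index .
}
WHERE {
  # Rank each concept by the number of concepts whose IRI string
  # sorts at or before its own; the smallest IRI gets index 1.
  {
    SELECT ?s (COUNT(DISTINCT ?other) AS ?index)
    WHERE {
      ?s a skos:Concept .
      ?other a skos:Concept .
      FILTER (STR(?other) <= STR(?s))
    }
    GROUP BY ?s
  }
}
```

Replacing CONSTRUCT with INSERT should turn this into an update that writes the ex:index triples back into the store instead of returning them.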

full text search in jena sparql?

I am new to SPARQL and I am trying to search for a word in one of the properties. Simple queries work fine, but I don't know how to perform a full-text search. I saw this example on the Jena website:
PREFIX text: <http://jena.apache.org/text#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s
{ ?s text:query (rdfs:label 'word' 10) ;
rdfs:label ?label
}
My model contains a property named SUB: and I want to write a query for that. I don't understand what text and query in text:query mean in the above example. Pardon me if this question doesn't meet the requirements of SO.
Link to website:http://jena.apache.org/documentation/query/text-query.html
You may not need a full text index:
SELECT ?s
{ ?s your:property ?o .
FILTER regex(str(?o), "word", "i")
}
But if you do, text:query is a "property function": it triggers a lookup in the Apache Lucene index and causes ?s to be bound to each of the answers from a match of 'word' (up to a limit of 10) over the rdfs:label properties, provided you have correctly configured and loaded the data and index.

Sparql to recover the Type of a DBpedia resource

I need a Sparql query to recover the Type of a specific DBpedia resource. Eg.:
pt.DBpedia resource: http://pt.dbpedia.org/resource/Argentina
Expected type: Country (as can be seen at http://pt.dbpedia.org/page/Argentina)
Using pt.DBpedia Sparql Virtuoso Interface (http://pt.dbpedia.org/sparql) I have the query below:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?l ?t where {
?l rdfs:label "Argentina"@pt .
?l rdf:type ?t .
}
But it does not return anything; it just prints the variable names. The Virtuoso answer.
Actually, I do not need to recover the label (?l) either.
Can anyone fix it, or help me define the correct query?
http in graph name
I'm not sure how you generated your query string, but when I copy and paste your query into the endpoint and run it, I get results, and the resulting URL looks like:
http://pt.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fpt.dbpedia.org&sho...
However, the link in your question is:
http://pt.dbpedia.org/sparql?default-graph-uri=pt.dbpedia.org%2F&should-sponge...
If you look carefully, you'll see that the default-graph-uri parameters are different:
yours: pt.dbpedia.org%2F
mine: http%3A%2F%2Fpt.dbpedia.org
I'm not sure how you got a URL like the one you did, but it's not right; the default-graph-uri needs to be http://pt.dbpedia.org, not pt.dbpedia.org/.
The query is fine
When I run the query you've provided at the endpoint you've linked to, I get the results that I'd expect. It's worth noting that the label here is the literal "Argentina"@pt, and that what you've called ?l is the individual, not the label. The individual ?l has the label "Argentina"@pt.
We can simplify your query a bit, using ?i instead of ?l (to suggest individual):
select ?i ?type where {
?i rdfs:label "Argentina"@pt ;
a ?type .
}
When I run this at the Portuguese endpoint, I get these results:
If you don't want the individual in the results, you don't have to select it:
select ?type where {
?i rdfs:label "Argentina"@pt ;
a ?type .
}
or even:
select ?type where {
[ rdfs:label "Argentina"@pt ; a ?type ]
}
If you know the identifier of the resource, and don't need to retrieve it by using its label, you can even just do:
select ?type where {
dbpedia-pt:Argentina a ?type
}
type
==========================================
http://www.w3.org/2002/07/owl#Thing
http://www.opengis.net/gml/_Feature
http://dbpedia.org/ontology/Place
http://dbpedia.org/ontology/PopulatedPlace
http://dbpedia.org/ontology/Country
http://schema.org/Place
http://schema.org/Country
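The query above that uses dbpedia-pt:Argentina has no PREFIX declaration, so as written it only runs where the endpoint predefines that prefix. A self-contained variant, assuming dbpedia-pt: is meant to expand to the resource namespace from the question (http://pt.dbpedia.org/resource/):

```sparql
PREFIX dbpedia-pt: <http://pt.dbpedia.org/resource/>

SELECT ?type WHERE {
  dbpedia-pt:Argentina a ?type
}
```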