Can't delete triples from Fuseki/Jena TDB

I'm trying to do some simple deletes of triples in my TDB. I'm trying to delete any triples that have a certain value, and any triples that link to it. This is an example of one of the queries I'm running through Fuseki.
with <http://XXXXXXXXXXXX/XXXX/>
delete {
?s2 ?p2 ?s .
?s ?p ?o .
}
where
{
?s2 ?p2 ?s .
?s ?p ?o .
filter(strStarts(?o, "cPage")) .
}
I get this response:
Success
Update succeeded
But no triples are actually removed. I've checked that the --update flag is getting passed to Fuseki, but I can't figure out why nothing's happening.

In the SPARQL Update section of Fuseki, type CLEAR DEFAULT. Note that this removes every triple in the default graph.

Had this problem myself once. What I did wrong was to use the default endpoint: /datasetname/sparql (or /datasetname/query). These are read-only endpoints. If you want to insert or delete data, you have to use /datasetname/update as the endpoint.
If you use the localhost web interface, you can change the endpoint in the lower left.
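If you send the update from a script rather than the web interface, the same rule applies: post it to the update endpoint. Here is a minimal sketch using only the Python standard library. The server address `http://localhost:3030`, the dataset name `ds` and the example update are assumptions for illustration; the content type `application/sparql-update` is the one the SPARQL 1.1 Protocol defines for direct POST updates.

```python
from urllib.request import Request

def build_update_request(server: str, dataset: str, update: str) -> Request:
    """Build (but do not send) a SPARQL Update request for the /update endpoint."""
    endpoint = f"{server.rstrip('/')}/{dataset}/update"
    return Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )

# Hypothetical server, dataset name and update; adjust to your setup.
update = "DELETE WHERE { <http://example.org/s> ?p ?o }"
request = build_update_request("http://localhost:3030", "ds", update)
print(request.get_method(), request.full_url)
# To actually run it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to check that the URL really ends in /update before anything touches the store.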


How to get all related triples to a subject in SPARQL?

I would need to get all data related to a subject. I tried to query for
SELECT * WHERE {?s ?p ?o}
But the problem is that some of the ?o objects are URIs and I also need those connections, until the connections end. For example, I have:
http://www.example.com/data/subject-test http://www.example.com#hasNetListPrice http://www.example.com/data/price-subject-test.
http://www.example.com/data/price-subject-test http://www.example.com/si-cpq/data/price-currency http://www.example.com/data/EURO.
And so on, until the triples are no longer connected to the initial subject, http://www.example.com/si-cpq/data/subject-test.
Also, sometimes there are 4 connected triples and sometimes more, so I could not use the same pattern for all of them. I would also need a general select query, not one that works only for the price triples, because the data has more attributes.
(This is probably only a partial answer yet, but might be improved by more information from comments/edits):
First of all it might be necessary to specify which rdf:types you accept in the result. Also, you probably will have to add "layers of optional intermediate variables":
# untested due to the lack of data
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix cpq: <http://www.example.com/si-cpq/data/>
SELECT * WHERE
{
# defining admissible types (here: guessed type-names)
VALUES ?t {cpq:Currency cpq:Foobar}
?s ?p ?o1. # first relation
{
# direct connection between ?s and target type
?o1 rdf:type ?t.
}
UNION
{
# level 1 indirect connection between ?s and target type
?o1 ?p2 ?o2.
?o2 rdf:type ?t.
}
UNION
{
# level 2 indirect connection between ?s and target type
?o1 ?p2 ?o2.
?o2 ?p3 ?o3.
?o3 rdf:type ?t.
}
# add more levels and restrictions as needed
}
You might also have a look at property paths, but they only work with fixed (i.e. non-variable) properties.
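If the query gets unwieldy, another option is to fetch the triples and follow the connections on the client side. The sketch below is not a SPARQL solution; it is a plain breadth-first traversal over an in-memory list of (subject, predicate, object) tuples, and the abbreviated names in the sample data are made up for illustration.

```python
from collections import deque

def reachable_triples(triples, start):
    """Return every triple reachable from `start` by following objects as subjects."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    seen, queue, result = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for triple in by_subject.get(node, []):
            result.append(triple)
            obj = triple[2]
            if obj not in seen:  # literals simply have no outgoing triples
                seen.add(obj)
                queue.append(obj)
    return result

# Hypothetical sample data, loosely modelled on the question.
data = [
    ("subject-test", "hasNetListPrice", "price-subject-test"),
    ("price-subject-test", "price-currency", "EURO"),
    ("unrelated", "hasNetListPrice", "other-price"),
]
for triple in reachable_triples(data, "subject-test"):
    print(triple)
```

The `seen` set keeps the traversal from looping forever if the data contains cycles.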

Understanding CONSTRUCT templates

I'm learning SPARQL right now, and I'm having trouble understanding the limits of CONSTRUCT templates. Normally everything works fine, just about the way I'd expect it to. However, my understanding breaks down when I start to make templates that don't semantically make sense. Here's an example:
I have the following data stored:
me: a foaf:Person .
foaf:mbox rdfs:label "Email" .
With a default template of ?s ?p ?o, I obviously get that exact data back. If I go for something a bit nonsensical, like this:
CONSTRUCT {
?type ?labeled ?label
}
WHERE {
me: a ?type .
?labeled rdfs:label ?label .
}
I get back this triple:
foaf:Person foaf:mbox "Email" .
This kinda makes sense to me, because there's three variables, and each has only one value it can bind to in the dataset. However, as soon as I switch the order of the variables in the template to be like this: ?type ?label ?labeled, I get literally nothing back. Why is that? The template ?type ?labeled ?label already breaks the original structure of the data and I still get something back, so why would ?type ?label ?labeled be any different?
As noted by @AKSW, literals can't be predicates. "Email" is a literal, and thus it couldn't be CONSTRUCTed into a predicate position.
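To make the rule concrete, here is a small Python sketch of how a CONSTRUCT template gets filled in. It is a toy model, not Jena's implementation: the `Term` type and `instantiate` function are names invented for this example. The behaviour it mimics is the one in the SPARQL specification: a template triple that would contain an unbound variable or an illegal RDF construct (such as a literal in subject or predicate position) is silently left out of the result.

```python
from typing import NamedTuple, Optional

class Term(NamedTuple):
    kind: str   # "iri", "literal" or "bnode"
    value: str

def instantiate(template, solution) -> Optional[tuple]:
    """Fill a triple template from one solution; return None if the triple is illegal RDF."""
    triple = tuple(solution.get(name) for name in template)
    if any(term is None for term in triple):
        return None                      # unbound variable: triple is skipped
    s, p, o = triple
    if s.kind == "literal" or p.kind != "iri":
        return None                      # literal subject or non-IRI predicate: skipped
    return triple

solution = {
    "type": Term("iri", "foaf:Person"),
    "labeled": Term("iri", "foaf:mbox"),
    "label": Term("literal", "Email"),
}
print(instantiate(("type", "labeled", "label"), solution))   # a legal triple
print(instantiate(("type", "label", "labeled"), solution))   # None: "Email" as predicate
```

So the first template only "works" because every position happens to receive a term that is allowed there, not because the engine checks whether the triple makes sense.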

Why does this federated SPARQL query work in TopBraid but not in Apache Fuseki?

I have the following federated SPARQL query that works as I expect in TopBraid Composer Free Edition (version 5.1.4) but does not work in Apache Fuseki (version 2.3.1):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?s WHERE {
SERVICE <http://data.linkedmdb.org/sparql> {
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
SERVICE <http://dbpedia.org/sparql?timeout=30000> {
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
}
I monitor the sub SPARQL queries that are being executed under the hood and notice that TopBraid correctly executes the following query to the http://dbpedia.org/sparql endpoint:
SELECT *
WHERE
{ ?s ?p ?o
FILTER regex(str(?s), replace("Paul Reubens", " ", "_"))
}
while Apache Fuseki executes the following sub query:
SELECT *
WHERE
{ ?s ?p ?o
FILTER regex(str(?s), replace(?actorName, " ", "_"))
}
Notice the difference: TopBraid replaces the variable ?actorName with a particular value 'Paul Reubens', while Apache Fuseki does not. This results in an error from the http://dbpedia.org/sparql endpoint because ?actorName is used in the result set but not assigned.
Is this a bug in Apache Fuseki or a feature in TopBraid? How can I make Apache Fuseki correctly execute this Federated query.
update 1: To clarify the difference in behaviour between TopBraid and Apache Fuseki a bit more: TopBraid executes the linkedmdb.org subquery first and then executes the dbpedia.org subquery for each result of the linkedmdb.org query (substituting ?actorName with the results from the linkedmdb.org query). I assumed Apache Fuseki behaves similarly, but the first subquery to dbpedia.org fails (because ?actorName is used in the result set but not assigned) and so it does not continue. But now I am not sure whether it actually wants to execute the subquery to dbpedia.org multiple times, because it never gets there.
update 2: I think both TopBraid and Apache Fuseki use Jena/ARQ, but I noticed that in stack traces from TopBraid the package name is something like com.topbraid.jena.* which might indicate they use a modified version of Jena/ARQ?
update 3: Joshua Taylor says below: "Surely you wouldn't expect the second service block to be executed for each one of them?". Both TopBraid and Apache Fuseki use exactly this method for the following query:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?film ?label ?subject WHERE {
SERVICE <http://data.linkedmdb.org/sparql> {
?film a movie:film .
?film rdfs:label ?label .
?film owl:sameAs ?dbpediaLink
FILTER(regex(str(?dbpediaLink), "dbpedia", "i"))
}
SERVICE <http://dbpedia.org/sparql> {
?dbpediaLink dcterms:subject ?subject
}
}
LIMIT 50
but I agree that in principle they should execute both parts once and join them, but maybe for performance reasons they chose a different strategy?
Additionally, notice how the above query works on Apache Fuseki, while the first query of this post does not. So, Apache Fuseki is actually behaving similarly to TopBraid in this particular case. It seems to be related to using a URI variable (?dbpediaLink) in two triple patterns (which works in Fuseki) compared to using a string variable (?actorName) from a triple pattern in a FILTER regex function (which does not work in Fuseki).
Updated (Simpler) Response
In the original answer I wrote (below), I said that the issue was that SPARQL queries are executed innermost first. I think that that still applies here, but I think the problem can be isolated even more easily. If you have
service <ex1> { ... }
service <ex2> { ... }
then the results have to be what you'd get from executing each query separately on the endpoints and then joining the results. The join will merge any results where the common variables have the same values. E.g.,
service <ex1> { values ?a { 1 2 3 } }
service <ex2> { values ?a { 2 3 4 } }
would execute, and you'd have two possible values for ?a in the outer query (2 and 3). In your query, the second service can't produce any results. If you take:
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
and execute it at DBpedia, you shouldn't get any results, because ?actorName isn't bound, so the filter will never succeed. It appears that TopBraid is performing the first service first and then injecting the resulting values into your second service. That's convenient, but I don't think it's correct, because it returns different results than what you'd get if the DBpedia query had been executed first and the other query executed second.
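The join described above can be sketched in a few lines of Python. This is only a model of the semantics, with solutions represented as dictionaries from variable name to value; the function names are invented for the example.

```python
def compatible(a: dict, b: dict) -> bool:
    """Two solutions are compatible if they agree on every variable they share."""
    return all(a[v] == b[v] for v in a.keys() & b.keys())

def join(left, right):
    """Merge every compatible pair of solutions from the two result sets."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

ex1 = [{"a": 1}, {"a": 2}, {"a": 3}]   # service <ex1> { values ?a { 1 2 3 } }
ex2 = [{"a": 2}, {"a": 3}, {"a": 4}]   # service <ex2> { values ?a { 2 3 4 } }

print(join(ex1, ex2))   # solutions where ?a agrees on both sides
print(join(ex1, []))    # a service that returns nothing makes the whole join empty
```

The second call is the situation in the question: once one service block yields no solutions, nothing the other block returns can survive the join.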
Original Answer
Subqueries in SPARQL are executed inner-most first. That means that a query like
select * {
{ select ?x { ?x a :Cat } }
?x foaf:name ?name
}
Would first find all the cats, and would then find their names. "Candidate" values for ?x are determined first by the subquery, and then those values for ?x are made available to the outer query. Now, when there are two subqueries, e.g.,
select * {
{ select ?x { ?x a :Cat } }
{ select ?x ?name { ?x foaf:name ?name } }
}
the first subquery is going to find all the cats. The second subquery finds all the names of everything that has a name, and then in the outer query, the results are joined to get just the names of the cats. The values of ?x from the first subquery aren't available during the execution of the second subquery. (At least in principle, a query optimizer might be able to figure out that some things should be restricted.)
My understanding is that service blocks have the same kind of semantics. In your query, you have:
SERVICE <http://data.linkedmdb.org/sparql> {
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
SERVICE <http://dbpedia.org/sparql?timeout=30000> {
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
You say that tracing shows that TopBraid is executing
SELECT *
WHERE
{ ?s ?p ?o
FILTER regex(str(?s), replace("Paul Reubens", " ", "_"))
}
If TopBraid already executed the first service block and got a unique solution, then that might be an acceptable optimization, but what if, for instance, the first query had returned multiple bindings for ?actorName? Surely you wouldn't expect the second service block to be executed for each one of them? Instead, the second service block is executed as written, and will return a result set that will be joined with the result set from the first.
The reason that it probably "doesn't work" in Jena is because the second query doesn't actually bind any variables, so it's pretty much got to look at every triple in the data, which is obviously going to take a long time.
I think that you can get around this by nesting the service calls. If nested services are all launched by the "local" endpoint (i.e., nesting a service call doesn't ask a remote endpoint to make another remote query), then you might be able to do:
SERVICE <http://dbpedia.org/sparql?timeout=30000> {
SERVICE <http://data.linkedmdb.org/sparql> {
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
That might get you the kind of optimization that you want, but that still seems like it might not work unless DBpedia has some efficient ways of figuring out which triples to retrieve based on computing the replace. You're asking DBpedia to look at all its triples, and then to keep the ones where the string form of the subject matches a particular regular expression. It'd probably be better to construct that IRI manually in a subquery and then search for it. I.e.,
SERVICE <http://dbpedia.org/sparql?timeout=30000> {
{ select ?dbpediaActor {
SERVICE <http://data.linkedmdb.org/sparql> {
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
# ?actor is already bound above, so bind the DBpedia IRI to a new variable
bind(iri(concat("http://dbpedia.org/resource/",
replace(?actorName," ","_")))
as ?dbpediaActor)
} }
?dbpediaActor ?p ?o
}
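The IRI construction itself is just string manipulation, and it can help to try it outside the query first. A minimal Python sketch follows; it assumes that the DBpedia resource name is the actor name with spaces turned into underscores, which holds for "Paul Reubens" but is not guaranteed for every name (other characters may need percent-encoding).

```python
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"

def dbpedia_iri(actor_name: str) -> str:
    """Guess the DBpedia resource IRI for a name by replacing spaces with underscores."""
    return DBPEDIA_RESOURCE + actor_name.replace(" ", "_")

print(dbpedia_iri("Paul Reubens"))
```

Looking up one known IRI is far cheaper for the endpoint than running a regex over the subject of every triple.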
(long comment)
Consider:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?s WHERE {
{
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
{
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
}
that is the same query but with no SERVICE calls.
?actorName is not in a pattern of the inner second {}.
As join is a commutative operation, this has the same answers as the first query.
SELECT ?s WHERE {
{
?s ?p ?o .
FILTER(regex(str(?s), replace(?actorName, " ", "_"))) .
}
{
<http://data.linkedmdb.org/resource/film/1> movie:actor ?actor .
?actor movie:actor_name ?actorName .
}
}
The SERVICE version highlights this because the parts are executed separately on different machines.
The join of the two parts happens on the results of each part.
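A toy Python model of this point, with invented function names and made-up sample rows: each group is evaluated on its own, the FILTER only sees variables bound inside its own group, and only then are the two result lists joined. Treating a filter over an unbound variable as "row dropped" follows the SPARQL rule that a filter expression raising an error eliminates the solution.

```python
def join(left, right):
    """Join two solution lists on their shared variables."""
    return [
        {**a, **b}
        for a in left for b in right
        if all(a[v] == b[v] for v in a.keys() & b.keys())
    ]

def filter_group(solutions, variable, predicate):
    """Keep rows passing the filter; a row where `variable` is unbound is dropped."""
    return [row for row in solutions if variable in row and predicate(row)]

actors = [{"actor": "ex:a1", "actorName": "Paul Reubens"}]        # first group
subjects = [{"s": "http://dbpedia.org/resource/Paul_Reubens"}]    # ?s ?p ?o, simplified

# The FILTER sits inside the second group, where ?actorName is never bound.
second_group = filter_group(
    subjects, "actorName",
    lambda row: row["actorName"].replace(" ", "_") in row["s"],
)

print(join(actors, second_group))
print(join(second_group, actors))
```

Both orders print the same (empty) result, which is what commutativity of the join predicts.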

SPARQL to recover the type of a DBpedia resource

I need a SPARQL query to recover the type of a specific DBpedia resource. E.g.:
pt.DBpedia resource: http://pt.dbpedia.org/resource/Argentina
Expected type: Country (as can be seen at http://pt.dbpedia.org/page/Argentina)
Using the pt.DBpedia SPARQL Virtuoso interface (http://pt.dbpedia.org/sparql) I have the query below:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?l ?t where {
?l rdfs:label "Argentina"@pt .
?l rdf:type ?t .
}
But it does not return anything; the Virtuoso answer just prints the variable names.
Actually, I do not need to recover the label (?l) either.
Can anyone fix it, or help me define the correct query?
http in graph name
I'm not sure how you generated your query string, but when I copy and paste your query into the endpoint and run it, I get results, and the resulting URL looks like:
http://pt.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fpt.dbpedia.org&sho...
However, the link in your question is:
http://pt.dbpedia.org/sparql?default-graph-uri=pt.dbpedia.org%2F&should-sponge...
If you look carefully, you'll see that the default-graph-uri parameters are different:
yours: pt.dbpedia.org%2F
mine: http%3A%2F%2Fpt.dbpedia.org
I'm not sure how you got a URL like the one you did, but it's not right; the default-graph-uri needs to be http://pt.dbpedia.org, not pt.dbpedia.org/.
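If you build the request URL yourself, letting a library do the percent-encoding avoids this kind of mistake. A minimal sketch with Python's standard library (only the `default-graph-uri` parameter is shown; the other parameters in the real URL are left out):

```python
from urllib.parse import urlencode

params = {"default-graph-uri": "http://pt.dbpedia.org"}
query_string = urlencode(params)
print("http://pt.dbpedia.org/sparql?" + query_string)
```

`urlencode` escapes the colon and slashes of the graph IRI as `%3A` and `%2F`, which matches the working URL above.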
The query is fine
When I run the query you've provided at the endpoint you've linked to, I get the results that I'd expect. It's worth noting that the label here is the literal "Argentina"@pt, and that what you've called ?l is the individual, not the label. The individual ?l has the label "Argentina"@pt.
We can simplify your query a bit, using ?i instead of ?l (to suggest individual):
select ?i ?type where {
?i rdfs:label "Argentina"@pt ;
a ?type .
}
When I run this at the Portuguese endpoint, I get these results:
If you don't want the individual in the results, you don't have to select it:
select ?type where {
?i rdfs:label "Argentina"@pt ;
a ?type .
}
or even:
select ?type where {
[ rdfs:label "Argentina"@pt ; a ?type ]
}
If you know the identifier of the resource, and don't need to retrieve it by using its label, you can even just do:
select ?type where {
dbpedia-pt:Argentina a ?type
}
type
==========================================
http://www.w3.org/2002/07/owl#Thing
http://www.opengis.net/gml/_Feature
http://dbpedia.org/ontology/Place
http://dbpedia.org/ontology/PopulatedPlace
http://dbpedia.org/ontology/Country
http://schema.org/Place
http://schema.org/Country

CONSTRUCT into a named graph

I am attempting to use a SPARQL Construct query to create a new named graph from an existing one. The database I am querying contains http://graph.com/old as an existing named graph. I am using Jena TDB as the database, accessed through a Jena Fuseki endpoint. The below query gives me an error:
CONSTRUCT
{
GRAPH <http://graph.com/new> {
?s ?p ?o
}
}
WHERE
{
GRAPH <http://graph.com/old> {
?s ?p ?o
}
}
If I remove the graph statement from the CONSTRUCT block, the query works perfectly, but I would like to place the triples into a named graph that I specify instead of the default one.
As far as I could find, the SPARQL 1.1 section on CONSTRUCT does not say anything about constructing into named graphs. Is there a way to do this?
Just as SELECT queries are used when you are interested in getting a set of variable bindings back, CONSTRUCT queries are used when you are interested in getting a model back. Just as the variables bound in a SELECT result set are not put into any model or persistent set of bindings, neither is the model built by a CONSTRUCT stored anywhere. You want to use SPARQL 1.1 INSERT. The update features are described in section 3, "SPARQL 1.1 Update Language", of the SPARQL 1.1 Update specification. Your update request can thus be written as:
INSERT {
GRAPH <http://graph.com/new> {
?s ?p ?o
}
}
WHERE {
GRAPH <http://graph.com/old> {
?s ?p ?o
}
}
For this particular case, though, you might be able to use the COPY operation described in 3.2.3 COPY. COPY removes all the data from the target graph first though, so it might not be applicable to your actual case (understanding that the code you provided may be a minimal example, and not necessarily the actual update you're trying to perform). About COPY the standard says:
The COPY operation is a shortcut for inserting all data from an input
graph into a destination graph. Data from the input graph is not
affected, but data from the destination graph, if any, is removed
before insertion.
COPY ( SILENT )? ( ( GRAPH )? IRIref_from | DEFAULT) TO ( ( GRAPH )? IRIref_to | DEFAULT )
is similar in operation to:
DROP SILENT (GRAPH IRIref_to | DEFAULT);
INSERT { ( GRAPH IRIref_to )? { ?s ?p ?o } } WHERE { ( GRAPH IRIref_from )? { ?s ?p ?o } }
The difference between COPY and the DROP/INSERT combination is that if
COPY is used to copy a graph onto itself then no operation will be
performed and the data will be left as it was. Using DROP/INSERT in
this situation would result in an empty graph.
If the destination graph does not exist, it will be created. By
default, the service may return failure if the input graph does not
exist. If SILENT is present, the result of the operation will always
be success.
If COPY isn't suitable, then ADD may be what you're looking for:
3.2.5 ADD
The ADD operation is a shortcut for inserting all data from an input
graph into a destination graph. Data from the input graph is not
affected, and initial data from the destination graph, if any, is kept
intact.
ADD ( SILENT )? ( ( GRAPH )? IRIref_from | DEFAULT) TO ( ( GRAPH )? IRIref_to | DEFAULT)
is equivalent to:
INSERT { ( GRAPH IRIref_to )? { ?s ?p ?o } } WHERE { ( GRAPH IRIref_from )? { ?s ?p ?o } }
If the destination graph does not exist,
it will be created. By default, the service may return failure if the
input graph does not exist. If SILENT is present, the result of the
operation will always be success.
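The difference between COPY and ADD is easy to see in a toy model. The Python sketch below represents a dataset as a dictionary from graph name to a set of triples; the function names and the sample triples are made up, and SILENT and the failure case for a missing input graph are ignored.

```python
def copy_graph(dataset: dict, source: str, target: str) -> None:
    """COPY: target ends up with exactly the source's triples; onto itself is a no-op."""
    if source == target:
        return
    dataset[target] = set(dataset.get(source, set()))

def add_graph(dataset: dict, source: str, target: str) -> None:
    """ADD: source triples are merged into the target; existing target triples are kept."""
    dataset.setdefault(target, set()).update(dataset.get(source, set()))

old, new = "http://graph.com/old", "http://graph.com/new"
t1, t2 = ("ex:a", "ex:p", "ex:b"), ("ex:c", "ex:p", "ex:d")

ds_copy = {old: {t1}, new: {t2}}
copy_graph(ds_copy, old, new)      # t2 is removed from the new graph

ds_add = {old: {t1}, new: {t2}}
add_graph(ds_add, old, new)        # t2 stays, t1 is added

print(ds_copy[new])
print(ds_add[new])
```

In both cases the source graph is left untouched; only what happens to triples already in the destination differs.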
If you must use CONSTRUCT, you have to add triples to identify the named graph. One way to do this would be RDF reification.
CONSTRUCT {
?s ?p ?o .
?statement
a rdf:Statement ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o ;
ex:targetGraph <http://graph.com/new> ;
.
}
WHERE {
?s ?p ?o .
BIND(BNODE() AS ?statement)
}
The constructed triples could be loaded into another database and ingested into the target named graph by another INSERT.
INSERT {
GRAPH ?graph {
?s ?p ?o .
}
}
WHERE {
{
SERVICE <your-triple-store-containing-the-construct> {
?s ?p ?o .
?statement
a rdf:Statement ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o ;
ex:targetGraph ?graph .
}
}
}
Why on earth would you do such things? Because it is not always possible to use INSERT directly. For example, it is not available in the context of SHACL rules.