I am attempting to use a SPARQL Construct query to create a new named graph from an existing one. The database I am querying contains http://graph.com/old as an existing named graph. I am using Jena TDB as the database, accessed through a Jena Fuseki endpoint. The below query gives me an error:
CONSTRUCT
{
GRAPH <http://graph.com/new> {
?s ?p ?o
}
}
WHERE
{
GRAPH <http://graph.com/old> {
?s ?p ?o
}
}
If I remove the graph statement from the CONSTRUCT block, the query works perfectly, but I would like to place the triples into a named graph that I specify instead of the default one.
As far as I could find, the SPARQL 1.1 section on CONSTRUCT does not say anything about constructing into named graphs. Is there a way to do this?
Just as SELECT queries are used when you are interested in getting a set of variable bindings back, CONSTRUCT queries are used when you are interested in getting a model back. Just as the variables bound in a SELECT result set are not put into any model or persistent set of bindings, the model built by a CONSTRUCT is not stored anywhere. You want to use SPARQL 1.1 INSERT instead. The update features are described in section 3, SPARQL 1.1 Update Language, of the SPARQL 1.1 Update specification. Your update request can thus be written as:
INSERT {
GRAPH <http://graph.com/new> {
?s ?p ?o
}
}
WHERE {
GRAPH <http://graph.com/old> {
?s ?p ?o
}
}
For this particular case, though, you might be able to use the COPY operation described in 3.2.3 COPY. COPY removes all the data from the target graph first though, so it might not be applicable to your actual case (understanding that the code you provided may be a minimal example, and not necessarily the actual update you're trying to perform). About COPY the standard says:
The COPY operation is a shortcut for inserting all data from an input
graph into a destination graph. Data from the input graph is not
affected, but data from the destination graph, if any, is removed
before insertion.
COPY ( SILENT )? ( ( GRAPH )? IRIref_from | DEFAULT) TO ( ( GRAPH )? IRIref_to | DEFAULT )
is similar in operation to:
DROP SILENT (GRAPH IRIref_to | DEFAULT);
INSERT { ( GRAPH IRIref_to )? { ?s ?p ?o } } WHERE { ( GRAPH IRIref_from )? { ?s ?p ?o } }
The difference between COPY and the DROP/INSERT combination is that if
COPY is used to copy a graph onto itself then no operation will be
performed and the data will be left as it was. Using DROP/INSERT in
this situation would result in an empty graph.
If the destination graph does not exist, it will be created. By
default, the service may return failure if the input graph does not
exist. If SILENT is present, the result of the operation will always
be success.
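Applied to the two graphs from the question, the COPY form would be a one-liner. This is only a sketch: it has to be sent as an update request rather than as a query, and it assumes you are happy for anything already in http://graph.com/new to be discarded first.

```sparql
# Replaces the contents of <http://graph.com/new> with a copy of <http://graph.com/old>.
COPY <http://graph.com/old> TO <http://graph.com/new>
```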
If COPY isn't suitable, then ADD may be what you're looking for:
3.2.5 ADD
The ADD operation is a shortcut for inserting all data from an input
graph into a destination graph. Data from the input graph is not
affected, and initial data from the destination graph, if any, is kept
intact.
ADD ( SILENT )? ( ( GRAPH )? IRIref_from | DEFAULT) TO ( ( GRAPH )? IRIref_to | DEFAULT)
is equivalent to:
INSERT { ( GRAPH IRIref_to )? { ?s ?p ?o } } WHERE { ( GRAPH IRIref_from )? { ?s ?p ?o } }
If the destination graph does not exist,
it will be created. By default, the service may return failure if the
input graph does not exist. If SILENT is present, the result of the
operation will always be success.
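For the graphs in the question, the ADD form might look like the following sketch. Unlike COPY, it leaves whatever is already in the destination graph in place.

```sparql
# Adds every triple of <http://graph.com/old> to <http://graph.com/new>,
# keeping any triples the destination graph already holds.
ADD <http://graph.com/old> TO <http://graph.com/new>
```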
If you must use CONSTRUCT, you have to add triples that identify the named graph. One way to do this is RDF reification.
CONSTRUCT {
?s ?p ?o .
?statement
a rdf:Statement ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o ;
ex:targetGraph <http://graph.com/new> ;
.
}
WHERE {
?s ?p ?o .
BIND(BNODE() AS ?statement)
}
The constructed triples could be loaded into another database and ingested into the target named graph by a subsequent INSERT.
INSERT {
GRAPH ?graph {
?s ?p ?o .
}
}
WHERE {
{
SERVICE <your-triple-store-containing-the-construct> {
?s ?p ?o .
?statement
a rdf:Statement ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o ;
ex:targetGraph ?graph .
}
}
}
Why on earth would you do such a thing? As I said above, it is not always possible to use INSERT directly. For example, it is not available in the context of SHACL rules.
In the below query I extract all entities with class :Entity, regardless of whether they live inside a specific context (named graph) or not.
SELECT ?s
WHERE {
?s a :Entity .
}
I would like to filter among these entities, to keep only those that don't live inside any named graph, i.e., only the entities that exist outside all named graphs (EDIT: I guess the correct terminology is that I want to search for triples only in the default graph).
How can I do this?
You can take advantage of FILTER NOT EXISTS.
In particular, you are looking for resources (I assume IRIs) that exist in the default graph, but not in any named graph.
So a query like this will work:
SELECT ?s
WHERE {
?s a :Entity
FILTER NOT EXISTS {
GRAPH ?g {
?s ?p ?o
}
}
}
This is saying that ?s is not the subject of any triple in any named graph.
We can extend the query above to include the case where ?s is not the object of any triple too:
SELECT ?s
WHERE {
?s a :Entity
FILTER NOT EXISTS {
GRAPH ?g {
{?s ?p ?o}
UNION
{?x ?p ?s}
}
}
}
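One caveat, which is an assumption about your setup rather than something stated in the question: some stores can be configured so that the default graph is the union of all named graphs (TDB has such an option). In that case the pattern outside GRAPH also matches data held in named graphs, and the filter above would remove everything. A quick way to see where your entities actually live is to list the named graphs that mention each one. The `:` prefix IRI below is a placeholder.

```sparql
# Placeholder namespace; replace it with the one your data uses.
PREFIX : <http://example.org/>

# For every :Entity, list the named graphs in which it appears as a subject.
SELECT DISTINCT ?s ?g
WHERE {
  ?s a :Entity .
  GRAPH ?g { ?s ?p ?o }
}
ORDER BY ?s ?g
```

Entities that never show up in this result are the ones the FILTER NOT EXISTS query keeps.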
Whenever I start using SQL, I tend to throw a couple of exploratory statements at the database in order to understand what is available and what form the data takes.
e.g.
show tables
describe table
select * from table
Could anyone help me understand the way to complete a similar exploration of an RDF datastore using a SPARQL endpoint?
Well, the obvious first start is to look at the classes and properties present in the data.
Here is how to see what classes are being used:
SELECT DISTINCT ?class
WHERE {
?s a ?class .
}
LIMIT 25
OFFSET 0
(LIMIT and OFFSET are there for paging. It is worth getting used to these especially if you are sending your query over the Internet. I'll omit them in the other examples.)
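As a small sketch of the paging idea, this is what the request for the second page of 25 classes could look like. The ORDER BY is my addition: without a defined order the store is free to return rows in any order, so consecutive pages might overlap or skip results.

```sparql
# Second page of 25 results; ORDER BY makes the page boundaries predictable.
SELECT DISTINCT ?class
WHERE {
  ?s a ?class .
}
ORDER BY ?class
LIMIT 25
OFFSET 25
```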
a is a special SPARQL (and Notation3/Turtle) syntax to represent the rdf:type predicate - this links individual instances to owl:Class/rdfs:Class types (roughly equivalent to tables in SQL RDBMSes).
Secondly, you want to look at the properties. You can do this either by using the classes you've searched for or just looking for properties. Let's just get all the properties out of the store:
SELECT DISTINCT ?property
WHERE {
?s ?property ?o .
}
This will get all the properties, which is probably more than you are interested in. This is equivalent to a list of all the columns in SQL, but without any grouping by table.
More useful is to see what properties are being used by instances that declare a particular class:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
}
This will get you back the properties used on any instances that satisfy the first triple - namely, that have the rdf:type of http://xmlns.com/foaf/0.1/Person.
Remember, because an rdfs:Resource can have multiple rdf:type properties - classes if you will - and because RDF's data model is additive, you don't have a diamond problem. The type is just another property - it's just a useful social agreement to say that some things are persons or dogs or genes or football teams. It doesn't mean that the data store is going to contain properties usually associated with that type. The type doesn't guarantee anything in terms of what properties a resource might have.
You need to familiarise yourself with the data model and the use of SPARQL's UNION and OPTIONAL syntax. The rough mapping of rdf:type to SQL tables is just that - rough.
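To illustrate why OPTIONAL matters here, consider this sketch. It assumes the data uses foaf:name and foaf:mbox, which may not be the case in your dataset. Because a type guarantees nothing about which properties are present, a plain triple pattern for the name would silently drop every person without one, whereas OPTIONAL keeps the row and leaves the variable unbound.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Every foaf:Person is returned, with or without a name or mailbox.
SELECT ?person ?name ?mbox
WHERE {
  ?person a foaf:Person .
  OPTIONAL { ?person foaf:name ?name }
  OPTIONAL { ?person foaf:mbox ?mbox }
}
LIMIT 25
```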
You might want to know what kind of entity the property is pointing to. Firstly, you probably want to know about datatype properties, whose values are literals, the equivalent of primitives: strings, integers, etc. In RDF, every literal has a string lexical form, optionally accompanied by a datatype or a language tag. We can keep just those properties whose values are literals using the SPARQL filter function isLiteral:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
FILTER isLiteral(?o)
}
We are here only going to get properties that have as their object a literal - a string, date-time, boolean, or one of the other XSD datatypes.
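If you also want to know which datatypes those literals carry, a variation along these lines may help. It is a sketch that reuses the foaf:Person example and adds the standard DATATYPE function.

```sparql
# Pair each literal-valued property with the datatype(s) of its values.
SELECT DISTINCT ?property (DATATYPE(?o) AS ?datatype)
WHERE {
  ?s a <http://xmlns.com/foaf/0.1/Person> ;
     ?property ?o .
  FILTER isLiteral(?o)
}
```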
But what about the non-literal objects? Consider this very simple pseudo-Java class definition as an analogy:
public class Person {
int age;
Person marriedTo;
}
Using the above query, we would get back the property that represents age, provided it is present in the data. But marriedTo isn't a primitive (i.e. a literal in RDF terms) - it's a reference to another object - in RDF/OWL terminology, that's an object property. But we don't know what sort of objects are being referred to by those properties (predicates). This query will get you back properties with the accompanying types (the classes that the ?o values are members of).
SELECT DISTINCT ?property ?class
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
?o a ?class .
FILTER(!isLiteral(?o))
}
That should be enough to orient yourself in a particular dataset. Of course, I'd also recommend that you just pull out some individual resources and inspect them. You can do that using the DESCRIBE query:
DESCRIBE <http://example.org/resource>
There are some SPARQL tools - SNORQL, for instance - that let you do this in a browser. The SNORQL instance I've linked to has a sample query for exploring the possible named graphs, which I haven't covered here.
If you are unfamiliar with SPARQL, honestly, the best resource if you get stuck is the specification. It's a W3C spec but a pretty good one (they built a decent test suite so you can actually see whether implementations have done it properly or not) and if you can get over the complicated language, it is pretty helpful.
I find the following set of exploratory queries useful:
Seeing the classes:
select distinct ?type ?label
where {
?s a ?type .
OPTIONAL { ?type rdfs:label ?label }
}
Seeing the object properties:
select distinct ?objprop ?label
where {
?objprop a owl:ObjectProperty .
OPTIONAL { ?objprop rdfs:label ?label }
}
Seeing the data properties:
select distinct ?dataprop ?label
where {
?dataprop a owl:DatatypeProperty .
OPTIONAL { ?dataprop rdfs:label ?label }
}
Seeing which properties are actually used:
select distinct ?p ?label
where {
?s ?p ?o .
OPTIONAL { ?p rdfs:label ?label }
}
Seeing what entities are asserted:
select distinct ?entity ?elabel ?type ?tlabel
where {
?entity a ?type .
OPTIONAL { ?entity rdfs:label ?elabel } .
OPTIONAL { ?type rdfs:label ?tlabel }
}
Seeing the distinct graphs in use:
select distinct ?g where {
graph ?g {
?s ?p ?o
}
}
SELECT DISTINCT * WHERE {
?s ?p ?o
}
LIMIT 10
I often refer to this list of queries from the voiD project. They are mainly of a statistical nature, but not only. It shouldn't be hard to remove the COUNTs from some statements to get the actual values.
Especially with large datasets, it is important to distinguish the pattern from the noise and to understand which structures are used a lot and which are rare. Instead of SELECT DISTINCT, I use aggregation queries to count the major classes, predicates etc. For example, here's how to see the most important predicates in your dataset:
SELECT ?pred (COUNT(*) as ?triples)
WHERE {
?s ?pred ?o .
}
GROUP BY ?pred
ORDER BY DESC(?triples)
LIMIT 100
I usually start by listing the graphs in a repository and their sizes, then look at classes (again with counts) in the graph(s) of interest, then the predicates of the class(es) I am interested in, etc.
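The first step of that routine, listing the graphs with their sizes, could be sketched like this. It only covers named graphs; triples in the default graph are not counted.

```sparql
# One row per named graph, largest first.
SELECT ?g (COUNT(*) AS ?triples)
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
ORDER BY DESC(?triples)
```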
Of course these selectors can be combined and restricted if appropriate. To see what predicates are defined for instances of type foaf:Person, and break this down by graph, you could use this:
SELECT ?g ?pred (COUNT(*) as ?triples)
WHERE {
GRAPH ?g {
?s a foaf:Person .
?s ?pred ?o .
}
}
GROUP BY ?g ?pred
ORDER BY ?g DESC(?triples)
This will list each graph with the predicates in it, in descending order of frequency.
I have a graph G. I'd like to create G', a subset of G, by filtering out from G all items belonging to specific types, let's say {:Foo, :Bar}.
For example, if this is G
:x a :Foo
:y a :Bar
:x :predicate_a :hh
:kk :predicate_b :y
:mm :predicate_b :kk
G' should be:
:mm :predicate_b :kk
My best current option is to use DELETE on G. I need two queries for each type:
(i) one for the subjects
delete where
{
?s ?p ?o .
?s a :Foo .
}
(ii) another one for the objects
delete where
{
?s ?p ?o .
?o a :Foo .
}
In that way, I should get what I need. It seems to me that's not the best option, though. Are there better ways (i.e., more efficient/compact)?
It can be done in a single query, using UNION and VALUES. This should work for both classes in one go:
PREFIX : <http://www.example.com/foo#>
DELETE { ?s ?p ?o }
WHERE
{
VALUES (?toDeleteClass) {
(:Foo)
(:Bar)
}
?toDelete a ?toDeleteClass
# or, if you want transitivity: ?toDelete a/rdfs:subClassOf* ?toDeleteClass
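# Bind ?s/?o after the triple pattern in each branch: a BIND placed first in a
# nested group cannot see ?toDelete from the outer pattern.
{ ?toDelete ?p ?o . BIND( ?toDelete AS ?s ) }
UNION { ?s ?p ?toDelete . BIND( ?toDelete AS ?o ) }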
}
Combining this with the comments under your question, you can build a new graph G' rather than modifying the existing one (using INSERT and GRAPH), or you can use CONSTRUCT to extract and download G' (in which case you might need to do it in chunks, via LIMIT/OFFSET, since many triple stores limit the size of the result a query can return).
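A minimal sketch of the INSERT variant follows. It assumes G is the default graph, and the IRI of the new graph is a made-up placeholder. Instead of deleting the unwanted triples, it copies only the triples whose subject and object are both free of the excluded types.

```sparql
PREFIX : <http://www.example.com/foo#>

# <http://www.example.com/graphs/G-prime> is a hypothetical name for G'.
INSERT { GRAPH <http://www.example.com/graphs/G-prime> { ?s ?p ?o } }
WHERE
{
  ?s ?p ?o .
  # Keep the triple only if neither end is an instance of an excluded class.
  FILTER NOT EXISTS {
    VALUES ?excluded { :Foo :Bar }
    { ?s a ?excluded } UNION { ?o a ?excluded }
  }
}
```

On the example data in the question this leaves G untouched and puts only :mm :predicate_b :kk into the new graph.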
An alternative to VALUES would be FILTER ( ?toDeleteClass IN ( :Foo, :Bar ) ). However, VALUES looks more natural for the task you have and might be faster as well.
Beware of inference: if your triple store has some inference enabled by default, the pattern ?toDelete a ?toDeleteClass might pick up indirect instances of Foo/Bar too, i.e., those that are instances of subclasses of Foo/Bar, not just the direct ones. If you don't want this, the best approach is to find out how to disable inference in your triple store (you could detect indirect instances via FILTER, but it's more complicated and slower).
I'm trying to do some simple deletes of triples in my TDB. I'm trying to delete any triples that have a certain value, and any triples that link to it. This is an example of one of the queries I'm running through Fuseki.
with <http://XXXXXXXXXXXX/XXXX/>
delete {
?s2 ?p2 ?s .
?s ?p ?o .
}
where
{
?s2 ?p2 ?s .
?s ?p ?o .
filter(strStarts(?o, "cPage")) .
}
I get this response:
Success
Update succeeded
But no triples are actually removed. I've checked that the --update flag is getting passed to Fuseki, but I can't figure out why nothing's happening.
In the SPARQL Update section of Fuseki, type CLEAR DEFAULT. Be aware that this removes every triple in the default graph.
I had this problem myself once. What I did wrong was to use the default endpoint: /datasetname/sparql (or /datasetname/query). These are read-only endpoints. If you want to insert or delete data, you have to use /datasetname/update as the endpoint.
If you use the localhost web interface, you can change the endpoint in the lower left.
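Something else worth checking, offered as a guess rather than a diagnosis: an update whose WHERE clause matches nothing also reports success. Running the same pattern as a SELECT on the query endpoint shows whether there is anything to delete. The graph IRI is the placeholder from the question, left as it was. The str() call is my assumption: strStarts expects string arguments, so if the ?o values are IRIs the original filter raises an error and rejects every row.

```sparql
# Dry run of the DELETE pattern: lists up to 10 matches instead of removing them.
SELECT ?s2 ?p2 ?s ?p ?o
WHERE {
  GRAPH <http://XXXXXXXXXXXX/XXXX/> {
    ?s2 ?p2 ?s .
    ?s ?p ?o .
    # str() turns IRIs and typed literals into plain strings before the comparison.
    FILTER(strStarts(str(?o), "cPage"))
  }
}
LIMIT 10
```

If this returns no rows, the problem lies in the pattern or the graph name rather than in the endpoint.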