SparQL: Replace subject URIs - sparql

We have graphs containing oa:Annotations with certain properties. Since I am working on a local copy of the server, it would be useful to me to change these URIs to point to localhost. According to the book I read, I thought this should work, but it does not. It does not seem to change anything. Still, the server returns a 204, I am using the priting port and url (/update). So I definitely should be able to change things. There is no error message.
PREFIX oa: <http://www.w3.org/ns/oa#>
DELETE
{ GRAPH ?g {?oldIRI ?p ?o} }
INSERT
{ GRAPH ?g {?newIRI ?p ?o} }
WHERE
{
GRAPH ?g {
?oldIRI a oa:Annotation .
?oldIRI ?p ?o .
}
BIND(
CONCAT("http://localhost:80",
SUBSTR( STR(?oldIRI),
34,
STRLEN(STR(?oldIRI)) )
) AS ?newIRI
)
FILTER(CONTAINS(?oldIRI, "part_of_old_url"))
}
Any idea why this does not have the effect I hoped for? The book I use as reference does have "recipies" to change properties and it's values, but there is no example changing the subjects, so I assume there is a more general problem?
Update: using STR()
As suggested in the comments, I used CONTAINS(STR(?oldIRI), "part_of_old_url") to convert oldIRI to a string. I am not fully aware of all changes, but this is what I can say: (I have backups, no worry :D)
PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT *
WHERE {
GRAPH ?g {
?iri a oa:Annotation .
}
} LIMIT 100
This query has zero results. It's a default query I often used to get some annotation uris for looking into things.

Related

What is the best way to discover metadata about an RDF schema from a SPARQL endpoint? [duplicate]

whenever I start using SQL I tend to throw a couple of exploratory statements at the database in order to understand what is available, and what form the data takes.
e.g.
show tables
describe table
select * from table
Could anyone help me understand the way to complete a similar exploration of an RDF datastore using a SPARQL endpoint?
Well, the obvious first start is to look at the classes and properties present in the data.
Here is how to see what classes are being used:
SELECT DISTINCT ?class
WHERE {
?s a ?class .
}
LIMIT 25
OFFSET 0
(LIMIT and OFFSET are there for paging. It is worth getting used to these especially if you are sending your query over the Internet. I'll omit them in the other examples.)
a is a special SPARQL (and Notation3/Turtle) syntax to represent the rdf:type predicate - this links individual instances to owl:Class/rdfs:Class types (roughly equivalent to tables in SQL RDBMSes).
Secondly, you want to look at the properties. You can do this either by using the classes you've searched for or just looking for properties. Let's just get all the properties out of the store:
SELECT DISTINCT ?property
WHERE {
?s ?property ?o .
}
This will get all the properties, which you probably aren't interested in. This is equivalent to a list of all the row columns in SQL, but without any grouping by the table.
More useful is to see what properties are being used by instances that declare a particular class:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
}
This will get you back the properties used on any instances that satisfy the first triple - namely, that have the rdf:type of http://xmlns.com/foaf/0.1/Person.
Remember, because a rdf:Resource can have multiple rdf:type properties - classes if you will - and because RDF's data model is additive, you don't have a diamond problem. The type is just another property - it's just a useful social agreement to say that some things are persons or dogs or genes or football teams. It doesn't mean that the data store is going to contain properties usually associated with that type. The type doesn't guarantee anything in terms of what properties a resource might have.
You need to familiarise yourself with the data model and the use of SPARQL's UNION and OPTIONAL syntax. The rough mapping of rdf:type to SQL tables is just that - rough.
You might want to know what kind of entity the property is pointing to. Firstly, you probably want to know about datatype properties - equivalent to literals or primitives. You know, strings, integers, etc. RDF defines these literals as all inheriting from string. We can filter out just those properties that are literals using the SPARQL filter method isLiteral:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
FILTER isLiteral(?o)
}
We are here only going to get properties that have as their object a literal - a string, date-time, boolean, or one of the other XSD datatypes.
But what about the non-literal objects? Consider this very simple pseudo-Java class definition as an analogy:
public class Person {
int age;
Person marriedTo;
}
Using the above query, we would get back the literal that would represent age if the age property is bound. But marriedTo isn't a primitive (i.e. a literal in RDF terms) - it's a reference to another object - in RDF/OWL terminology, that's an object property. But we don't know what sort of objects are being referred to by those properties (predicates). This query will get you back properties with the accompanying types (the classes of which ?o values are members of).
SELECT DISTINCT ?property, ?class
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
?o a ?class .
FILTER(!isLiteral(?o))
}
That should be enough to orient yourself in a particular dataset. Of course, I'd also recommend that you just pull out some individual resources and inspect them. You can do that using the DESCRIBE query:
DESCRIBE <http://example.org/resource>
There are some SPARQL tools - SNORQL, for instance - that let you do this in a browser. The SNORQL instance I've linked to has a sample query for exploring the possible named graphs, which I haven't covered here.
If you are unfamiliar with SPARQL, honestly, the best resource if you get stuck is the specification. It's a W3C spec but a pretty good one (they built a decent test suite so you can actually see whether implementations have done it properly or not) and if you can get over the complicated language, it is pretty helpful.
I find the following set of exploratory queries useful:
Seeing the classes:
select distinct ?type ?label
where {
?s a ?type .
OPTIONAL { ?type rdfs:label ?label }
}
Seeing the properties:
select distinct ?objprop ?label
where {
?objprop a owl:ObjectProperty .
OPTIONAL { ?objprop rdfs:label ?label }
}
Seeing the data properties:
select distinct ?dataprop ?label
where {
?dataprop a owl:DatatypeProperty .
OPTIONAL { ?dataprop rdfs:label ?label }
}
Seeing which properties are actually used:
select distinct ?p ?label
where {
?s ?p ?o .
OPTIONAL { ?p rdfs:label ?label }
}
Seeing what entities are asserted:
select distinct ?entity ?elabel ?type ?tlabel
where {
?entity a ?type .
OPTIONAL { ?entity rdfs:label ?elabel } .
OPTIONAL { ?type rdfs:label ?tlabel }
}
Seeing the distinct graphs in use:
select distinct ?g where {
graph ?g {
?s ?p ?o
}
}
SELECT DISTINCT * WHERE {
?s ?p ?o
}
LIMIT 10
I often refer to this list of queries from the voiD project. They are mainly of a statistical nature, but not only. It shouldn't be hard to remove the COUNTs from some statements to get the actual values.
Especially with large datasets, it is important to distinguish the pattern from the noise and to understand which structures are used a lot and which are rare. Instead of SELECT DISTINCT, I use aggregation queries to count the major classes, predicates etc. For example, here's how to see the most important predicates in your dataset:
SELECT ?pred (COUNT(*) as ?triples)
WHERE {
?s ?pred ?o .
}
GROUP BY ?pred
ORDER BY DESC(?triples)
LIMIT 100
I usually start by listing the graphs in a repository and their sizes, then look at classes (again with counts) in the graph(s) of interest, then the predicates of the class(es) I am interested in, etc.
Of course these selectors can be combined and restricted if appropriate. To see what predicates are defined for instances of type foaf:Person, and break this down by graph, you could use this:
SELECT ?g ?pred (COUNT(*) as ?triples)
WHERE {
GRAPH ?g {
?s a foaf:Person .
?s ?pred ?o .
}
GROUP BY ?g ?pred
ORDER BY ?g DESC(?triples)
This will list each graph with the predicates in it, in descending order of frequency.

How to find relations/properties for SPARQL queries [duplicate]

whenever I start using SQL I tend to throw a couple of exploratory statements at the database in order to understand what is available, and what form the data takes.
e.g.
show tables
describe table
select * from table
Could anyone help me understand the way to complete a similar exploration of an RDF datastore using a SPARQL endpoint?
Well, the obvious first start is to look at the classes and properties present in the data.
Here is how to see what classes are being used:
SELECT DISTINCT ?class
WHERE {
?s a ?class .
}
LIMIT 25
OFFSET 0
(LIMIT and OFFSET are there for paging. It is worth getting used to these especially if you are sending your query over the Internet. I'll omit them in the other examples.)
a is a special SPARQL (and Notation3/Turtle) syntax to represent the rdf:type predicate - this links individual instances to owl:Class/rdfs:Class types (roughly equivalent to tables in SQL RDBMSes).
Secondly, you want to look at the properties. You can do this either by using the classes you've searched for or just looking for properties. Let's just get all the properties out of the store:
SELECT DISTINCT ?property
WHERE {
?s ?property ?o .
}
This will get all the properties, which you probably aren't interested in. This is equivalent to a list of all the row columns in SQL, but without any grouping by the table.
More useful is to see what properties are being used by instances that declare a particular class:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
}
This will get you back the properties used on any instances that satisfy the first triple - namely, that have the rdf:type of http://xmlns.com/foaf/0.1/Person.
Remember, because a rdf:Resource can have multiple rdf:type properties - classes if you will - and because RDF's data model is additive, you don't have a diamond problem. The type is just another property - it's just a useful social agreement to say that some things are persons or dogs or genes or football teams. It doesn't mean that the data store is going to contain properties usually associated with that type. The type doesn't guarantee anything in terms of what properties a resource might have.
You need to familiarise yourself with the data model and the use of SPARQL's UNION and OPTIONAL syntax. The rough mapping of rdf:type to SQL tables is just that - rough.
You might want to know what kind of entity the property is pointing to. Firstly, you probably want to know about datatype properties - equivalent to literals or primitives. You know, strings, integers, etc. RDF defines these literals as all inheriting from string. We can filter out just those properties that are literals using the SPARQL filter method isLiteral:
SELECT DISTINCT ?property
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
FILTER isLiteral(?o)
}
We are here only going to get properties that have as their object a literal - a string, date-time, boolean, or one of the other XSD datatypes.
But what about the non-literal objects? Consider this very simple pseudo-Java class definition as an analogy:
public class Person {
int age;
Person marriedTo;
}
Using the above query, we would get back the literal that would represent age if the age property is bound. But marriedTo isn't a primitive (i.e. a literal in RDF terms) - it's a reference to another object - in RDF/OWL terminology, that's an object property. But we don't know what sort of objects are being referred to by those properties (predicates). This query will get you back properties with the accompanying types (the classes of which ?o values are members of).
SELECT DISTINCT ?property, ?class
WHERE {
?s a <http://xmlns.com/foaf/0.1/Person>;
?property ?o .
?o a ?class .
FILTER(!isLiteral(?o))
}
That should be enough to orient yourself in a particular dataset. Of course, I'd also recommend that you just pull out some individual resources and inspect them. You can do that using the DESCRIBE query:
DESCRIBE <http://example.org/resource>
There are some SPARQL tools - SNORQL, for instance - that let you do this in a browser. The SNORQL instance I've linked to has a sample query for exploring the possible named graphs, which I haven't covered here.
If you are unfamiliar with SPARQL, honestly, the best resource if you get stuck is the specification. It's a W3C spec but a pretty good one (they built a decent test suite so you can actually see whether implementations have done it properly or not) and if you can get over the complicated language, it is pretty helpful.
I find the following set of exploratory queries useful:
Seeing the classes:
select distinct ?type ?label
where {
?s a ?type .
OPTIONAL { ?type rdfs:label ?label }
}
Seeing the properties:
select distinct ?objprop ?label
where {
?objprop a owl:ObjectProperty .
OPTIONAL { ?objprop rdfs:label ?label }
}
Seeing the data properties:
select distinct ?dataprop ?label
where {
?dataprop a owl:DatatypeProperty .
OPTIONAL { ?dataprop rdfs:label ?label }
}
Seeing which properties are actually used:
select distinct ?p ?label
where {
?s ?p ?o .
OPTIONAL { ?p rdfs:label ?label }
}
Seeing what entities are asserted:
select distinct ?entity ?elabel ?type ?tlabel
where {
?entity a ?type .
OPTIONAL { ?entity rdfs:label ?elabel } .
OPTIONAL { ?type rdfs:label ?tlabel }
}
Seeing the distinct graphs in use:
select distinct ?g where {
graph ?g {
?s ?p ?o
}
}
SELECT DISTINCT * WHERE {
?s ?p ?o
}
LIMIT 10
I often refer to this list of queries from the voiD project. They are mainly of a statistical nature, but not only. It shouldn't be hard to remove the COUNTs from some statements to get the actual values.
Especially with large datasets, it is important to distinguish the pattern from the noise and to understand which structures are used a lot and which are rare. Instead of SELECT DISTINCT, I use aggregation queries to count the major classes, predicates etc. For example, here's how to see the most important predicates in your dataset:
SELECT ?pred (COUNT(*) as ?triples)
WHERE {
?s ?pred ?o .
}
GROUP BY ?pred
ORDER BY DESC(?triples)
LIMIT 100
I usually start by listing the graphs in a repository and their sizes, then look at classes (again with counts) in the graph(s) of interest, then the predicates of the class(es) I am interested in, etc.
Of course these selectors can be combined and restricted if appropriate. To see what predicates are defined for instances of type foaf:Person, and break this down by graph, you could use this:
SELECT ?g ?pred (COUNT(*) as ?triples)
WHERE {
GRAPH ?g {
?s a foaf:Person .
?s ?pred ?o .
}
GROUP BY ?g ?pred
ORDER BY ?g DESC(?triples)
This will list each graph with the predicates in it, in descending order of frequency.

Setting to query only Default Graph and exclude Named Graphs

In the GraphDB documentation, I see that "the dataset’s default graph contains the merge of the database’s default graph AND all the database named graphs." This means that "if a statement ex:x ex:y ex:z exists in the database in the graph ex:g" then a query such as SELECT * { ?s ?p ?o } will return the triple ex:x ex:y ex:z
I am wondering if there is a setting which can be triggered either via the web interface or via the RDF4J/OpenRDF API which will disable this behavior in a specified GraphDB repository. That is, for the purposes of my project I would prefer to have triples which are stored in named graphs to only appear in results which specifically query that named graph.
I have not seen anything like this searching through the documentation or on the settings available on the web interface, but maybe somebody here knows something I don't.
EDIT: I am not looking for a SPARQL solution to this problem. I know that I can query just the default graph using SPARQL, but I want to be able to use the query SELECT * { ?s ?p ?o } and only see results which are in the default graph by default.
GraphDB/RDF4J have a different interpretation than Jena how to query the default graph. The only easy way to query only explicit statements in the default graph is to use the special graph sesame:nil. The SPARQL-based solution is to write:
PREFIX sesame: <http://www.openrdf.org/schema/sesame#>
SELECT ?s ?p ?o
FROM sesame:nil
WHERE {
?s ?p ?o .
} LIMIT 100
I don't think there is any easy non-SPARQL based solution like changing a configuration option or even use this special graph over the SPARQL Graph Store protocol.

Named Graph Support in MarkLogic

I am not understanding how FROM NAMED graph is supported in MarkLogic. I am experimenting with SPARQL queries to find which collection the triples are coming from. The result is really confusing. For example:
select *
FROM <http://x.y.z/c>
FROM NAMED <http://x.y.z/c>
WHERE {
# GRAPH ?g
{?s ?p ?o}
}
returns a set of triple. However, if I un-comment the line # GRAPH ?g, the following error is returned:
[1.0-ml] XDMP-COLLXCNNOTFOUND: amped-qconsole:qconsole-sparql($query, (), (), (), ()) -- Collection lexicon not enabled
and highlight is on the WHERE { line.
Additionally, the following works and returns a set of triples:
select *
FROM <http://x.y.z/c>
WHERE {
{?s ?p ?o}
}
but not this:
select *
FROM NAMED <http://x.y.z/c>
WHERE {
{?s ?p ?o}
}
it returns an empty set. Adding the GRAPH ?g line causes the same error as above being returned. I am really confused. Can someone give an explanation of the behavior?
MarkLogic uses collections in its implementation of graphs. There is a note in the GRAPH keyword documentation that mentions the need for the collection lexicon.
You must enable the collection lexicon when you use a GRAPH construct in a SPARQL query. You can enable the collection lexicon from the database configuration pages or the Admin Interface.
I'll add that you can also enable the collection lexicon through the Management API.

CONSTRUCT into a named graph

I am attempting to use a SPARQL Construct query to create a new named graph from an existing one. The database I am querying contains http://graph.com/old as an existing named graph. I am using Jena TDB as the database, accessed through a Jena Fuseki endpoint. The below query gives me an error:
CONSTRUCT
{
GRAPH <http://graph.com/new> {
?s ?p ?o
}
}
WHERE
{
GRAPH <http://graph.com/old> {
?s ?p ?o
}
}
If I remove the graph statement from the CONSTRUCT block, the query works perfectly, but I would like to place the triples into a named graph that I specify instead of the default one.
As far as I could find, the SPARQL 1.1 section on CONSTRUCT does not say anything about constructing into named graphs. Is there a way to do this?
Just as SELECT queries are used when you are interested in getting a set of variable bindings back, CONSTRUCT queries are used you are interested in getting a model back. Just as the variables bound in a SELECT result set are not put into any model or persistent set of bindings, neither is the model built by a CONSTRUCT stored anywhere. You want to use SPARQL 1.1 INSERT. The update features are described in 3 SPARQL 1.1 Update Language. Your update request can thus be written as:
INSERT {
GRAPH <http://graph.com/new> {
?s ?p ?o
}
}
WHERE {
GRAPH <http://graph.com/old> {
?s ?p ?o
}
}
For this particular case, though, you might be able to use the COPY operation described in 3.2.3 COPY. COPY removes all the data from the target graph first though, so it might not be applicable to your actual case (understanding that the code you provided may be a minimal example, and not necessarily the actual update you're trying to perform). About COPY the standard says:
The COPY operation is a shortcut for inserting all data from an input
graph into a destination graph. Data from the input graph is not
affected, but data from the destination graph, if any, is removed
before insertion.
COPY ( SILENT )? ( ( GRAPH )? IRIref_from | DEFAULT) TO ( ( GRAPH )? IRIref_to | DEFAULT )
is similar in operation to:
DROP SILENT (GRAPH IRIref_to | DEFAULT);
INSERT { ( GRAPH IRIref_to )? { ?s ?p ?o } } WHERE { ( GRAPH IRIref_from )? { ?s ?p ?o } }
The difference between COPY and the DROP/INSERT combination is that if
COPY is used to copy a graph onto itself then no operation will be
performed and the data will be left as it was. Using DROP/INSERT in
this situation would result in an empty graph.
If the destination graph does not exist, it will be created. By
default, the service may return failure if the input graph does not
exist. If SILENT is present, the result of the operation will always
be success.
If COPY isn't suitable, then ADD may be what you're looking for:
3.2.5 ADD
The ADD operation is a shortcut for inserting all data from an input
graph into a destination graph. Data from the input graph is not
affected, and initial data from the destination graph, if any, is kept
intact.
ADD ( SILENT )? ( ( GRAPH )? IRIref_from | DEFAULT) TO ( ( GRAPH )? IRIref_to | DEFAULT)
is equivalent to:
INSERT { ( GRAPH IRIref_to )? { ?s ?p ?o } } WHERE { ( GRAPH IRIref_from )? { ?s ?p ?o } }
If the destination graph does not exist,
it will be created. By default, the service may return failure if the
input graph does not exist. If SILENT is present, the result of the
operation will always be success.
If you must use CONSTRUCT, you have to add triples to identify the named graph. One way to do this would be rdf reification.
CONSTRUCT {
?s ?p ?o .
?statement
a rdf:Statement ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o ;
ex:targetGraph <http://graph.com/new> ;
.
}
WHERE {
?s ?p ?o .
BIND(BNODE() AS ?statement)
}
The constructed triples could be loaded into an other database and ingested into the target named graph by another INSERT.
INSERT {
GRAPH ?graph {
?s ?p ?o .
}
WHERE {
{
SERVICE <your-triple-store-containing-the-construct> {
?s ?p ?o .
?statement
a rdf:Statement ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o ;
ex:targetGraph ?graph .
.
}
}
}
Why on earth would you do such things? As I said above, it is not always possible to use INSERT directly. I.e. it is not available in the context of SHACL-Rules.