Querying named RDF graphs in TDB using tdbquery - sparql

I am trying to query my newly created TDB database using the tdbquery program. However, I am having a hard time writing a query that targets the correct named graph. I am doing the following:
First I create a new dataset and add a named graph called "facts":
Dataset dataset = TDBFactory.createDataset("/tdb/");
dataset.begin(ReadWrite.WRITE) ;
try {
Model facts = RDFDataMgr.loadModel("lineitem.ttl") ;
dataset.addNamedModel("facts", facts);
dataset.commit();
TDB.sync(dataset);
dataset.end();
} finally {
dataset.close();
}
When I query all graphs in my TDB database it looks fine.
./tdbquery --loc /tdb/ "SELECT * { GRAPH ?g { ?s ?p ?o } }"
--------------------------------------------------
| s | p | o | g |
==================================================
| <fact1> | <predicate> | <nation> | <facts> |
| <fact2> | <predicate> | <region> | <facts> |
--------------------------------------------------
If I try to query the named graph, I do not find any triples.
./tdbquery -v --loc /tdb/ "SELECT * { GRAPH <facts> { ?s ?p ?o } }"
OR
./tdbquery -v --loc /tdb/ "SELECT * FROM NAMED <facts> WHERE { ?s ?p ?o }"
-------------
| s | p | o |
=============
-------------
When I look at the algebra version of the query I see that the context (the graph) in my quad is wrong.
INFO exec :: ALGEBRA
(quadpattern (quad <file:///usr/local/apache-jena-2.12.1/bin/facts> ?s ?p ?o))
I know that the quad pattern should be:
(quad ?s ?p ?o)
How do I query a named graph in a TDB database?
Regards

When I look at the algebra version of the query I see that the context
(the graph) in my quad is wrong.
INFO exec :: ALGEBRA
(quadpattern (quad <file:///usr/local/apache-jena-2.12.1/bin/facts> ?s ?p ?o))
I know that the quad pattern should be: (quad ?s ?p ?o)
No, it is correct (even if it is not what you expect).
A quadpattern searches against quads and so includes four fields, the first of which is the name of the graph to be searched.
And this is where your problem lies: graph names are URIs, but you only provided facts as the name. That is treated as a relative URI and as such is subject to resolution, which may differ in different parts of the system.
In your example the query parser uses the working directory as the base URI, leading to the strange graph name you see in the algebra plan.
You can see exactly what graph names are in the TDB store by issuing the following query:
SELECT ?g WHERE { GRAPH ?g { } }
If you get an absolute URI back, you can specify it directly in your original query. If you do not, there is no way to query for it from the command line.
Fixing your Issue
Avoid relative URIs wherever possible. If you do want to use them, specify a base URI explicitly.
So in the code where you load the data, make sure you give the graph an absolute URI, e.g.
dataset.addNamedModel("http://example.org/facts", facts);
And if you do want to use relative URIs to refer to your graph in your queries, add an appropriate BASE declaration so the URI is resolved as you want it, e.g.
./tdbquery -v --loc /tdb/ "BASE <http://example.org/> SELECT * { GRAPH <facts> { ?s ?p ?o } }"
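The resolution step behind this mismatch can be reproduced outside Jena. The following is a minimal sketch that uses only java.net.URI from the JDK, not Jena's own IRI resolver, so it illustrates the general rule rather than Jena's exact behaviour. The class name, the method name and the http://example.com/data/ base are made up for illustration.

```java
import java.net.URI;

class GraphNameResolution {

    // Resolve a (possibly relative) graph name against a base URI,
    // using the generic resolution rules implemented by java.net.URI.
    static String resolve(String base, String graphName) {
        return URI.create(base).resolve(graphName).toString();
    }

    public static void main(String[] args) {
        // The same relative name becomes a different graph name under each base.
        System.out.println(resolve("http://example.org/", "facts"));
        System.out.println(resolve("http://example.com/data/", "facts"));

        // An absolute graph name is returned unchanged, whatever the base is.
        System.out.println(resolve("http://example.com/data/", "http://example.org/facts"));

        // A bare name such as "facts" has no scheme, so it is not an absolute URI.
        System.out.println(URI.create("facts").isAbsolute());
    }
}
```

Because the result depends on the base in effect at each point, a graph stored under one base and queried under another will not match unless the name is absolute or the BASE is declared explicitly.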

The problem here is that you have put a relative URI graph name into the data. RDF is defined to work with absolute URIs (i.e. URIs that start with "http:" or some other URI scheme name).
Try
RDFDataMgr.write(System.out, dataset, Lang.NQUADS)
to see more clearly what's in the dataset. The output of tdbquery may apply URI shortening, so some of your data may have absolute URIs and some relative ones while looking the same in the text format.
When "SELECT * { GRAPH <facts> { ?s ?p ?o } }" is parsed, relative URIs are resolved, just as when you read data from a file. The base URI is where the code is running, so you get file:///usr/local/apache-jena-2.12.1/bin/facts
Try dataset.addNamedModel("http://example/facts", facts);
PS 1
Model m = dataset.getNamedModel("http://example/facts") ;
m.read("lineitem.ttl") ;
PS 2
sync() isn't necessary if you use transactions

Related

Sparql query to read from all named graphs without knowing the names

I am looking to run a SPARQL query over any dataset. We don't know the names of the named graphs in the datasets.
There is lots of documentation and there are many examples of selecting from named graphs when you know the names of the graphs. There are also examples showing how to list named graphs.
We are running Jena from Java, so it would be possible to run two queries: the first gets the named graphs and we inject these into the second.
But surely you can write a single query that reads from all named graphs when you don't know their names?
Note: we are looking to stay away from using default graph/s as their behaviour seems implementation dependent.
Example:
{
?s foaf:name ?name ;
vCard:nickname ?nickName .
}
If you want the pattern to match within one graph and wish to try each graph, use the GRAPH ?g form.
GRAPH ?g
{ ?s foaf:name ?name ;
vc:nickname ?nickName .
}
If you want the pattern to match across named graphs (e.g. foaf:name in one graph and vCard:nickname in another, for the same subject), then set the union default graph option, tdb2:unionDefaultGraph true. The default graph as seen by the query is then the union (actually, RDF merge - no duplicates) of all the named graphs. Use the pattern as originally given.
Fuseki configuration file extract:
:dataset_tdb2 rdf:type tdb2:DatasetTDB2 ;
tdb2:location "DB2" ;
## Optional - with union default for query and update WHERE matching.
tdb2:unionDefaultGraph true ;
.
In code, not Fuseki, the application can use Dataset.getUnionModel().
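To make the difference between the two approaches concrete, here is a minimal sketch in plain Java, with no Jena involved. It models a dataset as a map from graph name to a set of triples; the graph names g1 and g2, the sample triples and all method names are hypothetical. It uses a plain set union, which ignores the blank node handling that a real RDF merge performs.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UnionDefaultGraphSketch {

    // Hypothetical dataset: the name and the nickname live in different graphs.
    static Map<String, Set<List<String>>> sampleDataset() {
        Map<String, Set<List<String>>> ds = new LinkedHashMap<>();
        ds.put("g1", Set.of(
            List.of("ex:alice", "rdf:type", "foaf:Person"),
            List.of("ex:alice", "foaf:name", "Alice")));
        ds.put("g2", Set.of(
            List.of("ex:alice", "rdf:type", "foaf:Person"),
            List.of("ex:alice", "vCard:nickname", "Al")));
        return ds;
    }

    // Union of all named graphs; the Set drops the duplicated rdf:type triple.
    static Set<List<String>> union(Map<String, Set<List<String>>> dataset) {
        Set<List<String>> merged = new LinkedHashSet<>();
        for (Set<List<String>> graph : dataset.values()) {
            merged.addAll(graph);
        }
        return merged;
    }

    // Subjects that have both a foaf:name and a vCard:nickname triple in one graph.
    static Set<String> subjectsWithNameAndNick(Set<List<String>> graph) {
        Set<String> withName = new HashSet<>();
        Set<String> withNick = new HashSet<>();
        for (List<String> t : graph) {
            if (t.get(1).equals("foaf:name")) withName.add(t.get(0));
            if (t.get(1).equals("vCard:nickname")) withNick.add(t.get(0));
        }
        withName.retainAll(withNick);
        return withName;
    }

    public static void main(String[] args) {
        Map<String, Set<List<String>>> ds = sampleDataset();
        // Like GRAPH ?g { ... }: the pattern is tried inside each graph separately.
        for (Map.Entry<String, Set<List<String>>> e : ds.entrySet()) {
            System.out.println(e.getKey() + " -> " + subjectsWithNameAndNick(e.getValue()));
        }
        // Like the union default graph: the pattern is tried once against the merged triples.
        System.out.println("union -> " + subjectsWithNameAndNick(union(ds)));
    }
}
```

In this sample the per-graph match finds nothing, because no single graph holds both properties, while the match against the union finds the subject.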

How to completely delete an RDF node/instance from a graph database?

Good day, I am using Graphdb to store some triples as seen in the image below. This particular RDF node uses a regular URI http://example/regular/uri. What I wish to do is to not only completely delete all properties attached to this node, but also delete the node itself. (with the result that http://example/regular/uri does not appear in the graph database any longer)
So far I am only able to delete all properties, but I am not able to delete the actual RDF node itself. It seemed rather simple, but the more I research online, the more this seems impossible unless clearing the complete graph.
I have tried simple "delete where" queries as shown in example 11 of the SPARQL documentation. I have also tried simple "delete where" queries using the wildcard operator, as shown in the query below:
Is there a way to delete such RDF nodes?
Thanks in advance!
A node exists in a graph as long as at least one triple has that node in subject or object position. So the easiest way would be to issue two delete statements, one deleting all statements with the node in subject position and one deleting all statements with the node in object position. But if you need or want to do it in a single operation, you can do that as well by combining both patterns with UNION and VALUES.
Here is a sample that deletes uri://node/to/delete from uri://my/graph:
DELETE { GRAPH <uri://my/graph> {
?s ?p ?o .
}}
USING <uri://my/graph>
WHERE {
{
?s ?p ?o . VALUES ?s { <uri://node/to/delete>}
} UNION {
?s ?p ?o . VALUES ?o { <uri://node/to/delete>}
}
}
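The rule that a node only exists through the triples that mention it can be sketched in plain Java, independent of any triple store. The class name, the method names and the sample triples below are hypothetical. Like the update above, the sketch only looks at subject and object positions.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteNodeSketch {

    // Keep only the triples that do not mention the node as subject or object.
    // This mirrors the two UNION branches of the DELETE request.
    static List<String[]> deleteNode(List<String[]> graph, String node) {
        List<String[]> kept = new ArrayList<>();
        for (String[] t : graph) {
            boolean asSubject = t[0].equals(node);
            boolean asObject = t[2].equals(node);
            if (!asSubject && !asObject) {
                kept.add(t);
            }
        }
        return kept;
    }

    // A node is "in" the graph only if some triple has it as subject or object.
    static boolean containsNode(List<String[]> graph, String node) {
        for (String[] t : graph) {
            if (t[0].equals(node) || t[2].equals(node)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical sample data: three triples, two of which mention the node.
    static List<String[]> sampleGraph() {
        List<String[]> graph = new ArrayList<>();
        graph.add(new String[] {"uri://node/to/delete", "ex:label", "some label"});
        graph.add(new String[] {"uri://other/node", "ex:linksTo", "uri://node/to/delete"});
        graph.add(new String[] {"uri://other/node", "ex:label", "other label"});
        return graph;
    }

    public static void main(String[] args) {
        List<String[]> after = deleteNode(sampleGraph(), "uri://node/to/delete");
        System.out.println("triples left: " + after.size());
        System.out.println("node still present: " + containsNode(after, "uri://node/to/delete"));
    }
}
```

Once no triple mentions the node, there is nothing further to remove: the node has no separate existence in the graph.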

SPARQL performance in property path query Sesame / rdf4j

Let's say I want to find all subjects which are connected with objects via property path. The connection could be represented:
Subject - prop 1 -> A - prop 2 -> B - prop 3 -> Object
This could be achieved by quite simple SPARQL query:
SELECT ?s WHERE {
?s prop1/prop2/prop3 ?o .
VALUES ?o { <uri1> ... <urin> }
}
But I also want to include paths which use subclasses of A and/or B:
Subject - prop 1 -> subclassOfA - prop 2 -> subclassOfB - prop 3 -> Object
To achieve this I've added an intermediate "subclassOf" property into the path:
SELECT ?s WHERE {
?s prop1/<subclassOf>*/prop2/<subclassOf>*/prop3 ?o .
VALUES ?o { <uri1> ... <urin> }
}
This worked really fast on my dataset in Sesame 2.7.2, but after migrating to rdf4j 2.5.2 this query just hangs. Is this the correct way to write such a query, or is there something much more efficient? And what could have caused such a significant performance drop in the new version?

Named Graph Support in MarkLogic

I do not understand how FROM NAMED is supported in MarkLogic. I am experimenting with SPARQL queries to find which collection the triples are coming from. The result is really confusing. For example:
select *
FROM <http://x.y.z/c>
FROM NAMED <http://x.y.z/c>
WHERE {
# GRAPH ?g
{?s ?p ?o}
}
returns a set of triples. However, if I un-comment the line # GRAPH ?g, the following error is returned:
[1.0-ml] XDMP-COLLXCNNOTFOUND: amped-qconsole:qconsole-sparql($query, (), (), (), ()) -- Collection lexicon not enabled
and highlight is on the WHERE { line.
Additionally, the following works and returns a set of triples:
select *
FROM <http://x.y.z/c>
WHERE {
{?s ?p ?o}
}
but not this:
select *
FROM NAMED <http://x.y.z/c>
WHERE {
{?s ?p ?o}
}
it returns an empty set. Adding the GRAPH ?g line causes the same error as above being returned. I am really confused. Can someone give an explanation of the behavior?
MarkLogic uses collections in its implementation of graphs. There is a note in the GRAPH keyword documentation that mentions the need for the collection lexicon.
You must enable the collection lexicon when you use a GRAPH construct in a SPARQL query. You can enable the collection lexicon from the database configuration pages or the Admin Interface.
I'll add that you can also enable the collection lexicon through the Management API.

CONSTRUCT into a named graph

I am attempting to use a SPARQL Construct query to create a new named graph from an existing one. The database I am querying contains http://graph.com/old as an existing named graph. I am using Jena TDB as the database, accessed through a Jena Fuseki endpoint. The below query gives me an error:
CONSTRUCT
{
GRAPH <http://graph.com/new> {
?s ?p ?o
}
}
WHERE
{
GRAPH <http://graph.com/old> {
?s ?p ?o
}
}
If I remove the graph statement from the CONSTRUCT block, the query works perfectly, but I would like to place the triples into a named graph that I specify instead of the default one.
As far as I could find, the SPARQL 1.1 section on CONSTRUCT does not say anything about constructing into named graphs. Is there a way to do this?
Just as SELECT queries are used when you are interested in getting a set of variable bindings back, CONSTRUCT queries are used when you are interested in getting a model back. Just as the variables bound in a SELECT result set are not put into any model or persistent set of bindings, the model built by a CONSTRUCT is not stored anywhere either. You want to use SPARQL 1.1 INSERT. The update features are described in section 3, SPARQL 1.1 Update Language. Your update request can thus be written as:
INSERT {
GRAPH <http://graph.com/new> {
?s ?p ?o
}
}
WHERE {
GRAPH <http://graph.com/old> {
?s ?p ?o
}
}
For this particular case, though, you might be able to use the COPY operation described in 3.2.3 COPY. COPY removes all the data from the target graph first though, so it might not be applicable to your actual case (understanding that the code you provided may be a minimal example, and not necessarily the actual update you're trying to perform). About COPY the standard says:
The COPY operation is a shortcut for inserting all data from an input
graph into a destination graph. Data from the input graph is not
affected, but data from the destination graph, if any, is removed
before insertion.
COPY ( SILENT )? ( ( GRAPH )? IRIref_from | DEFAULT) TO ( ( GRAPH )? IRIref_to | DEFAULT )
is similar in operation to:
DROP SILENT (GRAPH IRIref_to | DEFAULT);
INSERT { ( GRAPH IRIref_to )? { ?s ?p ?o } } WHERE { ( GRAPH IRIref_from )? { ?s ?p ?o } }
The difference between COPY and the DROP/INSERT combination is that if
COPY is used to copy a graph onto itself then no operation will be
performed and the data will be left as it was. Using DROP/INSERT in
this situation would result in an empty graph.
If the destination graph does not exist, it will be created. By
default, the service may return failure if the input graph does not
exist. If SILENT is present, the result of the operation will always
be success.
If COPY isn't suitable, then ADD may be what you're looking for:
3.2.5 ADD
The ADD operation is a shortcut for inserting all data from an input
graph into a destination graph. Data from the input graph is not
affected, and initial data from the destination graph, if any, is kept
intact.
ADD ( SILENT )? ( ( GRAPH )? IRIref_from | DEFAULT) TO ( ( GRAPH )? IRIref_to | DEFAULT)
is equivalent to:
INSERT { ( GRAPH IRIref_to )? { ?s ?p ?o } } WHERE { ( GRAPH IRIref_from )? { ?s ?p ?o } }
If the destination graph does not exist,
it will be created. By default, the service may return failure if the
input graph does not exist. If SILENT is present, the result of the
operation will always be success.
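The difference between COPY, ADD and the DROP/INSERT combination is easiest to see on a toy model. The sketch below is plain Java, not a SPARQL engine: a dataset is a map from graph name to a set of triples written as strings, and the class, method and graph names are hypothetical. It does not model the failure case for a missing input graph or the SILENT keyword.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class GraphCopySketch {

    // Hypothetical dataset with two named graphs.
    static Map<String, Set<String>> sample() {
        Map<String, Set<String>> ds = new HashMap<>();
        ds.put("old", new HashSet<>(Set.of("t1", "t2")));
        ds.put("new", new HashSet<>(Set.of("t3")));
        return ds;
    }

    // ADD: insert the source triples, keep what the destination already has.
    static void add(Map<String, Set<String>> ds, String from, String to) {
        Set<String> source = new HashSet<>(ds.getOrDefault(from, Set.of()));
        ds.computeIfAbsent(to, k -> new HashSet<>()).addAll(source);
    }

    // DROP followed by INSERT: the destination is emptied before the source is read.
    static void dropThenInsert(Map<String, Set<String>> ds, String from, String to) {
        ds.remove(to);
        Set<String> source = new HashSet<>(ds.getOrDefault(from, Set.of()));
        if (!source.isEmpty()) {
            ds.computeIfAbsent(to, k -> new HashSet<>()).addAll(source);
        }
    }

    // COPY: replace the destination with the source, but do nothing
    // when a graph is copied onto itself.
    static void copy(Map<String, Set<String>> ds, String from, String to) {
        if (from.equals(to)) {
            return;
        }
        ds.put(to, new HashSet<>(ds.getOrDefault(from, Set.of())));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> a = sample();
        add(a, "old", "new");
        System.out.println("ADD old TO new:  " + a.get("new"));

        Map<String, Set<String>> c = sample();
        copy(c, "old", "new");
        System.out.println("COPY old TO new: " + c.get("new"));

        Map<String, Set<String>> self = sample();
        copy(self, "old", "old");
        System.out.println("COPY old TO old: " + self.get("old"));

        Map<String, Set<String>> dropped = sample();
        dropThenInsert(dropped, "old", "old");
        System.out.println("DROP/INSERT old onto itself: " + dropped.get("old"));
    }
}
```

In this model the graph copied onto itself keeps its data under COPY, while DROP/INSERT leaves it without any triples, which is the distinction the specification text draws.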
If you must use CONSTRUCT, you have to add triples that identify the named graph. One way to do this is RDF reification.
CONSTRUCT {
?s ?p ?o .
?statement
a rdf:Statement ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o ;
ex:targetGraph <http://graph.com/new> ;
.
}
WHERE {
?s ?p ?o .
BIND(BNODE() AS ?statement)
}
The constructed triples could be loaded into another database and then ingested into the target named graph by another INSERT.
INSERT {
GRAPH ?graph {
?s ?p ?o .
}
}
WHERE {
{
SERVICE <your-triple-store-containing-the-construct> {
?s ?p ?o .
?statement
a rdf:Statement ;
rdf:subject ?s ;
rdf:predicate ?p ;
rdf:object ?o ;
ex:targetGraph ?graph .
}
}
}
Why on earth would you do such things? As I said above, it is not always possible to use INSERT directly. For example, it is not available in the context of SHACL rules.