How to get a concise bounded description of a resource with Sesame? - sparql

I've been testing Sesame 2.7.2 and was surprised to find that DESCRIBE queries do not include the blank-node closure [EDIT: the right term for this is CBD, for concise bounded description].
If I understand correctly, the SPARQL spec is quite loose on this and says that what is returned is up to the provider, but I'm still surprised at the choice, since bnodes (in the results of the DESCRIBE query) cannot be used in subsequent SPARQL queries.
So the question is: how can I get a closed description of a resource <uri1> without doing:
query DESCRIBE <uri1>
iterate over the result to determine which objects are blank nodes
then DESCRIBE ?b WHERE { <uri1> pred_relating_to_bnode_ ?b }
do it recursively and chaining over as long as bnodes are found
If I'm not mistaken, depth-2 bnodes would have to be described with
DESCRIBE ?b2 WHERE { <uri1> <p1> ?b . ?b <p2> ?b2 }
unless there is a simpler way to do this?
Finally, would it not be better and simpler to let DESCRIBE return a closed description of a resource where you can still obtain the currently returned result with something like the following?
CONSTRUCT {<uri1> ?p ?o} WHERE {<uri1> ?p ?o}
EDIT: here is an example of a closed result I want to get back from Sesame
<urn:sites#1> a my:WebSite .
<urn:sites#1> my:domainName _:autos1 .
<urn:sites#1> my:online "true"^^xsd:boolean .
_:autos1 a rdf:Alt .
_:autos1 rdf:_1 _:autos2 .
_:autos2 my:url "192.168.2.111:15001"@fr .
_:autos2 my:url "192.168.2.111:15002"@en .
Currently, DESCRIBE <urn:sites#1> returns the same result as the query CONSTRUCT WHERE {<urn:sites#1> ?p ?o}, so I get only this:
<urn:sites#1> a my:WebSite .
<urn:sites#1> my:domainName _:autos1 .
<urn:sites#1> my:online "true"^^xsd:boolean .

Partial solutions using SPARQL
Based on your comments, this isn't an exact solution yet, but note that you can describe multiple things in a given describe query. For instance, given the data:
@prefix : <http://example.org/> .
:Alice :named "Alice" ;
:likes :Bill, [ :named "Carl" ;
:likes [ :named "Daphne" ]].
:Bill :likes :Elaine ;
:named "Bill" .
you can run the query:
PREFIX : <http://example.org/>
describe :Alice ?object where {
:Alice :likes* ?object .
FILTER( isBlank( ?object ) )
}
and get the results:
@prefix : <http://example.org/> .
:Alice
:likes :Bill ;
:likes [ :likes [ :named "Daphne"
] ;
:named "Carl"
] ;
:named "Alice" .
That's not a complete description of course, because it's only following :likes out from :Alice, not arbitrary predicates. But it does get the blank nodes named "Carl" and "Daphne", which is a start.
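One possible way to lift the restriction to :likes is the wildcard-path idiom (:|!:), which matches any single predicate because every IRI either is : or is not :. This is only a sketch, assuming the endpoint supports SPARQL 1.1 property paths. The path can also pass through IRI nodes such as :Bill, so on other data it may pick up blank nodes that a strict CBD would leave out:

```sparql
PREFIX : <http://example.org/>

# Sketch only: (:|!:) matches any one predicate, so (:|!:)* follows
# arbitrary predicates outward from :Alice.
# Caveat: the path is not stopped at IRI nodes, so this can
# over-approximate a concise bounded description.
DESCRIBE :Alice ?object WHERE {
  :Alice (:|!:)* ?object .
  FILTER( isBlank( ?object ) )
}
```

On the sample data above this should reach the same two blank nodes ("Carl" and "Daphne") as the :likes* version, since they are the only blank nodes reachable from :Alice.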
The larger issue in Sesame
It looks like you're going to have to do something like what's described above, possibly with multiple queries, or you're going to have to modify Sesame. The alternative to writing some creative SPARQL is to change the way that Sesame implements DESCRIBE queries. Some endpoints make this relatively easy, but Sesame doesn't seem to be one of them. There's a mailing list thread from 2011, Custom SPARQL DESCRIBE Implementation, that seems to address this same problem.
Roberto García asks:
I'm trying to customise the behaviour of SPARQL DESCRIBE queries.
I'm willing to get something similar to CBD (i.e. all properties and
values for the described resource plus all properties and values for
the blank nodes connected to it).
I have tried to reproduce a similar behaviour using a CONSTRUCT query
but the performance is not good and the query gets quite complex if I
try to consider long chains of properties pointing to blank nodes
starting from the described resource.
Jeen Broekstra replies:
The implementation of DESCRIBE in Sesame is hardcoded in the query
parser. It can only be changed by adapting the parser itself, and even
then it will be tricky, as the query model has no easy way to express it
either: it needs an extension of the algebra.
> If this is not possible, any advice about how to implement it using CONSTRUCT
queries?
I'm not sure it's technically possible to do this in a single query.
CBDs are recursive in nature, and while SPARQL does have some support
for recursivity (property chains), the problem is that you have to do an
intermediate check in every step of the property chain to see if the
bound value is a blank node or not. This is not something that SPARQL
supports out of the box: property chains are defined to have only length
of the path as the stop condition.
Perhaps something is possible using a convoluted combination of
subqueries, unions and optionals, but I doubt it.
I think the best workaround is instead to use the standard DESCRIBE
format that Sesame supports, and for each blank node value in that
result do a separate consecutive query. In other words: you solve it by
hand.
The only other option is to log a feature request for support of CBDs in
Sesame. I can't give any guarantees about if/when that will be followed
up on though.
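The "by hand" workaround from that reply could look roughly like the following follow-up query, run after the plain DESCRIBE <urn:sites#1>. It is a sketch, not something taken from the thread: it fetches the triples of blank nodes one hop away from the resource and repeats the linking triple in the template so that each result is self-contained.

```sparql
# Follow-up query for the manual workaround (illustrative sketch):
# fetch everything attached to blank nodes directly linked from <urn:sites#1>.
CONSTRUCT {
  <urn:sites#1> ?link ?b .
  ?b ?p ?o .
}
WHERE {
  <urn:sites#1> ?link ?b .
  ?b ?p ?o .
  FILTER( isBlank( ?b ) )
}
```

For deeper levels you would add one more hop per level (for example ?b ?link2 ?b2 . ?b2 ?p ?o with isBlank on both ?b and ?b2) and stop once a level returns nothing. Going through the path from <urn:sites#1> each time avoids having to refer to a blank node by the label it had in an earlier result.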

Related

How to retrieve anonymous subclass from OWL in SPARQL

I am unsure of the SPARQL query needed to replicate the results of DL Query "has part some benzamide".
That query should return all entities that have some part that is a benzamide or a subclass of benzamide.
My SPARQL attempt:
PREFIX opioid: <https://mac389.github.io/ontology#>
SELECT ?substance ?substance_label
{ ?substance rdfs:subClassOf* opioid:chemical_entity.
?substance rdfs:subClassOf* / opioid:has_part / owl:someValuesFrom opioid:benzamide.
?substance rdfs:label ?substance_label }
Link to OWL as RDF/XML
In this code the 1st and 3rd lines work as intended, retrieving a list of all chemical entities and their labels. When I add the second line, the query returns no answers (there should be 21 items which is what the DL Query in Protege returns).
How does one query for anonymous subclasses like this?
I have looked at this question but I am looking for a subclass that only fulfills one of the property restrictions. I don't fully understand the answer to this question, but mine seems very related.
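For what it's worth, one reason the second line matches nothing is that in the RDF serialization of OWL, has_part is never a predicate between classes. It appears as the object of owl:onProperty on an anonymous owl:Restriction node. A minimal sketch of a pattern that matches that shape is below. It assumes the ontology asserts the restriction directly as an rdfs:subClassOf superclass and that no reasoner is involved, so it will miss restrictions nested inside owl:equivalentClass or owl:intersectionOf:

```sparql
PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:    <http://www.w3.org/2002/07/owl#>
PREFIX opioid: <https://mac389.github.io/ontology#>

SELECT DISTINCT ?substance ?substance_label
WHERE {
  ?substance rdfs:subClassOf* opioid:chemical_entity .
  # ?restriction is the anonymous class "has_part some ?filler",
  # asserted on ?substance or on one of its ancestors.
  ?substance rdfs:subClassOf+ ?restriction .
  ?restriction owl:onProperty opioid:has_part ;
               owl:someValuesFrom ?filler .
  # Accept benzamide itself or any of its subclasses as the filler.
  ?filler rdfs:subClassOf* opioid:benzamide .
  ?substance rdfs:label ?substance_label .
}
```

Whether this returns the same 21 classes as the DL Query depends on how much of that answer comes from inference rather than from asserted axioms.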

Querying DBpedia-Live with SPARQL does not give same answer as DBpedia

I want to query DBpedia using the DBpedia Live endpoint.
I have this query :
SELECT *
WHERE {
?x a dbo:Person .
?x rdfs:label "Usain Bolt"@en .
}
This query gives the correct answer with most names I tried (for example "Teddy Riner"@en), but it fails with Usain Bolt and Rachid Badouri.
I don't understand why, since their DBpedia pages (Teddy Riner, Usain Bolt) are constructed the same way: both have an rdfs:label written exactly as in my query.
It seems to me that there is an inconsistency between the endpoint and DBpedia, but I don't think it's because the endpoint is out of date.
Even more surprising, this query gives the correct answer:
SELECT *
WHERE {
?x rdfs:label "Usain Bolt"@en .
}
However, Usain Bolt is a dbo:Person! Same thing for Rachid Badouri.
Could someone explain to me why the first query does not return an answer?
Any help would be appreciated! Thanks
According to DBpedia-Live, at the time of writing, the entity with rdfs:label "Usain Bolt"@en has many types, but is not a dbo:Person. The same holds for the entity with rdfs:label "Rachid Badouri"@en.
In contrast, the entity with rdfs:label "Teddy Riner"@en is a dbo:Person.
Note: DBpedia-Live content is a moving target, varying with Wikipedia content changes, adjustments in the templates, and other variables. The statements I made above may no longer be true when you read this.
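One way to check this yourself is to ask the endpoint which types it currently holds for the labelled entity. A small sketch (the result will vary over time, for the reason given in the note above):

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# List every rdf:type the endpoint currently asserts for the entity
# labelled "Usain Bolt"@en; dbo:Person may or may not be among them.
SELECT ?x ?type
WHERE {
  ?x rdfs:label "Usain Bolt"@en ;
     a ?type .
}
```

If dbo:Person is missing from the ?type column, the first query in the question cannot match, while the label-only query still does.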

Complex SPARQL query - Virtuoso performance hints?

I have a rather complex SPARQL query, which is executed thousands of times in parallel threads (400 threads). The query is here somewhat simplified (namespaces, properties, and variables have been reduced) for readability, but the complexity is left untouched (unions, number of graphs, etc.). The query is run against 4 graphs, the biggest of which contains 5,561,181 triples.
PREFIX graphA: <GraphABaseURI:>
ASK
FROM NAMED <GraphBURI>
FROM NAMED <GraphCURI>
FROM NAMED <GraphABaseURI>
FROM NAMED <GraphDBaseURI>
WHERE{
{
GRAPH <GraphABaseURI>{
?variableA a graphA:ClassA .
?variableA graphA:propertyA ?variableB .
?variableB dcterms:title ?variableC .
?variableA graphA:propertyB ?variableD .
?variableL <GraphABaseURI:propertyB> ?variableD .
?variableD <propertyBURI> ?variableE
}
.
GRAPH <GraphBURI>{
?variableF <propertyCURI>/<propertyDURI> ?variableG .
?variableF <propertyEURI> ?variableH
}
.
GRAPH <GraphCURI>{
?variableI <http://www.w3.org/2004/02/skos/core#notation> ?variableJ .
?variableI <http://www.w3.org/2004/02/skos/core#prefLabel> ?variableK .
FILTER (isLiteral(?variableK) && REGEX(?variableK, "literalA", "i"))
}
.
FILTER (isLiteral(?variableJ) && ?variableG = ?variableJ) .
FILTER (?variableE = ?variableH)
}
UNION
{
GRAPH <GraphABaseURI>{
?variableA a graphA:ClassA .
?variableA graphA:propertyA ?variableB .
?variableB dcterms:title ?variableC .
?variableA graphA:propertyB ?variableD .
?variableL <propertyBURI> ?variableE .
?variableL <propertyFURI> ?variableD .
}
.
GRAPH <GraphDBaseURI>{
?variableM <propertyGURI> ?variableN .
?variableM <propertyHURI> ?variableO .
FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i"))
}
.
FILTER (?variableE = ?variableN) .
}
UNION
{
GRAPH <GraphABaseURI>{
?variableA a graphA:ClassA .
?variableA graphA:propertyA ?variableB .
?variableB dcterms:title ?variableC .
?variableA graphA:propertyB ?variableD .
?variableL <propertyBURI> ?variableE .
?variableL <propertyIURI> ?variableD .
}
.
GRAPH <GraphDBaseURI>{
?variableM <propertyGURI> ?variableN .
?variableM <propertyHURI> ?variableO .
FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i"))
}
.
FILTER (?variableE = ?variableN) .
}
. FILTER (isLiteral(?variableC) && REGEX(?variableC, "literalB", "i")) .
}
I would not expect someone to transform the above query (of course...). I am only posting the query to demonstrate the complexity and all the SPARQL structures used.
My questions:
Would I gain regarding performance if I had all my triples in one graph? This way I would avoid unions and simplify my query, however, would this also benefit in terms of performance?
Are there any kinds of indexes that I could build that would help with the above query? I am not really confident about data indexing; however, having read the RDF Index Scheme section of RDF Performance Tuning, I wonder if Virtuoso 7's default indexing scheme is suitable for queries like the above. While the predicates are defined in the above query's SPARQL triple patterns, there are many triple patterns that do not have a defined subject or predicate. Could this be a major problem regarding performance?
Perhaps there is a SPARQL syntax structure that I am not aware of and could be of great help in the above query. Could you suggest something? For example, I have already improved performance by removing STR() casts and using the isLiteral() function. Could you suggest anything else?
Perhaps you could suggest overusing a complex SPARQL syntax structure?
Please note that I use Virtuoso Open source edition, built on Ubuntu, Version: 07.20.3214, Build: Oct 14 2015.
Regards,
Pantelis Natsiavas
First thing -- your Virtuoso build is long outdated; updating to 7.20.3217 as of April 2016 (or later) is strongly recommended.
Optimization suggestions are naturally limited when looking at a simplified query, but here are several thoughts, in no particular order...
Index Scheme Selection, the RDF Performance Tuning doc section following RDF Index Scheme, offers a couple of alternative and/or additional indexes which may make sense for your queries and data. As you say that some of your patterns will have defined graph and object, and undefined subject and predicate, some other indexes may also make sense (e.g., GOPS, GOSP), depending on some other factors.
Depending on how much your data has changed since original load, it may be worth rebuilding the free-text indexes, with this SQL command (which may be issued through any SQL interface -- iSQL, ODBC, JDBC, etc.) —
VT_INC_INDEX_DB_DBA_RDF_OBJ ()
Using the bif:contains predicate can result in substantially better performance than regex() filters, for instance replacing —
FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i")) .
— with —
?variableO bif:contains "'literalA'" .
FILTER ( isLiteral(?variableO) ) .
Explain() and profile() can be helpful in query optimization efforts. Much of this output is meant for analysis by Development, so it may not mean much to you, but providing it to other Virtuoso users can still yield helpful suggestions.
For a number of reasons, the rdf:type predicate (often expressed as a, thanks to SPARQL/Turtle syntactic sugar) can be a performance killer. Removing those predicates from your graph pattern is likely to boost performance substantially. If needed, there are other ways to limit the solution set (such as by testing for attributes only possessed by entities of your desired rdf:type) which do not have such negative performance impacts.
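As an illustration of that last point, here is a trimmed fragment of the first branch with the type test dropped. It is a sketch under one assumption that only you can verify against your data: that graphA:propertyA occurs only on instances of graphA:ClassA, so the remaining pattern already restricts ?variableA to that class. The dcterms prefix is spelled out here, although Virtuoso may predefine it:

```sparql
PREFIX graphA:  <GraphABaseURI:>
PREFIX dcterms: <http://purl.org/dc/terms/>

ASK
FROM NAMED <GraphABaseURI>
WHERE {
  GRAPH <GraphABaseURI> {
    # "?variableA a graphA:ClassA ." removed. Assumed redundant because
    # only ClassA instances carry graphA:propertyA (hypothetical; check your data).
    ?variableA graphA:propertyA ?variableB .
    ?variableB dcterms:title ?variableC .
  }
}
```

If the assumption does not hold, the two forms are not equivalent and the type test has to stay.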
(ObDisclaimer: OpenLink Software produces Virtuoso, and employs me.)

How to execute SPARQL Query (Call a service) Over extracted subgraph?

I have a RDF graph with several types of relations (relations with the same prefix and with different prefixes also). I need to call a service over the graph but filtering out some relations.
Example:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix myPref: <http://www.myPref.com/>.
@prefix otherPref: <http://www.otherPref.com/>.
myPref:1
myPref:label "1" ;
myPref:solid myPref:2 ;
myPref:dotted myPref:4 ;
otherPref:dashed myPref:3 ;
otherPref:dashed2 myPref:3 .
myPref:2
myPref:label "2" ;
myPref:solid myPref:3 .
myPref:3
myPref:label "3" .
myPref:4
myPref:label "4" ;
myPref:dotted myPref:3 .
I would like to run the service call over an extracted subgraph containing only the solid and dotted relations (in this particular case, I am running a service that calculates the shortest path from 1 to 3, and I want to exclude the direct dashed links).
I run the service (Over the entire graph) like this:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myPref: <http://www.myPref.com/>
PREFIX otherPref: <http://www.otherPref.com/>
PREFIX gas: <http://www.bigdata.com/rdf/gas#>
SELECT ?sp ?out {
SERVICE gas:service {
gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.SSSP" .
gas:program gas:in myPref:1 .
gas:program gas:target myPref:3 .
gas:program gas:out ?out .
gas:program gas:out1 ?sp .
}
}
How can I extract a subgraph containing only the links I want (dotted and solid) and then run the service call over the extracted subgraph?
SPARQL doesn't provide any functionality for querying a constructed graph, unfortunately. I've come across places where it would make some queries very easy. Some endpoints do have extensions to support it, though; I think that dotNetRDF might support it. There are probably a few aspects:
in many cases, it's not actually necessary;
if the endpoint supports updates, you can create a new named graph and construct into it, and then launch a second query against it (which is pretty much what you're asking for, but in two steps);
this could be a very expensive operation, so endpoints might disable it anyway, even if it was directly supported.
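The two-step route through a named graph could start with an update like the one below. This is a sketch that assumes the endpoint accepts SPARQL 1.1 Update. The graph name <urn:example:subgraph> is made up for illustration:

```sparql
PREFIX myPref: <http://www.myPref.com/>

# Step 1 (SPARQL Update): copy only the solid and dotted edges
# into a scratch graph. <urn:example:subgraph> is a hypothetical name.
INSERT { GRAPH <urn:example:subgraph> { ?s ?p ?o } }
WHERE {
  ?s ?p ?o .
  FILTER( ?p IN ( myPref:solid, myPref:dotted ) )
}
```

Step 2 is then a separate query against that graph. How to point a service call such as gas:service at a particular graph is endpoint-specific and is not covered here. Afterwards, DROP GRAPH <urn:example:subgraph> removes the scratch graph.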
The first point, though, that it's often not necessary, may well apply here.
I need to call a service over the graph but filtering out some relations.
In this case, you can query over the subgraph that you want, I think, by using property paths. You can ask for paths built from just solid and dotted edges like:
?s myPref:solid|myPref:dotted ?t
If you want an arbitrary path of them, you can repeat it:
?s (myPref:solid|myPref:dotted)+ ?t
If you have unique paths between sources and destinations, then you can figure out the lengths of paths using the standard "count the ways of splitting the path" technique:
select ?s ?u (count(?t) as ?length) {
?s (myPref:solid|myPref:dotted)* ?t .
?t (myPref:solid|myPref:dotted)+ ?u .
}
group by ?s ?u

OWL inferencing question

I am using the Jena semantic web framework version 2.6.3. I have code that creates a model with owl inferencing and then adds the following triples:
_:bnode-3 rdf:type owl:Restriction .
_:bnode-3 owl:onProperty :offspringOf .
_:bnode-3 owl:someValuesFrom :Person .
_:bnode-3 rdfs:subClassOf :Person .
_:bnode-3 is supposed to be a restriction class which, for example, would contain :joe if :bob is a :Person and the following triple were asserted:
:joe :offspringOf :bob .
Then, since the restriction class is a subclass of Person, :joe would also be a person.
And, in fact, this works. What's confusing to me is that after I assert just the 4 triples at the top of this post, the inferencer creates a blank node which is a Person. In other words, the following triple is now in the model:
_:b0 rdf:type :Person
I don't understand why it would do this. Any help in understanding this would be greatly appreciated.
Thanks.
Kent.
I am not sure why the inferencer would do this as I am not an OWL expert - have you tried asking your question on the jena-users lists?
They will usually answer you pretty promptly and they should know why you get the observed behaviour.
Note
I reformatted your question as your code samples were somewhat confusing - please don't write out Triples as [ex:subject ex:predicate ex:object] since it looks rather like some syntactic sugar in Turtle/N3/SPARQL which would result in additional Blank Nodes being created beyond just those you intended