Extracting a chain of instances connecting two instances through a relation

I want to extract the chain of instances between two instances of my ontology using a SPARQL query. For example, in the following figure, if I want to know how A is connected to E, the result of the query should be something like the list A, B, D, F, E.
How should the ontology be designed, and how should the query be built?
Is it even possible?

This isn't too hard. In RDF, your data can be something as simple as a direct encoding of the graph:
@prefix : <urn:ex:> .
:A :connectedTo :B .
:B :connectedTo :C, :D .
:D :connectedTo :F .
:F :connectedTo :E, :G .
Then, using SPARQL property paths, you can find every node such that there's a path of connectedTo properties from A to it and from it to E, including A and E themselves:
prefix : <urn:ex:>
select ?mid where {
:A :connectedTo* ?mid .
?mid :connectedTo* :E .
}
-------
| mid |
=======
| :D |
| :F |
| :B |
| :A |
| :E |
-------
If you want to get those in order, you can additionally count how many things are between A and the "mid-node". (This is described in my answer to Is it possible to get the position of an element in an RDF Collection in SPARQL?)
prefix : <urn:ex:>
select ?mid (count(?premid) as ?i) where {
:A :connectedTo* ?premid .
?premid :connectedTo* ?mid .
?mid :connectedTo* :E .
}
group by ?mid
-----------
| mid | i |
===========
| :D | 3 |
| :F | 4 |
| :E | 5 |
| :B | 2 |
| :A | 1 |
-----------
If you actually want a single result that looks more or less like "A, B, D, F, E", then you can adapt these queries using the techniques from my answer to Aggregating results from SPARQL query, which shows how to concatenate the values into a single string.
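As a rough sketch of that last step, the ordered query can be wrapped in a subquery and its results concatenated with group_concat. The ?name and ?chain variables are names made up for illustration, and strafter is used only to strip the urn:ex: prefix from each IRI. Note that SPARQL doesn't guarantee the order in which group_concat sees its input, so treat the ordering as implementation-dependent:

```sparql
prefix : <urn:ex:>
select (group_concat(?name; separator=", ") as ?chain) where {
  #-- inner query: the nodes between :A and :E,
  #-- with their position ?i, sorted by position
  {
    select ?mid (count(?premid) as ?i) where {
      :A :connectedTo* ?premid .
      ?premid :connectedTo* ?mid .
      ?mid :connectedTo* :E .
    }
    group by ?mid
    order by ?i
  }
  #-- turn each IRI into a short name, e.g. <urn:ex:A> to "A"
  bind(strafter(str(?mid), "urn:ex:") as ?name)
}
```

Many engines happen to preserve the subquery's order here, in which case the result reads A, B, D, F, E, but it's worth checking on your own endpoint.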

Related

Is there a "DISTINCT ON" equivalent in SPARQL?

My data is basically an event log in RDF. I have cases and events, the latter belong to the former. Events have timestamps and an actor who triggered them.
For each case I now need the latest event, when it happened, and who triggered it.
This is roughly my current query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT ?case ?event ?timestamp ?actor
WHERE {
?case rdf:type ex:Case ;
ex:hasEvent ?event .
?event ex:timestamp ?timestamp ;
ex:hasActor ?actor .
}
ORDER BY ASC(?case) DESC(?timestamp)
Which yields something like this:
| case | event | timestamp | actor |
=================================================================================
| ex:case1 | ex:event1 | "2020-01-01T02:00:00Z"^^xsd:dateTimeStamp | ex:Alice |
| ex:case1 | ex:event2 | "2020-01-01T01:00:00Z"^^xsd:dateTimeStamp | ex:Bob |
| ex:case2 | ex:event3 | "2020-01-01T03:00:00Z"^^xsd:dateTimeStamp | ex:Charlie |
| ex:case2 | ex:event4 | "2020-01-01T02:00:00Z"^^xsd:dateTimeStamp | ex:Dan |
However, I would like to get only the first and third rows, as they correspond to the latest event for each case, like this:
| case | event | timestamp | actor |
=================================================================================
| ex:case1 | ex:event1 | "2020-01-01T02:00:00Z"^^xsd:dateTimeStamp | ex:Alice |
| ex:case2 | ex:event3 | "2020-01-01T03:00:00Z"^^xsd:dateTimeStamp | ex:Charlie |
To achieve this, I tried using SELECT ?case ?event (MAX(?timestamp) AS ?latest) ?actor combined with GROUP BY ?case. However, SPARQL complains that I need to group by ?event and ?actor as well, which is of course not what I want.
I am aware that PostgreSQL has DISTINCT ON which would solve my problem, but I need to do it in SPARQL. Is there a nice way to achieve this?
Self-answer based on @UninformedUser's comment:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT ?case ?event (?latest AS ?timestamp) ?actor WHERE {
  # Match each case's events against the latest timestamp for that case.
  ?case ex:hasEvent ?event .
  ?event ex:timestamp ?latest ;
         ex:hasActor ?actor .
  # Subquery: compute the latest timestamp for every case.
  { SELECT ?case (MAX(?timestamp) AS ?latest) {
      ?case rdf:type ex:Case ;
            ex:hasEvent ?event .
      ?event ex:timestamp ?timestamp }
    GROUP BY ?case }
}
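Another way to get a "DISTINCT ON"-like result, sketched here as an alternative rather than as part of the self-answer, is to keep an event only if no later event exists for the same case. The ?later and ?laterTimestamp variables are illustrative names:

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex: <http://example.org/>
SELECT ?case ?event ?timestamp ?actor
WHERE {
  ?case rdf:type ex:Case ;
        ex:hasEvent ?event .
  ?event ex:timestamp ?timestamp ;
         ex:hasActor ?actor .
  # Discard this event if the same case has an event with a later timestamp.
  FILTER NOT EXISTS {
    ?case ex:hasEvent ?later .
    ?later ex:timestamp ?laterTimestamp .
    FILTER (?laterTimestamp > ?timestamp)
  }
}
ORDER BY ?case
```

As with the MAX-based subquery, a case whose two latest events share exactly the same timestamp would still produce two rows.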

How can I retrieve the relations between two classes via object properties using a SPARQL query?

Assume we have an OWL file which contains the following properties with domains and ranges:
Domain Property Range
----------------------------
tour hascountry country
country hascity city
city hasward ward
ward hashouse house
Using SPARQL, how can I get results "between" the Tour and House classes? That is, properties whose domains and ranges are such that there's a "path" from Tour to the domain and from the range to House. With just these two classes, how could we find results like the following? It seems like some kind of loop might be necessary, but I don't know how to do that in SPARQL.
|tour -------- (hascountry) ----- country|
|country -------- (hascity) ----- city |
|city -------- (hasward) ----- ward |
|ward -------- (hashouse) ----- house |
First, it's always easier to work with some real data. We can't write real SPARQL queries against data that we don't have. In the future, please be sure to provide some sample that we can work with. For now, here's some sample data that describes the domains and ranges of the properties that you mentioned. Also note that properties don't connect classes; properties connect individuals. Properties can have domains and ranges, and that provides us with a way to infer additional information about the individuals that are related by the property. Anyhow, here's the data:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix : <https://stackoverflow.com/q/29737549/1281433/> .
:hascountry rdfs:domain :tour ; rdfs:range :country .
:hascity rdfs:domain :country ; rdfs:range :city .
:hasward rdfs:domain :city ; rdfs:range :ward .
:hashouse rdfs:domain :ward ; rdfs:range :house .
Now, note that you could get from :tour to :country if you follow the rdfs:domain property backward to :hascountry, and then follow the rdfs:range property forward to :country. In SPARQL, you can write that as a property path:
:tour ^rdfs:domain/rdfs:range :country
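As a quick sanity check, that pattern can be dropped into an ASK query. This is just a sketch against the sample data above; it should come back true whenever some property has :tour as its domain and :country as its range:

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix : <https://stackoverflow.com/q/29737549/1281433/>
#-- ^rdfs:domain steps backward from the class to a property,
#-- then rdfs:range steps forward from that property to its range
ask { :tour ^rdfs:domain/rdfs:range :country }
```

Note that ^ binds more tightly than /, so the path reads as (^rdfs:domain)/rdfs:range.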
If you can follow chains of that property path, you can find all the properties that are "between" :tour and :house:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix : <https://stackoverflow.com/q/29737549/1281433/>
select ?domain ?property ?range where {
#-- find ?properties that have a domain
#-- and range...
?property rdfs:domain ?domain ;
rdfs:range ?range .
#-- where there's a ^rdfs:domain to rdfs:range
#-- chain from :tour to ?domain...
:tour (^rdfs:domain/rdfs:range)* ?domain .
#-- and from ?range to :house.
?range (^rdfs:domain/rdfs:range)* :house .
}
-------------------------------------
| domain | property | range |
=====================================
| :ward | :hashouse | :house |
| :city | :hasward | :ward |
| :country | :hascity | :city |
| :tour | :hascountry | :country |
-------------------------------------
Getting the results "in order"
If you want the properties "in order" from the start class to the end class, you can compute the distance from the start class to each property and order by that. You can do that using the technique in my answer to Is it possible to get the position of an element in an RDF Collection in SPARQL?. Here's what it looks like as a SPARQL query:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix : <https://stackoverflow.com/q/29737549/1281433/>
select ?domain ?property ?range
(count(?mid) as ?dist)
where {
#-- find ?properties that have a domain
#-- and range...
?property rdfs:domain ?domain ;
rdfs:range ?range .
#-- where there's a ^rdfs:domain to rdfs:range
#-- chain from :tour to ?domain...
:tour (^rdfs:domain/rdfs:range)* ?domain .
#-- and from ?range to :house.
?range (^rdfs:domain/rdfs:range)* :house .
#-- then, compute the "distance" from :tour
#-- to the property. This is based on binding
#-- ?mid to each class in between them and
#-- taking the number of distinct ?mid values
#-- as the distance.
:tour (^rdfs:domain/rdfs:range)* ?mid .
?mid (^rdfs:domain/rdfs:range)* ?domain .
}
group by ?domain ?property ?range
order by ?dist
--------------------------------------------
| domain | property | range | dist |
============================================
| :tour | :hascountry | :country | 1 |
| :country | :hascity | :city | 2 |
| :city | :hasward | :ward | 3 |
| :ward | :hashouse | :house | 4 |
--------------------------------------------
I included ?dist in the select just so we could see the values. You don't have to select it in order to sort by it. You can do this too:
select ?domain ?property ?range {
#-- ...
}
group by ?domain ?property ?range
order by count(?mid)
-------------------------------------
| domain | property | range |
=====================================
| :tour | :hascountry | :country |
| :country | :hascity | :city |
| :city | :hasward | :ward |
| :ward | :hashouse | :house |
-------------------------------------

SPARQL query: the closest blond ancestor

Let us consider two classes: Person and its subclass BlondPerson.
Let us consider a relationship isParent, where a Person is the parent of another Person.
Let us define the relationship isAncestor, which holds where there is a sequence of isParent relationships.
There might be many BlondPerson ancestors of mine.
My question: how do I write a SPARQL query that finds the closest ancestor who is blond? "Closest" means my parent if possible; if not, a grandparent; otherwise a great-grandparent, and so on.
How do I compose a SPARQL query for that? How can I ensure that I get the ancestor who is the closest one?
Thank you.
This isn't too hard; you can use the same technique demonstrated in Is it possible to get the position of an element in an RDF Collection in SPARQL?. The idea is essentially to treat the ancestry as a sequence, from which you can get the "closeness" of each ancestor, and select the closest ancestor from a given class. If we create some sample data, we end up with something like this:
@prefix : <urn:ex:> .
:a a :Person ; :hasParent :b .
:b a :Person ; :hasParent :c .
:c a :Person, :Blond ; :hasParent :d .
:d a :Person, :Blond ; :hasParent :e .
:e a :Person .
prefix : <urn:ex:>
select distinct
?person
?ancestor
(count(distinct ?mid) as ?closeness)
?isBlond
where {
values ?person { :a }
?person :hasParent+ ?mid .
?mid a :Person .
?mid :hasParent* ?ancestor .
?ancestor a :Person .
bind( if( exists { ?ancestor a :Blond }, true, false ) as ?isBlond )
}
group by ?person ?ancestor ?isBlond
order by ?person ?closeness
-------------------------------------------
| person | ancestor | closeness | isBlond |
===========================================
| :a | :b | 1 | false |
| :a | :c | 2 | true |
| :a | :d | 3 | true |
| :a | :e | 4 | false |
-------------------------------------------
That's actually more information than we needed; I just included it to show how this works. Now we can simply require that ?ancestor is blond, order by closeness, and limit the results to the first (and thus the closest):
prefix : <urn:ex:>
select distinct
?person
?ancestor
(count(distinct ?mid) as ?closeness)
where {
values ?person { :a }
?person :hasParent+ ?mid .
?mid a :Person .
?mid :hasParent* ?ancestor .
?ancestor a :Person, :Blond .
}
group by ?person ?ancestor
order by ?person ?closeness
limit 1
---------------------------------
| person | ancestor | closeness |
=================================
| :a | :c | 2 |
---------------------------------
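If you don't need the closeness number itself, a different technique (sketched here as an alternative, not the counting approach above) is to keep a blond ancestor only when no other blond ancestor sits between it and the person. The ?closer variable is an illustrative name:

```sparql
prefix : <urn:ex:>
select ?person ?ancestor where {
  values ?person { :a }
  ?person :hasParent+ ?ancestor .
  ?ancestor a :Blond .
  #-- reject ?ancestor if some blond ancestor of ?person
  #-- is a descendant of ?ancestor (i.e., is closer)
  filter not exists {
    ?person :hasParent+ ?closer .
    ?closer a :Blond ;
            :hasParent+ ?ancestor .
  }
}
```

On the sample data this keeps :c and rejects :d, because :c is blond and lies between :a and :d. With more than one parent per person, it would return the closest blond ancestor along each line of descent rather than a single overall winner.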

tdbloader vs SPARQL INSERT - Why different behaviour with named graphs?

There is some strange behaviour in the interaction between the ARQ and TDB command-line tools and named graphs. If data is imported into a named graph via tdbloader, it cannot be queried with a GRAPH clause in a SPARQL SELECT query. However, the query works when the data is inserted into the same graph with SPARQL INSERT.
I have the following assembler description file, tdb.ttl:
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .
[] rdf:type tdb:DatasetTDB ;
tdb:location "DB" ;
.
There is a dataset in the file data.ttl:
<a> <b> <c>.
Now I insert this data with tdbloader and then another triple with SPARQL INSERT, both into the named graph data:
tdbloader --desc tdb.ttl --graph data data.ttl
update --desc tdb.ttl "INSERT DATA {GRAPH <data> {<d> <e> <f>.}}"
Now, the data can be queried with SPARQL via:
$ arq --desc tdb.ttl "SELECT * WHERE{ GRAPH ?g {?s ?p ?o.}}"
----------------------------
| s | p | o | g |
============================
| <a> | <b> | <c> | <data> |
| <d> | <e> | <f> | <data> |
----------------------------
Everything seems perfect. But now I want to query only this specific named graph data:
$ arq --desc tdb.ttl "SELECT * WHERE{ GRAPH <data> {?s ?p ?o.}}"
-------------------
| s | p | o |
===================
| <d> | <e> | <f> |
-------------------
Why is the data imported from tdbloader missing? What is wrong with this query? How can I get results back from both imports?
Try this query:
PREFIX : <data>
SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }
and the output is
----------------------------
| s | p | o | g |
============================
| <a> | <b> | <c> | <data> |
| <d> | <e> | <f> | : |
----------------------------
or try:
tdbquery --loc DB --file Q.rq -results srj
to get the results in a different form.
The text output is making things look nice, but two different things end up displayed as <data>.
What you are seeing is that
tdbloader --desc tdb.ttl --graph data data.ttl
used data exactly as is to name the graph. But
INSERT DATA {GRAPH <data> {<d> <e> <f>.}}
does a full SPARQL parse, and resolves against the base URI, probably looking like file://*currentdirectory*.
When printing in text, URIs get abbreviated, including using the base. So both the original data (from tdbloader) and file:///path/data appear as <data>.
PREFIX : <data>
gives the text output a different way to write it as :.
Finally try:
BASE <http://example/>
SELECT * { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }
which sets the base URI to something nowhere near your data URIs, thus switching off the nice formatting by base URI:
----------------------------------------------------------------------------------------------------------------
| s | p | o | g |
================================================================================================================
| <file:///home/afs/tmp/a> | <file:///home/afs/tmp/b> | <file:///home/afs/tmp/c> | <data> |
| <file:///home/afs/tmp/d> | <file:///home/afs/tmp/e> | <file:///home/afs/tmp/f> | <file:///home/afs/tmp/data> |
----------------------------------------------------------------------------------------------------------------
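If you'd rather not depend on how the text output abbreviates IRIs, one option (a sketch, not something from the question) is to select the graph name as a plain string, since STR returns the IRI's full lexical form. ?graphName and ?triples are just illustrative variable names:

```sparql
# List every named graph with its unabbreviated name and triple count.
SELECT ?g (STR(?g) AS ?graphName) (COUNT(*) AS ?triples)
WHERE {
  GRAPH ?g { ?s ?p ?o }
}
GROUP BY ?g
```

Because string literals aren't shortened against the base URI, the graph created by tdbloader and the one created by INSERT DATA should show up with visibly different names.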

How to rank values in SPARQL?

I would like to create a ranking of observations using SPARQL. Suppose I have:
@prefix : <http://example.org#> .
:A :value 60 .
:B :value 23 .
:C :value 89 .
:D :value 34 .
The ranking should be: :C = 1 (the highest), :A = 2, :D = 3, :B = 4. Up until now, I was able to solve it using the following query:
prefix : <http://example.org#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?x ?v ?ranking {
?x :value ?v .
{ SELECT (GROUP_CONCAT(?x;separator="") as ?ordered) {
{ SELECT ?x {
?x :value ?v .
} ORDER BY DESC(?v)
}
}
}
BIND (str(?x) as ?xName)
BIND (strbefore(?ordered,?xName) as ?before)
BIND ((strlen(?before) / strlen(?xName)) + 1 as ?ranking)
} ORDER BY ?ranking
But that query only works if the URIs for ?x all have the same length. A better solution would be to have a function similar to strpos in PHP or indexOf in Java, but as far as I know, no such function is available in SPARQL 1.1. Are there simpler solutions?
One way of doing this is to take as the ranking for a value the number of values which are greater than or equal to it. This might be inefficient for larger data sets, since for each value it has to check all the other values. It doesn't require string manipulation, though.
PREFIX : <http://example.org#>
SELECT ?x ?v (COUNT(*) as ?ranking) WHERE {
?x :value ?v .
[] :value ?u .
FILTER( ?v <= ?u )
}
GROUP BY ?x ?v
ORDER BY ?ranking
---------------------
| x | v | ranking |
=====================
| :C | 89 | 1 |
| :A | 60 | 2 |
| :D | 34 | 3 |
| :B | 23 | 4 |
---------------------
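One thing to watch with the <= version is ties: two observations with the same value count each other, so both end up with the worse of the two positions. A possible variant, sketched here under the assumption that tied values should share the better rank, counts only strictly greater values and adds one. The OPTIONAL is there so that the top value, which has nothing above it, still gets a row:

```sparql
PREFIX : <http://example.org#>
SELECT ?x ?v (COUNT(?u) + 1 AS ?ranking) WHERE {
  ?x :value ?v .
  # Collect every value strictly greater than ?v; may match nothing.
  OPTIONAL {
    [] :value ?u .
    FILTER( ?u > ?v )
  }
}
GROUP BY ?x ?v
ORDER BY ?ranking
```

COUNT(?u) only counts rows where ?u is bound, so the highest value gets 0 + 1 = 1. On the sample data, which has no ties, this should give the same ranking as the query above.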