Length of path between nodes, allowing for multiple relation types (SPARQL) - sparql

I have to write a SPARQL query that returns the length of the path between two nodes (:persA and :persD) that are connected by these relationships:
#prefix : <http://www.example.org/> .
:persA :knows :knowRelation1 .
:knowRelation1 :hasPerson :persB .
:persB :knows :knowRelation2 .
:knowRelation2 :hasPerson :persC .
:persC :knows :knowRelation3 .
:knowRelation3 :hasPerson :persD .
I tried with this query:
PREFIX : <http://www.example.org/>
SELECT (COUNT(?mid) AS ?length)
WHERE
{
:persA (:knows | :hasPerson)* ?mid .
?mid (:knows | :hasPerson)+ :persD .
}
the result seems to be a infinite loop.
Any advice/examples of how this can be done?

After fixing some syntax errors in your initial post, the provided triples and query work for me in GraphDB Free 8.2 and BlazeGraph 2.1.1. I have since applied these edits to your post itself.
added a trailing / to your definition of the empty prefix
added a trailing . to your prefix definition line (required if you want to start with a #)
fixed the spelling of length (OK, that's just a cosmetic fix)
.
#prefix : <http://www.example.org/> .
:persA :knows :knowRelation1 .
:knowRelation1 :hasPerson :persB .
:persB :knows :knowRelation2 .
:knowRelation2 :hasPerson :persC .
:persC :knows :knowRelation3 .
:knowRelation3 :hasPerson :persD .
.
PREFIX : <http://www.example.org/>
SELECT (COUNT(?mid) AS ?length)
WHERE
{ :persA (:knows|:hasPerson)* ?mid .
?mid (:knows|:hasPerson)+ :persD
}
result:
length
"6"^^xsd:integer

Related

Extract synonyms and label from Turtle file using SPARQL

I am in a learning phase of SPARQL. I am working with a Turtle file to extract some information. The condition is: if the exact synonym has a substring 'stroke' or 'Stroke', the query should return all the synonyms and rdfs:label.
I am using below query but getting no output:
prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix obo: <http://purl.obolibrary.org/obo/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Select * where {
?s ?p ?o .
rdfs:label <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "stroke"^^xsd:string
}
Below is the sample Turtle file:
### https://ontology.aaaa.com/aaaa/meddra_10008196
:meddra_10008196
rdf:type owl:Class ;
<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "DOID:6713" , "EFO:0000712" , "EFO:0003763" , "HE:A10008190" ;
<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym>
"(cva) cerebrovascular accident" ,
"Acute Cerebrovascular Accident" ,
"Acute Cerebrovascular Accidents" ,
"Acute Stroke" ,
"Acute Strokes" ;
rdfs:label "Cerebrovascular disorder"#en ;
:hasSocs "Nervous system disorders [meddra:10029205]" , "Vascular disorders [meddra:10047065]" ;
:uid "6e46da69b727e4e924c31027cdf47b8a" .
I am expecting this output:
(cva) cerebrovascular accident
Acute Cerebrovascular Accident
Acute Cerebrovascular Accidents
Acute Stroke
Acute Strokes
Cerebrovascular disorder
With this triple pattern, you are querying for rdfs:label as subject, not as predicate:
rdfs:label <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "stroke"^^xsd:string
What you are asking with this is: "Does the resource rdfs:label have the property oboInOwl:hasExactSynonym with the string value 'stroke'?"
But you want to ask this about the class (e.g., :meddra_10008196), not rdfs:label:
?class oboInOwl:hasExactSynonym "stroke" .
Finding matches
As you don’t want to find only exact string matches, you can use CONTAINS:
?class oboInOwl:hasExactSynonym ?matchingSynonym .
FILTER( CONTAINS(?matchingSynonym, "stroke") ) .
As you want to ignore case, you can query lower-cased synonyms with LCASE:
?class oboInOwl:hasExactSynonym ?matchingSynonym .
FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") ) .
Displaying results
To display the label and all synonyms in the same column, you could use a property path with | (AlternativePath):
?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .
Full query
# [prefixes]
SELECT ?class ?labelOrSynonym
WHERE {
?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .
FILTER EXISTS {
?class oboInOwl:hasExactSynonym ?matchingSynonym .
FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") ) .
}
}

SPARQL property paths based on a new property defined in a CONSTRUCT subquery

Given the following schema, "driver-passenger" lineages can be easily seen:
tp:trip a owl:Class ;
rdfs:label "trip"#en ;
rdfs:comment "an 'asymmetric encounter' where someone is driving another person."#en .
tp:driver a owl:ObjectProperty ;
rdfs:label "driver"#en ;
rdfs:comment "has keys."#en ;
rdfs:domain tp:trip ;
rdfs:range tp:person .
tp:passenger a owl:ObjectProperty ;
rdfs:label "passenger"#en ;
rdfs:comment "has drinks."#en ;
rdfs:domain tp:trip ;
rdfs:range tp:person .
Consider the following data:
<alice> a tp:person .
<grace> a tp:person .
<tim> a tp:person .
<ruth> a tp:person .
<trip1> a tp:trip ;
tp:participants <alice> , <grace> ;
tp:driver <alice> ;
tp:passenger <grace> .
<trip2> a tp:trip ;
tp:participants <alice> , <tim> ;
tp:driver <alice> ;
tp:passenger <tim> .
<trip3> a tp:trip ;
tp:participants <tim> , <grace> ;
tp:driver <tim> ;
tp:passenger <grace> .
<trip4> a tp:trip ;
tp:participants <grace> , <ruth> ;
tp:driver <grace> ;
tp:passenger <ruth> .
<trip5> a tp:trip ;
tp:participants <grace> , <tim> ;
tp:driver <grace> ;
tp:passenger <tim> .
Now let a "driver-passenger descendent" be any tp:passenger at the end of a trip sequence where the tp:passenger of one trip is the tp:driver of the next trip
Ex. <ruth> is a descendent of <alice> according to the following sequence of trips:
<trip2> -> <trip3> -> <trip4>.
Question:
How to get the (ancestor,descendent) pairs of all driver-passenger lineages?
Attempt 1:
I initially tried the following CONSTRUCT subquery to define an object property: tp:drove, which can be easily used in a property path. However, this did not work on my actual data:
SELECT ?originalDriver ?passengerDescendent
WHERE {
?originalDriver tp:drove+ ?passengerDescendent .
{
CONSTRUCT { ?d tp:drove ?p . }
WHERE { ?t a tp:trip .
?t tp:driver ?d .
?t tp:passenger ?p .}
}
}
Attempt 2:
I tried to create property path which expresses an ancestor as the driver of a passenger, but I don't think I've properly understood how this is supposed to work:
(tp:driver/^tp:passenger)+
Regarding MWE: Is there some kind of RDF sandbox that would allow me to create an MWE by defining a simple ontology like tp above, along with some sample data? The following "playgrounds" are available but none of them seem to support defining a toy ontology: SPARQL Playground, SPARQL Explorer.
Notes on related content:
This question is directly related to a previous question, but no longer requires saving the paths themselves, a feature not directly supported by SPARQL 1.1.
This answer by Joshua Taylor seems relevant, but doesn't address the identification of specific types of paths, such as the lineages defined above.
This one seems to do the trick:
select ?driver ?passenger where {
?driver (^tp:driver/tp:passenger)+ ?passenger .
filter( ?driver != ?passenger)
}
The filter condition can be removed if you want to also see relationships that lead back to the same person.

Pairwise comparison with SPARQL

I'd like to compare a collection of objects pairwise for a given similarity metric. The metric will be defined explicitly such that some properties much match exactly and some others can only be so different to each other (i.e., comparing floats: no more than a 50% SMAPE between them).
How would I go about constructing such a query? The output would ideally be an Nx2 table, where each row contains two IRIs for the comparable objects. Duplicates (i.e., 1==2 is a match as well as 2==1) are admissible but if we can avoid them that would be great as well.
I would like to run this on all pairs with a single query. I would probably be able to figure out how to do it for a given object, but when querying across all objects simultaneously this problem becomes much more difficult.
Does anyone have insights into how to perform this?
The idea is this:
PREFIX ex: <http://example.org/ex#>
SELECT DISTINCT ?subject1 ?subject2
WHERE {
?subject1 ex:belongs ex:commonCategory .
?subject2 ex:belongs ex:commonCategory .
?subject1 ex:exactProperty ?e .
?subject2 ex:exactProperty ?e .
?subject1 ex:approxProperty ?a1 .
?subject2 ex:approxProperty ?a2 .
FILTER ( ?subject1 > ?subject2 ) .
FILTER ( (abs(?a1-?a2)/(abs(?a1)+abs(?a2))) < 0.5 )
}
E.g., on DBpedia:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX umbel-rc: <http://umbel.org/umbel/rc/>
SELECT DISTINCT ?subject1 ?subject2
WHERE {
?subject1 rdf:type umbel-rc:Actor .
?subject2 rdf:type umbel-rc:Actor .
?subject1 dbo:spouse ?spouse1 .
?subject2 dbo:spouse ?spouse2 .
?subject1 dbo:wikiPageID ?ID1 .
?subject2 dbo:wikiPageID ?ID2 .
FILTER ( ?subject1 > ?subject2 ) .
FILTER ( ?spouse1 = ?spouse2 ) .
FILTER ( abs(?ID1-?ID2)/xsd:float(?ID1+?ID2) < 0.05 )
}
Thus, probably, Zsa Zsa Gabor and Magda Gabor are the same person.
Both were spouses of George Sanders and their wikiPageID's are not very different from each other.
Some explanations:
The ?subject1 > ?subject2 clause removes "permutation duplicates";
On the usage of xsd:float see this question.

Query evaluation took too long

I am using GraphDB Free 7.1 and I have created a repository with the default settings. I have uploaded a ttl file with 2.7 million triplets. I am trying to issue a query (not very complex, but quite complex) that should return 200k answers and the Workbench displays just 1k answers and the GraphDB log displays an exception
10:52:19.580 [repositories/PaaSport] INFO c.o.f.sesame.RepositoryController - POST query -1325396809
10:52:29.594 [repositories/PaaSport] ERROR o.o.h.s.r.TupleQueryResultView - Query interrupted
org.openrdf.query.QueryInterruptedException: Query evaluation took too long
...
10:52:29.594 [repositories/PaaSport] INFO o.o.h.s.r.TupleQueryResultView - Request for query -1325396809 is finished
The query I'm using is:
SELECT DISTINCT ?offering ?Value
WHERE {
?offering a paasport:Offering ;
DUL:satisfies ?groundDescription .
?groundDescription paasport:offers ?characteristic .
?characteristic a paasport:Storage ;
DUL:hasParameter ?par .
?par a paasport:StorageCapacity ;
DUL:hasParameterDataValue ?Value ;
DUL:parametrizes ?qualityValue .
?qualityValue uomvocab:measuredIn ?Units .
?Units a ?AppParMeasureUnitType .
ucum:GB a ?AppParMeasureUnitType .
?Units a uomvocab:SimpleDerivedUnit .
ucum:GB a uomvocab:SimpleDerivedUnit .
ucum:GB uomvocab:derivesFrom ?BasicUnit .
?Units uomvocab:derivesFrom ?BasicUnit .
ucum:GB uomvocab:modifierPrefix ?prefix1 .
?Units uomvocab:modifierPrefix ?prefix2 .
?prefix1 uomvocab:factor ?Factor1 .
?prefix2 uomvocab:factor ?Factor2 .
FILTER( xsd:double(?Factor2)*?Value = xsd:double(?Factor1)*4)
}
Since the query timeout is set to 0, I am not sure what causes the query interruption exception; most probably memory problems?
Very simple queries (e.g. return all instances of a certain class) work OK.
Are there any hints? Any help would be appreciated.
I can provide more details if needed.
Best,
Nick
Actually I have managed to cut down the query to the minimum, in order to be answered. The problem was mainly due to the following triple patterns:
ucum:GB rdf:type ?AppParMeasureUnitType .
ucum:GB rdf:type uomvocab:SimpleDerivedUnit .
ucum:GB uomvocab:derivesFrom ?BasicUnit .
If these are omitted and the corresponding variables in the original query are replaced by constant resources, then the query is answered.
Here is the resulting query:
SELECT DISTINCT ?offering ?Value
WHERE {
?offering rdf:type paasport:Offering .
?offering DUL:satisfies ?groundDescription .
?groundDescription paasport:offers ?characteristic .
?characteristic rdf:type paasport:Storage .
?characteristic DUL:hasParameter ?par .
?par rdf:type paasport:StorageCapacity .
?par DUL:hasParameterDataValue ?Value .
?par DUL:parametrizes ?qualityValue .
?qualityValue uomvocab:measuredIn ?Units .
?Units rdf:type ucum:UnitOf-infotech .
?Units rdf:type uomvocab:SimpleDerivedUnit .
?Units uomvocab:derivesFrom <http://purl.oclc.org/NET/muo/ucum/unit/amount-of-information/byte> .
ucum:GB uomvocab:modifierPrefix ?prefix1 .
?Units uomvocab:modifierPrefix ?prefix2 .
?prefix1 uomvocab:factor ?Factor1 .
?prefix2 uomvocab:factor ?Factor2 .
FILTER( xsd:double(?Factor2)*?Value >= xsd:double(?Factor1)*2)
}
Do you really need DISTINCT? That always slows down things, since it has to fetch all results in memory, sort them and uniq them before starting to serve.
Do you need = or >= in the FILTER? If = then replace the filter with a BIND(...as ?Factor2), and put these things before searching for ?prefix2
Have you profiled the query? http://graphdb.ontotext.com/documentation/standard/explain-plan.html

How do i fit that sparql calculation?

this is my actual problem:
?var0 is a group variable and ?var1 is not. But whenever I try to validate the syntax, there comes the following error message:
Non-group key variable in SELECT: ?var1 in expression ( sum(?var0) / ?var1 )
The complete Query:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cz: <http://www.vs.cs.hs-rm.de/ontostor/SVC#Cluster>
PREFIX n: <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node>
SELECT ( (SUM(?var0) / ?var1) AS ?result)
WHERE{
?chain0 rdf:type rdfs:Property .
?chain0 rdfs:domain <http://www.vs.cs.hs-rm.de/ontostor/SVC#Cluster> .
?chain0 rdfs:range <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node> .
?this ?chain0 ?arg0 .
?arg0 n:node_realtime_cpu ?var0 .
?this cz:node_count ?var1 .
}
My question is how to correct that calculation to fit the SPARQL syntax?
The immediate problem is that ?var1 is not grouped on, so a fix would be to simply append
GROUP BY ?var1
at the end of your query.
However, whether that gives you the calculation you actually want is another matter.
It's not quite clear what you're trying to calculate, but it looks as if you're attempting to determine the average node_realtime_cpu for a cluster. If that is the case, you can probably do your calculation by just using SPARQL's AVG function instead:
SELECT ( AVG(?var0) AS ?result)
WHERE{
?chain0 rdf:type rdfs:Property .
?chain0 rdfs:domain <http://www.vs.cs.hs-rm.de/ontostor/SVC#Cluster> .
?chain0 rdfs:range <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node> .
?this ?chain0 ?arg0 .
?arg0 n:node_realtime_cpu ?var0 .
}
GROUP BY ?this // grouping on the cluster identifier so we get an average _per cluster_
Yet another alternative would be to keep your query as-is, but group on two variables:
GROUP BY ?this ?var1
Which is best depends on what your data looks like and what, exactly, you're trying to calculate.