SPARQL count item in an intersection - sparql

Given the following triples
@prefix ex: <http://example.org/> .
@base <http://example.com/> .
<person1> ex:has_interpretation <interpretation1> .
<interpretation1> ex:refers_to <objectA> ;
ex:resultIn <X> .
<person2> ex:has_interpretation <interpretation2> .
<interpretation2> ex:refers_to <objectA> ;
ex:resultIn <Y> .
<person2> ex:has_interpretation <interpretation3> .
<interpretation3> ex:refers_to <objectB> ;
ex:resultIn <Z> .
<person3> ex:has_interpretation <interpretation4> .
<interpretation4> ex:refers_to <objectA> ;
ex:resultIn <ZZ> .
I am trying to use SPARQL to:
1. count only the number of objects referred to by an interpretation by both person1 and person2 (intersection)
2. count the number of distinct interpretations over the object
3. count the number of objects not referred to by an interpretation by both person1 and person2
4. have the above counts together with a list of the objects referred to by an interpretation and the people who created the interpretations.
I am having trouble specifically with 1 (and consequently, 3), as I cannot find a way to count the intersection of the interpreted objects.
My current SPARQL query, which does not return what I want:
PREFIX ex: <http://example.org/>
SELECT ?person (COUNT(distinct ?object) as ?c_object) (group_concat(distinct ?interpretation;separator="; ") as ?interpretations)
WHERE {
?person ex:has_interpretation ?interpretation .
?interpretation
ex:refers_to ?object ;
ex:resultIn ?result .
FILTER (?person = <http://example.com/person1> || ?person = <http://example.com/person2> )
}
GROUP BY ?person ?object
What instead I would like is just:
object_uri | number_object | interpretations | person_involved
<objectA> | 1 | <interpretation1>,<interpretation2> | <person1>,<person2>
Any ideas?

1. count only the number of objects referred to by an interpretation by both person1 and person2 (intersection)
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?obj
WHERE {
<http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
<http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
}
Here the query matches only those objects (?obj) which have ex:has_interpretation/ex:refers_to paths to them from both <person1> and <person2>.
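The query above lists the shared objects rather than counting them. A minimal sketch of the counting variant, assuming the same data and IRIs as in the question (the result name ?n_shared is only illustrative):

```sparql
PREFIX ex: <http://example.org/>

# Count the distinct objects that both person1 and person2 have interpreted.
# ?n_shared is a hypothetical result name, not taken from the question.
SELECT (COUNT(DISTINCT ?obj) AS ?n_shared)
WHERE {
  <http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
  <http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
}
```

DISTINCT matters here because the same object can be reached through several interpretations, which would otherwise produce one solution per combination of paths.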
2. count the number of distinct interpretations over the object
Just use the COUNT() function as the paths used in the answer for 1. above are distinct (different):
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?obj
WHERE {
<http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
<http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
}
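As printed, this query is the same as the one for 1. and does not yet apply COUNT(). One possible way to count the distinct interpretations per shared object, and to collect the interpretations and people for the table asked for in the question, is sketched below. It assumes that only interpretations by person1 and person2 should be counted; the variable names ?interp, ?n_interpretations, ?interpretations and ?person_involved are illustrative.

```sparql
PREFIX ex: <http://example.org/>

SELECT ?obj
       (COUNT(DISTINCT ?interp) AS ?n_interpretations)
       (GROUP_CONCAT(DISTINCT STR(?interp); separator=",") AS ?interpretations)
       (GROUP_CONCAT(DISTINCT STR(?person); separator=",") AS ?person_involved)
WHERE {
  # Keep only objects interpreted by both persons (the intersection from 1.).
  <http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
  <http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .

  # Enumerate the interpretations of that object made by either person.
  VALUES ?person { <http://example.com/person1> <http://example.com/person2> }
  ?person ex:has_interpretation ?interp .
  ?interp ex:refers_to ?obj .
}
GROUP BY ?obj
```

STR() is applied before GROUP_CONCAT so that the IRIs are concatenated as plain strings regardless of how the engine treats non-literal values.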
3. count the number of objects not referred to by an interpretation by both person1 and person2
You might be able to find a fancy way to query for this but I would just count distinct objects:
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?obj
WHERE {
?x ex:refers_to ?obj .
}
...and then subtract the results from 1. above. This is easy to understand and you've already got the results from 1. to work with.
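If a single query is preferred over subtracting two results, a sketch using FILTER NOT EXISTS could look like this. It assumes "not referred to by both" means "not in the intersection from 1."; ?n_not_shared is an illustrative name.

```sparql
PREFIX ex: <http://example.org/>

# Count objects that some interpretation refers to,
# excluding those interpreted by both person1 and person2.
SELECT (COUNT(DISTINCT ?obj) AS ?n_not_shared)
WHERE {
  ?x ex:refers_to ?obj .
  FILTER NOT EXISTS {
    <http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
    <http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
  }
}
```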

Related

Extract synonyms and label from Turtle file using SPARQL

I am in a learning phase of SPARQL. I am working with a Turtle file to extract some information. The condition is: if the exact synonym has a substring 'stroke' or 'Stroke', the query should return all the synonyms and rdfs:label.
I am using the query below but getting no output:
prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix obo: <http://purl.obolibrary.org/obo/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Select * where {
?s ?p ?o .
rdfs:label <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "stroke"^^xsd:string
}
Below is the sample Turtle file:
### https://ontology.aaaa.com/aaaa/meddra_10008196
:meddra_10008196
rdf:type owl:Class ;
<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "DOID:6713" , "EFO:0000712" , "EFO:0003763" , "HE:A10008190" ;
<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym>
"(cva) cerebrovascular accident" ,
"Acute Cerebrovascular Accident" ,
"Acute Cerebrovascular Accidents" ,
"Acute Stroke" ,
"Acute Strokes" ;
rdfs:label "Cerebrovascular disorder"@en ;
:hasSocs "Nervous system disorders [meddra:10029205]" , "Vascular disorders [meddra:10047065]" ;
:uid "6e46da69b727e4e924c31027cdf47b8a" .
I am expecting this output:
(cva) cerebrovascular accident
Acute Cerebrovascular Accident
Acute Cerebrovascular Accidents
Acute Stroke
Acute Strokes
Cerebrovascular disorder
With this triple pattern, you are querying for rdfs:label as subject, not as predicate:
rdfs:label <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "stroke"^^xsd:string
What you are asking with this is: "Does the resource rdfs:label have the property oboInOwl:hasExactSynonym with the string value 'stroke'?"
But you want to ask this about the class (e.g., :meddra_10008196), not rdfs:label:
?class oboInOwl:hasExactSynonym "stroke" .
Finding matches
As you don’t want to find only exact string matches, you can use CONTAINS:
?class oboInOwl:hasExactSynonym ?matchingSynonym .
FILTER( CONTAINS(?matchingSynonym, "stroke") ) .
As you want to ignore case, you can query lower-cased synonyms with LCASE:
?class oboInOwl:hasExactSynonym ?matchingSynonym .
FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") ) .
Displaying results
To display the label and all synonyms in the same column, you could use a property path with | (AlternativePath):
?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .
Full query
# [prefixes]
SELECT ?class ?labelOrSynonym
WHERE {
?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .
FILTER EXISTS {
?class oboInOwl:hasExactSynonym ?matchingSynonym .
FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") ) .
}
}
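With the prefixes from the question written out, the full query might read as follows. This is a sketch: the STR() projection and the name ?text are additions, not part of the answer above, and only serve to drop the language tag from the label so that the output matches the plain strings expected in the question.

```sparql
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?class (STR(?labelOrSynonym) AS ?text)
WHERE {
  # Return the label and every exact synonym of the class ...
  ?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .
  # ... but only for classes with at least one synonym containing "stroke".
  FILTER EXISTS {
    ?class oboInOwl:hasExactSynonym ?matchingSynonym .
    FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") )
  }
}
```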

Nested SPARQL to query collection - Dependent Lookup

I have this SPARQL query that returns the id and Label for THING X from vocabulary 1, by matching against a range of potential attribute matches (labels and notation)
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?id ?displayText
WHERE {
VALUES ?pred {
skos:prefLabel
skos:altLabel
skos:hiddenLabel
rdfs:label
skos:notation
}
FILTER (STR(?displayText) = "THING X")
?id skos:inScheme <http://exampleuri.com/def/vocabulary1> .
?id ?pred ?displayText .
}
ORDER BY ?displayText ?id
I have this query that returns the concepts in the collection of stuff related to THING X from vocabulary2
PREFIX skos:<http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?id ?displayText
WHERE {
  <http://exampleuri.com/def/vocabulary2/THING-X-collection> skos:member ?id .
  ?id skos:prefLabel ?displayText .
}
ORDER BY ?displayText
What I need is to nest the first query into the second so that I search for THING X in vocabulary 1 and it returns the skos:members of the associated collection in vocabulary 2 (where the label of the vocab 1 concept and the vocab 2 collection are equal).
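One way such a dependent lookup could be expressed is with a subquery, as in the sketch below. It rests on assumptions that the question does not confirm: that the vocabulary 1 concept carries a skos:prefLabel, and that the vocabulary 2 collection is typed skos:Collection and has a skos:prefLabel equal to it. The names ?concept, ?matchText, ?conceptLabel, ?collection and ?collectionLabel are illustrative.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?id ?displayText
WHERE {
  # Inner query: find the vocabulary 1 concept matching "THING X"
  # on any of the listed predicates and expose its preferred label.
  {
    SELECT DISTINCT ?conceptLabel
    WHERE {
      VALUES ?pred {
        skos:prefLabel skos:altLabel skos:hiddenLabel rdfs:label skos:notation
      }
      ?concept skos:inScheme <http://exampleuri.com/def/vocabulary1> ;
               ?pred ?matchText ;
               skos:prefLabel ?conceptLabel .
      FILTER (STR(?matchText) = "THING X")
    }
  }
  # Outer query: find a collection with the same label (assumed to be
  # a skos:Collection with a skos:prefLabel) and list its members.
  ?collection a skos:Collection ;
              skos:prefLabel ?collectionLabel ;
              skos:member ?id .
  FILTER (STR(?collectionLabel) = STR(?conceptLabel))
  ?id skos:prefLabel ?displayText .
}
ORDER BY ?displayText
```

Comparing with STR() on both sides ignores language tags and datatypes, which is usually what is wanted when labels come from two different vocabularies.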

Aggregate inside Subquery for SPARQL

I am using Virtuoso with DBpedia as the endpoint.
My goal is to retrieve all movies that have more actors than the mean number of actors across all movies.
I thought the following query would work:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT
DISTINCT ?film
COUNT(?actor) AS ?numActors
WHERE{
?film rdf:type dbp:Film .
?film dbp:starring ?actor .
{
SELECT
AVG(?numActors) AS ?avgNumActors
WHERE{
SELECT
?Sfilm
COUNT(?Sactor) AS ?numActors
WHERE{
?Sfilm rdf:type dbp:Film .
?Sfilm dbp:starring ?Sactor
}
}
}
}
GROUP BY ?film
HAVING (COUNT(?actor) > ?avgNumActors)
LIMIT 20
but I receive the following error:
Variable ?avgNumActors is used in the result set outside aggregate and not mentioned in GROUP BY clause
What am I doing wrong?
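A sketch of how the query could be restructured in standard SPARQL 1.1 follows. Two things change: the innermost subquery gets a GROUP BY so that actors are counted per film, and ?avgNumActors is added to the outer GROUP BY so that it may be used in HAVING, which is what the error message complains about. The dbo: prefix for the DBpedia ontology namespace and the names ?sFilm, ?sActor and ?n are assumptions; the question uses an undeclared dbp: prefix.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?film (COUNT(?actor) AS ?numActors)
WHERE {
  ?film rdf:type dbo:Film ;
        dbo:starring ?actor .
  {
    # Average of the per-film actor counts: a single row with one value.
    SELECT (AVG(?n) AS ?avgNumActors)
    WHERE {
      {
        # Number of actors for each film.
        SELECT ?sFilm (COUNT(?sActor) AS ?n)
        WHERE {
          ?sFilm rdf:type dbo:Film ;
                 dbo:starring ?sActor .
        }
        GROUP BY ?sFilm
      }
    }
  }
}
# ?avgNumActors is constant across rows, so grouping by it
# does not split the groups; it only makes it usable in HAVING.
GROUP BY ?film ?avgNumActors
HAVING (COUNT(?actor) > ?avgNumActors)
LIMIT 20
```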

How can I translate SPARQL query into English

Could you please translate this query into English?
I am trying to write a naive implementation in code.
PREFIX om-owl: <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX weather: <http://knoesis.wright.edu/ssw/ont/weather.owl#>
SELECT DISTINCT ?sensor ?value ?uom
FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS]
WHERE {
?observation om-owl:procedure ?sensor ;
rdf:type/rdfs:subClassOf* weather:PrecipitationObservation ;
om-owl:result ?result .
?result ?p1 ?value .
OPTIONAL {
?result ?p2 ?uom .
}
}
Any help will be appreciated
As I understand it:
SELECT DISTINCT ?sensor ?value ?uom
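Give me all distinct combinations of sensor, value and uom (presumably the unit of measurement; I am not familiar with sensors) that satisfy the following conditions:
?observation om-owl:procedure ?sensor ;
First, find the observations that are linked to a sensor by om-owl:procedure.
rdf:type/rdfs:subClassOf* weather:PrecipitationObservation ;
Of these observations, keep those whose type is weather:PrecipitationObservation or any direct or indirect subclass of it.
om-owl:result ?result .
Get the result of each such observation.
?result ?p1 ?value .
Take every value attached to the result by any property (?p1 is unconstrained).
OPTIONAL { ?result ?p2 ?uom . }
And, if it exists, every uom attached to the result by any property (?p2 is also unconstrained).
So in the end, it seems to return the precipitation values (with units, where present) reported by each sensor. Nothing in the query aggregates; the [NOW - 1 HOURS] window in the FROM NAMED STREAM clause, which is not standard SPARQL, appears to limit the data to observations from the last hour.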
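For a naive implementation it may help to see the property path unrolled into separate steps. The sketch below is a standard SPARQL 1.1 rewrite that drops the streaming clause, so it would run against a static graph and not against the last-hour window; the helper variable ?type is an addition.

```sparql
PREFIX om-owl: <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX weather: <http://knoesis.wright.edu/ssw/ont/weather.owl#>

SELECT DISTINCT ?sensor ?value ?uom
WHERE {
  # Step 1: each observation and the sensor that produced it.
  ?observation om-owl:procedure ?sensor .
  # Step 2: the declared type of the observation ...
  ?observation rdf:type ?type .
  # ... must be PrecipitationObservation or reach it via zero or more
  # rdfs:subClassOf links.
  ?type rdfs:subClassOf* weather:PrecipitationObservation .
  # Step 3: the result node of the observation.
  ?observation om-owl:result ?result .
  # Step 4: every property value of the result.
  ?result ?p1 ?value .
  # Step 5: optionally, every property value again, bound as ?uom.
  OPTIONAL { ?result ?p2 ?uom . }
}
```

In code, steps 1 to 4 correspond to nested lookups over the triples, and step 2 to a walk up the class hierarchy starting from the observation's type.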

Pairwise comparison with SPARQL

I'd like to compare a collection of objects pairwise for a given similarity metric. The metric will be defined explicitly such that some properties must match exactly and others may differ only within a tolerance (e.g., comparing floats: no more than a 50% SMAPE between them).
How would I go about constructing such a query? The output would ideally be an Nx2 table, where each row contains two IRIs for the comparable objects. Duplicates (i.e., 1==2 is a match as well as 2==1) are admissible but if we can avoid them that would be great as well.
I would like to run this on all pairs with a single query. I would probably be able to figure out how to do it for a given object, but when querying across all objects simultaneously this problem becomes much more difficult.
Does anyone have insights into how to perform this?
The idea is this:
PREFIX ex: <http://example.org/ex#>
SELECT DISTINCT ?subject1 ?subject2
WHERE {
?subject1 ex:belongs ex:commonCategory .
?subject2 ex:belongs ex:commonCategory .
?subject1 ex:exactProperty ?e .
?subject2 ex:exactProperty ?e .
?subject1 ex:approxProperty ?a1 .
?subject2 ex:approxProperty ?a2 .
FILTER ( ?subject1 > ?subject2 ) .
FILTER ( (abs(?a1-?a2)/(abs(?a1)+abs(?a2))) < 0.5 )
}
E.g., on DBpedia:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX umbel-rc: <http://umbel.org/umbel/rc/>
SELECT DISTINCT ?subject1 ?subject2
WHERE {
?subject1 rdf:type umbel-rc:Actor .
?subject2 rdf:type umbel-rc:Actor .
?subject1 dbo:spouse ?spouse1 .
?subject2 dbo:spouse ?spouse2 .
?subject1 dbo:wikiPageID ?ID1 .
?subject2 dbo:wikiPageID ?ID2 .
FILTER ( ?subject1 > ?subject2 ) .
FILTER ( ?spouse1 = ?spouse2 ) .
FILTER ( abs(?ID1-?ID2)/xsd:float(?ID1+?ID2) < 0.05 )
}
Thus, probably, Zsa Zsa Gabor and Magda Gabor are the same person.
Both were spouses of George Sanders and their wikiPageID's are not very different from each other.
Some explanations:
The ?subject1 > ?subject2 clause removes "permutation duplicates";
On the usage of xsd:float see this question.
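One caveat on the ordering filter: the SPARQL 1.1 operator mapping does not define > for IRIs, so ?subject1 > ?subject2 relies on an engine extension and some engines may treat it as an error. A more portable sketch of the generic query compares the string forms instead, under the same hypothetical ex: vocabulary as above:

```sparql
PREFIX ex: <http://example.org/ex#>

SELECT DISTINCT ?subject1 ?subject2
WHERE {
  ?subject1 ex:belongs ex:commonCategory ;
            ex:exactProperty ?e ;
            ex:approxProperty ?a1 .
  # Reusing ?e forces the exact property to be equal for both subjects.
  ?subject2 ex:belongs ex:commonCategory ;
            ex:exactProperty ?e ;
            ex:approxProperty ?a2 .
  # Compare IRIs as strings: keeps one ordering of each pair
  # and also drops pairs of a subject with itself.
  FILTER ( STR(?subject1) > STR(?subject2) )
  # Relative difference, as in the answer above.
  FILTER ( abs(?a1 - ?a2) / (abs(?a1) + abs(?a2)) < 0.5 )
}
```

If both ?a1 and ?a2 are zero, the division raises an error and the pair is filtered out, which may or may not be the desired behaviour.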