I'd like to compare a collection of objects pairwise for a given similarity metric. The metric will be defined explicitly such that some properties must match exactly and some others may differ only by a bounded amount (e.g., when comparing floats: no more than a 50% SMAPE between them).
How would I go about constructing such a query? The output would ideally be an Nx2 table, where each row contains two IRIs for the comparable objects. Duplicates (e.g., 1==2 is a match as well as 2==1) are admissible, but if we can avoid them that would be great as well.
I would like to run this on all pairs with a single query. I would probably be able to figure out how to do it for a given object, but when querying across all objects simultaneously this problem becomes much more difficult.
Does anyone have insights into how to perform this?
The idea is this:
PREFIX ex: <http://example.org/ex#>
SELECT DISTINCT ?subject1 ?subject2
WHERE {
?subject1 ex:belongs ex:commonCategory .
?subject2 ex:belongs ex:commonCategory .
?subject1 ex:exactProperty ?e .
?subject2 ex:exactProperty ?e .
?subject1 ex:approxProperty ?a1 .
?subject2 ex:approxProperty ?a2 .
FILTER ( ?subject1 > ?subject2 ) .
FILTER ( (abs(?a1-?a2)/(abs(?a1)+abs(?a2))) < 0.5 )
}
E.g., on DBpedia:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX umbel-rc: <http://umbel.org/umbel/rc/>
SELECT DISTINCT ?subject1 ?subject2
WHERE {
?subject1 rdf:type umbel-rc:Actor .
?subject2 rdf:type umbel-rc:Actor .
?subject1 dbo:spouse ?spouse1 .
?subject2 dbo:spouse ?spouse2 .
?subject1 dbo:wikiPageID ?ID1 .
?subject2 dbo:wikiPageID ?ID2 .
FILTER ( ?subject1 > ?subject2 ) .
FILTER ( ?spouse1 = ?spouse2 ) .
FILTER ( abs(?ID1-?ID2)/xsd:float(?ID1+?ID2) < 0.05 )
}
Thus, probably, Zsa Zsa Gabor and Magda Gabor are the same person.
Both were spouses of George Sanders and their wikiPageID's are not very different from each other.
Some explanations:
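The ?subject1 > ?subject2 clause removes "permutation duplicates". Ordering IRIs with > is, as far as I know, not covered by the standard SPARQL operator mapping, so some engines may reject it; FILTER ( STR(?subject1) > STR(?subject2) ) should be a more portable form;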
On the usage of xsd:float see this question.
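The logic of the generic query can also be sketched outside SPARQL. The following is a minimal Python illustration, not a replacement for the query: the records, the helper names close_enough and similar_pairs, and the sample values are all hypothetical. It uses itertools.combinations so that each unordered pair is visited once, which plays the same role as the ordering FILTER.

```python
from itertools import combinations

# Hypothetical records: (IRI, exact property, approximate property).
objects = [
    ("http://example.org/ex#a", "red", 10.0),
    ("http://example.org/ex#b", "red", 14.0),
    ("http://example.org/ex#c", "red", 40.0),
    ("http://example.org/ex#d", "blue", 10.0),
]


def close_enough(a1, a2, threshold=0.5):
    """Mirror the FILTER: |a1 - a2| / (|a1| + |a2|) < threshold."""
    denominator = abs(a1) + abs(a2)
    if denominator == 0:
        # Division by zero: reject the pair, as the query's FILTER
        # should also fail to pass in this case.
        return False
    return abs(a1 - a2) / denominator < threshold


def similar_pairs(records):
    """Return each matching unordered pair of IRIs exactly once."""
    pairs = []
    # combinations() yields every unordered pair once, so (a, b) is
    # produced but (b, a) is not.
    for (iri1, exact1, approx1), (iri2, exact2, approx2) in combinations(records, 2):
        if exact1 == exact2 and close_enough(approx1, approx2):
            pairs.append((iri1, iri2))
    return pairs


print(similar_pairs(objects))
```

Like the query, this compares all N*(N-1)/2 pairs, so it is only practical for moderately sized collections.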
I am in a learning phase of SPARQL. I am working with a Turtle file to extract some information. The condition is: if the exact synonym has a substring 'stroke' or 'Stroke', the query should return all the synonyms and rdfs:label.
I am using the query below but get no output:
prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix obo: <http://purl.obolibrary.org/obo/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Select * where {
?s ?p ?o .
rdfs:label <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "stroke"^^xsd:string
}
Below is the sample Turtle file:
### https://ontology.aaaa.com/aaaa/meddra_10008196
:meddra_10008196
rdf:type owl:Class ;
<http://www.geneontology.org/formats/oboInOwl#hasDbXref> "DOID:6713" , "EFO:0000712" , "EFO:0003763" , "HE:A10008190" ;
<http://www.geneontology.org/formats/oboInOwl#hasExactSynonym>
"(cva) cerebrovascular accident" ,
"Acute Cerebrovascular Accident" ,
"Acute Cerebrovascular Accidents" ,
"Acute Stroke" ,
"Acute Strokes" ;
rdfs:label "Cerebrovascular disorder"@en ;
:hasSocs "Nervous system disorders [meddra:10029205]" , "Vascular disorders [meddra:10047065]" ;
:uid "6e46da69b727e4e924c31027cdf47b8a" .
I am expecting this output:
(cva) cerebrovascular accident
Acute Cerebrovascular Accident
Acute Cerebrovascular Accidents
Acute Stroke
Acute Strokes
Cerebrovascular disorder
With this triple pattern, you are querying for rdfs:label as subject, not as predicate:
rdfs:label <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "stroke"^^xsd:string
What you are asking with this is: "Does the resource rdfs:label have the property oboInOwl:hasExactSynonym with the string value 'stroke'?"
But you want to ask this about the class (e.g., :meddra_10008196), not rdfs:label:
?class oboInOwl:hasExactSynonym "stroke" .
Finding matches
As you don’t want to find only exact string matches, you can use CONTAINS:
?class oboInOwl:hasExactSynonym ?matchingSynonym .
FILTER( CONTAINS(?matchingSynonym, "stroke") ) .
As you want to ignore case, you can query lower-cased synonyms with LCASE:
?class oboInOwl:hasExactSynonym ?matchingSynonym .
FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") ) .
Displaying results
To display the label and all synonyms in the same column, you could use a property path with | (AlternativePath):
?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .
Full query
# [prefixes]
SELECT ?class ?labelOrSynonym
WHERE {
?class rdfs:label|oboInOwl:hasExactSynonym ?labelOrSynonym .
FILTER EXISTS {
?class oboInOwl:hasExactSynonym ?matchingSynonym .
FILTER( CONTAINS(LCASE(?matchingSynonym), "stroke") ) .
}
}
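To make the semantics of the full query concrete, here is a small Python sketch of the same two steps: the FILTER EXISTS check (at least one synonym contains "stroke", ignoring case) and the alternative path (label and synonyms returned in one column). The data structure, the function name labels_and_synonyms, and the second class :hypothetical_other are assumptions made for illustration; only the first entry is copied from the Turtle sample.

```python
# First entry copied from the Turtle sample; the second is hypothetical
# and exists only to show a class that is filtered out.
classes = {
    ":meddra_10008196": {
        "label": "Cerebrovascular disorder",
        "synonyms": [
            "(cva) cerebrovascular accident",
            "Acute Cerebrovascular Accident",
            "Acute Cerebrovascular Accidents",
            "Acute Stroke",
            "Acute Strokes",
        ],
    },
    ":hypothetical_other": {
        "label": "Headache",
        "synonyms": ["Cephalalgia"],
    },
}


def labels_and_synonyms(data, needle="stroke"):
    """Return (class, text) rows for classes with a matching synonym."""
    rows = []
    for iri, entry in data.items():
        # FILTER EXISTS { ... CONTAINS(LCASE(?matchingSynonym), "stroke") }
        if any(needle in synonym.lower() for synonym in entry["synonyms"]):
            # rdfs:label|oboInOwl:hasExactSynonym puts both in one column.
            for text in [entry["label"], *entry["synonyms"]]:
                rows.append((iri, text))
    return rows


for iri, text in labels_and_synonyms(classes):
    print(iri, text)
```

A SPARQL engine does not guarantee any row order without ORDER BY, so the order printed here is a property of the sketch only.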
Given the following triples
@prefix ex: <http://example.org/> .
@base <http://example.com/> .
<person1> ex:has_interpretation <interpretation1> .
<interpretation1> ex:refers_to <objectA> ;
ex:resultIn <X> .
<person2> ex:has_interpretation <interpretation2> .
<interpretation2> ex:refers_to <objectA> ;
ex:resultIn <Y> .
<person2> ex:has_interpretation <interpretation3> .
<interpretation3> ex:refers_to <objectB> ;
ex:resultIn <Z> .
<person3> ex:has_interpretation <interpretation4> .
<interpretation4> ex:refers_to <objectA> ;
ex:resultIn <ZZ> .
I am trying to use SPARQL to:
1. count only the number of objects referred to by an interpretation by both person1 and person2 (intersection)
2. count the number of distinct interpretations over the object
3. count the number of objects not referred to by an interpretation by both person1 and person2
4. have the above counts together with a list of the objects referred to by an interpretation and the people who created the interpretation.
I am having trouble specifically with 1 (and consequently, 3), as I cannot find a way to count the intersection of the interpreted objects.
My current SPARQL query which does not obtain what I want:
PREFIX ex: <http://example.org/>
SELECT ?person (COUNT(distinct ?object) as ?c_object) (group_concat(distinct ?interpretation;separator="; ") as ?interpretations)
WHERE {
?person ex:has_interpretation ?interpretation .
?interpretation
ex:refers_to ?object ;
ex:resultIn ?result .
FILTER (?person = <http://example.com/person1> || ?person = <http://example.com/person2> )
}
GROUP BY ?person ?object
What I would like instead is just:
object_uri | number_object | interpretations                     | person_involved
<objectA>  | 1             | <interpretation1>,<interpretation2> | <person1>,<person2>
Any ideas?
1. Count only the number of objects referred to by an interpretation by both person1 and person2 (intersection)
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?obj
WHERE {
<http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
<http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
}
Here the query matches only those objects (?obj) which have ex:has_interpretation/ex:refers_to paths to them from both <person1> and <person2>.
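2. Count the number of distinct interpretations over the object
Use the COUNT() function with DISTINCT. One possible reading, restricted to interpretations by <person1> or <person2> of the objects matched in 1. above:
PREFIX ex: <http://example.org/>
SELECT ?obj (COUNT(DISTINCT ?interpretation) AS ?interpretationCount)
WHERE {
<http://example.com/person1> ex:has_interpretation/ex:refers_to ?obj .
<http://example.com/person2> ex:has_interpretation/ex:refers_to ?obj .
VALUES ?person { <http://example.com/person1> <http://example.com/person2> }
?person ex:has_interpretation ?interpretation .
?interpretation ex:refers_to ?obj .
}
GROUP BY ?obj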
3. Count the number of objects not referred to by an interpretation by both person1 and person2
You might be able to find a fancy way to query for this but I would just count distinct objects:
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?obj
WHERE {
?x ex:refers_to ?obj .
}
...and then subtract the results from 1. above. This is easy to understand and you've already got the results from 1. to work with.
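The "query, then subtract" approach amounts to plain set arithmetic on the query results. The Python sketch below shows this on a simplified reading of the sample triples. It assumes that <interpretation4> (by <person3>) is the one referring to <objectA>, and the variable names are hypothetical.

```python
from collections import defaultdict

# (person, interpretation, object) rows; a simplified reading of the
# sample triples, assuming interpretation4 refers to objectA.
rows = [
    ("person1", "interpretation1", "objectA"),
    ("person2", "interpretation2", "objectA"),
    ("person2", "interpretation3", "objectB"),
    ("person3", "interpretation4", "objectA"),
]

# Objects each person has interpreted.
objects_by_person = defaultdict(set)
for person, _interpretation, obj in rows:
    objects_by_person[person].add(obj)

# 1. Objects interpreted by both person1 and person2 (intersection).
shared = objects_by_person["person1"] & objects_by_person["person2"]

# 2. Interpretations of each shared object by those two people.
interpretations_of_shared = {
    obj: sorted(
        interpretation
        for person, interpretation, o in rows
        if o == obj and person in ("person1", "person2")
    )
    for obj in shared
}

# 3. All distinct objects minus the shared ones.
all_objects = {obj for _person, _interpretation, obj in rows}
not_shared = all_objects - shared

print(len(shared), interpretations_of_shared, len(not_shared))
```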
Could you please translate this query into English?
I am trying to write a naive implementation in code.
PREFIX om-owl: <http://knoesis.wright.edu/ssw/ont/sensor-observation.owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX weather: <http://knoesis.wright.edu/ssw/ont/weather.owl#>
SELECT DISTINCT ?sensor ?value ?uom
FROM NAMED STREAM <http://www.cwi.nl/SRBench/observations> [NOW - 1 HOURS]
WHERE {
?observation om-owl:procedure ?sensor ;
rdf:type/rdfs:subClassOf* weather:PrecipitationObservation ;
om-owl:result ?result .
?result ?p1 ?value .
OPTIONAL {
?result ?p2 ?uom .
}
}
Any help will be appreciated
As I understand it:
SELECT DISTINCT ?sensor ?value ?uom
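Give me the distinct sensors, their values and, where available, their uom (presumably the unit of measurement; I am not familiar with sensors) that satisfy the following conditions:
?observation om-owl:procedure ?sensor ;
First, find the observations that are linked to a sensor by om-owl:procedure.
rdf:type/rdfs:subClassOf* weather:PrecipitationObservation ;
Of these observations, keep those whose type is weather:PrecipitationObservation or a direct or indirect subclass of it.
om-owl:result ?result .
Then get their result.
?result ?p1 ?value .
Take every value attached to that result. Because ?p1 is an unconstrained variable, this matches any property of the result.
OPTIONAL { ?result ?p2 ?uom . }
And, if it exists, every uom attached to the result. ?p2 is unconstrained too, so ?uom can match the same properties as ?value.
So in the end, it seems to return, for each sensor, the precipitation values observed within the last hour; the [NOW - 1 HOURS] clause looks like a time window over the stream rather than an aggregation, since the query contains no aggregate function.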
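For a naive implementation in code, the walk-through above can be sketched roughly as follows in Python. Everything here is an assumption for illustration: the class hierarchy, the observations and their property names are hypothetical and not taken from the SRBench data, and the one-hour stream window is left out entirely. The function superclasses_or_self implements the "zero or more rdfs:subClassOf steps" part of the path.

```python
# Hypothetical class hierarchy: class -> set of direct superclasses.
sub_class_of = {
    "RainfallObservation": {"PrecipitationObservation"},
    "SnowfallObservation": {"PrecipitationObservation"},
    "PrecipitationObservation": {"Observation"},
    "TemperatureObservation": {"Observation"},
}

# Hypothetical observations; "result" maps property name -> object.
observations = [
    {"sensor": "s1", "type": "RainfallObservation",
     "result": {"floatValue": 0.3, "uom": "centimeters"}},
    {"sensor": "s2", "type": "TemperatureObservation",
     "result": {"floatValue": 21.5, "uom": "celsius"}},
    {"sensor": "s3", "type": "PrecipitationObservation",
     "result": {"floatValue": 0.0, "uom": "centimeters"}},
]


def superclasses_or_self(cls):
    """rdfs:subClassOf*: the class itself plus all of its ancestors."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def precipitation_rows(obs_list, target="PrecipitationObservation"):
    """Distinct (sensor, value, uom) rows, ignoring the stream window."""
    rows = set()  # a set gives SELECT DISTINCT behaviour
    for obs in obs_list:
        # rdf:type/rdfs:subClassOf* weather:PrecipitationObservation
        if target not in superclasses_or_self(obs["type"]):
            continue
        # ?p1 and ?p2 are unconstrained, so ?value and ?uom each range
        # over every object attached to the result.
        for value in obs["result"].values():
            for uom in obs["result"].values():
                rows.add((obs["sensor"], value, uom))
    return rows


for row in sorted(precipitation_rows(observations), key=str):
    print(row)
```

Note that the sketch reproduces the cross product that the unconstrained ?p1 and ?p2 imply; a real implementation would probably restrict them to specific properties.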
PREFIX category: <http://dbpedia.org/resource/Category:>
SELECT DISTINCT ?attractions
?location
WHERE
{ ?attractions dcterms:subject ?places
. ?places skos:broader ?border
. ?attractions dbpprop:location|dbpedia-owl:locatedInArea|dbpprop:locale ?location
. FILTER( ?border = category:Visitor_attractions_in_Delhi )
}
The query above returns attractions in Delhi together with their locations. I need to make it generic so that it works for any place, and I also want to filter out unwanted data: I only want actual attractions, not entries such as "List of Monuments" or "SelectCityWalk".
This is my actual problem:
?var0 is used inside an aggregate and ?var1 is not. Whenever I try to validate the syntax, I get the following error message:
Non-group key variable in SELECT: ?var1 in expression ( sum(?var0) / ?var1 )
The complete Query:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX cz: <http://www.vs.cs.hs-rm.de/ontostor/SVC#Cluster>
PREFIX n: <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node>
SELECT ( (SUM(?var0) / ?var1) AS ?result)
WHERE{
?chain0 rdf:type rdfs:Property .
?chain0 rdfs:domain <http://www.vs.cs.hs-rm.de/ontostor/SVC#Cluster> .
?chain0 rdfs:range <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node> .
?this ?chain0 ?arg0 .
?arg0 n:node_realtime_cpu ?var0 .
?this cz:node_count ?var1 .
}
My question is how to correct that calculation to fit the SPARQL syntax?
The immediate problem is that ?var1 is not grouped on, so a fix would be to simply append
GROUP BY ?var1
at the end of your query.
However, whether that gives you the calculation you actually want is another matter.
It's not quite clear what you're trying to calculate, but it looks as if you're attempting to determine the average node_realtime_cpu for a cluster. If that is the case, you can probably do your calculation by just using SPARQL's AVG function instead:
SELECT ( AVG(?var0) AS ?result)
WHERE{
?chain0 rdf:type rdfs:Property .
?chain0 rdfs:domain <http://www.vs.cs.hs-rm.de/ontostor/SVC#Cluster> .
?chain0 rdfs:range <http://www.vs.cs.hs-rm.de/ontostor/SVC#Node> .
?this ?chain0 ?arg0 .
?arg0 n:node_realtime_cpu ?var0 .
}
GROUP BY ?this # grouping on the cluster identifier so we get an average _per cluster_
Yet another alternative would be to keep your query as-is, but group on two variables:
GROUP BY ?this ?var1
Which is best depends on what your data looks like and what, exactly, you're trying to calculate.
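The difference between the two variants can be seen on a small made-up example. The Python sketch below emulates SUM(?var0) / ?var1 grouped on ?this and ?var1, and AVG(?var0) grouped on ?this. The bindings and function names are hypothetical, and the second cluster deliberately reports fewer CPU values than its node_count.

```python
from collections import defaultdict

# Hypothetical solution rows: (?this, ?var1 = node_count, ?var0 = cpu).
bindings = [
    ("cluster1", 2, 0.5),
    ("cluster1", 2, 0.75),
    ("cluster2", 4, 0.25),  # only two of four nodes report a value
    ("cluster2", 4, 0.25),
]


def sum_over_node_count(rows):
    """Emulate SUM(?var0) / ?var1 with GROUP BY ?this ?var1."""
    sums = defaultdict(float)
    for cluster, node_count, cpu in rows:
        sums[(cluster, node_count)] += cpu
    return {
        cluster: total / node_count
        for (cluster, node_count), total in sums.items()
    }


def average_per_cluster(rows):
    """Emulate AVG(?var0) with GROUP BY ?this."""
    values = defaultdict(list)
    for cluster, _node_count, cpu in rows:
        values[cluster].append(cpu)
    return {cluster: sum(v) / len(v) for cluster, v in values.items()}


print(sum_over_node_count(bindings))
print(average_per_cluster(bindings))
```

The two results agree for cluster1, where node_count equals the number of matched nodes, and differ for cluster2, where it does not.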