DBPedia - Most relevant predicates per resource - sparql

I'd like to determine the most relevant properties / predicates (not objects) for any resource in DBPedia and Yago (e.g. the top 20). For instance, intuitively for a music artist you would be interested in his age, genre, music label, records etc.
What should a good algorithm look like to solve this problem?
My current naive approach is the following.
First I retrieve all classes, ordered by their "size". (Warning: very expensive query!)
SELECT ?class (COUNT(DISTINCT ?e) AS ?c)
WHERE {
  ?e rdf:type ?class .
}
GROUP BY ?class
ORDER BY DESC(?c)
Then I run a query for each of those classes to count the entities in that class that have a given property.
SELECT ?prop (COUNT(DISTINCT ?e) AS ?c)
WHERE {
  ?e rdf:type <--CLASS--> .
  ?e ?prop []
}
GROUP BY ?prop
ORDER BY DESC(?c)
<--CLASS--> is replaced by the URI of the respective class. After some post-processing this gives me a list like this:
"dbo:Agent": {
"count": 1974654,
"properties": {
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type": 399948,
"http://www.w3.org/2002/07/owl#sameAs": 67799,
"dbp:name": 22272,
"dbp:hasPhotoCollection": 13122,
"http://xmlns.com/foaf/0.1/givenName": 10799,
"dbo:birthPlace": 10055,
"dbo:birthDate": 9953,
"dbo:birthYear": 9735
}
},
"dbo:Person": {
count:
...
This tells me which properties are most relevant for which class. Of course, "meta" properties like http://www.w3.org/2002/07/owl#sameAs should be ignored in a later step.
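The meta properties could also be dropped in the query itself rather than in post-processing. A minimal sketch, assuming dbo:MusicalArtist as the example class and assuming that rdf:type and owl:sameAs are the only predicates to exclude (extend the list as needed):

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbo: <http://dbpedia.org/ontology/>

# Count, per property, the distinct entities of one class that use it,
# skipping "meta" predicates. The class and the exclusion list are
# illustrative assumptions, not part of the original approach.
SELECT ?prop (COUNT(DISTINCT ?e) AS ?c)
WHERE {
  ?e rdf:type dbo:MusicalArtist .
  ?e ?prop [] .
  FILTER (?prop NOT IN (rdf:type, owl:sameAs))
}
GROUP BY ?prop
ORDER BY DESC(?c)
```

GROUP BY ?prop is what standard SPARQL 1.1 expects when an aggregate is selected next to a plain variable, even if some endpoints accept the query without it.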
However, entities belong to multiple classes, and potentially each of those is important and gives additional information. E.g. dbr:John_Lennon is (among others) in dbo:Person and dbo:MusicalArtist. I need to combine these classes' property rankings. I thought of the following approach, but I'm unsure if this is actually a reasonable solution.
So my idea was to compute relative weights for every property (e.g. propX in classA) by dividing the number of entities within classA that have propX by the total number of properties in classA. If I want to merge two classes then, e.g. classA and classB (or Person and MusicalArtist), I'd simply rank the properties of both classes in combination, ordered by their relative weights (is this a legit comparison?). If a property occurs in both classes, I'd compute the harmonic mean over both for the ranking.
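Written out, the weighting described above might look as follows. This is only a formalisation under one reading of "total number of properties": n_A(p) is the number of entities in classA that have property p, and the denominator sums these counts over all properties P_A observed in classA. The symbols are introduced here for illustration.

```latex
w_A(p) = \frac{n_A(p)}{\sum_{q \in P_A} n_A(q)}
\qquad
w_{A,B}(p) = \frac{2\, w_A(p)\, w_B(p)}{w_A(p) + w_B(p)}
\quad \text{if } p \in P_A \cap P_B
```

With this normalisation the weights within one class sum to 1, so a class with many distinct properties gets smaller individual weights. Dividing by the class size (the "count" field in the list above) instead would give the share of entities that carry the property.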
Assuming the above steps actually make sense (please let me know what you think), I have one more problem. I want to combine information from DBPedia and Yago, so for dbr:John_Lennon I want to fetch the equivalent (owl:sameAs) yr:John_Lennon from Yago. How can I merge the property rankings from both datasets to finally get a list of the top 20 most relevant properties, consisting of a mix of both DBP and Yago properties?

Related

Do all ontologies that import 'owl' or 'rdf', implement 'domain', 'range' and other related predicates?

Sorry if this is a noob's and simple question, but it will help me resolve a conceptual confusion of mine! I have some guesses, but want to make sure.
I got the location of a part of the brain via the NeuroFMA ontology and the query below:
PREFIX fma: <http://sig.uw.edu/fma#>
select ?loc{
fma:Superior_temporal_gyrus fma:location ?loc}
The result was: fma:live_incus_fm_14056
I thought I might be able to get some more information on this item.
Question 1: Would it make a difference if the result were a literal?
So, I used optional {?loc ?p ?o} and got some results.
However, I thought since this ontology also imported RDF and OWL, the following queries should work too, but it was not the case (hopefully these codes are correct)!
optional {?value rdfs:range ?loc}
optional {?loc rdfs:domain ?value}
optional {?loc rdf:type ?value}
Question 2: If the above queries are correct, are RDFS and OWL just a suggestion? Or do ontologies that import/follow them have to use all their resources or at least expand on them?
Thanks!
An import declaration in OWL is, for the most part, just informative. It is typically used to signal that this ontology re-uses some of the concepts defined in the target (for example, it could define some additional subclasses of classes defined in the target data).
Whether the import results in any additional data being loaded into your dataset depends on what database/API/reasoner you use to process the ontology. Most tools don't automatically load the targets of import declarations, by default, so the presence or absence of the import-declaration will have no influence on what your queries return.
I thought since this ontology also imported RDF and OWL, the following queries should work too, but it was not the case (hopefully these codes are correct)!
optional {?value rdfs:range ?loc}
optional {?loc rdfs:domain ?value}
optional {?loc rdfs:type ?value}
It's rdf:type, not rdfs:type. Apart from that, each of these individually looks fine. However, judging from your broader query, ?loc is usually not a property, but a property value. Property values don't have domains and ranges. You could query for something like this, possibly:
optional { fma:location rdfs:domain ?value}
This asks "if the property fma:location has a domain declaration, return that declaration and bind it to the ?value variable".
More generally, whether these queries return any results has little or nothing to do with which import declarations are present in your ontology. If your ontology contains a range declaration for a property, the first pattern will return a result. If it contains a domain declaration, the second one will return a result.
And finally, if your ontology contains an instance of some class, the third pattern (corrected) will return a result. It's as simple as that.
There is no magic here: the query only returns what is present in your dataset. What is present in your dataset is determined by how you have loaded the data into your database, and (optionally) what form of reasoner you have enabled on top of your database.
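Putting the pieces together, a query along these lines asks for the location, its type, and the domain and range declared on the property itself. It is a sketch only: the rdf: and rdfs: prefixes are the standard W3C namespaces, and whether ?type, ?domain or ?range get bound depends entirely on what your dataset (plus any reasoner) contains.

```sparql
PREFIX fma:  <http://sig.uw.edu/fma#>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?loc ?type ?domain ?range
WHERE {
  fma:Superior_temporal_gyrus fma:location ?loc .
  # Class membership of the value, if the dataset states one.
  OPTIONAL { ?loc rdf:type ?type }
  # Domain and range are declared on the property, not on its value.
  OPTIONAL { fma:location rdfs:domain ?domain }
  OPTIONAL { fma:location rdfs:range ?range }
}
```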

Multiple disjoint classes in rdf range constraint

I want to define multiple classes (with limited inferencing) as the range of an OWL object property (owl:ObjectProperty). Let me explain in detail with an example.
I have two classes, Furniture and Device, which are not disjoint, i.e. another subclass/instance can inherit from both classes, e.g. Lamp can be both furniture and a device.
Now I would like to define an OWL object property, hasComponent, whose range accepts either :Furniture or :Device, but NOT both.
:hasComponent rdf:type owl:ObjectProperty ;
rdf:type owl:TransitiveProperty ;
rdfs:range :Furniture ,
:Device .
When I create an instance using the property:
:furniture1 rdf:type :Furniture .
:device1 rdf:type :Device .
:furniture1 :hasComponent :device1 .
The inference engine will infer that :device1 is a :Furniture, which I don't want, because I have already defined :device1 as a :Device.
One solution is to remove rdfs:range and explicitly define the instance types, but I do not want to remove the range because the range limits the scope of the search space.
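Multiple rdfs:range statements on one property are read as an intersection: every value is inferred to belong to all of the listed classes, which is why :device1 also becomes a :Furniture. To get an exclusive "either/or" range, you have to create a union class of all the classes involved and subtract their intersection (example: ((Furniture or Device) and not (Furniture and Device))) and set that class as the range. The same approach needs to be used for domains.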
You can declare this as a named class, or insert it (with the necessary RDF/XML structure around it) directly into the range axiom. I would think you'll probably need the same class in multiple places, so a named class might be the best solution.
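As a rough sketch, such a named class could be added with a SPARQL Update like the one below. The namespace http://example.org/home# and the class name :FurnitureOrDeviceNotBoth are made up for illustration; the same triples can equally be written in Turtle or RDF/XML in the ontology file.

```sparql
PREFIX :     <http://example.org/home#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

INSERT DATA {
  # (Furniture or Device) and not (Furniture and Device)
  :FurnitureOrDeviceNotBoth a owl:Class ;
    owl:equivalentClass [
      a owl:Class ;
      owl:intersectionOf (
        [ a owl:Class ; owl:unionOf ( :Furniture :Device ) ]
        [ a owl:Class ;
          owl:complementOf [ a owl:Class ;
                             owl:intersectionOf ( :Furniture :Device ) ] ]
      )
    ] .

  # A single range statement pointing at the named class.
  :hasComponent rdfs:range :FurnitureOrDeviceNotBoth .
}
```

The two separate rdfs:range statements on :hasComponent would have to be removed as well, otherwise their intersection semantics still applies.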

How to retrieve classes of an individual, ignoring classes that are the same, in SPARQL?

I have a taxonomy in my ontology. For some of these classes, there are "sameAs" predicates with other classes.
I need to retrieve the classes of a given individual, which is classified in this taxonomy. However, when two classes are the same (related by "sameAs"), I want to retrieve just one of them.
I am using the following query, for retrieving the labels of the classes:
select ?classLabel
where {{
OPTIONAL {
URI a ?class .
?class rdfs:label ?classLabel.
}
}}
It retrieves the labels of the classes of URI. However, it does not exclude the redundant classes (classes that are the same).
How to retrieve just one class of two classes that are the same in SPARQL?

Strings in Sparql

I'm playing around with DBPedia.
With this query I get all people who were born in London:
SELECT ?person
WHERE {
?person dbo:birthPlace :London
}
But why do I get an empty result when I execute this query?
SELECT ?person
WHERE {
?person dbo:birthPlace "London"
}
I just changed London to a String.
This is because the object of this relation is an entity (identified by a URI), not a string, so the second query matches nothing.
To find out whether a property (e.g. dbo:birthPlace) relates an entity to a literal or to another entity, one approach is to look at the "About" page of the property, for example the one for birthPlace.
What can be seen there is that the type of birthPlace is owl:ObjectProperty, meaning that the object of the relation will have to be an entity, defined with a URI.
The other possibility would be DatatypeProperty, as for the "abstract" property for example, where the object of the relation will be a literal.
The fact that the birth place is an entity allows a lot of things, such as retrieving specific information about that place in the same query, for example.
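For instance, the string can still be used by matching it against the label of the place entity instead of against the birth place itself. A sketch, assuming the place entities carry an English rdfs:label (the language tag matters: a plain "London" without @en is a different literal):

```sparql
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?person ?place
WHERE {
  ?person dbo:birthPlace ?place .
  # The string lives on the place entity, as a language-tagged label.
  ?place rdfs:label "London"@en .
}
```

Note that this returns every place labelled "London", not only the one the :London URI refers to.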
Hope that helps!

How to query for all direct subclasses in SPARQL?

I have A, B and C as classes related by the transitive property isSubClassOf.
So A isSubClassOf B and B isSubClassOf C. So by inference we have A isSubClassOf C.
My question: how can I write a SPARQL query that returns, for each class, only the number of its direct subclasses? For example:
A 0
B 1
C 1
Within the standard SPARQL language, you can do this by querying for those subclasses where no other subclass exists "in between", like so:
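SELECT ?directSub ?super
WHERE {
  ?directSub rdfs:subClassOf ?super .
  FILTER NOT EXISTS {
    ?otherSub rdfs:subClassOf ?super .
    ?directSub rdfs:subClassOf ?otherSub .
    # ?otherSub must differ from both ends, otherwise reflexive
    # subClassOf triples added by a reasoner rule out every pair.
    FILTER (?otherSub != ?directSub && ?otherSub != ?super)
  }
}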
If you want to count the number of subclasses, you will need to adapt the above query using the COUNT and GROUP BY operators.
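One possible shape for the counting variant is sketched below. It assumes that every class is explicitly typed as owl:Class (so that classes without subclasses still show up with a count of 0) and that the hierarchy uses rdfs:subClassOf; with a custom property such as isSubClassOf, substitute that property.

```sparql
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?class (COUNT(DISTINCT ?directSub) AS ?directSubCount)
WHERE {
  ?class a owl:Class .
  # OPTIONAL keeps classes that have no subclass at all.
  OPTIONAL {
    ?directSub rdfs:subClassOf ?class .
    FILTER (?directSub != ?class)
    # Drop ?directSub if some other class sits in between.
    FILTER NOT EXISTS {
      ?directSub rdfs:subClassOf ?otherSub .
      ?otherSub rdfs:subClassOf ?class .
      FILTER (?otherSub != ?directSub && ?otherSub != ?class)
    }
  }
}
GROUP BY ?class
```

For the A, B, C hierarchy in the question this should report 0 for A and 1 each for B and C, provided all three are typed as classes in the data.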
Many SPARQL engines offer some shortcuts for querying direct subclasses, however. For example in Sesame, when querying programmatically, you can disable inferencing for the duration of the query by setting a boolean property on the Query object to false. It also offers an additional reasoner which can be configured on top of a datastore and which allows you to query using a "virtual" property, sesame:directSubClassOf (as well as sesame:directType and sesame:directSubPropertyOf).
Other SPARQL engines have similar mechanisms.