Extracting hierarchy for dbpedia entity using SPARQL - sparql

I am trying to extract the hierarchy of Wikipedia category or Yago classification for DBpedia resources using the SPARQL endpoint. For instance, I would like to find out all the possible categories and classes in hierarchical form of entity, say, http://dbpedia.org/resource/Nokia, like Thing → Organization → Company → … → Nokia.

A simple SPARQL select can retrieve the information that you're interested in, though it won't be arranged hierarchically. You're interested in getting all the types of a resource, as well as the rdfs:subClassOf relations between them. Here's a very simple query for Nokia that can be run on the DBpedia SPARQL endpoint
SELECT * WHERE {
dbpedia:Nokia a ?c1 ; a ?c2 .
?c1 rdfs:subClassOf ?c2 .
}
SPARQL results
If you treat each pair of classes in that result set as a directed edge and perform a topological sort , then you'll see the hierarchy of the classes to which the Nokia resource belongs. In fact, since it is probably convenient to treat this as a graph, you can get it in the form of an RDF graph by using a SPARQL construct query.
CONSTRUCT WHERE {
dbpedia:Nokia a ?c1 ; a ?c2 .
?c1 rdfs:subClassOf ?c2 .
}
SPARQL results
The construct query produces this graph (in N3 format):
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
#prefix owl: <http://www.w3.org/2002/07/owl#> .
#prefix yago: <http://dbpedia.org/class/yago/> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix dbpedia: <http://dbpedia.org/resource/> .
dbpedia-owl:Agent rdfs:subClassOf owl:Thing .
dbpedia-owl:Company rdfs:subClassOf dbpedia-owl:Organisation .
dbpedia-owl:Organisation rdfs:subClassOf dbpedia-owl:Agent .
yago:CompaniesBasedInEspoo rdfs:subClassOf yago:Company108058098 .
dbpedia:Nokia rdf:type yago:CompaniesListedOnTheHelsinkiStockExchange ,
owl:Thing ,
yago:CompaniesBasedInEspoo ,
dbpedia-owl:Agent ,
yago:DisplayTechnologyCompanies ,
yago:ElectronicsCompaniesOfFinland ,
dbpedia-owl:Company ,
dbpedia-owl:Organisation ,
yago:Company108058098 ,
yago:CompaniesEstablishedIn1865 .
yago:CompaniesEstablishedIn1865 rdfs:subClassOf yago:Company108058098 .
yago:CompaniesListedOnTheHelsinkiStockExchange rdfs:subClassOf yago:Company108058098 .
yago:DisplayTechnologyCompanies rdfs:subClassOf yago:Company108058098 .
yago:ElectronicsCompaniesOfFinland rdfs:subClassOf yago:Company108058098 .
Remarks
The queries above retrieve the rdf:type hierarchy for Nokia. In the question, you also mention Wikipedia categories. DBpedia resources are associated with the Wikipedia categories to which their corresponding articles belong by the dcterms:subject property. Those Wikipedia categories are then structured hierarchically by skos:broader. These really are not types for the individuals though. For instance, the data contain:
dbpedia:Nokia dcterms:subject category:Finnish_brands
category:Finnish_brands skos:broader category:Brands_by_country
While it probably makes sense to say that Nokia is a Finnish_brand, it makes much less sense to say that Nokia is a Brand_by_country.

Related

Returning full subgraph from node in SPARQL with GraphDB

I am trying to get a full subgraph, with all relations intact from the following ontology: http://purl.obolibrary.org/obo/NCBITaxon_9443
The NCBI ontology is very large and we only want the subset of all the primates for example. I currently use the following query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?s2 ?s3 ?s4 ?s5 FROM <http://purl.obolibrary.org/obo/merged/NCBITAXON> WHERE {
SERVICE <http://sparql.hegroup.org/sparql>
{
?s rdf:type ?o .
?s2 rdfs:subClassOf ?s.
?s3 rdfs:subClassOf ?s2.
?s4 rdfs:subClassOf ?s3.
?s5 rdfs:subClassOf ?s4.
FILTER ( ?s = <http://purl.obolibrary.org/obo/NCBITaxon_9443> )
}
}
I don't think this is extensive enough and it also does not return a "nice" format. I would preferably get a subgraph in OWL or RDF(S) format back.
Can anyone help me with this? Thanks!

How to draw an RDF graph?

I am learning about the semantic web and would like to create an rdf graph of my data
#prefix dbc: <http://dbpedia.org/resource/Category:> .
#prefix dbr: <http://dbpedia.org/resource/> .
#prefix dbo: <http://dbpedia.org/ontology/> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix dct: <http://purl.org/dc/terms/> .
#prefix skos: <http://www.w3.org/2004/02/skos/core#> .
#prefix owl: <http://www.w3.org/2002/07/owl#> .
skos:broader rdf:range skos:Concept ;
owl:inverseOf skos:narrower .
dbc:Jane skos:broader dbc:Singer .
dbr:Jane a dbo:Person ;
dct:subject dbc:Singer ,
dbc:Jane .
skos:Concept a owl:Class .
dbc:Abstract_expressionism skos:narrower dbc:Singer .
Here is my attempt:
RDF Generated Graph
I used an RDF graph generator but I am unsure as to whether the results are accurate, my confusion is it did not include the first dbc line "dbc:Jane skos:broader dbc:Singer ."
Is this a mistake or is this as expected?
(Added Question) I have also used two prefixes for the same entity, is this acceptable practice or do I need to revise my data.
The graph as drawn is correct. It does include the statement about dbc:Jane as well. Also note that dbc:Jane and dbr:Jane are completely different entities, since these names "expand" to different URIs.

generalize pattern with intersection of restrictions?

I'm interested in finding diagnosis codes for variants of diabetes (or any other enumerated disease), axiomatically excluding some diseases that are more accurately diabetes-related syndromes.
I use the Monarch Disease Ontology (MonDO) as my authority on the modelling of diseases and the codes that indicate them in an electronic medical record (like ICD or SNOMED in some countries.)
I have been retrieving "all variants of a disease" with rdfs:subClassOf* triples. The MonDO model seems pretty good, but sometimes it seems a little overinclusive, and I would like to axiomatically omit some of the subClass relations.
For example, H Syndrome is asserted to be a rdfs:subClassOf* diabetes, but as a syndrome, it includes many features other than the essence of diabetes (elevated blood sugar levels for extended periods of time.) The ask below shows how I could minus this particular disease out of a query about types of diabetes, based on the fact that it has syndromic and genetic modifications.
How can I look for diseases with syndromic modifications in a general way, anticipating that the owl:Restrictions may not be composed in a consistent way? There could be more (or fewer) intersections, rds:subClassOf might be used instead of owl:equivalentClass...
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
prefix H_Synd: <http://purl.obolibrary.org/obo/MONDO_0019589>
prefix hasMod: <http://purl.obolibrary.org/obo/RO_0002573>
prefix syndromePres: <http://purl.obolibrary.org/obo/MONDO_0021127>
prefix genetic: <http://purl.obolibrary.org/obo/MONDO_0021152>
ask where {
H_Synd: owl:equivalentClass ?ec .
?ec owl:intersectionOf ?i1 .
?i1 rdf:first|rdf:rest ?ilpart .
?ilpart rdf:first|rdf:rest ?ilpartf , ?ilpartr .
?ilpartf rdf:type owl:Restriction ;
owl:onProperty hasMod: ;
owl:someValuesFrom syndromePres: .
?ilpartr rdf:first|rdf:rest ?ilpartrpart .
?ilpartrpart rdf:type owl:Restriction ;
owl:onProperty hasMod: ;
owl:someValuesFrom genetic: .
}
As usual, this is based on generous comments:
The property path query below retrieves 4000+ syndromic disease from my repository. I haven't gotten it to run on OntoBee yet.
Concerns:
I thought I should use a more inclusive property path, like (rdf:first|rdf:rest|rdfs:subClassOf)*, but that kept running for several minutes before I aborted. Obviously the ?pre rdf:first* ?ilpartf gets some useful classes, but ?pre rdf:first ?ilpartf pattern only gets 36 blank nodes.
I can get the same results from a simple rdfs:subClassOf* query. MonDO contains the OWL axiom below. If it has already been reasoned, I'm just getting disease classes that have been directly asserted as syndromic diseases
The thousands of syndromic disease classes are all bound to a single pair of { ?pre ?sub }. The other 35 ?pre blank nodes aren't bound to any ?sub class other than themselves.
Syndromic disease axiom
disease or disorder and (has
modifier some
has a syndromic presentation)
rdfs:subClassOf* query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
select * where {
?s rdfs:subClassOf* obo:MONDO_0002254 .
}
property path query:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
prefix H_Synd: <http://purl.obolibrary.org/obo/MONDO_0019589>
prefix hasMod: <http://purl.obolibrary.org/obo/RO_0002573>
prefix syndromePres: <http://purl.obolibrary.org/obo/MONDO_0021127>
prefix genetic: <http://purl.obolibrary.org/obo/MONDO_0021152>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select * where {
graph obo:mondo.owl {
# ?pre (rdf:first|rdf:rest|rdfs:subClassOf)* ?ilpartf .
?pre rdf:first* ?ilpartf .
?ilpartf rdf:type owl:Restriction ;
owl:onProperty hasMod: ;
owl:someValuesFrom syndromePres: .
?sub rdfs:subClassOf* ?pre .
optional {
?sub rdfs:label ?l
}
}
}

Removing unwanted superclass answers in SPARQL

I have an OWL file that includes a taxonomic hierarchy that I want to write a query where the answer includes each individual and its immediate taxonomic parent. Here's an example (the full query is rather messier).
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#prefix rdf: <http:://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix : <urn:ex:> .
:fido rdf:type :Dog .
:Dog rdfs:subClassOf :Mammal .
:Mammal rdfs:subClassOf :Vertebrate .
:Vertebrate rdfs:subClassOf :Animal .
:fido :hasToy :bone
:kitty rdf:type :Cat .
:Cat rdfs:subClassOf :Mammal .
:kitty :hasToy :catnipMouse .
And this query does what I want.
prefix rdf: <http:://www.w3.org/1999/02/22-rdf-syntax-ns#> .
prefix : <urn:ex:> .
SELECT ?individual ?type
WHERE {
?individual :hasToy :bone .
?individual rdf:type ?type .
}
The problem is that I'd rather use a reasoned-over version of the OWL file, which unsurprisingly includes additional statements:
:fido rdf:type :Mammal .
:fido rdf:type :Vertebrate .
:fido rdf:type :Animal .
:kitty rdf:type :Mammal .
:kitty rdf:type :Vertebrate .
:kitty rdf:type :Animal .
And now the query results in additional answers about Fido being a Mammal, etc. I could just give up on using the reasoned version of the file, or, since the SPARQL queries are called from java, I could do a bunch of additional queries to find the least inclusive type that appears. My question is whether there is a reasonable pure SPARQL solution to only returning the Dog solution.
A generic solution is that you make sure you ask for the direct type only. A class C is the direct type of an instance X if:
X is of type C
there is no C' such that:
X is of type C'
C' is a subclass of C
C' is not equal to C
(that last condition is necessary, by the way, because in RDF/OWL, the subclass-relation is reflexive: every class is a subclass of itself)
In SPARQL, this becomes something like this:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <urn:ex:> .
SELECT ?individual ?type
WHERE {
?individual :hasToy :bone .
?individual a ?type .
FILTER NOT EXISTS { ?individual a ?other .
?other rdfs:subClassOf ?type .
FILTER(?other != ?type)
}
}
Depending on which API/triplestore/library you use to execute these queries, there may also be other, tool-specific solutions. For example, the Sesame API (disclosure: I am on the Sesame dev team) has the option to disable reasoning for the purpose of a single query:
TupleQuery query = conn.prepareTupleQuery(SPARQL, "SELECT ...");
query.setIncludeInferred(false);
TupleQueryResult result = query.evaluate();
Sesame also offers an optional additional inferencer (called the 'direct type inferencer') which introduces additional 'virtual' properties you can query, such as sesame:directType, sesame:directSubClassOf, etc. Other tools will undoubtedly have similar options.

SPARQL query and reasoning to get similar indviduals from two different ontologies

The first ontology has the following:
Issue Ontology members(classes):
<http://www.issueonto.com/ontologies/issues#issues>
<http://www.issueonto.com/ontologies/issues#products>
Predicate/Properties:
<http://www.issueonto.com/ontologies/issues#hasIssues>
Triple store for this ontology (raw data), I show it here in Turtle format:
#prefix : <http://www.issueonto.com/ontologies/issues#> .
#prefix owl: <http://www.w3.org/2002/07/owl#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix xml: <http://www.w3.org/XML/1998/namespace> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#base <http://www.issueonto.com/ontologies/issues> .
:Fido rdf:type :products ,
owl:NamedIndividual ;
:productName "FidoProdCEO_12"^^xsd:string ;
:hasIssues :issue_1239 .
### http://www.issueonto.com/ontologies/issues#issue_1239
:issue_1239 rdf:type :issues ,
owl:NamedIndividual ;
:issueName "FeatureIssue"^^xsd:string .
The Second ontology has the following:
Project Ontology members (classes):
<http://www.projectexample.com/ontology/project#GroupProject>
<http://www.projectexample.com/ontology/project#Project>
<http://www.projectexample.com/ontology/project#ProjectVersion>
Predicate/Properties:
<http://www.projectexample.com/ontology/project#belongsTo>
<http://www.projectexample.com/ontology/project#dependsOn>
Triple store for the ontology (raw data), I show it here in Turtle format:
#prefix : <http://www.projectexample.com/ontology/project#> .
#prefix owl: <http://www.w3.org/2002/07/owl#> .
#prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
#prefix xml: <http://www.w3.org/XML/1998/namespace> .
#prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
#prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
#base <http://www.projectexample.com/ontology/project> .
### http://www.projectexample.com/ontology/project#Apple
:Apple rdf:type :ProjectVersion ,
owl:NamedIndividual ;
:hasProjectName "AppleTowandOne"^^xsd:string ;
:belongsTo :RedBlueCompany .
### http://www.projectexample.com/ontology/project#Fido
:Fido rdf:type :ProjectVersion ,
owl:NamedIndividual ;
:hasProjectName "FidoProdCEO"^^xsd:string ;
:dependsOn :Apple .
### http://www.projectexample.com/ontology/project#RedBlueCompany
:RedBlueCompany rdf:type :GroupProject ,
owl:NamedIndividual ;
:groupName "RedGroupCompant lmt"^^xsd:string .
Question
1- I would like to say, project:projectversion from ontology project same as issues:product from ontology issues, is that possible and how?
2- if question (1) is yes, how could I infer the similar individuals from the shared concepts, i.e, if we say projectversion is same as product it does not mean all the individuals are similar, in the example, i would like to automatically infer the individual issues:Fido type of issue:products is same as the individual prject:Fido type of project:projectversion. From that inferred fact, i would infer automatically that project:Fido issue:hasissue issues:issues_1239. Finally, I would like to run SPARQL query as follow:
SELECT ?product ?issue FROM <namegraph>
WHERE{
?product issues:hasIssues ?issue.
}
The results that I should get as follow:
?product ?issue
--------------------------------------------------------------------------
<http://www.projectexample.com/ontology/project#Fido> <http://www.issueonto.com/ontologies/issues#issue_1239>
<http://www.issueonto.com/ontologies/issues#Fido> <http://www.issueonto.com/ontologies/issues#issue_1239>
I would like to say, project:projectversion from ontology project same
as issues:product from ontology issues, is that possible and how?
All you need is the triple
project:projectversion owl:equivalentClass issues:product
I don't know how you're combining these ontologies; whether you're just loading the data from both into a triple store, or creating a third ontology that imports both and loading that (along with its imports) into a triple store, but somewhere you need that axiom. For a "merging" ontology like this, I'd usually create a third ontology that imports both (but leaving them unchanged) and add the axiom to that third ontology.
2- if question (1) is yes, how could I infer the similar individuals
from the shared concepts, i.e, if we say projectversion is same as
product it does not mean all the individuals are similar, in the
example, i would like to automatically infer the individual
issues:Fido type of issue:products is same as the individual
prject:Fido type of project:projectversion. From that inferred fact, i
would infer automatically that project:Fido issue:hasissue
issues:issues_1239.
You still have told us what criteria you would use to decide that issues:Fido and project:Fido are the same individual. The only apparent similarity that they have is the strings "FidoProdCEO_12" and "FidoProdCEO". Is that what the decision is supposed to be based on? If so, then you could do something like the following. I've created a minimal amount of data for convenience:
#prefix o1: <urn:ex:ont1#> .
#prefix o2: <urn:ex:ont2#> .
#prefix owl: <http://www.w3.org/2002/07/owl#> .
o1:A a o1:Product ;
o1:productName "ProductA_1234" ;
o1:hasIssue o1:issue42 .
o2:B a o2:ProjectVersion ;
o2:projectName "ProductA" .
o1:Product owl:equivalentClass o2:ProjectVersion .
prefix o1: <urn:ex:ont1#>
prefix o2: <urn:ex:ont2#>
prefix owl: <http://www.w3.org/2002/07/owl#>
select ?product ?issue where {
#-- A *product* is something that's an instance of
#-- o1:Product or another class that's equivalent
#-- to it.
?product a/(owl:equivalentClass|^owl:equivalentClass)* o1:Product
#-- The issues of a product are any of its
#-- o1:hasIssue values, or the o1:hasIssue
#-- value of any product that has a name
#-- beginning with its o2:projectName.
{ ?product o1:hasIssue ?issue }
union
{ ?product o2:projectName ?projectName .
?_product o1:productName ?productName ;
o1:hasIssue ?issue .
filter strstarts(?productName,?projectName)
}
}
------------------------
| product | issue |
========================
| o2:B | o1:issue42 |
| o1:A | o1:issue42 |
------------------------
Of course, the fact that you still end up having to examine projectName and productName values means that the equivalent class axiom isn't actually buying you all that much (at least in terms of this query). That is, it would be sufficient to just ask for "products (and projects with matching names) and their issues." That is, you get the same results from this query, which is just the second part of the first query:
prefix o1: <urn:ex:ont1#>
prefix o2: <urn:ex:ont2#>
prefix owl: <http://www.w3.org/2002/07/owl#>
select ?product ?issue where {
{ ?product o1:hasIssue ?issue }
union
{ ?product o2:projectName ?projectName .
?_product o1:productName ?productName ;
o1:hasIssue ?issue .
filter strstarts(?productName,?projectName)
}
}
The solution here is an owl:equivalentClass relation. To make this work you have to perform the following tasks:
Create and RDF document into which you place the owl:equivalentClass relation -- alternatively you can use SPARQL 1.1 INSERT to place the relation into a Virtuoso hosted Named Graph
Create an Inference Rule that functions as a Context-Lense when viewing the data in Virtuoso -- be it via SPARQL Query Results of an Entity Description Page etc.
Command (issued via SQL CommandLine of Conductor UI) for associating a Named Graph with an Inference Rule:
RDFS_RULE_SET ('{rule-name}', '{named-graph-uri-or-rdf-document-url}');
Links:
Sample file (note SQL part is commented out re., Rules and Named Graph association)
Live Example -- showcasing owl:equivalentClass reasoning via Schema.org, FOAF, and GoodRelations ontologies.