How to write a SPARQL query to pull from OWL file - sparql

So this is my owl File I absolutely new to Jena/SQL items so I am just trying to test it out.
prefix part:
<Ontology xmlns="http://www.w3.org/2002/07/owl#"
xml:base="http://www.semanticweb.org/ontologies/5/test-ontology-2
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
ontologyIRI="http://www.semanticweb.org/ontologies/5/test-ontology-2>
<Prefix name="" IRI="http://www.semanticweb.org/ontologies/5/test-ontology-2/>
<Prefix name="owl" IRI="http://www.w3.org/2002/07/owl#"/>
<Prefix name="rdf" IRI="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>
<Prefix name="xml" IRI="http://www.w3.org/XML/1998/namespace"/>
<Prefix name="xsd" IRI="http://www.w3.org/2001/XMLSchema#"/>
<Prefix name="rdfs" IRI="http://www.w3.org/2000/01/rdf-schema#"/>
I am attempting to write one to just pull anything that shows hasConcept. However, I know the hasConcept is part of the prefix? I think that is what it is called. So I'm not sure how to just filter it to pull it.
<owl:NamedIndividual rdf:about="http://www.semanticweb.org/ontologies/5/test-ontology-
2#Structures">
<rdf:type rdf:resource="http://www.semanticweb.org/ontologies/5/test-ontology-2#Course"/>
<untitled-ontology-2:hasConcept
rdf:resource="http://www.semanticweb.org/ontologies/5/test-ontology-2#Cong"/>
<untitled-ontology-2:hasConcept
rdf:resource="http://www.semanticweb.org/ontologies/5/test-ontology-2#Func"/>
<untitled-ontology-2:hasTopic rdf:resource="http://www.semanticweb.org/ontologies/5/test-
ontology-2#Time"/>
<untitled-ontology-2:courseNumber>CMSC 2123</untitled-ontology-2:courseNumber>
</owl:NamedIndividual>
I've tried going through the documentation on Apache Jena which I can understand through their example for RDF but I still get a little confused mainly because I'm not a good programmer so concepts are still hard for me to understand.
But I'm just trying any help would be greatly appreciated.
I'm not sure how to pull just the #func,#cong,#time part or if it is even possible to pull just this section.

Related

Slow SPARQL query with rdf:Seq

I'm trying to retrieve all elements of a rdf:Seq with SPARQL. The RDF structure is as follows. A subproject with a rdf:Seq of timeclaims and the individual timeclaim information. The list of timeclaims for a subproject can be of any length:
<rdf:Description rdf:about="http://www.example.com/resource/subproject/2017-nieuw-1">
<rdf:type rdf:resource="http://www.example.com/ontologie/example/Subproject"/>
<rdfs:label>Subproject label</rdfs:label>
<pbl:subproject_timeclaims rdf:resource="http://www.example.com/resource/list/5853abbfdcc97"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.example.com/resource/list/5853abbfdcc97">
<rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq"/>
<rdf:_1 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfd6aa4"/>
<rdf:_7 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfd957b"/>
<rdf:_6 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfd8e68"/>
<rdf:_14 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfdc541"/>
<rdf:_5 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfd879f"/>
<rdf:_2 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfd71db"/>
<rdf:_3 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfd78be"/>
<rdf:_4 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfd7f92"/>
<rdf:_8 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfd9c4c"/>
<rdf:_9 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfda31c"/>
<rdf:_10 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfdaa08"/>
<rdf:_11 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfdb0e6"/>
<rdf:_12 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfdb7bd"/>
<rdf:_13 rdf:resource="http://www.example.com/resource/timeclaim/5853abbfdbe7f"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.example.com/resource/timeclaim/5853abbfdc541">
<rdf:type rdf:resource="http://www.example.com/ontologie/example/Timeclaim"/>
<pbl:timeclaim_description>Description</pbl:timeclaim_description>
<pbl:timeclaim_hours>25</pbl:timeclaim_hours>
<pbl:timeclaim_employee
rdf:resource="http://www.example.com/resource/employee/2222333334444"/>
</rdf:Description>
Starting from the timeclaims I'm trying to retrieve the information of the subproject above (and filter on it). But the query is taking forever. Eventually the data is returned but I have the feeling it could be quicker.
SELECT *
WHERE {
?tc_item a :Timeclaim .
?tc_list ?p ?tc_item .
?subproject pbl:subproject_timeclaims ?tc_list
}
Could you point out any mistakes in the SPARQL query and better ways of doing this? Or maybe the RDF structure could be improved? The numbering in this case is not really relevant but the same list structure with rdf:Seq is present in more places in the database (and the order is important in those cases).

SPARQL query does not find individual in imported ontology when run in jena

I have 3 ontology files where the first imports the second and the second imports the third:
The first ontology imports the second one:
<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.example.com/user/rainer/ontologies/2016/1/usecase_individuals#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:uc="http://www.example.com/user/rainer/ontologies/2016/1/usecase#">
<owl:Ontology rdf:about="http://www.example.com/user/rainer/ontologies/2016/1/usecase_individuals">
<owl:imports rdf:resource="http://www.example.com/user/rainer/ontologies/2016/1/usecase"/>
</owl:Ontology>
....
The second ontology imports the thrid one:
<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.example.com/user/rainer/ontologies/2016/1/usecase#"
xml:base="http://www.example.com/user/rainer/ontologies/2016/1/usecase"
xmlns:fgcm="http://www.example.com/user/rainer/ontologies/2016/1/fgcm#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:uc="http://www.example.com/user/rainer/ontologies/2016/1/usecase#">
<owl:Ontology rdf:about="http://www.example.com/user/rainer/ontologies/2016/1/usecase">
<owl:imports rdf:resource="http://www.boeing.com/user/rainer/ontologies/2016/1/fgcm"/>
</owl:Ontology>
....
And the third ontology (created in Protégé) asserts an individual:
<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.boeing.com/user/rainer/ontologies/2016/1/fgcm#"
...
<owl:NamedIndividual rdf:about="http://www.boeing.com/user/rainer/ontologies/2016/1/fgcm#admin">
<rdf:type rdf:resource="http://www.boeing.com/user/rainer/ontologies/2016/1/fgcm#User"/>
<userName>admin</userName>
</owl:NamedIndividual>
...
When I open the first ontology in Protégé and execute the SPARQL query
PREFIX fgcm: <http://www.example.com/user/rainer/ontologies/2016/1/fgcm#>
SELECT ?subject ?name WHERE { ?subject fgcm:userName ?name}
it find the individual in the third ontology without a problem. When I run the same SPARQL query from code in Jena I don't get that individual. The query is run against an OntModel that was created with the default settings.
I know that Jena is able to load and import the ontologies because I can access classes and properties from the imported ontologies, both in SPARQL queries and directly using the Jena API. My problem appears to be limited to the individuals that are asserted in the imported ontology.
I have searched for settings (when loading the ontology such as the different OntModelSpecs or when creating/ running the query) that might change this behavior but haven't found any solutions.
It turned out that I was mistaken about Jena successfully loading the imported ontologies. (Not getting an error does not imply that the ontology that should be imported was actually found).
The SPARQL queries returned the expected result after using an OntDocumentManager and telling it where to find the ontology files that needed to be imported. This is the code snipped that worked for me:
OntDocumentManager mgr = new OntDocumentManager ();
mgr.addAltEntry("http://www.boeing.com/user/rainer/ontologies/2016/1/usecase", "file:C:\\Dev\\luna_workspace\\fgcm_translate\\usecase.owl");
mgr.addAltEntry("http://www.boeing.com/user/rainer/ontologies/2016/1/fgcm", "file:C:\\Dev\\luna_workspace\\fgcm_translate\\fgcm.owl");
OntModelSpec spec = new OntModelSpec ( OntModelSpec .OWL_DL_MEM_TRANS_INF);
spec.setDocumentManager(mgr);
OntModel model = ModelFactory.createOntologyModel(spec);
I hope this helps someone if they run into a similar problem.

Upgrading sitecore 6.6 index configuration to sitecore 7 (using ComputedFields)

Sitecore CMS+DMS 6.6.0 rev.130404 => 7.0 rev.130424
In our project we have been using AdvancedDatabaseCrawler (ADC) for our indexes (specially because of it's dynamic fields feature). Here's a sample index configuration:
<index id="GeoIndex" type="Sitecore.Search.Index, Sitecore.Kernel">
<param desc="name">$(id)</param>
<param desc="folder">$(id)</param>
<analyzer ref="search/analyzer" />
<locations hint="list:AddCrawler">
<web type="scSearchContrib.Crawler.Crawlers.AdvancedDatabaseCrawler, scSearchContrib.Crawler">
<database>web</database>
<root>/sitecore/content/Globals/Locations</root>
<IndexAllFields>true</IndexAllFields>
<include hint="list:IncludeTemplate">
<!--Suburb Template-->
<suburb>{FF0D64AA-DCB4-467A-A310-FF905F9393C0}</suburb>
</include>
<dynamicFields hint="raw:AddDynamicFields">
<dynamicField type="OurApp.CustomSearchFields.SearchTextField,OurApp" name="search text" storageType="NO" indexType="TOKENIZED" vectorType="NO" />
<dynamicField type="OurApp.CustomSearchFields.LongNameField,OurApp" name="display name" storageType="YES" indexType="UN_TOKENIZED" vectorType="NO" />
</dynamicFields>
</web>
</locations>
</index>
As you can see, we use scSearchContrib.Crawler.Crawlers.AdvancedDatabaseCrawler as the crawler and it uses the fields defined inside <dynamicFields hint="raw:AddDynamicFields"> section to inject custom fields into the index.
Now we are upgrading our project to sitecore 7. In Sitecore 7, they have ported the DynamicFields functionality from ADC into sitecore. I found out some articles on this and converted our custom search field classes to implement sitecore 7 IComputedIndexField interface instead of inheriting from BaseDynamicField class in ADC. Now my problem is how to change the index configuration to match with new sitecore 7 APIs. There were bits and pieces on the web but couldn't find all the examples I needed to convert my configuration. Can anybody help me on this?
While I'm doing this I'm under the impression that we won't have to rebuild our indexes since it still uses Lucene internally. I don't want to change the index structure. Just want to upgrade the code and configuration from AdvancedDatabaseCrawler to Sitecore 7. Should I be worried about breaking our existing indexes? Please shed some light on this as well.
Thanks
A few quick clarifications :)
We have not merged ADC into Sitecore 7, the ContentSearch layer is a complete rewrite of the old search layer for Sitecore. We have taken some of the core concepts from ADC, such as dynamic fields, and put them in the new implementation (as ComputedFields). They are not 1:1 compatible and you will have to do some work on your indexes.
The version of Lucene has also been upgraded from 2.* to 3.0.3 so all indexes will need to be re-created anyway as they are a very different version of Lucene.
There are two options here, the old Lucene search (Sitecore.Search namespace) (which ADC was built upon) has not been touched and will still work in the same way, although I am not sure about ADC compatibility with SItecore 7 as in theory this has now been superseded.
The next option is to update your index to take advantage of the new search features of Sitecore 7. The configuration you have will not be directly compatible but, while you will need to rework your index into the new configuration, the basic concepts should be familiar to you. The dynamic fields you have should map logically to ComputedFields (fields that are calculated when an item is indexed) and everything else is straightforward.
While it looks like a lot of extra config for ContentSearch a lot of it is default config that you will not need to touch, you will just need to override the configuration parts for the computed fields you want to add and the template you want to include.
An example of creating your own configuration override can be found here : http://www.mikkelhm.dk/post/2013/10/12/Defining-a-custom-index-in-Sitecore-7-and-utilizing-it.aspx
I would also recommend making sure you upgrade to 7.0 rev. 131127 (7.0 Update-3) as this fixes a bug in the IncludeTemplates logic in the version you currently have.
I managed to convert the index configuration for sitecore ContentSearch API. Looking at Sitecore default index configurations was a great help for this.
Note: As also mentioned by Stephen, <include hint="list:IncludeTemplate"> does not work in Sitecore 7.0 initial release. It's fixed in Sitecore 7.0 rev. 131127 (7.0 Update-3) and I'm planning to upgrade to it.
Here's a good article on sitecore 7 index update strategies by John West. It'll help you in configuration your indexes the way you want.
Converted configuration:
<sitecore>
<contentSearch>
<configuration type="Sitecore.ContentSearch.LuceneProvider.LuceneSearchConfiguration, Sitecore.ContentSearch.LuceneProvider">
<DefaultIndexConfiguration type="Sitecore.ContentSearch.LuceneProvider.LuceneIndexConfiguration, Sitecore.ContentSearch.LuceneProvider">
<IndexAllFields>true</IndexAllFields>
<include hint="list:IncludeTemplate">
<!--Suburb Template-->
<suburb>{FF0D64AA-DCB4-467A-A310-FF905F9393C0}</suburb>
</include>
<fields hint="raw:AddComputedIndexField">
<field fieldName="search text" storageType="NO" indexType="TOKENIZED" vectorType="NO">OurApp.CustomSearchFields.SearchTextField,OurApp</field>
<field fieldName="display name" storageType="YES" indexType="UN_TOKENIZED" vectorType="NO">OurApp.CustomSearchFields.LongNameField,OurApp</field>
</fields>
</DefaultIndexConfiguration>
<indexes hint="list:AddIndex">
<index id="GeoIndex" type="Sitecore.ContentSearch.LuceneProvider.LuceneIndex, Sitecore.ContentSearch.LuceneProvider">
<param desc="name">$(id)</param>
<param desc="folder">$(id)</param>
<!-- This initializes index property store. Id has to be set to the index id -->
<param desc="propertyStore" ref="contentSearch/databasePropertyStore" param1="$(id)" />
<strategies hint="list:AddStrategy">
<!-- NOTE: order of these is controls the execution order -->
<strategy ref="contentSearch/indexUpdateStrategies/onPublishEndAsync" />
</strategies>
<commitPolicy hint="raw:SetCommitPolicy">
<policy type="Sitecore.ContentSearch.TimeIntervalCommitPolicy, Sitecore.ContentSearch" />
</commitPolicy>
<commitPolicyExecutor hint="raw:SetCommitPolicyExecutor">
<policyExecutor type="Sitecore.ContentSearch.CommitPolicyExecutor, Sitecore.ContentSearch" />
</commitPolicyExecutor>
<locations hint="list:AddCrawler">
<crawler type="Sitecore.ContentSearch.LuceneProvider.Crawlers.DefaultCrawler, Sitecore.ContentSearch.LuceneProvider">
<Database>web</Database>
<Root>/sitecore/content/Globals/Countries</Root>
</crawler>
</locations>
</index>
</indexes>
</configuration>
</contentSearch>
</sitecore>

Modeling geographic co-ordinates in OWL

I am developing an ontology and need to model geographic co-ordinates (lat/long) as part of an address of a person. Geo Names was the obvious choice, but it's too large and verbose for my use, which led me to W3C Geo vocabulary (http://www.w3.org/2003/01/geo/).
It has a Point class, and lat/long/alt properties which should suffice my need. However, I am not able to find it, let alone set it as properties in Protege. Further investigation reveaved that “Point” is an rdfs:Class and "lat/long/alt" are rdf:Properties. I am guessing this is the reason why it is not showing up in Protege.
Is there a way to use these properties in an OWL ontology? Or are there other vocabularies that would let me specify geographic Points, Lines etc?
Thanks,
Assuming you are trying to open the file wgs84_pos present on the page http://www.w3.org/2003/01/geo/, it appears that the properties are using a format not understood by Protege 4.1 (plain RDF). Look at the line 143, you will see that:
<rdf:Property rdf:about="http://www.w3.org/2003/01/geo/wgs84_pos#lat">
<rdfs:domain rdf:resource="http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing" />
<rdfs:label>latitude</rdfs:label>
<rdfs:comment>The WGS84 latitude of a SpatialThing (decimal degrees).</rdfs:comment>
</rdf:Property>
rdf:Property is not in the scope of OWL (too generic, in OWL properties are either object or data properties), therefore not displayed by Protege 4.1.
I advise that you re-create the ontology from scratch following the documentation on the web page and by looking at the RDF file. Just add the properties you need (should be quick), save, open the saved file and compare with the one you downloaded to see the differences.
The rough structure of the ontology made with Protege looks like this:
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
<!ENTITY owl "http://www.w3.org/2002/07/owl#" >
<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" >
<!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
<!ENTITY wgs84_pos "http://www.w3.org/2003/01/geo/wgs84_pos#" >
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
]>
<rdf:RDF xmlns="http://www.w3.org/2003/01/geo/wgs84_pos#"
xml:base="http://www.w3.org/2003/01/geo/wgs84_pos"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:wgs84_pos="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<owl:Ontology rdf:about="http://www.w3.org/2003/01/geo/wgs84_pos#"/>
<owl:DatatypeProperty rdf:about="&wgs84_pos;lat">
<rdfs:domain rdf:resource="&wgs84_pos;SpatialThing"/>
</owl:DatatypeProperty>
<owl:DatatypeProperty rdf:about="&wgs84_pos;long">
<rdfs:domain rdf:resource="&wgs84_pos;SpatialThing"/>
</owl:DatatypeProperty>
<owl:Class rdf:about="&wgs84_pos;Point">
<rdfs:subClassOf rdf:resource="&wgs84_pos;SpatialThing"/>
</owl:Class>
<owl:Class rdf:about="&wgs84_pos;SpatialThing"/>

Does the Wikipedia API support searches for a specific template?

Is it possible to query the Wikipedia API for articles that contain a specific template? The documentation does not describe any action that would filter search results to pages that contain a template. Specifically, I am after pages that contain Template:Persondata. After that, I am hoping to be able to retrieve just that specific template in order to populate genealogy data for the openancestry.org project.
The query below shows that the Albert Einstein page contains the Persondata Template, but it doesn't return the contents of the template, and I don't know how to get a list of page titles that contain the template.
http://en.wikipedia.org/w/api.php?action=query&prop=templates&titles=Albert%20Einstein&tlcontinue=736|10|ParmPart
Returns:
<api>
<query>
<pages>
<page pageid="736" ns="0" title="Albert Einstein">
<templates>
...
<tl ns="10" title="Template:Persondata"/>
...
</templates>
</page>
</pages>
</query>
<query-continue>
<templates tlcontinue="736|10|Reflist"/>
</query-continue>
</api>
I suspect that I can't get what I need from the API, but I'm hoping I'm wrong and that someone has already blazed a trail down this path.
You can use the embeddedin query to find all pages that include the template:
curl 'http://en.wikipedia.org/w/api.php?action=query&list=embeddedin&eititle=Template:Persondata&eilimit=5&format=xml'
Which gets you:
<?xml version="1.0"?>
<api>
<query>
<embeddedin>
<ei pageid="307" ns="0" title="Abraham Lincoln" />
<ei pageid="308" ns="0" title="Aristotle" />
<ei pageid="339" ns="0" title="Ayn Rand" />
<ei pageid="340" ns="0" title="Alain Connes" />
<ei pageid="344" ns="0" title="Allan Dwan" />
</embeddedin>
</query>
<query-continue>
<embeddedin eicontinue="10|Persondata|595" />
</query-continue>
</api>
See full docs at mediawiki.org.
Edit Use embeddedin query instead of backlinks (which doesn't cover template inclusions)
Using embeddedin does not allow you to search for a specific person, the search string becomes the Template:Persondata.
The best way I've found to get only people from Wikipedia is to use list=search and filter the search using AND"Born"AND"Occupation":
http://en.wikipedia.org/w/api.php?action=query&list=search&srsearch="Tom Cruise"AND"Born"AND"Occupation"&format=jsonfm&srprop=snippet&srlimit=50`
Remember that Wikipedia is using a search engine that doesn't yet allow us to search only the title, it will search the full text. You can take advantage of that to get more precise results.
The accepted answer explains how to list pages using a certain template, but if you need to search for pages using the template, you can with the hastemplate: search keyword: https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=hastemplate:NPOV%20physics