Using an ontology to produce full semantic information from raw data - SPARQL

Problem Definition: Store sensor data (temperature readings, sensor description) in RDF form using an ontology. Further, use SPARQL to perform queries on the stored data.
My Approach: I am not an expert in this domain, but I have some basic understanding, and accordingly I am using this approach: 1. create the ontology, 2. convert the data according to the ontology vocabulary, 3. store the converted data in a triple store, 4. perform SPARQL queries. I am not sure whether I am following the right way. Any comments from your side will be valuable.
So far, I have done the following:
I have created an ontology in Protégé 5.0.0 for representing a temperature sensor. This ontology represents only one part of the full ontology.
I have collected data in a CSV file, which includes the date, time, and temperature reading.
Now, I want to use this ontology to store the CSV file in RDF form in some data store. I have been stuck at this step for the last three days. I have found some links like link1, link2, but I am still finding it difficult to proceed. Do I need a script which will read the CSV file and map it to the given ontology concepts? If yes, is there a sample script which does the same? Possibly, the outcome might look like:
<datetime>valX</datetime>
<tempvalue>valY</tempvalue>
Can anyone guide me in the following:
1. Am I taking the correct steps to solve the problem?
2. How should I solve step 3, i.e., storing the data according to the ontology?
P.S.: I have also posted this question on answers.semanticweb.com; this is only to get a response ASAP.

Actually, this is a great use case for the D2RQ mapping language and D2RQ Server.
Go install D2RQ, then start it up with a connection to your relational database. Then generate a mapping file using the generator that comes with the software. Once you have that mapping file, edit it and swap out the automatically generated ontology prefixes for your own. Their website has a page that explains how the mapping language works.
Once you've done that and there are no errors in the mapping file, you can actually query your whole relational dataset with SPARQL without even having to export it and load it into a real triplestore.
However, if you want to export and load into a triplestore, you would just run D2RQ's generate-triples functionality (also included in D2RQ Server), and then import that triples file into a triplestore like Jena Fuseki.
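Since the data here starts out as a flat CSV file rather than a relational database, a small conversion script is another option. Below is a minimal sketch in Python with rdflib; the namespace and the property names (ex:TemperatureObservation, ex:hasTimestamp, ex:hasTemperatureValue) are hypothetical placeholders to be replaced with terms from your own ontology, and the CSV is assumed to have date, time, and temperature columns.

```python
# csv_to_rdf.py - sketch: map CSV sensor readings to RDF using assumed ontology terms
import csv
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

# Hypothetical namespace; replace with the IRI of your own ontology
EX = Namespace("http://example.org/sensor#")

g = Graph()
g.bind("ex", EX)

with open("readings.csv", newline="") as f:
    for i, row in enumerate(csv.DictReader(f)):   # columns assumed: date, time, temperature
        obs = EX[f"observation{i}"]               # one resource per CSV row
        g.add((obs, RDF.type, EX.TemperatureObservation))
        g.add((obs, EX.hasTimestamp,
               Literal(f"{row['date']}T{row['time']}", datatype=XSD.dateTime)))
        g.add((obs, EX.hasTemperatureValue,
               Literal(row["temperature"], datatype=XSD.decimal)))

# Serialize to Turtle; the resulting file can then be bulk-loaded into a triplestore
g.serialize(destination="readings.ttl", format="turtle")
```

The generated readings.ttl can then be loaded into a store such as Jena Fuseki and queried with SPARQL, which covers steps 3 and 4 of the approach above.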

Related

How to populate RDF data from an RDBMS-based system

I am new to the semantic web and ontologies. A few weeks ago I started reading papers and taking online courses about them. I have an idea to use an ontology rule-based system to extend the features of my existing reminder system, as can be seen in the attached picture. I've read about ontologies, rules (e.g. SPIN, SPARQL), inference engines (e.g. Jena), RDF, RDFS, OWL, etc. I think I've got the general idea.
System Architecture:
However, one thing that I am still missing is how to integrate this rule-based system into my current system. The current system's data is stored in an RDBMS (MySQL) database. Every transaction record in the system may be modified at some later time after creation. Meanwhile, an ontology-based system, AFAIK, relies on the RDF data format. My thinking is that there should be a way to convert the transaction data from the RDBMS to RDF so that it is ready to be used by the ontology system.
My questions are:
Is my thinking correct?
What is the best practice for this process?
When an existing record is modified in the RDBMS, how do I reflect that change in the RDF?
In relation to #3, if not using an RDBMS, how does the ontology system manage its RDF data when an individual's property is updated? Does that depend on the underlying triple-store database? I have read that with TDB one can only insert or delete.
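On question 3, one common pattern (a hedged sketch, not a claim about best practice) is to mirror each changed record with a SPARQL 1.1 DELETE/INSERT update: an "update" in RDF terms is indeed a delete plus an insert, which is exactly what DELETE/INSERT expresses. A minimal illustration in Python with rdflib, using made-up reminder terms (ex:Reminder, ex:dueDate):

```python
# sync_update.py - sketch: mirror an RDBMS row change into RDF with SPARQL 1.1 Update
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/reminder#")  # hypothetical vocabulary

g = Graph()
g.add((EX.reminder42, RDF.type, EX.Reminder))
g.add((EX.reminder42, EX.dueDate, Literal("2024-01-10", datatype=XSD.date)))

# The reminder's due date was modified in MySQL; replace the old value in the graph.
g.update("""
    PREFIX ex:  <http://example.org/reminder#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    DELETE { ex:reminder42 ex:dueDate ?old }
    INSERT { ex:reminder42 ex:dueDate "2024-02-01"^^xsd:date }
    WHERE  { ex:reminder42 ex:dueDate ?old }
""")

print(g.serialize(format="turtle"))
```

The same update, sent to a triplestore's SPARQL Update endpoint instead of an in-memory graph, is the usual way to propagate a change; how efficiently in-place updates are handled does depend on the underlying store (Jena TDB, for instance, accepts SPARQL Update when run behind Fuseki).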

Set up a SPARQL endpoint for use with custom ontologies and RDF triples

I've been trying to figure out how to set up a SPARQL endpoint for a couple of days, but no matter how much I read, I cannot understand it.
Let me explain my intention: I have an open data server running on CKAN, and my goal is to be able to run SPARQL queries on the data. I know I cannot do it directly on the datasets themselves; I would have to define my own OWL ontology and convert the data I want to use from CSV format (which is the format it is currently in) into RDF triples (to be used as linked data).
The idea was to first test with the metadata of the repositories, which can be generated automatically with the ckanext-dcat extension, but I really do not know where to start. I've searched for information on how to install a Virtuoso server for SPARQL, but the information I've found leaves a lot to be desired, and nowhere have I found an explanation of how I could actually load my own OWL ontologies and RDF data into Virtuoso itself.
Can someone lend me a hand to get started? Thank you.
I'm a little confused. Maybe this is two or more questions?
1. How to convert tabular data, like CSV, into the RDF semantic format?
This can be done with an R2RML approach. Karma is a great GUI for that purpose. Like you say, a conversion like that can really be improved with an underlying OWL ontology. But it can be done without creating a custom ontology, too.
I have elaborated on this in the answer to another question.
2. Now that I have some RDF formatted data, how can I expose it with a SPARQL endpoint?
Virtuoso is a reasonable choice. There are multiple ways to deploy it and multiple ways to load the data, and therefore lots of tutorials on the subject. Here's a good one, from DBpedia.
If you'd like a simpler path to starting an RDF triplestore with a SPARQL endpoint, Stardog and Blazegraph are available as JARs, and RDF4J can easily be deployed within a container like Tomcat.
All provide web-based graphical interfaces for loading data and running queries, in addition to SPARQL REST endpoints. At least Stardog also provides command-line tools for bulk loading.
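Once one of these stores is running, its endpoint can be queried over HTTP from any client. A minimal sketch in Python with SPARQLWrapper, assuming a Fuseki-style endpoint at http://localhost:3030/ds/sparql (the URL is a placeholder; adjust it for Virtuoso, Stardog, Blazegraph, or RDF4J):

```python
# query_endpoint.py - sketch: run a SPARQL query against a local endpoint over HTTP
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint URL; replace with your own deployment's SPARQL endpoint
sparql = SPARQLWrapper("http://localhost:3030/ds/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?s ?p ?o
    WHERE { ?s ?p ?o }
    LIMIT 10
""")

results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["s"]["value"], binding["p"]["value"], binding["o"]["value"])
```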

Do semantic tools like Anzo create a copy of the data?

I'm new to semantic technologies. I understand what RDF, OWL, ontologies, and the other basic terminology are, and how semantic search uses them. We are creating a semantic search module using Anzo with enterprise search capabilities; it connects to various data sources and creates relationships between them. Now I'm interested in knowing what a semantic tool like Anzo does internally.
Does it create a copy of the data on the local machine, or does it hit the data sources every time we execute a SPARQL query?
If it stores data, is the data stored in its raw format, or is it stored after cleaning and creating semantic relations between the items?
What happens to the data after a query is executed? How does it get current data every time?
Any thoughts on this would be valuable to me.
Thanks a lot in advance!
Based on your comments, it appears you're using the AnzoGraph query engine? If so, then the answers to your questions are:
1. A copy of the data is held in memory.
2. Not clear from any of the published information.
3. It doesn't. You need to load the data using the 'LOAD' command.
A bit more on 3: you would be responsible for implementing a mechanism to keep the data there up to date with the underlying data source (which might be as simple as rebuilding the graph from a nightly dump, or as involved as implementing change data capture against the underlying store that replicates CRUD operations onto the graph).
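In generic SPARQL 1.1 terms (not AnzoGraph-specific), the nightly-rebuild variant might look like the sketch below, posted to the store's update endpoint; the endpoint URL and the graph and dump IRIs are placeholders.

```python
# nightly_rebuild.py - sketch: drop and reload a named graph from a nightly RDF dump (SPARQL 1.1 Update)
from SPARQLWrapper import SPARQLWrapper, POST

# Hypothetical update endpoint and IRIs; replace with your store's values
sparql = SPARQLWrapper("http://localhost:3030/ds/update")
sparql.setMethod(POST)
sparql.setQuery("""
    DROP SILENT GRAPH <http://example.org/graph/source-data> ;
    LOAD <http://example.org/dumps/nightly.ttl>
         INTO GRAPH <http://example.org/graph/source-data>
""")
sparql.query()
```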
My answers are based on the marketing and support information available on the CambridgeSemantics site.

Add Triples to Protege for querying

I have an ontology and a file of about 7.4 GB containing triples. How do I add this file of triples so that I can run queries on it?
You might want to import your data into a triplestore.
See http://wiki.bitplan.com/index.php/SPARQL#The_sample_Data for an example of how to import data into a triple store - in this case Blazegraph.
Different triplestores have different options. Protégé is not so good for managing data directly; it is good for designing ontologies, that is, specifying the schema/structure/model for the data. If you have the data as triples, you might want to find out this structure not only by looking at the data but also by finding sources for the underlying design. Of course, you can sort of "reverse engineer" this by loading the triples into a triple store and trying queries.
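As a hedged illustration of the "load it into a triple store, then query" route: many stores, Fuseki among them, implement the SPARQL 1.1 Graph Store HTTP Protocol, so a large Turtle or N-Triples file can be streamed to the store over HTTP. The endpoint URL below is a typical Fuseki default and only an assumption; for a 7.4 GB file the store's native bulk loader will usually be faster.

```python
# bulk_load.py - sketch: stream a large N-Triples file into a store via the Graph Store HTTP Protocol
import requests

# Hypothetical Fuseki-style Graph Store endpoint; adjust the dataset name and URL for your store
GSP_ENDPOINT = "http://localhost:3030/ds/data?default"

with open("triples.nt", "rb") as f:
    resp = requests.post(
        GSP_ENDPOINT,
        data=f,  # passing the file object streams it instead of reading 7.4 GB into memory
        headers={"Content-Type": "application/n-triples"},
    )
resp.raise_for_status()
print("Loaded, HTTP status:", resp.status_code)
```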

semantic web + linked data integration

I'm new to the semantic web.
I'm trying to build a sample application where I can query data from different data sources in one query.
I have created a small RDF file which contains references to DBpedia resources for defining localities. My question is: how can I get, in one query, the data contained in my file together with information that is in the description of the remote resource (for example, the name of the person from the local file, and the total population of a city, dbpedia-owl:populationTotal, from the remote RDF data)?
I don't really understand the SPARQL query language; I tried to use the Jena ARQ API with the SERVICE keyword, but it didn't solve the problem.
Any help, please?
I guess you are looking for something like the Semantic Web Client Library, which tries to leverage the GGG (Giant Global Graph). Admittedly, the standard exploration algorithm of this framework is to follow rdfs:seeAlso links. Nevertheless, the general approach seems to be what you are looking for: you would create a local graph that consists of your seed graph, traverse the relations up to a certain level (e.g., three steps), resolve the URIs, and load that content into your local triplestore. Utilising advanced technologies like SPARQL federation might be something for later ;)
I retrieved data from two different sources using a SPARQL query with named graphs.
I used Jena ARQ to execute the SPARQL query.
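For the SERVICE-based route mentioned above, here is a minimal sketch in Python with rdflib, assuming your SPARQL engine supports the SERVICE keyword (recent rdflib versions do, as does Jena ARQ); the local file name and the ex:livesIn property are made up to illustrate the pattern:

```python
# federated_query.py - sketch: combine a local RDF file with DBpedia via a SERVICE clause
from rdflib import Graph

# Local file assumed to link people to DBpedia city resources, e.g.:
#   ex:alice foaf:name "Alice" ; ex:livesIn dbr:Berlin .
g = Graph()
g.parse("localities.ttl", format="turtle")

query = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dbo:  <http://dbpedia.org/ontology/>
    PREFIX ex:   <http://example.org/>

    SELECT ?name ?city ?population WHERE {
        ?person foaf:name ?name ;               # from the local file
                ex:livesIn ?city .
        SERVICE <https://dbpedia.org/sparql> {  # resolved remotely against DBpedia
            ?city dbo:populationTotal ?population .
        }
    }
"""

for row in g.query(query):
    print(row.name, row.city, row.population)
```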