Export a project in RDF - OpenRefine

Is it possible to export a project in RDF (RDF/XML, N3, or another serialisation)?
Use case: I imported RDF/XML data into OpenRefine, made changes to some values, and wanted to export the result as RDF/XML in order to replace the original file.

Yes, you can use the RDF extension for that.
I am not sure how easy it is to match the format of the RDF file you started from, though. OpenRefine is primarily designed to work on tabular data, so there might be better tools for this workflow.

I agree with #pintoch that OpenRefine is not necessarily the right tool for that. Was there a specific reason that you used OpenRefine and did not edit the RDF directly? Turtle is the best serialization if you simply want to make some basic changes. For more complex things you could use Apache Jena tooling and, for example, write some SPARQL queries that bulk-edit content in the RDF, as sketched below.
If you describe in more detail what you were doing, we might be able to give some other tips. But if it has to be OpenRefine, the RDF extension he linked is what you want.
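For the Jena route, a minimal sketch of that kind of bulk edit could look like the following. The file names, the ex: prefix, and the property and literal values are placeholders, not something taken from your data:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;
import org.apache.jena.update.UpdateAction;

import java.io.FileOutputStream;

public class BulkEdit {
    public static void main(String[] args) throws Exception {
        // Load the existing RDF/XML file into an in-memory model
        Model model = ModelFactory.createDefaultModel();
        RDFDataMgr.read(model, "data.rdf");

        // A SPARQL Update that rewrites one literal value wherever it occurs
        String update =
            "PREFIX ex: <http://example.org/> " +
            "DELETE { ?s ex:label \"old value\" } " +
            "INSERT { ?s ex:label \"new value\" } " +
            "WHERE  { ?s ex:label \"old value\" }";
        UpdateAction.parseExecute(update, model);

        // Write the result back out as RDF/XML
        try (FileOutputStream out = new FileOutputStream("data-edited.rdf")) {
            RDFDataMgr.write(out, model, RDFFormat.RDFXML);
        }
    }
}

Note that, as discussed above, the RDF/XML Jena writes back will generally not match the element order and formatting of the original file.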

Thank you.
I have no specific use case, but I will be training some people to use OpenRefine, and I wanted to know whether it was the right tool for batch-editing RDF files. It seems it is not.

Related

Using dotnetRDF library to query Large RDF file by SPARQL

I want to query an ontology which is defined in an RDF file using SPARQL and the dotNetRDF library. The problem is that the file is large, so it's not very practical to load the entire file into memory. What should I do?
Thanks in advance.
As AKSW says in the comment, the best approach would be to load your RDF file into a triple store and then run your SPARQL queries against that. dotNetRDF ships with support for several triple stores as listed at https://github.com/dotnetrdf/dotnetrdf/wiki/UserGuide-Storage-Providers. However, all you really need is a triple store that supports the SPARQL protocol, and then you will be able to run your queries from dotNetRDF code using the SparqlRemoteEndpoint class as described at https://github.com/dotnetrdf/dotnetrdf/wiki/UserGuide-Querying-With-SPARQL#remote-query.
As for which triple store to use, Jena with Fuseki is probably a good open-source choice.
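For comparison, here is what the same remote-query pattern looks like from the Java side with Jena; the endpoint URL assumes a local Fuseki server with a dataset named "ds", and the query itself is only a placeholder:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;

public class RemoteQuery {
    public static void main(String[] args) {
        // Assumes a local Fuseki server with a dataset named "ds"
        String endpoint = "http://localhost:3030/ds/query";
        String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";

        // The query is executed by the remote store; the file never has to fit in local memory
        try (QueryExecution qexec = QueryExecutionFactory.sparqlService(endpoint, query)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next());
            }
        }
    }
}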

Set up a SPARQL endpoint for use with custom ontologies and RDF triples

I've been trying to figure out how to set up a SPARQL endpoint for a couple of days, but no matter how much I read, I cannot understand it.
Let me explain my intention: I have an open data server running on CKAN, and my goal is to be able to run SPARQL queries on the data. I know I cannot do that directly on the datasets themselves, and that I would have to define my own OWL ontology and convert the data I want to use from CSV format (which is the format it is currently in) to RDF triples (to be used as linked data).
The idea was to first test with the repository metadata that can be generated automatically with the ckanext-dcat extension, but I really cannot find where to start. I've searched for information on how to install a Virtuoso server for SPARQL, but what I've found leaves a lot to be desired, not to mention that I can find nothing that explains how I could actually load my own OWL and RDF files into Virtuoso itself.
Can someone lend me a hand getting started? Thank you.
I'm a little confused. Maybe this is two or more questions?
1. How to convert tabular data, like CSV, into the RDF semantic format?
This can be done with an R2RML approach. Karma is a great GUI for that purpose. Like you say, a conversion like that can really be improved with an underlying OWL ontology. But it can be done without creating a custom ontology, too.
I have elaborated on this in the answer to another question.
2. Now that I have some RDF formatted data, how can I expose it with a SPARQL endpoint?
Virtuoso is a reasonable choice. There are multiple ways to deploy it and multiple ways to load the data, and therefore lots of tutorials on the subject. Here's one good one, from DBpedia.
If you'd like a simpler path to starting an RDF triplestore with a SPARQL endpoint, Stardog and Blazegraph are available as JARs, and RDF4J can easily be deployed within a container like Tomcat.
All provide web-based graphical interfaces for loading data and running queries, in addition to SPARQL REST endpoints. At least Stardog also provides command-line tools for bulk loading.
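If you just want something lightweight to experiment against before settling on one of those servers, another option is to embed Fuseki (part of Apache Jena, via the jena-fuseki-main artifact) in a few lines of Java. This is only a sketch, and the file name and dataset path are assumptions:

import org.apache.jena.fuseki.main.FusekiServer;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;

public class LocalEndpoint {
    public static void main(String[] args) {
        // Load the converted RDF (the file name is an assumption)
        Model model = RDFDataMgr.loadModel("data.ttl");
        Dataset dataset = DatasetFactory.create(model);

        // Serve it at http://localhost:3330/ds (3330 is the embedded server's default port)
        FusekiServer server = FusekiServer.create()
                .add("/ds", dataset)
                .build();
        server.start();
    }
}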

SPARQL Update minimal diff for RDF/XML?

My RDF/OWL ontology is versioned as an RDF/XML file in a git repository that I normally edit in a text editor, but I am planning a refactoring that would take too long manually and that is not possible with regular expressions alone.
Specifically, I want to split a generic property in two more specific ones based on the class of the object.
For example
:Alice :responsibleFor :ACME.
:Bob :responsibleFor :Cooking.
should become
:Alice :responsibleForCompany :ACME.
:Bob :responsibleForTask :Cooking.
I am interested in an answer for the general case as well, not just for this specific property refactoring.
My idea is to load the files into a Virtuoso Triple Store, use SPARQL Update queries to refactor the property and then export it back as RDF/XML file. The problem is that this won't keep the order and formatting, which will confuse git and make usage of the old history, such as undoing an old commit, impossible.
Is there a way to work directly with the file structure in order to produce a diff as minimal as possible?
I wouldn't worry much about the git history for undoing commits if you're going to use SPARQL update to make the changes; those update queries become your diffs. Some queries would be easy to invert to undo a change, but, if you have a base version of the ontology, applying all but the N most recent updates would effectively undo N commits.
This is a strategy we've been using for years and it works nicely.
Michael's answer is an excellent solution, but if you do wish to stick to using git history, I'd recommend that you switch to a different syntax format. RDF/XML, being XML (i.e. nested elements over multiple lines), is notoriously troublesome for line-by-line diffs, especially since the tool writing the XML can decide to completely rearrange blocks (there's no prescribed order for RDF/XML elements at the syntax level, and it's very hard to enforce anything like this).
Switch to a line-based syntax format, like N-Triples or N-Quads, and enforce a canonical ordering when exporting back from Virtuoso (should be possible by using a SPARQL query with an ORDER BY clause as the export mechanism).
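To make that concrete, here is a sketch of both steps with Jena: the property split described in the question expressed as a SPARQL Update, followed by a sorted N-Triples export. The class IRIs :Company and :Task, the prefix, and the file names are assumptions about your ontology, and instead of an ORDER BY query this version simply sorts the serialized lines, which gives the same stable ordering (blank nodes, if you have any, can still get unstable labels between runs):

import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;
import org.apache.jena.update.UpdateAction;

import java.io.ByteArrayOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class SplitProperty {
    public static void main(String[] args) throws Exception {
        // Load the versioned ontology file (file name is a placeholder)
        Model model = RDFDataMgr.loadModel("ontology.rdf");

        // Split :responsibleFor based on the class of the object.
        // :Company and :Task are assumed class names.
        String update =
            "PREFIX : <http://example.org/> " +
            "DELETE { ?p :responsibleFor ?o } " +
            "INSERT { ?p :responsibleForCompany ?o } " +
            "WHERE  { ?p :responsibleFor ?o . ?o a :Company } ; " +
            "PREFIX : <http://example.org/> " +
            "DELETE { ?p :responsibleFor ?o } " +
            "INSERT { ?p :responsibleForTask ?o } " +
            "WHERE  { ?p :responsibleFor ?o . ?o a :Task }";
        UpdateAction.parseExecute(update, model);

        // Serialize as N-Triples and sort the lines so git sees a stable, minimal diff
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        RDFDataMgr.write(buffer, model, RDFFormat.NTRIPLES);
        List<String> lines = Arrays.stream(buffer.toString("UTF-8").split("\n"))
                .sorted()
                .collect(Collectors.toList());
        Files.write(Paths.get("ontology.nt"), lines);
    }
}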

Using an ontology to produce semantic full information from the raw data

Problem Definition: Store sensor data (temperature readings, sensor descriptions) in RDF form using an ontology, and then use SPARQL to perform queries on the stored data.
My Approach: I am not an expert in this domain, but I have some basic understanding, and accordingly I am using this approach: 1. create an ontology, 2. convert the data according to the ontology vocabulary, 3. store the converted data in a triple store, 4. perform SPARQL queries. I am not sure whether I am going about this the right way; any comments from your side will be valuable.
So far, I have done the following:
I have created an ontology in Protégé 5.0.0 for representing a temperature sensor. This ontology represents only one part of the full ontology.
I have collected data in a CSV file which includes the date, time, and temperature readings.
Now I want to use this ontology to store the CSV file in RDF form in some data store. I have been stuck at this step for the last three days. I have found some links (link1, link2) but I am still finding it difficult to proceed. Do I need a script which will read the CSV file and map it to the given ontology concepts? If so, is there a sample script which does this? The outcome might look like:
<datetime>valX</datetime>
<tempvalue>valY</tempvalue>
Can anyone guide me in the following:
1. Am I taking correct steps to solve the problem?
2. How should I solve step 3, i.e., storing the data according to the ontology?
P.S.: I have also posted this question on answers.semanticweb.com, only in order to get a response as soon as possible.
Actually, this is a great use case for the D2RQ mapping language and D2RQ Server.
Install D2RQ and start it up with a connection to your relational database, then generate a mapping file using the generator that comes with the software. Edit that mapping file and swap out the automatically generated ontology prefixes for your own. Their website has a page that explains how the mapping language works.
Once you've done that and there are no errors in the mapping file, you can query your whole relational dataset with SPARQL without even having to export it and load it into a real triplestore.
However, if you do want to export and load into a triplestore, you'd just run D2RQ's triple-generation functionality (also included in D2RQ Server) and then import the resulting triples file into a triplestore like Jena Fuseki.
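If the readings really live in a flat CSV file rather than in a relational database, another option is a small one-off script that reads the rows and builds triples with Jena. A minimal sketch, where the ex: namespace, the class and property names, and the column layout (date,time,temperature with no header row) are all assumptions standing in for your own ontology:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;
import org.apache.jena.vocabulary.RDF;

import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class CsvToRdf {
    public static void main(String[] args) throws Exception {
        String ns = "http://example.org/sensor#";   // placeholder namespace
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("ex", ns);

        // Placeholder terms standing in for the concepts in your own ontology
        Resource observationClass = model.createResource(ns + "Observation");
        Property dateTime = model.createProperty(ns + "dateTime");
        Property temperature = model.createProperty(ns + "temperatureValue");

        // Assumed CSV layout: date,time,temperature (no header row)
        List<String> rows = Files.readAllLines(Paths.get("readings.csv"));
        int i = 0;
        for (String row : rows) {
            String[] cols = row.split(",");
            Resource obs = model.createResource(ns + "obs" + (i++));
            obs.addProperty(RDF.type, observationClass);
            obs.addProperty(dateTime, cols[0] + "T" + cols[1]);
            obs.addProperty(temperature, cols[2]);
        }

        // Write the result as Turtle, ready to load into a triple store
        try (FileOutputStream out = new FileOutputStream("readings.ttl")) {
            RDFDataMgr.write(out, model, RDFFormat.TURTLE);
        }
    }
}

The resulting Turtle file can then be loaded into whatever triple store you choose (Fuseki, Virtuoso, etc.) and queried with SPARQL, which covers steps 3 and 4 of your approach.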

Manipulating RDF in Jena

Currently I have found that I can query an RDF file in Jena using the Model API after loading the model from the file, and it gives me the same output as applying a SPARQL query. So I want to know: is it a good approach to do this without SPARQL? (Though I have only tested it with a small RDF file.) I also want to know, if I use Virtuoso, whether I can manipulate the data using the Model API without SPARQL.
Thanks in advance.
I'm not quite sure if I understand your question. If I can paraphrase, I think you're asking:
Is it OK to query and manipulate RDF data using the Jena Model API instead of using
SPARQL? Does it make a difference if the back-end store is Virtuoso?
Assuming that's the right re-phrasing of the question, then the first part is definitively yes: you can manipulate RDF data through the Model and OntModel APIs. In fact, I would say that's what the majority of Jena users do, particularly for small queries or updates. I find personally that going direct to the API is more succinct up to a certain point of complexity; after that, my code is clearer and more concise if I express the query in SPARQL. Obviously circumstances will have an effect: if you're working with a mixture of local stores and remote SPARQL endpoints (for which sending a query string is your only option) then you may find the consistency of always using SPARQL makes your code clearer.
Regarding Virtuoso, I don't have any direct experience to offer. As far as I know, the Virtuoso Jena Provider fully implements the features of the Model API using a Virtuoso store as the storage layer. Whether the direct API or using SPARQL queries gives you a performance advantage is something you should measure by benchmark with your data and your typical query patterns.
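To make the comparison concrete, here is the same lookup done both ways against a small in-memory model; the file name and the property IRI are placeholders, and this is only a sketch, not a claim about which approach performs better:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.StmtIterator;
import org.apache.jena.riot.RDFDataMgr;

public class ModelVsSparql {
    public static void main(String[] args) {
        Model model = RDFDataMgr.loadModel("data.ttl");   // file name is a placeholder
        Property label = model.createProperty("http://example.org/label");

        // 1. Model API: iterate over matching statements directly
        StmtIterator it = model.listStatements(null, label, (RDFNode) null);
        while (it.hasNext()) {
            System.out.println(it.next().getObject());
        }

        // 2. SPARQL: express the same pattern as a query string
        String query = "SELECT ?o WHERE { ?s <http://example.org/label> ?o }";
        try (QueryExecution qexec = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qexec.execSelect();
            while (results.hasNext()) {
                System.out.println(results.next().get("o"));
            }
        }
    }
}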