Plug-in for Protégé to Create/Edit SPIN Constraints and Constructors?

Is there a plug-in or other means to create and edit SPARQL/SPIN constraints and constructors in Protege?
As I understand it, to capture SPIN constraints in RDF, the SPARQL code for the ASK or CONSTRUCT queries needs to be parsed and encoded. It's not stored as an opaque string. Therefore, it would seem that some plugin with knowledge of SPARQL and SPIN would be required.
I've loaded RDF from TopBraid Composer, including SPIN constraints, into Protégé 4.3.0. It seems to see the constraints as annotations, but I cannot find all of the details, most critically the underlying SPARQL code. I do see that code when text-editing the RDF file.
In the broad sense, I'm trying to find a way to create/edit SPIN constraints and constructors and load them into Sesame to have them operate on individuals instantiated from my classes. I posted another question about the path from TopBraid Composer into Sesame. I'm trying to keep my questions more specific since I'm a newbie on Stack Overflow.
BTW, no I don't want to use SWRL instead. I've had trouble expressing the constraints I need using SWRL. I've had success using SPARQL.
Thanks.

In some versions TopBraid Composer will store SPIN constraints in RDF by default. Given that the query is stored as RDF triples, there should be no problem storing them in any RDF data store. Applying the SPIN constraints is a different issue, as the system will need to know how to interpret the queries for different SPIN properties.
Are you certain you cannot "see" them in Protégé or Sesame? The constraints are defined on the class using the property spin:constraint and should appear as a bnode. Make sure you also import http://spinrdf.org/spin, or at least define a property named spin:constraint. At the very least, the following should always work to find your constraints:
SELECT ?constraint ?class
WHERE {
  ?class <http://spinrdf.org/spin#constraint> ?constraint
}
...where ?constraint is bound to a bnode representing the constraint in RDF and ?class is the class the constraint is defined for.
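If you want to see what is inside those bnodes (this is where the parsed SPARQL ends up), a minimal sketch, assuming the standard SPIN and SP vocabularies, is to walk the properties of each constraint node:

PREFIX spin: <http://spinrdf.org/spin#>
PREFIX sp:   <http://spinrdf.org/sp#>

# List the properties of each constraint node; for an ASK constraint you should
# see triples such as "?constraint a sp:Ask" plus an sp:where structure.
SELECT ?class ?constraint ?p ?o
WHERE {
  ?class spin:constraint ?constraint .
  ?constraint ?p ?o .
}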
Also, if you would rather store the constraints as SPARQL strings, see Preferences > TopBraid Composer > SPIN and check one of the boxes in "Generate sp:text...". Then you can get the query text via the following query:
SELECT ?query ?class
WHERE {
  ?class <http://spinrdf.org/spin#constraint> ?constraint .
  ?constraint <http://spinrdf.org/sp#text> ?query
}

Mount a SPARQL endpoint for use with custom ontologies and RDF triples

I've been trying to figure out how to set up a SPARQL endpoint for a couple of days, but no matter how much I read, I can't understand it.
To explain my intention: I have an open data server running on CKAN, and my goal is to be able to run SPARQL queries on the data. I know I can't do that directly on the datasets themselves; I would have to define my own OWL ontology and convert the data I want to use from CSV (the format it is currently in) into RDF triples (to be used as linked data).
The idea was to first test with the repository metadata that can be generated automatically with the ckanext-dcat extension, but I really don't know where to start. I've searched for information on how to install a Virtuoso server for the SPARQL endpoint, but what I've found leaves a lot to be desired, and I can find nothing that explains how I could actually load my own OWL ontologies and RDF data into Virtuoso itself.
Can someone lend me a hand so I know where to start? Thank you.
I'm a little confused. Maybe this is two or more questions?
1. How to convert tabular data, like CSV, into the RDF semantic format?
This can be done with an R2RML approach. Karma is a great GUI for that purpose. Like you say, a conversion like that can really be improved with an underlying OWL ontology. But it can be done without creating a custom ontology, too.
I have elaborated on this in the answer to another question.
2. Now that I have some RDF formatted data, how can I expose it with a SPARQL endpoint?
Virtuoso is a reasonable choice. There are multiple ways to deploy it and multiple ways to load the data, and therefore lots of tutorials on the subject. Here's one good one, from DBpedia.
If you'd like a simpler path to starting an RDF triplestore with a SPARQL endpoint, Stardog and Blazegraph are available as JARs, and RDF4J can easily be deployed within a container like Tomcat.
All provide web-based graphical interfaces for loading data and running queries, in addition to SPARQL REST endpoints. At least Stardog also provides command-line tools for bulk loading.
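Whichever store you pick, a rough sketch of getting data in and checking it (assuming the endpoint exposes SPARQL 1.1 Update, and using placeholder file and graph URIs) looks like this; the update and the query are run as two separate requests:

# SPARQL 1.1 Update: fetch a published RDF document into a named graph
LOAD <http://example.org/data/mydata.rdf> INTO GRAPH <http://example.org/graph/mydata>

# Then a quick sanity check that something actually arrived
SELECT (COUNT(*) AS ?triples)
FROM <http://example.org/graph/mydata>
WHERE { ?s ?p ?o }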

Inference over linked data SPARQL endpoints

When querying some linked data SPARQL endpoints via SPARQL queries, what is the type of reasoning provided (if any)?
For example, the DBpedia SNORQL endpoint doesn't even provide basic subclass inference (if A subClassOf B and B subClassOf C, then A subClassOf C). The FactForge SPARQL endpoint, on the other hand, provides some inference (though it is not clear what kind) and offers the possibility of switching that inference on and off.
My question:
How can I identify the kind of inference an endpoint applies? And if the inference support is limited, can it be extended using the endpoint alone?
Inference controls will vary with the engine as well as the endpoint.
The public DBpedia SPARQL endpoint (powered by Virtuoso, from my employer, OpenLink Software) does provide various inference rules (accessible through the "Inference rules" link at the top right corner of the SPARQL endpoint query form page), which are controlled by pragmas in your SPARQL (not SNORQL, which is the form you linked to), such as --
DEFINE input:inference 'urn:rules.skos'
You can see the content of any predefined ruleset via SPARQL -- for the above
SELECT *
FROM <urn:rules.skos>
WHERE { ?s ?p ?o }
You can see the live query and results.
See this tutorial containing many examples.
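As a small sketch of actually using such a ruleset in a query (the category URI below is just illustrative), the pragma is placed in front of the query prologue:

DEFINE input:inference 'urn:rules.skos'
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
# With the ruleset enabled, skos:broader is evaluated under the rules the ruleset
# defines (on DBpedia this typically yields transitive closure over the category
# hierarchy) rather than as a plain triple match.
SELECT ?narrowerCategory
WHERE {
  ?narrowerCategory skos:broader <http://dbpedia.org/resource/Category:Mammals> .
}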
While inference is not universally supported across SPARQL endpoints, most of the entailments provided by the RDFS, RDFS+ and OWL 2 RL profiles can be expressed in SPARQL itself. For example, querying for instances of :A under your subClassOf entailment can be handled with SPARQL property paths:
SELECT ?inst
WHERE {
  ?cls rdfs:subClassOf* :A .
  ?inst a ?cls .
}
The first triple pattern gets all subclasses of :A, including :A itself (use + instead of * if you want only proper subclasses of :A), and the second triple pattern finds all instances of all those classes.
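A similar pattern, sketched here with an illustrative property and prefix, covers rdfs:subPropertyOf entailment without any reasoner:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://www.example.com/schema#>
# Find ?x related to ?y by ex:locatedIn or by any declared subproperty of it
SELECT ?x ?y
WHERE {
  ?p rdfs:subPropertyOf* ex:locatedIn .
  ?x ?p ?y .
}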
To see how most of OWL 2 can be implemented with SPARQL, see Reasoning in OWL 2 RL and RDF Graphs using Rules. With a couple of exceptions, all of these rules can be implemented in SPARQL (and in fact you probably won't need some of them, such as eq-ref, which adds computational cost for little practical benefit, however much logicians may scoff at skipping it).
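For instance, here is a sketch of one of those rules, cax-sco (type propagation along rdfs:subClassOf), rendered as a SPARQL CONSTRUCT that a rule engine could apply repeatedly until no new triples are produced:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# cax-sco: if ?c1 rdfs:subClassOf ?c2 and ?x rdf:type ?c1, then infer ?x rdf:type ?c2
CONSTRUCT { ?x a ?c2 }
WHERE {
  ?c1 rdfs:subClassOf ?c2 .
  ?x a ?c1 .
}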
There are few use cases, beyond heavy-lifting classification problems, that can't be solved with a subset of the OWL 2 RL rules.
So, in the end, a recommendation is to understand what entailments you need. Chances are that OWL has totally overthought the issue and you can live with a few SPARQL patterns. And then you can hit the SPARQL endpoints without having to worry about whether specific inference profiles are supported.

Is it possible to load multiple ontologies with Jena?

I am new to Jena and I am implementing an application to manipulate RDF data. I have already implemented some basic functions to view classes, properties and other things.
I wonder whether it is possible to load multiple ontologies with Jena and then run SPARQL queries over them. (I already know it is possible, because Protégé does it.)
Thanks for your interest.
I'm open to any questions or requests for clarification.

Delete RDF graph data

I'm new to SPARQL. Can anyone tell me how I can delete an RDF graph's data (e.g. http://mylocalhost.com/owl/file.owl) in Virtuoso? Here is how I created it:
db.dba.rdf_load_rdfxml_mt(file_to_string_output('/data/file.owl'), '', 'http://example.com/file.owl');
I tried sparql clear graph <uri> and sparql drop graph <uri>, but they didn't work.
Many thanks in advance
Note that CLEAR GRAPH and DROP GRAPH are SPARQL Update operations, so you might need to use a different method or endpoint; I'm not familiar with how Virtuoso works.
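That said, a sketch that is commonly shown for Virtuoso's isql console is to prefix the update with the SPARQL keyword. Also double-check the graph IRI: if I read the load call above correctly, its third argument is the target graph, so the graph to clear would be http://example.com/file.owl rather than http://mylocalhost.com/owl/file.owl.

-- run from isql; the SPARQL keyword hands the rest of the statement to the SPARQL processor
SPARQL CLEAR GRAPH <http://example.com/file.owl> ;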
Note also that clearing the graphs does not necessarily free up space: you may see that the main virtuoso.db file stays the same size after the deletion.
If you need to delete all the RDF triples (i.e. start with a clean database), you can delete (or rename) the db folder that Virtuoso operates on.
Restart Virtuoso, and you will notice that it creates a clean database.
It is a hacky approach, but it works well!

Named Graphs and Federated SPARQL Endpoints

I recently came across the working draft for SPARQL 1.1 Federation Extensions and wondered whether this was already possible using Named Graphs (not to detract from the usefulness of the aforementioned draft).
My understanding of named graphs is a little hazy; the only thing I have gleaned from reading the specs is the set of rules around merger and non-merger in relation to other graphs at query time. Since this doesn't fully satisfy my understanding, my question is as follows:
Given the following query:
SELECT ?something
FROM NAMED <http://www.vw.co.uk/models/used>
FROM NAMED <http://www.autotrader.co.uk/cars/used>
WHERE {
  ...
}
Is it reasonable to assume that a query processor/endpoint could or should in the context of the named graphs do the following:
Check whether the named graph exists locally
If it doesn't, then perform the following operation (in the case of the above query, I will use the second named graph):
GET /sparql/?query=EncodedQuery HTTP/1.1
Host: www.autotrader.co.uk
User-agent: my-sparql-client/0.1
Where the EncodedQuery includes only the second named graph in the FROM NAMED clause, and the WHERE clause is amended accordingly with respect to GRAPH clauses (e.g. if a GRAPH <http://www.vw.co.uk/models/used> {...} pattern is being used).
Only if it can't perform the above, then do any of the following:
GET /cars/used HTTP/1.1
Host: www.autotrader.co.uk
or
LOAD <http://www.autotrader.co.uk/cars/used>
Return appropriate search results.
Obviously there might be some additional considerations around OFFSETs and LIMITs.
I also remember reading somewhere, a long time ago in a galaxy far, far away, that the default graph of any SPARQL endpoint should be a named graph according to the following convention:
For http://www.vw.co.uk/sparql/ there should be a named graph http://www.vw.co.uk that represents the default graph, and so, by the above logic, it should already be possible to federate SPARQL endpoints using named graphs.
The reason I ask is that I want to start promoting federation across the domains in the above example, without having to wait around for the standard, making sure that I won't do something that is out of kilter or incompatible with something else in the future.
Named graphs and the URLs used in federated queries (using SERVICE or FROM) are two different things. The latter point to SPARQL endpoints; named graphs live within a triple store and have the main function of separating different data sets. This, in turn, can be useful both to improve performance and to represent knowledge, such as recording the source of a set of statements.
For instance, you might have two data sources that both state that ?movie has-rating ?x, and you might want to know which source states which rating. In this case you can use two named graphs associated with the two sources (e.g., http://www.example.com/rotten-tomatoes and http://www.example.com/imdb). If you're storing both data sets in the same triple store, you will probably want to use named graphs (NGs), and remote endpoints are a different thing. Furthermore, the URL of a named graph can be used with vocabularies like VoID to describe a dataset as a whole (e.g., the data set name, where and when the triples were imported from, who the maintainer is, the usage licence). This is another reason to partition your triple store into NGs.
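A small sketch of that situation, with illustrative graph and property URIs, asking which named graph states which rating:

PREFIX ex: <http://www.example.com/schema#>
# For each movie, report the rating together with the graph (source) that states it
SELECT ?movie ?rating ?source
WHERE {
  VALUES ?source { <http://www.example.com/rotten-tomatoes> <http://www.example.com/imdb> }
  GRAPH ?source {
    ?movie ex:hasRating ?rating .
  }
}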
That said, your mechanism to bind NGs to endpoint URLs might be implemented as an option, but I don't think it's a good idea to have it as mandatory, since managing remote endpoint URLs and NGs separately can be more useful.
Moreover, the real challenge in federated queries is to offer endpoint-transparent queries, making the query engine smart enough to analyse the query and understand how to split it and perform partial queries on the right endpoints (and join the results later, in an efficient way). There is a lot of research being done on that, one of the most significant results (as far as I know) is FedX, which has been used to implement several query distribution optimisations (example).
One last thing to add: I vaguely remember the convention you mention about $url and $url/sparql. There are a couple of approaches around (e.g., the LOD cloud). That said, in most current triple stores (e.g., Virtuoso), queries that don't specify a named graph (i.e. don't use GRAPH) do not simply fall back to a default graph; they actually query the union of all named graphs in the store, which is usually much more useful (when you don't know where something is stated, or when you want to integrate cross-graph data).