SPARQL UPDATE validation - semantic-web

I have a Sesame triplestore with an ontology imported in it.
I know I can run SPARQL Update operations on it: inserting instances, deleting instances, updating things and so on.
But what if these operations are used in the wrong way, like inserting an invalid triple that makes no sense and does not respect the ontology rules? A triple like:
foo:Anna foo:likesToEat foo:arsenic.
And the ontology looks like this:
@prefix foo: <http://www.foo.org/ontologies/example#> .
foo:Anna rdf:type foo:Person.
foo:Anna rdf:type owl:NamedIndividual.
foo:Food rdf:type owl:Class.
foo:Metal rdf:type owl:Class.
foo:Person rdf:type owl:Class.
foo:arsenic rdf:type foo:Metal.
foo:arsenic rdf:type owl:NamedIndividual.
foo:likesToEat rdf:type owl:ObjectProperty.
foo:likesToEat rdfs:domain foo:Person.
foo:likesToEat rdfs:range foo:Food.
foo:pizza rdf:type foo:Food.
foo:pizza rdf:type owl:NamedIndividual.
As you can see the triple "foo:Anna foo:likesToEat foo:arsenic" is invalid because the range of the objectProperty is not respected.
My questions are:
Is there a way of validating this kind of update, so that the update operation executes only if the ontology is respected? Is there a way to set up the triple store to validate these things, or does it have to be done manually?

As you can see the triple "foo:Anna foo:likesToEat foo:arsenic" is invalid because the range of the objectProperty is not respected.
This is not how (RDF(S)) ontologies work. From the perspective of the ontology, that triple is perfectly valid. The fact that the range of foo:likesToEat is defined to be the class foo:Food just means that we can infer that foo:arsenic is of type foo:Food. There's nothing in your ontology that makes that invalid or inconsistent: after all you've said nowhere that something cannot be both a Food and a Metal.
More generally speaking: domain/range statements in RDF Schema are not about "closing" what a property can be used on. The semantics of RDF work the other way around: a domain/range restriction on a property P specifies that if a triple X P Y uses property P, we can infer that X belongs to the domain class of P and that Y belongs to its range class.
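To make the inference direction concrete, here is a minimal Python sketch. It assumes triples are plain (subject, predicate, object) tuples with prefixed names kept as strings; it is a toy model of the two RDFS rules, not Sesame's reasoner, and `infer_types` is a hypothetical helper name.

```python
# Toy model: triples are (subject, predicate, object) tuples of strings.
SCHEMA = {
    ("foo:likesToEat", "rdfs:domain", "foo:Person"),
    ("foo:likesToEat", "rdfs:range", "foo:Food"),
}
DATA = {
    ("foo:arsenic", "rdf:type", "foo:Metal"),
    ("foo:Anna", "foo:likesToEat", "foo:arsenic"),
}

def infer_types(schema, data):
    """Apply the RDFS domain and range rules once; return only the new rdf:type triples."""
    inferred = set()
    for s, p, o in data:
        for prop, kind, cls in schema:
            if prop != p:
                continue
            if kind == "rdfs:domain":
                inferred.add((s, "rdf:type", cls))  # the subject gets the domain class
            elif kind == "rdfs:range":
                inferred.add((o, "rdf:type", cls))  # the object gets the range class
    return inferred - data

for triple in sorted(infer_types(SCHEMA, DATA)):
    print(triple)
```

Nothing is rejected here: the range statement simply adds foo:Food as a second type for foo:arsenic alongside foo:Metal.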
There is no built-in functionality in Sesame to perform the kind of validation you are asking for, mostly for this reason.
However, if you really wanted to, you could of course implement something that rejects or warns when a triple is being inserted that you consider invalid (for whatever reason). Depending on your use case you have several options:
implement a Sail(Connection)Wrapper or a Repository(Connection)Wrapper to intercept insert operations and do the necessary validation.
implement an RDFHandler (e.g. a subclass of RDFInserter) that does the validation, and use that handler to add/validate data (instead of using the standard RepositoryConnection.add methods directly).
Either approach allows you to inspect every incoming triple, do a quick lookup in the database for its predicate, check if there are domain/range restrictions on it, and if the triple "violates" that restriction throw an error. The second approach is probably easiest to do, and also most flexible: you can employ this validation in some use cases in your code, and can skip it completely in places where you know it isn't necessary (because obviously, this kind of validation will come with a performance penalty).
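The check itself can be sketched independently of the Sesame API. The following Python sketch assumes the store is just a set of tuples and that a triple is "valid" only if its subject and object are already explicitly typed with the domain/range class (a closed-world reading, with no subclass reasoning). `validate`, `checked_add` and `InvalidTripleError` are hypothetical names for illustration, not part of Sesame.

```python
class InvalidTripleError(ValueError):
    """Raised when a triple fails the closed-world domain/range check."""

STORE = {
    ("foo:likesToEat", "rdfs:domain", "foo:Person"),
    ("foo:likesToEat", "rdfs:range", "foo:Food"),
    ("foo:Anna", "rdf:type", "foo:Person"),
    ("foo:arsenic", "rdf:type", "foo:Metal"),
    ("foo:pizza", "rdf:type", "foo:Food"),
}

def types_of(resource, store):
    """Explicitly asserted types of a resource (no inference)."""
    return {o for s, p, o in store if s == resource and p == "rdf:type"}

def validate(triple, store):
    """Look up domain/range statements for the predicate and check them."""
    s, p, o = triple
    for prop, kind, cls in store:
        if prop != p:
            continue
        if kind == "rdfs:domain" and cls not in types_of(s, store):
            raise InvalidTripleError(f"{s} is not a known {cls}")
        if kind == "rdfs:range" and cls not in types_of(o, store):
            raise InvalidTripleError(f"{o} is not a known {cls}")

def checked_add(triple, store):
    """Add the triple only if it passes validation."""
    validate(triple, store)
    store.add(triple)

checked_add(("foo:Anna", "foo:likesToEat", "foo:pizza"), STORE)  # accepted
try:
    checked_add(("foo:Anna", "foo:likesToEat", "foo:arsenic"), STORE)
except InvalidTripleError as err:
    print("rejected:", err)
```

In a real wrapper or RDFHandler the same logic would run per incoming statement, with the type lookups done against the repository, which is where the performance cost comes from.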

Jeen is spot on in his description of how RDFS, or OWL, works in this case.
As he mentioned, to perform validation, you need a closed world, which is not what you'd normally get with RDFS or OWL semantics.
With that said, wanting to validate your data is a perfectly reasonable thing to want to do! The W3C is attempting to define something in this area, but it's not standardized yet. IIRC, TopBraid has some support for it, but it may be a bit of a moving target as the working group evolves the standard. I don't know if there is a Sesame API for it (I thought TopBraid uses Jena), but it's probably worth a look.
Stardog ships with Integrity Constraint Validation (tutorial available here), which is a different take on data validation using OWL, SPARQL, or rules, as the syntax for defining constraints. Disclaimer is that I work on this, but it's relevant because a) it does validation precisely like what you're looking for and b) it ships with support for the Sesame API.


Display all fields in Wikidata Query Service

Wikidata provides a query browser at https://query.wikidata.org
I want to display all fields for films. I tried using * but it's not working. Does anybody know how to display all fields of the data for films?
To work with SPARQL it is necessary to understand some concepts, as @AKSW said in the comments on the question, for example the meaning of ?film ?p ?o. This is called a triple¹ and is composed of subject-predicate-object. E.g., in the case of films, it could be: x is a film. This is what you are querying in the Wikidata Query Service (WDQS) when you use ?film wdt:P31 wd:Q11424.
I don't think it is possible to display all the property-values of every item. In addition, it would probably cause a timeout, because there are many statements for many items.
If you want to check the property-values of all the films in Wikidata, one option might be to write or find a script that extracts the items with P31-Q11424 (instance of film). For that, the accessing data section could be useful (e.g. with pywikibot you could query and extract what you want).
If you are interested in SPARQL and WDQS, I recommend reading some help resources:
Wikidata Query Service Help, specifically the SPARQL tutorial.
Query examples (reading other people's queries is how I began to learn).
SPARQL 1.1 Query Language specification.
RDF Dump Format (because reading about the ontology of Wikidata could help you understand the concepts).
Edit
When I first answered, I wrote "triplestore" and linked it to the corresponding English Wikipedia page, but after @AKSW's comment I realised that was wrong: a triplestore is the system used for the storage and retrieval of triples, whereas a semantic triple is "a set of three entities that codifies a statement about semantic data in the form of subject–predicate–object expressions" (from the Semantic triple page on English Wikipedia).

SPARQL Query of FOAF doesn't return any result

I want to get some information from the FOAF ontology. I tried the following SPARQL query, but it returns no results. I tried this query to get familiar with FOAF, but what I really want to do is to find all the people that a particular person ?x knows (using the property foaf:knows). How do I do this?
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name .
?x foaf:mbox ?mbox .
}
The Semantic Web is made up of different components.
Knowledge is represented as RDF triples. These triples describe resources using a Subject - Predicate - Object syntax. For example, "John is a Male" may be represented as an RDF triple.
On top of RDF, we may use RDFS and OWL to specify restrictions and other information on these data. Thanks to RDFS, I can specify that "Male is a subclass of Person" and it is therefore possible to infer that "John is a Person". RDFS and OWL help to define ontologies. An ontology is a vocabulary (general or specific to a domain) used to represent data. For example, I may want to create an ontology CAT to represent data on cats.
In that case, I would create my CAT vocabulary defining that "Cat is a subclass of Animal" and "hasOwner is a property that links a cat to a Person" and some other properties. Then, I am able to instantiate some individuals to create data on cats. For example by saying that "Baccara is a Cat" and "Baccara hasOwner John".
FOAF is basically a vocabulary to represent data on people and especially links between these people. FOAF vocabulary gives some properties and classes to handle easily information on people. But it doesn't provide any piece of information, only the "structure"/"model"/"schema" to organize information.
There are no individuals in the FOAF dataset, so there are no people for your query to match. That is why it returns no results.
You may want to build your own RDF dataset based on FOAF vocabulary. To do so, you can try a tool like Protégé, or more easily with a text editor if you're familiar with RDF/XML or Turtle.
Otherwise, if you only need to get familiar with FOAF, you can query the model. For example, you may want to get all the subclasses of Agent:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT distinct ?c
WHERE { ?c rdfs:subClassOf foaf:Agent }
I recommend reading a bit about the Semantic Web components (especially RDF and RDFS, and the differences between them) before going any further with FOAF. Also, a nice exercise for learning SPARQL is querying DBpedia: http://dbpedia.org/sparql.

Sesame not inferencing owl:sameAs

I have some data on vaccines in my Sesame triplestore. To the same store, I added additional data about vaccines from DBpedia.
<http://dbpedia.org/resource/Rotavirus_vaccine>
dbpedia2:routesOfAdministration "oral"@en
To specify that a particular vaccine in my native data is the same entity as the subject of the imported data from DBpedia, I inserted an owl:sameAs statement linking the two entities.
my_ns:Rota owl:sameAs <http://dbpedia.org/resource/Rotavirus_vaccine> .
Though that single triple has been added, I find no additional inferencing. For instance, I want this query to give me the route of administration of the vaccine in my native data by inferencing the property of the vaccine entity in DBpedia:
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX my_ns: <http://purl.org/net/ontology/my_ns/>
select ?roa where
{my_ns:Rota dbpedia2:routesOfAdministration ?roa}
At present, executing the query doesn't yield any results. I'd like the system to infer the following as the output of the query above:
my_ns:Rota dbpedia2:routesOfAdministration "oral"@en .
I installed GraphDB-Lite(OWLIM) by replacing the war files and verified that owl:sameAs works by executing a query on DBpedia.
The Sesame in-memory and native stores do not support OWL reasoning out of the box. They do offer (optional) support for RDFS reasoning (so understanding rdfs:subClassOf etc), which can be enabled at repository creation time (in the workbench, this is the dropdown option 'Memory/Native Store RDF Schema'). However, owl:sameAs is of course not part of RDFS reasoning.
Sesame also supports a custom graph query reasoner on top of the memory or native stores. This custom reasoner can be configured with your own inference rule, formulated as a combination of two SPARQL CONSTRUCT queries: a 'rule' query that expresses the actual inference rule, and a 'match' query that is used to do maintenance on the inferred statements when the store is updated. More explanation on how to set this up can be found in the section on Repository creation in Programming with Sesame. The option in the Workbench is "Memory/Native store Custom Graph Query Inference".
In the case of owl:sameAs, a custom rule to support it would look roughly like this:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT { ?s1 ?p1 ?o1 . ?o1 ?p2 ?o3 }
WHERE {
?o1 owl:sameAs ?o2 .
OPTIONAL { ?s1 ?p1 ?o2 . }
OPTIONAL { ?o2 ?p2 ?o3 . }
}
If your goal is purely to have owl:sameAs reasoning, this might be a simple way to enable it. However, for more comprehensive OWL reasoning support, the custom reasoner is not sufficiently powerful or scalable. Instead, you should use a Sesame backend store that has built-in support for it, such as Ontotext GraphDB (formerly known as OWLIM).
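To see what the rule above would materialise, here is a small Python sketch that applies the same pattern to a set of tuples. It is only a model of the rule's logic under the assumption that triples are plain string tuples (the `dbr:` prefix and the `apply_same_as_rule` name are made up for the example); it is not how Sesame evaluates the CONSTRUCT query.

```python
SAME_AS = "owl:sameAs"

TRIPLES = {
    ("my_ns:Rota", SAME_AS, "dbr:Rotavirus_vaccine"),
    ("dbr:Rotavirus_vaccine", "dbpedia2:routesOfAdministration", '"oral"@en'),
}

def apply_same_as_rule(triples):
    """One pass of the rule: for each ?o1 owl:sameAs ?o2, copy ?o2's triples onto ?o1."""
    inferred = set()
    for o1, p, o2 in triples:
        if p != SAME_AS:
            continue
        for s, pred, o in triples:
            if o == o2:
                inferred.add((s, pred, o1))  # ?s1 ?p1 ?o2  =>  ?s1 ?p1 ?o1
            if s == o2:
                inferred.add((o1, pred, o))  # ?o2 ?p2 ?o3  =>  ?o1 ?p2 ?o3
    return inferred - triples

for triple in sorted(apply_same_as_rule(TRIPLES)):
    print(triple)
```

Like the rule, the sketch only copies statements in one direction (from the object of owl:sameAs to its subject); symmetry and transitivity of owl:sameAs would need additional rules.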
Solved the problem. The issue was the absence of GraphDB-Lite (formerly OWLIM-Lite). I was under the impression I had it installed by replacing the .war files. However, the absence of OWLIM-Lite option in the drop down while creating a new repository indicated that it had not been installed.
When I initially checked whether owl:sameAs queries were working, I used the SERVICE clause in SPARQL to query DBpedia. As DBpedia supports owl:sameAs, those queries succeeded, but I was essentially querying outside Sesame.
I solved the problem by deleting the old .war files and their corresponding folders in Tomcat, and copying the .war files from the GraphDB distribution. When running the server for the first time after copying the files, the corresponding folders (openrdf-sesame and openrdf-workbench) are auto-generated. While creating a repository, the OWLIM-Lite option is then available.
I created an OWLIM-Lite repository and added the triples there. The owl:sameAs inferencing then started working and the query in the question was successfully executed.

protege how to add a reference to another ontology

I am trying to integrate my ontology with other ontologies. What I did was import the ontologies into Protege. That works, but Protege then lists all of their classes, which is normal. I am looking for a way to just reference these ontologies by URI and then use their terms via their prefixes.
Of course, I am building my ontology using OWL 2.
I hope you can help me.
If you want to completely reason and materialise facts based on terms relating to the referenced concept, then you will need to fully import the ontology that the referenced concept belongs to.
E.g., given an external ontology with the following statements:
ex:Person a owl:Class;
rdfs:subClassOf ex:Agent.
If you reference this in your ontology without importing it:
ex2:Doctor a owl:Class;
rdfs:subClassOf ex:Person.
and make the following statement:
ex2:Jack a ex2:Doctor.
and run it through a reasoner, then you will also materialise the following:
ex2:Jack a ex:Person.
But not the following:
ex2:Jack a ex:Agent.
To materialise the latter, you will need to import the ontology with all the statements about ex:Person.
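The difference can be sketched with a toy subclass reasoner in Python. This assumes triples are plain string tuples and only implements type propagation along rdfs:subClassOf; `materialise_types`, `LOCAL` and `IMPORTED` are hypothetical names, and a real OWL reasoner does far more.

```python
# Statements in your own ontology, which only references ex:Person.
LOCAL = {
    ("ex2:Doctor", "rdfs:subClassOf", "ex:Person"),
    ("ex2:Jack", "rdf:type", "ex2:Doctor"),
}
# Statements that only become available when the external ontology is imported.
IMPORTED = {
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
}

def materialise_types(triples):
    """Propagate rdf:type along rdfs:subClassOf until nothing new is inferred."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(closure):
            if p != "rdf:type":
                continue
            for c, q, d in list(closure):
                if q == "rdfs:subClassOf" and c == o and (s, "rdf:type", d) not in closure:
                    closure.add((s, "rdf:type", d))
                    changed = True
    return {t for t in closure if t[1] == "rdf:type"}

without_import = materialise_types(LOCAL)
with_import = materialise_types(LOCAL | IMPORTED)
print(("ex2:Jack", "rdf:type", "ex:Agent") in without_import)
print(("ex2:Jack", "rdf:type", "ex:Agent") in with_import)
```

Without the imported statement the reasoner has no way to know that ex:Person is a subclass of ex:Agent, so the chain stops at ex:Person.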

Different SPARQL query engines give differing results for DESCRIBE Query

I tried one SPARQL query in two different engines:
Protege 4.3 - SPARQL query tab
Jena 2.11.0
While the query is the same the results returned by these two tools are different.
I tried a DESCRIBE query like the following:
DESCRIBE ?x
WHERE { ?x :someproperty "somevalue"}
Results from Protege give me triples that have ?x as subject or object, while the ones from Jena have ?x as subject only.
My questions are:
Is the syntax of SPARQL uniform?
If I want DESCRIBE to work as in protege, what should I do in Jena?
To answer your first question: yes, the SPARQL syntax is uniform, since you've used the same query in both tools. However, what I think you are actually asking is whether the results from the two tools should be the same, i.e. are the semantics of SPARQL uniform?
In the case of DESCRIBE, the results are explicitly allowed to differ by the SPARQL specification, i.e. no, the semantics of SPARQL are not uniform, but only in the case of DESCRIBE.
See Section 16.4 DESCRIBE (Informative) of the SPARQL Specification which states the following:
The query pattern is used to create a result set. The DESCRIBE form
takes each of the resources identified in a solution, together with
any resources directly named by IRI, and assembles a single RDF graph
by taking a "description" which can come from any information
available including the target RDF Dataset. The description is
determined by the query service
The important part of this is the last couple of sentences that say the description is determined by the query service. This means that both Protege's and Jena's answers are correct since they are allowed to choose how they form the description.
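As an illustration of how two services can legitimately build different descriptions, the Python sketch below models the two behaviours reported in the question over a set of tuples. The data and function names are invented for the example, and the functions mirror only what the question observed, not the actual implementation in either tool.

```python
TRIPLES = {
    ("ex:film1", "ex:someproperty", "somevalue"),
    ("ex:film1", "ex:title", "Example"),
    ("ex:alice", "ex:directed", "ex:film1"),
}

def describe_subject_only(resource, triples):
    """Description made of triples where the resource is the subject."""
    return {t for t in triples if t[0] == resource}

def describe_subject_or_object(resource, triples):
    """Description made of triples where the resource is the subject or the object."""
    return {t for t in triples if t[0] == resource or t[2] == resource}

print(len(describe_subject_only("ex:film1", TRIPLES)))
print(len(describe_subject_or_object("ex:film1", TRIPLES)))
```

Both functions return a valid "description" of ex:film1 in the sense of the specification; they simply make different choices about what to include.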
Changing Jena DESCRIBE handling
To answer the second part of your question you can change how Jena processes DESCRIBE queries by implementing a custom DescribeHandler and an associated DescribeHandlerFactory. You then need to register your factory like so:
DescribeHandlerRegistry.get().add(new YourDescribeHandlerFactory());