Wikidata seems to redefine common RDF properties such as:
rdf:type (P31 in Wikidata)
rdfs:subClassOf (P279)
rdfs:subPropertyOf (P1647)
...
What's the motivation behind this? Why not just use the RDF properties, making it more similar to other knowledge graphs and therefore easier to query?
Following up on Stanislav's and AKSW's comments, I did a bit of research.
As far as I understand it (improvements/corrections are welcome), Wikidata has its own data model (it is not natively represented as an RDF graph -- quick introduction here). For example, Wikidata statements are ranked, they can be linked to references, etc.
These features can be replicated in RDF (using, for example, RDF reification) but that can be somewhat verbose and/or complex. As a result, Wikidata defines its own properties and classes, while still linking them to the equivalent RDF constructs.
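To make the verbosity concrete, here is a minimal Python sketch of how one ranked, referenced statement might expand into several triples. The prefixes (wd:, wdt:, p:, ps:, wikibase:, prov:) are the ones described in the RDF Dump Format page listed under "Worth reading"; the `Statement` class, the `to_triples` helper, and the node identifiers `stmt1` and `ref1` are hypothetical, and the rule for emitting the direct wdt: triple is simplified.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """A Wikidata-style statement: a claim plus a rank and references."""
    subject: str
    prop: str
    value: str
    rank: str = "NormalRank"
    references: list = field(default_factory=list)


def to_triples(st, node_id):
    """Expand one statement into RDF-style triples.

    node_id is a made-up identifier for the intermediate statement node.
    """
    node = f"wds:{node_id}"
    triples = [
        (f"wd:{st.subject}", f"p:{st.prop}", node),      # entity -> statement node
        (node, f"ps:{st.prop}", f"wd:{st.value}"),       # statement node -> value
        (node, "wikibase:rank", f"wikibase:{st.rank}"),  # rank lives on the node
    ]
    for ref in st.references:
        triples.append((node, "prov:wasDerivedFrom", f"wdref:{ref}"))
    # Simplification (assumption): emit the direct triple for any
    # non-deprecated statement; the real export only does so for the
    # best-ranked statements of a property.
    if st.rank != "DeprecatedRank":
        triples.append((f"wd:{st.subject}", f"wdt:{st.prop}", f"wd:{st.value}"))
    return triples


# Q42 (Douglas Adams) is an instance of (P31) Q5 (human).
triples = to_triples(Statement("Q42", "P31", "Q5", references=["ref1"]), "stmt1")
for t in triples:
    print(t)
```

The single direct wdt:P31 triple is what plays the role of rdf:type in simple queries; the other triples carry the rank and reference that a plain rdf:type triple has no place for.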
Worth reading:
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
http://aidanhogan.com/docs/reification-wikidata-rdf-sparql.pdf
http://www.snee.com/bobdc.blog/2017/04/the-wikidata-data-model-and-yo.html
Related
Is it possible to infer new knowledge about an ontology only from a query in SPARQL?
I have a question about using SPARQL with ontologies. So far I have thought of SPARQL as the equivalent of SQL in relational databases; that is, with SPARQL it is only possible to query the data that is explicitly in the ontology, without access to data that can be inferred, leaving the responsibility for inference to reasoners.
However, I have read documents from which I infer that SPARQL is able to derive implicit knowledge that is not explicitly stated in the ontology. Is my inference correct? That is, is it possible to infer knowledge through a SPARQL query without the need for a reasoner? If so, what advantages does a reasoner have over SPARQL?
Greetings, Manuel Puebla.
Yes, on-the-fly inference may be a feature of the SPARQL processor, so you can get the benefits of inference/reasoning directly from a SPARQL query. (See Virtuoso SPARQL endpoints inference rules for some discussion of how this is done in Virtuoso, for example.)
I have a pizza ontology that defines different types of pizzas, ingredients and relations among them.
I just want to understand several basic things:
Is it correct that I should apply SPARQL if I want to obtain information without reasoning? E.g. which pizzas contain onion?
What is the difference between SPARQL and reasoning algorithms like Pellet? Which queries cannot be answered by SPARQL but can be answered by Pellet? Some examples of queries (question-like) for the pizza ontology would be helpful.
As far as I understand, to use SPARQL from Java with Jena, I should save my ontology in RDF/XML format. However, to use Pellet with Jena, which format do I need to select? Pellet uses OWL2...
SPARQL is a query language, that is, a language for formulating questions. Reasoning, on the other hand, is the process of deriving new information from existing data. These are two different, complementary processes.
To retrieve information from your ontology you use SPARQL, yes. You can do this without reasoning, or in combination with a reasoner, too. If you have a reasoner active it means your queries can be simpler, and in some cases reasoners can derive information that is not really retrievable at all with just a query.
Reasoners like Pellet don't really answer queries, they just reason: they figure out what implicit information can be derived from the raw facts, and can do things like verifying that things are consistent (i.e. that there are no logical contradictions in your data). Pellet can figure out that if you own a Toyota, which is of type Car, you own a Vehicle (because a Car is a type of Vehicle). Or it can figure out that if you define a pizza to have the ingredient "Parmesan", you have a pizza of type "Cheesy" (because it knows Parmesan is a type of Cheese). So you use a reasoner like Pellet to derive this kind of implicit information, and then you use a query language like SPARQL to actually ask: "Ok, give me an overview of all Cheesy pizzas that also have anchovies".
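The split between "derive first, then query" can be sketched in a few lines of Python. This is only a toy forward-chaining loop over two subclass rules, not how Pellet works internally (Pellet is an OWL DL reasoner and handles far more than this); the `materialize` function and the instance name `myToyota` are assumptions made for illustration.

```python
def materialize(triples):
    """Apply two subclass rules repeatedly until nothing new is derived.

    Rule 1: x type C,       C subClassOf D  =>  x type D
    Rule 2: C subClassOf D, D subClassOf E  =>  C subClassOf E
    """
    inferred = set(triples)
    while True:
        sub = [(s, o) for s, p, o in inferred if p == "subClassOf"]
        new = set()
        for s, p, o in inferred:
            if p in ("type", "subClassOf"):
                for c, d in sub:
                    if c == o:
                        new.add((s, p, d))
        if new <= inferred:  # fixpoint reached
            return inferred
        inferred |= new


# Raw facts: the Toyota is a Car, and every Car is a Vehicle.
facts = {
    ("myToyota", "type", "Car"),
    ("Car", "subClassOf", "Vehicle"),
}


def instances(triples, cls):
    """The 'query' step: a plain lookup with no reasoning of its own."""
    return sorted(s for s, p, o in triples if p == "type" and o == cls)


print(instances(facts, "Vehicle"))               # asserted facts only
print(instances(materialize(facts), "Vehicle"))  # after reasoning
```

The lookup function is the same in both calls; only the data it runs over differs, which is the sense in which reasoning and querying are complementary.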
APIs like Jena are toolkits that treat RDF as an abstract model. Which syntax format you save your file in is immaterial: Jena can read almost any RDF syntax. Once the file has been read into a Jena model, you can execute the Pellet reasoner on it; it doesn't matter which syntax your original file was in. Details on how to do this can be found in the Jena documentation.
I am using the SPARQL plugin in Protégé to query my ontology, and I found out that it only works for asserted statements and not inferred ones. How can I change this?
SPARQL is defined by several standards. SPARQL 1.1 Query, the main standard, only shallowly relies on RDF semantics. A typical SPARQL query engine does not infer anything from RDF/RDFS terms like rdfs:subClassOf, rdfs:range, etc. However, the SPARQL standards also include SPARQL 1.1 Entailment Regimes, which defines how SPARQL engines should answer queries when they do implement inference, which is optional. To find out whether a SPARQL query engine implements an entailment regime (such as RDFS or OWL DL), you may have to look at the documentation of the engine, or there may be a SPARQL service description available in RDF. SPARQL 1.1 Service Description is yet another SPARQL standard; it provides an RDF vocabulary, and a standard way to interpret it, for describing which features a SPARQL engine implements.
When querying some linked data SPARQL endpoints via SPARQL queries, what is the type of reasoning provided (if any)?
For example, the DBpedia SNORQL endpoint doesn't even provide basic subclass inference (if A subClassOf B and B subClassOf C, then A subClassOf C). The FactForge SPARQL endpoint, by contrast, provides some inference (though it is not clear what kind of inference it is), and offers the possibility to switch that inference on and off.
My question:
How is it possible to identify the kind of inference applied? And if the inference support is limited, could it be extended using the endpoint only?
Inference controls will vary with the engine as well as the endpoint.
The public DBpedia SPARQL endpoint (powered by Virtuoso, from my employer, OpenLink Software) does provide various inference rules (accessible through the "Inference rules" link at the top right corner of the SPARQL endpoint query form page) which are controlled by pragmas in your SPARQL (not SNORQL, to which form you linked), such as --
DEFINE input:inference 'urn:rules.skos'
You can see the content of any predefined ruleset via SPARQL -- for the above
SELECT *
FROM <urn:rules.skos>
WHERE { ?s ?p ?o }
You can see the live query and results.
See this tutorial containing many examples.
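As a rough sketch of how such a pragma could be attached from client code, the Python below prepends the DEFINE line to a query and builds a GET URL without sending it. The endpoint address https://dbpedia.org/sparql and the sample query are assumptions for illustration, the helper names are made up, and `query` is the parameter name used by the standard SPARQL protocol.

```python
from urllib.parse import urlencode

ENDPOINT = "https://dbpedia.org/sparql"  # assumed public DBpedia endpoint


def with_inference(query, ruleset):
    """Prepend a Virtuoso inference pragma to a SPARQL query string."""
    return f"DEFINE input:inference '{ruleset}'\n{query}"


def request_url(query, endpoint=ENDPOINT):
    """Build a SPARQL-protocol GET URL; nothing is sent over the network."""
    return endpoint + "?" + urlencode({"query": query})


# Illustrative query only; any SELECT would do here.
query = with_inference("SELECT * WHERE { ?s ?p ?o } LIMIT 5", "urn:rules.skos")
url = request_url(query)
print(query)
print(url)
```

The pragma is Virtuoso-specific, so other engines will most likely reject or ignore the DEFINE line.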
While inference is not universally supported across SPARQL endpoints, most of the inferences supported by the RDFS, RDFS+ and OWL 2 RL profiles can be expressed in SPARQL itself. For example, querying for instances of :A using your subClassOf entailment can be supported with SPARQL property paths:
SELECT ?inst
WHERE {
?cls rdfs:subClassOf* :A .
?inst a ?cls .
}
The first triple pattern gets all subclasses of :A, including :A (use + instead of * if you just want subclasses of :A), and the second triple finds all instances of all those classes.
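For readers who want to see what the property path computes, here is a small Python emulation over an in-memory list of triples. It is only a sketch of the semantics, not how a SPARQL engine evaluates paths; the function names and the sample classes :B, :C and instances :x, :y are made up.

```python
def subclasses_of(triples, cls, include_self=True):
    """Emulate `?cls rdfs:subClassOf* cls` (or `+` if include_self is False)."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    if include_self:
        found.add(cls)  # the zero-length path of `*`
    return found


def instances_of(triples, cls):
    """Emulate the two triple patterns of the query together."""
    classes = subclasses_of(triples, cls)
    return {s for s, p, o in triples if p == "rdf:type" and o in classes}


data = [
    (":B", "rdfs:subClassOf", ":A"),
    (":C", "rdfs:subClassOf", ":B"),
    (":x", "rdf:type", ":C"),
    (":y", "rdf:type", ":A"),
]

print(sorted(subclasses_of(data, ":A", include_self=False)))
print(sorted(instances_of(data, ":A")))
```

Nothing is materialized here: the subclass closure is recomputed for each query, which matches the trade-off of using a property path instead of an inference-enabled store.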
To see how most of OWL 2 can be implemented with SPARQL, see Reasoning in OWL 2 RL and RDF Graphs using Rules. With a couple of exceptions, all of these rules can be implemented in SPARQL (and in fact you probably won't need some of them, such as eq-ref, which is good for a computational lol that logicians may scoff at).
There are few use cases, beyond heavy-lifting classification problems, that can't be solved with a subset of the OWL 2 RL rules.
So, in the end, a recommendation is to understand what entailments you need. Chances are that OWL has totally overthought the issue and you can live with a few SPARQL patterns. And then you can hit the SPARQL endpoints without having to worry about whether specific inference profiles are supported.
As I understand it, there are ways to validate serialized RDF (e.g. RDF/XML) against RDF Schema (How to validate a RDF with your RDF schema).
Also, there are various converters from RDF/XML to JSON-LD serialization format (and vice versa).
Searching the Internet I could not find a straightforward way to validate JSON-LD against some sort of JSON Schema that relates to JSON-LD as RDF Schema relates to RDF(/XML).
Of course, there are various JSON-LD document forms so I assume that one schema cannot easily describe all forms.
So my question is, what is the proper or recommended way of validating a JSON-LD document from the RDF perspective?
BTW, I came across a project that tries to solve validation of JSON linked data: https://github.com/common-workflow-language/schema_salad.
RDF Schema is somewhat mis-named, but it can be used to make sense of (actually, infer information from) an RDF graph. OWL provides more mechanisms for asserting shapes of RDF graphs, as does new work on RDF Shapes. The key is that these work on the data model, not the syntax. Both RDF/XML and JSON-LD are RDF serializations, so documents expressed in either syntax can be reduced to an RDF graph, which is where these tools operate.
The Structured Data Linter uses this approach to "validate" web pages representing information in schema.org and many other vocabularies using these principles.
RDF Schema is not for validation. In fact, you cannot express a contradiction with RDF Schema alone. For example, if an instance of Person is the subject of a triple with maximumSpeed as predicate, and the property maximumSpeed has Vehicle (rather than Person) as rdfs:domain, there is no contradiction; there is simply a thing that is both a Person and a Vehicle. To say that something cannot be a person and a vehicle at the same time you would need OWL; RDF Schema is not enough for that.
RDF Data Shapes will allow constraints and validation.
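The Person/Vehicle point can be illustrated with a toy Python version of the rdfs:domain rule. This is a sketch under simplifying assumptions (triples as plain tuples, a single rule, and a made-up individual `alice` with a made-up speed value), not a full RDFS implementation.

```python
def apply_domain(triples):
    """Apply the rdfs:domain rule: p rdfs:domain C, s p o  =>  s rdf:type C."""
    domains = [(s, o) for s, p, o in triples if p == "rdfs:domain"]
    inferred = set(triples)
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, "rdf:type", cls))
    return inferred


facts = {
    ("alice", "rdf:type", "Person"),
    ("alice", "maximumSpeed", "30"),
    ("maximumSpeed", "rdfs:domain", "Vehicle"),
}

types = sorted(o for s, p, o in apply_domain(facts)
               if s == "alice" and p == "rdf:type")
print(types)
```

The rule only ever adds a type; nothing in it can reject the data, which is why RDFS by itself cannot serve as a validator.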