How to validate JSON-LD against a schema?

As I understand it, there are ways to validate serialized RDF (e.g. RDF/XML) against RDF Schema (How to validate a RDF with your RDF schema).
Also, there are various converters from RDF/XML to JSON-LD serialization format (and vice versa).
Searching the Internet I could not find a straightforward way to validate JSON-LD against some sort of JSON Schema that relates to JSON-LD as RDF Schema relates to RDF(/XML).
Of course, there are various JSON-LD document forms so I assume that one schema cannot easily describe all forms.
So my question is, what is the proper or recommended way of validating a JSON-LD document from the RDF perspective?
BTW, I ran across a project that tries to solve validation of JSON linked data: https://github.com/common-workflow-language/schema_salad.

RDF Schema is somewhat misnamed, but can be used to make sense of (actually, infer information from) an RDF graph. OWL provides more mechanisms for asserting shapes of RDF graphs, as does new work on RDF Shapes. The key is that these work on the data model, not the syntax. Both RDF/XML and JSON-LD are RDF serializations, which can be used to reduce documents expressed in an appropriate syntax into an RDF graph, where these tools operate.
The Structured Data Linter uses this approach to "validate" web pages representing information in schema.org and many other vocabularies using these principles.

RDF Schema is not for validation. In fact, you cannot express a contradiction with RDF Schema alone. For example, if an instance of Person is the subject of a triple with maximumSpeed as predicate, and the property maximumSpeed has Vehicle (rather than Person) as rdfs:domain, there is no contradiction; there is simply a thing that is both a Person and a Vehicle. To say that something cannot be a person and a vehicle at the same time you would need OWL; RDF Schema is not enough for that.
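The Person/Vehicle example can be sketched in a few lines of plain Python, without any RDF library. This is only an illustration of the rdfs:domain entailment rule: the ex: names and the helper apply_domain_rule are made up here, and a real RDFS reasoner applies more rules than this one.

```python
# Triples are plain (subject, predicate, object) tuples; all ex: names are hypothetical.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

def apply_domain_rule(triples):
    """Return the triples plus everything entailed by rdfs:domain statements."""
    domains = {(s, o) for (s, p, o) in triples if p == RDFS_DOMAIN}
    inferred = set(triples)
    for (s, p, o) in triples:
        for (prop, cls) in domains:
            if p == prop:
                # The subject of any triple using `prop` is inferred to be a `cls`.
                inferred.add((s, RDF_TYPE, cls))
    return inferred

data = {
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:maximumSpeed", "12"),
    ("ex:maximumSpeed", RDFS_DOMAIN, "ex:Vehicle"),
}

closure = apply_domain_rule(data)
types_of_alice = sorted(o for (s, p, o) in closure if s == "ex:alice" and p == RDF_TYPE)
print(types_of_alice)  # ['ex:Person', 'ex:Vehicle']
```

Nothing is rejected: the rule only adds a second type for ex:alice, which is why RDFS alone cannot serve as a validator.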
RDF Data Shapes will allow constraints and validation.

Related

Redefined RDF properties in Wikidata

Wikidata seems to redefine common RDF properties such as:
rdf:type (P31 in Wikidata),
rdfs:subClassOf (P279)
rdfs:subPropertyOf (P1647)
...
What's the motivation behind this? Why not just use the RDF properties, making it more similar to other knowledge graphs and therefore easier to query?
Following up on Stanislav's and AKSW's comments, I did a bit of research.
As far as I understand it (improvements/corrections are welcome), Wikidata has its own data model (it is not natively represented as an RDF graph -- quick introduction here). For example, Wikidata statements are ranked, they can be linked to references, etc.
These features can be replicated in RDF (using, for example, RDF reification) but that can be somewhat verbose and/or complex. As a result, Wikidata defines its own properties and classes, while still linking them to the equivalent RDF constructs.
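To make the difference concrete, here is a rough sketch of the two shapes one claim can take in the Wikidata RDF export. It uses plain Python tuples rather than an RDF library. The prefixes (wd:, wdt:, p:, ps:, wds:, wikibase:) follow the RDF Dump Format page linked below; the statement node ID and both helper functions are hypothetical.

```python
# Sketch of how one Wikidata claim appears in the RDF export.
# The statement node ID used below is hypothetical.

def direct_triple(item, prop, value):
    """The simple 'truthy' form: one triple, no rank or references."""
    return [(f"wd:{item}", f"wdt:{prop}", f"wd:{value}")]

def statement_triples(item, prop, value, rank, statement_id):
    """The full form: an intermediate statement node carries the value and rank."""
    node = f"wds:{statement_id}"
    return [
        (f"wd:{item}", f"p:{prop}", node),
        (node, f"ps:{prop}", f"wd:{value}"),
        (node, "wikibase:rank", f"wikibase:{rank}"),
    ]

# Q42 (Douglas Adams) is an instance of (P31) Q5 (human).
for triple in direct_triple("Q42", "P31", "Q5"):
    print(triple)
for triple in statement_triples("Q42", "P31", "Q5", "NormalRank", "Q42-hypothetical-id"):
    print(triple)
```

The statement node is the place where ranks, qualifiers and references attach, which a single rdf:type triple has no room for.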
Worth reading:
https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format
http://aidanhogan.com/docs/reification-wikidata-rdf-sparql.pdf
http://www.snee.com/bobdc.blog/2017/04/the-wikidata-data-model-and-yo.html

Set up a SPARQL endpoint for use with custom ontologies and RDF triples

I've been trying to figure out how to set up a SPARQL endpoint for a couple of days, but however much I read, I cannot understand it.
To explain my intention: I have an open data server running on CKAN, and my goal is to be able to run SPARQL queries on the data. I know I cannot do this directly on the datasets themselves; I would have to define my own OWL ontology and convert the data I want to use from CSV (the format it is currently in) to RDF triples (to be used as linked data).
The idea was to first test with the metadata of the repositories, which can be generated automatically with the ckanext-dcat extension, but I really cannot find where to start. I've searched for information on how to install a Virtuoso server for SPARQL, but what I've found leaves a lot to be desired, and nowhere can I find an explanation of how to actually load my own OWL and RDF files into Virtuoso.
Can someone lend me a hand with how to start? Thank you.
I'm a little confused. Maybe this is two or more questions?
1. How to convert tabular data, like CSV, into the RDF semantic format?
This can be done with an R2RML approach. Karma is a great GUI for that purpose. Like you say, a conversion like that can really be improved with an underlying OWL ontology. But it can be done without creating a custom ontology, too.
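To get a feel for what such a conversion produces, here is a minimal hand-rolled sketch in Python. It is not R2RML and not Karma, just a fixed mapping of one subject per row and one triple per column. The http://example.org/ namespace, the column names and the function csv_to_ntriples are assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for illustration

def csv_to_ntriples(csv_text, id_column):
    """Turn each CSV row into N-Triples: one subject per row, one triple per other column."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{quote(row[id_column])}>"
        for column, value in row.items():
            if column == id_column or value == "":
                continue  # skip the key column and empty cells
            predicate = f"<{BASE}property/{quote(column)}>"
            # Escape backslashes, quotes and line breaks so the literal is valid N-Triples.
            literal = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            lines.append(f'{subject} {predicate} "{literal}" .')
    return "\n".join(lines)

sample = "id,name,city\n1,Alice,Madrid\n2,Bob,\n"
print(csv_to_ntriples(sample, "id"))
```

Every value comes out as a plain string literal. A mapping language like R2RML is what lets you declare datatypes, links between rows and terms from an existing ontology instead.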
I have elaborated on this in the answer to another question.
2. Now that I have some RDF formatted data, how can I expose it with a SPARQL endpoint?
Virtuoso is a reasonable choice. There are multiple ways to deploy it and multiple ways to load the data, and therefore lots of tutorials on the subject. Here's one good one, from DBpedia.
If you'd like a simpler path to starting an RDF triplestore with a SPARQL endpoint, Stardog and Blazegraph are available as JARs, and RDF4J can easily be deployed within a container like Tomcat.
All provide web-based graphical interfaces for loading data and running queries, in addition to SPARQL REST endpoints. At least Stardog also provides command-line tools for bulk loading.
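Once a store is running, querying its endpoint from code is a plain HTTP request. The sketch below uses only the Python standard library and follows the SPARQL 1.1 Protocol (a GET with a query parameter, asking for the JSON results format). The endpoint URL is an assumption, so replace it with wherever your store actually listens; the function names are made up for this example.

```python
import json
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )

def run_select(endpoint, query):
    """Send the query and return the result rows (needs a running endpoint)."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query)) as response:
        return json.load(response)["results"]["bindings"]

# Hypothetical local endpoint; adjust to wherever your store is listening.
ENDPOINT = "http://localhost:8890/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"

request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
# rows = run_select(ENDPOINT, QUERY)  # uncomment once the endpoint is up
```

Building the request separately from sending it makes it easy to inspect the URL before any server exists.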

Understanding the difference between SPARQL and semantic reasoning using Pellet

I have a pizza ontology that defines different types of pizzas, ingredients and relations among them.
I just want to understand several basic things:
1. Is it correct that I should apply SPARQL if I want to obtain information without reasoning? E.g. which pizzas contain onion?
2. What is the difference between SPARQL and reasoning algorithms like Pellet? Which queries cannot be answered by SPARQL but can be answered by Pellet? Some examples of queries (question-like) for the pizza ontology would be helpful.
3. As far as I understand, to use SPARQL from Java with Jena, I should save my ontology in RDF/XML format. However, to use Pellet with Jena, which format do I need to select? Pellet uses OWL2...
SPARQL is a query language, that is, a language for formulating questions in. Reasoning, on the other hand, is the process of deriving new information from existing data. These are two different, complementary processes.
To retrieve information from your ontology you use SPARQL, yes. You can do this without reasoning, or in combination with a reasoner, too. If you have a reasoner active it means your queries can be simpler, and in some cases reasoners can derive information that is not really retrievable at all with just a query.
Reasoners like Pellet don't really answer queries, they just reason: they figure out what implicit information can be derived from the raw facts, and can do things like verifying that things are consistent (i.e. that there are no logical contradictions in your data). Pellet can figure out that if you own a Toyota, which is of type Car, you own a Vehicle (because a Car is a type of Vehicle). Or it can figure out that if you define a pizza to have the ingredient "Parmesan", you have a pizza of type "Cheesy" (because it knows Parmesan is a type of Cheese). So you use a reasoner like Pellet to derive this kind of implicit information, and then you use a query language like SPARQL to actually ask: "Ok, give me an overview of all Cheesy pizzas that also have anchovies".
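The split between the two steps can be shown with a toy stand-in in plain Python. This is neither Pellet nor SPARQL, only an illustration of "derive first, then ask". All pizza, ingredient and function names are invented for the example.

```python
# All class and pizza names here are made up for illustration.
SUBCLASS_OF = {"Parmesan": "Cheese", "Mozzarella": "Cheese"}
HAS_INGREDIENT = {
    "pizza1": {"Parmesan", "Anchovy"},
    "pizza2": {"Mozzarella"},
    "pizza3": {"Onion", "Anchovy"},
}

def superclasses(cls):
    """Reasoning step: follow subclass links to collect every implied class."""
    found = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found

def infer_cheesy(pizzas):
    """Derive the implicit fact 'is a Cheesy pizza' from the explicit ingredients."""
    return {
        name
        for name, ingredients in pizzas.items()
        if any("Cheese" in superclasses(i) for i in ingredients)
    }

# Query step: combine the inferred facts with an explicit condition.
cheesy = infer_cheesy(HAS_INGREDIENT)
answer = sorted(p for p in cheesy if "Anchovy" in HAS_INGREDIENT[p])
print(answer)  # ['pizza1']
```

No pizza is stated to be "Cheesy" anywhere in the data. That fact only exists after the inference step, which is the part a plain query over the raw facts would miss.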
APIs like Jena are toolkits that treat RDF as an abstract model. Which syntax format you save your file in is immaterial, it can read almost any RDF syntax. As soon as you have it read in a Jena model you can execute the Pellet reasoner on it - it doesn't matter which syntax your original file was in. Details on how to do this can be found in the Jena documentation.

Add Triples to Protege for querying

I have the ontology and a file of about 7.4 GB containing triples. How do I add this file of triples so that I can run queries on it?
You might want to import your data into a triplestore.
See http://wiki.bitplan.com/index.php/SPARQL#The_sample_Data for an example of how to import data into a triple store, in this case Blazegraph.
Different triplestores have different options. Protégé is not so good for managing data directly. It's good for designing ontologies, that is, specifying the schema/structure/model for the data. If you have the data in triples you might want to find out this structure by not only looking at the data but also finding sources for the underlying design. Of course you can sort of "reverse engineer" this by loading the triples into a triple store and trying queries.

What is the difference between RDF Schema and Ontology?

I am new to the Semantic Web and confused regarding RDFS and Ontology. Can someone explain the difference between RDF Schema and Ontology?
RDF Schema (RDFS) is a language for writing ontologies.
An ontology is a model of (a relevant part of) the world, listing the types of object, the relationships that connect them, and constraints on the ways that objects and relationships can be combined.
A simple example of an ontology (though not written in RDFS syntax):
class: Person
class: Project
property: worksOn
worksOn domain Person
worksOn range Project
which says that in our model of the world, we only care about People and Projects. People can work on Projects, but not the other way around.
Do you mean 'what is the difference between RDF Schema and Web Ontology Language (OWL2)'? If so, then there are a few main differences. Both are ways to create vocabularies of terms to describe data when represented as RDF. OWL2 and its subsets (OWL DL, OWL Full, OWL Lite) contain all the terms contained in RDFS but allow for greater expressiveness, including quite sophisticated class and property expressions. In addition, one of the subsets of OWL2 (OWL Full) can be modelled in such a way that reasoning over it with an OWL Full reasoner is undecidable. Both are representable as RDF and both are W3C Web Standards.
If you want to compare RDFS and ontology, not specifically in the context above, but in the context of the Semantic Web, then my advice would be to be very careful. Careful because you will find several distinct and not necessarily mutually exclusive camps: those with an interest in ontology from a philosophical perspective, those from a computing perspective, those who think the philosophical perspective should be the only perspective and those who don't. If you are inclined in any of those ways, you can end up having great debates. But if you want to engage in Semantic Web development, then the fastest route is to study and understand the Web Standards mentioned initially.
Conceptually there is no difference, i.e., RDFS can be utilised to create a (e.g. domain specific) vocabulary or ontology, where RDFS is bootstrapping itself in companion with RDF (everything is at least an rdfs:Resource). Furthermore, in the context of Semantic Web technologies you could utilise OWL to describe advanced semantics of your ontology/vocabulary. See also this definition of ontology.
As per the spec, RDF schema is purely that - a schema or structure for defining things semantically. It gives you the vocabulary (key words and properties) for describing things. Think of it like an XML schema as used in XML documents and web pages.
An ontology is a classification hierarchy (for example, the biological taxonomy of life) normally combined with instances of those classes. It is used for classifying and reasoning.
What is an instance depends on how you define a taxonomy. It might be that you have an ontology of living creatures and so a living, breathing person is an instance of the ontological class "Homo Sapiens", or it might be that you have an ontology of species and so the entire Homo Sapiens species is an instance of the ontological class "Species".
In non-technical terms, I would say RDFS is a language that helps to represent information, and an ontology is the term used to refer to all the information about a domain.
Cheers