I'm trying to find a way to implement SHACL validations using SPARQL in my AWS Neptune graph database. Is there a way to do so?
Well, it depends on what you mean by "implement". ;-)
You cannot implement all of SHACL with SPARQL alone, but you could implement some subset; not with a single query, though. You could, for example, write a query that collects the constraints of your shapes, and then use those results to generate a query that gets the relevant parts of your data; you could then examine those results and produce a validation report. And if you are doing this programmatically, you could of course also implement those parts that cannot be expressed in SPARQL (e.g., literal string patterns).
All that is somewhat "academic". There are open source SHACL implementations that you could use as a Neptune client (e.g., pySHACL if you are using Python and RDFLib). That would be a better and certainly far more practical way.
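If you are on the JVM rather than Python, the same client-side pattern can be sketched with Apache Jena's SHACL module: pull the relevant slice of data out of Neptune with a CONSTRUCT query, then validate it locally against your shapes. The endpoint URL, the Person class, and shapes.ttl below are placeholders, and a real Neptune cluster may additionally require HTTPS plus IAM request signing:

    import org.apache.jena.graph.Graph;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.shacl.ShaclValidator;
    import org.apache.jena.shacl.ValidationReport;
    import org.apache.jena.shacl.lib.ShLib;

    public class NeptuneShaclCheck {
        public static void main(String[] args) {
            // Placeholder endpoint; substitute your Neptune cluster's SPARQL URL.
            String endpoint = "https://your-neptune-host:8182/sparql";

            // Pull the relevant part of the data out of Neptune.
            String construct =
                "CONSTRUCT { ?s ?p ?o } " +
                "WHERE { ?s a <http://example.org/Person> ; ?p ?o }";
            Model data;
            try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, construct)) {
                data = qe.execConstruct();
            }

            // Load the shapes graph from a local Turtle file (assumed to exist).
            Graph shapes = RDFDataMgr.loadGraph("shapes.ttl");

            // Validate client-side and print a SHACL validation report.
            ValidationReport report = ShaclValidator.get().validate(shapes, data.getGraph());
            ShLib.printReport(report);
            System.out.println("Conforms: " + report.conforms());
        }
    }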
So, I need to run a SPARQL query over a semantic database, but some of the triples are not going to be in the database; they are going to be provided by webservices (and not as a SPARQL endpoint). I would like to be able to run a SELECT query that takes those additional triples into consideration, but without having to insert them into the database. Is there a way to do that?
This is not part of the SPARQL spec, so "no" is the general answer.
That said, Virtuoso (possibly among others) lets you include an external RDF source (a/k/a a webservice) as part of the FROM clause (among other methods), to be dereferenced during SPARQL query processing.
Such a webservice need not be a SPARQL endpoint, but you will get the best performance if it serves RDF (though the serialization may vary).
The Virtuoso Sponger can also be invoked on the fly to derive RDF from many document formats (with an obvious performance hit). To pursue this, please raise it on the OpenLink Community Forum.
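To make the idea concrete, here is a rough client-side sketch (using Jena, with placeholder URLs): the FROM clause names an external RDF document rather than a graph already in the store, and whether Virtuoso actually dereferences it on the fly depends entirely on how sponging is configured and permitted on that instance.

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.query.ResultSetFormatter;

    public class ExternalSourceQuery {
        public static void main(String[] args) {
            // Placeholder Virtuoso SPARQL endpoint.
            String endpoint = "http://localhost:8890/sparql";

            // FROM names an external RDF document (not a SPARQL endpoint).
            // Whether the server fetches it during query processing depends
            // on the Virtuoso configuration (sponging permissions).
            String query =
                "SELECT ?s ?p ?o " +
                "FROM <http://example.org/data/people.rdf> " +
                "WHERE { ?s ?p ?o } LIMIT 10";

            try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query)) {
                ResultSet results = qe.execSelect();
                ResultSetFormatter.out(System.out, results);
            }
        }
    }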
I'm wrapping my head around Grakn a little to understand its added value, and I wonder: is Graql compiled or translated to Gremlin traversal steps?
This makes me wonder about the difference in expressivity between SPARQL and Graql, given that the former has so far not been fully translated into Gremlin. It seems to be an open problem? Is Graql fundamentally simpler than SPARQL, and would that explain why it can be fully translated, if that is the case? If not, are there any limitations in translating it to Gremlin steps at this point?
I'll try to shine some light on your questions.
To begin with, Graql was designed to be a high-level, human-readable query language. The main idea was to abstract the underlying vertex-and-edge graph data structure into concepts specific to a given user-defined domain. That way the user doesn't need to worry about the graph representation and low-level Gremlin constructs and can instead work with high-level terms they defined themselves and/or are familiar with.
Now, implementation-wise, Graql is an abstraction over Gremlin that translates the high-level queries into Gremlin traversals, which can then be executed against a specific graph. However, the mapping between Graql and Gremlin is not 1-1. In fact, Graql operates with a subset of Gremlin that is sufficient to capture the intended behaviours of the Graql language. It was never our intention to find such a mapping, as the goal was to translate high-level queries into queries understandable by the underlying graph processor.
Now, about the efficiency of traversal generation. Graql queries can be decomposed into properties (has, isa, sub, etc.) and fragments. Each fragment has a defined Gremlin counterpart, and each property can possibly contain multiple fragments. The fragment translation is unambiguous; however, there is a lot of freedom in picking and arranging the fragments that go into a property. Keeping in mind that queries contain multiple properties, this makes the arrangement a strictly non-trivial task. To perform this arrangement, which in Gremlin is handed to the user, we implemented a query processor. The idea of the processor is to pick an arrangement and ordering of the fragments such that the resulting query executes as fast as possible. This is reminiscent of SQL query processors, and the motivation is exactly the same: to abstract the query optimisation away from the user.
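To make the arrangement point concrete, here is a small plain-TinkerPop (Java) sketch, not Grakn code, showing two hand-written orderings of the same logical pattern over TinkerPop's toy "modern" graph; choosing automatically between such orderings is essentially what the Graql query processor does for you.

    import java.util.List;

    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
    import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
    import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerFactory;

    public class TraversalOrderings {
        public static void main(String[] args) {
            // TinkerPop's built-in toy graph ("modern").
            GraphTraversalSource g = TinkerFactory.createModern().traversal();

            // Same logical pattern ("software created by marko"), two hand-picked orderings.
            // Ordering 1: start from the person, walk outgoing "created" edges.
            List<Object> fromPerson =
                g.V().has("person", "name", "marko").out("created").values("name").toList();

            // Ordering 2: start from all software, filter by creator.
            List<Object> fromSoftware =
                g.V().hasLabel("software")
                     .where(__.in("created").has("name", "marko"))
                     .values("name").toList();

            System.out.println(fromPerson);   // same answer...
            System.out.println(fromSoftware); // ...different execution order
        }
    }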
We are actively working on the query planning component, and although it cannot guarantee the optimal plan in all cases, we are trying to make the produced plans converge to optimal solutions.
I've been trying to figure out how to set up a SPARQL endpoint for a couple of days, but no matter how much I read, I cannot understand it.
Let me explain my intention: I have an open data server running on CKAN, and my goal is to be able to run SPARQL queries on the data. I know I could not do it directly on the datasets themselves; I would have to define my own OWL ontology and convert the data I want to use from CSV format (which is the format it is currently in) into RDF triples (to be used as linked data).
The idea was to first test with the metadata of the repositories, which can be generated automatically with the ckanext-dcat extension, but the thing is that I really don't know where to start. I've searched for information on how to install a Virtuoso server for SPARQL, but the information I've found leaves a lot to be desired, not to mention that I can find nothing explaining how I could actually load my own OWL ontologies and RDF data into Virtuoso itself.
Can someone lend me a hand getting started? Thank you.
I'm a little confused. Maybe this is two or more questions?
1. How to convert tabular data, like CSV, into the RDF semantic format?
This can be done with an R2RML approach. Karma is a great GUI for that purpose. Like you say, a conversion like that can really be improved with an underlying OWL ontology. But it can be done without creating a custom ontology, too.
I have elaborated on this in the answer to another question.
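If you would rather script a simple conversion than use a mapping tool, a minimal sketch with Jena (Java) could look like the following. The CSV layout, the vocabulary namespace, and the naive comma splitting are all assumptions for illustration; a real conversion should use a proper CSV parser (or R2RML as above).

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.Resource;

    public class CsvToRdf {
        public static void main(String[] args) throws Exception {
            String ns = "http://example.org/vocab#";      // made-up vocabulary namespace
            String base = "http://example.org/resource/"; // made-up resource namespace

            Model model = ModelFactory.createDefaultModel();
            model.setNsPrefix("ex", ns);
            Property name = model.createProperty(ns, "name");
            Property city = model.createProperty(ns, "city");

            // Assumes a simple file "places.csv" with a header row: id,name,city
            // (naive splitting; quoted fields need a real CSV parser).
            List<String> lines = Files.readAllLines(Paths.get("places.csv"), StandardCharsets.UTF_8);
            for (String line : lines.subList(1, lines.size())) {
                String[] cols = line.split(",");
                Resource r = model.createResource(base + cols[0].trim());
                r.addProperty(name, cols[1].trim());
                r.addProperty(city, cols[2].trim());
            }

            // Write Turtle that can later be loaded into Virtuoso (or any other store).
            try (OutputStream out = new FileOutputStream("places.ttl")) {
                model.write(out, "TURTLE");
            }
        }
    }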
2. Now that I have some RDF formatted data, how can I expose it with a SPARQL endpoint?
Virtuoso is a reasonable choice. There are multiple ways to deploy it and multiple ways to load the data, and therefore LOTS of tutorials on the subject. Here's a good one, from DBpedia.
If you'd like a simpler path to starting an RDF triplestore with a SPARQL endpoint, Stardog and Blazegraph are available as JARs, and RDF4J can easily be deployed within a container like Tomcat.
All provide web-based graphical interfaces for loading data and running queries, in addition to SPARQL REST endpoints. At least Stardog also provides command-line tools for bulk loading.
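Whichever store you pick, once its SPARQL endpoint is up you talk to it in the same way. A minimal smoke test with Jena, assuming placeholder endpoint URLs and a store that allows updates over SPARQL (many require authentication or a separate update path for that):

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.query.ResultSetFormatter;
    import org.apache.jena.update.UpdateExecutionFactory;
    import org.apache.jena.update.UpdateFactory;
    import org.apache.jena.update.UpdateRequest;

    public class EndpointSmokeTest {
        public static void main(String[] args) {
            // Placeholder URLs; adjust for Virtuoso, Stardog, Blazegraph, or RDF4J.
            String queryEndpoint  = "http://localhost:8890/sparql";
            String updateEndpoint = "http://localhost:8890/sparql"; // often a separate /update path

            // Load a couple of triples over SPARQL Update.
            UpdateRequest insert = UpdateFactory.create(
                "PREFIX ex: <http://example.org/> " +
                "INSERT DATA { ex:alice ex:knows ex:bob }");
            UpdateExecutionFactory.createRemote(insert, updateEndpoint).execute();

            // Query them back.
            String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
            try (QueryExecution qe = QueryExecutionFactory.sparqlService(queryEndpoint, query)) {
                ResultSet results = qe.execSelect();
                ResultSetFormatter.out(System.out, results);
            }
        }
    }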
I'm new to the Semantic Web.
I'm trying to build a sample application where I can query data from different data sources in one query.
I have created a small RDF file which contains references to DBpedia resources for defining localities. My question is: how can I get the data contained in my file together with other information that is in the description of the remote resource (for example: the name of the person from the local file, and the total population of a city, dbpedia-owl:populationTotal, from the remote RDF file)?
I don't really understand the SPARQL query language. I tried to use the Jena ARQ API with the SERVICE keyword, but it doesn't solve the problem.
Any help please?
I guess you are looking for something like the Semantic Web Client Library, which tries to leverage the GGG (Giant Global Graph). Admittedly, the standard exploration algorithm of this framework is to follow rdfs:seeAlso links. Nevertheless, the general approach seems to be what you are looking for: you would create a local graph that consists of your seed graph, traverse the relations up to a certain depth (e.g., three steps), resolve the URIs, and load that content into your local triple store. Utilising advanced technologies like SPARQL federation might be something for later ;)
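A minimal version of that approach, sketched with Jena (which you mention you already use): load the local file, dereference the DBpedia resource URIs it points to, merge the returned descriptions into the same model, and then run one ordinary SELECT over the merged data. The local file name and vocabulary (ex:name, ex:livesIn) are assumptions about your data:

    import java.util.HashSet;
    import java.util.Set;

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.query.ResultSetFormatter;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.NodeIterator;
    import org.apache.jena.rdf.model.RDFNode;

    public class LocalPlusDbpedia {
        public static void main(String[] args) {
            // 1. Load the local file.
            Model model = ModelFactory.createDefaultModel();
            model.read("local.ttl");

            // 2. Collect the DBpedia URIs used as objects, then dereference each
            //    one and merge the returned description into the same model
            //    (DBpedia serves RDF for its resource URIs via content negotiation).
            Set<String> dbpediaUris = new HashSet<>();
            NodeIterator objects = model.listObjects();
            while (objects.hasNext()) {
                RDFNode node = objects.next();
                if (node.isURIResource()
                        && node.asResource().getURI().startsWith("http://dbpedia.org/resource/")) {
                    dbpediaUris.add(node.asResource().getURI());
                }
            }
            for (String uri : dbpediaUris) {
                model.read(uri);
            }

            // 3. One SELECT over the merged data: local name plus remote population.
            String query =
                "PREFIX ex:  <http://example.org/vocab#> " +      // assumed local vocabulary
                "PREFIX dbo: <http://dbpedia.org/ontology/> " +
                "SELECT ?name ?population WHERE { " +
                "  ?person ex:name ?name ; ex:livesIn ?city . " +
                "  ?city dbo:populationTotal ?population . " +
                "}";
            try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
                ResultSet results = qe.execSelect();
                ResultSetFormatter.out(System.out, results);
            }
        }
    }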
I have retrieved data from two different sources using a SPARQL query with named graphs.
I used Jena ARQ to execute the SPARQL query.
Recently, I found out that I can also query the RDF using Jena's Model API after loading the model from a file, and it gives me the same output as applying a SPARQL query. So I want to know: is it a good approach to do that without SPARQL? (I have only tested it with a small RDF file.) I also want to know whether, if I use Virtuoso, I can manipulate data using the Model API without SPARQL.
Thanks in advance.
I'm not quite sure if I understand your question. If I can paraphrase, I think you're asking:
Is it OK to query and manipulate RDF data using the Jena Model API instead of using
SPARQL? Does it make a difference if the back-end store is Virtuoso?
Assuming that's the right re-phrasing of the question, then the first part is definitively yes: you can manipulate RDF data through the Model and OntModel APIs. In fact, I would say that's what the majority of Jena users do, particularly for small queries or updates. I find personally that going direct to the API is more succinct up to a certain point of complexity; after that, my code is clearer and more concise if I express the query in SPARQL. Obviously circumstances will have an effect: if you're working with a mixture of local stores and remote SPARQL endpoints (for which sending a query string is your only option) then you may find the consistency of always using SPARQL makes your code clearer.
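To illustrate that trade-off with a toy example (made-up vocabulary, in-memory model), here is the same lookup written once against the Model API and once as SPARQL:

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.rdf.model.StmtIterator;
    import org.apache.jena.vocabulary.RDF;

    public class ApiVsSparql {
        public static void main(String[] args) {
            String ns = "http://example.org/vocab#"; // made-up vocabulary

            Model model = ModelFactory.createDefaultModel();
            Property name = model.createProperty(ns, "name");
            Resource personClass = model.createResource(ns + "Person");

            model.createResource("http://example.org/alice")
                 .addProperty(RDF.type, personClass)
                 .addProperty(name, "Alice");

            // 1. Model API: list the names of all Person instances.
            StmtIterator it = model.listStatements(null, RDF.type, personClass);
            while (it.hasNext()) {
                Resource s = it.next().getSubject();
                System.out.println("API:    " + s.getProperty(name).getString());
            }

            // 2. The same question expressed in SPARQL.
            String q = "PREFIX ex: <" + ns + "> " +
                       "SELECT ?n WHERE { ?p a ex:Person ; ex:name ?n }";
            try (QueryExecution qe = QueryExecutionFactory.create(q, model)) {
                ResultSet rs = qe.execSelect();
                while (rs.hasNext()) {
                    System.out.println("SPARQL: " + rs.next().getLiteral("n").getString());
                }
            }
        }
    }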
Regarding Virtuoso, I don't have any direct experience to offer. As far as I know, the Virtuoso Jena Provider fully implements the features of the Model API using a Virtuoso store as the storage layer. Whether the direct API or SPARQL queries give you a performance advantage is something you should measure by benchmarking with your data and your typical query patterns.