SPARQL: considering additional triples in a query

I need to run a SPARQL query over a semantic database, but some of the triples will not be in the database; they will be provided by webservices (and not as a SPARQL endpoint). I would like to run a SELECT query that takes those additional triples into consideration, but without having to insert them into the database. Is there a way to do that?

This is not part of the SPARQL spec, so "no" is the general answer.
That said, Virtuoso (possibly among others) lets you include an external RDF source (a/k/a webservice) as part of the FROM clause (among other methods), to be dereferenced during SPARQL query processing.
Such a webservice need not be a SPARQL endpoint, but you will get the best performance if it provides RDF (though the serialization may vary).
The Virtuoso Sponger can also be invoked on the fly to derive RDF from many document formats (with an obvious performance hit). To pursue this, please raise it on the OpenLink Community Forum.
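If you are not on Virtuoso, one portable workaround is to do the merge client-side. A minimal sketch with Apache Jena, where the local file name and webservice URL are placeholders and the service is assumed to return RDF in a parseable serialization:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;

public class UnionQuery {
    public static void main(String[] args) {
        // Triples already in your store (a local file stands in for it here)
        Model local = RDFDataMgr.loadModel("local-data.ttl");
        // Triples served by the webservice, fetched at query time
        Model remote = RDFDataMgr.loadModel("http://example.org/service/extra-triples.rdf");

        // A dynamic union: neither side is modified, nothing is inserted anywhere
        Model union = ModelFactory.createUnion(local, remote);

        String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";
        try (QueryExecution qe = QueryExecutionFactory.create(query, union)) {
            ResultSetFormatter.out(qe.execSelect());
        }
    }
}

If the database itself is only reachable as a SPARQL endpoint, the standard SPARQL 1.1 SERVICE keyword can federate the two sides instead of merging them locally.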

Related

Mount a SPARQL endpoint for use with custom ontologies and RDF triples

I've been trying to figure out how to mount a SPARQL endpoint for a couple of days, but no matter how much I read, I can't understand it.
Let me explain my intention: I have an open-data server mounted on CKAN, and my goal is to be able to run SPARQL queries on the data. I know I couldn't do it directly on the datasets themselves; I would have to define my own OWL ontology and convert the data I want to use from CSV format (the format it is currently in) to RDF triples (to be used as linked data).
The idea was to first test with the metadata of the repositories, which can be generated automatically with the ckanext-dcat extension, but I really can't find where to start. I've searched for information on how to install a Virtuoso server for the SPARQL endpoint, but what I've found leaves a lot to be desired, not to mention that I can find nothing explaining how to actually load my own OWL and RDF files into Virtuoso itself.
Could someone lend me a hand getting started? Thank you.
I'm a little confused. Maybe this is two or more questions?
1. How to convert tabular data, like CSV, into the RDF semantic format?
This can be done with an R2RML approach. Karma is a great GUI for that purpose. Like you say, a conversion like that can really be improved with an underlying OWL ontology. But it can be done without creating a custom ontology, too.
I have elaborated on this in the answer to another question.
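If you just want to see the shape of such a conversion before committing to an R2RML toolchain, here is a hand-rolled sketch in Jena (not R2RML itself, closer in spirit to the W3C direct mapping; the file name, namespace, and naive comma splitting are all assumptions):

import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

public class CsvToRdf {
    public static void main(String[] args) throws Exception {
        String ns = "http://example.org/dataset/";   // hypothetical namespace
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("ex", ns);

        try (BufferedReader in = new BufferedReader(new FileReader("datasets.csv"))) {
            String[] header = in.readLine().split(",");  // first row = column names
            String line;
            int row = 0;
            while ((line = in.readLine()) != null) {
                String[] cells = line.split(",");        // naive: no quoted commas
                Resource subject = model.createResource(ns + "row/" + row++);
                for (int i = 0; i < header.length && i < cells.length; i++) {
                    Property p = model.createProperty(ns, header[i].trim());
                    subject.addProperty(p, cells[i].trim());
                }
            }
        }
        RDFDataMgr.write(System.out, model, RDFFormat.TURTLE);
    }
}

A tool like Karma does essentially this declaratively, and lets you bind the columns to classes and properties from your own ontology instead of minting generic ones.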
2. Now that I have some RDF formatted data, how can I expose it with a SPARQL endpoint?
Virtuoso is a reasonable choice. There are multiple ways to deploy it and multiple ways to load the data, and therefore lots of tutorials on the subject. Here's one good one, from DBpedia.
If you'd like a simpler path to starting an RDF triplestore with a SPARQL endpoint, Stardog and Blazegraph are available as JARs, and RDF4J can easily be deployed within a container like Tomcat.
All provide web-based graphical interfaces for loading data and running queries, in addition to SPARQL REST endpoints. At least Stardog also provides command-line tools for bulk loading.
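Once whichever store you pick is up, you can smoke-test the endpoint from code as well as from the web UI. A minimal Jena sketch, assuming Virtuoso's default endpoint URL (adjust host, port, and path for Stardog, Blazegraph, or RDF4J):

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;

public class EndpointSmokeTest {
    public static void main(String[] args) {
        String endpoint = "http://localhost:8890/sparql";  // Virtuoso's default
        // List the named graphs the store knows about
        String query = "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } LIMIT 20";

        try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query)) {
            ResultSetFormatter.out(qe.execSelect());
        }
    }
}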

Efficiently querying abstract elements using WikiData Sparql

I'm trying to build a query that fetches instances of (or of any subclass of) an abstract element such as "human" (Q5) by name, but the query fails with a timeout, probably because it has too many nodes to traverse in the graph.
Are there any better methods for this kind of query? The best I could come up with is using the Wikidata API's search-entities endpoint with the element name, then filtering the desired results in a SPARQL query, so that the query's domain is minimized instead of covering the whole graph.
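In code, that two-step narrowing might look roughly like this (a sketch using Jena against the public endpoint; the hard-coded IDs stand in for whatever the search API returned):

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;

public class WikidataNarrowedQuery {
    public static void main(String[] args) {
        // Candidate IDs from a wbsearchentities API call (hard-coded here;
        // in practice you would parse the API's JSON response)
        String candidates = "wd:Q937 wd:Q1035";

        String query =
            "PREFIX wd: <http://www.wikidata.org/entity/> " +
            "PREFIX wdt: <http://www.wikidata.org/prop/direct/> " +
            "SELECT ?item WHERE { " +
            "  VALUES ?item { " + candidates + " } " +
            // Keep only candidates that are instances of (a subclass of) human (Q5);
            // the path is cheap because ?item is already bound
            "  ?item wdt:P31/wdt:P279* wd:Q5 . " +
            "}";

        try (QueryExecution qe = QueryExecutionFactory.sparqlService(
                "https://query.wikidata.org/sparql", query)) {
            ResultSetFormatter.out(qe.execSelect());
        }
    }
}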
I'm a little worried about using this method in a production environment, since the Wikidata SPARQL service is in beta. Are there any best practices for migrating knowledge-graph use cases from Freebase? Is there any update regarding the migration of data from Freebase to Wikidata?
Finally, are there any other mature alternatives to the deprecated Freebase service?
What endpoint are you querying against? Querying a shared public endpoint with no SLA (beta or not) from a production service is a very risky proposition.
Wikidata offers full database dumps that you can tailor/subset and load into whatever infrastructure you like. That would give you complete control over performance, quality, and any other metrics which are important to you.
As far as migrating from Freebase goes, there is no migration path. The track that train was on has come to an end (at least for external, non-Google users). It's not just deprecated; it was shut down completely a while ago. A tiny fraction of the data was imported into Wikidata (and the two already shared a lot in common due to their common ancestor, Wikipedia), but none of the programmatic features, such as MQL's JSON query-by-example, Freebase Search, Freebase Suggest, or Google-scale performance and availability, are available (yet?) for Wikidata.
If the data is important to you, you should self-host using whatever infrastructure meets your needs.

Understanding the difference between SPARQL and semantic reasoning using Pellet

I have a pizza ontology that defines different types of pizzas, ingredients and relations among them.
I just want to understand several basic things:
1. Is it correct that I should use SPARQL if I want to obtain information without reasoning, e.g. which pizzas contain onion?
2. What is the difference between SPARQL and reasoning algorithms like Pellet? Which queries cannot be answered by SPARQL but can be answered by Pellet? Some example queries (question-like) for the pizza ontology would be helpful.
3. As far as I understand, to use SPARQL from Java with Jena I should save my ontology in RDF/XML format. However, to use Pellet with Jena, which format do I need to select? Pellet uses OWL2...
SPARQL is a query language, that is, a language for formulating questions in. Reasoning, on the other hand, is the process of deriving new information from existing data. These are two different, complementary processes.
To retrieve information from your ontology you use SPARQL, yes. You can do this without reasoning, or in combination with a reasoner, too. If you have a reasoner active it means your queries can be simpler, and in some cases reasoners can derive information that is not really retrievable at all with just a query.
Reasoners like Pellet don't really answer queries; they just reason: they figure out what implicit information can be derived from the raw facts, and can do things like verify that your data is consistent (i.e. that it contains no logical contradictions). Pellet can figure out that if you own a Toyota, which is of type Car, you own a Vehicle (because a Car is a type of Vehicle). Or it can figure out that if you define a pizza to have the ingredient "Parmesan", you have a pizza of type "Cheesy" (because it knows Parmesan is a type of Cheese). So you use a reasoner like Pellet to derive this kind of implicit information, and then you use a query language like SPARQL to actually ask: "OK, give me an overview of all Cheesy pizzas that also have anchovies".
APIs like Jena are toolkits that treat RDF as an abstract model. Which syntax format you save your file in is immaterial: Jena can read almost any RDF syntax. As soon as you have read it into a Jena model, you can execute the Pellet reasoner on it; it doesn't matter which syntax the original file was in. Details on how to do this can be found in the Jena documentation.
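To make that concrete, here is a minimal sketch of the combination (assuming Pellet 2.x with its Jena 2 bindings; the ontology file name, namespace, and CheesyPizza class are placeholders for whatever your pizza ontology actually defines):

import com.hp.hpl.jena.ontology.OntModel;
import com.hp.hpl.jena.query.QueryExecution;
import com.hp.hpl.jena.query.QueryExecutionFactory;
import com.hp.hpl.jena.query.ResultSetFormatter;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import org.mindswap.pellet.jena.PelletReasonerFactory;

public class PizzaReasoningDemo {
    public static void main(String[] args) {
        // A Pellet-backed ontology model: Jena parses the file (any RDF syntax),
        // Pellet supplies the OWL reasoning over it
        OntModel model = ModelFactory.createOntologyModel(PelletReasonerFactory.THE_SPEC);
        model.read("file:pizza.owl");  // hypothetical file name

        // SPARQL over the inference model sees derived facts too, so this also
        // returns pizzas that are only implicitly of type CheesyPizza
        String query =
            "PREFIX pizza: <http://example.org/pizza#> " +  // hypothetical namespace
            "SELECT ?p WHERE { ?p a pizza:CheesyPizza }";

        QueryExecution qe = QueryExecutionFactory.create(query, model);
        try {
            ResultSetFormatter.out(qe.execSelect());
        } finally {
            qe.close();
        }
    }
}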

How to set up multiple databases in the Virtuoso triplestore?

Can I set up multiple triplestores in Virtuoso in the same way I create multiple databases in, for example, a conventional MySQL DBMS? Each database would be independent, with (possibly) its own SPARQL endpoint.
Yes you can, at least as far as I understood your question.
You can add additional datasets to the Virtuoso triplestore under a new graph, which you then use in the FROM clause of your queries to indicate the named graph your results should stem from:
create graph <http://myNewAndShinyGraph.org/some/path>;
Now you can add/upload your dataset into the triplestore under the new context you created (as usual, via SPARQL INSERT, TTLP, or ld_dir...).
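For example, from Jena you could insert into the new graph over SPARQL Update and read it back with FROM (a sketch assuming Virtuoso's default ports; the authenticated update endpoint also needs credentials, omitted here for brevity):

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.update.UpdateExecutionFactory;
import org.apache.jena.update.UpdateFactory;

public class NamedGraphDemo {
    public static void main(String[] args) {
        String updateEndpoint = "http://localhost:8890/sparql-auth";  // needs auth in real setups
        String queryEndpoint  = "http://localhost:8890/sparql";
        String graph = "http://myNewAndShinyGraph.org/some/path";

        // Put a triple into the named graph created above
        String insert = "INSERT DATA { GRAPH <" + graph + "> { " +
                        "<http://example.org/s> <http://example.org/p> \"o\" } }";
        UpdateExecutionFactory.createRemote(UpdateFactory.create(insert), updateEndpoint)
                              .execute();

        // Query only that graph via FROM
        String select = "SELECT * FROM <" + graph + "> WHERE { ?s ?p ?o }";
        try (QueryExecution qe = QueryExecutionFactory.sparqlService(queryEndpoint, select)) {
            ResultSetFormatter.out(qe.execSelect());
        }
    }
}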
You can also expose this graph with a different SPARQL endpoint.
Follow the steps described by Hugh Williams here: Defining endpoints in Virtuoso
Also of interest: How to create a SPARQL endpoint using Virtuoso?
Your question is extremely broad and difficult to answer both concisely and usefully. The short answer is "Yes," but that seems less than useful.
Virtuoso (produced by my employer, OpenLink Software) is a "conventional" SQL-style DBMS, akin to MySQL, PostgreSQL, Oracle, SQL Server, etc., though being a hybrid engine, it is also a NoSQL, graph/RDF, XML, object, and various-other-style DBMS. In the graph/RDF realm, it is actually a Quad Store, which allows it to be used as either a simple triplestore or as a collection of Named Graphs, where each graph might be considered a separate triplestore...
One Virtuoso DB file may contain multiple SQL-style CATALOGS, as well as multiple Named Graphs and other divisions of RDF/graph data, for which you can set up distinct SPARQL endpoints -- or you can set up distinct DB files (and Virtuoso instances), each with one database/data set. There may be other options appropriate to your needs ...
Virtuoso-specific questions are often better raised in Virtuoso-specific areas, such as the public Virtuoso Users mailing list, the public OpenLink Support Forums, a confidential OpenLink Support Case, etc.

Manipulating RDF in Jena

Currently I have found that I can query a model (Model) in Jena after loading it from an RDF file, and it gives me the same output as applying a SPARQL query. So I want to know: is it a good way to do that without SPARQL? (Though I have only tested it with a small RDF file.) I also want to know: if I use Virtuoso, can I manipulate data using the Model syntax, without SPARQL?
Thanks in advance.
I'm not quite sure if I understand your question. If I can paraphrase, I think you're asking:
Is it OK to query and manipulate RDF data using the Jena Model API instead of using
SPARQL? Does it make a difference if the back-end store is Virtuoso?
Assuming that's the right re-phrasing of the question, then the first part is definitively yes: you can manipulate RDF data through the Model and OntModel APIs. In fact, I would say that's what the majority of Jena users do, particularly for small queries or updates. I find personally that going direct to the API is more succinct up to a certain point of complexity; after that, my code is clearer and more concise if I express the query in SPARQL. Obviously circumstances will have an effect: if you're working with a mixture of local stores and remote SPARQL endpoints (for which sending a query string is your only option) then you may find the consistency of always using SPARQL makes your code clearer.
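As a side-by-side illustration, here is the same question asked both ways, over a tiny in-memory model with a made-up vocabulary:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.StmtIterator;

public class ApiVsSparql {
    public static void main(String[] args) {
        String ns = "http://example.org/ns#";  // made-up vocabulary
        Model model = ModelFactory.createDefaultModel();
        Property name = model.createProperty(ns, "name");
        model.createResource(ns + "alice").addProperty(name, "Alice");

        // 1. Model API: iterate over matching statements directly
        StmtIterator it = model.listStatements(null, name, (RDFNode) null);
        while (it.hasNext()) {
            System.out.println("API:    " + it.next().getObject());
        }

        // 2. SPARQL: the same question as a query string
        String q = "SELECT ?n WHERE { ?s <" + ns + "name> ?n }";
        try (QueryExecution qe = QueryExecutionFactory.create(q, model)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                System.out.println("SPARQL: " + rs.next().get("n"));
            }
        }
    }
}

Up to roughly this level of complexity the API version reads fine; once you need joins, optionals, or filters, the SPARQL string usually becomes the clearer of the two.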
Regarding Virtuoso, I don't have any direct experience to offer. As far as I know, the Virtuoso Jena Provider fully implements the features of the Model API using a Virtuoso store as the storage layer. Whether the direct API or using SPARQL queries gives you a performance advantage is something you should measure by benchmark with your data and your typical query patterns.