Virtuoso SPARQL not retrieving values - sparql

I am uploading an RDF file to a Virtuoso repository via the graphical interface (ODS-Briefcase). The file is uploaded successfully. However, every time I run a SPARQL query, an empty result is returned.
I have tried with many other files, and I did not have this problem.
This file is bigger than the previous ones (14 MB), so I guess this could be the cause, but I am not sure about it.
Any help in this matter will be appreciated :)
UPDATE
I have tried uploading a smaller file (2KB) and the SPARQL returns results as expected.
However, I uploaded the file (14 MB) again, and it seems it is not correctly uploaded.
When I try to read it from the ODS-Briefcase of Virtuoso, this happens:

To solve problems like this you have to fundamentally understand the task you are performing and how it is interpreted by Virtuoso.
Task at hand:
Loading an RDF document into Virtuoso's WebDAV repository (for which ODS-Briefcase provides a front-end), in a manner that results in the content of said RDF document being loaded into the Quad Store (where RDF data is indexed and made available to SPARQL Queries etc.).
How you achieve your goal:
Use the ODS-Briefcase UI to create a DET folder (a folder that provides an automatic conduit between WebDAV storage and the Virtuoso Quad Store) of type Linked Data Import. Among the attributes (characteristics) of this kind of folder are a Named Graph IRI and a Named Graph IRI Base:
With your Linked Data Import DET folder in place, you simply upload RDF documents to the newly created folder.
To verify the existence of RDF statements imported from an RDF document placed in this folder, simply execute one of the following:
SELECT (COUNT(*) AS ?count)
FROM <target-named-graph-iri>
WHERE {?s ?p ?o}
OR
SELECT DISTINCT *
FROM <target-named-graph-iri>
WHERE {?s ?p ?o}
You can also leverage Virtuoso's in-built RDF data import middleware (a/k/a Sponger) within a SPARQL query using the pattern:
DEFINE get:soft "replace"
SELECT DISTINCT *
FROM <rdf-document-uri>
WHERE {?s ?p ?o}
I hope this brings clarity to the options available for importing RDF document content into the Virtuoso Quad Store (the engine that manages data represented as RDF property/predicate graphs).

It sounds like you have loaded your files into the Virtuoso WebDAV (file) repository, but you may not have loaded the RDF therein into the Virtuoso (RDF) Quad Store.
See this guide to the bulk loader, and this page of RDF loading methods.
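For a quick test, the SPARQL 1.1 Update LOAD operation (which Virtuoso supports) fetches a document over HTTP and inserts its triples into a named graph. A minimal sketch, using placeholder IRIs for both the document and the target graph:
# Fetch the document and insert its triples into the target graph.
# Both IRIs below are placeholders - substitute your own document URL and graph name.
LOAD <http://example.org/data/mydata.rdf> INTO GRAPH <http://example.org/graph/mydata>
You can then count the triples in that graph to confirm the data is present.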
(ObDisclaimer: I work for OpenLink Software, producer of Virtuoso.)

Related

How to make a selection of a giant ontology, built from several aligned reference ontologies?

My organization has an information requirement spanning several information domains. In order to capture this, we are building a large organization ontology in which we align several domain-specific reference ontologies/vocabularies (think of Dublin Core, GeoSPARQL, industry-specific information models, etc.) and, where necessary, we add concepts in an 'extension' ontology (which is then also aligned with the reference ontologies).
The totality of this aligned ontology (>3000 classes and >10000 ObjectProperties) contains both unused concepts and semantic doubles, and is impossible for a newcomer to navigate. Furthermore, the organization wishes to standardize the use of specific concepts, so doubles are extremely undesirable. We are therefore looking for a way to construct the SuperAwesomeOntology that contains all concepts (and their OWL-related predicates like subClassOf, domain/range, etc.) that have been labeled (maybe by something like dcterms:isRequiredBy "SuperAwesomeOntology"). The result should be a correct OWL ontology that can be stored in a single file.
One constraint: it has to be done programmatically (the copy/move/delete axioms interface of Protégé won't do), because if one of the reference ontologies gets an update, we want to be able to regenerate the SuperAwesomeOntology from its most up-to-date reference ontologies and find out if there are any conflicts.
How would we go about this? Could SPARQL do this, and how? Alternative suggestions to the isRequiredBy labeling are also welcome.
If I understand you correctly, you want to programmatically remove unused concepts from a large ontology or a collection of ontologies/graphs and you also want to remove concepts/classes that you identified as duplicates via interlinking.
Identified duplicates are easy to remove:
Define what a duplicate is for you. For example, nodes at either end of an owl:sameAs or skos:closeMatch link that are outside of your core graph (so you don't remove the "original").
Construct the new graph using a SPARQL query:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
construct {?s ?p ?o.}
where
{
  ?s ?p ?o.
  filter not exists {graph ?g {?s owl:sameAs ?x.} filter(?g!=<http://my.core.graph>)}
  filter not exists {graph ?g {?o owl:sameAs ?x.} filter(?g!=<http://my.core.graph>)}
}
I tested this query for syntax and performance but not for correctness.
Unused concepts are more difficult to remove:
First, again, you need to define what "unused" means for you. This criterion will however certainly involve reachability, or "connectedness", in the combined graph, where you want to select only the graph component that contains your core ontology. The problem is that, if you treat the triples as undirected edges, you will probably get a connected graph (that is, only a single component and no nodes to remove), because the type hierarchy often connects everything. You could take the direction of edges into account, that is, include only resources Y where there is a directed path from some resource X in your core ontology to Y.
This would ensure that you can go up the subclass hierarchy of the target ontology until e.g. owl:Thing, but not down again. The problem is that you don't know what other types of edges are in the target ontology and in which direction they go, but you could use only rdfs:subClassOf edges for now.
If you have sufficiently defined your "unused concept", or want to try it with some experimental definition, you can either use a graph library or a graph analysis application and import your data there.
Here is an example of how to import data from a SPARQL endpoint into the Cytoscape.js JavaScript graph visualization library; it can be used in Node as well. You would need to adapt the code heavily, however.
Or you do it again in SPARQL using SPARQL 1.1 property paths.
The problem is that those can have a large performance impact (or even a complexity that is way too large to ever complete), especially when applied to a large number of resources with an unrestricted path length. So it is possible that a query like the following times out, but feel free to try and adapt it:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
construct {?s ?p ?o.}
where
{
  {?s ?p ?o.}
  graph <http://my.core.graph> {?x rdfs:subClassOf ?X.}
  {?x (<>|!<>)* ?s.}
}
The ?x rdfs:subClassOf ?X statement is just an identifier for which resources of your core ontology you want to use as source points; I couldn't get a valid query without it. When I apply a graph statement to the path expression, I get a syntax error from Virtuoso.
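Given those caveats, a variant restricted to rdfs:subClassOf edges (as suggested above) may be cheaper to evaluate. This is only an untested sketch, reusing the same placeholder core-graph IRI:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
construct {?s ?p ?o.}
where
{
  {?s ?p ?o.}
  # ?x is some class mentioned in the (placeholder) core graph
  graph <http://my.core.graph> {?x rdfs:subClassOf ?y.}
  # keep only subjects reachable from ?x along directed rdfs:subClassOf edges,
  # i.e. walking up the class hierarchy but not down again
  {?x rdfs:subClassOf* ?s.}
}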

How to join the T-Box with the A-Box of this ontology

I found this online Book ontology http://www.ebusiness-unibw.org/ontologies/opdm/book.html#
OWL Schema can be downloaded here:
http://www.ebusiness-unibw.org/ontologies/opdm/book.owl
RDF dump can be found here
http://eelst.cs.unibo.it/apps/LODE/source?url=http://www.ebusiness-unibw.org/ontologies/opdm/book_example.owl
I tried to copy-paste the RDF dump into the OWL schema file, then run Protégé and open the resulting OWL file, but I didn't see the instances (I mean, I didn't see the RDF dump data).
Is there any way to join them, please?
I would like to open the resulting OWL file in Protégé, see the values for the instances, edit them, add more instances, and so on.
RDF graphs are just sets of triples. One very simple approach to combining the RDF would be to convert both RDF files to N-Triples format (which has just one triple per line) and concatenate them into a single file. Then you'd have combined them.
But in order for the instance data to be interpreted correctly, it would need to have the appropriate declarations (e.g., with resources declared as owl:NamedIndividuals). The fact that the instance data uses some IRIs that are declared in the ontology doesn't mean that the instance data is an OWL ontology.
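If the combined data ends up in a store that supports SPARQL 1.1 Update, one possible (untested) way to add the missing declarations is a query along these lines:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
# Declare every resource that is typed with an OWL class as an owl:NamedIndividual,
# so that OWL tools such as Protégé will list it as an individual.
INSERT { ?i a owl:NamedIndividual }
WHERE  { ?i a ?class . ?class a owl:Class }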

Downloaded DBpedia data does not include all instance-types triples

I downloaded some data from http://downloads.dbpedia.org/2015-04/core/,
including: instance-type_en.nt, mappingbased-properties_en.nt, and some others.
I have successfully loaded them into an OpenLink Virtuoso DB, but when I run some sample SPARQL query, e.g., a query to see all the triples about the subject Xiamen_University, the problem appears.
select ?s ?p ?o
where
{
  ?s rdfs:label "Xiamen University"@en .
  ?s ?p ?o .
}
From the DBpedia SPARQL endpoint, there are heaps of triples about Xiamen_University, while in my DB there are only 4 or 5 of them.
In particular, there is no triple in my DB indicating that Xiamen_University is a type of University, nor any instance-type triples at all. I found similar cases for some other subjects as well.
I think the instance-types_en.nt file does not include all the instance-type triples from Wikipedia, and the same problem exists with mappingbased-properties. Is that right? If so, where could I find the right source file?
There's a whole list of the datasets available on the downloads page. I don't see a lot of documentation about exactly what's in each of them, but the names are fairly descriptive, and the question mark links next to each one show a preview of what kind of information it contains. Hovering over each title will provide a short description. E.g.:
It looks like to get most of the interesting properties, you'd probably want the mappingbased datasets, along with the labels dataset (since the query that you wrote identifies objects by label).
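As a quick sanity check on your local endpoint, you can also list which predicates were actually loaded for the resource; a sketch assuming the usual DBpedia resource IRI for Xiamen University:
# List every predicate attached to the resource and how often it occurs,
# which shows at a glance which of the loaded datasets contribute triples for it.
SELECT ?p (COUNT(*) AS ?count)
WHERE { <http://dbpedia.org/resource/Xiamen_University> ?p ?o }
GROUP BY ?p
ORDER BY DESC(?count)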

How do triple stores use linked data?

Let's say I have the following scenario:
I have some ontology files hosted somewhere on the web, on different domains, like _http://foo1.com/ontolgy1.owl#, _http://foo2.com/ontology2.owl#, etc.
I also have a triple store into which I want to insert instances based on the ontology files mentioned, like this:
INSERT DATA
{
  <http://foo1.com/instance1> a <http://foo1.com/ontolgy1.owl#class1>.
  <http://foo2.com/instance2> a <http://foo2.com/ontolgy2.owl#class2>.
  <http://foo2.com/instance2x> a <http://foo2.com/ontolgy2.owl#class2x>.
}
Let's say that _http://foo2.com/ontolgy2.owl#class2x is a subclass of _http://foo2.com/ontolgy2.owl#class2, defined within the same ontology.
And if, after the insert, I run a SPARQL query like this:
select ?a
where
{
  ?a rdf:type ?type.
  ?type rdfs:subClassOf* <http://foo2.com/ontolgy2.owl#class2> .
}
the result would be:
<http://foo2.com/instance2>
and not:
<http://foo2.com/instance2>
<http://foo2.com/instance2x>
as it should be. This is happening because the ontology file _http://foo2.com/ontolgy2.owl# is not imported into the triple store.
My question is:
Can we talk in this example about "linked" data? Because it seems to me that it is not linked at all. It has to be imported locally into a triple store, and only after that can you start querying.
Let's say you want to run a query on some complex data that is described by 20 ontology files; then all 20 ontology files would need to be imported.
Isn't this a bit disappointing?
Did I misunderstand triple stores and linked data and how they work together?
as it should be.
I'm not certain that should is the right term here. The semantics of a SPARQL query is to query the data in a particular graph stored at the endpoint. IRIs are more or less opaque identifiers; just because they might also be URLs from which additional data can be retrieved doesn't obligate any particular system to actually do that kind of retrieval. Doing that would easily make query behavior unpredictable: "this query worked yesterday, why doesn't it work today? oh, a remote website is no longer available…".
Let's say that _http://foo2.com/ontolgy2.owl#class2x is a subclass of _http://foo2.com/ontolgy2.owl#class2 defined within the same ontology.
Remember, since IRIs are opaque, anyone can define a term in any ontology. It's always possible for someone else to come along and say something else about a resource. You have no way of tracking all that information. For instance, if I go and write an ontology, I can declare http://foo2.com/ontolgy2.owl#class2x as a class and assert that it's equivalent to http://dbpedia.org/ontology/Person. Should the system have some way to know about what I did someplace else, and even if it did, should it be required to go and retrieve information from it? What if I made an ontology that's 2GB in size? Surely your endpoint can't be expected to go and retrieve that just to answer a quick query?
Can we talk in this example about "linked" data? Because it seems to
me that it is not linked at all. It has to be imported locally into a
triple store, and after that you can start querying.
Let's say you want to run a query on some complex data that is described
by 20 ontology files; in this case I have to import all 20 ontology
files.
This is usually the case, and the point about linked data is that you have a way to get more information if you choose to, and that you don't have to do as much work in negotiating how to identify resources in that data. However, you can use the service keyword in SPARQL to reference other endpoints, and that can provide a type of linking. For instance, knowing that DBpedia has a SPARQL endpoint, I can run a local query that incorporates DBpedia with something like this:
select ?person ?localValue ?publicName {
  ?person :hasLocalValueOfInterest ?localValue
  service <http://dbpedia.org/sparql> {
    ?person foaf:name ?publicName
  }
}
You can use multiple service blocks to aggregate data from multiple endpoints; you're not limited to just one. That seems pretty "linked" to me.
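For illustration, here is a sketch with two service blocks; the second endpoint URL and the properties are made-up placeholders, and prefix declarations are omitted as in the example above:
select ?person ?publicName ?otherValue {
  ?person :hasLocalValueOfInterest ?localValue .
  # first remote endpoint
  service <http://dbpedia.org/sparql> {
    ?person foaf:name ?publicName .
  }
  # second remote endpoint (placeholder URL, for illustration only)
  service <http://other.example.org/sparql> {
    ?person :someOtherProperty ?otherValue .
  }
}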

SPARQL 1.1 entailment regimes and query with FROM clause

I'm currently documenting/testing SPARQL 1.1 entailment regimes, and the recommendation repeatedly states that
The scoping graph is graph-equivalent to the active graph
but it does not specify what the active graph refers to: is it the dataset used in the query? A union of all graphs in the store?
As a test to determine this, I have the following graph, named <http://www.example.org/>, in a Sesame memory store with RDF Schema and direct type inferencing (v2.7.14):
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://www.example.org/> .
ex:book1 rdf:type ex:Publication .
ex:book2 rdf:type ex:Article .
ex:Article rdfs:subClassOf ex:Publication .
ex:publishes rdfs:range ex:Publication .
ex:MITPress ex:publishes ex:book3 .
I've been trying the following query (which uses the default graph and thus the inference engine):
SELECT ?s WHERE { ?s a ex:Publication . }
As expected, it returns all three instances:
<http://www.example.org/book1>
<http://www.example.org/book2>
<http://www.example.org/book3>
while the query:
SELECT ?s FROM ex: WHERE { ?s a ex:Publication . }
returns only
<http://www.example.org/book1>
Under the said circumstances, shouldn't both results be the same?
What should happen (according to the recommendation) if data and schema are split between two graphs in the store (like <urn:rdfs-schema> and <urn:data>, or even scattered across more graphs) and the query uses both graphs (or a subset of the schema-related graphs) in the FROM clause instead of the default graph?
Meaning: should the inferencing be global throughout the store, or does it depend on the query dataset?
Or maybe the recommendation is loose enough to make this an implementation-dependent issue?
Thanks for your insights,
Max.
EDIT: this question is being redirected to SPARQL 1.1 entailment regimes and query with FROM clause (follow-up).
Your second query returns only book1 because, in Sesame's RDFS inferencer, entailed statements are inserted into the default graph, not into the named graph(s) from which the premises for the entailment come. So the entailed results are simply not present in the graph you are querying.
The reason for this design choice is at least partly historic, as the Sesame RDFS inference engine predates the W3C notion of entailment regimes. The rationale at the time was that, in the case of inferencing over several named graphs (where e.g. one premise comes from graph A and another from B), insertion into the default graph (rather than into either A, or B, or both) was simplest and caused the least amount of confusion.
Sesame currently does not explicitly support the W3C entailment regimes specification. However, if you feel that a simple improvement would be possible to make it more compatible, by all means log a feature request.
(disclosure: Sesame developer)
What exactly is in the default graph isn't specified by the SPARQL 1.1 standard. In particular, see 13.1 Examples of RDF Datasets, which mentions that:
The definition of RDF Dataset does not restrict the relationships of
named and default graphs. Information can be repeated in different
graphs; relationships between graphs can be exposed. Two useful
arrangements are:
to have information in the default graph that includes provenance information about the named graphs
to include the information in the named graphs in the default graph as well.
However, you can use a FROM clause to specify which graph should be the default graph, or multiple FROM clauses to specify which graphs should be merged to form the default graph.
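For example, using the two graph names from the question, a query whose default graph is the merge of both might look like this (a sketch only; the class IRI is the one from the data above):
# The default graph for this query is the merge of the two named graphs.
SELECT ?s
FROM <urn:rdfs-schema>
FROM <urn:data>
WHERE { ?s a <http://www.example.org/Publication> }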
That's all concerning the default graph. The active graph is another term you'll see in the SPARQL 1.1 spec:
The graph that is used for matching a basic graph pattern is the
active graph. In the previous sections, all queries have been shown
executed against a single graph, the default graph of an RDF dataset
as the active graph. The GRAPH keyword is used to make the active
graph one of all of the named graphs in the dataset for part of the
query.
So, you can use from (possibly multiple times) to control the default graph, and thereby the initial active graph, and then graph { … } within a query to change the active graph.
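A small sketch showing both mechanisms together, reusing the same graph names (again just illustrative):
# FROM NAMED makes the graphs available to GRAPH; the GRAPH clause then
# switches the active graph for the enclosed pattern.
SELECT ?g ?s
FROM NAMED <urn:rdfs-schema>
FROM NAMED <urn:data>
WHERE { GRAPH ?g { ?s a <http://www.example.org/Publication> } }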