Literals as subject in SPARQL triples

As far as I can tell from the SPARQL grammar (https://www.w3.org/TR/sparql11-query/#sparqlGrammar), literals are allowed as the subject of a triple:
[75] TriplesSameSubject ::= VarOrTerm PropertyListNotEmpty | TriplesNode PropertyList
[106] VarOrTerm ::= Var | GraphTerm
[109] GraphTerm ::= iri | RDFLiteral | NumericLiteral | BooleanLiteral | BlankNode | NIL
So it would be possible to have a triple such as:
(3, rdfs:label, 'three')
I can handle such triples in Python's rdflib, but when I try to make a federated SPARQL query with SERVICE in Virtuoso version 06.01.3127, Virtuoso complains. Here is an error message from my execution at a local install at http://localhost:8890/sparql
Virtuoso 37000 Error SP031: SPARQL compiler: No one quad map pattern is suitable for GRAPH _:_::default_8_4 { 3 <http://www.w3.org/2000/01/rdf-schema#label> ?s } triple at line 8
SPARQL query:
define sql:big-data-const 0
#output-format:text/html
define sql:signal-void-variables 1 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?s WHERE {
SERVICE <http://127.0.0.1:5000/sparql> {
SELECT ?s WHERE {
3 rdfs:label ?s .
}
}
}
Is this a Virtuoso issue, a more general SPARQL/RDF issue, or a programming error on my part?

Very few (I believe the current count is zero) enterprise-grade triple- or quad-stores handle "Generalized RDF", which is RDF that permits literals in the Subject position. As yet, this is generally understood not to scale well, though it can be useful or interesting at small scale, such as in CWM, TimBL's Python-based Closed World Machine.
Virtuoso is an enterprise-grade DBMS, handling tabular SQL data, graphical RDF data, and more. At present, and for the planned future, Virtuoso will not handle Generalized RDF.
All that said, it should be noted that you're running a rather old Virtuoso, from circa July 2012. Whether you're running Commercial or Open Source, updating to a more recent build (at least 7.2.4.2 as of April 2016) is strongly recommended for a wide variety of performance and functionality reasons.
ObDisclaimer: OpenLink Software produces Virtuoso, and employs me.

SPARQL has always included literals as subjects in its grammar, even in SPARQL 1.0. Variables bound in one position can be used in another, so matching against literals in the subject position can happen via variables anyway.
SPARQL 1.0:
https://www.w3.org/TR/rdf-sparql-query/#sparqlTriplePatterns
SPARQL 1.1:
https://www.w3.org/TR/sparql11-query/#sparqlTriplePatterns
Of course, such patterns will not match RDF as stored.
The rules for CONSTRUCT say "reject such triples".
https://www.w3.org/TR/sparql11-query/#construct

Related

SPARQL Query with property path not working

I want to make a query that does the following: Select all triples (s,p,o) if there exists a path with the length of at least 2 edges from s to o with the property p. So all edges of the path have to be labelled with p.
I tried the following:
select ?s <http://dbpedia.org/ontology/isPartOf> ?o
WHERE {
?s <http://dbpedia.org/ontology/isPartOf>{2,} ?o.
?s <http://dbpedia.org/ontology/isPartOf> ?o
}
I executed it with the Jena API:
ParameterizedSparqlString parameterizedSparql = new ParameterizedSparqlString(model);
parameterizedSparql.setCommandText(sparql);
Query query = QueryFactory.create(parameterizedSparql.asQuery().toString(), Syntax.syntaxARQ);
QueryExecutionFactory.create(query, model).execSelect();
I used Syntax.syntaxARQ so that it should understand property paths.
It gives me the following error:
Exception in thread "main" org.apache.jena.query.QueryParseException: Encountered " "{" "{ "" at line 3, column 42.
Was expecting one of:
<IRIref> ...
<PNAME_NS> ...
<PNAME_LN> ...
<BLANK_NODE_LABEL> ...
<VAR1> ...
<VAR2> ...
Can you please show me how I can make the query correctly?
Also, as @AKSW noted, the {2,} syntax from the SPARQL 1.1 Working Draft didn't make it into the final SPARQL 1.1 spec, so you can't rely on it being supported by every SPARQL processor.
You can use the {2,} syntax with Virtuoso, which is the engine powering the public DBpedia endpoint, but to do so through Jena, you have to either use "extended syntax" (Syntax.syntaxARQ) or bypass the ARQ parser.
It appears that your immediate issue comes down to a bug in Jena, where ParameterizedSparqlString.asQuery() does not currently support "extended syntax" (Syntax.syntaxARQ) queries; parameterizedSparql.toString() should be sufficient, as commented by @AndyS.
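If you would rather avoid the non-standard {2,} form altogether, "at least n steps" can be spelled with standard SPARQL 1.1 operators: n-1 fixed steps joined by "/" followed by one "+" step, so p{2,} becomes p/p+. A small Python sketch of that rewrite follows; the helper name expand_min_path is made up for illustration. Since "/" keeps duplicates while "+" does not, adding DISTINCT is probably a good idea.

```python
def expand_min_path(predicate: str, minimum: int) -> str:
    """Spell 'predicate{minimum,}' using only standard SPARQL 1.1 path operators."""
    if minimum < 0:
        raise ValueError("minimum must be non-negative")
    if minimum == 0:
        return f"{predicate}*"
    # (minimum - 1) fixed steps, then one-or-more steps.
    fixed_steps = [predicate] * (minimum - 1)
    return "/".join(fixed_steps + [f"{predicate}+"])


IS_PART_OF = "<http://dbpedia.org/ontology/isPartOf>"
path = expand_min_path(IS_PART_OF, 2)

# Mirrors the question: a direct edge plus a path of at least two edges.
query = (
    "SELECT DISTINCT ?s ?o WHERE { "
    f"?s {path} ?o . "
    f"?s {IS_PART_OF} ?o "
    "}"
)
print(query)
```

The resulting string should be acceptable to any SPARQL 1.1 parser, not only to ARQ's extended syntax or Virtuoso.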

SPARQL error attempt to attach a filter with used variable

I'm trying to extract information about 2 genes at the same time using this query:
BASE <http://www.southgreen.fr/agrold/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX vocab:<vocabulary/>
SELECT DISTINCT ?gene ?gene_lbl ?pathway
WHERE{
VALUES ?gene {<http://identifiers.org/ensembl.plant/BGIOSGA000040>
<http://identifiers.org/ensembl.plant/Sb01g003700.1>}
{
GRAPH ?graph1{
OPTIONAL{?gene rdfs:label ?gene_lbl.}
}
}
UNION
{
GRAPH ?graph2{
OPTIONAL{?gene vocab:is_agent_in ?pathway.}
}
}
}
but it gives me the following error :
Virtuoso 37000 Error SP031: SPARQL compiler: Internal error: sparp_gp_attach_filter_cbk(): attempt to attach a filter with used variable
It works without problems when I run it with only one gene OR when I remove the OPTIONAL keyword. Can someone explain the reason behind this behaviour?
Endpoint Virtuoso version: 07.10.3211
EDIT :
Part of the complexity is due to the fact that this is just a sample of a bigger query.
@TallTed, thank you for your answer. When I apply your proposed method to extract more information, I don't get the desired results. For instance, in this example gene OB12G15100 encodes a protein, but it doesn't show up in the results unless I comment out the OPTIONAL for gene_lbl. As far as I know, since gene_lbl is optional it can be ignored, so the results of the rest of the query should still be shown; but that doesn't happen and I don't know why.
Please forgive my lack of knowledge.
I believe your target endpoint, now running a 3 year old version, should be encouraged to upgrade to the current Virtuoso Open Source Edition, 07.20.3229 a/k/a 7.2.5.1. Note that both your original query and my revision execute without error on the LOD Cloud Cache (which lacks some of the data in AgroLD, so these don't deliver the results you want).
That said, I think your original query is unnecessarily complex. Note that this revision gets results from AgroLD with no problem --
BASE <http://www.southgreen.fr/agrold/>
PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:<http://www.w3.org/2000/01/rdf-schema#>
PREFIX vocab:<vocabulary/>
SELECT DISTINCT ?gene ?gene_lbl ?pathway
WHERE
{
VALUES ?gene { <http://identifiers.org/ensembl.plant/BGIOSGA000040>
<http://identifiers.org/ensembl.plant/Sb01g003700.1> }
OPTIONAL{ ?gene rdfs:label ?gene_lbl }
OPTIONAL{ ?gene vocab:is_agent_in ?pathway }
}

Use SPARQL property path on DBpedia

I'd like to find out if property paths exist between two entities on DBpedia. This is a sample query that I tried on snorql:
SELECT * WHERE {
:Braveheart (:|!:)* :Mel_Gibson
}
LIMIT 100
The query runs into a memory error:
Virtuoso 42000 Error TN...: Exceeded 1000000000 bytes in transitive temp memory. use t_distinct, t_max or more T_MAX_memory options to limit the search or increase the pool SPARQL query: define sql:big-data-const 0 #output-format:application/sparql-results+json define input:default-graph-uri PREFIX owl: PREFIX xsd: PREFIX rdfs: PREFIX rdf: PREFIX foaf: PREFIX dc: PREFIX : PREFIX dbpedia2: PREFIX dbpedia: PREFIX skos: SELECT * WHERE { :Braveheart (:|!:)* :Mel_Gibson } LIMIT 100
I suspect someone's going to suggest setting up a local dbpedia mirror. If that's the case, I'd love some detailed steps on how to do so.
I think your query is a bit wrong for what you're trying to answer. Also, as there are no variables in it, select * can't project anything out (I'd consider it a bug to even compile this), so let me rephrase your query to
ASK { dbr:Braveheart (<>|!<>)+ dbr:Mel_Gibson }
Sadly that query errs with the same problem you described.
While I agree that complicated queries should be executed against local endpoints, the above query isn't complicated at all, especially considering that there are several direct edges between the two nodes:
SELECT * { dbr:Braveheart ?p dbr:Mel_Gibson }
I consider this a bug in Virtuoso's query planner and reported it: https://github.com/openlink/virtuoso-opensource/issues/641
Having said all that, I'd like to point out that in real cases you're probably interested in paths that don't only point forward. The direction of edges greatly depends on modelling. So consider using queries like these instead:
ASK { dbr:Braveheart ((<>|!<>)|^(<>|!<>))+ dbr:Mel_Gibson }
The expression says: follow any edge in its direction or against it (^) for at least one step. (Yes, I also wonder why property paths didn't get a short syntax for arbitrary edges ;) )
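For intuition, this is roughly what such a direction-agnostic "+" path asks the engine to do, sketched as a breadth-first search in Python over an in-memory list of triples. The data and the ex: names are invented for illustration; this is not how Virtuoso evaluates the query internally.

```python
from collections import defaultdict, deque


def connected_ignoring_direction(triples, start, goal):
    """True if goal is reachable from start in one or more steps,
    following edges forwards or backwards, whatever the predicate."""
    neighbours = defaultdict(set)
    for s, _p, o in triples:
        neighbours[s].add(o)  # forward step
        neighbours[o].add(s)  # inverse step, like ^ in a property path

    # "+" requires at least one step, so seed the search with the
    # direct neighbours of start rather than with start itself.
    seen = set(neighbours[start])
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Made-up toy data: film and award are only linked through incoming
# edges on ex:person, so a forward-only path would not connect them.
toy = [
    ("ex:film", "ex:directedBy", "ex:person"),
    ("ex:award", "ex:wonBy", "ex:person"),
    ("ex:other", "ex:relatedTo", "ex:unrelated"),
]

print(connected_ignoring_direction(toy, "ex:film", "ex:award"))
```

On a graph the size of DBpedia the visited set grows very quickly, which probably has a lot to do with the transitive memory limit mentioned in the error.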
Spinning off @JörnHees's answer, a couple of points:
<> is an invalid predicate identifier. For Virtuoso, <> identifies a document (Location of Content that returns 200 OK on HTTP GET) which is why <#> or <#this> work. This isn't a parsing issue since it has more to do with the semantics of an identifier.
The public DBpedia endpoint isn't configured to accept that kind of query, hence the error.
Using <#this> rather than <>, we have --
prefix dbpedia: <http://dbpedia.org/resource/>
ASK { dbpedia:Braveheart (<#this>|!<#this>)+ dbpedia:Mel_Gibson }
Two alternative instances, both hosted by OpenLink Software (my employer, and producer of Virtuoso), that produce solutions for that query:
DBpedia-Live instance
LOD Cloud Cache instance

Complex SPARQL query - Virtuoso performance hints?

I have a rather complex SPARQL query, which is executed thousands of times in parallel threads (400 threads). The query is here somewhat simplified (namespaces, properties, and variables have been reduced) for readability, but the complexity is left untouched (unions, number of graphs, etc.). The query is run against 4 graphs, the biggest of which contains 5,561,181 triples.
PREFIX graphA: <GraphABaseURI:>
ASK
FROM NAMED <GraphBURI>
FROM NAMED <GraphCURI>
FROM NAMED <GraphABaseURI>
FROM NAMED <GraphDBaseURI>
WHERE{
{
GRAPH <GraphABaseURI>{
?variableA a graphA:ClassA .
?variableA graphA:propertyA ?variableB .
?variableB dcterms:title ?variableC .
?variableA graphA:propertyB ?variableD .
?variableL<GraphABaseURI:propertyB> ?variableD .
?variableD <propertyBURI> ?variableE
}
.
GRAPH <GraphBURI>{
?variableF <propertyCURI>/<propertyDURI> ?variableG .
?variableF <propertyEURI> ?variableH
}
.
GRAPH <GraphCURI>{
?variableI <http://www.w3.org/2004/02/skos/core#notation> ?variableJ .
?variableI <http://www.w3.org/2004/02/skos/core#prefLabel> ?variableK .
FILTER (isLiteral(?variableK) && REGEX(?variableK, "literalA", "i"))
}
.
FILTER (isLiteral(?variableJ) && ?variableG = ?variableJ) .
FILTER (?variableE = ?variableH)
}
UNION
{
GRAPH <GraphABaseURI>{
?variableA a graphA:ClassA .
?variableA graphA:propertyA ?variableB .
?variableB dcterms:title ?variableC .
?variableA graphA:propertyB ?variableD .
?variableL<propertyBURI> ?variableE .
?variableL <propertyFURI> ?variableD .
}
.
GRAPH <GraphDBaseURI>{
?variableM <propertyGURI> ?variableN .
?variableM <propertyHURI> ?variableO .
FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i"))
}
.
FILTER (?variableE = ?variableN) .
}
UNION
{
GRAPH <GraphABaseURI>{
?variableA a graphA:ClassA .
?variableA graphA:propertyA ?variableB .
?variableB dcterms:title ?variableC .
?variableA graphA:propertyB ?variableD .
?variableL<propertyBURI> ?variableE .
?variableL <propertyIURI> ?variableD .
}
.
GRAPH <GraphDBaseURI>{
?variableM <propertyGURI> ?variableN .
?variableM <propertyHURI> ?variableO .
FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i"))
}
.
FILTER (?variableE = ?variableN) .
}
. FILTER (isLiteral(?variableC) && REGEX(?variableC, "literalB", "i")) .
}
I would not expect someone to transform the above query (of course...). I am only posting the query to demonstrate the complexity and all the SPARQL structures used.
My questions:
Would I gain performance if I had all my triples in one graph? This way I would avoid unions and simplify my query, but would it also help performance?
Are there any indexes that I could build that would help with the above query? I am not very confident about data indexing; however, having read the RDF Index Scheme section of RDF Performance Tuning, I wonder whether Virtuoso 7's default indexing scheme is suitable for queries like the above. While the predicates are defined in the above query's SPARQL triple patterns, there are many triple patterns that do not have a defined subject or predicate. Could this be a major problem regarding performance?
Perhaps there is a SPARQL syntax structure that I am not aware of and could be of great help in the above query. Could you suggest something? For example, I have already improved performance by removing STR() casts and using the isLiteral() function. Could you suggest anything else?
Or am I perhaps overusing some complex SPARQL syntax structure?
Please note that I use Virtuoso Open source edition, built on Ubuntu, Version: 07.20.3214, Build: Oct 14 2015.
Regards,
Pantelis Natsiavas
First thing -- your Virtuoso build is long outdated; updating to 7.20.3217 as of April 2016 (or later) is strongly recommended.
Optimization suggestions are naturally limited when looking at a simplified query, but here are several thoughts, in no particular order...
Index Scheme Selection, the RDF Performance Tuning doc section following RDF Index Scheme, offers a couple of alternative and/or additional indexes which may make sense for your queries and data. As you say that some of your patterns will have defined graph and object, and undefined subject and predicate, some other indexes may also make sense (e.g., GOPS, GOSP), depending on some other factors.
Depending on how much your data has changed since original load, it may be worth rebuilding the free-text indexes, with this SQL command (which may be issued through any SQL interface -- iSQL, ODBC, JDBC, etc.) —
VT_INC_INDEX_DB_DBA_RDF_OBJ ()
Using the bif:contains predicate can result in substantially better performance than regex() filters, for instance replacing —
FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i")) .
— with —
?variableO bif:contains "'literalA'" .
FILTER ( isLiteral(?variableO) ) .
Explain() and profile() can be helpful in query optimization efforts. Much of this output is meant for analysis by Development, so it may not mean much to you, but providing it to other Virtuoso users can still yield helpful suggestions.
For a number of reasons, the rdf:type predicate (often expressed as a, thanks to SPARQL/Turtle syntactic sugar) can be a performance killer. Removing those predicates from your graph pattern is likely to boost performance substantially. If needed, there are other ways to limit the solution set (such as by testing for attributes only possessed by entities of your desired rdf:type) which do not have such negative performance impacts.
(ObDisclaimer: OpenLink Software produces Virtuoso, and employs me.)

Protege sees relationship, Virtuoso doesn't

Viewing the go-plus ontology in a freshly installed, stock Protégé 5, I found a useful inference in the entities tab for http://purl.obolibrary.org/obo/GO_0003215:
'cardiac right ventricle morphogenesis' 'results in morphogenesis of' some 'cardiac ventricle'
'results in morphogenesis of' in this case is http://purl.obolibrary.org/obo/RO_0002298 and 'cardiac ventricle' is http://purl.obolibrary.org/obo/UBERON_0002082
If I load the same ontology into Virtuoso Open Source 07.20.3217 and describe http://purl.obolibrary.org/obo/GO_0003215, no relationship with 'cardiac ventricle' is listed. (Even after enabling OWL inference.)
However, http://purl.obolibrary.org/obo/GO_0003215 is linked to an anonymous node with
rdf:type owl:Restriction
owl:onProperty n3:RO_0002298
owl:someValuesFrom n3:UBERON_0002080
Where n3 is http://purl.obolibrary.org/obo/
Is there a Virtuoso configuration that would make this relationship clear in a describe view?
Is there some concise SPARQL syntax that would make the relationship clear? Currently, I'm using
select distinct ?goid (str(?goterm) as ?go_str)
?svf (str(?anatomy ) as ?anat_str)
where
{
?goid obo:hasOBONamespace 'biological_process'^^xsd:string .
?goid rdfs:label ?goterm .
?goid rdfs:subClassOf+ ?parent .
?parent owl:someValuesFrom* ?svf .
?svf rdfs:subClassOf+
<http://purl.obolibrary.org/obo/UBERON_0001062> .
?svf rdfs:label ?anatomy
}
There are many things in play here.
You can use Property Paths for transitivity, as described in the comment by @ASKW.
If you want to leverage Virtuoso's built-in reasoning for relationship types described by RDF Schema (rdfs:subClassOf, rdfs:subPropertyOf) or OWL (owl:equivalentProperty, owl:equivalentClass, owl:SymmetricProperty, owl:inverseOf, etc.), then you can leverage the inference rules pragma as described in @MarkMiller's comments (note the reference to a blog post about that usage pattern).
If you want to write custom inference rules (i.e., use SPARQL as your Inference Rules language), then you will need Virtuoso 8.0 (coming soon) which delivers that capability. Note, this is the ultimate solution, as you can write your own algorithms using SPARQL.
How do you enable OWL reasoning in Virtuoso? You should know that it doesn't support OWL DL reasoning, but only some kind of rule-based reasoning which only covers a small part of OWL DL. Protege on the other hand supports OWL DL reasoning by means of reasoners like HermiT, Pellet, etc.
If by "enable OWL inference" you just mean using SPARQL 1.1 property paths on the rdfs:subClassOf relation, then this is far from OWL DL reasoning. It just means computing the transitive closure of that relation from a start node in the graph; there is nothing more to it.
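To illustrate how little rdfs:subClassOf+ actually does, here is a minimal Python sketch of that transitive closure over a toy hierarchy. The class names and the function name are placeholders, not taken from go-plus.

```python
def superclasses(sub_class_of, start):
    """All classes reachable from start via one or more subClassOf edges.

    sub_class_of maps a class to its direct superclasses.
    """
    found = set()
    stack = [start]
    while stack:
        cls = stack.pop()
        for parent in sub_class_of.get(cls, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Placeholder hierarchy; "_:restriction1" stands in for an anonymous
# owl:Restriction node that is a direct superclass of the start class.
hierarchy = {
    "ex:RightVentricleMorphogenesis": ["ex:VentricleMorphogenesis", "_:restriction1"],
    "ex:VentricleMorphogenesis": ["ex:Morphogenesis"],
}

print(sorted(superclasses(hierarchy, "ex:RightVentricleMorphogenesis")))
```

The closure simply returns the restriction node as one more superclass. Working out what the restriction means (its owl:onProperty and owl:someValuesFrom) and combining it with other axioms is the part that needs a reasoner such as those used in Protege.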