Let's say I make the following insertions into my GraphDB 8.3 triplestore:
PREFIX : <http://example.com/>
insert data { :hello a :word }
and
PREFIX : <http://example.com/>
insert data { graph :farewells { :goodbye a :word }}
Now, if I ask
select * where {
graph ?g {
?s ?p ?o .
}
}
I only get
+--------------------------------+------------------------------+---------------------------------------------------+---------------------------+
| ?g | ?s | ?p | ?o |
+--------------------------------+------------------------------+---------------------------------------------------+---------------------------+
| <http://example.com/farewells> | <http://example.com/goodbye> | <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> | <http://example.com/word> |
+--------------------------------+------------------------------+---------------------------------------------------+---------------------------+
I can obviously get both "triples about words" with the following, but then the named-graph membership is not shown
select * { ?s ?p ?o }
How can I write a query that retrieves both triples about words and indicates that { :goodbye a :word } comes from graph :farewells?
You can do something along these lines:
SELECT *
WHERE {
{ GRAPH ?g { ?s ?p ?o } }
UNION
{ ?s ?p ?o .
FILTER NOT EXISTS { GRAPH ?g { ?s ?p ?o } }
}
}
The first part of the union selects all triples in named graphs. The second part grabs all triples in the default graph, explicitly excluding patterns that occur in a named graph.
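To see why the second branch needs the FILTER NOT EXISTS, here is a minimal Python sketch of the same logic over a toy in-memory list of quads. It assumes, as described for GraphDB, that a plain ?s ?p ?o pattern sees the union of the unnamed graph and all named graphs. The QUADS list and the function names are made up for illustration; no real triplestore API is involved.

```python
# Toy model: each quad is (graph, subject, predicate, object).
# graph None stands for triples inserted without a GRAPH clause.
QUADS = [
    (None, ":hello", "a", ":word"),
    (":farewells", ":goodbye", "a", ":word"),
]

def default_graph_union(quads):
    """What a plain `?s ?p ?o` matches when the default graph is a union."""
    return {(s, p, o) for _, s, p, o in quads}

def triples_with_graph(quads):
    """Mimic the UNION query: named-graph rows first, then the leftovers."""
    named = [(g, s, p, o) for g, s, p, o in quads if g is not None]
    in_named = {(s, p, o) for _, s, p, o in named}
    # FILTER NOT EXISTS: keep union triples that appear in no named graph.
    leftovers = [(None, s, p, o)
                 for (s, p, o) in sorted(default_graph_union(quads))
                 if (s, p, o) not in in_named]
    return named + leftovers

rows = triples_with_graph(QUADS)
for row in rows:
    print(row)
```

Without the membership check in the second step, the :goodbye triple would be listed twice: once with its graph and once without.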
In GraphDB, you could use pseudographs for this purpose, i.e. <http://www.ontotext.com/explicit> (it seems you are not using inferencing).
Try this query:
SELECT * FROM NAMED <http://www.ontotext.com/explicit>
{ GRAPH ?g { ?s ?p ?o } }
The result should be:
+------------------------------------+----------+-----------+-------+
| ?g | ?s | ?p | ?o |
+------------------------------------+----------+-----------+-------+
| <http://www.ontotext.com/explicit> | :hello | rdf:type | :word |
| :farewells | :goodbye | rdf:type | :word |
+------------------------------------+----------+-----------+-------+
For comparison, note that
SELECT * FROM NAMED <http://www.openrdf.org/schema/sesame#nil>
{ GRAPH ?g { ?s ?p ?o } }
will return only
+------------------------------------+----------+-----------+-------+
| ?g | ?s | ?p | ?o |
+------------------------------------+----------+-----------+-------+
| <http://www.ontotext.com/explicit> | :hello | rdf:type | :word |
+------------------------------------+----------+-----------+-------+
Short "answer": avoid putting data into the default graph in GraphDB (and in other triple stores whose "virtual" default graph is just the union of all named graphs).
Background: GraphDB decided to define the default graph as the union of all named graphs and the default graph. This behaviour is not backed by the SPARQL semantics specification, but is implementation-specific behaviour.
So you really have three options:
Use FILTER NOT EXISTS or MINUS as explained by Jeen Broekstra. This can have a serious negative impact on query performance.
Use the GraphDB pseudographs as exemplified by Stanislav Kralin. This option makes your queries (and your system) dependent on GraphDB: you cannot change the SPARQL engine later without adapting your queries.
Avoid putting data into the default graph. You might define "your own" default graph, e.g. call it http://default/, and put it in the SPARQL FROM clause.
Other triple stores let you enable or disable this feature. I couldn't find such a switch in the GraphDB documentation. Otherwise this would be the fourth, and my preferred, option.
Related
I'm trying to execute a query from the rdf4j console against a SPARQL endpoint to find the path between two nodes using property wildcards, but with no luck. The first query gives this error:
Malformed query: Not a valid (absolute) IRI:
The second query crashes the console. Is the query itself wrong, or could this be an rdf4j issue, in which case I should try querying the endpoint some other way?
PREFIX xy: <http://mainuri/>
select
*
where
{
<http://uriOfInstanceOfData> ((<>|!<>)|^(<>|!<>))* ?x .
?x ?p ?o .
?o ((<>|!<>)|^(<>|!<>))* <http://uriOfInstanceOfData>.
}
and
PREFIX xy: <http://mainuri/>
select
*
where
{
<http://uriOfInstanceOfData> (xy:|!xy:)* ?x .
?x ?p ?o .
?o (xy:|!xy:)* <http://uriOfInstanceOfData>.
}
The first query is syntactically incorrect: <> is not a valid IRI reference. The SPARQL grammar allows the empty string, but the specification also notes that any IRI reference must be a string that (after escape processing) results in a valid RFC 3987 IRI. Since an IRI requires, at a minimum, a scheme identifier, an empty string can by definition not be a valid IRI.
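As a rough illustration of the "at a minimum, a scheme" point, the following Python sketch checks only whether an IRI reference starts with a scheme. It is not a full RFC 3987 validator, it ignores the case where a base IRI is available to resolve a relative reference against, and the helper name has_scheme is made up for this example.

```python
import re

# Scheme syntax: a letter, then letters, digits, "+", "-" or ".",
# terminated by ":". This checks the scheme only, not the rest of the IRI.
SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")

def has_scheme(iri_ref):
    """Return True if the text between < and > starts with a scheme."""
    return bool(SCHEME.match(iri_ref))

for candidate in ["", "http://mainuri/", "example.org/foo/"]:
    print(repr(candidate), has_scheme(candidate))
```

The empty reference from <> and a scheme-less string such as example.org/foo/ both fail this check, while http://mainuri/ passes.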
The second query works when I try it on a small test dataset. However it is likely very expensive to process.
EDIT: the query I actually tried:
PREFIX xy: <http://mainuri/>
select
*
where
{
rdfs:domain (xy:|!xy:)* ?x .
?x ?p ?o .
?o (xy:|!xy:)* rdf:Property.
}
On a local in-memory database with basic RDFS inferencing enabled, that gives the following result:
Evaluating SPARQL query...
+------------------------+------------------------+------------------------+
| x | p | o |
+------------------------+------------------------+------------------------+
| rdfs:domain | rdf:type | rdf:Property |
| rdfs:domain | rdfs:domain | rdf:Property |
+------------------------+------------------------+------------------------+
2 result(s) (28 ms)
Here is a SPARQL query:
PREFIX : <...#>
SELECT *
WHERE { { :Airspace_LSAGE_411 ?p ?o . }
UNION { :Airspace_LSAGN_411 ?p ?o . }
UNION { :Airspace_LSAGS_411 ?p ?o . }
} LIMIT 2000
This will get me the properties and associated values of the three airspace-objects Airspace_LSAGE_411, Airspace_LSAGN_411, Airspace_LSAGS_411.
The problem is that in the result table I only have the columns p and o, so I do not know which row belongs to which airspace. For example:
p o
---- ----
:color red
:color blue
Is it possible to repeat the airspace name in the result to get something like this:
s p o
---- ---- ----
Airspace_LSAGE_411 color red
Airspace_LSAGN_411 color blue
I know that differentiation would be easy with three queries, one after another, but my main point is how to get complete triples as a result.
Use VALUES to provide the data inline:
PREFIX : <...#>
SELECT ?s ?p ?o
WHERE {
VALUES ?s { :Airspace_LSAGE_411 :Airspace_LSAGN_411 :Airspace_LSAGS_411 }
?s ?p ?o .
}
My goal is to graphically represent the S->P->O relations within a depth two edges from the specified resource, p:Person_1. I want all relations within that path length to be returned from my query as ?s, ?p, ?o for further processing in my graphical application.
I tried the first query below which gives me my first set of ?s ?p ?o with repeats, then ?p2, ?o2, ?p3, ?o3 as additional columns in the result. I want to bind ?p2 and ?p3 to ?p, ?o2 and ?o3 to ?o.
SELECT *
WHERE {
p:Person_1 ?p ?o .
BIND("p:Person_1" as ?s)
OPTIONAL{
?o ?p2 ?o2 .
}
OPTIONAL{
?o2 ?p3 ?o3 .
}
}
Then, based on "How do I construct get the whole sub graph from a given resource in RDF Graph?", I tried using CONSTRUCT to return the graph.
PREFIX p: <http://www.example.org/person/>
PREFIX x: <example.org/foo/>
construct { ?s ?p ?o }
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:|!x:)* ?s .
?s ?p ?o .
}
I am using Virtuoso and I get the error:
Virtuoso 37000 Error SP031: SPARQL compiler: Variable ?_::trans_subj_9_3 in T_IN list is not a value from some triple
I could post-process the result from my first query but I want to learn how to do this correctly with SPARQL, preferably on Virtuoso.
Update after testing the advice from @AKSW:
Both CONSTRUCT and SELECT statements work with the pattern suggested.
CONSTRUCT { ?s ?p ?o }
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar)* ?s .
?s ?p ?o .
} LIMIT 100
and:
SELECT ?s ?p ?o
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar)* ?s .
?s ?p ?o .
} LIMIT 100
The SELECT results in several duplicates that cannot be removed using DISTINCT, which results in an error that I assume is due to the 'datatype' of some of the returned values.
Virtuoso 22023 Error SR066: Unsupported case in CONVERT (DATETIME -> IRI_ID)
It appears some post-SPARQL processing is in order.
This gets me most of the way there. Still hoping I can find a solution for SPARQL that is like Cypher's "number of hops away" :
OPTIONAL MATCH path=s-[*1..3]-(o)
Here is a SPARQL query that works in Virtuoso. Note that the {n,m} path syntax it uses is not part of the W3C SPARQL 1.1 standard, so it may fail in other triplestores.
PREFIX p: <http://www.example.org/person/>
PREFIX x: <example.org/foo/>
# CONSTRUCT {?s ?p ?o} # If you wish to return the graph
SELECT ?s ?p ?o # To return the triples
FROM <http://localhost:8890/MYGRAPH>
where { p:Person_1 (x:foo|!x:bar){1,3} ?s .
?s ?p ?o .
} LIMIT 100
See also K. Idehen's wiki entry here: http://linkedwiki.com/exampleView.php?ex_id=141
And thanks to @Joshua Taylor for advice in the same area.
Working Drafts of SPARQL 1.1 Property Paths included the {n,m} operator for handling this issue, which was implemented (and will remain supported) in Virtuoso. Here's a tweak to @tim's response.
Live SPARQL Query Results Page using the DBpedia endpoint (which is a Virtuoso instance).
Live SPARQL Query Definition Page that opens up query source code in the default DBpedia query editor.
Actual Query Example:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s AS ?Entity
?o AS ?Category
WHERE {
?s rdf:type <http://dbpedia.org/ontology/AcademicJournal> ;
rdf:type{1,3} ?o
}
LIMIT 100
Should you be looking for LinkedIn-like presentation of Contact Networks and Degrees of Separation between individuals, here is an example using Virtuoso-specific SPARQL Extensions that solve this particular issue:
SELECT ?o AS ?WebID
((SELECT COUNT (*) WHERE {?o foaf:knows ?xx})) AS ?contact_network_size
?dist AS ?DegreeOfSeparation
<http://www.w3.org/People/Berners-Lee/card#i> AS ?knowee
WHERE
{
{
SELECT ?s ?o
WHERE
{
?s foaf:knows ?o
}
} OPTION (TRANSITIVE, t_distinct, t_in(?s), t_out(?o), t_min (1), t_max (4), t_step ('step_no') AS ?dist) .
FILTER (?s= <http://www.w3.org/People/Berners-Lee/card#i>)
FILTER (isIRI(?s) and isIRI(?o))
}
ORDER BY ?dist DESC (?contact_network_size)
LIMIT 500
Note: this approach is the only way (at the current time) to expose actual relational hops between entities in an Entity Relationship Graph that includes Transitive relations.
Bearing in mind that the r{n,m} operator was dropped from the final SPARQL 1.1 specification (but will remain supported in Virtuoso), you can use r/r?/r? instead of r{1,3} if you want to work strictly off the current spec:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?s AS ?Entity
?o AS ?Category
WHERE {
?s rdf:type <http://dbpedia.org/ontology/AcademicJournal> ;
rdf:type / rdf:type? / rdf:type? ?o
}
LIMIT 100
Here's a live example, against the DBpedia instance hosted in Virtuoso.
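The r{1,3} to r/r?/r? rewrite generalizes to other bounds: low mandatory steps followed by (high - low) optional ones. A small Python sketch of that rewrite is shown here; the function name expand_bounded_path is hypothetical, and the caller is assumed to parenthesize any step that is itself a compound path.

```python
def expand_bounded_path(step, low, high):
    """Rewrite step{low,high} as a standard SPARQL 1.1 property path string."""
    if low < 0 or high < low or high == 0:
        raise ValueError("need 0 <= low <= high and high >= 1")
    required = [step] * low                 # steps that must be present
    optional = [step + "?"] * (high - low)  # steps that may be skipped
    return " / ".join(required + optional)

print(expand_bounded_path("rdf:type", 1, 3))
print(expand_bounded_path("(x:foo|!x:bar)", 1, 3))
```

The first call prints the same path as in the query above, rdf:type / rdf:type? / rdf:type?.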
Is it possible to select the "negative" of a SPARQL query?
For instance, consider the following RDF data, query and desired result:
knowledge base:
@prefix gr:<http://purl.org/goodrelations/v1#>.
@prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>.
:prod_A :hasProp :propA.
:prod_B :hasProp :propB.
:propB rdfs:label "Hello".
:prod_C :hasProp :propC.
:prod_D :hasProp :propD.
Imaginary Query:
PREFIX gr:<http://purl.org/goodrelations/v1#>
SELECT ?prod WHERE
{ !(
?prod ?p ?o.
?o ?p2 ?o2.
) }
Ideal Result:
| ?prod |
|---------|
| :prod_A |
| :prod_C |
| :prod_D |
Is there a way to do this? (I need it for a DELETE.)
I think MINUS is what you are looking for:
PREFIX gr:<http://purl.org/goodrelations/v1#>
SELECT ?prod WHERE
{
?prod ?p ?o.
MINUS { ?o ?p2 ?o2 }
}
It takes the things matched by the left hand side (?prod ?p ?o) and removes any that correspond to items matched by the MINUS pattern.
Note that this doesn't give your desired answer, because the ?prod ?p ?o pattern matches everything, including the linked property triple (:propB rdfs:label "Hello") that you aren't interested in for your results.
To get your desired answer you need to make the first part of the query more specific like so:
PREFIX :<http://example.org/>
PREFIX gr:<http://purl.org/goodrelations/v1#>
SELECT ?prod WHERE
{
?prod :hasProp ?o.
MINUS { ?o ?p2 ?o2 }
}
Here I changed the ?p variable to be the :hasProp constant instead. With this query I get your desired answer.
NB - You didn't define the empty prefix in your example so I invented one to make the query valid and so I could test that it worked
Another way is to use FILTER NOT EXISTS:
PREFIX : <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?prod WHERE
{
?prod :hasProp ?o.
FILTER NOT EXISTS { ?o rdfs:label "Hello". }
}
This first matches ?prod :hasProp ?o, then keeps only those solutions for which ?o rdfs:label "Hello" has no match. Use whichever form you find easier to understand.
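For intuition, the effect of the MINUS queries on the sample data can be mimicked with plain Python sets. This is only a sketch of the filtering idea, not of full SPARQL MINUS semantics, and the function names are invented for the example.

```python
TRIPLES = [
    (":prod_A", ":hasProp", ":propA"),
    (":prod_B", ":hasProp", ":propB"),
    (":propB", "rdfs:label", "Hello"),
    (":prod_C", ":hasProp", ":propC"),
    (":prod_D", ":hasProp", ":propD"),
]

def described(triples):
    """Resources with at least one outgoing triple (what ?o ?p2 ?o2 binds ?o to)."""
    return {s for s, _, _ in triples}

def any_predicate_version(triples):
    """?prod ?p ?o MINUS { ?o ?p2 ?o2 }: every subject whose object is undescribed."""
    has_out = described(triples)
    return sorted({s for s, _, o in triples if o not in has_out})

def has_prop_version(triples):
    """?prod :hasProp ?o MINUS { ?o ?p2 ?o2 }: restrict the left side to :hasProp."""
    has_out = described(triples)
    return sorted({s for s, p, o in triples
                   if p == ":hasProp" and o not in has_out})

print(any_predicate_version(TRIPLES))
print(has_prop_version(TRIPLES))
```

The unrestricted version also returns :propB, because the literal "Hello" has no outgoing triples; restricting the left side to :hasProp removes it and leaves :prod_A, :prod_C and :prod_D.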
How can I find the distance between two nodes in a graph using Virtuoso? I've read the transitivity documentation, but the examples limit you to one predicate, e.g.:
SELECT ?link ?g ?step ?path
WHERE
{
{
SELECT ?s ?o ?g
WHERE
{
graph ?g {?s foaf:knows ?o }
}
} OPTION (TRANSITIVE, t_distinct, t_in(?s), t_out(?o), t_no_cycles, T_shortest_only,
t_step (?s) as ?link, t_step ('path_id') as ?path, t_step ('step_no') as ?step, t_direction 3) .
FILTER (?s= <http://www.w3.org/People/Berners-Lee/card#i>
&& ?o = <http://www.advogato.org/person/mparaz/foaf.rdf#me>)
}
LIMIT 20
This only traverses foaf:knows, not arbitrary predicates. How can I extend it to "whatever predicate"? I don't need the actual path, just a true/false answer (an ASK query). Changing foaf:knows to ?p seems like overkill.
I'm currently performing a set of recursive ASKs to find out if two nodes are connected within a specific distance but that doesn't seem efficient.
You should be able to use ?p instead of foaf:knows in your query to determine if there's a path between the nodes. E.g.:
SELECT ?link ?g ?step ?path
WHERE
{
{
SELECT ?s ?o ?g
WHERE
{
graph ?g {?s ?p ?o }
}
} OPTION (TRANSITIVE, t_distinct, t_in(?s), t_out(?o), t_no_cycles, T_shortest_only,
t_step (?s) as ?link, t_step ('path_id') as ?path, t_step ('step_no') as ?step, t_direction 3) .
FILTER (?s= <http://www.w3.org/People/Berners-Lee/card#i>
&& ?o = <http://www.advogato.org/person/mparaz/foaf.rdf#me>)
}
LIMIT 20
Here's an approach that works if there's at most one path between the nodes that you're interested in.
If you have data like this (note that there are different properties connecting the resources):
@prefix : <https://stackoverflow.com/q/3914522/1281433/> .
:a :p :b .
:b :q :c .
:c :r :d .
Then a query like the following finds the distance between each pair of nodes. The property path (:|!:) consists of a property that is either : or something other than : (i.e., anything). Thus (:|!:)* is zero or more occurrences of any property; it's a wildcard path. (The technique used here is described more fully in "Is it possible to get the position of an element in an RDF Collection in SPARQL?".)
prefix : <https://stackoverflow.com/q/3914522/1281433/>
select ?begin ?end (count(?mid)-1 as ?distance) where {
?begin (:|!:)* ?mid .
?mid (:|!:)* ?end .
}
group by ?begin ?end
order by ?begin ?end ?distance
--------------------------
| begin | end | distance |
==========================
| :a | :a | 0 |
| :a | :b | 1 |
| :a | :c | 2 |
| :a | :d | 3 |
| :b | :b | 0 |
| :b | :c | 1 |
| :b | :d | 2 |
| :c | :c | 0 |
| :c | :d | 1 |
| :d | :d | 0 |
--------------------------
To just find out whether there's a path between two nodes that's less than some particular length, you use an ask query instead of a select, fix the values of ?begin and ?end, and restrict the value of count(?mid)-1 rather than binding it to ?distance. E.g., is there a path from :a to :d of length less than three?
prefix : <https://stackoverflow.com/q/3914522/1281433/>
ask {
values (?begin ?end) { (:a :d) }
?begin (:|!:)* ?mid .
?mid (:|!:)* ?end .
}
group by ?begin ?end
having ( (count(?mid)-1 < 3 ) )
Ask => No
On the other hand, there is a path from :a to :c with length less than 5:
prefix : <https://stackoverflow.com/q/3914522/1281433/>
ask {
values (?begin ?end) { (:a :c) }
?begin (:|!:)* ?mid .
?mid (:|!:)* ?end .
}
group by ?begin ?end
having ( (count(?mid)-1 < 5 ) )
Ask => Yes
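The count(?mid)-1 trick can be checked outside SPARQL. The Python sketch below recomputes the same numbers for the :a, :b, :c, :d chain: for each pair it counts the nodes ?mid that are reachable from ?begin and can reach ?end, then subtracts one. The function names are made up, and like the query it is only meaningful when there is at most one path between the two nodes.

```python
EDGES = [(":a", ":b"), (":b", ":c"), (":c", ":d")]

def reachable(edges):
    """Map each node to the set of nodes reachable in zero or more steps."""
    nodes = {n for edge in edges for n in edge}
    succ = {n: set() for n in nodes}
    for s, o in edges:
        succ[s].add(o)
    reach = {}
    for start in nodes:
        seen, stack = {start}, [start]
        while stack:
            for nxt in succ[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        reach[start] = seen
    return reach

def distances(edges):
    """For each connected (begin, end) pair, count the ?mid nodes and subtract one."""
    reach = reachable(edges)
    result = {}
    for begin, ends in reach.items():
        for end in ends:
            mids = [m for m in reach[begin] if end in reach[m]]
            result[(begin, end)] = len(mids) - 1
    return result

dist = distances(EDGES)
print(dist[(":a", ":d")] < 3)  # analogue of the first ASK
print(dist[(":a", ":c")] < 5)  # analogue of the second ASK
```

The two printed values correspond to the "No" and "Yes" answers of the ASK queries above.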