There is probably an easy to answer to this, but I can't even figure out how to formulate the Google query to find it.
I'm writing SPARQL construct queries against a dataset that includes blank nodes. So if I do a query like
CONSTRUCT {?x ?y ?z .}
WHERE {?x ?y ?z .}
Then one of my results might be:
nm:John nm:owns _:Node
Which is a problem if all of the
_:Node nm:has nm:Hats
triples don't also get into the query result somehow (because some parsers I'm using like rdflib for Python really don't like dangling bnodes).
Is there a way to write my original CONSTRUCT query to recursively add all triples attached to any bnode results such that no bnodes are left dangling in my new graph?
Recursion isn't possible. The closest I can think of is SPARQL 1.1 property paths (note: that version is out of date) but bnode tests aren't available (afaik).
You could just remove the statements with trailing bnodes:
CONSTRUCT {?x ?y ?z .} WHERE
{
?x ?y ?z .
FILTER (!isBlank(?z))
}
or try your luck fetching the next bit:
CONSTRUCT {?x ?y ?z . ?z ?w ?v } WHERE
{
?x ?y ?z .
OPTIONAL {
?z ?w ?v
FILTER (isBlank(?z) && !isBlank(?v))
}
}
(that last query is pretty punishing, btw)
You may be better off with DESCRIBE, which will often skip bnodes.
As user205512 suggests, performing that grab recursively is not possible, and as they point out, using optional(s) to go arbitrary levels down into your data getting the nodes is not feasible on anything but non-trivial sized databases.
Bnodes themselves are locally scoped, to the result set, or to the file. There's no guarantee that a BNode is you get from parsing or from a result set is the same id that is used in the database (though some database do guarantee this for query results). Furthermore, a query like "select ?s where { ?s ?p _:bnodeid1 }" is the same as "select ? where { ?s ?p ?o }" -- note that bnode is treated as a variable in that case, not as "the thing w/ the id 'bnodeid1'" This quirk of the design makes it difficult to query for bnodes, so if you are in control of the data, I'd suggest not using them. It's not hard to generate names for stuff that would otherwise be bnodes, and named resources v. bnodes will not increase overhead during querying.
That does not help you recurse down and grab data, but for that, I don't recommend doing such general queries; they don't scale well and usually return more than you want or need. I'd suggest you do more directed queries. Your original construct query will pull down the contents of the entire database, that's generally not what you want.
Lastly, while describe can be useful, there's not a standard implementation; the SPARQL spec doesn't define any particular behavior, so what it returns is left to the database vendor, and it can be different. That can make your code less portable if you plan on trying different databases with your application. If you want a specific behavior out of describe, you're best off implementing it yourself. Doing something like the concise bounded description for a resource is an easy piece of code, though you can run into some headaches around Bnodes.
With regard to working with the ruby RDF.rb library, which allows SPARQL queries with significant convenience methods on RDF::Graph objects, the following should expand blank nodes.
rdf_type = RDF::SCHEMA.Person # for example
rdf.query([nil, RDF.type, rdf_type]).each_subject do |subject|
g = RDF::Graph.new
rdf.query([subject, nil, nil]) do |s,p,o|
g << [s,p,o]
g << rdf_expand_blank_nodes(o) if o.node?
end
end
def rdf_expand_blank_nodes(object)
g = RDF::Graph.new
if object.node?
rdf.query([object, nil, nil]) do |s,p,o|
g << [s,p,o]
g << rdf_expand_blank_nodes(o) if o.node?
end
end
g
end
Related
Suppose I have the following alternative property path:
select *
where{
?sub :p1|:p2 ?ob
}
How can I bind a third variable (?map) which keeps track of what path (:p1 or :p2) was taken for each match?
If this is impossible is there another way to enumerate paths?
I am intentionally avoiding expanding this predicate as a UNION because in real life my pattern includes multiple property paths and UNIONS would make the query ridiculously long.
edit: clarify in response to comments, I am not looking for graph traversal, just pairs of nodes with properties linking them
can just add the predicate variable explicitly:
select *
where{
?sub :p1|:p2 ?ob .
?sub ?pred ?ob .
}
Sandboxed demos:
answer
comment suggestion (equivalent)
I wrote this query but it does't work. Anyone knows what is the problem.
PREFIX : <http://www.semanticweb.org/ontologies/2009/pizza.owl#>
SELECT ?X ?Y
WHERE {?X :hasCountryOfOrigin "Italy".
?Y :hasCalorificValue "400"}
According to the Pizza ontology tutorial here, there are two main issues with your query:
hasCountryOfOrigin is an object property, thus, the values can't be literals. Italy is an individual, thus, you have to use the correct URI, probably http://www.semanticweb.org/ontologies/2009/pizza.owl#Italy
The data property hasCalorificValue has values of type integer, i.e. literals should be used like "400"^^xsd:integer (or maybe xsd:int, depends on what you've chosen in Protege)
Both triple patterns in your query are not connected, i.e. no shared variable. I don't see the goal of your query.
PREFIX : <http://www.semanticweb.org/ontologies/2009/pizza.owl#>
SELECT ?X ?Y
WHERE {?X :hasCountryOfOrigin :Italy.
?Y :hasCalorificValue "400"^^xsd:integer}
Is there a good kind of SPARQL query that let's me answer if two given nodes are connected on a single / multiple SPARQL endpoints?
Let's say i want to check if the two nodes
<http://wiktionary.dbpedia.org/resource/dog>
and
<http://dbpedia.org/resource/Dog>
are connected. If yes, i'd be interested in the path.
By guessing i already knew they were connected via the label, so a query like this returns a path of length 3:
SELECT * WHERE {
<http://wiktionary.dbpedia.org/resource/dog> ?p1 ?n1.
# SERVICE <http://dbpedia.org/sparql> {
<http://dbpedia.org/resource/Dog> ?p2 ?n1 .
# }
}
try yourself
Now what if i don't have an idea yet and want to do this automatically & for arbitrary length and direction?
I'm aware of SPARQL 1.1's property paths, but they only seem to work for known properties (http://www.w3.org/TR/sparql11-query/#propertypaths):
Variables can not be used as part of the path itself, only the ends.
Also i would want to allow for any path, so the predicates on the path may change.
My current (as i find ridiculous) approach is to query for all possible paths of length k up to a limit of n.
Dumping isn't an option for me as it's billions of triples... I want to use SPARQL!
While you can't use variables in property paths, you can use a wildcard by taking advantage of the fact that for any URI, every property either is that property or isn't. E.g., (<>|!<>) matches any property, since every property either is <> or isn't. You can make a wildcard that goes in either direction by alternating that with itself in the other direction: (<>|!<>)|^(<>|!<>). That means that there's a path, with properties going in either direction, between two nodes ?u and ?v when
?u ((<>|!<>)|^(<>|!<>))* ?v
For instance, the following query should return true (indicating that there is a path):
ASK {
<http://wiktionary.dbpedia.org/resource/dog> ((<>|!<>)|^(<>|!<>))* <http://dbpedia.org/resource/Dog>
}
Now, to actually get the links of a path between between two nodes, you can do (letting <wildcard> stand for the nasty looking wildcard):
?start <wildcard>* ?u .
?u ?p ?v .
?v <wildcard>* ?end .
Then ?u, ?p, and ?v give you all the edges on the path. Note that if there are multiple paths, you'll be getting all the edges from all the paths. Since your wildcards go in either direction, you can actually get to anything reachable from the ?start or ?end, so you really should consider restricting the wildcard somehow.
On the endpoint that you linked to, it doesn't, but that appears to be an issue with Virtuoso's implementation of property paths, rather than a problem with the actual query.
Do note that this will be trivially satisfied in many cases if you have any kind of inference happening. E.g., if you're using OWL, then every individual is an instance of owl:Thing, so there'd always be a path of the form:
?u →rdf:type owl:Thing ←rdf:type ?v
I am trying to build a (local) ontology that describes a finite number of objects, and that links these objects to external resources via the owl:sameAs predicate. However, when I simply query for the number of objects of that kind, I obtain twice as much as the object described. It is clear that also the external resources are counted independently, as what is taken into account is the number of URIs, and not the number of distinct objects.
I have solved this issue in the following way: I assume that the local ontology can be seen as a "reference hub" for knowing basic stuff about these objects, so I select all the objects of a certain kind, and then filter out only those that contain the base URI of the local ontology, i.e.:
# How many objects are there?
PREFIX ch: <http://www.example.com/ontologies/domain#>
SELECT (COUNT(DISTINCT ?elem) AS ?count) WHERE {
?elem a ch:Element.
FILTER (REGEX (STR(?elem) ,"http://www.example.com/ontologies/domain") ).
}
However, I have two concerns with this way of doing:
1) it looks a bit of a hack (even if somehow principled), whilst I would like something that makes more logical sense
2) I have the impression that this query is not very efficient.
I have searched quite a bit here, and on google, but didn't come out with any better solution... any suggestions here?
Thank you very much for any help!
GROUP BY a representative element
If there's some property that should have distinct values for each individual, then you can use it to impose the "equivalent class" structure that you need. E.g., something like this:
prefix ch: <http://www.example.com/ontologies/domain#>
select (count(?label) as ?count) where {
?elem a ch:element ;
rdfs:label ?label .
}
group by ?label
Synthesize a representative element
if there's not a value that will be shared by all elements in an equivalence class, you can still get a representative element from the set by asking for the minimal element in each equivalence class. We can use the IRIs of the elements to order the elements, and use that to select a unique individual. This does presume that each ?elem and all the things that it is the same as have well defined behavior under the str function (and IRIs do).
prefix ch: <http://www.example.com/ontologies/domain#>
select (count(distinct ?elem) as ?count) where {
?elem a ch:element .
filter not exists {
?elem (owl:sameAs|^owl:sameAs)* ?elem_
filter( str(?elem_) < str(?elem) )
}
}
I've studied SPARQL specification on the topic and also found this answer rather interesting. However definitions are complicated enough, so I still don't see the answer for my question.
I can't find any example of query with blank nodes that returns different results than the same query with variables in place of blank nodes.
For example is there any case when the following queries return different results:
SELECT ?a ?b
WHERE {
?a :predicate _:blankNode .
_:blankNode :otherPredicate ?b .
}
SELECT ?a ?b
WHERE {
?a :predicate ?variable .
?variable :otherPredicate ?b .
}
Maybe there are more complex queries that cause different behavior?
In particular I wonder is there any examples of different results of queries executed on an RDF graph that doesn't have blank nodes.
Thanks.
PS. Yes, I know that blank nodes can be used only in one BasicGraphPattern as opposed to variables. But this is not the difference I'm talking about.
The answer that you linked to is about blank nodes in the data that is being queried, not about blank nodes in the query. You're absolutely right that blank nodes in the query act just like variables. The specification says this (emphasis added):
4.1.4 Syntax for Blank Nodes
Blank nodes in graph patterns act as variables, not as references to
specific blank nodes in the data being queried.
Blank nodes are indicated by either the label form, such as "_:abc",
or the abbreviated form "[]". A blank node that is used in only one
place in the query syntax can be indicated with []. A unique blank
node will be used to form the triple pattern. Blank node labels are
written as "_:abc" for a blank node with label "abc". The same blank
node label cannot be used in two different basic graph patterns in the
same query.
As such, your queries
SELECT ?a ?b
WHERE {
?a :predicate _:blankNode .
_:blankNode :otherPredicate ?b .
}
SELECT ?a ?b
WHERE {
?a :predicate ?variable .
?variable :otherPredicate ?b .
}
behave identically. The benefit of using a blank node instead of a variable is that you can use some more compact syntax. In this case, you could write:
SELECT ?a ?b
WHERE {
?a :predicate [ :otherPredicate ?b ] .
}
Actually, in this case, since you're only looking for one property on the thing that the blank node matches, you could use a property path:
SELECT ?a ?b
WHERE {
?a :predicate/:otherPredicate ?b .
}
For most entailment regimes, blank nodes are variables within the basic graph pattern. For OWL-DL (and others) you can get more answers (examples include the "little house" and "Oedipus" examples -- the Description Logic Handbook has details).
In the defn of SPARQL http://www.w3.org/TR/sparql11-query/#BasicGraphPattern for simple entailment the instance mapping σ(b) behaves just like the solution mapping μ(v).
One important factor not explicitly discussed is the fact that a blank node "variable" limits the results to blank node resources. A normal variable uses all available values (named resources, blank node resources, and literals).
Also, a blank node "variable" cannot be used in functions like BIND, etc. or as a column in results.