I am trying to build a (local) ontology that describes a finite number of objects and links these objects to external resources via the owl:sameAs predicate. However, when I simply query for the number of objects of that kind, I obtain twice as many as the objects actually described. Clearly the external resources are counted independently as well, since what is counted is the number of URIs rather than the number of distinct objects.
I have solved this issue in the following way: I assume that the local ontology can be seen as a "reference hub" for knowing basic facts about these objects, so I select all the objects of a certain kind and then keep only those whose URI contains the base URI of the local ontology, i.e.:
# How many objects are there?
PREFIX ch: <http://www.example.com/ontologies/domain#>
SELECT (COUNT(DISTINCT ?elem) AS ?count) WHERE {
  ?elem a ch:Element .
  FILTER (REGEX(STR(?elem), "http://www.example.com/ontologies/domain"))
}
However, I have two concerns with this way of doing:
1) it looks a bit like a hack (even if somewhat principled), whereas I would like something that makes more logical sense;
2) I have the impression that this query is not very efficient.
I have searched quite a bit here and on Google, but didn't come up with any better solution... any suggestions?
Thank you very much for any help!
GROUP BY a representative element
If there's some property that should have a distinct value for each individual, then you can use it to impose the "equivalence class" structure that you need, and simply count the distinct values. E.g., something like this:
prefix ch: <http://www.example.com/ontologies/domain#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select (count(distinct ?label) as ?count) where {
  ?elem a ch:Element ;
        rdfs:label ?label .
}
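If you also want one representative element per equivalence class (which is what the heading above alludes to), rather than just the count, you can group by the distinguishing value and take a SAMPLE. This is a sketch along the same lines, using the same assumed prefixes:

prefix ch: <http://www.example.com/ontologies/domain#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?label (sample(?elem) as ?representative) where {
  ?elem a ch:Element ;
        rdfs:label ?label .
}
group by ?label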
Synthesize a representative element
If there's not a value that will be shared by all elements in an equivalence class, you can still get a representative element from the set by asking for the minimal element in each equivalence class. We can use the IRIs of the elements to order the elements, and use that to select a unique individual. This does presume that each ?elem and all the things that it is the same as have well-defined behavior under the str function (and IRIs do).
prefix ch: <http://www.example.com/ontologies/domain#>
prefix owl: <http://www.w3.org/2002/07/owl#>
select (count(distinct ?elem) as ?count) where {
  ?elem a ch:Element .
  filter not exists {
    ?elem (owl:sameAs|^owl:sameAs)* ?elem_ .
    filter( str(?elem_) < str(?elem) )
  }
}
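As an aside on the namespace filter in the question (concern 2, efficiency): for a plain prefix test, STRSTARTS is usually cheaper than REGEX, though it is still the same "hack" conceptually. A minimal variant of the question's own query:

PREFIX ch: <http://www.example.com/ontologies/domain#>
SELECT (COUNT(DISTINCT ?elem) AS ?count) WHERE {
  ?elem a ch:Element .
  FILTER (STRSTARTS(STR(?elem), "http://www.example.com/ontologies/domain"))
}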
Related
I want to write some reusable SPARQL queries to do things like take an IRI, get the name part (typically after the # sign), modify it (e.g., replace underscores with blank spaces), and put it in the rdfs:label property. This would be useful for Protege, which doesn't fill in the rdfs:label if you use user-defined IRIs. Or take an IRI with a user-defined name, do the same, and then replace the user-defined IRI with a UUID. I looked in the SPARQL spec for functions to manipulate IRIs, and either they don't exist or I'm missing something obvious. I'm posting this to make sure it isn't the latter. I know it is easy to do the equivalent with things like SUBSTR, but I'm surprised that there aren't predefined operators for things like getting the name part of an IRI or getting the base, and I want to double-check before I roll my own.
In case anyone else wants to do this, I figured it out. There are some answers on this site, but they are all for SQL or languages other than SPARQL. The following is for classes, and it should be obvious how to adapt it for other entities. Note: this works in the Snap SPARQL Plugin for Protege (that's why I use CONSTRUCT rather than INSERT); however, there is a bug in its implementation of SUBSTR: it uses 0-based indexing rather than 1-based as the spec says. So if you use this with a standards-compliant SPARQL engine, change the 1 in the SUBSTR call to a 2.
CONSTRUCT {?c rdfs:label ?lblname.}
WHERE {
  ?c rdfs:subClassOf owl:Thing.
  # the name part of the IRI, e.g. "MyClassName"
  BIND(STRAFTER(STR(?c), '#') AS ?name)
  # insert a space before each capital letter: " My Class Name"
  BIND(REPLACE(?name, "([A-Z])", " $1") AS ?namewbs)
  # drop the leading space if there is one (see the SUBSTR indexing note above)
  BIND(IF(STRSTARTS(?namewbs, " "), SUBSTR(?namewbs, 1), ?namewbs) AS ?lblname)
  FILTER(?c != owl:Thing && ?c != owl:Nothing)
}
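For the other half of the question (getting the base of an IRI rather than the name part), the same string functions work. A minimal sketch, assuming hash IRIs as above (slash IRIs would need REPLACE with a regular expression instead); the variable names here are just illustrative:

SELECT ?c ?base ?name
WHERE {
  ?c rdfs:subClassOf owl:Thing.
  BIND(STRBEFORE(STR(?c), '#') AS ?base)
  BIND(STRAFTER(STR(?c), '#') AS ?name)
}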
Suppose I have the following alternative property path:
select *
where{
?sub :p1|:p2 ?ob
}
How can I bind a third variable (?map) which keeps track of what path (:p1 or :p2) was taken for each match?
If this is impossible is there another way to enumerate paths?
I am intentionally avoiding expanding this predicate as a UNION because in real life my pattern includes multiple property paths and UNIONS would make the query ridiculously long.
Edit (to clarify in response to comments): I am not looking for graph traversal, just pairs of nodes with properties linking them.
You can just add the predicate variable explicitly:
select *
where{
?sub :p1|:p2 ?ob .
?sub ?pred ?ob .
}
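Note that ?pred will bind every predicate linking a matched ?sub/?ob pair, not only :p1 or :p2. If you need to restrict it to the alternatives themselves, and the alternatives are single predicates rather than longer paths, one option (a sketch, not from the original answer) is a VALUES block in place of the path:

select *
where {
  values ?pred { :p1 :p2 }
  ?sub ?pred ?ob .
}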
I am currently trying to create pointers to datatype values, as they cannot be linked to directly. However, I would like to be able to evaluate the pointers from within the SPARQL environment, which raised some questions for me, specifically in the case where the desired value is part of an ordered rdf:List. My approach is to use property paths within a SPARQL query, in which I can use the defined individual, property, and index that the pointer has attached to it.
Given the following example data, using Turtle's shorthand syntax for ordered lists:
ex:myObject ex:someProperty ("1" "2" "3") .
ex:myPointer ex:lookAtIndividual ex:myObject ;
             ex:lookAtProperty   ex:someProperty ;
             ex:lookAtIndex      "3"^^xsd:integer .
Now I would like to create a SPARQL query that -- based on the pointer -- returns the value at the given index. To my understanding the query could/should look something like this:
SELECT ?value
WHERE {
ex:myPointer ex:lookAtIndividual ?individual ;
ex:lookAtProperty ?prop ;
ex:lookAtIndex ?index .
?individual ?prop/rdf:rest{?index-1}/rdf:first ?value .
}
But if I try to execute this query with TopBraid, it shows an error message that ?index has been found when <INTEGER> was expected. I also tried binding the index in the SPARQL query via BIND(?index-1 AS ?i), again without success. If the pointed value is not stored in a list, the query without property path works fine.
Is it in general possible to use a value that is connected via datatype property within a SPARQL query as path length for property paths?
This syntax: rdf:rest{<number>} is not standard SPARQL. So the short answer is, regrettably: no, you can't use variables as integers in SPARQL property paths, for the simple reason that you can't use integers in SPARQL property paths at all.
In an earlier draft of the SPARQL 1.1 standard, there was a proposal for this kind of syntax to specify the minimum and maximum length of a property path, e.g. rdf:rest{1,3} would match any path of between 1 and 3 rdf:rest steps. But this was never fully standardized, and most SPARQL engines don't implement it.
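For fixed, small bounds you can still express the same thing in standard SPARQL by spelling the path out with / and ? (this is just the standard rewriting, not part of the draft syntax): rdf:rest{1,3} becomes

rdf:rest/rdf:rest?/rdf:rest?

which matches one, two, or three rdf:rest steps. That obviously doesn't help when the bound comes from a variable, as in your query.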
If you happen to use a SPARQL engine that does implement it, you will have to get in touch with the developers directly to ask if they can extend the mechanism to allow use of variables in this position (the error message suggests to me that it's currently just not possible).
As an aside: there's a SPARQL 1.2 community initiative going on. It only just got started but one of the proposals on the table is re-introducing this particular piece of functionality to the standard.
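If what you ultimately need is the value at a given (1-based) index using only standard SPARQL, one workaround is to walk the list with rdf:rest* and recover each cell's position by counting, in a subquery, the cells between the head of the list and that cell. The following is only a sketch against the example data above; the ex: namespace URI is assumed (the question doesn't give one), and the subquery can be expensive on large datasets:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ex:  <http://www.example.org/>   # assumed namespace

SELECT ?value
WHERE {
  ex:myPointer ex:lookAtIndividual ?individual ;
               ex:lookAtProperty   ?prop ;
               ex:lookAtIndex      ?index .
  ?individual ?prop ?list .
  ?list rdf:rest* ?cell .
  ?cell rdf:first ?value .
  # the position of ?cell is the number of cells from the head up to and including ?cell
  {
    SELECT ?list ?cell (COUNT(DISTINCT ?mid) AS ?position)
    WHERE {
      ?list rdf:rest* ?mid .
      ?mid  rdf:rest* ?cell .
    }
    GROUP BY ?list ?cell
  }
  FILTER(?position = ?index)
}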
Is there a good kind of SPARQL query that lets me answer whether two given nodes are connected, on a single SPARQL endpoint or across multiple endpoints?
Let's say I want to check whether the two nodes
<http://wiktionary.dbpedia.org/resource/dog>
and
<http://dbpedia.org/resource/Dog>
are connected. If so, I'd be interested in the path.
By guessing, I already knew they were connected via the label, so a query like this returns a path of length 3:
SELECT * WHERE {
<http://wiktionary.dbpedia.org/resource/dog> ?p1 ?n1.
# SERVICE <http://dbpedia.org/sparql> {
<http://dbpedia.org/resource/Dog> ?p2 ?n1 .
# }
}
Now what if I don't have an idea yet and want to do this automatically, for arbitrary length and direction?
I'm aware of SPARQL 1.1's property paths, but they only seem to work for known properties (http://www.w3.org/TR/sparql11-query/#propertypaths):
Variables can not be used as part of the path itself, only the ends.
Also, I want to allow any path, so the predicates along the path may change.
My current approach (which I find ridiculous) is to query for all possible paths of length k, up to a limit of n.
Dumping the data isn't an option for me, as it's billions of triples... I want to use SPARQL!
While you can't use variables in property paths, you can use a wildcard by taking advantage of the fact that for any URI, every property either is that property or isn't. E.g., (<>|!<>) matches any property, since every property either is <> or isn't. You can make a wildcard that goes in either direction by alternating that with itself in the other direction: (<>|!<>)|^(<>|!<>). That means that there's a path, with properties going in either direction, between two nodes ?u and ?v when
?u ((<>|!<>)|^(<>|!<>))* ?v
For instance, the following query should return true (indicating that there is a path):
ASK {
<http://wiktionary.dbpedia.org/resource/dog> ((<>|!<>)|^(<>|!<>))* <http://dbpedia.org/resource/Dog>
}
Now, to actually get the links of a path between two nodes, you can do (letting <wildcard> stand for the nasty looking wildcard):
?start <wildcard>* ?u .
?u ?p ?v .
?v <wildcard>* ?end .
Then ?u, ?p, and ?v give you all the edges on the path. Note that if there are multiple paths, you'll be getting all the edges from all the paths. Since your wildcards go in either direction, you can actually get to anything reachable from the ?start or ?end, so you really should consider restricting the wildcard somehow.
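Putting those three patterns together with the two nodes from the question, the full query would look something like the sketch below; on a dataset the size of DBpedia it will almost certainly need further restrictions (or at least a LIMIT) to return in reasonable time:

SELECT ?u ?p ?v WHERE {
  <http://wiktionary.dbpedia.org/resource/dog> ((<>|!<>)|^(<>|!<>))* ?u .
  ?u ?p ?v .
  ?v ((<>|!<>)|^(<>|!<>))* <http://dbpedia.org/resource/Dog> .
}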
On the endpoint that you linked to, the ASK query above doesn't actually return true, but that appears to be an issue with Virtuoso's implementation of property paths, rather than a problem with the actual query.
Do note that this will be trivially satisfied in many cases if you have any kind of inference happening. E.g., if you're using OWL, then every individual is an instance of owl:Thing, so there'd always be a path of the form:
?u →rdf:type owl:Thing ←rdf:type ?v
There is probably an easy answer to this, but I can't even figure out how to formulate the Google query to find it.
I'm writing SPARQL construct queries against a dataset that includes blank nodes. So if I do a query like
CONSTRUCT {?x ?y ?z .}
WHERE {?x ?y ?z .}
Then one of my results might be:
nm:John nm:owns _:Node
Which is a problem if all of the
_:Node nm:has nm:Hats
triples don't also get into the query result somehow (because some parsers I'm using, like rdflib for Python, really don't like dangling bnodes).
Is there a way to write my original CONSTRUCT query to recursively add all triples attached to any bnode results such that no bnodes are left dangling in my new graph?
Recursion isn't possible. The closest I can think of is SPARQL 1.1 property paths (note: that draft is out of date), but bnode tests aren't available (afaik).
You could just remove the statements with trailing bnodes:
CONSTRUCT {?x ?y ?z .} WHERE
{
?x ?y ?z .
FILTER (!isBlank(?z))
}
or try your luck fetching the next bit:
CONSTRUCT {?x ?y ?z . ?z ?w ?v } WHERE
{
?x ?y ?z .
OPTIONAL {
?z ?w ?v
FILTER (isBlank(?z) && !isBlank(?v))
}
}
(that last query is pretty punishing, btw)
You may be better off with DESCRIBE, which will often skip bnodes.
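For example, reusing the resource from the snippet above (and assuming the nm: prefix is declared), the whole query can be as simple as:

DESCRIBE nm:John

Whether the statements about the blank node (and the blank node's own statements) are included depends entirely on the store, as the next answer explains.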
As user205512 suggests, performing that grab recursively is not possible, and as they point out, using OPTIONALs to go arbitrarily many levels down into your data to get the nodes is not feasible on anything but trivially sized databases.
Bnodes themselves are locally scoped, to the result set or to the file. There's no guarantee that a bnode id you get from parsing or from a result set is the same id that is used in the database (though some databases do guarantee this for query results). Furthermore, a query like "select ?s where { ?s ?p _:bnodeid1 }" is the same as "select ?s where { ?s ?p ?o }" -- the bnode is treated as a variable in that case, not as "the thing with the id 'bnodeid1'". This quirk of the design makes it difficult to query for bnodes, so if you are in control of the data, I'd suggest not using them. It's not hard to generate names for things that would otherwise be bnodes, and using named resources rather than bnodes will not increase overhead during querying.
That does not help you recurse down and grab data, but for that, I don't recommend doing such general queries; they don't scale well and usually return more than you want or need. I'd suggest more directed queries. Your original construct query will pull down the contents of the entire database, which is generally not what you want.
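For instance, reusing the nm:owns and nm:has predicates from the question purely as an illustration (your real data will have its own shapes and prefixes), a more directed query names the blank-node level explicitly instead of grabbing everything:

CONSTRUCT {
  ?person nm:owns ?thing .
  ?thing  nm:has  ?item .
}
WHERE {
  ?person nm:owns ?thing .
  ?thing  nm:has  ?item .
}

Here the intermediate ?thing bnodes come back together with the statements that describe them, so nothing is left dangling.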
Lastly, while DESCRIBE can be useful, there's no standard implementation; the SPARQL spec doesn't define any particular behavior, so what it returns is left to the database vendor, and it can differ between stores. That can make your code less portable if you plan on trying different databases with your application. If you want a specific behavior out of DESCRIBE, you're best off implementing it yourself. Doing something like the concise bounded description for a resource is an easy piece of code, though you can run into some headaches around bnodes.
With regard to the Ruby RDF.rb library, which allows querying RDF::Graph objects with convenient pattern-matching methods, the following should expand blank nodes.
# `rdf` is assumed to be an accessor for the source graph/repository being
# queried (a plain local variable would not be visible inside the method below).

# Recursively collect all statements reachable from a blank node.
def rdf_expand_blank_nodes(object)
  g = RDF::Graph.new
  if object.node?
    rdf.query([object, nil, nil]) do |s, p, o|
      g << [s, p, o]
      g << rdf_expand_blank_nodes(o) if o.node?
    end
  end
  g
end

rdf_type = RDF::SCHEMA.Person # for example
rdf.query([nil, RDF.type, rdf_type]).each_subject do |subject|
  g = RDF::Graph.new
  rdf.query([subject, nil, nil]) do |s, p, o|
    g << [s, p, o]
    # follow blank-node objects so no bnode is left dangling
    g << rdf_expand_blank_nodes(o) if o.node?
  end
  # g now holds the subject's statements plus the closure of its blank nodes
end