SPARQL CONSTRUCT query specific hierarchy pattern - sparql

The CONSTRUCT query is supposed to reveal one specific hierarchy: starting from my leaf object (a component of a machine) and going upwards until my root object (the company).
BUT what it actually does is this: once I start at a leaf node and iterate to the next node, every pattern "?prev skos:broader ?next" is allowed. So I have a leaf, go up one level (machine), and next to my start leaf I see multiple other leaves (other machine components), which are valid but not wanted.
construct {
?start a :Start . #start node
?prev # declare the previous variable
skos:broader ?next ; # hierarchy iteration
a ?prevType ; # return type
rdfs:label ?prevName ; #label of the asserted node
.
?next
a ?nextType ;
rdfs:label ?nextName ;
.
}
WHERE
{
GRAPH (named graph)
{
values ?start { <IRI> } #leaf node
?start skos:broader+ ?next .
?prev
skos:broader ?next ;
a ?prevType ;
rdfs:label ?prevName ;
.
?next
a ?nextType ;
rdfs:label ?nextName ;
bind(localname(?prevType) as ?prevTypeName)
bind(localname(?nextType) as ?nextTypeName)
}
}
Pic1: Problem of triples at the same level
So, at the end of my WHERE clause, I tried to add a FILTER EXISTS. The purpose is to keep only the patterns that start at my start node and to disregard everything that is not on that direct path.
This query shows what I want BUT skips the first skos:broader relation. I have my leaf node (component, rdf:type Start), but the skos:broader to the next node in the hierarchy (machine) is missing. From there on, all other levels (up to company) are returned correctly. It's just that first hop.
Big question: how do I declare my start correctly, so that the first skos:broader to my L2 node is asserted?
WHERE
{
GRAPH (named graph)
{
values ?start { <IRI> } #leaf node
?start skos:broader+ ?next .
?prev
skos:broader ?next ;
a ?prevType ;
rdfs:label ?prevName ;
.
?next
a ?nextType ;
rdfs:label ?nextName ;
bind(localname(?prevType) as ?prevTypeName)
bind(localname(?nextType) as ?nextTypeName)
}
#PURPOSE: from all valid skos:broader patterns defined in the construct part above -> keep only the direct paths at each level that derive from ?start
Filter EXISTS {
?start skos:broader+ ?prev .
?prev skos:broader ?next }
}
Pic2: Problem of missing first skos:broader hop

I'm not exactly sure if I understand your problem correctly. But it seems to me that if you want to go 'up' and 'down', from your starting point, the problem might be caused by the
?prev
skos:broader ?next ;
Maybe if you try something like
?start ^skos:broader ?prev
you will not get the undesired results?
Hope I'm not pointing you in the wrong direction.
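Another option, which is not what the answer above proposes, is to look at why the first hop disappears. The FILTER EXISTS requires ?start skos:broader+ ?prev, and "+" needs at least one hop, so ?prev can never be ?start itself. A minimal sketch that uses "*" (zero or more hops) in the main pattern and drops the filter could look like this. The graph IRI, the leaf IRI and the ":" namespace are hypothetical placeholders for the values elided in the question.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :     <http://example.org/ns#>              # hypothetical namespace

CONSTRUCT {
  ?start a :Start .
  ?prev skos:broader ?next ;
        a ?prevType ;
        rdfs:label ?prevName .
  ?next a ?nextType ;
        rdfs:label ?nextName .
}
WHERE {
  GRAPH <http://example.org/graph> {               # hypothetical named graph
    VALUES ?start { <http://example.org/leaf> }    # hypothetical leaf IRI
    # zero or more hops: ?prev is ?start itself or one of its ancestors
    ?start skos:broader* ?prev .
    ?prev skos:broader ?next ;
          a ?prevType ;
          rdfs:label ?prevName .
    ?next a ?nextType ;
          rdfs:label ?nextName .
  }
}
```

Because ?prev is always on a path that begins at ?start, sibling components of the machine should no longer match, while the hop from the leaf to the machine is kept.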

Related

SPARQL - extract subset of graph with filter conditions

I am trying to extract a subset of a graph with a multi-hierarchy.
Example Structure:
root
/ \
SubWant subNotWant
/ \ \
childWant childWant gCWant
/ \
gCWant gCNotWant
/
gGCWant
The subWant subtree is what I would like to get (assume all {'/','\'} are broader/narrower relationships).
My current query is:
CONSTRUCT
{
?children a skos:Concept ;
skos:broader ?parent ;
skos:prefLabel ?childPrefLiteral .
}
WHERE {
?children skos:broader+ <http://example.com#subWant> ;
skos:prefLabel ?childPrefLiteral .
filter(STRBEFORE(?childPrefLiteral, " ") NOT IN (<criteria to prune by>) )
?children skos:broader ?parent .
# filter parent to only have transitive broader relation to desired concept
?parent skos:broader+ <http://example.com#subWant>
}
This query almost works, except that it does not include the direct children of the subWant node. Any recommendations on how to accomplish this elegantly? Note that the last triple, ?parent skos:broader+ <http://example.com#subWant>, is necessary because nodes can have multiple parents (multi-hierarchy); see the gCWant node in the sample graph.
Try changing
?parent skos:broader+ <http://example.com#subWant>
into
?parent skos:broader* <http://example.com#subWant>
The difference between "*" and "+" is that "*" also matches results where ?parent is <http://example.com#subWant>.
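Put together, the whole query with that change might look like the sketch below. The pruning FILTER from the question is left out because its criteria are not given, and the skos: prefix declaration is added on the assumption that the standard SKOS namespace is meant.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

CONSTRUCT {
  ?children a skos:Concept ;
            skos:broader ?parent ;
            skos:prefLabel ?childPrefLiteral .
}
WHERE {
  # every node somewhere below subWant
  ?children skos:broader+ <http://example.com#subWant> ;
            skos:prefLabel ?childPrefLiteral ;
            skos:broader ?parent .
  # "*" also matches the zero-length path, so ?parent may be subWant itself
  ?parent skos:broader* <http://example.com#subWant> .
}
```

Direct children of subWant now match with ?parent bound to subWant, while parents outside the subtree (such as subNotWant for the shared gCWant node) are still excluded.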

Multiple SELECT query for getting all children of a given root

I want to define the root categories corresponding to interests of users. Then I need to return all other potential interests under given root directory.
I tried the following query, but it looks like it enters into a loop (the query is executing internally).
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.myweb.com/myontology.owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?user ?othercat
WHERE
{
?othercat rdfs:subClassOf ?root .
{
SELECT ?user ?retailcat ?root
WHERE {
?user rdf:type owl:User .
?user owl:hasUserProfile ?userprofile .
?userprofile rdf:type owl:UserProfile .
?userprofile owl:interestedIn ?retailcat .
?entity rdf:type ?type .
?type rdfs:subClassOf* ?retailcat .
?retailcat rdfs:subClassOf ?root .
}
}
}
Indeed when I execute the sub-query, it works fine, but it returns current interests of a user, without providing the information of other child-concepts of the same root.
How to solve this issue?
"I tried the following query, but it looks like it enters into a loop (the query is executing internally)."
There shouldn't be a way for a SPARQL query to enter into a loop. Subqueries are executed first, and then the enclosing query is executed. There's no way to re-execute the inner part or anything like that. (Of course, a query could be expensive and take a long time, but that's not a loop, and is still bounded, in principle.)
As an aside, using owl: as prefix for something other than the standard OWL namespace is somewhat confusing, and likely to mislead other developers when they see your query. There's nothing incorrect about it per se, but you might want to consider using a different prefix for your namespace.
You do have one part of your query that could make things rather expensive. You have the pattern
?entity rdf:type ?type .
?type rdfs:subClassOf* ?retailcat .
where ?entity isn't connected to anything else, and isn't used anywhere else. That means that you'll have a subquery solution for every possible value of ?entity, and that just means you're multiplying the number of results by the number of possible values of ?entity.
If I understand your query, you're trying to go from a user's categories of interest, up the category tree to some root concepts and then find other categories under that root. You don't actually need a subquery for that; you can do it with a non-nested query:
select ?user ?othercat {
?user rdf:type owl:User .
?user owl:hasUserProfile ?userprofile .
?userprofile rdf:type owl:UserProfile .
?userprofile owl:interestedIn ?retailcat .
?retailcat rdfs:subClassOf ?root .
?othercat rdfs:subClassOf ?root .
}
That will find values of ?othercat that are siblings of ?retailcat, along with ?retailcat itself. If you want to avoid ?retailcat, you can add
filter(?othercat != ?retailcat)
but that shouldn't really impact performance much; that's just one result to filter out.
The only other factor that you might want to consider is that you're not really finding a "root" of the category tree with rdfs:subClassOf; you're just going up one level. E.g., if your category tree looks like
Automobile
SUV
SportsCar
Corvette
Mustang
and a user is interested in Mustang, then you'll go up to SportsCar and find Corvette, but you won't be going any farther up the tree. (If you have inference available, you may actually go farther up the tree, but I'm assuming for the moment that you don't.) To follow the subclass links up the tree, you can add * to the path to follow the chain:
?retailcat rdfs:subClassOf* ?root .
?othercat rdfs:subClassOf ?root .
Then you'd get all the classes in the tree (except the very top level one, Automobile).
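For reference, the pieces above could be assembled into one query roughly as follows. This is only a sketch: the owl: prefix is kept as the asker's own namespace, and DISTINCT is my addition, on the assumption that a user with several interests under the same root should not see a category twice.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.myweb.com/myontology.owl#>   # the asker's namespace, not standard OWL

SELECT DISTINCT ?user ?othercat
WHERE {
  ?user rdf:type owl:User ;
        owl:hasUserProfile ?userprofile .
  ?userprofile rdf:type owl:UserProfile ;
               owl:interestedIn ?retailcat .
  # walk up zero or more levels from the category of interest
  ?retailcat rdfs:subClassOf* ?root .
  # then take the direct subclasses of every class reached on the way
  ?othercat rdfs:subClassOf ?root .
  # leave out the category the user is already interested in
  FILTER(?othercat != ?retailcat)
}
```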

using bind concat in construct query

I have the following query
CONSTRUCT{
?entity a something;
a label ?label .
}
WHERE
{
?entity a something;
a label ?label .
BIND(CONCAT(STR( ?label ), " | SOME ADDITIONAL TEXT I WOULD LIKE TO APPEND MANUALLY") ) AS ?label ) .
}
I simply want to concatenate some text with ?label, however when running the query I get the following error:
BIND clause alias '?label' was previously used
I only want to return a single instance of ?label hence, I defined it in the construct clause.
The error message is accurate, but it is only the first of many you will get with this query. The usual advice applies: take a look at some SPARQL learning resources to understand the basics of triple-based graph pattern matching. Below are a couple of hints on what to look for. CONSTRUCT isn't a bad place to start, and the following should almost do what I think you intend:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/example#>
CONSTRUCT{
?entity rdfs:label ?label .
}
WHERE
{
?entity a ex:something ;
rdfs:label ?oldlabel .
BIND(CONCAT(STR( ?oldlabel ), " | SOME ADDITIONAL TEXT I WOULD LIKE TO APPEND MANUALLY") AS ?label ) .
}
There are quite a few differences in that query, so take a look to see if it accurately does what you want. One hint is the syntactic difference between using '.' and ';' to separate the triple patterns. Another is that each term in these patterns is either a URL (written as a qname in the example) or a variable prefixed by '?'. Neither 'label' nor 'something' is valid.
I say "almost" because CONSTRUCT only returns a set of triples. To modify the labels, which I think is the intent, you need to use SPARQL Update, i.e.:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://example.org/example#>
DELETE {
?entity rdfs:label ?oldlabel .
}
INSERT{
?entity rdfs:label ?label .
}
WHERE
{
?entity a ex:something .
?entity rdfs:label ?oldlabel .
BIND(CONCAT(STR( ?oldlabel ), " | SOME ADDITIONAL TEXT I WOULD LIKE TO APPEND MANUALLY") AS ?label ) .
}
Note how the triple pattern finds matches for ?oldlabel and deletes them, inserting the newly bound ?label instead. This query assumes a default graph is defined that holds both the original data and the target for updates. If not then the graph needs to be specified using WITH or GRAPH. (Also included another hint on the syntactic difference between using '.' and ';' to separate triple patterns.)
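If the data lives in a named graph, a version using WITH might look like the sketch below. The graph IRI is a hypothetical placeholder; WITH makes the DELETE, INSERT and WHERE parts all apply to that one graph.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/example#>

WITH <http://example.org/graph>        # hypothetical graph IRI
DELETE {
  ?entity rdfs:label ?oldlabel .
}
INSERT {
  ?entity rdfs:label ?label .
}
WHERE {
  ?entity a ex:something ;
          rdfs:label ?oldlabel .
  BIND(CONCAT(STR(?oldlabel), " | SOME ADDITIONAL TEXT I WOULD LIKE TO APPEND MANUALLY") AS ?label)
}
```

Keep in mind that running the update a second time appends the text again, since the new label then matches as ?oldlabel.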

Sparql to recover the Type of a DBpedia resource

I need a Sparql query to recover the Type of a specific DBpedia resource. Eg.:
pt.DBpedia resource: http://pt.dbpedia.org/resource/Argentina
Expected type: Country (as can be seen at http://pt.dbpedia.org/page/Argentina)
Using pt.DBpedia Sparql Virtuoso Interface (http://pt.dbpedia.org/sparql) I have the query below:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?l ?t where {
?l rdfs:label "Argentina"@pt .
?l rdf:type ?t .
}
But it does not return anything; it just prints the variable names in the Virtuoso answer.
Actually, I do not need to retrieve the label (?l) either.
Can anyone fix it, or help me define the correct query?
http in graph name
I'm not sure how you generated your query string, but when I copy and paste your query into the endpoint and run it, I get results, and the resulting URL looks like:
http://pt.dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fpt.dbpedia.org&sho...
However, the link in your question is:
http://pt.dbpedia.org/sparql?default-graph-uri=pt.dbpedia.org%2F&should-sponge...
If you look carefully, you'll see that the default-graph-uri parameters are different:
yours: pt.dbpedia.org%2F
mine: http%3A%2F%2Fpt.dbpedia.org
I'm not sure how you got a URL like the one you did, but it's not right; the default-graph-uri needs to be http://pt.dbpedia.org, not pt.dbpedia.org/.
The query is fine
When I run the query you've provided at the endpoint you've linked to, I get the results that I'd expect. It's worth noting that the label here is the literal "Argentina"@pt, and that what you've called ?l is the individual, not the label. The individual ?l has the label "Argentina"@pt.
We can simplify your query a bit, using ?i instead of ?l (to suggest individual):
select ?i ?type where {
?i rdfs:label "Argentina"@pt ;
a ?type .
}
When I run this at the Portuguese endpoint, I get these results:
If you don't want the individual in the results, you don't have to select it:
select ?type where {
?i rdfs:label "Argentina"@pt ;
a ?type .
}
or even:
select ?type where {
[ rdfs:label "Argentina"@pt ; a ?type ]
}
If you know the identifier of the resource, and don't need to retrieve it by using its label, you can even just do:
select ?type where {
dbpedia-pt:Argentina a ?type
}
type
==========================================
http://www.w3.org/2002/07/owl#Thing
http://www.opengis.net/gml/_Feature
http://dbpedia.org/ontology/Place
http://dbpedia.org/ontology/PopulatedPlace
http://dbpedia.org/ontology/Country
http://schema.org/Place
http://schema.org/Country
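If only the DBpedia ontology classes are of interest (the question expects "Country"), one possible refinement is to filter the types by namespace. This is a sketch rather than part of the answer above. It assumes a SPARQL 1.1 endpoint (for STRSTARTS) and declares rdfs: explicitly so the query does not depend on predefined prefixes. The label is a language-tagged literal, written with @pt.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?type
WHERE {
  ?i rdfs:label "Argentina"@pt ;
     a ?type .
  # keep only classes from the DBpedia ontology namespace
  FILTER(STRSTARTS(STR(?type), "http://dbpedia.org/ontology/"))
}
```

Applied to the type list above, this would keep only the entries under http://dbpedia.org/ontology/. Picking the most specific one (Country) would still need extra logic.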

Limit a SPARQL query to one dataset

I'm working with the following SPARQL query, which is an example on the web front end of my institution's SPARQL endpoint:
SELECT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
The problem is that as well as getting data from 'Buildings and Places', the Dataset I'm interested in, and would expect the example to use, it also gets data from the 'Facilities and Equipment' dataset, which isn't relevant. You should see this if you follow the link.
I suspect the example may pre-date the addition of the Facilities and Equipment dataset, but even with the research I've done into SPARQL, I can't see a clear way to define which datasets to include.
Can anyone recommend a starting point to limit it to just show 'Buildings', or, more specifically, results from the 'Buildings and Places' dataset.
Thanks
First things first, you really need to use SELECT DISTINCT, as otherwise you'll get repeated results.
To answer your question, you can use GRAPH { ... } to restrict certain parts of a SPARQL query so that they only match data from a specific dataset. This only works if the SPARQL endpoint is divided up into GRAPHs (this one is). The solution you asked for isn't the best choice, as it assumes that things within sites in the 'places' dataset will always be restricted to buildings. That's risky, as the dataset might end up containing trees and signposts at some time in the future.
Step one is to just find out what graphs are in play:
SELECT DISTINCT ?g1 ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH ?g1 { ?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
Try it here: http://is.gd/WdRAGX
From this you can see that http://id.southampton.ac.uk/dataset/places/latest and http://id.southampton.ac.uk/dataset/places/facilities are the two relevant ones.
To only look for things 'within' a site according to the "places" graph, use:
SELECT DISTINCT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH <http://id.southampton.ac.uk/dataset/places/latest> {
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
Alternate solutions:
Using rdf:type
Above I've answered your question, but it's not the answer to your problem. This solution is more semantic as it actually says 'only give me buildings within the campus' which is what you really mean.
Instead of filtering by graph, which is not very 'semantic', you could also restrict ?building to be of class 'building', which research facilities are not. Facilities are still sometimes listed as 'within' a site, usually when the uni has only published which campus they are on but not which building.
?building a rooms:Building
Using FILTER
In extreme cases you may not have data in different GRAPHS and there may not be an elegant relationship to use to filter your results. In this case you can use a FILTER and turn the building URI into a string and use a regular expression to match acceptable ones:
FILTER regex(str(?building), "^http://id.southampton.ac.uk/building/")
This is by far the worst option; don't use it unless you have to.
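If you do end up filtering on the URI string, a plain prefix test may be a little cleaner than a regular expression. The sketch below assumes the endpoint supports SPARQL 1.1 (STRSTARTS is a 1.1 function) and that it predefines the org:, spacerel:, skos: and rdfs: prefixes, as the example queries here appear to rely on.

```sparql
SELECT DISTINCT ?building_number ?name WHERE {
  ?site a org:Site ;
        rdfs:label "Highfield Campus" .
  ?building spacerel:within ?site ;
            skos:notation ?building_number ;
            rdfs:label ?name .
  # keep only resources whose URI starts with the building namespace
  FILTER(STRSTARTS(STR(?building), "http://id.southampton.ac.uk/building/"))
} ORDER BY ?name
```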
Belt and Braces
You can use any of these restrictions together, and a combination of restricting the GRAPH plus ensuring that all ?buildings really are buildings would be my recommended solution.
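A combined query along those lines might look like the following sketch. It assumes the endpoint predefines the prefixes used (org:, spacerel:, skos:, rdfs:, soton:, rooms:) and that the rooms:Building type triple is visible in the default graph. If it is not, the type pattern would have to move inside the GRAPH block.

```sparql
SELECT DISTINCT ?building_number ?name ?occupants WHERE {
  ?site a org:Site ;
        rdfs:label "Highfield Campus" .
  # only accept 'within' statements from the places dataset
  GRAPH <http://id.southampton.ac.uk/dataset/places/latest> {
    ?building spacerel:within ?site ;
              skos:notation ?building_number ;
              rdfs:label ?name .
  }
  # and make sure the thing really is a building
  ?building a rooms:Building .
  OPTIONAL {
    ?building soton:buildingOccupants ?occ .
    ?occ rdfs:label ?occupants .
  }
} ORDER BY ?name
```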