Simplify SPARQL query - sparql

I’m trying to make a rather complex call to DBPedia using a SPARQL query. I’d like to get some information about a city (district, federal state/»Bundesland«, postal codes, coordinates and geographically related cities).
SELECT * WHERE {
#input
?x rdfs:label "Bentzin"@de.
#district
OPTIONAL {
?x dbpedia-owl:district ?district
# ?x dbpprop:landkreis ?district
{ SELECT * WHERE {
?district rdfs:label ?districtName
FILTER(lang(?districtName) = "de")
?district dbpprop:capital ?districtCapital
{ SELECT * WHERE {
?districtCapital rdfs:label ?districtCapitalName
FILTER(lang(?districtCapitalName) = "de")
}}
}}
}
#federal state
OPTIONAL {
# ?x dbpprop:bundesland ?land
?x dbpedia-owl:federalState ?land
{ SELECT * WHERE {
?land rdfs:label ?landName
FILTER(lang(?landName) = "de")
}}
}
#postal codes
?x dbpedia-owl:postalCode ?zip.
#coordinates
?x geo:lat ?lat.
?x geo:long ?long
#cities in the south
OPTIONAL {
?x dbpprop:south ?south
{SELECT * WHERE {
?south rdfs:label ?southName
FILTER(lang(?southName) = "de")
}}
}
#cities in the north
OPTIONAL {
?x dbpprop:north ?north
{ SELECT * WHERE {
?north rdfs:label ?northName
FILTER(lang(?northName) = "de")
}}
}
#cities in the west
...
}
This works in some cases; however, there are a few major problems.
There are several different properties that may contain the value for the federal state or district. Sometimes it’s dbpprop:landkreis (the German word for district); in other cases it’s dbpedia-owl:district. Is it possible to combine those two in cases where only one of them is set?
Further, I’d like to read out the names of the cities in the north, northwest, …. Sometimes, these cities are referenced in dbpprop:north etc. The basic query for each direction is the same:
OPTIONAL {
?x dbpprop:north ?north
{ SELECT * WHERE {
?north rdfs:label ?northName
FILTER(lang(?northName) = "de")
}}
}
I really don’t want to repeat that eight times, once for every direction. Is there any way to simplify this?
Sometimes, there are multiple other cities referenced (example). In those cases, there are multiple datasets returned. Is there any possibility to get a list of the names of those cities in a single dataset instead?
+---+---+---------------------------------------------------------------+
| x | … | southName |
+---+---+---------------------------------------------------------------+
| … | … | "Darmstadt"@de, "Stuttgart"@de, "Karlsruhe"@de, "Mannheim"@de |
+---+---+---------------------------------------------------------------+
Your feedback and your ideas are greatly appreciated!
Till

There are several different properties that may contain the value for the federal state or district. Sometimes it’s dbpprop:landkreis (the
German word for district); in other cases it’s dbpedia-owl:district. Is
it possible to combine those two in cases where only one of them is
set?
SPARQL property paths are great for this. You can just say
?subject dbpprop:landkreis|dbpedia-owl:district ?district
If there are more properties, you'll probably prefer a version with values:
values ?districtProperty { dbpprop:landkreis dbpedia-owl:district }
?subject ?districtProperty ?district
Further, I’d like to read out the names of the cities in the north,
northwest, …. Sometimes, these cities are referenced in dbpprop:north
etc. The basic query for each direction is the same:
OPTIONAL {
?x dbpprop:north ?north
{ SELECT * WHERE {
?north rdfs:label ?northName
FILTER(lang(?northName) = "de")
}}
}
Again, it's values to the rescue. Also, don't use lang(…) = … to filter languages; use langMatches, which also accepts regional variants of the language (such as "de-AT" for "de"):
optional {
values ?directionProp { dbpprop:north
#-- ...
dbpprop:south }
?subject ?directionProp ?direction
optional {
?direction rdfs:label ?directionLabel
filter langMatches(lang(?directionLabel),"de")
}
}
Sometimes, there are multiple other cities referenced (example). In
those cases, there are multiple datasets returned. Is there any
possibility to get a list of the names of those cities in a single
dataset instead?
+---+---+---------------------------------------------------------------+
| x | … | southName |
+---+---+---------------------------------------------------------------+
| … | … | "Darmstadt"@de, "Stuttgart"@de, "Karlsruhe"@de, "Mannheim"@de |
+---+---+---------------------------------------------------------------+
That's what group by and group_concat are for. See Aggregating results from SPARQL query. I don't actually see these results in the query you gave though, so I don't have good data to test a result with.
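As a rough sketch of that aggregation (untested against live data, and assuming the endpoint predefines the rdfs: and dbpprop: prefixes the way DBpedia's does), the labels of the places to the south can be collapsed into one row per city. The variable name ?southNames is only illustrative:

```sparql
select ?x (group_concat(distinct ?southName; separator=", ") as ?southNames)
where {
  ?x rdfs:label "Bentzin"@de .
  #-- the German labels of everything referenced by dbpprop:south, if any
  optional {
    ?x dbpprop:south/rdfs:label ?southName
    filter langMatches(lang(?southName), "de")
  }
}
group by ?x
```

Note that group_concat returns a plain string, so the language tags of the individual labels are not kept.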
You also seem to be doing a lot of unnecessary subselects. You can just put additional triples in the graph pattern; you don't need a nested query to get additional information.
With those considerations, your query becomes:
select * where {
?x rdfs:label "Bentzin"@de ;
dbpedia-owl:postalCode ?zip ;
geo:lat ?lat ;
geo:long ?long
#-- district
optional {
?x dbpedia-owl:district|dbpprop:landkreis ?district .
?district rdfs:label ?districtName
filter langMatches(lang(?districtName),"de")
optional {
?district dbpprop:capital ?districtCapital .
?districtCapital rdfs:label ?districtCapitalName
filter langMatches(lang(?districtCapitalName),"de")
}
}
#federal state
optional {
?x dbpprop:bundesland|dbpedia-owl:federalState ?land .
?land rdfs:label ?landName
filter langMatches(lang(?landName),"de")
}
values ?directionProp { dbpprop:south dbpprop:north }
optional {
?x ?directionProp ?directionPlace .
?directionPlace rdfs:label ?directionName
filter langMatches(lang(?directionName),"de")
}
}
Now, if you're just looking for the names of these things, without the associated URIs, you can actually use property paths to shorten a lot of the patterns that retrieve labels. E.g.:
select * where {
?x rdfs:label "Bentzin"@de ;
dbpedia-owl:postalCode ?zip ;
geo:lat ?lat ;
geo:long ?long
#-- district
optional {
?x (dbpedia-owl:district|dbpprop:landkreis)/rdfs:label ?districtName
filter langMatches(lang(?districtName),"de")
optional {
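#-- ?district is not bound in this version, so follow the path from ?x again
?x (dbpedia-owl:district|dbpprop:landkreis)/dbpprop:capital/rdfs:label ?districtCapitalName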
filter langMatches(lang(?districtCapitalName),"de")
}
}
#-- federal state
optional {
?x (dbpprop:bundesland|dbpedia-owl:federalState)/rdfs:label ?landName
filter langMatches(lang(?landName),"de")
}
optional {
values ?directionProp { dbpprop:south dbpprop:north }
?x ?directionProp ?directionPlace .
?directionPlace rdfs:label ?directionName
filter langMatches(lang(?directionName),"de")
}
}

Related

SPARQL: check for the existence of a property and return zero as the result

This is my minimum data:
@prefix : <http://example.org/rs#> .
:item :hasContext [:weight 0.1 ; :doNotRecommend true] , [:weight 0.2 ] .
:anotherItem :hasContext [:weight 0.4] , [ :weight 0.5 ] .
As you can see, each item has one or more :hasContext values, and the object of each :hasContext is an instance that may have a :doNotRecommend predicate.
What I want is this: if any of these instances (the objects of :hasContext) has :doNotRecommend, the whole sum should be zero. By sum I mean the sum of the weights. In other words, if that property exists, ignore all the weights (whether they are there or not) and just return zero.
My query
select ?item (SUM(?finalWeight) as ?summedFinalWeight) {
?item :hasContext ?context .
optional
{
?context :doNotRecommend true .
bind( 0 as ?cutWeight)
}
optional
{
?context :weight ?weight .
}
bind ( if(bound(?cutWeight), ?cutWeight , if(bound(?weight), ?weight, 0.1) ) as ?finalWeight )
}
group by ?item
The result
Look at the value for :item: it is 0.2. I know the reason: it is 0.2 plus zero (the zero is there because of the doNotRecommend). But I don't know the solution. What I want is zero in the case of :item.
(Hint: I know that I can always wrap this in another query at an upper level, or solve it using filter not exists, but I am looking to solve it in the same query. What I showed you is minimal data; in my ontology, getting that weight and these objects is a very long query.)
Update 1
This is my real query. The first part (before the union) checks whether the user conforms to a context. The second part (after the union) checks whether the user does not conform to a context, and here I want to check whether that context has a doNotRecommend flag or not. Please note that it is impossible for both parts to match together.
SELECT ?item (SUM(?finalWeightFinal) AS ?userContextWeight)
WHERE
{ VALUES ?user { bo:ania }
?item rdf:type rs:RecommendableClass
OPTIONAL
{ { FILTER EXISTS { ?item rdf:type ?itemClass }
?item rdf:type rs:RecommendableClass .
?userContext rdf:type rs:UserContext ;
rs:appliedOnItems ?itemClass ;
rs:appliedOnUsers ?userClass
FILTER EXISTS { ?user rdf:type ?userClass }
OPTIONAL
{ ?userContext rs:hasWeightIfContextMatched ?weight }
BIND(if(bound(?weight), ?weight, 0.2) AS ?finalWeight)
}
UNION
{ ?item rdf:type rs:RecommendableClass .
?userContext rdf:type rs:UserContext ;
rs:appliedOnItems ?itemClass ;
rs:appliedOnUsers ?userClass
FILTER EXISTS { ?item rdf:type ?itemClass }
FILTER NOT EXISTS { ?user rdf:type ?userClass }
OPTIONAL
#Here is the skip
{ ?userContext rs:doNotRecommendInCaseNotMatch true
BIND(0 AS ?skip)
}
OPTIONAL
{ ?userContext rs:hasWeightIfContextDoesNotMatch ?weight }
BIND(if(bound(?weight), ?weight, 0.1) AS ?finalWeight)
}
}
BIND(if(bound(?finalWeight), ?finalWeight, 1) AS ?finalWeightFinal)
}
GROUP BY ?item
Update 2
After the much-appreciated answer from @Joshua Taylor, I tried to apply his approach to the real case, this time adding filter !bound(?skip).
Here is the query
SELECT ?item ?itemClass ?userContext ?skip ?finalWeight
WHERE
{ #-- (in this block I just select the items that I want to calculate the user context for)
OPTIONAL
{ FILTER EXISTS { ?item rdf:type ?itemClass }
?userContext rdf:type rs:UserContext ;
rs:appliedOnItems ?itemClass ;
rs:appliedOnUsers ?userClass
OPTIONAL
{ ?userContext rs:hasWeightIfContextMatched ?weightMatched }
OPTIONAL
{ ?userContext rs:hasWeightIfContextDoesNotMatch ?weightNotMatched }
OPTIONAL
{ ?userContext rs:doNotRecommendInCaseNotMatch true
BIND(1 AS ?skip)
}
BIND(if(EXISTS { ?user rdf:type ?userClass }, coalesce(?weightMatched, "default User Matched"), coalesce(?weightNotMatched, "default User not matched")) AS ?weight)
}
BIND(if(bound(?weight), ?weight, "no user context found for this item") AS ?finalWeight)
FILTER ( ! bound(?skip) )
}
It works with the data that I have, but I only have test data right now, so I want to ask you whether it is correct.
Update 3
My query generates these fields:
item skip ...
and the filter removes the rows that have a binding for skip. But let's say that an item has several rows, like this:
item skip
A 1
A
A
In my case the filter will only remove the first row. I need to know whether I can remove all the rows for that item.
There are lots of ways to do this; here's one that gets each item's sum weight, and then checks whether the item has a do not recommend flag, and if it does, uses 0 as the total weight:
select ?item (if(?skip, 0.0, ?sumWeight_) as ?sumWeight) {
{ select ?item (sum(?weight) as ?sumWeight_) where {
?item :hasContext/:weight ?weight .
}
group by ?item
}
#-- exists always yields true or false, so test the value of ?skip rather than bound(?skip)
bind(exists { ?item :hasContext/:doNotRecommend true } as ?skip)
}
----------------------------
| item | sumWeight |
============================
| :item | 0.0 |
| :anotherItem | 0.9 |
----------------------------
Conceptually, this query checks once for each item whether any of its contexts mark it as non-recommendable. I think that's relatively efficient.
On bind(exists { … } as ?skip)
Note the combination of bind and exists. You already know how bind works, as you've used it plenty of times: bind(expr as ?variable) evaluates the expression expr and assigns the result to the variable ?variable. You've probably used exists (and not exists) in filter expressions before. exists { … } is true if the pattern inside the braces matches in the graph, and false otherwise; not exists { … } is the same, but reversed. The pattern
?item :hasContext/:doNotRecommend true
is just shorthand, using a property path, for the pattern:
?item :hasContext ?something .
?something :doNotRecommend true .
In this case, if that pattern exists, then we want to skip the item's sum weight and use zero instead.
Alternative
If you're willing to compute the sum for all the items, and then exclude those that have at least one non-recommendable context, you can do that, too. The trick is just to figure out how to count the number of skips:
select ?item (sum(?weight_) as ?weight){
?item :hasContext ?context .
?context :weight ?weight_ .
bind(exists { ?context :doNotRecommend true } as ?skip)
}
group by ?item
having (sum(if(?skip,1,0)) = 0)
Considerations
You mentioned that
I know that I can always wrap this in another query at an upper level,
or solve it using filter not exists, but I am looking to solve it in
the same query. What I showed you is minimal data; in my ontology,
getting that weight and these objects is a very long query.
The solution above computes the sum weights first, and then decides which to use and which to discard. That means that there's some unnecessary computation. Your solution does something similar: it computes weights for contexts that don't have the :doNotRecommend property, even if some other context for the same item has a doNotRecommend property. If you really want to avoid the unnecessary computation, then you should figure out which items are recommendable first, and then compute scores for those, and figure out which items are not recommendable, and just return zero for those.
It's easy to get a list of which items are which: something like
select distinct ?item ?skip {
?item :hasContext ?anything .
bind(exists { ?item :hasContext/:doNotRecommend true } as ?skip)
}
will do it just fine. However, you'd want to do different things with the skippable and the non-skippable items, and that would probably take the form of a union of the two alternatives. Then you have the problem that you'd have to repeat the same subquery in each one. (Or use exists in one and not exists in the other, which is essentially repeating the same query.) It would get ugly pretty quickly. It might look something like this:
select ?item ?weight {
{
#-- get non recommendable items and
#-- set their weights to 0.0.
select distinct ?item (0.0 as ?weight) {
?item :hasContext/:doNotRecommend true #-- (*)
}
}
union
{
#-- get recommendable items and
#-- their aggregate weights
select ?item (sum(?weight_) as ?weight) {
#-- find the recommendable items
{ select distinct ?item {
?item :hasContext ?x .
filter not exists { ?item :hasContext/:doNotRecommend true } #-- (*)
}
}
#-- and get their context's weights.
?item :hasContext/:weight ?weight_
}
group by ?item
}
}
-------------------------
| item | weight |
=========================
| :item | 0.0 |
| :anotherItem | 0.9 |
-------------------------
The problem, in my opinion, is that the lines marked with (*) are really doing the same thing. The other computation doesn't happen multiple times, which is good, but we're still checking twice for each item whether it's recommendable or not.

Query for resources using their URIs

I have a bunch of resource URIs, and I need the property values related to each of them. For a single resource, say <http://my.url/res#resourceUri>, I can write this query:
PREFIX v: <http://my.url/res#>
SELECT ?name
WHERE {
<http://my.url/res#resourceUri> a v:t;
rdfs:label ?name .
}
For multiple resources, I can use UNION, like this:
PREFIX v: <http://my.url/res#>
SELECT ?name
WHERE {
{ <http://my.url/res#resourceUri> a v:t; rdfs:label ?name } UNION
{ <http://my.url/res#anotherResource> a v:t; rdfs:label ?name }
}
Is there a way to write a shorter, leaner version of this second query?
You can use values for this. Your example would be written as
PREFIX v: <http://my.url/res#>
SELECT ?resource ?name WHERE {
values ?resource { <http://my.url/res#resourceUri>
<http://my.url/res#anotherResource> }
?resource a v:t;
rdfs:label ?name
}
The question is different, but the answer to how to use Union/or in sparql path with arbitrary length? is similar.

SPARQL different operator

SPARQL Query
I have some SPARQL query shown below:
SELECT DISTINCT ?name1
WHERE {
GRAPH <blabla>
{
?k swrc:author ?x .
?x foaf:name ?name1 .
} .
GRAPH <blabla2>
{
?l swrc:author ?y .
?y foaf:name ?name2 .
} .
FILTER(?x != ?y) .
}
I want to get the names that exist only in the first graph blabla.
Problem
Counterintuitively, I get some names that actually belong to the intersection. Does this happen because b (of set A) = b (of set B)?
Question
What exactly are the semantics of !=? How can I get around this problem?
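The semantics of != are exactly that its left argument is not equal to its right argument. But a FILTER is evaluated for every possible combination of values, so the query as you formulated it will return all name-values of ?x for which some value of ?y is not equal to it. For example, if the first graph has authors a and b and the second graph has authors b and c, then b still passes the filter through the combination (b, c).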
If you want to get back only name-values of ?x for which all values of ?y are not equal to it, you should be using a NOT EXISTS clause:
SELECT DISTINCT ?name1
WHERE {
GRAPH <blabla>
{
?k swrc:author ?x.
?x foaf:name ?name1.
}
FILTER NOT EXISTS {
GRAPH <blabla2>
{
?l swrc:author ?x.
}
}
}
Note that using this approach you can actually get rid of variable ?y altogether: you change the condition to just check that author ?x as found in the first graph does not also occur in the second graph.
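If the two graphs use different nodes for the same author (separate blank nodes, for instance), comparing ?x directly will not detect the overlap. A possible variation, assuming that foaf:name is what identifies an author in your data and that the swrc: and foaf: prefixes are declared as in the question, is to compare on the name instead:

```sparql
SELECT DISTINCT ?name1
WHERE {
  GRAPH <blabla> {
    ?k swrc:author ?x .
    ?x foaf:name ?name1 .
  }
  # Drop ?name1 if any author in the second graph has the same name.
  FILTER NOT EXISTS {
    GRAPH <blabla2> {
      ?l swrc:author ?y .
      ?y foaf:name ?name1 .
    }
  }
}
```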

retrieve list of mathematics categories from dbpedia?

Is there a way to use SPARQL to retrieve all mathematics topics in DBpedia?
http://dbpedia.org/snorql/
That is to say, is there a way to extract all the subfields of the topics listed here:
http://en.wikipedia.org/wiki/Lists_of_mathematics_topics
The broad topics are listed here: http://dbpedia.org/page/Category:Fields_of_mathematics
I would like a list that shows each parent class and its subfields.
question 1:
depends on how you define topic....
you can query for instance for skos:Concept:
SELECT ?con
WHERE {
?con a skos:Concept
}
limit 1000
question 2:
you can query for skos:broader properties, like:
SELECT ?parent (?label AS ?subLabel)
WHERE {
{
?sub skos:broader <http://dbpedia.org/resource/Category:Fields_of_mathematics> .
?sub rdfs:label ?label .
} UNION {
<http://dbpedia.org/resource/Category:Fields_of_mathematics> rdfs:label ?parent
}
}
retrieve a list of the next level of sub-fields of the above fields with:
SELECT ?parent ?sub ?subsub
WHERE {
{
?sub skos:broader <http://dbpedia.org/resource/Category:Fields_of_mathematics> .
OPTIONAL {?subsub dcterms:subject ?sub}
} UNION {
<http://dbpedia.org/resource/Category:Fields_of_mathematics> rdfs:label ?parent
}
}
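If the endpoint supports SPARQL 1.1 property paths, the whole hierarchy below the category can be walked in one query instead of one level at a time. This is only a sketch: the category graph is large and contains cycles, so a public endpoint may time out or cut the result off, and the LIMIT value is an arbitrary choice.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?parent ?sub
WHERE {
  # ?parent is the root category itself or any category below it
  ?parent skos:broader* <http://dbpedia.org/resource/Category:Fields_of_mathematics> .
  # ?sub is a direct subcategory of ?parent
  ?sub skos:broader ?parent .
}
LIMIT 1000
```

Each row is one parent/subfield pair, so a parent with several subfields appears on several rows.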

extract city data from dbpedia or LinkedGeoData

I've been trying for a couple of hours now to figure out how to get various pieces of information out of DBpedia or LinkedGeoData.
I used this interface (http://dbpedia.org/snorql) and tried several different approaches, but I never got the result that I need.
If I use something like this:
SELECT * WHERE {
?subject rdf:type <http://dbpedia.org/ontology/City>.
OPTIONAL {
?subject <http://dbpedia.org/ontology/populationTotal> ?populationTotal.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/populationUrban> ?populationUrban.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/areaTotal> ?areaTotal.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/populationUrbanDensity> ?populationUrbanDensity.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/isPartOf> ?isPartOf.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/country> ?country.
}
OPTIONAL {
?subject <http://dbpedia.org/ontology/utcOffset> ?utcOffset.
}
OPTIONAL {
?subject <http://dbpedia.org/property/janHighC> ?janHighC.
}
OPTIONAL {
?subject <http://dbpedia.org/property/janLowC> ?janLowC.
}
}
LIMIT 20
I run out of limits.
I also tried this:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT * WHERE {
?subject rdf:type <http://dbpedia.org/ontology/City>.
?subject rdfs:label ?label.
FILTER ( lang(?label) = 'en'
}
LIMIT 100
But that gives me an error, which I don't understand. If I remove the FILTER, it works but gives me the labels in all languages...
What I'm looking for is something like this: http://dbpedia.org/page/Vancouver
Not all of the data, though, just some of it, like population, area, country, elevation, lat, long, timezone, label@en, abstract@en, etc.
Can someone help me get the syntax working?
Thanks for all your help.
UPDATE:
I got it to work so far with:
SELECT DISTINCT *
WHERE {
?city rdf:type dbpedia-owl:Settlement ;
rdfs:label ?label;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:populationTotal ?pop ;
dbpedia-owl:country ?country ;
dbpprop:website ?website .
FILTER ( lang(?abstract) = 'en' && lang(?label) = 'en')
}
LIMIT 20
But I still run out of limits if I want to get all settlements. By the way, is there a way to get all cities and settlements in one table?
By "run out of limits", do you mean the error "Bandwidth Limit Exceeded URI = '/!sparql/'"? I guess this is a limit set by dbpedia to make sure that it is not flooded with queries that take "forever" to run, and if so, then there is probably not so much you can do. You can ask for data in chunks, using OFFSET, LIMIT and ORDER BY, see http://www.w3.org/TR/rdf-sparql-query/#modOffset.
UPDATE: Yes, this seems to be the way to go: http://www.mail-archive.com/dbpedia-discussion@lists.sourceforge.net/msg03368.html
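A sketch of that chunked approach, reusing the prefixes from the queries in the question. The page size of 1000 is an arbitrary assumption and should stay at or below whatever the endpoint allows. ORDER BY gives the pages a stable order, so increasing OFFSET by the page size on each request walks through the full result:

```sparql
SELECT ?city ?label
WHERE {
  ?city rdf:type dbpedia-owl:Settlement ;
        rdfs:label ?label .
  FILTER ( lang(?label) = 'en' )
}
ORDER BY ?city ?label
LIMIT 1000
OFFSET 2000   # third page: rows 2001 to 3000
```

You stop requesting pages once a request returns fewer rows than the page size.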
In the second query the error is a missing parenthesis. This
FILTER ( lang(?label) = 'en'
should be
FILTER ( lang(?label) = 'en')
For your last question, a natural way to collect the results of several similar queries in one table is to use UNION, e.g.,
SELECT ?x
WHERE {
{ ?x rdf:type dbpedia-owl:City }
UNION
{ ?x rdf:type dbpedia-owl:Settlement }
}
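With SPARQL 1.1, the same result can also be written with values instead of UNION. This is a sketch under the assumption that the endpoint supports values and predefines the rdf: and dbpedia-owl: prefixes. DISTINCT is there because a resource typed as both City and Settlement would otherwise appear twice:

```sparql
SELECT DISTINCT ?x
WHERE {
  VALUES ?type { dbpedia-owl:City dbpedia-owl:Settlement }
  ?x rdf:type ?type .
}
```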