Getting city information with SPARQL

I tried the following SPARQL query:
SELECT distinct ?city ?cityName ?country ?population ?knownfor WHERE {
?city rdf:type dbo:City .
?city rdfs:label ?cityName.
?city dbo:country ?country.
OPTIONAL{
?city dbp:population ?population.
?city dbo:knownFor ?knownfor.
}
FILTER (lang(?cityName) = 'en')
} ORDER BY ?city
But there is a problem.
Not every city has the dbp:population predicate; some cities have dbp:populationTotal instead. So for some cities we can get the population this way, but when I put both predicates in the OPTIONAL section of the query:
OPTIONAL{
?city dbp:population ?population .
?city dbp:populationTotal ?populationTotal
}
both columns come back blank. The same goes for the dbo:knownFor predicate (not every city has a knownFor predicate).
How can I specify in the query that I want only European cities? I cannot find any predicate that specifies the continent of a city.

First thing to know is that DBpedia data is a moving target, just like the Wikipedia data from which it is derived. Updates to Wikipedia will eventually be part of DBpedia. More quickly, they'll be part of DBpedia-live.
The issue with values for neither OPTIONAL predicate showing up when you include both predicates appears to be a bug in the Virtuoso version currently hosting DBpedia. I encourage you to check whether it's been reported yet, report it yourself if not, and monitor the issue.
As to limiting the continent of the cities you get back -- it's usually easiest to inspect an entity of (or near) the sort you want and look for a relevant attribute/predicate/property. For instance, Aachen-Mitte has a dbo:country of Germany, which has a number of rdf:types including yago:EuropeanCountries -- which might be what you want, but might not yet have been applied to every European country. You'll need to add a triple to your pattern like --
?country a yago:EuropeanCountries
Edit To Add
The OPTIONAL { ... } clause returns results for the entire pattern enclosed within the braces. So --
OPTIONAL
{
?city dbp:population ?population .
?city dbo:knownFor ?knownfor .
}
-- will only return values for either predicate when that ?city has values for both predicates.
If you want to get every value for either predicate, you need to split that clause into two --
OPTIONAL
{
?city dbp:population ?population .
}
OPTIONAL
{
?city dbo:knownFor ?knownfor .
}
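The difference between one combined OPTIONAL and two separate ones can be sketched outside SPARQL. The following is only a toy Python model, not how any endpoint is implemented: the city names and values are made up, and it assumes each predicate has at most one value per city.

```python
# Toy data (hypothetical): which predicates each city happens to have.
cities = {
    "A": {"population": 100, "knownFor": "harbour"},
    "B": {"population": 200},
    "C": {"knownFor": "cathedral"},
    "D": {},
}

def optional_group(props, keys):
    """One OPTIONAL block holding several triples: all match, or nothing binds."""
    if all(k in props for k in keys):
        return {k: props[k] for k in keys}
    return {k: None for k in keys}

def optional_each(props, keys):
    """One OPTIONAL block per triple: each variable binds independently."""
    return {k: props.get(k) for k in keys}

keys = ["population", "knownFor"]
combined = {c: optional_group(p, keys) for c, p in cities.items()}
split = {c: optional_each(p, keys) for c, p in cities.items()}

for c in cities:
    print(c, "combined:", combined[c], "split:", split[c])
```

In this model, cities B and C lose their single value under the combined form but keep it under the split form; only city A looks the same in both.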
It's easy to get both dbp:population and dbp:populationTotal by giving each its own OPTIONAL clause (and adding ?populationTotal to your SELECT list) --
SELECT DISTINCT ?city
?cityName
?country
?population
?populationTotal
?knownfor
...
OPTIONAL
{
?city dbp:population ?population .
}
OPTIONAL
{
?city dbp:populationTotal ?populationTotal .
}
OPTIONAL
{
?city dbo:knownFor ?knownfor .
}
If you only want one of the population values, and especially if you have a preference of one predicate over the other, the construction gets more complex (and should be a new question).
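As a starting point for that more complex construction, one common option is SPARQL 1.1's COALESCE, which returns its first argument that is bound and raises no error. The sketch below wraps such a query in a small Python helper that only builds the request URL (nothing is sent). The endpoint URL, the preference for dbp:populationTotal over dbp:population, the ?pop variable name and the LIMIT are all assumptions for illustration, and the query is untested against the live data.

```python
from urllib.parse import urlencode

# Assumption: the public DBpedia endpoint; change this if you use a mirror.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

SELECT DISTINCT ?city ?cityName ?pop WHERE {
  ?city a dbo:City ;
        rdfs:label ?cityName .
  OPTIONAL { ?city dbp:populationTotal ?populationTotal . }
  OPTIONAL { ?city dbp:population ?population . }
  # Prefer populationTotal; fall back to population when it is unbound.
  BIND(COALESCE(?populationTotal, ?population) AS ?pop)
  FILTER (lang(?cityName) = 'en')
}
ORDER BY ?city
LIMIT 100
"""

def request_url(query, endpoint=ENDPOINT):
    """Build a GET URL that passes the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

print(request_url(QUERY))
```

If neither predicate is present, ?pop simply stays unbound for that city.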

Related

How to better filter the sparql query output to avoid the Query timeout limit reached error?

I am getting the Query timeout limit reached error. Is there any way to maybe filter the output more? Thank you!
SELECT (count(distinct ?city) as ?count) WHERE {
?city wdt:P31/wdt:P279* wd:Q486972. # human settlement
?city wdt:P131 ?region.
?city wdt:P17 ?country.
#not a former country
FILTER NOT EXISTS {?country wdt:P31 wd:Q3024240}
#and not an ancient civilisation (needed to exclude ancient Egypt)
FILTER NOT EXISTS {?country wdt:P31 wd:Q28171280}
#not demolished, abolished countries etc.
FILTER NOT EXISTS {?country wdt:P576 ?abolished}
?article schema:about ?city.
?article schema:isPartOf <https://en.wikipedia.org/>.
}

Overcoming single value constraint issues with P625 coordinate location in Wikidata

I am trying to get a city list together with region and country information with a query like this:
# get a list of cities
# for geograpy3 library
# see https://github.com/somnathrakshit/geograpy3/issues/15
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
# get human settlements
SELECT DISTINCT ?city ?cityLabel (max(?cityPop) as ?cityPopulation) ?coord ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita WHERE {
# if you uncomment this line this query might run for some 3 hours on a local wikidata copy using Apache Jena
# run for Vienna, Illinois, Vienna Austria, Paris Texas and Paris France as example only
# VALUES ?city { wd:Q577544 wd:Q1741 wd:Q830149 wd:Q90}.
# run for Andorra
VALUES ?country {wd:Q228}.
# instance of human settlement https://www.wikidata.org/wiki/Q486972
?city wdt:P31/wdt:P279* wd:Q486972 .
# label of the City
?city rdfs:label ?cityLabel filter (lang(?cityLabel) = "en").
# country this city belongs to
?city wdt:P17 ?country .
# label for the country
?country rdfs:label ?countryLabel filter (lang(?countryLabel) = "en").
# https://www.wikidata.org/wiki/Property:P297 ISO 3166-1 alpha-2 code
?country wdt:P297 ?countryIsoCode.
# population of country
?country wdt:P1082 ?countryPopulation.
OPTIONAL {
?country wdt:P2132 ?countryGdpPerCapita.
}
OPTIONAL {
# located in administrative territory
# https://www.wikidata.org/wiki/Property:P131
?city wdt:P131* ?region.
# administrative unit of first order
?region wdt:P31/wdt:P279* wd:Q10864048.
?region rdfs:label ?regionLabel filter (lang(?regionLabel) = "en").
# isocode state/province
OPTIONAL { ?region wdt:P300 ?regionIsoCode. }
}
# population of city
OPTIONAL { ?city wdt:P1082 ?cityPop.}
# get the coordinates
OPTIONAL { ?city wdt:P625 ?coord. }
} GROUP BY ?city ?cityLabel ?coord ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita
ORDER BY ?cityLabel
To experiment with the query, I comment out the
# VALUES ?city { wd:Q577544 wd:Q1741 wd:Q830149 wd:Q90}.
# run for Andorra
VALUES ?country {wd:Q228}.
part to see that the results make sense.
Now, for the Andorra trial, there are cities with multiple coordinates:
https://www.wikidata.org/wiki/Property:P625
which are even flagged as a problem.
I know there is a workaround, as explained in "How to get only the most recent value from a Wikidata property?" and https://w.wiki/EKB
I tried the approach in this snippet:
?city p:P1082 ?populationStatement .
?populationStatement ps:P1082 ?cityPopulation.
?populationStatement pq:P585 ?date
FILTER NOT EXISTS { ?city p:P1082/pq:P585 ?date_ . FILTER (?date_ > ?date) }
This makes the queries really slow, and in this case I am looking at all instances of human settlement, of which there are a few hundred thousand. Even on my local Wikidata copy this runs for more than 3 hours!
So I wonder whether there is an alternative using MAX, AVG, subqueries with LIMIT or the like, or any other nifty idea that would solve the issue with decent performance.
You can use sample() as an aggregation function (see the SPARQL documentation).
Starting from your query, you will need to change the first line to
SELECT DISTINCT ?city ?cityLabel (max(?cityPop) as ?cityPopulation) (sample(?coord) as ?coordinate) ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita WHERE {
and your second-to-last line to:
} GROUP BY ?city ?cityLabel ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita
The result should look like this: https://w.wiki/dRV.
The workaround you tried does not work because, unlike P1082 (population) statements, P625 (coordinate location) statements in most cases have no P585 (point in time) qualifier.
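To see why moving ?coord out of GROUP BY and into sample() collapses the duplicates, here is a toy Python model of the aggregation. The rows are invented, and picking the first value stands in for SAMPLE, which may return any one value of the group.

```python
from collections import defaultdict

# Hypothetical rows before aggregation: (city, population, coordinate).
rows = [
    ("Q1", 1000, "Point(1.5 42.5)"),
    ("Q1", 1200, "Point(1.5 42.5)"),
    ("Q1", 1200, "Point(1.6 42.6)"),
    ("Q2", 500, "Point(1.7 42.4)"),
]

# Grouping by (city, coord), as in the original query, keeps one row per coordinate.
groups_with_coord = {(city, coord) for city, _, coord in rows}

# Grouping by city only, then aggregating both population and coordinate.
groups = defaultdict(list)
for city, pop, coord in rows:
    groups[city].append((pop, coord))

aggregated = {}
for city, members in groups.items():
    max_pop = max(pop for pop, _ in members)  # MAX(?cityPop)
    any_coord = members[0][1]                 # SAMPLE(?coord): some value, unspecified which
    aggregated[city] = (max_pop, any_coord)

print(len(groups_with_coord), "rows when grouping by coordinate")
print(len(aggregated), "rows when sampling the coordinate")
print(aggregated)
```

The trade-off is that you do not control which coordinate is kept, which is usually acceptable when the duplicates are near-identical points for the same place.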

Cannot find all cities in Germany

I tried this query:
SELECT distinct ?city ?cityName ?country WHERE {
?city rdf:type dbo:City .
?city rdfs:label ?cityName.
?city dbo:country ?country.
?city dbo:country dbr:Germany.
FILTER (lang(?cityName) = 'en')
} ORDER BY ?city
But some cities that have the dbo:country predicate with the value dbr:Germany are still missing from the output. For example, see http://dbpedia.org/page/Goslar . There is no "Goslar" in the output. Can anybody explain why?
First of all, DBpedia is a really messy place. For example, Goslar is not typed as a dbo:City in DBpedia at all; its types are dbo:PopulatedPlace, dbo:Town and yago:City108524735. That's why it's not in the output. Another example is Paris; you can check it.
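One way to work around the inconsistent typing is to accept several classes instead of dbo:City alone. The Python sketch below only models that idea on made-up data: "ExampleCity" is hypothetical, and the Goslar types are the ones quoted above. The SPARQL in the comment is an untested analogue, and which classes you need depends on how the places you care about are actually typed.

```python
# SPARQL analogue (untested against the live endpoint):
#   VALUES ?type { dbo:City dbo:Town }
#   ?city a ?type ;
#         dbo:country dbr:Germany .

# Hypothetical type assignments; only the Goslar entry mirrors the example above.
types = {
    "ExampleCity": {"dbo:City", "dbo:PopulatedPlace"},
    "Goslar": {"dbo:Town", "dbo:PopulatedPlace", "yago:City108524735"},
}

def matches(accepted):
    """Return the places that carry at least one of the accepted types."""
    return sorted(name for name, ts in types.items() if ts & accepted)

print(matches({"dbo:City"}))              # dbo:City only
print(matches({"dbo:City", "dbo:Town"}))  # dbo:City or dbo:Town
```

Widening to dbo:PopulatedPlace would catch even more entries, at the cost of also returning villages and other non-city places.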

How to ASK if there is any country in the United Nations called “Paris” using SPARQL

I have been trying to write this query, but I'm not sure what syntax I need. I have seen a lot of examples of how to check whether one thing is larger or smaller than another, but I have no idea how to ask whether something is IN something else.
This is my last (failed) try:
PREFIX : <http:/dbpedia.org/resource/>
ASK
{
FILTER(<http:/dbpedia.org/resource/Paris> IN <http:/dbpedia.org/resource/Member_states_of_the_United_Nations>)
}
Part of the issue is that there's a DBpedia resource called
http://dbpedia.org/resource/Member_states_of_the_United_Nations
but what you actually want, since it's the value of the dct:subject property, is the category,
http://dbpedia.org/page/Category:Member_states_of_the_United_Nations
For instance, you can get a list of countries that are member states with a query like:
select ?country {
?country a dbo:Country ;
dct:subject dbc:Member_states_of_the_United_Nations
}
However, Paris isn't a country that's a member. It's a city in France, which is a member. You can check whether something is a member with an ask query like:
ask {
dbr:France a dbo:Country ;
dct:subject dbc:Member_states_of_the_United_Nations
}
SPARQL results (true)
You could get a list of populated places in countries that are member with a query like:
select ?city ?country {
?city a dbo:PopulatedPlace ;
dbo:country ?country .
?country a dbo:Country ;
dct:subject dbc:Member_states_of_the_United_Nations .
}
SPARQL results (limited to 100)
and you can modify that to be an ask query that checks for cities with specific names. E.g.:
ask {
?city rdfs:label "Paris"@en ;
a dbo:PopulatedPlace ;
dbo:country ?country .
?country a dbo:Country ;
dct:subject dbc:Member_states_of_the_United_Nations .
}
SPARQL results (true)
Using the Dublin Core hasPart term:
<http://dbpedia.org/resource/Member_states_of_the_United_Nations> <http://purl.org/dc/terms/hasPart> <http://dbpedia.org/resource/Paris>
This is described in this w3 document.

SPARQL query to retrieve countries population density from DBPedia

Note: This question is different from SPARQL query to retrieve countries population from DBpedia. This question is about population density as understood by DBPedia itself.
How can I retrieve country population density from DBPedia?
I have tried the following, but the Virtuoso endpoint returns an empty result set:
PREFIX p: <http://dbpedia.org/property/>
SELECT DISTINCT ?name ?populationDensity
WHERE {
?country a dbpedia-owl:Country .
?country rdfs:label ?name .
?country p:populationDensity ?populationDensity . }
Your current query returns an empty table because no resource with the rdf:type dbpedia-owl:Country (written as 'a') also has the p:populationDensity property. Check that with this query.
To find the rdf:type values of the resources that do use your populationDensity property, you could use this query. Following that lead, you can check all properties for Portugal and find that it does have a populationDensity, but not the one you used.
This works:
PREFIX dbpedia-ont-PP: <http://dbpedia.org/ontology/PopulatedPlace/>
SELECT DISTINCT ?country ?populationDensity
WHERE {
?country a dbpedia-owl:Country .
?country dbpedia-ont-PP:populationDensity ?populationDensity .
}