Overcoming single-value constraint issues with P625 coordinate location in Wikidata - sparql

I am trying to get a city list together with region and country information with a query like this:
# get a list of cities
# for geograpy3 library
# see https://github.com/somnathrakshit/geograpy3/issues/15
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
# get human settlements
SELECT DISTINCT ?city ?cityLabel (max(?cityPop) as ?cityPopulation) ?coord ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita WHERE {
# if you uncomment this line this query might run for some 3 hours on a local wikidata copy using Apache Jena
# run for Vienna, Illinois, Vienna Austria, Paris Texas and Paris France as example only
# VALUES ?city { wd:Q577544 wd:Q1741 wd:Q830149 wd:Q90}.
# run for Andorra
VALUES ?country {wd:Q228}.
# instance of human settlement https://www.wikidata.org/wiki/Q486972
?city wdt:P31/wdt:P279* wd:Q486972 .
# label of the City
?city rdfs:label ?cityLabel filter (lang(?cityLabel) = "en").
# country this city belongs to
?city wdt:P17 ?country .
# label for the country
?country rdfs:label ?countryLabel filter (lang(?countryLabel) = "en").
# https://www.wikidata.org/wiki/Property:P297 ISO 3166-1 alpha-2 code
?country wdt:P297 ?countryIsoCode.
# population of country
?country wdt:P1082 ?countryPopulation.
OPTIONAL {
?country wdt:P2132 ?countryGdpPerCapita.
}
OPTIONAL {
# located in administrative territory
# https://www.wikidata.org/wiki/Property:P131
?city wdt:P131* ?region.
# administrative unit of first order
?region wdt:P31/wdt:P279* wd:Q10864048.
?region rdfs:label ?regionLabel filter (lang(?regionLabel) = "en").
# isocode state/province
OPTIONAL { ?region wdt:P300 ?regionIsoCode. }
}
# population of city
OPTIONAL { ?city wdt:P1082 ?cityPop.}
# get the coordinates
OPTIONAL { ?city wdt:P625 ?coord. }
} GROUP BY ?city ?cityLabel ?coord ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita
ORDER BY ?cityLabel
To experiment with the query, I comment out the
# VALUES ?city { wd:Q577544 wd:Q1741 wd:Q830149 wd:Q90}.
# run for Andorra
VALUES ?country {wd:Q228}.
part to see that the results make sense.
Now, for the Andorra trial, there are cities with multiple coordinates (https://www.wikidata.org/wiki/Property:P625), which are even flagged as a problem.
I know there is a workaround, as explained in "How to get only the most recent value from a Wikidata property?" and https://w.wiki/EKB
I tried the approach in this snippet:
?city p:P1082 ?populationStatement .
?populationStatement ps:P1082 ?cityPopulation.
?populationStatement pq:P585 ?date
FILTER NOT EXISTS { ?city p:P1082/pq:P585 ?date_ . FILTER (?date_ > ?date) }
This makes queries really slow, and in this case I am looking at all instances of human settlement, of which there are a few hundred thousand. Even on my local Wikidata copy this runs for more than 3 hours!
So I wonder whether there is an alternative using MAX, AVG, subqueries with LIMIT, or any other idea that would solve the issue with decent performance.

You can use SAMPLE() as an aggregation function (see the SPARQL documentation).
Starting from your query, you will need to change the first line into
SELECT DISTINCT ?city ?cityLabel (max(?cityPop) as ?cityPopulation) (sample(?coord) as ?coordinate) ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita WHERE {
and your second-to-last line (dropping ?coord from the GROUP BY) into:
} GROUP BY ?city ?cityLabel ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita
The result should look like this: https://w.wiki/dRV.
The workaround you tried does not work because, unlike P1082 (population), P625 (coordinate location) in most cases has no P585 (point in time) qualifier.
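To make the effect of the MAX/SAMPLE grouping concrete, here is a minimal Python sketch that applies the same collapse to rows that have already been fetched. The row layout and the `ex:` identifiers are made up for illustration. SAMPLE may return any value of the group; this sketch simply keeps the first coordinate it sees.

```python
def collapse_rows(rows):
    """Collapse rows that share a city into one row per city.

    Mirrors GROUP BY ?city with MAX(?cityPop) and SAMPLE(?coord):
    the largest population wins, and one arbitrary coordinate is kept
    (here: the first non-empty one).
    """
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(
            row["city"],
            {"city": row["city"], "cityPopulation": None, "coordinate": None},
        )
        pop = row.get("cityPop")
        if pop is not None and (
            entry["cityPopulation"] is None or pop > entry["cityPopulation"]
        ):
            entry["cityPopulation"] = pop
        if entry["coordinate"] is None and row.get("coord") is not None:
            entry["coordinate"] = row["coord"]
    return list(grouped.values())


# Hypothetical result rows: ex:cityA appears twice because it has two coordinates.
sample_rows = [
    {"city": "ex:cityA", "cityPop": 100, "coord": "Point(1.5 42.5)"},
    {"city": "ex:cityA", "cityPop": 120, "coord": "Point(1.6 42.6)"},
    {"city": "ex:cityB", "cityPop": None, "coord": None},
]
collapsed = collapse_rows(sample_rows)
```

Doing the aggregation in the query itself, as shown above, is usually preferable because fewer rows are transferred; the sketch only illustrates what the grouping does.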

Related

How to better filter the sparql query output to avoid the Query timeout limit reached error?

I am getting the Query timeout limit reached error. Is there any way to maybe filter the output more? Thank you!
SELECT (count(distinct ?city) as ?count) WHERE {
?city wdt:P31/wdt:P279* wd:Q486972. # human settlement
?city wdt:P131 ?region.
?city wdt:P17 ?country.
#not a former country
FILTER NOT EXISTS {?country wdt:P31 wd:Q3024240}
#and not an ancient civilisation (needed to exclude ancient Egypt)
FILTER NOT EXISTS {?country wdt:P31 wd:Q28171280}
#not demolished, abolished countries etc.
FILTER NOT EXISTS {?country wdt:P576 ?abolished}
?article schema:about ?city.
?article schema:isPartOf <https://en.wikipedia.org/>.
}

How to get information like capital, currency, language, population about a country in simple way using SPARQL from DBPEDIA

Is there a simple way to do it?
I found a solution for this problem: the following query returns all the required fields.
SELECT DISTINCT ?country ?population ?capital ?currency WHERE {
{?country rdf:type
<http://dbpedia.org/class/yago/WikicatMemberStatesOfTheUnitedNations> .
?country <http://dbpedia.org/ontology/populationTotal> ?population .
?country <http://dbpedia.org/ontology/capital> ?capital .
?country <http://dbpedia.org/ontology/currency> ?currency .}
UNION
{?country rdf:type <http://dbpedia.org/ontology/Country> .
?country <http://dbpedia.org/ontology/populationTotal> ?population .
?country <http://dbpedia.org/ontology/capital> ?capital .
?country <http://dbpedia.org/ontology/currency> ?currency .}
}

OR in sparql query

This sparql query on wikidata shows all places in Germany (Q183) with a name that ends in -ow or -itz.
I want to extend this to look for places in Germany and, say, Austria.
I tried modifying the 8th line to something like:
wdt:P17 (wd:Q183 || wd:Q40);
in order to look for places in Austria (Q40), but this is not a valid query.
What is a way to extend the query to include other countries?
Afaik there is no syntax as simple as that. You can, however, use UNION to the same effect like this:
SELECT ?item ?itemLabel ?coord
WHERE
{
?item wdt:P31/wdt:P279* wd:Q486972;
rdfs:label ?itemLabel;
wdt:P625 ?coord .
{?item wdt:P17 wd:Q183}
UNION
{?item wdt:P17 wd:Q40}
FILTER (lang(?itemLabel) = "de") .
FILTER regex (?itemLabel, "(ow|itz)$").
}
Or, as an alternative, bind a new variable to both countries using VALUES:
SELECT ?item ?itemLabel ?coord
WHERE
{
VALUES ?country { wd:Q40 wd:Q183 }
?item wdt:P31/wdt:P279* wd:Q486972;
wdt:P17 ?country;
rdfs:label ?itemLabel;
wdt:P625 ?coord .
FILTER (lang(?itemLabel) = "de") .
FILTER regex (?itemLabel, "(ow|itz)$").
}
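If the list of countries changes often, the VALUES line can be generated instead of edited by hand. The following is a small Python sketch; the helper name `values_clause` and its validation rule are my own assumptions, not part of any Wikidata tooling.

```python
import re

# A Wikidata item id is "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def values_clause(variable, qids):
    """Build a SPARQL VALUES clause binding ?variable to the given items.

    Ids are validated first so that arbitrary text cannot end up
    inside the query string.
    """
    for qid in qids:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata item id: {qid!r}")
    items = " ".join(f"wd:{qid}" for qid in qids)
    return f"VALUES ?{variable} {{ {items} }}"


# Austria (Q40) and Germany (Q183), as in the query above.
clause = values_clause("country", ["Q40", "Q183"])
```

The resulting string can be dropped in place of the hand-written VALUES line; adding a third country then only means extending the list.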

getting city information by sparql

I tried with the following SPARQL query.
SELECT distinct ?city ?cityName ?country ?population ?knownfor WHERE {
?city rdf:type dbo:City .
?city rdfs:label ?cityName.
?city dbo:country ?country.
OPTIONAL{
?city dbp:population ?population.
?city dbo:knownFor ?knownfor.
}
FILTER (lang(?cityName) = 'en')
} ORDER BY ?city
But the problem is this: not every city has the dbp:population predicate; some cities have dbp:populationTotal instead. So this query returns the population only for some cities. But when I write the following in the OPTIONAL section of the query:
OPTIONAL{
?city dbp:population ?population .
?city dbp:populationTotal ?populationTotal
}
both values come back blank. The same goes for the dbo:knownFor predicate (not every city has it).
Also, how can I specify in the query that I want only European cities? I cannot find any predicate that specifies the continent of a city.
First thing to know is that DBpedia data is a moving target, just like the Wikipedia data from which it is derived. Updates to Wikipedia will eventually be part of DBpedia. More quickly, they'll be part of DBpedia-live.
The issue with values for neither OPTIONAL predicate showing up when you include both predicates appears to be a bug in the Virtuoso version currently hosting DBpedia. I encourage you to check whether it's been reported yet, report it yourself if not, and monitor the issue.
As to limiting the continent of the cities you get back -- it's usually easiest to check an entity of (or near) the sort you want, to find a relevant attribute/predicate/property. For instance, Aachen-Mitte has a dbo:country of Germany, which has a number of rdf:types including yago:EuropeanCountries -- which might be what you want, but might not yet have been applied to all such. You'll need to add a triple to your pattern like --
?country a yago:EuropeanCountries
Edit To Add
The OPTIONAL { ... } clause returns results for the entire pattern enclosed within the braces. So --
OPTIONAL
{
?city dbp:population ?population .
?city dbo:knownFor ?knownfor .
}
-- will only return values for either predicate when that ?city has values for both predicates.
If you want to get every value for either predicate, you need to split that clause into two --
OPTIONAL
{
?city dbp:population ?population .
}
OPTIONAL
{
?city dbo:knownFor ?knownfor .
}
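The difference between one combined OPTIONAL and two separate ones can be imitated outside SPARQL. The Python sketch below models OPTIONAL as a left join on ?city only, which is a simplification of SPARQL's compatibility rule; the city names and values are invented for the example.

```python
def optional_join(solutions, optional_rows, key="city"):
    """Left join: keep every solution, extended by each matching optional row.

    A solution without any matching row is kept unchanged, which is
    what OPTIONAL does when its whole pattern fails to match.
    """
    result = []
    for sol in solutions:
        matches = [row for row in optional_rows if row[key] == sol[key]]
        if matches:
            result.extend({**sol, **match} for match in matches)
        else:
            result.append(dict(sol))
    return result


# Invented data: cityB has no knownFor, cityC has no population.
population = {"cityA": 100, "cityB": 200}
known_for = {"cityA": "cathedral", "cityC": "harbour"}
cities = [{"city": c} for c in ("cityA", "cityB", "cityC")]

# One OPTIONAL holding both triples: it matches only cities that have both values.
both_rows = [
    {"city": c, "population": population[c], "knownfor": known_for[c]}
    for c in population
    if c in known_for
]
joint = optional_join(cities, both_rows)

# Two OPTIONALs: each one is applied on its own.
pop_rows = [{"city": c, "population": p} for c, p in population.items()]
kf_rows = [{"city": c, "knownfor": k} for c, k in known_for.items()]
split = optional_join(optional_join(cities, pop_rows), kf_rows)
```

In `joint`, cityB and cityC carry neither value; in `split`, cityB keeps its population and cityC keeps its knownfor.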
It's easy to get both dbp:population and dbp:populationTotal, with a separate OPTIONAL for each (and adding ?populationTotal to your SELECT list) --
SELECT DISTINCT ?city
?cityName
?country
?population
?populationTotal
?knownfor
...
OPTIONAL
{
?city dbp:population ?population .
}
OPTIONAL
{
?city dbp:populationTotal ?populationTotal .
}
OPTIONAL
{
?city dbo:knownFor ?knownfor .
}
If you only want one of the population values, and especially if you have a preference of one predicate over the other, the construction gets more complex (and should be a new question).
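One possible approach, not taken from the answer above, is SPARQL 1.1's COALESCE, which yields the first of its arguments that is bound. The sketch below keeps such a pattern as a Python string and mirrors the logic with a small helper; preferring dbp:populationTotal over dbp:population and the variable name ?bestPopulation are assumptions made for illustration.

```python
# Pattern fragment to place inside the WHERE clause (assumed preference order).
PREFERRED_POPULATION_PATTERN = """
OPTIONAL { ?city dbp:populationTotal ?populationTotal . }
OPTIONAL { ?city dbp:population ?population . }
BIND(COALESCE(?populationTotal, ?population) AS ?bestPopulation)
"""


def coalesce(*values):
    """Return the first value that is not None, or None if there is none.

    Mirrors COALESCE inside BIND: when no argument is bound,
    the target variable simply stays unbound.
    """
    for value in values:
        if value is not None:
            return value
    return None


# A city with only dbp:population still gets a value.
best = coalesce(None, 54321)
```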

Cannot find all cities in Germany

I tried with this query.
SELECT distinct ?city ?cityName ?country WHERE {
?city rdf:type dbo:City .
?city rdfs:label ?cityName.
?city dbo:country ?country.
?city dbo:country dbr:Germany.
FILTER (lang(?cityName) = 'en')
} ORDER BY ?city
But some cities that have the dbo:country predicate with the value dbr:Germany are still not listed in the output. For example, see http://dbpedia.org/page/Goslar . There is no "Goslar" city in the output. Can anybody explain why?
First of all, DBpedia is really a messy place. For example, Goslar is not even typed as dbo:City in DBpedia; its types are dbo:PopulatedPlace, dbo:Town and yago:City108524735. That's why it's not in the output. Another example is Paris. You can check it.
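One way to act on this is to match several types at once with VALUES. The Python sketch below builds such a query; the function name is hypothetical, and whether dbo:City plus dbo:Town covers every place you expect is an assumption to check against the data.

```python
def places_in_germany_query(types):
    """Build the query from the question, matching any of the given rdf:types."""
    type_list = " ".join(types)
    return (
        "SELECT DISTINCT ?city ?cityName WHERE {\n"
        f"  VALUES ?type {{ {type_list} }}\n"
        "  ?city rdf:type ?type .\n"
        "  ?city rdfs:label ?cityName .\n"
        "  ?city dbo:country dbr:Germany .\n"
        "  FILTER (lang(?cityName) = 'en')\n"
        "} ORDER BY ?city"
    )


# Goslar is typed dbo:Town, so include that type next to dbo:City.
query = places_in_germany_query(["dbo:City", "dbo:Town"])
```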