Return cities in Wikidata SPARQL Query, similar to a Wikipedia page - sparql

I'm not sure what I'm doing wrong. I have a nice list, but not only are the cities duplicating, but I'm unsure how they're defined as cities. I would expect to see London in the results and have similar results to this Wikipedia page. These results are quite different to the Wikipedia page.
I want to:
Get a list of cities, with their first-level administrative country subdivision (province/state/region), similar to this Wikipedia page
While avoiding duplicate cities.
SELECT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE
{
?city wdt:P31/wdt:P279 wd:Q515 . # find instances of subclasses of city
?city (wdt:P131) ?region.
?region wdt:P31/wdt:P279 wd:Q10864048 .
?city wdt:P1082 ?population .
?city wdt:P17 ?country . # Also find the country of the city
?city p:P625 ?statement . # coordinate-location statement
?statement psv:P625 ?coordinate_node .
OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}
FILTER (?population > 100000) .
# choose language
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
LIMIT 8000
Update:
Although not an answer to this specific question, anyone trying to get similar data to this should have a look here.
Update 2:
With help in the comments from @UninformedUser, the query is now:
SELECT DISTINCT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE
{
?city wdt:P31/wdt:P279 wd:Q515 . # find instances of subclasses of city
?city (wdt:P131) ?region.
?region wdt:P31/wdt:P279 wd:Q10864048 .
?city p:P1082 ?populationStmt .
?populationStmt ps:P1082 ?population ; pq:P585 ?pop_date .
?city wdt:P17 ?country . # Also find the country of the city
?city p:P625 ?statement . # coordinate-location statement
?statement psv:P625 ?coordinate_node .
OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}
FILTER NOT EXISTS {
?city p:P1082/pq:P585 ?pop_date_ .
FILTER (?pop_date_ > ?pop_date)
}
FILTER (?population > 100000) .
# choose language
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
LIMIT 8000
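One possible way to tackle the remaining duplicates, offered as an untested sketch rather than a definitive answer: duplicates typically come from cities with several population or coordinate values, so grouping per city and aggregating those values collapses them into one row. The sketch also uses wdt:P279* so that direct instances of city (not only instances of its direct subclasses) are matched. The aliases ?maxPopulation, ?latitude and ?longitude are names made up for this example, and the query may well time out on the public endpoint.

```sparql
SELECT ?city ?cityLabel ?countryLabel ?regionLabel
       (MAX(?population) AS ?maxPopulation)   # hypothetical alias: largest truthy population value
       (SAMPLE(?lat) AS ?latitude)            # hypothetical alias: one arbitrary latitude per group
       (SAMPLE(?long) AS ?longitude)          # hypothetical alias: one arbitrary longitude per group
WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;           # instance of city or of any subclass of city
        wdt:P131 ?region ;
        wdt:P17 ?country ;
        wdt:P1082 ?population ;
        p:P625/psv:P625 ?coordinate_node .
  ?region wdt:P31/wdt:P279* wd:Q10864048 .    # first-level administrative country subdivision
  ?coordinate_node wikibase:geoLatitude ?lat ;
                   wikibase:geoLongitude ?long .
  FILTER (?population > 100000)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?city ?cityLabel ?countryLabel ?regionLabel
LIMIT 8000
```

A city can still appear more than once if it has several regions or countries attached, since those remain part of the grouping key.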

Related

Wikidata: Filter post codes only current valid

I would like to filter post codes to show only the current active for today.
The problem is that there are cities with old post codes (Example).
My current query shows the old post codes:
SELECT ?city ?cityLabel ?postcode ?federal_stateLabel ?federal_state_nr WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
?city (wdt:P31/(wdt:P279*)) wd:Q7930989;
wdt:P17 wd:Q183;
wdt:P281 ?postcode;
wdt:P131 ?federal_state.
?federal_state wdt:P439 ?federal_state_nr.
}
ORDER BY (?postcode)
LIMIT 10
(query.wikidata.org)
I would have to use start time P580 and end time P582 but I don't see how.
You can use this for filtering out the claims which have an end time:
?city p:P281 ?postCodeStmt .
?postCodeStmt ps:P281 ?postcode .
FILTER NOT EXISTS { ?postCodeStmt pq:P582 ?endTime . }
The whole query becomes:
SELECT ?city ?cityLabel ?postcode ?federal_stateLabel ?federal_state_nr WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],de". }
?city (wdt:P31/(wdt:P279*)) wd:Q7930989;
wdt:P17 wd:Q183;
p:P281 ?postCodeStmt;
wdt:P131 ?federal_state.
?federal_state wdt:P439 ?federal_state_nr.
?postCodeStmt ps:P281 ?postcode.
FILTER NOT EXISTS { ?postCodeStmt pq:P582 ?endTime . } # Filtering out old postal codes
}
ORDER BY (?postcode)
LIMIT 100
See also Wikidata:SPARQL tutorial§Qualifiers.
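The filter above drops every statement that has an end time at all. If the goal is strictly "valid today", a variant that compares the start time (P580) and end time (P582) qualifiers with the current date might look like the fragment below. This is only a sketch: ?startTime and ?endTime are variable names chosen for the example, and statements without either qualifier are assumed to be currently valid.

```sparql
?city p:P281 ?postCodeStmt .
?postCodeStmt ps:P281 ?postcode .
OPTIONAL { ?postCodeStmt pq:P580 ?startTime . }   # start time qualifier, if present
OPTIONAL { ?postCodeStmt pq:P582 ?endTime . }     # end time qualifier, if present
# Keep statements that have already started (or have no start time) ...
FILTER (!BOUND(?startTime) || ?startTime <= NOW())
# ... and have not ended yet (or have no end time)
FILTER (!BOUND(?endTime) || ?endTime > NOW())
```

These lines would replace the three-line pattern inside the WHERE block of the full query.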

How to query wikidata using SPARQL to find a place that matches certain criteria and is geographically near another city

I wrote the following SPARQL query to find the wikidata item with the label "San Leucio" in Italy.
SELECT DISTINCT * WHERE {
?location ?label 'San Leucio'@en .
?location wdt:P17 wd:Q38 .
?location rdfs:label ?locationName .
OPTIONAL {
?article schema:about ?location .
?article schema:isPartOf <https://en.wikivoyage.org/> .
}
?location wdt:P18 ?image .
FILTER(lang(?locationName) = "en")
}
The query returns these 3 results:
wd:Q55179410
wd:Q20009063
wd:Q846499
The result I want is wd:Q846499, which is outside of Naples, Italy. Is there any way I could further filter this query to return the result that is nearest to Naples? I know that I can get the geoCoordinates for each of these with ?location wdt:P625 ?coordinates, but I'm not sure how I could use that to compare to the geo-coordinates of Naples to get what I want.
SELECT DISTINCT * {
VALUES ?naples {wd:Q2634}
?naples wdt:P625 ?naples_coordinates.
?location rdfs:label 'San Leucio'@en .
?location wdt:P17 wd:Q38 .
?location wdt:P18 ?image .
?location wdt:P625 ?location_coordinates.
OPTIONAL {
?article schema:about ?location .
?article schema:isPartOf <https://en.wikivoyage.org/> .
}
BIND (geof:distance(?location_coordinates, ?naples_coordinates) AS ?distance)
} ORDER BY ?distance LIMIT 1
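As an alternative sketch, the Wikidata Query Service also offers a wikibase:around service that restricts candidates to a radius around a center point and reports the distance in kilometres. The 50 km radius below is an arbitrary assumption for illustration, and the query has not been tuned or verified against current data.

```sparql
SELECT DISTINCT ?location ?locationLabel ?distance WHERE {
  wd:Q2634 wdt:P625 ?naples_coordinates .            # coordinates of Naples
  SERVICE wikibase:around {
    ?location wdt:P625 ?location_coordinates .
    bd:serviceParam wikibase:center ?naples_coordinates .
    bd:serviceParam wikibase:radius "50" .           # assumed radius in km
    bd:serviceParam wikibase:distance ?distance .    # distance from the center in km
  }
  ?location rdfs:label 'San Leucio'@en .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
ORDER BY ?distance
LIMIT 1
```

Compared with sorting on geof:distance, this discards far-away matches before ordering, which can help when the label matches many items.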

Wikidata query - how to add year to city's population?

I would like to get the year of population count for each city.
Do you know how to add it correctly? Currently I get empty results.
Here's my query:
SELECT DISTINCT ?cityLabel ?population ?gps ?data WHERE {
?city (wdt:P31/(wdt:P279*)) wd:Q515;
wdt:P1082 ?population;
wdt:P625 ?gps.
OPTIONAL { ?population wdt:P585 ?date. } # here I have a problem
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population) LIMIT 100
PS. just paste it here: https://query.wikidata.org/
First problem: you are selecting ?data but the actual variable is ?date.
Second problem: ?population is the object of your statement, but qualifiers refer to a whole statement, not just its object.
For referring to the statement, you'll have to use p:P1082 instead of wdt:P1082.
You can obtain what you want with the following query:
SELECT DISTINCT ?cityLabel ?population ?gps ?date WHERE {
?city
wdt:P31/wdt:P279* wd:Q515;
wdt:P625 ?gps.
?city p:P1082 ?populationStatement .
?populationStatement ps:P1082 ?population .
OPTIONAL { ?populationStatement pq:P585 ?date. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
LIMIT 10
I set LIMIT 10 because this is a pretty heavy query and it sometimes times out.
To go deeper into the topic, I'd suggest reading Wikidata:SPARQL tutorial§Qualifiers.
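If only the year is wanted rather than the full timestamp, the standard YEAR() function can be applied in the SELECT clause. A minimal sketch, where ?year is a name introduced for this example; rows without a point-in-time qualifier simply leave it unbound:

```sparql
SELECT DISTINCT ?cityLabel ?population ?gps (YEAR(?date) AS ?year) WHERE {
  ?city wdt:P31/wdt:P279* wd:Q515 ;
        wdt:P625 ?gps ;
        p:P1082 ?populationStatement .               # the population statement node
  ?populationStatement ps:P1082 ?population .        # the value of that statement
  OPTIONAL { ?populationStatement pq:P585 ?date . }  # point-in-time qualifier
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
LIMIT 10
```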

Wikidata SPARQL - get company entities and the location of their headquarters

I'm having trouble extracting location attributes of company HQs.
My query finds all companies or sub-classes, and returns some basic properties such as ISIN and URL, and the headquarters location.
I have tried to use this example to extend the Headquarter part of the query to return location information such as city, country, and coordinate latitude and longitude. However I am getting stuck on pulling the values or labels through.
Thank you
SELECT
?item ?itemLabel ?web ?isin ?hq ?hqloc ?inception
# valueLabel is only useful for properties with item-datatype
WHERE
{
?item p:P31/ps:P31/wdt:P279* wd:Q783794.
OPTIONAL{?item wdt:P856 ?web.} # get item
OPTIONAL{?item wdt:P946 ?isin.} # get item
OPTIONAL{?item wdt:P571 ?inception.} # get item
OPTIONAL{?item wdt:P159 ?hq.}
OPTIONAL{?item p:P159 ?hqItem. # get property
?hqItem ps:P159 wd:Q515. # get property-statement wikidata-entity
?hqItem pq:P17 ?hqloc. # get country of city
}
?article schema:about ?item .
?article schema:inLanguage "en" .
?article schema:isPartOf <https://en.wikipedia.org/>.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
A simplified query to select some of the values you mentioned:
SELECT
?company ?companyLabel ?isin ?web ?country ?countryLabel ?inception
WHERE
{
?article schema:inLanguage "en" .
?article schema:isPartOf <https://en.wikipedia.org/>.
?article schema:about ?company .
?company p:P31/ps:P31/wdt:P279* wd:Q783794.
?company wdt:P946 ?isin.
OPTIONAL {?company wdt:P856 ?web.}
OPTIONAL {?company wdt:P571 ?inception.}
OPTIONAL {?company wdt:P17 ?country.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 10
What I changed:
Renamed some variables to be more explicit (e.g. "?item" -> "?company").
Used P17 to select the country directly.
Removed the OPTIONAL on ISIN to show that some values do exist. You did not get a result because many company instances on Wikidata seem to lack that information.
From here, selecting the other values should be easy.
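For instance, the headquarters location and its own attributes could be pulled in with nested OPTIONAL blocks, roughly as sketched here. The variable names ?hq, ?hqCountry and ?hqCoordinates are invented for this example, and the query is untested against current data.

```sparql
SELECT ?company ?companyLabel ?hq ?hqLabel ?hqCountryLabel ?hqCoordinates
WHERE {
  ?article schema:inLanguage "en" .
  ?article schema:isPartOf <https://en.wikipedia.org/> .
  ?article schema:about ?company .
  ?company p:P31/ps:P31/wdt:P279* wd:Q783794 .
  OPTIONAL {
    ?company wdt:P159 ?hq .                         # headquarters location (usually a city)
    OPTIONAL { ?hq wdt:P17 ?hqCountry . }           # country of that location
    OPTIONAL { ?hq wdt:P625 ?hqCoordinates . }      # coordinates of that location
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 10
```

The key difference from the original attempt is that the country and coordinates are read from the headquarters item itself via wdt:, not from qualifiers on the P159 statement.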

sparql query is not returning desired output

This is my SPARQL query to get all country names along with some properties of each country.
SELECT distinct ?country ?capital ?currency ?lat ?long
WHERE {
?country rdf:type dbo:Country .
?country dbo:capital ?capital .
?country dbo:currency ?currency .
?country geo:lat ?lat.
?country geo:long ?long.
}
ORDER BY ?country
The problem is that some countries are missing, such as "Switzerland". If you go to http://dbpedia.org/page/Switzerland you will see that its type is country. You also will not find "Austria" exactly, but rather "Austrian_Empire". Why? There is an entity called "Austria" and it is of type dbo:Country.
Switzerland does not seem to have dbo:capital, which is why it's not included in the results of your query.
If you also want results that lack some of the properties, use OPTIONAL:
SELECT distinct ?country ?capital ?currency ?lat ?long
WHERE {
?country rdf:type dbo:Country .
OPTIONAL
{
?country dbo:capital ?capital .
?country dbo:currency ?currency .
?country geo:lat ?lat.
?country geo:long ?long.
}
}
ORDER BY ?country
Note, though, that this query also returns entities that are not countries (but for some reason are typed dbo:Country), like Cinema of Switzerland.
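One thing to keep in mind with a single OPTIONAL block is that it matches as a unit: if any one of the four properties is missing, none of them is bound for that country. A sketch that makes each property independently optional, assuming the endpoint predefines the rdf:, dbo: and geo: prefixes as the public DBpedia endpoint does:

```sparql
SELECT DISTINCT ?country ?capital ?currency ?lat ?long
WHERE {
  ?country rdf:type dbo:Country .
  # Each property is optional on its own, so a missing capital
  # no longer hides the currency or the coordinates.
  OPTIONAL { ?country dbo:capital ?capital . }
  OPTIONAL { ?country dbo:currency ?currency . }
  OPTIONAL { ?country geo:lat ?lat . }
  OPTIONAL { ?country geo:long ?long . }
}
ORDER BY ?country
```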