With the following SPARQL query I’m trying to list countries with their national flags in descending order of population.
I cannot run it without it reaching a timeout limit. It runs in ~2s when the indicated line is commented out (but this returns the cartesian product of all countries and all national flags, not just the associated pairs).
SELECT ?country ?countryLabel ?flag ?population
WHERE {
  ?country wdt:P31 wd:Q6256;
           wdt:P1082 ?population.
  ?flag wdt:P31 wd:Q186516.
  ?flag wdt:P1001 ?country. # runs without this line
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?population)
LIMIT 100
Does anyone know why this query is too complex to compute, and how to get it to run? Thanks!
I'm using the Wikidata query service to learn the SPARQL query language. I'm trying to get information on countries and their identifying information.
Here is a simple query which is intended to return a list of countries (https://www.wikidata.org/wiki/Q6256) along with their ISO 3-letter codes (https://www.wikidata.org/wiki/Property:P298):
SELECT ?country ?countryLabel ?iso
WHERE
{
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  ?country wdt:P31 wd:Q6256; # wd:Q6256 = "country"; wd:Q3624078 = "sovereign state"
           wdt:P298 ?iso.
}
ORDER BY ?countryLabel
I notice that at least one country is consistently missing from the results, Georgia, and I'm confused about why.
According to its Wikidata page:
It is an instance of country (wd:Q6256)
It does have an ISO-3166 3-letter country code (wdt:P298)
I've tried various transformations of this query (e.g. don't include the ISO codes, use labels in different languages, etc) and I consistently get the same result: Georgia is missing.
However if I switch from (instance of a country wd:Q6256) to (instance of a sovereign state wd:Q3624078; a subclass of wd:Q6256), then Georgia is included in the results.
I am at a loss to explain this result; the entity in question should be an instance of both "country" and "sovereign state." And clearly it works for most of the other countries of the world, whose data is represented similarly in Wikidata, in that they're listed as instances of both country wd:Q6256 and sovereign state wd:Q3624078.
Can anyone explain what aspect of the SPARQL language, or of the representation of the data in question, I'm not understanding here?
The claim for instanceOf Sovereign State has a PreferredRank, so it's selected in preference to all the other claims which have a NormalRank. Also, SPARQL doesn't do inheritance by default unless you explicitly bake it into the query (because it can be expensive), so you don't automatically get Sovereign State just because it's a subclass of Country.
This will include Georgia
SELECT ?country ?countryLabel ?iso
WHERE
{
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  ?country p:P31/ps:P31 wd:Q6256; # wd:Q6256 = "country"; wd:Q3624078 = "sovereign state"
           wdt:P298 ?iso.
}
ORDER BY ?countryLabel
but note that it includes deprecated claims as well. I cribbed it from this set of examples: https://en.wikibooks.org/wiki/SPARQL/WIKIDATA_Qualifiers,_References_and_Ranks
As mentioned by @horcrux in the comments, you can modify this to exclude deprecated claims by using a FILTER expression:
SELECT ?country ?countryLabel ?iso
WHERE
{
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  FILTER(?rank != wikibase:DeprecatedRank)
  ?country p:P31 [ ps:P31 wd:Q6256 ; wikibase:rank ?rank ] ;
           wdt:P298 ?iso.
}
ORDER BY ?countryLabel
The results are the same in this case, but it's something worth thinking about when you're considering what kind of data you're looking for.
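If you would rather "bake in" the inheritance mentioned above, a minimal sketch is to follow "subclass of" (P279) links with a property path. This still uses the truthy wdt:P31 value, but accepts any class that is Q6256 itself or a (transitive) subclass of it. Treat it as an illustration rather than a drop-in replacement: it will also match entities whose class is some other subclass of "country", and it may run slower than the plain query.

```sparql
SELECT DISTINCT ?country ?countryLabel ?iso
WHERE
{
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  # truthy "instance of", followed by zero or more "subclass of" (P279) steps
  ?country wdt:P31/wdt:P279* wd:Q6256;
           wdt:P298 ?iso.
}
ORDER BY ?countryLabel
```

DISTINCT is there because an entity can reach Q6256 through more than one class and would otherwise show up several times.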
I am trying to display all the former colonies of the British Empire. So far I have only managed to retrieve a country which is currently a colony. I am trying to get my head around how to change the query so that it covers either "no longer a colony" or "a colony during a specific time period".
SELECT ?countryLabel WHERE {
  ?country wdt:P31 wd:Q6256,   # Country
                   wd:Q133156. # Colony
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Any help is greatly appreciated!
I want to get coordinates via SPARQL from Wikidata displayed as degrees (e.g. 54°54'36"N). They are displayed like this in Wikidata, so I suspect there is a built-in function for this purpose, but I cannot find it.
Example query:
SELECT DISTINCT ?countryLabel ?long
{
  ?country wdt:P31 wd:Q6256 ;
           p:P1332 [ psv:P1332 [ wikibase:geoLatitude ?long ] ].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
This gives the value as a number (e.g. -30.08). Note that, despite the variable name ?long, wikibase:geoLatitude returns the latitude. I can calculate the desired output format from this result but would prefer to get it directly from the query.
Thanks.
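For what it's worth, the calculation itself can be pushed into the query with BIND and standard SPARQL functions. The following is only a sketch of that idea, not a built-in formatter: the variable names ?lat, ?total, ?deg, ?min, ?sec and ?dms are made up here, and it assumes whole arcseconds are precise enough. It rounds to total arcseconds first so that the seconds field cannot end up as 60.

```sparql
SELECT DISTINCT ?countryLabel ?lat ?dms
WHERE {
  ?country wdt:P31 wd:Q6256 ;
           p:P1332 [ psv:P1332 [ wikibase:geoLatitude ?lat ] ] .
  # total arcseconds, rounded, as an integer
  BIND(xsd:integer(ROUND(ABS(?lat) * 3600)) AS ?total)
  # split into degrees, minutes and seconds
  BIND(xsd:integer(FLOOR(?total / 3600)) AS ?deg)
  BIND(xsd:integer(FLOOR((?total - ?deg * 3600) / 60)) AS ?min)
  BIND(?total - ?deg * 3600 - ?min * 60 AS ?sec)
  # hemisphere letter from the sign of the original value
  BIND(CONCAT(STR(?deg), "°", STR(?min), "'", STR(?sec), "\"",
              IF(?lat < 0, "S", "N")) AS ?dms)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
```

By this arithmetic a latitude of 54.91 splits into 54 degrees, 54 minutes and 36 seconds. For a longitude you would bind wikibase:geoLongitude instead and use "W"/"E" in the IF.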
I have this Wikidata query that returns all the football stadiums with the names, coordinates, club labels and stuff like this. But I cannot figure out how to also get the country and city names where stadiums are located (and possibly the coordinates of the cities too).
Here is my query:
SELECT ?club ?clubLabel ?venue ?venueLabel ?coordinates
WHERE
{
  ?club wdt:P31 wd:Q476028 .
  ?club wdt:P115 ?venue .
  ?venue wdt:P625 ?coordinates .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
EDIT 19th November 2020:
I need the timezone of the cities, so I tried this query after looking at the documentation, but it does not return the value, just links like "wd:Q6723":
SELECT DISTINCT ?timezone ?club ?locationLabel ?countryLabel ?clubLabel ?venue ?venueLabel ?coordinates
WHERE
{
  ?venue (wdt:P421|wd:Q12143) ?timezone .
  ?club wdt:P31 wd:Q476028 .
  ?club wdt:P115 ?venue .
  ?venue wdt:P625 ?coordinates .
  OPTIONAL {
    ?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
    OPTIONAL { ?location wdt:P17 ?country . }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
Split over two now. Original query:
SELECT DISTINCT ?club ?locationLabel ?countryLabel ?clubLabel ?venue
                ?venueLabel ?coordinates
WHERE {
  ?club wdt:P31 wd:Q476028 .
  ?club wdt:P115 ?venue .
  ?venue wdt:P625 ?coordinates .
  OPTIONAL {
    ?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
    OPTIONAL { ?location wdt:P17 ?country . }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
Update #2: Previously, I asked for the club's timezone. But, of course, that's not the sort of data usually recorded for a club. Instead, you have to go via the location/venue/headquarters or similar, and possibly a level up to region/country, because some suburbs don't have timezone data either.
This is the general idea how it should work, but it's running into a timeout, and so am I:
SELECT DISTINCT ?timezone ?timezoneLabel ?offset
                ?club ?clubLabel
WHERE {
  ?club wdt:P31 wd:Q476028 .
  # via country. not perfect, because some have multiple timezones, but should be faster
  ?club wdt:P17/wdt:P421 ?timezone .
  # what I really want to do; all sorts of alternatives
  #?club wdt:P115?/(wdt:P159|wdt:P276)/wdt:P131?/wdt:P421 ?timezone .
  ?timezone wdt:P2907 ?offset.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
Short explanation:
This uses three new things. OPTIONAL makes the following statement, well, optional: clubs where nothing can be found will still be included in the output. The second OPTIONAL is nested in the first, as it's pointless to ask for the country of a location that we haven't found.
The pipe symbol (|) allows for alternatives. Here, I'm asking for "headquarters location" (P159), or checking two different ways to specify the location of the stadium. The slash, used in the latter case, denotes a path (club / venue / "located in district|location").
If there is missing data (there will be missing data), you may want to look at examples and figure out if there are other common patterns in which locations are recorded. You could, for example, move the inner OPTIONAL outside for cases where the club has a country statement but no other, more specific, location.
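As a rough sketch of that last suggestion, one way to do it is to keep the nested lookup as it is and add a second, independent OPTIONAL that reads the country straight from the club. The variable ?clubCountry is a name invented for this example, and whether clubs actually carry a P17 statement is something you would have to check in the data.

```sparql
SELECT DISTINCT ?club ?clubLabel ?locationLabel ?countryLabel ?clubCountryLabel
WHERE {
  ?club wdt:P31 wd:Q476028 .
  ?club wdt:P115 ?venue .
  OPTIONAL {
    ?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
    OPTIONAL { ?location wdt:P17 ?country . }
  }
  # independent fallback: the club's own country statement, if it has one
  OPTIONAL { ?club wdt:P17 ?clubCountry . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
```

The fallback uses its own variable on purpose. Simply moving `OPTIONAL { ?location wdt:P17 ?country . }` out of the outer block would be risky, because when ?location is unbound that pattern matches every item that has a country.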
Update: I've included the timezone as requested in the comment. To note:
?timezoneLabel gets the timezone's label (= name), just as ?clubLabel gets the club's. The appended "...Label" is a "magic" function that translates from IDs to human-readable labels. It is enabled by including that SERVICE wikibase:label... line.
As you might want to use these timezones, I've included the line with wdt:P2907, which gets the numeric offset in hours.
The offset may vary because UTC doesn't have daylight saving time. There should be multiple lines in the results for such cases, and you would need to read the qualifiers to see when they apply. Alternatively, maybe subtract the offset from some other timezone's offset (i.e. yours) and you might get lucky and they cancel out.
Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property for the country contains a list of data points taken at different points in time, but I only care about the most recent data. This query will not work because it gets multiple results for each country (one for each Human Development Index data point):
SELECT
  ?country
  ?countryLabel
  ?hdi_value
  ?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL {
    ?country p:P1081 ?hdi_statement.
    ?hdi_statement ps:P1081 ?hdi_value.
    ?hdi_statement pq:P585 ?hdi_date.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I'm aware of GROUP BY/GROUP_CONCAT, but that will still give me every result when I'd prefer to have just one. GROUP BY/SAMPLE will not work either, since SAMPLE is not guaranteed to take the most recent result.
Any help or link to a relevant example query is appreciated!
P.S. Another thing I'm confused about is why population P1082 in this query returns only one population result per country
SELECT
  ?country
  ?countryLabel
  ?population
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1082 ?population. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
while the same query but for HDI returns multiple results per country:
SELECT
  ?country
  ?countryLabel
  ?hdi
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1081 ?hdi. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
What is different about population and HDI that causes the behavior to be different? When I view the population data for each country on Wikidata I see multiple population points listed, but only one gets returned by the query.
Both your questions are duplicates, but I'll try to add interesting facts to existing answers.
Question 1 is a duplicate of "SPARQL query to get only results with the most recent date".
This technique does the trick:
FILTER NOT EXISTS {
  ?country p:P1081/pq:P585 ?hdi_date_ .
  FILTER (?hdi_date_ > ?hdi_date)
}
However, you should add this clause outside of OPTIONAL. It does not work inside of OPTIONAL (and I'm not sure whether that is a bug).
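Put together with the query from the question, a minimal sketch could look like the following. It assumes you can do without the OPTIONAL altogether, which means countries with no dated HDI statement drop out of the results; ?hdi_date_ is just a helper variable standing for "some other HDI date of the same country".

```sparql
SELECT
  ?country
  ?countryLabel
  ?hdi_value
  ?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  ?country p:P1081 ?hdi_statement.
  ?hdi_statement ps:P1081 ?hdi_value.
  ?hdi_statement pq:P585 ?hdi_date.
  # keep a row only if the same country has no HDI statement with a later date
  FILTER NOT EXISTS {
    ?country p:P1081/pq:P585 ?hdi_date_ .
    FILTER (?hdi_date_ > ?hdi_date)
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

If a country has two statements sharing the same latest date, both rows survive, so you may still see an occasional duplicate.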
Question 2 is a duplicate of "Some cities aren't instances of city or big city?".
You can't get the missing values with wdt: predicates, because the missing statements are not truthy.
They are normal-rank statements, but the property also has a preferred-rank statement.
Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements are considered truthy.
The reason why P1082 always has a preferred statement is that this property is processed by PreferentialBot.
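To check this yourself, you could list every population statement together with its rank instead of going through wdt:. This is only an illustrative sketch: ?statement, ?date and ?rank are names chosen here, and the date qualifier (P585) is made optional on the assumption that not every statement carries one.

```sparql
SELECT ?country ?countryLabel ?population ?date ?rank
WHERE {
  ?country wdt:P31 wd:Q6256 ;
           p:P1082 ?statement .          # all statements, not only truthy ones
  ?statement ps:P1082 ?population ;
             wikibase:rank ?rank .       # PreferredRank, NormalRank or DeprecatedRank
  OPTIONAL { ?statement pq:P585 ?date . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?countryLabel DESC(?date)
```

Rows whose ?rank is wikibase:PreferredRank are the ones the wdt:P1082 version returns. Swapping P1082 for P1081 lets you compare how the HDI statements are ranked.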