While examining the results of the official example query "Continents, countries, regions and capitals" (on https://query.wikidata.org/, limited to Germany for your convenience here: link), I noticed that some capitals of German federal states were missing, for example Wiesbaden, the capital of Hesse. Wiesbaden is an instance of big city, but not of city (see https://www.wikidata.org/wiki/Q1721), in contrast to some other cities. I was able to alleviate the problem by also including instances of subclasses of city, changing line 17 of the example query to ?city wdt:P31/wdt:P279? wd:Q515.
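The effect of the modified pattern ?city wdt:P31/wdt:P279? wd:Q515 can be sketched in Python: an item matches if one of its instance-of values is city itself (zero steps) or a direct subclass of city (one step). The dictionaries are hypothetical toy data using labels instead of Q-ids; they are not taken from Wikidata, and rank handling is ignored here.

```python
from typing import Dict, List

# Hypothetical toy graph: item -> P31 values, class -> P279 values.
instance_of: Dict[str, List[str]] = {
    "CityA": ["city"],
    "CityB": ["big city"],
}
subclass_of: Dict[str, List[str]] = {
    "big city": ["city"],
    "city": ["human settlement"],
}


def matches_p31_p279_opt(item: str, target: str) -> bool:
    """Mimic `?item wdt:P31/wdt:P279? target`: zero or one subclass-of step."""
    for cls in instance_of.get(item, []):
        # Zero steps: the class itself is the target.
        if cls == target:
            return True
        # One step: a direct superclass of the class is the target.
        if target in subclass_of.get(cls, []):
            return True
    return False


print(matches_p31_p279_opt("CityA", "city"))  # True
print(matches_p31_p279_opt("CityB", "city"))  # True
# Two steps (big city -> city -> human settlement) are not covered by `?`.
print(matches_p31_p279_opt("CityB", "human settlement"))  # False
```

With plain wdt:P31 only the zero-step branch exists, which is why items like CityB drop out; wdt:P279* would follow any number of steps instead.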
One of the four cities that are still missing is Magdeburg, the capital of Saxony-Anhalt.
The diagnostic query
SELECT ?cityLabel ?props
WHERE {
?city wdt:P31 ?props.
FILTER(?city = wd:Q1733 || ?city = wd:Q1726).
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
shows that Magdeburg is not even an instance of city, although it clearly is according to its Wikidata page https://www.wikidata.org/wiki/Q1733.
I am new to Wikidata and SPARQL. However, this seems wrong to me. What can I do to get all capitals of the German federal states? And what is the reason for this behaviour?
These missing statements are not truthy:
SELECT ?statement ?valueLabel ?rank ?best
WHERE {
wd:Q1733 p:P31 ?statement.
?statement ps:P31 ?value .
?statement wikibase:rank ?rank .
OPTIONAL { ?statement a ?best . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
They are normal-rank statements, but there is a preferred-rank statement.
Truthy statements represent statements that have the best
non-deprecated rank for given property. Namely, if there is a
preferred statement for property P2, then only preferred statements
for P2 will be considered truthy. Otherwise, all normal-rank
statements for P2 are considered truthy.
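The truthy rule quoted above can be sketched in plain Python. This is a minimal model, not the query service's implementation: statements are hypothetical (value, rank) pairs, and the rank strings are assumptions standing in for wikibase:PreferredRank, wikibase:NormalRank and wikibase:DeprecatedRank.

```python
from typing import List, Tuple

# Rank names are assumptions mirroring wikibase:PreferredRank / NormalRank / DeprecatedRank.
PREFERRED, NORMAL, DEPRECATED = "preferred", "normal", "deprecated"

Statement = Tuple[str, str]  # (value, rank)


def truthy_values(statements: List[Statement]) -> List[str]:
    """Return the values a wdt: predicate would expose for one item and property.

    Deprecated statements are never truthy. If at least one preferred statement
    exists, only preferred statements are truthy; otherwise all normal-rank
    statements are.
    """
    ranks = {rank for _, rank in statements}
    best = PREFERRED if PREFERRED in ranks else NORMAL
    return [value for value, rank in statements if rank == best]


# Hypothetical P31 statements for one item: one preferred, two normal.
p31 = [("big city", PREFERRED), ("city", NORMAL), ("capital", NORMAL)]
print(truthy_values(p31))  # ['big city']

# Lowering the preferred statement to normal rank makes all three truthy.
p31_lowered = [(value, NORMAL) for value, _ in p31]
print(truthy_values(p31_lowered))  # ['big city', 'city', 'capital']
```

This is also why the update below helps: once no preferred statement remains, every normal-rank P31 value becomes visible through wdt:P31.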
Update
I have decreased the rank of the preferred statement just now. Please test your query again.
Related
I'm using the Wikidata query service to learn the SPARQL query language. I'm trying to get a list of countries together with their identifying information.
Here is a simple query which is intended to return a list of countries (https://www.wikidata.org/wiki/Q6256) along with their ISO 3-letter codes (https://www.wikidata.org/wiki/Property:P298):
SELECT ?country ?countryLabel ?iso
WHERE
{
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
?country wdt:P31 wd:Q6256; # wd:Q6256 = "country"; wd:Q3624078 = "sovereign state"
wdt:P298 ?iso.
}
ORDER BY ?countryLabel
I notice that at least one country is consistently missing from the results, Georgia, and I'm confused about why.
According to its wikidata page:
It is an instance of country (wd:Q6256)
It does have an ISO-3166 3-letter country code (wdt:P298)
I've tried various transformations of this query (e.g. dropping the ISO codes, using labels in different languages, etc.) and I consistently get the same result: Georgia is missing.
However if I switch from (instance of a country wd:Q6256) to (instance of a sovereign state wd:Q3624078; a subclass of wd:Q6256), then Georgia is included in the results.
I am at a loss to explain this result; the entity in question should be an instance of both "country" and "sovereign state." And clearly it works for most of the other countries of the world, whose data is represented similarly in Wikidata, in that they're listed as instances of both country wd:Q6256 and sovereign state wd:Q3624078.
Can anyone explain which aspect of the SPARQL language, or of the representation of the data in question, I'm not understanding here?
The claim for instanceOf Sovereign State has a PreferredRank, so it's selected in preference to all the other claims, which have a NormalRank. Also, SPARQL doesn't do inheritance by default unless you explicitly bake it into the query (because it can be expensive), so an item whose only truthy instanceOf value is Sovereign State does not automatically match Country just because Sovereign State is a subclass of Country.
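Both points can be sketched together in Python: first keep only the truthy P31 values (best non-deprecated rank), then test the target class either directly or through any number of subclass-of steps, which is roughly what a wdt:P31/wdt:P279* path does. The statements and class hierarchy are hypothetical, modeled loosely on the Georgia case in the question, and ranks on the subclass-of edges are ignored for simplicity.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

RANK_ORDER = ("preferred", "normal")  # deprecated is never truthy

# Hypothetical P31 statements (value, rank) for one Georgia-like item.
p31_statements: List[Tuple[str, str]] = [
    ("sovereign state", "preferred"),
    ("country", "normal"),
]
# Hypothetical subclass-of edges.
subclass_of: Dict[str, List[str]] = {"sovereign state": ["country"]}


def truthy(statements: List[Tuple[str, str]]) -> List[str]:
    """Values with the best non-deprecated rank, as wdt: would expose them."""
    for rank in RANK_ORDER:
        values = [v for v, r in statements if r == rank]
        if values:
            return values
    return []


def superclasses_or_self(cls: str) -> Set[str]:
    """All classes reachable by zero or more subclass-of steps (like P279*)."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        for parent in subclass_of.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


direct = truthy(p31_statements)
# The normal-rank 'country' statement is hidden by the preferred one.
print("country" in direct)  # False
# Following subclass-of from the truthy value reaches 'country' again.
print(any("country" in superclasses_or_self(c) for c in direct))  # True
```

Walking the hierarchy is what makes * paths potentially expensive on a large class graph, which is why the query has to ask for it explicitly.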
This will include Georgia:
SELECT ?country ?countryLabel ?iso
WHERE
{
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
?country p:P31/ps:P31 wd:Q6256; # wd:Q6256 = "country"; wd:Q3624078 = "sovereign state"
wdt:P298 ?iso.
}
ORDER BY ?countryLabel
but note that it includes deprecated claims as well. I cribbed it from this set of examples: https://en.wikibooks.org/wiki/SPARQL/WIKIDATA_Qualifiers,_References_and_Ranks
As mentioned by @horcrux in the comments, you can modify this to exclude deprecated claims by using a FILTER expression:
SELECT ?country ?countryLabel ?iso
WHERE
{
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
FILTER(?rank != wikibase:DeprecatedRank) .
?country p:P31 [ ps:P31 wd:Q6256 ; wikibase:rank ?rank ] ;
wdt:P298 ?iso.
}
ORDER BY ?countryLabel
The results are the same in this case, but it's something worth thinking about when you're considering what kind of data you're looking for.
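The difference between the three query styles in this answer (wdt:P31, p:P31/ps:P31, and p:P31/ps:P31 with a rank filter) can be sketched as three filters over the same statement list. The statements below are hypothetical, including the deprecated one, and ranks are plain strings rather than wikibase:*Rank IRIs.

```python
from typing import List, Tuple

Statement = Tuple[str, str]  # (value, rank)

# Hypothetical P31 statements for one item.
statements: List[Statement] = [
    ("sovereign state", "preferred"),
    ("country", "normal"),
    ("obsolete class", "deprecated"),
]


def via_wdt(stmts: List[Statement]) -> List[str]:
    """wdt:P31: only the best non-deprecated rank (truthy statements)."""
    for best in ("preferred", "normal"):
        values = [v for v, r in stmts if r == best]
        if values:
            return values
    return []


def via_p_ps(stmts: List[Statement]) -> List[str]:
    """p:P31/ps:P31: every statement, deprecated ones included."""
    return [v for v, _ in stmts]


def via_p_ps_no_deprecated(stmts: List[Statement]) -> List[str]:
    """p:P31 [ ps:P31 ?v ; wikibase:rank ?rank ] with FILTER(?rank != deprecated)."""
    return [v for v, r in stmts if r != "deprecated"]


print(via_wdt(statements))                 # ['sovereign state']
print(via_p_ps(statements))                # ['sovereign state', 'country', 'obsolete class']
print(via_p_ps_no_deprecated(statements))  # ['sovereign state', 'country']
```

When an item has no deprecated P31 statements, the last two styles return the same values, which matches the observation that the results did not change here.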
Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property for the country contains a list of data points taken at different points in time, but I only care about the most recent data. This query will not work because it gets multiple results for each country (one for each Human Development Index data point):
SELECT
?country
?countryLabel
?hdi_value
?hdi_date
WHERE {
?country wdt:P31 wd:Q6256.
OPTIONAL { ?country p:P1081 ?hdi_statement.
?hdi_statement ps:P1081 ?hdi_value.
?hdi_statement pq:P585 ?hdi_date.
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I'm aware of GROUP BY/GROUP_CONCAT, but that will still give me every result when I'd prefer to have just one. GROUP BY/SAMPLE will not work either, since SAMPLE is not guaranteed to pick the most recent result.
Any help or link to a relevant example query is appreciated!
P.S. Another thing I'm confused about is why population (P1082) in this query returns only one result per country:
SELECT
?country
?countryLabel
?population
WHERE {
?country wdt:P31 wd:Q6256.
OPTIONAL { ?country wdt:P1082 ?population. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
while the same query but for HDI returns multiple results per country:
SELECT
?country
?countryLabel
?hdi
WHERE {
?country wdt:P31 wd:Q6256.
OPTIONAL { ?country wdt:P1081 ?hdi. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
What is different about population and HDI that causes the behavior to be different? When I view the population data for each country on Wikidata I see multiple population points listed, but only one gets returned by the query.
Both your questions are duplicates, but I'll try to add interesting facts to existing answers.
Question 1 is a duplicate of SPARQL query to get only results with the most recent date.
This technique does the trick:
FILTER NOT EXISTS {
?country p:P1081/pq:P585 ?hdi_date_ .
FILTER (?hdi_date_ > ?hdi_date)
}
However, you should add this clause outside the OPTIONAL block: it does not work inside OPTIONAL (and I'm not sure whether this is a bug).
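The FILTER NOT EXISTS pattern keeps a statement only if no other statement for the same country has a strictly later P585 date. A minimal Python sketch of that logic, using hypothetical (country, value, date) rows rather than real HDI data, shows why it yields one row per country unless two statements share the latest date.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical rows: (country, HDI value, point in time P585).
Row = Tuple[str, float, date]
rows: List[Row] = [
    ("CountryA", 0.70, date(2015, 1, 1)),
    ("CountryA", 0.72, date(2017, 1, 1)),
    ("CountryB", 0.80, date(2016, 1, 1)),
]


def latest_only(data: List[Row]) -> List[Row]:
    """Keep a row unless another row for the same country has a later date.

    Mirrors: FILTER NOT EXISTS { ?country p:P1081/pq:P585 ?hdi_date_ .
                                 FILTER (?hdi_date_ > ?hdi_date) }
    Ties on the latest date are all kept, as in the SPARQL version.
    """
    return [
        (country, value, when)
        for country, value, when in data
        if not any(c == country and d > when for c, _, d in data)
    ]


for row in latest_only(rows):
    print(row)
```

Unlike SAMPLE, this comparison is deterministic: the surviving row is always the one with the latest date.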
Question 2 is a duplicate of Some cities aren't instances of city or big city?
You can't use wdt: predicates to get every population value, because the missing statements are not truthy.
They are normal-rank statements, but there is a preferred-rank statement.
Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements are considered truthy.
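The reason why P1082 (population) always has a preferred statement is that this property is processed by PreferentialBot, which marks the most recent value as preferred. HDI (P1081) statements are all normal rank, so every one of them is truthy.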