I have this Wikidata query that returns all the football stadiums with the names, coordinates, club labels and stuff like this. But I cannot figure out how to also get the country and city names where stadiums are located (and possibly the coordinates of the cities too).
Here is my query:
SELECT ?club ?clubLabel ?venue ?venueLabel ?coordinates
WHERE
{
?club wdt:P31 wd:Q476028 .
?club wdt:P115 ?venue .
?venue wdt:P625 ?coordinates .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
Link to test the query
EDIT 19th november 2020:
I need the timezone of the cities so I tried this query after looking at the documentation but it does not return the value. Just links like "wd:Q6723" :
SELECT DISTINCT ?timezone ?club ?locationLabel ?countryLabel ?clubLabel ?venue ?venueLabel ?coordinates
WHERE
{
?venue (wdt:P421|wd:Q12143) ?timezone .
?club wdt:P31 wd:Q476028 .
?club wdt:P115 ?venue .
?venue wdt:P625 ?coordinates .
OPTIONAL {?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
OPTIONAL { ?location wdt:P17 ?country . }
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
Split over two now. Original query:
SELECT DISTINCT ?club ?locationLabel ?countryLabel ?clubLabel ?venue
?venueLabel ?coordinates
WHERE {
?club wdt:P31 wd:Q476028 .
?club wdt:P115 ?venue .
?venue wdt:P625 ?coordinates .
OPTIONAL {
?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
OPTIONAL { ?location wdt:P17 ?country . }
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
(Update #2: Previously, I asked for the club's timezone. But, of course, that's not the sort of data usually recorded for a club. Instead, you have to go via the location/venue/headquarters or similar, and possibly a level up to region/country because some suburb also doesn't have timezone data.
This is the general idea how it should work, but it's running into a timeout, and so am I:
SELECT DISTINCT ?timezone ?timezoneLabel ?offset
?club ?clubLabel
WHERE {
?club wdt:P31 wd:Q476028 .
# via country. not perfect, because some have multiple timezones, but shoud be faster
?club wdt:P17/wdt:P421 ?timezone .
# what I really want to do; all sorts of alternatives
#?club wdt:P115?/(wdt:P159|wdt:P276)/wdt:P131?/wdt:P421 ?timezone .
?timezone wdt:P2907 ?offset.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
Short explanation:
This uses three new things. OPTIONAL makes the following statement, well, optional. Clubs where nothing can be found will still be included in the output. The second OPTIONAL ist nested in the first, as it's pointless to ask for the country of a location that we haven't found.
The pipe symbol (|) allows for alternatives. Here, I'm asking for "headquarter location (P159) or check for two different ways to specify the location of the stadium. The slash, used in the latter case, denotes a path (club / venue / "located in district|location").
If there is missing data (there will be missing data), you may want to look at examples and figure out if there are other common patterns that locations are recorded. You could, for example, move the inner OPTIONAL outside for cases where the club has a country statement but no other, more specific, location.
Update: I've included the timezone as requested in the comment. To note:
?timezoneLabel gets the timezone's label (= name), just as ?clubLabel gets the club's. The apppended "...Label" is a "magic" function that translates from IDs to huma-readable labels. It is enabled by including that SERVICE wikibase:label... line.
As you might want to use these timezones, I've included the marked line that gets the numeric offset in hours.
The offset may vary because UTC doesn't have dalight savings time. There should be multiple lines in the results for such cases, and you would need to read the ''qualifiers'' to see when they apply. Alternatively, maybe substract the offset from some other timezone's offset (i. e. yours) and you might get lucky and they cancel out.
Related
While examining the results of the official example query "Continents, countries, regions and capitals" (on https://query.wikidata.org/, limited to Germany for your convenience here: link), I noticed that some capitals of German federal states were missing. For example Wiesbaden as capital of Hesse. I noticed that Wiesbaden is an instance of big city, but not of city (see https://www.wikidata.org/wiki/Q1721), in contrast to some other cities. I was able to alleviate the problem by also including cities that are subclasses of city by changing line 17 to ?city wdt:P31/wdt:P279? wd:Q515.
One of the four cities that are still missing is Magdeburg, the capital of Saxony-Anhalt.
The diagnostic query
SELECT ?cityLabel ?props
WHERE {
?city wdt:P31 ?props.
FILTER(?city = wd:Q1733 || ?city = wd:Q1726).
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
shows that Magdeburg is not even an instance of city, although it clearly is according to its Wikidata page https://www.wikidata.org/wiki/Q1733.
I am new to Wikidata and SPARQL. However, this seems wrong to me. What can I do to get all capitals of the german federal states? And what is the reason for this behaviour?
These missing statements are not truthy:
SELECT ?statement ?valueLabel ?rank ?best
WHERE {
wd:Q1733 p:P31 ?statement.
?statement ps:P31 ?value .
?statement wikibase:rank ?rank .
OPTIONAL { ?statement a ?best . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!
They are normal-rank statements, but there is a preferred-rank statement.
Truthy statements represent statements that have the best
non-deprecated rank for given property. Namely, if there is a
preferred statement for property P2, then only preferred statements
for P2 will be considered truthy. Otherwise, all normal-rank
statements for P2 are considered truthy.
Update
I have decreased the rank of the preferred statement just now. Please test your query again.
I want to find all children's fiction writers using Wikidata SPARQL query. But I couldn't figure out how? Can someone help, please? The following is my approach but I don't think it is the correct way.
SELECT ?item ?itemLabel {
?item wdt:P31 wd:Q5. #find humans
?item wdt:P106 wd: #humans whose occupation is a novelist
[another condition needed] #children's fiction.
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
} LIMIT 10
There is not one correct way, especially not in Wikidata where not all items of the same kind necessarily have the same properties.
One way would be to find the authors of works that are intended for (P2360) children:
# it’s a literary work (incl. any sublasses)
?book wdt:P31/wdt:P279* wd:Q7725634 .
# the literary work is intended for children
?book wdt:P2360 wd:Q7569 .
# the literary work has an author
?book wdt:P50 ?author .
# the author is a US citizen
?author wdt:P27 wd:Q30 .
Instead of getting all works that belong to the class "literary work" or any of its subclasses, you could decide to use only the class "fiction literature" (Q38072107) instead; with the risk that not all relevant works use this class.
Another way would be to find all authors that have "children’s writer" (Q4853732), or any of its subclasses, as occupation:
?author wdt:P106/wdt:P279* wd:Q4853732 .
?author wdt:P27 wd:Q30 .
As the different ways might find different results, you could could use them in the same query, using UNION:
SELECT DISTINCT ?author ?authorLabel
WHERE {
{
# way 1
}
UNION
{
# way 2
}
UNION
{
# way 3
}
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
}
Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property for the country contains a list of data points taken at different points in time, but I only care about the most recent data. This query will not work because it gets multiple results for each country (one for each Human Development Index data point):
SELECT
?country
?countryLabel
?hdi_value
?hdi_date
WHERE {
?country wdt:P31 wd:Q6256.
OPTIONAL { ?country p:P1081 ?hdi_statement.
?hdi_statement ps:P1081 ?hdi_value.
?hdi_statement pq:P585 ?hdi_date.
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Link to Query Console
I'm aware of GROUP BY/GROUP CONCAT but that will still give me every result when I'd prefer to just have one. GROUP BY/SAMPLE will also not work since SAMPLE is not guaranteed to take the most recent result.
Any help or link to a relevant example query is appreciated!
P.S. Another thing I'm confused about is why population P1082 in this query returns only one population result per country
SELECT
?country
?countryLabel
?population
WHERE {
?country wdt:P31 wd:Q6256.
OPTIONAL { ?country wdt:P1082 ?population. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
while the same query but for HDI returns multiple results per country:
SELECT
?country
?countryLabel
?hdi
WHERE {
?country wdt:P31 wd:Q6256.
OPTIONAL { ?country wdt:P1081 ?hdi. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
What is different about population and HDI that causes the behavior to be different? When I view the population data for each country on Wikidata I see multiple population points listed, but only one gets returned by the query.
Both your questions are duplicates, but I'll try to add interesting facts to existing answers.
Question 1 is a duplicate of SPARQL query to get only results with the most recent date.
This technique does the trick:
FILTER NOT EXISTS {
?country p:P1081/pq:P585 ?hdi_date_ .
FILTER (?hdi_date_ > ?hdi_date)
}
However, you should add this clause outside of OPTIONAL, it is not working inside of OPTIONAL (and I'm not sure this is not a bug).
Question 2 is a duplicate of Some cities aren't instances of city or big city?
You can't use wdt-predicates, because missing statements are not truthy.
They are normal-rank statements, but there is a preferred-rank statement.
Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements are considered truthy.
The reason why P1081 always has preferred statement is that this property is processed by PreferentialBot.
While examining the results of the official example query "Continents, countries, regions and capitals" (on https://query.wikidata.org/, limited to Germany for your convenience here: link), I noticed that some capitals of German federal states were missing. For example Wiesbaden as capital of Hesse. I noticed that Wiesbaden is an instance of big city, but not of city (see https://www.wikidata.org/wiki/Q1721), in contrast to some other cities. I was able to alleviate the problem by also including cities that are subclasses of city by changing line 17 to ?city wdt:P31/wdt:P279? wd:Q515.
One of the four cities that are still missing is Magdeburg, the capital of Saxony-Anhalt.
The diagnostic query
SELECT ?cityLabel ?props
WHERE {
?city wdt:P31 ?props.
FILTER(?city = wd:Q1733 || ?city = wd:Q1726).
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
shows that Magdeburg is not even an instance of city, although it clearly is according to its Wikidata page https://www.wikidata.org/wiki/Q1733.
I am new to Wikidata and SPARQL. However, this seems wrong to me. What can I do to get all capitals of the german federal states? And what is the reason for this behaviour?
These missing statements are not truthy:
SELECT ?statement ?valueLabel ?rank ?best
WHERE {
wd:Q1733 p:P31 ?statement.
?statement ps:P31 ?value .
?statement wikibase:rank ?rank .
OPTIONAL { ?statement a ?best . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Try it!
They are normal-rank statements, but there is a preferred-rank statement.
Truthy statements represent statements that have the best
non-deprecated rank for given property. Namely, if there is a
preferred statement for property P2, then only preferred statements
for P2 will be considered truthy. Otherwise, all normal-rank
statements for P2 are considered truthy.
Update
I have decreased the rank of the preferred statement just now. Please test your query again.
In Wikidata, I want to find an item's country. Either directly if the item has a country directly, or by climbing up the P131s (located in the administrative territorial entity) until I find a country. Here is the query:
?item wdt:P131*/wdt:P17 ?country.
The query above works fine... except when a sub-division used to belong to another country, like for Q25270 (Prishtina). In such case, the result can be anachronistic. That's what I want to fix.
Great news: in such cases we should only consider the unique P131 (located in the administrative territorial entity) that has no P582 (end time) sub-property attached to it, and the problem is solved!
My question: how to alter my query above to achieve that?
Example: Let's say MyItem is in MyStreet is in MyTown is in MyRegion is in MyCountry, I must make sure that MyStreet, MyTown, and MyRegion do not have a P582 (end time).
(If "sub-property" is not the correct term, please let me know the right term and I will fix the question, thanks!)
An attempt
The query below works in most cases, but unfortunately it has a bug: It finds the wrong country in cases where the current country was also the country in the past (for instance Alsace belonged to France until 1871 then to Germany and currently to France again).
SELECT DISTINCT ?country WHERE {
wd:Q6556803 wdt:P131* ?area .
?area wdt:P17 ?country .
OPTIONAL {
wd:Q6556803 wdt:P131*/p:P131 [
pq:P582 ?endTime; ps:P131/wdt:P131* ?area
] .
} .
FILTER( !BOUND( ?endTime ) ) .
}
Wikidata uses different properties for direct links and links with extra information. So, for the statement "Prishtina is located in the administrative territorial entity Socialist Autonomous Province of Kosovo", there's the simple triple:
wd:Q25270 wdt:P131 wd:Q646035
And the long form with additional information (the end time):
wd:Q25270 p:P131 wds:Q25270-7df79cec-4938-8b6d-4e11-4dde6f72d73b .
wds:Q25270-7df79cec-4938-8b6d-4e11-4dde6f72d73b ps:P131 wd:Q646035 ;
pq:P582 "1990-01-01T00:00:00Z"
So, we need to filter out all paths with an end time (pq:582):
SELECT DISTINCT ?s ?sLabel ?country ?countryLabel {
VALUES ?s {
wd:Q25270
}
?s wdt:P131* ?area .
?area wdt:P17 ?country .
FILTER NOT EXISTS {
?s p:P131/(ps:P131/p:P131)* ?statement .
?statement ps:P131 ?area .
?s p:P131/(ps:P131/p:P131)* ?intermediateStatement .
?intermediateStatement (ps:P131/p:P131)* ?statement .
?intermediateStatement pq:P582 ?endTime .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
limit 50
Here, ?intermediateStatement is a statement with an end time on the path from ?s to a country.
This query does seem to time out if there is more than one value set for ?s. Also, the query does not take into account that there might exist multiple links from an item to an area where one has a timestamp and the other doesn't (both paths will be filtered out).