How to improve query optimization to avoid Wikidata query timeout? - sparql

I am new to Stack and I am researching human longevity in various parts of the world. I am using the Wikidata Query Service to collect data. I would like to obtain the birth and death years of every person in Wikidata who was a French citizen.
After attempting to run the query below...
SELECT DISTINCT ?item ?itemLabel ?yob ?yod ?birthplaceLabel ?coord ?countryLabel ?continentLabel ?occupationLabel ?citizenLabel WHERE {
?item wdt:P27 wd:Q142;
wdt:P569 ?birthdate;
wdt:P570 ?deathdate;
wdt:P19 ?birthplace;
optional {?birthplace wdt:P17 ?country .}
optional {?birthplace wdt:P30 ?continent .}
BIND(YEAR(?birthdate) as ?yob) .
BIND(YEAR(?deathdate) as ?yod) .
optional {?birthplace wdt:P625 ?coord .}
optional {?item wdt:P27 ?citizen .}
optional {?item wdt:P106 ?occupation .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?yob
(I ordered by yob because my work prioritizes figures from earlier history; I already have data on Chinese figures from the Song through the Qing dynasties.)
... I get a query timeout, probably because there is so much data. I would appreciate any tips on how to optimize my query so that I can retrieve the data I need.
The query did run when I replaced P27 (country of citizenship) with P19 (place of birth), but querying people by country of citizenship is better suited to my research. After this, I plan to query lifespan data for other European countries, probably Germany, Spain, and Italy. Ideally I want as much European data as possible, so I would also appreciate any tips on that front.
Thank you in advance.
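
One possible direction, offered as a minimal sketch rather than a tested fix: the label service, DISTINCT, ORDER BY and the many OPTIONAL clauses all add work, and the OPTIONAL on P27 and P106 multiplies rows for people with several citizenships or occupations. The query below keeps only the required patterns and restricts the result to a window of birth dates, so the data can be collected in slices. The 1700 to 1750 window is an arbitrary assumption for illustration, and whether a given slice finishes within the time limit depends on the server.

```sparql
# Sketch: same required patterns as the original query, no labels, no ORDER BY.
# The date window is a hypothetical slice; shift it and rerun to cover other periods.
SELECT ?item ?yob ?yod ?birthplace WHERE {
  ?item wdt:P27 wd:Q142 ;          # French citizens
        wdt:P569 ?birthdate ;      # date of birth
        wdt:P570 ?deathdate ;      # date of death
        wdt:P19 ?birthplace .      # place of birth
  FILTER(?birthdate >= "1700-01-01T00:00:00Z"^^xsd:dateTime &&
         ?birthdate <  "1750-01-01T00:00:00Z"^^xsd:dateTime)
  BIND(YEAR(?birthdate) AS ?yob)
  BIND(YEAR(?deathdate) AS ?yod)
}
```

Labels, coordinates and occupations could then be fetched in separate, smaller queries keyed on the returned ?item and ?birthplace values, and sorting by ?yob can be done locally after download.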

Related

`wd:Q887 wdt:P31 ?x` does not list that Yekaterinburg is a "big city". Why? [duplicate]

While examining the results of the official example query "Continents, countries, regions and capitals" (on https://query.wikidata.org/, limited to Germany for your convenience here: link), I noticed that some capitals of German federal states were missing. For example Wiesbaden as capital of Hesse. I noticed that Wiesbaden is an instance of big city, but not of city (see https://www.wikidata.org/wiki/Q1721), in contrast to some other cities. I was able to alleviate the problem by also including cities that are subclasses of city by changing line 17 to ?city wdt:P31/wdt:P279? wd:Q515.
One of the four cities that are still missing is Magdeburg, the capital of Saxony-Anhalt.
The diagnostic query
SELECT ?cityLabel ?props
WHERE {
?city wdt:P31 ?props.
FILTER(?city = wd:Q1733 || ?city = wd:Q1726).
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
shows that Magdeburg is not even an instance of city, although it clearly is according to its Wikidata page https://www.wikidata.org/wiki/Q1733.
I am new to Wikidata and SPARQL. However, this seems wrong to me. What can I do to get all capitals of the German federal states? And what is the reason for this behaviour?
These missing statements are not truthy:
SELECT ?statement ?valueLabel ?rank ?best
WHERE {
wd:Q1733 p:P31 ?statement.
?statement ps:P31 ?value .
?statement wikibase:rank ?rank .
OPTIONAL { ?statement a ?best . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
They are normal-rank statements, but there is a preferred-rank statement.
Truthy statements represent statements that have the best
non-deprecated rank for given property. Namely, if there is a
preferred statement for property P2, then only preferred statements
for P2 will be considered truthy. Otherwise, all normal-rank
statements for P2 are considered truthy.
Update
I have decreased the rank of the preferred statement just now. Please test your query again.
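
If editing ranks is not an option, a query can read the full statements instead of the truthy wdt: triples. The following is a sketch under that assumption: it reuses the two city items from the diagnostic query above and lists every non-deprecated "instance of" value, regardless of whether a preferred-rank statement exists.

```sparql
# Sketch: list all non-deprecated P31 values via the statement nodes (p:/ps:),
# so normal-rank statements are not hidden by a preferred-rank one.
SELECT ?city ?cityLabel ?class ?classLabel ?rank WHERE {
  VALUES ?city { wd:Q1733 wd:Q1726 }
  ?city p:P31 ?statement .
  ?statement ps:P31 ?class ;
             wikibase:rank ?rank .
  FILTER(?rank != wikibase:DeprecatedRank)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

The same p:P31 / ps:P31 pattern can replace wdt:P31 in a larger query when the class membership should not depend on ranks.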

Wikidata Query: Find American authors of children’s fiction

I want to find all children's fiction writers using a Wikidata SPARQL query, but I couldn't figure out how. Can someone help, please? The following is my approach, but I don't think it is the correct way.
SELECT ?item ?itemLabel {
?item wdt:P31 wd:Q5. #find humans
?item wdt:P106 wd: #humans whose occupation is a novelist
[another condition needed] #children's fiction.
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
} LIMIT 10
There is not one correct way, especially not in Wikidata where not all items of the same kind necessarily have the same properties.
One way would be to find the authors of works that are intended for (P2360) children:
# it’s a literary work (incl. any subclasses)
?book wdt:P31/wdt:P279* wd:Q7725634 .
# the literary work is intended for children
?book wdt:P2360 wd:Q7569 .
# the literary work has an author
?book wdt:P50 ?author .
# the author is a US citizen
?author wdt:P27 wd:Q30 .
Instead of getting all works that belong to the class "literary work" or any of its subclasses, you could decide to use only the class "fiction literature" (Q38072107) instead; with the risk that not all relevant works use this class.
Another way would be to find all authors that have "children’s writer" (Q4853732), or any of its subclasses, as occupation:
?author wdt:P106/wdt:P279* wd:Q4853732 .
?author wdt:P27 wd:Q30 .
As the different ways might find different results, you could use them in the same query, using UNION:
SELECT DISTINCT ?author ?authorLabel
WHERE {
{
# way 1
}
UNION
{
# way 2
}
UNION
{
# way 3
}
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
}
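
Filled in with the two ways described above, the template might look like the sketch below. It only combines the patterns already shown; the citizenship pattern is moved outside the UNION because both branches share it, and no third way is included since none is given here.

```sparql
# Sketch: union of "authors of works intended for children" and
# "people whose occupation is children's writer (or a subclass)".
SELECT DISTINCT ?author ?authorLabel
WHERE {
  {
    # way 1: author of a literary work intended for children
    ?book wdt:P31/wdt:P279* wd:Q7725634 ;
          wdt:P2360 wd:Q7569 ;
          wdt:P50 ?author .
  }
  UNION
  {
    # way 2: occupation is children's writer or any of its subclasses
    ?author wdt:P106/wdt:P279* wd:Q4853732 .
  }
  # shared condition: the author is a US citizen
  ?author wdt:P27 wd:Q30 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en'. }
}
```

DISTINCT matters here because an author found by both branches, or by several books, would otherwise appear more than once.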
