Wikidata Query: Find American authors of children’s fiction

I want to find all children's fiction writers using a Wikidata SPARQL query, but I couldn't figure out how. Can someone help, please? The following is my approach, but I don't think it is the correct way.
SELECT ?item ?itemLabel {
?item wdt:P31 wd:Q5. #find humans
?item wdt:P106 wd: #humans whose occupation is a novelist
[another condition needed] #children's fiction.
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
} LIMIT 10

There is not one correct way, especially not in Wikidata, where not all items of the same kind necessarily have the same properties.
One way would be to find the authors of works that are intended for (P2360) children:
# it’s a literary work (incl. any subclasses)
?book wdt:P31/wdt:P279* wd:Q7725634 .
# the literary work is intended for children
?book wdt:P2360 wd:Q7569 .
# the literary work has an author
?book wdt:P50 ?author .
# the author is a US citizen
?author wdt:P27 wd:Q30 .
Instead of getting all works that belong to the class "literary work" or any of its subclasses, you could use only the class "fiction literature" (Q38072107), at the risk of missing relevant works that don't use this class.
Another way would be to find all authors that have "children’s writer" (Q4853732), or any of its subclasses, as occupation:
?author wdt:P106/wdt:P279* wd:Q4853732 .
?author wdt:P27 wd:Q30 .
As the different ways might find different results, you could use them in the same query, using UNION:
SELECT DISTINCT ?author ?authorLabel
WHERE {
{
# way 1
}
UNION
{
# way 2
}
UNION
{
# way 3
}
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
}
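For concreteness, here is the skeleton filled in with the two ways shown above. This is a sketch: the LIMIT is an added assumption to keep the public query service from running too long, and "way 3" is left out because no third way was given (further ways can be added as extra UNION branches).

```sparql
SELECT DISTINCT ?author ?authorLabel
WHERE {
  {
    # way 1: author of a literary work (or a subclass of it) intended for children
    ?book wdt:P31/wdt:P279* wd:Q7725634 .
    ?book wdt:P2360 wd:Q7569 .
    ?book wdt:P50 ?author .
    ?author wdt:P27 wd:Q30 .
  }
  UNION
  {
    # way 2: occupation is "children's writer" (or a subclass of it)
    ?author wdt:P106/wdt:P279* wd:Q4853732 .
    ?author wdt:P27 wd:Q30 .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en'. }
}
LIMIT 100
```

DISTINCT removes authors found by both ways. Since both branches share the citizenship pattern, it could also be written once outside the UNION; the result would be the same.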

Related

Wikidata COUNT(*) query times out

I have a straightforward query that counts how many humans have an English Wikipedia page.
prefix schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?article
WHERE
{
?item wdt:P31 wd:Q5 . # Must be of a human
?article schema:about ?item ; # Must have a Wikipedia article
schema:inLanguage "en" ; # Article must be in English
schema:isPartOf <https://en.wikipedia.org/> . # Wikipedia article must be regular article
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } # Helps get the label in your language, if not, then en language
}
I get the expected output, as follows:
wd:Q11124 <https://en.wikipedia.org/wiki/Stephen_Breyer>
wd:Q10727 <https://en.wikipedia.org/wiki/Steve_Leo_Beleck>
wd:Q10065 <https://en.wikipedia.org/wiki/Taichang_Emperor>
wd:Q9605 <https://en.wikipedia.org/wiki/Sarah_Allen_(software_developer)>
However, if I change the SELECT statement from
SELECT ?item ?article
to
SELECT (count(?item) as ?count)
I get a timeout error. Please note that the count statement works if I only specify the "human" condition and exclude the English Wikipedia article condition. So, clearly, some kind of background join is causing the query to time out.
However, this is a fairly trivial join, so the query timeout is surprising.
Please let me know what I may be missing here.
Thanks!
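For reference, here is the count variant described above, written out in full. The label service is dropped because a count selects no labels. This is only a sketch: removing the service is not guaranteed to bring the query under the public service's 60-second limit.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT (COUNT(?item) AS ?count)
WHERE
{
  ?item wdt:P31 wd:Q5 .                                # Must be a human
  ?article schema:about ?item ;                        # Must have a Wikipedia article
           schema:inLanguage "en" ;                    # Article must be in English
           schema:isPartOf <https://en.wikipedia.org/> .  # Article must be on English Wikipedia
}
```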

Get information from Wikidata

I have this Wikidata query that returns all the football stadiums with their names, coordinates, club labels and so on. But I cannot figure out how to also get the country and city where each stadium is located (and possibly the coordinates of the cities too).
Here is my query:
SELECT ?club ?clubLabel ?venue ?venueLabel ?coordinates
WHERE
{
?club wdt:P31 wd:Q476028 .
?club wdt:P115 ?venue .
?venue wdt:P625 ?coordinates .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
EDIT 19 November 2020:
I need the timezone of the cities, so after looking at the documentation I tried this query, but it does not return the value, just links like "wd:Q6723":
SELECT DISTINCT ?timezone ?club ?locationLabel ?countryLabel ?clubLabel ?venue ?venueLabel ?coordinates
WHERE
{
?venue (wdt:P421|wd:Q12143) ?timezone .
?club wdt:P31 wd:Q476028 .
?club wdt:P115 ?venue .
?venue wdt:P625 ?coordinates .
OPTIONAL {?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
OPTIONAL { ?location wdt:P17 ?country . }
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
I've split this over two queries now. The original query:
SELECT DISTINCT ?club ?locationLabel ?countryLabel ?clubLabel ?venue
?venueLabel ?coordinates
WHERE {
?club wdt:P31 wd:Q476028 .
?club wdt:P115 ?venue .
?venue wdt:P625 ?coordinates .
OPTIONAL {
?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
OPTIONAL { ?location wdt:P17 ?country . }
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
(Update #2: Previously, my query asked for the club's timezone. But, of course, that's not the sort of data usually recorded for a club. Instead, you have to go via the location/venue/headquarters or similar, and possibly a level up to region/country, because some suburbs don't have timezone data either.)
This is the general idea of how it should work, but it's running into a timeout, and so am I:
SELECT DISTINCT ?timezone ?timezoneLabel ?offset
?club ?clubLabel
WHERE {
?club wdt:P31 wd:Q476028 .
# via country: not perfect, because some countries have multiple timezones, but should be faster
?club wdt:P17/wdt:P421 ?timezone .
# what I really want to do; all sorts of alternatives
#?club wdt:P115?/(wdt:P159|wdt:P276)/wdt:P131?/wdt:P421 ?timezone .
?timezone wdt:P2907 ?offset.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
Short explanation:
This uses three new things. OPTIONAL makes the following statement, well, optional: clubs where nothing can be found will still be included in the output. The second OPTIONAL is nested in the first, as it's pointless to ask for the country of a location that we haven't found.
The pipe symbol (|) allows for alternatives. Here, I'm asking for "headquarters location" (P159), or checking two different ways to specify the location of the stadium. The slash, used in the latter case, denotes a path (club / venue / "located in district|location").
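The alternatives and the path in that OPTIONAL can also be spelled out as a UNION, which may be easier to read or extend. This sketch leaves out the surrounding OPTIONAL and the labels, and should find the same ?location values as the path expression:

```sparql
SELECT DISTINCT ?club ?location
WHERE {
  ?club wdt:P31 wd:Q476028 .
  {
    # headquarters location of the club
    ?club wdt:P159 ?location .
  }
  UNION
  {
    # home venue, then the administrative entity the venue is located in
    ?club wdt:P115 ?venue .
    ?venue wdt:P131 ?location .
  }
  UNION
  {
    # home venue, then the venue's location
    ?club wdt:P115 ?venue .
    ?venue wdt:P276 ?location .
  }
}
LIMIT 500
```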
If there is missing data (there will be missing data), you may want to look at examples and figure out whether there are other common patterns in which locations are recorded. You could, for example, move the inner OPTIONAL outside for cases where the club has a country statement but no other, more specific, location.
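One possible reading of that suggestion, as a sketch: keep the nested OPTIONAL, add a separate OPTIONAL for the club's own country (P17), and let COALESCE prefer the country found via the location. The variable names ?locationCountry and ?clubCountry are just illustrative.

```sparql
SELECT DISTINCT ?club ?clubLabel ?venue ?venueLabel ?coordinates ?locationLabel ?countryLabel
WHERE {
  ?club wdt:P31 wd:Q476028 .
  ?club wdt:P115 ?venue .
  ?venue wdt:P625 ?coordinates .
  OPTIONAL {
    ?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
    # country of the more specific location, if there is one
    OPTIONAL { ?location wdt:P17 ?locationCountry . }
  }
  # fallback: the club's own country statement
  OPTIONAL { ?club wdt:P17 ?clubCountry . }
  # COALESCE returns the first bound argument
  BIND(COALESCE(?locationCountry, ?clubCountry) AS ?country)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
LIMIT 500
```

Because COALESCE takes its first bound argument, the location's country wins whenever it exists, and the club's own country is only used when no location-based country was found.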
Update: I've included the timezone as requested in the comment. To note:
?timezoneLabel gets the timezone's label (= name), just as ?clubLabel gets the club's. The appended "...Label" is a "magic" function that translates IDs into human-readable labels. It is enabled by including that SERVICE wikibase:label... line.
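The label service can also be used in its explicit form, where you name the label variables yourself inside the SERVICE block. Here is a sketch using the asker's original venue path (the variable names ?clubName and ?timezoneName are arbitrary choices):

```sparql
SELECT ?club ?clubName ?venue ?timezone ?timezoneName
WHERE {
  ?club wdt:P31 wd:Q476028 .
  ?club wdt:P115 ?venue .
  # timezone recorded directly on the venue; many venues will have none
  ?venue wdt:P421 ?timezone .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    # explicit mode: bind each entity's English label to a variable of your choice
    ?club rdfs:label ?clubName .
    ?timezone rdfs:label ?timezoneName .
  }
}
LIMIT 100
```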
As you might want to use these timezones, I've included the marked line that gets the numeric offset in hours.
The offset may vary because UTC doesn't have daylight saving time. There should be multiple lines in the results for such cases, and you would need to read the "qualifiers" to see when they apply. Alternatively, maybe subtract the offset from some other timezone's offset (i.e. yours), and you might get lucky and they cancel out.
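To see which qualifiers apply to each offset, you can go through the full statement (p:/ps:/pq:) instead of the direct wdt:P2907 link. This is a sketch that lists every qualifier on the offset statements of one timezone. wd:Q6723 is the ID from the question and is only assumed here to be a timezone, so substitute any ?timezone returned by the queries above.

```sparql
SELECT ?offset ?qualifier ?qualifierLabel ?qualifierValue
WHERE {
  VALUES ?timezone { wd:Q6723 }    # assumed example ID, replace as needed
  ?timezone p:P2907 ?statement .   # full "UTC timezone offset" statement
  ?statement ps:P2907 ?offset .    # the offset value itself
  OPTIONAL {
    ?statement ?pq ?qualifierValue .
    # keep only qualifier predicates (pq:...) and find the property behind each
    ?qualifier wikibase:qualifier ?pq .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
```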

Wikidata examples and subclasses

The examples for counting instances at query.wikidata.org (and elsewhere) use the wdt:P31/wdt:P279* property path. For example, the number of humans in Wikidata:
SELECT (COUNT(?item) AS ?count)
WHERE {
?item wdt:P31/wdt:P279* wd:Q5 .
}
so that instances of subclasses of human (I don't think such a subclass exists, but think "Nobel laureate", etc.) are included. But then other examples, such as "women with most sitelinks and no image born in 1921 or later", which is truncated here:
SELECT ?s ?desc ?linkcount
WHERE
{
?s wdt:P31 wd:Q5 ; # human
wdt:P21 wd:Q6581072 ; # gender: female
wdt:P569 ?born .
FILTER (?born >= "1921-01-01T00:00:00Z"^^xsd:dateTime) .
...
don't use the wdt:P31/wdt:P279* property path. Are these generally oversights (or perhaps done for speed?), or is the subclass property path not actually needed in these cases and I'm too dense to see why?
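One way to check whether the subclass path matters for humans is to list the subclasses of human (Q5) and count their direct instances. This is a sketch. It uses wdt:P279+ rather than * so that Q5 itself is left out, and it may be slow if the subclass tree is large.

```sparql
SELECT ?class ?classLabel (COUNT(?item) AS ?instances)
WHERE {
  # every class that is (transitively) a subclass of human, excluding Q5 itself
  ?class wdt:P279+ wd:Q5 .
  # direct instances of that subclass, if any (OPTIONAL keeps empty subclasses at 0)
  OPTIONAL { ?item wdt:P31 ?class . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?class ?classLabel
ORDER BY DESC(?instances)
```

If this returns few or no instances, wdt:P31 wd:Q5 alone finds (nearly) the same items, and the subclass path mostly adds cost.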

How to check for a sub-property at all levels expanded from a SPARQL * wildcard?

In Wikidata, I want to find an item's country: either directly, if the item has a country statement itself, or by climbing up the P131s (located in the administrative territorial entity) until I find a country. Here is the query:
?item wdt:P131*/wdt:P17 ?country.
The query above works fine... except when a sub-division used to belong to another country, like for Q25270 (Prishtina). In such case, the result can be anachronistic. That's what I want to fix.
Great news: in such cases we should only consider the unique P131 (located in the administrative territorial entity) that has no P582 (end time) sub-property attached to it, and the problem is solved!
My question: how to alter my query above to achieve that?
Example: Let's say MyItem is in MyStreet is in MyTown is in MyRegion is in MyCountry, I must make sure that MyStreet, MyTown, and MyRegion do not have a P582 (end time).
(If "sub-property" is not the correct term, please let me know the right term and I will fix the question, thanks!)
An attempt
The query below works in most cases, but unfortunately it has a bug: it finds the wrong country in cases where the current country was also the country in the past (for instance, Alsace belonged to France until 1871, then to Germany, and currently to France again).
SELECT DISTINCT ?country WHERE {
wd:Q6556803 wdt:P131* ?area .
?area wdt:P17 ?country .
OPTIONAL {
wd:Q6556803 wdt:P131*/p:P131 [
pq:P582 ?endTime; ps:P131/wdt:P131* ?area
] .
} .
FILTER( !BOUND( ?endTime ) ) .
}
Wikidata uses different properties for direct links and links with extra information. So, for the statement "Prishtina is located in the administrative territorial entity Socialist Autonomous Province of Kosovo", there's the simple triple:
wd:Q25270 wdt:P131 wd:Q646035
And the long form with additional information (the end time):
wd:Q25270 p:P131 wds:Q25270-7df79cec-4938-8b6d-4e11-4dde6f72d73b .
wds:Q25270-7df79cec-4938-8b6d-4e11-4dde6f72d73b ps:P131 wd:Q646035 ;
pq:P582 "1990-01-01T00:00:00Z"^^xsd:dateTime .
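Both forms can be inspected directly. This sketch lists every P131 statement of Prishtina (Q25270) with its rank and its end time, if any. Note that the direct wdt: triples only exist for best-rank statements, while p: reaches all of them.

```sparql
SELECT ?area ?areaLabel ?rank ?endTime
WHERE {
  wd:Q25270 p:P131 ?statement .               # every "located in the administrative territorial entity" statement
  ?statement ps:P131 ?area .                  # the statement's main value
  ?statement wikibase:rank ?rank .            # preferred, normal or deprecated rank
  OPTIONAL { ?statement pq:P582 ?endTime . }  # end time qualifier, if present
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
```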
So, we need to filter out all paths with an end time (pq:P582):
SELECT DISTINCT ?s ?sLabel ?country ?countryLabel {
VALUES ?s {
wd:Q25270
}
?s wdt:P131* ?area .
?area wdt:P17 ?country .
FILTER NOT EXISTS {
?s p:P131/(ps:P131/p:P131)* ?statement .
?statement ps:P131 ?area .
?s p:P131/(ps:P131/p:P131)* ?intermediateStatement .
?intermediateStatement (ps:P131/p:P131)* ?statement .
?intermediateStatement pq:P582 ?endTime .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
limit 50
Here, ?intermediateStatement is a statement with an end time on the path from ?s to a country.
This query does seem to time out if ?s has more than one value. Also, the query does not take into account that there might be multiple links from an item to an area where one has an end time and the other doesn't (both paths will be filtered out).

Wikidata Sparql: how to access award point of time and work from a person

I am trying to adapt a Freebase-based quiz generator to Wikidata, since Freebase has closed.
I am having a lot of trouble doing so; for now I'm stuck on a simple problem:
Starting from a person, how can I get the date an award was won and the work it was won for?
Example: I want to get 2016 and The Revenant for Leonardo DiCaprio.
I tried several requests like this one:
SELECT ?id ?idLabel ?date ?forWork
WHERE {
wd:Q38111 wdt:P166 ?id .
?id wdt:P585 ?date .
?id wdt:1411 ?forWork .
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" }
}
The problem is that the point in time (P585) is linked to the award received (P166) statement, and not to Leonardo DiCaprio or to Academy Award for Best Actor.
This information is available on the Leonardo DiCaprio page (as part of the awards received).
Another problem I have is accessing all of Leonardo's data from his name as a string rather than an ID.
As all the data seems to be in the qualifiers, I came up with something like this:
SELECT ?actor ?actorLabel ?award ?awardLabel ?date ?forWork ?forWorkLabel
WHERE
{
# find a human
?actor wdt:P31 wd:Q5 .
# with English label "Leonardo DiCaprio"
?actor rdfs:label "Leonardo DiCaprio"@en .
# Now comes the statements/qualifiers magic:
# just applying what the documentation says https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries#Working_with_qualifiers
# using this query as example https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries#US_presidents_and_their_spouses.2C_in_date_order
?actor p:P166 ?awardstatement .
?awardstatement ps:P166 ?award .
?awardstatement pq:P585 ?date .
?awardstatement pq:P1686 ?forWork .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr" . }
}
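Here is a variant of the same query, as a sketch. Wrapping the qualifiers in OPTIONAL keeps awards that have no date or no work recorded, which the version above silently drops, and the results are sorted by date.

```sparql
SELECT ?award ?awardLabel ?date ?forWork ?forWorkLabel
WHERE
{
  # a human with the English label "Leonardo DiCaprio"
  ?actor wdt:P31 wd:Q5 ;
         rdfs:label "Leonardo DiCaprio"@en .
  # full "award received" statements, so the qualifiers are reachable
  ?actor p:P166 ?awardStatement .
  ?awardStatement ps:P166 ?award .
  # qualifiers are optional, so awards missing them are still listed
  OPTIONAL { ?awardStatement pq:P585 ?date . }
  OPTIONAL { ?awardStatement pq:P1686 ?forWork . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr" . }
}
ORDER BY ?date
```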