Wikidata SPARQL: how to access an award's point in time and work from a person

I am trying to adapt a Freebase-based quiz generator to Wikidata, since Freebase has closed.
I am having a lot of trouble doing so; for now I'm stuck on a simple problem:
How can I get the date an award was won and the work it was won for, starting from a person?
Example: I want to get 2016 and The Revenant for Leonardo DiCaprio.
I tried several queries like this one:
SELECT ?id ?idLabel ?date ?forWork
WHERE {
wd:Q38111 wdt:P166 ?id .
?id wdt:P585 ?date .
?id wdt:P1411 ?forWork .
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" }
}
The problem is that the point in time (P585) is attached to the award received (P166) statement, not to Leonardo DiCaprio or to Academy Award for Best Actor.
This information is available on the Leonardo DiCaprio page (as sub-parts of the award received statements).
Another problem I have is accessing all of Leonardo's data starting from his name as a string rather than an ID.

As all the data seems to be in the qualifiers, I came up with something like this:
SELECT ?actor ?actorLabel ?award ?awardLabel ?date ?forWork ?forWorkLabel
WHERE
{
# find a human
?actor wdt:P31 wd:Q5 .
# with English label "Leonardo DiCaprio"
?actor rdfs:label "Leonardo DiCaprio"@en .
# Now comes the statements/qualifiers magic:
# just applying what the documentation says https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries#Working_with_qualifiers
# using this query as example https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries#US_presidents_and_their_spouses.2C_in_date_order
?actor p:P166 ?awardstatement .
?awardstatement ps:P166 ?award .
?awardstatement pq:P585 ?date .
?awardstatement pq:P1686 ?forWork .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,fr" . }
}
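For the quiz generator use case, the query above can be produced from a name and its result reduced to (award, year, work) rows. The following is only a minimal Python sketch: the helper names build_award_query and quiz_rows are made up for illustration, the sample response is hand-written in the standard SPARQL JSON results format (it is not real endpoint output), and the year parsing assumes four-digit positive years.

```python
# Query template mirroring the statement/qualifier pattern of the answer above.
QUERY_TEMPLATE = """
SELECT ?award ?awardLabel ?date ?forWork ?forWorkLabel WHERE {{
  ?actor wdt:P31 wd:Q5 ;
         rdfs:label "{name}"@en ;
         p:P166 ?awardstatement .
  ?awardstatement ps:P166 ?award ;
                  pq:P585 ?date ;
                  pq:P1686 ?forWork .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en,fr" . }}
}}
"""


def build_award_query(name):
    """Return the award query for a person identified by English label."""
    # Escape backslashes and double quotes so the name stays a valid string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return QUERY_TEMPLATE.format(name=escaped)


def quiz_rows(response):
    """Turn a SPARQL JSON result into (award label, year, work label) tuples."""
    rows = []
    for binding in response["results"]["bindings"]:
        # Assumes a date such as 2016-01-01T00:00:00Z; only the year is kept.
        year = int(binding["date"]["value"][:4])
        rows.append(
            (binding["awardLabel"]["value"], year, binding["forWorkLabel"]["value"])
        )
    return rows


# Hand-written sample payload (an assumption, not fetched from Wikidata).
sample = {
    "results": {
        "bindings": [
            {
                "awardLabel": {"type": "literal", "value": "Academy Award for Best Actor"},
                "date": {"type": "literal", "value": "2016-01-01T00:00:00Z"},
                "forWorkLabel": {"type": "literal", "value": "The Revenant"},
            }
        ]
    }
}

print(build_award_query("Leonardo DiCaprio"))
print(quiz_rows(sample))
```

Sending the query to the endpoint is left out on purpose; any HTTP client that requests JSON results should return a structure shaped like the sample.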

Related

Wikidata Query: Find American authors of children’s fiction

I want to find all children's fiction writers using a Wikidata SPARQL query, but I couldn't figure out how. Can someone help, please? The following is my approach, but I don't think it is the correct way.
SELECT ?item ?itemLabel {
?item wdt:P31 wd:Q5. #find humans
?item wdt:P106 wd: #humans whose occupation is a novelist
[another condition needed] #children's fiction.
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
} LIMIT 10
There is not one correct way, especially not in Wikidata where not all items of the same kind necessarily have the same properties.
One way would be to find the authors of works that are intended for (P2360) children:
# it’s a literary work (incl. any subclasses)
?book wdt:P31/wdt:P279* wd:Q7725634 .
# the literary work is intended for children
?book wdt:P2360 wd:Q7569 .
# the literary work has an author
?book wdt:P50 ?author .
# the author is a US citizen
?author wdt:P27 wd:Q30 .
Instead of getting all works that belong to the class "literary work" or any of its subclasses, you could decide to use only the class "fiction literature" (Q38072107) instead; with the risk that not all relevant works use this class.
Another way would be to find all authors that have "children’s writer" (Q4853732), or any of its subclasses, as occupation:
?author wdt:P106/wdt:P279* wd:Q4853732 .
?author wdt:P27 wd:Q30 .
As the different ways might find different results, you could use them in the same query, using UNION:
SELECT DISTINCT ?author ?authorLabel
WHERE {
{
# way 1
}
UNION
{
# way 2
}
UNION
{
# way 3
}
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
}
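If the queries are generated from code, the UNION skeleton above can be filled in mechanically. This is a rough Python sketch, not part of the original answer: union_query is a hypothetical helper, and the two fragments are simply the patterns from "way 1" and "way 2" above.

```python
def union_query(patterns, select="?author ?authorLabel", language="en"):
    """Join graph-pattern fragments into one SELECT DISTINCT query using UNION."""
    branches = "\n  UNION\n".join(
        "  {\n    " + pattern.strip() + "\n  }" for pattern in patterns
    )
    return (
        f"SELECT DISTINCT {select}\nWHERE {{\n{branches}\n"
        f"  SERVICE wikibase:label {{ bd:serviceParam wikibase:language '{language}'. }}\n}}"
    )


# Way 1: authors of literary works intended for children.
by_work = """?book wdt:P31/wdt:P279* wd:Q7725634 .
    ?book wdt:P2360 wd:Q7569 .
    ?book wdt:P50 ?author .
    ?author wdt:P27 wd:Q30 ."""

# Way 2: people whose occupation is children's writer (or a subclass).
by_occupation = """?author wdt:P106/wdt:P279* wd:Q4853732 .
    ?author wdt:P27 wd:Q30 ."""

query = union_query([by_work, by_occupation])
print(query)
```

Each branch has to bind the same variable (?author here), otherwise DISTINCT cannot merge the results of the different ways.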

Get all properties, sub-properties and label IDs from Wikidata item

I am trying to write a query that returns all possible information from a Wikidata page, such as https://www.wikidata.org/wiki/Q1299.
Ideally I would like to retrieve all the information present on that page in English.
So I am trying this query:
SELECT ?wdLabel ?ps_Label ?wdpqLabel ?pq_Label WHERE {
VALUES ?artist {
wd:Q1299
}
?artist ?p ?statement.
?statement ?ps ?ps_.
?wd wikibase:claim ?p;
wikibase:statementProperty ?ps.
OPTIONAL {
?statement ?pq ?pq_.
?wdpq wikibase:qualifier ?pq.
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This works quite well, but I would also like to retrieve the Wikidata IDs behind the values in the ps_Label column.
For example, here I have Paul McCartney as a string and I would also like to have the Wikidata ID associated with that item, which is Q2599:
has part Paul McCartney start time 1960-01-01T00:00:00Z
has part Paul McCartney end time 1970-01-01T00:00:00Z
has part Paul McCartney object has role singer
has part Paul McCartney object has role instrumentalist
Something similar to the query below, but I can't merge the two because I am missing some hint about sub-properties.
SELECT ?propUrl ?propLabel ?valUrl ?valLabel WHERE {
wd:Q1299 ?propUrl ?valUrl.
?property ?ref ?propUrl;
rdf:type wikibase:Property;
rdfs:label ?propLabel.
?valUrl rdfs:label ?valLabel.
FILTER((LANG(?valLabel)) = "en")
FILTER((LANG(?propLabel)) = "en")
}
ORDER BY (?propUrl) (?valUrl)
Thanks a lot for your help!
This isn’t exactly what you’re asking for, but if you really just want all data on a single object, it is far easier and more economical to use the data access provided by https://www.wikidata.org/wiki/Special:EntityData/Q2599.json
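To give an idea of what working with that JSON looks like, here is a small Python sketch that walks the claims of one entity and returns property IDs, value IDs and qualifiers. It assumes the usual Wikidata entity JSON layout (entities, claims, mainsnak, qualifiers, datavalue); the function names are made up, and the sample is a hand-trimmed fragment for Q1299 rather than a real download. Labels are not resolved here; that would need a separate lookup.

```python
def snak_value(snak):
    """Return a compact value for one snak, or None for novalue/somevalue snaks."""
    if snak.get("snaktype") != "value":
        return None
    datavalue = snak["datavalue"]
    if datavalue["type"] == "wikibase-entityid":
        return datavalue["value"]["id"]      # e.g. "Q2599"
    if datavalue["type"] == "time":
        return datavalue["value"]["time"]    # e.g. "+1960-01-01T00:00:00Z"
    return datavalue["value"]


def statements_with_ids(entity_data, qid):
    """Yield (property_id, value, qualifiers) for every statement of one entity."""
    claims = entity_data["entities"][qid]["claims"]
    for prop, statements in claims.items():
        for statement in statements:
            qualifiers = {
                qualifier_prop: [snak_value(s) for s in snaks]
                for qualifier_prop, snaks in statement.get("qualifiers", {}).items()
            }
            yield prop, snak_value(statement["mainsnak"]), qualifiers


# Hand-trimmed sample (an assumption about shape, not a full EntityData response).
sample = {
    "entities": {
        "Q1299": {
            "claims": {
                "P527": [
                    {
                        "mainsnak": {
                            "snaktype": "value",
                            "property": "P527",
                            "datavalue": {
                                "type": "wikibase-entityid",
                                "value": {"entity-type": "item", "id": "Q2599"},
                            },
                        },
                        "qualifiers": {
                            "P580": [
                                {
                                    "snaktype": "value",
                                    "property": "P580",
                                    "datavalue": {
                                        "type": "time",
                                        "value": {"time": "+1960-01-01T00:00:00Z"},
                                    },
                                }
                            ]
                        },
                    }
                ]
            }
        }
    }
}

for row in statements_with_ids(sample, "Q1299"):
    print(row)
```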

Get information from Wikidata

I have this Wikidata query that returns all the football stadiums with the names, coordinates, club labels and stuff like this. But I cannot figure out how to also get the country and city names where stadiums are located (and possibly the coordinates of the cities too).
Here is my query:
SELECT ?club ?clubLabel ?venue ?venueLabel ?coordinates
WHERE
{
?club wdt:P31 wd:Q476028 .
?club wdt:P115 ?venue .
?venue wdt:P625 ?coordinates .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
EDIT 19th November 2020:
I need the timezone of the cities, so I tried this query after looking at the documentation, but it does not return the value, just links like "wd:Q6723":
SELECT DISTINCT ?timezone ?club ?locationLabel ?countryLabel ?clubLabel ?venue ?venueLabel ?coordinates
WHERE
{
?venue (wdt:P421|wd:Q12143) ?timezone .
?club wdt:P31 wd:Q476028 .
?club wdt:P115 ?venue .
?venue wdt:P625 ?coordinates .
OPTIONAL {?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
OPTIONAL { ?location wdt:P17 ?country . }
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
Split over two now. Original query:
SELECT DISTINCT ?club ?locationLabel ?countryLabel ?clubLabel ?venue
?venueLabel ?coordinates
WHERE {
?club wdt:P31 wd:Q476028 .
?club wdt:P115 ?venue .
?venue wdt:P625 ?coordinates .
OPTIONAL {
?club wdt:P159|(wdt:P115/(wdt:P131|wdt:P276)) ?location .
OPTIONAL { ?location wdt:P17 ?country . }
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
Update #2: Previously, I asked for the club's timezone. But, of course, that's not the sort of data usually recorded for a club. Instead, you have to go via the location/venue/headquarters or similar, and possibly a level up to region/country, because some suburbs don't have timezone data either.
This is the general idea of how it should work, but it's running into a timeout, and so am I:
SELECT DISTINCT ?timezone ?timezoneLabel ?offset
?club ?clubLabel
WHERE {
?club wdt:P31 wd:Q476028 .
# via country. not perfect, because some have multiple timezones, but should be faster
?club wdt:P17/wdt:P421 ?timezone .
# what I really want to do; all sorts of alternatives
#?club wdt:P115?/(wdt:P159|wdt:P276)/wdt:P131?/wdt:P421 ?timezone .
?timezone wdt:P2907 ?offset.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} limit 500
Short explanation:
This uses three new things. OPTIONAL makes the following statement, well, optional. Clubs where nothing can be found will still be included in the output. The second OPTIONAL is nested in the first, as it's pointless to ask for the country of a location that we haven't found.
The pipe symbol (|) allows for alternatives. Here, I'm asking for "headquarters location" (P159), or checking two different ways to specify the location of the stadium. The slash, used in the latter case, denotes a path (club / venue / "located in district|location").
If there is missing data (there will be missing data), you may want to look at examples and figure out whether there are other common patterns in which locations are recorded. You could, for example, move the inner OPTIONAL outside for cases where the club has a country statement but no other, more specific, location.
Update: I've included the timezone as requested in the comment. To note:
?timezoneLabel gets the timezone's label (= name), just as ?clubLabel gets the club's. The appended "...Label" is a "magic" function that translates from IDs to human-readable labels. It is enabled by including that SERVICE wikibase:label... line.
As you might want to use these timezones, I've included the ?timezone wdt:P2907 ?offset line, which gets the numeric offset in hours.
The offset may vary because UTC doesn't have daylight saving time. There should be multiple lines in the results for such cases, and you would need to read the ''qualifiers'' to see when they apply. Alternatively, maybe subtract the offset from some other timezone's offset (i.e. yours) and you might get lucky and they cancel out.
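Since a club can show up on several result lines when its timezone has more than one offset, it may help to collapse the rows per club after fetching them. The following is a small Python sketch of that post-processing step only; offsets_by_club is a made-up helper, and the sample uses an invented club name and invented offsets purely to show the shape of the SPARQL JSON results.

```python
from collections import defaultdict


def offsets_by_club(response):
    """Group the ?offset values of a SPARQL JSON result by ?clubLabel."""
    grouped = defaultdict(set)
    for binding in response["results"]["bindings"]:
        club = binding["clubLabel"]["value"]
        grouped[club].add(float(binding["offset"]["value"]))
    # Sorted lists are easier to compare and print than sets.
    return {club: sorted(offsets) for club, offsets in grouped.items()}


# Invented sample rows (hypothetical club, hypothetical offsets).
sample = {
    "results": {
        "bindings": [
            {"clubLabel": {"value": "Example FC"}, "offset": {"value": "1"}},
            {"clubLabel": {"value": "Example FC"}, "offset": {"value": "2"}},
        ]
    }
}

print(offsets_by_club(sample))
```

A club that ends up with more than one offset is a hint that the qualifiers need to be consulted to see which offset applies when.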

Easiest way to query if a name exists in Wikidata?

I'm trying to create the simplest possible query to check if a name exists in Wikidata. For example, I just want to see if the common name "Jack Smith" is in Wikidata.
Following the example query in this StackOverflow answer, I created the following SPARQL query:
SELECT distinct ?item ?itemLabel ?itemDescription WHERE{
?item ?label "Jack Smith" .
?article schema:about ?item .
?article schema:inLanguage "en" .
?article schema:isPartOf <https://en.wikipedia.org/>.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 10
However, it returns zero results.
On the other hand, if I search for "Jack Smith" through the Wikidata search webpage, I get back 1200+ results with a lot of people named "Jack Smith".
The behavior of the SPARQL interface seems inconsistent. When I run a SPARQL search for "Abraham Lincoln", I correctly get back an entry for the American president by that name.
My specific questions are:
Why is my SPARQL query returning zero results for "Jack Smith"?
What is the simplest API call to check if a name (e.g. Jack Smith) exists in Wikidata? It could be SPARQL or some other REST API.
Thank you.
The query returns zero results because there is no Jack Smith that is the subject of an English-language article that is part of Wikipedia.
Every line in the body of a SPARQL query is a way to restrict the search as opposed to just a way to add new variables.
Look at this query:
SELECT ?person ?dateOfDeath
WHERE {
?person a :Person .
?person :date_of_death ?dateOfDeath .
}
This would only return people who have a date of death, i.e. who are dead, as well as the date.
If you want to return every person, with a date of death only where one exists (people who are still alive have no date, but you still want the person), use this:
SELECT ?person ?dateOfDeath
WHERE {
?person a :Person .
OPTIONAL {?person :date_of_death ?dateOfDeath }
}
In terms of your second question, I'd try something like this:
SELECT ?boolean
WHERE{
BIND(EXISTS{?item ?label "Jack Smith"} AS ?boolean)
}
This can of course also be issued as, say, a cURL request:
curl https://query.wikidata.org/bigdata/namespace/wdq/sparql -X POST --data 'query=SELECT%20%3Fboolean%0AWHERE%7B%0ABIND%28EXISTS%7B%3Fitem%20%3Flabel%20%22Jack%20Smith%22%7D%20AS%20%3Fboolean%29%0A%20%20%7D'
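The same kind of check can be prepared from Python. Note that this sketch deliberately swaps in a different technique from the query above: it uses an ASK query and matches only the language-tagged rdfs:label, which is narrower than ?item ?label "Jack Smith" (aliases and plain string values are not covered). The helper names are hypothetical, the endpoint is the one from the cURL example, and the format=json parameter and the shape of the ASK response are assumptions based on the standard SPARQL JSON format. No request is sent here.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the cURL example above.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"


def label_exists_query(name, lang="en"):
    """Build an ASK query that is true if any item has this exact label."""
    # Escape backslashes and double quotes to keep the string literal valid.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'ASK {{ ?item rdfs:label "{escaped}"@{lang} }}'


def request_url(name, lang="en"):
    """Return the GET URL for the ASK query, asking for a JSON response."""
    params = {"query": label_exists_query(name, lang), "format": "json"}
    return ENDPOINT + "?" + urlencode(params)


def parse_ask(body):
    """Read the boolean out of a SPARQL JSON response to an ASK query."""
    return bool(json.loads(body)["boolean"])


print(label_exists_query("Jack Smith"))
print(request_url("Jack Smith"))
# Hand-written response body, shaped like a SPARQL JSON ASK result.
print(parse_ask('{"head": {}, "boolean": true}'))
```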

How to check for a sub-property at all levels expanded from a SPARQL * wildcard?

In Wikidata, I want to find an item's country. Either directly if the item has a country directly, or by climbing up the P131s (located in the administrative territorial entity) until I find a country. Here is the query:
?item wdt:P131*/wdt:P17 ?country.
The query above works fine... except when a sub-division used to belong to another country, like for Q25270 (Prishtina). In such case, the result can be anachronistic. That's what I want to fix.
Great news: in such cases we should only consider the unique P131 (located in the administrative territorial entity) that has no P582 (end time) sub-property attached to it, and the problem is solved!
My question: how to alter my query above to achieve that?
Example: Let's say MyItem is in MyStreet is in MyTown is in MyRegion is in MyCountry, I must make sure that MyStreet, MyTown, and MyRegion do not have a P582 (end time).
(If "sub-property" is not the correct term, please let me know the right term and I will fix the question, thanks!)
An attempt
The query below works in most cases, but unfortunately it has a bug: It finds the wrong country in cases where the current country was also the country in the past (for instance Alsace belonged to France until 1871 then to Germany and currently to France again).
SELECT DISTINCT ?country WHERE {
wd:Q6556803 wdt:P131* ?area .
?area wdt:P17 ?country .
OPTIONAL {
wd:Q6556803 wdt:P131*/p:P131 [
pq:P582 ?endTime; ps:P131/wdt:P131* ?area
] .
} .
FILTER( !BOUND( ?endTime ) ) .
}
Wikidata uses different properties for direct links and links with extra information. So, for the statement "Prishtina is located in the administrative territorial entity Socialist Autonomous Province of Kosovo", there's the simple triple:
wd:Q25270 wdt:P131 wd:Q646035
And the long form with additional information (the end time):
wd:Q25270 p:P131 wds:Q25270-7df79cec-4938-8b6d-4e11-4dde6f72d73b .
wds:Q25270-7df79cec-4938-8b6d-4e11-4dde6f72d73b ps:P131 wd:Q646035 ;
pq:P582 "1990-01-01T00:00:00Z"
So, we need to filter out all paths with an end time (pq:P582):
SELECT DISTINCT ?s ?sLabel ?country ?countryLabel {
VALUES ?s {
wd:Q25270
}
?s wdt:P131* ?area .
?area wdt:P17 ?country .
FILTER NOT EXISTS {
?s p:P131/(ps:P131/p:P131)* ?statement .
?statement ps:P131 ?area .
?s p:P131/(ps:P131/p:P131)* ?intermediateStatement .
?intermediateStatement (ps:P131/p:P131)* ?statement .
?intermediateStatement pq:P582 ?endTime .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
limit 50
Here, ?intermediateStatement is a statement with an end time on the path from ?s to a country.
This query does seem to time out if there is more than one value set for ?s. Also, the query does not take into account that there might exist multiple links from an item to an area where one has a timestamp and the other doesn't (both paths will be filtered out).
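The intended logic (climb the "located in" statements, ignore any statement that has an end time, stop at the first area with a country) can also be written out procedurally, which may be easier to reason about than the property path. The following Python sketch runs on a toy in-memory graph built from the MyItem/MyStreet/MyTown example in the question; the function name, the data layout and the extra MyOldRegion/OldCountry entries are all invented for illustration, and nothing here queries Wikidata.

```python
from collections import deque


def current_country(item, located_in, country_of):
    """Climb 'located in' links that have no end time until a country is found.

    located_in: dict mapping an item to a list of (parent, end_time) pairs,
                standing in for p:P131 statements with an optional pq:P582.
    country_of: dict mapping an item to its country, standing in for wdt:P17.
    """
    seen = set()
    frontier = deque([item])
    while frontier:
        node = frontier.popleft()
        if node in seen:
            continue
        seen.add(node)
        if node in country_of:
            return country_of[node]
        for parent, end_time in located_in.get(node, []):
            # Skip statements that carry an end time: they describe the past.
            if end_time is None:
                frontier.append(parent)
    return None


# Toy data: MyTown used to be in MyOldRegion (ended 1990) and is now in MyRegion.
located_in = {
    "MyItem": [("MyStreet", None)],
    "MyStreet": [("MyTown", None)],
    "MyTown": [("MyOldRegion", "1990-01-01T00:00:00Z"), ("MyRegion", None)],
}
country_of = {"MyOldRegion": "OldCountry", "MyRegion": "MyCountry"}

print(current_country("MyItem", located_in, country_of))
```

Because the end time is checked per statement rather than per area, an item with one ended and one current link to the same area is still followed through the current link in this sketch.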