SPARQL query on Wikidata - How to get people who are alive when there is no isDead property? Also, how to identify people as belonging to a country - sparql

How to get the following data from Wikidata using SPARQL?
Result set 1:
List of all people who are male
Result set 2:
Take Result Set 1 as the source and then apply filter to get only the Men who are alive
Can we say if a person is alive if the dateOfDeath is not present?
SELECT DISTINCT ?person ?personLabel
WHERE
{
?person wdt:P31 wd:Q5 . # human
?person wdt:P21 wd:Q6581097 . # man
FILTER NOT EXISTS { ?person wdt:P570 [] } # alive
SERVICE wikibase:label {bd:serviceParam wikibase:language "fr,en" }
}
LIMIT 20
Result set 3:
Take Result Set 2 as the source and then apply a filter to get only the Living-Men who are South Africans (a player might be from another country and have played for South Africa in international cricket; those players should be excluded)
I don't suppose just Country of Citizenship would suffice.
SELECT DISTINCT ?person ?personLabel
WHERE
{
?person wdt:P31 wd:Q5 . # human
?person wdt:P21 wd:Q6581097 . # man
FILTER NOT EXISTS { ?person wdt:P570 [] } # alive
?person wdt:P27 wd:Q258 . # Country of citizenship - South Africa
SERVICE wikibase:label {bd:serviceParam wikibase:language "fr,en" }
}
LIMIT 20
Result set 4:
Take Result Set 3 as the source and then apply filter to get only the Living-South-African-Men who are Cricketers (a cricketer might also be a politician. Those players should be included)
Result set 5:
Take Result Set 4 as the source and then apply filter to get only the Living-South-African-Male-Cricketers who are below 30 years of age (as on today)
I want these 5 queries to be executed in the exact same order. Could you help me get these results, please?
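One possible sketch for result sets 4 and 5 extends the query above with an occupation pattern and an age computed from the date of birth. It assumes that wd:Q12299841 is the Wikidata item for "cricketer" and that country of citizenship (P27) is an acceptable proxy for "South African"; both assumptions should be checked before relying on the results. Keep in mind that a missing P570 only means that no date of death has been recorded, not that the person is certainly alive.

```sparql
SELECT DISTINCT ?person ?personLabel ?dob ?age
WHERE
{
  ?person wdt:P31 wd:Q5 ;            # human
          wdt:P21 wd:Q6581097 ;      # male
          wdt:P27 wd:Q258 ;          # country of citizenship: South Africa
          wdt:P106 wd:Q12299841 ;    # occupation: cricketer (assumed QID); other occupations are still allowed
          wdt:P569 ?dob .            # date of birth
  FILTER NOT EXISTS { ?person wdt:P570 [] }   # no recorded date of death
  # Age in whole years as of today: subtract one if the birthday has not yet occurred this year.
  BIND(YEAR(NOW()) - YEAR(?dob)
       - IF(MONTH(NOW()) < MONTH(?dob)
            || (MONTH(NOW()) = MONTH(?dob) && DAY(NOW()) < DAY(?dob)), 1, 0)
       AS ?age)
  FILTER(?age < 30)                  # result set 5
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" }
}
LIMIT 20
```

Removing the final FILTER gives an approximation of result set 4, although people without a recorded date of birth are still excluded by the P569 pattern. SPARQL evaluates all patterns together, so there is no need to run the five queries one after another; each later result set is simply the earlier query with one more pattern added.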

Related

How to get the number of languages a Wikipedia page is available in? (SPARQL query)

I'm trying to get a list of Italian books from 1980 onward, together with the number of Wikipedia language editions that have a page for each book. For instance, I would like to have:
Book, number
The Name of the Rose, 5
Where 5 is the number of languages The Name of the Rose has been translated into in Wikipedia, for instance there is an English wiki page, a Dutch one, a Spanish one, a French one, a Greek one.
Here is the query so far:
SELECT ?item ?label
WHERE
{
VALUES ?type {wd:Q571 wd:Q7725634} # book or literary work
?item wdt:P31 ?type .
?item wdt:P577 ?date FILTER (?date > "1980-01-01T00:00:00Z"^^xsd:dateTime) . # from 1980
?item rdfs:label ?label filter (lang(?label) = "it")
?item wdt:P495 wd:Q38 .
}
I get the list of books, but I can't find the right property to look for.
Did you actually get The Name of the Rose in your example? Because its date of publication is set to 1980 (which is = 1980-01-01 for our purposes), I had to change your query to a >= comparison.
Then, using COUNT() and GROUP BY as mentioned in the comment gets you what you want. But if you really just need the number of sitelinks, there is a shortcut that may be useful. It was added because that number is often used as a good proxy for an item's popularity, and using the precomputed number is vastly more efficient than getting all links, grouping, and counting.
SELECT ?book ?bookLabel ?sitelinks ?date WHERE {
VALUES ?type { wd:Q571 wd:Q47461344 wd:Q7725634 }
?book wdt:P31 ?type;
wdt:P577 ?date;
wdt:P495 wd:Q38;
wikibase:sitelinks ?sitelinks.
FILTER((?date >= "1980-01-01T00:00:00Z"^^xsd:dateTime) && (?date < "1981-01-01T00:00:00Z"^^xsd:dateTime))
SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }
}
Note that the values may slightly differ from the version with group & count because here sites such as commons or wikiquote are also included.
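For comparison, the COUNT() and GROUP BY variant mentioned above could look roughly like the following. This is a sketch, not the query from the original comment: it assumes the usual Wikidata Query Service modelling in which each sitelink is an article with schema:about pointing at the item and schema:isPartOf pointing at a site, and it uses wikibase:wikiGroup to keep only Wikipedia editions.

```sparql
SELECT ?book ?bookLabel (COUNT(DISTINCT ?article) AS ?wikipedias)
WHERE {
  VALUES ?type { wd:Q571 wd:Q7725634 }        # book or literary work
  ?book wdt:P31 ?type ;
        wdt:P577 ?date ;
        wdt:P495 wd:Q38 .                     # country of origin: Italy
  FILTER(?date >= "1980-01-01T00:00:00Z"^^xsd:dateTime)
  # One ?article per sitelink; keep only sitelinks that belong to a Wikipedia.
  ?article schema:about ?book ;
           schema:isPartOf ?site .
  ?site wikibase:wikiGroup "wikipedia" .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }
}
GROUP BY ?book ?bookLabel
ORDER BY DESC(?wikipedias)
```

Because it restricts the count to Wikipedia editions, this version is expected to return numbers equal to or lower than wikibase:sitelinks, and it is likely to be slower.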

How can I use SPARQL to "join table A to table B" and take only those rows in B with a MAX value?

The following SPARQL query asks Wikidata for all European countries (sovereign states, Q3624078) and loads for each country the last few known population numbers (P1082, population):
SELECT DISTINCT ?item $itemLabel ?population ?popDate
WHERE {
?item wdt:P31 wd:Q3624078. #select only sovereign states
?item wdt:P706/wdt:P361*|wdt:P361*|wdt:P30 wd:Q46. #select items which are geographically in Europe
?item p:P31 ?statement.
?statement ps:P31 wd:Q3624078.
FILTER NOT EXISTS { ?statement pq:P582 ?end. } #filter out items which are not sovereign states anymore
FILTER NOT EXISTS { ?item wdt:P31 wd:Q3024240. } #filter out historical countries
OPTIONAL {
?item p:P1082 ?popStatement.
?popStatement ps:P1082 ?population;
pq:P585 ?popDate.
FILTER ( YEAR(?popDate) >= 2014 )
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC($itemLabel)
How can I limit the population data to the most recent population number?
I tried limiting it to a specific year (FILTER (YEAR(?popDate) = 2018)), but this does not work because some countries don't have a population estimate for 2018.
This is similar to an SQL query like SELECT * FROM country c INNER JOIN population p ON p.CountryId = c.Id. In SQL, you would need to create a subquery with a MAX statement to find the most recent population data by country. How can I achieve the same in SPARQL?
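SPARQL 1.1 supports subqueries, so the SQL approach carries over fairly directly. The sketch below computes the latest P585 date per country in an inner SELECT with MAX and GROUP BY, then joins back to the population statement that carries that date. To keep it short, the Europe test is simplified to continent (P30) and the end-date and historical-country filters are left out; those simplifications are assumptions of this example, not part of the original query.

```sparql
SELECT ?item ?itemLabel ?population ?popDate
WHERE {
  # Inner query: one row per country with the date of its latest population statement.
  {
    SELECT ?item (MAX(?date) AS ?popDate)
    WHERE {
      ?item wdt:P31 wd:Q3624078 ;       # sovereign state
            wdt:P30 wd:Q46 ;            # continent: Europe (simplified from the original property path)
            p:P1082 ?anyPopStatement .
      ?anyPopStatement pq:P585 ?date .
    }
    GROUP BY ?item
  }
  # Outer pattern: join back to the statement that carries that date.
  ?item p:P1082 ?popStatement .
  ?popStatement ps:P1082 ?population ;
                pq:P585 ?popDate .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC(?itemLabel)
```

Like the SQL INNER JOIN, this drops countries that have no dated population statement at all. If two statements share the same latest date, both rows are returned.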

SPARQL query returning many more rows than expected

I'm using the following SPARQL query to get a list of all US presidents together with the start and end dates of their presidency:
SELECT ?person $personLabel $start $end
WHERE {
?person wdt:P39 wd:Q11696.
?person p:P39 ?statement.
?position_held_statement ps:P39 wd:Q11696.
?statement pq:P580 ?start.
?statement pq:P582 ?end
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC($start)
Why does it return so many rows?
EDIT: I know I can use SELECT DISTINCT to get only distinct results, but I want to learn how I can find out the reason for the duplicates. Moreover, there are entries which state that Barack Obama's presidency lasted from January 2009 to January 2017, while others state his two terms separately.
The ps:P39 pattern uses ?position_held_statement. That should be ?statement, so that it refers to the same statement as the p:P39, pq:P580 and pq:P582 patterns.
When you unexpectedly get a lot of results in SPARQL, it's usually because of mismatched variable names turning what should be a join into a Cartesian product.
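With the variable name fixed, the query could be written as follows. This is a sketch of the correction described above; it also drops the wdt:P39 pattern, which becomes redundant once the statement itself is matched.

```sparql
SELECT ?person ?personLabel ?start ?end
WHERE {
  ?person p:P39 ?statement .            # every "position held" statement of the person
  ?statement ps:P39 wd:Q11696 ;         # ...whose value is President of the United States
             pq:P580 ?start ;           # start time qualifier of that same statement
             pq:P582 ?end .             # end time qualifier of that same statement
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?start)
```

Every pattern now shares ?statement, so start and end dates can only come from statements about the presidency itself. A person can still appear more than once if Wikidata records several separate statements for them, for example one per term.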

Wikidata SPARQL - Countries and their (still existing) neighbours

I want to query the neighbours to a country with SPARQL from Wikidata like this:
SELECT ?country ?countryLabel WHERE {
?country wdt:P47 wd:Q183 .
FILTER NOT EXISTS{ ?country wdt:P576 ?date } # don't count dissolved country - at least filters German Democratic Republic
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
My issue is that, e.g. in this example for neighbours of Germany, countries that no longer exist are still shown, such as:
Kingdom of Denmark or
Saarland.
Already tried
I have already reduced the number of results with the FILTER statement.
Question
How can I change the query so that it returns only the 9 neighbouring countries?
(Splitting the result into land borders and sea borders would also be great.)
Alternative
Filtering at this API would be also fine for me https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q35
a database or lists or prepared HashMaps whatever with all countries of the world with neighbours
You could check entities of type wd:Q133346 ('border') or wd:Q12413618 ('international border'):
SELECT ?border ?borderLabel ?country1Label ?country2Label ?isLandBorder ?isMaritimeBorder ?constraint {
VALUES (?country1) {(wd:Q183)}
?border wdt:P31 wd:Q12413618 ;
wdt:P17 ?country1 , ?country2 .
FILTER (?country1 != ?country2)
BIND (EXISTS {?border wdt:P31 wd:Q15104814} AS ?isLandBorder)
BIND (EXISTS {?border wdt:P31 wd:Q3089219} AS ?isMaritimeBorder)
BIND ((?isLandBorder || ?isMaritimeBorder) AS ?constraint)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} ORDER BY ?country1Label
In some sense, records are duplicated: for the Afghanistan–Uzbekistan border, the list contains both (?country1=Afghanistan, ?country2=Uzbekistan) and (?country1=Uzbekistan, ?country2=Afghanistan).
a database or lists or prepared HashMaps whatever with all countries of the world with neighbours
You could ask on https://opendata.stackexchange.com.

Wikidata SPARQL: how to get a list of actors that were never directed by a given person?

Let's say I want a list of actors that were never directed by Tim Burton among a list of popular movies.
I tried to do it with these steps:
Select all actors that Tim Burton ever directed (sub select)
Select a list of actors from a list of popular movies (by imdb ids)
Exclude all actors from the first selection in the second selection (NOT IN)
Here is the code I tried, which does not work (the NOT IN fails, and I don't know why):
SELECT DISTINCT ?actor ?actorLabel
WHERE {
?film wdt:P31 wd:Q11424
;wdt:P161 ?actor
;wdt:P345 ?imdbId .
{
SELECT ?excludeActors
WHERE {
?film wdt:P31 wd:Q11424
; wdt:P57 wd:Q56008
; wdt:P161 ?excludeActors .
}
} .
FILTER(?actor NOT IN (?excludeActors)) .
FILTER(?imdbId = "tt1077368" || ?imdbId = "tt0167260") .
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" }
}
Or follow this link
(there is a filter on Christopher Lee that you can remove [last one], it is used to highlight what I explain here:)
In this code I have two movies: Dark Shadows (directed by Tim Burton) and The Lord of the Rings 3. In this example Christopher Lee is present in both movies, which means he should be excluded since Tim Burton directed him in Dark Shadows.
You can see that he is in the list.
I really don't understand why the NOT IN fails with the subselect. I ran the subselect on its own and found Christopher Lee in its results, which means he should be excluded.
If I understood correctly, you want all actors that acted in the given movies, but have never acted in any movie directed by Tim Burton. I would use FILTER NOT EXISTS:
SELECT DISTINCT ?actor ?actorLabel
WHERE {
VALUES ?imdbId { "tt1077368" "tt0167260" }
?film wdt:P31 wd:Q11424
;wdt:P161 ?actor
;wdt:P345 ?imdbId .
FILTER NOT EXISTS {
[] wdt:P31 wd:Q11424
; wdt:P57 wd:Q56008
; wdt:P161 ?actor .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" }
}
LIMIT 100
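As for why the NOT IN version keeps Christopher Lee: the subselect returns one solution per actor Tim Burton has directed, and joining it with the outer pattern pairs every ?actor with every ?excludeActors. The FILTER is evaluated row by row, so it only removes the rows where the two happen to be equal. The rows pairing Christopher Lee with any other excluded actor survive, and DISTINCT then collapses them into a single result. An alternative that avoids this is MINUS, sketched here under the same assumptions as the query above:

```sparql
SELECT DISTINCT ?actor ?actorLabel
WHERE {
  VALUES ?imdbId { "tt1077368" "tt0167260" }
  ?film wdt:P31 wd:Q11424 ;
        wdt:P161 ?actor ;
        wdt:P345 ?imdbId .
  # Remove every solution whose ?actor also appears in a film directed by Tim Burton.
  MINUS {
    ?burtonFilm wdt:P31 wd:Q11424 ;
                wdt:P57 wd:Q56008 ;
                wdt:P161 ?actor .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" }
}
```

MINUS works here because ?actor is shared between the two sides. When no variable is shared, MINUS removes nothing, which is one reason FILTER NOT EXISTS is often the safer default.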