Wikidata SPARQL: how to get a list of actors that were never directed by a given person?

Let's say I want a list of actors that were never directed by Tim Burton among a list of popular movies.
I tried to do it in these steps:
1. Select all actors that Tim Burton ever directed (subselect).
2. Select the actors from a list of popular movies (by IMDb IDs).
3. Exclude the actors of the first selection from the second selection (NOT IN).
Here is the code I tried, which does not work (the NOT IN fails, and I don't know why):
SELECT DISTINCT ?actor ?actorLabel
WHERE {
?film wdt:P31 wd:Q11424
;wdt:P161 ?actor
;wdt:P345 ?imdbId .
{
SELECT ?excludeActors
WHERE {
?film wdt:P31 wd:Q11424
; wdt:P57 wd:Q56008
; wdt:P161 ?excludeActors .
}
} .
FILTER(?actor NOT IN (?excludeActors)) .
FILTER(?imdbId = "tt1077368" || ?imdbId = "tt0167260") .
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" }
}
Or follow the link to the query (it has an extra filter on Christopher Lee, the last one, which you can remove; it is only there to highlight what I explain here):
In this code I have two movies: Dark Shadows (directed by Tim Burton) and The Lord of the Rings 3. In this example Christopher Lee is present in both movies, which means he should be excluded since Tim Burton directed him in Dark Shadows.
You can see that he is in the list.
I really don't understand why the NOT IN fails with the subselect. I ran the subselect on its own and Christopher Lee is in its results, which means he should be excluded.
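Why the NOT IN fails: a subselect inside WHERE is not a list of values but a pattern that is joined with the rest of the query. Because ?excludeActors shares no variable with the outer pattern, the join is a cross product: every ?actor row is paired with every row of the subselect, and `?actor NOT IN (?excludeActors)` compares ?actor with only that one row's ?excludeActors value. Christopher Lee therefore survives in every row where ?excludeActors is someone else, and DISTINCT collapses those rows back into one result. The diagnostic sketch below (not a fix) makes the cross product visible by counting how many subselect rows each actor is joined with; it reuses the IDs from the question, uses VALUES instead of the FILTER on ?imdbId, and the LIMIT only keeps the output small.

```sparql
# Diagnostic only: how many subselect rows is each outer ?actor joined with?
SELECT ?actor (COUNT(?excludeActors) AS ?pairedRows)
WHERE {
  VALUES ?imdbId { "tt1077368" "tt0167260" }
  ?film wdt:P31 wd:Q11424 ;
        wdt:P161 ?actor ;
        wdt:P345 ?imdbId .
  {
    # Same subselect as in the question. Its inner ?film is not projected,
    # so it is not joined with the outer ?film: the join is a cross product.
    SELECT ?excludeActors
    WHERE {
      ?film wdt:P31 wd:Q11424 ;
            wdt:P57 wd:Q56008 ;
            wdt:P161 ?excludeActors .
    }
  }
}
GROUP BY ?actor
LIMIT 20
```

Every actor should get a large ?pairedRows value (the size of the subselect result, multiplied by how often the actor matches the outer pattern), which is why a per-row NOT IN cannot exclude anyone.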

If I understood correctly, you want all actors that acted in the given movies, but have never acted in any movie directed by Tim Burton. I would use FILTER NOT EXISTS:
SELECT DISTINCT ?actor ?actorLabel
WHERE {
VALUES ?imdbId { "tt1077368" "tt0167260" }
?film wdt:P31 wd:Q11424
;wdt:P161 ?actor
;wdt:P345 ?imdbId .
FILTER NOT EXISTS {
[] wdt:P31 wd:Q11424
; wdt:P57 wd:Q56008
; wdt:P161 ?actor .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" }
}
LIMIT 100
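MINUS is another way to express the same exclusion: it removes every outer solution whose ?actor also appears in a solution of the MINUS pattern, i.e. in the cast of some Tim Burton film. A sketch reusing the answer's IDs. The Tim Burton film is deliberately called ?burtonFilm rather than ?film: if it shared the outer ?film variable, MINUS would only remove an actor from rows for that same film, and Christopher Lee would come back through The Lord of the Rings.

```sparql
SELECT DISTINCT ?actor ?actorLabel
WHERE {
  VALUES ?imdbId { "tt1077368" "tt0167260" }
  ?film wdt:P31 wd:Q11424 ;
        wdt:P161 ?actor ;
        wdt:P345 ?imdbId .
  # Subtract every ?actor who is in the cast of any Tim Burton film.
  # Only ?actor is shared with the outer pattern, so the film does not have to match.
  MINUS {
    ?burtonFilm wdt:P31 wd:Q11424 ;
                wdt:P57 wd:Q56008 ;
                wdt:P161 ?actor .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr" }
}
LIMIT 100
```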

Related

SPARQL query on Wikidata - How to get people who are alive when there is no isDead property? Also, how to identify people by the country they belong to

How to get the following data from Wikidata using SPARQL?
Result set 1:
List of all people who are male
Result set 2:
Take Result Set 1 as the source and then apply a filter to get only the men who are alive
Can we say that a person is alive if the date of death (P570) is not present?
SELECT DISTINCT ?person ?personLabel
WHERE
{
?person wdt:P31 wd:Q5 . # human
?person wdt:P21 wd:Q6581097 . # man
FILTER NOT EXISTS { ?person wdt:P570 [] } # alive
SERVICE wikibase:label {bd:serviceParam wikibase:language "fr,en" }
}
LIMIT 20
Result set 3:
Take Result Set 2 as the source and then apply a filter to get only the living men who are South African (a player might be from another country and have played for South Africa in international cricket; those players should be excluded)
I suppose country of citizenship alone would not suffice.
SELECT DISTINCT ?person ?personLabel
WHERE
{
?person wdt:P31 wd:Q5 . # human
?person wdt:P21 wd:Q6581097 . # man
FILTER NOT EXISTS { ?person wdt:P570 [] } # alive
?person wdt:P27 wd:Q258 . # Country of citizenship - South Africa
SERVICE wikibase:label {bd:serviceParam wikibase:language "fr,en" }
}
LIMIT 20
Result set 4:
Take Result Set 3 as the source and then apply a filter to get only the living South African men who are cricketers (a cricketer might also be a politician; those players should be included)
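One possible query for result set 4 extends result set 3 with an occupation (P106) pattern. It assumes wd:Q12299841 is the item for "cricketer"; check that before relying on it. Because P106 can have several values, a person who is both a politician and a cricketer still matches, which is the behaviour asked for.

```sparql
SELECT DISTINCT ?person ?personLabel
WHERE
{
  ?person wdt:P31 wd:Q5 .                      # human
  ?person wdt:P21 wd:Q6581097 .                # man
  FILTER NOT EXISTS { ?person wdt:P570 [] }    # no recorded date of death
  ?person wdt:P27 wd:Q258 .                    # country of citizenship: South Africa
  ?person wdt:P106 wd:Q12299841 .              # occupation: cricketer (assumed QID)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" }
}
LIMIT 20
```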
Result set 5:
Take Result Set 4 as the source and then apply a filter to get only the living South African male cricketers who are below 30 years of age (as of today)
I want these 5 queries to be executed in the exact same order. Could you help me get these results, please?
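For result set 5, one way to get the age as of today is to compute completed years from the date of birth (P569) with the standard SPARQL date functions and keep ages below 30. This is a sketch that again assumes wd:Q12299841 is "cricketer". Birth dates with only year precision are stored as 1 January of that year, so for those people the computed age can be too high by up to one year.

```sparql
SELECT DISTINCT ?person ?personLabel ?born ?age
WHERE
{
  ?person wdt:P31 wd:Q5 ;              # human
          wdt:P21 wd:Q6581097 ;        # man
          wdt:P27 wd:Q258 ;            # citizen of South Africa
          wdt:P106 wd:Q12299841 ;      # cricketer (assumed QID)
          wdt:P569 ?born .             # date of birth
  FILTER NOT EXISTS { ?person wdt:P570 [] }   # no recorded date of death
  # Completed years: subtract 1 if this year's birthday has not happened yet
  BIND (YEAR(NOW()) - YEAR(?born)
        - IF(MONTH(NOW()) < MONTH(?born)
             || (MONTH(NOW()) = MONTH(?born) && DAY(NOW()) < DAY(?born)), 1, 0)
        AS ?age)
  FILTER (?age < 30)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en" }
}
ORDER BY DESC(?born)
```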

How to get the number of languages a Wikipedia page is available in? (SPARQL query)

I'm trying to get a list of Italian books from 1980 onwards, together with the number of languages their Wikipedia page is available in. For instance, I would like to have:
Book, number
The Name of the Rose, 5
where 5 is the number of languages in which The Name of the Rose has a Wikipedia page: for instance, there is an English page, a Dutch one, a Spanish one, a French one and a Greek one.
Here is the query so far:
SELECT ?item ?label
WHERE
{
VALUES ?type {wd:Q571 wd:Q7725634} # book or literary work
?item wdt:P31 ?type .
?item wdt:P577 ?date FILTER (?date > "1980-01-01T00:00:00Z"^^xsd:dateTime) . # from 1980
?item rdfs:label ?label filter (lang(?label) = "it")
?item wdt:P495 wd:Q38 .
}
I get the list of books, but I can't find the right property to look for.
Did you actually get The Name of the Rose with your query? Its date of publication is set to 1980 (which is stored as 1980-01-01 for our purposes), so I had to change your query to a >= comparison.
Then, using COUNT() and GROUP BY as mentioned in the comment gets you what you want. But if you really just need the number of sitelinks, there is a shortcut that may be useful: wikibase:sitelinks. It was added because that number is often used as a good proxy for an item's popularity, and using the precomputed number is vastly more efficient than getting all links, grouping, and counting.
SELECT ?book ?bookLabel ?sitelinks ?date WHERE {
VALUES ?type { wd:Q571 wd:Q47461344 wd:Q7725634 }
?book wdt:P31 ?type;
wdt:P577 ?date;
wdt:P495 wd:Q38;
wikibase:sitelinks ?sitelinks.
FILTER((?date >= "1980-01-01T00:00:00Z"^^xsd:dateTime) && (?date < "1981-01-01T00:00:00Z"^^xsd:dateTime))
SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }
}
Note that the values may differ slightly from the version with GROUP BY and COUNT(), because here links to sites such as Commons or Wikiquote are also counted.
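The GROUP BY / COUNT() version might look like the following sketch. It counts only Wikipedia articles by following the schema:about link from each article to the book and keeping articles whose site belongs to the "wikipedia" group (wikibase:wikiGroup, which the Wikidata Query Service exposes as far as I know). The OPTIONAL keeps books without any article; they get a count of 0.

```sparql
SELECT ?book ?bookLabel (COUNT(DISTINCT ?article) AS ?wikipedias) WHERE {
  VALUES ?type { wd:Q571 wd:Q47461344 wd:Q7725634 }
  ?book wdt:P31 ?type;
        wdt:P577 ?date;
        wdt:P495 wd:Q38.
  FILTER(?date >= "1980-01-01T00:00:00Z"^^xsd:dateTime)
  # Articles about the book, restricted to Wikipedia sites
  # (Commons, Wikiquote, etc. belong to other wiki groups)
  OPTIONAL {
    ?article schema:about ?book;
             schema:isPartOf ?site.
    ?site wikibase:wikiGroup "wikipedia".
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "it,en". }
}
GROUP BY ?book ?bookLabel
ORDER BY DESC(?wikipedias)
```

COUNT(DISTINCT ?article) also keeps a book from being counted twice when it matches more than one ?type or has several publication dates.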

Wikidata: Get all non-classical Musicians via SPARQL query

I hope this kind of question is allowed here, as it is more of a Wikidata-specific question. Anyway, I am trying to get all non-classical musicians from Wikidata with SPARQL. Right now I have this code:
SELECT ?value ?valueLabel ?born WHERE {
{
SELECT DISTINCT ?value ?born WHERE {
?value wdt:P31 wd:Q5 . # all Humans
?value wdt:P106/wdt:P279* wd:Q639669 . # of occupation or subclass of occupation is musician
?value wdt:P569 ?born . # Birthdate
FILTER(?born >= "1981-01-01T00:00:00Z"^^xsd:dateTime) # filter by Birthyear
}
ORDER BY ASC(?born)
#LIMIT 500
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
}
This (theoretically) gets me all people whose occupation is musician (https://www.wikidata.org/wiki/Q639669), or a subclass of it, and who were born after 1900. (Theoretically, because this query runs far too long and I had to break it into smaller chunks by birth year, such as the one above starting in 1981.)
What I am after, however, is to exclude people who are primarily classical musicians. Is there any property I am not aware of? Otherwise, how would I change my query to be able to filter by specific properties (like Q21680663, classical composer)?
Thanks!
If you check the Examples tab in the query interface and type music into the search field, you'll find an example that almost hits the spot:
Musicians or singers that have a genre containing 'rock'.
I've used that mostly to just get a list of all musicians with their genres. I finally settled on a MINUS query subtracting any musician who touches western classical music or baroque music, the latter included specifically to get Bach, the old bastard.
SELECT DISTINCT
?human ?humanLabel
(GROUP_CONCAT(DISTINCT ?genreLabel; SEPARATOR = ", ") AS ?genres)
WHERE {
{
?human wdt:P31 wd:Q5;
wdt:P106 wd:Q639669;
wdt:P136 ?genre.
} MINUS {
VALUES ?classics {
wd:Q9730
wd:Q8361
}
?human wdt:P136 ?classics.
}
# This is just boilerplate to get the labels.
# it's slightly faster this way than the label
# service, and the query is close to timing out already
?genre rdfs:label ?genreLabel.
FILTER((LANG(?genreLabel)) = "en")
?human rdfs:label ?humanLabel.
FILTER((LANG(?humanLabel)) = "en")
}
GROUP BY ?humanLabel ?human
In the Query Service: 25,000 results in 20 seconds.
Here's a taste of what the results look like (from some intermediate version, because I'm not redoing the table now).
artist | genres
Gigi D'Agostino | Latin jazz, Italo dance
Erykah Badu | neo soul, soul music
Yoko Kanno | jazz, blues, pop music, J-pop, film score, New-age music, art rock, ambient music
Michael Franks | pop music, rock music
Harry Nilsson | rock music, pop music, soft rock, baroque pop, psychedelic rock, sunshine pop
Yulia Nachalova | jazz, pop music, soul music, contemporary R&B, blue-eyed soul, estrada
Linda McCartney | pop rock
Following the original example, you may also want to include singers. Putting the VALUES block before the ?human triple pattern and replacing the existing "P106" line with "wdt:P106 ?professions;" does that, and results in about twice as many results. But it often times out.
VALUES ?professions {
wd:Q177220 # singer
wd:Q639669 # musician
}
?human wdt:P31 wd:Q5;
wdt:P106 ?professions;
Including singers: 53,000 results, but the query may time out.
The example also uses the following to cut down results rather drastically, by including only items with a certain number of statements, assuming those correlate with... something. You may want to experiment with it to focus on the most significant results, or to give you room to avoid the timeout with other changes. Maybe trying lower limits than 50 to find the right balance is a good idea, though.
?human wikibase:statements ?statementcount.
FILTER(?statementcount > 50 )
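Putting these pieces together, a sketch combining the MINUS query above with singers and the statement-count limit could look like this. The threshold of 50 is the example's; lowering it returns more people but brings the query closer to the timeout again.

```sparql
SELECT DISTINCT
  ?human ?humanLabel
  (GROUP_CONCAT(DISTINCT ?genreLabel; SEPARATOR = ", ") AS ?genres)
WHERE {
  {
    VALUES ?professions { wd:Q177220 wd:Q639669 }   # singer, musician
    ?human wdt:P31 wd:Q5;
           wdt:P106 ?professions;
           wdt:P136 ?genre;
           wikibase:statements ?statementcount.
    # Keep only items with many statements (rough "significance" filter)
    FILTER(?statementcount > 50)
  } MINUS {
    VALUES ?classics { wd:Q9730 wd:Q8361 }          # Western classical, baroque
    ?human wdt:P136 ?classics.
  }
  ?genre rdfs:label ?genreLabel.
  FILTER(LANG(?genreLabel) = "en")
  ?human rdfs:label ?humanLabel.
  FILTER(LANG(?humanLabel) = "en")
}
GROUP BY ?humanLabel ?human
```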
This is an earlier version. It excludes all the listed genres, but includes any musician linked to any other genre, and there are many of them that would probably qualify as "classics". The filter uses the "NOT IN" construct, which seems cleaner to me than filtering based on labels.
SELECT DISTINCT
?human ?humanLabel
(GROUP_CONCAT(DISTINCT ?genreLabel; SEPARATOR = ", ") AS ?genres)
WHERE {
?human wdt:P31 wd:Q5;
wdt:P106 wd:Q639669;
wdt:P136 ?genre.
# The "MAGIC": Q9730 is "Western Classical Music"
# Q1344 is "opera"
# Then I noticed Amadeus, Wagner, and Bach all slipped through and expanded the list, and it's a really
# ugly way of doing this
FILTER(?genre NOT IN(wd:Q9730, wd:Q1344, wd:Q9734, wd:Q9748, wd:Q189201, wd:Q8361, wd:Q2142754, wd:Q937364, wd:Q1546995, wd:Q1746028, wd:Q207338, wd:Q3328774, wd:Q1065742))
?genre rdfs:label ?genreLabel.
FILTER((LANG(?genreLabel)) = "en")
?human rdfs:label ?humanLabel.
FILTER((LANG(?humanLabel)) = "en")
}
GROUP BY ?humanLabel ?human
This gets me 26,000 results.
Note that this will still return artists that have "western classical music" among their genres, as long as they are also linked to other genres. To exclude any musician who ever dabbled in the classics, you'll have to use a MINUS construct to, essentially, subtract all of them (short of starting a daytime top-30 radio station).
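To address the question's second part directly: other properties can be used inside MINUS in the same way as genre. A sketch that subtracts people with the occupation from the question (Q21680663, classical composer) as well as the two genres used above. It only matches that exact occupation value, not subclasses of it, and without a statement limit it may be slow; the LIMIT is only there to keep it manageable.

```sparql
SELECT DISTINCT ?human ?humanLabel
WHERE {
  {
    ?human wdt:P31 wd:Q5;
           wdt:P106 wd:Q639669.              # occupation: musician
  } MINUS {
    ?human wdt:P106 wd:Q21680663.            # occupation: classical composer
  } MINUS {
    VALUES ?classics { wd:Q9730 wd:Q8361 }   # Western classical, baroque
    ?human wdt:P136 ?classics.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1000
```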

Wikidata SPARQL - Countries and their (still existing) neighbours

I want to query the neighbours of a country from Wikidata with SPARQL, like this:
SELECT ?country ?countryLabel WHERE {
?country wdt:P47 wd:Q183 .
FILTER NOT EXISTS{ ?country wdt:P576 ?date } # don't count dissolved country - at least filters German Democratic Republic
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
My issue is that in this example, for the neighbours of Germany, there are still countries shown which no longer exist, like:
Kingdom of Denmark or
Saarland.
Already tried
I was already able to reduce the number with the FILTER statement.
Question
How can I change the query to reduce the result to the 9 countries?
(Also, distinguishing land borders from sea borders would be great.)
Alternative
Filtering via this API would also be fine for me: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q35
Or a database, lists, prepared HashMaps, whatever, with all countries of the world and their neighbours.
You could check entities of type wd:Q133346 ('border') or wd:Q12413618 ('international border'):
SELECT ?border ?borderLabel ?country1Label ?country2Label ?isLandBorder ?isMaritimeBorder ?constraint {
VALUES (?country1) {(wd:Q183)}
?border wdt:P31 wd:Q12413618 ;
wdt:P17 ?country1 , ?country2 .
FILTER (?country1 != ?country2)
BIND (EXISTS {?border wdt:P31 wd:Q15104814} AS ?isLandBorder)
BIND (EXISTS {?border wdt:P31 wd:Q3089219} AS ?isMaritimeBorder)
BIND ((?isLandBorder || ?isMaritimeBorder) AS ?constraint)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} ORDER BY ?country1Label
In some sense, records are duplicated: for the Afghanistan–Uzbekistan border, the list contains both (?country1=Afghanistan, ?country2=Uzbekistan) and (?country1=Uzbekistan, ?country2=Afghanistan).
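If you drop the VALUES line to list every international border, one way to keep each pair only once is to compare the two country IRIs as strings and keep only one ordering. A sketch based on the query above:

```sparql
SELECT ?border ?borderLabel ?country1Label ?country2Label {
  ?border wdt:P31 wd:Q12413618 ;
          wdt:P17 ?country1 , ?country2 .
  # For two different IRIs, STR(a) < STR(b) holds for exactly one ordering,
  # so each border appears once instead of twice.
  FILTER (STR(?country1) < STR(?country2))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} ORDER BY ?country1Label
```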
Or a database, lists, prepared HashMaps, whatever, with all countries of the world and their neighbours.
You could ask on https://opendata.stackexchange.com.
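Staying with P47 ("shares border with"), another option is to read the full statements (p:/ps:) instead of the truthy wdt: values, and drop statements that are deprecated or carry an end time (P582) qualifier, which is how a former border is usually recorded. Restricting the neighbours to instances of sovereign state (assumed here to be wd:Q3624078) should also drop sub-national entities such as Saarland. This is a sketch: whether it returns exactly the nine current neighbours depends on how the items are modelled on Wikidata at the time you run it.

```sparql
SELECT DISTINCT ?country ?countryLabel WHERE {
  # Full statements, so that qualifiers and ranks can be inspected
  ?country p:P47 ?borderStatement .
  ?borderStatement ps:P47 wd:Q183 .
  # Skip deprecated statements and borders that have an end time
  FILTER NOT EXISTS { ?borderStatement wikibase:rank wikibase:DeprecatedRank }
  FILTER NOT EXISTS { ?borderStatement pq:P582 [] }
  # Keep only sovereign states (class assumed to be wd:Q3624078)
  ?country wdt:P31 wd:Q3624078 .
  # Skip dissolved countries, as in the question
  FILTER NOT EXISTS { ?country wdt:P576 [] }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```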

About UNION and FILTER NOT EXISTS in SPARQL (OpenRDF 2.8.0)

I learnt some semantic technologies, including RDF and SPARQL, a few years ago, but then I didn't have the chance to work with them for some time. Now I've started a new project which uses OpenRDF 2.8.0 as a semantic store and I'm refreshing my knowledge, although there are some forgotten things I still have to recover.
In particular, in the past few days I had some trouble correctly understanding the FILTER NOT EXISTS construct in SPARQL.
Problem: I have a semantic store imported from DbTune.org (music ontologies). A mo:MusicArtist, intended as foaf:maker of a mo:Track, can be present in four scenarios (I'm only listing relevant statements):
<http://dbtune.org/musicbrainz/resource/artist/013c8e5b-d72a-4cd3-8dee-6c64d6125823> a mo:MusicArtist ;
vocab:artist_type "1"^^xs:short ;
rdfs:label "Edvard Grieg" .
<http://dbtune.org/musicbrainz/resource/artist/032df978-9130-490e-8857-0c9ef231fae8> a mo:MusicArtist ;
vocab:artist_type "2"^^xs:short ;
rel:collaboratesWith <http://dbtune.org/musicbrainz/resource/artist/3db5dfb1-1b91-4038-8268-ae04d15b6a3e> , <http://dbtune.org/musicbrainz/resource/artist/d78afc01-f918-440c-89fc-9d546a3ba4ac> ;
rdfs:label "Doris Day & Howard Keel".
<http://dbtune.org/musicbrainz/resource/artist/1645f335-2367-427d-8e2d-ad206946a8eb> a mo:MusicArtist ;
vocab:artist_type "2"^^xs:short ;
rdfs:label "Pat Metheny & Anna Maria Jopek".
<http://dbtune.org/musicbrainz/resource/artist/12822d4f-4607-4f1d-ab16-d6bacc27cafe> a mo:MusicArtist ;
rdfs:label "René Marie".
From what I understand, vocab:artist_type is 1 for single artists (example #1) and 2 for groups or collaborations (examples #2 and #3). In the latter case, there might be a few rel:collaboratesWith statements that point to the descriptions of the individual members of the group or collaboration (example #2). In some cases, the vocab:artist_type statement is missing (example #4).
Now I want to extract all the artists as single entities, where possible. I mean, I don't want to retrieve example #2, because I will get "Doris Day" and "Howard Keel" separately. I have to retrieve example #3, "Pat Metheny & Anna Maria Jopek", because I can't do anything else. Of course, I also want to retrieve "René Marie".
I've solved the problem in a satisfactory way with this SPARQL:
SELECT *
WHERE
{
?artist a mo:MusicArtist.
?artist rdfs:label ?label.
MINUS
{
?artist vocab:artist_type "2"^^xs:short.
?artist rel:collaboratesWith ?any1 .
}
}
ORDER BY ?label
It makes sense and it looks like it's readable ("retrieve all mo:MusicArtist items minus those that are collaborations with individual members listed").
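For comparison, the same exclusion can be written with FILTER NOT EXISTS. Here it behaves like the MINUS version because ?artist is already bound by the triple patterns in the same group when the filter is evaluated. A sketch, assuming the same prefix declarations as the query above:

```sparql
SELECT *
WHERE
{
  ?artist a mo:MusicArtist .
  ?artist rdfs:label ?label .
  # Drop collaborations whose individual members are listed
  FILTER NOT EXISTS
  {
    ?artist vocab:artist_type "2"^^xs:short .
    ?artist rel:collaboratesWith ?any1 .
  }
}
ORDER BY ?label
```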
I didn't find the solution immediately. I first thought of putting together the three separate cases, with UNION:
SELECT *
WHERE
{
?artist a mo:MusicArtist.
?artist rdfs:label ?label.
# Single artists
{
?artist vocab:artist_type "1"^^xs:short.
}
UNION
# Groups for which there is no defined collaboration with single persons
{
?artist vocab:artist_type "2"^^xs:short.
FILTER NOT EXISTS
{
?artist rel:collaboratesWith ?any1
}
}
UNION
# Some artists don't have this attribute
{
FILTER NOT EXISTS
{
?artist vocab:artist_type ?any2
}
}
}
ORDER BY ?label
I found that the third UNION branch, the one that should add mo:MusicArtist items without a vocab:artist_type, didn't work. That is, it didn't find items such as "René Marie".
While I'm satisfied with the shorter solution I found with MINUS, I'm not happy that I don't understand why the older solution didn't work. Clearly I'm missing some point about FILTER NOT EXISTS that could be useful in other cases.
Any help is welcome.
When I run the following query, I get the results that it sounds like you're looking for:
select distinct ?label where {
?artist a mo:MusicArtist ;
rdfs:label ?label .
#-- artists with type 1
{
?artist vocab:artist_type "1"^^xs:short
}
#-- artists with no type
union {
filter not exists {
?artist vocab:artist_type ?type
}
}
#-- artists with type 2 that have no
#-- collaborators
union {
?artist vocab:artist_type "2"^^xs:short
filter not exists {
?artist rel:collaboratesWith ?another
}
}
}
------------------------------------
| label |
====================================
| "René Marie" |
| "Pat Metheny & Anna Maria Jopek" |
| "Edvard Grieg" |
------------------------------------
I'm not sure I see where this essentially differs from yours, though. I do think that you could clean this query up a bit. You can use optional and values so that ?type is bound only when the type is 1 or 2 (an artist with no type, or with some other type, is kept with ?type unbound). Then you can add a filter that requires that, when the type is 2, there is no collaborator.
select ?label where {
#-- get an artist and their label
?artist a mo:MusicArtist ;
rdfs:label ?label .
#-- and optionally their type, if it is
#-- "1"^^xs:short or "2"^^xs:short
optional {
values ?type { "1"^^xs:short "2"^^xs:short }
?artist vocab:artist_type ?type
}
#-- if ?type is "2"^^xs:short, then ?artist
#-- must not collaborate with anyone.
filter ( !sameTerm(?type,"2"^^xs:short)
|| not exists { ?artist rel:collaboratesWith ?anyone })
}
------------------------------------
| label |
====================================
| "René Marie" |
| "Pat Metheny & Anna Maria Jopek" |
| "Edvard Grieg" |
------------------------------------
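A likely explanation for the difference, hedged because it depends on the engine: under the SPARQL 1.1 evaluation rules, each braced group is evaluated on its own and only then joined with the outer pattern. In the third branch of the question's query the group contains nothing but the filter, so when FILTER NOT EXISTS is evaluated ?artist is still unbound; the inner pattern then matches any vocab:artist_type triple in the store, NOT EXISTS is false, and the branch contributes nothing. The second branch works because its own triple pattern binds ?artist first. Some engines (Jena ARQ, in some cases) feed the outer bindings into the branch during optimisation and so return the intuitive result. Binding ?artist inside the branch should make the query behave the same way on both; the prefixes are assumed to be declared as in the original store.

```sparql
SELECT *
WHERE
{
  ?artist a mo:MusicArtist .
  ?artist rdfs:label ?label .
  # Single artists
  { ?artist vocab:artist_type "1"^^xs:short . }
  UNION
  # Groups for which there is no defined collaboration with single persons
  {
    ?artist vocab:artist_type "2"^^xs:short .
    FILTER NOT EXISTS { ?artist rel:collaboratesWith ?any1 }
  }
  UNION
  # Artists without the attribute: bind ?artist inside this branch,
  # so NOT EXISTS is tested against that specific artist
  {
    ?artist a mo:MusicArtist .
    FILTER NOT EXISTS { ?artist vocab:artist_type ?any2 }
  }
}
ORDER BY ?label
```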