SPARQL query items sharing distinct properties - sparql

New to SPARQL, I'm trying to query Wikidata for all items that share at least k distinct values for a property with another item. For example, the following example is an attempt to list all diseases that share at least three different symptoms (wdt:P780) with COVID-19 (wd:Q84263196) and list each disease and all its shared symptoms. The following query comes close:
SELECT DISTINCT ?item ?itemLabel ?symptom1Label ?symptom2Label ?symptom3Label
WHERE
{
?item wdt:P780 ?symptom1, ?symptom2, ?symptom3.
wd:Q84263196 wdt:P780 ?symptom1, ?symptom2, ?symptom3.
FILTER (?symptom1 != ?symptom2)
FILTER (?symptom1 != ?symptom3)
FILTER (?symptom2 != ?symptom3)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 50
But the output shows all distinct combinations of three (common) symptoms. E.g.,
Q2840 influenza headache cough fatigue
Q2840 influenza headache cough nasal congestion
...
whereas the desired output should list each disease/symptom combination once (in either long or wide form).
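One possible way to get each disease only once is to match a single ?symptom variable against both COVID-19 and the other item, group by item, and keep only the groups with at least three distinct shared symptoms. This is a sketch that has not been run against live data. The names ?shared and ?symptoms are made up for illustration, and labels are fetched through rdfs:label (English only) rather than the label service so that they can be aggregated.

```sparql
SELECT ?item ?itemLabel
       (COUNT(DISTINCT ?symptom) AS ?shared)
       (GROUP_CONCAT(DISTINCT ?symptomLabel; SEPARATOR = ", ") AS ?symptoms)
WHERE {
  # every symptom of COVID-19 that some other item also has
  wd:Q84263196 wdt:P780 ?symptom .
  ?item wdt:P780 ?symptom .
  FILTER (?item != wd:Q84263196)

  # English labels, so they can be used inside GROUP_CONCAT
  ?symptom rdfs:label ?symptomLabel .
  FILTER (LANG(?symptomLabel) = "en")
  ?item rdfs:label ?itemLabel .
  FILTER (LANG(?itemLabel) = "en")
}
GROUP BY ?item ?itemLabel
# k = 3: raise or lower this threshold as needed
HAVING (COUNT(DISTINCT ?symptom) >= 3)
ORDER BY DESC(?shared)
```

This gives the wide form, one row per disease with its shared symptoms concatenated into a single column.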

Related

extracting covid-19 pandemic most recent data per country using a Wikidata SPARQL query

I'm trying to extract some of the most recent COVID-19 pandemic statistics (number of cases, recoveries, point in time) per country from Wikidata using this SPARQL query:
SELECT DISTINCT ?COVID19_loc ?COVID19_locLabel ?countryLabel ?cases ?timeC ?recoveries ?timeR {
?pandemic wdt:P1269 wd:Q81068910;
FILTER ( ?pandemic in ( wd:Q83741704)).
?pandemic wdt:P527 ?COVID19_loc.
?COVID19_loc wdt:P17 ?country.
?COVID19_loc wdt:P1603 ?cases.
FILTER NOT EXISTS {?COVID19_loc wdt:P1603/wdt:P585 ?timeC_Other FILTER ( ?timeC_Other > ?timeC)}.
?COVID19_loc wdt:P8010 ?recoveries.
FILTER NOT EXISTS {?COVID19_loc wdt:P8010/wdt:P585 ?timeR_Other FILTER ( ?timeR_Other > ?timeR)}.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } }
The problem is that I'm getting multiple cases and recoveries results instead of only the most recent ones; also, ?timeC and ?timeR return nothing.
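A likely reason is that "point in time" (P585) is a qualifier on the statement, so it cannot be reached with wdt:P1603/wdt:P585, and ?timeC is never bound. A minimal sketch for the case counts only, assuming each P1603 statement carries a P585 qualifier, goes through the statement node with p:, ps: and pq: instead. The variable names ?stmt and ?later are illustrative, and the query has not been checked against live data.

```sparql
SELECT ?loc ?locLabel ?cases ?timeC WHERE {
  wd:Q83741704 wdt:P527 ?loc .

  # statement node: value via ps:, qualifier via pq:
  ?loc p:P1603 ?stmt .
  ?stmt ps:P1603 ?cases ;
        pq:P585 ?timeC .

  # keep the statement only if no other one has a later date
  FILTER NOT EXISTS {
    ?loc p:P1603/pq:P585 ?later .
    FILTER (?later > ?timeC)
  }

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

The same pattern could be repeated with P8010 for the recoveries.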

Hierarchical query in SPARQL to fetch all children for entity organisation - wd:Q43229

I would like to write the equivalent of a recursive query in SPARQL to find all types of organisations on Wikidata that roll up to wd:Q43229 (organisation).
For example, for the entity Q4926947,
the output should look like this:
entity|entityLabel|path
Q4926947|Blitz Arcade| Q210167->Q112042224->Q1058914->Q4830453->Q43229
Q4926947|Blitz Arcade| Q210167->Q112042224->Q783794->Q43229
Q4926947|Blitz Arcade| Q210167->Q112042224->Q18388277->Q6881511->Q4830453->Q43229
Q4926947|Blitz Arcade| Q210167->Q112042224->Q18388277->Q6881511->Q362482->Q679206->Q43229
There are several paths that lead to Q43229.
In my query, I would only like to specify the root (Q43229) and it should be able to query all the leaf nodes that link to Q43229
This is what I've got so far, but it is far from the desired result. Any help is appreciated
SELECT ?item ?itemLabel (group_concat(?linkTo; separator=",") as ?org_path) {
wd:Q43229 ^wdt:P279* ?item
OPTIONAL { ?item ^wdt:P279* ?linkTo }
SERVICE wikibase:label {bd:serviceParam wikibase:language "en" }
} group by ?item ?itemLabel
limit 5
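SPARQL property paths can test whether one node is reachable from another, but they do not return the path that was followed, so the path column cannot be produced directly. One workaround, sketched here for the single example entity, is to return every subclass edge that lies between the entity's classes and the root and then rebuild the paths on the client. It assumes that Q4926947 is linked to its first class through P31; the variable names ?class and ?parent are illustrative, and the query is untested.

```sparql
SELECT DISTINCT ?class ?classLabel ?parent ?parentLabel WHERE {
  # classes reachable from the entity
  wd:Q4926947 wdt:P31/wdt:P279* ?class .
  # one subclass-of step ...
  ?class wdt:P279 ?parent .
  # ... that still leads up to the root
  ?parent wdt:P279* wd:Q43229 .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

Each result row is one edge (?class -> ?parent); chaining the edges from the entity's class up to Q43229 yields the individual paths.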

Wikidata: Get all non-classical Musicians via SPARQL query

I hope that this kind of question is allowed here as it is more a Wikidata specific question. Anyways, I try to get all non-classical-music musicians from Wikidata by SPARQL. Right now I have this code:
SELECT ?value ?valueLabel ?born WHERE {
{
SELECT DISTINCT ?value ?born WHERE {
?value wdt:P31 wd:Q5 . # all Humans
?value wdt:P106/wdt:P279* wd:Q639669 . # of occupation or subclass of occupation is musician
?value wdt:P569 ?born . # Birthdate
FILTER(?born >= "1981-01-01T00:00:00Z"^^xsd:dateTime) # filter by Birthyear
}
ORDER BY ASC(?born)
#LIMIT 500
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,ger". }
}
this gets me (theoretically) all People whose occupation is Musician (https://www.wikidata.org/wiki/Q639669) and who were born after 1900. (Theoretically because this query runs way too long and I had to break it into smaller chunks)
What I am after, however, is to exclude people who are primarily classical musicians. Is there any property I am not aware of? Otherwise, how would I change my query to be able to filter by specific properties (like Q21680663, classical composer)?
Thanks!
If you check the Examples tab in the query interface and type music into the search field, you'll find an example that almost hits the spot:
Musicians or singers that have a genre containing 'rock'.
I've used that mostly to just get a list of all musicians with their genres. I finally settled on a MINUS query subtracting any musician who touches western classical music or baroque music, the latter included specifically to get Bach, the old bastard.
SELECT DISTINCT
  ?human ?humanLabel
  (GROUP_CONCAT(DISTINCT ?genreLabel; SEPARATOR = ", ") AS ?genres)
WHERE {
  {
    ?human wdt:P31 wd:Q5;
           wdt:P106 wd:Q639669;
           wdt:P136 ?genre.
  } MINUS {
    VALUES ?classics {
      wd:Q9730
      wd:Q8361
    }
    ?human wdt:P136 ?classics.
  }
  # This is just boilerplate to get the labels.
  # it's slightly faster this way than the label
  # service, and the query is close to timing out already
  ?genre rdfs:label ?genreLabel.
  FILTER((LANG(?genreLabel)) = "en")
  ?human rdfs:label ?humanLabel.
  FILTER((LANG(?humanLabel)) = "en")
}
GROUP BY ?humanLabel ?human
In the Query Interface: 25,000 results in 20sec
Here's a taste of what the results look like (from some intermediate version, because I'm not redoing the table now).
artist|genres
Gigi D'Agostino|Latin jazz, Italo dance
Erykah Badu|neo soul, soul music
Yoko Kanno|jazz, blues, pop music, J-pop, film score, New-age music, art rock, ambient music
Michael Franks|pop music, rock music
Harry Nilsson|rock music, pop music, soft rock, baroque pop, psychedelic rock, sunshine pop
Yulia Nachalova|jazz, pop music, soul music, contemporary R&B, blue-eyed soul, estrada
Linda McCartney|pop rock
From the original example, you may want to try also including singers. The following, replacing the existing line with "P106" does that, and results in about twice as many results. But it often times out.
VALUES ?professions {
wd:Q177220
wd:Q639669
}
wdt:P106 ?professions;
Query including singers, 53,000 results but may time out
The example also uses the following to cut down results rather drastically, by including only items with a certain number of statements, assuming those correlate with... something. You may want to experiment with it to focus on the most significant results, or to give you room to avoid the timeout with other changes. Maybe trying lower limits than 50 to find the right balance is a good idea, though.
?human wikibase:statements ?statementcount.
FILTER(?statementcount > 50 )
A query with singers and the statement limit
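Put together with the MINUS query above, the two fragments might look like the following sketch. Where exactly the VALUES block and the wikibase:statements pattern go is an assumption here (the fragments do not say), and the result has not been timed, so treat it as a starting point rather than the exact query behind the link.

```sparql
SELECT DISTINCT
  ?human ?humanLabel
  (GROUP_CONCAT(DISTINCT ?genreLabel; SEPARATOR = ", ") AS ?genres)
WHERE {
  {
    # singers as well as musicians
    VALUES ?professions {
      wd:Q177220
      wd:Q639669
    }
    ?human wdt:P31 wd:Q5;
           wdt:P106 ?professions;
           wdt:P136 ?genre.
    # only items with more than 50 statements
    ?human wikibase:statements ?statementcount.
    FILTER(?statementcount > 50)
  } MINUS {
    VALUES ?classics {
      wd:Q9730
      wd:Q8361
    }
    ?human wdt:P136 ?classics.
  }
  ?genre rdfs:label ?genreLabel.
  FILTER((LANG(?genreLabel)) = "en")
  ?human rdfs:label ?humanLabel.
  FILTER((LANG(?humanLabel)) = "en")
}
GROUP BY ?humanLabel ?human
```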
This is an earlier version. It excludes all the listed genres, but includes any musician linked to any other genre, and there are many of them that would probably qualify as "classics". The filter uses the "NOT IN" construct, which seems cleaner to me than filtering based on labels.
SELECT DISTINCT
  ?human ?humanLabel
  (GROUP_CONCAT(DISTINCT ?genreLabel; SEPARATOR = ", ") AS ?genres)
WHERE {
  ?human wdt:P31 wd:Q5;
         wdt:P106 wd:Q639669;
         wdt:P136 ?genre.
  # The "MAGIC": Q9730 is "Western Classical Music"
  # Q1344 is "opera"
  # Then I noticed Amadeus, Wagner, and Bach all slipped through and expanded the list, and it's a really
  # ugly way of doing this
  FILTER(?genre NOT IN(wd:Q9730, wd:Q1344, wd:Q9734, wd:Q9748, wd:Q189201, wd:Q8361, wd:Q2142754, wd:Q937364, wd:Q1546995, wd:Q1746028, wd:Q207338, wd:Q3328774, wd:Q1065742))
  ?genre rdfs:label ?genreLabel.
  FILTER((LANG(?genreLabel)) = "en")
  ?human rdfs:label ?humanLabel.
  FILTER((LANG(?humanLabel)) = "en")
}
GROUP BY ?humanLabel ?human
This gets me 26,000 results.
Note that this will still return artists that have "western classical music" among their genres, as long as they are also linked to other genres. To exclude any musician ever dabbling in the classics, you'll have to use a MINUS construct to, essentially, subtract all those.

Wikidata+SPARQL: Get tickers of all companies listed on stock exchanges

I want to write a SPARQL query that gives me the wikidata_id, label, stock exchange, and ticker symbol for all instances of a company being listed on a stock exchange.
My query so far looks like
SELECT DISTINCT ?id ?idLabel ?exchangeLabel ?tickerLabel
WHERE {
?id wdt:P31/wdt:P279* wd:Q783794 ;
wdt:P414 ?exchange ;
p:P414 [pq:P249 ?ticker].
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
While this produces results that almost seem right, there is a problem when companies are listed on multiple exchanges.
In the results, Credit Suisse is listed three times, with three different tickers. While it's correct that Credit Suisse is listed on three stock exchanges, the problem is that the NYSE is listed as the exchange in all three cases. Even worse, there are in fact nine rows for Credit Suisse, associating every listing with every stock exchange. The correct listing info would contain only three listings, as shown on Credit Suisse's Wikidata page.
What am I doing wrong? How can I get the correct exchange to be associated with each ticker's row?
Thanks to @StansilavKralin (in a comment to my question) I can provide an answer:
SELECT DISTINCT ?id ?idLabel ?exchangeLabel ?tickerLabel
WHERE {
  ?id wdt:P31/wdt:P279* wd:Q783794 ;
      p:P414 [ pq:P249 ?ticker ; ps:P414 ?exchange ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
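Why this works: in the original query, wdt:P414 ?exchange and p:P414 [pq:P249 ?ticker] are two independent patterns, so every exchange gets paired with every ticker. Reading both the exchange (ps:P414) and the ticker (pq:P249) from the same statement node ties them together. The same idea written with an explicit statement variable, as a sketch (the name ?listing is made up for illustration):

```sparql
SELECT DISTINCT ?id ?idLabel ?exchangeLabel ?ticker
WHERE {
  ?id wdt:P31/wdt:P279* wd:Q783794 ;
      p:P414 ?listing .            # one node per "stock exchange" statement

  # value and qualifier are read from the same statement
  ?listing ps:P414 ?exchange ;
           pq:P249 ?ticker .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
```

Since the ticker is a plain string, ?ticker can be selected directly instead of ?tickerLabel.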

Wikidata SPARQL - Countries and their (still existing) neighbours

I want to query the neighbours to a country with SPARQL from Wikidata like this:
SELECT ?country ?countryLabel WHERE {
?country wdt:P47 wd:Q183 .
FILTER NOT EXISTS{ ?country wdt:P576 ?date } # don't count dissolved country - at least filters German Democratic Republic
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
My issue is that, e.g., in this example for neighbours of Germany, the results still include countries that no longer exist, such as:
Kingdom of Denmark or
Saarland.
Already tried
I was already able to reduce the number with the FILTER statement.
Question
How can I change the query so that it returns only the 9 current neighbouring countries?
(Also distinguishing between land border and sea border would be great.)
Alternative
Filtering via this API would also be fine for me: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q35
a database or lists or prepared HashMaps whatever with all countries of the world with neighbours
You could check entities of type wd:Q133346 ('border') or wd:Q12413618 ('international border'):
SELECT ?border ?borderLabel ?country1Label ?country2Label ?isLandBorder ?isMaritimeBorder ?constraint {
  VALUES (?country1) {(wd:Q183)}
  ?border wdt:P31 wd:Q12413618 ;
          wdt:P17 ?country1 , ?country2 .
  FILTER (?country1 != ?country2)
  BIND (EXISTS {?border wdt:P31 wd:Q15104814} AS ?isLandBorder)
  BIND (EXISTS {?border wdt:P31 wd:Q3089219} AS ?isMaritimeBorder)
  BIND ((?isLandBorder || ?isMaritimeBorder) AS ?constraint)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} ORDER BY ?country1Label
In some sense, records are duplicated: for the Afghanistan–Uzbekistan border, the list contains both (?country1=Afghanistan, ?country2=Uzbekistan) and (?country1=Uzbekistan, ?country2=Afghanistan).
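If the query is run for all countries (that is, without the VALUES line), one way to keep only one row per pair is to replace the != test with an ordering on the IRIs. This is a sketch, not checked against live data. Note that it should not be combined with a fixed ?country1, because the filter would then drop every border where that country's IRI sorts second.

```sparql
SELECT ?border ?borderLabel ?country1Label ?country2Label {
  ?border wdt:P31 wd:Q12413618 ;
          wdt:P17 ?country1 , ?country2 .
  # keeps (A, B) and drops (B, A); also excludes ?country1 = ?country2
  FILTER (STR(?country1) < STR(?country2))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} ORDER BY ?country1Label
```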
Regarding "a database or lists or prepared HashMaps whatever with all countries of the world with neighbours": you could ask on https://opendata.stackexchange.com.