How to get the path length between a child and parent node using skos:broader* - sparql

I have the following query, which gets the terminal leaf nodes under a parent category:
select distinct ?subcat where {
  ?subcat skos:broader* category:Buildings_and_structures_in_France_by_city .
  optional { ?subsubcat skos:broader ?subcat }
}
group by ?subcat
having (count(?subsubcat) = 0)
How do I get the path length between the child node ?subcat and the parent node category:Buildings_and_structures_in_France_by_city such that the output would be something like?

If the real task is finding buildings and structures in France, then you can ask for things in that category of some appropriate types. E.g.,
select distinct ?building where {
values ?type { dbpedia-owl:ArchitecturalStructure
dbpedia-owl:Building
dbpedia-owl:Place }
?building a ?type ;
dcterms:subject/skos:broader* category:Buildings_and_structures_in_France_by_city
}
That gets just about 700 results. If you find some that aren't in France, take a look at their property values and see which ones you could use to exclude them. Perhaps you could add a filter on latitude and longitude, on country values, etc.
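For the path length itself, one commonly used sketch is to count the nodes that lie between the child and the parent. It assumes the DBpedia endpoint (the two prefixes are declared explicitly here, although the endpoint predefines them) and it only equals the true path length when there is a single chain from ?subcat up to the parent category. Where a category is reachable by several routes, it counts every distinct intermediate category and so overestimates.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX category: <http://dbpedia.org/resource/Category:>

# ?mid ranges over ?subcat itself and every category between it and the
# parent (the parent is excluded because skos:broader+ needs at least one step).
# On a single chain, the number of such ?mid values is the path length.
select ?subcat (count(distinct ?mid) as ?pathLength) where {
  ?subcat skos:broader* ?mid .
  ?mid skos:broader+ category:Buildings_and_structures_in_France_by_city .
}
group by ?subcat
order by ?pathLength
```

A direct child of the parent category gets a length of 1 under this scheme, and the parent category itself does not appear in the results.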

Related

Retrieving Covid-19 pandemic statistics per country from DBpedia

I'm trying to get the arrival date, the confirmed and recovery cases total and the deaths total of Covid-19 pandemic per country from DBpedia, using this query:
PREFIX dbp: <http://dbpedia.org/property/>
SELECT distinct ?country ?arrivalDate ?confirmedCases ?recoveryCases ?deaths WHERE {
?country a dbp:location;
dbp:arrivalDate ?arrivalDate;
dbp:confirmedCases ?confirmedCases;
dbp:recoveryCases ?recoveryCases;
dbp:deaths ?deaths
}
Unfortunately, it doesn't return anything
?country a dbp:location
With this triple pattern, you are trying to find entities that have http://dbpedia.org/property/location as type (rdf:type). This is not what you intend, because
dbp:location is a property (not a class), and
in the subject position, you don’t seem to want to find locations, but information about the pandemic.
So ideally rename ?country to something like ?pandemicInfo (for clarity), and then ask for the dbp:location of that ?pandemicInfo:
SELECT DISTINCT ?pandemicInfo ?country ?arrivalDate ?confirmedCases ?recoveryCases ?deaths
WHERE {
?pandemicInfo
dbp:location ?country ;
dbp:arrivalDate ?arrivalDate ;
dbp:confirmedCases ?confirmedCases ;
dbp:recoveryCases ?recoveryCases ;
dbp:deaths ?deaths .
}
To only get information about the COVID-19 pandemic, you could add:
dbo:disease dbr:COVID-19
And if there is a type that all entities share, e.g., dbo:Pandemic, you could add:
a dbo:Pandemic
(But you should verify if all the entities you are interested in contain these statements, otherwise you would exclude them.)

Wikidata: Get all non-classical Musicians via SPARQL query

I hope that this kind of question is allowed here as it is more a Wikidata specific question. Anyways, I try to get all non-classical-music musicians from Wikidata by SPARQL. Right now I have this code:
SELECT ?value ?valueLabel ?born WHERE {
{
SELECT DISTINCT ?value ?born WHERE {
?value wdt:P31 wd:Q5 . # all Humans
?value wdt:P106/wdt:P279* wd:Q639669 . # of occupation or subclass of occupation is musician
?value wdt:P569 ?born . # Birthdate
FILTER(?born >= "1981-01-01T00:00:00Z"^^xsd:dateTime) # filter by Birthyear
}
ORDER BY ASC(?born)
#LIMIT 500
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
}
This gets me (theoretically) all people whose occupation is musician (https://www.wikidata.org/wiki/Q639669) and who were born after 1900. (Theoretically, because this query runs way too long and I had to break it into smaller chunks.)
What I am after, however, is to exclude people who are primarily classical musicians. Is there any property I am not aware of? Otherwise, how would I change my query to be able to filter by specific items (like Q21680663, classical composer)?
Thanks!
If you check the Examples tab in the query interface and type music into the search field, you'll find an example that almost hits the spot:
Musicians or singers that have a genre containing 'rock'.
I've used that mostly to just get a list of all musicians with their genres. I finally settled on a MINUS query subtracting any musician who touches western classical music or baroque music, the latter included specifically to get Bach, the old bastard.
SELECT DISTINCT
?human ?humanLabel
(GROUP_CONCAT(DISTINCT ?genreLabel; SEPARATOR = ", ") AS ?genres)
WHERE {
{
?human wdt:P31 wd:Q5;
wdt:P106 wd:Q639669;
wdt:P136 ?genre.
} MINUS {
VALUES ?classics {
wd:Q9730
wd:Q8361
}
?human wdt:P136 ?classics.
}
# This is just boilerplate to get the labels.
# it's slightly faster this way than the label
# service, and the query is close to timing out already
?genre rdfs:label ?genreLabel.
FILTER((LANG(?genreLabel)) = "en")
?human rdfs:label ?humanLabel.
FILTER((LANG(?humanLabel)) = "en")
}
GROUP BY ?humanLabel ?human
In the Query Interface: 25,000 results in 20sec
Here's a taste of what the results look like (from some intermediate version, because I'm not redoing the table now).
| artist | genres |
| --- | --- |
| Gigi D'Agostino | Latin jazz, Italo dance |
| Erykah Badu | neo soul, soul music |
| Yoko Kanno | jazz, blues, pop music, J-pop, film score, New-age music, art rock, ambient music |
| Michael Franks | pop music, rock music |
| Harry Nilsson | rock music, pop music, soft rock, baroque pop, psychedelic rock, sunshine pop |
| Yulia Nachalova | jazz, pop music, soul music, contemporary R&B, blue-eyed soul, estrada |
| Linda McCartney | pop rock |
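From the original example, you may also want to try including singers. To do that, put the VALUES block below in front of the ?human pattern and replace the existing wdt:P106 line with the last line shown. This results in about twice as many results, but it often times out.
# goes before the "?human wdt:P31 wd:Q5;" pattern
VALUES ?professions {
  wd:Q177220   # singer
  wd:Q639669   # musician
}
# replaces the line "wdt:P106 wd:Q639669;"
wdt:P106 ?professions;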
Query including singers, 53,000 results but may time out
The example also uses the following to cut down results rather drastically, by including only items with a certain number of statements, assuming those correlate with... something. You may want to experiment with it to focus on the most significant results, or to give you room to avoid the timeout with other changes. Maybe trying lower limits than 50 to find the right balance is a good idea, though.
?human wikibase:statements ?statementcount.
FILTER(?statementcount > 50 )
A query with singers and the statement limit
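Put together, the pieces described above (singers, the statement-count threshold, and the MINUS for classical genres) could look roughly like the following. This is a sketch assembled from the fragments in this answer, not a tested replacement for the linked queries. It assumes the Wikidata Query Service, where the wd:, wdt:, wikibase: and rdfs: prefixes are predefined, and the threshold of 50 is simply the value from the example.

```sparql
SELECT DISTINCT
  ?human ?humanLabel
  (GROUP_CONCAT(DISTINCT ?genreLabel; SEPARATOR = ", ") AS ?genres)
WHERE {
  # singer or musician
  VALUES ?professions { wd:Q177220 wd:Q639669 }
  ?human wdt:P31 wd:Q5;
         wdt:P106 ?professions;
         wdt:P136 ?genre;
         wikibase:statements ?statementcount.
  # keep only items with many statements (the cut-off is a tuning knob)
  FILTER(?statementcount > 50)
  # drop anyone linked to Western classical music or baroque music
  MINUS {
    VALUES ?classics { wd:Q9730 wd:Q8361 }
    ?human wdt:P136 ?classics.
  }
  ?genre rdfs:label ?genreLabel.
  FILTER(LANG(?genreLabel) = "en")
  ?human rdfs:label ?humanLabel.
  FILTER(LANG(?humanLabel) = "en")
}
GROUP BY ?human ?humanLabel
```

Lowering the threshold returns more musicians at the cost of a higher chance of hitting the timeout.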
This is an earlier version. It excludes all the listed genres, but includes any musician linked to any other genre, and there are many of them that would probably qualify as "classics". The filter uses the "NOT IN" construct, which seems cleaner to me than filtering based on labels.
SELECT DISTINCT
?human ?humanLabel
(GROUP_CONCAT(DISTINCT ?genreLabel; SEPARATOR = ", ") AS ?genres)
WHERE {
?human wdt:P31 wd:Q5;
wdt:P106 wd:Q639669;
wdt:P136 ?genre.
# The "MAGIC": Q9730 is "Western Classical Music"
# Q1344 is "opera"
# Then I noticed Amadeus, Wagner, and Bach all slipped through and expanded the list, and it's a really
# ugly way of doing this
FILTER(?genre NOT IN(wd:Q9730, wd:Q1344, wd:Q9734, wd:Q9748, wd:Q189201, wd:Q8361, wd:Q2142754, wd:Q937364, wd:Q1546995, wd:Q1746028, wd:Q207338, wd:Q3328774, wd:Q1065742))
?genre rdfs:label ?genreLabel.
FILTER((LANG(?genreLabel)) = "en")
?human rdfs:label ?humanLabel.
FILTER((LANG(?humanLabel)) = "en")
}
GROUP BY ?humanLabel ?human
This gets me 26,000 results. View in Query Interface
Note that this will still return artists that have "western classical music" among their genres, as long as they are also linked to other genres. To exclude any musician who ever dabbled in the classics, you'll have to use a MINUS construct to, essentially, subtract all of those.

How To Efficiently Design Nested OPTIONAL Clauses In SPARQL

I want to retrieve data that has optional elements at multiple levels. For example, assume I have four ancestors - Fred, Sam, George, and Mark. Fred and Sam have kids ... George and Mark do not. All of Fred's kids have nicknames, but two of Sam's four kids do not.
I want to query all of the kids of my ancestors and return their names, ages, and nicknames.
It seems like this would work:
SELECT DISTINCT ?token ?ancestorName ?childName ?childAge ?childNickname
WHERE
{
FILTER ( ?token IN ("Fred","Sam","George","Mark") )
?ancestor foo:name ?token .
?ancestor foo:fullname ?ancestorName .
OPTIONAL
{
?ancestor foo:parentOf ?child .
?child foo:fullname ?childName .
?child foo:age ?childAge .
OPTIONAL { ?child foo:nickname ?childNickname }
}
}
Everything seems to work fine if an ancestor doesn't have a child ... all of the outer optional clause returns quickly with no data. If the ancestor has children, and each has a nickname, it returns quickly and fills in the data. The problem seems to happen when the ancestor has a child, but the child does not have a nickname.
It works ... but it takes a very long time (I have lots of data). It appears that the inner OPTIONAL clause
OPTIONAL { ?child foo:nickname ?childNickname }
... does a cross product ... combining every ?child with every ?childNickname ... and then returns the right value.
How can I write this SPARQL SELECT to run efficiently (not do a cross product) and return all of the ancestors and all of the kids even if a kid doesn't have a nickname? I've tried FILTERS. I've tried checking whether ?child was BOUND. I haven't found the secret to make it run quick.
Thanks for the help!
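One thing that may be worth trying, with no guarantee that it helps on a given engine, is to bind ?token up front with VALUES instead of filtering it afterwards with FILTER ... IN. That gives the optimizer a small, fixed set of starting points before it evaluates the OPTIONAL blocks. The sketch below keeps the nested OPTIONAL structure unchanged. The foo: namespace IRI is a placeholder, since the question does not show the real one.

```sparql
# Hypothetical namespace; substitute the real one for foo:
PREFIX foo: <http://example.org/foo#>

SELECT DISTINCT ?token ?ancestorName ?childName ?childAge ?childNickname
WHERE {
  # Bind the four names first instead of filtering them later
  VALUES ?token { "Fred" "Sam" "George" "Mark" }
  ?ancestor foo:name ?token ;
            foo:fullname ?ancestorName .
  OPTIONAL {
    ?ancestor foo:parentOf ?child .
    ?child foo:fullname ?childName ;
           foo:age ?childAge .
    # ?child is already bound here, so this should only look up
    # the nickname of that one child
    OPTIONAL { ?child foo:nickname ?childNickname }
  }
}
```

If the slowdown persists, the query plan of the specific triple store is the place to look, because how nested OPTIONALs are evaluated differs between implementations.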

Retrieve the US release date for a movie from Wikidata using Sparql

I am trying to retrieve the titles and release dates (publication date) for movies using the wikidata.org SPARQL endpoint (https://query.wikidata.org/). The titles are listed in different languages, which are filtered in the query below. However, some movies also have several publication dates (e.g. for different countries), e.g. https://www.wikidata.org/wiki/Q217020. I'm not sure how the RDF triple structure is actually used to assign a country to the value of another triple, but specifically, how can I retrieve only the publication date for a movie in the US?
SELECT ?item ?title ?publicationdate
WHERE {
  ?item wdt:P31 wd:Q11424 ;
        rdfs:label ?title ;
        wdt:P577 ?publicationdate .
  filter ( lang(?title) = "en" )
}
ORDER BY ?item
LIMIT 10
Solution
The solution provided by M.Sarmini works. Apparently, facts such as publication dates are stored as n-ary relations: each one gets a unique statement node that links the resources. With the wdt: prefix, P577 links directly to the date value; with the p: prefix, it links to the statement node, a token that you can connect to qualifiers (such as the place of publication) as well as to the date itself.
Just add a new variable to hold the place of publication and filter your results to just list US films like this:
PREFIX q: <http://www.wikidata.org/prop/qualifier/>
PREFIX s: <http://www.wikidata.org/prop/statement/>
SELECT DISTINCT ?item ?title ?publicationdate
WHERE {
  ?item wdt:P31 wd:Q11424;                # instance of: film
        rdfs:label ?title;
        p:P577 ?placeofpublication.       # statement node for one publication date
  ?placeofpublication q:P291 wd:Q30;      # qualifier: place of publication is the US
                      s:P577 ?publicationdate.   # the date value of that statement
  filter ( lang(?title) = "en" )
}
ORDER BY ?item

Query DBpedia to get abstract for different inputs

I need to build a single DBpedia query such that, if I give any one input such as a city name, a person's name, an institute name, or an instrument name, I get its abstract as output. Is that possible?
For instance,
New York- New York is a state in the Northeastern and Mid-Atlantic regions of the United States......
Mars- Mars is the fourth planet from the Sun and the second smallest planet in the Solar System....
Michael Jackson- Michael Joseph Jackson was an American singer, songwriter, dancer, and actor......
I have tried, but it's not working for all inputs.
SELECT ?abstract WHERE {
<http://dbpedia.org/resource/New_York>
<http://dbpedia.org/ontology/abstract>
?abstract
FILTER langMatches(lang(?abstract), "en")
}
If you intend to get the abstract for multiple things, supply those multiple things within a VALUES block. I found that matching by ?name worked sufficiently well for name-based searches.
SELECT DISTINCT ?abstract WHERE {
  [ rdfs:label ?name
  ; dbpedia-owl:abstract ?abstract
  ] .
  FILTER langMatches(lang(?abstract),"en")
  VALUES ?name { "New York"@en }
}
LIMIT 10
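To look up several inputs at once, the VALUES block can list more than one label. The following is a sketch under the assumption that each input matches an English rdfs:label exactly. It also returns ?name so that each abstract can be tied back to its input, and it declares the prefixes explicitly even though the DBpedia endpoint predefines them.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?name ?abstract WHERE {
  # One entry per input; the @en tag must match the label's language tag
  VALUES ?name { "New York"@en "Mars"@en "Michael Jackson"@en }
  [ rdfs:label ?name
  ; dbpedia-owl:abstract ?abstract
  ] .
  FILTER langMatches(lang(?abstract), "en")
}
```

A label can belong to more than one resource, so an input may come back with several abstracts; filtering on rdf:type is one way to narrow that down.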