Plot simple categorical data onto a map in Wikidata - sparql

I use the following Wikidata SPARQL query to get a list of places of birth and death of hairdressers (https://w.wiki/6Gsz):
#defaultView:Map
SELECT ?pers ?pobC ?podC WHERE {
?pers wdt:P106 wd:Q55187 ;
wdt:P19 ?pob ;
wdt:P20 ?pod .
?pob wdt:P625 ?pobC .
?pod wdt:P625 ?podC .
}
I would like to visually separate the places of birth from the places of death. The intended result would be: ?pobC dots in one color, ?podC dots in another.
The examples in the Wikidata SPARQL handbook (https://en.wikibooks.org/wiki/SPARQL/Views#Map) don't quite work for me: in my case the color depends on the variable name (pob vs. pod), not on its value, and I cannot figure out how to express this in SPARQL.
Help would be appreciated!

Developing on @UninformedUser's comment:
You need to return only one coordinate per row, and use the ?layer special variable to split the colors.
The minimal solution to that, based on the comment but without nesting and hints, is a query like:
#defaultView:Map
SELECT ?pers ?pobC ?podC ?layer WHERE {
?pers wdt:P106 wd:Q55187 ;
wdt:P19 ?pob ;
wdt:P20 ?pod .
{
?pob wdt:P625 ?pobC.
BIND("birth" AS ?layer)
}
UNION
{
?pod wdt:P625 ?podC.
BIND("death" AS ?layer)
}
}
https://w.wiki/6H59
Note that in that case you are returning only one place per line, so each dot won't have all the birth and death information.
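To make the "one coordinate per row" idea concrete, here is a small Python sketch (not part of the original answer) of the reshaping that the UNION performs. The function name `to_layer_rows`, the single `coord` column and the sample values are assumptions made up for illustration; the query itself keeps the ?pobC and ?podC columns.

```python
def to_layer_rows(rows):
    """Turn (person, birth_coord, death_coord) rows into one row per coordinate."""
    long_rows = []
    for person, birth_coord, death_coord in rows:
        # Each append mirrors one branch of the UNION: one coordinate, one layer.
        long_rows.append({"pers": person, "coord": birth_coord, "layer": "birth"})
        long_rows.append({"pers": person, "coord": death_coord, "layer": "death"})
    return long_rows


# Hypothetical sample row; the coordinates are placeholders, not Wikidata values.
sample = [("person-1", "Point(2.35 48.85)", "Point(13.40 52.52)")]
for row in to_layer_rows(sample):
    print(row["pers"], row["layer"], row["coord"])
```

Each person now contributes two rows, which is why a single dot no longer carries both the birth and the death information.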

Related

Get chemical compound data in SI units from wikidata

For a project I need to get data about chemical compounds like density, mass, boiling point and melting point in SI units (meters, kg, Degree Celsius,...) via the CAS number of the compound.
With the Query builder and some testing I managed to achieve some of it with the following code (CAS-Number is the property P231 and I am searching for e.g. 67-64-1):
SELECT DISTINCT ?itemLabel ?melting_point ?boiling_point ?mass ?density WHERE {
{
SELECT DISTINCT ?item WHERE {
?item p:P231 ?statement0.
?statement0 (ps:P231) "67-64-1".
}
}
OPTIONAL { ?item wdt:P2101 ?melting_point. }
OPTIONAL { ?item wdt:P2102 ?boiling_point. }
OPTIONAL { ?item wdt:P2054 ?density. }
OPTIONAL { ?item wdt:P2067 ?mass. }
SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
}
The problem is that I get temperatures not only in degrees Celsius but also in Fahrenheit.
This is a very interesting question, as Wikidata offers plenty of tools to disambiguate -- this is why it's so powerful.
But these tools come with some learning that the user needs to do before using them.
Before starting, let me make three points about your query:
1-You don't actually need an inner query to select acetone, as you would in SQL. This is one of the reasons why SPARQL is so great compared to SQL -- you don't have to navigate an endless field of keys but your data is still 'normalised'.
2-You don't need the ?statement0 variable, as it is not used to disambiguate. You can just use the wdt:P231 property, which directly links acetone with its CAS registry number.
3-Since you do need to disambiguate the values of the physical quantities associated with acetone, you will need to go through a disambiguation statement.
Now, here is a query that works:
SELECT DISTINCT ?itemLabel ?melting_point ?boiling_point ?density ?mass
WHERE {
?item wdt:P231 "67-64-1".
OPTIONAL {
?item p:P2101 ?ps1 .
?ps1 ps:P2101 ?melting_point;
psv:P2101/wikibase:quantityUnit/wdt:P31/wdt:P279* wd:Q61610698
}
OPTIONAL {
?item p:P2102 ?ps2 .
?ps2 ps:P2102 ?boiling_point;
psv:P2102/wikibase:quantityUnit/wdt:P31/wdt:P279* wd:Q61610698
}
OPTIONAL {
?item p:P2054 ?ps3 .
?ps3 ps:P2054 ?density;
psv:P2054/wikibase:quantityUnit/wdt:P31/wdt:P279* wd:Q61610698
}
OPTIONAL {
?item p:P2067 ?ps4 .
?ps4 ps:P2067 ?mass;
psv:P2067/wikibase:quantityUnit/wdt:P31/wdt:P279* wd:Q61610698
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
}
To begin with, I removed the inner query and the statement variable mentioned in points 1 and 2 above.
Then, I retrieve the statement about the melting point using the p:P2101/ps:P2101 combination.
This allows me to distinguish between Celsius and Fahrenheit values.
Since we have several physical quantities to look for (not just temperature) and want all of them in SI units, we can use a property path (explained below) to restrict the returned values to SI units in general. Listing Celsius, kg/m^3 and kg individually would be perfectly valid too, just more complex.
For reference, a property path is just a way to shorten a query so:
SELECT ?person ?grandparent
WHERE {
?person :hasParent ?parent .
?parent :hasParent ?grandparent .
}
can be shortened to:
SELECT ?person ?grandparent
WHERE {
?person :hasParent/:hasParent ?grandparent .
}
Now, let's get back to the melting point being returned in Celsius and Fahrenheit.
The two statements that give us different units use a psv:P2101 property to tell us more about the value mentioned in the statement.
From this we can use the wikibase:quantityUnit property to determine the unit.
We then want to make sure the unit is an SI unit or belongs to a subclass thereof. So wdt:P31 tells us that the unit is "an instance of" some class, and wdt:P279* wd:Q61610698 (wdt:P279* is another property path) tells us that the class is either the class of SI units (wd:Q61610698 = SI units), or a direct or indirect subclass of SI units.
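As a rough illustration of what the wdt:P31/wdt:P279* path does, here is a Python sketch on a toy in-memory graph. The dictionaries and class names are invented for the example and do not reflect the real Wikidata hierarchy; only the traversal logic is the point.

```python
def is_instance_of(item, target, instance_of, subclass_of):
    """Mimic `item wdt:P31/wdt:P279* target` on small in-memory dictionaries."""
    # Step 1 (wdt:P31): the classes the item is a direct instance of.
    to_visit = list(instance_of.get(item, []))
    seen = set()
    # Step 2 (wdt:P279*): follow zero or more subclass-of links upwards.
    while to_visit:
        cls = to_visit.pop()
        if cls == target:
            return True
        if cls in seen:
            continue
        seen.add(cls)
        to_visit.extend(subclass_of.get(cls, []))
    return False


# Toy data with made-up class names (assumption, for illustration only).
instance_of = {
    "degree Celsius": ["derived unit"],
    "degree Fahrenheit": ["imperial unit"],
}
subclass_of = {
    "derived unit": ["SI unit"],
    "imperial unit": ["unit"],
}

print(is_instance_of("degree Celsius", "SI unit", instance_of, subclass_of))
print(is_instance_of("degree Fahrenheit", "SI unit", instance_of, subclass_of))
```

The `*` means "zero or more", so a unit that is a direct instance of the target class also matches.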
I added a picture of what the data looks like (although, confusingly, there are two Celsius melting points for acetone for some reason).
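Since the four OPTIONAL blocks in the query differ only in the property ID and the variable name, they could also be generated. The following Python sketch is one possible way to do that; the helper names `optional_block` and `build_query` are hypothetical, and the generated text is meant to match the query above rather than to be a tested Wikidata client.

```python
SI_UNIT_CLASS = "wd:Q61610698"  # the class used in the query above


def optional_block(prop, var, index):
    """Return one OPTIONAL block that keeps only values whose unit is under SI_UNIT_CLASS."""
    return (
        "  OPTIONAL {\n"
        f"    ?item p:{prop} ?ps{index} .\n"
        f"    ?ps{index} ps:{prop} ?{var} ;\n"
        f"      psv:{prop}/wikibase:quantityUnit/wdt:P31/wdt:P279* {SI_UNIT_CLASS} .\n"
        "  }"
    )


def build_query(cas_number, properties):
    """Build the full query for one CAS number and a list of (property, variable) pairs."""
    blocks = "\n".join(
        optional_block(prop, var, i)
        for i, (prop, var) in enumerate(properties, start=1)
    )
    select_vars = " ".join(f"?{var}" for _, var in properties)
    return (
        f"SELECT DISTINCT ?itemLabel {select_vars}\n"
        "WHERE {\n"
        f'  ?item wdt:P231 "{cas_number}" .\n'
        f"{blocks}\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }\n'
        "}"
    )


properties = [
    ("P2101", "melting_point"),
    ("P2102", "boiling_point"),
    ("P2054", "density"),
    ("P2067", "mass"),
]
print(build_query("67-64-1", properties))
```

The printed text can be pasted into the query service; adding another quantity is then a one-line change to `properties`.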

How to construct SPARQL query for a list of Wikidata items

First off, I'm not a developer, and I'm new to writing SPARQL queries. Mostly I've been looking up existing queries and trying to tweak them to get what I need. The issue is that most documentation on query construction has to do with getting new data you don't have, rather than retrieving or extending existing data. And when you do find tips for retrieving existing data, they tend to be for ONE item at a time instead of a full data set of many items.
I mostly use OpenRefine for this. I start by loading my existing list of names, and use the Wikidata extension service to reconcile the names to existing Wikidata IDs. So now, this is where I am, vs. where I want to go:
1 - We have a list of Wikidata IDs for reconciled matches;
2 - We have used OpenRefine to get most of the data we need from those;
3 - We don't have the label, description, or Wikipedia links (English), which are extremely valuable;
4 - I have figured out how to construct a query for the label and description of just ONE Wikidata Item:
SELECT ?itemLabel ?itemDescription WHERE {
VALUES ?item { wd:Q15485689 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
5 - I have figured out how to construct a query to extract the Wikipedia English URL for just ONE Wikidata item:
SELECT ?article ?lang ?name WHERE {
?article schema:about wd:Q15485689;
schema:inLanguage ?lang;
schema:name ?name;
schema:isPartOf _:b13.
_:b13 wikibase:wikiGroup "wikipedia".
FILTER(?lang IN("en"))
FILTER(!(CONTAINS(?name, ":")))
OPTIONAL { ?article wdt:P31 ?instance_of. }
}
The questions are:
How do I modify either query to generate these same results for MORE THAN ONE* Wikidata item?
How do I modify the query to give me all three at once, for more than one* Wikidata item?
*we have 667, but I could do smaller batches if that's too much for the service to handle
Ideally, the query would generate something that allowed me to download a CSV file looking much like this (so I can match on and import the new data into our Airtable base which feeds the website application):
ideal CSV output
If anyone can lead me in the right direction here, I'd appreciate it.
I should also note that if OpenRefine has a way of retrieving these I'm all ears! But since these three don't have a property code, I couldn't see how to snag them from OR.
Something like the query below should work. See how many QIDs you can get away with in the VALUES statement; probably all of them in one go. This query gives you the URL and the article title; you can drop the article title column if you do not want it. Note also https://www.wikidata.org/wiki/Wikidata:Request_a_query, which is Wikidata's own venue for questions such as these.
SELECT ?item ?itemLabel ?itemDescription ?sitelink ?article
WHERE
{
VALUES ?item {wd:Q105848230 wd:Q6697407 wd:Q2344502 wd:Q1698206}
OPTIONAL {
?article schema:about ?item ;
schema:isPartOf <https://en.wikipedia.org/> ;
schema:name ?sitelink .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Yes, a VALUES statement in SPARQL can relay not only hundreds but even thousands of items. I regularly do this when cross-checking to see how Wikidata matches up to an existing data set. Some other things you could do as well that take lists of Wikidata items:
PetScan - https://petscan.wmflabs.org/
TABernacle - https://tabernacle.toolforge.org/
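If the full list of 667 IDs turns out to be too much for a single request, the VALUES query can be generated in batches. Below is a minimal Python sketch of that idea; the helper names `chunk` and `build_values_query` and the batch size are assumptions, and the query text follows the one given above.

```python
def chunk(items, size):
    """Split a list into consecutive batches of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_values_query(qids):
    """Build the label/description/sitelink query for a batch of QIDs."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item ?itemLabel ?itemDescription ?sitelink ?article\n"
        "WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        "  OPTIONAL {\n"
        "    ?article schema:about ?item ;\n"
        "             schema:isPartOf <https://en.wikipedia.org/> ;\n"
        "             schema:name ?sitelink .\n"
        "  }\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


# The four QIDs from the answer above; a real list would come from OpenRefine.
qids = ["Q105848230", "Q6697407", "Q2344502", "Q1698206"]
BATCH_SIZE = 2  # arbitrary small value for the demo; tune it to what the service accepts

for batch in chunk(qids, BATCH_SIZE):
    print(build_values_query(batch))
    print()
```

Each printed query can be run in the query service and its result downloaded as CSV; the per-batch CSV files can then be concatenated.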

How can I avoid timeout on a SPARQL query on Wikidata?

I am trying to extract all items of a category on Wikidata, with their respective page title in English. It works ok as long as the category does not contain many items, like this:
SELECT ?work ?workLabel
WHERE
{
?work wdt:P31/wdt:P279* wd:Q734454.
?work rdfs:label ?workLabel .
FILTER ( LANGMATCHES ( LANG ( ?workLabel ), "en" ) )
}
ORDER BY ?work
but it times out ("Query timeout limit reached") as soon as I use a category with more items, such as Q2188189 (see this example).
I have tried using LIMIT or OFFSET clauses but this does not change the result.
I also have tried to insert a filter like this FILTER (regex(?work, '.*Q1.*')) . to slice the query in subsets, also without success (No matching records found).
For now I have only extracted the ids - and then run queries to get the page title for each one of them, but that seems silly.
Is there a way to work around the timeout?
Standard method
If you want the page title of all the music works which have an article on en.wikipedia.org, you can use the following query:
SELECT ?work ?workTitle
WHERE
{
?work wdt:P31/wdt:P279* wd:Q2188189.
?workLink schema:about ?work ;
schema:isPartOf <https://en.wikipedia.org/> ;
schema:name ?workTitle .
}
I ran it three times, and two of those runs finished without hitting the timeout.
Alternative method
If you don't manage to make it work, the only workaround I can imagine is to retrieve all the possible types (i.e. subclasses) of music work, and adapt the above query to the single-class case.
So, the first step is:
SELECT ?workType WHERE { ?workType wdt:P279* wd:Q2188189. }
You'll get more than a thousand results. For each of them (take for example the result Q2743), you'll then have to run the following query:
SELECT ?work ?workTitle
WHERE
{
?work wdt:P31 wd:Q2743.
?workLink schema:about ?work ;
schema:isPartOf <https://en.wikipedia.org/> ;
schema:name ?workTitle .
}
This will return all the items that are directly instances of Q2743, without caring about subclasses.
This method is a bit cumbersome, but you can use it if you don't mind running many queries. The idea is to spread the complexity over many queries, so that each of them is less likely to exceed the timeout.
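The per-class step can be scripted. The following Python sketch only generates the query texts and merges results that have already been fetched; how the queries are sent is left out, and the names `queries_per_class` and `merge_results` are hypothetical.

```python
QUERY_TEMPLATE = """SELECT ?work ?workTitle
WHERE
{{
  ?work wdt:P31 wd:{class_id}.
  ?workLink schema:about ?work ;
            schema:isPartOf <https://en.wikipedia.org/> ;
            schema:name ?workTitle .
}}"""


def queries_per_class(class_ids):
    """Return one single-class query per class ID (no subclass traversal)."""
    return {cid: QUERY_TEMPLATE.format(class_id=cid) for cid in class_ids}


def merge_results(result_lists):
    """Combine per-class results into one sorted list without duplicates.

    A work can be an instance of several classes, so the same work may
    show up in more than one per-class result.
    """
    merged = {}
    for results in result_lists:
        for work, title in results:
            merged[work] = title
    return sorted(merged.items())


# Q2743 is the example class from the text; a real run would use every
# result of the first-step query.
for class_id, query in queries_per_class(["Q2743"]).items():
    print(f"# class {class_id}")
    print(query)
```

Deduplicating in `merge_results` matters because the single-class queries are independent and know nothing about each other's results.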

Filtering results based on specific properties with specific values (cause timeout connection to DBpedia)

I'm trying to make a SPARQL query using Prolog and DBpedia. My objective is to tag all persons in a text, so to retrieve famous people I wrote this query, which removes results such as music groups (Band) and organisations, since I want to tag only real people and not abstract entities:
select ?person where{
{
?person a dbpedia-owl:Person; rdfs:label "Name Surname"@it .
}
UNION
{
?person a dbpedia-owl:Person; foaf:name "Name"@it; foaf:surname "Surname"@it .
}
UNION
{
?person a dbpedia-owl:Person; foaf:name "Name Surname"@it .
}
FILTER NOT EXISTS {
{ ?subject <http://airpedia.org/ontology/type_with_conf#10> dbpedia-owl:Band .
?subject rdfs:label ?artistName .
FILTER ( str(?artistName) = "Name Surname" )
}
UNION
{
?subject <http://airpedia.org/ontology/type_with_conf#10> dbpedia-owl:Organisation .
?subject rdfs:label ?artistName .
FILTER ( str(?artistName) = "Name Surname" )
}
}
}
I use the Italian (it) version of DBpedia; if you run this query, use that version, since results from another version would not be useful for me.
So, for example, if I search for "Metallica" as a person, I don't want to get any results, because it is a Band or an Organisation (in this case, Metallica is typed as both).
This works well: here are the results for "Metallica" (Metallica Query Results) and for "Michael Jackson" (Michael Jackson Query Results).
My problem arises when I search for someone who is not a singer or a band. For example, if I try "Jim Carrey", I get the error "transaction timed out".
I think this happens because those properties are undefined for Jim Carrey. I tried putting an OPTIONAL around each subquery in the first filter, but I still get the same error.
I put the code in a pastebin file so you can find all three queries.
I know that I should not use static strings in a query and that there are better ways to do this, but I compose the query in Prolog and then send it to the online SPARQL endpoint, so I have to do it this way.
@Joshua: I tried removing the string FILTER inside the FILTER NOT EXISTS, but then it no longer works. Thanks for the help anyway.
Sorry for the many edits; I have resolved part of the original problem but have not found a full solution.
First problem: filtering results based on specific properties with specific values. (Works.)
Second problem: the filter works only for things that have that specific property (as shown above), such as Metallica, Michael Jackson or The Beatles, but not for those without the properties used in the filter.
(I can't post more than two links because I'm a newbie, so I will put a pastebin link with the three queries and their results in the comments.)

Limit a SPARQL query to one dataset

I'm working with the following SPARQL query, which is an example on the web-based front end of my institution's SPARQL endpoint:
SELECT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
The problem is that, as well as getting data from 'Buildings and Places' (the dataset I'm interested in, and the one I would expect the example to use), it also gets data from the 'Facilities and Equipment' dataset, which isn't relevant. You should see this if you follow the link.
I suspect the example may pre-date the addition of the Facilities and Equipment dataset, but even with the research I've done into SPARQL, I can't see a clear way to define which datasets to include.
Can anyone recommend a starting point for limiting it to show just 'Buildings' or, more specifically, results from the 'Buildings and Places' dataset?
Thanks
First things first, you really need to use SELECT DISTINCT, as otherwise you'll get repeated results.
To answer your question, you can use GRAPH <...> { ... } to restrict parts of a SPARQL query so that they match only data from a specific dataset. This only works if the SPARQL endpoint is divided up into named graphs (this one is). The solution you asked for isn't the best choice, as it assumes that things within sites in the 'places' dataset will always be restricted to buildings. That's risky, as the dataset might end up containing trees and signposts at some time in the future.
Step one is to just find out what graphs are in play:
SELECT DISTINCT ?g1 ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH ?g1 { ?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
Try it here: http://is.gd/WdRAGX
From this you can see that http://id.southampton.ac.uk/dataset/places/latest and http://id.southampton.ac.uk/dataset/places/facilities are the two relevant ones.
To only look for things 'within' a site according to the "places" graph, use:
SELECT DISTINCT ?building_number ?name ?occupants WHERE {
?site a org:Site ;
rdfs:label "Highfield Campus" .
GRAPH <http://id.southampton.ac.uk/dataset/places/latest> {
?building spacerel:within ?site ;
skos:notation ?building_number ;
rdfs:label ?name .
}
OPTIONAL {
?building soton:buildingOccupants ?occ .
?occ rdfs:label ?occupants .
} .
} ORDER BY ?name
Alternate solutions:
Using rdf:type
Above I've answered your question, but that's not the answer to your underlying problem. The following solution is more semantic, as it actually says 'only give me buildings within the campus', which is what you really mean.
Instead of filtering by graph, which is not very 'semantic', you could restrict ?building to be of class 'building', which research facilities are not. Facilities are still sometimes listed as 'within' a site, usually when the university has only published which campus they are on but not which building.
?building a rooms:Building
Using FILTER
In extreme cases you may not have data in different GRAPHS and there may not be an elegant relationship to use to filter your results. In this case you can use a FILTER and turn the building URI into a string and use a regular expression to match acceptable ones:
FILTER regex(str(?building), "^http://id.southampton.ac.uk/building/")
This is by far the worst option; don't use it unless you have to.
Belt and Braces
You can use any of these restrictions together, and a combination of restricting the GRAPH plus ensuring that all ?buildings really are buildings would be my recommended solution.
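As a sketch of how the combined restriction might be assembled, the following Python helper builds the query text with the GRAPH restriction and the rdf:type check as independent switches. The function name `build_buildings_query` is hypothetical, the type triple is placed outside the GRAPH block on the assumption that it may live in a different graph, and the prefixes are assumed to be predefined by the endpoint, as in the queries above.

```python
PLACES_GRAPH = "http://id.southampton.ac.uk/dataset/places/latest"


def build_buildings_query(graph_uri=None, require_building_type=False):
    """Build the buildings query with optional GRAPH and rdf:type restrictions."""
    within = (
        "?building spacerel:within ?site ;\n"
        "            skos:notation ?building_number ;\n"
        "            rdfs:label ?name ."
    )
    if graph_uri is not None:
        # Only accept the 'within' data when it comes from the given graph.
        within = f"GRAPH <{graph_uri}> {{\n  {within}\n  }}"
    lines = [
        "SELECT DISTINCT ?building_number ?name ?occupants WHERE {",
        "  ?site a org:Site ;",
        '        rdfs:label "Highfield Campus" .',
        "  " + within,
    ]
    if require_building_type:
        # Additionally insist that the thing really is a building.
        lines.append("  ?building a rooms:Building .")
    lines += [
        "  OPTIONAL {",
        "    ?building soton:buildingOccupants ?occ .",
        "    ?occ rdfs:label ?occupants .",
        "  }",
        "} ORDER BY ?name",
    ]
    return "\n".join(lines)


# Belt and braces: both restrictions switched on.
print(build_buildings_query(PLACES_GRAPH, require_building_type=True))
```

Calling the function with no arguments gives back the unrestricted query (with DISTINCT), which makes it easy to compare the result sets of the different variants.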