Wikidata SPARQL Query Qualifier Value - sparql

This should be fairly easy for anyone familiar with SPARQL (which I am not). I'm trying to return a qualifier/property value for "score_by" in this query and it's showing up blank:
SELECT ?item ?itemLabel ?IMDb_ID ?_review_score ?_score_by WHERE {
  ?item wdt:P345 "tt3315342".
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  OPTIONAL { ?item wdt:P345 ?IMDb_ID. }
  OPTIONAL { ?item wdt:P444 ?_review_score. }
  OPTIONAL { ?item ps:P447 ?_score_by. }
}

'Score by' is a tricky thing, because it qualifies a score.
Scores are complex things: they aren't just a value, but are qualified by the scorer (Rotten Tomatoes, IMDb, etc). If your query worked, the answers would be misleading, since it wouldn't be clear whether ?_review_score corresponded to ?_score_by, i.e. whether the score actually came from that reviewer.
(You might ask why P444 - score - is there, since without a reviewer the information isn't complete. It's a fair question. The property you used is wdt:P444, a Wikidata 'direct' property. That means the property was created as a shortcut for convenience, at the expense of losing some context. Direct properties are like database views.)
The way scores actually work is by 'reifying' the complex review score as a thing, an object 'the review', then hanging the information - score, reviewer etc - off that.
For example:
select * where {
  wd:Q24053263 p:P444 ?review . # Get reviews for wolverine
  ?review ?p ?o                 # Get all info from the review
}
You can see here that the score is there under p:statement/P444 (that is, ps:P444), and there's a 'qualifier' p:qualifier/P447 (that is, pq:P447), i.e. the reviewer.
Essentially, properties in Wikidata can appear in a number of guises, encoded in the prefix.
To answer your question:
OPTIONAL { ?item wdt:P444 ?_review_score. }
OPTIONAL { ?item ps:P447 ?_score_by. }
should be
OPTIONAL {
  ?item p:P444 ?review .
  ?review pq:P447 ?_score_by ;
          ps:P444 ?_review_score .
}
i.e. Treat the review as a single thing, then get the score and corresponding reviewer from that.
(If you worry that there might be scores without reviewers you could add another optional within that)
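To make the point about misleading pairs concrete, here is a minimal Python sketch. It does not query Wikidata; the scores and reviewer names are made-up sample data, and the two functions only imitate what "separate lookups" versus "one statement node" would return.

```python
from itertools import product

# Hypothetical review statements for one film. Each statement node carries
# its own score (ps:P444) and its own reviewer qualifier (pq:P447).
statements = [
    {"score": "7.4/10", "score_by": "IMDb"},
    {"score": "92%", "score_by": "Rotten Tomatoes"},
]


def independent_lookups(stmts):
    """Imitate fetching scores and reviewers separately.

    Every score is paired with every reviewer, so some pairs are wrong.
    """
    scores = [s["score"] for s in stmts]
    reviewers = [s["score_by"] for s in stmts]
    return list(product(scores, reviewers))


def via_statement_node(stmts):
    """Imitate reading score and reviewer from the same statement node."""
    return [(s["score"], s["score_by"]) for s in stmts]
```

With two reviews, independent_lookups(statements) yields four pairs, two of which attach a score to the wrong reviewer, while via_statement_node(statements) yields exactly the two correct pairs.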

Related

Get chemical compound data in SI units from wikidata

For a project I need to get data about chemical compounds like density, mass, boiling point and melting point in SI units (meters, kg, Degree Celsius,...) via the CAS number of the compound.
With the Query builder and some testing I managed to achieve some of it with the following code (CAS-Number is the property P231 and I am searching for e.g. 67-64-1):
SELECT DISTINCT ?itemLabel ?melting_point ?boiling_point ?mass ?density WHERE {
  {
    SELECT DISTINCT ?item WHERE {
      ?item p:P231 ?statement0.
      ?statement0 (ps:P231) "67-64-1".
    }
  }
  OPTIONAL { ?item wdt:P2101 ?melting_point. }
  OPTIONAL { ?item wdt:P2102 ?boiling_point. }
  OPTIONAL { ?item wdt:P2054 ?density. }
  OPTIONAL { ?item wdt:P2067 ?mass. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
}
The problem is that I can't restrict the results to temperatures in degrees Celsius; I also get Fahrenheit values.
This is a very interesting question, as Wikidata offers plenty of tools to disambiguate -- this is why it's so powerful.
But these tools come with some learning that the user needs to do before using them.
Before starting, let me make three points about your query:
1-You don't actually need an inner query to select acetone, as you would in SQL. This is one of the reasons why SPARQL is so great compared to SQL -- you don't have to navigate an endless field of keys but your data is still 'normalised'.
2-You don't need the ?statement0 variable, as it is not used to disambiguate anything. You can just use the wdt:P231 property, which directly links acetone with its CAS registry number.
3-Since you do need to disambiguate the values of the physical quantities associated with acetone, you will need to go through the statement node for each of them.
Now, here is a query that works:
SELECT DISTINCT ?itemLabel ?melting_point ?boiling_point ?density ?mass
WHERE {
  ?item wdt:P231 "67-64-1".
  OPTIONAL {
    ?item p:P2101 ?ps1 .
    ?ps1 ps:P2101 ?melting_point;
         psv:P2101/wikibase:quantityUnit/wdt:P31/wdt:P279* wd:Q61610698
  }
  OPTIONAL {
    ?item p:P2102 ?ps2 .
    ?ps2 ps:P2102 ?boiling_point;
         psv:P2102/wikibase:quantityUnit/wdt:P31/wdt:P279* wd:Q61610698
  }
  OPTIONAL {
    ?item p:P2054 ?ps3 .
    ?ps3 ps:P2054 ?density;
         psv:P2054/wikibase:quantityUnit/wdt:P31/wdt:P279* wd:Q61610698
  }
  OPTIONAL {
    ?item p:P2067 ?ps4 .
    ?ps4 ps:P2067 ?mass;
         psv:P2067/wikibase:quantityUnit/wdt:P31/wdt:P279* wd:Q61610698
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de". }
}
To begin with, I removed the inner query and the ?statement0 variable mentioned in points 1 and 2 above.
Then, I retrieve the statement that describes the melting point by using the p:P2101/ps:P2101 combination.
This allows me to distinguish between Celsius and Fahrenheit values.
Now, since we have multiple physical quantities to look for (i.e. not just temperature), and we want all of them in SI units, we can use a property path (explained below) to restrict the returned values to SI units. Asking for Celsius, kg/m^3 and kg specifically and individually would be perfectly valid too, just more complex.
For reference, a property path is just a way to shorten a query so:
SELECT ?person ?grandparent
WHERE {
  ?person :hasParent ?parent .
  ?parent :hasParent ?grandparent .
}
can be shortened to:
SELECT ?person ?grandparent
WHERE {
  ?person :hasParent/:hasParent ?grandparent .
}
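If it helps to see what the path does mechanically, here is a small Python sketch over an in-memory list of triples. The names and the data are invented for illustration; the point is only that :hasParent/:hasParent means "follow one hop, then follow another from wherever you landed".

```python
# Hypothetical (subject, predicate, object) triples; the people are made up.
triples = [
    ("alice", "hasParent", "bob"),
    ("bob", "hasParent", "carol"),
    ("dave", "hasParent", "erin"),
]


def step(pairs, predicate, data):
    """Extend each (start, current) pair by one hop along `predicate`."""
    return [
        (start, obj)
        for (start, current) in pairs
        for (subj, pred, obj) in data
        if subj == current and pred == predicate
    ]


def sequence_path(predicates, data):
    """Evaluate a sequence path such as hasParent/hasParent hop by hop."""
    starts = sorted({subj for (subj, _, _) in data})
    pairs = [(s, s) for s in starts]
    for predicate in predicates:
        pairs = step(pairs, predicate, data)
    return pairs
```

Calling sequence_path(["hasParent", "hasParent"], triples) returns only the (person, grandparent) pairs, with the intermediate parent never exposed, which is exactly what the shortened query does.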
Now, let's get back to the melting point being returned in Celsius and Fahrenheit.
The two statements that give us different units use a psv:P2101 property to tell us more about the value mentioned in the statement.
From this we can use the wikibase:quantityUnit property to determine the unit.
We will then want to make sure the unit is an SI unit, or belongs to any subclass thereof. So wdt:P31 tells us that the unit is "an instance of" some class, and wdt:P279* wd:Q61610698 (wdt:P279* is another property path) tells us that the class is either the class of SI units (wd:Q61610698 = SI units), or a direct or indirect subclass of SI units.
I added a picture of what the data looks like (although, confusingly, there are two Celsius melting points for acetone for some reason).
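The wdt:P31/wdt:P279* check described above can be sketched in Python as "take the unit's class, then walk up zero or more subclass links until you hit the target". The unit and class names below are hypothetical stand-ins, not real Wikidata IDs, and each class is assumed to have at most one superclass to keep the sketch short.

```python
# Hypothetical, simplified data. Real Wikidata classes can have several
# superclasses; one per class is assumed here for brevity.
INSTANCE_OF = {  # plays the role of wdt:P31
    "degree_celsius": "si_derived_unit",
    "degree_fahrenheit": "imperial_unit",
}
SUBCLASS_OF = {  # plays the role of wdt:P279
    "si_derived_unit": "si_unit",
    "si_unit": "unit",
    "imperial_unit": "unit",
}


def is_instance_of_class_or_subclass(unit, target):
    """Imitate `?unit wdt:P31/wdt:P279* target` on the toy data above."""
    current = INSTANCE_OF.get(unit)
    visited = set()
    while current is not None and current not in visited:
        if current == target:  # the `*` also allows zero subclass hops
            return True
        visited.add(current)
        current = SUBCLASS_OF.get(current)
    return False
```

On this toy data the Celsius unit passes the "SI unit" test and the Fahrenheit unit does not, which is the filtering effect the query relies on.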

How to construct SPARQL query for a list of Wikidata items

First off, I'm not a developer, and I'm new to writing SPARQL queries. Mostly I've been looking up existing queries and trying to tweak them to get what I need. The issue is that most documentation on query construction has to do with getting new data you don't have, rather than retrieving or extending existing data. And when you do find tips for retrieving existing data, they tend to be for ONE item at a time instead of a full data set of many items.
I mostly use OpenRefine for this. I started by loading up my existing list of names, and used the Wikidata extension service to reconcile the names to existing Wikidata IDs. So now, this is where I am, vs. where I want to go:
1 - We have a list of Wikidata IDs for reconciled matches;
2 - We have used OpenRefine to get most of the data we need from those;
3 - We don't have the label, description, or Wikipedia links (English), which are extremely valuable;
4 - I have figured out how to construct a query for the label and description of just ONE Wikidata Item:
SELECT ?itemLabel ?itemDescription WHERE {
  VALUES ?item { wd:Q15485689 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
5 - I have figured out how to construct a query to extract the Wikipedia English URL for just ONE Wikidata item:
SELECT ?article ?lang ?name WHERE {
  ?article schema:about wd:Q15485689;
           schema:inLanguage ?lang;
           schema:name ?name;
           schema:isPartOf _:b13.
  _:b13 wikibase:wikiGroup "wikipedia".
  FILTER(?lang IN("en"))
  FILTER(!(CONTAINS(?name, ":")))
  OPTIONAL { ?article wdt:P31 ?instance_of. }
}
The questions are:
How do I modify either query to generate these same results for MORE THAN ONE* Wikidata item?
How do I modify the query to give me all three at once, for more than one* Wikidata item?
*we have 667, but I could do smaller batches if that's too much for the service to handle
Ideally, the query would generate something that allowed me to download a CSV file looking much like this (so I can match on and import the new data into our Airtable base which feeds the website application):
[image: ideal CSV output]
If anyone can lead me in the right direction here, I'd appreciate it.
I should also note that if OpenRefine has a way of retrieving these I'm all ears! But since these three don't have a property code, I couldn't see how to snag them from OR.
This sort of thing. See how many QIDs you can get away with in the VALUES statement. All of them in one go, probably. This query gives you the URL and the article title; clearly, you can snip the article title column if you do not want it. Note also https://www.wikidata.org/wiki/Wikidata:Request_a_query which is Wikidata's own location for questions such as these.
SELECT ?item ?itemLabel ?itemDescription ?sitelink ?article
WHERE
{
  VALUES ?item {wd:Q105848230 wd:Q6697407 wd:Q2344502 wd:Q1698206}
  OPTIONAL {
    ?article schema:about ?item ;
             schema:isPartOf <https://en.wikipedia.org/> ;
             schema:name ?sitelink .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Yes, a VALUES statement in SPARQL can relay not only hundreds but even thousands of items. I regularly do this when cross-checking to see how Wikidata matches up to an existing data set. Some other things you could do as well that take lists of Wikidata items:
Petscan - https://petscan.wmflabs.org/
TABernacle - https://tabernacle.toolforge.org/
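If you would rather generate the VALUES line than paste hundreds of IDs by hand, a small script can do it. This is only a sketch: the helper names (values_clause, batches) and the batch size are my own choices, not anything Wikidata prescribes, and the output still has to be pasted into a query like the one above.

```python
import re

# A Wikidata item ID is "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def values_clause(qids, variable="?item"):
    """Build a SPARQL VALUES clause such as: VALUES ?item { wd:Q1 wd:Q2 }"""
    for qid in qids:
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return "VALUES " + variable + " { " + " ".join("wd:" + q for q in qids) + " }"


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

For example, values_clause(["Q105848230", "Q6697407"]) gives a line you can drop straight into the query, and batches(all_qids, 200) would split a 667-item list into four chunks if one big query turns out to be too slow.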

Efficient filter query in Wikidata

I am trying to form an efficient filter query in SPARQL on Wikidata. Let me explain my process:
I query the search-entities API using key words e.g. (Apple, Orange)
The API query returns a list of relevant item ID's e.g. (wd:Q629269, wd:Q154950, wd:Q312, wd:Q95, wd:Q4878289, wd:Q10817602)
With this list of ID's, I then query SPARQL to return items that are an instance of a CLASS, or of a SUBCLASS, of certain types e.g. (p:P31/ps:P31/wdt:P279* wd:Q43229) - which returns everything that is an Organisation or subclass thereof.
Then, for items in the list of ID's that are of a certain CLASS, return additional information if it exists e.g. (OPTIONAL).
I am new to SPARQL. My Question is, is this the most efficient method to achieve this output? It seems to me to be quite inefficient and I cannot find a similar type of problem in the tutorial examples.
You can try the query here.
SELECT distinct ?item ?itemLabel ?itemDescription ?web ?inception ?ISIN
WHERE{
  FILTER (?item IN (wd:Q629269, wd:Q154950, wd:Q312, wd:Q95, wd:Q4878289, wd:Q10817602))
  ?item p:P31/ps:P31/wdt:P279* wd:Q43229.
  OPTIONAL {
    ?item wdt:P856 ?web.        # get item-web
    ?item wdt:P571 ?inception.  # get item-inception
    ?item wdt:P946 ?ISIN.       # get item-isin
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 10

How to get only the most recent value from a Wikidata property?

Suppose I want to get a list of every country (Q6256) and its most recently recorded Human Development Index (P1081) value. The Human Development Index property for the country contains a list of data points taken at different points in time, but I only care about the most recent data. This query will not work because it gets multiple results for each country (one for each Human Development Index data point):
SELECT
  ?country
  ?countryLabel
  ?hdi_value
  ?hdi_date
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL {
    ?country p:P1081 ?hdi_statement.
    ?hdi_statement ps:P1081 ?hdi_value.
    ?hdi_statement pq:P585 ?hdi_date.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I'm aware of GROUP BY/GROUP_CONCAT but that will still give me every result when I'd prefer to just have one. GROUP BY/SAMPLE will also not work since SAMPLE is not guaranteed to take the most recent result.
Any help or link to a relevant example query is appreciated!
P.S. Another thing I'm confused about is why population P1082 in this query returns only one population result per country
SELECT
  ?country
  ?countryLabel
  ?population
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1082 ?population. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
while the same query but for HDI returns multiple results per country:
SELECT
  ?country
  ?countryLabel
  ?hdi
WHERE {
  ?country wdt:P31 wd:Q6256.
  OPTIONAL { ?country wdt:P1081 ?hdi. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
What is different about population and HDI that causes the behavior to be different? When I view the population data for each country on Wikidata I see multiple population points listed, but only one gets returned by the query.
Both your questions are duplicates, but I'll try to add interesting facts to existing answers.
Question 1 is a duplicate of SPARQL query to get only results with the most recent date.
This technique does the trick:
FILTER NOT EXISTS {
  ?country p:P1081/pq:P585 ?hdi_date_ .
  FILTER (?hdi_date_ > ?hdi_date)
}
However, you should add this clause outside of the OPTIONAL block; it does not work inside OPTIONAL (and I'm not sure whether that is a bug).
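The logic of that FILTER NOT EXISTS is "keep a row only if the same country has no row with a later date". Here is the same idea as a Python sketch over made-up rows (the country names, values and dates are invented; ISO date strings are used so that plain string comparison orders them correctly).

```python
# Hypothetical (country, hdi_value, hdi_date) rows; values are made up.
rows = [
    ("A", 0.70, "2015-01-01"),
    ("A", 0.75, "2019-01-01"),
    ("B", 0.60, "2018-01-01"),
]


def most_recent(data):
    """Keep rows for which no row of the same country has a later date."""
    return [
        (country, value, date)
        for (country, value, date) in data
        if not any(
            other_country == country and other_date > date
            for (other_country, _, other_date) in data
        )
    ]
```

On the sample rows this keeps the 2019 value for A and the only value for B. Note that if two statements share the latest date, both survive, in SPARQL as well as in this sketch.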
Question 2 is a duplicate of Some cities aren't instances of city or big city?
You can't use wdt-predicates, because missing statements are not truthy.
They are normal-rank statements, but there is a preferred-rank statement.
Truthy statements represent statements that have the best non-deprecated rank for given property. Namely, if there is a preferred statement for property P2, then only preferred statements for P2 will be considered truthy. Otherwise, all normal-rank statements are considered truthy.
The reason why P1082 always has a preferred statement is that this property is processed by PreferentialBot.
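The truthy rule quoted above is easy to express in code. The sketch below is just an illustration of that rule on invented numbers; the function name and the sample values are hypothetical.

```python
def truthy(statements):
    """Apply the 'best non-deprecated rank' rule to (value, rank) pairs.

    If any statement is preferred, only preferred values are truthy;
    otherwise all normal-rank values are. Deprecated values never are.
    """
    preferred = [value for (value, rank) in statements if rank == "preferred"]
    if preferred:
        return preferred
    return [value for (value, rank) in statements if rank == "normal"]


# Hypothetical population-like property: one preferred value hides the rest.
population = [(100, "normal"), (110, "normal"), (120, "preferred")]
# Hypothetical HDI-like property: no preferred value, so all normal ones show.
hdi = [(0.70, "normal"), (0.75, "normal"), (0.10, "deprecated")]
```

Here truthy(population) returns a single value while truthy(hdi) returns two, which mirrors the difference you see between the two wdt: queries.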

Getting members of a category from Wikidata

I would like to get all the members of a specific category from Wikidata. For example, I would like to get all the films (instances of film: P31 Q11424) from the category "Category:Films set in Stockholm" (Q7519614).
However, I can't seem to find what the relationship would be. DBpedia uses "subject of" but the Wikidata equivalent (P805) doesn't return any results.
I also thought I could bootstrap my way to the answer with this query, but to no avail:
SELECT ?s ?p ?pLabel WHERE {
  ?s ?p wd:Q7519614.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This may be an indirect answer, but it's worth trying.
When you look for the property that relates other entities to an entity like Q7519614, it's often worth checking https://www.wikidata.org/wiki/Special:WhatLinksHere/Q7519614 (What links here?).
In this case the list is empty, which means there is no relation encoded in Wikidata for this information (so you need to rely on third-party tools to access the Wikipedia information).
The second way to look at your question is through P360 (is a list of), which the category item also carries.
In this case it says that the category is a list of films (Q11424) with filming location (P915) equal to Stockholm (Q1754).
So the closest query to what you're looking for is
SELECT ?film
WHERE {
  ?film wdt:P31 wd:Q11424;
        wdt:P915 wd:Q506250.
}
The API offers "Categorymembers" to get a List of pages that belong to a given category, ordered by page sort title. Parameters are documented here.
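As a rough sketch of what such a request looks like, the snippet below only builds the URL; it does not send it. I am assuming the English Wikipedia endpoint and the standard MediaWiki parameters for list=categorymembers (cmtitle, cmlimit); the function name and the default limit are my own.

```python
from urllib.parse import urlencode

# Assumption: the category lives on English Wikipedia.
DEFAULT_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def categorymembers_url(category_title, limit=50, endpoint=DEFAULT_ENDPOINT):
    """Build a MediaWiki API URL listing the pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category_title,  # must include the "Category:" prefix
        "cmlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)
```

For instance, categorymembers_url("Category:Films set in Stockholm") produces a URL you can open in a browser or fetch with any HTTP client; larger categories need follow-up requests using the continuation value returned in the response.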