Return one image (and sitelinks) per item? - sparql

I want to return a table where each row is a distinct toy item and there are columns for each toy's image and its sitelink count.
Q: Is there a better way to do this than what I finally did below? Why did I have to move labeling and sitelinks to the inner query?
Initially, I naively thought I could run the following query. But I discovered it created one row for each toy-image pair (I suppose it would return what I want if every image property had a priority-ranked image?). E.g., "gumball machine" (wd:Q1737075) has two rows, one for each of its two images.
SELECT ?item ?itemLabel ?image ?sitelinks WHERE {
?item wdt:P31 wd:Q11422; #toy, returns
wdt:P18 ?image;
wikibase:sitelinks ?sitelinks.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?sitelinks)
So then I ran the following query, which gives me what I want.
SELECT ?item ?itemLabel ?sitelinks ?image WHERE {
{
SELECT ?item ?itemLabel ?sitelinks (MAX(?_image) AS ?image) WHERE {
?item wdt:P31 wd:Q11422; #toys
wikibase:sitelinks ?sitelinks;
rdfs:label ?itemLabel;
wdt:P18 ?_image.
FILTER(LANG(?itemLabel)="en")
}
GROUP BY ?item ?itemLabel ?sitelinks
}
?item wdt:P18 ?image.
#rdfs:label ?itemLabel;
#wikibase:sitelinks ?sitelinks
#FILTER(LANG(?itemLabel)="en")
}
ORDER BY DESC(?sitelinks)
But is this right? Do I really need to nest queries in order to get one image per item?
Also, you can see from the commented lines that I initially tried running this with the labelling and sitelinks in the outer query. But that led to query timeouts. Why? Shouldn't that have been the more efficient construction, saving the labelling/sitelink work to the end where I have a smaller dataset after the inner query work?

In the second query, there is no need to wrap the inner query in an outer one. The following works just fine:
SELECT ?item ?itemLabel ?sitelinks (MAX(?_image) AS ?image)
WHERE {
?item wdt:P31 wd:Q11422;
wikibase:sitelinks ?sitelinks;
rdfs:label ?itemLabel;
wdt:P18 ?_image.
FILTER(LANG(?itemLabel)="en")
}
GROUP BY ?item ?itemLabel ?sitelinks
ORDER BY DESC(?sitelinks)
And if you don't need a specific image or reproducible results, just use SAMPLE to avoid the extra operations that MAX performs:
SELECT ?item ?itemLabel ?sitelinks (SAMPLE(?_image) AS ?image)
WHERE {
?item wdt:P31 wd:Q11422;
wikibase:sitelinks ?sitelinks;
rdfs:label ?itemLabel;
wdt:P18 ?_image.
FILTER(LANG(?itemLabel)="en")
}
GROUP BY ?item ?itemLabel ?sitelinks
ORDER BY DESC(?sitelinks)
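If labels in the browser language are still wanted, one possible variant (a sketch, not part of the original answer) keeps the aggregation in a subquery and calls the label service only on the grouped rows. It assumes the label service can label a variable projected from a subquery, and it would also keep toys that have no English label, which the rdfs:label filter drops. It may still need tuning if it times out.

```sparql
# Sketch: one image per item, labels resolved after grouping.
SELECT ?item ?itemLabel ?sitelinks ?image WHERE {
  {
    # Inner query: collapse multiple images into a single arbitrary one.
    SELECT ?item ?sitelinks (SAMPLE(?_image) AS ?image) WHERE {
      ?item wdt:P31 wd:Q11422 ;
            wikibase:sitelinks ?sitelinks ;
            wdt:P18 ?_image .
    }
    GROUP BY ?item ?sitelinks
  }
  # Outer query: the label service only sees one row per item.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?sitelinks)
```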

Related

Wikidata do not return me itemLabel sometimes

I am looking for people of French nationality born in 1900 (and still living). I do not fully understand how Wikidata responds to the following query:
SELECT ?item ?itemLabel ?itemDescription
WHERE {
?item wdt:P31 wd:Q5.
?item wdt:P569 ?dateOfBirth.
?item wdt:P27 wd:Q142.
FILTER NOT EXISTS {?item wdt:P570|wdt:P509|wdt:P20 ?o}
FILTER("1900-01-01T00:00:00Z"^^xsd:dateTime <= ?dateOfBirth && ?dateOfBirth < "1901-01-01T00:00:00Z"^^xsd:dateTime)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],fr". }
}
I do not understand why this query does not return an itemLabel for some rows. For example, the itemLabel returned for https://www.wikidata.org/wiki/Q47508624 is its ID: Q47508624
By using the wikibase:language option, you're asking Wikidata to provide the label for each ?item in the ?itemLabel variable. You've requested labels in either the language preferred by your browser ([AUTO_LANGUAGE]) or French (fr), and I would guess that your browser's default language is also French. When an item has no label in any of the listed languages, the label service falls back to the item's ID, which is why you see Q47508624. With a browser set to English as the default, I get "Hugues Esquerre" as the ?itemLabel value for wd:Q47508624 (this record has labels defined in English and Spanish).
You can add more acceptable languages to the comma-separated list in the query to increase the likelihood of getting label values back:
SELECT ?item ?itemLabel ?itemDescription
WHERE {
?item wdt:P31 wd:Q5.
?item wdt:P569 ?dateOfBirth.
?item wdt:P27 wd:Q142.
FILTER NOT EXISTS {?item wdt:P570|wdt:P509|wdt:P20 ?o}
FILTER("1900-01-01T00:00:00Z"^^xsd:dateTime <= ?dateOfBirth && ?dateOfBirth < "1901-01-01T00:00:00Z"^^xsd:dateTime)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],fr,en,es". }
}
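To check which languages actually have a label for a given item, a minimal sketch is to list the labels directly instead of going through the label service. It assumes only the plain rdfs:label triples that Wikidata exposes; the variable names ?label and ?lang are chosen here for illustration.

```sparql
# List every label of the item together with its language tag.
SELECT ?label (LANG(?label) AS ?lang) WHERE {
  wd:Q47508624 rdfs:label ?label .
}
ORDER BY ?lang
```

Any language code that appears in the ?lang column can then be added to the wikibase:language list.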

SPARQL and counting items

I'm using SPARQL and the Wikidata Query Service to try to determine the following: the actors who received the Oscar award, sorted by the total number of (any) awards they received in decreasing order, with the list of all their awards.
So far this is what I have that works but doesn't quite do what the question is asking.
SELECT ?item ?itemLabel
WHERE {
?item wdt:P31 wd:Q5 .
?item wdt:P166 wd:Q103916 .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 1000
And this is what I have that I am trying to get to work but is not yet working. I'm new to this and any help is appreciated. I updated the query below because I've gotten a bit closer.
SELECT ?item ?itemLabel (COUNT (DISTINCT ?year) AS ?count)
WHERE {
?item wdt:P31 wd:Q5 .
?item wdt:P166 wd:Q103916 .
?item p:P585 ?year .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?item ?itemLabel
ORDER BY ?count
LIMIT 100
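One possible direction, offered only as a sketch and not as an answer from the original thread: match the award from the question once to select the actors, match wdt:P166 (award received) a second time with a variable to collect all of their awards, then aggregate. The names ?award, ?awardLabel and ?awards are introduced here for illustration. The label service is used in its manual mode, on the assumption that labels fed into GROUP_CONCAT have to be bound explicitly.

```sparql
SELECT ?item ?itemLabel
       (COUNT(DISTINCT ?award) AS ?count)
       (GROUP_CONCAT(DISTINCT ?awardLabel; separator = ", ") AS ?awards)
WHERE {
  ?item wdt:P31 wd:Q5 ;
        wdt:P166 wd:Q103916 ;   # the award item used in the question
        wdt:P166 ?award .       # every award the person received
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    ?item rdfs:label ?itemLabel .    # bind labels explicitly (manual mode)
    ?award rdfs:label ?awardLabel .
  }
}
GROUP BY ?item ?itemLabel
ORDER BY DESC(?count)
LIMIT 100
```

Note that COUNT(DISTINCT ?award) counts distinct award items, so an award received twice is counted once.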

Wikidata query for items with one Wikipedia page not in English

I want to find Wikidata items that each refer to exactly one Wikipedia page, where that page is not an English Wikipedia page.
I came up with this query:
SELECT ?item WHERE {
?article schema:about ?item .
FILTER (SUBSTR(str(?article), 9, 2) != "en") .
{
SELECT ?item (COUNT(DISTINCT ?lang) AS ?count) WHERE {
?item wdt:P1367 ?yp_id . # BBC 'Your paintings' artist identifier
?article schema:about ?item .
FILTER (SUBSTR(str(?article), 11, 15) = ".wikipedia.org/") .
?article schema:inLanguage ?lang .
} GROUP BY ?item
HAVING (?count=1)
ORDER BY DESC (?count)
}
}
It executes. However, I always get a timeout.
Is there a better query to achieve what I am looking for?
Here are some tips:
1. Since you keep only ?count=1, there is no reason to order by ?count.
2. Since each article can have only one ?lang, you can count by ?article without introducing a redundant variable.
3. Instead of working on (sub)strings, just use the schema:isPartOf property to select the specific domain that you want to exclude.
4. Use FILTER NOT EXISTS instead of FILTER (... != ...).
The fourth optimization is the most important, and it is sufficient on its own.
SELECT ?item WHERE {
FILTER NOT EXISTS {
?article schema:about ?item ;
schema:isPartOf <https://en.wikipedia.org/> .
}
{
SELECT ?item (COUNT(DISTINCT ?article) AS ?count) WHERE {
?item wdt:P1367 ?yp_id . # BBC 'Your paintings' artist identifier
?article schema:about ?item .
FILTER (SUBSTR(str(?article), 11, 15) = ".wikipedia.org/") .
}
GROUP BY ?item
HAVING (?count=1)
}
}
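The third tip could also be applied to the remaining SUBSTR filter. The following is only a sketch: it assumes that each site node in the Wikidata RDF data carries a wikibase:wikiGroup value and that "wikipedia" is the value for Wikipedia sites. The variables ?site and ?enArticle are introduced here for illustration.

```sparql
SELECT ?item WHERE {
  ?item wdt:P1367 ?yp_id .            # BBC 'Your paintings' artist identifier
  ?article schema:about ?item ;
           schema:isPartOf ?site .
  ?site wikibase:wikiGroup "wikipedia" .   # assumed: restricts to Wikipedia sites
  # Exclude items that have an English Wikipedia article.
  FILTER NOT EXISTS {
    ?enArticle schema:about ?item ;
               schema:isPartOf <https://en.wikipedia.org/> .
  }
}
GROUP BY ?item
HAVING (COUNT(DISTINCT ?article) = 1)
```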

SPARQL query that returns Wikipedia labels from Wikidata itemLabel

I am new to SPARQL.
Is it possible to write a query that, for a given item label, returns the information shown in the Wikipedia box for the Arabic language at the bottom of the Wikidata item page?
see the picture:
Instead of the Wikipedia URL returned by the following query, I need the Wikipedia label, in our case (الرامة (جنين))
SELECT DISTINCT ?article ?item ?itemLabel ?itemDescription ?entity_type ?main_category (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list) WHERE {
?item ?label "الرامة"@ar .
?item wdt:P31 ?entity_type .
MINUS { ?item wdt:P31 wd:Q4167410}
OPTIONAL{ ?item wdt:P910 ?main_category}
?article schema:about ?item;
schema:isPartOf <https://ar.wikipedia.org/>;
OPTIONAL { ?item skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "ar") }
SERVICE wikibase:label { bd:serviceParam wikibase:language "ar" .}
}
GROUP BY ?article ?item ?itemLabel ?itemDescription ?entity_type ?main_category
This is the answer by UninformedUser:
SELECT ?article ?wikipediaLabel WHERE {
?article schema:about wd:Q12187640 .
?article schema:isPartOf <https://ar.wikipedia.org/> ;
schema:name ?wikipediaLabel .
}
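The answer starts from a known item ID. To start from the Arabic label instead, as in the question, a minimal sketch could join the label lookup with the same sitelink pattern. It assumes the item's Arabic rdfs:label matches the string exactly.

```sparql
SELECT ?item ?article ?wikipediaLabel WHERE {
  ?item rdfs:label "الرامة"@ar .                      # exact Arabic label match
  ?article schema:about ?item ;
           schema:isPartOf <https://ar.wikipedia.org/> ;
           schema:name ?wikipediaLabel .               # title of the Arabic Wikipedia page
}
```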

How to get only the first value from an optional property?

As with the SQL aggregates MAX, MIN or FIRST, I want to get only one value per item, without duplicated rows.
Real Wikidata case
Here the OPTIONAL clause expands the result from 253 to 257 rows:
# Countries and its codes
SELECT ?code ?item ?itemLabel ?osmId
WHERE
{
?item wdt:P297 ?code.
OPTIONAL{?item wdt:P402 ?osmId .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?code
I need only one (any) osmId. How to do something like FIRST{OPTIONAL{?item wdt:P402 ?osmId .}} ?
NOTES:
This is not a duplicate of How to get only the most recent value from a Wikidata property?
This is not a duplicate of Why does this Wikidata SPARQL query only work for the first element in a list?
Neither addresses the simple need for "any first" value.
Here is a wiki answer (please feel free to edit and improve it!)
# Countries and its codes
SELECT ?code ?item ?itemLabel
(MAX(?osmId) as ?osmId_max) (COUNT(?code) as ?osmId_n)
WHERE
{
?item wdt:P297 ?code.
OPTIONAL{?item wdt:P402 ?osmId .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?code ?item ?itemLabel
ORDER BY ?code
The COUNT(?code) is there only to flag the rows where osmId was not unique.
Is there another simple solution that keeps only the first value?
Using SAMPLE
As @ValerioCocchi suggested, we can use SAMPLE instead of MAX:
SELECT ?code ?item ?itemLabel (SAMPLE(?osmId) as ?osmId_sample)
WHERE
{
?item wdt:P297 ?code.
OPTIONAL{?item wdt:P402 ?osmId .}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?code ?item ?itemLabel
ORDER BY ?code
SAMPLE uses slightly less CPU time, but the main reason to use it is that you don't care which value is returned. In Wikidata this applies when a property value is supposed to be unique but a few entries are erroneous and you can ignore them.
NOTE about the osmId: the advantage of MAX in this particular query, which uses a numeric ID tied to a temporal sequence, is that it may return a "fresher" ID. But in OpenStreetMap (OSM) the strategy can be the inverse: the oldest ID is the most stable. So SAMPLE also makes sense when you do not know which strategy is better.
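If the oldest ID is the one wanted, MIN can be used in the same way. The sketch below assumes that P402 values are stored as plain strings, so it casts them with xsd:integer to compare them numerically rather than lexicographically. The alias ?osmId_oldest is a name chosen here. Countries without any osmId are expected to keep an unbound value, though that has not been verified against the live service.

```sparql
# Countries and their codes, keeping the numerically smallest osmId.
SELECT ?code ?item ?itemLabel (MIN(xsd:integer(?osmId)) AS ?osmId_oldest)
WHERE {
  ?item wdt:P297 ?code .
  OPTIONAL { ?item wdt:P402 ?osmId . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?code ?item ?itemLabel
ORDER BY ?code
```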
Using FILTER
@StanislavKralin's suggestion:
SELECT ?code ?item ?itemLabel ?osmId
WHERE
{
?item wdt:P297 ?code.
OPTIONAL{
?item wdt:P402 ?osmId
FILTER NOT EXISTS {
?item wdt:P402 ?osmId, ?osmId_ .
FILTER (?osmId_ > ?osmId)
}
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?code
Seems more verbose.