Wikidata COUNT(*) query times out - sparql

I have a straightforward query that counts how many humans have an English Wikipedia page.
prefix schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?article
WHERE
{
?item wdt:P31 wd:Q5 . # Must be a human
?article schema:about ?item ; # Must have a Wikipedia article
schema:inLanguage "en" ; # Article must be in English
schema:isPartOf <https://en.wikipedia.org/> . # Wikipedia article must be regular article
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } # Label service: returns labels in English
}
I get the expected output:
wd:Q11124 <https://en.wikipedia.org/wiki/Stephen_Breyer>
wd:Q10727 <https://en.wikipedia.org/wiki/Steve_Leo_Beleck>
wd:Q10065 <https://en.wikipedia.org/wiki/Taichang_Emperor>
wd:Q9605 <https://en.wikipedia.org/wiki/Sarah_Allen_(software_developer)>
However, if I change the SELECT statement from
SELECT ?item ?article
to
SELECT (count(?item) as ?count)
I get a timeout error. Note that the count works if I only specify the "human" condition and leave out the English Wikipedia article condition. So, clearly, some kind of background join is causing the query to time out.
However, this is a fairly trivial join, so the timeout is surprising.
Please let me know what I may be missing here.
Thanks!
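A sketch of the full counting query described above, with two simplifications: the label service is dropped because no label is selected, and the schema:inLanguage condition is dropped because schema:isPartOf <https://en.wikipedia.org/> already restricts matches to English Wikipedia. Both simplifications are assumptions, not a confirmed fix, and the query has not been checked against the time limit.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT (COUNT(?item) AS ?count)
WHERE
{
  ?item wdt:P31 wd:Q5 .                                   # Must be a human
  ?article schema:about ?item ;                           # Must have a sitelink
           schema:isPartOf <https://en.wikipedia.org/> .  # Sitelink must be on English Wikipedia
}
```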

Related

Wikidata Query: Find American authors of children’s fiction

I want to find all children's fiction writers using a Wikidata SPARQL query, but I couldn't figure out how. Can someone help, please? The following is my approach, but I don't think it is the correct way.
SELECT ?item ?itemLabel {
?item wdt:P31 wd:Q5. #find humans
?item wdt:P106 wd: #humans whose occupation is a novelist
[another condition needed] #children's fiction.
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
} LIMIT 10
There is not one correct way, especially not in Wikidata where not all items of the same kind necessarily have the same properties.
One way would be to find the authors of works that are intended for (P2360) children:
# it’s a literary work (incl. any subclasses)
?book wdt:P31/wdt:P279* wd:Q7725634 .
# the literary work is intended for children
?book wdt:P2360 wd:Q7569 .
# the literary work has an author
?book wdt:P50 ?author .
# the author is a US citizen
?author wdt:P27 wd:Q30 .
Instead of getting all works that belong to the class "literary work" or any of its subclasses, you could decide to use only the class "fiction literature" (Q38072107) instead; with the risk that not all relevant works use this class.
Another way would be to find all authors that have "children’s writer" (Q4853732), or any of its subclasses, as occupation:
?author wdt:P106/wdt:P279* wd:Q4853732 .
?author wdt:P27 wd:Q30 .
As the different ways might find different results, you could use them in the same query, using UNION:
SELECT DISTINCT ?author ?authorLabel
WHERE {
{
# way 1
}
UNION
{
# way 2
}
UNION
{
# way 3
}
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
}
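Filled in with the two ways described above, the skeleton might look like the following. This is an untested sketch: the citizenship condition is moved outside the UNION only because both ways share it, and the wdt:P279* paths may make the query slow, so a LIMIT could be worth adding while experimenting.

```sparql
SELECT DISTINCT ?author ?authorLabel
WHERE {
  {
    # way 1: authors of literary works intended for children
    ?book wdt:P31/wdt:P279* wd:Q7725634 ;
          wdt:P2360 wd:Q7569 ;
          wdt:P50 ?author .
  }
  UNION
  {
    # way 2: occupation is children's writer (or a subclass of it)
    ?author wdt:P106/wdt:P279* wd:Q4853732 .
  }
  # shared by both ways: the author is a US citizen
  ?author wdt:P27 wd:Q30 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en'. }
}
```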

SPARQL wikidata query: Obtain number of languages that associated wikipedia article is available in

Is it possible to use SPARQL on wikidata to extract the number of languages of a wikipedia article associated with a wikidata item?
I am new to SPARQL and wikidata. I have tried to find examples online, but no luck so far. I am able to extract the url for the wikipedia article. But I wonder if it is possible to count the number of languages for which the url exists. Here is my code so far:
prefix schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel ?article WHERE {
?museum wdt:P17 wd:Q39; # countries
wdt:P31 ?type.
?type (wdt:P279*) wd:Q207694.
# If available, get the "en" entry, use native language as fallback:
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
# get wikipedia article in english
OPTIONAL {
?article schema:about ?museum .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}
}
order by ?museum
I would like to add another column to my output that gives me the number of languages a wikipedia article exists in.
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel (COUNT(DISTINCT ?article) as ?wiki_article_count) WHERE {
?museum wdt:P17 wd:Q39; # countries
wdt:P31 ?type.
?type (wdt:P279*) wd:Q207694.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
OPTIONAL {
?article schema:about ?museum .
}
}
GROUP BY ?museum ?museumLabel
ORDER BY DESC( ?wiki_article_count )
Further developing the query given in the question, one can obtain the number of linked wiki pages per item, as in the query above (I am not at all sure about the nomenclature).
Please try the Wikidata Query Service here: https://w.wiki/334g
For example, the Wikidata item https://www.wikidata.org/wiki/Q194626 (Kunstmuseum Basel) has 22 linked wiki pages: 21 Wikipedia language editions and one multilingual site. (The nomenclature remains unclear to me for now. Is it a wiki-entity?)
I would be thankful for any comments on this answer and will improve it accordingly.
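If only Wikipedia language editions should be counted, leaving out the multilingual site, one possible variation is to restrict the linked pages to sites in the "wikipedia" group. This is a sketch that has not been run here. It assumes the wikibase:wikiGroup predicate from the Wikibase RDF format is the right way to identify Wikipedia sites, and the variable names ?site and ?wikipedia_article_count are mine.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel (COUNT(DISTINCT ?article) AS ?wikipedia_article_count) WHERE {
  ?museum wdt:P17 wd:Q39 ;
          wdt:P31 ?type .
  ?type wdt:P279* wd:Q207694 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
  OPTIONAL {
    ?article schema:about ?museum ;
             schema:isPartOf ?site .
    # assumption: keep only pages hosted on a Wikipedia language edition
    ?site wikibase:wikiGroup "wikipedia" .
  }
}
GROUP BY ?museum ?museumLabel
ORDER BY DESC(?wikipedia_article_count)
```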

Ways to reduce query time of SPARQL query on Wikidata?

I want to create a histogram of births and deaths for people on English Wikipedia, but I am running into query time limits on Wikidata.
I formed the following query:
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
SELECT ?item ?article ?_date_of_birth ?_date_of_death WHERE {
?item wdt:P31 wd:Q5.
?article schema:about ?item.
?article schema:isPartOf <https://en.wikipedia.org/>.
OPTIONAL { ?item wdt:P569 ?_date_of_birth. }
OPTIONAL { ?item wdt:P570 ?_date_of_death. }
}
LIMIT 10000
This works fine in and of itself, but as I'm trying to get the whole list, when I start adding offsets, I run into query time limits around OFFSET 500000. According to the Wikidata manual, I should try to optimize my query, but is there a way to optimize this? There are definitely more than 500000 people on Wikipedia, as just finding transclusions of the 'birth date' template yields over 600000.
I have also tried DBpedia, but some of it is out of date; for example, Muhammad Ali has no death date on DBpedia.
I've also tried not filtering for the English articles, i.e. asking for all of them and filtering on my end, but similar scaling issues still exist, albeit at a much higher offset.
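One alternative to large OFFSET values that may be worth trying is to split the result set into slices by birth date and run one query per slice. The sketch below is untested and changes the query's meaning: the birth date becomes mandatory, so people without a recorded birth date are no longer returned. The 1900 to 1910 range is an arbitrary example.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?item ?article ?_date_of_birth ?_date_of_death WHERE {
  ?item wdt:P31 wd:Q5 ;
        wdt:P569 ?_date_of_birth .   # required here, unlike in the original query
  # one slice of the data; repeat with other ranges to cover everything
  FILTER (?_date_of_birth >= "1900-01-01T00:00:00Z"^^xsd:dateTime &&
          ?_date_of_birth <  "1910-01-01T00:00:00Z"^^xsd:dateTime)
  ?article schema:about ?item ;
           schema:isPartOf <https://en.wikipedia.org/> .
  OPTIONAL { ?item wdt:P570 ?_date_of_death . }
}
```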

How can I join climate data to cities with dbpedia?

I am trying to get a dataset that gives me all the data available in a city's climate table but I'm having some trouble.
I was able to get this to work and felt pretty good about myself. When I plug this in on dbpedia's virtuoso client this gives me all the cities that dbpedia has, and all of their countries.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?city_name ?country
WHERE { ?city rdf:type dbpedia-owl:City ;
rdfs:label ?city_name;
dbpedia-owl:country ?country
FILTER (langMatches(lang(?city_name), "EN")) .
}
Update: I have found properties that seem to give what I'm looking for (e.g. dbpedia.org/property/aprHighC) but I'm having trouble adding them to my output.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?city_name ?country ?aprHighC
WHERE { ?city rdf:type dbpedia-owl:City .
?city rdfs:label ?city_name .
?city dbpedia-owl:country ?country
FILTER (langMatches(lang(?city_name), "EN")) .
}
Gives an error: Variable 'aprHighC' is used in the query result set but not assigned. How do I assign it?
For the second query to return results, a city has to have three properties: rdfs:label, dbpedia-owl:country and dbpedia-owl:climate. Your query pretty much proves that the DBpedia data has cities with label and country properties, but not climate. Try the following to see which properties are actually used on members of dbpedia-owl:City:
SELECT DISTINCT ?p
WHERE {
?city rdf:type dbpedia-owl:City ;
?p ?o .
}
Note that not all members of dbpedia-owl:City will have these properties, but it gives you a range of what properties are used.
Looking at it the other way, you can ask what entities use the dbpedia-owl:climate property:
SELECT ?s
WHERE {
?s dbpedia-owl:climate ?climate
}
I didn't find any, so it could be the case that the prefix is different than the one you are using? I'd suggest double-checking the property name.
Regardless, it's a good idea to use SPARQL to find what is actually in the data store. And use LIMIT to look at parts of the data without overwhelming the system.
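As a sketch of that kind of exploration, the property listing above can be narrowed to names that look like climate fields and capped with LIMIT. The "HighC" substring is only a guess based on the aprHighC property mentioned in the question, and the limit of 50 is arbitrary.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?p
WHERE {
  ?city rdf:type dbpedia-owl:City ;
        ?p ?o .
  # keep only properties whose IRI contains "HighC" (e.g. aprHighC)
  FILTER (CONTAINS(STR(?p), "HighC"))
}
LIMIT 50
```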
The following query gives the January average daily high (°C). Adding other climate items is as simple as copying the line beginning "OPTIONAL" and changing the item and variable name from janHighC to whatever you are trying to get.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX schema: <http://schema.org/>
SELECT * {
{ ?city rdf:type dbo:City .
?city rdf:type schema:City .
?city rdfs:label ?name
}
OPTIONAL {?city dbp:janHighC ?janHighC .}
}
I will note, however, that most cities don't have this information. I had to give up on getting the data this way.

Wikidata Query Service - Incomplete Result: Wikipedia URL missing

I am using the Wikidata Query Service (https://query.wikidata.org) to obtain the (former) spouses of Daniel Craig with this query:
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?s ?sLabel ?s_wpurl1 ?s_wpurl2 WHERE {
?s wdt:P26 wd:Q4547 .
OPTIONAL {
?s_wpurl1 schema:about ?s .
?s_wpurl1 schema:inLanguage "en" .
FILTER (SUBSTR(str(?s_wpurl1), 1, 25) = "https://en.wikipedia.org/")
} .
OPTIONAL {
?s_wpurl2 schema:about ?s .
?s_wpurl2 schema:inLanguage "de" .
FILTER (SUBSTR(str(?s_wpurl2), 1, 25) = "https://de.wikipedia.org/")
} .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
As expected, the result set consists of two results: Q134077 (Rachel Weisz) and Q62510 (Heike Makatsch).
But the requested en. and de.wikipedia article urls are only listed for Rachel Weisz, although respective articles exist for Heike Makatsch and are linked to the item representing Makatsch (see https://www.wikidata.org/wiki/Q62510).
The urls for that item are also not listed in other queries (where other items' urls are listed), so the problem is the item and not the query.
Why are wikipedia article urls missing in a SPARQL result set for an item where this information exists?
Update:
Now (a day later) the service is returning the urls.
Does anybody know when the service is expected to leave beta status and to become fully reliable?
The query service had the problem of missing sitelinks due to faulty data loading. It was fixed, so now the links should be fine.