SPARQL Wikidata query: obtain the number of languages in which the associated Wikipedia article is available

Is it possible to use SPARQL on Wikidata to get the number of languages in which the Wikipedia article associated with a Wikidata item is available?
I am new to SPARQL and Wikidata. I have tried to find examples online, but have had no luck so far. I am able to extract the URL of the Wikipedia article, but I wonder whether it is possible to count the number of languages for which such a URL exists. Here is my code so far:
prefix schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel ?article WHERE {
  ?museum wdt:P17 wd:Q39 ;  # country: Switzerland (Q39)
          wdt:P31 ?type .
  ?type wdt:P279* wd:Q207694 .
  # If available, get the "en" entry; use the native language as fallback:
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
  # Get the Wikipedia article in English
  OPTIONAL {
    ?article schema:about ?museum .
    ?article schema:inLanguage "en" .
    FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
  }
}
ORDER BY ?museum
I would like to add another column to my output that gives the number of languages in which a Wikipedia article exists.

PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel (COUNT(DISTINCT ?article) AS ?wiki_article_count) WHERE {
  ?museum wdt:P17 wd:Q39 ;  # country: Switzerland (Q39)
          wdt:P31 ?type .
  ?type wdt:P279* wd:Q207694 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
  OPTIONAL {
    ?article schema:about ?museum .
  }
}
GROUP BY ?museum ?museumLabel
ORDER BY DESC(?wiki_article_count)
Building on the query given in the question, one can obtain the number of wiki pages linked to each item (I am not sure about the exact nomenclature).
You can try it in the Wikidata Query Service here: https://w.wiki/334g
For example, the Wikidata item https://www.wikidata.org/wiki/Q194626 (Kunstmuseum Basel) has 22 linked wiki pages: 21 Wikipedia language editions and one multilingual site.
I would be grateful for any comments and will improve this answer accordingly.
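The query above counts every linked page, including the non-Wikipedia one mentioned in the example. A minimal sketch of a variant that counts only Wikipedia editions follows. It assumes that each Wikipedia site carries `wikibase:wikiGroup "wikipedia"` in the Wikidata RDF data; the variable names `?site` and `?wikipedia_count` are illustrative and not part of the original query.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?museum ?museumLabel (COUNT(DISTINCT ?article) AS ?wikipedia_count) WHERE {
  ?museum wdt:P17 wd:Q39 ;                  # country: Switzerland
          wdt:P31/wdt:P279* wd:Q207694 .    # instance of art museum or a subclass
  OPTIONAL {
    ?article schema:about ?museum ;
             schema:isPartOf ?site .
    ?site wikibase:wikiGroup "wikipedia" .  # assumption: keeps only Wikipedia editions
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
}
GROUP BY ?museum ?museumLabel
ORDER BY DESC(?wikipedia_count)
```

If the total number of linked pages is enough, the precomputed `wikibase:sitelinks` value on the item may be a cheaper alternative to counting articles.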

Related

Wikidata COUNT(*) query times out

I have a straightforward query that counts how many humans have an English Wikipedia page.
prefix schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?article
WHERE
{
  ?item wdt:P31 wd:Q5 .                                 # Must be a human
  ?article schema:about ?item ;                         # Must have a Wikipedia article
           schema:inLanguage "en" ;                     # Article must be in English
           schema:isPartOf <https://en.wikipedia.org/> . # Wikipedia article must be a regular article
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } # Helps get the label in your language; if not available, then in English
}
I get expected output as follows:
wd:Q11124 <https://en.wikipedia.org/wiki/Stephen_Breyer>
wd:Q10727 <https://en.wikipedia.org/wiki/Steve_Leo_Beleck>
wd:Q10065 <https://en.wikipedia.org/wiki/Taichang_Emperor>
wd:Q9605 <https://en.wikipedia.org/wiki/Sarah_Allen_(software_developer)>
However, if I change the SELECT statement from
SELECT ?item ?article
to
SELECT (count(?item) as ?count)
I get a timeout error. Note that the count works if I only specify the "human" condition and leave out the English Wikipedia article condition. So some kind of background join is evidently causing the query to time out.
However, this is a fairly trivial join, so the timeout is surprising.
Please let me know what I may be missing here.
Thanks!
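One leaner variant that may be worth trying is sketched below; it is not confirmed to finish within the time limit. It drops the label service, which a pure count does not need, and relies on `schema:isPartOf` alone to select English Wikipedia articles, on the assumption that every such article also carries `schema:inLanguage "en"`, which would make that pattern redundant.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT (COUNT(*) AS ?count) WHERE {
  # Start from the English Wikipedia articles ...
  ?article schema:isPartOf <https://en.wikipedia.org/> ;
           schema:about ?item .
  # ... and keep only those whose item is a human.
  ?item wdt:P31 wd:Q5 .
}
```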

Wikidata Query: Find American authors of children’s fiction

I want to find all children's fiction writers using a Wikidata SPARQL query, but I couldn't figure out how. Can someone help, please? The following is my approach, but I don't think it is the correct way.
SELECT ?item ?itemLabel {
?item wdt:P31 wd:Q5. #find humans
?item wdt:P106 wd: #humans whose occupation is a novelist
[another condition needed] #children's fiction.
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
} LIMIT 10
There is not one correct way, especially not in Wikidata where not all items of the same kind necessarily have the same properties.
One way would be to find the authors of works that are intended for (P2360) children:
# it’s a literary work (incl. any subclasses)
?book wdt:P31/wdt:P279* wd:Q7725634 .
# the literary work is intended for children
?book wdt:P2360 wd:Q7569 .
# the literary work has an author
?book wdt:P50 ?author .
# the author is a US citizen
?author wdt:P27 wd:Q30 .
Instead of getting all works that belong to the class "literary work" or any of its subclasses, you could decide to use only the class "fiction literature" (Q38072107) instead; with the risk that not all relevant works use this class.
Another way would be to find all authors that have "children’s writer" (Q4853732), or any of its subclasses, as occupation:
?author wdt:P106/wdt:P279* wd:Q4853732 .
?author wdt:P27 wd:Q30 .
As the different ways might find different results, you could use them in the same query, using UNION:
SELECT DISTINCT ?author ?authorLabel
WHERE {
{
# way 1
}
UNION
{
# way 2
}
UNION
{
# way 3
}
SERVICE wikibase:label {bd:serviceParam wikibase:language 'en'.}
}
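Filled in with the two ways described above, the skeleton could look like the following sketch. It reuses only the item and property IDs given in this answer; the citizenship pattern is moved outside the UNION because both ways share it. The prefixes are declared explicitly, although the Wikidata Query Service predefines them.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT DISTINCT ?author ?authorLabel
WHERE {
  {
    # way 1: authors of literary works intended for children
    ?book wdt:P31/wdt:P279* wd:Q7725634 ;
          wdt:P2360 wd:Q7569 ;
          wdt:P50 ?author .
  }
  UNION
  {
    # way 2: occupation is "children's writer" or a subclass of it
    ?author wdt:P106/wdt:P279* wd:Q4853732 .
  }
  # shared condition: the author is a US citizen
  ?author wdt:P27 wd:Q30 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```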

How to speed up this federated query?

I have a query that returns people who are a dbo:Writer and have different birth dates stated in DBpedia and Wikidata. I initially wrote it to be run independently, but I'm currently running it on Wikidata. It's quite slow, and the interesting thing is that when I don't call the Wikidata endpoint via SERVICE, which I can afford to do here, it gets even slower. I'd be interested to learn why that is. But my main question is: how can I optimize it? Here is the query:
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
SELECT DISTINCT ?personDBP ?personWD ?personWDLabel ?BirthDateDBP ?BirthDateWD
WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    ?personDBP a dbo:Writer ;
               dbo:birthDate ?BirthDate_DBP ;
               owl:sameAs ?personWD .
    FILTER REGEX (?personWD, "wikidata.org")
    BIND (xsd:date(?BirthDate_DBP) AS ?BirthDateDBP)
  }
  SERVICE <https://query.wikidata.org/sparql> {
    ?personWD wdt:P569 ?BirthDate_WD .
    BIND (xsd:date(?BirthDate_WD) AS ?BirthDateWD)
  }
  FILTER (?BirthDateDBP != ?BirthDateWD)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?BirthDateDBP)
LIMIT 50

How to check if Japanese version of Wikidata item exists

I'm trying to build a list of names and match them to their Japanese equivalents.
I first thought about trying to crawl Wikipedia and following a link to the Japanese version of the page but I didn't know how to check whether the page was about a person or anything else.
Thankfully, there are the Wikidata and DBpedia projects.
I started tinkering with Wikidata and found this example
https://www.mediawiki.org/wiki/Wikibase/Indexing/SPARQL_Query_Examples#People_born_before_year_1880_with_no_death_date
which can be shrunk to a query for 'people'
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE {
?h wdt:P31 wd:Q5 .
} LIMIT 1
That results in a link for George Washington
https://www.wikidata.org/wiki/Q23
At the bottom of that page is a list of links to wikipedia pages for this person
in other languages including Japanese.
Is there a way of returning the name and Japanese version in the same query?
If you are querying Wikidata, you can use the label service:
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX v: <http://www.wikidata.org/prop/statement/>
SELECT * WHERE {
  wd:Q30 p:P6/v:P6 ?p .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
    ?p rdfs:label ?enName .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "ja" .
    ?p rdfs:label ?jaName .
  }
}
I found the answer here
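Applied to the item from the question, a minimal sketch could look like this. It reads the labels directly through `rdfs:label` with a language filter instead of the label service, and it also looks up the Japanese Wikipedia article through `schema:isPartOf`; both choices are assumptions about what the asker needs, and the variable names are illustrative. The two OPTIONAL blocks keep the row even when no Japanese label or article exists, so an unbound `?jaName` or `?jaArticle` indicates that the Japanese version is missing.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?h ?enName ?jaName ?jaArticle WHERE {
  VALUES ?h { wd:Q23 }              # George Washington, the item from the question
  ?h rdfs:label ?enName .
  FILTER (LANG(?enName) = "en")
  OPTIONAL {
    ?h rdfs:label ?jaName .         # Japanese label, if any
    FILTER (LANG(?jaName) = "ja")
  }
  OPTIONAL {
    ?jaArticle schema:about ?h ;    # Japanese Wikipedia article, if any
               schema:isPartOf <https://ja.wikipedia.org/> .
  }
}
```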

Wikidata Query Service - Incomplete Result: Wikipedia URL missing

I am using the Wikidata Query Service (https://query.wikidata.org) to obtain the (former) spouses of Daniel Craig with this query:
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?s ?sLabel ?s_wpurl1 ?s_wpurl2 WHERE {
  ?s wdt:P26 wd:Q4547 .
  OPTIONAL {
    ?s_wpurl1 schema:about ?s .
    ?s_wpurl1 schema:inLanguage "en" .
    FILTER (SUBSTR(str(?s_wpurl1), 1, 25) = "https://en.wikipedia.org/")
  } .
  OPTIONAL {
    ?s_wpurl2 schema:about ?s .
    ?s_wpurl2 schema:inLanguage "de" .
    FILTER (SUBSTR(str(?s_wpurl2), 1, 25) = "https://de.wikipedia.org/")
  } .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
As expected, the result set consists of two results: Q134077 (Rachel Weisz) and Q62510 (Heike Makatsch).
But the requested en. and de.wikipedia article URLs are only listed for Rachel Weisz, although the respective articles exist for Heike Makatsch and are linked to the item representing Makatsch (see https://www.wikidata.org/wiki/Q62510).
The URLs for that item are also not listed in other queries (where other items' URLs are listed), so the problem is the item and not the query.
Why are Wikipedia article URLs missing in a SPARQL result set for an item where this information exists?
Update:
Now (a day later) the service is returning the URLs.
Does anybody know when the service is expected to leave beta status and become fully reliable?
The query service had the problem of missing sitelinks due to faulty data loading. It was fixed, so now the links should be fine.
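Independent of the data-loading issue, the URL prefix check in the question can also be written with `schema:isPartOf`, which names the site directly. The following is a sketch under the assumption that each sitelink carries a `schema:isPartOf` triple pointing at its wiki; it keeps the variable names from the question.

```sparql
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?s ?sLabel ?s_wpurl1 ?s_wpurl2 WHERE {
  ?s wdt:P26 wd:Q4547 .             # spouse of Daniel Craig
  OPTIONAL {
    ?s_wpurl1 schema:about ?s ;     # English Wikipedia article, if any
              schema:isPartOf <https://en.wikipedia.org/> .
  }
  OPTIONAL {
    ?s_wpurl2 schema:about ?s ;     # German Wikipedia article, if any
              schema:isPartOf <https://de.wikipedia.org/> .
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
```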