Getting only english property value - sparql

I am trying to get a list countries including the english short names:
# get a list countries with the corresponding ISO code
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?country ?countryLabel ?shortName (MAX(?pop) as ?population) ?coord ?isocode
WHERE
{
# instance of country
?country wdt:P31 wd:Q3624078.
OPTIONAL {
?country rdfs:label ?countryLabel filter (lang(?countryLabel) = "en").
}
OPTIONAL {
# https://www.wikidata.org/wiki/Property:P1813
?country wdt:P1813 ?shortName.
}
OPTIONAL {
# get the population
# https://www.wikidata.org/wiki/Property:P1082
?country wdt:P1082 ?pop.
}
# get the iso countryCode
{ ?country wdt:P297 ?isocode }.
# get the coordinate
OPTIONAL { ?country wdt:P625 ?coord }.
}
GROUP BY ?country ?countryLabel ?shortName ?population ?coord ?isocode
ORDER BY ?countryLabel
try it!
Unfortunately also flags and non english versions of "shortName" are returned. I tried using a subquery but that timed out. I'd like to avoid using the wikibase label service since I need to run the query on my local wikidata copy which uses Apache Jena
How could i get the english shortnames of countries? E.g. China for People's republic of china and USA for United States of America?

There are two issues here:
we need to filter for the English short names only, i.e. we need a filter (lang(?shortName) = "en") clause inside the second OPTIONAL pattern
for some reason, there are some flags having an English language tag, so we have to ignore those somehow - the good thing, there is a statement qualifier that helps here: an instance of (P31) relation to the Wikidata entity emoji flag sequence (Q28840786)
So, overall, we replace
OPTIONAL {
# https://www.wikidata.org/wiki/Property:P1813
?country wdt:P1813 ?shortName.
}
by
OPTIONAL {
?country p:P1813 ?shortNameStmt. # get the short name statement
?shortNameStmt ps:P1813 ?shortName # the the short name value from the statement
filter (lang(?shortName) = "en") # filter for English short names only
filter not exists {?shortNameStmt pq:P31 wd:Q28840786} # ignore flags (aka emojis)
}
Still, there will be multiple entries for some countries because of multiple short names. One way to workaround this is to use some aggregate function like sample or min/max and pick just a single short name per country.

Related

Wikidata COUNT(*) query times out

I have a straightforward query that counts how many humans have an English Wikipedia page.
prefix schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?article
WHERE
{
?item wdt:P31 wd:Q5 . # Must be of a human
?article schema:about ?item ; # Must have a Wikipedia article
schema:inLanguage "en" ; # Article must be in English
schema:isPartOf <https://en.wikipedia.org/> . # Wikipedia article must be regular article
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } # Helps get the label in your language, if not, then en language
}
I get expected output as follows:
wd:Q11124 <https://en.wikipedia.org/wiki/Stephen_Breyer>
wd:Q10727 <https://en.wikipedia.org/wiki/Steve_Leo_Beleck>
wd:Q10065 <https://en.wikipedia.org/wiki/Taichang_Emperor>
wd:Q9605 <https://en.wikipedia.org/wiki/Sarah_Allen_(software_developer)>
However, if I change the SELECT statement from
SELECT ?item ?article
to
SELECT (count(?item) as ?count)
I get timeout error. Please note that the count statement works if I only specify "human" condition and exclude English Wiki article condition. So, clearly, some kind of background join is causing the query to timeout.
However, this is a fairly trivial join, so the query timeout is surprising.
Please let me know what may I be missing here.
Thanks!

Filter property values by type in wikidata

I am trying to set up a SPARQL query that lists museums for a specified country. My main goal is to find the city the museum is located.
I'm new to wikidata, just started.
So far, I have used the property ?located_in_the_administrative_territorial_entity to find the city. The problem is that for many countries, this gives me not only the city, but also the state, e.g. in Switzerland it gives me Zurich and in another Canton of Zurich.
Is it possible to restrict the search such that it only gives me the city?
prefix schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel ?located_in_the_administrative_territorial_entity ?located_in_the_administrative_territorial_entityLabel
WHERE {
?museum wdt:P17 wd:Q39; # countries
wdt:P31 ?type.
?type (wdt:P279*) wd:Q33506.
# If available, get the "en" entry, use native language as fallback:
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
# get location if available
OPTIONAL { ?museum wdt:P131 ?located_in_the_administrative_territorial_entity. }
}
order by ?museum
I would like to get a single row per museum that includes the city the museum is located in. I don't want several rows with the same museum that display different geographical hierarchies (state / city).

SPARQL wikidata query: Obtain number of languages that associated wikipedia article is available in

Is it possible to use SPARQL on wikidata to extract the number of languages of a wikipedia article associated with a wikidata item?
I am new to SPARQL and wikidata. I have tried to find examples online, but no luck so far. I am able to extract the url for the wikipedia article. But I wonder if it is possible to count the number of languages for which the url exists. Here is my code so far:
prefix schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel ?article WHERE {
?museum wdt:P17 wd:Q39; # countries
wdt:P31 ?type.
?type (wdt:P279*) wd:Q207694.
# If available, get the "en" entry, use native language as fallback:
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
# get wikipedia article in english
OPTIONAL {
?article schema:about ?museum .
?article schema:inLanguage "en" .
FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
}
}
order by ?museum
I would like to add another column to my output that gives me the number of languages a wikipedia article exists in.
PREFIX schema: <http://schema.org/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?museum ?museumLabel (COUNT(DISTINCT ?article) as ?wiki_article_count) WHERE {
?museum wdt:P17 wd:Q39; # countries
wdt:P31 ?type.
?type (wdt:P279*) wd:Q207694.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en,de". }
OPTIONAL {
?article schema:about ?museum .
}
}
GROUP BY ?museum ?museumLabel
ORDER BY DESC( ?wiki_article_count )
Further developing the query given in the question, one can obtain the number of linked wiki-pages (I am absolutely not sure about the nomenclature).
Please try the Wikidata Query Service here: https://w.wiki/334g
For example, one can see for the wikidata/wiki entry: https://www.wikidata.org/wiki/Q194626 (Kunstmuseum Basel) (nomenclature remains unclear to me for now. Is it a wiki-entity?) 22 linked wiki-pages. That is 21 wikipedia-languages and one multilingual-site. See screenshot here:
I would be thankful for any comments on this answer and could improve this answer accordingly.

SPARQL: How to obtain label in available languages if first option is not available

If a Wikidata resource returned by my query has no available label in the language I filtered for I obtained an empty cell.
SELECT *
WHERE
{
?country wdt:P31 wd:Q6256.
?country rdfs:label ?country_name
FILTER(LANG(?country_name) = 'jbo').
}
link
How to request to have a label returned in one of any of the available languages if the first language fails?
First, prefer langMatches for checking language tags. This is especially important in your case, since you might want, for instance, a label in English, and langMatches(lang(?label), "en") will find a label with the tag "en", or "en-GB", or "en-US", etc. Those are regional variants for the language, and langMatches can help you find them.
Updated Solution based on comments
#svick noticed in the comments that the original solution ends up with a row for each element in the Cartesian product of the English names with the non-English names. You can avoid that by using a select distinct. But there's really a better way: just use the same variable in two optionals; the first checks for an English label, and the second checks for non-English labels. If the first succeeds, then the second never gets invoked. That is, just do:
select ?country ?label {
?country wdt:P31 wd:Q6256
optional {
?country rdfs:label ?label
filter langMatches(lang(?label), "en")
}
optional {
?country rdfs:label ?label
}
}
Other options
If you need to do any aggregation, you may find some help in SPARQL filter language if possible in multiple value context.
If, after the first language, you still have preferences on the remaining languages, you may find the technique used in Sparql multi lang data compression to one row helpful.
Original Solution with COALESCE
After that, though, coalesce will do what you want. It takes a number of arguments, and returns the first one that has a value. So, you can match the preferred language in an optional block, and any language in another, and coalesce the values:
select distinct ?country (coalesce(?enLabel, ?anyLabel) as ?label) {
?country wdt:P31 wd:Q6256
optional {
?country rdfs:label ?enLabel
filter langMatches(lang(?enLabel), "en")
}
optional {
?country rdfs:label ?anyLabel
}
}

How can I join climate data to cities with dbpedia?

I am trying to get a dataset that gives me all the data available in a city's climate table but I'm having some trouble.
I was able to get this to work and felt pretty good about myself. When I plug this in on dbpedia's virtuoso client this gives me all the cities that dbpedia has, and all of their countries.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?city_name ?country
WHERE { ?city rdf:type dbpedia-owl:City ;
rdfs:label ?city_name;
dbpedia-owl:country ?country
FILTER (langMatches(lang(?city_name), "EN")) .
}
Update: I have found properties that seem to give what I'm looking for (e.g. dbpedia.org/property/aprHighC) but I'm having trouble adding them to my output.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?city_name ?country ?aprHighC
WHERE { ?city rdf:type dbpedia-owl:City .
?city rdfs:label ?city_name .
?city dbpedia-owl:country ?country
FILTER (langMatches(lang(?city_name), "EN")) .
}
Gives an error: Variable 'aprHighC' is used in the query result set but not assigned. How do I assign it?
For the query to get results in the second query, a city has to have a three properties: rdfs:label, dbpedia-owl:country and dbpedia-owl:climate. Your query pretty much proves that DBPedia data has cities with label and country properties, but not climate. Try the following to see just what properties are found for members of dbpedia-owl:City:
SELECT DISTINCT ?p
WHERE {
?city rdf:type dbpedia-owl:City ;
?p ?o .
}
Note that not all members of dbpedia-owl:City will have these properties, but it gives you a range of what properties are used.
Looking at it the other way, you can ask what entities use the dbpedia-owl:climate property:
SELECT ?s
WHERE {
?s dbpedia-owl:climate ?climate
}
I didn't find any, so it could be the case that the prefix is different than the one you are using? I'd suggest double-checking the property name.
Regardless, it's a good idea to use SPARQL to find what is actually in the data store. And use LIMIT to look at parts of the data without overwhelming the system.
The following query gives the January average daily high (°C). Adding other climate items is as simple as copying the line beginning "OPTIONAL" and changing the item and variable name from janHighC to whatever you are trying to get.
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT * {
{ ?city rdf:type dbo:City .
?city rdf:type schema:City .
?city rdfs:label ?name
}
OPTIONAL {?city dbp:janHighC ?janHighC .}
}
I will note, however, that most cities don't have this information. I had to give up on getting the data this way.