SPARQL: UNION drops values, OPTIONAL needs separate variables

How can I get all the labels and aliases for a given country, in both English and the official language(s) of this country, as a single column, from wikidata?
I initially tried the following but UNION seems to drop the labels in the official language:
Attempt #1:
SELECT
?country_iso
(GROUP_CONCAT(DISTINCT(?label) ; separator = ", ") AS ?labels)
WHERE
{
VALUES ?country { wd:Q878 } # FOR TESTING PURPOSES, TO REMOVE ONCE ISSUE RESOLVED.
# country sovereign state unincorporated territory
VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078 wd:Q783733 }
?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
wdt:P37 ?official_lang .
?official_lang wdt:P424 ?wm_lang_code .
{ ?country rdfs:label ?label . FILTER(LANG(?label) = "en") }
UNION
{ ?country rdfs:label ?label . FILTER(LANG(?label) = ?wm_lang_code) }
UNION
{ ?country skos:altLabel ?label . FILTER(LANG(?label) = "en") }
UNION
{ ?country skos:altLabel ?label . FILTER(LANG(?label) = ?wm_lang_code) }
}
GROUP BY ?country_iso
ORDER BY ?country_iso
| country_iso | labels |
|-------------|------------------------------------------------------------------------------------------------------------------|
| AE | United Arab Emirates, ae, 🇦🇪, Emirates, the Emirates, the U.A.E., the UAE, the United Arab Emirates, U.A.E., UAE |
If I use two separate variables ?label and ?altLabel and OPTIONAL, then I seem to get all the labels and aliases, but:
a. I'm not convinced || is right there, although it seems to produce the right results,
b. two columns (?labels and ?altLabels) isn't ideal in my case. (And if I only use one, then only rdfs:label gets returned.)
Attempt #2:
SELECT
?country_iso
(GROUP_CONCAT(DISTINCT(?label) ; separator = ", ") AS ?labels)
(GROUP_CONCAT(DISTINCT(?altLabel) ; separator = ", ") AS ?altLabels)
WHERE
{
VALUES ?country { wd:Q878 }
# country sovereign state unincorporated territory
VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078 wd:Q783733 }
?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
wdt:P37 ?official_lang .
?official_lang wdt:P424 ?wm_lang_code .
OPTIONAL { ?country rdfs:label ?label . FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code) }
OPTIONAL { ?country skos:altLabel ?altLabel . FILTER(LANG(?altLabel) = "en" || LANG(?altLabel) = ?wm_lang_code) }
}
GROUP BY ?country_iso
ORDER BY ?country_iso
| country_iso | labels | altLabels |
|-------------|------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
| AE | United Arab Emirates, الإمارات العربية المتحدة | الإمارات, دولة الإمارات العربية المتحدة, ae, 🇦🇪, Emirates, the Emirates, the U.A.E., the UAE, the United Arab Emirates, U.A.E., UAE |
What am I missing? 🤔 Thank you very much in advance for your help! 🙏🏻😊
Disclaimer: relatively new to SPARQL (or rather, very rusty).

Thanks to @UninformedUser (see comments), I was able to come up with this query, which yields 206 results:
SELECT
?country_iso
(GROUP_CONCAT(DISTINCT(?label) ; separator = " | ") AS ?labels)
WHERE
{
# country sovereign state unincorporated territory
VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078 wd:Q783733 }
?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
wdt:P37 ?official_lang .
?official_lang wdt:P424 ?wm_lang_code .
?country rdfs:label|skos:altLabel ?label . FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code)
}
GROUP BY ?country_iso
ORDER BY ?country_iso
P.S.: An extension of this work, which also fetches emoji flags, is that query.
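A likely explanation for Attempt #1, offered as a sketch rather than a confirmed diagnosis: in SPARQL a FILTER only sees variables bound inside its own group, so within each UNION branch ?wm_lang_code is unbound, the comparison raises an error, and the branch yields nothing. Assuming that is what happened, keeping the UNION but moving the FILTER out to the group where ?wm_lang_code is bound should behave like the property-path version (wd:Q878 is kept here only for testing, and the type check is dropped for brevity):

```sparql
SELECT
  ?country_iso
  (GROUP_CONCAT(DISTINCT ?label ; separator = " | ") AS ?labels)
WHERE
{
  VALUES ?country { wd:Q878 }              # test value only
  ?country wdt:P297 ?country_iso ;         # ISO 3166-1 alpha-2 code
           wdt:P37 ?official_lang .
  ?official_lang wdt:P424 ?wm_lang_code .
  # The UNION branches only collect candidate labels...
  { ?country rdfs:label ?label } UNION { ?country skos:altLabel ?label }
  # ...and the FILTER sits in the outer group, where ?wm_lang_code is bound.
  FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code)
}
GROUP BY ?country_iso
ORDER BY ?country_iso
```

The same reasoning would explain why Attempt #2 worked: an OPTIONAL is joined with the pattern to its left before its FILTER is applied, so ?wm_lang_code is visible there.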

Related

Return cities in Wikidata SPARQL Query, similar to a Wikipedia page

I'm not sure what I'm doing wrong. I have a nice list, but not only are the cities duplicating, but I'm unsure how they're defined as cities. I would expect to see London in the results and have similar results to this Wikipedia page. These results are quite different to the Wikipedia page.
I want to:
Get a list of cities, with their first-level administrative country subdivision (province/state/region), similar to this Wikipedia page
While avoiding duplicate cities.
SELECT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE
{
?city wdt:P31/wdt:P279 wd:Q515 . # find instances of subclasses of city
?city (wdt:P131) ?region.
?region wdt:P31/wdt:P279 wd:Q10864048 .
?city wdt:P1082 ?population .
?city wdt:P17 ?country . # Also find the country of the city
?city p:P625 ?statement . # coordinate-location statement
?statement psv:P625 ?coordinate_node .
OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}
FILTER (?population > 100000) .
# choose language
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
LIMIT 8000
Update:
Although not an answer to this specific question, anyone trying to get similar data to this should have a look here.
Update 2:
With help in the comments from @UninformedUser, the query is now:
SELECT DISTINCT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE
{
?city wdt:P31/wdt:P279 wd:Q515 . # find instances of subclasses of city
?city (wdt:P131) ?region.
?region wdt:P31/wdt:P279 wd:Q10864048 .
?city p:P1082 ?populationStmt .
?populationStmt ps:P1082 ?population ; pq:P585 ?pop_date .
?city wdt:P17 ?country . # Also find the country of the city
?city p:P625 ?statement . # coordinate-location statement
?statement psv:P625 ?coordinate_node .
OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}
FILTER NOT EXISTS {
?city p:P1082/pq:P585 ?pop_date_ .
FILTER (?pop_date_ > ?pop_date)
}
FILTER (?population > 100000) .
# choose language
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
LIMIT 8000
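One thing that may explain missing cities: the path wdt:P31/wdt:P279 matches only items whose class is exactly one subclass step away from city (Q515), so direct instances of Q515 and instances of deeper subclasses are skipped. A minimal sketch, assuming the goal is any instance of city or of one of its subclasses, uses wdt:P279* (zero or more steps). The LIMIT of 100 is an arbitrary value chosen for illustration, and whether a given city such as London appears still depends on how it is modelled in Wikidata.

```sparql
SELECT DISTINCT ?city ?cityLabel
WHERE
{
  # instance of city (Q515) itself, or of any direct or indirect subclass of it
  ?city wdt:P31/wdt:P279* wd:Q515 .
  ?city wdt:P1082 ?population .
  FILTER (?population > 100000)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 100   # arbitrary illustrative limit
```

The transitive path can be noticeably slower than the fixed-length one, so it may need a tighter pattern or a smaller limit to stay within the query service timeout.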

How to query wikidata using SPARQL to find a place that matches certain criteria and is geographically near another city

I wrote the following SPARQL query to find the wikidata item with the label "San Leucio" in Italy.
SELECT DISTINCT * WHERE {
?location ?label 'San Leucio'@en .
?location wdt:P17 wd:Q38 .
?location rdfs:label ?locationName .
OPTIONAL {
?article schema:about ?location .
?article schema:isPartOf <https://en.wikivoyage.org/> .
}
?location wdt:P18 ?image .
FILTER(lang(?locationName) = "en")
}
The query returns these 3 results:
wd:Q55179410
wd:Q20009063
wd:Q846499
The result I want is wd:Q846499, which is outside of Naples, Italy. Is there any way I could further filter this query to return the result that is nearest to Naples? I know that I can get the geoCoordinates for each of these with ?location wdt:P625 ?coordinates, but I'm not sure how I could use that to compare to the geo-coordinates of Naples to get what I want.
SELECT DISTINCT * {
VALUES ?naples {wd:Q2634}
?naples wdt:P625 ?naples_coordinates.
?location rdfs:label 'San Leucio'@en .
?location wdt:P17 wd:Q38 .
?location wdt:P18 ?image .
?location wdt:P625 ?location_coordinates.
OPTIONAL {
?article schema:about ?location .
?article schema:isPartOf <https://en.wikivoyage.org/> .
}
BIND (geof:distance(?location_coordinates, ?naples_coordinates) AS ?distance)
} ORDER BY ?distance LIMIT 1
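The query above sorts the candidates by their distance from Naples and keeps the closest one. If you would rather keep every match within some radius, a variation along these lines might work. It assumes geof:distance returns kilometres (as the Wikidata Query Service documents it), and the 50 km cutoff is a made-up value for illustration:

```sparql
SELECT ?location ?distance
WHERE {
  wd:Q2634 wdt:P625 ?naples_coordinates .          # Naples
  ?location rdfs:label 'San Leucio'@en ;
            wdt:P17 wd:Q38 ;                       # country: Italy
            wdt:P625 ?location_coordinates .
  BIND (geof:distance(?location_coordinates, ?naples_coordinates) AS ?distance)
  FILTER (?distance < 50)                          # hypothetical cutoff, assumed to be in km
}
ORDER BY ?distance
```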

Why variable + Label is not working in SPARQL

In the examples of the SPARQL of Wikidata, we have this one:
SELECT ?h ?date
WHERE
{
?h wdt:P31 wd:Q5 .
?h wdt:P569 ?date .
OPTIONAL {?h wdt:P570 ?d }
FILTER (?date > "1880-01-01T00:00:00Z"^^xsd:dateTime)
FILTER (!bound(?d))
}
LIMIT 1000
I understand that if you put Label after the name of a variable it shows the label. So, I don't understand why this shows no output:
SELECT ?h ?hLabel ?date ...
Thank you in advance!
I am not aware of that specific feature for Label after the variable name.
However, you can fetch rdfs:label explicitly by including it in your query. Add the following line: ?h rdfs:label ?hLabel .
SELECT ?h ?hLabel ?date WHERE
{
?h wdt:P31 wd:Q5 .
?h wdt:P569 ?date .
?h rdfs:label ?hLabel.
OPTIONAL {?h wdt:P570 ?d }
FILTER (?date > "1880-01-01T00:00:00Z"^^xsd:dateTime)
FILTER (!bound(?d))
}
LIMIT 1000
If you want labels in a specific language, e.g. English, add FILTER (langMatches( lang(?hLabel), "EN" ) )
Here is an interesting Stack Overflow answer about labels.
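For the Wikidata Query Service specifically, the ?hLabel naming convention is handled by the label service: when SERVICE wikibase:label is present in the query, variables named ?xLabel are filled in automatically with the label of ?x. Without that SERVICE clause, ?hLabel is just an unbound variable, which would explain the empty column. A sketch of the original query with the service added (this only works on endpoints that provide wikibase:label):

```sparql
SELECT ?h ?hLabel ?date
WHERE
{
  ?h wdt:P31 wd:Q5 .
  ?h wdt:P569 ?date .
  OPTIONAL { ?h wdt:P570 ?d }
  FILTER (?date > "1880-01-01T00:00:00Z"^^xsd:dateTime)
  FILTER (!bound(?d))
  # Binds ?hLabel to the English label of ?h
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
LIMIT 1000
```

Compared with the explicit rdfs:label triple, the service returns one label per item in the requested language, so no extra language FILTER is needed.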

sparql query is not returning desired output

This is my SPARQL query to get all the country names along with some properties of each country.
SELECT distinct ?country ?capital ?currency ?lat ?long
WHERE {
?country rdf:type dbo:Country .
?country dbo:capital ?capital .
?country dbo:currency ?currency .
?country geo:lat ?lat.
?country geo:long ?long.
}
ORDER BY ?country
But the problem is that some countries are missing, such as "Switzerland". If you go to http://dbpedia.org/page/Switzerland, you will see that its type is country. You also will not find "Austria" exactly, but rather "Austrian_Empire". Why? There is an entity called "Austria", and it is of type dbo:Country.
Switzerland does not seem to have dbo:capital, which is why it's not included in the results of your query.
If you want to get even results that do not have some of the properties, use OPTIONAL:
SELECT distinct ?country ?capital ?currency ?lat ?long
WHERE {
?country rdf:type dbo:Country .
OPTIONAL
{
?country dbo:capital ?capital .
?country dbo:currency ?currency .
?country geo:lat ?lat.
?country geo:long ?long.
}
}
ORDER BY ?country
Note, though, that this query also returns entities that are not countries (but for some reason are typed dbo:Country), like Cinema of Switzerland.
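Also note that a single OPTIONAL block is all-or-nothing: if a country lacks any one of the four properties, none of the four variables is bound for it. A sketch with one OPTIONAL per property (latitude and longitude are kept together on the assumption that they only make sense as a pair) returns whatever subset is available:

```sparql
SELECT DISTINCT ?country ?capital ?currency ?lat ?long
WHERE {
  ?country rdf:type dbo:Country .
  # Each OPTIONAL is matched independently of the others
  OPTIONAL { ?country dbo:capital ?capital . }
  OPTIONAL { ?country dbo:currency ?currency . }
  OPTIONAL { ?country geo:lat ?lat ;
                      geo:long ?long . }
}
ORDER BY ?country
```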

dbpedia SPARQL query to get certain values for a given city

I am sure what I want to do is very easy, yet I cannot seem to get the query right. I have records in a dataset with values such as a city name, e.g. 'New York', and its corresponding country code, e.g. 'US'. I also have access to the full country name and country ISO codes.
I would like to get the population and abstract values for these cities from DBpedia, using a where clause such as:
Get population where name = "New York" and isoCountryCode = "US"
I've searched for help on this to no avail.
So far I have been kindly helped by @rohk with this query, which does not work for all locations:
SELECT DISTINCT ?city ?abstract ?pop
WHERE {
?city rdf:type schema:City ;
rdfs:label ?label ;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:country ?country ;
dbpedia-owl:populationTotal ?pop .
?country dbpprop:countryCode "USA"@en .
FILTER ( lang(?abstract) = 'en' && regex(?label, "New York City"))
}
The above works for New York, however when I change it to:
SELECT DISTINCT ?city ?abstract ?pop
WHERE {
?city rdf:type schema:City ;
rdfs:label ?label ;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:country ?country ;
dbpedia-owl:populationTotal ?pop .
?country dbpprop:countryCode "THA"@en .
FILTER ( lang(?abstract) = 'en' && regex(?label, "Bangkok"))
}
It returns no results for Bangkok, Thailand.
I just can't seem to get the SPARQL query correct; I'm sure I am being silly with my query. If any gurus could provide me with help, I'd appreciate it. Thanks!
I guess you want something like this:
SELECT * WHERE {
?x rdfs:label "New York City"@en.
?x dbpedia-owl:populationTotal ?pop.
?x dbpedia-owl:abstract ?abstract.
}
To get only the English abstract, add a FILTER:
SELECT * WHERE {
?x rdfs:label "New York City"@en.
?x dbpedia-owl:populationTotal ?pop.
?x dbpedia-owl:abstract ?abstract.
FILTER (LANG(?abstract) = 'en')
}
“New York” is the state and it doesn't have a populationTotal figure attached. “New York City” is the city.
This query works:
SELECT DISTINCT *
WHERE {
?city rdf:type schema:City ;
rdfs:label ?label ;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:country ?country ;
dbpprop:website ?website ;
dbpedia-owl:populationTotal ?pop .
?country dbpprop:countryCode "USA"@en .
FILTER ( lang(?abstract) = 'en' && regex(?label, "New York City"))
}
EDIT: For Bangkok, there are two problems:
There is no country code for Thailand: you can use rdfs:label "Thailand"@en instead.
The rdf:type of Bangkok is not schema:City but dbpedia-owl:Settlement.
Here is a working query for Bangkok
SELECT DISTINCT *
WHERE {
?city rdf:type dbpedia-owl:Settlement ;
rdfs:label "Bangkok"@en ;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:populationTotal ?pop ;
dbpedia-owl:country ?country ;
dbpprop:website ?website .
?country rdfs:label "Thailand"@en .
FILTER ( lang(?abstract) = 'en' )
}
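The two working queries differ mainly in the rdf:type and in the way the country is matched. A sketch of a single pattern that might cover both cases, assuming an exact English label is acceptable and that the dbpedia-owl: prefix is predefined on the endpoint as in the queries above, drops the type restriction and the mandatory website and matches the country by its label:

```sparql
SELECT DISTINCT ?city ?abstract ?pop
WHERE {
  ?city rdfs:label "Bangkok"@en ;              # exact label match instead of regex
        dbpedia-owl:abstract ?abstract ;
        dbpedia-owl:populationTotal ?pop ;
        dbpedia-owl:country ?country .
  ?country rdfs:label "Thailand"@en .          # country matched by name, not by code
  FILTER ( lang(?abstract) = 'en' )
}
```

For New York, the labels would become "New York City"@en and "United States"@en, on the assumption that those are the exact rdfs:label values stored in DBpedia. Without a type restriction, any resource with that label, population, and country will match, so the result may need checking by hand.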