How can I get all the labels and aliases for a given country, in both English and the official language(s) of this country, as a single column, from Wikidata?
I initially tried the following but UNION seems to drop the labels in the official language:
Attempt #1:
SELECT
?country_iso
(GROUP_CONCAT(DISTINCT(?label) ; separator = ", ") AS ?labels)
WHERE
{
VALUES ?country { wd:Q878 } # FOR TESTING PURPOSES, TO REMOVE ONCE ISSUE RESOLVED.
# country sovereign state unincorporated territory
VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078 wd:Q783733 }
?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
wdt:P37 ?official_lang .
?official_lang wdt:P424 ?wm_lang_code .
{ ?country rdfs:label ?label . FILTER(LANG(?label) = "en") }
UNION
{ ?country rdfs:label ?label . FILTER(LANG(?label) = ?wm_lang_code) }
UNION
{ ?country skos:altLabel ?label . FILTER(LANG(?label) = "en") }
UNION
{ ?country skos:altLabel ?label . FILTER(LANG(?label) = ?wm_lang_code) }
}
GROUP BY ?country_iso
ORDER BY ?country_iso
| country_iso | labels |
|-------------|------------------------------------------------------------------------------------------------------------------|
| AE | United Arab Emirates, ae, 🇦🇪, Emirates, the Emirates, the U.A.E., the UAE, the United Arab Emirates, U.A.E., UAE |
If I use two separate variables ?label and ?altLabel and OPTIONAL, then I seem to get all the labels and aliases, but:
a. I'm not convinced || is right there, although it seems to produce the right results,
b. two columns (?labels and ?altLabels) isn't ideal in my case. (And if I only use one, then only rdfs:label gets returned.)
Attempt #2
SELECT
?country_iso
(GROUP_CONCAT(DISTINCT(?label) ; separator = ", ") AS ?labels)
(GROUP_CONCAT(DISTINCT(?altLabel) ; separator = ", ") AS ?altLabels)
WHERE
{
VALUES ?country { wd:Q878 }
# country sovereign state unincorporated territory
VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078 wd:Q783733 }
?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
wdt:P37 ?official_lang .
?official_lang wdt:P424 ?wm_lang_code .
OPTIONAL { ?country rdfs:label ?label . FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code) }
OPTIONAL { ?country skos:altLabel ?altLabel . FILTER(LANG(?altLabel) = "en" || LANG(?altLabel) = ?wm_lang_code) }
}
GROUP BY ?country_iso
ORDER BY ?country_iso
| country_iso | labels | altLabels |
|-------------|------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|
| AE | United Arab Emirates, الإمارات العربية المتحدة | الإمارات, دولة الإمارات العربية المتحدة, ae, 🇦🇪, Emirates, the Emirates, the U.A.E., the UAE, the United Arab Emirates, U.A.E., UAE |
What am I missing? 🤔 Thank you very much in advance for your help! 🙏🏻😊
Disclaimer: relatively new to SPARQL (or rather, very rusty).
Thanks to @UninformedUser (see comments), I was able to come up with this query, which yields 206 results:
SELECT
?country_iso
(GROUP_CONCAT(DISTINCT(?label) ; separator = " | ") AS ?labels)
WHERE
{
# country sovereign state unincorporated territory
VALUES ?any_kind_of_country { wd:Q6256 wd:Q3624078 wd:Q783733 }
?country wdt:P31 ?any_kind_of_country . hint:Prior hint:runFirst true .
?country wdt:P297 ?country_iso ; # ISO 3166-1 alpha-2 code
wdt:P37 ?official_lang .
?official_lang wdt:P424 ?wm_lang_code .
?country rdfs:label|skos:altLabel ?label . FILTER(LANG(?label) = "en" || LANG(?label) = ?wm_lang_code)
}
GROUP BY ?country_iso
ORDER BY ?country_iso
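To make the logic of the final query easier to follow, here is a minimal Python sketch of what the pattern rdfs:label|skos:altLabel, the language FILTER, and GROUP_CONCAT(DISTINCT ...) do together. The (text, language) pairs are an illustrative subset, not the full Wikidata record, and the function name collect is hypothetical. It is a simplification: SPARQL compares whole literals (including the language tag) for DISTINCT and does not guarantee the order of concatenated values.

```python
# Illustrative (text, language) pairs; a small subset, not the full Wikidata record.
labels = [
    ("United Arab Emirates", "en"),
    ("الإمارات العربية المتحدة", "ar"),
    ("Émirats arabes unis", "fr"),
]
alt_labels = [("UAE", "en"), ("الإمارات", "ar"), ("UAE", "en")]


def collect(labels, alt_labels, official_lang):
    """Merge labels and aliases into one de-duplicated, joined string."""
    wanted = {"en", official_lang}
    seen = []
    # Walking both lists mirrors the path rdfs:label|skos:altLabel.
    for text, lang in labels + alt_labels:
        # The language test mirrors the FILTER; the membership test mirrors DISTINCT.
        if lang in wanted and text not in seen:
            seen.append(text)
    return " | ".join(seen)


merged = collect(labels, alt_labels, "ar")
```

Because both predicates feed the same variable, a single column is enough, which is what the two-variable OPTIONAL attempt could not provide.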
P.S.: An extension of this work, which also fetches emoji flags, is that query.
I'm not sure what I'm doing wrong. I have a nice list, but the cities are duplicated, and I'm unsure how they are defined as cities. I would expect to see London in the results, and results similar to this Wikipedia page, but these results are quite different from it.
I want to:
Get a list of cities, with their first-level administrative country subdivision (province/state/region), similar to this Wikipedia page
While avoiding duplicate cities.
SELECT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE
{
?city wdt:P31/wdt:P279 wd:Q515 . # find instances of subclasses of city
?city (wdt:P131) ?region.
?region wdt:P31/wdt:P279 wd:Q10864048 .
?city wdt:P1082 ?population .
?city wdt:P17 ?country . # Also find the country of the city
?city p:P625 ?statement . # coordinate-location statement
?statement psv:P625 ?coordinate_node .
OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}
FILTER (?population > 100000) .
# choose language
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
LIMIT 8000
Update:
Although not an answer to this specific question, anyone trying to get similar data to this should have a look here.
Update 2:
With help in the comments from @UninformedUser, the query is now:
SELECT DISTINCT ?city ?cityLabel ?country ?population ?countryLabel ?region ?regionLabel ?lat ?long
WHERE
{
?city wdt:P31/wdt:P279 wd:Q515 . # find instances of subclasses of city
?city (wdt:P131) ?region.
?region wdt:P31/wdt:P279 wd:Q10864048 .
?city p:P1082 ?populationStmt .
?populationStmt ps:P1082 ?population ; pq:P585 ?pop_date .
?city wdt:P17 ?country . # Also find the country of the city
?city p:P625 ?statement . # coordinate-location statement
?statement psv:P625 ?coordinate_node .
OPTIONAL { ?coordinate_node wikibase:geoLatitude ?lat. }
OPTIONAL { ?coordinate_node wikibase:geoLongitude ?long.}
FILTER NOT EXISTS {
?city p:P1082/pq:P585 ?pop_date_ .
FILTER (?pop_date_ > ?pop_date)
}
FILTER (?population > 100000) .
# choose language
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
LIMIT 8000
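The FILTER NOT EXISTS block in the updated query keeps a population statement only when no other statement for the same city has a later point in time (pq:P585). A minimal Python sketch of that selection rule follows; the statements list is hypothetical data for one city, and latest_statements is an illustrative name.

```python
from datetime import date

# Hypothetical population statements for one city: (population, point in time).
statements = [
    (8_000_000, date(2001, 1, 1)),
    (8_600_000, date(2011, 1, 1)),
    (8_900_000, date(2021, 1, 1)),
]


def latest_statements(stmts):
    """Keep a statement only if no other statement has a later date."""
    return [
        (pop, when)
        for pop, when in stmts
        if not any(other_when > when for _, other_when in stmts)
    ]


latest = latest_statements(statements)
```

Two consequences follow from the query as written: a city whose population statements carry no pq:P585 qualifier drops out entirely, because that triple pattern is required, and two statements sharing the same latest date would both survive.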
I wrote the following SPARQL query to find the wikidata item with the label "San Leucio" in Italy.
SELECT DISTINCT * WHERE {
?location ?label 'San Leucio'@en .
?location wdt:P17 wd:Q38 .
?location rdfs:label ?locationName .
OPTIONAL {
?article schema:about ?location .
?article schema:isPartOf <https://en.wikivoyage.org/> .
}
?location wdt:P18 ?image .
FILTER(lang(?locationName) = "en")
}
The query returns these 3 results:
wd:Q55179410
wd:Q20009063
wd:Q846499
The result I want is wd:Q846499, which is outside of Naples, Italy. Is there any way I could further filter this query to return the result that is nearest to Naples? I know that I can get the geoCoordinates for each of these with ?location wdt:P625 ?coordinates, but I'm not sure how I could use that to compare to the geo-coordinates of Naples to get what I want.
SELECT DISTINCT * {
VALUES ?naples {wd:Q2634}
?naples wdt:P625 ?naples_coordinates.
?location rdfs:label 'San Leucio'@en .
?location wdt:P17 wd:Q38 .
?location wdt:P18 ?image .
?location wdt:P625 ?location_coordinates.
OPTIONAL {
?article schema:about ?location .
?article schema:isPartOf <https://en.wikivoyage.org/> .
}
BIND (geof:distance(?location_coordinates, ?naples_coordinates) AS ?distance)
} ORDER BY ?distance LIMIT 1
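The query above orders candidates by geof:distance, which on the Wikidata Query Service returns the distance between two points in kilometres, and keeps the closest one. As a rough illustration of the same idea outside SPARQL, here is a Python sketch using the haversine great-circle formula. The formula, the Earth radius, the candidate names, and the coordinates are all assumptions made for this sketch; they are not values read from Wikidata, and the service may compute distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate, illustrative coordinates (not taken from Wikidata).
naples = (40.85, 14.27)
candidates = {
    "candidate_a": (41.10, 14.31),  # a place close to Naples
    "candidate_b": (45.00, 9.00),   # a place in northern Italy
}

# Equivalent of ORDER BY ?distance LIMIT 1: pick the candidate with the smallest distance.
nearest = min(candidates, key=lambda name: haversine_km(*naples, *candidates[name]))
```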
In the examples of the SPARQL of Wikidata, we have this one:
SELECT ?h ?date
WHERE
{
?h wdt:P31 wd:Q5 .
?h wdt:P569 ?date .
OPTIONAL {?h wdt:P570 ?d }
FILTER (?date > "1880-01-01T00:00:00Z"^^xsd:dateTime)
FILTER (!bound(?d))
}
LIMIT 1000
I understand that if you put Label after the name of a variable it shows the label. So, I don't understand why this shows no output:
SELECT ?h ?hLabel ?date ...
Thank you in advance!
I am not aware of that specific feature for Label after the variable name.
However, to get the rdfs:label, you need to include it in your query. Add the following line: ?h rdfs:label ?hLabel.
SELECT ?h ?hLabel ?date WHERE
{
?h wdt:P31 wd:Q5 .
?h wdt:P569 ?date .
?h rdfs:label ?hLabel.
OPTIONAL {?h wdt:P570 ?d }
FILTER (?date > "1880-01-01T00:00:00Z"^^xsd:dateTime)
FILTER (!bound(?d))
}
LIMIT 1000
If you want labels in a specific language, e.g. for English add FILTER (langMatches( lang(?hLabel), "EN" ) )
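The difference between langMatches(lang(?hLabel), "EN") and a plain lang(?hLabel) = "en" comparison is that langMatches is case-insensitive and also accepts regional subtags such as en-GB. The Python sketch below approximates that behaviour (basic language-range filtering); the function name lang_matches is hypothetical, and this is a simplified model rather than the engine's actual implementation.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Approximate SPARQL langMatches: case-insensitive match on whole subtags."""
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        # "*" matches any literal that carries a language tag at all.
        return tag != ""
    # Either the same tag, or the range followed by "-" and further subtags.
    return tag == lang_range or tag.startswith(lang_range + "-")
```

With this model, a label tagged en-GB passes the filter for "EN", while a label tagged eng does not, because matching happens on whole subtags.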
Here is an interesting Stack Overflow answer about labels.
This is my SPARQL query to get all countries along with some of their properties.
SELECT distinct ?country ?capital ?currency ?lat ?long
WHERE {
?country rdf:type dbo:Country .
?country dbo:capital ?capital .
?country dbo:currency ?currency .
?country geo:lat ?lat.
?country geo:long ?long.
}
ORDER BY ?country
But the problem is that some countries are missing, such as "Switzerland". If you go to http://dbpedia.org/page/Switzerland, you will see that its type is country. You also will not find "Austria" exactly, but rather "Austrian_Empire". Why? There is an entity called "Austria", and it is of type dbo:Country.
Switzerland does not seem to have dbo:capital, which is why it's not included in the results of your query.
If you want to get even results that do not have some of the properties, use OPTIONAL:
SELECT distinct ?country ?capital ?currency ?lat ?long
WHERE {
?country rdf:type dbo:Country .
OPTIONAL
{
?country dbo:capital ?capital .
?country dbo:currency ?currency .
?country geo:lat ?lat.
?country geo:long ?long.
}
}
ORDER BY ?country
Note that this query also returns entities that are not countries (but for some reason are typed dbo:Country), such as Cinema of Switzerland.
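One detail of the OPTIONAL query above is worth spelling out: all four patterns sit inside a single OPTIONAL block, so the block matches as a unit. A country with no dbo:capital therefore comes back with currency, lat and long unbound as well, even if those values exist. Using one OPTIONAL per pattern lets each property bind independently. The Python sketch below models both behaviours on hypothetical, single-valued data; the country names, values, and function names are all illustrative.

```python
# Hypothetical facts: each country maps to whichever properties are known.
facts = {
    "Country_A": {"capital": "A_City", "currency": "A_Coin", "lat": 1.0, "long": 2.0},
    "Country_B": {"currency": "B_Coin", "lat": 3.0, "long": 4.0},  # no capital
}
PROPS = ("capital", "currency", "lat", "long")


def grouped_optional(facts):
    """One OPTIONAL block: either every property binds, or none does."""
    rows = []
    for country, props in facts.items():
        complete = all(p in props for p in PROPS)
        rows.append({"country": country, **{p: props[p] if complete else None for p in PROPS}})
    return rows


def separate_optionals(facts):
    """One OPTIONAL per property: each property binds independently."""
    return [
        {"country": country, **{p: props.get(p) for p in PROPS}}
        for country, props in facts.items()
    ]
```

In this model, Country_B keeps its currency and coordinates only in the separate_optionals variant.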
I am sure what I want to do is very easy, yet I cannot seem to get the query right. I have records in a dataset with values such as a city name, e.g. 'New York', and its corresponding country code, e.g. 'US'. I also have access to the full country names and country ISO codes.
I would like to get the population and abstract values for these cities from DBpedia, by using a where clause such as:
Get population where name = "New York" and isoCountryCode = "US"
I've searched for help on this to no avail.
So far I have been kindly helped by @rohk with this query, which does not work for all locations:
SELECT DISTINCT ?city ?abstract ?pop
WHERE {
?city rdf:type schema:City ;
rdfs:label ?label ;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:country ?country ;
dbpedia-owl:populationTotal ?pop .
?country dbpprop:countryCode "USA"@en .
FILTER ( lang(?abstract) = 'en' && regex(?label, "New York City"))
}
The above works for New York, however when I change it to:
SELECT DISTINCT ?city ?abstract ?pop
WHERE {
?city rdf:type schema:City ;
rdfs:label ?label ;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:country ?country ;
dbpedia-owl:populationTotal ?pop .
?country dbpprop:countryCode "THA"@en .
FILTER ( lang(?abstract) = 'en' && regex(?label, "Bangkok"))
}
It returns no results for Bangkok, Thailand.
I just can't seem to get the SPARQL query correct; I'm sure I am being silly with my query. If any gurus could help me, I'd appreciate it. Thanks!
I guess you want something like this:
SELECT * WHERE {
?x rdfs:label "New York City"@en.
?x dbpedia-owl:populationTotal ?pop.
?x dbpedia-owl:abstract ?abstract.
}
To get only the English abstract, add a FILTER:
SELECT * WHERE {
?x rdfs:label "New York City"@en.
?x dbpedia-owl:populationTotal ?pop.
?x dbpedia-owl:abstract ?abstract.
FILTER (LANG(?abstract) = 'en')
}
“New York” is the state and it doesn't have a populationTotal figure attached. “New York City” is the city.
This query is working
SELECT DISTINCT *
WHERE {
?city rdf:type schema:City ;
rdfs:label ?label ;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:country ?country ;
dbpprop:website ?website ;
dbpedia-owl:populationTotal ?pop .
?country dbpprop:countryCode "USA"@en .
FILTER ( lang(?abstract) = 'en' && regex(?label, "New York City"))
}
EDIT: For Bangkok, there are two problems:
There is no country code for Thailand: you can use rdfs:label "Thailand"@en instead.
The rdf:type of Bangkok is not schema:City but dbpedia-owl:Settlement.
Here is a working query for Bangkok:
SELECT DISTINCT *
WHERE {
?city rdf:type dbpedia-owl:Settlement ;
rdfs:label "Bangkok"@en ;
dbpedia-owl:abstract ?abstract ;
dbpedia-owl:populationTotal ?pop ;
dbpedia-owl:country ?country ;
dbpprop:website ?website .
?country rdfs:label "Thailand"@en .
FILTER ( lang(?abstract) = 'en' )
}
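Since the original goal was to look up many (city, country) pairs from a dataset, it may help to generate the query text rather than edit it by hand. The following Python sketch builds a query in the shape of the Bangkok example, matching the city and country by language-tagged rdfs:label. The function name build_city_query is hypothetical, the rdf:type and website patterns are deliberately left out because the type differs between cities (schema:City versus dbpedia-owl:Settlement), and the dbpedia-owl prefix is assumed to be predefined by the endpoint, as in the queries above. The sketch only produces the query string; it does not send it anywhere.

```python
def build_city_query(city_label: str, country_label: str, lang: str = "en") -> str:
    """Return a SPARQL query string for one city, using language-tagged literals."""

    def literal(text: str) -> str:
        # Escape backslashes and double quotes so the label stays a valid literal.
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"@{lang}'

    return f"""SELECT DISTINCT ?city ?abstract ?pop
WHERE {{
  ?city rdfs:label {literal(city_label)} ;
        dbpedia-owl:abstract ?abstract ;
        dbpedia-owl:populationTotal ?pop ;
        dbpedia-owl:country ?country .
  ?country rdfs:label {literal(country_label)} .
  FILTER ( lang(?abstract) = '{lang}' )
}}"""


query = build_city_query("Bangkok", "Thailand")
```

An exact label match like this is stricter than the regex filter used earlier, so a city whose label differs from the dataset spelling would return no rows.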