How to better filter SPARQL query output to avoid the "Query timeout limit reached" error?

I am getting the "Query timeout limit reached" error with the query below. Is there any way to filter the output further so that it finishes in time? Thank you!
SELECT (count(distinct ?city) as ?count) WHERE {
?city wdt:P31/wdt:P279* wd:Q486972. # human settlement
?city wdt:P131 ?region.
?city wdt:P17 ?country.
#not a former country
FILTER NOT EXISTS {?country wdt:P31 wd:Q3024240}
#and not an ancient civilisation (needed to exclude ancient Egypt)
FILTER NOT EXISTS {?country wdt:P31 wd:Q28171280}
#not demolished, abolished countries etc.
FILTER NOT EXISTS {?country wdt:P576 ?abolished}
?article schema:about ?city.
?article schema:isPartOf <https://en.wikipedia.org/>.
}

Related

Why is this Wikidata SPARQL query missing country information?

This SPARQL query on Wikidata is missing the form of government for a lot of entries. My query:
SELECT DISTINCT ?country ?countryLabel
(group_concat(DISTINCT ?bfogLabel;separator=", ") as ?Government)
WHERE
{
?country wdt:P31 wd:Q3624078.
OPTIONAL {?country wdt:P122 ?bfog } . # basic form of government
SERVICE wikibase:label
{ bd:serviceParam wikibase:language "en" .
?country rdfs:label ?countryLabel .
?bfog rdfs:label ?bfogLabel .
}
}
GROUP BY ?country ?countryLabel
ORDER BY ?countryLabel
For Angola, Wikipedia's infobox says "Unitary dominant-party presidential constitutional republic", but the form of government is empty in this query's result.
Why is that? More importantly, is there any fix for this? I saw in another question that Wikidata is not always reliable when it comes to data categorization.

Overcoming single value constraint issues with P625 coordinate location in Wikidata

I am trying to get a city list together with region and country information with a query like this:
# get a list of cities
# for geograpy3 library
# see https://github.com/somnathrakshit/geograpy3/issues/15
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
# get human settlements
SELECT DISTINCT ?city ?cityLabel (max(?cityPop) as ?cityPopulation) ?coord ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita WHERE {
# if you uncomment this line this query might run for some 3 hours on a local wikidata copy using Apache Jena
# run for Vienna, Illinois, Vienna Austria, Paris Texas and Paris France as example only
# VALUES ?city { wd:Q577544 wd:Q1741 wd:Q830149 wd:Q90}.
# run for Andorra
VALUES ?country {wd:Q228}.
# instance of human settlement https://www.wikidata.org/wiki/Q486972
?city wdt:P31/wdt:P279* wd:Q486972 .
# label of the City
?city rdfs:label ?cityLabel filter (lang(?cityLabel) = "en").
# country this city belongs to
?city wdt:P17 ?country .
# label for the country
?country rdfs:label ?countryLabel filter (lang(?countryLabel) = "en").
# https://www.wikidata.org/wiki/Property:P297 ISO 3166-1 alpha-2 code
?country wdt:P297 ?countryIsoCode.
# population of country
?country wdt:P1082 ?countryPopulation.
OPTIONAL {
?country wdt:P2132 ?countryGdpPerCapita.
}
OPTIONAL {
# located in administrative territory
# https://www.wikidata.org/wiki/Property:P131
?city wdt:P131* ?region.
# administrative unit of first order
?region wdt:P31/wdt:P279* wd:Q10864048.
?region rdfs:label ?regionLabel filter (lang(?regionLabel) = "en").
# isocode state/province
OPTIONAL { ?region wdt:P300 ?regionIsoCode. }
}
# population of city
OPTIONAL { ?city wdt:P1082 ?cityPop.}
# get the coordinates
OPTIONAL { ?city wdt:P625 ?coord. }
} GROUP BY ?city ?cityLabel ?coord ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita
ORDER BY ?cityLabel
To experiment with the query, I comment out the
# VALUES ?city { wd:Q577544 wd:Q1741 wd:Q830149 wd:Q90}.
# run for Andorra
VALUES ?country {wd:Q228}.
part to see that the results make sense.
Now, for the Andorra trial there are cities with multiple coordinates (https://www.wikidata.org/wiki/Property:P625), which are even flagged as a problem.
I know there is a workaround, as explained in "How to get only the most recent value from a Wikidata property?" and https://w.wiki/EKB.
I tried the approach in this snippet:
?city p:P1082 ?populationStatement .
?populationStatement ps:P1082 ?cityPopulation.
?populationStatement pq:P585 ?date
FILTER NOT EXISTS { ?city p:P1082/pq:P585 ?date_ . FILTER (?date_ > ?date) }
which makes queries really slow. In this case I am looking at all instances of human settlement, of which there are a few hundred thousand. Even on my local Wikidata copy this runs for more than 3 hours!
So I wonder whether there is an alternative using MAX, AVG, subqueries with LIMIT or the like, or any other idea that would solve the issue with decent performance.
You can use SAMPLE() as an aggregation function (see the SPARQL documentation).
Starting from your query, you will need to change the first line into
SELECT DISTINCT ?city ?cityLabel (max(?cityPop) as ?cityPopulation) (sample(?coord) as ?coordinate) ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita WHERE {
and your second-to-last line into:
} GROUP BY ?city ?cityLabel ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita
The result should look like this: https://w.wiki/dRV.
The workaround you tried does not work because, unlike P1082 (population) statements, P625 (coordinate location) statements in most cases have no P585 (point in time) qualifier, so there is no date to compare.
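As a minimal sketch of the SAMPLE() approach on its own, the query below keeps one row per city in Andorra (Q228) and picks a single coordinate per city. It is a trimmed-down illustration rather than the full query from the question; the variable names ?pop, ?population and ?coordinate are chosen here for the example. Note that SAMPLE() returns an arbitrary value from each group, so it does not guarantee which of several coordinates is chosen.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

SELECT ?city (SAMPLE(?coord) AS ?coordinate) (MAX(?pop) AS ?population)
WHERE {
  # human settlements (Q486972) located in Andorra (Q228)
  ?city wdt:P31/wdt:P279* wd:Q486972 ;
        wdt:P17 wd:Q228 .
  # a city may carry zero, one, or several coordinates and population values
  OPTIONAL { ?city wdt:P625 ?coord . }
  OPTIONAL { ?city wdt:P1082 ?pop . }
}
# grouping by ?city only collapses multiple coordinates into one row
GROUP BY ?city
```

Because only ?city remains in GROUP BY, every other projected value has to go through an aggregate such as SAMPLE() or MAX().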

How to get information like capital, currency, language, and population about a country in a simple way using SPARQL from DBpedia

Is there a simple way to get information such as capital, currency, language, and population about a country from DBpedia using SPARQL?
I found a solution for this problem. The following query returns the population, capital, and currency of each country (it does not return the language):
SELECT DISTINCT ?country ?population ?capital ?currency WHERE {
{?country rdf:type
<http://dbpedia.org/class/yago/WikicatMemberStatesOfTheUnitedNations> .
?country <http://dbpedia.org/ontology/populationTotal> ?population .
?country <http://dbpedia.org/ontology/capital> ?capital .
?country <http://dbpedia.org/ontology/currency> ?currency .}
UNION
{?country rdf:type <http://dbpedia.org/ontology/Country> .
?country <http://dbpedia.org/ontology/populationTotal> ?population .
?country <http://dbpedia.org/ontology/capital> ?capital .
?country <http://dbpedia.org/ontology/currency> ?currency .}
}
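Since the two UNION branches differ only in the class, the same result can probably be expressed more compactly with VALUES. The following is a sketch under that assumption, not a tested replacement; the variable ?type is introduced here for illustration, and the prefixes are declared explicitly in case the endpoint does not predefine them.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT DISTINCT ?country ?population ?capital ?currency WHERE {
  # ?type is a helper variable (not in the original query);
  # it takes each of the two classes used in the UNION branches
  VALUES ?type { yago:WikicatMemberStatesOfTheUnitedNations dbo:Country }
  ?country rdf:type ?type ;
           dbo:populationTotal ?population ;
           dbo:capital ?capital ;
           dbo:currency ?currency .
}
```

As ?type is not projected, DISTINCT still removes countries that match both classes.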

OR in sparql query

This sparql query on wikidata shows all places in Germany (Q183) with a name that ends in -ow or -itz.
I want to extend this to look for places in Germany and, say, Austria.
I tried modifying the 8th line to something like:
wdt:P17 (wd:Q183 || wd:Q40);
in order to look for places in Austria (Q40), but this is not a valid query.
What is a way to extend the query to include other countries?
As far as I know, there is no syntax as simple as that. You can, however, use UNION to the same effect, like this:
SELECT ?item ?itemLabel ?coord
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q486972;
        rdfs:label ?itemLabel;
        wdt:P625 ?coord.
  { ?item wdt:P17 wd:Q183 }
  UNION
  { ?item wdt:P17 wd:Q40 }
  FILTER (lang(?itemLabel) = "de") .
  FILTER regex (?itemLabel, "(ow|itz)$").
}
Alternatively, bind a new variable to both countries using VALUES:
SELECT ?item ?itemLabel ?coord
WHERE
{
  VALUES ?country { wd:Q40 wd:Q183 }
  ?item wdt:P31/wdt:P279* wd:Q486972;
        wdt:P17 ?country;
        rdfs:label ?itemLabel;
        wdt:P625 ?coord.
  FILTER (lang(?itemLabel) = "de") .
  FILTER regex (?itemLabel, "(ow|itz)$").
}
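A third option, closest in spirit to the `||` attempt from the question, is to bind the country to a variable and test it in a FILTER. This is a sketch only; the variable ?country is the same helper as in the VALUES version, and the prefixes are spelled out so the query does not depend on endpoint defaults. It may well be slower than VALUES on a large dataset, since the filter is applied to already matched rows; that is an assumption about typical engines, not something measured here.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?item ?itemLabel ?coord
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q486972 ;
        wdt:P17 ?country ;
        rdfs:label ?itemLabel ;
        wdt:P625 ?coord .
  # keep only Germany (Q183) or Austria (Q40);
  # equivalent to FILTER (?country = wd:Q183 || ?country = wd:Q40)
  FILTER (?country IN (wd:Q183, wd:Q40))
  FILTER (lang(?itemLabel) = "de")
  FILTER regex(?itemLabel, "(ow|itz)$")
}
```

The `||` operator is valid SPARQL, but only inside an expression such as a FILTER, not in the object position of a triple pattern.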

Cannot find all cities in Germany

I tried with this query.
SELECT distinct ?city ?cityName ?country WHERE {
?city rdf:type dbo:City .
?city rdfs:label ?cityName.
?city dbo:country ?country.
?city dbo:country dbr:Germany.
FILTER (lang(?cityName) = 'en')
} ORDER BY ?city
But some cities that have the dbo:country predicate with the value dbr:Germany are still not listed in the output. For example, see http://dbpedia.org/page/Goslar: there is no "Goslar" in the output. Can anybody explain why?
First of all, DBpedia is really a messy place. For example, Goslar is not typed as dbo:City in DBpedia at all; its types are dbo:PopulatedPlace, dbo:Town, and yago:City108524735. That is why it is not in the output. Paris is another example; you can check it.
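One way to act on this, sketched under the assumption that matching dbo:Town in addition to dbo:City is acceptable for the use case, is to list the accepted types with VALUES. The variable ?type is introduced here for illustration, and places typed only with other classes would still be missed.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT DISTINCT ?city ?cityName WHERE {
  # ?type is a helper variable: accept towns as well as cities
  VALUES ?type { dbo:City dbo:Town }
  ?city rdf:type ?type ;
        dbo:country dbr:Germany ;
        rdfs:label ?cityName .
  FILTER (lang(?cityName) = 'en')
} ORDER BY ?city
```

Further classes can be added to the VALUES list if other settlements turn out to be missing.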