OR in a SPARQL query

This sparql query on wikidata shows all places in Germany (Q183) with a name that ends in -ow or -itz.
I want to extend this to look for places in Germany and, say, Austria.
I tried modifying the 8th line to something like:
wdt:P17 (wd:Q183 || wd:Q40);
in order to look for places in Austria (Q40), but this is not a valid query.
What is a way to extend the query to include other countries?

As far as I know, there is no syntax as simple as that. You can, however, use UNION to the same effect:
SELECT ?item ?itemLabel ?coord
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q486972 ;
        rdfs:label ?itemLabel ;
        wdt:P625 ?coord .
  { ?item wdt:P17 wd:Q183 }
  UNION
  { ?item wdt:P17 wd:Q40 }
  FILTER (lang(?itemLabel) = "de") .
  FILTER regex(?itemLabel, "(ow|itz)$") .
}
Alternatively, bind a new variable to both countries using VALUES:
SELECT ?item ?itemLabel ?coord
WHERE
{
  VALUES ?country { wd:Q40 wd:Q183 }
  ?item wdt:P31/wdt:P279* wd:Q486972 ;
        wdt:P17 ?country ;
        rdfs:label ?itemLabel ;
        wdt:P625 ?coord .
  FILTER (lang(?itemLabel) = "de") .
  FILTER regex(?itemLabel, "(ow|itz)$") .
}
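If the list of countries changes often, the VALUES variant is easy to generate from code. The following is only a minimal Python sketch of that idea: the helper name build_places_query and its parameters are made up for illustration, and it only builds the query text (it does not send it to any endpoint).

```python
import re

# Pattern for a plain Wikidata item ID such as Q183.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def build_places_query(country_qids, suffix_regex="(ow|itz)$", lang="de"):
    """Return the VALUES-based query text for an arbitrary list of countries.

    Hypothetical helper: suffix_regex and lang are pasted in unescaped,
    so they are assumed to come from trusted input.
    """
    if not country_qids:
        raise ValueError("at least one country QID is required")
    for qid in country_qids:
        # Reject anything that is not a plain item ID before it reaches the query.
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in country_qids)
    return (
        "SELECT ?item ?itemLabel ?coord\n"
        "WHERE\n"
        "{\n"
        f"  VALUES ?country {{ {values} }}\n"
        "  ?item wdt:P31/wdt:P279* wd:Q486972 ;\n"
        "        wdt:P17 ?country ;\n"
        "        rdfs:label ?itemLabel ;\n"
        "        wdt:P625 ?coord .\n"
        f'  FILTER (lang(?itemLabel) = "{lang}") .\n'
        f'  FILTER regex(?itemLabel, "{suffix_regex}") .\n'
        "}"
    )


if __name__ == "__main__":
    # Germany and Austria, as in the question.
    print(build_places_query(["Q183", "Q40"]))
```

Adding a third country is then a matter of appending its item ID to the list, which is the main practical advantage of VALUES over a growing chain of UNION branches.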

Related

wikidata sparql query timeout optimisation

I want to retrieve all instances of musicians (Q639669) in a given city (P131) born after 1900. When I pass in the Wikidata example city Rotterdam (Q34370), it works. However, when I replace it with a larger city (e.g., Paris, Q90), the query times out.
Is there a way to optimise this, or to split it into chunks and make repeated queries?
I'm actually only interested in the number of cases it returns (i.e. a single value), without needing all the metadata about the artist name, etc.
Would be really helpful if someone can give me pointers to solving this. Thanks!
SELECT ?itemLabel ?itemDescription ?birth
WHERE {
?item wdt:P106/wdt:P279* wd:Q639669 .
?item wdt:P19/wdt:P131* wd:Q34370 .
OPTIONAL {?item wdt:P569 ?birth}
filter (?birth > "1900-01-01"^^xsd:dateTime)
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
The * (ZeroOrMore) property path operator used on the P131 ("located in the administrative territorial entity") is one of the culprits here. A simple approach to getting an answer would be to manually run queries that build up that property path one element at a time:
In query 1: ?item wdt:P19 wd:Q34370 .
In query 2: ?item wdt:P19/wdt:P131 wd:Q34370 .
In query 3: ?item wdt:P19/wdt:P131/wdt:P131 wd:Q34370 .
etc.
I found through experimentation that there is no data past 3 occurrences of P131. However, be aware that there are duplicates across these queries, because some people are listed as having birth places both "in Paris" and also in some sub-region of Paris (for example, Claude Arrieu (Q272886) listed as being born in both Paris and the 8th arrondissement).
You can also use UNION to put several of these property paths together into a single query, though be aware that this may increase the query time and move you back towards a timeout depending on the data:
SELECT ?item ?itemLabel ?itemDescription ?birth WHERE {
?item wdt:P106 / wdt:P279 * wd:Q639669 .
{
?item wdt:P19 wd:Q90 .
} UNION {
?item wdt:P19 / wdt:P131 wd:Q90 .
} UNION {
?item wdt:P19 / wdt:P131 / wdt:P131 wd:Q90 .
} UNION {
?item wdt:P19 / wdt:P131 / wdt:P131 / wdt:P131 wd:Q90 .
} UNION {
?item wdt:P19 / wdt:P131 / wdt:P131 / wdt:P131 / wdt:P131 wd:Q90 .
}
OPTIONAL {
?item wdt:P569 ?birth
}
FILTER(?birth > "1900-01-01"^^xsd:dateTime)
SERVICE wikibase:label
{
bd:serviceParam wikibase:language "en" .
}
}
A couple of other comments:
If you only want the count of people, you can replace the SELECT variables with a count, which may improve the runtime a bit: SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE { … } which returns the answer 1440.
The use of OPTIONAL to bind the ?birth variable combined with the FILTER outside of the OPTIONAL may not be what you want. The FILTER will remove any results where ?birth is unbound, making the OPTIONAL effectively non-optional. Consider either removing the OPTIONAL and binding ?birth right next to the FILTER, or moving the FILTER inside the OPTIONAL to apply that date range filter only to people who have birth data (which changes the count from 1440 to 2456; many musicians born in Paris are missing birth dates, it seems!)
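The fixed-depth paths and the two comments above can be combined mechanically. Below is a rough Python sketch that generates such a count query: the function names birthplace_union and count_query are hypothetical, the default depth of 3 follows the experiment mentioned earlier, and ?birth is bound without OPTIONAL (the first of the two options just described). It only produces the query text and says nothing about whether a given depth will still time out.

```python
def birthplace_union(city_qid, max_depth):
    """Build UNION branches for P19 followed by 0..max_depth steps of P131."""
    if max_depth < 0:
        raise ValueError("max_depth must be non-negative")
    branches = []
    for depth in range(max_depth + 1):
        # depth 0: wdt:P19; depth 1: wdt:P19 / wdt:P131; and so on.
        path = " / ".join(["wdt:P19"] + ["wdt:P131"] * depth)
        branches.append(f"{{ ?item {path} wd:{city_qid} . }}")
    return "\n  UNION\n  ".join(branches)


def count_query(city_qid, max_depth=3):
    """Return a COUNT query for musicians born in the city after 1900."""
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        "  ?item wdt:P106 / wdt:P279* wd:Q639669 .\n"
        f"  {birthplace_union(city_qid, max_depth)}\n"
        # ?birth is required here, so the FILTER no longer hides an OPTIONAL.
        "  ?item wdt:P569 ?birth .\n"
        '  FILTER(?birth > "1900-01-01"^^xsd:dateTime)\n'
        "}"
    )


if __name__ == "__main__":
    print(count_query("Q90"))
```

COUNT(DISTINCT ?item) matters here because the same person can match several branches, as with the duplicate birth places noted above.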

Why is this Wikidata SPARQL query missing country information?

This SPARQL query on Wikidata is missing the form of government for a lot of entries. My query:
SELECT DISTINCT ?country ?countryLabel
(group_concat(DISTINCT ?bfogLabel;separator=", ") as ?Government)
WHERE
{
?country wdt:P31 wd:Q3624078.
OPTIONAL {?country wdt:P122 ?bfog } . # basic form of government
SERVICE wikibase:label
{ bd:serviceParam wikibase:language "en" .
?country rdfs:label ?countryLabel .
?bfog rdfs:label ?bfogLabel .
}
}
GROUP BY ?country ?countryLabel
ORDER BY ?countryLabel
Angola is in Wikipedia's infobox: "Unitary dominant-party presidential constitutional republic". But it is empty in this query.
Why is that? More important: is there any fix for this? I saw in this question that wikidata is not as reliable as possible when it comes to data categorization.

Overcoming single value constraint issues with P625 coordinate location in Wikidata

I am trying to get a city list together with region and country information with a query like this:
# get a list of cities
# for geograpy3 library
# see https://github.com/somnathrakshit/geograpy3/issues/15
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
# get human settlements
SELECT DISTINCT ?city ?cityLabel (max(?cityPop) as ?cityPopulation) ?coord ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita WHERE {
# if you uncomment this line this query might run for some 3 hours on a local wikidata copy using Apache Jena
# run for Vienna, Illinois, Vienna Austria, Paris Texas and Paris France as example only
# VALUES ?city { wd:Q577544 wd:Q1741 wd:Q830149 wd:Q90}.
# run for Andorra
VALUES ?country {wd:Q228}.
# instance of human settlement https://www.wikidata.org/wiki/Q486972
?city wdt:P31/wdt:P279* wd:Q486972 .
# label of the City
?city rdfs:label ?cityLabel filter (lang(?cityLabel) = "en").
# country this city belongs to
?city wdt:P17 ?country .
# label for the country
?country rdfs:label ?countryLabel filter (lang(?countryLabel) = "en").
# https://www.wikidata.org/wiki/Property:P297 ISO 3166-1 alpha-2 code
?country wdt:P297 ?countryIsoCode.
# population of country
?country wdt:P1082 ?countryPopulation.
OPTIONAL {
?country wdt:P2132 ?countryGdpPerCapita.
}
OPTIONAL {
# located in administrative territory
# https://www.wikidata.org/wiki/Property:P131
?city wdt:P131* ?region.
# administrative unit of first order
?region wdt:P31/wdt:P279* wd:Q10864048.
?region rdfs:label ?regionLabel filter (lang(?regionLabel) = "en").
# isocode state/province
OPTIONAL { ?region wdt:P300 ?regionIsoCode. }
}
# population of city
OPTIONAL { ?city wdt:P1082 ?cityPop.}
# get the coordinates
OPTIONAL { ?city wdt:P625 ?coord. }
} GROUP BY ?city ?cityLabel ?coord ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita
ORDER BY ?cityLabel
To experiment with the query, I comment out the
# VALUES ?city { wd:Q577544 wd:Q1741 wd:Q830149 wd:Q90}.
# run for Andorra
VALUES ?country {wd:Q228}.
part to see that the results make sense.
Now, for the Andorra trial, there are cities with multiple coordinates:
https://www.wikidata.org/wiki/Property:P625
which are even flagged as a problem.
I know there is a workaround, as explained in How to get only the most recent value from a Wikidata property? and https://w.wiki/EKB
I tried the approach in the snippet
?city p:P1082 ?populationStatement .
?populationStatement ps:P1082 ?cityPopulation.
?populationStatement pq:P585 ?date
FILTER NOT EXISTS { ?city p:P1082/pq:P585 ?date_ . FILTER (?date_ > ?date) }
which makes queries really slow, and in this case I am looking at all instances of human settlement, of which there are a few hundred thousand. Even on my local Wikidata copy this runs for more than 3 hours!
So I wonder whether there is an alternative using MAX, AVG, subqueries with LIMIT or the like, or any other nifty idea that would solve the issue with decent performance?
You can use sample() as an aggregation function (SPARQL doc).
Starting from your query, you will need to change the first line into
SELECT DISTINCT ?city ?cityLabel (max(?cityPop) as ?cityPopulation) (sample(?coord) as ?coordinate) ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita WHERE {
and your second-to-last line into:
} GROUP BY ?city ?cityLabel ?region ?regionLabel ?regionIsoCode ?country ?countryLabel ?countryIsoCode ?countryPopulation ?countryGdpPerCapita
The result should look like this: https://w.wiki/dRV.
The workaround you tried does not work because, unlike P1082 (population) statements, P625 (coordinate) statements in most cases have no P585 (point in time) qualifier.

Wikidata query duplicates

Sorry if my english is bad, but I don't really have any place where I can ask this question in my native language.
I've been trying to create a SPARQL query for Wikidata that lists all horror fiction created between 1925 and 1950, the names of the authors and, if available, pictures:
SELECT DISTINCT ?item ?itemLabel ?author ?name ?creation ?picture
WHERE
{
?item wdt:P136 wd:Q193606 . # book
?item wdt:P50 ?author . # author
?item wdt:P577 ?creation .
?item wdt:P577 ?end .
?author rdfs:label ?name .
OPTIONAL{ ?item wdt:P18 ?picture }
FILTER (?creation >= "1925-01-01T00:00:00Z"^^xsd:dateTime) .
FILTER (?end <= "1950-12-31T23:59:59Z"^^xsd:dateTime) .
SERVICE wikibase:label
{
bd:serviceParam wikibase:language "en" .
}
}
However, for some reason this query puts duplicates in the list. DISTINCT doesn't do much. After some time I figured out that the reason is "?item rdfs:label ?name .". If this line is removed, no duplicates are listed. But I need this line to show the author name in the list!
Any ideas on how to fix this?
You don't need to use ?item rdfs:label ?name . as you already get item labels as ?itemLabel thanks to SERVICE wikibase:label.
Then, you will get duplicate results for every item that has a SELECTed property with possibly multiple values: here, you are SELECTing authors (P50), which will create duplicates for every item with several authors.
The query is actually giving you distinct items. The problem is that some items have multiple rdfs:labels. You can see as an example the item:
SELECT *
WHERE
{
wd:Q2882840 rdfs:label ?label
SERVICE wikibase:label
{
bd:serviceParam wikibase:language "en" .
}
}
And since there are multiple rdfs:label predicates for some items, they are showing up in separate rows.
You can aggregate your results according to the book title (the item's label) using the GROUP BY keyword.
Thus, every result will be a group which will show up once, and other fields which have different values, will be aggregated using the separator (in this case, a comma).
The fixed query:
SELECT DISTINCT ?item ?itemLabel
  # each alias must differ from the variable it aggregates
  (group_concat(distinct ?author; separator=",") as ?authors)
  (group_concat(distinct ?name; separator=",") as ?names)
  (group_concat(distinct ?creation; separator=",") as ?creations)
  (group_concat(distinct ?picture; separator=",") as ?pictures)
WHERE
{
  ?item wdt:P136 wd:Q193606 . # genre: horror fiction
  ?item wdt:P50 ?author . # author
  ?item wdt:P577 ?creation .
  ?item wdt:P577 ?end .
  ?author rdfs:label ?name .
  OPTIONAL { ?item wdt:P18 ?picture }
  FILTER (?creation >= "1925-01-01T00:00:00Z"^^xsd:dateTime) .
  FILTER (?end <= "1950-12-31T23:59:59Z"^^xsd:dateTime) .
  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "en" .
  }
}
group by ?item ?itemLabel

getting the list of countries which have more than one official language

I tried to find all the countries in DBpedia that have more than one official language. I tried the following SPARQL query, but it did not work.
SELECT distinct ?country ?officialLanguage
WHERE {
?country rdf:type dbo:Country .
?country dbo:officialLanguage ?officialLanguage.
FILTER (COUNT(?officialLanguage) >1)
}
and got the following error-
Virtuoso 37000 Error SP030: SPARQL compiler, line 8: Aggregates are allowed only in result sets at ')' before '>'
I am very new to sparql. I think I am missing something.
As an alternative to the query in @svick's answer, you could try
SELECT ?country (COUNT(?officialLanguage) AS ?nrOfLanguages)
WHERE {
?country rdf:type dbo:Country .
?country dbo:officialLanguage ?officialLanguage.
}
GROUP BY ?country
HAVING(COUNT(?officialLanguage) > 1)
SPARQL doesn't work like that, it can't deduce that you mean the count of distinct ?officialLanguage for each ?country. You will need to be more explicit than that, for example:
SELECT DISTINCT ?country ?officialLanguage
WHERE {
  ?country rdf:type dbo:Country .
  ?country dbo:officialLanguage ?officialLanguage .
  {
    # count the official languages of each country
    SELECT ?country (COUNT(*) AS ?languages)
    WHERE {
      ?country dbo:officialLanguage [] .
    }
    GROUP BY ?country
  }
  FILTER (?languages > 1)
}
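To make the "count per country" idea concrete outside of SPARQL, here is a small Python analogue of grouping and then filtering on the group size. It is only an illustration: the function name and the ex: identifiers are invented placeholder data, not DBpedia results, and it counts distinct languages (the equivalent of COUNT(DISTINCT ...)), which differs slightly from a plain COUNT when rows repeat.

```python
from collections import defaultdict


def countries_with_multiple_languages(rows):
    """rows: iterable of (country, language) pairs, one per query solution."""
    languages = defaultdict(set)
    for country, language in rows:
        # Equivalent of GROUP BY ?country: collect languages per country.
        languages[country].add(language)
    # Equivalent of HAVING (COUNT(DISTINCT ?officialLanguage) > 1).
    return {c: len(langs) for c, langs in languages.items() if len(langs) > 1}


if __name__ == "__main__":
    # Placeholder rows; the ex: names are hypothetical.
    sample_rows = [
        ("ex:CountryA", "ex:Lang1"),
        ("ex:CountryA", "ex:Lang2"),
        ("ex:CountryB", "ex:Lang1"),
    ]
    print(countries_with_multiple_languages(sample_rows))
```

The count only makes sense after the rows have been grouped, which is why the aggregate in the original FILTER was rejected: at that point there is no group to count over.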