How to retrieve only current values - sparql

I have a SPARQL query with which I try to retrieve all current German municipalities from Wikidata, along with some of their properties.
For example, I try to retrieve their postal codes and parent regions:
SELECT DISTINCT ?region ?regionLabel ?postalCode ?parentLabel WHERE {
  ?region wdt:P31 wd:Q262166. # Municipalities
  ?region wdt:P17 wd:Q183. # from Germany
  MINUS { ?region p:P576 _:anyValue. } # Only regions which exist today
  OPTIONAL { ?region wdt:P281 ?postalCode. } # Select postal code
  OPTIONAL { ?region wdt:P131 ?parent. } # Select administrative parents
  SERVICE wikibase:label { bd:serviceParam wikibase:language "de" . } # Show German labels
}
As you can see, I already found out how to exclude those municipalities that no longer exist (because they have a p:P576 statement, i.e. a dissolution date). I know this is a little fuzzy, because the date could lie in the future (it has merely been determined already).
More importantly, the postal codes and parents include "historical" ones, which I would like to exclude. I know that I could do something like the answer to "https://stackoverflow.com/questions/49066390/how-to-get-only-the-most-recent-value-from-a-wikidata-property", but the solution there is to bind the end date of the property statements, which is usually not set for the current value. Besides, I don't know how to build the query with two optional values.
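One possible direction, sketched here as an untested assumption rather than a confirmed answer: go through the statement nodes (p:/ps:) for each optional property and keep only statements that carry no end-time qualifier (pq:P582). The small Python helper below just assembles such a query string; the function names and the ?...Stmt / ?...End variable names are made up for illustration. Note that p: also reaches deprecated statements, which wdt: would have skipped.

```python
def current_value_block(prop, var):
    """Build an OPTIONAL block that binds ?<var> only from statements
    of property `prop` that have no end-time qualifier (pq:P582)."""
    return (
        "  OPTIONAL {\n"
        f"    ?region p:{prop} ?{var}Stmt.\n"
        f"    ?{var}Stmt ps:{prop} ?{var}.\n"
        f"    FILTER NOT EXISTS {{ ?{var}Stmt pq:P582 ?{var}End. }}\n"
        "  }\n"
    )


def build_query():
    """Assemble the municipality query with one block per optional property."""
    return (
        "SELECT DISTINCT ?region ?regionLabel ?postalCode ?parentLabel WHERE {\n"
        "  ?region wdt:P31 wd:Q262166.\n"
        "  ?region wdt:P17 wd:Q183.\n"
        "  MINUS { ?region p:P576 _:anyValue. }\n"
        + current_value_block("P281", "postalCode")
        + current_value_block("P131", "parent")
        + '  SERVICE wikibase:label { bd:serviceParam wikibase:language "de" . }\n'
        "}\n"
    )
```

Calling print(build_query()) shows the generated SPARQL; because each property gets its own OPTIONAL block, a missing postal code does not prevent the parent from being returned, and vice versa.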

Related

Is it possible to formulate an OPTIONAL "subquery" so that it returns at most one record?

The following Wikidata query returns a list of airports and their IATA codes.
I am using ?airport rdfs:label ?airportName to also get a label for the airports. Most airports have labels in multiple languages, so I would prefer to select the English name. Some airports only have labels in en-ca and en-gb, but not en, so I cannot select them with lang(?airportName) = 'en'.
With the current implementation, I get multiple records for some airports:
select
  ?airport
  ?airportName
  (lang(?airportName) as ?lang)
  ?IATAAirPortCode
{
  ?airport wdt:P238 ?IATAAirPortCode
  optional {
    ?airport rdfs:label ?airportName .
    filter(langMatches(lang(?airportName), 'en'))
  }
}
order by ?IATAAirPortCode
I'd like to have only one record per airport. Is it somehow possible to formulate an optional { ... } clause so that it returns at most one record per airport?
For this style of query, where you want a single rdfs:label value per result, you can use Wikidata's wikibase:label SPARQL extension like this:
SELECT
  ?airport
  ?airportLabel
  (LANG(?airportLabel) AS ?lang)
  ?IATAAirPortCode
{
  ?airport wdt:P238 ?IATAAirPortCode
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en"
  }
}
ORDER BY ?IATAAirPortCode
The ?airportLabel variable is automatically bound to the label of each ?airport, using only labels in the given preferred language (the language string "en" here can contain multiple comma-separated language codes, which are tried in order).
A more general-purpose solution in portable SPARQL (without Wikidata extensions) would be more complicated, and might differ depending on the specifics of the query. In this particular case, where your OPTIONAL only adds one variable, you can do it without the wikibase extension by using GROUP BY and SAMPLE aggregation:
SELECT
  ?airport
  (SAMPLE(?airportLabel) AS ?airportName)
  (LANG(?airportName) AS ?lang)
  ?IATAAirPortCode
{
  ?airport wdt:P238 ?IATAAirPortCode
  OPTIONAL {
    ?airport rdfs:label ?airportLabel
    FILTER(langMatches(lang(?airportLabel), 'en'))
  }
}
GROUP BY ?airport ?IATAAirPortCode
ORDER BY ?IATAAirPortCode
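To make the effect of GROUP BY plus SAMPLE concrete, here is a minimal Python sketch that does the same collapsing on rows that have already been fetched. The function name and the sample rows are hypothetical; SAMPLE may return any value from the group, and this sketch simply keeps the first one it sees.

```python
def one_label_per_airport(rows):
    """Collapse (airport, iata_code, label) rows to one row per
    (airport, iata_code) pair, keeping the first label encountered.
    A label of None stands for an OPTIONAL that did not match."""
    picked = {}
    for airport, iata_code, label in rows:
        # setdefault only stores the label the first time a group is seen
        picked.setdefault((airport, iata_code), label)
    # Mirror ORDER BY ?IATAAirPortCode
    ordered = sorted(picked.items(), key=lambda item: item[0][1])
    return [(airport, code, label) for (airport, code), label in ordered]


# Hypothetical rows as the ungrouped query might return them
example_rows = [
    ("airport:B", "BBB", None),
    ("airport:A", "AAA", "Alpha Airport"),
    ("airport:A", "AAA", "Alpha Aerodrome"),
]
```

With example_rows, the two labels of airport:A are reduced to a single row, and airport:B is kept with no label, just as the OPTIONAL keeps airports without an English label.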

Improve SPARQL for the time range

I am trying to generate SPARQL queries from Python and I got stuck on the following. I want to write a query over a time range in a general way. For example, the question is "Who was the mayor of New York City in %(YEAR)?" and the corresponding SPARQL I wrote is shown below. For years far in the past it works fine, but for recent years it doesn't, because the current mayor has no end time. In other words, can I write something like FILTER (?v3 has no value or ?v3 > "2011-01-01"^^xsd:dateTime)? I'd like a single query that handles time-range questions like this in general.
CASE 1: Who was the mayor of New York City in 2011? - OK
SELECT DISTINCT ?v ?vLabel ?v2 ?v3
WHERE
{
  wd:Q60 p:P6 ?stmt.
  ?stmt ps:P6 ?v;
        pq:P580 ?v2;
        pq:P582 ?v3.
  FILTER (?v2 < "2011-01-01"^^xsd:dateTime) # start time
  FILTER (?v3 > "2011-01-01"^^xsd:dateTime) # end time
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }
}
ORDER BY DESC (?v2)
CASE 2: Who was the mayor of New York City in 2016? - No answer
SELECT DISTINCT ?v ?vLabel ?v2 ?v3
WHERE
{
  wd:Q60 p:P6 ?stmt.
  ?stmt ps:P6 ?v;
        pq:P580 ?v2;
        pq:P582 ?v3
  FILTER (?v2 < "2016-01-01"^^xsd:dateTime) # start time
  FILTER (?v3 > "2016-01-01"^^xsd:dateTime) # end time
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }
}
ORDER BY DESC (?v2)
Step 1: Get the query pattern to match even if there is no value for “end time”. This can be done using OPTIONAL for that triple pattern:
OPTIONAL { ?stmt pq:P582 ?v3 }
Step 2: Change the filter so that it accepts solutions where the end time variable ?v3 has no value. This can be done using the bound function, which returns true if there is a value and false otherwise:
FILTER (!bound(?v3) || ?v3 > "2016-01-01"^^xsd:dateTime) # end time
Complete query:
SELECT DISTINCT ?v ?vLabel ?v2 ?v3
WHERE
{
  wd:Q60 p:P6 ?stmt.
  ?stmt ps:P6 ?v;
        pq:P580 ?v2.
  OPTIONAL { ?stmt pq:P582 ?v3 }
  FILTER (?v2 < "2016-01-01"^^xsd:dateTime) # start time
  FILTER (!bound(?v3) || ?v3 > "2016-01-01"^^xsd:dateTime) # end time
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }
}
ORDER BY DESC (?v2)
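Since the question asks about generating this from Python with a %(YEAR)-style placeholder, the complete query could be wrapped in a small template along these lines. This is only a sketch: the names QUERY_TEMPLATE and mayor_query are made up, and the date is fixed to January 1 of the given year, as in the examples above.

```python
QUERY_TEMPLATE = """\
SELECT DISTINCT ?v ?vLabel ?v2 ?v3
WHERE
{
  wd:Q60 p:P6 ?stmt.
  ?stmt ps:P6 ?v;
        pq:P580 ?v2.
  OPTIONAL { ?stmt pq:P582 ?v3 }
  FILTER (?v2 < "%(date)s"^^xsd:dateTime) # start time
  FILTER (!bound(?v3) || ?v3 > "%(date)s"^^xsd:dateTime) # end time
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }
}
ORDER BY DESC (?v2)
"""


def mayor_query(year):
    """Return the query text for 'who was mayor in <year>'.
    int() rejects anything that is not a plain year, so arbitrary
    text cannot be spliced into the query."""
    date = "%04d-01-01" % int(year)
    return QUERY_TEMPLATE % {"date": date}
```

The same function then serves both cases: mayor_query(2011) and mayor_query(2016) differ only in the date literal, and the OPTIONAL plus !bound(?v3) handles the mayor whose term has no end time.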

Want to remove entities that have no label from the result

I'd like to ask about one tricky thing concerning labels. Using the SERVICE keyword, as in SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }, lets us fall back to another language when the first preference does not match any label of the target entity.
However, I want to drop entities that do not have any label. The service instead returns the entity with its Qxxxx id as the label when no label matches any of the languages. How can I remove such entities from the result?
I know we can filter them out by using rdfs:label explicitly for all the variables, but adding rdfs:label for every variable is another headache. So I'd like to know how to improve the query with SERVICE wikibase:label so that entities without any label are filtered out. Should I replace SERVICE with rdfs:label?
SELECT DISTINCT ?vLabel
WHERE {
  hint:Query hint:optimizer "None" .
  {
    SELECT DISTINCT ?i {
      ?i wdt:P31 wd:Q515.
    } LIMIT 15
  }
  ?v wdt:P937 ?i.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }
}
LIMIT 3
RESULT:
Q59780594 <- no lang label
Q24642253 <- no lang label
The Wikidata label service doesn't provide a built-in way to skip resources that don't have a label.
The simplest option would be to wrap the query as a subquery into a new SELECT query, and use a filter to remove any Qxxxx labels. This uses the fact that only the real labels have a language tag:
SELECT ?vLabel {
  {
    SELECT DISTINCT ?vLabel
    ...
  }
  FILTER lang(?vLabel)
}
Edit: Below is my original (and inferior) answer, which used a regular expression on the label itself to remove the Qxxxx ones. It would also filter out any resources that actually have a label of the form Qxxxx, if such resources exist in Wikidata.
SELECT ?vLabel {
  {
    SELECT DISTINCT ?vLabel
    ...
  }
  FILTER (!REGEX(?vLabel, "^Q[0-9]+$"))
}
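If the results are processed in Python anyway, the same language-tag test can also be applied on the client side instead of in a wrapping query. The sketch below assumes the results arrive in the standard SPARQL JSON results format, where a language-tagged literal carries an "xml:lang" key; the function name and the first sample row are made up, while the second row uses one of the Qxxxx fallbacks from the question.

```python
def drop_unlabelled(bindings, label_var="vLabel"):
    """Keep only rows whose label literal has a language tag.
    Fallback Qxxxx labels are plain literals without "xml:lang"."""
    return [
        row for row in bindings
        if row.get(label_var, {}).get("xml:lang")
    ]


# Shape of results["results"]["bindings"] in the SPARQL JSON format
sample_bindings = [
    {"vLabel": {"type": "literal", "xml:lang": "en", "value": "Example Person"}},
    {"vLabel": {"type": "literal", "value": "Q59780594"}},
]

labelled = drop_unlabelled(sample_bindings)
```

This relies on the same fact as the FILTER lang(?vLabel) version: only real labels have a language tag.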

SPARQL question: how to return property labels and associated date qualifiers from Wikidata

I am trying to return results for a set of persons (Edinburgh University alumni) who have held political office. I would like to return the title label of the office held, along with the start and end dates for each office, with many individuals holding multiple positions. I seem to be able to get one or the other or can get it to work if the person only held one position, but can't get the two to come together where multiple offices were held.
My current version of the query is below. This gives me the start and end dates, but rather than the label of the political office, such as Member of the [x] Parliament of the United Kingdom, ?officeLabel returns a value such as: statement/Q4668868-E3734C7D-40F0-4D4A-8208-E3D6B8C944CB
SELECT DISTINCT ?alumni ?fullName ?roleLabel ?officeLabel ?start ?end WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?alumni wdt:P69 wd:Q160302.
  ?alumni rdfs:label ?fullName.
  ?alumni wdt:P106 ?role.
  # Use VALUES to separate out politicians - Q82955
  VALUES (?role) {
    (wd:Q82955)
  }
  # Select only where position of office is stated but make dates optional
  ?alumni p:P39 ?office.
  OPTIONAL { ?office pq:P580 ?start. }
  OPTIONAL { ?office pq:P582 ?end. }
  FILTER(LANGMATCHES(LANG(?fullName), "en"))
  FILTER(NOT EXISTS { FILTER(LANGMATCHES(LANG(?fullName), "en-ca")) })
  FILTER(NOT EXISTS { FILTER(LANGMATCHES(LANG(?fullName), "en-gb")) })
}
ORDER BY ?fullName
LIMIT 10
Yeah, I still get tripped up on qualifiers and the Wikidata Data Model too.
[Diagram of the Wikidata data model. By Michael F. Schönitzer - Own work, based on File:Rdf mapping.svg, CC BY 4.0, https://commons.wikimedia.org/w/index.php?curid=63880194]
After going the "p: route" from the "item", you need the "ps: route" to get back to the "simple value".
So, using this to slightly modify your query gives the results I think you want.
SELECT DISTINCT ?alumni ?fullName ?roleLabel ?officeLabel ?start ?end WHERE {
  ?alumni wdt:P69 wd:Q160302.
  ?alumni rdfs:label ?fullName.
  ?alumni wdt:P106 ?role.
  VALUES (?role) {
    (wd:Q82955)
  }
  ?alumni p:P39 ?officeStmnt.
  ?officeStmnt ps:P39 ?office.
  OPTIONAL { ?officeStmnt pq:P580 ?start. }
  OPTIONAL { ?officeStmnt pq:P582 ?end. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  FILTER(LANGMATCHES(LANG(?fullName), "en"))
  FILTER(NOT EXISTS { FILTER(LANGMATCHES(LANG(?fullName), "en-ca")) })
  FILTER(NOT EXISTS { FILTER(LANGMATCHES(LANG(?fullName), "en-gb")) })
}
ORDER BY ?fullName
LIMIT 10
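As a mental model for the p: / ps: / pq: routes, it can help to picture each statement as its own node sitting between the item and the value. The toy Python structure below is purely illustrative: the statement ids, office ids, and dates are invented, and this is not how Wikidata stores anything internally.

```python
# Toy model of one person item: property -> list of statement nodes.
# p:P39 leads to a statement node, ps:P39 to its value,
# pq:P580 / pq:P582 to its (optional) qualifiers.
person = {
    "P39": [
        {
            "id": "statement/EXAMPLE-1",       # what ?officeStmnt binds to
            "ps": "Q_OFFICE_A",                # what ?office binds to
            "pq": {"P580": "2001-06-07", "P582": "2005-04-11"},
        },
        {
            "id": "statement/EXAMPLE-2",
            "ps": "Q_OFFICE_B",
            "pq": {"P580": "2005-05-05"},      # no end time recorded
        },
    ]
}


def offices(item, prop="P39"):
    """Return (office, start, end) per statement; a missing qualifier
    becomes None, like an unmatched OPTIONAL."""
    rows = []
    for stmt in item.get(prop, []):
        qualifiers = stmt["pq"]
        rows.append((stmt["ps"], qualifiers.get("P580"), qualifiers.get("P582")))
    return rows
```

Asking the label service for the label of the statement node itself (the "id" here) is what produced the statement/... value in the original query; the label belongs to the "ps" value instead.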

SPARQL - Extract Label from entity URI

I'm trying to extract a list of diseases that have symptoms from Wikidata.
The thing is, when I query I get a list of entity URIs, not a list of labels, for the Symptoms column.
My query:
SELECT ?disease ?diseaseLabel (GROUP_CONCAT(?symptoms; SEPARATOR = ", ") AS ?Symptoms)
WHERE {
  ?disease wdt:P31 wd:Q12136.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  OPTIONAL { ?disease wdt:P780 ?symptoms. }
  FILTER(EXISTS { ?disease wdt:P780+ ?symptom. })
}
GROUP BY ?disease ?diseaseLabel
In the result, the Symptoms column contains entity URIs.
For example, for the disease measles, what I want to select for the Symptoms column is: fever, cough, runny nose, maculopapular rash, lymphadenopathy, anorexia, diarrhea.
These are the exact labels of the URIs in the Symptoms column for that particular disease.
Any help, hints, or suggestions are welcome, thank you!