Wikidata SPARQL queries returning different results after filtering for English labels - sparql

My understanding of Wikidata SPARQL queries is that you can filter results for English labels in two ways.
Adding SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } to invoke a label service; or
Adding ?thing rdfs:label ?thingLabel FILTER (lang(?thingLabel) = "en") for every output label.
I am running a query where I'm trying to get all properties of an entity in English. I followed a Stackoverflow post and came up with two queries.
Query 1: Running this query takes returns 47 results.
SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {
VALUES (?item) {(wd:Q24)}
?item ?property [?statement_property ?statement_property_obj] .
?prop wikibase:claim ?property.
?prop wikibase:statementProperty ?statement_property.
# Call label service.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} ORDER BY ?propLabel
Query 2: Running this query returns 35 results.
SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {
VALUES (?item) {(wd:Q24)}
?item ?property [?statement_property ?statement_property_obj] .
?prop wikibase:claim ?property.
?prop wikibase:statementProperty ?statement_property.
# Call label service for each label.
?item rdfs:label ?itemLabel FILTER (lang(?itemLabel) = "en") .
?statement_property_obj rdfs:label ?statement_property_objLabel FILTER (lang(?statement_property_objLabel) = "en") .
?prop rdfs:label ?propLabel FILTER (lang(?propLabel) = "en") .
} ORDER BY ?propLabel
Why is the second query returning fewer rows? Thanks for any help.

I think the cause is that the wikibase:label service returns label results for any value of ?statement_property_obj, even if that value has no actual rdfs:label defined (it appears to just return the actual value of ?statement_property_obj itself).
As an example, see the very first result in query 1, where ?statement_property_objLabel is bound to topic/Jack_Bauer. This is not the value of an actual rdfs:label property in the data, just a 'fallback' value that the label service provides. So query 2, which explicitly queries for rdfs:label attributes, won't return this (and similar) results.

Related

Get multiple optional properties with multiple optional qualifiers on Wikidata Query Service

I want to load the following data of specific humans with the Wikidata Query Service:
properties
birth date
lifestyles
qualifiers
start times of the lifestyles
end times of the lifestyles
And I want exactly one row for each human in the output, no matter if no properties/qualifiers or multiple properties/qualifiers are set.
I managed to create a query which works with humans who have at least one lifestyle, but when there's none, the query takes a long time and instead of empty cells many lifestyles, start and end times are loaded which are not related to the human.
This is the query for the humans Q185140, Q463434 and Q3378937 and Q3378937 doesn't have a lifestyle, but still the lifestyles, lifestylesStarts and lifestylesEnds cells aren't empty as expected and instead full of values:
SELECT ?human ?humanLabel
(GROUP_CONCAT(DISTINCT ?lifestyleLabel;separator="|") AS ?lifestyles)
(GROUP_CONCAT(DISTINCT ?lifestyleStart;separator="|") AS ?lifestylesStarts)
(GROUP_CONCAT(DISTINCT ?lifestyleEnd;separator="|") AS ?lifestylesEnds)
(GROUP_CONCAT(DISTINCT ?dateBirth;separator="|") AS ?dateBirths)
WHERE {
VALUES ?human { wd:Q185140 wd:Q463434 wd:Q3378937 } .
OPTIONAL{ ?human p:P1576 ?statement . }
OPTIONAL{ ?statement ps:P1576 ?lifestyle . }
OPTIONAL{ ?statement pq:P580 ?lifestyleStart . }
OPTIONAL{ ?statement pq:P582 ?lifestyleEnd . }
OPTIONAL{ ?human wdt:P569 ?dateBirth . }
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
?human rdfs:label ?humanLabel .
?lifestyle rdfs:label ?lifestyleLabel .
}
}
GROUP BY ?human ?humanLabel
You can run the query and see the output here.
What do I have to change to get empty cells if properties/qualifiers are not set?

How to access child properties using Wikidata SPARQL Query Service?

I would like to access child properties of wikidata entities. An example may be property P1033, which is a child of P4952, for an entity such as Q49546. How can I do this dynamically in a SPARQL query?
Using the query builder provided by the online Wikidata Query Service, I can construct a simple query, which works for normal properties (in the linked example: mass), but not for the desired sub-properties (in the linked example: NPFA-code for health hazard), which end up empty, even though they are clearly set in the web-result. Side-note: it is a different example than the one from the first paragraph.
The desired objective is the dynamic query as follows:
SELECT ?p ?item ?itemDescription ?prop ?value ?valueLabel ?itemLabel ?itemAltLabel ?propLabel WHERE {
BIND(wd:Q138809 AS ?item)
?prop wikibase:directClaim ?p.
#?item ?p ?value.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
?value rdfs:label ?valueLabel.
?prop rdfs:label ?propLabel.
?item rdfs:label ?itemLabel;
skos:altLabel ?itemAltLabel;
schema:description ?itemDescription.
}
}
ORDER BY DESC(?prop)
LIMIT 10
With the line 4 as a comment, I can get my propLabel as desired, but no value; doing it the other way round with the line not as comment, I do get only the properties, which are set on first level, but not the child properties.
Thanks to #AKSW, I herewith post the final query solving my problem:
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel ?prop ?propertyLabel ?propertyValue ?propertyValueLabel ?qualifier ?qualifierLabel ?qualifierValue
{
VALUES (?item) {(wd:Q138809)}
?item ?prop ?statement .
?statement ?ps ?propertyValue .
?property wikibase:claim ?prop .
?property wikibase:statementProperty ?ps .
OPTIONAL { ?statement ?pq ?qualifierValue . ?qualifier wikibase:qualifier ?pq . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
The key step for me was to understand that child properties are actually called qualifiers.

Querying wikidata for "property constraint"

TL;DR
How to query (sparql) about properties of a property?
Or..
So as part of my project I need to find the properties in wikidata that have any time constraint, to be specific both "start time" and "end time".
I tried this query:
SELECT DISTINCT ?prop WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
?person wdt:P31 wd:Q5.
?person ?prop ?statement.
?statement pq:P580 ?starttime.
?statement pq:P582 ?endtime.
}
LIMIT 200
**yeah the properties should be related to humans
Anyway, I do get some good results like:
http://www.wikidata.org/prop/P26
http://www.wikidata.org/prop/P39
But I also get some other properties that definitely wrong.
so, basically what i'm trying to do is to get a list of properties that has the property constraint (P2302) of- allowed qualifiers constraint (Q21510851) with Start time (P580) and End Time (P582)
is that even possible:
I tried some queries like:
SELECT DISTINCT ?property ?propertyLabel ?propertyDescription ?subpTypeOf ?subpTypeOfLabel
WHERE
{
?property rdf:type wikibase:Property .
?property wdt:P2302 ?subpTypeOf.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
but does not get the results I wanted.
is it even possible to query this kind of stuff?
Thanks
Qualifiers are used on property pages too. Your second query should be:
SELECT DISTINCT ?prop ?propLabel {
?prop p:P2302 [ ps:P2302 wd:Q21510851 ; pq:P2306 wd:P580, wd:P582 ] ;
p:P2302 [ ps:P2302 wd:Q21503250 ; pq:P2308 wd:Q5 ; pq:P2309 wd:Q21503252 ] .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
} ORDER BY ASC(xsd:integer(strafter(str(?prop), concat(str(wd:), "P"))))
Try it!
Your first query is correct, but note that this is an 'as-is' query. For example, wd:P410 does not have respective constraints, but look at wd:Q83855.

How to query Wikidata for "also known as"

I would like to know how to query Wikidata by using the alias ("also known as").
Right now I am trying
SELECT ?item
WHERE
{
?item rdfs:aliases ?alias.
FILTER(CONTAINS(?alias, "Angela Kasner"#en))
}
LIMIT 5
This is simply a query that works if I replace rdfs:aliases by rdfs:labels.
I am trying this, because Help:Aliases says that aliases are searchable in the same way as labels, but I can't find any other resource on that nor can I find an example.
This query might be helpful for someone querying also known as for properties:
SELECT ?property ?propertyLabel ?propertyDescription (GROUP_CONCAT(DISTINCT(?altLabel); separator = ", ") AS ?altLabel_list) WHERE {
?property a wikibase:Property .
OPTIONAL { ?property skos:altLabel ?altLabel . FILTER (lang(?altLabel) = "en") }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" .}
}
GROUP BY ?property ?propertyLabel ?propertyDescription
LIMIT 5000

Filter by type in Wikidata

This SPARQL request looks for all cities called "Berlin" in Wikidata:
SELECT DISTINCT ?item ?itemLabel ?itemDescription WHERE {
?type (a | wdt:P279) wd:Q515. # Sub-type of city
?item wdt:P31 ?type.
?item rdfs:label "Berlin"#en.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
PROBLEM: It returns zero result.
Meanwhile, the request below correctly finds Q64 (capital and city-state of Germany), but it also returns a lot of other things called Berlin, so I want to filter on cities (then in a future phase I will order these cities by population, but that is outside the scope of this question):
SELECT DISTINCT ?item ?itemLabel ?itemDescription WHERE {
?item rdfs:label "Berlin"#en.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Note: My code for getting instances of subclasses of city (Berlin is a big city which is subclass of city) seems to work correctly, as illustrated by the results of this query.
It was a Wikidata bug.
According to Wikidata's Jura1, it was a bug in Wikidata caused by someone's experiments with "preferred rank".
Discussion at https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2016/09#P31_inconsistency
The bug has been fixed just now.
You can only query for data that is contained in the dataset.
If you try an alternative of your query
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?type1 ?type2 WHERE {
?item rdfs:label "Berlin"#en.
optional{?item rdf:type ?type1 }
optional{?item wdt:P279 ?type2 }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
it returns no types, neither connected by rdf:type nor wdt:P279.
If you have a look at the entity of the capital and city state Berlin, you can see that there is information about "instance of", but this property is supposed to be https://www.wikidata.org/wiki/Property:P31. And none of them links to wd:Q515, I'm wondering from where you got this idea.
But to be honest, I don't know that much about Wikidata and to me, it's not clear why no rdf:type is used, but a common pattern for RDF datasets is to use
?s rdf:type/rdfs:subClassOf* SUPER_CLASS .
if we assume that there is rdf:type information available.
If you check the types wd:Q64 is an instance of
SELECT DISTINCT ?type ?typeLabel WHERE {
wd:Q64 (a | wdt:P31) ?type.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?item
None of them are City (wd:Q515) or a sub-class of it.
Looks like a data issue. Perhaps you should contact Wikidata.