Get multiple optional properties with multiple optional qualifiers on Wikidata Query Service - sparql

I want to load the following data of specific humans with the Wikidata Query Service:
properties: birth date, lifestyles
qualifiers: start times of the lifestyles, end times of the lifestyles
I want exactly one row per human in the output, regardless of whether a human has no, one, or several values for these properties and qualifiers.
I managed to write a query that works for humans with at least one lifestyle. When a human has none, however, the query takes a long time, and instead of empty cells it returns many lifestyles, start times and end times that are unrelated to that human.
This is the query for the humans Q185140, Q463434 and Q3378937. Q3378937 doesn't have a lifestyle, yet its lifestyles, lifestylesStarts and lifestylesEnds cells are full of values instead of being empty as expected:
SELECT ?human ?humanLabel
(GROUP_CONCAT(DISTINCT ?lifestyleLabel;separator="|") AS ?lifestyles)
(GROUP_CONCAT(DISTINCT ?lifestyleStart;separator="|") AS ?lifestylesStarts)
(GROUP_CONCAT(DISTINCT ?lifestyleEnd;separator="|") AS ?lifestylesEnds)
(GROUP_CONCAT(DISTINCT ?dateBirth;separator="|") AS ?dateBirths)
WHERE {
VALUES ?human { wd:Q185140 wd:Q463434 wd:Q3378937 } .
OPTIONAL{ ?human p:P1576 ?statement . }
OPTIONAL{ ?statement ps:P1576 ?lifestyle . }
OPTIONAL{ ?statement pq:P580 ?lifestyleStart . }
OPTIONAL{ ?statement pq:P582 ?lifestyleEnd . }
OPTIONAL{ ?human wdt:P569 ?dateBirth . }
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
?human rdfs:label ?humanLabel .
?lifestyle rdfs:label ?lifestyleLabel .
}
}
GROUP BY ?human ?humanLabel
You can run the query and see the output here.
What do I have to change to get empty cells if properties/qualifiers are not set?
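One likely cause, offered as a sketch rather than a verified fix: when ?human has no P1576 statement, ?statement stays unbound after the first OPTIONAL, so the following OPTIONAL patterns are free to match any statement in the database. Nesting the statement patterns inside a single OPTIONAL keeps them tied to ?human. The query below assumes the same prefixes and items as the query above.

```sparql
SELECT ?human ?humanLabel
  (GROUP_CONCAT(DISTINCT ?lifestyleLabel; separator="|") AS ?lifestyles)
  (GROUP_CONCAT(DISTINCT ?lifestyleStart; separator="|") AS ?lifestylesStarts)
  (GROUP_CONCAT(DISTINCT ?lifestyleEnd; separator="|") AS ?lifestylesEnds)
  (GROUP_CONCAT(DISTINCT ?dateBirth; separator="|") AS ?dateBirths)
WHERE {
  VALUES ?human { wd:Q185140 wd:Q463434 wd:Q3378937 }
  # One outer OPTIONAL: ?statement is only ever bound to a statement of ?human.
  OPTIONAL {
    ?human p:P1576 ?statement .
    ?statement ps:P1576 ?lifestyle .
    # Qualifiers are optional within an existing statement.
    OPTIONAL { ?statement pq:P580 ?lifestyleStart . }
    OPTIONAL { ?statement pq:P582 ?lifestyleEnd . }
  }
  OPTIONAL { ?human wdt:P569 ?dateBirth . }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    ?human rdfs:label ?humanLabel .
    ?lifestyle rdfs:label ?lifestyleLabel .
  }
}
GROUP BY ?human ?humanLabel
```

With this shape, a human without any lifestyle statement should leave ?statement, ?lifestyle and both qualifier variables unbound, so the aggregated cells come out empty.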

Related

Identify entity of a Wikipedia page

My question is related to a similar question/comment which unfortunately never received an answer.
Given a list of multiple Wikipedia pages, e.g.:
https://en.wikipedia.org/wiki/Donald_Trump
https://en.wikipedia.org/wiki/The_Matrix
https://en.wikipedia.org/wiki/Tiger
...
how can I find out what type of entity each article refers to? Ideally I would want something at a higher level, e.g. person, movie, animal etc.
My best guess so far was to use SPARQL on Wikidata to walk up the instance-of or subclass-of tree. However, this did not lead to meaningful results.
SELECT ?lemma ?item ?itemLabel ?itemDescription ?instance ?instanceLabel ?subclassLabel WHERE {
VALUES ?lemma {
"Donald Trump"@en
"The Matrix"@en
"Tiger"@en
}
?sitelink schema:about ?item;
schema:isPartOf <https://en.wikipedia.org/>;
schema:name ?lemma.
?item wdt:P31* ?instance.
?item wdt:P279* ?subclass.
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en,da,sv".}
}
The result can be seen here: https://w.wiki/ZmQ
One option would of course also be to look at the itemDescription, but I'm afraid that this is too granular to build meaningful groups from larger lists and count frequencies later on.
Does anyone have a hint/idea on how to get more general entity categories? Maybe also from the mediawiki API?
Any input would be highly appreciated!
Here are three possibilities, side-by-side:
SELECT ?lemma ?item (GROUP_CONCAT(DISTINCT ?instanceLabel; SEPARATOR = " ") AS ?a) (GROUP_CONCAT(DISTINCT ?subclassLabel; SEPARATOR = " ") AS ?b) (GROUP_CONCAT(DISTINCT ?isaLabel; SEPARATOR = " ") AS ?c) WHERE {
VALUES ?lemma {
"Donald Trump"@en
"The Matrix"@en
"Tiger"@en
}
?sitelink schema:about ?item;
schema:isPartOf <https://en.wikipedia.org/>;
schema:name ?lemma.
OPTIONAL { ?item (wdt:P31/(wdt:P279*)) ?instance. }
OPTIONAL { ?item wdt:P279 ?subclass. }
OPTIONAL { ?item wdt:P31 ?isa. }
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en,da,sv".
?instance rdfs:label ?instanceLabel.
?subclass rdfs:label ?subclassLabel.
?isa rdfs:label ?isaLabel.
}
# Here, you could add: FILTER(?instanceLabel in ("mammal"@en, "movie"@en, "musical"@en (and so on...)))
}
GROUP BY ?lemma ?item
Live here.
If you're looking at labels such as "film" and "mammal", i.e. a couple dozen at most, you could explicitly list them in order of preference, then use the first one that occurs.
Note that you may be running into this bug: https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial#wikibase:Label_and_aggregations_bug
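The idea of listing preferred classes in order and taking the first match could be sketched as follows. The class list, the rank numbers and the category strings are illustrative assumptions, not part of the original answer. Whether a given item reaches a listed class at all depends on how it is modelled in Wikidata, and items that match none of the classes drop out of the result.

```sparql
SELECT ?lemma ?item ?category WHERE {
  {
    # For each item, find the best (lowest) rank among the classes it reaches.
    SELECT ?lemma ?item (MIN(?rank) AS ?bestRank) WHERE {
      VALUES ?lemma { "Donald Trump"@en "The Matrix"@en "Tiger"@en }
      # Assumed preference list: human, film, mammal.
      VALUES (?class ?rank) { (wd:Q5 1) (wd:Q11424 2) (wd:Q7377 3) }
      ?sitelink schema:about ?item ;
                schema:isPartOf <https://en.wikipedia.org/> ;
                schema:name ?lemma .
      ?item wdt:P31/wdt:P279* ?class .
    }
    GROUP BY ?lemma ?item
  }
  # Translate the winning rank back into a coarse category name.
  VALUES (?bestRank ?category) { (1 "person") (2 "film") (3 "mammal") }
}
```

Using class IRIs rather than labels in the preference list also sidesteps the label service inside the aggregation.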

Wikidata SPARQL queries returning different results after filtering for English labels

My understanding of Wikidata SPARQL queries is that you can filter results for English labels in two ways:
Adding SERVICE wikibase:label { bd:serviceParam wikibase:language "en" } to invoke the label service; or
Adding ?thing rdfs:label ?thingLabel FILTER (lang(?thingLabel) = "en") for every output label.
I am running a query where I'm trying to get all properties of an entity in English. I followed a Stack Overflow post and came up with two queries.
Query 1: This query returns 47 results.
SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {
VALUES (?item) {(wd:Q24)}
?item ?property [?statement_property ?statement_property_obj] .
?prop wikibase:claim ?property.
?prop wikibase:statementProperty ?statement_property.
# Call label service.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} ORDER BY ?propLabel
Query 2: Running this query returns 35 results.
SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {
VALUES (?item) {(wd:Q24)}
?item ?property [?statement_property ?statement_property_obj] .
?prop wikibase:claim ?property.
?prop wikibase:statementProperty ?statement_property.
# Fetch each label explicitly via rdfs:label.
?item rdfs:label ?itemLabel FILTER (lang(?itemLabel) = "en") .
?statement_property_obj rdfs:label ?statement_property_objLabel FILTER (lang(?statement_property_objLabel) = "en") .
?prop rdfs:label ?propLabel FILTER (lang(?propLabel) = "en") .
} ORDER BY ?propLabel
Why is the second query returning fewer rows? Thanks for any help.
I think the cause is that the wikibase:label service returns label results for any value of ?statement_property_obj, even if that value has no actual rdfs:label defined (it appears to just return the actual value of ?statement_property_obj itself).
As an example, see the very first result in query 1, where ?statement_property_objLabel is bound to topic/Jack_Bauer. This is not the value of an actual rdfs:label property in the data, just a 'fallback' value that the label service provides. So query 2, which explicitly queries for rdfs:label attributes, won't return this (and similar) results.
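If the goal is to keep those rows without the label service, one option is to make the value's label optional and fall back to the raw value. This is a sketch under that assumption: ?valueLabelRaw is a helper variable introduced here, and the result is not guaranteed to match the label service's output exactly.

```sparql
SELECT ?itemLabel ?propLabel ?statement_property_objLabel
WHERE {
  VALUES (?item) {(wd:Q24)}
  ?item ?property [?statement_property ?statement_property_obj] .
  ?prop wikibase:claim ?property .
  ?prop wikibase:statementProperty ?statement_property .
  ?item rdfs:label ?itemLabel . FILTER (lang(?itemLabel) = "en")
  ?prop rdfs:label ?propLabel . FILTER (lang(?propLabel) = "en")
  # Values without an English rdfs:label (literals, external IDs, ...) are kept.
  OPTIONAL {
    ?statement_property_obj rdfs:label ?valueLabelRaw .
    FILTER (lang(?valueLabelRaw) = "en")
  }
  # Fall back to the string form of the value when no label exists.
  BIND (COALESCE(?valueLabelRaw, STR(?statement_property_obj)) AS ?statement_property_objLabel)
} ORDER BY ?propLabel
```

The required rdfs:label pattern in query 2 acts as a filter on the value, while the OPTIONAL form does not.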

How to access child properties using Wikidata SPARQL Query Service?

I would like to access child properties of Wikidata entities. An example is property P1033, which is a child of P4952, on an entity such as Q49546. How can I do this dynamically in a SPARQL query?
Using the query builder of the online Wikidata Query Service, I can construct a simple query that works for normal properties (in the linked example: mass), but not for the desired sub-properties (in the linked example: NFPA code for health hazard), which end up empty even though they are clearly set on the item's web page. Side note: this is a different example from the one in the first paragraph.
The desired objective is the dynamic query as follows:
SELECT ?p ?item ?itemDescription ?prop ?value ?valueLabel ?itemLabel ?itemAltLabel ?propLabel WHERE {
BIND(wd:Q138809 AS ?item)
?prop wikibase:directClaim ?p.
#?item ?p ?value.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
?value rdfs:label ?valueLabel.
?prop rdfs:label ?propLabel.
?item rdfs:label ?itemLabel;
skos:altLabel ?itemAltLabel;
schema:description ?itemDescription.
}
}
ORDER BY DESC(?prop)
LIMIT 10
With line 4 commented out, I get my propLabel as desired, but no value. The other way round, with the line active, I only get the properties that are set at the first level, but not the child properties.
Thanks to @AKSW, I can post the final query that solved my problem:
SELECT ?item ?itemLabel ?itemDescription ?itemAltLabel ?prop ?propertyLabel ?propertyValue ?propertyValueLabel ?qualifier ?qualifierLabel ?qualifierValue
{
VALUES (?item) {(wd:Q138809)}
?item ?prop ?statement .
?statement ?ps ?propertyValue .
?property wikibase:claim ?prop .
?property wikibase:statementProperty ?ps .
OPTIONAL { ?statement ?pq ?qualifierValue . ?qualifier wikibase:qualifier ?pq . }
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
The key step for me was to understand that child properties are actually called qualifiers.
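For a single known property and qualifier, the same statement/qualifier structure can be written out directly. The sketch below uses the IDs from the first paragraph of the question (Q49546, P4952, P1033); the variable names are chosen here for illustration.

```sparql
SELECT ?value ?valueLabel ?qualifierValue ?qualifierValueLabel
WHERE {
  # p: leads from the item to the statement node.
  wd:Q49546 p:P4952 ?statement .
  # ps: gives the statement's main value.
  ?statement ps:P4952 ?value .
  # pq: gives a qualifier ("child property") attached to that statement.
  OPTIONAL { ?statement pq:P1033 ?qualifierValue . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
```

The wdt: prefix skips the statement node entirely, which is why qualifiers are not reachable through it.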

Retrieve properties and their descriptions from instances of the same class

I want to retrieve all distinct object properties of instances with the same type (class), starting with two initial seeds (wd:Q963 and wd:Q42320). First, I ask for the type (and maybe subtype) of such seeds. Second, all instances of the same class of the seeds are retrieved. Third, properties of the instances are retrieved. Finally, I want to retrieve descriptions of such properties and if possible alternative labels. My query is as follows:
select distinct ?property ?description ?label where{
{
wd:Q963 wdt:P31 ?typesSubject .
?instancesS (wdt:P31|wdt:P279) ?typesSubject .
?instancesS ?property ?unknown .
}
UNION
{
wd:Q42320 wdt:P31 ?typesObject .
?instancesO (wdt:P31|wdt:P279) ?typesObject .
?unknown ?property ?instancesO .
}
?claimPredicate wikibase:directClaim ?property .
?claimPredicate schema:description ?description .
?claimPredicate rdfs:label ?label .
FILTER(strstarts(str(?property),str(wdt:)))
FILTER(strstarts(str(?unknown),str(wd:)))
FILTER(LANG(?description) = "en").
FILTER(LANG(?label) = "en").
}
The problem is that my actual query takes a long time and fails on the public Wikidata endpoint. Can anyone give me some hints on how to optimize such a query?
To be honest, I don't understand the aim of your query. I suppose you are interested in semantic similarity or something like that.
Basically, you could reduce the number of joins by retrieving only the unique wdt: predicates with a nested SELECT DISTINCT.
SELECT ?property ?claimPredicateLabel ?claimPredicateDescription
WHERE {
hint:Query hint:optimizer "None" .
{
SELECT DISTINCT ?property {
VALUES (?s) {(wd:Q963) (wd:Q42320)}
?s wdt:P31/^(wdt:P31|wdt:P279) ?instances .
?instances ?property ?unknown .
}
}
?claimPredicate wikibase:directClaim ?property .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
Try it!
This is fast enough (~ 3s) even with SERVICE wikibase:label.
Also, you don't need FILTER(strstarts(str(?property),str(wdt:))) after ?claimPredicate wikibase:directClaim ?property.
As for hint:Query hint:optimizer "None", this hint forces Blazegraph to follow standard bottom-up evaluation order. In this particular query, hint:Query hint:optimizer "Runtime" or hint:SubQuery hint:runOnce true should also work.
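For comparison, the hint:SubQuery hint:runOnce true variant might look like the sketch below. It assumes the hint is placed inside the subquery's braces, which is how Blazegraph's query hints are usually shown; the rest of the query is unchanged from the one above.

```sparql
SELECT ?property ?claimPredicateLabel ?claimPredicateDescription
WHERE {
  {
    SELECT DISTINCT ?property {
      # Ask Blazegraph to evaluate this subquery once, before the outer joins.
      hint:SubQuery hint:runOnce true .
      VALUES (?s) {(wd:Q963) (wd:Q42320)}
      ?s wdt:P31/^(wdt:P31|wdt:P279) ?instances .
      ?instances ?property ?unknown .
    }
  }
  ?claimPredicate wikibase:directClaim ?property .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```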

Querying wikidata for "property constraint"

TL;DR
How can I query (with SPARQL) the properties of a property?
Or, in more detail:
As part of my project, I need to find the Wikidata properties that have a time constraint, specifically both "start time" and "end time".
I tried this query:
SELECT DISTINCT ?prop WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
?person wdt:P31 wd:Q5.
?person ?prop ?statement.
?statement pq:P580 ?starttime.
?statement pq:P582 ?endtime.
}
LIMIT 200
(The properties should be related to humans.)
I do get some good results, such as:
http://www.wikidata.org/prop/P26
http://www.wikidata.org/prop/P39
But I also get some other properties that are definitely wrong.
So basically, what I'm trying to do is get a list of properties that have the property constraint (P2302) "allowed qualifiers constraint" (Q21510851) with start time (P580) and end time (P582).
Is that even possible?
I tried some queries like:
SELECT DISTINCT ?property ?propertyLabel ?propertyDescription ?subpTypeOf ?subpTypeOfLabel
WHERE
{
?property rdf:type wikibase:Property .
?property wdt:P2302 ?subpTypeOf.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
but it does not return the results I wanted.
Is it even possible to query this kind of thing?
Thanks
Qualifiers are used on property pages too. Your second query should be:
SELECT DISTINCT ?prop ?propLabel {
?prop p:P2302 [ ps:P2302 wd:Q21510851 ; pq:P2306 wd:P580, wd:P582 ] ;
p:P2302 [ ps:P2302 wd:Q21503250 ; pq:P2308 wd:Q5 ; pq:P2309 wd:Q21503252 ] .
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
} ORDER BY ASC(xsd:integer(strafter(str(?prop), concat(str(wd:), "P"))))
Try it!
Your first query is correct, but note that it is an 'as-is' query: it reports how qualifiers are actually used on statements, not what the constraints allow. For example, wd:P410 does not have the respective constraints, but look at wd:Q83855.
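To inspect a single property, the same constraint pattern can be turned around to list its allowed qualifiers. This sketch uses P26 from the question's results as the example; the variable names are chosen here.

```sparql
SELECT ?qualifier ?qualifierLabel
WHERE {
  # Constraint statements live on the property's own entity.
  wd:P26 p:P2302 ?constraint .
  # Keep only the "allowed qualifiers constraint" ...
  ?constraint ps:P2302 wd:Q21510851 ;
              # ... and read each allowed qualifier from its P2306 qualifier.
              pq:P2306 ?qualifier .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
}
```

If both wd:P580 and wd:P582 appear in the output, the property satisfies the first half of the combined query above.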