Wikidata SPARQL - Duplicate results for spouse start time and end time

I am trying to construct a query that returns a list of actors and their spouses, including the marriage and divorce dates for each couple. I would expect each actor to appear once per marriage. However, when I include the start time and end time properties in the query, I get duplicate results. I suspect this is because the spouse's name and the dates are stored under different Wikidata prefixes and I'm not linking them correctly.
Here is a sample query:
SELECT ?person ?personLabel ?spouse ?spouseLabel ?starttime ?endtime
WHERE
{
?person wdt:P106 wd:Q33999, wd:Q2526255, wd:Q28389, wd:Q3282637;
wdt:P26 ?spouse.
?person p:P26 [pq:P580 ?starttime; pq:P582 ?endtime].
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC(UCASE(str(?personLabel)))
LIMIT 10
Here is a link to the query on the Wikidata Query Service so you can see the duplicated results I'm referring to:
https://query.wikidata.org/#SELECT%20%3Fperson%20%3FpersonLabel%20%3Fspouse%20%3FspouseLabel%20%3Fstarttime%20%3Fendtime%0AWHERE%0A%7B%0A%20%20%3Fperson%20wdt%3AP106%20wd%3AQ33999%2C%20wd%3AQ2526255%2C%20wd%3AQ28389%2C%20wd%3AQ3282637%3B%0A%20%20%20%20%20%20%20%20%20%20wdt%3AP26%20%3Fspouse.%0A%20%20%3Fperson%20p%3AP26%20%5Bpq%3AP580%20%3Fstarttime%3B%20pq%3AP582%20%3Fendtime%5D.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0AORDER%20BY%20ASC%28UCASE%28str%28%3FpersonLabel%29%29%29%0ALIMIT%2010%0A

The problem with your query is that there is no link between the spouse and the statement that carries the marriage dates.
So for every actor, you are returning every spouse combined with every start/end date pair of that actor's marriages, regardless of whether the dates belong to that specific spouse.
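To make the duplication concrete, here is a small Python sketch that imitates both join shapes with invented data (a hypothetical actor with two marriages; the spouse names and years are made up). The first function mirrors the question's query, where wdt:P26 and the p:P26/pq: pattern are matched independently; the second reads the spouse and the dates from the same statement, which is what ps:P26 achieves.

```python
# Hypothetical data: each dict stands for one P26 (spouse) statement node.
# The spouse is the statement's main value (ps:P26); the dates are qualifiers (pq:P580, pq:P582).
statements = [
    {"spouse": "Spouse A", "start": "1990", "end": "1995"},
    {"spouse": "Spouse B", "start": "2000", "end": "2010"},
]

def unlinked_rows(stmts):
    """Mimic the question's query: spouses (wdt:P26) and date pairs (p:P26/pq:*) are
    matched by separate triple patterns, so every spouse is paired with every date pair."""
    spouses = [s["spouse"] for s in stmts]
    date_pairs = [(s["start"], s["end"]) for s in stmts]
    return [(sp, start, end) for sp in spouses for (start, end) in date_pairs]

def linked_rows(stmts):
    """Mimic the fixed query: spouse and dates are read from the same statement node."""
    return [(s["spouse"], s["start"], s["end"]) for s in stmts]

print(len(unlinked_rows(statements)))  # 4: a cross product, 2 spouses x 2 date pairs
print(len(linked_rows(statements)))    # 2: one row per marriage
```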
What you need to do is read the spouse from the same statement node, using the ps: namespace, like so:
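SELECT ?person ?personLabel ?spouse ?spouseLabel ?starttime ?endtime
WHERE
{
?person wdt:P106 wd:Q33999, wd:Q2526255, wd:Q28389, wd:Q3282637 .
?person p:P26 ?marriage .                 # the spouse statement node
?marriage ps:P26 ?spouse .                # This is the necessary change: spouse and dates share one statement.
OPTIONAL { ?marriage pq:P580 ?starttime . }
OPTIONAL { ?marriage pq:P582 ?endtime . } # OPTIONAL keeps marriages that have no start or end date (e.g. still married)
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC(?personLabel)
LIMIT 10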
In general, the wdt: namespace links an entity directly to a value (the "truthy", best-rank value); p: links an entity to a statement node; ps: links a statement node to its main value; and pq: links a statement node to its qualifiers, which tell us something about that statement.
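A toy Python model of these links may help. The triples below use hypothetical IDs (wd:Qactor, wds:stmt1 and so on are placeholders, not real Wikidata entities), and follow() walks a SPARQL-style sequence path one predicate at a time. The last call shows why a qualifier cannot be reached from the value: it hangs off the statement node, not off the spouse.

```python
# Toy triples (hypothetical IDs) laid out the way Wikidata's RDF separates
# direct claims (wdt:) from statement nodes (p:, ps:, pq:).
TRIPLES = [
    ("wd:Qactor", "wdt:P26", "wd:Qspouse1"),  # direct ("truthy") link
    ("wd:Qactor", "p:P26", "wds:stmt1"),      # entity -> statement node
    ("wds:stmt1", "ps:P26", "wd:Qspouse1"),   # statement -> main value
    ("wds:stmt1", "pq:P580", "1990-01-01"),   # statement -> qualifier (start time)
]

def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def follow(subject, path, triples=TRIPLES):
    """Follow a sequence path such as ["p:P26", "ps:P26"] (SPARQL: p:P26/ps:P26)."""
    nodes = [subject]
    for predicate in path:
        nodes = [o for n in nodes for o in objects(n, predicate, triples)]
    return nodes

print(follow("wd:Qactor", ["wdt:P26"]))             # ['wd:Qspouse1']
print(follow("wd:Qactor", ["p:P26", "ps:P26"]))     # ['wd:Qspouse1']
print(follow("wd:Qactor", ["p:P26", "pq:P580"]))    # ['1990-01-01']
print(follow("wd:Qactor", ["wdt:P26", "pq:P580"]))  # []: qualifiers are not on the value
```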

Related

How to relate p[s[n]] and w[d[t]] properties in wikidata sparql query?

I am trying to get all quantitative values of a given wd:* entry, including the statement details (qualifiers?), foremost the unit, since most numeric values are useless without it.
I was able to come up with the following query to get all quantitative values:
SELECT ?p ?property ?propertyLabel ?propertyDescription ?v WHERE {
wd:Q1726 ?p ?v.
?property wikibase:propertyType wikibase:Quantity;
wikibase:directClaim ?p.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?v
p | property | propertyLabel | propertyDescription | v
wdt:P2046 | wd:P2046 | area | area occupied by an object | 310.71
As you can see, this works well for getting the value of 310.71 km² as well as the label and description of the property.
Knowing that the property is wd[t]:P2046, I came up with a second query to get the value with its unit for its best-ranked statement:
SELECT ?statement ?value ?valueLabel ?unit ?unitLabel WHERE {
wd:Q1726 p:P2046 ?statement.
?statement psn:P2046 ?valuenode.
?valuenode wikibase:quantityAmount ?value.
?valuenode wikibase:quantityUnit ?unit.
?statement a wikibase:BestRank.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
statement | value | valueLabel | unit | unitLabel
wds:Q1726-33fec614-441a-a6c4-06cd-b79490a52c4b | 310710000 | 310710000 | wd:Q25343 | square metre
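The two results describe the same area in different units. Assuming the stored amount is in square kilometres (as the 310 km² above suggests), psn: returns it normalized to square metres, the SI unit. A quick check of the conversion, using Decimal to avoid floating-point rounding:

```python
from decimal import Decimal

KM2_TO_M2 = Decimal(1000) ** 2  # 1 km = 1000 m, so 1 km² = 1,000,000 m²

def km2_to_m2(amount_km2: str) -> Decimal:
    """Convert an area given in square kilometres to square metres."""
    return Decimal(amount_km2) * KM2_TO_M2

print(km2_to_m2("310.71"))  # 310710000.00
```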
However, I have no idea how to join these two queries, i.e. how to relate ?v from the first one to ?statement from the second one. In the end, what I'm looking for is to add a unit column to the first query. All examples and explanations I've seen regarding qualifiers search for a given property, and I'm struggling to understand how to relate a statement to statement/value-normalized/ and statement/value/, or how to filter for a property to be one of those.
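One common way to relate the two, sketched here and not tested against the live endpoint, is to let the property entity itself supply the statement predicates: Wikidata's RDF links each property to its p: form via wikibase:claim and to its psv: form via wikibase:statementValue, just as wikibase:directClaim links it to wdt: in the first query. The Python helper below only assembles that query text; its name is hypothetical. The psv: value node keeps the original unit (so ?amount should match ?v from the first query); swapping in wikibase:statementValueNormalized should give the psn: (normalized) variant instead.

```python
def quantity_with_units_query(item_qid: str, lang: str = "en") -> str:
    """Build a SPARQL query listing every quantity-valued best-rank statement of an item
    with its unit. Relies on wikibase:claim (p:) and wikibase:statementValue (psv:)."""
    return f"""SELECT ?property ?propertyLabel ?amount ?unit ?unitLabel WHERE {{
  ?property wikibase:propertyType wikibase:Quantity ;
            wikibase:claim ?claim ;               # e.g. p:P2046
            wikibase:statementValue ?valueProp .  # e.g. psv:P2046
  wd:{item_qid} ?claim ?statement .
  ?statement ?valueProp ?valueNode ;
             a wikibase:BestRank .
  ?valueNode wikibase:quantityAmount ?amount ;
             wikibase:quantityUnit ?unit .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],{lang}". }}
}}
ORDER BY ?amount"""

print(quantity_with_units_query("Q1726"))
```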

How can I group multiple results into one cell with SPARQL in Wikidata

I'm trying to pull (lots of) data for one of my projects.
Specifically, I'm trying to get some data on biblical figures.
However, I've noticed that when there are multiple results per column, each result ends up in a new row. There seems to be no option to put multiple results in one row, with a separator for example.
For example, since some biblical figures have more than one sibling, I get the results in multiple rows.
Here's an example query with siblings.
I tried to group by but got an error:
select ?person ?personLabel ?siblingLabel (GROUP_CONCAT(?personLabel) AS ?personLabels)
where {
?person wdt:P31 wd:Q20643955.
?person wdt:P3373 ?sibling.
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
}
}
GROUP BY ?person
ORDER BY ?personLabel
If you want to have all siblings in a cell, you have to use GROUP_CONCAT on ?siblingLabel, not ?personLabel. To omit duplicate labels, you can add DISTINCT to it. To use a delimiter (e.g., a semicolon), you can add SEPARATOR to it.
(GROUP_CONCAT(DISTINCT ?siblingLabel; SEPARATOR="; ") AS ?siblingLabels)
You also have to add all other selected (non-aggregated) variables to the GROUP BY.
As you are getting the labels with Wikidata’s label service, one more step is needed: you either have to use a sub-query, or you have to list the labels you need inside the SERVICE block.
Using the latter, your query could be:
SELECT ?person ?personLabel (GROUP_CONCAT(DISTINCT ?siblingLabel; SEPARATOR="; ") AS ?siblingLabels)
WHERE {
?person wdt:P31 wd:Q20643955 ;
wdt:P3373 ?sibling .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
?sibling rdfs:label ?siblingLabel .
?person rdfs:label ?personLabel .
}
}
GROUP BY ?person ?personLabel
ORDER BY ?personLabel
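The effect of GROUP_CONCAT(DISTINCT ...; SEPARATOR="; ") can be imitated in plain Python. This sketch groups hypothetical rows (placeholder IDs, one row per person/sibling pair, as the ungrouped query returns them) by person and label, drops repeated sibling labels, and joins the rest. SPARQL does not guarantee the order of values inside GROUP_CONCAT; the sketch simply keeps first-seen order.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical ungrouped rows: (person, personLabel, siblingLabel).
rows = [
    ("ex:person1", "Moses", "Aaron"),
    ("ex:person1", "Moses", "Miriam"),
    ("ex:person1", "Moses", "Aaron"),  # repeated label, removed by DISTINCT
    ("ex:person2", "Aaron", "Moses"),
]

def group_concat(rows, separator="; "):
    """Imitate GROUP BY ?person ?personLabel with
    GROUP_CONCAT(DISTINCT ?siblingLabel; SEPARATOR=separator)."""
    key = itemgetter(0, 1)  # the GROUP BY variables
    result = []
    for (person, label), group in groupby(sorted(rows, key=key), key=key):
        distinct = dict.fromkeys(row[2] for row in group)  # dict keeps first-seen order
        result.append((person, label, separator.join(distinct)))
    return result

print(group_concat(rows))
# [('ex:person1', 'Moses', 'Aaron; Miriam'), ('ex:person2', 'Aaron', 'Moses')]
```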

How to get pseudonyms from Wikidata?

I want to extract Bob Dylan's pseudonyms. His Wikidata page is https://www.wikidata.org/wiki/Q392
One way is to walk the following tree and parse value for P742 (pseudonym): https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q392&format=json
But that's a lot of work and specific to Bob Dylan; other people with pseudonyms may have a different JSON structure.
This query works for his awards:
SELECT ?item ?itemLabel ?linkTo ?linkToLabel {
?item wdt:P166 wd:Q37922 ;
wdt:P910 wd:Q8064684 .
OPTIONAL {
?item wdt:P166 ?linkTo ;
wdt:P31 wd:Q5 } .
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
The weakness is it uses the combination of "topic's main category" (P910) and the target "category" (Q8064684) to specify "Bob Dylan".
For that query, I can't find the method to use his QID (Q392).
I do not want to use a text search, as proposed in Group concat not working, because there is more than one Bob Dylan. And many people with pseudonyms will not have a category, so that technique cannot be generalized.
This query works for occupations and uses his QID (Q392), not the Bob Dylan category (Q8064684) from the "awards" query.
SELECT * {
SERVICE <https://query.wikidata.org/sparql> {
wd:Q392 wdt:P31 wd:Q5 ;
wdt:P106 ?occupation
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
?occupation rdfs:label ?label
}
}
}
but I cannot find the right SPARQL query to select pseudonyms (which are text) as seen here:
https://www.wikidata.org/wiki/Q392
How to write a Wikidata SPARQL query that uses a person's identifier (e.g. Q392 for Bob Dylan) and can be configured (easily toggled) to select either "award received" (P166), "occupations" (P106) or "pseudonym" (P742)?
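One way to make the property switchable is to put the QID directly in the subject position and vary only the property ID, which avoids the category trick entirely. Below is a minimal Python sketch that builds such a query; PROPERTIES and person_property_query are hypothetical helper names, and the query uses a plain wdt: lookup, so it returns best-rank values only. For P166 and P106 the readable name is in ?valueLabel; pseudonyms are text rather than items, so the pseudonym itself is in ?value.

```python
# Properties named in the question; the key chooses what the query returns.
PROPERTIES = {
    "award": "P166",       # award received (items)
    "occupation": "P106",  # occupation (items)
    "pseudonym": "P742",   # pseudonym (text)
}

def person_property_query(qid: str, kind: str, lang: str = "en") -> str:
    """Build a query returning the best-rank values of one property for one person."""
    pid = PROPERTIES[kind]  # raises KeyError for an unknown kind
    return (
        "SELECT ?value ?valueLabel WHERE {\n"
        f"  wd:{qid} wdt:{pid} ?value .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}\n'
        "}"
    )

print(person_property_query("Q392", "pseudonym"))
```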

Bad aggregate when adding extra items to select wikidata sparql

I am looking to retrieve data for a given location.
Using the query below, I am able to retrieve the bordering locations of France and Scotland.
SELECT (GROUP_CONCAT(?borderLabel;separator=",") AS ?borders)
WHERE {
?location wdt:P47 ?border.
?location wdt:P2046 ?area.
?location wdt:P1082 ?population.
FILTER (?location=wd:Q142 || ?location=wd:Q22)
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en".
?border rdfs:label ?borderLabel
}
}
GROUP BY ?location
But as soon as I add anything else to the SELECT, i.e. SELECT ?locationLabel (GROUP_CONCAT(?borderLabel;separator=",") AS ?borders), it tells me: Query is malformed: Bad aggregate
I am, however, able to add ?location to return the wd entity value without issue.
What is wrong here?
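The most likely cause is the SPARQL 1.1 grouping rule: in a grouped query, every selected variable must either appear in GROUP BY or be produced by an aggregate. ?location is in GROUP BY, so it works; ?locationLabel is not. As in the sibling-list answer earlier on this page, binding the label explicitly inside the label SERVICE and adding it to GROUP BY should fix it. A minimal sketch that assembles the corrected query in Python (borders_query is a hypothetical helper; VALUES replaces the FILTER and selects the same two locations; DISTINCT guards against repeated border labels if a location has more than one area or population value):

```python
def borders_query(qids, lang="en"):
    """Build the grouped borders query with ?locationLabel bound and grouped explicitly."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    return f"""SELECT ?location ?locationLabel (GROUP_CONCAT(DISTINCT ?borderLabel; separator=",") AS ?borders)
WHERE {{
  VALUES ?location {{ {values} }}
  ?location wdt:P47 ?border .
  ?location wdt:P2046 ?area .
  ?location wdt:P1082 ?population .
  SERVICE wikibase:label {{
    bd:serviceParam wikibase:language "{lang}".
    ?border rdfs:label ?borderLabel .
    ?location rdfs:label ?locationLabel .
  }}
}}
GROUP BY ?location ?locationLabel"""

print(borders_query(["Q142", "Q22"]))
```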

Retrieving data from blank nodes in Wikidata

I am attempting to retrieve data about the lifespans of certain people. This is problematic for people who lived a long time ago. The data for Pythagoras, for example, seems to have a so-called "blank node" for date of birth (P569). But this blank node references another node, earliest date (P1319), which has data I could work with just fine.
But for some reason I am not able to retrieve that node. My first try looked like this, but it results in a completely empty result set:
SELECT DISTINCT ?person ?name ?dateofbirth ?earliestdateofbirth WHERE {
?person wdt:P31 wd:Q5. # This thing is Human
?person rdfs:label ?name. # Name for better conformation
?person wdt:P569 ?dateofbirth. # Birthday may result in a blank node
?dateofbirth wdt:P1319 ?earliestdateofbirth # Problem: plausible birth
}
I then found another syntax, ?person wdt:P569/wdt:P1319 ?earliestdateofbirth, suggested as a kind of "shortcut" for the explicit navigation I did above, but this also ends with an empty result set.
SELECT DISTINCT ?person ?name ?dateofbirth ?earliestdateofbirth WHERE {
?person wdt:P31 wd:Q5. # Is Human
?person rdfs:label ?name. # Name for better conformation
?person wdt:P569/wdt:P1319 ?earliestdateofbirth.
}
So how do I access a node referenced by a blank node (in my case specifically the earliest birthdate) in Wikidata?
But this blank node references another node…
Things are slightly different. The earliest date property is not a property of _:t550690019, but rather is a property of the statement wd:Q10261 wdt:P569 _:t550690019.
In the Wikidata data model, these annotations are expressed using qualifiers.
Your query should be:
SELECT DISTINCT ?person ?name ?dateofbirth ?earliestdateofbirth WHERE {
VALUES (?person) {(wd:Q10261)}
?person wdt:P31 wd:Q5. # --Is human
?person rdfs:label ?name. # --Name for better conformation
?person p:P569/pq:P1319 ?earliestdateofbirth.
FILTER (lang(?name) = "en")
}
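By the way, time precision (which tells you how much of the date is actually known) is not a qualifier: it is stored on the full value node reached through psv:P569, as wikibase:timePrecision. The query below joins that integer against the precision items that carry it as P2803: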
SELECT ?person ?personLabel ?value ?precisionLabel {
VALUES (?person) {(wd:Q859) (wd:Q9235)}
?person wdt:P31 wd:Q5 ;
p:P569/psv:P569 [ wikibase:timeValue ?value ;
wikibase:timePrecision ?precisionInteger ]
{
SELECT ?precision (xsd:integer(?precisionDecimal) AS ?precisionInteger) {
?precision wdt:P2803 ?precisionDecimal .
}
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
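If you would rather decode the precision on the client than join on P2803, a Python sketch follows. The integer codes are the ones Wikibase uses for wikibase:timePrecision (9 = year, 10 = month, 11 = day; smaller numbers are coarser); only a subset is listed, and the helper names are hypothetical. visible_date trims an xsd:dateTime string to the part the precision supports.

```python
# Subset of Wikibase time precision codes (values of wikibase:timePrecision).
PRECISION_NAMES = {
    6: "millennium",
    7: "century",
    8: "decade",
    9: "year",
    10: "month",
    11: "day",
}

def describe_precision(code: int) -> str:
    """Return a readable name for a precision code, or a generic fallback."""
    return PRECISION_NAMES.get(code, f"precision code {code}")

def visible_date(time_value: str, code: int) -> str:
    """Trim an xsd:dateTime such as '1879-03-14T00:00:00Z' to what the precision supports."""
    date_part = time_value.split("T", 1)[0]
    sign = "-" if date_part.startswith("-") else ""  # BCE years carry a leading minus
    year, month, day = date_part.lstrip("+-").split("-")
    if code >= 11:  # day or finer: show the full date
        return f"{sign}{year}-{month}-{day}"
    if code == 10:
        return f"{sign}{year}-{month}"
    if code == 9:
        return f"{sign}{year}"
    # Coarser than a year: the stored year is only a representative value.
    return f"{sign}{year} ({describe_precision(code)})"

print(visible_date("1879-03-14T00:00:00Z", 11))  # 1879-03-14
print(visible_date("-0569-01-01T00:00:00Z", 9))  # -0569
```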