wikidata sparql query timeout optimisation


I want to retrieve all musicians (occupation Q639669 or a subclass of it) born after 1900 in a given city (place of birth P19, followed up the P131 hierarchy). When I pass in the Wikidata example city Rotterdam (Q34370), it works. However, if I replace it with a larger city such as Paris (Q90), the query times out.
Is there a way to optimise this, or to split it into chunks so I can make repeated queries?
I'm actually only interested in the number of results (i.e. a single value); I don't need the metadata about the artist name, etc.
It would be really helpful if someone could give me pointers to solving this. Thanks!
SELECT ?itemLabel ?itemDescription ?birth
WHERE {
  ?item wdt:P106/wdt:P279* wd:Q639669 .
  ?item wdt:P19/wdt:P131* wd:Q34370 .
  OPTIONAL { ?item wdt:P569 ?birth }
  FILTER(?birth > "1900-01-01"^^xsd:dateTime)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}

The * (ZeroOrMore) property path operator used on P131 ("located in the administrative territorial entity") is one of the culprits here. A simple way to get an answer is to run queries manually that build up that property path one element at a time:
In query 1: ?item wdt:P19 wd:Q34370 .
In query 2: ?item wdt:P19/wdt:P131 wd:Q34370 .
In query 3: ?item wdt:P19/wdt:P131/wdt:P131 wd:Q34370 .
etc.
I found through experimentation that there is no data past 3 occurrences of P131. However, be aware that there are duplicates across these queries, because some people are listed with birthplaces both "in Paris" and in a sub-region of Paris (for example, Claude Arrieu (Q272886) is listed as being born in both Paris and the 8th arrondissement).
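The manual depth-by-depth approach can be scripted. The Python sketch below only builds the query strings (running them is up to you, for example in the Wikidata Query Service); the function names are hypothetical, and it binds ?birth without OPTIONAL, following the OPTIONAL/FILTER comment in this answer. Because the same person can appear at several depths, it merges the returned item IRIs into a set instead of adding up per-query counts.

```python
def depth_query(city_qid: str, depth: int) -> str:
    """Build the query for birthplaces exactly `depth` P131 hops below the city."""
    # depth 0 -> wdt:P19 ; depth 2 -> wdt:P19/wdt:P131/wdt:P131
    path = "/".join(["wdt:P19"] + ["wdt:P131"] * depth)
    return (
        "SELECT DISTINCT ?item WHERE {\n"
        "  ?item wdt:P106/wdt:P279* wd:Q639669 .\n"
        f"  ?item {path} wd:{city_qid} .\n"
        "  ?item wdt:P569 ?birth .\n"
        '  FILTER(?birth > "1900-01-01"^^xsd:dateTime)\n'
        "}"
    )


def count_distinct_items(results_per_depth: list[list[str]]) -> int:
    """Merge the item IRIs returned at every depth and count each person once."""
    seen: set[str] = set()
    for items in results_per_depth:
        seen.update(items)
    return len(seen)
```

Usage: generate `depth_query("Q90", d)` for d = 0, 1, 2, 3, run each one separately, collect the ?item values per query, and pass the lists to `count_distinct_items`.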
You can also use UNION to put several of these property paths together into a single query, though be aware that this may increase the query time and move you back towards a timeout depending on the data:
SELECT ?item ?itemLabel ?itemDescription ?birth WHERE {
  ?item wdt:P106/wdt:P279* wd:Q639669 .
  {
    ?item wdt:P19 wd:Q90 .
  } UNION {
    ?item wdt:P19/wdt:P131 wd:Q90 .
  } UNION {
    ?item wdt:P19/wdt:P131/wdt:P131 wd:Q90 .
  } UNION {
    ?item wdt:P19/wdt:P131/wdt:P131/wdt:P131 wd:Q90 .
  } UNION {
    ?item wdt:P19/wdt:P131/wdt:P131/wdt:P131/wdt:P131 wd:Q90 .
  }
  OPTIONAL {
    ?item wdt:P569 ?birth
  }
  FILTER(?birth > "1900-01-01"^^xsd:dateTime)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
}
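If you run a count version of this UNION query from a script instead of the web UI (trimmed here to three P131 hops, following the experiment described above), the endpoint answers in the standard SPARQL JSON results format. A minimal standard-library Python sketch, assuming the public endpoint https://query.wikidata.org/sparql; the User-Agent value is a placeholder and the function names are hypothetical. Only `extract_count` works offline; `fetch_json` makes a network request.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

COUNT_QUERY = """
SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {
  ?item wdt:P106/wdt:P279* wd:Q639669 .
  { ?item wdt:P19 wd:Q90 . }
  UNION { ?item wdt:P19/wdt:P131 wd:Q90 . }
  UNION { ?item wdt:P19/wdt:P131/wdt:P131 wd:Q90 . }
  UNION { ?item wdt:P19/wdt:P131/wdt:P131/wdt:P131 wd:Q90 . }
  ?item wdt:P569 ?birth .
  FILTER(?birth > "1900-01-01"^^xsd:dateTime)
}
"""


def fetch_json(query: str) -> dict:
    """POST the query and return the decoded SPARQL JSON result (network call)."""
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=data,  # POST avoids URL length limits for long queries
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder: send a descriptive User-Agent identifying your script.
            "User-Agent": "example-count-script/0.1 (replace-with-your-contact)",
        },
    )
    with urllib.request.urlopen(req, timeout=70) as resp:
        return json.load(resp)


def extract_count(result: dict, var: str = "count") -> int:
    """Read the single COUNT binding from a SPARQL JSON result."""
    bindings = result["results"]["bindings"]
    return int(bindings[0][var]["value"])
```

Usage: `extract_count(fetch_json(COUNT_QUERY))`. Because the label service is not needed for a bare count, the query leaves it out.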
A couple of other comments:
If you only want the number of people, you can replace the SELECT variables with a count, which may also improve the runtime a bit: SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE { … }. This returns the answer 1440.
The use of OPTIONAL to bind the ?birth variable, combined with a FILTER outside the OPTIONAL, may not be what you want. The FILTER removes every result where ?birth is unbound, which makes the OPTIONAL effectively non-optional. Consider either removing the OPTIONAL and binding ?birth right next to the FILTER, or, if you also want people with no recorded birth date, keeping the OPTIONAL and writing the filter as FILTER(!BOUND(?birth) || ?birth > "1900-01-01"^^xsd:dateTime). Simply moving the FILTER inside the OPTIONAL changes the count from 1440 to 2456, but be careful: inside the OPTIONAL the FILTER only controls whether ?birth gets bound, so that higher count also includes musicians born before 1900, not just those missing a birth date.
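The difference between these variants can be modelled in a few lines of Python. This is a hand-written illustration of the OPTIONAL/FILTER semantics on a hypothetical three-person dataset, not a SPARQL engine:

```python
from datetime import date
from typing import Dict, Optional, Set

# Hypothetical data: ?item -> value of wdt:P569, or None when the item has no birth date.
PEOPLE: Dict[str, Optional[date]] = {
    "A": date(1950, 5, 1),  # born after 1900
    "B": date(1880, 3, 2),  # born before 1900
    "C": None,              # no birth date recorded
}
CUTOFF = date(1900, 1, 1)


def filter_outside_optional() -> Set[str]:
    # OPTIONAL leaves ?birth unbound for C; the outer FILTER then fails on the
    # unbound variable, so C is dropped along with B.
    return {p for p, b in PEOPLE.items() if b is not None and b > CUTOFF}


def filter_inside_optional() -> Set[str]:
    # The FILTER only decides whether ?birth is bound; no row is removed, so
    # B (born 1880) and C (no date) both survive with ?birth unbound.
    return set(PEOPLE)


def keep_unknown_dates() -> Set[str]:
    # OPTIONAL, then FILTER(!BOUND(?birth) || ?birth > "1900-01-01"^^xsd:dateTime)
    # outside it: keeps items without a date but still excludes earlier births.
    return {p for p, b in PEOPLE.items() if b is None or b > CUTOFF}
```

The three functions return {"A"}, {"A", "B", "C"} and {"A", "C"} respectively.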

Related

How to relate p[s[n]] and w[d[t]] properties in wikidata sparql query?

I am trying to get all quantitative values of a given wd:* entry, including the statement details (qualifiers?), foremost the unit, since most numeric values are useless without it.
I was able to come up with the following query to get all quantitative values:
SELECT ?p ?property ?propertyLabel ?propertyDescription ?v WHERE {
  wd:Q1726 ?p ?v.
  ?property wikibase:propertyType wikibase:Quantity;
            wikibase:directClaim ?p.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?v
p | property | propertyLabel | propertyDescription | v
wdt:P2046 | wd:P2046 | area | area occupied by an object | 310.71
As you can see, this works well for getting the value (310.71, i.e. roughly 310 km²) as well as the label and description of the property I am looking for.
Knowing that the property is wd[t]:P2046, I came up with a second query to get the value with qualifiers for its best statement:
SELECT ?statement ?value ?valueLabel ?unit ?unitLabel WHERE {
  wd:Q1726 p:P2046 ?statement.
  ?statement psn:P2046 ?valuenode.
  ?valuenode wikibase:quantityAmount ?value.
  ?valuenode wikibase:quantityUnit ?unit.
  ?statement a wikibase:BestRank.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
statement | value | valueLabel | unit | unitLabel
wds:Q1726-33fec614-441a-a6c4-06cd-b79490a52c4b | 310710000 | 310710000 | wd:Q25343 | square metre
However, I have no idea how to join these two queries to relate ?v from the first one with ?statement from the second one. In the end, what I'm looking for is to add a unit column to the first query. All the examples and explanations I've seen regarding qualifiers search for a given property, and I'm struggling to understand how to relate a statement with statement/value-normalized/ and statement/value/, or how to filter for a property to be one of those.
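One way to relate the two, sketched here rather than tested against the live service: the Wikibase RDF mapping links each property entity to its prefixed predicates, so `?property wikibase:claim ?p` yields the p: predicate, and `wikibase:statementValue` / `wikibase:statementValueNormalized` yield the psv: / psn: predicates. Matching on those lets a single query return the amount and unit for every quantity property. The Python helper `quantity_query` is hypothetical and only assembles the query text; with `normalized=False` it should return the unit as entered (for example square kilometre), with `normalized=True` the SI-normalized value, like the square metre figure from the second query.

```python
def quantity_query(item_qid: str, normalized: bool = False, lang: str = "en") -> str:
    """Build a query listing every quantity value of an item together with its unit."""
    # wikibase:statementValue points from a property to its psv: predicate (value as entered);
    # wikibase:statementValueNormalized points to psn: (value converted to SI units).
    value_link = (
        "wikibase:statementValueNormalized" if normalized else "wikibase:statementValue"
    )
    return f"""SELECT ?property ?propertyLabel ?amount ?unit ?unitLabel WHERE {{
  ?property wikibase:propertyType wikibase:Quantity ;
            wikibase:claim ?p ;
            {value_link} ?valuePred .
  wd:{item_qid} ?p ?statement .
  ?statement a wikibase:BestRank ;
             ?valuePred ?valuenode .
  ?valuenode wikibase:quantityAmount ?amount ;
             wikibase:quantityUnit ?unit .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],{lang}". }}
}}
ORDER BY ?amount"""
```

The key step is that ?p (from wikibase:claim) and ?valuePred (from the statementValue link) are bound from the same ?property, which ties each statement to its value node and therefore to its unit.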

Wikidata SPARQL - Duplicate results for spouse start time and end time

I am trying to construct a query that returns a list of actors and their spouses, including the marriage and divorce dates for each couple. I would expect each actor to be repeated once per relationship... however, when I include the start time and end time properties in the query, I get duplicate results. I suspect this is because the spouse's name and the dates are stored under different Wikidata prefixes and I'm not grouping them correctly.
Here is a sample query:
SELECT ?person ?personLabel ?spouse ?spouseLabel ?starttime ?endtime
WHERE
{
  ?person wdt:P106 wd:Q33999, wd:Q2526255, wd:Q28389, wd:Q3282637;
          wdt:P26 ?spouse.
  ?person p:P26 [pq:P580 ?starttime; pq:P582 ?endtime].
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC(UCASE(str(?personLabel)))
LIMIT 10
Here is a link to the Wikidata Query Service so you can see the duplicated results I'm referring to:
https://query.wikidata.org/#SELECT%20%3Fperson%20%3FpersonLabel%20%3Fspouse%20%3FspouseLabel%20%3Fstarttime%20%3Fendtime%0AWHERE%0A%7B%0A%20%20%3Fperson%20wdt%3AP106%20wd%3AQ33999%2C%20wd%3AQ2526255%2C%20wd%3AQ28389%2C%20wd%3AQ3282637%3B%0A%20%20%20%20%20%20%20%20%20%20wdt%3AP26%20%3Fspouse.%0A%20%20%3Fperson%20p%3AP26%20%5Bpq%3AP580%20%3Fstarttime%3B%20pq%3AP582%20%3Fendtime%5D.%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0AORDER%20BY%20ASC%28UCASE%28str%28%3FpersonLabel%29%29%29%0ALIMIT%2010%0A
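The link above carries the whole query, percent-encoded, in the URL fragment. To generate or read such links programmatically, a small Python sketch (function names hypothetical) using only `urllib.parse`:

```python
from urllib.parse import quote, unquote

WDQS_UI = "https://query.wikidata.org/#"


def share_link(query: str) -> str:
    """Encode a query into a Query Service UI link (query text in the fragment)."""
    return WDQS_UI + quote(query, safe="")


def query_from_link(link: str) -> str:
    """Recover the query text from such a link."""
    return unquote(link.split("#", 1)[1])
```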

The problem with your query is that there is no link between the spouse and the statement about their marriage.
So for every actor, you are returning all their spouses, and also all the start/end dates of their marriages, regardless of whether they relate to the specific spouse.
What you need to do is to use the ps: namespace, like so:
SELECT ?person ?personLabel ?spouse ?spouseLabel ?starttime ?endtime
WHERE
{
  ?person wdt:P106 wd:Q33999, wd:Q2526255, wd:Q28389, wd:Q3282637 .
  ?person p:P26 [ ps:P26 ?spouse ;  # This is the necessary change.
                  pq:P580 ?starttime ;
                  pq:P582 ?endtime ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC(?personLabel)
LIMIT 10
In general, the wdt: namespace links entities directly, the p: namespace links an entity to a statement, ps: links a statement to its main value (here, the spouse), and pq: links a statement to its qualifiers, which tell us something about the statement.
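The row multiplication described in this answer can be reproduced in plain Python with hypothetical data for one person with two marriages:

```python
from itertools import product

# Hypothetical data: two spouses and the two P26 statements carrying the dates.
spouses = ["S1", "S2"]
statements = [
    {"spouse": "S1", "start": "1990", "end": "2000"},
    {"spouse": "S2", "start": "2005", "end": "2015"},
]

# Original query: wdt:P26 ?spouse and p:P26 [pq:...] are independent patterns,
# so every spouse is paired with every statement's dates (2 x 2 = 4 rows).
unlinked = [(s, st["start"], st["end"]) for s, st in product(spouses, statements)]

# Fixed query: ps:P26 ?spouse is read from the same statement as the qualifiers,
# so each spouse appears only with the dates of their own marriage (2 rows).
linked = [(st["spouse"], st["start"], st["end"]) for st in statements]
```

Note that both the fixed query and this model require a start and an end date: a marriage without an end-time qualifier (pq:P582), for example one that has not ended, is dropped unless those qualifiers are wrapped in OPTIONAL.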

How to get pseudonyms from Wikidata?

I want to extract Bob Dylan's pseudonyms. His Wikidata page is https://www.wikidata.org/wiki/Q392
One way is to walk the following tree and parse the value of P742 (pseudonym): https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q392&format=json
But that's a lot of work and specific to Bob Dylan; other people with pseudonyms may have a different JSON structure.
This query works for his awards:
SELECT ?item ?itemLabel ?linkTo ?linkToLabel {
  ?item wdt:P166 wd:Q37922 ;
        wdt:P910 wd:Q8064684 .
  OPTIONAL {
    ?item wdt:P166 ?linkTo ;
          wdt:P31 wd:Q5 } .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
The weakness is that it uses the combination of "topic's main category" (P910) and the target category (Q8064684) to specify "Bob Dylan".
I can't find a way to make that query use his QID (Q392) instead.
I do not want to use a text search, as proposed in "Group concat not working", because there is more than one Bob Dylan. Also, many people with pseudonyms will not have a category, so that technique cannot be generalized.
This query works for occupations and uses his QID (Q392), not the Bob Dylan category (Q8064684) from the "awards" query.
SELECT * {
  SERVICE <https://query.wikidata.org/sparql> {
    wd:Q392 wdt:P31 wd:Q5 ;
            wdt:P106 ?occupation
    SERVICE wikibase:label {
      bd:serviceParam wikibase:language "en" .
      ?occupation rdfs:label ?label
    }
  }
}
but I cannot find the right SPARQL query to select pseudonyms (which are text) as seen here:
https://www.wikidata.org/wiki/Q392
How to write a Wikidata SPARQL query that uses a person's identifier (e.g. Q392 for Bob Dylan) and can be configured (easily toggled) to select either "award received" (P166), "occupations" (P106) or "pseudonym" (P742)?
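All three are direct (wdt:) properties of the person, so toggling between them amounts to swapping the property ID after `wd:Q392 wdt:`. A Python sketch that assembles such a query; the `PROPERTIES` mapping and the `person_property_query` name are hypothetical. For a text-valued property like P742, the pseudonyms are in ?value itself; ?valueLabel is only meaningful for item-valued properties such as P166 and P106.

```python
# Hypothetical toggle table for the three properties in the question.
PROPERTIES = {
    "award": "P166",       # award received: values are items
    "occupation": "P106",  # occupation: values are items
    "pseudonym": "P742",   # pseudonym: values are text
}


def person_property_query(qid: str, kind: str, lang: str = "en") -> str:
    """Select all direct (wdt:) values of one property for one person."""
    pid = PROPERTIES[kind]  # raises KeyError for an unknown toggle name
    return (
        "SELECT ?value ?valueLabel WHERE {\n"
        f"  wd:{qid} wdt:{pid} ?value .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}\n'
        "}"
    )
```

Usage: `person_property_query("Q392", "pseudonym")`, `person_property_query("Q392", "award")`, and so on. Because the person is identified by QID, no category or text search is involved.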

Wikidata query duplicates

Sorry if my English is bad, but I don't really have anywhere to ask this question in my native language.
I've been trying to create a SPARQL query for Wikidata that lists all horror fiction created between 1925 and 1950, together with the authors' names and, if available, pictures:
SELECT DISTINCT ?item ?itemLabel ?author ?name ?creation ?picture
WHERE
{
  ?item wdt:P136 wd:Q193606 .  # genre: horror fiction
  ?item wdt:P50 ?author .      # author
  ?item wdt:P577 ?creation .
  ?item wdt:P577 ?end .
  ?author rdfs:label ?name .
  OPTIONAL { ?item wdt:P18 ?picture }
  FILTER (?creation >= "1925-01-01T00:00:00Z"^^xsd:dateTime) .
  FILTER (?end <= "1950-12-31T23:59:59Z"^^xsd:dateTime) .
  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "en" .
  }
}
However, for some reason this query produces duplicates in the list, and DISTINCT doesn't help. After some time I figured out that the cause is the line "?author rdfs:label ?name .": if it is removed, no duplicates are listed. But I need this line to show the author's name in the list!
Any ideas on how to fix this?

You don't need "?author rdfs:label ?name .", because SERVICE wikibase:label already gives you labels: ?itemLabel for the item and, if you add it to the SELECT, ?authorLabel for the author.
Even then, you will get duplicate rows for every item that has a SELECTed property with possibly multiple values: here you are SELECTing authors (P50), which creates duplicates for every item with several authors.

The query is actually giving you distinct results. The problem is that some items have multiple rdfs:label values (one per language). You can see this, for example, with this item:
SELECT *
WHERE
{
  wd:Q2882840 rdfs:label ?label
  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "en" .
  }
}
Since some items have several rdfs:label values, each label shows up in a separate row. Adding FILTER(LANG(?name) = "en") keeps only the English label of each author.

You can aggregate your results per book (the item and its label) using the GROUP BY keyword.
Every group then appears once, and fields with several different values are combined with GROUP_CONCAT using the separator (in this case, a comma).
The fixed query:
SELECT DISTINCT ?item ?itemLabel
  (GROUP_CONCAT(DISTINCT ?author; separator=",") AS ?authors)  # AS needs a new variable name
  (GROUP_CONCAT(DISTINCT ?name; separator=",") AS ?names)
  (GROUP_CONCAT(DISTINCT ?creation; separator=",") AS ?creations)
  (GROUP_CONCAT(DISTINCT ?picture; separator=",") AS ?pictures)
WHERE
{
  ?item wdt:P136 wd:Q193606 .  # genre: horror fiction
  ?item wdt:P50 ?author .      # author
  ?item wdt:P577 ?creation .
  ?item wdt:P577 ?end .
  ?author rdfs:label ?name .
  FILTER (LANG(?name) = "en")  # one label per author instead of one per language
  OPTIONAL { ?item wdt:P18 ?picture }
  FILTER (?creation >= "1925-01-01T00:00:00Z"^^xsd:dateTime) .
  FILTER (?end <= "1950-12-31T23:59:59Z"^^xsd:dateTime) .
  SERVICE wikibase:label
  {
    bd:serviceParam wikibase:language "en" .
  }
}
GROUP BY ?item ?itemLabel
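To see what GROUP BY with GROUP_CONCAT(DISTINCT …) does to the rows, here is a small Python model over hypothetical rows (one book, two authors, one author labelled in two languages). It mirrors the aggregation only; note that SPARQL leaves the GROUP_CONCAT order unspecified, while this Python version keeps insertion order:

```python
from collections import defaultdict

# Hypothetical ungrouped rows: (item, itemLabel, author, name).
# Author A1 has labels in two languages; the book has two authors.
ROWS = [
    ("Q1", "Book", "A1", "Alice"),
    ("Q1", "Book", "A1", "Alicia"),
    ("Q1", "Book", "A2", "Bob"),
]


def group_concat_rows(rows, sep=","):
    """Model GROUP BY ?item ?itemLabel with GROUP_CONCAT(DISTINCT ...)."""
    groups = defaultdict(lambda: {"authors": [], "names": []})
    for item, label, author, name in rows:
        group = groups[(item, label)]
        # DISTINCT inside GROUP_CONCAT: add each value only once per group
        if author not in group["authors"]:
            group["authors"].append(author)
        if name not in group["names"]:
            group["names"].append(name)
    return {
        key: {field: sep.join(values) for field, values in group.items()}
        for key, group in groups.items()
    }
```

The three rows collapse into one group with authors "A1,A2", but the names "Alice,Alicia,Bob" still mix both language labels of A1, which is why restricting ?name to one language matters.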

OR in sparql query

This SPARQL query on Wikidata shows all places in Germany (Q183) whose name ends in -ow or -itz.
I want to extend this to look for places in Germany and, say, Austria.
I tried modifying the 8th line to something like:
wdt:P17 (wd:Q183 || wd:Q40);
in order to look for places in Austria (Q40), but this is not a valid query.
What is a way to extend the query to include other countries?

As far as I know, there is no syntax as simple as that. You can, however, use UNION to the same effect, like this:
SELECT ?item ?itemLabel ?coord
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q486972;
        rdfs:label ?itemLabel;
        wdt:P625 ?coord.
  {?item wdt:P17 wd:Q183}
  UNION
  {?item wdt:P17 wd:Q40}
  FILTER (lang(?itemLabel) = "de") .
  FILTER regex (?itemLabel, "(ow|itz)$").
}
Alternatively, you can bind a new variable to both countries using VALUES:
SELECT ?item ?itemLabel ?coord
WHERE
{
  VALUES ?country { wd:Q40 wd:Q183 }
  ?item wdt:P31/wdt:P279* wd:Q486972;
        wdt:P17 ?country;
        rdfs:label ?itemLabel;
        wdt:P625 ?coord.
  FILTER (lang(?itemLabel) = "de") .
  FILTER regex (?itemLabel, "(ow|itz)$").
}
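The VALUES form scales to any number of countries without adding more UNION branches. A Python sketch that builds it from a list of country QIDs; the function name and its defaults are illustrative only:

```python
def places_query(country_qids: list[str], lang: str = "de", pattern: str = "(ow|itz)$") -> str:
    """Build the VALUES version of the place-name query for several countries."""
    values = " ".join(f"wd:{q}" for q in country_qids)
    return (
        "SELECT ?item ?itemLabel ?coord WHERE {\n"
        f"  VALUES ?country {{ {values} }}\n"
        "  ?item wdt:P31/wdt:P279* wd:Q486972 ;\n"
        "        wdt:P17 ?country ;\n"
        "        rdfs:label ?itemLabel ;\n"
        "        wdt:P625 ?coord .\n"
        f'  FILTER(LANG(?itemLabel) = "{lang}")\n'
        f'  FILTER(REGEX(?itemLabel, "{pattern}"))\n'
        "}"
    )
```

Usage: `places_query(["Q183", "Q40"])` reproduces the Germany/Austria query; adding a QID to the list adds a country.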
