Timeout when retrieving results from Wikidata

This query for retrieving movie posters from Wikidata results in "Query timeout limit reached" errors on query.wikidata.org:
#defaultView:ImageGrid
SELECT ?item ?itemLabel ?pic ?fileTitle ?width ?height
WHERE
{
  ?item wdt:P31/wdt:P279* wd:Q11424 .
  ?item wdt:P3383 ?pic .
  BIND(STRAFTER(wikibase:decodeUri(STR(?pic)), "http://commons.wikimedia.org/wiki/Special:FilePath/") AS ?fileTitle)
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"
  }
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org";
                    wikibase:api "Generator";
                    wikibase:limit "once";
                    mwapi:generator "allpages";
                    mwapi:gapfrom ?fileTitle;
                    mwapi:gapnamespace 6; # NS_FILE
                    mwapi:gaplimit 1;
                    mwapi:prop "imageinfo";
                    mwapi:iiprop "dimensions".
    ?size wikibase:apiOutput "imageinfo/ii/#size".
    ?width wikibase:apiOutput "imageinfo/ii/#width".
    ?height wikibase:apiOutput "imageinfo/ii/#height".
  }
}
ORDER BY ?item
LIMIT 100
OFFSET 0
The query works if I set a low limit (fewer than 100 results) and remove ORDER BY. However, ORDER BY is necessary to ensure that subsequent paged queries eventually retrieve all results, and adding ORDER BY ?item back brings the timeout error again. Ideally, I would like to keep only posters with a minimum width, but adding FILTER(?width > 1500) times out no matter how low the limit is.
Any suggestions for optimising the query to work consistently and reliably? Even if it only returns a single value, the query can be repeated until all values have been retrieved.
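One direction that may be worth trying, offered as an untested sketch rather than a confirmed fix: do the paging in a subquery, so that ORDER BY, LIMIT and OFFSET apply only to the plain triple patterns and the MWAPI service is called for at most 100 posters per request. This assumes that the timeout comes from the service being called for every poster before the sort, and that the engine evaluates the subquery first; the question establishes neither. The BIND and both SERVICE blocks are copied unchanged from the query above.

```sparql
#defaultView:ImageGrid
SELECT ?item ?itemLabel ?pic ?fileTitle ?width ?height
WHERE
{
  # Page over films with a poster first; no service call happens in here.
  {
    SELECT ?item ?pic
    WHERE
    {
      ?item wdt:P3383 ?pic .
      ?item wdt:P31/wdt:P279* wd:Q11424 .
    }
    ORDER BY ?item ?pic   # ?pic breaks ties for items with several posters
    LIMIT 100
    OFFSET 0              # raise by 100 for each following page
  }
  BIND(STRAFTER(wikibase:decodeUri(STR(?pic)), "http://commons.wikimedia.org/wiki/Special:FilePath/") AS ?fileTitle)
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org";
                    wikibase:api "Generator";
                    wikibase:limit "once";
                    mwapi:generator "allpages";
                    mwapi:gapfrom ?fileTitle;
                    mwapi:gapnamespace 6; # NS_FILE
                    mwapi:gaplimit 1;
                    mwapi:prop "imageinfo";
                    mwapi:iiprop "dimensions".
    ?size wikibase:apiOutput "imageinfo/ii/#size".
    ?width wikibase:apiOutput "imageinfo/ii/#width".
    ?height wikibase:apiOutput "imageinfo/ii/#height".
  }
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"
  }
}
```

A width filter such as FILTER(xsd:integer(?width) > 1500) could then go after the MWAPI block (the cast is a precaution in case the API output comes back as a string). It would filter each page of 100 rather than the whole set, so a page may return fewer rows than the limit.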

Related

Wikidata limit query to specific property constraints

I'm trying to list video games along with the data size (P3575). I want to limit the results to only show those in megabytes (Q79735) as a property constraint of data size.
Here is what I have now, which lists everything (mainly Megabytes, Gigabytes, block):
SELECT ?item ?itemLabel ?Size ?dataSizeLabel WHERE {
  ?item wdt:P31 wd:Q7889.
  ?item p:P3575 ?nodedataSize.
  ?nodedataSize psv:P3575 ?valuenodedataSize.
  ?valuenodedataSize wikibase:quantityAmount ?Size;
                     wikibase:quantityUnit ?dataSize.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
If I add:
?item wdt:P2305 wd:Q79735
then I get no results.
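A sketch of one way to get there, assuming the goal is simply to keep the statements whose unit is megabyte: P2305 is used as a qualifier on constraint statements of property pages, not as a statement on video game items, which would explain the empty result. The unit is already reachable through wikibase:quantityUnit, so it can be fixed to wd:Q79735 directly instead of being bound to ?dataSize.

```sparql
SELECT ?item ?itemLabel ?Size WHERE {
  ?item wdt:P31 wd:Q7889 .
  ?item p:P3575 ?nodedataSize .
  ?nodedataSize psv:P3575 ?valuenodedataSize .
  ?valuenodedataSize wikibase:quantityAmount ?Size ;
                     wikibase:quantityUnit wd:Q79735 .   # keep megabyte values only
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```

The ?dataSizeLabel column is dropped here because every remaining row has the same unit.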

How to get only the first value from an optional property?

As with the SQL aggregates MAX, MIN or FIRST, I want only one value per item, without duplicated rows.
Real Wikidata case
Here the OPTIONAL clause expands the result from 253 to 257 rows:
# Countries and its codes
SELECT ?code ?item ?itemLabel ?osmId
WHERE
{
  ?item wdt:P297 ?code.
  OPTIONAL{?item wdt:P402 ?osmId .}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?code
I need only one (any) osmId. How can I do something like FIRST{OPTIONAL{?item wdt:P402 ?osmId .}}?
NOTES:
it is not a duplicate of How to get only the most recent value from a Wikidata property?
it is not a duplicate of Why does this Wikidata SPARQL query only work for the first element in a list?
... neither covers the simple need here: any first value.

Here is a wiki answer (please feel free to edit and improve it!)
# Countries and its codes
SELECT ?code ?item ?itemLabel
       (MAX(?osmId) as ?osmId_max) (COUNT(?code) as ?osmId_n)
WHERE
{
  ?item wdt:P297 ?code.
  OPTIONAL{?item wdt:P402 ?osmId .}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?code ?item ?itemLabel
ORDER BY ?code
The COUNT(?code) is there only to flag the rows where osmId is not a unique ID.
Is there another simple way to keep only the first option?
Using SAMPLE
Following the suggestion from @ValerioCocchi, we can use SAMPLE instead of MAX:
SELECT ?code ?item ?itemLabel (SAMPLE(?osmId) as ?osmId_sample)
WHERE
{
  ?item wdt:P297 ?code.
  OPTIONAL{?item wdt:P402 ?osmId .}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?code ?item ?itemLabel
ORDER BY ?code
SAMPLE uses slightly less CPU time, but the main reason to use it is that you do not care which value is returned. That is the case in Wikidata when a property value is supposed to be unique but a few erroneous duplicates exist that you can ignore.
NOTE about the osmId: in this particular query, where the numeric ID follows a temporal sequence, the advantage of MAX is that it can return a "fresher" ID. In OpenStreetMap (OSM), however, the opposite strategy may apply: the oldest ID is the most stable one. So SAMPLE also makes sense when you do not know which strategy is better.
Using FILTER
The suggestion from @StanislavKralin:
SELECT ?code ?item ?itemLabel ?osmId
WHERE
{
  ?item wdt:P297 ?code.
  OPTIONAL{
    ?item wdt:P402 ?osmId
    FILTER NOT EXISTS {
      ?item wdt:P402 ?osmId, ?osmId_ .
      FILTER (?osmId_ > ?osmId)
    }
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?code
This seems more verbose.

SPARQL query for finding films originating from and released in the United States

I have the following SPARQL query that appears to correctly produce the films produced in the US (country of origin) and released in the US (place of publication) in 2018. The issue I'm having is that one row is produced for each release even though the other releases are outside of the US. I've added a limit to reduce the size of the response.
Here is the query:
SELECT ?item ?name ?publication_date ?placeLabel WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?item rdfs:label ?name;
        wdt:P31 wd:Q11424;
        wdt:P495 wd:Q30; # -> country of origin US
        wdt:P577 ?publication_date.
  ?item p:P577 ?publication_statement.
  ?publication_statement pq:P291 ?place.
  FILTER(xsd:date(?publication_date) > "2018-01-01"^^xsd:date)
  FILTER(
    (LANG(?name)) = "en"
    && ?place=wd:Q30) # -> place of publication
}
ORDER BY ?name
LIMIT 10
I would like to change it so that it produces one row per movie IF it had a release in the US in 2018.
Thanks for your help. Comments on the use of FILTER or other non-idiomatic SPARQL are also welcome.

You can use GROUP BY:
SELECT ?item (SAMPLE(?name) as ?Name) (SAMPLE(?publication_date) as ?Date) WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?item rdfs:label ?name;
        wdt:P31 wd:Q11424;
        wdt:P495 wd:Q30; # -> country of origin US
        wdt:P577 ?publication_date.
  ?item p:P577 ?publication_statement.
  ?publication_statement pq:P291 ?place.
  FILTER(xsd:date(?publication_date) > "2018-01-01"^^xsd:date)
  FILTER(
    (LANG(?name)) = "en"
    && ?place=wd:Q30) # -> place of publication
}
GROUP BY ?item
ORDER BY ?Name
LIMIT 10
You also need to change the SELECT line, because variables that are not group keys cannot be projected unless they are wrapped explicitly in an aggregate such as SAMPLE. See similar question.
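On the request for comments about idiom: in the queries above the date comes from wdt:P577 while the place comes from a qualifier on a p:P577 statement, so the two are not tied to the same release. A hedged, untested sketch that reads both from one statement and restricts the date to 2018 follows. The upper bound is my reading of "released in 2018"; the original filter only sets a lower bound.

```sparql
SELECT ?item (SAMPLE(?name) AS ?Name) (MIN(?publication_date) AS ?Date) WHERE {
  ?item wdt:P31 wd:Q11424 ;
        wdt:P495 wd:Q30 ;                  # country of origin: US
        rdfs:label ?name ;
        p:P577 ?publication_statement .
  # Date and place are taken from the same release statement.
  ?publication_statement ps:P577 ?publication_date ;
                         pq:P291 wd:Q30 .  # place of publication: US
  FILTER(LANG(?name) = "en")
  FILTER(?publication_date >= "2018-01-01T00:00:00Z"^^xsd:dateTime
      && ?publication_date <  "2019-01-01T00:00:00Z"^^xsd:dateTime)
}
GROUP BY ?item
ORDER BY ?Name
LIMIT 10
```

Fixing the place to wd:Q30 in the triple pattern replaces the ?place = wd:Q30 filter, and MIN returns the earliest US release in 2018 when a film has several.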

Querying wikidata for "property constraint"

TL;DR
How can I query (in SPARQL) the properties of a property?
Or, in more detail:
As part of my project I need to find the properties in Wikidata that have a time constraint, specifically both "start time" and "end time".
I tried this query:
SELECT DISTINCT ?prop WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
  ?person wdt:P31 wd:Q5.
  ?person ?prop ?statement.
  ?statement pq:P580 ?starttime.
  ?statement pq:P582 ?endtime.
}
LIMIT 200
(Yes, the properties should be related to humans.)
Anyway, I do get some good results, like:
http://www.wikidata.org/prop/P26
http://www.wikidata.org/prop/P39
But I also get some other properties that are definitely wrong.
So, basically, what I'm trying to do is get a list of properties that have the property constraint (P2302) "allowed qualifiers constraint" (Q21510851) with start time (P580) and end time (P582).
Is that even possible?
I tried some queries like:
SELECT DISTINCT ?property ?propertyLabel ?propertyDescription ?subpTypeOf ?subpTypeOfLabel
WHERE
{
  ?property rdf:type wikibase:Property .
  ?property wdt:P2302 ?subpTypeOf.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
but it does not return the results I wanted.
Is it even possible to query this kind of thing?
Thanks

Qualifiers are used on property pages too. Your second query should be:
SELECT DISTINCT ?prop ?propLabel {
  ?prop p:P2302 [ ps:P2302 wd:Q21510851 ; pq:P2306 wd:P580, wd:P582 ] ;
        p:P2302 [ ps:P2302 wd:Q21503250 ; pq:P2308 wd:Q5 ; pq:P2309 wd:Q21503252 ] .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" }
} ORDER BY ASC(xsd:integer(strafter(str(?prop), concat(str(wd:), "P"))))
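Your first query is correct, but note that it is an 'as-is' query: it reports the qualifiers actually used on statements, not the qualifiers that the constraints allow. For example, wd:P410 does not have the respective constraints, but look at wd:Q83855.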

Recover the "original" order

I am trying to recover the cast list for movies from wikidata.
My SPARQL query for Dr. No is as follows:
SELECT ?actor ?actorLabel WHERE {
  ?movie wdt:P161 ?actor .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  FILTER(?movie = wd:Q102754)
}
LIMIT 1000
I can try it out at query.wikidata.org, but the results are not in the order that I want. It gives 'Sean Connery', 'Zena Marshall', 'Ursula Andress'.
The database has the data in the required order: as you can see, https://www.wikidata.org/wiki/Q102754 lists the cast in order (Sean Connery, Ursula Andress, Joseph Wiseman). Generally the cast list is given in billing order, and that is the order I want to recover.

SPARQL orders results with ORDER BY.
The ordering in your example is based on the number of references of a statement. Here is a non-optimized version that does what you want:
SELECT ?actor ?actorLabel WHERE {
  ?movie p:P161 ?statement .
  ?statement ps:P161 ?actor .
  OPTIONAL {?statement prov:wasDerivedFrom ?ref . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  FILTER(?movie = wd:Q102754)
}
GROUP BY ?movie ?actor ?actorLabel
ORDER BY DESC(count(?ref)) ASC(?actorLabel)
LIMIT 1000
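Since the version above is described as non-optimized, here is a slightly tighter sketch of the same idea, untested. It puts wd:Q102754 directly in the triple pattern instead of binding ?movie and filtering afterwards, and it drops ?movie from the grouping because it is now a constant. The ordering is still a proxy based on reference counts, as in the answer, and is not guaranteed to match the order shown on the item page.

```sparql
SELECT ?actor ?actorLabel WHERE {
  wd:Q102754 p:P161 ?statement .                       # cast member statements of Dr. No
  ?statement ps:P161 ?actor .
  OPTIONAL { ?statement prov:wasDerivedFrom ?ref . }   # references, if any
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?actor ?actorLabel
ORDER BY DESC(COUNT(?ref)) ASC(?actorLabel)
LIMIT 1000
```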
