How can I return a table of basic physical objects, e.g., Ball (Q18545), Arrow (Q45922), in Wikidata using SPARQL?
I'm not able to directly return items under the class Physical Object (Q223557) because it has far too many records. But its subclasses, e.g., Toy (Q11422) or Projectile (Q49393), are too narrow for me. I've tried the following to get my broad query working:
removing the label service
using LIMIT for a moderate number of records
filtering out records with very few sitelinks
limiting the objects to those with ids from BNCF Thesaurus ID, BabelNet ID, etc.
Nothing has worked for me. I suspect this is straightforward for anyone who's had more than a few days with Wikidata. Please help.
I shared my wrecked query below.
SELECT ?obj #?objLabel
WHERE {
  {
    SELECT ?obj WHERE {
      ?obj wdt:P508 ?bncfid;            # BNCF Thesaurus ID
           wdt:P2581 ?bnid;             # BabelNet ID
           wdt:P227 ?gndid;             # GND ID
           wdt:P8814 ?wsid;             # WordNet 3.1 Synset ID
           wdt:P18 ?image;              # image
           wikibase:sitelinks ?sitelinks;
           wdt:P31/wdt:P279* wd:Q223557.    # instance of physical object (or any subclass)
      #FILTER(?sitelinks > 5).
      #FILTER(LANG(?objLabel)="en").
      #SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
      #  ?obj rdfs:label ?objLabel }
    }
    LIMIT 1000
  }
  #?obj rdfs:label ?objLabel
  #FILTER(LANG(?objLabel)="en").
}
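For what it's worth, here is a rough sketch (not a verified fix) of how the same constraints might be restructured: the hint:Query hint:optimizer "None" trick from one of the answers further down forces the rare external-ID triples to be evaluated before the expensive class-tree traversal, and the label service is only applied to the already-limited subquery result. The sitelink threshold and the choice of which external IDs to require are assumptions carried over from the attempts listed above.

SELECT ?obj ?objLabel ?sitelinks WHERE {
  hint:Query hint:optimizer "None" .        # keep the written evaluation order
  {
    SELECT ?obj ?sitelinks WHERE {
      ?obj wdt:P508 ?bncfid;                # BNCF Thesaurus ID (rare, narrows early)
           wdt:P2581 ?bnid;                 # BabelNet ID
           wikibase:sitelinks ?sitelinks.
      FILTER(?sitelinks > 5)
      ?obj wdt:P31/wdt:P279* wd:Q223557.    # instance of physical object (or any subclass)
    }
    LIMIT 1000
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}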
Related
I am trying to get filtered data from the Wikidata API. Currently I can do a general search using this API, but now there are specific cases where I have to filter the information. For example, I need to get a list of only authors in order to get their Q identifiers. I also looked at the Wikidata Query Service, but it is too heavy to bring back all the items: I did a test with a SPARQL query, and getting fewer than 3000 results took 26 seconds, which is too much for a search service.
This is the query I use to get the authors.
SELECT DISTINCT ?author ?authorLabel WITH {
  SELECT ?item ?author WHERE {
    ?item wdt:P50 ?author.
  } LIMIT 100000
} AS %FOO WHERE {
  INCLUDE %FOO
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
I also need to search by categories, but I have not been able to filter the searches in any way. Does anyone know a way to do it?
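As a sketch only (not a tested recommendation): one way to narrow the results is to push a class constraint into the named subquery you already use, so that only authors in the wanted category ever reach the label service. The choice of wd:Q5 (human) as the filter class is just an illustrative assumption; substitute whatever class your "category" corresponds to.

SELECT DISTINCT ?author ?authorLabel WITH {
  SELECT DISTINCT ?author WHERE {
    ?item wdt:P50 ?author.     # author (P50) statements
    ?author wdt:P31 wd:Q5.     # example filter: keep only humans (assumption)
  } LIMIT 100000
} AS %authors WHERE {
  INCLUDE %authors
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}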
First off, I'm not a developer, and I'm new to writing SPARQL queries. Mostly I've been looking up existing queries and trying to tweak them to get what I need. The issue is that most documentation on query construction has to do with getting new data you don't have, rather than retrieving or extending existing data. And when you do find tips for retrieving existing data, they tend to be for ONE item at a time instead of a full data set of many items.
I mostly use OpenRefine for this. I start by loading up my existing list of names and use the Wikidata reconciliation service to reconcile the names to existing Wikidata IDs. So now, this is where I am vs. where I want to go:
1 - We have a list of Wikidata IDs for reconciled matches;
2 - We have used OpenRefine to get most of the data we need from those;
3 - We don't have the label, description, or Wikipedia links (English), which are extremely valuable;
4 - I have figured out how to construct a query for the label and description of just ONE Wikidata Item:
SELECT ?itemLabel ?itemDescription WHERE {
  VALUES ?item { wd:Q15485689 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
5 - I have figured out how to construct a query to extract the Wikipedia English URL for just ONE Wikidata item:
SELECT ?article ?lang ?name WHERE {
  ?article schema:about wd:Q15485689;
           schema:inLanguage ?lang;
           schema:name ?name;
           schema:isPartOf _:b13.
  _:b13 wikibase:wikiGroup "wikipedia".
  FILTER(?lang IN("en"))
  FILTER(!(CONTAINS(?name, ":")))
  OPTIONAL { ?article wdt:P31 ?instance_of. }
}
The questions are:
How do I modify either query to generate these same results for MORE THAN ONE* Wikidata item?
How do I modify the query to give me all three at once, for more than one* Wikidata item?
*we have 667, but I could do smaller batches if that's too much for the service to handle
Ideally, the query would generate something that allowed me to download a CSV file looking much like this (so I can match on and import the new data into our Airtable base which feeds the website application):
[screenshot: ideal CSV output]
If anyone can lead me in the right direction here, I'd appreciate it.
I should also note that if OpenRefine has a way of retrieving these I'm all ears! But since these three don't have a property code, I couldn't see how to snag them from OR.
This sort of thing should do it. See how many QIDs you can get away with in the VALUES statement; probably all of them in one go. This query gives you the URL and the article title; you can drop the article-title column if you do not want it. Note also https://www.wikidata.org/wiki/Wikidata:Request_a_query, which is Wikidata's own venue for questions such as these.
SELECT ?item ?itemLabel ?itemDescription ?sitelink ?article
WHERE {
  VALUES ?item { wd:Q105848230 wd:Q6697407 wd:Q2344502 wd:Q1698206 }
  OPTIONAL {
    ?article schema:about ?item ;
             schema:isPartOf <https://en.wikipedia.org/> ;
             schema:name ?sitelink .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Yes, a VALUES statement in SPARQL can take not only hundreds but even thousands of items. I regularly do this when cross-checking how Wikidata matches up against an existing data set. Some other tools that also take lists of Wikidata items:
Petscan - https://petscan.wmflabs.org/
TABernacle - https://tabernacle.toolforge.org/
I am trying to form an efficient filter query in SPARQL on Wikidata. Let me explain my process:
I query the search-entities API using keywords, e.g. (Apple, Orange)
The API query returns a list of relevant item IDs, e.g. (wd:Q629269, wd:Q154950, wd:Q312, wd:Q95, wd:Q4878289, wd:Q10817602)
With this list of IDs, I then query SPARQL to return items that are instances of a certain class or of a subclass of it, e.g. (p:P31/ps:P31/wdt:P279* wd:Q43229), which returns everything that is an Organisation or a subclass thereof.
Then, for the items in the list that match the class, return additional information if it exists, e.g. using OPTIONAL.
I am new to SPARQL. My question is: is this the most efficient method to achieve this output? It seems quite inefficient to me, and I cannot find a similar type of problem in the tutorial examples.
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?web ?inception ?ISIN
WHERE {
  FILTER (?item IN (wd:Q629269, wd:Q154950, wd:Q312, wd:Q95, wd:Q4878289, wd:Q10817602))
  ?item p:P31/ps:P31/wdt:P279* wd:Q43229.
  OPTIONAL {
    ?item wdt:P856 ?web.        # official website
    ?item wdt:P571 ?inception.  # inception
    ?item wdt:P946 ?ISIN.       # ISIN
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 10
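Not a definitive answer, but two small changes are commonly suggested for queries like this and are worth sketching here: bind the candidate IDs with VALUES instead of FILTER ... IN, and split the single OPTIONAL block into one per property, so that an item missing, say, an ISIN still returns its website and inception. The properties are the same as in the query above.

SELECT DISTINCT ?item ?itemLabel ?itemDescription ?web ?inception ?ISIN
WHERE {
  VALUES ?item { wd:Q629269 wd:Q154950 wd:Q312 wd:Q95 wd:Q4878289 wd:Q10817602 }
  ?item p:P31/ps:P31/wdt:P279* wd:Q43229.       # organisation or subclass thereof
  OPTIONAL { ?item wdt:P856 ?web. }             # official website
  OPTIONAL { ?item wdt:P571 ?inception. }       # inception
  OPTIONAL { ?item wdt:P946 ?ISIN. }            # ISIN
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
LIMIT 10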
What I am trying to do is get the properties with the Quantity property type that are used on items of a certain class (such as City, Country, Human, River, Region, Mountain, etc.). I tried several classes: Country (wd:Q6256) works okay with the query below, but many other classes make the query exceed the time limit. How can I optimize the query below to achieve this result, or is there another way to get the Quantity-type properties used on a certain class?
SELECT DISTINCT ?p_ ?pLabel ?pAltLabel
WHERE {
  VALUES (?class) { (wd:Q515) }
  ?x ?p_ [].
  ?x p:P31/ps:P31 ?class.
  ?p wikibase:claim ?p_.
  ?p wikibase:directClaim ?pwdt.
  ?p wikibase:propertyType ?pType.
  FILTER (?pType = wikibase:Quantity)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "ko,en". }
}
Attempt 1: Optimizing the query
Some observations:
Instead of p:P31/ps:P31, you could use wdt:P31 which is faster by avoiding the two-property hop, but finds only the truthy statements
The expensive part is the call to the label service at the end, as can be seen by commenting out that line (place # at the start of the line)
The query retrieves every claim on every city (many!), gets the properties of the claims (few!), and only removes the duplicates in the end (with DISTINCT)
As a result, the label service is called many times for the same property, once per claim! This is the big problem with the query
This can be avoided by moving the retrieval of properties with the DISTINCT into a subquery, and calling the label service only at the end on the few properties
After that change it should be fast, but it is still slow because the query optimiser seems to evaluate the query in the wrong order. Following hints from this page, we can turn the query optimiser off.
This works for me:
SELECT ?p ?pLabel ?pAltLabel {
  hint:Query hint:optimizer "None" .
  {
    SELECT DISTINCT ?p_ {
      VALUES ?class { wd:Q515 }
      ?x wdt:P31 ?class.
      ?x ?p_ [].
    }
  }
  ?p wikibase:claim ?p_.
  ?p wikibase:propertyType ?pType.
  FILTER (?pType = wikibase:Quantity)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Attempt 2: Splitting the task into multiple queries
Having learned that the approach above doesn't work for some of the biggest categories (like wd:Q5, “human”), I tried a different approach. This will get all results, but not in a single query. It requires sending ~25 individual queries and combining the results afterwards:
We start by listing the quantity properties. There are, as of today, 503 of them.
We want to keep only those properties that are actually used on an item of type “human”.
Because that check is so slow (it needs to look at millions of items), we start by only checking the first 20 properties from our list.
In the second query, we're going to check the next 20, and so on.
This is the query that tests the first 20 properties:
SELECT DISTINCT ?p ?pLabel ?pAltLabel {
  hint:Query hint:optimizer "None" .
  {
    SELECT ?p ?pLabel ?pAltLabel {
      ?p wikibase:propertyType wikibase:Quantity.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    OFFSET 0 LIMIT 20
  }
  ?p wikibase:claim ?p_.
  ?x ?p_ [].
  ?x wdt:P31 wd:Q5.
}
Increase the OFFSET to 20, 40, 60 and so on, up to 500, to test all properties.
Absolute Wikidata and SPARQL beginner here. I am trying to find out the Q code of a particular female name, say Jennifer. I can get it with a query like this:
SELECT ?name WHERE {
  ?name wdt:P31 wd:Q11879590.
  ?name rdfs:label ?label.
  FILTER(STR(?label) = "Jennifer")
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1
That is, I look up entities that are instances of "female given name" and then filter to those whose label is "Jennifer". It works, but it takes 5 seconds or more.
If I omit the LIMIT 1, I get many copies of the same result, which signals to me that I am doing something stupid.
Bottom line, is there an efficient way to look up the Q code for a "female given name"?
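In case it helps: the duplicate rows most likely come from the label filter matching the label "Jennifer" in every language the item has, one row per language. A minimal sketch that avoids both the duplicates and the scan over all labels is to match the English label as a constant (the class Q11879590 is taken from the question above); this is an assumption about what is wanted, not a benchmarked answer:

SELECT ?name WHERE {
  ?name wdt:P31 wd:Q11879590;      # instance of: female given name
        rdfs:label "Jennifer"@en.  # match the English label directly
}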