I would like to get all the members of a specific category from Wikidata. For example, I would like to get all the films (instances of film: P31 Q11424) from the category "Category:Films set in Stockholm" (Q7519614).
However, I can't seem to find what the relationship would be. DBpedia uses "subject of" but the Wikidata equivalent (P805) doesn't return any results.
I also thought I could bootstrap my way to the answer with this query, but to no avail:
SELECT ?s ?p ?pLabel WHERE {
?s ?p wd:Q7519614.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
This may be an indirect answer, but it is worth a try.
When you look for the property that relates other items to an entity like Q7519614, it is often worth checking https://www.wikidata.org/wiki/Special:WhatLinksHere/Q7519614 (What links here?).
In this case the list is empty, which means that no such relation is encoded in Wikidata (so you need to rely on third-party tools to access the Wikipedia category information).
A second way to approach your question: the category item itself carries P360 (is a list of).
In this case it says that the category is a list of films (Q11424) with filming location (P915) equal to Stockholm (Q1754).
So the closest query to what you are looking for is:
SELECT ?film
WHERE {
  ?film wdt:P31 wd:Q11424;   # instance of: film
        wdt:P915 wd:Q1754.   # filming location: Stockholm
}
The MediaWiki API offers "Categorymembers" to get a list of pages that belong to a given category, ordered by page sort title. Parameters are documented here.
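As a rough sketch of what such a request could look like in Python (standard library only): the helper name `categorymembers_url` and its default values are illustrative assumptions, while `list=categorymembers`, `cmtitle`, `cmtype` and `cmlimit` are parameters of that API module.

```python
from urllib.parse import urlencode


def categorymembers_url(category_title,
                        endpoint="https://en.wikipedia.org/w/api.php",
                        cmtype="page",
                        limit=500):
    """Build a URL listing the members of one category (hypothetical helper).

    cmtype may be "page", "subcat" or "file"; limit is the page size per request.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category_title,   # full title, including the "Category:" prefix
        "cmtype": cmtype,
        "cmlimit": limit,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


url = categorymembers_url("Category:Films set in Stockholm")
```

Fetching that URL returns the members under query -> categorymembers. When more results exist, the response carries a "continue" block whose values have to be sent back with the next request.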
Related
I am trying to get all of the pages in a given category from Wikipedia, including the ones in subcategories. No problem with that, but I also want certain fields from each page, like birth date.
From this topic I suppose I need to use https://wikidata.org/w/api.php and not for example https://pl.wikipedia.org/...
I assumed I should use a generator, but my trouble is that when calling Wikidata I get an error about a bad ID, which I don't get for Wikipedia.
query.params = {
    "action": "query", // placeholder for test
    "generator": "categorymembers",
    "gcmpageid": 1810130, // sophists' category at pl.wikipedia
    "format": "json"
}
https://pl.wikipedia.org/w/api.php -> data
https://en.wikipedia.org/w/api.php -> error: nosuchpage (expected)
https://www.wikidata.org/w/api.php -> error: invalidcategory (why???)
I've tried to use the id from Wikidata prefixed with "Q", but then I got badinteger.
Alternatively I could make requests to Wikipedia for the ids and then to Wikidata, but calling twice for the same thing and handling all those ids in the request...
Please help
TL;DR Using generators from Polish Wikipedia in Wikidata API does not work but other solutions exist.
A few things to note about Wikidata and its API:
Wikidata doesn't know anything about the category hierarchies on Polish Wikipedia (or on any other Wikipedia language version)
There is no API to query pages in all subcategories. This is mainly because the category system of MediaWiki allows cycles in the category hierarchies and arbitrarily deep nesting of categories.
pageIds are only unique within a project. So using a pageId from pl.wikipedia.org does not work on https://en.wikipedia.org/w/api.php or https://www.wikidata.org/w/api.php
There are multiple solutions to your problem:
Use the query in your question recursively to get all page titles from Kategoria:Sofiści and its subcategories.
Afterwards, use the Wikidata API to retrieve the Wikidata item for each Polish Wikipedia article: e.g. for Protagoras the query is this: https://www.wikidata.org/w/api.php?action=wbgetentities&sites=plwiki&titles=Protagoras&props=claims&format=json
This returns a JSON document with all statements about Protagoras stored on Wikidata. You find the birth date in that document under claims->P569->(first statement)->mainsnak->datavalue->value->time.
Use the Wikidata Query Service. It allows you to call out to the MediaWiki API from SPARQL.
SELECT ?item ?itemLabel ?date_of_birth WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam wikibase:endpoint "pl.wikipedia.org" .
    bd:serviceParam mwapi:gcmtitle 'Kategoria:Sofiści' .
    bd:serviceParam mwapi:generator "categorymembers" .
    bd:serviceParam mwapi:gcmprop "ids|title|type" .
    bd:serviceParam mwapi:gcmlimit "max" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  ?item wdt:P569 ?date_of_birth .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "pl". }
}
Paste this query into https://query.wikidata.org/. That page also offers code examples showing how to access the results programmatically.
The drawback of this solution is that pages in subcategories are not included.
Fully rely on Wikidata. Use the following query in https://query.wikidata.org/:
SELECT ?item ?itemLabel ?date_of_birth WHERE {
?item wdt:P106 wd:Q3750514.
?item wdt:P569 ?date_of_birth
SERVICE wikibase:label { bd:serviceParam wikibase:language "pl,en". }
}
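The first solution above (walk the category tree, then read P569 from the wbgetentities response) could be sketched in Python roughly as follows. This is only an outline under stated assumptions: `collect_articles`, `birth_dates` and the `fetch_members` callback are hypothetical names, and the callback is expected to return the "categorymembers" entries (each with "title" and "ns") for one category. The `seen` set is what protects against the cycles mentioned above.

```python
from typing import Callable, Dict, Iterable, List, Set

CATEGORY_NS = 14  # MediaWiki namespace number of category pages
ARTICLE_NS = 0    # main (article) namespace

# Assumed callback: category title -> entries like {"title": "...", "ns": 0}
Fetcher = Callable[[str], Iterable[dict]]


def collect_articles(root: str, fetch_members: Fetcher) -> List[str]:
    """Return the article titles in `root` and all of its subcategories."""
    seen: Set[str] = {root}      # categories already queued, guards against cycles
    queue: List[str] = [root]
    articles: Set[str] = set()
    while queue:
        category = queue.pop()
        for member in fetch_members(category):
            title = member["title"]
            if member["ns"] == CATEGORY_NS:
                if title not in seen:
                    seen.add(title)
                    queue.append(title)
            elif member["ns"] == ARTICLE_NS:
                articles.add(title)
    return sorted(articles)


def birth_dates(entities_response: dict) -> Dict[str, str]:
    """Map item id -> first P569 time string from a wbgetentities response."""
    result: Dict[str, str] = {}
    for qid, entity in entities_response.get("entities", {}).items():
        # "claims" is missing for titles that have no Wikidata item
        for statement in entity.get("claims", {}).get("P569", []):
            snak = statement["mainsnak"]
            if snak.get("snaktype") == "value":
                result[qid] = snak["datavalue"]["value"]["time"]
                break
    return result
```

In real use `fetch_members` would wrap the categorymembers request from the question (following its continuation), and the collected titles would be passed to wbgetentities in batches via the `titles` parameter.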
First off, I'm not a developer, and I'm new to writing SPARQL queries. Mostly I've been looking up existing queries and trying to tweak them to get what I need. The issue is that most documentation on query construction has to do with getting new data you don't have, rather than retrieving or extending existing data. And when you do find tips for retrieving existing data, they tend to be for ONE item at a time instead of a full data set of many items.
I mostly use OpenRefine for this. I started by loading my existing list of names and used the Wikidata extension service to reconcile the names to existing Wikidata IDs. So now, this is where I am, vs. where I want to go:
1 - We have a list of Wikidata IDs for reconciled matches;
2 - We have used OpenRefine to get most of the data we need from those;
3 - We don't have the label, description, or Wikipedia links (English), which are extremely valuable;
4 - I have figured out how to construct a query for the label and description of just ONE Wikidata Item:
SELECT ?itemLabel ?itemDescription WHERE {
  VALUES ?item { wd:Q15485689 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
5 - I have figured out how to construct a query to extract the Wikipedia English URL for just ONE Wikidata item:
SELECT ?article ?lang ?name WHERE {
?article schema:about wd:Q15485689;
schema:inLanguage ?lang;
schema:name ?name;
schema:isPartOf _:b13.
_:b13 wikibase:wikiGroup "wikipedia".
FILTER(?lang IN("en"))
FILTER(!(CONTAINS(?name, ":")))
OPTIONAL { ?article wdt:P31 ?instance_of. }
}
The questions are:
How do I modify either query to generate these same results for MORE THAN ONE* Wikidata item?
How do I modify the query to give me all three at once, for more than one* Wikidata item?
*we have 667, but I could do smaller batches if that's too much for the service to handle
Ideally, the query would generate something that allowed me to download a CSV file looking much like this (so I can match on and import the new data into our Airtable base which feeds the website application):
ideal CSV output
If anyone can lead me in the right direction here, I'd appreciate it.
I should also note that if OpenRefine has a way of retrieving these I'm all ears! But since these three don't have a property code, I couldn't see how to snag them from OR.
Something like this. See how many QIDs you can get away with in the VALUES clause; probably all of them in one go. This query gives you the URL and the article title; you can drop the article title column if you do not want it. Note also https://www.wikidata.org/wiki/Wikidata:Request_a_query, which is Wikidata's own venue for questions such as these.
SELECT ?item ?itemLabel ?itemDescription ?sitelink ?article
WHERE
{
VALUES ?item {wd:Q105848230 wd:Q6697407 wd:Q2344502 wd:Q1698206}
OPTIONAL {
?article schema:about ?item ;
schema:isPartOf <https://en.wikipedia.org/> ;
schema:name ?sitelink .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
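If the QIDs sit in a spreadsheet or an OpenRefine export, the VALUES clause does not have to be typed by hand. Here is a small Python sketch that generates the query above from a list of IDs; `build_query`, `batches` and the batch size of 200 are illustrative assumptions, not something the query service prescribes.

```python
import re
from typing import Iterator, List, Sequence

QID_PATTERN = re.compile(r"^Q[1-9]\d*$")

# Same query as above; doubled braces are literal braces for str.format().
QUERY_TEMPLATE = """SELECT ?item ?itemLabel ?itemDescription ?sitelink ?article
WHERE
{{
  VALUES ?item {{ {values} }}
  OPTIONAL {{
    ?article schema:about ?item ;
             schema:isPartOf <https://en.wikipedia.org/> ;
             schema:name ?sitelink .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}"""


def batches(qids: Sequence[str], size: int = 200) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(qids), size):
        yield list(qids[start:start + size])


def build_query(qids: Sequence[str]) -> str:
    """Return the SPARQL text for one batch, rejecting anything that is not a QID."""
    for qid in qids:
        if not QID_PATTERN.match(qid):
            raise ValueError(f"not a Wikidata item id: {qid!r}")
    values = " ".join(f"wd:{qid}" for qid in qids)
    return QUERY_TEMPLATE.format(values=values)


queries = [build_query(batch) for batch in batches(["Q105848230", "Q6697407", "Q2344502"], size=2)]
```

Each generated query can be pasted into https://query.wikidata.org/ and the result downloaded as CSV from there.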
Yes, a VALUES statement in SPARQL can relay not only hundreds but even thousands of items. I regularly do this when cross-checking to see how Wikidata matches up to an existing data set. Some other things you could do as well that take lists of Wikidata items:
Petscan - https://petscan.wmflabs.org/
TABernacle - https://tabernacle.toolforge.org/
Another title could be:
Is there an inverse of "member of"?
I don't see anything like that.
I can get all the persons who are members of an organization.
SELECT ?p ?pLabel WHERE {
?p wdt:P463 wd:Q3227220.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
But I can't figure out how to get the inverse:
I've got a given human (by ID or label)
I want the organizations he is a "member of"
Question 1: Is it possible to do this, or is it a feature to "not be able" to do that?
Question 2: If it's possible, is there a wizard somewhere, or some magic cheat sheet?
I'm using Wikidata to find, for the string "Scotland", the values of the properties "type / instance of", "subclass of" and "part of", if they exist.
For example, browsing the Wikidata website manually, I type Scotland, find the resource, and see those data displayed on it, as at https://www.wikidata.org/wiki/Q22. Thus I can see that Scotland is an instance of "country within the United Kingdom".
What would be the equivalent query in SPARQL, please?
I tried this "valid" query, but it either returns no results or exceeds the time limit:
SELECT ?instanceOf ?subclassOf ?partOf WHERE {
?word rdfs:label ?label;
wdt:P361 ?instanceOf;
wdt:P279 ?subclassOf;
wdt:P361 ?partOf.
FILTER(CONTAINS(?label, "Scotland"))
SERVICE wikibase:label { bd:serviceParam wikibase:language "en".}
}
Try it here
If you already know that Scotland is entity Q22 on Wikidata, you can use the https://www.wikidata.org/wiki/Special:EntityData/Qxxxxx.json URL to retrieve all the statements related to Scotland without using SPARQL: https://www.wikidata.org/wiki/Special:EntityData/Q22.json.
See also: About Data in Wikidata's help.
To search for matching entities by a string, use the wbsearchentities module of the Wikidata API, for example: https://www.wikidata.org/w/api.php?action=wbsearchentities&search=scotland&language=en .
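Put together, the two steps (search by string, then read the entity JSON) might look like this in Python. It is a sketch under assumptions: `search_url`, `entity_data_url` and `linked_items` are hypothetical helper names, and `linked_items` only handles statements whose value is another item, which is the case for P31, P279 and P361.

```python
from urllib.parse import urlencode


def search_url(text, language="en"):
    """URL for a wbsearchentities lookup of a label such as "Scotland"."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": language,
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


def entity_data_url(qid):
    """URL of the full JSON document for one item."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def linked_items(entity_json, qid, properties=("P31", "P279", "P361")):
    """Return {property id: [target item ids]} for the given properties.

    P31 = instance of, P279 = subclass of, P361 = part of.
    """
    claims = entity_json["entities"][qid].get("claims", {})
    found = {}
    for prop in properties:
        targets = []
        for statement in claims.get(prop, []):
            snak = statement["mainsnak"]
            if snak.get("snaktype") == "value":  # skip "unknown"/"no value" snaks
                targets.append(snak["datavalue"]["value"]["id"])
        found[prop] = targets
    return found
```

The search response lists candidate items under "search", each with an "id"; that id is what goes into `entity_data_url` and `linked_items`. Properties the item does not have come back as empty lists.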
This should be fairly easy for anyone familiar with SPARQL (which I am not). I'm trying to return a qualifier/property value for "score_by" in this query and it's showing up blank:
SELECT ?item ?itemLabel ?IMDb_ID ?_review_score ?_score_by WHERE {
?item wdt:P345 "tt3315342".
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
OPTIONAL { ?item wdt:P345 ?IMDb_ID. }
OPTIONAL { ?item wdt:P444 ?_review_score. }
OPTIONAL { ?item ps:P447 ?_score_by. }
}
Here is a link to this query
'Score by' is a tricky thing, because it qualifies a score.
Scores are complex things: they aren't just a value, but are qualified by the scorer (Rotten Tomatoes, IMDb, etc.). If your query worked, the answers would be misleading, since it wouldn't be clear which ?_review_score corresponded to which ?_score_by, i.e. which review score came from which reviewer.
(You might ask why P444 - score - is there, since without a reviewer the information isn't complete. It's a fair question. The actual property is wdt:P444, a wikidata direct property. What that means is that the property was created as a shortcut for convenience, at the expense of losing some context. They're like database views.)
The way they actually work is by 'reifying' the complex review score as a thing, an object 'the review', then hanging the information - score, reviewer etc - off that.
For example:
select * where {
wd:Q24053263 p:P444 ?review . # Get reviews for wolverine
?review ?p ?o # Get all info from the review
}
You can see here that the score is there under p:statement/P444 (the ps: prefix), and there's a 'qualifier' p:qualifier/P447 (the pq: prefix), i.e. the reviewer.
Essentially properties in wikidata can appear in a number of guises, encoded in the prefix.
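To make the guises concrete, here is a tiny Python sketch that expands the prefixes used in this answer to their full IRIs. The `expand` helper is made up for illustration; the IRIs are the standard Wikidata namespaces.

```python
# Namespace IRIs behind the prefixes used in this answer.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",            # items, e.g. wd:Q24053263
    "wdt": "http://www.wikidata.org/prop/direct/",      # item -> value shortcut
    "p": "http://www.wikidata.org/prop/",               # item -> statement node
    "ps": "http://www.wikidata.org/prop/statement/",    # statement node -> value
    "pq": "http://www.wikidata.org/prop/qualifier/",    # statement node -> qualifier value
}


def expand(prefixed_name):
    """Turn e.g. "pq:P447" into its full IRI (hypothetical helper)."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


score_iri = expand("ps:P444")
reviewer_iri = expand("pq:P447")
```

So the same property id, P444, shows up as wdt:P444, p:P444 or ps:P444 depending on which part of the statement you are pointing at.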
To answer your question:
OPTIONAL { ?item wdt:P444 ?_review_score. }
OPTIONAL { ?item ps:P447 ?_score_by. }
should be
OPTIONAL {
?item p:P444 ?review .
?review pq:P447 ?_score_by ; ps:P444 ?_review_score
}
That is, treat the review as a single thing, then get the score and the corresponding reviewer from it.
(If you worry that there might be scores without reviewers, you could add another OPTIONAL within that.)
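The same pairing is visible in the item's JSON (from Special:EntityData or wbgetentities), where each P444 statement carries its own qualifiers. A rough Python sketch, with `review_scores` as a hypothetical helper name and the reviewer reported as `None` when the qualifier is missing:

```python
def review_scores(claims):
    """Return (score, reviewer item id or None) for every P444 statement.

    `claims` is the "claims" object of one entity in the Wikidata JSON.
    """
    pairs = []
    for statement in claims.get("P444", []):
        snak = statement["mainsnak"]
        if snak.get("snaktype") != "value":
            continue
        score = snak["datavalue"]["value"]  # P444 holds a plain string
        reviewers = [
            qualifier["datavalue"]["value"]["id"]
            for qualifier in statement.get("qualifiers", {}).get("P447", [])
            if qualifier.get("snaktype") == "value"
        ]
        # Keep score and reviewer together, as the SPARQL above does.
        pairs.append((score, reviewers[0] if reviewers else None))
    return pairs
```

Because score and reviewer are read from the same statement, they cannot get mixed up the way two independent OPTIONAL clauses would allow.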