SPARQL - How to find every entity inside two categories and their subcategories

I'm trying to find every public university in Brazil, using Wikidata. The problem is that some of them are categorized as instance of university (Q3918) and public educational institution (Q23002037), others are categorized as public university (Q875538), and some might be under other classes entirely.
So I figured that if I got every entity whose class is a subclass of higher education institution (Q38723) and also a subclass of public educational institution (Q23002037), I'd get all the entities I needed. So I tried this, with some optimization:
SELECT ?uni ?uniLabel
WHERE {
hint:Query hint:optimizer "None".
?uni wdt:P17+ wd:Q155;
wdt:P31/wdt:P279 wd:Q38723;
wdt:P31/wdt:P279 wd:Q23002037.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],pt". }
}
However, this only returns 15 entities. One of the missing entities is Federal University of Cariri (Q10387824), which is an instance of both university and public educational institution, so it should have appeared in the query results. Can anyone help me understand what's going on, and why that entity and so many others do not show up in my results?
Thank you in advance. I'm very new to SPARQL and Wikidata Queries.

I'm not sure how this works, but I used the Wikidata Query Builder and it worked. I got the entities I wanted in the categories and subcategories I defined.
SELECT DISTINCT ?item ?itemLabel
WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
{
SELECT DISTINCT ?item WHERE {
?item p:P31 ?statement0.
?statement0 (ps:P31/(wdt:P279*)) wd:Q38723.
?item p:P31 ?statement1.
?statement1 (ps:P31/(wdt:P279*)) wd:Q23002037.
?item p:P17 ?statement2.
?statement2 (ps:P17) wd:Q155.
}
LIMIT 100
}
}
So yeah. Build the query in the Wikidata Query Builder, then open it in the Wikidata Query Service and click the (i) "Show query explanation" button. From there you can choose what information to display.
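One likely reason the Query Builder version finds more items is the path it uses: `wdt:P279*` means "zero or more subclass steps", while the question's `wdt:P31/wdt:P279` requires exactly one. An item that is a direct instance of public educational institution (Q23002037) needs zero steps, so the one-step path skips it. The following is only a toy model of that difference in Python. The graph edges and the item names `uni_a` and `uni_b` are assumptions made up for the demo, not real Wikidata statements.

```python
# Toy model of the two property paths. The data below is illustrative only.
INSTANCE_OF = {  # stands in for wdt:P31
    "uni_a": {"Q3918", "Q23002037"},  # university + public educational institution
    "uni_b": {"Q875538"},             # public university
}
SUBCLASS_OF = {  # stands in for wdt:P279 (assumed edges)
    "Q3918": {"Q38723"},
    "Q875538": {"Q3918", "Q23002037"},
}


def one_hop(item):
    """Classes matched by wdt:P31/wdt:P279 (exactly one subclass step)."""
    return {
        sup
        for cls in INSTANCE_OF.get(item, ())
        for sup in SUBCLASS_OF.get(cls, ())
    }


def zero_or_more(item):
    """Classes matched by wdt:P31/wdt:P279* (zero or more subclass steps)."""
    seen = set()
    stack = list(INSTANCE_OF.get(item, ()))
    while stack:
        cls = stack.pop()
        if cls not in seen:
            seen.add(cls)
            stack.extend(SUBCLASS_OF.get(cls, ()))
    return seen


def matches(item, path):
    """True if the item reaches both target classes through the given path."""
    targets = path(item)
    return "Q38723" in targets and "Q23002037" in targets


print(matches("uni_a", one_hop))       # prints False
print(matches("uni_a", zero_or_more))  # prints True
print(matches("uni_b", one_hop))       # prints False
print(matches("uni_b", zero_or_more))  # prints True
```

In this model `uni_a` fails the one-step path because nothing sits between it and Q23002037, which is the same situation described for Federal University of Cariri in the question.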

Related

Retrieving all living people from wikipedia/wikidata

I have the following query from the Wikidata Query Builder to fetch people, but I would like to fetch only people who are alive. I've tried playing around with date of death, but I can't get anything that seems to work. I'm not very familiar with SPARQL.
SELECT DISTINCT ?item ?itemLabel WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
{
SELECT DISTINCT ?item WHERE {
?item p:P31 ?statement0.
?statement0 (ps:P31/(wdt:P279*)) wd:Q215627.
}
LIMIT 100
}
}
https://query.wikidata.org
Ideally, I'd like to return the top 50,000 living people sorted by some metric of popularity. If there's a way to do this through an API or some other pre-built wrapper around Wikidata, I'd appreciate that as well. Thanks!

Return "instances of" property for a wikidata item using SPARQL

I have a list of Wikidata items I wish to extract the "instance of" property from. For example, looking up Q1339 I can see that it has a single "instance of" (P31) value, labelled "human" (Q5). I have tried to write a simple query that would extract that, but I am not getting any records returned. I am very new to SPARQL, so it's very likely I'm missing something obvious.
SELECT ?item ?itemLabel
WHERE
{
?item wdt:P31 wd:Q1339.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

Can I use generators in WikiData's API?

I am trying to get all of the pages in a given category from Wikipedia, including the ones in subcategories. No problem with that, but I also want certain fields from each page, like birth date.
From this topic I suppose I need to use https://wikidata.org/w/api.php and not for example https://pl.wikipedia.org/...
I assumed I should use a generator, but my trouble is that when calling Wikidata I get an error about a bad ID, which I don't get for Wikipedia.
query.params = {
"action": "query", // placeholder for test
"generator": "categorymembers",
"gcmpageid": 1810130, // sophists' category at pl.wikipedia
"format": "json"
}
https://pl.wikipedia.org/w/api.php -> data
https://en.wikipedia.org/w/api.php -> error: nosuchpage (expected)
https://www.wikidata.org/w/api.php -> error: invalidcategory (why???)
I've tried to use the ID from Wikidata prefixed with "Q", but then I got badinteger.
Alternatively, I could make requests to Wikipedia for the IDs and then to Wikidata, but that means calling twice for the same thing and passing all those IDs into the second request...
Please help.
TL;DR Using generators from Polish Wikipedia in Wikidata API does not work but other solutions exist.
A few things to note about Wikidata and its API:
Wikidata doesn't know anything about the category hierarchies on Polish Wikipedia (or on any other Wikipedia language version)
There is no API to query pages in all subcategories. This is mainly because the category system of MediaWiki allows cycles in the category hierarchies and infinite levels of nested categories.
pageIds are only unique within a project. So using a pageId from pl.wikipedia.org does not work on https://en.wikipedia.org/w/api.php or https://www.wikidata.org/w/api.php
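Since category hierarchies can contain cycles, any client-side walk over subcategories has to remember which categories it has already visited. A minimal sketch of that idea in Python follows. `fetch_members` is a hypothetical callable, not part of any library: it is assumed to take a category title and return a pair of (page titles, subcategory titles), for instance by calling `list=categorymembers` on the wiki's own api.php. The category and page names in the demo are invented.

```python
from collections import deque


def walk_categories(root, fetch_members):
    """Collect page titles under root, visiting each category only once.

    fetch_members is a hypothetical callable supplied by the caller:
    category title -> (page_titles, subcategory_titles).
    """
    seen = {root}
    queue = deque([root])
    pages = set()
    while queue:
        category = queue.popleft()
        page_titles, subcategories = fetch_members(category)
        pages.update(page_titles)
        for sub in subcategories:
            if sub not in seen:  # guards against cycles in the hierarchy
                seen.add(sub)
                queue.append(sub)
    return pages


# Invented data with a cycle: A contains B, and B contains A again.
FAKE_WIKI = {
    "Kategoria:A": (["Page 1"], ["Kategoria:B"]),
    "Kategoria:B": (["Page 2"], ["Kategoria:A"]),
}

print(sorted(walk_categories("Kategoria:A", lambda c: FAKE_WIKI[c])))
# prints ['Page 1', 'Page 2']
```

Passing the fetch function in keeps the traversal separate from the HTTP details, so the cycle handling can be checked without network access.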
There are multiple solutions to your problem:
Use the query in your question recursively to get all page titles from Kategoria:Sofiści and its subcategories.
Afterwards, use the Wikidata API to retrieve the Wikidata item for each Polish Wikipedia article. For Protagoras, for example, the request is: https://www.wikidata.org/w/api.php?action=wbgetentities&sites=plwiki&titles=Protagoras&props=claims&format=json
This returns a JSON document with all statements about Protagoras stored on Wikidata. You will find the birth date in it under claims->P569->mainsnak->datavalue->value->time.
Use the Wikidata Query Service. It allows you to call out to the MediaWiki API from SPARQL.
SELECT ?item ?itemLabel ?date_of_birth WHERE {
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam wikibase:endpoint "pl.wikipedia.org" .
bd:serviceParam mwapi:gcmtitle 'Kategoria:Sofiści' .
bd:serviceParam mwapi:generator "categorymembers" .
bd:serviceParam mwapi:gcmprop "ids|title|type" .
bd:serviceParam mwapi:gcmlimit "max" .
?item wikibase:apiOutputItem mwapi:item .
}
?item wdt:P569 ?date_of_birth
SERVICE wikibase:label { bd:serviceParam wikibase:language "pl". }
}
Paste this query into https://query.wikidata.org/. That page also offers code examples showing how to access the results programmatically.
The drawback of this solution is that pages in subcategories are not included.
Fully rely on Wikidata. Use the following query in https://query.wikidata.org/:
SELECT ?item ?itemLabel ?date_of_birth WHERE {
?item wdt:P106 wd:Q3750514.
?item wdt:P569 ?date_of_birth
SERVICE wikibase:label { bd:serviceParam wikibase:language "pl,en". }
}
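For the first option above (wbgetentities), the path claims->P569->mainsnak->datavalue->value->time can be followed with a few lines of Python once the JSON reply has been parsed. This is a sketch under assumptions: the function name `birth_dates` is made up here, and the sample reply uses a placeholder entity ID and an invented date rather than real data for Protagoras.

```python
def birth_dates(response):
    """Return {entity_id: [time strings]} from a parsed wbgetentities reply."""
    result = {}
    for entity_id, entity in response.get("entities", {}).items():
        times = []
        # Entities without claims (e.g. missing pages) yield an empty list.
        for claim in entity.get("claims", {}).get("P569", []):
            datavalue = claim.get("mainsnak", {}).get("datavalue")
            if datavalue:  # "unknown value" snaks carry no datavalue
                times.append(datavalue["value"]["time"])
        result[entity_id] = times
    return result


# Invented sample shaped like a wbgetentities reply; "Q0" is a placeholder ID.
sample = {
    "entities": {
        "Q0": {
            "claims": {
                "P569": [
                    {"mainsnak": {"snaktype": "value",
                                  "datavalue": {"type": "time",
                                                "value": {"time": "-0490-00-00T00:00:00Z"}}}},
                    {"mainsnak": {"snaktype": "somevalue"}},
                ]
            }
        }
    }
}

print(birth_dates(sample))
# prints {'Q0': ['-0490-00-00T00:00:00Z']}
```

An item can hold several date-of-birth statements, which is why the function returns a list per entity instead of a single value.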

SPARQL Wikidata query for getting all the organizations a person is a member of?

Another title could be:
Is there an inverse of "member of"?
I don't see anything like that.
I can get all the people who are members of an organization.
SELECT ?p ?pLabel WHERE {
?p wdt:P463 wd:Q3227220.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
But I can't work out how to get the inverse:
I've got a given person (by ID or label).
I want the organizations that person is a "member of".
Question 1: Is it possible to do this, or is being unable to do it a feature?
Question 2: If it's possible, is there a wizard somewhere to find a magic piece of sheet?

Wikidata SPARQL Query Qualifier Value

This should be fairly easy for anyone familiar with SPARQL (which I am not). I'm trying to return a qualifier/property value for "score_by" in this query and it's showing up blank:
SELECT ?item ?itemLabel ?IMDb_ID ?_review_score ?_score_by WHERE {
?item wdt:P345 "tt3315342".
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
OPTIONAL { ?item wdt:P345 ?IMDb_ID. }
OPTIONAL { ?item wdt:P444 ?_review_score. }
OPTIONAL { ?item ps:P447 ?_score_by. }
}
'Score by' is a tricky thing, because it qualifies a score.
Scores are complex things: they aren't just a value, but are qualified by the scorer (Rotten Tomatoes, IMDb, etc.). If your query worked, the answers would be misleading, since it wouldn't be clear whether ?_review_score corresponded to ?_score_by, i.e. whether the review score corresponded to the reviewer.
(You might ask why P444 - score - is there, since without a reviewer the information isn't complete. It's a fair question. The actual property is wdt:P444, a Wikidata direct property. What that means is that the property was created as a shortcut for convenience, at the expense of losing some context. Direct properties are like database views.)
The way scores actually work is by 'reifying' the complex review score as a thing, an object 'the review', then hanging the information - score, reviewer, etc. - off that.
For example:
select * where {
wd:Q24053263 p:P444 ?review . # Get reviews for wolverine
?review ?p ?o # Get all info from the review
}
You can see here that the score is there under p:statement/P444 (the ps: prefix), and there's a 'qualifier' p:qualifier/P447 (the pq: prefix), i.e. the reviewer.
Essentially, properties in Wikidata can appear in a number of guises, encoded in the prefix.
To answer your question:
OPTIONAL { ?item wdt:P444 ?_review_score. }
OPTIONAL { ?item ps:P447 ?_score_by. }
should be
OPTIONAL {
?item p:P444 ?review .
?review pq:P447 ?_score_by ; ps:P444 ?_review_score
}
i.e. treat the review as a single thing, then get the score and corresponding reviewer from that.
(If you worry that there might be scores without reviewers, you could add another OPTIONAL within that.)
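To see why going through the statement node matters, here is a small Python model of the idea. It is not how the query service is implemented, and the scores and reviewer names are invented for illustration. Each dictionary stands for one review statement carrying its own ps:P444 value and pq:P447 qualifier.

```python
from itertools import product

# Invented review statements for one film.
STATEMENTS = [
    {"ps:P444": "93%", "pq:P447": "Rotten Tomatoes"},
    {"ps:P444": "8.1/10", "pq:P447": "IMDb"},
]


def independent_patterns(statements):
    """Scores and reviewers matched separately: every combination comes back."""
    scores = [s["ps:P444"] for s in statements]
    reviewers = [s["pq:P447"] for s in statements]
    return list(product(scores, reviewers))


def via_statement_node(statements):
    """Score and reviewer read from the same statement node stay paired."""
    return [(s["ps:P444"], s["pq:P447"]) for s in statements]


print(len(independent_patterns(STATEMENTS)))  # prints 4
print(via_statement_node(STATEMENTS))
# prints [('93%', 'Rotten Tomatoes'), ('8.1/10', 'IMDb')]
```

With two reviews, matching scores and reviewers independently yields four rows, two of which pair a score with the wrong reviewer. Reading both values from the same node yields only the two correct pairs.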