Wikidata/Wikidata-Toolkit Getting entities by SPARQL query (#362) - sparql

I am trying to understand the above issue, but still unclear about the exact requirement in the issue. Can you please describe it once again in detail?
What I can figure out is :-
This query is working on Wikidata Query Service platform
SELECT ?station WHERE {
?station wdt:P954 ?ibnr.
FILTER regex(?ibnr,"^80","i")
}
So is this -
SELECT ?station ?stationLabel WHERE {
?station wdt:P954 ?ibnr
FILTER regex(?ibnr,"^80","i")
SERVICE wikibase:label { # ... include the labels
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"
}
}
But, this also gives the labels associated with the station names which are extracted using Service keyword
(These two are the ones you mentioned in the comments. )
So, what exactly is the aim of the API? since the standard SPARQL queries are also working on that Wikidata Query Service platform. If possible also mention the files I should go through for the same.

Related

Can I use generators in WikiData's API?

I am trying to get all of the pages in given category from wikipedia, including ones in subcategories. No problem with that, but I also want certain fields from each page, like birth date.
From this topic I suppose I need to use https://wikidata.org/w/api.php and not for example https://pl.wikipedia.org/...
I assumed I should use generator, but my trouble is that with calling WikiData I get an error about bad ID, which I don't get for Wikipedia.
query.params = {
"action": "query", // placeholder for test
"generator": "categorymembers",
"gcmpageid": 1810130, // sophists'category at pl.wikipedia
"format": "json"
}
https://pl.wikipedia.org/w/api.php -> data
https://en.wikipedia.org/w/api.php -> error: nosuchpage (expected)
https://www.wikidata.org/w/api.php -> error: invalidcategory (why???)
I've tried to use that id from WikiData prefixed with "Q", but then I got badinteger
Alternatively I could make requests to Wikipedia for ids and then to WikiData, but calling two times for the same thing and handling all that ids into request...
Please help
TL;DR Using generators from Polish Wikipedia in Wikidata API does not work but other solutions exist.
A few things to note about Wikidata and its API:
Wikidata doesn't know anything about the category hierarchies on Polish Wikipedia (or on any other Wikipedia language version)
There is no API to query pages in all subcategories. This is mainly because the catgory system of MediaWiki allows cycles in the category hierarchies and infinite levels of nested categories.
pageIds are only unique within a project. So using a pageId from pl.wikipedia.org does not work on https://en.wikipedia.org/w/api.php or https://www.wikidata.org/w/api.php
There are multiple solutions to your problem:
Use the query in your question recursively to get all page titles from Kategoria:Sofiści and its subcategories.
Afterwards, use the Wikidata API to retrieve the Wikidata item for each Polish Wikipedia article: e.g. for Protagoras the query is this: https://www.wikidata.org/w/api.php?action=wbgetentities&sites=plwiki&titles=Protagoras&props=claims&format=json
This returns a json file with all statements about Protagoras stored on Wikidata. The birth data you find in that file under claims->P569->mainsnak->datavalue->value->time.
Use the Wikidat Query Service. It allows you to call out MediaWiki API from SPARQL.
SELECT ?item ?itemLabel ?date_of_birth WHERE {
SERVICE wikibase:mwapi {
bd:serviceParam wikibase:api "Generator" .
bd:serviceParam wikibase:endpoint "pl.wikipedia.org" .
bd:serviceParam mwapi:gcmtitle 'Kategoria:Sofiści' .
bd:serviceParam mwapi:generator "categorymembers" .
bd:serviceParam mwapi:gcmprop "ids|title|type" .
bd:serviceParam mwapi:gcmlimit "max" .
?item wikibase:apiOutputItem mwapi:item .
}
?item wdt:P569 ?date_of_birth
SERVICE wikibase:label { bd:serviceParam wikibase:language "pl". }
}
Insert this query on https://query.wikidata.org/. That page also offers you code examples how to access the results programmatically.
The drawback of this solution is, that pages in subcategories are not included.
Fully rely on Wikidata. Use the following query in https://query.wikidata.org/:
SELECT ?item ?itemLabel ?date_of_birth WHERE {
?item wdt:P106 wd:Q3750514.
?item wdt:P569 ?date_of_birth
SERVICE wikibase:label { bd:serviceParam wikibase:language "pl,en". }
}

Filter search in Wikidata API

I am trying to get filtered data from the Wikidata API, currently I can do a general search using this API, but now there have been specific cases where I have to filter this information, for example, I need to get a list of only authors to get the Q identifier and although I also reviewed the Wikidata Query Service this is too heavy to bring all the items, I used a SPARQL query and did a test and to get less than 3000 results it took 26 seconds, this is too much for a search service.
This is the query I use to get the authors.
SELECT DISTINCT ?author ?authorLabel WITH {
SELECT ?item ?author WHERE {
?item wdt:P50 ?author.
} LIMIT 100000
} AS %FOO {
INCLUDE %FOO
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
I also need to search by categories but it has not been possible for me to filter the searches in any way, does anyone know a way to do it?

How to construct SPARQL query for a list of Wikidata items

First off, I'm not a developer, and I'm new to writing SPARQL queries. Mostly I've been looking up existing queries and trying to tweak them to get what I need. The issue is that most documentation on query construction have to do with getting new data you don't have, rather than retrieving or extending existing data. And when you do find tips for retrieving existing data, they tend to be for ONE item at a time instead of a full data set of many items.
I mostly use OpenRefine for this. I start by loading up my existing list of names, and used the Wikidata extension service to reconcile the names to existing Wikidata IDs. So now, this is where I am, vs. where I want to go:
1 - We have a list of Wikidata IDs for reconciled matches;
2 - We have used OpenRefine to get most of the data we need from those;
3 - We don't have the label, description, or Wikipedia links (English), which are extremely valuable;
4 - I have figured out how to construct a query for the label and description of just ONE Wikidata Item:
SELECT ?itemLabel ?itemDescription WHERE { VALUES ?item {
wd:Q15485689 } SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
5 - I have figured out how to construct a query to extract the Wikipedia English URL for just ONE Wikidata item:
SELECT ?article ?lang ?name WHERE {
?article schema:about wd:Q15485689;
schema:inLanguage ?lang;
schema:name ?name;
schema:isPartOf _:b13.
_:b13 wikibase:wikiGroup "wikipedia".
FILTER(?lang IN("en"))
FILTER(!(CONTAINS(?name, ":")))
OPTIONAL { ?article wdt:P31 ?instance_of. }
}
The questions are:
How do I modify either query to generate these same results for MORE THAN ONE* Wikidata item?
How do I modify the query to give me all three at once, for more than one* Wikidata item?
*we have 667, but I could do smaller batches if that's too much for the service to handle
Ideally, the query would generate something that allowed me to download a CSV file looking much like this (so I can match on and import the new data into our Airtable base which feeds the website application):
ideal CSV output
If anyone can lead me in the right direction here, I'd appreciate it.
I should also note that if OpenRefine has a way of retrieving these I'm all ears! But since these three don't have a property code, I couldn't see how to snag them from OR.
This sort of thing. See how many QIds you can get away with in the values statement. All of them in one go, probably. This query gives you the URL and the article title; clearly, you can snip the article title column if you do not want it. Note also https://www.wikidata.org/wiki/Wikidata:Request_a_query which is wikidata's own location for questions such as these.
SELECT ?item ?itemLabel ?itemDescription ?sitelink ?article
WHERE
{
VALUES ?item {wd:Q105848230 wd:Q6697407 wd:Q2344502 wd:Q1698206}
OPTIONAL {
?article schema:about ?item ;
schema:isPartOf <https://en.wikipedia.org/> ;
schema:name ?sitelink .
}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Yes, a VALUES statement in SPARQL can relay not only hundreds but even thousands of items. I regularly do this when cross-checking to see how Wikidata matches up to an existing data set. Some other things you could do as well that take lists of Wikidata items:
Petscan - https://petscan.wmflabs.org/
TABernacle - https://tabernacle.toolforge.org/

How to fetch data by giving page URL in Wikidata Query Service (SPARQL)

I am trying to get some lists of data from Wikipedia. So I am using Wikidata SPARQL query service. What I need to know is how can I fetch the data from the URL (Wikipedia page URL). Currently I am searching from name. Following is my query,
SELECT DISTINCT ?item ?itemLabel ?birthLocation ?birthLocationLabel WHERE {
?item (wdt:P31|wdt:P101|wdt:P106)/wdt:P279* wd:Q482980 ;
rdfs:label "Enid Blyton"#en ;
wdt:P19 ?birthLocation
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I need to search by URL instead of name. How can I achieve this? And also is that possible to get the sub links from the URL. I tried by I couldn't find any way to get them. So anybody can tell me what I am doing wrong here? I highly appreciate that. Thank you.

Wikimedia Commons: Get names of sub-categories (using SPARQL or MediaWiki API)

Given a specific category (i.e. https://commons.wikimedia.org/wiki/Category:Motorcycles) I want to get names of all sub-categories recursively, either in SPARQL:
SELECT ?category ?entityLabel WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
#get sub categories of category wd:Q7025402
}
LIMIT 10000
or using MediaWiki API:
https://commons.wikimedia.org/w/api.php?{get all subcategories of Category:Motorcycles}
Is there a way to do this?
API:
https://commons.wikimedia.org/w/api.php?action=query&list=categorymembers&cmtitle=Category:Motorcycles&cmtype=subcat&utf8=1&format=json
SQL:
https://quarry.wmflabs.org/query/28793
(via Quarry tool or directly if you have an account on Toolforge)
But recursive only via PetScan or manually via API/SQL through chain of queries (query for each category, where subcats is not 0):
https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Motorcycles&gcmtype=subcat&prop=categoryinfo&utf8=1&format=json
https://quarry.wmflabs.org/query/28794
SPARQL
As of July 2018, the structure of Wikimedia Commons categories is not covered by Wikidata:
Exception is Commons, which has by far the largest set of categories and thus we decided not to cover it for now, until we ensure everything works as planned with smaller data sets.
MediaWiki API
Not possible, see T37402.
Alternatives
Use the PetScan tool:
HTML output
JSON output