Pass continue param to MWAPI inside Wikidata query - sparql

Is it possible to pass the continue parameter to a MWAPI call inside a Wikidata SPARQL query?
For instance, this query uses the MWAPI EntitySearch API and always returns at most 50 results. I would like to set continue so that I can fetch n result sets, each containing at most 50 entities:
SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "EntitySearch";
                    mwapi:search "York";
                    mwapi:language "en".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

The MWAPI service already makes multiple requests to the MediaWiki API by default, using the continue mechanism, so in theory you don’t have to do anything.
The two parameters you can twiddle are
bd:serviceParam wikibase:limit 10 .
…which sets the size of each call to the API (it won’t change much, because the service will simply make more or fewer calls), and…
bd:serviceParam wikibase:limit "once" .
…which disables continuation.
To start fetching from somewhere in the middle, sort by QID or some other value from the data and add FILTER(?qid > y) as appropriate.
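If the query text is generated by a script, the two wikibase:limit forms mentioned above can be switched with a small helper. The following is a minimal Python sketch and not part of the original answer: the names sparql_string and build_entity_search_query are made up for illustration, and the code only assembles the query string (sending it to the query service is left out).

```python
from typing import Optional, Union


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping the characters that would break it."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def build_entity_search_query(term: str, language: str = "en",
                              limit: Optional[Union[int, str]] = None) -> str:
    """Build the EntitySearch query from the question, optionally adding wikibase:limit.

    limit=None leaves the service defaults alone, limit="once" adds the form that
    disables continuation, and a positive integer adds the numeric form.
    """
    if limit is None:
        limit_line = ""
    elif limit == "once":
        limit_line = '    bd:serviceParam wikibase:limit "once" .\n'
    elif isinstance(limit, int) and not isinstance(limit, bool) and limit > 0:
        limit_line = f"    bd:serviceParam wikibase:limit {limit} .\n"
    else:
        raise ValueError('limit must be a positive integer, "once", or None')

    lang = sparql_string(language)
    return (
        "SELECT ?item ?itemLabel WHERE {\n"
        "  SERVICE wikibase:mwapi {\n"
        '    bd:serviceParam wikibase:endpoint "www.wikidata.org" .\n'
        '    bd:serviceParam wikibase:api "EntitySearch" .\n'
        f"{limit_line}"
        f"    bd:serviceParam mwapi:search {sparql_string(term)} .\n"
        f"    bd:serviceParam mwapi:language {lang} .\n"
        "    ?item wikibase:apiOutputItem mwapi:item .\n"
        "  }\n"
        f"  SERVICE wikibase:label {{ bd:serviceParam wikibase:language {lang} . }}\n"
        "}\n"
    )


if __name__ == "__main__":
    print(build_entity_search_query("York", limit="once"))
```

Printing the result and pasting it into the query service is an easy way to compare the behaviour with and without continuation.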

Related

How to search a list of Wikidata IDs and return the P31 ("instance of") property using SPARQL?

How do I get the instance type(s) (i.e., property=P31 and associated labels) for multiple Wikidata IDs in a single query? Ideally, I want to output a list with the columns: Wikidata ID | P31 ID | P31 Label, with multiple rows used if a Wikidata ID has more than one P31 attached.
I am using the web query service, which works well in part, but I am struggling to understand the syntax. I have so far managed to work out how to process a list of items, and return each one as a row (simple I know!), but I can't work out how to generate a new column that gives the P31 item:
SELECT ?item
WHERE {
  VALUES ?item { wd:Q1347065 wd:Q731635 wd:Q105492052 }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
I have found the following from a previously answered question here, which returns multiple rows per item of interest, but it requires specifying the P31 type at the outset, which is exactly what I am trying to retrieve.
Any help would be appreciated as I am really stuck understanding the syntax.
Update:
I have now worked out how to return P31s for a single ID. I need to expand this query to receive a list of IDs, and include the ID as a column:
SELECT ?item ?itemLabel
WHERE {
  wd:Q18656 wdt:P31 ?item.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
If I correctly understood your problem, you can use the following query:
SELECT ?item ?class ?classLabel
WHERE {
  VALUES ?item { wd:Q1347065 wd:Q731635 wd:Q105492052 }
  ?item wdt:P31 ?class .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Here, you first fix the possible values for ?item, then you state that ?item is an instance of some ?class, and at the same time you retrieve the label for that ?class.
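When the list of IDs comes from a file or another program, the VALUES clause can be generated instead of typed by hand. This is a small Python sketch under that assumption; build_instance_of_query is a hypothetical helper name, and it only produces the query text shown in the answer above for an arbitrary list of item IDs.

```python
import re
from typing import Iterable

# Wikidata item IDs look like Q followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")
# One or more comma-separated language codes, e.g. "en" or "pl,en".
LANGUAGE_PATTERN = re.compile(r"[A-Za-z-]+(,[A-Za-z-]+)*")


def build_instance_of_query(qids: Iterable[str], language: str = "en") -> str:
    """Return a query listing every P31 value (with label) for the given item IDs."""
    if not LANGUAGE_PATTERN.fullmatch(language):
        raise ValueError(f"unexpected language value: {language!r}")
    values = []
    for qid in qids:
        # Reject anything that is not a plain item ID before putting it in the query.
        if not QID_PATTERN.fullmatch(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
        values.append(f"wd:{qid}")
    if not values:
        raise ValueError("at least one item ID is required")
    return (
        "SELECT ?item ?class ?classLabel\n"
        "WHERE {\n"
        f"  VALUES ?item {{ {' '.join(values)} }}\n"
        "  ?item wdt:P31 ?class .\n"
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{language}". }}\n'
        "}\n"
    )


if __name__ == "__main__":
    print(build_instance_of_query(["Q1347065", "Q731635", "Q105492052"]))
```

An item with several P31 statements produces one row per statement, which matches the output layout asked for in the question.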

Return "instances of" property for a wikidata item using SPARQL

I have a list of Wikidata items I wish to extract the "instance of" property from. For example, looking up Q1339 I can see that it has a single instance type (P31) labelled "human" (Q5). I have tried to write a simple query that would extract that, but I am not getting any records returned. I am very new to SPARQL, so it's very likely I'm missing something obvious.
SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P31 wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}

Can I use generators in WikiData's API?

I am trying to get all of the pages in a given category from Wikipedia, including ones in subcategories. No problem with that, but I also want certain fields from each page, like birth date.
From this topic I suppose I need to use https://wikidata.org/w/api.php and not, for example, https://pl.wikipedia.org/...
I assumed I should use a generator, but my trouble is that when calling Wikidata I get an error about a bad ID, which I don't get for Wikipedia.
query.params = {
  "action": "query", // placeholder for test
  "generator": "categorymembers",
  "gcmpageid": 1810130, // sophists' category at pl.wikipedia
  "format": "json"
}
https://pl.wikipedia.org/w/api.php -> data
https://en.wikipedia.org/w/api.php -> error: nosuchpage (expected)
https://www.wikidata.org/w/api.php -> error: invalidcategory (why???)
I've tried to use the ID from Wikidata prefixed with "Q", but then I got badinteger.
Alternatively, I could make requests to Wikipedia for the IDs and then to Wikidata, but that means calling twice for the same thing and passing all those IDs into the second request...
Please help
TL;DR Using generators from Polish Wikipedia in Wikidata API does not work but other solutions exist.
A few things to note about Wikidata and its API:
Wikidata doesn't know anything about the category hierarchies on Polish Wikipedia (or on any other Wikipedia language version)
There is no API to query pages in all subcategories. This is mainly because the category system of MediaWiki allows cycles in the category hierarchies and infinite levels of nested categories.
pageIds are only unique within a project. So using a pageId from pl.wikipedia.org does not work on https://en.wikipedia.org/w/api.php or https://www.wikidata.org/w/api.php
There are multiple solutions to your problem:
Use the query in your question recursively to get all page titles from Kategoria:Sofiści and its subcategories.
Afterwards, use the Wikidata API to retrieve the Wikidata item for each Polish Wikipedia article: e.g. for Protagoras the query is this: https://www.wikidata.org/w/api.php?action=wbgetentities&sites=plwiki&titles=Protagoras&props=claims&format=json
This returns a JSON document with all statements about Protagoras stored on Wikidata. The birth date is in that document under claims->P569->mainsnak->datavalue->value->time (P569 holds a list of statements, one per recorded date).
Use the Wikidata Query Service. It allows you to call out to the MediaWiki API from SPARQL.
SELECT ?item ?itemLabel ?date_of_birth WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Generator" .
    bd:serviceParam wikibase:endpoint "pl.wikipedia.org" .
    bd:serviceParam mwapi:gcmtitle 'Kategoria:Sofiści' .
    bd:serviceParam mwapi:generator "categorymembers" .
    bd:serviceParam mwapi:gcmprop "ids|title|type" .
    bd:serviceParam mwapi:gcmlimit "max" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  ?item wdt:P569 ?date_of_birth .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "pl". }
}
Paste this query into https://query.wikidata.org/. That page also offers code examples showing how to access the results programmatically.
The drawback of this solution is that pages in subcategories are not included.
Fully rely on Wikidata. Use the following query in https://query.wikidata.org/:
SELECT ?item ?itemLabel ?date_of_birth WHERE {
  ?item wdt:P106 wd:Q3750514 .
  ?item wdt:P569 ?date_of_birth .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "pl,en". }
}
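The first solution above (walk the category tree, then ask Wikidata about each article) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a finished client: collect_pages, entity_url and birth_times are hypothetical names, and list_members stands for a function you would write yourself that calls list=categorymembers with cmprop=title|type on pl.wikipedia.org and yields the member dictionaries. No network requests are made in the sketch.

```python
from typing import Callable, Dict, Iterable, List, Set
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def collect_pages(root: str, list_members: Callable[[str], Iterable[dict]]) -> List[str]:
    """Collect article titles from a category and all of its subcategories.

    The `seen` set makes the walk terminate even if the category graph has cycles.
    """
    seen: Set[str] = set()
    found: Set[str] = set()
    pages: List[str] = []
    stack = [root]
    while stack:
        category = stack.pop()
        if category in seen:
            continue  # already visited, e.g. reached again through a cycle
        seen.add(category)
        for member in list_members(category):
            title = member["title"]
            if member.get("type") == "subcat":
                stack.append(title)
            elif member.get("type") == "page" and title not in found:
                found.add(title)
                pages.append(title)
    return pages


def entity_url(site: str, title: str) -> str:
    """Build the wbgetentities URL for one article title on one wiki (e.g. plwiki)."""
    params = {
        "action": "wbgetentities",
        "sites": site,
        "titles": title,
        "props": "claims",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def birth_times(response: dict) -> Dict[str, List[str]]:
    """Map each entity ID in a wbgetentities response to its P569 time strings."""
    result: Dict[str, List[str]] = {}
    for entity_id, entity in response.get("entities", {}).items():
        times = []
        for statement in entity.get("claims", {}).get("P569", []):
            # Statements of type "unknown value" or "no value" carry no datavalue.
            datavalue = statement.get("mainsnak", {}).get("datavalue")
            if datavalue:
                times.append(datavalue["value"]["time"])
        result[entity_id] = times
    return result
```

Entities without a P569 statement, including titles that Wikidata could not match, simply map to an empty list, so the caller can decide how to report them.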

Select members of parliament with SPARQL from Wikidata

Based on Wikidata, I want to make a list of all members of the European Parliament, and I want some metadata about their membership, such as the start date and the party they represent.
As a start I run the following query:
SELECT ?human ?humanLabel ?positionheldLabel
WHERE {
  # human position_held MembEuroParl
  ?human wdt:P39 wd:Q27169;
         wdt:P39 ?positionheld.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
This returns a list of people who were once members of the European Parliament, together with the positions they held. However, it also returns positions other than member of the European Parliament. See image.
So I changed the query to the following, adding a line stating that the position held should be member of the European Parliament:
SELECT ?human ?humanLabel ?positionheld
WHERE {
  # ?human position_held MEP
  ?human wdt:P39 wd:Q27169;
         wdt:P39 ?positionheld.
  ?positionheld wdt:P31 wd:Q27169.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE]". }
}
However, this last query does not return any results. Besides, it feels repetitive.
My question: how can I select only those rows where positionheld is member of the European Parliament?
The answer is probably trivial, yet currently I am left clueless. Something I expected to take one minute is taking an hour.

How to get Wikidata ID for DBpedia Entities?

I have a set of DBpedia concepts and would like to get the corresponding wikidata IDs of them. For example, consider word2vec. The wikidata ID of word2vec is wd:Q22673982.
Currently, I am doing it as follows.
SELECT * {
  VALUES ?searchTerm { "word2vec" "fasttext" "natural language processing" "deep learning" "support vector machine" }
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "EntitySearch".
    bd:serviceParam wikibase:endpoint "www.wikidata.org".
    bd:serviceParam wikibase:limit 10 .
    bd:serviceParam mwapi:search ?searchTerm.
    bd:serviceParam mwapi:language "en".
    ?item wikibase:apiOutputItem mwapi:item.
    ?num wikibase:apiOrdinal true.
  }
  ?item (wdt:P279|wdt:P31) ?type
}
ORDER BY ?searchTerm ?num
However, I noticed that when I do it this way, most of my terms do not get a Wikidata ID.
Therefore, I would like to know:
Are all DBpedia concepts associated with their relevant Wikidata ID?
How can I get the Wikidata ID associated with a DBpedia concept using SPARQL?
I am happy to provide more details if needed.
I solved my issue with the following SPARQL query, run against DBpedia rather than Wikidata:
SELECT DISTINCT ?wikidata_concept
WHERE { dbr:Word2vec owl:sameAs ?wikidata_concept }
LIMIT 100
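Since owl:sameAs may also point to datasets other than Wikidata, the result rows usually need a filtering step before they can be used as Wikidata IDs. Below is a minimal Python sketch of that step, assuming the results were fetched in the standard SPARQL JSON results format; wikidata_ids is a hypothetical helper name, and the sample data in the usage part is hand-written for illustration, not real endpoint output.

```python
import re
from typing import List

# Wikidata entity URIs end in the item ID, e.g. .../entity/Q22673982.
WIKIDATA_ENTITY = re.compile(r"https?://www\.wikidata\.org/entity/(Q[1-9][0-9]*)")


def wikidata_ids(results: dict, variable: str = "wikidata_concept") -> List[str]:
    """Extract Wikidata item IDs from SPARQL JSON results, ignoring other sameAs links."""
    ids: List[str] = []
    for binding in results.get("results", {}).get("bindings", []):
        cell = binding.get(variable)
        if not cell or cell.get("type") != "uri":
            continue  # unbound variable or a literal value
        match = WIKIDATA_ENTITY.fullmatch(cell["value"])
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


if __name__ == "__main__":
    # Hand-written sample in the SPARQL JSON results layout (not real endpoint output).
    sample = {
        "head": {"vars": ["wikidata_concept"]},
        "results": {"bindings": [
            {"wikidata_concept": {"type": "uri", "value": "http://example.org/other/Word2vec"}},
            {"wikidata_concept": {"type": "uri", "value": "http://www.wikidata.org/entity/Q22673982"}},
        ]},
    }
    print(wikidata_ids(sample))
```

A concept with no Wikidata link yields an empty list, which makes it easy to see which DBpedia concepts have no corresponding Wikidata ID.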